[Objective] This paper presents the Dynamic Author Topic (DAT) model, an extension of the LDA model, to reveal the relationships among the contents, topics, and authors of documents. [Context] Extracting features from large-scale text is an important task for informatics researchers. [Methods] First, NIPS conference papers are collected as the data set and preprocessed. The data set is then divided into slices by publication time, with adjacent slices linked as a first-order Markov chain, and perplexity is used to determine the number of topics. Finally, Gibbs sampling is used to estimate the author-topic and topic-word distributions in each time slice. [Results] Experiments show that each document is represented by topic-word and author-topic probability distributions, and the evolution of authors and topics can be observed along the time dimension. [Conclusions] The DAT model integrates content and extra-features efficiently and accomplishes text mining.
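As a concrete illustration of the estimation step described in [Methods], the sketch below implements collapsed Gibbs sampling for an author-topic model within a single time slice. This is a minimal NumPy sketch under stated assumptions, not the authors' code: the function name gibbs_author_topic, the hyperparameter values for alpha and beta, the toy corpus, and all variable names are illustrative.

import numpy as np

def gibbs_author_topic(docs, doc_authors, n_authors, vocab_size,
                       n_topics=3, alpha=0.1, beta=0.01,
                       n_iter=200, seed=0):
    """docs: list of word-id lists; doc_authors: list of author-id lists."""
    rng = np.random.default_rng(seed)
    AK = np.zeros((n_authors, n_topics))   # author-topic counts
    KW = np.zeros((n_topics, vocab_size))  # topic-word counts
    K_ = np.zeros(n_topics)                # tokens per topic
    A_ = np.zeros(n_authors)               # tokens per author
    assign = []                            # (author, topic) per token
    for d, words in enumerate(docs):       # random initialization
        cur = []
        for w in words:
            a = rng.choice(doc_authors[d])
            k = rng.integers(n_topics)
            AK[a, k] += 1; KW[k, w] += 1; K_[k] += 1; A_[a] += 1
            cur.append((a, k))
        assign.append(cur)
    for _ in range(n_iter):
        for d, words in enumerate(docs):
            authors = doc_authors[d]
            for i, w in enumerate(words):
                a, k = assign[d][i]        # remove the current assignment
                AK[a, k] -= 1; KW[k, w] -= 1; K_[k] -= 1; A_[a] -= 1
                # joint posterior over (author, topic) pairs, up to a constant
                p = ((AK[authors, :] + alpha) /
                     (A_[authors, None] + n_topics * alpha) *
                     (KW[:, w] + beta) / (K_ + vocab_size * beta))
                p = p.ravel() / p.sum()
                idx = rng.choice(p.size, p=p)
                a, k = authors[idx // n_topics], idx % n_topics
                AK[a, k] += 1; KW[k, w] += 1; K_[k] += 1; A_[a] += 1
                assign[d][i] = (a, k)
    # smoothed point estimates of the two distributions named in the abstract
    theta = (AK + alpha) / (AK.sum(1, keepdims=True) + n_topics * alpha)
    phi = (KW + beta) / (KW.sum(1, keepdims=True) + vocab_size * beta)
    return theta, phi                      # author-topic, topic-word

# Toy usage (hypothetical data): two documents over a 4-word vocabulary.
docs = [[0, 1, 2, 1], [2, 3, 3, 0]]        # word ids per document
doc_authors = [[0], [0, 1]]                # author ids per document
theta, phi = gibbs_author_topic(docs, doc_authors, n_authors=2, vocab_size=4)

In the full pipeline the abstract describes, one such sampler would be run per time slice, with slices chained in publication order, and the number of topics would be fixed beforehand by retraining with several candidate values and keeping the one with the lowest held-out perplexity.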
Ruyi Yang, Dongsu Liu, Hui Li. An Improved Topic Model Integrating Extra-Features[J]. New Technology of Library and Information Service, 2016, 32(1): 48-54.