An Improved Topic Model Integrating Extra-Features
Ruyi Yang, Dongsu Liu, Hui Li
School of Economics and Management, Xidian University, Xi'an 710126, China
Abstract [Objective] To reveal the relationships among the contents, topics, and authors of documents, this paper presents the Dynamic Author Topic (DAT) model, which extends the LDA model. [Context] Extracting features from large-scale text collections is an important task for informatics researchers. [Methods] First, NIPS conference papers are collected as the data set and preprocessed. The data set is then divided into time slices by publication date, and the slices form a first-order Markov chain. Perplexity is used to determine the number of topics. Finally, Gibbs sampling is used to estimate the author-topic and topic-word distributions in each time slice. [Results] The experiments show that each document can be represented by probability distributions over topic-words and author-topics, and that the evolution of authors and topics can be observed along the time dimension. [Conclusions] The DAT model integrates document content with extra features efficiently and supports text mining.
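The perplexity step in the methods can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `perplexity` and the inputs `docs`, `theta`, and `phi` are hypothetical, and the sketch assumes each document already has a topic mixture `theta[d]` (in an author-topic setting this would have to be derived from the topic distributions of the document's authors, which is not shown here). Lower perplexity on held-out text suggests a better choice of topic number.

```python
import math


def perplexity(docs, theta, phi):
    """Perplexity of tokenized documents under a topic model.

    docs  : list of documents, each a list of integer word ids
    theta : theta[d][k] = probability of topic k in document d (assumed given)
    phi   : phi[k][w]   = probability of word w under topic k
    """
    log_likelihood = 0.0
    n_tokens = 0
    n_topics = len(phi)
    for d, words in enumerate(docs):
        for w in words:
            # Marginal word probability: mix the topic-word distributions
            # using the document's topic proportions.
            p_w = sum(theta[d][k] * phi[k][w] for k in range(n_topics))
            log_likelihood += math.log(p_w)
            n_tokens += 1
    # Perplexity is the exponential of the negative mean log-likelihood per token.
    return math.exp(-log_likelihood / n_tokens)


# Toy check: two topics that are both uniform over a 4-word vocabulary.
toy_docs = [[0, 1, 2, 3]]
toy_theta = [[0.5, 0.5]]
toy_phi = [[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]]
toy_value = perplexity(toy_docs, toy_theta, toy_phi)
```

In practice one would fit the model for several candidate topic numbers within a time slice, compute this quantity on held-out documents for each, and prefer the value where perplexity stops improving.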
Received: 17 July 2015
Published: 04 February 2016