[Objective] This paper aims to improve the extraction of topics from Chinese literature by combining the LDA model with co-word network analysis. [Methods] First, we added keywords to the word segmentation dictionary used for the abstracts, which improved semantic recognition in topic analysis. Second, we proposed a Latent Dirichlet Allocation model with Co-word Analysis (CA-LDA), which controls the generated topic distribution by weighting it with a topology parameter of the co-word network (betweenness centrality). Finally, we extracted the words with both high connectivity (betweenness centrality) and high frequency. [Results] The CA-LDA model retrieved words that are simultaneously high in frequency and high in connectivity, which are important for subject analysis. With the help of co-word analysis, the proposed algorithm could also identify technical terms that act as key nodes. [Limitations] The K value (number of topics) was obtained by cross-validation with perplexity, so it was difficult to classify document topics when K was large. More research is needed to address this issue. [Conclusions] The proposed model effectively analyzes the topics of Chinese literature on transportation law and could also process literature data from other fields automatically.
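As an illustration of the co-word step described under Methods, the following minimal sketch builds a co-word network from per-document word lists and ranks words by betweenness centrality and document frequency. It is not the authors' CA-LDA implementation: the toy documents, the function names, and the way the two scores are reported together are assumptions made for illustration. Betweenness is computed with Brandes' standard algorithm for unweighted, undirected graphs.

```python
from collections import Counter, defaultdict, deque
from itertools import combinations


def build_coword_graph(docs):
    """Link every pair of distinct words that co-occur in the same document."""
    graph = defaultdict(set)
    for words in docs:
        for a, b in combinations(sorted(set(words)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def betweenness_centrality(graph):
    """Unnormalized betweenness (Brandes) for an unweighted, undirected graph."""
    score = dict.fromkeys(graph, 0.0)
    for source in graph:
        # Breadth-first search that counts shortest paths from the source.
        order = []
        preds = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)
        dist = dict.fromkeys(graph, -1)
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse breadth-first order.
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Every undirected path was counted once from each endpoint.
    return {v: s / 2.0 for v, s in score.items()}


def rank_words(docs):
    """Return (word, betweenness, document frequency), most central first."""
    freq = Counter(w for words in docs for w in set(words))
    bc = betweenness_centrality(build_coword_graph(docs))
    return sorted(((w, bc[w], freq[w]) for w in bc),
                  key=lambda t: (-t[1], -t[2], t[0]))


# Hypothetical toy corpus: each list stands for the segmented words of one abstract.
docs = [
    ["traffic", "law", "liability"],
    ["law", "insurance"],
    ["insurance", "compensation"],
]
ranking = rank_words(docs)
```

In this toy corpus, "law" and "insurance" bridge otherwise separate groups of words, so they receive the highest betweenness scores. How such scores enter the topic distribution in CA-LDA is not specified in the abstract, so the sketch stops at ranking the words.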
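Ma Hong, Cai Yongming. Chinese Text Topic Analysis with a Co-word Network LDA Model: The Case of Transportation Law Literature (2000-2016)[J]. Data Analysis and Knowledge Discovery, 2016, 32(12): 17-26.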
Hong Ma, Yongming Cai. A CA-LDA Model for Chinese Topic Analysis: Case Study of Transportation Law Literature. Data Analysis and Knowledge Discovery, DOI: 10.11925/infotech.1003-3513.2016.12.03.