[Objective] This paper builds a co-topics network to analyze the relationships among the topics of research articles and to optimize the terms that represent these topics. [Methods] First, we transformed the “document-topics” bipartite graph into a co-topics network according to weighted projection rules. Second, we identified the key topics by combining betweenness centrality with topic probability. Third, we partitioned the co-topics network into communities with the GN (Girvan-Newman) algorithm. Finally, we optimized the topic terms with a relevance method. [Results] We compared the co-topics network with K-means clustering based on Jensen-Shannon divergence (JSD), using the optimal topic number (28) and two subjectively chosen topic numbers (20 and 30). The two methods produced the same number of clusters, and the consistency of cluster content reached 100%, 95% and 87%. [Limitations] We did not test other community partition methods on the proposed co-topics network. [Conclusions] The co-topics network meets the demands of high-dimensional data and identifies the key topics and the closely linked topics of the target documents.
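The projection step and the JSD measure named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not state the weighted projection rule, so the sketch assumes that two topics are linked when both exceed a probability threshold in the same document, and that each such document adds the product of the two probabilities to the edge weight. The names `project_co_topics`, `jsd`, and `threshold` are hypothetical.

```python
from itertools import combinations
from math import log2


def project_co_topics(doc_topics, threshold=0.1):
    """Project document-topic distributions onto a weighted topic-topic edge list.

    doc_topics: list of dicts mapping topic id -> probability, one dict per document.
    Returns a dict mapping (topic_a, topic_b), with topic_a < topic_b, to a weight.
    ASSUMPTION: the product-of-probabilities rule and the threshold are
    illustrative choices, not the rule used in the paper.
    """
    weights = {}
    for dist in doc_topics:
        # Keep only topics that are meaningfully present in this document.
        present = sorted(t for t, p in dist.items() if p >= threshold)
        for a, b in combinations(present, 2):
            weights[(a, b)] = weights.get((a, b), 0.0) + dist[a] * dist[b]
    return weights


def jsd(p, q):
    """Jensen-Shannon divergence (base 2) between two equal-length distributions."""
    m = [(x + y) / 2 for x, y in zip(p, q)]

    def kl(a, b):
        # Terms with a zero probability in `a` contribute nothing to the sum.
        return sum(x * log2(x / y) for x, y in zip(a, b) if x > 0)

    return (kl(p, m) + kl(q, m)) / 2


# Toy input: three documents over topics 0, 1 and 2.
docs = [
    {0: 0.6, 1: 0.4},
    {0: 0.5, 1: 0.3, 2: 0.2},
    {1: 0.05, 2: 0.95},
]
edges = project_co_topics(docs)
```

The resulting weighted edge list could then be handed to a graph library to compute betweenness centrality and to run community detection. With base-2 logarithms, `jsd` ranges from 0 for identical distributions to 1 for distributions with disjoint support.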
Niu Liang. New Research and Application with Co-topics Network[J]. New Technology of Library and Information Service (现代图书情报技术), 2016, 32(7-8): 137-146.