New Research and Application with Co-topics Network*

doi:10.11925/infotech.1003-3513.2016.07.17

New Technology of Library and Information Service, 2016, Vol. 32, Issue 7-8: 137-146


Niu Liang

School of Economics & Management, China Jiliang University, Hangzhou 310018, China


Abstract

[Objective] This paper builds a co-topics network to analyze the relationships among topics and to optimize the terms that represent each topic. [Methods] The "document-topic" bipartite graph is projected into a co-topics network according to weighted projection rules. Key topics in the network are identified by combining betweenness centrality with topic probability, the network is partitioned into communities with the GN algorithm, and topic terms are optimized with a relevance measure. [Results] The co-topics network was compared with JSD-based K-means under three topic numbers: the optimal number (28) and two subjectively chosen numbers (20 and 30). Both methods produced the same number of clusters in all three tests, and the agreement of cluster contents reached 100%, 95%, and 87%, respectively. [Limitations] Other community partition methods were not comprehensively examined on the co-topics network. [Conclusions] The co-topics network accommodates high-dimensional data and reveals which topics in the documents are important and which topics are closely linked.

Keywords: Co-topics network, LDA, Community partition, K-means
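
The first two steps of the method (projecting the document-topic bipartite graph and scoring key topics) can be illustrated with a minimal sketch. The abstract does not state the projection rule or how betweenness and topic probability are combined, so everything specific here is an assumption: the `threshold` cutoff, the product-sum edge weight, the unweighted betweenness, the `key_score` product, and the toy `doc_topic` matrix are all hypothetical. Community partition with the GN algorithm is not shown.

```python
from collections import deque
from itertools import combinations


def project_cotopics(doc_topic, threshold=0.1):
    """Project a document-topic matrix onto a weighted topic-topic network.

    Assumed rule (not taken from the paper): two topics are linked when both
    reach `threshold` in the same document; the edge weight is the sum of the
    products of their probabilities over all such documents.
    """
    weights = {}
    for row in doc_topic:
        active = [k for k, p in enumerate(row) if p >= threshold]
        for i, j in combinations(active, 2):
            weights[(i, j)] = weights.get((i, j), 0.0) + row[i] * row[j]
    return weights


def betweenness(n_nodes, edges):
    """Unnormalized, unweighted betweenness centrality (Brandes' algorithm)."""
    adj = {v: set() for v in range(n_nodes)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back toward s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected shortest path was counted from both of its endpoints.
    return {v: c / 2 for v, c in score.items()}


# Toy document-topic matrix: 4 documents (rows) x 4 topics (columns).
doc_topic = [
    [0.6, 0.4, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.5, 0.5, 0.0, 0.0],
]

edge_weights = project_cotopics(doc_topic, threshold=0.1)
centrality = betweenness(len(doc_topic[0]), edge_weights)

# Hypothetical key-topic score: betweenness times mean topic probability.
mean_prob = [sum(col) / len(doc_topic) for col in zip(*doc_topic)]
key_score = {k: centrality[k] * mean_prob[k] for k in centrality}
key_topic = max(key_score, key=key_score.get)
```

In this toy network the topics form a chain, so the two middle topics share the highest betweenness and the mean topic probability breaks the tie. A real analysis would start from the document-topic matrix estimated by LDA.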

Received: 2016-03-09 Published: 2016-09-29

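Funding: *This work is one of the research results of the National Natural Science Foundation of China project "Supply Chain Member Enterprise Behavior and Network Equilibrium Coordination under Carbon Emission Rules" (No. 71402173); the project "VRP Models and Algorithms for Logistics Distribution and Their Application in GIS" (No. RWSKZD03-201207) of "Decision Science and Innovation Management", a Key Research Base of Humanities and Social Sciences of Zhejiang Provincial Universities; and the project "Strategies for Improving the Competitiveness of Zhejiang Province's Logistics Industry from an FDI Perspective" (No. SIPM3222) of the Zhejiang Industrial Development Policy Research Center and the Zhejiang Research Base for Standardization and Intellectual Property Management.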

Cite this article:

Niu Liang. New Research and Application with Co-topics Network[J]. New Technology of Library and Information Service, 2016, 32(7-8): 137-146.

Link to this article:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2016.07.17 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2016/V32/I7-8/137

