Data Analysis and Knowledge Discovery, 2019, Vol. 3, Issue 1: 104-117. DOI: 10.11925/infotech.2096-3467.2018.0394
Research Paper
Finding Collaboration Opportunities from Emerging Issues with LDA Topic Model and Link Prediction*
Junwan Liu (刘俊婉), Zhixin Long (龙志昕), Feifei Wang (王菲菲)
School of Economics and Management, Beijing University of Technology, Beijing 100022, China
Abstract

[Objective] This paper experimentally investigates how to discover association (collaboration) opportunities among emerging topics and proposes an effective method for doing so. [Methods] We took the literature published in the field of deep learning as the research corpus. First, we mined the intrinsic characteristics of the documents with the LDA topic model. Then, we calculated topic weights and used the topics as nodes to build a topic co-occurrence network. Finally, we applied link prediction to this network to predict potential associations among emerging topics. [Results] The AA index was the best-performing link prediction index on the deep learning topic co-occurrence network. Big data analysis research in deep learning is most likely to become associated with biomedical topics and with research on improving the mechanisms of deep learning algorithms themselves. [Limitations] Link prediction performs poorly on weakly connected networks. [Conclusions] Combining the LDA topic model with link prediction is, to a certain extent, an effective and reliable way to discover future association opportunities among emerging topics.
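
The last two steps of the method summarized in the abstract (building a topic co-occurrence network and scoring unlinked topic pairs) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "AA" denotes the Adamic-Adar index, its usual meaning in link prediction, and that two topics co-occur when both reach a weight threshold in the same document. The threshold value, the toy document-topic weights, and the function names `build_cooccurrence` and `adamic_adar` are all hypothetical; in practice the weights would come from a fitted LDA model.

```python
from itertools import combinations
from math import log


def build_cooccurrence(doc_topics, threshold=0.2):
    """Link two topics when both reach `threshold` in the same document.

    The co-occurrence rule and the threshold are illustrative assumptions.
    Returns a dict mapping each topic id to the set of its neighbors.
    """
    neighbors = {}
    for weights in doc_topics:
        active = [t for t, w in enumerate(weights) if w >= threshold]
        for t in active:
            neighbors.setdefault(t, set())
        for a, b in combinations(active, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def adamic_adar(neighbors):
    """Score each unlinked topic pair that shares at least one neighbor.

    Adamic-Adar: sum of 1 / log(degree(z)) over common neighbors z.
    A common neighbor is linked to both endpoints, so its degree is
    at least 2 and the logarithm is never zero.
    """
    scores = {}
    for a, b in combinations(sorted(neighbors), 2):
        if b in neighbors[a]:
            continue  # already linked, nothing to predict
        common = neighbors[a] & neighbors[b]
        if common:
            scores[(a, b)] = sum(1.0 / log(len(neighbors[z])) for z in common)
    return scores


# Toy document-topic weights (rows: documents, columns: 5 hypothetical topics).
doc_topics = [
    [0.5, 0.5, 0.0, 0.0, 0.0],
    [0.6, 0.0, 0.4, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.3, 0.7, 0.0],
    [0.0, 0.0, 0.5, 0.0, 0.5],
]

neighbors = build_cooccurrence(doc_topics)
scores = adamic_adar(neighbors)

# Candidate topic associations, most likely first.
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
for (a, b), score in ranked:
    print(f"topic {a} - topic {b}: {score:.3f}")
```

In this toy network the highest-ranked candidate pair is the one whose shared neighbors have the fewest other links, which is the behavior that distinguishes Adamic-Adar from a plain common-neighbor count.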

Keywords: Emerging Topic Association; LDA Topic Model; Link Prediction
Received: 2018-04-09
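Funding: *This work is one of the outcomes of the National Natural Science Foundation of China Youth Program project "Research on the Structure and Evolution Trends of Academicians' Scientific Collaboration Networks from a Symbiosis Perspective: The Case of Academy of Sciences Members in China and the United States" (Grant No. 71603015) and the Beijing Natural Science Foundation project "Research on Emerging Trend Identification Based on the Detection and Evolution of Technology Symbiosis Network Structure" (Grant No. 9182001).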
Cite this article:
Junwan Liu, Zhixin Long, Feifei Wang. Finding Collaboration Opportunities from Emerging Issues with LDA Topic Model and Link Prediction[J]. Data Analysis and Knowledge Discovery, 2019, 3(1): 104-117. DOI: 10.11925/infotech.2096-3467.2018.0394.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2018.0394