Data Analysis and Knowledge Discovery, 2017, Vol. 1, Issue 4: 38-45. https://doi.org/10.11925/infotech.2096-3467.2017.04.05
Research Paper
Recommending Scientific Research Collaborators with Link Prediction and Extremely Randomized Trees Algorithm*
Lv Weimin 1,2, Wang Xiaomei 3, Han Tao 1
1 National Science Library, Chinese Academy of Sciences, Beijing 100190, China
2 University of Chinese Academy of Sciences, Beijing 100049, China
3 Institutes of Science and Development, Chinese Academy of Sciences, Beijing 100190, China
Abstract

[Objective] This paper combines link prediction with machine learning to propose a new method for recommending future scientific research collaborations, aiming to improve on the recommendation precision of methods based on link prediction alone. [Methods] We built a weighted author collaboration network, used different link prediction indices as input features, and trained a classifier with the Extremely Randomized Trees (ET) algorithm. We then used a traversal algorithm to find the optimal weight combination for linearly combining the classification results, and took the top-ranked predictions as the collaboration recommendations. [Results] We tested the method on SCI papers in nanotechnology from 2008 to 2010. In recommending collaborations between cities, the improved ET method outperformed existing methods and achieved a good recommendation success rate. The method was also less affected by factors such as network structure, so it applies more widely. [Limitations] Scientific research collaboration is affected by collaboration motivation, geography, language, and many other factors. The weighted author collaboration network does not reflect multiple authors from the same city or institution on a single paper, nor does it reflect the factors above. [Conclusions] The improved algorithm produces more accurate collaboration recommendations than any single prediction index, and offers a reference for extending the approach to finer-grained levels such as universities, other institutions, and individuals.

Keywords: Scientific Research Collaboration Network; Link Prediction; Machine Learning; Random Forest; Extremely Randomized Trees; Recommendation
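The [Methods] step of turning link prediction indices into classifier features can be illustrated with a minimal sketch. It assumes one common weighted form of the CN, AA and RA indices (the sum of the two edge weights through each common neighbor, optionally divided by the neighbor's strength or its logarithm); the paper's exact formulas are not given on this page, and the toy graph and function names are hypothetical.

```python
import math
from itertools import combinations

# Toy weighted collaboration graph between cities (hypothetical data).
# graph[u][v] = number of co-authored papers linking u and v.
graph = {
    "A": {"B": 3, "C": 1},
    "B": {"A": 3, "C": 2, "D": 1},
    "C": {"A": 1, "B": 2, "D": 4},
    "D": {"B": 1, "C": 4, "E": 2},
    "E": {"D": 2},
}

def strength(g, z):
    """Sum of the edge weights incident to node z."""
    return sum(g[z].values())

def weighted_indices(g, x, y):
    """Return assumed weighted (CN, AA, RA) scores for the node pair (x, y)."""
    cn = aa = ra = 0.0
    for z in set(g[x]) & set(g[y]):        # common neighbors of x and y
        w = g[x][z] + g[z][y]              # weight of the two-step path x-z-y
        cn += w
        aa += w / math.log(1 + strength(g, z))
        ra += w / strength(g, z)
    return cn, aa, ra

def candidate_features(g):
    """Feature vectors for every node pair that is not yet connected."""
    return {
        (x, y): weighted_indices(g, x, y)
        for x, y in combinations(sorted(g), 2)
        if y not in g[x]
    }

features = candidate_features(graph)
for pair, vec in features.items():
    print(pair, [round(v, 3) for v in vec])
```

Each vector would then be labeled by whether the pair collaborates in the following period and passed to a tree-ensemble classifier; scikit-learn, which the paper cites as [29], provides `ExtraTreesClassifier` for this purpose.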
Received: 2017-01-16      Published: 2017-05-24
CLC number: G350
Funding: *This work is one of the research outcomes of the General Program project of the National Natural Science Foundation of China, "Analysis Methods and Applications for Scientific Structure Characteristics and Their Evolutionary Dynamics" (Grant No. 71173211).
Cite this article:
Lv Weimin, Wang Xiaomei, Han Tao. Recommending Scientific Research Collaborators with Link Prediction and Extremely Randomized Trees Algorithm[J]. Data Analysis and Knowledge Discovery, 2017, 1(4): 38-45.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2017.04.05      or      http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2017/V1/I4/38
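| Main approach | Representative studies |
|---|---|
| Weighted indices | Guns [18] used the 安德鲁大学 faculty collaboration network and a collaboration network in informetrics as examples, and found that weighted link prediction indices predict better than unweighted ones. |
| Time-series analysis | Tylenda et al. [19] considered the effect of temporal evolution on prediction results. Building on the local probabilistic model proposed by Wang et al. [20], they derived a maximum-entropy method that takes time information into account, folding the time elapsed since authors a and b last collaborated into the weighted link prediction indices, which raised the prediction success rate. |
| Comparison across network levels | Yan et al. [13] built collaboration networks at the author, institution, and country levels and compared the results of eight individual prediction indices across the three. Precision rose with the level of aggregation: country level above institution level above individual level. |
| Weighted networks | Liben-Nowell et al. [16] proposed that, alongside network topology features, paper titles, author affiliations, and geographic location could be added to the computation to fine-tune link prediction methods. In practice, Guns [21] represented this information as networks at different levels and proposed a Multi-Input method that builds an author collaboration network, a department network, and a physical location network, then combines the three sub-networks with linear weights to form the training set. |

Table: Research status of link prediction in scientific collaboration networks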
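The linear weighting of sub-networks described in the last row of the table can be sketched as follows. This only illustrates the general idea of a weighted sum of edge weights. It is not the Multi-Input method of [21], whose details are not given on this page, and the networks, weights, and function name are hypothetical.

```python
def combine_networks(networks, weights):
    """Linearly combine the edge weights of several networks on the same nodes."""
    combined = {}
    for net, w in zip(networks, weights):
        for edge, value in net.items():
            combined[edge] = combined.get(edge, 0.0) + w * value
    return combined

# Hypothetical sub-networks; keys are node pairs in sorted order.
coauthor = {("a", "b"): 2, ("b", "c"): 1}
department = {("a", "b"): 1, ("a", "c"): 1}
location = {("a", "c"): 1, ("b", "c"): 1}

# Assumed weights for the three sub-networks (they sum to 1).
merged = combine_networks([coauthor, department, location], [0.6, 0.3, 0.1])
for edge in sorted(merged):
    print(edge, round(merged[edge], 2))
```

An edge absent from one sub-network simply contributes nothing from that network, so pairs linked only by department or location still receive a small combined weight.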
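| Data | 2008 | 2009 | 2010 |
|---|---|---|---|
| Number of papers | 120 027 | 139 810 | 148 426 |
| Number of nodes | 4 638 | 5 088 | 5 400 |
| Number of edges | 39 712 | 47 689 | 53 073 |

Table: Data description and number of Article-type papers in each period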
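| Precision | AA | CN | GD | Katz | RA | SimRank | RF method | ET method |
|---|---|---|---|---|---|---|---|---|
| $n=5$ | 80% | 80% | 80% | 60% | 80% | 60% | 60% | 60% |
| $n=10$ | 80% | 80% | 90% | 80% | 90% | 40% | 60% | 80% |
| $n=20$ | 85% | 80% | 90% | 80% | 85% | 30% | 62% | 80% |

Table: Recommendation precision at the city level (link prediction indices are weighted)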
Figure: Comparison of recommendation precision
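The precision values reported for $n=5$, $n=10$ and $n=20$ are read here as the share of the top n recommended pairs that actually collaborate in the later period. That reading is an assumption, since this page does not define the measure. A minimal sketch with hypothetical scores and outcomes:

```python
def precision_at_n(scores, future_links, n):
    """Share of the n highest-scored candidate pairs found in future_links."""
    ranked = sorted(scores, key=scores.get, reverse=True)[:n]
    hits = sum(1 for pair in ranked if pair in future_links)
    return hits / n

# Hypothetical prediction scores for candidate city pairs.
scores = {
    ("A", "D"): 0.9,
    ("B", "E"): 0.8,
    ("C", "E"): 0.7,
    ("A", "E"): 0.6,
    ("B", "F"): 0.5,
    ("C", "F"): 0.4,
}
# Hypothetical pairs that did collaborate in the following period.
future_links = {("A", "D"), ("B", "E"), ("A", "E"), ("B", "F"), ("C", "F")}

print(precision_at_n(scores, future_links, 5))
```

In this toy case four of the five top-ranked pairs appear among the later collaborations, so the call returns 0.8.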
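| Weights [AA, CN, GD, Katz, RA] | Accuracy, $n=5$ | Accuracy, $n=10$ | Accuracy, $n=20$ |
|---|---|---|---|
| [0.0, 0.0, 1.0, 0.0, 0.0] | 100% | 97% | 85% |
| [0.05, 0.0, 0.85, 0.0, 0.1] | 100% | 90% | 90% |
| [0.0, 0.05, 0.85, 0.0, 0.1] | 100% | 90% | 90% |
| [0.0, 0.0, 0.9, 0.0, 0.1] | 100% | 90% | 90% |
| [0.0, 0.0, 0.85, 0.05, 0.1] | 96% | 90% | 90% |

Table: Recommendation precision of the improved ET under different weights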
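The traversal over weight combinations can be sketched as an exhaustive grid search. The sketch assumes weights that are multiples of 0.05 summing to 1 (inferred from the weight vectors in the table above) and a precision-at-n selection criterion. The per-indicator scores, the outcome set, and all function names are hypothetical, and the authors' actual procedure may differ.

```python
from itertools import product

def weight_grid(k, steps):
    """Yield every k-tuple of non-negative multiples of 1/steps that sums to 1."""
    for combo in product(range(steps + 1), repeat=k - 1):
        rest = steps - sum(combo)
        if rest >= 0:
            yield tuple(c / steps for c in combo) + (rest / steps,)

def precision_for_weights(pair_scores, weights, future_links, n):
    """Precision of the n pairs ranked highest by the weighted sum of scores."""
    def combined(pair):
        return sum(w * s for w, s in zip(weights, pair_scores[pair]))
    top = sorted(pair_scores, key=combined, reverse=True)[:n]
    return sum(pair in future_links for pair in top) / n

def best_weights(pair_scores, future_links, n, steps=20):
    """Search the whole grid; the first weight vector reaching the best value wins."""
    k = len(next(iter(pair_scores.values())))
    best_w, best_p = None, -1.0
    for w in weight_grid(k, steps):
        p = precision_for_weights(pair_scores, w, future_links, n)
        if p > best_p:
            best_w, best_p = w, p
    return best_w, best_p

# Hypothetical per-indicator scores for four candidate pairs (two indicators).
pair_scores = {
    ("A", "D"): (0.9, 0.1),
    ("B", "E"): (0.2, 0.8),
    ("C", "E"): (0.8, 0.3),
    ("A", "E"): (0.1, 0.9),
}
future_links = {("B", "E"), ("A", "E")}  # pairs that did collaborate later

w, p = best_weights(pair_scores, future_links, n=2, steps=4)
print(w, p)
```

With five indicators and a 0.05 step (`steps=20`) the grid holds 10,626 weight vectors, small enough to enumerate in full. Weights chosen this way should be validated on a period not used for the search, since an exhaustive search can overfit a small evaluation set.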
References
[1] Zhang Bin, Ma Feicheng. A Review on Link Prediction of Scientific Knowledge Network[J]. Journal of Library Science in China, 2015, 41(3): 99-113. (in Chinese)
doi: 10.13530/j.cnki.jlis.150016
[2] Newman M E J. Scientific Collaboration Networks. I. Network Construction and Fundamental Results[J]. Physical Review E, 2001, 64(1): 016131.
doi: 10.1103/PhysRevE.64.016131 pmid: 11461355
[3] Newman M E J. Scientific Collaboration Networks. II. Shortest Paths, Weighted Networks, and Centrality[J]. Physical Review E, 2001, 64(1): 016132.
doi: 10.1103/PhysRevE.64.016132
[4] Newman M E J. The Structure of Scientific Collaboration Networks[J]. Proceedings of the National Academy of Sciences, 2001, 98(2): 404-409.
[5] Barabási A L, Jeong H, Néda Z, et al. Evolution of the Social Network of Scientific Collaborations[J]. Physica A: Statistical Mechanics and Its Applications, 2002, 311(3-4): 590-614.
doi: 10.1016/S0378-4371(02)00736-7
[6] De Solla Price D J. Little Science, Big Science… and Beyond[M]. New York: Columbia University Press, 1986.
[7] Zuckerman H A. Patterns of Name Ordering Among Authors of Scientific Papers: A Study of Social Symbolism and Its Ambiguity[J]. American Journal of Sociology, 1968, 74(3): 276-291.
doi: 10.1086/224641
[8] Kretschmer H. Author Productivity and Geodesic Distance in Bibliographic Co-authorship Networks, and Visibility on the Web[J]. Scientometrics, 2004, 60(3): 409-420.
doi: 10.1023/B:SCIE.0000034383.86665.22
[9] Lü L, Zhou T. Link Prediction in Complex Networks: A Survey[J]. Physica A: Statistical Mechanics and Its Applications, 2011, 390(6): 1150-1170.
doi: 10.1016/j.physa.2010.11.027
[10] Zhu B, Xia Y. An Information-theoretic Model for Link Prediction in Complex Networks[J]. Scientific Reports, 2015, 5: Article No. 13707.
doi: 10.1038/srep13707 pmid: 4558573
[11] Guns R, Rousseau R. Predicting and Recommending Potential Research Collaborations[C]//Proceedings of ISSI. 2013: 1409-1418.
[12] Guns R, Rousseau R. Recommending Research Collaborations Using Link Prediction and Random Forest Classifiers[J]. Scientometrics, 2014, 101(2): 1461-1473.
doi: 10.1007/s11192-013-1228-9
[13] Yan E, Guns R. Predicting and Recommending Collaborations: An Author-, Institution-, and Country-level Analysis[J]. Journal of Informetrics, 2014, 8(2): 295-309.
doi: 10.1016/j.joi.2014.01.008
[14] Zhang Bin, Li Yating. A Review of the Evolution Model of Scientific Knowledge Network[J]. Journal of Library Science in China, 2016, 42(5): 85-101. (in Chinese)
[15] Getoor L, Diehl C P. Link Mining: A Survey[J]. ACM SIGKDD Explorations Newsletter, 2005, 7(2): 3-12.
[16] Liben-Nowell D, Kleinberg J. The Link Prediction Problem for Social Networks[J]. Journal of the American Society for Information Science and Technology, 2007, 58(7): 1019-1031.
doi: 10.1002/asi.20591
[17] Lv Linyuan. Link Prediction on Complex Networks[J]. Journal of University of Electronic Science and Technology of China, 2010, 39(5): 651-661. (in Chinese)
[18] Guns R. Missing Links: Predicting Interactions Based on a Multi-relational Network Structure with Applications in Informetrics[D]. Universiteit Antwerpen (Belgium), 2012.
[19] Tylenda T, Angelova R, Bedathur S. Towards Time-aware Link Prediction in Evolving Social Networks[C]//Proceedings of the 3rd Workshop on Social Network Mining and Analysis. ACM, 2009: 1-10.
[20] Wang C, Satuluri V, Parthasarathy S. Local Probabilistic Models for Link Prediction[C]//Proceedings of the 7th IEEE International Conference on Data Mining. IEEE, 2007: 322-331.
[21] Guns R. Generalizing Link Prediction: Collaboration at the University of Antwerp as a Case Study[J]. Proceedings of the American Society for Information Science and Technology, 2009, 46(1): 1-15.
doi: 10.1002/meet.2009.1450460225
[22] Mitchell T M. Machine Learning[M]. Burr Ridge, IL: McGraw Hill, 1997.
[23] Backstrom L, Leskovec J. Supervised Random Walks: Predicting and Recommending Links in Social Networks[C]//Proceedings of the 4th ACM International Conference on Web Search and Data Mining. ACM, 2011: 635-644.
[24] Arora S K, Porter A L, Youtie J, et al. Capturing New Developments in an Emerging Technology: An Updated Search Strategy for Identifying Nanotechnology Research Outputs[J]. Scientometrics, 2013, 95(1): 351-370.
doi: 10.1007/s11192-012-0903-6
[25] Guns R. Bipartite Networks for Link Prediction: Can They Improve Prediction Performance?[C]//Proceedings of ISSI. 2011: 249-260.
[26] Adamic L A, Adar E. Friends and Neighbors on the Web[J]. Social Networks, 2003, 25(3): 211-230.
doi: 10.1016/S0378-8733(03)00009-1
[27] Katz L. A New Status Index Derived from Sociometric Analysis[J]. Psychometrika, 1953, 18(1): 39-43.
doi: 10.1007/BF02289026
[28] Guns R. Link Prediction[A]//Measuring Scholarly Impact[M]. Springer International Publishing, 2014: 35-55.
[29] Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: Machine Learning in Python[J]. Journal of Machine Learning Research, 2011, 12: 2825-2830.
[30] Hanley J A, McNeil B J. The Meaning and Use of the Area Under a Receiver Operating Characteristic (ROC) Curve[J]. Radiology, 1982, 143(1): 29-36.
[31] Herlocker J L, Konstan J A, Terveen L G, et al. Evaluating Collaborative Filtering Recommender Systems[J]. ACM Transactions on Information Systems (TOIS), 2004, 22(1): 5-53.
[32] Zhou T, Ren J, Medo M, et al. Bipartite Network Projection and Personal Recommendation[J]. Physical Review E, 2007, 76(4): 046115.
[33] Breiman L, Friedman J, Stone C J, et al. Classification and Regression Trees[M]. CRC Press, 1984.
[34] Schubert T, Sooryamoorthy R. Can the Centre-periphery Model Explain Patterns of International Scientific Collaboration Among Threshold and Industrialised Countries? The Case of South Africa and Germany[J]. Scientometrics, 2010, 83(1): 181-203.
doi: 10.1007/s11192-009-0074-2
[35] Boshoff N. South-South Research Collaboration of Countries in the Southern African Development Community (SADC)[J]. Scientometrics, 2010, 84(2): 481-503.
doi: 10.1007/s11192-009-0120-0
[36] Pavlov M, Ichise R. Finding Experts by Link Prediction in Co-authorship Networks[C]//Proceedings of the 2nd International Conference on Finding Experts on the Web with Semantics. 2007.