Data Analysis and Knowledge Discovery  2017, Vol. 1 Issue (4): 38-45    DOI: 10.11925/infotech.2096-3467.2017.04.05
Original Article
Recommending Scientific Research Collaborators with Link Prediction and Extremely Randomized Trees Algorithm
Weimin Lv (1, 2), Xiaomei Wang (3), Tao Han (1)
(1) National Science Library, Chinese Academy of Sciences, Beijing 100190, China
(2) University of Chinese Academy of Sciences, Beijing 100049, China
(3) Institutes of Science and Development, Chinese Academy of Sciences, Beijing 100190, China

[Objective] This paper proposes a method for recommending scientific research collaborators that combines link prediction with machine learning and improves on the precision of traditional methods. [Methods] First, we used link prediction indices as input features and trained a classifier with the Extremely Randomized Trees (ET) algorithm. Then, we used a traversal algorithm to find the optimal weight combination for linearly combining the classification results. Finally, we obtained the best collaborator recommendations. [Results] The improved ET method outperformed existing methods in recommending collaborating cities. The proposed method was also less affected by factors such as network structure, so it could be applied more widely. [Limitations] Scientific research collaboration is affected by cooperation motivation, geography, language and many other factors. The weighted author network did not examine authors from the same city or the same organization. [Conclusions] The proposed method could produce better recommendation results, which might help universities, institutions and individuals identify academic collaborators.
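The two ends of the pipeline described in [Methods] can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which link prediction indices, score sources or evaluation measure were used, so the three indices (common neighbors, Jaccard, Adamic-Adar), the toy network, the two score dictionaries and the precision-at-k criterion are all assumptions made for illustration. The classifier stage is omitted; in practice the feature vectors could be passed to an Extremely Randomized Trees implementation such as scikit-learn's `ExtraTreesClassifier`.

```python
from math import log


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of co-authorship edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def link_features(adj, u, v):
    """Link prediction indices for one candidate pair (assumed feature set)."""
    common = adj[u] & adj[v]
    union = adj[u] | adj[v]
    return {
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        # Adamic-Adar: shared neighbors with fewer links count for more.
        "adamic_adar": sum(1.0 / log(len(adj[z])) for z in common if len(adj[z]) > 1),
    }


def precision_at_k(scores, positives, k):
    """Share of the k highest-scoring pairs that are real future links."""
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    return sum(1 for pair in ranked if pair in positives) / k


def best_weight(scores_a, scores_b, positives, k, steps=10):
    """Traverse w in {0, 1/steps, ..., 1} and keep the first best-scoring w."""
    best_w, best_prec = None, -1.0
    for i in range(steps + 1):
        w = i / steps
        combined = {p: w * scores_a[p] + (1 - w) * scores_b[p] for p in scores_a}
        prec = precision_at_k(combined, positives, k)
        if prec > best_prec:
            best_w, best_prec = w, prec
    return best_w, best_prec


# Toy co-authorship network (hypothetical data).
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
adj = build_adjacency(edges)

# Candidate pairs are the author pairs that are not yet linked.
candidates = [("a", "d"), ("a", "e"), ("b", "e"), ("c", "e")]
features = {pair: link_features(adj, *pair) for pair in candidates}

# Two score sources to combine: one index, and hypothetical classifier outputs.
scores_a = {pair: float(f["common_neighbors"]) for pair, f in features.items()}
scores_b = {("a", "d"): 0.2, ("a", "e"): 0.1, ("b", "e"): 0.9, ("c", "e"): 0.3}

# Assume ("a", "d") is the collaboration that actually appears later.
weight, precision = best_weight(scores_a, scores_b, {("a", "d")}, k=1)
```

The traversal is a plain grid search over a single weight; combining more than two score sources would mean traversing a grid over every weight, which grows quickly with the number of sources.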

Key words: Scientific Research Collaboration Network; Link Prediction; Machine Learning; Random Forest; Extremely Randomized Trees; Recommendation
Received: 16 January 2017      Published: 24 May 2017

Cite this article:

Weimin Lv,Xiaomei Wang,Tao Han. Recommending Scientific Research Collaborators with Link Prediction and Extremely Randomized Trees Algorithm. Data Analysis and Knowledge Discovery, 2017, 1(4): 38-45.

