Data Analysis and Knowledge Discovery  2023, Vol. 7 Issue (9): 78-88    DOI: 10.11925/infotech.2096-3467.2022.0909
Predicting Scientific Research Cooperation with Heterogeneous Network and Representation Learning
Li Hui, Liu Sha, Hu Yaohua, Meng Wei
School of Economics and Management, Xidian University, Xi’an 710119, China
Abstract  

[Objective] This paper proposes a method for predicting scientific research cooperation based on heterogeneous networks and representation learning, with the aim of promoting exchanges and cooperation among researchers. [Methods] First, we constructed a heterogeneous scientific research cooperation network from information on scholars, institutions, papers, and journals. Based on the different co-occurrence relationships among scholars in this network, we divided it into three homogeneous co-occurrence networks. Then, we used Node2Vec and Doc2Vec to learn the scholars’ network-structure features and content-attribute features, respectively. Finally, we merged the two sets of features and calculated the cosine similarity between scholars. [Results] We evaluated the method on a Web of Science (WOS) dataset in artificial intelligence. It reached an AUC of 0.9879 and an F1 of 0.9424, outperforming the baseline methods. [Limitations] The representation of scholars’ content characteristics does not consider their research topics. [Conclusions] The proposed model combines scholars’ structural and content attributes and, through the heterogeneous network, integrates information on institutions, papers, and journals. It predicts scientific cooperation more effectively than the baseline methods.

Key words: Scientific Research Cooperation Forecast; Heterogeneous Network; Representation Learning
Received: 29 August 2022      Published: 24 October 2023
ZTFLH:  G350  
Fund: The Fundamental Research Funds for the Central Universities (QTZX22081)
Corresponding Author: Li Hui, ORCID: 0000-0002-3468-5170, E-mail: lihui@xidian.edu.cn

Cite this article:

Li Hui, Liu Sha, Hu Yaohua, Meng Wei. Predicting Scientific Research Cooperation with Heterogeneous Network and Representation Learning. Data Analysis and Knowledge Discovery, 2023, 7(9): 78-88.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2022.0909     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2023/V7/I9/78

Cooperation Prediction Pipeline
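The last two steps of the pipeline described in the abstract (merging structural and content features, then scoring scholar pairs by cosine similarity) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the page does not say how the two feature vectors are merged, so concatenating L2-normalized vectors is an assumption, and the function names `l2_normalize`, `fuse`, and `cosine_similarity` and the toy 2-dimensional vectors are hypothetical stand-ins for real Node2Vec and Doc2Vec output.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(structure_vec, content_vec):
    """Assumed fusion rule: concatenate the two L2-normalized vectors."""
    return l2_normalize(structure_vec) + l2_normalize(content_vec)


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy vectors: first argument plays the role of a structure embedding,
# second argument the role of a content embedding (both hypothetical).
scholar_a = fuse([1.0, 0.0], [0.6, 0.8])
scholar_b = fuse([1.0, 0.0], [0.8, 0.6])
scholar_c = fuse([0.0, 1.0], [-0.6, -0.8])

sim_ab = cosine_similarity(scholar_a, scholar_b)  # similar structure and content
sim_ac = cosine_similarity(scholar_a, scholar_c)  # dissimilar on both
```

Normalizing each part before concatenation keeps either feature type from dominating the similarity purely because of its scale; other fusion rules, such as a weighted sum, would be equally compatible with the description in the abstract.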
Schematic Diagram of Heterogeneous Co-occurrence Network
Homogeneous Subnetwork Division Process
Item              Training set  Test set
Period            2012-2016     2017-2021
Scholars (A)      36,186        125,498
Papers (P)        8,522         31,350
Institutions (O)  9,470         27,182
Journals (J)      2,993         5,361
Descriptive Analysis of Datasets
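Algorithm  Calculation and parameter settings
AA         $AA(v_i, v_j) = \sum_{v_k \in \Gamma(v_i) \cap \Gamma(v_j)} \frac{1}{\log_2 |\Gamma(v_k)|}$
PA         $PA(v_i, v_j) = |\Gamma(v_i)| \cdot |\Gamma(v_j)|$
JC         $JC(v_i, v_j) = \frac{|\Gamma(v_i) \cap \Gamma(v_j)|}{|\Gamma(v_i) \cup \Gamma(v_j)|}$
DeepWalk   Vector dimension 128; random-walk path length from each node 80; step size of each random walk 10; Skip-gram window 10
LINE       Vector dimension 128; number of negative samples 5; learning rate 0.025
SDNE       Vector dimension 128; first-order and second-order proximity loss coefficients 0.6 and 0.4; learning rate 0.01; batch size 128
ICPSC      Decay factor η = 0.05; Node2Vec: vector dimension 64, p = 1, q = 1, 10 samples per node, sampled sequence length 80, Skip-gram window 10, minimum frequency 1; Doc2Vec: vector dimension 64, window 8, minimum frequency 5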
Prediction Algorithms and Parameter Settings
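The three neighborhood-based baselines in the table (AA, PA, JC) can be computed directly from neighbor sets. The sketch below is a plain-Python illustration on a made-up five-node graph, assuming Γ(v) denotes the set of neighbors of node v in an undirected, unweighted network; the helper names and the toy edge list are hypothetical and are not taken from the paper.

```python
import math


def build_adjacency(edges):
    """Map each node to the set of its neighbors in an undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def adamic_adar(adj, i, j):
    """Sum of 1 / log2(degree) over the common neighbors of i and j."""
    return sum(
        1.0 / math.log2(len(adj[k]))
        for k in adj[i] & adj[j]
        if len(adj[k]) > 1  # guard against log2(1) = 0
    )


def preferential_attachment(adj, i, j):
    """Product of the two node degrees."""
    return len(adj[i]) * len(adj[j])


def jaccard(adj, i, j):
    """Shared neighbors divided by all neighbors of either node."""
    union = adj[i] | adj[j]
    return len(adj[i] & adj[j]) / len(union) if union else 0.0


# Toy co-authorship graph (hypothetical data).
edges = [("a", "c"), ("b", "c"), ("a", "d"), ("b", "d"), ("d", "e")]
adj = build_adjacency(edges)

aa_ab = adamic_adar(adj, "a", "b")              # 1/log2(2) + 1/log2(3)
pa_ab = preferential_attachment(adj, "a", "b")  # 2 * 2
jc_ae = jaccard(adj, "a", "e")                  # |{d}| / |{c, d}|
```

Scholars a and b have never co-authored but share both of their neighbors, so all three indices score the pair highly; AA additionally down-weights the shared neighbor d because d has more collaborators than c.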
Link type  Links   Max weight  Min weight  Mean weight
A2O        38,878  1           1           1
O2A        38,878  6,905.5000  0.5000      24.1309
A2P        42,013  2           0.2047      0.6328
P2A        42,013  1           0.0001      0.2028
P2V        8,522   1           1           1
V2P        8,522   2           0.8187      0.9175
Linked Information on Directed Weighted Heterogeneous Research Collaboration Networks (2012-2016)
Co-occurrence type  Nodes   Links
A-O-A               33,027  1,257,114
A-P-A               35,715  1,460,738
A-P-J-P-A           36,046  5,339,760
The Extracted Network Information of Homogenous Subnet of Scholars (2012-2016)
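The three co-occurrence types listed above (A-O-A, A-P-A, A-P-J-P-A) link two scholars whenever they share an institution, a paper, or a journal reached through their papers. A minimal sketch of such a projection follows. It is an illustration only: it counts shared intermediate nodes as an unweighted link strength, whereas the paper's heterogeneous network carries directed link weights whose exact use in the projection is not given on this page. The function name `project` and the toy records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def project(memberships):
    """Link every pair of scholars that shares an intermediate node.

    memberships: dict mapping scholar -> set of intermediate nodes
    returns: dict mapping (scholar_1, scholar_2) -> number of shared nodes
    """
    groups = defaultdict(set)
    for scholar, mids in memberships.items():
        for mid in mids:
            groups[mid].add(scholar)

    links = defaultdict(int)
    for scholars in groups.values():
        for pair in combinations(sorted(scholars), 2):
            links[pair] += 1
    return dict(links)


# Toy records (hypothetical): which papers each scholar wrote,
# and which journal published each paper.
scholar_papers = {"s1": {"p1"}, "s2": {"p1", "p2"}, "s3": {"p3"}}
paper_journal = {"p1": "j1", "p2": "j1", "p3": "j1"}

# A-P-A: scholars connected through a shared paper.
apa_links = project(scholar_papers)

# A-P-J-P-A: scholars connected through a journal that published their papers.
scholar_journals = {
    s: {paper_journal[p] for p in papers} for s, papers in scholar_papers.items()
}
apjpa_links = project(scholar_journals)
```

In the toy data only s1 and s2 are co-authors, but all three scholars published in the same journal, so the A-P-J-P-A projection is denser than the A-P-A one. The link counts in the table show the same pattern.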
ROC Curve
Algorithm  AUC     Accuracy  Precision  Recall  F1
AA         0.8517  0.8391    0.9988     0.6788  0.7262
PA         0.9151  0.6591    0.7648     0.3182  0.3263
JC         0.8518  0.8457    0.9966     0.6928  0.7393
DeepWalk   0.8535  0.7203    0.6667     0.9495  0.7801
LINE       0.5150  0.5158    0.5121     0.7149  0.5966
SDNE       0.3817  0.5112    0.5072     0.7904  0.6175
ICPSC      0.9879  0.9403    0.9101     0.9771  0.9424
Algorithm Performance
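As a reading aid for the table, F1 is conventionally the harmonic mean of precision and recall. The short sketch below applies that standard definition to the ICPSC row; the function name `f1_score` is illustrative, and the page does not state how the reported values were computed or averaged.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall taken from the ICPSC row of the table.
icpsc_f1 = round(f1_score(0.9101, 0.9771), 4)
```

For ICPSC this reproduces the reported F1 of 0.9424. Not every row follows the identity exactly: the AA row's precision and recall would give roughly 0.81 rather than 0.7262, which suggests that some values were computed or averaged in a way this page does not describe.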