Data Analysis and Knowledge Discovery, 2023, Vol. 7, Issue (9): 78-88     https://doi.org/10.11925/infotech.2096-3467.2022.0909
Research Paper
Predicting Scientific Research Cooperation with Heterogeneous Network and Representation Learning
Li Hui, Liu Sha, Hu Yaohua, Meng Wei
School of Economics and Management, Xidian University, Xi’an 710119, China
Abstract

[Objective] To promote exchange and cooperation among researchers, this paper proposes a method for predicting scientific research cooperation that combines heterogeneous networks with representation learning. [Methods] First, we constructed a heterogeneous scientific research cooperation network from information on scholars, institutions, papers, and journals. Based on the different co-occurrence relationships among scholars in this network, we divided it into three homogeneous co-occurrence networks. Then, we used Node2Vec and Doc2Vec to learn each scholar's network-structure feature vector and content-attribute feature vector, respectively, and fused the two. Finally, we predicted cooperation by calculating the cosine similarity between scholar vectors. [Results] We evaluated the method on artificial intelligence papers from the Web of Science (WOS) database. The proposed method reached an AUC of 0.9879 and an F1 of 0.9424, outperforming the baseline methods. [Limitations] The representation of scholars' content features does not consider their research topics. [Conclusions] The proposed method considers both the structural and the content attributes of scholars and, by building on a heterogeneous network, integrates information from institutions, papers, and journals, which yields more effective cooperation prediction.

Key words: Scientific Research Cooperation Forecast; Heterogeneous Network; Representation Learning
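Received: 2022-08-29      Published: 2023-10-24
Chinese Library Classification (ZTFLH): G350
Funding: Fundamental Research Funds for the Central Universities (QTZX22081)
Corresponding author: Li Hui, ORCID: 0000-0002-3468-5170, E-mail: lihui@xidian.edu.cn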
Cite this article:
Li Hui, Liu Sha, Hu Yaohua, Meng Wei. Predicting Scientific Research Cooperation with Heterogeneous Network and Representation Learning. Data Analysis and Knowledge Discovery, 2023, 7(9): 78-88.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2022.0909      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2023/V7/I9/78
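Fig.1  Cooperation prediction workflow
Fig.2  Schematic diagram of the heterogeneous co-occurrence network
Fig.3  Workflow for partitioning the homogeneous subnetworks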
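The final steps of the workflow described in the abstract, fusing each scholar's structural and content vectors and then scoring a scholar pair by cosine similarity, can be sketched in plain Python. This is a minimal sketch under stated assumptions: the source does not specify the fusion rule, so `fuse` simply L2-normalizes and concatenates the two vectors, and the `predict_cooperation` helper and its 0.5 threshold are hypothetical.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(structure_vec: Sequence[float], content_vec: Sequence[float]) -> List[float]:
    """Hypothetical fusion: concatenate the normalized structural (Node2Vec-style)
    and content (Doc2Vec-style) vectors so neither part dominates by scale."""
    return l2_normalize(structure_vec) + l2_normalize(content_vec)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def predict_cooperation(a_struct, a_content, b_struct, b_content, threshold=0.5):
    """Score a scholar pair and apply an assumed decision threshold."""
    score = cosine(fuse(a_struct, a_content), fuse(b_struct, b_content))
    return score, score >= threshold


# Toy 2-d vectors standing in for the 64-d Node2Vec and Doc2Vec outputs.
score, will_cooperate = predict_cooperation([1.0, 0.2], [0.3, 0.9], [0.9, 0.1], [0.2, 1.0])
print(f"similarity={score:.3f}, predicted={will_cooperate}")
```

Normalizing before concatenating is one simple way to keep a long content vector from swamping the structural one in the cosine score; a weighted sum or learned combination would be equally plausible alternatives.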
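| Item | Training set | Test set |
|---|---|---|
| Time period | 2012-2016 | 2017-2021 |
| Scholars (A) | 36,186 | 125,498 |
| Papers (P) | 8,522 | 31,350 |
| Institutions (O) | 9,470 | 27,182 |
| Journals (J) | 2,993 | 5,361 |
Table 1  Descriptive statistics of the dataset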
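Table 1 splits the data by time: training on 2012-2016 papers and testing on 2017-2021 papers. The source does not spell out how candidate scholar pairs are labeled, so the following is a minimal sketch of one commonly used protocol, stated as an assumption: a candidate is a pair of scholars who both appear in the training period but have not yet co-authored, and it is positive if the pair co-authors in the test period. The helper names are hypothetical.

```python
from itertools import combinations


def coauthor_pairs(papers):
    """Set of sorted scholar pairs that co-authored at least one paper.
    `papers` is an iterable of author lists, one list per paper."""
    pairs = set()
    for authors in papers:
        pairs.update(combinations(sorted(set(authors)), 2))
    return pairs


def temporal_split_labels(train_papers, test_papers):
    """Assumed labeling: only scholars seen in the training period are candidates;
    unlinked training pairs that co-author in the test period are positives."""
    train_pairs = coauthor_pairs(train_papers)
    test_pairs = coauthor_pairs(test_papers)
    train_scholars = {a for authors in train_papers for a in authors}
    positives = {
        (a, b) for (a, b) in test_pairs
        if a in train_scholars and b in train_scholars and (a, b) not in train_pairs
    }
    # Every other unlinked training-period pair becomes a negative (toy data only).
    negatives = {
        pair for pair in combinations(sorted(train_scholars), 2)
        if pair not in train_pairs and pair not in positives
    }
    return positives, negatives


pos, neg = temporal_split_labels([["A", "B"], ["B", "C"], ["E"]], [["A", "C"], ["A", "D"]])
```

Enumerating every unlinked pair grows quadratically (about 6.5 × 10^8 pairs for the 36,186 training-period scholars), so on data of this size negatives would usually be sampled rather than listed exhaustively.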
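| Prediction algorithm | Formula and parameter settings |
|---|---|
| AA | $AA(v_i, v_j) = \sum_{v_k \in \Gamma(v_i) \cap \Gamma(v_j)} \frac{1}{\log_2 \lvert \Gamma(v_k) \rvert}$ |
| PA | $PA(v_i, v_j) = \lvert \Gamma(v_i) \rvert \cdot \lvert \Gamma(v_j) \rvert$ |
| JC | $JC(v_i, v_j) = \frac{\lvert \Gamma(v_i) \cap \Gamma(v_j) \rvert}{\lvert \Gamma(v_i) \cup \Gamma(v_j) \rvert}$ |
| DeepWalk | Embedding dimension 128; random-walk path length 80 starting from each node; step size of each random walk 10; Skip-gram window 10 |
| LINE | Embedding dimension 128; 5 negative samples; learning rate 0.025 |
| SDNE | Embedding dimension 128; first- and second-order proximity loss coefficients 0.6 and 0.4; learning rate 0.01; batch size 128 |
| ICPSC | Decay factor η = 0.05. Node2Vec: embedding dimension 64, p = 1, q = 1, 10 walks sampled per node, walk length 80, Skip-gram window 10, minimum count 1. Doc2Vec: embedding dimension 64, window 8, minimum count 5 |
Table 2  Prediction algorithms and parameter settings
Γ(v) denotes the set of neighbors of node v, and |·| the size of a set.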
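The three neighborhood-based baselines in Table 2 (AA, PA, JC) can be computed directly from adjacency sets. This is a minimal sketch assuming an undirected, unweighted scholar graph stored as a dict of neighbor sets; `build_graph` and the function names are hypothetical, and AA uses log base 2 as written in the table.

```python
import math
from collections import defaultdict


def build_graph(edges):
    """Undirected adjacency sets from (u, v) pairs; self-loops are ignored."""
    graph = defaultdict(set)
    for u, v in edges:
        if u != v:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def adamic_adar(graph, u, v):
    """AA(u, v): sum of 1 / log2(degree) over the common neighbors of u and v."""
    common = graph[u] & graph[v]
    # For distinct u and v, each common neighbor has degree >= 2, so log2 >= 1.
    return sum((1.0 / math.log2(len(graph[k])) for k in common), 0.0)


def preferential_attachment(graph, u, v):
    """PA(u, v): product of the two node degrees."""
    return len(graph[u]) * len(graph[v])


def jaccard(graph, u, v):
    """JC(u, v): shared neighbors over all neighbors (0.0 if both are isolated)."""
    union = graph[u] | graph[v]
    return len(graph[u] & graph[v]) / len(union) if union else 0.0


g = build_graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("B", "D")])
print(adamic_adar(g, "A", "D"), preferential_attachment(g, "A", "D"), jaccard(g, "A", "B"))
```

The functions assume u and v are distinct nodes; with u equal to v, a degree-1 neighbor would make the AA denominator zero.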
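| Link type | Number of links | Max weight | Min weight | Mean weight |
|---|---|---|---|---|
| A2O | 38,878 | 1 | 1 | 1 |
| O2A | 38,878 | 6,905.5000 | 0.5000 | 24.1309 |
| A2P | 42,013 | 2 | 0.2047 | 0.6328 |
| P2A | 42,013 | 1 | 0.0001 | 0.2028 |
| P2V | 8,522 | 1 | 1 | 1 |
| V2P | 8,522 | 2 | 0.8187 | 0.9175 |
Table 3  Link information of the directed weighted heterogeneous research cooperation network (2012-2016)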
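| Co-occurrence type | Nodes | Links |
|---|---|---|
| A-O-A | 33,027 | 1,257,114 |
| A-P-A | 35,715 | 1,460,738 |
| A-P-J-P-A | 36,046 | 5,339,760 |
Table 4  Network statistics of the extracted homogeneous scholar subnetworks (2012-2016)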
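Table 4 lists three scholar co-occurrence networks extracted from the heterogeneous network: scholars linked through a shared institution (A-O-A), a shared paper (A-P-A), or papers in the same journal (A-P-J-P-A). One way to extract them is to project each meta-path onto scholar pairs. This is a minimal sketch assuming plain dict and set inputs and unweighted output edges; the source's subnetworks may carry weights (compare Table 3), and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def project(author_to_items):
    """Link two scholars whenever they share at least one intermediate item
    (institution, paper, or journal); returns a set of sorted scholar pairs."""
    item_to_authors = defaultdict(set)
    for author, items in author_to_items.items():
        for item in items:
            item_to_authors[item].add(author)
    edges = set()
    for authors in item_to_authors.values():
        edges.update(combinations(sorted(authors), 2))
    return edges


def homogeneous_subnets(author_orgs, author_papers, paper_journal):
    """Hypothetical extraction of the A-O-A, A-P-A and A-P-J-P-A networks."""
    # For A-P-J-P-A, first map each scholar to the journals of their papers.
    author_journals = {
        a: {paper_journal[p] for p in papers if p in paper_journal}
        for a, papers in author_papers.items()
    }
    return {
        "A-O-A": project(author_orgs),
        "A-P-A": project(author_papers),
        "A-P-J-P-A": project(author_journals),
    }
```

Sharing a journal is a much weaker condition than sharing a paper, so the A-P-J-P-A projection tends to be the densest, which is consistent with its link count in Table 4.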
Fig.4  Comparison of ROC curves
| Algorithm | AUC | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|
| AA | 0.8517 | 0.8391 | 0.9988 | 0.6788 | 0.7262 |
| PA | 0.9151 | 0.6591 | 0.7648 | 0.3182 | 0.3263 |
| JC | 0.8518 | 0.8457 | 0.9966 | 0.6928 | 0.7393 |
| DeepWalk | 0.8535 | 0.7203 | 0.6667 | 0.9495 | 0.7801 |
| LINE | 0.5150 | 0.5158 | 0.5121 | 0.7149 | 0.5966 |
| SDNE | 0.3817 | 0.5112 | 0.5072 | 0.7904 | 0.6175 |
| ICPSC | 0.9879 | 0.9403 | 0.9101 | 0.9771 | 0.9424 |
Table 5  Performance comparison of the algorithms
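F1 is conventionally the harmonic mean of precision and recall, and the ICPSC row in Table 5 is consistent with this: 2 × 0.9101 × 0.9771 / (0.9101 + 0.9771) ≈ 0.9424. AUC can be read as the probability that a randomly chosen positive pair scores higher than a randomly chosen negative pair. The sketch below computes both in plain Python (ties in AUC count as one half); it is an illustration of the standard definitions, not the evaluation code used in the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def auc(scores_pos, scores_neg):
    """ROC AUC via pairwise comparison of positive and negative scores.
    Assumes both lists are non-empty; O(n * m), so suited to small samples."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


print(round(f1_score(0.9101, 0.9771), 4))
print(auc([0.9, 0.8], [0.1, 0.8]))
```

For larger evaluation sets, a rank-based computation of the same statistic avoids the quadratic pairwise loop.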