Predicting Research Collaboration Based on Translation Model*

doi:10.11925/infotech.2096-3467.2020.0062

Data Analysis and Knowledge Discovery, 2020, Vol. 4, Issue 10: 28-36

Research Paper

Chen Wenjie

Chengdu Library and Information Center, Chinese Academy of Sciences, Chengdu 610041, China


Abstract

[Objective] This paper proposes an improved translation model, TransTopic, to predict research collaboration in the stem cell field, with the aim of promoting communication and collaboration among researchers and maximizing research efficiency. [Methods] TransTopic uniformly maps both the nodes and the edges of a research collaboration network to low-dimensional vectors. First, an LDA topic model extracts the topic distribution features of stem cell papers. Then, a deep autoencoder encodes these topic features into edge vectors, and node vectors are obtained through the translation mechanism. Finally, collaborations are predicted by semantic computation between the vectors. [Results] For link prediction, TransTopic achieved the best AUC (95.21%) and MeanRank (17.48) among the compared models, and its topic prediction accuracy reached 86.52%. [Limitations] The method considers only one-step translation paths and does not fully exploit other author information such as institution, research interests, and publication level. [Conclusions] The translation-based method can effectively predict research collaboration in the stem cell field.

Keywords: Translation Model; Deep Autoencoder; Topic Model; Link Prediction
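The first Methods step, extracting each paper's topic distribution with LDA, can be illustrated with a minimal collapsed Gibbs sampler in plain Python. This is a sketch under stated assumptions, not the author's implementation: the toolkit, number of topics and hyperparameters used in the paper are not given here, so `alpha`, `beta`, `iters`, `seed` and the toy corpus are illustrative choices.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative hyperparameters, not the paper's).

    docs: list of documents, each a list of integer word ids in range(vocab_size).
    Returns theta: one topic distribution (n_topics floats summing to 1) per document.
    """
    rng = random.Random(seed)
    n_dk = [[0] * n_topics for _ in docs]               # tokens in document d assigned to topic k
    n_kw = [[0] * vocab_size for _ in range(n_topics)]  # occurrences of word w assigned to topic k
    n_k = [0] * n_topics                                # tokens assigned to topic k overall
    topics = range(n_topics)

    def update(d, w, k, delta):
        n_dk[d][k] += delta
        n_kw[k][w] += delta
        n_k[k] += delta

    # Start from a random topic for every token
    z = [[rng.randrange(n_topics) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            update(d, w, k, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)  # take the token out of the counts
                # Full conditional p(z = k | all other assignments), up to a constant
                weights = [(n_dk[d][k] + alpha) * (n_kw[k][w] + beta) / (n_k[k] + vocab_size * beta)
                           for k in topics]
                z[d][i] = rng.choices(topics, weights=weights)[0]
                update(d, w, z[d][i], +1)  # put it back under the sampled topic

    # Posterior-mean estimate of each document's topic distribution
    return [[(n_dk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
            for d, doc in enumerate(docs)]


# Hypothetical toy corpus: word ids 0-2 and 3-5 tend to co-occur
docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 1, 4, 5]]
theta = lda_gibbs(docs, n_topics=2, vocab_size=6)
for row in theta:
    print([round(p, 3) for p in row])
```

Each row of `theta` is the kind of per-paper topic vector that the Methods then hand to the deep autoencoder. In practice a maintained library implementation would normally replace this hand-written sampler.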

Received: 2020-01-19; Published: 2020-11-09

CLC number: TP391

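Funding: *This paper is one of the research outcomes of the Chinese Academy of Sciences 13th Five-Year Plan Informatization Fund project "Research Informatization Applications for Knowledge Discovery in the Stem Cell Field" (XXH13506).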

Corresponding author: Chen Wenjie, E-mail: chenwj@clas.ac.cn

Cite this article:

Chen Wenjie. Predicting Research Collaboration Based on Translation Model. Data Analysis and Knowledge Discovery, 2020, 4(10): 28-36.

Link to this article:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2020.0062 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2020/V4/I10/28

Fig. 1 Translation mechanism

Fig. 2 Deep autoencoder

Fig. 3 TransTopic model architecture
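Figs. 2 and 3 place a deep autoencoder between the topic features and the edge vectors. The sketch below is a deliberately shallow stand-in, written in pure Python: one sigmoid hidden layer trained by plain gradient descent on mean squared reconstruction error. The layer size, activation, learning rate, epoch count and input distributions are all assumptions for illustration (a deep version stacks further hidden layers), and the hidden code returned by `encode` plays the role of an edge vector.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TopicAutoencoder:
    """Single-hidden-layer autoencoder: topic distribution -> code (edge vector) -> reconstruction.

    Illustrative stand-in for the paper's deep autoencoder; sizes and training are assumptions.
    """

    def __init__(self, n_topics, dim, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_topics)] for _ in range(dim)]
        self.b1 = [0.0] * dim
        self.W2 = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(n_topics)]
        self.b2 = [0.0] * n_topics

    @staticmethod
    def _layer(W, b, x):
        # Fully connected layer followed by an element-wise sigmoid
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj) for row, bj in zip(W, b)]

    def encode(self, x):
        return self._layer(self.W1, self.b1, x)

    def decode(self, h):
        return self._layer(self.W2, self.b2, h)

    def loss(self, x):
        x_hat = self.decode(self.encode(x))
        return sum((y - t) ** 2 for y, t in zip(x_hat, x)) / len(x)

    def train_step(self, x, lr=0.5):
        """One gradient-descent step on the mean squared reconstruction error for input x."""
        h = self.encode(x)
        x_hat = self.decode(h)
        n = len(x)
        # dLoss/d(output pre-activation); sigmoid'(z) = y * (1 - y)
        d_out = [2.0 * (y - t) / n * y * (1.0 - y) for y, t in zip(x_hat, x)]
        # Back-propagate to the hidden pre-activations before touching any weights
        d_hid = [sum(self.W2[k][j] * d_out[k] for k in range(n)) * h[j] * (1.0 - h[j])
                 for j in range(len(h))]
        for k in range(n):
            for j in range(len(h)):
                self.W2[k][j] -= lr * d_out[k] * h[j]
            self.b2[k] -= lr * d_out[k]
        for j in range(len(h)):
            for i in range(n):
                self.W1[j][i] -= lr * d_hid[j] * x[i]
            self.b1[j] -= lr * d_hid[j]


# Hypothetical 5-topic distributions, e.g. LDA output for four collaborations' joint papers
data = [
    [0.70, 0.10, 0.10, 0.05, 0.05],
    [0.05, 0.70, 0.15, 0.05, 0.05],
    [0.10, 0.05, 0.05, 0.70, 0.10],
    [0.05, 0.10, 0.05, 0.10, 0.70],
]
ae = TopicAutoencoder(n_topics=5, dim=3)
before = sum(ae.loss(x) for x in data) / len(data)
for _ in range(2000):
    for x in data:
        ae.train_step(x)
after = sum(ae.loss(x) for x in data) / len(data)
edge_vector = ae.encode(data[0])  # 3-dimensional code used as this collaboration's edge vector
print(round(before, 4), round(after, 4), [round(v, 3) for v in edge_vector])
```

The resulting `edge_vector` would then act as the relation term l of a TransE/TransNet-style translation u + l ≈ v. Reconstruction loss only forces the code to preserve the topic profile; how the paper couples this loss with the translation objective is not described in this excerpt.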

Table 1 Datasets

Table 2 AUC

Table 3 MeanRank

Table 4 Top 5 ranked collaborations

Table 5 Topic prediction accuracy

d	Net-S	Net-M	Net-L
20	82.17%	80.95%	76.54%
70	86.52%	84.73%	81.65%
120	81.23%	79.24%	73.28%
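Tables 2 and 3 report AUC and MeanRank for link prediction. As a rough guide to how such numbers are computed, here is a minimal sketch that scores a candidate collaboration (u, v) with edge vector l by the TransE/TransNet-style L2 distance ||u + l - v||, where a lower distance means a more plausible link. The AUC follows the usual link-prediction definition (the probability that a held-out true link outscores a non-link, ties counting one half); MeanRank is the average 1-based rank of the true collaborator among all nodes, so lower is better. The paper's distance norm, candidate set and tie handling are not stated here, and all node names and vectors are hypothetical.

```python
import math
from itertools import product


def translation_distance(u, edge, v):
    """L2 norm of u + edge - v: small when the triple fits the translation u + l ≈ v."""
    return math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(u, edge, v)))


def auc(pos_scores, neg_scores):
    """Exhaustive AUC: share of (true link, non-link) pairs where the true link scores higher."""
    total = 0.0
    for p, q in product(pos_scores, neg_scores):
        if p > q:
            total += 1.0
        elif p == q:
            total += 0.5  # ties count as half
    return total / (len(pos_scores) * len(neg_scores))


def mean_rank(test_triples, nodes):
    """Average 1-based rank of the true tail among all nodes, sorted by ascending distance."""
    ranks = []
    for u, edge, v in test_triples:
        d_true = translation_distance(nodes[u], edge, nodes[v])
        # Optimistic ranking: only strictly closer candidates push the true tail down
        closer = sum(1 for vec in nodes.values() if translation_distance(nodes[u], edge, vec) < d_true)
        ranks.append(1 + closer)
    return sum(ranks) / len(ranks)


# Hypothetical 2-D node vectors and one edge vector
nodes = {"A": [0.0, 0.0], "B": [1.0, 0.5], "C": [-1.0, 2.0], "D": [3.0, -2.0]}
edge = [1.0, 0.5]

# Higher score = more likely link, so use the negative distance as the score
pos = [-translation_distance(nodes["A"], edge, nodes[x]) for x in ("B", "C")]  # held-out true links
neg = [-translation_distance(nodes["A"], edge, nodes["D"])]                    # non-link
print(auc(pos, neg))                                          # 1.0
print(mean_rank([("A", edge, "B"), ("A", edge, "C")], nodes))  # 2.0
```

Sampling a fixed number of random (true link, non-link) pairs instead of enumerating all of them is a common way to keep AUC affordable on large networks; the exhaustive loop is used here only because the toy graph is tiny.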

