Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (3): 88-100     https://doi.org/10.11925/infotech.2096-3467.2020.0515
Research Paper
A Prediction Model with Network Representation Learning and Topic Model for Author Collaboration*
Zhang Xin1, Wen Yi1,2, Xu Haiyun1,2
1Chengdu Library and Information Center, Chinese Academy of Sciences, Chengdu 610041, China
2Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
Abstract

[Objective] This paper proposes a method for predicting scientific collaboration that fuses network representation learning with the author-topic model (ATM). [Methods] We first learn embedding vectors for author nodes with classic network representation learning methods and measure the structural similarity of authors by cosine similarity. We then obtain topic vectors for authors with the author-topic model and measure their topic similarity by Hellinger distance. Finally, we fuse the two similarity measures linearly and select the fusion hyperparameter by Bayesian optimization. [Results] In an empirical study on NIPS paper data, the best model after Bayesian parameter selection, node2vec+ATM, reaches an AUC of 0.9271, which is 0.1856 higher than the benchmark model and also better than some existing representation learning models that incorporate external information. [Limitations] Only the content of the authors' articles is considered; further attributes such as author affiliation and geographic location are not incorporated into the model. [Conclusions] By taking both structural and content features into account, the proposed fusion model predicts collaborations better than network representation learning alone.

Keywords: Network Representation Learning; Author Topic Model; Model Fusion; Scientific Collaboration Prediction
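The fusion step described in the abstract can be illustrated with a minimal sketch. It rests on two assumptions that this page does not confirm: the Hellinger distance is turned into a similarity as 1 minus the distance, and the fusion weight alpha multiplies the structural similarity while (1 - alpha) multiplies the topic similarity. The function names and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors (structural similarity)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions; lies in [0, 1]."""
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(total / 2.0)


def fused_similarity(emb_u, emb_v, topics_u, topics_v, alpha):
    """Linear fusion of structural and topic similarity.

    Assumption: topic similarity = 1 - Hellinger distance, and alpha
    weights the structural side. The source only states that the two
    similarities are fused linearly.
    """
    structural = cosine_similarity(emb_u, emb_v)
    topical = 1.0 - hellinger_distance(topics_u, topics_v)
    return alpha * structural + (1.0 - alpha) * topical


# Hypothetical toy data: 3-dimensional embeddings and 2-topic distributions.
score = fused_similarity([1.0, 0.0, 1.0], [1.0, 1.0, 0.0],
                         [0.7, 0.3], [0.6, 0.4], alpha=0.5813)
print(round(score, 4))
```

The value alpha=0.5813 in the usage line is the optimum reported for node2vec+ATM in Table 5; any value in [0, 1] can be passed.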
Received: 2020-06-03      Published: 2020-11-24
CLC Number:  G350
Funding: *National Natural Science Foundation of China (71704170); Informatization Special Project of the Chinese Academy of Sciences (XXH13506-203); Youth Innovation Promotion Association of the Chinese Academy of Sciences (2016159)
Corresponding author: Wen Yi     E-mail: wenyi@clas.ac.cn
Cite this article:
Zhang Xin, Wen Yi, Xu Haiyun. A Prediction Model with Network Representation Learning and Topic Model for Author Collaboration[J]. Data Analysis and Knowledge Discovery, 2021, 5(3): 88-100.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2020.0515      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2021/V5/I3/88
Fig.1  Research method and workflow
Fig.2  Workflow of the DeepWalk algorithm[15]
Fig.3  The SDNE model[19]
Fig.4  Probabilistic graphical representation of the topic models
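| Baseline algorithm | Parameter settings |
| --- | --- |
| LE | Embedding dimension 128 |
| Graph Factorization | Embedding dimension 128 |
| DeepWalk | Embedding dimension 128; 10 random walks started from each node; walk length 80; skip-gram window size 10 |
| LINE | Embedding dimension 128; first-order and second-order proximity combined; 5 negative samples |
| node2vec | Embedding dimension 128; walk parameters p=0.25, q=0.25; 10 random walks started from each node; walk length 80; skip-gram window size 10 |
| SDNE | Embedding dimension 128; 1000 hidden-layer neurons; hyperparameter controlling first-order proximity α=10^-6; hyperparameter for constructing matrix B β=5; autoencoder L1 loss parameter μ1=10^-5; L2 loss parameter μ2=10^-4; batch size 200; learning rate 0.01 |
| TADW | Embedding dimension 128 |
| CANE | Embedding dimension 128 |
Table 1  Baseline representation learning algorithms and their parameter settings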
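To make the node2vec settings in Table 1 concrete, the following is a rough sketch of the second-order biased random walk from the node2vec paper [16], run with p=0.25, q=0.25, 10 walks per node and walk length 80. It is not the authors' implementation: the graph, the function names and the reading of "walk length" as the number of nodes in a walk are assumptions made for illustration.

```python
import random


def node2vec_walk(graph, start, walk_length, p, q, rng):
    """One biased walk on an undirected, unweighted graph.

    graph maps each node to the set of its neighbours. walk_length is
    taken here as the number of nodes in the walk (an assumption).
    """
    walk = [start]
    while len(walk) < walk_length:
        current = walk[-1]
        neighbours = sorted(graph[current])
        if not neighbours:
            break
        if len(walk) == 1:
            # The first step has no previous node, so it is uniform.
            walk.append(rng.choice(neighbours))
            continue
        previous = walk[-2]
        weights = []
        for candidate in neighbours:
            if candidate == previous:
                weights.append(1.0 / p)  # step back to the previous node
            elif candidate in graph[previous]:
                weights.append(1.0)      # stay close to the previous node
            else:
                weights.append(1.0 / q)  # move further away
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


def generate_walks(graph, num_walks, walk_length, p, q, seed=0):
    """Start num_walks walks from every node, in shuffled node order."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(node2vec_walk(graph, node, walk_length, p, q, rng))
    return walks


# Hypothetical co-authorship graph with four authors.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
walks = generate_walks(toy_graph, num_walks=10, walk_length=80, p=0.25, q=0.25)
print(len(walks), walks[0][:8])
```

In the full method the walks would then be fed to a skip-gram model (window size 10 in Table 1) to obtain the 128-dimensional author vectors; that step is left out here.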
Fig.5  Representation learning results of node2vec
| Topic | Label | Representative words |
| --- | --- | --- |
| Topic1 | Model construction | model human feature neuron spike task architecture response neural brain study input visual region population mechanism activity natural level implement |
| Topic2 | Model inference | model inference process Bayesian structure variable gaussian approach latent distribution probabilistic variational tree datum method likelihood generative posterior Markov graphical |
| Topic3 | Data | datum show result large method well scale propose number set achieve high performance dataset paper algorithm parameter require order experiment |
| Topic4 | Pattern recognition | network miss neural image learn deep object train representation layer convolutional recognition information recurrent code different learning noisy model visual |
| Topic5 | Model computation | algorithm problem function method optimization result gradient convex show bound stochastic loss convergence study guarantee learning online set regret rate |
| Topic6 | Classification, clustering and feature extraction | learn feature datum method kernel task approach propose learning label base classification graph problem art clustering dataset cluster metric class |
| Topic7 | Sampling and dimensionality reduction | sample sparse distribution matrix estimate estimation analysis problem estimator point statistical dimensional non provide show high regression low error consider |
| Topic8 | Time-series mining | time state dynamic learn system decision policy optimal search action approach control problem base information reinforcement space user reward value |
Table 2  Research topics extracted by the author-topic model
Fig.6  Author distribution computed by ATM
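| Algorithm | Accuracy (original model) | Accuracy (X+ATM) | Recall (original model) | Recall (X+ATM) | AUC (original model) | AUC (concatenation) | AUC (X+ATM) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ATM | 0.0030 | - | 1.0000 | - | 0.5000 | - | - |
| LE | 0.9666 | 0.9401 | 0.4499 | 0.5948 | 0.7091 | 0.6284 | 0.7681 |
| GF | 0.9960 | 0.9930 | 0.0083 | 0.0469 | 0.5057 | 0.6082 | 0.5214 |
| DeepWalk | 0.9922 | 0.9723 | 0.2238 | 0.5374 | 0.6092 | 0.5693 | 0.7555 |
| LINE | 0.1997 | 0.0599 | 0.9613 | 0.9975 | 0.5794 | 0.6033 | 0.5273 |
| node2vec | 0.9795 | 0.8788 | 0.5021 | 0.8114 | 0.7415 | 0.7350 | 0.8452 |
| SDNE | 0.5434 | 0.4326 | 0.5613 | 0.7265 | 0.5523 | 0.5523 | 0.5791 |
| TADW | 0.9776 | - | 0.3751 | - | 0.6773 | - | - |
| CANE | 0.9925 | - | 0.3281 | - | 0.6614 | - | - |
Table 3  Results of the fusion models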
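The three measures reported in Table 3 can be computed as in the sketch below, which uses the standard textbook definitions of accuracy, recall and rank-based AUC. How exactly the paper thresholds scores or computes AUC is not stated on this page, so the threshold, the labels and the scores here are hypothetical.

```python
def accuracy_and_recall(labels, predictions):
    """Accuracy and recall for binary labels and binary predictions."""
    pairs = list(zip(labels, predictions))
    true_pos = sum(1 for y, y_hat in pairs if y == 1 and y_hat == 1)
    true_neg = sum(1 for y, y_hat in pairs if y == 0 and y_hat == 0)
    positives = sum(1 for y in labels if y == 1)
    accuracy = (true_pos + true_neg) / len(labels)
    recall = true_pos / positives if positives else 0.0
    return accuracy, recall


def auc(labels, scores):
    """Probability that a random positive pair outscores a random
    negative pair; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for s_pos in pos:
        for s_neg in neg:
            if s_pos > s_neg:
                wins += 1.0
            elif s_pos == s_neg:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical author pairs: 1 = collaborated, 0 = did not.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical threshold

acc, rec = accuracy_and_recall(labels, predictions)
print(acc, rec, auc(labels, scores))
```

Giving every pair the same score and labelling it positive yields recall 1.0, accuracy equal to the share of positive pairs, and AUC 0.5. The ATM row of Table 3 shows the same pattern, although the source does not say that this is the cause.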
| Iteration | α | AUC |
| --- | --- | --- |
| 1 | 0.4170 | 0.8628 |
| 2 | 0.7203 | 0.8817 |
| 3 | 0.0001 | 0.5000 |
| 4 | 0.3023 | 0.7776 |
| 5 | 1.0000 | 0.8097 |
| 6 | 0.5774 | 0.9260 |
| 7 | 0.5992 | 0.9242 |
| 8 | 0.5602 | 0.9221 |
| 9 | 0.5927 | 0.9229 |
| 10 | 0.5813 | 0.9271 |
| 11 | 0.5805 | 0.9268 |
| 12 | 0.5802 | 0.9266 |
| 13 | 0.5788 | 0.9263 |
| 14 | 0.5784 | 0.9262 |
Table 4  Bayesian optimization parameter selection process for the node2vec+ATM model
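Table 4 records a one-dimensional search for the fusion parameter α that maximizes AUC. The sketch below shows the same kind of search with a plain coarse-to-fine grid search, which is a simpler stand-in and not Bayesian optimization. The objective `toy_auc` is a hypothetical smooth curve peaking near 0.58; in practice it would be the AUC of the fused scores on held-out author pairs.

```python
def coarse_to_fine_search(objective, low=0.0, high=1.0, points=11, rounds=3):
    """Maximise objective(alpha) on [low, high] by refining a grid.

    Each round evaluates an evenly spaced grid, then narrows the
    interval to one grid step on either side of the best alpha so far.
    """
    lower_bound, upper_bound = low, high
    best_alpha, best_value = None, float("-inf")
    for _ in range(rounds):
        step = (high - low) / (points - 1)
        for i in range(points):
            alpha = low + i * step
            value = objective(alpha)
            if value > best_value:
                best_alpha, best_value = alpha, value
        low = max(lower_bound, best_alpha - step)
        high = min(upper_bound, best_alpha + step)
    return best_alpha, best_value


def toy_auc(alpha):
    """Hypothetical objective, used only to exercise the search."""
    return 0.9 - (alpha - 0.58) ** 2


best_alpha, best_value = coarse_to_fine_search(toy_auc)
print(round(best_alpha, 3), round(best_value, 4))
```

With the defaults this search calls the objective 33 times, whereas Table 4 lists 14 evaluations. Keeping the number of expensive evaluations low is the usual motivation for Bayesian optimization [31].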
Fig.7  Bayesian optimization results
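| Model | Optimal fusion parameter | AUC |
| --- | --- | --- |
| LE+ATM | 0.4274 | 0.8309 |
| Graph Factorization+ATM | 0.2196 | 0.6766 |
| DeepWalk+ATM | 0.3449 | 0.8354 |
| LINE+ATM | 0.9999 | 0.5775 |
| node2vec+ATM | 0.5813 | 0.9271 |
| SDNE+ATM | 0.3592 | 0.5555 |
Table 5  Optimal fusion parameter values of the fusion models and the corresponding AUC values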
| Predicted collaborator (rank) | Yoshua Bengio | Geoffrey E. Hinton | Yann LeCun |
| --- | --- | --- | --- |
| 1 | Pascal Vincent | Sam T. Roweis | John S. Denker |
| 2 | David S. Touretzky | Christopher K. I. Williams | Rob Fergus |
| 3 | Samy Bengio | Richard S. Zemel | Corinna Cortes |
| 4 | Yann LeCun | Max Welling | Vladimir Vapnik |
| 5 | Ruslan R. Salakhutdinov | Ilya Sutskever | Yoshua Bengio |
| 6 | Mitsuo Kawato | Ruslan R. Salakhutdinov | Alex Waibel |
| 7 | John S. Denker | Brendan J. Frey | Bartlett W. Mel |
| 8 | Ilya Sutskever | Peter Dayan | Andrew Zisserman |
| 9 | Geoffrey E. Hinton | Yee W. Teh | Christof Koch |
| 10 | Yoram Singer | Lawrence K. Saul | Tomaso Poggio |
Table 6  Author collaboration prediction results
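A ranked list like the one in Table 6 can be produced by scoring every candidate against a target author and keeping the ten highest scores. The sketch below assumes a pairwise score function is already available. The author labels, the scores and the optional `exclude` argument are hypothetical, and the source does not say whether existing co-authors were filtered out.

```python
import heapq


def top_k_collaborators(target, candidates, score, k=10, exclude=()):
    """Return the k candidates with the highest score against target.

    score is any callable (author_a, author_b) -> float, for example a
    fused structural and topic similarity.
    """
    pool = [c for c in candidates if c != target and c not in exclude]
    return heapq.nlargest(k, pool, key=lambda c: score(target, c))


# Hypothetical symmetric similarity scores between four authors.
toy_scores = {("A", "B"): 0.9, ("A", "C"): 0.2, ("A", "D"): 0.7}


def toy_score(u, v):
    return toy_scores.get((u, v), toy_scores.get((v, u), 0.0))


ranking = top_k_collaborators("A", ["A", "B", "C", "D"], toy_score, k=2)
print(ranking)
```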
[1] Newman M E J. Coauthorship Networks and Patterns of Scientific Collaboration[J]. Proceedings of the National Academy of Sciences of the United States of America, 2004,101(S1):5200-5205.
[2] Liben‐Nowell D, Kleinberg J. The Link‐Prediction Problem for Social Networks[J]. Journal of the American Society for Information Science and Technology, 2007,58(7):1019-1031.
[3] Lv Linyuan. Link Prediction in Complex Networks[J]. Journal of University of Electronic Science and Technology of China, 2010,39(5):651-661. (in Chinese)
[4] Guns R, Rousseau R. Recommending Research Collaborations Using Link Prediction and Random Forest Classifiers[J]. Scientometrics, 2014,101(2):1461-1473.
[5] Yan E, Guns R. Predicting and Recommending Collaborations: An Author-, Institution-, and Country-Level Analysis[J]. Journal of Informetrics, 2014,8(2):295-309.
[6] Wang Zhibing, Han Wenmin, Sun Zhumei, et al. Research on Scientific Collaboration Prediction Based on the Combination of Network Topology and Node Attributes[J]. Information Studies: Theory & Application, 2019,42(8):116-120, 109. (in Chinese)
[7] Shan Songyan, Wu Zhenxin. Review on the Author Similarity Algorithm in the Field of Author Name Disambiguation and Research Collaboration Prediction[J]. Journal of Northeast Normal University (Natural Science Edition), 2019,51(2):71-80. (in Chinese)
[8] Zhang Jinzhu, Yu Wenqian, Liu Jingjie, et al. Predicting Research Collaborations Based on Network Embedding[J]. Journal of the China Society for Scientific and Technical Information, 2018,37(2):132-139. (in Chinese)
[9] Yu Chuanming, Lin Aochen, Zhong Yunci, et al. Scientific Collaboration Recommendation Based on Network Embedding[J]. Journal of the China Society for Scientific and Technical Information, 2019,38(5):500-511. (in Chinese)
[10] Balasubramanian M, Schwartz E L. The Isomap Algorithm and Topological Stability[J]. Science, 2002,295(5552):7.
[11] Roweis S T, Saul L K. Nonlinear Dimensionality Reduction by Locally Linear Embedding[J]. Science, 2000,290(5500):2323-2326.
[12] Belkin M, Niyogi P. Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering[C]// Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic. 2002: 585-591.
[13] Chen M, Yang Q, Tang X O. Directed Graph Embedding[C]// Proceedings of the 20th International Joint Conference on Artificial Intelligence. 2007: 2707-2712.
[14] Ahmed A, Shervashidze N, Narayanamurthy S, et al. Distributed Large-Scale Natural Graph Factorization[C]// Proceedings of the 22nd International Conference on World Wide Web. ACM, 2013: 37-48.
[15] Perozzi B, Al-Rfou R, Skiena S. DeepWalk: Online Learning of Social Representations[C]// Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2014: 701-710.
[16] Grover A, Leskovec J. node2vec: Scalable Feature Learning for Networks[C]// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016: 855-864.
[17] Cao S S, Lu W, Xu Q K. GraRep: Learning Graph Representations with Global Structural Information[C]// Proceedings of the 24th ACM International Conference on Information and Knowledge Management. ACM, 2015: 891-900.
[18] Tang J, Qu M, Wang M Z, et al. LINE: Large-Scale Information Network Embedding[C]// Proceedings of the 24th International Conference on World Wide Web. 2015: 1067-1077.
[19] Wang D X, Cui P, Zhu W W. Structural Deep Network Embedding[C]// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016: 1225-1234.
[20] Ou M D, Cui P, Pei J, et al. Asymmetric Transitivity Preserving Graph Embedding[C]// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016: 1105-1114.
[21] Kipf T N, Welling M. Variational Graph Auto-Encoders[OL]. arXiv Preprint, arXiv: 1611.07308,2016.
[22] Wang H W, Wang J, Wang J L, et al. GraphGAN: Graph Representation Learning with Generative Adversarial Nets[J]. IEEE Transactions on Knowledge and Data Engineering, DOI:10.1109/TKDE.2019.2961882.
[23] Yang C, Liu Z Y, Zhao D L, et al. Network Representation Learning with Rich Text Information[C]// Proceedings of the 24th International Conference on Artificial Intelligence. 2015: 2111-2117.
[24] Sun X F, Guo J, Ding X, et al. A General Framework for Content-Enhanced Network Representation Learning[OL]. arXiv Preprint, arXiv: 1610.02906,2016.
[25] Tu C C, Liu H, Liu Z Y, et al. CANE: Context-Aware Network Embedding for Relation Modeling[C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. 2017: 1722-1731.
[26] Lerer A, Wu L, Shen J J, et al. PyTorch-BigGraph: A Large-scale Graph Embedding System[C]// Proceedings of the Conference on Systems and Machine Learning. 2019.
[27] Fey M, Lenssen J E. Fast Graph Representation Learning with PyTorch Geometric[OL]. arXiv Preprint, arXiv: 1903.02428,2019.
[28] Zhu Z C, Xu S Z, Tang J, et al. GraphVite: A High-Performance CPU-GPU Hybrid System for Node Embedding[C]// Proceedings of the World Wide Web Conference. ACM, 2019: 2494-2504.
[29] Blei D M, Ng A Y, Jordan M I. Latent Dirichlet Allocation[J]. Journal of Machine Learning Research, 2003,3:993-1022.
[30] Rosen-Zvi M, Griffiths T, Steyvers M, et al. The Author-Topic Model for Authors and Documents[C]// Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence. AUAI Press, 2004: 487-494.
[31] Snoek J, Larochelle H, Adams R P. Practical Bayesian Optimization of Machine Learning Algorithms[C]// Proceedings of the 25th International Conference on Neural Information Processing Systems. 2012: 2951-2959.
[32] LeCun Y, Bengio Y, Hinton G. Deep Learning[J]. Nature, 2015,521(7553):436.
doi: 10.1038/nature14539 pmid: 26017442
[33] Zhang J, Dong Y X, Wang Y, et al. ProNE: Fast and Scalable Network Representation Learning[C]// Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI'19). 2019: 4278-4284.
[34] Qiu J Z, Dong Y X, Ma H, et al. NetSMF: Large-Scale Network Embedding as Sparse Matrix Factorization[C]// Proceedings of the World Wide Web Conference. ACM, 2019: 1509-1520.