Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (9): 56-67     https://doi.org/10.11925/infotech.2096-3467.2020.0531
Research Paper
Predicting Citations Based on Graph Convolution Embedding and Feature Cross: Case Study of Transportation Research*
Zhang Sifan (1), Niu Zhendong (1, 2), Lu Hao (1), Zhu Yifan (1), Wang Rongrong (1)
(1) School of Computer, Beijing Institute of Technology, Beijing 100081, China
(2) Beijing Institute of Technology Library, Beijing 100081, China
Abstract

[Objective] This paper proposes a citation count prediction model for scholarly articles, intended to help identify potential research hot spots and improve journal editorial work. [Methods] The model considers each article's keywords, authors, institutions, countries, and citation counts. Graph convolution extracts features from these factors, and a recurrent neural network with an attention mechanism mines the temporal information in the citation counts together with the most important article features. [Results] The model was validated on transportation articles from the Web of Science Core Collection. Compared with the benchmark models, the maximum improvements in RMSE and MAE reached 15.23% and 16.91%, respectively. [Limitations] The pre-training step performs multiple graph convolutions, which gives the algorithm a high time complexity. [Conclusions] The proposed model fully integrates the features of each article and substantially improves prediction performance.

Keywords: Citation Prediction; Graph Convolution; Feature Cross
Received: 2020-06-08      Published: 2020-10-14
Chinese Library Classification (ZTFLH): TP393
Funding: *This work is one of the research outcomes of the National Key R&D Program of China project "Professional Content Knowledge Aggregation Service Technology R&D and Innovative Service Demonstration" (2019YFB1406302).
Corresponding author: Niu Zhendong     E-mail: zniu@bit.edu.cn
Cite this article:
Zhang Sifan, Niu Zhendong, Lu Hao, Zhu Yifan, Wang Rongrong. Predicting Citations Based on Graph Convolution Embedding and Feature Cross: Case Study of Transportation Research. Data Analysis and Knowledge Discovery, 2020, 4(9): 56-67.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2020.0531      or      http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2020/V4/I9/56
Fig.1  Keyword and author co-occurrence data
Category  2008  2009  2010  2011  2012  2013  2014  2015  2016  2017  2018
K          769   870   917  1009  1088   624   703   528   349   453   325
A          646   677   832   965  1040  1144  1314   561   731   740   777
I         1382  1540  1616  1768  1711   714   776   877   474   422   355
C           66    62    59    59    65    68    69    75    76    80    79
Table 1  Number of nodes in the keyword (K), author (A), institution (I), and country (C) networks
Data (input)                 Label
2008-2012 K/A/I/C/count      2013 count
2009-2013 K/A/I/C/count      2014 count
2010-2014 K/A/I/C/count      2015 count
2011-2015 K/A/I/C/count      2016 count
2012-2016 K/A/I/C/count      2017 count
Table 2  Input data and labels for citation count prediction
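Table 2 pairs each five-year window of network and citation data with the citation count of the following year. A minimal sketch of how such sliding windows could be enumerated is given below; the function name `build_windows` and its parameters are hypothetical and are not taken from the paper.

```python
def build_windows(first_year=2008, last_label_year=2017, window=5):
    """Return (input_years, label_year) pairs for a sliding window.

    Each sample uses `window` consecutive years as input and the year
    immediately after them as the label, mirroring the rows of Table 2.
    """
    samples = []
    # The last window must end one year before the last label year.
    for start in range(first_year, last_label_year - window + 1):
        input_years = list(range(start, start + window))
        label_year = start + window
        samples.append((input_years, label_year))
    return samples


samples = build_windows()
for input_years, label_year in samples:
    # K/A/I/C are the keyword, author, institution, and country networks.
    print(f"{input_years[0]}-{input_years[-1]} K/A/I/C/count -> {label_year} count")
```

With the default arguments this yields five samples, one per row of Table 2. Shifting the window by one year at a time reuses overlapping years across samples, which keeps the number of training samples reasonable despite the short time span.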
Fig.2  Overall architecture of the prediction model
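The abstract states that graph convolution is used to extract features from the keyword, author, institution, and country networks, but this page does not give the exact formulation. As an illustration only, the sketch below implements one commonly used graph-convolution step with symmetric normalization and self-loops. That variant, the toy graph, and the weights are all assumptions, and the code is written in dependency-free Python rather than with a deep learning framework.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, features, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    This normalization is an assumed, widely used variant; the paper's
    own layer definition is not shown on this page.
    """
    n = len(adj)
    # Add self-loops so every node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # always >= 1 because of the self-loops
    # Symmetric normalization: a_norm[i][j] = a_hat[i][j] / sqrt(deg[i] * deg[j]).
    a_norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    hidden = matmul(matmul(a_norm, features), weight)
    return [[max(v, 0.0) for v in row] for row in hidden]  # ReLU


# Hypothetical toy co-occurrence graph: three nodes on a path 0 - 1 - 2.
adj = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]
features = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]  # one-hot node features
weight = [[1.0, 0.0],
          [0.0, 1.0],
          [1.0, 1.0]]         # illustrative 3 -> 2 projection
embeddings = gcn_layer(adj, features, weight)
print(embeddings)
```

Each row of `embeddings` is a two-dimensional vector for one node that mixes the node's own features with those of its neighbors. Stacking several such layers widens the neighborhood each node sees, which is consistent with the time cost of repeated graph convolutions noted under Limitations.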
Model           RMSE    MAE     R-Squared
AVR             326.27  340.40  0.7315
GMM             293.25  312.73  0.8053
NNCP            279.42  282.36  0.8515
Proposed model  276.58  282.73  0.8679
Table 3  Comparison of experimental results across models
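For reference, the three metrics in Table 3 can be computed as in the sketch below, which uses the standard textbook definitions of RMSE, MAE, and R-Squared. The helper `relative_reduction` is a hypothetical name. It is applied here on the assumption that the "maximum improvement" quoted in the abstract is measured against the weakest baseline in the table (AVR).

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_reduction(baseline, proposed):
    """Percentage by which an error metric drops relative to a baseline."""
    return 100.0 * (baseline - proposed) / baseline


# (RMSE, MAE) values copied from Table 3.
table3 = {"AVR": (326.27, 340.40), "GMM": (293.25, 312.73), "NNCP": (279.42, 282.36)}
proposed_rmse, proposed_mae = 276.58, 282.73

rmse_gain = relative_reduction(table3["AVR"][0], proposed_rmse)
mae_gain = relative_reduction(table3["AVR"][1], proposed_mae)
print(f"RMSE reduction vs AVR: {rmse_gain:.2f}%")
print(f"MAE reduction vs AVR: {mae_gain:.2f}%")
```

Under that assumption the RMSE reduction evaluates to about 15.23%, matching the abstract, and the MAE reduction to about 16.94%, close to the quoted 16.91%.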
Fig.3  Evaluation metrics as a function of embedding vector dimension
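Model modification       RMSE    MAE     R-Squared
Cross network removed    325.74  340.25  0.7509
GRU layer replaced       307.24  320.54  0.8163
Attention layer removed  289.53  302.76  0.8424
Proposed method          276.58  282.73  0.8679
Table 4  Comparison of the contribution of each module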
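Table 4 shows the largest degradation when the cross network is removed. This page does not define that network, so the sketch below is only an illustration of one well-known way to build explicit feature crosses, a Deep & Cross Network style layer computing x_next = x0 * (x_l . w) + b + x_l. Treating the paper's cross network as this layer is an assumption, and all vectors below are made-up toy values.

```python
def cross_layer(x0, xl, w, b):
    """One cross layer: x_next = x0 * (xl . w) + b + xl.

    x0 is the original input vector, xl the output of the previous
    cross layer (equal to x0 for the first layer), and w, b are the
    layer's weight and bias vectors, all of the same length.
    """
    scale = sum(xi * wi for xi, wi in zip(xl, w))  # scalar inner product xl . w
    return [x0_i * scale + b_i + xl_i for x0_i, b_i, xl_i in zip(x0, b, xl)]


# Hypothetical concatenated article features and illustrative parameters.
x0 = [1.0, 2.0, 3.0]
w = [0.1, 0.2, 0.3]
b = [0.0, 0.0, 0.0]

x1 = cross_layer(x0, x0, w, b)  # first layer: second-order crosses
x2 = cross_layer(x0, x1, w, b)  # second layer: up to third-order crosses
print(x1)
print(x2)
```

Each additional layer raises the highest order of feature interaction by one while adding only two parameter vectors, and the `+ xl` residual term keeps the lower-order terms available to later layers.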