Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (9): 56-67    DOI: 10.11925/infotech.2096-3467.2020.0531
Predicting Citations Based on Graph Convolution Embedding and Feature Cross: Case Study of Transportation Research
Zhang Sifan¹, Niu Zhendong¹,², Lu Hao¹, Zhu Yifan¹, Wang Rongrong¹
¹ School of Computer, Beijing Institute of Technology, Beijing 100081, China
² Beijing Institute of Technology Library, Beijing 100081, China
Abstract  

[Objective] This paper proposes a citation prediction model for scholarly articles, which can help identify potential research hot spots and optimize journal editing. [Methods] First, we used graph convolution to extract literature features, including keywords, authors, institutions, countries, and citations. Then, we used a recurrent neural network and an attention model to capture the time-series information of citations and the other features. [Results] We evaluated the proposed model on transportation articles from core journals indexed by the Web of Science. Compared with the benchmark models, the maximum improvements in RMSE and MAE were 15.23% and 16.91%, respectively. [Limitations] At the pre-training stage, the model applies multiple graph convolutions, which is very time-consuming. [Conclusions] The proposed model, which fully integrates literature features, can effectively predict article citations.
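
The graph convolution step named in the Methods can be illustrated with a minimal sketch. It assumes the widely used normalized propagation rule ReLU(D^-1/2 (A + I) D^-1/2 X W); the abstract does not state which variant the authors used, so this may differ from their implementation. The toy co-occurrence graph, the one-hot features, the weight values, and the names `matmul` and `gcn_layer` are all hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, features, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Symmetric normalization by the degree of both endpoints.
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate neighbor features, apply the linear map, then ReLU.
    hidden = matmul(matmul(norm, features), weights)
    return [[max(0.0, v) for v in row] for row in hidden]


# Hypothetical toy graph: three keywords, keyword 1 co-occurs with keywords 0 and 2.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
one_hot = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
weights = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]]  # arbitrary, untrained values

embeddings = gcn_layer(adj, one_hot, weights)
for node, vec in enumerate(embeddings):
    print(node, [round(v, 3) for v in vec])
```

Each output row is a two-dimensional embedding that mixes a node's own features with those of its neighbors. Stacking several such layers widens the neighborhood each node sees, which is one plausible reason the pre-training stage is reported as time-consuming.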

Key words: Citation Prediction; Graph Convolution; Feature Cross
Received: 08 June 2020      Published: 14 October 2020
Chinese Library Classification (ZTFLH): TP393
Corresponding Authors: Niu Zhendong     E-mail: zniu@bit.edu.cn

Cite this article:

Zhang Sifan, Niu Zhendong, Lu Hao, Zhu Yifan, Wang Rongrong. Predicting Citations Based on Graph Convolution Embedding and Feature Cross: Case Study of Transportation Research. Data Analysis and Knowledge Discovery, 2020, 4(9): 56-67.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2020.0531     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2020/V4/I9/56

[Figure: Co-occurrence Data of Keywords and Authors]
Category  2008  2009  2010  2011  2012  2013  2014  2015  2016  2017  2018
K          769   870   917  1009  1088   624   703   528   349   453   325
A          646   677   832   965  1040  1144  1314   561   731   740   777
I         1382  1540  1616  1768  1711   714   776   877   474   422   355
C           66    62    59    59    65    68    69    75    76    80    79
Network Nodes of Keywords, Authors, Institutions and Countries (K = keywords, A = authors, I = institutions, C = countries)
Data                       Label
2008-2012 K/A/I/C/count    2013 count
2009-2013 K/A/I/C/count    2014 count
2010-2014 K/A/I/C/count    2015 count
2011-2015 K/A/I/C/count    2016 count
2012-2016 K/A/I/C/count    2017 count
Citation Prediction Data and Labels
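
The data/label layout above is a sliding window: five consecutive years of features predict the citation count of the following year. A minimal sketch of how such pairs could be enumerated follows. The function name `make_windows` and its parameters are hypothetical, and the actual feature tensors (K/A/I/C embeddings and counts) are left out.

```python
def make_windows(first_year, last_label_year, window=5):
    """Return (input_years, label_year) pairs for a sliding window."""
    samples = []
    for start in range(first_year, last_label_year - window + 1):
        input_years = list(range(start, start + window))
        # The label is the year immediately after the input window.
        samples.append((input_years, start + window))
    return samples


samples = make_windows(2008, 2017)
for years, label in samples:
    print(f"{years[0]}-{years[-1]} K/A/I/C/count -> {label} count")
```

With a first year of 2008 and a last label year of 2017, this yields the five input/label pairs listed in the table.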
[Figure: The Architecture of Model]
Model            RMSE     MAE      R-Squared
AVR              326.27   340.40   0.7315
GMM              293.25   312.73   0.8053
NNCP             279.42   282.36   0.8515
Proposed model   276.58   282.73   0.8679
Experimental Results of Different Models
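
For readers unfamiliar with the three reported indicators, the sketch below computes them from their textbook definitions. It assumes the classical coefficient of determination for R-Squared; the page does not say which exact formulation the authors used. The function names and the toy values are made up for illustration and are not data from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Classical coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up citation counts, for illustration only.
y_true = [10, 20, 30, 40]
y_pred = [12, 18, 33, 37]

print("RMSE:", round(rmse(y_true, y_pred), 4))
print("MAE:", round(mae(y_true, y_pred), 4))
print("R-Squared:", round(r_squared(y_true, y_pred), 4))
```

Lower RMSE and MAE indicate smaller prediction errors, while an R-Squared closer to 1 indicates that the predictions explain more of the variance in the observed counts.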
[Figure: Changes of Various Indicators Caused by Different Vector Dimensions]
Model modification        RMSE     MAE      R-Squared
Without cross network     325.74   340.25   0.7509
GRU layer replaced        307.24   320.54   0.8163
Without attention layer   289.53   302.76   0.8424
Proposed method           276.58   282.73   0.8679
Comparison of Different Modules