Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (9): 56-67    DOI: 10.11925/infotech.2096-3467.2020.0531
Predicting Citations Based on Graph Convolution Embedding and Feature Cross: Case Study of Transportation Research
Zhang Sifan¹, Niu Zhendong¹,², Lu Hao¹, Zhu Yifan¹, Wang Rongrong¹
¹School of Computer, Beijing Institute of Technology, Beijing 100081, China
²Beijing Institute of Technology Library, Beijing 100081, China
[Objective] This paper proposes a citation prediction model for scholarly articles, which can help identify potential research hot spots and optimize journal editing. [Methods] First, we used graph convolution to extract literature features, including keywords, authors, institutions, countries, and citations. Then, we used a recurrent neural network and an attention model to capture the time-series information in citations and the other features. [Results] We evaluated the proposed model on transportation articles from core journals indexed by the Web of Science. Compared with the benchmark models, the proposed method improved RMSE and MAE by up to 15.23% and 16.91%, respectively. [Limitations] At the pre-training stage, the model applies multiple graph convolutions, which is time consuming. [Conclusions] The proposed model, which fully integrates literature features, can effectively predict article citations.

Key words: Citation Prediction; Graph Convolution; Feature Cross
Received: 08 June 2020      Published: 14 October 2020
Corresponding Author: Niu Zhendong

Zhang Sifan, Niu Zhendong, Lu Hao, Zhu Yifan, Wang Rongrong. Predicting Citations Based on Graph Convolution Embedding and Feature Cross: Case Study of Transportation Research. Data Analysis and Knowledge Discovery, 2020, 4(9): 56-67.

Co-occurrence Data of Keywords and Authors
Category  2008  2009  2010  2011  2012  2013  2014  2015  2016  2017  2018
K          769   870   917  1009  1088   624   703   528   349   453   325
A          646   677   832   965  1040  1144  1314   561   731   740   777
I         1382  1540  1616  1768  1711   714   776   877   474   422   355
C           66    62    59    59    65    68    69    75    76    80    79
Network Nodes of Keywords, Authors, Institutions and Countries
Data                       Label
2008-2012 K/A/I/C/count    2013 count
2009-2013 K/A/I/C/count    2014 count
2010-2014 K/A/I/C/count    2015 count
2011-2015 K/A/I/C/count    2016 count
2012-2016 K/A/I/C/count    2017 count
Citation Prediction Data and Labels
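The table above pairs each five-year span of features (K/A/I/C/count) with the citation count of the following year. A minimal Python sketch of that windowing is given here for illustration; the helper name `make_windows` and its parameters are hypothetical, the window length of five is read off the table, and this is not the authors' code.

```python
def make_windows(first_year, last_label_year, window=5):
    """Pair each `window`-year feature span with the year that follows it.

    Hypothetical helper: returns a list of ((start, end), label_year) tuples.
    """
    pairs = []
    # The last usable span must end one year before the last label year.
    for start in range(first_year, last_label_year - window + 1):
        end = start + window - 1
        pairs.append(((start, end), end + 1))
    return pairs


# Spans starting in 2008 with labels up to 2017, as in the table.
pairs = make_windows(2008, 2017)
for (start, end), label in pairs:
    print(f"{start}-{end} K/A/I/C/count -> {label} count")
```

Each tuple corresponds to one row of the table: the features come from the span, and the label is the citation count of the year after the span ends.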
The Architecture of the Model
Model       RMSE    MAE     R-Squared
AVR         326.27  340.40  0.7315
GMM         293.25  312.73  0.8053
NNCP        279.42  282.36  0.8515
Our model   276.58  282.73  0.8679
Experimental Results of Different Models
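The three indicators reported in the table are not defined on this page. The sketch below assumes the standard textbook definitions of RMSE, MAE and R-squared; the function names and the toy values are invented for illustration and are not taken from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error (standard definition, assumed here)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error (standard definition, assumed here)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy citation counts, invented for illustration only.
y_true = [10, 20, 30, 40]
y_pred = [12, 18, 33, 37]
print(rmse(y_true, y_pred), mae(y_true, y_pred), r_squared(y_true, y_pred))
```

Lower RMSE and MAE and a higher R-squared indicate a better fit under these definitions.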
Changes of Various Indicators Caused by Different Vector Dimensions
Model modification        RMSE    MAE     R-Squared
Without cross network     325.74  340.25  0.7509
GRU layer replaced        307.24  320.54  0.8163
Without attention layer   289.53  302.76  0.8424
Our method                276.58  282.73  0.8679
Comparison of Different Modules
