Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (8): 15-27    DOI: 10.11925/infotech.2096-3467.2020.0384
Author Name Disambiguation Techniques for Academic Literature: A Review
Shen Zhe¹, Wang Yi¹, Yao Yifan¹, Cheng Ying¹,²
1School of Information Management, Nanjing University, Nanjing 210023, China
2School of Chinese Language and Literature, Shandong Normal University, Jinan 250014, China
Abstract: [Objective] This paper reviews research on author name disambiguation (AND) techniques for academic literature, aiming to provide a reference for future studies. [Coverage] A total of 51 papers published between January 1, 2016 and March 28, 2020 were retrieved from Web of Science, Google Scholar, CNKI, and the Wanfang Database. [Methods] First, we organized the findings of these papers according to the stages of the author name disambiguation process. Then, we summarized the techniques used for feature extraction, feature representation, model training, and prediction. Finally, we discussed the common issues facing this research from multiple perspectives. [Results] Graph-based and probabilistic methods, as well as hybrid feature representation models, improved the computation of complex network features. The efficiency and generalization ability of machine-learning models still need to be optimized for disambiguation in large-scale databases and for incremental disambiguation. Most studies did not address issues such as imbalanced training data, missing feature data, and the same author publishing under different names. [Limitations] Because the studies used different empirical data, we did not quantitatively compare the methods. [Conclusions] We suggest multi-source data fusion, user intervention, and pre-trained models as ways to improve author name disambiguation.
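The abstract describes AND as a pipeline of feature extraction, feature representation, model training, and prediction. The following is a minimal, unsupervised sketch of that pipeline, not a method taken from any of the reviewed papers. It assumes records are first blocked by normalized name string, represents each record by its coauthors, venue, and title words, scores record pairs with a weighted Jaccard similarity, and treats linked pairs as a graph whose connected components are the inferred authors. The `Record` schema, the feature weights, and the `threshold` value are hypothetical choices for illustration. A supervised approach would instead learn the pairwise scorer from labeled data.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Record:
    """One author-name occurrence on one paper (hypothetical schema).
    paper_id is assumed to be unique across all records."""
    paper_id: str
    name: str
    coauthors: set[str] = field(default_factory=set)
    venue: str = ""
    title_words: set[str] = field(default_factory=set)


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(r1: Record, r2: Record) -> float:
    """Weighted pairwise similarity; the weights are illustrative assumptions."""
    same_venue = 1.0 if r1.venue and r1.venue == r2.venue else 0.0
    return (0.5 * jaccard(r1.coauthors, r2.coauthors)
            + 0.2 * same_venue
            + 0.3 * jaccard(r1.title_words, r2.title_words))


def disambiguate(records: list[Record], threshold: float = 0.3) -> list[set[str]]:
    """Block records by lower-cased name, link pairs within a block whose
    similarity reaches the threshold, and return the connected components
    (one set of paper_ids per inferred real-world author)."""
    parent = {r.paper_id: r.paper_id for r in records}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Blocking: only records sharing a normalized name are compared.
    blocks: dict[str, list[Record]] = {}
    for r in records:
        blocks.setdefault(r.name.lower(), []).append(r)

    for block in blocks.values():
        for r1, r2 in combinations(block, 2):
            if similarity(r1, r2) >= threshold:
                parent[find(r1.paper_id)] = find(r2.paper_id)

    clusters: dict[str, set[str]] = {}
    for r in records:
        clusters.setdefault(find(r.paper_id), set()).add(r.paper_id)
    return list(clusters.values())


if __name__ == "__main__":
    # Hypothetical records: p1 and p2 share a coauthor, venue, and a title word.
    papers = [
        Record("p1", "Wang Yi", {"Li Na", "Zhang Wei"}, "DAKD", {"name", "disambiguation"}),
        Record("p2", "Wang Yi", {"Li Na"}, "DAKD", {"author", "disambiguation"}),
        Record("p3", "Wang Yi", {"Chen Jie"}, "Phys. Rev. B", {"superconductivity"}),
    ]
    print(disambiguate(papers))
```

Pairwise comparison within a block is quadratic in block size, which is one reason the abstract highlights efficiency for large databases. Incremental disambiguation would instead score each new record against existing clusters rather than re-clustering everything.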

Key words: Author Name Disambiguation; Name Ambiguity; Same Name Disambiguation; Literature Database
Received: 05 May 2020      Published: 05 June 2020
Corresponding author: Cheng Ying

Shen Zhe, Wang Yi, Yao Yifan, Cheng Ying. Author Name Disambiguation Techniques for Academic Literature: A Review. Data Analysis and Knowledge Discovery, 2020, 4(8): 15-27.

Figure: General Framework for AND
Figure: Frequency of Features Used in AND
