Abstract: [Objective] This paper reviews research on author name disambiguation techniques for academic literature, aiming to provide a reference for future studies. [Coverage] A total of 51 papers published between January 1, 2016 and March 28, 2020 were retrieved from the Web of Science, Google Scholar, CNKI, and Wanfang Database. [Methods] First, we organized the findings of these papers according to the process of author name disambiguation. Then, we summarized the techniques used for feature extraction, feature representation, model training, and prediction. Finally, we discussed common issues facing this research from multiple dimensions. [Results] Graph-based and probabilistic methods, as well as hybrid feature representation models, improved the calculation of complicated network features. The efficiency and generalization ability of machine-learning models need to be optimized to handle large-scale databases and incremental disambiguation. Most studies did not address issues such as unbalanced training data, missing feature data, and authors using different names. [Limitations] Because the empirical data differ across studies, we did not carry out a quantitative comparison of the methods. [Conclusions] Our study proposes multi-source data fusion, user intervention, and pre-trained models to improve author name disambiguation.
Shen Zhe, Wang Yi, Yao Yifan, Cheng Ying. Author Name Disambiguation Techniques for Academic Literature: A Review[J]. Data Analysis and Knowledge Discovery, 2020, 4(8): 15-27.