Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (8): 15-27    DOI: 10.11925/infotech.2096-3467.2020.0384
Author Name Disambiguation Techniques for Academic Literature: A Review
Shen Zhe1, Wang Yi1, Yao Yifan1, Cheng Ying1,2
1School of Information Management, Nanjing University, Nanjing 210023, China
2School of Chinese Language and Literature, Shandong Normal University, Jinan 250014, China

Abstract: [Objective] This paper reviews research on author name disambiguation techniques for academic literature, aiming to provide a reference for future studies. [Coverage] A total of 51 papers published between January 1, 2016 and March 28, 2020 were retrieved from the Web of Science, Google Scholar, CNKI, and Wanfang Database. [Methods] First, we organized the findings of these papers along the author name disambiguation process. Then, we summarized the techniques used for feature extraction, feature representation, model training, and prediction. Finally, we discussed common issues facing this research from multiple perspectives. [Results] Graph-based and probabilistic methods, as well as hybrid feature representation models, improved the calculation of complicated network features. The efficiency and generalization ability of machine-learning models need to be optimized to handle large databases and incremental disambiguation. Most research did not address issues such as unbalanced training data, missing feature data, and authors using different names. [Limitations] Because the reviewed studies used different empirical data, we did not carry out a quantitative comparison among methods. [Conclusions] Our study proposed multi-source data fusion, user intervention, and pre-trained models to improve author name disambiguation.
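
The stages named in the abstract (feature extraction, feature representation, and a grouping step) can be illustrated with a minimal sketch. This is not a method from any of the reviewed papers: the record fields, the token-set representation, the Jaccard similarity, the single-link merging, and the 0.2 threshold are all assumptions chosen for brevity, and the sample records are made up.

```python
from itertools import combinations


def extract_features(record):
    """Feature extraction: turn one publication record into a set of tokens.

    The field names ("title", "coauthors", "venue") are hypothetical.
    """
    coauthors = {"coauthor:" + name.lower() for name in record["coauthors"]}
    title_words = {"word:" + w.lower() for w in record["title"].split() if len(w) > 3}
    venue = {"venue:" + record["venue"].lower()}
    return coauthors | title_words | venue


def jaccard(a, b):
    """Similarity between two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def disambiguate(records, threshold=0.2):
    """Group records that share an ambiguous name into per-author clusters.

    Any pair whose similarity reaches the (assumed) threshold is merged,
    which amounts to single-link clustering implemented with union-find.
    """
    features = [extract_features(r) for r in records]
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if jaccard(features[i], features[j]) >= threshold:
            parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Made-up records, all assumed to carry the same ambiguous author name.
records = [
    {"title": "Graph embedding for citation networks",
     "coauthors": ["Li Na", "Chen Bo"], "venue": "Scientometrics"},
    {"title": "Citation networks and graph clustering",
     "coauthors": ["Li Na"], "venue": "Scientometrics"},
    {"title": "Protein folding dynamics simulation",
     "coauthors": ["Zhao Lei"], "venue": "Bioinformatics"},
]
clusters = disambiguate(records)
print(clusters)
```

With these sample records, the first two share a co-author, a venue, and several title words, so they fall into one cluster, while the third stays separate. Comparing every pair of records is quadratic in the number of records, which is one reason the abstract stresses efficiency for large databases.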

Key words: Author Name Disambiguation; Name Ambiguity; Same Name Disambiguation; Literature Database
Received: 05 May 2020; Published: 05 June 2020
ZTFLH: TP393
Corresponding Author: Cheng Ying; E-mail: chengy@nju.edu.cn

Cite this article:

Shen Zhe, Wang Yi, Yao Yifan, Cheng Ying. Author Name Disambiguation Techniques for Academic Literature: A Review. Data Analysis and Knowledge Discovery, 2020, 4(8): 15-27.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2020.0384     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2020/V4/I8/15

General Framework for Author Name Disambiguation (AND)
Frequency of Features Used in AND