Review of Studies on Incremental Name Disambiguation
Cao Simeng, Li Chunwang
National Science Library, Chinese Academy of Sciences, Beijing 100190, China; Department of Library Information and Archives Management, University of Chinese Academy of Sciences, Beijing 100190, China
[Objective] This paper reviews research on incremental author name disambiguation, aiming to provide a reference for future studies. [Coverage] We used “author” and “name disambiguation” as keywords to search Google Scholar, ACM, IEEE, Elsevier, Springer, CNKI and VIP databases. After manual screening and an extended citation search based on seed documents, a total of 58 articles were retrieved, of which 30 directly discuss incremental disambiguation and 28 are other related studies. [Methods] We discussed the development, technical frameworks, and basic principles of incremental disambiguation. We also analyzed progress in incremental disambiguation with respect to similarity comparison strategies, author assignment methods, and other issues. [Results] Popular research areas include feature selection and representation, similarity calculation, and author assignment methods. However, fragment merging, multi-topic recognition for the same author, and error correction need to be strengthened. [Limitations] Few studies directly address incremental disambiguation of author names, so the available literature could not fully support our results. [Conclusions] Research on incremental disambiguation should be strengthened. Combining traditional feature engineering methods with deep learning and artificial intelligence technology could address more practical issues.
Cao Simeng, Li Chunwang. Review of Studies on Incremental Name Disambiguation[J]. Data Analysis and Knowledge Discovery, 2022, 6(5): 10-19.
[1]
Chen Y B, Jiang Z Y, Gao J L, et al. A Supervised and Distributed Framework for Cold-Start Author Disambiguation in Large-Scale Publications[J]. Neural Computing and Applications, 2021: 1-16.
[2]
Hussain I, Asghar S. A Survey of Author Name Disambiguation Techniques: 2010-2016[J]. The Knowledge Engineering Review, 2017, 32: e22.
doi: 10.1017/S0269888917000182
[3]
Shen Zhe, Wang Yi, Yao Yifan, et al. Author Name Disambiguation Techniques for Academic Literature: A Review[J]. Data Analysis and Knowledge Discovery, 2020, 4(8): 15-27.
[4]
Ferreira A A, Gonçalves M A, Laender A H F. A Brief Survey of Automatic Methods for Author Name Disambiguation[J]. ACM SIGMOD Record, 2012, 41(2): 15-26.
[5]
Delgado A D, Martínez R, Fresno V, et al. A Data Driven Approach for Person Name Disambiguation in Web Search Results[C]// Proceedings of the 25th International Conference on Computational Linguistics. 2014: 301-310.
[6]
Khabsa M, Treeratpituk P, Giles C L. Large Scale Author Name Disambiguation in Digital Libraries[C]// Proceedings of the 2014 IEEE International Conference on Big Data. IEEE, 2014: 41-42.
[7]
Zha H. Spectral Relaxation for K-means Clustering[C]// Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic. 2001: 1057-1064.
[8]
Carvalho A, Ferreira A A, Laender A H F, et al. Incremental Unsupervised Name Disambiguation in Cleaned Digital Libraries[J]. Journal of Information & Data Management, 2011, 2: 289-304.
[9]
Ferreira A A, Gonçalves M A, Laender A H F. Automatic Disambiguation of Author Names in Bibliographic Repositories[J]. Synthesis Lectures on Information Concepts, Retrieval, and Services, 2020, 12(1): 1-146.
[10]
Tang J, Fong A C M, Wang B, et al. A Unified Probabilistic Framework for Name Disambiguation in Digital Library[J]. IEEE Transactions on Knowledge and Data Engineering, 2012, 24(6): 975-987.
doi: 10.1109/TKDE.2011.13
[11]
Han H Q, Yao C Q, Fu Y, et al. Semantic Fingerprints-Based Author Name Disambiguation in Chinese Documents[J]. Scientometrics, 2017, 111(3): 1879-1896.
doi: 10.1007/s11192-017-2338-6
[12]
Protasiewicz J, Dadas S. A Hybrid Knowledge-Based Framework for Author Name Disambiguation[C]// Proceedings of the 2016 IEEE International Conference on Systems, Man, and Cybernetics. IEEE, 2016: 594-600.
[13]
Chang Ning, Dou Yongxiang, Xu Wei. Disambiguation of Sci-Tech Literature Authors with Multi-Source Data[J]. Information Science, 2021, 39(6): 108-116.
[14]
Hussain I, Asghar S. Resolving Namesakes Using the Author’s Social Network[J]. Turkish Journal of Electrical Engineering & Computer Sciences, 2018, 26: 554-569.
[15]
Hussain I, Asghar S. Incremental Author Name Disambiguation Using Author Profile Models and Self-Citations[J]. Turkish Journal of Electrical Engineering & Computer Sciences, 2019, 27(5): 3665-3681.
[16]
Veloso A, Ferreira A A, Gonçalves M A, et al. Cost-Effective On-Demand Associative Author Name Disambiguation[J]. Information Processing & Management, 2012, 48(4): 680-697.
doi: 10.1016/j.ipm.2011.08.005
[17]
Zhai Xiaorui, Han Hongqi, Zhang Yunliang, et al. Research on English Author Name Disambiguation Based on Sparse Distributed Representation[J]. Application Research of Computers, 2019, 36(12): 3534-3538.
[18]
Tu Shiwen. A Study on Methods of Author Name Disambiguation in Academic Literature[D]. Shanghai: East China Normal University, 2020.
[19]
Zhang B C, Dundar M, Hasan M A. Bayesian Non-Exhaustive Classification a Case Study: Online Name Disambiguation Using Temporal Record Streams[C]// Proceedings of the 25th ACM International Conference on Information and Knowledge Management. 2016: 1341-1350.
[20]
Zhang B C, Dundar M, Hasan M A. Bayesian Non-Exhaustive Classification for Active Online Name Disambiguation[OL]. arXiv Preprint, arXiv: 1708.04531.
[21]
Zhang B C. Towards Name Disambiguation: Relational, Streaming, and Privacy-Preserving Text Data[D]. Indiana, USA: Purdue University, 2017.
[22]
Zhang B C, Dundar M, Dave V, et al. Dirichlet Process Gaussian Mixture for Active Online Name Disambiguation by Particle Filter[C]// Proceedings of the 2019 ACM/IEEE Joint Conference on Digital Libraries. 2019: 269-278.
[23]
Chen B, Zhang J, Tang J, et al. CONNA: Addressing Name Disambiguation on the Fly[J]. IEEE Transactions on Knowledge and Data Engineering, 2020.
doi: 10.1109/TKDE.2020.3021256
[24]
Wu Ziming. Research and Application on Big Scholarly Data-Based Key Technique of Academic Search System[D]. Guangzhou: South China University of Technology, 2020.
[25]
Katsurai M, Ohmukai I, Takeda H. Topic Representation of Researchers’ Interests in a Large-Scale Academic Database and Its Application to Author Disambiguation[J]. IEICE Transactions on Information and Systems, 2016, E99.D(4): 1010-1018.
[26]
Zhao Z Q, Rollins J, Bai L G, et al. Incremental Author Name Disambiguation for Scientific Citation Data[C]// Proceedings of the 2017 IEEE International Conference on Data Science and Advanced Analytics. 2017: 175-183.
[27]
Zhou Jie, Li Bicheng, Tang Yongwang. Incremental Clustering Method Based on Key Evidence and E2LSH for Person Name Disambiguation[J]. Journal of the China Society for Scientific and Technical Information, 2016, 35(7): 714-722.
[28]
Zhou Jie. Research on Named Entity Recognition and Disambiguation Based on Network Semantic Resource[D]. Zhengzhou: PLA Information Engineering University, 2016.
[29]
Zhang Y T, Zhang F J, Yao P R, et al. Name Disambiguation in AMiner: Clustering, Maintenance, and Human in the Loop[C]// Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2018: 1002-1011.
[30]
Santana A F, Gonçalves M A, Laender A H F, et al. Incremental Author Name Disambiguation by Exploiting Domain-Specific Heuristics[J]. Journal of the Association for Information Science and Technology, 2017, 68(4): 931-945.
doi: 10.1002/asi.23726
[31]
Santana A F, Gonçalves M A, Laender A H F, et al. On the Combination of Domain-Specific Heuristics for Author Name Disambiguation: The Nearest Cluster Method[J]. International Journal on Digital Libraries, 2015, 16(3-4): 229-246.
doi: 10.1007/s00799-015-0158-y
[32]
Khabsa M, Treeratpituk P, Giles C L. Online Person Name Disambiguation with Constraints[C]// Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries. 2015: 37-46.
[33]
Treeratpituk P. Person Name Disambiguation in the Multicultural and Online Setting[D]. Pennsylvania, USA: The Pennsylvania State University, 2012.
[34]
Esperidião L, Ferreira A, Laender A, et al. Reducing Fragmentation in Incremental Author Name Disambiguation[J]. Journal of Information and Data Management, 2014, 5: 293-307.
[35]
Qian Y N, Zheng Q H, Sakai T, et al. Dynamic Author Name Disambiguation for Growing Digital Libraries[J]. Information Retrieval Journal, 2015, 18(5): 379-412.
doi: 10.1007/s10791-015-9261-3
[36]
Qiao Z Y, Du Y, Fu Y J, et al. Unsupervised Author Disambiguation Using Heterogeneous Graph Convolutional Network Embedding[C]// Proceedings of the 2019 IEEE International Conference on Big Data. IEEE, 2019: 910-919.
[37]
Li Na. Research and Application on Disambiguating Authors[D]. Shanghai: East China Normal University, 2020.
[38]
Chen Y, Lee S Y M, Huang C R. PolyUHK: A Robust Information Extraction System for Web Personal Names[C]// Proceedings of the 2nd Web People Search Evaluation Workshop. 2009.
[39]
Han H, Giles L, Zha H, et al. Two Supervised Learning Approaches for Name Disambiguation in Author Citations[C]// Proceedings of the 2004 Joint ACM/IEEE Conference on Digital Libraries. IEEE, 2004: 296-305.
[40]
Aldous D J. Exchangeability and Related Topics[A]// Hennequin P L. École d’Été de Probabilités de Saint-Flour XIII — 1983[M]. Springer, 1985.
[41]
Gao Yue, Wang Wenxian, Yang Shuxian. A Document Clustering Algorithm Based on Dirichlet Process Mixture Model[J]. Netinfo Security, 2015(11): 60-65.
[42]
Chen T Q, Guestrin C. XGBoost: A Scalable Tree Boosting System[C]// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016: 785-794.
[43]
Kim K, Rohatgi S, Giles C L. Hybrid Deep Pairwise Classification for Author Name Disambiguation[C]// Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 2019: 2369-2372.
[44]
Ferreira A A, Veloso A, Gonçalves M A, et al. Effective Self-Training Author Name Disambiguation in Scholarly Digital Libraries[C]// Proceedings of the 10th Annual Joint Conference on Digital Libraries. 2010: 39-48.
[45]
Ferreira A A, Veloso A, Gonçalves M A, et al. Self-Training Author Name Disambiguation for Information Scarce Scenarios[J]. Journal of the Association for Information Science and Technology, 2014, 65(6): 1257-1278.
doi: 10.1002/asi.22992
[46]
Ferreira A A, Machado T M, Gonçalves M A. Improving Author Name Disambiguation with User Relevance Feedback[J]. Journal of Information and Data Management, 2012, 3(3): 332-347.
[47]
Wang J, Berzins K, Hicks D, et al. A Boosted-Trees Method for Name Disambiguation[J]. Scientometrics, 2012, 93(2): 391-411.
doi: 10.1007/s11192-012-0681-1
[48]
Elmacioglu E, Tan Y F, Yan S, et al. PSNUS: Web People Name Disambiguation by Simple Clustering with Rich Features[C]// Proceedings of the 4th International Workshop on Semantic Evaluations. 2007: 268-271.
[49]
Asharaf S, Murty M N. An Adaptive Rough Fuzzy Single Pass Algorithm for Clustering Large Data Sets[J]. Pattern Recognition, 2003, 36(12): 3015-3018.
doi: 10.1016/S0031-3203(03)00081-5
[50]
Zhu J, Wu X C, Lin X Q, et al. A Novel Multiple Layers Name Disambiguation Framework for Digital Libraries Using Dynamic Clustering[J]. Scientometrics, 2018, 114(3): 781-794.
doi: 10.1007/s11192-017-2611-8
[51]
Han H, Xu W, Zha H Y, et al. A Hierarchical Naive Bayes Mixture Model for Name Disambiguation in Author Citations[C]// Proceedings of the 2005 ACM Symposium on Applied Computing. 2005: 1065-1069.
[52]
Bhattacharya I, Getoor L. A Latent Dirichlet Model for Unsupervised Entity Resolution[C]// Proceedings of the 2006 SIAM International Conference on Data Mining. 2006: 47-58.
[53]
Kim K, Khabsa M, Giles C L. Inventor Name Disambiguation for a Patent Database Using a Random Forest and DBSCAN[C]// Proceedings of the 2016 IEEE/ACM Joint Conference on Digital Libraries. IEEE, 2016: 269-270.
[54]
Chen P Y, Choudhury S, Hero A O. Multi-Centrality Graph Spectral Decompositions and Their Application to Cyber Intrusion Detection[C]// Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2016: 4553-4557.
[55]
Malin B. Unsupervised Name Disambiguation via Social Network Similarity[C]// Proceedings of the 2005 SIAM Workshop on Link Analysis, Counterterrorism, and Security. 2005: 93-102.
[56]
Ma X, Wang R R, Zhang Y. Author Name Disambiguation in Heterogeneous Academic Networks[C]// Proceedings of the 16th International Conference on Web Information Systems and Applications. 2019: 126-137.
[57]
Tran H N, Huynh T, Do T. Author Name Disambiguation by Using Deep Neural Network[C]// Proceedings of the 6th Asian Conference on Intelligent Information and Database Systems. 2014: 123-132.
[58]
Wagstaff K, Cardie C. Clustering with Instance-Level Constraints[C]// Proceedings of the 17th International Conference on Machine Learning. 2000: 1103-1110.