Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (5): 10-19    DOI: 10.11925/infotech.2096-3467.2021.0189
Review of Studies on Incremental Name Disambiguation
Cao Simeng, Li Chunwang
National Science Library, Chinese Academy of Sciences, Beijing 100190, China; Department of Library Information and Archives Management, University of Chinese Academy of Sciences, Beijing 100190, China
Abstract  

[Objective] This paper reviews research on incremental author name disambiguation, aiming to provide a reference for future studies. [Coverage] We used “author” and “name disambiguation” as keywords to search Google Scholar, ACM, IEEE, Elsevier, Springer, CNKI and VIP databases. After manual screening and an extended citation search based on seed documents, a total of 58 articles were retrieved, including 30 papers directly discussing incremental disambiguation and 28 on related research. [Methods] We discussed the development, technical frameworks, and basic principles of incremental disambiguation. We also analyzed progress in similarity comparison strategies, author assignment methods, and other issues. [Results] Popular research areas include feature selection and representation, similarity calculation, and author assignment methods. However, fragment merging, multi-topic recognition for the same author, and error correction need to be strengthened. [Limitations] There were limited studies directly addressing incremental disambiguation of author names, which could not fully support our results. [Conclusions] Research on incremental disambiguation should be strengthened. Combining traditional feature engineering methods with deep learning and artificial intelligence technology could address more practical issues.

Key words: Author Name Disambiguation; Incremental Disambiguation; Similarity
Received: 01 March 2021      Published: 21 June 2022
Chinese Library Classification (ZTFLH): G250
Fund: Literature and Information Capacity Building Project of Chinese Academy of Sciences (Y929090401)
Corresponding Author: Li Chunwang, ORCID: 0000-0002-6313-6576     E-mail: licw@mail.las.ac.cn

Cite this article:

Cao Simeng, Li Chunwang. Review of Studies on Incremental Name Disambiguation. Data Analysis and Knowledge Discovery, 2022, 6(5): 10-19.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2021.0189     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2022/V6/I5/10

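| Author assignment method | Similarity comparison strategy | Feature representation | Main method or authors | Publication year |
|---|---|---|---|---|
| Rule-based | All records | Algebraic model | INDi[8-9], Tang et al.[10], Han et al.[11] | 2011/2020, 2011, 2017 |
| Rule-based | All records | Set-theoretic model + algebraic model | Protasiewicz et al.[12], Chang Ning et al.[13] | 2016, 2020 |
| Rule-based | Author model | Set-theoretic model | CAND[14-15] | 2018/2019 |
| Classification | All records | Algebraic model | SLAND[16], Zhai Xiaorui et al.[17], Tu Shiwen[18] | 2012, 2019, 2020 |
| Classification | All records | Probabilistic model | Zhang et al.[19-22] | 2016/2017/2017/2019 |
| Classification | Partial records | Algebraic model | CONNA[23] | 2019 |
| Classification | Author model | Algebraic model | Wu Ziming[24] | 2020 |
| Classification | Author model | Probabilistic model | Katsurai et al.[25], Zhao et al.[26] | 2016, 2017 |
| Clustering | All records | Algebraic model | Zhou Jie et al.[27-28], Zhang et al.[29] | 2016/2016/2016, 2018 |
| Clustering | All records | Probabilistic model | INC[30-31] | 2015/2017 |
| Clustering | Partial records | Algebraic model | Khabsa[32], Treeratpituk[33], MINDi[34] | 2012/2015, 2014 |
| Clustering | Author model | Probabilistic model | IncAD[35] | 2015 |
| Graph-based | All records | Graph model | Qiao et al.[36] | 2019 |
| Graph-based | Author model | Graph model | Li Na[37] | 2020 |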
The Comparison of Principal Disambiguation Techniques
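To make the "author model" comparison strategy in the table more concrete, the following minimal Python sketch assigns each incoming record to the most similar existing author profile, or opens a new profile when no similarity reaches a threshold. It is not taken from any of the surveyed systems: the feature strings (co-author and keyword tokens), the Jaccard measure, the 0.2 threshold, and the names `AuthorProfile` and `assign_incrementally` are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorProfile:
    """Aggregated features of all records already assigned to one author."""
    features: set = field(default_factory=set)
    record_ids: list = field(default_factory=list)


def jaccard(a: set, b: set) -> float:
    """Set-overlap similarity in [0, 1]; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign_incrementally(profiles, record_id, record_features, threshold=0.2):
    """Assign one new record without re-clustering earlier records.

    Returns the index of the profile the record was assigned to.
    The threshold value is an illustrative assumption.
    """
    best_index, best_score = None, 0.0
    for index, profile in enumerate(profiles):
        score = jaccard(profile.features, record_features)
        if score > best_score:
            best_index, best_score = index, score

    if best_index is None or best_score < threshold:
        # No existing author is similar enough: treat the record as a new author.
        profiles.append(AuthorProfile())
        best_index = len(profiles) - 1

    # Update the chosen profile so later records are compared against it.
    profiles[best_index].features |= set(record_features)
    profiles[best_index].record_ids.append(record_id)
    return best_index


# Hypothetical record stream for one ambiguous name.
profiles = []
stream = [
    ("r1", {"coauthor:li", "kw:name disambiguation"}),
    ("r2", {"coauthor:li", "kw:clustering"}),
    ("r3", {"coauthor:wang", "kw:protein folding"}),
]
assignments = [assign_incrementally(profiles, rid, feats) for rid, feats in stream]
```

In this toy stream the first two records share a co-author and end up in the same profile, while the third opens a new one. A scheme like this can split one real author across several profiles when early records share few features, which is the fragmentation problem the abstract lists among the issues needing more work.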
[1] Chen Y B, Jiang Z Y, Gao J L, et al. A Supervised and Distributed Framework for Cold-Start Author Disambiguation in Large-Scale Publications[J]. Neural Computing and Applications, 2021: 1-16.
[2] Hussain I, Asghar S. A Survey of Author Name Disambiguation Techniques: 2010-2016[J]. The Knowledge Engineering Review, 2017, 32: e22.
doi: 10.1017/S0269888917000182
[3] Shen Zhe, Wang Yi, Yao Yifan, et al. Author Name Disambiguation Techniques for Academic Literature: A Review[J]. Data Analysis and Knowledge Discovery, 2020, 4(8): 15-27. (in Chinese)
[4] Ferreira A A, Gonçalves M A, Laender A H F. A Brief Survey of Automatic Methods for Author Name Disambiguation[J]. ACM SIGMOD Record, 2012, 41(2): 15-26.
[5] Delgado A D, Martínez R, Fresno V, et al. A Data Driven Approach for Person Name Disambiguation in Web Search Results[C]// Proceedings of the 25th International Conference on Computational Linguistics. 2014: 301-310.
[6] Khabsa M, Treeratpituk P, Giles C L. Large Scale Author Name Disambiguation in Digital Libraries[C]// Proceedings of the 2014 IEEE International Conference on Big Data. IEEE, 2014: 41-42.
[7] Zha H. Spectral Relaxation for K-means Clustering[C]// Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic. 2001: 1057-1064.
[8] Carvalho A, Ferreira A A, Laender A H F, et al. Incremental Unsupervised Name Disambiguation in Cleaned Digital Libraries[J]. Journal of Information & Data Management, 2011, 2: 289-304.
[9] Ferreira A A, Gonçalves M A, Laender A H F. Automatic Disambiguation of Author Names in Bibliographic Repositories[J]. Synthesis Lectures on Information Concepts, Retrieval, and Services, 2020, 12(1): 1-146.
[10] Tang J, Fong A C M, Wang B, et al. A Unified Probabilistic Framework for Name Disambiguation in Digital Library[J]. IEEE Transactions on Knowledge and Data Engineering, 2012, 24(6): 975-987.
doi: 10.1109/TKDE.2011.13
[11] Han H Q, Yao C Q, Fu Y, et al. Semantic Fingerprints-Based Author Name Disambiguation in Chinese Documents[J]. Scientometrics, 2017, 111(3): 1879-1896.
doi: 10.1007/s11192-017-2338-6
[12] Protasiewicz J, Dadas S. A Hybrid Knowledge-Based Framework for Author Name Disambiguation[C]// Proceedings of the 2016 IEEE International Conference on Systems, Man, and Cybernetics. IEEE, 2016: 594-600.
[13] Chang Ning, Dou Yongxiang, Xu Wei. Disambiguation of Sci-Tech Literature Authors with Multi-Source Data[J]. Information Science, 2021, 39(6): 108-116. (in Chinese)
[14] Hussain I, Asghar S. Resolving Namesakes Using the Author’s Social Network[J]. Turkish Journal of Electrical Engineering & Computer Sciences, 2018, 26: 554-569.
[15] Hussain I, Asghar S. Incremental Author Name Disambiguation Using Author Profile Models and Self-Citations[J]. Turkish Journal of Electrical Engineering & Computer Sciences, 2019, 27(5): 3665-3681.
[16] Veloso A, Ferreira A A, Gonçalves M A, et al. Cost-Effective On-Demand Associative Author Name Disambiguation[J]. Information Processing & Management, 2012, 48(4): 680-697.
doi: 10.1016/j.ipm.2011.08.005
[17] Zhai Xiaorui, Han Hongqi, Zhang Yunliang, et al. Research on English Author Name Disambiguation Based on Sparse Distributed Representation[J]. Application Research of Computers, 2019, 36(12): 3534-3538. (in Chinese)
[18] Tu Shiwen. A Study on Methods of Author Name Disambiguation in Academic Literature[D]. Shanghai: East China Normal University, 2020. (in Chinese)
[19] Zhang B C, Dundar M, Hasan M A. Bayesian Non-Exhaustive Classification a Case Study: Online Name Disambiguation Using Temporal Record Streams[C]// Proceedings of the 25th ACM International Conference on Information and Knowledge Management. 2016: 1341-1350.
[20] Zhang B C, Dundar M, Hasan M A. Bayesian Non-Exhaustive Classification for Active Online Name Disambiguation[OL]. arXiv Preprint, arXiv: 1708.04531.
[21] Zhang B C. Towards Name Disambiguation: Relational, Streaming, and Privacy-Preserving Text Data[D]. Indiana, USA: Purdue University, 2017.
[22] Zhang B C, Dundar M, Dave V, et al. Dirichlet Process Gaussian Mixture for Active Online Name Disambiguation by Particle Filter[C]// Proceedings of the 2019 ACM/IEEE Joint Conference on Digital Libraries. 2019: 269-278.
[23] Chen B, Zhang J, Tang J, et al. CONNA: Addressing Name Disambiguation on the Fly[J]. IEEE Transactions on Knowledge and Data Engineering, 2020.
doi: 10.1109/TKDE.2020.3021256
[24] Wu Ziming. Research and Application on Big Scholarly Data-Based Key Technique of Academic Search System[D]. Guangzhou: South China University of Technology, 2020. (in Chinese)
[25] Katsurai M, Ohmukai I, Takeda H. Topic Representation of Researchers’ Interests in a Large-Scale Academic Database and Its Application to Author Disambiguation[J]. IEICE Transactions on Information and Systems, 2016, E99.D(4): 1010-1018.
[26] Zhao Z Q, Rollins J, Bai L G, et al. Incremental Author Name Disambiguation for Scientific Citation Data[C]// Proceedings of the 2017 IEEE International Conference on Data Science and Advanced Analytics. 2017: 175-183.
[27] Zhou Jie, Li Bicheng, Tang Yongwang. Incremental Clustering Method Based on Key Evidence and E2LSH for Person Name Disambiguation[J]. Journal of the China Society for Scientific and Technical Information, 2016, 35(7): 714-722. (in Chinese)
[28] Zhou Jie. Research on Named Entity Recognition and Disambiguation Based on Network Semantic Resource[D]. Zhengzhou: PLA Information Engineering University, 2016. (in Chinese)
[29] Zhang Y T, Zhang F J, Yao P R, et al. Name Disambiguation in AMiner: Clustering, Maintenance, and Human in the Loop[C]// Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2018: 1002-1011.
[30] Santana A F, Gonçalves M A, Laender A H F, et al. Incremental Author Name Disambiguation by Exploiting Domain-Specific Heuristics[J]. Journal of the Association for Information Science and Technology, 2017, 68(4): 931-945.
doi: 10.1002/asi.23726
[31] Santana A F, Gonçalves M A, Laender A H F, et al. On the Combination of Domain-Specific Heuristics for Author Name Disambiguation: The Nearest Cluster Method[J]. International Journal on Digital Libraries, 2015, 16(3-4): 229-246.
doi: 10.1007/s00799-015-0158-y
[32] Khabsa M, Treeratpituk P, Giles C L. Online Person Name Disambiguation with Constraints[C]// Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries. 2015: 37-46.
[33] Treeratpituk P. Person Name Disambiguation in the Multicultural and Online Setting[D]. Pennsylvania, USA: The Pennsylvania State University, 2012.
[34] Esperidião L, Ferreira A, Laender A, et al. Reducing Fragmentation in Incremental Author Name Disambiguation[J]. Journal of Information and Data Management, 2014, 5: 293-307.
[35] Qian Y N, Zheng Q H, Sakai T, et al. Dynamic Author Name Disambiguation for Growing Digital Libraries[J]. Information Retrieval Journal, 2015, 18(5): 379-412.
doi: 10.1007/s10791-015-9261-3
[36] Qiao Z Y, Du Y, Fu Y J, et al. Unsupervised Author Disambiguation Using Heterogeneous Graph Convolutional Network Embedding[C]// Proceedings of the 2019 IEEE International Conference on Big Data. IEEE, 2019: 910-919.
[37] Li Na. Research and Application on Disambiguating Authors[D]. Shanghai: East China Normal University, 2020. (in Chinese)
[38] Chen Y, Lee S Y M, Huang C R. PolyUHK: A Robust Information Extraction System for Web Personal Names[C]// Proceedings of the 2nd Web People Search Evaluation Workshop. 2009.
[39] Han H, Giles L, Zha H, et al. Two Supervised Learning Approaches for Name Disambiguation in Author Citations[C]// Proceedings of the 2004 Joint ACM/IEEE Conference on Digital Libraries. IEEE, 2004: 296-305.
[40] Aldous D J. Exchangeability and Related Topics[A]//Hennequin P L. École d’Été de Probabilités de Saint-Flour XIII — 1983[M]. Springer, 1985.
[41] Gao Yue, Wang Wenxian, Yang Shuxian. A Document Clustering Algorithm Based on Dirichlet Process Mixture Model[J]. Netinfo Security, 2015(11): 60-65. (in Chinese)
[42] Chen T Q, Guestrin C. XGBoost: A Scalable Tree Boosting System[C]// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016: 785-794.
[43] Kim K, Rohatgi S, Giles C L. Hybrid Deep Pairwise Classification for Author Name Disambiguation[C]// Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 2019: 2369-2372.
[44] Ferreira A A, Veloso A, Gonçalves M A, et al. Effective Self-Training Author Name Disambiguation in Scholarly Digital Libraries[C]// Proceedings of the 10th Annual Joint Conference on Digital Libraries. 2010: 39-48.
[45] Ferreira A A, Veloso A, Gonçalves M A, et al. Self-Training Author Name Disambiguation for Information Scarce Scenarios[J]. Journal of the Association for Information Science and Technology, 2014, 65(6): 1257-1278.
doi: 10.1002/asi.22992
[46] Ferreira A A, Machado T M, Gonçalves M A. Improving Author Name Disambiguation with User Relevance Feedback[J]. Journal of Information and Data Management, 2012, 3(3): 332-347.
[47] Wang J, Berzins K, Hicks D, et al. A Boosted-Trees Method for Name Disambiguation[J]. Scientometrics, 2012, 93(2): 391-411.
doi: 10.1007/s11192-012-0681-1
[48] Elmacioglu E, Tan Y F, Yan S, et al. PSNUS: Web People Name Disambiguation by Simple Clustering with Rich Features[C]// Proceedings of the 4th International Workshop on Semantic Evaluations. 2007: 268-271.
[49] Asharaf S, Murty M N. An Adaptive Rough Fuzzy Single Pass Algorithm for Clustering Large Data Sets[J]. Pattern Recognition, 2003, 36(12): 3015-3018.
doi: 10.1016/S0031-3203(03)00081-5
[50] Zhu J, Wu X C, Lin X Q, et al. A Novel Multiple Layers Name Disambiguation Framework for Digital Libraries Using Dynamic Clustering[J]. Scientometrics, 2018, 114(3): 781-794.
doi: 10.1007/s11192-017-2611-8
[51] Han H, Xu W, Zha H Y, et al. A Hierarchical Naive Bayes Mixture Model for Name Disambiguation in Author Citations[C]// Proceedings of the 2005 ACM Symposium on Applied Computing. 2005: 1065-1069.
[52] Bhattacharya I, Getoor L. A Latent Dirichlet Model for Unsupervised Entity Resolution[C]// Proceedings of the 2006 SIAM International Conference on Data Mining. 2006: 47-58.
[53] Kim K, Khabsa M, Giles C L. Inventor Name Disambiguation for a Patent Database Using a Random Forest and DBSCAN[C]// Proceedings of the 2016 IEEE/ACM Joint Conference on Digital Libraries. IEEE, 2016: 269-270.
[54] Chen P Y, Choudhury S, Hero A O. Multi-Centrality Graph Spectral Decompositions and Their Application to Cyber Intrusion Detection[C]// Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2016: 4553-4557.
[55] Malin B. Unsupervised Name Disambiguation via Social Network Similarity[C]// Proceedings of the 2005 SIAM Workshop on Link Analysis, Counterterrorism, and Security. 2005: 93-102.
[56] Ma X, Wang R R, Zhang Y. Author Name Disambiguation in Heterogeneous Academic Networks[C]// Proceedings of the 16th International Conference on Web Information Systems and Applications. 2019: 126-137.
[57] Tran H N, Huynh T, Do T. Author Name Disambiguation by Using Deep Neural Network[C]// Proceedings of the 6th Asian Conference on Intelligent Information and Database Systems. 2014: 123-132.
[58] Wagstaff K, Cardie C. Clustering with Instance-Level Constraints[C]// Proceedings of the 17th International Conference on Machine Learning. 2000: 1103-1110.