Disambiguation of Chinese Author Names with Multiple Features
Lin Kerou, Wang Hao, Gong Lijuan, Zhang Baolong
School of Information Management, Nanjing University, Nanjing 210023, China; Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China
[Objective] This paper addresses the problems that Chinese authors who share the same name cause for document management systems. [Methods] First, we built author entities keyed by “author name + institution name” from bibliographic data. Second, we used the attributes of these entities to construct six similarity features covering three aspects. Third, we merged the features by either principal component analysis or direct weight assignment. Finally, we evaluated the performance of both variants. [Results] Both variants significantly reduced processing time. Their F1 values were 70.74% and 70.42% on the LIS dataset and 81.90% and 80.93% on the economics dataset. [Limitations] The attributes used in this study were drawn only from the metadata of the papers. [Conclusions] The proposed method could improve weight setting when multiple features are combined.
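The merging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the six features or say how principal component analysis yields the merged score, so the function names, the sample similarity values, and the choice to use normalized first-component loadings as weights are all assumptions made for illustration.

```python
from math import sqrt


def author_entity(name, institution):
    """Build an 'author name + institution name' entity key (hypothetical format)."""
    return f"{name.strip()}|{institution.strip()}"


def weighted_merge(features, weights):
    """Direct weight assignment: normalized weighted sum of similarity features."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have equal length")
    return sum(f * w for f, w in zip(features, weights)) / sum(weights)


def pca_weights(rows, iterations=200):
    """Derive feature weights from the first principal component.

    Assumption: weights are the absolute loadings of the first component,
    normalized to sum to 1. The component is found by power iteration on
    the sample covariance matrix, so no third-party library is needed.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0 / sqrt(d)] * d  # uniform starting vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0:  # no variance at all: keep uniform weights
            break
        v = [x / norm for x in w]
    loadings = [abs(x) for x in v]
    total = sum(loadings)
    return [x / total for x in loadings]


# Hypothetical similarity vectors for four pairs of same-name author entities.
# Each row holds three made-up features; the paper uses six.
pairs = [
    [0.9, 0.8, 0.7],
    [0.1, 0.2, 0.3],
    [0.8, 0.6, 0.9],
    [0.2, 0.1, 0.1],
]
weights = pca_weights(pairs)
scores = [weighted_merge(row, weights) for row in pairs]
```

A pair whose merged score exceeds some chosen threshold would then be treated as the same person; the threshold itself is not given in the abstract and would have to be tuned on labeled data.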
Lin Kerou, Wang Hao, Gong Lijuan, Zhang Baolong. Disambiguation of Chinese Author Names with Multiple Features[J]. Data Analysis and Knowledge Discovery, 2021, 5(4): 90-102.