Name Disambiguation Based on Similar Features and Relation Graph Optimization
Cui Huanqing¹,², Yang Junzhu¹, Song Weiqing¹
¹College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China
²State Key Laboratory of High-end Server & Storage Technology, Inspur Group Co., Ltd., Jinan 250014, China
[Objective] This paper aims to make full use of both the feature information and the relational information in academic publications to improve author name disambiguation. [Methods] We proposed a name disambiguation method that combines feature-information embedding with relation-graph optimization. First, we extracted feature information from each paper and applied representation learning to obtain its embedding vector. Then, we mined the relationships between papers and constructed four relation graphs to optimize the embedding vector of each paper. Finally, we used hierarchical agglomerative clustering to obtain the disambiguation results. [Results] On the AMiner-na dataset, the proposed model achieved an average F1 score of 68.78%, 1.81 percentage points higher than the second-best method. [Limitations] The proposed method targets the average disambiguation performance across all authors; performance for some individual authors still needs improvement. [Conclusions] The proposed method makes full use of the relational information between papers and effectively improves author name disambiguation.
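The last two stages of the pipeline described above (refining paper embeddings over relation graphs, then clustering them) can be illustrated with a minimal, hypothetical Python sketch. It is not the authors' implementation, and every name in it (`refine`, `hac`, `pairwise_f1`, `alpha`, `threshold`) is an assumption for illustration. The sketch assumes that the embeddings are already computed; that each relation graph is an undirected edge list of paper indices (what the paper's four graphs encode is not restated here); that "optimization" is approximated by a single round of neighbour averaging, which is a stand-in rather than the paper's method; that clustering uses average-linkage agglomerative clustering on cosine distance with a fixed cut-off; and that F1 is the pairwise variant.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """1 - cosine similarity; a zero vector is treated as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def refine(embeddings, graphs, alpha=0.5):
    """Hypothetical stand-in for relation-graph optimization: mix each paper's
    vector with the mean of its neighbours (union over all edge-list graphs)."""
    neighbours = [set() for _ in embeddings]
    for edges in graphs:
        for i, j in edges:
            neighbours[i].add(j)
            neighbours[j].add(i)
    refined = []
    for i, vec in enumerate(embeddings):
        if not neighbours[i]:
            refined.append(list(vec))  # isolated paper keeps its own embedding
            continue
        k = len(neighbours[i])
        mean = [sum(embeddings[j][d] for j in neighbours[i]) / k for d in range(len(vec))]
        refined.append([(1 - alpha) * vec[d] + alpha * mean[d] for d in range(len(vec))])
    return refined


def hac(vectors, threshold):
    """Average-linkage agglomerative clustering; stops once the closest pair of
    clusters is farther apart than `threshold`. Returns one label per paper."""
    clusters = [[i] for i in range(len(vectors))]
    dist = {(i, j): cosine_distance(vectors[i], vectors[j])
            for i, j in combinations(range(len(vectors)), 2)}

    def linkage(a, b):
        return sum(dist[min(i, j), max(i, j)] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        (x, y), best = min(
            (((x, y), linkage(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda t: t[1],
        )
        if best > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]  # x < y, so index x stays valid
        del clusters[y]
    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def pairwise_f1(pred, truth):
    """Pairwise F1: a pair of papers is positive if both share a cluster."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_pred, same_true = pred[i] == pred[j], truth[i] == truth[j]
        if same_pred and same_true:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


if __name__ == "__main__":
    emb = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0], [0.1, 0.9]]  # toy 2-D embeddings
    truth = [0, 0, 1, 1]                                     # papers 0,1 by one author
    graphs = [[(0, 1), (2, 3)]]                              # one hypothetical co-author graph
    print(hac(emb, 0.3), pairwise_f1(hac(emb, 0.3), truth))
    refined = refine(emb, graphs)
    print(hac(refined, 0.3), pairwise_f1(hac(refined, 0.3), truth))
```

On this toy data, clustering the raw vectors puts paper 1 together with the other author's papers (labels `[0, 1, 1, 1]`). After one round of neighbour averaging over the edge (0, 1), paper 1 moves back to paper 0 (labels `[0, 0, 1, 1]`, pairwise F1 of 1.0). That is the kind of correction relational information is meant to supply. Taking the union of neighbours across graphs is a simplification; a weighted combination per graph would be an equally plausible assumption.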
Cui Huanqing, Yang Junzhu, Song Weiqing. Name Disambiguation Based on Similar Features and Relation Graph Optimization. Data Analysis and Knowledge Discovery, 2023, 7(5): 71-80.