Name Disambiguation Based on Similar Features and Relation Graph Optimization
Cui Huanqing1,2, Yang Junzhu1, Song Weiqing1
1 College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China
2 State Key Laboratory of High-end Server & Storage Technology, Inspur Group Co., Ltd., Jinan 250014, China
Abstract [Objective] This paper aims to fully exploit the feature and relation information in academic literature to improve author name disambiguation. [Methods] We propose a name disambiguation method that combines feature-information embedding with relation-graph optimization. First, we extract feature information from each publication and apply representation learning to obtain its embedding vector. Then, we mine the relations among publications and construct four relation graphs to optimize each publication's embedding vector. Finally, we apply a hierarchical agglomerative clustering algorithm to obtain the disambiguation results. [Results] On the AMiner-na dataset, the method achieved an average F1 score of 68.78%, which is 1.81 percentage points higher than the second-best method. [Limitations] The method targets the average disambiguation performance across all authors; its performance for some individual authors still needs improvement. [Conclusions] The proposed method makes fuller use of the relation information among publications and effectively improves author name disambiguation.
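The final clustering step can be illustrated with a minimal sketch. The abstract does not state the distance measure, the linkage criterion, or the stopping rule, so the cosine distance, average linkage, and distance threshold used here are assumptions chosen only for illustration. The function names and the toy two-dimensional vectors are hypothetical and stand in for the optimized publication embeddings.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def agglomerative_clusters(vectors, threshold):
    """Bottom-up clustering with average linkage (an assumed choice).

    Starts with one cluster per publication and repeatedly merges the
    closest pair of clusters until the closest pair is farther apart
    than `threshold` (an assumed stopping rule).
    """
    clusters = [[i] for i in range(len(vectors))]

    def average_linkage(a, b):
        total = sum(
            cosine_distance(vectors[i], vectors[j]) for i in a for j in b
        )
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        best = None  # (distance, index_x, index_y) of the closest pair
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        if best[0] > threshold:
            break
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Toy embeddings for four publications that share one author name.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
groups = agglomerative_clusters(toy_embeddings, threshold=0.5)
print(groups)
```

Each resulting group is read as the set of publications attributed to one real-world author. In practice the threshold (or the target number of clusters) would have to be tuned or predicted per name, which this sketch does not attempt.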
Received: 05 June 2022
Published: 09 November 2022
Fund: Natural Science Foundation of Shandong Province (ZR2021LZH004)
Corresponding Author: Cui Huanqing, ORCID: 0000-0002-9251-680X, E-mail: cuihq@sdust.edu.cn