Data Analysis and Knowledge Discovery  2023, Vol. 7 Issue (5): 71-80    DOI: 10.11925/infotech.2096-3467.2022.0576
Name Disambiguation Based on Similar Features and Relation Graph Optimization
Cui Huanqing1,2, Yang Junzhu1, Song Weiqing1
1College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China
2State Key Laboratory of High-end Server & Storage Technology, Inspur Group Co., Ltd., Jinan 250014, China
Abstract  

[Objective] This paper aims to make full use of the feature and relation information in academic literature to improve author name disambiguation. [Methods] We propose a name disambiguation method that combines feature information embedding with relation graph optimization. First, we extract feature information from each document and apply representation learning to obtain its embedding vector. Then, we mine the relations between documents and construct four relation graphs to optimize each document's embedding vector. Finally, we apply hierarchical agglomerative clustering to obtain the disambiguation results. [Results] On the AMiner-na dataset, the proposed method achieved an average F1 score of 68.78%, 1.81 percentage points higher than the second-best method. [Limitations] The method targets the average disambiguation performance across all authors, and its performance on some individual authors still needs improvement. [Conclusions] The proposed method makes full use of the relation information between documents and effectively improves author name disambiguation.
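
The final clustering step can be illustrated with a minimal sketch. The code below is not the authors' implementation: it assumes cosine distance, average linkage, and a known number of clusters, none of which the abstract specifies, and the toy two-dimensional vectors stand in for the learned document embeddings.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def agglomerative_clustering(vectors, n_clusters):
    """Average-linkage agglomerative clustering.

    Returns a list of clusters, each a sorted list of document indices.
    """
    clusters = [[i] for i in range(len(vectors))]
    # Precompute pairwise distances between individual documents.
    dist = {
        (i, j): cosine_distance(vectors[i], vectors[j])
        for i, j in combinations(range(len(vectors)), 2)
    }

    def linkage(a, b):
        # Average distance over all cross-cluster document pairs.
        total = sum(dist[(min(i, j), max(i, j))] for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return [sorted(c) for c in clusters]


# Toy embeddings (assumed): documents 0, 1, 4 point one way, 2 and 3 another.
embeddings = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9], [1.0, 0.0]]
groups = agglomerative_clustering(embeddings, n_clusters=2)
print(groups)
```

Each resulting group is read as the set of documents attributed to one real-world author. In practice the number of authors is usually unknown, so a distance threshold or an estimated cluster count would replace the fixed `n_clusters`.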

Key words: Name Disambiguation; Feature Extraction; Representation Learning; Relation Extraction; Clustering
Received: 05 June 2022      Published: 09 November 2022
ZTFLH: TP391; G250
Fund: Natural Science Foundation of Shandong Province (ZR2021LZH004)
Corresponding Author: Cui Huanqing, ORCID: 0000-0002-9251-680X, E-mail: cuihq@sdust.edu.cn

Cite this article:

Cui Huanqing, Yang Junzhu, Song Weiqing. Name Disambiguation Based on Similar Features and Relation Graph Optimization. Data Analysis and Knowledge Discovery, 2023, 7(5): 71-80.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2022.0576     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2023/V7/I5/71

Name Disambiguation Framework
Feature Learning Model
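Document information    Extracted relation    Relation graph
Author                  Co-author             Co-author graph
Title                   Similar title         Similar-title graph
Abstract                Similar abstract      Similar-abstract graph
Keywords                Similar keywords      Similar-keyword graph
Publication venue       Common venue          Common-venue graph
Author affiliation      Common affiliation    Common-affiliation graph
Year                    Same year             Common-year graph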
Relation Extraction and Graph Construction
Disambiguation Results of Different Relation Graphs
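Relation              Relation graph                  Description
Co-author             Co-author graph G_n             The documents share a co-author
Common affiliation    Common-affiliation graph G_o    The documents' authors share an affiliation
Common venue          Common-venue graph G_u          The two documents were published in the same venue
Similar terms         Similar-term graph G_m          The two documents contain similar technical terms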
Relation Type and Graph
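
To make the table above concrete, the following sketch builds the four relation graphs G_n, G_o, G_u, and G_m as edge sets over a few hypothetical paper records. The field names, the sample data, and the use of Jaccard overlap with a 0.5 threshold for "similar terms" are all assumptions made for illustration; the page does not state how term similarity is measured.

```python
from itertools import combinations

# Hypothetical records for papers that share one ambiguous author name.
papers = {
    "p1": {"coauthors": {"A. Chen", "B. Liu"}, "org": "Org X",
           "venue": "Venue 1", "terms": {"graph", "embedding", "clustering"}},
    "p2": {"coauthors": {"B. Liu"}, "org": "Org Y",
           "venue": "Venue 1", "terms": {"graph", "embedding", "network"}},
    "p3": {"coauthors": {"C. Zhao"}, "org": "Org X",
           "venue": "Venue 2", "terms": {"protein", "folding"}},
}


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_relation_graphs(papers, term_threshold=0.5):
    """Return one undirected edge set per relation type."""
    graphs = {"G_n": set(), "G_o": set(), "G_u": set(), "G_m": set()}
    for a, b in combinations(sorted(papers), 2):
        pa, pb = papers[a], papers[b]
        if pa["coauthors"] & pb["coauthors"]:       # shared co-author
            graphs["G_n"].add((a, b))
        if pa["org"] and pa["org"] == pb["org"]:    # shared affiliation
            graphs["G_o"].add((a, b))
        if pa["venue"] and pa["venue"] == pb["venue"]:  # same venue
            graphs["G_u"].add((a, b))
        if jaccard(pa["terms"], pb["terms"]) >= term_threshold:
            graphs["G_m"].add((a, b))               # similar terms (assumed metric)
    return graphs


graphs = build_relation_graphs(papers)
for name, edges in graphs.items():
    print(name, sorted(edges))
```

Keeping one edge set per relation type, rather than a single merged graph, leaves room to weight or evaluate each relation separately.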
Author name       Proposed method (%)    ADES (%)               AMiner (%)             ADNE (%)               ReLU (%)
                    Pre    Rec     F1      Pre    Rec     F1      Pre    Rec     F1      Pre    Rec     F1      Pre    Rec     F1
Xu Xu             66.08  44.45  53.15    43.97  68.61  53.59    69.99  42.29  52.73     8.64  50.01  14.73    61.21  30.39  40.61
Rong Yu           82.79  46.00  59.14    39.48  77.58  52.33    68.74  38.15  49.07    29.06  87.13  43.59    83.03  33.32  47.56
Yong Tian         69.75  52.43  59.86    51.51  58.13  54.62    72.68  49.55  58.93    10.51  51.06  17.44    92.67  43.07  58.81
Lu Han            49.49  25.64  33.78    25.31  51.72  33.98    63.33  31.26  41.86    15.17  53.10  23.59    77.86  11.68  20.31
Lin Huang         67.42  32.60  43.95    52.37  84.72  64.72    80.35  31.42  45.18    10.52  38.44  16.52    99.49  22.80  37.10
Kexin Xu          88.47  98.98  93.42    92.09  90.78  91.43    83.79  53.32  65.17    71.61  97.71  82.65    90.93  67.86  77.72
Wei Quan          46.90  40.44  43.43    38.92  49.47  43.56    43.71  28.66  34.62    26.37  49.52  34.42    97.26  18.33  30.85
Tao Deng          73.00  39.38  51.16    42.01  76.06  54.12    77.05  42.66  54.92    15.99  64.97  25.66    80.55  13.72  23.44
Hongbin Li        68.20  75.74  71.77    58.66  73.90  65.40    72.59  60.22  65.83    10.36  60.69  17.71    63.72  29.90  40.70
Hua Bai           67.65  37.85  48.54    30.82  58.72  40.43    67.68  32.12  43.56    21.11  84.81  33.81    74.55  16.16  26.56
Meiling Chen      50.43  87.01  63.85    44.77  70.12  54.66    69.20  44.39  54.08    21.85  73.68  33.71    100.0   7.91  14.66
Yanqing Wang      29.91  66.88  41.33    72.73  64.82  68.54    63.60  59.22  61.33    15.52  57.80  24.46    100.0  25.97  41.24
Xudong Zhang      60.61  21.48  31.72    21.16  62.87  31.66    82.38  57.50  67.72    57.97  61.42  59.64    90.56   4.59   8.73
Qiang Shi         45.40  37.03  40.79    38.23  54.53  44.85    52.68  34.82  41.93    10.57  32.04  15.90    45.92  28.60  35.25
Min Zheng         48.79  20.64  29.01    18.75  55.34  28.01    66.43  18.24  28.62    13.76  50.87  21.66    87.41  10.68  19.03
Average           72.47  65.45  68.78    60.19  75.48  66.97    74.25  54.07  62.58    38.66  69.20  49.61    81.47  41.02  54.56
Experimental Result of Name Disambiguation
Experiment Results of Embedding Dimension
Comparative Experiment of Clustering Algorithm
Stage            Pre (%)  Rec (%)   F1 (%)
Embedding          38.65    25.19    30.50
Feature            71.03    49.62    58.42
Feature+Graph      72.47    65.45    68.78
Validity Analysis of Different Stages
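
As a quick consistency check on the table above, the sketch below recomputes F1 as the harmonic mean of precision and recall, F1 = 2 * Pre * Rec / (Pre + Rec). It assumes this standard definition applies to the reported figures; the computed values agree with the reported F1 column to within about 0.01.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (Pre, Rec, reported F1) in percent, copied from the stage table.
stages = {
    "Embedding": (38.65, 25.19, 30.50),
    "Feature": (71.03, 49.62, 58.42),
    "Feature+Graph": (72.47, 65.45, 68.78),
}

for name, (pre, rec, reported) in stages.items():
    computed = f1_score(pre, rec)
    print(f"{name}: computed {computed:.2f}, reported {reported:.2f}")
```

Read this way, adding the relation graphs raises recall from 49.62% to 65.45% with little change in precision, which accounts for most of the F1 gain between the last two stages.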