[Objective] This paper aims to improve author name disambiguation using entity relationship data from academic literature. [Methods] First, we extracted multiple types of nodes and their relationships from the literature to construct a heterogeneous information network (HIN). Then, we applied representation learning to obtain latent vectors for authors and used clustering analysis to obtain a preliminary partition. Finally, we merged clusters based on strong rule matching to obtain the final disambiguation result. [Results] We evaluated the model on a dataset from the Web of Science. The mean K-Metric value was 0.842, a 63.18% increase over the baseline model. Even without strong rule matching, the improvement reached 34.69%. [Limitations] The proposed model requires citation information, which limits its application scenarios. [Conclusions] The proposed method effectively improves the performance of author name disambiguation.
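The K-Metric reported in the results is commonly defined in the name disambiguation literature as the geometric mean of average cluster purity (ACP) and average author purity (AAP). The abstract does not spell out the formula, so the following is a minimal sketch under that assumption; the function name `k_metric` and its arguments are illustrative and not taken from the paper.

```python
from collections import Counter
from math import sqrt


def k_metric(true_labels, pred_labels):
    """Return sqrt(ACP * AAP) for one ambiguous name.

    true_labels: the real author identity of each paper (assumed ground truth).
    pred_labels: the cluster assigned to each paper by the disambiguation model.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    n = len(true_labels)
    if n == 0:
        raise ValueError("at least one paper is required")

    # n_ij: number of papers placed in predicted cluster i that belong to author j.
    overlap = Counter(zip(pred_labels, true_labels))
    cluster_sizes = Counter(pred_labels)
    author_sizes = Counter(true_labels)

    # ACP rewards clusters that contain papers from a single author.
    acp = sum(c * c / cluster_sizes[p] for (p, _), c in overlap.items()) / n
    # AAP rewards authors whose papers are not scattered across clusters.
    aap = sum(c * c / author_sizes[t] for (_, t), c in overlap.items()) / n
    return sqrt(acp * aap)


if __name__ == "__main__":
    truth = ["a", "a", "b", "b"]
    print(k_metric(truth, [0, 0, 1, 1]))  # perfect partition
    print(k_metric(truth, [0, 0, 0, 0]))  # two authors merged into one cluster
```

Under this definition, merging distinct authors lowers ACP while splitting one author across clusters lowers AAP, so the score penalizes both kinds of error.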
Deng Qiping, Chen Weijing, Ji Ling, Zhang Yu’e. Author Name Disambiguation Based on Heterogeneous Information Network[J]. Data Analysis and Knowledge Discovery, 2022, 6(4): 60-68.