Author Name Disambiguation Based on Heterogeneous Information Network
Deng Qiping, Chen Weijing, Ji Ling, Zhang Yu’e
Library of University of Electronic Science and Technology of China, Chengdu 611731, China
Abstract [Objective] This paper aims to improve author name disambiguation by using entity relationship data from academic literature. [Methods] First, we extracted multiple types of nodes and their relationships from the literature to construct a heterogeneous information network (HIN). Then, we applied representation learning to obtain latent vectors for the authors and used clustering analysis to produce a preliminary partition. Finally, we merged clusters based on strong rule matching to obtain the final disambiguation result. [Results] We evaluated the new model on a dataset from the Web of Science. The mean K-Metric value was 0.842, a 63.18% increase over the baseline model. Without strong rule matching, the improvement still reached 34.69%. [Limitations] The proposed model requires citation information, which limits its application scenarios. [Conclusions] The new method effectively improves the performance of author name disambiguation.
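The final merging step can be illustrated with a minimal sketch. The abstract does not say which strong rules the authors used, so the rule below (two papers list the same e-mail address for the ambiguous author) is purely an assumption, and the function name, the input layout, and the sample data are hypothetical. The sketch takes preliminary clusters of paper IDs and merges every pair of clusters linked by the rule, using a union-find structure so that chains of matches collapse into one cluster.

```python
from collections import defaultdict


def merge_clusters_by_strong_rules(clusters, paper_emails):
    """Merge preliminary clusters whose papers share an author e-mail.

    clusters: list of sets of paper ids (output of the clustering step).
    paper_emails: dict mapping paper id -> e-mail of the ambiguous author,
        or None when unknown. Using e-mail as the strong rule is an
        assumption made for illustration only.
    """
    parent = list(range(len(clusters)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            # Keep the smaller index as root so output order is stable.
            parent[max(ri, rj)] = min(ri, rj)

    # Remember the first cluster in which each normalized e-mail appears.
    first_cluster_with_email = {}
    for idx, cluster in enumerate(clusters):
        for paper in cluster:
            email = paper_emails.get(paper)
            if not email:
                continue
            key = email.strip().lower()
            if key in first_cluster_with_email:
                union(idx, first_cluster_with_email[key])
            else:
                first_cluster_with_email[key] = idx

    # Collect the papers of all clusters that ended up under the same root.
    merged = defaultdict(set)
    for idx, cluster in enumerate(clusters):
        merged[find(idx)] |= cluster
    return [merged[root] for root in sorted(merged)]


# Hypothetical data: four preliminary clusters for one ambiguous name.
preliminary = [{"p1", "p2"}, {"p3"}, {"p4", "p5"}, {"p6"}]
emails = {"p1": "a@x.edu", "p3": "A@x.edu", "p4": "b@y.org", "p6": None}
result = merge_clusters_by_strong_rules(preliminary, emails)
print(result)
```

In this example the first two clusters are merged because p1 and p3 share an e-mail address after case normalization, while the other clusters are left unchanged. Papers without a known e-mail never trigger a merge, which keeps the rule conservative.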
Received: 06 August 2021
Published: 12 May 2022
Fund: University of Electronic Science and Technology of China 2021 “Double First-Class” Construction Research Support Program (SYLYJ2021213)
Corresponding Author: Deng Qiping, ORCID: 0000-0001-7078-2026, E-mail: dengqp@uestc.edu.cn