Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (4): 60-68    DOI: 10.11925/infotech.2096-3467.2021.0805
Author Name Disambiguation Based on Heterogeneous Information Network
Deng Qiping, Chen Weijing, Ji Ling, Zhang Yu’e
Library of University of Electronic Science and Technology of China, Chengdu 611731, China
Abstract  

[Objective] This paper aims to improve author name disambiguation using entity relationship data from academic literature. [Methods] First, we extracted multiple types of nodes and their relationships from the literature to construct a heterogeneous information network (HIN). Then, we applied representation learning to obtain latent vectors for authors and used clustering analysis to obtain a preliminary partition. Finally, we merged clusters based on strong rule matching to obtain the disambiguation result. [Results] We evaluated the model on a dataset from the Web of Science. The mean K-Metric was 0.842, a 63.18% increase over the baseline model. Without strong rule matching, the improvement still reached 34.69%. [Limitations] The proposed model requires citation information, which limits its application scenarios. [Conclusions] The proposed method effectively improves the performance of author name disambiguation.
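
The final step in the abstract, merging preliminary clusters through strong rule matching, can be illustrated with a small union-find sketch. This is only a sketch under stated assumptions: the abstract does not say which rules are used, so the `same_email` rule, the record fields, and the function names below are hypothetical.

```python
from itertools import combinations


def merge_clusters(clusters, strong_match):
    """Merge clusters whenever any pair of their records satisfies strong_match."""
    parent = list(range(len(clusters)))

    def find(i):
        # Follow parent links to the representative, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in combinations(range(len(clusters)), 2):
        if find(a) == find(b):
            continue  # already in the same merged group
        if any(strong_match(p, q) for p in clusters[a] for q in clusters[b]):
            parent[find(b)] = find(a)

    merged = {}
    for idx, cluster in enumerate(clusters):
        merged.setdefault(find(idx), []).extend(cluster)
    return list(merged.values())


def same_email(p, q):
    # Hypothetical strong rule: both records carry the same non-empty e-mail.
    return bool(p.get("email")) and p.get("email") == q.get("email")


# Made-up records standing in for papers that share one ambiguous author name.
papers = [
    {"id": "p1", "email": "a@x.edu"},
    {"id": "p2", "email": None},
    {"id": "p3", "email": "a@x.edu"},
    {"id": "p4", "email": "b@y.org"},
]
preliminary = [[papers[0], papers[1]], [papers[2]], [papers[3]]]
result = merge_clusters(preliminary, same_email)
merged_ids = [sorted(p["id"] for p in c) for c in result]
print(merged_ids)
```

Union-find keeps the merge transitive: if cluster A matches B and B matches C, all three end up in one group even when A and C share no rule directly.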

Key words: Author Name Disambiguation; Relational Data; Heterogeneous Information Network; Network Representation Learning
Received: 06 August 2021      Published: 12 May 2022
ZTFLH:  TP391  
Fund: University of Electronic Science and Technology of China 2021 “Double First-Class” Construction Research Support Program (SYLYJ2021213)
Corresponding Author: Deng Qiping, ORCID: 0000-0001-7078-2026, E-mail: dengqp@uestc.edu.cn

Cite this article:

Deng Qiping, Chen Weijing, Ji Ling, Zhang Yu’e. Author Name Disambiguation Based on Heterogeneous Information Network. Data Analysis and Knowledge Discovery, 2022, 6(4): 60-68.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2021.0805     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2022/V6/I4/60

An Example of Heterogeneous Document Information Network
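
As a rough illustration of what a heterogeneous document information network might look like in code, the sketch below stores typed nodes as `(node_type, value)` tuples and links each paper to its related entities. The node types chosen here (co-author, venue, reference) and all record values are assumptions for illustration; the page does not list the node and edge types the authors actually used.

```python
from collections import defaultdict


def build_hin(records):
    """Build an undirected typed graph as adjacency sets keyed by (node_type, value)."""
    graph = defaultdict(set)

    def link(u, v):
        graph[u].add(v)
        graph[v].add(u)

    for rec in records:
        paper = ("paper", rec["id"])
        for name in rec.get("coauthors", []):
            link(paper, ("coauthor", name))
        if rec.get("venue"):
            link(paper, ("venue", rec["venue"]))
        for ref in rec.get("references", []):
            link(paper, ("reference", ref))
    return graph


def shared_neighbors(graph, paper_a, paper_b):
    """Typed nodes adjacent to both papers; a simple signal of relatedness."""
    return graph[("paper", paper_a)] & graph[("paper", paper_b)]


# Made-up bibliographic records (hypothetical field names and values).
records = [
    {"id": "p1", "coauthors": ["Coauthor A"], "venue": "Venue X", "references": ["r1", "r2"]},
    {"id": "p2", "coauthors": ["Coauthor A"], "venue": "Venue Y", "references": ["r2"]},
    {"id": "p3", "coauthors": ["Coauthor B"], "venue": "Venue X", "references": []},
]
hin = build_hin(records)
common = shared_neighbors(hin, "p1", "p2")
print(sorted(common))
```

Papers p1 and p2 are connected through a shared co-author and a shared reference, which is the kind of multi-type relational evidence a representation learning step could then encode into vectors.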
Framework of Author Name Disambiguation
Author name     Related papers   Real authors
Hongbin Liang   179              10
Guorong Chen    267              14
Qi Hu           142              42
Jian Du         149              45
Xi Huang        233              45
Jia Xu          444              87
Experimental Dataset
Author name     Proposed method          APV method               Non-SFM method
                ACP    AAP    K-Metric   ACP    AAP    K-Metric   ACP    AAP    K-Metric
Hongbin Liang   0.987  0.911  0.948      0.889  0.237  0.459      0.938  0.379  0.596
Guorong Chen    0.918  0.848  0.882      0.911  0.161  0.383      0.919  0.286  0.513
Qi Hu           0.816  0.915  0.864      0.323  0.343  0.333      0.816  0.915  0.864
Jian Du         0.771  0.969  0.864      0.589  0.836  0.701      0.772  0.910  0.838
Xi Huang        0.821  0.817  0.819      0.758  0.534  0.636      0.688  0.728  0.708
Jia Xu          0.769  0.591  0.674      0.730  0.469  0.585      0.787  0.540  0.652
Average         0.847  0.842  0.842      0.700  0.430  0.516      0.820  0.626  0.695
Comparison of Experimental Results
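
The page does not spell out how ACP, AAP, and K-Metric are computed. The sketch below assumes the commonly used definitions: average cluster purity (ACP), average author purity (AAP), and K-Metric as their geometric mean. The function name and the toy labels are hypothetical; the last lines check that one row of the comparison table and the reported relative improvement are consistent with these definitions.

```python
from collections import Counter
from math import sqrt


def k_metric(true_labels, pred_labels):
    """Return (ACP, AAP, K) for one ambiguous name, assuming K = sqrt(ACP * AAP)."""
    n = len(true_labels)
    joint = Counter(zip(pred_labels, true_labels))  # papers per (cluster, author) pair
    cluster_sizes = Counter(pred_labels)
    author_sizes = Counter(true_labels)
    # ACP: how pure each predicted cluster is with respect to real authors.
    acp = sum(c * c / cluster_sizes[p] for (p, _), c in joint.items()) / n
    # AAP: how concentrated each real author's papers are in a single cluster.
    aap = sum(c * c / author_sizes[t] for (_, t), c in joint.items()) / n
    return acp, aap, sqrt(acp * aap)


# Toy example: six papers, three real authors, three predicted clusters.
true_labels = ["A", "A", "A", "B", "B", "C"]
pred_labels = [0, 0, 1, 1, 1, 2]
acp, aap, k = k_metric(true_labels, pred_labels)
print(round(acp, 3), round(aap, 3), round(k, 3))

# Consistency checks against figures reported on this page.
row_k = round(sqrt(0.987 * 0.911), 3)             # Hongbin Liang, proposed method
gain = round((0.842 - 0.516) / 0.516 * 100, 2)    # mean K-Metric vs. the APV baseline
print(row_k, gain)
```

Because K is a geometric mean, a method that over-splits (high ACP, low AAP) or over-merges (low ACP, high AAP) is penalized, which matches the pattern in the APV columns for names such as Hongbin Liang.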
Influence of Embedding Dimensions on the Performance of Different Methods