Disambiguating Author Names with Embedding Heterogeneous Information and Attentive RNN Clustering Parameters
Wang Ruolin 1, Niu Zhendong 1,2, Lin Qika 3, Zhu Yifan 1, Qiu Ping 1, Lu Hao 4, Liu Donglei 1
1 School of Computer, Beijing Institute of Technology, Beijing 100081, China
2 Beijing Institute of Technology Library, Beijing 100081, China
3 School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an 710049, China
4 Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Abstract [Objective] This paper proposes a name disambiguation method for scientific literature that aims to distinguish scholars who share the same name. Existing solutions rely on document feature extraction or on the relationships between documents and co-authors, which loses higher-order attributes. [Methods] First, we established a unified feature extraction framework, the Paper Embedding Network (PaperEmbNet), which combines content and relationships to build an academic heterogeneous information network for each author. Then, we designed a Clustering Parameters Method based on an Attentive Recurrent Neural Network (AR4CPM) to estimate the number of clusters directly. Finally, we used hierarchical agglomerative clustering (HAC) to disambiguate author names, with the predicted number as the preset parameter. [Results] On the AMiner-AND dataset, the macro-F1 score of the proposed model was up to 4.75% higher than that of the second-best model, and its average training time was 5-10 minutes shorter than that of the existing baselines. [Limitations] The proposed method has not yet been evaluated in multilingual environments. [Conclusions] The proposed approach can effectively perform name disambiguation tasks.
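The final step described in the abstract, running HAC with a preset number of clusters, can be illustrated with a minimal sketch. It is not the authors' implementation: the toy two-dimensional vectors stand in for PaperEmbNet embeddings, the hard-coded `predicted_k` stands in for the output of AR4CPM, and the choice of average linkage with Euclidean distance is an assumption, since the abstract does not specify either. The function name `average_linkage_hac` is hypothetical.

```python
import math
from itertools import combinations


def average_linkage_hac(embeddings, n_clusters):
    """Merge the two closest clusters until only n_clusters remain.

    Assumptions (not stated in the abstract): Euclidean distance and
    average linkage. Returns one cluster label per input vector.
    """
    if not 1 <= n_clusters <= len(embeddings):
        raise ValueError("n_clusters must be between 1 and the number of papers")

    # Start with every paper in its own cluster.
    clusters = [[i] for i in range(len(embeddings))]

    def avg_dist(a, b):
        # Mean pairwise distance between members of clusters a and b.
        total = sum(math.dist(embeddings[i], embeddings[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest average distance.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: avg_dist(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected

    labels = [0] * len(embeddings)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy stand-ins for paper embeddings of one ambiguous author name.
paper_vectors = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 5.0), (0.05, 0.05)]
predicted_k = 2  # hypothetical value; in the paper this comes from AR4CPM

labels = average_linkage_hac(paper_vectors, predicted_k)
print(labels)
```

Papers that receive the same label are attributed to the same real-world author. Because the number of clusters is supplied up front, the quality of the result depends directly on how well that number is estimated, which is why the abstract treats its prediction as a separate step.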
Received: 12 March 2021
Published: 15 September 2021
Fund: National Key R&D Program of China (2019YFB1406302)
Corresponding Authors:
Niu Zhendong ORCID:0000-0002-0576-7572
E-mail: zniu@bit.edu.cn