Author Name Disambiguation with Network Embedding
Yu Chuanming1, Zhong Yunci1, Lin Aochen1, An Lu2
1 School of Information and Safety Engineering, Zhongnan University of Economics and Law, Wuhan 430073, China
2 School of Information Management, Wuhan University, Wuhan 430072, China
Abstract [Objective] This paper disambiguates author names in document systems to solve the problem of documents being aggregated under the wrong author. [Methods] First, we constructed three types of networks (author, document, and author-document) from structured document data. Then, we combined different network embedding methods to obtain representations of the document nodes. Finally, we clustered the documents with an unsupervised learning model and hierarchical agglomerative clustering. [Results] We conducted empirical studies on datasets from ArnetMiner, CiteSeerX and DBLP. Our method performed well on sparse networks, and the macro-F1 value increased by 6%. [Limitations] We only explored author name disambiguation in English. [Conclusions] The proposed method can effectively reduce the ambiguity of author names. It is significant for research on scientific collaboration, citation recommendation, and knowledge networks.
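The three steps listed under [Methods] can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy records, the function names, and the clustering threshold are hypothetical; only a document-document network is built; and the adjacency-row "embedding" is a simple stand-in for a learned network embedding method.

```python
from itertools import combinations
from math import sqrt

# Toy records for one ambiguous author name (made-up illustration data):
# each document maps to the set of its co-authors.
docs = {
    "d1": {"A. Smith", "B. Jones"},
    "d2": {"A. Smith", "C. Lee"},
    "d3": {"B. Jones", "C. Lee"},
    "d4": {"X. Wang", "Y. Zhao"},
    "d5": {"X. Wang", "Z. Liu"},
}

def build_doc_network(docs):
    """Document-document network: edge weight = number of shared co-authors."""
    ids = sorted(docs)
    weights = {i: {j: 0 for j in ids} for i in ids}
    for a, b in combinations(ids, 2):
        shared = len(docs[a] & docs[b])
        weights[a][b] = weights[b][a] = shared
    return ids, weights

def embed(ids, weights):
    """Stand-in embedding (assumption): adjacency row plus a self-loop.
    A learned network embedding would replace this step."""
    return {i: [weights[i][j] + (1 if i == j else 0) for j in ids] for i in ids}

def cosine_distance(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm if norm else 1.0

def hac(vectors, threshold):
    """Average-linkage agglomerative clustering; stops once the closest
    pair of clusters is farther apart than `threshold`."""
    clusters = [[i] for i in sorted(vectors)]

    def linkage(c1, c2):
        total = sum(cosine_distance(vectors[a], vectors[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (i, j), dist = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if dist > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]
    return [sorted(c) for c in clusters]

ids, weights = build_doc_network(docs)
clusters = hac(embed(ids, weights), threshold=0.6)
print(clusters)
```

Each resulting cluster is read as the set of documents written by one real-world author. The distance threshold (0.6 here) is an arbitrary choice for the toy data and would need tuning on real data.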
Received: 11 June 2019
Published: 26 April 2020
Corresponding Author: Yu Chuanming
E-mail: yuchuanming2003@126.com