Disambiguation of Chinese Author Names with Multiple Features
Lin Kerou, Wang Hao, Gong Lijuan, Zhang Baolong
School of Information Management, Nanjing University, Nanjing 210023, China
Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China
Abstract [Objective] This paper addresses the problems that Chinese authors sharing the same name cause for document management systems. [Methods] First, we built author entities of the form “author name + institution name” from bibliographic data. Second, we used the attributes of these author entities to construct six similarity features covering three aspects. Third, we merged these features using either principal component analysis or direct weight assignment. Finally, we evaluated the performance of the proposed method. [Results] Our methods significantly reduced processing time. Their F1 values were 70.74% and 70.42% on the LIS dataset and 81.90% and 80.93% on the economics dataset. [Limitations] The attributes used in this research were drawn only from the metadata of the papers. [Conclusions] The proposed method could improve the weight setting of multiple features.
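The feature-merging step in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the sample similarity scores, and the choice to derive PCA weights from the absolute loadings of the first principal component are all assumptions made for illustration, since the abstract does not specify these details.

```python
import numpy as np

def direct_weight_merge(features, weights):
    """Merge feature columns using hand-assigned weights (normalized to sum to 1)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return np.asarray(features, dtype=float) @ w

def pca_weight_merge(features):
    """Merge feature columns using weights taken from the first principal component.

    Assumption: weights are the absolute loadings of the first component,
    normalized to sum to 1. The paper may derive its weights differently.
    """
    x = np.asarray(features, dtype=float)
    centered = x - x.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    w = np.abs(vt[0])
    w = w / w.sum()
    return x @ w, w

# Hypothetical similarity scores for four pairs of author entities;
# each column stands for one of the six similarity features.
pairs = np.array([
    [0.9, 0.8, 0.7, 0.9, 0.6, 0.8],
    [0.1, 0.2, 0.0, 0.1, 0.3, 0.2],
    [0.8, 0.9, 0.6, 0.7, 0.7, 0.9],
    [0.2, 0.1, 0.1, 0.0, 0.2, 0.1],
])

direct_scores = direct_weight_merge(pairs, [1, 1, 1, 1, 1, 1])
pca_scores, pca_weights = pca_weight_merge(pairs)
```

Either merged score could then be compared against a threshold, or passed to a clustering step, to decide whether two author entities refer to the same person.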
Received: 08 June 2020
Published: 10 October 2020
Fund: “Six Talent Peaks” Project in Jiangsu Province (JY-001); Jiangsu Young Talents in Social Sciences, the Tang Scholar of Nanjing University
Corresponding Author: Wang Hao
E-mail: ywhaowang@nju.edu.cn