Data Analysis and Knowledge Discovery  2019, Vol. 3 Issue (1): 27-37    DOI: 10.11925/infotech.2096-3467.2018.1363
Constructing Name Authority for Research Entities
Jianyong Zhang1,2, Li Qian1,2, Qianqian Yu1, Zhipeng Dong1, Yongwen Huang3, Jianhua Liu4, Shu Guo5, Feng Wang6
1National Science Library, Chinese Academy of Sciences, Beijing 100190, China
2Department of Library, Information and Archives Management, University of Chinese Academy of Sciences, Beijing 100190, China
3Institute of Agricultural Information, Chinese Academy of Agricultural Sciences, Beijing 100081, China
4Library of Shanghai Tech University, Shanghai 201210, China
5National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China
6Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Abstract  

[Objective] This paper aims to construct name authority for authors, institutions, journals, and funding. [Methods] First, we loaded, cleansed, transformed, integrated and merged names from multiple sources to create uniform structured data with unique identifiers. Then, we used the metadata model for name authority to extract research entities and the relationships among them. Finally, we proposed disambiguation strategies for the different research entities, based on algorithms such as Levenshtein distance, Jaccard similarity, word2vec and CNN. [Results] Our study created name authority databases for authors (23 million records), institutions (2.6 million records), journals (30,000 records), and funding (2 million records). We chose six institutions' names from NSTL and compared them with those from InCites; the average precision reached 86.8%. [Limitations] The proposed disambiguation strategies and algorithms need further refinement to handle the diverse expressions of the selected disambiguation features. Data from different sources also need to be analyzed in order to apply appropriate algorithms. [Conclusions] The proposed method and disambiguation strategies could improve the performance and comprehensiveness of databases for name authority.
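
As an illustration of the string-similarity measures named in the abstract, the following is a minimal sketch of how Levenshtein distance and Jaccard similarity might be combined to decide whether two institution name variants refer to the same entity. It is not the authors' implementation: the function names, the lowercase and whitespace tokenization, the rule of taking the larger of the two scores, and the 0.8 threshold are all assumptions made for this example. The word2vec and CNN components are not shown.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1], where 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercase, whitespace-separated tokens."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def same_entity(a: str, b: str, threshold: float = 0.8) -> bool:
    """Hypothetical decision rule (an assumption for this sketch):
    two names match if either similarity reaches the threshold."""
    a, b = a.lower().strip(), b.lower().strip()
    return max(levenshtein_similarity(a, b), jaccard(a, b)) >= threshold


if __name__ == "__main__":
    print(same_entity("Chinese Academy of Sciences", "Chinese Acad of Sciences"))
    print(same_entity("Institute of Automation",
                      "Institute of Agricultural Information"))
```

Character-level edit distance tolerates abbreviations and typos, while token-level Jaccard tolerates reordered words, which is one plausible reason to use both; in practice the threshold would have to be tuned against labeled name pairs.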

Key words: Name Authority; Journal Authority; Institution Authority; Fund Authority; Author Authority
Received: 03 December 2018      Published: 04 March 2019

Cite this article:

Jianyong Zhang, Li Qian, Qianqian Yu, Zhipeng Dong, Yongwen Huang, Jianhua Liu, Shu Guo, Feng Wang. Constructing Name Authority for Research Entities. Data Analysis and Knowledge Discovery, 2019, 3(1): 27-37.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2018.1363     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2019/V3/I1/27
