New Technology of Library and Information Service  2013, Vol. 29 Issue (7/8): 69-74    DOI: 10.11925/infotech.1003-3513.2013.07-08.10
Research on Author Name Disambiguation Algorithm in the Literature Database
Guo Shu1,2
1. National Science Library, Chinese Academy of Sciences, Beijing 100190, China;
2. University of Chinese Academy of Sciences, Beijing 100049, China
Abstract  This paper first analyzes GHOST, a graphical framework for name disambiguation, and then proposes a modified name disambiguation algorithm that combines it with text mining of literature information. The new algorithm is better suited to literature databases and compensates for the limitations of GHOST. Using the title and publication name as the features computed from the literature information, the experiment shows that the algorithm achieves high precision and recall, with F1 reaching 84%, which is adequate for name disambiguation.
Key words: Author name disambiguation; GHOST; Text mining; Disambiguation algorithm
Received: 22 May 2013      Published: 02 September 2013
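
The abstract names title and publication name as the features used for text mining but does not give the similarity computation itself. As a rough illustration only, the sketch below compares records that share an ambiguous author name by TF-IDF cosine similarity over the combined title and publication name. The tokenizer, the smoothed IDF weighting, the helper names, and the sample records are all assumptions made for this example; they are not taken from the paper and are not the GHOST algorithm.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def tfidf_vectors(documents):
    """Build one sparse TF-IDF vector (a dict) per token list."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in documents:
        counts = Counter(tokens)
        # Smoothed IDF (an assumption of this sketch): a term that occurs
        # in every record still keeps a small positive weight.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Made-up records for one ambiguous author name (illustrative only).
records = [
    {"title": "Graph-based name disambiguation",
     "venue": "Journal of Data and Information Quality"},
    {"title": "Name disambiguation in author citations",
     "venue": "Joint Conference on Digital Libraries"},
    {"title": "Sentiment analysis of stock forums",
     "venue": "Finance Review"},
]

# Each record is represented by its title and publication name together.
docs = [tokenize(r["title"] + " " + r["venue"]) for r in records]
vectors = tfidf_vectors(docs)

sim_01 = cosine(vectors[0], vectors[1])  # two records on a related topic
sim_02 = cosine(vectors[0], vectors[2])  # records on unrelated topics
print(f"record 0 vs 1: {sim_01:.3f}")
print(f"record 0 vs 2: {sim_02:.3f}")
```

In a setup like this, record pairs whose similarity exceeds some threshold could be treated as more likely to belong to the same person; the threshold, and how such a score would be combined with a graph-based method, are left open here because the abstract does not specify them.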



Cite this article:

Guo Shu. Research on Author Name Disambiguation Algorithm in the Literature Database. New Technology of Library and Information Service, 2013, 29(7/8): 69-74.

