New Technology of Library and Information Service, 2013, Vol. 29, Issue 7/8: 69-74. DOI: 10.11925/infotech.1003-3513.2013.07-08.10
Knowledge Organization and Knowledge Management
Research on Author Name Disambiguation Algorithm in the Literature Database
Guo Shu 1,2
1. National Science Library, Chinese Academy of Sciences, Beijing 100190, China;
2. University of Chinese Academy of Sciences, Beijing 100049, China
Abstract: Building on an in-depth analysis of GHOST, a graph-based framework for name disambiguation, this paper addresses the framework's limitations and, by incorporating text mining of bibliographic information, proposes an author name disambiguation algorithm better suited to literature databases. An empirical study using two features, the title and the publication name, shows that the algorithm performs well on precision, recall, and related metrics, with an average F1 of 84%, indicating good disambiguation performance.
Key words: Author name disambiguation; GHOST; Text mining; Disambiguation algorithm
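The abstract reports precision, recall, and F1 but does not say how they are computed. The sketch below shows one common convention for clustering-style disambiguation, pairwise scoring, in which every pair of records placed in the same cluster is compared against the ground truth. The function names and the sample labels are hypothetical, and the paper itself may use a different evaluation protocol.

```python
from itertools import combinations


def same_cluster_pairs(labels):
    """Return the set of record-index pairs that share a label."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }


def pairwise_prf(predicted, truth):
    """Pairwise precision, recall and F1 for one ambiguous name.

    predicted and truth are equal-length lists; position k holds the
    cluster label (any hashable value) assigned to record k.
    """
    if len(predicted) != len(truth):
        raise ValueError("label lists must have the same length")
    pred_pairs = same_cluster_pairs(predicted)
    true_pairs = same_cluster_pairs(truth)
    correct = len(pred_pairs & true_pairs)
    # Guard against empty pair sets (e.g. every record in its own cluster).
    precision = correct / len(pred_pairs) if pred_pairs else 0.0
    recall = correct / len(true_pairs) if true_pairs else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical data: five papers signed with the same author name.
truth = ["a", "a", "a", "b", "b"]   # real authors (assumed ground truth)
predicted = [0, 0, 1, 1, 1]         # clusters produced by some algorithm
precision, recall, f1 = pairwise_prf(predicted, truth)
```

An average F1 such as the one quoted in the abstract would then be the mean of the per-name F1 values over all ambiguous names in the test set, assuming the average is taken per name.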
Received: 2013-05-22
CLC number: TP391
Corresponding author: Guo Shu     E-mail: guoshu@mail.las.ac.cn
Cite this article:
Guo Shu. Research on Author Name Disambiguation Algorithm in the Literature Database[J]. New Technology of Library and Information Service, 2013, 29(7/8): 69-74. DOI: 10.11925/infotech.1003-3513.2013.07-08.10.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2013.07-08.10
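Copyright © 2015 Editorial Office of Data Analysis and Knowledge Discovery
Address: 33 Beisihuan Xilu, Zhongguancun, Haidian District, Beijing; Postal code: 100190
Tel/Fax: (010)82626611-6626, 82624938
E-mail: jishu@mail.las.ac.cn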