New Technology of Library and Information Service  2015, Vol. 31 Issue (6): 57-63    DOI: 10.11925/infotech.1003-3513.2015.06.09
Research Paper
Research on Rule-based Normalization of Institution Name
Yang Bo, Yang Junwei, Yan Sulan
College of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095, China
Abstract

[Objective] To improve data reliability in large-scale academic assessment and to overcome the shortcomings of similarity-matching and frequency-based techniques in institution name normalization. [Methods] A new rule-based mapping algorithm that takes low word-surface similarity into account is proposed. It applies rules and statistical methods jointly to map multiple institution names onto a single institution entity, thereby normalizing the names. [Results] The experimental results show that the rule-based algorithm achieves an average F-value of 55.50%, higher than that of the other two strategies. [Limitations] The ability to identify institution names with low word-surface similarity is still inadequate. [Conclusions] The proposed rule-based algorithm performs better overall than the other two strategies in institution name normalization, but its recall needs to be improved.

Key words: Normalization of institution name; Author name disambiguation; Clustering of institution name; Academic assessment
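As a rough illustration of what mapping several name variants onto one institution entity can look like, here is a minimal Python sketch. The abbreviation table, the stopword list, and the function names (`normalize`, `group_variants`, `f_value`) are assumptions made for this example only; they are not the rules or statistics used in the paper. The `f_value` helper assumes the usual balanced F-measure (harmonic mean of precision and recall), which the abstract does not spell out.

```python
import re
from collections import defaultdict

# Hypothetical rule tables for illustration; NOT the rule set from the paper.
ABBREVIATIONS = {
    "univ": "university",
    "coll": "college",
    "inst": "institute",
    "agr": "agricultural",
    "sci": "science",
    "technol": "technology",
}
STOPWORDS = {"of", "the", "and", "for", "at"}


def normalize(name: str) -> str:
    """Reduce an institution name to a canonical key using simple rules."""
    # Rule 1: lowercase and keep alphabetic tokens only (drops punctuation).
    tokens = re.findall(r"[a-z]+", name.lower())
    # Rule 2: expand known abbreviations.
    expanded = [ABBREVIATIONS.get(t, t) for t in tokens]
    # Rule 3: drop function words, then sort so word order does not matter.
    kept = [t for t in expanded if t not in STOPWORDS]
    return " ".join(sorted(kept))


def group_variants(names):
    """Group raw name strings that share the same canonical key."""
    groups = defaultdict(list)
    for raw in names:
        groups[normalize(raw)].append(raw)
    return dict(groups)


def f_value(precision: float, recall: float) -> float:
    """Balanced F-measure; returns 0.0 when both inputs are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    variants = [
        "Nanjing Agricultural University",
        "Nanjing Agr. Univ.",
        "Univ. of Nanjing Agr",
    ]
    for key, members in group_variants(variants).items():
        print(key, "<-", members)
```

Sorting tokens makes the key insensitive to word order, which helps with inverted forms but can also merge distinct institutions whose names share the same words. Names with little surface overlap (for example, a former name or an acronym) would not be caught by rules like these, which matches the limitation noted in the abstract.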
Received: 2014-11-21
CLC number: G312
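Funding:

This work is one of the research outcomes of the National Social Science Fund of China project "Research on Topic Salience of the Academic Web Based on Community Detection" (Grant No. 13CTQ031).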

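Corresponding author: Yang Bo, ORCID: 0000-0003-1903-6292, E-mail: boyang@njau.edu.cn
Author contributions: Yang Bo: proposed the research idea, designed the algorithm, processed the data, and revised the final version of the paper; Yang Junwei: designed part of the algorithm and processed the data; Yang Bo and Yang Junwei: drafted the paper; Yan Sulan: evaluated the algorithm.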
Cite this article:
Yang Bo, Yang Junwei, Yan Sulan. Research on Rule-based Normalization of Institution Name [J]. New Technology of Library and Information Service, 2015, 31(6): 57-63. DOI: 10.11925/infotech.1003-3513.2015.06.09.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2015.06.09

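Copyright © 2015 Editorial Office of Data Analysis and Knowledge Discovery
Address: 33 Beisihuan Xilu, Zhongguancun, Haidian District, Beijing 100190, China
Tel/Fax: (010)82626611-6626, 82624938
E-mail: jishu@mail.las.ac.cn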