New Technology of Library and Information Service  2015, Vol. 31 Issue (6): 57-63    DOI: 10.11925/infotech.1003-3513.2015.06.09
Research Paper
Research on Rule-based Normalization of Institution Name
Yang Bo, Yang Junwei, Yan Sulan
College of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095, China

[Objective] To improve data reliability in science and technology evaluation based on massive data, and to overcome the shortcomings of similarity-matching and frequency-statistics methods in institution name normalization. [Methods] An institution name mapping algorithm aimed at names with low word-level similarity is proposed. It combines rules with statistical methods to map multiple institution names onto a single institution entity, thereby normalizing the names. [Results] Experiments show that the rule-based algorithm achieves an average F-value of 55.50%, higher than the other two strategies. [Limitations] Identification of institution names with low word-level similarity remains inadequate. [Conclusions] The proposed rule-based algorithm outperforms the other two strategies overall in institution name normalization, but its recall still needs improvement.

Key words: Normalization of institution name; Author name disambiguation; Clustering of institution name; Academic assessment
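
The abstract does not list the rules themselves, so the following is only a minimal sketch of what rule-based mapping of name variants onto one institution entity could look like, not the authors' algorithm. The abbreviation table, the `TOP_LEVEL_WORDS` set, and the function names `normalize`, `group_by_entity`, and `f_value` are all illustrative assumptions, and the statistical component mentioned in the abstract is not modeled. The F-value is computed here as the usual harmonic mean of precision and recall, which is assumed rather than confirmed to be the paper's definition.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation rules; the paper's actual rule set is not given here.
ABBREVIATIONS = {
    "univ": "university",
    "agr": "agricultural",
    "agric": "agricultural",
    "coll": "college",
    "sci": "science",
    "technol": "technology",
}

# Assumed markers of a top-level institution (as opposed to a college or department).
TOP_LEVEL_WORDS = {"university", "academy"}


def expand(segment):
    """Lowercase a comma-separated address segment, keep letters only, expand abbreviations."""
    tokens = re.findall(r"[a-z]+", segment.lower())
    return [ABBREVIATIONS.get(token, token) for token in tokens]


def normalize(raw_name):
    """Return a canonical key for the top-level institution in an address string."""
    segments = [expand(part) for part in raw_name.split(",")]
    segments = [tokens for tokens in segments if tokens]
    # Rule 1: prefer the first segment that names a top-level institution.
    for tokens in segments:
        if TOP_LEVEL_WORDS & set(tokens):
            return " ".join(tokens)
    # Rule 2: otherwise fall back to the first non-empty segment.
    return " ".join(segments[0]) if segments else ""


def group_by_entity(names):
    """Group raw name variants under their canonical key."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize(name)].append(name)
    return dict(groups)


def f_value(precision, recall):
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    variants = [
        "Coll Informat Sci & Technol, Nanjing Agr Univ, Nanjing 210095, China",
        "College of Information Science and Technology, "
        "Nanjing Agricultural University, Nanjing 210095, China",
    ]
    for key, members in group_by_entity(variants).items():
        print(key, len(members))
```

Under these assumed rules, both address variants above map to the single key `nanjing agricultural university`. Names that share few surface words with their canonical form would not be caught by rules this simple, which is the kind of case the abstract flags as a limitation.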
Received: 2014-11-21
CLC number: G312


Corresponding author: Yang Bo, ORCID: 0000-0003-1903-6292
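Author contributions: Yang Bo: proposed the research idea, designed the algorithm, processed the data, and revised the final version of the paper; Yang Junwei: designed part of the algorithm and processed the data; Yang Bo and Yang Junwei: drafted the paper; Yan Sulan: evaluated the algorithm.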
Yang Bo, Yang Junwei, Yan Sulan. Research on Rule-based Normalization of Institution Name[J]. New Technology of Library and Information Service, 2015, 31(6): 57-63. DOI: 10.11925/infotech.1003-3513.2015.06.09.

