New Technology of Library and Information Service, 2015, Vol. 31, Issue (6): 57-63. DOI: 10.11925/infotech.1003-3513.2015.06.09
Research on Rule-based Normalization of Institution Name
Yang Bo, Yang Junwei, Yan Sulan
College of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095, China

[Objective] To improve data reliability in large-scale academic assessment, and to improve on the performance of word-similarity-based and frequency-based techniques for institution name normalization. [Methods] A new rule-based algorithm aided by low-value word similarity is proposed. A series of rules and statistical methods is applied jointly to map multiple variant institution names onto a single institution entity, thereby normalizing institution names. [Results] Experimental results show that the F-value of the rule-based algorithm (55.50%) is higher than those of the other two strategies. [Limitations] The algorithm is still weak at identifying institution names whose word similarity is low. [Conclusions] The proposed rule-based algorithm performs better overall than the other two techniques, but its recall needs to be improved.
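The abstract does not spell out the individual rules, so the following is only a minimal sketch of the general idea it describes, under stated assumptions: rules first reduce each raw affiliation string to a canonical key, word similarity serves only as a fallback, and results are scored with a pairwise F-value. The abbreviation table, the "first segment containing an institution keyword" rule, the Jaccard word-set similarity, the 0.8 threshold and the pairwise scoring are all hypothetical choices for illustration, not the authors' actual rule set or evaluation protocol.

```python
import re
from itertools import combinations

# Hypothetical rule tables; the paper's actual rules are not reproduced here.
ABBREVIATIONS = {"univ": "university", "agr": "agricultural",
                 "inst": "institute", "acad": "academy"}
STOPWORDS = {"the", "of", "and"}
INSTITUTION_KEYWORDS = {"university", "institute", "academy", "hospital"}


def tokens(text):
    """Lower-case, split on non-letters, drop stopwords, expand abbreviations."""
    words = re.findall(r"[a-z]+", text.lower())
    return [ABBREVIATIONS.get(w, w) for w in words if w not in STOPWORDS]


def canonical_key(name):
    """Rule-based key: the first comma-separated segment naming an institution."""
    segments = name.split(",")
    for seg in segments:
        toks = tokens(seg)
        if INSTITUTION_KEYWORDS.intersection(toks):
            return " ".join(toks)
    return " ".join(tokens(segments[0]))


def jaccard(a, b):
    """Word-set similarity, used only as a fallback when the keys differ."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def normalize(names, threshold=0.8):
    """Assign each raw name a cluster id via union-find over name indices."""
    parent = list(range(len(names)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    keys = [canonical_key(n) for n in names]
    for i, j in combinations(range(len(names)), 2):
        if keys[i] == keys[j] or jaccard(keys[i], keys[j]) >= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(names))]


def pairwise_f1(pred, gold):
    """Pairwise precision, recall and F-value of predicted vs. gold clusters."""
    pairs = list(combinations(range(len(pred)), 2))
    p = {(i, j) for i, j in pairs if pred[i] == pred[j]}
    g = {(i, j) for i, j in pairs if gold[i] == gold[j]}
    tp = len(p & g)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    f = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f


names = [
    "Nanjing Agricultural University, College of Information Science",
    "Coll Informat Sci & Technol, Nanjing Agr Univ",
    "Nanjing Univ, Sch Informat Management",
    "Nanjing University",
]
gold = [0, 0, 1, 1]
print(pairwise_f1(normalize(names), gold))
```

In this toy example the rules alone map the first two strings to "nanjing agricultural university" and the last two to "nanjing university", giving a perfect pairwise score. Lowering the hypothetical threshold to 0.6 lets the fallback merge the two distinct universities (their word-set similarity is 2/3), so recall stays at 1.0 while precision falls to 1/3. This illustrates why, in a sketch like this, word similarity is kept as a low-weight aid rather than the primary matching criterion.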

Key words: Normalization of institution name; Author name disambiguation; Clustering of institution name; Academic assessment
Received: 21 November 2014      Published: 08 July 2015
CLC Number: G312

Cite this article:

Yang Bo, Yang Junwei, Yan Sulan. Research on Rule-based Normalization of Institution Name. New Technology of Library and Information Service, 2015, 31(6): 57-63.


