[Objective] To find synonymous relations for knowledge organization system integration. [Methods] This paper presents an automatic algorithm that combines lemmatization and semantic merging, together with several methods for controlling the effects of vocabulary granularity. [Results] Large-scale tests on many source vocabularies, with results compared against a well-known integrated knowledge organization system, demonstrate the algorithm's efficiency and effectiveness. [Conclusions] The proposed algorithm can be applied to large-scale knowledge organization system integration and is also helpful for Chinese knowledge organization system integration.
Li Xiaoying, Li Danya, Qian Qing, Sun Haixia, Li Junlian, Hu Tiejun. Research on Automatic Algorithm of Finding English Synonymous Relations for Knowledge Organization System Integration[J]. New Technology of Library and Information Service, 2014, 30(5): 26-32.
[1] Lu Yong. Automatic Recognition of Chinese Synonyms for Information Retrieval[M]. Nanjing: Southeast University Press, 2009: 14-17. (in Chinese)
[2] Doan A, Madhavan J, Domingos P, et al. Learning to Map between Ontologies on the Semantic Web[C]. In: Proceedings of the 11th International Conference on World Wide Web (WWW'02), Hawaii, USA. New York: ACM, 2002: 662-673.
[3] Stoilos G, Stamou G, Kollias S. A String Metric for Ontology Alignment[C]. In: Proceedings of the 4th International Conference on the Semantic Web (ISWC'05). Berlin, Heidelberg: Springer-Verlag, 2005: 624-637.
[4] Ehrig M, Staab S. QOM - Quick Ontology Mapping[C]. In: Proceedings of the 3rd International Semantic Web Conference (ISWC'04), Hiroshima, Japan. 2004: 683-697.
[5] Huang K, Geller J, Halper M, et al. Using WordNet Synonym Substitution to Enhance UMLS Source Integration[J]. Artificial Intelligence in Medicine, 2009, 46(2): 97-109.
[6] Mougin F, Burgun A, Bodenreider O. Using WordNet to Improve the Mapping of Data Elements to UMLS for Data Sources Integration[C]. In: Proceedings of AMIA Annual Symposium, 2006: 574-578.
[7] National Library of Medicine. MeSH Browser[EB/OL]. [2013-09-10]. http://www.nlm.nih.gov/mesh/MBrowser.html.
[8] U.S. National Library of Medicine. SNOMED Clinical Terms[EB/OL]. [2012-05-12]. http://www.nlm.nih.gov/research/umls/Snomed/snomed_main.html.
[9] Wu Sizhu, Qian Qing, Hu Tiejun, et al. Comparative Analysis of Methods and Tools for Word Stemming[J]. Library and Information Service, 2012, 56(15): 109-115, 142. (in Chinese)
[10] Li Xiaoying, Li Danya, Hu Tiejun. Investigation of Algorithm for Lemmatisation Based on UMLS Specialist Lexicon and Lexical Tools[J]. Information Science, 2013, 31(4): 134-138. (in Chinese)
[11] Wu Sizhu, Qian Qing, Hu Tiejun, et al. Contrast Analysis of Methods and Tools for Lemmatization[J]. New Technology of Library and Information Service, 2012(3): 27-34. (in Chinese)
[12] Wu Sizhu, Qian Qing, Li Danya, et al. Evaluation the Effects of 3 Lemmatization Tools on the Field Specialized Vocabulary[J]. Information Studies: Theory & Application, 2013, 36(5): 111-115. (in Chinese)
[13] NUIT. MorphAdorner V 2.0[EB/OL]. [2013-08-07]. http://morphadorner.northwestern.edu/morphadorner/.
[14] The Stanford Natural Language Processing Group. Stanford CoreNLP[EB/OL]. [2013-11-12]. http://nlp.stanford.edu/software/corenlp.shtml.
[15] The Lexical Systems Group. Specialist NLP Tools[EB/OL]. [2013-10-17]. http://specialist.nlm.nih.gov/.
[16] The Lexical Systems Group. Specialist Lexicon Growth-Statistics[EB/OL]. [2013-12-10]. http://lexsrv3.nlm.nih.gov/LexSysGroup/Projects/lexicon/current/docs/designDoc/UDF/statistics/index.html.
[17] Unified Medical Language System. The Norm Program[EB/OL]. [2013-04-09]. http://www.nlm.nih.gov/research/umls/new_users/online_learning/LEX_005.html.
[18] Li Xiaoying, Li Danya, Qian Qing, et al. Research on Abbreviation Composition Form and Ambiguity Identification for Medical Knowledge Organization System Integration[J]. Journal of Medical Informatics, 2013, 34(10): 43-46. (in Chinese)
[19] U.S. National Library of Medicine. MedlinePlus[EB/OL]. [2012-10-20]. http://www.nlm.nih.gov/medlineplus/healthtopics.html.
[20] The Digital Anatomist Information System[EB/OL]. [2014-01-04]. http://sig.biostr.washington.edu/projects/da/.
[21] U.S. National Library of Medicine. Unified Medical Language System[EB/OL]. [2013-11-21]. http://www.nlm.nih.gov/research/umls/.
[22] Fung K W, Hole W T, Nelson S J, et al. Integrating SNOMED CT into the UMLS: An Exploration of Different Views of Synonymy and Quality of Editing[J]. Journal of the American Medical Informatics Association, 2005, 12(4): 486-494.
[23] University of Utah. Consumer Health Vocabulary Initiative[EB/OL]. [2014-01-04]. http://consumerhealthvocab.org/.