New Technology of Library and Information Service  2014, Vol. 30 Issue (5): 26-32    DOI: 10.11925/infotech.1003-3513.2014.05.04
Knowledge Organization and Knowledge Management
Research on Automatic Algorithm of Finding English Synonymous Relations for Knowledge Organization System Integration*
Li Xiaoying, Li Danya, Qian Qing, Sun Haixia, Li Junlian, Hu Tiejun
Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing 100020, China
Abstract

[Objective] To study knowledge organization system integration based on the discovery of synonymous relations between terms. [Methods] This paper proposes an algorithm for automatically discovering English synonymous relations. It combines lexical merging based on lemmatization with semantic merging based on the transitivity of synonymous relations and on controlling for the granularity of the source vocabularies. [Results] A large-scale experimental evaluation on domain terms from multiple source vocabularies, together with a multi-indicator comparison against an existing integrated knowledge organization system, yielded a satisfactory merging accuracy and showed the algorithm to be feasible and of practical value. [Conclusions] The algorithm can be applied to the integration of large-scale domain knowledge organization systems and offers some reference for the integration of Chinese knowledge organization systems.

Key words: Knowledge organization system integration; Finding synonymous relations; Lemmatization; Semantic merging; Granularity
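
The abstract names three ingredients: lemmatization-based lexical merging, semantic merging through the transitivity of synonymous relations, and control of source-vocabulary granularity. The sketch below is one possible reading of how they could fit together, not the authors' implementation. Everything in it is an assumption made for illustration: the toy `normalize` function stands in for a real lemmatizer, the record layout `(term, source, concept_id)` and the names `merge_terms` and `SAMPLE` are hypothetical, and the granularity rule (a merge is refused when some source vocabulary assigns the two groups to different concepts) is a guess at what granularity control might mean in practice.

```python
from collections import defaultdict


def normalize(term: str) -> str:
    """Toy stand-in for lemmatization: lowercase and strip a plural 's'."""
    words = []
    for word in term.lower().replace("-", " ").split():
        if len(word) > 3 and word.endswith("s") and not word.endswith(("ss", "us", "is")):
            word = word[:-1]
        words.append(word)
    return " ".join(words)


def merge_terms(records):
    """Group terms into synonym sets.

    records: iterable of (term, source, concept_id) tuples (hypothetical layout).
    Returns a sorted list of sorted term lists.
    """
    parent = {}                   # union-find forest over normalized forms
    ids = {}                      # root -> {source: set of concept ids}
    surface = defaultdict(set)    # normalized form -> original term strings
    by_concept = defaultdict(list)

    def find(key):
        while parent[key] != key:
            parent[key] = parent[parent[key]]  # path halving
            key = parent[key]
        return key

    # Step 1, lexical merging: terms with the same normalized form share a node.
    for term, source, cid in records:
        key = normalize(term)
        if key not in parent:
            parent[key] = key
            ids[key] = defaultdict(set)
        ids[key][source].add(cid)
        surface[key].add(term)
        by_concept[(source, cid)].append(key)

    def compatible(a, b):
        # Assumed granularity rule: every source that knows both groups
        # must place them under at least one common concept.
        shared = ids[a].keys() & ids[b].keys()
        return all(ids[a][s] & ids[b][s] for s in shared)

    # Step 2, semantic merging: synonyms declared by a source are unioned,
    # so synonymy propagates transitively across sources.
    for keys in by_concept.values():
        for other in keys[1:]:
            a, b = find(keys[0]), find(other)
            if a != b and compatible(a, b):
                parent[b] = a
                for s, cids in ids.pop(b).items():
                    ids[a][s] |= cids

    groups = defaultdict(set)
    for key in parent:
        groups[find(key)] |= surface[key]
    return sorted(sorted(group) for group in groups.values())


# Invented sample data: a coarse source lumps two terms together,
# a finer-grained source keeps them apart.
SAMPLE = [
    ("Kidney Diseases", "COARSE", "C1"),
    ("Kidney Failure", "COARSE", "C1"),
    ("kidney disease", "FINE", "F1"),
    ("Nephropathy", "FINE", "F1"),
    ("kidney failure", "FINE", "F2"),
    ("Renal failure", "FINE", "F2"),
    ("nephropathy", "THIRD", "T1"),
    ("Renal disorder", "THIRD", "T1"),
]
```

Calling `merge_terms(SAMPLE)` on this invented data yields two groups. "Renal disorder" joins the kidney disease group only through a chain of lexical and declared links, while the finer-grained source prevents the coarse source from collapsing the disease and failure groups into one.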
Received: 2014-01-02
CLC number: G250
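Funding:

*This work is one of the outcomes of the National Key Technology R&D Program project "Development of a Collaborative Working System and Auxiliary Tools for Scientific and Technological Knowledge Organization Systems" (Grant No. 2011BAH10B02) and of the Fundamental Research Funds for Central Public Welfare Research Institutes project "Research on Automatic Discovery Technology of English Synonymous Relations for Knowledge Organization System Integration" (Grant No. 12R0116).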

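Corresponding author: Li Danya   E-mail: li.danya@imicams.ac.cn
Author contributions: Li Danya: proposed the research idea and designed the research plan; Li Xiaoying: carried out the experiments; Li Xiaoying, Sun Haixia: collected, cleaned, and analyzed the data; Li Xiaoying, Li Danya: drafted the paper; Qian Qing, Li Junlian, Hu Tiejun: revised the final version.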
Cite this article:
Li Xiaoying, Li Danya, Qian Qing, Sun Haixia, Li Junlian, Hu Tiejun. Research on Automatic Algorithm of Finding English Synonymous Relations for Knowledge Organization System Integration*[J]. New Technology of Library and Information Service, 2014, 30(5): 26-32. DOI:10.11925/infotech.1003-3513.2014.05.04.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2014.05.04

