New Technology of Library and Information Service, 2015, Vol. 31, Issue 12: 57-64    DOI: 10.11925/infotech.1003-3513.2015.12.09
Research Paper
Research on Semantic Similarity Estimation Algorithm of Medical Terminology Based on Medical Ontology
Fan Xuexue1, Wang Zhirong1, Xu Wu1, Liang Yin2, Ma Xiaohu3
1 Clinical Medical School, Xuzhou Medical College, Xuzhou 221004, China;
2 School of Computer Science and Technology, Jiangsu Normal University, Xuzhou 221116, China;
3 School of Computer Science and Technology, Soochow University, Suzhou 215006, China

[Objective] This paper proposes a new algorithm that draws on comprehensive medical ontologies to improve the precision of semantic similarity estimation between medical terms. [Methods] Semantic parameters such as concept depth and distance are extracted from the hierarchical structure and semantic relationships of concepts in SNOMED CT and MeSH. These parameters are weighted by concept density to obtain a depth factor and a distance factor, from which a semantic similarity function is constructed to compute term similarity. [Results] The algorithm can compute term similarity in both medical ontologies, and the experimental results show that its scores correlate more closely with manual ratings than those of conventional algorithms. [Limitations] The algorithm depends fairly heavily on the structure of the ontology. [Conclusions] The new algorithm improves the precision of ontology-based semantic similarity estimation for medical terms.
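The abstract does not give the authors' similarity function or define concept density, so the following is only a minimal sketch of the general idea: extract depth and distance parameters from an is-a hierarchy and combine them into one score. It assumes a hypothetical toy hierarchy (`PARENTS`), not real SNOMED CT or MeSH content, and substitutes the depth/path-length measure of Li, Bandar and McLean (2003), exp(-α·l)·tanh(β·h), for the authors' density-weighted depth and distance factors; the concept-density weighting is omitted. All function names and the defaults α = 0.2, β = 0.45 are illustrative assumptions. A Pearson correlation helper shows how computed scores could be compared with manual ratings.

```python
from collections import deque
from math import exp, sqrt, tanh

# Hypothetical toy is-a fragment (child -> parents), for illustration only;
# it is NOT taken from SNOMED CT or MeSH. Multiple parents are allowed.
PARENTS = {
    "disease": [],
    "cardiovascular_disease": ["disease"],
    "infectious_disease": ["disease"],
    "heart_disease": ["cardiovascular_disease"],
    "myocardial_infarction": ["heart_disease"],
    "arrhythmia": ["heart_disease"],
    "endocarditis": ["heart_disease", "infectious_disease"],
}


def depth(c):
    """Edges on the shortest path from the root down to c (root has depth 0)."""
    return 0 if not PARENTS[c] else 1 + min(depth(p) for p in PARENTS[c])


def ancestors(c):
    """c together with every concept that subsumes it."""
    seen, stack = {c}, [c]
    while stack:
        for p in PARENTS[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def distance(c1, c2):
    """Shortest path length in edges between c1 and c2, treating is-a links as undirected."""
    adj = {c: set(ps) for c, ps in PARENTS.items()}
    for c, ps in PARENTS.items():
        for p in ps:
            adj[p].add(c)
    dist, queue = {c1: 0}, deque([c1])
    while queue:  # breadth-first search from c1
        u = queue.popleft()
        if u == c2:
            return dist[u]
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    raise ValueError(f"{c1!r} and {c2!r} are not connected")


def similarity(c1, c2, alpha=0.2, beta=0.45):
    """Li et al. (2003)-style score exp(-alpha * l) * tanh(beta * h), where l is the
    path distance and h the depth of the deepest common subsumer (assumed choice)."""
    common = ancestors(c1) & ancestors(c2)
    h = max(depth(c) for c in common)
    return exp(-alpha * distance(c1, c2)) * tanh(beta * h)


def pearson(xs, ys):
    """Pearson correlation, e.g. between computed scores and manual ratings."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

In this toy hierarchy, `similarity("myocardial_infarction", "arrhythmia")` (siblings under `heart_disease`, path length 2) is larger than `similarity("myocardial_infarction", "cardiovascular_disease")` (same path length, shallower common subsumer), which shows how a depth factor rewards more specific shared ancestors. Comparing a measure against manual scoring would mean passing its scores and the expert ratings for the same term pairs to `pearson`.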

Received: 2015-05-28
CLC number: TP391

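This work is one of the research outcomes of the Jiangsu Province Modern Educational Technology Research Project "Development of an Intelligent Paperless Medical Examination System" (Project No. 19696) and the Xuzhou Medical College Research Project "Research on Medical Terminology Similarity Computation Based on SNOMED CT" (Project No. 2014KJ31).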

Corresponding author: Fan Xuexue, ORCID: 0000-0002-0450-480X, E-mail:
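Author contributions: Fan Xuexue proposed the research idea, designed and implemented the algorithm, and wrote the paper; Wang Zhirong and Xu Wu provided the experimental data and performed data analysis; Liang Yin performed data analysis and revised the paper; Ma Xiaohu revised the paper.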
Fan Xuexue, Wang Zhirong, Xu Wu, Liang Yin, Ma Xiaohu. Research on Semantic Similarity Estimation Algorithm of Medical Terminology Based on Medical Ontology [J]. New Technology of Library and Information Service, 2015, 31(12): 57-64. DOI: 10.11925/infotech.1003-3513.2015.12.09.

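Related articles:
[1] Li Xiaofeng, Ma Jing, Li Chi, Zhu Hengmin. Research on an XGBoost-Based Algorithm for Recognizing E-commerce Product Names [J]. Data Analysis and Knowledge Discovery, 2019, 3(7): 34-41.
[2] You Zhongxi, Hua Weina, Pan Xuelian. Impact of Chinese Word Segmenters on the Matching Between Book Reviews and Sentiment Lexicons [J]. Data Analysis and Knowledge Discovery, 2019, 3(7): 23-33.
[3] Guan Peng, Wang Yuefen, Fu Zhu. Research on an LDA-Based Method for Analyzing Topic Semantic Evolution: A Case Study of the Lithium-Ion Battery Field [J]. Data Analysis and Knowledge Discovery, 2019, 3(7): 61-72.
[4] Hu Jiahui, Fang An, Zhao Wanqing, Yang Chenliu, Ren Huiling. Research on Annotation Methods for Chinese Electronic Medical Records Oriented to Knowledge Discovery [J]. Data Analysis and Knowledge Discovery, 2019, 3(7): 123-132.
[5] Kong Beibei, Xie Jing, Qian Li, Chang Zhijun, Wu Zhenxin. Research on Methods and Tool Development for the Value-Added Enrichment of Scientific and Technological Big Data [J]. Data Analysis and Knowledge Discovery, 2019, 3(7): 113-122.
[6] Ren Haiying, Yu Liting. A Multi-Strategy Word Sense Disambiguation Method Based on Wikipedia [J]. New Technology of Library and Information Service, 2015, 31(11): 18-25.
[7] Du Kun, Liu Huailiang, Guo Lujie. Research on an Improved Feature Weighting Algorithm Combined with Complex Networks [J]. New Technology of Library and Information Service, 2015, 31(11): 26-32.
[8] Ye Chuan, Ma Jing. Research on Topic Discovery Algorithms for Multimedia Microblog Comments [J]. New Technology of Library and Information Service, 2015, 31(11): 51-59.
[9] Xie Xiaqing, Wu Xu. Application and Implementation of Visualization Technology on the "Classic Reading" Web Platform [J]. New Technology of Library and Information Service, 2015, 31(11): 96-103.
[10] He Yu, Lü Xueqiang, Xu Liping. A Chinese Term Extraction Method for the New Energy Vehicle Domain [J]. New Technology of Library and Information Service, 2015, 31(10): 88-94.
[11] Du Siqi, Li Honglian, Lü Xueqiang. Research on the Application of Chinese Chunk Parsing to Product Feature Extraction [J]. New Technology of Library and Information Service, 2015, 31(9): 26-30.
[12] Xu Deshan, Li Hui, Zhang Yunliang. Research on Keyword Link Indexing Methods for Documents [J]. New Technology of Library and Information Service, 2015, 31(9): 31-37.
[13] Dun Wenjie, Sun Yigang, Zhu Xianzhong. Design and Implementation of a Multimedia Document Format for Internet Television [J]. New Technology of Library and Information Service, 2015, 31(9): 82-89.
[14] Chen Shiqin, Li Wenjiang. Application of WebSocket in Library Mobile Information Services [J]. New Technology of Library and Information Service, 2015, 31(9): 90-96.
[15] Tong Guoping, Sun Jianjun. User Behavior Analysis Based on Search Logs [J]. New Technology of Library and Information Service, 2015, 31(7-8): 80-88.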

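Copyright © 2015 Editorial Office of Data Analysis and Knowledge Discovery
Address: No. 33 Beisihuan Xilu, Zhongguancun, Haidian District, Beijing 100190, China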