Research on Semantic Similarity Estimation Algorithm of Medical Terminology Based on Medical Ontology
Fan Xuexue1, Wang Zhirong1, Xu Wu1, Liang Yin2, Ma Xiaohu3
1 Clinical Medical School, Xuzhou Medical College, Xuzhou 221004, China;
2 School of Computer Science and Technology, Jiangsu Normal University, Xuzhou 221116, China;
3 School of Computer Science and Technology, Soochow University, Suzhou 215006, China
[Objective] This paper proposes a new algorithm, built on comprehensive medical ontologies, to improve the precision of semantic similarity estimation for medical terminology. [Methods] Semantic parameters such as depth and distance are extracted from the concept hierarchy and semantic relationships of SNOMED CT and MeSH. A depth factor and a distance factor are then derived, each weighted by concept density, and a semantic similarity function is built from them. [Results] The algorithm applies to both of these distinct medical ontologies, and the experimental results show that its scores correlate more strongly with manual scoring than those of conventional algorithms. [Limitations] The algorithm depends on the hierarchical structure of the ontology. [Conclusions] The new algorithm improves the precision of semantic similarity estimation for medical terminology.
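The abstract does not give the similarity function itself, so the following is only a minimal sketch of the two structural parameters it names, depth and distance, computed on a small is-a hierarchy. The hierarchy, the concept names, and all function names are hypothetical and are not taken from SNOMED CT or MeSH. For illustration, the two parameters are combined with the conventional Wu-Palmer measure, which stands in for (and is not) the density-weighted function proposed in the paper.

```python
from typing import Dict, List, Optional

# Hypothetical toy is-a hierarchy (child -> parent); None marks the root.
PARENT: Dict[str, Optional[str]] = {
    "disease": None,
    "heart disease": "disease",
    "lung disease": "disease",
    "myocardial infarction": "heart disease",
    "angina": "heart disease",
    "pneumonia": "lung disease",
}


def path_to_root(concept: str) -> List[str]:
    """Return the concept followed by its ancestors, ending at the root."""
    path = [concept]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def depth(concept: str) -> int:
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(concept))


def lowest_common_subsumer(a: str, b: str) -> str:
    """Deepest concept that is an ancestor of (or equal to) both a and b."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):  # walks upward from b, so the first hit is the deepest
        if node in ancestors_of_a:
            return node
    raise ValueError("concepts do not share a root")


def distance(a: str, b: str) -> int:
    """Number of is-a edges on the path between a and b through their subsumer."""
    lcs = lowest_common_subsumer(a, b)
    return depth(a) + depth(b) - 2 * depth(lcs)


def wu_palmer(a: str, b: str) -> float:
    """Conventional Wu-Palmer similarity, used here only as a stand-in."""
    lcs = lowest_common_subsumer(a, b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


if __name__ == "__main__":
    for pair in [("myocardial infarction", "angina"),
                 ("myocardial infarction", "pneumonia")]:
        print(pair, distance(*pair), round(wu_palmer(*pair), 3))
```

In this toy hierarchy the two heart conditions are closer to each other and share a deeper subsumer than the heart and lung conditions, so they receive the higher score. A density-weighted variant would additionally need a per-concept density value, which the abstract does not define.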
Fan Xuexue, Wang Zhirong, Xu Wu, Liang Yin, Ma Xiaohu. Research on Semantic Similarity Estimation Algorithm of Medical Terminology Based on Medical Ontology [J]. New Technology of Library and Information Service, 2015, 31(12): 57-64.