New Technology of Library and Information Service  2013, Vol. 29 Issue (1): 15-21    DOI: 10.11925/infotech.1003-3513.2013.01.03
The Study on Out-of-vocabulary Identification of Chinese Biomedical Field Based on Hybrid Method
Sun Haixia1, Li Junlian1, Wu Yingjie1, Wu Suhui2
1. Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing 100020, China;
2. Department of Information Management, Nanjing University, Nanjing 210093, China
Abstract  This paper first briefly reviews research on the automatic identification of out-of-vocabulary words. Then, drawing on the word-length distribution and morphological characteristics of terms in the Chinese biomedical field, it presents a hybrid method for out-of-vocabulary identification in that field. The method is based on N-grams and integrates domain-dictionary-based, filtered-corpus-based, and rule-based techniques. Finally, the authors test the proposed hybrid method on a sample of pharmaceutical journal data from the Chinese BioMedical Literature Database, and the experimental results show good performance.
Key words  Out-of-vocabulary      N-gram      Hybrid method      Biomedical
Received: 17 December 2012      Published: 29 March 2013
CLC Number:  TP393

Cite this article:

Sun Haixia, Li Junlian, Wu Yingjie, Wu Suhui. The Study on Out-of-vocabulary Identification of Chinese Biomedical Field Based on Hybrid Method. New Technology of Library and Information Service, 2013, 29(1): 15-21.

