New Technology of Library and Information Service, 2015, Vol. 31, Issue (4): 41-49. DOI: 10.11925/infotech.1003-3513.2015.04.06
The Study on Out-of-Vocabulary Identification on a Model Based on the Combination of CRFs and Domain Ontology Elements Set
Duan Yufeng¹, Zhu Wenjing², Chen Qiao¹, Liu Wei³, Liu Fenghong⁴
1 Business School, East China Normal University, Shanghai 200241, China;
2 Shanghai Library, Shanghai 200031, China;
3 School of Public Economics and Administration, Shanghai University of Finance and Economics, Shanghai 200433, China;
4 Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
[Objective] Establish a model that improves out-of-vocabulary (OOV) identification and reduces the cost of manual intervention. [Methods] Based on the proposed hypothesis, an out-of-vocabulary identification model is built that combines CRFs with a domain ontology elements set. Using biodiversity texts as samples, the validity of the model is verified by comparing performance across models and by testing the hypothesis. [Results] The experimental results show that the model established in this study has the best identification capability among the compared models. The results confirm the hypothesis and show that the model is reasonable and sound. [Limitations] The tagging accuracy of the model still needs improvement. [Conclusions] The model established in this paper achieves better identification capability while greatly reducing the cost of building a manually annotated training dataset.

Key words: CRFs; Domain Ontology; Out-of-vocabulary identification
Received: 19 September 2014; Published: 21 May 2015
Duan Yufeng, Zhu Wenjing, Chen Qiao, Liu Wei, Liu Fenghong. The Study on Out-of-Vocabulary Identification on a Model Based on the Combination of CRFs and Domain Ontology Elements Set. New Technology of Library and Information Service, 2015, 31(4): 41-49.

