New Technology of Library and Information Service  2014, Vol. 30 Issue (5): 41-49    DOI: 10.11925/infotech.1003-3513.2014.05.06
Research of Ontology Concept Extraction Based on Chinese UGC Sources
Tang Xiaobo, Hu Hua
School of Information Management, Wuhan University, Wuhan 430072, China
[Objective] To extract ontology concepts from Chinese user-generated content (UGC) sources. [Methods] This paper proposes a hybrid ontology extraction method that uses linguistic methods to extract fine-grained words and combine them into candidate concepts, and then uses statistical methods to filter those concepts. To validate the method, the paper builds an ontology extraction model and develops a prototype concept extraction system based on UGC sources. [Results] In concept extraction experiments on UGC, the method outperforms the four other concept extraction methods used for comparison, reaching an accuracy of 68.42% and a recall of 85.35%. [Limitations] The test set is drawn from high-quality UGC sources and part of it was filtered manually, so the corpus is limited in scale. [Conclusions] The proposed concept extraction method and technology are of value for ontology concept extraction from UGC.

Key words: Concept extraction; Speech rules; Seed word; Mutual information; Information entropy
Received: 11 November 2013      Published: 06 June 2014
Tang Xiaobo, Hu Hua. Research of Ontology Concept Extraction Based on Chinese UGC Sources. New Technology of Library and Information Service, 2014, 30(5): 41-49.
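
The abstract and key words name mutual information and information entropy as the statistical signals used to filter candidate concepts, but give no formulas or thresholds. The following is only a minimal sketch of how such scores are commonly computed for a candidate made of two adjacent words in an already segmented corpus; it is not the authors' implementation. The function names `pmi` and `right_entropy`, the choice of base-2 logarithms, and the toy token list are all assumptions made for illustration.

```python
import math
from collections import Counter


def pmi(tokens, a, b):
    """Pointwise mutual information (bits) of the adjacent pair (a, b).

    Hypothetical helper: a high value suggests a and b co-occur more
    often than chance, i.e. they may form one concept.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    pair_count = bigrams[(a, b)]
    if pair_count == 0:
        return float("-inf")  # the pair never occurs
    p_ab = pair_count / (n - 1)  # there are n - 1 adjacent pairs
    p_a = unigrams[a] / n
    p_b = unigrams[b] / n
    return math.log2(p_ab / (p_a * p_b))


def right_entropy(tokens, a, b):
    """Shannon entropy (bits) of the tokens that follow the pair (a, b).

    Hypothetical helper: a high value means the pair appears in varied
    right contexts, a common sign of a complete, free-standing term.
    """
    followers = Counter(
        tokens[i + 2]
        for i in range(len(tokens) - 2)
        if tokens[i] == a and tokens[i + 1] == b
    )
    total = sum(followers.values())
    if total == 0:
        return 0.0  # the pair is never followed by another token
    return -sum((c / total) * math.log2(c / total) for c in followers.values())


# Toy segmented text (an assumption, not data from the paper).
corpus = ["本体", "概念", "抽取", "本体", "概念", "过滤"]
score = pmi(corpus, "本体", "概念")
spread = right_entropy(corpus, "本体", "概念")
```

A candidate would typically be kept only if both scores exceed chosen thresholds; the thresholds, and whether a left-context entropy is also used, would have to come from the full paper.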

