[Objective] To extract ontology concepts from Chinese UGC (user-generated content) information sources. [Methods] This paper proposes a hybrid ontology concept extraction method that extracts fine-grained words and combines them into candidate concepts using linguistic methods, then filters the candidates using statistical methods. To validate the method, the paper builds an ontology extraction model and develops a prototype concept extraction system based on UGC sources. [Results] In concept extraction experiments on UGC, the method outperforms the four other concept extraction methods used for comparison, reaching an accuracy rate of 68.42% and a recall rate of 85.35%. [Limitations] The test set is drawn from high-quality UGC sources and part of it was filtered manually, so the corpus scale is limited. [Conclusions] The proposed method and technology are of value for ontology concept extraction from UGC.
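As a reading aid for the reported figures, the following minimal sketch shows how such rates are commonly computed for a concept extraction task. It assumes that the reported accuracy rate is precision in the usual sense (correct extracted concepts divided by all extracted concepts) and that recall is correct extracted concepts divided by all reference concepts. The function name `precision_recall` and the toy concept sets are hypothetical illustrations, not data or code from the paper.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) of extracted concepts against a reference set.

    Assumption: a concept counts as correct only on an exact string match.
    """
    extracted, gold = set(extracted), set(gold)
    correct = extracted & gold
    # Guard against empty sets so the ratios are always defined.
    precision = len(correct) / len(extracted) if extracted else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data, not taken from the paper's experiments.
extracted = {"本体", "概念抽取", "用户", "信息源"}
gold = {"本体", "概念抽取", "信息源", "分词", "语料"}

p, r = precision_recall(extracted, gold)
print(f"precision={p:.2%}, recall={r:.2%}")  # precision=75.00%, recall=60.00%
```

In this toy case three of the four extracted concepts appear in the reference set, so precision is 3/4, and three of the five reference concepts are found, so recall is 3/5.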
唐晓波, 胡华. 中文UGC信息源的本体概念抽取研究[J]. 现代图书情报技术, 2014, 30(5): 41-49.
Tang Xiaobo, Hu Hua. Research of Ontology Concept Extraction Based on Chinese UGC Sources[J]. New Technology of Library and Information Service, 2014, 30(5): 41-49.