New Technology of Library and Information Service  2014, Vol. 30 Issue (5): 41-49    DOI: 10.11925/infotech.1003-3513.2014.05.06
Research of Ontology Concept Extraction Based on Chinese UGC Sources
Tang Xiaobo, Hu Hua
School of Information Management, Wuhan University, Wuhan 430072, China

[Objective] To extract ontology concepts from Chinese user-generated content (UGC) sources. [Methods] This paper proposes a hybrid ontology concept extraction method that uses linguistic methods to extract fine-grained words and combine them into candidate concepts, then uses statistical methods to filter the candidates. To validate the method, the paper builds an ontology extraction model and develops a prototype concept extraction system for UGC sources. [Results] In concept extraction experiments on UGC, the method outperforms the four other concept extraction methods used for comparison, reaching an accuracy rate of 68.42% and a recall rate of 85.35%. [Limitations] The test set is drawn from high-quality UGC sources and part of it was filtered manually, so the corpus is limited in scale. [Conclusions] The proposed method and technology are of value for ontology concept extraction based on UGC.

Key words: Concept extraction; Speech rules; Seed word; Mutual information; Information entropy
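The abstract names mutual information and information entropy as the statistical basis for filtering candidate concepts but gives no formulas. As a rough illustration only, the sketch below scores an adjacent word pair by pointwise mutual information (how strongly the two words stick together) and by the entropy of its neighboring words (how freely the pair occurs in different contexts). The function names `pmi` and `neighbor_entropy`, the toy token list, and the use of these particular formulas are assumptions for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def pmi(tokens, left, right):
    """Pointwise mutual information (base 2) of the adjacent pair (left, right).

    Hypothetical helper: probabilities are plain relative frequencies
    over a single pre-segmented token list.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    pair_count = bigrams[(left, right)]
    if pair_count == 0:
        return float("-inf")  # the pair never occurs adjacently
    p_pair = pair_count / (n - 1)
    p_left = unigrams[left] / n
    p_right = unigrams[right] / n
    return math.log2(p_pair / (p_left * p_right))


def neighbor_entropy(tokens, left, right, side="right"):
    """Entropy (base 2) of the words directly beside each occurrence of the pair."""
    neighbors = Counter()
    for i in range(len(tokens) - 1):
        if tokens[i] == left and tokens[i + 1] == right:
            j = i + 2 if side == "right" else i - 1
            if 0 <= j < len(tokens):
                neighbors[tokens[j]] += 1
    total = sum(neighbors.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in neighbors.values())


# Toy, already-segmented corpus (an assumption; real UGC needs a word segmenter).
tokens = ["智能", "手机", "很", "好", "智能", "手机", "太", "贵", "手机", "很", "贵"]

score = pmi(tokens, "智能", "手机")
right_h = neighbor_entropy(tokens, "智能", "手机", side="right")
print(f"PMI = {score:.3f}, right-neighbor entropy = {right_h:.3f}")
```

Under this reading, a candidate such as 智能 + 手机 would be kept as a concept when both scores exceed thresholds tuned on held-out data; the thresholds themselves are not given in the abstract.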
Received: 11 November 2013      Published: 06 June 2014
CLC number: TP391

Cite this article:

Tang Xiaobo, Hu Hua. Research of Ontology Concept Extraction Based on Chinese UGC Sources. New Technology of Library and Information Service, 2014, 30(5): 41-49.


