New Technology of Library and Information Service  2013, Vol. Issue (12): 19-26    DOI: 10.11925/infotech.1003-3513.2013.12.04
Study on Multi-level Text Clustering for Knowledge Base Based on Domain Ontology——Taking Knowledge Base of Chinese Cuisine Culture as an Example
Hong Yunjia, Xu Xin
Department of Information Science, Business School, East China Normal University, Shanghai 200241, China
Abstract  This paper proposes a multi-level text clustering method suited to the tree structure of a knowledge base. Words are first mapped to concepts through a domain ontology. The texts are then represented by top-level concepts only, and a coarse-grained clustering on this representation identifies the main subjects of the texts and establishes the primary classification framework. Next, the texts are represented by all concepts together with non-concept feature words, and a fine-grained clustering reveals the subjects of the texts at different depths. The result is a multi-level clustering of texts that proceeds from coarse to fine granularity.
Key words: Domain Ontology; Text clustering; Knowledge base; Chinese cuisine culture
Received: 16 August 2013      Published: 08 January 2014
CLC Number: G250.7
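
The two-stage procedure described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy ontology (`WORD_TO_CONCEPT`, `PARENT`), the term-frequency weighting, the cosine similarity, the thresholds, and the greedy single-pass clustering, which stands in for whatever clustering algorithm the authors actually used.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy ontology (not from the paper).
# WORD_TO_CONCEPT maps surface words to concepts; PARENT maps each
# concept to its parent, with None marking a top-level concept.
WORD_TO_CONCEPT = {
    "dumpling": "Dumpling", "jiaozi": "Dumpling", "noodle": "Noodle",
    "chopsticks": "Utensil", "wok": "Utensil",
}
PARENT = {"Dumpling": "Dish", "Noodle": "Dish", "Utensil": "Tool",
          "Dish": None, "Tool": None}


def top_concept(concept):
    """Walk up the ontology tree until a top-level concept is reached."""
    while PARENT[concept] is not None:
        concept = PARENT[concept]
    return concept


def coarse_vector(tokens):
    """Represent a text by counts of top-level concepts only."""
    return Counter(top_concept(WORD_TO_CONCEPT[t])
                   for t in tokens if t in WORD_TO_CONCEPT)


def fine_vector(tokens):
    """Represent a text by all concepts plus non-concept feature words."""
    return Counter(WORD_TO_CONCEPT.get(t, t) for t in tokens)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (sqrt(sum(v * v for v in a.values()))
            * sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def threshold_cluster(ids, vectors, threshold):
    """Greedy single-pass clustering (an assumed stand-in algorithm):
    a text joins the first cluster whose seed text is similar enough,
    otherwise it starts a new cluster."""
    clusters = []
    for i in ids:
        for cluster in clusters:
            if cosine(vectors[i], vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def multi_level_cluster(docs, coarse_t=0.8, fine_t=0.5):
    """Cluster coarsely on top-level concepts, then refine each coarse
    cluster using the full concept-plus-word representation."""
    coarse = {i: coarse_vector(d) for i, d in enumerate(docs)}
    fine = {i: fine_vector(d) for i, d in enumerate(docs)}
    return [threshold_cluster(group, fine, fine_t)
            for group in threshold_cluster(range(len(docs)), coarse, coarse_t)]


docs = [
    ["dumpling", "steamed", "jiaozi"],
    ["noodle", "soup", "noodle"],
    ["jiaozi", "steamed"],
    ["wok", "chopsticks"],
]
print(multi_level_cluster(docs))
```

In this toy run the three dish-related texts share one coarse cluster because they all reduce to the same top-level concept, and the second pass separates the dumpling texts from the noodle text. The outer list corresponds to the main classification framework and each inner list to its subdivisions, which mirrors the tree structure the abstract refers to.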

Cite this article:

Hong Yunjia, Xu Xin. Study on Multi-level Text Clustering for Knowledge Base Based on Domain Ontology——Taking Knowledge Base of Chinese Cuisine Culture as an Example. New Technology of Library and Information Service, 2013, (12): 19-26.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2013.12.04     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2013/V/I12/19
