Study on Multi-level Text Clustering for Knowledge Base Based on Domain Ontology: Taking Knowledge Base of Chinese Cuisine Culture as an Example
Hong Yunjia, Xu Xin
Department of Information Science, Business School, East China Normal University, Shanghai 200241, China
Abstract: This paper proposes a multi-level text clustering method suited to the tree structure of a knowledge base. Words are first mapped to concepts using a domain ontology. The texts are then represented by top-level concepts only and clustered at a coarse granularity, which identifies the distinct subjects of the texts and establishes the main classification framework. Next, the texts are represented by all concepts together with non-concept feature words and clustered at a fine granularity, which reveals the subjects of the texts at different depths. The method thus achieves multi-level text clustering from coarse to fine granularity.
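The two-stage procedure described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy `ONTOLOGY` table, the whitespace tokenizer, the raw-count weighting, the similarity thresholds, and the greedy single-pass clustering step are all hypothetical stand-ins, since the abstract does not state which clustering algorithm or feature weighting the authors use.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy ontology: word -> (concept, top-level concept).
ONTOLOGY = {
    "dumpling": ("Dumpling", "Dish"),
    "noodle": ("Noodle", "Dish"),
    "wok": ("Wok", "Utensil"),
    "steamer": ("Steamer", "Utensil"),
}


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def top_level_vector(tokens):
    """Represent a text by counts of top-level concepts only."""
    return Counter(ONTOLOGY[t][1] for t in tokens if t in ONTOLOGY)


def full_vector(tokens):
    """Represent a text by all concepts plus non-concept feature words."""
    return Counter(ONTOLOGY[t][0] if t in ONTOLOGY else t for t in tokens)


def threshold_cluster(ids, vectors, threshold):
    """Greedy single-pass clustering (an assumed stand-in algorithm):
    a text joins the first cluster whose seed text is similar enough."""
    clusters = []
    for i in ids:
        for cluster in clusters:
            if cosine(vectors[i], vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def multi_level_cluster(docs, coarse_t=0.7, fine_t=0.5):
    """Coarse clustering on top-level concepts, then fine clustering
    inside each coarse cluster on the full representation."""
    tokens = [d.lower().split() for d in docs]
    top = [top_level_vector(t) for t in tokens]
    full = [full_vector(t) for t in tokens]
    coarse = threshold_cluster(range(len(docs)), top, coarse_t)
    return [threshold_cluster(c, full, fine_t) for c in coarse]


docs = [
    "dumpling dumpling filling",
    "dumpling wrapper filling",
    "noodle soup noodle",
    "wok steamer",
]
result = multi_level_cluster(docs)
print(result)
```

With these toy inputs, the three dish texts fall into one coarse cluster and the utensil text into another; within the dish cluster, the two dumpling texts are grouped together and the noodle text is separated, giving a two-level tree of document indices.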
Received: 16 August 2013
Published: 08 January 2014