New Technology of Library and Information Service  2011, Vol. 27 Issue (1): 31-38    DOI: 10.11925/infotech.1003-3513.2011.01.05
Study on Text Classification Model Based on SUMO and WordNet Ontology Integration
Hu Zewen, Wang Xiaoyue, Bai Rujiang
Institute of Scientific & Technical Information, Shandong University of Technology, Zibo 255049, China
Abstract

To address the problems in traditional text classification methods and in current semantic classification methods, this paper proposes a text classification model based on the integration of the SUMO and WordNet ontologies. The model uses the mappings between WordNet synsets and SUMO ontology concepts to map the terms in the document-word vector space to the corresponding ontology concepts, forming a document-concept vector space in which texts are classified automatically. Experimental results show that the proposed method greatly reduces the dimensionality of the vector space and improves text classification performance.

Keywords: SUMO Ontology; WordNet; Ontology integration; Text classification model; Word vector space; Concept vector space
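
The term-to-concept mapping described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the two lookup tables below are small hypothetical stand-ins for the WordNet term-to-synset index and the WordNet-to-SUMO mapping, the concept names are illustrative only, and dropping unmapped terms is an assumption made here for simplicity. The sketch only shows how several terms collapse onto one concept, which shrinks the feature space.

```python
from collections import Counter

# Hypothetical toy lookup tables (NOT real WordNet or SUMO data).
# Step 1: term -> synset identifier; step 2: synset identifier -> concept.
TERM_TO_SYNSET = {
    "car": "car.n.01",
    "automobile": "car.n.01",
    "truck": "truck.n.01",
    "dog": "dog.n.01",
    "puppy": "puppy.n.01",
}
SYNSET_TO_CONCEPT = {
    "car.n.01": "Automobile",
    "truck.n.01": "Truck",
    "dog.n.01": "Canine",
    "puppy.n.01": "Canine",
}


def term_to_concept(term):
    """Return the concept for a term, or None if either lookup fails."""
    synset = TERM_TO_SYNSET.get(term.lower())
    return SYNSET_TO_CONCEPT.get(synset) if synset else None


def to_concept_counts(tokens):
    """Collapse a token list into concept counts (assumption: unmapped tokens are dropped)."""
    counts = Counter()
    for tok in tokens:
        concept = term_to_concept(tok)
        if concept is not None:
            counts[concept] += 1
    return counts


def build_matrix(docs, mapper):
    """Return (vocabulary, rows), where rows[i][j] counts feature j in document i."""
    bags = [mapper(doc) for doc in docs]
    vocab = sorted(set().union(*bags))
    rows = [[bag[feature] for feature in vocab] for bag in bags]
    return vocab, rows


docs = [
    ["car", "automobile", "truck"],
    ["dog", "puppy", "car"],
]

# Document-word space: one dimension per distinct term.
word_vocab, word_rows = build_matrix(docs, Counter)
# Document-concept space: one dimension per distinct concept.
concept_vocab, concept_rows = build_matrix(docs, to_concept_counts)

print(len(word_vocab), "word dimensions ->", len(concept_vocab), "concept dimensions")
print(concept_vocab, concept_rows)
```

In this toy data, "car" and "automobile" share a synset and "dog" and "puppy" share a concept, so five word dimensions become three concept dimensions. The resulting rows could be passed to any standard classifier; the abstract does not say which one the authors used.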
Received: 2010-11-02

Classification codes: G250 TP391

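Funding:

This work is one of the research outcomes of the National Social Science Fund of China general project "Research on Automatic Classification of Massive Online Academic Literature" (No. 10BTQ047) and the Ministry of Education Humanities and Social Sciences Research Planning general project "Research on Key Technologies of Text Classification Based on Ontology Integration" (No. 09YJA870019).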

Cite this article:
Hu Zewen, Wang Xiaoyue, Bai Rujiang. Study on Text Classification Model Based on SUMO and WordNet Ontology Integration[J]. New Technology of Library and Information Service, 2011, 27(1): 31-38. DOI: 10.11925/infotech.1003-3513.2011.01.05.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2011.01.05


