Research on Chinese Short Text Classification Based on Wikipedia
Fan Yunjie, Liu Huailiang
School of Economics and Management, Xidian University, Xi’an 710071, China
Abstract To address the characteristics of Chinese short texts, this paper introduces a feature extension method to aid text classification. First, related concepts are extracted from Wikipedia, and the relatedness between concepts is calculated by combining statistical regularities with category information. Sets of semantically related concepts are then built to extend the feature vector of a short text and supplement its semantic features. A comparative experiment shows that the Wikipedia-based short text classification algorithm achieves better classification performance.
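The feature extension step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything concrete here is an assumption for illustration only: the function names, the Jaccard overlap used for the category part, the linear blend controlled by `alpha`, the `threshold` cutoff, and the toy data are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def category_overlap(cats_a, cats_b):
    """Jaccard overlap between two concepts' category sets (assumed measure)."""
    a, b = set(cats_a), set(cats_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def relatedness(stat_score, cats_a, cats_b, alpha=0.5):
    """Hypothetical blend of a statistical score and category overlap.

    stat_score is assumed to be a precomputed value in [0, 1], for example
    one derived from link or co-occurrence statistics.
    """
    return alpha * stat_score + (1 - alpha) * category_overlap(cats_a, cats_b)


def extend_features(terms, related, threshold=0.3):
    """Extend a term-frequency vector with sufficiently related concepts.

    related maps a term to {concept: relatedness}. Each concept at or above
    the threshold is added with weight frequency * relatedness.
    """
    counts = Counter(terms)
    extended = {term: float(freq) for term, freq in counts.items()}
    for term, freq in counts.items():
        for concept, score in related.get(term, {}).items():
            if score >= threshold:
                extended[concept] = extended.get(concept, 0.0) + freq * score
    return extended


# Toy data: a segmented short text and a hand-written related-concept set.
toy_related = {"apple": {"fruit": 0.8, "company": 0.2}}
toy_vector = extend_features(["apple", "apple", "juice"], toy_related)
```

In this toy run, "fruit" is added to the vector because its relatedness to "apple" exceeds the threshold, while "company" is filtered out. The extended vector could then be passed to any standard classifier in place of the original sparse one.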
Received: 01 February 2012
Published: 19 April 2012