New Technology of Library and Information Service  2012, Vol. 28 Issue (3): 47-52    DOI: 10.11925/infotech.1003-3513.2012.03.08
Research on Chinese Short Text Classification Based on Wikipedia
Fan Yunjie, Liu Huailiang
School of Economics and Management, Xidian University, Xi’an 710071, China
Abstract  To suit the characteristics of Chinese short texts, this paper introduces a feature extension method to aid text classification. First, related concepts are extracted from Wikipedia, and the relatedness between concepts is calculated by combining statistical regularities with category information. Then, sets of semantically related concepts are built and used to extend the feature vector of a short text, supplementing its semantic features. A comparative experiment shows that the Wikipedia-based short text classification algorithm achieves better classification performance.
Key words: Short text; Wikipedia; Text classification; Feature extension
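
The feature extension step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the relatedness formula, so the `RELATED` table, its scores, and the `threshold` and `top_k` parameters are hypothetical stand-ins for values that would be derived from Wikipedia. English tokens are used for readability; a real pipeline would start from segmented Chinese text.

```python
# Hypothetical relatedness table: concept -> {related concept: score in [0, 1]}.
# In the paper these scores come from Wikipedia; here they are invented.
RELATED = {
    "apple": {"iphone": 0.8, "fruit": 0.6, "jobs": 0.3},
}


def extend_features(term_weights, related=RELATED, threshold=0.5, top_k=2):
    """Return a copy of term_weights extended with related concepts.

    Each original term contributes at most top_k related concepts whose
    relatedness score is at least threshold. An added concept receives
    the original term's weight scaled by the relatedness score.
    """
    extended = dict(term_weights)
    for term, weight in term_weights.items():
        # Rank this term's related concepts by descending relatedness.
        candidates = sorted(
            related.get(term, {}).items(),
            key=lambda item: item[1],
            reverse=True,
        )
        for concept, score in candidates[:top_k]:
            if score < threshold:
                continue
            extended[concept] = extended.get(concept, 0.0) + weight * score
    return extended


if __name__ == "__main__":
    short_text = {"apple": 1.0, "price": 0.5}
    print(extend_features(short_text))
```

The extended vector keeps the original terms and adds weighted related concepts, which is one plausible way to compensate for the sparse features of a short text before passing it to a classifier.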
Received: 01 February 2012      Published: 19 April 2012
CLC number: TP391.1

Cite this article:

Fan Yunjie, Liu Huailiang. Research on Chinese Short Text Classification Based on Wikipedia. New Technology of Library and Information Service, 2012, 28(3): 47-52.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2012.03.08     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2012/V28/I3/47
