[Objective] This paper proposes a short text classification method based on category feature extension to address the sparse content of short texts. [Methods] We used an improved TF-IDF model and the LDA topic model to construct a keyword set and a topic distribution set, both based on category features. We then used these sets to expand the content and vector representations of short texts. Finally, we classified the expanded short texts with a convolutional neural network. [Results] The proposed method improved classification precision by 3.0% and recall by 4.1%. [Limitations] We examined the new method only with a convolutional neural network. [Conclusions] The proposed method can improve the effectiveness of short text classification.
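The keyword-based part of the expansion step can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the paper: it uses plain TF-IDF rather than the improved variant, treats each category as a single pseudo-document, and omits the LDA topic distribution and the convolutional classifier entirely. The function names `category_keywords` and `expand_short_text` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def category_keywords(labeled_docs, top_k=3):
    """Return the top_k TF-IDF terms for each category.

    labeled_docs: list of (category, list_of_tokens) pairs.
    Assumption: each category is treated as one pseudo-document.
    """
    term_counts = defaultdict(Counter)
    for category, tokens in labeled_docs:
        term_counts[category].update(tokens)

    n_categories = len(term_counts)
    # Number of categories in which each term occurs.
    category_freq = Counter()
    for counts in term_counts.values():
        category_freq.update(counts.keys())

    keywords = {}
    for category, counts in term_counts.items():
        total = sum(counts.values())
        scores = {
            term: (count / total) * math.log(n_categories / category_freq[term])
            for term, count in counts.items()
        }
        # Sort by descending score, then alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords[category] = ranked[:top_k]
    return keywords


def expand_short_text(tokens, keywords):
    """Append the keywords of every category that shares a keyword with the text."""
    expanded = list(tokens)
    for category in sorted(keywords):
        words = keywords[category]
        if any(w in tokens for w in words):
            expanded.extend(w for w in words if w not in expanded)
    return expanded
```

In this sketch a short text gains extra tokens only from categories whose keyword set it already touches, which is one simple way to reduce sparsity without adding unrelated vocabulary.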