[Objective] This paper aims to improve the quality of text representation and to correlate text content with category label vectors, thereby improving classification results. [Methods] First, we modified the keyword extraction (KE) method and used keyword vectors to represent texts; we then applied a category label representation (CLR) algorithm to construct the text vectors. Next, we used an attention-based capsule network (Attention-Capsnet) as the classifier, yielding the KACC (KE-Attention-Capsnet-CLR) model. Finally, we compared its classification results with those of other methods. [Results] The KACC model improved data quality and achieved higher Precision, Recall and F-Measure than existing models, with classification precision reaching 97.4%. [Limitations] The experimental dataset needs to be enlarged, and further research is needed to examine the category discrimination rules on other corpora. [Conclusions] The KACC model is an effective text classification model.
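The abstract does not specify how the modified KE method scores keywords or how keyword vectors are combined into a text vector. As a minimal sketch, assuming plain TF-IDF scoring (a common keyword-extraction baseline, not the authors' modified KE) and a TF-IDF-weighted average of pre-trained word vectors, the keyword-based text representation step could look like the following. The function names `tfidf_keywords` and `text_vector`, the top-k cut-off and the toy embeddings are hypothetical; the CLR algorithm and the Attention-Capsnet classifier are not shown.

```python
import math
from collections import Counter


def tfidf_keywords(docs, k):
    """Return the top-k (term, score) pairs of each tokenized document.

    Assumption: plain TF-IDF, tf = count / doc length, idf = log(N / df).
    This is NOT the paper's modified KE method, whose details are not given.
    """
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    result = []
    for doc in docs:
        tf = Counter(doc)
        scores = {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        result.append(ranked[:k])
    return result


def text_vector(keywords, embeddings):
    """Represent a text as the score-weighted mean of its keyword vectors.

    Assumption: `embeddings` is a non-empty dict mapping terms to equal-length
    lists of floats (hypothetical pre-trained word vectors). Keywords without
    an embedding are skipped; if no weight remains, a zero vector is returned.
    """
    dim = len(next(iter(embeddings.values())))
    total = sum(w for t, w in keywords if t in embeddings)
    vec = [0.0] * dim
    if total == 0:
        return vec
    for term, w in keywords:
        if term in embeddings:
            for i, x in enumerate(embeddings[term]):
                vec[i] += w * x / total
    return vec


if __name__ == "__main__":
    # Toy corpus and hypothetical 2-d embeddings, for illustration only.
    docs = [["capsule", "network", "text", "text"],
            ["keyword", "text", "label"],
            ["capsule", "routing"]]
    toy_embeddings = {"network": [1.0, 0.0], "text": [0.0, 1.0]}
    for kws in tfidf_keywords(docs, k=2):
        print(kws, text_vector(kws, toy_embeddings))
```

Weighting by TF-IDF lets the highest-scoring keywords dominate the text vector; in the KACC setting, the modified KE scores and the model's own word vectors would replace these stand-ins before the vectors are passed on to CLR and the classifier.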