Classifying Texts with KACC Model
Yuman Li, Zhibo Chen, Fu Xu
School of Information Science & Technology, Beijing Forestry University, Beijing 100083, China
Abstract [Objective] This paper aims to improve the quality of text representation and to correlate text content with text label vectors, in order to improve classification results. [Methods] First, we modified the keyword extraction (KE) method. We used the keyword vectors to represent the text, and adopted a category label representation (CLR) algorithm to create the text vectors. Then, we employed an attention-based capsule network (Attention-Capsnet) as the classifier to construct the KACC (KE-Attention-Capsnet-CLR) model. Finally, we compared our classification results with those of other methods. [Results] The KACC model effectively improved data quality, which led to better Precision, Recall and F-Measure than existing models. The classification precision reached 97.4%. [Limitations] The experimental dataset needs to be expanded, and more research is needed to examine the category discrimination rules on other corpora. [Conclusions] The KACC model is an effective model for text classification.
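The Precision, Recall and F-Measure named in the results can be illustrated with a minimal sketch. It assumes single-label classification and macro-averaging over categories; the abstract does not state which averaging the authors used, and the function names `per_class_prf` and `macro_average` are hypothetical, not taken from the paper.

```python
from collections import Counter


def per_class_prf(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gains a false positive
            fn[truth] += 1  # true label gains a false negative
    scores = {}
    for label in set(y_true) | set(y_pred):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        prec = tp[label] / p_den if p_den else 0.0
        rec = tp[label] / r_den if r_den else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[label] = (prec, rec, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of (precision, recall, f1) across categories (an assumption)."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


# Toy labels, purely illustrative; not data from the paper.
y_true = ["sports", "sports", "finance", "finance"]
y_pred = ["sports", "finance", "finance", "finance"]
scores = per_class_prf(y_true, y_pred)
macro_p, macro_r, macro_f = macro_average(scores)
```

Macro-averaging weights every category equally, which is a common choice when category sizes differ; a micro-averaged or accuracy-style figure would be computed differently.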
Received: 18 January 2019
Published: 25 November 2019
Corresponding Authors:
Zhibo Chen
E-mail: zhibo@bjfu.edu.cn