Automatic Classification of E-commerce Comments with Multi-Feature Fusion Model
Xie Xingyu1, Yu Bengong1,2
1 School of Management, Hefei University of Technology, Hefei 230009, China
2 Key Laboratory of Process Optimization & Intelligent Decision-making of Ministry of Education, Hefei University of Technology, Hefei 230009, China

Abstract [Objective] This paper proposes a text classification method based on the BERT model and multi-channel feature extraction, aiming to classify e-commerce comments automatically and accurately. The model also addresses the polysemy and sparse information found in comments from public online forums and enterprise data warehouses. [Methods] First, we combined BERT with TextCNN to reduce the polysemy of Chinese words. Second, we linked BERT to a Bi-LSTM channel to capture long-distance contextual semantics. Third, we used BERT's fine-tuning mechanism to adjust the word vector encoding according to the extracted features. Finally, the model fused the feature vectors and performed the text classification. [Results] The MFFMB (Multi-Features Fusion Model BERT-based) reached an accuracy of 90.07% on public data sets of e-commerce comments, which is 2.36, 8.55, 4.61 and 5.11 percentage points higher than the popular baseline models. In addition, combining BERT with the attention mechanism raised the model's accuracy by 1.48 and 4.81 percentage points over the best corresponding baselines. [Limitations] The attention mechanism was used only in the Bi-LSTM channel, and the model still needs to be examined on more data sets. [Conclusions] The proposed model effectively improves the accuracy of text classification.
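The fusion step described under [Methods] can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not say how the feature vectors are fused, so concatenation followed by a single linear layer and softmax is an assumption, as are the vector sizes, the random stand-in values used in place of real BERT, TextCNN and Bi-LSTM outputs, and the helper names `attention_pool` and `fuse_and_classify`.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states, query):
    """Weight each time-step vector by softmax(state . query) and sum them."""
    weights = softmax([sum(h * q for h, q in zip(state, query)) for state in states])
    dim = len(states[0])
    return [sum(w * state[i] for w, state in zip(weights, states)) for i in range(dim)]


def fuse_and_classify(channel_features, weight, bias):
    """Concatenate channel vectors (assumed fusion), apply one linear layer, return class probabilities."""
    fused = [x for feature in channel_features for x in feature]
    logits = [sum(w * x for w, x in zip(row, fused)) + b for row, b in zip(weight, bias)]
    return softmax(logits)


rng = random.Random(0)
# Random stand-ins for the channel outputs; sizes are hypothetical.
cnn_feature = [rng.uniform(-1, 1) for _ in range(4)]   # BERT + TextCNN channel
lstm_states = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]  # BERT + Bi-LSTM states
query = [rng.uniform(-1, 1) for _ in range(4)]         # attention query (learned in a real model)
lstm_feature = attention_pool(lstm_states, query)      # attention applied to the Bi-LSTM channel only
bert_feature = [rng.uniform(-1, 1) for _ in range(4)]  # fine-tuned BERT sentence vector

num_classes = 3
weight = [[rng.uniform(-1, 1) for _ in range(12)] for _ in range(num_classes)]
bias = [0.0] * num_classes
probs = fuse_and_classify([cnn_feature, lstm_feature, bert_feature], weight, bias)
print(probs)
```

In a trained model the weights, bias and attention query would be learned together with the fine-tuned BERT encoder. The sketch only shows how features from several channels can end up in one vector before classification.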

Received: 19 May 2021
Published: 22 February 2022

Fund: National Natural Science Foundation of China (71671057)
Corresponding Author: Yu Bengong, ORCID: 0000-0003-4170-2335
E-mail: bgyu19@163.com