Automatic Classification of E-commerce Comments with Multi-Feature Fusion Model
Xie Xingyu1, Yu Bengong1,2
1 School of Management, Hefei University of Technology, Hefei 230009, China
2 Key Laboratory of Process Optimization & Intelligent Decision-making of Ministry of Education, Hefei University of Technology, Hefei 230009, China
[Objective] This paper designs a text classification method based on the BERT model and multi-channel feature extraction, aiming to accurately and automatically classify e-commerce comments. The new model also addresses the polysemy and information sparsity of comments collected from public online forums and enterprise data warehouses. [Methods] First, we used the BERT-linked TextCNN channel to reduce the polysemy of Chinese words. Then, our model utilized the BERT-linked Bi-LSTM channel to capture long-distance contextual semantics. Third, we used BERT's fine-tuning mechanism to adjust the word vector encoding with the extracted features. Finally, the model fused the feature vectors from the channels and finished the text classification. [Results] The accuracy of the MFFMB (Multi-Feature Fusion Model based on BERT) reached 90.07% on public data sets of e-commerce comments, an improvement of 2.36, 8.55, 4.61 and 5.11 percentage points over four popular baseline models. Meanwhile, combining BERT and the attention mechanism improved our model's accuracy by 1.48 and 4.81 percentage points over the best baseline counterparts. [Limitations] The attention mechanism was only applied to the Bi-LSTM channel, and future research needs to examine our model on more data sets. [Conclusions] The proposed model could effectively improve the accuracy of text classification.
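The fusion pipeline described above (BERT embeddings feeding a TextCNN channel and a Bi-LSTM channel, whose feature vectors are concatenated before classification) can be sketched schematically in plain numpy. This is a minimal illustration under assumed dimensions, not the paper's implementation: the Bi-LSTM channel is stood in for by an attention-weighted projection, and all weights are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical dimensions (not taken from the paper)
seq_len, bert_dim = 32, 768            # BERT token embedding shape
cnn_dim, lstm_dim, n_classes = 128, 128, 4

tokens = rng.normal(size=(seq_len, bert_dim))  # stand-in for BERT output

# Channel 1: TextCNN-style -- window of 3 tokens, ReLU, max-over-time pooling
W_cnn = rng.normal(size=(3 * bert_dim, cnn_dim)) * 0.01
windows = np.stack([tokens[i:i + 3].ravel() for i in range(seq_len - 2)])
cnn_feat = np.maximum(windows @ W_cnn, 0).max(axis=0)       # (cnn_dim,)

# Channel 2: Bi-LSTM stand-in -- attention-weighted mean of token states,
# then a linear projection (the real model runs a Bi-LSTM with attention)
attn = softmax(tokens @ rng.normal(size=(bert_dim,)) * 0.01)  # (seq_len,)
W_lstm = rng.normal(size=(bert_dim, lstm_dim)) * 0.01
lstm_feat = (attn @ tokens) @ W_lstm                        # (lstm_dim,)

# Fusion: concatenate the channel feature vectors, then classify
fused = np.concatenate([cnn_feat, lstm_feat])               # (cnn_dim + lstm_dim,)
W_out = rng.normal(size=(fused.size, n_classes)) * 0.01
probs = softmax(fused @ W_out)                              # class probabilities
```

The key design point is that each channel produces a fixed-length vector regardless of sequence length, so concatenation gives the classifier a single fused representation combining local (CNN) and long-range (Bi-LSTM) features.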