Data Analysis and Knowledge Discovery  2019, Vol. 3 Issue (5): 77-85    DOI: 10.11925/infotech.2096-3467.2018.0758
Classifying Short Text Complaints with nBD-SVM Model
Bengong Yu1,2, Yangnan Chen1, Ying Yang1,2
1(School of Management, Hefei University of Technology, Hefei 230009, China)
2(Key Laboratory of Process Optimization & Intelligent Decision-making, Ministry of Education, Hefei University of Technology, Hefei 230009, China)
Abstract  

[Objective] This paper seeks an effective way to classify unstructured, short-text business complaints, aiming to improve the efficiency of corporate problem solving. [Methods] We first combined a topic model with a distributed representation technique to construct the input vectors for an SVM. We then applied an ensemble learning method to build the nBD-SVM text classification model. [Results] We evaluated the proposed model on business complaint texts and found that its precision reached 81.83%, which is much higher than that of traditional methods. [Limitations] We evaluated the model only with complaints from one company. [Conclusions] The proposed nBD-SVM model can classify short-text business complaints effectively.

Key words: Complaint Short Text Classification; Topic Model; Word Vector; Ensemble Learning; nBD-SVM
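
The Methods statement in the abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything here is an assumption made for illustration: the function names, the synthetic data, the hand-written linear SVM trained by sub-gradient descent, the use of bootstrap resampling with majority voting as the ensemble step, and the restriction to two classes. It is not the authors' implementation. In practice the topic vector would come from a fitted topic model and the document vector from a trained embedding model; here both are simulated with small numeric lists.

```python
import random
from collections import Counter


def concat_features(topic_vec, doc_vec):
    """Concatenate a topic distribution and a document embedding into one vector."""
    return list(topic_vec) + list(doc_vec)


def train_linear_svm(X, y, epochs=50, lr=0.1, lam=0.01, seed=0):
    """Fit a binary linear SVM (labels -1/+1) by sub-gradient descent on hinge loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            # L2 shrinkage is applied on every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if y[i] * score < 1.0:  # inside the margin: move toward the example
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def svm_predict(model, x):
    """Return +1 or -1 according to the sign of the decision function."""
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def train_bagged_svm(X, y, n_estimators=5, seed=0):
    """Train n base SVMs, each on a bootstrap resample of the training set."""
    rng = random.Random(seed)
    models = []
    for k in range(n_estimators):
        idx = [rng.randrange(len(X)) for _ in X]
        models.append(
            train_linear_svm([X[i] for i in idx], [y[i] for i in idx], seed=seed + k)
        )
    return models


def bagged_predict(models, x):
    """Majority vote over the base classifiers."""
    votes = Counter(svm_predict(m, x) for m in models)
    return votes.most_common(1)[0][0]


def make_toy_data(n_per_class=10, seed=42):
    """Synthetic stand-in for (topic vector, document vector) pairs; not real complaint data."""
    rng = random.Random(seed)
    X, y = [], []
    for label, p, d in [(1, 0.8, 1.0), (-1, 0.2, -1.0)]:
        for _ in range(n_per_class):
            t = p + rng.uniform(-0.1, 0.1)
            doc = [d + rng.uniform(-0.1, 0.1), -d + rng.uniform(-0.1, 0.1)]
            X.append(concat_features([t, 1.0 - t], doc))
            y.append(label)
    return X, y


X_train, y_train = make_toy_data()
ensemble = train_bagged_svm(X_train, y_train, n_estimators=5)
print(bagged_predict(ensemble, concat_features([0.8, 0.2], [1.0, -1.0])))
print(bagged_predict(ensemble, concat_features([0.2, 0.8], [-1.0, 1.0])))
```

An odd number of base classifiers avoids tied votes in the two-class case. A real system would more likely rely on an established SVM and ensemble library than on this hand-rolled trainer, and would need a multi-class scheme for more than two complaint categories.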
Received: 15 July 2018      Published: 03 July 2019

Cite this article:

Bengong Yu, Yangnan Chen, Ying Yang. Classifying Short Text Complaints with nBD-SVM Model. Data Analysis and Knowledge Discovery, 2019, 3(5): 77-85.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2018.0758     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2019/V3/I5/77
