Data Analysis and Knowledge Discovery, 2019, Vol. 3, Issue 5: 77-85. https://doi.org/10.11925/infotech.2096-3467.2018.0758
Research Paper
Classifying Short Text Complaints with nBD-SVM Model
Bengong Yu 1,2, Yangnan Chen 1, Ying Yang 1,2
1 School of Management, Hefei University of Technology, Hefei 230009, China
2 Key Laboratory of Process Optimization & Intelligent Decision-making, Ministry of Education, Hefei University of Technology, Hefei 230009, China
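Abstract (translated from the Chinese)

[Objective] To classify short complaint texts effectively and thereby improve the efficiency of problem handling. [Methods] Complaint texts are weakly structured and short. To address these characteristics, we propose nBD-SVM, a text classification model that combines a topic model with a word vector method to construct the SVM input space vector and incorporates ensemble learning. [Results] An empirical analysis on enterprise complaint texts, with comparison against related classification methods, shows that nBD-SVM reaches an accuracy of 81.13%, indicating that it can effectively improve the accuracy and efficiency of complaint text classification. [Limitations] The experiments use complaint texts from only one company. [Conclusions] The nBD-SVM classification model suits enterprise complaint text classification tasks and meets enterprises' needs for classification applications.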

Abstract

[Objective] This paper seeks an effective way to classify unstructured, short-text business complaints in order to improve the efficiency of corporate problem solving. [Methods] We first combined a topic model with a distributed representation technique to construct the SVM input space vector. We then integrated an ensemble learning method to build the nBD-SVM text classification model. [Results] We evaluated the proposed model on business complaint texts and found that its precision reached 81.83%, which is much higher than that of the traditional methods. [Limitations] We evaluated the model only with complaints from one company. [Conclusions] The proposed nBD-SVM model can classify short-text business complaints effectively.

Keywords: Complaint Short Text Classification; Topic Model; Word Vector; Ensemble Learning; nBD-SVM
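
The pipeline outlined in the abstract (topic features joined with vector representations, fed to SVMs, then combined by ensemble learning) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say which topic model, embedding method, or ensemble scheme is used, so everything below is an assumption: the topic distributions and embeddings are hand-made toy values, the base learner is a simple linear SVM trained by subgradient descent on the hinge loss, the ensemble is bagging with majority voting, and the task is reduced to two classes. All function names are hypothetical.

```python
import random
from collections import Counter


def concat_features(topic_vec, embed_vec):
    """Concatenate a topic distribution and an embedding into one SVM input vector."""
    return list(topic_vec) + list(embed_vec)


def train_linear_svm(X, y, epochs=30, eta=0.1, lam=0.01):
    """Linear SVM for labels in {+1, -1}: subgradient descent on the
    L2-regularized hinge loss (an assumed stand-in for the paper's SVM)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # Shrink the weights (regularizer), then step toward margin violators.
            w = [wj * (1.0 - eta * lam) for wj in w]
            if margin < 1.0:
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b


def svm_predict(model, x):
    """Return +1 or -1 according to the sign of the decision function."""
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def train_bagged_svms(X, y, n_estimators=5, seed=0):
    """Train n SVMs, each on a bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        models.append(train_linear_svm([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagged_predict(models, x):
    """Majority vote over the base SVMs."""
    votes = Counter(svm_predict(m, x) for m in models)
    return votes.most_common(1)[0][0]


def make_toy_data(n_per_class=10, seed=1):
    """Hypothetical stand-ins for real topic distributions and document embeddings."""
    rng = random.Random(seed)
    X, y = [], []
    prototypes = [(1, (0.9, 0.1), (1.0, 0.0)), (-1, (0.1, 0.9), (0.0, 1.0))]
    for label, topic, embed in prototypes:
        for _ in range(n_per_class):
            t = [v + rng.uniform(-0.05, 0.05) for v in topic]
            e = [v + rng.uniform(-0.05, 0.05) for v in embed]
            X.append(concat_features(t, e))
            y.append(label)
    return X, y


X_train, y_train = make_toy_data()
ensemble = train_bagged_svms(X_train, y_train, n_estimators=5)
print(bagged_predict(ensemble, concat_features((0.85, 0.15), (0.9, 0.1))))
print(bagged_predict(ensemble, concat_features((0.2, 0.8), (0.1, 0.9))))
```

In a realistic setting the toy vectors would be replaced by the outputs of a trained topic model and a trained embedding model, and the two-class learner by a multi-class SVM. The concatenation and voting steps would stay the same.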
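Received: 2018-07-15      Published: 2019-07-03
Funding: *This work is one of the research outcomes of the National Natural Science Foundation of China project "Research on Product R&D Knowledge Integration and Service Mechanisms Based on Manufacturing Big Data" (Grant No. 71671057), the National Natural Science Foundation of China project "Research on Dynamic Evaluation of Collaborative Performance in Complex Product R&D under Uncertain Environments" (Grant No. 71573071), and an open project of the Key Laboratory of Process Optimization & Intelligent Decision-making, Ministry of Education.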
Cite this article:
Bengong Yu, Yangnan Chen, Ying Yang. Classifying Short Text Complaints with nBD-SVM Model[J]. Data Analysis and Knowledge Discovery, 2019, 3(5): 77-85.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2018.0758      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2019/V3/I5/77