Data Analysis and Knowledge Discovery, 2019, Vol. 3, Issue 5: 77-85    DOI: 10.11925/infotech.2096-3467.2018.0758
Research Paper
Classifying Short Text Complaints with nBD-SVM Model*
Bengong Yu1,2, Yangnan Chen1, Ying Yang1,2
1(School of Management, Hefei University of Technology, Hefei 230009, China)
2(Key Laboratory of Process Optimization & Intelligent Decision-making, Ministry of Education, Hefei University of Technology, Hefei 230009, China)
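Abstract (translated from the Chinese)

[Objective] To classify short complaint texts effectively and thereby improve the efficiency of problem handling. [Methods] Because complaint texts are weakly structured and short, we propose nBD-SVM, a text classification model that combines a topic model with a word vector method to construct the SVM input space vector and incorporates ensemble learning. [Results] In an empirical analysis on enterprise complaint texts, compared against related classification methods, nBD-SVM reached an accuracy of up to 81.13%, indicating that it can effectively improve the accuracy and efficiency of complaint text classification. [Limitations] The experiments use complaint texts from only one company. [Conclusions] The nBD-SVM classification model is suited to enterprise complaint text classification and meets enterprises' practical classification needs.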

Abstract

[Objective] This paper seeks an effective way to classify non-structured, short-text business complaints, aiming to improve the efficiency of corporate problem solving. [Methods] We first combined a topic model with a distributed representation technique to construct an SVM input space vector. Then, we integrated an ensemble learning method to build the nBD-SVM text classification model. [Results] We examined the proposed model with business complaint texts and found its precision reached 81.83%, which is much higher than that of the traditional methods. [Limitations] We evaluated our model only with complaints from one company. [Conclusions] The proposed nBD-SVM model could process short text business complaints effectively.

Key words: Complaint Short Text Classification; Topic Model; Word Vector; Ensemble Learning; nBD-SVM
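
The abstract describes the pipeline only at a high level: topic-model features and distributed-representation features are combined into one input vector, and an ensemble of classifiers is trained on top. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes the two feature vectors per document are already computed, uses bagging with majority voting as the ensemble method, and substitutes a simple nearest-centroid learner for the SVM so the example needs only the standard library. All names (`concat_features`, `NearestCentroid`, `bagging_fit`, `bagging_predict`) and the toy data are hypothetical.

```python
import random
from collections import Counter


def concat_features(topic_vec, embed_vec):
    """Join a topic-distribution vector and an embedding vector into one input vector."""
    return list(topic_vec) + list(embed_vec)


class NearestCentroid:
    """Stand-in base learner (NOT an SVM): predicts the class with the closest mean vector."""

    def fit(self, X, y):
        counts = Counter(y)
        sums = {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict_one(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))
        return min(self.centroids, key=sq_dist)


def bagging_fit(X, y, n_estimators=5, seed=0):
    """Train n_estimators base learners, each on a bootstrap sample of (X, y)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(X)) for _ in X]  # sample with replacement
        models.append(NearestCentroid().fit([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Combine the base learners' predictions by majority vote."""
    votes = Counter(m.predict_one(x) for m in models)
    return votes.most_common(1)[0][0]


# Toy data (invented for illustration): 2-dim topic vectors and 2-dim embeddings.
topic_vecs = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15], [0.95, 0.05],
              [0.1, 0.9], [0.2, 0.8], [0.15, 0.85], [0.05, 0.95]]
embed_vecs = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.9, 0.0],
              [0.0, 1.0], [0.1, 0.9], [0.1, 1.0], [0.0, 0.9]]
labels = ["billing"] * 4 + ["network"] * 4

X = [concat_features(t, e) for t, e in zip(topic_vecs, embed_vecs)]
models = bagging_fit(X, labels, n_estimators=5, seed=0)
print(bagging_predict(models, concat_features([0.9, 0.1], [1.0, 0.0])))
```

In practice the base learner would be an SVM from an established library, and the topic and embedding vectors would come from trained models; the sketch only shows how the concatenation and the voting fit together.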
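Received: 2018-07-15
Funding: *This work is one of the research results of the National Natural Science Foundation of China project "Research on Product R&D Knowledge Integration and Service Mechanisms Based on Manufacturing Big Data" (Grant No. 71671057), the National Natural Science Foundation of China project "Research on Dynamic Evaluation of Collaborative Performance in Complex Product R&D under Uncertain Environments" (Grant No. 71573071), and an open project of the Key Laboratory of Process Optimization and Intelligent Decision-making, Ministry of Education.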
Cite this article:
Bengong Yu, Yangnan Chen, Ying Yang. Classifying Short Text Complaints with nBD-SVM Model[J]. Data Analysis and Knowledge Discovery, 2019, 3(5): 77-85. DOI: 10.11925/infotech.2096-3467.2018.0758.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2018.0758