1 School of Management, Hefei University of Technology, Hefei 230009, China
2 Key Laboratory of Process Optimization & Intelligent Decision-making, Ministry of Education, Hefei University of Technology, Hefei 230009, China
[Objective] This paper addresses the data sparseness caused by the brevity of short texts, with the aim of improving short text classification performance. [Methods] We proposed a multi-channel text model that integrates semantic, word order, and topic features as the input to a short text classifier. We then built a classification method named nLD-SVM-RF on top of the SVM and random forest algorithms. Finally, we evaluated the new model on short complaint texts. [Results] We compared the new model with single SVM and RF classifiers that use Doc2vec features. When n = 5, the accuracy of the nLD-SVM-RF method was 9.70% and 6.25% higher than that of the SVM and RF classifiers, respectively. [Limitations] The experimental dataset needs to be expanded. [Conclusions] The nLD-SVM-RF model provides a practical solution for businesses to analyse short texts and improve decision-making.
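The abstract does not say how the feature channels are built or how the SVM and random forest are combined, so the following is only a rough sketch under stated assumptions, not the authors' nLD-SVM-RF method. It uses scikit-learn, stands in TF-IDF word n-grams for the word order channel and LDA topic proportions for the topic channel, omits the semantic (document embedding) channel, and assumes a simple hard vote between the two classifiers. The function name `build_classifier`, its parameters, and the toy complaint texts are all hypothetical.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import SVC


def build_classifier(n=2, n_topics=2, seed=0):
    """Hypothetical multi-channel short-text classifier (illustration only)."""
    # Channel 1 (assumed): word n-grams up to length n, a simple stand-in
    # for word order features.
    ngram_channel = TfidfVectorizer(ngram_range=(1, n))
    # Channel 2 (assumed): per-document topic proportions from LDA.
    topic_channel = Pipeline([
        ("counts", CountVectorizer()),
        ("lda", LatentDirichletAllocation(n_components=n_topics,
                                          random_state=seed)),
    ])
    # Concatenate the channels column-wise into one feature matrix.
    features = FeatureUnion([("ngram", ngram_channel),
                             ("topic", topic_channel)])
    # Assumed combination rule: a hard vote over an SVM and a random forest.
    vote = VotingClassifier(
        estimators=[
            ("svm", SVC(kernel="linear")),
            ("rf", RandomForestClassifier(n_estimators=50,
                                          random_state=seed)),
        ],
        voting="hard",
    )
    return Pipeline([("features", features), ("vote", vote)])


# Made-up complaint snippets, used only to show the call pattern.
texts = [
    "charged twice on my bill",
    "wrong amount on the monthly bill",
    "refund not received for overcharge",
    "internet connection keeps dropping",
    "no signal at home since monday",
    "network speed is very slow",
]
labels = ["billing", "billing", "billing", "network", "network", "network"]

model = build_classifier(n=2).fit(texts, labels)
predictions = model.predict(["bill shows wrong charge",
                             "slow network and no signal"])
```

With only two voters, a hard vote falls back to a fixed tie-break whenever the classifiers disagree, so a real system would more likely weight the classifiers or stack them; the vote is used here only to keep the sketch short.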
Bengong Yu, Yumeng Cao, Yangnan Chen, Ying Yang. Classification of Short Texts Based on nLD-SVM-RF Model[J]. Data Analysis and Knowledge Discovery, 2020, 4(1): 111-120.