New Technology of Library and Information Service  2015, Vol. 31 Issue (7-8): 97-103    DOI: 10.11925/infotech.1003-3513.2015.07.13
Complaint Text Classification Based on Guiding Words
Hu Juxiang1, Lv Xueqiang1, Liu Kehui2,3
1 Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China;
2 School of Management and Economics, Beijing Institute of Technology, Beijing 100081, China;
3 Beijing Research Center of Urban System Engineering, Beijing 100035, China
Abstract  

[Objective] Complaint text is information-rich, unstructured, and weakly regular, so the management of city complaint information needs an efficient classification method to improve the efficiency of management staff. [Methods] The characteristics of complaints are analyzed and the text is preprocessed. Guide words are then filtered using a parser, a synonym forest, and document contribution. Finally, guide word weighting coefficients are calculated with TF-IDF, the guide words are represented with a vector space model (VSM), and the complaint text is classified with a support vector machine (SVM). [Results] Across multiple categories of complaint text, the method reaches an average precision of 82.1% and an average recall of 82.3%. [Limitations] The sparsity of complaint text affects the classification results to some extent. [Conclusions] The experimental results show that the method is effective and feasible for classifying complaint text and can improve classification performance.
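
The weighting and representation steps named in [Methods] can be illustrated with a minimal sketch. It assumes the standard TF-IDF form (term frequency times log inverse document frequency), since the abstract does not state which variant the authors use. The function name `tfidf_vectors` and the English guide words are hypothetical stand-ins for illustration, and the SVM classification step is omitted.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Turn tokenized documents into TF-IDF vectors over a shared vocabulary.

    docs: list of token lists, one per complaint (here: its guide words).
    Returns (vocab, vectors); vectors[i][j] is the weight of vocab[j] in docs[i].
    """
    vocab = sorted({w for doc in docs for w in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    df = Counter(w for doc in docs for w in set(doc))
    idf = {w: math.log(n_docs / df[w]) for w in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        vectors.append([tf[w] / total * idf[w] for w in vocab])
    return vocab, vectors

# Hypothetical guide words for three complaints (assumption, not from the paper).
docs = [
    ["noise", "construction", "night"],
    ["noise", "traffic"],
    ["water", "leak", "pipe"],
]
vocab, vectors = tfidf_vectors(docs)
```

Each row of `vectors` is the VSM representation of one complaint. In a full pipeline these rows would be passed, with their category labels, to an SVM trainer.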

Received: 19 January 2015      Published: 25 August 2015
CLC Number: TP391.1

Cite this article:

Hu Juxiang, Lv Xueqiang, Liu Kehui. Complaint Text Classification Based on Guiding Words. New Technology of Library and Information Service, 2015, 31(7-8): 97-103.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2015.07.13     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2015/V31/I7-8/97

