[Objective] This paper aims to extract valuable information from large volumes of complaint texts with the help of Chinese information processing technologies. [Methods] First, we analyzed the characteristics of the complaint texts and clustered them with the k-means algorithm. Second, we extracted topics from the texts of each cluster with the LDA model. At the same time, we calculated the weight of each word in each topic, as well as the mean of the document probability distribution for each topic. Third, we analyzed the topics with the highest means and used their document supporting rates to identify the trending ones. [Results] The document supporting rates of the topics extracted in this study were three times higher than the average rate. [Limitations] We did not investigate the semantic relationships among the topics. [Conclusions] The LDA model is an effective method for detecting hot topics in mobile complaints and points to directions for future studies.
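The third step of the method can be illustrated with a minimal Python sketch. It assumes that a document-topic matrix has already been produced by an LDA implementation, and it takes the "document supporting rate" of a topic to be the share of documents that assign that topic a probability at or above a threshold. This definition, the function name `topic_statistics`, the threshold of 0.5, and the toy values are assumptions made for illustration only; they are not taken from the paper.

```python
from typing import List, Tuple


def topic_statistics(
    theta: List[List[float]], threshold: float = 0.5
) -> Tuple[List[float], List[float]]:
    """Summarize a document-topic matrix.

    theta[d][k] is the probability of topic k in document d.
    Returns (means, support), each a list with one value per topic.
    """
    n_docs = len(theta)
    n_topics = len(theta[0])
    # Mean of the document-topic probability for each topic.
    means = [sum(row[k] for row in theta) / n_docs for k in range(n_topics)]
    # Assumed definition of the supporting rate: the share of documents
    # whose probability for topic k is at least `threshold`.
    support = [
        sum(1 for row in theta if row[k] >= threshold) / n_docs
        for k in range(n_topics)
    ]
    return means, support


# Hypothetical document-topic matrix: 4 documents, 3 topics.
theta = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.8, 0.1],
    [0.5, 0.25, 0.25],
]

means, support = topic_statistics(theta)
# Candidate hot topic: the one with the highest mean probability.
hottest = max(range(len(means)), key=lambda k: means[k])
print(hottest, means, support)
```

In this toy example the topic with the highest mean is also the one supported by the most documents, which is the pattern the method relies on when it flags a topic as trending.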