Identifying Hot Topics from Mobile Complaint Texts
Fang Xiaofei1, Huang Xiaoxi1, Wang Rongbo1, Chen Zhiqun1, Wang Xiaohua1,2
1 Department of Computer Science, Hangzhou Dianzi University, Hangzhou 310018, China
2 China Jiliang University, Hangzhou 310018, China
Abstract [Objective] This paper aims to extract valuable information from a large volume of complaint texts using Chinese information processing technologies. [Methods] First, we analyzed the characteristics of the complaint texts and clustered them with the k-means algorithm. Second, we extracted topics from the texts of each cluster with the LDA model, and calculated the weight of each word in every topic as well as the mean of each topic's document probability distribution. Third, we analyzed the topics with the highest means and used their document supporting rates to identify the trending ones. [Results] The document supporting rates of the extracted topics were three times higher than the average. [Limitations] We did not investigate the semantic relationships among the topics. [Conclusions] The LDA model is an effective method for detecting hot topics in mobile complaints, and the results point to directions for future studies.
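The ranking step described in the Methods can be illustrated with a minimal sketch. It assumes that a document-topic probability matrix is already available from an LDA run, and it assumes a working definition of the document supporting rate (the share of documents whose probability for a topic reaches a threshold), since the abstract does not give the exact formula. The function names, the threshold of 0.5, and the toy values are all hypothetical.

```python
from statistics import mean


def topic_means(theta):
    """Mean probability of each topic across all documents."""
    n_topics = len(theta[0])
    return [mean(doc[k] for doc in theta) for k in range(n_topics)]


def support_rate(theta, topic, threshold=0.5):
    """Assumed definition: share of documents whose probability
    for `topic` is at least `threshold`."""
    supporting = sum(1 for doc in theta if doc[topic] >= threshold)
    return supporting / len(theta)


# Toy document-topic matrix (rows: documents, columns: topics).
# These values are made up for illustration, not taken from the paper.
theta = [
    [0.75, 0.25, 0.0],
    [0.5, 0.25, 0.25],
    [0.0, 0.75, 0.25],
    [0.75, 0.0, 0.25],
]

means = topic_means(theta)

# Rank topics by mean probability, highest first.
ranked = sorted(range(len(means)), key=lambda k: means[k], reverse=True)

# Compute the assumed supporting rate for each ranked topic.
rates = {k: support_rate(theta, k) for k in ranked}

for k in ranked:
    print(f"topic {k}: mean={means[k]:.4f}, support={rates[k]:.2f}")
```

In this sketch a topic would be treated as a hot-topic candidate when it has both a high mean probability and a high supporting rate; the cut-off values would need to be tuned on real complaint data.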
Received: 10 November 2016
Published: 27 March 2017