Identifying Trending Topics in Q&A Community with CART Decision Tree
Cheng Xiufeng1, Zhang Xinyi2, Wang Ning2
1Institute of Scientific and Technical Information of China, Beijing 100038, China
2School of Information Management, Central China Normal University, Wuhan 430079, China
[Objective] This paper aims to identify trending topics in Q&A communities to help decision-making agencies manage online public opinion. [Methods] First, we proposed criteria for detecting trending topics in a Q&A community. Then, we conducted an empirical study of China’s Zhihu Q&A community using the CART decision tree algorithm. [Results] The CART decision tree predicted the trending topics. [Limitations] We collected data from only a small portion of the topics on Zhihu; more data are needed for future studies. [Conclusions] The proposed method, based on the CART decision tree algorithm, can effectively predict trending topics in a Q&A community, which helps identify popular content.
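As a rough illustration of the kind of split a CART decision tree searches for, the sketch below computes Gini impurity and picks the best binary threshold on a single numeric feature. It is not the authors' implementation: the feature (`followers`), the labels (`trending`), and the toy values are hypothetical, and the abstract does not state which features or criteria the study used.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best split `value <= threshold`.

    If no split improves on the unsplit impurity, threshold is None.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate identical values
        threshold = (lo + hi) / 2  # midpoint between adjacent distinct values
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data (not from the paper): follower counts per topic
# and a binary "trending" label (1 = trending, 0 = not trending).
followers = [120, 300, 450, 5000, 8000, 12000]
trending = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(followers, trending)
```

A full CART tree applies this search recursively over all features and then prunes the result; the sketch covers only the single-split step.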