Identifying Trending Topics in Q&A Community with CART Decision Tree
Cheng Xiufeng1, Zhang Xinyi2, Wang Ning2
1 Institute of Scientific and Technical Information of China, Beijing 100038, China
2 School of Information Management, Central China Normal University, Wuhan 430079, China
Abstract [Objective] This paper aims to identify trending topics in a Q&A community, in order to help decision-making agencies manage online public opinion. [Methods] First, we proposed criteria for detecting trending topics in a Q&A community. Then, we conducted an empirical study on China's Zhihu Q&A community using the CART decision tree algorithm. [Results] The CART decision tree predicted the trending topics. [Limitations] We collected data from only a small portion of the topics on Zhihu; more data are needed for future studies. [Conclusions] The proposed method, based on the CART decision tree algorithm, could effectively predict trending topics in the Q&A community, which helps identify popular content.
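As a rough illustration of the kind of split a CART classification tree searches for, the sketch below picks the threshold on a single numeric feature that minimizes the weighted Gini impurity. The feature (`answer_counts`), the labels (`is_trending`) and the toy values are hypothetical and are not the criteria or data used in the paper; a full CART implementation would repeat this search over all features and recurse on each resulting partition.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best split `value <= threshold`.

    Candidate thresholds are midpoints between consecutive distinct values.
    If no split improves on the unsplit impurity, threshold is None.
    """
    pairs = list(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy data: answers per question, and whether the topic trended.
answer_counts = [3, 5, 8, 40, 55, 70]
is_trending = [0, 0, 0, 1, 1, 1]

threshold, impurity = best_split(answer_counts, is_trending)
print(threshold, impurity)
```

On this toy data the two classes are separated perfectly by a single threshold, so the weighted impurity of the best split is zero; real community data would rarely be this clean.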
Received: 13 April 2018
Published: 16 January 2019