Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (12): 52-59    DOI: 10.11925/infotech.2096-3467.2018.0415
Identifying Trending Topics in Q&A Community with CART Decision Tree
Cheng Xiufeng1, Zhang Xinyi2, Wang Ning2
1Institute of Scientific and Technical Information of China, Beijing 100038, China
2School of Information Management, Central China Normal University, Wuhan 430079, China
Abstract  

[Objective] This paper identifies trending topics in a Q&A community, with the aim of helping decision-making agencies manage online public opinion. [Methods] First, we proposed criteria for detecting trending topics in a Q&A community. Then, we conducted an empirical study on China’s Zhihu Q&A community using the CART decision tree algorithm. [Results] The CART decision tree predicted the trending topics. [Limitations] We collected data from only a small portion of the topics on Zhihu; more data is needed for future studies. [Conclusions] The proposed method, based on the CART decision tree algorithm, can effectively predict trending topics in a Q&A community, which helps in selecting popular content.

Key words: Decision Tree; Q&A Community; Trending Topics
Received: 13 April 2018      Published: 16 January 2019
ZTFLH:  G25  

Cite this article:

Cheng Xiufeng, Zhang Xinyi, Wang Ning. Identifying Trending Topics in Q&A Community with CART Decision Tree. Data Analysis and Knowledge Discovery, 2018, 2(12): 52-59.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2018.0415     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2018/V2/I12/52

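Feature                    First-level criterion   Second-level criterion
① Strong attractiveness    A Question focus        A1 Number of views
② High participation                               A2 Number of followers
③ Large influence                                  A3 Number of answers
④ Content diversity        B Question cohesion     B1 Answer similarity
⑤ Short time intervals                             B2 Granularity feature value
⑥ Has key nodes            C Question impact       C1 User criticality
⑦ Fast propagation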
Title          Focus      Cohesion  Impact   Is  Order
Tree1.Topic1   1,379,560  8.99      502,643  1   25
Tree1.Topic2   9,495      9.81      83,591   1   49
Tree1.Topic3   356,204    8.04      522,595  1   55
Tree1.Topic4   51,740     8.23      89,478   1   63
Tree1.Topic5   347,185    9.28      994,162  1   93
Tree1.Topic6   3,496      1.98      8,874    1   94
Tree1.Topic7   8,538      3.64      597      1   96
Tree1.Topic8   4,361      4.41      10,818   1   99
Tree1.Topic9   56,159     3.33      93,735   1   110
Tree1.Topic10  35,877     1.82      21,288   1   115
Tree1.Topic11  6,600      5.86      5,318    1   118
Tree1.Topic12  403,128    8.66      97,756   1   121
Tree1.Topic13  4,249      1.89      1,108    1   124
Tree1.Topic14  703,195    15.20     52,308   1   128
Tree1.Topic15  1,327      4.31      2,760    1   136
Tree1.Topic16  109,452    15.91     109,622  0   137
Tree1.Topic17  95         0         0        0   139
Tree1.Topic18  648        7.18      457      0   145
Tree1.Topic19  5,068      3.49      111      0   149
Tree1.Topic20  950        3.27      11,670   0   153
Tree1.Topic21  801        1.53      46       0   159
Tree1.Topic22  1,472      1.97      44       0   163
Tree1.Topic23  791        2.37      586      0   164
Tree1.Topic24  426        1.83      12,650   0   173
Tree1.Topic25  281        1.85      68       0   180
Tree1.Topic26  871        3.39      5,181    0   203
Tree1.Topic27  1,196      2.11      144,588  0   207
Tree1.Topic28  576        3.13      9,949    0   209
Tree1.Topic29  408        1.95      16,350   0   213
Tree1.Topic30  463        2.46      465      0   234
Title          Focus      Cohesion  Impact   Is  Order
Tree2.Topic1   109,452    15.91     109,622  1   1
Tree2.Topic2   403,128    8.66      97,756   1   8
Tree2.Topic3   14,593     5.26      2,347    1   15
Tree2.Topic4   2,357      3.28      36,327   1   22
Tree2.Topic5   217        4.29      2,751    1   29
Tree2.Topic6   233        3.92      1,178    1   36
Tree2.Topic7   165        4.00      700      1   43
Tree2.Topic8   82         4.36      1,182    1   50
Tree2.Topic9   3,496      1.98      8,874    1   57
Tree2.Topic10  151        2.77      1,156    1   64
Tree2.Topic11  170        3.03      390      1   71
Tree2.Topic12  426        1.82      12,650   1   78
Tree2.Topic13  294        3.46      59       1   85
Tree2.Topic14  135        2.98      246      1   92
Tree2.Topic15  141        2.54      322      1   99
Tree2.Topic16  156        1.82      8,982    0   106
Tree2.Topic17  102        3.78      51       0   113
Tree2.Topic18  309        1.64      1,141    0   120
Tree2.Topic19  865        1.34      2,178    0   127
Tree2.Topic20  161        2.68      39       0   134
Tree2.Topic21  75         3.26      54       0   141
Tree2.Topic22  187        2.68      27       0   148
Tree2.Topic23  87         2.04      169      0   155
Tree2.Topic24  57         1.81      1,350    0   162
Tree2.Topic25  117        2.03      47       0   169
Tree2.Topic26  56         1.78      405      0   176
Tree2.Topic27  31         1.91      1,091    0   183
Tree2.Topic28  130        1.99      18       0   190
Tree2.Topic29  59         1.76      239      0   197
Tree2.Topic30  93         1.71      64       0   204
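
As an illustration of how a CART-style tree could choose a split on data like the rows above, the following is a minimal sketch, not the authors' implementation. It computes the Gini impurity and searches for the threshold on a single feature that minimises the weighted impurity of the two child nodes. The function names `gini` and `best_split` are hypothetical, and the eight (Cohesion, Is) pairs are a hand-picked subset of the Tree2 rows (Topics 1, 3, 7, 11, 16, 19, 24 and 30), so the clean separation found here should not be read as a property of the full data set.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the rule `value <= threshold`.

    Candidate thresholds are midpoints between consecutive distinct values.
    If no split improves on the parent impurity, threshold is None.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_threshold, best_score = None, gini(labels)
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no threshold can separate identical values
        threshold = (low + high) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# (Cohesion, Is) pairs copied from Tree2 Topics 1, 3, 7, 11, 16, 19, 24, 30.
# This subset is an assumption made for illustration only.
rows = [(15.91, 1), (5.26, 1), (4.00, 1), (3.03, 1),
        (1.82, 0), (1.34, 0), (1.81, 0), (1.71, 0)]
cohesion = [value for value, _ in rows]
is_label = [label for _, label in rows]

parent_impurity = gini(is_label)
threshold, score = best_split(cohesion, is_label)
print(f"parent Gini = {parent_impurity}, split at Cohesion <= {threshold}, weighted Gini = {score}")
```

On the full tables the same search would be repeated over Focus, Cohesion and Impact at every node, and a single threshold would generally not separate the classes perfectly (for example, Tree2.Topic17 has a Cohesion of 3.78 with Is = 0).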