Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (12): 52-59    DOI: 10.11925/infotech.2096-3467.2018.0415
Identifying Trending Topics in Q&A Community with CART Decision Tree
Xiufeng Cheng1, Xinyi Zhang2, Ning Wang2
1Institute of Scientific and Technical Information of China, Beijing 100038, China
2School of Information Management, Central China Normal University, Wuhan 430079, China
Abstract  

[Objective] This paper aims to identify trending topics in a Q&A community, to help decision-making agencies manage online public opinion. [Methods] First, we proposed criteria for detecting trending topics in a Q&A community. Then, we conducted an empirical study on China’s Zhihu Q&A community using the CART decision tree algorithm. [Results] The CART decision tree predicted the trending topics. [Limitations] We collected data from only a small portion of the topics on Zhihu; more data are needed for future studies. [Conclusions] The proposed method, based on the CART decision tree algorithm, could effectively predict trending topics in the Q&A community, which helps in selecting popular content.
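
As a rough illustration of the kind of binary split a CART decision tree searches for, the sketch below picks the threshold that minimizes weighted Gini impurity. It is not the authors' implementation: the two features (answer count and follower count), the toy data, and the function names `gini` and `best_split` are all hypothetical, since the abstract does not state the actual criteria.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best binary split.

    CART-style search: try splits of the form feature <= threshold and keep
    the one with the lowest weighted Gini impurity of the two child nodes.
    """
    best = (None, None, gini(labels))
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (f, t, score)
    return best


# Hypothetical toy data: (answer count, follower count) per topic; 1 = trending.
rows = [(5, 100), (8, 150), (40, 120), (55, 900), (60, 300), (7, 800)]
labels = [0, 0, 1, 1, 1, 0]
feature, threshold, impurity = best_split(rows, labels)
```

On this toy data the search selects the first feature, because a threshold on answer count separates the two classes cleanly. A full CART tree would apply the same search recursively to each child node and then prune.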

Key words: Decision Tree; Q&A Community; Trending Topics
Received: 13 April 2018      Published: 16 January 2019

Cite this article:

Xiufeng Cheng, Xinyi Zhang, Ning Wang. Identifying Trending Topics in Q&A Community with CART Decision Tree. Data Analysis and Knowledge Discovery, 2018, 2(12): 52-59.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2018.0415     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2018/V2/I12/52
