New Technology of Library and Information Service  2016, Vol. 32 Issue (7-8): 12-20    DOI: 10.11925/infotech.1003-3513.2016.07.03
A New Method to Detect Bursty Events from Micro-blog Posts Based on Bursty Topic Words and Agglomerative Hierarchical Clustering Algorithm*
Ding Shengchun, Gong Silan, Li Hongmei
School of Economics and Management, Nanjing University of Science & Technology, Nanjing 210094, China
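Abstract (translated from the Chinese)

[Objective] To detect bursty events in massive microblog data in real time, accurately, and efficiently, providing important decision-support information for public opinion emergency management. [Methods] A reference time window mechanism is introduced, and selection and calculation methods are designed for four types of features: term frequency, document frequency, topic tags (hashtags), and term frequency growth rate. Bursty topic words are then extracted based on a dynamic threshold. On this basis, microblog texts are represented as feature vectors of bursty topic words, and bursty events are detected with an agglomerative hierarchical clustering algorithm. [Results] Analysis of the experimental results together with real cases shows that bursty event detection reaches an accuracy of 80%, verifying the feasibility and effectiveness of the method. [Limitations] Owing to limits of the corpus and the research scope, automatic description of the detected bursty events has not yet been realized, and the analysis of factors such as netizen sentiment and semantic relations among events remains insufficient. [Conclusions] This study overcomes limitations of previous related work regarding text content quality, text form, and bursty feature extraction results, and improves the efficiency of microblog bursty event detection.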
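The bursty-topic-word step described under Methods can be illustrated with a minimal Python sketch. The abstract names the four features and a dynamic threshold but does not give the formulas, so everything specific here is an assumption for illustration only: the function names `word_features` and `bursty_words`, the way the features are combined into one score, the add-one smoothing in the growth rate, and the threshold of mean plus `k` standard deviations.

```python
from collections import Counter
from statistics import mean, pstdev


def word_features(posts):
    """Count term frequency, document frequency, and hashtag hits.

    posts: list of (tokens, hashtags) pairs for one time window.
    """
    tf, df, tag = Counter(), Counter(), Counter()
    for tokens, hashtags in posts:
        tf.update(tokens)                 # every occurrence
        df.update(set(tokens))            # once per post
        tag.update(w for w in set(tokens) if w in hashtags)
    return tf, df, tag


def bursty_words(current, reference, k=1.0):
    """Score words in the current window against a reference window.

    The scoring formula and the threshold (mean + k * std of all
    scores) are assumptions, not the paper's definitions.
    """
    tf_c, df_c, tag_c = word_features(current)
    tf_r, _, _ = word_features(reference)
    n_docs = len(current)
    total_tf = sum(tf_c.values())
    scores = {}
    for w in tf_c:
        # Growth of term frequency relative to the reference window,
        # with add-one smoothing so unseen words do not divide by zero.
        growth = (tf_c[w] - tf_r[w]) / (tf_r[w] + 1)
        weight = tf_c[w] / total_tf + df_c[w] / n_docs + tag_c[w] / n_docs
        scores[w] = weight * max(growth, 0.0)
    if not scores:
        return {}
    # The threshold is "dynamic" in that it is recomputed per window.
    threshold = mean(scores.values()) + k * pstdev(scores.values())
    return {w: s for w, s in scores.items() if s > threshold}


current = [
    (["earthquake", "nanjing"], {"earthquake"}),
    (["earthquake", "help"], set()),
    (["earthquake", "weather"], set()),
    (["lunch", "weather"], set()),
]
reference = [
    (["weather", "lunch"], set()),
    (["weather", "nanjing"], set()),
    (["lunch", "help"], set()),
]
print(bursty_words(current, reference))
```

Because the threshold is derived from the score distribution of each window, it adapts to overall posting volume; a fixed cutoff would have to be retuned as traffic changes.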

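Keywords: bursty event detection; bursty topic words; agglomerative hierarchical clustering; online public opinion; microblog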
Abstract

[Objective] This paper proposes a new method to detect bursty events in massive micro-blog posts accurately, efficiently, and in real time, providing decision-making information for public opinion emergency management. [Methods] First, we introduced a reference time window mechanism and designed methods to select and compute four features: word frequency, document frequency, hashtags, and word frequency growth rate. Second, we extracted bursty topic words using a dynamic threshold. Third, we represented micro-blog texts as feature vectors of the bursty topic words. Finally, we detected bursty events with an agglomerative hierarchical clustering algorithm. [Results] Checked against real-world cases, the method detected bursty events with 80% accuracy, which supports its feasibility and effectiveness. [Limitations] Owing to the limits of the corpus and the scope of the study, the detected events are not yet described automatically, and factors such as users’ sentiment and the semantic relationships among events are not fully analyzed. [Conclusions] The study addresses limitations of previous research concerning text content quality, text form, and bursty feature extraction, and improves the efficiency of detecting bursty events in micro-blog posts.
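The last two steps of the method (representing posts as vectors over bursty topic words, then clustering them bottom-up) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the abstract does not specify the vector weighting, the similarity measure, the linkage criterion, or the stopping rule, so the binary vectors, cosine similarity, average linkage, and the `min_sim` cutoff below are all hypothetical choices, as are the function names.

```python
from itertools import combinations
from math import sqrt


def to_vector(tokens, vocab):
    """Binary presence vector over the bursty-word vocabulary (assumed weighting)."""
    present = set(tokens)
    return [1.0 if w in present else 0.0 for w in vocab]


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def agglomerative(vectors, min_sim=0.5):
    """Average-linkage agglomerative clustering.

    Starts with one cluster per post and repeatedly merges the most
    similar pair until no pair reaches min_sim (assumed stopping rule).
    Returns clusters as lists of post indices.
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (a, b), best = max(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < min_sim:
            break
        clusters[a] = clusters[a] + clusters[b]  # a < b, so index a stays valid
        del clusters[b]
    return clusters


vocab = ["earthquake", "rescue", "concert", "ticket"]
posts = [
    ["earthquake", "rescue"],
    ["earthquake", "rescue", "now"],
    ["concert", "ticket"],
    ["ticket", "concert", "sale"],
]
vectors = [to_vector(p, vocab) for p in posts]
print(agglomerative(vectors))
```

Each resulting cluster is a candidate event. Stopping on a similarity cutoff rather than a fixed cluster count suits this setting, since the number of events in a time window is not known in advance.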

Key words: Bursty event detection; Bursty topic words; Agglomerative hierarchical clustering algorithm; Public opinion; Micro-blog
Received: 2016-06-12
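Funding: *This work is one of the research outcomes of the National Social Science Fund of China project "Research on Topic Discovery of Online Public Opinion Based on Social Network Analysis" (No. 15BTQ063), the Fundamental Research Funds for the Central Universities project "Research on an Innovative Knowledge Service System Based on Deep Integration and Its Operating Mechanism in the Big Data Era" (No. 30916011330), and the National Social Science Fund of China Key Project "Research on a Methodological System for Social Public Opinion and Decision Support in the Big Data Environment" (No. 14AZD084).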
Cite this article:
Ding Shengchun, Gong Silan, Li Hongmei. A New Method to Detect Bursty Events from Micro-blog Posts Based on Bursty Topic Words and Agglomerative Hierarchical Clustering Algorithm*[J]. New Technology of Library and Information Service, 2016, 32(7-8): 12-20. DOI: 10.11925/infotech.1003-3513.2016.07.03.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2016.07.03