New Technology of Library and Information Service  2016, Vol. 32 Issue (7-8): 12-20     https://doi.org/10.11925/infotech.1003-3513.2016.07.03
A New Method to Detect Bursty Events from Micro-blog Posts Based on Bursty Topic Words and Agglomerative Hierarchical Clustering Algorithm*
Ding Shengchun, Gong Silan, Li Hongmei
School of Economics and Management, Nanjing University of Science & Technology, Nanjing 210094, China
Abstract

[Objective] This paper proposes a method to detect bursty events in massive volumes of micro-blog posts in real time, accurately and efficiently, in order to provide decision-making information for public opinion emergency management. [Methods] We introduce a reference time window mechanism and design selection and calculation methods for four types of features: word frequency, document frequency, hashtags, and word frequency growth rate. Bursty topic words are then extracted using a dynamic threshold. On this basis, each micro-blog post is represented as a feature vector of bursty topic words, and bursty events are detected with an agglomerative hierarchical clustering algorithm. [Results] Analyzed against real-world cases, the method detects bursty events with 80% accuracy, which supports its feasibility and effectiveness. [Limitations] Owing to the limits of the corpus and the scope of the study, the detected events are not yet described automatically, and factors such as users' sentiment and the semantic relationships among events are not fully analyzed. [Conclusions] The study overcomes limitations of earlier work concerning text content quality, text form, and bursty feature extraction results, and improves the efficiency of bursty event detection on micro-blogs.
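
The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the feature formulas, so the scoring rule (growth rate times document frequency, doubled for hashtags), the "mean plus k standard deviations" threshold, the cosine similarity, the average linkage, and all function names and parameter values below are assumptions chosen for illustration. Posts are assumed to be already segmented into tokens.

```python
from collections import Counter
from math import sqrt
from statistics import mean, pstdev


def term_stats(posts):
    """Return (term frequency, document frequency) counters for tokenized posts."""
    tf, df = Counter(), Counter()
    for tokens in posts:
        tf.update(tokens)
        df.update(set(tokens))
    return tf, df


def bursty_words(current, reference, k=0.5):
    """Pick words whose score exceeds a threshold derived from the window itself.

    Assumed score: TF growth rate versus the reference window, multiplied by
    document frequency, and doubled for hashtags.
    """
    tf_now, df_now = term_stats(current)
    tf_ref, _ = term_stats(reference)
    scores = {}
    for word, tf in tf_now.items():
        growth = (tf - tf_ref[word]) / (tf_ref[word] + 1)  # add-one smoothing
        boost = 2.0 if word.startswith("#") else 1.0
        scores[word] = growth * df_now[word] * boost
    if not scores:
        return []
    # Dynamic threshold: recomputed from the score distribution of each window.
    threshold = mean(scores.values()) + k * pstdev(scores.values())
    return sorted(w for w, s in scores.items() if s > threshold)


def vectorize(tokens, vocab):
    """Represent one post as counts of the bursty words it contains."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def agglomerative(vectors, min_sim=0.5):
    """Average-linkage agglomerative clustering; stop when no pair reaches min_sim."""
    clusters = [[i] for i in range(len(vectors))]

    def sim(c1, c2):
        return mean(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        best, (a, b) = max(
            (sim(clusters[a], clusters[b]), (a, b))
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if best < min_sim:
            break
        clusters[a] += clusters.pop(b)  # b > a, so index a stays valid
    return clusters


def detect_events(current, reference, k=0.5, min_sim=0.5):
    """Return (bursty words, clusters of indices into `current`)."""
    vocab = bursty_words(current, reference, k)
    kept = [(i, vectorize(t, vocab)) for i, t in enumerate(current)]
    kept = [(i, v) for i, v in kept if any(v)]  # drop posts with no bursty word
    clusters = agglomerative([v for _, v in kept], min_sim)
    return vocab, [[kept[j][0] for j in c] for c in clusters]


# Toy data: a quiet reference window and a current window with two bursts.
reference = [["weather", "nice", "today"], ["lunch", "today"],
             ["traffic", "slow", "today"]]
current = [["earthquake", "#quake", "building", "today"],
           ["earthquake", "#quake", "rescue"],
           ["earthquake", "#quake", "today"],
           ["flood", "#flood", "river"],
           ["flood", "#flood", "river", "today"],
           ["lunch", "today"]]

vocab, events = detect_events(current, reference)
print(vocab)   # ['#flood', '#quake', 'earthquake']
print(events)  # [[0, 1, 2], [3, 4]]
```

In this toy run, the everyday word "today" is frequent in both windows and so is not flagged, while the two groups of posts sharing bursty words are separated into two candidate events. For real micro-blog data, a word segmenter and a tuned threshold would be needed; the values of `k` and `min_sim` here are arbitrary.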

Key words: Bursty events detection; Bursty topic words; Agglomerative hierarchical clustering algorithm; Public opinion; Micro-blog
Received: 2016-06-12      Published: 2016-09-29
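Funding: *This work is one of the research outcomes of the National Social Science Fund of China project "Research on Topic Discovery in Online Public Opinion Based on Social Network Analysis" (No. 15BTQ063), the Fundamental Research Funds for the Central Universities project "Research on an Innovative Knowledge Service System Based on Deep Integration and Its Operating Mechanism in the Big Data Era" (No. 30916011330), and the National Social Science Fund of China Key Project "Research on the Methodological System of Social Public Opinion and Decision Support in the Big Data Environment" (No. 14AZD084).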
Cite this article:
Ding Shengchun, Gong Silan, Li Hongmei. A New Method to Detect Bursty Events from Micro-blog Posts Based on Bursty Topic Words and Agglomerative Hierarchical Clustering Algorithm. New Technology of Library and Information Service, 2016, 32(7-8): 12-20.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2016.07.03      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2016/V32/I7-8/12