New Technology of Library and Information Service (现代图书情报技术), 2013, Vol. 29, Issue 2: 57-62. https://doi.org/10.11925/infotech.1003-3513.2013.02.09
Section: Information Analysis and Research
Research on Chinese Micro-blog Bursty Topics Detection (中文微博突发事件检测研究)
Wang Yong (王勇)1, Xiao Shibin (肖诗斌)1,2, Guo Yixiu (郭跇秀)1, Lv Xueqiang (吕学强)1,2
1. Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China;
2. Beijing TRS Information Technology Co., Ltd., Beijing 100101, China
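Abstract (from the Chinese text): Accurately and efficiently mining bursty events from micro-blogs has become a research hotspot in recent years. A set of burst terms is extracted through term frequency statistics, term growth rate calculation, and the TF-PDF algorithm. Texts are represented with the burst terms and filtered according to the descriptive features of micro-blog bursty events. An "absolute clustering" algorithm is proposed to cluster the texts that describe bursty events. Heat is computed as a weighted combination of each micro-blog's reply count and retweet count, and the hottest item in each event cluster is reported as a bursty event. The detection precision is 92.60%, the recall is 85.51%, and the F-measure is 0.89. The experimental results show that, compared with traditional bursty event detection methods, this method detects bursty events in micro-blogs fairly accurately and has practical value.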
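The heat-based selection step described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the weights or the exact formula, so the `Post` structure, the linear form of `heat`, and the default weights of 0.5 are all hypothetical, and the cluster ids are assumed to come from some earlier clustering step.

```python
from dataclasses import dataclass


@dataclass
class Post:
    text: str
    cluster: int   # cluster id assigned by an earlier clustering step (assumed input)
    replies: int   # number of replies to the post
    retweets: int  # number of retweets of the post


def heat(post: Post, w_reply: float = 0.5, w_retweet: float = 0.5) -> float:
    """Weighted sum of reply and retweet counts (weights are placeholders)."""
    return w_reply * post.replies + w_retweet * post.retweets


def hottest_per_cluster(posts, w_reply=0.5, w_retweet=0.5):
    """Return a dict mapping each cluster id to its highest-heat post."""
    best = {}
    for post in posts:
        current = best.get(post.cluster)
        if current is None or heat(post, w_reply, w_retweet) > heat(current, w_reply, w_retweet):
            best[post.cluster] = post
    return best


posts = [
    Post("earthquake reported downtown", cluster=0, replies=10, retweets=40),
    Post("strong shaking felt just now", cluster=0, replies=30, retweets=90),
    Post("subway line suspended", cluster=1, replies=5, retweets=8),
]
for cluster_id, post in sorted(hottest_per_cluster(posts).items()):
    print(cluster_id, post.text, heat(post))
```

A linear combination is the simplest reading of "weighted"; any monotone scoring of replies and retweets would fit the same selection loop.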
Keywords (from the Chinese text): Bursty events; Burst terms; Text filtering; Absolute clustering
Abstract: Mining bursty topics accurately and efficiently from micro-blogs has attracted much attention. In this paper, a set of burst terms is extracted by counting term frequency, calculating term growth rate, and weighting terms with the Term Frequency-Proportional Document Frequency (TF-PDF) algorithm. Micro-blog texts are then represented with the burst terms. Based on an analysis of how bursty topics propagate on micro-blog platforms, the authors filter out texts that do not contribute to detecting bursty topics. The paper proposes a novel clustering strategy, "Absolute Clustering", to cluster the micro-blog texts. The heat of each text is computed as a weighted combination of its reply and retweet counts, and the top 5 texts are extracted as the detected bursty topics. The experiments show a precision of 92.60%, a recall of 85.51%, and an F-measure of 0.89. Compared with traditional methods, the proposed method proves effective.
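As a rough illustration of the burst-term scoring mentioned in the abstract, the sketch below computes TF-PDF weights in the form published by Bun and Ishizuka: for each channel, a term's frequency is normalized by the Euclidean norm of all term frequencies in that channel and multiplied by the exponential of the fraction of that channel's documents containing the term; the per-channel values are then summed. How this paper maps micro-blog data onto "channels" is not stated in the abstract, so the grouping used here is an assumption, and `growth_rate` with its `smoothing` parameter is a hypothetical definition rather than the authors' formula.

```python
import math
from collections import Counter


def tf_pdf(channels):
    """TF-PDF weights.

    channels: list of channels; each channel is a list of documents;
    each document is a list of tokens.
    Returns a dict mapping term -> weight summed over channels.
    """
    weights = Counter()
    for docs in channels:
        if not docs:
            continue
        # Term frequency within the channel.
        tf = Counter(tok for doc in docs for tok in doc)
        # Number of documents in the channel that contain each term.
        df = Counter(tok for doc in docs for tok in set(doc))
        norm = math.sqrt(sum(f * f for f in tf.values()))
        for term, f in tf.items():
            weights[term] += (f / norm) * math.exp(df[term] / len(docs))
    return dict(weights)


def growth_rate(count_now, count_prev, smoothing=1.0):
    """Relative growth of a term count between two windows (assumed definition)."""
    return (count_now - count_prev) / (count_prev + smoothing)


# Two toy channels of already-tokenized posts.
channels = [
    [["quake", "city"], ["quake"]],
    [["quake", "rain"]],
]
w = tf_pdf(channels)
for term in sorted(w, key=w.get, reverse=True):
    print(term, round(w[term], 3))
print(growth_rate(30, 9))
```

The additive smoothing only avoids division by zero for terms unseen in the previous window; a threshold on the growth rate combined with a TF-PDF cutoff would be one plausible way to pick the burst-term set.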
Key words: Bursty topics; Burst terms; Filter; Absolute clustering
Received: 2013-01-18      Published: 2013-04-24
CLC number: TP311.6
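Funding: This work is supported by the National Natural Science Foundation of China projects "Research on Ontology-Based Automatic Patent Indexing" (No. 61271304) and "Research on Evaluating the Authenticity of Web Page Content" (No. 61171159); by the Key Project of the Science and Technology Development Plan of the Beijing Municipal Education Commission, which is also a Class B Key Project of the Beijing Natural Science Foundation, "Research on Domain-Oriented Precise Search Methods for Multimodal Internet Information" (No. KZ201311232037); and by the National Key Technology R&D Program project "Research and Demonstration of Key Technologies for Enhanced Search Engines" (No. 2011BAH11B03).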
Corresponding author: Wang Yong, E-mail: wy514674793@126.com
Cite this article:
Wang Yong, Xiao Shibin, Guo Yixiu, Lv Xueqiang. Research on Chinese Micro-blog Bursty Topics Detection[J]. New Technology of Library and Information Service, 2013, 29(2): 57-62.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2013.02.09      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2013/V29/I2/57