Data Analysis and Knowledge Discovery  2017, Vol. 1 Issue (9): 57-64    DOI: 10.11925/infotech.2096-3467.2017.09.06
Original Article
Detecting Events from Official Weibo Profiles Based on Post Clustering with Burst Words
Gao Yongbing1, Yang Guipeng1, Zhang Di1, Ma Zhanfei2
1School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou 014010, China
2Department of Computer, Baotou Teachers’ College, Baotou 014010, China
Abstract  

[Objective] This paper aims to filter unrelated information out of official Weibo (microblog) profiles and to retrieve the posts that report official events. [Methods] First, we trained a word2vec model on the official Weibo datasets. Second, we proposed a method for detecting burst words in official Weibo posts, based on the influence of the posts, the base weight, and the related official profiles. Third, we calculated the similarity of posts with the burst words and used a hierarchical clustering algorithm to select the burst words for the target events. [Results] The proposed algorithm achieved better precision (63.5%), recall (85.5%), and F-value (0.73) than the traditional TF-IDF and TextRank algorithms. [Limitations] The official profiles did not have enough historical data on the events. [Conclusions] Burst words help detect official events effectively from official Weibo profiles.
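The reported precision, recall, and F-value can be cross-checked with a few lines of Python. This is only a sanity check, assuming the F-value in the abstract is the balanced F1 score (the harmonic mean of precision and recall); the abstract does not say which F-measure was used, and the function name `f1_score` is chosen here for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Values taken from the abstract: precision 63.5%, recall 85.5%.
f1 = f1_score(0.635, 0.855)
print(round(f1, 2))  # 0.73
```

Under that assumption the three reported figures are mutually consistent.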

Key words: Official Micro-blog; Related Words; Burst Words; Official Microblog Events; Word2Vec
Received: 05 April 2017      Published: 18 October 2017
ZTFLH:  TP391 G35  

Cite this article:

Gao Yongbing,Yang Guipeng,Zhang Di,Ma Zhanfei. Detecting Events from Official Weibo Profiles Based on Post Clustering with Burst Words. Data Analysis and Knowledge Discovery, 2017, 1(9): 57-64.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2017.09.06     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2017/V1/I9/57

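Related word | Relatedness weight | Related word | Relatedness weight
北大 (Peking University, short form) | 0.511464 | 北京大学第三医院 (Peking University Third Hospital) | 0.418483
许智宏 (Xu Zhihong) | 0.483764 | 荣获 (be awarded) | 0.418440
清华大学 (Tsinghua University) | 0.470910 | 生命科学 (life sciences) | 0.416257
招生办 (admissions office) | 0.470327 | 大讲堂 (lecture hall) | 0.416236
深圳 (Shenzhen) | 0.468520 | 展开 (launch, unfold) | 0.414171
携手 (join hands) | 0.466221 | 院长 (dean) | 0.411467
揭晓 (announce, unveil) | 0.461243 | 来访 (visit) | 0.409098
代表团 (delegation) | 0.451333 | 团委 (Youth League committee) | 0.408964
天文 (astronomy) | 0.450333 | 北京大学法学院 (Peking University Law School) | 0.405279
电视台 (TV station) | 0.447696 | 研究院 (research institute) | 0.404885
第一届 (first session) | 0.442393 | 泰王国 (Kingdom of Thailand) | 0.404662
代表队 (team) | 0.440339 | 物理 (physics) | 0.404421
孔庆东 (Kong Qingdong) | 0.433270 | 邓宏魁 (Deng Hongkui) | 0.400969
研究生会 (graduate student union) | 0.431442 | 空间科学 (space science) | 0.398369
6月 (June) | 0.421351 | 博雅 (Boya) | 0.398264
研究生院 (graduate school) | 0.420898 | 学生会 (student union) | 0.397033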
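A ranked list of related words like the one above is commonly obtained by comparing word vectors. The sketch below ranks words by cosine similarity to a query word. It rests on several assumptions: that the weights in the table are cosine similarities between word2vec vectors (this page does not say so), that `北京大学` is the query word (the table does not name one), and that the three-dimensional vectors are invented purely for illustration. The helper names `cosine` and `most_related` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_related(target, vectors, topn=3):
    """Rank every other word by cosine similarity to the target word."""
    scores = [
        (word, cosine(vectors[target], vec))
        for word, vec in vectors.items()
        if word != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


# Invented toy vectors; a trained model would supply real embeddings.
toy_vectors = {
    "北京大学": [0.9, 0.1, 0.3],
    "北大": [0.8, 0.2, 0.4],
    "清华大学": [0.7, 0.5, 0.1],
    "食堂": [0.0, 0.9, 0.2],
}

for word, score in most_related("北京大学", toy_vectors):
    print(word, round(score, 6))
```

With real embeddings, the same ranking step would run over the model's whole vocabulary rather than a four-word dictionary.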
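Official Weibo event | Event class description | Burst-word post cluster | Date
Tu Youyou wins the Nobel Prize; PKU faculty and students offer congratulations | Tu Youyou, alumna Nobel Prize, Lin Jianhua, president, PKU, Health Science Center, discussion | [President Lin Jianhua visits Nobel laureate and alumna Tu Youyou] On the afternoon of October 6, the home of Tu Youyou, winner of the 2015 Nobel Prize in Physiology or Medicine and Peking University alumna, was filled with warmth. PKU President Lin Jianhua and his party congratulated alumna Tu Youyou... | 2015-10-7 11:44:35
 | | [Work steadily, devote yourself to science: a discussion among Health Science Center faculty and students after alumna Tu Youyou's Nobel Prize] After alumna Tu Youyou won the Nobel Prize, the PKU Health Science Center... | 2015-10-17 13:12:17
Professor of the School of Space Sciences wins the State Technological Invention Award | PKU, national, technology award, 2015, School of Space Sciences, Yan Lei | #PKU News# [Brief: 13 Peking University achievements win 2015 State Science and Technology Awards] On the morning of January 8, the 2015 State Science and Technology Awards ceremony was held at the Great Hall of the People. Peking University... | 2016-1-8 18:28:49
 | | #Research Updates# [Professor Yan Lei of the School of Earth and Space Sciences wins second prize of the State Technological Invention Award] On January 8, the CPC Central Committee and the State Council held the 2015 State Science and Technology Awards ceremony at the Great Hall of the People... | 2016-1-16 10:30:03
Premier visits PKU | Premier, Peking University, Langrun Garden, think tank, Lin Jianhua, president, alma mater, Guanghua Management, Nongyuan Canteen | #The Premier Is Here# First stop: Premier Keqiang arrived at the National School of Development in Langrun Garden to learn about Peking University's think tank building and the development of the National School of Development. Peking University President Lin Jianhua... | 2016-4-15 15:48:00
 | | #The Premier Is Here# Third stop: Premier Keqiang visited the Law School, where he studied as an undergraduate (1978-1982)... Students of the Guanghua School of Management warmly welcomed the Premier back to his alma mater, and the Premier posed for a group photo with them. | 2016-4-15 16:30:29
 | | #The Premier Is Here# As night fell, Premier Keqiang and his party arrived at Peking University's Nongyuan Canteen... Surrounded by students, Premier Keqiang walked out of the Nongyuan Canteen... | 2016-4-15 20:09:01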
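Grouping posts into events, as in the table above, can be illustrated with a small agglomerative (bottom-up hierarchical) clustering routine. This is a minimal sketch and not the paper's algorithm: it assumes each post is reduced to the set of burst words it contains, uses Jaccard overlap as the similarity, merges clusters by average linkage, and stops at an arbitrary threshold of 0.15. The burst-word sets are loosely modeled on the rows above, with English glosses added here for readability.

```python
def jaccard(a, b):
    """Overlap between two sets of burst words, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_posts(posts, threshold=0.15):
    """Average-link agglomerative clustering over burst-word sets.

    Repeatedly merges the most similar pair of clusters and stops once
    no pair reaches the threshold. Returns lists of post indices.
    """
    clusters = [[i] for i in range(len(posts))]

    def link(c1, c2):
        total = sum(jaccard(posts[i], posts[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                score = link(clusters[x], clusters[y])
                if score > best:
                    best, pair = score, (x, y)
        if best < threshold:
            break  # remaining clusters are too dissimilar to merge
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Hypothetical burst-word sets, one per post.
posts = [
    {"Tu Youyou", "Nobel Prize", "Lin Jianhua", "president"},
    {"Tu Youyou", "Nobel Prize", "Health Science Center"},
    {"Premier", "Langrun Garden", "think tank", "Lin Jianhua"},
    {"Premier", "alma mater", "Guanghua"},
    {"Premier", "Nongyuan Canteen"},
]

clusters = cluster_posts(posts)
print(clusters)  # [[0, 1], [2, 3, 4]]
```

Note that the shared word "Lin Jianhua" links posts 0 and 2 only weakly, so the two events stay separate at this threshold; a lower threshold would eventually merge them, which is why the cut-off has to be tuned on real data.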