Data Analysis and Knowledge Discovery, 2017, Vol. 1, Issue 9: 57-64. DOI: 10.11925/infotech.2096-3467.2017.09.06
Original Article
Detecting Events from Official Weibo Profiles Based on Post Clustering with Burst Words
Yongbing Gao1, Guipeng Yang1, Di Zhang1, Zhanfei Ma2
1School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou 014010, China
2Department of Computer, Baotou Teachers’ College, Baotou 014010, China
Abstract  

[Objective] This paper aims to filter unrelated information out of official Weibo (microblog) profiles and then retrieve the posts about official events. [Methods] First, we trained a word2vec model on the official Weibo datasets. Second, we proposed a burst-word detection method for official microblogs based on the influence of Weibo posts, the base weight, and the related official profiles. Third, we calculated the similarity between posts and burst words, and used a hierarchical clustering algorithm to select the burst words for the target events. [Results] The proposed algorithm achieved higher precision (63.5%), recall (85.5%), and F value (0.73) than the traditional TF-IDF and TextRank algorithms. [Limitations] The official profiles did not provide enough historical data on the events. [Conclusions] Burst words help us detect official events effectively from official Weibo profiles.
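The Methods steps above can be illustrated with a minimal, self-contained Python sketch. The abstract does not give the paper's formulas, so everything here is an assumption rather than the authors' method: the toy dictionary `VECTORS` stands in for trained word2vec embeddings, `burst_scores` is a hypothetical influence-weighted frequency ratio (it does not model the related-profile weighting), and the hierarchical clustering is average-linkage agglomerative clustering with a hypothetical similarity threshold. Each post is then matched to the burst-word cluster whose centroid is most cosine-similar to the post's mean word vector.

```python
import math
from collections import Counter

# Hypothetical 3-d vectors standing in for a trained word2vec model.
VECTORS = {
    "flood": [0.9, 0.1, 0.0],
    "rescue": [0.8, 0.2, 0.1],
    "rainfall": [0.85, 0.05, 0.05],
    "exam": [0.0, 0.9, 0.2],
    "admission": [0.1, 0.8, 0.3],
}


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mean_vector(words):
    """Average of the known word vectors, or None if no word has a vector."""
    vecs = [VECTORS[w] for w in words if w in VECTORS]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def burst_scores(window_posts, history_counts, history_windows):
    """Hypothetical score: influence-weighted count of posts containing the word
    in the current window, divided by its mean count per historical window + 1."""
    weighted = Counter()
    for words, reposts in window_posts:
        influence = 1.0 + math.log1p(reposts)  # assumed post-influence weight
        for w in set(words):
            weighted[w] += influence
    return {w: f / (history_counts.get(w, 0) / history_windows + 1.0)
            for w, f in weighted.items()}


def average_linkage(words, threshold):
    """Agglomerative clustering: merge the most similar pair of clusters (mean
    pairwise cosine) until no pair reaches `threshold`."""
    clusters = [[w] for w in words]

    def sim(a, b):
        total = sum(cosine(VECTORS[x], VECTORS[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = sim(clusters[i], clusters[j])
                if s > best:
                    best, pair = s, (i, j)
        if best < threshold:
            break
        i, j = pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def assign_post(words, clusters):
    """Index of the cluster whose centroid is most cosine-similar to the post."""
    pv = mean_vector(words)
    if pv is None:
        return None
    return max(range(len(clusters)),
               key=lambda k: cosine(pv, mean_vector(clusters[k])))


# Toy window: (tokenized post, repost count). All values are made up.
posts = [(["flood", "rescue", "city"], 120), (["rainfall", "flood"], 30),
         (["exam", "admission"], 50), (["exam", "schedule"], 5)]
history = {"flood": 2, "rescue": 1, "rainfall": 4, "exam": 10,
           "admission": 3, "city": 40, "schedule": 20}
scores = burst_scores(posts, history, history_windows=10)
burst = sorted(w for w, s in scores.items() if s >= 2.0 and w in VECTORS)
clusters = average_linkage(burst, threshold=0.8)
print(clusters)  # [['admission', 'exam'], ['flood', 'rainfall', 'rescue']]
print(assign_post(["flood", "rescue", "city"], clusters))  # 1
```

With these toy inputs, `city` and `schedule` score below the burst threshold because they are already frequent in the history, and the remaining burst words split into a flood cluster and an exam cluster. In a real pipeline the vectors would come from a word2vec model trained on the Weibo corpus, and both thresholds would need tuning.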
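Assuming the reported F value is the balanced F1 measure (the harmonic mean of precision P and recall R; the abstract does not state which F variant was used), the three reported figures are mutually consistent:

```latex
F = \frac{2PR}{P + R} = \frac{2 \times 0.635 \times 0.855}{0.635 + 0.855} = \frac{1.0859}{1.490} \approx 0.73
```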

Key words: Official Micro-blog; Related Words; Burst Words; Official Microblog Events; Word2Vec
Received: 05 April 2017; Published: 18 October 2017

Cite this article:

Yongbing Gao, Guipeng Yang, Di Zhang, Zhanfei Ma. Detecting Events from Official Weibo Profiles Based on Post Clustering with Burst Words. Data Analysis and Knowledge Discovery, 2017, 1(9): 57-64.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2017.09.06 or http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2017/V1/I9/57
