Detecting Events from Official Weibo Profiles Based on Post Clustering with Burst Words
Gao Yongbing1, Yang Guipeng1, Zhang Di1, Ma Zhanfei2
1School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou 014010, China
2Department of Computer, Baotou Teachers’ College, Baotou 014010, China
[Objective] This paper aims to filter unrelated information out of official Weibo (microblog) profiles and to retrieve the posts about official events. [Methods] First, we trained a word2vec model on the official Weibo datasets. Then, we proposed a burst word detection method for official Weibo based on the influence of Weibo posts, the base weight, and the related official profiles. Finally, we calculated the similarity between posts and the burst words, and used a hierarchical clustering algorithm to select the burst words for the target events. [Results] The proposed algorithm achieved better precision (63.5%), recall (85.5%), and F value (0.73) than the traditional TF-IDF and TextRank algorithms. [Limitations] The official profiles did not have enough historical data on the events. [Conclusions] Burst words help detect official events effectively from official Weibo profiles.
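As a quick consistency check on the reported results, the F value can be recomputed from the precision and recall. The sketch below assumes the F value is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_measure` and its `beta` parameter are illustrative and not taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1 gives the harmonic mean (F1).

    Assumption: the paper's "F value" is F1. The abstract does not say so.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both inputs are zero, so the score is defined as zero here.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Figures reported in the abstract, expressed as fractions.
reported_precision = 0.635
reported_recall = 0.855

f1 = f_measure(reported_precision, reported_recall)
print(round(f1, 2))  # 0.73
```

Under that assumption, the recomputed value rounds to the reported 0.73, so the three figures are mutually consistent.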
Gao Yongbing, Yang Guipeng, Zhang Di, Ma Zhanfei. Detecting Events from Official Weibo Profiles Based on Post Clustering with Burst Words[J]. Data Analysis and Knowledge Discovery, 2017, 1(9): 57-64.