Detecting Events from Official Weibo Profiles Based on Post Clustering with Burst Words
Gao Yongbing1, Yang Guipeng1, Zhang Di1, Ma Zhanfei2
1School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou 014010, China
2Department of Computer, Baotou Teachers’ College, Baotou 014010, China
Abstract [Objective] This paper aims to filter unrelated information out of official Weibo (microblog) profiles and retrieve the posts that report official events. [Methods] First, we trained a word2vec model on the official Weibo datasets. Second, we proposed a burst word detection method for official Weibo posts based on post influence, base weight, and related official profiles. Third, we calculated the similarity between posts and the burst words, and used a hierarchical clustering algorithm to select the burst words for the target events. [Results] The proposed algorithm achieved better precision (63.5%), recall (85.5%), and F value (0.73) than the traditional TF-IDF and TextRank algorithms. [Limitations] The official profiles did not provide enough historical data on the events. [Conclusions] Burst words help detect official events effectively from official Weibo profiles.
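The reported F value can be checked against the reported precision and recall. The sketch below assumes the F value is the balanced F1 score (the harmonic mean of precision and recall); the abstract does not state which F-measure was used, and the function name `f_measure` and its `beta` parameter are illustrative, not taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1).

    Assumption: the abstract's "F value" is the balanced F1 score.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when both scores are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values reported in the abstract, expressed as fractions.
reported_precision = 0.635
reported_recall = 0.855

f1 = f_measure(reported_precision, reported_recall)
print(f"{f1:.2f}")  # prints 0.73
```

Under this assumption, the computed value rounds to the 0.73 given in the abstract.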
Received: 05 April 2017
Published: 18 October 2017