Data Analysis and Knowledge Discovery, 2017, Vol. 1, Issue 7: 44-51     https://doi.org/10.11925/infotech.2096-3467.2017.0479
Special Issue of the First "Data Analysis and Knowledge Discovery" Symposium (I)
Detecting Online Rumors with Sentiment Analysis*
Shou Huanrong, Deng Shuqing, Xu Jian
School of Information Management, Sun Yat-Sen University, Guangzhou 510006, China
Abstract

[Objective] This paper proposes a method that uses sentiment analysis to identify rumors in specific domains automatically.
[Methods] We define high-quality and low-quality information sources and assume that information from high-quality sources is more reliable. With a sentiment-lexicon-based sentiment analysis method, we quantify how far the sentiment that low-quality sources express toward a given object differs from the sentiment that high-quality sources express toward it. Information from a low-quality source is classified as a rumor when this difference exceeds a pre-set threshold.
[Results] We applied the method to rumor identification in two domains, "food and wellness" and "medicine and health", and correctly classified 23 of 30 suspected rumor cases, an accuracy of 76.67%. For rumor prediction, the F-value was 83.34%, recall 71.42% and precision 100%. For non-rumor texts, the F-value was 72.73%, recall 100% and precision 57.14%.
[Limitations] We did not extract data from the different sources automatically, and the number of rumors collected manually for each case was limited.
[Conclusions] The proposed sentiment analysis-based method is effective for identifying specific types of rumors.

Key words: Sentiment Analysis; Sentiment Lexicon; Rumor Identification; Rumor Detection
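The decision rule in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation. It assumes that every text has already received a numeric sentiment score for the target object. It also assumes that the mean of the high-quality scores serves as the reliable reference and that "difference" means absolute difference. The function name `is_rumor` and the threshold values are hypothetical.

```python
from statistics import mean


def is_rumor(low_quality_score: float,
             high_quality_scores: list[float],
             threshold: float) -> bool:
    """Flag a low-quality-source claim as a rumor when its sentiment toward
    the target object diverges from the high-quality (reliable) reference.

    Hypothetical assumptions, for illustration only:
    - the reliable reference is the mean of the high-quality scores;
    - divergence is the absolute difference of sentiment values;
    - `threshold` is a pre-set, tuned constant.
    """
    if not high_quality_scores:
        raise ValueError("need at least one high-quality source score")
    reference = mean(high_quality_scores)
    return abs(low_quality_score - reference) > threshold


if __name__ == "__main__":
    # Hypothetical scores: a low-quality post praises an item that
    # high-quality sources describe neutrally or negatively.
    print(is_rumor(4.0, [-1.0, 0.5, -0.5], threshold=2.0))
    print(is_rumor(0.5, [0.0, 1.0], threshold=2.0))
```

Averaging the high-quality scores is the simplest choice. A median would be less sensitive to a single outlying "reliable" source.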
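Received: 2017-05-27      Published: 2017-07-12
CLC number: G350
Funding: *This work is one of the research outcomes of the National Social Science Fund of China project "Research on Sentiment Analysis of User Reviews and Its Application in Competitive Intelligence Services" (Grant No. 11CTQ022) and the Guangdong Provincial Science and Technology Special Project "Content-Based Scientific Literature Analysis Service Platform" (Grant No. 2016B030303003).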
Cite this article:
Shou Huanrong, Deng Shuqing, Xu Jian. Detecting Online Rumors with Sentiment Analysis. Data Analysis and Knowledge Discovery, 2017, 1(7): 44-51.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2017.0479      or      http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2017/V1/I7/44
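Figure: Overall framework of the sentiment analysis-based rumor identification method
Figure: Process for computing the sentiment value of a suspected rumor text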
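Computing the sentiment value of a suspected rumor text with a sentiment lexicon can be approximated by a simple lookup. The sketch assumes the text has already been segmented into words, which Chinese requires as a separate first step. It also assumes that a negation word flips the sign of the next sentiment word and that a degree adverb scales it. The three dictionaries are tiny hypothetical stand-ins and not the lexicon used in the paper.

```python
# Hypothetical miniature lexicons; a real system would load a full,
# published sentiment lexicon with many thousands of entries.
POLARITY = {"有效": 1.0, "健康": 1.0, "致癌": -2.0, "有害": -1.5}  # effective, healthy, carcinogenic, harmful
NEGATIONS = {"不", "没有", "并非"}                                  # not, without, is not
DEGREE = {"非常": 2.0, "很": 1.5, "稍微": 0.5}                      # extremely, very, slightly


def sentiment_value(tokens: list[str]) -> float:
    """Sum lexicon polarities over pre-segmented tokens.

    Negation words and degree adverbs accumulate until the next sentiment
    word, which they flip or scale; they are then reset.
    """
    total = 0.0
    sign, weight = 1.0, 1.0
    for tok in tokens:
        if tok in NEGATIONS:
            sign = -sign
        elif tok in DEGREE:
            weight *= DEGREE[tok]
        elif tok in POLARITY:
            total += sign * weight * POLARITY[tok]
            sign, weight = 1.0, 1.0  # modifiers apply to one sentiment word only
    return total


if __name__ == "__main__":
    print(sentiment_value(["喝", "咖啡", "非常", "健康"]))  # "drinking coffee is extremely healthy"
    print(sentiment_value(["隔夜菜", "并非", "致癌"]))       # "overnight leftovers are not carcinogenic"
```

Scores from this kind of function are what a comparison between high- and low-quality sources would consume.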
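Table: Rumor identification results

| | Actually a rumor | Actually not a rumor |
|---|---|---|
| Predicted as rumor | 15 | 0 |
| Predicted as not a rumor | 6 | 8 |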
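Table: Contingency table for evaluating rumor detection classification performance

| | Actually a rumor | Actually not a rumor |
|---|---|---|
| Predicted as rumor | A | B |
| Predicted as not a rumor | C | D |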
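The per-class metrics can be written in terms of the cells A to D of the contingency table. The block below assumes the standard definitions of precision, recall and F1, with subscript r for the rumor class and n for the non-rumor class:

```latex
\begin{aligned}
P_r &= \frac{A}{A+B}, & R_r &= \frac{A}{A+C}, & F1_r &= \frac{2P_rR_r}{P_r+R_r},\\
P_n &= \frac{D}{C+D}, & R_n &= \frac{D}{B+D}, & F1_n &= \frac{2P_nR_n}{P_n+R_n},\\
\text{Accuracy} &= \frac{A+D}{A+B+C+D}.
\end{aligned}
```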
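Table: Rumor detection classification performance results

| Metric | Value |
|---|---|
| Pr (precision, rumors) | 100% |
| Rr (recall, rumors) | 71.42% |
| F1r (F1, rumors) | 83.34% |
| Pn (precision, non-rumors) | 57.14% |
| Rn (recall, non-rumors) | 100% |
| F1n (F1, non-rumors) | 72.73% |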
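The metrics can be recomputed from the counts in the rumor identification results table. The sketch assumes the standard per-class definitions. The class and function names are illustrative, and division by an empty row or column returns 0.0 rather than raising an error.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    a: int  # predicted rumor, actually a rumor
    b: int  # predicted rumor, actually not a rumor
    c: int  # predicted not a rumor, actually a rumor
    d: int  # predicted not a rumor, actually not a rumor


def ratio(num: float, den: float) -> float:
    # Guard against empty rows/columns instead of raising ZeroDivisionError.
    return num / den if den else 0.0


def f1(p: float, r: float) -> float:
    return ratio(2 * p * r, p + r)


def metrics(m: Confusion) -> dict[str, float]:
    """Per-class precision, recall and F1, plus overall accuracy."""
    pr, rr = ratio(m.a, m.a + m.b), ratio(m.a, m.a + m.c)
    pn, rn = ratio(m.d, m.c + m.d), ratio(m.d, m.b + m.d)
    return {
        "Pr": pr, "Rr": rr, "F1r": f1(pr, rr),
        "Pn": pn, "Rn": rn, "F1n": f1(pn, rn),
        "Accuracy": ratio(m.a + m.d, m.a + m.b + m.c + m.d),
    }


if __name__ == "__main__":
    # Counts from the rumor identification results table.
    for name, value in metrics(Confusion(a=15, b=0, c=6, d=8)).items():
        print(f"{name}: {value:.2%}")
```

Recomputed this way, Rr and F1r come out at 71.43% and 83.33%. The reported values, 71.42% and 83.34%, differ only in the last digit. The four cells sum to 29 cases, which gives an accuracy of 23/29 ≈ 79.31%. The 76.67% reported in the abstract corresponds to 23 correct out of 30 suspected cases.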