Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (3): 121-131     https://doi.org/10.11925/infotech.2096-3467.2020.0743
Research Paper
Polarity Analysis of Dynamic Political Sentiments from Tweets with Deep Learning Method*
Chang Chengyang1, Wang Xiaodong1, Zhang Shenglei2
1College of Computer Science, National University of Defense Technology, Changsha 410073, China
2Staff of the Space Systems Department, Strategic Support Force, Beijing 100094, China
Abstract

[Objective] This paper analyzes how the polarity of U.S. politicians' political sentiment changes over time, based on their tweets from specific periods, to help intelligence analysts assess the direction of U.S. politics and the future of China-US relations. [Methods] We propose a framework that combines multiple deep learning models. We construct a dedicated tweet dataset for the target group, train a multi-class sentiment polarity classifier, and then introduce the time features of the tweets to obtain each politician's dynamic political sentiment polarity. [Results] Experiments on the constructed dataset of U.S. politicians' tweets confirm the effectiveness of the framework: the classifier reaches a validation accuracy of 80.66%, which is 8.07 percentage points higher than traditional artificial neural network methods. For 20 U.S. governors and senators, the sentiment polarity was judged correctly in 75% of cases. The dynamic analysis of individual politicians gives analysts useful intelligence support. [Limitations] The analysis depends on regularly updating and iterating the dataset; otherwise the accuracy and effectiveness of the model decline over time. Political sentiment polarity is also affected by many factors, and the sentiment expressed in a politician's tweets may differ from the political leaning they actually hold, which causes some misjudgment by the model. [Conclusions] The proposed method uses multiple deep learning techniques to help intelligence analysts obtain reasonably accurate dynamic political sentiment polarity from massive Twitter text data.

Keywords: Twitter; Dynamic Sentiment Analysis; Politician Group; Deep Learning; BERT
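Received: 2020-07-29      Published: 2020-11-24
Chinese Library Classification (ZTFLH): TP391
Funding: *Supported by the National Defense Science and Technology Key Laboratory Fund (6142110180405)
Corresponding author: Wang Xiaodong     E-mail: xdwang@nudt.edu.cn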
Cite this article:
Chang Chengyang, Wang Xiaodong, Zhang Shenglei. Polarity Analysis of Dynamic Political Sentiments from Tweets with Deep Learning Method[J]. Data Analysis and Knowledge Discovery, 2021, 5(3): 121-131.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2020.0743      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2021/V5/I3/121
Fig. 1  Political sentiment analysis framework
| Category | Politicians |
| --- | --- |
| Category 1: Pro-Trump | John Bolton, Donald Trump, Mike Pence, Robert O'Brien, Mike Pompeo, Steven Mnuchin |
| Category 2: Moderate opposition | Congressman Frank Pallone, Congressman Eric Swalwell, Senator Richard Blumenthal |
| Category 3: Direct competitors | Joe Biden, Congressman Adam Schiff, Senator Bernie Sanders, Speaker Nancy Pelosi |
| Category 4: Objective moderates | Governor Gretchen Whitmer, Senator Kamala Harris, Lawrence H. Summers, Governor Andrew Cuomo, Sally Yates, Senator Maria Cantwell, Senator Edward Markey, Senator Elizabeth Warren |
Table 1  Politicians selected for the dataset, by category
| Tweet text | Label |
| --- | --- |
| Enforcers must stop scammers and bottom feeders from exploiting COVID-19 and endangering health. False pitches and sky-high price hikes should be halted and prosecuted. | 1 |
| Enjoyed talking davidgura at Select USA summit. Tax reform trade and regulation rollback are critical to serve hardworking Americans. | 0 |
| President Trump may be a slick salesman who fooled many people in this country, but you didn’t fool me and you didn’t fool New Yorkers. | 3 |
| With respect, Mr. President, not sure we can rely on Mr. Manafort’s lawyer to tell us whether there was collusion, as unbiased as he may be. | 2 |
Table 2  Examples from the cleaned dataset
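| Subset | Category 1 | Category 2 | Category 3 | Category 4 | Total |
| --- | --- | --- | --- | --- | --- |
| Training set | 9,300 | 6,430 | 7,427 | 14,905 | 38,062 |
| Validation set | 1,861 | 1,286 | 1,486 | 2,981 | 7,614 |
| Test set | 1,862 | 1,286 | 1,486 | 2,982 | 7,616 |
| Total | 13,023 | 9,002 | 10,399 | 20,868 | 53,292 |
Table 3  Dataset size by category (unit: tweets)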
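
The counts in Table 3 correspond to roughly a 5:1:1 partition in which each subset keeps nearly the same class proportions. The following minimal Python sketch checks this from the reported numbers only; the variable and function names are illustrative and do not come from the paper, and nothing here shows how the authors actually performed the split.

```python
# Tweet counts per class (categories 1-4), copied from Table 3.
counts = {
    "train": [9300, 6430, 7427, 14905],
    "validation": [1861, 1286, 1486, 2981],
    "test": [1862, 1286, 1486, 2982],
}

# Size of the whole dataset across all subsets and classes.
total = sum(sum(row) for row in counts.values())


def split_share(name):
    """Fraction of the whole dataset that falls into one subset."""
    return sum(counts[name]) / total


def class_shares(name):
    """Per-class fraction inside one subset."""
    subset_total = sum(counts[name])
    return [n / subset_total for n in counts[name]]


for name in counts:
    shares = ", ".join(f"{s:.3f}" for s in class_shares(name))
    print(f"{name}: {split_share(name):.3f} of all tweets; class shares {shares}")
```

Near-identical class shares across the three subsets are what a stratified split would produce, although the page does not state which splitting procedure was used.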
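| Model | Validation accuracy | Loss | F1 | Training time (3 epochs) | Model size (last saved checkpoint) |
| --- | --- | --- | --- | --- | --- |
| CNN | 63.79% | 0.8271 | 62.35% | 12 min 19 s | 54.3 MB |
| C-LSTM | 67.56% | 0.6645 | 66.87% | 14 min 15 s | 69.3 MB |
| Bi-LSTM | 72.59% | 0.7365 | 71.83% | 13 min 42 s | 55.1 MB |
| BERT | 80.66% | 0.6282 | 79.34% | 52 min 23 s | 1.22 GB |
Table 4  Experimental results of the different classifiers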
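
The 8.07% improvement quoted in the abstract matches the gap between BERT and the strongest baseline in Table 4 (Bi-LSTM), which suggests it is a difference in percentage points and not a relative gain. A small Python sketch of that arithmetic, using only the reported validation accuracies (the helper name `gain_over` is illustrative):

```python
# Validation accuracy (%) per model, copied from Table 4.
val_accuracy = {"CNN": 63.79, "C-LSTM": 67.56, "Bi-LSTM": 72.59, "BERT": 80.66}


def gain_over(baseline, model="BERT"):
    """Accuracy gain of `model` over `baseline`, in percentage points."""
    return round(val_accuracy[model] - val_accuracy[baseline], 2)


for name in ("CNN", "C-LSTM", "Bi-LSTM"):
    print(f"BERT vs {name}: +{gain_over(name):.2f} percentage points")
```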
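| Role | Count | Correct | Incorrect |
| --- | --- | --- | --- |
| Senator | 10 | 7 | 3 |
| Governor | 10 | 8 | 2 |
| Total | 20 | 15 | 5 |
Table 5  Classification results for 20 senators and governors of unknown political sentiment polarity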
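| Name | Period | Office | P(Category 1) | P(Category 2) | P(Category 3) | P(Category 4) |
| --- | --- | --- | --- | --- | --- | --- |
| Phil Murphy | 2020-02-05 to 2020-05-24 | Governor of New Jersey | 0.098 | 0.454 | 0.019 | 0.428 |
| Richard Mike DeWine | 2019-12-20 to 2020-05-24 | Governor of Ohio | 0.357 | 0.034 | 0.019 | 0.590 |
| John Carney | 2016-06-07 to 2015-05-24 | Governor of Delaware | 0.179 | 0.214 | 0.050 | 0.557 |
| Eric Brakey | 2018-08-26 to 2020-05-23 | Senator in Maine | 0.347 | 0.131 | 0.220 | 0.302 |
| Laura Kelly | 2019-01-14 to 2020-05-22 | Governor of Kansas | 0.434 | 0.016 | 0.090 | 0.460 |
Table 6  Classification results for the tweets of 5 U.S. politicians of unknown political leaning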
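
One plausible way to read Table 6 is to take the category with the highest probability as the classifier's decision. The Python sketch below does this under that assumption (the page does not spell out the decision rule). The short category names are paraphrases of Table 1, and the `margin` helper is an illustrative addition, not part of the paper, that shows how close the top two categories are.

```python
# Short category names paraphrased from Table 1 (illustrative wording).
CATEGORIES = {
    1: "pro-Trump",
    2: "moderate opposition",
    3: "direct competitor",
    4: "objective moderate",
}

# Probabilities for categories 1-4, copied from Table 6.
probs = {
    "Phil Murphy": [0.098, 0.454, 0.019, 0.428],
    "Richard Mike DeWine": [0.357, 0.034, 0.019, 0.590],
    "John Carney": [0.179, 0.214, 0.050, 0.557],
    "Eric Brakey": [0.347, 0.131, 0.220, 0.302],
    "Laura Kelly": [0.434, 0.016, 0.090, 0.460],
}


def predict(p):
    """Return (category number, probability) of the most likely category.

    ASSUMPTION: the decision rule is a plain argmax over the four classes.
    """
    best = max(range(len(p)), key=p.__getitem__)
    return best + 1, p[best]


def margin(p):
    """Gap between the two highest probabilities (hypothetical helper)."""
    top, second = sorted(p, reverse=True)[:2]
    return top - second


for name, p in probs.items():
    cat, prob = predict(p)
    print(f"{name}: category {cat} ({CATEGORIES[cat]}), "
          f"p={prob:.3f}, margin={margin(p):.3f}")
```

Small margins, as for Phil Murphy, Eric Brakey and Laura Kelly, mark cases where the top two categories are nearly tied and an analyst may want to inspect the underlying tweets.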
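| Name | Party | Office | Main political views | Attitude toward the Trump administration | Intelligence analysts' summary |
| --- | --- | --- | --- | --- | --- |
| Phil Murphy | Democratic | Governor of New Jersey | Sympathetic toward Asian Americans and people of color; sided with Trump on the pandemic response and thanked the Trump administration for its substantial support | Neutral | A moderate opponent. Despite the party difference, he follows the Trump administration in most cases and supports its decisions. The classifier also places him in this category |
| Richard Mike DeWine | Republican | Governor of Ohio | Unlike most Republicans, supports gun control, but opposes same-sex marriage and abortion; active in COVID-19 prevention and control | Supportive | Differs markedly from typical far-right Republicans. Although he supports the Trump administration, he diverges from it on many positions. The classifier output reflects the same judgment |
| John Carney | Democratic | Governor of Delaware | Supports protecting the rights of people of color, stresses racial equality, supports gun control, and actively organized the pandemic response | Strongly opposed | Has opposed the Trump administration since it took office. The classifier result agrees |
| Eric Brakey | Republican | Senator in Maine | A traditional Republican senator with strong anti-China leanings | Supportive | Very strong anti-China leanings; strongly agrees with the Trump administration's main political views. The classifier result deviates somewhat from reality, although the probability for label 0 reaches 0.347, which is already very high |
| Laura Kelly | Democratic | Governor of Kansas | A traditional Democratic governor who supports universal health care and actively fights COVID-19; her state is a traditional Republican stronghold and received substantial federal medical resources | Opposed | A Democrat who sharply criticized the Trump administration over the pandemic response. The classifier nevertheless gave the opposite conclusion, possibly because her tweets include many messages thanking the Trump administration for relief supplies and so do not reflect her true political orientation |
Table 7  Political sentiment polarity analysis of 5 U.S. politicians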
| Posting date | Tweet text |
| --- | --- |
| 2020-05-21 | By withdrawing from the Open Skies Treaty, Pres. Trump is barreling down a path that makes us less secure,... I urge the President to reverse this reckless decision. |
| 2020-05-14 | ...With limited supplies, I'm calling on the Trump administration to be transparent with the American people about how this drug will be distributed. |
Table 8  Selected tweets by Senator Shaheen
Fig. 2  Dynamic political sentiment analysis results, with March 20, 2020 as the special-event time node
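
The idea behind a dynamic analysis around an event date, as in the Fig. 2 caption, can be sketched by averaging per-tweet class probabilities before and after that date and comparing the two averages. This is only an illustration under stated assumptions: the page does not describe how the authors combine time features with classifier output, the per-tweet probabilities below are HYPOTHETICAL values invented for the example, and the names `mean_probs` and `polarity_shift` are not from the paper.

```python
from datetime import date

# Event date named in the Fig. 2 caption.
EVENT = date(2020, 3, 20)

# HYPOTHETICAL classifier outputs: (posting date, probabilities for
# categories 1-4). These numbers are made up for illustration only.
tweets = [
    (date(2020, 3, 2), [0.10, 0.50, 0.10, 0.30]),
    (date(2020, 3, 15), [0.20, 0.40, 0.10, 0.30]),
    (date(2020, 3, 25), [0.40, 0.20, 0.10, 0.30]),
    (date(2020, 4, 10), [0.50, 0.10, 0.10, 0.30]),
]


def mean_probs(rows):
    """Element-wise mean of a non-empty list of probability vectors."""
    if not rows:
        raise ValueError("need at least one tweet in the window")
    return [sum(col) / len(rows) for col in zip(*rows)]


def polarity_shift(tweets, event):
    """Mean class probabilities before and after `event`, plus the change."""
    before = mean_probs([p for d, p in tweets if d < event])
    after = mean_probs([p for d, p in tweets if d >= event])
    shift = [a - b for a, b in zip(after, before)]
    return before, after, shift


before, after, shift = polarity_shift(tweets, EVENT)
for k, (b, a, s) in enumerate(zip(before, after, shift), start=1):
    print(f"category {k}: before={b:.2f} after={a:.2f} change={s:+.2f}")
```

A two-window average is the simplest possible design; a sliding window or per-week aggregation would give a finer time series at the cost of fewer tweets per point.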
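Copyright © 2015 Editorial Office of Data Analysis and Knowledge Discovery
Address: No. 33 Beisihuan Xilu, Zhongguancun, Haidian District, Beijing 100190, China
Tel/Fax: (010)82626611-6626, 82624938
E-mail: jishu@mail.las.ac.cn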