Data Analysis and Knowledge Discovery  2019, Vol. 3 Issue (10): 47-55    DOI: 10.11925/infotech.2096-3467.2018.1250
Research Paper
Detecting Twitter Rumors with Deep Transfer Network *
Kan Liu, Haochen Du
School of Information and Safety Engineering, Zhongnan University of Economics and Law, Wuhan 430073, China
Abstract

[Objective] This paper addresses the shortage of labeled data in some domains when online rumors are detected domain by domain, and helps build rumor detection models for domains that have no labeled data. [Methods] We propose a deep transfer network built on a Multi-BiLSTM network, to which we add the cross-domain distribution discrepancy measured by the MMD statistic. During training, the network learns the label loss on the source domain and the distribution discrepancy between domains at the same time, so that label information transfers effectively across domains. [Results] Compared with a rumor detection method that does not separate domains and a domain-specific method that does not use transfer learning, the proposed method improves F1 by 10.3% and 8.5%, respectively. [Limitations] Transfer is less effective between domains whose data distributions differ greatly, and rumor detection across multiple domains was not studied. [Conclusions] The proposed method effectively applies transfer learning to domain-specific rumor detection and offers a new approach to identifying online rumors.

Key words  Rumor Detection    Deep Transfer Network    Multi-BiLSTM (multi-layer bidirectional long short-term memory network)    Domain Adaptation    Twitter
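The training objective described in the abstract combines a label loss on the source domain with an MMD-based measure of the gap between source and target feature distributions. The following is only a minimal sketch of that idea, not the authors' implementation: it assumes a Gaussian (RBF) kernel with a fixed bandwidth, a biased estimate of squared MMD, and a simple weighted sum. The names `rbf_kernel`, `mean_kernel`, `mmd_squared`, `joint_loss`, and `transfer_weight` are hypothetical, and the feature vectors stand in for whatever representation the Multi-BiLSTM layers would produce.

```python
import math
import random


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel exp(-||x - y||^2 / (2 * bandwidth^2)) for two vectors."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(a, b, bandwidth=1.0):
    """Average kernel value over all pairs (x in a, y in b)."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in a for y in b)
    return total / (len(a) * len(b))


def mmd_squared(source, target, bandwidth=1.0):
    """Biased estimate of squared MMD between two batches of feature vectors."""
    return (
        mean_kernel(source, source, bandwidth)
        + mean_kernel(target, target, bandwidth)
        - 2.0 * mean_kernel(source, target, bandwidth)
    )


def joint_loss(label_loss, source_feats, target_feats, transfer_weight=0.5):
    """Assumed combination: source label loss plus weighted domain discrepancy."""
    return label_loss + transfer_weight * mmd_squared(source_feats, target_feats)


# Toy batches: the "target" batch is shifted, so its discrepancy should be larger.
random.seed(0)
source_batch = [[random.gauss(0, 1) for _ in range(4)] for _ in range(30)]
target_batch = [[random.gauss(3, 1) for _ in range(4)] for _ in range(30)]

print("MMD^2 (same batch):   ", mmd_squared(source_batch, source_batch))
print("MMD^2 (shifted batch):", mmd_squared(source_batch, target_batch))
print("joint loss:           ", joint_loss(0.7, source_batch, target_batch))
```

In a real network both terms would be computed on hidden-layer activations and minimized jointly by gradient descent; the pure-Python version here only illustrates how the discrepancy term behaves.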
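Received: 2018-11-09
Chinese Library Classification (CLC) number: TP393
Funding: *This work is one of the research outcomes of the National Social Science Fund of China project "Research on Early Judgment of Online Rumors Based on Text Mining" (14BXW033).
Corresponding author: Kan Liu     E-mail: liukan@zuel.edu.cn
Cite this article:
Kan Liu, Haochen Du. Detecting Twitter Rumors with Deep Transfer Network[J]. Data Analysis and Knowledge Discovery, 2019, 3(10): 47-55. DOI: 10.11925/infotech.2096-3467.2018.1250.
Link to this article: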
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2018.1250
Figure 1  Cross-domain rumor detection workflow
Figure 2  Multi-BiLSTM network
Figure 3  Schematic of domain adaptation
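| Feature name | Description |
| --- | --- |
| reg_period | Account age: time from the registration date to the data collection date, in months |
| followers_count | Number of the author's followers |
| friends_count | Number of the author's friends |
| listed_count | Number of accounts the author follows |
| statuses_count | Number of posts published by the author |
| Description | Length of the author's profile description |
| Sex | Author's gender |
| Location | User location |

Table 1  User features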
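| Feature name | Description |
| --- | --- |
| retweet_count | Number of retweets |
| retweet_favorite_count | Number of likes on retweets |
| retweet_comment_count | Number of comments on retweets |
| retweet_followers_count | Average number of followers of retweeting users |
| retweet_reg_period | Average account age of retweeting users |

Table 2  Propagation features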
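| Text domain | Rumor texts | Non-rumor texts |
| --- | --- | --- |
| Politics | 1,780 | 1,934 |
| News | 1,744 | 1,659 |
| Food | 562 | 676 |
| History | 488 | 440 |
| Business | 576 | 455 |

Table 3  Details of the Twitter dataset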
Figure 4  Text preprocessing that preserves special elements
Figure 5  F1 under different transfer constants
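| Target domain | Modeling method | Precision | Recall | F1 |
| --- | --- | --- | --- | --- |
| Food | Deep transfer learning (P>>F) | 0.814 | 0.822 | 0.818 |
| Food | Deep transfer learning (N>>F) | 0.869 | 0.872 | 0.870 |
| Food | SVM (supervised learning) | 0.881 | 0.875 | 0.878 |
| Food | LR (supervised learning) | 0.873 | 0.869 | 0.871 |
| Food | Multi-AN (unsupervised learning) | 0.802 | 0.801 | 0.801 |
| Food | RNN+AN (unsupervised learning) | 0.823 | 0.813 | 0.818 |
| History | Deep transfer learning (P>>H) | 0.874 | 0.869 | 0.871 |
| History | Deep transfer learning (N>>H) | 0.865 | 0.872 | 0.868 |
| History | SVM (supervised learning) | 0.903 | 0.890 | 0.896 |
| History | LR (supervised learning) | 0.873 | 0.865 | 0.869 |
| History | Multi-AN (unsupervised learning) | 0.869 | 0.871 | 0.870 |
| History | RNN+AN (unsupervised learning) | 0.882 | 0.862 | 0.872 |
| Business | Deep transfer network (P>>B) | 0.901 | 0.895 | 0.898 |
| Business | Deep transfer network (N>>B) | 0.904 | 0.884 | 0.894 |
| Business | SVM (supervised learning) | 0.913 | 0.906 | 0.909 |
| Business | LR (supervised learning) | 0.907 | 0.901 | 0.904 |
| Business | Multi-AN (unsupervised learning) | 0.824 | 0.819 | 0.821 |
| Business | RNN+AN (unsupervised learning) | 0.831 | 0.826 | 0.828 |

Table 5  Comparison with other learning methods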
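As a quick sanity check on Table 5, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes F1 for the six transfer rows (P>>F, N>>F, P>>H, N>>H, P>>B, N>>B) from the precision and recall values copied from the table. The function name `f1_score` and the dictionary `reported_rows` are illustrative, and because the published inputs are already rounded to three decimals, agreement is only expected to within about 0.001.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for the transfer rows of Table 5.
reported_rows = {
    "Food (P>>F)": (0.814, 0.822, 0.818),
    "Food (N>>F)": (0.869, 0.872, 0.870),
    "History (P>>H)": (0.874, 0.869, 0.871),
    "History (N>>H)": (0.865, 0.872, 0.868),
    "Business (P>>B)": (0.901, 0.895, 0.898),
    "Business (N>>B)": (0.904, 0.884, 0.894),
}

for label, (precision, recall, reported_f1) in reported_rows.items():
    recomputed = f1_score(precision, recall)
    gap = abs(recomputed - reported_f1)
    print(f"{label}: recomputed F1 = {recomputed:.4f}, reported = {reported_f1:.3f}, gap = {gap:.4f}")
```

The same check can be applied to the SVM, LR, Multi-AN, and RNN+AN rows by adding their values to `reported_rows`.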