Identifying Clickbait with BERT-BiGA Model
Yin Pengbo, Pan Weimin, Zhang Haijun, Chen Degang
College of Computer Science and Technology, Xinjiang Normal University, Urumqi 830054, China
Abstract [Objective] This paper proposes an algorithm that combines BiGRU and an attention mechanism on top of the Chinese BERT model, aiming to identify clickbait from online news titles. [Methods] First, we used the pre-trained Chinese BERT model as the text encoder. Then, we extracted text features through a fused attention mechanism and used BiGRU to model news titles and contents. Finally, we identified clickbait based on the semantic correlation between title and content. [Results] This method addresses the complex feature engineering and secondary error amplification found in text similarity calculation. The recognition accuracy was 81%, and a browser plug-in was developed to detect clickbait. [Limitations] The proposed model examines only news titles and contents; it does not include pageviews, likes, or comments in the calculation. [Conclusions] The new method, whose recall is 4% higher than that of existing methods, can effectively identify clickbait in online news.
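The idea of scoring a headline by its semantic correlation with the article body can be illustrated with a small, dependency-free sketch. This is not the BERT-BiGA model: the token vectors are toy values standing in for encoder outputs, the BiGRU layer is omitted, and the attention-weighted pooling, the cosine score, and all function names (`attention_pool`, `title_body_score`, and so on) are assumptions chosen for illustration only.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_vec(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def attention_pool(token_vecs, query):
    """Sum token vectors weighted by softmax(dot(token, query))."""
    weights = softmax([dot(t, query) for t in token_vecs])
    dim = len(token_vecs[0])
    return [sum(w * t[i] for w, t in zip(weights, token_vecs)) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot(u, v) / (nu * nv)


def title_body_score(title_vecs, body_vecs):
    """Hypothetical correlation score in [-1, 1] between a title and a body.

    Each side is pooled with attention whose query is the mean vector of
    the other side, then the two pooled vectors are compared by cosine.
    """
    title_repr = attention_pool(title_vecs, mean_vec(body_vecs))
    body_repr = attention_pool(body_vecs, mean_vec(title_vecs))
    return cosine(title_repr, body_repr)


# Toy 2-dimensional "embeddings" (assumed values, not real encoder output).
matching = title_body_score([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
unrelated = title_body_score([[1.0, 0.0]], [[0.0, 1.0]])
```

In this toy setting a low score would flag a headline whose content diverges from the article. The model described in the abstract instead learns the decision from BERT-encoded text passed through attention and BiGRU layers.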
Received: 29 January 2021
Published: 06 July 2021
Fund: National Natural Science Foundation of China-Xinjiang Joint Fund (U1703261)
Corresponding Authors:
Pan Weimin
E-mail: panweiminss@163.com