Data Analysis and Knowledge Discovery  2023, Vol. 7 Issue (4): 101-113     https://doi.org/10.11925/infotech.2096-3467.2022.0379
Research Article
Method for Automatically Generating Online Comments
Liu Xinran1,2, Xu Yabin1,2, Li Jixian3
1Beijing Key Laboratory of Network Culture and Digital Communication, Beijing University of Information Science and Technology, Beijing 100101, China
2School of Computer Science, Beijing University of Information Science and Technology, Beijing 100101, China
3School of Humanities and Education, Beijing Open University, Beijing 100081, China

Abstract

[Objective] This paper proposes a Temporal Sequence Generative Adversarial Network (T-SeqGAN) for automatically generating online comments, aiming to counteract malicious information on social networks and guide public opinion in the correct direction. [Methods] First, we modified the generator of the Sequence Generative Adversarial Network (SeqGAN) into a Seq2Seq structure, with a bidirectional gated recurrent unit (BiGRU) and a temporal convolutional network (TCN) as the backbone networks of the encoder and decoder, respectively, to improve the similarity of the generated posts to real online comments in word order and semantic features. Then, we modified the discriminator of SeqGAN into a model combining a TCN with an attention layer to improve the fluency of the generated posts. [Results] Compared with the baseline models, the comments generated by the proposed model achieve higher BLEU-2 (0.799 35), BLEU-3 (0.603 96), BLEU-4 (0.476 42), and KenLM (-27.670 29) scores, as well as a lower PPL (0.752 47) score. [Limitations] The vocabulary and language style of the generated posts are constrained by the existing real posts, which limits the applicable scenarios of the method. [Conclusions] The comments generated by the proposed model have higher word-order and grammatical correctness and higher content similarity to real posts, and can help guide public opinion on social networks in the correct direction.
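For readers who want a concrete picture of the architecture described above, the following is a minimal PyTorch sketch of a Seq2Seq generator with a BiGRU encoder and a TCN (dilated causal convolution) decoder, together with a TCN-plus-attention discriminator. It only illustrates the general design named in the abstract; the layer sizes, the mean-pooled context fusion, and all other details are assumptions rather than the authors' implementation.

```python
# Minimal sketch (not the authors' code) of the architecture described above:
# a Seq2Seq generator whose encoder backbone is a BiGRU and whose decoder
# backbone is a TCN-style stack of dilated causal convolutions, plus a
# discriminator that combines a TCN with an attention layer. All layer sizes,
# the mean-pooled context fusion, and other details are illustrative assumptions.
import torch
import torch.nn as nn


class CausalConv1d(nn.Module):
    """1-D convolution padded on the left so position t only sees inputs <= t."""

    def __init__(self, channels, kernel_size=3, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                                    # x: (batch, channels, seq_len)
        return torch.relu(self.conv(nn.functional.pad(x, (self.pad, 0))))


class Generator(nn.Module):
    """Seq2Seq generator: BiGRU encoder + causal-convolution (TCN) decoder."""

    def __init__(self, vocab_size, emb_dim=128, hid_dim=128, n_blocks=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, bidirectional=True, batch_first=True)
        self.bridge = nn.Linear(2 * hid_dim, emb_dim)        # fuse encoder context
        self.decoder = nn.Sequential(
            *[CausalConv1d(emb_dim, dilation=2 ** i) for i in range(n_blocks)]
        )
        self.out = nn.Linear(emb_dim, vocab_size)

    def forward(self, src_ids, tgt_ids):
        enc_out, _ = self.encoder(self.embed(src_ids))             # (B, S, 2*hid)
        context = self.bridge(enc_out.mean(dim=1, keepdim=True))   # (B, 1, emb)
        dec_in = self.embed(tgt_ids) + context                     # broadcast context over steps
        h = self.decoder(dec_in.transpose(1, 2)).transpose(1, 2)   # causal TCN stack
        return self.out(h)                                         # (B, T, vocab) logits


class Discriminator(nn.Module):
    """TCN feature extractor followed by attention pooling and a real/fake score."""

    def __init__(self, vocab_size, emb_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.tcn = nn.Sequential(CausalConv1d(emb_dim, dilation=1),
                                 CausalConv1d(emb_dim, dilation=2))
        self.attn = nn.Linear(emb_dim, 1)
        self.cls = nn.Linear(emb_dim, 1)

    def forward(self, ids):
        h = self.tcn(self.embed(ids).transpose(1, 2)).transpose(1, 2)   # (B, T, emb)
        w = torch.softmax(self.attn(h), dim=1)               # attention weights over steps
        pooled = (w * h).sum(dim=1)                          # weighted sequence summary
        return torch.sigmoid(self.cls(pooled)).squeeze(-1)   # probability of "real"
```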

Key words: Social Network Comment Posts    SeqGAN    TCN    Seq2Seq
Received: 2022-04-21      Published: 2023-06-07
CLC Number:  TP391
Fund projects: *National Natural Science Foundation of China (61672101); Open Project of the Beijing Key Laboratory of Network Culture and Digital Communication (ICCD XN004); Open Project of the Key Laboratory of Information Network Security, Ministry of Public Security (C18601)
Corresponding author: Xu Yabin, ORCID: 0000-0003-2727-3773, E-mail: xyb@bistu.edu.cn
Cite this article:
Liu Xinran, Xu Yabin, Li Jixian. Method for Automatically Generating Online Comments. Data Analysis and Knowledge Discovery, 2023, 7(4): 101-113.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2022.0379      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2023/V7/I4/101
Fig.1  Overall architecture of the T-SeqGAN model
Fig.2  Encoder structure
Fig.3  Decoder structure
Fig.4  Discriminator structure
Generation Model BLEU-2 BLEU-3 BLEU-4 PPL KenLM
NMT 0.701 15 0.530 68 0.418 85 0.781 67 -25.576 32
CNN2CNN 0.693 00 0.476 64 0.328 88 0.791 53 -32.772 72
VAE 0.585 18 0.208 87 0.098 22 0.844 17 -30.045 38
T2T 0.750 25 0.549 47 0.415 34 0.772 86 -31.458 31
B2T 0.795 99 0.601 05 0.474 88 0.753 72 -28.369 01
Table 1  Evaluation metrics of posts generated by the Seq2Seq models
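For reference, the BLEU-2/3/4 values reported in Table 1 (and Table 2 below) can in principle be computed with standard tooling such as NLTK, as in the hypothetical sketch below; the example sentences are placeholders, not the paper's data, and the PPL and KenLM columns require separately trained language models that are not reproduced here.

```python
# Minimal sketch of how corpus-level BLEU-2/3/4 scores of generated posts
# against reference posts can be computed with NLTK. The tokenized sentences
# below are toy placeholders, not the paper's data.
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

references = [[["the", "service", "was", "fast", "and", "friendly"]]]   # one reference list per candidate
candidates = [["the", "service", "was", "quick", "and", "friendly"]]    # generated posts (tokenized)

smooth = SmoothingFunction().method1
for n, weights in [(2, (0.5, 0.5)), (3, (1 / 3,) * 3), (4, (0.25,) * 4)]:
    score = corpus_bleu(references, candidates, weights=weights, smoothing_function=smooth)
    print(f"BLEU-{n}: {score:.5f}")
```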
Fig.5  Loss values of T-SeqGAN adversarial training
Fig.6  Evaluation metrics at each epoch of T-SeqGAN adversarial training
Generation Model BLEU-2 BLEU-3 BLEU-4 PPL KenLM
SeqGAN1 0.795 93 0.601 69 0.475 96 0.753 30 -28.389 92
SeqGAN2 0.795 70 0.600 78 0.475 32 0.753 65 -28.411 49
SeqGAN3 0.792 31 0.597 86 0.476 20 0.755 88 -28.694 24
T-SeqGAN 0.799 35 0.603 96 0.476 42 0.752 47 -27.670 29
Table 2  Evaluation metrics of posts generated by the SeqGAN models
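All SeqGAN variants compared in Table 2 share the same adversarial training scheme, in which the generator is treated as a policy and the discriminator's judgment supplies the reward. The sketch below is a deliberately simplified, self-contained illustration of that policy-gradient step under stated assumptions; it is not the authors' training code.

```python
# Simplified sketch of a SeqGAN-style policy-gradient update: the generator
# samples a post token by token, the discriminator's score on the finished
# sequence serves as the reward, and the generator is updated with REINFORCE.
# SeqGAN proper uses Monte Carlo rollouts for per-token rewards; that
# refinement is omitted here, and all sizes are illustrative assumptions.
import torch
import torch.nn as nn
from torch.distributions import Categorical

VOCAB, EMB, HID, MAX_LEN, BATCH = 1000, 64, 64, 20, 8


class ToyGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB)
        self.gru = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def sample(self):
        """Autoregressively sample sequences; return token ids and their log-probs."""
        tok = torch.zeros(BATCH, 1, dtype=torch.long)        # <bos> assumed to be id 0
        h, toks, logps = None, [], []
        for _ in range(MAX_LEN):
            o, h = self.gru(self.embed(tok), h)
            dist = Categorical(logits=self.out(o[:, -1]))
            tok = dist.sample().unsqueeze(1)
            toks.append(tok)
            logps.append(dist.log_prob(tok.squeeze(1)))
        return torch.cat(toks, dim=1), torch.stack(logps, dim=1)


generator = ToyGenerator()
discriminator = nn.Sequential(nn.Embedding(VOCAB, EMB), nn.Flatten(),
                              nn.Linear(EMB * MAX_LEN, 1), nn.Sigmoid())
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)

seqs, logps = generator.sample()
reward = discriminator(seqs).squeeze(-1).detach()            # sequence-level reward
baseline = reward.mean()                                     # simple variance-reduction baseline
g_loss = -((reward - baseline).unsqueeze(1) * logps).mean()  # REINFORCE objective
g_opt.zero_grad()
g_loss.backward()
g_opt.step()
```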