Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (6): 95-104    DOI: 10.11925/infotech.2096-3467.2021.0916
Abstracting Interactive Contents from New Media for Government Affairs Based on Topic Clustering
Hu Jiming1,2, Zheng Xiang1
1School of Information Management, Wuhan University, Wuhan 430072, China
2Information Retrieval and Knowledge Mining Laboratory, Wuhan University, Wuhan 430072, China
Abstract  

[Objective] This paper summarizes interactive content from new media for government affairs using topic clustering, aiming to help the government manage public opinion events effectively. [Methods] First, we analyzed the textual features of the interactive content. Then, we generated summaries of the content with the Top2Vec, TextRank and Transformer-Copy algorithms. [Results] The proposed model reached ROUGE-1, ROUGE-2 and ROUGE-L values of 22.05%, 6.93% and 20.96%, respectively, outperforming the Seq2Seq and Seq2Seq-Attention models. [Limitations] We evaluated the new model only on interactive content about 10 draft laws and regulations from Sina Microblog. [Conclusions] The proposed method can summarize the topics and public opinion surrounding specific events.

Key words: New Media; Government Affairs Interactive Content; Summarization Generation; Topic Clustering
Received: 27 August 2021      Published: 28 July 2022
CLC Number (ZTFLH): G206
Fund: National Natural Science Foundation of China (71874125); Young Top-notch Talent Cultivation Program of Hubei Province
Corresponding Authors: Hu Jiming     E-mail: hujiming@whu.edu.cn

Cite this article:

Hu Jiming, Zheng Xiang. Abstracting Interactive Contents from New Media for Government Affairs Based on Topic Clustering. Data Analysis and Knowledge Discovery, 2022, 6(6): 95-104.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2021.0916     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2022/V6/I6/95

The Model of Interactive Content Summarization Generation
Parameter | Value
Batch_Size | 32
Maximum input text length | 150
Maximum output summary length | 25
Epoch | 20
Gpu_Nums | 1
Beam_Size | 5
Dropout | 0.1
Warmup_Steps | 4,000
Num_Blocks | 6
Num_Heads | 8
Parameter Setting of Interactive Content Summarization Generation
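The parameter settings above could be collected in a single configuration object along the following lines. This is only a sketch: the class name `SummarizerConfig`, the field names and the `truncate` helper are hypothetical and are not taken from the paper's code; only the values come from the table. The table does not state whether the two length limits count characters or tokens, so the helper simply clips a sequence.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SummarizerConfig:
    """Values copied from the parameter table; names are illustrative only."""
    batch_size: int = 32
    max_input_length: int = 150    # maximum input text length
    max_summary_length: int = 25   # maximum output summary length
    epochs: int = 20
    gpu_nums: int = 1
    beam_size: int = 5             # beam width used when decoding
    dropout: float = 0.1
    warmup_steps: int = 4000       # learning-rate warm-up steps
    num_blocks: int = 6            # presumably stacked Transformer blocks
    num_heads: int = 8             # attention heads per block


def truncate(sequence, limit):
    """Clip a sequence (characters or tokens) to the configured length."""
    return sequence[:limit]


config = SummarizerConfig()
print(asdict(config))
```

Freezing the dataclass keeps the settings from being changed accidentally during a run, which makes an experiment easier to reproduce.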
ROUGE Changes of the Model
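As a rough illustration of the ROUGE-1, ROUGE-2 and ROUGE-L metrics reported in the abstract, the sketch below computes them for one candidate summary against one reference. It makes several assumptions that this page does not confirm: ROUGE-N is computed as recall, ROUGE-L as an F-measure with beta = 1, and the inputs are already tokenized lists. The function names are illustrative and do not come from the paper.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n):
    """ROUGE-N recall: matched n-grams divided by n-grams in the reference."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    return sum((cand & ref).values()) / total


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.0):
    """ROUGE-L F-measure built from LCS-based precision and recall."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)


candidate = "strongly support the revision".split()
reference = "support the revision of the law".split()
print(rouge_n(candidate, reference, 1))
print(rouge_n(candidate, reference, 2))
print(rouge_l(candidate, reference))
```

For Chinese text the token lists would typically be characters or segmented words. Which of the two the authors used is not stated here.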
Interactive Content Clustering of “Draft”
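Topic category | Number of interactive contents | Key sentences | Summary
5 | 1,786 | ... Society and families should rapidly improve the moral and character education of teenagers in the short term. At this stage, amending the law to lower the age of criminal liability is indeed necessary ... Juvenile crime should have been taken seriously long ago; family education is the key, and the guardians of minors should bear some legal responsibility ... | Value the moral and character education of minors
 |  | I feel crimes should simply be sentenced under criminal law, with no need to distinguish by age ... Suspects who exploit their young age to commit crimes are really hard to hold accountable ... Devils must not go unpunished just because they are young. | No need to distinguish crimes by age
8 | 20 | This step has finally been taken ... take responsibility for one's own actions. Strongly support; even minors must take responsibility for their own actions. | Finally revised, strongly support
9 | 447 | #Those aged 12 to 14 may bear criminal liability for crimes such as intentional homicide# Must support. This is progress; support. Strongly support. | Support the revision on intentional homicide by those aged 12 to 14
 |  | It really should be changed; juvenile victims have not been protected while juvenile offenders have always been protected, which is not logical. The age could be lowered further; children mature early nowadays. The age is still a bit high. | The law should be amended; the age of punishment can be lowered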
Part of Interactive Content Summarization of “Draft”
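The abstract names TextRank as one component of the method. A minimal sketch of TextRank-style key sentence ranking is given below, assuming that it is the step that selects key sentences within a topic cluster; the page itself does not spell this out. The similarity function follows the word-overlap measure from the original TextRank paper. The function names, damping factor of 0.85, iteration count and example comments are illustrative assumptions.

```python
import math


def similarity(a, b):
    """Word overlap between two token lists, normalised by their log lengths."""
    if not a or not b:
        return 0.0
    denominator = math.log(len(a)) + math.log(len(b))
    if denominator == 0:
        return 0.0  # both sentences contain a single token
    return len(set(a) & set(b)) / denominator


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences by weighted PageRank over the similarity graph."""
    n = len(sentences)
    weights = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Each sentence collects score from neighbours in proportion to edge weight.
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def key_sentences(sentences, top_k=1):
    """Return the top_k highest-scoring sentences in their original order."""
    scores = textrank(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:top_k])]


comments = [
    "strongly support the revision".split(),
    "support the revision minors must take responsibility".split(),
    "minors must take responsibility for their actions".split(),
    "nice weather today".split(),
]
print(key_sentences(comments, top_k=1))
```

A comment that shares words with many others in its cluster ranks highest, while an off-topic comment receives only the baseline score of 1 - damping.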