Abstracting Interactive Contents from New Media for Government Affairs Based on Topic Clustering
Hu Jiming1,2, Zheng Xiang1
1School of Information Management, Wuhan University, Wuhan 430072, China
2Information Retrieval and Knowledge Mining Laboratory, Wuhan University, Wuhan 430072, China
Abstract [Objective] This paper summarizes interactive content from new media for government affairs using topic clustering, aiming to help the government manage public opinion events effectively. [Methods] First, we analyzed the textual features of the interactive content. Then, we generated summaries of the content with the Top2Vec, TextRank and Transformer-Copy algorithms. [Results] The proposed model achieved ROUGE-1, ROUGE-2 and ROUGE-L scores of 22.05%, 6.93% and 20.96%, respectively, outperforming the Seq2Seq and Seq2Seq-Attention models. [Limitations] We evaluated the new model only on interactive content about 10 draft laws and regulations from Sina Microblog. [Conclusions] The proposed method can summarize the topics and public opinion surrounding specific events.
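For readers unfamiliar with the reported metrics, the following is a minimal sketch of how ROUGE-N and ROUGE-L scores can be computed from a generated summary and a reference summary. It is an illustration only, not the evaluation code used in the paper: the function names (`ngrams`, `rouge_n`, `lcs_length`, `rouge_l`) are hypothetical, the F1 variant is an assumption (the abstract does not say whether recall or F1 is reported), and the whitespace-tokenized English sentences stand in for Chinese microblog text, which would normally be segmented into words or characters first.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n):
    """ROUGE-N F1 from clipped n-gram overlap between two token lists."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # Counter & Counter keeps the minimum counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence of two token lists."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


# Made-up example sentences, not data from the paper.
candidate = "the draft law protects personal data".split()
reference = "the law protects personal data well".split()

print(f"ROUGE-1: {rouge_n(candidate, reference, 1):.4f}")
print(f"ROUGE-2: {rouge_n(candidate, reference, 2):.4f}")
print(f"ROUGE-L: {rouge_l(candidate, reference):.4f}")
```

ROUGE-2 is typically much lower than ROUGE-1 and ROUGE-L, as in the scores reported above, because a bigram matches only when two adjacent tokens agree in both summaries.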
Received: 27 August 2021
Published: 28 July 2022
Fund: National Natural Science Foundation of China (71874125); Young Top-notch Talent Cultivation Program of Hubei Province
Corresponding Author: Hu Jiming, E-mail: hujiming@whu.edu.cn