Abstracting Interactive Contents from New Media for Government Affairs Based on Topic Clustering
Hu Jiming1,2, Zheng Xiang1
1 School of Information Management, Wuhan University, Wuhan 430072, China
2 Information Retrieval and Knowledge Mining Laboratory, Wuhan University, Wuhan 430072, China
[Objective] This paper summarizes interactive content from new media for government affairs using topic clustering, aiming to help the government manage public opinion events effectively. [Methods] We first analyzed the textual features of the interactive content, then generated summaries with the Top2Vec, TextRank, and Transformer-Copy algorithms. [Results] The proposed model reached ROUGE-1, ROUGE-2, and ROUGE-L scores of 22.05%, 6.93%, and 20.96%, respectively, outperforming the Seq2Seq and Seq2Seq-Attention models. [Limitations] The model was evaluated only on interactive content about 10 draft laws and regulations from Sina Microblog. [Conclusions] The proposed method can summarize the topics and public opinion surrounding specific events.
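For readers unfamiliar with the metrics reported above, the following is a minimal sketch of how ROUGE-N and ROUGE-L scores can be computed from token lists. It is not the authors' evaluation code. It assumes pre-tokenized input (for Chinese text, characters or segmented words) and reports a balanced F1 score; the abstract does not state whether recall or an F-measure was used, so that choice is an assumption. The function names `ngrams`, `rouge_n`, `lcs_length`, and `rouge_l` are illustrative.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n):
    """F1 over clipped n-gram overlap (assumption: balanced F1, beta = 1)."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """F1 based on the longest common subsequence (assumption: beta = 1)."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up example sentences, not data from the paper.
    candidate = "the public supports the draft law".split()
    reference = "the public broadly supports the new draft law".split()
    print(rouge_n(candidate, reference, 1))
    print(rouge_n(candidate, reference, 2))
    print(rouge_l(candidate, reference))
```

ROUGE-1 and ROUGE-2 reward shared unigrams and bigrams, while ROUGE-L rewards in-order matches that need not be contiguous, which is why a summary can score well on ROUGE-1 and ROUGE-L but much lower on ROUGE-2.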
Hu Jiming, Zheng Xiang. Abstracting Interactive Contents from New Media for Government Affairs Based on Topic Clustering[J]. Data Analysis and Knowledge Discovery, 2022, 6(6): 95-104.