Designing and Implementing Automatic Title Generation System for Sci-Tech Papers
Wang Yufei (1,2), Zhang Zhixiong (1,2; corresponding author), Zhao Yang (1,2), Zhang Mengting (1,2), Li Xuesi (1,2)
(1) National Science Library, Chinese Academy of Sciences, Beijing 100190, China
(2) Department of Information Resources Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
[Objective] This paper designs an automatic title generation system that works from the abstracts of Chinese sci-tech papers, aiming to help researchers compose better titles. [Methods] First, we constructed a large-scale training dataset from the CSCD (Chinese Science Citation Database). Then, we built a title generation model based on BERT-UniLM. Finally, we designed the system interface on the HTTP protocol so that other applications can call it openly. [Results] The implemented system generates appropriate titles for papers. [Limitations] Because the BERT model caps the maximum input length in tokens, the system automatically truncates abstracts that exceed this limit, which may degrade the generated titles. [Conclusions] This paper provides a convenient tool for researchers and literature services, and the approach can also support automatic title generation for other types of sci-tech documents.
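The [Methods] and [Limitations] parts of the abstract mention BERT-UniLM and the truncation of over-long abstracts. The following is a minimal sketch, assuming the standard UniLM sequence-to-sequence setup with the abstract as source and the title as target, of how one (abstract, title) pair could be packed into a single BERT input and how the matching attention mask is built. The 512-token limit is BERT-base's usual position-embedding limit and is assumed here; the 32-token title cap, keeping the head of the abstract, and the function names are hypothetical and not stated in the abstract. Token lists are plain character lists, a rough stand-in for a Chinese BERT tokenizer, which splits most Chinese text into single characters.

```python
from typing import List, Tuple

# Assumed limit: BERT-base checkpoints have 512 position embeddings, so the
# whole packed sequence (special tokens included) must fit in 512 tokens.
MAX_LEN = 512


def build_unilm_input(
    abstract_tokens: List[str],
    title_tokens: List[str],
    max_len: int = MAX_LEN,
    max_title_len: int = 32,  # hypothetical cap reserved for the title
) -> Tuple[List[str], List[int]]:
    """Pack an (abstract, title) pair as [CLS] abstract [SEP] title [SEP].

    The title is capped first; the abstract gets whatever budget remains and
    loses its tail if it is too long (the truncation the abstract describes).
    """
    if max_title_len + 3 >= max_len:
        raise ValueError("max_title_len leaves no room for the abstract")
    title_tokens = title_tokens[:max_title_len]
    budget = max_len - len(title_tokens) - 3  # 3 = [CLS] + two [SEP]
    abstract_tokens = abstract_tokens[:budget]  # keep the head, drop the tail
    tokens = ["[CLS]"] + abstract_tokens + ["[SEP]"] + title_tokens + ["[SEP]"]
    # Segment 0 = source (abstract part), segment 1 = target (title part).
    segment_ids = [0] * (len(abstract_tokens) + 2) + [1] * (len(title_tokens) + 1)
    return tokens, segment_ids


def seq2seq_attention_mask(segment_ids: List[int]) -> List[List[bool]]:
    """UniLM sequence-to-sequence self-attention mask.

    mask[i][j] is True when position i may attend to position j:
    - every position may attend to every source (segment 0) position;
    - a target (segment 1) position j is visible to row i only if j <= i,
      which makes the title part a left-to-right decoder.
    Source rows never see target columns because, by construction, all
    target positions come after all source positions.
    """
    n = len(segment_ids)
    return [[segment_ids[j] == 0 or j <= i for j in range(n)] for i in range(n)]


tokens, segments = build_unilm_input(list("abcd"), list("xy"), max_len=8, max_title_len=2)
mask = seq2seq_attention_mask(segments)
```

With this mask, a single BERT encoder reads the abstract bidirectionally while the title positions only see the abstract and the title prefix. During training the loss is taken on the target positions only; at inference the title is generated one token at a time under the same mask.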
Wang Yufei, Zhang Zhixiong, Zhao Yang, Zhang Mengting, Li Xuesi. Designing and Implementing Automatic Title Generation System for Sci-Tech Papers. Data Analysis and Knowledge Discovery, 2023, 7(2): 61-71.
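The abstract states that the system interface is built on HTTP so that other applications can call it. As a rough illustration only, here is one way such an endpoint could look using Python's standard library. The `/generate` path, the `abstract` and `title` JSON fields, and `generate_title` (a trivial placeholder for the trained BERT-UniLM model) are all hypothetical and are not the authors' actual interface.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def generate_title(abstract: str) -> str:
    """Hypothetical placeholder for the trained model: returns the
    abstract's first sentence, clipped to 30 characters."""
    first_sentence = abstract.replace("。", ".").split(".")[0]
    return first_sentence[:30].strip()


class TitleHandler(BaseHTTPRequestHandler):
    """Handles POST /generate with a JSON body such as {"abstract": "..."}."""

    def do_POST(self) -> None:
        if self.path != "/generate":
            self._send_json(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:  # malformed JSON or undecodable bytes
            payload = None
        abstract = payload.get("abstract") if isinstance(payload, dict) else None
        if not isinstance(abstract, str) or not abstract.strip():
            self._send_json(400, {"error": "expected a JSON object with a non-empty 'abstract'"})
            return
        self._send_json(200, {"title": generate_title(abstract)})

    def _send_json(self, status: int, body: dict) -> None:
        # ensure_ascii=False keeps Chinese titles readable in the response.
        data = json.dumps(body, ensure_ascii=False).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format: str, *args) -> None:
        pass  # silence the default per-request logging to stderr


def serve(host: str = "127.0.0.1", port: int = 8000) -> None:
    """Serve requests until interrupted (blocks the calling thread)."""
    with ThreadingHTTPServer((host, port), TitleHandler) as server:
        server.serve_forever()
```

Calling `serve()` starts the service on port 8000; a client then POSTs `{"abstract": "..."}` to `http://127.0.0.1:8000/generate` and receives `{"title": "..."}`. The standard library keeps the sketch dependency-free; a web framework such as Flask could fill the same role, with the placeholder replaced by a call into the loaded model.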