Designing and Implementing Automatic Title Generation System for Sci-Tech Papers
Wang Yufei¹,², Zhang Zhixiong¹,², Zhao Yang¹,², Zhang Mengting¹,², Li Xuesi¹,²
¹ National Science Library, Chinese Academy of Sciences, Beijing 100190, China
² Department of Information Resources Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
Abstract [Objective] This paper designs an automatic title generation system that takes the abstracts of Chinese sci-tech papers as input, aiming to help researchers compose better titles. [Methods] First, we constructed a large-scale training dataset from the CSCD database. Then, we built a title generation model based on BERT-UniLM. Finally, we exposed the system through an HTTP interface to enable open calls. [Results] The implemented system generates appropriate titles for articles. [Limitations] Because the BERT model imposes a maximum token length, the system automatically truncates abstracts that exceed the limit, which may affect the generated titles. [Conclusions] The system provides a convenient tool for researchers and literature services, and can also benefit automatic title generation for other scientific and technological documents.
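The truncation described under [Limitations] can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `truncate_abstract`, the 512-token limit, the `reserved` budget for special tokens and the generated title, and the character-level approximation of tokenization for Chinese text are all assumptions made for illustration.

```python
from typing import Tuple


def truncate_abstract(abstract: str, max_len: int = 512,
                      reserved: int = 34) -> Tuple[str, bool]:
    """Cut an abstract to fit a fixed token budget (hypothetical helper).

    max_len  -- assumed model input limit (512 is a common BERT setting).
    reserved -- assumed number of positions kept free for special tokens
                and the title to be generated.
    Returns the (possibly shortened) abstract and a flag telling the
    caller whether any text was dropped.
    """
    budget = max_len - reserved
    if budget <= 0:
        raise ValueError("reserved must be smaller than max_len")

    # Assumption: one character is treated as one token, which roughly
    # matches how Chinese text is commonly split for BERT-style models.
    tokens = list(abstract)
    if len(tokens) <= budget:
        return abstract, False

    # Keep the head of the abstract; the tail is discarded.
    return "".join(tokens[:budget]), True


if __name__ == "__main__":
    text, was_cut = truncate_abstract("摘" * 600)
    print(len(text), was_cut)
```

Returning the flag alongside the text lets a caller warn the user that the title was generated from an incomplete abstract, which is the situation the limitation describes.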
Received: 05 September 2022
Published: 28 March 2023
Fund: Project of Literature and Information Capacity Building, Chinese Academy of Sciences (E0290906)
Corresponding Author: Zhang Zhixiong, ORCID: 0000-0003-1596-7487, E-mail: zhangzhx@mail.las.ac.cn