Constructing Smart Consulting Q&A System Based on Machine Reading Comprehension

Wang Yihu, Bai Haiyan
Institute of Scientific and Technical Information of China, Beijing 100038, China

Abstract [Objective] This paper aims to improve smart consulting systems so that they can effectively answer academic questions. [Methods] We used deep learning, machine reading comprehension, data augmentation, information retrieval, and semantic similarity techniques to construct datasets and an academic knowledge question-answering system. We also designed a multi-paragraph recall metric suited to the characteristics of academic literature and enhanced retrieval accuracy with multidimensional features. [Results] The new model's ROUGE-L score reached 0.7338, with a question-answering accuracy of 88.65% and a multi-paragraph recall accuracy of 88.38%. [Limitations] We only evaluated the new model on single-domain content, which may limit the system's performance on complex issues spanning multiple domains. [Conclusions] Deeply integrating machine reading comprehension technology with reference services can improve the efficiency and sharing of academic resources and provide more comprehensive and accurate information support for researchers.

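The ROUGE-L score reported in the Results is the standard longest-common-subsequence (LCS) F-measure. A minimal sketch of how such a score is computed, assuming word-level tokens and the common beta-weighted formulation (the paper's exact tokenization and beta setting are not given here):

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l(candidate, reference, beta=1.2):
    """ROUGE-L F-score between a candidate answer and a reference answer,
    both given as token lists. beta > 1 weights recall more than precision."""
    lcs = lcs_len(candidate, reference)
    if lcs == 0:
        return 0.0
    p = lcs / len(candidate)  # LCS precision
    r = lcs / len(reference)  # LCS recall
    return (1 + beta**2) * p * r / (r + beta**2 * p)
```

For Chinese answer text, the token lists would come from a segmenter or from character-level splitting before scoring.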
Received: 12 April 2023
Published: 08 January 2024
Fund: Innovation Research Fund Youth Project of Institute of Scientific and Technical Information of China (QN2023-11)
Corresponding Author: Bai Haiyan, ORCID: 0000-0002-9552-3845, E-mail: bhy@istic.ac.cn