1 National Science Library, Chinese Academy of Sciences, Beijing 100190, China
2 Department of Information Resources Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
3 Key Laboratory of New Publishing and Knowledge Services for Scholarly Journals, Beijing 100190, China
[Objective] This paper reviews and summarizes the development status, underlying mechanisms, and application trends of question-answering techniques based on large language models. [Coverage] A total of 73 relevant papers were retrieved and analyzed. [Methods] The study systematically reviews the development of large language models and parameter-efficient fine-tuning strategies, and analyzes the principles, mechanisms, application value, and open problems of each technique. It focuses on retrieval-augmented generation for answering simple questions and on prompt-engineering-based reasoning for complex questions. Through qualitative analysis, the research progress of question-answering techniques based on large language models is comprehensively summarized and future research directions are proposed. [Results] Open-source pre-trained large language models continue to emerge, and parameter-efficient fine-tuning strategies can significantly improve model adaptability in vertical domains. Retrieval-augmented generation, supported by text embeddings and approximate nearest neighbor search, effectively enhances the interpretability and credibility of question answering. Carefully crafted prompt engineering can substantially extend the reasoning capabilities of large models on complex questions. [Limitations] Because research on large models is developing rapidly, the coverage of this survey may be incomplete. [Conclusions] Question-answering techniques based on large language models have made remarkable progress in semantic representation, complex reasoning, and other respects. Retrieval-augmented generation, which integrates external knowledge, and prompt engineering are the main research hotspots for large models. Future research may focus on controllable and credible content generation.
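To make the retrieval-augmented generation pipeline summarized above concrete, the following is a minimal Python sketch: it embeds a document collection, retrieves the passages nearest to a question, and assembles a grounded prompt. The `embed` function here is a hypothetical bag-of-words stand-in for a trained text-embedding model, and the exhaustive cosine search is a simplification of the approximate nearest neighbor indexes discussed in the surveyed literature; neither reflects any specific system from the reviewed papers.

```python
# A minimal sketch of retrieval-augmented question answering, assuming a
# toy bag-of-words embedding in place of a trained text-embedding model
# and exhaustive cosine search in place of an approximate nearest
# neighbor index. Both stand-ins are hypothetical simplifications.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in embedding: term-frequency counts. A real system would use
    # a trained text-embedding model instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, corpus: list[str], k: int = 2) -> list[str]:
    # Exhaustive search; large collections would substitute an ANN index
    # (locality-sensitive hashing, space partitioning, or product
    # quantization, as covered in the survey).
    q = embed(question)
    ranked = sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, passages: list[str]) -> str:
    # Grounding the model in retrieved evidence is what lends RAG answers
    # their interpretability and credibility.
    context = "\n".join(f"- {p}" for p in passages)
    return (f"Answer the question using only the passages below.\n"
            f"Passages:\n{context}\nQuestion: {question}\nAnswer:")

corpus = [
    "LoRA freezes pretrained weights and trains low-rank update matrices.",
    "Chain-of-thought prompting elicits intermediate reasoning steps.",
    "Product quantization compresses vectors for fast nearest neighbor search.",
]
question = "How does LoRA fine-tune a large model?"
print(build_prompt(question, retrieve(question, corpus)))
```

For the prompt-engineering side, the sketch below illustrates self-consistency over chain-of-thought prompts: several reasoning chains are sampled and the final answers are aggregated by majority vote. The `llm` callable is a hypothetical stub standing in for any sampling-enabled language model API, not an interface from the surveyed works.

```python
# A minimal sketch of self-consistency over chain-of-thought prompts,
# assuming a hypothetical `llm` callable that stands in for a real,
# sampling-enabled large language model API.
import random
from collections import Counter

def llm(prompt: str, temperature: float = 0.7) -> str:
    # Hypothetical stub: returns a reasoning chain ending in "Answer: X".
    # Replace with a real model call in practice.
    return f"...reasoning steps...\nAnswer: {random.choice(['18', '18', '20'])}"

def self_consistent_answer(question: str, samples: int = 5) -> str:
    # "Let's think step by step" triggers chain-of-thought reasoning;
    # sampling several chains and voting marginalizes out faulty paths.
    prompt = f"Q: {question}\nLet's think step by step."
    answers = [llm(prompt).rsplit("Answer:", 1)[-1].strip()
               for _ in range(samples)]
    return Counter(answers).most_common(1)[0][0]  # majority vote

print(self_consistent_answer(
    "If 3 workers build 6 walls in 2 days, how many walls do 9 workers build in 2 days?"))
```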
Wen Sen, Qian Li, Hu Maodi, Chang Zhijun. Review of Research Progress on Question-Answering Techniques Based on Large Language Models[J]. Data Analysis and Knowledge Discovery, 2024, 8(6): 16-29.
[1]
Hey T, Tansley S, Tolle K. The Fourth Paradigm: Data-Intensive Scientific Discovery[M]. Translated by Pan Jiaofeng, Zhang Xiaolin, et al. Beijing: Science Press, 2012.
[2]
Bawden D, Robinson L. The Dark Side of Information: Overload, Anxiety and Other Paradoxes and Pathologies[J]. Journal of Information Science, 2009, 35(2): 180-191.
[3]
Green B F, Wolf A K, Chomsky C, et al. Baseball: An Automatic Question-Answerer[C]// Proceedings of Western Joint IRE-AIEE-ACM Computer Conference. 1961: 219-224.
[4]
Woods W A. Lunar Rocks in Natural English: Explorations in Natural Language Question Answering[M]// Linguistic Structures Processing. Amsterdam: North-Holland, 1977: 521-569.
[5]
Li X, Roth D. Learning Question Classifiers[C]// Proceedings of the 19th International Conference on Computational Linguistics-Volume 1. 2002: 1-7.
[6]
Mihalcea R, Tarau P. TextRank: Bringing Order into Text[C]// Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing. 2004: 404-411.
[7]
Tellex S, Katz B, Lin J, et al. Quantitative Evaluation of Passage Retrieval Algorithms for Question Answering[C]// Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 2003: 41-47.
[8]
Wallace R S. The Anatomy of A.L.I.C.E.[M]// Parsing the Turing Test: Philosophical and Methodological Issues in the Quest for the Thinking Computer. Dordrecht: Springer, 2009.
[9]
Turian J, Ratinov L, Bengio Y. Word Representations: A Simple and General Method for Semi-Supervised Learning[C]// Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. 2010: 384-394.
[10]
Jeon J, Croft W B, Lee J H. Finding Similar Questions in Large Question and Answer Archives[C]// Proceedings of the 14th ACM International Conference on Information and Knowledge Management. 2005: 84-90.
[11]
Chen B, Sun L, Han X P. Sequence-to-Action: End-to-End Semantic Graph Generation for Semantic Parsing[OL]. arXiv Preprint, arXiv: 1809.00773.
[12]
Shen W, Wang J Y, Han J W. Entity Linking with a Knowledge Base: Issues, Techniques, and Solutions[J]. IEEE Transactions on Knowledge and Data Engineering, 2015, 27(2): 443-460.
[13]
Zeng D J, Liu K, Lai S W, et al. Relation Classification via Convolutional Deep Neural Network[C]// Proceedings of COLING 2014 - 25th International Conference on Computational Linguistics. 2014: 2335-2344.
[14]
Berant J, Chou A, Frostig R, et al. Semantic Parsing on Freebase from Question-Answer Pairs[C]// Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 2013: 1533-1544.
[15]
Singhal A. Introducing the Knowledge Graph: Things, not Strings[EB/OL]. [2023-09-01]. Official Google Blog, https://blog.google/products/search/introducing-knowledge-graph-things-not/.
[16]
Liu Y H, Han T L, Ma S Y, et al. Summary of ChatGPT/GPT-4 Research and Perspective Towards the Future of Large Language Models[OL]. arXiv Preprint, arXiv: 2304.01852.
[17]
Carlini N, Tramer F, Wallace E, et al. Extracting Training Data from Large Language Models[C]// Proceedings of the 30th USENIX Security Symposium. 2021: 2633-2650.
[18]
Zhao W X, Zhou K, Li J Y, et al. A Survey of Large Language Models[OL]. arXiv Preprint, arXiv: 2303.18223.
[19]
Hoffmann J, Borgeaud S, Mensch A, et al. Training Compute-Optimal Large Language Models[OL]. arXiv Preprint, arXiv: 2203.15556.
[20]
Qian Li, Liu Yi, Zhang Zhixiong, et al. An Analysis on the Basic Technologies of ChatGPT[J]. Data Analysis and Knowledge Discovery, 2023, 7(3): 6-15.
[21]
Chowdhery A, Narang S R, Devlin J, et al. PaLM: Scaling Language Modeling with Pathways[OL]. arXiv Preprint, arXiv: 2204.02311.
[22]
Chung H W, Hou L, Longpre S, et al. Scaling Instruction-Finetuned Language Models[OL]. arXiv Preprint, arXiv: 2210.11416.
[23]
Touvron H, Lavril T, Izacard G, et al. LLaMA: Open and Efficient Foundation Language Models[OL]. arXiv Preprint, arXiv: 2302.13971.
[24]
Penedo G, Malartic Q, Hesslow D, et al. The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only[OL]. arXiv Preprint, arXiv: 2306.01116.
[25]
Touvron H, Martin L, Stone K, et al. Llama 2: Open Foundation and Fine-Tuned Chat Models[OL]. arXiv Preprint, arXiv: 2307.09288.
[26]
Radford A, Narasimhan K, Salimans T, et al. Improving Language Understanding by Generative Pre-training[OL]. [2023-09-01]. https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf.
[27]
Radford A, Wu J, Child R, et al. Language Models are Unsupervised Multitask Learners[J]. OpenAI Blog, 2019, 1(8): 9.
[28]
Brown T B, Mann B, Ryder N, et al. Language Models are Few-Shot Learners[C]// Proceedings of the 34th International Conference on Neural Information Processing Systems. 2020: 1877-1901.
[29]
Wei J, Tay Y, Bommasani R, et al. Emergent Abilities of Large Language Models[OL]. arXiv Preprint, arXiv: 2206.07682.
[30]
Fedus W, Zoph B, Shazeer N. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity[J]. The Journal of Machine Learning Research, 2022, 23(1): 5232-5270.
[31]
Chen S Y, Hou Y T, Cui Y M, et al. Recall and Learn: Fine-Tuning Deep Pretrained Language Models with Less Forgetting[OL]. arXiv Preprint, arXiv: 2004.12651.
[32]
Houlsby N, Giurgiu A, Jastrzebski S, et al. Parameter-Efficient Transfer Learning for NLP[C]// Proceedings of the 36th International Conference on Machine Learning. 2019: 2790-2799.
[33]
Li X L, Liang P. Prefix-Tuning: Optimizing Continuous Prompts for Generation[OL]. arXiv Preprint, arXiv: 2101.00190.
[34]
Liu X, Zheng Y N, Du Z X, et al. GPT Understands, Too[OL]. arXiv Preprint, arXiv: 2103.10385.
[35]
Lester B, Al-Rfou R, Constant N. The Power of Scale for Parameter-Efficient Prompt Tuning[OL]. arXiv Preprint, arXiv: 2104.08691.
[36]
Hu E J, Shen Y L, Wallis P, et al. LoRA: Low-Rank Adaptation of Large Language Models[OL]. arXiv Preprint, arXiv: 2106.09685.
[37]
Liu X, Ji K X, Fu Y C, et al. P-Tuning v2: Prompt Tuning can be Comparable to Fine-Tuning Universally across Scales and Tasks[OL]. arXiv Preprint, arXiv: 2110.07602.
[38]
Zhang Q R, Chen M S, Bukharin A, et al. AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning[OL]. arXiv Preprint, arXiv: 2303.10512.
[39]
Dettmers T, Pagnoni A, Holtzman A, et al. QLoRA: Efficient Finetuning of Quantized LLMs[OL]. arXiv Preprint, arXiv: 2305.14314.
[40]
Pfeiffer J, Vulić I, Gurevych I, et al. MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer[OL]. arXiv Preprint, arXiv: 2005.00052.
[41]
He J X, Zhou C T, Ma X Z, et al. Towards a Unified View of Parameter-Efficient Transfer Learning[OL]. arXiv Preprint, arXiv: 2110.04366.
[42]
Lewis P, Perez E, Piktus A, et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks[C]// Proceedings of the 34th International Conference on Neural Information Processing Systems. 2020: 9459-9474.
[43]
Cambria E, White B. Jumping NLP Curves: A Review of Natural Language Processing Research[J]. IEEE Computational Intelligence Magazine, 2014, 9(2): 48-57.
[44]
Zhao Yueyang, Cui Lei. Progress in Research and Application of Text Embedding Technology[J]. Frontiers of Data & Computing, 2023, 5(3): 92-110.
[45]
Harris Z S. Distributional Structure[J]. WORD, 1954, 10(2-3): 146-162.
[46]
Bengio Y, Ducharme R, Vincent P, et al. A Neural Probabilistic Language Model[J]. The Journal of Machine Learning Research, 2003, 3: 1137-1155.
[47]
Mikolov T, Chen K, Corrado G, et al. Efficient Estimation of Word Representations in Vector Space[OL]. arXiv Preprint, arXiv: 1301.3781.
[48]
Pennington J, Socher R, Manning C D. GloVe: Global Vectors for Word Representation[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. 2014: 1532-1543.
[49]
McCann B, Bradbury J, Xiong C M, et al. Learned in Translation: Contextualized Word Vectors[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017: 6297-6308.
[50]
Peters M E, Neumann M, Iyyer M, et al. Deep Contextualized Word Representations[OL]. arXiv Preprint, arXiv: 1802.05365.
[51]
Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding[OL]. arXiv Preprint, arXiv: 1810.04805.
[52]
Yang Z L, Dai Z H, Yang Y M, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding[C]// Proceedings of the 33rd International Conference on Neural Information Processing Systems. 2019: 5753-5763.
[53]
Sun Y, Wang S H, Feng S K, et al. ERNIE 3.0: Large-Scale Knowledge Enhanced Pre-training for Language Understanding and Generation[OL]. arXiv Preprint, arXiv: 2107.02137.
[54]
Su H J, Shi W J, Kasai J, et al. One Embedder, Any Task: Instruction-Finetuned Text Embeddings[OL]. arXiv Preprint, arXiv: 2212.09741.
[55]
Wang L, Yang N, Huang X L, et al. Text Embeddings by Weakly-Supervised Contrastive Pre-training[OL]. arXiv Preprint, arXiv: 2212.03533.
[56]
Muennighoff N, Tazi N, Magne L, et al. MTEB: Massive Text Embedding Benchmark[OL]. arXiv Preprint, arXiv: 2210.07316.
[57]
Mikolov T, Sutskever I, Chen K, et al. Distributed Representations of Words and Phrases and Their Compositionality[C]// Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2. 2013: 3111-3119.
[58]
Ram P, Gray A G. Maximum Inner-Product Search Using Cone Trees[C]// Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2012: 931-939.
[59]
Liu Yingfan. Research of Approximate Nearest Neighbor Search Based on Locality Sensitive Hashing[D]. Xi’an: Xidian University, 2014.
[60]
Chen Jian. Search Algorithm Based on Space Division[D]. Jinan: Shandong University, 2005.
[61]
Jégou H, Douze M, Schmid C. Product Quantization for Nearest Neighbor Search[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(1): 117-128.
[62]
Li W, Zhang Y, Sun Y F, et al. Approximate Nearest Neighbor Search on High Dimensional Data—Experiments, Analyses, and Improvement[J]. IEEE Transactions on Knowledge and Data Engineering, 2020, 32(8): 1475-1488.
[63]
Wei J, Wang X Z, Schuurmans D, et al. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models[OL]. arXiv Preprint, arXiv: 2201.11903.
[64]
Wang X Z, Wei J, Schuurmans D, et al. Self-Consistency Improves Chain of Thought Reasoning in Language Models[OL]. arXiv Preprint, arXiv: 2203.11171.
[65]
Liu B, Jiang Y Q, Zhang X H, et al. LLM+ P: Empowering Large Language Models with Optimal Planning Proficiency[OL]. arXiv Preprint, arXiv: 2304.11477.
[66]
Yao S Y, Yu D, Zhao J, et al. Tree of Thoughts: Deliberate Problem Solving with Large Language Models[OL]. arXiv Preprint, arXiv: 2305.10601.
[67]
Ning X F, Lin Z N, Zhou Z X, et al. Skeleton-of-Thought: Large Language Models can do Parallel Decoding[OL]. arXiv Preprint, arXiv: 2307.15337.
[68]
Besta M, Blach N, Kubicek A, et al. Graph of Thoughts: Solving Elaborate Problems with Large Language Models[OL]. arXiv Preprint, arXiv: 2308.09687.
[69]
Yao S Y, Zhao J, Yu D, et al. ReAct: Synergizing Reasoning and Acting in Language Models[OL]. arXiv Preprint, arXiv: 2210.03629.
[70]
Press O, Zhang M R, Min S, et al. Measuring and Narrowing the Compositionality Gap in Language Models[OL]. arXiv Preprint, arXiv: 2210.03350.
[71]
Shinn N, Labash B, Gopinath A. Reflexion: An Autonomous Agent with Dynamic Memory and Self-Reflection[OL]. arXiv Preprint, arXiv: 2303.11366.
[72]
Wang L, Xu W Y, Lan Y H, et al. Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models[OL]. arXiv Preprint, arXiv: 2305.04091.
[73]
Li B, Wang R, Guo J L, et al. Deliberate then Generate: Enhanced Prompting Framework for Text Generation[OL]. arXiv Preprint, arXiv: 2305.19835.