Data Analysis and Knowledge Discovery, 2023, Vol. 7, Issue (3): 26-35     https://doi.org/10.11925/infotech.2096-3467.2023.0216
The Inspiration Brought by ChatGPT to LLM and the New Development Ideas of Multi-modal Large Model
Zhao Chaoyang, Zhu Guibo, Wang Jinqiao
Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

Abstract

[Objective] This paper analyzes the basic technical principles of ChatGPT and discusses their influence on the development of large language models and multi-modal pretrained models. [Methods] By examining the development process and technical principles of ChatGPT, the paper discusses how model-building methods such as instruction fine-tuning, data collection and annotation, and reinforcement learning from human feedback have shaped large language models. It also analyzes several key scientific problems encountered in constructing multi-modal large models and, drawing on ChatGPT's technical approach, discusses the future development of multi-modal pretrained models. [Conclusions] The success of ChatGPT provides a valuable reference path for adapting pretrained foundation models to downstream tasks. In building future multi-modal large models and applying them to downstream tasks, high-quality instruction fine-tuning and related techniques can be fully exploited to significantly improve downstream task performance.

Key words: Large Language Model (LLM); Pretrained Foundation Model; Multi-modal Pretrained Model; ChatGPT
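
To make the reinforcement learning from human feedback (RLHF) step mentioned in the abstract concrete, the sketch below shows the pairwise ranking loss commonly used to train the reward model: annotators rank two responses to the same prompt, and the reward model is trained to assign the preferred response a higher scalar score. This is a minimal, illustrative PyTorch example; the names (ToyRewardModel, pairwise_reward_loss), sizes, and random stand-in features are assumptions for illustration, not code from this paper or from OpenAI, and a real reward model would score full text sequences with a large Transformer.

```python
# Minimal sketch (not the authors' code) of the pairwise reward-model objective used
# in RLHF: score a preferred ("chosen") and a less-preferred ("rejected") response to
# the same prompt, and train the model so that the chosen response scores higher.
# All module names, sizes, and inputs below are illustrative placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyRewardModel(nn.Module):
    """Maps an (already encoded) response representation to a scalar reward."""

    def __init__(self, hidden_size: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.Tanh())
        self.value_head = nn.Linear(hidden_size, 1)  # scalar reward

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.value_head(self.encoder(x)).squeeze(-1)


def pairwise_reward_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry style objective: -log sigmoid(r_chosen - r_rejected),
    # i.e. maximize the score margin between preferred and rejected responses.
    return -F.logsigmoid(r_chosen - r_rejected).mean()


if __name__ == "__main__":
    torch.manual_seed(0)
    model = ToyRewardModel()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Stand-ins for encoder outputs of human-ranked response pairs (batch of 8).
    chosen_repr = torch.randn(8, 128)
    rejected_repr = torch.randn(8, 128)

    for step in range(100):
        loss = pairwise_reward_loss(model(chosen_repr), model(rejected_repr))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"final pairwise loss: {loss.item():.4f}")
```

In the full pipeline described for ChatGPT, the trained reward model then supplies the scalar reward that a policy-gradient method such as PPO uses to fine-tune the instruction-tuned language model.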
Received: 2023-03-13      Published: 2023-04-13
CLC number: TP393 G250
Funding: National Natural Science Foundation of China (61976210); National Natural Science Foundation of China (62176254)
Corresponding author: Zhao Chaoyang, ORCID: 0000-0002-0341-0166, E-mail: chaoyang.zhao@nlpr.ia.ac.cn.
Cite this article:
Zhao Chaoyang, Zhu Guibo, Wang Jinqiao. The Inspiration Brought by ChatGPT to LLM and the New Development Ideas of Multi-modal Large Model. Data Analysis and Knowledge Discovery, 2023, 7(3): 26-35.
URL:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2023.0216      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2023/V7/I3/26
Fig.1  Development of pretrained language models
Fig.2  Schematic of the ChatGPT training paradigm based on learning from human feedback
Fig.3  Framework of visual self-supervised learning methods[20]
Fig.4  Schematic of the evolution from GPT-3 to GPT-3.5
Fig.5  Examples of cross-modal tasks for multi-modal large models
[1] Zhou C, Li Q, Li C, et al. A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT[OL]. arXiv Preprint, arXiv:2302.09419.
[2] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding[OL]. arXiv Preprint, arXiv:1810.04805v2.
[3] Vaswani A, Shazeer N, Parmar N, et al. Attention is All You Need[OL]. arXiv Preprint, arXiv:1706.03762.
[4] OpenAI. ChatGPT[EB/OL]. (2022-11-30). https://openai.com/blog/chatgpt/.
[5] Chen M, Tworek J, Jun H, et al. Evaluating Large Language Models Trained on Code[OL]. arXiv Preprint, arXiv:2107.03374.
[6] Neelakantan A, Xu T, Puri R, et al. Text and Code Embeddings by Contrastive Pre-training[OL]. arXiv Preprint, arXiv:2201.10005.
[7] Brown T B, Mann B, Ryder N, et al. Language Models are Few-shot Learners[OL]. arXiv Preprint, arXiv:2005.14165.
[8] Lester B, Al-Rfou R, Constant N. The Power of Scale for Parameter-Efficient Prompt Tuning[OL]. arXiv Preprint, arXiv:2104.08691.
[9] Schick T, Schütze H. Exploiting Cloze-questions for Few-shot Text Classification and Natural Language Inference[OL]. arXiv Preprint, arXiv:2001.07676.
[10] Zhang Z, Zhang A, Li M, et al. Automatic Chain of Thought Prompting in Large Language Models[OL]. arXiv Preprint, arXiv:2210.03493.
[11] Wei J, Wang X, Schuurmans D, et al. Chain of Thought Prompting Elicits Reasoning in Large Language Models[OL]. arXiv Preprint, arXiv:2201.11903.
[12] Peters M E, Neumann M, Iyyer M, et al. Deep Contextualized Word Representations[OL]. arXiv Preprint, arXiv:1802.05365.
[13] Radford A, Narasimhan K, Salimans T, et al. Improving Language Understanding by Generative Pre-Training[OL]. https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf
[14] Radford A, Wu J, Child R, et al. Language Models are Unsupervised Multitask Learners[OL]. https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf
[15] OpenAI. WebGPT: Browser-assisted Question-Answering with Human Feedback[EB/OL]. (2021-12-16). https://openai.com/blog/webgpt/.
[16] OpenAI. Aligning Language Models to Follow Instructions[EB/OL]. (2022-01-27). https://openai.com/blog/instruction-following/.
[17] Christiano P, Leike J, Brown T B, et al. Deep Reinforcement Learning from Human Preferences[OL]. arXiv Preprint, arXiv:1706.03741.
[18] Lin J, Men R, Yang A, et al. M6: A Chinese Multimodal Pretrainer[OL]. arXiv Preprint, arXiv:2103.00823.
[19] Liu Y, Zhu G, Zhu B, et al. TaiSu: A 166M Large-scale High-Quality Dataset for Chinese Vision-Language Pre-training[C]// Proceedings of the 36th Conference on Neural Information Processing Systems Track on Datasets and Benchmarks. 2022.
[20] Jing L, Tian Y. Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 43(11): 4037-4058. DOI: 10.1109/TPAMI.2020.2992393.
[21] Zhang R, Isola P, Efros A A. Colorful Image Colorization[OL]. arXiv Preprint, arXiv:1603.08511.
[22] Caron M, Bojanowski P, Joulin A, et al. Deep Clustering for Unsupervised Learning of Visual Features[OL]. arXiv Preprint, arXiv:1807.05520.
[23] Pathak D, Krahenbuhl P, Donahue J, et al. Context Encoders: Feature Learning by Inpainting[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. 2016: 2536-2544.
[24] He K, Chen X, Xie S, et al. Masked Autoencoders are Scalable Vision Learners[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
[25] He K, Fan H, Wu Y, et al. Momentum Contrast for Unsupervised Visual Representation Learning[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 9729-9738.
[26] Chen X, Fan H, Girshick R, et al. Improved Baselines with Momentum Contrastive Learning[OL]. arXiv Preprint, arXiv:2003.04297.
[27] Chen X, Xie S, He K. An Empirical Study of Training Self-supervised Vision Transformers[OL]. arXiv Preprint, arXiv:2104.02057.
[28] Chen T, Kornblith S, Swersky K, et al. Big Self-supervised Models are Strong Semi-supervised Learners[OL]. arXiv Preprint, arXiv:2006.10029.
[29] Chen T, Kornblith S, Norouzi M, et al. A Simple Framework for Contrastive Learning of Visual Representations[OL]. arXiv Preprint, arXiv:2002.05709.
[30] Xu H, Yan M, Li C, et al. E2E-VLP: End-to-End Vision-Language Pre-training Enhanced by Visual Learning[OL]. arXiv Preprint, arXiv:2106.01804.
[31] Grill J B, Strub F, Altché F, et al. Bootstrap Your Own Latent: A New Approach to Self-supervised Learning[OL]. arXiv Preprint, arXiv:2006.07733.
[32] Pascual S, Ravanelli M, Serrà J, et al. Learning Problem-Agnostic Speech Representations from Multiple Self-Supervised Tasks[OL]. arXiv Preprint, arXiv:1904.03416.
[33] Su W, Zhu X, Cao Y, et al. VL-BERT: Pre-training of Generic Visual-Linguistic Representations[OL]. arXiv Preprint, arXiv:1908.08530.
[34] Sun C, Myers A, Vondrick C, et al. VideoBERT: A Joint Model for Video and Language Representation Learning[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. 2019: 7464-7473.
[35] Chen Y C, Li L, Yu L, et al. UNITER: Universal Image-Text Representation Learning[OL]. arXiv Preprint, arXiv:1909.11740.
[36] Ramesh A, Pavlov M, Goh G, et al. Zero-shot Text-to-Image Generation[OL]. arXiv Preprint, arXiv:2102.12092.
[37] Ding M, Yang Z, Hong W, et al. CogView: Mastering Text-to-Image Generation via Transformers[OL]. arXiv Preprint, arXiv:2105.13290.
[38] Cho J, Lei J, Tan H, et al. Unifying Vision-and-Language Tasks via Text Generation[OL]. arXiv Preprint, arXiv:2102.02779.