Data Analysis and Knowledge Discovery  2023, Vol. 7 Issue (3): 26-35    DOI: 10.11925/infotech.2096-3467.2023.0216
The Inspiration Brought by ChatGPT to LLM and the New Development Ideas of Multi-modal Large Model
Zhao Chaoyang, Zhu Guibo, Wang Jinqiao
Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Abstract  

[Objective] This paper analyzes the basic technical principles of ChatGPT and discusses its influence on the development of large language models and multi-modal pretrained models. [Methods] By tracing ChatGPT's development process and technical principles, the paper examines how model-building methods such as instruction fine-tuning, data acquisition and annotation, and reinforcement learning from human feedback have shaped large language models. It then analyzes several key scientific problems in constructing multi-modal models and, drawing on ChatGPT's technical scheme, discusses the future development of multi-modal pretrained models. [Conclusions] The success of ChatGPT provides a valuable reference technology path for adapting pretrained foundation models to downstream tasks. When building future multi-modal large models and realizing their downstream tasks, techniques such as high-quality instruction fine-tuning can be fully exploited to significantly improve downstream performance.
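Two of the techniques highlighted above, instruction fine-tuning and reinforcement learning from human feedback (RLHF), follow the three-stage training pipeline popularized by InstructGPT and ChatGPT. The Python sketch below is a minimal illustration of that pipeline, not the authors' or OpenAI's implementation; the model interfaces (policy, reward_model, policy.sample, ref_policy.logprob), the data formats, and the beta coefficient are all assumptions made for exposition.

# Schematic sketch of the ChatGPT-style three-stage pipeline discussed in the
# abstract. Illustrative only: model interfaces, data formats, and
# hyperparameters are assumptions, not the authors' code.
import torch
import torch.nn.functional as F

def supervised_finetune(policy, optimizer, instruction_pairs):
    """Stage 1: instruction fine-tuning on (prompt, human demonstration) pairs."""
    for prompt_ids, target_ids in instruction_pairs:
        logits = policy(prompt_ids)                 # [seq_len, vocab_size]
        loss = F.cross_entropy(logits, target_ids)  # next-token prediction loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

def train_reward_model(reward_model, optimizer, preference_triples):
    """Stage 2: fit a scalar reward so human-preferred answers score higher."""
    for prompt, chosen, rejected in preference_triples:
        margin = reward_model(prompt, chosen) - reward_model(prompt, rejected)
        loss = -F.logsigmoid(margin).mean()         # pairwise ranking loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

def shaped_reward(policy, ref_policy, reward_model, prompt, beta=0.1):
    """Stage 3 (schematic): reward maximized by a policy-gradient method
    such as PPO. The KL-style penalty keeps the policy close to the
    stage-1 reference model."""
    response, logprob = policy.sample(prompt)            # assumed interface
    ref_logprob = ref_policy.logprob(prompt, response)   # assumed interface
    return reward_model(prompt, response) - beta * (logprob - ref_logprob)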

Keywords: Large Language Model (LLM); Pretrained Foundation Model; Multi-modal Pretrained Model; ChatGPT
Received: 13 March 2023      Published: 13 April 2023
CLC Number: TP393 G250
Fund: National Natural Science Foundation of China (61976210); National Natural Science Foundation of China (62176254)
Corresponding Author: Zhao Chaoyang, ORCID: 0000-0002-0341-0166, E-mail: chaoyang.zhao@nlpr.ia.ac.cn.

Cite this article:

Zhao Chaoyang, Zhu Guibo, Wang Jinqiao. The Inspiration Brought by ChatGPT to LLM and the New Development Ideas of Multi-modal Large Model. Data Analysis and Knowledge Discovery, 2023, 7(3): 26-35.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2023.0216     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2023/V7/I3/26

Figure: Development of Pretrained Language Models
Figure: Illustration of the ChatGPT Training Paradigm Based on Human Feedback Learning
Figure: Framework for Visual Self-Supervised Learning[20]
Figure: Evolution from GPT-3 to GPT-3.5
Figure: Illustration of Cross-modal Tasks for Multi-modal Large Models
[1] Zhou C, Li Q, Li C, et al. A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT[OL]. arXiv Preprint, arXiv:2302.09419.
[2] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding[OL]. arXiv Preprint, arXiv:1810.04805v2.
[3] Vaswani A, Shazeer N, Parmar N, et al. Attention is All You Need[OL]. arXiv Preprint, arXiv:1706.03762.
[4] ChatGPT[EB/OL].(2022-11-30). https://openai.com/blog/chatgpt/.
[5] Chen M, Tworek J, Jun H, et al. Evaluating Large Language Models Trained on Code[OL]. arXiv Preprint, arXiv:2107.03374.
[6] Neelakantan A, Xu T, Puri R, et al. Text and Code Embeddings by Contrastive Pre-training[OL]. arXiv Preprint, arXiv:2201.10005.
[7] Brown T B, Mann B, Ryder N, et al. Language Models are Few-shot Learners[OL]. arXiv Preprint, arXiv:2005.14165.
[8] Lester B, Al-Rfou R, Constant N. The Power of Scale for Parameter-Efficient Prompt Tuning[OL]. arXiv Preprint, arXiv:2104.08691.
[9] Schick T, Schütze H. Exploiting Cloze-questions for Few-shot Text Classification and Natural Language Inference[OL]. arXiv Preprint, arXiv:2001.07676.
[10] Zhang Z, Zhang A, Li M, et al. Automatic Chain of Thought Prompting in Large Language Models[OL]. arXiv Preprint, arXiv:2210.03493.
[11] Wei J, Wang X, Schuurmans D, et al. Chain of Thought Prompting Elicits Reasoning in Large Language Models[OL]. arXiv Preprint, arXiv:2201.11903.
[12] Peters M E, Neumann M, Iyyer M, et al. Deep Contextualized Word Representations[OL]. arXiv Preprint, arXiv:1802.05365.
[13] Radford A, Narasimhan K, Salimans T, et al. Improving Language Understanding by Generative Pre-Training[OL]. https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf.
[14] Radford A, Wu J, Child R, et al. Language Models are Unsupervised Multitask Learners[OL]. https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf.
[15] OpenAI. WebGPT: Browser-assisted Question-Answering with Human Feedback[EB/OL]. (2021-12-16). https://openai.com/blog/webgpt/.
[16] OpenAI. Aligning Language Models to Follow Instructions[EB/OL]. (2022-01-27). https://openai.com/blog/instruction-following/.
[17] Christiano P, Leike J, Brown T B, et al. Deep Reinforcement Learning from Human Preferences[OL]. arXiv Preprint, arXiv:1706.03741.
[18] Lin J, Men R, Yang A, et al. M6: A Chinese Multimodal Pretrainer[OL]. arXiv Preprint, arXiv:2103.00823.
[19] Liu Y, Zhu G, Zhu B, et al. TaiSu: A 166M Large-scale High-Quality Dataset for Chinese Vision-Language Pre-training[C]// Proceedings of the 36th Conference on Neural Information Processing Systems Track on Datasets and Benchmarks. 2022.
[20] Jing L, Tian Y. Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 43(11): 4037-4058. DOI: 10.1109/TPAMI.2020.2992393.
[21] Zhang R, Isola P, Efros A A. Colorful Image Colorization[OL]. arXiv Preprint, arXiv:1603.08511.
[22] Caron M, Bojanowski P, Joulin A, et al. Deep Clustering for Unsupervised Learning of Visual Features[OL]. arXiv Preprint, arXiv:1807.05520.
[23] Pathak D, Krahenbuhl P, Donahue J, et al. Context Encoders: Feature Learning by Inpainting[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. 2016: 2536-2544.
[24] He K, Chen X, Xie S, et al. Masked Autoencoders are Scalable Vision Learners[C]// Proceedings of the 2022 IEEE Conference on Computer Vision and Pattern Recognition. 2022.
[25] He K, Fan H, Wu Y, et al. Momentum Contrast for Unsupervised Visual Representation Learning[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 9729-9738.
[26] Chen X, Fan H, Girshick R, et al. Improved Baselines with Momentum Contrastive Learning[OL]. arXiv Preprint, arXiv:2003.04297.
[27] Chen X, Xie S, He K. An Empirical Study of Training Self-supervised Vision Transformers[OL]. arXiv Preprint, arXiv:2104.02057.
[28] Chen T, Kornblith S, Swersky K, et al. Big Self-supervised Models are Strong Semi-supervised Learners[OL]. arXiv Preprint, arXiv:2006.10029.
[29] Chen T, Kornblith S, Norouzi M, et al. A Simple Framework for Contrastive Learning of Visual Representations[OL]. arXiv Preprint, arXiv:2002.05709.
[30] Xu H, Yan M, Li C, et al. E2E-VLP: End-to-End Vision-Language Pre-training Enhanced by Visual Learning[OL]. arXiv Preprint, arXiv:2106.01804.
[31] Grill J B, Strub F, Altché F, et al. Bootstrap Your Own Latent: A New Approach to Self-supervised Learning[OL]. arXiv Preprint, arXiv:2006.07733.
[32] Pascual S, Ravanelli M, Serrà J, et al. Learning Problem-Agnostic Speech Representations from Multiple Self-Supervised Tasks[OL]. arXiv Preprint, arXiv:1904.03416.
[33] Su W, Zhu X, Cao Y, et al. VL-BERT: Pre-training of Generic Visual-Linguistic Representations[OL]. arXiv Preprint, arXiv:1908.08530.
[34] Sun C, Myers A, Vondrick C, et al. VideoBERT: A Joint Model for Video and Language Representation Learning[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. 2019: 7464-7473.
[35] Chen Y C, Li L, Yu L, et al. Uniter: Universal Image-text Representation Learning[OL]. arXiv Preprint, arXiv:1909.11740.
[36] Ramesh A, Pavlov M, Goh G, et al. Zero-shot Text-to-Image Generation[OL]. arXiv Preprint, arXiv:2102.12092.
[37] Ding M, Yang Z, Hong W, et al. CogView: Mastering Text-to-Image Generation via Transformers[OL]. arXiv Preprint, arXiv:2105.13290.
[38] Cho J, Lei J, Tan H, et al. Unifying Vision-and-Language Tasks via Text Generation[OL]. arXiv Preprint, arXiv:2102.02779.