[Objective] This paper evaluates ChatGPT's performance on typical Chinese information extraction tasks, namely named entity recognition, relation extraction, and event extraction. It also analyzes how ChatGPT's performance differs across tasks and domains, providing recommendations for using ChatGPT in Chinese contexts. [Methods] We used manually constructed prompts and evaluated the outputs with exact matching and loose matching on three typical information extraction tasks across seven datasets. For named entity recognition, we evaluated ChatGPT on the MSRA, Weibo, Resume, and CCKS2019 datasets and compared it with the GlyceBERT and ERNIE 3.0 models. For relation extraction, we compared ChatGPT and ERNIE 3.0 Titan on the FinRE and SanWen datasets. For event extraction, we compared ChatGPT and ERNIE 3.0 on the CCKS2020 dataset. [Results] In the named entity recognition task, ChatGPT was outperformed by the GlyceBERT and ERNIE 3.0 models. ERNIE 3.0 Titan also significantly outperformed ChatGPT in the relation extraction task. In the event extraction task, ChatGPT performed slightly better than ERNIE 3.0 under loose matching. [Limitations] Evaluating ChatGPT's performance with prompts is subjective, and different prompts may lead to different results. [Conclusions] ChatGPT still needs to improve on typical Chinese information extraction tasks, and users should choose appropriate prompts for better results.
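To make the exact/loose matching evaluation described in the Methods concrete, the sketch below scores extracted (mention, type) pairs against gold answers. It is a minimal illustration, not the paper's released code: the loose criterion used here (substring containment of mentions plus a matching type) and the example strings are our assumptions for illustration only.

```python
# Minimal sketch of exact vs. loose matching for an extraction task.
# Assumption: predictions and gold answers are lists of (mention, type) pairs.

def f1_score(preds, golds, loose=False):
    """Return precision, recall, and F1 for one document."""
    def hit(p, g):
        if loose:
            # Loose match (assumed definition): same type and one mention
            # string contains the other.
            return p[1] == g[1] and (p[0] in g[0] or g[0] in p[0])
        # Exact match: identical mention and type.
        return p == g

    matched_gold = set()
    tp = 0
    for p in preds:
        for i, g in enumerate(golds):
            if i not in matched_gold and hit(p, g):
                matched_gold.add(i)  # each gold answer can be matched once
                tp += 1
                break

    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(golds) if golds else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [("南京理工大学", "ORG"), ("章成志", "PER")]          # hypothetical gold answers
    pred = [("理工大学", "ORG"), ("章成志", "PER")]              # hypothetical model output
    print(f1_score(pred, gold, loose=False))  # exact:  (0.5, 0.5, 0.5)
    print(f1_score(pred, gold, loose=True))   # loose:  (1.0, 1.0, 1.0)
```

As the example shows, a partially correct span counts under loose matching but not under exact matching, which is why the two settings can rank systems differently.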
Bao Tong, Zhang Chengzhi. Extracting Chinese Information with ChatGPT: An Empirical Study by Three Typical Tasks[J]. Data Analysis and Knowledge Discovery, 2023, 7(9): 1-11.
Brown T B, Mann B, Ryder N, et al. Language Models are Few-Shot Learners[C]// Proceedings of the 34th International Conference on Neural Information Processing Systems. ACM, 2020: 1877-1901.
Devlin J, Chang M W, Lee K, et al. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding[OL]. arXiv Preprint, arXiv: 1810.04805.
[8] Liu Y H, Ott M, Goyal N, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach[OL]. arXiv Preprint, arXiv: 1907.11692.
[9] Raffel C, Shazeer N, Roberts A, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer[J]. Journal of Machine Learning Research, 2020, 21: 5485-5551.
[10] Lan Z Z, Chen M D, Goodman S, et al. ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations[OL]. arXiv Preprint, arXiv: 1909.11942.
[11] Lewis M, Liu Y H, Goyal N, et al. BART: Denoising Sequence-to-Sequence Pre-Training for Natural Language Generation, Translation, and Comprehension[OL]. arXiv Preprint, arXiv: 1910.13461.
[12] Radford A, Wu J, Child R, et al. Language Models are Unsupervised Multitask Learners[OL]. [2023-04-15]. https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf.
[13] Lester B, Al-Rfou R, Constant N. The Power of Scale for Parameter-Efficient Prompt Tuning[OL]. arXiv Preprint, arXiv: 2104.08691.
[14] Reynolds L, McDonell K. Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm[C]// Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. ACM, 2021: 1-7.
[15] Zhao Z, Wallace E, Feng S, et al. Calibrate Before Use: Improving Few-Shot Performance of Language Models[C]// Proceedings of the 38th International Conference on Machine Learning. 2021: 12697-12706.
[16] Chung H W, Hou L, Longpre S, et al. Scaling Instruction-Finetuned Language Models[OL]. arXiv Preprint, arXiv: 2210.11416.
[17] Wei J, Wang X Z, Schuurmans D, et al. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models[OL]. arXiv Preprint, arXiv: 2201.11903.
[18] Chen M, Tworek J, Jun H, et al. Evaluating Large Language Models Trained on Code[OL]. arXiv Preprint, arXiv: 2107.03374.
[19] Ouyang L, Wu J, Jiang X, et al. Training Language Models to Follow Instructions with Human Feedback[C]// Proceedings of the 2022 Conference on Neural Information Processing Systems. 2022: 27730-27744.
[20] Qin C W, Zhang A, Zhang Z S, et al. Is ChatGPT a General-Purpose Natural Language Processing Task Solver?[OL]. arXiv Preprint, arXiv: 2302.06476.
[21] Bang Y J, Cahyawijaya S, Lee N, et al. A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity[OL]. arXiv Preprint, arXiv: 2302.04023.
[22] Tan Y M, Min D H, Li Y, et al. Evaluation of ChatGPT as a Question Answering System for Answering Complex Questions[OL]. arXiv Preprint, arXiv: 2303.07992.
[23] Jiao W X, Wang W X, Huang J T, et al. Is ChatGPT a Good Translator? A Preliminary Study[OL]. arXiv Preprint, arXiv: 2301.08745.
[24] Pan W B, Chen Q G, Xu X, et al. A Preliminary Evaluation of ChatGPT for Zero-Shot Dialogue Understanding[OL]. arXiv Preprint, arXiv: 2304.04256.
[25] Wang S, Scells H, Koopman B, et al. Can ChatGPT Write a Good Boolean Query for Systematic Review Literature Search?[OL]. arXiv Preprint, arXiv: 2302.03495.
[26] Wei X, Cui X Y, Cheng N, et al. Zero-Shot Information Extraction via Chatting with ChatGPT[OL]. arXiv Preprint, arXiv: 2302.10205.
[27] Sobania D, Briesch M, Hanna C, et al. An Analysis of the Automatic Bug Fixing Performance of ChatGPT[OL]. arXiv Preprint, arXiv: 2301.08653.
[28] Gao J, Zhao H, Yu C L, et al. Exploring the Feasibility of ChatGPT for Event Extraction[OL]. arXiv Preprint, arXiv: 2303.03836.
[29] Hu Y, Ameer I, Zuo X, et al. Zero-Shot Clinical Entity Recognition Using ChatGPT[OL]. arXiv Preprint, arXiv: 2303.16416.
[30] Zhang Huaping, Li Linhan, Li Chunjin. ChatGPT Performance Evaluation on Chinese Language and Risk Measures[J]. Data Analysis and Knowledge Discovery, 2023, 7(3): 16-25.
[31] Levow G A. The Third International Chinese Language Processing Bakeoff: Word Segmentation and Named Entity Recognition[C]// Proceedings of the 5th SIGHAN Workshop on Chinese Language Processing. 2006: 108-117.
[32] Peng N Y, Dredze M. Named Entity Recognition for Chinese Social Media with Jointly Trained Embeddings[C]// Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. ACL, 2015: 548-554.
[33] Zhang Y, Yang J. Chinese NER Using Lattice LSTM[OL]. arXiv Preprint, arXiv: 1805.02023.
FinRE: Chinese_NRE/data at master · thunlp/Chinese_NRE[EB/OL]. [2023-04-15]. https://github.com/thunlp/Chinese_NRE.
[36] Xu J J, Wen J, Sun X, et al. A Discourse-Level Named Entity Recognition and Relation Extraction Dataset for Chinese Literature Text[OL]. arXiv Preprint, arXiv: 1711.07010.
Grishman R, Sundheim B. Message Understanding Conference-6: A Brief History[C]// Proceedings of the 16th Conference on Computational Linguistics. ACL, 1996: 466-471.
Meng Y X, Wu W, Wang F, et al. Glyce: Glyph-Vectors for Chinese Character Representations[C]// Proceedings of the 33rd International Conference on Neural Information Processing Systems. 2019: 2746-2757.
[41] Sun Y, Wang S H, Feng S K, et al. ERNIE 3.0: Large-Scale Knowledge Enhanced Pre-Training for Language Understanding and Generation[OL]. arXiv Preprint, arXiv: 2107.02137.
[42] Zhao S D, Liu T, Zhao S C, et al. A Neural Multi-Task Learning Framework to Jointly Model Medical Named Entity Recognition and Normalization[C]// Proceedings of the 33rd AAAI Conference on Artificial Intelligence and 31st Innovative Applications of Artificial Intelligence Conference and 9th AAAI Symposium on Educational Advances in Artificial Intelligence. 2019, 33(1): 817-824.
[43] Wang S H, Sun Y, Xiang Y, et al. ERNIE 3.0 Titan: Exploring Larger-Scale Knowledge Enhanced Pre-Training for Language Understanding and Generation[OL]. arXiv Preprint, arXiv: 2112.12731.