Data Analysis and Knowledge Discovery, 2023, Vol. 7, Issue 9: 1-11     https://doi.org/10.11925/infotech.2096-3467.2023.0473
Research Paper
Extracting Chinese Information with ChatGPT: An Empirical Study of Three Typical Tasks*
Bao Tong, Zhang Chengzhi
School of Economics & Management, Nanjing University of Science and Technology, Nanjing 210094, China
Abstract

[Objective] This paper evaluates ChatGPT's performance on typical Chinese information extraction tasks, namely named entity recognition, relation extraction, and event extraction. It also analyzes how ChatGPT's performance varies across tasks and domains, and offers recommendations for using ChatGPT in Chinese contexts. [Methods] We queried ChatGPT with manually designed prompts and scored its outputs under both exact and loose matching on three typical information extraction tasks across seven datasets. We evaluated ChatGPT's named entity recognition on the MSRA, Weibo, Resume, and CCKS2019 datasets against the GlyceBERT and ERNIE3.0 models; compared relation extraction by ChatGPT and ERNIE3.0 Titan on the FinRE and SanWen datasets; and compared event extraction by ChatGPT and ERNIE3.0 on the CCKS2020 dataset. [Results] In named entity recognition, ChatGPT was outperformed by both GlyceBERT and ERNIE3.0. In relation extraction, ERNIE3.0 Titan was significantly superior to ChatGPT. In event extraction, ChatGPT performed slightly better than ERNIE3.0 under loose matching. [Limitations] Evaluating ChatGPT's performance with prompts is subjective, and different prompts may yield different results. [Conclusions] ChatGPT still has substantial room for improvement on typical Chinese information extraction tasks, and users should choose appropriate prompts and questions to obtain better results.

Keywords: ChatGPT; Information Extraction; Chinese Information Processing; Pre-trained Language Models
Received: 2023-05-19      Published: 2023-09-12
CLC Numbers: TP391; G350
Funding: *Supported by the National Natural Science Foundation of China (Grant No. 72074113)
Corresponding author: Zhang Chengzhi, ORCID: 0000-0001-9522-2914, E-mail: zhangcz@njust.edu.cn
Cite this article:
Bao Tong, Zhang Chengzhi. Extracting Chinese Information with ChatGPT: An Empirical Study of Three Typical Tasks. Data Analysis and Knowledge Discovery, 2023, 7(9): 1-11.
Article link:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2023.0473      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2023/V7/I9/1
Fig. 1  Workflow for evaluating ChatGPT's Chinese information extraction performance
Task | Dataset | Domain | Extraction targets
Named entity recognition | MSRA[31] | General | Three entity types: person, location, organization
Named entity recognition | WeiboNER[32] | Social media | Four entity types: person, location, organization, geo-political entity
Named entity recognition | Resume[33] | Résumés | Eight entity types, including person name, native place, major, organization, and job title
Named entity recognition | CCKS2019[34] | Electronic medical records | Six entity types: laboratory test, imaging examination, disease diagnosis, operation, drug, anatomical site
Relation extraction | FinRE[35] | Finance | 44 (bidirectional) relation types between commercial companies in the financial domain
Relation extraction | SanWen[36] | Chinese literature | Nine discourse-level relation types in Chinese literary texts
Event extraction | CCKS2020[37] | Company announcements | Eight event types found in listed-company announcements
Table 1  Datasets for evaluating ChatGPT's Chinese information extraction performance
Step | Details
Prompt template | 请给出以下句子中的{Entity type}:{text} \n (i.e., "Please list the {Entity type} in the following sentence: {text}")
Input sample | 请给出以下句子中的人名、地名、机构名:最近,东南亚个别国家政局不稳,朝鲜半岛局势仍未明朗,都构成中美战略对话的新背景
Gold annotation | (东南亚, 地名), (朝鲜半岛, 地名), (中, 地名), (美, 地名)
ChatGPT answer | 人名:无。地名:东南亚、朝鲜半岛。机构名:中美战略对话。
(人名 = person name; 地名 = location; 机构名 = organization)
Table 2  Test example for ChatGPT Chinese named entity recognition
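To make the protocol concrete, here is a minimal sketch of how the Table 2 prompt could be sent through the OpenAI chat API[39]; the model name, temperature, and helper function are our assumptions rather than settings reported in the paper.

```python
# A minimal sketch (assumed details: model name, decoding settings, helper
# names) of sending the Table 2 prompt template to the OpenAI chat API[39].
# Requires `pip install openai` (>=1.0) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

NER_TEMPLATE = "请给出以下句子中的{entity_types}:{text}"

def chatgpt_ner(entity_types: str, text: str) -> str:
    """Fill the Table 2 template and return ChatGPT's raw answer string."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed; the paper only says "ChatGPT"
        temperature=0,          # assumed, to keep runs repeatable
        messages=[{"role": "user",
                   "content": NER_TEMPLATE.format(entity_types=entity_types, text=text)}],
    )
    return response.choices[0].message.content

print(chatgpt_ner("人名、地名、机构名",
                  "最近,东南亚个别国家政局不稳,朝鲜半岛局势仍未明朗,都构成中美战略对话的新背景"))
```

Parsing the free-text answer back into (entity, type) pairs is omitted here; the output-count columns in Table 3 suggest ChatGPT tends to return more entities than the gold standard contains.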
Dataset | #Samples | #Gold entities | #ChatGPT output entities | GlyceBERT[40] F1 (%) | ERNIE3.0 F1 (%) | ChatGPT F1 / P-F1 (%)
MSRA | 1 000 | 3 008 | 3 595 | 95.54 | / | 56.28 / 75.95
WeiboNER | 700 | 1 402 | 2 055 | 67.60 | 69.23 | 31.58 / 54.44
Resume | 1 000 | 5 912 | 6 994 | 96.54 | / | 64.50 / 84.47
CCKS2019 | 500 | 6 933 | 7 670 | / | 82.70 | 27.76 / 41.61
("/" = no published result; P-F1 is F1 under loose matching)
Table 3  ChatGPT Chinese named entity recognition results
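The two ChatGPT columns reflect the two scoring regimes described in the abstract: F1 under exact matching and P-F1 under loose matching. The sketch below shows one plausible way to compute both; the loose rule used here (same entity type plus textual overlap) is our assumption, since the paper's exact criterion is not reproduced on this page.

```python
# A sketch of exact-match F1 versus loose-match F1 ("P-F1" in Table 3).
# Assumption: under loose matching, a prediction counts as correct when it has
# the right entity type and its span text overlaps a gold entity's text; the
# paper's precise loose-matching rule may differ.

def f1(gold: list[tuple[str, str]], pred: list[tuple[str, str]], loose: bool = False) -> float:
    """gold/pred are lists of (entity_text, entity_type) pairs."""
    def match(p, g):
        if p[1] != g[1]:                       # entity types must agree
            return False
        if loose:                              # textual overlap is enough
            return p[0] in g[0] or g[0] in p[0]
        return p[0] == g[0]                    # exact: spans must be identical

    unmatched, tp = list(gold), 0
    for p in pred:                             # greedy one-to-one matching
        for g in unmatched:
            if match(p, g):
                tp += 1
                unmatched.remove(g)
                break
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Gold entities from Table 2; the predictions are hypothetical, chosen so the
# two scores diverge (朝鲜 only overlaps the gold span 朝鲜半岛).
gold = [("东南亚", "地名"), ("朝鲜半岛", "地名"), ("中", "地名"), ("美", "地名")]
pred = [("东南亚", "地名"), ("朝鲜", "地名"), ("中美战略对话", "机构名")]
print(f"exact F1 = {f1(gold, pred):.3f}, loose P-F1 = {f1(gold, pred, loose=True):.3f}")
```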
Step | Details
Input sample | 给定关系:['Unknown', 'Create', 'Use', 'Near', 'Social', 'Located', 'Ownership', 'General-Special', 'Family', 'Part-Whole']
阅读下面的句子并判断实体间属于什么关系? (Given the relation inventory above, read the sentences below and decide which relation holds between the entities.)
句子 (Sentence):我曾在铜梁和你有过相遇。叫你你不应,头发胡子盖住了半边脸
实体 (Entities):你,胡子
关系 (Relation):Part-Whole
句子 (Sentence):漫步在这山间的小道上
实体 (Entities):山间,小道
关系 (Relation):
Gold label | Part-Whole
ChatGPT answer | Located
Table 4  One-shot test example for ChatGPT Chinese relation extraction
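A sketch of how the one-shot prompt in Table 4 could be assembled programmatically follows; the relation inventory is taken from Table 4 (the SanWen labels), while the function and field names are ours, not the paper's.

```python
# A sketch of assembling the one-shot relation-extraction prompt shown in
# Table 4. The relation inventory follows Table 4 (SanWen labels); the helper
# and field names are ours, not the paper's.
RELATIONS = ["Unknown", "Create", "Use", "Near", "Social",
             "Located", "Ownership", "General-Special", "Family", "Part-Whole"]

def build_one_shot_prompt(demo: dict, query: dict) -> str:
    """Each example is {'sentence': ..., 'head': ..., 'tail': ..., 'relation': ...};
    the query's relation is left blank for ChatGPT to fill in."""
    def block(ex: dict) -> str:
        return (f"句子:{ex['sentence']}\n"
                f"实体:{ex['head']},{ex['tail']}\n"
                f"关系:{ex.get('relation') or ''}")

    header = f"给定关系:{RELATIONS}\n阅读下面的句子并判断实体间属于什么关系?"
    return "\n".join([header, block(demo), block(query)])

demo = {"sentence": "我曾在铜梁和你有过相遇。叫你你不应,头发胡子盖住了半边脸",
        "head": "你", "tail": "胡子", "relation": "Part-Whole"}
query = {"sentence": "漫步在这山间的小道上", "head": "山间", "tail": "小道"}
print(build_one_shot_prompt(demo, query))
```

Zero-shot prompts would drop the demonstration block, and few-shot prompts would append several; Table 5 reports all three settings.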
Dataset | #Samples | Metric | ERNIE3.0 Titan (Few-shot) | ChatGPT (Zero-shot / One-shot / Few-shot)
FinRE | 1 000 | F1 (%) | 63.15 | 22.96 / 26.65 / 26.90
SanWen | 1 000 | F1 (%) | 82.70 | 18.64 / 25.64 / 27.11
Table 5  ChatGPT Chinese relation extraction results
Step | Details
Prompt step 1 | 请判断下列文本属于{event type}中的哪一类事件?\n {text} \n (i.e., "Which event type in {event type} does the following text belong to?")
Input sample | 请判断以下文本属于[破产清算,股权冻结,…,安全事故]中的哪个事件类型?
2017年1月12日,长航凤凰股份有限公司(以下简称"公司")通过中国登记结算有限公司系统查询获知公司第一大股东天津顺航海运有限公司(以下简称"顺航海运")持有公司股票181,015,974股,持股比例17.89%被天津市第二中级人民法院(以下简称"天津二中院")司法冻结,具体内容详见……
Gold annotation | 股权冻结 (equity freeze)
ChatGPT answer | 这段文本属于[股权冻结]事件类型。
Prompt step 2 | 请给出{event elements ∈ event type} \n (i.e., "Please list the {argument slots of the predicted event type}")
Input sample | 请给出被冻结股东名称、执行冻结机构、起始日期
Gold annotation | [(被冻结股东名称:天津顺航海运有限公司), (执行冻结机构:天津市第二中级人民法院), (起始日期:2017年1月12日)]
ChatGPT answer | 被冻结股东名称:天津顺航海运有限公司,执行冻结机构:天津市第二中级人民法院,起始日期:2017年1月12日。
Table 6  Test example for ChatGPT Chinese event extraction
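Table 6 shows a two-step protocol: the first turn classifies the event type, and the second turn, issued in the same conversation so the announcement text stays in context, requests that type's argument slots. A sketch under assumed details (model name, decoding settings, and a slot schema covering only 股权冻结) is given below.

```python
# A sketch of the two-step event-extraction dialogue in Table 6. Turn 1
# classifies the event type; turn 2 asks for that type's argument slots within
# the same conversation. The model name, temperature, and the slot schema for
# event types other than 股权冻结 (equity freeze) are assumptions.
from openai import OpenAI

client = OpenAI()

EVENT_TYPES = "[破产清算,股权冻结,…,安全事故]"  # abbreviated with "…" as in Table 6
SLOTS = {"股权冻结": "被冻结股东名称、执行冻结机构、起始日期"}  # partial schema

def extract_event(text: str) -> tuple[str, str]:
    # Turn 1: event-type classification (prompt step 1 in Table 6).
    messages = [{"role": "user",
                 "content": f"请判断以下文本属于{EVENT_TYPES}中的哪个事件类型?\n{text}"}]
    step1 = client.chat.completions.create(
        model="gpt-3.5-turbo", temperature=0, messages=messages,  # assumed settings
    ).choices[0].message.content

    # Map the free-text answer back to a known type; give up if none matches.
    event_type = next((t for t in SLOTS if t in step1), None)
    if event_type is None:
        return step1, ""

    # Turn 2: argument extraction (prompt step 2), in the same conversation.
    messages += [{"role": "assistant", "content": step1},
                 {"role": "user", "content": f"请给出{SLOTS[event_type]}"}]
    step2 = client.chat.completions.create(
        model="gpt-3.5-turbo", temperature=0, messages=messages,
    ).choices[0].message.content
    return step1, step2
```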
Dataset | #Samples | Avg. length | #Gold elements | #ChatGPT output elements | ERNIE3.0 F1 (%) | ChatGPT F1 / P-F1 (%)
CCKS2020 | 500 | 403 | 2 050 | 2 458 | 64.33 | 45.05 / 65.80
Table 7  ChatGPT Chinese event extraction results
Fig. 2  Examples of ChatGPT errors on word-segmentation ambiguity and nested entities
Fig. 3  Examples of ChatGPT errors in Chinese relation extraction
Fig. 4  Examples of ChatGPT semantic-understanding errors
[1] Brown T B, Mann B, Ryder N, et al. Language Models are Few-Shot Learners[C]// Proceedings of the 34th International Conference on Neural Information Processing Systems. ACM, 2020: 1877-1901.
[2] OpenAI: Introducing ChatGPT[EB/OL]. [2023-04-15]. https://openai.com/blog/chatgpt.
[3] Thoppilan R, De Freitas D, Hall J, et al. LaMDA: Language Models for Dialog Applications[OL]. arXiv Preprint, arXiv: 2201.08239.
[4] Touvron H, Lavril T, Izacard G, et al. LLaMA: Open and Efficient Foundation Language Models[OL]. arXiv Preprint, arXiv: 2302.13971.
[5] Baidu. ERNIE Bot (文心一言)[EB/OL]. [2023-04-15]. https://yiyan.baidu.com/welcome.
[6] Alibaba. Tongyi Qianwen (通义千问)[EB/OL]. [2023-04-15]. https://qianwen.aliyun.com/.
[7] Devlin J, Chang M W, Lee K, et al. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding[OL]. arXiv Preprint, arXiv: 1810.04805.
[8] Liu Y H, Ott M, Goyal N, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach[OL]. arXiv Preprint, arXiv: 1907.11692.
[9] Raffel C, Shazeer N, Roberts A, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer[J]. Journal of Machine Learning Research, 2020, 21: 5485-5551.
[10] Lan Z Z, Chen M D, Goodman S, et al. ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations[OL]. arXiv Preprint, arXiv: 1909.11942.
[11] Lewis M, Liu Y H, Goyal N, et al. BART: Denoising Sequence-to-Sequence Pre-Training for Natural Language Generation, Translation, and Comprehension[OL]. arXiv Preprint, arXiv: 1910.13461.
[12] Radford A, Wu J, Child R, et al. Language Models are Unsupervised Multitask Learners[OL]. [2023-04-15]. https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf.
[13] Lester B, Al-Rfou R, Constant N. The Power of Scale for Parameter-Efficient Prompt Tuning[OL]. arXiv Preprint, arXiv: 2104.08691.
[14] Reynolds L, McDonell K. Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm[C]// Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. ACM, 2021: 1-7.
[15] Zhao Z, Wallace E, Feng S, et al. Calibrate Before Use: Improving Few-Shot Performance of Language Models[C]// Proceedings of the 38th International Conference on Machine Learning. PMLR, 2021: 12697-12706.
[16] Chung H W, Hou L, Longpre S, et al. Scaling Instruction-Finetuned Language Models[OL]. arXiv Preprint, arXiv: 2210.11416.
[17] Wei J, Wang X Z, Schuurmans D, et al. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models[OL]. arXiv Preprint, arXiv: 2201.11903.
[18] Chen M, Tworek J, Jun H, et al. Evaluating Large Language Models Trained on Code[OL]. arXiv Preprint, arXiv: 2107.03374.
[19] Ouyang L, Wu J, Jiang X, et al. Training Language Models to Follow Instructions with Human Feedback[C]// Proceedings of the 2022 Conference on Neural Information Processing Systems. 2022: 27730-27744.
[20] Qin C W, Zhang A, Zhang Z S, et al. Is ChatGPT a General-Purpose Natural Language Processing Task Solver?[OL]. arXiv Preprint, arXiv: 2302.06476.
[21] Bang Y J, Cahyawijaya S, Lee N, et al. A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity[OL]. arXiv Preprint, arXiv: 2302.04023.
[22] Tan Y M, Min D H, Li Y, et al. Evaluation of ChatGPT as a Question Answering System for Answering Complex Questions[OL]. arXiv Preprint, arXiv: 2303.07992.
[23] Jiao W X, Wang W X, Huang J T, et al. Is ChatGPT a Good Translator? A Preliminary Study[OL]. arXiv Preprint, arXiv: 2301.08745.
[24] Pan W B, Chen Q G, Xu X, et al. A Preliminary Evaluation of ChatGPT for Zero-Shot Dialogue Understanding[OL]. arXiv Preprint, arXiv: 2304.04256.
[25] Wang S, Scells H, Koopman B, et al. Can ChatGPT Write a Good Boolean Query for Systematic Review Literature Search?[OL]. arXiv Preprint, arXiv: 2302.03495.
[26] Wei X, Cui X Y, Cheng N, et al. Zero-Shot Information Extraction via Chatting with ChatGPT[OL]. arXiv Preprint, arXiv: 2302.10205.
[27] Sobania D, Briesch M, Hanna C, et al. An Analysis of the Automatic Bug Fixing Performance of ChatGPT[OL]. arXiv Preprint, arXiv: 2301.08653.
[28] Gao J, Zhao H, Yu C L, et al. Exploring the Feasibility of ChatGPT for Event Extraction[OL]. arXiv Preprint, arXiv: 2303.03836.
[29] Hu Y, Ameer I, Zuo X, et al. Zero-Shot Clinical Entity Recognition Using ChatGPT[OL]. arXiv Preprint, arXiv: 2303.16416.
[30] Zhang Huaping, Li Linhan, Li Chunjin. ChatGPT Performance Evaluation on Chinese Language and Risk Measures[J]. Data Analysis and Knowledge Discovery, 2023, 7(3): 16-25. (in Chinese)
[31] Levow G A. The Third International Chinese Language Processing Bakeoff: Word Segmentation and Named Entity Recognition[C]// Proceedings of the 5th SIGHAN Workshop on Chinese Language Processing. 2006: 108-117.
[32] Peng N Y, Dredze M. Named Entity Recognition for Chinese Social Media with Jointly Trained Embeddings[C]// Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. ACL, 2015: 548-554.
[33] Zhang Y, Yang J. Chinese NER Using Lattice LSTM[OL]. arXiv Preprint, arXiv: 1805.02023.
[34] CCKS2019[EB/OL]. [2023-04-23]. https://www.biendata.net/competition/ccks_2019_1/.
[35] FinRE Dataset: thunlp/Chinese_NRE[EB/OL]. [2023-04-15]. https://github.com/thunlp/Chinese_NRE.
[36] Xu J J, Wen J, Sun X, et al. A Discourse-Level Named Entity Recognition and Relation Extraction Dataset for Chinese Literature Text[OL]. arXiv Preprint, arXiv: 1711.07010.
[37] CCKS2020[EB/OL]. [2023-04-23]. https://www.biendata.net/competition/ccks_2020_2_2/.
[38] Grishman R, Sundheim B. Message Understanding Conference-6: A Brief History[C]// Proceedings of the 16th Conference on Computational Linguistics. ACL, 1996: 466-471.
[39] OpenAI API[EB/OL]. [2023-04-24]. https://platform.openai.com/docs/api-reference.
[40] Meng Y X, Wu W, Wang F, et al. Glyce: Glyph-Vectors for Chinese Character Representations[C]// Proceedings of the 33rd International Conference on Neural Information Processing Systems. 2019: 2746-2757.
[41] Sun Y, Wang S H, Feng S K, et al. ERNIE 3.0: Large-Scale Knowledge Enhanced Pre-Training for Language Understanding and Generation[OL]. arXiv Preprint, arXiv: 2107.02137.
[42] Zhao S D, Liu T, Zhao S C, et al. A Neural Multi-Task Learning Framework to Jointly Model Medical Named Entity Recognition and Normalization[C]// Proceedings of the 33rd AAAI Conference on Artificial Intelligence and 31st Innovative Applications of Artificial Intelligence Conference and 9th AAAI Symposium on Educational Advances in Artificial Intelligence. 2019, 33(1): 817-824.
[43] Wang S H, Sun Y, Xiang Y, et al. ERNIE 3.0 Titan: Exploring Larger-Scale Knowledge Enhanced Pre-Training for Language Understanding and Generation[OL]. arXiv Preprint, arXiv: 2112.12731.