Data Analysis and Knowledge Discovery  2023, Vol. 7 Issue (9): 1-11    DOI: 10.11925/infotech.2096-3467.2023.0473
Extracting Chinese Information with ChatGPT: An Empirical Study by Three Typical Tasks
Bao Tong, Zhang Chengzhi
School of Economics & Management, Nanjing University of Science and Technology, Nanjing 210094, China
[Objective] This paper evaluates ChatGPT on three typical Chinese information extraction tasks: named entity recognition, relation extraction, and event extraction. It also analyzes how ChatGPT's performance differs across tasks and domains, and offers recommendations for using ChatGPT in Chinese contexts. [Methods] We used manually written prompts and scored the outputs with exact matching or loose matching on three typical information extraction tasks across seven datasets. For named entity recognition, we evaluated ChatGPT on the MSRA, Weibo, Resume, and CCKS2019 datasets and compared it with the GlyceBERT and ERNIE3.0 models. For relation extraction, we compared ChatGPT with ERNIE3.0 Titan on the FinRE and SanWen datasets. For event extraction, we compared ChatGPT with ERNIE3.0 on the CCKS2020 dataset. [Results] In named entity recognition, ChatGPT was outperformed by both GlyceBERT and ERNIE3.0. In relation extraction, ERNIE3.0 Titan was significantly superior to ChatGPT. In event extraction, ChatGPT performed slightly better than ERNIE3.0 under loose matching. [Limitations] Prompt-based evaluation of ChatGPT is subjective, and different prompts may lead to different results. [Conclusions] ChatGPT's performance on typical Chinese information extraction tasks still needs improvement, and users should choose appropriate prompts for better results.

Key words: ChatGPT; Information Extraction; Chinese Information Processing; Pre-trained Language Models
Received: 19 May 2023      Published: 12 September 2023
ZTFLH:  TP391  
Fund: The National Natural Science Foundation of China (72074113)
Corresponding Author: Zhang Chengzhi, ORCID: 0000-0001-9522-2914

Cite this article:

Bao Tong, Zhang Chengzhi. Extracting Chinese Information with ChatGPT: An Empirical Study by Three Typical Tasks. Data Analysis and Knowledge Discovery, 2023, 7(9): 1-11.

ChatGPT Evaluation Process for Chinese Information Extraction
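Task | Dataset | Domain | Extracted content
Named entity recognition | MSRA[31] | General domain | 3 entity types: person, location, and organization names
Named entity recognition | WeiboNER[32] | Social media | 4 entity types: person, location, organization, and administrative region
Named entity recognition | Resume[33] | Personal résumés | 8 entity types, including person name, native place, major, organization, and title
Named entity recognition | CCKS2019[34] | Electronic medical records | 6 entity types: laboratory test, imaging examination, disease diagnosis, surgery, drug, and anatomical site
Relation extraction | FinRE[35] | Finance | 44 (bidirectional) relations between commercial companies in the financial sector
Relation extraction | SanWen[36] | Chinese literature | 9 discourse-level relations in Chinese literary texts
Event extraction | CCKS2020[37] | Company announcements | 8 event types that occur in listed-company announcements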
ChatGPT Evaluation Datasets for Chinese Information Extraction
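Test step | Details
Prompt template | Please give the {Entity type} in the following sentence: {text} \n (original: 请给出以下句子中的{Entity type}:{text} \n)
Input sample | Please give the person names, location names, and organization names in the following sentence: Recently, political instability in certain Southeast Asian countries and the still unclear situation on the Korean Peninsula have formed a new backdrop for the China-US strategic dialogue (original: 请给出以下句子中的人名、地名、机构名:最近,东南亚个别国家政局不稳,朝鲜半岛局势仍未明朗,都构成中美战略对话的新背景)
Gold annotation | (东南亚 [Southeast Asia], location), (朝鲜半岛 [Korean Peninsula], location), (中 [China], location), (美 [US], location)
ChatGPT answer | Person names: none. Location names: 东南亚 [Southeast Asia], 朝鲜半岛 [Korean Peninsula]. Organization names: 中美战略对话 [China-US strategic dialogue]. (original: 人名:无。地名:东南亚、朝鲜半岛。机构名:中美战略对话。)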
Test Example for Chinese Named Entity Recognition Task of ChatGPT
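The prompt template in the test example above can be filled in, and the answer parsed, programmatically. The following Python sketch is an illustration only, not the authors' code: the names `build_ner_prompt` and `parse_ner_answer` are hypothetical, the placeholder names are adapted for `str.format`, and the parser assumes the reply follows the "type:mention、mention。" layout of the sample answer, which real ChatGPT replies may not always do.

```python
import re

# Template from the test example; placeholder names adapted for str.format.
NER_TEMPLATE = "请给出以下句子中的{entity_types}:{text}\n"


def build_ner_prompt(entity_types, text):
    """Fill the template with a list of entity type names and a sentence."""
    return NER_TEMPLATE.format(entity_types="、".join(entity_types), text=text)


def parse_ner_answer(answer, entity_types):
    """Turn a reply like '人名:无。地名:A、B。' into (mention, type) pairs.

    Assumption: one 'type:mentions' segment per sentence, mentions
    separated by '、' or commas, and '无' meaning "none".
    """
    pairs = []
    for segment in re.split(r"[。\n]", answer):
        parts = re.split(r"[::]", segment.strip(), maxsplit=1)
        if len(parts) != 2:
            continue  # not a 'type:mentions' segment
        label, body = parts[0].strip(), parts[1].strip()
        if label not in entity_types or body in ("", "无"):
            continue
        for mention in re.split(r"[、,,]", body):
            mention = mention.strip()
            if mention:
                pairs.append((mention, label))
    return pairs


if __name__ == "__main__":
    types = ["人名", "地名", "机构名"]
    print(build_ner_prompt(types, "最近,东南亚个别国家政局不稳"))
    print(parse_ner_answer("人名:无。地名:东南亚、朝鲜半岛。机构名:中美战略对话。", types))
```

Applied to the sample answer, the parser yields two location pairs and one organization pair, which can then be compared against the gold annotation.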
Dataset | Samples | Annotated entities | ChatGPT output entities | GlyceBERT[40] F1 (%) | ERNIE3.0 F1 (%) | ChatGPT F1/P-F1 (%)
MSRA | 1 000 | 3 008 | 3 595 | 95.54 | / | 56.28/75.95
WeiboNER | 700 | 1 402 | 2 055 | 67.60 | 69.23 | 31.58/54.44
Resume | 1 000 | 5 912 | 6 994 | 96.54 | / | 64.50/84.47
CCKS2019 | 500 | 6 933 | 7 670 | / | 82.70 | 27.76/41.61
Test Results of ChatGPT in Chinese Named Entity Recognition
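The two ChatGPT scores in the table above appear to correspond to the exact and loose matching mentioned in the abstract. The sketch below shows one way such scores could be computed in Python. It rests on assumptions: the page does not define loose matching, so the rule used here (same entity type, and one mention contains the other) is a guess, and the function names are hypothetical.

```python
def exact_match(gold_item, pred_item):
    """Exact matching: mention and type must both be identical."""
    return gold_item == pred_item


def loose_match(gold_item, pred_item):
    """Assumed loose rule: same type and overlapping (substring) mentions."""
    gold_mention, gold_type = gold_item
    pred_mention, pred_type = pred_item
    return gold_type == pred_type and (
        gold_mention in pred_mention or pred_mention in gold_mention
    )


def f1_score(gold, pred, match):
    """F1 where each prediction may consume at most one unmatched gold item."""
    remaining = list(gold)
    true_positives = 0
    for p in pred:
        for g in remaining:
            if match(g, p):
                remaining.remove(g)
                true_positives += 1
                break
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Gold and predicted pairs from the named entity recognition test example.
    gold = [("东南亚", "地名"), ("朝鲜半岛", "地名"), ("中", "地名"), ("美", "地名")]
    pred = [("东南亚", "地名"), ("朝鲜半岛", "地名"), ("中美战略对话", "机构名")]
    print(f1_score(gold, pred, exact_match))
    print(f1_score(gold, pred, loose_match))
```

Under this assumed rule the two scores coincide for the sample sentence, because the only unmatched prediction has a different entity type. The scores diverge when a prediction is a truncated or extended version of a gold mention of the same type.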
Test step | Details
Input sample | Given relations: [‘Unknown’,‘Create’,‘Use’,‘Near’,‘Social’,‘Located’,‘Owership’,‘General-Spicial’,‘Family’,‘Part-Whole’]
Gold annotation | Part-Whole
ChatGPT answer | Located
Test Example for Chinese Relation Extraction(One-shot) of ChatGPT
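The test example above shows only the relation list of the prompt, so the full wording used in the study is not recoverable from this page. The Python sketch below illustrates how a zero-shot, one-shot, or few-shot relation prompt might be assembled. Everything after the "给定关系:" prefix is a hypothetical layout, and `build_relation_prompt` and `pick_relation` are illustrative names, not the authors' implementation.

```python
def build_relation_prompt(relations, text, head, tail, demos=()):
    """Assemble a prompt: relation list, optional demonstrations, then the query.

    An empty `demos` gives a zero-shot prompt, one demonstration gives
    one-shot, and several give few-shot. The sentence / entity-pair /
    relation layout is an assumption made for illustration.
    """
    lines = ["给定关系:" + str(list(relations))]
    for demo in demos:
        lines.append(
            f"句子:{demo['text']}\n"
            f"实体对:({demo['head']},{demo['tail']})\n"
            f"关系:{demo['relation']}"
        )
    lines.append(f"句子:{text}\n实体对:({head},{tail})\n关系:")
    return "\n".join(lines)


def pick_relation(answer, relations, default="Unknown"):
    """Map a free-text reply to the first listed relation it mentions."""
    lowered = answer.lower()
    for relation in relations:
        if relation.lower() in lowered:
            return relation
    return default


if __name__ == "__main__":
    relations = ["Unknown", "Located", "Part-Whole"]  # shortened list
    demo = {"text": "示例句子", "head": "甲", "tail": "乙", "relation": "Located"}
    print(build_relation_prompt(relations, "待测句子", "丙", "丁", demos=[demo]))
    print(pick_relation("Located", relations))
```

Falling back to a default label when no listed relation appears in the reply is one possible design choice. Whether the study handled unparseable replies this way is not stated.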
Dataset | Samples | Metric | ERNIE3.0 Titan (Few-shot) | ChatGPT (Zero-shot/One-shot/Few-shot)
FinRE | 1 000 | F1 (%) | 63.15 | 22.96/26.65/26.90
SanWen | 1 000 | F1 (%) | 82.70 | 18.64/25.64/27.11
Test Results of ChatGPT in Chinese Relation Extraction
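Test step | Details
Prompt step 1 | Which of the event types in {event type} does the following text belong to? \n {text} \n (original: 请判断下列文本属于{event type}中的哪一类事件?\n {text} \n)
Input sample | Which event type in [bankruptcy liquidation, equity freeze, …, safety accident] does the following text belong to? (original: 请判断以下文本属于[破产清算,股权冻结,…,安全事故]中的哪个事件类型?)
Gold annotation | Equity freeze (股权冻结)
ChatGPT answer | This text belongs to the [equity freeze] event type. (original: 这段文本属于[股权冻结]事件类型。)
Prompt step 2 | Please give {event elements∈event type } \n (original: 请给出{event elements∈event type } \n)
Input sample | Please give the name of the frozen shareholder, the freezing authority, and the start date (original: 请给出被冻结股东名称、执行冻结机构、起始日期)
Gold annotation | [(name of the frozen shareholder: 天津顺航海运有限公司), (freezing authority: 天津市第二中级人民法院), (start date: 12 January 2017)]
ChatGPT answer | Name of the frozen shareholder: 天津顺航海运有限公司; freezing authority: 天津市第二中级人民法院; start date: 12 January 2017. (original: 被冻结股东名称:天津顺航海运有限公司,执行冻结机构:天津市第二中级人民法院,起始日期:2017年1月12日。)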
Test Example for Chinese Event Extraction Task of ChatGPT
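The two prompt steps in the test example above can be chained: the event type returned in step 1 selects which elements to request in step 2. The Python sketch below is a hedged illustration, not the authors' code. `ask` is a hypothetical callable standing in for a chat session that keeps its conversation history (step 2 does not repeat the text), the schema contains only the one event type shown in the example, and the element parser assumes the "name:value" layout of the sample answer.

```python
import re

# Templates adapted from the two prompt steps in the test example.
STEP1_TEMPLATE = "请判断下列文本属于{event_types}中的哪一类事件?\n{text}\n"
STEP2_TEMPLATE = "请给出{elements}\n"

# Only this schema entry appears in the example; other event types are omitted.
SCHEMA = {"股权冻结": ["被冻结股东名称", "执行冻结机构", "起始日期"]}


def parse_elements(reply, element_names):
    """Extract 'name:value' pairs, stopping each value at a comma or full stop."""
    found = {}
    for name in element_names:
        match = re.search(re.escape(name) + r"[::]\s*([^,,。\n]+)", reply)
        if match:
            found[name] = match.group(1).strip()
    return found


def extract_event(text, schema, ask):
    """Run step 1 (event type) and then step 2 (elements of that type).

    `ask(prompt) -> str` is assumed to send one turn in an ongoing chat.
    """
    event_types = list(schema)
    type_list = "[" + ",".join(event_types) + "]"
    reply = ask(STEP1_TEMPLATE.format(event_types=type_list, text=text))
    event_type = next((t for t in event_types if t in reply), None)
    if event_type is None:
        return None, {}  # no known event type found in the reply
    elements = schema[event_type]
    reply = ask(STEP2_TEMPLATE.format(elements="、".join(elements)))
    return event_type, parse_elements(reply, elements)


if __name__ == "__main__":
    # Canned replies copied from the test example, in place of a real model.
    canned = iter([
        "这段文本属于[股权冻结]事件类型。",
        "被冻结股东名称:天津顺航海运有限公司,执行冻结机构:天津市第二中级人民法院,起始日期:2017年1月12日。",
    ])
    print(extract_event("(公告文本)", SCHEMA, lambda prompt: next(canned)))
```

Passing the chat function in as a parameter keeps the sketch independent of any particular client library and lets the flow be exercised with canned replies.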
Dataset | Samples | Average length | Annotated elements | ChatGPT output elements | ERNIE3.0 F1 (%) | ChatGPT F1/P-F1 (%)
CCKS2020 | 500 | 403 | 2 050 | 2 458 | 64.33 | 45.05/65.80
Test Results of ChatGPT in Chinese Event Extraction
Error Examples of Word Segmentation Ambiguity and Nested Entity Recognition in ChatGPT
Error Example of Chinese Relation Extraction in ChatGPT
Error Example of Semantic Comprehension in ChatGPT
