Data Analysis and Knowledge Discovery  2024, Vol. 8 Issue (1): 114-124    DOI: 10.11925/infotech.2096-3467.2022.1367
Classifying Ancient Chinese Text Relations with Entity Information
Tang Xuemei1,2,Su Qi2,3(),Wang Jun1,2
1Department of Information Management, Peking University, Beijing 100871, China
2Center for Digital Humanities, Peking University, Beijing 100871, China
3School of Foreign Languages, Peking University, Beijing 100871, China
Abstract  

[Objective] This paper integrates entity information with pre-trained language models to improve relation classification in ancient Chinese texts. [Methods] First, we used special tokens in the input layer of the pre-trained model to mark the positions of the entity pair, and appended entity-type descriptions to the original relation sentence. Second, we extracted the semantic representations of the entities from the output of the pre-trained language model. Third, we used a CNN to incorporate the position of each token relative to the start and end entities. Finally, we concatenated the sentence representation, the entity semantic representations, and the CNN output and passed them through a classifier to obtain the relation label. [Results] Compared with the baseline pre-trained language models, the proposed model's Macro F1 score was 3.5% higher on average. [Limitations] Analysis of the confusion matrix shows that the model tends to confuse relations that share the same entity type pair. [Conclusions] Combining entity information with pre-trained language models improves ancient Chinese relation classification.

Key words: Ancient Chinese; Relation Extraction; Relation Classification; Pre-trained Language Model; Entity Information
Received: 30 December 2022      Published: 30 March 2023
ZTFLH:  TP391  
Fund: National Natural Science Foundation of China (72010107003)
Corresponding Author: Su Qi, ORCID: 0000-0002-4769-2812, E-mail: sukia@pku.edu.cn

Cite this article:

Tang Xuemei, Su Qi, Wang Jun. Classifying Ancient Chinese Text Relations with Entity Information. Data Analysis and Knowledge Discovery, 2024, 8(1): 114-124.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2022.1367     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2024/V8/I1/114

Model Framework
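The input-marking step described in the abstract can be illustrated with a minimal sketch. It assumes the 【entity|TYPE】 annotation seen in the sample sentences and treats the first annotated entity as e1. The marker tokens (`[E1]`, `[/E1]`, `[E2]`, `[/E2]`), the `[SEP]` separator, and the type-description template are hypothetical choices for illustration; the tokens and wording actually used in the paper are not given on this page.

```python
import re

# Matches the 【entity|TYPE】 annotation used in the sample sentences.
ANNOTATION = re.compile(r"【([^|】]+)\|([A-Z]+)】")


def build_input(annotated: str) -> str:
    """Wrap the two annotated entities in marker tokens and append a type description.

    Marker tokens and the description template are illustrative assumptions.
    """
    entities = ANNOTATION.findall(annotated)
    if len(entities) != 2:
        raise ValueError("expected exactly two annotated entities")

    counter = iter((1, 2))

    def wrap(match: re.Match) -> str:
        # Entities are numbered in order of appearance (an assumption).
        i = next(counter)
        return f"[E{i}]{match.group(1)}[/E{i}]"

    marked = ANNOTATION.sub(wrap, annotated)
    (e1, t1), (e2, t2) = entities
    description = f"{e1} is {t1}, {e2} is {t2}"
    return f"{marked} [SEP] {description}"


if __name__ == "__main__":
    print(build_input("封【鞅|PER】为【列侯|JOB】,号商君"))
```

In a real pipeline the marker tokens would also need to be registered with the tokenizer so that they are not split into sub-tokens.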
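| No. | Relation type | Training samples | Validation samples | Test samples |
|---|---|---|---|---|
| 1 | | 245 | 65 | 25 |
| 2 | | 70 | 23 | 9 |
| 3 | | 66 | 23 | 12 |
| 4 | | 41 | 13 | - |
| 5 | | 22 | 2 | - |
| 6 | | 32 | 4 | - |
| 7 | | 106 | 44 | 18 |
| 8 | | 113 | 38 | 18 |
| 9 | | 130 | 26 | 20 |
| 10 | 同名于 (same name as) | 239 | 65 | 28 |
| 11 | 朋友 (friend) | 26 | 8 | - |
| 12 | 任职 (holds office) | 1,062 | 307 | 162 |
| 13 | | 31 | 10 | 7 |
| 14 | 升迁 (promoted to) | 31 | 12 | - |
| 15 | 管理 (manages) | 70 | 20 | 12 |
| 16 | 隶属于 (subordinate to) | 187 | 50 | 29 |
| 17 | 属于 (belongs to) | 86 | 18 | 16 |
| 18 | 归属 (affiliated with) | 34 | 7 | - |
| 19 | 作战 (fights) | 37 | 15 | 12 |
| 20 | 讨伐 (campaigns against) | 44 | 10 | 9 |
| 21 | 位于 (located in) | 103 | 33 | 17 |
| 22 | 葬于 (buried at) | 31 | 9 | - |
| 23 | 出生地 (birthplace) | 79 | 20 | - |
| 24 | 去往 (goes to) | 179 | 47 | 24 |
| 25 | 依附 (attaches to) | 49 | 13 | - |
| Total | | 3,113 | 882 | 418 |

Dataset Statistics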
| Hyperparameter | Value |
|---|---|
| Train batch size | 16 |
| Evaluation batch size | 8 |
| Max sequence length | 128 |
| CNN kernel size | 2 |
| CNN out channels | 20 |
| Learning rate | 2e-5 |
| Dropout | 0.5 |
| Epochs | 50 |

Hyperparameter Settings
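The abstract states that a CNN receives the position of each token relative to the start and end entities. One common way to encode such positions, shown here only as a sketch, is to take each token's offset from the entity, clip it to the maximum sequence length, and shift it to a non-negative index that could address an embedding table. The function name and the clipping scheme are assumptions, not details confirmed by the source.

```python
MAX_SEQ_LEN = 128  # "Max sequence length" from the hyperparameter table


def relative_positions(seq_len: int, entity_index: int, max_len: int = MAX_SEQ_LEN) -> list[int]:
    """Return one non-negative position id per token, relative to an entity.

    Offsets are clipped to [-(max_len - 1), max_len - 1] and shifted by
    max_len - 1 so that they fall in [0, 2 * max_len - 2].
    """
    ids = []
    for i in range(seq_len):
        offset = max(-(max_len - 1), min(max_len - 1, i - entity_index))
        ids.append(offset + max_len - 1)
    return ids


if __name__ == "__main__":
    # Positions of a 5-token sentence relative to an entity at index 2.
    print(relative_positions(5, 2))
```

Calling the function once for the start entity and once for the end entity gives two position sequences per sentence, which would then be embedded and fed to the convolution alongside the token representations.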
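| Model | Feature combination | Micro F1 (%) | Macro F1 (%) |
|---|---|---|---|
| BERT | - | 90.67 | 49.06 |
| BERT | +entity | 91.87 | 50.90 |
| BERT | +entity+type | 94.02 | 54.32 |
| BERT | +entity+type+CNN | 93.78 | 51.88 |
| BERT | +entity+type+pe+CNN | 95.22 | 52.70 |
| Guwen_BERT | - | 91.39 | 51.33 |
| Guwen_BERT | +entity | 93.77 | 52.57 |
| Guwen_BERT | +entity+type | 95.93 | 56.05 |
| Guwen_BERT | +entity+type+CNN | 95.22 | 56.17 |
| Guwen_BERT | +entity+type+pe+CNN | 95.22 | 56.25 |
| RoBERTa | - | 90.19 | 50.05 |
| RoBERTa | +entity | 92.34 | 51.60 |
| RoBERTa | +entity+type | 93.06 | 53.32 |
| RoBERTa | +entity+type+CNN | 94.74 | 54.68 |
| RoBERTa | +entity+type+pe+CNN | 94.26 | 52.62 |
| Guwen_RoBERTa | - | 94.02 | 52.79 |
| Guwen_RoBERTa | +entity | 94.74 | 53.24 |
| Guwen_RoBERTa | +entity+type | 95.45 | 53.45 |
| Guwen_RoBERTa | +entity+type+CNN | 95.45 | 55.57 |
| Guwen_RoBERTa | +entity+type+pe+CNN | 95.69 | 53.18 |

Experimental Results on the C-CLUE Dataset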
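The gap between Micro F1 (above 90%) and Macro F1 (around 50%) in the results is what one would expect on an imbalanced label set, where a few frequent relations dominate the micro average while rare relations pull the macro average down. The sketch below shows the standard definitions of both averages on made-up toy labels; it is not the paper's evaluation script, and the paper may compute the averages over a different label set (for example, over directed relation classes).

```python
from collections import Counter


def micro_macro_f1(gold: list[str], pred: list[str]) -> tuple[float, float]:
    """Compute micro- and macro-averaged F1 for single-label classification."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label gains a false positive
            fn[g] += 1  # gold label gains a false negative

    def f1(t: int, f_pos: int, f_neg: int) -> float:
        denom = 2 * t + f_pos + f_neg
        return 2 * t / denom if denom else 0.0

    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    return micro, macro


if __name__ == "__main__":
    # Toy data (not from the paper): a frequent class and a rare class
    # that is never predicted correctly.
    gold = ["任职"] * 8 + ["号"] * 2
    pred = ["任职"] * 10
    print(micro_macro_f1(gold, pred))
```

On the toy data the micro average stays high because the frequent class is always right, while the macro average is halved by the rare class scoring zero.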
| Model | Micro F1 (%) | Macro F1 (%) |
|---|---|---|
| Soares et al. [30] (BERT) | 89.23 | 46.33 |
| Wu et al. [27] (BERT) | 92.58 | 49.26 |
| This paper (BERT) | 95.22 | 52.70 |

Comparative Experiment Results of Previous Studies
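| Relation class | Micro F1 (%) | Relation class | Micro F1 (%) |
|---|---|---|---|
| 子(e1,e2) (son) | 95.83 | 子(e2,e1) (son) | 100.00 |
| 隶属于(e1,e2) (subordinate to) | 95.65 | 隶属于(e2,e1) (subordinate to) | 88.89 |
| 任职(e1,e2) (holds office) | 100.00 | 任职(e2,e1) (holds office) | 100.00 |
| 同名于(e1,e2) (same name as) | 79.99 | 同名于(e2,e1) (same name as) | 96.55 |
| 号(e1,e2) (sobriquet) | 91.43 | 号(e2,e1) (sobriquet) | 0.00 |
| 作战(e1,e2) (fights) | 100.00 | 作战(e2,e1) (fights) | 100.00 |
| 位于(e1,e2) (located in) | 85.71 | 位于(e2,e1) (located in) | 100.00 |
| 弟(e1,e2) (younger brother) | 94.12 | 弟(e2,e1) (younger brother) | 100.00 |
| 杀(e1,e2) (kills) | 94.74 | 杀(e2,e1) (kills) | 71.43 |
| 管理(e1,e2) (manages) | 80.00 | 管理(e2,e1) (manages) | 100.00 |
| 属于(e1,e2) (belongs to) | 100.00 | 属于(e2,e1) (belongs to) | 100.00 |
| 讨伐(e1,e2) (campaigns against) | 90.90 | 讨伐(e2,e1) (campaigns against) | 100.00 |
| 去往(e1,e2) (goes to) | 95.83 | 名(e1,e2) (given name) | 90.90 |
| 作(e1,e2) (makes/composes) | 100.00 | 兄(e2,e1) (elder brother) | 91.66 |

The Classification F1 Values for Guwen_RoBERTa+entity+type+pe+CNN Model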
Relation Classification Confusion Matrix
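The limitation noted in the abstract (errors concentrate on relations that share an entity type pair) is the kind of pattern a confusion matrix exposes through its off-diagonal cells. As a rough sketch, the most frequent gold/predicted mismatches can be listed as below; the function name and the toy labels are illustrative assumptions and do not reproduce the paper's confusion matrix.

```python
from collections import Counter


def top_confusions(gold: list[str], pred: list[str], k: int = 3) -> list[tuple[tuple[str, str], int]]:
    """Return the k most frequent (gold, predicted) pairs where the two differ."""
    off_diagonal = Counter((g, p) for g, p in zip(gold, pred) if g != p)
    return off_diagonal.most_common(k)


if __name__ == "__main__":
    # Toy labels (not from the paper). 号 and 任职 both link a person to a
    # title-like entity, so they are plausible candidates for confusion.
    gold = ["号", "号", "任职", "杀"]
    pred = ["任职", "任职", "任职", "讨伐"]
    for (g, p), count in top_confusions(gold, pred):
        print(f"gold={g} predicted={p} count={count}")
```

Grouping these pairs by the entity types of their arguments would show whether the mismatches indeed cluster within the same type pair.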
Case 1
Test sample: 嫘祖为【黄帝|PER】正妃,生二子,其后皆有天下:其一曰玄嚣,是为青阳,青阳降居江水。其二曰【昌意|PER】,降居若水
Our model (w/o entity): 同名于(e1,e2)
Our model: 子(e1,e2)

Case 2
Test sample: 十年,烈王崩,弟【扁|PER】立,是为【显王|PER】。显王五年,贺秦献公,献公称伯。九年,致文武胙於秦孝公。二十五年,秦会诸侯於周
Our model (w/o entity): 兄(e2,e1)
Our model: 同名于(e2,e1)

Case 3
Test sample: 然王不亲兵,以兵三千属浚而已。浚屯于阴地。河东叛将冯霸杀潞州守将李克恭来降,遣【葛从周|PER】入【潞州|LOC】。李克用遣康君立攻之,从周走河阳
Our model (w/o entity): 兄(e2,e1)
Our model: 管理(e2,e1)

Case 4
Test sample: 其后十六年而秦灭赵。其后二十馀年,高帝过赵,问“乐毅有后世乎”对曰“有【乐叔|PER】”高帝封之【乐卿|JOB】,号曰华成君。华成君,乐毅之孙也
Our model (w/o type): 子(e1,e2)
Our model: 任职(e1,e2)

Case 5
Test sample: 秦使公子少官率师会诸侯逢泽,朝天子。二十一年,齐败魏马陵。二十二年,卫鞅击魏,虏魏公子卬。封【鞅|PER】为【列侯|JOB】,号商君
Our model (w/o type): 任职(e1,e2)
Our model: 号(e1,e2)

Case 6
Test sample: 诸将稍稍得出成皋,从汉王。楚遂拔成皋,欲西。汉使兵距之巩,令其不得西。是时,【彭越|PER】渡河击【楚|ORG】东阿,杀楚将军薛公
Our model (w/o type): 杀(e2,e1)
Our model: 讨伐(e2,e1)

Relation Classification Results for Different Models
[1] 胡韧奋, 李绅, 诸雨辰. 基于深层语言模型的古汉语知识表示及自动断句研究[J]. 中文信息学报, 2021, 35(4): 8-15.
[1] (Hu Renfen, Li Shen, Zhu Yuchen. Knowledge Representation and Sentence Segmentation of Ancient Chinese Based on Deep Language Models[J]. Journal of Chinese Information Processing, 2021, 35(4): 8-15.)
[2] 唐雪梅, 苏祺, 王军, 等. 基于预训练语言模型的繁体古文自动句读研究[J]. 中文信息学报, 2023, 37(8): 159-168.
[2] (Tang Xuemei, Su Qi, Wang Jun, et al. Automatic Traditional Ancient Chinese Texts Segmentation and Punctuation Based on Pre-trained Language Model[J]. Journal of Chinese Information Processing, 2023, 37(8): 159-168.)
[3] Tang X M, Su Q. That Slepen Al the Nyght with Open Ye! Cross-era Sequence Segmentation with Switch-Memory[C]// Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. 2022: 7830-7840.
[4] 张琪, 江川, 纪有书, 等. 面向多领域先秦典籍的分词词性一体化自动标注模型构建[J]. 数据分析与知识发现, 2021, 5(3): 2-11.
[4] (Zhang Qi, Jiang Chuan, Ji Youshu, et al. Unified Model for Word Segmentation and POS Tagging of Multi-Domain Pre-Qin Literature[J]. Data Analysis and Knowledge Discovery, 2021, 5(3): 2-11.)
[5] Yan C X, Su Q, Wang J. MoGCN: Mixture of Gated Convolutional Neural Network for Named Entity Recognition of Chinese Historical Texts[J]. IEEE Access, 2020, 8: 181629-181639.
doi: 10.1109/Access.6287639
[6] 王一钒, 李博, 史话, 等. 古汉语实体关系联合抽取的标注方法[J]. 数据分析与知识发现, 2021, 5(9): 63-74.
[6] (Wang Yifan, Li Bo, Shi Hua, et al. Annotation Method for Extracting Entity Relationship from Ancient Chinese Works[J]. Data Analysis and Knowledge Discovery, 2021, 5(9): 63-74.)
[7] 柳润杰. 面向纪传体史书的知识图谱构建与检索的研究[D]. 太原: 中北大学, 2020.
[7] (Liu Runjie. The Construction and Retrieval of Knowledge Graph for the Biographical History Books[D]. Taiyuan: North University of China, 2020.)
[8] 孙玉轩. 古汉语知识图谱的构建方法研究[D]. 大连: 大连理工大学, 2020.
[8] (Sun Yuxuan. Research on the Construction Method of Knowledge Map in Ancient Chinese[D]. Dalian: Dalian University of Technology, 2020.)
[9] 韩立帆, 季紫荆, 陈子睿, 等. 数字人文视域下面向历史古籍的信息抽取方法研究[J]. 大数据, 2022, 8(6): 26-39.
doi: 10.11959/j.issn.2096-0271.2022058
[9] (Han Lifan, Ji Zijing, Chen Zirui, et al. Research on Information Extraction Methods for Historical Classics under the Threshold of Digital Humanities[J]. Big Data Research, 2022, 8(6): 26-39.)
[10] 李冬梅, 张扬, 李东远, 等. 实体关系抽取方法研究综述[J]. 计算机研究与发展, 2020, 57(7): 1424-1448.
[10] (Li Dongmei, Zhang Yang, Li Dongyuan, et al. Review of Entity Relation Extraction Methods[J]. Journal of Computer Research and Development, 2020, 57(7): 1424-1448.)
[11] 邓擘, 樊孝忠, 杨立公. 用语义模式提取实体关系的方法[J]. 计算机工程, 2007, 33(10): 212-214.
doi: 10.3969/j.issn.1000-3428.2007.10.076
[11] (Deng Bo, Fan Xiaozhong, Yang Ligong. Entity Relation Extraction Method Using Semantic Pattern[J]. Computer Engineering, 2007, 33(10): 212-214.)
[12] Socher R, Huval B, Manning C D, et al. Semantic Compositionality Through Recursive Matrix-Vector Spaces[C]// Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. 2012: 1201-1211.
[13] Zeng D J, Liu K, Lai S W, et al. Relation Classification via Convolutional Deep Neural Network[C]// Proceedings of the 25th International Conference on Computational Linguistics:Technical Papers. 2014: 2335-2344.
[14] Zeng D, Liu K, Chen Y, et al. Distant Supervision for Relation Extraction via Piecewise Convolutional Neural Networks[C]// Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. 2015: 1753-1762.
[15] Kambhatla N. Combining Lexical, Syntactic, and Semantic Features with Maximum Entropy Models for Extracting Relations[C]//Proceedings of the ACL Interactive Poster and Demonstration Sessions. 2004: 178-181.
[16] 张东东, 彭敦陆. ENT-BERT:结合BERT和实体信息的实体关系分类模型[J]. 小型微型计算机系统, 2020, 41(12): 2557-2562.
[16] (Zhang Dongdong, Peng Dunlu. ENT-BERT: Entity Relation Classification Model Combining BERT and Entity Information[J]. Journal of Chinese Computer Systems, 2020, 41(12): 2557-2562.)
[17] 左亚尧, 易彪, 黎文杰. 融合细粒度实体类型的多特征关系分类算法[J]. 计算机工程与应用, 2022, 58(22): 65-71.
doi: 10.3778/j.issn.1002-8331.2106-0278
[17] (Zuo Yayao, Yi Biao, Li Wenjie. Multi-feature Relationship Classification Algorithm Fused with Fine-Grained Entity Types[J]. Computer Engineering and Applications, 2022, 58(22): 65-71.)
[18] 万莹, 孙连英, 赵平, 等. 基于信息增强BERT的关系分类[J]. 中文信息学报, 2021, 35(3): 69-77.
[18] (Wan Ying, Sun Lianying, Zhao Ping, et al. Relation Classification Based on Information Enhanced BERT[J]. Journal of Chinese Information Processing, 2021, 35(3): 69-77.)
[19] Zhong Z X, Chen D Q. A Frustratingly Easy Approach for Entity and Relation Extraction[C]// Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies. 2021: 50-61.
[20] Ye D M, Lin Y K, Li P, et al. Packed Levitated Marker for Entity and Relation Extraction[C]// Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, 2022: 4904-4917.
[21] 杨泽. 中国古典文学文本的命名实体识别及知识图谱构建研究[D]. 南京: 南京邮电大学, 2021.
[21] (Yang Ze. Research on Named Entity Recognition and Knowledge Graph Construction of Chinese Classical Literature Texts[D]. Nanjing: Nanjing University of Posts and Telecommunications, 2021.)
[22] 梁科. 《山经》专名的知识图谱构建及价值分析[D]. 北京: 中国社会科学院研究生院, 2021.
[22] (Liang Ke. Knowledge Map Construction and Value Analysis of Proper Names in Mountain Classic[D]. Beijing: Graduate School of Chinese Academy of Social Sciences, 2021.)
[23] 余宏辉. 三国历史战役知识图谱构建研究[D]. 南昌: 江西财经大学, 2021.
[23] (Yu Honghui. The Research on the Construction of Knowledge Graph of Historical Battles in Three Kingdoms Periods[D]. Nanchang: Jiangxi University of Finance and Economics, 2021.)
[24] 张琪. 《史记》多维知识组织与可视化研究[D]. 南京: 南京农业大学, 2020.
[24] (Zhang Qi. Research on Multi-dimensional Knowledge Organization and Visualization of Records of the Grand Historian[D]. Nanjing: Nanjing Agricultural University, 2020.)
[25] 陈晓洁. 基于本体的《左传》战争知识地图构建研究[D]. 南京: 南京农业大学, 2018.
[25] (Chen Xiaojie. Research on the Construction of War Knowledge Map of Zuozhuan Based on Ontology[D]. Nanjing: Nanjing Agricultural University, 2018.)
[26] Li B, Wei J Y, Liu Y, et al. Few-Shot Relation Extraction on Ancient Chinese Documents[J]. Applied Sciences, 2021, 11(24): 12060.
doi: 10.3390/app112412060
[27] Wu S C, He Y F. Enriching Pre-trained Language Model with Entity Information for Relation Classification[OL]. arXiv Preprint, arXiv: 1905.08284.
[28] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding[OL]. arXiv Preprint, arXiv:1810.04805.
[29] Liu Y H, Ott M, Goyal N, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach[OL]. arXiv Preprint, arXiv:1907.11692.
[30] Soares L B, Fitzgerald N, Ling J, et al. Matching the Blanks: Distributional Similarity for Relation Learning[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019: 2895-2905.