Classifying Ancient Chinese Text Relations with Entity Information

doi:10.11925/infotech.2096-3467.2022.1367

Data Analysis and Knowledge Discovery

2024, Vol. 8, Issue (1): 114-124

Tang Xuemei¹,², Su Qi²,³, Wang Jun¹,²

¹Department of Information Management, Peking University, Beijing 100871, China
²Center for Digital Humanities, Peking University, Beijing 100871, China
³School of Foreign Languages, Peking University, Beijing 100871, China


Abstract

[Objective] This paper integrates entity information with pre-trained language models to improve relation classification in ancient Chinese texts. [Methods] First, we used special tokens in the input layer of the pre-trained model to mark the positions of the entity pair, and appended entity-type descriptions to the original relation sentence. Second, we extracted the semantic representations of the entities from the output of the pre-trained language model. Third, we used a CNN to incorporate the position of each token relative to the start and end entities. Finally, we concatenated the sentence representation, the entity semantic representations, and the CNN output, and passed the result through a classifier to obtain the relation label. [Results] Compared with the pre-trained language models alone, the proposed model improved the Macro F1 score by 3.5% on average. [Limitations] Analysis of the confusion matrix shows that the model tends to confuse relations that share the same entity-type pair. [Conclusions] Combining entity information with pre-trained language models improves ancient Chinese relation classification.
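The input-side steps described in the abstract (marking the entity pair, appending a type description, and computing token positions relative to each entity) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the marker strings `[E1]`, `[/E1]`, `[E2]`, `[/E2]`, the type-description template, the `PER` type label, the example sentence, and the function names are all assumptions, since the abstract does not specify them.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open [start, end) offsets into the token list


def relative_positions(n_tokens: int, span: Span) -> List[int]:
    """Signed distance of each token to an entity span (0 inside the span)."""
    start, end = span
    positions = []
    for i in range(n_tokens):
        if i < start:
            positions.append(i - start)    # negative: token precedes the entity
        elif i >= end:
            positions.append(i - end + 1)  # positive: token follows the entity
        else:
            positions.append(0)            # token is part of the entity
    return positions


def build_input(tokens: List[str], head: Span, tail: Span,
                head_type: str, tail_type: str) -> Tuple[List[str], str]:
    """Wrap both entities in marker tokens and build a type description.

    Assumes the two spans do not overlap. Marker names and the description
    template are hypothetical placeholders.
    """
    marked = []
    for i, tok in enumerate(tokens):
        if i == head[0]:
            marked.append("[E1]")
        if i == tail[0]:
            marked.append("[E2]")
        marked.append(tok)
        if i == head[1] - 1:
            marked.append("[/E1]")
        if i == tail[1] - 1:
            marked.append("[/E2]")
    head_text = "".join(tokens[head[0]:head[1]])
    tail_text = "".join(tokens[tail[0]:tail[1]])
    description = f"{head_text}:{head_type};{tail_text}:{tail_type}"
    return marked, description


# Illustrative character-level example (not taken from the paper's dataset).
tokens = list("沛公見項羽")
marked, description = build_input(tokens, (0, 2), (3, 5), "PER", "PER")
head_pos = relative_positions(len(tokens), (0, 2))
tail_pos = relative_positions(len(tokens), (3, 5))
```

In a full model, the marked tokens and the description would be fed to the pre-trained encoder, while the two position sequences would be embedded and passed to the CNN branch; how the paper wires these together is not stated in the abstract.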

Key words: Ancient Chinese; Relation Extraction; Relation Classification; Pre-trained Language Model; Entity Information

Received: 30 December 2022 Published: 30 March 2023

ZTFLH: TP391

Fund: National Natural Science Foundation of China (72010107003)

Corresponding Author: Su Qi, ORCID: 0000-0002-4769-2812, E-mail: sukia@pku.edu.cn

Cite this article:

Tang Xuemei, Su Qi, Wang Jun. Classifying Ancient Chinese Text Relations with Entity Information. Data Analysis and Knowledge Discovery, 2024, 8(1): 114-124.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2022.1367 OR https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2024/V8/I1/114

Model Framework

Datasets Statistics

Hyperparameter Settings

Experimental Results on the C-CLUE Dataset

Comparative Experiment Results of Previous Studies

The Classification F1 Values for Guwen_RoBERTa+entity+type+pe+CNN Model

Relation Classification Confusion Matrix

Relation Classification Results for Different Models
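
The results above are reported as Macro F1, the unweighted mean of the per-relation F1 scores. As a reminder of how that metric is computed, here is a minimal sketch in plain Python. The label names and toy predictions are hypothetical, and averaging over the union of gold and predicted labels is an assumed convention; the paper's exact evaluation script is not given here.

```python
from typing import List


def macro_f1(gold: List[str], pred: List[str]) -> float:
    """Unweighted mean of per-label F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    if not labels:
        raise ValueError("macro_f1 needs at least one label")
    f1_scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy example with two hypothetical relation labels.
gold = ["A", "A", "B", "B"]
pred = ["A", "B", "B", "B"]
score = macro_f1(gold, pred)  # F1(A) = 2/3, F1(B) = 0.8
```

Because every relation contributes equally to the mean, a rare relation that is often misclassified lowers Macro F1 as much as a frequent one would.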


