Classifying Ancient Chinese Text Relations with Entity Information
Tang Xuemei1,2, Su Qi2,3, Wang Jun1,2
1 Department of Information Management, Peking University, Beijing 100871, China
2 Center for Digital Humanities, Peking University, Beijing 100871, China
3 School of Foreign Languages, Peking University, Beijing 100871, China
[Objective] This paper integrates entity information into pre-trained language models to improve relation classification for ancient Chinese texts. [Methods] First, we inserted special tokens at the input layer of the pre-trained model to mark the positions of the entity pair, and appended entity-type descriptions to the original relation sentence. Second, we extracted the semantic representations of the entities from the output of the pre-trained language model. Third, we used a CNN to encode the position of each token relative to the start and end entities. Finally, we concatenated the sentence representation, the entity semantic representations, and the CNN output, and passed the result through a classifier to obtain the relation label. [Results] Compared with using pre-trained language models alone, the proposed model improved the Macro F1 score by 3.5% on average. [Limitations] The confusion matrix shows that the model tends to confuse relations that share the same entity-type pair. [Conclusions] Combining entity information with pre-trained language models improves the effectiveness of ancient Chinese relation classification.
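The input-side steps in the Methods can be illustrated with a minimal sketch. It covers only the entity markers, the appended type description, and the relative-position features. It is not the authors' implementation. The marker strings (`[E1]`, `[/E1]`, `[E2]`, `[/E2]`), the `[SEP]`-plus-type-labels description format, the function names, and the example sentence are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive

# Hypothetical marker strings; the paper's actual special tokens are not given here.
E1_OPEN, E1_CLOSE = "[E1]", "[/E1]"  # start entity
E2_OPEN, E2_CLOSE = "[E2]", "[/E2]"  # end entity


def mark_entities(tokens: Sequence[str], e1: Span, e2: Span,
                  e1_type: str, e2_type: str) -> List[str]:
    """Wrap both entity spans in markers, then append the entity types.

    Assumes the two spans are non-empty and do not overlap.
    """
    marked: List[str] = []
    for i, tok in enumerate(tokens):
        # Emit closing markers before opening ones so adjacent spans stay ordered.
        if i == e1[1]:
            marked.append(E1_CLOSE)
        if i == e2[1]:
            marked.append(E2_CLOSE)
        if i == e1[0]:
            marked.append(E1_OPEN)
        if i == e2[0]:
            marked.append(E2_OPEN)
        marked.append(tok)
    # A span that runs to the end of the sentence is closed after the loop.
    if e1[1] == len(tokens):
        marked.append(E1_CLOSE)
    if e2[1] == len(tokens):
        marked.append(E2_CLOSE)
    # Assumed description format: a separator followed by the two type labels.
    return marked + ["[SEP]", e1_type, e2_type]


def relative_positions(n_tokens: int, span: Span) -> List[int]:
    """Signed offset of each token from an entity span (0 inside the span)."""
    start, end = span
    offsets: List[int] = []
    for i in range(n_tokens):
        if i < start:
            offsets.append(i - start)    # negative: token precedes the entity
        elif i >= end:
            offsets.append(i - end + 1)  # positive: token follows the entity
        else:
            offsets.append(0)
    return offsets


# Illustrative sentence (not from the paper's data): "Confucius was born in Lu".
tokens = list("孔子生于鲁")
print(mark_entities(tokens, (0, 2), (4, 5), "PERSON", "PLACE"))
print(relative_positions(len(tokens), (0, 2)))
print(relative_positions(len(tokens), (4, 5)))
```

In a full model, the marked sequence would be fed to the pre-trained encoder, and the two offset lists would typically be shifted or clipped to non-negative indices before an embedding lookup feeds the CNN. Both steps are omitted here.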
Tang Xuemei, Su Qi, Wang Jun. Classifying Ancient Chinese Text Relations with Entity Information[J]. Data Analysis and Knowledge Discovery, 2024, 8(1): 114-124.