Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (9): 63-74    DOI: 10.11925/infotech.2096-3467.2021.0460
Annotation Method for Extracting Entity Relationship from Ancient Chinese Works
Wang Yifan1, Li Bo2, Shi Hua3, Miao Wei1, Jiang Bin2
1School of Northeast Asia Studies, Shandong University, Weihai 264209, China
2School of Mechanical, Electrical & Information Engineering, Shandong University, Weihai 264209, China
3School of Transborder Studies, Arizona State University, Tucson 85257, USA
[Objective] This paper proposes an annotation method for ancient Chinese datasets, aiming to standardize annotation procedures. [Methods] We propose a new method that integrates logical semantics, deep learning, and historical knowledge. The method, which is suited to few-shot learning, rests on three principles: “annotation of relationship valence”, “annotation of propositional logic”, and “existence of a single relationship”. [Results] We evaluated the proposed annotation method on the text of Shiji (Historical Records), and the F1 values for relationship extraction and propositional logic extraction reached 42.02% and 34.07%, respectively. [Limitations] The proposed method did not use pre-trained models such as BERT or ALBERT; it relied only on the classic Word2Vec model for word embedding, so the model's performance could be further improved. [Conclusions] The new annotation method can effectively extract entity relationships from ancient Chinese works.

Key words: Natural Language Processing; Relation Extraction; Sequence Tagging; Shiji
Received: 10 May 2021      Published: 15 October 2021
CLC number: TP391
Fund: National Social Science Fund of China (17VGB005)
Wang Yifan, Li Bo, Shi Hua, Miao Wei, Jiang Bin. Annotation Method for Extracting Entity Relationship from Ancient Chinese Works. Data Analysis and Knowledge Discovery, 2021, 5(9): 63-74.

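Name | Tag | Meaning
Life | LFE | Any statement core related to creating, harming, or even destroying life
Social Contact | SCT | Any statement core of a social nature
Location | LOC | Any statement core concerning spatial location
Politics | POL | Any statement core of a political nature
Action Towards Object | ATO | Any statement core related to creating, harming, or even destroying non-living things
War | WAR | Any statement core related to war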
Annotation Method for Parent Class
Name | Tag | Control setting (number of subclasses / total)
Life | LFE | 36/209
Social Contact | SCT | 49/85
Location | LOC | 88/171
Politics | POL | 437/789
Action Towards Object | ATO | 74/126
War | WAR | 23/36
Control Group for Parent Class
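Relationship type | Subject | Object | Annotation position | Raw data
Location/LOC | 黄帝 | 轩辕之丘 |  | 黄帝居轩辕之丘
Coordination/SJ-BL |  |  | ,而 | 黄帝居轩辕之丘,而娶于西陵之女
Coordination/SJ-BL |  |  | ,而 | 黄帝居轩辕之丘,而娶于西陵之女
Social Contact/SCT | 黄帝 | 西陵之女 | 娶于 | 黄帝居轩辕之丘,而娶于西陵之女
Attribute/SX | 西陵之女 | 嫘祖 | 是为 | 而娶于西陵之女,是为嫘祖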
Example of the Annotation Method for Joint Extraction of Entity and Relationship in Ancient Chinese Field
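The rows of the example table can be read as records that tie a relationship tag to spans of the source sentence. The following is a minimal sketch of such a record, not the authors' data format: the class name `Annotation`, its field names, and the `is_consistent` check are all hypothetical, and the empty cells of the table are represented as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    """One row of the example table (field names are illustrative assumptions)."""
    relation_type: str        # tag such as "LOC", "SCT", "SX", "SJ-BL"
    subject: Optional[str]    # None where the table cell is empty
    obj: Optional[str]        # None where the table cell is empty
    trigger: Optional[str]    # the "annotation position" span, if given
    source: str               # the raw sentence fragment

    def is_consistent(self) -> bool:
        """Every filled-in span must occur verbatim in the source text."""
        spans = (self.subject, self.obj, self.trigger)
        return all(s in self.source for s in spans if s is not None)


# Rows copied from the example table; the duplicate SJ-BL row is listed once.
rows = [
    Annotation("LOC", "黄帝", "轩辕之丘", None, "黄帝居轩辕之丘"),
    Annotation("SJ-BL", None, None, ",而", "黄帝居轩辕之丘,而娶于西陵之女"),
    Annotation("SCT", "黄帝", "西陵之女", "娶于", "黄帝居轩辕之丘,而娶于西陵之女"),
    Annotation("SX", "西陵之女", "嫘祖", "是为", "而娶于西陵之女,是为嫘祖"),
]

for row in rows:
    print(row.relation_type, row.is_consistent())
```

A check of this kind is one simple way to catch annotation typos, since each entity and trigger in the table is a literal substring of the raw text.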
The Annotation Method for Joint Extraction of Entity and Relationship in Ancient Chinese Field
The Word Embedding-BiGRU-CRF Model
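Type | Annotations | Type | Annotations
Subject/SBJ | 3,283 | Object/OBJ | 2,328
Life/LFE | 209 | Attribute/SX | 157
Social Contact/SCT | 85 | Event Attribute/SJSX | 440
Location/LOC | 171 | Progression/SJ-DJ | 810
Politics/POL | 789 | Coordination/SJ-BL | 140
Action Towards Object/ATO | 126 | Causation/SJ-YG | 57
War/WAR | 36 | Contrast/SJ-ZZ | 20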
Type and Quantity of the Relationship Data
Relationship type | Precision | Recall | F1
SBJ | 65.68% | 65.88% | 65.78%
OBJ | 53.29% | 43.55% | 47.93%
LFE | 90.00% | 75.00% | 81.82%
SCT | 16.67% | 7.69% | 10.53%
LOC | 29.63% | 20.00% | 23.88%
POL | 33.57% | 36.36% | 34.91%
ATO | 0.00% | 0.00% | 0.00%
WAR | 71.43% | 50.00% | 58.82%
SX | 56.41% | 53.66% | 55.00%
SJSX | 18.92% | 9.21% | 12.39%
Overall | 48.10% | 38.38% | 42.02%
Training Results of Relationship Extraction Model
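For readers who want to relate the three columns, the per-type F1 values are consistent with the standard harmonic mean of precision and recall. The sketch below recomputes a few per-type rows under that assumption; the function name `f1_score` and the choice to return 0 when both inputs are 0 are illustrative conventions, not something the source specifies, and how the overall row was averaged is not stated in the source.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in percent here)."""
    if precision + recall == 0:
        return 0.0  # assumed convention for the all-zero case
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the per-type rows of the table.
reported = {
    "SBJ": (65.68, 65.88, 65.78),
    "OBJ": (53.29, 43.55, 47.93),
    "LFE": (90.00, 75.00, 81.82),
    "ATO": (0.00, 0.00, 0.00),
}

for tag, (p, r, f1) in reported.items():
    # Recomputed values agree with the reported ones up to rounding.
    print(tag, round(f1_score(p, r), 2), f1)
```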
Relationship type | Annotations | Subclasses
SCT | 85 | 52
ATO | 126 | 87
LOC | 171 | 96
POL | 789 | 485
SJSX | 440 | 396
DJ | 810 | 52
BL | 140 | 10
YG | 57 | 16
ZZ | 20 | 6
Number of Propositional Logic Annotations and Subclass
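One way to read the table above is as a rough measure of how many annotated examples each subclass has on average. The sketch below divides the annotation count by the subclass count for each type; this ratio is a derived illustration rather than a figure reported in the source, and it assumes a simple average is a meaningful summary even though examples are unlikely to be evenly spread across subclasses.

```python
# (annotations, subclasses) copied from the table above.
counts = {
    "SCT": (85, 52),
    "ATO": (126, 87),
    "LOC": (171, 96),
    "POL": (789, 485),
    "SJSX": (440, 396),
    "DJ": (810, 52),
    "BL": (140, 10),
    "YG": (57, 16),
    "ZZ": (20, 6),
}

# Average number of annotations per subclass for each type.
per_subclass = {tag: n / k for tag, (n, k) in counts.items()}

# Print from densest to sparsest.
for tag, ratio in sorted(per_subclass.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{tag}: {ratio:.2f} annotations per subclass")
```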
Propositional logic | Precision | Recall | F1
DJ | 60.38% | 57.55% | 58.93%
BL | 67.35% | 27.73% | 39.29%
YG | 28.57% | 12.50% | 17.39%
ZZ | 0.00% | 0.00% | 0.00%
Overall | 42.79% | 30.35% | 34.07%
Training Results of Propositional Logic Extraction Model
