Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (2/3): 318-328    DOI: 10.11925/infotech.2096-3467.2021.0922
Extracting Relationship Among Characters from Local Chronicles with Text Structures and Contents
Wang Yongsheng, Wang Hao, Yu Wei, Zhou Zeyu
School of Information Management, Nanjing University, Nanjing 210023, China
Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China
[Objective] This study proposes a new method for extracting relationships among characters from local chronicles, aiming to explore the cultural and historical information embedded in the Chapter of Persons of the Yiwu Local Chronicles. [Methods] We constructed a relationship extraction model based on text structures and contents. For text structures, we used rule templates and word features to extract relationships from the original texts and categorized them at different granularities. For text contents, we introduced a remotely supervised approach to extract relationships. We then combined the BERT+Bi-GRU+ATT and BERT+FC deep learning models to recast relationship extraction as a multi-label classification task. Finally, we corrected relationship labels to reduce the impact of remote-supervision noise on the model’s accuracy. [Results] The proposed method achieved a high degree of automation and yielded better extraction results. The BERT+FC model improved F1 values by up to 27%, and different relationship categories showed some affinity. The F1 value of the “strong co-occurrence relationship” increased by 3% after label correction. [Limitations] We only investigated relationships among characters in local chronicles. [Conclusions] The new method can effectively extract relationships among entities of the same type in historical Chinese documents.

Key words: Local Chronicles; Relationship Extraction; Remote Supervision; BERT; Bi-GRU
Received: 28 August 2021      Published: 18 February 2022
ZTFLH:  G254  
Fund: National Natural Science Foundation of China (72074108); Fundamental Research Funds for the Central Universities (010814370113)
Corresponding Author: Wang Hao, ORCID: 0000-0002-0131-0823

Cite this article:

Wang Yongsheng, Wang Hao, Yu Wei, Zhou Zeyu. Extracting Relationship Among Characters from Local Chronicles with Text Structures and Contents. Data Analysis and Knowledge Discovery, 2022, 6(2/3): 318-328.

Research Route
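Category | Sample (excerpt)
Character profile | Yang Qiao (杨乔), courtesy name Shengda (圣达). His great-great-grandfather Yang Mao (杨茂), a native of Hedong, followed Emperor Guangwu of Han, Liu Xiu (刘秀), ······, Emperor Huan of Han admired his talent and looks and wished to take him as an imperial son-in-law. Qiao firmly declined, but the imperial order could not be defied, so he fasted for 7 days and died.
Character biography | Luo Binwang (骆宾王, 619 to c. 684), courtesy name Guanguang (观光), a native of Luojiatang. His grandfather and father were both learned scholars. ······ Luo Binwang wrote to Liu Xiangdao (刘祥道), Minister of the Court of Imperial Sacrifices, and other high officials to state his case and seek their recommendation. ······ Luo Binwang, Wang Bo (王勃), Yang Jiong (杨炯), and Lu Zhaolin (卢照邻) were equally famed for their writing and are known to history as the "Four Paragons of the Early Tang" (初唐四杰). ······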
Sample Character Relationships
BERT+Bi-GRU+Attention Model
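Level-1 relationship class | Level-2 relationship class | Level-3 relationship class
Social relationship (社会关系) | Superior-subordinate relationship (上下级关系) | Ruler-minister (君臣)
Social relationship (社会关系) | Close relationship (亲好关系) | Social closeness (社会亲好)
Social relationship (社会关系) | Social-contact relationship (社交关系) | Friend (朋友)
Social relationship (社会关系) | Quasi-kinship relationship (类亲属关系) | Master-apprentice (师徒)
Kinship (亲属关系) | Kinship (亲属关系) | Grandparent-grandchild (祖孙)
Co-occurrence (共现关系) | Co-occurrence (共现关系) | Co-occurrence (共现关系)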
Multi-granularity Relationship Categories
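The multi-granularity scheme can be read as a lookup from a fine-grained label to its coarser parents. The following is a minimal sketch under stated assumptions: the `HIERARCHY` dictionary and the `roll_up` helper are hypothetical names, only the sample rows of the table are included, and the assignment of the four social sub-classes to the level-1 class 社会关系 is inferred from the table layout rather than stated explicitly.

```python
# Hypothetical mapping: level-3 label -> (level-1 label, level-2 label).
# Only the sample rows shown in the category table are listed here.
HIERARCHY = {
    "君臣": ("社会关系", "上下级关系"),      # ruler-minister
    "社会亲好": ("社会关系", "亲好关系"),    # social closeness
    "朋友": ("社会关系", "社交关系"),        # friend
    "师徒": ("社会关系", "类亲属关系"),      # master-apprentice
    "祖孙": ("亲属关系", "亲属关系"),        # grandparent-grandchild
    "共现关系": ("共现关系", "共现关系"),    # co-occurrence
}


def roll_up(tertiary: str, level: int) -> str:
    """Return the label of `tertiary` at granularity level 1, 2, or 3."""
    primary, secondary = HIERARCHY[tertiary]
    if level == 1:
        return primary
    if level == 2:
        return secondary
    if level == 3:
        return tertiary
    raise ValueError("level must be 1, 2, or 3")


coarse = roll_up("师徒", 1)
middle = roll_up("师徒", 2)
```

With a table like this, a single fine-grained annotation can supply the training label at every granularity, which is one plausible way to build the level-1 and level-2 datasets from the same records.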
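Relationship word category | Main relationship words | Extended relationship words
Official career (仕途类) | 随 (follow), 请 (invite), 荐 (recommend), 助 (assist), 率 (lead) | 追随 (follow), 跟随 (follow), 跟从 (follow), 部下 (subordinate), 左右手 (right-hand man), 器重 (hold in high regard), 保护 (protect), 举荐 (recommend)
Clan and kin (族亲类) | 父 (father), 母 (mother), 妻 (wife), 子 (son), 孙 (grandson), 兄 (elder brother), 弟 (younger brother), 祖 (grandfather or ancestor) | 祖父 (grandfather), 季父 (father's youngest brother), 长子 (eldest son), 嫁 (marry, of a woman), 妻子 (wife), 伉俪 (married couple), 从弟 (younger male cousin), 堂弟 (younger paternal male cousin), 从兄 (elder male cousin), 从子 (nephew), 从祖 (granduncle), 外祖 (maternal grandfather), 裔 (descendant), 后裔 (descendant), 曾祖 (great-grandfather), 高祖 (great-great-grandfather), 从曾祖 (great-granduncle), 六世祖 (sixth-generation ancestor), 六世孙 (sixth-generation descendant), 七世孙 (seventh-generation descendant), 九世孙 (ninth-generation descendant), 裔孙 (remote descendant), 舅父 (maternal uncle), 外甥 (sister's son), 年伯 (father's fellow examination graduate), 侄 (nephew), 同乡 (fellow townsman), 抚育 (raise), 后代 (descendant), 亲家 (in-law)
Learning and letters (书文类) | 从 (follow), 师 (teacher), 事 (serve), 见 (meet), 供 (supply), 入 (enter), 讨 (seek or discuss), 为 (be or act as), 学 (study), 同 (together with), 友 (friend), 学于 (study under) | 弟子 (disciple), 从师 (study under a teacher), 师事 (treat as one's teacher), 学生 (student), 学文于 (study letters under), 行学于 (pursue studies under), 受业于 (receive instruction from), 门下 (under the tutelage of), 同舍 (fellow lodger), 同门 (fellow disciple), 帮助 (help), 同学 (classmate), 同事 (colleague), 齐名 (equally famed), 结识 (become acquainted with), 知己 (close friend)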
Relationship Words
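Relationship word category | Relationship template
General noun ([NOUN]) | Entity1 [是/为/作/担任 (is / is / acts as / serves as)] [or omitted] Entity2 (的, possessive particle) [NOUN]
General noun ([NOUN]) | Entity1 [referring pronoun] [NOUN] (是, is) Entity2
Verb ([VERB]) | Entity1 [VERB] Entity2
Verb-preposition compound ([VERB_PRON]) | Entity1 [VERB_PRON] Entity2
Noun-preposition compound ([NOUN_PRON]) | Entity1 [NOUN_PRON] Entity2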
Relationship Rule Templates
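A template of the form Entity1 [VERB_PRON] Entity2 can be approximated with a regular expression over a known entity list and a trigger lexicon. The sketch below is illustrative only: `TRIGGERS`, `extract_by_template`, and the rule that the subject is the earliest other entity mentioned before the trigger are assumptions, not the authors' implementation.

```python
import re

# Hypothetical trigger lexicon: trigger word -> relationship label.
TRIGGERS = {"受业于": "师徒", "师事": "师徒", "从师": "师徒"}


def extract_by_template(record, entities, triggers=TRIGGERS):
    """Return (entity1, entity2, label, record) for each trigger+entity match."""
    if not entities or not triggers:
        return []
    # Longest-first alternation so longer names/triggers win over their prefixes.
    ent_alt = "|".join(re.escape(e) for e in sorted(entities, key=len, reverse=True))
    trig_alt = "|".join(re.escape(t) for t in sorted(triggers, key=len, reverse=True))
    pattern = re.compile(f"({trig_alt})({ent_alt})")

    results = []
    for match in pattern.finditer(record):
        trigger, entity2 = match.group(1), match.group(2)
        before = record[:match.start()]
        # Assumption: the subject is the earliest other entity before the trigger.
        candidates = [(before.find(e), e) for e in entities
                      if e != entity2 and e in before]
        if not candidates:
            continue
        entity1 = min(candidates)[1]
        results.append((entity1, entity2, triggers[trigger], record))
    return results


record = "王固,北宋,字天贶,佛堂蒲潭人,受业于胡瑗。"
triples = extract_by_template(record, ["王固", "胡瑗"])
```

On this record the sketch pairs 王固 with 胡瑗 under the label 师徒 (master-apprentice). Real chronicle text would need entity recognition first, and the subject heuristic would fail on records that mention several people before the trigger.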
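Category | Sample result
Extraction based on rule templates | <Wang Gu (王固)>, <Hu Yuan (胡瑗)>, <master-apprentice (师徒)>, <Wang Gu, Northern Song, courtesy name Tiankuang, a native of Putan in Fotang, studied under (受业于) Hu Yuan.>
Extraction based on word features | <Jin Fozhuang (金佛庄)>, <Wu Nonghua (吴农华)>, <co-occurrence (共现)>, <吴农华 经 金佛庄 介绍 参加 中国共产党组织> (word-segmented record: Wu Nonghua joined the Communist Party of China organization on Jin Fozhuang's introduction)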
Example of Relationship Extraction Results Based on Text Structure
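Model | Level 1: Social | Level 1: Kinship | Level 1: Co-occurrence | Level 2: Superior-subordinate | Level 2: Close | Level 2: Social contact | Level 2: Quasi-kinship | Level 2: Kinship | Level 2: Co-occurrence
BERT+Bi-GRU+ATT | 0.97 | 0.87 | 0.55 | 0.73 | 0.53 | 0.81 | 0.73 | 0.84 | 0.71
BERT+FC | 0.99 | 0.95 | 0.80 | 0.78 | 0.57 | 0.82 | 0.80 | 0.92 | 0.83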
Results of First/Second Level Granularity Relationship Classification
Results of Third-Level Granularity Relationship Classification
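Entity pair | Relationship record | Actual relationship | Predicted relationship
<Yu Tuan (虞抟), Nanxuan (南轩)> | Yu Tuan, his father Nanxuan, and his elder brother Huaide were all skilled in the art of medicine. | Father-son | Father-son
<Nanxuan (南轩), Huaide (怀德)> | Yu Tuan, his father Nanxuan, and his elder brother Huaide were all skilled in the art of medicine. | Father-son | Father-son
<Yu Tuan (虞抟), Huaide (怀德)> | Yu Tuan, his father Nanxuan, and his elder brother Huaide were all skilled in the art of medicine. | Brothers | Father-son
<Mao Zedong (毛泽东), Mao Anying (毛岸英)> | Feng Xuefeng found Mao Zedong's lost sons Mao Anying and Mao Anqing for him. | Father-son | Father-son
<Mao Zedong (毛泽东), Mao Anqing (毛岸青)> | Feng Xuefeng found Mao Zedong's lost sons Mao Anying and Mao Anqing for him. | Father-son | Father-son
<Mao Anying (毛岸英), Mao Anqing (毛岸青)> | Feng Xuefeng found Mao Zedong's lost sons Mao Anying and Mao Anqing for him. | Brothers | Father-son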
Sample of Father-Son and Brother Relationship Prediction
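To make the confusion between father-son (父子) and brothers (兄弟) concrete, per-class F1 can be computed over the six sample rows above. This is a worked illustration only: the function name `per_class_f1` is hypothetical, the English label strings stand in for the Chinese labels, and the scores describe these six rows, not the results reported in the paper.

```python
def per_class_f1(gold, pred):
    """Compute F1 for every label that appears in gold or pred."""
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores


# Actual and predicted labels of the six sample rows, in table order.
gold = ["father-son", "father-son", "brothers",
        "father-son", "father-son", "brothers"]
pred = ["father-son"] * 6

scores = per_class_f1(gold, pred)
```

Because every pair is predicted as father-son, that class keeps full recall but loses precision (4 of 6 predictions correct), while the brothers class is never recovered. This is the pattern to expect when two sibling entities share one sentence dominated by a parent-child trigger word.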
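Attribute | Sample
Name | Yang Qiao (杨乔) (Eastern Han official)
BaiduCARD | An official under Emperor Huan who rose to Left Assistant Director of the Imperial Secretariat (尚书左丞). Qiao was both talented and handsome and repeatedly submitted memorials on government affairs. Emperor Huan wished to marry a princess to him; Qiao firmly refused and then starved himself to death.
BaiduTAG | Official
Courtesy name (字号) | Shengda (圣达)
Era (所处时代) | Late Eastern Han
Original name (本名) | Yang Qiao (杨乔)
Native place (籍贯) | Kuaiji [present-day Shaoxing, Zhejiang]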
CN-DBpedia Samples
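Co-occurrence relationship | Sample
Strong co-occurrence relationship | <Luo Jun (骆俊)>, <Yuan Shu (袁术)>, <strong co-occurrence relationship>, <Luo Jun, courtesy name Xiaoyuan, was recommended as Filial and Incorrupt (孝廉), appointed Gentleman of the Imperial Secretariat (尚书郎), and promoted to Chancellor of the State of Chen. At that time warlords rose everywhere, carving out territories and fighting one another; in the second year of Jian'an, Yuan Shu declared himself emperor, and Luo Jun strengthened his military preparations to resist, opposing Yuan Shu's claim to the throne.>
Weak co-occurrence relationship | <Xu Qian (许谦)>, <Wang Shun (王顺)>, <weak co-occurrence relationship>, <Wang Shun, Yuan dynasty, courtesy name Xingzhi, disciple of Xu Qian.>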
Sample of Strong and Weak Co-Occurrence Relationship
Relationship Extraction Results Based on Remote Supervision
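Entity pair | Relationship record | Actual relationship label | Predicted relationship label
<Ye Weidao (叶味道), Xu Qiao (徐侨)> | Emperor Lizong sent Ye Weidao to convey an edict to Xu Qiao. | Strong co-occurrence relationship | Social relationship (62.66%)
<Yu Deye (虞德烨), Zhang Haoyi (张好一)> | Yu Deye first captured the Miao leaders Zhang Haoyi and An Song, then magnanimously released them from their bonds and let them return to their mountain stronghold. | Strong co-occurrence relationship | Social relationship
<Chen Deqian (陈德钱), Chen Deqing (陈德清)> | The members of the revolutionary armed force included Zhu Youyuan, Zhu Youfa, Zhu Youfu, Jiang Wupi, Jin Dachun, Chen Deqian, Chen Deqing, Chen Sandi, Yu Luyuan, and others. | Strong co-occurrence relationship | Social relationship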
Prediction Results of Mixed Data
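In a multi-label setting, each relationship class receives its own score, so a record can be assigned one class, several, or none. The sketch below shows one common way to decode such scores: an independent sigmoid per label followed by a threshold. The label list, the 0.5 threshold, and the example logits are all assumptions made for illustration; the source does not state how the predicted labels and percentages were derived.

```python
import math

# Hypothetical level-1 label set, in the order of the classifier outputs.
LABELS = ["social", "kinship", "co-occurrence"]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def decode_multilabel(logits, labels=LABELS, threshold=0.5):
    """Turn per-label logits into (chosen labels, per-label probabilities)."""
    probs = {label: sigmoid(z) for label, z in zip(labels, logits)}
    chosen = [label for label in labels if probs[label] >= threshold]
    return chosen, probs


# Invented logits, not model output: only the first label clears the threshold.
chosen, probs = decode_multilabel([0.5, -2.0, -0.3])
```

Unlike a softmax, the per-label probabilities here need not sum to 1, which is what lets a noisy "strong co-occurrence" record be flagged as likely belonging to a social relationship without forcing a single exclusive choice.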
