Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (2/3): 318-328    DOI: 10.11925/infotech.2096-3467.2021.0922
Extracting Relationship Among Characters from Local Chronicles with Text Structures and Contents
Wang Yongsheng, Wang Hao, Yu Wei, Zhou Zeyu
School of Information Management, Nanjing University, Nanjing 210023, China
Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China
Abstract  

[Objective] This study proposes a new method for extracting relationships among persons in local chronicles, aiming to explore the cultural and historical information embedded in the Chapter of Persons of the Yiwu Local Chronicles. [Methods] We constructed a relationship extraction model based on text structure and text content. For text structure, we used rule templates and word features to extract relationships from the original texts and categorized them at different levels of granularity. For text content, we introduced a distant (remote) supervision approach to extract relationships. Then, we combined the BERT+Bi-GRU+ATT and BERT+FC deep learning models to recast relationship extraction as a multi-label classification task. Finally, we corrected relationship labels to reduce the impact of distant-supervision noise on the model’s accuracy. [Results] The proposed method was highly automated and yielded better extraction results. The BERT+FC model improved F1 values by up to 27%, and different relationship categories showed some affinity. After label correction, the F1 value of the “strong co-occurrence relationship” increased by 3%. [Limitations] We only investigated relationships among persons in local chronicles. [Conclusions] The new method can effectively extract relationships among entities of the same type in historical Chinese documents.

Keywords: Local Chronicles; Relationship Extraction; Remote Supervision; BERT; Bi-GRU
Received: 28 August 2021      Published: 18 February 2022
ZTFLH:  G254  
Fund: National Natural Science Foundation of China (72074108); Fundamental Research Funds for the Central Universities (010814370113)
Corresponding Author: Wang Hao, ORCID: 0000-0002-0131-0823, E-mail: ywhaowang@nju.edu.cn

Cite this article:

Wang Yongsheng, Wang Hao, Yu Wei, Zhou Zeyu. Extracting Relationship Among Characters from Local Chronicles with Text Structures and Contents. Data Analysis and Knowledge Discovery, 2022, 6(2/3): 318-328.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2021.0922     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2022/V6/I2/3/318

Research Route
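Category | Sample (excerpt)
Brief profile (人物简介) | 杨乔,字圣达。高祖杨茂,河东人,随汉光武帝刘秀,······,汉桓帝爱其才貌,欲招其为驸马,乔坚决推辞,但皇命难抗,于是绝食7日而死。 [Yang Qiao, courtesy name Shengda. His great-great-grandfather Yang Mao, a native of Hedong, followed Emperor Guangwu of Han, Liu Xiu, ······ Emperor Huan of Han admired his talent and looks and wished to make him an imperial son-in-law. Qiao firmly declined, but an imperial order was hard to defy, so he fasted for 7 days and died.]
Biography (人物传记) | 骆宾王(619~约684),字观光。骆家塘人。祖父和父亲都是饱学之士。······骆宾王给太常寺卿刘祥道等高官上书陈情,企求引荐。······骆宾王与王勃、杨炯、卢照邻以文词齐名,史称“初唐四杰”。······ [Luo Binwang (619 to c. 684), courtesy name Guanguang, a native of Luojiatang. His grandfather and father were both learned scholars. ······ Luo Binwang wrote to high officials such as Liu Xiangdao, Minister of the Court of Imperial Sacrifices, stating his case and seeking a recommendation. ······ Luo Binwang was as renowned for his writing as Wang Bo, Yang Jiong, and Lu Zhaolin; together they are known as the “Four Paragons of the Early Tang”. ······]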
Sample Character Relationships
BERT+Bi-GRU+Attention Model
BERT+FC Model
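Level-1 class | Level-2 class | Level-3 class
Social relationship (社会关系) | Superior-subordinate (上下级关系) | Ruler-minister (君臣)
 |  | Commander-subordinate (将臣)
 |  | Appreciation (赏识)
 |  | Social hostility (社会敌对)
 |  | Political hostility (政治敌对)
 | Amicable (亲好关系) | Social amity (社会亲好)
 |  | Political amity (政治亲好)
 | Social contact (社交关系) | Friends (朋友)
 |  | Comrades-in-arms (战友)
 |  | Colleagues (同事)
 |  | Fellow officials (同僚)
 | Kin-like (类亲属关系) | Master-disciple (师徒)
Kinship (亲属关系) | Kinship (亲属关系) | Grandparent-grandchild (祖孙)
 |  | Brothers (兄弟)
 |  | Father-son (父子)
 |  | Other kin (其他亲属)
Co-occurrence (共现关系) | Co-occurrence (共现关系) | Co-occurrence (共现关系)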
Multi-granularity Relationship Categories
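The three-level scheme lends itself to a small lookup that rolls a fine-grained label up to its coarser classes. The following is a minimal Python sketch, not the authors' implementation: the parent assignments are an assumption read off the flattened table above, only a few rows are included, and the names `PARENTS` and `roll_up` are illustrative.

```python
# Level-3 label -> (level-2 class, level-1 class).
# Assumption: parents are read off the table above; only some rows are listed.
PARENTS = {
    "君臣": ("上下级关系", "社会关系"),
    "朋友": ("社交关系", "社会关系"),
    "师徒": ("类亲属关系", "社会关系"),
    "父子": ("亲属关系", "亲属关系"),
    "兄弟": ("亲属关系", "亲属关系"),
    "共现关系": ("共现关系", "共现关系"),
}


def roll_up(label: str, level: int) -> str:
    """Return the class of a level-3 `label` at granularity level 1, 2, or 3."""
    if level == 3:
        return label
    level2, level1 = PARENTS[label]
    return level2 if level == 2 else level1


print(roll_up("师徒", 2))  # 类亲属关系
print(roll_up("父子", 1))  # 亲属关系
```

Keeping the hierarchy in one table means a single level-3 annotation can be reused to train or evaluate classifiers at any of the three granularities.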
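Relation-word category | Main relation words | Extended relation words
Official career (仕途类) | 随、请、荐、助、率 | 追随、跟随、跟从、部下、左右手、器重、保护、举荐
Clan and kin (族亲类) | 父、母、妻、子、孙、兄、弟、祖 | 祖父、季父、长子、嫁、妻子、伉俪、从弟、堂弟、从兄、从子、从祖、外祖、裔、后裔、曾祖、高祖、从曾祖、六世祖、六世孙、七世孙、九世孙、裔孙、舅父、外甥、年伯、侄、同乡、抚育、后代、亲家
Scholarship and writing (书文类) | 从、师、事、见、供、入、讨、为、学、同、友、学于 | 弟子、从师、师事、学生、学文于、行学于、受业于、门下、同舍、同门、帮助、同学、同事、齐名、结识、知己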
Relationship Words
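A relation-word lexicon of this kind can be used as a simple word feature: scan a record for any listed word and report its category. The sketch below copies only a handful of the extended relation words from the table; the function name `find_relation_words` and the plain substring matching are assumptions made for illustration, not the paper's procedure.

```python
# Relation-word lexicon: category -> a few extended relation words
# copied from the table above (the full lists are longer).
LEXICON = {
    "仕途类": ["追随", "跟随", "举荐"],
    "族亲类": ["祖父", "长子", "从弟"],
    "书文类": ["弟子", "受业于", "齐名"],
}


def find_relation_words(sentence: str) -> list[tuple[str, str]]:
    """Return (word, category) pairs for every lexicon word found in the sentence."""
    hits = []
    for category, words in LEXICON.items():
        for word in words:
            if word in sentence:
                hits.append((word, category))
    return hits


print(find_relation_words("王固,北宋,字天贶,佛堂蒲潭人,受业于胡瑗。"))
# [('受业于', '书文类')]
```

Single-character main relation words such as 从 or 为 would match far more often under substring search, so a real system would probably need word segmentation or context checks before trusting them.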
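Relation-word category | Relation template
General noun ([NOUN]) | Entity1 [是/为/作/担任] [or omitted] Entity2 (的) [NOUN]
General noun ([NOUN]) | Entity1 [pronoun] [NOUN] (是) Entity2
Verb ([VERB]) | Entity1 [VERB] Entity2
Verb + preposition compound ([VERB_PRON]) | Entity1 [VERB_PRON] Entity2
Noun + preposition compound ([NOUN_PRON]) | Entity1 [NOUN_PRON] Entity2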
Relationship Rule Templates
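A template such as `Entity1 [VERB_PRON] Entity2` can be approximated with a regular expression over a record and a list of recognized person names. This is a hedged sketch only: the mapping `RELATION_OF` from relation word to label, the function `match_template`, and the decision to allow intervening text between Entity1 and the relation word are all assumptions, and the person list is assumed to come from an earlier entity-recognition step.

```python
import re
from itertools import permutations

# Assumed mapping from a verb+preposition relation word to a relation label.
RELATION_OF = {"受业于": "师徒", "师事": "师徒"}


def match_template(sentence, persons):
    """Apply 'Entity1 [VERB_PRON] Entity2' to every ordered pair of persons.

    Text between Entity1 and the relation word is allowed (an assumption),
    as long as the match stays inside one sentence (no '。' in between).
    """
    triples = []
    for e1, e2 in permutations(persons, 2):
        for word, relation in RELATION_OF.items():
            pattern = re.escape(e1) + r"[^。]*?" + re.escape(word) + re.escape(e2)
            if re.search(pattern, sentence):
                triples.append((e1, e2, relation))
    return triples


sentence = "王固,北宋,字天贶,佛堂蒲潭人,受业于胡瑗。"
print(match_template(sentence, ["王固", "胡瑗"]))
# [('王固', '胡瑗', '师徒')]
```

Trying both orders of each pair keeps the direction of the relation: only the order in which the relation word directly precedes Entity2 produces a triple.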
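Category | Sample result
Rule-template-based extraction | <王固>,<胡瑗>,<师徒>,<王固,北宋,字天贶,佛堂蒲潭人,受业于胡瑗。> [<Wang Gu>, <Hu Yuan>, <master-disciple>, <Wang Gu, Northern Song, courtesy name Tiankuang, a native of Putan in Fotang, studied under Hu Yuan.>]
Word-feature-based extraction | <金佛庄>,<吴农华>,<共现>,<吴农华 经 金佛庄 介绍 参加 中国共产党组织> [<Jin Fozhuang>, <Wu Nonghua>, <co-occurrence>, <Wu Nonghua joined the Communist Party of China organization on Jin Fozhuang's introduction>]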
Example of Relationship Extraction Results Based on Text Structure
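Model | Level-1: Social | Level-1: Kinship | Level-1: Co-occurrence | Level-2: Superior-subordinate | Level-2: Amicable | Level-2: Social contact | Level-2: Kin-like | Level-2: Kinship | Level-2: Co-occurrence
BERT+Bi-GRU+ATT | 0.97 | 0.87 | 0.55 | 0.73 | 0.53 | 0.81 | 0.73 | 0.84 | 0.71
BERT+FC | 0.99 | 0.95 | 0.80 | 0.78 | 0.57 | 0.82 | 0.80 | 0.92 | 0.83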
Results of First/Second Level Granularity Relationship Classification
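The per-class gap between the two models can be read directly off the table. The short Python calculation below assumes the cells are F1 values, as the abstract suggests; the column names are shorthand introduced here and follow the table's column order (three first-level classes, then six second-level classes).

```python
# Scores copied from the table above, in column order.
columns = [
    "L1 social", "L1 kinship", "L1 co-occurrence",
    "L2 superior-subordinate", "L2 amicable", "L2 social contact",
    "L2 kin-like", "L2 kinship", "L2 co-occurrence",
]
bi_gru_att = [0.97, 0.87, 0.55, 0.73, 0.53, 0.81, 0.73, 0.84, 0.71]
bert_fc = [0.99, 0.95, 0.80, 0.78, 0.57, 0.82, 0.80, 0.92, 0.83]

# Absolute gain of BERT+FC over BERT+Bi-GRU+ATT, rounded to two decimals.
gains = {c: round(f - g, 2) for c, g, f in zip(columns, bi_gru_att, bert_fc)}
best = max(gains, key=gains.get)
print(best, gains[best])  # L1 co-occurrence 0.25
```

In this table BERT+FC scores higher in every column, with the largest gap (0.25) on first-level co-occurrence and the smallest (0.01) on second-level social contact.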
Third-Level Granularity Relationship Classification Results
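Entity pair | Relationship record | Actual relationship | Predicted relationship
<虞抟,南轩> | 虞抟,父南轩,兄怀德,均精于岐黄之术。 [Yu Tuan, his father Nanxuan, and his elder brother Huaide were all skilled in medicine.] | Father-son (父子) | Father-son (父子)
<南轩,怀德> | 虞抟,父南轩,兄怀德,均精于岐黄之术。 | Father-son (父子) | Father-son (父子)
<虞抟,怀德> | 虞抟,父南轩,兄怀德,均精于岐黄之术。 | Brothers (兄弟) | Father-son (父子); scores: 父子 80.37%, 兄弟 27.57%
<毛泽东,毛岸英> | 冯雪峰为毛泽东寻找到失落的儿子毛岸英和毛岸青。 [Feng Xuefeng found Mao Zedong's lost sons Mao Anying and Mao Anqing for him.] | Father-son (父子) | Father-son (父子)
<毛泽东,毛岸青> | 冯雪峰为毛泽东寻找到失落的儿子毛岸英和毛岸青。 | Father-son (父子) | Father-son (父子)
<毛岸英,毛岸青> | 冯雪峰为毛泽东寻找到失落的儿子毛岸英和毛岸青。 | Brothers (兄弟) | Father-son (父子); scores: 父子 54.68%, 兄弟 26.43%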
Sample of Father-Son and Brother Relationship Prediction
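The two scores reported for each misclassified pair do not sum to 100%, which is consistent with independent per-label scores in a multi-label setup rather than a single softmax. A minimal decoding sketch follows; the 0.5 threshold and the function name `decode` are assumptions, since the decision rule actually used is not stated here.

```python
# Per-label scores reported for <虞抟,怀德> in the table above.
scores = {"父子": 0.8037, "兄弟": 0.2757}


def decode(scores, threshold=0.5):
    """Multi-label decoding: keep every label whose score passes the threshold."""
    return [label for label, p in scores.items() if p >= threshold]


print(decode(scores))        # ['父子']
print(decode(scores, 0.25))  # ['父子', '兄弟']
```

Under a 0.5 threshold only 父子 survives, matching the prediction shown in the table; a lower threshold would also return the correct 兄弟 label, at the cost of keeping the wrong one.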
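Attribute | Sample
Name | 杨乔(东汉官员) [Yang Qiao (Eastern Han official)]
BaiduCARD | 桓帝时官吏,累官至尚书左丞。乔才貌双全,数上言政事。桓帝欲妻以公主,乔固辞不从,遂不食而死。 [An official under Emperor Huan who rose to Shangshu Zuocheng (尚书左丞). Qiao was both talented and handsome and repeatedly submitted memorials on state affairs. Emperor Huan wished to marry a princess to him; Qiao firmly refused, then stopped eating and died.]
BaiduTAG | 官员 [official]
Courtesy name (字号) | 圣达 [Shengda]
Era (所处时代) | 东汉末 [late Eastern Han]
Original name (本名) | 杨乔 [Yang Qiao]
Native place (籍贯) | 会稽[今浙江绍兴] [Kuaiji (present-day Shaoxing, Zhejiang)]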
CN-DBpedia Samples
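Co-occurrence relationship | Sample
Strong co-occurrence (强共现关系) | <骆俊>,<袁术>,<强共现关系>,<骆俊,字孝远,以孝廉荐举,补任尚书郎,升任陈国相。时群雄并起割据混战,建安二年,袁术称帝,骆俊加强军备加以抗拒,反对袁术称帝。> [<Luo Jun>, <Yuan Shu>, <strong co-occurrence>, <Luo Jun, courtesy name Xiaoyuan, was recommended as Filial and Incorrupt (孝廉), appointed Shangshu Lang (尚书郎), and promoted to Chancellor of Chen (陈国相). At that time warlords rose and fought over territory; in the second year of Jian'an, Yuan Shu declared himself emperor, and Luo Jun strengthened his defenses to resist and opposed Yuan Shu's claim.>]
Weak co-occurrence (弱共现关系) | <许谦>,<王顺>,<弱共现关系>,<王顺,元,字性之,许谦弟子。> [<Xu Qian>, <Wang Shun>, <weak co-occurrence>, <Wang Shun, Yuan dynasty, courtesy name Xingzhi, disciple of Xu Qian.>]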
Sample of Strong and Weak Co-Occurrence Relationship
Relationship Extraction Results Based on Remote Supervision
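Distant (remote) supervision, in its general form, labels any sentence that mentions both entities of a pair already known to a knowledge base with that pair's relation. The sketch below illustrates only this general labeling assumption, not the authors' pipeline: the `KB` entry is hypothetical (it is not taken from CN-DBpedia), and `distant_label` is an illustrative name.

```python
# Hypothetical knowledge-base pairs with a known relation (not from CN-DBpedia).
KB = {("王固", "胡瑗"): "师徒"}


def distant_label(sentences, kb):
    """Label every sentence that mentions both entities of a KB pair."""
    labeled = []
    for sentence in sentences:
        for (e1, e2), relation in kb.items():
            if e1 in sentence and e2 in sentence:
                labeled.append((e1, e2, relation, sentence))
    return labeled


corpus = [
    "王固,北宋,字天贶,佛堂蒲潭人,受业于胡瑗。",
    "王顺,元,字性之,许谦弟子。",
]
for row in distant_label(corpus, KB):
    print(row)
```

Because the label is assigned whenever both names appear, a sentence that mentions the pair without expressing the relation receives the same label, which is the kind of noise that label correction is meant to reduce.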
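Entity pair | Relationship record | Actual relationship label | Predicted relationship label
<叶味道,徐侨> | 理宗派叶味道传谕徐侨。 [Emperor Lizong sent Ye Weidao to convey an edict to Xu Qiao.] | Strong co-occurrence (强共现关系) | Social relationship (社会关系), 62.66%
<虞德烨,张好一> | 虞德烨先擒获苗帅张好一、安松,义释其缚,放还山寨。 [Yu Deye first captured the Miao leaders Zhang Haoyi and An Song, then magnanimously unbound them and released them back to their mountain stronghold.] | Strong co-occurrence (强共现关系) | Social relationship (社会关系), 75.35%
<陈德钱,陈德清> | 革命武装队伍成员有:朱有元、朱有法、朱有富、蒋乌皮、金大春、陈德钱、陈德清、陈三弟、俞卢元等。 [Members of the revolutionary armed force included Zhu Youyuan, Zhu Youfa, Zhu Youfu, Jiang Wupi, Jin Dachun, Chen Deqian, Chen Deqing, Chen Sandi, Yu Luyuan, and others.] | Strong co-occurrence (强共现关系) | Social relationship (社会关系), 67.62%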
Prediction Results of Mixed Data