Extracting Relationship Among Characters from Local Chronicles with Text Structures and Contents
Wang Yongsheng, Wang Hao, Yu Wei, Zhou Zeyu
School of Information Management, Nanjing University, Nanjing 210023, China
Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China
Abstract
[Objective] This study proposes a new method for extracting relationships among characters from local chronicles. It aims to explore the cultural and historical information embedded in the Chapter of Persons of the Yiwu Local Chronicles.
[Methods] We constructed a relationship extraction model based on text structures and text contents. For text structures, we used rule templates and word features to extract relationships from the original texts, and we categorized those relationships at different granularities. For text contents, we introduced a distant supervision approach to extract relationships. We then combined the BERT+Bi-GRU+ATT and BERT+FC deep learning models to turn relationship extraction into a multi-label classification task. Finally, we corrected relationship labels to reduce the impact of noise from distant supervision on the model’s accuracy.
[Results] The proposed method achieved a high degree of automation and improved the quality of the extracted information. The BERT+FC model improved F1 scores by up to 27%, and the relationship categories showed some affinity with one another. The F1 score of the “strong co-occurrence relationship” increased by 3% after label correction.
[Limitations] We only investigated relationships among characters in local chronicles.
[Conclusions] The new method can effectively extract relationships among entities of the same type in historical Chinese documents.
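The following is a minimal Python sketch of how distant supervision yields multi-label training targets. It is not the authors' implementation. The knowledge base `kb`, the name list, the person names, and the three-label inventory `RELATIONS` are all hypothetical; only the "strong co-occurrence" category comes from the abstract. The sketch uses the standard distant-supervision heuristic: every sentence in which both characters of a known pair appear is labelled with all of that pair's knowledge-base relations. This gives a multi-hot target vector. A classifier head such as BERT+FC would then score each label independently, for example with a sigmoid, and `decode` keeps every label whose score passes a threshold.

```python
from itertools import combinations

# Hypothetical relation inventory (assumption): only "strong co-occurrence"
# is named in the abstract; the other labels are placeholders.
RELATIONS = ["kinship", "teacher_student", "strong_co_occurrence"]


def distant_label(sentence, names, kb):
    """Distantly label every known person pair that co-occurs in `sentence`.

    `names` is a hypothetical gazetteer of person names (e.g. chapter entries);
    `kb` maps an unordered pair (frozenset of two names) to a set of relations.
    Returns a list of ((name_a, name_b), multi_hot_vector) tuples.
    """
    # Keep only the names that literally occur in this sentence.
    present = sorted({n for n in names if n in sentence})
    examples = []
    for a, b in combinations(present, 2):
        rels = kb.get(frozenset((a, b)), set())
        if not rels:
            continue  # pair unknown to the KB: no distant label is produced
        # Multi-hot target: one slot per relation, so a pair can hold several.
        vector = [1 if r in rels else 0 for r in RELATIONS]
        examples.append(((a, b), vector))
    return examples


def decode(probabilities, threshold=0.5):
    """Multi-label decoding: keep every relation whose score reaches the threshold."""
    return [r for r, p in zip(RELATIONS, probabilities) if p >= threshold]


if __name__ == "__main__":
    kb = {frozenset(("PersonA", "PersonB")): {"kinship", "strong_co_occurrence"}}
    text = "PersonA and PersonB studied together; PersonC is also mentioned."
    print(distant_label(text, ["PersonA", "PersonB", "PersonC", "PersonD"], kb))
    print(decode([0.9, 0.2, 0.7]))
```

The heuristic labels every co-occurrence of a known pair, whether or not the sentence actually expresses the relation. Some targets produced this way are therefore wrong, which is the kind of noise the label-correction step described in the abstract is meant to reduce.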
|
Received: 28 August 2021
Published: 18 February 2022
|
|
Fund: National Natural Science Foundation of China (72074108); Fundamental Research Funds for the Central Universities (010814370113)
Corresponding Author:
Wang Hao, ORCID: 0000-0002-0131-0823
E-mail: ywhaowang@nju.edu.cn
|
|