Extracting Events from Ancient Books Based on RoBERTa-CRF
Yu Xuehan, He Lin, Xu Jian
College of Information Management, Nanjing Agricultural University, Nanjing 210095, China
Abstract [Objective] This paper constructs a framework for extracting events from ancient Chinese books, using a RoBERTa-CRF model to identify event types, argument roles and arguments. [Methods] We collected sentences describing wars from the Zuozhuan as experimental data and used them to establish a classification schema for event types and argument roles. In the RoBERTa-CRF model, RoBERTa's multi-layer Transformer encoder extracts features from the corpus, and a CRF layer combines these features with the sequence tags to learn the constraints between adjacent tags. Finally, the arguments are identified and extracted from the predicted tag sequence. [Results] The precision, recall and F1 values of the proposed model were 87.6%, 77.2% and 82.1%, respectively, higher than those of GuwenBERT-LSTM, BERT-LSTM, RoBERTa-LSTM, BERT-CRF and RoBERTa-CRF on the same dataset. [Limitations] The experimental dataset needs to be expanded to make the topic categories more balanced. [Conclusions] The RoBERTa-CRF model constructed in this paper can effectively extract events from ancient Chinese books.
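The decoding step described under [Methods] (a CRF layer enforcing tag-to-tag constraints, then arguments read off the tag sequence) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a character-level BIO tagging scheme, uses hypothetical argument roles (`Time`, `Attacker`, `Defender`), a made-up example sentence 秋晉侯伐鄭 ("In autumn, the Marquis of Jin attacked Zheng"), and hand-set emission scores standing in for the per-character scores RoBERTa's Transformer layers would produce. A large negative transition score stands in for a learned CRF constraint (here, `I-Attacker` may only follow `B-Attacker` or `I-Attacker`), and start/end transition scores are omitted for brevity. The helper names `viterbi_decode` and `bio_to_spans` are illustrative.

```python
from typing import List, Optional, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> List[int]:
    """Best-scoring tag path for a linear-chain CRF (start/end scores omitted).

    emissions[t][j]   -- score of tag j at position t (encoder output)
    transitions[i][j] -- score of moving from tag i to tag j
    """
    num_tags = len(transitions)
    # score[j]: best total score of any path that ends in tag j at the current step
    score = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, step_back = [], []
        for j in range(num_tags):
            # best previous tag i for arriving at tag j at step t
            best_i = max(range(num_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            step_back.append(best_i)
        score = new_score
        backpointers.append(step_back)
    # start from the best final tag and follow the back-pointers to the front
    path = [max(range(num_tags), key=lambda j: score[j])]
    for step_back in reversed(backpointers):
        path.append(step_back[path[-1]])
    path.reverse()
    return path


def bio_to_spans(tokens: Sequence[str], tags: Sequence[str]) -> List[Tuple[str, str]]:
    """Group B-/I- tags into (role, text) argument spans; O tags end a span."""
    spans: List[Tuple[str, str]] = []
    role: Optional[str] = None
    chars: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != role):
            # a new argument starts (a stray I- tag is treated like B-)
            if role is not None:
                spans.append((role, "".join(chars)))
            role, chars = tag[2:], [token]
        elif tag.startswith("I-"):
            chars.append(token)  # continue the current argument
        else:
            if role is not None:
                spans.append((role, "".join(chars)))
            role, chars = None, []
    if role is not None:
        spans.append((role, "".join(chars)))
    return spans


# Hypothetical tag set and toy scores for the illustrative sentence 秋晉侯伐鄭
TAGS = ["O", "B-Time", "B-Attacker", "I-Attacker", "B-Defender"]
tokens = list("秋晉侯伐鄭")
emissions = [
    [0, 5, 0, 0, 0],  # 秋
    [0, 0, 3, 4, 0],  # 晉: on its own, slightly favours I-Attacker
    [0, 0, 0, 5, 0],  # 侯
    [5, 0, 0, 0, 0],  # 伐
    [0, 0, 0, 0, 5],  # 鄭
]
I_ATT = TAGS.index("I-Attacker")
ALLOWED_BEFORE_I_ATT = {TAGS.index("B-Attacker"), I_ATT}
# penalise any transition into I-Attacker from a tag other than B-/I-Attacker
transitions = [
    [-100.0 if (j == I_ATT and i not in ALLOWED_BEFORE_I_ATT) else 0.0 for j in range(len(TAGS))]
    for i in range(len(TAGS))
]

path = [TAGS[k] for k in viterbi_decode(emissions, transitions)]
print(path)  # ['B-Time', 'B-Attacker', 'I-Attacker', 'O', 'B-Defender']
print(bio_to_spans(tokens, path))  # [('Time', '秋'), ('Attacker', '晉侯'), ('Defender', '鄭')]
```

Without the penalty, the second character would be tagged `I-Attacker` directly after `B-Time`, an invalid BIO sequence; with it, Viterbi picks `B-Attacker` instead. That is the kind of constraint between adjacent tags that a CRF layer learns from data rather than having it hand-set as here.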
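As a consistency check on the [Results] figures: F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). Treating the first reported figure (87.6%) as precision and 77.2% as recall, the short sketch below reproduces the reported F1 of 82.1%; the function name `f1_score` is only illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, as fractions
p, r = 0.876, 0.772
print(f"F1 = {f1_score(p, r):.3f}")  # F1 = 0.821
```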
Received: 29 January 2021
Published: 11 August 2021
Fund: Fundamental Research Funds for the Central Universities (SKCX2020006); China Postdoctoral Science Foundation (2020M681652)
Corresponding Author:
He Lin, ORCID: 0000-0002-4207-3588
E-mail: helin@njau.edu.cn