[Objective] This paper constructs a framework for extracting events from ancient Chinese books, using a RoBERTa-CRF model to identify event types, argument roles, and arguments. [Methods] We collected war-related sentences from Zuozhuan as experimental data and used them to establish a classification schema for event types and argument roles. In the RoBERTa-CRF model, multi-layer Transformer encoders extract features from the corpus, and these features are combined with the sequence tags to learn the correlation constraints between tags. Arguments are then identified and extracted from the predicted tag sequence. [Results] The precision, recall, and F1 of the proposed model were 87.6%, 77.2%, and 82.1%, respectively, which were higher than the results of GuwenBERT-LSTM, BERT-LSTM, RoBERTa-LSTM, BERT-CRF and RoBERTa-CRF on the same dataset. [Limitations] The experimental dataset needs to be expanded, which could make the topic categories more balanced. [Conclusions] The RoBERTa-CRF model constructed in this paper can effectively extract events from ancient Chinese books.
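The last step of the method, reading arguments off the predicted tag sequence, can be illustrated with a minimal sketch. It assumes a character-level BIO tagging scheme; the role names `Attacker` and `Target`, the example sentence, and the helper names `decode_bio` and `f1_score` are hypothetical and are not taken from the paper. The second helper checks that the reported F1 of 82.1% matches the harmonic mean of 87.6% and 77.2%, assuming those two figures are precision and recall.

```python
from typing import List, Optional, Tuple


def decode_bio(chars: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (role, text) spans from a character-level BIO tag sequence.

    Assumes tags of the form "B-<role>", "I-<role>", or "O" (hypothetical scheme).
    """
    spans: List[Tuple[str, str]] = []
    role: Optional[str] = None
    buf: List[str] = []
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            # A new span starts; close any span that is still open.
            if role is not None:
                spans.append((role, "".join(buf)))
            role, buf = tag[2:], [ch]
        elif tag.startswith("I-") and role == tag[2:]:
            # Continue the current span only if the role matches.
            buf.append(ch)
        else:
            # "O" or an inconsistent "I-" tag closes the open span.
            if role is not None:
                spans.append((role, "".join(buf)))
            role, buf = None, []
    if role is not None:
        spans.append((role, "".join(buf)))
    return spans


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative sentence in the style of Zuozhuan with hypothetical role labels.
sentence = list("晋侯伐秦")
tags = ["B-Attacker", "I-Attacker", "O", "B-Target"]
arguments = decode_bio(sentence, tags)

# Consistency check on the figures reported in the abstract.
reported_f1 = f1_score(0.876, 0.772)
```

In a full system the `tags` list would come from the CRF layer's best-path decoding rather than being written by hand; the sketch covers only the span-collection step that follows.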
喻雪寒, 何琳, 徐健. 基于RoBERTa-CRF的古文历史事件抽取方法研究*[J]. 数据分析与知识发现, 2021, 5(7): 26-35.
Yu Xuehan, He Lin, Xu Jian. Extracting Events from Ancient Books Based on RoBERTa-CRF. Data Analysis and Knowledge Discovery, 2021, 5(7): 26-35.