Annotation Method for Extracting Entity Relationship from Ancient Chinese Works
Wang Yifan1, Li Bo2, Shi Hua3, Miao Wei1, Jiang Bin2
1 School of Northeast Asia Studies, Shandong University, Weihai 264209, China
2 School of Mechanical, Electrical & Information Engineering, Shandong University, Weihai 264209, China
3 School of Transborder Studies, Arizona State University, Tucson 85257, USA
[Objective] This paper proposes an annotation method for ancient Chinese datasets, aiming to standardize the annotation procedure. [Methods] We propose a new method integrating logical semantics, deep learning, and historical knowledge. The model, which is suitable for few-shot learning, follows three principles: "annotation of relationship valence", "annotation of propositional logic", and "existence of a single relationship". [Results] We evaluated the proposed annotation model on a text dataset from Shiji (Records of the Grand Historian), and found that its F1 values on the relation extraction and propositional logic extraction tasks reached 42.02% and 34.07%, respectively. [Limitations] The proposed method does not use pre-trained models such as BERT or ALBERT; it relies only on the classic Word2Vec model for word embedding, so its performance could be further improved. [Conclusions] Our new annotation method can effectively extract entity relations from ancient Chinese works.
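For context, the F1 values reported above are the harmonic mean of precision and recall over extracted relations. A minimal sketch of this metric, assuming simple set-based matching of predicted against gold (head, relation, tail) triples; the triples below are hypothetical toy data, not from the Shiji dataset:

```python
def f1_score(predicted, gold):
    """Micro F1 over extracted (head, relation, tail) triples."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # triples extracted exactly right
    if not predicted or not gold or tp == 0:
        return 0.0
    precision = tp / len(predicted)  # correct among predictions
    recall = tp / len(gold)          # correct among gold triples
    return 2 * precision * recall / (precision + recall)

# Toy example: 2 of 3 predictions correct, 2 of 4 gold triples found.
pred = [("Liu Bang", "founded", "Han"), ("Xiang Yu", "ruled", "Chu"),
        ("A", "r", "B")]
gold = [("Liu Bang", "founded", "Han"), ("Xiang Yu", "ruled", "Chu"),
        ("Sima Qian", "wrote", "Shiji"), ("Qin Shi Huang", "unified", "Qin")]
print(round(f1_score(pred, gold), 4))  # precision 2/3, recall 1/2 → 0.5714
```

An extraction is counted as correct only when the full triple matches, which is why joint entity-relation F1 scores such as the 42.02% above tend to be much lower than scores for entity recognition alone.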
Wang Yifan, Li Bo, Shi Hua, Miao Wei, Jiang Bin. Annotation Method for Extracting Entity Relationship from Ancient Chinese Works[J]. Data Analysis and Knowledge Discovery, 2021, 5(9): 63-74.
Yu Jingsong, Wei Yi, Zhang Yongwei, et al. Word Segmentation for Ancient Chinese Texts Based on Nonparametric Bayesian Models and Deep Learning[J]. Journal of Chinese Information Processing, 2020, 34(6): 1-8.
Huang Shuiqing, Wang Dongbo, He Lin. Research on Constructing Automatic Recognition Model for Ancient Chinese Place Names Based on Pre-Qin Corpus[J]. Library and Information Service, 2015, 59(12): 135-140.
Cui Dandan, Liu Xiulei, Chen Ruoyu, et al. Named Entity Recognition in Field of Ancient Chinese Based on Lattice LSTM[J]. Computer Science, 2020, 47(S2): 18-22.
Marrero M, Urbano J, Sánchez-Cuadrado S, et al. Named Entity Recognition: Fallacies, Challenges and Opportunities[J]. Computer Standards & Interfaces, 2013, 35(5): 482-489.
Kumar S. A Survey of Deep Learning Methods for Relation Extraction[OL]. arXiv Preprint, arXiv: 1705.03645.
Zheng S, Wang F, Bao H, et al. Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme [C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. 2017.
Cao Mingyu, Yang Zhihao, Luo Ling, et al. Joint Drug Entities and Relations Extraction Based on Neural Networks[J]. Journal of Computer Research and Development, 2019, 56(7): 1432-1440.
Zeng X R, Zeng D J, He S Z, et al. Extracting Relational Facts by an End-to-End Neural Model with Copy Mechanism [C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2018: 506-514.
Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding [C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics. 2019: 4171-4186.
Lan Z Z, Chen M D, Goodman S, et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations [C]//Proceedings of the International Conference on Learning Representations. 2020.
Li Dongmei, Zhang Yang, Li Dongyuan, et al. Review of Entity Relation Extraction Methods[J]. Journal of Computer Research and Development, 2020, 57(7): 1424-1448.
Mikolov T, Chen K, Corrado G S, et al. Efficient Estimation of Word Representations in Vector Space [C]//Proceedings of the 1st International Conference on Learning Representations. 2013.
Miwa M, Bansal M. End-to-End Relation Extraction Using LSTMs on Sequences and Tree Structures [C] //Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. 2016: 1105-1116.
Pennington J, Socher R, Manning C. GloVe: Global Vectors for Word Representation [C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. 2014: 1532-1543.
Levy O, Goldberg Y, Dagan I. Improving Distributional Similarity with Lessons Learned from Word Embeddings[J]. Transactions of the Association for Computational Linguistics, 2015, 3: 211-225.
Cho K, van Merrienboer B, Gulcehre C, et al. Learning Phrase Representations Using RNN Encoder-decoder for Statistical Machine Translation[OL]. arXiv Preprint, arXiv: 1406.1078.
Lafferty J D, McCallum A, Pereira F C N. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data [C]//Proceedings of the 18th International Conference on Machine Learning. 2001: 282-289.
Nakagawa T, Inui K, Kurohashi S, et al. Dependency Tree-based Sentiment Classification Using CRFs with Hidden Variables [C]//Proceedings of Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics. 2010: 786-794.
Bahdanau D, Cho K, Bengio Y. Neural Machine Translation by Jointly Learning to Align and Translate [C]//Proceedings of the 3rd International Conference on Learning Representations. 2015.