Continual Learning for One-to-many Entity Relationship Generation with Small Samples
Jiang Yaren, Le Xiaoqiu
National Science Library, Chinese Academy of Sciences, Beijing 100190, China
Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
[Objective] This paper aims to recognize one-to-many entity relationship instances (such as inclusion and coordination relations) in sentences from a small number of samples, and to support continual learning as new data arrive. [Methods] First, we used LaserTagger to generate one-to-many inclusion and coordination entity relations from sentences. Then, we added position embedding and a weighted loss so that the model captures more features from limited data. Finally, we achieved continual learning through model compression and expansion. [Results] The SARI of our approach was at least 1% higher than those of the baseline models in all tests. Model compression and expansion effectively retained the knowledge learned from previous data, raising SARI over the model without expansion in four of the five categories, by up to 16.92%. [Limitations] The proposed method still needs to be examined on more complex datasets. [Conclusions] The proposed method can effectively identify one-to-many entity relationships of different categories from small amounts of training data.
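LaserTagger treats generation as sequence tagging: each source token is tagged KEEP or DELETE, optionally with a phrase inserted before it (in LaserTagger these phrases come from a limited vocabulary built from the training data). The sketch below shows only the realization step that turns tags back into the one-to-many output format. It assumes whitespace tokenization and a hand-written tag sequence for the Table 1 (category 4) sentence; `realize` and the `(operation, added_phrase)` tuple layout are hypothetical, not the LaserTagger code base's API.

```python
from typing import List, Optional, Tuple

# A tag is (operation, added_phrase): operation is "KEEP" or "DELETE";
# added_phrase (may be None) is emitted *before* the source token.
Tag = Tuple[str, Optional[str]]


def realize(tokens: List[str], tags: List[Tag]) -> str:
    """Apply LaserTagger-style edit tags to a tokenized source sentence."""
    if len(tokens) != len(tags):
        raise ValueError("need exactly one tag per source token")
    output: List[str] = []
    for token, (operation, added_phrase) in zip(tokens, tags):
        if added_phrase:
            output.append(added_phrase)
        if operation == "KEEP":
            output.append(token)
        elif operation != "DELETE":
            raise ValueError(f"unknown operation: {operation}")
    return " ".join(output)


# Source sentence from Table 1; the tags are written by hand for illustration.
tokens = "The WPD is comprised of low-pass filter and high-pass filter .".split()
tags: List[Tag] = [
    ("KEEP", None),    # The
    ("KEEP", None),    # WPD
    ("DELETE", "("),   # is        -> opens the list of tail entities
    ("DELETE", None),  # comprised
    ("DELETE", None),  # of
    ("KEEP", None),    # low-pass
    ("KEEP", None),    # filter
    ("DELETE", ";"),   # and       -> separator between tail entities
    ("KEEP", None),    # high-pass
    ("KEEP", None),    # filter
    ("DELETE", ")"),   # .         -> closes the list
]
print(realize(tokens, tags))  # The WPD ( low-pass filter ; high-pass filter )
```

Because the output is built only from kept source tokens plus a few inserted phrases, a tagger of this kind has far fewer decisions to learn than a free-form decoder, which is one plausible reason it suits small training sets.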
Jiang Yaren, Le Xiaoqiu. Continual Learning for One-to-many Entity Relationship Generation with Small Samples. Data Analysis and Knowledge Discovery, 2021, 5(8): 45-53.
| Category | Pattern | Example sentence | Target output |
|---|---|---|---|
| | | The ADM model is composed of an Eulerian model, a Lagrangian particle dispersion model, and a probabilistic Lagrangian puff model. | The ADM model(an Eulerian model; a Lagrangian particle dispersion model; a probabilistic Lagrangian puff model) |
| 4 | E1 be comprised of E2, E3 | The WPD is comprised of low-pass filter and high-pass filter. | The WPD(low-pass filter; high-pass filter) |
| 5 | E1 such as E2, E3 | He used several classifiers such as SVM, NB, DT. | classifier(SVM; NB; DT) |

Table 1 Examples of experimental data for the five categories
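| Parameter | Value |
|---|---|
| Batch size | 4 |
| Maximum input text length (max_seq_length) | 128 |
| Learning rate (lr) | 3e-5 |
| Dropout | 0.1 |
| Number of hidden-layer units | 768 |
| Word embedding dimension | 512 |

Table 2 Model parameters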
| Item | Configuration |
|---|---|
| Operating system | Ubuntu 16.04.12 |
| GPU | GeForce RTX 2080 Ti |
| Memory | 64 GB |
| Python | Python 3.6.10 |
| TensorFlow | TensorFlow-gpu 1.15.0 |

Table 3 Experimental environment
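| Model | Category 1 (SARI) | Category 2 (SARI) | Category 3 (SARI) | Category 4 (SARI) | Category 5 (SARI) |
|---|---|---|---|---|---|
| BERT+Transformer | 38.39% | 38.50% | 20.13% | 23.16% | 15.49% |
| LaserTagger | 86.03% | 87.35% | 77.36% | 78.99% | 69.05% |
| LaserTagger + weighted loss | 86.51% | 89.09% | 78.21% | 80.66% | 69.79% |
| Our model (LaserTagger + weighted loss + position embedding) | 87.77% | 89.88% | 78.91% | 81.96% | 70.09% |

Table 4 Few-shot learning results of the models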
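Table 4 shows that adding the weighted loss to LaserTagger raises SARI in every category, but the weighting scheme is not spelled out here. One common reading, assumed in the sketch below, is a class-weighted cross-entropy over the edit tags, so that tags that are rare in a small training set are not swamped by frequent ones. The tag names, the weight values and `weighted_cross_entropy` are hypothetical illustrations, not the authors' configuration.

```python
import math
from typing import Mapping, Sequence


def weighted_cross_entropy(
    probs: Sequence[Mapping[str, float]],
    gold: Sequence[str],
    class_weights: Mapping[str, float],
) -> float:
    """Weighted mean negative log-likelihood of the gold tags of one sentence.

    probs[i] maps every tag to the model's probability for token i. Tokens whose
    gold tag has a larger weight contribute more; unlisted tags get weight 1.0.
    """
    if len(probs) != len(gold):
        raise ValueError("need one probability distribution per gold tag")
    total, norm = 0.0, 0.0
    for dist, tag in zip(probs, gold):
        weight = class_weights.get(tag, 1.0)
        total += -weight * math.log(max(dist[tag], 1e-12))  # clip to avoid log(0)
        norm += weight
    return total / norm if norm > 0 else 0.0


# Toy predictions for three tokens; tag set and weights are hypothetical.
probs = [
    {"KEEP": 0.9, "DELETE": 0.1},
    {"KEEP": 0.9, "DELETE": 0.1},
    {"KEEP": 0.6, "DELETE": 0.4},  # the model is unsure about this DELETE token
]
gold = ["KEEP", "KEEP", "DELETE"]

plain = weighted_cross_entropy(probs, gold, {})                 # every weight is 1.0
weighted = weighted_cross_entropy(probs, gold, {"DELETE": 3.0})  # up-weight DELETE
print(f"plain={plain:.4f} weighted={weighted:.4f}")  # plain=0.3757 weighted=0.5919
```

Up-weighting the poorly predicted DELETE token raises the loss, so training pushes harder on that tag. In TensorFlow 1.x a similar effect could be obtained by passing per-token weights to `tf.losses.sparse_softmax_cross_entropy` through its `weights` argument, although its default reduction normalizes differently from the weight sum used here.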
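| Model | Category 1 (SARI) | Category 2 (SARI) | Category 3 (SARI) | Category 4 (SARI) | Category 5 (SARI) |
|---|---|---|---|---|---|
| Our model, without expansion | 72.29% | 72.91% | 64.84% | 63.27% | 74.48% |
| Our model, with expansion | 84.04% | 84.24% | 73.22% | 80.19% | 71.67% |
| Δ (with expansion minus without expansion) | 11.75% | 11.33% | 8.38% | 16.92% | -2.81% |

Table 5 Continual learning results of the model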
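Tables 4 and 5 report SARI (Xu et al., 2016), which scores an output against both the source and a reference: for n-gram orders 1 to 4 it averages the F1 of added n-grams, the F1 of kept n-grams and the precision of deleted n-grams, then averages over the orders. The sketch below is a simplified, single-reference, set-based approximation written for illustration; the function names and the convention that two empty sets count as a perfect match are assumptions. The official implementation counts n-gram occurrences and supports multiple references, so this code will not reproduce the numbers in the tables, which are reported as percentages rather than on the 0 to 1 scale used here.

```python
from typing import List, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: List[str], n: int) -> Set[NGram]:
    """Set of n-grams of order n (empty if the sentence is shorter than n)."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def precision_recall(system: Set[NGram], reference: Set[NGram]) -> Tuple[float, float]:
    """Set precision/recall; assumed convention: two empty sets are a perfect match."""
    if not system and not reference:
        return 1.0, 1.0
    overlap = len(system & reference)
    precision = overlap / len(system) if system else 0.0
    recall = overlap / len(reference) if reference else 0.0
    return precision, recall


def f1(precision: float, recall: float) -> float:
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)


def simple_sari(source: str, output: str, reference: str, max_n: int = 4) -> float:
    """Single-reference, set-based approximation of SARI on a 0-1 scale."""
    src, out, ref = source.split(), output.split(), reference.split()
    per_order = []
    for n in range(1, max_n + 1):
        s, o, r = ngrams(src, n), ngrams(out, n), ngrams(ref, n)
        f_add = f1(*precision_recall(o - s, r - s))   # n-grams added w.r.t. the source
        f_keep = f1(*precision_recall(o & s, r & s))  # source n-grams retained
        p_del, _ = precision_recall(s - o, s - r)     # deletion uses precision only
        per_order.append((f_add + f_keep + p_del) / 3)
    return sum(per_order) / max_n


src = "He used several classifiers such as SVM , NB , DT ."
ref = "classifier ( SVM ; NB ; DT )"
print(simple_sari(src, ref, ref))  # 1.0: the output matches the reference
print(simple_sari(src, src, ref))  # copying the source unchanged scores lower
```

Rewarding additions, retentions and deletions separately is what makes SARI suitable here: an output that merely copies the sentence keeps the right entities but still loses the add and delete components.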
| Corpus / method | Instance |
|---|---|
| Example sentence with one-to-many entity relations | The sludge train included a gravity thickener, an aerobic digester and belt filter presses, with additional power consumptions of 8 and 11.2 kW respectively. |
| Manual annotation | The sludge train (gravity thickener; aerobic digester; belt filter presses) |
| Our model, with expansion | The sludge train (gravity thickener; aerobic digester; belt filter presses) |
| Our model, without expansion | belt filter presses 8; 11(2 kW respectively ) |

Table 6 Examples of generated one-to-many entity relations
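Table 6 compares outputs with the manual annotation by eye. To score such outputs automatically, one option, assumed here and not described in the paper, is to parse the "Head (tail; tail)" target format and check the head entity and the set of tail entities separately. The regular expression, `parse_relation` and `compare` below are hypothetical helpers, and they assume entities contain no parentheses.

```python
import re
from typing import List, Optional, Tuple

# Target format from Tables 1 and 6: "Head (tail1; tail2; ...)".
# Simplifying assumption: neither head nor tails contain parentheses.
RELATION = re.compile(r"^\s*(?P<head>[^()]+?)\s*\((?P<tails>[^()]*)\)\s*$")


def parse_relation(text: str) -> Optional[Tuple[str, List[str]]]:
    """Split 'Head (t1; t2)' into ('Head', ['t1', 't2']); None if malformed."""
    match = RELATION.match(text)
    if match is None:
        return None
    tails = [t.strip() for t in match.group("tails").split(";") if t.strip()]
    return match.group("head").strip(), tails


def compare(predicted: str, gold: str) -> Tuple[bool, float]:
    """Return (head matches ignoring case, F1 between predicted and gold tail sets)."""
    ref = parse_relation(gold)
    if ref is None:
        raise ValueError("gold annotation is not in 'Head (t1; t2)' format")
    pred = parse_relation(predicted)
    if pred is None:
        return False, 0.0
    head_ok = pred[0].lower() == ref[0].lower()
    pred_tails, gold_tails = set(pred[1]), set(ref[1])
    overlap = len(pred_tails & gold_tails)
    if overlap == 0:
        return head_ok, 0.0
    precision, recall = overlap / len(pred_tails), overlap / len(gold_tails)
    return head_ok, 2 * precision * recall / (precision + recall)


gold = "The sludge train (gravity thickener; aerobic digester; belt filter presses)"
with_expansion = "The sludge train (gravity thickener; aerobic digester; belt filter presses)"
without_expansion = "belt filter presses 8; 11(2 kW respectively )"
print(compare(with_expansion, gold))     # (True, 1.0)
print(compare(without_expansion, gold))  # (False, 0.0)
```

Scoring the head and the tails separately distinguishes an output that finds the right entity list under the wrong head from one, like the non-expanded model's, that fails on both.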