Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (10): 71-80    DOI: 10.11925/infotech.2096-3467.2021.0302
Position-Aware Stepwise Tagging Method for Triples Extraction of Entity-Relationship
Wang Yuan1, Shi Kaize1,2, Niu Zhendong1,3
1School of Computer Science & Technology, Beijing Institute of Technology, Beijing 100081, China
2Australian Artificial Intelligence Institute, University of Technology Sydney, Sydney 2007, Australia
3Beijing Institute of Technology Library, Beijing 100081, China
Abstract  

[Objective] This paper designs a joint extraction model for scenarios with overlapping triples, aiming to extract triples effectively from unstructured text. [Methods] We designed a position-aware stepwise tagging method. First, the main entities were identified by tagging their start and end positions. Then, the corresponding objects were tagged under each predefined relation. Multiple kinds of position-aware information were added to the tagging procedures. Finally, the encoded sequence was shared across the tagging steps, together with the results of the preceding steps and an attention mechanism. [Results] We evaluated the model on DuIE, a public Chinese dataset. Our method outperformed the baseline models, reaching an F1 score of 0.886. Ablation studies verified the effectiveness of the model's components. [Limitations] More research is needed to handle occasionally nested entities. [Conclusions] The proposed method effectively addresses triple extraction in overlapping scenarios and provides a reference for future studies.
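
The two tagging steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tag sequences below are written by hand rather than predicted by a model, the nearest-end pairing rule in `decode_spans` is an assumption, and the names `decode_spans` and `extract_triples` are hypothetical.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]                  # inclusive (start, end) character indices
TagPair = Tuple[List[int], List[int]]   # (start tags, end tags); 0 means "no tag"


def decode_spans(start_tags: List[int], end_tags: List[int]) -> List[Span]:
    """Pair each tagged start with the nearest end carrying the same label.

    Assumption: nearest-end matching; the paper's exact decoding rule is
    not stated in the abstract.
    """
    spans = []
    for i, label in enumerate(start_tags):
        if label == 0:
            continue
        for j in range(i, len(end_tags)):
            if end_tags[j] == label:
                spans.append((i, j))
                break
    return spans


def extract_triples(
    text: str,
    subject_tags: TagPair,
    object_tags: Dict[Span, Dict[str, TagPair]],
) -> List[Tuple[str, str, str]]:
    """Step 1: decode subjects. Step 2: decode objects per subject and relation."""
    triples = []
    for s_start, s_end in decode_spans(*subject_tags):
        subject = text[s_start:s_end + 1]
        for relation, tags in object_tags.get((s_start, s_end), {}).items():
            for o_start, o_end in decode_spans(*tags):
                triples.append((subject, relation, text[o_start:o_end + 1]))
    return triples


# Toy sentence (9 characters) with hand-written tags standing in for model output.
text = "上位的作者是王清平"
subject_tags = ([1, 0, 0, 0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0, 0, 0])
object_tags = {
    (0, 1): {"作者": ([0, 0, 0, 0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 1])}
}
triples = extract_triples(text, subject_tags, object_tags)
print(triples)
```

Because objects are decoded separately for each subject and each relation, two triples that share a subject do not compete for the same tag sequence, which is the property that matters in overlapping scenarios.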

Key words: Joint Extraction; Position-Aware; Stepwise Tagging Method
Received: 26 March 2021      Published: 23 November 2021
ZTFLH:  TP391  
Fund: National Key R&D Program of China (2019YFB1406302); National Key R&D Program of China (2019YFB1406303)
Corresponding Authors: Niu Zhendong     E-mail: zniu@bit.edu.cn

Cite this article:

Wang Yuan, Shi Kaize, Niu Zhendong. Position-Aware Stepwise Tagging Method for Triples Extraction of Entity-Relationship. Data Analysis and Knowledge Discovery, 2021, 5(10): 71-80.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2021.0302     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2021/V5/I10/71

The Extraction Process Based on Position-Aware Stepwise Tagging Method
The Structure of the Position-Aware Stepwise Tagging Method
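Symbol    Description
$h_i$    Original encoded sequence, used as the input sequence of the head tagger in the S extractor
$y_i^{start}$    Label distribution of the i-th character in the head tagger
$y_i^{end}$    Label distribution of the i-th character in the tail tagger
$P(y_i^{start})$    Label probability distribution of the i-th character in the head tagger
$P(y_i^{end})$    Label probability distribution of the i-th character in the tail tagger
$start\_tag(x_i)$    Predicted label of the i-th character in the head tagger
$end\_tag(x_i)$    Predicted label of the i-th character in the tail tagger
$\hat{y}_i^{start}$    True label of the i-th character in the head tagger
$\hat{y}_i^{end}$    True label of the i-th character in the tail tagger
$p_i^{s}$    Head relative position vector
$h_i^{e}$    Input sequence of the tail tagger in the S extractor
$v^{s}$    Average vector representation of all characters contained in S
$h_i^{sub}$    Sequence obtained by adding $v^{s}$ to $h_i$: $h_i^{sub} = \{x_1 + v^{s}, x_2 + v^{s}, \ldots, x_n + v^{s}\}$
$p_i^{sub}$    Relative position vector with respect to S
$h_i'$    Sequence obtained by concatenating $h_i^{sub}$ and $p_i^{sub}$: $h_i' = [h_i^{sub}; p_i^{sub}]$
$h_i^{o}$    Input sequence of the head tagger in the OforR extractor
$h_i^{oe}$    Input sequence of the tail tagger in the OforR extractor
$P_i^{start}$    Probability that the i-th character is a head position in the head tagger
$P_i^{end}$    Probability that the i-th character is a tail position in the tail tagger
Notations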
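
The notation table defines a subject-aware sequence: the average vector of the subject's characters is added to every position, and a relative position vector is then concatenated. The sketch below works through that construction on toy numbers. It rests on assumptions: the relative offset is taken as the signed distance to the subject span, a single raw offset stands in for a learned position embedding, and the names `mean_vector`, `relative_offsets` and `subject_aware_sequence` are hypothetical.

```python
from typing import List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise average of the subject's character vectors (v^s)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def relative_offsets(length: int, start: int, end: int) -> List[int]:
    """Signed distance of each position to the subject span [start, end].

    Assumption: 0 inside the span, negative before it, positive after it.
    """
    offsets = []
    for i in range(length):
        if i < start:
            offsets.append(i - start)
        elif i > end:
            offsets.append(i - end)
        else:
            offsets.append(0)
    return offsets


def subject_aware_sequence(h: List[Vector], start: int, end: int) -> List[Vector]:
    """Build h' = [h^sub ; p^sub] for a subject occupying positions start..end."""
    v_s = mean_vector(h[start:end + 1])
    h_sub = [[a + b for a, b in zip(h_i, v_s)] for h_i in h]
    # A one-dimensional raw offset stands in for a learned position embedding.
    p_sub = [[float(offset)] for offset in relative_offsets(len(h), start, end)]
    return [h_i + p_i for h_i, p_i in zip(h_sub, p_sub)]


# Three characters with two-dimensional encodings; the subject covers positions 0..1.
h = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0]]
h_prime = subject_aware_sequence(h, start=0, end=1)
print(h_prime)
```

Each output vector is one dimension longer than its input because of the concatenated position feature; with a learned embedding the extra width would be the embedding size instead.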
Data         Total      Training set   Test set
Sentences    194,747    173,108        21,639
Instances    409,795    364,218        45,577
Dataset
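No.    Example
1    {"text": "《上位》是2012年由中国书籍出版社出版的一部作品,作者是王清平。", "triple_list": [{"S": "上位", "R": "作者", "O": "王清平", "S_type": "图书作品", "O_type": "人物"}, {"S": "上位", "R": "出版社", "O": "中国书籍出版社", "S_type": "图书作品", "O_type": "出版社"}]}
     (Gloss: "Shang Wei" is a work published in 2012 by China Books Publishing House; its author is Wang Qingping. Triples: (Shang Wei, author, Wang Qingping), with types book / person; (Shang Wei, publisher, China Books Publishing House), with types book / publisher.)
Data Sample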
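
The sample record contains two triples that share the subject 上位, which is the kind of overlap the model targets. A minimal sketch of detecting such shared subjects follows. The English keys `text` and `triple_list` are assumed names for the two top-level fields and may differ from the dataset's own field names; `group_by_subject` and `overlapping_subjects` are hypothetical helpers.

```python
import json
from collections import defaultdict

# Record shaped like the data sample; key names are assumptions (see lead-in).
raw = """
{"text": "《上位》是2012年由中国书籍出版社出版的一部作品,作者是王清平。",
 "triple_list": [
   {"S": "上位", "R": "作者", "O": "王清平", "S_type": "图书作品", "O_type": "人物"},
   {"S": "上位", "R": "出版社", "O": "中国书籍出版社", "S_type": "图书作品", "O_type": "出版社"}
 ]}
"""
record = json.loads(raw)


def group_by_subject(triples):
    """Map each subject string to its list of (relation, object) pairs."""
    grouped = defaultdict(list)
    for triple in triples:
        grouped[triple["S"]].append((triple["R"], triple["O"]))
    return dict(grouped)


def overlapping_subjects(triples):
    """Subjects that take part in more than one triple."""
    grouped = group_by_subject(triples)
    return sorted(s for s, pairs in grouped.items() if len(pairs) > 1)


grouped = group_by_subject(record["triple_list"])
shared = overlapping_subjects(record["triple_list"])
print(grouped)
print(shared)
```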
Model                                      Precision   Recall   F1
BERT-Based Multi-Head Selection            0.821       0.855    0.837
BERT+LSTM with Softmax Layer               0.837       0.863    0.850
BERT+MLP with Sigmoid Layer                0.855       0.878    0.866
ETL-Span                                   0.874       0.876    0.875
Position-Aware Stepwise Tagging Model      0.882       0.891    0.886
Results of Different Models
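
F1 is the harmonic mean of precision and recall, so the F1 column can be recomputed from the other two. The sketch below does this for the rows of the table; the function name `f1_score` is hypothetical. Since the published precision and recall are themselves rounded to three decimals, a recomputed value can differ from the reported one by 0.001.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the results table.
rows = {
    "BERT-Based Multi-Head Selection": (0.821, 0.855, 0.837),
    "BERT+LSTM with Softmax Layer": (0.837, 0.863, 0.850),
    "BERT+MLP with Sigmoid Layer": (0.855, 0.878, 0.866),
    "ETL-Span": (0.874, 0.876, 0.875),
    "Position-Aware Stepwise Tagging Model": (0.882, 0.891, 0.886),
}

recomputed = {name: f1_score(p, r) for name, (p, r, _) in rows.items()}
for name, value in recomputed.items():
    print(f"{name}: {value:.3f}")
```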
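Case 1
Text: In 1948, Zhang Zonghu (张宗祜) graduated from Peking University (北京大学) and went on to work at the Geological Exploration Office of the China Petroleum Company in Lanzhou, Gansu, where he worked on petroleum geology under Sun Jianchu (孙健初), a pioneer of petroleum geology in China (中国).
Gold triples: (Zhang Zonghu, alma mater, Peking University); (Sun Jian (孙健), nationality, China)
Misjudged triples: (Zhang Zonghu, nationality, China)
Case 2
Text: Ruian Huayi Precision Foundry (瑞安市华仪精密铸造厂) is located at 瑞安市安阳镇上望界路头村, a village in Anyang Town, Ruian City, next to the Third Feiyun River Bridge and 3 km from the urban area. The factory was founded in 1990 with a registered capital of RMB 400,000.
Gold triples: (Ruian Huayi Precision Foundry, headquarters location, 瑞安市安阳镇上望界路头村)
Misjudged triples: (Ruian Huayi Precision Foundry, headquarters location, 瑞安市安阳镇 (Anyang Town, Ruian City))
Bad Case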
Setting                                      Precision   Recall   F1
Without triple position embeddings           0.846       0.859    0.852
Without Self Attention                       0.862       0.883    0.872
Entity-type labels replaced with 1/0         0.867       0.889    0.878
Position-Aware Stepwise Tagging Model        0.882       0.891    0.886
Results of Ablation Studies
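
The contribution of each component can be read off the ablation table as the drop in F1 relative to the full model. The short sketch below only subtracts the reported values; the English setting labels and the name `f1_drops` are chosen here for illustration.

```python
FULL_MODEL_F1 = 0.886  # full position-aware stepwise tagging model

# Reported F1 for each ablated variant, copied from the ablation table.
ablated_f1 = {
    "without triple position embeddings": 0.852,
    "without self-attention": 0.872,
    "entity-type labels replaced with 1/0": 0.878,
}


def f1_drops(full: float, ablated: dict) -> dict:
    """F1 lost by each ablation, rounded to three decimals, largest first."""
    drops = {name: round(full - value, 3) for name, value in ablated.items()}
    return dict(sorted(drops.items(), key=lambda item: item[1], reverse=True))


drops = f1_drops(FULL_MODEL_F1, ablated_f1)
for name, drop in drops.items():
    print(f"{name}: -{drop:.3f}")
```

On these figures, removing the position embeddings costs the most F1, followed by removing self-attention and then simplifying the entity-type labels.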