Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (10): 71-80     https://doi.org/10.11925/infotech.2096-3467.2021.0302
Research Paper
Position-Aware Stepwise Tagging Method for Triples Extraction of Entity-Relationship*
(Chinese title: 一种用于实体关系三元组抽取的位置辅助分步标记方法)
Wang Yuan (王媛)1, Shi Kaize (时恺泽)1,2, Niu Zhendong (牛振东)1,3
1School of Computer Science & Technology, Beijing Institute of Technology, Beijing 100081, China
2Australian Artificial Intelligence Institute, University of Technology Sydney, Sydney 2007, Australia
3Beijing Institute of Technology Library, Beijing 100081, China
Abstract

[Objective] To extract triples from unstructured text, this paper designs a joint extraction model that improves extraction performance and also handles overlapping scenarios. [Methods] We propose a position-aware stepwise tagging method. The model first identifies subject entities by tagging their start and end positions, then tags the corresponding object entities under each predefined relation in turn. To improve extraction, the tagging process incorporates three kinds of auxiliary position information and shares the underlying encoding, combined with the results of earlier steps and an attention mechanism. [Results] Experiments on DuIE, a public Chinese dataset, show that the proposed method outperforms the baseline methods, with an F1 value of 0.886. Ablation studies verify the effectiveness of each component. [Limitations] The tagging mechanism and matching scheme do not yet handle the occasional nested entity, which needs further study. [Conclusions] The proposed joint extraction method handles triple extraction, including overlapping scenarios, and its position-aware design offers a reference for future research.

Keywords: Joint Extraction; Position-Aware; Stepwise Tagging Method
Received: 2021-03-26      Published: 2021-11-23
CLC number:  TP391
Funding: *National Key R&D Program of China (2019YFB1406302); National Key R&D Program of China (2019YFB1406303)
Corresponding author: Niu Zhendong, ORCID: 0000-0002-0576-7572     E-mail: zniu@bit.edu.cn
Cite this article:
Wang Yuan, Shi Kaize, Niu Zhendong. Position-Aware Stepwise Tagging Method for Triples Extraction of Entity-Relationship. Data Analysis and Knowledge Discovery, 2021, 5(10): 71-80.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2021.0302      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2021/V5/I10/71
Fig.1  Extraction pipeline of position-aware stepwise tagging
Fig.2  Structure of the position-aware stepwise tagging model
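The stepwise flow named in the figure captions (tag subject boundaries first, then tag objects under each predefined relation) can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: `decode_spans`, `extract_triples`, and `toy_object_tagger` are hypothetical names, the nearest-end pairing rule is an assumed matching scheme, the toy sentence is made up, and binary 1/0 tags are used for brevity where the full model appears to tag with entity types.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]               # inclusive character indices (start, end)
Tags = Tuple[List[int], List[int]]   # (start tags, end tags); 1 marks a boundary


def decode_spans(start_tags: List[int], end_tags: List[int]) -> List[Span]:
    """Pair each tagged start with the nearest tagged end at or after it (assumed rule)."""
    spans = []
    for i, is_start in enumerate(start_tags):
        if is_start:
            for j in range(i, len(end_tags)):
                if end_tags[j]:
                    spans.append((i, j))
                    break
    return spans


def extract_triples(
    text: str,
    subject_tags: Tags,
    relations: List[str],
    object_tagger: Callable[[Span, str], Tags],
) -> List[Tuple[str, str, str]]:
    """Step 1: decode subjects. Step 2: for each subject and relation, decode objects."""
    triples = []
    for s_start, s_end in decode_spans(*subject_tags):
        subject = text[s_start:s_end + 1]
        for relation in relations:
            o_start_tags, o_end_tags = object_tagger((s_start, s_end), relation)
            for o_start, o_end in decode_spans(o_start_tags, o_end_tags):
                triples.append((subject, relation, text[o_start:o_end + 1]))
    return triples


def one_hot(length: int, index: int) -> List[int]:
    return [1 if i == index else 0 for i in range(length)]


# Toy sentence (hypothetical): "The author of 'Shangwei' is Wang Qingping."
text = "《上位》的作者是王清平"
n = len(text)
subject_tags = (one_hot(n, 1), one_hot(n, 2))   # subject "上位" at indices 1..2


def toy_object_tagger(subject_span: Span, relation: str) -> Tags:
    # Stand-in for a trained tagger: only the "作者" (author) relation has an object.
    if subject_span == (1, 2) and relation == "作者":
        return one_hot(n, 8), one_hot(n, 10)    # object "王清平" at indices 8..10
    return [0] * n, [0] * n


triples = extract_triples(text, subject_tags, ["作者", "出版社"], toy_object_tagger)
print(triples)
```

Because objects are decoded separately for every (subject, relation) pair, one subject can take part in several triples, which is what makes this kind of scheme usable in overlapping scenarios.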
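| Symbol | Description |
|---|---|
| $h_i$ | Original encoded sequence; input sequence of the start tagger in the S extractor |
| $y_i^{start}$ | Label distribution of the $i$-th character in the start tagger |
| $y_i^{end}$ | Label distribution of the $i$-th character in the end tagger |
| $P(y_i^{start})$ | Label probability distribution of the $i$-th character in the start tagger |
| $P(y_i^{end})$ | Label probability distribution of the $i$-th character in the end tagger |
| $\mathrm{start\_tag}(x_i)$ | Predicted label of the $i$-th character in the start tagger |
| $\mathrm{end\_tag}(x_i)$ | Predicted label of the $i$-th character in the end tagger |
| $\hat{y}_i^{start}$ | True label of the $i$-th character in the start tagger |
| $\hat{y}_i^{end}$ | True label of the $i$-th character in the end tagger |
| $p_i^{s}$ | Relative position vector with respect to the start position |
| $h_i^{e}$ | Input sequence of the end tagger in the S extractor |
| $v_s$ | Average vector representation of all characters contained in S |
| $h_i^{sub}$ | Sequence obtained by adding $v_s$ to $h_i$: $h_i^{sub} = \{x_1 + v_s, x_2 + v_s, \ldots, x_n + v_s\}$ |
| $p_i^{sub}$ | Relative position vector with respect to S |
| $h_i'$ | Sequence obtained by concatenating $h_i^{sub}$ and $p_i^{sub}$: $h_i' = [h_i^{sub}; p_i^{sub}]$ |
| $h_i^{o}$ | Input sequence of the start tagger in the OforR extractor |
| $h_i^{oe}$ | Input sequence of the end tagger in the OforR extractor |
| $P_i^{start}$ | Probability that the $i$-th character is a start position in the start tagger |
| $P_i^{end}$ | Probability that the $i$-th character is an end position in the end tagger |

Table 1  Notation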
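To make the notation concrete, the following sketch computes the subject average vector, adds it to every position, and concatenates a relative position vector, following the definitions of v_s, h_sub, and h' in the table. It is an illustration under stated assumptions rather than the paper's code: the function name, the signed-distance definition of the position relative to S, the clipping distance `max_dist`, and the lookup table `pos_table` are all hypothetical.

```python
from typing import List, Tuple

Vector = List[float]


def subject_aware_sequence(
    h: List[Vector],
    span: Tuple[int, int],
    pos_table: List[Vector],
    max_dist: int,
) -> List[Vector]:
    """Return h' = [h_sub ; p_sub] for every position.

    h         : encoded sequence, one vector per character
    span      : inclusive (start, end) indices of the subject S
    pos_table : hypothetical embedding table with 2 * max_dist + 1 rows
    """
    start, end = span
    dim = len(h[0])
    members = h[start:end + 1]
    # v_s: average vector of all characters inside the subject span
    v_s = [sum(vec[k] for vec in members) / len(members) for k in range(dim)]

    h_prime = []
    for i, h_i in enumerate(h):
        # h_sub: add v_s to the encoding at every position
        h_sub_i = [x + v for x, v in zip(h_i, v_s)]
        # Assumed relative position: 0 inside S, negative before it, positive after it
        if i < start:
            dist = i - start
        elif i > end:
            dist = i - end
        else:
            dist = 0
        dist = max(-max_dist, min(max_dist, dist))
        p_sub_i = pos_table[dist + max_dist]
        # h': concatenation of h_sub and p_sub
        h_prime.append(h_sub_i + list(p_sub_i))
    return h_prime


# Toy input: 4 characters, 3-dimensional encodings, subject at indices 1..2
h = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]
max_dist = 2
pos_table = [[1.0 if r == c else 0.0 for c in range(5)] for r in range(5)]
h_prime = subject_aware_sequence(h, (1, 2), pos_table, max_dist)
print(len(h_prime), len(h_prime[0]))
```

In a trained model the position table would hold learned embeddings; the one-hot rows here only make the lookup easy to inspect.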
| Data | Total | Training set | Test set |
|---|---|---|---|
| Sentences | 194,747 | 173,108 | 21,639 |
| Instances | 409,795 | 364,218 | 45,577 |

Table 2  Experimental dataset
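| No. | Example |
|---|---|
| 1 | {"text": "《上位》是2012年由中国书籍出版社出版的一部作品,作者是王清平。", "triple_list": [{"S": "上位", "R": "作者", "O": "王清平", "S_type": "图书作品", "O_type": "人物"}, {"S": "上位", "R": "出版社", "O": "中国书籍出版社", "S_type": "图书作品", "O_type": "出版社"}]} |

Glosses: 作者 = author; 出版社 = publisher; 图书作品 = book; 人物 = person.

Table 3  Data sample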
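The sample above contains two triples that share the subject 上位, which is one form of the overlapping scenario mentioned in the abstract. A small sketch of how such overlap could be detected in a record is given below; the key names `text` and `triple_list` and the helper `group_by_subject` are illustrative choices, not the dataset's official schema.

```python
import json
from collections import defaultdict
from typing import Dict, List, Tuple

# Record modeled on the data sample; key names are illustrative.
SAMPLE = '''
{"text": "《上位》是2012年由中国书籍出版社出版的一部作品,作者是王清平。",
 "triple_list": [
   {"S": "上位", "R": "作者", "O": "王清平", "S_type": "图书作品", "O_type": "人物"},
   {"S": "上位", "R": "出版社", "O": "中国书籍出版社", "S_type": "图书作品", "O_type": "出版社"}
 ]}
'''


def group_by_subject(triples: List[dict]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each subject to the (relation, object) pairs it participates in."""
    grouped = defaultdict(list)
    for triple in triples:
        grouped[triple["S"]].append((triple["R"], triple["O"]))
    return dict(grouped)


record = json.loads(SAMPLE)
by_subject = group_by_subject(record["triple_list"])
# A subject that appears in more than one triple is an overlapping case.
overlapping = {s: pairs for s, pairs in by_subject.items() if len(pairs) > 1}
print(overlapping)
```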
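| Model | Precision | Recall | F1 |
|---|---|---|---|
| BERT-Based Multi-Head Selection | 0.821 | 0.855 | 0.837 |
| BERT+LSTM with Softmax Layer | 0.837 | 0.863 | 0.850 |
| BERT+MLP with Sigmoid Layer | 0.855 | 0.878 | 0.866 |
| ETL-Span | 0.874 | 0.876 | 0.875 |
| Position-aware stepwise tagging model (proposed) | 0.882 | 0.891 | 0.886 |

Table 4  Experimental results of different models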
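The F1 column is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short check below recomputes it from the reported precision and recall; the function name and the dictionary layout are illustrative. Because the reported precision and recall are already rounded to three decimals, a recomputed value can differ from the reported F1 in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as listed in the results table
results = {
    "BERT-Based Multi-Head Selection": (0.821, 0.855, 0.837),
    "BERT+LSTM with Softmax Layer": (0.837, 0.863, 0.850),
    "BERT+MLP with Sigmoid Layer": (0.855, 0.878, 0.866),
    "ETL-Span": (0.874, 0.876, 0.875),
    "Position-aware stepwise tagging model": (0.882, 0.891, 0.886),
}

for model, (p, r, reported) in results.items():
    print(f"{model}: computed F1 = {f1_score(p, r):.4f}, reported = {reported:.3f}")
```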
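| | Case 1 | Case 2 |
|---|---|---|
| Text | 1948年张宗祜在北京大学毕业,即到甘肃兰州中国石油公司地质勘探处工作,在中国石油地质先驱者孙健初领导下从事石油地质工作。 | 瑞安市华仪精密铸造厂座落在瑞安市安阳镇上望界路头村,紧靠飞云江三桥,离市区3公里,该厂创办于一九九零年,注册资本人民币40万元。 |
| Gold triples | (张宗祜, 毕业院校, 北京大学) (孙健, 国籍, 中国) | (瑞安市华仪精密铸造厂, 总部地点, 瑞安市安阳镇上望界路头村) |
| Misjudged triples | (张宗祜, 国籍, 中国) | (瑞安市华仪精密铸造厂, 总部地点, 瑞安市安阳镇) |

Glosses: 毕业院校 = alma mater; 国籍 = nationality; 总部地点 = headquarters location.

Table 5  Analysis of negative cases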
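Case 2 shows why a partially correct object boundary is costly when triples are scored by exact match: the truncated object counts as both a false positive and a false negative. The sketch below illustrates this. It assumes exact-match scoring and assumes the model's output for Case 2 consists only of the misjudged triple listed in the table; neither assumption is confirmed by this page, and `exact_match_counts` is a hypothetical helper.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def exact_match_counts(gold: Set[Triple], pred: Set[Triple]) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives) under exact match."""
    true_pos = len(gold & pred)
    false_pos = len(pred - gold)
    false_neg = len(gold - pred)
    return true_pos, false_pos, false_neg


# Case 2 from the negative-case table
gold_case2 = {("瑞安市华仪精密铸造厂", "总部地点", "瑞安市安阳镇上望界路头村")}
# Assumption: the prediction set holds just the misjudged triple
pred_case2 = {("瑞安市华仪精密铸造厂", "总部地点", "瑞安市安阳镇")}

tp, fp, fn = exact_match_counts(gold_case2, pred_case2)
print(tp, fp, fn)
```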
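| Setting | Precision | Recall | F1 |
|---|---|---|---|
| Without the triple position embeddings | 0.846 | 0.859 | 0.852 |
| Without self-attention | 0.862 | 0.883 | 0.872 |
| Entity-type labels replaced with 1/0 labels | 0.867 | 0.889 | 0.878 |
| Position-aware stepwise tagging model (full) | 0.882 | 0.891 | 0.886 |

Table 6  Results of the ablation study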
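One simple way to read the ablation table is to rank the components by how much F1 drops when each is removed or simplified. The snippet below does this arithmetic on the reported values; the variable names are illustrative and the ranking says nothing beyond the numbers in the table.

```python
FULL_MODEL_F1 = 0.886

# Reported F1 for each ablated variant
ablation_f1 = {
    "without triple position embeddings": 0.852,
    "without self-attention": 0.872,
    "entity-type labels replaced with 1/0": 0.878,
}

# F1 drop relative to the full model, rounded to the table's precision
f1_drop = {name: round(FULL_MODEL_F1 - f1, 3) for name, f1 in ablation_f1.items()}

# Largest drop first
ranked = sorted(f1_drop.items(), key=lambda item: item[1], reverse=True)
for name, drop in ranked:
    print(f"{name}: F1 drops by {drop:.3f}")
```

On these figures, removing the position embeddings costs the most F1, followed by self-attention and then the entity-type labels.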