Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (10): 71-80    DOI: 10.11925/infotech.2096-3467.2021.0302
Position-Aware Stepwise Tagging Method for Triples Extraction of Entity-Relationship
Wang Yuan1, Shi Kaize1,2, Niu Zhendong1,3
1School of Computer Science & Technology, Beijing Institute of Technology, Beijing 100081, China
2Australian Artificial Intelligence Institute, University of Technology Sydney, Sydney 2007, Australia
3Beijing Institute of Technology Library, Beijing 100081, China
[Objective] This paper proposes a joint model for overlapping scenarios, aiming to extract entity-relationship triples effectively from unstructured text. [Methods] We designed a position-aware stepwise tagging method. First, the main entities (subjects) were identified by tagging their start and end positions. Then, for each predefined relation, the corresponding objects were tagged. Multiple kinds of position-aware information were added to the tagging procedures. Finally, the encoded sequences were shared across the tagging steps, combined with the results of the preceding steps and an attention mechanism. [Results] We evaluated the model on DuIE, a public Chinese dataset. Our method outperformed the baseline models, reaching an F1 value of 0.886. Ablation studies verified the effectiveness of the model's components. [Limitations] More research is needed on the nested entities that occasionally occur. [Conclusions] The proposed method effectively addresses triple extraction in overlapping scenarios and provides a reference for future studies.
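
The first two tagging steps described in the abstract can be illustrated with a minimal Python sketch. It assumes that each tagger emits one label per character (an entity type at a start or end position, "O" elsewhere) and that every start is paired with the nearest matching end at or after it. The pairing rule, the "O" label, and the helper names `decode_spans` and `tag_entity` are assumptions made for illustration, not details confirmed by the paper; the sentence is a DuIE-style sample.

```python
from typing import List, Tuple

NONE_TAG = "O"  # assumed label for "not a boundary"


def decode_spans(start_tags: List[str], end_tags: List[str]) -> List[Tuple[int, int, str]]:
    """Pair each tagged start with the nearest end at or after it carrying the same label."""
    spans = []
    for i, label in enumerate(start_tags):
        if label == NONE_TAG:
            continue
        for j in range(i, len(end_tags)):
            if end_tags[j] == label:
                spans.append((i, j, label))
                break
    return spans


def tag_entity(text: str, entity: str, label: str) -> Tuple[List[str], List[str]]:
    """Build toy start/end tag sequences for one entity (stands in for a tagger's output)."""
    start_tags = [NONE_TAG] * len(text)
    end_tags = [NONE_TAG] * len(text)
    begin = text.index(entity)
    start_tags[begin] = label
    end_tags[begin + len(entity) - 1] = label
    return start_tags, end_tags


text = "《上位》是2012年由中国书籍出版社出版的一部作品,作者是王清平。"

# Step 1: subject tagging.
s_start, s_end = tag_entity(text, "上位", "图书作品")
subjects = [(text[i:j + 1], label) for i, j, label in decode_spans(s_start, s_end)]

# Step 2: object tagging for one predefined relation ("作者", author), given the subject.
o_start, o_end = tag_entity(text, "王清平", "人物")
objects = [(text[i:j + 1], label) for i, j, label in decode_spans(o_start, o_end)]

triples = [(s, "作者", o) for s, _ in subjects for o, _ in objects]
print(triples)
```

Repeating step 2 once per predefined relation is what lets a single subject take part in several triples, which is the overlapping case the model targets.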

Key words: Joint Extraction; Position-Aware; Stepwise Tagging Method
Received: 26 March 2021      Published: 23 November 2021
CLC number (ZTFLH): TP391
Fund: National Key R&D Program of China (2019YFB1406302); National Key R&D Program of China (2019YFB1406303)
Wang Yuan, Shi Kaize, Niu Zhendong. Position-Aware Stepwise Tagging Method for Triples Extraction of Entity-Relationship. Data Analysis and Knowledge Discovery, 2021, 5(10): 71-80.

The Extraction Process Based on Position-Aware Stepwise Tagging Method
The Structure of the Position-Aware Stepwise Tagging Method
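Symbol                              Description
$h_i$                               Original encoded sequence; the input sequence of the start-position tagger in the S extractor
$y_i^{\mathrm{start}}$              Label distribution of the i-th character in the start-position tagger
$y_i^{\mathrm{end}}$                Label distribution of the i-th character in the end-position tagger
$P(y_i^{\mathrm{start}})$           Label probability distribution of the i-th character in the start-position tagger
$P(y_i^{\mathrm{end}})$             Label probability distribution of the i-th character in the end-position tagger
$\mathrm{start\_tag}(x_i)$          Predicted label of the i-th character in the start-position tagger
$\mathrm{end\_tag}(x_i)$            Predicted label of the i-th character in the end-position tagger
$\hat{y}_i^{\mathrm{start}}$        True label of the i-th character in the start-position tagger
$\hat{y}_i^{\mathrm{end}}$          True label of the i-th character in the end-position tagger
$p_i^{s}$                           Relative position vector with respect to the start position
$h_i^{e}$                           Input sequence of the end-position tagger in the S extractor
$v^{s}$                             Average vector representation of all characters contained in S
$h_i^{\mathrm{sub}}$                Sequence obtained by adding $v^{s}$ to $h_i$: $h_i^{\mathrm{sub}} = \{x_1 + v^{s}, x_2 + v^{s}, \ldots, x_n + v^{s}\}$
$p_i^{\mathrm{sub}}$                Relative position vector with respect to S
$h_i'$                              Sequence obtained by concatenating $h_i^{\mathrm{sub}}$ and $p_i^{\mathrm{sub}}$: $h_i' = [h_i^{\mathrm{sub}}; p_i^{\mathrm{sub}}]$
$h_i^{o}$                           Input sequence of the start-position tagger in the OforR extractor
$h_i^{oe}$                          Input sequence of the end-position tagger in the OforR extractor
$P_i^{\mathrm{start}}$              Probability that the i-th character is a start position, in the start-position tagger
$P_i^{\mathrm{end}}$                Probability that the i-th character is an end position, in the end-position tagger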
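
The two constructions defined in the symbol table, adding the averaged subject vector to every position and then concatenating a subject-relative position vector, can be sketched in plain Python as follows. The table does not say how the relative position is computed, so the signed-distance scheme (0 inside the subject span), the clipping, and the lookup table `pos_table` are assumptions; the function names are hypothetical.

```python
from typing import List

Vector = List[float]


def mean_vector(rows: List[Vector]) -> Vector:
    """Average the given vectors column by column (v^s in the table)."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def relative_positions(n: int, s_start: int, s_end: int) -> List[int]:
    """Assumed scheme: signed distance to the subject span, 0 for positions inside it."""
    out = []
    for i in range(n):
        if i < s_start:
            out.append(i - s_start)
        elif i > s_end:
            out.append(i - s_end)
        else:
            out.append(0)
    return out


def fuse_subject(x: List[Vector], s_start: int, s_end: int,
                 pos_table: List[Vector]) -> List[Vector]:
    """Return h'_i = [x_i + v^s ; p_i^sub] for every position i."""
    v_s = mean_vector(x[s_start:s_end + 1])
    h_sub = [[a + b for a, b in zip(row, v_s)] for row in x]
    max_dist = (len(pos_table) - 1) // 2  # table rows cover -max_dist..max_dist
    fused = []
    for row, rel in zip(h_sub, relative_positions(len(x), s_start, s_end)):
        clipped = max(-max_dist, min(max_dist, rel))
        fused.append(row + pos_table[clipped + max_dist])  # list concatenation
    return fused


# Toy encoded sequence of four characters; the subject covers positions 1..2.
x = [[1.0, 0.0], [2.0, 2.0], [4.0, 0.0], [0.0, 1.0]]
pos_table = [[-1.0], [0.0], [1.0]]  # one-dimensional embeddings for distances -1, 0, 1
h_prime = fuse_subject(x, 1, 2, pos_table)
print(h_prime)
```

In a trained model the position vectors would be learned embeddings and the additions would run on tensors; lists are used here only to keep the sketch dependency-free.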
Experimental data    Total      Training set    Test set
Sentences            194,747    173,108         21,639
Instances            409,795    364,218         45,577
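No.    Example
1      {"text": "《上位》是2012年由中国书籍出版社出版的一部作品,作者是王清平。", "triple_list": [{"S": "上位", "R": "作者", "O": "王清平", "S_type": "图书作品", "O_type": "人物"}, {"S": "上位", "R": "出版社", "O": "中国书籍出版社", "S_type": "图书作品", "O_type": "出版社"}]}
       (English gloss: 《上位》 is a work published in 2012 by 中国书籍出版社 and written by 王清平. Relations: 作者 = author, 出版社 = publisher. Types: 图书作品 = book, 人物 = person, 出版社 = publisher.)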
Data Sample
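
As a small illustration of why this sample counts as an overlapping case, the sketch below parses the record and groups its triples by subject. The English key names `text` and `triple_list` and the variable names are choices made for this example; only the S/R/O fields and their values come from the sample.

```python
import json
from collections import defaultdict

RAW = """
{"text": "《上位》是2012年由中国书籍出版社出版的一部作品,作者是王清平。",
 "triple_list": [
   {"S": "上位", "R": "作者", "O": "王清平", "S_type": "图书作品", "O_type": "人物"},
   {"S": "上位", "R": "出版社", "O": "中国书籍出版社", "S_type": "图书作品", "O_type": "出版社"}
 ]}
"""

record = json.loads(RAW)
triples = [(t["S"], t["R"], t["O"]) for t in record["triple_list"]]

# Group (relation, object) pairs under their subject.
by_subject = defaultdict(list)
for s, r, o in triples:
    by_subject[s].append((r, o))

# Subjects that appear in more than one triple are the overlapping ones.
shared = {s: pairs for s, pairs in by_subject.items() if len(pairs) > 1}

# Every subject and object should be a literal span of the sentence.
all_in_text = all(s in record["text"] and o in record["text"] for s, _, o in triples)

print(shared)
print(all_in_text)
```

Here the subject 上位 is shared by two triples, so a tagger that assigns a single relation label per entity could not represent both.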
Model                                    Precision    Recall    F1
BERT-Based Multi-Head Selection          0.821        0.855     0.837
BERT+LSTM with Softmax Layer             0.837        0.863     0.850
BERT+MLP with Sigmoid Layer              0.855        0.878     0.866
ETL-Span                                 0.874        0.876     0.875
Position-Aware Stepwise Tagging Model    0.882        0.891     0.886
Results of Different Models
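
F1 is the harmonic mean of precision and recall, so the reported values can be cross-checked with a few lines of Python. Because the table gives precision and recall rounded to three decimals, a recomputed F1 may differ from the reported one in the last digit; the 0.001 tolerance and the variable names below are choices made for this check.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as listed in the results table.
reported = {
    "BERT-Based Multi-Head Selection": (0.821, 0.855, 0.837),
    "BERT+LSTM with Softmax Layer": (0.837, 0.863, 0.850),
    "BERT+MLP with Sigmoid Layer": (0.855, 0.878, 0.866),
    "ETL-Span": (0.874, 0.876, 0.875),
    "Position-Aware Stepwise Tagging Model": (0.882, 0.891, 0.886),
}

for name, (p, r, f1) in reported.items():
    print(f"{name}: recomputed F1 = {f1_score(p, r):.4f}, reported = {f1:.3f}")

# Largest disagreement between recomputed and reported F1.
max_gap = max(abs(f1_score(p, r) - f1) for p, r, f1 in reported.values())
best = max(reported, key=lambda name: reported[name][2])
```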
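Case 1
Text: 1948年张宗祜在北京大学毕业,即到甘肃兰州中国石油公司地质勘探处工作,在中国石油地质先驱者孙健初领导下从事石油地质工作。 (In 1948, Zhang Zonghu graduated from Peking University and went straight to work at the Geological Exploration Office of the China Petroleum Company in Lanzhou, Gansu, doing petroleum geology under Sun Jianchu, a pioneer of petroleum geology in China.)
True triples: (张宗祜, 毕业院校 [alma mater], 北京大学); (孙健, 国籍 [nationality], 中国)
Misjudged triples: (张宗祜, 国籍 [nationality], 中国)

Case 2
Text: 瑞安市华仪精密铸造厂座落在瑞安市安阳镇上望界路头村,紧靠飞云江三桥,离市区3公里,该厂创办于一九九零年,注册资本人民币40万元。 (瑞安市华仪精密铸造厂, a precision casting factory, is located in 上望界路头村, Anyang Town, Rui'an City, next to the third Feiyun River bridge and 3 km from the urban area; the factory was founded in 1990 with a registered capital of RMB 400,000.)
True triples: (瑞安市华仪精密铸造厂, 总部地点 [headquarters location], 瑞安市安阳镇上望界路头村)
Misjudged triples: (瑞安市华仪精密铸造厂, 总部地点 [headquarters location], 瑞安市安阳镇)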
Bad Case
Setting                                        Precision    Recall    F1
Without triple position embeddings             0.846        0.859     0.852
Without self-attention                         0.862        0.883     0.872
Entity type labels replaced with 1/0 labels    0.867        0.889     0.878
Position-Aware Stepwise Tagging Model          0.882        0.891     0.886
Results of Ablation Studies
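
One way to read the ablation table is as the F1 lost when each component is removed. The short sketch below computes those differences from the reported values; the setting names are English renderings and the ranking is only a restatement of the table, not an additional result.

```python
FULL_F1 = 0.886  # full Position-Aware Stepwise Tagging Model

# Reported F1 for each ablated variant.
ablations = {
    "Without triple position embeddings": 0.852,
    "Without self-attention": 0.872,
    "Entity type labels replaced with 1/0 labels": 0.878,
}

# F1 drop relative to the full model, rounded to the table's precision.
drops = {name: round(FULL_F1 - f1, 3) for name, f1 in ablations.items()}

# Largest drop first.
ranked = sorted(drops.items(), key=lambda item: item[1], reverse=True)
for name, drop in ranked:
    print(f"{name}: F1 drop = {drop:.3f}")
```

In this comparison, removing the position embeddings costs the most F1, followed by removing self-attention and then by collapsing the entity type labels to binary ones.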
