Data Analysis and Knowledge Discovery, 2017, Vol. 1, Issue 9: 8-15. DOI: 10.11925/infotech.2096-3467.2017.09.01
Original Article
Extracting Entity Relationship with Word Embedding Representation Features
Zhang Qin1,2, Guo Hongmei1, Zhang Zhixiong1,3
1National Science Library, Chinese Academy of Sciences, Beijing 100190, China
2University of Chinese Academy of Sciences, Beijing 100049, China
3Wuhan Documentation and Information Center, Chinese Academy of Sciences, Wuhan 430071, China
Abstract  

[Objective] This study explores word embedding representation features for entity relationship extraction, aiming to add semantic information to existing methods. [Methods] First, we extracted relations with Naive Bayes, Decision Tree and Random Forest models, using features at three levels: word embedding representation, lexical and grammatical. Then, we obtained the optimal subset of the full feature set. [Results] With the full feature set, the Decision Tree model performed best, with a precision of 0.48. The F1 score for Member-Collection(E2, E1) was 0.70, and dependency features helped to extract the relations. [Limitations] Extraction results for relation types with small sample sizes and complex contexts still need improvement, and the word vector training method could be further optimized. [Conclusions] This study demonstrates the effectiveness of all three types of features, with the word embedding representation features playing an important role in relation extraction.
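The abstract names word-embedding-level features but does not define them here; the feature ranking lists WE1, WE2, WE12, D(E1, E2) and S(E1, E2). As a minimal sketch, assume (hypothetically) that WE1 and WE2 are the vectors of the two entity head words, D(E1, E2) their Euclidean distance and S(E1, E2) their cosine similarity. The `embedding_features` helper, its feature naming and the toy 3-dimensional vectors are illustrative stand-ins for a trained Word2Vec model, not the authors' code.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def embedding_features(vectors, e1_head, e2_head):
    """Hypothetical embedding-level features for one entity pair.

    vectors: dict mapping a word to its embedding (e.g. exported from a Word2Vec model).
    The meaning assigned to WE1, WE2, D and S here is an assumption.
    """
    v1, v2 = vectors[e1_head], vectors[e2_head]
    feats = {}
    # WE1 / WE2: one feature per embedding dimension (assumed layout)
    feats.update({f"WE1_{i}": x for i, x in enumerate(v1)})
    feats.update({f"WE2_{i}": x for i, x in enumerate(v2)})
    feats["D(E1, E2)"] = euclidean_distance(v1, v2)
    feats["S(E1, E2)"] = cosine_similarity(v1, v2)
    return feats


# Toy vectors standing in for real Word2Vec output (hypothetical values).
toy_vectors = {"pollen": [1.0, 0.0, 0.0], "flower": [0.0, 1.0, 0.0]}
feats = embedding_features(toy_vectors, "pollen", "flower")
```

With these orthogonal toy vectors, S(E1, E2) is 0 and D(E1, E2) is the square root of 2.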

Key words: Relation Extraction; Word Embedding Representation; Word2Vec
Received: 15 June 2017; Published: 18 October 2017
CLC number: TP393

Cite this article:

Zhang Qin,Guo Hongmei,Zhang Zhixiong. Extracting Entity Relationship with Word Embedding Representation Features. Data Analysis and Knowledge Discovery, 2017, 1(9): 8-15.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2017.09.01     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2017/V1/I9/8

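Feature category | Feature | Description
Lexical | HE1 | First word of entity E1
Lexical | HE2 | First word of entity E2
Lexical | BNULL | 1 if there is no word between the two entities, otherwise -1
Lexical | BO | The word between the entities if there is exactly one, otherwise -1
Lexical | BF | First word between the entities, when there are at least two
Lexical | BL | Last word between the entities, when there are at least two
Lexical | E1F | First word before entity E1
Lexical | E1S | Second word before entity E1
Lexical | E2F | First word after entity E2
Lexical | E2S | Second word after entity E2
Type | E1T | Type of entity E1
Type | E2T | Type of entity E2
Count | BE | Number of entities between the two entities
Count | BW | Number of words between the two entities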
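The lexical, type and count features above can be computed directly from a tokenized sentence. The sketch below is a minimal illustration, assuming entities are given as token spans with E1 before E2; the `Span` class, the `lexical_features` helper and the "NOUN" type labels are hypothetical. The table does not say what BF, BL or the context-word features take when the word does not exist, so this sketch assumes -1, mirroring BNULL and BO.

```python
from dataclasses import dataclass

MISSING = -1  # assumed value when a word does not exist (mirrors BNULL/BO)


@dataclass
class Span:
    start: int  # index of the entity's first token
    end: int    # index one past the entity's last token
    etype: str  # entity type label (hypothetical scheme)


def lexical_features(tokens, e1, e2, other_entities=()):
    """Lexical, type and count features for an entity pair; assumes E1 precedes E2."""
    between = tokens[e1.end:e2.start]

    def word_at(i):
        # Token at position i, or MISSING when i falls outside the sentence.
        return tokens[i] if 0 <= i < len(tokens) else MISSING

    return {
        "HE1": tokens[e1.start],
        "HE2": tokens[e2.start],
        "BNULL": 1 if not between else -1,
        "BO": between[0] if len(between) == 1 else -1,
        "BF": between[0] if len(between) >= 2 else MISSING,
        "BL": between[-1] if len(between) >= 2 else MISSING,
        "E1F": word_at(e1.start - 1),
        "E1S": word_at(e1.start - 2),
        "E2F": word_at(e2.end),
        "E2S": word_at(e2.end + 1),
        "E1T": e1.etype,
        "E2T": e2.etype,
        # entities lying entirely between E1 and E2
        "BE": sum(1 for s in other_entities if s.start >= e1.end and s.end <= e2.start),
        "BW": len(between),
    }


tokens = "the pollen of the flower was collected".split()
feats = lexical_features(tokens, Span(1, 2, "NOUN"), Span(4, 5, "NOUN"))
```

Here "of the" lies between the entities, so BNULL and BO are -1, BF is "of", BL is "the" and BW is 2; E1S is -1 because "pollen" is only one word into the sentence.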
No. | Relation type | Training samples | Test samples | Total samples | Share (%)
1 | Component-Whole(E2, E1) | 472 | 150 | 622 | 5.80
2 | Component-Whole(E1, E2) | 470 | 162 | 632 | 5.90
3 | Member-Collection(E2, E1) | 612 | 201 | 813 | 7.59
4 | Member-Collection(E1, E2) | 78 | 32 | 110 | 1.03
5 | Entity-Origin(E1, E2) | 568 | 211 | 779 | 7.27
6 | Entity-Origin(E2, E1) | 148 | 47 | 195 | 1.82
7 | Entity-Destination(E2, E1) | 1 | 1 | 2 | 0.02
8 | Entity-Destination(E1, E2) | 844 | 291 | 1135 | 10.59
9 | Product-Producer(E1, E2) | 323 | 108 | 431 | 4.02
10 | Product-Producer(E2, E1) | 396 | 123 | 519 | 4.84
11 | Message-Topic(E2, E1) | 144 | 51 | 195 | 1.82
12 | Message-Topic(E1, E2) | 490 | 210 | 700 | 6.53
13 | Content-Container(E2, E1) | 166 | 39 | 205 | 1.91
14 | Content-Container(E1, E2) | 374 | 153 | 527 | 4.92
15 | Instrument-Agency(E1, E2) | 97 | 22 | 119 | 1.11
16 | Instrument-Agency(E2, E1) | 407 | 134 | 541 | 5.05
17 | Cause-Effect(E1, E2) | 344 | 134 | 478 | 4.46
18 | Cause-Effect(E2, E1) | 659 | 194 | 853 | 7.96
19 | Other | 1407 | 454 | 1861 | 17.36
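The share column is each relation type's total count divided by the corpus total. A quick consistency check, assuming the shares are computed over the combined training and test sets (an inference the numbers bear out):

```python
# (relation type, training samples, test samples, share % as printed in the table)
ROWS = [
    ("Component-Whole(E2, E1)", 472, 150, 5.80),
    ("Component-Whole(E1, E2)", 470, 162, 5.90),
    ("Member-Collection(E2, E1)", 612, 201, 7.59),
    ("Member-Collection(E1, E2)", 78, 32, 1.03),
    ("Entity-Origin(E1, E2)", 568, 211, 7.27),
    ("Entity-Origin(E2, E1)", 148, 47, 1.82),
    ("Entity-Destination(E2, E1)", 1, 1, 0.02),
    ("Entity-Destination(E1, E2)", 844, 291, 10.59),
    ("Product-Producer(E1, E2)", 323, 108, 4.02),
    ("Product-Producer(E2, E1)", 396, 123, 4.84),
    ("Message-Topic(E2, E1)", 144, 51, 1.82),
    ("Message-Topic(E1, E2)", 490, 210, 6.53),
    ("Content-Container(E2, E1)", 166, 39, 1.91),
    ("Content-Container(E1, E2)", 374, 153, 4.92),
    ("Instrument-Agency(E1, E2)", 97, 22, 1.11),
    ("Instrument-Agency(E2, E1)", 407, 134, 5.05),
    ("Cause-Effect(E1, E2)", 344, 134, 4.46),
    ("Cause-Effect(E2, E1)", 659, 194, 7.96),
    ("Other", 1407, 454, 17.36),
]

train_total = sum(train for _, train, _, _ in ROWS)
test_total = sum(test for _, _, test, _ in ROWS)
grand_total = train_total + test_total

# Share of each relation type in the whole corpus, rounded to two decimals like the table.
shares = {name: round(100 * (train + test) / grand_total, 2) for name, train, test, _ in ROWS}
mismatches = [name for name, _, _, printed in ROWS if abs(shares[name] - printed) > 1e-9]

print(train_total, test_total, grand_total)  # 8000 2717 10717
print(mismatches)  # []
```

Every printed share matches. The counts also make the class imbalance concrete: Entity-Destination(E2, E1) has a single test instance, while Other alone accounts for 17.36% of the data.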
Classifier | P | R | F1
Naive Bayes | 0.21 | 0.21 | 0.15
Decision Tree | 0.48 | 0.47 | 0.47
Random Forest | 0.45 | 0.45 | 0.44
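In the Naive Bayes row, F1 (0.15) is lower than both P and R (0.21). That is possible when the reported F1 is an average of per-class F1 scores rather than the harmonic mean of the averaged P and R; the section does not state which averaging was used. The toy example below, with two hypothetical classes A and B, shows macro-averaged P = R = 0.6 while the macro-averaged F1 is only about 0.33.

```python
from collections import Counter


def per_class_prf(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # label p was predicted wrongly
            fn[t] += 1  # true label t was missed
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[c] = (prec, rec, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of per-class P, R and F1."""
    k = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / k for i in range(3))


# Hypothetical labels: A is predicted precisely but rarely, B far too often.
y_true = ["A"] * 5 + ["B"]
y_pred = ["A"] + ["B"] * 4 + ["B"]
macro_p, macro_r, macro_f1 = macro_average(per_class_prf(y_true, y_pred))
harmonic = 2 * macro_p * macro_r / (macro_p + macro_r)  # 0.6, unlike macro_f1
```

Class A gets P = 1.0, R = 0.2 and class B the reverse; both have F1 = 1/3, so averaging per-class F1 yields a value well below the averaged P and R.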
Relation type no. | P | R | F1
1 | 0.35 | 0.30 | 0.32
2 | 0.51 | 0.46 | 0.49
3 | 0.67 | 0.73 | 0.70
4 | 0.43 | 0.31 | 0.36
5 | 0.69 | 0.49 | 0.57
6 | 0.38 | 0.30 | 0.33
7 | 0.00 | 0.00 | 0.00
8 | 0.67 | 0.65 | 0.66
9 | 0.42 | 0.42 | 0.42
10 | 0.30 | 0.30 | 0.30
11 | 0.20 | 0.20 | 0.20
12 | 0.39 | 0.40 | 0.39
13 | 0.61 | 0.64 | 0.62
14 | 0.61 | 0.56 | 0.58
15 | 0.07 | 0.14 | 0.09
16 | 0.28 | 0.30 | 0.29
17 | 0.62 | 0.61 | 0.61
18 | 0.61 | 0.68 | 0.65
19 | 0.28 | 0.31 | 0.29
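The per-class table does not say which classifier it belongs to. Weighting each row by its test-set count from the sample-distribution table reproduces the Decision Tree row of the full-feature results (0.48 / 0.47 / 0.47), which suggests, though the section does not state it, that these are Decision Tree results and that the summary figures are support-weighted averages. An unweighted (macro) mean of the same rows comes out noticeably lower.

```python
# Per-class (P, R, F1) copied from the table, relation types 1..19 in order.
PER_CLASS = [
    (0.35, 0.30, 0.32), (0.51, 0.46, 0.49), (0.67, 0.73, 0.70), (0.43, 0.31, 0.36),
    (0.69, 0.49, 0.57), (0.38, 0.30, 0.33), (0.00, 0.00, 0.00), (0.67, 0.65, 0.66),
    (0.42, 0.42, 0.42), (0.30, 0.30, 0.30), (0.20, 0.20, 0.20), (0.39, 0.40, 0.39),
    (0.61, 0.64, 0.62), (0.61, 0.56, 0.58), (0.07, 0.14, 0.09), (0.28, 0.30, 0.29),
    (0.62, 0.61, 0.61), (0.61, 0.68, 0.65), (0.28, 0.31, 0.29),
]
# Test-set sample counts for relation types 1..19 (sample-distribution table).
TEST_SUPPORT = [150, 162, 201, 32, 211, 47, 1, 291, 108, 123,
                51, 210, 39, 153, 22, 134, 134, 194, 454]


def weighted_average(rows, weights):
    """Mean of each metric column, weighting every class by its support."""
    total = sum(weights)
    return tuple(sum(r[i] * w for r, w in zip(rows, weights)) / total for i in range(3))


def unweighted_average(rows):
    """Plain mean of each metric column over all classes."""
    return tuple(sum(r[i] for r in rows) / len(rows) for i in range(3))


weighted = weighted_average(PER_CLASS, TEST_SUPPORT)
macro = unweighted_average(PER_CLASS)
print("weighted P/R/F1:", [round(x, 2) for x in weighted])  # [0.48, 0.47, 0.47]
print("macro    P/R/F1:", [round(x, 2) for x in macro])     # [0.43, 0.41, 0.41]
```

Because the large classes (Other, Entity-Destination(E1, E2)) dominate the weighted mean, weak results on rare types such as 7 and 15 barely move the summary figures.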
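Rank | Feature | Score | Feature type
1 | DE2 | 0.0178 | Grammatical
2 | HE1 | 0.0152 | Lexical
3 | HE2 | 0.0104 | Lexical
4 | BNULL | 0.0081 | Lexical
5 | R2 | 0.0078 | Grammatical
6 | BW | 0.0056 | Lexical
7 | DE1 | 0.0053 | Grammatical
8 | BL | 0.0051 | Lexical
9 | BF | 0.0049 | Lexical
10 | WE1 | 0.0045 | Word embedding
11 | POS2 | 0.0040 | Grammatical
12 | R1 | 0.0037 | Grammatical
13 | POS1 | 0.0031 | Grammatical
14 | POSD2 | 0.0031 | Grammatical
15 | D(E1, E2) | 0.0030 | Word embedding
16 | WE2 | 0.0027 | Word embedding
17 | POSD1 | 0.0023 | Grammatical
18 | E2S | 0.0022 | Lexical
19 | WE12 | 0.0015 | Word embedding
20 | E1F | 0.0012 | Lexical
21 | E2F | 0.0010 | Lexical
22 | E2T | 0.0009 | Lexical
23 | E1T | 0.0003 | Lexical
24 | BE | 0.0002 | Lexical
25 | BO | -0.0008 | Lexical
26 | S(E1, E2) | -0.0009 | Word embedding
27 | E1S | -0.0032 | Lexical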
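Some ranking scores are negative. One family of estimators that produces such signed scores is Relief (extended to ReliefF by Kononenko, 1994), where a feature loses weight when it differs between an instance and its nearest same-class neighbour and gains weight when it differs from its nearest other-class neighbour. The section does not state how the scores were computed, so the following is only a minimal sketch of basic two-class Relief for categorical features on a hypothetical toy dataset, not the authors' procedure.

```python
def hamming(u, v):
    """Number of categorical features on which two instances differ."""
    return sum(a != b for a, b in zip(u, v))


def relief(X, y):
    """Basic Relief weights for categorical features and two-class labels.

    Every instance is used once (no random sampling), so the result is
    deterministic. Each class needs at least two instances.
    """
    n, n_feats = len(X), len(X[0])
    weights = [0.0] * n_feats
    for i, (xi, yi) in enumerate(zip(X, y)):
        others = [j for j in range(n) if j != i]
        hits = [j for j in others if y[j] == yi]
        misses = [j for j in others if y[j] != yi]
        near_hit = min(hits, key=lambda j: hamming(xi, X[j]))
        near_miss = min(misses, key=lambda j: hamming(xi, X[j]))
        for f in range(n_feats):
            # Differing from the nearest hit is penalised, from the nearest miss rewarded.
            weights[f] -= (xi[f] != X[near_hit][f]) / n
            weights[f] += (xi[f] != X[near_miss][f]) / n
    return weights


# Hypothetical data: feature 0 determines the class, feature 1 is noise.
X = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
y = [0, 0, 1, 1]
w = relief(X, y)  # [1.0, -1.0]
```

The informative feature ends with a positive weight and the noise feature with a negative one, which is how a Relief-style score can fall below zero.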
Classifier | P | R | F1
Naive Bayes | 0.16 | 0.16 | 0.13
Decision Tree | 0.44 | 0.43 | 0.43
Random Forest | 0.38 | 0.38 | 0.37
  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn