Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (11): 15-25    DOI: 10.11925/infotech.2096-3467.2020.0299
Automatically Identifying Hypernym-Hyponym Relations of Domain Concepts with Patterns and Projection Learning
Wang Sili1,2, Zhu Zhongming1,2, Yang Heng1, Liu Wei1
1Literature and Information Center of Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences, Lanzhou 730000, China
2University of Chinese Academy of Sciences, Beijing 100049, China
Abstract  

[Objective] This paper aims to automatically identify the hypernym-hyponym relations of domain concepts and to establish their ontology. [Methods] First, we combined the traditional unsupervised pattern-based method with the supervised projection learning method to automatically extract domain concepts. Then, we evaluated the combined method in an empirical study. [Results] The proposed method identified hypernym sets for domain concepts. Its identification accuracy was 0.88 in the medical domain, 0.83 in the general domain, and 0.85 on the BLESS benchmark dataset. [Limitations] Some relations are still misidentified. More research is needed to reduce the weight of high-frequency top-level words and to improve corpus quality. [Conclusions] The proposed model can find hypernyms that correspond to different senses of the same concept. It can also extract low-frequency words and named entities.

Key words: Hearst Pattern; Projection Learning; Word Embedding; Hypernym-Hyponym Relations; Domain Concept
Received: 09 April 2020      Published: 04 December 2020
ZTFLH:  TP391  
Corresponding Authors: Wang Sili     E-mail: wangsl@llas.ac.cn

Cite this article:

Wang Sili, Zhu Zhongming, Yang Heng, Liu Wei. Automatically Identifying Hypernym-Hyponym Relations of Domain Concepts with Patterns and Projection Learning. Data Analysis and Knowledge Discovery, 2020, 4(11): 15-25.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2020.0299     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2020/V4/I11/15

Framework of Automatic Recognition of Hypernym-Hyponym Relationship Based on Pattern and Projection Learning
English pattern | Chinese pattern
Y such as X | Y例如/比如X
Y other than X | 除了Y之外的X / Y不仅是X
Y including X | Y包含X
Y especially X | Y尤其/特别是X
not all Y are X | 不全是/并不是所有的Y都是X
Y like X | Y类似X
Y for example X | Y例如/比如/示例X
Y which includes X | Y是那些包含X
X are also Y | X也是Y
X are all Y | X都是Y
not Y so much as X | 没有Y而是X
Y is a X | Y是一种/个/只…X
Recognition Mode of Hypernym Based on Extended Hearst Pattern
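The pattern table can be read as a set of surface templates in which Y is the hypernym and X the hyponym. The following is a minimal Python sketch of matching a few of the English rows with regular expressions. It is not the authors' implementation: the names W, PATTERNS and extract_pairs are hypothetical, and each noun phrase is approximated by a single lowercase word, whereas a real system would rely on a chunker or parser.

```python
import re

# Assumption: a noun phrase is a single lowercase word. This keeps the
# sketch short; multi-word concepts need a chunker or parser.
W = r"[a-z]+"

# A few rows of the pattern table; Y is the hypernym, X the hyponym.
PATTERNS = [
    rf"(?P<y>{W}),? such as (?P<x>{W})",
    rf"(?P<y>{W}),? including (?P<x>{W})",
    rf"(?P<y>{W}),? especially (?P<x>{W})",
    rf"(?P<x>{W}) are also (?P<y>{W})",
    rf"(?P<x>{W}) are all (?P<y>{W})",
]
COMPILED = [re.compile(p) for p in PATTERNS]


def extract_pairs(sentence):
    """Return (hyponym, hypernym) pairs matched by any pattern."""
    text = sentence.lower()
    pairs = []
    for pattern in COMPILED:
        for match in pattern.finditer(text):
            pairs.append((match.group("x"), match.group("y")))
    return pairs


print(extract_pairs("Proteins such as thymosin are studied."))
print(extract_pairs("Aneurysms are also lesions."))
```

Patterns of this kind are precise but only fire when a pair co-occurs in one sentence, which is consistent with the lower scores of the pattern-only setting in the test table.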
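Method | Settings | Result (average precision, AP)
① Patterns | Extended Hearst patterns: distributional hypothesis + co-hyponym recognition patterns | General domain: 0.38; medical domain: 0.41
② Projection learning | Word2Vec with 100 dimensions, 10 training iterations, a single projection (1), no negative sampling, no subsampling of high-frequency words | General domain: 0.54; medical domain: 0.60
③ Projection learning | Word2Vec with 200 dimensions, 20 training iterations, multiple projections (24), negative sampling (15), high-frequency word subsampling threshold 1e-5 | General domain: 0.66; medical domain: 0.72
④ Patterns + projection learning | Extended Hearst patterns + 20 training iterations, Word2Vec with 200 dimensions, multiple projections (24), negative sampling (15), high-frequency word subsampling threshold 1e-5 | General domain: 0.83; medical domain: 0.88; BLESS evaluation set: 0.85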
Tests on Recognition of Hypernym-Hyponym Relationship
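The projection learning settings in the test table can be illustrated with a small sketch. In its simplest form, projection learning fits a matrix that maps the embedding of a hyponym close to the embedding of its hypernym, then ranks candidate hypernyms by cosine similarity to the projected vector. The Python code below covers only the single-projection case, trained by plain gradient steps on squared error. The 2-dimensional vectors, the function names and the hyperparameters are all illustrative assumptions; the multi-projection and negative-sampling settings from the table are not reproduced.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_projection(pairs, dim, lr=0.1, epochs=200):
    """Fit phi so that phi @ hyponym is close to hypernym (squared error)."""
    phi = [[0.0] * dim for _ in range(dim)]
    for _ in range(epochs):
        for x, y in pairs:
            err = [p - t for p, t in zip(matvec(phi, x), y)]
            for i in range(dim):
                for j in range(dim):
                    # Gradient of ||phi x - y||^2 w.r.t. phi[i][j].
                    phi[i][j] -= lr * 2.0 * err[i] * x[j]
    return phi


def rank_hypernyms(phi, hyponym_vec, candidates):
    """Rank candidate hypernyms by similarity to the projected hyponym."""
    projected = matvec(phi, hyponym_vec)
    scored = [(cosine(projected, vec), name) for name, vec in candidates.items()]
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical toy embeddings, not taken from the paper.
CANDIDATES = {"animal": [0.0, 1.0], "flower": [1.0, 0.0]}
TRAIN_PAIRS = [
    ([1.0, 0.1], CANDIDATES["animal"]),   # dog
    ([0.9, 0.2], CANDIDATES["animal"]),   # cat
    ([0.1, 1.0], CANDIDATES["flower"]),   # rose
    ([0.2, 0.9], CANDIDATES["flower"]),   # tulip
]
PHI = train_projection(TRAIN_PAIRS, dim=2)

# An unseen hyponym whose vector lies near "dog" and "cat".
print(rank_hypernyms(PHI, [0.95, 0.15], CANDIDATES))
```

Because the projection operates on embeddings rather than on sentence co-occurrence, it can score pairs that never appear together in a pattern, which is one plausible reason the two approaches complement each other.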
Medical-domain concept | Hypernym set (Top 5)
Aneurysm | procedure; clinical finding; soft tissue lesion; anatomical structure; disease
Diagnostic lumbar puncture | clinical finding; disease; procedure; sickness; illness
Vertebra | body region; bone; body structure; fracture; anatomical structure
Thymosin | protein; biopolymer; enzyme; hydrolase; lyase
Pain assessment | pain; sickness; disease; illness; practice of medicine
Recognition Results of Hypernym-Hyponym Relationship in Medical Field
General-domain concept | Hypernym set (Top 5)
Miscreant | person; bad person; wrongdoer; actor; politician
Queen Elizabeth | person; king; monarch; aristocrat; patrician
Microcontroller | electronic circuit; circuitry; pc board; computer chip; electrical device
Business concern (a commercial firm / a matter of business interest) | corporation; business organization; government agency; business firm; written agreement
Vegetarian (a person who eats no meat / meat-free) | dessert; dish; recipe; food product; person
Recognition Results of Hypernym-Hyponym Relationship in General Field
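The scores in the test table are reported as average precision (AP) over ranked hypernym lists such as the Top 5 sets above. This page does not give the exact formula used, so the following Python sketch assumes the common definition: the mean of precision-at-k over the ranks where a correct hypernym appears, normalised by the number of gold hypernyms. The function name and the gold set in the example are hypothetical and are not the paper's annotations.

```python
def average_precision(ranked, gold):
    """AP of a ranked candidate list against a set of gold hypernyms."""
    hits = 0
    precision_sum = 0.0
    for k, candidate in enumerate(ranked, start=1):
        if candidate in gold:
            hits += 1
            precision_sum += hits / k  # precision at rank k
    return precision_sum / len(gold) if gold else 0.0


# Illustrative only: a short ranked list with an assumed gold set.
ranked = ["bone", "fracture", "body structure"]
gold = {"bone", "body structure"}
print(round(average_precision(ranked, gold), 4))  # (1/1 + 2/3) / 2 = 0.8333
```

Under this definition a correct hypernym ranked below an incorrect one lowers the score, so AP rewards both finding the right hypernyms and ordering them first.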