Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (10): 60-70    DOI: 10.11925/infotech.2096-3467.2020.1261
Extracting Hypernym-Hyponym Relationship for Financial Market Applications
Dai Zhihong1, Hao Xiaoling1,2
1School of Information Management & Engineering, Shanghai University of Finance and Economics, Shanghai 200433, China
2Shanghai Key Laboratory of Financial Information Technology, Shanghai University of Finance and Economics, Shanghai 200433, China
Abstract  

[Objective] This paper proposes a new method for extracting hypernym-hyponym relationships for knowledge graphs and examines its effectiveness in a practical application. [Methods] First, we constructed a mapping matrix between hypernym-hyponym word pairs and their contextual semantics. Then, we combined word vector similarity with the matrix to extract the relationships. [Results] On datasets of listed companies, the F1 value of our method was more than 3% higher than those of existing methods. The new model can also help study the association between company similarity and stock performance. [Limitations] More research is needed to improve relationship extraction with clustering techniques and pattern matching methods. [Conclusions] The proposed method effectively identifies hypernym-hyponym relationships between entities and supports the study of related listed companies and their stocks. It also helps construct better knowledge graphs in the financial field.

Key words: Hypernym-Hyponym Relationship; Relationship Extraction; Word Vector; Stock Linkage
Received: 16 December 2020      Published: 27 August 2021
ZTFLH:  TP391  
Fund: National Social Science Foundation of China (20BGL287); National Natural Science Fund of China (71401096)
Corresponding Author: Dai Zhihong, ORCID: 0000-0002-3890-115X, E-mail: daizhihong@189.cn

Cite this article:

Dai Zhihong, Hao Xiaoling. Extracting Hypernym-Hyponym Relationship for Financial Market Applications. Data Analysis and Knowledge Discovery, 2021, 5(10): 60-70.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2020.1261     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2021/V5/I10/60

No.   Example
1     v(…) - v(prawn)             v(…) - v(goldfish)
2     v(worker) - v(carpenter)    v(actor) - v(clown)
3     v(worker) - v(carpenter)    v(…) - v(goldfish)
Results of Vector Shift for Hypernym-Hyponym Word Pairs
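The idea behind the vector-shift table can be illustrated with a small sketch. The three-dimensional embeddings below are invented purely for illustration (they are not taken from the paper), and "animal" is a stand-in hypernym chosen for this example. The sketch only shows the general pattern such tables are meant to convey: offsets between hypernym and hyponym vectors tend to be similar within one semantic category and dissimilar across categories.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings, invented for illustration only (assumption, not real data).
vec = {
    "worker":    np.array([1.0, 0.0, 0.2]),
    "carpenter": np.array([1.0, 1.0, 0.2]),
    "actor":     np.array([0.8, 0.1, 0.9]),
    "clown":     np.array([0.8, 1.1, 0.9]),
    "animal":    np.array([0.0, 0.2, 1.0]),  # hypothetical hypernym
    "goldfish":  np.array([1.0, 0.2, 1.0]),
}

def offset(hypernym, hyponym):
    """Vector shift v(hypernym) - v(hyponym)."""
    return vec[hypernym] - vec[hyponym]

occupation_a = offset("worker", "carpenter")
occupation_b = offset("actor", "clown")
animal_shift = offset("animal", "goldfish")

same_category = cosine(occupation_a, occupation_b)    # both occupation pairs
cross_category = cosine(occupation_a, animal_shift)   # occupation vs. animal pair

print(f"same category:  {same_category:.3f}")
print(f"cross category: {cross_category:.3f}")
```

With real embeddings the contrast is less clean than in this toy setup, which is one motivation for learning separate mappings per cluster of word pairs rather than a single global offset.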
Hypernym-Hyponym Extraction Method Based on Matrix Mapping and Word Vector
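As a rough illustration of the matrix-mapping idea named in the caption above, the following sketch fits a linear map from hyponym vectors to hypernym vectors by least squares and then measures how far a candidate pair deviates from that map. Everything here is an assumption made for the example: the synthetic data, the direction of the mapping (hyponym to hypernym), the use of ordinary least squares, and the names `phi` and `deviation`. It is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_pairs = 4, 50

# Synthetic training pairs (assumption): hypernym = true_phi @ hyponym + small noise.
true_phi = rng.normal(size=(dim, dim))
X = rng.normal(size=(n_pairs, dim))                              # hyponym vectors, one per row
Y = X @ true_phi.T + 0.01 * rng.normal(size=(n_pairs, dim))      # hypernym vectors

# Least squares: find W minimizing ||X W - Y||, so that phi = W.T maps x to y.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
phi = W.T

def deviation(phi, hyponym_vec, hypernym_vec):
    """Euclidean distance between the mapped hyponym and the candidate hypernym."""
    return float(np.linalg.norm(phi @ hyponym_vec - hypernym_vec))

d_true = deviation(phi, X[0], Y[0])                      # a genuine training pair
d_random = deviation(phi, X[0], rng.normal(size=dim))    # an unrelated random vector

print(f"deviation for a true pair:   {d_true:.4f}")
print(f"deviation for a random pair: {d_random:.4f}")
```

A small deviation suggests the pair is consistent with the learned mapping. A rule that uses deviation alone can be combined with a word vector similarity check, which is the combination the abstract describes.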
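Word                           Cosine similarity    Word                                                           Cosine similarity
Shanghai (上海)                1.0000               Shanghai University of Finance and Economics (上海财经大学)    1.0000
Hangzhou (杭州)                0.6770               University of Finance and Economics (财经大学)                 0.5443
Shanghai Municipality (上海市) 0.6766               Fudan University (复旦大学)                                    0.5407
Suzhou (苏州)                  0.6454               Shanghai Jiao Tong University (上海交通大学)                   0.5177
Ningbo (宁波)                  0.6240               East China Normal University (华东师范大学)                    0.5110
Tianjin (天津)                 0.6166               Tsinghua University (清华大学)                                 0.5083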
Top Five Closest Words in the Word Vector Space
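A nearest-neighbor lookup of this kind can be sketched as follows. The vocabulary and the three-dimensional vectors are made up for illustration, and `top_k_similar` is a hypothetical helper name; a real run would load trained word vectors instead. As in the table, the query word itself comes back first with a similarity of 1.

```python
import numpy as np

def top_k_similar(query, vocab, vectors, k=5):
    """Return the k words with the highest cosine similarity to `query`."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit @ unit[vocab.index(query)]
    order = np.argsort(-sims)[:k]          # indices sorted by descending similarity
    return [(vocab[i], float(sims[i])) for i in order]

# Toy vocabulary and embeddings, invented for illustration only.
vocab = ["Shanghai", "Hangzhou", "Suzhou", "Fudan University"]
vectors = np.array([
    [1.0, 0.1, 0.0],
    [0.9, 0.3, 0.1],
    [0.8, 0.4, 0.2],
    [0.1, 0.2, 1.0],
])

neighbors = top_k_similar("Shanghai", vocab, vectors, k=3)
for word, sim in neighbors:
    print(f"{word}\t{sim:.4f}")
```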
Influence of Lower Bound of Similarity and Number of Clusters on the F1 Value
Parameter                                            Value
δ1                                                   0.5
δ2                                                   6
Subdivision intervals and similarity lower bounds    1.5 < d ≤ 2: 0.15/0.2
                                                     2 < d ≤ 4: 0.25/0.3
                                                     4 < d ≤ 6: 0.35/0.4
Number of clusters K                                 60
Related Parameters
Metric       δ + Simi    Mapping deviation δ    Word vector similarity Simi
Precision    0.8354      0.6436                 0.7167
Recall       0.8347      0.8152                 0.8054
F1           0.8350      0.7193                 0.7584
Test Set Judgment Results
Method                 Precision    Recall    F1
Hearst[4]              0.9747       0.2141    0.3511
Snow[8]                0.6088       0.2567    0.3611
Suchanek et al.[9]     0.9241       0.6061    0.7320
Fu et al.[21]          0.7978       0.8081    0.8029
This paper             0.8354       0.8347    0.8350
Comparison of the Method in This Paper with Previous Research Methods
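The F1 values in the comparison table can be checked against the reported precision and recall using the standard definition, F1 = 2PR / (P + R). The short sketch below recomputes them; the dictionary simply restates the figures from the table, and `f1_score` is a helper written for this example.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) as listed in the comparison table.
reported = {
    "Hearst":          (0.9747, 0.2141, 0.3511),
    "Snow":            (0.6088, 0.2567, 0.3611),
    "Suchanek et al.": (0.9241, 0.6061, 0.7320),
    "Fu et al.":       (0.7978, 0.8081, 0.8029),
    "This paper":      (0.8354, 0.8347, 0.8350),
}

recomputed = {name: f1_score(p, r) for name, (p, r, _) in reported.items()}
for name, (p, r, f1) in reported.items():
    print(f"{name}: recomputed F1 = {recomputed[name]:.4f}, reported F1 = {f1:.4f}")
```

The recomputed values agree with the reported ones to within rounding error (about 0.0001). The table also shows the trade-off the pattern-based methods make: very high precision but low recall, which pulls their F1 down.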
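Classification    Main business segment         Revenue (100 million yuan)
By industry       Power generation              960.31
                  Railway                       364.32
                  Coal chemicals                185.06
                  Port                          29.97
                  Shipping                      14.48
                  Unallocated items             11.89
                  Inter-segment eliminations    -388.81
By product        Coal revenue                  755.15
                  Power generation revenue      358.86
                  Other revenue                 34.50
                  Transportation revenue        29.50
                  Coal chemical revenue         27.17
Main Business Composition of China Shenhua by Industry and Product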
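Company                                 Industry                                Product
iFLYTEK (科大讯飞)                      Information technology, education       Information engineering, educational software, human-computer interaction
NavInfo (四维图新)                      Navigation, chips, applications         Navigation, chips, applications
Hengyuan Coal & Electricity (恒源煤电)  Industry                                Coal, electricity
Sanyou Chemical (三友化工)              Chemicals, electricity, mining          Staple fiber, soda ash, polyvinyl chloride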
List of Extracted Industry and Product Words
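Entity words for Internet products and services
Network services      Streaming media
E-commerce            Online games
Network security      Mobile games
Instant messaging     Email
Online marketing      Antivirus
Logistics             Firewall
Supply chain          Esports (电竞)
Express delivery      Chat tools
Browser               Computer games
Search engine         Electronic sports (电子竞技)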
Extracted Entity Words in Internet Domain
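Hyponym              Hypernym            Hyponym              Hypernym
E-commerce           Network services    Mobile games         Online games
Network security     Network services    Antivirus            Network security
Instant messaging    Network services    Firewall             Network security
Logistics            E-commerce          Network security     Network services
Supply chain         E-commerce          Email                Instant messaging
Streaming media      Network services    Computer games       Online games
Search engine        Network services    Electronic sports    Online games
Browser              Network services    Chat tools           Instant messaging
Online games         Network services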
Extracted Hypernym-Hyponym Word Pairs
Hierarchical Structure of Internet Products and Services
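Once hypernym-hyponym pairs have been extracted, they can be assembled into a hierarchy. The sketch below does this for a subset of the extracted pairs, written in English; `build_tree` and `render` are hypothetical helper names, and the code assumes the pairs form a tree without cycles.

```python
from collections import defaultdict

# (hyponym, hypernym) pairs, a subset of the extracted word pairs.
pairs = [
    ("e-commerce", "network services"),
    ("network security", "network services"),
    ("online games", "network services"),
    ("logistics", "e-commerce"),
    ("supply chain", "e-commerce"),
    ("antivirus", "network security"),
    ("firewall", "network security"),
    ("mobile games", "online games"),
]

def build_tree(pairs):
    """Group hyponyms under their hypernym and find the root concepts."""
    children = defaultdict(list)
    hyponyms = set()
    for hyponym, hypernym in pairs:
        children[hypernym].append(hyponym)
        hyponyms.add(hyponym)
    roots = [node for node in children if node not in hyponyms]
    return roots, children

def render(node, children, depth=0):
    """Return the subtree under `node` as indented lines (assumes no cycles)."""
    lines = ["  " * depth + node]
    for child in children.get(node, []):
        lines.extend(render(child, children, depth + 1))
    return lines

roots, children = build_tree(pairs)
tree_lines = [line for root in roots for line in render(root, children)]
print("\n".join(tree_lines))
```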
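E-commerce sector stock        Logistics sector stock          Daily return correlation coefficient    Daily excess return correlation coefficient
Suning Commerce (苏宁云商)     SF Holding (顺丰控股)           0.5915                                  0.4182
Focus Technology (焦点科技)    Feima International (飞马国际)  0.2612                                  0.1928
Zhongying Internet (众应互联)  Yunda Holding (韵达股份)        0.2217                                  0.1632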
Correlation Coefficients of Daily Return and Daily Excess Return of Three Groups of Stocks
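Correlation coefficients like those in the table can be computed from price series as sketched below. The prices are synthetic, and the definition of excess return used here (stock return minus the return of a market index) is an assumption for the example; this page does not state which benchmark the authors used.

```python
import numpy as np

def daily_returns(prices):
    """Simple daily returns from a sequence of closing prices."""
    prices = np.asarray(prices, dtype=float)
    return prices[1:] / prices[:-1] - 1.0

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    return float(np.corrcoef(x, y)[0, 1])

# Synthetic closing prices, invented for illustration only.
stock_a = [10.0, 10.2, 10.1, 10.4, 10.3, 10.6]
stock_b = [20.0, 20.3, 20.2, 20.5, 20.6, 20.9]
market = [100.0, 100.5, 100.4, 100.9, 101.0, 101.4]

ret_a, ret_b, ret_m = (daily_returns(p) for p in (stock_a, stock_b, market))

corr_return = pearson(ret_a, ret_b)
# Assumed definition: excess return = stock return - market index return.
corr_excess = pearson(ret_a - ret_m, ret_b - ret_m)

print(f"daily return correlation:        {corr_return:.4f}")
print(f"daily excess return correlation: {corr_excess:.4f}")
```

Subtracting the market return removes co-movement that both stocks share with the market as a whole, which is consistent with the excess-return correlations in the table being lower than the raw-return correlations.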
[1] Chen Jindong, Xiao Yanghua. Hypernymy Relation Extraction Based on Semantics[J]. Computer Applications and Software, 2019, 36(2): 216-221.
[2] Zhong Maosheng, Liu Hui, Liu Lei. Method of Semantic Relevance Relation Measurement Between Words[J]. Journal of Chinese Information Processing, 2009, 32(2): 115-122.
[3] Qiu Keda, Ma Jianling. A Review of Hypernym Relation Recognition from Text Corpora[J]. Information Science, 2020, 38(7): 162-172.
[4] Hearst M A. Automatic Acquisition of Hyponyms from Large Text Corpora [C]//Proceedings of the 14th Conference on Computational Linguistics - Volume 2. 1992: 539-545.
[5] Tang Qing, Lv Xueqiang, Li Zhuo. Research on Domain Ontology Concept Hyponymy Relation Extraction[J]. Microelectronics & Computer, 2014, 31(6): 68-71.
[6] Li Junfeng, Lv Xueqiang, Li Zhuo. Deriving Concept Semantic Hierarchy of Ontology in Patents[J]. Journal of the China Society for Scientific and Technical Information, 2014, 33(9): 986-993.
[7] Zhang Chentong, Zhang Jiaying, Zhang Zhixing, et al. Construction of Large-Scale Disease Terminology Graph with Common Terms[J]. Journal of Computer Research and Development, 2020, 57(11): 2467-2477.
[8] Snow R. Learning Syntactic Patterns for Automatic Hypernym Discovery [C]//Proceedings of the 17th International Conference on Neural Information Processing Systems. 2005: 1297-1304.
[9] Suchanek F M, Kasneci G, Weikum G. YAGO: A Large Ontology from Wikipedia and WordNet[J]. Journal of Web Semantics, 2008, 6(3): 203-217. doi: 10.1016/j.websem.2008.06.001
[10] Lu Kaihua, Li Zhenghua, Zhang Min. Data Construction and Benchmark Method Comparison for Chinese Hypernym-Hyponym Relation Classification[J]. Journal of Xiamen University (Natural Science Edition), 2020, 59(6): 1004-1010.
[11] Ding Shengchun, Hou Linlin, Wang Ying. Product Knowledge Map Construction Based on the E-Commerce Data[J]. Data Analysis and Knowledge Discovery, 2019, 3(3): 45-56.
[12] Huang Yi, Wang Qinglin, Liu Yu. An Acquisition Method of Domain-Specific Terminological Hyponymy Based on CRF[J]. Journal of Central South University (Science and Technology), 2013, 44(S2): 355-359.
[13] Ma Xiaojun, Guo Jianyi, Xian Yantuan, et al. Entity Hyponymy Acquisition and Organization Combining Word Embedding and Bootstrapping in Special Domain[J]. Computer Science, 2018, 45(1): 67-72.
[14] Sun Jiawei, Li Zhenghua, Chen Wenliang, et al. Hypernym Relation Classification Based on Word Pattern[J]. Acta Scientiarum Naturalium Universitatis Pekinensis, 2019, 55(1): 1-7.
[15] Wu Ting, Li Mingyang, Kong Fang. Construction of Textual Entity Hypernymy Corpus Based on Synonymy Reasoning[J]. Journal of Chinese Information Processing, 2020, 34(4): 38-46.
[16] Wang Chengyu, He Xiaofeng, Gong Xueqing, et al. Word Embedding Projection Models for Hypernymy Relation Prediction[J]. Chinese Journal of Computers, 2020, 43(5): 868-883.
[17] Kotlerman L, Dagan I, Szpektor I, et al. Directional Distributional Similarity for Lexical Expansion [C]//Proceedings of ACL-IJCNLP 2009 Conference (Short Papers). 2009: 69-72.
[18] Wang Sili, Zhu Zhongming, Yang Heng, et al. Automatically Identifying Hypernym-Hyponym Relations of Domain Concepts with Patterns and Projection Learning[J]. Data Analysis and Knowledge Discovery, 2020, 4(11): 15-25.
[19] Wu Zhixiang, Wang Hao, Wang Xueying, et al. Study on Chinese Patent Terms Hierarchy Parse Based on Singular Value Decomposition[J]. Journal of the China Society for Scientific and Technical Information, 2017, 36(5): 473-483.
[20] Kozareva Z, Hovy E. A Semi-Supervised Method to Learn and Construct Taxonomies Using the Web [C]//Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. 2010: 1110-1118.
[21] Fu R J, Qin B, Liu T. Exploiting Multiple Sources for Open-Domain Hypernym Discovery [C]//Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 2013: 1224-1234.
[22] Liu Qi, Xiao Yanghua, Wang Wei. A Recognition Approach of Typical Generic Relationship for Massive Chinese Text[J]. Computer Engineering, 2015, 41(2): 26-30.
[23] Gan Lixin, Wan Changxuan, Liu Dexi, et al. Chinese Named Entity Relation Extraction Based on Syntactic and Semantic Features[J]. Journal of Computer Research and Development, 2016, 53(2): 284-302.
[24] Bansal M, Burkett D, de Melo G, et al. Structured Learning for Taxonomy Induction with Belief Propagation [C]// Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2014: 1041-1051.
[25] Duan Liguo, Xu Qing, Li Aiping, et al. Research on Effect of Entities Semantic Information on Chinese Entity Relation Extraction[J]. Computer Application Research, 2017, 34(1): 141-146.
[26] Mikolov T, Yih S W T, Zweig G. Linguistic Regularities in Continuous Space Word Representations [C]//Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2013.
[27] Fu Ruiji. Open-Domain Named Entity Recognition and Hierarchical Category Acquisition[D]. Harbin: Harbin Institute of Technology, 2014.
[28] Lai S W, Liu K, He S Z, et al. How to Generate a Good Word Embedding[J]. IEEE Intelligent Systems, 2016, 31(6): 5-14.