Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (10): 68-78    DOI: 10.11925/infotech.2096-3467.2022.0009
Quantifying Logical Relations of Financial Risks with BERT and Mutual Information
Jia Minghua1,2, Wang Xiuli1,3
1School of Information, Central University of Finance and Economics, Beijing 102206, China
2Peking University Library, Beijing 100871, China
3Engineering Research Center of State Financial Security, Ministry of Education, Beijing 102206, China
[Objective] This paper aims to support the prevention and control of financial risks by quantifying the logical relations among them, and to make word-frequency-based processing of financial events more reliable. [Methods] We propose a method for quantifying the logical relations of financial risks that combines BERT and mutual information with domain knowledge, and we evaluate it on COPA and financial data sets. [Results] The proposed model effectively addresses the unreliability of word-frequency-based quantification. Its accuracy reached 80.1%, which was 3.1%~37.4% higher than the benchmark models. [Limitations] More research is needed to examine the model on non-financial and other corpora. [Conclusions] The method can reveal the evolutionary path of financial risk events and improves the quantitative presentation of their logical relations.

Key words: Financial Risk; Relationship Quantization; Domain Knowledge; BERT; Mutual Information
Received: 05 January 2022      Published: 16 November 2022
Chinese Library Classification (ZTFLH): TP391
Corresponding Author: Jia Minghua, ORCID: 0000-0003-0859-7502

Cite this article:

Jia Minghua, Wang Xiuli. Quantifying Logical Relations of Financial Risks with BERT and Mutual Information. Data Analysis and Knowledge Discovery, 2022, 6(10): 68-78.


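Model | Core idea | Variant
BERT[17] | Uses the encoder of the Transformer, an architecture that comprises an encoder and a decoder | BERT-base-uncased
XLNet[18] | Replaces the Transformer structure with Transformer-XL | XLNet-base-cased
RoBERTa[19] | Retains the base BERT model and optimizes the masked language model[20] | RoBERTa-base
ERNIE[21-22] | Optimizes the masked language model and next sentence prediction | ERNIE (Baidu)
ALBERT[23] | Optimizes next sentence prediction | ALBERT-base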
Comparison of Common BERT Models
Quantization Model of Event Relationship Based on BERT and Mutual Information
BERT Model
Embedding Representation of BERT
Transformer Encoding Unit
Type | Text
Premise | The man broke his toe. What was the CAUSE of this?
Alternative 1 | He got a hole in his sock.
Alternative 2 | He dropped a hammer on his foot.
Premise | I tipped the bottle. What happened as a RESULT?
Alternative 1 | The liquid in the bottle froze.
Alternative 2 | The liquid in the bottle poured out.
Premise | I knocked on my neighbor's door. What happened as a RESULT?
Alternative 1 | My neighbor invited me in.
Alternative 2 | My neighbor left his house.
Data Example of COPA
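To make the task format concrete, the following is a minimal sketch of how a plain mutual-information baseline could choose between the two alternatives of a COPA item. It is an illustration under stated assumptions, not the authors' implementation: the toy pair list `PAIRS`, the helpers `build_counts`, `ppmi`, `score`, and `choose`, and the bag-of-words treatment of sentences are all hypothetical, and no BERT component is involved.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical toy corpus of (cause words, effect words); not data from the paper.
PAIRS = [
    ("dropped hammer foot", "broke toe"),
    ("dropped brick foot", "broke toe"),
    ("tipped bottle", "liquid poured out"),
    ("tipped glass", "water poured out"),
    ("hole sock", "toe cold"),
]


def build_counts(pairs):
    """Count cause words, effect words, and cause-effect word co-occurrences."""
    cause, effect, joint = Counter(), Counter(), Counter()
    for cause_text, effect_text in pairs:
        c_words, e_words = set(cause_text.split()), set(effect_text.split())
        cause.update(c_words)
        effect.update(e_words)
        joint.update(product(c_words, e_words))
    return cause, effect, joint, len(pairs)


def ppmi(c_word, e_word, counts):
    """Positive PMI: max(0, log p(c, e) / (p(c) p(e))); 0 for unseen pairs."""
    cause, effect, joint, n = counts
    if joint[(c_word, e_word)] == 0:
        return 0.0
    p_joint = joint[(c_word, e_word)] / n
    p_cause, p_effect = cause[c_word] / n, effect[e_word] / n
    return max(0.0, math.log(p_joint / (p_cause * p_effect)))


def score(cause_text, effect_text, counts):
    """Average PPMI over all (cause word, effect word) pairs."""
    word_pairs = list(product(cause_text.split(), effect_text.split()))
    return sum(ppmi(c, e, counts) for c, e in word_pairs) / len(word_pairs)


def choose(premise, alt1, alt2, ask, counts):
    """Return 1 or 2: the alternative with the higher causal score.

    ask="cause": the alternatives are candidate causes of the premise.
    ask="effect": the alternatives are candidate effects of the premise.
    """
    if ask == "cause":
        s1, s2 = score(alt1, premise, counts), score(alt2, premise, counts)
    else:
        s1, s2 = score(premise, alt1, counts), score(premise, alt2, counts)
    return 1 if s1 >= s2 else 2


counts = build_counts(PAIRS)
print(choose("broke toe", "hole sock", "dropped hammer foot", "cause", counts))
print(choose("tipped bottle", "liquid froze", "liquid poured out", "effect", counts))
```

On this toy data both calls select alternative 2. Averaging over word pairs keeps longer alternatives from winning merely by containing more words; a real system would estimate the counts from a large corpus rather than a handful of hand-written pairs.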
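No. | Abstract topic event | Generalized events | Effect events | Cause events
E1 | Excessive money issuance | 3 | 10 | 10
E2 | Stock market crash | 3 | 10 | 10
E3 | Federal Reserve interest rate hike | 3 | 10 | 10
E4 | RMB appreciation | 3 | 10 | 10
E5 | RMB depreciation | 3 | 10 | 10
E6 | China-US trade friction | 3 | 10 | 3
E7 | Brexit | 3 | 10 | 1
E8 | Stock market rise | 3 | 10 | 10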
Abstract Topic Events
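No. | One cause, multiple effects: Method A | One cause, multiple effects: Method B | Tracing cause from effect: Method A | Tracing cause from effect: Method B
E1 | 0.73 | 1.00 | 0.59 | 1.00
E2 | 0.73 | 1.00 | 0.75 | 1.00
E3 | 0.95 | 1.00 | 0.92 | 1.00
E4 | 0.79 | 1.00 | 0.85 | 1.00
E5 | 0.88 | 1.00 | 0.79 | 1.00
E6 | 0.86 | 1.00 | 0.37 | 1.00
E7 | 0.59 | 1.00 | 0.04 | 1.00
E8 | 0.58 | 1.00 | 0.69 | 1.00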
Comparison Results of Relational Quantization Values
The Distribution of Relational Quantization Values for Reasoning from Cause to Effect
The Distribution of Relational Quantization Values for Finding the Cause by the Effect
Model | Parameters | Layers | Hidden size | Batch size | Vocabulary size
BERT-base-uncased | 108M | 12 | 768 | 16 | 30 522
BERT-large-uncased | 334M | 24 | 1 024 | 4 | 30 522
RoBERTa-base | 123M | 12 | 768 | 16 | 30 522
RoBERTa-large | 355M | 24 | 1 024 | 4 | 50 265
ALBERT-base | 12M | 12 | 768 | 32 | 30 000
ALBERT-large | 18M | 24 | 1 024 | 12 | 30 522
The Parameter Setting of BERT Model
Method | Test Set | Dev Set | Dev + Test
Covariance* | 50.2% | 49.0% | 49.6%
Co-occurrence frequency | 50.0% | 51.8% | 50.9%
Mutual information | 57.8% | 58.8% | 58.3%
BERT-base-uncased+PMI | 58.2% | 62.0% | 60.1%
BERT-large-uncased+PMI | 71.6% | 68.6% | 70.1%
RoBERTa-base+PMI | 71.4% | 76.8% | 74.1%
RoBERTa-large+PMI | 68.8% | 70.6% | 69.7%
ALBERT-base+PMI | 57.6% | 58.4% | 58.0%
ALBERT-large+PMI | 78.8% | 81.4% | 80.1%
Accuracy Results of Relational Quantization Reasoning Tasks on COPA
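The Dev + Test column can be read as a size-weighted mean of the two split accuracies. The sketch below recomputes it from the Test Set and Dev Set columns, assuming the COPA development and test sets each contain 500 items; the split sizes, the dictionary `RESULTS`, and the helper `combined_accuracy` are assumptions introduced here for illustration and are not stated in the table.

```python
# Accuracies in percent, copied from the table: (test, dev, reported dev + test).
RESULTS = {
    "Covariance": (50.2, 49.0, 49.6),
    "Co-occurrence frequency": (50.0, 51.8, 50.9),
    "Mutual information": (57.8, 58.8, 58.3),
    "BERT-base-uncased+PMI": (58.2, 62.0, 60.1),
    "BERT-large-uncased+PMI": (71.6, 68.6, 70.1),
    "RoBERTa-base+PMI": (71.4, 76.8, 74.1),
    "RoBERTa-large+PMI": (68.8, 70.6, 69.7),
    "ALBERT-base+PMI": (57.6, 58.4, 58.0),
    "ALBERT-large+PMI": (78.8, 81.4, 80.1),
}


def combined_accuracy(test_acc, dev_acc, n_test=500, n_dev=500):
    """Size-weighted mean of two accuracies; the 500/500 split is an assumption."""
    return (test_acc * n_test + dev_acc * n_dev) / (n_test + n_dev)


for name, (test_acc, dev_acc, reported) in RESULTS.items():
    computed = combined_accuracy(test_acc, dev_acc)
    print(f"{name:24s} computed={computed:.1f} reported={reported:.1f}")

# The best row by combined accuracy under the same assumption.
best = max(RESULTS, key=lambda name: combined_accuracy(*RESULTS[name][:2]))
print("best:", best)
```

Under this assumption every recomputed value agrees with the reported Dev + Test figure to one decimal place, and ALBERT-large+PMI is the best-scoring row.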
