Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (10): 95-102    DOI: 10.11925/infotech.2096-3467.2018.0169
Constructing Sentiment Dictionary with Deep Learning: Case Study of Financial Data
Hu Jiaheng1, Cen Yonghua1, Wu Chengyao2
1School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094, China
2College of Finance, Nanjing Agricultural University, Nanjing 210095, China
[Objective] This paper proposes a method for constructing a practical sentiment dictionary for sentiment analysis in the financial domain. [Methods] The method builds the dictionary from the characteristics of the corpus and a knowledge base, and maps textual information into a vector space using word vectors. With the help of an existing general sentiment dictionary, we automatically labeled the training corpus and split it into training and prediction sets at a ratio of 9:1. Finally, we implemented a deep learning neural network classifier in Python and used it to judge the sentiment polarity of the candidate words for the new dictionary. [Results] The proposed classifier reached an accuracy of 95.02% on the training set and 95.00% on the prediction set, outperforming existing models. [Limitations] The method for extracting seed words could be further optimized. [Conclusions] The proposed method enlarges the corpus so that the neural network classifier can be trained more effectively, and it extracts sentiment information from the semantic relatedness of word vectors. The new sentiment dictionary provides possible directions for future research.
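
The labeling and splitting step described under [Methods] can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the toy word lists, the rule of skipping words listed under both polarities, and the fixed random seed are all hypothetical. The abstract only states that a general sentiment dictionary was used for automatic labeling and that the split ratio was 9:1.

```python
import random


def label_with_dictionary(words, positive, negative):
    """Return (word, label) pairs for words found under exactly one polarity."""
    labeled = []
    for word in words:
        in_pos = word in positive
        in_neg = word in negative
        if in_pos and not in_neg:
            labeled.append((word, 1))  # 1 = positive
        elif in_neg and not in_pos:
            labeled.append((word, 0))  # 0 = negative
        # Words absent from the dictionary, or listed under both
        # polarities, are skipped (an assumption of this sketch).
    return labeled


def split_nine_to_one(samples, seed=0):
    """Shuffle the samples and split them 9:1 into training and prediction sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 9 // 10
    return shuffled[:cut], shuffled[cut:]


# Hypothetical placeholder data standing in for a general sentiment dictionary.
positive_words = {f"pos_{i}" for i in range(10)}
negative_words = {f"neg_{i}" for i in range(10)}
vocabulary = sorted(positive_words | negative_words) + ["unknown_word"]

labeled = label_with_dictionary(vocabulary, positive_words, negative_words)
train_set, prediction_set = split_nine_to_one(labeled)
print(len(train_set), len(prediction_set))  # prints "18 2"
```

In the paper's setting each labeled word would additionally be represented by its word vector before being passed to the classifier; that step is omitted here because the abstract gives no details about it.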

Keywords: Sentiment Dictionary; Deep Learning; Financial Field; Word Vector; Neural Network
Received: 09 February 2018      Published: 12 November 2018
Cite this article:

Hu Jiaheng, Cen Yonghua, Wu Chengyao. Constructing Sentiment Dictionary with Deep Learning: Case Study of Financial Data. Data Analysis and Knowledge Discovery, 2018, 2(10): 95-102.

Dictionary                       Positive words    Negative words
NTUSD                            2,811             8,277
Tsinghua University (Li Jun)     5,567             4,469
HowNet sentiment dictionary      836               1,254
DUTIR                            11,229            10,783
Total                            20,443            24,783
Merged dictionary                16,157            19,559

Positive words in intersection    Negative words in intersection    Total
3,128                             2,850                             5,978

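The merged dictionary is smaller than the sum of its sources, presumably because words shared by several dictionaries are counted only once. A minimal sketch of that merge and of the intersection follows. The three toy word lists and the function names are hypothetical, and the sketch assumes that "intersection" means words present in every source dictionary, which the tables do not state explicitly.

```python
def merge_dictionaries(dictionaries):
    """Union of all word lists: every distinct word is kept once."""
    merged = set()
    for words in dictionaries:
        merged |= set(words)
    return merged


def intersect_dictionaries(dictionaries):
    """Words that appear in every one of the given word lists."""
    word_sets = [set(words) for words in dictionaries]
    return set.intersection(*word_sets)


# Hypothetical positive-word lists from three source dictionaries.
source_a = ["rise", "gain", "profit"]
source_b = ["gain", "profit", "boom"]
source_c = ["profit", "gain", "rally"]
sources = [source_a, source_b, source_c]

total_entries = sum(len(words) for words in sources)
merged = merge_dictionaries(sources)
common = intersect_dictionaries(sources)

print(total_entries)   # prints 9
print(len(merged))     # prints 5
print(sorted(common))  # prints ['gain', 'profit']
```

The same two functions would be applied separately to the positive and the negative word lists, giving one merged set and one intersection per polarity.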
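大涨 (surge), 大跌 (plunge), 股票 (stock), 平仓 (close a position), 牛市 (bull market), 熊市 (bear market), 走高 (move higher), 拉升 (pull up), 雄起 (rise strongly), 利好 (bullish news), 利空 (bearish news), 清仓 (liquidate holdings), 套牢 (trapped in a losing position), 抄底 (bottom fishing), 反弹 (rebound), 减持 (reduce holdings), 乏力 (weak), 退市 (delisting), 撤离 (withdraw), 亏 (loss)
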
Metric       Word-vector similarity used directly    Classifier built on word vectors
Accuracy     0.529                                   0.742
Precision    0.641                                   0.889
Recall       0.500                                   0.821
F1           0.538                                   0.610

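For reference, the four metrics in the comparison can be computed from predicted and true polarity labels as in the sketch below. It uses the standard single-class binary definitions; the source does not state whether its scores were averaged over both polarity classes, so this is an assumption, and the label lists are hypothetical toy data rather than the paper's results.

```python
def classification_metrics(y_true, y_pred, positive_label=1):
    """Accuracy, precision, recall and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive_label and p == positive_label)
    fp = sum(1 for t, p in pairs if t != positive_label and p == positive_label)
    fn = sum(1 for t, p in pairs if t == positive_label and p != positive_label)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = positive word, 0 = negative word.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

With these toy labels there are three true positives, one false positive, one false negative and three true negatives, so all four metrics equal 0.75.
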
Threshold N    Candidate words after filtering (count)
10             93
20             150
30             197
40             252
