Data Analysis and Knowledge Discovery, 2018, Vol. 2, Issue 10: 95-102. DOI: 10.11925/infotech.2096-3467.2018.0169
Constructing Sentiment Dictionary with Deep Learning: Case Study of Financial Data
Hu Jiaheng¹, Cen Yonghua¹, Wu Chengyao²
¹School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094, China
²College of Finance, Nanjing Agricultural University, Nanjing 210095, China

[Objective] This paper proposes a new method for constructing a sentiment dictionary for sentiment analysis in the financial domain. [Methods] The method builds the dictionary from the characteristics of the corpus and an existing knowledge base, and maps textual information into a vector space using word vectors. With the help of existing general-purpose sentiment dictionaries, we automatically labeled the training corpus and divided it into training and forecasting sets at a ratio of 9:1. Finally, we implemented a deep-learning neural network classifier in Python and used it to determine the sentiment polarity of candidate words for the new dictionary. [Results] The classifier achieved an accuracy of 95.02% on the training set and 95.00% on the forecasting set, outperforming the existing models it was compared with. [Limitations] The method for extracting seed words could be further optimized. [Conclusions] The proposed method enlarges the corpus available for training, so the neural network classifier can be trained more effectively, and it extracts sentiment information from the semantic relatedness captured by word vectors. The new sentiment dictionary also suggests possible directions for future research.
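The abstract describes labelling words with general dictionaries, splitting them 9:1 into training and forecasting sets, and training a neural network classifier on word vectors in Python. The sketch below is a minimal stand-in under stated assumptions: it uses a single logistic unit instead of the paper's deep network (whose architecture this page does not describe), only the standard library, and synthetic 2-D vectors in place of real word embeddings. The helper names `split_9_1`, `train` and `accuracy` are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def split_9_1(items, seed=0):
    """Shuffle labelled items and split them 9:1 into training and forecasting sets."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * 0.9)
    return items[:cut], items[cut:]


def train(data, epochs=100, lr=0.1):
    """Fit one logistic unit (stand-in for a deeper network) with SGD on log-loss."""
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # derivative of log-loss w.r.t. the pre-activation
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def accuracy(model, data):
    """Share of items whose predicted label (1 if score >= 0) matches the gold label."""
    w, b = model
    hits = sum(
        (sum(wi * xi for wi, xi in zip(w, x)) + b >= 0) == (y == 1) for x, y in data
    )
    return hits / len(data)


# Toy "word vectors": positive words (label 1) lean towards +x, negative ones towards -x.
rng = random.Random(42)
samples = []
for i in range(100):
    label = i % 2
    x0 = rng.uniform(0.5, 1.5) * (1 if label == 1 else -1)
    samples.append(([x0, rng.uniform(-1.0, 1.0)], label))

train_set, forecast_set = split_9_1(samples)
model = train(train_set)
print(accuracy(model, train_set), accuracy(model, forecast_set))
```

Swapping the logistic unit for a multi-layer network changes only `train`; the labelling and 9:1 split stay the same.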

Key words: Sentiment Dictionary; Deep Learning; Financial Field; Word Vector; Neural Network
Received: 09 February 2018; Published: 12 November 2018
CLC number: G202; F832.5

Cite this article:

Hu Jiaheng, Cen Yonghua, Wu Chengyao. Constructing Sentiment Dictionary with Deep Learning: Case Study of Financial Data. Data Analysis and Knowledge Discovery, 2018, 2(10): 95-102.


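| Dictionary | Positive words | Negative words |
|---|---|---|
| NTUSD | 2,811 | 8,277 |
| Tsinghua University (Li Jun) | 5,567 | 4,469 |
| HowNet sentiment dictionary | 836 | 1,254 |
| DUTIR | 11,229 | 10,783 |
| Total | 20,443 | 24,783 |
| Merged dictionary | 16,157 | 19,559 |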
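The merged dictionary is smaller than the column totals, which is what a union of the sources with duplicate removal would produce. A minimal sketch, assuming each source dictionary is a pair of positive and negative word sets, and assuming (the page does not say how conflicts were handled) that words listed with both polarities are dropped. The function name `merge_dictionaries` and the miniature dictionaries are hypothetical.

```python
def merge_dictionaries(sources):
    """Merge (positive, negative) word-set pairs from several dictionaries.

    Duplicates disappear because sets are used. Words that appear with both
    polarities across sources are dropped (an assumption, not a stated rule).
    """
    positive, negative = set(), set()
    for pos_words, neg_words in sources:
        positive |= set(pos_words)
        negative |= set(neg_words)
    conflicts = positive & negative
    return positive - conflicts, negative - conflicts


# Hypothetical miniature dictionaries standing in for NTUSD, DUTIR, etc.
ntusd_like = ({"利好", "大涨"}, {"大跌"})
dutir_like = ({"大涨", "牛市"}, {"熊市", "利好"})
merged_pos, merged_neg = merge_dictionaries([ntusd_like, dutir_like])
print(sorted(merged_pos), sorted(merged_neg))
```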
| Positive words in intersection | Negative words in intersection | Total |
|---|---|---|
| 3,128 | 2,850 | 5,978 |
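The page does not state which two sets the intersection is taken over. One plausible reading, assumed here, is the merged dictionary intersected with the vocabulary of the trained word-vector model: those words have both a known polarity and a vector, so they can serve as automatically labelled training data. The names `labelled_words` and `intersection_counts` and the toy vocabulary are hypothetical.

```python
def labelled_words(positive, negative, vocabulary):
    """Return (word, label) pairs for dictionary words that also have a word vector.

    label 1 = positive, 0 = negative. `vocabulary` is any iterable of words
    known to the word-vector model (hypothetical input).
    """
    vocab = set(vocabulary)
    pos_in = sorted(set(positive) & vocab)
    neg_in = sorted(set(negative) & vocab)
    return [(w, 1) for w in pos_in] + [(w, 0) for w in neg_in]


def intersection_counts(pairs):
    """Counts in the same shape as the table: positive, negative, total."""
    n_pos = sum(1 for _, label in pairs if label == 1)
    return n_pos, len(pairs) - n_pos, len(pairs)


pairs = labelled_words({"大涨", "牛市", "雄起"}, {"大跌", "熊市"}, ["大涨", "牛市", "大跌", "股票"])
print(pairs)
print(intersection_counts(pairs))
```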
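大涨 (sharp rise), 大跌 (sharp fall), 股票 (stock), 平仓 (close out a position), 牛市 (bull market), 熊市 (bear market), 走高 (move higher), 拉升 (drive prices up), 雄起 (rally strongly), 利好 (bullish news), 利空 (bearish news), 清仓 (sell off all holdings), 套牢 (be stuck in a losing position), 抄底 (buy at the bottom), 反弹 (rebound), 减持 (reduce holdings), 乏力 (lack momentum), 退市 (delisting), 撤离 (pull out), 亏 (lose money)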
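If the words listed above serve as seed words (the abstract mentions seed words, but the page does not label this list), a word's polarity can be estimated directly from word-vector similarity, the approach compared in the table that follows. This sketch assumes one simple scoring rule: mean cosine similarity to positive seeds minus mean cosine similarity to negative seeds. The 2-D vectors, the split of seeds into positive and negative, and the name `polarity_score` are illustrative only.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def polarity_score(word, vectors, pos_seeds, neg_seeds):
    """Mean similarity to positive seeds minus mean similarity to negative seeds."""
    v = vectors[word]
    to_pos = sum(cosine(v, vectors[s]) for s in pos_seeds) / len(pos_seeds)
    to_neg = sum(cosine(v, vectors[s]) for s in neg_seeds) / len(neg_seeds)
    return to_pos - to_neg


# Illustrative 2-D vectors; real word vectors would come from a trained model.
vectors = {
    "大涨": [0.9, 0.1], "利好": [0.8, 0.3], "牛市": [1.0, 0.0],
    "大跌": [-0.9, 0.1], "利空": [-0.8, 0.2], "熊市": [-1.0, 0.0],
    "拉升": [0.7, 0.2], "套牢": [-0.6, 0.3],
}
pos_seeds = ["大涨", "利好", "牛市"]
neg_seeds = ["大跌", "利空", "熊市"]
for w in ("拉升", "套牢"):
    score = polarity_score(w, vectors, pos_seeds, neg_seeds)
    print(w, "positive" if score > 0 else "negative", round(score, 3))
```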
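| Metric | Direct word-vector similarity | Classifier built on word vectors |
|---|---|---|
| Accuracy | 0.529 | 0.742 |
| Precision | 0.641 | 0.889 |
| Recall | 0.500 | 0.821 |
| F1 | 0.538 | 0.610 |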
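For a binary positive/negative task, the four metrics in the comparison can be computed as sketched below. The page does not say whether precision, recall and F1 are reported for one class or averaged over classes, so both a per-class F1 and a macro-averaged F1 are shown, and this sketch may not reproduce the table's exact figures. The function names and toy labels are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy plus precision, recall and F1 for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    return sum(binary_metrics(y_true, y_pred, c)["f1"] for c in labels) / len(labels)


# 1 = positive word, 0 = negative word (toy labels, not the paper's data).
gold = [1, 1, 1, 1, 0, 0]
pred = [1, 1, 0, 0, 1, 0]
print(binary_metrics(gold, pred))
print(macro_f1(gold, pred))
```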
| Threshold N | Candidate words after filtering |
|---|---|
| 10 | 93 |
| 20 | 150 |
| 30 | 197 |
| 40 | 252 |
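The page does not define N. One reading consistent with the rising counts, assumed here, is that N is the number of nearest neighbours retrieved for each seed word by cosine similarity, with the union then filtered to remove the seeds themselves and words already present in the general dictionary. The filtering rule, the toy vectors, and the name `expand_candidates` are all hypothetical.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_candidates(vectors, seeds, known_words, n):
    """Union of each seed's n most similar words, minus seeds and already-known words."""
    candidates = set()
    for seed in seeds:
        scored = (
            (cosine(vectors[seed], vec), word)
            for word, vec in vectors.items()
            if word != seed
        )
        candidates.update(word for _, word in heapq.nlargest(n, scored))
    return candidates - set(seeds) - set(known_words)


# Illustrative vectors; real ones would come from the trained word-vector model.
vectors = {
    "牛市": [1.0, 0.0], "熊市": [-1.0, 0.0],
    "走强": [0.9, 0.1], "上扬": [0.8, 0.3],
    "走弱": [-0.9, 0.1], "下挫": [-0.8, 0.3],
    "大涨": [0.95, 0.05],
}
seeds = ["牛市", "熊市"]
known = {"大涨"}  # pretend this word is already in the general dictionary
for n in (1, 2, 3):
    print(n, sorted(expand_candidates(vectors, seeds, known, n)))
```

Larger N admits more, and less similar, neighbours, which is why the candidate count grows with the threshold.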
[1] Smailović J, Grčar M, Lavrač N, et al. Stream-based Active Learning for Sentiment Analysis in the Financial Domain[J]. Information Sciences, 2014, 285(C): 181-203. doi: 10.1016/j.ins.2014.04.034
[2] Li X, Xie H, Chen L, et al. News Impact on Stock Price Return via Sentiment Analysis[J]. Knowledge-Based Systems, 2014, 69: 14-23. doi: 10.1016/j.knosys.2014.04.022
[3] Nguyen T H, Shirai K, Velcin J. Sentiment Analysis on Social Media for Stock Movement Prediction[J]. Expert Systems with Applications, 2015, 42(24): 9603-9611. doi: 10.1016/j.eswa.2015.07.052
[4] Wu D D, Zheng L, Olson D L. A Decision Support Approach for Online Stock Forum Sentiment Analysis[J]. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2014, 44(8): 1077-1087. doi: 10.1109/TSMC.2013.2295353
[5] Wang Ke, Xia Rui. A Survey on Automatical Construction Methods of Sentiment Lexicons[J]. Acta Automatica Sinica, 2016, 42(4): 495-511. doi: 10.16383/j.aas.2016.c150585
[6] Hu M, Liu B. Mining and Summarizing Customer Reviews[C]// Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2004: 168-177.
[7] Strapparava C, Valitutti A. WordNet Affect: An Affective Extension of WordNet[C]// Proceedings of the 4th International Conference on Language Resources and Evaluation. 2004.
[8] Kamps J, Marx M, Mokken R, et al. Using WordNet to Measure Semantic Orientations of Adjectives[C]// Proceedings of the 4th International Conference on Language Resources and Evaluation. 2004.
[9] Hassan A, Abu-Jbara A, Jha R, et al. Identifying the Semantic Orientation of Foreign Words[C]// Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. 2011.
[10] Liu Weiping, Zhu Yanhui, Li Chunliang, et al. Research on Building Chinese Basic Semantic Lexicon[J]. Journal of Computer Applications, 2009, 29(10): 2875-2877.
[11] Andreevskaia A, Bergler S. Mining WordNet for a Fuzzy Sentiment: Sentiment Tag Extraction from WordNet Glosses[C]// Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics. 2006.
[12] Esuli A, Sebastiani F. Pageranking WordNet Synsets: An Application to Opinion Mining[C]// Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics. 2007.
[13] Kanayama H, Nasukawa T. Fully Automatic Lexicon Expansion for Domain-oriented Sentiment Analysis[C]// Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing. 2006.
[14] Xia Y, Cambria E, Hussain A, et al. Word Polarity Disambiguation Using Bayesian Model and Opinion-level Features[J]. Cognitive Computation, 2015, 7(3): 369-380. doi: 10.1007/s12559-014-9298-4
[15] Yin Chunxia, Peng Qinke. Identifying Word Sentiment Orientation for Free Comments via Complex Network[J]. Acta Automatica Sinica, 2012, 38(3): 389-398. doi: 10.3724/SP.J.1004.2012.00389
[16] Turney P D. Mining the Web for Synonyms: PMI-IR Versus LSA on TOEFL[OL]. arXiv Preprint, arXiv:cs/0212033.
[17] Turney P D. Thumbs Up or Thumbs Down?: Semantic Orientation Applied to Unsupervised Classification of Reviews[C]// Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. 2002.
[18] Wawer A. Mining Co-occurrence Matrices for SO-PMI Paradigm Word Candidates[C]// Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics. 2012.
[19] Krestel R, Siersdorfer S. Generating Contextualized Sentiment Lexica Based on Latent Topics and User Ratings[C]// Proceedings of the 24th ACM Conference on Hypertext and Social Media. ACM, 2013: 129-138.
[20] Zhong Minjuan, Wan Changxuan, Liu Dexi. Opinion Lexicon Construction Based on Association Rule and Orientation Analysis for Production Review[J]. Journal of the China Society for Scientific and Technical Information, 2016, 35(5): 501-509.
[21] Yang Xiaoping, Zhang Zhongxia, Wang Liang, et al. Automatic Construction and Optimization of Sentiment Lexicon Based on Word2Vec[J]. Computer Science, 2017, 44(1): 42-47. doi: 10.11896/j.issn.1002-137X.2017.01.008
[22] Feng Chao, Liang Xun, Li Yaping, et al. Construction Method of Chinese Cross-Domain Sentiment Lexicon Based on Word Vector[J]. Journal of Data Acquisition and Processing, 2017, 32(3): 579-587.
[23] NTUSD [EB/OL]. [2017-12-15].
[24] TSING [EB/OL]. [2017-12-15].
[25] HowNet [EB/OL]. [2017-12-15].
[26] DUTIR [EB/OL]. [2017-12-15].