Data Analysis and Knowledge Discovery, 2018, Vol. 2, Issue 10: 95-102. DOI: 10.11925/infotech.2096-3467.2018.0169
Constructing Sentiment Dictionary with Deep Learning: Case Study of Financial Data
Jiaheng Hu1, Yonghua Cen1, Chengyao Wu2
1School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094, China
2College of Finance, Nanjing Agricultural University, Nanjing 210095, China
Abstract  

[Objective] This paper proposes a new method for constructing a sentiment dictionary for sentiment analysis in the financial domain. [Methods] We built the dictionary from the characteristics of a domain corpus and an existing knowledge base, mapping textual information into a vector space with word vectors. Using an existing general-purpose sentiment dictionary, we automatically labeled the training corpus and divided it into training and forecasting sets at a ratio of 9:1. Finally, we implemented a deep-learning neural network classifier in Python and used it to determine the sentiment polarity of the candidate words for the new dictionary. [Results] The classifier achieved an accuracy of 95.02% on the training set and 95.00% on the forecasting set, which is better than existing models. [Limitations] The method for extracting seed words could be further optimized. [Conclusions] The proposed method enlarges the training corpus, which lets the neural network classifier be trained more effectively, and it extracts sentiment information from the semantic relatedness of word vectors. The new sentiment dictionary also points to possible directions for future research.

Key words: Sentiment Dictionary; Deep Learning; Financial Field; Word Vector; Neural Network
Received: 09 February 2018; Published: 12 November 2018
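The classification step described in the abstract can be illustrated with a minimal sketch. The paper's classifier is a deep neural network whose architecture is not given on this page, so the example below swaps in the simplest possible neural network, a single logistic neuron trained by stochastic gradient descent. The 2-D vectors are made-up toy values rather than real word embeddings, and the names `train_polarity_classifier` and `predict_polarity` are hypothetical, not the authors' code.

```python
import math
import random

# Hypothetical toy "word vectors"; real embeddings have hundreds of dimensions.
# Label 1 = positive polarity, 0 = negative polarity.
TOY_TRAINING_SET = [
    ([0.9, 0.1], 1),  # 利好 (bullish news)
    ([0.8, 0.2], 1),  # 大涨 (surge)
    ([0.7, 0.3], 1),  # 反弹 (rebound)
    ([0.1, 0.9], 0),  # 利空 (bearish news)
    ([0.2, 0.8], 0),  # 大跌 (plunge)
    ([0.3, 0.7], 0),  # 套牢 (locked in a losing position)
]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_polarity_classifier(samples, dim, epochs=200, lr=0.5, seed=0):
    """Fit one logistic neuron with SGD on the log loss; returns (weights, bias)."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.01, 0.01) for _ in range(dim)]
    bias = 0.0
    data = list(samples)  # copy so the caller's list is not reordered
    for _ in range(epochs):
        rng.shuffle(data)
        for vec, label in data:
            p = sigmoid(sum(w * x for w, x in zip(weights, vec)) + bias)
            grad = p - label  # derivative of the log loss w.r.t. the logit
            weights = [w - lr * grad * x for w, x in zip(weights, vec)]
            bias -= lr * grad
    return weights, bias


def predict_polarity(weights, bias, vec) -> int:
    """Return 1 (positive) if P(positive) >= 0.5, else 0 (negative)."""
    return int(sigmoid(sum(w * x for w, x in zip(weights, vec)) + bias) >= 0.5)


weights, bias = train_polarity_classifier(TOY_TRAINING_SET, dim=2)
# Score two unseen candidate words (toy vectors): 走高 (trend higher), 乏力 (lose momentum).
print(predict_polarity(weights, bias, [0.75, 0.2]))  # 1
print(predict_polarity(weights, bias, [0.25, 0.8]))  # 0
```

A deeper network would place hidden layers before this output neuron and train them jointly by backpropagation; the input (a word vector) and the output (a polarity label) stay the same.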

Cite this article:

Jiaheng Hu, Yonghua Cen, Chengyao Wu. Constructing Sentiment Dictionary with Deep Learning: Case Study of Financial Data. Data Analysis and Knowledge Discovery, 2018, 2(10): 95-102.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2018.0169     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2018/V2/I10/95

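| Dictionary | Positive words | Negative words |
|---|---|---|
| NTUSD | 2,811 | 8,277 |
| Tsinghua University (Li Jun) | 5,567 | 4,469 |
| HowNet (知网) sentiment dictionary | 836 | 1,254 |
| DUTIR | 11,229 | 10,783 |
| Total | 20,443 | 24,783 |
| Merged dictionary | 16,157 | 19,559 |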
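In the counts above, the merged dictionary (16,157 positive, 19,559 negative) is smaller than the raw totals (20,443 and 24,783), which is consistent with removing words shared by several source dictionaries. A minimal merge sketch follows. The source does not say how a word labelled positive in one dictionary and negative in another is handled, so dropping such conflicts is an assumption, and the word sets are toy examples rather than the real dictionary contents.

```python
def merge_lexicons(lexicons):
    """Union several (positive, negative) word collections into one deduplicated lexicon.

    Assumption (not stated in the source): a word that is positive in one source
    and negative in another is treated as ambiguous and removed from both sides.
    """
    positive, negative = set(), set()
    for pos_words, neg_words in lexicons:
        positive |= set(pos_words)
        negative |= set(neg_words)
    conflicts = positive & negative
    return positive - conflicts, negative - conflicts


# The table's "Total" row is a plain sum of the four source rows.
assert 2811 + 5567 + 836 + 11229 == 20443
assert 8277 + 4469 + 1254 + 10783 == 24783

# Toy stand-ins for source dictionaries (not their real contents).
sources = [
    ({"上涨", "稳健"}, {"亏损", "风险"}),  # rise, steady / loss, risk
    ({"稳健", "优秀"}, {"风险"}),          # steady, excellent / risk
    ({"波动"}, {"波动"}),                  # volatility: conflicting entry
]
merged_pos, merged_neg = merge_lexicons(sources)
raw_pos = sum(len(p) for p, _ in sources)
print(raw_pos, len(merged_pos), len(merged_neg))  # 5 3 2
```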
| Positive words in the intersection | Negative words in the intersection | Total |
|---|---|---|
| 3,128 | 2,850 | 5,978 |
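The intersection counts add up as 3,128 + 2,850 = 5,978. Assuming, as a hedged reading, that the intersection is the set of merged-dictionary words that also have a word vector, and that this labelled set is what the abstract divides 9:1 into training and forecasting sets, the sketch below builds and splits such a set. The placeholder words (`pos_0`, `neg_0`, ...) and the helper names are hypothetical.

```python
import random


def build_labelled_set(lexicon, vocabulary):
    """Keep dictionary words that also appear in the word-vector vocabulary."""
    return [(word, label) for word, label in sorted(lexicon.items()) if word in vocabulary]


def split_train_forecast(samples, train_ratio=0.9, seed=42):
    """Shuffle reproducibly, then cut into training and forecasting sets."""
    data = list(samples)
    random.Random(seed).shuffle(data)
    cut = int(len(data) * train_ratio)
    return data[:cut], data[cut:]


# Placeholder lexicon: 3,128 positive and 2,850 negative words that have vectors,
# plus 100 dictionary words with no vector, which are therefore dropped.
lexicon = {f"pos_{i}": 1 for i in range(3128)}
lexicon.update({f"neg_{i}": 0 for i in range(2850)})
lexicon.update({f"oov_{i}": 1 for i in range(100)})
vocabulary = {w for w in lexicon if not w.startswith("oov_")}

labelled = build_labelled_set(lexicon, vocabulary)
train, forecast = split_train_forecast(labelled)
print(len(labelled), len(train), len(forecast))  # 5978 5380 598
```

Under these assumptions, a 9:1 split of 5,978 words gives 5,380 training words and 598 forecasting words.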
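Financial-domain seed words
大涨 (surge), 大跌 (plunge), 股票 (stock), 平仓 (close a position), 牛市 (bull market), 熊市 (bear market), 走高 (trend higher), 拉升 (push up prices), 雄起 (rally strongly), 利好 (bullish news), 利空 (bearish news), 清仓 (sell off all holdings), 套牢 (locked in a losing position), 抄底 (buy at the bottom), 反弹 (rebound), 减持 (reduce holdings), 乏力 (lose momentum), 退市 (delist), 撤离 (pull out), 亏 (lose money)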
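| Metric | Direct word-vector similarity | Classifier built on word vectors |
|---|---|---|
| Accuracy | 0.529 | 0.742 |
| Precision | 0.641 | 0.889 |
| Recall | 0.500 | 0.821 |
| F1 | 0.538 | 0.610 |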
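The page does not describe the "direct word-vector similarity" baseline in detail. One plausible reading, assumed here, is to give each word the polarity of the seed set it is more similar to on average. The sketch also computes the four metrics in the table using their standard binary definitions with positive polarity as class 1; the source does not state whether its figures are per-class or averaged, so this is not a reproduction of its numbers. All vectors are toy values and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_polarity(vec, positive_seeds, negative_seeds):
    """Assumed baseline: 1 if the word is, on average, closer to the positive seeds."""
    pos = sum(cosine(vec, s) for s in positive_seeds) / len(positive_seeds)
    neg = sum(cosine(vec, s) for s in negative_seeds) / len(negative_seeds)
    return 1 if pos >= neg else 0


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 with label 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


# Toy example: one positive and one negative seed vector, four labelled words.
positive_seeds = [[1.0, 0.0]]
negative_seeds = [[0.0, 1.0]]
words = [([0.9, 0.2], 1), ([0.6, 0.5], 0), ([0.1, 0.8], 0), ([0.4, 0.6], 1)]
y_true = [label for _, label in words]
y_pred = [similarity_polarity(vec, positive_seeds, negative_seeds) for vec, _ in words]
print(y_pred, binary_metrics(y_true, y_pred))  # [1, 1, 0, 0] (0.5, 0.5, 0.5, 0.5)
```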
| Threshold N | Candidate words retained after filtering (count) |
|---|---|
| 10 | 93 |
| 20 | 150 |
| 30 | 197 |
| 40 | 252 |
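The candidate counts rise with N (93, 150, 197 and 252 for N = 10 to 40). A minimal sketch, assuming N is the number of nearest neighbours (by word-vector cosine similarity) kept for each seed word, and that neighbours already present in the general-purpose dictionary are filtered out; the page does not spell out either step. The vectors and the helper names `top_n_neighbours` and `candidate_words` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_n_neighbours(word, vectors, n):
    """The n words most cosine-similar to `word` (ties broken alphabetically)."""
    target = vectors[word]
    scored = [(cosine(target, vec), other) for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:n]]


def candidate_words(seeds, vectors, known_lexicon, n):
    """Union of each seed's top-n neighbours, minus the seeds and already-known words."""
    pool = set()
    for seed in seeds:
        if seed in vectors:
            pool.update(top_n_neighbours(seed, vectors, n))
    return pool - set(seeds) - set(known_lexicon)


# Toy 2-D vectors; real word vectors would be trained on a financial corpus.
vectors = {
    "利好": [1.0, 0.0], "利空": [0.0, 1.0],      # seed words
    "上涨": [0.95, 0.05], "看涨": [0.9, 0.1],    # rise, expect a rise
    "走强": [0.8, 0.3], "看跌": [0.1, 0.9],      # strengthen, expect a fall
    "走弱": [0.2, 0.85], "股价": [0.5, 0.5],     # weaken, share price
}
seeds = ["利好", "利空"]
known_lexicon = {"上涨"}  # assumed to be covered by the general-purpose dictionary
for n in (1, 2, 3):
    print(n, len(candidate_words(seeds, vectors, known_lexicon, n)))  # 1 1 / 2 3 / 3 5
```

As in the table, a larger N admits more candidates, including less sentiment-bearing words such as 股价 (share price), which is why the candidates still need a polarity classifier afterwards.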