Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (10): 95-102    DOI: 10.11925/infotech.2096-3467.2018.0169
Research Paper
Constructing Sentiment Dictionary with Deep Learning: Case Study of Financial Data*
Jiaheng Hu1, Yonghua Cen1, Chengyao Wu2
1 School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094, China
2 College of Finance, Nanjing Agricultural University, Nanjing 210095, China
Abstract

[Objective] This paper constructs a sentiment dictionary suited to sentiment analysis tasks in a specific domain. [Methods] Taking the financial domain as an example, we combine the characteristics of corpora and knowledge bases to propose a new method for building a sentiment dictionary. Word vectors map the text into a vector space. An existing general-purpose sentiment dictionary is then used to label the training corpus automatically, and the labeled data are divided into a training set and a prediction set at a ratio of 9:1. A deep neural network classifier built in Python determines the sentiment polarity of domain-specific candidate words, which together form the dictionary. [Results] The classifier reached an accuracy of 95.02% on the training set and 95.00% on the prediction set, and the resulting dictionary outperformed existing methods in the financial domain. [Limitations] The method for extracting seed words needs further optimization. [Conclusions] The proposed method addresses the shortage of training data for neural network classifiers, as well as the inability of word-vector semantic relatedness to distinguish sentiment information. It performs well in building domain-specific sentiment dictionaries and offers a reference for other research in this area.

Keywords: Sentiment Dictionary; Deep Learning; Financial Domain; Word Vector; Neural Network
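Received: 2018-02-09
Funding: *This work is one of the outcomes of the National Natural Science Foundation of China project "Limited Investor Attention and Securities Market Regulation: Methods Based on Big Data and Computational Experiments" (No. 71503130), the National Natural Science Foundation of China project "Mechanisms of Distortion and Bias in Individual Information Cognitive Processing under Social Influence" (No. 71471089), and the National Social Science Fund of China major project "Research on Data Science Theories and Methods for Knowledge Innovation Services" (No. 16ZDA224).
Cite this article:
Jiaheng Hu, Yonghua Cen, Chengyao Wu. Constructing Sentiment Dictionary with Deep Learning: Case Study of Financial Data[J]. Data Analysis and Knowledge Discovery, 2018, 2(10): 95-102. DOI: 10.11925/infotech.2096-3467.2018.0169.
Link to this article: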
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2018.0169
Figure 1  Workflow for constructing the domain dictionary
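Dictionary                        Positive words   Negative words
NTUSD                                      2,811            8,277
Tsinghua University (Li Jun)               5,567            4,469
HowNet sentiment dictionary                  836            1,254
DUTIR                                     11,229           10,783
Total                                     20,443           24,783
Merged dictionary                         16,157           19,559
Table 1  General-purpose sentiment dictionaries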
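The merged dictionary in Table 1 is smaller than the sum of its four sources, which suggests that overlapping entries were collapsed. This page does not state the merging rule, so the sketch below rests on an assumption: duplicates are removed, and any word labeled positive in one dictionary and negative in another is dropped. The function name `merge_dictionaries` and the toy entries are hypothetical.

```python
from typing import Iterable, Set, Tuple


def merge_dictionaries(
    dictionaries: Iterable[Tuple[Set[str], Set[str]]],
) -> Tuple[Set[str], Set[str]]:
    """Merge (positive, negative) word sets from several dictionaries.

    Assumption (not confirmed by the source): duplicates collapse, and
    words with conflicting labels are dropped from both sides.
    """
    positive: Set[str] = set()
    negative: Set[str] = set()
    for pos_words, neg_words in dictionaries:
        positive |= pos_words
        negative |= neg_words
    conflicts = positive & negative
    return positive - conflicts, negative - conflicts


# Toy stand-ins for the real dictionaries (hypothetical entries).
dict_a = ({"上涨", "利好", "稳定"}, {"下跌", "亏损"})
dict_b = ({"上涨", "增长"}, {"亏损", "稳定"})

merged_pos, merged_neg = merge_dictionaries([dict_a, dict_b])
```

In this toy case "稳定" carries both labels and is discarded, while the repeated entries "上涨" and "亏损" are each kept once.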
Figure 2  Network structure of the neural network classifier
Positive words in the intersection   Negative words in the intersection   Total
                             3,128                                2,850   5,978
Table 2  Dictionary words that appear in the corpus
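The abstract describes labeling the training data automatically with the general dictionary and splitting it 9:1, and Table 2 counts the dictionary words that occur in the corpus. A minimal sketch of those two steps follows. It assumes that each word in the intersection becomes one labeled sample whose feature is its word vector; the helper names, the toy vocabulary, and the 2-dimensional vectors are all hypothetical.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple

# (word vector, label) where 1 = positive and 0 = negative
Sample = Tuple[List[float], int]


def label_samples(
    vectors: Dict[str, List[float]],
    positive: Set[str],
    negative: Set[str],
) -> List[Sample]:
    """Label every corpus word that also appears in the general dictionary."""
    samples: List[Sample] = []
    for word in sorted(vectors):  # sorted for a reproducible order
        if word in positive:
            samples.append((vectors[word], 1))
        elif word in negative:
            samples.append((vectors[word], 0))
    return samples


def split_9_to_1(
    samples: Sequence[Sample], seed: int = 0
) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle, then split into 90% training and 10% prediction samples."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 9 // 10
    return shuffled[:cut], shuffled[cut:]


# Toy corpus vocabulary with made-up 2-dimensional vectors.
toy_vectors = {f"w{i}": [float(i), float(-i)] for i in range(20)}
toy_positive = {f"w{i}" for i in range(0, 10)}
toy_negative = {f"w{i}" for i in range(10, 18)}  # w18 and w19 stay unlabeled

labeled = label_samples(toy_vectors, toy_positive, toy_negative)
train_set, predict_set = split_9_to_1(labeled)
```

Applied to the 5,978 labeled words in Table 2, the same integer split would give roughly 5,380 training and 598 prediction samples; how the authors rounded is not stated here.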
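Seed words for the financial domain
大涨 (surge), 大跌 (plunge), 股票 (stock), 平仓 (close a position), 牛市 (bull market), 熊市 (bear market), 走高 (climb), 拉升 (pull up), 雄起 (rally strongly), 利好 (bullish news), 利空 (bearish news), 清仓 (liquidate holdings), 套牢 (trapped in a losing position), 抄底 (bottom fishing), 反弹 (rebound), 减持 (reduce holdings), 乏力 (weak), 退市 (delist), 撤离 (withdraw), 亏 (loss)
Table 3  Seed words for the financial domain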
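Metric      Word-vector similarity used directly   Classifier built on word vectors
Accuracy                                   0.529                              0.742
Precision                                  0.641                              0.889
Recall                                     0.500                              0.821
F1                                         0.538                              0.610
Table 4  Performance of sentiment dictionaries built with different methods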
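For reference, the four metrics named in Table 4 can be computed from predicted and gold polarity labels as sketched below. The sketch assumes a binary setting in which the positive class is the class of interest. This page does not say how the reported figures were averaged, so values from this function need not match the table. The function name `binary_metrics` and the toy labels are hypothetical.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for labels in {0, 1}."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy gold labels and predictions (made up for illustration).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
scores = binary_metrics(y_true, y_pred)
```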
Threshold N   Candidate words after filtering
10                                         93
20                                        150
30                                        197
40                                        252
Table 5  Number of candidate words
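Table 5 does not define N on this page. One plausible reading, assumed here and not confirmed by the source, is that N is the number of nearest neighbours retrieved for each seed word in the word-vector space, after which seed words and words already in the general dictionary are filtered out. The sketch below illustrates that reading with plain cosine similarity. The function names and the 2-dimensional toy vectors are hypothetical, and "上涨" stands in for a word already covered by the general dictionary.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_n_candidates(
    vectors: Dict[str, List[float]],
    seeds: Sequence[str],
    known_words: Set[str],
    n: int,
) -> Set[str]:
    """Union of each seed's n nearest neighbours, minus seeds and known words."""
    candidates: Set[str] = set()
    for seed in seeds:
        if seed not in vectors:
            continue
        ranked = sorted(
            (w for w in vectors if w != seed),
            key=lambda w: cosine(vectors[seed], vectors[w]),
            reverse=True,
        )
        candidates.update(ranked[:n])
    return candidates - set(seeds) - known_words


# Made-up vectors: bullish words point right, bearish words point left.
toy_vectors = {
    "牛市": [1.0, 0.0], "走高": [0.9, 0.1], "拉升": [0.8, 0.3],
    "熊市": [-1.0, 0.0], "大跌": [-0.9, 0.1], "套牢": [-0.8, 0.3],
    "上涨": [0.95, 0.05],
}
candidates_n2 = top_n_candidates(toy_vectors, ["牛市", "熊市"], {"上涨"}, n=2)
candidates_n3 = top_n_candidates(toy_vectors, ["牛市", "熊市"], {"上涨"}, n=3)
```

Under this reading, raising N can only add candidates, which is consistent with the growth from 93 to 252 in Table 5.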
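Figure 3  Evaluation metrics of the sentiment dictionaries built by the neural network classifier and the SVM classifier under different thresholds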