Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (7): 126-138     https://doi.org/10.11925/infotech.2096-3467.2020.0907
Research Paper
Predicting Stock Trends with CNN-BiLSTM Based Multi-Feature Integration Model*
Xu Yuemei1, Wang Zihou2, Wu Zixin1
1School of Information Science and Technology, Beijing Foreign Studies University, Beijing 100089, China
2National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China
Abstract

[Objective] Building on traditional analysis of numerical stock market data, this paper studies the impact of news on the stock market, aiming to improve the accuracy of stock trend prediction. [Methods] We use a Convolutional Neural Network (CNN) and a Bi-directional Long Short-Term Memory (Bi-LSTM) model to mine news event types and news sentiment orientation from financial news, and propose a stock prediction model that deeply fuses stock financial data, news event features, and news sentiment features. To examine the model's feasibility for individual stocks in different industries, we test it on two stocks: GREE Electric Appliances (household appliance industry) and ZTE (communications industry). [Results] Adding news event and sentiment features further improves prediction accuracy, by 11.6% for the household appliance stock and by 25.6% for the communications stock. [Limitations] The model does not consider how different prediction periods affect stock prediction. [Conclusions] Introducing news event types and sentiment orientation improves the performance of stock trend prediction. The paper also evaluates the factors that influence stock trends and ranks the importance of the features used for prediction.

Keywords: Deep Learning; Feature Combination; Sentiment Analysis; Stock Trends
Received: 2020-09-15      Published: 2021-04-15
CLC number (ZTFLH):  TP393
Funding: *First-Class Discipline Construction Project of Beijing Foreign Studies University (YY19ZZA012)
Corresponding author: Xu Yuemei, ORCID: 0000-0002-0223-7146     E-mail: xuyuemei@bfsu.edu.cn
Cite this article:
Xu Yuemei, Wang Zihou, Wu Zixin. Predicting Stock Trends with CNN-BiLSTM Based Multi-Feature Integration Model[J]. Data Analysis and Knowledge Discovery, 2021, 5(7): 126-138.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2020.0907      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2021/V5/I7/126
Fig.1  Stock trend prediction workflow fusing news event and sentiment features
|  | $D_1$ | $D_2$ | $D_j$ | $D_p$ |
|---|---|---|---|---|
| $T_1$ | $\bar{d}_{11}$ | $\bar{d}_{12}$ | $\bar{d}_{1j}$ | $\bar{d}_{1p}$ |
| $T_i$ | $\bar{d}_{i1}$ | $\bar{d}_{i2}$ | $\bar{d}_{ij}$ | $\bar{d}_{ip}$ |
| $T_n$ | $\bar{d}_{n1}$ | $\bar{d}_{n2}$ | $\bar{d}_{nj}$ | $\bar{d}_{np}$ |

Table 1  Stock financial feature matrix
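| Event category | Event names |
|---|---|
| Trading | Trading suspension; trading resumption; capital inflow; capital outflow; block trade; stock price inversion; new high |
| Equity | Listing; backdoor listing; stake-building disclosure; mergers and acquisitions; asset restructuring; asset freeze; equity transfer |
| Investment and financing | Investment; construction investment; winning a bid; bond issuance; stock issuance; convertible bonds; fundraising; share pledge; dividend |
| Corporate affairs | Change in registered capital; rapid growth; strategic cooperation; business expansion; executive share reduction or departure |
| External events | Appearing on the Dragon-Tiger list; exchange penalty; rating upgrade; rating downgrade; favorable policy |

Table 2  News event types (partial list)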
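|  | $S_1$ | $S_2$ | $S_j$ | $S_q$ |
|---|---|---|---|---|
| $T_1$ | $s_{11}$ | $s_{12}$ | $s_{1j}$ | $s_{1q}$ |
| $T_i$ | $s_{i1}$ | $s_{i2}$ | $s_{ij}$ | $s_{iq}$ |
| $T_n$ | $s_{n1}$ | $s_{n2}$ | $s_{nj}$ | $s_{nq}$ |

Table 3  News event feature matrix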
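Tables 1 and 3 share the same row index (T_1 to T_n), so one simple way to combine them is row-wise concatenation. The sketch below is only an illustration under that assumption: the page does not specify how the model fuses its inputs, and the function name `fuse_features`, the single per-row sentiment score, and the toy values are all hypothetical.

```python
from typing import List


def fuse_features(financial: List[List[float]],
                  events: List[List[float]],
                  sentiment: List[float]) -> List[List[float]]:
    """Concatenate, row by row, financial, event and sentiment features.

    Assumption: fusion is plain concatenation, and sentiment is one
    score per row. Neither detail is confirmed by the source page.
    """
    if not (len(financial) == len(events) == len(sentiment)):
        raise ValueError("all inputs must have the same number of rows n")
    return [d_row + s_row + [score]
            for d_row, s_row, score in zip(financial, events, sentiment)]


# Toy data (hypothetical): n = 2 rows, p = 3 financial features, q = 2 event types.
financial = [[0.1, -0.4, 1.2], [0.3, 0.0, -0.7]]
events = [[1.0, 0.0], [0.0, 1.0]]
sentiment = [0.8, -0.5]

fused = fuse_features(financial, events, sentiment)
print(len(fused), len(fused[0]))  # rows, and p + q + 1 columns per row
```

Each fused row has p + q + 1 entries, which is the width a downstream sequence model would receive per time step under this assumption.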
Fig.2  Bi-LSTM-based news sentiment analysis model
Fig.3  Sampling period of the stock prediction model (schematic)
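| Parameter | CNN | Bi-LSTM |
|---|---|---|
| Word vector dimension | 300 | 300 |
| Number of convolution kernels | 96 | N/A |
| Convolution kernel sizes | 3, 4, 5 | N/A |
| Dropout | 0.5 | 0.5 |
| Batch_size | 128 | 128 |
| Training iterations | 10 | 20 |
| Headline truncation length | N/A | 15 |
| Neurons per LSTM layer | N/A | [256, 256] |

Table 4  Model parameter settings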
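The settings in Table 4 can be captured as two small configuration objects. This is a minimal sketch, not the authors' code: the class and field names are hypothetical. The two derived dimensions rest on common architectural assumptions that the page does not confirm: one max-pooled value per kernel per kernel size for the CNN, and concatenated forward and backward states for the Bi-LSTM.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CNNConfig:
    """CNN column of Table 4 (field names are illustrative)."""
    embedding_dim: int = 300
    num_kernels: int = 96
    kernel_sizes: Tuple[int, ...] = (3, 4, 5)
    dropout: float = 0.5
    batch_size: int = 128
    iterations: int = 10

    def pooled_feature_dim(self) -> int:
        # Assumption: max-over-time pooling yields one value per kernel,
        # and outputs for all kernel sizes are concatenated.
        return self.num_kernels * len(self.kernel_sizes)


@dataclass(frozen=True)
class BiLSTMConfig:
    """Bi-LSTM column of Table 4 (field names are illustrative)."""
    embedding_dim: int = 300
    hidden_units: Tuple[int, ...] = (256, 256)
    max_title_len: int = 15
    dropout: float = 0.5
    batch_size: int = 128
    iterations: int = 20

    def output_dim(self) -> int:
        # Assumption: forward and backward hidden states of the last
        # layer are concatenated.
        return 2 * self.hidden_units[-1]


cnn_cfg = CNNConfig()
lstm_cfg = BiLSTMConfig()
print(cnn_cfg.pooled_feature_dim(), lstm_cfg.output_dim())
```

Freezing the dataclasses keeps the reported settings from being changed by accident during experiments.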
| Dataset | SVM | Maxent | CNN |
|---|---|---|---|
| Training set | 90.8% | 72.0% | 93.0% |
| Test set | 85.2% | 69.4% | 87.7% |

Table 5  Precision of news event classification by model
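| Well-classified event type | Precision | Recall | F1 | Poorly classified event type | Precision | Recall | F1 |
|---|---|---|---|---|---|---|---|
| Appearing on the Dragon-Tiger list | 1.00 | 1.00 | 1.00 | Earnings decline | 0.64 | 0.58 | 0.61 |
| Trading suspension | 0.98 | 1.00 | 0.99 | Favorable policy | 0.81 | 0.65 | 0.72 |
| Business registration change | 1.00 | 1.00 | 1.00 | Capital change | 1.00 | 0.22 | 0.36 |
| Winning a bid | 1.00 | 1.00 | 1.00 | Executive appointment | 0.50 | 0.40 | 0.44 |
| Convertible bonds | 0.97 | 0.97 | 0.97 | Earnings growth | 0.68 | 0.73 | 0.71 |
| Share pledge | 1.00 | 1.00 | 1.00 | Expected decline | 0.67 | 0.61 | 0.64 |
| Exchange inquiry | 0.94 | 1.00 | 0.97 | Unfavorable news (利差消息) | 0.42 | 0.47 | 0.44 |
| Delisting | 1.00 | 1.00 | 1.00 | Favorable news (利好消息) | 0.46 | 0.65 | 0.54 |

Table 6  Performance statistics for news event classification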
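The F1 column in Table 6 is the harmonic mean of precision and recall. The sketch below recomputes it for three rows of the table. The function name and the dictionary layout are illustrative only, and small differences from the table are possible for other rows because the published precision and recall are already rounded.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from Table 6.
rows = {
    "Capital change": (1.00, 0.22),
    "Executive appointment": (0.50, 0.40),
    "Trading suspension": (0.98, 1.00),
}

f1_by_event = {name: round(f1_score(p, r), 2) for name, (p, r) in rows.items()}
for name, f1 in f1_by_event.items():
    print(f"{name}: F1 = {f1:.2f}")
```

The "Capital change" row shows why precision alone can mislead: perfect precision with a recall of 0.22 still gives a low F1.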
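| Dataset | SVM precision | Maxent precision | Bi-LSTM precision |
|---|---|---|---|
| Training set | 86.6% | 82.8% | 99.0% |
| Test set | 81.1% | 76.1% | 91.0% |

Table 7  Precision of news sentiment classification by model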
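| Stock | LSTM with financial features | LSTM with news events added | GBDT with fused news event and sentiment features | LSTM with fused news event and sentiment features |
|---|---|---|---|---|
| GREE Electric Appliances | 0.6998 | 0.7545 | 0.6257 | 0.7812 |
| ZTE | 0.6467 | 0.7851 | 0.6545 | 0.8127 |

Table 8  Stock trend prediction precision of different models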
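The 11.6% and 25.6% improvements quoted in the abstract appear consistent with the relative gain of the fused LSTM over the financial-only LSTM in Table 8. The sketch below checks that arithmetic. Reading the gains as relative rather than absolute is an inference from the numbers, not something the page states, and `relative_gain` is a hypothetical helper name.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, as a fraction."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (improved - baseline) / baseline


# Values from Table 8: financial-only LSTM vs. fused event/sentiment LSTM.
gree_gain = relative_gain(0.6998, 0.7812)
zte_gain = relative_gain(0.6467, 0.8127)

print(f"GREE: {gree_gain:.4f}")
print(f"ZTE:  {zte_gain:.4f}")
```

Under this reading GREE comes out at about 11.6%, and ZTE at just under 25.7%, which matches the reported 25.6% if the figure was truncated rather than rounded.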
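Fig.4  Example trend prediction for GREE Electric Appliances (000651.SZ)
Fig.5  Example trend prediction for ZTE (000063.SZ)
Fig.6  Feature importance ranking produced by GBDT
Fig.7  Effect of the price-change threshold on the stock trend prediction model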