Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (12): 33-42     https://doi.org/10.11925/infotech.2096-3467.2018.0420
Research Paper
Predicting Stock Prices with Text and Price Combined Model*
Yu Chuanming¹, Gong Yutian¹, Wang Feng¹, An Lu²
¹ School of Information and Safety Engineering, Zhongnan University of Economics and Law, Wuhan 430073, China
² School of Information Management, Wuhan University, Wuhan 430072, China
Abstract

[Objective] This paper aims to improve the accuracy of stock price prediction over traditional models, reduce trading risk, and study stock price trends in a big-data environment. [Methods] We propose a new Text and Price Combined Model (TPCM). After preprocessing comments retrieved from a stock forum, the model uses deep representation learning to generate a feature matrix for the comment text and the K-means clustering method to generate text categories. It then combines these categories with 15 original price indicators, such as the opening price and closing price, and uses a Multi-Layer Perceptron (MLP) to predict the stock price trend. [Results] The accuracy of TPCM was 65.91%, which is 7.76 percentage points higher than that of the model using price features only (58.15%) and 11.37 percentage points higher than that of the model using text features only (54.54%). [Limitations] The study examined the proposed model on a single stock only. [Conclusions] Combining text and price information can improve stock trend prediction, and it offers researchers and practitioners a new method and perspective.

Keywords: Text; Stock Price; Stock Price Fluctuation Prediction; Text and Price Combined Model
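The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vectors, the use of three price indicators per day (the paper uses 15), the farthest-point initialisation, and the one-hot encoding of the cluster label are all assumptions made for illustration, and the deep representation learning and MLP stages are left out.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic initialisation (an assumption): start from the first
    point, then repeatedly add the point farthest from the chosen centroids."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, iterations=10):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    centroids = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def fuse(price_rows, text_vectors, k):
    """Append a one-hot text-cluster label to each day's price indicators."""
    _, labels = kmeans(text_vectors, k)
    fused = []
    for prices, label in zip(price_rows, labels):
        one_hot = [1.0 if c == label else 0.0 for c in range(k)]
        fused.append(list(prices) + one_hot)
    return fused, labels


# Hypothetical toy data: 2-D "text vectors" in two obvious groups and
# three price indicators per trading day.
text_vectors = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
price_rows = [[10.0, 10.2, 1.0], [10.2, 10.1, 1.1], [10.1, 10.5, 0.9], [10.5, 10.4, 1.2]]
fused, labels = fuse(price_rows, text_vectors, k=2)
```

Each fused row could then be passed to a classifier such as an MLP. For the toy data, `labels` is `[0, 0, 1, 1]` and every fused row has five entries (three price values plus a two-way one-hot label).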
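Received: 2018-04-16      Published: 2019-01-16
CLC number: TP391.1
Funding: *This paper is one of the research outcomes of the National Natural Science Foundation of China General Program project "Research on Opinion Retrieval Based on Domain Knowledge Acquisition and Alignment in the Big Data Environment" (Grant No. 71373286) and the Zhongnan University of Economics and Law research project "Research on Quantitative Investment Strategies for Securities Trading" (Grant No. 3251612007).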
Cite this article:
Yu Chuanming, Gong Yutian, Wang Feng, An Lu. Predicting Stock Prices with Text and Price Combined Model[J]. Data Analysis and Knowledge Discovery, 2018, 2(12): 33-42.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2018.0420
or
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2018/V2/I12/33
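  The Text and Price Combined Model
  Examples of text comments
  Model prediction accuracy as a function of the lag (in days) of the stock price technical indicators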
  Accuracy of the text prediction model as a function of time lag

Lag (days)   LSTM      Bi-LSTM
1            46.67%    52.22%
2            50.56%    44.94%
3            53.41%    54.54%
4            45.98%    49.43%
5            45.34%    43.02%
6            45.88%    45.88%
7            45.23%    39.29%
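The lag experiment can be read as pairing the text of an earlier day with the trend label of a later day. The sketch below shows one such alignment and then looks up the best lag for each model from the accuracies reported on this page. The reading of "n-day lag" as "features from day t - n, label from day t" is an assumption, as are the function names and the toy inputs.

```python
def make_lagged_pairs(features, labels, lag):
    """Pair the features of day t - lag with the trend label of day t."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    return [(features[t - lag], labels[t]) for t in range(lag, len(labels))]


# Hypothetical toy series: one feature placeholder and one up/down label per day.
daily_features = ["f0", "f1", "f2", "f3", "f4"]
daily_trend = [1, 0, 1, 1, 0]
pairs = make_lagged_pairs(daily_features, daily_trend, lag=2)

# Accuracy (%) by lag as reported on this page: lag -> (LSTM, Bi-LSTM).
ACCURACY_BY_LAG = {
    1: (46.67, 52.22),
    2: (50.56, 44.94),
    3: (53.41, 54.54),
    4: (45.98, 49.43),
    5: (45.34, 43.02),
    6: (45.88, 45.88),
    7: (45.23, 39.29),
}


def best_lag(model_index):
    """Return the lag with the highest accuracy (0 = LSTM, 1 = Bi-LSTM)."""
    return max(ACCURACY_BY_LAG, key=lambda lag: ACCURACY_BY_LAG[lag][model_index])


best_lstm_lag = best_lag(0)
best_bilstm_lag = best_lag(1)
```

Both models peak at a lag of 3 days, and the Bi-LSTM value at that lag (54.54%) is the text-only accuracy quoted in the abstract.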
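  Prediction results of each algorithm

Features     Algorithm     P        R        F        ACC      AUC
Price        AdaBoosting   45.12%   55.69%   50.89%   56.18%   56.36%
Price        DT            59.86%   58.12%   58.49%   57.90%   53.99%
Price        KNN           56.32%   56.42%   56.36%   54.68%   50.00%
Price        NB            40.58%   47.76%   44.63%   47.73%   52.00%
Price        SVM           54.68%   55.83%   55.46%   54.62%   53.31%
Price        MLP           57.67%   58.22%   58.09%   58.15%   50.00%
Price+Text   AdaBoosting   50.65%   51.53%   49.58%   51.21%   57.95%
Price+Text   DT            41.94%   42.05%   41.86%   42.05%   42.06%
Price+Text   KNN           51.17%   51.14%   50.83%   51.14%   51.12%
Price+Text   NB            59.17%   59.09%   59.09%   59.09%   56.03%
Price+Text   SVM           57.37%   56.82%   56.00%   56.82%   55.68%
Price+Text   TPCM(MLP)     66.78%   65.91%   65.46%   65.91%   62.66%

P: precision; R: recall; F: F-measure; ACC: accuracy; AUC: area under the ROC curve. DT: decision tree; KNN: k-nearest neighbors; NB: naive Bayes; SVM: support vector machine; MLP: multi-layer perceptron.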
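The margins quoted in the abstract can be re-derived from the accuracy (ACC) column. The short check below uses only values reported on this page; the dictionary layout and the variable names are illustrative.

```python
# Accuracy (%) per feature set and algorithm, as reported on this page.
ACC = {
    "Price": {
        "AdaBoosting": 56.18, "DT": 57.90, "KNN": 54.68,
        "NB": 47.73, "SVM": 54.62, "MLP": 58.15,
    },
    "Price+Text": {
        "AdaBoosting": 51.21, "DT": 42.05, "KNN": 51.14,
        "NB": 59.09, "SVM": 56.82, "TPCM(MLP)": 65.91,
    },
}
# Text-only accuracy quoted in the abstract (Bi-LSTM at a 3-day lag).
TEXT_ONLY_ACC = 54.54


def best(group):
    """Return (algorithm, accuracy) for the most accurate algorithm in a group."""
    name = max(ACC[group], key=ACC[group].get)
    return name, ACC[group][name]


price_name, price_acc = best("Price")
fused_name, fused_acc = best("Price+Text")

# Gains in percentage points, rounded to the two decimals used in the abstract.
gain_over_price = round(fused_acc - price_acc, 2)
gain_over_text = round(fused_acc - TEXT_ONLY_ACC, 2)
```

The best price-only model is the MLP and the best combined model is TPCM(MLP); the differences are 7.76 and 11.37 percentage points, matching the abstract.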
  Comparison of prediction results with and without text cluster labels
  Comparison of prediction results for different text vector dimensions