Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (7): 18-27    DOI: 10.11925/infotech.2096-3467.2020.0323
Forecasting Poultry Turnovers with Machine Learning and Multiple Factors
Chen Dong1, Wang Jiandong1, Li Huiying1, Cai Sihang1, Huang Qianqian1, Yi Chengqi1, Cao Pan2,3
1Big Data Development Department, State Information Center, Beijing 100045, China
2Chongqing Western Institute of Big Data Advanced Application, Chongqing 401100, China
3Beijing Yidianying Technology Co., Ltd, Beijing 100073, China
Abstract  

[Objective] This paper forecasts trends in the poultry market under the influence of multiple factors, aiming to support decision making and policy for livestock and poultry production. [Methods] We selected 50 variables and built models with several popular machine learning algorithms to predict the daily turnover of dressed chicken. [Results] GBRT, Random Forest, and Elastic Net yielded stable predictions, with MAEs of 25.30, 26.67, and 28.21, respectively. Prediction accuracy improved with larger training sets and longer training time. The models could forecast turnover three periods in advance. [Limitations] The training sets need to include more features and more historical data. [Conclusions] The proposed models can quantitatively assess and forecast the impact of emergencies on industrial output, which can improve governmental policy making.
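
The MAE (mean absolute error) reported above is the average absolute gap between predicted and observed turnover. A minimal sketch of the calculation follows; the turnover values are hypothetical and are not taken from the paper.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over paired observations."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical daily turnovers (units unspecified), for illustration only.
actual = [100.0, 120.0, 90.0, 110.0]
predicted = [110.0, 115.0, 80.0, 130.0]

mae = mean_absolute_error(actual, predicted)
print(mae)
```

A lower MAE means the predictions sit closer to the observed values on average, in the same units as the target.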

Key words: Forecasting; Machine Learning; Dressed Chicken
Received: 16 April 2020      Published: 25 July 2020
ZTFLH:  TP393  
Corresponding Authors: Wang Jiandong     E-mail: wangjd@sic.gov.cn

Cite this article:

Chen Dong, Wang Jiandong, Li Huiying, Cai Sihang, Huang Qianqian, Yi Chengqi, Cao Pan. Forecasting Poultry Turnovers with Machine Learning and Multiple Factors. Data Analysis and Knowledge Discovery, 2020, 4(7): 18-27.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2020.0323     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2020/V4/I7/18

General Train of Thought
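Feature category | Feature No. | Feature name | Description
Market entity | F1 | BREEDING_ADD_YOY | Year-on-year change in the number of newly registered poultry (chicken, duck, etc.) breeding enterprises and individual businesses
Market entity | F2 | BREEDING_CANCEL_REVOKE_YOY | Year-on-year change in the number of deregistered and revoked poultry (chicken, duck, etc.) breeding enterprises and individual businesses
Market entity | F3 | BREEDING_RECRUIT_YOY | Year-on-year change in the number of job postings by poultry (chicken, duck, etc.) breeding enterprises and individual businesses
Market entity | F4 | FEED_ADD_YOY | Year-on-year change in the number of newly registered poultry feed enterprises and individual businesses
Market entity | F5 | FEED_CANCEL_REVOKE_YOY | Year-on-year change in the number of deregistered and revoked poultry feed enterprises and individual businesses
Market entity | F6 | FEED_RECRUIT_YOY | Year-on-year change in the number of job postings by poultry feed enterprises and individual businesses
Market entity | F7 | SLAUGHTER_ADD_YOY | Year-on-year change in the number of newly registered poultry slaughtering and processing enterprises and individual businesses
Market entity | F8 | SLAUGHTER_CANCEL_REVOKE_YOY | Year-on-year change in the number of deregistered and revoked poultry slaughtering and processing enterprises and individual businesses
Market entity | F9 | SLAUGHTER_RECRUIT_YOY | Year-on-year change in the number of job postings by poultry slaughtering and processing enterprises and individual businesses
Market entity | F10 | CHICK_ADD_YOY | Year-on-year change in the number of newly registered chick and breeder-chicken enterprises and individual businesses
Market entity | F11 | CHICK_CANCEL_REVOKE_YOY | Year-on-year change in the number of deregistered and revoked chick and breeder-chicken enterprises and individual businesses
Market entity | F12 | CHICK_RECRUIT_YOY | Year-on-year change in the number of job postings by chick and breeder-chicken enterprises and individual businesses
Market entity | F13 | MEDICINE_ADD_YOY | Year-on-year change in the number of newly registered poultry medicine manufacturers and individual businesses
Market entity | F14 | MEDICINE_CANCEL_REVOKE_YOY | Year-on-year change in the number of deregistered and revoked poultry medicine manufacturers and individual businesses
Market entity | F15 | MEDICINE_RECRUIT_YOY | Year-on-year change in the number of job postings by poultry medicine manufacturers and individual businesses
Market entity | F16 | BREEDING_ADD_QOQ | Period-on-period change in the number of newly registered poultry (chicken, duck, etc.) breeding enterprises and individual businesses
Market entity | F17 | BREEDING_CANCEL_REVOKE_QOQ | Period-on-period change in the number of deregistered and revoked poultry (chicken, duck, etc.) breeding enterprises and individual businesses
Market entity | F18 | BREEDING_RECRUIT_QOQ | Period-on-period change in the number of job postings by poultry (chicken, duck, etc.) breeding enterprises and individual businesses
Market entity | F19 | FEED_ADD_QOQ | Period-on-period change in the number of newly registered poultry feed enterprises and individual businesses
Market entity | F20 | FEED_CANCEL_REVOKE_QOQ | Period-on-period change in the number of deregistered and revoked poultry feed enterprises and individual businesses
Market entity | F21 | FEED_RECRUIT_QOQ | Period-on-period change in the number of job postings by poultry feed enterprises and individual businesses
Market entity | F22 | SLAUGHTER_ADD_QOQ | Period-on-period change in the number of newly registered poultry slaughtering and processing enterprises and individual businesses
Market entity | F23 | SLAUGHTER_CANCEL_REVOKE_QOQ | Period-on-period change in the number of deregistered and revoked poultry slaughtering and processing enterprises and individual businesses
Market entity | F24 | SLAUGHTER_RECRUIT_QOQ | Period-on-period change in the number of job postings by poultry slaughtering and processing enterprises and individual businesses
Market entity | F25 | CHICK_ADD_QOQ | Period-on-period change in the number of newly registered chick and breeder-chicken enterprises and individual businesses
Market entity | F26 | CHICK_CANCEL_REVOKE_QOQ | Period-on-period change in the number of deregistered and revoked chick and breeder-chicken enterprises and individual businesses
Market entity | F27 | CHICK_RECRUIT_QOQ | Period-on-period change in the number of job postings by chick and breeder-chicken enterprises and individual businesses
Market entity | F28 | MEDICINE_ADD_QOQ | Period-on-period change in the number of newly registered poultry medicine manufacturers and individual businesses
Market entity | F29 | MEDICINE_CANCEL_REVOKE_QOQ | Period-on-period change in the number of deregistered and revoked poultry medicine manufacturers and individual businesses
Market entity | F30 | MEDICINE_RECRUIT_QOQ | Period-on-period change in the number of job postings by poultry medicine manufacturers and individual businesses
Public opinion | F31 | CHICKEN_NUMS | Number of online posts by internet users mentioning chicken meat and related topics
Public opinion | F32 | CHICKEN_EMOTION | Sentiment score of online posts by internet users mentioning chicken meat and related topics
Search intent | F33 | SEARCH_SPRING_FESTIVAL | Baidu Index for the term “过年” (celebrating the Lunar New Year)
Search intent | F34 | SEARCH_CHICKEN | Baidu Index for the term “鸡肉” (chicken meat)
Search intent | F35 | SEARCH_CHICKEN_PRICE | Baidu Index for the term “鸡肉价格” (chicken price)
Search intent | F36 | SEARCH_FEED | Baidu Index for the term “饲料” (feed)
Search intent | F37 | SEARCH_BLESS | Baidu Index for the term “扫福” (scanning the “Fu” character)
Search intent | F38 | SEARCH_ONLINE_OFFICE | Baidu Index for the term “在线办公” (online office work)
Search intent | F39 | SEARCH_RETURN | Baidu Index for the term “返乡” (returning to one's hometown)
Search intent | F40 | SEARCH_NECESSITIES | Baidu Index for the term “年货” (New Year goods)
Search intent | F41 | SEARCH_GREETINGS | Baidu Index for the term “拜年” (New Year greetings)
Search intent | F42 | SEARCH_DISEASE | Baidu Index for the term “疾病” (disease)
Search intent | F43 | SEARCH_VEGETABLES | Baidu Index for the term “买菜” (grocery shopping)
Search intent | F44 | SEARCH_EPIDEMIC | Baidu Index for the term “疫情” (epidemic)
Search intent | F45 | SEARCH_TICKET | Baidu Index for the term “抢票” (ticket grabbing)
Search intent | F46 | SEARCH_CHICK | Baidu Index for the term “鸡苗” (chicks)
Statistical data | F47 | PORK_NUMS | Average daily trading volume of pork (official statistical basis)
Statistical data | F48 | EGG_NUMS | Average daily trading volume of eggs (official statistical basis)
Statistical data | F49 | BEEF_NUMS | Average daily trading volume of beef (official statistical basis)
Statistical data | F50 | MUTTON_NUMS | Average daily trading volume of mutton (official statistical basis)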
Predict Characteristics of Dressed Chicken’s Daily Turnover (Week by Week)
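
Features F1 to F30 in the table above are year-on-year (YOY) and period-on-period (QOQ) values of business counts. The sketch below shows one common way to derive such values as relative changes. The period length, the use of a relative change rather than a ratio, and all names and numbers are assumptions for illustration; the source does not specify them.

```python
def relative_change(current, base):
    """Return (current - base) / base, or None when the base is zero."""
    if base == 0:
        return None
    return (current - base) / base


def yoy_and_pop(counts, periods_per_year):
    """Year-on-year and period-on-period change for the latest period.

    counts is ordered oldest to newest and must cover at least one full
    year of history plus the latest period.
    """
    if len(counts) <= periods_per_year:
        raise ValueError("need more than one year of history")
    latest = counts[-1]
    yoy = relative_change(latest, counts[-1 - periods_per_year])
    pop = relative_change(latest, counts[-2])
    return yoy, pop


# Hypothetical monthly counts of newly registered breeding businesses
# (13 months, oldest first). These are not data from the paper.
monthly_new_registrations = [40, 42, 38, 45, 50, 48, 47, 46, 44, 43, 41, 39, 50]
yoy, pop = yoy_and_pop(monthly_new_registrations, periods_per_year=12)
print(yoy, pop)
```

Returning None for a zero base avoids a division error; how such cases should be handled in a real pipeline is a separate modelling choice.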
Results of Random Sampling Data Set on the Stability
Comparison with Prediction Results of Different Algorithms
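Time slice | Training set span | Test set span
1 | Weeks 1-44 | Week 45
2 | Weeks 1-45 | Week 46
3 | Weeks 1-46 | Week 47
4 | Weeks 1-47 | Week 48
5 | Weeks 1-48 | Week 49
6 | Weeks 1-49 | Week 50
7 | Weeks 1-50 | Week 51
8 | Weeks 1-51 | Week 52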
Data Set Partition Method of Iterative Rolling Prediction Experiment
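
The partition in the table above is an expanding-window scheme: each slice trains on every week before the test week and tests on the single following week. A minimal sketch that generates the same eight slices is given below; the function name and its parameters are hypothetical.

```python
def rolling_slices(first_test_week=45, last_week=52):
    """Build (slice_id, train_weeks, test_weeks) for one-step-ahead rolling prediction."""
    slices = []
    for slice_id, test_week in enumerate(range(first_test_week, last_week + 1), start=1):
        train_weeks = list(range(1, test_week))  # weeks 1 .. test_week - 1
        slices.append((slice_id, train_weeks, [test_week]))
    return slices


slices = rolling_slices()
for slice_id, train_weeks, test_weeks in slices:
    print(slice_id, f"weeks {train_weeks[0]}-{train_weeks[-1]}", f"week {test_weeks[0]}")
```

Because the training window always ends right before the test week, no future observations leak into training.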
Comparison with Prediction Results of Different Algorithms
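Time slice | Training set span | Test set span
1 | Weeks 1-44 | Week 52
2 | Weeks 1-45 | Week 52
3 | Weeks 1-46 | Week 52
4 | Weeks 1-47 | Week 52
5 | Weeks 1-48 | Week 52
6 | Weeks 1-49 | Week 52
7 | Weeks 1-50 | Week 52
8 | Weeks 1-51 | Week 52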
Data Set Partition Method of Prediction Effect and Training Sample Number Analysis Experiment
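
In the partition above the test week is fixed at week 52 while the training window grows, so each slice also implies a different gap between the end of training and the test week. The sketch below reproduces the slices and that gap; the function name, parameters, and the "gap" field are illustrative assumptions.

```python
def fixed_test_slices(first_train_end=44, test_week=52):
    """Build slices that all test on one fixed week with growing training windows."""
    slices = []
    for slice_id, train_end in enumerate(range(first_train_end, test_week), start=1):
        slices.append({
            "slice": slice_id,
            "train_weeks": list(range(1, train_end + 1)),  # weeks 1 .. train_end
            "test_week": test_week,
            "gap": test_week - train_end,  # weeks between end of training and test
        })
    return slices


fixed = fixed_test_slices()
for s in fixed:
    print(s["slice"], f"weeks 1-{s['train_weeks'][-1]}", f"week {s['test_week']}", s["gap"])
```

Comparing the error on week 52 across slices is one way to see how training-set size and forecast distance affect accuracy together.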
The Relationship Between the Training Samples Needed for Prediction and the Number of Period Time