Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (2/3): 192-199    DOI: 10.11925/infotech.2096-3467.2019.0522
Forecasting Airfare Based on Route Characteristics
Zhong Lizhen1, Ma Minshu1, Zhou Changfeng2
1School of Traffic and Transportation, Beijing Jiaotong University, Beijing 100044, China
2Passenger Transport Department, China State Railway Group Co., Ltd., Beijing 100033, China
Abstract  

[Objective] This paper predicts airfares on routes that have few daily flights and incomplete or no historical data, to help passengers choose a better time to buy tickets. [Methods] We used historical data from multiple routes to predict airfares on the target routes. Drawing on previous research and the data, we extracted feature variables related to airfare fluctuations and grouped them into categories to build the airfare forecasting model. [Results] Including variables such as route distance and the socio-economic characteristics of the route significantly reduced the prediction error. [Limitations] The study did not include transit flights or local residents' income data. More research is needed to evaluate the performance of the prediction algorithms. [Conclusions] Year-related features, the distance between the two cities, and the socio-economic characteristics of the route are the main factors behind airfare fluctuations.

Key words: Airfare Prediction; Support Vector Regression; Ticket Purchase Time Decision; Route Characteristic
Received: 16 May 2019      Published: 26 April 2020
ZTFLH:  TP393  
Corresponding Authors: Minshu Ma     E-mail: mshma@bjtu.edu.cn

Cite this article:

Zhong Lizhen,Ma Minshu,Zhou Changfeng. Forecasting Airfare Based on Route Characteristics. Data Analysis and Knowledge Discovery, 2020, 4(2/3): 192-199.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2019.0522     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2020/V4/I2/3/192

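Category          Subcategory                    Feature variable    Description
Flight            Departure/arrival time slot    DEPART_CLOCK        Hour of departure
Flight            Departure/arrival time slot    ARRIVE_CLOCK        Hour of arrival
Departure date    Year-related                   WEEK_OF_THE_YEAR    Week of the year
Departure date    Year-related                   YEARS_DIFF          Difference in years
Departure date    Week-related                   DAY_OF_THE_WEEK     Day of the week
Departure date    Week-related                   IS_MONDAY           Whether the day is a Monday
Departure date    Week-related                   IS_FRIDAY           Whether the day is a Friday
Departure date    Week-related                   IS_WEEKEND          Whether the day falls on a weekend
Route             Socioeconomic characteristics  DEP_CITY_LEVEL      City tier of the departure city
Route             Socioeconomic characteristics  ARR_CITY_LEVEL      City tier of the arrival city
Route             Socioeconomic characteristics  DEP_GDP             Per capita GDP of the departure city
Route             Socioeconomic characteristics  ARR_GDP             Per capita GDP of the arrival city
Route             Socioeconomic characteristics  DEP_POPULATION      Population of the departure city
Route             Socioeconomic characteristics  ARR_POPULATION      Population of the arrival city
Route             Spatial distance               DISTANCE            Flight distance
Route             High-speed rail service level  HSR_STD_PRICE       High-speed rail fare per unit distance
Route             High-speed rail service level  HSR_NUM             Average daily number of high-speed trains
Forecast horizon                                 DAYS_DIFF           Days until departure
Feature Alternative Set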
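As an illustration of how the date-related variables in this feature set could be derived, here is a minimal Python sketch. It is not the authors' code. The function name `departure_date_features`, the use of ISO week numbering, the Monday = 1 weekday convention, the 0/1 encoding of the flags, and the `reference_year` used for YEARS_DIFF are all assumptions; the page does not say how these variables were computed.

```python
from datetime import date


def departure_date_features(depart: date, query: date, reference_year: int) -> dict:
    """Derive date-related features for one fare observation.

    Assumptions (not stated in the source): ISO week numbers,
    Monday = 1 ... Sunday = 7, and YEARS_DIFF measured against a
    caller-supplied reference year.
    """
    iso_week = depart.isocalendar()[1]   # ISO week of the year (1..53)
    weekday = depart.isoweekday()        # 1 = Monday ... 7 = Sunday
    return {
        "WEEK_OF_THE_YEAR": iso_week,
        "YEARS_DIFF": depart.year - reference_year,
        "DAY_OF_THE_WEEK": weekday,
        "IS_MONDAY": int(weekday == 1),
        "IS_FRIDAY": int(weekday == 5),
        "IS_WEEKEND": int(weekday >= 6),
        # Days between the fare query and the departure
        "DAYS_DIFF": (depart - query).days,
    }


features = departure_date_features(date(2019, 5, 17), date(2019, 5, 10), 2017)
print(features)
```

The route-level variables (city tier, GDP, population, distance, high-speed rail service) would come from external lookups keyed by city pair and are left out of this sketch.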
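Step 1
  Independent variable: Departure date
  Control variables: Spatial distance; high-speed rail service level; socioeconomic characteristics; departure/arrival time slot; forecast horizon
  Notes: Combine the two groups of variables that describe the departure date to build 16 models, M1-M16, as shown in Table 3.
Step 2
  Independent variable: Spatial distance
  Control variables: Departure date; high-speed rail service level; socioeconomic characteristics; departure/arrival time slot; forecast horizon
  Notes: Remove the spatial distance variable to build model M17.
Step 3
  Independent variable: High-speed rail service level
  Control variables: Departure date; spatial distance; socioeconomic characteristics; departure/arrival time slot; forecast horizon
  Notes: Combine the two variables that describe the high-speed rail service level to build 3 models, M18-M20, as shown in Table 4.
Step 4
  Independent variable: Socioeconomic characteristics
  Control variables: Departure date; spatial distance; high-speed rail service level; departure/arrival time slot; forecast horizon
  Notes: Combine the three groups of variables that describe the socioeconomic characteristics to build 7 models, M21-M27, as shown in Table 5.
The Process of Model Building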
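Step 1 of the process crosses four year-related options with four week-related options to obtain 16 candidate models. The sketch below enumerates them in Python. The numbering (M1-M4 without week features, M5-M8 with DAY_OF_THE_WEEK, M9-M12 with IS_WEEKEND, M13-M16 with the three flags) is read off the error table for departure-date descriptions later on this page; the names `YEAR_OPTIONS`, `WEEK_OPTIONS`, and `build_date_models` are hypothetical.

```python
from itertools import product

# Year-related options, in the order they appear as rows of the error table
YEAR_OPTIONS = [
    (),
    ("WEEK_OF_THE_YEAR",),
    ("YEARS_DIFF",),
    ("WEEK_OF_THE_YEAR", "YEARS_DIFF"),
]

# Week-related options, in the order they appear as columns of the error table
WEEK_OPTIONS = [
    (),
    ("DAY_OF_THE_WEEK",),
    ("IS_WEEKEND",),
    ("IS_MONDAY", "IS_FRIDAY", "IS_WEEKEND"),
]


def build_date_models() -> dict:
    """Map model names M1..M16 to their departure-date feature tuples."""
    models = {}
    # The week option varies slowest, so M1-M4 share the first week option
    pairs = product(WEEK_OPTIONS, YEAR_OPTIONS)
    for index, (week, year) in enumerate(pairs, start=1):
        models[f"M{index}"] = year + week
    return models


for name, feats in build_date_models().items():
    print(name, feats)
```

Each feature tuple would then be joined with the control variables listed for Step 1 before fitting.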
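Feature           M1   M2   M3   M4   M5   M6   M7   M8   M9   M10  M11  M12  M13  M14  M15  M16
WEEK_OF_THE_YEAR  -    √    -    √    -    √    -    √    -    √    -    √    -    √    -    √
YEARS_DIFF        -    -    √    √    -    -    √    √    -    -    √    √    -    -    √    √
DAY_OF_THE_WEEK   -    -    -    -    √    √    √    √    -    -    -    -    -    -    -    -
IS_MONDAY         -    -    -    -    -    -    -    -    -    -    -    -    √    √    √    √
IS_FRIDAY         -    -    -    -    -    -    -    -    -    -    -    -    √    √    √    √
IS_WEEKEND        -    -    -    -    -    -    -    -    √    √    √    √    √    √    √    √
√ = feature included in the model; - = feature not included
Table 3  The Description of Depart Date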
Feature         M18  M19  M20
HSR_STD_PRICE   -    -    √
HSR_NUM         -    √    -
√ = feature included in the model; - = feature not included
Table 4  The Description of High-speed Rail Service Level
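Feature          M21  M22  M23  M24  M25  M26  M27
DEP_CITY_LEVEL   -    √    √    -    √    -    -
ARR_CITY_LEVEL   -    √    √    -    √    -    -
DEP_GDP          -    √    -    √    -    √    -
ARR_GDP          -    √    -    √    -    √    -
DEP_POPULATION   -    -    √    √    -    -    √
ARR_POPULATION   -    -    √    √    -    -    √
√ = feature included in the model; - = feature not included
Table 5  The Description of Socioeconomic Feature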
Parameter   Value range
C           [0.001, 0.01, 0.1, 1, 10, 100]
ε           [0.001, 0.01, 0.1, 1, 10, 100]
γ           [2^-10, 2^-9, ..., 2^3]
The Value Range of Parameters
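The parameter ranges above can be written as a search grid. The sketch below builds the grid in plain Python and, as an assumption, shows how it might be passed to scikit-learn's `GridSearchCV` with an `SVR` estimator. The RBF kernel (suggested by the presence of γ), the mean absolute percentage error scoring, and the 5-fold cross-validation are illustrative choices that this page does not confirm; `make_search` needs scikit-learn 0.24 or later and is not called here.

```python
from math import prod

# Search grid taken from the value ranges listed above
PARAM_GRID = {
    "C": [0.001, 0.01, 0.1, 1, 10, 100],
    "epsilon": [0.001, 0.01, 0.1, 1, 10, 100],
    # 2^-10, 2^-9, ..., 2^3 (exponent step of 1)
    "gamma": [2.0 ** k for k in range(-10, 4)],
}


def count_candidates(grid: dict) -> int:
    """Number of parameter combinations an exhaustive search would try."""
    return prod(len(values) for values in grid.values())


def make_search(cv: int = 5):
    """Build a grid search over SVR (assumed setup, requires scikit-learn)."""
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVR

    return GridSearchCV(
        SVR(kernel="rbf"),  # kernel choice is an assumption
        PARAM_GRID,
        scoring="neg_mean_absolute_percentage_error",  # assumed error metric
        cv=cv,
    )


print(count_candidates(PARAM_GRID))
```

With 6 values each for C and ε and 14 values for γ, an exhaustive search covers 504 combinations per cross-validation fold.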

Rows: year-related features. Columns: week-related features. Cells: error (model).

Year-related \ Week-related       None           DAY_OF_THE_WEEK   IS_WEEKEND     IS_MONDAY + IS_FRIDAY + IS_WEEKEND
None                              28.90% (M1)    19.87% (M5)       29.72% (M9)    26.51% (M13)
WEEK_OF_THE_YEAR                  19.14% (M2)    19.60% (M6)       19.65% (M10)   19.70% (M14)
YEARS_DIFF                        21.64% (M3)    13.59% (M7)       22.69% (M11)   14.36% (M15)
WEEK_OF_THE_YEAR + YEARS_DIFF     13.39% (M4)    13.89% (M8)       14.24% (M12)   20.78% (M16)
Errors with Different Descriptions of Departure Date
                   Error (model)
With DISTANCE      13.39% (M4)
Without DISTANCE   16.19% (M17)
Errors Considering Whether Distance is Included in the Models
Cells: error (model).

                        With HSR_NUM    Without HSR_NUM
With HSR_STD_PRICE      13.39% (M4)     13.37% (M20)
Without HSR_STD_PRICE   13.37% (M19)    13.33% (M18)
Errors with Different Descriptions of High-speed Rail Service Level
Socioeconomic features   Error    Model   P-value against model
                                          M18      M21      M22      M23      M24      M25      M26
ALL                      13.33%   M18     -        -        -        -        -        -        -
NONE                     24.12%   M21     0.000*   -        -        -        -        -        -
CITY_LEVEL+GDP           13.43%   M22     0.388    0.000*   -        -        -        -        -
CITY_LEVEL+PLN           13.15%   M23     0.582    0.000*   0.076    -        -        -        -
GDP+PLN                  13.26%   M24     0.315    0.000*   0.204    0.709    -        -        -
CITY_LEVEL               13.51%   M25     0.395    0.000*   0.515    0.021*   0.163    -        -
GDP                      20.36%   M26     0.000*   0.035*   0.000*   0.000*   0.000*   0.000*   -
PLN                      17.68%   M27     0.001*   0.000*   0.001*   0.000*   0.000*   0.000*   0.045*
Errors and P-values with Different Descriptions of Socioeconomic Feature
Route               Mixed routes   Single route   Difference in error
Nanning-Zhengzhou   18.78%         33.74%         -14.96%
Chongqing-Fuzhou    5.48%          13.45%         -7.97%
Nanning-Wuhan       6.30%          9.60%          -3.30%
Nanchang-Beijing    20.76%         19.58%         1.18%
Hangzhou-Changsha   4.86%          3.63%          1.23%
Zhengzhou-Shenzhen  7.90%          2.34%          5.56%
Guangzhou-Nanning   15.79%         1.54%          14.25%
Prediction Errors Between Mixed Routes and Single Routes
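The last column of this comparison is the mixed-route error minus the single-route error, so a negative value means the model trained on mixed routes predicted that route more accurately. The short Python sketch below recomputes the column from the reported percentages and lists the routes that benefit from mixed-route training. Route names are romanized, and the names `ERRORS` and `error_gap` are hypothetical.

```python
# (mixed-route error %, single-route error %) as reported in the table above
ERRORS = {
    "Nanning-Zhengzhou": (18.78, 33.74),
    "Chongqing-Fuzhou": (5.48, 13.45),
    "Nanning-Wuhan": (6.30, 9.60),
    "Nanchang-Beijing": (20.76, 19.58),
    "Hangzhou-Changsha": (4.86, 3.63),
    "Zhengzhou-Shenzhen": (7.90, 2.34),
    "Guangzhou-Nanning": (15.79, 1.54),
}


def error_gap(mixed: float, single: float) -> float:
    """Mixed-route error minus single-route error, in percentage points."""
    return round(mixed - single, 2)


gaps = {route: error_gap(*pair) for route, pair in ERRORS.items()}

# Routes where training on mixed routes gave the lower error
better_with_mixed = [route for route, gap in gaps.items() if gap < 0]

print(gaps)
print(better_with_mixed)
```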
Distribution of Airfares per Mileage
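Route               Share of the test set with historical same-period     Share of the test set
                    data from different years                              with recent data
                    2 years     1 year      None
Nanning-Zhengzhou   8%          23%         12%                            57%
Chongqing-Fuzhou    9%          9%          9%                             73%
Nanning-Wuhan       42%         0%          14%                            44%
Nanchang-Beijing    16%         8%          7%                             69%
Hangzhou-Changsha   -           -           -                              100%
Zhengzhou-Shenzhen  -           -           -                              100%
Guangzhou-Nanning   -           -           -                              100%
Data Set of Every Route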