Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (2/3): 192-199    DOI: 10.11925/infotech.2096-3467.2019.0522
Forecasting Airfare Based on Route Characteristics
Zhong Lizhen1, Ma Minshu1, Zhou Changfeng2
1School of Traffic and Transportation, Beijing Jiaotong University, Beijing 100044, China
2Passenger Transport Department, China State Railway Group Co., Ltd., Beijing 100033, China
Abstract  

[Objective] This paper predicts airfares on routes with few average daily flights and incomplete or even no historical data, to help passengers choose a better time to buy tickets.
[Methods] We used historical data from multiple routes to predict airfares on the target routes. Drawing on previous research and the available data, we extracted feature variables related to airfare fluctuations, then grouped these variables to build the airfare forecasting models.
[Results] When the model included variables such as route distance and the socio-economic characteristics of the route, the prediction error fell significantly.
[Limitations] We did not include transit flights or local residents’ income data in our study. More research is needed to evaluate the performance of the prediction algorithms.
[Conclusions] Year-related characteristics, the distance between the two cities, and the socio-economic factors of the routes are the main drivers of airfare fluctuations.

Key words: Airfare Prediction; Support Vector Regression; Ticket Purchase Time Decision; Route Characteristic
Received: 16 May 2019      Published: 26 April 2020
ZTFLH:  TP393  
Corresponding Authors: Ma Minshu     E-mail: mshma@bjtu.edu.cn

Cite this article:

Zhong Lizhen,Ma Minshu,Zhou Changfeng. Forecasting Airfare Based on Route Characteristics. Data Analysis and Knowledge Discovery, 2020, 4(2/3): 192-199.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2019.0522     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2020/V4/I2/3/192

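Category          Feature group                   Variable          Description
Flight            Departure/arrival time period   DEPART_CLOCK      Hour of departure
Flight            Departure/arrival time period   ARRIVE_CLOCK      Hour of arrival
Flight            Departure date (year-related)   WEEK_OF_THE_YEAR  Week number within the year
Flight            Departure date (year-related)   YEARS_DIFF        Difference in years
Flight            Departure date (week-related)   DAY_OF_THE_WEEK   Day of the week
Flight            Departure date (week-related)   IS_MONDAY         Whether the day is a Monday
Flight            Departure date (week-related)   IS_FRIDAY         Whether the day is a Friday
Flight            Departure date (week-related)   IS_WEEKEND        Whether the day falls on a weekend
Route             Socioeconomic features          DEP_CITY_LEVEL    City level of the departure city
Route             Socioeconomic features          ARR_CITY_LEVEL    City level of the arrival city
Route             Socioeconomic features          DEP_GDP           Per-capita GDP of the departure city
Route             Socioeconomic features          ARR_GDP           Per-capita GDP of the arrival city
Route             Socioeconomic features          DEP_POPULATION    Population of the departure city
Route             Socioeconomic features          ARR_POPULATION    Population of the arrival city
Route             Spatial distance                DISTANCE          Flight distance
Route             HSR service level               HSR_STD_PRICE     High-speed rail fare per unit distance
Route             HSR service level               HSR_NUM           Average daily number of high-speed rail services
Forecast horizon  Forecast horizon                DAYS_DIFF         Number of days before departure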
Feature Alternative Set
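
The date-related variables in the feature set can all be derived from the departure date and the date on which the fare was observed. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `departure_date_features`, the `base_year` reference for YEARS_DIFF, the ISO week numbering, and the 1-7 weekday coding are all hypothetical choices, since the source does not specify them.

```python
from datetime import date


def departure_date_features(depart: date, query: date, base_year: int) -> dict:
    """Derive the date-related variables for one fare observation.

    base_year is a hypothetical reference year for YEARS_DIFF; the source
    does not say which year the difference is measured against.
    """
    weekday = depart.isoweekday()  # 1 = Monday ... 7 = Sunday (assumed coding)
    return {
        "WEEK_OF_THE_YEAR": depart.isocalendar()[1],  # ISO week number (assumed)
        "YEARS_DIFF": depart.year - base_year,
        "DAY_OF_THE_WEEK": weekday,
        "IS_MONDAY": int(weekday == 1),
        "IS_FRIDAY": int(weekday == 5),
        "IS_WEEKEND": int(weekday >= 6),
        "DAYS_DIFF": (depart - query).days,  # days remaining before departure
    }


# Example: a fare observed on 3 May 2019 for a flight departing 17 May 2019.
features = departure_date_features(date(2019, 5, 17), date(2019, 5, 3), base_year=2017)
print(features)
```

Keeping both DAY_OF_THE_WEEK and the IS_* indicators in one record makes it easy to switch between the alternative week-related descriptions when building candidate models.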
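Step 1
  Independent variable: departure date
  Control variables: spatial distance; HSR service level; socioeconomic features; departure/arrival time period; forecast horizon
  Notes: The two classes of variables describing the departure date are combined to build 16 models, M1-M16, as shown in Table 3.
Step 2
  Independent variable: spatial distance
  Control variables: departure date; HSR service level; socioeconomic features; departure/arrival time period; forecast horizon
  Notes: The spatial distance variable is removed to build model M17.
Step 3
  Independent variable: HSR service level
  Control variables: departure date; spatial distance; socioeconomic features; departure/arrival time period; forecast horizon
  Notes: The two variables describing the HSR service level are combined to build 3 models, M18-M20, as shown in Table 4.
Step 4
  Independent variable: socioeconomic features
  Control variables: departure date; spatial distance; HSR service level; departure/arrival time period; forecast horizon
  Notes: The three groups of variables describing socioeconomic features are combined to build 7 models, M21-M27, as shown in Table 5.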
The Process of Model Building
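
The model counts in the steps above (16, 3 and 7) follow from enumerating feature combinations. The sketch below reproduces those counts, assuming that the four week-related options are the ones listed in the departure-date error table and that, for the HSR and socioeconomic steps, the full feature set is skipped because it is already covered by the model carried over from the previous step (M4 and M18 respectively in the error tables). The helper `subsets` and the list names are illustrative, not from the source.

```python
from itertools import combinations


def subsets(groups):
    """Return every subset of `groups`, from the empty set to the full set."""
    return [set(c) for r in range(len(groups) + 1) for c in combinations(groups, r)]


# Departure date: year-related subsets crossed with four week-related options.
year_options = subsets(["WEEK_OF_THE_YEAR", "YEARS_DIFF"])
week_options = [
    set(),
    {"DAY_OF_THE_WEEK"},
    {"IS_WEEKEND"},
    {"IS_MONDAY", "IS_FRIDAY", "IS_WEEKEND"},
]
date_models = [year | week for week in week_options for year in year_options]

# HSR service level and socioeconomic groups: every subset except the full
# set, which the previous step's selected model already contains.
hsr_models = subsets(["HSR_STD_PRICE", "HSR_NUM"])[:-1]
socio_models = subsets(["CITY_LEVEL", "GDP", "POPULATION"])[:-1]

print(len(date_models), len(hsr_models), len(socio_models))  # 16 3 7
```

With this loop order, the fourth entry of `date_models` holds both year-related variables and no week-related ones, which corresponds to the combination labelled M4 in the error table.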
√ = feature included, - = feature not included
Feature           M1  M2  M3  M4  M5  M6  M7  M8  M9  M10 M11 M12 M13 M14 M15 M16
WEEK_OF_THE_YEAR  -   √   -   √   -   √   -   √   -   √   -   √   -   √   -   √
YEARS_DIFF        -   -   √   √   -   -   √   √   -   -   √   √   -   -   √   √
DAY_OF_THE_WEEK   -   -   -   -   √   √   √   √   -   -   -   -   -   -   -   -
IS_MONDAY         -   -   -   -   -   -   -   -   -   -   -   -   √   √   √   √
IS_FRIDAY         -   -   -   -   -   -   -   -   -   -   -   -   √   √   √   √
IS_WEEKEND        -   -   -   -   -   -   -   -   √   √   √   √   √   √   √   √
The Description of Depart Date
√ = feature included, - = feature not included
Feature        M18 M19 M20
HSR_STD_PRICE  -   -   √
HSR_NUM        -   √   -
The Description of High-speed Rail Service Level
√ = feature included, - = feature not included
Feature         M21 M22 M23 M24 M25 M26 M27
DEP_CITY_LEVEL  -   √   √   -   √   -   -
ARR_CITY_LEVEL  -   √   √   -   √   -   -
DEP_GDP         -   √   -   √   -   √   -
ARR_GDP         -   √   -   √   -   √   -
DEP_POPULATION  -   -   √   √   -   -   √
ARR_POPULATION  -   -   √   √   -   -   √
The Description of Socioeconomic Feature
Parameter  Value range
C          [0.001, 0.01, 0.1, 1, 10, 100]
ε          [0.001, 0.01, 0.1, 1, 10, 100]
γ          [2^-10, 2^-9, ..., 2^3]
The Value Range of Parameters
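
The parameter ranges above define a search grid for the support vector regression model. The following sketch simply enumerates that grid. It assumes that γ steps through integer exponents from -10 to 3 (the intermediate values are elided in the table), and the dictionary keys `C`, `epsilon` and `gamma` are chosen to match the parameter names of scikit-learn's `SVR`, which is an assumption about tooling; the source does not name a library or a search procedure.

```python
from itertools import product

C_VALUES = [0.001, 0.01, 0.1, 1, 10, 100]
EPSILON_VALUES = [0.001, 0.01, 0.1, 1, 10, 100]
# 2^-10 ... 2^3, assuming every integer exponent in between is used.
GAMMA_VALUES = [2.0 ** k for k in range(-10, 4)]

# One dictionary per candidate parameter setting.
param_grid = [
    {"C": c, "epsilon": eps, "gamma": g}
    for c, eps, g in product(C_VALUES, EPSILON_VALUES, GAMMA_VALUES)
]

print(len(param_grid))  # 504
```

Under these assumptions an exhaustive search would evaluate 6 x 6 x 14 = 504 settings per candidate model.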

Error (model) by year-related features (rows) and week-related features (columns)
Year-related \ Week-related    None          DAY_OF_THE_WEEK  IS_WEEKEND    IS_MONDAY + IS_FRIDAY + IS_WEEKEND
None                           28.90% (M1)   19.87% (M5)      29.72% (M9)   26.51% (M13)
WEEK_OF_THE_YEAR               19.14% (M2)   19.60% (M6)      19.65% (M10)  19.70% (M14)
YEARS_DIFF                     21.64% (M3)   13.59% (M7)      22.69% (M11)  14.36% (M15)
WEEK_OF_THE_YEAR + YEARS_DIFF  13.39% (M4)   13.89% (M8)      14.24% (M12)  20.78% (M16)
Errors with Different Descriptions of Departure Date
                  Error (model)
With DISTANCE     13.39% (M4)
Without DISTANCE  16.19% (M17)
Errors Considering Whether Distance is Included in the Models
Error (model)          With HSR_NUM  Without HSR_NUM
With HSR_STD_PRICE     13.39% (M4)   13.37% (M20)
Without HSR_STD_PRICE  13.37% (M19)  13.33% (M18)
Errors with Different Descriptions of High-speed Rail Service Level
PLN = population
Socioeconomic features  Error   Model  P-value against:
                                       M18     M21     M22     M23     M24     M25     M26
ALL                     13.33%  M18    -       -       -       -       -       -       -
NONE                    24.12%  M21    0.000*  -       -       -       -       -       -
CITY_LEVEL+GDP          13.43%  M22    0.388   0.000*  -       -       -       -       -
CITY_LEVEL+PLN          13.15%  M23    0.582   0.000*  0.076   -       -       -       -
GDP+PLN                 13.26%  M24    0.315   0.000*  0.204   0.709   -       -       -
CITY_LEVEL              13.51%  M25    0.395   0.000*  0.515   0.021*  0.163   -       -
GDP                     20.36%  M26    0.000*  0.035*  0.000*  0.000*  0.000*  0.000*  -
PLN                     17.68%  M27    0.001*  0.000*  0.001*  0.000*  0.000*  0.000*  0.045*
Errors and P-values with Different Descriptions of Socioeconomic Feature
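Route               Mixed routes  Single route  Error difference
Nanning-Zhengzhou   18.78%        33.74%        -14.96%
Chongqing-Fuzhou    5.48%         13.45%        -7.97%
Nanning-Wuhan       6.30%         9.60%         -3.30%
Nanchang-Beijing    20.76%        19.58%        1.18%
Hangzhou-Changsha   4.86%         3.63%         1.23%
Zhengzhou-Shenzhen  7.90%         2.34%         5.56%
Guangzhou-Nanning   15.79%        1.54%         14.25%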
Prediction Errors Between Mixed Routes and Single Routes
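
The error difference in the route comparison above is the mixed-route error minus the single-route error, so a negative value means that training on multiple routes lowered the error for that route. As a small worked check, the sketch below recomputes the differences from the tabulated errors. The route names are romanized and the variable names are illustrative.

```python
# (mixed-route error %, single-route error %) per route, copied from the table.
errors = {
    "Nanning-Zhengzhou": (18.78, 33.74),
    "Chongqing-Fuzhou": (5.48, 13.45),
    "Nanning-Wuhan": (6.30, 9.60),
    "Nanchang-Beijing": (20.76, 19.58),
    "Hangzhou-Changsha": (4.86, 3.63),
    "Zhengzhou-Shenzhen": (7.90, 2.34),
    "Guangzhou-Nanning": (15.79, 1.54),
}

# Difference in percentage points: mixed minus single.
diffs = {route: round(mixed - single, 2) for route, (mixed, single) in errors.items()}

# Routes where the mixed-route model has the lower error.
improved = [route for route, diff in diffs.items() if diff < 0]

for route, diff in diffs.items():
    print(f"{route}: {diff:+.2f}")
print("Lower error with mixed routes:", improved)
```

The three routes with negative differences are the ones where the mixed-route model outperforms the single-route model.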
Distribution of Airfares per Mileage
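Columns 2-4: share of the test set with historical same-period data from different years (2 years, 1 year, none). Column 5: share of the test set with recent data.
Route               2 years  1 year  None  Recent data
Nanning-Zhengzhou   8%       23%     12%   57%
Chongqing-Fuzhou    9%       9%      9%    73%
Nanning-Wuhan       42%      0%      14%   44%
Nanchang-Beijing    16%      8%      7%    69%
Hangzhou-Changsha   -        -       -     100%
Zhengzhou-Shenzhen  -        -       -     100%
Guangzhou-Nanning   -        -       -     100%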
Data Set of Every Route