Published by Ernest Stevenson. Modified over 7 years ago.
1
Predicting the performance of US Airline carriers
Applied Business Analytics & Business Intelligence (BIT 5534)
Submitted by: Group 4, Akash Yadav & Suresh Malhotra
2
Agenda
1 Problem Definition
2 Data Preparation
3 Data Exploration
4 Modeling & Analysis
5 Model Selection & Comparison
6 Recommendation
7 Customized Models & Future Work
Virginia Tech
3
Business Problem
Problem: Insufficient information is available on flight carriers and their flights.
Traveler's concern: Which flight carrier is better, and what are the chances of a flight delay?
Our goal:
- Predict the future performance of flight carriers based on delay time.
- Determine the main causes of flight delay and suggest improvements.
What's in it for customers or travelers:
- Able to make flight reservations for time-critical business meetings.
- Easier choice of connecting flights and airports, ensuring better service.
- Better-informed decisions when making a flight reservation.
Note: Data Source: (
4
Data Definition & Preparation
Preparation flow:
1. Remove missing-value cases
2. Scrub off outliers
3. Training / Validation split
4. Data Exploration (next step)
5. Multivariate analytical models

Variable Dictionary

| Attribute | Data Type | Description |
|---|---|---|
| year | Nominal | Time horizon of data. |
| month | Nominal | Period of year. |
| carrier | Nominal | Airline company code. |
| carrier Name | Nominal | Name of airline carrier. |
| airport | Nominal | Airport code. |
| airport Name | Nominal | Name of airport. |
| arr-flights | Continuous | No. of flights arriving. |
| carrier_ct | Continuous | Flights delayed due to the airline's own issues. |
| weather_ct | Continuous | Flights delayed due to weather issues. |
| nas_ct | Continuous | Flights delayed due to National Aviation System issues/problems. |
| security_ct | Continuous | Flights delayed due to security issues. |
| late_aircraft_ct | Continuous | Flights delayed due to a previous flight delay. |
| arr_delay | Continuous | Delay time (target variable). |

Missing values?
Independent variables: carrier_ct, weather_ct, nas_ct, security_ct, late_aircraft_ct.
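The preparation steps above (drop missing-value cases, scrub outliers, split into training and validation sets) can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the slides do not say which outlier rule or split ratio was used, so the 1.5 x IQR fence, the 70/30 split, the helper names, and the sample rows are all assumptions.

```python
import random
import statistics

def drop_missing(rows, fields):
    """Keep only rows where every listed field has a value."""
    return [r for r in rows if all(r.get(f) is not None for f in fields)]

def drop_outliers(rows, field, k=1.5):
    """Remove rows whose `field` falls outside Q1 - k*IQR .. Q3 + k*IQR (assumed rule)."""
    values = [r[field] for r in rows]
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [r for r in rows if low <= r[field] <= high]

def train_validation_split(rows, validation_share=0.3, seed=42):
    """Shuffle reproducibly, then split into (training, validation) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * validation_share)
    return shuffled[:cut], shuffled[cut:]

# Made-up sample rows using two attribute names from the variable dictionary.
rows = [{"carrier_ct": float(i), "arr_delay": 50.0 * i} for i in range(1, 21)]
rows.append({"carrier_ct": 5.0, "arr_delay": None})      # missing target value
rows.append({"carrier_ct": 6.0, "arr_delay": 90000.0})   # extreme outlier

complete = drop_missing(rows, ["carrier_ct", "arr_delay"])
clean = drop_outliers(complete, "arr_delay")
train, valid = train_validation_split(clean)
print(len(rows), len(clean), len(train), len(valid))
```

Fixing the random seed keeps the split reproducible, so training and validation metrics can be compared across model runs.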
5
Data Exploration
1) carrier_ct, nas_ct & late_aircraft_ct: better predictors of flight delay (red boxes in the plots).
   Reason: Small variation from the mean (red trend lines) and a more compact spread of data between arr_delay and these variables.
2) weather_ct & security_ct: least contribution to the prediction objective.
   Reason: Large variation from the mean (red trend lines) and a more scattered pattern of data between arr_delay and these variables.
3) Small or no redundancy.
   Reason: The plots among the independent variables are scattered and widespread, which indicates weak correlation. (This is a good sign, as it supports the independence of the variables.)
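The visual judgment above (compact versus scattered plots) can be backed by a number. A minimal sketch, assuming Pearson correlation as the measure: the slides only describe the plots, so the choice of measure and the made-up sample values are assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Made-up values for illustration only (not taken from the presentation's data).
carrier_ct = [10, 20, 30, 40, 50, 60]
security_ct = [1, 0, 2, 0, 1, 0]
arr_delay = [520, 980, 1510, 2050, 2470, 3010]

r_carrier = pearson(carrier_ct, arr_delay)      # compact pattern: close to 1
r_security = pearson(security_ct, arr_delay)    # scattered pattern: close to 0
r_redundancy = pearson(carrier_ct, security_ct) # predictor vs predictor
print(round(r_carrier, 3), round(r_security, 3), round(r_redundancy, 3))
```

A value near 1 or -1 between a predictor and arr_delay matches the "compact" plots, while a value near 0 between two predictors supports the "small or no redundancy" point.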
6
Models & Analysis
Linear Regression
- Uses a linear combination of the 5 independent variables to predict the target.
- Plot: predicted values of average flight delay (from the regression) against actual delay values.
Principal Component Analysis (PCA)
- Creates new variables from the given variables.
- 2 new variables are sufficient for the analysis, as they cover 79% of the variation in the data.
Cluster Analysis
- Better suited to data mining; groups data with similar attributes/characteristics.
- Hierarchical: estimated number of clusters = 20
- K-means: optimal number of clusters = 21
Decision Tree
- Creates groups of data with similar attributes.
- Each data split is based on a threshold value of a variable.
- Number of splits = 35
Neural Network
- Creates a hidden layer of new variables that receive data from the current variables.
- Uses a regression-like approach and outputs predicted values.
- Plot: output vs. actual values of delay.
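To make the decision-tree idea of a threshold split concrete, here is a minimal sketch of how a single split could be chosen for one variable. It assumes the common criterion of minimizing the summed squared error around each group's mean; the slides do not state which criterion the authors' tool applied, and the function names and sample values are illustrative.

```python
def sse(values):
    """Sum of squared deviations from the group mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs, ys):
    """Return (threshold, total_sse) for the split 'x <= threshold' with the lowest SSE."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical x values cannot be separated by a threshold
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best

# Made-up values: low late_aircraft_ct goes with short delays, high with long ones.
late_aircraft_ct = [5, 8, 10, 40, 45, 50]
arr_delay = [300, 350, 400, 2000, 2100, 2200]

threshold, total_sse = best_split(late_aircraft_ct, arr_delay)
print(threshold, total_sse)
```

A full tree repeats this search within each resulting group and across all variables; the 35 splits reported above would be 35 such steps.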
7
Model & Analysis (continued…)
R2: A higher value means the model accounts for (explains) more of the variation in the data, which is important. (Refer to the plots.)
RMSE (root mean square error): A low RMSE means the predicted values are close to the actual values, i.e. the deviation from the actual value is small.
Note 1: All the models perform quite well, as shown by the high R2 and low RMSE values.
Note 2: Performance is consistent across the Training and Validation sets, which indicates that the data preparation was sound and the models are acceptable.
Linear Regression & Neural Network models are best: since R2 is already high, a 5-6% increase matters a lot. RMSE is also lower for these two models than for the others.

| Modeling Technique | Modeling Highlight | Training RSquare (%) | Training RMSE | Validation RSquare (%) | Validation RMSE |
|---|---|---|---|---|---|
| Linear Regression | 5 Independent Variables | 95.5 | 1851 | 95.6 | 1805 |
| Principal Component Analysis | 2 Principal Components | 90.5 | 2688 | 90.8 | 2617 |
| Cluster Analysis | 21 Clusters | 87.8 | 3053 | 87.5 | 3042 |
| Decision Trees | 35 Splits | 90.9 | 2639 | 90.3 | 2694 |
| Neural Networks | 1 Hidden Layer, 6 Nodes, Learning Rate = 0.1, Transform Covariates | 96.2 | 1703 | 96.02 | 1728 |

Plots: PCA, Neural
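For reference, the two metrics used in the comparison can be computed as below. This is a minimal sketch using the standard definitions of RMSE and R2; the sample numbers are made up, and the slides report R2 as a percentage, so the value is multiplied by 100 at the end.

```python
import math

def rmse(actual, predicted):
    """Root mean square error: typical size of the prediction error."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))

def r_squared(actual, predicted):
    """Share of the variation in `actual` that the predictions account for."""
    mean = sum(actual) / len(actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_total = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_residual / ss_total

# Made-up delay values, for illustration only.
actual = [100, 200, 300, 400]
predicted = [110, 190, 310, 390]

error = rmse(actual, predicted)
r2_percent = 100 * r_squared(actual, predicted)
print(error, r2_percent)
```

Computing both metrics separately on the training and validation rows gives the four metric columns of the comparison table.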
8
Model Comparison & Selection
Neural Network Model - Selected
[Profiler plots: Regression, Neural]
Why Neural Network?
- Compared with the linear regression model, the neural network model is slightly better on R2 and RMSE.
- The profiler shows that the variables' variation profiles match the desired profiles much more closely for the neural model.
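The metric side of this selection can be reproduced from the validation figures in the model comparison table. The sketch below only ranks the models on those two reported numbers; it does not capture the profiler comparison, and the dictionary layout is an illustrative choice.

```python
# (validation R2 in percent, validation RMSE) as reported in the comparison table
validation_results = {
    "Linear Regression": (95.6, 1805),
    "Principal Component Analysis": (90.8, 2617),
    "Cluster Analysis": (87.5, 3042),
    "Decision Trees": (90.3, 2694),
    "Neural Networks": (96.02, 1728),
}

# Highest R2 and lowest RMSE on the validation set
best_by_r2 = max(validation_results, key=lambda m: validation_results[m][0])
best_by_rmse = min(validation_results, key=lambda m: validation_results[m][1])

# Gap between the two leading models, in R2 percentage points and RMSE units
r2_gap = validation_results["Neural Networks"][0] - validation_results["Linear Regression"][0]
rmse_gap = validation_results["Linear Regression"][1] - validation_results["Neural Networks"][1]

print(best_by_r2, best_by_rmse, round(r2_gap, 2), rmse_gap)
```

Both criteria point to the same model, though the margin over linear regression is small, which is consistent with the "slightly better" wording above.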
9
Recommendation & Key points
Good to know:
- Although we have focused on delay time, the model has the potential to predict flight cancellations and other performance metrics.
How to ensure an efficient model:
- It's critical to explore and prepare the data carefully.
- Treating missing values and removing outliers is of utmost importance for model stability.
- The model should be built on the training data and then tested on the validation data set (modify the model if needed).
What else could be done or added to the scope:
- Include additional predictor variables, such as the distance between departure and arrival airports, air traffic, etc.
- Use data from different sources and different time horizons.
- Focus on specific airports or air carriers.
- Use forecasting models for prediction.
10
Customized Models and Future Work
Air Carrier

| Code | Name | R2 (%) | RMSE | LogWorth | Comment & LogWorth |
|---|---|---|---|---|---|
| AA | American Airlines | 97.38 | 1554 | A - 125, B - 152, C - 181 | All significant. Security - 4 |
| DL | Delta Airways | 95.79 | 1568 | A - 555, B - 334, C - 135 | All significant. Security - 3.4 |
| WN | Southwest Airlines | 95.68 | 1485 | | All significant. Security - 4.2 |
| AS | United Airlines | 98.07 | 2554 | A - 737, B - 409, C - 114 | Security_ct insignificant. |
| B6 | JetBlue | 96.82 | 1784 | A - 318, B - 165, C - 71 | All significant. Security - 20 |

Busiest Airport

| Code | Name | R2 (%) | RMSE | LogWorth | Comment & LogWorth |
|---|---|---|---|---|---|
| ATL | Atlanta, GA | 96.72 | 1378 | A - 190, B - 179, C - 55 | |
| LAX | Los Angeles, CA | 97.19 | 2002 | A - 344, B - 246, D - 123 | All significant. Security - 31 |
| ORD | Chicago, IL | 96.07 | 2819 | A - 177, B - 145, D - 31 | Weather and security - Low |
| DFW | Dallas, TX | 97.12 | 776 | A - 95, B - 480, D - 126 | |
| JFK | New York, NY | 96.10 | 2287 | A - 156, B - 125, D - 55 | All significant. Security - 4.9 |

Best Airport (US)

| Code | Name | R2 (%) | RMSE | LogWorth | Comment & LogWorth |
|---|---|---|---|---|---|
| SLC | Salt Lake City | 97.6 | 1157 | A - 201, B - 292, D - 135 | All significant. Security - 17 |
| DCA | Washington | 95.4 | 1179 | A - 118, B - 235, C - 117 | All significant. Security - 2.2 |
| SEA | Seattle-Tacoma | 97.5 | 1026 | A - 228, B - 290, D - 128 | All significant. Security - 1.4 |
| PDX | Portland | 95.3 | 810 | A - 157, B - 283, D - 123 | All significant. Security - 4.5 |
| MSP | Minneapolis | 97.1 | 1297 | A - 180, B - 221, D - 64 | All significant. Security - 4.7 |

Delta & Airports

| R2 (%) | RMSE | LogWorth | Comment & LogWorth |
|---|---|---|---|
| 96.28 | 1120 | A - 18, B - 23, D - 10 | Security insignificant |
| 88.29 | 1065 | A - 25, B - 3.7, D - 4.4 | Security - Zeroed |
| 94.5 | 1868 | A - 7.5, B - 17, D - 8.4 | All significant. |
| 88.28 | 1140 | A - 11, B - 5.6, D - 10 | |
| 98.4 | 2183 | B - 4.8, C - 8.1, D - 7.4 | Nas_ct insignificant. |

A = nas_ct; B = late_aircraft_ct; C = weather_ct; D = carrier_ct

The Dashboard Story (Future Scope of Work)
- Individual models to be prepared for each flight carrier, each airport, and each combination of flight carrier + airport, for monthly predictions.
- Predictions from the models will be used to provide information to travelers.
- The models will report the current performance of flight carriers and airports based on historical data.
Interactive visual information delivery is the goal!
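The LogWorth figures in the customized-model tables can be read back as p-values. The slides do not define the term, so this sketch assumes the usual definition, LogWorth = -log10(p-value), under which larger values mean stronger evidence that a variable matters; the function names are illustrative.

```python
import math

def logworth(p_value):
    """LogWorth under the assumed definition: -log10 of the p-value."""
    return -math.log10(p_value)

def p_value_from_logworth(value):
    """Inverse of logworth(): recover the p-value."""
    return 10 ** (-value)

# A p-value of 0.01 corresponds to a LogWorth of 2.
threshold = logworth(0.01)

# Example reading: a reported "Security - 4" would mean p = 0.0001 under this assumption.
security_p = p_value_from_logworth(4)
print(threshold, security_p)
```

Under this reading, the three-digit LogWorth values for nas_ct and late_aircraft_ct correspond to vanishingly small p-values, while the single-digit security values are far weaker, though still below p = 0.01 whenever the LogWorth exceeds 2.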
11
Thank You