Predicting Flight Delays
Washington DC area airports
BIT Applied Business Intelligence and Analytics, Spring 2017
Group 3: Alexandra Robleto, Caitlin Fernandez, Lucas Cameron, Kevin Sherman
Project Summary
Establish the business need
Collect data
Understand the flight data
Prepare the data for modeling
Create predictive models
Measure the models' ability to predict delays
Evaluate findings
Make recommendations
Data
Data source
Data content: calendar year 2016, all flights from the Washington DC area airports (BWI, IAD, and DCA)
Data understanding: available variables, relationships
Data preparation: missing values, outliers, redundant variables
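The slides list missing values, outliers, and redundant variables as preparation steps but do not say how each was handled. A minimal sketch, assuming flight records are Python dicts with hypothetical field names (`arr_delay` in minutes and a redundant `arr_delay_hours`): rows with no recorded arrival delay are dropped, extreme delays are capped at Tukey's 1.5 * IQR fences (one common choice, assumed here rather than taken from the project), and redundant fields are removed.

```python
import statistics

def prepare(rows, target="arr_delay", redundant=("arr_delay_hours",)):
    """Hypothetical cleaning step: drop missing targets, cap outliers, drop redundant fields."""
    # Missing values: keep only records with a recorded target (copied, so the input is untouched).
    kept = [dict(r) for r in rows if r.get(target) is not None]

    # Outliers: cap the target at Tukey's fences, Q1 - 1.5*IQR and Q3 + 1.5*IQR.
    q1, _, q3 = statistics.quantiles([r[target] for r in kept], n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    for r in kept:
        r[target] = min(max(r[target], low), high)
        # Redundant variables: remove fields that duplicate information in another field.
        for name in redundant:
            r.pop(name, None)
    return kept
```

With made-up delays of 0, 5, 10, 15, 20, and 500 minutes plus one missing value, the missing row is dropped and the 500-minute delay is capped at 344.375 minutes (Q1 = 3.75 and Q3 = 140 under the default "exclusive" quantile method). Capping rather than deleting keeps the flight in the data while limiting its influence.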
Preliminary findings – Average delays
Ronald Reagan Washington National Airport (DCA) had more delays on average than BWI and IAD. Certain airlines experienced more delays than others, and which airlines these were differed by airport. Delays were highest in the summer months of June and July, followed by December.
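Average-delay comparisons like these are group means: records are grouped by airport, by airline within an airport, or by month, and the arrival delay is averaged within each group. A minimal sketch, assuming dict records with hypothetical field names and made-up toy values (not the project's data):

```python
from collections import defaultdict

def mean_by(rows, key, value="arr_delay"):
    """Average `value` per group, where `key` maps a record to its group label."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:
        v = r.get(value)
        if v is None:  # skip flights with no recorded delay (e.g. canceled flights)
            continue
        group = key(r)
        sums[group] += v
        counts[group] += 1
    return {g: sums[g] / counts[g] for g in sums}

# Toy records: made-up values and hypothetical field names.
flights = [
    {"airport": "DCA", "carrier": "A", "month": 6, "arr_delay": 20.0},
    {"airport": "DCA", "carrier": "B", "month": 7, "arr_delay": 10.0},
    {"airport": "IAD", "carrier": "A", "month": 6, "arr_delay": 4.0},
    {"airport": "BWI", "carrier": "B", "month": 12, "arr_delay": None},
]
print(mean_by(flights, key=lambda r: r["airport"]))  # {'DCA': 15.0, 'IAD': 4.0}
```

Passing `key=lambda r: (r["airport"], r["carrier"])` gives per-airline averages within each airport, which is how airline rankings that differ by airport would show up; `key=lambda r: r["month"]` gives the seasonal pattern.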
Preliminary findings – Average delays by time of day
Flights departing before noon tended to arrive early. Delays tended to grow worse after noon.
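A time-of-day breakdown can be sketched by bucketing each flight on its scheduled departure hour and averaging arrival delay, where a negative value means the flight arrived early. This assumes a hypothetical `dep_time` field stored as an hhmm integer (e.g. 1345 for 1:45 pm) and an `arr_delay` field in minutes; neither format is confirmed by the slides.

```python
def mean_delay_by_hour(rows):
    """Average arrival delay (minutes, negative = early) per scheduled departure hour."""
    sums, counts = {}, {}
    for r in rows:
        delay = r.get("arr_delay")
        if delay is None:
            continue
        hour = r["dep_time"] // 100  # hhmm integer, e.g. 1345 -> 13
        sums[hour] = sums.get(hour, 0.0) + delay
        counts[hour] = counts.get(hour, 0) + 1
    return {h: sums[h] / counts[h] for h in sorted(sums)}

def before_vs_after_noon(rows):
    """Average arrival delay for departures before 12:00 versus at or after 12:00."""
    def avg(group):
        delays = [r["arr_delay"] for r in group if r.get("arr_delay") is not None]
        return sum(delays) / len(delays) if delays else None
    morning = [r for r in rows if r["dep_time"] < 1200]
    later = [r for r in rows if r["dep_time"] >= 1200]
    return {"before noon": avg(morning), "noon or later": avg(later)}
```

The per-hour view shows where delays start to build; the two-bucket view matches the before/after-noon comparison on the slide.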
Preliminary findings – Total delays by reason
Late aircraft was the most common cause of flight delays in 2016, followed by carrier delays. Security issues were the least likely to cause delays, and weather was not an important cause of delays either.
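Ranking delay causes amounts to summing the delay minutes attributed to each cause across all flights and sorting. A minimal sketch, assuming each record carries per-cause delay minutes under hypothetical field names modeled on the carrier, weather, NAS, security, and late-aircraft categories (the slides do not name their data source or schema):

```python
CAUSES = ("carrier", "weather", "nas", "security", "late_aircraft")  # hypothetical field names

def total_delay_by_cause(rows):
    """Total delay minutes per cause, largest first."""
    totals = dict.fromkeys(CAUSES, 0.0)
    for r in rows:
        for cause in CAUSES:
            minutes = r.get(cause)
            if minutes:  # missing (None) or zero minutes add nothing
                totals[cause] += minutes
    # sorted() is stable, so causes with equal totals keep the CAUSES order.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

Summing minutes ranks causes by total delay time; counting the flights with a nonzero value for each cause would instead rank them by how often they occur, which can give a different order.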
Preliminary findings – Canceled flights
While weather was not an important cause of delays, it did contribute to most flight cancellations, especially in January 2016. Southwest Airlines canceled the most flights, but it also operated the most flights. Delta Air Lines, on the other hand, canceled few flights relative to its number of flights.
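The Southwest and Delta comparison is the difference between a raw count and a rate: an airline with many flights can cancel the most flights while still having a low cancellation rate. A minimal sketch, assuming records with a hypothetical `carrier` field and a boolean `canceled` flag; the carriers and counts below are made up for illustration.

```python
def cancellation_rates(rows):
    """Share of each carrier's flights that were canceled."""
    flights, canceled = {}, {}
    for r in rows:
        c = r["carrier"]
        flights[c] = flights.get(c, 0) + 1
        canceled[c] = canceled.get(c, 0) + (1 if r["canceled"] else 0)
    return {c: canceled[c] / flights[c] for c in flights}

# Made-up example: carrier "A" cancels more flights (20 vs. 5) but at a lower rate.
rows = ([{"carrier": "A", "canceled": i < 20} for i in range(1000)]
        + [{"carrier": "B", "canceled": i < 5} for i in range(100)])
print(cancellation_rates(rows))  # {'A': 0.02, 'B': 0.05}
```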
Preliminary findings – Diverted flights
The summer months had the highest delays and also the most diverted flights, at all three airports.
Predictive modeling process
Training and validation
Models: logistic regression, classification tree, neural network
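The slides do not say how the data was partitioned into training and validation sets. A minimal sketch of a reproducible random split, where the 70/30 ratio and the seed are assumptions for illustration: each model is fit on the training rows and compared on the held-out validation rows.

```python
import random

def train_validation_split(rows, train_fraction=0.7, seed=2017):
    """Shuffle a copy of `rows` with a fixed seed and split it into (train, validation)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # local RNG, so global random state is untouched
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Fixing the seed makes the split repeatable, so all three models are evaluated on the same validation rows.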
Evaluation of predictive models
Receiver Operating Characteristic (ROC) curve: the closer it gets to the top-left corner, the better
Area Under the Curve (AUC): the closer to 1, the better
LR: logistic regression model; DT: decision (or classification) tree model; Neural: neural network model
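An ROC curve plots the true positive rate against the false positive rate as the cutoff on a model's predicted delay probability is lowered; the top-left corner (0, 1) means every delayed flight is caught with no false alarms. The AUC equals the probability that a randomly chosen delayed flight gets a higher score than a randomly chosen on-time flight, with ties counting one half. A minimal sketch, assuming labels are 1 for delayed and 0 for on time:

```python
def roc_points(labels, scores):
    """(false positive rate, true positive rate) at each distinct score cutoff, high to low."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(labels, scores):
    """Pairwise (Mann-Whitney) AUC: share of (delayed, on-time) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one delayed and one on-time flight")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to the diagonal (random ranking) and 1.0 to a curve that passes through the top-left corner. The pairwise loop is quadratic, which is fine for a sketch; a full year of flights would call for a rank-based or library implementation.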
Evaluation of predictive models
Fit or accuracy
R-square: the higher (closer to 1), the better
Misclassification rate: the lower, the better
Lift curves: model performance compared with random guessing; the higher, the better
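Two of these measures are easy to state precisely. The misclassification rate is the share of flights whose predicted class (delayed if the predicted probability is at or above a cutoff, assumed here to be 0.5) differs from the actual outcome. Lift for the top fraction of flights is the delay rate among the flights the model ranks most likely to be delayed, divided by the overall delay rate, so a lift of 1 is no better than guessing. A minimal sketch with labels 1 = delayed and 0 = on time; R-square is not covered here because the slides do not say which variant was reported.

```python
def misclassification_rate(labels, probs, cutoff=0.5):
    """Share of flights whose predicted class (prob >= cutoff means delayed) is wrong."""
    wrong = sum(1 for y, p in zip(labels, probs) if (p >= cutoff) != (y == 1))
    return wrong / len(labels)

def lift_at(labels, probs, fraction=0.1):
    """Delay rate in the top `fraction` of flights by predicted probability over the overall rate."""
    ranked = sorted(zip(probs, labels), key=lambda pl: pl[0], reverse=True)
    k = max(1, round(len(ranked) * fraction))
    top_rate = sum(y for _, y in ranked[:k]) / k
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate
```

For example, with a made-up overall delay rate of 20% and both of the model's two highest-scored flights actually delayed, the lift in the top 20% is 1.0 / 0.2 = 5.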
Conclusion and recommendations
Best model based on the evaluation techniques: classification tree
How the model and insights address the business need: possible delays are identified from flight booking information, and alternative flights are presented
Ways to improve the model: include more inputs and increase the amount of data