Predicting Flight Delays

Presentation transcript:

Predicting Flight Delays: Washington DC area airports
BIT 5534 - Applied Business Intelligence and Analytics - Spring 2017
Group 3: Alexandra Robleto, Caitlin Fernandez, Lucas Cameron, Kevin Sherman

Project summary: establish the business need; collect data; understand the flight data; prepare the data for modeling; create predictive models; measure the models' ability to predict delays; evaluate findings; make recommendations.

Data
Data source: https://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&DB_Short_Name=On-Time
Data content: calendar year 2016; all flights from the Washington DC area airports BWI, IAD, and DCA.
Data understanding: available variables; relationships.
Data preparation: missing values; outliers; redundant variables.
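The data preparation step can be illustrated with a minimal sketch. The slides do not show how the group handled missing values or defined a delay, so everything below is an assumption: the column names `ORIGIN`, `CARRIER`, and `ARR_DELAY`, the 15-minute cutoff, and the choice to drop records with a missing delay are all illustrative, and the sample rows are made up.

```python
import csv
import io

# Assumed column names; the real export depends on the fields selected.
DC_AIRPORTS = {"BWI", "IAD", "DCA"}
DELAY_THRESHOLD_MIN = 15  # assumed cutoff for labelling a flight "delayed"


def prepare_rows(csv_text):
    """Keep DC-area departures with a usable arrival delay and add a label."""
    prepared = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["ORIGIN"] not in DC_AIRPORTS:
            continue  # outside the study area
        if row["ARR_DELAY"].strip() == "":
            continue  # missing value: drop the record (one possible policy)
        delay = float(row["ARR_DELAY"])
        prepared.append({
            "origin": row["ORIGIN"],
            "carrier": row["CARRIER"],
            "arr_delay": delay,
            "delayed": delay >= DELAY_THRESHOLD_MIN,
        })
    return prepared


# Made-up sample rows for illustration only.
sample = """ORIGIN,CARRIER,ARR_DELAY
DCA,AA,22
BWI,WN,
IAD,UA,-5
JFK,DL,40
"""
rows = prepare_rows(sample)
print(len(rows), [r["delayed"] for r in rows])
```

Dropping incomplete records is only one option; imputing a value or treating canceled flights separately would be equally plausible choices.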

Preliminary findings – Average delays: Washington Reagan airport (DCA) had more delays on average than the other airports. Certain airlines experience more delays, and which airlines those are differs by airport. Delays were highest in the summer months of June and July, followed by December.

Preliminary findings – Average delays by time of day: Flights departing before noon tend to arrive early. Delays tend to get worse after noon.
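A comparison like this one comes down to a grouped average. The sketch below is illustrative only: the function name is hypothetical, the (departure hour, arrival delay) pairs are made up, and the split at 12:00 simply mirrors the "before noon" wording of the slide.

```python
from collections import defaultdict


def mean_delay_by_period(flights):
    """Average arrival delay for departures before noon vs. noon and later."""
    totals = defaultdict(lambda: [0.0, 0])  # period -> [sum of delays, count]
    for dep_hour, arr_delay in flights:
        period = "before noon" if dep_hour < 12 else "noon or later"
        totals[period][0] += arr_delay
        totals[period][1] += 1
    return {period: s / n for period, (s, n) in totals.items()}


# Made-up (departure hour, arrival delay in minutes) pairs for illustration.
flights = [(7, -6.0), (9, -2.0), (13, 10.0), (18, 20.0)]
averages = mean_delay_by_period(flights)
print(averages)
```

A negative average means flights in that period arrived early on balance.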

Preliminary findings – Total delays by reason: Late aircraft was the most common cause of flight delays in 2016, followed by carrier delays. Security issues were the least likely to cause delays, and weather was not an important cause of delays either.

Preliminary findings – Canceled flights: While weather was not an important cause of delays, it did contribute to most flight cancellations, especially in January 2016. Southwest Airlines canceled the most flights, but it also had the highest number of flights. Delta Airlines, on the other hand, had few canceled flights relative to its number of flights.

Preliminary findings – Diverted flights: Summer months had the highest delays and also the most diverted flights, regardless of the airport.

Predictive modeling process: training and validation; logistic regression; classification tree; neural network.
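The training and validation step can be sketched as a random holdout split. The slides do not state the proportions or the tool used, so the 70/30 split, the fixed seed, and the function name are assumptions for illustration.

```python
import random


def train_validation_split(records, validation_fraction=0.3, seed=42):
    """Shuffle a copy of the records and hold out a fraction for validation."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_validation = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


records = list(range(10))  # stand-ins for prepared flight records
train, validation = train_validation_split(records)
print(len(train), len(validation))
```

Each candidate model (logistic regression, classification tree, neural network) would then be fit on `train` and compared on `validation`, so that all three are scored on the same unseen flights.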

Evaluation of predictive models
Receiver Operating Characteristic (ROC) curve: the closer it gets to the top left corner, the better.
Area Under the Curve (AUC): the closer to one, the better.
Legend: LR = logistic regression model; DT = decision (or classification) tree model; Neural = neural network model.
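To make the AUC criterion concrete, here is a small sketch that computes it from its rank interpretation: the probability that a randomly chosen delayed flight receives a higher score than a randomly chosen on-time flight. The labels and scores are made up, and this pairwise version is meant for illustration rather than for large datasets.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Made-up labels (1 = delayed) and model scores for illustration.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.8]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

A value of 0.5 corresponds to random guessing and 1.0 to a perfect ranking, which is why "the closer to one, the better" applies.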

Evaluation of predictive models
Fit or accuracy: Rsquare, the higher (closer to one) the better; misclassification rate, the lower the better.
Lift curves: model performance as opposed to guessing; the higher the better.
Legend: LR = logistic regression model; DT = decision (or classification) tree model; Neural = neural network model.
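The misclassification rate and lift can be sketched in a few lines. The 0.5 classification threshold, the top-20% cut for lift, and the data are all assumptions for illustration; the slides do not say which settings the group used.

```python
def misclassification_rate(labels, predictions):
    """Share of flights whose predicted class differs from the actual class."""
    wrong = sum(1 for y, p in zip(labels, predictions) if y != p)
    return wrong / len(labels)


def lift_at(labels, scores, fraction):
    """Delay rate in the top-scored fraction divided by the overall delay rate."""
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no positives")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, int(len(ranked) * fraction))
    top_rate = sum(y for _, y in ranked[:top_n]) / top_n
    return top_rate / base_rate


# Made-up labels (1 = delayed) and model scores for illustration.
labels = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.2, 0.8, 0.3, 0.1, 0.4, 0.6, 0.7, 0.05, 0.15]
predictions = [1 if s >= 0.5 else 0 for s in scores]
rate = misclassification_rate(labels, predictions)
lift = lift_at(labels, scores, 0.2)
print(round(rate, 2), round(lift, 2))
```

A lift above 1 means the top-scored flights contain proportionally more delays than a random selection of the same size would, which is the sense in which a model beats guessing.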

Conclusion and recommendations
Best model based on the evaluation techniques: classification tree.
How the model and insights address the business need: possible delays identified based on flight booking information; alternative flights presented.
Ways to improve the model: include more inputs; increase the amount of data.