Plane Spotting 3 Or Opening Emergency Exit Doors at Inappropriate Times since 1985. Predicting Air Delays at SeaTac airport. A CS 773C Project by Sam Delaney.


The Problem! A Review!

The Problem Is it possible to predict, at time of ticket purchase, whether a flight out of Seattle-Tacoma International Airport will be delayed?

Recap of Previous Work.  Cheating makes things easy!  It is easy to predict within 6 hours of a flight  Many attributes help in delay prediction but are not available to the ticket buyer.

Data Preparation  One year's worth of data from SeaTac airport  Over 300,000 flights  Too much for Weka…  First randomize the set  Then grab a 10% sample
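The randomize-then-sample step above can be sketched in plain Python. The flight records, the seed, and the `sample_flights` helper are all illustrative (the real work was done on ARFF data in Weka, not with this code):

```python
import random

def sample_flights(flights, fraction=0.10, seed=42):
    """Shuffle the full flight list, then keep a fixed fraction of it."""
    rng = random.Random(seed)      # seeded so the sample is reproducible
    shuffled = flights[:]          # copy so the original order survives
    rng.shuffle(shuffled)          # randomize first...
    keep = int(len(shuffled) * fraction)
    return shuffled[:keep]         # ...then grab the 10% sample

flights = list(range(300_000))     # stand-in for ~300,000 flight records
sample = sample_flights(flights)
print(len(sample))                 # 30,000 instances -> manageable in Weka
```

Shuffling before slicing matters: the raw file is in chronological order, so taking the first 10% directly would bias the sample toward a few months.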

Features Used  Month  Day  Day of Week (e.g., Monday)  Airline  Destination Airport  Origin flight (if applicable)

Results! Why we are all here.

First the binning!  Bin intervals for delay  1-hour intervals? Too easy: 95.28% accuracy on J48  Same as ZeroR  1-leaf tree  10-minute intervals? 52.82% accuracy on J48  3,412 leaves  Okay, let's see what we can do
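The 10-minute binning can be sketched as a small helper; the half-open bin-label format is an assumption (in practice Weka's discretization handled this):

```python
import math

def delay_bin(delay_minutes, width=10):
    """Map a delay in minutes (negative = early) to a half-open bin label."""
    lo = math.floor(delay_minutes / width) * width
    return f"[{lo},{lo + width})"

print(delay_bin(-5))   # early flights land in '[-10,0)'
print(delay_bin(12))   # '[10,20)'
```

With delays spanning roughly -60 to +440 minutes, 10-minute widths give on the order of 50 bins, which is why the class count balloons compared to 1-hour bins.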

ZeroR baseline  45.65% accurate  Rule predicts the majority bin: roughly on time, neither delayed nor early (-5 to 5 min delay time)
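ZeroR simply predicts the training set's majority class for every instance. A minimal sketch, using a made-up class distribution chosen so the majority bin holds about 45% of instances (the actual figure on the real data was 45.65%):

```python
from collections import Counter

def zero_r(train_labels):
    """ZeroR: always predict the most frequent class seen in training."""
    majority, _ = Counter(train_labels).most_common(1)[0]
    return lambda _instance: majority   # ignores the instance entirely

# Illustrative distribution, NOT the real SeaTac counts.
labels = ["[-5,5)"] * 456 + ["[5,15)"] * 300 + ["[15,25)"] * 244
predict = zero_r(labels)
print(predict(None))                        # '[-5,5)'
hits = sum(predict(None) == y for y in labels)
print(round(100 * hits / len(labels), 2))   # 45.6
```

Any learner worth running has to beat this number, which is why J48's 52.82% counts as progress.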

J48 Results  52.82% Accuracy  Tree Breakdown  Carrier  Destination  Month  Day of Week

Carrier vs Delay Time

Random Forest  10 trees  6 random attributes  Unlimited depth  48.17% Accuracy  Confusion matrix was not terrible; most misclassifications were only one bin (~10 minutes) off.
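One way to make "most misclassifications were only off by ~10 minutes" concrete is a tolerance-aware accuracy that also counts predictions landing in an adjacent bin. This is an illustrative metric computed over bin indices, not a Weka output:

```python
def tolerant_accuracy(true_bins, pred_bins, tolerance=1):
    """Count a prediction correct if it is within `tolerance` bins of truth."""
    hits = sum(abs(t - p) <= tolerance for t, p in zip(true_bins, pred_bins))
    return hits / len(true_bins)

# Toy bin indices: two predictions are off by exactly one 10-minute bin.
true_bins = [0, 1, 2, 3, 4]
pred_bins = [0, 2, 2, 2, 4]
print(tolerant_accuracy(true_bins, pred_bins))  # 1.0
```

Under such a metric a "48% accurate" forest whose errors cluster next to the diagonal of the confusion matrix looks far better than the raw number suggests.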

JRip Rules  51.66% accuracy  Only 12 rules!

Decision Table  54.48% Accuracy!  124 rules  Interesting rule set  Rule was essentially  Carrier  Where are you going?  BOOM! Delay prediction.
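A decision table is essentially a lookup keyed on a few attributes, with a default for unseen keys. A sketch keyed on (carrier, destination), with made-up rows, carrier codes, and bins:

```python
from collections import Counter, defaultdict

def build_decision_table(rows):
    """Majority delay bin per (carrier, destination); global majority fallback."""
    by_key = defaultdict(Counter)
    overall = Counter()
    for carrier, dest, bin_ in rows:
        by_key[(carrier, dest)][bin_] += 1
        overall[bin_] += 1
    default = overall.most_common(1)[0][0]
    table = {k: c.most_common(1)[0][0] for k, c in by_key.items()}
    return lambda carrier, dest: table.get((carrier, dest), default)

# Illustrative training rows, NOT real SeaTac data.
rows = [("AS", "SFO", "[0,10)"), ("AS", "SFO", "[0,10)"),
        ("DL", "JFK", "[30,40)"), ("UA", "ORD", "[0,10)")]
lookup = build_decision_table(rows)
print(lookup("AS", "SFO"))   # '[0,10)'
print(lookup("WN", "DEN"))   # unseen key -> global majority '[0,10)'
```

Weka's DecisionTable additionally searches for the best attribute subset to key on; here the (carrier, destination) key is hard-coded to mirror the rule the slides describe.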

Boost Me!  Had to reduce the number of instances  Stupid Weka  64.49% Accuracy!  Moving on up  Boosted J48  10 iterations  3 subcommittees  1,199 leaves in tree
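Weka's boosting wrappers (AdaBoostM1, and MultiBoost with its subcommittees) do more than this, but the core AdaBoost loop with decision stumps can be sketched from scratch on a toy two-class problem. Everything here — the stump structure and the data — is illustrative:

```python
import math

def adaboost_stumps(X, y, rounds=10):
    """AdaBoost with one-feature threshold stumps; labels must be -1 or +1."""
    n = len(X)
    w = [1.0 / n] * n                          # uniform instance weights
    ensemble = []                              # (alpha, feature, threshold, sign)
    for _ in range(rounds):
        best = None
        for f in range(len(X[0])):             # exhaustive stump search
            for thr in sorted({row[f] for row in X}):
                for sign in (1, -1):
                    preds = [sign if row[f] <= thr else -sign for row in X]
                    err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                    if best is None or err < best[0]:
                        best = (err, f, thr, sign, preds)
        err, f, thr, sign, preds = best
        if err >= 0.5:                         # no better than chance: stop
            break
        err = max(err, 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, thr, sign))
        # Up-weight the instances this stump got wrong, then renormalize.
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    score = sum(a * (s if row[f] <= t else -s) for a, f, t, s in ensemble)
    return 1 if score >= 0 else -1

# Toy set: "delayed" (+1) iff the single feature exceeds 2.
X = [[1], [2], [3], [4]]
y = [-1, -1, 1, 1]
model = adaboost_stumps(X, y)
print([predict(model, row) for row in X])  # [-1, -1, 1, 1]
```

The reweighting step is what drives the accuracy jump on the slides: later stumps concentrate on the flights the earlier committee members misclassified.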

Naïve Bayes  I'll let the picture speak for itself

For Funsies  KStar: 66.08%  AdaBoost  Decision Stump  69.9%  Trying other boosts didn't work: Weka gave up unless I had 1,000 instances or fewer, and accuracy went through the floor 

Conclusion  Weka improvements for larger data sets would be nice  Results were pleasantly surprising!  ~70% accuracy with 10-minute intervals (roughly 50 bins)  Though I don't think they are ready for iPhone apps just yet

Future Work  Does this hold true for other airports?  How accurate a weather model can be built for forecasts 6 months out?  How would ticket purchasing change if delays could be predicted with better than 90% accuracy?

Questions?