Case Study: FAA Air Traffic Data

Presentation transcript:

Case Study: FAA Air Traffic Data JMP Explorers Series Seminar February 6, 2013 Tom Donnelly, PhD Systems Engineer & Co-insurrectionist JMP Federal Government Team

Outline
Discuss the Problem
Key Steps
  Get the data
  Augment original data
  Shape the data for visualization and modeling
  Split data into Train, Validate (Tune) and Test subsets
Fit and compare models
  Partition vs. Neural vs. 2nd-Order Polynomial
  With and without Federal Holiday factor in the model
Summary
GOAL: “Data > Information > Knowledge > Understanding > Action”

Problem and Situation
The FAA was already using a classification and regression tree tool to model air traffic to better understand staffing needs.
While comparing results to this tool during evaluation, JMP’s answers were sometimes different. “Why?”
Initial FAA analyses were done separately for each Air Route Traffic Control Center (ARTCC facility) and for one year only, resulting in 23 files of 365 rows of data.
The initial FAA focus was on using Weekday and Month as factors with which to model Total Traffic.
By using seven years of data we can also learn about the effect and trends of Fiscal Year.
Using seven years of data for 21 ARTCCs (53,697 data rows) makes it easier to study the effect of Federal Holidays.

Download Air Traffic Data From FAA Site

Download ARTCC Data From FAA Site

Jacksonville (ZJX) vs. Albuquerque (ZAB)?

Include Modeling of 10 Federal Holidays
Symbols: 1, K, W, M, 4, L, C, V, T, X

Augment Original Data
Use the Tables menu to:
  Join Holiday information
  Join Latitude/Longitude information
  Stack Air Traffic Type
Use Formula to extract and/or expand information from the original m/d/y formatted Date column:
  Day of Week, Month, Year
  Fiscal Year (use conditional and comparison formulas)
Use Value Ordering to make better sense of ordinal data:
  Weekday, Month, Fiscal Year and Federal Holidays
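The date expansion on this slide can be sketched outside JMP. The Python below is only an illustration, not the formulas used in the talk: the function name `expand_date` and the output keys are hypothetical, and the fiscal-year rule assumes the US federal convention in which the fiscal year starts on October 1. The fixed `WEEKDAYS` list stands in for JMP's Value Ordering by giving weekdays an explicit order.

```python
from datetime import datetime

# Explicit ordering for ordinal values (Monday first), in the spirit of
# JMP's Value Ordering. The Monday-first choice is an assumption.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def expand_date(date_text: str) -> dict:
    """Derive Weekday, Month, Year and Fiscal Year from an m/d/y date string."""
    d = datetime.strptime(date_text, "%m/%d/%Y")
    # Conditional + comparison: October through December belong to the
    # next fiscal year (assumed US federal fiscal year, starting October 1).
    fiscal_year = d.year + 1 if d.month >= 10 else d.year
    return {
        "Weekday": WEEKDAYS[d.weekday()],
        "Month": MONTHS[d.month - 1],
        "Year": d.year,
        "Fiscal Year": fiscal_year,
    }


last_day_of_fy = expand_date("09/30/2010")
first_day_of_fy = expand_date("10/01/2010")
```

Here the two sample dates sit on either side of the fiscal-year boundary, so they share a calendar year but land in different fiscal years.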

Make the Visualization Dynamic Use Data Filters and Distributions to select on the fly what to view Use Column Switcher to compare model predictions in Graph Builder

Randomly Assign Data into 3 Groups – Train, Validate (Tune), and Test (60%/20%/20%)
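A minimal sketch of a 60%/20%/20% random assignment, written in Python rather than JMP. It is an analogue of the idea, not JMP's own implementation; the function name `assign_subsets`, the label strings, and the fixed seed are assumptions made for the example. Each row is labeled independently, so the realized shares are only approximately 60/20/20.

```python
import random


def assign_subsets(n_rows, weights=(0.6, 0.2, 0.2), seed=0):
    """Label each row Train, Validate or Test, independently at random."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    labels = ("Train", "Validate", "Test")
    return rng.choices(labels, weights=weights, k=n_rows)


# 53,697 is the row count quoted in the talk for 7 years x 21 ARTCCs.
subsets = assign_subsets(53697)
shares = {label: subsets.count(label) / len(subsets)
          for label in ("Train", "Validate", "Test")}
```

Storing the labels as a column next to the data keeps the same split available to every model being compared.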

Actual vs. Predicted Total Traffic for ALL Data in Test Subset

Actual vs. Predicted Total Traffic for NO Fed Holidays in Test Subset

Actual vs. Predicted Total Traffic for ALL Fed Holidays in Test Subset

Actual vs. Predicted Total Traffic for Thanksgiving Holiday in Test Subset

Actual vs. Predicted Total Traffic for Christmas Holiday in Test Subset

Actual vs. Predicted Total for October 2012 with and without “Sandy” Data for 29th, 30th & 31st

Actual vs. Predicted Total Traffic - Partition - for 10 Federal Holidays by Holdback Subset

Actual vs. Predicted Total Traffic – Neural - for 10 Federal Holidays by Holdback Subset

Actual vs. Predicted Total Traffic - 2nd Order - for 10 Federal Holidays by Holdback Subset

Fit Models Using Honest Assessment Method, i.e. Train, Validate (Tune) and Test Subsets
Use JMP’s Random Indicator to assign subsets. In this case a 60/20/20 split was used.
Original work used just one year of data and fit each facility separately. This meant that “outliers”, especially Federal Holidays, were each represented in only one subset, making it impossible to take their behavior into account.
Using seven years of data makes these Federal Holidays randomly fall into all three groups, making it possible to account for their behavior.
Using seven years of data also makes it possible to model the effect of Fiscal Year on Total Traffic.
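The point about holidays falling into all three groups can be checked with a short calculation. The sketch below assumes each row is assigned to a subset independently with probabilities 60/20/20, which is an assumption about the split and not a detail taken from the talk; the function name `prob_in_all_three` is hypothetical. It uses inclusion-exclusion to get the chance that a holiday with `n_rows` rows shows up in every subset.

```python
def prob_in_all_three(n_rows, p=(0.6, 0.2, 0.2)):
    """Chance that n_rows independently assigned rows cover all three subsets."""
    train, validate, test = p
    # Probability that a given subset receives none of the rows.
    miss_one = ((1 - train) ** n_rows
                + (1 - validate) ** n_rows
                + (1 - test) ** n_rows)
    # Probability that two subsets are both empty, i.e. every row
    # landed in the single remaining subset.
    miss_two = train ** n_rows + validate ** n_rows + test ** n_rows
    return 1 - miss_one + miss_two


one_year_one_facility = prob_in_all_three(1)         # a single holiday row
seven_years_one_facility = prob_in_all_three(7)      # 7 rows for one ARTCC
seven_years_all_facilities = prob_in_all_three(147)  # 7 years x 21 ARTCCs
```

Under these assumptions a single row can never cover three subsets, seven rows for one facility cover all three roughly 61% of the time, and pooling seven years across 21 facilities makes full coverage close to certain.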

Potential Enhancements to Analysis
Model non-Federal holidays
  Model the holidays people really observe
Model SQRT(Total Traffic)
  Stabilize the variance (constant variance is a usual regression assumption)
  Prevent nonsensical predictions: negative traffic!
Model Type of Traffic separately or combined
Model impact of major storms
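The SQRT(Total Traffic) idea can be illustrated with a toy straight-line fit. This is a sketch under stated assumptions, not the models from the talk: the helper names `fit_line` and `fit_sqrt_model` are hypothetical, and the `xs`/`ys` numbers are made up purely to show the effect. Fitting on the square-root scale and squaring the result means a back-transformed prediction cannot be negative.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_sqrt_model(xs, ys):
    """Fit sqrt(y) against x and return a predictor on the original scale."""
    intercept, slope = fit_line(xs, [math.sqrt(y) for y in ys])

    def predict(x):
        # Squaring the fitted root keeps the prediction at or above zero.
        return (intercept + slope * x) ** 2

    return predict


# Made-up illustration data (not FAA traffic counts).
xs = [0, 1, 2, 3, 4]
ys = [900, 400, 100, 25, 0]

raw_intercept, raw_slope = fit_line(xs, ys)
raw_prediction = raw_intercept + raw_slope * 3.5  # negative for these numbers
sqrt_prediction = fit_sqrt_model(xs, ys)(3.5)     # squared, so never negative
```

One caveat: if the fitted root itself goes negative, squaring it still returns a positive value, so predictions far outside the data should be read with care.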

Prediction Profiler and Actual Values at 5 ARTCCs for Thanksgiving Day Test Data (Holdback = 2) for Fiscal Year 2010
ARTCC   Total Traffic
ZAB     2126
ZAN     683
ZAU     3699
ZBW     2879
ZDC     4265

Prediction Profiler Beside 3-D Response Surface

Bubble Plot

Q & A