EUROCONTROL EXPERIMENTAL CENTRE INNOVATIVE RESEARCH Characteristics in Flight Data Estimation with Logistic Regression and Support Vector Machines ICRAT.

Presentation transcript:

EUROCONTROL EXPERIMENTAL CENTRE INNOVATIVE RESEARCH
Characteristics in Flight Data: Estimation with Logistic Regression and Support Vector Machines
ICRAT 2004
Claus Gwiggner, LIX, Ecole Polytechnique Palaiseau
Gert Lanckriet, EECS, University of California, Berkeley

Flow Management and Planning Differences
- Time slots are distributed among aircraft to avoid congestion.
- In reality, delays, re-routings, etc. lead to missed time slots.
- The number of aircraft arriving in a sector differs from the planned number, which affects safety and leads to lost capacity. These deviations are the planning differences.

Related Work
- Factors/Causes [ATFM Study, PRR]: slot adherence, flight plan quality, in-flight change of route, ...
- Simulations [Ky, Stortz]: random noise on departure times
- Reactionary Delay [Toulouse Study]: microscopic model of departure times

Unknown
- Real situation at sector entries
  - interplay of factors
  - compensations of delays ...

Program
- Problem Formulation
- Simple Characteristics
- Binary Classification
- Conclusion
- Future Work

Planning Differences
Planning Differences = Regulated Demand - Real Demand
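The definition above can be illustrated with a minimal sketch. It assumes one count per hour for a single sector and day; the function name and all numbers are made up for illustration and do not come from the study's data.

```python
def planning_differences(regulated_demand, real_demand):
    """Hourly planning differences: regulated demand minus real demand."""
    if len(regulated_demand) != len(real_demand):
        raise ValueError("both series must cover the same hours")
    return [planned - actual
            for planned, actual in zip(regulated_demand, real_demand)]

# Hypothetical counts of sector entries per hour (24 values each).
regulated = [10] * 24
real = [10] * 24
real[12] = 13   # three more aircraft than planned at noon
real[18] = 8    # two fewer aircraft than planned at 18:00

diffs = planning_differences(regulated, real)
print(diffs[12], diffs[18])
```

With this sign convention, a negative value means that more aircraft arrived than were planned.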

General Problem Formulation
- Find 'regularities' of planning differences that are useful for improving the current planning procedure
- Why? Safety, suboptimally used capacity
- How?
  - MACRO approach: relations between flows, not single deviations from flight plans
  - Daily basis, not extreme situations
- How? Data analysis
  - 141 days of weekday data

Today's Question
- Are planning differences of different sectors the 'same'?
  - if yes: any model can be greatly simplified
  - if no: what are the differences?
- Difficulty
  - 24 dimensions: one variable for each hour

Comparison of Planning Differences
- No visible regularities in either sector...

Mean and Standard Deviation
- ...but similar mean and standard deviation over time

Hypothesis Tests
- H0: same underlying distribution...
  - rejected at the 1% level
  - assumes that statistical properties do not vary over time
- ...but what are the characteristics?
  - e.g. 'if high peaks at noon => sector 1'?
- Find a rule that tells whether a sequence of values belongs to sector 1
- Classification problem

(Binary) Classification
- Probabilistic
  - 'What is the probability that a new item belongs to sector 1?'
  - Logistic Regression
- Geometric
  - 'On which side of the boundary does the new item lie?'
  - Support Vector Machines

Some Problems in Classification
- Probabilistic
  - which underlying distribution?
- Geometric
  - overlapping classes?
  - which form of boundaries?
Fig: linear decision boundary and overlapping classes

Logistic Regression
- Assumes (normally) a linear relation between the log-odds of the probability p and the item X
- p = prob(C=1 | X) = 1 / (1 + e^(bX))
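A small numerical check can make the formula concrete. The sketch below takes the slide's expression at face value, reads bX as the dot product of a coefficient vector b and an instance X, and confirms that the log-odds log(p / (1 - p)) equal -bX, i.e. are linear in X. The coefficients and the instance are hypothetical.

```python
import math

def class_probability(b, x):
    """p = prob(C=1 | X) = 1 / (1 + e^(bX)), with bX the dot product of b and x."""
    bx = sum(bi * xi for bi, xi in zip(b, x))
    return 1.0 / (1.0 + math.exp(bx))

def log_odds(p):
    """Logarithm of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

b = [0.5, -0.25]   # hypothetical coefficients
x = [2.0, 1.0]     # hypothetical 2-dimensional instance, so bX = 0.75
p = class_probability(b, x)

# Under the slide's sign convention the log-odds are -bX.
print(p, log_odds(p))
```

Many textbooks write the exponent as -bX instead; that only flips the sign of b and does not change the model family.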

Support Vector Machines
- Non-linear boundaries
  - transformation of the initial instances into a higher-dimensional space
- Solve the problem of overlapping classes
  - relaxation of the boundary constraint
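Both ideas can be shown on toy data. The sketch below is an illustration, not the authors' setup: it uses a hand-picked feature map (appending the product of the two coordinates) to make XOR-like data linearly separable in three dimensions, and it computes the slack max(0, 1 - y f(x)) that measures how far a point violates the relaxed margin constraint. The hyperplane is chosen by hand rather than learned.

```python
def feature_map(x):
    """Hypothetical map from 2 to 3 dimensions: append the product x1 * x2."""
    x1, x2 = x
    return (x1, x2, x1 * x2)

def decision_value(w, bias, z):
    """Signed score of point z relative to the hyperplane (w, bias)."""
    return sum(wi * zi for wi, zi in zip(w, z)) + bias

def slack(y, f):
    """Amount by which the margin constraint y * f(x) >= 1 is violated."""
    return max(0.0, 1.0 - y * f)

# XOR-like toy data: no straight line in the original plane separates the classes.
data = [((1, 1), 1), ((-1, -1), 1), ((1, -1), -1), ((-1, 1), -1)]
w, bias = (0.0, 0.0, 1.0), 0.0   # hyperplane picked by hand, not learned

values = [decision_value(w, bias, feature_map(x)) for x, _ in data]
slacks = [slack(y, f) for (_, y), f in zip(data, values)]

# An overlapping point: labelled class 1 but lying on the class -1 side.
outlier_slack = slack(1, decision_value(w, bias, feature_map((1, -0.5))))
print(slacks, outlier_slack)
```

In a soft-margin SVM the sum of such slacks is penalised in the objective, so overlapping points are tolerated at a cost instead of making the problem infeasible.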

Comparison
- Linear Logistic Regression vs SVMs
  - linear vs non-linear
  - simple vs mathematically sophisticated
  - traditional vs state-of-the-art
  - probabilistic vs geometric
- Common points [Hastie et al. 2003], [Friedman 2003]
  - SVM: estimator of class probabilities
  - logistic regression induces linear boundaries

Experiments on...
- Data from 4 sectors in Upper Berlin airspace
- Raw data (random permutations)
- Data where the number of instances in both classes is balanced
- In total, 8 experiments conducted
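The slides do not say how the balanced data sets were produced. One common option is random undersampling of the larger class, sketched below as an assumption; the function name, seed, and toy data are hypothetical.

```python
import random

def balance_classes(instances, labels, seed=0):
    """Randomly undersample so every class keeps as many instances as the smallest one."""
    rng = random.Random(seed)
    by_class = {}
    for instance, label in zip(instances, labels):
        by_class.setdefault(label, []).append(instance)
    smallest = min(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        for instance in rng.sample(group, smallest):
            balanced.append((instance, label))
    rng.shuffle(balanced)   # random permutation of the retained instances
    return balanced

# Toy data: 7 instances of class 1 and 3 of class -1, each a 24-hour vector.
instances = [[i] * 24 for i in range(10)]
labels = [1] * 7 + [-1] * 3

balanced = balance_classes(instances, labels)
print(len(balanced))
```

Undersampling discards data from the larger class; oversampling or class weights would be alternatives with different trade-offs.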

Model Selection
- Report Estimated Prediction Error (EPE)
- Model Selection:
  - Cross-Validation [Stone 1974]
  - Wilcoxon-Mann-Whitney Test
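As a rough illustration of how a prediction error can be estimated by cross-validation, the sketch below runs k-fold cross-validation with a deliberately simple stand-in classifier (a threshold on the mean of an instance). The classifier, the fold layout, and the toy data are assumptions for illustration; the number of folds used in the study is not stated in the slides.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validated_error(X, y, fit, predict, k):
    """Fraction of held-out instances that are misclassified across all folds."""
    errors = 0
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        train_X = [x for i, x in enumerate(X) if i not in held_out]
        train_y = [t for i, t in enumerate(y) if i not in held_out]
        model = fit(train_X, train_y)
        errors += sum(1 for i in fold if predict(model, X[i]) != y[i])
    return errors / len(X)

def _mean(values):
    return sum(values) / len(values)

def fit_threshold(X, y):
    """Stand-in model: midpoint between the two classes' average instance means."""
    positive = [_mean(x) for x, t in zip(X, y) if t == 1]
    negative = [_mean(x) for x, t in zip(X, y) if t == -1]
    return (_mean(positive) + _mean(negative)) / 2.0

def predict_threshold(threshold, x):
    return 1 if _mean(x) > threshold else -1

# Toy 24-hour instances with interleaved classes; the last one overlaps class 1.
levels = [5, 1, 6, 2, 7, 3, 8, 9]
X = [[float(v)] * 24 for v in levels]
y = [1, -1, 1, -1, 1, -1, 1, -1]

epe = cross_validated_error(X, y, fit_threshold, predict_threshold, k=4)
print(epe)
```

Contiguous folds only make sense if the data have been shuffled or, as here, the classes are interleaved; otherwise a fold could remove an entire class from the training set.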

Parameters of SVMs
- Kernel functions
  - Linear, Gauss, Poly, Linear CN, Gauss CN, Poly CN
- Kernel parameters
- Cost function
  - 1-norm, 2-norm
- In total, over 800 combinations possible
  - best one estimated by cross-validation
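A parameter search of this kind can be sketched as a grid over all combinations, with the best entry picked by an error estimate such as the cross-validated error. The kernel and cost-function names below are taken from the slide; the kernel parameter values and cost weights are hypothetical placeholders, so the grid size here is not the figure reported in the study.

```python
from itertools import product

kernels = ["Linear", "Gauss", "Poly", "Linear CN", "Gauss CN", "Poly CN"]  # from the slide
cost_norms = ["1 Norm", "2 Norm"]                                           # from the slide
kernel_params = [0.1, 1.0, 10.0]            # hypothetical values
cost_weights = [0.1, 1.0, 10.0, 100.0]      # hypothetical values

# Naive full grid: every kernel is crossed with every parameter value,
# even where a kernel (e.g. the linear one) would ignore the parameter.
grid = [
    {"kernel": k, "kernel_param": p, "cost_norm": n, "cost_weight": c}
    for k, p, n, c in product(kernels, kernel_params, cost_norms, cost_weights)
]

def select_best(candidates, estimate_error):
    """Return the configuration with the lowest estimated prediction error."""
    return min(candidates, key=estimate_error)

print(len(grid))
```

Here `estimate_error` is a caller-supplied function, for instance one that trains an SVM with the given configuration and returns its cross-validated error.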

Results

Summary
- characteristics in high-dimensional data
- comparison of a very simple and a very complicated classification method

Conclusions
- There are systematic differences between different sectors
- SVMs do not promise major improvement
  - no more than 4% better than logistic regression
- Linear prediction is possible
  - Expected prediction errors around 15%

Future Work
- (black box) prediction not satisfactory
- Better understanding of the underlying processes
  - reasons for the differences
  - model of the probability distribution of planned traffic and realized traffic

Questions? Thanks for your attention!

Results
- Is Week End?

Known: Causes for Planning Differences
- Departure Slot adherence
- Inconsistent profile
- CASA implementation
- In-flight change of route
- Regulations too late
- Weekday, Season
- Weather
- Slot tolerance window
- Missing flight plans
- Incorrect flight plan information
Priorities (figure legend): Very High, High, Medium, Unknown
Fig: # over-deliveries over time
Source: Independent Study for the Improvement of ATFM, Final Report, 2000

Little known: Dynamics of Planning Differences
Fig: 'error' propagation across Sector 1, Sector 2, ..., Sector n (X: time, Y: number of planning differences)
Related Work: simulation studies, reactionary delay studies

Summary Motivation
- Are planning differences unpredictable?
- Or are there hidden 'regularities'?

Possible Research Questions
- Propagation over the network
  - Dependence on traffic density, sector complexity, ...
- Characteristics
  - Comparison of different sectors

Notation
- A sector is represented as a vector of 24 variables, one for each hour
- An instance is a value of this vector
- An instance belongs to class 1 or -1, depending on the sector from which it was drawn

Binary Classification
- Given: instances from sectors 1 and -1
- Question: a rule to decide to which sector a new instance might belong
- Example: if 'high peaks at noon' then class 1
  - Decision trees
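The example rule can be written down as a one-test classifier, which is what a decision tree with a single split amounts to. The sketch below is purely illustrative: the hour index, the threshold, and the toy instances are hypothetical, and in practice a tree learner would choose the hour and threshold from data.

```python
def high_peak_at_noon(instance, noon_hour=12, threshold=5):
    """Hypothetical rule: class 1 if the value at noon exceeds the threshold, else class -1."""
    return 1 if instance[noon_hour] > threshold else -1

# Two toy 24-hour instances of planning differences.
peaky = [0] * 24
peaky[12] = 8            # pronounced peak at noon
flat = [1] * 24          # no peak anywhere

print(high_peak_at_noon(peaky), high_peak_at_noon(flat))
```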

Geometric and Probabilistic Approaches
Example: instances are 2-dimensional
- Geometric
  - Instances are points in Euclidean space
  - Rules are class boundaries
  - Problem: overlapping classes
- Probabilistic
  - Classes have an underlying probability distribution
  - Rules are class probabilities
  - Problem: which distribution?