10/31/2012, METU Spatiotemporal Stream Mining using TRACDS. Middle East Technical University, October 31, 2012. Margaret H. Dunham, Michael Hahsler, Yu Su, Sudheer Chelluboina, and Hadil Shaiba, Computer Science and Engineering. This work is supported by NSF IIS.

10/31/2012, METU Intelligent Data Analysis Lab. Team led by Margaret H. Dunham and Michael Hahsler. Mission: at the Intelligent Data Analysis Lab we create novel techniques inspired by knowledge discovery, data mining, machine learning, artificial intelligence and statistical analysis to work with data from various sources. Current focus: massive data stream modeling (TRACDS™); hurricane intensity prediction; effective metagenomic classification for the Human Genome Project; recommender systems (R/Apache Mahout).

10/31/2012, METU Outline: Spatiotemporal Stream Data, TRACDS, Hurricane Intensity Prediction, PIIH, PIIH online.

10/31/2012, METU From Sensors to Streams. Data captured and sent by a set of sensors is usually referred to as "stream data": a real-time sequence of encoded signals which contains the desired information. It is a continuous, ordered sequence of items (ordered implicitly by arrival time or explicitly by timestamp or geographic coordinates) and may be viewed as arriving in discrete time intervals. Stream data is infinite: the data keeps coming. Examples: weather data, network data (VoIP), traffic data.

10/31/2012, METU Stream Data Format. Events arrive in a stream. At any time t we can view the state of the problem as a vector of q numeric values, V_t = <V_1, V_2, …, V_q>. Over time the stream forms a matrix whose rows S_1, S_2, …, S_n are the snapshots at successive time points and whose columns are the variables V_1, …, V_q; entry S_ij is the value of variable V_j in snapshot S_i.
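As a small illustration (mine, not the authors'), the snapshot view can be held in a matrix whose rows are the time-ordered snapshots and whose columns are the measured variables; all names below are hypothetical.

```python
# Minimal sketch (not from the slides): a stream viewed as a growing matrix
# of snapshots.  Row i is the state vector S_i observed at time i and
# column j is the measured variable V_j.
import numpy as np

snapshots = []                       # arriving state vectors S_1, S_2, ...

def on_arrival(state_vector):
    """Append the newest snapshot; a real stream system would summarize
    instead of storing every row, since the stream is unbounded."""
    snapshots.append(np.asarray(state_vector, dtype=float))

# three example arrivals with q = 3 variables (e.g. temperature, wind, pressure)
on_arrival([21.5, 12.0, 1013.2])
on_arrival([21.1, 14.5, 1012.8])
on_arrival([20.8, 17.0, 1011.9])

stream_matrix = np.vstack(snapshots)   # shape = (n snapshots, q variables)
print(stream_matrix.shape)             # (3, 3)
```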

10/31/2012, METU Modeling Stream Data: summarization (synopsis) of the data; temporal and spatial; dynamic; continuous (infinite stream); concept drift (learn and forget); sublinear growth rate (clustering).

10/31/2012, METU MM. A first-order Markov Chain is a finite or countably infinite sequence of events {E1, E2, …} over discrete time points, where P_ij = P(E_j | E_i) and at any time the future behavior of the process depends solely on the current state. A Markov Model (MM) is a graph with m vertices or states, S, and directed arcs, A, such that S = {N_1, N_2, …, N_m} and A = {L_ij | i ∈ {1, 2, …, m}, j ∈ {1, 2, …, m}}, where each arc L_ij = <N_i, N_j> is labeled with a transition probability P_ij = P(N_j | N_i).
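For concreteness, here is a minimal sketch (not from the slides) of estimating the transition probabilities P_ij = P(N_j | N_i) by counting transitions in an observed state sequence:

```python
# Sketch: estimating first-order transition probabilities P_ij = P(N_j | N_i)
# from an observed state sequence by counting transitions.
from collections import defaultdict

def transition_probabilities(states):
    counts = defaultdict(lambda: defaultdict(int))
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    probs = {}
    for current, row in counts.items():
        total = sum(row.values())
        probs[current] = {nxt: c / total for nxt, c in row.items()}
    return probs

sequence = ["N1", "N2", "N2", "N1", "N3", "N1", "N2"]
print(transition_probabilities(sequence))
# N1 -> N2 with 2/3, N1 -> N3 with 1/3, N2 -> N1/N2 with 1/2 each, N3 -> N1 with 1
```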

10/31/2012, METU Problem with Markov Chains. The required structure of the MC may not be certain at model construction time. As the real world being modeled by the MC changes, so should the structure of the MC. The MC is not scalable: it grows linearly with the number of events. Our solution: the Extensible Markov Model (EMM), which clusters real-world events and allows the Markov chain to grow and shrink dynamically.

10/31/2012, METU EMM (Extensible Markov Model). A time-varying, discrete, first-order Markov model that continuously evolves. Nodes are clusters of real-world states. Learning continues during the application phase. Learning covers: transition probabilities between nodes; node labels (centroid of the cluster); nodes being added and removed as data arrives. Applications: anomaly/rare event detection; prediction; classification.

10/31/2012, METU EMM Definition. Extensible Markov Model (EMM): at any time t, an EMM consists of an MC with a designated current node and algorithms to modify it, where the algorithms include: EMMCluster, which defines a technique for matching the input data at time t+1 against the existing states in the MC at time t; EMMIncrement, which updates the MC at time t+1 given the MC at time t and the clustering result at time t+1; and EMMDecrement, which removes nodes from the EMM when needed.
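The rEMM/TRACDS software is not reproduced here; the following is a minimal sketch of the three operations under simplifying assumptions: Euclidean nearest-neighbour matching against a fixed threshold for EMMCluster, incremental centroid updates plus transition counts for EMMIncrement, and node deletion for EMMDecrement. The class and method names are hypothetical.

```python
# Minimal EMM sketch (hypothetical, not the rEMM package): nearest-neighbour
# clustering with a fixed threshold, incremental transition counts, and
# node removal.  Euclidean distance and simple centroid updates are
# simplifying assumptions.
import math
from collections import defaultdict

class SimpleEMM:
    def __init__(self, threshold):
        self.threshold = threshold
        self.centroids = {}                        # node id -> centroid vector
        self.sizes = {}                            # node id -> member count
        self.counts = defaultdict(lambda: defaultdict(int))  # transition counts
        self.current = None                        # designated current node
        self._next_id = 1

    @staticmethod
    def _dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def _cluster(self, x):
        """EMMCluster: match x to the nearest existing node, or signal a new one."""
        best, best_d = None, None
        for node, c in self.centroids.items():
            d = self._dist(x, c)
            if best_d is None or d < best_d:
                best, best_d = node, d
        if best is None or best_d > self.threshold:
            return None                            # nothing "close" -> new node
        return best

    def increment(self, x):
        """EMMIncrement: add x, creating a node if needed, and update the
        transition from the current node."""
        node = self._cluster(x)
        if node is None:
            node = self._next_id
            self._next_id += 1
            self.centroids[node] = list(x)
            self.sizes[node] = 1
        else:                                      # move the centroid toward x
            n = self.sizes[node]
            self.centroids[node] = [(c * n + v) / (n + 1)
                                    for c, v in zip(self.centroids[node], x)]
            self.sizes[node] = n + 1
        if self.current is not None:
            self.counts[self.current][node] += 1
        self.current = node
        return node

    def decrement(self, node):
        """EMMDecrement: remove a node and all transitions touching it."""
        self.centroids.pop(node, None)
        self.sizes.pop(node, None)
        self.counts.pop(node, None)
        for row in self.counts.values():
            row.pop(node, None)
        if self.current == node:
            self.current = None

    def transition_prob(self, i, j):
        total = sum(self.counts[i].values())
        return self.counts[i][j] / total if total else 0.0
```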

10/31/2012, METU EMM Cluster. Nearest neighbor (or any clustering technique); if no existing node is "close", create a new node. The label of a cluster is the centroid of its members or a clustering feature. Complexity is O(n), where n is the number of states.

10/31/2012, METU EMM Advantages. Dynamic; adaptable; uses clustering; learns rare events; scalable (the growth of the EMM is not linear in the size of the data, and the EMM has a hierarchical feature); creation and evaluation in quasi-real time.

10/31/2012, METU EMM Sublinear Growth Servent Data

10/31/2012, METU Growth Rate Automobile Traffic Minnesota Traffic Data

10/31/2012, METU EMM Learning. [Figure: the EMM grows as the input vectors <18,10,3,3,1,0,0>, <17,10,2,3,1,0,0>, <16,9,2,3,1,0,0>, <14,8,2,3,1,0,0>, <14,8,2,3,0,0,0>, <18,10,3,3,1,1,0> arrive one by one; nodes N1, N2, N3 are created and the transition probabilities (e.g. 1/3, 2/3, 1/1) are updated after each arrival.]
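Replaying the vectors from this learning example through the hypothetical SimpleEMM sketch above illustrates the idea; the threshold is chosen arbitrarily, so the resulting node layout may differ from the one in the original figure.

```python
# Replaying the example vectors from the learning slide through the
# SimpleEMM sketch above (threshold picked arbitrarily for illustration).
emm = SimpleEMM(threshold=3.0)
for v in [(18, 10, 3, 3, 1, 0, 0),
          (17, 10, 2, 3, 1, 0, 0),
          (16, 9, 2, 3, 1, 0, 0),
          (14, 8, 2, 3, 1, 0, 0),
          (14, 8, 2, 3, 0, 0, 0),
          (18, 10, 3, 3, 1, 1, 0)]:
    node = emm.increment(v)
    print("assigned to node", node)
print(dict(emm.counts))   # accumulated transition counts between nodes
```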

10/31/2012, METU EMM Forgetting. [Figure: a node (N2) is removed from an EMM over nodes N1, N3, N5, N6 and the affected transition probabilities are recomputed.]

10/31/2012, METU Outline: Spatiotemporal Stream Data, TRACDS, Hurricane Intensity Prediction, PIIH, PIIH online.

10/31/2012, METU Traditional Stream Clustering. Standard data stream clustering ignores the temporal aspect of the data.

10/31/2012, METU Stream Clustering. Clusters change over time – they move. Some techniques use micro-clusters and reclustering; reclustering is often done offline (in batch, while stream data keeps arriving). STREAM partitions the stream into segments, clusters each segment (k-medians), and iteratively reclusters the centers of these clusters. S. Guha, A. Meyerson, N. Mishra, R. Motwani, and L. O'Callaghan, "Clustering data streams: Theory and practice," IEEE Transactions on Knowledge and Data Engineering, 15(3), 2003.

10/31/2012, METU Stream Clustering Requirements. Dynamic updating of the clusters; completely online; identify outliers; identify concept drift; compactness; fast incremental processing.

TRACDS: Temporal Relationship Among Clusters in Data Streams

10/31/2012, METU TRACDS NOTE. TRACDS is not another stream clustering algorithm. TRACDS is a new way of looking at clustering, built on top of an existing clustering algorithm. TRACDS may be used with any stream clustering algorithm.

10/31/2012, METU TRAC-DS Overview

TRACDS Definition. Given a data stream clustering ζ, a temporal relationship among clusters (TRACDS) overlays the data stream clustering ζ with an EMM M such that: (1) there is a one-to-one correspondence between the clusters in ζ and the states S in M; (2) a transition a_ij in the EMM M represents the probability that, given a data point in cluster i, the next data point in the data stream will belong to cluster j, with i, j = 1, 2, …, k; (3) the EMM M is created online together with the data stream clustering.
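A sketch of the overlay idea, assuming only that the underlying stream clustering exposes an assign(point) operation that returns a cluster id (a hypothetical API); the overlay then just records which cluster follows which in the stream. The real TRAC-DS operations also mirror cluster creation, deletion and merging, which this sketch omits.

```python
# Sketch of the TRACDS overlay idea (hypothetical API): the clustering
# algorithm is a black box exposing assign(point) -> cluster id; the
# overlay only records which cluster follows which in the stream.
from collections import defaultdict

class TRACDSOverlay:
    def __init__(self, clusterer):
        self.clusterer = clusterer                 # any data stream clustering
        self.counts = defaultdict(lambda: defaultdict(int))
        self.current = None                        # cluster of the last point

    def observe(self, point):
        cluster = self.clusterer.assign(point)     # online clustering step
        if self.current is not None:
            self.counts[self.current][cluster] += 1   # EMM transition a_ij
        self.current = cluster
        return cluster

    def transition_prob(self, i, j):
        total = sum(self.counts[i].values())
        return self.counts[i][j] / total if total else 0.0
```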

TRACDS Clustering Operations

10/31/2012, METU TRACDS Example. [Figure: a data stream clustering C shown next to the corresponding EMM.]

10/31/2012, METU Outline: Spatiotemporal Stream Data, TRACDS, Hurricane Intensity Prediction, PIIH, PIIH online.

10/31/2012, METU Lower 9 th Ward of New Orleans, Louisiana, Feb 27, 2006 Photographer: Mackenzie Schott 10/31/2012, METU

10/31/2012, METU Hurricanes. The major issues in forecasting hurricanes are predicting their tracks of movement and their intensities. Compared with prediction of track movement, intensity prediction is still relatively inaccurate. Hurricanes are tropical cyclones with sustained winds of at least 64 kt (119 km/h, 74 mph). Predictions are made at time steps [0h, 12h, 24h, …, 120h].

Hurricane Intensity Prediction. Hurricane intensity: the maximum sustained surface wind, i.e. the highest average wind speed over 1 minute at 10 m above the surface. Rapid intensification: a 24-h increase in maximum wind speed of at least 30 knots. Sources: "Maximum Sustained Wind," Wikipedia; "Rapid Intensification," accessed 10/24/12.

10/31/2012, METU Hurricane Saffir–Simpson Hurricane Scale [1]:
Category 5: wind speed >= 136 knots
Category 4: wind speed 114-135 knots
Category 3: wind speed 96-113 knots
Category 2: wind speed 83-95 knots
Category 1: wind speed 64-82 knots
Tropical storm: wind speed 35-63 knots
Tropical depression: wind speed 0-34 knots
[1] "Maximum Sustained Wind," Wikipedia.
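Two small helpers following the thresholds above and the rapid-intensification rule from the previous slide (wind speeds in knots); the Category 4 bounds are the ones inferred above.

```python
# Helpers following the thresholds on this slide (wind speeds in knots)
# and the rapid-intensification rule from the previous slide.
def saffir_simpson_category(wind_kt):
    if wind_kt >= 136: return "Category 5"
    if wind_kt >= 114: return "Category 4"
    if wind_kt >= 96:  return "Category 3"
    if wind_kt >= 83:  return "Category 2"
    if wind_kt >= 64:  return "Category 1"
    if wind_kt >= 35:  return "Tropical storm"
    return "Tropical depression"

def is_rapid_intensification(wind_now_kt, wind_24h_ago_kt):
    """24-h increase in maximum wind speed of at least 30 knots."""
    return wind_now_kt - wind_24h_ago_kt >= 30

print(saffir_simpson_category(120))        # Category 4
print(is_rapid_intensification(95, 60))    # True
```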

10/31/2012, METU Predicting Intensity. Statistical models predict intensity based on measured stream data: the current state of the storm, the history of this storm, and how similar storms behaved in the past. Regression models are the most popular. NOAA (a branch of the U.S. Government) collects the stream data, updates its models yearly based on data from the previous season, and makes predictions in a quasi-real-time manner.

10/31/2012, METU Hurricane Intensity Prediction. Hurricane Katrina: Category …, … mph; damage: estimated $125 billion; fatalities: >1,800 ("Hurricane Katrina – Most Destructive Hurricane Ever to Strike the U.S.", August 28, 2005, February 12, 2007). "Objective: Improve forecast skill to accuracy and confidence levels required for decision-making and risk management" – NOAA's National Weather Service Strategic Plan. Intensity, and especially rapid intensification, is very difficult to predict. The National Hurricane Center (NHC) uses dynamical models, which are computationally intensive and slow, and statistical models such as the Statistical Hurricane Intensity Prediction Scheme (SHIPS). Current storm – SANDY. [Figure: path of Hurricane Katrina (2005); color shows intensity.]

Remote Sensing. Storm features are gathered from earth observations using remote sensing. Real-time data are gathered every few hours and stored in large databases; historical data covering more than 20 years of the earth's behavior are stored there as well. Methods: satellite, buoy, ship, aircraft.

10/31/2012, METU Satellite Images. Analogous to how the eye and a camera capture images. Passive: the sun emits light toward the earth; light hits objects; light is reflected from the objects to the satellite sensors; an image is captured. Each object has a different color reflection, which helps analyze the image and understand the actual representation on earth. Active: the satellite emits energy toward objects; the radiation reflected back to the sensors is measured and analyzed.

10/31/2012, METU Satellite Images. [Image; source link not preserved in the transcript.]

10/31/2012, METU Buoys and Ships. Used to gather direct measurements at sea, and used when the readings gathered by satellite are not accurate. Buoys form a network in the ocean and take hourly measurements, such as sea surface temperature, wind speed and direction, and humidity. Ship observations are taken occasionally; crews take measurements using tools such as anemometers, which measure wind speed.

10/31/2012, METU Aircraft Reconnaissance. Used to gather data by flying above the hurricane. The aircraft carries various measurement instruments, and the crew tries to find the center of the storm. Flying over the storm provides detailed and more accurate information, but it can be very dangerous and may put the aircraft and crew at risk.

10/31/2012, METU Hurricane Data

10/31/2012, METU Hurricane Data. A first-order Markov chain is a sequence of random variables X_1, X_2, X_3, … with the Markov property: the current state depends only on the previous state. We assume hurricane states preserve this first-order Markov property. For instance, let s_t denote the current state; then s_t depends only on the state s_(t-1), where t-1 indicates the previous time interval. A hurricane is observed at 0 hours, 12 hours, 24 hours, …, 120 hours, giving a state sequence s_t, s_(t+1), s_(t+2), s_(t+3), … with dependence only between consecutive states.

10/31/2012, METU Outline: Spatiotemporal Stream Data, TRACDS, Hurricane Intensity Prediction, PIIH, PIIH online.

10/31/2012, METU Hurricane Data. The data contains 16 predictors. The dataset is formed by time-ordered 12-hour-interval records and contains the hurricane data from seasons 1982 to 2003. [Table: sample records for hurricanes 1, 2, …, 274; each row holds the 16 predictor values and the intensity for one 12-hour time step (0h, 12h, 24h, …), and rows of zeros separate consecutive hurricanes.]
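A hypothetical parser for records laid out like the sample above, assuming each row carries the predictor values plus the intensity for one 12-hour step and that an all-zero row separates consecutive hurricanes; the file name and exact column order are assumptions.

```python
# Hypothetical parser for records laid out like the sample shown: each row
# is assumed to hold the predictor values plus the intensity for one
# 12-hour time step, and an all-zero row is assumed to separate hurricanes.
import csv

def read_hurricanes(path):
    hurricanes, current = [], []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            values = [float(v) for v in row if v.strip() != ""]  # skip blanks
            if not values:
                continue
            if all(v == 0 for v in values):          # separator row of zeros
                if current:
                    hurricanes.append(current)
                    current = []
            else:
                current.append(values)               # one 12-hour record
    if current:
        hurricanes.append(current)
    return hurricanes

# e.g. sequences = read_hurricanes("ships_1982_2003.csv")   # hypothetical file
```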

10/31/2012, METU Construct EMM

10/31/2012, METU Use EMM for Prediction

10/31/2012, METU EMM, TRACDS and Hurricane Data. Approach: using the TRACDS algorithms, construct multiple EMMs, one for each time point into the future for which predictions are to be made (12 hours, …, 120 hours). NOAA provides 16 different features or predictors (attribute values). Clustering is performed based on a distance calculation from the input feature vector to the centroids of the clusters in the EMMs. However, the importance of these features for intensity prediction is not uniform. How can we determine a weight for each feature to be used during clustering?

10/31/2012, METU Weighted Feature Learning Extensible Markov Model (WFL-EMM). WFL-EMM assumes that the different predictors contribute differently to the prediction. [Figure: feature vectors V_1, V_2, … with per-feature weights f_1, …, f_7 between 0 and 1.] In WFL-EMM a weight vector u = <u_1, u_2, …, u_n> indicates the weights of the different predictors, where u_i ∈ [0, 1]; u_i = 1 means the ith predictor is important and u_i = 0 means the ith predictor is ignored.

10/31/2012, METU Weighted Feature Learning Extensible Markov Model (WFL-EMM). The question is how to locate a fit weight vector u = <u_1, …, u_n> for hurricane intensity prediction. A genetic algorithm (GA) is introduced in WFL-EMM to find the best-fitness weight vector, i.e. the one that gives the smallest prediction error. [Figure: GA learning process.]

10/31/2012, METU Weighted Feature Learning Extensible Markov Model (WFL-EMM). Given a weight vector u = <u_1, …, u_n>, two data transformation steps are applied. Normalization: all predictors are normalized into the range [0, 1]. First the predictor values are standardized as z_i = (x_i - mean(x_i)) / sd(x_i), where mean(x_i) and sd(x_i) are the mean and standard deviation of the ith predictor; then a non-linear normalization maps z_i to the interval [0, 1] using a damping coefficient. Transformation: given a normalized record d = <d_1, …, d_n>, the record is transformed as d' = <u_1 d_1, …, u_n d_n>.
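A sketch of the two steps: the z-score standardization follows the slide, while the non-linear map to [0, 1] is assumed here to be a sigmoid with damping coefficient sigma (the original formula did not survive extraction), and the weights simply scale the normalized features before the clustering distance is computed.

```python
# Sketch of the two transformation steps.  The z-score standardization
# follows the slide; the map into [0, 1] is *assumed* here to be a sigmoid
# with damping coefficient sigma, and the weights u_i scale the normalized
# features before the clustering distance is computed.
import numpy as np

def normalize(X, sigma=1.0):
    """Standardize each predictor, then squash it into [0, 1]."""
    std = X.std(axis=0)
    std[std == 0] = 1.0                       # guard against constant columns
    z = (X - X.mean(axis=0)) / std
    return 1.0 / (1.0 + np.exp(-z / sigma))   # assumed damping map

def apply_weights(D, u):
    """Transform each normalized record d into d' = (u_1*d_1, ..., u_n*d_n)."""
    return D * np.asarray(u)

X = np.array([[140.0, 14.9, 40.5],
              [135.0, 12.7, 37.5],
              [113.5, 17.1, 35.0]])
D = normalize(X, sigma=2.0)
weighted = apply_weights(D, u=[1.0, 0.3, 0.0])   # third predictor ignored
```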

10/31/2012, METU Weighted Feature Learning Extensible Markov Model (WFL-EMM)

10/31/2012, METU Weighted Feature Learning Extensible Markov Model (WFL-EMM). The question is how to locate a fit weight vector u = <u_1, …, u_n> for hurricane intensity prediction. These weights are used during clustering and applied to the distance/similarity measure. A genetic algorithm (GA) is introduced in WFL-EMM to find the best-fitness weight vector, which gives the smallest prediction error. [Figure: GA learning process.]

10/31/2012, METU Weighted Feature Learning Extensible Markov Model (WFL-EMM). GAs try to locate a fit solution in a solution space. Here the weight vector u = <u_1, …, u_n> spans the space [0, 1]^n, since each u_i is a real value in [0, 1].

10/31/2012, METU Weighted Feature Learning Extensible Markov Model (WFL-EMM). Genetic algorithm evolution: each time, two chromosomes are selected randomly from the ith population with probability proportional to their fitness, where a chromosome is a Gray-code string encoding a weight vector u. [Figure: GA learning process, population i, chromosomes 1 and 2.]

10/31/2012, METU Weighted Feature Learning Extensible Markov Model (WFL-EMM). Genetic algorithm evolution: the two selected chromosomes produce a new chromosome by crossover; mutation randomly alters one or more bits in the offspring with a given probability; inversion randomly selects a break point in a chromosome and exchanges the positions of the two pieces. The fitness of the obtained chromosome is calculated and the chromosome is placed into population i+1.

10/31/2012, METU Weighted Feature Learning Extensible Markov Model (WFL-EMM). Fitness of a chromosome: the chromosome is first decoded into a weight vector u; this u is applied to generate an EMM from the training data; the fitness is then calculated from either the mean absolute deviation (MAD) or the root mean square error (RMSE) of the predictions on the testing data, where MAD = (1/N) Σ|ŷ_i - y_i| and RMSE = sqrt((1/N) Σ(ŷ_i - y_i)^2), with ŷ_i the predicted and y_i the observed intensity. The best-fitness weight vector u is located during the evolution of the GA.
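A compact GA sketch under stated simplifications: real-valued chromosomes instead of the Gray-coded bit strings used on the slides, roulette-wheel selection, one-point crossover, and a dummy evaluate_rmse() standing in for "build the weighted EMM on training data and measure the RMSE of its predictions on test data"; fitness is taken as 1/(1 + RMSE) so that smaller errors mean higher fitness.

```python
# Compact GA sketch for learning the feature weights.  Simplifications vs.
# the slides: real-valued chromosomes instead of Gray-coded bit strings,
# roulette-wheel selection, one-point crossover, and a dummy evaluate_rmse()
# standing in for "build the weighted EMM on training data and measure the
# RMSE of its intensity predictions on test data".
import random

N_FEATURES, POP_SIZE, GENERATIONS, MUT_PROB = 16, 30, 50, 0.05

def evaluate_rmse(weights):
    """Dummy stand-in so the sketch runs; replace with the real evaluation."""
    return sum((w - 0.5) ** 2 for w in weights)

def fitness(weights):
    return 1.0 / (1.0 + evaluate_rmse(weights))      # smaller error -> fitter

def roulette_pick(population, scores):
    r, acc = random.uniform(0, sum(scores)), 0.0
    for individual, score in zip(population, scores):
        acc += score
        if acc >= r:
            return individual
    return population[-1]

def crossover(a, b):
    point = random.randint(1, N_FEATURES - 1)        # one-point crossover
    return a[:point] + b[point:]

def mutate(w):
    return [random.random() if random.random() < MUT_PROB else x for x in w]

def evolve():
    population = [[random.random() for _ in range(N_FEATURES)]
                  for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        scores = [fitness(w) for w in population]
        population = [mutate(crossover(roulette_pick(population, scores),
                                       roulette_pick(population, scores)))
                      for _ in range(POP_SIZE)]
    return max(population, key=fitness)              # best weight vector u

best_u = evolve()
```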

10/31/2012, METU Results. [Table: input parameters of the experiments.]

10/31/2012, METU Results - Experiment 1: Incremental training and testing for the periods 2001 to 2003 (RMSE used as fitness). The model is trained on the data from 1982 to 2000 and evaluated on the data of 2001; then the model is trained on the data from 1982 to 2001 and evaluated on the data of 2002, etc. For each time interval in [12h, 24h, …, 120h], the average error is computed over the errors of 2001, 2002 and 2003.

10/31/2012, METU Results - Experiment 2: Evaluating WFL-EMM using k-fold cross-validation over the dataset from 1982 to 2003 (MAD used as fitness).

10/31/2012, METU Results. It is interesting to look at the weights of the features, because these weights reveal information about what the main drivers of intensity change might be.

10/31/2012, METU Feature weights learned using the genetic algorithm. [Figure: weights for the features over time.]

10/31/2012, METU PIIH – Prediction Intensity Interval Model for Hurricanes. Inputs: historic hurricane data; features such as current wind speed, various temperatures, time of the year, direction of movement, and GOES satellite data (IR); currently 23 features from the Statistical Hurricane Intensity Prediction Scheme (SHIPS). Model: TRACDS™ (data stream clustering + temporal order model).

10/31/2012, METU Prediction using PIIH – Irene (2011) Current features of hurricane

10/31/2012, METU Prediction using PIIH – Irene (2011). [Figure: current features of the hurricane; possible future scenarios are aggregated into a prediction.]

10/31/2012, METU PIIH Output for Irene (2011). [Table: MAD (mean absolute deviation) and MSE (mean squared error) for PIIH and other models; * marks the baseline model.]

10/31/2012, METU PIIH Advantages: real time; dynamic; machine learning; confidence bands. By analyzing the 2011 storms through Nate, we observed the following: 96.33% of observations fell within the 95% confidence band, 92.8% within the 90% confidence band, and 74.27% within the 68% confidence band.

10/31/2012, METU Outline: Spatiotemporal Stream Data, TRACDS, Hurricane Intensity Prediction, PIIH, PIIH online.

10/31/2012, METU [Screenshot: PIIH online.]

10/31/2012, METU Cooperation & Media Coverage. James Franklin, Branch Chief, Hurricane Specialist Unit, NHC, NOAA. Mark DeMaria, Chief of the NESDIS Regional and Mesoscale Meteorology Branch, CIRA, NOAA.

10/31/2012, METU Future Work 1. Deploy model with NOAA  Add decay model over land  Evaluate additional features  Predict rapid intensification  Interface with NOAA’s systems 2. Improve the TRACDS TM model  Data stream clustering  Higher-order effects  Improve model selection and outlier handling

10/31/2012, METU PIIH Bibliography

10/31/2012, METU Thank you!