Advanced Methods of Prediction. Motti Sorani, Boaz Cohen. Supervisor: Gady Zohar. Technion - Israel Institute of Technology, Department of Electrical Engineering, The Image and Computer Vision Laboratory.

Project Goals An enhanced prediction scheme based on the former project. Better approximation of the system behavior using Kalman filtering. Implementation of a competitive prediction tool based on a neural network. Implementation of an LZ predictor and adaptation of its prediction scheme to continuous signals.

Enhanced Prediction Schemes The following points of weakness were diagnosed in the former project: (1) the need for an "optimal" criterion when searching for the optimal evaluation environment; (2) prediction limited to a fixed dimension (the fractal dimension, calculated using the GP algorithm); (3) symmetrical environments in the search for an optimal evaluation environment, giving poor results near sharp areas of the system's behavior.


Confidence Interval Criterion. The former criteria: the Neighbor criterion. (Figure: the optimal evaluation environment around Xnew and its neighbor Xnei in the (Xn, Xn+1) plane.)

Confidence Interval Criterion (cont). The former criteria: the NMSE criterion. (Figure: the optimal evaluation environment around Xnew in the (Xn, Xn+1) plane.)

Confidence Interval Criterion (cont). The new criterion: the Confidence Interval criterion - choose the environment in which the regression has the best (minimal) confidence interval. Motivation: the confidence interval gives us the interval around the predicted value in which the true next value lies with 90% probability. A small confidence interval implies a better evaluation environment.
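
To make the criterion concrete, here is a minimal sketch in Python (the project's tool itself was written in MATLAB) of scoring a candidate evaluation environment by the confidence-interval half-width of a simple local linear regression at the new point; the 90% level, the prediction-interval formula and the symmetric candidate radii are illustrative assumptions, not the project's exact procedure.

```python
import numpy as np
from scipy import stats

def ci_halfwidth(x, y, x_new, level=0.90):
    """Half-width of the ~90% prediction confidence interval of a simple
    linear regression y ~ a*x + b, evaluated at x_new (illustrative)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    a, b = np.polyfit(x, y, 1)                               # least-squares line
    s = np.sqrt(np.sum((y - (a * x + b)) ** 2) / (n - 2))    # residual std
    sxx = np.sum((x - x.mean()) ** 2)
    t = stats.t.ppf(0.5 + level / 2.0, df=n - 2)
    return t * s * np.sqrt(1.0 + 1.0 / n + (x_new - x.mean()) ** 2 / sxx)

def best_environment(pairs, x_new, candidate_radii):
    """Pick the evaluation environment (here simply a radius around x_new)
    whose local regression has the smallest confidence interval at x_new."""
    best = None
    for r in candidate_radii:
        sel = [(xi, yi) for xi, yi in pairs if abs(xi - x_new) <= r]
        if len(sel) < 4:                                      # need enough points for a fit
            continue
        x, y = zip(*sel)
        w = ci_halfwidth(x, y, x_new)
        if best is None or w < best[0]:
            best = (w, r)
    return best                                               # (half-width, chosen radius)
```

A smaller half-width marks a better environment, which is exactly the "small confidence interval implies a better evaluation environment" rule above.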

Confidence Interval Criterion (cont). Criteria comparison.

Confidence Interval Criterion (cont). Criteria comparison. (Figure: prediction results with the Confidence Interval, Neighbor and NMSE criteria.)

Confidence Interval Criterion - Conclusions The Confidence Interval criterion proved its superiority over the NMSE criterion. In most cases it was better than the Neighbor criterion as well. Thus, the Confidence-Interval criterion was selected as the major criterion in our experiments.

Enhanced Prediction Schemes The following points of weakness were diagnosed in the former project: (1) the need for an "optimal" criterion when searching for the optimal evaluation environment; (2) prediction limited to a fixed dimension (the fractal dimension, calculated using the GP algorithm); (3) symmetrical environments in the search for an optimal evaluation environment, giving poor results near sharp areas of the system's behavior.

Multi Dimensional Prediction In the former project: prediction is done on a fixed-dimensional state vector (the dimension is the fractal dimension of the set). The reason: with a smaller dimension the attractor is not embedded correctly in the embedding space; with a bigger dimension the points move far from each other, which demands a large number of samples.

Multi Dimensional Prediction (cont) Fixed-dimensional prediction. Advantage: speed, speed, speed. Disadvantage: the fractal dimension calculated is an averaged one, and we know that certain areas of the attractor have a higher dimension than the average value. We want to allow our prediction to increase or decrease the dimension as needed.

Multi Dimensional Prediction (cont) The solution: the samples Xn are embedded in dimensions 1, 2, 3, ... , 10; a prediction is made in each dimension; the best prediction (in terms of the confidence interval) is picked as Xn+1.
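
A rough Python sketch of this pick-the-best loop follows; the k-nearest-neighbour local mean and the normal-quantile confidence score are simplifications assumed here in place of the project's own local regression and criterion.

```python
import numpy as np

def embed(x, dim):
    """Delay embedding: row t is the state vector [x[t-dim+1], ..., x[t]]."""
    return np.array([x[t - dim + 1: t + 1] for t in range(dim - 1, len(x))])

def predict_multi_dim(x, dims=range(1, 11), k=10):
    """Embed the series in every candidate dimension, predict the next sample
    from the k nearest past states, and keep the prediction whose neighbours'
    successors have the smallest ~90% confidence half-width."""
    x = np.asarray(x, float)
    best = None
    for dim in dims:
        states = embed(x[:-1], dim)                 # past states whose successor is known
        targets = x[dim:]                           # their successors
        if len(states) <= k:
            continue
        query = x[-dim:]                            # the current state
        d = np.linalg.norm(states - query, axis=1)
        succ = targets[np.argsort(d)[:k]]           # successors of the k nearest states
        pred = succ.mean()
        score = 1.645 * succ.std(ddof=1) / np.sqrt(k)   # confidence half-width
        if best is None or score < best[0]:
            best = (score, dim, pred)
    return best                                     # (score, chosen dimension, prediction)
```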

Multi Dimensional Prediction (cont) Example (Set: AA, N: 2180, LookAhead: 200). (Figure: multi-dimensional prediction vs. fixed dimension Dim = 5.)

Multi Dimensional Prediction - Conclusions As expected, Multi-Dimensional Prediction improved the quality of the prediction, at the cost of run-time.

Enhanced Prediction Schemes The following points of weakness were diagnosed in the former project: (1) the need for an "optimal" criterion when searching for the optimal evaluation environment; (2) prediction limited to a fixed dimension (the fractal dimension, calculated using the GP algorithm); (3) symmetrical environments in the search for an optimal evaluation environment, giving poor results near sharp areas of the system's behavior.

Asymmetrical Evaluation Environment In the former project: searching for environments that are symmetrical around Xnew gives poor results near sharp areas. (Figure: the symmetric optimal environment around Xnew in the (Xn, Xn+1) plane.)

Asymmetrical Evaluation Environment (cont) The algorithm (by example): Step 1: Partition of the range

Asymmetrical Evaluation Environment (cont) The algorithm (by example): Step 2: Try all possibilities

Asymmetrical Evaluation Environment (cont) The algorithm (by example): Step 3: Find the optimal

Asymmetrical Evaluation Environment (cont) The algorithm (by example): Step 4: Go back to step 1 (repartition). A sketch of this search is given below.
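
A Python sketch of steps 1-3 under illustrative assumptions (grid size, minimum point count, a normal-quantile confidence interval): every asymmetric interval [left, right] on the grid that contains Xnew is scored by the confidence interval of its local regression, and step 4 would simply repeat the search on a finer grid around the winner.

```python
import numpy as np

def ci_halfwidth(x, y, x_new):
    """~90% confidence-interval half-width of a local linear fit at x_new
    (same idea as in the earlier sketch; normal quantile used for brevity)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    a, b = np.polyfit(x, y, 1)
    s = np.sqrt(np.sum((y - (a * x + b)) ** 2) / (n - 2))
    sxx = np.sum((x - x.mean()) ** 2)
    return 1.645 * s * np.sqrt(1.0 + 1.0 / n + (x_new - x.mean()) ** 2 / sxx)

def best_asymmetric_env(pairs, x_new, lo, hi, n_grid=8):
    """Step 1: partition [lo, hi] into a grid.  Step 2: try every asymmetric
    interval [left, right] on the grid that contains x_new.  Step 3: keep the
    interval whose local regression has the smallest confidence interval."""
    grid = np.linspace(lo, hi, n_grid + 1)
    best = None
    for left in grid[grid < x_new]:
        for right in grid[grid > x_new]:
            sel = [(xi, yi) for xi, yi in pairs if left <= xi <= right]
            if len(sel) < 4:                          # not enough points for a fit
                continue
            x, y = zip(*sel)
            w = ci_halfwidth(x, y, x_new)
            if best is None or w < best[0]:
                best = (w, left, right)
    return best   # (half-width, left edge, right edge); step 4 refines the grid around this
```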

Asymmetrical Evaluation Environment (cont) Examples (Set: AA, N: 2180, LookAhead: 100, Dim: 2). (Figure: prediction with a symmetric environment vs. an asymmetric environment.)

Asymmetrical Evaluation Environment - Conclusions The algorithm succeeds in finding an environment with a minimal value of the quality criterion. Thus the confidence interval is reduced, but in some cases the Hit-Ratio isn't improved. A possible reason: the contribution of noise.

System approximation using Kalman Filtering The model: a one-dimensional Kalman filter. The noises are Gaussian, independent in time, and independent of each other.

Kalman Filter The filter: a recursive filter. The optimization problem - finding a(k) and b(k) that minimize the error.
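
As an illustration only, here is a scalar Kalman filter in Python for an assumed model x(k+1) = a·x(k) + w(k), y(k) = x(k) + v(k) with Gaussian, mutually independent noises; the slides describe the project's model only qualitatively, so the model form and the noise values below are assumptions.

```python
import numpy as np

def kalman_1d(y, a, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for the assumed model
        x(k+1) = a*x(k) + w(k),  w ~ N(0, q)   (system noise)
        y(k)   = x(k)   + v(k),  v ~ N(0, r)   (measurement noise)
    Returns the filtered state estimates."""
    x_hat, p = x0, p0
    estimates = []
    for yk in y:
        # predict
        x_pred = a * x_hat
        p_pred = a * a * p + q
        # update with the measurement
        k_gain = p_pred / (p_pred + r)
        x_hat = x_pred + k_gain * (yk - x_pred)
        p = (1.0 - k_gain) * p_pred
        estimates.append(x_hat)
    return np.array(estimates)

# toy usage: filter a noisy observation of the linear map x(k+1) = 0.9*x(k) + w(k)
rng = np.random.default_rng(0)
x = np.zeros(200); x[0] = 1.0
for k in range(199):
    x[k + 1] = 0.9 * x[k] + rng.normal(0, 0.05)
y = x + rng.normal(0, 0.5, size=200)
x_filtered = kalman_1d(y, a=0.9, q=0.05 ** 2, r=0.5 ** 2)
```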

Kalman Filter The Extended Kalman Filter (EKF). The model: the state-transition function is non-linear; x and w can be multi-dimensional.

Kalman Filter The Extended Kalman Filter (EKF). The model: A and B are local linear approximations of the non-linear functions. The EKF does not guarantee the optimal solution!

Kalman Filter The Extended Kalman Filter (EKF). The filter: the Kalman recursion applied with the local linear approximations A and B.
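
A scalar EKF sketch in the same spirit: the non-linear map phi is linearised around the current estimate by a numerical derivative, which plays the role of the local linear approximation A above; the measurement model and the tent-map example are assumptions for illustration.

```python
import numpy as np

def ekf_1d(y, phi, q, r, x0=0.0, p0=1.0, eps=1e-5):
    """Scalar extended Kalman filter: the non-linear transition phi is
    linearised around the current estimate (numerical slope A); the
    measurement model y(k) = x(k) + v(k) is an assumption."""
    x_hat, p = x0, p0
    estimates = []
    for yk in y:
        A = (phi(x_hat + eps) - phi(x_hat - eps)) / (2 * eps)   # local linear approximation
        x_pred = phi(x_hat)                                     # predict through the non-linear map
        p_pred = A * A * p + q
        k_gain = p_pred / (p_pred + r)                          # update with the measurement
        x_hat = x_pred + k_gain * (yk - x_pred)
        p = (1.0 - k_gain) * p_pred
        estimates.append(x_hat)
    return np.array(estimates)

# example non-linear map (a tent map, assumed here only for illustration)
tent = lambda x: 1.0 - 2.0 * abs(x - 0.5)
# x_filtered = ekf_1d(y, tent, q=1e-4, r=0.01, x0=0.3)
```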

System approximation using Kalman Filtering Our goal: To eliminate the measurement noise from the state vectors

Kalman Filtering examples (Figure: the linear transform, N = 1000.)

Prediction using Kalman Filtering (Table: NMSE of the prediction for the linear transform and the triangle (tent) transform, for different numbers of Kalman filtering iterations, measurement-noise levels, system-noise levels, and numbers of training points.)

Prediction using Kalman Filtering Example (Figure: the linear transform, N = 50; with Kalman filtering at ITR = 5 and ITR = 1, and without Kalman filtering.)

Prediction using Kalman Filtering - Conclusions The EKF demands accurate knowledge of the behavior of the system, but obtaining such accurate knowledge is exactly why we use the Kalman filter in the first place... We checked the iterative process: filter, improve the transform, filter again. Prediction of signals with fast changes in their behavior is not improved by this scheme (the fast changes are treated as noise and the filter smooths them out). Rule of thumb: the prediction will be effective if the measurement noise is at least one order of magnitude greater than the system noise. In most cases the first iteration is enough.

Competitive tool - neural network We implemented a competitive prediction tool based on a neural network, to be used as a comparison to our prediction scheme. We used the backpropagation algorithm to train the network. The tool was written in MATLAB.
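
The MATLAB tool itself is not reproduced here; the following Python sketch of a one-hidden-layer network trained with plain backpropagation for one-step prediction only illustrates the kind of competitor used (embedding length, hidden size, learning rate and number of epochs are arbitrary choices).

```python
import numpy as np

def train_mlp(x, dim=3, hidden=8, epochs=500, lr=0.01, seed=0):
    """Train a one-hidden-layer network to predict x[t] from the previous `dim` samples."""
    x = np.asarray(x, float)
    X = np.array([x[t - dim:t] for t in range(dim, len(x))])   # inputs
    y = x[dim:]                                                 # targets
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.5, (dim, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, hidden);        b2 = 0.0
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)             # hidden activations
        pred = h @ W2 + b2                   # linear output
        err = pred - y                       # gradient of the squared error w.r.t. the output
        # backpropagation of the error
        gW2 = h.T @ err / len(y)
        gb2 = err.mean()
        dh = np.outer(err, W2) * (1 - h ** 2)
        gW1 = X.T @ dh / len(y)
        gb1 = dh.mean(axis=0)
        W2 -= lr * gW2; b2 -= lr * gb2; W1 -= lr * gW1; b1 -= lr * gb1
    return lambda last: float(np.tanh(np.asarray(last, float) @ W1 + b1) @ W2 + b2)

# usage: one-step-ahead prediction from the last `dim` samples
# net = train_mlp(signal); x_next = net(signal[-3:])
```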

Competitive tool - neural network Comparison: our predictor uses the Confidence-Interval criterion.

Competitive tool - neural network Comparison (Set: AA, N: 1000, LookAhead: 100). (Figure: the neural network vs. our predictor.)

Competitive tool - neural network Comparison (Set: D, N: 1100, LookAhead: 100). (Figure: the neural network vs. our predictor.)

Competitive tool - neural network - conclusions The comparison between the prediction results of our tool and the neural network shows our tool's superiority for the signals that were tested.

Sequential Prediction Common usage: for signals with finite accuracy. The idea: the predictor is an FS (Finite-State) predictor. It keeps in memory only part of the past knowledge, and can therefore be used for sequential prediction of an infinite sequence.

Sequential Prediction Some terms before we start... Alphabet: the set of all possible measurement values. For example, digital information has an alphabet of {0, 1}. We deal with the case of a finite alphabet.

Sequential Prediction FS Predictor The predictor keeps all the information needed for the prediction in its internal state. In other words, the FS predictor keeps an approximation of the system's state, which it updates sequentially.

Sequential Prediction FS Predictor For example: the alphabet is {-2, -1, 0, 1, 2}, and the classes are Negative {-2, -1} and Non-negative {0, 1, 2}.

Sequential Prediction The sequential FS prediction scheme: the prediction is produced by f from the current state (f is stochastic), and the state is updated by g from the current state and the new sample (g is deterministic). The problem: find the optimal f and g that minimize the fraction of errors.

Sequential Prediction Markovian Predictor A Markovian predictor of order k is an FS predictor with the following properties: the state is composed of a k-order embedding of the last samples, and the f-function is the empirical probability of the next symbol given that state. The problem: how to increase k as n increases.
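
A small Python sketch of such a k-order Markovian predictor over a finite alphabet: the state is the last k symbols and f is taken as the empirical (count-based) distribution of the next symbol given that state; the fallback used for states that were never seen is an added assumption.

```python
from collections import defaultdict, Counter

def markov_predictor(symbols, k=2):
    """Sequentially predict each symbol from the empirical distribution of the
    successors of the current k-symbol state, then update the counts."""
    counts = defaultdict(Counter)          # state (tuple of k symbols) -> successor counts
    errors, n = 0, 0
    for t in range(k, len(symbols)):
        state = tuple(symbols[t - k:t])
        if counts[state]:
            guess = counts[state].most_common(1)[0][0]   # most frequent successor so far
        else:
            guess = symbols[t - 1]                       # arbitrary fallback for an unseen state
        errors += (guess != symbols[t]); n += 1
        counts[state][symbols[t]] += 1                   # sequential update
    return errors / max(n, 1)                            # fraction of errors

# usage on a binary sequence:
# print(markov_predictor([0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0], k=2))
```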

LZ Predictor FS predictor that increases its order automatically. Based on LZ parsing.

LZ Parsing The result of parsing 00101010100 is: 0, 01, 010, 1, 0100. The dictionary tree is actually the g-function; the probabilities kept in the nodes generate the f-function. The tree grows by itself as the sequence is parsed.
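
A minimal Python sketch of this incremental parse: each new phrase extends the longest phrase already in the dictionary tree by one symbol, and a count kept in every node is the raw material for the f-function.

```python
def lz_parse(seq):
    """Incremental LZ parsing: walk down the dictionary tree as long as the
    current phrase is known, then add the phrase extended by one new symbol.
    Every node also keeps a visit count (the basis of the f-function)."""
    root = {"children": {}, "count": 0}
    phrases, node, phrase = [], root, []
    for s in seq:
        node["count"] += 1
        if s in node["children"]:               # phrase still in the dictionary: go deeper
            node = node["children"][s]
            phrase.append(s)
        else:                                   # new phrase: add a leaf and restart at the root
            node["children"][s] = {"children": {}, "count": 1}
            phrases.append(phrase + [s])
            node, phrase = root, []
    return phrases, root

# parsing 00101010100 reproduces the phrases in the slide
phrases, tree = lz_parse("00101010100")
print(["".join(p) for p in phrases])            # ['0', '01', '010', '1', '0100']
```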

Applying LZ Predictor on continuous signals The continuous samples Xn are mapped to a discrete alphabet of C cells, which is fed to the LZ predictor. For example: predicting the aim (direction) of the signal. NOTE: the partitioning of the continuous space into cells is very important for the quality of the prediction.
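
A sketch of the quantisation front-end under assumed choices: an equal-width partition of the observed range into a few cells, plus a sign-of-change mapping for the case where only the aim (direction) of the signal is predicted; the partition actually used in the project is not specified in the slides.

```python
import numpy as np

def quantize(x, n_cells=4):
    """Map continuous samples to cell indices 0..n_cells-1 using an
    equal-width partition of the observed range (one possible choice)."""
    x = np.asarray(x, float)
    edges = np.linspace(x.min(), x.max(), n_cells + 1)[1:-1]   # inner cell boundaries
    return np.digitize(x, edges)

def directions(x):
    """Alternative mapping when only the aim (direction) of the signal is
    predicted: 1 if the signal went up, 0 otherwise."""
    x = np.asarray(x, float)
    return (np.diff(x) > 0).astype(int)

# symbols = quantize(signal, n_cells=4)   # discrete symbols to feed the LZ predictor
```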

Applying LZ Predictor on continuous signals Results: predicting the binary sequence ... corrupted by salt & pepper noise. (Table: fraction of errors for different noise probabilities Np and numbers of samples N.)

Applying LZ Predictor on continuous signals Example - stocks: prediction of the aim (direction) of the signal. (Figure: fraction of errors vs. LEV.)

Applying LZ Predictor on continuous signals - Conclusions The fraction of errors is lower-bounded, as can be seen in the case of the binary sequence (decreasing the noise probability doesn't decrease the error). The reason: guessing at the leaves of the dictionary tree. Discretization of continuous signals shows good results, especially for the STOCKS signal. Partitioning the space into cells proved to be very effective.