Machine Learning for Signal Processing Predicting and Estimation from Time Series Bhiksha Raj 25 Nov 2014 11-755/18797.



Administrivia. Final class on Tuesday the 2nd. Project demos: 4th December (Thursday), before exams week. Problem: how to set up posters for SV students? Find a representative here?

An automotive example. Determine automatically, by only listening to a running automobile, whether it is: idling; travelling at constant velocity; accelerating; or decelerating. Assume (for illustration) that we record only the energy level (SPL) of the sound, and that the SPL is measured once per second.

What we know. An automobile that is at rest can accelerate, or continue to stay at rest. An accelerating automobile can hit a steady-state velocity, continue to accelerate, or decelerate. A decelerating automobile can continue to decelerate, come to rest, cruise, or accelerate. An automobile at a steady-state velocity can stay in steady state, accelerate or decelerate.

What else we know. The probability distribution of the SPL of the sound is different in the various conditions: the densities P(x|idle), P(x|decel), P(x|cruise) and P(x|accel) peak near 45, 60, 65 and 70 dB SPL respectively (as shown in the figure; in reality this depends on the car). The distributions for the different conditions overlap, so simply knowing the current sound level is not enough to know the state of the car.

The Model! The state-space model has four states: Idling (I), Accelerating (A), Cruising (C) and Decelerating (D), with output distributions P(x|idle), P(x|accel), P(x|cruise) and P(x|decel) centered near 45, 70, 65 and 60 dB SPL. Assuming all transitions out of a state are equally probable: from I, transitions to I and A have probability 0.5 each; from A, transitions to A, C and D have probability 1/3 each; from C, transitions to C, A and D have probability 1/3 each; from D, transitions to D, I, C and A have probability 0.25 each.

Estimating the state at T = 0-. At T=0, before the first observation, we know nothing of the state. Assume all states are equally likely: P(Idling) = P(Accelerating) = P(Cruising) = P(Decelerating) = 0.25.

The first observation. At T=0 we observe the sound level x_0 = 68 dB SPL. The observation modifies our belief in the state of the system: P(x_0|idle) = 0, P(x_0|deceleration) = 0.0001, P(x_0|acceleration) = 0.7, P(x_0|cruising) = 0.5. Note that these do not have to sum to 1; in fact, since these are densities, any of them can be greater than 1.

Estimating the state after observing x_0: P(state | x_0) = C P(state) P(x_0|state). P(idle | x_0) = 0; P(deceleration | x_0) = C · 0.000025; P(cruising | x_0) = C · 0.125; P(acceleration | x_0) = C · 0.175. Normalizing: P(deceleration | x_0) = 0.000083, P(cruising | x_0) = 0.42, P(acceleration | x_0) = 0.57.

Estimating the state at T = 0+: Idling 0.0, Accelerating 0.57, Cruising 0.42, Decelerating 8.3 × 10^-5. At T=0, after the first observation, we must update our belief about the states. The first observation provided some evidence about the state of the system; it modifies our belief in the state of the system.

Predicting the state at T=1. For example, the probability of idling at T=1: P(idle|idle) = 0.5 and P(idle|deceleration) = 0.25, so P(idling at T=1 | x_0) = P(I_{T=0}|x_0) P(I|I) + P(D_{T=0}|x_0) P(I|D) = 0 · 0.5 + 8.3 × 10^-5 · 0.25 ≈ 2.1 × 10^-5. In general, for any state S: P(S_{T=1} | x_0) = Σ_{S_{T=0}} P(S_{T=0} | x_0) P(S_{T=1}|S_{T=0}).

Predicting the state at T = 1: P(S_{T=1} | x_0) = Σ_{S_{T=0}} P(S_{T=0} | x_0) P(S_{T=1}|S_{T=0}). Starting from the updated distribution (Idling 0.0, Accelerating 0.57, Cruising 0.42, Decelerating 8.3 × 10^-5), the predicted distribution is: Idling 2.1 × 10^-5, Accelerating 0.33, Cruising 0.33, Decelerating 0.33 (rounded; in reality they sum to 1.0).

Updating after the observation at T=1. At T=1 we observe x_1 = 63 dB SPL. From the densities P(x|idle), P(x|decel), P(x|cruise), P(x|accel): P(x_1|idle) = 0, P(x_1|deceleration) = 0.2, P(x_1|acceleration) = 0.001, P(x_1|cruising) = 0.5.

Update after observing x_1: P(state | x_{0:1}) = C P(state | x_0) P(x_1|state). With predicted probabilities of 0.33 each for accelerating, cruising and decelerating (and 2.1 × 10^-5 for idling): P(idle | x_{0:1}) = 0; P(deceleration | x_{0:1}) = C · 0.066; P(cruising | x_{0:1}) = C · 0.165; P(acceleration | x_{0:1}) = C · 0.00033. Normalizing: P(deceleration | x_{0:1}) = 0.285, P(cruising | x_{0:1}) = 0.713, P(acceleration | x_{0:1}) = 0.0014.

Estimating the state at T = 1+: Idling 0.0, Accelerating 0.0014, Cruising 0.713, Decelerating 0.285. The updated probability at T=1 incorporates information from both x_0 and x_1; it is NOT a local decision based on x_1 alone. Because of the Markov nature of the process, the state at T=0 affects the state at T=1, so x_0 provides evidence for the state at T=1.

Estimating a unique state. What we have estimated is a distribution over the states. If we had to guess a state, we would pick the most likely state from each distribution: State(T=0) = Accelerating (from Idling 0.0, Accelerating 0.57, Cruising 0.42, Decelerating 8.3 × 10^-5), and State(T=1) = Cruising (from Idling 0.0, Accelerating 0.0014, Cruising 0.713, Decelerating 0.285).

Overall procedure.
PREDICT: P(S_T | x_{0:T-1}) = Σ_{S_{T-1}} P(S_{T-1} | x_{0:T-1}) P(S_T|S_{T-1}), the distribution of the state at T before observing x_T.
UPDATE: P(S_T | x_{0:T}) = C · P(S_T | x_{0:T-1}) P(x_T|S_T), the distribution of the state at T after observing x_T.
Then set T = T+1 and repeat. At T=0 the predicted state distribution is the initial state probability. At each time T, the current estimate of the distribution over states considers all observations x_0 ... x_T, a natural outcome of the Markov nature of the model. The prediction + update recursion is identical to the forward computation for HMMs, to within a normalizing constant.
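The predict/update recursion can be sketched in code on the four-state automobile model, using the equal-probability transitions of the state-space model and the observation likelihood values quoted on the slides for x_0 = 68 dB and x_1 = 63 dB (a minimal illustrative sketch):

```python
# Four-state model from the slides: all transitions out of a state are
# assumed equally probable.
states = ["idle", "accel", "cruise", "decel"]
trans = {
    "idle":   {"idle": 0.5, "accel": 0.5},
    "accel":  {"accel": 1/3, "cruise": 1/3, "decel": 1/3},
    "cruise": {"cruise": 1/3, "accel": 1/3, "decel": 1/3},
    "decel":  {"decel": 0.25, "idle": 0.25, "cruise": 0.25, "accel": 0.25},
}
# Observation likelihoods P(x_T | state) quoted on the slides.
likelihoods = [
    {"idle": 0.0, "accel": 0.7,   "cruise": 0.5, "decel": 0.0001},  # x0 = 68 dB SPL
    {"idle": 0.0, "accel": 0.001, "cruise": 0.5, "decel": 0.2},     # x1 = 63 dB SPL
]

belief = {s: 0.25 for s in states}   # uniform belief at T = 0-
history = []
for lik in likelihoods:
    # UPDATE: weight the (predicted) belief by the observation likelihood.
    unnorm = {s: belief[s] * lik[s] for s in states}
    z = sum(unnorm.values())
    belief = {s: p / z for s, p in unnorm.items()}
    history.append(belief)
    # PREDICT: push the updated belief through the transition probabilities.
    belief = {s2: sum(belief[s1] * trans[s1].get(s2, 0.0) for s1 in states)
              for s2 in states}
```

history[0] peaks at "accel" and history[1] at "cruise", matching the state estimates on the slides, and history[1] reproduces the values 0.713 (cruise) and 0.285 (decel).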

Comparison to the forward algorithm. The recursion above:
PREDICT: P(S_T | x_{0:T-1}) = Σ_{S_{T-1}} P(S_{T-1} | x_{0:T-1}) P(S_T|S_{T-1})
UPDATE: P(S_T | x_{0:T}) = C · P(S_T | x_{0:T-1}) P(x_T|S_T)
Forward algorithm: P(x_{0:T}, S_T) = P(x_T|S_T) Σ_{S_{T-1}} P(x_{0:T-1}, S_{T-1}) P(S_T|S_{T-1})
Normalized: P(S_T|x_{0:T}) = (Σ_{S'_T} P(x_{0:T}, S'_T))^-1 P(x_{0:T}, S_T) = C · P(x_{0:T}, S_T)

Decomposing the forward algorithm: P(x_{0:T}, S_T) = P(x_T|S_T) Σ_{S_{T-1}} P(x_{0:T-1}, S_{T-1}) P(S_T|S_{T-1}). Predict: P(x_{0:T-1}, S_T) = Σ_{S_{T-1}} P(x_{0:T-1}, S_{T-1}) P(S_T|S_{T-1}). Update: P(x_{0:T}, S_T) = P(x_T|S_T) P(x_{0:T-1}, S_T).
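The equivalence can be checked numerically: running the unnormalized forward recursion on the four-state automobile model (transition probabilities and likelihood values from the slides) and normalizing the final alphas recovers the same posterior as the predict/update recursion:

```python
# Unnormalized forward recursion alpha_T(S) = P(x_{0:T}, S_T) on the
# four-state automobile model from the slides.
states = ["idle", "accel", "cruise", "decel"]
init = {s: 0.25 for s in states}
trans = {
    "idle":   {"idle": 0.5, "accel": 0.5},
    "accel":  {"accel": 1/3, "cruise": 1/3, "decel": 1/3},
    "cruise": {"cruise": 1/3, "accel": 1/3, "decel": 1/3},
    "decel":  {"decel": 0.25, "idle": 0.25, "cruise": 0.25, "accel": 0.25},
}
liks = [
    {"idle": 0.0, "accel": 0.7,   "cruise": 0.5, "decel": 0.0001},  # x0
    {"idle": 0.0, "accel": 0.001, "cruise": 0.5, "decel": 0.2},     # x1
]

alpha = {s: init[s] * liks[0][s] for s in states}           # alpha_0
for lik in liks[1:]:
    # one forward step: sum over predecessors, then multiply in the likelihood
    alpha = {s2: lik[s2] * sum(alpha[s1] * trans[s1].get(s2, 0.0)
                               for s1 in states)
             for s2 in states}

z = sum(alpha.values())                                      # P(x_{0:T})
posterior = {s: a / z for s, a in alpha.items()}             # P(S_T | x_{0:T})
```

The normalized alphas reproduce the slide's updated distribution at T=1 (cruise ≈ 0.713, decel ≈ 0.285).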

Estimating the state. Run the recursion: PREDICT P(S_T | x_{0:T-1}) = Σ_{S_{T-1}} P(S_{T-1} | x_{0:T-1}) P(S_T|S_{T-1}); UPDATE P(S_T | x_{0:T}) = C · P(S_T | x_{0:T-1}) P(x_T|S_T); then Estimate(S_T) = argmax_{S_T} P(S_T | x_{0:T}). The state is estimated from the updated distribution; it is the updated distribution, not the state estimate, that is propagated forward in time.

Predicting the next observation. After the PREDICT step, the probability distribution for the observation at the next time is a mixture: P(x_T|x_{0:T-1}) = Σ_{S_T} P(x_T|S_T) P(S_T|x_{0:T-1}). The actual observation can then be predicted from P(x_T|x_{0:T-1}).

Predicting the next observation. MAP estimate: argmax_{x_T} P(x_T|x_{0:T-1}). MMSE estimate: E[x_T | x_{0:T-1}].
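If each state's output density were Gaussian with its mean at the peak shown on the slides (45, 70, 65 and 60 dB SPL; an assumption made here purely for illustration), the MMSE prediction of the next observation is just the probability-weighted average of the per-state means:

```python
# Predicted state distribution at T=1 (rounded values from the slides).
pred = {"idle": 2.1e-5, "accel": 0.33, "cruise": 0.33, "decel": 0.33}
# Hypothetical per-state mean SPLs, placed at the peaks of the slide densities.
mean_spl = {"idle": 45.0, "accel": 70.0, "cruise": 65.0, "decel": 60.0}

# MMSE estimate = expectation of x_T under the predictive mixture.
z = sum(pred.values())                       # renormalize the rounded values
mmse = sum((pred[s] / z) * mean_spl[s] for s in pred)
```

The result comes out very close to 65 dB, since the three dominant states contribute (70 + 65 + 60)/3.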

Difference from Viterbi decoding. We estimate only the current state at any time, not the state sequence, although we are considering all past observations. The most likely states at T and T+1 may be such that there is no valid transition between S_T and S_{T+1}.

A known-state model. The HMM assumes a very coarsely quantized state space: idling / accelerating / cruising / decelerating. The actual state can be finer: idling, accelerating at various rates, decelerating at various rates, cruising at various speeds. One solution: many more states (one for each acceleration/deceleration rate and cruising speed)? Another solution: a continuous-valued state.

The real-valued state model. A state equation describes the dynamics of the system: s_t = f(s_{t-1}, e_t), where s_t is the state of the system at time t and e_t is a driving function, which is assumed to be random. The state of the system at any time depends only on the state at the previous time instant and the driving term at the current time. An observation equation relates state to observation: o_t = g(s_t, g_t), where o_t is the observation at time t and g_t is the noise affecting the observation (also random). The observation at any time depends only on the current state of the system and the noise.

Continuous-state system. The state is a continuous-valued parameter that is not directly seen, e.g. the position of the automobile or of the star. The observations depend on the state and are the only way of knowing about the state: sensor readings (for the automobile) or recorded images (for the telescope).

Statistical prediction and estimation. Given an a priori probability distribution P_0(s) for the state (our belief in the state of the system, e.g. of the Navlab vehicle or of the stars, before we observe any data), and a sequence of observations o_0 .. o_t, estimate the state at time t.

Prediction and update at t = 0. Initial probability distribution for the state: P(s_0) = P_0(s_0). Update: we then observe o_0 and must update our belief in the state: P(s_0|o_0) = C · P_0(s_0) P(o_0|s_0).

The observation probability P(o|s). The observation is a (possibly many-to-one) stochastic function of the state s_t and the noise g_t. The noise g_t is random; assume it has the same dimensionality as o_t. Let P_g(g_t) be the probability distribution of g_t, and let {g : g(s_t, g) = o_t} be the set of all noise values that result in o_t.

The observation probability: P(o|s) = Σ_{g : g(s,g) = o} P_g(g) |J|^-1, where J is the Jacobian of the observation function with respect to the noise. For scalar functions of scalar variables, it is simply a derivative: J = ∂g(s, g)/∂g.

Predicting the next state. Given P(s_0|o_0), what is the probability of the state at t=1? The state progression function is s_t = f(s_{t-1}, e_t), where e_t is a driving term with probability distribution P_e(e_t). P(s_t|s_{t-1}) can be computed similarly to P(o|s); P(s_1|s_0) is an instance of this.

And moving on. P(s_1|o_0) is the predicted state distribution for t=1. We then observe o_1 and must update the probability distribution for s_1: P(s_1|o_{0:1}) = C · P(s_1|o_0) P(o_1|s_1). We can continue on in this manner.

Discrete vs. continuous state systems. In the discrete case the state distribution is a probability table over states 1, 2, 3, ...; in the continuous case it is a density P(s) over a real-valued state.
Prediction at time 0: discrete P(s_0) = π(s_0); continuous P(s_0) = P_0(s_0).
Update after o_0: discrete P(s_0 | o_0) = C π(s_0) P(o_0|s_0); continuous P(s_0 | o_0) = C P_0(s_0) P(o_0|s_0).
Prediction at time 1: discrete P(s_1 | o_0) = Σ_{s_0} P(s_0|o_0) P(s_1|s_0); continuous P(s_1 | o_0) = ∫ P(s_0|o_0) P(s_1|s_0) ds_0.
Update after o_1: in both cases P(s_1 | o_{0:1}) = C P(s_1 | o_0) P(o_1|s_1).

Discrete vs. continuous state systems, at a general time t. Prediction: discrete P(s_t | o_{0:t-1}) = Σ_{s_{t-1}} P(s_{t-1}|o_{0:t-1}) P(s_t|s_{t-1}); continuous P(s_t | o_{0:t-1}) = ∫ P(s_{t-1}|o_{0:t-1}) P(s_t|s_{t-1}) ds_{t-1}. Update after o_t (both cases): P(s_t | o_{0:t}) = C P(s_t | o_{0:t-1}) P(o_t|s_t).

Discrete vs. continuous state systems: parameters. In both cases the model is specified by the initial state probability, the transition probability P(s_t|s_{t-1}), and the observation probability P(o|s); these are probability tables in the discrete case and probability densities in the continuous case.

Special case: the linear Gaussian model. A linear state dynamics equation: s_t = A_t s_{t-1} + e_t, where the probability of the state driving term e is Gaussian, e ~ N(μ_e, Q_e); this is sometimes viewed as a deterministic driving term μ_e plus additive zero-mean Gaussian noise. A linear observation equation: o_t = B_t s_t + g_t, where the probability of the observation noise g is Gaussian, g ~ N(μ_g, Q_g). A_t, B_t and the Gaussian parameters are assumed known, and may vary with time.
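A minimal simulation of such a model (scalar case, zero-mean driving and noise terms, all parameter values illustrative) shows what the filter later has to work with: a hidden state trajectory and its noisy observations.

```python
import random

random.seed(1)

# Scalar linear Gaussian model:  s_t = a*s_{t-1} + e_t,  o_t = b*s_t + g_t
# (illustrative values; e and g are zero-mean Gaussians here)
a, b = 0.95, 1.0
q_e, q_g = 0.1, 0.5          # variances of the driving term e and noise g

s = 0.0
states, observations = [], []
for _ in range(100):
    s = a * s + random.gauss(0.0, q_e ** 0.5)    # state dynamics
    o = b * s + random.gauss(0.0, q_g ** 0.5)    # noisy observation of the state
    states.append(s)
    observations.append(o)
```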

The initial state probability. We also assume the initial state distribution to be Gaussian, P(s_0) = N(s_0; μ_0, R_0), often assumed zero-mean.

The observation probability. The probability of the observation, given the state, is simply the probability of the noise with the mean shifted, since the only uncertainty is from the noise: P(o_t|s_t) = N(o_t; B_t s_t + μ_g, Q_g). The new mean is the mean of the noise distribution plus the value the observation would take in the absence of noise.

The updated state probability at T=0: P(s_0| o_0) = C P(s_0) P(o_0| s_0).

Note 1: product of two Gaussians. The product of two Gaussians is, up to a scale factor, a Gaussian.
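This can be checked numerically in the scalar case: N(x; μ1, σ1²) · N(x; μ2, σ2²) = N(μ1; μ2, σ1² + σ2²) · N(x; μp, σp²), with σp² = σ1²σ2²/(σ1² + σ2²) and μp = (μ1σ2² + μ2σ1²)/(σ1² + σ2²). A quick check with illustrative values:

```python
import math

def gauss(x, mu, var):
    # scalar Gaussian density N(x; mu, var)
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

mu1, var1 = 1.0, 4.0
mu2, var2 = 3.0, 1.0

# Parameters of the product Gaussian and its (x-independent) scale factor.
var_p = var1 * var2 / (var1 + var2)
mu_p = (mu1 * var2 + mu2 * var1) / (var1 + var2)
scale = gauss(mu1, mu2, var1 + var2)

xs = [-1.0, 0.0, 1.7, 3.2]
lhs = [gauss(x, mu1, var1) * gauss(x, mu2, var2) for x in xs]
rhs = [scale * gauss(x, mu_p, var_p) for x in xs]
```

The two sides agree to floating-point precision at every x, confirming the product stays Gaussian in shape.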

The updated state probability at T=0: P(s_0| o_0) = C P(s_0) P(o_0| s_0). Since both factors are Gaussian in s_0, the updated distribution is also Gaussian.

The state transition probability. The probability of the state at time t, given the state at time t-1, is simply the probability of the driving term with the mean shifted: P(s_t|s_{t-1}) = N(s_t; A_t s_{t-1} + μ_e, Q_e).

Note 2: the integral of the product of two Gaussians is a Gaussian.

Note 2, continued: if P(x) and P(y|x) are both Gaussian, then P(y) = ∫ P(y|x) P(x) dx, the integral of the product of two Gaussians, is also a Gaussian.

The predicted state probability at t=1: P(s_1|o_0) = ∫ P(s_1|s_0) P(s_0|o_0) ds_0, which remains Gaussian.

The updated state probability at T=1: P(s_1| o_{0:1}) = C P(s_1 |o_0) P(o_1| s_1), again a product of Gaussians and hence Gaussian.

The Kalman Filter! Prediction at T: ŝ_{T|T-1} = A_T ŝ_{T-1} + μ_e and R_{T|T-1} = A_T R_{T-1} A_T^T + Q_e. Update at T: K_T = R_{T|T-1} B_T^T (B_T R_{T|T-1} B_T^T + Q_g)^-1, ŝ_T = ŝ_{T|T-1} + K_T (o_T - B_T ŝ_{T|T-1} - μ_g), and R_T = (I - K_T B_T) R_{T|T-1}.

Linear Gaussian model. The parameters are the a priori distribution P(s), the transition probability P(s_t|s_{t-1}), and the state output probability P(o_t|s_t), all Gaussian. The recursion: P(s_0) = P(s); P(s_0| o_0) = C P(s_0) P(o_0| s_0); P(s_1| o_{0:1}) = C P(s_1| o_0) P(o_1| s_1); P(s_2| o_{0:2}) = C P(s_2| o_{0:1}) P(o_2| s_2); and so on. All distributions remain Gaussian.

The Kalman filter. The actual state estimate is the mean of the updated distribution: ŝ_{t|t-1} is the predicted state at time t, and ŝ_t is the updated estimate of the state at time t.

Stable estimation. The update equation above fails if there is no observation noise, i.e. if Q_g = 0. Paradoxical? This happens because we do not use the relationship between o and s effectively. An alternate derivation is required: the conventional Kalman filter formulation.

Conditional probability of y|x. If P(x,y) is Gaussian, the conditional probability of y given x is also Gaussian: the slice through the joint density in the figure is Gaussian. The mean of this Gaussian is a function of x, and the variance of y reduces if x is known: uncertainty is reduced.

A matrix inverse identity (the matrix inversion lemma): (A + UCV)^-1 = A^-1 - A^-1 U (C^-1 + V A^-1 U)^-1 V A^-1. Work it out.

For any jointly Gaussian RV: using the matrix inversion identity, the conditional of Y given X is a Gaussian, with
E[Y|X=x] = μ_Y + Σ_YX Σ_XX^-1 (x - μ_X)
Cov(Y|X=x) = Σ_YY - Σ_YX Σ_XX^-1 Σ_XY

Conditional probability of y|x (recap). If P(x,y) is Gaussian, the conditional probability of y given x is also Gaussian; its mean is a function of x, and the variance of y is reduced when x is known.
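The bivariate case of this fact is easy to verify numerically: the slice P(x,y)/P(x) equals a Gaussian in y with mean μ_y + (σ_xy/σ_xx)(x − μ_x) and reduced variance σ_yy − σ_xy²/σ_xx. A sketch with illustrative covariance values:

```python
import math

def gauss1d(v, mu, var):
    return math.exp(-(v - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def gauss2d(x, y, mx, my, sxx, sxy, syy):
    # bivariate Gaussian density, mean (mx, my), covariance [[sxx, sxy], [sxy, syy]]
    det = sxx * syy - sxy * sxy
    dx, dy = x - mx, y - my
    q = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-q / 2) / (2 * math.pi * math.sqrt(det))

mx, my = 1.0, -2.0
sxx, sxy, syy = 2.0, 0.8, 1.5            # illustrative covariance entries

cond_var = syy - sxy * sxy / sxx          # reduced: knowing x removes uncertainty

errs = []
for x in (-1.0, 0.5, 2.0):
    cond_mean = my + (sxy / sxx) * (x - mx)   # linear in x
    for y in (-3.0, -2.0, 0.0):
        # P(y|x) = P(x,y) / P(x) should equal the conditional Gaussian
        ratio = gauss2d(x, y, mx, my, sxx, sxy, syy) / gauss1d(x, mx, sxx)
        errs.append(abs(ratio - gauss1d(y, cond_mean, cond_var)))
```

The errors are at floating-point level, and cond_var is strictly smaller than the marginal variance of y.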

Estimating P(s|o). Consider the joint distribution of o and s (dropping the subscript t and the conditioning on o_{0:t-1} for brevity, and assuming g is zero-mean). Since o is a linear function of s plus Gaussian noise, o is also Gaussian.

The joint PDF of o and s. o is Gaussian, and its cross-covariance with s is Cov(s, o) = E[(s - E[s])(o - E[o])^T] = R B^T, where R is the covariance of the current distribution of s.

The probability distribution of o. Since o = B s + g with Gaussian s and g, o is Gaussian with mean B E[s] + μ_g and covariance B R B^T + Q_g. Writing it out in extended form, s and o are jointly Gaussian with mean [E[s]; B E[s] + μ_g] and covariance [[R, R B^T]; [B R, B R B^T + Q_g]].

Recall: for any jointly Gaussian RV, the conditional is Gaussian. Applying this to our problem (replace Y by s, X by o): E[s|o] = E[s] + R B^T (B R B^T + Q_g)^-1 (o - B E[s] - μ_g), and Cov(s|o) = R - R B^T (B R B^T + Q_g)^-1 B R.

Stable estimation. Note that we are not computing Q_g^-1 in this formulation, so it remains well-defined even when Q_g = 0.

The Kalman filter. As before, the actual state estimate is the mean of the updated distribution: ŝ_{t|t-1} is the predicted state at time t, and ŝ_t is the updated estimate of the state at time t.

The Kalman filter, summarized. Prediction: ŝ_{t|t-1} = A_t ŝ_{t-1} + μ_e and R_{t|t-1} = A_t R_{t-1} A_t^T + Q_e. Update: K_t = R_{t|t-1} B_t^T (B_t R_{t|t-1} B_t^T + Q_g)^-1, ŝ_t = ŝ_{t|t-1} + K_t (o_t - B_t ŝ_{t|t-1} - μ_g), and R_t = (I - K_t B_t) R_{t|t-1}.
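A scalar Kalman filter illustrating the prediction/update cycle (a sketch only: zero-mean e and g, i.e. μ_e = μ_g = 0, and illustrative variance values, not the full multivariate form):

```python
# Scalar model:  s_t = a*s_{t-1} + e_t,  o_t = b*s_t + g_t
a, b = 1.0, 1.0
q_e, q_g = 0.01, 1.0          # variances of e and g (illustrative)

def kalman_step(mean, var, obs):
    # PREDICT: push the state estimate through the dynamics
    pred_mean = a * mean
    pred_var = a * a * var + q_e
    # UPDATE: the Kalman gain weighs the prediction against the observation
    gain = pred_var * b / (b * b * pred_var + q_g)
    new_mean = pred_mean + gain * (obs - b * pred_mean)
    new_var = (1 - gain * b) * pred_var
    return new_mean, new_var

mean, var = 0.0, 10.0          # broad Gaussian initial state belief
for obs in [1.2, 0.9, 1.1, 1.0, 1.05]:
    mean, var = kalman_step(mean, var, obs)
```

After a few observations near 1.0, the posterior mean settles near 1 and the variance shrinks well below its initial value: each update reduces uncertainty.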

The Kalman Filter. Very popular for tracking the state of processes: control systems, robotic tracking, simultaneous localization and mapping, radars, even the stock market. But what are the parameters of the process?

Kalman filter, contd. The model parameters A and B must be known. Often the state equation includes an additional driving term: s_t = A_t s_{t-1} + G_t u_t + e_t. The parameters of the driving term must be known, and the initial state distribution must be known.

Defining the parameters. The state must be carefully defined. E.g. for a robotic vehicle, the state is an extended vector that includes the current velocity and acceleration: S = [x, dx/dt, d²x/dt²]. The state equation must incorporate the appropriate constraints: if the state includes acceleration and velocity, then velocity at the next time = current velocity + acceleration × time step. S_t = A S_{t-1} + e, with A = [1 Δt 0.5Δt²; 0 1 Δt; 0 0 1].
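The constant-acceleration transition matrix can be checked directly: one prediction step should advance the position by v·Δt + 0.5·a·Δt² and the velocity by a·Δt (driving term omitted for the check; the numeric values are illustrative):

```python
# Constant-acceleration state vector S = [position, velocity, acceleration]
dt = 0.1
A = [[1.0, dt, 0.5 * dt * dt],
     [0.0, 1.0, dt],
     [0.0, 0.0, 1.0]]

def step(A, s):
    # one deterministic prediction step: S_t = A S_{t-1}
    return [sum(A[i][j] * s[j] for j in range(3)) for i in range(3)]

s = [0.0, 2.0, 4.0]   # start at x = 0 with velocity 2 and acceleration 4
s = step(A, s)
# position: 0 + 2*0.1 + 0.5*4*0.01 = 0.22; velocity: 2 + 4*0.1 = 2.4
```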

Parameters: the observation equation. It is critical to have an accurate observation equation that provides a valid relationship between state and observations. Observations are typically high-dimensional, and may have higher or lower dimensionality than the state.

Problems. f() and/or g() may not be nice linear functions, in which case the conventional Kalman update rules are no longer valid. e and/or g may not be Gaussian, in which case the Gaussian-based update rules are no longer valid.

Solutions. If f() and/or g() are not nice linear functions, so that the conventional Kalman update rules are no longer valid: the Extended Kalman Filter. If e and/or g are not Gaussian, so that the Gaussian-based update rules are no longer valid: particle filters.
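A minimal bootstrap particle filter sketch for a hypothetical nonlinear model (s_t = 0.9 s_{t-1} + e_t, o_t = s_t² + g_t; the model, observations and all parameter values here are purely illustrative): propagate particles through the dynamics, weight them by the observation likelihood, then resample.

```python
import math
import random

random.seed(0)
N = 500

def likelihood(obs, s):
    # unnormalized Gaussian likelihood of the observation given a particle
    # (observation model o = s**2 + g, with g of unit variance)
    return math.exp(-0.5 * (obs - s * s) ** 2)

# initial particle cloud drawn from a Gaussian prior belief
particles = [random.gauss(0.0, 1.0) for _ in range(N)]
for obs in [1.1, 0.9, 1.0]:
    # PREDICT: propagate each particle through the nonlinear dynamics
    particles = [0.9 * p + random.gauss(0.0, 0.3) for p in particles]
    # UPDATE: weight by the observation likelihood, then normalize
    weights = [likelihood(obs, p) for p in particles]
    z = sum(weights)
    weights = [w / z for w in weights]
    # RESAMPLE: draw a new cloud with replacement according to the weights
    particles = random.choices(particles, weights=weights, k=N)

# point estimate: posterior mean approximated by the particle average
estimate = sum(particles) / N
```

Because the particle cloud represents an arbitrary distribution, nothing here requires linear f(), linear g(), or Gaussian noise: the filter works directly with samples.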