Particle Filtering
Sensors and Uncertainty
- Real-world sensors are noisy and suffer from missing data (e.g., occlusions, GPS blackouts)
- Use sensor models to estimate ground truth, infer unobserved variables, and make forecasts
Hidden Markov Model
- Use observations to get a better idea of where the robot is at time $t$
- [Diagram: hidden state variables $X_0, X_1, X_2, X_3$; observed variables $z_1, z_2, z_3$]
- Predict – observe – predict – observe …
Last Class
- Kalman filtering and its extensions: exact Bayesian inference for Gaussian state distributions, process noise, and observation noise
- What about more general distributions?
- Key representational issue: how to represent and perform calculations on probability distributions?
Agenda
- Bayesian filtering, in more detail
- Particle filtering: a Monte Carlo approach to Bayesian filtering with complex distributions
- Principles, Ch. 9
Aside… why is this hard?
Bayesian Prediction on a Markov Chain
- The probability distribution of $X_1$ depends probabilistically on the value of $X_0$, that of $X_2$ on the value of $X_1$, and so on
- [Diagram: $X_0 \rightarrow X_1 \rightarrow X_2 \rightarrow X_3$]
- $P(X_t \mid X_{t-1})$ is known as the transition model
Bayesian Prediction on MC
- Prediction / forecasting: what is the probability distribution over a future state $X_t$?
- Need to marginalize over the possible values of the prior state:
  $$P(X_t = x) = \int_{x_{t-1}} \underbrace{P(X_t = x \mid x_{t-1})}_{\text{transition model}} \; \underbrace{P(x_{t-1})}_{\text{distribution over previous state}} \, dx_{t-1}$$
- Recursive inference: maintain a belief state $Bel_t(X) = P(X_t)$ and use the equation above to advance to $Bel_{t+1}(X)$
Belief State Evolution
- $P(X_t) = \sum_{x_{t-1}} P(X_t \mid x_{t-1}) \, P(x_{t-1})$
- The belief "blurs" over time and, if the domain is bounded, typically approaches a stationary distribution as $t$ grows, so prediction power is limited
- The rate of blurring is known as the "mixing time" (see the sketch below)
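A minimal sketch of this recursion on a discrete chain, where the marginalization becomes a matrix-vector product; the 3-state transition matrix is a made-up illustration, not from the lecture:

```python
import numpy as np

# T[i, j] = P(X_t = j | X_t-1 = i): a hypothetical 3-state transition model
T = np.array([[0.8, 0.15, 0.05],
              [0.2, 0.6,  0.2 ],
              [0.1, 0.3,  0.6 ]])

bel = np.array([1.0, 0.0, 0.0])   # initially certain of state 0

# Repeated prediction without observations: the belief "blurs" and
# approaches the chain's stationary distribution (limited prediction power).
for t in range(50):
    bel = bel @ T                 # P(X_t) = sum over x_t-1 of P(X_t | x_t-1) P(x_t-1)
print(bel)                        # close to the stationary distribution of T
```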
History Dependence
- In Markov models, the state must be chosen so that the future distribution is determined entirely by the current state (history independence)
- Often this requires adding variables that cannot be directly observed
- [Examples of ambiguity that context resolves: a misheard phrase; are these people walking toward you or away from you? What comes next?]
Partial Observability
- Hidden Markov Model (HMM)
- [Diagram: hidden state variables $X_0, X_1, X_2, X_3$; observed variables $z_1, z_2, z_3$]
- $P(Z_t \mid X_t)$ is called the observation model (or sensor model)
Bayesian Filtering
- Name comes from signal processing
- Maintain a belief over time: $Bel_t(x) = P(X_t = x \mid z_1, \dots, z_t)$
- [Diagram: hidden states $X_0 \rightarrow X_1 \rightarrow X_2 \rightarrow X_3$ (query variable), with observed variables $z_1, z_2, z_3$]
Bayesian Filtering
- Condition on the observation history and marginalize over the previous state:
  $$P(X_t = x \mid z_1, \dots, z_t) = \int_{x_{t-1}} P(X_t = x \mid x_{t-1}, z_t) \, P(x_{t-1} \mid z_1, \dots, z_{t-1}) \, dx_{t-1} = \int_{x_{t-1}} P(X_t = x \mid x_{t-1}, z_t) \, Bel_{t-1}(x_{t-1}) \, dx_{t-1}$$
- By Bayes' rule:
  $$P(X_t = x \mid x_{t-1}, z_t) = P(z_t \mid x_{t-1}, X_t = x) \, P(X_t = x \mid x_{t-1}) \, / \, P(z_t \mid x_{t-1})$$
- By conditional independence of the observation given the current state:
  $$P(z_t \mid x_{t-1}, X_t = x) = P(z_t \mid X_t = x)$$
- The normalizer marginalizes over the current state:
  $$P(z_t \mid x_{t-1}) = \int_{x_t} P(z_t \mid x_t) \, P(x_t \mid x_{t-1}) \, dx_t$$
Bayesian Filtering Recap
- Putting the pieces together:
  $$P(X_t = x \mid z_1, \dots, z_t) = \int_{x_{t-1}} \frac{P(z_t \mid X_t = x)}{P(z_t \mid x_{t-1})} \, P(X_t = x \mid x_{t-1}) \, Bel_{t-1}(x_{t-1}) \, dx_{t-1}, \qquad P(z_t \mid x_{t-1}) = \int_{x_t} P(z_t \mid x_t) \, P(x_t \mid x_{t-1}) \, dx_t$$
- Two-step interpretation:
  - Predict: compute $Bel'_t(x) = P(X_t = x \mid z_1, \dots, z_{t-1})$, i.e., without $z_t$
  - Update: use the information from $z_t$ to derive $Bel_t(x)$ (see the sketch below)
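To make the two-step recursion concrete, here is a minimal sketch for a discrete HMM, where the integrals become sums; the transition matrix `T`, observation matrix `O`, and observation sequence are hypothetical placeholders:

```python
import numpy as np

T = np.array([[0.9, 0.1],      # T[i, j] = P(X_t = j | X_t-1 = i)
              [0.2, 0.8]])
O = np.array([[0.75, 0.25],    # O[i, k] = P(z = k | X = i)
              [0.30, 0.70]])

def filter_step(bel, z):
    bel_pred = bel @ T                 # predict: marginalize over x_t-1
    bel_new = O[:, z] * bel_pred       # update: weight by P(z_t | X_t = x)
    return bel_new / bel_new.sum()     # normalize (absorbs the 1/P(z_t | ...) factor)

bel = np.array([0.5, 0.5])             # uniform prior Bel_0
for z in [0, 0, 1]:                    # a made-up observation sequence
    bel = filter_step(bel, z)
print(bel)                             # Bel_3(x) = P(X_3 = x | z_1, z_2, z_3)
```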
Particle Filtering (aka Sequential Monte Carlo)
- Represent distributions as a set of particles
- Applicable to non-Gaussian, high-dimensional distributions
- Convenient implementations
- Widely used in vision and robotics
Simultaneous Localization and Mapping (SLAM)
- Mobile robots
- Odometry: locally accurate, but drifts significantly over time
- Vision/lidar/sonar: locally inaccurate, but provides a global reference frame
- Combine the two
- State: (robot pose, map); observations: (sensor input)
General Problem
- $x_t \sim Bel(x_t)$ (an arbitrary p.d.f.)
- $x_{t+1} = f(x_t, u, \varepsilon_p)$, where $\varepsilon_p \sim$ an arbitrary p.d.f. (process noise)
- $z_{t+1} = g(x_{t+1}, \varepsilon_o)$, where $\varepsilon_o \sim$ an arbitrary p.d.f. (observation noise)
Particle Representation
- $Bel(x_t) = \{(w_k, x_k),\ k = 1, \dots, n\}$
- $w_k$ are weights, $x_k$ are state hypotheses
- Weights sum to 1
- Approximates the underlying distribution
Monte Carlo Integration
- More formally, if $P(x) \approx Bel(x) = \{(w_k, x_k),\ k = 1, \dots, N\}$, then
  $$E_P[\phi(x)] = \int_x \phi(x) \, P(x) \, dx \approx \sum_{k=1}^{N} w_k \, \phi(x_k)$$
  for any test function $\phi(x)$
- What might you want to compute? (see the sketch below)
  - Mean: use $\phi(x) = x$
  - Variance: use $\phi(x) = x^2$ (recover $Var(x) = E[x^2] - E[x]^2$)
  - $P(y)$: use $\phi(x) = P(y \mid x)$, because $P(y) = \int P(y \mid x) \, P(x) \, dx$
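A sketch of these estimators on a weighted particle set; the particles here are drawn from a standard normal purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10000
xs = rng.standard_normal(N)          # state hypotheses x_k
ws = np.full(N, 1.0 / N)             # uniform weights summing to 1

mean = np.sum(ws * xs)               # phi(x) = x
second_moment = np.sum(ws * xs**2)   # phi(x) = x^2
var = second_moment - mean**2        # Var(x) = E[x^2] - E[x]^2
print(mean, var)                     # approx. 0 and 1 for a standard normal
```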
Recovering the Distribution
- Kernel density estimation: $P(x) \approx \sum_k w_k \, K(x, x_k)$, where $K(x, x_k)$ is the kernel function
- The approximation improves as the number of particles grows and the kernel sharpens
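A sketch of kernel density estimation over weighted particles with a Gaussian kernel; the bandwidth `h` is an assumed tuning parameter, not specified above:

```python
import numpy as np

def kde(x, particles, weights, h=0.2):
    # P(x) ~= sum_k w_k K(x, x_k), with a Gaussian kernel of width h
    k = np.exp(-0.5 * ((x - particles) / h) ** 2) / (h * np.sqrt(2 * np.pi))
    return np.sum(weights * k)

rng = np.random.default_rng(0)
particles = rng.standard_normal(1000)
weights = np.full(1000, 1.0 / 1000)
print(kde(0.0, particles, weights))   # approx. the standard normal density at 0 (~0.4)
```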
Performing a Transformation
- Let $P(x) \approx Bel(x) = \{(w_k, x_k),\ k = 1, \dots, N\}$, and let $y = f(x)$ be a general nonlinear transformation
- Want to recover the distribution $Q(y) = P(f(X))$
- Hypothesis: $Bel(y) = \{(w_k, f(x_k)),\ k = 1, \dots, N\}$ approximates $Q(y)$
Particle Propagation
Performing a Transformation
- Hypothesis: $Bel(y) = \{(w_k, f(x_k)),\ k = 1, \dots, N\}$ approximates $Q(y)$
- Let $\phi$ be a test function:
  $$E_Q[\phi(y)] = \int_y \phi(y) \, Q(y) \, dy = \int_y \phi(y) \int_x I[f(x) = y] \, P(x) \, dx \, dy$$
- Now consider $I[f(x) = y]$ as a test function $\psi_y(x)$:
  $$\int_x I[f(x) = y] \, P(x) \, dx \approx \sum_{k=1}^{N} w_k \, \psi_y(x_k) = \sum_{k=1}^{N} w_k \, I[f(x_k) = y]$$
- Therefore:
  $$E_Q[\phi(y)] \approx \int_y \phi(y) \sum_{k=1}^{N} w_k \, I[f(x_k) = y] \, dy = \sum_{k=1}^{N} w_k \int_y \phi(y) \, I[f(x_k) = y] \, dy = \sum_{k=1}^{N} w_k \, \phi(y_k)$$
  where $y_k = f(x_k)$
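The derivation licenses a very simple operation in practice: transform each particle and keep its weight. A sketch with an arbitrary, made-up nonlinearity `f`:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100000
xs = rng.standard_normal(N)            # particles for P(x)
ws = np.full(N, 1.0 / N)

f = lambda x: np.sin(x) + 0.1 * x**2   # a general nonlinear transformation
ys = f(xs)                             # {(w_k, f(x_k))} approximates Q(y)

print(np.sum(ws * ys))                 # E_Q[y], with no linearization needed
```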
Filtering Steps
- Predict: compute $Bel'(x_{t+1})$, the distribution of $x_{t+1}$ using the dynamics model alone
- Update: compute a representation of $P(x_{t+1} \mid z_{t+1})$ via likelihood weighting for each particle in $Bel'(x_{t+1})$
- Resample to produce $Bel(x_{t+1})$ for the next step
Predict Step
- Given input particles $Bel(x_t)$
- The distribution of $x_{t+1} = f(x_t, u_t, \varepsilon)$ is determined by sampling $\varepsilon$ from its distribution and then propagating individual particles (sketch below)
- Gives $Bel'(x_{t+1})$
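A sketch of the predict step, assuming additive Gaussian process noise and toy 1D dynamics; `f`, `u`, and the noise scale are illustrative placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(particles, u, f, noise_std=0.1):
    # Sample epsilon per particle, then propagate through the dynamics model.
    eps = rng.normal(0.0, noise_std, size=particles.shape)
    return f(particles, u, eps)

f = lambda x, u, e: x + u + e                # toy dynamics: x_t+1 = x_t + u + eps
particles = rng.standard_normal(500)         # Bel(x_t)
particles = predict(particles, u=0.5, f=f)   # Bel'(x_t+1)
```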
Update Step
- Goal: compute a representation of $P(x_{t+1} \mid z_{t+1})$ given $Bel'(x_{t+1})$ and $z_{t+1}$
- $P(x_{t+1} \mid z_{t+1}) = \alpha \, P(z_{t+1} \mid x_{t+1}) \, P(x_{t+1})$, with $P(x_{t+1}) = Bel'(x_{t+1})$ (given)
- Each state hypothesis $x_k \in Bel'(x_{t+1})$ is reweighted by $P(z_{t+1} \mid x_{t+1})$
- Likelihood weighting: $w_k \leftarrow w_k \, P(z_{t+1} \mid x_{t+1} = x_k)$, then renormalize so the weights sum to 1
Update Step
- $w_k \leftarrow w_k' \cdot P(z_{t+1} \mid x_{t+1} = x_k)$
- 1D Gaussian example: $g(x, \varepsilon_o) = h(x) + \varepsilon_o$ with $\varepsilon_o \sim N(0, \sigma^2)$, so
  $$P(z_{t+1} \mid x_{t+1} = x_k) = C \exp\!\left(-\frac{(h(x_k) - z_{t+1})^2}{2\sigma^2}\right)$$
- In general, the distribution can be calibrated using experimental data (see the sketch below)
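A sketch of the update step for this Gaussian sensor model; `h`, `sigma`, and the observation are placeholders, and the constant $C$ is omitted because it cancels when the weights are renormalized:

```python
import numpy as np

def update(particles, weights, z, h, sigma):
    # w_k <- w_k * P(z | x_k); C cancels under normalization, so drop it.
    lik = np.exp(-0.5 * ((h(particles) - z) / sigma) ** 2)
    weights = weights * lik
    return weights / weights.sum()           # renormalize to sum to 1

rng = np.random.default_rng(0)
particles = rng.standard_normal(500)         # Bel'(x_t+1)
weights = np.full(500, 1.0 / 500)
weights = update(particles, weights, z=0.3, h=lambda x: x, sigma=0.5)
```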
Resampling
- Likelihood-weighted particles may no longer represent the distribution efficiently
- Importance resampling: sample new particles proportionally to weight (sketch below)
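A sketch of importance resampling via multinomial sampling; after resampling, all weights are reset to uniform:

```python
import numpy as np

def resample(particles, weights, rng):
    # Draw N indices with replacement, proportionally to weight.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

rng = np.random.default_rng(0)
particles = rng.standard_normal(500)
weights = rng.random(500)
weights /= weights.sum()
particles, weights = resample(particles, weights, rng)
```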
Sampling Importance Resampling (SIR) Variant
- Loop: predict → update → resample (end-to-end sketch below)
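An end-to-end sketch of the SIR loop on a toy 1D tracking problem, combining the three steps above; the dynamics, sensor model, and observation sequence are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N, sigma_p, sigma_o = 1000, 0.2, 0.5         # particle count, process/observation noise

particles = rng.standard_normal(N)           # initial Bel(x_0)
weights = np.full(N, 1.0 / N)

for z in [0.1, 0.4, 0.9, 1.3]:               # made-up observations
    # Predict: toy dynamics x_t+1 = x_t + 0.3 + eps_p
    particles = particles + 0.3 + rng.normal(0.0, sigma_p, N)
    # Update: Gaussian likelihood of z under each hypothesis
    weights *= np.exp(-0.5 * ((particles - z) / sigma_o) ** 2)
    weights /= weights.sum()
    # Resample proportionally to weight, then reset weights to uniform
    idx = rng.choice(N, size=N, p=weights)
    particles, weights = particles[idx], np.full(N, 1.0 / N)

print(particles.mean())                      # posterior mean estimate
```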
Particle Filtering Issues
- Variance: the standard deviation of a quantity (e.g., the mean) computed as a function of the particle representation scales as $\sim 1/\sqrt{N}$
- Loss of particle diversity: resampling will likely drop particles with low likelihood, which may turn out to be useful hypotheses in the future
Other Resampling Variants
- Selective resampling: keep weights, and only resample when the number of "effective particles" drops below a threshold (see the sketch below)
- Stratified resampling: reduce variance using quasi-random sampling
- Optimization: explicitly choose particles to minimize deviance from the posterior
- …
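A sketch of the selective-resampling heuristic, using the common effective-sample-size estimate $N_{\text{eff}} = 1 / \sum_k w_k^2$; the $N/2$ threshold is a typical but assumed choice:

```python
import numpy as np

def n_eff(weights):
    # Effective number of particles: equals N for uniform weights, and
    # approaches 1 when a single particle carries all the weight.
    return 1.0 / np.sum(weights ** 2)

def maybe_resample(particles, weights, rng, threshold):
    if n_eff(weights) < threshold:
        idx = rng.choice(len(particles), size=len(particles), p=weights)
        return particles[idx], np.full(len(particles), 1.0 / len(particles))
    return particles, weights   # keep weights; preserves particle diversity

rng = np.random.default_rng(0)
particles = rng.standard_normal(100)
weights = rng.random(100)
weights /= weights.sum()
particles, weights = maybe_resample(particles, weights, rng, threshold=50)
```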
Storing More Information with the Same Number of Particles
- Unscented particle filter: each particle represents a local Gaussian and maintains a local covariance matrix; a combination of particle filter + Kalman filter
- Rao-Blackwellized particle filter: for a state $(x_1, x_2)$, each particle contains a hypothesis for $x_1$ and an analytical distribution over $x_2$; reduces variance
Recap
- Bayesian mechanisms for state estimation are well understood; the challenge is representation
- Methods:
  - Kalman filters: highly efficient closed-form solution for Gaussian distributions
  - Particle filters: approximate filtering for high-dimensional, non-Gaussian distributions
- Implementation challenges differ across domains (localization, mapping, SLAM, tracking)