
1 Regularization of energy-based representations
Minimize total energy λE_p(u) + (1−λ)E_d(u,d)
E_p(u): Stabilizing function - a smoothness constraint
–Membrane stabilizer: E_p(u) = 0.5 Σ_{i,j} [(u_{i,j+1} − u_{i,j})² + (u_{i+1,j} − u_{i,j})²]
–Thin plate stabilizer: E_p(u) = 0.5 Σ_{i,j} [(u_{i,j+1} + u_{i,j−1} − 2u_{i,j})² + (u_{i+1,j} + u_{i−1,j} − 2u_{i,j})² + 2(u_{i+1,j+1} + u_{i,j} − u_{i+1,j} − u_{i,j+1})²]
–Linear combinations of the two
E_d(u,d): Energy function, measures the compatibility between the estimate u and the observed data d
–E_d(u,d) = 0.5 Σ_{i,j} c_{i,j} (d_{i,j} − u_{i,j})²
–c_{i,j} is the inverse of the variance of measurement d_{i,j}

2 Stabilizing function – membrane stabilizer
E_p(u) = 0.5 Σ_{i,j} [(u_{i,j+1} − u_{i,j})² + (u_{i+1,j} − u_{i,j})²]
[Figure: grid of nodes u_{i,j}, with axes i and j]

3–10 Stabilizing function – membrane stabilizer
[Animation frames: the four-term "ATOM" centred on u_{i,j} is built up one forward difference at a time, accumulating the terms u_{i,j+1} − u_{i,j} and u_{i+1,j} − u_{i,j} from the node and its neighbours]

11 Stabilizing function – membrane stabilizer
In matrix form: E_p(u) = 0.5 u^T A_p u, where u is the grid flattened into a vector
Rows of A_p have the form 0 0 0 −1 0 0 … 0 −1 4 −1 0 … 0 0 −1 0 … (the 5-point Laplacian stencil)
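The quadratic form above can be checked numerically. The sketch below (illustrative Python, not from the original deck) assembles a dense A_p for a small grid from its forward-difference edges and verifies that 0.5 uᵀA_p u reproduces the sum-of-squared-differences energy, with interior rows carrying the 5-point stencil:

```python
import numpy as np

def membrane_energy(u):
    """E_p(u) = 0.5 * sum over the grid of squared forward differences."""
    dj = u[:, 1:] - u[:, :-1]   # u_{i,j+1} - u_{i,j}
    di = u[1:, :] - u[:-1, :]   # u_{i+1,j} - u_{i,j}
    return 0.5 * (np.sum(dj ** 2) + np.sum(di ** 2))

def membrane_matrix(n, m):
    """Dense A_p with E_p(u) = 0.5 u^T A_p u for u flattened row-major.
    Each grid edge contributes a 2x2 block; interior rows end up as the
    5-point Laplacian stencil ... -1 ... -1 4 -1 ... -1 ..."""
    N = n * m
    A = np.zeros((N, N))
    for i in range(n):
        for j in range(m):
            for di_, dj_ in ((0, 1), (1, 0)):
                ii, jj = i + di_, j + dj_
                if ii < n and jj < m:
                    a, b = i * m + j, ii * m + jj
                    A[a, a] += 1; A[b, b] += 1
                    A[a, b] -= 1; A[b, a] -= 1
    return A

rng = np.random.default_rng(0)
u = rng.standard_normal((4, 5))
A_p = membrane_matrix(4, 5)
# the quadratic form and the explicit sum agree
assert np.isclose(membrane_energy(u), 0.5 * u.ravel() @ A_p @ u.ravel())
# the row of an interior node carries the stencil: diagonal 4, neighbours -1
center = 1 * 5 + 2
assert A_p[center, center] == 4
```

A sparse matrix would be used at realistic grid sizes; the dense form here only serves to make the stencil visible.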

12 Stabilizing function – thin plate stabilizer
E_p(u) = 0.5 Σ_{i,j} {[(u_{i,j+1} − u_{i,j}) + (u_{i,j−1} − u_{i,j})]² + [(u_{i+1,j} − u_{i,j}) + (u_{i−1,j} − u_{i,j})]² + 2[(u_{i+1,j+1} − u_{i,j}) + (u_{i,j} − u_{i+1,j}) + (u_{i,j} − u_{i,j+1})]²}
[Figure: grid of nodes u_{i,j}, with axes i and j]

13–15 Stabilizing function – thin plate stabilizer
[Animation frames: the thin plate "ATOM" centred on u_{i,j} is built up one second-difference term at a time]

16 Stabilizing function – thin plate stabilizer
ATOM: E_p(u) = 0.5 u^T A_p u
Rows of A_p carry the 13-point biharmonic stencil:
      1
   2 −8  2
1 −8 20 −8  1
   2 −8  2
      1

17 Stabilizing function – Examples (1-D)
[Figures: data points; membrane, thin plate, and thin plate + membrane reconstructions]

18 Stabilizing function – Examples (2-D)
[Figures: samples from u; membrane, thin plate, and membrane + thin plate surfaces]

19 Stabilizing function – Examples (2-D)
[Figures: samples from u; membrane, thin plate, and membrane + thin plate surfaces]

20 Energy function
Data on grid
–d_{i,j} = u_{i,j} + e_{i,j} (e_{i,j} is N(0, σ²))
–E_d(u,d) = 0.5 Σ_{i,j} c_{i,j} (d_{i,j} − u_{i,j})² (c_{i,j} = σ⁻²)
Data off grid
–d_k = h_{0,0} u_{i,j} + h_{0,1} u_{i,j+1} + h_{1,0} u_{i+1,j} + h_{1,1} u_{i+1,j+1} + e_k
–E_d(u,d) = 0.5 Σ_k c_k (d_k − H_k u)²
In all examples here we assume data on grid
–E_d(u,d) = 0.5 (u−d)^T A_d (u−d)
–A_d = σ⁻² I: measurement variance assumed constant for all data

21 Overall energy
E(u) = λE_p(u) + (1−λ)E_d(u,d) (λ is the regularization factor)
= 0.5{λ u^T A_p u + (1−λ)(u−d)^T A_d (u−d)}
= 0.5 u^T A u − u^T b + const
where A = λA_p + (1−λ)A_d and b = (1−λ) A_d d
The solution for u can be obtained directly by minimizing E(u): u = A⁻¹ b
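A minimal 1-D sketch of this direct solution (illustrative Python, not from the deck; the membrane prior and unit noise variance are assumed choices): build A = λA_p + (1−λ)A_d and b = (1−λ)A_d d, then solve the linear system.

```python
import numpy as np

def solve_regularized(d, lam, sigma2=1.0):
    """Minimize lam*E_p(u) + (1-lam)*E_d(u,d) with a 1-D membrane prior:
    A = lam*A_p + (1-lam)*A_d,  b = (1-lam)*A_d d,  u = A^{-1} b."""
    n = len(d)
    D = np.eye(n - 1, n, k=1) - np.eye(n - 1, n)   # forward-difference operator
    A_p = D.T @ D                                  # 1-D membrane stabilizer matrix
    A_d = np.eye(n) / sigma2
    A = lam * A_p + (1 - lam) * A_d
    b = (1 - lam) * A_d @ d
    return np.linalg.solve(A, b)

rng = np.random.default_rng(1)
d = np.sin(np.linspace(0, np.pi, 50)) + 0.3 * rng.standard_normal(50)
u = solve_regularized(d, lam=0.5)
# with lam = 0 the prior vanishes and the solution is the data itself
assert np.allclose(solve_regularized(d, 0.0), d)
# the regularized solution is never rougher (in E_p) than the raw data
assert np.sum(np.diff(u) ** 2) <= np.sum(np.diff(d) ** 2)
```

The second assertion follows from optimality: at the minimizer, λE_p(u) + (1−λ)E_d(u,d) ≤ λE_p(d), so E_p(u) ≤ E_p(d).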

22 Minimizing overall energy 1-D (λ = 0.5)
[Figures: membrane and thin plate solutions, estimated from noisy observations and with no observation noise]

23 Minimizing overall energy 2-D (λ = 0.5)
[Figures: original and noisy surfaces]
Added zero-mean, unit-variance Gaussian noise to all elements

24 Minimizing overall energy 2-D (λ = 0.5)
[Figures: original surface; reconstructions from noisy data using the membrane and thin plate priors]

25 Minimizing overall energy 2-D (λ = 0.5)
[Figures: original and noisy surfaces]
Added zero-mean, unit-variance Gaussian noise to all elements

26 Minimizing overall energy 2-D (λ = 0.5)
[Figures: original surface; reconstructions from noisy data using the membrane and thin plate priors]

27 Minimizing energy by relaxation
Direct computation of A⁻¹ is inefficient
–Large matrices: for a 256x256 grid, A has size 65536 x 65536
–Sparseness of A is not utilized: only a small fraction of elements have non-zero values
Relaxation replaces inversion of A with many local estimates
u_i^+ = a_ii⁻¹ (b_i − Σ_{j≠i} a_ij u_j)
–Updates can be done in parallel
–All local computations are very simple
–Can be slow to converge
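The local update above can be sketched as follows (illustrative Python; a small dense SPD system stands in for the sparse grid matrices, and updates are applied in place, Gauss-Seidel style):

```python
import numpy as np

def relax(A, b, iters=200):
    """Relaxation: repeatedly set u_i <- a_ii^{-1} (b_i - sum_{j != i} a_ij u_j).
    Updating u in place makes this a Gauss-Seidel sweep."""
    u = np.zeros_like(b)
    for _ in range(iters):
        for i in range(len(b)):
            # A[i] @ u includes a_ii * u_i, so add it back before dividing
            u[i] = (b[i] - A[i] @ u + A[i, i] * u[i]) / A[i, i]
    return u

# small SPD system: relaxation should approach the direct solution A^{-1} b
rng = np.random.default_rng(2)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6 * np.eye(6)     # SPD with a strong diagonal, so sweeps converge
b = rng.standard_normal(6)
u = relax(A, b)
assert np.allclose(u, np.linalg.solve(A, b), atol=1e-6)
```

For symmetric positive definite A (as here) Gauss-Seidel is guaranteed to converge, though, as the slide notes, convergence can be slow, especially for the thin plate stabilizer.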

28 Minimizing energy by relaxation 1-D (λ = 0.5)
Membrane
[Figures: estimates after 100, 500, and 1000 iterations]

29 Minimizing energy by relaxation 1-D (λ = 0.5)
Thin plate: much slower to converge
[Figures: estimates after 1000, 10000, and 100000 iterations]

30 Minimizing energy by relaxation 2-D (λ = 0.5)
Membrane
[Figures: original and estimates after 1000, 10000, and 100000 iterations]

31 Minimizing energy by relaxation 2-D (λ = 0.5)
Thin plate: much slower to converge
[Figures: original and estimates after 1000, 10000, and 100000 iterations]

32 Prior Models
A Boltzmann distribution based on the stabilizing function
P(u) = K·exp(−E_p(u)/T_p)
K is a normalizing constant, T_p is the temperature
Samples can be generated by repeated sampling of the local conditional distributions P(u_i|u)
P(u_i|u) = Z_i exp(−a_ii (u_i − u_i^+)² / 2T_p)
u_i^+ = a_ii⁻¹ (b_i − Σ_{j≠i} a_ij u_j)
–This is the local estimate of u_i in the relaxation method
–The variance of the local sample is T_p/a_ii
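This local sampling scheme can be sketched as follows (illustrative Python; a tiny two-variable Gaussian stands in for the grid prior): each site is resampled from a Gaussian whose mean is the relaxation estimate u_i^+ and whose variance is T_p/a_ii.

```python
import numpy as np

def gibbs_sample(A, b, T, sweeps, rng):
    """Gibbs sampling of P(u) ∝ exp(-(0.5 u^T A u - u^T b)/T).
    Each local conditional is Gaussian: mean a_ii^{-1}(b_i - sum_{j!=i} a_ij u_j)
    (the relaxation estimate) and variance T/a_ii."""
    u = np.zeros_like(b)
    samples = []
    for _ in range(sweeps):
        for i in range(len(b)):
            mean = (b[i] - A[i] @ u + A[i, i] * u[i]) / A[i, i]
            u[i] = mean + np.sqrt(T / A[i, i]) * rng.standard_normal()
        samples.append(u.copy())
    return np.array(samples)

rng = np.random.default_rng(3)
A = np.array([[2.0, -1.0], [-1.0, 2.0]])
b = np.array([1.0, 0.0])
s = gibbs_sample(A, b, T=0.1, sweeps=20000, rng=rng)[1000:]   # drop burn-in
# the stationary distribution is Gaussian with mean A^{-1} b
assert np.allclose(s.mean(axis=0), np.linalg.solve(A, b), atol=0.05)
```

Setting b = 0 samples the prior itself; a nonzero b is kept here so the chain has a well-defined mean to check against.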

33 Samples from prior distribution 1-D
[Figures: samples from the membrane-stabilizer and thin-plate-stabilizer Boltzmann distributions]

34 Samples from prior distribution 2-D Membrane prior

35 Samples from prior distribution 2-D Thin plate prior

36 Sampling prior distributions Samples are fractal Tend to favour high frequencies Multi-grid sampling to get smoother samples: Initially generate sample for a very coarse grid

37 Sampling prior distributions Samples are fractal Tend to favour high frequencies Multi-grid sampling to get smoother samples: Interpolate from the coarse grid to a finer grid, and use the interpolated values to initialize Gibbs sampling on a less coarse grid.

38 Sampling prior distributions Samples are fractal Tend to favour high frequencies Multi-grid sampling to get smoother samples: Repeat process on a finer grid

39 Sampling prior distributions Samples are fractal Tend to favour high frequencies Multi-grid sampling to get smoother samples: Final sample for entire grid

40 Multigrid sampling of prior distribution Membrane prior Thin plate prior
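The coarse-to-fine procedure of slides 36–39 can be sketched in 1-D (illustrative Python; the small diagonal loading eps and the sweep counts are assumptions needed to make the membrane prior a proper distribution):

```python
import numpy as np

def gibbs_sweeps(A, b, T, u, sweeps, rng):
    """In-place Gibbs sweeps for P(u) ∝ exp(-(0.5 u^T A u - u^T b)/T)."""
    for _ in range(sweeps):
        for i in range(len(b)):
            mean = (b[i] - A[i] @ u + A[i, i] * u[i]) / A[i, i]
            u[i] = mean + np.sqrt(T / A[i, i]) * rng.standard_normal()
    return u

def membrane_A(n, eps=1e-3):
    """1-D membrane A_p (second differences), lightly loaded so it is SPD."""
    D = np.eye(n - 1, n, k=1) - np.eye(n - 1, n)
    return D.T @ D + eps * np.eye(n)

def multigrid_sample(n_coarse, levels, T, rng):
    """Sample the coarse grid first, linearly interpolate to the next finer
    grid, and use the interpolant to initialise Gibbs sweeps there."""
    n = n_coarse
    u = gibbs_sweeps(membrane_A(n), np.zeros(n), T, np.zeros(n), 200, rng)
    for _ in range(levels):
        n = 2 * n - 1                       # refine the grid
        u = np.interp(np.linspace(0, 1, n), np.linspace(0, 1, len(u)), u)
        u = gibbs_sweeps(membrane_A(n), np.zeros(n), T, u, 50, rng)
    return u

u = multigrid_sample(n_coarse=9, levels=3, T=1.0, rng=np.random.default_rng(4))
assert u.shape == (65,)          # 9 -> 17 -> 33 -> 65
assert np.all(np.isfinite(u))
```

The coarse levels supply the low-frequency structure that plain single-grid Gibbs sampling is slow to produce.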

41 Sensor models
Sparse data model
–Uses a simple energy function
–Assumption: data points are all on grid
Only the sparse data model is used in the examples
–Others, such as force field models, optical flow, image intensity etc., are not simulated for this presentation
–Measurement variance is assumed constant for all data points

42 Posterior model
Simple Bayes' rule: P(u|d) = K·exp(−E_p(u)/T_p − E_d(u))
Also a Gibbs distribution
–1/T_p is the equivalent of the regularization factor: T_p = (1−λ)/λ
In the following figures only the thin plate prior is considered

43 Sampling the posterior model (T=1)

44 MAP estimation from the Gibbs posterior
Restate the Gibbs posterior distribution as P(u) = K·exp(−E(u)/T)
E(u) is the total energy
T again is temperature
–Not to be confused with the prior temperature T_p
Reduce T with iterations – an iteration is defined as a complete sweep through the data
Convergence to the MAP estimate is guaranteed as T goes to 0, provided T does not go down faster than 1/log(iter), where iter is the iteration number
–In practice, much faster cooling is possible
For the simple sparse-data sensor model, the MAP estimate is identical to that obtained using relaxation or matrix inversion
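Annealed Gibbs sampling can be sketched as follows (illustrative Python; the geometric cooling rate 0.98 is an assumed "faster than 1/log" schedule, and a two-variable quadratic energy stands in for the image grid):

```python
import numpy as np

def annealed_gibbs_map(A, b, T0=1.0, sweeps=800, rng=None):
    """Annealed Gibbs sampling: lower T every sweep so the chain freezes at
    the MAP estimate.  Geometric cooling is used here, i.e. much faster than
    the 1/log(iter) schedule that carries the convergence guarantee."""
    rng = np.random.default_rng(0) if rng is None else rng
    u = np.zeros_like(b)
    for k in range(sweeps):
        T = T0 * 0.98 ** k          # assumed geometric cooling schedule
        for i in range(len(b)):
            mean = (b[i] - A[i] @ u + A[i, i] * u[i]) / A[i, i]
            u[i] = mean + np.sqrt(T / A[i, i]) * rng.standard_normal()
    return u

A = np.array([[2.0, -1.0], [-1.0, 2.0]])
b = np.array([1.0, 0.0])
u = annealed_gibbs_map(A, b)
# for a quadratic (single Gaussian) energy the MAP estimate equals A^{-1} b,
# matching what relaxation or matrix inversion would give
assert np.allclose(u, np.linalg.solve(A, b), atol=1e-2)
```

As T shrinks, the noise term vanishes and each sweep degenerates into the deterministic relaxation update.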

45 MAP estimates from posterior 1-D Relaxation 100000 iters Annealed Gibbs sampling 100000 iters

46 MAP estimates from posterior 2-D
[Figures: actual MAP solution vs annealed Gibbs sampling based MAP solution]

47 The contaminated Gaussian sensor model
Also a sparse data sensor model
Assumes the measurement error has two modes
–1. A high-probability, low-variance Gaussian
–2. A low-probability, high-variance Gaussian
P(d_{i,j}|u) = (1−ε)N(u_{i,j}, σ₁²) + εN(u_{i,j}, σ₂²), with ε small (e.g. 0.05) and σ₂² >> σ₁²
The posterior probability is also a mixture of Gaussians: (1−ε)P₁(d_{i,j}|u) + εP₂(d_{i,j}|u)
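The mixture likelihood can be sketched directly (illustrative Python; the ε, σ₁, σ₂ values are assumptions): a single gross outlier costs far less log-likelihood under the contaminated model than under a nearly pure narrow Gaussian, which is what makes the model robust.

```python
import numpy as np

def contaminated_gaussian_loglik(d, u, eps=0.05, s1=0.1, s2=1.0):
    """log P(d|u) with P(d_{i,j}|u) = (1-eps) N(u_{i,j}, s1^2) + eps N(u_{i,j}, s2^2)."""
    def normal_pdf(x, mu, s):
        return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))
    p = (1 - eps) * normal_pdf(d, u, s1) + eps * normal_pdf(d, u, s2)
    return float(np.sum(np.log(p)))

u = np.zeros(5)
outlier = np.array([0.05, 0.05, 3.0, 0.05, 0.05])   # one gross error
# the broad component absorbs the outlier: the contaminated model loses far
# less log-likelihood on it than a (nearly) pure narrow Gaussian would
assert contaminated_gaussian_loglik(outlier, u) > contaminated_gaussian_loglik(outlier, u, eps=1e-12)
```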

48 Samples from posterior using contaminated Gaussian

49 MAP estimates of contaminated Gaussian 1-D
For the contaminated Gaussian there is no closed-form MAP estimate
–Gibbs sampling provides a MAP estimate
[Figures: MAP estimate using the single Gaussian sensor model vs the contaminated Gaussian sensor model]

50 MAP estimates of contaminated Gaussian 2-D
[Figures: MAP estimate using a single Gaussian sensor model vs a contaminated Gaussian sensor model]
For the contaminated Gaussian, the MAP estimate is obtained using annealed Gibbs sampling

51 Why Bayesian?
Bayesian and regularization solutions are identical for some models
The Bayesian approach provides several other advantages
–Handles complex sensor models, e.g. the contaminated Gaussian model
–Provides uncertainty estimates
–Provides a handle to estimate the optimal regularization factor
–Provides a formalism for methods such as Kalman filtering
–Etc.

52 Why Bayesian? Uncertainty measurement
Blue curve is the MAP estimate
Red curves show one standard deviation on either side

53 Why Bayesian? Uncertainty measurement (T=1)
The figure is actually a sandwich
–The surface in the middle is the MAP estimate
–The boundaries indicate one standard deviation

54 Why Bayesian? Uncertainty measurement Variance field –For thin plate prior variance is constant except at boundaries –Variance of posterior fluctuates from thin plate variance only at measured data points –Other prior distributions would have prettier variance and covariance fields

55 Why Bayesian: Optimize regularization factor
P(d|λ) is a Gaussian
It has two factors, 1/sqrt(2πσ²) and exp(−0.5(d−ū)²/σ²)
−log P(d|λ) therefore has two terms
–E1(d) = 0.5 log(2πσ²)
–E2(d) = 0.5(d−ū)²/σ²
Both terms are functions of σ²
–σ² is a function of the regularization factor λ
As λ increases E1(d) increases, but E2(d) decreases
There is a specific value of λ at which E1(d) + E2(d) is minimum
–This is the maximum likelihood estimate of λ
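A sketch of this evidence scan (illustrative Python; a zero-mean 1-D membrane prior with temperature T and unit noise variance is assumed, so d ~ N(0, T·A_p⁻¹ + I), and, as slide 60 suggests, the determinant term is computed from eigenvalues for stability):

```python
import numpy as np

def neg_log_evidence(d, A_p, T):
    """-log P(d|T) for a zero-mean prior P(u) ∝ exp(-E_p(u)/T) and unit noise:
    d ~ N(0, C) with C = T*A_p^{-1} + I.  E1 is the log-determinant term, E2
    the data-fit term; both come from the eigenvalues of A_p."""
    w, V = np.linalg.eigh(A_p)        # A_p must be SPD (lightly loaded below)
    c = T / w + 1.0                   # eigenvalues of the covariance C
    y = V.T @ d
    E1 = 0.5 * np.sum(np.log(2 * np.pi * c))
    E2 = 0.5 * np.sum(y ** 2 / c)
    return E1 + E2

rng = np.random.default_rng(5)
n = 40
D = np.eye(n - 1, n, k=1) - np.eye(n - 1, n)
A_p = D.T @ D + 1e-3 * np.eye(n)      # 1-D membrane, loaded to make it invertible
T_true = 2.0
w, V = np.linalg.eigh(A_p)
d = V @ (np.sqrt(T_true / w + 1.0) * rng.standard_normal(n))   # draw d from the model
# scan T: E1 grows with T while E2 shrinks, so their sum has an interior minimum
Ts = np.linspace(0.2, 10.0, 50)
best = Ts[np.argmin([neg_log_evidence(d, A_p, T) for T in Ts])]
assert np.isfinite(best)
# the evidence clearly prefers the true temperature to a badly wrong one
assert neg_log_evidence(d, A_p, T_true) < neg_log_evidence(d, A_p, 100.0)
```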

56 Why Bayesian: Optimize regularization factor
Black curve is the MAP estimate without measurement noise
[Figures: λ = 0.25, λ = 0.5, λ = 0.75]

57 Why Bayesian: Optimize regularization factor
[Figures: no measurement noise; λ = 0.25, λ = 0.5, λ = 0.75]

58 Why Bayesian: Optimize regularization factor 1-D
[Plot: E1, E2, and E1+E2 versus log(T); optimal log(T) is around −1.9]

59 Why Bayesian: Optimize regularization factor 1-D
[Figure: estimate for the optimal T (T ≈ 0.1496)]

60 Why Bayesian: Optimize regularization factor 2-D
[Plot: E1, E2, and E1+E2 versus T (not log(T)); optimal T is about 2.7]
Estimating E1 and E2 requires computing the determinants of A and A_p
–A_p is singular for the thin plate prior
–Diagonal loading is not sufficient; compute the determinant from eigenvalues to avoid underflow/overflow

61 Why Bayesian: Optimize regularization factor 2-D
[Figures: no observation noise vs maximum likelihood estimate]

62 Why Bayesian: Kalman filter
[Slide text not captured in the transcript]

63 Why Bayesian: Kalman filter
[Animation 1 not captured in the transcript]

64 Why Bayesian: Kalman filter
[Animation 2 not captured in the transcript]

