
1 Regularization of energy-based representations
Minimize total energy λE_p(u) + (1−λ)E_d(u,d)
E_p(u): Stabilizing function - a smoothness constraint
–Membrane stabilizer: E_p(u) = 0.5 Σ_{i,j} [(u_{i,j+1} − u_{i,j})² + (u_{i+1,j} − u_{i,j})²]
–Thin plate stabilizer: E_p(u) = 0.5 Σ_{i,j} [(u_{i,j+1} + u_{i,j−1} − 2u_{i,j})² + (u_{i+1,j} + u_{i−1,j} − 2u_{i,j})² + 2(u_{i+1,j+1} + u_{i,j} − u_{i+1,j} − u_{i,j+1})²]
–Linear combinations of the two
E_d(u,d): Energy function, measures the compatibility between the estimate u and the observed data d
–E_d(u,d) = 0.5 Σ_{i,j} c_{i,j} (d_{i,j} − u_{i,j})²
–c_{i,j} is the inverse of the variance of measurement d_{i,j}

2 Stabilizing function – membrane stabilizer
E_p(u) = 0.5 Σ_{i,j} [(u_{i,j+1} − u_{i,j})² + (u_{i+1,j} − u_{i,j})²]
[Figure: grid of nodes u_{i,j}, with axes i and j]

3–10 Stabilizing function – membrane stabilizer
[Animation frames: the four-term "ATOM" centred on u_{i,j} is built up one forward difference at a time, accumulating the terms u_{i,j+1} − u_{i,j} and u_{i+1,j} − u_{i,j} from the node and its neighbours]

11 Stabilizing function – membrane stabilizer
In matrix form: E_p(u) = 0.5 u^T A_p u, where u is the grid flattened into a vector
Rows of A_p have the form 0 0 0 −1 0 0 … 0 −1 4 −1 0 … 0 0 −1 0 … (the 5-point Laplacian stencil)
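The quadratic form above can be checked numerically. The sketch below (illustrative Python, not from the original deck) assembles a dense A_p for a small grid from its forward-difference edges and verifies that 0.5 uᵀA_p u reproduces the sum-of-squared-differences energy, with interior rows carrying the 5-point stencil:

```python
import numpy as np

def membrane_energy(u):
    """E_p(u) = 0.5 * sum over the grid of squared forward differences."""
    dj = u[:, 1:] - u[:, :-1]   # u_{i,j+1} - u_{i,j}
    di = u[1:, :] - u[:-1, :]   # u_{i+1,j} - u_{i,j}
    return 0.5 * (np.sum(dj ** 2) + np.sum(di ** 2))

def membrane_matrix(n, m):
    """Dense A_p with E_p(u) = 0.5 u^T A_p u for u flattened row-major.
    Each grid edge contributes a 2x2 block; interior rows end up as the
    5-point Laplacian stencil ... -1 ... -1 4 -1 ... -1 ..."""
    N = n * m
    A = np.zeros((N, N))
    for i in range(n):
        for j in range(m):
            for di_, dj_ in ((0, 1), (1, 0)):
                ii, jj = i + di_, j + dj_
                if ii < n and jj < m:
                    a, b = i * m + j, ii * m + jj
                    A[a, a] += 1; A[b, b] += 1
                    A[a, b] -= 1; A[b, a] -= 1
    return A

rng = np.random.default_rng(0)
u = rng.standard_normal((4, 5))
A_p = membrane_matrix(4, 5)
# the quadratic form and the explicit sum agree
assert np.isclose(membrane_energy(u), 0.5 * u.ravel() @ A_p @ u.ravel())
# the row of an interior node carries the stencil: diagonal 4, neighbours -1
center = 1 * 5 + 2
assert A_p[center, center] == 4
```

A sparse matrix would be used at realistic grid sizes; the dense form here only serves to make the stencil visible.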

12 Stabilizing function – thin plate stabilizer
E_p(u) = 0.5 Σ_{i,j} {[(u_{i,j+1} − u_{i,j}) + (u_{i,j−1} − u_{i,j})]² + [(u_{i+1,j} − u_{i,j}) + (u_{i−1,j} − u_{i,j})]² + 2[(u_{i+1,j+1} − u_{i,j}) + (u_{i,j} − u_{i+1,j}) + (u_{i,j} − u_{i,j+1})]²}
[Figure: grid of nodes u_{i,j}, with axes i and j]

13–15 Stabilizing function – thin plate stabilizer
[Animation frames: the thin plate "ATOM" centred on u_{i,j} is built up one second-difference term at a time]

16 Stabilizing function – thin plate stabilizer
ATOM: E_p(u) = 0.5 u^T A_p u
Rows of A_p carry the 13-point biharmonic stencil:
      1
   2 −8  2
1 −8 20 −8  1
   2 −8  2
      1

17 Stabilizing function – Examples (1-D)
[Figures: data points; membrane, thin plate, and thin plate + membrane reconstructions]

18 Stabilizing function – Examples (2-D)
[Figures: samples from u; membrane, thin plate, and membrane + thin plate surfaces]

19 Stabilizing function – Examples (2-D)
[Figures: samples from u; membrane, thin plate, and membrane + thin plate surfaces]

20 Energy function
Data on grid
–d_{i,j} = u_{i,j} + e_{i,j} (e_{i,j} is N(0, σ²))
–E_d(u,d) = 0.5 Σ_{i,j} c_{i,j} (d_{i,j} − u_{i,j})² (c_{i,j} = σ⁻²)
Data off grid
–d_k = h_{0,0} u_{i,j} + h_{0,1} u_{i,j+1} + h_{1,0} u_{i+1,j} + h_{1,1} u_{i+1,j+1} + e_k
–E_d(u,d) = 0.5 Σ_k c_k (d_k − H_k u)²
In all examples here we assume data on grid
–E_d(u,d) = 0.5 (u−d)^T A_d (u−d)
–A_d = σ⁻² I: measurement variance assumed constant for all data

21 Overall energy
E(u) = λE_p(u) + (1−λ)E_d(u,d) (λ is the regularization factor)
= 0.5{λ u^T A_p u + (1−λ)(u−d)^T A_d (u−d)}
= 0.5 u^T A u − u^T b + const
where A = λA_p + (1−λ)A_d and b = (1−λ) A_d d
The solution for u can be obtained directly by minimizing E(u): u = A⁻¹ b
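A minimal 1-D sketch of this direct solution (illustrative Python, not from the deck; the membrane prior and unit noise variance are assumed choices): build A = λA_p + (1−λ)A_d and b = (1−λ)A_d d, then solve the linear system.

```python
import numpy as np

def solve_regularized(d, lam, sigma2=1.0):
    """Minimize lam*E_p(u) + (1-lam)*E_d(u,d) with a 1-D membrane prior:
    A = lam*A_p + (1-lam)*A_d,  b = (1-lam)*A_d d,  u = A^{-1} b."""
    n = len(d)
    D = np.eye(n - 1, n, k=1) - np.eye(n - 1, n)   # forward-difference operator
    A_p = D.T @ D                                  # 1-D membrane stabilizer matrix
    A_d = np.eye(n) / sigma2
    A = lam * A_p + (1 - lam) * A_d
    b = (1 - lam) * A_d @ d
    return np.linalg.solve(A, b)

rng = np.random.default_rng(1)
d = np.sin(np.linspace(0, np.pi, 50)) + 0.3 * rng.standard_normal(50)
u = solve_regularized(d, lam=0.5)
# with lam = 0 the prior vanishes and the solution is the data itself
assert np.allclose(solve_regularized(d, 0.0), d)
# the regularized solution is never rougher (in E_p) than the raw data
assert np.sum(np.diff(u) ** 2) <= np.sum(np.diff(d) ** 2)
```

The second assertion follows from optimality: at the minimizer, λE_p(u) + (1−λ)E_d(u,d) ≤ λE_p(d), so E_p(u) ≤ E_p(d).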

22 Minimizing overall energy 1-D (λ = 0.5)
[Figures: membrane and thin plate solutions, estimated from noisy observations and with no observation noise]

23 Minimizing overall energy 2-D (λ = 0.5)
[Figures: original and noisy surfaces]
Added zero-mean, unit-variance Gaussian noise to all elements

24 Minimizing overall energy 2-D (λ = 0.5)
[Figures: original surface; reconstructions from noisy data using the membrane and thin plate priors]

25 Minimizing overall energy 2-D (λ = 0.5)
[Figures: original and noisy surfaces]
Added zero-mean, unit-variance Gaussian noise to all elements

26 Minimizing overall energy 2-D (λ = 0.5)
[Figures: original surface; reconstructions from noisy data using the membrane and thin plate priors]

27 Minimizing energy by relaxation
Direct computation of A⁻¹ is inefficient
–Large matrices: for a 256x256 grid, A has size 65536 x 65536
–Sparseness of A is not utilized: only a small fraction of elements have non-zero values
Relaxation replaces inversion of A with many local estimates
u_i^+ = a_ii⁻¹ (b_i − Σ_{j≠i} a_ij u_j)
–Updates can be done in parallel
–All local computations are very simple
–Can be slow to converge
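The local update above can be sketched as follows (illustrative Python; a small dense SPD system stands in for the sparse grid matrices, and updates are applied in place, Gauss-Seidel style):

```python
import numpy as np

def relax(A, b, iters=200):
    """Relaxation: repeatedly set u_i <- a_ii^{-1} (b_i - sum_{j != i} a_ij u_j).
    Updating u in place makes this a Gauss-Seidel sweep."""
    u = np.zeros_like(b)
    for _ in range(iters):
        for i in range(len(b)):
            # A[i] @ u includes a_ii * u_i, so add it back before dividing
            u[i] = (b[i] - A[i] @ u + A[i, i] * u[i]) / A[i, i]
    return u

# small SPD system: relaxation should approach the direct solution A^{-1} b
rng = np.random.default_rng(2)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6 * np.eye(6)     # SPD with a strong diagonal, so sweeps converge
b = rng.standard_normal(6)
u = relax(A, b)
assert np.allclose(u, np.linalg.solve(A, b), atol=1e-6)
```

For symmetric positive definite A (as here) Gauss-Seidel is guaranteed to converge, though, as the slide notes, convergence can be slow, especially for the thin plate stabilizer.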

28 Minimizing energy by relaxation 1-D (λ = 0.5)
Membrane
[Figures: estimates after 100, 500, and 1000 iterations]

29 Minimizing energy by relaxation 1-D (λ = 0.5)
Thin plate: much slower to converge
[Figures: estimates after 1000, 10000, and 100000 iterations]

30 Minimizing energy by relaxation 2-D (λ = 0.5)
Membrane
[Figures: original and estimates after 1000, 10000, and 100000 iterations]

31 Minimizing energy by relaxation 2-D (λ = 0.5)
Thin plate: much slower to converge
[Figures: original and estimates after 1000, 10000, and 100000 iterations]

32 Prior Models
A Boltzmann distribution based on the stabilizing function
P(u) = K·exp(−E_p(u)/T_p)
K is a normalizing constant, T_p is the temperature
Samples can be generated by repeated sampling of the local conditional distributions P(u_i|u)
P(u_i|u) = Z_i exp(−a_ii (u_i − u_i^+)² / 2T_p)
u_i^+ = a_ii⁻¹ (b_i − Σ_{j≠i} a_ij u_j)
–This is the local estimate of u_i in the relaxation method
–The variance of the local sample is T_p/a_ii
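This local sampling scheme can be sketched as follows (illustrative Python; a tiny two-variable Gaussian stands in for the grid prior): each site is resampled from a Gaussian whose mean is the relaxation estimate u_i^+ and whose variance is T_p/a_ii.

```python
import numpy as np

def gibbs_sample(A, b, T, sweeps, rng):
    """Gibbs sampling of P(u) ∝ exp(-(0.5 u^T A u - u^T b)/T).
    Each local conditional is Gaussian: mean a_ii^{-1}(b_i - sum_{j!=i} a_ij u_j)
    (the relaxation estimate) and variance T/a_ii."""
    u = np.zeros_like(b)
    samples = []
    for _ in range(sweeps):
        for i in range(len(b)):
            mean = (b[i] - A[i] @ u + A[i, i] * u[i]) / A[i, i]
            u[i] = mean + np.sqrt(T / A[i, i]) * rng.standard_normal()
        samples.append(u.copy())
    return np.array(samples)

rng = np.random.default_rng(3)
A = np.array([[2.0, -1.0], [-1.0, 2.0]])
b = np.array([1.0, 0.0])
s = gibbs_sample(A, b, T=0.1, sweeps=20000, rng=rng)[1000:]   # drop burn-in
# the stationary distribution is Gaussian with mean A^{-1} b
assert np.allclose(s.mean(axis=0), np.linalg.solve(A, b), atol=0.05)
```

Setting b = 0 samples the prior itself; a nonzero b is kept here so the chain has a well-defined mean to check against.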

33 Samples from prior distribution 1-D
[Figures: samples from the membrane-stabilizer and thin-plate-stabilizer Boltzmann distributions]

34 Samples from prior distribution 2-D Membrane prior

35 Samples from prior distribution 2-D Thin plate prior

36 Sampling prior distributions Samples are fractal Tend to favour high frequencies Multi-grid sampling to get smoother samples: Initially generate sample for a very coarse grid

37 Sampling prior distributions Samples are fractal Tend to favour high frequencies Multi-grid sampling to get smoother samples: Interpolate from the coarse grid to a finer grid, and use the interpolated values to initialize Gibbs sampling on a less coarse grid.

38 Sampling prior distributions Samples are fractal Tend to favour high frequencies Multi-grid sampling to get smoother samples: Repeat process on a finer grid

39 Sampling prior distributions Samples are fractal Tend to favour high frequencies Multi-grid sampling to get smoother samples: Final sample for entire grid

40 Multigrid sampling of prior distribution Membrane prior Thin plate prior
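The coarse-to-fine procedure of slides 36–39 can be sketched in 1-D (illustrative Python; the small diagonal loading eps and the sweep counts are assumptions needed to make the membrane prior a proper distribution):

```python
import numpy as np

def gibbs_sweeps(A, b, T, u, sweeps, rng):
    """In-place Gibbs sweeps for P(u) ∝ exp(-(0.5 u^T A u - u^T b)/T)."""
    for _ in range(sweeps):
        for i in range(len(b)):
            mean = (b[i] - A[i] @ u + A[i, i] * u[i]) / A[i, i]
            u[i] = mean + np.sqrt(T / A[i, i]) * rng.standard_normal()
    return u

def membrane_A(n, eps=1e-3):
    """1-D membrane A_p (second differences), lightly loaded so it is SPD."""
    D = np.eye(n - 1, n, k=1) - np.eye(n - 1, n)
    return D.T @ D + eps * np.eye(n)

def multigrid_sample(n_coarse, levels, T, rng):
    """Sample the coarse grid first, linearly interpolate to the next finer
    grid, and use the interpolant to initialise Gibbs sweeps there."""
    n = n_coarse
    u = gibbs_sweeps(membrane_A(n), np.zeros(n), T, np.zeros(n), 200, rng)
    for _ in range(levels):
        n = 2 * n - 1                       # refine the grid
        u = np.interp(np.linspace(0, 1, n), np.linspace(0, 1, len(u)), u)
        u = gibbs_sweeps(membrane_A(n), np.zeros(n), T, u, 50, rng)
    return u

u = multigrid_sample(n_coarse=9, levels=3, T=1.0, rng=np.random.default_rng(4))
assert u.shape == (65,)          # 9 -> 17 -> 33 -> 65
assert np.all(np.isfinite(u))
```

The coarse levels supply the low-frequency structure that plain single-grid Gibbs sampling is slow to produce.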

41 Sensor models
Sparse data model
–Uses a simple energy function
–Assumption: data points are all on grid
Only the sparse data model is used in the examples
–Others, such as force field models, optical flow, image intensity etc., are not simulated for this presentation
–Measurement variance is assumed constant for all data points

42 Posterior model
Simple Bayes' rule: P(u|d) = K·exp(−E_p(u)/T_p − E_d(u))
Also a Gibbs distribution
–1/T_p is the equivalent of the regularization factor: T_p = (1−λ)/λ
In the following figures only the thin plate prior is considered

43 Sampling the posterior model (T=1)

44 MAP estimation from the Gibbs posterior
Restate the Gibbs posterior distribution as P(u) = K·exp(−E(u)/T)
E(u) is the total energy
T again is temperature
–Not to be confused with the prior temperature T_p
Reduce T with iterations – an iteration is defined as a complete sweep through the data
Convergence to the MAP estimate is guaranteed as T goes to 0, provided T does not go down faster than 1/log(iter), where iter is the iteration number
–In practice, much faster cooling is possible
For the simple sparse-data sensor model, the MAP estimate is identical to that obtained using relaxation or matrix inversion
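Annealed Gibbs sampling can be sketched as follows (illustrative Python; the geometric cooling rate 0.98 is an assumed "faster than 1/log" schedule, and a two-variable quadratic energy stands in for the image grid):

```python
import numpy as np

def annealed_gibbs_map(A, b, T0=1.0, sweeps=800, rng=None):
    """Annealed Gibbs sampling: lower T every sweep so the chain freezes at
    the MAP estimate.  Geometric cooling is used here, i.e. much faster than
    the 1/log(iter) schedule that carries the convergence guarantee."""
    rng = np.random.default_rng(0) if rng is None else rng
    u = np.zeros_like(b)
    for k in range(sweeps):
        T = T0 * 0.98 ** k          # assumed geometric cooling schedule
        for i in range(len(b)):
            mean = (b[i] - A[i] @ u + A[i, i] * u[i]) / A[i, i]
            u[i] = mean + np.sqrt(T / A[i, i]) * rng.standard_normal()
    return u

A = np.array([[2.0, -1.0], [-1.0, 2.0]])
b = np.array([1.0, 0.0])
u = annealed_gibbs_map(A, b)
# for a quadratic (single Gaussian) energy the MAP estimate equals A^{-1} b,
# matching what relaxation or matrix inversion would give
assert np.allclose(u, np.linalg.solve(A, b), atol=1e-2)
```

As T shrinks, the noise term vanishes and each sweep degenerates into the deterministic relaxation update.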

45 MAP estimates from posterior 1-D Relaxation 100000 iters Annealed Gibbs sampling 100000 iters

46 MAP estimates from posterior 2-D
[Figures: actual MAP solution vs annealed Gibbs sampling based MAP solution]

47 The contaminated Gaussian sensor model
Also a sparse data sensor model
Assumes the measurement error has two modes
–1. A high-probability, low-variance Gaussian
–2. A low-probability, high-variance Gaussian
P(d_{i,j}|u) = (1−ε)N(u_{i,j}, σ₁²) + εN(u_{i,j}, σ₂²), with ε small (e.g. 0.05) and σ₂² >> σ₁²
The posterior probability is also a mixture of Gaussians: (1−ε)P₁(d_{i,j}|u) + εP₂(d_{i,j}|u)
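The mixture likelihood can be sketched directly (illustrative Python; the ε, σ₁, σ₂ values are assumptions): a single gross outlier costs far less log-likelihood under the contaminated model than under a nearly pure narrow Gaussian, which is what makes the model robust.

```python
import numpy as np

def contaminated_gaussian_loglik(d, u, eps=0.05, s1=0.1, s2=1.0):
    """log P(d|u) with P(d_{i,j}|u) = (1-eps) N(u_{i,j}, s1^2) + eps N(u_{i,j}, s2^2)."""
    def normal_pdf(x, mu, s):
        return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))
    p = (1 - eps) * normal_pdf(d, u, s1) + eps * normal_pdf(d, u, s2)
    return float(np.sum(np.log(p)))

u = np.zeros(5)
outlier = np.array([0.05, 0.05, 3.0, 0.05, 0.05])   # one gross error
# the broad component absorbs the outlier: the contaminated model loses far
# less log-likelihood on it than a (nearly) pure narrow Gaussian would
assert contaminated_gaussian_loglik(outlier, u) > contaminated_gaussian_loglik(outlier, u, eps=1e-12)
```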

48 Samples from posterior using contaminated Gaussian

49 MAP estimates of contaminated Gaussian 1-D
For the contaminated Gaussian there is no closed-form MAP estimate
–Gibbs sampling provides a MAP estimate
[Figures: MAP estimate using the single Gaussian sensor model vs the contaminated Gaussian sensor model]

50 MAP estimates of contaminated Gaussian 2-D
[Figures: MAP estimate using a single Gaussian sensor model vs a contaminated Gaussian sensor model]
For the contaminated Gaussian, the MAP estimate is obtained using annealed Gibbs sampling

51 Why Bayesian?
Bayesian and regularization solutions are identical for some models
The Bayesian approach provides several other advantages
–Handles complex sensor models, e.g. the contaminated Gaussian model
–Provides uncertainty estimates
–Provides a handle to estimate the optimal regularization factor
–Provides a formalism for methods such as Kalman filtering
–Etc.

52 Why Bayesian? Uncertainty measurement
Blue curve is the MAP estimate
Red curves show one standard deviation on either side

53 Why Bayesian? Uncertainty measurement (T=1)
The figure is actually a sandwich
–The surface in the middle is the MAP estimate
–The boundaries indicate one standard deviation

54 Why Bayesian? Uncertainty measurement Variance field –For thin plate prior variance is constant except at boundaries –Variance of posterior fluctuates from thin plate variance only at measured data points –Other prior distributions would have prettier variance and covariance fields

55 Why Bayesian: Optimize regularization factor
P(d|λ) is a Gaussian
It has two factors, 1/sqrt(2πσ²) and exp(−0.5(d−ū)²/σ²)
−log P(d|λ) therefore has two terms
–E1(d) = 0.5 log(2πσ²)
–E2(d) = 0.5(d−ū)²/σ²
Both terms are functions of σ²
–σ² is a function of the regularization factor λ
As λ increases E1(d) increases, but E2(d) decreases
There is a specific value of λ at which E1(d) + E2(d) is minimum
–This is the maximum likelihood estimate of λ
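A sketch of this evidence scan (illustrative Python; a zero-mean 1-D membrane prior with temperature T and unit noise variance is assumed, so d ~ N(0, T·A_p⁻¹ + I), and, as slide 60 suggests, the determinant term is computed from eigenvalues for stability):

```python
import numpy as np

def neg_log_evidence(d, A_p, T):
    """-log P(d|T) for a zero-mean prior P(u) ∝ exp(-E_p(u)/T) and unit noise:
    d ~ N(0, C) with C = T*A_p^{-1} + I.  E1 is the log-determinant term, E2
    the data-fit term; both come from the eigenvalues of A_p."""
    w, V = np.linalg.eigh(A_p)        # A_p must be SPD (lightly loaded below)
    c = T / w + 1.0                   # eigenvalues of the covariance C
    y = V.T @ d
    E1 = 0.5 * np.sum(np.log(2 * np.pi * c))
    E2 = 0.5 * np.sum(y ** 2 / c)
    return E1 + E2

rng = np.random.default_rng(5)
n = 40
D = np.eye(n - 1, n, k=1) - np.eye(n - 1, n)
A_p = D.T @ D + 1e-3 * np.eye(n)      # 1-D membrane, loaded to make it invertible
T_true = 2.0
w, V = np.linalg.eigh(A_p)
d = V @ (np.sqrt(T_true / w + 1.0) * rng.standard_normal(n))   # draw d from the model
# scan T: E1 grows with T while E2 shrinks, so their sum has an interior minimum
Ts = np.linspace(0.2, 10.0, 50)
best = Ts[np.argmin([neg_log_evidence(d, A_p, T) for T in Ts])]
assert np.isfinite(best)
# the evidence clearly prefers the true temperature to a badly wrong one
assert neg_log_evidence(d, A_p, T_true) < neg_log_evidence(d, A_p, 100.0)
```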

56 Why Bayesian: Optimize regularization factor
Black curve is the MAP estimate without measurement noise
[Figures: λ = 0.25, λ = 0.5, λ = 0.75]

57 Why Bayesian: Optimize regularization factor
[Figures: no measurement noise; λ = 0.25, λ = 0.5, λ = 0.75]

58 Why Bayesian: Optimize regularization factor 1-D
[Plot: E1, E2, and E1+E2 versus log(T); optimal log(T) is around −1.9]

59 Why Bayesian: Optimize regularization factor 1-D
[Figure: estimate for the optimal T (T ≈ 0.1496)]

60 Why Bayesian: Optimize regularization factor 2-D
[Plot: E1, E2, and E1+E2 versus T (not log(T)); optimal T is about 2.7]
Estimating E1 and E2 requires computing the determinants of A and A_p
–A_p is singular for the thin plate prior
–Diagonal loading is not sufficient; compute the determinant from eigenvalues to avoid underflow/overflow

61 Why Bayesian: Optimize regularization factor 2-D
[Figures: no observation noise vs maximum likelihood estimate]

62 Why Bayesian: Kalman filter
[Slide text not captured in the transcript]

63 Why Bayesian: Kalman filter
[Animation 1 not captured in the transcript]

64 Why Bayesian: Kalman filter
[Animation 2 not captured in the transcript]

