1
Expectation-Propagation performs smooth gradient descent
Advances in Approximate Bayesian Inference 2016
Guillaume Dehaene
I'd like to thank the organizers for inviting me. I'm going to present some new work on expectation propagation, which shines a new light on this algorithm by showing that it performs a smoothed gradient descent.
2
Computational troubles in Bayesian inference
The problem: Bayesian methods are susceptible to computational trouble.
If we want to approximate p(θ), the Gaussian approximations are:
Laplace approximation + Gradient Descent
Variational Bayes (and a variant)
Expectation Propagation
You should think of the first choice as conservative and a bit boring, sort of like Jeb Bush, while the next two are the opposite of that, so maybe like 2007 Senator Barack Obama. What I will show in this talk is that these three methods are closely linked.
3
Laplace + Gradient Descent
Laplace = Gaussian approximation at the mode, computed using Gradient Descent on ψ = -log p.
[Figure: an example posterior density over θ, with a local Gaussian fitted at its mode.]
So here is one example of a posterior distribution. The Laplace approximation consists in finding its maximum and fitting a purely local Gaussian approximation there.
4
Laplace + Gradient Descent
Laplace = Gaussian approximation at the mode, computed using Gradient Descent on ψ = -log p.
The mathematically conservative choice: Gradient Descent is well understood, and Laplace is exact in the large-data limit.
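To make the recipe concrete, here is a minimal sketch on a toy 1D target of my own choosing (the quartic ψ and the iteration count are assumptions, not from the talk), using Newton steps rather than plain gradient descent for brevity: the mode of p is found by descending ψ, and ψ'' at the mode gives the Gaussian's precision.

```python
# Toy non-Gaussian target (an assumption for illustration): psi(t) = -log p(t) up to a constant.
psi   = lambda t: t**4 / 4 + t**2 / 2 - t
dpsi  = lambda t: t**3 + t - 1            # psi'
d2psi = lambda t: 3 * t**2 + 1            # psi''  (always > 0 here)

# Find the mode of p by Newton steps on psi (a second-order flavour of gradient descent).
theta = 0.0
for _ in range(50):
    theta -= dpsi(theta) / d2psi(theta)

# Laplace approximation: Gaussian centred at the mode, variance 1 / psi''(mode).
laplace_mean, laplace_var = theta, 1.0 / d2psi(theta)
print(laplace_mean, laplace_var)
```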
5
Physical intuitions: Gradient Descent ≈ dynamics of a sliding object
[Figure: plots of -log probability, with an object sliding down the slope.]
Final great feature: Gradient Descent appeals to our physical intuitions because it matches the dynamics of an object sliding down a slope.
6
Linking GD, VB and EP: VB and EP iterate Gaussian approximations
We can define an algorithm that: iterates Gaussian approximations, computes the Laplace approximation, and does Gradient Descent.
7
Algorithm 1: disguised gradient descent
Initialize with any Gaussian q_0. Loop:
μ_t = E_{q_t}[θ]
φ = ψ'(μ_t)
β = ψ''(μ_t)
q_{t+1}(θ) ∝ exp( -φ(θ - μ_t) - (β/2)(θ - μ_t)² )
Completing the square gives μ_{t+1} = μ_t - ψ'(μ_t)/ψ''(μ_t): this is Newton's method!
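A minimal sketch of Algorithm 1 on the same toy ψ as above (my own example, not the speaker's): each step builds the Gaussian q_{t+1} ∝ exp(-φ(θ-μ_t) - (β/2)(θ-μ_t)²) and reads off its mean, which is exactly a Newton step.

```python
# Algorithm 1 on the toy target: the Gaussian's mean follows Newton's method exactly.
dpsi  = lambda t: t**3 + t - 1              # psi'  for psi(t) = t^4/4 + t^2/2 - t
d2psi = lambda t: 3 * t**2 + 1              # psi''

mu, var = 0.0, 1.0                          # q_0: any starting Gaussian
for _ in range(50):
    phi, beta = dpsi(mu), d2psi(mu)         # gradient and curvature at the current mean
    # q_{t+1}(t) ~ exp(-phi*(t - mu) - beta/2*(t - mu)^2); complete the square:
    mu, var = mu - phi / beta, 1.0 / beta   # the mean takes a Newton step

print(mu, var)                              # converges to the Laplace approximation
```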
8
Algorithm 1: disguised gradient descent
Newton's method: ψ ≈ quadratic.
DGD: q ∝ exp(-quadratic), so q ≈ Gaussian.
9
Variational Bayes Gaussian approximation
The Variational Bayes approach: minimize KL(q, p) = E_q[ log(q/p) ] for q a Gaussian.
Local minima respect (Opper, Archambeau, 2007):
E_{q*}[ψ'] = 0
E_{q*}[ψ''] = var(q*)^{-1}
10
Algorithm 2: smoothed gradient descent
Initialize with any Gaussian q_0. Loop:
μ_t = E_{q_t}[θ]
φ = E_{q_t}[ψ']   (≈ ψ'(μ_t))
β = E_{q_t}[ψ'']  (≈ ψ''(μ_t))
q_{t+1}(θ) ∝ exp( -φ(θ - μ_t) - (β/2)(θ - μ_t)² )
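A sketch of Algorithm 2 on the same toy target, with the Gaussian expectations E_q[ψ'] and E_q[ψ''] computed by Gauss-Hermite quadrature (the quadrature order and the undamped iteration are my own assumptions):

```python
import numpy as np

dpsi  = lambda t: t**3 + t - 1       # psi'  for the same toy psi as above
d2psi = lambda t: 3 * t**2 + 1       # psi''

nodes, weights = np.polynomial.hermite.hermgauss(40)

def gauss_expect(f, mu, var):
    """E_q[f] for q = N(mu, var), by Gauss-Hermite quadrature."""
    t = mu + np.sqrt(2.0 * var) * nodes
    return np.dot(weights, f(t)) / np.sqrt(np.pi)

mu, var = 0.0, 1.0                                # q_0
for _ in range(100):
    phi  = gauss_expect(dpsi, mu, var)            # E_q[psi']  : smoothed gradient
    beta = gauss_expect(d2psi, mu, var)           # E_q[psi''] : smoothed curvature
    mu, var = mu - phi / beta, 1.0 / beta

# At the fixed point, E_q[psi'] = 0 and var = 1/E_q[psi'']: the Opper-Archambeau conditions.
print(mu, var)
```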
11
Algorithm 2: smoothed gradient descent
12
α-Divergence minimization
If instead of KL, we minimize: D_α(p, q) = ∫ p^{1-α} q^α
Then local minima q* are such that, with the hybrid h* ∝ p^{1-α} (q*)^α:
E_{h*}[ψ'] = 0
E_{h*}[(θ - μ*) ψ'] = 1
13
Algorithm 3: hybrid smoothing GD
Initialize with any Gaussian q_0. Loop:
h_t ∝ p^{1-α} q_t^α
μ_t = E_{h_t}[θ]
φ = E_{h_t}[ψ']   (≈ ψ'(μ_t))
β = var(h_t)^{-1} E_{h_t}[(θ - μ_t) ψ']   (≈ ψ''(μ_t), since E_q[ψ''] = var(q)^{-1} E_q[(θ - μ_q) ψ'] for a Gaussian q)
q_{t+1}(θ) ∝ exp( -φ(θ - μ_t) - (β/2)(θ - μ_t)² )
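A sketch of Algorithm 3 on the same toy target, with the hybrid h_t ∝ p^{1-α} q_t^α handled by brute force on a grid (the choice α = 0.5, the grid, and the lack of damping are my own assumptions; convergence is not guaranteed in general):

```python
import numpy as np

psi  = lambda t: t**4 / 4 + t**2 / 2 - t     # -log p (same toy target)
dpsi = lambda t: t**3 + t - 1
alpha = 0.5                                  # assumed value of alpha
grid = np.linspace(-6, 6, 4001)              # crude 1D quadrature grid

mu, var = 0.0, 1.0                           # q_0
for _ in range(100):
    # hybrid h_t ~ p^(1-alpha) * q_t^alpha, self-normalized on the grid
    log_h = (1 - alpha) * (-psi(grid)) + alpha * (-0.5 * (grid - mu) ** 2 / var)
    w = np.exp(log_h - log_h.max())
    w /= w.sum()
    m = np.dot(w, grid)                               # mu_t = E_h[theta]
    v = np.dot(w, (grid - m) ** 2)                    # var(h_t)
    phi  = np.dot(w, dpsi(grid))                      # E_h[psi']                  ~ psi'(m)
    beta = np.dot(w, (grid - m) * dpsi(grid)) / v     # var(h)^-1 E_h[(t-m) psi']  ~ psi''(m)
    mu, var = m - phi / beta, 1.0 / beta              # q_{t+1} ~ exp(-phi*(t-m) - beta/2*(t-m)^2)

print(mu, var)
```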
14
Interpreting algorithm 3
The only difference (not obvious for the β-term): replacing q_t, a poor approximation of p, by a superior hybrid approximation: h_t ∝ p^{1-α} q_t^α ≈ p.
15
Expectation Propagation
Assume that the target can be factorized: p(θ) ∝ ∏_i f_i(θ).
Then EP seeks a Gaussian approximation q_i for each f_i: q_i(θ) ≈ f_i(θ). They are improved iteratively.
16
Algorithm 4: classic Expectation Propagation
Loop:
Compute the i-th hybrid: h_i ∝ f_i ∏_{j≠i} q_j, and its mean and variance: μ_i = E_{h_i}[θ], v_i = var(h_i).
New i-th approximation: q_i(θ) = exp( -(θ - μ_i)² / (2 v_i) ) / ∏_{j≠i} q_j(θ)
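A sketch of classic EP in 1D on a toy factorized target of my own choosing (a Gaussian prior times two logistic pseudo-observations; none of this setup is from the talk). Hybrid moments are computed by brute force on a grid, and the Gaussian sites are stored as natural parameters:

```python
import numpy as np

# Toy factorized target (an assumption, not from the talk).
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
factors = [lambda t: np.exp(-0.5 * t ** 2),       # f_0: N(0, 1) prior
           lambda t: sigmoid(2.0 * t),            # f_1
           lambda t: sigmoid(0.5 * t - 1.0)]      # f_2

grid = np.linspace(-8, 8, 4001)
n = len(factors)
r = np.zeros(n)    # site precisions     (q_i ~ exp(-0.5*r_i*t^2 + b_i*t))
b = np.zeros(n)    # site linear terms

for _ in range(20):                               # EP sweeps
    for i in range(n):                            # prior first keeps the cavity proper
        r_cav, b_cav = r.sum() - r[i], b.sum() - b[i]     # cavity = prod_{j != i} q_j
        log_h = np.log(factors[i](grid)) - 0.5 * r_cav * grid ** 2 + b_cav * grid
        w = np.exp(log_h - log_h.max())
        w /= w.sum()                              # hybrid h_i, self-normalized on the grid
        m = np.dot(w, grid)                       # mu_i = E_{h_i}[theta]
        v = np.dot(w, (grid - m) ** 2)            # v_i  = var(h_i)
        # new site = N(m, v) / cavity, in natural parameters
        r[i] = 1.0 / v - r_cav
        b[i] = m / v - b_cav

post_var  = 1.0 / r.sum()
post_mean = post_var * b.sum()
print(post_mean, post_var)
```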
17
Algorithm 5: smooth EP
Factorizing p has split the energy landscape: ψ(θ) = Σ_i ψ_i(θ), with ψ_i = -log f_i.
For each component ψ_i(θ), use a different smoothing: h_i ∝ f_i ∏_{j≠i} q_j ≈ p.
Then update q_i ≈ f_i = exp(-ψ_i).
18
Algorithm 5: smooth EP
Initialize with any Gaussians q_1, q_2, …, q_n. Loop over sites i:
h_i ∝ f_i ∏_{j≠i} q_j
μ_i = E_{h_i}[θ]
φ = E_{h_i}[ψ_i']
β = var(h_i)^{-1} E_{h_i}[(θ - μ_i) ψ_i']
q_i(θ) ∝ exp( -φ(θ - μ_i) - (β/2)(θ - μ_i)² )
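A sketch of Algorithm 5 on the same toy factorized target as the classic-EP sketch above; the only change is that each site update now uses the smoothed gradient φ = E_{h_i}[ψ_i'] and curvature β instead of moment matching (again my own toy setup, undamped):

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
# Same toy factors as in the classic-EP sketch, now accessed through psi_i = -log f_i.
factors = [lambda t: np.exp(-0.5 * t ** 2),
           lambda t: sigmoid(2.0 * t),
           lambda t: sigmoid(0.5 * t - 1.0)]
dpsis   = [lambda t: t,                              # psi_0'
           lambda t: -2.0 * sigmoid(-2.0 * t),       # psi_1'
           lambda t: -0.5 * sigmoid(1.0 - 0.5 * t)]  # psi_2'

grid = np.linspace(-8, 8, 4001)
n = len(factors)
r = np.zeros(n)    # site precisions
b = np.zeros(n)    # site linear terms

for _ in range(30):
    for i in range(n):                               # prior first keeps the cavity proper
        r_cav, b_cav = r.sum() - r[i], b.sum() - b[i]
        log_h = np.log(factors[i](grid)) - 0.5 * r_cav * grid ** 2 + b_cav * grid
        w = np.exp(log_h - log_h.max())
        w /= w.sum()                                 # hybrid h_i on the grid
        m = np.dot(w, grid)                          # mu_i = E_{h_i}[theta]
        v = np.dot(w, (grid - m) ** 2)               # var(h_i)
        phi  = np.dot(w, dpsis[i](grid))                   # E_{h_i}[psi_i']
        beta = np.dot(w, (grid - m) * dpsis[i](grid)) / v  # var(h_i)^-1 E_{h_i}[(t-m) psi_i']
        # q_i(t) ~ exp(-phi*(t-m) - beta/2*(t-m)^2), stored as natural parameters
        r[i] = beta
        b[i] = beta * m - phi

post_var  = 1.0 / r.sum()
post_mean = post_var * b.sum()
print(post_mean, post_var)
```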
19
Classic vs Smooth EP
Algorithm 4 (classic): computationally efficient, but completely unintuitive.
Algorithm 5 (smooth): intuitive, since it is linked to Newton's method, and tractable to analyze.
Which should we choose?
20
Conclusion
Algorithm 1: iterating on Gaussians to perform GD.
Algorithm 2: smoothed GD computes the VB approximation.
Algorithm 3: hybrid smoothing computes the D_α approximation.
Algorithm 5: a more complicated hybrid smoothing which computes the EP approximation.
We can re-use our understanding of Newton's method when we think of EP. A possible path towards improved EP algorithms?
21
Conclusion
This might provide a path towards theoretical results on EP:
It intuitively proves the link between EP and VB: the only difference between Algorithms 2 and 5 is the smoothing distribution, q_t or h_i. In the limit where all h_i ≈ q, EP ≈ VB; this corresponds to a large number of weak factors.
I hope that this can help you understand EP better and apply it to your own problems. Thank you very much for your attention.