1
Expectation Propagation for Graphical Models Yuan (Alan) Qi Joint work with Tom Minka
2
Motivation Graphical models are widely used in real-world applications, such as wireless communications and bioinformatics. Inference techniques on graphical models often sacrifice efficiency for accuracy, or accuracy for efficiency. We need a new method that better balances the trade-off between accuracy and efficiency.
3
Motivation (Figure: accuracy vs. efficiency plane, contrasting current techniques with what we want.)
4
Outline Background Expectation Propagation (EP) on dynamic systems –Poisson tracking –Signal detection for wireless communications Tree-structured EP on loopy graphs Conclusions and future work
5
Outline Background Expectation Propagation (EP) on dynamic systems –Poisson tracking –Signal detection for wireless communications Tree-structured EP on loopy graphs Conclusions
6
Graphical Models

                              Directed                        Undirected
Generative                    Bayesian networks               Boltzmann machines
Conditional (Discriminative)  Maximum entropy Markov models   Conditional random fields

(Each cell is illustrated with a small example graph over x1, x2, y1, y2.)
7
Inference on Graphical Models
Bayesian inference techniques:
– Belief propagation (BP): Kalman filtering/smoothing, forward-backward algorithm
– Monte Carlo: particle filters/smoothers, MCMC
Loopy BP: typically efficient, but not accurate.
Monte Carlo: accurate, but often not efficient.
8
Efficiency vs. Accuracy (Figure: accuracy vs. efficiency plane with BP, MC, and "EP?" marked.)
9
Expectation Propagation in a Nutshell Approximate a probability distribution by a product of simpler parametric terms: p(x) = ∏_a f_a(x) is approximated by q(x) = ∏_a f̃_a(x). Each approximation term f̃_a lives in an exponential family (e.g. Gaussian).
10
Update Term Approximation Iterate the fixed-point equation by moment matching: set q_new(x) = proj[ f_a(x) q_{-a}(x) ], i.e. match the moments of the tilted distribution, and update f̃_a(x) ∝ q_new(x) / q_{-a}(x), where the leave-one-out approximation is q_{-a}(x) ∝ q(x) / f̃_a(x).
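As a concrete illustration, here is a minimal one-dimensional EP sketch, assuming a Gaussian prior, a few heavy-tailed "clutter"-style factors, and brute-force quadrature on a grid for the moment matching; the factors, data values, and grid are illustrative assumptions, not the models used in this talk.

```python
import numpy as np

def gauss(x, m, v):
    return np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2 * np.pi * v)

# Each observation y_i is explained either by x plus unit-variance noise or by
# broad background "clutter" -- a classic non-Gaussian test factor (assumed data).
ys = np.array([0.5, 1.2, -3.0, 0.9])
def factor(x, y, w=0.2):
    return (1 - w) * gauss(y, x, 1.0) + w * gauss(y, 0.0, 10.0)

grid = np.linspace(-10, 10, 4001)              # quadrature grid for moment matching
prior_nat = np.array([0.0, 1.0 / 100.0])       # natural params (mean*prec, prec) of N(0, 100)
site_nat = np.zeros((len(ys), 2))              # approximate terms, initialized flat

for sweep in range(20):                        # fixed-point iteration
    for i, y in enumerate(ys):
        q_nat = prior_nat + site_nat.sum(axis=0)
        cav_nat = q_nat - site_nat[i]          # leave-one-out (cavity) distribution
        if cav_nat[1] <= 0:                    # skip update if the cavity is improper
            continue
        cav_m, cav_v = cav_nat[0] / cav_nat[1], 1.0 / cav_nat[1]
        tilted = factor(grid, y) * gauss(grid, cav_m, cav_v)
        tilted /= np.trapz(tilted, grid)
        m = np.trapz(grid * tilted, grid)                    # matched mean
        v = np.trapz((grid - m) ** 2 * tilted, grid)         # matched variance
        site_nat[i] = np.array([m / v, 1.0 / v]) - cav_nat   # new term = q_new / cavity

q_nat = prior_nat + site_nat.sum(axis=0)
print("EP posterior mean, variance:", q_nat[0] / q_nat[1], 1.0 / q_nat[1])
```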
11
Outline Background Expectation Propagation (EP) on dynamic systems –Poisson tracking –Signal detection for wireless communications Tree-structured EP on loopy graphs Conclusions
12
EP on Dynamic Systems

                              Directed                        Undirected
Generative                    Bayesian networks               Boltzmann machines
Conditional (Discriminative)  Maximum entropy Markov models   Conditional random fields
13
Object Tracking Guess the position of an object given noisy observations. (Figure: noisy observations scattered around the object's true position.)
14
Bayesian Network (random walk) e.g. we want the distribution of the x's given the y's. (Figure: a chain x_1 → x_2 → … → x_T of hidden states, each with an observation y_1, y_2, …, y_T.)
15
Approximation Factorized and Gaussian in x: p(x | y) is approximated (up to proportionality) by q(x) ∝ ∏_t q(x_t), with each q(x_t) Gaussian.
16
Message Interpretation The belief at x_t = (forward msg)(observation msg)(backward msg). (Figure: node x_t with observation y_t, receiving a forward message, a backward message, and an observation message.)
17
EP on Dynamic Systems
Filtering (t = 1, …, T): incorporate the forward message; initialize the observation message.
Smoothing (t = T, …, 1): incorporate the backward message; compute the leave-one-out approximation by dividing out the old observation messages; re-approximate the new observation messages.
Re-filtering (t = 1, …, T): incorporate the forward and observation messages.
18
Extension of EP Instead of matching moments, use any method for approximate filtering, e.g. the extended Kalman filter, statistical linearization, or the unscented filter. All of these methods can be interpreted as finding linear/Gaussian approximations to the original terms.
19
Example: Poisson Tracking y_t is an integer-valued Poisson variate with mean exp(x_t).
20
Poisson Tracking Model The hidden state x_t follows a Gaussian random walk (Gaussian prior on the initial state, Gaussian increments), and each observation y_t is Poisson with mean exp(x_t).
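A minimal simulation of this kind of model, with assumed (hypothetical) noise levels rather than the values used in the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100
process_std = 0.1                                # assumed random-walk step size
x = np.zeros(T)
x[0] = rng.normal(0.0, 1.0)                      # assumed Gaussian prior on the initial state
for t in range(1, T):
    x[t] = x[t - 1] + rng.normal(0.0, process_std)
y = rng.poisson(np.exp(x))                       # integer-valued observations, mean exp(x_t)
print(y[:10])
```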
21
Approximate Observation Message The observation message is not Gaussian, and the moments of x_t are not analytic. Two approaches: Gauss-Hermite quadrature for the moments, or statistical linearization instead of moment matching. Both work well.
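A sketch of the first approach, Gauss-Hermite moment matching of a Poisson observation message against an assumed leave-one-out Gaussian N(m, v); the count y and cavity parameters below are arbitrary examples, not the authors' implementation.

```python
import numpy as np
from scipy.stats import poisson   # only used for the Poisson pmf

def match_moments(y, cav_m, cav_v, n_points=30):
    """Project p(y|x) * N(x; cav_m, cav_v) onto a Gaussian by matching moments."""
    t, w = np.polynomial.hermite.hermgauss(n_points)
    x = cav_m + np.sqrt(2.0 * cav_v) * t          # quadrature nodes under the cavity Gaussian
    lik = poisson.pmf(y, np.exp(x))               # Poisson(y; mean = exp(x))
    z = np.sum(w * lik) / np.sqrt(np.pi)          # normalizer of the tilted distribution
    m = np.sum(w * lik * x) / np.sqrt(np.pi) / z  # matched mean
    v = np.sum(w * lik * x ** 2) / np.sqrt(np.pi) / z - m ** 2   # matched variance
    return m, v, z

# Example call with an arbitrary count and cavity (illustrative values only).
print(match_moments(y=3, cav_m=0.5, cav_v=1.0))
```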
22
EP Accuracy Improves Significantly in only a few Iterations
23
Approximate vs. Exact Posterior
24
EP vs. Monte Carlo: Accuracy (Figure: panels comparing the estimated mean and variance.)
25
Accuracy/Efficiency Tradeoff
26
EP for Digital Wireless Communication Signal detection problem. The transmitted signal s_t has an amplitude and phase that are varied to encode each symbol; in complex representation, s_t is a point in the Re/Im plane.
27
Binary Symbols, Gaussian Noise Symbols are 1 and –1 (in the complex plane). Received signal y_t = s_t + Gaussian noise. Optimal detection is easy: pick the symbol closest to y_t.
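A toy illustration of why detection is easy in this setting (my own sketch, not taken from the slides): with symbols ±1 and additive complex Gaussian noise, the optimal detector simply takes the sign of the real part of y_t.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 1000
symbols = rng.choice([1.0, -1.0], size=T)                    # transmitted binary symbols
noise = rng.normal(0, 0.5, T) + 1j * rng.normal(0, 0.5, T)   # complex Gaussian noise (assumed level)
y = symbols + noise                                          # received signal, no fading yet
detected = np.where(y.real >= 0, 1.0, -1.0)                  # nearest-symbol (optimal) detection
print("symbol error rate:", np.mean(detected != symbols))
```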
28
Fading Channel The channel systematically changes the amplitude and phase of the signal: y_t = x_t s_t + noise, where the channel coefficient x_t changes over time.
29
Benchmark: Differential Detection A classical technique: use the previous observation to estimate the state. Binary symbols only.
30
Bayesian network for Signal Detection (Figure: a chain of channel states x_1, …, x_T; each observation y_t depends on its channel state x_t and the transmitted symbol s_t.)
31
On-line EP Joint Signal Detection and Channel Estimation Iterate EP over a sliding window of the most recent observations; earlier observations act as a prior for the current estimate.
32
Computational Complexity
Expectation propagation: O(nLd²)
Stochastic mixture of Kalman filters: O(LMd²)
Rao-Blackwellised particle smoothers: O(LMNd²)
where n = number of EP iterations (typically 4 or 5), d = dimension of the parameter vector, L = smoothing window length, M = number of samples in filtering, N = number of samples in smoothing.
33
Experimental Results EP outperforms particle smoothers in efficiency with comparable accuracy. (Chen, Wang, Liu 2000)
34
Bayesian Networks for Adaptive Decoding (Figure: a chain of channel states x_1, …, x_T, observations y_1, …, y_T, and information bits e_1, …, e_T.) The information bits e_t are coded by a convolutional error-correcting encoder.
35
EP Outperforms Viterbi Decoding
36
Outline Background Expectation Propagation (EP) on dynamic systems –Poisson tracking –Signal detection for wireless communications Tree-structured EP on loopy graphs Conclusions
37
EP on Boltzmann Machines

                              Directed                        Undirected
Generative                    Bayesian networks               Boltzmann machines
Conditional (Discriminative)  Maximum entropy Markov models   Conditional random fields
38
Inference on Grids Problem: estimate the marginal distributions of the variables indexed by the nodes of a loopy graph, e.g. p(x_i), i = 1, …, 16. (Figure: a 4×4 grid of nodes x_1, …, x_16.)
39
Boltzmann Machines The joint distribution is a product of pairwise potentials, p(x) ∝ ∏_a f_a(x), with each f_a depending on a pair of variables. We want to approximate p by a simpler distribution q.
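A tiny sketch of such a model with assumed weights: binary variables x_i ∈ {−1, +1}, a joint proportional to a product of pairwise potentials exp(w_ij x_i x_j), and exact marginals by brute-force enumeration (feasible only for small graphs).

```python
import itertools
import numpy as np

n = 4
edges = {(0, 1): 0.8, (1, 2): -0.5, (2, 3): 0.3, (3, 0): 0.6}   # assumed pairwise weights

def unnorm(x):
    """Unnormalized joint: product of pairwise potentials exp(w_ij * x_i * x_j)."""
    return np.exp(sum(w * x[i] * x[j] for (i, j), w in edges.items()))

states = list(itertools.product([-1, 1], repeat=n))
probs = np.array([unnorm(s) for s in states])
probs /= probs.sum()                                            # normalize the joint

# Exact marginal probability that each variable equals +1.
marg = [sum(p for s, p in zip(states, probs) if s[i] == 1) for i in range(n)]
print(marg)
```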
40
BP vs. EP (Figure: side-by-side comparison of the BP and EP approximation schemes.)
41
Junction Tree Representation (Figure: the original distribution p(x), the tree-structured approximation q(x), and its junction tree.)
42
Approximating an Edge by a Tree Each potential f_a in p is projected onto the tree structure of q. Correlations are not lost, but projected onto the tree.
43
Moment Matching Match the single-node and pairwise marginals of the tilted distribution and of the tree approximation. This reduces to exact inference on single loops: use cutset conditioning (conditioning on a node breaks the loop into a chain that can be handled exactly; see the sketch below).
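A small sketch of cutset conditioning on a single loop, using an assumed 4-cycle of binary variables; conditioning on the cut node x_0 breaks the loop, and here the remaining chain is summed out by direct enumeration (for longer chains one would run forward-backward instead).

```python
import itertools
import numpy as np

edges = {(0, 1): 0.9, (1, 2): -0.4, (2, 3): 0.7, (3, 0): 0.2}   # assumed pairwise weights

def pair_pot(i, j, xi, xj):
    return np.exp(edges[(i, j)] * xi * xj)

marginal_x0 = {}
for x0 in (-1, 1):                          # condition on the cutset {x_0}
    total = 0.0
    for x1, x2, x3 in itertools.product([-1, 1], repeat=3):     # sum out the chain x_1..x_3
        total += (pair_pot(0, 1, x0, x1) * pair_pot(1, 2, x1, x2)
                  * pair_pot(2, 3, x2, x3) * pair_pot(3, 0, x3, x0))
    marginal_x0[x0] = total                 # unnormalized p(x_0)

z = sum(marginal_x0.values())
print({k: v / z for k, v in marginal_x0.items()})
```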
44
Local Propagation
Original EP: globally propagate evidence to the whole tree. Problem: computationally expensive.
Exploit the junction tree representation: only locally propagate evidence within the minimal subtree that is directly connected to the off-tree edge. This reduces computational complexity and saves memory.
45
(Figure: junction trees over cliques such as {x_1, x_2}, {x_1, x_3}, {x_1, x_4}, {x_3, x_4}, {x_3, x_5}, {x_3, x_6}, {x_5, x_7}; global propagation updates the whole tree, while local propagation updates only the subtree connected to the off-tree edge.)
46
4-node Graph TreeEP = the proposed method, BP = loopy belief propagation, GBP = generalized belief propagation on triangles, MF = mean-field, TreeVB = variational tree.
47
Fully-connected graphs Results are averaged over 10 graphs with randomly generated potentials. TreeEP performs as well as or better than all other methods in both accuracy and efficiency.
48
8x8 grids, 10 trials

Method            FLOPS         Error
Exact             30,000        0
TreeEP            300,000       0.149
BP/double-loop    15,500,000    0.358
GBP               17,500,000    0.003
49
TreeEP versus BP and GBP
TreeEP is always more accurate than BP and is often faster.
TreeEP is much more efficient than GBP and more accurate on some problems.
TreeEP converges more often than BP and GBP.
50
Outline Background Expectation Propagation (EP) on dynamic systems –Poisson tracking –Signal detection for wireless communications Tree-structured EP on loopy graphs Conclusions
51
EP algorithms outperform state-of-the-art inference methods on graphical models in the trade-off between accuracy and efficiency. (Figure: accuracy vs. efficiency plane, with EP in the desirable region.)
52
Future Work
EP is applicable to a wide range of applications.
EP is sensitive to the choice of approximation:
– How to choose an approximation family (e.g. tree structure)?
– More flexible approximations: mixtures of EP?
– Error bounds?
53
Future Work

                              Directed                        Undirected
Generative                    Bayesian networks               Boltzmann machines
Conditional (Discriminative)  Maximum entropy Markov models   Conditional random fields
54
End
55
EP versus BP
The EP approximation is in a restricted family, e.g. Gaussian.
The EP approximation does not have to be factorized.
EP applies to many more problems, e.g. mixtures of discrete and continuous variables.
56
EP versus Monte Carlo
Monte Carlo is general but expensive.
EP exploits the underlying simplicity of the problem, if it exists.
Monte Carlo is still needed for complex problems (e.g. large isolated peaks).
The trick is to know what kind of problem you have.
57
(Loopy) Belief Propagation Specialize EP to fully factorized approximations, q(x) = ∏_i q_i(x_i). Minimizing the KL-divergence then means matching the marginals of the tilted distribution (partially factorized) and of q (fully factorized); the resulting updates are the familiar BP "messages".
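A minimal loopy BP (sum-product) sketch for a pairwise MRF over binary variables on a 4-cycle, with assumed unary and pairwise potentials; this is the fully factorized special case referred to above, not the tree-structured EP of this talk.

```python
import numpy as np

n = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
psi = {e: np.array([[1.5, 0.5], [0.5, 1.5]]) for e in edges}      # pairwise potentials (assumed)
psi.update({(j, i): psi[(i, j)].T for (i, j) in edges})           # add reversed directions
phi = np.array([[2.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]])  # unary potentials (assumed)
neighbors = {i: sorted({j for (a, j) in psi if a == i}) for i in range(n)}

# msg[(i, j)] is the message from node i to node j, a length-2 vector over x_j.
msg = {(i, j): np.ones(2) for (i, j) in psi}
for _ in range(50):                                  # iterate to (hopeful) convergence
    new = {}
    for (i, j) in msg:
        incoming = phi[i].copy()
        for k in neighbors[i]:
            if k != j:
                incoming *= msg[(k, i)]
        m = psi[(i, j)].T @ incoming                 # sum over x_i
        new[(i, j)] = m / m.sum()                    # normalize for numerical stability
    msg = new

# Approximate single-node marginals (beliefs).
beliefs = []
for i in range(n):
    b = phi[i].copy()
    for k in neighbors[i]:
        b *= msg[(k, i)]
    beliefs.append(b / b.sum())
print(np.array(beliefs))
```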
58
Limitation of BP If the dynamics or measurements are not linear and Gaussian, the complexity of the posterior increases with the number of measurements, i.e. the BP equations are not "closed": the beliefs need not stay within a given family (Gaussian, or any other exponential family).
59
Approximate filtering Compute a Gaussian belief that approximates the true posterior p(x_t | y_{1:t}), e.g. the extended Kalman filter, statistical linearization, the unscented filter, or the assumed-density filter.
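As one example of this family, here is a generic unscented-transform sketch: it approximates the mean and covariance of g(x) for x ~ N(m, P) using deterministically chosen sigma points (textbook form with an assumed nonlinearity, not tied to this talk's models).

```python
import numpy as np

def unscented_moments(g, m, P, kappa=1.0):
    """Approximate E[g(x)] and Cov[g(x)] for x ~ N(m, P) via sigma points."""
    n = len(m)
    L = np.linalg.cholesky((n + kappa) * P)
    sigma = [m] + [m + L[:, i] for i in range(n)] + [m - L[:, i] for i in range(n)]
    w = np.array([kappa / (n + kappa)] + [0.5 / (n + kappa)] * (2 * n))
    gx = np.array([g(s) for s in sigma])
    mean = w @ gx
    cov = sum(wi * np.outer(gi - mean, gi - mean) for wi, gi in zip(w, gx))
    return mean, cov

# Example with an assumed nonlinear measurement g(x) = [sin(x_0), x_0 * x_1].
g = lambda x: np.array([np.sin(x[0]), x[0] * x[1]])
print(unscented_moments(g, m=np.array([0.3, 1.0]), P=np.eye(2) * 0.1))
```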
60
EP perspective Approximate filtering is equivalent to replacing the true measurement/dynamics equations with linear/Gaussian equations, so that a Gaussian belief at one step implies a Gaussian belief at the next.
61
EP perspective EKF, UKF, and ADF are all algorithms for the same projection: nonlinear, non-Gaussian terms are replaced by linear, Gaussian ones.
62
Terminology
Filtering: p(x_t | y_{1:t})
Smoothing: p(x_t | y_{1:t+L}), where L > 0
On-line: old data is discarded (fixed memory)
Off-line: old data is re-used (unbounded memory)
63
Kalman filtering / Belief propagation
Prediction: p(x_t | y_{1:t-1}) = ∫ p(x_t | x_{t-1}) p(x_{t-1} | y_{1:t-1}) dx_{t-1}
Measurement: p(x_t | y_{1:t}) ∝ p(y_t | x_t) p(x_t | y_{1:t-1})
Smoothing: p(x_t | y_{1:T}) = p(x_t | y_{1:t}) ∫ p(x_{t+1} | x_t) p(x_{t+1} | y_{1:T}) / p(x_{t+1} | y_{1:t}) dx_{t+1}
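A minimal 1D Kalman filter and RTS smoother sketch for a random-walk model with assumed noise variances, just to illustrate the prediction / measurement / smoothing recursions above (it is not the Poisson or fading-channel model of this talk):

```python
import numpy as np

rng = np.random.default_rng(2)
T, q, r = 50, 0.1, 1.0                         # assumed process / observation noise variances
x = np.cumsum(rng.normal(0.0, np.sqrt(q), T))  # latent random walk
y = x + rng.normal(0.0, np.sqrt(r), T)         # noisy observations

fm, fv = np.zeros(T), np.zeros(T)              # filtered means and variances
for t in range(T):
    # Prediction: p(x_t | y_{1:t-1})
    pm, pv = (0.0, 100.0) if t == 0 else (fm[t - 1], fv[t - 1] + q)
    # Measurement update: p(x_t | y_{1:t})
    k = pv / (pv + r)                          # Kalman gain
    fm[t], fv[t] = pm + k * (y[t] - pm), (1 - k) * pv

# RTS smoothing (backward pass): p(x_t | y_{1:T})
sm, sv = fm.copy(), fv.copy()
for t in range(T - 2, -1, -1):
    g = fv[t] / (fv[t] + q)                    # smoother gain
    sm[t] = fm[t] + g * (sm[t + 1] - fm[t])
    sv[t] = fv[t] + g ** 2 * (sv[t + 1] - (fv[t] + q))

print("filter RMSE:", np.sqrt(np.mean((fm - x) ** 2)),
      "smoother RMSE:", np.sqrt(np.mean((sm - x) ** 2)))
```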