1 Variational Methods for Graphical Models
Michael I. Jordan, Zoubin Ghahramani, Tommi S. Jaakkola, Lawrence K. Saul
Presented by: Afsaneh Shirazi
2 Outline
Motivation
Inference in graphical models
Exact inference is intractable
Variational methodology
–Sequential approach
–Block approach
Conclusions
3 Motivation (Example: Medical Diagnosis)
[Figure: bipartite network with diseases on top connected to symptoms below.]
What is the most probable disease?
4 Motivation
We want to answer queries about our data
A graphical model is a way to model data
Inference in some graphical models is intractable (NP-hard)
Variational methods simplify inference in graphical models by using approximation
5 Graphical Models
Directed (Bayesian network): nodes S1, ..., S5 with local conditionals P(S1), P(S2), P(S3|S1,S2), P(S4|S3), P(S5|S3,S4)
Undirected: the same nodes grouped into cliques (C1), (C2), (C3), with one potential per clique
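The factorizations behind this slide can be written out explicitly; a minimal reconstruction in LaTeX, using the conditionals listed above and generic clique potentials ψ for the undirected case:

```latex
% Directed: the joint is the product of the listed conditionals
P(S_1,\dots,S_5) = P(S_1)\,P(S_2)\,P(S_3 \mid S_1,S_2)\,P(S_4 \mid S_3)\,P(S_5 \mid S_3,S_4)

% Undirected: the joint is a normalized product of clique potentials
P(S_1,\dots,S_5) = \frac{1}{Z}\,\psi_{C_1}(S_{C_1})\,\psi_{C_2}(S_{C_2})\,\psi_{C_3}(S_{C_3})
```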
6 Inference in Graphical Models
Inference: given a graphical model, the process of computing answers to queries
How computationally hard is this decision problem?
Theorem: computing P(X = x) in a Bayesian network is NP-hard
7 Why Is Exact Inference Intractable?
[Figure: diseases-and-symptoms network.]
Diagnose the most probable disease
8 Why Is Exact Inference Intractable?
[Figure: diseases-and-symptoms network; shaded nodes are observed symptoms.]
9 Why Is Exact Inference Intractable?
[Figure: the noisy-OR model for a single symptom given its parent diseases.]
10 Why Is Exact Inference Intractable?
[Figure: the noisy-OR model, continued.]
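The noisy-OR equations on these two slides did not survive extraction. A sketch of the standard parameterization used in the paper's QMR-DT discussion, where d is the binary disease vector and q_ij is the probability that disease j alone activates finding f_i:

```latex
P(f_i = 0 \mid d) = (1 - q_{i0}) \prod_{j \in \mathrm{pa}(i)} (1 - q_{ij})^{d_j}
                  = e^{-\theta_{i0} - \sum_j \theta_{ij} d_j},
\qquad \theta_{ij} = -\ln(1 - q_{ij})
```

Negative findings keep this factorized exponential form and are benign; positive findings contribute one minus this exponential, which couples the diseases and makes exact inference exponential in the number of positive findings.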
11 Why Is Exact Inference Intractable?
12 Why Is Exact Inference Intractable?
[Figure: diseases-and-symptoms network; shaded nodes are observed symptoms.]
13 Why Is Exact Inference Intractable?
[Figure: diseases-and-symptoms network; shaded nodes are observed symptoms.]
14 Reducing the Computational Complexity: Variational Methods
Simplify the graph so that exact methods apply
Approximate the probability distribution
Exploit convexity
15 Express a Function Variationally
The function of interest is concave
16 Express a Function Variationally
The function of interest is concave
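The concave function on these two slides is presumably the paper's running example, the logarithm; under that assumption, its variational (conjugate dual) representation is:

```latex
\ln x = \min_{\lambda > 0} \{ \lambda x - \ln\lambda - 1 \},
\qquad\text{so}\qquad
\ln x \le \lambda x - \ln\lambda - 1 \ \text{for every } \lambda > 0
```

Each value of the variational parameter λ gives a linear upper bound that touches the function at x = 1/λ.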
17 Express a Function Variationally
If the function is neither convex nor concave, transform it to a desired form
Example: the logistic function
–Transformation
–Approximation
–Transforming back
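For the logistic example, the paper works with the log of the function, which is concave; a sketch of the resulting variational upper bound, with H(λ) the binary entropy:

```latex
g(x) = \frac{1}{1 + e^{-x}}, \qquad
g(x) \le e^{\lambda x - H(\lambda)}, \qquad
H(\lambda) = -\lambda\ln\lambda - (1-\lambda)\ln(1-\lambda)
```

Transforming back (exponentiating the linear bound on ln g) restores a bound on g itself.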
18 Approaches to Variational Methods
Sequential approach (on-line): nodes are transformed one at a time, in an order determined during the inference process
Block approach (off-line): used when the model has obvious substructures
19 Sequential Approach (Two Methods)
Untransformed graph → transform one node at a time → simple graph for exact methods
Completely transformed graph → reintroduce one node at a time → simple graph for exact methods
20 Sequential Approach (Example)
[Figure: diseases-and-symptoms network; annotation: log concave.]
21 Sequential Approach (Example)
[Figure: diseases-and-symptoms network; annotation: log concave, continued.]
22 Sequential Approach (Example)
[Figure: diseases-and-symptoms network, transformation step.]
23 Sequential Approach (Example)
[Figure: diseases-and-symptoms network, transformation step, continued.]
24 Sequential Approach (Example)
[Figure: diseases-and-symptoms network, transformation step, continued.]
25 Sequential Approach (Upper Bound and Lower Bound)
We need both a lower bound and an upper bound on the probability of interest
26 How to Compute a Lower Bound for a Concave Function?
Lower bound for concave functions (via Jensen's inequality, sketched below)
The variational parameter is a probability distribution
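In the form used for the QMR-DT likelihood, the bound comes from rewriting a log of a sum as a log of an expectation under a distribution q and pulling the log inside:

```latex
\ln \sum_i a_i \;=\; \ln \sum_i q_i \frac{a_i}{q_i} \;\ge\; \sum_i q_i \ln\frac{a_i}{q_i},
\qquad q_i \ge 0,\ \sum_i q_i = 1
```

The bound is tight when q_i is proportional to a_i, which is why optimizing the variational distribution recovers the exact value.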
27 Block Approach (Overview)
Off-line application of the sequential approach
–Identify some structure amenable to exact inference
–Define a family of probability distributions via the introduction of variational parameters
–Choose the best approximation based on the evidence
28 Block Approach (Details)
Introduce a parameterized family of approximating distributions Q
Minimize the KL divergence between Q and the target posterior (definition below)
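With Q(H|E, λ) the approximating family and P(H|E) the target posterior, the objective on this slide is presumably the standard one:

```latex
\mathrm{KL}\big(Q \,\|\, P\big) = \sum_{H} Q(H \mid E)\,\ln\frac{Q(H \mid E)}{P(H \mid E)},
\qquad
\lambda^\ast = \arg\min_{\lambda} \mathrm{KL}\big(Q(\cdot \mid E, \lambda)\,\|\,P(\cdot \mid E)\big)
```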
29 Block Approach (Example: Boltzmann Machine)
[Figure: Boltzmann machine with binary units S_i, S_j and pairwise connections.]
30 Block Approach (Example: Boltzmann Machine)
[Figure: the same Boltzmann machine with S_j clamped to 1.]
31 Block Approach (Example: Boltzmann Machine)
[Figure: mean-field approximating network over units s_i, s_j.]
32 Block Approach (Example: Boltzmann Machine)
[Figure: approximating network over units s_i, s_j.]
Minimize the KL divergence
33 Block Approach (Example: Boltzmann Machine)
[Figure: approximating network over units s_i, s_j.]
Minimize the KL divergence
Mean field equations: solve for a fixed point (a code sketch follows)
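A minimal runnable sketch of the mean-field fixed point for a Boltzmann machine, assuming the usual parameterization with pairwise weights θ_ij and biases θ_i0, so the updates are μ_i ← σ(Σ_j θ_ij μ_j + θ_i0); the function name and the toy weights are illustrative, not from the slides:

```python
import numpy as np

def mean_field_boltzmann(theta, theta0, n_iter=200, tol=1e-8):
    """Iterate the mean-field equations mu_i = sigmoid(theta[i] @ mu + theta0[i]).

    mu_i approximates the posterior marginal P(S_i = 1); theta must have a
    zero diagonal (no self-coupling).
    """
    mu = np.full(len(theta0), 0.5)        # uninformative starting point
    for _ in range(n_iter):
        mu_prev = mu.copy()
        for i in range(len(mu)):          # asynchronous (in-place) updates
            a = theta[i] @ mu + theta0[i]
            mu[i] = 1.0 / (1.0 + np.exp(-a))
        if np.max(np.abs(mu - mu_prev)) < tol:
            break
    return mu

# Toy example: two units with a positive coupling and negative biases.
theta = np.array([[0.0, 1.0],
                  [1.0, 0.0]])
theta0 = np.array([-0.5, -0.5])
print(mean_field_boltzmann(theta, theta0))
```

Because the updates only ever touch the current means μ, the exponential sum over joint states never appears; that is the source of the speedup.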
34 Conclusions
The time or space complexity of exact calculation is often unacceptable
Complex graphs can be probabilistically simple
Inference in simplified models provides bounds on probabilities in the original model
36 Extra Slides
37 Concerns
Approximation accuracy
Strong dependencies can be identified
Not based on a convexity transformation
Not able to assure that the framework will transfer to other examples
Not straightforward to develop a variational approximation for new architectures
38 Justification for KL Divergence
Minimizing the KL divergence gives the best lower bound on the probability of the evidence
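Spelled out: for any distribution Q over the hidden variables H,

```latex
\ln P(E) \;\ge\; \sum_H Q(H)\,\ln\frac{P(H, E)}{Q(H)}
\;=\; \ln P(E) - \mathrm{KL}\big(Q(H)\,\|\,P(H \mid E)\big)
```

so minimizing the KL divergence is exactly the choice of Q that makes the lower bound on the evidence as tight as possible, with equality when Q(H) = P(H|E).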
39 EM
Maximum likelihood parameter estimation: the following function is a lower bound on the log likelihood
The gap in the bound is the KL divergence between Q(H|E) and P(H|E, θ)
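The bound referred to here, with the parameter lost in extraction read as θ:

```latex
\mathcal{L}(Q, \theta) = \sum_H Q(H \mid E)\,\ln\frac{P(H, E \mid \theta)}{Q(H \mid E)}
= \ln P(E \mid \theta) - \mathrm{KL}\big(Q(H \mid E)\,\|\,P(H \mid E, \theta)\big)
```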
40 EM
1. Maximize the bound with respect to Q
2. Fix Q, maximize with respect to θ
With the exact posterior as Q this is traditional EM; with a restricted Q it is an approximation to the EM algorithm
41 Principle of Inference
DAG → junction tree →(initialization)→ inconsistent junction tree →(propagation)→ consistent junction tree →(marginalization)→ marginals
42 Example: Create Join Tree
HMM with 2 time steps: hidden states X1, X2 and observations Y1, Y2
Junction tree: (X1,Y1) -[X1]- (X1,X2) -[X2]- (X2,Y2)
43 Example: Initialization
Each variable's conditional is multiplied into the potential of its associated cluster (one standard choice is sketched below):
Variable X1 → cluster (X1,Y1)
Variable Y1 → cluster (X1,Y1)
Variable X2 → cluster (X1,X2)
Variable Y2 → cluster (X2,Y2)
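The potential-function column of the slide's table did not survive; the standard initialization for this HMM, offered here as an assumption, is:

```latex
\phi_{X_1,Y_1} = P(X_1)\,P(Y_1 \mid X_1), \qquad
\phi_{X_1,X_2} = P(X_2 \mid X_1), \qquad
\phi_{X_2,Y_2} = P(Y_2 \mid X_2)
```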
44 Example: Collect Evidence
Choose an arbitrary clique, e.g. (X1,X2), where all potential functions will be collected
Call neighboring cliques recursively for messages:
1. Call (X1,Y1):
–1. Projection
–2. Absorption
(The update formulas are sketched below.)
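The projection and absorption formulas were images; the standard (HUGIN-style) updates they presumably showed, for a message from clique C through sepset S into clique D:

```latex
\text{Projection: } \phi_S^{\mathrm{new}} = \sum_{C \setminus S} \phi_C,
\qquad
\text{Absorption: } \phi_D \leftarrow \phi_D\,\frac{\phi_S^{\mathrm{new}}}{\phi_S}
```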
45 Example: Collect Evidence (cont.)
2. Call (X2,Y2):
–1. Projection
–2. Absorption
[Figure: junction tree (X1,Y1) -[X1]- (X1,X2) -[X2]- (X2,Y2).]
46 Example: Distribute Evidence
Pass messages recursively to neighboring nodes
Pass message from (X1,X2) to (X1,Y1):
–1. Projection
–2. Absorption
47 Example: Distribute Evidence (cont.)
Pass message from (X1,X2) to (X2,Y2):
–1. Projection
–2. Absorption
[Figure: junction tree (X1,Y1) -[X1]- (X1,X2) -[X2]- (X2,Y2).]
48 Example: Inference with Evidence
Assume we want to compute P(X2 | Y1=0, Y2=1) (state estimation)
Assign likelihoods to the potential functions during initialization (one standard way is sketched below)
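A sketch of evidence entry, offered as an assumption about what the lost equations showed: multiply each observation's cluster potential by an indicator of the observed value,

```latex
\phi_{X_1,Y_1} \leftarrow \phi_{X_1,Y_1}\,\delta(Y_1 = 0),
\qquad
\phi_{X_2,Y_2} \leftarrow \phi_{X_2,Y_2}\,\delta(Y_2 = 1)
```

after which propagation and normalization yield P(X2 | Y1=0, Y2=1).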
49 Example: Inference with Evidence (cont.)
Repeating the same steps as in the previous case, we obtain the desired conditional P(X2 | Y1=0, Y2=1)
50 Variable Elimination
General idea: write the query as nested sums over a product of factors (a worked instance is sketched below)
Iteratively:
–Move all irrelevant terms outside of the innermost sum
–Perform the innermost sum, getting a new term
–Insert the new term into the product
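A worked instance of the rearrangement on a hypothetical chain A → B → C → D (not the slides' own example):

```latex
P(D) = \sum_{C}\sum_{B}\sum_{A} P(A)\,P(B \mid A)\,P(C \mid B)\,P(D \mid C)
     = \sum_{C} P(D \mid C)\sum_{B} P(C \mid B)\underbrace{\sum_{A} P(A)\,P(B \mid A)}_{\tau_1(B)}
```

Each innermost sum produces a new factor over fewer variables (here τ_1(B)), which is then inserted back into the product.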
51 Complexity of Variable Elimination
Suppose in one elimination step we compute f_X(y_1, …, y_k) = Σ_x ∏_{i=1}^{m} f_i(x, y_{c_i})
This requires m · |Val(X)| · ∏_i |Val(Y_i)| multiplications and |Val(X)| · ∏_i |Val(Y_i)| additions
Complexity is exponential in the number of variables in the intermediate factor
52 Chordal Graphs
An elimination ordering induces an undirected chordal graph
Maximal cliques in the graph are factors in elimination; factors in elimination are cliques in the graph
Complexity is exponential in the size of the largest clique in the graph
[Figure: the eight-node example over V, S, T, L, A, B, X, D, shown as a DAG and as its chordal graph.]
53 Induced Width
The size of the largest clique in the induced graph is thus an indicator of the complexity of variable elimination
This quantity is called the induced width of the graph with respect to the specified ordering
Finding a good ordering for a graph is equivalent to finding the minimal induced width of the graph
54 Properties of Junction Trees
In every junction tree:
–For each cluster (or sepset) C, the potential equals the marginal: φ(C) = P(C)
–The probability distribution of any variable X can be computed from any cluster (or sepset) that contains X, by marginalizing out the other variables
55 Exact Inference Using Junction Trees
An undirected tree in which each node is a cluster
Running intersection property:
–Given two clusters C1 and C2, all clusters on the path between C1 and C2 contain C1 ∩ C2
Separator sets (sepsets):
–Intersection of adjacent clusters
[Example: clusters ABD, ADE, DEF; ABD and ADE share sepset AD, ADE and DEF share sepset DE.]
56 Constructing Junction Trees: Marrying Parents
[Figure: DAG over X1, ..., X6; each node's parents are connected ("married").]
57 Moral Graph
[Figure: the resulting undirected moral graph over X1, ..., X6.]
58 Triangulation
[Figure: the moral graph with fill-in edges added so that every cycle of length four or more has a chord.]
59 Identify Cliques
[Figure: triangulated graph over X1, ..., X6 with maximal cliques (X1,X2,X3), (X2,X3,X5), (X2,X5,X6), and (X2,X4).]
60 Junction Tree
A junction tree is a subgraph of the clique graph satisfying the running intersection property
[Figure: cliques (X1,X2,X3), (X2,X3,X5), (X2,X5,X6), (X2,X4) connected through sepsets (X2,X3), (X2,X5), and (X2).]
61 Constructing Junction Trees
DAG → moral graph → triangulated graph → identify cliques → junction tree
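A minimal sketch of the first three stages in Python with networkx; the DAG's edges are an assumption (the slides' figure is not fully recoverable), and `moralize` is a hypothetical helper name:

```python
import itertools
import networkx as nx

def moralize(dag: nx.DiGraph) -> nx.Graph:
    """Moral graph: marry all parents of each node, then drop edge directions."""
    g = dag.to_undirected()
    for node in dag.nodes:
        for u, v in itertools.combinations(list(dag.predecessors(node)), 2):
            g.add_edge(u, v)
    return g

# Hypothetical DAG over X1..X6, standing in for the slides' figure.
dag = nx.DiGraph([("X1", "X2"), ("X1", "X3"), ("X2", "X4"),
                  ("X3", "X5"), ("X2", "X6"), ("X5", "X6")])

moral = moralize(dag)
chordal, _order = nx.complete_to_chordal_graph(moral)  # triangulation
print(sorted(map(sorted, nx.find_cliques(chordal))))   # maximal cliques
```

Building the tree itself then amounts to connecting the cliques through maximum-weight sepsets (weight = sepset size), which guarantees the running intersection property.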
62 Sequential Approach (Example)
Lower bound for the medical diagnosis example