
Variational Methods for Graphical Models. Michael I. Jordan, Zoubin Ghahramani, Tommi S. Jaakkola, Lawrence K. Saul. Presented by: Afsaneh Shirazi.


Slide 2: Outline
- Motivation
- Inference in graphical models
- Exact inference is intractable
- Variational methodology
  - Sequential approach
  - Block approach
- Conclusions

Slide 3: Motivation (Example: Medical Diagnosis)
[Figure: network with a layer of disease nodes and a layer of symptom nodes]
What is the most probable disease, given the observed symptoms?

Slide 4: Motivation
- We want to answer queries about our data.
- A graphical model is a way to model the joint distribution of the data.
- Exact inference in some graphical models is intractable (NP-hard).
- Variational methods simplify inference in graphical models by using approximations.

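Slide 5: Graphical Models
- Directed (Bayesian network): the joint distribution is the product of each node's distribution given its parents. For the five-node example, $P(S_1, \dots, S_5) = P(S_1)\,P(S_2)\,P(S_3 \mid S_1, S_2)\,P(S_4 \mid S_3)\,P(S_5 \mid S_3, S_4)$.
- Undirected: the joint distribution is a normalized product of potential functions over the cliques $C_1, C_2, C_3$, i.e. $P(S) = \frac{1}{Z}\,\psi(C_1)\,\psi(C_2)\,\psi(C_3)$.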
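To make the directed factorization concrete, here is a minimal Python sketch. It assumes all five variables are binary and uses invented CPT values (nothing numeric comes from the slides). It evaluates the factorization and computes a marginal by summing over all $2^5$ assignments, which is the brute-force cost that becomes prohibitive as the network grows.

```python
from itertools import product

# Hypothetical CPTs for binary S1..S5; each entry is P(S = 1 | parents).
p_s1 = 0.3
p_s2 = 0.6
p_s3 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9}   # P(S3=1 | S1, S2)
p_s4 = {0: 0.2, 1: 0.7}                                        # P(S4=1 | S3)
p_s5 = {(0, 0): 0.05, (0, 1): 0.6, (1, 0): 0.3, (1, 1): 0.95}  # P(S5=1 | S3, S4)

def bern(p_one, value):
    """Probability of a binary value, given P(value = 1)."""
    return p_one if value == 1 else 1.0 - p_one

def joint(s1, s2, s3, s4, s5):
    """P(S1..S5) = P(S1) P(S2) P(S3|S1,S2) P(S4|S3) P(S5|S3,S4)."""
    return (bern(p_s1, s1) * bern(p_s2, s2) * bern(p_s3[(s1, s2)], s3)
            * bern(p_s4[s3], s4) * bern(p_s5[(s3, s4)], s5))

# Brute-force inference: enumerate all 2**5 joint assignments.
states = list(product([0, 1], repeat=5))
total = sum(joint(*s) for s in states)
p_s5_is_1 = sum(joint(*s) for s in states if s[4] == 1)
print(total, p_s5_is_1)
```

The enumeration doubles in size with every added binary variable, which motivates the exact (junction tree) and approximate (variational) methods in the rest of the talk.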

Slide 6: Inference in Graphical Models
- Inference: given a graphical model, the process of computing answers to queries.
- How computationally hard is this problem?
- Theorem: computing $P(X = x)$ exactly in a Bayesian network is NP-hard.

Slide 7: Why Is Exact Inference Intractable?
[Figure: disease-symptom network]
Goal: diagnose the most probable disease.

Slide 8: Why Is Exact Inference Intractable?
[Figure: disease-symptom network with the observed symptoms marked]

Slides 9-10: Why Is Exact Inference Intractable?
[Figure: disease-symptom network; each symptom depends on its parent diseases through a noisy-OR model]
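For reference, the standard noisy-OR parameterization is written out below. The notation is assumed rather than taken from the slides: $q_{ij}$ is the probability that disease $j$ alone produces symptom (finding) $f_i$, $q_{i0}$ is a leak probability, $\pi_i$ is the set of parent diseases of $f_i$, and $d_j \in \{0, 1\}$.

```latex
P(f_i = 0 \mid d) = (1 - q_{i0}) \prod_{j \in \pi_i} (1 - q_{ij})^{d_j}
                  = \exp\Big(-\theta_{i0} - \sum_{j \in \pi_i} \theta_{ij} d_j\Big),
\qquad \theta_{ij} = -\ln(1 - q_{ij})

P(f_i = 1 \mid d) = 1 - \exp\Big(-\theta_{i0} - \sum_{j \in \pi_i} \theta_{ij} d_j\Big)
```

A negative finding is a product of one factor per disease, so it can be absorbed into the disease priors. A positive finding is not. Multiplying $k$ positive findings expands into $2^k$ terms, one for each subset of those findings. This is why exact inference in this model becomes expensive as the number of positive findings grows.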

Slide 11: Why Is Exact Inference Intractable?

Slides 12-13: Why Is Exact Inference Intractable?
[Figure: disease-symptom network with the observed symptoms marked]
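A minimal Python sketch of exact diagnosis by enumeration follows. It assumes a toy model with three diseases and two symptoms, invented parameters, and the standard noisy-OR likelihood $P(f_i = 0 \mid d) = \exp(-\theta_{i0} - \sum_j \theta_{ij} d_j)$. The loop over all $2^n$ disease vectors is exactly the step that does not scale to a realistic diagnostic network.

```python
import math
from itertools import product

# Hypothetical toy model: 3 diseases, 2 symptoms (all numbers invented).
prior = [0.1, 0.05, 0.2]            # P(d_j = 1)
theta0 = [0.01, 0.02]               # leak terms theta_{i0}
theta = [[1.5, 0.0, 0.7],           # theta_{ij} = -ln(1 - q_{ij}); 0 means "not a parent"
         [0.0, 2.0, 0.4]]

def p_symptom(i, f, d):
    """Noisy-OR likelihood P(f_i = f | d)."""
    p_off = math.exp(-theta0[i] - sum(t * dj for t, dj in zip(theta[i], d)))
    return p_off if f == 0 else 1.0 - p_off

def posterior(findings):
    """Exact P(d_j = 1 | findings) by enumerating every disease vector."""
    n = len(prior)
    weights = {}
    for d in product([0, 1], repeat=n):          # 2**n terms: the bottleneck
        w = 1.0
        for j, dj in enumerate(d):
            w *= prior[j] if dj else 1.0 - prior[j]
        for i, f in enumerate(findings):
            w *= p_symptom(i, f, d)
        weights[d] = w
    z = sum(weights.values())                    # P(findings)
    return [sum(w for d, w in weights.items() if d[j] == 1) / z for j in range(n)]

print(posterior([1, 1]))   # both symptoms observed present
```

With hundreds of diseases, as in real diagnostic networks, this enumeration is out of reach, which is what the variational bounds on the following slides avoid.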

Slide 14: Reducing the Computational Complexity
Variational methods:
- Simplify the graph so that exact methods apply
- Approximate the probability distribution
- Exploit convexity

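Slides 15-16: Express a Function Variationally
A concave function $f$ can be written as the minimum of a family of linear upper bounds: $f(x) = \min_\lambda \{\lambda x - f^*(\lambda)\}$, where $f^*(\lambda) = \min_x \{\lambda x - f(x)\}$ is its conjugate function. Fixing the variational parameter $\lambda$ gives the upper bound $f(x) \le \lambda x - f^*(\lambda)$.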
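The logarithm is the usual example of a concave function with a simple variational form, $\ln x = \min_{\lambda > 0}\{\lambda x - \ln\lambda - 1\}$. The function actually shown on these slides is not recoverable, so the logarithm is an assumption here. The Python sketch below checks numerically that every $\lambda$ gives an upper bound and that the bound is tight at $\lambda = 1/x$.

```python
import math

def log_upper_bound(x, lam):
    """lam*x - ln(lam) - 1, an upper bound on ln(x) for any lam > 0."""
    return lam * x - math.log(lam) - 1.0

x = 3.0
for lam in [0.1, 0.2, 1.0 / x, 0.5, 1.0]:
    bound = log_upper_bound(x, lam)
    assert bound >= math.log(x) - 1e-12      # every lam yields an upper bound
    print(f"lam={lam:.3f}  bound={bound:.4f}  ln x={math.log(x):.4f}")
# Minimizing over lam (derivative x - 1/lam = 0) gives lam = 1/x, where the bound is exact.
```

Replacing the nonlinear $\ln x$ by a linear function of $x$ is what makes the transformed model easier to handle; the price is one extra parameter $\lambda$ to optimize.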

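Slide 17: Express a Function Variationally
If the function is neither convex nor concave, transform it into a form that is. Example: the logistic function $g(x) = 1/(1 + e^{-x})$.
1. Transformation: $\ln g(x)$ is concave.
2. Approximation: replace $\ln g(x)$ by its variational (linear) upper bound.
3. Transforming back: exponentiate to obtain a bound on $g(x)$ itself.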
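Carrying out the three steps for the logistic function gives $g(x) \le \exp\{\lambda x - H(\lambda)\}$ for $0 < \lambda < 1$, where $H(\lambda) = -\lambda\ln\lambda - (1-\lambda)\ln(1-\lambda)$ is the binary entropy (the conjugate of $\ln g$). A minimal Python check of that bound follows; the test points are arbitrary.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def binary_entropy(lam):
    """H(lam) = -lam ln lam - (1 - lam) ln(1 - lam), natural log."""
    return -lam * math.log(lam) - (1.0 - lam) * math.log(1.0 - lam)

def logistic_upper_bound(x, lam):
    """Step 2 bounds the concave ln g(x) by the line lam*x - H(lam);
    step 3 exponentiates that line to bound g(x). Valid for 0 < lam < 1."""
    return math.exp(lam * x - binary_entropy(lam))

x = 2.0
for lam in [0.05, 0.3, 0.7, 1.0 - logistic(x)]:
    assert logistic_upper_bound(x, lam) >= logistic(x) - 1e-12
# The bound is tight when lam = 1 - g(x).
print(logistic(x), logistic_upper_bound(x, 1.0 - logistic(x)))
```

The bound is an exponential of a linear function of $x$, which is what lets it combine cleanly with other exponential-family terms in a graphical model.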

Slide 18: Approaches to Variational Methods
- Sequential approach (on-line): nodes are transformed one at a time, in an order determined during the inference process.
- Block approach (off-line): used when the model has obvious substructures.

Slide 19: Sequential Approach (Two Methods)
- Start from the untransformed graph and transform one node at a time until the graph is simple enough for exact methods.
- Start from the completely transformed graph and reintroduce one node at a time, as long as the graph stays simple enough for exact methods.

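Slides 20-21: Sequential Approach (Example)
[Figure: disease-symptom network]
The probability of a positive symptom, $1 - e^{-x}$ (where $x$ is the weighted sum of its active parent diseases plus a leak term), is log concave, so its logarithm admits a variational upper bound.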

Slides 22-24: Sequential Approach (Example)
[Figure: disease-symptom network as positive symptoms (value 1) are transformed one at a time]
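The following Python sketch shows why a transformed symptom can be removed from the graph. It assumes the noisy-OR form $P(f_i = 1 \mid d) = 1 - e^{-x}$ with $x = \theta_{i0} + \sum_j \theta_{ij} d_j$ and uses invented parameters. With $f(x) = \ln(1 - e^{-x})$ and conjugate $f^*(\lambda) = -\lambda\ln\lambda + (\lambda+1)\ln(\lambda+1)$, the upper bound $\exp\{\lambda x - f^*(\lambda)\}$ splits into one factor per disease, so the parents are no longer coupled.

```python
import math
from itertools import product

def conj(lam):
    """Conjugate f*(lam) of the concave function f(x) = ln(1 - exp(-x))."""
    return -lam * math.log(lam) + (lam + 1.0) * math.log(lam + 1.0)

# Hypothetical positive finding with three parent diseases (numbers invented).
theta0 = 0.05
theta = [1.2, 0.4, 2.0]

def exact(d):
    """Noisy-OR probability P(f = 1 | d)."""
    return 1.0 - math.exp(-theta0 - sum(t * dj for t, dj in zip(theta, d)))

def factorized_bound(d, lam):
    """exp(lam*x - f*(lam)) written as a constant times one factor per disease."""
    value = math.exp(lam * theta0 - conj(lam))
    for t, dj in zip(theta, d):
        value *= math.exp(lam * t) ** dj   # depends on d_j only: parents decouple
    return value

for d in product([0, 1], repeat=3):
    for lam in [0.1, 0.5, 1.0, 3.0]:
        assert factorized_bound(d, lam) >= exact(d) - 1e-12
```

Because each factor looks like an extra term in a disease's prior, the transformed symptom node can be dropped from the graph. The bound is tight for a given $x$ when $\lambda = 1/(e^{x} - 1)$; in practice $\lambda$ is optimized to tighten the bound on the overall probability of interest.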

Slide 25: Sequential Approach (Upper Bound and Lower Bound)
We need both a lower bound and an upper bound; together they bracket the probability of interest.

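Slide 26: How to Compute a Lower Bound for a Concave Function?
For a concave function $f$, Jensen's inequality gives the lower bound $f\big(\sum_j q_j z_j\big) \ge \sum_j q_j f(z_j)$. Here the variational parameter $q$ is a probability distribution ($q_j \ge 0$, $\sum_j q_j = 1$).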
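Applied to the noisy-OR symptom, Jensen's inequality with $z_j = \theta_0 + \theta_j d_j / q_j$ gives $f(\theta_0 + \sum_j \theta_j d_j) \ge \sum_j q_j f(\theta_0 + \theta_j d_j / q_j)$ for $f(x) = \ln(1 - e^{-x})$. The Python sketch below checks this numerically. The parameters are invented, and $\theta_0 > 0$ is assumed so that $f(\theta_0)$ is finite.

```python
import math
from itertools import product

def f(x):
    """Concave function f(x) = ln(1 - exp(-x)), defined for x > 0."""
    return math.log(1.0 - math.exp(-x))

theta0 = 0.05                  # hypothetical leak term (must be > 0 here)
theta = [1.2, 0.4, 2.0]        # hypothetical theta_j for three parents

def jensen_lower_bound(d, q):
    """sum_j q_j f(theta0 + theta_j d_j / q_j), a lower bound on f(theta0 + sum_j theta_j d_j).

    q is the variational parameter: a probability distribution over the parents.
    Since d_j is 0 or 1, the bound is linear in each d_j, so its exponential
    factorizes over diseases.
    """
    assert abs(sum(q) - 1.0) < 1e-12 and all(qj > 0 for qj in q)
    return sum(qj * f(theta0 + t * dj / qj) for qj, t, dj in zip(q, theta, d))

for d in product([0, 1], repeat=3):
    for q in ([1 / 3, 1 / 3, 1 / 3], [0.6, 0.1, 0.3], [0.2, 0.2, 0.6]):
        exact = f(theta0 + sum(t * dj for t, dj in zip(theta, d)))
        assert jensen_lower_bound(d, q) <= exact + 1e-12
```

When every disease is present, choosing $q_j \propto \theta_j$ makes all $z_j$ equal, and the bound becomes exact; for other configurations the choice of $q$ trades off tightness across cases.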

Slide 27: Block Approach (Overview)
Off-line application of the sequential approach:
- Identify some substructure amenable to exact inference.
- Define a family of probability distributions on it by introducing variational parameters.
- Choose the best approximation in the family based on the evidence.

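Slide 28: Block Approach (Details)
- KL divergence: $D(Q \,\|\, P) = \sum_{\{H\}} Q(H \mid E, \lambda) \ln \frac{Q(H \mid E, \lambda)}{P(H \mid E)}$
- Family of approximating distributions: $Q(H \mid E, \lambda)$, indexed by the variational parameters $\lambda$
- Choose $\lambda$ to minimize the KL divergence.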
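A small Python illustration of choosing the best member of a family by minimizing $D(Q \,\|\, P)$ follows. The posterior over two coupled binary hidden variables and the factorized family $Q(h_1, h_2) = Q_1(h_1)\,Q_2(h_2)$ are assumptions made up for the example. The grid search stands in for whatever optimizer would be used in practice.

```python
import math
from itertools import product

def kl(q, p):
    """D(Q || P) = sum_h Q(h) ln(Q(h) / P(h)); terms with Q(h) = 0 contribute 0."""
    return sum(qh * math.log(qh / p[h]) for h, qh in q.items() if qh > 0)

# Hypothetical posterior P(H | E) over two coupled binary hidden variables.
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def factorized(mu1, mu2):
    """Member of the tractable family Q(h1, h2) = Q1(h1) Q2(h2)."""
    return {(a, b): (mu1 if a else 1 - mu1) * (mu2 if b else 1 - mu2)
            for a, b in product([0, 1], repeat=2)}

# Crude search over the variational parameters (mu1, mu2).
grid = [i / 100 for i in range(1, 100)]
best = min((kl(factorized(m1, m2), p), m1, m2) for m1 in grid for m2 in grid)
print(best)   # (smallest KL found, mu1, mu2)
```

Because a factorized $Q$ cannot represent the correlation in $P$, the minimum divergence stays strictly positive; it measures how much the tractable family gives up.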

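Slide 29: Block Approach (Example: Boltzmann Machine)
[Figure: Boltzmann machine with units $S_i$ and $S_j$ connected by symmetric weights $\theta_{ij}$]
$P(S \mid \theta) = \frac{1}{Z} \exp\big(\sum_{i<j} \theta_{ij} S_i S_j + \sum_i \theta_{i0} S_i\big)$, with $S_i \in \{0, 1\}$.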

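Slide 30: Block Approach (Example: Boltzmann Machine)
Observing $S_j = 1$ turns each term $\theta_{ij} S_i S_j$ into $\theta_{ij} S_i$, so the evidence is absorbed into the biases of $S_j$'s neighbours ($\theta_{i0} \leftarrow \theta_{i0} + \theta_{ij}$), leaving a Boltzmann machine over the remaining units.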

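Slide 31: Block Approach (Example: Boltzmann Machine)
[Figure: the approximating model, in which the units $s_i$ and $s_j$ are disconnected]
$Q(H \mid E, \mu) = \prod_{i \in H} \mu_i^{S_i} (1 - \mu_i)^{1 - S_i}$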

Slide 32: Block Approach (Example: Boltzmann Machine)
Choose the variational parameters $\mu_i$ to minimize the KL divergence $D(Q \,\|\, P)$.

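Slide 33: Block Approach (Example: Boltzmann Machine)
Minimizing the KL divergence yields the mean field equations $\mu_i = \sigma\big(\sum_j \theta_{ij} \mu_j + \theta_{i0}\big)$, where $\sigma(z) = 1/(1 + e^{-z})$; solve them for a fixed point.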
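Here is a minimal Python sketch of solving the mean field equations by fixed-point iteration, with exact marginals computed by enumeration for comparison. The 3-unit machine is hypothetical, and units are updated one at a time (sequential updates). Evidence would be handled beforehand by folding clamped units into the neighbours' biases.

```python
import math
from itertools import product

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mean_field(theta, bias, iters=200):
    """Iterate mu_i <- sigmoid(sum_j theta_ij mu_j + theta_i0) towards a fixed point.

    theta: symmetric coupling matrix with zero diagonal; bias: the theta_i0 terms.
    """
    n = len(bias)
    mu = [0.5] * n
    for _ in range(iters):
        for i in range(n):                       # sequential (one unit at a time) updates
            mu[i] = sigmoid(sum(theta[i][j] * mu[j] for j in range(n)) + bias[i])
    return mu

def exact_marginals(theta, bias):
    """P(S_i = 1) by summing exp(sum_{i<j} theta_ij s_i s_j + sum_i theta_i0 s_i) over all states."""
    n = len(bias)
    weights = {}
    for s in product([0, 1], repeat=n):
        e = sum(bias[i] * s[i] for i in range(n))
        e += sum(theta[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
        weights[s] = math.exp(e)
    z = sum(weights.values())
    return [sum(w for s, w in weights.items() if s[i]) / z for i in range(n)]

# Hypothetical 3-unit machine (numbers invented).
theta = [[0.0, 0.5, -0.3],
         [0.5, 0.0, 0.8],
         [-0.3, 0.8, 0.0]]
bias = [0.1, -0.2, 0.3]
print(mean_field(theta, bias))
print(exact_marginals(theta, bias))
```

Each update costs time linear in the number of neighbours, whereas the exact computation enumerates $2^n$ states. With zero couplings the two agree exactly; with nonzero couplings mean field is an approximation.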

Slide 34: Conclusions
- The time or space complexity of exact calculation is often unacceptable.
- Complex graphs can be probabilistically simple.
- Inference in the simplified models provides bounds on probabilities in the original model.


Slide 36: Extra Slides

Slide 37: Concerns
- Approximation accuracy
  - Strong dependencies can be identified
  - Not based on a convexity transformation
- No assurance that the framework will transfer to other examples
- Developing a variational approximation for a new architecture is not straightforward

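Slide 38: Justification for KL Divergence
Minimizing the KL divergence gives the best lower bound on the probability of the evidence:
$\ln P(E) \ge \sum_{\{H\}} Q(H \mid E) \ln P(H, E) - \sum_{\{H\}} Q(H \mid E) \ln Q(H \mid E)$,
and the gap between the two sides is exactly $D\big(Q(H \mid E) \,\|\, P(H \mid E)\big)$.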

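Slide 39: EM
Maximum likelihood parameter estimation. The following function is a lower bound on the log likelihood:
$\mathcal{L}(Q, \theta) = \sum_{\{H\}} Q(H \mid E) \ln P(H, E \mid \theta) - \sum_{\{H\}} Q(H \mid E) \ln Q(H \mid E)$
The gap $\ln P(E \mid \theta) - \mathcal{L}(Q, \theta)$ is the KL divergence between $Q(H \mid E)$ and $P(H \mid E, \theta)$.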
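The decomposition $\ln P(E \mid \theta) = \mathcal{L}(Q, \theta) + D\big(Q(H \mid E) \,\|\, P(H \mid E, \theta)\big)$ can be checked numerically. The minimal Python sketch below uses a hypothetical hidden variable with three states and invented joint probabilities for the observed evidence.

```python
import math

# Hypothetical values of P(H = h, E = e_obs | theta) for a 3-state hidden variable.
joint_he = [0.10, 0.05, 0.15]
p_e = sum(joint_he)                          # P(E = e_obs | theta)
posterior = [x / p_e for x in joint_he]      # P(H | E, theta)

def lower_bound(q):
    """L(Q) = sum_h Q(h) ln P(h, e) - sum_h Q(h) ln Q(h)."""
    return sum(qh * (math.log(pj) - math.log(qh)) for qh, pj in zip(q, joint_he) if qh > 0)

def kl(q, p):
    return sum(qh * math.log(qh / ph) for qh, ph in zip(q, p) if qh > 0)

q = [0.5, 0.3, 0.2]                          # an arbitrary Q(H | E)
# ln P(E) = L(Q) + KL(Q || P(H | E)), so L(Q) never exceeds ln P(E) ...
assert abs(math.log(p_e) - (lower_bound(q) + kl(q, posterior))) < 1e-12
# ... and the bound is tight when Q is the exact posterior (the traditional E step).
assert abs(math.log(p_e) - lower_bound(posterior)) < 1e-12
```

Since the KL term is never negative, any restricted $Q$ still yields a valid lower bound; this is what justifies using a variational family in place of the exact posterior.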

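Slide 40: EM
1. Maximize the bound with respect to $Q$ (E step).
2. Fix $Q$ and maximize the bound with respect to $\theta$ (M step).
- Traditional EM: step 1 sets $Q(H \mid E) = P(H \mid E, \theta)$ exactly, which makes the bound tight.
- Approximation to the EM algorithm: restrict $Q$ to a tractable variational family, so step 1 maximizes the bound only within that family.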

Slide 41: Principle of Inference
DAG → Junction tree → (Initialization) → Inconsistent junction tree → (Propagation) → Consistent junction tree → (Marginalization) → marginal probabilities

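Slide 42: Example: Create a Junction Tree
HMM with 2 time steps: $X_1 \to X_2$, $X_1 \to Y_1$, $X_2 \to Y_2$.
Junction tree: clusters {X1, X2}, {X1, Y1}, {X2, Y2}, with sepset {X1} between {X1, Y1} and {X1, X2}, and sepset {X2} between {X1, X2} and {X2, Y2}.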

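Slide 43: Example: Initialization
All cluster and sepset potentials start at 1. Each variable's conditional distribution is multiplied into one associated cluster:
- $X_1$: cluster {X1, Y1}, $\phi_{X_1 Y_1} \leftarrow \phi_{X_1 Y_1}\, P(X_1)$
- $Y_1$: cluster {X1, Y1}, $\phi_{X_1 Y_1} \leftarrow \phi_{X_1 Y_1}\, P(Y_1 \mid X_1)$
- $X_2$: cluster {X1, X2}, $\phi_{X_1 X_2} \leftarrow \phi_{X_1 X_2}\, P(X_2 \mid X_1)$
- $Y_2$: cluster {X2, Y2}, $\phi_{X_2 Y_2} \leftarrow \phi_{X_2 Y_2}\, P(Y_2 \mid X_2)$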

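Slide 44: Example: Collect Evidence
Choose an arbitrary clique, e.g. {X1, X2}, into which all potential functions will be collected. Recursively call neighboring cliques for messages:
1. Call {X1, Y1}:
   1. Projection: $\phi_{X_1}^{\text{old}} = \phi_{X_1}$, $\phi_{X_1} = \sum_{Y_1} \phi_{X_1 Y_1}$
   2. Absorption: $\phi_{X_1 X_2} = \phi_{X_1 X_2}\, \phi_{X_1} / \phi_{X_1}^{\text{old}}$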

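Slide 45: Example: Collect Evidence (cont.)
2. Call {X2, Y2}:
   1. Projection: $\phi_{X_2}^{\text{old}} = \phi_{X_2}$, $\phi_{X_2} = \sum_{Y_2} \phi_{X_2 Y_2}$
   2. Absorption: $\phi_{X_1 X_2} = \phi_{X_1 X_2}\, \phi_{X_2} / \phi_{X_2}^{\text{old}}$
[Figure: junction tree {X1, Y1} - [X1] - {X1, X2} - [X2] - {X2, Y2}]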

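Slide 46: Example: Distribute Evidence
Pass messages recursively to neighboring nodes. Pass the message from {X1, X2} to {X1, Y1}:
1. Projection: $\phi_{X_1}^{\text{old}} = \phi_{X_1}$, $\phi_{X_1} = \sum_{X_2} \phi_{X_1 X_2}$
2. Absorption: $\phi_{X_1 Y_1} = \phi_{X_1 Y_1}\, \phi_{X_1} / \phi_{X_1}^{\text{old}}$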

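Slide 47: Example: Distribute Evidence (cont.)
Pass the message from {X1, X2} to {X2, Y2}:
1. Projection: $\phi_{X_2}^{\text{old}} = \phi_{X_2}$, $\phi_{X_2} = \sum_{X_1} \phi_{X_1 X_2}$
2. Absorption: $\phi_{X_2 Y_2} = \phi_{X_2 Y_2}\, \phi_{X_2} / \phi_{X_2}^{\text{old}}$
[Figure: junction tree {X1, Y1} - [X1] - {X1, X2} - [X2] - {X2, Y2}]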

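Slide 48: Example: Inference with Evidence
Assume we want to compute $P(X_2 \mid Y_1 = 0, Y_2 = 1)$ (state estimation). During initialization, also multiply likelihoods into the potential functions: $\phi_{X_1 Y_1} \leftarrow \phi_{X_1 Y_1}\, \Lambda_{Y_1}$ and $\phi_{X_2 Y_2} \leftarrow \phi_{X_2 Y_2}\, \Lambda_{Y_2}$, where $\Lambda_{Y_1}(y)$ is 1 for $y = 0$ and 0 otherwise, and $\Lambda_{Y_2}(y)$ is 1 for $y = 1$ and 0 otherwise.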

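Slide 49: Example: Inference with Evidence (cont.)
Repeating the same steps as in the previous case yields a consistent junction tree whose potentials are joint probabilities with the evidence, e.g. $\phi_{X_1 X_2} = P(X_1, X_2, Y_1 = 0, Y_2 = 1)$. Normalizing $\sum_{X_1} \phi_{X_1 X_2}$ gives $P(X_2 \mid Y_1 = 0, Y_2 = 1)$.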
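The whole example (initialization with likelihoods, collect, distribute, marginalization) fits in a short Python sketch. The CPT values are invented, the variables are binary, and both steps share one emission table; these are assumptions for illustration. The result is checked against brute-force summation of the joint.

```python
from itertools import product

# Hypothetical CPTs for the 2-step HMM X1 -> X2, X1 -> Y1, X2 -> Y2 (numbers invented).
p_x1 = [0.6, 0.4]                          # P(X1)
p_x2 = [[0.7, 0.3], [0.2, 0.8]]            # P(X2 | X1) as p_x2[x1][x2]
p_y = [[0.9, 0.1], [0.3, 0.7]]             # P(Y | X) as p_y[x][y], used at both steps

def junction_tree_query(y1_obs, y2_obs):
    """P(X2 | Y1=y1_obs, Y2=y2_obs) on {X1,Y1} -[X1]- {X1,X2} -[X2]- {X2,Y2}."""
    lik1 = [1.0 if y == y1_obs else 0.0 for y in (0, 1)]   # evidence likelihoods
    lik2 = [1.0 if y == y2_obs else 0.0 for y in (0, 1)]
    # Initialization: each CPT (and likelihood) goes into one cluster; sepsets start at 1.
    c_x1y1 = [[p_x1[a] * p_y[a][y] * lik1[y] for y in (0, 1)] for a in (0, 1)]
    c_x1x2 = [[p_x2[a][b] for b in (0, 1)] for a in (0, 1)]
    c_x2y2 = [[p_y[b][y] * lik2[y] for y in (0, 1)] for b in (0, 1)]
    s_x1, s_x2 = [1.0, 1.0], [1.0, 1.0]

    # Collect evidence into the root {X1,X2}: projection, then absorption.
    new = [sum(c_x1y1[a]) for a in (0, 1)]
    c_x1x2 = [[c_x1x2[a][b] * new[a] / s_x1[a] for b in (0, 1)] for a in (0, 1)]
    s_x1 = new
    new = [sum(c_x2y2[b]) for b in (0, 1)]
    c_x1x2 = [[c_x1x2[a][b] * new[b] / s_x2[b] for b in (0, 1)] for a in (0, 1)]
    s_x2 = new

    # Distribute evidence from the root back to the leaves.
    new = [sum(c_x1x2[a]) for a in (0, 1)]
    c_x1y1 = [[c_x1y1[a][y] * new[a] / s_x1[a] for y in (0, 1)] for a in (0, 1)]
    s_x1 = new
    new = [c_x1x2[0][b] + c_x1x2[1][b] for b in (0, 1)]
    c_x2y2 = [[c_x2y2[b][y] * new[b] / s_x2[b] for y in (0, 1)] for b in (0, 1)]
    s_x2 = new

    # Marginalization: any cluster containing X2 now gives P(X2, evidence).
    p_x2_e = [c_x2y2[b][0] + c_x2y2[b][1] for b in (0, 1)]
    z = sum(p_x2_e)
    return [v / z for v in p_x2_e]

def brute_force(y1_obs, y2_obs):
    """Same query by summing the full joint over X1."""
    w = [sum(p_x1[a] * p_y[a][y1_obs] * p_x2[a][b] * p_y[b][y2_obs] for a in (0, 1))
         for b in (0, 1)]
    z = sum(w)
    return [v / z for v in w]

for y1, y2 in product((0, 1), repeat=2):
    assert all(abs(u - v) < 1e-12
               for u, v in zip(junction_tree_query(y1, y2), brute_force(y1, y2)))
print(junction_tree_query(0, 1))   # P(X2 | Y1 = 0, Y2 = 1)
```

The divisions are safe here because every CPT entry is positive; with zero entries, the usual convention $0/0 = 0$ would be needed in the absorption step.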

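Slide 50: Variable Elimination
General idea: write the query in the form
$P(X_1, e) = \sum_{x_n} \cdots \sum_{x_3} \sum_{x_2} \prod_i P(x_i \mid pa_i)$
Then iteratively:
- Move all irrelevant terms outside of the innermost sum.
- Perform the innermost sum, obtaining a new term.
- Insert the new term into the product.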
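A minimal Python sketch of variable elimination over table factors follows. The representation (a factor as a tuple of variable names plus a dict from binary assignments to values) and the chain example A → B → C with invented CPTs are assumptions for illustration. For each eliminated variable, only the factors that mention it are multiplied; the others stay outside the sum.

```python
from itertools import product

def multiply(f, g):
    """Pointwise product of two factors; a factor is (vars, table) over binary vars."""
    fv, ft = f
    gv, gt = g
    vars_ = tuple(dict.fromkeys(fv + gv))            # ordered union of the scopes
    table = {}
    for assign in product([0, 1], repeat=len(vars_)):
        a = dict(zip(vars_, assign))
        table[assign] = ft[tuple(a[v] for v in fv)] * gt[tuple(a[v] for v in gv)]
    return vars_, table

def sum_out(f, var):
    """Sum one variable out of a factor."""
    fv, ft = f
    keep = tuple(v for v in fv if v != var)
    table = {}
    for assign, val in ft.items():
        key = tuple(x for v, x in zip(fv, assign) if v != var)
        table[key] = table.get(key, 0.0) + val
    return keep, table

def eliminate(factors, order):
    """For each variable: multiply the factors that mention it, sum it out,
    and insert the new factor back into the product."""
    factors = list(factors)
    for var in order:
        related = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]   # irrelevant terms stay outside
        prod = related[0]
        for f in related[1:]:
            prod = multiply(prod, f)
        factors.append(sum_out(prod, var))
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result

# Hypothetical chain A -> B -> C; query P(C) by eliminating A, then B.
pa = (("A",), {(0,): 0.6, (1,): 0.4})
pb = (("A", "B"), {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8})
pc = (("B", "C"), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.4, (1, 1): 0.6})
vars_, table = eliminate([pa, pb, pc], ["A", "B"])
print(vars_, table)
```

On a chain every intermediate factor has at most two variables, so the cost is linear in the chain length instead of exponential.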

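Slide 51: Complexity of Variable Elimination
Suppose that in one elimination step we compute
$f'_x(y_1, \dots, y_k) = \sum_x f_x(x, y_1, \dots, y_k)$, where $f_x(x, y_1, \dots, y_k) = \prod_{i=1}^{m} f_i(x, \mathbf{y}_i)$ and each $\mathbf{y}_i \subseteq \{y_1, \dots, y_k\}$.
This requires about $m \cdot |Val(X)| \cdot \prod_i |Val(Y_i)|$ multiplications and about $|Val(X)| \cdot \prod_i |Val(Y_i)|$ additions.
The complexity is therefore exponential in the number of variables in the intermediate factor.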

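Slide 52: Chordal Graphs
- Elimination ordering ↔ undirected chordal graph (the original graph plus the fill-in edges added during elimination).
- Maximal cliques of this graph are factors in elimination, and factors in elimination are cliques in the graph.
- The complexity is exponential in the size of the largest clique in the graph.
[Figure: the network over V, S, T, L, A, B, X, D and the chordal graph induced by an elimination ordering]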

Slide 53: Induced Width
- The size of the largest clique in the induced graph is thus an indicator of the complexity of variable elimination.
- This quantity is called the induced width of the graph under the specified ordering.
- Finding a good ordering amounts to finding an ordering that minimizes the induced width. The minimum over all orderings is determined by the graph's treewidth, which is NP-hard to compute in general.
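The induced width for a given ordering can be computed by simulating elimination on the undirected graph, as in the minimal Python sketch below. The example graph is an assumption: the moral graph of the standard "Asia" network (V→T, S→L, S→B, T→A, L→A, A→X, A→D, B→D, plus marriage edges T–L and A–B), which is presumed to match the figure's node names. The two orderings are arbitrary.

```python
def induced_clique_size(edges, order):
    """Largest clique created when eliminating the vertices of an undirected graph in `order`.

    Eliminating v connects all of v's not-yet-eliminated neighbours (fill-in edges);
    v together with those neighbours is a clique of the induced graph.
    (Many texts report the induced width as this size minus one.)
    """
    adj = {v: set() for v in order}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    largest = 0
    for v in order:
        nbrs = adj[v]
        largest = max(largest, len(nbrs) + 1)
        for a in nbrs:                  # fill-in: make the remaining neighbours a clique
            adj[a] |= nbrs - {a}
            adj[a].discard(v)           # remove v from the remaining graph
        del adj[v]
    return largest

# Assumed moral graph of the Asia network (directed edges plus marriage edges T-L, A-B).
asia = [("V", "T"), ("S", "L"), ("S", "B"), ("T", "A"), ("L", "A"),
        ("A", "X"), ("A", "D"), ("B", "D"), ("T", "L"), ("A", "B")]
print(induced_clique_size(asia, ["V", "X", "D", "T", "S", "L", "A", "B"]))
print(induced_clique_size(asia, ["A", "S", "T", "L", "B", "D", "X", "V"]))
```

Trying several orderings and keeping the one with the smallest result is a simple heuristic search; finding the optimum exactly is hard in general.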

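Slide 54: Properties of Junction Trees
In every consistent junction tree:
- For each cluster (or sepset) $X$, $\phi_X = P(X)$.
- The probability distribution of any variable $V$ can be computed from any cluster (or sepset) $X$ that contains $V$: $P(V) = \sum_{X \setminus \{V\}} \phi_X$.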

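Slide 55: Exact Inference Using Junction Trees
- A junction tree is an undirected tree in which each node is a cluster (a set of variables).
- Running intersection property: given two clusters $X$ and $Y$, all clusters on the path between $X$ and $Y$ contain $X \cap Y$.
- Separator sets (sepsets): the intersection of adjacent clusters.
[Figure: clusters ABD, ADE, DEF joined by sepsets AD and DE, i.e. ABD - [AD] - ADE - [DE] - DEF]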

Slide 56: Constructing Junction Trees: Marrying Parents
[Figure: DAG over X1, ..., X6, with an edge added between the parents of each node]

Slide 57: Moral Graph
[Figure: moral graph over X1, ..., X6 (parents married, edge directions dropped)]

Slide 58: Triangulation
[Figure: triangulated moral graph over X1, ..., X6]

Slide 59: Identify Cliques
The cliques of the triangulated graph are {X1, X2, X3}, {X2, X3, X5}, {X2, X5, X6}, and {X2, X4}.

Slide 60: Junction Tree
A junction tree is a subgraph of the clique graph that satisfies the running intersection property.
[Figure: clique graph and junction tree {X1, X2, X3} - [X2, X3] - {X2, X3, X5} - [X2, X5] - {X2, X5, X6}, with {X2, X4} attached through sepset [X2]]
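One standard way to pick the subgraph is a maximum-weight spanning tree of the clique graph, weighting each edge by the size of its sepset. This is a swapped-in technique; the slides do not say how the tree was chosen. The Python sketch below applies Kruskal's algorithm to the four cliques above and checks the running intersection property: for each variable, the clusters containing it must form a connected subtree.

```python
from itertools import combinations

def junction_tree(cliques):
    """Kruskal's algorithm on the clique graph, preferring edges with larger sepsets."""
    parent = list(range(len(cliques)))

    def find(i):                        # union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    candidates = sorted(combinations(range(len(cliques)), 2),
                        key=lambda e: len(cliques[e[0]] & cliques[e[1]]), reverse=True)
    tree = []
    for i, j in candidates:
        ri, rj = find(i), find(j)
        if ri != rj and cliques[i] & cliques[j]:
            parent[ri] = rj
            tree.append((i, j, cliques[i] & cliques[j]))   # edge labelled with its sepset
    return tree

def has_running_intersection(cliques, tree):
    """For each variable, the tree edges among clusters containing it must connect them.
    A sub-forest of a tree is connected iff it has (#nodes - 1) edges."""
    for v in set().union(*cliques):
        holders = [k for k, c in enumerate(cliques) if v in c]
        inner = [e for e in tree if v in cliques[e[0]] and v in cliques[e[1]]]
        if len(inner) != len(holders) - 1:
            return False
    return True

cliques = [{"X1", "X2", "X3"}, {"X2", "X3", "X5"}, {"X2", "X5", "X6"}, {"X2", "X4"}]
tree = junction_tree(cliques)
for i, j, sep in tree:
    print(sorted(cliques[i]), sorted(sep), sorted(cliques[j]))
print(has_running_intersection(cliques, tree))
```

The cluster {X2, X4} shares only X2 with every other cluster, so several attachments are equally valid; the tie is broken here by the order in which candidate edges are listed.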

Slide 61: Constructing Junction Trees
DAG → Moral graph → Triangulated graph → Identify cliques → Junction tree

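Slide 62: Sequential Approach (Example)
Lower bound for the medical diagnosis example. With $P(f_i = 1 \mid d) = 1 - \exp(-\theta_{i0} - \sum_j \theta_{ij} d_j)$, $f(x) = \ln(1 - e^{-x})$, and variational parameters $q_{j|i} > 0$ with $\sum_j q_{j|i} = 1$:
$P(f_i = 1 \mid d) \ge \exp\Big\{\sum_j q_{j|i} \Big[d_j\, f\Big(\theta_{i0} + \frac{\theta_{ij}}{q_{j|i}}\Big) + (1 - d_j)\, f(\theta_{i0})\Big]\Big\}$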
