SURVEY: Foundations of Bayesian Networks


SURVEY: Foundations of Bayesian Networks O, Jangmin 2002/10/29 Last modified 2002/10/29 Copyright (c) 2002 by SNU CSE Biointelligence Lab.

Contents: From DAG to Junction Tree; From Elimination Tree to Junction Tree; Junction Tree Algorithms; Learning Bayesian Networks.

Typical Example of DAG: (figure) a simple DAG on the six nodes A, B, C, D, F, G, used as the running example throughout.

1. Topological Sort. Algorithm 4.1 [Topological sort]: Begin with all vertices unnumbered and set counter i = 1. While any vertices remain: select any vertex that has no parents; number the selected vertex as i; delete the numbered vertex and all its adjacent edges from the graph; increment i by 1. Objective: obtaining a well-ordering, i.e., a numbering in which the predecessors of any node v have lower numbers than v.
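Below is a minimal Python sketch of Algorithm 4.1, not code from the survey; the edge set of the example DAG is an assumption, chosen only to be consistent with the numbering shown on the following slides.

# Topological sort of a DAG given as a dict: node -> set of parents.
# Numbers the nodes 1, 2, ... so that every node's parents get smaller numbers.
def topological_sort(parents):
    parents = {v: set(ps) for v, ps in parents.items()}     # working copy
    number, i = {}, 1
    while parents:
        v = next(u for u, ps in parents.items() if not ps)  # a vertex with no parents
        number[v] = i
        del parents[v]
        for ps in parents.values():      # delete the numbered vertex and its edges
            ps.discard(v)
        i += 1
    return number

# Hypothetical edge set for the running example (an assumption).
dag = {'A': set(), 'B': {'A'}, 'C': {'A', 'B'}, 'D': {'A', 'B'},
       'F': {'B', 'C'}, 'G': {'F'}}
print(topological_sort(dag))   # {'A': 1, 'B': 2, 'C': 3, 'D': 4, 'F': 5, 'G': 6}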

1. Topological Sort (1)-(6): (figures) applying the algorithm to the example DAG numbers the nodes step by step: A = 1, B = 2, C = 3, D = 4, F = 5, G = 6.

2. Moral Graph. Making the moral graph of a DAG: add an undirected edge between every pair of nodes that share a common child (marry the parents), then remove the directions of all edges.
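A minimal sketch of moralization, again assuming the DAG representation (node -> parent set) used in the topological-sort sketch above.

from itertools import combinations

# Moralize a DAG: keep every edge (undirected), and marry parents that share a child.
def moralize(parents):
    nbrs = {v: set() for v in parents}
    for child, ps in parents.items():
        for p in ps:                         # original edges, directions dropped
            nbrs[child].add(p); nbrs[p].add(child)
        for u, w in combinations(ps, 2):     # marry parents of a common child
            nbrs[u].add(w); nbrs[w].add(u)
    return nbrs

dag = {'A': set(), 'B': {'A'}, 'C': {'A', 'B'}, 'D': {'A', 'B'},
       'F': {'B', 'C'}, 'G': {'F'}}
print(moralize(dag))   # undirected moral graph: node -> set of neighbours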

2. Moral Graph (1)-(2): (figures) the moral graph of the example DAG: parents with a common child are joined and the edge directions are dropped.

Junction tree. Definition: a tree whose nodes are sets C1, C2, ...; for any two nodes C1 and C2, their intersection is contained in every node on the path between C1 and C2. Corollaries: being decomposable, being chordal, having a junction tree of cliques, and admitting a perfect numbering are all equivalent properties of an undirected graph. Perfect numbering: for every j, ne(v_j) ∩ {v_1, ..., v_{j-1}} induces a complete subgraph.

3. Maximum Cardinality Search (1). Algorithm 4.9 [Maximum Cardinality Search]: Set Output := 'G is chordal'. Set counter i := 1. Set L := ∅. For all v ∈ V, set c(v) := 0. While L ≠ V: set U := V \ L; select any vertex v maximizing c(v) over v ∈ U and label it i; if Π_i := ne(v_i) ∩ L is not complete in G, set Output := 'G is not chordal'; otherwise, set c(w) := c(w) + 1 for each vertex w ∈ ne(v_i) ∩ U; set L := L ∪ {v_i}; increment i by 1. Report Output.
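A minimal sketch of Algorithm 4.9; the moral graph below is taken from the example (its edges are the union of the cliques shown on the later slides: ABC, ABD, BCF, FG).

# Maximum cardinality search on an undirected graph (dict: node -> set of neighbours).
# Returns the numbering with each vertex's earlier-neighbour set Pi, and a chordality flag.
def max_cardinality_search(nbrs):
    L, numbering, chordal = set(), [], True
    c = {v: 0 for v in nbrs}
    while L != set(nbrs):
        v = max((u for u in nbrs if u not in L), key=lambda u: c[u])
        pi = nbrs[v] & L                       # previously numbered neighbours
        if any(w not in nbrs[u] for u in pi for w in pi if u != w):
            chordal = False                    # Pi is not complete in G
        numbering.append((v, pi))
        for w in nbrs[v] - L:                  # bump counters of unnumbered neighbours
            c[w] += 1
        L.add(v)
    return numbering, chordal

moral = {'A': {'B', 'C', 'D'}, 'B': {'A', 'C', 'D', 'F'}, 'C': {'A', 'B', 'F'},
         'D': {'A', 'B'}, 'F': {'B', 'C', 'G'}, 'G': {'F'}}
print(max_cardinality_search(moral))   # numbering A..G as on the slides, chordal = True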

3. Maximum Cardinality Search (2)-(8): (figures) running MCS on the moral graph of the example produces the numbering and earlier-neighbour sets A = 1 (Π = {}), B = 2 (Π = {A}), C = 3 (Π = {A, B}), D = 4 (Π = {A, B}), F = 5 (Π = {B, C}), G = 6 (Π = {F}); Output = 'G is chordal'.

4. Cliques of Chordal Graph (1). Algorithm 4.11 [Finding the Cliques of a Chordal Graph]: From the numbering (v1, ..., vk) obtained by maximum cardinality search, let π_i be the cardinality of Π_i. Mark the ladder nodes: v_i is a ladder node if i = k, or if i < k and π_{i+1} < 1 + π_i. For each ladder node v_j define the clique C_j = {v_j} ∪ Π_j. The cliques C1, C2, ... listed in this order possess the RIP (running intersection property).
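A minimal sketch of Algorithm 4.11, consuming the (vertex, Π) pairs produced by the MCS sketch above.

# Extract the cliques of a chordal graph from an MCS numbering via ladder nodes.
def cliques_from_mcs(numbering):
    k = len(numbering)
    pi_size = [len(pi) for _, pi in numbering]       # pi_i = |Pi_i|
    cliques = []
    for i, (v, pi) in enumerate(numbering):
        is_ladder = (i == k - 1) or (pi_size[i + 1] < 1 + pi_size[i])
        if is_ladder:
            cliques.append(pi | {v})                 # C_j = {v_j} union Pi_j
    return cliques                                   # ordered so that the RIP holds

numbering = [('A', set()), ('B', {'A'}), ('C', {'A', 'B'}),
             ('D', {'A', 'B'}), ('F', {'B', 'C'}), ('G', {'F'})]
print(cliques_from_mcs(numbering))
# [{'A','B','C'}, {'A','B','D'}, {'B','C','F'}, {'F','G'}]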

4. Cliques of Chordal Graph (2): (figure) on the example this yields C1 = {A, B, C}, C2 = {A, B, D}, C3 = {B, C, F}, C4 = {F, G}.

Running Intersection Property. RIP, definition: given (C1, C2, ..., Ck), for all 1 < j ≤ k there is an i < j such that Cj ∩ (C1 ∪ ... ∪ Cj-1) ⊆ Ci.
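A small sketch of a direct RIP check for an ordered list of cliques.

# Check the running intersection property of an ordered list of sets.
def has_rip(cliques):
    for j in range(1, len(cliques)):
        union = set().union(*cliques[:j])
        if not any(cliques[j] & union <= cliques[i] for i in range(j)):
            return False
    return True

print(has_rip([{'A','B','C'}, {'A','B','D'}, {'B','C','F'}, {'F','G'}]))   # True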

5. Junction Tree Construction (1). Algorithm 4.8 [Junction Tree Construction]: From the cliques (C1, ..., Cp) of a chordal graph ordered with the RIP, associate a node of the tree with each clique Cj. For j = 2, ..., p, add an edge between Cj and Ci, where i is any one value in {1, ..., j-1} such that Cj ∩ (C1 ∪ ... ∪ Cj-1) ⊆ Ci.
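A minimal sketch of Algorithm 4.8 for cliques already ordered with the RIP; on the example cliques it reproduces the tree drawn on the next slides.

# Build a junction tree from RIP-ordered cliques; returns edges as index pairs.
def junction_tree(cliques):
    edges = []
    for j in range(1, len(cliques)):
        sep = cliques[j] & set().union(*cliques[:j])
        i = next(i for i in range(j) if sep <= cliques[i])   # any such i will do
        edges.append((i, j))
    return edges

cliques = [{'A','B','C'}, {'A','B','D'}, {'B','C','F'}, {'F','G'}]
print(junction_tree(cliques))   # [(0, 1), (0, 2), (2, 3)]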

5. Junction Tree Construction (2)-(5): (figures) for the example cliques C1 = ABC, C2 = ABD, C3 = BCF, C4 = FG, the edges C2-C1, C3-C1 and C4-C3 are added one at a time, giving the junction tree ABD - ABC - BCF - FG.

Contents: From DAG to Junction Tree; From Elimination Tree to Junction Tree; Junction Tree Algorithms; Learning Bayesian Networks.

Triangulation (1). When is triangulation needed? When MCS (Maximum Cardinality Search) fails, i.e., the moral graph is not chordal. Triangulation introduces fill-in edges and produces a perfect numbering. Finding an optimal triangulation is NP-hard; what matters is the size of the resulting cliques.

Triangulation (2). Algorithm 4.13 [One-step Look Ahead Triangulation]: Start with all vertices unnumbered and set counter i := k. While there are still some unnumbered vertices: select an unnumbered vertex v, either by optimizing a criterion c(v) or as v = σ(i) for a given order σ; label it with the number i; form the set Ci consisting of vi and its unnumbered neighbours; fill in edges where none exist between all pairs of vertices in Ci; eliminate vi and decrement i by 1.
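A minimal sketch of Algorithm 4.13 using a fixed elimination order σ, as on the slides (a selection criterion c(v) such as minimum fill-in could be plugged in instead).

# One-step look-ahead triangulation.  nbrs: dict node -> set of neighbours.
# order: elimination order sigma; vertices are numbered from k down to 1.
def triangulate(nbrs, order):
    g = {v: set(ws) for v, ws in nbrs.items()}        # working copy
    filled = {v: set(ws) for v, ws in nbrs.items()}   # triangulated graph G'
    elim_sets, remaining = {}, set(g)
    for i in range(len(order), 0, -1):
        v = order[i - 1]                               # select v = sigma(i)
        C = (g[v] & remaining) | {v}                   # v plus its unnumbered neighbours
        elim_sets[i] = C
        for u in C:                                    # fill in missing edges within C_i
            for w in C:
                if u != w:
                    filled[u].add(w); g[u].add(w)
        for u in C:                                    # eliminate v
            g[u].discard(v)
        remaining.discard(v)
    return filled, elim_sets

moral = {'A': {'B', 'C', 'D'}, 'B': {'A', 'C', 'D', 'F'}, 'C': {'A', 'B', 'F'},
         'D': {'A', 'B'}, 'F': {'B', 'C', 'G'}, 'G': {'F'}}
print(triangulate(moral, ['A', 'B', 'C', 'D', 'F', 'G'])[1])
# {6: {'F','G'}, 5: {'B','C','F'}, 4: {'A','B','D'}, 3: {'A','B','C'}, 2: {'A','B'}, 1: {'A'}}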

Triangulation (3)-(7): (figures) with the order σ = (A, B, C, D, F, G), the vertices are eliminated from i = 6 down, giving the elimination sets C6 = {F, G}, C5 = {B, C, F}, C4 = {A, B, D}, C3 = {A, B, C}, C2 = {A, B}.

Triangulation (8): (figure) C1 = {A}. Properties of the elimination sets: each Cj contains vj; vj ∉ Cl for all l < j; (C1, ..., Ck) has the RIP; and the cliques of the triangulated graph G' are contained in (C1, ..., Ck).

Elimination Tree Construction (1). Algorithm 4.14 [Elimination Tree Construction]: Associate a node of the tree with each set Ci. For j = 1, ..., k, if Cj contains more than one vertex, add an edge between Cj and Ci, where i is the largest index of a vertex in Cj \ {vj}.
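A minimal sketch of Algorithm 4.14, consuming the elimination sets from the triangulation sketch above.

# Build the elimination tree.  number: vertex -> its label i; elim_sets: i -> C_i.
def elimination_tree(elim_sets, number):
    edges = []
    for j, Cj in elim_sets.items():
        vj = next(v for v in Cj if number[v] == j)
        rest = Cj - {vj}
        if rest:                                   # C_j has more than one vertex
            i = max(number[v] for v in rest)       # largest index in C_j \ {v_j}
            edges.append((i, j))
    return edges

number = {'A': 1, 'B': 2, 'C': 3, 'D': 4, 'F': 5, 'G': 6}
elim_sets = {1: {'A'}, 2: {'A', 'B'}, 3: {'A', 'B', 'C'},
             4: {'A', 'B', 'D'}, 5: {'B', 'C', 'F'}, 6: {'F', 'G'}}
print(elimination_tree(elim_sets, number))   # [(1, 2), (2, 3), (2, 4), (3, 5), (5, 6)]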

Elimination Tree Construction (2)-(7): (figures) the elimination sets C1 = A:, C2 = B:A, C3 = C:AB, C4 = D:AB, C5 = F:BC, C6 = G:F are linked one at a time; each Cj is joined to the set indexed by the highest-numbered vertex in Cj \ {vj}, giving the elimination tree with edges C2-C1, C3-C2, C4-C2, C5-C3, C6-C5.

From etree to jtree (1). Lemma 4.16: Let C1, ..., Ck be a sequence of sets with the RIP. Assume that Ct ⊆ Cp for some t ≠ p, and that p is minimal with this property for fixed t. Then: (i) if t > p, then C1, ..., Ct-1, Ct+1, ..., Ck has the running intersection property; (ii) if t < p, then C1, ..., Ct-1, Cp, Ct+1, ..., Cp-1, Cp+1, ..., Ck has the RIP. Simply removing a redundant elimination set can destroy the RIP, which is why case (ii) reorders the sequence.

From etree to jtree (2)-(3): (figures) applying condition (ii) with t = 1, p = 2 removes the redundant set C1 = A: (it is contained in C2 = B:A); applying it again with t = 2, p = 3 removes C2 = B:A (contained in C3 = C:AB). The remaining sets C3, C4, C5, C6 are the cliques of the triangulated graph.

MST for making jtree (1). Algorithm: from the elimination sets (C1, ..., Ck), remove the redundant Ci's; make the junction graph: if |Ci ∩ Cj| > 0, add an edge between Ci and Cj with weight |Ci ∩ Cj|; construct an MST (maximum weight spanning tree). The resulting tree is a junction tree, and the clique set has the RIP.
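A minimal sketch of the junction-graph / maximum weight spanning tree route (Kruskal with a tiny union-find); on the example it returns the same junction tree as before.

# Junction graph over cliques + maximum weight spanning tree = junction tree.
def max_weight_junction_tree(cliques):
    edges = [(len(cliques[i] & cliques[j]), i, j)
             for i in range(len(cliques)) for j in range(i + 1, len(cliques))
             if cliques[i] & cliques[j]]
    parent = list(range(len(cliques)))
    def find(x):                                   # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = []
    for w, i, j in sorted(edges, reverse=True):    # heaviest separators first
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j, w))
    return tree

cliques = [{'A','B','C'}, {'A','B','D'}, {'B','C','F'}, {'F','G'}]
print(max_weight_junction_tree(cliques))   # edges (0,2), (0,1), (2,3) with weights 2, 2, 1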

MST for making jtree (2): (figures) the junction graph on the cliques ABC, ABD, BCF, FG has edge weights |ABC ∩ ABD| = 2, |ABC ∩ BCF| = 2, |ABD ∩ BCF| = 1, |BCF ∩ FG| = 1; a maximum weight spanning tree keeps the edges ABC-ABD, ABC-BCF and BCF-FG, i.e., the same junction tree as before.

MST for making jtree (3). Optimal jtree (for a fixed elimination ordering): define a cost for each edge e = (v, w) and use it to break ties when constructing the MST (minimum cost preferred).

Contents: From DAG to Junction Tree; From Elimination Tree to Junction Tree; Junction Tree Algorithms; Learning Bayesian Networks.

Collect phase: (figure) messages are passed from the leaves toward the root. For each edge, the separator's initial potential is replaced by the projection (marginalization) of the child clique's potential onto the separator, and the clique nearer the root is updated by multiplying in the ratio of the updated to the initial separator potential.

Distribute phase: (figure) messages are passed from the root back toward the leaves in the same way. After this pass, the updated potential φj* of each clique Cj contains the marginal distribution of that clique.
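A minimal sketch of the two-pass (collect/distribute) propagation on a toy junction tree with cliques {A, B} and {B, C} and separator {B}; all variables are binary and the CPT numbers are made up for illustration. Each update replaces the separator potential by a projection of a clique potential and multiplies the neighbouring clique by the update ratio.

# Potentials are dicts keyed by value tuples over the listed variables.
def marginalize(pot, vars_, keep):
    idx = [vars_.index(v) for v in keep]
    out = {}
    for assn, val in pot.items():
        key = tuple(assn[i] for i in idx)
        out[key] = out.get(key, 0.0) + val
    return out

def absorb(pot, vars_, sep_vars, new_sep, old_sep):
    idx = [vars_.index(v) for v in sep_vars]
    return {assn: val * new_sep[tuple(assn[i] for i in idx)]
                      / old_sep[tuple(assn[i] for i in idx)]
            for assn, val in pot.items()}

pA = {0: 0.6, 1: 0.4}
pB_A = {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8}    # keyed by (a, b)
pC_B = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.5, (1, 1): 0.5}    # keyed by (b, c)

phi1 = {(a, b): pA[a] * pB_A[(a, b)] for a in (0, 1) for b in (0, 1)}   # clique {A, B}
phi2 = dict(pC_B)                                                       # clique {B, C}
sep = {(0,): 1.0, (1,): 1.0}                                            # separator {B}

# Collect: message from clique {B, C} (leaf) to clique {A, B} (root).
new_sep = marginalize(phi2, ['B', 'C'], ['B'])
phi1, sep = absorb(phi1, ['A', 'B'], ['B'], new_sep, sep), new_sep
# Distribute: message from the root back to the leaf.
new_sep = marginalize(phi1, ['A', 'B'], ['B'])
phi2, sep = absorb(phi2, ['B', 'C'], ['B'], new_sep, sep), new_sep

print(phi1)   # now holds the marginal P(A, B)
print(phi2)   # now holds the marginal P(B, C)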

Contents: From DAG to Junction Tree; From Elimination Tree to Junction Tree; Junction Tree Algorithms; Learning Bayesian Networks.

Learning Paradigm: known structure (Ks) or unknown structure (Us); full observability (Fo) or partial observability (Po); frequentist (Fr) or Bayesian (Ba). The following slides are labelled by these three choices.

Ks, Fo, Fr (1). Given a training set D = {D1, ..., DM}, compute the MLE (maximum likelihood estimate) of the parameters of each CPD (conditional probability distribution). With full observability the log-likelihood decomposes into one term per node: log P(D | θ) = Σ_{m=1..M} Σ_{i=1..n} log P(x_i^m | pa_i^m, θ_i), where n is the number of nodes and M the number of data cases.

Ks, Fo, Fr (2). Multinomial distributions: for a tabular CPD write θ_ijk = P(X_i = k | Pa_i = j). The log-likelihood is Σ_{i,j,k} N_ijk log θ_ijk, where N_ijk is the number of cases with X_i = k and Pa_i = j, and the MLE is found under the constraint Σ_k θ_ijk = 1.

Ks, Fo, Fr (3). MLE of the multinomial distribution: a constrained optimization; adding a Lagrange multiplier for Σ_k θ_ijk = 1, taking derivatives with respect to θ_ijk and setting them to zero gives θ̂_ijk = N_ijk / Σ_{k'} N_ijk'.
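A minimal sketch of the closed-form MLE for one node's tabular CPD, assuming the counts N_ijk have already been tallied from the data; the counts below are made-up numbers.

# counts[j][k] = N_ijk for one node i: parent configuration j, child value k.
def mle_cpd(counts):
    return {j: {k: n / sum(row.values()) for k, n in row.items()}
            for j, row in counts.items()}

counts = {('pa=0',): {0: 30, 1: 10}, ('pa=1',): {0: 5, 1: 15}}
print(mle_cpd(counts))   # {('pa=0',): {0: 0.75, 1: 0.25}, ('pa=1',): {0: 0.25, 1: 0.75}}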

Ks, Fo, Fr (4). Conditional linear Gaussian distributions: analogous closed-form ML estimates exist for conditional linear Gaussian CPDs (linear regression of each continuous node on its parents).

Ks, Fo, Ba (1). Frequentist: point estimation. Bayesian: estimation of a full distribution over the parameters.

Ks, Fo, Ba (2). Multinomial distributions. Two independence assumptions on the prior: global independence, P(θ) = Π_i P(θ_i), and local independence, P(θ_i) = Π_j P(θ_ij). Global independence together with likelihood equivalence leads to a Dirichlet prior, the conjugate prior for the multinomial.

Ks, Fo, Ba (3). Remark on Bayesian inference: P(θ | D) ∝ P(D | θ) P(θ), i.e., posterior ∝ likelihood × prior. Conjugate priors: the posterior has the same form as the prior distribution; many exponential-family likelihoods have conjugate priors.

Ks, Fo, Ba (4). Multinomial distributions with a Dirichlet prior on the tabular CPDs: for each parent configuration j, X_i is a multinomial r.v. with r_i possible values and parameter vector θ_ij, with prior Dir(α_ij1, ..., α_ijr_i). The posterior distribution is Dir(α_ij1 + N_ij1, ..., α_ijr_i + N_ijr_i), with posterior mean (α_ijk + N_ijk) / Σ_{k'} (α_ijk' + N_ijk').

Ks, Fo, Ba (5). Dirichlet distribution: each hyperparameter α_ijk is a positive number acting as a pseudo-count, i.e., α_ijk - 1 imaginary cases. The posterior distribution combines the pseudo-counts and the observed counts by a simple sum.
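A minimal sketch of the conjugate update for one parent configuration of a tabular CPD; the pseudo-counts and observed counts below are made-up numbers.

# Dirichlet(alpha) prior + counts N  ->  Dirichlet(alpha + N) posterior and its mean.
def dirichlet_update(alpha, counts):
    post = {k: alpha[k] + counts.get(k, 0) for k in alpha}
    total = sum(post.values())
    mean = {k: a / total for k, a in post.items()}
    return post, mean

alpha = {0: 1.0, 1: 1.0}            # hyperparameters / pseudo-counts
counts = {0: 30, 1: 10}             # observed counts for this (i, j)
print(dirichlet_update(alpha, counts))   # ({0: 31.0, 1: 11.0}, {0: 0.738..., 1: 0.261...})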

Ks, Fo, Ba (6). Gaussian distributions.

Ks, Po, Fr (1). With hidden variables H and visible (observed) variables V, the log-likelihood log P(V | θ) = Σ_m log Σ_h P(h, V_m | θ) is not decomposable into a sum of local terms, one per node. Use the EM algorithm.

Ks, Po, Fr (2). EM algorithm: from Jensen's inequality, for any distribution q(h | V_m) with the constraint Σ_h q(h | V_m) = 1, log P(V | θ) ≥ Σ_m Σ_h q(h | V_m) log [ P(h, V_m | θ) / q(h | V_m) ] =: F(q, θ).

Ks, Po, Fr (3). Maximizing F w.r.t. q (E-step): the bound is maximized (and becomes tight) when q(h | V_m) = P(h | V_m, θ), the posterior over the hidden variables under the current parameters.

Ks, Po, Fr (4). Maximizing F w.r.t. θ (M-step): after q has been set to P(h | V_m, θ), maximizing F over θ amounts to maximizing the expected complete-data log-likelihood. Iterate until convergence: E-step, compute the expected complete-data log-likelihood; M-step, find θ* maximizing the expected complete-data log-likelihood.

Ks, Po, Fr (5). Multinomial distributions: E-step, compute the expected counts E[N_ijk] = Σ_m P(X_i = k, Pa_i = j | V_m, θ); M-step, set θ_ijk = E[N_ijk] / Σ_{k'} E[N_ijk'].
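A minimal sketch of EM for a toy network H -> X1, H -> X2 with H hidden and X1, X2 observed, all binary; the data set and starting parameters are made up. The E-step accumulates the posterior over H per case (expected counts), the M-step renormalizes them.

def em(data, pH, pX_H, iters=50):
    # pX_H[i][h][x] = P(X_i = x | H = h)
    for _ in range(iters):
        exp_H = [0.0, 0.0]
        exp_X = [[[1e-9] * 2 for _ in range(2)] for _ in range(2)]   # [i][h][x]
        for x1, x2 in data:                      # E-step: q_m(h) = P(h | x1, x2)
            joint = [pH[h] * pX_H[0][h][x1] * pX_H[1][h][x2] for h in (0, 1)]
            z = sum(joint)
            for h in (0, 1):
                q = joint[h] / z
                exp_H[h] += q
                exp_X[0][h][x1] += q
                exp_X[1][h][x2] += q
        pH = [c / len(data) for c in exp_H]      # M-step: normalize expected counts
        pX_H = [[[exp_X[i][h][x] / sum(exp_X[i][h]) for x in (0, 1)]
                 for h in (0, 1)] for i in (0, 1)]
    return pH, pX_H

data = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 0), (0, 0), (1, 1)]
print(em(data, [0.5, 0.5], [[[0.7, 0.3], [0.4, 0.6]], [[0.6, 0.4], [0.3, 0.7]]]))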

Ks, Po, Ba (1). Gibbs sampling: a stochastic version of EM. Variational Bayes: approximate P(θ, H | V) ≈ q(θ | V) q(H | V).

Us, Fo, Fr (1). Issues: hypothesis space, evaluation function, search algorithm.

Us, Fo, Fr (2). Search space: DAGs. The number of DAGs on n nodes grows as O(2^(n^2)); for 10 nodes there are on the order of 10^18 DAGs, so exhaustive search for the optimal DAG is doomed to failure.

Us, Fo, Fr (3). Search algorithm: local search with operators that add, delete, or reverse a single arc. Pseudo-code for hill-climbing: choose G somehow; while not converged: for each G' in nbd(G) compute score(G'); set G* := argmax_{G'} score(G'); if score(G*) > score(G) then G := G*, else converged := true. Here nbd(G) is the neighborhood of G, i.e., the models that can be reached by applying a single local change operator.
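A minimal sketch of the hill-climbing search over parent-set dicts; the scoring function is passed in by the caller (any score such as BIC or the BD score would do), and the add/delete/reverse operators define nbd(G).

from itertools import permutations

def is_acyclic(g):
    parents = {v: set(ps) for v, ps in g.items()}
    while parents:
        roots = [v for v, ps in parents.items() if not ps]
        if not roots:
            return False                          # no parent-free vertex left: a cycle
        for r in roots:
            del parents[r]
        for ps in parents.values():
            ps.difference_update(roots)
    return True

def neighbours(g):                                # nbd(G): single-arc changes
    for u, v in permutations(g, 2):
        h = {w: set(ps) for w, ps in g.items()}
        if u in g[v]:
            h[v].discard(u)                       # delete arc u -> v
            yield h
            h2 = {w: set(ps) for w, ps in h.items()}
            h2[u].add(v)                          # reverse arc u -> v
            if is_acyclic(h2):
                yield h2
        else:
            h[v].add(u)                           # add arc u -> v
            if is_acyclic(h):
                yield h

def hill_climb(g, score):
    current = score(g)
    while True:
        best, best_score = max(((h, score(h)) for h in neighbours(g)),
                               key=lambda t: t[1])
        if best_score <= current:
            return g, current                     # converged
        g, current = best, best_score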

Us, Fo, Fr (4). Search algorithm: the PC algorithm starts with a fully connected undirected graph and applies CI (conditional independence) tests; if X ⊥ Y | S, the arc between X and Y is removed.

Us, Fo, Fr (5). Scoring function: the maximum-likelihood score always selects the fully connected graph. Use instead score(G) ∝ P(D | G) P(G), which automatically penalizes complex models: a model with more parameters spreads its prior mass more thinly, so not much probability mass falls on the region where the data actually lie.

Us, Fo, Fr (6). Scoring function: under global parameter independence and conjugate priors, the marginal likelihood P(D | G) can be integrated in closed form and decomposes into a product of local (per-family) terms.

Us, Fo, Fr (7). Scoring function: under non-conjugate priors an approximation is needed. The Laplace approximation leads to the BIC (Bayesian Information Criterion): log P(D | G) ≈ log P(D | θ̂_G, G) - (d_G / 2) log M, where d_G is the dimension of the model and θ̂_G the ML estimate of the parameters; this applies in particular to the multinomial case.

Us, Fo, Fr (8). Scoring function: an advantage of a decomposed score is that when two graphs differ in a single arc, their marginal likelihoods differ in at most two local terms. Example: graphs G1 and G2 over X1, X2, X3, X4 that differ only in one arc share all of their other family terms.

Us, Fo, Fr (9). Scoring function: the marginal likelihood for the multinomial distribution with a Dirichlet prior is the Bayesian Dirichlet (BD) score, P(D | G) = Π_i Π_j [ Γ(α_ij) / Γ(α_ij + N_ij) ] Π_k [ Γ(α_ijk + N_ijk) / Γ(α_ijk) ], with α_ij = Σ_k α_ijk and N_ij = Σ_k N_ijk; the corresponding posterior mean gives the parameter estimates.
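A minimal sketch of the local BD term for a single node, assuming the pseudo-counts α_ijk and observed counts N_ijk are tabulated per parent configuration; summing such terms over all nodes gives log P(D | G).

from math import lgamma

# log of prod_j [Gamma(a_ij)/Gamma(a_ij+N_ij)] * prod_k [Gamma(a_ijk+N_ijk)/Gamma(a_ijk)]
def log_bd_family(alpha, counts):
    total = 0.0
    for j in alpha:
        a_j, n_j = sum(alpha[j].values()), sum(counts[j].values())
        total += lgamma(a_j) - lgamma(a_j + n_j)
        for k in alpha[j]:
            total += lgamma(alpha[j][k] + counts[j][k]) - lgamma(alpha[j][k])
    return total

alpha  = {0: {0: 1.0, 1: 1.0}, 1: {0: 1.0, 1: 1.0}}     # hyperparameters per (j, k)
counts = {0: {0: 30, 1: 10}, 1: {0: 5, 1: 15}}          # observed counts per (j, k)
print(log_bd_family(alpha, counts))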

Us, Fo, Ba (1). The posterior over all models is intractable, so focus on particular features: Bayesian model averaging computes E[f | D] = Σ_G f(G) P(G | D), e.g. f(G) = 1 if G contains a certain edge, but this needs P(G | D) and the summation over graphs is intractable. Solution: MCMC with the Metropolis-Hastings algorithm, which only needs the ratio R of posteriors, so the intractable normalization is avoided.

Us, Fo, Ba (2). Calculation of P(G | D) by sampling graphs G. Pseudo-code for the MC3 algorithm: choose G somehow; while not converged: pick G' u.a.r. (uniformly at random) from nbd(G); compute R = P(G' | D) q(G | G') / [ P(G | D) q(G' | G) ]; sample u ~ Uniform(0, 1); if u < min{1, R} then G := G'.

Us, Po, Fr (1). Partially observable case: computation of the marginal likelihood is intractable and, because of the hidden variables, does not decompose into a product of local terms. Solutions: approximating the marginal likelihood, or Structural EM.

Us, Po, Fr (2). Approximating the marginal likelihood with the Candidate's method: P(D | G) = P(D | θ̂, G) P(θ̂ | G) / P(θ̂ | D, G) for any parameter value θ̂, e.g. the MLE of the parameters; the likelihood term comes from a BN inference algorithm, the prior term is trivial, and the posterior density can be estimated from Gibbs sampling.

Us, Po, Fr (3). Structural EM. Idea: the expected complete-data log-likelihood (BIC score) decomposes, so the structure search can be run inside EM using the MLE of the parameters from the current model; running EM inside every step of the search would be too costly.

Us, Po, Ba (1). Combined MCMC: MCMC over graphs for Bayesian model averaging, combined with MCMC over the values of the unobserved nodes.

Conclusion. Does learning the structure have an important meaning? On paper, yes; in engineering practice, no. What can AI do for humans? What can humans do for machine learning algorithms?