Slide 1: Title
EE512 Graphical Models, Spring 2006
Jeff A. Bilmes, University of Washington, Department of Electrical Engineering
Lecture 17 slides, May 30th, 2006
Slide 2: Announcements
READING:
–M. Jordan: Chapters 13, 14, 15 (on Gaussians and Kalman filters)
Reminder: TA discussions and office hours:
–Office hours: Thursdays 3:30-4:30, Sieg Ground Floor Tutorial Center
–Discussion sections: Fridays 9:30-10:30, Sieg Ground Floor Tutorial Center Lecture Room
No more homework this quarter; concentrate on final projects!
Makeup class tomorrow (Wednesday), 5-7pm, room TBA (watch email).
Slide 3: Class Road Map
L1: Tues, 3/28: Overview, GMs, intro to BNs
L2: Thur, 3/30: semantics of BNs + UGMs
L3: Tues, 4/4: elimination, probs, chordal I
L4: Thur, 4/6: chordal, sep, decomp, elim
L5: Tue, 4/11: chordal/elim, MCS, triang, CI props
L6: Thur, 4/13: MST, CI axioms, Markov props
L7: Tues, 4/18: Mobius, HC-thm, (F)=(G)
L8: Thur, 4/20: phylogenetic trees, HMMs
L9: Tue, 4/25: HMMs, inference on trees
L10: Thur, 4/27: inference on trees, start polytrees
L11: Tues, 5/2: polytrees, start JT inference
L12: Thur, 5/4: inference in JTs
Tues, 5/9: away
Thur, 5/11: away
L13: Tue, 5/16: JT, GDL, Shenoy-Shafer
L14: Thur, 5/18: GDL, search, Gaussians I
L15: Mon, 5/22: laptop crash
L16: Tues, 5/23: search, Gaussians I
L17: Thur, 5/25: Gaussians
Mon, 5/29: Holiday
L18: Tue, 5/30
L19: Thur, 6/1: final presentations
Slide 4: Final Project Milestone Due Dates
L1: Tues, 3/28:
L2: Thur, 3/30:
L3: Tues, 4/4:
L4: Thur, 4/6:
L5: Tue, 4/11:
L6: Thur, 4/13:
L7: Tues, 4/18:
L8: Thur, 4/20: team lists, short abstracts I
L9: Tue, 4/25:
L10: Thur, 4/27: short abstracts II
L11: Tues, 5/2:
L12: Thur, 5/4: abstract II + progress
L--: Tues, 5/9:
L--: Thur, 5/11: 1-page progress report
L13: Tue, 5/16:
L14: Thur, 5/18: 1-page progress report
L15: Tues, 5/23:
L16: Thur, 5/25: 1-page progress report
L17: Tue, 5/30: Today
L18: Wed, 5/31:
L19: Thur, 6/1: final presentations
L20: Tue, 6/6: 4-page papers due (like a conference paper); only .pdf versions accepted
Team lists, abstracts, and progress reports must be turned in in class, on paper (dead-tree versions only). Final reports must be turned in electronically in PDF (no other formats accepted). No need to repeat what was on previous progress reports/abstracts; I have those available to refer to. Progress reports must report who did what so far!
Slide 5: Summary of Last Time
Gaussian graphical models
Slide 6: Outline of Today's Lecture
Other forms of inference.
Structure learning in graphical models.
Slide 7: Books and Sources for Today
Jordan, chapters 13-15.
Other references are contained in the presentation.
Slide 8: Graphical Models
1. We start with some probability distribution P.
   1. P could be specified as a given, or more likely we have training data consisting of some number of samples. The goal is to learn P or some approximation to it (training) and then use P in some way (inference for making decisions, such as finding the most probable assignment, the max-product semiring, etc.).
2. The graph G=(V,E) represents "structure" in P.
3. The graph can provide an efficient representation of, and efficient computational inference for, P.
4. There can be multiple graphs that represent a given P (e.g., the complete graph represents all P).
5. Goal: find a computationally cheap exact or approximate graph cover for P.
6. Once we do this, we just compute probabilities using the junction tree algorithm, a search algorithm, etc.
Slide 9: Graphical Models & Tree-width
1. Tree-width is the complexity parameter for G=(V,E).
2. Def: k-tree: start with k nodes forming a clique of size k; for each further node n > k, connect the n-th node to k previously added, mutually adjacent (fully connected) nodes.
3. Example: 4-trees. Note: all separators are of size 4.
(Figures: a 4-tree with 4 nodes, with 5 nodes, and with 6 nodes.)
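To make the construction concrete, here is a minimal sketch (not from the slides) that builds a k-tree exactly as defined above. The random choice of which existing k-clique each new node attaches to, and the use of networkx, are illustrative assumptions.

```python
import itertools
import random
import networkx as nx

def random_k_tree(n, k, seed=0):
    """Build a k-tree on n >= k nodes by the incremental construction above."""
    rng = random.Random(seed)
    G = nx.complete_graph(k)            # start: a clique on k nodes
    cliques = [tuple(range(k))]         # k-cliques a new node may attach to
    for v in range(k, n):
        base = rng.choice(cliques)      # k mutually adjacent existing nodes
        G.add_edges_from((v, u) for u in base)
        # attaching v creates k new k-cliques, each containing v
        for sub in itertools.combinations(base, k - 1):
            cliques.append(sub + (v,))
    return G

G = random_k_tree(8, 4)
# a k-tree built this way has k*(k-1)/2 + (n-k)*k edges; here 6 + 16 = 22
print(G.number_of_nodes(), G.number_of_edges())
```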
Slide 10: Graphical Models & Tree-width
1. Def: partial k-tree: any subgraph of a k-tree.
2. Def: the tree-width of a graph G is the smallest k such that G is a partial k-tree.
3. Thm: the tree-width decision problem is NP-complete.
   1. We mentioned this before; proven by Arnborg et al.
4. Thm: exact probabilistic inference (computing probabilities, etc.) is exponential in the tree-width.
   1. Time-space tradeoffs can help here, but what if all of the points in the achievable region are intolerably computationally expensive?
5. The big question: what if exact inference is too expensive?
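Because deciding tree-width exactly is NP-complete, in practice one usually settles for an upper bound from a triangulation heuristic. A hedged sketch using networkx's approximation module (the min-degree and min-fill-in elimination heuristics); the 4x4 grid example is purely illustrative.

```python
import networkx as nx
from networkx.algorithms import approximation as approx

G = nx.grid_2d_graph(4, 4)           # 4x4 grid graph; its tree-width is 4
w1, _tree1 = approx.treewidth_min_degree(G)    # (width bound, decomposition)
w2, _tree2 = approx.treewidth_min_fill_in(G)
print(min(w1, w2))                   # an upper bound on the true tree-width
```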
Slide 11: When exact inference is too expensive
1. Two general approaches: either an exact solution to an approximate problem, or an approximate solution to the exact problem.
2. Exact solution to an approximate problem:
   1. Structure learning: find a low tree-width (or otherwise "cheap") graphical model that is still "high-quality" in some way, and then perform exact inference on the approximate model.
   2. This can be easy or hard depending on the tree-width, on the measure of "high quality", and on the learning paradigm.
3. Approximate solution to the exact problem:
   1. Approximate inference tries to approximate in some way what must be computed: loopy belief propagation, sampling/pruning, variational/mean-field methods, and hybrids of the above.
Slide 12: Finding k-trees
1. How do we score a k-tree?
   1. Maximum likelihood, or a conditional score.
2. May we assume that the truth itself is a k-tree?
   1. Sometimes simplifications can be made if we assume that the truth belongs to a known model class, such as the k-trees for some fixed constant k independent of n=|V|, the number of nodes.
3. How do we find the best 1-tree?
Slide 13: Finding 1-trees
Given P, the goal is to find the best 1-tree approximation of P in a maximum-likelihood sense.
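The derivation presumably carried by the following slides is not reproduced in this export; its standard conclusion (Chow & Liu, 1968) is the decomposition below: with maximum-likelihood edge marginals, the average log-likelihood of a tree T splits into a sum of empirical mutual informations over the tree's edges minus a sum of single-variable entropies. The entropy term does not depend on T, so the best 1-tree is a maximum-weight spanning tree with empirical mutual information as edge weights.

```latex
% Chow--Liu decomposition of the average log-likelihood of a tree T
% fit with maximum-likelihood edge marginals:
\frac{1}{N}\sum_{n=1}^{N}\log p_T\!\left(x^{(n)}\right)
  \;=\; \sum_{(i,j)\in E_T}\hat{I}(X_i;X_j) \;-\; \sum_{i\in V}\hat{H}(X_i)
```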
Slides 14-19: Finding 1-trees (continued)
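A minimal code sketch of the resulting procedure: weight each pair of variables by its empirical mutual information and take a maximum-weight spanning tree. The data layout (one discrete column per variable) and the use of networkx are assumptions for illustration, not taken from the slides.

```python
import numpy as np
import networkx as nx

def empirical_mi(xi, xj):
    """Empirical mutual information (in nats) between two discrete columns."""
    n = len(xi)
    joint = {}
    for a, b in zip(xi, xj):
        joint[(a, b)] = joint.get((a, b), 0) + 1
    pi = {a: np.mean(xi == a) for a in set(xi)}
    pj = {b: np.mean(xj == b) for b in set(xj)}
    return sum((c / n) * np.log((c / n) / (pi[a] * pj[b]))
               for (a, b), c in joint.items())

def chow_liu_tree(data):
    """data: (num_samples, num_vars) array of discrete values."""
    n_vars = data.shape[1]
    G = nx.Graph()
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            G.add_edge(i, j, weight=empirical_mi(data[:, i], data[:, j]))
    # maximum-weight spanning tree = the maximum-likelihood 1-tree structure
    return nx.maximum_spanning_tree(G)
```

Replacing the edge weight with the class-conditional mutual information I(X_i;X_j|C), or with the EAR measure discussed later in the deck, gives class-conditional and discriminatively motivated variants of the same spanning-tree step.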
Slide 20: Plethora of negative results
Chickering 1996; Chickering, Meek & Heckerman 2003: learning Bayesian networks in the ML sense is NP-hard ("is there a BN with a fixed upper bound on in-degree that achieves a given ML score?").
Dasgupta 1999: learning polytrees in the ML sense is NP-hard ("is there a polytree with a fixed upper bound on in-degree with a given ML score?"), and worse, there is a constant c such that it is NP-complete to decide whether there is a polytree with score <= c*OPT_score.
Meek 2001: learning even a path (a sub-class of trees) in the ML sense is NP-hard.
Slide 21: Plethora of negative results
Srebro & Karger 2001: learning k-trees in the ML sense is hard.
So, generative model structure learning is likely to be a difficult problem (unless k=1, or P=NP).
We next spend a bit of time talking about the Srebro/Karger result.
Slides 22-27: Optimal ML k-trees is NP-complete
Slide 28: Some good news …
PAC framework: the key difference is that we assume the graph is in the concept class (we learn the class of k-trees). This means that if we have sampled data, we assume the sampled data comes from a truth which is itself a k-tree.
Hoeffgen '93: can robustly (with a number of samples polynomial in n, 1/ε, and 1/δ) PAC-learn bounded tree-width graphical models, and can robustly and efficiently (with an algorithm polynomial in the same quantities) PAC-learn 1-trees.
Narasimhan & Bilmes 2004: can robustly and efficiently PAC-learn bounded tree-width graphical models.
Slide 29: More good news …
Abbeel, Koller & Ng 2005: can robustly and efficiently PAC-learn bounded-degree factor graphs.
–Note: this does not come with a complexity guarantee. E.g., n×n grids have bounded degree but unbounded tree-width, while a star has unbounded degree but bounded tree-width. Tree-width is what is crucial for computation in general.
Slide 30: How to PAC-learn such graphs …
Mutual information is symmetric submodular: viewed as a set function f(A) = I(X_A ; X_{V\A}), it satisfies f(A) = f(V\A) and f(A) + f(B) >= f(A ∪ B) + f(A ∩ B).
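A small numerical sanity check of this claim (not from the slides): for any joint distribution p over variables V, the cut function f(A) = I(X_A ; X_{V\A}) is symmetric and submodular. The random 4-variable binary joint below is an illustrative assumption.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
p = rng.random((2, 2, 2, 2))
p /= p.sum()                      # a random joint over 4 binary variables
V = frozenset(range(4))

def entropy(axes):
    """Entropy (nats) of the marginal over the given set of variable indices."""
    if not axes:
        return 0.0
    drop = tuple(sorted(V - frozenset(axes)))   # axes to marginalize out
    marg = p.sum(axis=drop) if drop else p
    marg = marg[marg > 0]
    return float(-(marg * np.log(marg)).sum())

def f(A):
    A = frozenset(A)
    return entropy(A) + entropy(V - A) - entropy(V)   # I(X_A ; X_{V\A})

subsets = [frozenset(s) for r in range(5)
           for s in itertools.combinations(range(4), r)]
assert all(abs(f(A) - f(V - A)) < 1e-9 for A in subsets)          # symmetry
assert all(f(A) + f(B) >= f(A | B) + f(A & B) - 1e-9
           for A in subsets for B in subsets)                      # submodularity
print("symmetry and submodularity hold on this example")
```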
Slide 31: How to PAC-learn such graphs …
Submodularity and optimization (Narasimhan & Bilmes, 2004)
Slide 32: Another positive result
Since mutual information is symmetric submodular, we can find optimal partitions exactly (e.g., the partition (A, V\A) minimizing I(X_A ; X_{V\A})).
This has implications for clustering (Narasimhan, Jojic & Bilmes '05) and also for structure learning (we can find an optimal one-step graph decomposition by finding the optimal k-separator).
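Continuing the sketch after slide 30 (same p, V, and f): for a handful of variables the optimal partition can be found by brute force as below; for general symmetric submodular functions, Queyranne's algorithm finds the same minimizer with O(n^3) function evaluations, which is what makes the result practical. Function and variable names here are illustrative.

```python
import itertools

def best_partition(f, V):
    """Return (value, A, V\\A) minimizing f over nonempty proper subsets A."""
    best = None
    for r in range(1, len(V)):
        for A in itertools.combinations(sorted(V), r):
            A = frozenset(A)
            if best is None or f(A) < best[0]:
                best = (f(A), A, V - A)
    return best

mi, A, B = best_partition(f, V)
print(f"weakest cut: {sorted(A)} | {sorted(B)}, I = {mi:.4f}")
```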
Slide 33: Finding ML decompositions …
Optimal to one level.
Slide 34: Discriminative structure
The goal might be classification using a generative model.
Distinction between parameters & structure.
Two possible goals:
–1) find one global structure that classifies well
–2) find class-specific structure (one per class)
In either case, finding a good discriminative structure may render discriminative parameter learning less necessary.
Slide 35: Optimal discriminative structure procedure …
Choose k (for now, let's just assume k=1).
Find the tree that best satisfies: [criterion shown as an equation on the slide]
Slide 36: Properties
Options:
–We can fix the structure and train the parameters using either maximum likelihood (generative) or maximum conditional likelihood (discriminative).
–We can learn a discriminative structure, and then train either generatively or discriminatively.
–In all cases, assume appropriate regularization.
Bad news: the KL-divergence is not decomposable with respect to the tree in the discriminative case.
Goal: identify a local discriminative measure on edges in a graph (analogous to mutual information in the generative case).
Slide 37: EAR measure
EAR (explaining-away residual) measure (Bilmes '98).
The goal is to maximize EAR:
–Intuition: prefer variable pairs that are dependent class-conditionally but otherwise (marginally) independent.
EAR is an approximation to the expected log conditional posterior; it is exact for independent "auxiliary" variables.
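The EAR formula itself appears on the slide as an equation that is not reproduced here; the form usually cited from Bilmes '98 is EAR(X_i, X_j) = I(X_i ; X_j | C) - I(X_i ; X_j), which matches the intuition above (class-conditional dependence minus marginal dependence). A sketch under that assumption, weighting edges by estimated EAR and taking a maximum spanning tree; the data layout and helper names are illustrative.

```python
import numpy as np
import networkx as nx

def mi(x, y):
    """Empirical mutual information (nats) between two discrete vectors."""
    total = 0.0
    for a in np.unique(x):
        for b in np.unique(y):
            pab = np.mean((x == a) & (y == b))
            if pab > 0:
                total += pab * np.log(pab / (np.mean(x == a) * np.mean(y == b)))
    return total

def cond_mi(x, y, c):
    """I(X;Y|C) = sum over classes v of p(C=v) * I(X;Y | C=v), estimated empirically."""
    return sum(np.mean(c == v) * mi(x[c == v], y[c == v]) for v in np.unique(c))

def ear_tree(data, labels):
    """data: (num_samples, num_vars) discrete features; labels: class per sample."""
    n_vars = data.shape[1]
    G = nx.Graph()
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            w = cond_mi(data[:, i], data[:, j], labels) - mi(data[:, i], data[:, j])
            G.add_edge(i, j, weight=w)   # assumed EAR weight on edge (i, j)
    return nx.maximum_spanning_tree(G)   # tree maximizing total EAR weight
```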
Slide 38: Conditional mutual information?
Conditional mutual information is not guaranteed to discriminate well: building an MST using I(X_1 ; X_2 | C) as edge weights will not necessarily produce a tree with good classification properties. EAR fixes this in certain cases.
Example: 3 features (X_1, X_2, X_3) and a class C.
Slides 39-40: Generative training/structure
Slide 41: General Structure Learning