Machine Learning CUNY Graduate Center Lecture 6: Junction Tree Algorithm.


Today: Graphical Models – Representing conditional dependence graphically – Inference – Junction Tree Algorithm

Undirected Graphical Models In an undirected graphical model there is no trigger/response relationship. Undirected graphs represent slightly different conditional independence relationships than directed graphs: conditional independence is determined by graph separation. [Figure: an undirected graph over nodes A, B, C, D]
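Stated formally (a standard fact, not copied from the slide): for disjoint node sets A, B, S, if every path from a node in A to a node in B passes through S, then

    X_A \perp X_B \mid X_S .

For example, in a diamond-shaped graph with edges A–B, A–C, B–D, C–D, every path from A to D passes through {B, C}, so A is conditionally independent of D given {B, C}.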

Undirected Graphical Models Different relationships can be described with directed and undirected graphs. Cannot represent: [example figure not included in the transcript]

Undirected Graphical Models Different relationships can be described with directed and undirected graphs. [Second example figure, not included in the transcript]

Probabilities in Undirected Graphs Clique: a set of nodes such that there is an edge between every pair of nodes in the set. We will define the joint probability in terms of functions defined over the cliques of the graphical model.

Probabilities in Undirected Graphs Potential functions: positive functions over groups of connected variables (represented by the maximal cliques of the graphical model). – Maximal clique: a clique A is maximal if it is not a proper subset of any other clique B. The normalizing constant guarantees a sum of 1.
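Written out explicitly (the standard factorization, consistent with the normalizer mentioned on the slide):

    p(x) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \psi_C(x_C), \qquad Z = \sum_{x} \prod_{C \in \mathcal{C}} \psi_C(x_C)

where \mathcal{C} is the set of maximal cliques, each \psi_C is a positive potential function, and the partition function Z is what guarantees the sum of 1.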

Logical Inference In logical inference, nodes are binary and edges represent gates – AND, OR, XOR, NAND, NOR, NOT, etc. Inference: given observed variables, predict the others. Problems: uncertainty, conflicts, inconsistency. [Figure: a small circuit built from AND, NOT, and XOR gates]

Probabilistic Inference Rather than a logic network, use a Bayesian network. Probabilistic inference: given observed variables, calculate marginals over the others. Bayesian networks generalize logic networks. [Figure: the same AND/NOT/XOR circuit, with the NOT gate written as the CPT below]

    p(B | A)    B=TRUE    B=FALSE
    A=TRUE        0          1
    A=FALSE       1          0

Probabilistic Inference (continued) The same network, but with the deterministic NOT gate replaced by a soft, "NOT-ish" CPT: the same table structure over A and B, with probabilities rather than hard 0/1 entries. [Table values not recoverable from the transcript]

Inference in Graphical Models General problem: given a graphical model and any subsets of observed and query variables, find the conditional distribution of the query variables given the observations. The direct approach can be quite inefficient if there are many irrelevant variables.
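The quantity to find (the formula itself did not survive the transcript; this is presumably the standard conditional query): writing the evidence variables as x_E, the query variables as x_F, and the remaining variables as x_R,

    p(x_F \mid x_E) = \frac{p(x_F, x_E)}{p(x_E)} = \frac{\sum_{x_R} p(x_F, x_E, x_R)}{\sum_{x_F, x_R} p(x_F, x_E, x_R)}

The naive sums range over every configuration of the irrelevant variables, which is where the inefficiency comes from.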

Marginal Computation Graphical models provide efficient storage by decomposing p(x) into conditional probabilities, and they have a simple MLE result. We now look for efficient calculation of marginals, which will lead to efficient inference.

Brute Force Marginal Calculation First approach: we have the CPTs and the graphical model, so we can compute arbitrary joint probabilities and sum out the unwanted variables. – Assume 6 variables.
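A minimal sketch of what the brute-force approach amounts to, using a hypothetical chain of 6 binary variables (the factorization and the numbers are assumptions for illustration, not the slide's example):

    import numpy as np
    import itertools

    # Hypothetical chain model over 6 binary variables:
    # p(x1..x6) = p(x1) * p(x2|x1) * ... * p(x6|x5)
    rng = np.random.default_rng(0)
    p_x1 = np.array([0.6, 0.4])
    cpts = [rng.dirichlet([1, 1], size=2) for _ in range(5)]  # cpts[i][parent, child]

    # Build the full joint table: 2**6 = 64 entries.
    joint = np.zeros((2,) * 6)
    for assignment in itertools.product([0, 1], repeat=6):
        p = p_x1[assignment[0]]
        for i in range(5):
            p *= cpts[i][assignment[i], assignment[i + 1]]
        joint[assignment] = p

    # Marginal of x1: sum out x2..x6 -- exponential in the number of variables.
    marginal_x1 = joint.sum(axis=(1, 2, 3, 4, 5))
    print(marginal_x1)  # recovers p(x1) = [0.6, 0.4]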

Computation of Marginals Pass messages (small tables) around the graph. The messages are small functions that propagate potentials around an undirected graphical model. The inference technique is the Junction Tree Algorithm.

Junction Tree Algorithm Efficient message passing for undirected graphs – for directed graphs, first convert to undirected. Goal: efficient inference in graphical models.

Junction Tree Algorithm – Moralization – Introduce Evidence – Triangulate – Construct Junction Tree – Propagate Probabilities

Moralization Converts a directed graph to an undirected graph. Moralization "marries" the parents. – Insert an undirected edge between every pair of nodes that have a child in common. – Replace all directed edges with undirected edges.
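A minimal sketch of this step, assuming networkx is available (an illustration of the two rules above, not code from the lecture):

    import networkx as nx
    from itertools import combinations

    def moralize(dag: nx.DiGraph) -> nx.Graph:
        """Moralize a DAG: marry the parents of each node, then drop edge directions."""
        moral = nx.Graph()
        moral.add_nodes_from(dag.nodes)
        # Replace all directed edges with undirected edges.
        moral.add_edges_from(dag.edges)
        # Insert an undirected edge between every pair of parents of a common child.
        for child in dag.nodes:
            for u, v in combinations(dag.predecessors(child), 2):
                moral.add_edge(u, v)
        return moral

    # Example: the v-structure A -> C <- B becomes the triangle A-B-C.
    g = nx.DiGraph([("A", "C"), ("B", "C")])
    print(sorted(tuple(sorted(e)) for e in moralize(g).edges))
    # [('A', 'B'), ('A', 'C'), ('B', 'C')]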

Moralization Examples [Example figures not included in the transcript]

Junction Tree Algorithm – Moralization – Introduce Evidence – Triangulate – Construct Junction Tree – Propagate Probabilities

Introduce Evidence Given a moral graph, identify the observed variables. Reduce the probability functions, since some variables are now fixed; keep only probability functions over the remaining nodes.

Slices Differentiate potential functions from slices. Potential functions are related to joint probabilities over groups of nodes, but aren't necessarily correctly normalized, and can even be initialized to conditionals. A slice of a potential function is a row or column of the underlying table (in the discrete case) or an unnormalized marginal (in the continuous case).
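A small illustration of slicing (a hypothetical two-variable potential; the names and numbers are mine, not the lecture's): observing a variable selects one row or column of the table.

    import numpy as np

    # Hypothetical potential psi(A, B) over two binary variables, stored as a 2x2 table.
    psi_AB = np.array([[4.0, 1.0],
                       [2.0, 3.0]])   # rows index A, columns index B; need not sum to 1

    # Introducing the evidence B = 1 reduces psi(A, B) to the slice psi(A, B=1):
    slice_A = psi_AB[:, 1]            # unnormalized function of A only
    print(slice_A)                    # [1. 3.]

    # Normalization is deferred until the end, when a marginal is actually requested:
    print(slice_A / slice_A.sum())    # p(A | B=1) = [0.25 0.75]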

Separation from Introducing Evidence Observing nodes separates conditionally independent sets of variables. Normalization calculation: don't bother until the end, when we want to determine an individual marginal.

Junction Trees Construction of junction trees: – Each node represents a clique of variables. – Edges connect cliques. – There is a unique path from each node to the root. – Between each pair of adjacent clique nodes is a separator node. – Separators contain the intersection of the adjacent cliques' variables.

Triangulation Constructing a junction tree requires that the junction graph, made up of the cliques and separators of the undirected graph, can be turned into a tree. – Eliminate any chordless cycles of four or more nodes by adding edges. [Figure: a five-node example over A–E, before and after triangulation, with the resulting cliques and separators]

Junction Tree Algorithm – Moralization – Introduce Evidence – Triangulate – Construct Junction Tree – Propagate Probabilities

Triangulation When eliminating cycles there may be many choices about which edge to add. We want to keep the largest clique size small, since that keeps the potential functions small. Finding the triangulation that minimizes the largest clique size is NP-complete. A suboptimal triangulation is acceptable (polynomial time) and doesn't introduce many extra dimensions.
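A minimal sketch of one standard suboptimal, polynomial-time triangulation: greedy node elimination with a min-fill heuristic, assuming networkx (a generic illustration, not the lecture's specific procedure).

    import networkx as nx

    def triangulate_min_fill(graph: nx.Graph) -> nx.Graph:
        """Greedy elimination ordering: repeatedly eliminate the node whose neighbors
        need the fewest fill-in edges, adding those edges to the output graph."""
        g = graph.copy()          # working copy that we eliminate nodes from
        chordal = graph.copy()    # output graph that accumulates fill-in edges
        while g.number_of_nodes() > 0:
            # Pick the node whose elimination adds the fewest fill-in edges.
            def fill_in(node):
                nbrs = list(g.neighbors(node))
                return sum(1 for i in range(len(nbrs)) for j in range(i + 1, len(nbrs))
                           if not g.has_edge(nbrs[i], nbrs[j]))
            node = min(g.nodes, key=fill_in)
            nbrs = list(g.neighbors(node))
            for i in range(len(nbrs)):
                for j in range(i + 1, len(nbrs)):
                    if not g.has_edge(nbrs[i], nbrs[j]):
                        g.add_edge(nbrs[i], nbrs[j])
                        chordal.add_edge(nbrs[i], nbrs[j])
            g.remove_node(node)
        return chordal

    # Example: a chordless 4-cycle A-B-C-D-A gains one chord.
    cycle = nx.cycle_graph(["A", "B", "C", "D"])
    print(triangulate_min_fill(cycle).number_of_edges())  # 5 (one fill-in edge added)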

Triangulation Examples [Figures: example graphs over nodes A–F before and after triangulation; not included in the transcript]

Junction Tree Algorithm – Moralization – Introduce Evidence – Triangulate – Construct Junction Tree – Propagate Probabilities

Constructing Junction Trees Junction trees must satisfy the Running Intersection Property: – every clique node on the path between clique nodes V and W must contain all of the variables in V ∩ W. A spanning tree of the junction graph satisfies this property exactly when it has maximal total separator cardinality. [Figure: cliques ABD, BCD, CDE over nodes A–E, with separators BD and CD forming a valid junction tree, contrasted with an invalid arrangement]

Forming a Junction Tree Given a set of cliques, connect the nodes so that the Running Intersection Property holds: – maximize the cardinality of the separators. Use a maximum spanning tree (Kruskal's algorithm): – Initialize a tree with no edges. – Calculate the size of the separators between all pairs of cliques, O(N^2). – Connect the two cliques with the largest separator cardinality without creating a loop. – Repeat until all nodes are connected.
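A compact sketch of that procedure in Python (the clique sets and helper names are mine, chosen for illustration):

    from itertools import combinations

    def build_junction_tree(cliques):
        """Kruskal-style maximum spanning tree over cliques, weighted by separator size.
        `cliques` is a list of frozensets of variable names."""
        # Candidate edges: every pair of cliques, weighted by the intersection size.
        edges = sorted(((len(a & b), a, b) for a, b in combinations(cliques, 2)),
                       reverse=True, key=lambda e: e[0])
        parent = {c: c for c in cliques}          # union-find to avoid creating loops

        def find(c):
            while parent[c] != c:
                parent[c] = parent[parent[c]]
                c = parent[c]
            return c

        tree = []
        for weight, a, b in edges:
            if weight > 0 and find(a) != find(b):
                parent[find(a)] = find(b)
                tree.append((a, b, a & b))        # (clique, clique, separator)
        return tree

    cliques = [frozenset("ABD"), frozenset("BCD"), frozenset("CDE")]
    for a, b, sep in build_junction_tree(cliques):
        print(sorted(a), "--", sorted(sep), "--", sorted(b))
    # ['A', 'B', 'D'] -- ['B', 'D'] -- ['B', 'C', 'D']
    # ['B', 'C', 'D'] -- ['C', 'D'] -- ['C', 'D', 'E']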

Junction Tree Algorithm – Moralization – Introduce Evidence – Triangulate – Construct Junction Tree – Propagate Probabilities

Propagating Probabilities We have a valid junction tree – what can we do with it? Probabilities in junction trees: – De-absorb smaller cliques from maximal cliques. – This doesn't change anything, but it is a less compact description.

Conversion from Directed Graph Example conversion: represent the CPTs as potential and separator functions (with a normalizer). [Figure: the chain X1 → X2 → X3 → X4 becomes a junction tree with clique nodes X1X2, X2X3, X3X4 and separators X2, X3]
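For that chain, the standard identification (consistent with the slide's "with a normalizer" remark and with the initialization described later) is

    p(x) = p(x_1)\, p(x_2 \mid x_1)\, p(x_3 \mid x_2)\, p(x_4 \mid x_3) = \frac{\psi(x_1, x_2)\, \psi(x_2, x_3)\, \psi(x_3, x_4)}{\phi(x_2)\, \phi(x_3)}

for example with \psi(x_1, x_2) = p(x_1) p(x_2 \mid x_1), \psi(x_2, x_3) = p(x_3 \mid x_2), \psi(x_3, x_4) = p(x_4 \mid x_3), and the separator potentials \phi initialized to 1.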

Junction Tree Algorithm Goal: make the clique and separator marginals consistent. The Junction Tree Algorithm sends messages between cliques and separators until this consistency is reached.

Message Passing Send a message from a clique to a separator; the message is what the clique thinks the separator's marginal should be. Rescale the neighboring clique by each message from the separator so that agreement is reached. [Figure: cliques AB and BC connected by separator B] If they agree, we are finished; otherwise, iterate.
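For the two-clique figure (cliques AB and BC with separator B), one sweep of the usual absorption updates looks like this (a standard sketch, not equations copied from the slides):

    \phi_B^{*} = \sum_{A} \psi_{AB}, \qquad \psi_{BC}^{*} = \psi_{BC}\, \frac{\phi_B^{*}}{\phi_B}
    \phi_B^{**} = \sum_{C} \psi_{BC}^{*}, \qquad \psi_{AB}^{*} = \psi_{AB}\, \frac{\phi_B^{**}}{\phi_B^{*}}

After the forward and backward pass, summing either clique over its non-separator variable gives the same function of B, i.e. the two cliques agree on the marginal of B.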

Junction Tree Algorithm When convergence is reached, clique potentials are marginals and separator potentials are sub-marginals. p(x) is left unchanged by every message-passing step. This implies that, so long as p(x) is correctly represented in the potential functions, the JTA can be used to make each potential function correspond to an appropriate marginal without changing the overall probability function.

Converting a DAG to a Junction Tree Initialize the separators to 1 and the clique tables to the CPTs, then run the JTA to convert the potential functions (CPTs) to marginals. [Figure: a seven-variable DAG over X1–X7 and the corresponding junction tree]

Evidence in a Junction Tree Initialize as usual, but update with a slice rather than the whole table; the resulting marginals are then conditional on the evidence.

Efficiency of the Junction Tree Algorithm Construct CPTs – polynomial in the number of data points. Moralization – polynomial in the number of nodes. Introduce evidence – polynomial in the number of nodes. Triangulate – suboptimal is polynomial; optimal is NP-hard. Construct junction tree – polynomial in the number of cliques; identifying the cliques is polynomial in the number of nodes. Propagate probabilities – polynomial in the number of cliques, but exponential in the size of the cliques.

Hidden Markov Models A powerful graphical model for describing sequential information. [Figure: HMM with hidden states Q1–Q4 and observations X1–X4]
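For the four-step model in the figure, the joint distribution factorizes in the standard HMM form (stated here for reference; the next lecture covers it in detail):

    p(Q_{1:4}, X_{1:4}) = p(Q_1)\, p(X_1 \mid Q_1) \prod_{t=2}^{4} p(Q_t \mid Q_{t-1})\, p(X_t \mid Q_t)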

Research Projects Run a machine learning experiment – Identify a problem/task. – Find appropriate data. – Implement one or more ML algorithms. – Evaluate the performance. Write a report of the experiment – 4 pages including references – Abstract: one paragraph describing the experiment – Introduction: describe the problem/task – Data: describe the data set, features extracted, cleaning processes – Method: describe the algorithm/approach – Results: present and discuss results – Conclusion: summarize the experiment and results. Teams of two people are acceptable – requires a report from each participant (written independently) describing who was responsible for each component of the work.

Sample Problems/Tasks Vision/Graphics – Object classification – Facial recognition – Fingerprint identification – Handwriting recognition (non-English languages?) Language – Topic classification – Sentiment analysis – Speech recognition – Speaker identification – Punctuation restoration – Semantic segmentation – Recognition of emotion, sarcasm, etc. – SMS text normalization – Chat participant identification – Twitter classification – Twitter threading

Sample Problems/Tasks Games – Chess – Checkers – Poker – Blackjack – Go Recommenders (Collaborative Filtering) – Netflix – Courses – Jokes – Books – Facebook Video Classification – Motion classification – Segmentation

ML Topics to Explore in the Project – L1-regularization – Non-linear kernels – Loopy belief propagation – Non-parametric belief propagation – Soft decision trees – Analysis of neural network hidden layers – Structured learning – Generalized expectation – One-class learning – Evaluation measures (cluster evaluation, semi-supervised evaluation) – Graph embedding – Dimensionality reduction – Feature selection – Graphical model construction – Non-parametric Bayesian methods – Latent Dirichlet Allocation

Next Time – Hidden Markov Models – Sampling in Graphical Models