Machine Learning CUNY Graduate Center Lecture 21: Graphical Models
Today: Graphical Models –Representing conditional dependence graphically
Graphical Models and Conditional Independence: Graphical models are about probabilities more generally, but they are used in classification and clustering. Both Linear Regression and Logistic Regression use probabilistic models. Graphical models let us structure and visualize probabilistic models and the relationships between variables.
(Joint) Probability Tables: Represent multinomial joint probabilities between K variables as K-dimensional tables. Assuming D binary variables, how big is this table? What if we had multinomials with M entries?
Probability Models: What if the variables are independent? If x and y are independent: p(x, y) = p(x) p(y). The original distribution can be factored. How big is this table, if each variable is binary?
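The two table-size questions above come down to simple counting. A minimal sketch, assuming every variable has the same number of states (the function names are illustrative, not from the lecture):

```python
def joint_table_size(num_vars: int, num_values: int = 2) -> int:
    """Entries in a full joint table: one cell per combination of values."""
    return num_values ** num_vars


def independent_tables_size(num_vars: int, num_values: int = 2) -> int:
    """Entries when all variables are independent: one small table per variable."""
    return num_vars * num_values


D = 10
full = joint_table_size(D)               # 2**10 = 1024
factored = independent_tables_size(D)    # 10 * 2 = 20
```

The full table grows exponentially in the number of variables, while the fully independent model grows linearly.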
Conditional Independence: Independence assumptions are convenient (Naïve Bayes), but rarely true. More often some groups of variables are dependent, but others are independent. Still others are conditionally independent.
Conditional Independence: Two variables x and z are conditionally independent given y if p(x, z | y) = p(x | y) p(z | y). E.g. y = flu?, x = achiness?, z = headache?
Factorization of a joint: Assume a conditional independence relationship holds. How do you factorize the joint?
Factorization of a joint: What if there is no conditional independence? How do you factorize the joint?
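With no conditional independence to exploit, the product rule still gives a valid factorization in any variable order. A sketch for three variables, reusing the names x, y, z from the earlier slides (the ordering is an arbitrary choice):

```latex
p(x, y, z) = p(x)\, p(y \mid x)\, p(z \mid x, y)
\qquad\text{and in general}\qquad
p(x_1, \dots, x_K) = \prod_{k=1}^{K} p(x_k \mid x_1, \dots, x_{k-1})
```

The last factor conditions on every other variable, so it is as large as the full joint table; the factorization only saves space once some conditioning variables can be dropped.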
Structure of Graphical Models: Graphical models allow us to represent dependence relationships between variables visually –Directed graphical models are directed acyclic graphs (DAGs). –Nodes: random variables –Edges: dependence relationships –No edge: no direct dependence between the two variables –Direction of the edge: indicates a parent-child relationship –Parent: Source – Trigger –Child: Destination – Response
Example Graphical Models: Parents of a node i are denoted πi. Factorization of the joint in a graphical model: p(x0, …, xn) = Π_i p(xi | x_πi)
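The factorization over parents can be sketched directly: the probability of a full assignment is the product of one conditional table lookup per node. The three-node chain and all probabilities below are made-up values for illustration only.

```python
from itertools import product

# Hypothetical chain x -> y -> z with made-up binary tables.
parents = {"x": [], "y": ["x"], "z": ["y"]}

# cpt[node][parent_values][value] = p(node = value | parents = parent_values)
cpt = {
    "x": {(): [0.6, 0.4]},
    "y": {(0,): [0.9, 0.1], (1,): [0.2, 0.8]},
    "z": {(0,): [0.7, 0.3], (1,): [0.1, 0.9]},
}


def joint_probability(assignment):
    """Multiply p(x_i | parents of x_i) over all nodes for one full assignment."""
    prob = 1.0
    for node, pa in parents.items():
        pa_values = tuple(assignment[p] for p in pa)
        prob *= cpt[node][pa_values][assignment[node]]
    return prob


# Summing the joint over every assignment should give 1.
nodes = list(parents)
total = sum(
    joint_probability(dict(zip(nodes, values)))
    for values in product([0, 1], repeat=len(nodes))
)
```

Storing one table per node, indexed by the parents' values, mirrors the formula term by term.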
Basic Graphical Models: Independent Variables. Observations: When we observe a variable (fix its value from data), we color the node grey. Observing a variable allows us to condition on it. E.g. p(x,z|y). Given an observation we can generate pdfs for the other variables.
Example Graphical Models: X = cloudy? Y = raining? Z = wet ground? Markov Chain: x → y → z
Example Graphical Models: Markov Chain. Are x and z conditionally independent given y?
Example Graphical Models: Markov Chain. Yes: p(x, z | y) = p(x) p(y | x) p(z | y) / p(y) = p(x | y) p(z | y).
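The Markov chain question can also be checked numerically. The sketch below builds a joint for a chain x → y → z from made-up tables (the numbers are assumptions, not from the lecture) and measures how far p(x, z | y) is from p(x | y) p(z | y), and how far p(x, z) is from p(x) p(z).

```python
from itertools import product

# Made-up tables for the chain x -> y -> z (cloudy -> raining -> wet ground).
p_x = [0.5, 0.5]
p_y_given_x = [[0.9, 0.1], [0.3, 0.7]]   # indexed [x][y]
p_z_given_y = [[0.8, 0.2], [0.1, 0.9]]   # indexed [y][z]

joint = {
    (x, y, z): p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]
    for x, y, z in product([0, 1], repeat=3)
}


def max_conditional_gap(joint):
    """Largest |p(x,z|y) - p(x|y) p(z|y)|; zero means x and z are independent given y."""
    gap = 0.0
    for y in [0, 1]:
        p_y = sum(joint[x, y, z] for x in [0, 1] for z in [0, 1])
        for x, z in product([0, 1], repeat=2):
            p_xz = joint[x, y, z] / p_y
            p_x_y = sum(joint[x, y, k] for k in [0, 1]) / p_y
            p_z_y = sum(joint[k, y, z] for k in [0, 1]) / p_y
            gap = max(gap, abs(p_xz - p_x_y * p_z_y))
    return gap


def max_marginal_gap(joint):
    """Largest |p(x,z) - p(x) p(z)|; nonzero means x and z are dependent with y unobserved."""
    gap = 0.0
    for x, z in product([0, 1], repeat=2):
        p_xz = sum(joint[x, y, z] for y in [0, 1])
        p_x_only = sum(joint[x, y, k] for y in [0, 1] for k in [0, 1])
        p_z_only = sum(joint[k, y, z] for y in [0, 1] for k in [0, 1])
        gap = max(gap, abs(p_xz - p_x_only * p_z_only))
    return gap
```

For these tables the conditional gap is zero up to rounding while the marginal gap is not: observing y blocks the dependence between x and z.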
One Trigger Two Responses: X = achiness? Y = flu? Z = fever? x ← y → z
Example Graphical Models: Are x and z conditionally independent given y?
Example Graphical Models: Yes: p(x, z | y) = p(y) p(x | y) p(z | y) / p(y) = p(x | y) p(z | y).
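Two Triggers One Response: X = rain? Y = wet sidewalk? Z = spilled coffee? x → y ← z
Example Graphical Models: Are x and z conditionally independent given y?
Example Graphical Models: No. Here p(x, y, z) = p(x) p(z) p(y | x, z), so x and z are independent when y is unobserved, but in general observing y makes them dependent.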
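The two-triggers case behaves differently from the previous two, and a small numeric sketch makes the effect concrete. All probabilities below are invented for illustration; `prob_rain` is a hypothetical helper that computes p(x = 1 | evidence) by brute-force summation.

```python
from itertools import product

# Made-up numbers for x -> y <- z (rain -> wet sidewalk <- spilled coffee).
p_x = [0.8, 0.2]          # p(rain = 0), p(rain = 1)
p_z = [0.9, 0.1]          # p(coffee = 0), p(coffee = 1)
p_y1_given_xz = {(0, 0): 0.01, (1, 0): 0.9, (0, 1): 0.8, (1, 1): 0.99}


def joint(x, y, z):
    """p(x, y, z) = p(x) p(z) p(y | x, z)."""
    p_y1 = p_y1_given_xz[x, z]
    return p_x[x] * p_z[z] * (p_y1 if y == 1 else 1.0 - p_y1)


def prob_rain(**evidence):
    """p(x = 1 | evidence), where evidence may fix y and/or z."""
    num = den = 0.0
    for x, y, z in product([0, 1], repeat=3):
        values = {"x": x, "y": y, "z": z}
        if any(values[name] != v for name, v in evidence.items()):
            continue
        p = joint(x, y, z)
        den += p
        if x == 1:
            num += p
    return num / den


rain_given_coffee = prob_rain(z=1)              # y unobserved: coffee says nothing about rain
rain_given_wet = prob_rain(y=1)                 # wet sidewalk makes rain more likely
rain_given_wet_and_coffee = prob_rain(y=1, z=1) # coffee "explains away" the wet sidewalk
```

With y unobserved, learning about the coffee leaves the probability of rain at its prior; once the sidewalk is known to be wet, learning about the coffee lowers it.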
Factorization: (figure: directed graph over nodes x0, x1, x2, x3, x4, x5)
How large are the probability tables?
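Each node needs one table over itself and its parents, so the table sizes follow from the parent counts. The edges of the six-node figure did not survive in these notes, so the parent sets below are an assumed structure used only to illustrate the counting.

```python
# Assumed parent sets for a six-node example; the edges are illustrative only.
parents = {
    "x0": [],
    "x1": ["x0"],
    "x2": ["x0"],
    "x3": ["x1"],
    "x4": ["x2"],
    "x5": ["x1", "x4"],
}


def cpt_entries(parents, num_values=2):
    """Entries in each table p(x_i | parents): num_values ** (1 + number of parents)."""
    return {node: num_values ** (1 + len(pa)) for node, pa in parents.items()}


sizes = cpt_entries(parents)
factored_total = sum(sizes.values())   # 2 + 4 + 4 + 4 + 4 + 8 = 26
full_joint = 2 ** len(parents)         # 2**6 = 64
```

The saving grows quickly with the number of variables as long as each node has few parents.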
Model Parameters as Nodes: Treating model parameters as random variables, we can include them in a graphical model. Multivariate Bernoulli: µ0 → x0, µ1 → x1, µ2 → x2
Model Parameters as Nodes: Treating model parameters as random variables, we can include them in a graphical model. Multinomial: a single µ is the parent of x0, x1, x2
Naïve Bayes Classification: Observed variables xi are independent given the class variable y: y → x0, y → x1, y → x2. The distribution can be optimized using maximum likelihood on each variable separately. Can easily combine various types of distributions.
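Because each xi depends only on y, training reduces to estimating p(y) and each p(xi | y) separately. A minimal sketch for categorical features, using relative frequencies as the maximum likelihood estimates; the toy data and function names are assumptions for illustration.

```python
from collections import Counter, defaultdict


def fit_naive_bayes(rows, labels):
    """Estimate p(y) and each p(x_i | y) separately by relative frequency."""
    class_counts = Counter(labels)
    # feature_counts[(i, y)][value] = times feature i took `value` in class y
    feature_counts = defaultdict(Counter)
    for row, y in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[i, y][value] += 1
    prior = {y: c / len(labels) for y, c in class_counts.items()}
    likelihood = {
        key: {v: c / class_counts[key[1]] for v, c in counts.items()}
        for key, counts in feature_counts.items()
    }
    return prior, likelihood


def predict(row, prior, likelihood):
    """Return the class maximizing p(y) * prod_i p(x_i | y)."""
    scores = {}
    for y, p_y in prior.items():
        score = p_y
        for i, value in enumerate(row):
            # A value never seen with class y has ML probability 0.
            score *= likelihood[i, y].get(value, 0.0)
        scores[y] = score
    return max(scores, key=scores.get)


# Toy data: features are (achiness, fever); the label is flu or healthy.
rows = [(1, 1), (1, 0), (0, 1), (0, 0), (0, 0), (1, 1)]
labels = ["flu", "flu", "flu", "healthy", "healthy", "flu"]
prior, likelihood = fit_naive_bayes(rows, labels)
```

Each per-feature table could be swapped for a different distribution (for example a Gaussian for a continuous feature) without touching the others, which is what makes mixing distribution types easy.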
Graphical Models: Graphical representation of dependency relationships. Directed Acyclic Graphs. Nodes as random variables. Edges define dependency relations. What can we do with Graphical Models? –Learn parameters to fit data –Understand independence relationships between variables –Perform inference (marginals and conditionals) –Compute likelihoods for classification.
Plate Notation: To indicate a repeated variable, draw a plate around it. (figure: y with children x0, x1, …, xn, redrawn as y → xi inside a plate labeled n)
Completely observed Graphical Model: Observations for every node. Simplest (least general) graph: assume every variable is independent.
Completely observed Graphical Model: Observations for every node. Second simplest graph: assume complete dependence.
Maximum Likelihood: Each node has a conditional probability table, θ. Given the tables, we can construct the pdf. Use Maximum Likelihood to find the best settings of θ.
Maximum Likelihood
Count functions: Count the number of times something appears in the data.
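A count function can be sketched as a tally over fully observed samples. Here each sample is assumed to be a dict from variable name to value; the helper name and the toy data are illustrative.

```python
from collections import Counter


def count_configurations(samples, variables):
    """m(values): how many samples assign `values` to the listed variables."""
    return Counter(tuple(s[v] for v in variables) for s in samples)


# Toy fully observed data over two binary variables.
samples = [
    {"x": 0, "y": 0}, {"x": 0, "y": 0}, {"x": 0, "y": 1},
    {"x": 1, "y": 1}, {"x": 1, "y": 1}, {"x": 1, "y": 0}, {"x": 1, "y": 1},
]
m_xy = count_configurations(samples, ["x", "y"])   # joint counts m(x, y)
m_x = count_configurations(samples, ["x"])         # marginal counts m(x)
```

Counts over a subset of variables are sums of the joint counts, e.g. m(x) is the sum over y of m(x, y).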
Maximum Likelihood: Define a function: the log likelihood of θ. Constraint: each conditional distribution in θ sums to one.
Maximum Likelihood: Use Lagrange Multipliers.
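The Lagrange multiplier step can be sketched for a single row of one table, that is, for one fixed setting of a node's parents. The symbols are assumptions introduced here: θ_k is the probability of value k in that row, m_k is the number of times value k was observed with that parent setting, and λ is the multiplier.

```latex
\begin{align*}
\mathcal{L}(\theta, \lambda) &= \sum_k m_k \log \theta_k + \lambda \Bigl(1 - \sum_k \theta_k\Bigr) \\
\frac{\partial \mathcal{L}}{\partial \theta_k} &= \frac{m_k}{\theta_k} - \lambda = 0
  \quad\Longrightarrow\quad \theta_k = \frac{m_k}{\lambda} \\
\sum_k \theta_k = 1 &\quad\Longrightarrow\quad \lambda = \sum_j m_j \\
\theta_k &= \frac{m_k}{\sum_j m_j}
\end{align*}
```

Under these assumptions the maximum likelihood estimate for each table entry is a ratio of counts.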
Maximum A Posteriori Training: Bayesians would never do that; the thetas need a prior.
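One common way to give the thetas a prior is a Dirichlet distribution over each table row; this particular choice is an assumption here, not something the slide specifies. With a symmetric Dirichlet(alpha) prior and alpha ≥ 1, the MAP estimate has a closed form that adds alpha − 1 pseudo-counts to every value.

```python
def map_estimate(counts, alpha=2.0):
    """MAP estimate of a categorical distribution under a symmetric Dirichlet(alpha) prior.

    counts: dict mapping every possible value to how often it was observed.
    Requires alpha >= 1; alpha == 1 is a flat prior and reproduces maximum likelihood.
    """
    if alpha < 1:
        raise ValueError("this closed form assumes alpha >= 1")
    k = len(counts)
    total = sum(counts.values())
    denom = total + k * (alpha - 1.0)
    return {value: (n + alpha - 1.0) / denom for value, n in counts.items()}


counts = {"sunny": 3, "rainy": 0}
ml = map_estimate(counts, alpha=1.0)         # {"sunny": 1.0, "rainy": 0.0}
smoothed = map_estimate(counts, alpha=2.0)   # {"sunny": 0.8, "rainy": 0.2}
```

The prior keeps unseen values from being assigned probability zero, which the pure count ratio cannot do.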
Conditional Dependence Test: Can check conditional independence in a graphical model –“Is achiness (x3) independent of the flu (x0) given fever (x1)?” –“Is achiness (x3) independent of sinus infections (x2) given fever (x1)?”
D-Separation and Bayes Ball: Intuition: nodes are separated or blocked by sets of nodes. –E.g. nodes x1 and x2 “block” the path from x0 to x5, so x0 is conditionally independent of x5 given x1 and x2.
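Bayes Ball Algorithm: To test whether xa and xb are conditionally independent given xc: Shade the nodes in xc. Place a “ball” at each node in xa. Bounce the balls around the graph according to the rules. If no balls reach xb, then xa and xb are conditionally independent given xc.
Ten rules of Bayes Ball Theorem
Bayes Ball Example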
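The bouncing rules themselves are not reproduced in these notes, so the sketch below answers the same d-separation question with a different, equivalent technique: keep only the ancestors of the nodes involved, connect co-parents ("moralize"), drop edge directions, delete the observed nodes, and test reachability. The six-node parent sets are an assumed structure chosen to match the x0 to x5 examples; the function name is illustrative.

```python
from collections import deque


def d_separated(parents, a, b, given):
    """True if every node in `a` is d-separated from every node in `b` given `given`.

    parents: dict mapping each node of a DAG to a list of its parents.
    """
    a, b, given = set(a), set(b), set(given)

    # 1. Ancestral set: the nodes involved plus all of their ancestors.
    keep, stack = set(), list(a | b | given)
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents[node])

    # 2. Moralize: undirected parent-child edges plus edges between co-parents.
    neighbors = {node: set() for node in keep}
    for child in keep:
        pa = parents[child]
        for p in pa:
            neighbors[child].add(p)
            neighbors[p].add(child)
        for i, p in enumerate(pa):
            for q in pa[i + 1:]:
                neighbors[p].add(q)
                neighbors[q].add(p)

    # 3. Breadth-first search from a, never entering an observed node.
    seen, queue = set(a - given), deque(a - given)
    while queue:
        node = queue.popleft()
        if node in b:
            return False
        for nxt in neighbors[node]:
            if nxt not in given and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True


# Assumed structure for the six-node example; the edges are illustrative only.
parents = {
    "x0": [], "x1": ["x0"], "x2": ["x0"],
    "x3": ["x1"], "x4": ["x2"], "x5": ["x1", "x4"],
}
blocked = d_separated(parents, {"x0"}, {"x5"}, {"x1", "x2"})
```

Under the assumed edges, observing x1 and x2 separates x0 from x5, while observing the common child x5 connects its parents x1 and x4.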
Undirected Graphs: What if we allow undirected graphs? What do they correspond to? Not Cause/Effect, or Trigger/Response, but general dependence. Example: Image pixels, each pixel is a Bernoulli –P(x11,…, x1M,…, xM1,…, xMM) –Bright pixels have bright neighbors. No parents, just probabilities. Undirected models such as this grid are called Markov Random Fields.
Undirected Graphs: Undirected separability is easy. To check conditional independence of A and B given C, check the graph reachability of A and B without going through nodes in C. (figure: undirected graph over nodes A, B, C, D)
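The reachability check can be sketched as a breadth-first search that refuses to step onto any node in C. The edges of the four-node figure are not recoverable from these notes, so the graph below is a hypothetical one, and the function assumes A and B are not themselves in C.

```python
from collections import deque


def separated(neighbors, a, b, c):
    """True if no path from node a to node b avoids every node in c."""
    blocked = set(c)
    seen, queue = {a}, deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return False
        for nxt in neighbors[node]:
            if nxt not in blocked and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True


# Hypothetical undirected graph: A - C, A - D, C - D, C - B.
neighbors = {
    "A": {"C", "D"},
    "B": {"C"},
    "C": {"A", "B", "D"},
    "D": {"A", "C"},
}
a_b_given_c = separated(neighbors, "A", "B", {"C"})   # every path to B passes through C
```

Unlike the directed case, no special handling of common children is needed: conditioning on a node can only block paths, never open them.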
Next Time: Inference in Graphical Models –Belief Propagation –Junction Tree Algorithm