CS 2750: Machine Learning Directed Graphical Models


CS 2750: Machine Learning Directed Graphical Models Prof. Adriana Kovashka University of Pittsburgh March 28, 2017

Graphical Models
If no assumption of independence is made, we must estimate an exponential number of parameters, and no realistic amount of training data is sufficient.
If we assume all variables are independent, efficient training and inference are possible, but the assumption is too strong.
Graphical models use graphs over random variables to specify variable dependencies, allowing less restrictive independence assumptions while limiting the number of parameters that must be estimated.
Bayesian networks: directed acyclic graphs indicate causal structure.
Markov networks: undirected graphs capture general dependencies.
Adapted from Ray Mooney

Bayesian Networks
Directed Acyclic Graph (DAG): nodes are random variables, edges indicate causal influences.
Example (the burglary network): Burglary → Alarm ← Earthquake, with Alarm → JohnCalls and Alarm → MaryCalls.
Ray Mooney

Conditional Probability Tables
Each node has a conditional probability table (CPT) that gives the probability of each of its values given every possible combination of values for its parents (the conditioning case).
Roots (sources) of the DAG, which have no parents, are given prior probabilities.
P(B) = .001    P(E) = .002
P(A | B, E):  B=T, E=T: .95   B=T, E=F: .94   B=F, E=T: .29   B=F, E=F: .001
P(J | A):  A=T: .90   A=F: .05
P(M | A):  A=T: .70   A=F: .01
Ray Mooney
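As a concrete illustration (a minimal sketch, not from the slides; variable names are my own), the CPTs above can be written directly as Python dictionaries, with each entry giving the probability that the variable is true for one conditioning case:

```python
# Burglary-network CPTs as plain Python dictionaries.
# Each value is P(variable = True) for the given parent assignment.
P_B = 0.001                      # P(Burglary)
P_E = 0.002                      # P(Earthquake)
P_A = {                          # P(Alarm | Burglary, Earthquake)
    (True, True): 0.95,
    (True, False): 0.94,
    (False, True): 0.29,
    (False, False): 0.001,
}
P_J = {True: 0.90, False: 0.05}  # P(JohnCalls | Alarm)
P_M = {True: 0.70, False: 0.01}  # P(MaryCalls | Alarm)
```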

CPT Comments
The probability of false is not given, since each row must sum to 1.
The example requires only 10 parameters (1 for Burglary, 1 for Earthquake, 4 for Alarm, 2 for JohnCalls, 2 for MaryCalls) rather than the 2^5 − 1 = 31 needed to specify the full joint distribution.
The number of parameters in the CPT for a node is exponential in its number of parents.
Ray Mooney

Bayes Net Inference
Given known values for some evidence variables, determine the posterior probability of some query variables.
Example: Given that John calls, what is the probability that there is a Burglary?
John calls 90% of the time there is an Alarm, and the Alarm detects 94% of Burglaries, so people generally think the probability should be fairly high. However, this ignores the prior probability of John calling.
(Figure: the burglary network, with the queried probability marked “???”.)
Ray Mooney

Bayes Net Inference
Example: Given that John calls, what is the probability that there is a Burglary?
P(B) = .001, and John also calls 5% of the time when there is no Alarm (P(J | A=T) = .90, P(J | A=F) = .05).
So over 1,000 days we expect 1 Burglary, and John will probably call. However, he will also call with a false report about 50 times on average. So a call is about 50 times more likely to be a false report: P(Burglary | JohnCalls) ≈ 0.02.
(Figure: the burglary network, with the queried probability marked “???”.)
Ray Mooney
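To make this estimate concrete, here is a minimal sketch of exact inference by enumeration, reusing the CPT dictionaries from the earlier sketch (the helper function and its name are my own, not from the slides):

```python
from itertools import product

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) via the network's factorization."""
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    pa = P_A[(b, e)]
    p *= pa if a else 1 - pa
    pj = P_J[a]
    p *= pj if j else 1 - pj
    pm = P_M[a]
    p *= pm if m else 1 - pm
    return p

# Sum out the unobserved variables (E, A, M) with JohnCalls fixed to True.
num = sum(joint(True, e, a, True, m)
          for e, a, m in product([True, False], repeat=3))
den = sum(joint(b, e, a, True, m)
          for b, e, a, m in product([True, False], repeat=4))
print(num / den)  # ≈ 0.016, in line with the slide's rough estimate of ~0.02
```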

Bayesian Networks No independence encoded Chris Bishop

Bayesian Networks More interesting: Some independences encoded General Factorization Chris Bishop
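The factorization equation on this slide was an image; the standard general form (I am assuming it matches Bishop's usual statement) is

  p(\mathbf{x}) = \prod_{k=1}^{K} p(x_k \mid \mathrm{pa}_k),

where \mathrm{pa}_k denotes the set of parents of x_k in the graph.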

Conditional Independence a is independent of b given c Equivalently Notation Chris Bishop
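The defining equations and the notation referred to here were images; the standard statements, which I believe match the slide, are

  p(a \mid b, c) = p(a \mid c),

or equivalently

  p(a, b \mid c) = p(a \mid b, c)\, p(b \mid c) = p(a \mid c)\, p(b \mid c),

written in shorthand as a \perp\!\!\!\perp b \mid c.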

Conditional Independence: Example 1 Node c is “tail to tail” for path from a to b: No independence of a and b follows from this path Chris Bishop

Conditional Independence: Example 1 Node c is “tail to tail” for path from a to b: Observing c blocks the path thus making a and b conditionally independent Chris Bishop
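A brief reconstruction of the algebra for this tail-to-tail example (Example 1), where the graph is a ← c → b and hence p(a, b, c) = p(a \mid c)\, p(b \mid c)\, p(c):

  p(a, b) = \sum_c p(a \mid c)\, p(b \mid c)\, p(c),

which does not factorize as p(a)\, p(b) in general, but

  p(a, b \mid c) = \frac{p(a, b, c)}{p(c)} = p(a \mid c)\, p(b \mid c),

so observing c makes a and b conditionally independent.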

Conditional Independence: Example 2
For the chain a → c → b:
  \sum_c p(c \mid a)\, p(b \mid c) = \sum_c p(c \mid a)\, p(b \mid c, a) = \sum_c p(b, c \mid a) = p(b \mid a)
(the second equality uses the conditional independence shown on the next slide), so p(a, b) = p(a) \sum_c p(c \mid a)\, p(b \mid c) = p(a)\, p(b \mid a).
Node c is “head to tail” for the path from a to b: no independence of a and b follows from this path.
Chris Bishop

Conditional Independence: Example 2 Node c is “head to tail” for path from a to b: Observing c blocks the path thus making a and b conditionally independent Chris Bishop

Conditional Independence: Example 3 Note: this is the opposite of Example 1, with c unobserved. Node c is “head to head” for path from a to b: Unobserved c blocks the path thus making a and b independent Chris Bishop
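A brief reconstruction of the head-to-head case, where the graph is a → c ← b and hence p(a, b, c) = p(a)\, p(b)\, p(c \mid a, b):

  p(a, b) = \sum_c p(a)\, p(b)\, p(c \mid a, b) = p(a)\, p(b),

so with c unobserved, a and b are marginally independent.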

Conditional Independence: Example 3 Note: this is the opposite of Example 1, with c observed. Node c is “head to head” for path from a to b: Observing c unblocks the path thus making a and b conditionally dependent Chris Bishop

Bayesian Networks More interesting: Some independences encoded General Factorization Chris Bishop

“Am I out of fuel?”
B = Battery (0=flat, 1=fully charged)
F = Fuel Tank (0=empty, 1=full)
G = Fuel Gauge Reading (0=empty, 1=full)
and hence the joint distribution factorizes as p(B, F, G) = p(B) p(F) p(G | B, F).
Chris Bishop

“Am I out of fuel?” Probability of an empty tank increased by observing G = 0. Chris Bishop

“Am I out of fuel?” The probability of an empty tank is reduced by observing B = 0. This is referred to as “explaining away”. Chris Bishop
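A small numerical sketch of the fuel example. The probability tables on the slides did not survive the transcript, so the numbers below are illustrative assumptions (they follow the commonly used PRML-style values and should be checked against the original):

```python
# Illustrative sketch of the "Am I out of fuel?" network; numbers assumed.
P_B1 = 0.9   # P(B=1): battery charged
P_F1 = 0.9   # P(F=1): tank full
P_G1 = {(1, 1): 0.8, (1, 0): 0.2, (0, 1): 0.2, (0, 0): 0.1}  # P(G=1 | B, F)

def p_joint(b, f, g):
    pb = P_B1 if b else 1 - P_B1
    pf = P_F1 if f else 1 - P_F1
    pg = P_G1[(b, f)] if g else 1 - P_G1[(b, f)]
    return pb * pf * pg

# Prior probability of an empty tank.
p_F0 = 1 - P_F1                                                    # 0.1
# Observing G = 0 raises the probability of an empty tank ...
p_F0_given_G0 = (sum(p_joint(b, 0, 0) for b in (0, 1)) /
                 sum(p_joint(b, f, 0) for b in (0, 1) for f in (0, 1)))
# ... and additionally observing B = 0 "explains away" the empty reading,
# pulling the probability back down.
p_F0_given_G0_B0 = p_joint(0, 0, 0) / sum(p_joint(0, f, 0) for f in (0, 1))
print(p_F0, p_F0_given_G0, p_F0_given_G0_B0)   # ~0.1, ~0.26, ~0.11
```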

D-separation
A, B, and C are non-intersecting subsets of nodes in a directed graph.
A path from A to B is blocked if it contains a node such that either
(a) the arrows on the path meet head-to-tail or tail-to-tail at the node, and the node is in the set C, or
(b) the arrows meet head-to-head at the node, and neither the node nor any of its descendants is in the set C.
If all paths from A to B are blocked, A is said to be d-separated from B by C.
If A is d-separated from B by C, the joint distribution over all variables in the graph satisfies A ⊥ B | C.
Chris Bishop
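As a sketch of how these blocking rules can be checked mechanically, the following uses the standard ancestral-graph/moralization construction, which is equivalent to the path-blocking definition above; the graph encoding and names are my own, not from the slides:

```python
from collections import deque

def d_separated(parents, A, B, C):
    """parents: dict node -> set of parent nodes (the DAG).
       Returns True if A is d-separated from B by C (A, B, C disjoint)."""
    A, B, C = set(A), set(B), set(C)

    # 1. Restrict to A, B, C and all of their ancestors.
    relevant, frontier = set(), deque(A | B | C)
    while frontier:
        n = frontier.popleft()
        if n not in relevant:
            relevant.add(n)
            frontier.extend(parents.get(n, ()))

    # 2. Moralize: connect co-parents, then drop edge directions.
    undirected = {n: set() for n in relevant}
    for n in relevant:
        ps = list(parents.get(n, ()))
        for p in ps:
            undirected[n].add(p); undirected[p].add(n)
        for i, p in enumerate(ps):          # "marry" the parents
            for q in ps[i + 1:]:
                undirected[p].add(q); undirected[q].add(p)

    # 3. Remove the conditioning set, then 4. check reachability A -> B.
    seen, frontier = set(C), deque(A - C)
    while frontier:
        n = frontier.popleft()
        if n in B:
            return False                    # found an unblocked path
        if n not in seen:
            seen.add(n)
            frontier.extend(undirected[n] - seen)
    return True

# Example: the burglary network.
dag = {"B": set(), "E": set(), "A": {"B", "E"}, "J": {"A"}, "M": {"A"}}
print(d_separated(dag, {"B"}, {"M"}, {"A"}))   # True
print(d_separated(dag, {"B"}, {"E"}, {"A"}))   # False (explaining away)
```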

D-separation: Example Chris Bishop

D-separation: I.I.D. Data The xi’s are conditionally independent given the shared parameter. Are the xi’s marginally independent? Chris Bishop
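The equations on this slide were images; assuming the usual i.i.d. setup (observations x_1, …, x_N sharing an unknown parameter, e.g. a Gaussian mean \mu), the standard forms are

  p(x_1, \dots, x_N \mid \mu) = \prod_{n=1}^{N} p(x_n \mid \mu),

but marginally

  p(x_1, \dots, x_N) = \int p(\mu) \prod_{n=1}^{N} p(x_n \mid \mu)\, d\mu,

which does not factorize over n, so the x_n are not marginally independent in general.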

Naïve Bayes Conditioned on the class z, the distributions of the input variables x1, …, xD are independent. Are the x1, …, xD marginally independent? Chris Bishop
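The factorization behind naïve Bayes (reconstructed, since the slide's equation was an image):

  p(z, x_1, \dots, x_D) = p(z) \prod_{d=1}^{D} p(x_d \mid z).

The class node z is tail-to-tail on every path between two inputs, so the x_d are conditionally independent given z, but not marginally independent once z is summed out.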

Bayesian Curve Fitting Polynomial Chris Bishop

Bayesian Curve Fitting Plate Chris Bishop

Bayesian Curve Fitting Input variables and explicit hyperparameters Chris Bishop
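A reconstruction of the joint distribution this model encodes, following Bishop's polynomial curve-fitting example (the exact symbols on the slide are an assumption):

  p(\mathbf{t}, \mathbf{w} \mid \mathbf{x}, \alpha, \sigma^2) = p(\mathbf{w} \mid \alpha) \prod_{n=1}^{N} p(t_n \mid \mathbf{w}, x_n, \sigma^2),

with a Gaussian prior p(\mathbf{w} \mid \alpha) over the polynomial coefficients and Gaussian likelihood terms p(t_n \mid \mathbf{w}, x_n, \sigma^2) = \mathcal{N}(t_n \mid y(x_n, \mathbf{w}), \sigma^2).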

Bayesian Curve Fitting —Learning Condition on data Chris Bishop
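Conditioning on the observed targets gives the posterior over the weights (again a reconstruction, under the notation assumed above):

  p(\mathbf{w} \mid \mathbf{t}, \mathbf{x}, \alpha, \sigma^2) \propto p(\mathbf{w} \mid \alpha) \prod_{n=1}^{N} p(t_n \mid \mathbf{w}, x_n, \sigma^2).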

Bayesian Curve Fitting —Prediction Predictive distribution (reconstructed below). Chris Bishop
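The predictive distribution for a new input \hat{x} (reconstructed under the same assumed notation):

  p(\hat{t} \mid \hat{x}, \mathbf{x}, \mathbf{t}) \propto \int p(\hat{t} \mid \hat{x}, \mathbf{w}, \sigma^2)\, p(\mathbf{w} \mid \alpha) \prod_{n=1}^{N} p(t_n \mid \mathbf{w}, x_n, \sigma^2)\, d\mathbf{w}.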

Discrete Variables General joint distribution (two K-state variables): K^2 − 1 parameters. Independent joint distribution: 2(K − 1) parameters. Chris Bishop

Discrete Variables General joint distribution over M variables: K^M − 1 parameters. M-node Markov chain: K − 1 + (M − 1)K(K − 1) parameters. Chris Bishop
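A quick worked check of these counts, with K = 2 states and M = 3 variables (my own example, not from the slides): the general joint distribution needs K^M − 1 = 2^3 − 1 = 7 parameters; a 3-node Markov chain x_1 → x_2 → x_3 needs (K − 1) + (M − 1)K(K − 1) = 1 + 2·2·1 = 5; fully independent variables need only M(K − 1) = 3.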

Discrete Variables: Bayesian Parameters Chris Bishop

Discrete Variables: Bayesian Parameters Shared prior Chris Bishop

The Markov Blanket Factors independent of xi cancel between numerator and denominator. The parents, children and co-parents of xi form its Markov blanket, the minimal set of nodes that isolate xi from the rest of the graph. Chris Bishop
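The conditional this slide refers to can be written explicitly (a reconstruction; ch(i) denotes the children of x_i, and the co-parents enter through the children's conditional distributions):

  p(x_i \mid x_{\{j \neq i\}}) = \frac{p(x_i \mid \mathrm{pa}_i) \prod_{k \in \mathrm{ch}(i)} p(x_k \mid \mathrm{pa}_k)}{\int p(x_i \mid \mathrm{pa}_i) \prod_{k \in \mathrm{ch}(i)} p(x_k \mid \mathrm{pa}_k)\, dx_i},

since all other factors of the joint distribution appear in both numerator and denominator and cancel.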