Directed Graphical Probabilistic Models: the sequel


Directed Graphical Probabilistic Models: the sequel William W. Cohen Machine Learning 10-601 Feb 2008

Summary of Monday (1): Bayes nets. Many problems can be solved using the joint probability P(X1,…,Xn). Bayes nets describe a way to compactly write the joint. [Slide figure: the Monty Hall network, with nodes A (first guess), B (the money), C (the goat), D (stick or swap?), E (second guess) and CPTs P(A), P(B), P(C|A,B), …, P(E|A,C,D); the graph encodes the conditional independences.]
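To make "compactly write the joint" concrete, here is a minimal sketch, assuming a hypothetical three-node network A → C ← B (not the Monty Hall net on the slide): the joint is just a product of CPT entries.

```python
# Minimal sketch (hypothetical three-node network A -> C <- B, not the Monty
# Hall example on the slide): the joint of a Bayes net is a product of CPT
# entries, P(x1,...,xn) = prod_i P(xi | parents(xi)).

P_A = {0: 0.6, 1: 0.4}                       # P(A)
P_B = {0: 0.7, 1: 0.3}                       # P(B)
P_C1_given_AB = {                            # P(C=1 | A, B)
    (0, 0): 0.1, (0, 1): 0.5,
    (1, 0): 0.5, (1, 1): 0.9,
}

def joint(a, b, c):
    """P(A=a, B=b, C=c) as the product of the three CPT entries."""
    p_c1 = P_C1_given_AB[(a, b)]
    p_c = p_c1 if c == 1 else 1.0 - p_c1
    return P_A[a] * P_B[b] * p_c

# Sanity check: the joint sums to 1 over all eight assignments.
print(sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)))
```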

Aside: conditional independence. (Chain rule, always true.) (Fancy version of the chain rule.) (Definition of conditional independence.) Caveat divisor: we'll usually assume no probabilities are zero, so division is safe.
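The equations these labels annotate are standard; a reconstruction, assuming the usual definition of conditional independence (I<X,E,Y> meaning X is independent of Y given E):

```latex
\begin{align*}
P(X, Y \mid E) &= P(X \mid Y, E)\,P(Y \mid E)
  && \text{(chain rule, always true; here the ``fancy'' version, conditioned on } E\text{)}\\
P(X \mid Y, E) &= P(X \mid E)
  && \text{(definition of conditional independence, } I\langle X, E, Y\rangle\text{)}\\
\Rightarrow\; P(X, Y \mid E) &= P(X \mid E)\,P(Y \mid E)
  && \text{(and the converse holds when we can divide by } P(Y \mid E) > 0\text{)}
\end{align*}
```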

Summary of Monday (2): d-separation. There are three ways a path from X to Y given evidence E can be blocked. X is d-separated from Y given E iff all paths from X to Y given E are blocked. If X is d-separated from Y given E, then I<X,E,Y>. [Slide figure: the three blocking configurations, each involving an intermediate node Z on the path.]

d-separation continued. [Slide figure: the chain X → E → Y, with CPT tables P(X), P(E|X), P(Y|E).] Question: is X independent of Y? It depends on the CPTs. This is why d-separation implies independence, but not the converse.
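A minimal numerical sketch of this point, assuming a chain X → E → Y with hypothetical CPT values (loosely echoing the 0.01/0.99 tables on the slide):

```python
# Chain X -> E -> Y, no evidence: X and Y are NOT d-separated, so the graph
# alone does not guarantee independence; whether P(Y|X) == P(Y) depends on the
# actual CPT numbers. Hence d-separation implies independence, not the converse.

def p_y1_given_x(x, p_e1_given_x, p_y1_given_e):
    """P(Y=1 | X=x) = sum_e P(E=e | X=x) * P(Y=1 | E=e)."""
    p_e1 = p_e1_given_x[x]
    return (1 - p_e1) * p_y1_given_e[0] + p_e1 * p_y1_given_e[1]

p_e1_given_x = {0: 0.01, 1: 0.99}            # P(E=1 | X)

# Case 1: P(Y|E) really depends on E  ->  X and Y are dependent.
p_y1_given_e = {0: 0.01, 1: 0.99}
print(p_y1_given_x(0, p_e1_given_x, p_y1_given_e),   # ~0.0198
      p_y1_given_x(1, p_e1_given_x, p_y1_given_e))   # ~0.9802  (different => dependent)

# Case 2: P(Y|E) ignores E  ->  X and Y happen to be independent,
# even though they are not d-separated.
p_y1_given_e = {0: 0.5, 1: 0.5}
print(p_y1_given_x(0, p_e1_given_x, p_y1_given_e),   # 0.5
      p_y1_given_x(1, p_e1_given_x, p_y1_given_e))   # 0.5      (equal => independent)
```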

d-separation continued. Question: is X independent of Y given E? Yes!

d-separation continued. Question: is X independent of Y given E? Yes! The derivation uses Bayes rule (and a fancier, everything-conditioned version of Bayes rule) together with the result from the previous slide.

An aside: computations with Bayes nets. [Slide figure: the X, E, Y network and a query.] Main point: inference has no preferred direction in a Bayes net.
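A tiny sketch of this point, with hypothetical numbers and the same kind of chain as above: the CPTs point one way, but Bayes rule answers queries against the arrows.

```python
# Chain X -> E -> Y. The CPTs are written "causally" as P(E|X) and P(Y|E),
# but Bayes rule lets us query against the arrows, e.g. P(E=1 | Y=1).
# Numbers are hypothetical. Inference has no preferred direction.

p_x1 = 0.5
p_e1_given_x = {0: 0.01, 1: 0.99}    # P(E=1 | X)
p_y1_given_e = {0: 0.01, 1: 0.99}    # P(Y=1 | E)

p_e1 = (1 - p_x1) * p_e1_given_x[0] + p_x1 * p_e1_given_x[1]    # P(E=1)
p_y1 = (1 - p_e1) * p_y1_given_e[0] + p_e1 * p_y1_given_e[1]    # P(Y=1)
p_e1_given_y1 = p_y1_given_e[1] * p_e1 / p_y1                   # Bayes rule
print(p_e1, p_y1, p_e1_given_y1)     # 0.5, 0.5, 0.99
```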

d-separation. [Figure-only slide: the three blocking configurations between X and Y given E, each through a node Z.]

d-separation. [Figure-only slide: the three configurations again, with question marks on the cases examined next.]

d-separation continued. Question: is X independent of Y given E? Yes!

d-separation continued. Question: is X independent of Y? No. [Slide tables: the prior P(E) = 0.5 and the CPTs used for the numeric check.]

d-separation. [Figure-only slide: the configurations with question marks on the remaining cases.]

d-separation continued. Question: is X independent of Y? Yes!

d-separation continued. Question: is X independent of Y given E? No! [Slide tables: P(E|X,Y) and the joint P(E,X,Y), used for the numeric check.]

“Explaining away”. [Slide figure: the collider X → E ← Y; marginally X and Y are independent (YES), but given E they are not (NO).] This is “explaining away”: E is a common symptom of two causes, X and Y. After observing E=1, both X and Y become more probable. After observing E=1 and X=1, Y becomes less probable, since X alone is enough to “explain” E.
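A small enumeration sketch of explaining away, with hypothetical priors and a noisy-OR-style CPT (these numbers are illustrative, not the slide's):

```python
# Collider X -> E <- Y: X and Y are independent a priori, but become coupled
# once the common effect E is observed. Numbers below are hypothetical
# (noisy-OR-like), not taken from the slide.
from itertools import product

p_x1, p_y1 = 0.1, 0.1                        # priors P(X=1), P(Y=1)
p_e1 = {(0, 0): 0.01, (0, 1): 0.9,           # P(E=1 | X, Y)
        (1, 0): 0.9,  (1, 1): 0.99}

def joint(x, y, e):
    px = p_x1 if x else 1 - p_x1
    py = p_y1 if y else 1 - p_y1
    pe = p_e1[(x, y)] if e else 1 - p_e1[(x, y)]
    return px * py * pe

def prob_y1(**evidence):
    """P(Y=1 | evidence) by brute-force enumeration."""
    num = den = 0.0
    for x, y, e in product((0, 1), repeat=3):
        assign = {'x': x, 'y': y, 'e': e}
        if any(assign[k] != v for k, v in evidence.items()):
            continue
        p = joint(x, y, e)
        den += p
        num += p if y == 1 else 0.0
    return num / den

print(prob_y1())            # prior P(Y=1)            ~0.10
print(prob_y1(e=1))         # after observing E=1:    ~0.51  (Y more probable)
print(prob_y1(e=1, x=1))    # after also X=1:         ~0.11  (Y drops back down)
```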

“Explaining away” and common sense. Historical note: classical logic is monotonic: the more you know, the more you can deduce. “Common-sense” reasoning is not monotonic: birds fly, but not after being cooked for 20 min/lb at 350°F. This led to numerous “non-monotonic logics” for AI. This example shows that Bayes nets are also not monotonic: if P(Y|E) is “your belief” in Y after observing E, and P(Y|X,E) is “your belief” in Y after observing E and X, then your belief in Y can decrease after you discover X.

A special case: linear-chain networks X1 → … → Xj → … → Xn. [Slide figure: the chain, with the evidence split by d-separation into a “forward” part at the X1 end and a “backward” part at the Xn end.]

A special case: linear-chain networks. Forward recursion: P(Xj | x1) = Σ_{x_{j-1}} P(Xj | x_{j-1}) P(x_{j-1} | x1), where P(Xj | x_{j-1}) is a CPT entry and P(x_{j-1} | x1) is the forward quantity computed recursively.

A special case: linear-chain networks. Backward recursion (via the chain rule): P(xn | Xj) = Σ_{x_{j+1}} P(x_{j+1} | Xj) P(xn | x_{j+1}), where P(x_{j+1} | Xj) is a CPT entry and P(xn | x_{j+1}) is the backward quantity computed recursively.

A special case: linear-chain networks. Instead of recursion: iteratively compute P(Xj|x1) from P(Xj-1|x1) (the “forward” probabilities), and iteratively compute P(xn|Xj) from P(xn|Xj+1) (the “backward” probabilities). The forward computations can be viewed as passing a “message” forward along the chain, and vice versa.
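A minimal sketch of these forward and backward passes, assuming binary variables, one shared CPT for every step, and observed values for X1 and Xn (all illustrative assumptions, not taken from the slides):

```python
# Chain X1 -> X2 -> ... -> Xn with X1 and Xn observed.
#   forward:  fwd[j][v] = P(Xj=v | X1=x1), from fwd[j-1] and the CPT
#   backward: bwd[j][v] = P(Xn=xn | Xj=v), from bwd[j+1] and the CPT
#   combine:  P(Xj=v | x1, xn) is proportional to fwd[j][v] * bwd[j][v]

n = 5
cpt = {0: 0.2, 1: 0.8}           # P(X_{j+1}=1 | X_j=u) = cpt[u]  (same CPT each step)
x1, xn = 1, 0                    # observed endpoint values

def step(u, v):
    """CPT entry P(X_{j+1}=v | X_j=u)."""
    return cpt[u] if v == 1 else 1.0 - cpt[u]

fwd = {1: {v: 1.0 if v == x1 else 0.0 for v in (0, 1)}}
for j in range(2, n + 1):
    fwd[j] = {v: sum(fwd[j - 1][u] * step(u, v) for u in (0, 1)) for v in (0, 1)}

bwd = {n: {v: 1.0 if v == xn else 0.0 for v in (0, 1)}}
for j in range(n - 1, 0, -1):
    bwd[j] = {v: sum(step(v, w) * bwd[j + 1][w] for w in (0, 1)) for v in (0, 1)}

for j in range(1, n + 1):
    unnorm = {v: fwd[j][v] * bwd[j][v] for v in (0, 1)}
    z = sum(unnorm.values())     # z = P(Xn=xn | X1=x1), the same at every j
    print(j, {v: round(unnorm[v] / z, 4) for v in (0, 1)})
```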

Linear-chain message passing. Analogy: how long is this !#@ line? The person at position j can work out “how many ahead?” (j-1) by adding one to the answer of the person in front, and “how many behind?” (n-j) by adding one to the answer of the person behind. [Slide figure: the chain X1 … Xj … Xn with forward evidence E+ and backward evidence E-.]

Linear-chain message passing. P(Xj|E) ∝ P(Xj|E+) · P(E-|Xj), which follows from d-separation: E+ and E- are conditionally independent given Xj. Pass forward: P(Xj|E+), computed from P(Xj-1|E+) and the CPT for Xj. Pass backward: P(E-|Xj), computed from P(E-|Xj+1) and the CPT for Xj+1.

Inference in Bayes nets. General problem: given evidence E1,…,Ek, compute P(X|E1,…,Ek) for any X. Big assumption: the graph is a “polytree”, i.e. there is at most one undirected path between any two nodes X, Y. Notation: [slide figure: a node X with parents U1, U2, children Y1, Y2, and the children’s other parents Z1, Z2].
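For reference, a brute-force baseline that answers exactly this query by enumerating the joint; it is correct for any Bayes net but exponential in the number of variables, which is what the polytree algorithm on the following slides avoids. The network and names below are hypothetical:

```python
# Brute-force P(X | evidence) by enumerating every assignment of the joint.
# Correct for any Bayes net, but exponential in the number of variables.
# Hypothetical binary network U1 -> X <- U2, X -> Y (names are illustrative).
from itertools import product

parents = {'U1': [], 'U2': [], 'X': ['U1', 'U2'], 'Y': ['X']}
cpt = {  # P(var = 1 | parent assignment)
    'U1': {(): 0.3},
    'U2': {(): 0.6},
    'X':  {(0, 0): 0.05, (0, 1): 0.4, (1, 0): 0.4, (1, 1): 0.9},
    'Y':  {(0,): 0.2, (1,): 0.7},
}
order = ['U1', 'U2', 'X', 'Y']

def joint(assign):
    """P(assignment) = product of one CPT entry per variable."""
    p = 1.0
    for v in order:
        pa = tuple(assign[u] for u in parents[v])
        p1 = cpt[v][pa]
        p *= p1 if assign[v] == 1 else 1.0 - p1
    return p

def query(var, evidence):
    """P(var = 1 | evidence) by summing the joint over consistent assignments."""
    num = den = 0.0
    for values in product((0, 1), repeat=len(order)):
        assign = dict(zip(order, values))
        if any(assign[k] != v for k, v in evidence.items()):
            continue
        p = joint(assign)
        den += p
        num += p if assign[var] == 1 else 0.0
    return num / den

print(query('X', {'Y': 1, 'U1': 1}))    # e.g. P(X = 1 | Y = 1, U1 = 1)
```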

Inference in Bayes nets: P(X|E)

Inference in Bayes nets: P(X|E+). Condition on X’s parents U and use d-separation to write P(X|E+) = Σ_u P(X|u) P(u|E+); d-separation again lets P(u|E+) be written as a product over the individual parents.

Inference in Bayes nets: P(X|E+), continued. In that sum, P(X|u) is a CPT table lookup, and each parent’s term is a recursive call to P(·|E+). So far: a simple way of propagating “belief due to causal evidence” up the tree.

Inference in Bayes nets: P(E-|X) recursion

Inference in Bayes nets: P(E-|X)

Inference in Bayes nets: P(E-|X). Each term in the resulting expression is one of: a recursive call to P(·|E), a CPT entry, or a recursive call to P(E-|·).

More on message passing. We reduced P(X|E) to a product of two recursively calculated parts: P(X=x|E+), i.e., the CPT for X combined with the product of “forward” messages from X’s parents; and P(E-|X=x), i.e., a combination of “backward” messages from X’s children, CPTs, and terms P(Z|E_{Z\Yk}) (each a simpler instance of P(X|E)) for the children’s other parents Z. This can also be implemented by message passing (belief propagation).

Learning for Bayes nets. Input: a sample of the joint distribution, plus the graph structure over the variables: for i=1,…,N you know Xi and parents(Xi). Output: estimated CPTs. Method (discrete variables): estimate each CPT independently, using an MLE or a MAP estimate. [Slide figure: the example network A, B, C, D, E with CPT tables such as P(B) and P(C|A,B).]

Learning for Bayes nets. Method (discrete variables): estimate each CPT independently, using an MLE or a MAP estimate. MLE: estimate P(Xi = v | parents(Xi) = u) as #(examples with Xi = v and parents(Xi) = u) / #(examples with parents(Xi) = u). [Slide figure: the example network and its CPT tables.]
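A sketch of the counting estimator, with a hypothetical CPT P(C|A,B) and made-up data; pseudo_count=0 gives the MLE, and a positive pseudo_count gives the Dirichlet-smoothed estimate discussed on the next slides.

```python
# Estimate one CPT, P(C | A, B), from a sample of the joint by counting.
# pseudo_count = 0 gives the MLE; pseudo_count > 0 gives the Dirichlet-smoothed
# (MAP-style) estimate described on the following slides.
from collections import Counter

def estimate_cpt(data, child, parents, values=(0, 1), pseudo_count=0.0):
    """data: list of dicts mapping variable name -> value."""
    pair_counts = Counter()    # counts of (parent assignment, child value)
    parent_counts = Counter()  # counts of parent assignment
    for row in data:
        pa = tuple(row[p] for p in parents)
        pair_counts[(pa, row[child])] += 1
        parent_counts[pa] += 1
    cpt = {}
    for pa in parent_counts:
        denom = parent_counts[pa] + pseudo_count * len(values)
        cpt[pa] = {v: (pair_counts[(pa, v)] + pseudo_count) / denom for v in values}
    return cpt

# Tiny made-up sample of (A, B, C):
data = [{'A': 1, 'B': 0, 'C': 1}, {'A': 1, 'B': 0, 'C': 0},
        {'A': 0, 'B': 1, 'C': 1}, {'A': 1, 'B': 0, 'C': 1}]
print(estimate_cpt(data, 'C', ['A', 'B']))                     # MLE
print(estimate_cpt(data, 'C', ['A', 'B'], pseudo_count=1.0))   # smoothed / MAP-style
```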

MAP estimates. The Beta distribution: Beta(θ; α, β) ∝ θ^(α-1) (1-θ)^(β-1). The prior acts as “pseudo-data”: like hallucinating a few heads and a few tails before seeing the real data.

MAP estimates. The Dirichlet distribution (the multi-valued generalization of the Beta): Dir(θ; α1,…,αK) ∝ Π_i θ_i^(α_i - 1). “Pseudo-data”: like hallucinating α_i examples of X=i for each value i.
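In formulas (standard results; n_i is the number of observed examples with X = i, and the Beta case is K = 2):

```latex
\hat{\theta}_i^{\,\mathrm{mean}} \;=\; \frac{n_i + \alpha_i}{\sum_{j=1}^{K}\,(n_j + \alpha_j)}
\qquad\qquad
\hat{\theta}_i^{\,\mathrm{MAP}} \;=\; \frac{n_i + \alpha_i - 1}{\sum_{j=1}^{K}\,(n_j + \alpha_j)\;-\;K}
\quad (\alpha_i > 1)
```

Either way, the α_i enter exactly like extra (“hallucinated”) counts added to the observed data.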

Learning for Bayes nets. Method (discrete variables): estimate each CPT independently, using an MLE or a MAP estimate. MAP: add the Dirichlet pseudo-counts α_i to the observed counts before normalizing, as in the formulas above. [Slide figure: the example network and its CPT tables.]

Additional reading: David Heckerman, “A Tutorial on Learning with Bayesian Networks.” ftp://ftp.research.microsoft.com/pub/tr/tr-95-06.pdf