Bayesian Networks

Introduction
A problem domain is modeled by a list of variables X_1, …, X_n. Knowledge about the problem domain is represented by a joint probability distribution P(X_1, …, X_n).

Introduction
Example: Alarm
The story: In LA, burglaries and earthquakes are not uncommon, and both can cause the alarm to go off. In case of alarm, the two neighbors John and Mary may call.
Problem: Estimate the probability of a burglary based on who has or has not called.
Variables: Burglary (B), Earthquake (E), Alarm (A), JohnCalls (J), MaryCalls (M)
Knowledge required to solve the problem: P(B, E, A, J, M)

Introduction
What is the probability of burglary given that Mary called, P(B = y | M = y)?
Compute the marginal probability: P(B, M) = Σ_{E, A, J} P(B, E, A, J, M)
Then use the definition of conditional probability:
P(B = y | M = y) = P(B = y, M = y) / P(M = y)
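
As a brute-force baseline, the following Python sketch stores the full joint as a table and answers the query by summation. The 2^5 = 32 entries are filled with normalized random placeholders purely so the code runs; the slides do not give actual numbers.

import itertools
import random

# Variables of the Alarm example, each with values 'y'/'n'.
VARS = ['B', 'E', 'A', 'J', 'M']

# Placeholder joint P(B, E, A, J, M): random numbers normalized to sum to 1.
# In a real application these 2^5 = 32 entries would come from the domain.
random.seed(0)
raw = {assign: random.random()
       for assign in itertools.product('yn', repeat=len(VARS))}
total = sum(raw.values())
joint = {assign: p / total for assign, p in raw.items()}

def prob(**fixed):
    """Marginal probability of the fixed variables, summing out the rest."""
    idx = {v: i for i, v in enumerate(VARS)}
    return sum(p for assign, p in joint.items()
               if all(assign[idx[v]] == val for v, val in fixed.items()))

# P(B = y | M = y) = P(B = y, M = y) / P(M = y)
p_bm = prob(B='y', M='y')   # sums over E, A, J
p_m = prob(M='y')           # sums over B, E, A, J
print('P(B=y | M=y) =', p_bm / p_m)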

Introduction
Difficulty: complexity in model construction and inference.
In the Alarm example:
– 31 numbers needed
– Computing P(B = y | M = y) takes 29 additions
In general:
– P(X_1, …, X_n) needs at least 2^n − 1 numbers to specify the joint probability
– Exponential storage and inference

Conditional Independence
Overcome the problem of exponential size by exploiting conditional independence.
The chain rule of probabilities:
P(X_1, …, X_n) = P(X_1) P(X_2 | X_1) ⋯ P(X_n | X_1, …, X_{n−1})

Conditional Independence
Conditional independence in the problem domain: the domain usually allows us to identify a subset pa(X_i) ⊆ {X_1, …, X_{i−1}} such that, given pa(X_i), X_i is independent of all variables in {X_1, …, X_{i−1}} \ pa(X_i), i.e.
P(X_i | X_1, …, X_{i−1}) = P(X_i | pa(X_i))
Then
P(X_1, …, X_n) = ∏_i P(X_i | pa(X_i))

Conditional Independence
As a result, the joint probability P(X_1, …, X_n) can be represented by the conditional probabilities P(X_i | pa(X_i)).
Example continued:
P(B, E, A, J, M) = P(B) P(E|B) P(A|B,E) P(J|A,B,E) P(M|B,E,A,J)
                 = P(B) P(E) P(A|B,E) P(J|A) P(M|A)
pa(B) = {}, pa(E) = {}, pa(A) = {B, E}, pa(J) = {A}, pa(M) = {A}
The conditional probability tables specify: P(B), P(E), P(A | B, E), P(M | A), P(J | A)
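
To make the saving concrete, the sketch below builds the full joint from the five factors using illustrative CPT values (the slides give only the structure, not numbers) and answers the burglary query by brute-force summation. Only 1 + 1 + 4 + 2 + 2 = 10 numbers are stored instead of 31.

import itertools

# Illustrative CPT values for the Alarm network (assumed; the slides do not
# specify numbers, these are commonly used textbook values for this example).
p_b = {'y': 0.001, 'n': 0.999}                      # P(B)
p_e = {'y': 0.002, 'n': 0.998}                      # P(E)
p_a = {('y', 'y'): 0.95, ('y', 'n'): 0.94,          # P(A=y | B, E)
       ('n', 'y'): 0.29, ('n', 'n'): 0.001}
p_j = {'y': 0.90, 'n': 0.05}                        # P(J=y | A)
p_m = {'y': 0.70, 'n': 0.01}                        # P(M=y | A)

def bernoulli(p_yes, value):
    return p_yes if value == 'y' else 1.0 - p_yes

def joint(b, e, a, j, m):
    """P(B,E,A,J,M) = P(B) P(E) P(A|B,E) P(J|A) P(M|A)."""
    return (p_b[b] * p_e[e]
            * bernoulli(p_a[(b, e)], a)
            * bernoulli(p_j[a], j)
            * bernoulli(p_m[a], m))

# The factored model uses only 10 numbers, yet it determines all 32 joint entries.
assert abs(sum(joint(*x) for x in itertools.product('yn', repeat=5)) - 1) < 1e-9

# P(B=y | M=y) by brute-force summation over the remaining variables.
num = sum(joint('y', e, a, j, 'y') for e, a, j in itertools.product('yn', repeat=3))
den = sum(joint(b, e, a, j, 'y') for b, e, a, j in itertools.product('yn', repeat=4))
print('P(B=y | M=y) =', num / den)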

Conditional Independence
As a result:
– Model size is reduced
– Model construction is easier
– Inference is easier

Graphical Representation
To graphically represent the conditional independence relationships, construct a directed graph by drawing an arc from X_j to X_i iff X_j ∈ pa(X_i).
pa(B) = {}, pa(E) = {}, pa(A) = {B, E}, pa(J) = {A}, pa(M) = {A}
[Graph: B → A, E → A, A → J, A → M]

Graphical Representation
We also attach the conditional probability table P(X_i | pa(X_i)) to node X_i. The result is a Bayesian network.
[Graph as above, with P(B) attached to B, P(E) to E, P(A | B, E) to A, P(J | A) to J, P(M | A) to M]

Formal Definition
A Bayesian network is:
– A directed acyclic graph (DAG), where
– Each node represents a random variable
– And is associated with the conditional probability of the node given its parents
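
One possible, deliberately simple way to realize this definition as a data structure is sketched below; the Rain/WetGrass network and its numbers are purely hypothetical illustrations.

from dataclasses import dataclass, field

@dataclass
class Node:
    """One random variable in a Bayesian network."""
    name: str
    values: tuple                 # possible states of the variable
    parents: tuple = ()           # names of parent variables (defines the DAG)
    # CPT: maps (parent configuration tuple) -> {value: probability}
    cpt: dict = field(default_factory=dict)

@dataclass
class BayesNet:
    nodes: dict = field(default_factory=dict)   # name -> Node, in topological order

    def add(self, node):
        # Parents must already be present, which keeps the graph acyclic.
        assert all(p in self.nodes for p in node.parents), "add parents first"
        self.nodes[node.name] = node

# Hypothetical two-node example: Rain -> WetGrass (numbers are illustrative).
net = BayesNet()
net.add(Node('Rain', ('y', 'n'), (), {(): {'y': 0.2, 'n': 0.8}}))
net.add(Node('WetGrass', ('y', 'n'), ('Rain',),
             {('y',): {'y': 0.9, 'n': 0.1},
              ('n',): {'y': 0.1, 'n': 0.9}}))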

Intuition
A BN can be understood as a DAG where arcs represent direct probabilistic dependence.
Absence of an arc indicates probabilistic independence: a variable is conditionally independent of all its nondescendants given its parents.
From the graph: B ⊥ E, J ⊥ B | A, J ⊥ E | A
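
These independence claims can be checked numerically. The sketch below uses a small chain X → Y → Z (analogous to B → A → J) with made-up CPT values and verifies that P(Z | Y) = P(Z | Y, X).

import itertools

# A tiny chain X -> Y -> Z with illustrative (made-up) CPTs.
p_x = {'y': 0.3, 'n': 0.7}
p_y_given_x = {'y': 0.8, 'n': 0.1}     # P(Y=y | X)
p_z_given_y = {'y': 0.6, 'n': 0.2}     # P(Z=y | Y)

def pick(p_yes, v):
    return p_yes if v == 'y' else 1.0 - p_yes

def joint(x, y, z):
    return p_x[x] * pick(p_y_given_x[x], y) * pick(p_z_given_y[y], z)

def cond(z, given):
    """P(Z=z | given), where `given` is a dict over 'X' and/or 'Y'."""
    def match(x, y):
        return all(val == {'X': x, 'Y': y}[k] for k, val in given.items())
    num = sum(joint(x, y, z) for x, y in itertools.product('yn', repeat=2) if match(x, y))
    den = sum(joint(x, y, zz) for x, y, zz in itertools.product('yn', repeat=3) if match(x, y))
    return num / den

# Z is independent of its nondescendant X once its parent Y is given:
print(cond('y', {'Y': 'y'}))              # P(Z=y | Y=y)
print(cond('y', {'Y': 'y', 'X': 'y'}))    # P(Z=y | Y=y, X=y): same value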

Construction
Procedure for constructing a BN:
1. Choose a set of variables describing the application domain
2. Choose an ordering of the variables
3. Start with an empty network and add variables to the network one by one according to the ordering

Construction
To add the i-th variable X_i:
– Determine pa(X_i) among the variables already in the network (X_1, …, X_{i−1}) such that P(X_i | X_1, …, X_{i−1}) = P(X_i | pa(X_i)) (domain knowledge is needed here)
– Draw an arc from each variable in pa(X_i) to X_i

Example
Order B, E, A, J, M:
– pa(B) = pa(E) = {}, pa(A) = {B, E}, pa(J) = {A}, pa(M) = {A}
Order M, J, A, B, E:
– pa(M) = {}, pa(J) = {M}, pa(A) = {M, J}, pa(B) = {A}, pa(E) = {A, B}
Order M, J, E, B, A:
– Fully connected graph
[Figures: the three resulting networks]

Construction
Which variable order?
– Naturalness of probability assessment: M, J, E, B, A is bad because P(B | J, M, E) is not natural to assess
– Minimize the number of arcs: M, J, E, B, A is bad (too many arcs); the first order is good
– Use causal relationships (causes come before their effects): M, J, E, B, A is bad because M and J are effects of A but come before A

Causal Bayesian Networks
A causal Bayesian network, or simply a causal network, is a Bayesian network whose arcs are interpreted as indicating cause-effect relationships.
To build a causal network:
– Choose a set of variables that describes the domain
– Draw an arc to a variable from each of its direct causes (domain knowledge required)

Example
[Figure: a causal network over Visit Africa, Tuberculosis, Smoking, Lung Cancer, Bronchitis, "Tuberculosis or Lung Cancer", X-Ray, Dyspnea]

Causal BN
Causality is not a well-understood concept:
– No widely accepted definition
– No consensus on whether it is a property of the world or a concept in our minds
Sometimes causal relations are obvious:
– Alarm causes people to leave the building
– Lung cancer causes a mass on a chest X-ray
At other times they are not that clear. Doctors believe smoking causes lung cancer, but the tobacco industry has a different story:
– Surgeon General (1964): S → C
– Tobacco Industry: C → S

Inference
Posterior queries to a BN:
– We have observed the values of some variables
– What are the posterior probability distributions of other variables?
Example: Both John and Mary reported the alarm.
– What is the probability of a burglary, P(B | J = y, M = y)?

Inference
General form of a query: P(Q | E = e) = ?
– Q is a list of query variables
– E is a list of evidence variables
– e denotes the observed values of the evidence variables

Inference Types
– Diagnostic inference: P(B | M = y)
– Predictive/causal inference: P(M | B = y)
– Intercausal inference (between causes of a common effect): P(B | A = y, E = y)
– Mixed inference (combining two or more of the above): P(A | J = y, E = y) (diagnostic and causal)
All of these types are handled in the same way.

Naïve Inference
A naïve algorithm for solving P(Q | E = e) in a BN: obtain the joint probability distribution over all variables by multiplying the conditional probabilities, then marginalize.
The BN structure is not used; for many variables the algorithm is not practical.
In general, exact inference is NP-hard.
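
A sketch of this naïve enumeration, written against a generic list of nodes with CPTs; the two-node Rain/WetGrass network at the bottom is a hypothetical illustration with made-up numbers.

import itertools

def naive_query(nodes, query_var, evidence):
    """Naive P(query_var | evidence) by enumerating every full assignment.

    `nodes` is a list of (name, parent_names, cpt) triples in topological
    order, where cpt maps a tuple of parent values to {value: prob}.
    Exponential in the number of variables, so only usable for tiny networks.
    """
    names = [n for n, _, _ in nodes]
    values = ('y', 'n')                      # assume binary variables here

    def joint(assign):
        p = 1.0
        for name, parents, cpt in nodes:
            parent_vals = tuple(assign[q] for q in parents)
            p *= cpt[parent_vals][assign[name]]
        return p

    scores = {}
    for v in values:
        total = 0.0
        for combo in itertools.product(values, repeat=len(names)):
            assign = dict(zip(names, combo))
            if assign[query_var] != v:
                continue
            if any(assign[k] != val for k, val in evidence.items()):
                continue
            total += joint(assign)
        scores[v] = total
    norm = sum(scores.values())
    return {v: s / norm for v, s in scores.items()}

# Hypothetical two-variable network Rain -> WetGrass (illustrative numbers).
nodes = [
    ('Rain', (), {(): {'y': 0.2, 'n': 0.8}}),
    ('WetGrass', ('Rain',), {('y',): {'y': 0.9, 'n': 0.1},
                             ('n',): {'y': 0.1, 'n': 0.9}}),
]
print(naive_query(nodes, 'Rain', {'WetGrass': 'y'}))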

Basic Example
Network: A → B → C → D
Conditional probabilities: P(A), P(B|A), P(C|B), P(D|C)
Query: P(D) = ?
P(D) = Σ_{A, B, C} P(A, B, C, D) = Σ_{A, B, C} P(A) P(B|A) P(C|B) P(D|C)   (1)
     = Σ_C P(D|C) Σ_B P(C|B) Σ_A P(A) P(B|A)                               (2)
Complexity:
– Using (1): the number of terms grows exponentially with the number of variables
– Using (2): only small local summations are needed (linear in the length of the chain)
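
The re-ordering in (2) can be checked directly. The sketch below uses illustrative probabilities (not from the slides) for binary variables and computes P(D) both ways, getting identical answers.

import itertools

# Chain A -> B -> C -> D with binary values and illustrative probabilities.
pA = {'y': 0.6, 'n': 0.4}
pB = {'y': 0.7, 'n': 0.2}   # P(B=y | A)
pC = {'y': 0.5, 'n': 0.1}   # P(C=y | B)
pD = {'y': 0.8, 'n': 0.3}   # P(D=y | C)

def v(p_yes, val):
    return p_yes if val == 'y' else 1.0 - p_yes

# (1) Brute force: sum the full joint over A, B, C (exponential in general).
pD_brute = {d: sum(pA[a] * v(pB[a], b) * v(pC[b], c) * v(pD[c], d)
                   for a, b, c in itertools.product('yn', repeat=3))
            for d in 'yn'}

# (2) Push the sums inward, eliminating one variable at a time; only small
#     local tables are ever formed (variable elimination on a chain).
fB = {b: sum(pA[a] * v(pB[a], b) for a in 'yn') for b in 'yn'}        # sum over A
fC = {c: sum(fB[b] * v(pC[b], c) for b in 'yn') for c in 'yn'}        # sum over B
pD_elim = {d: sum(fC[c] * v(pD[c], d) for c in 'yn') for d in 'yn'}   # sum over C

print(pD_brute)
print(pD_elim)   # identical results, far fewer multiplications for long chains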

Inference
Although exact inference is NP-hard in general, in some cases the problem is tractable: for example, if the BN has a (poly)tree structure, an efficient algorithm exists (a polytree is a directed acyclic graph in which no two nodes have more than one path between them).
Another practical approach: stochastic simulation.

A general sampling algorithm
For i = 1 to n:
1. Find the parents of X_i: X_{p(i,1)}, …, X_{p(i,n)}
2. Recall the values that those parents were randomly given
3. Look up the table for P(X_i | X_{p(i,1)} = x_{p(i,1)}, …, X_{p(i,n)} = x_{p(i,n)})
4. Randomly set x_i according to this probability

Stochastic Simulation
We want to know P(Q = q | E = e). Draw many random samples and count:
– N_c: number of samples in which E = e
– N_s: number of samples in which Q = q and E = e
– N: total number of random samples
If N is big enough:
– N_c / N is a good estimate of P(E = e)
– N_s / N is a good estimate of P(Q = q, E = e)
– N_s / N_c is then a good estimate of P(Q = q | E = e)
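
A minimal Python sketch of this scheme for the Alarm network, assuming illustrative CPT values (the slides give only the structure): it draws full assignments parents-first, as in the sampling algorithm above, and returns N_s / N_c. Note that when the evidence is rare, most samples are discarded, which is the main weakness of this simple estimator.

import random

# Illustrative Alarm-style CPTs (assumed numbers, not from the slides).
p_b, p_e = 0.001, 0.002
p_a = {('y', 'y'): 0.95, ('y', 'n'): 0.94, ('n', 'y'): 0.29, ('n', 'n'): 0.001}
p_j = {'y': 0.90, 'n': 0.05}     # P(J=y | A)
p_m = {'y': 0.70, 'n': 0.01}     # P(M=y | A)

def flip(p):
    return 'y' if random.random() < p else 'n'

def sample_once():
    """Draw one full assignment, visiting variables parents-first."""
    b, e = flip(p_b), flip(p_e)
    a = flip(p_a[(b, e)])
    return {'B': b, 'E': e, 'A': a, 'J': flip(p_j[a]), 'M': flip(p_m[a])}

def estimate(query, evidence, n=200_000):
    """Estimate P(query | evidence) as N_s / N_c over n forward samples."""
    n_c = n_s = 0
    for _ in range(n):
        s = sample_once()
        if all(s[k] == v for k, v in evidence.items()):
            n_c += 1                                    # evidence matched
            if all(s[k] == v for k, v in query.items()):
                n_s += 1                                # query also matched
    return n_s / n_c if n_c else float('nan')

random.seed(1)
print(estimate({'B': 'y'}, {'J': 'y', 'M': 'y'}))   # approximates P(B=y | J=y, M=y)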

Parameter Learning
Example:
– Given a BN structure
– And a dataset
– Estimate the conditional probabilities P(X_i | pa(X_i))
[Figure: a BN over X_1, …, X_5 and a data table with one column per variable; '?' marks missing values]

Parameter Learning
We consider the case of full data. Use maximum likelihood (ML) estimation and Bayesian estimation.
Modes of learning:
– Sequential learning
– Batch learning
Bayesian estimation is suitable for both sequential and batch learning; ML is suitable only for batch learning.

ML in BN with Complete Data
n variables X_1, …, X_n
Number of states of X_i: r_i = |Ω_{X_i}|
Number of configurations of the parents of X_i: q_i = |Ω_{pa(X_i)}|
Parameters to be estimated:
θ_ijk = P(X_i = j | pa(X_i) = k), i = 1, …, n; j = 1, …, r_i; k = 1, …, q_i

ML in BN with Complete Data
Example: consider the BN X_1 → X_3 ← X_2. Assume all variables are binary, taking values 1 and 2.
θ_ijk = P(X_i = j | pa(X_i) = k), where k indexes the configurations of the parents of X_i.
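
To make the indexing concrete, here is a short Python sketch (not from the slides) that enumerates the parameters θ_ijk for this network; ordering the parent configurations lexicographically is an assumption, since the slides do not fix a convention.

import itertools

# Network X1 -> X3 <- X2, every variable binary with states 1 and 2.
parents = {1: (), 2: (), 3: (1, 2)}
r = {1: 2, 2: 2, 3: 2}                       # r_i: number of states of X_i

# q_i: number of parent configurations; k indexes them (lexicographic order assumed).
for i in (1, 2, 3):
    configs = list(itertools.product(*[range(1, r[p] + 1) for p in parents[i]]))
    q_i = len(configs)                       # q_1 = q_2 = 1, q_3 = 4
    for k, cfg in enumerate(configs, start=1):
        for j in range(1, r[i] + 1):
            print(f"theta_{i}{j}{k} = P(X{i}={j} | pa(X{i})={cfg or 'empty'})")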

ML in BN with Complete Data
A complete case D_l is a vector of values, one for each variable (all data is known).
Example: D_l = (X_1 = 1, X_2 = 2, X_3 = 2)
Given: a set of complete cases D = {D_1, …, D_m}
Find: the ML estimate of the parameters θ

ML in BN with Complete Data
Log-likelihood:
l(θ | D) = log L(θ | D) = log P(D | θ) = log ∏_l P(D_l | θ) = Σ_l log P(D_l | θ)
The term log P(D_l | θ), e.g. for D_4 = (1, 2, 2):
log P(D_4 | θ) = log P(X_1 = 1, X_2 = 2, X_3 = 2 | θ)
               = log P(X_1 = 1 | θ) P(X_2 = 2 | θ) P(X_3 = 2 | X_1 = 1, X_2 = 2, θ)
               = log θ_111 + log θ_221 + log θ_322
Recall: θ = {θ_111, θ_121, θ_211, θ_221, θ_311, θ_312, θ_313, θ_314, θ_321, θ_322, θ_323, θ_324}

ML in BN with Complete Data
Define the characteristic function of D_l:
χ(i, j, k : D_l) = 1 if X_i = j and pa(X_i) = k in D_l, and 0 otherwise.
For l = 4, D_4 = (1, 2, 2):
χ(1, 1, 1 : D_4) = χ(2, 2, 1 : D_4) = χ(3, 2, 2 : D_4) = 1, and χ(i, j, k : D_4) = 0 for all other i, j, k.
So log P(D_4 | θ) = Σ_{ijk} χ(i, j, k : D_4) log θ_ijk
In general, log P(D_l | θ) = Σ_{ijk} χ(i, j, k : D_l) log θ_ijk

ML in BN with Complete Data
Define m_ijk = Σ_l χ(i, j, k : D_l), the number of data cases in which X_i = j and pa(X_i) = k. Then
l(θ | D) = Σ_l log P(D_l | θ)
         = Σ_l Σ_{i,j,k} χ(i, j, k : D_l) log θ_ijk
         = Σ_{i,j,k} Σ_l χ(i, j, k : D_l) log θ_ijk
         = Σ_{i,j,k} m_ijk log θ_ijk
         = Σ_{i,k} Σ_j m_ijk log θ_ijk

ML in BN with Complete Data
We want to find:
argmax_θ l(θ | D) = argmax_θ Σ_{i,k} Σ_j m_ijk log θ_ijk
Assume that θ_ijk = P(X_i = j | pa(X_i) = k) is not related to θ_i'j'k' whenever i ≠ i' or k ≠ k'. Consequently, we can maximize each term of the summation Σ_{i,k} […] separately:
argmax_{θ_ijk} Σ_j m_ijk log θ_ijk

ML in BN with Complete Data
As a result we have:
θ̂_ijk = m_ijk / Σ_{j'} m_ij'k
In words, the ML estimate for θ_ijk = P(X_i = j | pa(X_i) = k) is
(number of cases where X_i = j and pa(X_i) = k) / (number of cases where pa(X_i) = k)
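
A short Python sketch of this estimate (not from the slides): it counts m_ijk from a handful of made-up complete cases for the X_1 → X_3 ← X_2 network and normalizes the counts.

from collections import Counter

# Complete data for the network X1 -> X3 <- X2 (illustrative made-up cases).
# Each case is a dict giving the value (1 or 2) of every variable.
data = [
    {1: 1, 2: 2, 3: 2}, {1: 1, 2: 1, 3: 1}, {1: 2, 2: 2, 3: 2},
    {1: 1, 2: 2, 3: 2}, {1: 2, 2: 1, 3: 1}, {1: 1, 2: 2, 3: 1},
]
parents = {1: (), 2: (), 3: (1, 2)}

# m[i, j, pa_values]: number of cases with X_i = j and pa(X_i) = pa_values.
m = Counter()
for case in data:
    for i, pa in parents.items():
        m[i, case[i], tuple(case[p] for p in pa)] += 1

def theta_ml(i, j, pa_values):
    """ML estimate: count(X_i = j, pa = pa_values) / count(pa = pa_values)."""
    denom = sum(m[i, jj, pa_values] for jj in (1, 2))
    return m[i, j, pa_values] / denom if denom else float('nan')

print(theta_ml(1, 1, ()))        # P(X1=1): 4 of 6 cases -> 2/3
print(theta_ml(3, 2, (1, 2)))    # P(X3=2 | X1=1, X2=2): 2 of 3 cases -> 2/3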

More to do with BN
– Learning parameters with some values missing
– Learning the structure of a BN from training data
– Many more…

References
Pearl, Judea. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo, CA, 1988.
Heckerman, David. "A Tutorial on Learning with Bayesian Networks." Technical Report MSR-TR-95-06, Microsoft Research, 1995.
Cowell, R. G., A. P. Dawid, S. L. Lauritzen, and D. J. Spiegelhalter. Probabilistic Networks and Expert Systems. Springer-Verlag, 1999.