1 Chapter 14 Probabilistic Reasoning

2 Outline Syntax of Bayesian networks Semantics of Bayesian networks Efficient representation of conditional distributions Exact inference by enumeration Exact inference by variable elimination Approximate inference by stochastic simulation* Approximate inference by Markov Chain Monte Carlo*

3 Motivations • The full joint probability distribution can answer any question about the domain, but it becomes intractably large as the number of variables increases. • Specifying probabilities for atomic events is difficult; it may require a large set of data, statistical estimates, etc. • Independence and conditional independence reduce the number of probabilities needed to specify the full joint distribution.

4 Bayesian networks • A directed, acyclic graph (DAG) • A set of nodes, one per variable (discrete or continuous) • A set of directed links (arrows) connecting pairs of nodes: X is a parent of Y if there is an arrow (direct influence) from node X to node Y • Each node has a conditional probability distribution that quantifies the effect of its parents on the node • Together, the topology and the conditional distributions specify (implicitly) the full joint distribution over all the variables.

5 Bayesian networks Example 1: the tooth disease Bayesian network (network diagram shown on slide).

6 Example: Burglar alarm system I have a burglar alarm installed at home. It is fairly reliable at detecting a burglary, but also responds on occasion to minor earthquakes. I also have two neighbors, John and Mary, who have promised to call me at work when they hear the alarm. John always calls when he hears the alarm, but sometimes confuses the telephone ringing with the alarm and calls then, too. Mary likes rather loud music and sometimes misses the alarm altogether. Bayesian network variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls.

7 Example: Burglar alarm system Network topology reflects “causal” knowledge: • A burglar can set the alarm off • An earthquake can set the alarm off • The alarm can cause Mary to call • The alarm can cause John to call. Conditional probability table (CPT): each row contains the conditional probability of each node value for a conditioning case (a possible combination of values for the parent nodes).

8 Compactness of Bayesian networks A CPT for Boolean X_i with k Boolean parents has 2^k rows, one for each combination of parent values. Each row requires one number p for X_i = true (the number for X_i = false is just 1 − p). If each variable has no more than k parents, the complete network requires O(n · 2^k) numbers, i.e., it grows linearly with n, vs. O(2^n) for the full joint distribution. For the burglary network, 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 − 1 = 31).
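To make the counting concrete, here is a small sketch (plain Python; the encoding is my own, not from the slides) that tallies the independent CPT entries for the burglary network and compares the total with the full joint distribution.

```python
# Count the independent numbers needed for a network of Boolean variables:
# a node with k parents needs 2^k rows, one number per row.
parents = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"],
    "MaryCalls": ["Alarm"],
}

network_numbers = sum(2 ** len(p) for p in parents.values())
full_joint_numbers = 2 ** len(parents) - 1   # one entry is fixed by normalization

print(network_numbers)     # 1 + 1 + 4 + 2 + 2 = 10
print(full_joint_numbers)  # 2^5 - 1 = 31
```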

9 Global semantics of Bayesian networks Global semantics defines the full joint distribution as the product of the local conditional distributions: P(x_1, …, x_n) = ∏_i P(x_i | parents(X_i)).
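As an illustration of this product rule, the sketch below computes the probability of one complete assignment, P(j ∧ m ∧ a ∧ ¬b ∧ ¬e), by multiplying the local conditional probabilities. The CPT values are the standard textbook numbers for the burglary network (assumed here, since the slide figures are not transcribed).

```python
# P(x1,...,xn) = product over i of P(xi | parents(Xi))
P_B, P_E = 0.001, 0.002                 # priors for Burglary, Earthquake
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm=true | B, E)
P_J = {True: 0.90, False: 0.05}         # P(JohnCalls=true | Alarm)
P_M = {True: 0.70, False: 0.01}         # P(MaryCalls=true | Alarm)

def joint(b, e, a, j, m):
    """Full joint probability of one complete assignment of all five variables."""
    pb = P_B if b else 1 - P_B
    pe = P_E if e else 1 - P_E
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return pb * pe * pa * pj * pm

# P(john calls, mary calls, alarm, no burglary, no earthquake) ≈ 0.000628
print(joint(b=False, e=False, a=True, j=True, m=True))
```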

10 Local semantics of Bayesian networks Each node is conditionally independent of its nondescendants given its parents; e.g., JohnCalls is independent of Burglary and Earthquake, given the value of Alarm.

11 Markov blanket Each node is conditionally independent of all other nodes given its Markov blanket: its parents, its children, and its children’s other parents; e.g., Burglary is independent of JohnCalls and MaryCalls, given the values of Alarm and Earthquake.

12 Constructing Bayesian networks Need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics. The correct order in which to add nodes is to add the “root causes” first, then the variables they influence, and so on. What happens if we choose the wrong order?

13–17 Example (slides 13–17 build the alarm network one node at a time under an alternative node ordering; the diagrams are not transcribed).

18 Example Deciding conditional independence is hard in noncausal directions. Assessing conditional probabilities is hard in noncausal directions. The network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed.

19 Example: Car diagnosis Initial evidence: car won’t start Testable variables (green), “broken, so fix it” variables (orange) Hidden variables (gray) ensure sparse structure, reduce parameters

20 Example: Car insurance

21 Class Exercise 1. Using the axioms of probability, prove the identity given on the slide. 2. Given the two-node Bayesian network P → Q with parameters p(P), p(Q|P), and p(Q|¬P): calculate the queried probability, and state under what condition the property asked about holds (the equations themselves are not transcribed).

22 Efficient representation of conditional distributions A CPT grows exponentially with the number of parents, O(2^k). A CPT becomes infinite with a continuous-valued parent or child. Solution: canonical distributions that can be specified by a few parameters. Simplest example: a deterministic node, whose value is specified exactly by the values of its parents, with no uncertainty.

23 Efficient representation of conditional distributions “Noisy” logical relationships: uncertain relationships. The noisy-OR model allows for uncertainty about the ability of each parent to cause the child to be true; the causal relationship between parent and child may be inhibited. E.g., Fever is caused by Cold, Flu, or Malaria, but a patient could have a cold and not exhibit a fever. Two assumptions of noisy-OR: • the parents include all the possible causes (a leak node can be added to cover “miscellaneous causes”) • inhibition of each parent is independent of inhibition of any other parent, e.g., whatever inhibits Malaria from causing a fever is independent of whatever inhibits Flu from causing a fever. (Diagram on slide: Cold, Flu, and Malaria as parents of Fever.)

24 Efficient representation of conditional distributions The other probabilities can be calculated from the inhibition probabilities: the probability that the child is false is the product of the inhibition probabilities of the parents that are true. The number of parameters is linear in the number of parents: O(2^k) → O(k).
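A minimal sketch of how a noisy-OR CPT could be filled in from per-cause inhibition probabilities. The numbers below are the ones commonly used in the textbook fever example; treat them as illustrative rather than values taken from these slides.

```python
from itertools import product

# Per-parent inhibition probabilities q_i = P(child false | only parent i true).
inhibit = {"Cold": 0.6, "Flu": 0.2, "Malaria": 0.1}   # illustrative values

def noisy_or_cpt(inhibit):
    """Return P(child=true | parent assignment) for every combination of parents."""
    names = sorted(inhibit)
    cpt = {}
    for values in product([False, True], repeat=len(names)):
        p_false = 1.0
        for name, is_true in zip(names, values):
            if is_true:
                p_false *= inhibit[name]   # each active cause fails independently
        cpt[values] = 1.0 - p_false
    return cpt

# Only len(inhibit) parameters specify all 2^len(inhibit) rows of the CPT.
for values, p in sorted(noisy_or_cpt(inhibit).items()):
    print(values, round(p, 3))
```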

25 Bayesian nets with continuous variables Hybrid Bayesian network: discrete variables + continuous variables. Discrete (Subsidy? and Buys?); continuous (Harvest and Cost). Two options: • discretization – possibly large errors, large CPTs • finitely parameterized canonical families (e.g., Gaussian distribution). Two kinds of conditional distributions in a hybrid Bayesian network: • continuous variable given discrete or continuous parents (e.g., Cost) • discrete variable given continuous parents (e.g., Buys?).

26 Continuous child variables Need one conditional density function for the child variable given its continuous parents, for each possible assignment to the discrete parents. The most common choice is the linear Gaussian model: the mean of Cost varies linearly with Harvest and the variance is fixed. The linear model is reasonable only if the harvest size is limited to a narrow range.
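A sketch of the linear Gaussian model just described: for each value of the discrete parent Subsidy?, the density of Cost given Harvest = h is a Gaussian whose mean is a linear function of h and whose variance is fixed. The parameter values below are made up for illustration, not taken from the slides.

```python
import math

def gaussian_pdf(x, mean, sigma):
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Linear Gaussian CPD: P(Cost = c | Harvest = h, Subsidy = s) = N(a_s * h + b_s, sigma_s^2)(c)
params = {  # one (a, b, sigma) triple per value of the discrete parent; illustrative only
    True:  (-0.5, 10.0, 1.0),   # with subsidy: cheaper on average
    False: (-0.5, 12.0, 1.0),
}

def p_cost(c, harvest, subsidy):
    a, b, sigma = params[subsidy]
    return gaussian_pdf(c, a * harvest + b, sigma)

print(p_cost(c=8.0, harvest=5.0, subsidy=True))   # a density value, not a probability
```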

27 Continuous child variables Discrete + continuous linear Gaussian network is a conditional Gaussian network, i.e., a multivariate Gaussian distribution over all continuous variables for each combination of discrete variable values. A multivariate Gaussian distribution is a surface in more than one dimension that has a peak at the mean and drops off on all sides

28 Discrete variable with continuous parents The probability of Buys? given Cost should be a “soft” threshold. The probit distribution uses the integral of the Gaussian: P(buys | Cost = c) = Φ((−c + μ)/σ), where Φ is the standard normal CDF.

29 Discrete variable with continuous parents The sigmoid (or logit) distribution, also used in neural networks: P(buys | Cost = c) = 1 / (1 + exp(−2(−c + μ)/σ)). The sigmoid has a shape similar to the probit but much longer tails.
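The two “soft threshold” models for Buys? given Cost, sketched side by side. The threshold location μ and softness σ are assumed values for illustration; they are not taken from the slides.

```python
import math

MU, SIGMA = 6.0, 1.0   # assumed threshold location and softness

def probit_p_buys(cost):
    """P(buys | Cost=cost) under the probit model: standard normal CDF of (mu - cost)/sigma."""
    return 0.5 * (1 + math.erf((MU - cost) / (SIGMA * math.sqrt(2))))

def logit_p_buys(cost):
    """P(buys | Cost=cost) under the logit (sigmoid) model."""
    return 1.0 / (1.0 + math.exp(-2 * (MU - cost) / SIGMA))

# The logit gives noticeably heavier tails far from the threshold.
for c in (3, 6, 9):
    print(c, round(probit_p_buys(c), 4), round(logit_p_buys(c), 4))
```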

30 Exact inference by enumeration A query can be answered using a Bayesian network by computing sums of products of conditional probabilities from the network, e.g., P(Burglary | JohnCalls = true, MaryCalls = true), summing over the hidden variables Earthquake and Alarm. With n Boolean variables, d = 2 and the cost of enumeration grows exponentially, O(d^n).
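A compact sketch of enumeration for the query P(Burglary | JohnCalls = true, MaryCalls = true): sum the joint over the hidden variables, then normalize. It reuses the standard textbook CPT values for the burglary network (assumed, as above); the encoding is my own, not code from the slides.

```python
from itertools import product

P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94, (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def joint(b, e, a, j, m):
    pb = P_B if b else 1 - P_B
    pe = P_E if e else 1 - P_E
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return pb * pe * pa * pj * pm

def enumerate_burglary(j, m):
    """P(Burglary | JohnCalls=j, MaryCalls=m): sum out hidden E and A, then normalize."""
    unnormalized = [sum(joint(b, e, a, j, m)
                        for e, a in product((True, False), repeat=2))
                    for b in (True, False)]
    alpha = 1.0 / sum(unnormalized)
    return [alpha * p for p in unnormalized]

print(enumerate_burglary(True, True))   # roughly [0.284, 0.716]
```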

31 Evaluation tree (figure on slide, not transcribed).

32 Exact inference by variable elimination Do the calculation once and save the results for later use – the idea of dynamic programming. Variable elimination: carry out summations right-to-left, storing intermediate results (factors) to avoid recomputation.

33 Variable elimination – basic operations: pointwise product of two factors, and summing a variable out of a factor (equations on slide not transcribed).
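A minimal sketch of the two basic factor operations, using a simple dictionary representation I chose for illustration (not the notation from the slides).

```python
from itertools import product

# A factor maps an assignment of its variables (a tuple of Boolean values,
# ordered according to "vars") to a number.
def make_factor(vars, table):
    return {"vars": vars, "table": table}

def pointwise_product(f, g):
    """Multiply two factors, matching the values of any shared variables."""
    vars = list(dict.fromkeys(f["vars"] + g["vars"]))
    table = {}
    for values in product((True, False), repeat=len(vars)):
        assign = dict(zip(vars, values))
        fv = f["table"][tuple(assign[v] for v in f["vars"])]
        gv = g["table"][tuple(assign[v] for v in g["vars"])]
        table[values] = fv * gv
    return make_factor(vars, table)

def sum_out(var, f):
    """Eliminate var from a factor by summing over its values."""
    i = f["vars"].index(var)
    vars = f["vars"][:i] + f["vars"][i + 1:]
    table = {}
    for values, p in f["table"].items():
        key = values[:i] + values[i + 1:]
        table[key] = table.get(key, 0.0) + p
    return make_factor(vars, table)

# Tiny example: combine f(A, B) and g(B, C), then sum out B.
f = make_factor(["A", "B"], {(True, True): 0.3, (True, False): 0.7,
                             (False, True): 0.9, (False, False): 0.1})
g = make_factor(["B", "C"], {(True, True): 0.2, (True, False): 0.8,
                             (False, True): 0.6, (False, False): 0.4})
h = sum_out("B", pointwise_product(f, g))
print(h["vars"], h["table"])
```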

34 Variable elimination – irrelevant variables Every variable that is not an ancestor of a query variable or an evidence variable is irrelevant to the query and can be removed. The complexity of variable elimination (d = 2 for n Boolean variables): • Singly connected networks (polytrees): any two nodes are connected by at most one (undirected) path; the time and space cost of variable elimination is linear in the number of variables (nodes) if the number of parents of each node is bounded by a constant. • Multiply connected networks: variable elimination can have exponential time and space complexity even when the number of parents per node is bounded.

35 Approximate inference by stochastic simulation* Direct sampling: • generate events from a network that has no associated evidence (an “empty” network) • rejection sampling: reject samples disagreeing with the evidence • likelihood weighting: use the evidence to weight samples. Markov chain Monte Carlo (MCMC)*: sample from a stochastic process whose stationary distribution is the true posterior.

36–43 Example of sampling from an empty network (slides 36–43 step through generating one sample, one variable at a time in topological order; figures not transcribed).
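A sketch of direct (prior) sampling for the burglary network: sample each variable in topological order, conditioned on the already-sampled values of its parents. The CPT values are again the standard textbook numbers, assumed here.

```python
import random

P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94, (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def prior_sample(rng=random):
    """Sample one complete event from the network with no evidence."""
    b = rng.random() < P_B
    e = rng.random() < P_E
    a = rng.random() < P_A[(b, e)]
    j = rng.random() < P_J[a]
    m = rng.random() < P_M[a]
    return {"Burglary": b, "Earthquake": e, "Alarm": a, "JohnCalls": j, "MaryCalls": m}

# Estimate P(JohnCalls = true) from 100,000 samples.
samples = [prior_sample() for _ in range(100_000)]
print(sum(s["JohnCalls"] for s in samples) / len(samples))
```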

44 Rejection sampling
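A sketch of rejection sampling for P(Burglary | JohnCalls = true, MaryCalls = true): draw prior samples and keep only those consistent with the evidence. The prior_sample function and CPT values are repeated from the previous sketch so that this block runs on its own.

```python
import random

P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94, (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def prior_sample(rng=random):
    b = rng.random() < P_B
    e = rng.random() < P_E
    a = rng.random() < P_A[(b, e)]
    j = rng.random() < P_J[a]
    m = rng.random() < P_M[a]
    return {"Burglary": b, "Earthquake": e, "Alarm": a, "JohnCalls": j, "MaryCalls": m}

def rejection_sample_burglary(n=1_000_000, rng=random):
    """Estimate P(Burglary | j, m) by discarding samples that disagree with the evidence."""
    kept = [s for s in (prior_sample(rng) for _ in range(n))
            if s["JohnCalls"] and s["MaryCalls"]]
    if not kept:
        return None   # the evidence was so rare that no samples survived
    return sum(s["Burglary"] for s in kept) / len(kept)

# Noisy estimate of ~0.28; most samples are rejected because the evidence is rare.
print(rejection_sample_burglary())
```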

45–51 Likelihood weighting (slides 45–51 step through one weighted sample: evidence variables are fixed to their observed values and the sample weight accumulates the likelihood of that evidence given the sampled parents; figures not transcribed).
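A sketch of likelihood weighting for the same query: non-evidence variables are sampled as before, evidence variables are fixed, and each sample carries a weight equal to the likelihood of the evidence. CPT values are the assumed textbook numbers; the encoding is my own.

```python
import random

P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94, (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def weighted_sample(evidence, rng=random):
    """Sample non-evidence variables; multiply the weight by the likelihood of each evidence value."""
    w = 1.0
    b = rng.random() < P_B
    e = rng.random() < P_E
    a = rng.random() < P_A[(b, e)]
    j = evidence["JohnCalls"]          # evidence: fixed, not sampled
    w *= P_J[a] if j else 1 - P_J[a]
    m = evidence["MaryCalls"]
    w *= P_M[a] if m else 1 - P_M[a]
    return b, w

def likelihood_weighting(evidence, n=200_000, rng=random):
    """Estimate P(Burglary | evidence) from n weighted samples."""
    totals = {True: 0.0, False: 0.0}
    for _ in range(n):
        b, w = weighted_sample(evidence, rng)
        totals[b] += w
    z = totals[True] + totals[False]
    return {b: w / z for b, w in totals.items()}

# Roughly {True: 0.28, False: 0.72}, though the estimate is noisy for rare evidence.
print(likelihood_weighting({"JohnCalls": True, "MaryCalls": True}))
```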

52 Likelihood weighting analysis

53 Approximate inference using MCMC*

54 The Markov chain

55 MCMC example

56 Markov blanket sampling
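A sketch of MCMC with Markov blanket (Gibbs) sampling for the same query: each non-evidence variable is repeatedly resampled conditioned on the current values of its Markov blanket. The CPT values are the assumed textbook numbers; the implementation details (including the lack of a burn-in period) are my own simplification.

```python
import random

P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94, (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def p_alarm(a, b, e):
    return P_A[(b, e)] if a else 1 - P_A[(b, e)]

def gibbs_burglary(evidence, n=100_000, rng=random):
    """Estimate P(Burglary | evidence) by Gibbs sampling over the non-evidence variables B, E, A."""
    j, m = evidence["JohnCalls"], evidence["MaryCalls"]
    b, e, a = False, False, False          # arbitrary initial state; no burn-in for simplicity
    counts = {True: 0, False: 0}
    for _ in range(n):
        # Resample B given its Markov blanket {E, A}: proportional to P(b) * P(a | b, e).
        pb_t = P_B * p_alarm(a, True, e)
        pb_f = (1 - P_B) * p_alarm(a, False, e)
        b = rng.random() < pb_t / (pb_t + pb_f)
        # Resample E given its Markov blanket {B, A}.
        pe_t = P_E * p_alarm(a, b, True)
        pe_f = (1 - P_E) * p_alarm(a, b, False)
        e = rng.random() < pe_t / (pe_t + pe_f)
        # Resample A given its Markov blanket {B, E, JohnCalls, MaryCalls}.
        pa_t = P_A[(b, e)] * (P_J[True] if j else 1 - P_J[True]) * (P_M[True] if m else 1 - P_M[True])
        pa_f = (1 - P_A[(b, e)]) * (P_J[False] if j else 1 - P_J[False]) * (P_M[False] if m else 1 - P_M[False])
        a = rng.random() < pa_t / (pa_t + pa_f)
        counts[b] += 1
    return {v: c / n for v, c in counts.items()}

print(gibbs_burglary({"JohnCalls": True, "MaryCalls": True}))  # roughly {True: 0.28, False: 0.72}
```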
