CHAPTER 14 Oliver Schulte Bayesian Networks

Environment Type: Uncertain. [Slide diagram, from Artificial Intelligence: A Modern Approach: if the environment is fully observable and deterministic, we act with certainty and can use search; otherwise we must reason under uncertainty.]

Motivation

Logical Inference
Does knowledge base K entail A?
Model-based: model checking, i.e., enumerate possible worlds.
Rule-based: logical calculus with proof rules, e.g., resolution.

Probabilistic Inference
What is P(A | K)? Read: the probability of A given K.
Model-based: sum over possible worlds.
Rule-based: probability calculus, using the product rule, marginalization, and Bayes' theorem.

Knowledge Representation Format

              Constraint Satisfaction       Logic             Bayes Nets
Basic unit    Variable                      Literal           Random variable
Format        Variable graph (undirected)   (Horn) clauses    Variable graph (directed)
                                                              + probability Horn clauses
Inference     Arc consistency               Resolution        Belief propagation;
                                                              product-sum (not covered)

Bayes Net Applications
Used in many applications: medical diagnosis, the Microsoft Office assistant, and more; there is a 400-page book about applications. Companies: Hugin, Netica, Microsoft.

Basic Concepts

Bayesian Network Structure
A graph where:
- Each node is a random variable.
- Edges are directed.
- There are no directed cycles (the graph is a directed acyclic graph).

Example: Bayesian Network Graph
[Figure: a three-node network in which Cavity is the parent of both Catch and Toothache.]

Bayesian Network
A Bayesian network structure, plus: for each node X and each value x of X, a conditional probability P(X = x | Y1 = y1, …, Yk = yk) = p for every assignment of values y1, …, yk to the parents Y1, …, Yk of X. Demo in the AIspace tool.

Example: Complete Bayesian Network

The Story
You have a new burglar alarm installed at home. It is reliable at detecting burglaries but also responds to earthquakes. You have two neighbors who have promised to call you at work when they hear the alarm. John always calls when he hears the alarm, but sometimes confuses the alarm with the telephone ringing. Mary listens to loud music and sometimes misses the alarm.

Bayesian Networks and Horn Clauses
Let P(X = x | Y1 = y1, …, Yk = yk) = p be a conditional probability specified in a BN. This can be interpreted as a probability clause P(X = x) = p ← Y1 = y1, …, Yk = yk. A logical Horn clause is the special case where the head has probability 1 (p = 100%). A Bayes net can thus be seen as a knowledge base containing probability clauses; for short, a probabilistic knowledge base.

Bayes Nets Encode the Joint Distribution

Bayes Nets and the Joint Distribution
A Bayes net compactly encodes the joint distribution over the random variables X1, …, Xn. How? Let x1, …, xn be a complete assignment of a value to each random variable. Then
P(x1, …, xn) = Π_i P(xi | parents(Xi)),
where the index i = 1, …, n runs over all n nodes. This is the product formula for Bayes nets.

Computing The Product
In words, the joint probability is computed as follows. For each node Xi:
1. Find the assigned value xi.
2. Find the values y1, …, yk assigned to the parents of Xi.
3. Look up the conditional probability P(xi | y1, …, yk) in the Bayes net.
Then multiply these conditional probabilities together. A minimal code sketch follows.
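The product formula is straightforward to implement. Below is a minimal Python sketch, not part of the original slides; the CPT numbers are the usual textbook values for the burglary network, and the data layout (a `parents` map plus a CPT keyed by parent assignments) is an illustrative assumption.

```python
# Minimal sketch of the product formula: P(x1, ..., xn) = prod_i P(xi | parents(Xi)).
# The burglary-network numbers are the usual textbook values; the data layout
# (parents map + CPT keyed by tuples of parent values) is an illustrative choice.

parents = {
    "Burglary": [], "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"],
}

# cpt[X][parent_values] = P(X = True | parents = parent_values)
cpt = {
    "Burglary":   {(): 0.001},
    "Earthquake": {(): 0.002},
    "Alarm": {(True, True): 0.95, (True, False): 0.94,
              (False, True): 0.29, (False, False): 0.001},
    "JohnCalls": {(True,): 0.90, (False,): 0.05},
    "MaryCalls": {(True,): 0.70, (False,): 0.01},
}

def joint(assignment):
    """Joint probability of a complete True/False assignment to every node."""
    prob = 1.0
    for node, value in assignment.items():
        parent_values = tuple(assignment[p] for p in parents[node])
        p_true = cpt[node][parent_values]          # P(node = True | parents)
        prob *= p_true if value else 1.0 - p_true  # complement for node = False
    return prob

# Joint probability that every variable is true (the query on the next slide):
print(joint({"Burglary": True, "Earthquake": True, "Alarm": True,
             "JohnCalls": True, "MaryCalls": True}))   # ~1.2e-06
```

Note that a full assignment costs only n CPT lookups and multiplications, which is exactly the point of the product formula.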

Product Formula Example: Burglary
Query: What is the joint probability that all variables are true?
P(M, J, A, E, B) = P(M | A) P(J | A) P(A | E, B) P(E) P(B) = 0.7 × 0.9 × 0.95 × 0.002 × 0.001 ≈ 1.2 × 10^-6.

Cavity Example
Query: What is the joint probability that there is a cavity but no toothache and the probe doesn't catch?
P(Cavity = T, Toothache = F, Catch = F) = P(Cavity = T) P(Toothache = F | Cavity = T) P(Catch = F | Cavity = T) = 0.2 × 0.076 × 0.46 ≈ 0.007.

Compactness of Bayesian Networks
Consider n binary variables. An unconstrained joint distribution requires O(2^n) probabilities. With a Bayesian network in which each node has at most k parents, we need only O(n · 2^k) probabilities.
Example:
- Full unconstrained joint distribution with n = 30: about 10^9 probabilities.
- Bayesian network with n = 30, k = 4: 480 probabilities.
A quick numeric check follows.
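As a quick sanity check of the counts quoted above (a trivial sketch, not from the slides):

```python
# Parameter counts for n binary variables.
n, k = 30, 4
print(2 ** n)       # unconstrained joint distribution: 1,073,741,824 entries (about 10^9)
print(n * 2 ** k)   # Bayes net with at most k parents per node: 30 * 16 = 480 entries
```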

Completeness of Bayes Nets
The Bayes net encodes all joint probabilities. Knowledge of all joint probabilities is sufficient to answer any probabilistic query. ⇒ A Bayes net can in principle answer every query.

Is it Magic?
Why does the product formula work? Two ingredients:
1. The Bayes net's topological (graphical) semantics: the graph by itself entails conditional independencies.
2. The chain rule.

Bayes Nets Graphical Semantics

Common Causes: Spot the Pattern
[Figure: Cavity is the parent of both Catch and Toothache.] Catch is independent of Toothache given Cavity.

Burglary Example JohnCalls, MaryCalls are conditionally independent given Alarm.

Spot the Pattern: Chain Scenario MaryCalls is independent of Burglary given Alarm. JohnCalls is independent of Earthquake given Alarm.

The Markov Condition
A Bayes net is constructed so that each variable is conditionally independent of its nondescendants given its parents. ⇒ The graph alone (without specified probabilities) entails conditional independencies.

Derivation of the Product Formula

The Chain Rule
We can always write P(a, b, c, …, z) = P(a | b, c, …, z) P(b, c, …, z) (product rule).
Repeatedly applying this idea, we obtain P(a, b, c, …, z) = P(a | b, c, …, z) P(b | c, …, z) P(c | …, z) … P(z).
Order the variables so that children come before their parents. Then, given its parents, each node is independent of its other ancestors by the topological independence, so
P(a, b, c, …, z) = Π_x P(x | parents(x)).

Example in Burglary Network
P(M, J, A, E, B)
= P(M | J, A, E, B) P(J, A, E, B) = P(M | A) P(J, A, E, B)
= P(M | A) P(J | A, E, B) P(A, E, B) = P(M | A) P(J | A) P(A, E, B)
= P(M | A) P(J | A) P(A | E, B) P(E, B)
= P(M | A) P(J | A) P(A | E, B) P(E) P(B)
Each simplification is an application of the Bayes net topological independence (marked in colour on the original slide).

Explaining Away

Common Effects: Spot the Pattern
[Figure: Influenza and Smokes are both parents of Bronchitis.] Influenza and Smokes are independent. Given Bronchitis, they become dependent.
[Figure: Battery Age and Charging System OK are both parents of Battery Voltage.] Battery Age and Charging System OK are independent. Given Battery Voltage, they become dependent.

Conditioning on Children
[Figure: A and B are both parents of C.]
Independent causes: A and B are independent.
"Explaining away" effect: given C, observing A makes B less likely. E.g., Bronchitis in the UBC "Simple Diagnostic Problem".
⇒ A and B are (marginally) independent but become dependent once C is known.
This pattern requires an extension of the Markov condition known as d-separation.

Mathematical Analysis
Theorem: If A and B have no common ancestors and neither is a descendant of the other, then they are independent of each other.
Proof for our example ([Figure: A and B are both parents of C]):
P(a, b) = Σ_c P(a, b, c) = Σ_c P(a) P(b) P(c | a, b) = P(a) P(b) Σ_c P(c | a, b) = P(a) P(b).
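The "explaining away" pattern can also be checked numerically. The sketch below uses the burglary-network values assumed earlier and computes conditional probabilities by brute-force enumeration; it is illustrative only, not a general inference algorithm.

```python
# Explaining away in the burglary network (the call variables are omitted, since
# descendants of Alarm sum out to 1). Same textbook CPT values as assumed earlier.
from itertools import product

P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}

def joint(b, e, a):
    """P(Burglary=b, Earthquake=e, Alarm=a) via the product formula."""
    pb = P_B if b else 1 - P_B
    pe = P_E if e else 1 - P_E
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    return pb * pe * pa

def conditional(query, evidence):
    """P(query | evidence), computed by summing the joint over all worlds."""
    num = den = 0.0
    for b, e, a in product([True, False], repeat=3):
        world = {"B": b, "E": e, "A": a}
        if all(world[v] == val for v, val in evidence.items()):
            p = joint(b, e, a)
            den += p
            if all(world[v] == val for v, val in query.items()):
                num += p
    return num / den

print(conditional({"B": True}, {"A": True}))              # ~0.37
print(conditional({"B": True}, {"A": True, "E": True}))   # ~0.003: the earthquake explains the alarm away
```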

Bayes’ Theorem

Abductive Reasoning
Horn clauses are often causal, from cause to effect. Many important queries are diagnostic, from effect to cause. This reversal is difficult to capture with logical Horn clauses.
[Figure: Wumpus → Stench; Cavity → Toothache.]

Bayes' Theorem: Another Example
A doctor knows the following. The disease meningitis causes the patient to have a stiff neck 50% of the time. The prior probability that someone has meningitis is 1/50,000. The prior probability that someone has a stiff neck is 1/20. Question: knowing that a person has a stiff neck, what is the probability that they have meningitis?

Spot the Pattern: Diagnosis

                P(Cause)       P(Effect | Cause)        P(Effect)      P(Cause | Effect)
Cavity          P(Cavity)      P(Toothache | Cavity)    P(Toothache)   P(Cavity | Toothache) = 0.6
Wumpus          P(Wumpus)      P(Stench | Wumpus)       P(Stench)      P(Wumpus | Stench)
Meningitis      1/50,000       1/2                      1/20           ?

Spot the Pattern: Diagnosis

                P(Cause)    ×  P(Effect | Cause)       /  P(Effect)     =  P(Cause | Effect)
Cavity          P(Cavity)   ×  P(Toothache | Cavity)   /  P(Toothache)  =  P(Cavity | Toothache)
Wumpus          P(Wumpus)   ×  P(Stench | Wumpus)      /  P(Stench)     =  P(Wumpus | Stench)
Meningitis      1/50,000    ×  1/2                     /  1/20          =  1/5,000
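A one-line numeric check of the meningitis row (a trivial sketch, not part of the slides):

```python
# Bayes' theorem: P(Meningitis | StiffNeck) = P(StiffNeck | Meningitis) * P(Meningitis) / P(StiffNeck)
prior, likelihood, evidence = 1 / 50_000, 1 / 2, 1 / 20
print(likelihood * prior / evidence)   # 0.0002, i.e. 1/5,000
```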

Explain the Pattern: Bayes’ Theorem Exercise: Prove Bayes’ Theorem P(A | B) = P(B | A) P(A) / P(B).

On Bayes' Theorem
P(a | b) = P(b | a) P(a) / P(b).
Useful for assessing diagnostic probability from causal probability: P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect).
Likelihood P(Effect | Cause): how well does the cause explain the effect?
Prior P(Cause): how plausible is the explanation before any evidence?
Evidence term / normalization constant P(Effect): how surprising is the evidence?

Example for Bayes' Theorem
P(Wumpus | Stench) = P(Stench | Wumpus) × P(Wumpus) / P(Stench). Assume that P(Stench | Wumpus) = 1. Is P(Wumpus[1,3] | Stench[2,3]) > P(Wumpus[1,3])?

OTHER TOPICS

Inference and Learning
Efficient inference algorithms exploit the graphical structure (see the book). There is much work on learning Bayes nets from data (including by yours truly).

First-Order Bayes Nets
Can we combine first-order logic with Bayes nets? Basic idea: use nodes with first-order variables, like Prolog Horn clauses. For inference, follow the grounding approach to first-order reasoning. This is an important open topic; many researchers are working on it, including yours truly.

Summary Bayes nets represent probabilistic knowledge in a graphical way, with analogies to Horn clauses. Used in many applications and companies. The graph encodes dependencies (correlations) and independencies. Supports efficient probabilistic queries. Bayes’ theorem is a formula for abductive reasoning, from effect to cause.