CS 416 Artificial Intelligence Lecture 14: Uncertainty, Chapters 13 and 14



TA Office Hours Chris cannot attend today’s office hours. He will be available Wed, 3:30 – 4:30.

Conditional probability The probability of a, given that all we know is b, is written P(a | b). It can be expressed in terms of unconditional probabilities: P(a | b) = P(a ∧ b) / P(b).

Conditioning A distribution over Y can be obtained by summing out all the other variables from any joint distribution containing Y: P(Y) = Σ_z P(Y | z) P(z) (a variant of marginalization, P(Y) = Σ_z P(Y, z)).
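A minimal Python sketch of both identities (not from the slides; the toy distribution and variable names are invented for illustration):

```python
# Marginalization (summing out): P(Y=y) = sum_z P(Y=y, Z=z)
# Conditioning (variant):        P(Y=y) = sum_z P(Y=y | Z=z) P(Z=z)
joint = {  # toy joint P(Y, Z) over two binary variables (values assumed)
    (True, True): 0.30, (True, False): 0.25,
    (False, True): 0.10, (False, False): 0.35,
}
p_z = {True: 0.40, False: 0.60}                    # P(Z), consistent with joint
p_y_given_z = {(y, z): joint[(y, z)] / p_z[z] for (y, z) in joint}

p_y_marg = {}
for (y, z), prob in joint.items():                 # summing out Z
    p_y_marg[y] = p_y_marg.get(y, 0.0) + prob

p_y_cond = {y: sum(p_y_given_z[(y, z)] * p_z[z] for z in p_z)
            for y in (True, False)}

print(p_y_marg, p_y_cond)   # both yield P(Y): {True: 0.55, False: 0.45}
```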

Independence Independence of variables in a domain can dramatically reduce the amount of information necessary to specify the full joint distribution

Bayes’ Rule P(a | b) = P(b | a) P(a) / P(b)

Conditional independence In general, when a single cause influences n effects, all of which are conditionally independent given the cause, the full joint distribution requires 2^(n+1) entries, while the factored representation needs only 2·n·2^2 = 8n entries, i.e. growth that is linear rather than exponential in n (assuming binary variables).
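To make the contrast concrete, a small sketch (not from the slides); it counts independent probabilities, so its linear constant (2n + 1) differs from the slide's 8n tally, but the exponential-versus-linear point is the same:

```python
def full_joint_entries(n):
    # Full joint table over one binary cause and n binary effects.
    return 2 ** (n + 1)

def factored_numbers(n):
    # Independent numbers in P(C) * prod_i P(E_i | C):
    # one for the prior P(c), plus P(e_i | c) and P(e_i | ~c) per effect.
    return 1 + 2 * n

for n in (5, 10, 30):
    print(n, full_joint_entries(n), factored_numbers(n))
# 5 64 11
# 10 2048 21
# 30 2147483648 61
```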

Wumpus Are there pits in (1,3), (2,2), (3,1), given breezes in (1,2) and (2,1)? One way to solve: find the full joint distribution
– P(P_1,1, …, P_4,4, B_1,1, B_1,2, B_2,1)

Find the full joint distribution Remember the product rule:
P(P_1,1, …, P_4,4, B_1,1, B_1,2, B_2,1) = P(B_1,1, B_1,2, B_2,1 | P_1,1, …, P_4,4) P(P_1,1, …, P_4,4)
– Solve this for all P and B values

Find the full joint distribution P(B_1,1, B_1,2, B_2,1 | P_1,1, …, P_4,4) P(P_1,1, …, P_4,4)
– Givens:
 the rules relating breezes to pits
 each square contains a pit with probability 0.2
– For any given P_1,1, …, P_4,4 setting with n pits:
 the rules of breezes tell us the value of P(B | P)
 0.2^n · 0.8^(16−n) gives the value of P(P)
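A quick sketch of this prior computation (the example configuration is made up):

```python
def pit_prior(pits):
    # Prior probability of one complete pit configuration over the 16
    # squares: each square independently contains a pit with probability 0.2.
    n = sum(pits)
    return 0.2 ** n * 0.8 ** (16 - n)

config = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]  # 3 pits
print(pit_prior(config))   # 0.2**3 * 0.8**13, about 0.00044
```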

Solving an instance We have the following facts:
– known: no pits in the visited squares (1,1), (1,2), (2,1)
– b: breezes observed in (1,2) and (2,1)
Query: P(P_1,3 | known, b)

Solving an instance Query: P(P_1,3 | known, b)

Solving: P(P_1,3 | known, b) We know the full joint probability, so we can solve this
– 2^12 = 4096 terms must be summed, one for each assignment to the 12 unknown squares

Solving an instance more quickly Independence
– The contents of (4,4) don’t affect the presence of a pit at (1,3)
– Partition the unknown cells into Fringe and Other:
 Fringe = pit variables for the unknown cells adjacent to the visited squares
 Other = pit variables for the remaining unknown cells
 The breezes are conditionally independent of the Other variables, given the known, fringe, and query variables

Independence
P(P_1,3 | known, b)
= α Σ_unknown P(P_1,3, known, b, unknown) (by Bayes and summing out)
= α Σ_fringe Σ_other P(b | known, P_1,3, fringe, other) P(P_1,3, known, fringe, other)
= α Σ_fringe Σ_other P(b | known, P_1,3, fringe) P(P_1,3, known, fringe, other) (by independence of fringe and other)

Independence
= α Σ_fringe P(b | known, P_1,3, fringe) Σ_other P(P_1,3, known, fringe, other) (relocate summation)
= α Σ_fringe P(b | known, P_1,3, fringe) Σ_other P(P_1,3) P(known) P(fringe) P(other) (by independence)
= α P(known) P(P_1,3) Σ_fringe P(b | known, P_1,3, fringe) P(fringe) Σ_other P(other) (relocate summation)
= α′ P(P_1,3) Σ_fringe P(b | known, P_1,3, fringe) P(fringe) (new alpha & Σ_other P(other) = 1)

Independence 4096 terms dropped to 4
– The fringe has two cells, hence four possible pit combinations
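The whole fringe computation fits in a few lines. A sketch (not from the lecture) whose breeze-consistency test encodes the standard Russell & Norvig layout, with breezes at (1,2) and (2,1) and the visited squares known pit-free:

```python
from itertools import product

P_PIT = 0.2   # each square contains a pit with probability 0.2

def breezes_consistent(p13, p22, p31):
    # Breeze at (1,2) needs a pit in (1,3) or (2,2);
    # breeze at (2,1) needs a pit in (2,2) or (3,1).
    return (p13 or p22) and (p22 or p31)

def prior(pit):
    return P_PIT if pit else 1.0 - P_PIT

# Unnormalized P(P_1,3 = p13 | known, b): prior on the query times the
# total prior weight of fringe assignments consistent with the breezes.
score = {}
for p13 in (True, False):
    weight = sum(prior(p22) * prior(p31)
                 for p22, p31 in product((True, False), repeat=2)
                 if breezes_consistent(p13, p22, p31))
    score[p13] = prior(p13) * weight

alpha = 1.0 / sum(score.values())
print({k: round(alpha * v, 2) for k, v in score.items()})
# {True: 0.31, False: 0.69}: a pit at (1,3) with probability about 0.31
```

The normalized answer, roughly ⟨0.31, 0.69⟩, agrees with the textbook's result for this query.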

Chapter 14 Probabilistic Reasoning
– First, Bayesian networks
– Then, inference

Bayesian Networks It is difficult to build a full probability table over many variables: the number of entries (and the data needed to estimate them) grows quickly
– Independence and conditional independence reduce the cost (in time and space) of building the full joint distribution
A Bayesian network captures these dependencies

Bayesian Network A directed acyclic graph (DAG)
– Random variables are the nodes
– Arcs indicate direct influence; the graph topology encodes the conditional independence relationships
– Each node is labeled with P(X_i | Parents(X_i))

Another example Burglar Alarm
– Goes off when there is an intruder (usually)
– Goes off during an earthquake (sometimes)
– Neighbor John calls when he hears the alarm, but he also calls when he confuses the phone for the alarm
– Neighbor Mary calls when she hears the alarm, but she doesn’t hear it when listening to music

Another example Burglar Alarm Note the absence of information about John and Mary’s errors. Note the presence of conditional probability tables (CPTs).

Full joint distribution The Bayesian network describes the full joint distribution
P(X_1 = x_1 ∧ X_2 = x_2 ∧ … ∧ X_n = x_n), abbreviated as
P(x_1, x_2, …, x_n) = Π_{i=1..n} P(x_i | parents(X_i))
where each factor is an entry in the corresponding node’s CPT

Burglar alarm example
P(John calls, Mary calls, alarm goes off, no burglar, no earthquake)
= P(j | a) P(m | a) P(a | ¬b, ¬e) P(¬b) P(¬e)
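A minimal sketch of this computation. The CPT values below are the standard ones from the Russell & Norvig alarm example; they are not shown in these slides, so treat them as assumed inputs:

```python
# CPT values from the standard Russell & Norvig alarm network (assumed here).
P_B = 0.001                                        # P(Burglary = true)
P_E = 0.002                                        # P(Earthquake = true)
P_A = {(True, True): 0.95, (True, False): 0.94,    # P(Alarm = true | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                    # P(JohnCalls = true | Alarm)
P_M = {True: 0.70, False: 0.01}                    # P(MaryCalls = true | Alarm)

def bern(p_true, value):
    # Probability that a binary variable takes `value`, given P(true).
    return p_true if value else 1.0 - p_true

def joint(j, m, a, b, e):
    # P(j, m, a, b, e) = P(j | a) P(m | a) P(a | b, e) P(b) P(e)
    return (bern(P_J[a], j) * bern(P_M[a], m) *
            bern(P_A[(b, e)], a) * bern(P_B, b) * bern(P_E, e))

print(joint(True, True, True, False, False))
# 0.9 * 0.7 * 0.001 * 0.999 * 0.998, about 0.000628
```

With the CPTs in hand, any entry of the full joint is just five table lookups multiplied together.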

Constructing a Bayesian Network
– Top-down (causes before effects) is more likely to work
– Causal rules are better
– Adding arcs is a judgment call
 Consider the decision not to add error info about John/Mary: no reference to telephones or music playing appears in the network

Conditional distributions It can be time-consuming to fill in all the CPTs of discrete random variables
– Sometimes standard templates can be used
 The canonical 20% of the work solves 80% of the problem (thanks, Pareto and Juran)
– Sometimes simple logic summarizes a table, see the sketch below
 A ∨ B ∨ C ⇒ D
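For instance, a deterministic node's CPT can be generated from the logic itself rather than typed in; a tiny sketch (variable names assumed):

```python
from itertools import product

# CPT for P(D = true | A, B, C) when D is exactly A or B or C:
# the whole 2^3-row table collapses to one line of logic.
cpt_D = {(a, b, c): 1.0 if (a or b or c) else 0.0
         for a, b, c in product((True, False), repeat=3)}

print(cpt_D[(False, False, True)])    # 1.0
print(cpt_D[(False, False, False)])   # 0.0
```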

Conditional distributions Continuous random variables
– Discretization: subdivide the continuous range into a fixed set of intervals
 Where do you put the interval boundaries?
– Standard probability density functions (PDFs)
 e.g. P(X) = Gaussian, where only the mean and variance need to be specified

Conditional distributions Mixing discrete and continuous. Example:
– The probability that I buy fruit is a function of its cost (a discrete child of a continuous parent)
– Its cost is a function of the harvest quality (continuous) and the presence of government subsidies (discrete)
How do we mix the two kinds of variables?

Hybrid Bayesians P(Cost | Harvest, Subsidy): enumerate the discrete choices
– P(Cost | Harvest, subsidy)
– P(Cost | Harvest, ¬subsidy)

Hybrid Bayesians How does Cost change as a function of Harvest?
– Linear Gaussian: Cost has a Gaussian distribution whose mean varies linearly with the value of the parent and whose standard deviation is constant, P(c | h, subsidy) = N(a·h + b, σ²)(c)
– We need two of these, one for each value of Subsidy
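A small sketch of a linear Gaussian CPD; the slopes, intercepts, and spreads are invented for illustration:

```python
import math

def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# One (a, b, sigma) triple per value of the discrete parent Subsidy:
# Cost | Harvest = h, Subsidy = s  ~  Normal(a_s * h + b_s, sigma_s^2)
PARAMS = {True:  (-1.0, 10.0, 0.5),    # subsidy: cheaper, tighter spread
          False: (-1.0, 12.0, 1.0)}    # no subsidy (all values assumed)

def p_cost(c, harvest, subsidy):
    a, b, sigma = PARAMS[subsidy]
    return gaussian_pdf(c, a * harvest + b, sigma)

print(p_cost(5.0, harvest=6.0, subsidy=True))   # density of Cost = 5 given Harvest = 6
```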

Multivariate Gaussian A network of continuous variables with linear Gaussian distributions has a joint distribution that is a multivariate Gaussian distribution over all the variables
– A surface in n-dimensional space with a peak at the point whose coordinates are the individual means
– It drops off in all directions from the mean

Conditional Gaussian Adding discrete variables to a multivariate Gaussian results in a conditional Gaussian
– Given any assignment to the discrete variables, the distribution over the continuous ones is multivariate Gaussian

Discrete variables with cont. parents Either you buy or you don’t
– But there is a soft threshold around your desired cost

Discrete variables with cont. parents One option is a probit model: use the cumulative normal distribution as the soft threshold, P(buys | Cost = c) = Φ((−c + μ) / σ)
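A sketch of this soft threshold with the normal CDF; the cutoff mu and spread sigma are assumed values:

```python
import math

def phi(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

MU, SIGMA = 6.0, 1.0   # desired cost and how "soft" the threshold is (assumed)

def p_buys(cost):
    # Probit model: P(buys = true | Cost = c) = Phi((mu - c) / sigma)
    return phi((MU - cost) / SIGMA)

for c in (4.0, 6.0, 8.0):
    print(c, round(p_buys(c), 3))   # 0.977, 0.5, 0.023
```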