
CS 416 Artificial Intelligence — Lecture 15: Uncertainty (Chapters 13 and 14)



Presentation transcript:

1 CS 416 Artificial Intelligence — Lecture 15: Uncertainty (Chapters 13 and 14)

2 Conditional probability
The probability of a given that all we know is b is written P(a | b). It can be expressed in terms of unconditional probabilities: P(a | b) = P(a ∧ b) / P(b).

3 Conditioning
A distribution over Y can be obtained by summing out all the other variables from any joint distribution containing Y: P(Y) = Σz P(Y | z) P(z).
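The summing-out step above can be sketched in a few lines. This is a minimal illustration, not from the slides: the numbers for a binary Z and binary Y are made up.

```python
# Marginalizing ("summing out") a variable: P(Y) = sum over z of P(Y | z) P(z).
# Illustrative values for a binary parent Z and binary child Y.

p_z = {True: 0.3, False: 0.7}          # P(Z)
p_y_given_z = {True: 0.9, False: 0.2}  # P(Y=true | Z=z)

# P(Y=true) = sum over both values of z
p_y = sum(p_y_given_z[z] * p_z[z] for z in (True, False))
print(p_y)  # 0.9*0.3 + 0.2*0.7 = 0.41
```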

4 Independence
Independence of variables in a domain can dramatically reduce the amount of information necessary to specify the full joint distribution.
Assume the dental scenario has three true/false conditions:
– Toothache – yes / no
– Catch – the pick does / does not get caught
– Cavity – yes / no
2³ = 8 probabilities are required to cover all the cases.

5 Independence
2³ = 8 probabilities are required to cover all the cases.
Consider adding weather (four states) to this table:
– For each weather condition there are 8 dental conditions, so 8 × 4 = 32 cells.

6 Independence
Weather states: Rainy, Cloudy, Sunny, Windy.

7 Independence
Conditional probability stipulates:
– P(dental condition ∧ weather condition) = P(weather | dental) P(dental)
Because weather and dentistry are independent:
– P(weather | dental) = P(weather)
So P(toothache, catch, cavity, Weather=cloudy) = P(Weather=cloudy) · P(toothache, catch, cavity): a 4-cell table and an 8-cell table, 12 cells total.
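The 12-cells-instead-of-32 saving can be seen directly in code. All probabilities below are placeholders for illustration (the slides give no numeric tables); the point is that any 32-cell joint entry is recovered as a product of the two small factors.

```python
import itertools

# With Weather independent of the dental variables, the 32-cell joint
# factors into a 4-cell weather table and an 8-cell dental table:
# 12 numbers total. Values are illustrative placeholders.

p_weather = {"sunny": 0.6, "rainy": 0.1, "cloudy": 0.2, "windy": 0.1}

# P(toothache, catch, cavity): 8 entries summing to 1 (uniform placeholder)
p_dental = {combo: 1 / 8
            for combo in itertools.product([True, False], repeat=3)}

# Any joint entry is the product of the two factors:
p = p_weather["cloudy"] * p_dental[(True, True, True)]
print(p)  # 0.2 * 0.125 = 0.025
```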

8 Bayes’ Rule
P(a | b) = P(b | a) P(a) / P(b)
Useful when you know three of the quantities and need to know the fourth.
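A quick numeric sketch of "know three things, derive the fourth." The diagnostic-test numbers are invented for illustration; P(b) itself is obtained by conditioning on a.

```python
# Bayes' rule: P(a | b) = P(b | a) P(a) / P(b).
# Illustrative numbers (not from the slides): a diagnostic-test scenario.
p_a = 0.01              # prior P(disease)
p_b_given_a = 0.9       # P(positive | disease)
p_b_given_not_a = 0.05  # P(positive | no disease)

# P(b) by conditioning on a
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 4))  # 0.009 / 0.0585 ≈ 0.1538
```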

9 Conditional independence
Consider toothaches, the pick catching, and cavities:
– A cavity causes the pick to catch.
– A cavity causes toothaches.
– A toothache doesn’t cause the pick to catch.
– The pick catching doesn’t cause a toothache.
Both are likely if you have a cavity, but neither causes the other; catching and toothaches are not directly related.

10 Conditional independence
Toothache and catch are independent given the presence or absence of a cavity: P(toothache, catch | cavity) = P(toothache | cavity) P(catch | cavity). If you know you have a cavity, there’s no reason to believe the toothache and the dentist’s pick are related.

11 Conditional independence
In general, when a single cause influences n effects, all of which are conditionally independent given the cause (assuming binary variables), the full joint distribution needs 2^(n+1) entries, but the factored representation P(cause) · Πi P(effecti | cause) grows only linearly in n.
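The factored representation above can be sketched for the dental example. The conditional probabilities below are illustrative assumptions (the slides give no numbers); the structure, one prior plus one small table per effect, is the point.

```python
# One cause (cavity) with two conditionally independent effects:
# P(toothache, catch, cavity) = P(cavity) P(toothache|cavity) P(catch|cavity).
# Only 1 + 2 + 2 = 5 numbers instead of the 8-entry joint. Values illustrative.
p_cavity = 0.2
p_toothache_given = {True: 0.6, False: 0.1}  # P(toothache=true | cavity)
p_catch_given = {True: 0.9, False: 0.2}      # P(catch=true | cavity)

def joint(toothache, catch, cavity):
    pc = p_cavity if cavity else 1 - p_cavity
    pt = p_toothache_given[cavity] if toothache else 1 - p_toothache_given[cavity]
    pk = p_catch_given[cavity] if catch else 1 - p_catch_given[cavity]
    return pc * pt * pk

print(joint(True, True, True))  # 0.2 * 0.6 * 0.9 = 0.108
```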

12 Wumpus
Are there pits in (1,3), (2,2), (3,1) given breezes in (1,2) and (2,1)?
One way to solve: find the full joint distribution
– P(P1,1, …, P4,4, B1,1, B1,2, B2,1)

13 Find the full joint distribution
Remember the product rule:
P(P1,1, …, P4,4, B1,1, B1,2, B2,1) = P(B1,1, B1,2, B2,1 | P1,1, …, P4,4) P(P1,1, …, P4,4)
– Solve this for all P and B values.

14 Find the full joint distribution
P(B1,1, B1,2, B2,1 | P1,1, …, P4,4) P(P1,1, …, P4,4)
– Givens: the rules relating breezes to pits, and that each square contains a pit with probability 0.2.
– For any given P1,1, …, P4,4 setting with n pits:
  the rules of breezes tell us the value of P(B | P);
  0.2^n · 0.8^(16−n) gives the value of P(P).
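The prior term P(P) from the slide is a one-liner; a small helper makes the formula concrete.

```python
# Prior probability of one specific pit configuration: each of 16 squares
# independently contains a pit with probability 0.2, so a configuration
# with n pits has probability 0.2^n * 0.8^(16 - n).
def config_prior(n_pits, n_squares=16, p_pit=0.2):
    return p_pit ** n_pits * (1 - p_pit) ** (n_squares - n_pits)

print(config_prior(0))  # no pits anywhere: 0.8^16
print(config_prior(3))  # exactly this 3-pit layout: 0.2^3 * 0.8^13
```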

15 Solving an instance
We have the following facts. Query: P(P1,3 | known, b).
We know the full joint probability, so we can solve this:
– but 2^12 = 4096 terms must be summed.

16 Solving an instance more quickly
Independence: the contents of [4,4] don’t affect the presence of a pit at [1,3].
Create Fringe and Other variable sets:
– Breezes are conditionally independent of the Other variables given the known, query, and Fringe squares.
(Diagram: Fringe, Other, and Query squares.)

17 Chapter 14
Probabilistic Reasoning: first Bayesian Networks, then Inference.

18 Bayesian Networks
It is difficult to build a probability table over a large number of variables. Independence and conditional independence reduce the work (time) of building the full joint distribution; a Bayesian Network captures these dependencies.

19 Bayesian Network
A Directed Acyclic Graph (DAG):
– Random variables are the nodes.
– Arcs indicate direct dependencies; missing arcs encode conditional independence relationships.
– Each node is labeled with P(Xi | Parents(Xi)).

20 Another example
Burglar Alarm:
– Goes off when there is an intruder (usually).
– Goes off during an earthquake (sometimes).
– Neighbor John calls when he hears the alarm, but he also calls when he confuses the phone for the alarm.
– Neighbor Mary calls when she hears the alarm, but she doesn’t hear it when listening to music.

21 Another example
Burglar Alarm. Note the absence of information about John and Mary’s errors; note the presence of Conditional Probability Tables (CPTs).

22 Full joint distribution
The Bayesian Network describes the full joint distribution P(X1 = x1 ∧ X2 = x2 ∧ … ∧ Xn = xn), abbreviated P(x1, x2, …, xn), as a product of CPT entries: P(x1, …, xn) = Πi P(xi | parents(Xi)).

23 Burglar alarm example P (John calls, Mary calls, alarm goes off, no intruder or earthquake)
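This joint entry is a single product of CPT values read off the network. The CPT numbers below are the standard textbook (AIMA) burglary-network values; if the lecture slides used different tables, substitute those numbers.

```python
# P(j, m, a, ¬b, ¬e) = P(j|a) P(m|a) P(a|¬b,¬e) P(¬b) P(¬e)
# CPT values assumed from the standard AIMA burglary network:
#   P(j|a)=0.90, P(m|a)=0.70, P(a|¬b,¬e)=0.001, P(b)=0.001, P(e)=0.002
p = 0.90 * 0.70 * 0.001 * (1 - 0.001) * (1 - 0.002)
print(p)  # ≈ 0.000628
```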

24 Constructing a Bayesian Network
– Top-down is more likely to work.
– Causal rules are better.
– Adding arcs is a judgment call: consider the decision not to add error info about John/Mary.

25 Conditional distributions
It can be time-consuming to fill in all the CPTs of discrete random variables.
– Sometimes standard templates can be used (the canonical 20% of the work takes 80% of the time).
– Sometimes simple logic summarizes a table, e.g. A ∨ B ∨ C ⇒ D.

26 Conditional distributions
Continuous random variables:
– Discretization: subdivide the continuous region into a fixed set of intervals (where do you put the boundaries?).
– Standard Probability Density Functions (PDFs), e.g. a Gaussian, where only the mean and variance need to be specified.

27 Conditional distributions
Mixing discrete and continuous variables. Example:
– The probability I buy fruit is a function of its cost.
– Its cost is a function of the harvest quality (continuous) and the presence of government subsidies (discrete).
How do we mix the items?

28 Hybrid Bayesians
P(Cost | Harvest, Subsidy): enumerate the discrete choices:
– P(Cost | Harvest, subsidy)
– P(Cost | Harvest, ¬subsidy)

29 Hybrid Bayesians
How does Cost change as a function of Harvest?
– Linear Gaussian: Cost is a Gaussian distribution whose mean varies linearly with the value of the parent and whose standard deviation is constant.
Two of these are needed, one for each value of Subsidy.
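A linear-Gaussian conditional can be written out directly. The slope, intercept, and sigma values below are illustrative assumptions (the slides give no numbers); there is one parameter set per value of Subsidy, matching "two of these."

```python
import math

# Linear Gaussian: Cost | Harvest=h, Subsidy=s is Normal with mean a*h + b
# and fixed standard deviation sigma. Parameters are illustrative:
params = {True: (-0.5, 10.0, 1.0),   # (slope, intercept, sigma) with subsidy
          False: (-0.5, 12.0, 1.0)}  # without subsidy

def cost_density(cost, harvest, subsidy):
    a, b, sigma = params[subsidy]
    mu = a * harvest + b  # mean varies linearly with the parent's value
    return math.exp(-((cost - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# The density peaks at the linear mean: for harvest=4 with subsidy, mu = 8
print(cost_density(8.0, 4.0, True))  # 1/sqrt(2*pi) ≈ 0.3989
```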

30

31 Multivariate Gaussian
A network of continuous variables with linear Gaussian distributions has a joint distribution that is a multivariate Gaussian over all the variables:
– a surface in n-dimensional space with a peak at the point whose coordinates are the per-dimension means,
– dropping off in all directions from the mean.

32 Conditional Gaussian
Adding discrete variables to a multivariate Gaussian network results in a conditional Gaussian: given any assignment to the discrete variables, the distribution over the continuous ones is multivariate Gaussian.

33 Discrete variables with continuous parents
Either you buy or you don’t, but there is a soft threshold around your desired cost.

34 Thresholding functions
Probit and logit (example parameters: mean = 0.6, standard deviation = 1.0).
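Both soft thresholds can be written in a few lines. This sketch uses the slide's parameters (mean 0.6, sd 1.0) and assumes the probability of buying decreases as cost rises; that direction, and the logistic's scaling, are modeling assumptions rather than anything stated on the slide.

```python
import math

# Soft-threshold models for a binary decision (buy / don't buy) with a
# continuous parent (cost). Center and spread match the slide's example.
MU, SIGMA = 0.6, 1.0

def probit(cost):
    # Standard normal CDF of the negated standardized cost:
    # P(buy) falls as cost rises (direction assumed for illustration).
    return 0.5 * (1 + math.erf(-(cost - MU) / (SIGMA * math.sqrt(2))))

def logit(cost):
    # Logistic curve with the same center (scaling is an assumption).
    return 1 / (1 + math.exp((cost - MU) / SIGMA))

print(round(probit(0.6), 3), round(logit(0.6), 3))  # both 0.5 at the mean
```

The probit drops to (near) zero more sharply in the tails, while the logit's tails decay more slowly, which is the usual reason to prefer one over the other.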

35 Inference in Bayesian Networks
First we talk about computing this exactly, which will be shown to be intractable in many cases. Later, we’ll talk about approximations.

36 Inference by enumeration
What is the probability of a query variable X given a set of evidence variables e (E1, …, Em)? That is, P(X | e).
Let Y represent the hidden variables:
P(X | e) = α P(X, e) = α Σy P(X, e, y)
We had solved this by walking through the full joint distribution; the Bayesian Network provides another way.

37 Inference by enumeration
Compute sums of products of conditional probabilities from the network.
P(Burglary | JohnCalls=true, MaryCalls=true):
– Hidden variables: Earthquake and Alarm.
– P(B | j, m) = α P(B, j, m) = α Σe Σa P(B, e, a, j, m)
– Add four numbers, each composed of 5 products.
A network with n Booleans requires O(n·2^n) computations!
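The four-terms-of-five-products sum can be coded directly. The CPT values are the standard textbook (AIMA) burglary-network numbers, assumed here since the slides' tables aren't in the transcript; with them, enumeration gives P(b | j, m) ≈ 0.284.

```python
import itertools

# Enumeration: P(B | j, m) = alpha * sum_e sum_a P(B) P(e) P(a|B,e) P(j|a) P(m|a)
# CPTs assumed from the standard AIMA burglary network.
P_b, P_e = 0.001, 0.002
P_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(Alarm | B, E)
P_j = {True: 0.90, False: 0.05}  # P(JohnCalls=true | Alarm)
P_m = {True: 0.70, False: 0.01}  # P(MaryCalls=true | Alarm)

def unnormalized(b):
    total = 0.0
    for e, a in itertools.product([True, False], repeat=2):
        pb = P_b if b else 1 - P_b
        pe = P_e if e else 1 - P_e
        pa = P_a[(b, e)] if a else 1 - P_a[(b, e)]
        total += pb * pe * pa * P_j[a] * P_m[a]  # four terms, five factors each
    return total

num = unnormalized(True)
p_burglary = num / (num + unnormalized(False))  # alpha normalizes the two values
print(round(p_burglary, 3))  # ≈ 0.284
```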

38 Variable elimination algorithm
There are ways to reduce computation costs:
– Move variables outside of the summation.
– Use dynamic programming to store work you’ve done for future use.
Will this always help?

39 Complexity of exact inference
Polytree (or singly connected): there is at most one undirected path between any two nodes.
– Time and space complexity is linear in the size of the network.
Multiply connected networks:
– Can have exponential costs.
– In practice, people cluster nodes of the network to make it a polytree.

40 Solving an instance more quickly
Independence: use conditional independence of b.

41 Solving an instance more quickly
Set up the use of independence; apply conditional independence.

42 Solving an instance more quickly
Move the summation inwards; use absolute independence.

43 Solving an instance more quickly
Do some additional reorganization; reduced to four terms to sum over.

44


