Uncertainty Russell and Norvig: Chapter 14, 15 Koller article on BNs CMCS424 Spring 2002 April 23.



Uncertain Agent
[Figure: an agent interacting with its environment, with question marks over its sensors, actuators, internal model, and the environment state]

An Old Problem …


Types of Uncertainty
Uncertainty in prior knowledge
E.g., some causes of a disease are unknown and are not represented in the background knowledge of a medical-assistant agent
Uncertainty in actions
E.g., actions are represented with relatively short lists of preconditions, while these lists are in fact arbitrarily long. For example, to drive my car in the morning:
- It must not have been stolen during the night
- It must not have flat tires
- There must be gas in the tank
- The battery must not be dead
- The ignition must work
- I must not have lost the car keys
- No truck should obstruct the driveway
- I must not have suddenly become blind or paralytic
- Etc…
Not only is it impossible to list all such preconditions; even trying to do so would be inefficient.


Types of Uncertainty
Uncertainty in prior knowledge
E.g., some causes of a disease are unknown and are not represented in the background knowledge of a medical-assistant agent
Uncertainty in actions
E.g., actions are represented with relatively short lists of preconditions, while these lists are in fact arbitrarily long
Uncertainty in perception
E.g., sensors do not return exact or complete information about the world; a robot never knows exactly its position (Courtesy R. Chatila)
Sources of uncertainty:
1. Ignorance
2. Laziness (efficiency?)
What we call uncertainty is a summary of all that is not explicitly taken into account in the agent's KB

Questions How to represent uncertainty in knowledge? How to perform inferences with uncertain knowledge? Which action to choose under uncertainty?

How do we deal with uncertainty?
Implicit:
- Ignore what you are uncertain of when you can
- Build procedures that are robust to uncertainty
Explicit:
- Build a model of the world that describes uncertainty about its state, dynamics, and observations
- Reason about the effect of actions given the model

Handling Uncertainty
Approaches:
1. Default reasoning
2. Worst-case reasoning
3. Probabilistic reasoning

Default Reasoning
Creed: the world is fairly normal. Abnormalities are rare
So, an agent assumes normality until there is evidence to the contrary
E.g., if an agent sees a bird x, it assumes that x can fly, unless it has evidence that x is a penguin, an ostrich, a dead bird, a bird with broken wings, …

Representation in Logic
BIRD(x) ∧ ¬AB_F(x) ⇒ FLIES(x)
PENGUIN(x) ⇒ AB_F(x)
BROKEN-WINGS(x) ⇒ AB_F(x)
BIRD(Tweety)
…
Default rule: Unless AB_F(Tweety) can be proven True, assume it is False
But what to do if several defaults are contradictory? Which ones to keep? Which ones to reject?
Very active research field in the 80's ⇒ Non-monotonic logics: defaults, circumscription, closed-world assumptions
Applications to databases
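The default rule above can be sketched in Python. The closed-world treatment (assume AB_F false unless provable) is modeled here by simple set membership; the individuals and facts are invented for illustration:

```python
# Default reasoning sketch: BIRD(x) and not-provable AB_F(x) => FLIES(x).
# Closed-world assumption: AB_F(x) holds only if x is in the set of
# provably-abnormal individuals.
birds = {"Tweety", "Opus"}   # BIRD(x) facts
abnormal = {"Opus"}          # AB_F(x) provable, e.g., because PENGUIN(Opus)

def flies(x):
    """Conclude FLIES(x) by default, unless AB_F(x) can be proven."""
    return x in birds and x not in abnormal

# FLIES(Tweety) is concluded by default; FLIES(Opus) is blocked.
```

Retracting a conclusion when a new abnormality fact arrives (adding to `abnormal`) is exactly what makes this reasoning non-monotonic.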

Worst-Case Reasoning
Creed: just the opposite! The world is ruled by Murphy's Law
Uncertainty is defined by sets, e.g., the set of possible outcomes of an action, or the set of possible positions of a robot
The agent assumes the worst case, and chooses the action that maximizes a utility function in this case
Example: adversarial search

Probabilistic Reasoning Creed: The world is not divided between “normal” and “abnormal”, nor is it adversarial. Possible situations have various likelihoods (probabilities) The agent has probabilistic beliefs – pieces of knowledge with associated probabilities (strengths) – and chooses its actions to maximize the expected value of some utility function

How do we represent Uncertainty?
We need to answer several questions:
What do we represent & how do we represent it? What language do we use to represent our uncertainty? What are the semantics of our representation?
What can we do with the representations? What queries can be answered? How do we answer them?
How do we construct a representation? Can we ask an expert? Can we learn from data?

Target Tracking Example
Maximization of worst-case value of utility vs. of expected value of utility
[Figure: robot pursuing a target]
Utility = escape time of target

Probability A well-known and well-understood framework for uncertainty Clear semantics Provides principled answers for: Combining evidence Predictive & Diagnostic reasoning Incorporation of new evidence Intuitive (at some level) to human experts Can be learned

Notion of Probability
The probability of a proposition A is a real number P(A) between 0 and 1
Axioms of probability:
P(True) = 1 and P(False) = 0
P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
Example: you drive on Rt 1 to UMD often, and you notice that 70% of the time there is a traffic slowdown at the intersection of Paint Branch & Rt 1. The next time you plan to drive on Rt 1, you will believe that the proposition "there is a slowdown at the intersection of PB & Rt 1" is True with probability 0.7
From the axioms:
P(A ∨ ¬A) = P(A) + P(¬A) - P(A ∧ ¬A)
P(True) = P(A) + P(¬A) - P(False)
1 = P(A) + P(¬A)
So: P(A) = 1 - P(¬A)

Frequency Interpretation
Draw a ball from a bag containing n balls of the same size, r red and s yellow. The probability that the proposition A = "the ball is red" is true corresponds to the relative frequency with which we expect to draw a red ball ⇒ P(A) = r/n
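A quick simulation illustrates the frequency interpretation; the bag sizes (r = 3, s = 7) are illustrative choices, and the observed fraction of red draws approaches r/n:

```python
import random

# Bag with r red and s yellow balls; estimate P("ball is red") by
# repeated draws with replacement and counting relative frequency.
r, s = 3, 7
bag = ["red"] * r + ["yellow"] * s
n = r + s

random.seed(0)                 # fixed seed for reproducibility
draws = 100_000
reds = sum(random.choice(bag) == "red" for _ in range(draws))
freq = reds / draws            # relative frequency, close to r/n = 0.3
```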

Subjective Interpretation There are many situations in which there is no objective frequency interpretation: On a windy day, just before paragliding from the top of El Capitan, you say “there is probability 0.05 that I am going to die” You have worked hard on your AI class and you believe that the probability that you will get an A is 0.9

Random Variables
A proposition that takes the value True with probability p and False with probability 1-p is a random variable with distribution (p, 1-p)
If a bag contains balls having 3 possible colors – red, yellow, and blue – the color of a ball picked at random from the bag is a random variable with 3 possible values
The (probability) distribution of a random variable X with n values x1, x2, …, xn is (p1, p2, …, pn) with P(X = xi) = pi and Σi=1,…,n pi = 1

Expected Value
Random variable X with n values x1, …, xn and distribution (p1, …, pn)
E.g., X is the state reached after doing an action A under uncertainty
Function U of X
E.g., U is the utility of a state
The expected value of U after doing A is E[U] = Σi=1,…,n pi U(xi)
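The expected-value formula is a one-line sum in code; the outcome states, probabilities, and utilities below are invented for illustration:

```python
# E[U] = sum_i p_i * U(x_i) for an action A with uncertain outcome X.
# States, probabilities, and utilities are illustrative assumptions.
outcomes = ["success", "delay", "failure"]
p = [0.7, 0.2, 0.1]                                # distribution of X
U = {"success": 100, "delay": 10, "failure": -50}  # utility of each state

expected_U = sum(p_i * U[x] for p_i, x in zip(p, outcomes))
# 0.7*100 + 0.2*10 + 0.1*(-50) = 67
```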

Joint Distribution
k random variables X1, …, Xk
The joint distribution of these variables is a table in which each entry gives the probability of one combination of values of X1, …, Xk
Example (entries are of the form P(Cavity ∧ Toothache), P(¬Cavity ∧ Toothache), etc.):
            Toothache   ¬Toothache
  Cavity      0.04        0.06
  ¬Cavity     0.01        0.89

Joint Distribution Says It All
P(Toothache) = P((Toothache ∧ Cavity) ∨ (Toothache ∧ ¬Cavity))
= P(Toothache ∧ Cavity) + P(Toothache ∧ ¬Cavity)
= 0.04 + 0.01 = 0.05
P(Toothache ∨ Cavity) = P((Toothache ∧ Cavity) ∨ (Toothache ∧ ¬Cavity) ∨ (¬Toothache ∧ Cavity))
= 0.04 + 0.01 + 0.06 = 0.11
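These marginalizations can be checked mechanically. The joint entries here are the ones implied by the slide's arithmetic (P(Cavity ∧ Toothache) = 0.04, P(Toothache) = 0.05, P(Toothache ∨ Cavity) = 0.11):

```python
# Joint distribution over (Cavity, Toothache), entries reconstructed
# from the computations shown on the slides.
joint = {
    (True, True): 0.04,    # Cavity and Toothache
    (True, False): 0.06,   # Cavity, no Toothache
    (False, True): 0.01,   # no Cavity, Toothache
    (False, False): 0.89,  # neither
}

# Marginalization: sum out Cavity to get P(Toothache)
p_toothache = sum(p for (cav, tooth), p in joint.items() if tooth)

# Disjunction: sum the entries where Toothache or Cavity holds
p_tooth_or_cav = sum(p for (cav, tooth), p in joint.items() if tooth or cav)
```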

Conditional Probability
Definition: P(A|B) = P(A ∧ B) / P(B)
Read P(A|B): probability of A given B
Can also write this as: P(A ∧ B) = P(A|B) P(B), called the product rule

Example
P(Cavity|Toothache) = P(Cavity ∧ Toothache) / P(Toothache)
P(Cavity ∧ Toothache) = 0.04
P(Toothache) = 0.05
P(Cavity|Toothache) = 0.04/0.05 = 0.8

Generalization
P(A ∧ B ∧ C) = P(A|B,C) P(B|C) P(C)

Bayes’ Rule P(A  B) = P(A|B) P(B) = P(B|A) P(A) P(B|A) = P(A|B) P(B) P(A)

Example
[Slide: Bayes' rule applied to the Toothache/Cavity joint distribution table]

Generalization
P(A ∧ B ∧ C) = P(A ∧ B|C) P(C) = P(A|B,C) P(B|C) P(C)
P(A ∧ B ∧ C) = P(A ∧ B|C) P(C) = P(B|A,C) P(A|C) P(C)
P(B|A,C) = P(A|B,C) P(B|C) / P(A|C)

Representing Probability
Naïve representations of probability run into problems. Example: patients in a hospital are described by several attributes:
- Background: age, gender, history of diseases, …
- Symptoms: fever, blood pressure, headache, …
- Diseases: pneumonia, heart attack, …
A probability distribution needs to assign a number to each combination of values of these attributes
20 binary attributes already require 2^20 ≈ 10^6 numbers
Real examples usually involve hundreds of attributes

Practical Representation Key idea -- exploit regularities Here we focus on exploiting conditional independence properties

A Bayesian Network
The "ICU alarm" network: 37 variables, 509 parameters (instead of 2^37)
[Figure: the full alarm network; node names include PCWP, CO, HRBP, HREKG, HRSAT, VENTLUNG, INTUBATION, PULMEMBOLUS, HYPOVOLEMIA, CVP, BP, …]

Independent Random Variables
Two variables X and Y are independent if P(X = x|Y = y) = P(X = x) for all values x, y
That is, learning the value of Y does not change the prediction of X
If X and Y are independent, then P(X,Y) = P(X|Y) P(Y) = P(X) P(Y)
In general, if X1, …, Xn are independent, then P(X1, …, Xn) = P(X1) … P(Xn)
Requires O(n) parameters

Conditional Independence
Propositions A and B are independent iff: P(A|B) = P(A) ⇔ P(A ∧ B) = P(A) P(B)
A and B are independent given C iff: P(A|B,C) = P(A|C) ⇔ P(A ∧ B|C) = P(A|C) P(B|C)

Conditional Independence
Unfortunately, random variables of interest are not independent of each other
A more suitable notion is that of conditional independence
Two variables X and Y are conditionally independent given Z if P(X = x|Y = y, Z = z) = P(X = x|Z = z) for all values x, y, z
That is, learning the value of Y does not change the prediction of X once we know the value of Z
Notation: Ind(X; Y | Z)

Car Example
Three propositions: Gas, Battery, Starts
P(Battery|Gas) = P(Battery): Gas and Battery are independent
P(Battery|Gas, Starts) ≠ P(Battery|Starts): Gas and Battery are not independent given Starts

Example: Naïve Bayes Model
A common model in early diagnosis: symptoms are conditionally independent given the disease (or fault)
Thus, if X1, …, Xn denote the symptoms exhibited by the patient (headache, high fever, etc.) and H denotes the hypothesis about the patient's health, then P(X1, …, Xn, H) = P(H) P(X1|H) … P(Xn|H)
This naïve Bayesian model allows a compact representation
It does embody strong independence assumptions
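A sketch of the naïve Bayes factorization in Python; the hypotheses, symptoms, and all CPT numbers are invented for illustration:

```python
# P(X1..Xn, H) = P(H) * prod_i P(Xi | H) under the naive Bayes assumption.
p_h = {"flu": 0.1, "healthy": 0.9}             # prior P(H), assumed
p_sym_true = {                                 # P(symptom=True | H), assumed
    "headache": {"flu": 0.8, "healthy": 0.1},
    "fever":    {"flu": 0.7, "healthy": 0.05},
}

def joint(symptoms, h):
    """P(symptoms, H=h) using the factorized form."""
    p = p_h[h]
    for name, present in symptoms.items():
        q = p_sym_true[name][h]
        p *= q if present else 1.0 - q
    return p

obs = {"headache": True, "fever": True}
# Diagnosis: posterior P(H=flu | obs) by normalizing the joint over H
z = sum(joint(obs, h) for h in p_h)
posterior_flu = joint(obs, "flu") / z
```

With n binary symptoms, this model needs O(n) numbers per hypothesis instead of the 2^n entries of a full joint table, which is the compactness the slide refers to.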

Markov Assumption
We now make this independence assumption more precise for directed acyclic graphs (DAGs)
Each random variable X is independent of its non-descendants, given its parents Pa(X)
Formally, Ind(X; NonDesc(X) | Pa(X))
[Figure: a DAG illustrating ancestor, parent, descendant, and non-descendant nodes relative to X]

Markov Assumption Example
[Figure: Burglary → Alarm ← Earthquake; Earthquake → Radio; Alarm → Call]
In this example:
Ind(E; B)
Ind(B; E, R)
Ind(R; A, B, C | E)
Ind(A; R | B, E)
Ind(C; B, E, R | A)

I-Maps
A DAG G is an I-Map of a distribution P if all the Markov assumptions implied by G are satisfied by P
Examples: [Figure: example DAGs over variables X and Y]

Factorization
Given that G is an I-Map of P, can we simplify the representation of P?
Example: since Ind(X; Y), we have P(X|Y) = P(X)
Applying the chain rule: P(X,Y) = P(X|Y) P(Y) = P(X) P(Y)
Thus, we have a simpler representation of P(X,Y)

Factorization Theorem
Thm: if G is an I-Map of P, then P(X1, …, Xn) = Πi P(Xi | Pa(Xi))

Factorization Example
[Figure: Burglary → Alarm ← Earthquake; Earthquake → Radio; Alarm → Call]
P(C,A,R,E,B) = P(B) P(E|B) P(R|E,B) P(A|R,B,E) P(C|A,R,B,E)
versus
P(C,A,R,E,B) = P(B) P(E) P(R|E) P(A|B,E) P(C|A)

Consequences
We can write P in terms of "local" conditional probabilities
If G is sparse, that is, |Pa(Xi)| < k:
- each conditional probability can be specified compactly; e.g., for binary variables these require O(2^k) params
- the representation of P is compact: linear in the number of variables
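The parameter count is easy to verify for the burglary network above, assuming all variables are binary (a node with k parents then needs 2^k numbers, one P(X=1 | parent configuration) per row of its table):

```python
# Structure taken from the factorization P(B) P(E) P(R|E) P(A|B,E) P(C|A)
parents = {
    "Burglary": [],
    "Earthquake": [],
    "Radio": ["Earthquake"],
    "Alarm": ["Burglary", "Earthquake"],
    "Call": ["Alarm"],
}

# Each binary node with k parents needs 2**k parameters
bn_params = sum(2 ** len(ps) for ps in parents.values())   # 1+1+2+4+2 = 10
full_joint_params = 2 ** len(parents) - 1                  # 2**5 - 1 = 31
```

Even on five variables the factored form wins (10 vs. 31); the gap grows exponentially with n, which is the point of the slide's 509 vs. 2^37 comparison.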

Bayesian Networks
A Bayesian network specifies a probability distribution via two components:
- a DAG G
- a collection of conditional probability distributions P(Xi|Pai)
The joint distribution P is defined by the factorization P(X1, …, Xn) = Πi P(Xi|Pai)
Additional requirement: G is a minimal I-Map of P

Bayesian Networks
nodes = random variables
edges = direct probabilistic influence
Network structure encodes independence assumptions: XRay conditionally independent of Pneumonia given Infiltrates
[Figure: network over Pneumonia, Tuberculosis, Lung Infiltrates, XRay, Sputum Smear]

Bayesian Networks
[Figure: the same network, with the conditional probability table for P(I | P, T)]
Each node Xi has a conditional probability distribution P(Xi|Pai)
If variables are discrete, P is usually multinomial
P can be linear Gaussian, mixture of Gaussians, …

BN Semantics
Compact & natural representation: nodes have ≤ k parents ⇒ O(2^k · n) vs. 2^n params
conditional independencies in BN structure + local probability models = full joint distribution over domain

Queries
Full joint distribution specifies answer to any query: P(variable | evidence about others)
[Figure: the network with XRay and Sputum Smear observed as evidence]

BN Learning
BN models can be learned from empirical data:
- parameter estimation via numerical optimization
- structure learning via combinatorial search
BN hypothesis space is biased towards distributions with independence structure
[Figure: an Inducer takes Data and outputs a Bayesian network]

Questions How to represent uncertainty in knowledge? How to perform inferences with uncertain knowledge? Which action to choose under uncertainty? If a goal is terribly important, an agent may be better off choosing a less efficient, but less uncertain action than a more efficient one But if the goal is also extremely urgent, and the less uncertain action is deemed too slow, then the agent may take its chance with the faster, but more uncertain action

Summary Types of uncertainty Default/worst-case/probabilistic reasoning Probability Theory Bayesian Networks Making decisions under uncertainty Exciting Research Area!

References
Russell & Norvig, chapters 14, 15
Daphne Koller's BN notes, available from the class web page
Jean-Claude Latombe's excellent lecture notes, inter02/home.htm
Nir Friedman's excellent lecture notes

Questions
How to represent uncertainty in knowledge?
How to perform inferences with uncertain knowledge?
When a doctor receives lab analysis results for some patient, how do they change their prior knowledge about the health condition of this patient?

Example: Robot Navigation Uncertainty in control Courtesy S. Thrun (Control uncertainty cone)

Worst-Case Planning
[Figure: a robot plans a path to a power station under initial position uncertainty and a control uncertainty cone]

Target Tracking Example
1. Open-loop vs. closed-loop strategy
2. Off-line vs. on-line planning/reasoning
3. Maximization of worst-case value of utility vs. of expected value of utility
[Figure: robot and target]
Utility = escape time of the target