Representations for KBS: Uncertainty & Decision Support

Representations for KBS: Uncertainty & Decision Support
6.871 - Knowledge Based Systems
Tuesday, March 20, 2006
Howard Shrobe, Randall Davis

Outline

  The other problem with Mycin
  Brief review of the history of uncertainty in AI
  Bayes' theorem
  Some tractable Bayesian situations
  Bayes nets
  Decision theory and rational choice

The Other Problem with Mycin

In an earlier class we argued that Mycin used an extremely impoverished language for stating facts and rules (A-O-V triples). Here we argue that its notion of uncertainty was also broken. In Mycin the certainty factor for a disjunction is the max:

  CF(OR A B) = max(CF(A), CF(B))

Consider:
  Rule-1: If A then C, certainty factor 1
  Rule-2: If B then C, certainty factor 1

This is logically the same as the single rule:
  If (OR A B) then C, certainty factor 1

More Problems

Suppose CF(A) = .8 and CF(B) = .3. With the two separate rules, parallel combination gives

  CF(C) = .8 + .3 · (1 - .8) = .8 + .06 = .86

With the single OR rule,

  CF(OR A B) = max(.8, .3) = .8, so CF(C) = .8

The same evidence yields different certainties depending on how the rules happen to be written.

A rule chain such as A -> B, A -> C, B -> D, C -> D also produces a mistake (why?): evidence from A reaches D along two paths and gets counted twice.
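The inconsistency on this slide is easy to check numerically; a minimal sketch (variable names are mine, the numbers and formulas are the slide's):

```python
cf_a, cf_b = 0.8, 0.3

# Two separate rules "If A then C (1)" and "If B then C (1)",
# combined with MYCIN's parallel-combination formula:
cf_two_rules = cf_a + cf_b * (1 - cf_a)   # .8 + .3*(1 - .8) = .86

# One logically equivalent rule "If (OR A B) then C (1)",
# where CF(OR A B) = max(CF(A), CF(B)):
cf_or_rule = max(cf_a, cf_b)              # .8

# Same evidence, same logical content, different certainties.
print(round(cf_two_rules, 2), round(cf_or_rule, 2))
```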

History of Uncertainty Representations

  Probability tried and rejected: too many numbers.
  Focus on logical, qualitative approaches: reasoning by cases, non-monotonic reasoning.
  Numerical approaches retried:
    Certainty factors
    Dempster-Shafer
    Fuzzy logic
    Bayes networks

Understanding Bayes' Theorem

Imagine 1000 people, 10 of whom have cancer and 990 of whom don't; the test is positive 95% of the time when the disease is present, and 5% of the time when it is absent:

  Has it and tests positive:          10 · .95 = 9.5
  Has it and doesn't test positive:   10 · .05 = 0.5
  Doesn't have it but tests positive: 990 · .05 = 49.5
  Doesn't have it, tests negative:    990 · .95 = 940.5

Number that test positive = 9.5 + 49.5 = 59. If you test positive, your probability of having cancer is 9.5 / 59 = 16.1%.
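The slide's arithmetic can be reproduced directly; a minimal sketch of the screening numbers:

```python
# Screening numbers from the slide: 10 of 1000 have the disease,
# the test is right 95% of the time in both directions.
has_disease = 10
healthy = 990

true_positives = has_disease * 0.95    # 9.5
false_positives = healthy * 0.05       # 49.5

# P(disease | positive test) = true positives / all positives
p_disease_given_positive = true_positives / (true_positives + false_positives)
print(round(p_disease_given_positive, 3))  # 0.161
```

Note how the large healthy population swamps the test's accuracy: most positives are false positives.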

Reviewing Bayes' Theorem

[Venn diagram: disease D and symptom S as overlapping regions of the universe U; the conditional probability of S given D is the fraction of the D region that lies inside S.]

  P(D|S) = P(S|D) · P(D) / P(S)

Independence & Conditional Independence

Independence: P(A) · P(B) = P(A&B). A varies the same within B as it does in the universe.

Conditional independence within C: P(A|C) · P(B|C) = P(A&B|C). When we restrict attention to C, A and B are independent.

Examples

[Diagrams of four cases:]
  A and B are independent.
  A' and B are dependent.
  A and B are conditionally dependent, given C.
  A' and B are conditionally independent, given C.

"Idiot" Bayes Model

A single disease with conditionally independent symptoms S1 ... Sk:

  P(S1,S2|D) = P(S1|D) · P(S2|D)

N symptoms means N conditional probabilities. Without conditional independence you need the joint probabilities: 2^N numbers.

Using Multiple Pieces of Evidence

  P(H|E1,E2) = P(E1,E2|H) · P(H) / P(E1,E2)

If you assume conditional independence between the pieces of evidence, this takes on a nice multiplicative form. Conditional independence is the notion that the various pieces of evidence are statistically independent of one another given that the hypothesis obtains, i.e. the hypothesis "separates" the different pieces of evidence:

  P(E1,E2|H) = P(E1|H) · P(E2|H)
  P(~E1,~E2|H) = P(~E1|H) · P(~E2|H)

Without conditional independence you need to build up a very large database of joint probabilities and joint conditional probabilities.
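Under the conditional-independence assumption, the posterior is computed by multiplying per-evidence likelihoods and normalizing; a sketch with illustrative numbers (none of them come from the slides):

```python
# Illustrative prior and per-evidence likelihoods for H and ~H.
p_h = 0.2                        # prior P(H)
p_e1_h, p_e1_not_h = 0.9, 0.3    # P(E1|H), P(E1|~H)
p_e2_h, p_e2_not_h = 0.8, 0.4    # P(E2|H), P(E2|~H)

# Numerators of Bayes' rule; P(E1,E2|H) factors by conditional independence.
num_h = p_e1_h * p_e2_h * p_h
num_not_h = p_e1_not_h * p_e2_not_h * (1 - p_h)

# Normalizing over H and ~H stands in for dividing by P(E1,E2).
p_h_given_e = num_h / (num_h + num_not_h)
print(round(p_h_given_e, 3))  # 0.6
```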

Sequential Bayesian Inference

Consider symptoms one by one:
  Start with prior probabilities P(Di).
  Observe a symptom Sj.
  Update the priors using Bayes' rule.
  Repeat for other symptoms, using the resulting posterior as the new prior.

If the symptoms are conditionally independent, this gives the same answer as conditioning on them all at once. It also allows choosing which symptom to observe (which test to perform) next in terms of cost/benefit.
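The equivalence between sequential and batch updating is easy to verify; a sketch with illustrative priors and likelihoods (names and numbers are mine):

```python
# Two diseases, two conditionally independent symptoms; numbers illustrative.
priors = {"D1": 0.5, "D2": 0.5}
likelihoods = {                     # P(Sj | Di)
    "S1": {"D1": 0.8, "D2": 0.2},
    "S2": {"D1": 0.6, "D2": 0.4},
}

def update(prior, symptom):
    """One step of Bayes' rule: multiply by the likelihood, renormalize."""
    unnorm = {d: prior[d] * likelihoods[symptom][d] for d in prior}
    z = sum(unnorm.values())
    return {d: v / z for d, v in unnorm.items()}

# Sequential: fold in S1, then use the posterior as the prior for S2.
post = update(update(priors, "S1"), "S2")

# Batch: condition on both symptoms at once.
unnorm = {d: priors[d] * likelihoods["S1"][d] * likelihoods["S2"][d]
          for d in priors}
z = sum(unnorm.values())
batch = {d: v / z for d, v in unnorm.items()}
```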

Bipartite Graphs

Multiple symptoms, multiple diseases:
  Diseases are probabilistically independent.
  Symptoms are conditionally independent.
  Symptom probabilities depend only on the diseases causing them.
  Symptoms with multiple causes require joint probabilities, e.g. P(S2|D1,D2,D3): an information explosion.

[Diagram: diseases D1, D2, D3 above symptoms S1, S2, S3, S4, with an edge from each disease to the symptoms it can cause.]

Noisy OR

A useful element in the modeling vocabulary. Make the simplifying assumption that each disease has an independent chance of causing the symptom; then the symptom is absent only if no disease caused it:

  1 - P(S2|D1,D2,D3) = (1 - P(S2|D1)) · (1 - P(S2|D2)) · (1 - P(S2|D3))

where each P(S2|Di) is the causal probability that Di produces S2 with all other diseases absent. Use these causal probabilities as the basic data. This reduces the probability table size: with n diseases and k symptoms, from k·2^n entries to n·k.
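The noisy-OR formula above can be sketched in a few lines (the causal probabilities are illustrative):

```python
def noisy_or(causal_probs):
    """P(symptom | all listed diseases present), noisy-OR style:
    the symptom is absent only if every disease fails to cause it."""
    p_absent = 1.0
    for c in causal_probs:
        p_absent *= (1.0 - c)
    return 1.0 - p_absent

# Illustrative causal probabilities P(S|Di alone) for three diseases:
p_symptom = noisy_or([0.5, 0.3, 0.2])   # 1 - .5*.7*.8 = 0.72
```

A single cause reduces to its causal probability, as it should: `noisy_or([0.5])` is `0.5`.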

Polytrees

What if diseases cause or influence each other? Are there still well-behaved versions? Yes, polytrees: graphs with at most one path between any two nodes. You don't have to worry about double-counting, and efficient sequential updating is still possible.

[Diagram: a polytree over D1, D2, D3 and S1 ... S5.]

Bayes Nets

Directed acyclic graphs; the absence of a link expresses conditional independence:

  P(X1,...,Xn) = Product over i of P(Xi | parents(Xi))

Specify, for each node, a conditional probability table over its parents. For the net with edges A -> B, A -> C, B -> D, C -> D, C -> E:

Probability A, B, C, D, E all present:
  P(A,B,C,D,E) = P(A) · P(B|A) · P(C|A) · P(D|B,C) · P(E|C)
Probability A, C, D present and B, E absent:
  P(A,~B,C,D,~E) = P(A) · P(~B|A) · P(C|A) · P(D|~B,C) · P(~E|C)
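The factorization can be computed directly from per-node tables; a sketch for the five-node net with illustrative CPT numbers (none come from the slides):

```python
# Illustrative CPTs for A -> {B, C}, {B, C} -> D, C -> E.
p_a = 0.1
p_b_given_a = {True: 0.7, False: 0.2}
p_c_given_a = {True: 0.6, False: 0.3}
p_d_given_bc = {(True, True): 0.9, (True, False): 0.5,
                (False, True): 0.4, (False, False): 0.1}
p_e_given_c = {True: 0.8, False: 0.2}

def joint(a, b, c, d, e):
    """Joint probability as the product of each node's CPT entry."""
    def p(prob, present):
        return prob if present else 1.0 - prob
    return (p(p_a, a) * p(p_b_given_a[a], b) * p(p_c_given_a[a], c)
            * p(p_d_given_bc[(b, c)], d) * p(p_e_given_c[c], e))

all_present = joint(True, True, True, True, True)      # P(A,B,C,D,E)
b_e_absent = joint(True, False, True, True, False)     # P(A,~B,C,D,~E)
```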

Example

Burglary -> Alarm <- Earthquake; Earthquake -> Radio Report; Alarm -> Phone Call.

  P(Call|Alarm)     Alarm=t  Alarm=f
    Call=t            .9       .01
    Call=f            .1       .99

  P(RadioReport|Earthquake)   E=t   E=f
    Report=t                    1     0
    Report=f                    0     1

  P(Alarm|B,E)    t,t   t,f   f,t   f,f
    Alarm=t       .8    .99   .6    .01
    Alarm=f       .2    .01   .4    .99

16 vs. 32 probabilities.

Computing with Partial Information

Probability that A is present and E absent: sum the joint over B, C, D. Because C separates A from E, the sums over B and D collapse to 1 and

  P(A,~E) = P(A) · Sum over C of P(C|A) · P(~E|C)

Graph separators (e.g. C) correspond to factorizations. The general problem of finding separators is NP-hard.
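The separator shortcut can be checked against brute-force summation; a sketch reusing illustrative CPT numbers (the numbers are mine, not the slides'):

```python
from itertools import product

# Illustrative CPTs for A -> {B, C}, {B, C} -> D, C -> E, with A present.
p_a = 0.1
p_b_given_a = 0.7                 # P(B|A)
p_c_given_a = 0.6                 # P(C|A)
p_d_given_bc = {(True, True): 0.9, (True, False): 0.5,
                (False, True): 0.4, (False, False): 0.1}
p_e_given_c = {True: 0.8, False: 0.2}

def p(prob, present):
    return prob if present else 1.0 - prob

# Brute force: sum the full joint over B, C, D with A present, E absent.
brute = sum(
    p_a * p(p_b_given_a, b) * p(p_c_given_a, c)
    * p(p_d_given_bc[(b, c)], d) * p(p_e_given_c[c], False)
    for b, c, d in product([True, False], repeat=3)
)

# Separator shortcut: B and D sum out, only C matters.
shortcut = p_a * sum(
    p(p_c_given_a, c) * p(p_e_given_c[c], False)
    for c in [True, False]
)
```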

Odds Likelihood Formulation

Define the odds as O(H) = P(H) / P(~H).
Define the likelihood ratio as L(E|H) = P(E|H) / P(E|~H).
Dividing the two complementary instances of Bayes' rule,

  P(H|E) = P(E|H) · P(H) / P(E)
  P(~H|E) = P(E|~H) · P(~H) / P(E)

gives

  O(H|E) = L(E|H) · O(H)

In logarithmic form: log posterior odds = log prior odds + log likelihood.
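The odds-likelihood update can be checked against direct application of Bayes' rule; a sketch with illustrative numbers:

```python
import math

def odds(p):
    return p / (1.0 - p)

def prob(o):
    return o / (1.0 + o)

# Illustrative prior and likelihoods.
p_h = 0.2
p_e_given_h, p_e_given_not_h = 0.9, 0.3

likelihood_ratio = p_e_given_h / p_e_given_not_h   # L(E|H) = 3.0
posterior_odds = likelihood_ratio * odds(p_h)      # 3.0 * 0.25 = 0.75
p_h_given_e = prob(posterior_odds)                 # back to a probability

# Log form: each piece of evidence adds its log likelihood.
log_posterior_odds = math.log(odds(p_h)) + math.log(likelihood_ratio)

# Direct Bayes' rule for comparison.
direct = (p_e_given_h * p_h) / (p_e_given_h * p_h
                                + p_e_given_not_h * (1 - p_h))
```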

Certainty Factor Rules

Parallel combination: two rules supporting the same conclusion,
  If A, then C (x)
  If B, then C (y)
combine as
  CF(C) = x + y - xy                       if x, y > 0
  CF(C) = x + y + xy                       if x, y < 0
  CF(C) = (x + y) / (1 - min(|x|, |y|))    otherwise

Series combination: chaining through a rule
  If C, then D (z)
gives
  CF(D) = z · max(0, CF(C))
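The combination rules on this slide can be sketched as follows (the function names are mine):

```python
def cf_parallel(x, y):
    """Combine two rules that support the same conclusion."""
    if x > 0 and y > 0:
        return x + y - x * y
    if x < 0 and y < 0:
        return x + y + x * y
    return (x + y) / (1 - min(abs(x), abs(y)))   # mixed signs

def cf_series(z, cf_premise):
    """Propagate through a chained rule of strength z; negative
    premise certainty contributes nothing."""
    return z * max(0.0, cf_premise)

cf_c = cf_parallel(0.6, 0.5)   # 0.6 + 0.5 - 0.3 = 0.8
cf_d = cf_series(0.7, cf_c)    # 0.7 * 0.8 = 0.56
```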

Issues with Certainty Factors

In some cases the results obtained depend on the order in which evidence is considered. Reasoning is often fairly insensitive to the exact values: 20% variations yield no change in MYCIN's behavior. And what do the numbers mean? In some cases the answer is a conditional probability, a likelihood, or a genuine certainty factor.

Decision Making

So far, what we've considered is how to use evidence to evaluate a situation. In many cases this is only the first part of the problem: what we want to do is take actions to improve the situation. But which action should we take? The one most likely to leave us in the best condition. Decision analysis helps us calculate which action that is.

A Decision Making Problem

There are two types of urn, U1 and U2 (80% are U1):
  U1 contains 4 red balls and 6 black balls.
  U2 contains 9 red balls and 1 black ball.
An urn is selected at random and you are to guess which type it is. You have several courses of action:
  Refuse to play: no payoff, no cost.
  Guess it is of type 1: $40 payoff if right, $20 penalty if wrong.
  Guess it is of type 2: $100 payoff if right, $5 penalty if wrong.
  Sample a ball: $8 payment for the right to sample before guessing.

Decision Flow Diagrams

[Decision tree: squares are decision forks, circles are chance forks. At the root you choose among refusing to play ($0.00), guessing a1 or a2 immediately, or paying $8.00 to make an observation. The observation branch splits on Red or Black, after each of which you again choose a1 or a2. At every leaf, a1 pays $40.00 if the urn is U1 and -$20.00 if U2; a2 pays $100.00 if U2 and -$5.00 if U1.]

Expected Monetary Value

Suppose there are several possible outcomes, each with a monetary payoff (or penalty) and a probability. The Expected Monetary Value (EMV) is the sum of the payoffs weighted by their probabilities:

  EMV = .8 · $40 + .2 · (-$20) = $32 - $4 = $28

EMV is a normative notion of what a person with no other biases (e.g. risk aversion) should be willing to accept in exchange for the situation facing him: in the example above, you should be indifferent between taking $28 and playing the game. Most people have some extra biases, and these can be incorporated in the form of a utility function applied to the calculated value. A rational person should choose the course of action with the highest EMV.
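EMV is a one-line computation; a sketch using the urn game's payoffs and the .8/.2 prior:

```python
def emv(outcomes):
    """Probability-weighted sum over (probability, payoff) pairs."""
    return sum(p * payoff for p, payoff in outcomes)

# The three immediate actions of the urn game, before any observation:
a1 = emv([(0.8, 40), (0.2, -20)])   # guess U1: .8*40 + .2*(-20) = 28
a2 = emv([(0.8, -5), (0.2, 100)])   # guess U2: .8*(-5) + .2*100 = 16
a3 = 0.0                            # refuse to play

best = max(a1, a2, a3)              # guessing U1 dominates, at $28
```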

Averaging Out and Folding Back

The EMV of a decision node is the max over all branches; the EMV of a chance node is the probability-weighted sum over all branches.

  Action:      A1 (guess U1)   A2 (guess U2)   A3 (refuse)   Probability
  State U1          40              -5               0            .8
  State U2         -20             100               0            .2
  EMV               28              16               0

Without observation, the best action is A1, worth $28.00 (= $32.00 - $4.00).

The Effect of Observation

Bayes' theorem is used to calculate the probabilities at chance nodes that follow decision nodes providing relevant evidence. For the Red outcome, for example:

  P(R) = P(R|U1) · P(U1) + P(R|U2) · P(U2)
  P(U1|R) = P(R|U1) · P(U1) / P(R)

Calculating the Updated Probabilities

Initial probabilities P(Outcome|State), with priors P(U1) = .8, P(U2) = .2:
  Outcome    U1    U2
  Red        .4    .9
  Black      .6    .1

Joint and marginal probabilities P(Outcome & State):
  Outcome    U1             U2             Marginal
  Red        .8 · .4 = .32  .2 · .9 = .18    .50
  Black      .8 · .6 = .48  .2 · .1 = .02    .50

Updated probabilities P(State|Outcome):
  Outcome    U1     U2
  Red        .64    .36
  Black      .96    .04
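The tables above can be reproduced in a few lines; a sketch using the slide's numbers:

```python
priors = {"U1": 0.8, "U2": 0.2}
p_red = {"U1": 0.4, "U2": 0.9}   # P(Red | urn); P(Black | urn) = 1 - P(Red | urn)

def posterior(outcome_probs):
    """Joint = prior * likelihood; normalize by the marginal of the outcome."""
    joint = {u: priors[u] * outcome_probs[u] for u in priors}
    marginal = sum(joint.values())
    return {u: joint[u] / marginal for u in joint}, marginal

post_red, p_r = posterior(p_red)
post_black, p_b = posterior({u: 1 - p_red[u] for u in priors})

print(round(p_r, 2), round(post_red["U1"], 2))    # 0.5 0.64
print(round(p_b, 2), round(post_black["U1"], 2))  # 0.5 0.96
```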

Illustrating Evaluation

After observing Red (P(U1|R) = .64, P(U2|R) = .36):
  EMV(a1) = .64 · 40 + .36 · (-20) = 25.60 - 7.20 = 18.40
  EMV(a2) = .64 · (-5) + .36 · 100 = -3.20 + 36.00 = 32.80   <- choose a2
After observing Black (P(U1|B) = .96, P(U2|B) = .04):
  EMV(a1) = .96 · 40 + .04 · (-20) = 38.40 - 0.80 = 37.60    <- choose a1
  EMV(a2) = .96 · (-5) + .04 · 100 = -4.80 + 4.00 = -0.80
Each outcome has probability .5, so the observation branch is worth
  .5 · 32.80 + .5 · 37.60 = 16.40 + 18.80 = 35.20
less the $8 sampling fee: 35.20 - 8 = 27.20.
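Folding back the observation branch with these posteriors reproduces the slide's $27.20; a sketch:

```python
def emv(outcomes):
    return sum(p * v for p, v in outcomes)

# After seeing Red (P(U1|R) = .64): take the better of the two guesses.
after_red = max(emv([(0.64, 40), (0.36, -20)]),    # a1: 18.40
                emv([(0.64, -5), (0.36, 100)]))    # a2: 32.80
# After seeing Black (P(U1|B) = .96):
after_black = max(emv([(0.96, 40), (0.04, -20)]),  # a1: 37.60
                  emv([(0.96, -5), (0.04, 100)]))  # a2: -0.80

# Average over the equally likely outcomes, less the $8 sampling fee.
value_of_observing = 0.5 * after_red + 0.5 * after_black - 8
print(round(value_of_observing, 2))  # 27.2
```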

Final Value of the Decision Flow Diagram

Folding everything back: the observation branch is worth $27.20, while guessing a1 with no observation is worth $28.00 (and refusing to play is worth $0.00). Since $28.00 > $27.20, the optimal policy is not to sample and simply to guess that the urn is of type U1, for an overall EMV of $28.00.

Maximum Entropy

Suppose there are several competing hypotheses, each with a probability rating (e.g. .1, .2, .5, .2), and several tests you can make. Each test can change the probability of some (or all) of the hypotheses (using Bayes' theorem), and each outcome of a test has a probability. We're only interested in gathering information at this point. Which test should you make?

  Entropy = - Sum over i of P(i) · log2 P(i)

is a standard measure of uncertainty. For each outcome of a test, calculate the change in entropy; weight it by the probability of that outcome; sum these to get an expected change of entropy for the test.

Maximum Entropy (2)

Choose the test with the greatest expected reduction in entropy. This is equivalent to choosing the test that is most likely to provide the most information. Tests have different costs (sometimes quite drastic ones, like life and death), so normalize the benefits by the costs and then make the choice.
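The test-selection criterion can be sketched as follows. The entropy function is standard; the test's outcomes and their posterior distributions are illustrative (chosen so the outcome-weighted posteriors average back to the prior, as Bayes' theorem requires):

```python
import math

def entropy(dist):
    """Shannon entropy in bits: -sum p * log2 p."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

# Current beliefs over four hypotheses (the slide's example numbers).
prior = [0.1, 0.2, 0.5, 0.2]

# A hypothetical two-outcome test: each outcome's probability, and the
# posterior over hypotheses it would yield (assumed precomputed via Bayes).
outcomes = [
    (0.6, [0.05, 0.10, 0.75, 0.10]),
    (0.4, [0.175, 0.35, 0.125, 0.35]),
]

# Expected posterior entropy, weighted by outcome probability.
expected_posterior_entropy = sum(p * entropy(dist) for p, dist in outcomes)
expected_information_gain = entropy(prior) - expected_posterior_entropy
```

Among several candidate tests, you would pick the one with the largest `expected_information_gain` (or the largest gain per unit cost, once costs are factored in).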