Probabilistic Calculus to the Rescue


Probabilistic Calculus to the Rescue
Suppose we know the likelihood of each of the (propositional) worlds (aka the joint probability distribution). Then we can use the standard rules of probability to compute the likelihood of any query (as I will remind you). So, the joint probability distribution is all that you ever need!
In the case of the Pearl example, we just need the joint probability distribution over B, E, A, J, M (32 numbers).
– In general, 2^n separate numbers (which should add up to 1).
If the joint distribution is sufficient for reasoning, what is domain knowledge supposed to help us with?
– Answer: Indirectly, by helping us specify the joint probability distribution with fewer than 2^n numbers.
– The local relations between propositions can be seen as "constraining" the form the joint probability distribution can take!
Burglary => Alarm
Earth-Quake => Alarm
Alarm => John-calls
Alarm => Mary-calls
Only 10 (instead of 32) numbers to specify!
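
As a concrete illustration of "the joint is all you need", here is a minimal sketch (mine, not from the slides) of answering an arbitrary query by summing entries of a full joint over B, E, A, J, M. The 32 numbers below are uniform placeholders, since the slide does not list the real values.

```python
from itertools import product

# Hypothetical full joint over the 5 Boolean variables of the Pearl burglary
# example: 2^5 = 32 numbers that must sum to 1. Uniform values are placeholders.
VARS = ["B", "E", "A", "J", "M"]
joint = {world: 1.0 / 32 for world in product([True, False], repeat=5)}

def prob(event):
    """P(event), where event maps a subset of VARS to required truth values."""
    total = 0.0
    for world, p in joint.items():
        assignment = dict(zip(VARS, world))
        if all(assignment[v] == val for v, val in event.items()):
            total += p
    return total

def conditional(query, evidence):
    """P(query | evidence) by enumeration over the joint."""
    return prob({**query, **evidence}) / prob(evidence)

# e.g. P(B = true | J = true, M = true)
print(conditional({"B": True}, {"J": True, "M": True}))
```

Every query walks over all 32 entries, which is why the slide's later point about O(2^n) inference and O(2^n) input numbers matters.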

Easy Special Cases
If there are no relations between the propositions (i.e., they can take values independently of each other):
– Then the joint probability distribution can be specified in terms of the probability of each proposition being true.
– Just n numbers instead of 2^n.
If, in addition, each proposition is equally likely to be true or false:
– Then the joint probability distribution can be specified without giving any numbers! All worlds are equally probable: if there are n propositions, each world has probability 1/2^n.
– The probability of any propositional conjunction with m (< n) propositions will be 1/2^m.

Will we always need 2^n numbers?
If all the variables are independent of each other, then
– P(x1, x2, …, xn) = P(xn) * P(xn-1) * … * P(x1)
– Need just n numbers!
– But if our world were that simple, it would also be very uninteresting and uncontrollable (nothing is correlated with anything else!)
We need 2^n numbers if every subset of our n variables can be correlated:
– P(x1, x2, …, xn) = P(xn | x1, …, xn-1) * P(xn-1 | x1, …, xn-2) * … * P(x1)
– But that is too pessimistic an assumption about the world. If our world were so interconnected, we would have been dead long ago…
A more realistic middle ground is that interactions between variables are confined to regions.
– e.g., the "school variables" and the "home variables" interact only loosely (are independent for most practical purposes).
– We will wind up needing O(2^k) numbers (k << n).
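
A quick parameter count (my own elaboration, not on the slide), written with the chain rule used above:

```latex
% Chain rule (always valid, no independence assumed):
P(x_1,\dots,x_n) \;=\; \prod_{i=1}^{n} P(x_i \mid x_1,\dots,x_{i-1})
% Full joint over n Boolean variables: 2^n entries (2^n - 1 free numbers).
% Full independence: P(x_1,\dots,x_n) = \prod_i P(x_i) needs only n numbers.
% "Regional" interactions: if each variable directly interacts with at most k
% others, each factor needs at most 2^k numbers, i.e. O(2^k) per variable
% (k \ll n), rather than 2^n overall.
```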

Comparison of three approaches to answering queries of type P(D | Evidence):

Directly using the Joint Distribution:
– Takes O(2^n) time for most natural queries of type P(D | Evidence)
– Needs O(2^n) probabilities as input
– Probabilities are of type P(w_k), where w_k is a world

Directly using Bayes rule:
– Can take much less than O(2^n) time for most natural queries of type P(D | Evidence)
– Still needs O(2^n) probabilities as input
– Probabilities are of type P(X_1 .. X_n | Y)

Using Bayes rule with Bayes nets:
– Can take much less than O(2^n) time for most natural queries of type P(D | Evidence)
– Can get by with anywhere between O(n) and O(2^n) probabilities, depending on the conditional independences that hold
– Probabilities are of type P(X_1 .. X_n | Y)

Prob. Prop. Logic: The Game Plan
We will review elementary "discrete variable" probability.
We will recall that the joint probability distribution is all we need to answer any probabilistic query over a set of discrete variables.
We will recognize that the hardest part here is not the cost of inference, which is really only O(2^n), no worse than (deterministic) propositional logic.
– Actually it is co-#P-complete (instead of co-NP-complete), and the former is believed to be harder than the latter.
The real problem is assessing probabilities.
– You could need as many as 2^n numbers (if all variables are dependent on all other variables), or just n numbers if each variable is independent of all other variables. Generally, you are likely to need somewhere between these two extremes.
– The challenge is to recognize the "conditional independences" between the variables, exploit them to get by with as few input probabilities as possible, and use the assessed probabilities to compute the probabilities of the user queries efficiently.

Two ways of specifying world knowledge
Extensional specification ("possible worlds"):
– [prop logic] Enumerate all worlds consistent with what you know (models of the KB).
– [prob logic] Provide the likelihood of all worlds given what you know.
Intensional (implicit) specification:
– [prop logic] Just state the local propositional constraints that you know (e.g., p => q, which means no world where p is true and q is false is a possible world).
– [prob logic] Just state the local probabilistic constraints that you know (e.g., P(q|p) = 0.99).
The local knowledge implicitly defines the extensional specification. Local knowledge acts as a constraint on the possible worlds.
– As you find out more about the world you live in, you eliminate possible worlds you could be in (or revise their likelihood).

Propositional Probabilistic Logic

If B=>A then P(A|B) = ? P(B|~A) = ? P(B|A) = ?
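
A worked answer (my own, not spelled out on the slide), assuming P(B) > 0 and P(~A) > 0:

```latex
% If B \Rightarrow A, every world satisfying B also satisfies A, so
P(A \mid B) = \frac{P(A \wedge B)}{P(B)} = \frac{P(B)}{P(B)} = 1
% By the contrapositive (\neg A \Rightarrow \neg B), no world satisfies B \wedge \neg A, so
P(B \mid \neg A) = \frac{P(B \wedge \neg A)}{P(\neg A)} = 0
% P(B \mid A) = P(B)/P(A) is NOT fixed by the implication alone;
% it can be anywhere in (0, 1] depending on the priors.
```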

CONDITIONAL PROBABILITIES
Non-monotonicity w.r.t. evidence: P(A|B) can be higher than, lower than, or equal to P(A).

Most useful probabilistic reasoning involves computing posterior distributions.
[Figure: distributions P(A), P(A|B=T), and P(A|B=T, C=False) plotted as probability vs. variable values]
Important: Computing a posterior distribution is inference, not learning.

If you know the full joint, you can answer ANY query.

& Marginalization

TA~TA CA ~CA P(CA & TA) = P(CA) = P(TA) = P(CA V TA) = P(CA|~TA) =

TA~TA CA ~CA P(CA & TA) = 0.04 P(CA) = = 0.1 (marginalizing over TA) P(TA) = = 0.05 P(CA V TA) = P(CA) + P(TA) – P(CA&TA) = = 0.11 P(CA|~TA) = P(CA&~TA)/P(~TA) = 0.06/( ) =.06/.95=.0631 Think of this as analogous to entailment by truth-table enumeration!

DNF form: (A V B) => (C & D) is equivalent to ~(A V B) V (C & D), i.e., [~A & ~B] V [C & D].

Problem:
– Need too many numbers…
– The needed numbers are harder to assess.
You can avoid assessing P(E=e) if you also assess P(E=e | ~Y): since P(Y | E=e) and P(~Y | E=e) must add up to 1, P(E=e) falls out by normalization.

Digression: Is finding the numbers the really hard "learning" problem?
We are making it sound as if assessing the probabilities is a big deal. In doing so, we are taking into account model acquisition/learning costs.
How come we didn't care about these issues in logical reasoning? Is it because acquiring logical knowledge is easy?
Actually, if we are writing programs for worlds that we (the humans) already live in, it is easy for us (humans) to add the logical knowledge into the program. It is a pain to give the probabilities.
On the other hand, if the agent is fully autonomous and is bootstrapping itself, then learning logical knowledge is actually harder than learning probabilities.
– For example, we will see that given the Bayes network topology (the "logic"), learning its CPTs is much easier than learning both the topology and the CPTs.

Happy Spring Break!

3/5
– Homework 2 due today
– Mid-term on Tuesday after spring break
– Will cover everything up to (but not including) probabilistic reasoning

Mid-term Syllabus

Reviewing Last Class
What is the difficult part of reasoning with uncertainty?
– Complexity of reasoning?
– Assessing probabilities?
If the joint distribution is enough for answering all queries, what exactly is the role of domain knowledge?
What, if any, is the difference between probability and statistics?
– Statistics → model-finding (learning)
– Probability → model-using (inference)
– Given your midterm marks, I am interested in finding the model of the generative process that generated those marks. I will go ahead and use the assumption that each of your performances is independent of the others in the class (clearly bogus, since I taught you all), and another huge bias that the individual distributions are all Gaussian. Then I just need to find the mean and standard deviation of the data to give the full distribution.
– Given the distribution, I can now compute random queries, e.g., the probability of more than 10 people getting more than 50 marks on the test! (See the sketch below.)
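
A minimal sketch of that marks example (my own illustration; the class size, mean, and standard deviation are made up). The first part is "statistics" (fitting the model), the second is "probability" (using it to answer a query):

```python
import math
import random

# --- Statistics (model-finding): fit a Gaussian to observed marks ---
random.seed(0)
marks = [random.gauss(55, 12) for _ in range(60)]   # hypothetical class of 60 students
mu = sum(marks) / len(marks)
sigma = math.sqrt(sum((m - mu) ** 2 for m in marks) / len(marks))

# --- Probability (model-using): answer queries under the fitted model ---
# P(one student scores > 50), assuming marks ~ N(mu, sigma), i.i.d.
p_above_50 = 0.5 * (1 - math.erf((50 - mu) / (sigma * math.sqrt(2))))

# P(more than 10 of the 60 students score > 50): tail of Binomial(60, p_above_50)
n = len(marks)
p_more_than_10 = sum(
    math.comb(n, k) * p_above_50**k * (1 - p_above_50)**(n - k)
    for k in range(11, n + 1)
)
print(round(p_above_50, 3), round(p_more_than_10, 3))
```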

Relative ease/utility of assessing various types of probabilities
The joint distribution requires us to assess probabilities of type P(x1, ~x2, x3, …, ~xn).
– This means we have to look at all entities in the world and see which fraction of them have x1, ~x2, x3, …, ~xn true. A difficult experiment to set up.
Conditional probabilities of type P(A|B) are relatively easier to assess.
– You just need to look at the set of entities having B true, and find the fraction of them that also have A true.
– Eventually, they too can get baroque: P(x1, ~x2, …, xm | y1, …, yn).
Among conditional probabilities, causal probabilities of the form P(effect|cause) are easier to assess than diagnostic probabilities of the form P(cause|effect).
– Causal probabilities tend to be more stable than diagnostic probabilities.
– For example, a textbook in dentistry can publish P(TA|Cavity) and hope that it will hold in a variety of places. In contrast, P(Cavity|TA) may depend on other fortuitous factors; e.g., in areas where people tend to eat a lot of ice cream, many toothaches may be prevalent, and few of them may actually be due to cavities.
"Doc, Doc, I have flu. Can you tell if I have a runny nose?"

Let A be Anthrax and Rn be Runny Nose.
P(A|Rn) = P(Rn|A) P(A) / P(Rn)
Get by with easier-to-assess numbers.
Generalized Bayes rule:
P(A|B, e) = P(B|A, e) P(A|e) / P(B|e)
Think of this as analogous to inference rules (like modus ponens).

Can we avoid assessing P(S)?
P(M|S) = P(S|M) P(M) / P(S)
P(~M|S) = P(S|~M) P(~M) / P(S)
Since P(M|S) + P(~M|S) = 1, we get 1 = (1/P(S)) [ P(S|M) P(M) + P(S|~M) P(~M) ], i.e., P(S) = P(S|M) P(M) + P(S|~M) P(~M).
So, if we assess P(S|~M), then we don't need to assess P(S). "Normalization."
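
A small sketch of the normalization trick (mine; the numbers for the meningitis/stiff-neck example are made up for illustration):

```python
# Hypothetical assessed numbers: prior and the two causal probabilities.
p_m = 0.0001            # P(M): prior probability of meningitis (made up)
p_s_given_m = 0.8       # P(S | M)   (made up)
p_s_given_not_m = 0.05  # P(S | ~M)  (made up)

# Unnormalized posteriors: P(S|M) P(M) and P(S|~M) P(~M)
unnorm = [p_s_given_m * p_m, p_s_given_not_m * (1 - p_m)]

# Normalizing recovers P(S) implicitly: P(S) is just the sum of the two terms.
p_s = sum(unnorm)
p_m_given_s, p_not_m_given_s = (u / p_s for u in unnorm)

print(p_s, p_m_given_s, p_not_m_given_s)  # posterior sums to 1; P(S) never assessed directly
```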

Is P(S|~M) any easier to assess than P(S)?
P(S|M) is clearly easy to assess (just look at the fraction of meningitis patients that have a stiff neck).
P(S) seems hard to assess: you need to ask random people whether they have a stiff neck or not.
P(S|~M) seems just as hard to assess…
– And in general there seems to be no good argument that it is always easier to assess than P(S).
– In fact they are related in a quite straightforward way: P(S) = P(S|M) * P(M) + P(S|~M) * P(~M).
– (To see this, note that P(S) = P(S & M) + P(S & ~M), and then use the product rule.)
The real reason we assess P(S|~M) is that often we need the posterior distribution rather than just a single probability.
– For Boolean variables, you can get the distribution given one value.
– But for multi-valued variables, we need to assess P(D=di | S) for all values di of the variable D. To do this, we need P(S | D=di) type probabilities anyway.

What happens if there are multiple symptoms…?
A patient walks in and complains of toothache. You assess P(Cavity|Toothache). Now you probe the patient's mouth with that steel thingie, and it catches… How do we update our belief in Cavity?
P(Cavity | TA, Catch) = P(TA, Catch | Cavity) * P(Cavity) / P(TA, Catch) = α P(TA, Catch | Cavity) * P(Cavity)
– We need to know P(TA, Catch | Cavity)! With n evidence variables, we will need 2^n probabilities!
Conditional independence to the rescue: suppose P(TA, Catch | Cavity) = P(TA | Cavity) * P(Catch | Cavity). (A sketch of this update follows below.)
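
A minimal sketch of that update under the conditional-independence assumption (my own; all numbers are made up for illustration):

```python
# Hypothetical assessed numbers for the dental example.
p_cavity = 0.1
p_ta_given = {True: 0.6, False: 0.05}     # P(Toothache | Cavity = c)  (made up)
p_catch_given = {True: 0.9, False: 0.2}   # P(Catch | Cavity = c)      (made up)

# Assuming TA and Catch are conditionally independent given Cavity:
# P(Cavity | TA, Catch) is proportional to P(TA | Cavity) * P(Catch | Cavity) * P(Cavity)
unnorm = {
    c: p_ta_given[c] * p_catch_given[c] * (p_cavity if c else 1 - p_cavity)
    for c in (True, False)
}
alpha = 1 / sum(unnorm.values())
posterior = {c: alpha * u for c, u in unnorm.items()}

print(posterior[True])  # P(Cavity | toothache, catch) under the assumption
```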

Independence: A and B are (absolutely) independent when P(A & B) = P(A) * P(B), equivalently P(A|B) = P(A). Written as A || B.

Conditional Independence Assertions
We write X || Y | Z to say that the set of variables X is conditionally independent of the set of variables Y given evidence on the set of variables Z (where X, Y, Z are subsets of the set of all random variables in the domain model).
We saw that Bayes rule computations can exploit conditional independence assertions. Specifically, X || Y | Z implies:
– P(X & Y | Z) = P(X|Z) * P(Y|Z)
– P(X | Y, Z) = P(X|Z)
– P(Y | X, Z) = P(Y|Z)
If A || B | C, then P(A,B,C) = P(A|B,C) P(B,C) = P(A|B,C) P(B|C) P(C) = P(A|C) P(B|C) P(C).
– (Can get by with 1+2+2 = 5 numbers instead of 8.)
Why not write down all conditional independence assertions that hold in a domain?
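
A quick count of why 5 numbers suffice (my own elaboration of the line above):

```latex
% To specify P(A,B,C) when A \,||\, B \mid C, you need:
% P(C=t) .................................... 1 number
% P(B=t \mid C=t),\; P(B=t \mid C=f) ........ 2 numbers
% P(A=t \mid C=t),\; P(A=t \mid C=f) ........ 2 numbers
% Total: 1 + 2 + 2 = 5, versus the 2^3 = 8 entries of the full joint,
% reconstructed as
P(A,B,C) = P(A \mid C)\, P(B \mid C)\, P(C)
```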

Cond. Indep. Assertions (Contd.)
Idea: Why not write down all conditional independence assertions (CIA) X || Y | Z that hold in a domain?
Problem: There can be exponentially many conditional independence assertions that hold in a domain (recall that X, Y and Z are all subsets of the domain variables). Many of them might well be redundant, since some follow from others (e.g., by the weak-union property, X || Y,W | Z implies X || Y | Z,W).
Brilliant idea: Maybe we should implicitly specify the CIAs by writing down the "local dependencies" between variables using a graphical model.
– A Bayes network is a way of doing just this. The Bayes net is a directed acyclic graph whose nodes are random variables, and the immediate dependencies between variables are represented by directed arcs.
– The topology of a Bayes network shows the inter-variable dependencies. Given the topology, there is a way of checking whether any conditional independence assertion holds in the network (the Bayes Ball algorithm and the D-Sep idea).

CIA implicit in Bayes Nets
So, what conditional independence assumptions are implicit in Bayes nets?
– Local Markov assumption: A node N is independent of its non-descendants (including ancestors) given its immediate parents. (So if P are the immediate parents of N, and A is a subset of the ancestors and other non-descendants, then {N} || A | P.)
– (Equivalently) A node N is independent of all other nodes given its Markov blanket (parents, children, children's parents).
Given this assumption, many other conditional independencies follow. For a full answer, we need to appeal to the D-Sep condition and/or Bayes Ball reachability. (A small factorization sketch follows below.)
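
A minimal sketch (mine) of how the local Markov assumption lets the burglary network's full joint over B, E, A, J, M be assembled from per-node conditional probabilities instead of 32 separate entries. The CPT numbers follow the commonly used textbook values for this example; treat them as illustrative.

```python
from itertools import product

# Network topology of the Pearl burglary example.
parents = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}

# CPTs: P(node = True | parent values). 10 numbers in total, as the earlier slide notes.
cpt = {
    "B": {(): 0.001},
    "E": {(): 0.002},
    "A": {(True, True): 0.95, (True, False): 0.94, (False, True): 0.29, (False, False): 0.001},
    "J": {(True,): 0.90, (False,): 0.05},
    "M": {(True,): 0.70, (False,): 0.01},
}

def joint(assignment):
    """P(full assignment) = product over nodes of P(node | its parents)."""
    p = 1.0
    for node, pars in parents.items():
        key = tuple(assignment[q] for q in pars)
        p_true = cpt[node][key]
        p *= p_true if assignment[node] else 1 - p_true
    return p

# Sanity check: the implied joint distribution over all 32 worlds sums to 1.
total = sum(joint(dict(zip(parents, vals))) for vals in product([True, False], repeat=5))
print(round(total, 6))  # 1.0
```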

Topological Semantics
Independence from non-descendants holds given just the parents.
Independence from every other node holds given the Markov blanket.
These two conditions are equivalent; many other conditional independence assertions follow from them.
Markov blanket = parents; children; children's other parents.
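
A small helper (mine, not from the slides) that computes a node's Markov blanket from a parents map, reusing the same toy topology as above:

```python
def markov_blanket(node, parents):
    """Parents, children, and children's other parents of `node`."""
    children = {n for n, ps in parents.items() if node in ps}
    co_parents = {p for c in children for p in parents[c]} - {node}
    return set(parents[node]) | children | co_parents

# Burglary topology: B, E -> A; A -> J; A -> M
parents = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}
print(markov_blanket("A", parents))  # {'B', 'E', 'J', 'M'}
print(markov_blanket("B", parents))  # {'A', 'E'}  (E is A's other parent)
```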