
10/8

Complexity of Propositional Inference
Any sound and complete inference procedure has to be co-NP-complete, since model-theoretic entailment computation is co-NP-complete (propositional satisfiability is NP-complete, and entailment is its complement).
Given a propositional database of size d:
– Any sentence S that follows from the database by modus ponens can be derived in linear time.
– If the database has only HORN sentences (sentences whose clausal form has at most one positive literal per clause, e.g. A & B => C), then modus ponens is complete for that database. PROLOG uses (first-order) Horn sentences.
– Deriving all sentences that follow by resolution is co-NP-complete (exponential in the worst case).
– Anything that follows by unit resolution can be derived in linear time. Unit resolution: at least one of the two clauses being resolved must be a clause of length 1 (this is what Davis-Putnam uses).
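To make the Horn-clause and unit-resolution point concrete, here is a minimal forward-chaining (repeated modus ponens) sketch for propositional Horn rules; the rule format, function name, and example facts are illustrative assumptions, not course code. A real implementation would index rules by premise to get the linear-time bound mentioned above.

```python
from collections import deque

def forward_chain(facts, rules, query):
    """Forward chaining over propositional Horn rules.
    facts: atoms known to be true; rules: (premises, conclusion) pairs,
    e.g. ({"A", "B"}, "C") for A & B => C."""
    unsatisfied = {i: len(premises) for i, (premises, _) in enumerate(rules)}
    agenda = deque(facts)
    inferred = set()
    while agenda:
        p = agenda.popleft()
        if p == query:
            return True
        if p in inferred:
            continue
        inferred.add(p)
        for i, (premises, conclusion) in enumerate(rules):
            if p in premises:
                unsatisfied[i] -= 1
                if unsatisfied[i] == 0:      # all premises established: the rule fires
                    agenda.append(conclusion)
    return False

rules = [({"A", "B"}, "C"), ({"C"}, "D")]
print(forward_chain({"A", "B"}, rules, "D"))   # True: D follows by two modus ponens steps
print(forward_chain({"A"}, rules, "D"))        # False: B is missing, so C and D are never derived
```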

Search in Resolution
Convert the database into clausal form D_C. Negate the goal first, and then convert it into clausal form D_G. Let D = D_C + D_G.
Loop:
– Select a pair of clauses C1 and C2 from D. Different control strategies can be used to select C1 and C2 so as to reduce the number of resolution attempts:
– Idea 1: Set of support: either C1 or C2 must be the goal clause or a clause derived by doing resolutions on the goal clause (*COMPLETE*).
– Idea 2: Linear input form: either C1 or C2 must be one of the clauses in the input KB (*INCOMPLETE*).
– Idea 3: Linear resolution: pick the most recent resolvent as one of the pair of clauses, so the resolution tree looks like a "line" (*INCOMPLETE* in general, but COMPLETE for HORN clauses).
– Resolve C1 and C2 to get C12.
– If C12 is the empty clause, QED!! Return success (we proved the theorem).
– D = D + C12.
End loop. If we get here, we couldn't derive the empty clause: return "failure".
Finiteness is guaranteed if we make sure that (a) we never resolve the same pair of clauses more than once, AND (b) we use factoring, which removes multiple copies of literals from a clause (e.g. Q V P V P => Q V P).
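A rough, illustrative rendering of this loop for propositional clauses (clauses as frozensets of string literals, with "~" marking negation, and the set-of-support restriction of Idea 1); the representation and names are my own, so treat this as a sketch rather than the course's code.

```python
def resolve(c1, c2):
    """All resolvents of two propositional clauses (frozensets of string literals)."""
    out = []
    for lit in c1:
        comp = lit[1:] if lit.startswith("~") else "~" + lit
        if comp in c2:
            out.append(frozenset((c1 - {lit}) | (c2 - {comp})))
    return out

def refute(kb_clauses, neg_goal_clauses):
    """Resolution refutation with set of support: every step uses a descendant of the negated goal."""
    kb = [frozenset(c) for c in kb_clauses]
    sos = [frozenset(c) for c in neg_goal_clauses]
    seen = set(kb) | set(sos)
    i = 0
    while i < len(sos):
        c1 = sos[i]
        i += 1
        for c2 in kb + sos[:i]:              # partner clause from the KB or an earlier support clause
            for r in resolve(c1, c2):
                if not r:                    # empty clause: contradiction found, so the goal is entailed
                    return True
                if r not in seen:            # factoring is implicit: frozensets drop repeated literals
                    seen.add(r)
                    sos.append(r)
    return False                             # no empty clause derivable: the goal is not entailed

kb = [{"~B", "A"}, {"~E", "A"}, {"~A", "J"}, {"~A", "M"}]   # B=>A, E=>A, A=>J, A=>M
print(refute(kb + [{"B"}], [{"~M"}]))        # True: KB plus Burglary entails Mary-calls
```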

Mad chase for the empty clause…
You must have everything in CNF clauses before you can resolve.
– The goal must be negated first, before it is converted into CNF form.
– The goal (the fact to be proved) may turn into multiple clauses when converted: e.g., if we want to prove P V Q, its negation gives two clauses, ~P and ~Q, to add to the database.
Resolution works by resolving away a single literal and its negation.
– P V Q resolved with ~P V ~Q is not the empty clause! In fact, these two clauses are not even inconsistent (P true and Q false will make sure that both clauses are satisfied).
– P V Q is the negation of ~P & ~Q. The latter becomes two separate clauses, ~P and ~Q. So, by doing two separate resolutions with these two clauses, we can derive the empty clause.
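To see the pitfall concretely, a tiny check using the same clause representation as the sketch above (again illustrative only):

```python
def resolve(c1, c2):
    # All resolvents of two clauses represented as frozensets of string literals ("~" = negation).
    out = []
    for lit in c1:
        comp = lit[1:] if lit.startswith("~") else "~" + lit
        if comp in c2:
            out.append(frozenset((c1 - {lit}) | (c2 - {comp})))
    return out

pq, np_nq = frozenset({"P", "Q"}), frozenset({"~P", "~Q"})
print(resolve(pq, np_nq))                    # two tautologous resolvents ({Q,~Q} and {P,~P}), never the empty clause

# Negating the goal P V Q properly gives two unit clauses, ~P and ~Q:
step1 = resolve(pq, frozenset({"~P"}))[0]    # {Q}
print(resolve(step1, frozenset({"~Q"})))     # [frozenset()] : the empty clause, via two separate resolutions
```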

Solving problems using propositional logic
You need to write what you know as propositional formulas. Theorem proving will then tell you whether a given new sentence holds given what you know.
Three kinds of queries:
– Is my knowledge base consistent? (i.e., is there at least one world where everything I know is true?) Satisfiability.
– Is the sentence S entailed by my knowledge base? (i.e., is it true in every world where my knowledge base is true?)
– Is the sentence S consistent with (possibly true given) my knowledge base? (i.e., is S true in at least one of the worlds where my knowledge base holds?) S is consistent with the KB iff ~S is not entailed.
But propositional logic cannot differentiate between degrees of likelihood among the possible sentences.
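All three queries can be answered by brute-force enumeration over the 2^n worlds, which is exactly what the resolution machinery avoids doing explicitly. A small sketch (the function names and the two-symbol example KB are mine):

```python
from itertools import product

def worlds(symbols):
    for values in product([True, False], repeat=len(symbols)):
        yield dict(zip(symbols, values))

def satisfiable(kb, symbols):
    """Consistency: is the KB true in at least one world?"""
    return any(kb(w) for w in worlds(symbols))

def entails(kb, sentence, symbols):
    """Entailment: is the sentence true in every world where the KB is true?"""
    return all(sentence(w) for w in worlds(symbols) if kb(w))

symbols = ["P", "Q"]
kb = lambda w: (not w["P"] or w["Q"]) and w["P"]          # (P => Q) and P
print(satisfiable(kb, symbols))                           # True: the KB is consistent
print(entails(kb, lambda w: w["Q"], symbols))             # True: Q is entailed
print(entails(kb, lambda w: not w["Q"], symbols))         # False: ~Q is not entailed, so Q is consistent with the KB
```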

Example
Pearl lives in Los Angeles. It is a high-crime area. Pearl installed a burglar alarm. He asked his neighbors John & Mary to call him if they hear the alarm. This way he can come home if there is a burglary. Los Angeles is also earthquake-prone. The alarm goes off when there is an earthquake.
Burglary => Alarm
Earth-Quake => Alarm
Alarm => John-calls
Alarm => Mary-calls
If there is a burglary, will Mary call? Check KB & B |= M.
If Mary didn't call, is it possible that Burglary occurred? Check that KB & ~M doesn't entail ~B.
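A self-contained way to run both checks, enumerating all 32 worlds over B, E, A, J, M and reading "=>" as material implication (the obvious reading of the slide; the code itself is just an illustration):

```python
from itertools import product

def implies(p, q):
    return (not p) or q

def kb(b, e, a, j, m):
    return implies(b, a) and implies(e, a) and implies(a, j) and implies(a, m)

worlds = list(product([True, False], repeat=5))

# Query 1: does KB & B entail M?
print(all(m for (b, e, a, j, m) in worlds if kb(b, e, a, j, m) and b))        # True

# Query 2: does KB & ~M entail ~B?
print(all(not b for (b, e, a, j, m) in worlds if kb(b, e, a, j, m) and not m))
# Also True: with these hard implications, "Mary didn't call" rules burglary out entirely,
# which is the kind of brittleness the "real" version of the example below is getting at.
```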

Example (Real)
Pearl lives in Los Angeles. It is a high-crime area. Pearl installed a burglar alarm. He asked his neighbors John & Mary to call him if they hear the alarm. This way he can come home if there is a burglary. Los Angeles is also earthquake-prone. The alarm goes off when there is an earthquake.
Pearl lives in the real world, where (1) burglars can sometimes disable alarms, (2) some earthquakes may be too slight to cause the alarm, (3) even in Los Angeles, burglaries are more likely than earthquakes, (4) John and Mary both have their own lives and may not always call when the alarm goes off, (5) between John and Mary, John is more of a slacker than Mary, and (6) John and Mary may call even without the alarm going off.
Burglary => Alarm
Earth-Quake => Alarm
Alarm => John-calls
Alarm => Mary-calls
If there is a burglary, will Mary call? Check KB & B |= M.
If Mary didn't call, is it possible that Burglary occurred? Check that KB & ~M doesn't entail ~B.
John already called. If Mary also calls, is it more likely that Burglary occurred?
You now also hear on the TV that there was an earthquake. Is Burglary more or less likely now?

How do we handle Real Pearl?
Eager way: model everything!
– E.g., model exactly the conditions under which John will call: he shouldn't be listening to loud music, he hasn't gone on an errand, he didn't recently have a tiff with Pearl, etc. etc.
– A & c1 & c2 & c3 & ... & cn => J (and the exceptions may have interactions among themselves, e.g. c1 & c5 => ~c9).
– The qualification and ramification problems make this an infeasible enterprise.
Ignorant (non-omniscient) and lazy (non-omnipotent) way:
– Model the likelihood: in 85% of the worlds where there was an alarm, John will actually call.
– How do we do this? Non-monotonic logics? "Certainty factors"? "Probability" theory?

Probabilistic Calculus to the Rescue
Suppose we know the likelihood of each of the (propositional) worlds (aka the joint probability distribution). Then we can use the standard rules of probability to compute the likelihood of all queries (as I will remind you). So, the joint probability distribution is all that you ever need!
In the case of the Pearl example, we just need the joint probability distribution over B, E, A, J, M (32 numbers).
– In general, 2^n separate numbers (which should add up to 1).
If the joint distribution is sufficient for reasoning, what is domain knowledge supposed to help us with?
– Answer: indirectly, by helping us specify the joint probability distribution with fewer than 2^n numbers.
– The local relations between propositions can be seen as "constraining" the form the joint probability distribution can take!
Burglary => Alarm
Earth-Quake => Alarm
Alarm => John-calls
Alarm => Mary-calls
Only 10 (instead of 32) numbers to specify!
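One way to account for the "10 numbers", assuming the usual Bayes-net reading of these four dependencies (B and E as causes of A, and A as the sole parent of J and M, as in the network that appears later in the course):

\[
\underbrace{1}_{P(B)} + \underbrace{1}_{P(E)} + \underbrace{4}_{P(A \mid B,E)} + \underbrace{2}_{P(J \mid A)} + \underbrace{2}_{P(M \mid A)} \;=\; 10
\]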

10/10

Blog discussion
Number of clauses with n variables: 3^n (a variable can occur positively in a clause, negatively in a clause, or not at all, so 3 possibilities per variable).
– Number of KBs (sets of clauses) with n variables: 2^(3^n).
– But the number of sets of worlds is only 2^(2^n).
– ... more KBs than there are sets of worlds!
– ... because multiple syntactically different KBs can refer to the same set of worlds (e.g. the KB {(P,Q), (P,~Q)} is equivalent to the KB {(P,R), (P,~R)}; both are equivalent to just P).
Non-monotonicity of probabilistic reasoning.
Parameterized distributions.
Probability vs. statistics:
– Inference/reasoning (with the model) vs. learning/acquiring (getting the model).
– Computing the posterior distribution given the current model and evidence is not learning.
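A quick sanity check of these counts for n = 2, following the slide's convention that a clause assigns each variable one of three roles (so the empty clause is counted too); illustrative code, not from the blog:

```python
from itertools import product

n = 2
clauses = list(product((1, -1, 0), repeat=n))      # +1 positive, -1 negative, 0 absent
worlds = list(product((True, False), repeat=n))

def satisfies(world, clause):
    return any((lit == 1 and world[i]) or (lit == -1 and not world[i])
               for i, lit in enumerate(clause))

def model_set(kb):
    return frozenset(w for w in worlds if all(satisfies(w, c) for c in kb))

kbs = [[c for c, keep in zip(clauses, mask) if keep]
       for mask in product((False, True), repeat=len(clauses))]

print(len(clauses))                                # 3^n     = 9 clauses
print(len(kbs))                                    # 2^(3^n) = 512 knowledge bases
print(len({model_set(kb) for kb in kbs}))          # 2^(2^n) = 16 distinct sets of worlds
```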

Easy Special Cases
If there are no relations between the propositions (i.e., they can take values independently of each other):
– Then the joint probability distribution can be specified in terms of the probability of each proposition being true.
– Just n numbers instead of 2^n.
If, in addition, each proposition is equally likely to be true or false:
– Then the joint probability distribution can be specified without giving any numbers! All worlds are equally probable: if there are n propositions, each world has probability 1/2^n.
– The probability of any propositional conjunction with m (< n) propositions will be 1/2^m.

Will we always need 2^n numbers?
If the variables are all independent of each other, then
– P(x1, x2, ..., xn) = P(x1) * P(x2) * ... * P(xn)
– We need just n numbers!
– But if our world were that simple, it would also be very uninteresting (nothing is correlated with anything else!).
We need 2^n numbers if every subset of our n variables can be correlated:
– P(x1, x2, ..., xn) = P(xn | x1 ... xn-1) * P(xn-1 | x1 ... xn-2) * ... * P(x1)
– But that is too pessimistic an assumption about the world. If our world were so interconnected, we would have been dead long back...
A more realistic middle ground is that interactions between variables are confined to regions.
– E.g., the "school variables" and the "home variables" interact only loosely (they are independent for most practical purposes).
– We will wind up needing O(2^k) numbers (k << n).
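To make the O(2^k) claim concrete under one simple reading (my assumption, not spelled out on the slide): if the n variables split into mutually independent blocks of size at most k, each block needs at most 2^k - 1 numbers, so the total is

\[
\frac{n}{k}\,\bigl(2^{k}-1\bigr) \;=\; O\!\left(\tfrac{n}{k}\,2^{k}\right) \;\ll\; 2^{n} \quad \text{when } k \ll n,
\]

which the slide abbreviates as O(2^k).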

Prob. Prop. logic: The Game plan
We will review elementary "discrete variable" probability.
We will recall that the joint probability distribution is all we need to answer any probabilistic query over a set of discrete variables.
We will recognize that the hardest part here is not the cost of inference (which is really only O(2^n), no worse than inference in (deterministic) prop logic).
– Actually it is #P-complete (instead of co-NP-complete), and the former is believed to be harder than the latter.
The real problem is assessing probabilities.
– You could need as many as 2^n numbers (if all variables are dependent on all other variables), or just n numbers if each variable is independent of all other variables. Generally, you are likely to need somewhere between these two extremes.
– The challenge is to recognize the "conditional independences" between the variables and exploit them to get by with as few input probabilities as possible, and to use the assessed probabilities to compute the probabilities of the user queries efficiently.

Directly using the joint distribution:
– Takes O(2^n) time for most natural queries of type P(D|Evidence)
– NEEDS O(2^n) probabilities as input
– Probabilities are of type P(w_k), where w_k is a world
Directly using Bayes rule:
– Can take much less than O(2^n) time for most natural queries of type P(D|Evidence)
– STILL NEEDS O(2^n) probabilities as input
– Probabilities are of type P(X1..Xn|Y)
Using Bayes rule with Bayes nets:
– Can take much less than O(2^n) time for most natural queries of type P(D|Evidence)
– Can get by with anywhere between O(n) and O(2^n) probabilities, depending on the conditional independences that hold
– Probabilities are of type P(X1..Xn|Y)

Blog questions (... if the mountain won't come to Mohammad ...)
1. We saw that propositional logic is monotonic and that the real world requires "defeasible" or "non-monotonic" reasoning. Is probabilistic reasoning monotonic or non-monotonic? Explain.
2. What is the difference between "probability" and "statistics"?
3. We made a big point about the need for representing the joint distribution compactly. Much of elementary probability/statistics handles continuous and multi-valued variables, where specifying the distribution of even a single variable would need a huge number of numbers. How is this normally punted on in elementary probability?

Two ways of specifying world knowledge
Extensional specification ("possible worlds"):
– [prop logic] Enumerate all worlds consistent with what you know (the models of the KB).
– [prob logic] Provide the likelihood of all worlds, given what you know.
Intensional (implicit) specification:
– [prop logic] Just state the local propositional constraints that you know (e.g. p => q, which means no world where p is true and q is false is a possible world).
– [prob logic] Just state the local probabilistic constraints that you know (e.g. P(q|p) = .99).
The local knowledge implicitly defines the extensional specification. Local knowledge acts as a constraint on the possible worlds:
– As you find out more about the world you live in, you eliminate possible worlds you could be in (or revise their likelihood).

Propositional Probabilistic Logic

CONDITIONAL PROBABILITIES
Non-monotonicity w.r.t. evidence: P(A|B) can be higher than, lower than, or equal to P(A).

Most useful probabilistic reasoning involves computing posterior distributions, e.g. P(A), P(A|B=T), P(A|B=T, C=False).
[Figure: probability vs. variable values, with curves for P(A), P(A|B=T), and P(A|B=T, C=False).]
Important: computing the posterior distribution is inference, not learning.

If B=>A then P(A|B) = ? P(B|~A) = ? P(B|A) = ?
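For the record, assuming B => A is read as a hard constraint and the conditioning events have nonzero probability:

\[
P(A \mid B) = 1, \qquad P(B \mid \lnot A) = 0, \qquad P(B \mid A) = \frac{P(B)}{P(A)}.
\]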

If you know the full joint, you can answer ANY query.

& Marginalization

        TA     ~TA
CA      0.04   0.06
~CA     0.01   0.89

P(CA & TA) = ?
P(CA) = ?
P(TA) = ?
P(CA V TA) = ?
P(CA|~TA) = ?

        TA     ~TA
CA      0.04   0.06
~CA     0.01   0.89

P(CA & TA) = 0.04
P(CA) = 0.04 + 0.06 = 0.1 (marginalizing over TA)
P(TA) = 0.04 + 0.01 = 0.05
P(CA V TA) = P(CA) + P(TA) - P(CA & TA) = 0.1 + 0.05 - 0.04 = 0.11
P(CA|~TA) = P(CA & ~TA)/P(~TA) = 0.06/(0.06 + 0.89) = 0.06/0.95 ≈ 0.063
Think of this as analogous to entailment by truth-table enumeration!
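The same arithmetic, done mechanically from the 2x2 joint (a small sketch; the table values are taken as given):

```python
joint = {
    (True,  True):  0.04,   # CA & TA
    (True,  False): 0.06,   # CA & ~TA
    (False, True):  0.01,   # ~CA & TA
    (False, False): 0.89,   # ~CA & ~TA
}

def prob(event):
    """Sum the joint entries of all worlds (ca, ta) satisfying the event."""
    return sum(p for world, p in joint.items() if event(*world))

print(prob(lambda ca, ta: ca and ta))                                    # 0.04
print(prob(lambda ca, ta: ca))                                           # 0.10 (marginalizing over TA)
print(prob(lambda ca, ta: ta))                                           # 0.05
print(prob(lambda ca, ta: ca or ta))                                     # 0.11
print(prob(lambda ca, ta: ca and not ta) / prob(lambda ca, ta: not ta))  # ~0.063
```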

DNF form
(A V B) => (C & D)
≡ ~(A V B) V (C & D)
≡ (~A & ~B) V (C & D), i.e., the two disjuncts are [~A & ~B] and [C & D].

Problem:
– Need too many numbers...
– The needed numbers are harder to assess.
You can avoid assessing P(E=e) when computing P(Y|E=e), since P(Y|E=e) must add up to 1 over the values of Y (so P(E=e) is just a normalization constant).
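Written out, this is just the normalization form of Bayes rule (a standard identity, not specific to this lecture): the denominator is computed from quantities you have already assessed, so P(E=e) never has to be given separately.

\[
P(Y \mid E{=}e) \;=\; \frac{P(E{=}e \mid Y)\,P(Y)}{\sum_{y} P(E{=}e \mid Y{=}y)\,P(Y{=}y)}
\]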

Digression: Is finding the numbers really the hard assessment problem?
We are making it sound as if assessing the probabilities is a big deal. In doing so, we are taking into account model acquisition/learning costs.
How come we didn't care about these issues in logical reasoning? Is it because acquiring logical knowledge is easy?
– Actually, if we are writing programs for worlds that we (the humans) already live in, it is easy for us (humans) to add the logical knowledge into the program. It is a pain to give the probabilities.
– On the other hand, if the agent is fully autonomous and is bootstrapping itself, then learning logical knowledge is actually harder than learning probabilities.
– For example, we will see that given the Bayes network topology (the "logic"), learning its CPTs is much easier than learning both the topology and the CPTs.