For Friday Read Homework: –Chapter 10, exercise 22 I strongly encourage you to tackle this together. You may work in groups of up to 4 people.
Program 2 Any questions?
Homework
Special rules for handling belief: –If I believe something, I believe that I believe it. –Need to still provide a way to indicate that two names refer to the same thing.
Knowledge and Belief How are they related? Knowing whether something is true Knowing what
And Besides Logic? Semantic networks Frames
Semantic Networks Use graphs to represent concepts and the relations between them. Simplest networks are ISA hierarchies Must be careful to make a type/token distinction: Garfield isa Cat Cat(Garfield) Cat isa Feline x (Cat (x) Feline(x)) Restricted shorthand for a logical representation.
Semantic Nets/Frames Labeled links can represent arbitrary relations between objects and/or concepts. Nodes with links can also be viewed as frames with slots that point to other objects and/or concepts.
First Order Representation Rel(Alive,Animals,T) Rel(Flies,Animals,F) Birds Animals Mammals Animals Rel(Flies,Birds,T) Rel(Legs,Birds,2) Rel(Legs,Mammals,4) Penguins Birds Cats Mammals Bats Mammals Rel(Flies,Penguins,F) Rel(Legs,Bats,2) Rel(Flies,Bats,T) Opus Penguins Bill Cats Pat Bats Name(Opus,"Opus") Name(Bill,"Bill") Friend(Opus,Bill) Friend(Bill,Opus) Name(Pat,"Pat")
Inheritance Inheritance is a specific type of inference that allows properties of objects to be inferred from properties of categories to which the object belongs. –Is Bill alive? –Yes, since Bill is a cat, cats are mammals, mammals are animals, and animals are alive. Such inference can be performed by a simple graph traversal algorithm and implemented very efficiently. However, it is basically a form of logical inference x (Cat(x) Mammal(x)) x (Mammal(x) Animal(x)) x (Animal(x) Alive(x)) Cat(Bill) |- Alive(Bill)
Backward or Forward Can work either way Either can be inefficient Usually depends on branching factors
Semantic of Links Must be careful to distinguish different types of links. Links between tokens and tokens are different than links between types and types and links between tokens and types.
Link Types
Inheritance with Exceptions Information specified for a type gives the default value for a relation, but this may be overridden by a more specific type. –Tweety is a bird. Does Tweety fly? Birds fly. Yes. –Opus is a penguin. Does Opus fly? Penguins don't fly. No.
Multiple Inheritance If hierarchy is not a tree but a directed acyclic graph (DAG) then different inheritance paths may result in different defaults being inherited. Nixon Diamond
Nonmonotonicity In normal monotonic logic, adding more sentences to a KB only entails more conclusions. if KB |- P then KB {S} |- P Inheritance with exceptions is not monotonic (it is nonmonotonic) –Bird(Opus) –Fly(Opus)? yes –Penguin(Opus) –Fly(Opus)? no
Nonmonotonic logics attempt to formalize default reasoning by allow default rules of the form: –If P and concluding Q is consistent, then conclude Q. –If Bird(X) then if consistent Fly(x)
Defaults with Negation as Failure Prolog negation as failure can be used to implement default inference. fly(X) : bird(X), not(ab(X)). ab(X) : penguin(X). ab(X) : ostrich(X). bird(opus). ? fly(opus). Yes penguin(opus). ? fly(opus). No
Uncertainty Everyday reasoning and decision making is based on uncertain evidence and inferences. Classical logic only allows conclusions to be strictly true or strictly false We need to account for this uncertainty and the need to weigh and combine conflicting evidence.
Coping with Uncertainty Straightforward application of probability theory is impractical since the large number of conditional probabilities required are rarely, if ever, available. Therefore, early expert systems employed fairly ad hoc methods for reasoning under uncertainty and for combining evidence. Recently, methods more rigorously founded in probability theory that attempt to decrease the amount of conditional probabilities required have flourished.
Probability Probabilities are real numbers 01 representing the a priori likelihood that a proposition is true. P(Cold) = 0.1 P(¬Cold) = 0.9 Probabilities can also be assigned to all values of a random variable (continuous or discrete) with a specific range of values (domain), e.g. low, normal, high. P(temperature=normal)=0.99 P(temperature=98.6) = 0.99
Probability Vectors The vector form gives probabilities for all values of a discrete variable, or its probability distribution. P(temperature) = This indicates the prior probability, in which no information is known.
Conditional Probability Conditional probability specifies the probability given that the values of some other random variables are known. P(Sneeze | Cold) = 0.8 P(Cold | Sneeze) = 0.6 The probability of a sneeze given a cold is 80%. The probability of a cold given a sneeze is 60%.
Cond. Probability cont. Assumes that the given information is all that is known, so all known information must be given. P(Sneeze | Cold Allergy) = 0.95 Also allows for conditional distributions P(X |Y) gives 2D array of values for all P(X=x i |Y=y j ) Defined as P (A | B) = P (A B) P(B)
Axioms of Probability Theory All probabilities are between 0 and 1. 0 P(A) 1 Necessarily true propositions have probability 1, necessarily false have probability 0. P(true) = 1 P(false) = 0 The probability of a disjunction is given by P(A B) = P(A) + P(B) - P(A B)
Joint Probability Distribution The joint probability distribution for a set of random variables X 1 …X n gives the probability of every combination of values (an ndimensional array with vn values if each variable has v values) P(X 1,...,X n ) Sneeze ¬Sneeze Cold ¬Cold The probability of all possible cases (assignments of values to some subset of variables) can be calculated by summing the appropriate subset of values from the joint distribution. All conditional probabilities can therefore also be calculated
Bayes Theorem P(H | e) = P(e | H) P(H) P(e) Follows from definition of conditional probability: P (A | B) = P (A B) P(B)
Other Basic Theorems If events A and B are independent then: P(A B) = P(A)P(B) If events A and B are incompatible then: P(A B) = P(A) + P(B)
Simple Bayesian Reasoning If we assume there are n possible disjoint diagnoses, d 1 … d n P(d i | e) = P(e | d i ) P(d i ) P(e) P(e) may not be known but the total probability of all diagnoses must always be 1, so all must sum to 1 Thus, we can determine the most probable without knowing P(e).
Efficiency This method requires that for each disease the probability it will cause any possible combination of symptoms and the number of possible symptom sets, e, is exponential in the number of basic symptoms. This huge amount of data is usually not available.
Bayesian Reasoning with Independence (“Naïve” Bayes) If we assume that each piece of evidence (symptom) is independent given the diagnosis (conditional independence), then given evidence e as a sequence {e 1,e 2,…,e d } of observations, P(e | d i ) is the product of the probabilities of the observations given d i. The conditional probability of each individual symptom for each possible diagnosis can then be computed from a set of data or estimated by the expert. However, symptoms are usually not independent and frequently correlate, in which case the assumptions of this simple model are violated and it is not guaranteed to give reasonable results.
Bayes Independence Example Imagine there are diagnoses ALLERGY, COLD, and WELL and symptoms SNEEZE, COUGH, and FEVER Prob Well Cold Allergy P(d) P(sneeze|d) P(cough | d) P(fever | d)
If symptoms sneeze & cough & no fever: P(well | e) = (0.9)(0.1)(0.1)(0.99)/P(e) = /P(e) P(cold | e) = (.05)(0.9)(0.8)(0.3)/P(e) = 0.01/P(e) P(allergy | e) = (.05)(0.9)(0.7)(0.6)/P(e) = 0.019/P(e) Diagnosis: allergy P(e) = =.0379 P(well | e) =.23 P(cold | e) =.26 P(allergy | e) =.50
Problems with Probabilistic Reasoning If no assumptions of independence are made, then an exponential number of parameters is needed for sound probabilistic reasoning. There is almost never enough data or patience to reliably estimate so many very specific parameters. If a blanket assumption of conditional independence is made, efficient probabilistic reasoning is possible, but such a strong assumption is rarely warranted.