Expert System Seyed Hashem Davarpanah Davarpanah@usc.ac.ir University of Science and Culture
Inexact Reasoning
References:
- Jackson, Chapter 19, Truth Maintenance Systems
- Giarratano and Riley, Chapters 4 and 5
- Luger and Stubblefield, 'Artificial Intelligence', Addison-Wesley, 2002, Chapter 7
Knowledge & Inexact Reasoning
- inexact knowledge (truth of P not clear)
- incomplete knowledge (lack of knowledge about P)
- defaults, beliefs (assumption about truth of P)
- contradictory knowledge (P true and false)
- vague knowledge (truth of P not 0/1)
Inexact Reasoning
- CF Theory - uncertainty: uncertainty about facts and conclusions
- Fuzzy - vagueness: truth not 0 or 1 but graded (membership function)
- Truth Maintenance - beliefs, defaults: assumptions about facts, can be revised
- Probability Theory - likelihood of events: statistical model of knowledge
Inexact Reasoning not necessary ...
Inexact reasoning is NOT necessary when assuming:
- complete knowledge about the "world"
- no contradictory facts or rules
- everything is either true or false
This corresponds formally to a complete, consistent theory in First-Order Logic, i.e.
- everything you have to model is contained in the theory (your theory or domain model is complete)
- facts are true or false (assuming your rules are true)
- your sets of facts and rules contain no contradiction (are consistent)
Exact Reasoning: Theories in First-Order Predicate Logic
- Theory (Knowledge Base) given as a set of well-formed formulae.
- Formulae include facts like mother(Mary, Peter) and rules like mother(x, y) → child(y, x).
- Reasoning is based on applying the rules of inference of first-order predicate logic, like Modus Ponens: if p and p → q are given, then q can be inferred (proven):
  p, p → q ⊢ q
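As a rough illustration of Modus Ponens applied mechanically, the following Python sketch forward-chains over ground (variable-free) facts. The function name `forward_chain` and the representation of rules as (premises, conclusion) pairs are assumptions made for this example. The rule mother(x, y) → child(y, x) is instantiated by hand for x = Mary, y = Peter, because the sketch does no unification.

```python
def forward_chain(facts, rules):
    """Apply Modus Ponens repeatedly until no new fact can be derived.

    facts: set of propositions known to be true
    rules: list of (premises, conclusion) pairs, read as p1 and ... and pn -> q
    """
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Modus Ponens: all premises are known, so the conclusion follows.
            if conclusion not in derived and all(p in derived for p in premises):
                derived.add(conclusion)
                changed = True
    return derived


# Ground instance of mother(x, y) -> child(y, x) for x = Mary, y = Peter.
facts = {"mother(Mary, Peter)"}
rules = [(["mother(Mary, Peter)"], "child(Peter, Mary)")]

result = forward_chain(facts, rules)
print(sorted(result))
```

Note that the set of derived facts only ever grows here, which is exactly the monotonic behavior of classical logic.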
Forms of Inexact Knowledge
- uncertainty (truth not clear): probabilistic models, multi-valued logic (true, false, don't know, ...), certainty factor theory
- incomplete knowledge (lack of knowledge): whether P is true or false is not known (→ defaults)
- defaults, beliefs (assumptions about truth): assume P is true as long as there is no counter-evidence (i.e. that ¬P is true); assume P is true with a Certainty Factor
- contradictory knowledge (true and false): inconsistent fact base; somehow both P and ¬P are true
- vague knowledge (truth value not 0/1; not crisp sets): graded truth; fuzzy sets
Inexact Knowledge - Example
Person A walks on campus towards the bus stop. A few hundred yards away A sees someone and is quite sure that it's his next-door neighbor B, who usually goes by car to the University. A shouts B's name.
Q: Which forms of inexact knowledge and reasoning are involved here?
- default: A wants to take a bus
- belief, (un)certainty: it's the neighbor B
- probability, default, uncertainty: the neighbor goes home by car
- default: A wants to get a lift
- default: A wants to go home
Examples of Inexact Knowledge (same scenario as in the previous example)
- Fuzzy: "a few hundred yards". Define a mapping from "#hundreds" to 'few', 'many', ...; the value is not uncertain or incomplete but graded, vague.
- Probabilistic: "the neighbor usually goes by car". Probability based on a measure of how often he takes the car; it always holds that p(F) = 1 - p(¬F).
- Belief: "it's his next-door neighbor B". A "reasoned assumption", assumed to be true.
- Default: "A wants to take a bus". An assumption based on commonsense knowledge.
Dealing with Inexact Knowledge
Methods for representing and handling:
1. incomplete knowledge: defaults, beliefs → Truth Maintenance Systems (TMS); non-monotonic reasoning
2. contradictory knowledge: contradictory facts or different conclusions, based on defaults or beliefs → TMS, Certainty Factors, ..., multi-valued logics
3. uncertain knowledge: hypotheses, statistics → Certainty Factors, Probability Theory
4. vague knowledge: "graded" truth → Fuzzy, rough sets
Inexact knowledge and reasoning involves 1-4; a clear 0/1 truth value cannot be assigned.
Truth Maintenance Systems
Truth Maintenance
- Necessary when changes in the fact base lead to inconsistency / incorrectness among the facts (→ non-monotonic reasoning).
- A Truth Maintenance System tries to adjust the Knowledge Base or Fact Base upon changes to keep it consistent and correct.
- A TMS uses dependencies among facts to keep track of conclusions and allow revision / retraction of facts and conclusions.
Non-monotonic Reasoning The set of currently valid (believed) facts does NOT increase monotonically. Adding a new fact might lead to an inconsistency which requires the removal of one of the contradictory facts. Thus, the set of true (or: believed as true) facts can shrink and grow with reasoning. This is why it’s called “non-monotonic reasoning”. In classical logic (first-order predicate logic) this does not happen. Once a fact is asserted, it’s forever true.
Non-monotonic Reasoning - Example
You are a student, it's 8 am, you are in bed. You slip out of your dreams and think: Today is Sunday. No classes today. I don't have to get up. You go back to sleep. You wake up again. It's 9:30 am now and it is slowly coming to your mind: Today is Tuesday. What an unpleasant surprise.
P1 = today-is-Tuesday
P2 = today-is-Sunday
P3 = have-class-at-10am
P4 = no-classes
P5 = have-to-get-up
P6 = can-stay-in-bed
Non-monotonic Reasoning - Example
P1 = today-is-Tuesday
P2 = today-is-Sunday
P3 = have-class-at-10am
P4 = no-classes
P5 = have-to-get-up
P6 = can-stay-in-bed
Rules: P1 → P3 → P5 and P2 → P4 → P6
Assume: P1 and P2, P3 and P4, P5 and P6 are mutually exclusive, i.e. P1 → ¬P2, P3 → ¬P4, P5 → ¬P6
- assume P2; conclude ¬P1; P4; ¬P3; P6; ¬P5
- assume P1; conclude ¬P2; P3; ¬P4; P5; ¬P6
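The Sunday/Tuesday example can be traced with a small Python sketch. The dictionary encoding of the rule chains and the helper name `beliefs_from` are assumptions made for illustration; the sketch simply follows the two chains and applies the mutual-exclusion pairs, it is not a general reasoner.

```python
# Rule chains from the example: P1 -> P3 -> P5 and P2 -> P4 -> P6.
RULES = {"P1": "P3", "P3": "P5", "P2": "P4", "P4": "P6"}
# Pairs of propositions that cannot both hold.
EXCLUSIVE = [("P1", "P2"), ("P3", "P4"), ("P5", "P6")]


def beliefs_from(assumption):
    """Return (believed, denied) propositions that follow from one assumption."""
    believed = {assumption}
    current = assumption
    while current in RULES:          # follow the rule chain
        current = RULES[current]
        believed.add(current)
    denied = set()
    for a, b in EXCLUSIVE:           # believing one side denies the other
        if a in believed:
            denied.add(b)
        if b in believed:
            denied.add(a)
    return believed, denied


sunday_believed, sunday_denied = beliefs_from("P2")    # 8 am: "today is Sunday"
tuesday_believed, tuesday_denied = beliefs_from("P1")  # 9:30 am: "today is Tuesday"

print(sorted(sunday_believed), sorted(sunday_denied))
print(sorted(tuesday_believed), sorted(tuesday_denied))
```

After the revision, P6 (can-stay-in-bed) is no longer believed although it was believed before, so the belief set has shrunk rather than grown.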
Truth Maintenance Theories
TMS are often based on dependency-directed backtracking to the point in reasoning where a wrong assumption was used.
- McAllester (1978, 1980): "propositional constraint propagation" employs a dependency network which reflects the justification of conclusions of new facts.
- Doyle (1979): justification-based Truth Maintenance System.
Truth Maintenance Theories - McAllester
McAllester's "propositional constraint propagation" uses a network representing conclusions, where proposition nodes are connected if one of the nodes is a reason for concluding the other node.
Example: p → q (equivalently ¬p ∨ q). If p is known to be true, q can be concluded. Connections from p and p → q to q mean that p and p → q are reasons to conclude q.
Truth Maintenance Theories - McAllester
Proposition nodes are connected if one of the nodes is a reason for concluding the other node (simplified version).
Example: connections from p and p → q to a combination node, and then to q, represent the justification for q.
[Figure: dependency network in which the nodes p and p → q feed a combination node that justifies q.]
Truth Maintenance Theories - Doyle
Deals with beliefs as justified assumptions. As long as there is no counter-evidence for a fact (belief), we can assume that it is true.
INp: facts which support P; OUTp: facts which prevent P.
Distinguishes:
- Premises: always true (INp = OUTp = ∅)
- Deductions: derived (INp ≠ ∅; OUTp = ∅)
- Assumptions: depend on the absence of counter-evidence (INp = ∅; OUTp ≠ ∅)
Truth Maintenance Theories - Doyle As long as there is no contra-evidence for a fact (belief) we can assume that it is true. Theory is based on the concept of Support-Lists (SL). A Support-List of a Fact (Belief) P specifies Facts (Beliefs) which support the conclusion of the Fact P or prevent its conclusion. The TMS maintains and updates the set of current Facts/Beliefs if changes occur. Uses justification networks, similar to McAllester’s dependency networks.
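To make the idea of support lists concrete, here is a minimal Python sketch, assuming an acyclic justification network and one support list per node. The function name `believed`, the dictionary layout and the reuse of the Sunday/Tuesday propositions are choices made for this illustration; Doyle's actual TMS also handles multiple justifications per node and circular support, which this sketch does not.

```python
def believed(node, support, cache=None):
    """Return True if `node` is currently believed (IN).

    support maps a node to (in_list, out_list). The node is IN when every
    node on its IN-list is believed and no node on its OUT-list is believed.
    Nodes without an entry have no justification and are not believed.
    Assumes the network contains no cycles.
    """
    if cache is None:
        cache = {}
    if node in cache:
        return cache[node]
    if node not in support:
        cache[node] = False
        return False
    in_list, out_list = support[node]
    result = (all(believed(n, support, cache) for n in in_list)
              and not any(believed(n, support, cache) for n in out_list))
    cache[node] = result
    return result


support = {
    # Assumption: empty IN-list, non-empty OUT-list.
    "today-is-Sunday": ([], ["today-is-Tuesday"]),
    # Deductions: non-empty IN-list, empty OUT-list.
    "no-classes": (["today-is-Sunday"], []),
    "can-stay-in-bed": (["no-classes"], []),
}

before = believed("can-stay-in-bed", support)

# New premise (both lists empty): counter-evidence for the assumption.
support["today-is-Tuesday"] = ([], [])
after = believed("can-stay-in-bed", support)

print(before, after)
```

Adding the premise flips the assumption to OUT, and every deduction that depended on it is withdrawn without being retracted by hand.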
Truth Maintenance in CLIPS 1: logical
- logical: logical connection between the condition part and the action part of a rule
- if the logical part of the condition is no longer true, the consequence fact in the action part is retracted
Example: when fire-present is true, alarm-on can be concluded. When fire-present is retracted, alarm-on will also be retracted.

  (defrule fire-reaction
    (logical (fire-present))
    =>
    (assert (alarm-on)))
Truth Maintenance in CLIPS 2: Dependencies
- (dependents <fact-index>) prints all current facts which depend on the indexed fact (are concluded from that fact)
- (dependencies <fact-index>) prints all current facts on which the indexed fact depends (from which the indexed fact was concluded)
For the fire-reaction rule:
  dependents of fire-present:   alarm-on
  dependencies of fire-present: none
Certainty Factor Theory
Certainty Factor Theory
- The Certainty Factor CF of a Hypothesis H ranges between -1 (denial of H) and +1 (confirmation of H); this allows the ranking of hypotheses.
- Based on measures of belief MB and disbelief MD: MB expresses the belief that H is true; MD expresses the belief that H is not true.
- MB is not 1 - MD; it's not like probabilities.
- Experts determine values for MB and MD of H based on given evidence E (→ subjective).
Stanford Certainty Factor Theory
The Certainty Factor CF of Hypothesis H is based on the difference between the Measure of Belief MB and the Measure of Disbelief MD in hypothesis H, given evidence E.
Certainty Factor of hypothesis H given evidence E:
CF(H|E) = MB(H|E) - MD(H|E), with -1 ≤ CF(H) ≤ 1
Can integrate different experts' assessments. Basis to combine support/rejection for H within one rule and using different rules.
Stanford Certainty Factor Theory
Remember the base rule for the Certainty Factor CF(H|E):
CF(H|E) = MB(H|E) - MD(H|E), with -1 ≤ CF(H) ≤ 1
Integrate Certainty Factors into reasoning.
CF value for H calculated using the CFs of the premises P in a rule:
CF(H) = CF(P1 and P2) = min(CF(P1), CF(P2))
CF(H) = CF(P1 or P2) = max(CF(P1), CF(P2))
CF value for H combined from different rules, experts, ...:
CF(H) = CF1 + CF2 - CF1∙CF2   if both CF1, CF2 > 0
CF(H) = CF1 + CF2 + CF1∙CF2   if both CF1, CF2 < 0
CF(H) = (CF1 + CF2) / (1 - min(|CF1|, |CF2|))   otherwise
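The combination rules on this slide translate almost directly into code. The following Python sketch is one possible rendering; the function names `cf_and`, `cf_or` and `cf_combine` and the sample CF values are assumptions chosen for illustration.

```python
def cf_and(*cfs):
    """CF of a conjunction of premises: the weakest premise decides."""
    return min(cfs)


def cf_or(*cfs):
    """CF of a disjunction of premises: the strongest premise decides."""
    return max(cfs)


def cf_combine(cf1, cf2):
    """Combine two CFs for the same hypothesis from different rules or experts."""
    if cf1 > 0 and cf2 > 0:
        return cf1 + cf2 - cf1 * cf2
    if cf1 < 0 and cf2 < 0:
        return cf1 + cf2 + cf1 * cf2
    # Mixed signs (or a zero). Undefined for cf1 = 1 and cf2 = -1 (or the
    # reverse), where the denominator becomes 0.
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))


premise_cf = cf_and(0.8, 0.5)          # 0.5
both_support = cf_combine(0.6, 0.4)    # about 0.76
both_reject = cf_combine(-0.6, -0.4)   # about -0.76
conflicting = cf_combine(0.6, -0.4)    # about 0.33

print(premise_cf, round(both_support, 2), round(both_reject, 2), round(conflicting, 2))
```

Two supporting rules reinforce each other without ever exceeding 1, while conflicting evidence pulls the result towards the stronger of the two.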
Characteristics of Certainty Factors

Aspect            (Believed) Probability   MB   MD   CF
Certainly true    P(H|E) = 1               1    0    1
Certainly false   P(¬H|E) = 1              0    1    -1
No evidence       P(H|E) = P(H)            0    0    0

Ranges:
- measure of belief: 0 ≤ MB ≤ 1
- measure of disbelief: 0 ≤ MD ≤ 1
- certainty factor: -1 ≤ CF ≤ +1
Probability Theory
Basics of Probability Theory
- mathematical approach to process uncertain information
- sample space (event set) S = {x1, x2, …, xn}: collection of all possible events
- probability p(xi) is the likelihood that the event xi ∈ S occurs
- non-negative values in [0,1]
- total probability of the sample space is 1: Σ p(xi) = 1, summed over all xi ∈ S
- experimental probability: based on the frequency of events
- subjective probability (CF Theories, like Dempster-Shafer, ...): based on expert assessment
Compound Probabilities
For independent events:
- the events do not affect each other in any way
- example: cards and the events "hearts" and "queen"
- joint probability of independent events A and B: P(A ∩ B) = |A ∩ B| / |S| = P(A) * P(B), where |S| is the number of elements in S
- union probability of independent events A and B: P(A ∪ B) = P(A) + P(B) - P(A ∩ B) = P(A) + P(B) - P(A) * P(B)
  This is the situation in which either event occurs. Subtract the probability of their accidental co-occurrence: P(A ∩ B) is included in both P(A) and P(B) and would otherwise be counted twice.
Compound Probabilities
For mutually exclusive events:
- the events cannot occur together at the same time
- examples: one die and the events "1" and "6"; one coin and the events "heads" and "tails"
- joint probability of two different events A and B: P(A ∩ B) = 0. Throwing a die and showing both "1" and "6" cannot happen.
- union probability of two events A and B: P(A ∪ B) = P(A) + P(B). Throw a coin and show either "heads" or "tails". This is also called "special addition".
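Both cases can be checked by counting outcomes. The Python sketch below enumerates a standard 52-card deck for the independent events "hearts" and "queen", and a six-sided die for the mutually exclusive events "1" and "6". The helper `prob` and the variable names are assumptions made for this example; exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10",
         "jack", "queen", "king", "ace"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))   # sample space S with |S| = 52
DIE = [1, 2, 3, 4, 5, 6]


def prob(event, space):
    """Probability of an event as (number of favorable outcomes) / |S|."""
    return Fraction(sum(1 for outcome in space if event(outcome)), len(space))


def is_hearts(card):
    return card[1] == "hearts"


def is_queen(card):
    return card[0] == "queen"


# Independent events: joint probability is the product.
p_hearts = prob(is_hearts, DECK)                               # 1/4
p_queen = prob(is_queen, DECK)                                 # 1/13
p_joint = prob(lambda c: is_hearts(c) and is_queen(c), DECK)   # 1/52
p_union = prob(lambda c: is_hearts(c) or is_queen(c), DECK)    # 4/13

# Mutually exclusive events: joint probability is 0, union is the plain sum.
p_one = prob(lambda n: n == 1, DIE)
p_six = prob(lambda n: n == 6, DIE)
p_both = prob(lambda n: n == 1 and n == 6, DIE)                # 0
p_either = prob(lambda n: n == 1 or n == 6, DIE)               # 1/3

print(p_joint, p_union, p_both, p_either)
```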
Conditional Probabilities
- describes dependent events, which affect each other in some way
- example: throw a die twice; the second throw has to give a larger value than the first throw
- conditional probability of event A given that event B has already occurred: P(A|B) = P(A ∩ B) / P(B)
- example: B = throw(x); A = throw(y>x). See next slide.
Conditional Probabilities
Example: B = throw(x); A = throw(y>x)
P(A|B) = P(throw y with y>x, given that x was thrown) = P(A ∩ B) / P(B)
P(A ∩ B) = P(throw x) * P(throw y, y>x) = 1/6 * (1/6 * (6-x))
If x=1 then P(A ∩ B) = 1/6 * 1/6 * 5 = 5/36
If x=5 then P(A ∩ B) = 1/6 * 1/6 * (6-5) = 1/36
P(B) = P(throw x) = 1/6
If x=1 then P(A|B) = 5/36 * 6 = 5/6 ≈ 0.83
If x=5 then P(A|B) = 1/36 * 6 = 1/6 ≈ 0.17
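The dice example can be verified by enumerating all 36 ordered pairs of throws and applying P(A|B) = P(A ∩ B) / P(B) directly. This is a small sketch; the function name `p_second_larger_given_first` is an assumption made for illustration.

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes (first throw, second throw) of two throws of one die.
THROWS = list(product(range(1, 7), repeat=2))


def p_second_larger_given_first(x):
    """P(A|B) with B = 'first throw is x' and A = 'second throw is larger'."""
    b = [t for t in THROWS if t[0] == x]
    a_and_b = [t for t in b if t[1] > t[0]]
    p_b = Fraction(len(b), len(THROWS))               # always 1/6
    p_a_and_b = Fraction(len(a_and_b), len(THROWS))   # (6 - x) / 36
    return p_a_and_b / p_b


for x in (1, 5):
    print(x, p_second_larger_given_first(x))
```

For x = 1 this gives 5/6 and for x = 5 it gives 1/6, since only a 6 beats a 5.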
Bayesian Approaches
- derive the probability of a cause given a symptom
- has gained importance recently due to advances in efficiency: more computational power available, better methods
- especially useful in diagnostic systems: medicine, computer help systems
- inverse or a posteriori probability: inverse to conditional probability, i.e. the probability of an earlier event given that a later one occurred
Bayes’ Rule for Single Event
single hypothesis H, single event E
P(H | E) = (P(E | H) * P(H)) / P(E)
or
P(H | E) = (P(E | H) * P(H)) / (P(E | H) * P(H) + P(E | ¬H) * P(¬H))
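A short Python sketch of the second form of the rule, where P(E) is expanded over H and ¬H. The function name `bayes` and the input numbers are purely hypothetical and chosen only to show the mechanics; they do not come from any real diagnostic data.

```python
from fractions import Fraction


def bayes(p_e_given_h, p_h, p_e_given_not_h):
    """Return P(H|E) for a single hypothesis H and a single event E."""
    p_not_h = 1 - p_h
    # Total probability of the evidence: P(E) = P(E|H) P(H) + P(E|not H) P(not H)
    p_e = p_e_given_h * p_h + p_e_given_not_h * p_not_h
    return p_e_given_h * p_h / p_e


# Hypothetical inputs: P(E|H) = 9/10, P(H) = 1/100, P(E|not H) = 1/10.
posterior = bayes(Fraction(9, 10), Fraction(1, 100), Fraction(1, 10))
print(posterior)
```

With these made-up inputs the posterior is 1/12: even strong evidence leaves a rare hypothesis fairly unlikely, because the prior P(H) is small.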
Example
Fred and the Cookie Bowls Suppose there are two bowls full of cookies. Bowl #1 has 10 chocolate chip cookies and 30 plain cookies, while bowl #2 has 20 of each. Fred picks a bowl at random, and then picks a cookie at random. We may assume there is no reason to believe Fred treats one bowl differently from another, likewise for the cookies. The cookie turns out to be a plain one. How probable is it that Fred picked it out of bowl #1? From: http://en.wikipedia.org/wiki/Bayes'_theorem
The Cookie Bowl Problem “What’s the probability that Fred picked bowl #1, given that he has a plain cookie?” Event A is that Fred picked bowl #1. Event B is that Fred picked a plain cookie. Compute P(A|B). We need: P(A) - the probability that Fred picked bowl #1 regardless of any other information. Since Fred is treating both bowls equally, it is 0.5. P(B) is the probability of getting a plain cookie regardless of any information on the bowls. It is computed as the sum of the probability of getting a plain cookie from a bowl multiplied by the probability of selecting this bowl. We know that the probability of getting a plain cookie from bowl #1 is 0.75, and the probability of getting one from bowl #2 is 0.5. Since Fred is treating both bowls equally the probability of selecting any one of the bowls is 0.5 (see next slide). Thus, the probability of getting a plain cookie overall is 0.75×0.5 + 0.5×0.5 = 0.625. P(B|A) is the probability of getting a plain cookie given that Fred has selected bowl #1. From the problem statement, we know this is 0.75, since 30 out of 40 cookies in bowl #1 are plain.
The Cookie Bowls

Number of cookies in each bowl by type of cookie:

            Bowl #1   Bowl #2   Totals
Chocolate   10        20        30
Plain       30        20        50
Total       40        40        80

Relative frequency of cookies in each bowl by type of cookie:

            Bowl #1   Bowl #2   Totals
Chocolate   0.125     0.250     0.375
Plain       0.375     0.250     0.625
Total       0.500     0.500     1.000

The second table is derived from the first by dividing each entry by the total number of cookies (80).
Fred and the Cookie Bowl
Given all this information, we can compute the probability of Fred having selected bowl #1 (event A) given that he got a plain cookie (event B), as such:
P(A|B) = P(B|A) × P(A) / P(B) = 0.75 × 0.5 / 0.625 = 0.6
As we expected, it is more than half.
http://en.wikipedia.org/wiki/Bayes'_theorem
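The cookie bowl computation can be reproduced from the raw counts in the problem statement. The Python sketch below uses exact fractions; the variable names are assumptions made for this illustration.

```python
from fractions import Fraction

# Cookie counts from the problem statement.
bowl1 = {"chocolate": 10, "plain": 30}
bowl2 = {"chocolate": 20, "plain": 20}

p_a = Fraction(1, 2)                                            # P(A): bowl #1 picked
p_b_given_a = Fraction(bowl1["plain"], sum(bowl1.values()))     # P(B|A) = 3/4
p_b_given_not_a = Fraction(bowl2["plain"], sum(bowl2.values())) # P(B|not A) = 1/2

# P(B): probability of a plain cookie, summed over both bowls.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)           # 5/8

# Bayes' rule: P(A|B) = P(B|A) P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b

print(p_b, p_a_given_b, float(p_a_given_b))
```

The result is 3/5, i.e. 0.6, which is more than half because bowl #1 contains the larger share of plain cookies.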
Fuzzy Set Theory
Fuzzy Set Theory (Zadeh)
Aimed to model and formalize "vague" Natural Language terms and expressions. Example: Peter is relatively tall.
- Define a set of fuzzy sets (predicates or categories), like tall, small.
- Each fuzzy subset has an associated membership function mapping (exact) domain values into a (graded) membership value.
- tall would be one fuzzy subset defined by such a function: it takes the height (e.g. in inches) as input and returns a fuzzy membership value (between 0 and 1) for tall as output; small is defined by its own function in the same way.
Fuzzy Set Membership Function
If Peter is 6' tall, and the fuzzy membership value of tall for 6' is 0.9, then Peter is quite tall.
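A membership function for tall could look like the following Python sketch. The piecewise-linear shape and the breakpoints of 63 and 73 inches are assumptions, picked only so that a height of 6' (72 inches) yields the membership value 0.9 mentioned above; a real system would have the function defined by a domain expert.

```python
def tall(height_in):
    """Fuzzy membership of a height (in inches) in the fuzzy set 'tall'.

    Assumed shape: 0 up to 63 in, rising linearly to 1 at 73 in.
    """
    low, high = 63.0, 73.0   # hypothetical breakpoints
    if height_in <= low:
        return 0.0
    if height_in >= high:
        return 1.0
    return (height_in - low) / (high - low)


peter = tall(72)   # Peter is 6' = 72 inches
print(peter)
```

Unlike a crisp set, heights between the two breakpoints are neither fully inside nor fully outside the set: 68 inches, for instance, gets a membership value of 0.5 under these assumptions.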
Review Inexact Reasoning
- uncertain reasoning: uncertainty about facts and/or rules (CF Theory)
- vagueness: truth not 0 or 1 (Fuzzy sets and Fuzzy logic)
- beliefs, defaults: assumptions about truth, can be revised (non-monotonic reasoning, Truth Maintenance System)
- likelihood of events: statistical model of knowledge (Probability Theory)
Other Forms of Representing and Reasoning with Inexact Knowledge
- Logics: explicit modeling of Belief and Knows operators in Modal Logic or Autoepistemic Logic
- Probabilistic Reasoning: Bayes' Theory, Dempster-Shafer Theory