Artificial Intelligence and Lisp #5
Causal Nets (continued); Learning in Decision Trees and Causal Nets; Lab Assignment 3
Causal Nets

A causal net consists of:
- A set of independent terms
- A partially ordered set of dependent terms
- An assignment of a dependency expression to each dependent term (these may be decision trees)

The dependency expression for a term may use independent terms, and also dependent terms that are lower than the term at hand. This means the dependency graph is not cyclic.
Example of Causal Net

[Diagram: a causal net over the terms Battery charged, Fuse OK, Key turned, Headlights on, Starting motor runs, Gas in tank, Clutch engaged, Main engine runs, Car moves]
Causal Nets II

- A causal net is an acyclic graph where each node (called a term) represents some condition in the world (i.e., a feature), and each link indicates a dependency relationship between two terms
- Terms in a causal net that do not have any predecessor are called independent terms
- A dependence specification for a causal net is an assignment, to each dependent term, of an expression using its immediate predecessors
- A causal net is exhaustive iff all actual dependencies are represented by links in it
Dependence specification for one of the terms, using discrete term values

Headlights-on =
    [fuse-ok? [battery-charged? true false]
              [battery-charged? false false]]

or, simplifying the constant subtree:
    [fuse-ok? [battery-charged? true false] false]

or, flattened to depth 1 over the value sequence <fuse-ok battery-charged>:
    [<fuse-ok battery-charged>? true false false false :range]
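Expressions of this bracketed form can be evaluated mechanically. A minimal sketch in Python (the course itself uses Lisp), with tuples standing in for the bracketed lists; the term names and the `world` dictionaries in the usage lines are taken from the example above:

```python
# Evaluate decision trees of the form (term, then-subtree, else-subtree),
# mirroring the bracketed [term? then else] notation from the slides.

def evaluate(tree, world):
    """Return the value of `tree` in `world` (a dict from term to value).

    A non-tuple node is a leaf value; otherwise the value of the node's
    term selects the then- or the else-subtree."""
    if not isinstance(tree, tuple):
        return tree
    term, if_true, if_false = tree
    return evaluate(if_true if world[term] else if_false, world)

# Headlights-on = [fuse-ok? [battery-charged? true false] false]
headlights_on = ("fuse-ok?", ("battery-charged?", True, False), False)

print(evaluate(headlights_on, {"fuse-ok?": True, "battery-charged?": True}))   # True
print(evaluate(headlights_on, {"fuse-ok?": True, "battery-charged?": False}))  # False
```

The same evaluator works for trees whose leaves are probabilities rather than truth values, as used later in the lecture.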
Observations on previous slide

- A decision tree, where the same term is used throughout on each level, may become unnecessarily large
- A decision tree can always be converted to an equivalent tree of depth 1 by introducing sequences of terms, and corresponding sequences of values for the terms
Main topic for first part of lecture

Given:
- An exhaustive causal net equipped with dependence specifications that may also use probabilities
- A specification of a priori probabilities for each one of the independent terms (more exactly, a probability distribution over its admitted values)
- A value for one of the dependent terms

Desired: inferred probabilities for the independent terms, alone or in combination, based on this information.
Inverse operation (from previous lecture)

Consider this simple case first: lights-are-on = [noone-home? 0.70 0.20]

If it is known that lights-are-on is true, what is the probability for noone-home?

Probabilities conditional on noone-home:

                          lights-are-on
                          true     false
    noone-home = true     0.70     0.30
    noone-home = false    0.20     0.80

Suppose noone-home is true in 20% of overall cases; the a priori joint probabilities are then:

                          lights-are-on
                          true     false
    noone-home = true     0.14     0.06
    noone-home = false    0.16     0.64

Given lights-are-on, noone-home has probability 14/30 ≈ 46.7%. The probability estimate has changed from 20% to 46.7% in light of the additional information.
Redoing the example systematically

Probabilities conditional on noone-home:

                          lights-are-on
                          true     false
    noone-home = true     0.70     0.30
    noone-home = false    0.20     0.80

Suppose noone-home is true in 20% of overall cases, i.e. the a priori probability for noone-home is 0.20. The a priori joint probabilities are then:

                          lights-are-on
                          true     false
    noone-home = true     0.14     0.06
    noone-home = false    0.16     0.64

Probabilities conditional on lights-are-on (normalizing each column):

                          lights-are-on
                          true     false
    noone-home = true     14/30    6/70
    noone-home = false    16/30    64/70
Bayes' Rule

    P(A|E) = P(E|A)*P(A)/P(E)

Here E = lights-are-on and A = noone-home. Probabilities conditional on noone-home:

                          lights-are-on
                          true     false
    noone-home = true     0.70     0.30
    noone-home = false    0.20     0.80

noone-home is true in 20% of overall cases, so the a priori probability for noone-home is 0.20. A priori joint probabilities:

                          lights-are-on
                          true     false
    noone-home = true     0.14     0.06
    noone-home = false    0.16     0.64

Probabilities conditional on lights-are-on:

                          lights-are-on
                          true     false
    noone-home = true     14/30    6/70
    noone-home = false    16/30    64/70

Note that 14/30 = 0.70*0.20/0.30, exactly as Bayes' rule prescribes.
Bayes' Rule

    P(A|E) = P(E|A)*P(A)/P(E)

With E = lights-are-on and A = noone-home:

Known: noone-home is true in 20% of overall cases, so
    P(A) = 0.20     P(~A) = 0.80

From the conditional probability table:
    P(E|A)  = 0.70    P(~E|A)  = 0.30
    P(E|~A) = 0.20    P(~E|~A) = 0.80

Therefore:
    P(E)   = P(E|A)*P(A) + P(E|~A)*P(~A) = 0.70*0.20 + 0.20*0.80 = 0.30
    P(A|E) = 0.70*0.20/0.30 = 14/30
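The computation on this slide can be checked in a few lines. A minimal Python sketch using only the numbers given above:

```python
# A = noone-home, E = lights-are-on; all numbers are from the slide.
p_a = 0.20                 # a priori: noone-home is true in 20% of cases
p_e_given_a = 0.70         # P(lights-are-on | noone-home)
p_e_given_not_a = 0.20     # P(lights-are-on | ~noone-home)

# Total probability: P(E) = P(E|A)*P(A) + P(E|~A)*P(~A)
p_e = p_e_given_a * p_a + p_e_given_not_a * (1 - p_a)

# Bayes' rule: P(A|E) = P(E|A)*P(A)/P(E)
p_a_given_e = p_e_given_a * p_a / p_e

print(p_e, p_a_given_e)    # ≈ 0.30 and ≈ 0.467 (= 14/30)
```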
Derivation of Bayes' Rule

To prove: P(A|E) = P(E|A)*P(A)/P(E)

By the definition of conditional probability:
    P(E&A) = P(E|A) * P(A)
    P(A&E) = P(A|E) * P(E)

Since P(E&A) = P(A&E), it follows that
    P(A|E)*P(E) = P(E|A)*P(A)
and dividing both sides by P(E) gives the rule.

By a similar proof (exercise!):
    P(A|E&B) = P(E|A&B)*P(A|B)/P(E|B)
More than two term values

Let E = lights-are-on, and let A be the number of people at home, with three possible values. Probabilities conditional on A:

                   lights-are-on
                   true     false
    0 home         0.70     0.30
    1 home         0.20     0.80
    >1 home        0.05     0.95

P(A|E) = P(E|A)*P(A)/P(E) applies as before. The only difference: we need P(A) for each one of the three possible outcomes for A, i.e., we need a probability distribution over the possible values of A.
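A sketch of the same calculation with a three-valued A. The conditional probabilities come from the table above; the a priori distribution over A's three values is an illustrative assumption, not given on the slide:

```python
# P(E | A) for each of A's three values, from the slide's table
p_e_given_a = {"0 home": 0.70, "1 home": 0.20, ">1 home": 0.05}

# A priori distribution over A's values: an illustrative assumption
p_a = {"0 home": 0.20, "1 home": 0.50, ">1 home": 0.30}

# Total probability of E over all three outcomes of A
p_e = sum(p_e_given_a[v] * p_a[v] for v in p_a)

# Posterior distribution over A, given that E was observed
posterior = {v: p_e_given_a[v] * p_a[v] / p_e for v in p_a}
print(posterior)
```

Observing that the lights are on shifts probability mass toward "0 home", since that value makes the observation most likely.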
Two-level Decision Tree

A concrete example:
    dog-outside = [noone-home? [dog-sick? ...] [dog-sick? ...]]

Abstractly, with probabilities at the leaves:
    E = [A? [B? 0.80 0.70] [B? 0.70 0.30]]

or, flattened to depth 1 over the value sequence <A B>:
    E = [<A B>? 0.80 0.70 0.70 0.30 :range]
Two-level Decision Tree

    P(A) = 0.20
    E = [<A B>? 0.80 0.70 0.70 0.30]

P(A|E) = P(E|A)*P(A)/P(E), which here takes the form
    P(A&B|E) = P(E|A&B)*P(A&B)/P(E)
and P(A|E) can be obtained as P(A&B|E) + P(A&~B|E). The tree gives P(E|A&B) = 0.80. Two questions remain:
1. What is P(A&B)?
2. What is P(E)? Before, it was obtained as P(E) = P(E|A)*P(A) + P(E|~A)*P(~A).
Two-level Decision Tree

    P(A) = 0.20     P(B) = 0.05

    P(E|combo)    a priori P(combo)    product
    0.80          0.01                 0.008      (A & B)
    0.70          0.19                 0.133      (A & ~B)
    0.70          0.04                 0.028      (~A & B)
    0.30          0.76                 0.228      (~A & ~B)
                                       -----
                               P(E) =  0.397

    P(A&B|E) = P(E|A&B)*P(A&B)/P(E)
    P(A|E) can be obtained as P(A&B|E) + P(A&~B|E)
    P(E|A&B) = 0.80
    1. P(A&B) = 0.01
    2. P(E) = 0.397
    P(A&B|E) = 0.80 * 0.01 / 0.397 ≈ 0.02
Two-level Decision Tree

Explanation of the second line in the table (P(A) = 0.20, P(B) = 0.05):

    0.80    0.01    0.008
    0.70    0.19    0.133
    ...

- P(E|A&~B) = 0.70: conditional probability, given in the decision tree
- P(A&~B) = 0.19: a priori probability (using independence: 0.20 * 0.95)
- P(E&A&~B) = P(E|A&~B)*P(A&~B) = 0.133: a priori probability
- P(E) = P(E&A&B) + P(E&A&~B) + P(E&~A&B) + P(E&~A&~B) = 0.397: a priori probability
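The whole table can be reproduced programmatically. A minimal Python sketch using the slide's numbers (P(A) = 0.20, P(B) = 0.05, and the four conditional probabilities from the decision tree):

```python
from itertools import product

p_a, p_b = 0.20, 0.05
# P(E | A,B) for each combination, read off the two-level decision tree
p_e_given = {(True, True): 0.80, (True, False): 0.70,
             (False, True): 0.70, (False, False): 0.30}

def prior(a, b):
    # Independence of A and B: P(A&B) = P(A)*P(B), etc.
    return (p_a if a else 1 - p_a) * (p_b if b else 1 - p_b)

# P(E) = sum over the four combinations of P(E|combo) * P(combo)
p_e = sum(p_e_given[c] * prior(*c) for c in product([True, False], repeat=2))

# P(A&B|E) = P(E|A&B) * P(A&B) / P(E)
p_ab_given_e = p_e_given[(True, True)] * prior(True, True) / p_e

print(p_e, p_ab_given_e)   # ≈ 0.397 and ≈ 0.02
```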
Review of assumptions made above

Given:
- An exhaustive decision tree using probabilities, such that P(A&B) = P(A) * P(B) for each combination of independent terms
- A specification of a priori probabilities for each one of the independent terms (more exactly, a probability distribution over its admitted values)
- A value for one of the dependent terms

Desired: inferred probabilities for the independent terms, alone or in combination, based on this information.
Inverse evaluation across causal net

[Diagram: the car causal net from before (Battery charged, Fuse OK, Key turned, Headlights on, Starting motor runs, Gas in tank, Clutch engaged, Main engine runs, Car moves), with one term marked as the observed feature]
1. Remove irrelevant terms

[Diagram: the pruned net, retaining only Battery charged, Key turned, Starting motor runs, Gas in tank, Main engine runs, and the observed feature]
Inverse evaluation across causal net

1. Remove irrelevant terms (both “sideways” and “upward”; also “downward” if a priori probabilities are available anyway)
2. Calculate a priori probabilities “upward” from the independent terms to the observed one
3. Calculate inferred probabilities “downward” from the observed term to combinations of independent ones
4. Add up the probabilities for combinations of independent terms
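Steps 2-4 can be sketched for a pruned net that has been reduced to a set of independent terms plus one observed term. All term names and numbers below are illustrative assumptions, not taken from the slides:

```python
from itertools import product

# Independent terms with assumed a priori probabilities (step: given)
priors = {"key-turned": 0.90, "gas-in-tank": 0.95}

# Assumed P(observed-term | combination of independent terms)
p_obs = {(True, True): 0.99, (True, False): 0.01,
         (False, True): 0.00, (False, False): 0.00}

def prior(combo):
    # A priori probability of one combination, using independence
    p = 1.0
    for term, value in zip(priors, combo):
        p *= priors[term] if value else 1 - priors[term]
    return p

combos = list(product([True, False], repeat=len(priors)))

# Step 2: a priori probability of the observation, computed "upward"
p_e = sum(p_obs[c] * prior(c) for c in combos)

# Step 3: inferred probabilities "downward" for each combination
posterior = {c: p_obs[c] * prior(c) / p_e for c in combos}

# Step 4: add up combinations, e.g. the marginal for gas-in-tank being true
p_gas = sum(p for c, p in posterior.items() if c[1])
print(p_gas)
```

Observing the dependent term raises the inferred probability of gas-in-tank above its 0.95 prior, since the observation is only likely when both independent terms hold.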
Learning in Decision Trees and Causal Nets

- Obtaining a priori probabilities for given terms
- Obtaining conditional probabilities in a decision tree with a given set of independent terms, based on a set of observed cases
- Choosing the structure of a decision tree using a given set of terms (assuming there is a cost for obtaining the value of a term)
- Identifying the structure of a causal net using a given set of terms
Choosing (the structure of) a decision tree

Also applicable for trees without probabilities.

- Given: a set of independent variables A, B, ... and a large number of instances of the values of these, plus the value of E
- Consider the decision tree for E having only the node A, and similarly for B, C, etc.
- Calculate P(E,A) and P(E,~A), and similarly for the other alternative trees
- Favor the choice that costs the least and gives the most information in the sense of information theory (i.e. the “difference” between P(E,A) and P(E,~A) is as big as possible)
- Form subtrees recursively as long as it is worthwhile

This produces both the structure and the probabilities of the decision tree.
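The information-theoretic criterion can be illustrated with entropy-based information gain, one standard way of making "gives the most information" precise (the slide does not commit to a particular measure). The toy dataset of (A, B, E) rows is an illustrative assumption:

```python
from math import log2

def entropy(outcomes):
    # Entropy of a list of boolean outcomes for E
    p = sum(outcomes) / len(outcomes)
    return 0.0 if p in (0.0, 1.0) else -(p * log2(p) + (1 - p) * log2(1 - p))

def information_gain(rows, attr):
    # Entropy of E before splitting on attr, minus the weighted
    # entropy of E within each branch of the split
    before = entropy([r["E"] for r in rows])
    after = 0.0
    for value in (True, False):
        subset = [r["E"] for r in rows if r[attr] == value]
        if subset:
            after += len(subset) / len(rows) * entropy(subset)
    return before - after

# Toy data in which E always equals A, so splitting on A is fully
# informative and splitting on B tells us nothing
rows = [{"A": a, "B": b, "E": a} for a in (True, False) for b in (True, False)]

best = max(["A", "B"], key=lambda attr: information_gain(rows, attr))
print(best)  # "A"
```

Recursing on each branch of the chosen attribute, as long as the gain justifies the cost of obtaining the term's value, yields the tree structure.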
Assessing the precision of the decision tree

- Obtain a sufficiently large set of instances of the problem
- Divide it into a training set and a test set
- Construct a decision tree using the training set
- Evaluate the elements of the test set in the decision tree and check how well predicted values match actual values
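A minimal sketch of this protocol, with a trivial majority-vote predictor standing in for a learned decision tree; the dataset and the split sizes are illustrative assumptions:

```python
import random

random.seed(0)

# Illustrative dataset: outcome E is true for two thirds of the instances
data = [{"x": i, "E": i % 3 != 0} for i in range(300)]
random.shuffle(data)

# Divide into a training set and a test set
train, test = data[:200], data[200:]

# "Construct" the predictor from the training set only: majority vote,
# standing in for an actual decision-tree learner
majority = sum(r["E"] for r in train) > len(train) / 2

# Evaluate on the held-out test set: how often does the
# prediction match the actual value?
accuracy = sum(majority == r["E"] for r in test) / len(test)
print(accuracy)
```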
Roll-dice scenario

Roll a number of 6-sided dice and register the following independent variables:
- The color of the die (ten different colors)
- The showing of the 'seconds' dial on the watch, ranging from 0 to 59
- The showing of another die that is thrown at the same time

This gives a total of 10 * 60 * 6 = 3600 different combinations. Consider a training set where no combination occurs more than once.
Roll-dice scenario

Conclusions from this scenario:
- It is important to have a way of determining whether the size of the (remaining) training set at a particular node in the decision tree being designed is at all significant
- This may be done by testing against a null hypothesis: could the training set at hand have been obtained purely by chance?
- It may also be done using human knowledge of the domain at hand
- Finally, it can be done using a test set
Continuous-valued terms and terms with a large number of discrete values

To be usable in a decision tree, such a term's range of values must be aggregated into a limited number of cases, for example by introducing intervals (for value domains having a natural ordering).
Identifying the structure of a causal net

This is very often done manually, using human knowledge about the problem domain. Another possibility: select or generate a number of alternative causal nets, learn dependence specifications (e.g. decision trees) for each of them using training sets, and assess their precision using test sets.
There is much more to learn about Learning in A.I.

- Statistically oriented learning: the major part of the field at present; based on Bayesian methods and/or on neural networks
- Logic-oriented learning: identifying compact representations of observed phenomena and behavior patterns in terms of logic formulas
- Case-based learning: the agent maintains a case base of previously encountered situations, the actions it took then, and the outcome of those actions; new situations are addressed by finding a similar case that was successful and adapting the actions that were used then
Lab 3: Using Decision Trees and Causal Nets – the Scenario

- Three classes of terms (features): illnesses, symptoms, and cures
- Cures include both the use of medicines and other kinds of cures
- One causal net can model the relation from illness to symptom
- Another causal net can model the relation from current illness + cure to updated illness
- Both make use of dependency expressions that are probabilistic decision trees
Milestone 3a

- The downloaded lab materials will contain the causal net going from disease to symptom, but without the dependency expressions
- They will also contain operations for direct evaluation and inverse evaluation of decision trees
- The task is to define plausible dependency expressions for this causal net and to run test examples on it: both test examples given by us, and test examples that you write yourself
Milestone 3b

- Additional downloaded materials will contain a set of terms for medicines and cures (but without the causal net), and a generator for (animals with) illnesses
- The first part of the task is to define a plausible causal net and associated dependency expressions for the step from cures to the update of illnesses
- The second part is to run a combined system in which animals with illnesses are diagnosed and treated, and the outcome is observed
Updated Plan for the Course

Please check the course webpage, where the plan for lectures and labs has been modified: Lab 2 has been given one more week in the tutorial sessions, and labs 4 and 5 have been swapped.