Concept Learning and Version Spaces
Based on Ch. 2 of Tom Mitchell's Machine Learning and lecture slides by Uffe Kjaerulff
Presentation Overview
- Concept learning as boolean function approximation
- Ordering of hypotheses
- Version spaces and the candidate-elimination algorithm
- The role of bias
A Concept Learning Task
Inferring boolean-valued functions from training examples; inductive learning.
Example
Given:
- Instances X: possible days, described by the attributes Sky, AirTemp, Humidity, Wind, Water, Forecast.
- Target concept c: EnjoySport: Day → {Yes, No}.
- Hypotheses H: each hypothesis is a conjunction of attribute constraints, e.g. Water=Warm ∧ Sky=Sunny.
- Training examples D: positive and negative examples of the target function, <x1, c(x1)>, ..., <xm, c(xm)>.
Determine:
- A hypothesis h from H such that h(x) = c(x) for all x in X.

Example training data:
Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes
The Inductive Learning Hypothesis
Note: the only information available about c is its value c(x) for each <x, c(x)> in D.
Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other, unobserved examples.
Concept Learning as Search
Some notation for hypothesis representation:
- "?" means that any value is acceptable for the attribute;
- "0" means that no value is acceptable.
In our example:
- Sky ∈ {Sunny, Cloudy, Rainy}; AirTemp ∈ {Warm, Cold}; Humidity ∈ {Normal, High}; Wind ∈ {Strong, Weak}; Water ∈ {Warm, Cool}; Forecast ∈ {Same, Change}.
- The instance space contains 3*2*2*2*2*2 = 96 distinct instances.
- The hypothesis space contains 5*4*4*4*4*4 = 5120 syntactically distinct hypotheses (see the sketch below).
More realistic learning tasks have much larger hypothesis spaces H, so efficient search strategies are crucial.
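To make the counting concrete, here is a minimal Python sketch (my own illustration, not part of the original slides) that represents a hypothesis as a 6-tuple of constraints and computes the two sizes above from the attribute domains of the example:

    # Attribute domains for the EnjoySport example.
    DOMAINS = [
        ("Sunny", "Cloudy", "Rainy"),   # Sky
        ("Warm", "Cold"),               # AirTemp
        ("Normal", "High"),             # Humidity
        ("Strong", "Weak"),             # Wind
        ("Warm", "Cool"),               # Water
        ("Same", "Change"),             # Forecast
    ]

    # Instance space: one choice of value per attribute.
    n_instances = 1
    for domain in DOMAINS:
        n_instances *= len(domain)

    # Syntactic hypothesis space: each constraint is a specific value, "?" or "0".
    n_hypotheses = 1
    for domain in DOMAINS:
        n_hypotheses *= len(domain) + 2

    print(n_instances)    # 96
    print(n_hypotheses)   # 5120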
More-General-Than
Let hj and hk be boolean-valued functions over X. Then hj is More-General-Than-Or-Equal-To hk, written hj ≥g hk, if and only if
  (∀x ∈ X) [ hk(x) = 1 → hj(x) = 1 ].
This establishes a partial order on the hypothesis space.
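As an illustration (a sketch under the tuple representation assumed above, not code from the slides), the relation can be checked extensionally by enumerating the finite instance space:

    import itertools

    DOMAINS = [
        ("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
        ("Strong", "Weak"), ("Warm", "Cool"), ("Same", "Change"),
    ]

    def covers(h, x):
        # h classifies x as positive iff every constraint is "?" or matches x.
        return all(hc == "?" or hc == xv for hc, xv in zip(h, x))

    def more_general_or_equal(hj, hk):
        # hj >=g hk iff every instance satisfied by hk is also satisfied by hj.
        return all(covers(hj, x)
                   for x in itertools.product(*DOMAINS) if covers(hk, x))

    # The all-"?" hypothesis is more general than any conjunction of constraints.
    print(more_general_or_equal(("?",) * 6,
                                ("Sunny", "Warm", "?", "?", "?", "?")))   # True
    print(more_general_or_equal(("Sunny", "?", "?", "?", "?", "?"),
                                ("?", "Warm", "?", "?", "?", "?")))       # False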
Find-S Algorithm
1. Initialize h to the most specific hypothesis in H.
2. For each positive training instance x:
   For each attribute constraint ai in h:
     If the constraint ai is not satisfied by x, replace ai in h by the next more general constraint that is satisfied by x.
3. Output hypothesis h (see the sketch below).
Note: assumes that H contains c and that D contains no errors; otherwise this technique does not work.
Limitations:
- Cannot tell whether it has learned the concept: are there other consistent hypotheses?
- Fails if the training data are inconsistent.
- Picks a maximally specific h; depending on H there might be several.
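A minimal Python sketch of Find-S under the tuple representation used earlier (my own illustration, not code from the slides), run on the four EnjoySport training examples:

    def find_s(examples, n_attributes=6):
        h = ["0"] * n_attributes          # most specific hypothesis: rejects everything
        for x, positive in examples:
            if not positive:
                continue                  # Find-S simply ignores negative examples
            for i, value in enumerate(x):
                if h[i] == "0":
                    h[i] = value          # first positive example: adopt its values
                elif h[i] != value:
                    h[i] = "?"            # generalize a conflicting constraint
        return tuple(h)

    train = [
        (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
        (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"), True),
        (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
        (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
    ]
    print(find_s(train))   # ('Sunny', 'Warm', '?', 'Strong', '?', '?')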
Version Spaces
A hypothesis h is consistent with a set of training examples D of target concept c if and only if h(x) = c(x) for each training example <x, c(x)> in D:
  Consistent(h, D) ≡ (∀<x, c(x)> ∈ D) [ h(x) = c(x) ]
The version space VS_{H,D} with respect to H and D is the subset of hypotheses from H consistent with all training examples in D:
  VS_{H,D} ≡ { h ∈ H : Consistent(h, D) }
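The consistency test is straightforward in the same sketch style (an illustration, assuming the tuple representation from above):

    def covers(h, x):
        return all(hc == "?" or hc == xv for hc, xv in zip(h, x))

    def consistent(h, examples):
        # Consistent(h, D): h(x) == c(x) for every <x, c(x)> in D.
        return all(covers(h, x) == label for x, label in examples)

    D = [
        (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
        (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
    ]
    print(consistent(("Sunny", "?", "?", "?", "?", "?"), D))    # True
    print(consistent(("?", "?", "?", "Strong", "?", "?"), D))   # False (covers the negative example)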
The List-Then-Eliminate Algorithm
1. VersionSpace ← a list containing every hypothesis in H.
2. For each training example <x, c(x)> in D:
   remove from VersionSpace any hypothesis h for which h(x) ≠ c(x).
3. Output the list of hypotheses.
The algorithm maintains a list of all hypotheses in VS_{H,D}. This is unrealistic for most H; a more compact (regular) representation of VS_{H,D} is needed.
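For the small EnjoySport hypothesis space (5120 syntactic hypotheses) the brute-force algorithm is still feasible; a sketch reusing the tuple representation assumed earlier, not code from the slides:

    import itertools

    DOMAINS = [
        ("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
        ("Strong", "Weak"), ("Warm", "Cool"), ("Same", "Change"),
    ]

    def covers(h, x):
        return all(hc == "?" or hc == xv for hc, xv in zip(h, x))

    def consistent(h, examples):
        return all(covers(h, x) == label for x, label in examples)

    def list_then_eliminate(examples):
        # Enumerate every syntactically distinct hypothesis, then keep only
        # those consistent with all training examples.
        choices = [list(domain) + ["?", "0"] for domain in DOMAINS]
        return [h for h in itertools.product(*choices) if consistent(h, examples)]

    train = [
        (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
        (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"), True),
        (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
        (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
    ]
    version_space = list_then_eliminate(train)
    print(len(version_space))   # 6 consistent hypotheses remain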
Example Version Space
Idea: VS_{H,D} can be represented by the sets of its most general and most specific consistent hypotheses.
Representing Version Spaces
The general boundary G of version space VS_{H,D} is the set of its most general members.
The specific boundary S of version space VS_{H,D} is the set of its most specific members.
Version Space Representation Theorem: Let X be an arbitrary set of instances and let H be a set of boolean-valued hypotheses defined over X. Let c: X → {0,1} be an arbitrary target concept defined over X, and let D be an arbitrary set of training examples {<x, c(x)>}. For all X, H, c, and D such that S and G are well defined,
  VS_{H,D} = { h ∈ H | (∃s ∈ S)(∃g ∈ G) [ g ≥g h ≥g s ] }.
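A small sketch of what the theorem buys us (my own illustration; the S and G boundaries are those of the EnjoySport example): membership in the version space can be tested against the two boundary sets alone, without enumerating H.

    import itertools

    DOMAINS = [
        ("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
        ("Strong", "Weak"), ("Warm", "Cool"), ("Same", "Change"),
    ]

    def covers(h, x):
        return all(hc == "?" or hc == xv for hc, xv in zip(h, x))

    def more_general_or_equal(hj, hk):
        return all(covers(hj, x)
                   for x in itertools.product(*DOMAINS) if covers(hk, x))

    def in_version_space(h, S, G):
        # h is in VS_{H,D} iff some g in G is at least as general as h
        # and h is at least as general as some s in S.
        return (any(more_general_or_equal(g, h) for g in G) and
                any(more_general_or_equal(h, s) for s in S))

    S = [("Sunny", "Warm", "?", "Strong", "?", "?")]
    G = [("Sunny", "?", "?", "?", "?", "?"), ("?", "Warm", "?", "?", "?", "?")]
    print(in_version_space(("Sunny", "?", "?", "Strong", "?", "?"), S, G))   # True
    print(in_version_space(("?", "?", "?", "Strong", "?", "?"), S, G))       # False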
Candidate-Elimination Algorithm
G ← set of maximally general hypotheses in H
S ← set of maximally specific hypotheses in H
For each training example d:
  If d is a positive example:
    Remove from G any hypothesis that does not cover d
    For each hypothesis s in S that does not cover d:
      Remove s from S
      Add to S all minimal generalizations h of s such that h covers d and some member of G is more general than h
      Remove from S any hypothesis that is more general than another hypothesis in S
  If d is a negative example:
    Remove from S any hypothesis that covers d
    For each hypothesis g in G that covers d:
      Remove g from G
      Add to G all minimal specializations h of g such that h does not cover d and some member of S is more specific than h
      Remove from G any hypothesis that is more specific than another hypothesis in G
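A complete, runnable sketch of the algorithm for the conjunctive EnjoySport representation (my own illustration under the tuple conventions used above; the minimal generalization/specialization operators are the usual ones for conjunctions of attribute constraints):

    import itertools

    DOMAINS = [
        ("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
        ("Strong", "Weak"), ("Warm", "Cool"), ("Same", "Change"),
    ]

    def covers(h, x):
        return all(hc == "?" or hc == xv for hc, xv in zip(h, x))

    def more_general_or_equal(hj, hk):
        # Extensional check: every instance covered by hk is also covered by hj.
        return all(covers(hj, x)
                   for x in itertools.product(*DOMAINS) if covers(hk, x))

    def min_generalization(s, x):
        # The single minimal generalization of a conjunction s that covers x.
        return tuple(xv if sc == "0" else (sc if sc == xv else "?")
                     for sc, xv in zip(s, x))

    def min_specializations(g, x):
        # All minimal specializations of g that exclude the negative instance x.
        results = []
        for i, gc in enumerate(g):
            if gc == "?":
                for value in DOMAINS[i]:
                    if value != x[i]:
                        results.append(g[:i] + (value,) + g[i + 1:])
        return results

    def candidate_elimination(examples):
        G = {("?",) * len(DOMAINS)}    # maximally general boundary
        S = {("0",) * len(DOMAINS)}    # maximally specific boundary
        for x, positive in examples:
            if positive:
                G = {g for g in G if covers(g, x)}
                new_S = set()
                for s in S:
                    if covers(s, x):
                        new_S.add(s)
                    else:
                        h = min_generalization(s, x)
                        if any(more_general_or_equal(g, h) for g in G):
                            new_S.add(h)
                # Keep only the most specific members of S.
                S = {s for s in new_S
                     if not any(s != t and more_general_or_equal(s, t) for t in new_S)}
            else:
                S = {s for s in S if not covers(s, x)}
                new_G = set()
                for g in G:
                    if not covers(g, x):
                        new_G.add(g)
                    else:
                        for h in min_specializations(g, x):
                            if any(more_general_or_equal(h, s) for s in S):
                                new_G.add(h)
                # Keep only the most general members of G.
                G = {g for g in new_G
                     if not any(g != t and more_general_or_equal(t, g) for t in new_G)}
        return S, G

    train = [
        (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
        (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"), True),
        (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
        (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
    ]
    S, G = candidate_elimination(train)
    print(S)   # {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
    print(G)   # {('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')} (in either order)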
Some Notes on the Candidate-Elimination Algorithm
- Positive examples make S increasingly general.
- Negative examples make G increasingly specific.
- The candidate-elimination algorithm converges toward the hypothesis that correctly describes the target concept provided that
  - there are no errors in the training examples, and
  - there is some hypothesis in H that correctly describes the target concept.
- The target concept is exactly learned when the S and G boundary sets converge to a single identical hypothesis.
- Under the above assumptions, new training data can be used to resolve remaining ambiguity.
- The algorithm breaks down if the data are noisy (inconsistent). Given sufficient training data, such inconsistency is eventually detected: S and G converge to an empty version space.
- The same failure occurs when the target concept cannot be represented in H, e.g. when it is a disjunction of attribute constraints.
A Biased Hypothesis Space
Bias: each h ∈ H is given by a conjunction of attribute constraints.
This is unable to represent disjunctive concepts such as Sky=Sunny ∨ Sky=Cloudy.
The most specific hypothesis consistent with examples 2 and 3 below and representable in H is (?, Warm, Normal, Strong, Cool, Change). But it is too general: it also covers example 4.
Example  Sky     AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
2        Sunny   Warm     Normal    Strong  Cool   Change    Yes
3        Cloudy  Warm     Normal    Strong  Cool   Change    Yes
4        Rainy   Warm     Normal    Strong  Cool   Change    No
Unbiased Learner
Idea: choose H so that it expresses every teachable concept, i.e. H is the power set of X; allow disjunction and negation.
For our example we get 2^96 possible hypotheses.
What are G and S? S becomes the disjunction of the positive examples; G becomes the negated disjunction of the negative examples.
Only the training examples themselves will be unambiguously classified: the algorithm cannot generalize!
Inductive Bias
Let
- L be a concept learning algorithm;
- X be a set of instances;
- c be the target concept;
- Dc = {<x, c(x)>} be the set of training examples;
- L(xi, Dc) denote the classification assigned to instance xi by L after training on Dc.
The inductive bias of L is any minimal set of assertions B such that for the target concept c and corresponding training examples Dc:
  (∀xi ∈ X) [ (B ∧ Dc ∧ xi) ⊢ L(xi, Dc) ]
Inductive bias of the candidate-elimination algorithm: the target concept c is contained in the given hypothesis space H.
Summary Points
- Concept learning as search through H
- Partial ordering of H
- Version space candidate-elimination algorithm
- S and G characterize the learner's uncertainty
- Inductive leaps are possible only if the learner is biased