1 Concept Learning and Version Spaces
Based on Ch. 2 of Tom Mitchell’s Machine Learning and lecture slides by Uffe Kjaerulff

2 Presentation Overview
Concept learning as boolean function approximation
Ordering of hypotheses
Version spaces and the candidate-elimination algorithm
The role of bias

3 A Concept Learning Task
Inferring boolean-valued functions from training examples: inductive learning.

Example. Given:
Instances X: possible days, described by the attributes Sky, AirTemp, Humidity, Wind, Water, Forecast;
Target concept c: EnjoySport: X → {Yes, No};
Hypotheses H: each hypothesis is a conjunction of attribute constraints, e.g. Water=Warm ∧ Sky=Sunny;
Training examples D: positive and negative examples of the target function, <x1, c(x1)>, …, <xm, c(xm)>.
Determine: a hypothesis h from H such that h(x) = c(x) for all x in X.

Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes
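As a concrete illustration (not part of the original slides; the names are hypothetical), the training set D and the example hypothesis Water=Warm ∧ Sky=Sunny can be written down in a few lines of Python and checked against the table above:

```python
# EnjoySport training set D as (instance, c(instance)) pairs; attribute order:
# Sky, AirTemp, Humidity, Wind, Water, Forecast.
D = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   "Yes"),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   "Yes"),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), "Yes"),
]

# The slide's example hypothesis, Water=Warm AND Sky=Sunny, as a predicate.
def h(x):
    sky, airtemp, humidity, wind, water, forecast = x
    return water == "Warm" and sky == "Sunny"

# The task asks for an h with h(x) = c(x) on all of X; on D this h already fails,
# because it rejects example 4 (Water=Cool) even though c labels it Yes.
print(all(h(x) == (c == "Yes") for x, c in D))   # False
```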

4 The Inductive Learning Hypothesis
Note: the only information available about c is c(x) for each <x, c(x)> in D. Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other, unobserved examples.

5 Concept Learning as Search
Notation for hypothesis representation:
"?" means that any value is acceptable for the attribute;
"0" means that no value is acceptable.
In our example: Sky ∈ {Sunny, Cloudy, Rainy}; AirTemp ∈ {Warm, Cold}; Humidity ∈ {Normal, High}; Wind ∈ {Strong, Weak}; Water ∈ {Warm, Cool}; Forecast ∈ {Same, Change}.
The instance space contains 3*2*2*2*2*2 = 96 distinct instances.
The hypothesis space contains 5*4*4*4*4*4 = 5120 syntactically distinct hypotheses.
More realistic learning tasks involve much larger hypothesis spaces, so efficient search strategies are crucial.
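These counts are easy to recompute; a small sketch (hypothetical variable names, attribute domains as listed above):

```python
from math import prod

# Attribute domains from this slide.
domains = {
    "Sky":      ["Sunny", "Cloudy", "Rainy"],
    "AirTemp":  ["Warm", "Cold"],
    "Humidity": ["Normal", "High"],
    "Wind":     ["Strong", "Weak"],
    "Water":    ["Warm", "Cool"],
    "Forecast": ["Same", "Change"],
}

n_instances = prod(len(vals) for vals in domains.values())        # 3*2*2*2*2*2 = 96
# Syntactically, each attribute constraint may also be "?" or "0".
n_hypotheses = prod(len(vals) + 2 for vals in domains.values())   # 5*4*4*4*4*4 = 5120
print(n_instances, n_hypotheses)                                  # 96 5120
```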

6 More-General-Than
Let hj and hk be boolean-valued functions over X. Then hj is More-General-Than-Or-Equal-To hk (written hj ≥g hk) if and only if (∀x ∈ X) [hk(x) = 1 → hj(x) = 1]. This relation establishes a partial order on the hypothesis space.
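For the conjunctive representation the (∀x ∈ X) condition reduces to an attribute-wise check; a minimal sketch (hypothetical function name, hypotheses as tuples of "?"/"0"/values):

```python
def more_general_or_equal(hj, hk):
    """hj >=g hk: every instance satisfied by hk is also satisfied by hj."""
    if "0" in hk:                  # hk satisfies no instance at all
        return True
    return all(a == "?" or a == b for a, b in zip(hj, hk))

# (Sunny, ?, ?, ?, ?, ?) is more general than (Sunny, Warm, ?, Strong, ?, ?) ...
print(more_general_or_equal(("Sunny", "?", "?", "?", "?", "?"),
                            ("Sunny", "Warm", "?", "Strong", "?", "?")))   # True
# ... but not the other way around, so the ordering is only partial.
print(more_general_or_equal(("Sunny", "Warm", "?", "Strong", "?", "?"),
                            ("Sunny", "?", "?", "?", "?", "?")))           # False
```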

7 Find-S Algorithm
Initialize h to the most specific hypothesis in H.
For each positive training instance x:
  For each attribute constraint ai in h:
    If the constraint ai is satisfied by x, do nothing;
    otherwise replace ai in h by the next more general constraint that is satisfied by x.
Output hypothesis h.

Note: this assumes that H contains c and that D contains no errors; otherwise the technique does not work.
Limitations:
Cannot tell whether it has learned the concept: are there other consistent hypotheses?
Fails if the training data are inconsistent.
Picks a maximally specific h; depending on H there may be several.
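A minimal sketch of Find-S for the conjunctive representation (hypothetical names; the training set is the slide-3 table with Yes/No encoded as True/False). On these data it returns the maximally specific consistent hypothesis (Sunny, Warm, ?, Strong, ?, ?):

```python
def find_s(examples):
    """examples: list of (attribute-tuple, is_positive) pairs."""
    h = ["0"] * len(examples[0][0])        # start from the most specific hypothesis
    for x, positive in examples:
        if not positive:                   # Find-S simply ignores negative examples
            continue
        for i, value in enumerate(x):
            if h[i] == "0":
                h[i] = value               # adopt the first positive example's value
            elif h[i] != value:
                h[i] = "?"                 # minimally generalize to cover x
    return tuple(h)

D = [(("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),
     (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),
     (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
     (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True)]
print(find_s(D))   # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```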

8 Version Spaces
A hypothesis h is consistent with a set of training examples D of target concept c if and only if h(x) = c(x) for each training example <x, c(x)> in D:
Consistent(h, D) ≡ (∀ <x, c(x)> ∈ D) [h(x) = c(x)]
The version space VS(H,D) with respect to H and D is the subset of hypotheses from H consistent with all training examples in D:
VS(H,D) ≡ {h ∈ H | Consistent(h, D)}
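Both definitions translate almost directly into code; a short sketch (hypothetical names, conjunctive hypotheses and boolean labels as in the earlier sketches):

```python
def covers(h, x):
    """h(x) for a conjunctive hypothesis: every attribute constraint must be met."""
    return "0" not in h and all(a == "?" or a == v for a, v in zip(h, x))

def consistent(h, examples):
    """Consistent(h, D): h(x) agrees with c(x) for every <x, c(x)> in D."""
    return all(covers(h, x) == positive for x, positive in examples)
```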

9 The List-Then-Eliminate Algorithm
VersionSpace ← a list containing every hypothesis in H.
For each training example <x, c(x)> in D:
  remove from VersionSpace any hypothesis h for which h(x) ≠ c(x).
Output the list VersionSpace.
This maintains an explicit list of all hypotheses in VS(H,D), which is unrealistic for most H; a more compact (regular) representation of VS(H,D) is needed.
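Because this H contains only 5120 hypotheses, List-Then-Eliminate can actually be run by brute force here; a sketch (hypothetical names; covers as in the previous sketch) that enumerates H and keeps the consistent hypotheses:

```python
from itertools import product

domains = [["Sunny", "Cloudy", "Rainy"], ["Warm", "Cold"], ["Normal", "High"],
           ["Strong", "Weak"], ["Warm", "Cool"], ["Same", "Change"]]

def covers(h, x):
    return "0" not in h and all(a == "?" or a == v for a, v in zip(h, x))

def list_then_eliminate(examples):
    # Start with every syntactically distinct hypothesis in H ...
    version_space = list(product(*[vals + ["?", "0"] for vals in domains]))
    # ... and drop any hypothesis that misclassifies some training example.
    for x, positive in examples:
        version_space = [h for h in version_space if covers(h, x) == positive]
    return version_space

D = [(("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),
     (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),
     (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
     (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True)]
print(len(list_then_eliminate(D)))   # 6 consistent hypotheses remain
```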

10 Example Version Space
Idea: VS(H,D) can be represented by the sets of its most general and most specific consistent hypotheses.

11 Representing Version Spaces
The general boundary G of version space VS(H,D) is the set of its most general members. The specific boundary S of version space VS(H,D) is the set of its most specific members.
Version Space Representation Theorem: Let X be an arbitrary set of instances and let H be a set of boolean-valued hypotheses defined over X. Let c: X → {0,1} be an arbitrary target concept defined over X, and let D be an arbitrary set of training examples {<x, c(x)>}. For all X, H, c, and D such that S and G are well defined,
VS(H,D) = { h ∈ H | (∃s ∈ S)(∃g ∈ G) g ≥g h ≥g s }.
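Operationally, the theorem means membership in VS(H,D) can be decided from S and G alone, without storing the whole version space; a small sketch (hypothetical names; S and G are the boundary sets the EnjoySport example converges to, and ≥g is the attribute-wise test from the slide-6 sketch):

```python
def more_general_or_equal(hj, hk):   # >=g, as in the slide-6 sketch
    return "0" in hk or all(a == "?" or a == b for a, b in zip(hj, hk))

def in_version_space(h, S, G):
    """h is in VS(H,D) iff g >=g h >=g s for some g in G and some s in S."""
    return (any(more_general_or_equal(g, h) for g in G) and
            any(more_general_or_equal(h, s) for s in S))

S = {("Sunny", "Warm", "?", "Strong", "?", "?")}
G = {("Sunny", "?", "?", "?", "?", "?"), ("?", "Warm", "?", "?", "?", "?")}
print(in_version_space(("Sunny", "Warm", "?", "?", "?", "?"), S, G))   # True
print(in_version_space(("Rainy", "?", "?", "?", "?", "?"), S, G))      # False
```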

12 Candidate-Elimination Algorithm
G ← the set of maximally general hypotheses in H
S ← the set of maximally specific hypotheses in H
For each training example d:
  If d is a positive example:
    Remove from G any hypothesis that does not cover d.
    For each hypothesis s in S that does not cover d:
      Remove s from S.
      Add to S all minimal generalizations h of s such that h covers d and some member of G is more general than h.
    Remove from S any hypothesis that is more general than another hypothesis in S.
  If d is a negative example:
    Remove from S any hypothesis that covers d.
    For each hypothesis g in G that covers d:
      Remove g from G.
      Add to G all minimal specializations h of g such that h does not cover d and some member of S is more specific than h.
    Remove from G any hypothesis that is more specific than another hypothesis in G.
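A compact sketch of the algorithm for the conjunctive "?"/"0" representation (hypothetical names; written only for this running example, not as a general-purpose implementation). On the four EnjoySport training examples it reproduces the boundary sets S = {(Sunny, Warm, ?, Strong, ?, ?)} and G = {(Sunny, ?, ?, ?, ?, ?), (?, Warm, ?, ?, ?, ?)}:

```python
def covers(h, x):
    """h(x): every attribute constraint of h is met by instance x."""
    return "0" not in h and all(a == "?" or a == v for a, v in zip(h, x))

def more_general_or_equal(hj, hk):
    """hj >=g hk for conjunctive hypotheses."""
    return "0" in hk or all(a == "?" or a == b for a, b in zip(hj, hk))

def min_generalizations(s, x):
    # For conjunctive hypotheses there is exactly one minimal generalization of s
    # covering x: keep agreeing values, fill "0" slots from x, relax clashes to "?".
    return [tuple(v if a in ("0", v) else "?" for a, v in zip(s, x))]

def min_specializations(g, x, domains):
    # Minimal specializations of g that exclude x: replace one "?" by any value
    # of that attribute other than x's value there.
    return [g[:i] + (v,) + g[i + 1:]
            for i, a in enumerate(g) if a == "?"
            for v in domains[i] if v != x[i]]

def candidate_elimination(examples, domains):
    n = len(domains)
    S, G = {("0",) * n}, {("?",) * n}
    for x, positive in examples:
        if positive:
            G = {g for g in G if covers(g, x)}
            for s in [s for s in S if not covers(s, x)]:
                S.remove(s)
                S |= {h for h in min_generalizations(s, x)
                      if any(more_general_or_equal(g, h) for g in G)}
            S = {s for s in S
                 if not any(s != t and more_general_or_equal(s, t) for t in S)}
        else:
            S = {s for s in S if not covers(s, x)}
            for g in [g for g in G if covers(g, x)]:
                G.remove(g)
                G |= {h for h in min_specializations(g, x, domains)
                      if not covers(h, x)
                      and any(more_general_or_equal(h, s) for s in S)}
            G = {g for g in G
                 if not any(g != t and more_general_or_equal(t, g) for t in G)}
    return S, G

domains = [["Sunny", "Cloudy", "Rainy"], ["Warm", "Cold"], ["Normal", "High"],
           ["Strong", "Weak"], ["Warm", "Cool"], ["Same", "Change"]]
D = [(("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),
     (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),
     (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
     (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True)]
S, G = candidate_elimination(D, domains)
print(S)   # {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
print(G)   # {('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')} (set order may vary)
```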

13 Some Notes on the Candidate-Elimination Algorithm
Positive examples make S increasingly general; negative examples make G increasingly specific.
The Candidate-Elimination algorithm converges toward the hypothesis that correctly describes the target concept, provided that:
  there are no errors in the training examples;
  there is some hypothesis in H that correctly describes the target concept.
The target concept is exactly learned when the S and G boundary sets converge to a single, identical hypothesis.
Under the above assumptions, new training data can be used to resolve the remaining ambiguity.
The algorithm breaks down if the data are noisy (inconsistent) or if the target concept is not representable in H, e.g. when it is a disjunction of attribute values. Given sufficient training data, such inconsistency is eventually detected: S and G converge to an empty version space.

14 A Biased Hypothesis Space
Bias: each h ∈ H is a conjunction of attribute values, so H is unable to represent disjunctive concepts such as Sky=Sunny ∨ Sky=Cloudy.
The most specific hypothesis consistent with examples 1 and 2 that is representable in H is (?, Warm, Normal, Strong, Cool, Change). But it is too general: it also covers example 3.

Example  Sky     AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny   Warm     Normal    Strong  Cool   Change    Yes
2        Cloudy  Warm     Normal    Strong  Cool   Change    Yes
3        Rainy   Warm     Normal    Strong  Cool   Change    No

15 Unbiased Learner
Idea: choose an H that can express every teachable concept, i.e. H is the power set of X; allow disjunction and negation. For our example this gives 2^96 possible hypotheses.
What are G and S? S becomes the disjunction of the positive examples; G becomes the negated disjunction of the negative examples. Then only the training examples themselves are unambiguously classified: the algorithm cannot generalize!

16 Inductive Bias
Let L be a concept learning algorithm, X a set of instances, c the target concept, Dc = {<x, c(x)>} the set of training examples, and L(xi, Dc) the classification assigned to instance xi by L after training on Dc.
The inductive bias of L is any minimal set of assertions B such that for any target concept c and corresponding training examples Dc:
(∀xi ∈ X) [ (B ∧ Dc ∧ xi) ⊢ L(xi, Dc) ]
Inductive bias of the Candidate-Elimination algorithm: the target concept c is contained in the given hypothesis space H.

17 Summary Points
Concept learning as search through H
Partial ordering of H
Version space candidate-elimination algorithm
S and G characterize the learner's uncertainty
Inductive leaps are possible only if the learner is biased

