Computational Learning Theory Part 1: Preliminaries 1
2 Much of human learning involves acquiring general concepts from specific training examples (this is called inductive learning). Example: the concept of a ball, learned from instances such as (red, round, small), (green, round, small) and (red, round, medium). More complicated concepts: “situations in which I should study more to pass the exam” VERSION SPACE Concept Learning by Induction
3 Each concept can be thought of as a Boolean-valued function whose value is true for some inputs and false for all the rest (e.g. a function defined over all animals, whose value is true for birds and false for all other animals). This lecture is about the problem of automatically inferring the general definition of some concept, given examples labeled as members or nonmembers of the concept. This task is called concept learning, or approximating (inferring) a Boolean-valued function from examples VERSION SPACE Concept Learning by Induction
4 Target Concept to be learnt: “Days on which Aldo enjoys his favorite water sport” Training Examples present are: VERSION SPACE Concept Learning by Induction
5 The training examples are described by the values of seven attributes: six descriptive attributes plus the target attribute EnjoySport. The task is to learn to predict the value of EnjoySport for an arbitrary day, based on the values of its other attributes VERSION SPACE Concept Learning by Induction
6 The possible concepts are called hypotheses, and we need an appropriate representation for them. Let each hypothesis be a conjunction of constraints on the attribute values VERSION SPACE Concept Learning by Induction: Hypothesis Representation
7 If sky = sunny and temp = warm and humidity = ? and wind = strong and water = ? and forecast = same, then EnjoySport = Yes, else EnjoySport = No. Alternatively, this can be written as: {sunny, warm, ?, strong, ?, same} VERSION SPACE Concept Learning by Induction: Hypothesis Representation
8 For each attribute, the hypothesis will have one of three constraints: “?” (any value is acceptable), a single specified value such as warm (only that value is acceptable), or “∅” (no value is acceptable) VERSION SPACE Concept Learning by Induction: Hypothesis Representation
9 If some instance (example/observation) satisfies all the constraints of a hypothesis, then it is classified as positive (belonging to the concept). The most general hypothesis is {?, ?, ?, ?, ?, ?}; it would classify every example as positive. The most specific hypothesis is {∅, ∅, ∅, ∅, ∅, ∅}; it would classify every example as negative VERSION SPACE Concept Learning by Induction: Hypothesis Representation
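The classification rule on this slide can be sketched in a few lines of Python. This is only an illustration: the names `ANY`, `EMPTY` and `classify` are made up here, and using `None` to stand for the ∅ constraint is an assumption of the sketch, not part of the slides.

```python
ANY = "?"      # "any value is acceptable"
EMPTY = None   # stand-in (assumption) for the "no value is acceptable" constraint

def classify(hypothesis, instance):
    """Return True (positive) if the instance satisfies every constraint."""
    # An EMPTY constraint never equals ANY or a real attribute value,
    # so a hypothesis containing it rejects every instance.
    return all(c == ANY or c == v for c, v in zip(hypothesis, instance))

h = ("sunny", "warm", ANY, "strong", ANY, "same")
day = ("sunny", "warm", "high", "strong", "cool", "same")

MOST_GENERAL = (ANY,) * 6
MOST_SPECIFIC = (EMPTY,) * 6

print(classify(h, day))              # True
print(classify(MOST_GENERAL, day))   # True
print(classify(MOST_SPECIFIC, day))  # False
```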
10 An alternative hypothesis representation could have been a disjunction of several conjunctions of constraints on the attribute values. Example: {sunny, warm, normal, strong, warm, same} ∨ {sunny, warm, high, strong, warm, same} ∨ {sunny, warm, high, strong, cool, change} VERSION SPACE Concept Learning by Induction: Hypothesis Representation
11 Another alternative hypothesis representation could have been a conjunction of constraints on the attribute values, where each constraint may be a disjunction of values. Example: {sunny, warm, normal ∨ high, strong, warm ∨ cool, same ∨ change} VERSION SPACE Concept Learning by Induction: Hypothesis Representation
12 Yet another alternate hypothesis representation could have incorporated negations Example: { sunny, warm, (normal high), ?, ?, ?} VERSION SPACE Concept Learning by Induction: Hypothesis Representation
13 By selecting a hypothesis representation, the space of all hypotheses (that the program can ever represent and therefore can ever learn) is implicitly defined. In our example, the instance space X contains 3 × 2 × 2 × 2 × 2 × 2 = 96 distinct instances. There are 5 × 4 × 4 × 4 × 4 × 4 = 5120 syntactically distinct hypotheses. Since every hypothesis containing even one ∅ classifies every instance as negative, the number of semantically distinct hypotheses is 1 + (4 × 3 × 3 × 3 × 3 × 3) = 973 VERSION SPACE Concept Learning by Induction: Hypothesis Representation
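The three counts can be checked with a short calculation. The sketch assumes one attribute with three possible values and five attributes with two values each, which is what the totals 96, 5120 and 973 imply; the variable names are illustrative.

```python
from math import prod

# Number of possible values per attribute (assumed: 3 for sky, 2 for the rest).
values_per_attribute = [3, 2, 2, 2, 2, 2]

# Every instance picks one concrete value per attribute.
instances = prod(values_per_attribute)

# Syntactically, each constraint is a concrete value, "?" or the empty constraint.
syntactic = prod(n + 2 for n in values_per_attribute)

# Semantically, all hypotheses containing an empty constraint collapse into one
# (they all reject everything); the rest use a concrete value or "?".
semantic = 1 + prod(n + 1 for n in values_per_attribute)

print(instances, syntactic, semantic)  # 96 5120 973
```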
14 Most practical learning tasks involve much larger, sometimes infinite, hypothesis spaces VERSION SPACE Concept Learning by Induction: Hypothesis Representation
15 Concept learning can be viewed as the task of searching through a large space of hypotheses implicitly defined by the hypothesis representation. The goal of this search is to find the hypothesis that best fits the training examples VERSION SPACE Concept Learning by Induction: Search in Hypotheses Space
16 Once a hypothesis that best fits the training examples is found, we can use it to predict the class label of new examples. The basic assumption while using this hypothesis is: any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples VERSION SPACE Concept Learning by Induction: Basic Assumption
17 If we view learning as a search problem, then it is natural that our study of learning algorithms will examine different strategies for searching the hypothesis space. Many algorithms for concept learning organize the search through the hypothesis space by relying on a general-to-specific ordering of hypotheses VERSION SPACE Concept Learning by Induction: General to Specific Ordering
18 Example: consider h1 = {sunny, ?, ?, strong, ?, ?} and h2 = {sunny, ?, ?, ?, ?, ?}. Any instance classified positive by h1 will also be classified positive by h2 (because h2 imposes fewer constraints on the instance). Hence h2 is more general than h1, and h1 is more specific than h2 VERSION SPACE Concept Learning by Induction: General to Specific Ordering
19 Consider the three hypotheses h1, h2 and h3. Neither h1 nor h3 is more general than the other; h2 is more general than both h1 and h3 VERSION SPACE Concept Learning by Induction: General to Specific Ordering
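For conjunctive hypotheses, the more-general-than-or-equal-to relation can be tested constraint by constraint. The sketch below is one possible formulation; the function name is made up, `None` stands in for ∅, h1 and h2 are the hypotheses from the earlier example, and h3 is a hypothetical third hypothesis chosen only so that h1 and h3 are incomparable (the slide's own h3 is not given in the text).

```python
ANY = "?"

def more_general_or_equal(h_a, h_b):
    """True if every instance classified positive by h_b is also positive under h_a."""
    if None in h_b:      # h_b contains the empty constraint: it covers nothing
        return True
    # Each constraint of h_a must be "?" or identical to the constraint of h_b.
    return all(a == ANY or a == b for a, b in zip(h_a, h_b))

h1 = ("sunny", ANY, ANY, "strong", ANY, ANY)
h2 = ("sunny", ANY, ANY, ANY, ANY, ANY)
h3 = ("sunny", ANY, ANY, ANY, "cool", ANY)   # hypothetical, for illustration only

print(more_general_or_equal(h2, h1))  # True
print(more_general_or_equal(h1, h2))  # False
print(more_general_or_equal(h1, h3))  # False
print(more_general_or_equal(h3, h1))  # False
```

The relation is only a partial order: some pairs, such as h1 and h3 here, cannot be compared in either direction.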
20 How do we find a hypothesis consistent with the observed training examples? A hypothesis is consistent with the training examples if it correctly classifies these examples. One way is to begin with the most specific possible hypothesis, then generalize it each time it fails to cover a positive training example (i.e. classifies it as negative). The algorithm based on this method is called Find-S VERSION SPACE Find-S Algorithm
21 We say that a hypothesis covers a positive training example if it correctly classifies the example as positive. A positive training example is an example of the concept to be learnt. Similarly, a negative training example is not an example of the concept VERSION SPACE Find-S Algorithm
24 The nodes shown in the diagram are the possible hypotheses allowed by our hypothesis representation scheme. Note that our search is guided by the positive examples, and we consider only those hypotheses which are consistent with the positive training examples. The search moves from hypothesis to hypothesis, from the most specific to progressively more general hypotheses VERSION SPACE Find-S Algorithm
25 At each step, the hypothesis is generalized only as far as necessary to cover the new positive example. Therefore, at each stage the hypothesis is the most specific hypothesis consistent with the training examples observed up to this point. Hence, it is called Find-S VERSION SPACE Find-S Algorithm
26 Note that the algorithm simply ignores every negative example. This is safe only under two conditions: the data must be noise free, and the hypothesis representation must be able to describe the true concept. Under these conditions the current hypothesis, being the most specific hypothesis consistent with the positive examples, will never cover (falsely classify as positive) any negative example, so it remains consistent with every negative training example VERSION SPACE Find-S Algorithm
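A minimal sketch of Find-S for the conjunctive representation follows. The training set is an assumption: the slide's table of examples is not reproduced in the text, so the four EnjoySport examples from Mitchell's textbook are used here (the three positive ones agree with the hypotheses shown on the later slides). The names `find_s` and `EXAMPLES` are illustrative.

```python
ANY = "?"

# Assumed training data (sky, temp, humidity, wind, water, forecast), label.
EXAMPLES = [
    (("sunny", "warm", "normal", "strong", "warm", "same"), True),
    (("sunny", "warm", "high", "strong", "warm", "same"), True),
    (("rainy", "cold", "high", "strong", "warm", "change"), False),
    (("sunny", "warm", "high", "strong", "cool", "change"), True),
]

def find_s(examples, n_attributes):
    """Return the most specific conjunctive hypothesis covering all positives."""
    h = [None] * n_attributes          # None stands in for the empty constraint
    for instance, is_positive in examples:
        if not is_positive:
            continue                   # Find-S ignores negative examples
        for i, value in enumerate(instance):
            if h[i] is None:
                h[i] = value           # first positive example: copy its values
            elif h[i] != value:
                h[i] = ANY             # conflicting value: generalize to "?"
    return tuple(h)

print(find_s(EXAMPLES, 6))
# ('sunny', 'warm', '?', 'strong', '?', '?')
```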
27 The Version Space is the set of hypotheses consistent with the training examples of a problem. The Find-S algorithm finds one hypothesis in the Version Space, but there may be others VERSION SPACE Definition: Version Space
29 The List-then-Eliminate algorithm first initializes the version space to contain all possible hypotheses, then eliminates any hypothesis found inconsistent with any training example. The version space of candidate hypotheses thus shrinks as more examples are observed, until ideally just one hypothesis remains that is consistent with all the observed examples VERSION SPACE List-then-Eliminate Algorithm
30 For the EnjoySport data we can list 973 possible hypotheses. Then we can test each hypothesis to see whether it is consistent with our training data set or not VERSION SPACE List-then-Eliminate Algorithm
31 For these data we are left with the following hypotheses: h1 = {Sunny, Warm, ?, Strong, ?, ?} h2 = {Sunny, ?, ?, Strong, ?, ?} h3 = {Sunny, Warm, ?, ?, ?, ?} h4 = {?, Warm, ?, Strong, ?, ?} h5 = {Sunny, ?, ?, ?, ?, ?} h6 = {?, Warm, ?, ?, ?, ?}. Note that the Find-S algorithm is able to find only h1 VERSION SPACE List-then-Eliminate Algorithm
32 If insufficient data is available to narrow the version space to a single hypothesis, then the algorithm can output the entire set of hypotheses consistent with the observed data. It has the advantage that it is guaranteed to output all the hypotheses consistent with the training data. Unfortunately, it requires an exhaustive listing of all hypotheses, which is an unrealistic requirement for practical problems VERSION SPACE List-then-Eliminate Algorithm
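Because the EnjoySport hypothesis space is tiny, List-then-Eliminate can be run by brute force. The sketch below makes two assumptions that are not stated on the slides: the attribute domains (in particular the values "cold" and "weak") and the negative third training example are taken from Mitchell's textbook version of the task. All function and variable names are illustrative.

```python
from itertools import product

ANY = "?"
# Assumed attribute domains: sky, temp, humidity, wind, water, forecast.
DOMAINS = [("sunny", "cloudy", "rainy"), ("warm", "cold"), ("normal", "high"),
           ("strong", "weak"), ("warm", "cool"), ("same", "change")]

# Assumed training data (the slide's table is not reproduced in the text).
EXAMPLES = [
    (("sunny", "warm", "normal", "strong", "warm", "same"), True),
    (("sunny", "warm", "high", "strong", "warm", "same"), True),
    (("rainy", "cold", "high", "strong", "warm", "change"), False),
    (("sunny", "warm", "high", "strong", "cool", "change"), True),
]

def matches(h, x):
    """True if instance x satisfies every constraint of hypothesis h."""
    return all(c == ANY or c == v for c, v in zip(h, x))

def all_hypotheses(domains):
    """List the semantically distinct conjunctive hypotheses."""
    hypotheses = list(product(*[(ANY,) + d for d in domains]))
    hypotheses.append((None,) * len(domains))  # the single reject-everything hypothesis
    return hypotheses

def list_then_eliminate(examples, domains):
    """Keep every hypothesis that classifies all training examples correctly."""
    return [h for h in all_hypotheses(domains)
            if all(matches(h, x) == label for x, label in examples)]

version_space = list_then_eliminate(EXAMPLES, DOMAINS)
print(len(all_hypotheses(DOMAINS)))  # 973
print(len(version_space))            # 6
```

Under these assumptions the six surviving hypotheses are exactly h1 to h6 listed above.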
33 Instead of listing all the possible members of the version space, the Candidate Elimination algorithm employs a much more compact representation. The version space is represented by its most general (maximally general) and most specific (maximally specific) members. These members form the general and specific boundary sets that delimit the version space. Every other member of the version space lies between these boundaries VERSION SPACE Candidate Elimination Algorithm
35 It begins by initializing the version space to the set of all hypotheses, by initializing the G and S boundary sets as {?, ?, …, ?, ?} and {∅, ∅, …, ∅, ∅} respectively. As each training example is considered, the S boundary is generalized and the G boundary is specialized, to eliminate from the version space any hypotheses found inconsistent with the new training example VERSION SPACE Candidate Elimination Algorithm
39 S0 = {∅, ∅, ∅, ∅, ∅, ∅} G0 = {?, ?, ?, ?, ?, ?} VERSION SPACE Candidate Elimination Algorithm: Example
40 S0 = {∅, ∅, ∅, ∅, ∅, ∅} S1 = {sunny, warm, normal, strong, warm, same} G0 = G1 = {?, ?, ?, ?, ?, ?} VERSION SPACE Candidate Elimination Algorithm: Example
41 S0 = {∅, ∅, ∅, ∅, ∅, ∅} S1 = {sunny, warm, normal, strong, warm, same} S2 = {sunny, warm, ?, strong, warm, same} G0 = G1 = G2 = {?, ?, ?, ?, ?, ?} VERSION SPACE Candidate Elimination Algorithm: Example
42 S0 = {∅, ∅, ∅, ∅, ∅, ∅} S1 = {sunny, warm, normal, strong, warm, same} S2 = S3 = {sunny, warm, ?, strong, warm, same} G3 = {sunny, ?, ?, ?, ?, ?}, {?, warm, ?, ?, ?, ?}, {?, ?, ?, ?, ?, same} G0 = G1 = G2 = {?, ?, ?, ?, ?, ?} VERSION SPACE Candidate Elimination Algorithm: Example
43 S0 = {∅, ∅, ∅, ∅, ∅, ∅} S1 = {sunny, warm, normal, strong, warm, same} S2 = S3 = {sunny, warm, ?, strong, warm, same} S4 = {sunny, warm, ?, strong, ?, ?} G4 = {sunny, ?, ?, ?, ?, ?}, {?, warm, ?, ?, ?, ?} G3 = {sunny, ?, ?, ?, ?, ?}, {?, warm, ?, ?, ?, ?}, {?, ?, ?, ?, ?, same} G0 = G1 = G2 = {?, ?, ?, ?, ?, ?} VERSION SPACE Candidate Elimination Algorithm: Example
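The boundary updates traced on these slides can be reproduced with a compact sketch of Candidate Elimination for conjunctive hypotheses. It is a simplification under stated assumptions, not a general implementation: the attribute domains and the negative third example are taken from Mitchell's textbook task (they do not appear in the slide text), `None` stands in for ∅, and S is not pruned for minimality because with this representation it never holds more than one hypothesis.

```python
ANY = "?"
DOMAINS = [("sunny", "cloudy", "rainy"), ("warm", "cold"), ("normal", "high"),
           ("strong", "weak"), ("warm", "cool"), ("same", "change")]  # assumed
EXAMPLES = [  # assumed training data
    (("sunny", "warm", "normal", "strong", "warm", "same"), True),
    (("sunny", "warm", "high", "strong", "warm", "same"), True),
    (("rainy", "cold", "high", "strong", "warm", "change"), False),
    (("sunny", "warm", "high", "strong", "cool", "change"), True),
]

def matches(h, x):
    return all(c == ANY or c == v for c, v in zip(h, x))

def more_general_or_equal(a, b):
    if None in b:                      # b covers no instance at all
        return True
    return all(ca == ANY or ca == cb for ca, cb in zip(a, b))

def generalize(s, x):
    """Minimal generalization of s that covers the positive instance x."""
    return tuple(v if c is None else (c if c == v else ANY) for c, v in zip(s, x))

def specialize(g, x, domains):
    """Minimal specializations of g that exclude the negative instance x."""
    return [g[:i] + (value,) + g[i + 1:]
            for i, c in enumerate(g) if c == ANY
            for value in domains[i] if value != x[i]]

def candidate_elimination(examples, domains):
    S = {(None,) * len(domains)}
    G = {(ANY,) * len(domains)}
    for x, positive in examples:
        if positive:
            G = {g for g in G if matches(g, x)}
            S = {generalize(s, x) for s in S}
            S = {s for s in S if any(more_general_or_equal(g, s) for g in G)}
        else:
            S = {s for s in S if not matches(s, x)}
            candidates = set()
            for g in G:
                if not matches(g, x):
                    candidates.add(g)          # already excludes x
                    continue
                candidates.update(h for h in specialize(g, x, domains)
                                  if any(more_general_or_equal(h, s) for s in S))
            # Keep only the maximally general candidates.
            G = {g for g in candidates
                 if not any(o != g and more_general_or_equal(o, g) for o in candidates)}
    return S, G

S, G = candidate_elimination(EXAMPLES, DOMAINS)
print(S)  # {('sunny', 'warm', '?', 'strong', '?', '?')}
print(G)  # the two hypotheses of G4 (set order may vary)
```

With the assumed data, the final boundaries agree with S4 and G4 above.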
45 Suppose the 2nd example is wrongly presented as negative: S0 = {∅, ∅, ∅, ∅, ∅, ∅} S1 = S2 = {sunny, warm, normal, strong, warm, same} G0 = G1 = {?, ?, ?, ?, ?, ?} G2 = {?, ?, normal, ?, ?, ?} VERSION SPACE Candidate Elimination Algorithm: Noisy Data
46 When the 4th example arrives, the only remaining general hypothesis is eliminated, leaving the G boundary empty: G0 = G1 = {?, ?, ?, ?, ?, ?} G2 = G3 = {?, ?, normal, ?, ?, ?} VERSION SPACE Candidate Elimination Algorithm: Noisy Data
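The effect of the mislabeled example can be checked by brute force: with the 2nd example marked negative, no conjunctive hypothesis is consistent with all four examples, so the version space is empty. As before, the attribute domains and the third example are assumptions taken from Mitchell's textbook task, and the helper names are illustrative.

```python
from itertools import product

ANY = "?"
DOMAINS = [("sunny", "cloudy", "rainy"), ("warm", "cold"), ("normal", "high"),
           ("strong", "weak"), ("warm", "cool"), ("same", "change")]  # assumed

NOISY_EXAMPLES = [
    (("sunny", "warm", "normal", "strong", "warm", "same"), True),
    (("sunny", "warm", "high", "strong", "warm", "same"), False),   # wrong label
    (("rainy", "cold", "high", "strong", "warm", "change"), False),
    (("sunny", "warm", "high", "strong", "cool", "change"), True),
]

def matches(h, x):
    return all(c == ANY or c == v for c, v in zip(h, x))

def consistent_hypotheses(examples, domains):
    """Enumerate all conjunctive hypotheses and keep the consistent ones."""
    candidates = list(product(*[(ANY,) + d for d in domains]))
    candidates.append((None,) * len(domains))   # the reject-everything hypothesis
    return [h for h in candidates
            if all(matches(h, x) == label for x, label in examples)]

noisy_version_space = consistent_hypotheses(NOISY_EXAMPLES, DOMAINS)
print(noisy_version_space)  # []
```

The reason is visible in the data: any hypothesis covering the 1st and 4th examples must leave humidity, water and forecast unconstrained, and every such hypothesis also covers the mislabeled 2nd example.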
47 The algorithm will fail if the target concept cannot be described in the hypothesis representation. Our hypothesis representation: conjunction of attribute values. Examples: (sunny, warm, normal, strong, cool, change) → Yes; (cloudy, warm, normal, strong, cool, change) → Yes; (rainy, warm, normal, strong, cool, change) → No VERSION SPACE Candidate Elimination Algorithm: Is target concept present in the hypothesis representation?
48 For these three examples, our representation is unable to represent disjunctive target concepts such as: sky = sunny or sky = cloudy VERSION SPACE Candidate Elimination Algorithm: Is target concept present in the hypothesis representation?
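One way to see the failure is to compute the most specific conjunctive hypothesis that covers the two positive examples and observe that it already covers the negative one. The sketch below does this with an illustrative `generalize` helper (`None` again stands in for ∅); it is a demonstration on the three examples above, not a general proof procedure.

```python
ANY = "?"

def matches(h, x):
    return all(c == ANY or c == v for c, v in zip(h, x))

def generalize(s, x):
    """Minimal generalization of s that covers the positive instance x."""
    return tuple(v if c is None else (c if c == v else ANY) for c, v in zip(s, x))

positives = [
    ("sunny", "warm", "normal", "strong", "cool", "change"),
    ("cloudy", "warm", "normal", "strong", "cool", "change"),
]
negative = ("rainy", "warm", "normal", "strong", "cool", "change")

s = (None,) * 6
for x in positives:
    s = generalize(s, x)

print(s)                     # ('?', 'warm', 'normal', 'strong', 'cool', 'change')
print(matches(s, negative))  # True
```

Since even the most specific hypothesis covering both positives accepts the negative example, every more general conjunctive hypothesis does too, so no hypothesis in this representation is consistent with the data.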
49 The obvious solution to the problem is to provide a hypothesis space capable of representing every teachable concept (by allowing arbitrary disjunctions, conjunctions and negations of attributes and hypotheses to form new hypotheses). However, this raises the problem that the algorithm is not able to generalize beyond the training instances. Example: let there be three positive instances x1, x2 and x3, and two negative instances x4 and x5 VERSION SPACE Candidate Elimination Algorithm: Is target concept present in the hypothesis representation?
50 Sections – 2.6 of T. Mitchell VERSION SPACE Reference