1
Computational Learning Theory: Introduction; The PAC Learning Framework; Finite Hypothesis Spaces; Examples of PAC Learnable Concepts
2
Sample Complexity for Infinite Hypothesis Spaces. In the case of finite hypothesis spaces we came up with a bound on the number of examples necessary to guarantee PAC learning, based on the size of the hypothesis space |H|: m >= (1/ε)(ln|H| + ln(1/δ)). What happens if the size of the hypothesis space is infinite? We now define another measure of the complexity of the hypothesis space, called the Vapnik-Chervonenkis dimension, or VC(H).
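As a minimal sketch (not part of the slides), the bound above can be evaluated numerically; the function name and the example values below are illustrative.

```python
import math

def sample_complexity_finite(h_size, epsilon, delta):
    """Number of examples m >= (1/epsilon) * (ln|H| + ln(1/delta)) that
    suffices for PAC learning with a finite hypothesis space of size |H|."""
    return math.ceil((1.0 / epsilon) * (math.log(h_size) + math.log(1.0 / delta)))

# Example: conjunctions of literals over n = 10 Boolean attributes give
# roughly |H| = 3^10 hypotheses; ask for epsilon = delta = 0.05.
print(sample_complexity_finite(3 ** 10, epsilon=0.05, delta=0.05))
```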
3
The Vapnik-Chervonenkis Dimension. Can we tell how expressive a hypothesis space is? Consider the following three examples and the possible hypotheses over such a sample: total hypotheses = 8.
4
The Vapnik-Chervonenkis Dimension. If a set of examples can be partitioned in all possible ways, we say the hypothesis space H shatters that set of examples. If we have a set of s examples, we need all possible 2^s hypotheses to shatter the set. Definition: the VC dimension, VC(H), is the size of the largest finite subset of examples in the input space X shattered by H (it can be infinite).
5
Example: Intervals in R. Consider the input space to be the set of real numbers R. Our hypothesis space H is the set of intervals. As an illustration: we measure the speed of several racing cars and want to distinguish between the speed of race car type A and race car type B. Suppose we have recorded two speeds, x1 and x2. Can H shatter these two examples?
6
Example: Intervals in R. We need 2^2 = 4 hypotheses; they are all shown above. So the VC dimension is at least two. What if we have three speeds x1 < x2 < x3? We cannot represent the hypothesis that covers x1 and x3 but not x2, so VC(H) = 2.
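The shattering argument for intervals can be checked by brute force. The sketch below (not from the slides; the function names are illustrative) tries every labeling of the points and asks whether some interval realizes it: the tightest interval around the positive points works exactly when it contains no negative point.

```python
from itertools import product

def interval_can_realize(points, labels):
    """True if some interval [a, b] contains exactly the points labeled 1."""
    positives = [x for x, y in zip(points, labels) if y == 1]
    negatives = [x for x, y in zip(points, labels) if y == 0]
    if not positives:
        return True  # an empty interval realizes the all-negative labeling
    a, b = min(positives), max(positives)
    return all(not (a <= x <= b) for x in negatives)

def shattered_by_intervals(points):
    """True if intervals realize all 2^s labelings of the given points."""
    return all(interval_can_realize(points, labels)
               for labels in product([0, 1], repeat=len(points)))

print(shattered_by_intervals([1.0, 2.0]))       # True: two speeds can be shattered
print(shattered_by_intervals([1.0, 2.0, 3.0]))  # False: +,-,+ is not realizable, so VC(H) = 2
```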
7
Example: Points in the Plane. Now consider the input space to be the set of points in R^2. Our hypothesis space H is the set of lines. (Figure: three points x1, x2, x3 and some of the 8 possible hypotheses.)
8
Example: Points in the Plane. Can H shatter these three examples? Yes, as long as the points are not collinear. In general, the VC dimension of the space of hyperplanes in r dimensions is r + 1.
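The three-point claim can also be verified in code. In the sketch below (an illustration, not part of the slides), any labeling of three non-collinear points is linearly separable, so a plain perceptron, used here only as a separator finder, succeeds on each of the 2^3 = 8 labelings.

```python
import itertools
import numpy as np

def linearly_separable(points, labels, max_epochs=1000):
    """Try to find a separating line w.x + b = 0 with a simple perceptron;
    return True if some epoch finishes with no misclassified point."""
    X = np.array(points, dtype=float)
    y = np.array([1 if l == 1 else -1 for l in labels])
    w, b = np.zeros(2), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:   # misclassified (or on the boundary)
                w += yi * xi
                b += yi
                mistakes += 1
        if mistakes == 0:
            return True
    return False

# Three non-collinear points: all 8 labelings are realizable by a line.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(all(linearly_separable(pts, labels)
          for labels in itertools.product([0, 1], repeat=3)))  # True
```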
9
The Vapnik-Chervonenkis Dimension. Properties of the VC dimension: the VC dimension is at most log2 |H|. Why? Suppose that VC(H) = d. This means we need 2^d different hypotheses to shatter the d examples. Therefore 2^d <= |H| and d = VC(H) <= log2 |H|.
10
Sample Complexity and the VC Dimension. How many examples do we need to decide that a class of concepts C is PAC learnable? If we use the VC dimension for sample complexity, there is a proof showing that m >= (1/ε)(4 log2(2/δ) + 8 VC(H) log2(13/ε)). So we can use the VC dimension to prove that a class of concepts is PAC learnable.
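As before, this bound is easy to evaluate numerically; the sketch below (illustrative, not from the slides) plugs in a VC dimension and the accuracy/confidence parameters.

```python
import math

def sample_complexity_vc(vc_dim, epsilon, delta):
    """m >= (1/epsilon) * (4*log2(2/delta) + 8*VC(H)*log2(13/epsilon))."""
    return math.ceil((1.0 / epsilon) *
                     (4 * math.log2(2.0 / delta)
                      + 8 * vc_dim * math.log2(13.0 / epsilon)))

# Example: lines in the plane (hyperplanes in r = 2 dimensions) have VC dimension 3.
print(sample_complexity_vc(vc_dim=3, epsilon=0.1, delta=0.05))
```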
11
Mistake Bound Model. We try to answer the question: how many mistakes do we make before converging to a correct hypothesis? We receive a sequence of learning examples: e1, e2, e3, ..., en.
12
Mistake Bound Model. For each example we use our current hypothesis to predict the class of that example. How many mistakes do we make before finding the "right" hypothesis? (Right means a hypothesis identical to the true concept.)
13
Example: Conjunctions of Literals. As an illustration, consider the problem of learning conjunctions of literals over n Boolean attributes. Assume again the Find-S algorithm, which outputs the most specific hypothesis consistent with the training data.
14
Example: Conjunctions of Literals. Find-S:
1. Initialize h to the most specific hypothesis l1 ^ ~l1 ^ ... ^ ln ^ ~ln
2. For each positive training example X, remove from h any literal not satisfied by X
3. Output hypothesis h
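A minimal Python sketch of Find-S for conjunctions of literals (representing a hypothesis as a set of (attribute, value) literals is an assumption made for this sketch, not something prescribed by the slides). The trailing loop counts mistakes in an online run, anticipating the bound discussed on the next slide.

```python
def find_s(examples, n):
    """Find-S over n Boolean attributes. Each example is (x, y) with x a tuple
    of n Booleans. A hypothesis h is a set of literals (i, v), meaning
    'attribute i must equal v'; x satisfies h iff it satisfies every literal."""
    # Step 1: most specific hypothesis l1 ^ ~l1 ^ ... ^ ln ^ ~ln (all 2n literals).
    h = {(i, v) for i in range(n) for v in (True, False)}
    for x, y in examples:
        if y:  # Step 2: only positive examples generalize the hypothesis
            h = {(i, v) for (i, v) in h if x[i] == v}
    return h  # Step 3

def predict(h, x):
    return all(x[i] == v for (i, v) in h)

# Online run: count the mistakes made while processing the stream.
examples = [((True, False, True), True), ((True, True, True), True),
            ((False, True, True), False)]
h = {(i, v) for i in range(3) for v in (True, False)}
mistakes = 0
for x, y in examples:
    if predict(h, x) != y:
        mistakes += 1
    if y:
        h = {(i, v) for (i, v) in h if x[i] == v}
print(sorted(h), mistakes)
```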
15
Example: Conjunctions of Literals. How many mistakes can we make, assuming the target concept is a conjunction of literals? Notice that with the first mistake we eliminate n of the 2n literals. With every subsequent mistake we eliminate at least one literal. Therefore, the maximum number of mistakes we can make is n + 1.
16
Example: Halving Algorithm. What is the Halving algorithm? Imagine the candidate elimination algorithm that keeps in the version space the set of all consistent hypotheses. Assume the hypothesis output for every example is a majority vote over all hypotheses in the version space. For the two-class problem: Class(x) = + if more than half of the hypotheses vote for +; Class(x) = - otherwise.
17
Example: Halving Algorithm. How many mistakes (upper bound) can we make before converging to the right hypothesis? Every time we make a mistake, at least half of the hypotheses are wrong, so we need at most log2 |H| mistakes to converge to the right hypothesis. (Figure: the version space split into hypotheses voting for + and hypotheses voting for -.)
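A minimal sketch of the Halving algorithm (the tiny threshold hypothesis class used for illustration is an assumption of this sketch, not from the slides): predict by majority vote, then drop every hypothesis that disagrees with the revealed label. Each mistake removes at least half of the version space, so at most log2 |H| mistakes are possible.

```python
import math

def halving_run(hypotheses, examples):
    """Run the Halving algorithm over a stream of (x, y) pairs and return the
    number of mistakes; each hypothesis is a callable x -> bool."""
    version_space = list(hypotheses)
    mistakes = 0
    for x, y in examples:
        votes = sum(1 for h in version_space if h(x))
        prediction = votes > len(version_space) / 2   # '+' only on a strict majority
        if prediction != y:
            mistakes += 1
        version_space = [h for h in version_space if h(x) == y]
    return mistakes

# Illustration: threshold hypotheses over the integers 0..7, |H| = 9.
H = [lambda x, t=t: x >= t for t in range(9)]
target = lambda x: x >= 5
stream = [(x, target(x)) for x in [0, 7, 3, 6, 4, 5, 2, 1]]
print(halving_run(H, stream), "mistakes; the bound is log2(9) =", round(math.log2(9), 2))
```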