1 Computational Learning Theory: PAC, IID, VC Dimension, SVM
Artificial Intelligence (Kunstmatige Intelligentie) / RuG, KI2 - 5
Marius Bulacu & prof. dr. Lambert Schomaker
2 Learning
Learning is essential for unknown environments, i.e., when the designer lacks omniscience.
Learning is useful as a system construction method, i.e., expose the agent to reality rather than trying to write it down.
Learning modifies the agent's decision mechanisms to improve performance.
3 Learning Agents
4 Learning Element
Design of a learning element is affected by:
– which components of the performance element are to be learned
– what feedback is available to learn these components
– what representation is used for the components
Type of feedback:
– supervised learning: correct answers for each example
– unsupervised learning: correct answers not given
– reinforcement learning: occasional rewards
5 Inductive Learning
Simplest form: learn a function from examples.
– f is the target function
– an example is a pair (x, f(x))
Problem: find a hypothesis h such that h ≈ f, given a training set of examples.
This is a highly simplified model of real learning:
– it ignores prior knowledge
– it assumes the examples are given
6-10 Inductive Learning Method
Construct/adjust h to agree with f on the training set (h is consistent if it agrees with f on all examples). E.g., curve fitting (slides 6-10 are figure slides showing successive curve fits).
11 Inductive Learning Method
Construct/adjust h to agree with f on the training set. E.g., curve fitting.
Occam's razor: prefer the simplest hypothesis consistent with the data.
12 Occam's Razor
William of Occam (1285-1349, England): "If two theories explain the facts equally well, then the simpler theory is to be preferred."
Rationale:
– There are fewer short hypotheses than long hypotheses.
– A short hypothesis that fits the data is unlikely to be a coincidence.
– A long hypothesis that fits the data may be a coincidence.
Formal treatment in computational learning theory.
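As an added illustration of slides 6-12 (not from the original deck), the sketch below fits a low-degree and a high-degree polynomial to the same small synthetic data set; both hypotheses agree closely with the training examples, and Occam's razor says to prefer the simpler one. The data set, noise level and polynomial degrees are arbitrary choices for the example.

```python
# Added sketch: curve fitting and Occam's razor on a synthetic, roughly linear data set.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 8)
y = 2.0 * x + 0.5 + rng.normal(scale=0.05, size=x.size)   # noisy samples of f

for degree in (1, 7):
    coeffs = np.polyfit(x, y, degree)                      # hypothesis h: a polynomial
    max_err = np.max(np.abs(np.polyval(coeffs, x) - y))    # training error of h
    print(f"degree {degree}: max training error = {max_err:.4f}")

# The degree-7 polynomial passes (almost) exactly through the points and the line fits
# them to within the noise; Occam's razor prefers the line, which also generalizes better.
```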
13 The Problem
Why does learning work? How do we know that the learned hypothesis h is close to the target function f if we do not know what f is?
The answer is provided by computational learning theory.
14 The Answer
Any hypothesis h that is consistent with a sufficiently large number of training examples is unlikely to be seriously wrong. Therefore it must be Probably Approximately Correct (PAC).
15 The Stationarity Assumption
The training and test sets are drawn randomly from the same population of examples using the same probability distribution. Therefore training and test data are Independently and Identically Distributed (IID): "the future is like the past".
16 How many examples are needed? (sample complexity)
Let ε be the probability that h and f disagree on an example, δ the acceptable probability that a wrong hypothesis consistent with all examples exists, and |H| the size of the hypothesis space. The number of examples N must satisfy
N ≥ (1/ε) (ln(1/δ) + ln |H|)
17 Formal Derivation
(figure: the hypothesis space H, containing the target f and the subset Hbad)
H: the set of all possible hypotheses. Hbad: the set of "wrong" hypotheses, i.e., those whose error exceeds ε.
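The derivation itself is not present in the scraped text; the following is a sketch of the standard PAC argument (as in Russell & Norvig), reconstructed here rather than taken from the slides.

```latex
% Reconstruction of the standard sample-complexity argument (not verbatim from the deck).
% A hypothesis h_b in H_bad has error greater than \epsilon, so it agrees with f on a
% single random example with probability at most 1-\epsilon; for N independent examples:
\[
  P(h_b \text{ is consistent with all } N \text{ examples}) \le (1-\epsilon)^N
\]
\[
  P(H_{\mathrm{bad}} \text{ contains a consistent hypothesis})
    \le |H_{\mathrm{bad}}|\,(1-\epsilon)^N \le |H|\,(1-\epsilon)^N
\]
% Requiring this to be at most \delta and using 1-\epsilon \le e^{-\epsilon}:
\[
  |H|\,e^{-\epsilon N} \le \delta
  \quad\Longrightarrow\quad
  N \ge \frac{1}{\epsilon}\left(\ln\frac{1}{\delta} + \ln|H|\right)
\]
```

For example, for the space of all Boolean functions of 4 binary attributes, |H| = 2^16, and with ε = δ = 0.1 the bound asks for N ≥ 10 (ln 10 + 16 ln 2) ≈ 134 examples.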
18 What if the hypothesis space is infinite?
We can't use our result for finite H. We need some other measure of complexity for H:
– the Vapnik-Chervonenkis (VC) dimension
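The scraped deck does not spell out the definition at this point, so the following is a hedged sketch of the standard material: the VC dimension of H is the size of the largest set of points that H can shatter (i.e., realize every possible labeling of), and it takes over the role of ln |H| in the sample-complexity bound.

```latex
% Standard definition and bound (reconstruction, exact constants deliberately omitted).
\[
  VC(H) = \max\{\, m : \text{some set of } m \text{ points is shattered by } H \,\}
\]
\[
  N = O\!\left(\frac{1}{\epsilon}\left( VC(H)\,\log\frac{1}{\epsilon} + \log\frac{1}{\delta} \right)\right)
\]
```

For example, linear separators in the plane shatter three points in general position but no set of four (the XOR configurations on the later slides are the counterexample for the four corner points), so their VC dimension is 3.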
22 Shattering two binary dimensions over a number of classes
To understand the principle of shattering sample points into classes, we will look at the simple case of two binary-valued dimensions.
23-39 (figure slides) The 2-D feature space spanned by f1, f2 ∈ {0, 1} contains four sample points. Successive slides colour the points into two classes to enumerate the possible labelings: 2 left vs 2 right, top vs bottom, right vs left, bottom vs top, the four single-point "outlier" configurations (lower-right, lower-left, upper-left, upper-right), and finally the two XOR configurations (A and B).
40 2-D feature space, two classes: 16 hypotheses
(figure: the 16 labelings, numbered 0-15, of the four cells f1 ∈ {0, 1} × f2 ∈ {0, 1})
A "hypothesis" here is a possible class partitioning of all data samples.
41 2-D feature space, two classes, 16 hypotheses
(figure: the same 16 labelings, with the two XOR configurations highlighted)
The two XOR class configurations (2/16 of the hypotheses) require a non-linear separatrix.
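The 14-vs-2 count on slides 40-41 can be checked mechanically. The sketch below (an addition, not from the deck; numpy and scipy are assumed) enumerates all 16 two-class labelings of the four corner points and tests each for strict linear separability with a small feasibility LP.

```python
# Added sketch: which of the 16 two-class labelings of the 2-D binary feature space
# admit a linear separatrix? Expected outcome: 14 do, the 2 XOR labelings do not.
from itertools import product

import numpy as np
from scipy.optimize import linprog

points = np.array([(0, 0), (0, 1), (1, 0), (1, 1)], dtype=float)

def linearly_separable(labels):
    """Feasibility LP: does some (w, b) satisfy y_i * (w . x_i + b) >= 1 for all i?"""
    y = np.where(np.array(labels) == 1, 1.0, -1.0)
    # Rewrite each constraint as  -y_i * [x_i1, x_i2, 1] . [w1, w2, b] <= -1.
    A_ub = -y[:, None] * np.hstack([points, np.ones((4, 1))])
    b_ub = -np.ones(4)
    res = linprog(c=[0.0, 0.0, 0.0], A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * 3)
    return res.success

non_separable = [labels for labels in product([0, 1], repeat=4)
                 if not linearly_separable(labels)]
print(f"{16 - len(non_separable)} separable, {len(non_separable)} not:", non_separable)
# Prints 14 separable and the two XOR configurations (0, 1, 1, 0) and (1, 0, 0, 1).
```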
42-43 (figure slides) XOR: a possible non-linear separation of the two classes in the f1-f2 plane.
44-45 2-D feature space, three classes: how many hypotheses?
(figure: the enumeration of labelings continues, now with three classes per cell)
With 4 cells and 3 classes there are 3^4 = 81 possible hypotheses.
46 Maximum, discrete space
Four classes: 4^4 = 256 hypotheses.
Assuming there are no more classes than discrete cells, the maximum number of hypotheses is
Nhyp,max = nclasses^ncells
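A quick sanity check of this counting rule (an addition, not from the deck): assigning one of nclasses labels independently to each of ncells cells gives nclasses ** ncells labelings, which reproduces the numbers quoted on slides 40-46.

```python
# Added sketch: count the possible class labelings ("hypotheses") of a discretized space.
from itertools import product

def n_hypotheses(n_cells, n_classes):
    # Each cell independently receives one of n_classes labels.
    return n_classes ** n_cells

# Cross-check by explicit enumeration for the 2x2 binary feature space (4 cells).
assert n_hypotheses(4, 2) == len(list(product(range(2), repeat=4))) == 16
assert n_hypotheses(4, 3) == 81   # three classes
assert n_hypotheses(4, 4) == 256  # four classes
```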
47 2-D feature space, three classes…
(figure: the four points in the f1-f2 plane coloured with three classes)
In this example, two of the classes are each linearly separable from the rest, but the third class is not linearly separable from the rest of the classes.
48 2-D feature space, four classes…
(figure: the four points in the f1-f2 plane, four classes)
Minsky & Papert: simple table lookup or logic will do nicely.
49 2-D feature space, four classes…
(figure: the four classes in the f1-f2 plane)
Spheres or radial-basis functions may offer a compact class encapsulation in case of limited noise and limited overlap (but in the end the data will tell: experimentation is required!).
50 SVM (1): Kernels
(figure: a complicated separation boundary in the (f1, f2) plane vs. a simple separating hyperplane in a higher-dimensional (f1, f2, f3) space)
Kernels (polynomial, radial basis, sigmoid) provide an implicit mapping to a higher-dimensional space where linear separation is possible.
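As a minimal illustration of the mapping idea behind kernels (an addition, not from the deck; a kernel SVM performs such a mapping implicitly rather than explicitly), the XOR labeling is not linearly separable in (f1, f2) but becomes separable once a third feature f3 = f1*f2 is added; the hyperplane below is one hand-picked separator in the mapped space.

```python
# Added sketch: an explicit feature map makes the XOR configuration linearly separable.
import numpy as np

X = np.array([(0, 0), (0, 1), (1, 0), (1, 1)], dtype=float)
y = np.array([0, 1, 1, 0])                        # XOR labels

phi = np.column_stack([X, X[:, 0] * X[:, 1]])     # map (f1, f2) -> (f1, f2, f1*f2)
w, b = np.array([1.0, 1.0, -2.0]), -0.5           # a separating hyperplane in 3-D
predictions = (phi @ w + b > 0).astype(int)

print(predictions)                                # [0 1 1 0], matching the XOR labels
assert np.array_equal(predictions, y)
```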
51 SVM (2): Max Margin
(figure: support vectors, the maximum margin and the "best" separating hyperplane in the (f1, f2) plane)
From all possible separating hyperplanes, select the one that gives the maximum margin. The solution is found by quadratic optimization ("learning"). The max-margin hyperplane gives good generalization.
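A small hedged example of the max-margin idea (an addition, not from the deck; scikit-learn is assumed to be available, and the toy data set is arbitrary): fit a linear SVM with a large C so that it behaves like a hard-margin classifier, then read off the support vectors and the margin width 2/||w||.

```python
# Added sketch: max-margin linear SVM on a toy 2-D data set.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],    # class 0
              [2.0, 2.0], [2.0, 3.0], [3.0, 2.0]])   # class 1
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)           # large C approximates a hard margin

w = clf.coef_[0]                                      # normal vector of the hyperplane
print("support vectors:\n", clf.support_vectors_)
print("margin width 2/||w|| =", 2.0 / np.linalg.norm(w))
```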