Computational Learning Theory: PAC, IID, VC Dimension, SVM
Marius Bulacu, Kunstmatige Intelligentie (Artificial Intelligence) / RuG
Computational Learning Theory: The Problem. Why does learning work? How do we know that the learned hypothesis h is close to the target function f if we do not know what f is? Computational learning theory provides the answer.
Probably Approximately Correct: The Answer. Any hypothesis h that is consistent with a sufficiently large number of training examples is unlikely to be seriously wrong. Such a hypothesis is therefore Probably Approximately Correct (PAC).
The Stationarity Assumption. The training and test sets are drawn randomly from the same population of examples using the same probability distribution. Training and test data are therefore independently and identically distributed (IID): “the future is like the past”.
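How many examples are needed? Let ε be the probability that h and f disagree on an example (a hypothesis is “wrong” if this error exceeds ε), N the number of examples, |H| the size of the hypothesis space, and δ the probability that a wrong hypothesis exists that is consistent with all examples. Then
$$P(\exists\, h \in H_{\mathrm{bad}} \text{ consistent with all } N \text{ examples}) \le |H|(1-\epsilon)^N \le \delta.$$
Using $(1-\epsilon)^N \le e^{-\epsilon N}$, it suffices to take the sample complexity
$$N \ge \frac{1}{\epsilon}\left(\ln\frac{1}{\delta} + \ln|H|\right).$$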
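The sample-complexity bound for a finite hypothesis space can be evaluated directly. The sketch below is illustrative only; the function name `sample_complexity` and its parameter names are hypothetical, chosen to mirror ε, δ and |H| from the bound.

```python
import math


def sample_complexity(epsilon: float, delta: float, hypothesis_space_size: int) -> int:
    """Smallest integer N with N >= (1/epsilon) * (ln(1/delta) + ln|H|).

    epsilon: tolerated probability that h and f disagree on an example
    delta: tolerated probability that some wrong hypothesis is consistent with all N examples
    hypothesis_space_size: |H|, which must be finite for this bound to apply
    """
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie in (0, 1)")
    if hypothesis_space_size < 1:
        raise ValueError("|H| must be a positive integer")
    # math.log accepts arbitrarily large Python ints, so |H| = 2 ** 1024 is fine
    bound = (math.log(1 / delta) + math.log(hypothesis_space_size)) / epsilon
    return math.ceil(bound)
```

For example, the space of all Boolean functions of 10 attributes has |H| = 2^(2^10) = 2^1024, and `sample_complexity(0.1, 0.05, 2 ** 1024)` evaluates to 7128. The ln|H| term dominates, which is why very expressive hypothesis spaces need many examples.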
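Formal Derivation. [Figure: the hypothesis space H (the set of all possible hypotheses) containing the target function f with an ε-neighbourhood around it; H_bad (the set of “wrong” hypotheses) lies outside this ε-neighbourhood.]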
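One standard way to write out the derivation the figure illustrates is sketched below. It assumes that a hypothesis belongs to H_bad when its error exceeds ε and that the N examples are IID. The fourth line uses the union bound over H_bad together with 1 − ε ≤ e^{−ε}.

```latex
\begin{align*}
h_b \in H_{\mathrm{bad}} &\implies \mathrm{error}(h_b) > \epsilon \\
P(h_b \text{ agrees with one example}) &\le 1 - \epsilon \\
P(h_b \text{ agrees with all } N \text{ IID examples}) &\le (1 - \epsilon)^N \\
P(H_{\mathrm{bad}} \text{ contains a consistent hypothesis}) &\le |H_{\mathrm{bad}}|\,(1 - \epsilon)^N \le |H|\,(1 - \epsilon)^N \le |H|\,e^{-\epsilon N} \\
|H|\,e^{-\epsilon N} \le \delta &\iff N \ge \frac{1}{\epsilon}\left(\ln\frac{1}{\delta} + \ln|H|\right)
\end{align*}
```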
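What if the hypothesis space is infinite? The sample-complexity result depends on the size |H| of the hypothesis space, so it cannot be used when H is infinite. We need another measure of the complexity of H: the Vapnik-Chervonenkis (VC) dimension, the size of the largest set of examples that H can shatter (that is, label in all 2^n possible ways).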
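Shattering can be checked by brute force for a simple infinite hypothesis class. The sketch below assumes, purely for illustration, the class of intervals on the real line, h(x) = 1 if a ≤ x ≤ b and 0 otherwise. This class is not taken from the slides, and the helper names `interval_labelings` and `is_shattered` are hypothetical. Although there are infinitely many intervals, on a finite point set each one picks out a contiguous run of the sorted points (possibly none), so trying endpoints at the data points plus one empty interval enumerates every achievable labeling.

```python
def interval_labelings(points):
    """All labelings of `points` achievable by h_{a,b}(x) = 1 if a <= x <= b else 0."""
    pts = sorted(points)
    labelings = {tuple(0 for _ in points)}  # an empty interval labels everything 0
    for i in range(len(pts)):
        for j in range(i, len(pts)):
            a, b = pts[i], pts[j]
            # labels are reported in the original order of `points`
            labelings.add(tuple(1 if a <= x <= b else 0 for x in points))
    return labelings


def is_shattered(points):
    # H shatters the set iff every one of the 2^n labelings is achievable
    return len(interval_labelings(points)) == 2 ** len(points)


print(is_shattered([1.0, 2.0]))       # two points: all 4 labelings achievable
print(is_shattered([1.0, 2.0, 3.0]))  # three points: labeling (1, 0, 1) is impossible
```

Any two distinct points can be shattered, but no three can, because an interval cannot include the two outer points while excluding the middle one. The VC dimension of intervals on the line is therefore 2, even though the class is infinite.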
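SVM (1): Kernels. Common kernels: polynomial, radial basis function, sigmoid. A kernel performs an implicit mapping to a higher-dimensional space where linear separation is possible. [Figure: in the original feature space (f1, f2) the classes need a complicated separation boundary; in the mapped space (f1, f2, f3) a simple separation boundary, a hyperplane, suffices.]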
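The three kernel families and the idea of an implicit mapping are sketched below. The parameter names (`degree`, `c`, `gamma`, `kappa`) are assumptions using common textbook forms, not definitions taken from the slides. The explicit map `phi` is a hypothetical helper for 2-D inputs. It shows that the degree-2 homogeneous polynomial kernel equals an ordinary dot product in a 3-D feature space, matching the (f1, f2) → (f1, f2, f3) picture, without that space ever being built.

```python
import math


def dot(x, z):
    return sum(xi * zi for xi, zi in zip(x, z))


def polynomial_kernel(x, z, degree=2, c=0.0):
    # (x . z + c)^degree; c = 0 gives the homogeneous polynomial kernel
    return (dot(x, z) + c) ** degree


def rbf_kernel(x, z, gamma=1.0):
    # Gaussian radial basis function: exp(-gamma * ||x - z||^2)
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(x, z, kappa=1.0, c=0.0):
    # tanh(kappa * x . z + c); not positive semi-definite for every choice of kappa, c
    return math.tanh(kappa * dot(x, z) + c)


def phi(x):
    # explicit feature map for the homogeneous degree-2 kernel on 2-D inputs:
    # (f1, f2) -> (f1^2, sqrt(2) * f1 * f2, f2^2)
    f1, f2 = x
    return (f1 * f1, math.sqrt(2) * f1 * f2, f2 * f2)


x, z = (1.0, 2.0), (3.0, 4.0)
# the kernel gives the 3-D dot product while only ever touching the 2-D inputs
print(polynomial_kernel(x, z), dot(phi(x), phi(z)))
```

Here x · z = 11, so the kernel value is 121, and the explicit 3-D dot product 9 + 48 + 64 gives the same result up to floating-point rounding. For higher degrees or the RBF kernel the explicit feature space grows very large (infinite-dimensional for RBF), which is why computing only the kernel matters.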
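SVM (2): Max Margin. From all the possible separating hyperplanes, select the one that gives the maximum margin; the max-margin choice is what gives good generalization. The training points closest to this hyperplane, which determine the margin, are the support vectors. The solution is found by quadratic optimization, which is the “learning” step. [Figure: the “best” separating hyperplane in the (f1, f2) plane, with the maximum margin and the support vectors marked.]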
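The selection criterion can be illustrated by measuring the geometric margin of a few hand-picked candidate hyperplanes and keeping the largest. This is only a sketch of the criterion. A real SVM searches over all (w, b) by quadratic optimization rather than comparing a fixed list. The toy data set, the candidate list and the helper names are all hypothetical.

```python
import math


def signed_distance(w, b, x, y):
    # y * (w . x + b) / ||w||: positive when x lies on the correct side for label y (+1 / -1)
    norm = math.sqrt(sum(wi * wi for wi in w))
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def geometric_margin(w, b, points, labels):
    # distance from the hyperplane w . x + b = 0 to the closest training point
    return min(signed_distance(w, b, x, y) for x, y in zip(points, labels))


def support_vectors(w, b, points, labels, tol=1e-9):
    # the training points that attain the margin
    m = geometric_margin(w, b, points, labels)
    return [x for x, y in zip(points, labels) if abs(signed_distance(w, b, x, y) - m) <= tol]


def best_hyperplane(candidates, points, labels):
    # among the given (w, b) pairs, keep the one with the largest margin
    return max(candidates, key=lambda wb: geometric_margin(wb[0], wb[1], points, labels))


# toy 2-D data in the (f1, f2) plane: +1 class upper right, -1 class lower left
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [1, 1, -1, -1]
candidates = [((1.0, 0.0), -1.0),   # f1 = 1
              ((0.0, 1.0), -1.0),   # f2 = 1
              ((1.0, 1.0), -2.0)]   # f1 + f2 = 2

best = best_hyperplane(candidates, points, labels)
print(best, geometric_margin(*best, points, labels), support_vectors(*best, points, labels))
```

All three candidates separate the classes, but f1 + f2 = 2 has margin √2 against 1 for the other two, and its support vectors are (2, 2) and (0, 0). Only these two closest points fix the margin; moving (3, 3) or (−1, 0) further away would not change it.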