PAC learning. Invented by L. Valiant in 1984. L. G. Valiant, A theory of the learnable, Communications of the ACM, 1984, vol. 27, no. 11, pp. 1134-1142.

PAC learning Basic notions
X - instance space (in general, an arbitrary set)
c ⊆ X - concept (each subset of X is a concept)
C ⊆ {c | c ⊆ X} - class of concepts to be learned
T ∈ C - the unknown concept to be learned
Examples: (1) X = R², C = the set of rectangles, T = a given rectangle; (2) X = {0,1}ⁿ, C = CNF formulas with n variables, T = a given CNF formula.
It may be more convenient for the learning algorithm to represent T in some other way, not simply as a subset of X.

PAC learning Learning algorithm - inputs
P - a fixed probability distribution defined on X.
The learning algorithm receives a sequence of examples (x₀,v₀), (x₁,v₁), (x₂,v₂), (x₃,v₃), ... where xᵢ ∈ X and vᵢ = "+" if xᵢ ∈ T, vᵢ = "-" if xᵢ ∉ T, and each xᵢ appears in the sequence with probability given by P.
ε - accuracy parameter
δ - confidence parameter
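To make this example-generation model concrete, here is a minimal Python sketch of such an example oracle (not part of the original slides; the helper names, the toy target and the uniform distribution over {0,1}⁵ are made up for illustration):

```python
import random

def make_example_oracle(target, distribution_sampler, seed=0):
    """Return a function that draws one labeled example (x, v).

    target               -- predicate x -> bool representing the unknown concept T
    distribution_sampler -- function rng -> x drawing an instance x according to P
    """
    rng = random.Random(seed)

    def draw():
        x = distribution_sampler(rng)
        v = "+" if target(x) else "-"
        return x, v

    return draw

# Hypothetical setting: X = {0,1}^5 under the uniform distribution,
# target concept T = "x1 AND NOT x3".
oracle = make_example_oracle(
    target=lambda x: x[0] == 1 and x[2] == 0,
    distribution_sampler=lambda rng: tuple(rng.randint(0, 1) for _ in range(5)),
)
print([oracle() for _ in range(3)])
```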

PAC learning Learning algorithm - inputs
A model where all vᵢ = "+" can also be considered; in this case we say that the algorithm learns from positive examples. In the general case the algorithm learns from positive and negative examples. Learning from only negative examples can also be considered.

PAC learning Learning algorithm - outputs
After processing a finite sequence of inputs, the learning algorithm outputs a concept S ∈ C.
S Δ T - the symmetric difference of S and T
P(S Δ T) - the error rate of the concept S, i.e. the probability that T and S classify a random example differently.

PAC learning Learnability - definition
A concept S is approximately correct if P(S Δ T) ≤ ε.
The learning algorithm is probably approximately correct if the probability that its output S is approximately correct is at least 1 - δ.
In this case we say that the learning algorithm PAC-learns the concept class C, and that the concept class C is PAC-learnable.

PAC learning Polynomial learnability 1
A learning algorithm L is a polynomial PAC-learning algorithm for C, and C is polynomially PAC-learnable, if L PAC-learns C with time complexity (and sample complexity) polynomial in 1/ε and 1/δ.
It is useful to consider similar classes of concepts with different sizes n and to require complexity polynomial also in n, but then we must consider instance spaces that depend on the parameter n (e.g. X = {0,1}ⁿ).

PAC learning Polynomial learnability 2
We consider C = {(Xₙ,Cₙ) | n > 0}.
A learning algorithm L is a polynomial PAC-learning algorithm for C, and C is polynomially PAC-learnable, if L PAC-learns C with time complexity (and sample complexity) polynomial in n, 1/ε and 1/δ.

PAC learning Learnability - simple observation
(Polynomial) PAC-learnability from positive (or negative) examples implies (polynomial) PAC-learnability from positive and negative examples. The converse is not necessarily true.

PAC learning Valiant's results - k-CNF
X = {0,1}ⁿ (n boolean variables)
C = the set of k-CNF formulas (i.e. CNF formulas with at most k literals in each clause)
Theorem. For any positive integer k, the class of k-CNF formulas is polynomially PAC-learnable from positive examples. (Even when partial examples (with some variables missing) are allowed.)

PAC learning Valiant's results - k-CNF - the function L(r,m)
Definition. For any real number r > 1 and any positive integer m, the value L(r,m) is the smallest integer y such that in y Bernoulli trials with probability of success at least 1/r, the probability of having fewer than m successes is less than 1/r.
Proposition. L(r,m) ≤ 2r(m + ln r).
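For a rough feel of these quantities, the sketch below (pure standard-library Python, function names mine) computes L(r,m) directly from the definition with the success probability fixed at its worst case 1/r, and compares it with the 2r(m + ln r) bound:

```python
import math

def prob_fewer_than(m, y, p):
    # P[#successes < m] in y independent Bernoulli(p) trials, via the binomial pmf.
    return sum(math.comb(y, j) * p**j * (1 - p)**(y - j) for j in range(m))

def L(r, m):
    # Smallest y such that, with success probability 1/r (the worst case allowed),
    # the probability of fewer than m successes drops below 1/r.
    y = m
    while prob_fewer_than(m, y, 1.0 / r) >= 1.0 / r:
        y += 1
    return y

def L_upper_bound(r, m):
    # The bound L(r,m) <= 2r(m + ln r) from the proposition above.
    return math.ceil(2 * r * (m + math.log(r)))

print(L(10, 5), L_upper_bound(10, 5))   # exact value vs. the 2r(m + ln r) bound
```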

PAC learning Valiant's results - k-CNF - algorithm
Start with the formula containing all possible clauses with at most k literals (there are fewer than (2n)ᵏ⁺¹ such clauses).
When receiving the current example xᵢ, delete all clauses not consistent with xᵢ.
Lemma. The algorithm above PAC-learns the class of k-CNF formulas from L(h,(2n)ᵏ⁺¹) examples, where h = max{1/ε, 1/δ}.
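A minimal Python sketch of this elimination algorithm (the data representation and names are my own): a clause is a set of signed literals, the hypothesis starts as the conjunction of all clauses with at most k literals, and every clause falsified by a positive example is deleted.

```python
from itertools import combinations, product

def all_clauses(n, k):
    # All disjunctions of at most k literals over x1..xn.
    # A literal is (index, sign); sign True means the positive literal.
    clauses = set()
    for size in range(1, k + 1):
        for idxs in combinations(range(n), size):
            for signs in product([True, False], repeat=size):
                clauses.add(frozenset(zip(idxs, signs)))
    return clauses

def clause_satisfied(clause, x):
    return any(x[i] == (1 if sign else 0) for i, sign in clause)

def learn_k_cnf(positive_examples, n, k):
    # Keep only the clauses consistent with every positive example seen.
    hypothesis = all_clauses(n, k)
    for x in positive_examples:
        hypothesis = {c for c in hypothesis if clause_satisfied(c, x)}
    return hypothesis  # the learned k-CNF: conjunction of the remaining clauses

def predict(hypothesis, x):
    return all(clause_satisfied(c, x) for c in hypothesis)

# Toy target: (x1 or x2) and (not x3), a 2-CNF over n = 3 variables.
positives = [(1, 0, 0), (0, 1, 0), (1, 1, 0)]
h = learn_k_cnf(positives, n=3, k=2)
print(predict(h, (1, 0, 0)), predict(h, (0, 0, 1)))   # expected: True False
```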

PAC learning Valiant's results - k-CNF - algorithm
g - the target k-CNF (the concept being learned)
f(i) - the k-CNF produced by the algorithm after i examples
Consider P(g Δ f(i)). At each step P(g Δ f(i)) can only decrease, and the probability that it decreases on a given example is exactly P(g Δ f(i)).
After l = L(h,(2n)ᵏ⁺¹) examples there are two possible situations:
P(g Δ f(l)) ≤ 1/h - OK, f(l) is approximately correct (error at most 1/h ≤ ε);
P(g Δ f(l)) > 1/h.

PAC learning Valiant's results - k-CNF - algorithm
After l = L(h,(2n)ᵏ⁺¹) examples there are two possible situations:
P(g Δ f(l)) ≤ 1/h - OK, f(l) is approximately correct;
P(g Δ f(l)) > 1/h.
The probability of the second situation is at most 1/h (we have L(h,(2n)ᵏ⁺¹) Bernoulli trials, each with probability of success at least 1/h).

PAC learning Valiant's results - k-term-DNF
X = {0,1}ⁿ (n boolean variables)
C = the set of k-term-DNF formulas (i.e. DNF formulas with at most k monomials)
Theorem. For any positive integer k, the class of k-term-DNF formulas is polynomially PAC-learnable. (Only when total examples (with all variables present) are allowed.)

PAC learning VC (Vapnik-Chervonenkis) dimension
C - a nonempty concept class
s ⊆ X - a set
Π_C(s) = {s ∩ c | c ∈ C} - all subsets of s that can be obtained by intersecting s with a concept from C
We say that s is shattered by C if Π_C(s) = 2ˢ.
The VC (Vapnik-Chervonenkis) dimension of C is the cardinality of the largest finite set of points s ⊆ X that is shattered by C (if arbitrarily large finite sets are shattered, we say that the VC dimension of C is infinite).

PAC learning VC dimension - examples
Example 1. X = R, C = the set of all (open or closed) intervals.
If s = {x₁,x₂}, then there exist c₁, c₂, c₃, c₄ ∈ C such that c₁ ∩ s = {x₁}, c₂ ∩ s = {x₂}, c₃ ∩ s = ∅, and c₄ ∩ s = s.
If s = {x₁,x₂,x₃} with x₁ < x₂ < x₃, then there is no concept c ∈ C such that x₁ ∈ c, x₃ ∈ c and x₂ ∉ c.
Thus the VC dimension of C is 2.
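This example can also be checked mechanically. The short Python sketch below (helper names are mine) enumerates the subsets of a point set realizable by intervals and tests whether the set is shattered; it reports that two points are shattered while three are not.

```python
from itertools import combinations

def interval_labelings(points):
    # All subsets of `points` obtainable as points ∩ [a, b] for some interval [a, b]:
    # exactly the contiguous runs of the sorted points, plus singletons and the empty set.
    pts = sorted(points)
    labelings = {frozenset()}
    for p in pts:
        labelings.add(frozenset([p]))
    for i, j in combinations(range(len(pts)), 2):
        labelings.add(frozenset(pts[i:j + 1]))
    return labelings

def shattered_by_intervals(points):
    return len(interval_labelings(points)) == 2 ** len(points)

print(shattered_by_intervals([0.0, 1.0]))        # True: two points are shattered
print(shattered_by_intervals([0.0, 1.0, 2.0]))   # False: {x1, x3} is not realizable
```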

PAC learning VC dimension - examples
Example 2. C = any finite concept class.
It requires 2ᵈ distinct concepts to shatter a set of d points, therefore no set of cardinality larger than log |C| can be shattered. Hence the VC dimension of C is at most log |C|.

PAC learning VC dimension - examples
Example 3. X = R, C = the set of all finite unions of (open or closed) intervals.
Any finite s ⊆ X can be shattered, thus the VC dimension of C is infinite.

PAC learning VC dimension - relation to PAC
Theorem [A. Blumer, A. Ehrenfeucht, D. Haussler, M. Warmuth]. C is PAC-learnable if and only if the VC dimension of C is finite.
The theorem also gives upper and lower bounds on the number of examples needed for learning.
Learnability and the Vapnik-Chervonenkis dimension, Journal of the ACM, vol. 36, no. 4, 1989.

PAC learning VC dimension - relation to PAC
Let d be the VC dimension of C. Then:
no algorithm can PAC-learn the class C with fewer than max((1 - ε)/ε · ln(1/δ), d(1 - 2(ε(1 - δ) + δ))) examples;
any consistent algorithm PAC-learns the class C with max((4/ε) log₂(2/δ), (8d/ε) log₂(13/ε)) examples.
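For concreteness, here is a small Python sketch (variable names mine) evaluating both bounds for given ε, δ and d; I read the logarithm in the lower bound as natural and the logarithms in the upper bound as base 2.

```python
import math

def sample_lower_bound(eps, delta, d):
    # No algorithm can PAC-learn with fewer examples than this.
    return max((1 - eps) / eps * math.log(1 / delta),
               d * (1 - 2 * (eps * (1 - delta) + delta)))

def sample_upper_bound(eps, delta, d):
    # Any consistent algorithm PAC-learns with this many examples.
    return max(4 / eps * math.log2(2 / delta),
               8 * d / eps * math.log2(13 / eps))

print(sample_lower_bound(0.1, 0.05, 10))   # roughly 27 examples are necessary
print(sample_upper_bound(0.1, 0.05, 10))   # roughly 5600 examples always suffice
```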

PAC learning VC dimension - relation to PAC - example
X = R², C = the set of all triangles in X.
The VC dimension of C is finite (it equals 7), thus C is PAC-learnable.

PAC learning VC dimension - relation to polynomial PAC
We consider C = {(Xₙ,Cₙ) | n > 0}.
A randomised polynomial hypothesis finder (RPHF) for C is a randomised polynomial-time algorithm that takes as input a sample of a concept in C and, for some γ > 0, with probability at least γ produces a hypothesis in C that is consistent with this sample. γ is called the success rate of the RPHF.

PAC learning VC dimension - relation to polynomial PAC
Theorem [A. Blumer, A. Ehrenfeucht, D. Haussler, M. Warmuth]. For any concept class C = {(Xₙ,Cₙ) | n > 0}, C is polynomially PAC-learnable in the hypothesis space C if and only if there is an RPHF for C and the VC dimension of Cₙ grows polynomially in n.

PAC learning VC dimension - relation to polynomial PAC
Example. Cₙ = the class of all axis-parallel rectangles in Rⁿ.
The VC dimension of Cₙ does not exceed 2n.
(2n/ε) ln(2n/δ) examples are sufficient (or use the upper bound from the previous theorem).
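A natural consistent learner for this class is the tightest-fit rule: output the smallest axis-parallel rectangle containing all positive examples seen. A minimal Python sketch (names are mine):

```python
def learn_rectangle(examples):
    # examples: list of (point, label) with point a tuple in R^n and label "+" or "-".
    # Returns the tightest axis-parallel rectangle around the positive points as a
    # list of (low, high) bounds per coordinate, or None if no positive example was seen.
    positives = [x for x, v in examples if v == "+"]
    if not positives:
        return None  # the empty hypothesis classifies everything as negative
    n = len(positives[0])
    return [(min(x[i] for x in positives), max(x[i] for x in positives)) for i in range(n)]

def classify(rect, x):
    return rect is not None and all(lo <= xi <= hi for (lo, hi), xi in zip(rect, x))

examples = [((1.0, 2.0), "+"), ((2.0, 3.0), "+"), ((5.0, 0.0), "-")]
rect = learn_rectangle(examples)
print(rect, classify(rect, (1.5, 2.5)), classify(rect, (4.0, 4.0)))   # inside / outside
```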

PAC learning Learning of decision lists - decision trees
[Figure: a decision tree branching on the variables x₁, x₂, x₃, x₄ with 0/1 leaves]
Decision tree for the formula ¬x₁¬x₃ + x₁¬x₂x₄ + x₁x₂

PAC learning Learning of decision lists - decision trees Let k-DT be the set of all boolean functions defined by decision trees of depth at most k

PAC learning Learning of decision lists - decision lists
[Figure: a decision list over literals from x₁,...,x₅, each node testing a small conjunction and outputting true or false]

PAC learning Learning of decision lists - decision lists
Let k-DL be the set of all boolean functions defined by decision lists whose clauses contain at most k literals.
Theorem [R. Rivest]. For any positive integer k, the class of k-DL formulas is polynomially PAC-learnable.
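Rivest's learner is greedy: repeatedly find a conjunction of at most k literals such that all the remaining examples it covers share one label, append that rule to the list, remove the covered examples, and repeat. A minimal Python sketch of this strategy (function names and the toy sample are mine):

```python
from itertools import combinations, product

def monomials(n, k):
    # All conjunctions of at most k literals over x1..xn, as tuples of (index, sign).
    yield ()  # the empty conjunction, true everywhere (serves as the default rule)
    for size in range(1, k + 1):
        for idxs in combinations(range(n), size):
            for signs in product([True, False], repeat=size):
                yield tuple(zip(idxs, signs))

def covers(mono, x):
    return all(x[i] == (1 if sign else 0) for i, sign in mono)

def learn_k_dl(examples, n, k):
    # examples: list of (x, label) with x a 0/1 tuple and label in {0, 1}.
    remaining = list(examples)
    decision_list = []
    while remaining:
        for mono in monomials(n, k):
            covered_labels = {y for x, y in remaining if covers(mono, x)}
            if len(covered_labels) == 1:                    # consistent, nonempty rule
                decision_list.append((mono, covered_labels.pop()))
                remaining = [(x, y) for x, y in remaining if not covers(mono, x)]
                break
        else:
            return None  # no k-DL is consistent with this sample
    return decision_list

def evaluate(decision_list, x, default=0):
    for mono, label in decision_list:
        if covers(mono, x):
            return label
    return default

# Toy sample labeled by the 1-decision list "if x1 then 1, else if x2 then 0, else 1".
sample = [((1, 0), 1), ((1, 1), 1), ((0, 1), 0), ((0, 0), 1)]
dl = learn_k_dl(sample, n=2, k=1)
print(dl, [evaluate(dl, x) for x, _ in sample])   # predictions match the labels
```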

PAC learning Some proper subsets of the class k-DL
k-CNF
k-DNF
k-CNF ∩ k-DNF
k-clause-CNF
k-term-DNF
k-DT
k-clause-CNF and k-term-DNF are not learnable in their "own" hypothesis space!

PAC learning Some other facts about k-DL
Whilst k-DL is polynomially PAC-learnable, some subclasses may be learnable more efficiently in their "own" hypothesis space (e.g. k-DNF).
It is likely that unbounded DT are not polynomially PAC-learnable (the converse would imply an NP-related result that is believed to be false).