Vapnik–Chervonenkis Dimension Part II: Lower and Upper Bounds

PAC Learning model
There exists a distribution D over the domain X.
Examples: labelled pairs ⟨x, c_t(x)⟩ with x drawn i.i.d. from D.
Goal:
– with high probability (1-δ)
– find h in H such that
– error(h, c_t) < ε
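To make the definitions concrete, here is a minimal Python sketch of the PAC setup (not from the slides): the domain, distribution, target concept and the threshold hypothesis class below are illustrative assumptions.

```python
# Minimal sketch of the PAC setup on a toy domain; the domain, distribution,
# target concept and hypothesis class are illustrative assumptions.
import random

X = list(range(100))                       # domain X
D = lambda: random.choice(X)               # one draw from the distribution D (uniform here)
c_t = lambda x: 1 if x >= 37 else 0        # unknown target concept c_t

# Hypothesis class H: all threshold functions over X.
H = [(lambda x, t=t: 1 if x >= t else 0) for t in range(101)]

def error(h, trials=20000):
    """Estimate error(h, c_t) = Pr_{x~D}[h(x) != c_t(x)] by sampling from D."""
    return sum(h(x) != c_t(x) for x in (D() for _ in range(trials))) / trials

def learn(m):
    """Draw m labelled examples <x, c_t(x)> and return any consistent h in H."""
    S = [(x, c_t(x)) for x in (D() for _ in range(m))]
    return next(h for h in H if all(h(x) == y for x, y in S))

h = learn(m=50)
print(error(h))   # small with high probability over the draw of the sample
```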

Definitions: Projection
Given a concept c over X
– associate it with a set c ⊆ X (all positive examples)
Projection (sets)
– for a concept class C and a subset S ⊆ X
– Π_C(S) = { c ∩ S | c ∈ C }
Projection (vectors)
– for a concept class C and S = {x_1, …, x_m}
– Π_C(S) = { ⟨c(x_1), …, c(x_m)⟩ | c ∈ C }
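A small Python illustration of the two projection definitions (the toy class and points are assumptions, not from the slides):

```python
# Both projection definitions for a toy finite class, where each concept
# is represented by its set of positive examples.
X = {1, 2, 3, 4}
C = [frozenset(), frozenset({1}), frozenset({1, 2}), frozenset({1, 2, 3})]

def project_sets(C, S):
    """Pi_C(S) as sets: { c ∩ S : c in C }."""
    return {c & frozenset(S) for c in C}

def project_vectors(C, S):
    """Pi_C(S) as label vectors: { <c(x_1),...,c(x_m)> : c in C }."""
    return {tuple(int(x in c) for x in S) for c in C}

S = [1, 2]
print(project_sets(C, S))      # {frozenset(), frozenset({1}), frozenset({1, 2})}
print(project_vectors(C, S))   # {(0, 0), (1, 0), (1, 1)}
```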

Definition: VC-dim
Clearly |Π_C(S)| ≤ 2^m.
C shatters S if |Π_C(S)| = 2^m.
VC dimension of a class C:
– the size d of the largest set S that is shattered by C
– can be infinite
For a finite class C
– VC-dim(C) ≤ log₂ |C|
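For a finite class over a finite domain the definition can be checked by brute force; a sketch (exponential in |X|, illustration only; the example classes are my assumptions):

```python
# Brute-force VC dimension of a finite class: the largest |S| with
# |Pi_C(S)| = 2^|S|.  Exponential in |X|; for illustration only.
from itertools import combinations

def projections(C, S):
    return {tuple(int(x in c) for x in S) for c in C}

def shatters(C, S):
    return len(projections(C, S)) == 2 ** len(S)

def vc_dim(C, X):
    d = 0
    for k in range(1, len(X) + 1):
        if any(shatters(C, S) for S in combinations(X, k)):
            d = k
    return d

# Thresholds on {0,...,9} have VC dimension 1, intervals have VC dimension 2.
X = range(10)
thresholds = [frozenset(range(t, 10)) for t in range(11)]
intervals  = [frozenset(range(a, b)) for a in range(10) for b in range(a, 11)]
print(vc_dim(thresholds, X))  # 1
print(vc_dim(intervals, X))   # 2
```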

Lower bounds: Setting
Static learning algorithm:
– asks for a sample S of size m(ε, δ)
– based on S selects a hypothesis

Lower bounds: Setting
Theorem:
– If VC-dim(C) = ∞ then C is not learnable.
Proof:
– Let m = m(0.1, 0.1)
– Find 2m points which are shattered (call this set T)
– Let D be the uniform distribution on T
– Set c_t(x_i) = 1 independently with probability ½.
Expected error is ¼ (see the sketch below). Finish the proof!
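One way to finish the counting argument (my reconstruction, not spelled out on the slide), using that the learner's hypothesis h depends only on the at most m labelled points it sees:

```latex
% At least m of the 2m points of T are unseen by the learner, and on an
% unseen point the random label c_t(x) is independent of h(x):
\[
  \mathbb{E}_{c_t,\,S}\bigl[\mathrm{error}(h, c_t)\bigr]
  \;\ge\; \Pr_{x \sim D}[x \notin S]\cdot\tfrac12
  \;\ge\; \frac{m}{2m}\cdot\frac12 \;=\; \frac14 .
\]
% Hence some fixed target c_t has expected error >= 1/4 over the sample, and
% since error <= 1, Pr[error > 0.1] >= (1/4 - 0.1)/0.9 > 0.1, contradicting
% learnability with epsilon = delta = 0.1.
```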

Lower Bound: Feasible
Theorem
– If VC-dim(C) = d+1, then m(ε, δ) = Ω(d/ε).
Proof:
– Let T = {z_0, z_1, …, z_d} be a set of d+1 points which is shattered.
– Let the distribution D be:
  z_0 with prob. 1-8ε
  z_i with prob. 8ε/d, for i = 1, …, d

Continued
– Set c_t(z_0) = 1 and c_t(z_i) = 1 independently with probability ½.
Expected error is 2ε (see the sketch below).
Bound the confidence δ
– for accuracy ε
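A reconstruction of the "expected error 2ε" step; the sample-size condition m ≤ d/(16ε) is my assumption for this sketch:

```latex
\[
  \Pr[z_i \in S] \;=\; 1-\Bigl(1-\tfrac{8\epsilon}{d}\Bigr)^{m}
  \;\le\; \frac{8\epsilon m}{d} \;\le\; \frac12
  \qquad \bigl(i \ge 1,\ m \le \tfrac{d}{16\epsilon}\bigr),
\]
% so in expectation at least d/2 of the points z_1, ..., z_d are unseen;
% on an unseen z_i the hypothesis is independent of the random label c_t(z_i):
\[
  \mathbb{E}\bigl[\mathrm{error}(h, c_t)\bigr]
  \;\ge\; \frac{d}{2}\cdot\frac{8\epsilon}{d}\cdot\frac12 \;=\; 2\epsilon .
\]
```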

Lower Bound: Non-Feasible
Theorem
– Even for two hypotheses, m(ε, δ) = Ω((log 1/δ) / ε²).
Proof:
– Let H = {h_0, h_1}, where h_b(x) = b
– Two distributions over labelled examples of a single point x:
– D_0: Pr[⟨x, 1⟩] = ½ - ε and Pr[⟨x, 0⟩] = ½ + ε
– D_1: Pr[⟨x, 1⟩] = ½ + ε and Pr[⟨x, 0⟩] = ½ - ε
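An illustrative simulation (mine, not from the slides): the natural rule for picking between h_0 and h_1 is a majority vote over the observed labels, and its failure probability under D_1 only becomes small once m is on the order of 1/ε².

```python
# Monte Carlo estimate of the majority-vote learner's failure probability
# when the true distribution is D_1 (label 1 with probability 1/2 + eps).
import random

def majority_learner_error(m, eps, trials=20000):
    """Pr[majority vote outputs h_0] when the target bias is 1/2 + eps."""
    errs = 0
    for _ in range(trials):
        ones = sum(random.random() < 0.5 + eps for _ in range(m))
        errs += ones <= m / 2          # learner outputs h_0: wrong under D_1
    return errs / trials

eps = 0.05
for m in (10, 100, 400, 1600):
    print(m, majority_learner_error(m, eps))
# The failure probability decays roughly like exp(-2*eps^2*m), so it only
# drops below a fixed delta once m is on the order of (1/eps^2) log(1/delta).
```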

Epsilon net
Epsilon-bad concepts
– B_ε(c) = { h | error(h, c) > ε }
A set of points S is an ε-net w.r.t. D if
– for every h in B_ε(c)
– there exists a point x in S
– such that h(x) ≠ c(x)
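A small check of the definition for a finite toy class under the uniform distribution (the class, target and sample below are assumptions for illustration):

```python
# Is S an epsilon-net?  Every epsilon-bad hypothesis must disagree with the
# target on at least one point of S.  Concepts are sets of positive points.
def error(h, c, X):
    """error(h, c) under the uniform distribution on the finite domain X."""
    return sum((x in h) != (x in c) for x in X) / len(X)

def is_eps_net(S, C, c, X, eps):
    bad = [h for h in C if error(h, c, X) > eps]                 # B_eps(c)
    return all(any((x in h) != (x in c) for x in S) for h in bad)

X = range(100)
C = [frozenset(range(t, 100)) for t in range(101)]               # thresholds
c = frozenset(range(30, 100))                                    # target concept
S = [5, 20, 28, 31, 45, 90]
print(is_eps_net(S, C, c, X, eps=0.05))                          # True
```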

Sample size
Event A:
– The sample S_1 is not an epsilon net, |S_1| = m.
Assume A holds
– Let h be an epsilon-bad hypothesis consistent with S_1.
Sample an additional sample S_2 of size m
– with probability at least 1/2
– the number of errors of h on S_2 is ≥ εm/2
– for m = |S_2| = Ω(1/ε) (see the Chernoff sketch below)
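The "probability at least 1/2" step follows from a multiplicative Chernoff bound; a sketch, assuming m ≥ 8/ε:

```latex
% The number of errors X of h on S_2 is Binomial(m, p) with
% p = error(h, c) > epsilon, so its mean is mu = pm > epsilon*m, and
\[
  \Pr\Bigl[X < \tfrac{\epsilon m}{2}\Bigr]
  \;\le\; \Pr\Bigl[X < \tfrac{\mu}{2}\Bigr]
  \;\le\; e^{-\mu/8}
  \;\le\; e^{-\epsilon m/8}
  \;\le\; e^{-1} \;<\; \tfrac12
  \qquad\text{for } m \ge \tfrac{8}{\epsilon}.
\]
```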

Continued
Event B
– There exists h in B_ε(c)
– h is consistent with S_1
– h has ≥ εm/2 errors on S_2
Pr[B | A] ≥ 1/2
– hence 2 Pr[B] ≥ Pr[A]
Let F be the projection of C to S_1 ∪ S_2
– F = Π_C(S_1 ∪ S_2)

Error set
ER(h) = { x : x ∈ S_1 ∪ S_2 and c(x) ≠ h(x) }
|ER(h)| ≥ εm/2
Event A:
– ER(h) ∩ S_1 = ∅
Event B:
– ER(h) ∩ S_1 = ∅
– ER(h) ∩ S_2 = ER(h)

Combinatorial problem
2m black and white balls
– exactly l black balls
Consider a random partition into S_1 and S_2, each of size m.
What is the probability that all the black balls fall in S_2? (See the calculation below.)
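The calculation, assuming the partition picks a uniformly random m-subset as S_1 and the rest as S_2:

```latex
\[
  \Pr[\text{all } l \text{ black balls fall in } S_2]
  \;=\; \frac{\binom{2m-l}{m}}{\binom{2m}{m}}
  \;=\; \prod_{i=0}^{l-1}\frac{m-i}{2m-i}
  \;\le\; 2^{-l},
\]
% since each factor (m-i)/(2m-i) is at most 1/2.
```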

Completing the proof
Probability of B
– Pr[B] ≤ |F| · 2^(-l) ≤ |F| · 2^(-εm/2)
Probability of A
– Pr[A] ≤ 2 Pr[B] ≤ 2|F| · 2^(-εm/2)
Confidence: δ ≥ Pr[A]
Sample
– m = O( (1/ε) log(1/δ) + (1/ε) log |F| )
Need to bound |F| !!!

Bounding |F|
Define:
– J(m,d) = J(m-1,d) + J(m-1,d-1)
– J(m,0) = 1 and J(0,d) = 1
Solving the recursion gives J(m,d) = Σ_{i=0}^{d} C(m,i) ≤ (em/d)^d for m ≥ d ≥ 1.
Claim (Sauer's lemma):
– Let VC-dim(C) = d and |S| = m,
– then |Π_C(S)| ≤ J(m,d)
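A quick numerical sanity check (mine) that the closed form above does solve the recursion:

```python
# Verify J(m, d) from the recursion against sum_{i=0}^{d} C(m, i).
from math import comb
from functools import lru_cache

@lru_cache(maxsize=None)
def J(m, d):
    if d == 0 or m == 0:
        return 1
    return J(m - 1, d) + J(m - 1, d - 1)

for m in range(8):
    for d in range(5):
        assert J(m, d) == sum(comb(m, i) for i in range(d + 1))
print("recursion matches the binomial sum")
```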