VC Dimension – definition and impossibility result


VC Dimension – definition and impossibility result
Lecturer: Yishay Mansour
Eran Nir and Ido Trivizki

VC Dimension – Lecture Overview
- PAC model – review
- VC dimension – motivation
- Definitions
- Some examples of geometric concepts
- Sample size lower bounds
- More examples

The PAC Model – Review
A fixed, unknown distribution D from which the examples are drawn independently. The target concept is a computable function $c_t : X \to \{0,1\}$. Our goal is to find a hypothesis $h$ such that $\Pr_{x \sim D}[h(x) \ne c_t(x)] \le \varepsilon$, where $\varepsilon$ is the accuracy parameter and $\delta$ is the confidence parameter. An algorithm A learns a family of concepts C if for any $\varepsilon, \delta > 0$ and any distribution D, with probability at least $1 - \delta$, A outputs a function $h$ such that $\mathrm{error}(h) \le \varepsilon$.

VC Dimension – Motivation
Question: how many examples does a learning algorithm need? For PAC learning of a finite concept class C we proved that $m = O\!\left(\frac{1}{\varepsilon}\left(\ln|C| + \ln\frac{1}{\delta}\right)\right)$ examples suffice. We would like to be able to handle infinite concept classes as well – the VC dimension will provide a substitute for $\ln|C|$ for infinite concept classes.
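
For concreteness, a minimal sketch of the finite-class bound in the usual consistent-learner form $m \ge \frac{1}{\varepsilon}\left(\ln|C| + \ln\frac{1}{\delta}\right)$; the function name and the numbers are illustrative, not from the lecture.

```python
import math

# Finite-class (consistent learner) sample bound: m >= (ln|C| + ln(1/delta)) / eps.
def finite_class_bound(size_C, eps, delta):
    return math.ceil((math.log(size_C) + math.log(1 / delta)) / eps)

# Example: |C| = 2^20 concepts, accuracy 0.1, confidence 0.05  ->  about 169 examples.
print(finite_class_bound(size_C=2 ** 20, eps=0.1, delta=0.05))
```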

VC Dimension – Definitions
Given a concept class C defined over the instance space X, let $S = \{x_1, \dots, x_m\} \subseteq X$ be a finite set. The projection of C on S is all the possible functions that C induces on S: $\Pi_C(S) = \{(c(x_1), \dots, c(x_m)) : c \in C\}$. A concept class C shatters S if $|\Pi_C(S)| = 2^{|S|}$. In other words: a class shatters a set if every possible function on the set is in the class.

VC Dimension – Definitions Cont.
VCdim (Vapnik-Chervonenkis dimension) of C is the maximum size of a set shattered by C: $\mathrm{VCdim}(C) = \max\{|S| : C \text{ shatters } S\}$. If a maximum value doesn't exist then $\mathrm{VCdim}(C) = \infty$. For a finite class C: $\mathrm{VCdim}(C) \le \log_2 |C|$.
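
For small finite classes these definitions can be checked mechanically. A minimal brute-force sketch in Python (the helper names are ours; a concept is represented as a function from points to {0,1}):

```python
from itertools import combinations

def projection(concepts, points):
    """Pi_C(S): the set of labelings that the concepts induce on the given points."""
    return {tuple(c(x) for x in points) for c in concepts}

def shatters(concepts, points):
    """C shatters S iff the projection contains all 2^|S| labelings."""
    return len(projection(concepts, points)) == 2 ** len(points)

def vc_dimension(concepts, domain):
    """Largest size of a shattered subset of `domain` (brute force, small cases only)."""
    best = 0
    for k in range(1, len(domain) + 1):
        if any(shatters(concepts, list(s)) for s in combinations(domain, k)):
            best = k
    return best
```

The geometric examples below can be checked the same way once the concepts are restricted to a finite set of parameters (a grid of thresholds, lines, rectangles, and so on).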

VC Dimension – Examples
In order to show that the VCdim of a class is d we have to show two things. $\mathrm{VCdim} \ge d$: find some shattered set of size d. $\mathrm{VCdim} \le d$: show that no set of size d+1 is shattered.

VC Dimension – Examples: Half Lines (C1)
The concepts are $c_\theta$ for $\theta \in \mathbb{R}$, defined over $X = \mathbb{R}$, where $c_\theta(x) = 1$ if $x \ge \theta$ and $c_\theta(x) = 0$ otherwise.

VC Dimension – Examples: Half Lines (C1) Cont.
Claim: $\mathrm{VCdim}(C_1) = 1$.
$\ge 1$: for a single point x, taking $\theta \le x$ labels it 1 and taking $\theta > x$ labels it 0, thus $\{x\}$ is shattered.
$\le 1$: for any set of size 2 there is an assignment which is not in the concept class: for $x < y$, the assignment which labels x with 1 and y with 0 is impossible.
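
A quick sanity check of this claim, with the thresholds restricted to a grid (the grid and the sample points are illustrative):

```python
# Half lines c_theta(x) = 1 iff x >= theta, with thresholds on a grid in [0, 1].
thetas = [i / 100 for i in range(101)]
halflines = [(lambda x, t=t: int(x >= t)) for t in thetas]

def labelings(points):
    return {tuple(h(x) for x in points) for h in halflines}

print(labelings([0.3]))           # {(0,), (1,)}: a single point is shattered
print(labelings([0.3, 0.7]))      # (1, 0) is missing: no two-point set is shattered
```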

VC Dimension – Examples: Linear Halfspaces (C2)
The concepts are $c_{a,b,c}$ where for a point $(x, y) \in \mathbb{R}^2$ we let $c_{a,b,c}(x, y) = 1$ iff $ax + by \ge c$. These are lines in the plane, where points above or on the line are positive and points below the line are negative.

VC Dimension – Examples: Linear Halfspaces (C2) Cont.
Claim: $\mathrm{VCdim}(C_2) = 3$.
$\ge 3$: any three points that are not collinear can be shattered.
$\le 3$: no set of four points can be shattered.
Generally: halfspaces in $\mathbb{R}^d$ have VCdim of $d + 1$.
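
This claim can be verified numerically: a labeling is achievable by some $c_{a,b,c}$ exactly when a feasible $(a, b, c)$ exists, which is a small linear program. A sketch assuming SciPy is available; the function names and test points are ours.

```python
from itertools import product
from scipy.optimize import linprog

def halfspace_realizable(positives, negatives):
    """Feasible iff some line a*x + b*y = c has the positives on/above it and the
    negatives strictly below it (strict separation rescaled to a margin of 1)."""
    A_ub, b_ub = [], []
    for (x, y) in positives:              # a*x + b*y - c >= 0
        A_ub.append([-x, -y, 1.0])
        b_ub.append(0.0)
    for (x, y) in negatives:              # a*x + b*y - c <= -1
        A_ub.append([x, y, -1.0])
        b_ub.append(-1.0)
    res = linprog(c=[0.0, 0.0, 0.0], A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * 3)
    return res.success

def shattered(pts):
    return all(halfspace_realizable([p for p, b in zip(pts, labels) if b],
                                    [p for p, b in zip(pts, labels) if not b])
               for labels in product([0, 1], repeat=len(pts)))

print(shattered([(0, 0), (1, 0), (0, 1)]))               # True: three non-collinear points
print(shattered([(0, 0), (1, 0), (0, 1), (1, 1)]))       # False: the XOR labeling fails
print(shattered([(0, 0), (1, 0), (2, 0)]))               # False: three collinear points fail too
```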

VC Dimension – Examples: Axis-aligned rectangles in the plane (C3)
Positive examples are points inside the rectangle, and negative examples are points outside the rectangle.

VC Dimension – Examples: Axis-aligned rectangles in the plane (C3) Cont.
Claim: $\mathrm{VCdim}(C_3) = 4$.
$\ge 4$: a set of four points arranged in a diamond shape (one point extreme in each direction: leftmost, rightmost, topmost, bottommost) can be shattered.

VC Dimension – Examples: Axis-aligned rectangles in the plane (C3) Cont.
$\le 4$: given a set of five points in the plane, there must be some point that is neither the extreme left, right, top nor bottom point of the five. If we label this non-extremal point negative and the remaining extremal points positive, no rectangle can satisfy the assignment.
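
A brute-force sketch of both directions (the specific coordinates are illustrative): a labeling is achievable by an axis-aligned rectangle exactly when the bounding box of the positive points contains no negative point.

```python
from itertools import product

def rectangle_realizable(positives, negatives):
    """Achievable iff the bounding box of the positives avoids every negative point."""
    if not positives:
        return True                        # a tiny rectangle away from all points works
    xs = [p[0] for p in positives]
    ys = [p[1] for p in positives]
    lo_x, hi_x, lo_y, hi_y = min(xs), max(xs), min(ys), max(ys)
    return not any(lo_x <= x <= hi_x and lo_y <= y <= hi_y for (x, y) in negatives)

def shattered(pts):
    return all(rectangle_realizable([p for p, b in zip(pts, labels) if b],
                                    [p for p, b in zip(pts, labels) if not b])
               for labels in product([0, 1], repeat=len(pts)))

print(shattered([(0, 1), (0, -1), (1, 0), (-1, 0)]))        # True: the diamond is shattered
print(shattered([(0, 0), (1, 3), (2, 1), (3, 4), (4, 2)]))  # False, matching the 5-point claim
```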

VC Dimension – Examples: A finite union of intervals (C4)
For any finite set of points we can cover exactly the positive points by choosing the intervals small enough, so every finite set is shattered and $\mathrm{VCdim}(C_4) = \infty$.

VC Dimension – Examples: Convex Polygons on the plane (C5)
Points inside the convex polygon are positive and points outside are negative. There is no bound on the number of edges. Claim: $\mathrm{VCdim}(C_5) = \infty$.

VC Dimension – Examples: Convex Polygons on the plane (C5) Cont.
Proof: for every labeling of d points on the perimeter of a circle, there exists a convex polygon that is consistent with the labeling: the convex hull of the positive points includes all the positive examples and none of the negative ones. Thus the set of points is shattered. This holds for every d, and so $\mathrm{VCdim}(C_5) = \infty$.
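
A small check of this argument for d points on the unit circle; the circle, the value of d, and the helper names are illustrative. On a circle, the convex hull of the positive points is just the positive points in angular order, and no negative circle point can fall inside it.

```python
import math
from itertools import product

def inside_convex_polygon(p, poly):
    """True iff p is strictly inside the convex polygon with vertices `poly` in CCW order."""
    if len(poly) < 3:
        return False
    for i in range(len(poly)):
        ax, ay = poly[i]
        bx, by = poly[(i + 1) % len(poly)]
        # p must lie strictly to the left of every directed edge a -> b.
        if (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax) <= 0:
            return False
    return True

d = 6
points = [(math.cos(2 * math.pi * i / d), math.sin(2 * math.pi * i / d)) for i in range(d)]

for labels in product([0, 1], repeat=d):
    positives = [p for p, b in zip(points, labels) if b]
    negatives = [p for p, b in zip(points, labels) if not b]
    # conv(positives) classifies this labeling correctly: no negative point is inside it.
    assert not any(inside_convex_polygon(q, positives) for q in negatives)

print("all", 2 ** d, "labelings of", d, "points on a circle are realized by a convex polygon")
```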

Sample Size Lower Bounds
Goal: we want to show that for a concept class with a finite VCdim d there is a function $m(\varepsilon, \delta, d)$ such that if we sample fewer than $m$ points, any PAC learning algorithm would fail. Theorem: if a concept class C has VCdim d+1, then any algorithm that PAC learns C with accuracy $\varepsilon$ and confidence $\delta \le 1/2$ must use $\Omega(d / \varepsilon)$ examples.

Sample Size Lower Bounds – Proof
For contradiction: let $T = \{x_0, x_1, \dots, x_d\}$ be a set of d+1 points such that C shatters T (possible because $\mathrm{VCdim}(C) = d + 1$). Let D(x) be $D(x_0) = 1 - 8\varepsilon$ and $D(x_i) = \frac{8\varepsilon}{d}$ for $1 \le i \le d$. Choose the target concept $c_t$ randomly so that $c_t(x_0) = 0$ and each $c_t(x_i)$, $1 \le i \le d$, is an independent unbiased random bit.

Sample Size Lower Bounds – Proof Cont.
$c_t$ is in C because C shatters T. Claim: if we sample fewer than $\frac{d}{32\varepsilon}$ points out of D, then with probability at least 0.5 the error is at least $\varepsilon$. Proof: let RARE be $\{x_1, \dots, x_d\}$. Sample size: the expected number of points we sample from RARE is at most $8\varepsilon m \le \frac{d}{4}$, so with probability at least 0.5 we sample at most $\frac{d}{2}$ points of RARE. Error: every RARE point that does not appear in the sample has a label the algorithm cannot know, so the hypothesis errs on it with probability $\frac{1}{2}$, contributing $\frac{1}{2} \cdot \frac{8\varepsilon}{d}$ to the expected error. With at least $\frac{d}{2}$ unseen RARE points, the expected error (over the choice of $c_t$) is at least $\frac{d}{2} \cdot \frac{1}{2} \cdot \frac{8\varepsilon}{d} = 2\varepsilon > \varepsilon$.
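
A small Monte Carlo sketch of this construction; the constants and the "memorize what you saw, guess 0 elsewhere" learner are illustrative choices, not part of the lecture. With m near $\frac{d}{32\varepsilon}$ samples, the average error stays well above $\varepsilon$.

```python
import random

# Hard distribution: x0 with weight 1 - 8*eps, and d RARE points x1..xd with weight
# 8*eps/d each. The target labels x0 with 0 and each RARE point with a random bit.
eps, d, trials = 0.01, 100, 200
m = int(d / (32 * eps))                    # sample size at the lower-bound threshold
rng = random.Random(0)

errors = []
for _ in range(trials):
    target = {i: rng.randint(0, 1) for i in range(1, d + 1)}   # random RARE labels
    seen = set()
    for _ in range(m):
        if rng.random() < 8 * eps:                             # the draw falls in RARE
            seen.add(rng.randrange(1, d + 1))
    # The learner memorizes seen labels and predicts 0 elsewhere: it errs exactly on
    # the unseen RARE points whose true label is 1.
    err = sum(8 * eps / d for i in range(1, d + 1) if i not in seen and target[i] == 1)
    errors.append(err)

print("average error:", sum(errors) / trials, "vs accuracy target eps =", eps)
```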

VC Dimension – Examples: Parity (C6)
Let $X = \{0,1\}^n$. The concept class is $C_6 = \{c_S : S \subseteq \{1, \dots, n\}\}$ where $c_S(x) = \bigoplus_{i \in S} x_i$. Claim: $\mathrm{VCdim}(C_6) = n$.
$\ge n$: let $E = \{e_1, \dots, e_n\}$ be the n unit vectors. For any bit assignment $(b_1, \dots, b_n)$ for the vectors we choose the set $S = \{i : b_i = 1\}$. We get $c_S(e_i) = b_i$, and so E is shattered.
$\le n$: there are $2^n$ parity functions, thus $\mathrm{VCdim}(C_6) \le \log_2 2^n = n$.
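
A short verification of the lower-bound direction (the value of n is illustrative):

```python
from itertools import product

n = 4
unit_vectors = [tuple(int(i == j) for j in range(n)) for i in range(n)]

def parity(S, x):
    """c_S(x): XOR of the coordinates of x indexed by S."""
    return sum(x[i] for i in S) % 2

for labels in product([0, 1], repeat=n):
    S = [i for i, b in enumerate(labels) if b == 1]      # choose S = {i : b_i = 1}
    assert all(parity(S, e) == b for e, b in zip(unit_vectors, labels))

print("all", 2 ** n, "labelings of the unit vectors are realized by some parity")
```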

VC Dimension – Examples: OR of n literals (C7)
Let $X = \{0,1\}^n$. The concept class $C_7$ consists of all ORs (disjunctions) of literals over $x_1, \dots, x_n$. Claim: $\mathrm{VCdim}(C_7) = n$.
$\ge n$: use the n unit vectors (see the previous proof).
$\le n$: use the ELIM algorithm to show that for any n+1 vectors there is one vector that cannot be assigned 1 while the remaining n are assigned 0; thus no set of n+1 vectors can be shattered.
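
A brute-force sketch of the upper bound for the small case n = 2 (the encoding of a disjunction as a choice vector is ours): every set of n+1 points misses at least one labeling.

```python
from itertools import product, combinations

n = 2
points = list(product([0, 1], repeat=n))

def make_or(choice):
    """choice[i] in {+1, -1, 0}: include x_i, include not-x_i, or omit variable i."""
    def f(x):
        return int(any((c == 1 and x[i] == 1) or (c == -1 and x[i] == 0)
                       for i, c in enumerate(choice)))
    return f

concepts = [make_or(choice) for choice in product([1, -1, 0], repeat=n)]

for subset in combinations(points, n + 1):
    induced = {tuple(c(x) for x in subset) for c in concepts}
    assert len(induced) < 2 ** (n + 1)      # some labeling of these n+1 points is missing

print("no set of", n + 1, "points in {0,1}^2 is shattered by ORs of literals")
```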

Radon's Theorem
Definitions. Convex set: A is convex if for every $x, y \in A$ the line segment connecting x and y is contained in A. Convex hull: the convex hull of S is the smallest convex set which contains all the points of S. We denote it by $\mathrm{conv}(S)$. Theorem (Radon): let E be a set of d+2 points in $\mathbb{R}^d$. Then there is a subset S of E such that $\mathrm{conv}(S) \cap \mathrm{conv}(E \setminus S) \ne \emptyset$.
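
For completeness, the standard short argument (not spelled out in the slides): given points $x_1, \dots, x_{d+2} \in \mathbb{R}^d$, the $d+1$ linear equations $\sum_i \lambda_i x_i = 0$ and $\sum_i \lambda_i = 0$ in $d+2$ unknowns have a nontrivial solution $\lambda$. Take $S = \{x_i : \lambda_i > 0\}$ and $\Lambda = \sum_{\lambda_i > 0} \lambda_i$; both S and its complement are nonempty, and the point $\sum_{\lambda_i > 0} \frac{\lambda_i}{\Lambda} x_i = \sum_{\lambda_i \le 0} \frac{-\lambda_i}{\Lambda} x_i$ is a convex combination of the points on each side, so it lies in $\mathrm{conv}(S) \cap \mathrm{conv}(E \setminus S)$.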

VC Dimension – Examples: Hyper-Planes (C8)
The concept class assigns 1 to a point in $\mathbb{R}^d$ if it is above or on a corresponding hyperplane, and 0 otherwise. Claim: $\mathrm{VCdim}(C_8) = d + 1$.
$\ge d+1$: use the d unit vectors and the zero vector to form a set of d+1 points that can be shattered.
$\le d+1$: use Radon's theorem (next slide).

VC Dimension – Examples: Hyper-Planes (C8) Cont.
Assume a set E of d+2 points can be shattered. Use Radon's theorem to find $S \subseteq E$ such that $\mathrm{conv}(S) \cap \mathrm{conv}(E \setminus S) \ne \emptyset$. Assume there is a separating hyperplane that classifies the points in S as 1 and the points in $E \setminus S$ as 0. Then $\mathrm{conv}(S)$ lies in the closed halfspace on or above the hyperplane, while $\mathrm{conv}(E \setminus S)$ lies strictly below it, so the two hulls cannot intersect, a contradiction. Hence there is no way to classify the points in $\mathrm{conv}(S) \cap \mathrm{conv}(E \setminus S)$, and no set of d+2 points is shattered.