On Complexity, Sampling, and ε-Nets and ε-Samples

On Complexity, Sampling, and ε-Nets and ε-Samples Matan Liber

Overview
1. VC Dimension
   1.1 Range Space
   1.2 Measure
   1.3 Estimate
   1.4 Radon's Theorem
2. Shattering Dimension and Dual Range Space
   2.1 Growth Function
   2.2 Sauer's Lemma
   2.3 Shatter Function
   2.4 Dual Range Space
3. ε-Nets and ε-Sampling
   3.1 ε-Sampling Theorem
   3.2 ε-Net Theorem

Motivation Understand geometric complexity. Quantify geometric complexity. Capture the complexity of a set by a small subset.

Range Space A range space S is a pair (X,R). X is the ground set (finite or infinite). R is a (finite or infinite) family of subsets of X. Elements in X are points. Elements in R are ranges.

Examples S = (ℝ, {[a,b] | a ≤ b ∈ ℝ}); S = (People in Tel Aviv, {Age(x,y) | 0 ≤ x ≤ y ≤ 120}), where Age(x,y) is the set of people whose age is between x and y; S = (ℝ², {D | D is a rectangle in the plane}).

Measure Let S = (X,R). Let x ⊆ X (x is finite). For r ∈ R, its measure is m̄(r) = |r∩x| / |x|. In the slide figure, m̄(r) = 2/8 = 1/4.

Estimate Let S = (X,R). Let x ⊆ X (x is finite). For N ⊆ x, its estimate of m̄(r) (for some r ∈ R) is s̄(r) = |r∩N| / |N|. We want to generate N such that m̄(r) ≈ s̄(r) for all r ∈ R. In the slide figure, s̄(r) = 1/4 = m̄(r).
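
A minimal sketch (Python, not part of the original slides) of the measure m̄(r) and its estimate s̄(r) on a random sample; the interval range and the point set below are made-up illustrative choices.

import random

def measure(r, x):
    # m̄(r) = |r ∩ x| / |x|: the fraction of ground points inside the range r
    return sum(1 for p in x if r(p)) / len(x)

def estimate(r, sample):
    # s̄(r) = |r ∩ N| / |N|: the same fraction, measured only on the sample N
    return sum(1 for p in sample if r(p)) / len(sample)

x = [random.uniform(0, 1) for _ in range(1000)]   # ground set: 1000 points on the line
r = lambda p: 0.2 <= p <= 0.5                     # one range: the interval [0.2, 0.5]
N = random.sample(x, 50)                          # a random subset N ⊆ x
print(measure(r, x), estimate(r, N))              # the two values should be close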

Projection and VC Dimension Let S = (X,R). Let Y ⊆ X. R|Y = {r∩Y | r∈R} is the projection of R on Y. In the slide figure, for Y = {p,q,s}: R|Y = {∅, {s}, {p,s}}.

Shattering If R|Y contains all subsets of Y (for finite Y, |R|Y| = 2^|Y|), we say that Y is shattered by R.

VC Dimension Let S = (X,R). The VC dimension (Vapnik and Chervonenkis) of S is dimvc(S) = max{k∈ℕ | ∃ B⊆X, |B|=k, B is shattered by R}.
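
As an illustration (a sketch that is not in the slides), dimvc can be computed by brute force for a small range space; here the ranges are closed intervals restricted to a handful of points on the line, whose VC dimension is 2.

from itertools import combinations

def shatters(ranges, B):
    # B is shattered iff every subset of B appears as r ∩ B for some range r
    projections = {frozenset(B) & r for r in ranges}
    return len(projections) == 2 ** len(B)

def vc_dimension(points, ranges):
    # dimvc = largest k such that some k-subset of the ground set is shattered
    best = 0
    for k in range(1, len(points) + 1):
        if any(shatters(ranges, set(B)) for B in combinations(points, k)):
            best = k
    return best

points = [1, 2, 3, 4, 5]
# all ranges induced on these points by closed intervals [a, b]
ranges = [frozenset(p for p in points if a <= p <= b)
          for a in points for b in points if a <= b] + [frozenset()]
print(vc_dimension(points, ranges))   # prints 2: intervals cannot shatter 3 points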

VC Dimension Let S = (X,R). dimvc(S) = ∞ ⇔ ∀ k∈ℕ ∃ B⊆X, |B|=k, B is shattered by R.

Examples (shown as figures on the original slide): a range family with dimvc(S) = ∞, one with dimvc(S) = 3, and one with dimvc(S) < 4.

Complement Space Let S = (X,R) with dimvc(S) = δ. S̄ = (X, R̄) is the complement space, where R̄ = {X∖r | r∈R}.

Complement Space: VC Dimension Let S = (X,R) with dimvc(S) = δ. S̄ = (X, R̄) is the complement space. Claim: dimvc(S̄) = dimvc(S).

Complement Space: VC Dimension Proof: If S shatters B then ∀ Z⊆B, ∃ r∈R, r∩B = B∖Z. So for r̄ = X∖r, r̄∩B = Z. We get that S̄ shatters B. Applying the same argument to S̄ (whose complement space is S) gives the other direction, so the two dimensions are equal.

Halfspaces

Range Space example: Halfspaces Let P = {p1,…, pd+2} ⊆ ℝd. Claim: ∃ β1,…, βd+2 ∈ ℝ, not all 0, such that ∑i βi·pi = 0 and ∑i βi = 0.

Range Space example: Halfspaces Proof: Set Q = {qi | qi = (pi,1)∈ℝd+1}. q1,….,qd+2 are linearly dependent (|Q| > d+1).

Range Space example: Halfspaces So ∃ β1,…, βd+2 ∈ ℝ, not all 0, with ∑i (βi·qi) = ∑i (βi·(pi,1)) = (0,…,0). Looking at the first d coordinates, ∑i (βi·pi) = (0,…,0), and looking at the last coordinate, ∑i βi = 0.

Convex Hull Let P = {p1,…., pk} ⊆ ℝd. CH(P) = {q | ∃β1,…., βk ≥ 0, ∑iβi = 1, ∑i(βi·pi) = q}

Radon's Theorem Let P = {p1,…, pd+2} ⊆ ℝd. Then ∃ C,D ⊂ P, C∩D = ∅, C∪D = P and CH(C)∩CH(D) ≠ ∅.

Radon's Theorem Proof: By the previous claim, ∃ β1,…, βd+2 ∈ ℝ, not all 0, with ∑i (βi·pi) = 0 and ∑i βi = 0. Assume (after reordering) β1,…, βk ≥ 0 and βk+1,…, βd+2 < 0.

Radon's Theorem Let μ = ∑_{i=1}^{k} βi = -∑_{i=k+1}^{d+2} βi. Also, ∑_{i=1}^{k} (βi·pi) = -∑_{i=k+1}^{d+2} (βi·pi).

Radon's Theorem If we take v = ∑_{i=1}^{k} ((βi/μ)·pi) then v ∈ CH({p1,…, pk}). Also, v = ∑_{i=k+1}^{d+2} (-(βi/μ)·pi), so v ∈ CH({pk+1,…, pd+2}). So for C = {p1,…, pk}, D = {pk+1,…, pd+2}: C∩D = ∅, C∪D = P, and v ∈ CH(C)∩CH(D).
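
A sketch (assuming numpy is available; not part of the slides) that follows this proof literally: lift the d+2 points, find a nontrivial dependency β with ∑i βi·pi = 0 and ∑i βi = 0, and split the points by the sign of βi.

import numpy as np

def radon_partition(P):
    # P: (d+2) x d array of points in R^d; returns (C, D) with CH(C) ∩ CH(D) ≠ ∅
    n, d = P.shape
    assert n == d + 2
    # rows of A are the lifted points q_i = (p_i, 1); we want beta with A^T beta = 0
    A = np.hstack([P, np.ones((n, 1))])
    _, _, vt = np.linalg.svd(A.T)       # last right-singular vector spans the null space
    beta = vt[-1]
    C = [i for i in range(n) if beta[i] >= 0]
    D = [i for i in range(n) if beta[i] < 0]
    return P[C], P[D]

P = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [1.0, 1.0]])  # 4 points in R^2
C, D = radon_partition(P)
print(C, D)   # one valid split; the two convex hulls share the point (1, 1)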

Lemma Let P ⊆ ℝd, |P| < ∞. Let s ∈ CH(P). Let h+ be a halfspace with s ∈ h+. Then ∃ p∈P, p ∈ h+.

VC Dimension of Halfspaces Let S = (ℝd,R) where R is all (closed) halfspaces in ℝd. dimvc(S) = d+1.

VC Dimension of Halfspaces Simplex: (the convex hull of) d+1 points in ℝd (illustrated on the slide for d = 1, 2, 3).

VC Dimension of Halfspaces Proof: dimvc(S) ≥ d+1: take the d+1 vertices of a simplex; any subset of these vertices can be separated from the remaining ones by a halfspace, so this set is shattered.

VC Dimension of Halfspaces By Radon's Theorem, if Q ⊆ ℝd with |Q| = d+2, then ∃ C,D ⊂ Q, C∩D = ∅, C∪D = Q and CH(C)∩CH(D) ≠ ∅. Let v ∈ CH(C)∩CH(D). If ∀ c∈C, c ∈ h+, then CH(C) ⊆ h+, so v ∈ h+.

VC Dimension of Halfspaces Also, v ∈ h+∩CH(D). By the previous lemma ∃ d∈D, d ∈ h+. So ∄ h+∈R with h+∩Q = C, which means Q is not shattered by S. So dimvc(S) ≥ d+1 and dimvc(S) < d+2 ⇒ dimvc(S) = d+1.

Growth Function Define the growth function gδ(n) = ∑_{i=0}^{δ} C(n,i) ≤ ∑_{i=0}^{δ} n^i/i! ≤ n^δ, where C(n,i) is the binomial coefficient. From Pascal's rule we get gδ(n) = gδ(n-1) + gδ-1(n-1). Pascal's rule: C(n,k) = C(n-1,k) + C(n-1,k-1).
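
A tiny sketch (Python, not from the slides) of the growth function gδ(n) = ∑_{i=0}^{δ} C(n,i), checking the recurrence gδ(n) = gδ(n-1) + gδ-1(n-1) that follows from Pascal's rule.

from math import comb

def g(delta, n):
    # growth function: number of subsets of an n-element set of size at most delta
    return sum(comb(n, i) for i in range(delta + 1))

# Pascal's rule C(n,k) = C(n-1,k) + C(n-1,k-1) gives the stated recurrence
for delta in range(1, 5):
    for n in range(1, 12):
        assert g(delta, n) == g(delta, n - 1) + g(delta - 1, n - 1)
print(g(3, 10))   # 1 + 10 + 45 + 120 = 176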

Sauer's Lemma Let S = (Y,R) with dimvc(S) = δ and |Y| = n, where Y ⊆ X and R = R'|Y for some range space S' = (X,R'). Then |R| ≤ gδ(n).

Sauer's Lemma Proof: By induction. The cases δ = 0 or n = 0 are easy (in both cases |R| ≤ 1 = gδ(n)). Let x ∈ Y.

Sauer's Lemma
Rx = {r∖{x} | r∪{x} ∈ R and r∖{x} ∈ R}
R∖{x} = {r∖{x} | r ∈ R}
|R| = |Rx| + |R∖{x}| (explanation on board).
B ⊆ Y∖{x} is shattered by Rx ⇒ B∪{x} is shattered by R.
dimvc(S) = δ ⇒ dimvc((Y∖{x}, Rx)) ≤ δ-1.

Sauer's Lemma |R| = |Rx| + |R∖{x}| ≤ gδ-1(n-1) + gδ(n-1) = gδ(n), where the bound on |Rx| (ranges including x) and the bound on |R∖{x}| (ranges not including x) both follow by induction. We get that for |Y| = n, |R| ≤ n^δ.
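
To illustrate Sauer's Lemma concretely (a sketch, not from the slides, reusing intervals on the line as the ranges): intervals have δ = 2, so any n points carry at most g2(n) = 1 + n + C(n,2) distinct ranges, and this count is exactly attained.

from math import comb

def interval_ranges(points):
    # all distinct projections r ∩ Y of closed intervals [a, b] onto the point set Y
    R = {frozenset()}
    for a in points:
        for b in points:
            if a <= b:
                R.add(frozenset(p for p in points if a <= p <= b))
    return R

points = list(range(1, 21))            # n = 20 points on the line
R = interval_ranges(points)
n, delta = len(points), 2              # the VC dimension of intervals is 2
bound = sum(comb(n, i) for i in range(delta + 1))
print(len(R), "<=", bound)             # 211 <= 211: Sauer's bound is tight here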

Growth Function Bounds For n ≥ 2δ and δ ≥ 1: (n/δ)^δ ≤ gδ(n) ≤ 2(ne/δ)^δ.
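
A quick numeric spot-check of these two bounds on a small grid of δ and n (Python sketch, not part of the slides).

from math import comb, e

def g(delta, n):
    return sum(comb(n, i) for i in range(delta + 1))

for delta in range(1, 6):
    for n in range(2 * delta, 41):
        lower = (n / delta) ** delta
        upper = 2 * (n * e / delta) ** delta
        assert lower <= g(delta, n) <= upper
print("bounds hold on the tested grid")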

Shatter Function Let S = (X,R). πS(m) = max_{B⊆X, |B|=m} |R|B|.

Shattering Dimension Let S = (X,R). The shattering dimension of S is the smallest d such that πS(m) = O(m^d).

VC vs. Shattering Dimension Let S = (X,R) with dimvc(S) = δ. For every finite B ⊆ X: |R|B| ≤ πS(|B|) ≤ gδ(|B|). That is, the shattering dimension is at most δ.

VC vs. Shattering Dimension Proof: Let n = |B|. |R|B| ≤ πS(n) (the maximum over all subsets of X of size n). By Sauer's Lemma, |R|B| ≤ gδ(n) ≤ n^δ. So πS(n) = |R|Bmax| ≤ gδ(n) = O(n^δ), where Bmax is a maximizing subset ⇒ shattering dimension ≤ δ.

Lemma: VC Dimension Bounds Let S = (X,R) with shattering dimension d. Then dimvc(S) = O(d·log(d)).

Shattering Dimension Example S = (X,R) where X = ℝ2, R = {D | D is a disk in the plane} The shattering dimension of S is 3.

Shattering Dimension Example Proof: Let P = {p1,…, pn} ⊆ ℝ2 and F = R|P. We will show |F| ≤ 4n^3.

Shattering Dimension Example F contains at most n sets of a single point ({pi}). F contains at most C(n,2) sets of two points ({pi, pj}). So far this accounts for n + C(n,2) = O(n^3) sets. Now fix Q ∈ F with |Q| ≥ 3.

Shattering Dimension Example

Shattering Dimension Example We can describe Q = P∩D by (p, q, s, xp, xq, xs), where p, q and s are the points defining D, and x* ∈ {0,1} states whether the point * is in Q or not ((p,q,s,1,1,0) in our case). So F contains at most 8·C(n,3) sets with at least 3 points whose defining disk passes through three points of P.

Shattering Dimension Example Similar argumentation shows F contains at most 4·C(n,2) sets defined by a pair of points (p, q, xp, xq) realizing the diameter of the disk. Altogether |F| ≤ 1 + n + 4·C(n,2) + 8·C(n,3) ≤ 4n^3.

Corollary This geometric argument gives us a powerful tool: the shattering dimension of S = (X,R), where R is a family of shapes, is at most the number of points that determine a shape in the family.

Corollary Example: For S = (ℝ², {D | D is a rectangle in the plane}), the shattering dimension of S is at most (in fact exactly) 5.

Dual Range Space Let S = (X,R), p ∈ X. Rp = {r | r∈R, the range r contains p}

Dual Range Space X* = {Rp | p ∈ X}. The dual range space to S = (X,R) is S* = (R,X*). Ranges become points and points become ranges.

Dual Range Space Claim: Let S = (X,R), where R is a set of shapes such that the boundaries of any two shapes intersect at most s times. Then the complexity of the arrangement of n such shapes is O(s·n^2).

Dual Range Space Proof: Explanation on board. For disks (s = 2): O(2·C(n,2)) = O(n^2).

Dual Range Space To maximize |X*|, we need at least one point in every intersection combination of ranges in R. So the number of ranges in X* ≤ the complexity of the arrangement of the ranges in R (O(2·C(n,2)) = O(n^2) with disks).

Dual Shattering Function Let the dual shattering function of a range space S be π*s(m) = πs*(m) where S* is the dual range space to S.

Dual Shattering Dimension The dual shattering dimension of a range space S = the shattering dimension of S*.

Dual VC Dimension Bounds Let S = (X,R) with dimvc(S) = δ. dimvc(S*) ≤ 2^(δ+1).

Dual VC Dimension Bounds Proof: Assume S* shatters a set F = {r1,…, rk} ⊆ R. So ∃ P ⊆ X of m = 2^k points that shatters F; formally, ∀ V⊆F ∃ p∈P with Fp = {r∈F | p∈r} = V.

Dual VC Dimension Bounds Consider a k × 2^k matrix M with M[i,j] = 1 ⇔ ri contains pj (0 otherwise). Since P shatters F, ∀ e ∈ {0,1}^k ∃ 1 ≤ j ≤ 2^k such that the j-th column of M is e.

Dual VC Dimension Bounds Let k' = 2^⌊log(k)⌋ ≤ k. Consider a k' × log(k') matrix M' whose i-th row is i-1 in binary representation. For every column of M' there exists a column of M (corresponding to a point pt) identical to it in the top k' bits.

Dual VC Dimension Bounds Q = the set of all points pt representing a column of M'. |Q| = log(k'). ∀ Z⊆Q ∃ rz∈F with rz∩Q = Z (since M and M' agree on the relevant log(k') columns).

Dual VC Dimension Bounds So F shatters Q ⇒ |Q| ≤ δ (the original dimvc(S)). |Q| = log(k') = ⌊log(k)⌋ ≤ δ ⇒ log(k) ≤ δ+1 ⇒ k ≤ 2^(δ+1).

Dimensional Bounds Let S = (X,R) with dual shattering dimension d. Then dimvc(S) ≤ d^O(d).

Dimensional Bounds Proof: The shattering dimension of S* is d ⇒ dimvc(S*) ≤ d', where d' = O(d·log(d)) (by a previous claim). The dual range space of S* is S ⇒ dimvc(S) ≤ 2^(d'+1) = d^O(d).

Mixing Range Spaces Let S = (X,R), T = (X,R') with dimvc(S) = δ, dimvc(T) = δ'. Let R̂ = {r∪r' | r∈R and r'∈R'}. Then dimvc(Ŝ) = O(δ+δ'), where Ŝ = (X, R̂).

Mixing Range Spaces Let S1 = (X,R1),…, Sk = (X,Rk) with dimvc(S1) = δ1,…, dimvc(Sk) = δk. Let f: R1 × … × Rk → P(X) (f can be union, intersection, …) and R' = {f(r1,…,rk) | r1∈R1,…, rk∈Rk}, T = (X,R'). Then dimvc(T) = O(kδ·log(k)), where δ = maxi δi.

Mixing Range Spaces Proof: Let Y ⊆ X be a set of size t that is shattered by R'. |R'|Y| ≤ |{(r1,…,rk) | r1∈R1|Y,…, rk∈Rk|Y}| ≤ |R1|Y| · … · |Rk|Y| ≤ gδ1(t) · … · gδk(t) ≤ (gδ(t))^k ≤ (2·(te/δ)^δ)^k, using (1) Sauer's Lemma |Ri|Y| ≤ gδi(t) and (2) the growth-function bound gδ(n) ≤ 2(ne/δ)^δ.

Mixing Range Spaces Since Y is shattered by R', |R'|Y| = 2^t. After a bit of algebra we get t ≤ 12kδ·ln(6k) = O(kδ·log(k)).

Corollary Any finite sequence of combining range spaces with finite VC Dimension (by intersecting, complementing, or taking their union) results in a range space with a finite VC Dimension.

Motivation (now smarter) Why do we care about finite VC dimension? It is the right condition for efficient sampling: we can represent the behavior of a big set with a smaller sample.

ε-Sample Let S = (X,R) and x ⊆ X, |x| < ∞. For 0 ≤ ε ≤ 1, a subset C ⊆ x is an ε-sample for x if ∀ r∈R, |m̄(r) - s̄(r)| ≤ ε. Reminder: m̄(r) = |r∩x| / |x| and s̄(r) = |r∩C| / |C|.

ε-Sample Theorem (Vapnik–Chervonenkis) There is a constant c ≥ 0 so that for any S = (X,R) with dimvc(S) ≤ δ, x ⊆ X, |x| < ∞ and ε, φ > 0, a random subset C ⊆ x with |C| = s = (c/ε²)·(δ·log(δ/ε) + log(1/φ)) is an ε-sample for x with probability at least 1-φ. If s > |x|, then we take C = x.
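
A sketch (Python, not from the slides) of using this theorem as a sampling recipe. The constant c is left unspecified by the theorem, so the value c = 2.0 below is only a placeholder, and the interval ranges are just an illustrative spot-check.

import math, random

def eps_sample_size(delta, eps, phi, c=2.0):
    # s = (c/eps^2)·(delta·log(delta/eps) + log(1/phi)); the constant c is NOT
    # specified by the theorem, so c = 2.0 here is only a placeholder.
    return math.ceil((c / eps ** 2) * (delta * math.log(delta / eps) + math.log(1 / phi)))

random.seed(0)
x = [random.uniform(0, 1) for _ in range(20000)]      # finite ground set x
eps, phi, delta = 0.05, 0.1, 2                        # intervals on the line: delta = 2
s = min(eps_sample_size(delta, eps, phi), len(x))     # if s > |x| we take C = x
C = random.sample(x, s)

# spot-check |m(r) - s(r)| <= eps on a few interval ranges r = [a, b]
for a, b in [(0.1, 0.3), (0.2, 0.9), (0.45, 0.55)]:
    m_r = sum(a <= p <= b for p in x) / len(x)
    s_r = sum(a <= p <= b for p in C) / len(C)
    print(round(abs(m_r - s_r), 4))                   # typically well below eps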

ε-Net A set N ⊆ x is an ε-net for x if ∀ r∈R, m̄(r) ≥ ε ⇒ r∩N ≠ ∅.

ε-Net Theorem (Haussler–Welzl) Let S = (X,R) with dimvc(S) = δ. Let x ⊆ X, |x| < ∞, 0 < ε ≤ 1 and 0 < φ < 1. Let N be a subset obtained by m random independent draws from x, where m ≥ max((4/ε)·log(4/φ), (8δ/ε)·log(16/ε)). Then N is an ε-net for x with probability at least 1-φ.
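
A sketch of the Haussler–Welzl sample size together with an empirical check of the ε-net property (Python, not part of the slides; intervals on the line serve as the illustrative range family, with δ = 2).

import math, random

def eps_net_size(delta, eps, phi):
    # m >= max( (4/eps)·log(4/phi), (8·delta/eps)·log(16/eps) )
    return math.ceil(max((4 / eps) * math.log(4 / phi),
                         (8 * delta / eps) * math.log(16 / eps)))

random.seed(1)
x = [random.uniform(0, 1) for _ in range(5000)]       # finite ground set x
eps, phi, delta = 0.1, 0.05, 2                        # intervals on the line: delta = 2
m = eps_net_size(delta, eps, phi)
N = [random.choice(x) for _ in range(m)]              # m independent draws from x

# check the eps-net property on 1000 random interval ranges:
# every "heavy" range (measure >= eps) should contain a point of N
misses = 0
for _ in range(1000):
    a, b = sorted(random.uniform(0, 1) for _ in range(2))
    if sum(a <= p <= b for p in x) / len(x) >= eps and not any(a <= q <= b for q in N):
        misses += 1
print(m, misses)   # the theorem promises misses = 0 (for all ranges at once) w.p. >= 1 - phi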

To be continued…