On the Complexity of Approximating the VC Dimension. Chris Umans, Microsoft Research. Joint work with Elchanan Mossel, Microsoft Research. June 2001.

The VC Dimension
C: a collection of subsets of a universe U.
VC(C) = VC dimension of C: the size of the largest subset T ⊆ U shattered by C.
T is shattered if every subset T' ⊆ T is expressible as T ∩ (an element of C).
Example: C = {{a}, {a, c}, {a, b, c}, {b, c}, {b}}. VC(C) = 2: {b, c} is shattered by C.
Plays an important role in learning theory, finite automata, comparability theory, computational geometry.
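To make the definition concrete, here is a minimal brute-force shattering test (a sketch, not from the talk; exponential time, usable only for tiny collections):

```python
from itertools import combinations

def vc_dimension(sets, universe):
    """Size of the largest T ⊆ universe shattered by `sets` (brute force)."""
    def shattered(T):
        # T is shattered iff intersecting T with members of `sets`
        # produces all 2^|T| subsets of T.
        patterns = {frozenset(T) & frozenset(S) for S in sets}
        return len(patterns) == 2 ** len(T)
    for k in range(len(universe), 0, -1):
        if any(shattered(T) for T in combinations(universe, k)):
            return k
    return 0

C = [{"a"}, {"a", "c"}, {"a", "b", "c"}, {"b", "c"}, {"b"}]
print(vc_dimension(C, ["a", "b", "c"]))  # → 2 ({b, c} is shattered)
```

Note that no 3-element set can be shattered here, since shattering it would require 8 distinct intersection patterns but |C| = 5.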

Complexity Questions
Given C, compute VC(C): since VC(C) ≤ log |C|, this can be done in O(n^{log n}) time (Linial-Mansour-Rivest 88).
Probably can't do better: the problem is LOGNP-complete (Papadimitriou-Yannakakis 96).
Often C has a small implicit representation: C(i, x) is a polynomial-size circuit such that C(i, x) = 1 iff x belongs to set i.
The implicit version is Σ₃ᵖ-complete (Schaefer 99), i.e. as hard as deciding ∃a ∀b ∃c φ(a, b, c) for a CNF formula φ.

Approximation
Given C (a circuit with N inputs), approximate VC(C).
Approximation within N^{1-ε} is NP-hard (Schaefer 99).
This paper:
Σ₃ᵖ-hard to approximate to within 2 − ε (for any ε > 0)
approximable to within 2 in AM
AM-hard to approximate to within N^ε (for some ε > 0) (any ε < 1 if optimal explicit dispersers exist)
(Picture: P ⊆ NP ⊆ AM ⊆ Σ₂ᵖ ⊆ Σ₃ᵖ ⊆ PSPACE)

Why Interesting
First constant approximability threshold for an optimization problem in the Polynomial Hierarchy.
We locate the threshold with unusual accuracy:
Σ₃ᵖ-hard to approximate within 2 − Ω(N^{−(1/4 − ε)}) (for any ε > 0)
approximable in AM to within 2 − O(N^{−1/2})
Main idea in the Σ₃ᵖ-hardness result: the desired reduction is essentially a randomness extraction problem.
The AM-hardness result requires strong amplification of AM using dispersers; the constant in the disperser seed length matters.

Outline for Rest of Talk
Arthur-Merlin protocol for 2-approximation
Σ₃ᵖ-hardness:
- Schaefer's Σ₃ᵖ-completeness proof
- why we need "randomness extraction"
- dispersers for simple distributions using list-decodable codes
AM-hardness
conclusions

2-approximation in AM
Sauer-Shelah(-Perles) Lemma: Let C be a collection of subsets of [n] such that |C| > Σ_{j=0}^{m} C(n, j). Then VC(C) ≥ m + 1.
Protocol, on mutual input a circuit C(i, x) (= 1 iff x is in set i):
- Merlin sends a set of k elements X = {x_0, x_1, ..., x_{k-1}}
- Arthur replies with a random k-bit string s
- Merlin sends an index i
- Accept iff C(i, x_j) = s_j for j = 0, 1, 2, ..., k-1
If VC(C) ≥ k, Merlin sends a shattered set and always wins.
If VC(C) < k/2, then VC(C restricted to X) < k/2, so by the lemma fewer than half of the 2^k patterns s are realized on X ⇒ Pr[Arthur accepts] < 1/2.
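The lemma can be checked exhaustively on a tiny universe. This sketch (our own toy verification, parameters n = 3, m = 1) confirms the contrapositive: every collection larger than the bound Σ_{j=0}^{m} C(n, j) has VC dimension at least m + 1.

```python
from itertools import combinations
from math import comb

def sauer_bound(n, m):
    # Σ_{j=0}^{m} C(n, j): max size of a collection over [n] with VC dim ≤ m
    return sum(comb(n, j) for j in range(m + 1))

def shatters(sets, T):
    return len({frozenset(T) & S for S in sets}) == 2 ** len(T)

def vc_dim(sets, n):
    return max((k for k in range(n + 1)
                if any(shatters(sets, T) for T in combinations(range(n), k))),
               default=0)

# Every collection of subsets of [3] with |C| > sauer_bound(3, 1) = 4
# must have VC dimension >= 2.
n, m = 3, 1
subsets = [frozenset(s) for k in range(n + 1) for s in combinations(range(n), k)]
for size in range(sauer_bound(n, m) + 1, len(subsets) + 1):
    assert all(vc_dim(list(coll), n) >= m + 1
               for coll in combinations(subsets, size))
print("Sauer-Shelah verified exhaustively for n=3, m=1")
```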

Schaefer's Reduction
φ(a, b, c): an instance of QSAT₃, with |a| = |b| = |c| = n.
The circuit C encodes these sets over the universe {0,1}^n × [n]:
S_(α, v, w) = {α} × {i : v_i = 1}   if φ(α, v, w) = 1
If ∀b ∃c φ(a, b, c) = 1, then for every b the collection C includes the set {a} × {i : b_i = 1}, and so the set {a} × [n] is shattered by C ⇒ VC(C) ≥ n.

Schaefer's Reduction
The circuit C encodes these sets over the universe {0,1}^n × [n]:
S_(α, v, w) = {α} × {i : v_i = 1}   if φ(α, v, w) = 1
In general, a shattered set has the form {a} × (a VC(C)-element subset of [n]).
If VC(C) ≥ n, then {a} × [n] is shattered, which implies ∀b ∃c φ(a, b, c) = 1.
For inapproximability, we want a relaxed version of this statement to hold for some δ close to ½:
if VC(C) ≥ δn, then some {a} × (set of δn positions) is shattered; does this imply ∀b ∃c φ(a, b, c) = 1 ???

A Randomness Extraction Problem
We have a distribution X ⊆ {0,1}^n with δn entropy.
We need: EXT taking x ∈ X (n bits) plus a seed y (O(log n) bits) to m = (δn)^Ω(1) uniform bits.
This gives us: ∀(x ∈ X) ∃c [∧_y φ(a, EXT(x, y), c_y)] ⇔ ∀b ∃c φ(a, b, c).
Note: we only need a disperser, BUT it must be zero-error! (it needs to hit all of {0,1}^m)
AND X is in a special class of distributions...

Generalized Bit-Fixing Sources
From what class of distributions is X? Recall that a set of the form {a} × (δn positions) is shattered.
This implies C includes sets such as:
{a} × ??0??0?0???0
{a} × ??0??0?0???1
...
{a} × ??1??1?1???1
where the δn marked positions run through all 2^{δn} patterns.
Projecting onto the δn marked positions, we get the uniform distribution: a "generalized bit-fixing source of dimension δn".
(Kahn-Kalai-Linial 88: no deterministic extraction if (1 − δ)n = Ω(n/log n))
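The definition can be illustrated by sampling from such a source. In this toy sketch (our own; the names and the particular dependence function are made up for illustration), k "free" positions carry uniform bits and every remaining bit is an arbitrary function of the free bits, so projecting onto the free positions recovers the uniform distribution:

```python
import random

def gbf_sample(n, free_positions, dependence):
    """One sample from a generalized bit-fixing source of dimension
    k = len(free_positions): free positions are uniform; the other bits
    are an arbitrary function (`dependence`) of the free bits."""
    free_bits = tuple(random.randrange(2) for _ in free_positions)
    x = [None] * n
    for pos, b in zip(free_positions, free_bits):
        x[pos] = b
    rest = iter(dependence(free_bits))      # fills the non-free positions in order
    return tuple(b if b is not None else next(rest) for b in x)

# n = 6, dimension k = 3: positions 1, 3, 4 are free; bits 0, 2, 5 depend on them.
n, free = 6, [1, 3, 4]
dep = lambda fb: (fb[0], 1 - fb[1], fb[0] ^ fb[2])
samples = [gbf_sample(n, free, dep) for _ in range(4000)]
projections = [tuple(x[i] for i in free) for x in samples]
# the projection onto the free positions is uniform: all 8 patterns occur
assert len(set(projections)) == 8
```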

Dispersers from Codes
We need EXT: {0,1}^n × {0,1}^t → {0,1}^m, with seed length t = O(log n) and output length m = (δn)^Ω(1), such that EXT(X, U_t) = {0,1}^m for all generalized bit-fixing sources X of dimension δn.
Take a binary list-decodable code ECC: {0,1}^m → {0,1}^n, where Decode(R, i) gives the i-th codeword within distance at most (1 − δ)n from R. Define EXT(x, i) ≝ Decode(x, i).
Proof: ∀z ∈ {0,1}^m ∃x ∈ X s.t. dist(ECC(z), x) ≤ (1 − δ)n (choose x ∈ X agreeing with ECC(z) on the δn free positions, which is possible since X hits every pattern there); therefore EXT(x, U_t) hits every z.
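The key distance claim in this proof can be verified by brute force on a toy instance (our own sketch; the parameters and the dependence function are made up): for every target string R — in particular every codeword ECC(z) — some point of a dimension-k generalized bit-fixing source lies within Hamming distance n − k of R.

```python
from itertools import product

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

# Toy source: n = 6, dimension k = 4 (free positions 0, 2, 3, 5),
# so the claimed distance bound is n - k = 2.
n, free = 6, [0, 2, 3, 5]
dep = lambda fb: (fb[0] ^ fb[1], 1 - fb[2])  # non-free bits depend on free bits

def support():
    # enumerate all 2^k points of the source
    for fb in product([0, 1], repeat=len(free)):
        x = [None] * n
        for pos, b in zip(free, fb):
            x[pos] = b
        rest = iter(dep(fb))
        yield tuple(b if b is not None else next(rest) for b in x)

# For EVERY R in {0,1}^n: pick the source point agreeing with R on the
# free positions; it can differ from R only on the n - k remaining bits.
for R in product([0, 1], repeat=n):
    assert min(hamming(R, x) for x in support()) <= n - len(free)
print("every string is within distance n - k of the source")
```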

Dispersers from Codes
Using Guruswami-Sudan 00:
Theorem: for all γ, ½ < γ ≤ ¾, there exists an explicit zero-error disperser EXT: {0,1}^n × {0,1}^{2(1−γ)log n + O(1)} → {0,1}^m for generalized bit-fixing sources of dimension δn = n/2 + n^γ, with m = n^Ω(1).
(Notice the degree is ≈ n^{1/2} instead of ≈ n.)

(2 − ε) Approximation is Σ₃ᵖ-Hard
φ(a, b, c): an instance of QSAT₃ with |a| = |b| = |c| = m.
The circuit C encodes these sets over the universe {0,1}^m × [n]:
S_(α, v, w) = {α} × {i : v_i = 1}   if (∧_y φ(α, EXT(v, y), w_y)) = 1
If ∀b ∃c φ(a, b, c) = 1, then {a} × [n] is shattered ⇒ VC(C) ≥ n.
If VC(C) ≥ (½ + ε)n = δn, then some {a} × (set of δn positions) is shattered, so C contains the sets {a} × {i : x_i = 1} for all x in some generalized bit-fixing source X of dimension δn
⇒ ∀(x ∈ X) ∃c [∧_y φ(a, EXT(x, y), c_y)] ⇔ ∀b ∃c φ(a, b, c).

N  Approximation is AM-hard language L in AM ⇔ exists poly-time computable R L : x ∈ L ⇒ Pr[  z R L (x, y, z) = 1] = 1 x ∉ L ⇒ Pr[  z R L (x, y, z) = 1] ≤ ½ strong amplification using dispersers yields: (   0) x ∉ L ⇒ Pr[  z R L (x, y, z) = 1] ≤ exp(|y|  -|y|) Circuit C encodes sets S (y, z) = y if R L (x, y, z)=1 if x ∈ L, then is shattered ⇒ VC(C) = |y| if x ∉ L, then #sets ≤ exp(|y|  ) ⇒ VC(C) ≤ |y|  size of instance (N) depends on degree of disperser to get gap of N 1- , need near-linear degree disperser

Conclusions
A fairly complete picture of the approximability of VC dimension.
List-decodable binary codes yield zero-error dispersers for a non-trivial class of distributions.
Some improvements and generalizations:
- Ta-Shma-Zuckerman-Safra 01 construct near-linear degree dispersers (available on ECCC)
- Using q-ary list-decodable codes, and a generalization of Sauer's Lemma, we obtain an approximability threshold of q for a generalization of VC dimension (in the final version)