On the Complexity of Approximating the VC Dimension
Chris Umans, Microsoft Research
Joint work with Elchanan Mossel, Microsoft Research
June 2001
The VC Dimension
- C: a collection of subsets of a universe U
- VC(C) = VC dimension of C: size of the largest subset T ⊆ U shattered by C
- T is shattered if every subset T' ⊆ T is expressible as T ∩ S for some S ∈ C
- Example: C = {{a}, {a, c}, {a, b, c}, {b, c}, {b}}
  VC(C) = 2; {b, c} is shattered by C
- Plays an important role in learning theory, finite automata, comparability theory, computational geometry
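The definition can be checked by brute force on the slide's example. The following is a minimal sketch, not an efficient algorithm; the function names `is_shattered` and `vc_dimension` are illustrative choices, and the search is capped at log2|C| because shattering T requires 2^|T| distinct sets.

```python
from itertools import combinations
from math import floor, log2


def is_shattered(T, C):
    """True iff every subset of T equals T ∩ S for some S in C."""
    T = frozenset(T)
    traces = {T & frozenset(S) for S in C}
    return len(traces) == 2 ** len(T)


def vc_dimension(C, universe):
    """Brute force over candidate subsets T of size at most log2|C|."""
    C = [frozenset(S) for S in C]
    best = 0  # the empty set is shattered by any non-empty collection
    max_size = floor(log2(len(C))) if C else 0
    for k in range(1, max_size + 1):
        if any(is_shattered(T, C) for T in combinations(universe, k)):
            best = k
    return best


# The example collection from the slide.
C = [{"a"}, {"a", "c"}, {"a", "b", "c"}, {"b", "c"}, {"b"}]
print(vc_dimension(C, "abc"))
print(is_shattered({"b", "c"}, C))
```

On this input the traces of {b, c} are ∅, {b}, {c} and {b, c}, matching the slide's claim that {b, c} is shattered.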
Complexity Questions
- Given C, compute VC(C)
  - since VC(C) ≤ log |C|, can compute in O(n^{log n}) time (Linial-Mansour-Rivest 88)
  - probably can't do better: problem is LOGNP-complete (Papadimitriou-Yannakakis 96)
- Often C has a small implicit representation: C(i, x) is a polynomial-size circuit such that C(i, x) = 1 iff x belongs to set i
  - implicit version is Σ_3^p-complete (Schaefer 99)
    (as hard as ∃a ∀b ∃c φ(a, b, c) for a CNF formula φ)
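The quasi-polynomial bound follows from a short counting argument. The sketch below assumes n denotes the size of the explicit input (so |C| ≤ n and |U| ≤ n) and that logarithms are base 2; it gives the bound in the form n^{O(log n)}.

```latex
\begin{align*}
T \text{ shattered by } C
  &\;\Rightarrow\; |C| \ge \bigl|\{\,T \cap S : S \in C\,\}\bigr| = 2^{|T|}
  \;\Rightarrow\; |T| \le \log |C| \le \log n, \\
\#\{\,T \subseteq U : |T| \le \log n\,\}
  &\le \sum_{k=0}^{\lfloor \log n \rfloor} \binom{n}{k} = n^{O(\log n)} .
\end{align*}
```

Each candidate T can be tested for shattering in time polynomial in n, so exhaustive search over these candidates stays within n^{O(log n)}.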
Approximation
- Given C (circuit with N inputs), approximate VC(C)
- approximation within N^{1-ε} is NP-hard (Schaefer 99)
- this paper:
  - Σ_3^p-hard to approximate to within 2-ε (for any ε > 0)
  - approximable to within 2 in AM
  - AM-hard to approximate to within N^ε (for some ε > 0)
    (any ε < 1 if optimal explicit dispersers exist)
[Figure: diagram of complexity classes P, NP, AM, Π_2^p, Σ_3^p, PSPACE]
Why Interesting
- first constant approximability threshold for an optimization problem in the Polynomial Hierarchy
- we locate the threshold with unusual accuracy:
  - Σ_3^p-hard to within 2 - Ω(N^{-(1/4 - ε)}) (for any ε > 0)
  - approximable in AM to within 2 - O(N^{-1/2})
- main idea in Σ_3^p-hardness result: desired reduction is essentially a randomness extraction problem
- AM-hardness result requires strong amplification of AM using dispersers
  - constant in disperser seed length matters
Outline for Rest of Talk
- Arthur-Merlin protocol for 2-approximation
- Σ_3^p-hardness
  - Schaefer's Σ_3^p-completeness proof
  - why we need "randomness extraction"
  - dispersers for simple distributions using list-decodable codes
- AM-hardness
- conclusions
2-approximation in AM
- Sauer-Shelah(-Perles) Lemma: Let C be a collection of subsets of [n] such that |C| > Σ_{j=0}^{m} (n choose j). Then VC(C) ≥ m+1.
- Protocol:
  - Mutual input: circuit C(i, x) (= 1 iff x is in set i)
  - Merlin sends a set of k elements X = {x_0, x_1, ..., x_{k-1}}
  - Arthur replies with a random k-bit string s
  - Merlin sends an index i
  - Accept iff C(i, x_j) = s_j for j = 0, 1, 2, ..., k-1
- VC(C) < k/2 ⇒ VC(C ∩ X) < k/2 ⇒ Pr[Arthur accepts] ≤ 1/2
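The soundness step can be made concrete: for a fixed X, Arthur accepts exactly when his string s is one of the membership patterns that some set realizes on X, and the Sauer-Shelah bound caps the number of such patterns. The sketch below computes this probability exactly for a toy set system; `member`, `acceptance_probability` and `sauer_bound` are hypothetical names, and the toy system (all subsets of {0, 1, 2} inside an 8-element universe) is an assumption made for illustration.

```python
from math import comb


def acceptance_probability(member, indices, X):
    """Exact probability, over Arthur's random s, that some index i
    satisfies member(i, x_j) == s_j for every j."""
    patterns = {tuple(member(i, x) for x in X) for i in indices}
    return len(patterns) / 2 ** len(X)


def sauer_bound(k, d):
    """Sauer-Shelah: max number of patterns on k points if VC dimension <= d."""
    return sum(comb(k, j) for j in range(d + 1))


def member(i, x):
    """Toy implicit representation: set i is the subset of {0, 1, 2}
    whose bitmask is i; elements 3..7 belong to no set."""
    return (i >> x) & 1 if x < 3 else 0


indices = range(8)

# Honest Merlin: VC(C) = 3, so with k = 3 he sends a shattered X.
print(acceptance_probability(member, indices, (0, 1, 2)))

# Cheating Merlin: k = 8 but VC(C) = 3 < k/2, so at most half of all s match.
print(acceptance_probability(member, indices, tuple(range(8))))

# VC < k/2 means VC <= (k-1)//2; the bound is then at most 2^(k-1).
print(all(sauer_bound(k, (k - 1) // 2) <= 2 ** (k - 1) for k in range(1, 20)))
```

The last line checks the arithmetic behind the 1/2 soundness bound: when the VC dimension restricted to X is below k/2, at most 2^(k-1) of the 2^k strings s can be answered.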
Schaefer's Reduction
- φ(a, b, c): an instance of QSAT_3 with |a| = |b| = |c| = n
- Circuit C encodes these sets over universe {0,1}^n × [n]:
  S_(α, v, w) = {α} × v if φ(α, v, w) = 1   (v ∈ {0,1}^n read as a subset of [n])
- if ∀b ∃c φ(a, b, c) = 1, then C includes the sets {a} × b for every b ∈ {0,1}^n,
  and so the set {a} × 1^n is shattered by C ⇒ VC(C) ≥ n
Schaefer's Reduction
- Circuit C encodes these sets over universe {0,1}^n × [n]:
  S_(α, v, w) = {α} × v if φ(α, v, w) = 1
- In general, a set of the form {a} × (string with VC(C) ones) is shattered
- if VC(C) ≥ n, then {a} × 1^n is shattered, which implies ∀b ∃c φ(a, b, c) = 1
- For inapproximability, we want a relaxed version of this statement to hold for some δ close to ½:
  if VC(C) ≥ δn, then {a} × (string with δn ones) is shattered, which implies ∀b ∃c φ(a, b, c) = 1 ???
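The exact (non-relaxed) direction of the reduction can be sanity-checked on tiny instances. This is a brute-force sketch under stated assumptions: φ is given as a Python predicate over bit tuples, the set system is built explicitly rather than as a circuit, n ≥ 2, and the names `schaefer_sets`, `vc_at_least`, `qsat3`, `phi_yes` and `phi_no` are hypothetical.

```python
from itertools import combinations, product


def schaefer_sets(phi, n):
    """Explicit version of the sets S_(a, v, w) = {a} x v for phi(a, v, w) = 1,
    with v read as the subset {j : v[j] = 1} of positions."""
    bits = list(product((0, 1), repeat=n))
    return {frozenset((a, j) for j in range(n) if v[j])
            for a in bits for v in bits for w in bits if phi(a, v, w)}


def vc_at_least(C, k):
    """True iff some k-element subset of the universe is shattered by C."""
    universe = sorted({e for S in C for e in S})
    return any(len({frozenset(T) & S for S in C}) == 2 ** k
               for T in combinations(universe, k))


def qsat3(phi, n):
    """Brute-force evaluation of: exists a, for all b, exists c, phi(a, b, c)."""
    bits = list(product((0, 1), repeat=n))
    return any(all(any(phi(a, b, c) for c in bits) for b in bits) for a in bits)


n = 2


def phi_yes(a, b, c):
    # Satisfied for a = (0, 0) and every b by choosing c = b.
    return a == (0, 0) and c == b


def phi_no(a, b, c):
    # Fails for every a whenever b[0] = 0.
    return b[0] == 1


for phi in (phi_yes, phi_no):
    print(qsat3(phi, n), vc_at_least(schaefer_sets(phi, n), n))
```

For both toy formulas the two printed values agree, which is the equivalence VC(C) ≥ n ⇔ ∃a ∀b ∃c φ(a, b, c) on these instances.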
A Randomness Extraction Problem
- we have a distribution X ⊂ {0,1}^n with δn entropy
- we need:
  [Diagram: EXT takes x ∈ X (n bits) and a seed y (O(log n) bits) and outputs m = (δn)^{Ω(1)} bits; label: uniform]
- this gives us:
  (∀x ∈ X) ∃c⃗ ⋀_y φ(a, EXT(x, y), c_y) ⇔ ∀b ∃c φ(a, b, c)
- Note:
  - we only need a disperser
  - we need zero-error! (need to hit all of {0,1}^m)
- BUT, X is in a special class of distributions...
Generalized Bit-Fixing Sources
- From what class of distributions is X?
- recall: a set of the form {a} × (string with δn ones) is shattered
- implies C includes the following sets (the δn marked positions take every possible pattern):
  {a} × ??0??0?0???0
  {a} × ??0??0?0???1
  ...
  {a} × ??1??1?1???1
  these sets = {a} × X
- projecting onto the δn red positions, we get the uniform distribution
- "generalized bit-fixing source of dimension δn"
  (Kahn-Kalai-Linial 88: no deterministic extraction if (1 - δ)n ≥ Ω(n/log n))
Dispersers from Codes
[Diagram: EXT takes x ∈ X (n bits) and a seed (t = O(log n) bits) and outputs m = (δn)^{Ω(1)} bits; label: uniform]
- we need: EXT(X, U_t) = {0,1}^m for all generalized bit-fixing sources X of dimension δn
- binary list-decodable code ECC: {0,1}^m → {0,1}^n
  - Decode(R, i) gives the i-th codeword within distance at most (1-δ)n from R
- EXT(x, i) ≝ Decode(x, i)
- Proof: ∀z ∈ {0,1}^m ∃x ∈ X s.t. dist(ECC(z), x) ≤ (1-δ)n;
  therefore, EXT(X, U_t) hits every z.
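The proof on this slide can be exercised on a toy instance. The sketch below is not the Guruswami-Sudan construction: it assumes a hypothetical 2-bit-to-4-bit code `ecc`, decodes by exhaustive search, returns messages rather than codewords, and enumerates the minimal generalized bit-fixing sources of dimension 3 (for each choice of 3 free positions, every way of filling the remaining bit).

```python
from itertools import combinations, product

N, M, D = 4, 2, 3        # toy sizes: codeword bits, message bits, source dimension
RADIUS = N - D           # plays the role of (1 - delta) * n

MESSAGES = list(product((0, 1), repeat=M))
WORDS = list(product((0, 1), repeat=N))


def ecc(z):
    """Hypothetical toy code {0,1}^2 -> {0,1}^4, chosen only for illustration."""
    return (z[0], z[1], z[0] ^ z[1], z[0])


def dist(u, v):
    """Hamming distance."""
    return sum(a != b for a, b in zip(u, v))


def decode_list(r):
    """All messages whose codeword lies within RADIUS of r (brute force)."""
    return [z for z in MESSAGES if dist(ecc(z), r) <= RADIUS]


# The seed only needs to index into the longest possible list.
NUM_SEEDS = max(len(decode_list(r)) for r in WORDS)


def ext(x, i):
    """EXT(x, i): the i-th message on x's list, or None if the list is shorter."""
    lst = decode_list(x)
    return lst[i] if i < len(lst) else None


def hits_everything(X):
    """Zero-error disperser property: EXT(X, all seeds) covers {0,1}^M."""
    outputs = {ext(x, i) for x in X for i in range(NUM_SEEDS)}
    return outputs >= set(MESSAGES)


def sources():
    """Minimal generalized bit-fixing sources of dimension D: the D free
    positions take every pattern; the other bits depend arbitrarily on it."""
    for pos in combinations(range(N), D):
        rest = [j for j in range(N) if j not in pos]
        fills = list(product((0, 1), repeat=len(rest)))
        for choice in product(fills, repeat=2 ** D):
            X = set()
            for free, fill in zip(product((0, 1), repeat=D), choice):
                x = [0] * N
                for p, b in zip(pos, free):
                    x[p] = b
                for p, b in zip(rest, fill):
                    x[p] = b
                X.add(tuple(x))
            yield X


print(all(hits_everything(X) for X in sources()))
```

The check succeeds for the reason given in the proof: every source contains some x that agrees with ECC(z) on the free positions, so z appears on x's list. What a good list-decodable code buys is a short list, and hence a short seed.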
Dispersers from Codes
[Diagram: EXT takes x ∈ X (n bits) and a seed (t = O(log n) bits) and outputs m = (δn)^{Ω(1)} bits; label: uniform]
- using Guruswami-Sudan 00:
- Theorem: for all 1 > γ > ¾, there exists an explicit zero-error disperser
  EXT: {0,1}^n × {0,1}^{2(1-γ) log n + O(1)} → {0,1}^m
  for generalized bit-fixing sources of dimension δn = n/2 + n^γ, with m = n^{Ω(1)}.
  (notice degree is ≈ n^{1/2} instead of ≈ n)
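The parameters in the theorem can be unpacked with a little arithmetic. This is a sketch under assumptions: the theorem's parameter is written γ, the dimension is read as δn = n/2 + n^γ with γ = ¾ + ε, and everything is expressed in terms of n rather than the instance size N.

```latex
\begin{align*}
\text{degree} &= 2^{t} = 2^{\,2(1-\gamma)\log n + O(1)}
  = O\!\left(n^{2(1-\gamma)}\right) = O\!\left(n^{1/2 - 2\epsilon}\right), \\
\delta &= \frac{1}{2} + n^{\gamma - 1} = \frac{1}{2} + n^{-(1/4 - \epsilon)}, \\
\frac{1}{\delta} &= \frac{2}{1 + 2n^{\gamma - 1}}
  = 2 - \frac{4\,n^{\gamma - 1}}{1 + 2n^{\gamma - 1}}
  = 2 - \Theta\!\left(n^{-(1/4 - \epsilon)}\right).
\end{align*}
```

The ratio 1/δ is the gap between the "yes" case (VC(C) ≥ n) and the "no" case (VC(C) < δn), which is where a hardness factor of the form 2 minus a polynomially small term comes from.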
(2-ε) Approximation is Σ_3^p-Hard
- φ(a, b, c): an instance of QSAT_3 with |a| = |b| = |c| = m
- Circuit C encodes these sets over universe {0,1}^m × [n]:
  S_(α, v, w) = {α} × v if (⋀_y φ(α, EXT(v, y), w_y)) = 1
- if ∀b ∃c φ(a, b, c) = 1, then {a} × 1^n is shattered ⇒ VC(C) ≥ n
- if VC(C) ≥ (1/2 + ε)n = δn, then {a} × (string with δn ones) is shattered, so
  C contains the sets {a} × x, x ∈ X, for some generalized bit-fixing source X of dimension δn
  ⇒ (∀x ∈ X) ∃c⃗ ⋀_y φ(a, EXT(x, y), c_y) ⇔ ∀b ∃c φ(a, b, c)
N^ε Approximation is AM-hard
- language L in AM ⇔ there exists a poly-time computable R_L such that:
  x ∈ L ⇒ Pr_y[∃z R_L(x, y, z) = 1] = 1
  x ∉ L ⇒ Pr_y[∃z R_L(x, y, z) = 1] ≤ ½
- strong amplification using dispersers yields (∀ε > 0):
  x ∉ L ⇒ Pr_y[∃z R_L(x, y, z) = 1] ≤ exp(|y|^ε - |y|)
- Circuit C encodes the sets S_(y, z) = y if R_L(x, y, z) = 1 (y read as a subset of [|y|])
- if x ∈ L, then the whole universe [|y|] is shattered ⇒ VC(C) = |y|
- if x ∉ L, then #sets ≤ exp(|y|^ε) ⇒ VC(C) ≤ |y|^ε
- size of instance (N) depends on degree of disperser
  - to get a gap of N^{1-ε}, need a near-linear degree disperser
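The "no" case is a counting argument, written out below. The sketch assumes exp(·) means 2^(·) and uses the bound VC(C) ≤ log |C|.

```latex
\begin{align*}
x \notin L:\quad
\#\{\,y : \exists z\; R_L(x, y, z) = 1\,\}
  &\le 2^{|y|} \cdot 2^{\,|y|^{\epsilon} - |y|} = 2^{\,|y|^{\epsilon}}, \\
|C| \le 2^{\,|y|^{\epsilon}}
  &\;\Rightarrow\; \mathrm{VC}(C) \le \log |C| \le |y|^{\epsilon}, \\
x \in L:\quad \mathrm{VC}(C) = |y|
  &\;\Rightarrow\; \text{gap} \ge |y|^{1 - \epsilon}.
\end{align*}
```

The gap is stated here in terms of |y|; translating it into the instance size N is where the degree of the disperser used for amplification enters.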
Conclusions
- fairly complete picture of approximability of VC dimension
- list-decodable binary codes yield zero-error dispersers for a non-trivial class of distributions
- Some improvements and generalizations:
  - Ta-Shma-Zuckerman-Safra 01 construct near-linear degree dispersers (available on ECCC)
  - Using q-ary list-decodable codes, and a generalization of Sauer's Lemma, we obtain an approximability threshold of q for a generalization of VC dimension (in final version)