Sampling Lower Bounds via Information Theory
Ziv Bar-Yossef, IBM Almaden


Slide 1: Sampling Lower Bounds via Information Theory (Ziv Bar-Yossef, IBM Almaden)

Slide 2: Standard Approach to Hardness of Approximation
Hardness of approximation for f: Xⁿ → Y is derived from the hardness of a decision "promise problem": two disjoint sets A, B ⊆ Xⁿ such that for all a ∈ A and b ∈ B, f(a) is "far" from f(b). The problem: given x ∈ A ∪ B, decide whether x ∈ A.

Slide 3: The "Election Problem"
Input: a sequence x of n votes for k parties, with vote distribution μ_x. (Figure example: n = 18, k = 6, with party frequencies 7/18, 4/18, 3/18, 2/18, 1/18, …)
Goal: obtain ν such that ||ν − μ_x|| < ε. How big a poll should we conduct?
For every S ⊆ [k] it is easy to decide between A = { x | μ_x(S) ≥ ½ + ε } and B = { x | μ_x(S) ≤ ½ − ε }. Hardness comes from the abundance of such decision problems: the poll has to be of size Ω(k).
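The polling question above can be sketched with a toy estimator. This is a minimal sketch, not from the talk: the helper names `poll` and `l1_distance` are ours, and sampling with replacement stands in for a real poll.

```python
import random
from collections import Counter

def poll(votes, q, rng):
    """Estimate the vote distribution mu_x by sampling q ballots
    uniformly at random (with replacement)."""
    sample = [votes[rng.randrange(len(votes))] for _ in range(q)]
    counts = Counter(sample)
    return {party: counts[party] / q for party in counts}

def l1_distance(p, q_dist):
    """L1 distance between two distributions given as dicts."""
    support = set(p) | set(q_dist)
    return sum(abs(p.get(z, 0.0) - q_dist.get(z, 0.0)) for z in support)

# Toy electorate in the spirit of the slide's example (k = 6 parties,
# n = 18 votes; the sixth party's share is our assumption).
votes = [1]*7 + [2]*4 + [3]*3 + [4]*2 + [5]*1 + [6]*1
estimate = poll(votes, q=20000, rng=random.Random(0))
```

With q = 20000 samples the empirical distribution is very close to μ_x in L1 distance; the lower bound on this slide concerns how small q can be made as a function of k and ε.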

Slide 4: Similarity Hardness vs. Abundance Hardness
Hardness of approximation for f: Xⁿ → Y can come from two sources: the hardness of a single decision "promise problem" ("similarity hardness") and the abundance of such decision problems ("abundance hardness").
In this talk: a lower bound technique that captures both types of hardness in the context of sampling algorithms.

Slide 5: Why Sampling?
(Diagram: an algorithm reads the input data set through a small number of queries.) Queries can be chosen randomly, and the output is typically approximate; the payoff is sub-linear time and space.

Slide 6: Some Examples
Statistics: statistical decision and estimation, statistical learning, …
CS: PAC and machine learning, property testing, sub-linear time approximation algorithms, extractors and dispersers, …

Slide 7: Query Complexity
Query complexity of a function f: the number of queries required to approximate f.
High query complexity: parity, number of distinct elements.
Low query complexity: mean of values in [0,1], median.
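As a sketch of why the mean has low query complexity (the function name `estimate_mean` is ours): by Hoeffding's inequality, O(1/ε²) uniform queries estimate the mean of values in [0,1] to within ±ε with high probability, independent of the input length n.

```python
import random

def estimate_mean(x, q, rng):
    """Average of q uniformly random queries into x. Values lie in [0,1],
    so O(1/eps^2) queries give additive error eps with high probability,
    no matter how large len(x) is."""
    n = len(x)
    return sum(x[rng.randrange(n)] for _ in range(q)) / q

# A large input whose true mean is exactly 0.5.
x = [i / 9999 for i in range(10000)]
est = estimate_mean(x, q=4000, rng=random.Random(1))
```

By contrast, parity flips under a change to any single entry, so no sub-linear number of queries can approximate it, matching the "high query complexity" examples above.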

Slide 8: Our Main Result
A technique for obtaining lower bounds on the query complexity of approximating functions:
- A template for obtaining specific lower bounds.
- Handles arbitrary domains and ranges, and all types of approximation.
- Usable for wide classes of functions with symmetry properties.
- Outperforms previous techniques for functions with "abundance hardness"; matches previous techniques for functions with "similarity hardness".

Slide 9: Previous Work
Statistics: the Cramér-Rao inequality, VC dimension, optimality of the sequential probability ratio test.
CS: lower bounds via the Hellinger distance [B., Kumar, Sivakumar 01]; specific lower bounds [Canetti, Even, Goldreich 95], [Radhakrishnan, Ta-Shma 96], [Dagum, Karp, Luby, Ross 95], [Schulman, Vazirani 99], [Charikar, Chaudhuri, Motwani, Narasayya 00].
None addresses abundance hardness!

Slide 10: Reduction from a Multi-Way Promise Problem
Binary promise problem: given x ∈ { a, b }, decide whether x = a or x = b.
Multi-way promise problem: given x ∈ { a, b, c, … } drawn from a set of pairwise "disjoint" inputs (f(a), f(b), f(c), … pairwise "far" in Y), decide which input x is.
The multi-way problem can be solved by any sampling algorithm approximating f: Xⁿ → Y.

Slide 11: Main Result, the Lower Bound "Recipe"
Let f: Xⁿ → Y be a function with an appropriate symmetry property.
1. Identify a set S = { x₁, …, x_m } of "pairwise disjoint" inputs.
2. Calculate the "dissimilarity" D(x₁, …, x_m) among x₁, …, x_m. (D(·, …, ·) is a distance measure taking values in [0, log m].)
Theorem: any algorithm approximating f requires q = Ω(log m / D(x₁, …, x_m)) queries.
This expresses a tradeoff between "similarity hardness" (small dissimilarity D) and "abundance hardness" (large m).

Slide 12: Measure of Dissimilarity
Let μᵢ be the distribution of the value of a uniformly chosen entry of xᵢ. Then D(x₁, …, x_m) = JS(μ₁, …, μ_m), the Jensen-Shannon divergence among μ₁, …, μ_m (defined on slide 23).

Slide 13: Application I, the Election Problem
Previous bounds on the query complexity: Ω(1/ε²) [BKS01], Ω(k) [Batu et al. 00], and an upper bound of O(k/ε²) [BKS01].
Theorem [this paper]: Ω(k/ε²).

Slide 14: Combinatorial Designs
t-design: a family B₁, …, B_m of subsets of [k] with large pairwise differences, |Bᵢ \ Bⱼ| ≥ k/t for all i ≠ j. (Figure: overlapping subsets B₁, B₂, B₃ of [k].)
Proposition: for all k and for all t ≥ 12, there exists a t-design of size m = 2^Ω(k).

Slide 15: Proof of the Lower Bound
Step 1: identification of a set S of pairwise disjoint inputs. Let B₁, …, B_m ⊆ [k] be a t-design of size m = 2^Ω(k), and let S = { x₁, …, x_m }, where xᵢ gives a ½ + ε fraction of the votes to the parties in Bᵢ and a ½ − ε fraction to the parties in [k] \ Bᵢ.
Step 2: dissimilarity calculation: D(x₁, …, x_m) = O(ε²).
By the main theorem, the number of queries is at least Ω(log m / ε²) = Ω(k/ε²).

Slide 16: Application II, Low Rank Matrix Approximation
Exact low rank approximation: given an m × n real matrix M and k ≤ m, n, find the m × n matrix M_k of rank k for which ||M − M_k||_F is minimized. Solution: the SVD, but it requires querying all of M.
Approximate low rank approximation (LRM_k): find a rank-k matrix A such that ||M − A||_F ≤ ||M − M_k||_F + ε||M||_F.
Theorem [this paper]: computing LRM_k requires Ω(m + n) queries.
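The exact SVD solution can be sketched in a few lines of NumPy (the function name `best_rank_k` is ours); note that it reads every entry of M, which is exactly what the approximate version LRM_k tries to avoid:

```python
import numpy as np

def best_rank_k(M, k):
    """Best rank-k approximation M_k of M in Frobenius norm, via the SVD.
    This reads every entry of M; LRM_k asks how few queried entries
    suffice for an additive eps*||M||_F error."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Keep the top k singular triples; scaling columns of U by s[:k]
    # reconstructs U_k diag(s_k) V_k^T.
    return (U[:, :k] * s[:k]) @ Vt[:k]
```

By the Eckart-Young theorem, the squared Frobenius error of this M_k equals the sum of the squared discarded singular values, which is the benchmark in the definition of LRM_k above.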

Slide 17: Proof of the Lower Bound
Step 1: identification of a set S of pairwise disjoint inputs. Let B₁, …, B_t ⊆ [2k] be a combinatorial design of size t = 2^Ω(k), and let S = { M₁, …, M_t }, where Mᵢ is all-zero except for the diagonal of its leading 2k × 2k block, which is the characteristic vector of Bᵢ.
Properties: Mᵢ has rank k, so (Mᵢ)_k = Mᵢ; ||Mᵢ||_F = k^½; and ||Mᵢ − Mⱼ||_F ≥ |Bᵢ \ Bⱼ|^½ ≥ (k/12)^½ ≥ ε(||Mᵢ||_F + ||Mⱼ||_F).
Step 2: dissimilarity calculation: D(M₁, …, M_t) = 2k/m.
By the main theorem, the number of queries is at least Ω(m).

Slide 18: Low Rank Matrix Approximation (cont.)
Theorem [Frieze, Kannan, Vempala 98]: by querying an s × s submatrix of M chosen using any distributions that "approximate" the row and column weight distributions of M, one can solve LRM_k with s = O(k⁴/ε³).
Theorem [this paper]: solving LRM_k by querying an s × s submatrix of M, chosen even according to the exact row and column weight distributions of M, requires s = Ω(k/ε²).

Slide 19: Oblivious Sampling
Query positions are chosen independently of the given input: the algorithm has a fixed query distribution μ on [n]^q. Phase 1: choose query positions i₁, …, i_q. Phase 2: query x_{i₁}, …, x_{i_q}.
i.i.d. queries: the queries are independent and identically distributed, μ = ν^q for some distribution ν on [n].

Slide 20: Main Theorem, Outline of the Proof
Adaptive sampling → oblivious sampling with i.i.d. queries (for functions with symmetry properties) → statistical classification → lower bounds via information theory.

Slide 21: Statistical Classification
μ₁, …, μ_m are known distributions on Z. A black box holds μᵢ for some unknown i ∈ [m]; the classifier receives q i.i.d. samples from μᵢ and is required to output i with probability ≥ 1 − δ.
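A classifier for this setting can be sketched by maximum likelihood; the function name `ml_classify` and the dict representation of the μᵢ are our assumptions, not from the talk:

```python
import math

def ml_classify(samples, dists):
    """Maximum-likelihood classification: output the index i whose
    distribution mu_i gives the observed i.i.d. samples the highest
    probability (a tiny floor avoids log of zero)."""
    def log_likelihood(d):
        return sum(math.log(d.get(z, 1e-12)) for z in samples)
    return max(range(len(dists)), key=lambda i: log_likelihood(dists[i]))

# Two distinguishable distributions on Z = {"a", "b"}.
dists = [{"a": 0.9, "b": 0.1}, {"a": 0.1, "b": 0.9}]
```

The lower bound on the next slides says how many samples q any such classifier, maximum-likelihood or otherwise, must draw as a function of m and the dissimilarity of the μᵢ.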

Slide 22: From Sampling to Classification
Let T be an oblivious algorithm with query distribution μ = ν^q that approximates f: Xⁿ → Y, and let μ_x be the joint distribution of a query and its answer when T runs on input x (a distribution on [n] × X).
Given a set S = { x₁, …, x_m } of pairwise disjoint inputs, T yields a classifier for μ_{x₁}, …, μ_{x_m}: run T on the q i.i.d. samples from the black box and decide i iff T's output ∈ A(xᵢ).

Slide 23: Jensen-Shannon Divergence [Lin 91]
KL divergence between distributions μ, ν on Z: KL(μ ‖ ν) = Σ_{z ∈ Z} μ(z) log(μ(z)/ν(z)).
Jensen-Shannon divergence among distributions μ₁, …, μ_m on Z: JS(μ₁, …, μ_m) = (1/m) Σᵢ KL(μᵢ ‖ μ̄), where μ̄ = (1/m) Σᵢ μᵢ.
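The two definitions above can be computed directly. A small sketch with distributions represented as dicts (helper names ours), using log base 2 so that JS takes values in [0, log₂ m]:

```python
import math

def kl(p, q):
    """KL divergence KL(p || q) in bits, with the convention 0 log 0 = 0."""
    return sum(pz * math.log2(pz / q[z]) for z, pz in p.items() if pz > 0)

def js(dists):
    """Jensen-Shannon divergence: the average KL divergence of each
    distribution from the uniform mixture of all of them."""
    m = len(dists)
    support = set().union(*dists)
    mix = {z: sum(d.get(z, 0.0) for d in dists) / m for z in support}
    return sum(kl(d, mix) for d in dists) / m
```

m identical distributions give JS = 0, while m distributions with pairwise disjoint supports attain the maximum value log₂ m, matching the range [0, log m] of the dissimilarity measure D on slide 11.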

Slide 24: Main Result
Theorem [classification lower bound]: any δ-error classifier for μ₁, …, μ_m requires q queries, where q ≥ ((1 − δ) log m − 1) / JS(μ₁, …, μ_m).
Corollary [query complexity lower bound]: for any oblivious algorithm with query distribution μ = ν^q that (ε, δ)-approximates f, and for any set S = { x₁, …, x_m } of "pairwise disjoint" inputs, the number of queries q is at least ((1 − δ) log m − 1) / JS(μ_{x₁}, …, μ_{x_m}).

Slide 25: Outline of the Proof
Lemma 1 [classification error lower bound]: the error of any classifier on q i.i.d. samples is bounded below in terms of JS(μ₁^q, …, μ_m^q) and log m. Proof: by Fano's inequality.
Lemma 2 [decomposition of Jensen-Shannon]: JS(μ₁^q, …, μ_m^q) ≤ q · JS(μ₁, …, μ_m). Proof: by subadditivity of entropy and conditional independence.

Slide 26: Conclusions
General lower bound technique for the query complexity: a template for obtaining specific bounds; works for wide classes of functions; captures both "similarity hardness" and "abundance hardness".
Applications: the "Election Problem", low rank matrix approximation, matrix reconstruction.
Also proved: a lower bound technique for the expected query complexity, which tightly captures similarity hardness but not abundance hardness.
Open problems: tight bounds for low rank matrix approximation; better lower bounds on the expected query complexity; lower bounds for non-symmetric functions.

Slide 27: Simulation of Adaptive Sampling by Oblivious Sampling
Definition: f: Xⁿ → Y is symmetric if for all x and all permutations π ∈ S_n, f(π(x)) = f(x). f is A-symmetric if for all x and π, A(π(x)) = A(x), where A(x) denotes the set of acceptable answers on x.
Lemma [BKS01]: any q-query algorithm approximating an A-symmetric f can be simulated by a q-query oblivious algorithm whose queries are uniform without replacement.
Corollary: if q < n/2, it can be simulated by a 2q-query oblivious algorithm whose queries are uniform with replacement.

Slide 28: Simulation Lemma, Outline of the Proof
Let T be a q-query sampling algorithm approximating f; WLOG, T never queries the same location twice.
Simulation: pick a random permutation π and run T on π(x). By A-symmetry, the output is likely to be in A(π(x)) = A(x), and the queries to x are uniform without replacement.
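The simulation step can be sketched as follows; `T` stands for any adaptive algorithm given oracle access to the input, and all names are ours:

```python
import random

def simulate(T, x, rng):
    """Run adaptive algorithm T on a random permutation pi(x).
    T sees pi(x) through the oracle; its query positions, mapped back
    to x, are uniform without replacement, and for an A-symmetric f
    the output remains acceptable since A(pi(x)) = A(x)."""
    n = len(x)
    pi = list(range(n))
    rng.shuffle(pi)
    oracle = lambda i: x[pi[i]]
    return T(oracle, n)

# Example adaptive T: query everything and return the sorted multiset,
# a symmetric function, so the permutation cannot change the answer.
T = lambda oracle, n: tuple(sorted(oracle(i) for i in range(n)))
```

From T's point of view nothing changed, but from the input's point of view the queries are now oblivious, which is what lets the classification lower bound apply.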

Slide 29: Extensions
Definitions: f is (g, A)-symmetric if for all x, all π, and all y ∈ A(π(x)), g(π, y) ∈ A(x). A function f on m × n matrices is A-row-symmetric if for all matrices M and all row-permutation matrices P, A(P·M) = A(M); similarly for A-column-symmetry and (g, A)-row- and column-symmetry.
We prove that similar simulations hold for all of the above.

