
1 Smooth Boolean Functions are Easy: Efficient Algorithms for Low-Sensitivity Functions Rocco Servedio Joint work with Parikshit Gopalan (MSR) Noam Nisan (MSR / Hebrew University) Kunal Talwar (Google) Avi Wigderson (IAS) ITCS 2016

2 The star of our show: f: {0,1}^n → {0,1}

3 “Complexity” and “Boolean functions”. Complexity measures: combinatorial/analytic ways to “get a handle on how complicated f is”. Examples: certificate complexity; decision tree depth (deterministic, randomized, quantum); sensitivity; block sensitivity; PRAM complexity; real polynomial degree (exact, approximate); … All lie in {0,1,…,n} for n-variable Boolean functions.

4 “Complexity” and “Boolean functions”, revisited. Complexity classes: computational ways to “get a handle on how complicated f is”. Examples: unrestricted circuit size; unrestricted formula size; AC^0 circuit size; DNF size; … All lie in {0,1,…,2^n} for n-variable Boolean functions.

5 High-level summary of this work: a computational perspective on a classic open question about complexity measures of Boolean functions… namely, the sensitivity conjecture of [NisanSzegedy92].

6 Background: complexity measures. Certificate complexity; decision tree depth (deterministic, randomized, quantum); block sensitivity; PRAM complexity; real polynomial degree (exact, approximate). Fundamental result(s) in Boolean function complexity: for any Boolean function f, the above complexity measures are all polynomially related to each other.

7 Examples: DT-depth and real degree. DT-depth(f) = minimum depth of any decision tree computing f. deg_R(f) = degree of the unique real multilinear polynomial computing f: {0,1}^n → {0,1}. [Figure: a decision tree of depth 4, so its DT-depth is 4.]

8 DT-depth and real degree are polynomially related. Theorem [NisanSzegedy92, NisanSmolensky, Midrijanis04]: for any Boolean function f, deg_R(f) ≤ DT-depth(f) ≤ 2·deg_R(f)^3. (The lower bound is trivial: for each 1-leaf at depth d, there is a degree-d polynomial outputting 1 iff its input reaches that leaf, else 0; sum these over the 1-leaves. Polynomial for the pictured leaf: x_1 x_2 (1 − x_4) x_3.)
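The leaf construction in the parenthetical can be written out directly. The sketch below uses the slide's example leaf polynomial x_1 x_2 (1 − x_4) x_3 (the variable order along the path and the function name are taken/invented for illustration):

```python
def leaf_poly(x):
    """Indicator polynomial for the pictured 1-leaf: the path queries
    x1 = 1, x2 = 1, x4 = 0, x3 = 1, so the polynomial has degree 4,
    equal to the leaf's depth."""
    x1, x2, x3, x4 = x
    return x1 * x2 * (1 - x4) * x3

print(leaf_poly((1, 1, 1, 0)))  # input that reaches the leaf -> 1
print(leaf_poly((1, 1, 1, 1)))  # x4 = 1 diverts the path     -> 0
```

Summing one such polynomial per 1-leaf gives a real multilinear polynomial computing f, of degree at most DT-depth(f).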

9 An outlier among complexity measures: sensitivity. s(f,x) = sensitivity of f at x = number of neighbors y of x such that f(y) ≠ f(x). s(f) = (max) sensitivity of f = max of s(f,x) over all x in {0,1}^n. Folklore: s(f) ≤ DT-depth(f). Question [Nisan91, NisanSzegedy92]: is DT-depth(f) ≤ poly(s(f))?
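The definitions can be checked by brute force over all 2^n inputs; a minimal sketch (function names and the example functions are mine):

```python
from itertools import product

def sensitivity(f, n):
    """Max sensitivity of f: {0,1}^n -> {0,1}: the largest number of
    Hamming neighbors of any input x on which f disagrees with f(x)."""
    best = 0
    for x in product((0, 1), repeat=n):
        # s(f, x) = number of single-bit flips that change f's value
        s_x = sum(
            f(tuple(b ^ (i == j) for i, b in enumerate(x))) != f(x)
            for j in range(n)
        )
        best = max(best, s_x)
    return best

print(sensitivity(lambda x: min(x), 3))            # AND on 3 bits -> 3
print(sensitivity(lambda x: int(sum(x) >= 2), 3))  # MAJ3          -> 2
```

AND achieves the maximum n at the all-ones input, while MAJ3 is sensitive only on the two "pivotal" coordinates of a 2-vs-1 input.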

10 The sensitivity conjecture. Conjecture: DT-depth(f) ≤ poly(s(f)). Equivalently: block sensitivity, certificate complexity, real degree, approximate degree, randomized/quantum DT-depth, … ≤ poly(s(f)). Despite much effort, the best known upper bounds are exponential in sensitivity: [Simon82]: # relevant variables ≤ s(f)·4^{s(f)}. [KenyonKutin04]: bs(f) ≤ (e/√(2π))·e^{s(f)}·√(s(f)). [AmbainisBavarianGaoMaoSunZuo14]: bs(f), C(f) ≤ s(f)·2^{s(f)−1} and deg(f) ≤ 2^{s(f)(1+o(1))}. [AmbainisPrusisVihrovs15]: bs(f) ≤ (s(f) − 1/3)·2^{s(f)−1}.

11 This work: a computational view on the sensitivity conjecture. Conjecture: DT-depth(f) ≤ poly(s(f)). Previous approaches were combinatorial/analytic, but the conjecture is also a strong computational statement: low-sensitivity functions are very easy to compute!

12 Conjecture: DT-depth(f) ≤ poly(s(f)). The conjecture implies that every sensitivity-s function is easy to compute (e.g. by circuits of size n^{poly(s)}). Prior to this work, even this seems not to have been known. In fact, even a nontrivial upper bound on the # of sensitivity-s functions seems not to have been known.

13 Results. Theorem: every n-variable sensitivity-s function is computed by a Boolean circuit of size n^{O(s)}. In fact, every such function is computed by a Boolean formula of depth O(s log n). [Figure: the updated picture of relationships.]

14 Results (continued). Theorem: any n-variable sensitivity-s function can be self-corrected from a 2^{−cs}-fraction of worst-case errors using n^{O(s)} queries and runtime. The circuit/formula size bounds are consequences of the conjecture; the self-corrector is another consequence (conjecture ⇒ low-sensitivity f has low deg_R ⇒ low degree over GF(2) ⇒ has a self-corrector). All results are fairly easy. (Lots of directions for future work!)

15 Simple but crucial insight. Fact: if f has sensitivity s, then f(x) is completely determined once you know f’s value on any 2s+1 neighbors of x. Among 2s+1 neighbors, either at least s+1 are 0-neighbors or at least s+1 are 1-neighbors, and the value of f(x) must equal this majority value: if it disagreed, we would have s(f) ≥ s(f,x) ≥ s+1. [Figure: x with 2s+1 of its neighbors, split into neighbors where f=0 and neighbors where f=1.]
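The majority rule is a one-liner; a tiny sketch (names mine, and the dictator function is just a convenient sensitivity-1 example):

```python
def value_from_neighbors(neighbor_values, s):
    """If f has sensitivity <= s, at most s of x's neighbors can disagree
    with f(x), so f(x) equals the majority value of ANY 2s+1 neighbors."""
    assert len(neighbor_values) == 2 * s + 1
    return int(sum(neighbor_values) > s)

# Dictator f(x) = x_0 has sensitivity s = 1. For x = (1, 0, 1), its three
# neighbors (0,0,1), (1,1,1), (1,0,0) have f-values 0, 1, 1:
print(value_from_neighbors([0, 1, 1], s=1))  # -> 1 = f(x)
```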

16 Theorem: every sensitivity-s function on n variables is uniquely specified by its values on any Hamming ball of radius 2s. [Figure: the hypercube between 0^n and 1^n; f is given on weight levels 0,…,2s; each point at weight level 2s+1 has 2s+1 down-neighbors, so its value is forced by the majority rule.]

23 Once levels 0,…,2s+1 are determined, they play the role of the known values for the next level, and so on. Fill in all of {0,1}^n this way, level by level.
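The level-by-level filling just described can be sketched directly (a toy implementation; all names are mine, and the dictator function in the check is just one convenient sensitivity-1 example):

```python
from itertools import product

def flip(x, i):
    """Flip coordinate i of the bit-tuple x."""
    return tuple(b ^ (i == j) for j, b in enumerate(x))

def fill_from_ball(n, s, ball_values):
    """Reconstruct a sensitivity-s function on all of {0,1}^n from its
    values on the Hamming ball of radius 2s around 0^n, level by level."""
    values = dict(ball_values)
    for level in range(2 * s + 1, n + 1):
        for x in product((0, 1), repeat=n):
            if sum(x) != level:
                continue
            ones = [i for i, b in enumerate(x) if b]
            # x has `level` >= 2s+1 down-neighbors, all on already-filled
            # levels; the majority of any 2s+1 of them equals f(x).
            votes = [values[flip(x, i)] for i in ones[: 2 * s + 1]]
            values[x] = int(sum(votes) > s)
    return values

# Check with the dictator f(x) = x_0 (sensitivity 1) on n = 6 variables:
n, s, f = 6, 1, lambda x: x[0]
ball = {x: f(x) for x in product((0, 1), repeat=n) if sum(x) <= 2 * s}
filled = fill_from_ball(n, s, ball)
print(all(filled[x] == f(x) for x in product((0, 1), repeat=n)))  # -> True
```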

24 Corollary: there are at most 2^{C(n, ≤2s)} sensitivity-s functions over {0,1}^n, where C(n, ≤2s) = Σ_{i=0}^{2s} (n choose i) is the number of points in a radius-2s Hamming ball. Can we use this insight to compute sensitivity-s functions efficiently?

25 Small circuits for sensitivity-s functions. Theorem: every n-variable sensitivity-s function has a circuit of size O(s·n^{2s+1}). The algorithm has the values of f on the bottom 2s+1 layers (a Hamming ball centered at 0^n) hard-coded in. Algorithm: for |x| stages, shift the center of the Hamming ball one step along a shortest path toward x, and use the values of f on the previous ball to compute the values on the new ball (at most n^{2s} new values to compute; each is easy using a majority vote).
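The ball-shifting evaluation can be sketched as follows (a toy implementation, names mine; each point that newly enters the shifted ball is at distance 2s+2 from the old center, so 2s+2 of its neighbors already have known values and a majority vote over 2s+1 of them determines it):

```python
from itertools import combinations, product

def flip(x, i):
    return tuple(b ^ (i == j) for j, b in enumerate(x))

def dist(x, y):
    return sum(a != b for a, b in zip(x, y))

def ball(center, r, n):
    """All points within Hamming distance r of center."""
    for k in range(r + 1):
        for idxs in combinations(range(n), k):
            yield tuple(center[j] ^ (j in idxs) for j in range(n))

def eval_by_shifting(n, s, ball_values, x):
    """Evaluate f(x) from f hard-coded on B(0^n, 2s+1), shifting the
    ball's center one step toward x at a time."""
    center = (0,) * n
    values = dict(ball_values)
    for i in range(n):
        if x[i] == center[i]:
            continue
        old_center, center = center, flip(center, i)
        for z in ball(center, 2 * s + 1, n):
            if z in values:
                continue
            # Neighbors of z that move toward old_center lie in the
            # already-known old ball; there are 2s+2 of them.
            known = [flip(z, j) for j in range(n)
                     if dist(flip(z, j), old_center) < dist(z, old_center)]
            values[z] = int(sum(values[y] for y in known[: 2 * s + 1]) > s)
    return values[x]

# Check with the dictator f(x) = x_0 (sensitivity 1) on n = 5 variables:
n, s, f = 5, 1, lambda z: z[0]
hard_coded = {z: f(z) for z in ball((0,) * n, 2 * s + 1, n)}
print(all(eval_by_shifting(n, s, hard_coded, x) == f(x)
          for x in product((0, 1), repeat=n)))  # -> True
```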

29 Shallow circuits for sensitivity-s functions? The algorithm we just saw seems inherently sequential – takes n stages. Can we parallelize? Yes, by being bolder: go n/s levels at each stage rather than one.

30 Extension of the key insight: sensitivity-s functions are noise-stable at every input x. Pick any vertex x and flip n/(11s) random coordinates to get y. View the t-th flipped coordinate as chosen from the n−t+1 ‘untouched’ coordinates. At each stage, at most s coordinates are sensitive, so Pr[f(x) ≠ f(y)] ≤ Pr[stage 1 flips f] + Pr[stage 2 flips f] + … ≤ s/n + s/(n−1) + … + s/(n − n/(11s) + 1) ≤ 1/10.

31 Downward walks. A similar statement holds for “random downward walks.” Pick any vertex x with |x| many ones and flip |x|/(11s) randomly chosen 1’s to 0’s to get y. Viewing the t-th flipped coordinate as chosen from the |x|−t+1 ‘untouched’ 1-coordinates, Pr[f(x) ≠ f(y)] ≤ s/|x| + s/(|x|−1) + … + s/(|x| − |x|/(11s) + 1) ≤ 1/10.
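The union-bound sum above can be checked numerically in the regime the algorithm actually uses (|x| > 11s); the helper name is mine:

```python
def walk_error_bound(m, s):
    """Union bound from the slide: a downward walk from weight m flips
    m // (11*s) ones, and the t-th flip hits a sensitive coordinate with
    probability at most s / (m - t + 1)."""
    flips = m // (11 * s)
    return sum(s / (m - t) for t in range(flips))

# A single flip from the lightest relevant weight gives s/m exactly:
print(walk_error_bound(12, 1))  # -> 1/12 ≈ 0.0833

# The bound stays below 1/10 whenever m > 11*s:
print(max(walk_error_bound(m, s)
          for s in range(1, 6)
          for m in range(11 * s + 1, 400)) < 0.1)  # -> True
```

Intuitively the sum is at most (m/11) · 1/(m − m/11) = 1/10, which is what the constant 11 is chosen for.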

32 Shallow circuits for sensitivity-s functions. Theorem: every n-variable sensitivity-s function has a formula of depth O(s log n). The algorithm has the values of f on the bottom 11s layers hard-coded in. Parallel-Alg, given x: if |x| ≤ 11s, return the hard-coded value. Otherwise, sample C = O(1) points x_1, x_2, …, x_C from a “downward random walk” of length |x|/(11s), call Parallel-Alg on each one, and return the majority vote of the C results.

33 Analysis (values of f on the bottom 11s layers hard-coded in). Parallel-Alg(x) = f(x) with probability ≥ 19/20 for all x (proof: induction on |x|). After O(s log n) stages the recursion bottoms out in the hard-coded “red zone” of weight levels 0,…,11s, so the parallel runtime is O(s log n). Since C = O(1), the total work is C^{O(s log n)} = n^{O(s)}.
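Parallel-Alg can be sketched recursively (a toy version, names mine; the hard-coded bottom layers are stood in for by directly querying f, and the check uses a deterministic case of the dictator function, since with x_0 = 0 every downward walk preserves x_0 = 0 and all votes agree):

```python
import random

def parallel_alg(f, x, s, C=5, rng=random.Random(0)):
    """Sketch of Parallel-Alg: below weight 11s, use the (here: directly
    queried) hard-coded values; above it, majority-vote C recursive calls
    on points reached by downward random walks of length |x| // (11s)."""
    m = sum(x)
    if m <= 11 * s:
        return f(x)  # stands in for the hard-coded bottom layers
    votes = []
    ones = [i for i, b in enumerate(x) if b]
    for _ in range(C):
        y = list(x)
        for i in rng.sample(ones, m // (11 * s)):
            y[i] = 0  # downward walk: flip randomly chosen 1's to 0's
        votes.append(parallel_alg(f, tuple(y), s, C, rng))
    return int(sum(votes) * 2 > C)

# Dictator f(x) = x_0 (s = 1) with x_0 = 0: the answer is deterministic.
x = (0,) + (1,) * 13  # weight 13 > 11s, so the recursion actually fires
print(parallel_alg(lambda z: z[0], x, s=1))  # -> 0
```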

34 Conclusion / Questions. Many questions remain about computational properties of low-sensitivity functions. We saw there are at most 2^{C(n, ≤2s)} many sensitivity-s functions; can this bound be sharpened? We saw every sensitivity-s function has a formula of depth O(s log n). Does every such function have a TC^0 circuit / AC^0 circuit / DNF / decision tree of size n^{poly(s)}? A PTF of degree poly(s)? A DNF of width poly(s)? A GF(2) polynomial of degree poly(s)?

35 A closing puzzle/request. We saw that sensitivity-s functions obey a “majority rule”: f(x) = MAJ of f on any 2s+1 neighbors of x. It is well known that degree-d functions obey a “parity rule”: the parity of f over any (d+1)-dimensional subcube must be 0. If the conjecture is true, then low-sensitivity functions have low degree… and these two very different-looking rules must coincide! Explain this!

36 Thank you for your attention

