Worst case to Average case Reductions for Polynomials

Presentation transcript:

Worst case to Average case Reductions for Polynomials Shachar Lovett Weizmann Institute / Microsoft Research Joint work with Tali Kaufman

A motivating example. Let p(x_1,…,x_n) = f(x_1,…,x_n)·g(x_1,…,x_n), where f,g are generic polynomials of degree d over F_2. Then p(x) is a biased polynomial of degree 2d: Pr[p(x)=0] ≈ 3/4. Reason: p is a biased function of f,g.
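A quick numerical sanity check of this example (an illustrative Python sketch, not part of the original slides; the values of n and d are arbitrary small choices so the whole cube can be enumerated):

import itertools, random

n, d = 8, 2  # small enough to enumerate all of {0,1}^n

def random_poly(n, d):
    # a "generic" degree-d polynomial over F_2: every monomial of degree <= d
    # gets an independent random coefficient
    monomials = [m for k in range(d + 1) for m in itertools.combinations(range(n), k)]
    coeffs = {m: random.randint(0, 1) for m in monomials}
    return lambda x: sum(c * all(x[i] for i in m) for m, c in coeffs.items()) % 2

f, g = random_poly(n, d), random_poly(n, d)
cube = list(itertools.product([0, 1], repeat=n))
pr_zero = sum(f(x) * g(x) == 0 for x in cube) / len(cube)
print("Pr[p(x) = 0] =", pr_zero)   # typically close to 3/4

Intuitively, p = f·g vanishes whenever either factor does, which for two roughly unbiased factors is about 3/4 of the cube.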

Another motivating example. Let p(x) = MAJ(f(x),g(x),h(x)), where f,g,h are generic polynomials of degree d. Then p(x) is unbiased of degree 2d, yet p(x) can be approximated by a lower-degree polynomial: Pr[p(x)=f(x)] ≈ 3/4. Reason: p is an (unbiased) function of f,g,h.
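The 3/4 can be seen with a heuristic calculation (my addition, under the assumption that the "generic" values f(x), g(x), h(x) behave like independent unbiased bits): MAJ(f,g,h) disagrees with f exactly when g = h ≠ f, so

\Pr[p(x)=f(x)] \;=\; 1 - \Pr[g(x)=h(x)\neq f(x)] \;=\; 1 - \tfrac{1}{2}\cdot\tfrac{1}{2} \;=\; \tfrac{3}{4}.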

Is this generally true? Let p(x_1,…,x_n) be a degree-d polynomial, with d constant. Assume that either p is biased, Pr[p(x)=0] = ½ + ε, or p can be approximated by a lower-degree polynomial, Pr[p(x)=f(x)] = ½ + ε. Can we deduce a structure theorem for p?

Warm up – quadratics. Assume p(x) is quadratic. Dickson's Theorem: p(x) = l_1(x)l_2(x) + l_3(x)l_4(x) + … + l_{r-1}(x)l_r(x) (possibly + l_{r+1}(x)), with l_1,…,l_{r+1} linear and independent. All the non-zero Fourier coefficients of p(x) have absolute value 2^{-r/2}. Assume that Pr[p(x)=0] = ½ + ε (p is biased), or Pr[p(x)=l(x)] = ½ + ε (p is approximated by a linear function). Then p(x) has a Fourier coefficient of absolute value ≥ 2ε ⇒ r = O(log(1/ε)) ⇒ p(x) is a function of O(log(1/ε)) linear functions.
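A short justification of the 2^{-r/2} (my own sketch, assuming for simplicity that p is the pure quadratic part l_1 l_2 + … + l_{r-1} l_r): since l_1,…,l_r are linearly independent, the bits l_1(x),…,l_r(x) are independent and uniform for uniform x, and for two independent uniform bits u,v one has E[(-1)^{uv}] = 1/2. Hence

\mathrm{bias}(p) \;=\; \mathbb{E}\Big[(-1)^{\sum_{i=1}^{r/2} l_{2i-1}(x)\,l_{2i}(x)}\Big] \;=\; \prod_{i=1}^{r/2} \mathbb{E}\big[(-1)^{l_{2i-1}(x)\,l_{2i}(x)}\big] \;=\; 2^{-r/2},

and the same calculation after absorbing any linear shift shows that every non-zero Fourier coefficient has this same absolute value.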

Main theorem: general degrees. Let p(x_1,…,x_n) have degree d. Assume that Pr[p(x)=0] = ½ + ε, or Pr[p(x)=f(x)] = ½ + ε for some f with deg(f) ≤ d-1. Then p(x) = C(f_1(x),…,f_k(x)), where f_1,…,f_k are polynomials of degree at most d-1, C is any combiner function, and k depends only on d and ε – independent of n!

Functions of polynomials: computation vs. approximation. Let p_n(x_1,…,x_n) be a family of degree-d polynomials. The following models are equivalent: p_n can be computed by a constant number of lower-degree polynomials, p_n(x) = C(f_1(x),…,f_k(x)) with deg(f_i) ≤ d-1; and p_n can be approximated by a constant number of lower-degree polynomials, Pr_x[p_n(x) = C(f_1(x),…,f_k(x))] ≥ ½ + ε.

Example of an application. S_4(x_1,…,x_n) – the symmetric polynomial of degree 4. To refute the Inverse Conjecture for the Gowers Norm, one needed to prove: for any cubic f(x), Pr[S_4(x)=f(x)] ≤ ½ + o(1). Given our theorem, it is enough to prove that S_4 cannot be computed by a constant number of cubics.

Bias implies low rank. Let p(x_1,…,x_n) be a degree-d polynomial. Bias(p) = E[(-1)^{p(x)}] = Pr[p(x)=0] - Pr[p(x)=1]; the bias is a measure of the distance of p(x) from uniformity. Pr[p(x)=0] = ½ + ε ⇒ bias(p) = 2ε. rank_{d-1}(p) = min k s.t. p(x) = C(f^(1)(x),…,f^(k)(x)), where f^(1),…,f^(k) have degree ≤ d-1 and C: F_2^k → F_2 is any combiner function. Theorem (bias implies low rank): |bias(p)| ≥ ε ⇒ rank_{d-1}(p) ≤ k(d,ε).

Bias & approximation. "Bias implies low rank" is enough for the general theorem. Assume Pr[p(x)=f(x)] ≥ ½ + ε with deg(p)=d, deg(f) ≤ d-1. Then bias(p-f) ≥ 2ε, so p(x)-f(x) = C(f^(1)(x),…,f^(k)(x)) with deg(f^(1)),…,deg(f^(k)) ≤ d-1, i.e. p(x) = f(x) + C(f^(1)(x),…,f^(k)(x)). Assume Pr[p(x) = C(g^(1)(x),…,g^(k)(x))] ≥ ½ + ε with deg(g^(i)) ≤ d-1. Then there are a_1,…,a_k ∈ F_2 s.t. bias(p(x) - (a_1·g^(1)(x)+…+a_k·g^(k)(x))) ≥ ε·2^{-O(k)}, so p(x) - (a_1·g^(1)(x)+…+a_k·g^(k)(x)) = C(f^(1)(x),…,f^(k')(x)).
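The ε·2^{-O(k)} in the second step comes from a standard Fourier-expansion argument; here is a compressed sketch (my reconstruction, not spelled out on the slide). Write the combiner's Fourier expansion (-1)^{C(z)} = Σ_{a∈F_2^k} Ĉ(a)·(-1)^{⟨a,z⟩}, with |Ĉ(a)| ≤ 1. Then

2\varepsilon \;\le\; \mathbb{E}\big[(-1)^{p(x)-C(g(x))}\big] \;=\; \sum_{a\in\mathbb{F}_2^k} \hat{C}(a)\,\mathrm{bias}\big(p-(a_1 g^{(1)}+\dots+a_k g^{(k)})\big) \;\le\; 2^k \max_{a}\big|\mathrm{bias}\big(p-(a_1 g^{(1)}+\dots+a_k g^{(k)})\big)\big|,

so some a achieves bias at least 2ε·2^{-k} = ε·2^{-O(k)}.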

Bias implies low rank. p(x_1,…,x_n) is a degree-d polynomial; bias(p) = E[(-1)^{p(x)}]; rank_{d-1}(p) = min k : p(x) = C(f^(1)(x),…,f^(k)(x)), deg(f^(i)) ≤ d-1. We want: |bias(p)| ≥ ε ⇒ rank_{d-1}(p) ≤ k(d,ε). Green & Tao prove this when d < |F|, and used it to prove the Inverse Conjecture for the Gowers Norm in this case. However, the ICGN is false when d >> |F|. They conjectured that "bias implies low rank" holds even if d >> |F|. We prove "bias implies low rank" for all constant degrees, following the Green & Tao proof with one major change. Proof by induction on d. d=1: trivial – any biased linear function is in fact constant.

First step: bias amplification. Assume bias(p(x)) ≥ ε. We will generate degree-(d-1) polynomials f^(1)(x),…,f^(k)(x) s.t. Pr_x[p(x) = C(f^(1)(x),…,f^(k)(x))] ≥ 1 - δ, with k = k(δ,ε); we will use this with δ = 2^{-O(d)}. Derivatives: p_a(x) = p(x+a) - p(x), for a ∈ F_2^n; p_a(x) has degree ≤ d-1. Proof: fix x, and consider the average of (-1)^{p_a(x)} over a random shift a.
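The display that followed here is presumably the standard identity behind this step (my reconstruction): for every fixed x,

\mathbb{E}_{a\in\mathbb{F}_2^n}\big[(-1)^{p_a(x)}\big] \;=\; (-1)^{p(x)}\,\mathbb{E}_{a}\big[(-1)^{p(x+a)}\big] \;=\; (-1)^{p(x)}\,\mathrm{bias}(p),

so when bias(p) ≥ ε, a random derivative p_a(x) agrees with p(x) with probability ≥ ½ + ε/2, and a majority vote over several independent shifts amplifies this advantage.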

First step: bias amplification. Sampling: there exist shifts a_1,…,a_k such that Pr_x[p(x) = MAJ(p_{a_1}(x),…,p_{a_k}(x))] ≥ 1 - δ (by a Chernoff bound over the random choice of shifts, followed by averaging over x).

First step: bias amplification. Originally a lemma of Bogdanov & Viola, used to build PRGs for low-degree polynomials. We will prove: if f^(1),…,f^(k) are "random enough", then in fact p(x) = MAJ(f^(1)(x),…,f^(k)(x)) for all x ∈ F_2^n; otherwise we "make them random enough". The derivatives f^(1),…,f^(k) have degree ≤ d-1 ⇒ use "bias implies low rank" inductively. You can think of this as a generalization of: if q(x) has degree d-1 and Pr[p(x)=q(x)] > 1 - 2^{-d}, then p = q.
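To make the amplification concrete, here is a small illustrative sketch (my own toy code, with arbitrary parameters; it assumes bias(p) > 0, otherwise one votes for the complement): approximate p by the majority vote of a few random derivatives p_a(x) = p(x+a) + p(x).

import itertools, random

n, k = 8, 9                      # arbitrary small cube and number of shifts
cube = list(itertools.product([0, 1], repeat=n))

# a toy biased polynomial of degree 2: the product of two random linear forms
a_vec = [random.randint(0, 1) for _ in range(n)]
b_vec = [random.randint(0, 1) for _ in range(n)]
def p(x):
    return (sum(a_vec[i] * x[i] for i in range(n)) % 2) * \
           (sum(b_vec[i] * x[i] for i in range(n)) % 2)

def derivative(poly, a):
    # p_a(x) = p(x + a) + p(x) over F_2; the degree drops by at least one
    return lambda x: (poly(tuple(xi ^ ai for xi, ai in zip(x, a))) + poly(x)) % 2

shifts = [tuple(random.randint(0, 1) for _ in range(n)) for _ in range(k)]
derivs = [derivative(p, a) for a in shifts]

def maj(x):
    votes = sum(f(x) for f in derivs)
    return int(2 * votes > k)

agreement = sum(maj(x) == p(x) for x in cube) / len(cube)
print("Pr[MAJ(p_a1,...,p_ak) = p] =", agreement)   # typically well above 3/4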

Partitioning the space. f^(1),…,f^(k) partition the space F_2^n into 2^k "equal" regions.

Partitioning the space. F assigns a value F(x) to each region. p is equal to F almost everywhere ⇒ on most regions, p is almost constant (and equal to F). Good areas: p(x)=F(x); bad areas: p(x)≠F(x).

Good & bad regions. Good areas: p(x)=F(x); bad areas: p(x)≠F(x). Good regions: the bad areas inside them are very small (a ≤ 2^{-O(d)} fraction). Almost all regions are good (a 1 - 2^{-O(d)} fraction).

Proof strategy. The proof has two steps: (1) Good regions are excellent: p=F on all points in the region (i.e. p(x) is constant on good regions). (2) Assuming almost all regions are excellent, we will prove all regions are excellent (i.e. p(x) is constant on all regions).

Proof: step 1. We will prove: p(x) is constant on good regions. Let R be a good region, so p(x) = F(R) (a constant) for almost all x ∈ R. Let x_0 ∈ R be arbitrary; we will prove p(x_0) = F(R). We will use: p is a low-degree polynomial, the regions are defined by lower-degree polynomials, and induction on "bias implies low rank".

The derivatives identity. Let p(x) have degree d. Derivatives reduce degree: p_y(x) = p(x+y) - p(x) has degree ≤ d-1, so the iterated derivative p_{y_1,…,y_{d+1}}(x) ≡ 0. Thus we have the identity (over F_2): p(x) = Σ_{∅≠S⊆[d+1]} p(x + Σ_{i∈S} y_i), for every x and every y_1,…,y_{d+1}.
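A brute-force check of this identity (an illustrative sketch; the random polynomial and the values of n and d are arbitrary):

import itertools, random
from functools import reduce

n, d = 6, 2

# a random polynomial of degree <= d over F_2, stored as its set of monomials
monomials = [m for k in range(d + 1) for m in itertools.combinations(range(n), k)
             if random.random() < 0.5]
def p(x):
    return sum(all(x[i] for i in m) for m in monomials) % 2

def add(u, v):                       # addition in F_2^n
    return tuple(ui ^ vi for ui, vi in zip(u, v))

x0 = tuple(random.randint(0, 1) for _ in range(n))
ys = [tuple(random.randint(0, 1) for _ in range(n)) for _ in range(d + 1)]

# the identity: p(x0) equals the F_2-sum of p(x0 + sum_{i in S} y_i)
# over all non-empty subsets S of {1, ..., d+1}
rhs = 0
for r in range(1, d + 2):
    for S in itertools.combinations(range(d + 1), r):
        rhs ^= p(add(x0, reduce(add, (ys[i] for i in S))))
print(p(x0) == rhs)                  # True for every choice of x0 and the y_i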

Using the derivatives identity. Let R be a good region and take an arbitrary x_0 ∈ R. Assume there are y_1,…,y_{d+1} s.t. x_0 + Σ_{i∈S} y_i is in the "good part" of R for all non-empty S. Then p(x_0 + Σ_{i∈S} y_i) = F(R) for all non-empty S, and by the identity p(x_0) is the F_2-sum of these 2^{d+1}-1 equal values (an odd number of them), so p(x_0) = F(R)!

Using the derivatives identity (illustration for two directions): x_0 and its shifts x_0+y_1, x_0+y_2, x_0+y_1+y_2 within one region; good areas: p(x)=F(x), bad areas: p(x)≠F(x).

Getting all the points in R. We need to show: there are y_1,…,y_{d+1} such that x_0 + Σ_{i∈S} y_i ∈ R for all non-empty S (in fact, we need them to be in the "good part" of R – we will handle this later). Solution: choose y_1,…,y_{d+1} uniformly at random and show the required condition occurs with positive probability. Each separate point x_0 + Σ_{i∈S} y_i is uniform in F_2^n. Problem: handling the dependencies.

Structure of regions. We need: x_0 + Σ_{i∈S} y_i ∈ R for all non-empty S. Recall: the regions are defined by the polynomials f^(1),f^(2),…,f^(k), i.e. R = {x ∈ F_2^n : f^(1)(x)=c_1, f^(2)(x)=c_2,…} for some c_1,c_2,… ∈ F_2. So we actually need: { f^(j)(x_0 + Σ_{i∈S} y_i) = c_j } for j=1..k and all non-empty S ⊆ [d+1]. We need to find "randomness" conditions on the f^(j) s.t. all these events are "almost independent", and thus all occur simultaneously with positive probability.

Randomness conditions (1st attempt). We need: for any x_0, if y_1,…,y_{d+1} are uniform, then the random variables {f^(j)(x_0 + Σ_{i∈S} y_i) : j=1..k, S ⊆ [d+1], S non-empty} are almost independent. Actually, this can never be true. Reason: the f^(j) are polynomials of degree ≤ d-1, so d derivatives already zero them out, which forces linear dependencies among these random variables. These dependencies can be handled.

Randomness conditions (2nd attempt). We need: for any x_0, if y_1,…,y_{d+1} are uniform, then the random variables {f^(j)(x_0 + Σ_{i∈S} y_i) : j=1..k, S ⊆ [d+1], S non-empty, |S| ≤ deg(f^(j))} are almost independent. It turns out it is enough to prove this for random x (proof: Cauchy-Schwarz). Even for random x, this can still be false. Reason: non-linear dependencies.

Non-linear dependencies. Example 1: f^(1) decomposes, f^(1)(x) = g(x)·h(x) ⇒ f^(1) is biased ⇒ f^(1)(x) is not uniform. Example 2: a derivative of f^(1) decomposes, f^(1)_{y_1,y_2}(x) = f^(1)(x) - f^(1)(x+y_1) - f^(1)(x+y_2) + f^(1)(x+y_1+y_2) = g(x,y_1,y_2)·h(x,y_1,y_2) ⇒ f^(1)(x) - f^(1)(x+y_1) - f^(1)(x+y_2) + f^(1)(x+y_1+y_2) is biased ⇒ {f^(1)(x), f^(1)(x+y_1), f^(1)(x+y_2), f^(1)(x+y_1+y_2)} is not uniform.

Solving non-linear dependencies. Example 1: f^(1)(x) is ε-far from uniform ⇒ bias(f^(1)(x)) ≥ ε. The degree of f^(1) is ≤ d-1, so we can use induction on "bias implies low rank" and decompose f^(1) into a constant number of lower-degree polynomials.

Solving non-linear dependencies. f^(1)(x) = G(g^(1)(x),…,g^(t)(x)) with deg(g^(i)) ≤ deg(f^(1)) - 1. f^(1),…,f^(k) were used to approximate p(x): Pr[MAJ(f^(1)(x),…,f^(k)(x)) = p(x)] ≥ 1 - 2^{-O(d)}. Replace f^(1) by g^(1),…,g^(t): Pr[MAJ(G(g^(1)(x),…,g^(t)(x)), f^(2)(x),…,f^(k)(x)) = p(x)] ≥ 1 - 2^{-O(d)}. Replace by a single combiner function C_1: Pr[C_1(g^(1)(x),…,g^(t)(x), f^(2)(x),…,f^(k)(x)) = p(x)] ≥ 1 - 2^{-O(d)}. We got a set of "smaller degree" polynomials approximating p.

Solving non-linear dependencies. Example 2: f^(1)(x) - f^(1)(x+y_1) - f^(1)(x+y_2) + f^(1)(x+y_1+y_2) is ε-far from uniform ⇒ bias(f^(1)(x)+f^(1)(x+y_1)+f^(1)(x+y_2)+f^(1)(x+y_1+y_2)) ≥ ε. This expression is a polynomial of degree ≤ d-1 in the variables x,y_1,y_2, so again we can use induction. Here we deviate from the original Green & Tao proof; in large fields, it is enough to consider just Example 1.

Solving non-linear dependencies. General solution: if {f^(j)(x + Σ_{i∈S} y_i)} is non-uniform, find a biased linear combination and decompose it. In each step, we replace one polynomial with a constant number of smaller-degree polynomials. We choose what counts as "non-uniform" adaptively: if we have T polynomials, we need all linear combinations to have bias ≤ 2^{-O(T)} in order to be close to uniform ⇒ the required bias is a function of T. Still, the process stops after finitely many steps, and we end with a constant number of polynomials.

Getting back to the big picture. The proof has two steps: (1) Good regions are excellent: p=F on all points in the region (i.e. p(x) is constant on good regions). (2) Assuming almost all regions are excellent, we will prove all regions are excellent (i.e. p(x) is constant on all regions).

Proof of step 1. Let x_0 be in a good region R. Using the "randomness" of f^(1),…,f^(k), the events {x_0 + Σ_{i∈S} y_i ∈ R} are almost independent, so the joint event occurs with positive probability. In fact, the points x_0 + Σ_{i∈S} y_i are almost pairwise-independent, even conditioned on all of them lying in R. Since R is good, with positive probability they all lie in the "good part" of R (proof: union bound).

Proof of step 2. Assume almost all regions are excellent, i.e. p is constant on these regions. Let x_0,x_1 be in a (bad) region R; we need to show p(x_0)=p(x_1). Consider the points x_0 + Σ_{i∈S} y_i and x_1 + Σ_{i∈S} y_i. Assume that for every non-empty S, Region(x_0 + Σ_{i∈S} y_i) = Region(x_1 + Σ_{i∈S} y_i), and that this is an excellent region. Then p(x_0 + Σ_{i∈S} y_i) = p(x_1 + Σ_{i∈S} y_i) for all non-empty S ⇒ p(x_0)=p(x_1) (by the derivatives identity applied at x_0 and at x_1). For random y_1,…,y_{d+1}, this happens with positive probability.

Proof of step 2 (illustration): the corresponding points x_0 + Σ_{i∈S} y_i and x_1 + Σ_{i∈S} y_i land pairwise in the same excellent regions (good areas: p(x)=F(x); bad areas: p(x)≠F(x)).

Summary of results. Bias implies low rank: if p(x_1,…,x_n) is a degree-d polynomial over F and the distribution of p(x) is ε-far from uniform, then p(x) = C(f^(1)(x),…,f^(k)(x)) with deg(f^(i)) ≤ d-1 and k = k(F,d,ε). In fact, the f^(j) are derivatives of p: f^(j)(x) = p_{a_j}(x) = p(x+a_j) - p(x).

Summary of results. Approximation and computation are equivalent: if Pr[p(x) = C(g^(1)(x),…,g^(k)(x))] ≥ 1/|F| + ε with deg(g^(i)) ≤ d-1, then p(x) = C'(f^(1)(x),…,f^(k')(x)) with deg(f^(i)) ≤ d-1 and k' = k'(F,d,ε,k). In fact, f^(j)(x) = p_{a_j}(x) or f^(j)(x) = g^(j)(x+a_j).

Open problems. Give a good bound on the constants, even in the case d << |F|. Given p, can we compute its rank? Assume p(x) = f_1(x)·f_2(x) + f_3(x)·f_4(x); can we find f_1,…,f_4 efficiently, or is it NP-hard? Generalization to other "constant-depth" models, e.g. constant-depth circuits.