Tight Fourier Tails for AC0 Circuits
Avishay Tal (IAS)
CCC 2017
Bounded Depth Circuits
$\mathrm{AC}^0(m,d)$: circuits on $n$ variables with $m$ gates (the size of the circuit) and depth $d$, with alternating gates.
$\mathrm{AC}^0 := \mathrm{AC}^0(\mathrm{poly}(n), O(1))$
Brief History
$\mathrm{Parity}(x_1,\dots,x_n) = x_1 + x_2 + \dots + x_n \pmod 2$
[Ajtai’83, Furst-Saxe-Sipser’84, Yao’85]: Parity is not in AC0.
[Håstad’86]: any depth-$d$ circuit computing Parity has size at least $\exp(n^{1/(d-1)})$.
The result is tight: there exists a circuit of size $\exp(n^{1/(d-1)})$ and depth $d$ computing Parity.
Challenge: give an explicit function with better lower bounds. Really good lower bounds would imply lower bounds for NC1 and log-space.
Brief History
[Linial-Mansour-Nisan’89]: Bounded depth circuits are well-approximated in L2 by low-degree polynomials.
Theorem: Let $f \in \mathrm{AC}^0(m,d)$. Then there exists a polynomial $p$ with $\deg(p) = O(\log^{d}(m/\varepsilon))$ such that $\mathbf{E}_x[(p(x)-f(x))^2] \le \varepsilon$.
[Håstad’12]: any $f \in \mathrm{AC}^0(m,d)$ may agree with Parity on at most a $\tfrac{1}{2} + \exp(-n/(\log m)^{d-1})$ fraction of the inputs.
[Impagliazzo-Matthews-Paturi’12]: … $\tfrac{1}{2} + \exp(-n/(\log(m/n))^{d-1})$
The [Håstad’12] and [IMP’12] results are tight!
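Discrete Fourier Analysis 101
For functions $f,g:\{-1,1\}^n \to \mathbb{R}$, define the inner product as $\langle f,g\rangle = \mathbf{E}_x[f(x)\cdot g(x)]$.
The characters $\chi_S(x) = \prod_{i\in S} x_i$ for $S\subseteq[n]$ form an orthonormal basis.
Hence, any function $f:\{-1,1\}^n \to \mathbb{R}$ has a unique expansion $f(x) = \sum_{S\subseteq[n]} \hat{f}(S)\cdot\prod_{i\in S} x_i$, called the Fourier expansion.
The Fourier coefficients $\hat{f}(S)$ are real numbers given by $\hat{f}(S) = \langle f,\chi_S\rangle = \mathbf{E}_x\big[f(x)\cdot\prod_{i\in S} x_i\big]$.
Plancherel’s identity: $\mathbf{E}_x[f(x)\cdot g(x)] = \langle f,g\rangle = \sum_{S} \hat{f}(S)\cdot\hat{g}(S)$
Parseval’s identity: $\mathbf{E}_x[f(x)^2] = \langle f,f\rangle = \sum_{S} \hat{f}(S)^2$
If $f$ is Boolean, i.e., $f:\{-1,1\}^n \to \{-1,1\}$, then $\sum_{S} \hat{f}(S)^2 = 1$.
Example (Majority): $\mathrm{MAJ}(x_1,x_2,x_3) = \tfrac{1}{2}x_1 + \tfrac{1}{2}x_2 + \tfrac{1}{2}x_3 - \tfrac{1}{2}x_1x_2x_3$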
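The Majority example can be checked by brute force. The sketch below is illustrative only: the helper name `fourier_coefficients` and the representation of a set $S$ as a tuple of 0-based indices are choices made here, not part of the talk.

```python
from itertools import combinations, product
from math import prod

def fourier_coefficients(f, n):
    """Brute-force Fourier coefficients of f: {-1,1}^n -> R.

    Returns a dict mapping each subset S (a sorted tuple of 0-based
    indices) to f_hat(S) = E_x[f(x) * prod_{i in S} x_i].
    """
    points = list(product([-1, 1], repeat=n))
    coeffs = {}
    for size in range(n + 1):
        for S in combinations(range(n), size):
            total = sum(f(x) * prod(x[i] for i in S) for x in points)
            coeffs[S] = total / len(points)
    return coeffs

def maj3(x):
    """Majority of three +/-1 bits."""
    return 1 if sum(x) > 0 else -1

coeffs = fourier_coefficients(maj3, 3)
# Parseval: the squared coefficients of a Boolean function sum to 1.
parseval_sum = sum(c * c for c in coeffs.values())
```

The three singleton coefficients and the coefficient of the full set are the only nonzero entries, matching the expansion of MAJ on three bits.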
Discrete Fourier Analysis 101
The Fourier transform of a Boolean function $f$ naturally defines a distribution $D_f$ over sets $S\subseteq[n]$: the probability to sample $S$ from $D_f$ equals $\hat{f}(S)^2$.
Denote by $\mathbf{W}^{k}[f] = \Pr_{S\sim D_f}[|S|=k] = \sum_{S:|S|=k} \hat{f}(S)^2$
Denote by $\mathbf{W}^{\ge k}[f] = \Pr_{S\sim D_f}[|S|\ge k] = \sum_{S:|S|\ge k} \hat{f}(S)^2$
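As a small illustration of the level weights, the following sketch computes $\mathbf{W}^{k}[f]$ for every $k$ by brute force. The function name `level_weights` is hypothetical, and Parity is written as the product of the $\pm 1$ bits, which assumes the usual encoding in which $-1$ stands for the bit 1.

```python
from itertools import combinations, product
from math import prod

def level_weights(f, n):
    """Return [W^0[f], ..., W^n[f]] for f: {-1,1}^n -> {-1,1}, by brute force."""
    points = list(product([-1, 1], repeat=n))
    weights = []
    for k in range(n + 1):
        w = 0.0
        for S in combinations(range(n), k):
            coeff = sum(f(x) * prod(x[i] for i in S) for x in points) / len(points)
            w += coeff ** 2
        weights.append(w)
    return weights

def parity(x):
    """Parity in the +/-1 convention: the product of all bits."""
    return prod(x)

def maj3(x):
    return 1 if sum(x) > 0 else -1

maj_weights = level_weights(maj3, 3)
parity_weights = level_weights(parity, 4)
maj_tail_2 = sum(maj_weights[2:])  # W^{>=2}[MAJ] on three bits
```

Parity places all of its weight on the top level, while Majority on three bits splits its weight between levels 1 and 3.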
Tails and Low-Degree Approximation Equivalence
Let $f:\{-1,1\}^n \to \mathbb{R}$. The truncated Fourier expansion of $f$ at level $k$ is a degree-$k$ polynomial defined by $f^{\le k}(x) = \sum_{S:|S|\le k} \hat{f}(S)\cdot\prod_{i\in S} x_i$.
By Parseval: $\mathbf{E}_x\big[(f(x) - f^{\le k}(x))^2\big] = \mathbf{W}^{>k}[f]$.
By Parseval: this is the best L2-approximation of $f$ among degree-$k$ polynomials.
Hence $f$ has a degree-$k$ L2-approximation with error $\varepsilon$ iff $\mathbf{W}^{>k}[f] \le \varepsilon$.
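The identity between the truncation error and the Fourier tail can be verified numerically on a small function. This is a minimal sketch; `truncation_error_and_tail` is a name introduced here for illustration.

```python
from itertools import combinations, product
from math import prod

def truncation_error_and_tail(f, n, k):
    """Return (E_x[(f(x) - f^{<=k}(x))^2], W^{>k}[f]) by brute force."""
    points = list(product([-1, 1], repeat=n))
    coeffs = {
        S: sum(f(x) * prod(x[i] for i in S) for x in points) / len(points)
        for size in range(n + 1)
        for S in combinations(range(n), size)
    }

    def truncated(x):
        # f^{<=k}(x): keep only the characters of size at most k.
        return sum(c * prod(x[i] for i in S)
                   for S, c in coeffs.items() if len(S) <= k)

    error = sum((f(x) - truncated(x)) ** 2 for x in points) / len(points)
    tail = sum(c * c for S, c in coeffs.items() if len(S) > k)
    return error, tail

def maj3(x):
    return 1 if sum(x) > 0 else -1

error, tail = truncation_error_and_tail(maj3, 3, 1)
full_error, full_tail = truncation_error_and_tail(maj3, 3, 3)
```

For Majority on three bits truncated at level 1, both quantities equal the squared coefficient of the degree-3 term.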
[Figure: Fourier weight by level, $\mathbf{W}^{k}[\mathrm{Parity}]$ versus $\mathbf{W}^{k}[f]$]
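Comparison of Results in Fourier Language
[Figure: decay of $\mathbf{W}^{k}[f]$ as a function of $k$, with axis marks at $(\log m)^{d-1}$, $(\log m)^{d}$ and $n$. Labeled curves: LMN’89 ($\exp(-k^{1/d})$ decay), Boppana’97 ($1/k$ decay), Håstad’01, Our Result ($\exp(-k)$ decay). Further labels: Lower Bound, Håstad’12, IMP’12.]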
Comparison of Results in Polynomial Language
If $f$ can be computed by a circuit with size $m$ and depth $d$, then $f$ can be $\varepsilon$-approximated in L2 by polynomials of degree:
LMN’89: $O(\log^{d}(m/\varepsilon))$
Boppana’97: $O(\log^{d-1}(m)/\varepsilon)$
Håstad’01: $O(\log^{d-2}(m/\varepsilon)\cdot\log(m)\cdot\log(1/\varepsilon))$
This Work: $O(\log^{d-1}(m)\cdot\log(1/\varepsilon))$
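To get a feel for how the four degree bounds compare when $\varepsilon$ is very small, the sketch below evaluates them numerically. It sets every hidden constant to 1 and uses base-2 logarithms; both are assumptions made purely for illustration, so only the relative order of the values is meaningful.

```python
from math import log2

def degree_bounds(m, d, eps):
    """Evaluate the four degree bounds with all hidden constants set to 1
    and logarithms taken base 2 (illustrative assumptions, not exact values)."""
    log_m = log2(m)
    log_inv_eps = log2(1 / eps)
    log_m_over_eps = log2(m / eps)
    return {
        "LMN'89": log_m_over_eps ** d,
        "Boppana'97": log_m ** (d - 1) / eps,
        "Hastad'01": log_m_over_eps ** (d - 2) * log_m * log_inv_eps,
        "This work": log_m ** (d - 1) * log_inv_eps,
    }

# Example parameters (chosen here): size 2^20, depth 3, error 2^-40.
bounds = degree_bounds(m=2 ** 20, d=3, eps=2.0 ** -40)
```

With these example parameters the error is far below $1/\mathrm{poly}(m)$, which is the regime where the last bound is smallest.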
Main Theorem
If $f$ can be computed by a circuit of size $m$ and depth $d$, then for all $k$: $\mathbf{W}^{\ge k}[f] \le \exp(-k/(\log m)^{d-1})$.
Alternatively, $f$ can be $\varepsilon$-approximated in L2 by a polynomial of degree $O(\log^{d-1}(m)\cdot\log(1/\varepsilon))$.
A significant improvement for $\varepsilon \ll 1/\mathrm{poly}(m)$.
Tight (for any $m \gg n$).
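Applications to Pseudo-randomness
A distribution $D$ over $\{\pm 1\}^n$ is pseudorandom for circuits of class $C$ if for all $f\in C$: $\mathbf{E}_{x\sim D}[f(x)] \approx_{\varepsilon} \mathbf{E}_{x\sim U}[f(x)]$, i.e., the two expectations differ by at most $\varepsilon$.
A pseudo-random generator (PRG) for $C$ is a function $\mathrm{PRG}:\{-1,1\}^s \to \{-1,1\}^n$ such that $\mathrm{PRG}(U_s)$ is pseudorandom for $C$.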
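The definition of pseudorandomness can be made concrete on a toy example. The sketch below measures the gap between the two expectations for a distribution that is uniform over a given support; the function name `distinguishing_gap`, the toy distribution and the two test functions are all chosen here for illustration and do not come from the talk.

```python
from itertools import product
from math import prod

def distinguishing_gap(f, support, n):
    """|E_{x~D}[f(x)] - E_{x~U}[f(x)]| where D is uniform over `support`."""
    cube = list(product([-1, 1], repeat=n))
    mean_d = sum(f(x) for x in support) / len(support)
    mean_u = sum(f(x) for x in cube) / len(cube)
    return abs(mean_d - mean_u)

n = 3
# Toy distribution: uniform over the points whose coordinates multiply to 1.
support = [x for x in product([-1, 1], repeat=n) if prod(x) == 1]

def and_of_first_two(x):
    # AND of the first two bits, reading -1 as "true" (an assumed encoding).
    return -1 if x[0] == -1 and x[1] == -1 else 1

gap_and = distinguishing_gap(and_of_first_two, support, n)
gap_parity = distinguishing_gap(prod, support, n)
```

This toy distribution fools every function of two coordinates, since any two coordinates are uniform under it, but it is distinguished perfectly by the parity of all three.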
Summary of Applications
Why should we care?
Why are we not satisfied by $\exp(-k^{1/d})$ decay in the tails, and want $\exp(-k)$ decay?
Motivating question: give a Fourier-analytic proof that Majority cannot be approximated by AC0 circuits. (Other proofs: [Smolensky’93, O’Donnell-Wimmer’07])
[Figure: $\mathbf{W}^{k}[f]$ for $f\in\mathrm{AC}^0$ versus $\mathbf{W}^{k}[\mathrm{MAJ}]$, with an axis mark at $\mathrm{polylog}(n)$]
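Different Notions of Fourier Concentration
Let $f$ be a Boolean function and $t$ a parameter. The following are equivalent:
1. Exponentially small Fourier tails: for all $k$, $\mathbf{W}^{\ge k}[f] \le e\cdot e^{-k/O(t)}$
2. Fourier moments: for all $k$, $\mathbf{E}_{S\sim D_f}\big[\binom{|S|}{k}\big] \le O(t)^{k}$
3. “Switching lemma”: for all $p, k$, $\Pr_{\rho\sim R_p}[\deg(f|_{\rho}) \ge k] \le O(pt)^{k}$, where $R_p$ denotes $p$-random restrictions
and they imply $\sum_{S:|S|=k} |\hat{f}(S)| = O(t)^{k}$.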
Majority is not approximated by AC0
Problem: both MAJ and AC0 are concentrated on the lower levels of the Fourier spectrum.
Idea: Recall that $f\in\mathrm{AC}^0$ implies $\sum_{S:|S|=k} |\hat{f}(S)| \le \mathrm{polylog}(n)^{k}$. Informally, on the $k$’th level, $f$’s Fourier mass is concentrated on only $\mathrm{polylog}(n)^{k}$ coefficients out of all the $\binom{n}{k}$ coefficients.
Since MAJ is symmetric, it spreads its Fourier weight equally within each level: by Parseval, every coefficient in the $k$’th level is at most $1/\sqrt{\binom{n}{k}}$ in absolute value.
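The claim about Majority spreading its weight evenly within each level can be checked by brute force for a small odd $n$. This is only a sanity check under assumptions chosen here ($n = 5$, coefficients rounded to 12 decimal places before comparison).

```python
from itertools import combinations, product
from math import comb, prod, sqrt

def maj(x):
    return 1 if sum(x) > 0 else -1

n = 5
points = list(product([-1, 1], repeat=n))

# For each level k, collect the distinct magnitudes |MAJ_hat(S)| over |S| = k.
level_magnitudes = {}
for k in range(n + 1):
    mags = set()
    for S in combinations(range(n), k):
        coeff = sum(maj(x) * prod(x[i] for i in S) for x in points) / len(points)
        mags.add(round(abs(coeff), 12))
    level_magnitudes[k] = mags

# Upper bound implied by Parseval for a symmetric Boolean function.
level_bounds = {k: 1 / sqrt(comb(n, k)) for k in range(n + 1)}
```

Each level contains a single magnitude, and that magnitude respects the $1/\sqrt{\binom{n}{k}}$ bound.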
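Majority is not approximated by AC0
Using Plancherel: $\mathbf{E}_x[f(x)\cdot\mathrm{MAJ}(x)] = \sum_{S} \hat{f}(S)\cdot\widehat{\mathrm{MAJ}}(S) \le \sum_{k=1}^{n}\sum_{S:|S|=k} |\hat{f}(S)\cdot\widehat{\mathrm{MAJ}}(S)|$
For $1\le k< n^{0.1}$: $\sum_{S:|S|=k} |\hat{f}(S)\cdot\widehat{\mathrm{MAJ}}(S)| \le \mathrm{polylog}(n)^{k}\big/\sqrt{\binom{n}{k}}$
For $k\ge n^{0.1}$ (by Cauchy-Schwarz): $\sum_{S:|S|\ge n^{0.1}} |\hat{f}(S)\cdot\widehat{\mathrm{MAJ}}(S)| \le \sqrt{\sum_{S:|S|\ge n^{0.1}} \hat{f}(S)^2\cdot\sum_{S:|S|\ge n^{0.1}} \widehat{\mathrm{MAJ}}(S)^2} = \sqrt{\mathbf{W}^{\ge n^{0.1}}[f]\cdot\mathbf{W}^{\ge n^{0.1}}[\mathrm{MAJ}]} \le \exp(-n^{0.1}/\mathrm{polylog}(n)) \ll 1/n$
Conclusion: $\mathbf{E}_x[f(x)\cdot\mathrm{MAJ}(x)] \le \mathrm{polylog}(n)/\sqrt{n}$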
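The first step of this argument, Plancherel followed by a level-by-level triangle inequality, can be checked numerically. In the sketch below the function `small_dnf` is a hypothetical stand-in for a small-depth circuit, and $n = 5$ is chosen only to keep the brute force cheap.

```python
from itertools import combinations, product
from math import prod

n = 5
points = list(product([-1, 1], repeat=n))
subsets = [S for k in range(n + 1) for S in combinations(range(n), k)]

def fourier(f):
    """All Fourier coefficients of f on {-1,1}^n, by brute force."""
    return {S: sum(f(x) * prod(x[i] for i in S) for x in points) / len(points)
            for S in subsets}

def maj(x):
    return 1 if sum(x) > 0 else -1

def small_dnf(x):
    # Hypothetical example: (x0 AND x1) OR (x2 AND x3), reading -1 as "true".
    t = [b == -1 for b in x]
    return -1 if (t[0] and t[1]) or (t[2] and t[3]) else 1

f_hat, maj_hat = fourier(small_dnf), fourier(maj)

# Left-hand side: the correlation computed directly.
correlation = sum(small_dnf(x) * maj(x) for x in points) / len(points)
# Right-hand side of Plancherel.
plancherel_sum = sum(f_hat[S] * maj_hat[S] for S in subsets)
# Per-level contributions used in the upper bound.
per_level = [sum(abs(f_hat[S] * maj_hat[S]) for S in subsets if len(S) == k)
             for k in range(n + 1)]
```

The direct correlation and the Fourier-side sum agree, and the sum of the per-level absolute contributions upper-bounds the correlation.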
Open Question
Which distributions fool AC0?
[Aaronson’10, Fefferman-Shaltiel-Umans-Viola’12]: Can you find a distribution which is pseudorandom for AC0 but not pseudorandom for log-time quantum algorithms?
This would give an oracle separation of BQP from PH.
Exponentially Small Fourier Tails
Definition: $f$ has ESFT($t$) if for all $k$: $\mathbf{W}^{\ge k}[f] \le e\cdot e^{-k/t}$
Several interesting classes of functions have ESFT($t$):
CNFs/DNFs of width $w$ [Håstad’86, LMN’89]: $t = O(w)$
Formulas of size $m$ [Reichardt’11]: $t = O(\sqrt{m})$
Read-once formulas [Impagliazzo-Kabanets’14]: $t = O(n^{1/3.27})$
Circuits of size $m$ and depth $d$: $t = O((\log m)^{d-1})$
Functions with max-sensitivity $s$ [Gopalan-Servedio-T-Wigderson’16]: $t = O(s)$
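For very small functions the ESFT($t$) condition can be tested directly from the definition. The checker below is a brute-force sketch; `has_esft` is a name introduced here, and the example functions and values of $t$ are chosen only to show one passing and one failing case.

```python
from itertools import combinations, product
from math import exp, prod

def has_esft(f, n, t):
    """Check W^{>=k}[f] <= e * e^{-k/t} for every k in 0..n, by brute force."""
    points = list(product([-1, 1], repeat=n))
    weights = []
    for k in range(n + 1):
        weights.append(sum(
            (sum(f(x) * prod(x[i] for i in S) for x in points) / len(points)) ** 2
            for S in combinations(range(n), k)))
    tails = [sum(weights[k:]) for k in range(n + 1)]
    # e * e^{-k/t} is written as exp(1 - k/t).
    return all(tails[k] <= exp(1 - k / t) for k in range(n + 1))

def parity4(x):
    return prod(x)

def maj3(x):
    return 1 if sum(x) > 0 else -1

parity_small_t = has_esft(parity4, 4, 3)
parity_large_t = has_esft(parity4, 4, 5)
maj_ok = has_esft(maj3, 3, 2)
```

Parity on four bits has all of its weight at level 4, so the condition fails for $t = 3$ and holds for $t = 5$.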
Thank You!