Lower bounds against convex relaxations via statistical query complexity


Lower bounds against convex relaxations via statistical query complexity

Based on:
V. F., Will Perkins, Santosh Vempala. On the Complexity of Random Satisfiability Problems with Planted Solutions. STOC 2015.
V. F., Cristobal Guzman, Santosh Vempala. Statistical Query Algorithms for Stochastic Convex Optimization. SODA 2017.
V. F. A General Characterization of the Statistical Query Complexity. arXiv 2016.

Vitaly Feldman, IBM Research – Almaden

The plan

- Boolean constraint satisfaction problems
- Convex relaxations
- Comparison with lower bounds against LP/SDP hierarchies
- (Known) sign-rank lower bounds via SQ complexity

MAX-CSPs

$k$-SAT: given $\phi = (c_1, c_2, \ldots, c_m)$, where each clause $c_i$ is an OR of at most $k$ (possibly negated) variables, is $\phi$ satisfiable?

MAX-$k$-CSP: find $\mathrm{argmax}_{\sigma \in \{0,1\}^n} \frac{1}{m}\sum_i p_i(\sigma)$ for $k$-ary predicates $p_1, \ldots, p_m$.

$k$-SAT refutation: if $\phi$ is satisfiable, output YES; if $\phi \sim U_k^m$ ($m$ $k$-clauses chosen uniformly at random from the set of all $k$-clauses), output NO with probability $> 2/3$. A random $\phi$ is unsatisfiable w.h.p. for $m > \mathrm{const} \cdot 2^k n$, and moreover $\frac{1}{m}\max_{\sigma \in \{0,1\}^n} \sum_i c_i(\sigma) \lesssim 1 - 2^{-k}$. The best known polynomial-time algorithms use $O(n^{k/2})$ clauses [Goerdt, Krivelevich 01; Coja-Oghlan, Goerdt, Lanka 07; Allen, O'Donnell, Witmer 15]; refutation with fewer clauses is conjectured to be hard [Feige 02].
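To make the setup concrete, here is a minimal sketch of drawing $\phi \sim U_k^m$ and evaluating the fraction of satisfied clauses; the clause representation as (variable index, negated?) pairs is an illustrative choice, not from the talk:

```python
import random

def random_k_sat(n, k, m):
    """Draw phi ~ U_k^m: m k-clauses, each over k distinct variables
    with independent uniformly random negations."""
    return [[(i, random.random() < 0.5) for i in random.sample(range(n), k)]
            for _ in range(m)]

def satisfied_fraction(phi, sigma):
    """Fraction of clauses of phi satisfied by sigma in {0,1}^n;
    a literal (i, neg) is satisfied iff sigma[i] != neg."""
    return sum(any(sigma[i] != neg for i, neg in c) for c in phi) / len(phi)

n, k, m = 50, 3, 2000
phi = random_k_sat(n, k, m)
sigma = [random.randint(0, 1) for _ in range(n)]
print(satisfied_fraction(phi, sigma))  # approx. 1 - 2**-k = 0.875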

Convex relaxation for MAX-CSPs

Objective-wise mapping: each clause $c(\sigma)$ over $\{0,1\}^n$ is mapped to a convex function $f_c(w) \in F$ over a convex body $K \subseteq \mathbb{R}^d$. Denote $f_\phi \doteq \frac{1}{m}\sum_i f_{c_i}$. Since $\max_{\sigma \in \{0,1\}^n} \sum_i c_i(\sigma)$ is equivalent to $\min_{\sigma \in \{0,1\}^n} \sum_i \neg c_i(\sigma)$, the relaxation is $\min_{w \in K} \sum_i f_{c_i}(w)$.

Refutation gap $\alpha$: if $\phi$ is satisfiable, then $\min_{w \in K} f_\phi(w) \le 0$; if $\phi \sim U_k^m$, then $\min_{w \in K} f_\phi(w) \ge \alpha > 0$ (with probability $> 2/3$).

Which $(K, F, \alpha)$ allow such mappings? Can $k$-CSPs be solved using convex programming? We only need to rule out relaxations for which the resulting convex optimization problem can be solved efficiently; the complexity depends on $K$, $F$, and $\alpha$.
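As a concrete illustration (a standard hinge-style construction, not necessarily the mapping the talk has in mind): over $K = [0,1]^n$, map the clause $c(\sigma) = \sigma_1 \vee \neg\sigma_2 \vee \sigma_3$ to
\[
f_c(w) = \max\big(0,\; 1 - (w_1 + (1 - w_2) + w_3)\big),
\]
a maximum of affine functions and hence convex. On integral $w \in \{0,1\}^n$ we have $f_c(w) = 0$ iff $c(w) = 1$, so if $\phi$ is satisfiable then $\min_{w \in K} f_\phi(w) \le 0$; whether the gap $\alpha > 0$ holds on random instances is exactly the property a useful mapping must be designed to have.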

Outline

The argument has two steps: (1) convex optimization algorithms show $\mathrm{Opt}(K,F,\alpha) \in \mathrm{SQCompl}(q,m)$, i.e., optimization of $(K,F,\alpha)$ in the stochastic setting has low SQ complexity; (2) a lower bound shows that stochastic $k$-SAT refutation has high SQ complexity, $k\text{-SAT-Refute} \notin \mathrm{SQCompl}(q,m)$. A convex relaxation is a reduction from stochastic $k$-SAT refutation to stochastic convex optimization (in the YES case, $\phi \sim D^m$ where the support of $D$ is satisfiable). Hence if for some $(K,F,\alpha)$ the SQ upper bound for stochastic convex optimization is lower than the SQ lower bound for $k$-SAT, we obtain a contradiction.

Lower bound example I

$\ell_2$-Lipschitz convex optimization needs $d = \exp(n \cdot \alpha^{2/k})$: for $\alpha > 0$, any convex $K \subseteq B_2^d(1)$, and $F = \{\text{all convex } f \text{ s.t. } \forall w \in K,\ \|\nabla f(w)\|_2 \le 1\}$, we get $d = \exp(\Omega_k(n \cdot \alpha^{2/k}))$.

$\ell_1$-Lipschitz convex optimization needs $d = \exp(n \cdot \alpha^{2/k})$: for $\alpha > 0$, any convex $K \subseteq B_1^d(1)$, and $F = \{\text{all convex } f \text{ s.t. } \forall w \in K,\ \|\nabla f(w)\|_\infty \le 1\}$, we get $d = \exp(\Omega_k(n \cdot \alpha^{2/k}))$.

Lower bound example II

General convex optimization needs $d = \Omega_k(n^{k/2})$: for $\alpha = \Omega_k(1)$ and any convex $K$, with $F = \{\text{all convex functions over } K \text{ with range } [-1,1]\}$, we get $d = \Omega_k\big((n/\log n)^{k/2}\big)$.

Lower bounds from algorithms

Each lower bound comes from an SQ implementation of a corresponding algorithm: $\ell_2$-Lipschitz convex optimization needs $d = \exp(n \cdot \alpha^{2/k})$, via projected gradient descent; $\ell_1$-Lipschitz convex optimization needs $d = \exp(n \cdot \alpha^{2/k})$, via entropic mirror descent (multiplicative weights); general convex optimization needs $d = \Omega_k(n^{k/2})$, via random walks and center-of-gravity methods. Many more lower bounds of this type can be obtained easily: different norms, exploiting smoothness and/or strong convexity.

Statistical queries [Kearns '93]

Instead of i.i.d. inputs $x_1, x_2, \ldots, x_m \sim D$ over a domain $X$, the algorithm gets oracle access to $D$: the oracle approximately evaluates the average of any query function with range $[-1,1]$.

Statistical queries [Kearns '93]

An SQ algorithm asks queries $\phi_1, \phi_2, \ldots, \phi_q$, each $\phi_i : X \to [-1,1]$, to the $\mathrm{STAT}_D(\tau)$ oracle and receives answers $v_1, \ldots, v_q$ satisfying $|v_i - \mathbf{E}_{x \sim D}[\phi_i(x)]| \le \tau$. Here $\tau$ is the tolerance of the query; tolerance $\tau = 1/\sqrt{m}$ corresponds to the accuracy obtainable from $m$ samples. A problem $P \in \mathrm{SQCompl}(q, m)$ if there exists an SQ algorithm that solves $P$ using $q$ queries to $\mathrm{STAT}_D(\tau = 1/\sqrt{m})$.
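A minimal sketch of how such an oracle is typically simulated from i.i.d. samples (the helper name make_stat_oracle is mine):

```python
import random

def make_stat_oracle(samples, tau):
    """Simulate a STAT_D(tau) oracle by empirical averaging.
    By a Chernoff bound, about 1/tau^2 samples make any fixed query
    phi: X -> [-1, 1] accurate to within tau w.h.p."""
    assert len(samples) >= 1.0 / tau ** 2, "need about 1/tau^2 samples"
    def stat(phi):
        return sum(phi(x) for x in samples) / len(samples)
    return stat

# Example: D = uniform over {0,1}^5; query = normalized count of ones.
n = 5
samples = [[random.randint(0, 1) for _ in range(n)] for _ in range(10000)]
stat = make_stat_oracle(samples, tau=0.01)
print(stat(lambda x: 2 * sum(x) / n - 1))  # approx. 0.0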

Statistical queries [Kearns '93]

Applications of the SQ model:
- Noise-tolerant learning [Kearns 93; ...]
- Private data analysis [Dinur, Nissim 03; Blum, Dwork, McSherry, Nissim 05; DMN, Smith 06; ...]
- Distributed / low-communication / low-memory ML [Ben-David, Dichterman 98; Chu et al. 06; Balcan, Blum, Fine, Mansour 12; Steinhardt, G. Valiant, Wager 15; F. 16]
- Evolvability [L. Valiant 06; F. 08; ...]
- Adaptive data analysis [Dwork, F., Hardt, Pitassi, Reingold, Roth 14; ...]

Outline

Recap of the two steps: (1) $\mathrm{Opt}(K,F,\alpha) \in \mathrm{SQCompl}(q,m)$, via convex optimization algorithms, showing that optimization of $(K,F,\alpha)$ in the stochastic setting has low SQ complexity; (2) $k\text{-SAT-Refute} \notin \mathrm{SQCompl}(q,m)$, a lower bound on the SQ complexity of stochastic $k$-SAT refutation. Next: step (1).

Stochastic convex optimization (SCO)

Fix a convex body $K \subseteq \mathbb{R}^d$ and a class $F$ of convex functions over $K$. $\mathrm{Opt}(K,F,\epsilon)$: given an unknown distribution $D$ over $F$, $\epsilon$-minimize $f_D(w) \doteq \mathbf{E}_{f \sim D}[f(w)]$ over $K$, i.e., find $w$ s.t. $f_D(w) \le \min_{w' \in K} f_D(w') + \epsilon$. In the standard setting the input is $m$ i.i.d. samples $f_1, \ldots, f_m \sim D$; in the SQ setting the input is access to $\mathrm{STAT}_D(\tau = 1/\sqrt{m})$. In SCO the goal is to optimize the expected function, which generalizes convex problems in ML/statistics. What is the SQ complexity of $\mathrm{Opt}(K,F,\epsilon)$?

SQ algorithms for SCO

Three routes to an SQ algorithm, ranging from reductions (easy) to hard work: (1) reduce the optimization oracle used by an existing algorithm to the SQ oracle; (2) directly analyze an existing SCO algorithm; (3) design a new or modified algorithm.

Zero-order/value oracle

An $\eta$-approximate value oracle for $f$ over $K \subseteq \mathbb{R}^d$: given $w \in K$, $\mathrm{Val}_f(\eta)$ returns $v$ with $|v - f(w)| \le \eta$. If for all $f \in F$, $\mathrm{range}(f) \subseteq [-1,1]$, then for any $D$ over $F$, $\mathrm{STAT}_D(\eta)$ can simulate $\mathrm{Val}_{f_D}(\eta)$ [P. Valiant '11]: ask the query $\phi_w(f) \doteq f(w)$; then $\mathbf{E}_{f \sim D}[\phi_w(f)] = \mathbf{E}_{f \sim D}[f(w)] = f_D(w)$.
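A sketch of this simulation, under the illustrative assumption that $D$ is represented by sampled function objects and the STAT oracle by empirical averaging:

```python
import random

def make_value_oracle(stat):
    """Simulate Val_{f_D}(eta) from a STAT_D(eta) oracle: the query
    phi_w(f) = f(w) has expectation f_D(w)."""
    def val(w):
        return stat(lambda f: f(w))
    return val

# Example: D uniform over the convex functions f_c(w) = (w - c)^2 / 4
# for c in {-1, +1}; then f_D(0) = ((0+1)^2 + (0-1)^2) / 8 = 0.25.
samples = [(lambda c: (lambda w: (w - c) ** 2 / 4))(random.choice([-1, 1]))
           for _ in range(10000)]
stat = lambda phi: sum(phi(f) for f in samples) / len(samples)
val = make_value_oracle(stat)
print(val(0.0))  # approx. 0.25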

Corollaries

Known results for arbitrary $K$ and $\mathrm{range}(f) \subseteq [-1,1]$: ellipsoid-based methods use $\mathrm{poly}(d/\epsilon)$ queries to $\mathrm{Val}_f(1/\mathrm{poly}(d/\epsilon))$ [Nemirovski, Yudin 77; Grotschel, Lovasz, Schrijver 88]; random-walk methods use $\mathrm{poly}(d/\epsilon)$ queries to $\mathrm{Val}_f(\Omega(\epsilon/d))$ [Belloni, Liang, Narayanan, Rakhlin 15; F., Perkins, Vempala 15].

Corollary: for $F = \{\text{all convex functions over } K \text{ with range } [-1,1]\}$, $\mathrm{Opt}(K,F,\epsilon) \in \mathrm{SQCompl}(\mathrm{poly}(d/\epsilon),\, O(d^2/\epsilon^2))$. In high dimension the value oracle is weaker than full access or a gradient oracle [Nemirovski, Yudin '77; Singer, Vondrak '15; Li, Risteski '16].

First-order/gradient oracles

A global approximate gradient oracle of $f$ over $K$: given $w \in K$, $\mathrm{Grad}_{f,K}(\eta)$ returns $g$ s.t. for all $u, v \in K$, $|\langle g - \nabla f(w),\, v - u\rangle| \le \eta$. A global approximate oracle behaves like the true oracle over the whole domain: the linear approximation $f(w_0) + \langle g, w - w_0\rangle$ is $\eta$-close to the true linearization $f(w_0) + \langle \nabla f(w_0), w - w_0\rangle$ at every point. If $K = B_{\|\cdot\|}(1)$, the unit ball of a norm, this is equivalent to $\|g - \nabla f(w)\|_* \le \eta/2$. Assuming $\forall f \in F,\ w \in K:\ \|\nabla f(w)\|_* \le 1$ (so the gradients of functions in the support of $D$ are uniformly bounded, which bounds the variance), implementing $\mathrm{Grad}_{f_D,K}(\eta)$ amounts to estimating $\nabla f_D(w) = \mathbf{E}_{f \sim D}[\nabla f(w)]$ within $\eta/2$ in $\|\cdot\|_*$.
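A one-line check of the stated equivalence for $K = B_{\|\cdot\|}(1)$, using the definition $\|y\|_* = \sup_{\|x\| \le 1} \langle y, x\rangle$ of the dual norm:
\[
\sup_{u,v \in B_{\|\cdot\|}(1)} \langle g - \nabla f(w),\, v - u\rangle
= \sup_{v} \langle g - \nabla f(w), v\rangle + \sup_{u} \langle \nabla f(w) - g, u\rangle
= 2\,\|g - \nabla f(w)\|_*,
\]
so the global guarantee with parameter $\eta$ holds precisely when $\|g - \nabla f(w)\|_* \le \eta/2$.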

Mean vector estimation

Mean estimation in $\|\cdot\|_*$: given a distribution $D$ over $B_{\|\cdot\|_*}(1)$, find $\tilde z$ s.t. $\|\tilde z - \bar z_D\|_* \le \epsilon$, where $\bar z_D \doteq \mathbf{E}_{z \sim D}[z]$. This abstracts the gradient estimation problem; it can be thought of as reducing high-dimensional concentration to one dimension.

Easy case, $\ell_\infty$: coordinate-wise estimation. For every $i \in [d]$, ask the query $g_i(z) = z_i$ and let $\tilde z_i$ be the answer of $\mathrm{STAT}_D(\epsilon)$; then $\|\tilde z - \bar z_D\|_\infty \le \epsilon$.

What about $\ell_2$? Since $\|\tilde z - \bar z_D\|_2^2 = \sum_i (\tilde z_i - \bar z_{D,i})^2$, coordinate-wise estimation requires tolerance $\tau = \epsilon/\sqrt{d}$; in contrast, $O(1/\epsilon^2)$ samples suffice.
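A sketch of the coordinate-wise estimator, with an empirical oracle standing in for $\mathrm{STAT}_D$ (the distribution used in the example is an illustrative assumption):

```python
import math
import random

def coordinatewise_mean(stat, d):
    """l_infty mean estimation: one STAT query per coordinate.
    If each answer is within tau of E[z_i], the estimate has l_infty
    error tau, but l_2 error up to tau * sqrt(d)."""
    return [stat(lambda z, i=i: z[i]) for i in range(d)]

# Example: D = mean mu plus uniform +-1/sqrt(d) coordinate noise.
d = 16
mu = [0.1] * d
samples = [[mu[i] + random.choice([-1, 1]) / math.sqrt(d) for i in range(d)]
           for _ in range(20000)]
stat = lambda g: sum(g(z) for z in samples) / len(samples)
est = coordinatewise_mean(stat, d)
print(max(abs(est[i] - mu[i]) for i in range(d)))  # small l_infty error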

Kashin's representation [Lyubarskii, Vershynin 10]

Vectors $u_1, \ldots, u_N$ provide a Kashin representation with level $\lambda$ if: (tight frame) $\forall z \in \mathbb{R}^d$, $\sum_i \langle z, u_i\rangle^2 = \|z\|_2^2$; and (low dynamic range) $\forall z \in \mathbb{R}^d$, $\exists a \in \mathbb{R}^N$ with $\sum_i a_i u_i = z$ and $\|a\|_\infty \le \frac{\lambda}{\sqrt{d}}\|z\|_2$.

Theorem [LV 10]: a Kashin representation of level $\lambda = O(1)$ exists for $N = 2d$ and can be constructed efficiently.

For $\ell_2$ mean estimation, use coordinate-wise mean estimation in the Kashin representation: for every $i \in [2d]$, ask the query $g_i(z) = \frac{\sqrt{d}\, a_i(z)}{\lambda}$ to $\mathrm{STAT}_D(\epsilon/\lambda)$. This improves the tolerance from $\epsilon/\sqrt{d}$ to $\Omega(\epsilon)$, matching the sample complexity.

Corollary: mean estimation in $\|\cdot\|_2$ can be solved using $2d$ queries to $\mathrm{STAT}_D(\Omega(\epsilon))$.
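Why coordinate tolerance $\epsilon/\lambda$ suffices (a short calculation under the stated frame properties, spelled out here for completeness): for $z \in B_2(1)$, $|a_i(z)| \le \lambda/\sqrt{d}$, so each $g_i$ has range $[-1,1]$, and an answer of $\mathrm{STAT}_D(\epsilon/\lambda)$ yields $\tilde a_i$ with $|\tilde a_i - \mathbf{E}[a_i(z)]| \le \epsilon/\sqrt{d}$. By linearity $\sum_i \mathbf{E}[a_i(z)]\, u_i = \bar z_D$, hence
\[
\Big\|\sum_{i=1}^{2d} \tilde a_i u_i - \bar z_D\Big\|_2
= \Big\|\sum_{i=1}^{2d} \big(\tilde a_i - \mathbf{E}[a_i(z)]\big) u_i\Big\|_2
\le \|\tilde a - \mathbf{E}[a]\|_2
\le \sqrt{2d} \cdot \frac{\epsilon}{\sqrt{d}} = \sqrt{2}\,\epsilon,
\]
where the middle inequality uses that the synthesis operator of a tight frame with constant $1$ has operator norm $1$. Rescaling $\epsilon$ gives the corollary.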

Other norms

$\ell_q$ norms: tight upper and lower bounds are known; the SQ estimation complexity ($1/\tau^2$) matches the sample complexity up to a $\log d$ factor. The general case: mean estimation in any norm is always in $\mathrm{SQCompl}(d,\, d/\epsilon^2)$, i.e., the estimation complexity is at most $d/\epsilon^2$, but beyond that the problem is mostly open. The norms for which the SQ complexity is known to differ from the information-theoretic (sample) complexity are hard-to-compute norms, such as nuclear tensor norms.

Example corollaries

$\ell_2$-Lipschitz SCO: for any convex $K \subseteq B_2^d(1)$ and $F = \{\text{all convex } f \text{ s.t. } \forall w \in K,\ \|\nabla f(w)\|_2 \le 1\}$: $\mathrm{Opt}(K,F,\epsilon) \in \mathrm{SQCompl}(O(d/\epsilon^2),\, O(1/\epsilon^2))$.

$\ell_1$-Lipschitz SCO: for any convex $K \subseteq B_1^d(1)$ and $F = \{\text{all convex } f \text{ s.t. } \forall w \in K,\ \|\nabla f(w)\|_\infty \le 1\}$: $\mathrm{Opt}(K,F,\epsilon) \in \mathrm{SQCompl}(O(d \log d/\epsilon^2),\, O(1/\epsilon^2))$.

These follow by plugging the approximate gradient oracle into standard gradient-based algorithms.

Outline

With $\mathrm{Opt}(K,F,\alpha) \in \mathrm{SQCompl}(q,m)$ established, the remaining step is $k\text{-SAT-Refute} \notin \mathrm{SQCompl}(q,m)$, the lower bound on the SQ complexity of stochastic $k$-SAT refutation. It has two parts: a lower bound on the SQ dimension of $k$-SAT, and a translation from SQ dimension to SQ complexity.

Stochastic $k$-SAT refutation

If $\phi \sim D^m$ where the support of $D$ is satisfiable, output YES. If $\phi \sim U_k^m$, output NO with probability $> 2/3$.

SQ dimension

Lower bounds via SQ dimension were developed for fixed-distribution PAC learning [Blum, Furst, Jackson, Kearns, Mansour, Rudich 95; ...] and for general statistical problems [F., Grigorescu, Reyzin, Vempala, Xiao 13; FPV 15], with a characterization in [F. 16]. Several other notions of dimension and analysis techniques are known; this talk uses a simple SQ dimension for decision problems that suffices for $k$-SAT.

One-vs-many decision problems: let $\mathcal{D}_1$ be a set of distributions over $X$ and $D_0$ a reference distribution over $X$. $\mathrm{Dec}(\mathcal{D}_1, D_0)$: for an input distribution $D \in \mathcal{D}_1 \cup \{D_0\}$, decide whether $D \in \mathcal{D}_1$.

SQ dimension of $\mathrm{Dec}(\mathcal{D}_1, D_0)$ [F. 16]

Define
\[
\mathrm{maxDiscr}(\mathcal{D}, D_0, \tau) = \frac{1}{|\mathcal{D}|} \cdot \max_{\phi: X \to [-1,1]} \big|\{D \in \mathcal{D} :\ |\mathbf{E}_D[\phi] - \mathbf{E}_{D_0}[\phi]| > \tau\}\big|
\]
and
\[
\mathrm{SQDim}(\mathrm{Dec}(\mathcal{D}_1, D_0), \tau) \doteq \max_{\mathcal{D} \subseteq \mathcal{D}_1} \frac{1}{\mathrm{maxDiscr}(\mathcal{D}, D_0, \tau)}.
\]
If $\mathrm{SQDim}(\mathrm{Dec}(\mathcal{D}_1, D_0), \tau) > N$, then any algorithm that solves $\mathrm{Dec}(\mathcal{D}_1, D_0)$ given access to $\mathrm{STAT}_D(\tau)$ requires $> N$ queries; that is, $\mathrm{Dec}(\mathcal{D}_1, D_0) \notin \mathrm{SQCompl}(N, 1/\tau^2)$.

SQ dimension of $k$-SAT refutation

Hard family of distributions: $D_\sigma$ is uniform over all $k$-clauses in which $\sigma$ satisfies an odd number of literals; $D_0 = U_k$ and $\mathcal{D} = \{D_\sigma : \sigma \in \{-1,1\}^n\}$.

Key observation: $\mathbf{E}_{D_\sigma}[\phi] - \mathbf{E}_{U_k}[\phi]$ is a degree-$k$ multilinear polynomial of $\sigma$ with constant term $0$. Concentration of low-degree polynomials over $\{-1,1\}^n$ gives, for all $t > 0$,
\[
\Pr_{\sigma \in \{-1,1\}^n}\big[\,|\mathbf{E}_{D_\sigma}[\phi] - \mathbf{E}_{U_k}[\phi]| > t \cdot n^{-k/2}\,\big] = e^{-\Omega(k \cdot t^{2/k})}.
\]

Theorem: $\mathrm{SQDim}(\mathrm{Dec}(\mathcal{D}, D_0),\, t \cdot n^{-k/2}) = e^{\Omega(k \cdot t^{2/k})}$; hence for all $q > 1$, $k\text{-SAT-Refute} \notin \mathrm{SQCompl}(q,\, (n/\log q)^k)$.

The same approach can be used for various other classes of CSPs and gives tight lower bounds; the lower bound also holds against other algorithmic approaches that are SQ-implementable.
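A minimal rejection-sampling sketch of the hard distribution $D_\sigma$ (the clause representation carries over from the earlier sketch and is my assumption):

```python
import random

def sample_clause_from_D_sigma(sigma, k):
    """Rejection-sample a k-clause from D_sigma: uniform over k-clauses
    in which sigma satisfies an odd number of literals. A clause is a
    list of (variable index, negated?) pairs; sigma is over {-1, +1}."""
    n = len(sigma)
    while True:
        variables = random.sample(range(n), k)            # distinct vars
        clause = [(i, random.random() < 0.5) for i in variables]
        satisfied = sum((sigma[i] == 1) != neg for i, neg in clause)
        if satisfied % 2 == 1:                            # odd parity
            return clause

# Example: a planted assignment and a small instance phi ~ D_sigma^m.
n, k, m = 20, 3, 5
sigma = tuple(random.choice([-1, 1]) for _ in range(n))
phi = [sample_clause_from_D_sigma(sigma, k) for _ in range(m)]
print(phi)
```

Note that every clause in the support of $D_\sigma$ is satisfied by $\sigma$ (an odd count is at least one), so $D_\sigma$ fits the YES case of the refutation problem.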

Outline

Both steps are now in place: $\mathrm{Opt}(K,F,\alpha) \in \mathrm{SQCompl}(q,m)$ from convex optimization algorithms, and $k\text{-SAT-Refute} \notin \mathrm{SQCompl}(q,m)$ from the SQ lower bound. A convex relaxation with an efficiently solvable $(K,F,\alpha)$ would contradict the two.

Comparison with known approaches

Known lower-bound frameworks: Sherali-Adams and SOS/Lasserre hierarchies [Grigoriev 01; Schoenebeck 08; Charikar, Makarychev, Makarychev 09; O'Donnell, Witmer 14]; LP extended formulations [Chan, Lee, Raghavendra, Steurer 13; Kothari, Meka, Raghavendra 16].

Same: both settings consider an objective-wise relaxation to functions over a fixed $K$.

Incomparable/complementary: the known approaches map clauses to linear functions $c \to w_c$ over a polytope $K$ with a bounded number of facets, and assume a mapping $M: \{0,1\}^n \to K$ s.t. $c(\sigma) = \langle w_c, M(\sigma)\rangle$ plus a gap. The SQ-based approach maps clauses to convex functions $c \to f_c \in F$ over an arbitrary convex body $K$ with $\mathrm{SQCompl}(K,F,\alpha)$ bounded, and assumes only an $\alpha$ gap in optimization outcomes. Roughly, the known approaches control "variance"/"overfitting" while the SQ approach controls "bias"/"model misspecification": enforcing linear structure makes the known models too rich, so they overfit when the number of given clauses is small, and those lower bounds only hold against relaxations that are inefficient in terms of sample complexity. In SQ lower bounds, low SQ complexity ensures there is no overfitting, and the lower bound shows that models that are efficient from both the computational and the statistical point of view are not rich enough to express the CSPs. The results are incomparable/complementary [Barak, Moitra '16].

Sign-rank lower bounds via SQ complexity

For a matrix $A \in \{-1,1\}^{m \times n}$, $\mathrm{signRank}(A) \doteq \min\{\mathrm{rank}(A') : \mathrm{sign}(A'[i,j]) = A[i,j]\}$.

Dimension complexity: let $H$ be a set of $\{-1,1\}$-valued functions over $X$. $\mathrm{DC}(H)$ is the lowest $d$ such that there exists a mapping $M: X \to \mathbb{R}^d$ such that for every $h \in H$ there exists $w \in \mathbb{R}^d$ with $h(x) = \mathrm{sign}(\langle w, M(x)\rangle)$ for all $x \in X$. Define $A_H \in \{-1,1\}^{|H| \times |X|}$ by $A_H[h,x] = h(x)$; then $\mathrm{DC}(H) = \mathrm{signRank}(A_H)$.

Two known SQ facts: halfspaces over $\mathbb{R}^d$ can be PAC-learned in $\mathrm{SQCompl}(\mathrm{poly}(d), \mathrm{poly}(d))$ [Blum, Frieze, Kannan, Vempala 96], while learning PAR (parity functions) is not in $\mathrm{SQCompl}(2^{n/3}, 2^{n/3})$ [Kearns 93; BFJKMR 95]. The same approach thus gives lower bounds in a different context:

Corollary: $\mathrm{signRank}(A_{\mathrm{PAR}}) = 2^{\Omega(n)}$.

$A_{\mathrm{PAR}}$ is the Hadamard matrix, so this recovers Forster's [2001] result, a breakthrough on a problem that had been open for 15 years; the corollary follows easily from results known 5 years earlier, with the heavy lifting done by the BFKV algorithm.
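Spelling out why the corollary follows (the identification of $A_{\mathrm{PAR}}$ and the reduction are standard): writing parities as $\chi_a(x) = (-1)^{\langle a, x\rangle \bmod 2}$ for $a, x \in \{0,1\}^n$,
\[
A_{\mathrm{PAR}}[a, x] = \chi_a(x) = (-1)^{\langle a, x\rangle \bmod 2},
\]
which is exactly the $2^n \times 2^n$ Sylvester-Hadamard matrix. If $\mathrm{signRank}(A_{\mathrm{PAR}}) = d$, then parities embed as halfspaces over $\mathbb{R}^d$, so the BFKV algorithm would learn PAR in $\mathrm{SQCompl}(\mathrm{poly}(d), \mathrm{poly}(d))$; the $2^{n/3}$ SQ lower bound for PAR then forces $d = 2^{\Omega(n)}$.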

Conclusions

- Convex relaxations fail for XOR constraint optimization.
- SQ complexity lower bounds bridge algorithms and structural lower bounds.
- Extensions: other MAX-$k$-CSPs; stronger $n^{1-\beta}$-wise reductions [F., Ghazi '17].
- Many open problems remain.