New degree bounds for polynomials with prescribed signs
Ryan O'Donnell (MIT), Rocco Servedio (Harvard/Columbia)



Polynomials with prescribed signs
Suppose $m$ disjoint regions $R_1, \dots, R_m$ are given in $\mathbb{R}^n$, along with associated signs $\sigma_1, \dots, \sigma_m$. What is the lowest degree of a polynomial $p : \mathbb{R}^n \to \mathbb{R}$ that has the prescribed signs on the regions?
In one dimension the problem is trivial: if the regions are intervals, degree equal to the number of sign alternations is necessary and sufficient. In two or more dimensions...?
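
The one-dimensional case can be made concrete with a small sketch. It assumes the intervals are sorted and separated by gaps; the function names `sign_polynomial` and `evaluate` and the example intervals are illustrative, not from the talk. One root is placed in each gap where the sign alternates, so the degree equals the number of alternations.

```python
from typing import List, Tuple

def sign_polynomial(intervals: List[Tuple[float, float]], signs: List[int]):
    """Return (leading_sign, roots) for p(x) = leading_sign * prod(x - r).

    Assumes the intervals are sorted left to right with a gap between
    consecutive ones, and that each sign is +1 or -1.
    """
    roots = []
    for i in range(len(intervals) - 1):
        if signs[i] != signs[i + 1]:
            gap_left = intervals[i][1]
            gap_right = intervals[i + 1][0]
            roots.append((gap_left + gap_right) / 2)  # one root per alternation
    # To the right of every root each factor (x - r) is positive,
    # so the leading sign must equal the sign of the last interval.
    return signs[-1], roots

def evaluate(leading_sign: int, roots: List[float], x: float) -> float:
    value = float(leading_sign)
    for r in roots:
        value *= x - r
    return value

# Illustrative input: four intervals, two sign alternations.
intervals = [(0, 1), (2, 3), (4, 5), (6, 7)]
signs = [+1, -1, -1, +1]
leading_sign, roots = sign_polynomial(intervals, signs)
```

Here the construction yields a degree-2 polynomial. Fewer roots cannot work, since a univariate polynomial must vanish somewhere between two points where it takes opposite signs.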

Polynomial threshold functions
A special case: let $f : \{0,1\}^n \to \{+1,-1\}$ be a boolean function and let $p : \mathbb{R}^n \to \mathbb{R}$ be a polynomial. We say that $p$ is a polynomial threshold function (PTF) for $f$, or that $p$ sign-represents $f$, if $f(x) = \mathrm{sgn}(p(x))$ for all $x \in \{0,1\}^n$.
We are concerned with finding the lowest degree PTF for $f$.

Polynomial threshold functions
For example:
- $x_1 + x_2 + \dots + x_n - \tfrac{1}{2}$ is a degree-1 PTF for OR
- $x_1 + x_2 + \dots + x_n - (n - \tfrac{1}{2})$ is a degree-1 PTF for AND
- $x_1 + x_2 + \dots + x_n - n/2$ is a degree-1 PTF for MAJ
- $(1 - 2x_1)(1 - 2x_2) \cdots (1 - 2x_n)$ is a degree-$n$ PTF for PARITY
Every $n$-bit boolean function has a PTF (indeed, an exact representation) of degree $n$. (Consider: $\dots + f(1101)\, x_1 x_2 (1 - x_3) x_4 + \dots$)
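
As a sanity check, the four example polynomials can be verified by brute force. This is a minimal sketch under two assumptions that the slide does not state: $n$ is odd (so MAJ has no ties), and PARITY is taken to be $+1$ on inputs with an even number of ones. The helper names are illustrative.

```python
from itertools import product

n = 5  # assumed odd so that MAJ never ties

def sgn(v):
    return 1 if v > 0 else -1

def is_ptf(p, f):
    """True if p is nonzero and has the sign of f on every point of {0,1}^n."""
    return all(p(x) != 0 and sgn(p(x)) == f(x) for x in product((0, 1), repeat=n))

# Candidate polynomials from the slide.
def p_or(x):  return sum(x) - 0.5
def p_and(x): return sum(x) - (n - 0.5)
def p_maj(x): return sum(x) - n / 2
def p_parity(x):
    value = 1
    for xi in x:
        value *= 1 - 2 * xi
    return value

# Target functions with +1 meaning "true" (assumed convention).
def f_or(x):  return 1 if any(x) else -1
def f_and(x): return 1 if all(x) else -1
def f_maj(x): return 1 if sum(x) > n / 2 else -1
def f_parity(x): return 1 if sum(x) % 2 == 0 else -1  # +1 on even weight (assumption)

checks = {
    "OR": is_ptf(p_or, f_or),
    "AND": is_ptf(p_and, f_and),
    "MAJ": is_ptf(p_maj, f_maj),
    "PARITY": is_ptf(p_parity, f_parity),
}
```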

Polynomial threshold functions
What are PTFs good for?
- a natural algebraic model of complexity
- upper bounds
  - machine learning: given a class of functions $C$, if every function in $C$ has a PTF of degree $d$, we can learn $C$ in time $n^{O(d)}$
  - used to prove that PP is closed under intersection
- lower bounds
  - oracle separations
  - a slightly stricter model is related to quantum decision tree complexity

Prior work: lower bounds
Minsky & Papert, Perceptrons, 1968: an artificial intelligence perspective. They proved three major lower bounds:
- PARITY requires PTF degree $n$
- a certain DNF formula, "one in a box", the $n^{1/3}$-way OR of $n^{2/3}$-way ANDs, requires PTF degree $n^{1/3}$
- $\mathrm{MAJ}(x_1, \dots, x_n)$ AND $\mathrm{MAJ}(y_1, \dots, y_n)$ requires superconstant PTF degree
No new, essentially different, lower bounds are known.

Prior work: upper bounds
- [BRS95] considered AND-MAJ$_n$ as well; they showed it has PTF degree $O(\log n)$ and used this to show that PP is closed under intersection
- [KS01] showed that every DNF formula on $n$ variables with $s$ terms has a PTF of degree $O(n^{1/3} \log s)$; they used this to get a subexponential time learning algorithm for DNF formulas, which is the fastest known

Our results
Upper bound:
- every boolean function given by an AND/OR/NOT formula of size $s$ and depth $d$ has a PTF of degree $\sqrt{s}\,\log^{O(d)} s$ (note that degree $s$ is trivial)
- this gives a subexponential time learning algorithm for, say, linear size formulas of superconstant depth, the first such known
Lower bound:
- new technique
- AND-MAJ$_n$ requires PTF degree $\Omega(\log n / \log\log n)$

Talk outline
Plan for the talk:
1. Prove the $\sqrt{s}\,\log^{O(d)} s$ PTF upper bound for formulas.
2. Prove the $\Omega(\log n / \log\log n)$ PTF lower bound for AND-MAJ$_n$.

Boolean formulas
[Figure: an example formula drawn as a tree of AND and OR gates, with leaves labeled by variables among $x_1, \dots, x_{13}$.]
- a formula is a tree whose gates are ANDs or ORs of unbounded fan-in
- leaves are labeled with literals
- size is the number of leaves
- depth is the length of the longest root-to-leaf path

PTFs for boolean formulas
(In this section we always use $\{0,1\}$.)
Idea: replace every gate with a low-degree polynomial that simulates the gate. What should replace $\mathrm{AND}(v_1, \dots, v_k)$?
- $v_1 + \dots + v_k - (k-1)$
- $[(v_1 + \dots + v_k)/k]^{k \log(1/\epsilon)}$

A better amplifying polynomial
We want to amplify the disparity between $1 - 1/k$ and $1$. Raising to the power $k$ works, but costs a lot of degree. We want a polynomial of low degree that keeps values in $[0, 1 - 1/k]$ between 0 and 1 but amplifies the point 1 to, say, 2. Equivalently, we want a polynomial bounded on $[0,1]$ with maximum derivative at 1.

Chebyshev polynomials
This is an old problem of analysis, solved by the Chebyshev polynomials of the first kind. These are a family of orthogonal polynomials $(C_r)_{r \in \mathbb{N}}$ with the properties:
- $\deg(C_r) = r$
- $C_r([-1,1]) \subseteq [-1,1]$
- $C_r'(1) = r^2$
- $C_r(1 + 1/r^2) \ge 2$
- $C_r(x) = \cos(r \arccos x)$
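
The listed properties are easy to check numerically. The sketch below evaluates $C_r$ with the standard three-term recurrence $C_0 = 1$, $C_1 = x$, $C_{j+1} = 2x\,C_j - C_{j-1}$; the function name `chebyshev` and the finite-difference step are illustrative choices, not part of the talk.

```python
import math

def chebyshev(r: int, x: float) -> float:
    """Evaluate the Chebyshev polynomial of the first kind C_r at x."""
    prev, curr = 1.0, x
    if r == 0:
        return prev
    for _ in range(r - 1):
        prev, curr = curr, 2 * x * curr - prev
    return curr

def derivative_at_one(r: int, step: float = 1e-6) -> float:
    """Central finite-difference estimate of C_r'(1)."""
    return (chebyshev(r, 1 + step) - chebyshev(r, 1 - step)) / (2 * step)

grid = [-1 + 2 * i / 200 for i in range(201)]

# Bounded by 1 in absolute value on [-1, 1].
bounded = all(abs(chebyshev(r, x)) <= 1 + 1e-9 for r in range(1, 11) for x in grid)

# Amplification just outside the interval: C_r(1 + 1/r^2) >= 2.
amplifies = all(chebyshev(r, 1 + 1 / r**2) >= 2 for r in range(1, 11))

# Agreement with the closed form cos(r * arccos(x)) on [-1, 1].
matches_cosine = all(
    abs(chebyshev(r, x) - math.cos(r * math.acos(x))) < 1e-9
    for r in range(1, 11) for x in grid
)
```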

Chebyshev polynomials at gates
Chebyshev polynomials give us a square-root degree savings. Imagine replacing $\mathrm{AND}(v_1, \dots, v_k)$ with
$C_{\sqrt{k}}\big((v_1 + \dots + v_k)/(k-1)\big)$. (*)
The argument $(v_1 + \dots + v_k)/(k-1)$ is at least $1 + 1/k$ if all the $v_i$ are roughly 1, and is in $[0,1]$ otherwise. Hence (*) is something like 2 when the AND is true, and is between $-1$ and $1$ otherwise. (This idea is originally from [KS01].)

Chebyshev polynomials at gates
In fact, we will replace each AND gate by
$\epsilon \cdot C_{\sqrt{k}}\big((v_1 + \dots + v_k)/(k-1)\big)^{\log(1/\epsilon)}$,
and something similar for OR gates. Note that if the inputs have 0/1 values $\pm\,\epsilon$, so do the outputs. Further, if the $v_i$ all have degree bounded by $d$, the resulting polynomial has degree bounded by $d \sqrt{k} \log(1/\epsilon)$.
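
A minimal sketch of the AND gadget on exact 0/1 inputs follows. It makes two assumptions not spelled out on the slide: the Chebyshev index is rounded up to $\lceil \sqrt{k} \rceil$, and the exponent is $\lceil \log_2(1/\epsilon) \rceil$. The sketch checks only the separation (absolute value at most $\epsilon$ when the AND is false, at least 1 when it is true); with these roundings the true case can overshoot 1. The name `soft_and` is illustrative.

```python
import math
from itertools import product

def chebyshev(r: int, x: float) -> float:
    """Chebyshev polynomial of the first kind via the three-term recurrence."""
    prev, curr = 1.0, x
    if r == 0:
        return prev
    for _ in range(r - 1):
        prev, curr = curr, 2 * x * curr - prev
    return curr

def soft_and(values, eps: float) -> float:
    """eps * C_r(sum(values) / (k - 1)) ** m, with r = ceil(sqrt(k)), m = ceil(log2(1/eps)).

    Requires k >= 2 inputs. The rounding of r and m is an assumption of this sketch.
    """
    k = len(values)
    r = math.isqrt(k - 1) + 1           # equals ceil(sqrt(k)) for k >= 1
    m = math.ceil(math.log2(1 / eps))   # so that 2**m >= 1/eps
    return eps * chebyshev(r, sum(values) / (k - 1)) ** m

k, eps = 6, 0.01
outputs = {x: soft_and(x, eps) for x in product((0, 1), repeat=k)}
```

The degree of the gadget in its inputs is $r \cdot m$, which is about $\sqrt{k} \log(1/\epsilon)$, compared with $k \log(1/\epsilon)$ for plain powering.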

Almost done
By applying these polynomials at every gate, we can easily conclude:
Suppose $F$ is a formula in which, along every path from root to leaf, the product of the fan-ins is at most $t$. Then we can sign-represent $F$ with a polynomial of degree $\sqrt{t}\,\log^{O(d)} s$. (We need to take $\epsilon$ roughly $1/s$.)
We are not quite done, because these fan-in products can be huge!

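Bounding fan-in products
[Figure: a chain of alternating OR and AND gates; each gate in the chain also reads its own block of about $n/100$ leaves ($x_1, \dots, x_{n/100}$, then the block ending at $x_{2n/100}$, and so on).]
Only $n$ variables (leaves) are used, but one path has fan-in product $(n/100)^{100}$.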

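Solution: bucket
The trick is now to partition each gate into several gates, each of which has subformulas of similar size.
[Figure: an AND gate whose children have sizes $s_1, s_2, s_3, s_4, \dots$ is split into $\log s$ buckets: $1 \le s_i < 2$, ..., $2^j \le s_i < 2^{j+1}$, ..., $s/2 \le s_i < s$.]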

Conclusion of upper bound
Now it is easy to see that a gate whose subformula has depth $d$ and size $s$ has maximum root-to-leaf fan-in product $O(s \log^d s)$.
Proof: by induction. The AND bucket with subformula sizes in $[2^j, 2^{j+1})$ has fan-in at most $s/2^j$.
Hence if we first modify our formulas in this way, and then apply the Chebyshev construction, we get PTFs of degree $\sqrt{s}\,\log^{O(d)} s$, as desired.
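
The bucketing step can be sketched as follows, assuming each child of a gate is summarized by its subformula size (a positive integer). The function name `bucket_children` and the sample sizes are illustrative. Children with sizes in $[2^j, 2^{j+1})$ go to bucket $j$, so a gate of total size $s$ gets at most $\lfloor \log_2 s \rfloor + 1$ buckets, and bucket $j$ holds at most $s/2^j$ children.

```python
from typing import Dict, List

def bucket_children(child_sizes: List[int]) -> Dict[int, List[int]]:
    """Group child indices so that bucket j holds sizes in [2**j, 2**(j+1))."""
    buckets: Dict[int, List[int]] = {}
    for index, size in enumerate(child_sizes):
        j = size.bit_length() - 1  # floor(log2(size)) for size >= 1
        buckets.setdefault(j, []).append(index)
    return buckets

# Illustrative gate with eight children of the given subformula sizes.
child_sizes = [1, 1, 3, 2, 8, 5, 4, 20]
total_size = sum(child_sizes)
buckets = bucket_children(child_sizes)
```

The original gate is then rewritten as a gate over the buckets, each bucket being a gate of the same type over its own children.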


Lower bound for AND-MAJ$_n$
Recall the AND-MAJ$_n$ function: $(x_1, \dots, x_n, y_1, \dots, y_n) \mapsto \mathrm{MAJ}(x_1, \dots, x_n)$ AND $\mathrm{MAJ}(y_1, \dots, y_n)$.
- Minsky and Papert (1968) showed that any PTF requires superconstant, $\omega(1)$, degree.
- Beigel, Reingold, and Spielman (1995) exhibited a PTF of degree $O(\log n)$.
- We give a new lower bound of $\Omega(\log n / \log\log n)$.

The two-dimensional problem
Minsky and Papert observed that the problem of PTFs for AND-MAJ$_n$ is equivalent to a much simpler polynomial sign prescription problem, the $M$-intersector problem:
- the domain is $\mathbb{R}^2$, and the polynomial is bivariate
- regions: all odd lattice points bounded by $M$
- upper-right points positive, others negative
[Figure: the $x$-$y$ plane with the odd lattice points up to $M$.]

Proof of equivalence
Switch to $\{+1,-1\}$ in input and output.
(Intersector to PTF.) Suppose $p$ is an $n$-intersector. Then $p(\sum_i x_i, \sum_i y_i)$ is a PTF for AND-MAJ$_n$ of the same degree.
(PTF to intersector.) Suppose $p$ is the PTF. Consider
$q(x_1, \dots, x_n, y_1, \dots, y_n) = \sum_{\pi, \pi' \in S_n} p(x_{\pi(1)}, \dots, x_{\pi(n)}, y_{\pi'(1)}, \dots, y_{\pi'(n)})$.
By symmetry, $q$ is also a PTF for AND-MAJ$_n$. But $q$ is symmetric in the $x$s and in the $y$s, hence depends only on their sums: $q = q(\sum_i x_i, \sum_i y_i)$.
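
The first direction can be checked by brute force for $n = 3$. The polynomial $2(u + v) + uv - 3$ below is a hypothetical degree-2 example chosen for this sketch, not one from the talk; it is positive on the odd lattice points of $[-3,3]^2$ with both coordinates positive and negative on the others. Inputs are in $\{+1,-1\}$ with $+1$ meaning true, so MAJ is true exactly when the sum is positive.

```python
from itertools import product

n = 3
odd_points = [-3, -1, 1, 3]

def intersector(u: int, v: int) -> int:
    """Hypothetical 3-intersector of degree 2 (an assumption of this sketch)."""
    return 2 * (u + v) + u * v - 3

def and_maj(x, y) -> int:
    """AND of two majorities on {+1,-1} inputs, with +1 meaning true."""
    return 1 if sum(x) > 0 and sum(y) > 0 else -1

def ptf(x, y) -> int:
    """Plug the coordinate sums into the intersector."""
    return intersector(sum(x), sum(y))

# The sign pattern of the intersector on the odd lattice points.
is_intersector = all(
    (intersector(u, v) > 0) == (u > 0 and v > 0) and intersector(u, v) != 0
    for u in odd_points for v in odd_points
)

# The induced polynomial sign-represents AND-MAJ_3 on all 64 inputs.
sign_represents = all(
    ptf(x, y) != 0 and (ptf(x, y) > 0) == (and_maj(x, y) == 1)
    for x in product((-1, 1), repeat=n)
    for y in product((-1, 1), repeat=n)
)
```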

The $M$-intersector problem
Consider the more general sign prescription problem:
[Figure: the $x$-$y$ plane with the upper-right quadrant marked +.]
No polynomial can have these signs!
Proof: Assume we have $p$ of minimal degree. By continuity, $p$ must be 0 on the $x$ half-axis. By Bézout, $x \mid p$. Divide through; the result has smaller degree and solves (essentially) the same problem.

Reproving Minsky-Papert
This can be used to show Minsky and Papert's superconstant lower bound. Suppose there were a fixed $d$ such that an $M$-intersector of degree $d$ existed for every $M$. Take $M \to \infty$, rescaling to the unit square. By compactness and continuity, there is a limiting degree-$d$ polynomial whose signs are as on the previous slide, a contradiction.

The relaxed case
[Figure: the $x$-$y$ plane with the + region, axis marks at 1 and $M$.]
[BRS95] constructed a bivariate polynomial of degree $O(\log M)$ for the sign pattern shown. We now describe how to obtain a lower bound of $\Omega(\log M / \log\log M)$ for the $M$-intersector problem. We show that for any $d$, there is a subset of lattice points with coordinates at most $d^{O(d)}$ whose signs cannot be achieved in degree $d$.

A constructive solution
It is possible to show PTF lower bounds constructively. Let $Z$ denote the set of odd lattice points, and let $f$ denote the function which is $+1$ in the upper-right quadrant and $-1$ elsewhere. Suppose we could find a probability distribution $w$ on $Z$ under which every monomial $x^i y^j$, $0 \le i + j \le d$, had zero correlation with $f$.

A constructive solution
I.e., suppose we have $w : Z \to \mathbb{R}^{\ge 0}$ with $\sum_{z \in Z} w(z) = 1$, such that
$$\sum_{(x,y) \in Z} f(x,y)\, x^i y^j\, w(x,y) = 0$$
for all monomials $x^i y^j$ of degree at most $d$. Suppose also that $w = 0$ on points with coordinates exceeding $M$. We claim this implies that no $M$-intersector of degree $d$ exists.

Proof of constructive method
Proof: Suppose $p$ were an $M$-intersector of degree $d$. On one hand, by linearity of expectation, $E_w[f(x,y)\,p(x,y)] = 0$, since $f$ is uncorrelated with all monomials of degree at most $d$. On the other hand, on all lattice points bounded by $M$, $f(x,y)\,p(x,y) > 0$. But $w$ gives all of its probability mass to these points, so the expectation is positive, a contradiction.
Intriguingly, the much stronger converse (no such distribution implies a PTF exists) is true, by LP duality.
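
Written out, the two evaluations of the expectation look as follows. The coefficient names $c_{ij}$ are introduced here only for illustration; they denote the coefficients of the assumed intersector $p$.

```latex
% Expand p in monomials of degree at most d (c_{ij} are illustrative names).
p(x,y) = \sum_{0 \le i+j \le d} c_{ij}\, x^i y^j

% Linearity: every monomial is uncorrelated with f under w.
E_w[f\,p]
  = \sum_{(x,y) \in Z} w(x,y)\, f(x,y)\, p(x,y)
  = \sum_{0 \le i+j \le d} c_{ij} \sum_{(x,y) \in Z} f(x,y)\, x^i y^j\, w(x,y)
  = 0

% Sign agreement: f p > 0 on the support of w, and w sums to 1.
E_w[f\,p]
  = \sum_{(x,y) \in \mathrm{supp}(w)} w(x,y)\, f(x,y)\, p(x,y) > 0
```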

Constructing the distribution
There are $D = (d+1)(d+2)/2$ constraints: the monomials we want to be uncorrelated with. Suppose we pick just $D+1$ points for our distribution to be supported on, $(x_1, y_1), \dots, (x_{D+1}, y_{D+1})$. Then the condition that $w$ is a probability distribution over these points under which all constraint monomials have 0 correlation with $f$ is a $(D+1) \times (D+1)$ linear system.

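Constructing the distribution
Rows are indexed by the monomials $x^i y^j$ (together with the normalization condition), columns by the points $(x_k, y_k)$, and the entry for monomial $x^i y^j$ and point $(x_k, y_k)$ is $f(x_k, y_k)\, x_k^i y_k^j$:
$$\begin{pmatrix} & \vdots & \\ \cdots & f(x_k, y_k)\, x_k^i y_k^j & \cdots \\ & \vdots & \end{pmatrix} \begin{pmatrix} w(x_1, y_1) \\ w(x_2, y_2) \\ w(x_3, y_3) \\ \vdots \\ w(x_{D+1}, y_{D+1}) \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
Our desire is that the solution be nonnegative.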

Me thinking

Rocco thinking

Our solution
We now pull a rabbit out of our hat and name the exact set of points on which the distribution will be supported. Essentially, we want just the grid of points, but in the log scale. Let $h$ be a large number to be named later. Our points will be a subset of $\{(\pm h^i, \pm h^j) : 0 \le i + j \le d\}$.

Our solution
The exact $D+1$ points to consider are
$\{((-1)^l h^k, (-1)^k h^l) : 0 \le k + l \le d\} \cup \{(-1, -1)\}$,
where $h = d^{O(1)}$ and $h$ is odd.

Finishing the proof
We consider the linear system given by this choice of points. We need to show that the solution consists of nonnegative values. By Cramer's rule, the solution weights are ratios of two certain determinants. Each determinant is a polynomial in $h$. We calculate the highest order terms, show that they dominate the polynomial (using the fact that $h$ is large), and show that they have the same sign. (Details omitted!)
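
For the smallest nontrivial case the system can be solved exactly. The sketch below assumes the support is $\{((-1)^l h^k, (-1)^k h^l) : 0 \le k + l \le d\} \cup \{(-1,-1)\}$ (a reading of the garbled slide), adds a normalization row of ones, and solves with exact rational arithmetic. It is run only for $d = 1$, $h = 5$, where the solution can also be checked by hand; it makes no claim about which $h$ works for larger $d$. All function names are illustrative.

```python
from fractions import Fraction

def support_points(d: int, h: int):
    """Assumed support: ((-1)^l h^k, (-1)^k h^l) for k + l <= d, plus (-1, -1)."""
    points = [((-1) ** l * h ** k, (-1) ** k * h ** l)
              for k in range(d + 1) for l in range(d + 1 - k)]
    points.append((-1, -1))
    return points

def f(x: int, y: int) -> int:
    """+1 in the upper-right quadrant, -1 elsewhere."""
    return 1 if x > 0 and y > 0 else -1

def solve(A, b):
    """Gauss-Jordan elimination over Fractions; assumes A is square and nonsingular."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [row[-1] for row in M]

def distribution(d: int, h: int):
    """Solve: weights sum to 1, and f is uncorrelated with every x^i y^j, i + j <= d."""
    points = support_points(d, h)
    monomials = [(i, j) for i in range(d + 1) for j in range(d + 1 - i)]
    A = [[1] * len(points)]  # normalization row
    A += [[f(x, y) * x ** i * y ** j for (x, y) in points] for (i, j) in monomials]
    b = [1] + [0] * len(monomials)
    return points, solve(A, b)

points, weights = distribution(d=1, h=5)
```

For $d = 1$ the weights work out to $1/2$ on $(1,1)$, $1/(h+1)$ on each of $(-1,h)$ and $(h,-1)$, and $1/2 - 2/(h+1)$ on $(-1,-1)$, which is nonnegative once $h \ge 3$.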

Finishing the proof
Hence, we've constructed a true probability distribution over the odd lattice points under which $f$ has zero correlation with all monomials of degree at most $d$. The largest coordinate used is $d^{O(d)}$. This shows that $d^{O(d)}$-intersectors require degree greater than $d$; i.e., $M$-intersectors require degree $\Omega(\log M / \log\log M)$.


Open questions
- Does every boolean formula of size $s$ have a PTF of degree $O(\sqrt{s})$, independent of depth?
- Minsky and Papert showed an $\Omega(n^{1/3})$ PTF lower bound for a certain depth-2 circuit. Can one show a significantly stronger lower bound for any constant depth circuit?
- Better lower or upper bounds for the intersection of two weighted thresholds?
- Explore the polynomial sign prescription problem further.