Published by Hollie Cole. Modified over 7 years ago.
1
Sum of squares optimization: scalability improvements and applications to difference of convex programming
Georgina Hall, Princeton, ORFE. Joint work with Amir Ali Ahmadi.
2
Nonnegative polynomials
A polynomial $p(x) := p(x_1, \dots, x_n)$ is nonnegative if $p(x) \ge 0$ for all $x \in \mathbb{R}^n$.
Example: $p(x) = x^4 - 5x^2 - x + 10$. Is this polynomial nonnegative?
3
Optimizing over nonnegative polynomials (1/3)
We are interested in more than checking nonnegativity of a given polynomial: problems of the type
$$\min_p C(p)\ \ \text{s.t.}\ \ A(p) = b,\ \ p(x) \ge 0,\ \forall x,$$
where the decision variables are the coefficients of the polynomial $p$, the objective is linear and the constraints $A(p) = b$ are affine in the coefficients of $p$ (e.g., sum of coefficients $= 1$), and $p(x) \ge 0,\ \forall x$ is the nonnegativity condition.
Why would we be interested in problems of this type?
4
Optimizing over nonnegative polynomials (2/3)
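Optimization: polynomial optimization
$$\gamma^* := \min_x p(x)\ \ \text{s.t.}\ \ f_i(x) \le 0,\ g_j(x) = 0 \qquad = \qquad \max_\gamma \gamma\ \ \text{s.t.}\ \ p(x) - \gamma \ge 0,\ \forall x \in \{x \mid f_i(x) \le 0,\ g_j(x) = 0\}$$
Applications: optimal power flow problem, combinatorial optimization problems, economics and game theory, sensor network localization.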
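As a numerical illustration of the identity $\min_x p(x) = \max\{\gamma : p(x) - \gamma \ge 0,\ \forall x\}$ in the unconstrained case, the sketch below bisects on $\gamma$ for the example polynomial $p(x) = x^4 - 5x^2 - x + 10$. The nonnegativity "oracle" is a hypothetical stand-in: it only checks $p - \gamma \ge 0$ on a grid of $[-3, 3]$ (for $|x| \ge 3$ we have $p(x) \ge 43$), so it gives numerical evidence, not a certificate; an sos constraint is what would replace it in practice.

```python
def p(x):
    """Example polynomial from the 'Nonnegative polynomials' slide."""
    return x**4 - 5 * x**2 - x + 10

# Hypothetical oracle: p - gamma >= 0 is only checked on a fine grid of [-3, 3].
GRID = [-3 + 6 * i / 20000 for i in range(20001)]

def is_nonnegative(q):
    return all(q(x) >= 0 for x in GRID)

def gamma_star(poly, lo=-100.0, hi=100.0, iters=60):
    """Approximate max{gamma : poly(x) - gamma >= 0 for all x} by bisection on gamma."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if is_nonnegative(lambda x: poly(x) - mid):
            lo = mid  # poly - mid passes the check: mid is a valid lower bound
        else:
            hi = mid  # poly - mid is negative somewhere: mid is too large
    return lo

gamma = gamma_star(p)
print(f"gamma* is approximately {gamma:.4f}")
```

A positive $\gamma^*$ also answers the earlier question numerically: this $p$ stays strictly positive.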
5
Optimizing over nonnegative polynomials (3/3)
Controls: automated search for Lyapunov functions for dynamical systems. Software verification. Statistics: convex regression.
6
Imposing nonnegativity (1/3)
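Is this polynomial nonnegative? Deciding this is NP-hard for polynomials of degree $\ge 4$. What if $p$ can be written as a sum of squares (sos), i.e., $p = \sum_i q_i^2$ for some polynomials $q_i$? Then $p$ is clearly nonnegative.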
7
Imposing nonnegativity (2/3)
A polynomial $p(x)$ of degree $2d$ is sos if and only if there exists $Q \succeq 0$ such that
$$p(x) = z(x)^T Q z(x),$$
where $z(x) = (1, x_1, \dots, x_n, x_1 x_2, \dots, x_n^d)^T$ is the vector of monomials of degree up to $d$.
Being sos is a sufficient condition for nonnegativity, but not a necessary one; in practice, we don't lose that much.
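To make the Gram-matrix characterization concrete, the sketch below certifies that the earlier example $p(x) = x^4 - 5x^2 - x + 10$ is sos, with $n = 1$, $d = 2$ and $z(x) = (1, x, x^2)^T$. The Gram matrix $Q$ was chosen by hand (it is not unique: $Q_{13}$ is a free parameter, fixed here at $-3$); its leading principal minors are $10$, $9.75$ and $0.75$, so $Q \succ 0$. A Cholesky factorization $Q = LL^T$ then turns $z^T Q z$ into the explicit sum of squares $\|L^T z\|^2$.

```python
import math

# Gram matrix of p(x) = x^4 - 5x^2 - x + 10 in the basis z(x) = (1, x, x^2).
# Coefficient matching: Q00 = 10 (constant), 2*Q01 = -1 (x), 2*Q02 + Q11 = -5 (x^2),
# 2*Q12 = 0 (x^3), Q22 = 1 (x^4). Q02 = -3 is a hand-picked value of the free entry.
Q = [[10.0, -0.5, -3.0],
     [-0.5,  1.0,  0.0],
     [-3.0,  0.0,  1.0]]

def p(x):
    return x**4 - 5 * x**2 - x + 10

def z(x):
    return [1.0, x, x * x]

def gram_form(x):
    """z(x)^T Q z(x)."""
    zx = z(x)
    return sum(Q[i][j] * zx[i] * zx[j] for i in range(3) for j in range(3))

def cholesky_lower(A):
    """Lower-triangular L with A = L L^T; math.sqrt fails if A is not positive definite."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

L = cholesky_lower(Q)

def sos_form(x):
    """sum_j (L^T z(x))_j^2: an explicit sum of three squared polynomials."""
    zx = z(x)
    return sum(sum(L[i][j] * zx[i] for i in range(3)) ** 2 for j in range(3))

for x in [-2.0, 0.0, 1.6]:
    print(x, p(x), gram_form(x), sos_form(x))
```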
8
Imposing nonnegativity (3/3)
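Initial optimization problem (intractable):
$$\min_p C(p)\ \ \text{s.t.}\ \ A(p) = b,\ \ p(x) \ge 0,\ \forall x$$
Sum of squares relaxation:
$$\min_p C(p)\ \ \text{s.t.}\ \ A(p) = b,\ \ p \text{ sos}$$
Equivalent semidefinite programming formulation:
$$\min_{p,Q} C(p)\ \ \text{s.t.}\ \ A(p) = b,\ \ p = z(x)^T Q z(x),\ \ Q \succeq 0$$
But: the size of $Q$ is $\binom{n+d}{d} \times \binom{n+d}{d}$.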
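To see how quickly this size grows, the short sketch below tabulates $\binom{n+d}{d}$, the number of monomials in $z(x)$ and hence the side length of $Q$, for a few $(n, d)$ pairs chosen only for illustration.

```python
from math import comb

def gram_size(n, d):
    """Number of monomials of degree <= d in n variables = side length of Q."""
    return comb(n + d, d)

for n, d in [(1, 2), (4, 2), (8, 2), (20, 2), (50, 2), (20, 4)]:
    s = gram_size(n, d)
    # Q is symmetric, so the SDP carries s*(s+1)/2 scalar unknowns for Q alone
    print(f"n={n:3d}, 2d={2 * d}: Q is {s} x {s} ({s * (s + 1) // 2} free entries)")
```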
9
This talk
Recent efforts to make sos more scalable by avoiding SDP
Using sum of squares to optimize over convex functions
10
Alternatives to sum of squares: dsos and sdsos
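Sum of squares (sos): $p(x) = z(x)^T Q z(x)$, $Q \succeq 0$ (SDP)
Diagonally dominant sum of squares (dsos): $p(x) = z(x)^T Q z(x)$, $Q$ diagonally dominant (dd) (LP)
Scaled diagonally dominant sum of squares (sdsos): $p(x) = z(x)^T Q z(x)$, $Q$ scaled diagonally dominant (sdd) (SOCP)
where
PSD cone $:= \{Q \mid Q \succeq 0\}$
DD cone $:= \{Q \mid Q_{ii} \ge \sum_{j \ne i} |Q_{ij}|,\ \forall i\}$
SDD cone $:= \{Q \mid \exists \text{ diagonal } D \text{ with } D_{ii} > 0 \text{ s.t. } DQD \text{ dd}\}$
Since DD $\subseteq$ SDD $\subseteq$ PSD, every dsos polynomial is sdsos, and every sdsos polynomial is sos.
[Ahmadi, Majumdar]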
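A small sketch of the two membership tests, written directly from the cone definitions above. The $2 \times 2$ matrix and the scaling $D = \mathrm{diag}(2, 1)$ are hand-picked for illustration: the matrix is psd (positive trace, determinant $1$) but not dd, while $DQD$ is dd, so it is sdd. In practice $D$ is not guessed; sdd membership is imposed through second-order cone constraints.

```python
def is_dd(Q, tol=1e-12):
    """Diagonally dominant: Q[i][i] >= sum_{j != i} |Q[i][j]| for every row i."""
    n = len(Q)
    return all(Q[i][i] + tol >= sum(abs(Q[i][j]) for j in range(n) if j != i)
               for i in range(n))

def scale(Q, d):
    """Return D Q D for the diagonal matrix D = diag(d)."""
    n = len(Q)
    return [[d[i] * Q[i][j] * d[j] for j in range(n)] for i in range(n)]

def is_sdd_with(Q, d):
    """Certificate check: Q is sdd if D Q D is dd for this positive diagonal d."""
    return all(di > 0 for di in d) and is_dd(scale(Q, d))

Q = [[1.0, 2.0],
     [2.0, 5.0]]   # psd: trace 6 > 0, det 1 > 0
D = [2.0, 1.0]     # hand-picked scaling: D Q D = [[4, 4], [4, 5]]
print(is_dd(Q), is_sdd_with(Q, D))   # False True
```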
11
Alternatives to sum of squares: dsos and sdsos
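Initial optimization problem (intractable):
$$\min C(p)\ \ \text{s.t.}\ \ A(p) = b,\ \ p(x) \ge 0,\ \forall x$$
Sum of squares relaxation (SDP):
$$\min C(p)\ \ \text{s.t.}\ \ A(p) = b,\ \ p \text{ sos}$$
Dsos/sdsos relaxation (LP/SOCP):
$$\min C(p)\ \ \text{s.t.}\ \ A(p) = b,\ \ p \text{ dsos/sdsos}$$
Moving down this list improves scalability, at the cost of a smaller feasible set.
Example: for the parametric family of polynomials
$$p(x_1, x_2) = 2x_1^4 + 2x_2^4 + a x_1^3 x_2 + (1 - a) x_1^2 x_2^2 + b x_1 x_2^3,$$
[Figure: regions of the $(a, b)$ parameter plane where $p$ is dsos, sdsos, and sos.]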
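For this parametric family, dsos membership can be checked by hand, which makes a handy sanity check. Assuming the family reads $p = 2x_1^4 + 2x_2^4 + a x_1^3 x_2 + (1-a) x_1^2 x_2^2 + b x_1 x_2^3$ (a homogeneous quartic, so $z = (x_1^2, x_1 x_2, x_2^2)$), the Gram matrix has a single free entry $\lambda$, and the resulting one-variable LP has a closed-form solution. For general polynomials one would call an LP solver instead; the function names below are illustrative only.

```python
def gram(a, b, lam):
    """Gram matrix of p in the basis z = (x1^2, x1*x2, x2^2); lam = Q[0][2] is free.
    The x1^2 x2^2 coefficient 1 - a = 2*lam + Q[1][1] fixes the middle diagonal entry."""
    return [[2.0,   a / 2,           lam],
            [a / 2, 1 - a - 2 * lam, b / 2],
            [lam,   b / 2,           2.0]]

def is_dd(Q, tol=1e-12):
    n = len(Q)
    return all(Q[i][i] + tol >= sum(abs(Q[i][j]) for j in range(n) if j != i)
               for i in range(n))

def is_dsos(a, b):
    """p is dsos iff gram(a, b, lam) is dd for some lam.
    Rows 1 and 3 require |lam| <= m := min(2 - |a|/2, 2 - |b|/2); row 2's diagonal
    1 - a - 2*lam grows as lam decreases, so lam = -m is the best choice."""
    m = min(2 - abs(a) / 2, 2 - abs(b) / 2)
    if m < 0:
        return False
    return is_dd(gram(a, b, -m))

print(is_dsos(0, 0), is_dsos(0, 4), is_dsos(2, 0))   # True False True
```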
12
Improvements on dsos and sdsos
Replacing sos polynomials by dsos/sdsos polynomials:
+ fast bounds
- not always as good quality (compared to sos)
Idea: iteratively construct a sequence of improving LPs/SOCPs.
Initialization: start with the dsos/sdsos polynomials.
Method: Cholesky change of basis.
13
Cholesky change of basis (1/3)
A psd matrix that is not dd may still be dd in the “right basis”. Goal: iteratively improve on the basis.
14
Cholesky change of basis (2/3)
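Initialize: solve
$$\min C(p)\ \ \text{s.t.}\ \ A(p) = b,\ \ p = z(x)^T Q z(x),\ \ Q \text{ dd/sdd},$$
and let $Q^*$ be an optimal Gram matrix. Set $k := 1$.
Step 1 (new basis): replace $U_1 = \mathrm{chol}(Q^*)$; for $k > 1$, replace $U_k = \mathrm{chol}(U_{k-1}^T Q^* U_{k-1})$, where $Q^*$ is the optimal $Q$ from the previous iteration.
Step 2: solve
$$\min C(p)\ \ \text{s.t.}\ \ A(p) = b,\ \ p = z(x)^T U_k^T Q U_k z(x),\ \ Q \text{ dd/sdd},$$
set $k := k + 1$, and return to Step 1.
Because $Q = I$ is dd and reproduces the previous optimal polynomial, the optimal value can only improve from one iteration to the next. The target is the sos problem $\min C(p)$ s.t. $A(p) = b$, $p$ sos.
[Figure: one iteration of this method on the parametric family $p(x_1, x_2) = 2x_1^4 + 2x_2^4 + a x_1^3 x_2 + (1-a) x_1^2 x_2^2 + b x_1 x_2^3$.]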
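The basis update in Step 1 needs only matrix products and a Cholesky factorization. The sketch below assumes $\mathrm{chol}$ returns the upper-triangular factor $U$ with $M = U^T U$, and it uses hypothetical dd matrices in place of solver output (the LP/SOCP solve itself is not shown). It checks the property behind the monotone improvement: in the new basis, $Q = I$ reproduces the previous optimal Gram matrix.

```python
import math

def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def chol_upper(A):
    """Upper-triangular U with A = U^T U (A must be positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return transpose(L)

def next_basis(U_prev, Q_star):
    """Step 1: U_k = chol(U_{k-1}^T Q* U_{k-1})."""
    return chol_upper(matmul(transpose(U_prev), matmul(Q_star, U_prev)))

def effective_gram(U, Q):
    """Gram matrix in the original basis z(x): p = z^T (U^T Q U) z."""
    return matmul(transpose(U), matmul(Q, U))

def close(A, B, tol=1e-9):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

I2 = [[1.0, 0.0], [0.0, 1.0]]
Q1 = [[2.0, 1.0], [1.0, 3.0]]    # hypothetical dd optimum of the initial problem (U_0 = I)
U1 = next_basis(I2, Q1)          # equals chol(Q1), as in the first Step 1
Q2 = [[1.0, -0.4], [-0.4, 1.0]]  # hypothetical dd optimum found in basis U1
U2 = next_basis(U1, Q2)

# Q = I in the new basis recovers the previous optimal polynomial's Gram matrix.
print(close(effective_gram(U1, I2), Q1), close(effective_gram(U2, I2), effective_gram(U1, Q2)))
```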
15
Cholesky change of basis (3/3)
Theorem: Under mild assumptions, this algorithm converges; i.e., the optimal value/solution of the sequence of LPs/SOCPs converges to the optimal value/solution of the SDP.
[Figure: lower bound on the optimal value; example: minimizing a degree-4 polynomial in 4 variables.]
16
This talk
Recent efforts to make sos more scalable by avoiding SDP
Using sum of squares to optimize over convex functions
17
Link between nonnegativity and convexity
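Optimizing over nonnegative polynomials:
$$\min_p C(p)\ \ \text{s.t.}\ \ A(p) = b,\ \ p(x) \ge 0,\ \forall x \quad \xrightarrow{\text{relax}} \quad \min_p C(p)\ \ \text{s.t.}\ \ A(p) = b,\ \ p \text{ sos}$$
Link with convexity:
$$p(x) \text{ convex} \iff H_p(x) \succeq 0,\ \forall x \iff y^T H_p(x)\, y \ge 0,\ \forall x, y \in \mathbb{R}^n.$$
The last condition says that $y^T H_p(x)\, y$ is a nonnegative polynomial in $x$ and $y$. Relaxing it to “$y^T H_p(x)\, y$ is sos” gives sos-convexity, which can be imposed with an SDP.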
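The polynomial $y^T H_p(x)\, y$ can be formed mechanically from the coefficients of $p$. The sketch below stores polynomials as `{exponent tuple: coefficient}` dictionaries (a representation chosen only for illustration) and builds the form by symbolic differentiation; the sos test on the result would still need an SDP solver, which is not shown. For $p = x_1^4 + x_2^4$ it returns $12x_1^2y_1^2 + 12x_2^2y_2^2$, visibly a sum of squares, so $p$ is sos-convex; for $p = x_1^2 x_2^2$ the form takes a negative value, so $p$ is not even convex.

```python
from collections import defaultdict

def diff(poly, i):
    """Partial derivative w.r.t. variable i; poly maps exponent tuples to coefficients."""
    out = defaultdict(float)
    for exps, c in poly.items():
        if exps[i] > 0:
            e = list(exps)
            e[i] -= 1
            out[tuple(e)] += c * exps[i]
    return dict(out)

def hessian_form(poly, n):
    """Return y^T H_p(x) y as a polynomial in (x_1..x_n, y_1..y_n)."""
    out = defaultdict(float)
    for i in range(n):
        for j in range(n):
            for exps, c in diff(diff(poly, i), j).items():
                e = list(exps) + [0] * n  # append exponents of y_1..y_n
                e[n + i] += 1             # multiply by y_i
                e[n + j] += 1             # multiply by y_j
                out[tuple(e)] += c
    return {e: c for e, c in out.items() if c != 0}

def evaluate(poly, point):
    total = 0.0
    for exps, c in poly.items():
        term = c
        for v, k in zip(point, exps):
            term *= v ** k
        total += term
    return total

convex_p = {(4, 0): 1.0, (0, 4): 1.0}   # x1^4 + x2^4
nonconvex_p = {(2, 2): 1.0}             # x1^2 * x2^2

print(hessian_form(convex_p, 2))        # {(2, 0, 2, 0): 12.0, (0, 2, 0, 2): 12.0}
# Hessian form of x1^2 x2^2 at x = (1, 1), y = (1, -1)
print(evaluate(hessian_form(nonconvex_p, 2), (1.0, 1.0, 1.0, -1.0)))   # -4.0
```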
18
Application 1: 3D geometry problems
19
3D point cloud containment (1/3)
Goal: contain a set of points $\{x_i \in \mathbb{R}^3\}$ with a convex set of “minimum” volume.
Applications: virtual and augmented reality, robotics, computer graphics.
Idea: parametrize the convex set as a sublevel set of a convex polynomial.
Sos-convex formulation:
$$\min_p \text{Volume surrogate}\ \ \text{s.t.}\ \ p(x_i) \le 1,\ \forall i,\ \ p \text{ sos-convex}$$
[In collaboration with Vikas Sindhwani, Ameesh Makadia, Google, NYC]
20
3D point cloud containment (2/3)
The Euclidean distance between sets can be computed exactly using SDP:
$$\min_{x,y} \|x - y\|_2^2\ \ \text{s.t.}\ \ x \in S_1,\ y \in S_2,$$
where $S_1 := \{x \mid g_1(x) \le 1, \dots, g_m(x) \le 1\}$, $S_2 := \{y \mid h_1(y) \le 1, \dots, h_p(y) \le 1\}$, and $g_1, \dots, g_m, h_1, \dots, h_p$ are sos-convex. This is a polynomial optimization problem whose objective and constraints are sos-convex, so its solution can be computed exactly via SDP using the first level of Lasserre’s hierarchy.
21
3D point cloud containment (3/3)
Controlling convexity with a parameter $c$:
$$\min_p \text{Volume surrogate}\ \ \text{s.t.}\ \ p(x_i) \le 1,\ \ p(x) + c \sum_i x_i^{2d} \text{ sos-convex}$$
When $c = 0$, we recover our previous problem with convex sets. As $c$ increases, the shape can become less and less convex.
22
Application 2: Difference of convex programming
[INFORMS Computing Society Best Student Paper Prize 2016]
23
Difference of Convex (DC) programming
Problems of the form
$$\min f_0(x)\ \ \text{s.t.}\ \ f_i(x) \le 0,$$
where $f_i(x) := g_i(x) - h_i(x)$ and $g_i, h_i$ are convex.
Applications: machine learning (sparse PCA, kernel selection, feature selection in SVM).
Studied for quadratics; polynomials are a natural class in which to study the question computationally.
[Hiriart-Urruty, 1985; Tuy, 1995]
24
Difference of Convex (dc) decomposition
Difference of convex (dc) decomposition: given a polynomial $f$, find $g$ and $h$ such that $f = g - h$, where $g, h$ are convex polynomials. Questions: Does such a decomposition always exist? Can I obtain such a decomposition efficiently? Is this decomposition unique?
25
Existence of dc decomposition (1/3)
Theorem: Any polynomial can be written as the difference of two sos-convex polynomials. Corollary: Any polynomial can be written as the difference of two convex polynomials.
26
Existence of dc decomposition (2/3)
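Lemma: Let $K$ be a full-dimensional cone in a vector space $E$. Then any $v \in E$ can be written as $v = k_1 - k_2$, with $k_1, k_2 \in K$.
Proof sketch: Pick $k$ in the interior of $K$ (nonempty, since $K$ is full dimensional). Then there exists $0 \le \alpha < 1$ such that $k' := (1 - \alpha)v + \alpha k \in K$. Hence
$$v = \frac{1}{1-\alpha}\, k' - \frac{\alpha}{1-\alpha}\, k,$$
with $k_1 := \frac{1}{1-\alpha} k' \in K$ and $k_2 := \frac{\alpha}{1-\alpha} k \in K$.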
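The proof sketch is constructive. A minimal sketch, assuming $E = \mathbb{R}^2$ and $K$ the nonnegative orthant (a full-dimensional cone chosen only for illustration) with interior point $k = (1, 1)$: scan $\alpha$ upward from $0$ until $k' = (1-\alpha)v + \alpha k$ enters $K$, then read off $k_1$ and $k_2$.

```python
def split_in_cone(v, k, in_cone, steps=1000):
    """Write v = k1 - k2 with k1, k2 in a cone K, following the proof sketch.

    k must be an interior point of K and in_cone a membership test for K.
    alpha is scanned on a grid in [0, 1); since k is interior, k' = (1-alpha) v + alpha k
    enters K once alpha is close enough to 1."""
    for s in range(steps):
        alpha = s / steps
        k_prime = [(1 - alpha) * vi + alpha * ki for vi, ki in zip(v, k)]
        if in_cone(k_prime):
            k1 = [x / (1 - alpha) for x in k_prime]      # (1/(1-alpha)) k'
            k2 = [alpha / (1 - alpha) * ki for ki in k]  # (alpha/(1-alpha)) k
            return alpha, k1, k2
    raise ValueError("no alpha found on this grid; increase steps")

def nonneg_orthant(u):
    return all(x >= 0 for x in u)

alpha, k1, k2 = split_in_cone(v=[-3.0, 1.0], k=[1.0, 1.0], in_cone=nonneg_orthant)
print(alpha, k1, k2)   # 0.75 [0.0, 4.0] [3.0, 3.0]
```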
27
Existence of dc decomposition (3/3)
Here, $E$ = {polynomials of degree $2d$ in $n$ variables} and $K$ = {sos-convex polynomials of degree $2d$ in $n$ variables}. It remains to show that $K$ is full dimensional: $\sum_i x_i^{2d}$ can be shown to be in the interior of $K$.
This also shows that a decomposition can be obtained efficiently: solving for $f = g - h$ with $g, h$ sos-convex is an SDP. In fact, we show that a decomposition can be found via LP and SOCP (not covered here).
28
Uniqueness of dc decomposition
Dc decomposition: given a polynomial $f$, find convex polynomials $g$ and $h$ such that $f = g - h$.
Questions: Does such a decomposition always exist? Yes. Can I obtain such a decomposition efficiently? Yes, through sos-convexity. Is this decomposition unique? No: starting from an initial decomposition $f(x) = g(x) - h(x)$, alternative decompositions are given by
$$f(x) = \big(g(x) + p(x)\big) - \big(h(x) + p(x)\big),\quad p(x) \text{ convex}.$$
Which is the “best” decomposition?
29
Convex-Concave Procedure (CCP)
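Heuristic for solving DC programming problems.
Input: initial point $x_0$; $f_i = g_i - h_i$, $i = 0, \dots, m$. Set $k := 0$.
Convexify by linearizing $h$:
$$f_i^k(x) = g_i(x) - \left(h_i(x_k) + \nabla h_i(x_k)^T (x - x_k)\right).$$
Here $g_i$ is convex and the linearization of $h_i$ is affine, so $f_i^k$ is convex; since a convex $h_i$ lies above its linearization, $f_i^k(x) \ge f_i(x)$.
Solve the convex subproblem: take $x_{k+1}$ to be the solution of
$$\min f_0^k(x)\ \ \text{s.t.}\ \ f_i^k(x) \le 0,\ i = 1, \dots, m.$$
Set $k := k + 1$ and repeat.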
30
Convex-Concave Procedure (CCP)
Toy example: $\min_x f(x)$, where $f(x) := g(x) - h(x)$.
Initial point: $x_0 = 2$. Convexify $f(x)$ around $x_0$ to obtain $f^0(x)$.
Minimize $f^0(x)$ and obtain $x_1$.
Reiterate.
[Figure: iterates $x_0, x_1, x_2, x_3, x_4, \dots$ approaching $x_\infty$.]
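A minimal sketch of the CCP loop for an unconstrained univariate problem. The decomposition $g(x) = x^4$, $h(x) = 3x^2 + 2x - 2$ (so $f(x) = x^4 - 3x^2 - 2x + 2$) is an assumption borrowed from the example on the undominated-decompositions slide, since the toy figure's own $g, h$ are not given. Each convex subproblem $\min_x g(x) - h'(x_k)\,x$ is solved by bisection on $g'(x) = h'(x_k)$, which works because $g'$ is increasing. Here $f'(x) = 2(x+1)(2x^2 - 2x - 1)$, and the iterates approach the stationary point $x^\ast = (1 + \sqrt{3})/2$.

```python
import math

def ccp_1d(g_prime, h_prime, x0, iters=50, lo=-10.0, hi=10.0):
    """Convex-concave procedure for min_x g(x) - h(x), with g, h convex in one variable.

    Iteration k minimizes the convex surrogate g(x) - [h(x_k) + h'(x_k)(x - x_k)];
    its minimizer solves g'(x) = h'(x_k). Since g' is increasing, bisection on
    [lo, hi] finds it (assumes the root lies in that bracket)."""
    xs = [x0]
    for _ in range(iters):
        target = h_prime(xs[-1])
        a, b = lo, hi
        for _ in range(100):
            mid = (a + b) / 2
            if g_prime(mid) < target:
                a = mid
            else:
                b = mid
        xs.append((a + b) / 2)
    return xs

def f(x):
    return x**4 - 3 * x**2 - 2 * x + 2

iterates = ccp_1d(g_prime=lambda x: 4 * x**3,   # g(x) = x^4
                  h_prime=lambda x: 6 * x + 2,  # h(x) = 3x^2 + 2x - 2
                  x0=2.0)
x_star = (1 + math.sqrt(3)) / 2                 # root of f'(x) = 4x^3 - 6x - 2
for k, x in enumerate(iterates[:5]):
    print(k, round(x, 4), round(f(x), 4))
```

Because each surrogate majorizes $f$ and touches it at $x_k$, the objective values along the iterates never increase.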
31
Picking the “best” decomposition for CCP
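Algorithm: linearize $h(x)$ around a point $x_k$ to obtain a convexified version of $f(x)$.
Idea: pick $h(x)$ such that it is as close as possible to affine around $x_k$.
Mathematical translation: minimize the curvature of $h$ at $x_k$.
Worst-case curvature*:
$$\min_{g,h} \lambda_{\max}(H_h(x_k))\ \ \text{s.t.}\ \ f = g - h,\ \ g, h \text{ convex}$$
Average curvature*:
$$\min_{g,h} \mathrm{Tr}\, H_h(x_k)\ \ \text{s.t.}\ \ f = g - h,\ \ g, h \text{ convex}$$
* $\lambda_{\max}(H_h(x_k)) = \max_{y \in S^{n-1}} y^T H_h(x_k)\, y$ and $\mathrm{Tr}\, H_h(x_k) = \frac{n}{A_n}\int_{y \in S^{n-1}} y^T H_h(x_k)\, y\, d\sigma$, where $A_n$ is the surface area of the unit sphere $S^{n-1}$; i.e., the trace is $n$ times the average curvature over all unit directions.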
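The link between the trace and the average curvature, $\mathrm{Tr}\, H = n \cdot \mathrm{mean}_{y \in S^{n-1}}\, y^T H y$, can be checked by Monte Carlo. The sketch below uses a hypothetical $3 \times 3$ matrix as a stand-in for $H_h(x_k)$ and samples uniform unit directions by normalizing Gaussian vectors; the largest sampled value is only a lower estimate of $\lambda_{\max}$, which Gershgorin bounds by $5$ for this matrix.

```python
import math
import random

def random_unit_vector(n, rng):
    """Uniform sample on the sphere S^{n-1}: normalize a standard Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    r = math.sqrt(sum(x * x for x in v))
    return [x / r for x in v]

def quad_form(H, y):
    return sum(H[i][j] * y[i] * y[j] for i in range(len(y)) for j in range(len(y)))

# Hypothetical Hessian H_h(x_k) of h at the current iterate, for n = 3.
H = [[4.0, 1.0, 0.0],
     [1.0, 2.0, 0.5],
     [0.0, 0.5, 1.0]]
n = len(H)
rng = random.Random(0)
samples = [quad_form(H, random_unit_vector(n, rng)) for _ in range(100_000)]

average_curvature = sum(samples) / len(samples)
trace = sum(H[i][i] for i in range(n))
print(trace, n * average_curvature)   # n * average should be close to the trace
print(max(samples))                   # a lower estimate of lambda_max(H_h(x_k))
```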
32
Undominated decompositions (1/2)
Definition: $(g,\ h := g - f)$ is an undominated decomposition of $f$ if no other decomposition of $f$ can be obtained by subtracting a (nonaffine) convex function from $g$.
Example: $f(x) = x^4 - 3x^2 - 2x + 2$.
$g(x) = x^4 + x^2$, $h(x) = 4x^2 + 2x - 2$: convexify around $x_0 = 2$ to get $f^0(x)$.
This decomposition is DOMINATED BY $g'(x) = x^4$, $h'(x) = 3x^2 + 2x - 2$: convexify around $x_0 = 2$ to get $f^{0\prime}(x)$. One cannot subtract something (nonaffine) convex from $g'$ and get something convex again.
If $g'$ dominates $g$, then the next iterate in CCP obtained using $g'$ always beats the one obtained using $g$.
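The domination claim can be checked numerically on this example. The sketch below takes one CCP step from $x_0 = 2$ with each decomposition of $f = g - h = g' - h' = x^4 - 3x^2 - 2x + 2$; the convex subproblem $\min_x g(x) - h'(x_0)\,x$ is solved by bisection on $g'(x) = h'(x_0)$ (here $h'$ and $g'$ in the code denote derivatives, a choice made only for this sketch). The iterate from the dominating decomposition $g' = x^4$ attains the lower value of $f$.

```python
def ccp_step(g_prime, h_prime, xk, lo=-10.0, hi=10.0):
    """One CCP step in 1-D: solve g'(x) = h'(xk) by bisection (g' increasing)."""
    target = h_prime(xk)
    for _ in range(100):
        mid = (lo + hi) / 2
        if g_prime(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def f(x):
    return x**4 - 3 * x**2 - 2 * x + 2

# Dominated decomposition: g = x^4 + x^2, h = 4x^2 + 2x - 2
x1_dominated = ccp_step(lambda x: 4 * x**3 + 2 * x, lambda x: 8 * x + 2, 2.0)
# Dominating decomposition: g' = x^4, h' = 3x^2 + 2x - 2
x1_undominated = ccp_step(lambda x: 4 * x**3, lambda x: 6 * x + 2, 2.0)
print(x1_dominated, f(x1_dominated))
print(x1_undominated, f(x1_undominated))
```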
33
Undominated decompositions (2/2)
Theorem: Given a polynomial $f$, consider
$$(\star)\qquad \min_{g,h}\ \frac{1}{A_n}\int_{S^{n-1}} \mathrm{Tr}\, H_g\, d\sigma \quad \text{s.t.}\ \ f = g - h,\ \ g \text{ convex},\ \ h \text{ convex},$$
where $A_n = \frac{2\pi^{n/2}}{\Gamma(n/2)}$. Any optimal solution is an undominated dc decomposition of $f$ (and an optimal solution always exists).
Theorem: If $f$ has degree 4, it is strongly NP-hard to solve $(\star)$.
Idea: replace "$f = g - h$, $g, h$ convex" by "$f = g - h$, $g, h$ sos-convex".
34
Comparing different decompositions (1/2)
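Solving the problem $\min_{x \in B} f_0(x)$, where $B = \{x \mid \|x\| \le R\}$ and $f_0$ has $n = 8$ and $d = 4$. Decompose $f_0$, run CCP for 4 minutes, and compare objective values. The three decompositions compared are:
Feasibility: $\min_{g,h} 0$ s.t. $f_0 = g - h$, $g, h$ sos-convex.
$\lambda_{\max} H_h(x_0)$: $\min_{g,h,t} t$ s.t. $f_0 = g - h$, $g, h$ sos-convex, $tI - H_h(x_0) \succeq 0$.
Undominated: $\min_{g,h} \frac{1}{A_n}\int_{S^{n-1}} \mathrm{Tr}\, H_g\, d\sigma$ s.t. $f_0 = g - h$, $g, h$ sos-convex.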
35
Comparing different decompositions (2/2)
[Figure: CCP objective values for the Feasibility, $\lambda_{\max} H_h(x_0)$, and Undominated decompositions; average over 30 instances. Solver: Mosek. Computer: 8 GB RAM, 2.40 GHz processor.]
Conclusion: the performance of CCP is strongly affected by the initial decomposition.
36
Main messages (1/2)
Optimizing over nonnegative polynomials has many applications.
Sum of squares techniques are powerful relaxations but expensive (SDP).
We present more scalable versions of sum of squares (iterative LPs and SOCPs).
[Figure: DD, SDD, and PSD cones; Cholesky change of basis.]
37
Main messages (2/2)
Imposing convexity can be done using sum of squares techniques (leads to sos-convexity).
Sos-convexity can be used for 3D geometry problems.
Sos-convexity can be used in DC programming to decompose a polynomial into a difference of convex polynomials.
The choice of the decomposition impacts the performance of CCP.
[Figure: $p(x)$ convex $\iff y^T H_p(x)\, y \ge 0,\ \forall x, y \in \mathbb{R}^n$.]
38
Thank you for listening
Questions? Want to learn more?