
1 Sum of squares optimization: scalability improvements and applications to difference of convex programming
Georgina Hall, Princeton ORFE. Joint work with Amir Ali Ahmadi.

2 Nonnegative polynomials
A polynomial p(x) ≔ p(x_1, …, x_n) is nonnegative if p(x) ≥ 0 for all x ∈ ℝ^n. Example: is p(x) = x^4 − 5x^2 − x + 10 nonnegative?
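This particular polynomial turns out to be nonnegative (its minimum is roughly 2.1), though certifying that rigorously is the hard part. As a quick numerical sanity check, one can at least sample it on a grid; a minimal sketch in Python (the grid range and step are arbitrary choices — outside [-4, 4] the x^4 term dominates):

```python
# Sample p(x) = x^4 - 5x^2 - x + 10 on a grid over [-4, 4].
def p(x):
    return x**4 - 5 * x**2 - x + 10

values = [p(-4 + 8 * i / 10000) for i in range(10001)]
print(min(values))  # stays strictly positive (around 2.1)
```

A grid check like this can only give evidence of nonnegativity, never a proof; the sum of squares machinery of the next slides is what provides an actual certificate.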

3 Optimizing over nonnegative polynomials (1/3)
We are interested in more than checking nonnegativity of a given polynomial. Consider problems of the type:
min_p C(p) s.t. A p = b, p(x) ≥ 0 ∀x,
where the decision variables are the coefficients of the polynomial p, the objective C is linear and the constraints A p = b are affine in those coefficients (e.g., the sum of the coefficients equals 1), and the last line is the nonnegativity condition. Why would we be interested in problems of this type?

4 Optimizing over nonnegative polynomials (2/3)
Polynomial optimization: computing γ* ≔ min_x p(x) s.t. f_i(x) ≤ 0, g_j(x) = 0 is equivalent to
max_γ γ s.t. p(x) − γ ≥ 0 for all x ∈ {x : f_i(x) ≤ 0, g_j(x) = 0}.
Applications: the optimal power flow problem, combinatorial optimization problems, economics and game theory, sensor network localization.

5 Optimizing over nonnegative polynomials (3/3)
Controls: automated search for Lyapunov functions for dynamical systems; software verification. Statistics: convex regression.

6 Imposing nonnegativity (1/3)
Is a given polynomial nonnegative? This is NP-hard to decide for degree ≥ 4. What if p can be written as a sum of squares (sos), i.e., p(x) = Σ_i q_i(x)^2 for some polynomials q_i?

7 Imposing nonnegativity (2/3)
A polynomial p(x) of degree 2d is sos if and only if there exists Q ⪰ 0 such that p(x) = z(x)^T Q z(x), where z(x) = (1, x_1, …, x_n, x_1 x_2, …, x_n^d)^T is the vector of monomials of degree up to d. Being sos is a sufficient condition for nonnegativity, but not a necessary one; in practice we don't lose that much.
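For instance, with n = 1 and d = 2, p(x) = x^4 + 2x^2 + 1 = (x^2 + 1)^2 is sos, and a PSD Gram matrix Q with p(x) = z(x)^T Q z(x) can be verified numerically. A small sketch (this example polynomial and matrix are ours, chosen for illustration):

```python
import numpy as np

# With z(x) = (1, x, x^2)^T, a Gram matrix certifying that
# p(x) = x^4 + 2x^2 + 1 is sos:
Q = np.array([[1.0, 0.0, 1.0],
              [0.0, 0.0, 0.0],
              [1.0, 0.0, 1.0]])

assert np.all(np.linalg.eigvalsh(Q) >= -1e-9)  # Q is PSD

# Check the identity p(x) = z(x)^T Q z(x) at a few sample points.
for x in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    z = np.array([1.0, x, x**2])
    assert abs(z @ Q @ z - (x**4 + 2 * x**2 + 1)) < 1e-9
print("Gram matrix certificate verified")
```

In an actual sos program, Q is a decision variable found by the SDP solver rather than written down by hand.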

8 Imposing nonnegativity (3/3)
Initial optimization problem (intractable):
min_p C(p) s.t. A p = b, p(x) ≥ 0 ∀x.
Sum of squares relaxation:
min_p C(p) s.t. A p = b, p sos.
Equivalent semidefinite programming formulation:
min_{p,Q} C(p) s.t. A p = b, p = z(x)^T Q z(x), Q ⪰ 0.
But: the size of Q is (n+d choose d) × (n+d choose d), which grows quickly with n and d.

9 This talk
Recent efforts to make sos more scalable by avoiding SDP.
Using sum of squares to optimize over convex functions.

10 Alternatives to sum of squares: dsos and sdsos
Sum of squares (sos): p(x) = z(x)^T Q z(x), Q in the PSD cone ≔ {Q : Q ⪰ 0} — an SDP.
Diagonally dominant sum of squares (dsos): p(x) = z(x)^T Q z(x), Q in the DD cone ≔ {Q : Q_ii ≥ Σ_{j≠i} |Q_ij|, ∀i} — an LP.
Scaled diagonally dominant sum of squares (sdsos): p(x) = z(x)^T Q z(x), Q in the SDD cone ≔ {Q : ∃ diagonal D with D_ii > 0 s.t. DQD dd} — an SOCP.
Note that DD ⊆ SDD ⊆ PSD. [Ahmadi, Majumdar]
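The DD cone membership test is simple enough to state in a few lines of code. A sketch with a hypothetical helper `is_dd` (by Gershgorin's circle theorem every symmetric dd matrix with nonnegative diagonal is PSD, while the converse fails, which is why the LP restriction can be conservative):

```python
# Q is diagonally dominant (dd) if Q_ii >= sum_{j != i} |Q_ij| for all i.
def is_dd(Q):
    n = len(Q)
    return all(Q[i][i] >= sum(abs(Q[i][j]) for j in range(n) if j != i)
               for i in range(n))

A = [[2.0, 1.0], [1.0, 3.0]]   # dd (hence PSD)
B = [[1.0, 2.0], [2.0, 5.0]]   # PSD (det = 1, trace = 6) but not dd
print(is_dd(A), is_dd(B))  # True False
```

Checking the SDD condition is slightly more work (it is a second-order-cone condition), but still far cheaper than a full PSD check inside an SDP.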

11 Alternatives to sum of squares: dsos and sdsos
Initial optimization problem (intractable): min C(p) s.t. A p = b, p(x) ≥ 0 ∀x.
Sos relaxation: min C(p) s.t. A p = b, p sos.
Dsos/sdsos relaxation: min C(p) s.t. A p = b, p dsos/sdsos — better scalability, in exchange for weaker bounds.
Example: for the parametric family of polynomials p(x_1, x_2) = 2x_1^4 + 2x_2^4 + a x_1^3 x_2 + (1 − a) x_1^2 x_2^2 + b x_1 x_2^3, one can compare the regions in the (a, b) plane where each condition certifies nonnegativity.

12 Improvements on dsos and sdsos
Replacing sos polynomials by dsos/sdsos polynomials:
+ : fast bounds
− : not always as good quality (compared to sos)
Remedy: iteratively construct a sequence of improving LPs/SOCPs.
Initialization: start with the dsos/sdsos polynomials.
Method: Cholesky change of basis.

13 Cholesky change of basis (1/3)
A matrix can be psd but not dd, and yet be dd in the "right basis." Goal: iteratively improve on the basis.

14 Cholesky change of basis (2/3)
Start from the sos problem min C(p) s.t. A p = b, p sos.
Initialize: min C(p) s.t. A p = b, p = z(x)^T Q z(x), Q dd/sdd.
Step 1 (new basis): replace U_1 = chol(Q*), where Q* is the optimal solution just obtained.
Step 2: solve min C(p) s.t. A p = b, p = z(x)^T U_k^T Q U_k z(x), Q dd/sdd.
Then replace U_k = chol(U_{k−1}^T Q* U_{k−1}), set k ≔ k + 1, and repeat Step 2.
One iteration of this method already improves the results on the parametric family of polynomials p(x_1, x_2) = 2x_1^4 + 2x_2^4 + a x_1^3 x_2 + (1 − a) x_1^2 x_2^2 + b x_1 x_2^3.
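The reason the re-solve can only improve matters is that the previous optimum stays feasible: with U = chol(Q*), the identity matrix, which is dd, represents Q* in the new basis, since U^T I U = Q*. A minimal numerical illustration (the specific matrix is a made-up PSD-but-not-dd example):

```python
import numpy as np

# Q_star: PSD (eigenvalues > 0) but not diagonally dominant.
Q_star = np.array([[1.0, 2.0], [2.0, 5.0]])

# numpy's cholesky returns lower-triangular L with L @ L.T = Q_star;
# take U = L.T so that U.T @ U = Q_star.
U = np.linalg.cholesky(Q_star).T

# In the new basis, Q = I is dd, and U^T Q U recovers Q_star exactly.
assert np.allclose(U.T @ np.eye(2) @ U, Q_star)
print("Q* is represented by the (dd) identity matrix in the new basis")
```

So each LP/SOCP in the sequence contains the previous round's solution in its feasible set, and its optimal value is at least as good.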

15 Cholesky change of basis (3/3)
Theorem: under mild assumptions, this algorithm converges, i.e., the optimal value/solution of the sequence of LPs/SOCPs converges to the optimal value/solution of the SDP. Each iterate provides a valid lower bound on the optimal value. Example: minimizing a degree-4 polynomial in 4 variables.

16 This talk
Recent efforts to make sos more scalable by avoiding SDP.
Using sum of squares to optimize over convex functions.

17 Link between nonnegativity and convexity
Optimizing over nonnegative polynomials: min_p C(p) s.t. A p = b, p(x) ≥ 0 ∀x, relaxed to min_p C(p) s.t. A p = b, p sos.
Link with convexity: p(x) is convex ⇔ y^T H_p(x) y ≥ 0 ∀x, y ∈ ℝ^n (where H_p is the Hessian of p), i.e., H_p(x) ⪰ 0 ∀x. This is a nonnegativity condition on a polynomial in x and y; relaxing it to "y^T H_p(x) y sos" gives sos-convexity, which can be checked with an SDP.
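As an illustration of the Hessian condition, one can spot-check H_p(x) ⪰ 0 at sample points for a small convex example. The polynomial below is our own; sampling gives evidence of convexity, not a certificate — sos-convexity is what provides an actual SDP certificate:

```python
import numpy as np

# p(x1, x2) = x1^4 + x1^2 x2^2 + x2^4; its Hessian, computed by hand:
# H = [[12 x1^2 + 2 x2^2, 4 x1 x2       ],
#      [4 x1 x2,          2 x1^2 + 12 x2^2]]
def hessian(x1, x2):
    return np.array([[12 * x1**2 + 2 * x2**2, 4 * x1 * x2],
                     [4 * x1 * x2, 2 * x1**2 + 12 * x2**2]])

# Spot-check that the Hessian is PSD at random sample points.
rng = np.random.default_rng(0)
for x1, x2 in rng.uniform(-3, 3, size=(100, 2)):
    assert np.linalg.eigvalsh(hessian(x1, x2)).min() >= -1e-9
print("Hessian PSD at all sampled points")
```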

18 Application 1: 3D geometry problems

19 3D point cloud containment (1/3)
Goal: contain a set of points {x_i ∈ ℝ^3} with a convex set of "minimum" volume. Applications: virtual and augmented reality, robotics, computer graphics. Idea: parametrize the convex set as the sublevel set of a convex polynomial. Sos-convex formulation:
min_p VolumeSurrogate(p) s.t. p(x_i) ≤ 1 ∀i, p sos-convex.
[In collaboration with Vikas Sindhwani and Ameesh Makadia, Google, NYC]

20 3D point cloud containment (2/3)
The Euclidean distance between two such sets can be computed exactly using SDP:
min_{x,y} ||x − y||_2^2 s.t. x ∈ S_1, y ∈ S_2,
where S_1 ≔ {x : g_1(x) ≤ 1, …, g_m(x) ≤ 1}, S_2 ≔ {y : h_1(y) ≤ 1, …, h_p(y) ≤ 1}, and g_1, …, g_m, h_1, …, h_p are sos-convex. This is a polynomial optimization problem whose objective and constraints are sos-convex, so its solution can be computed exactly via SDP using the first level of Lasserre's hierarchy.

21 3D point cloud containment (3/3)
Controlling convexity with a parameter c:
min_p VolumeSurrogate(p) s.t. p(x_i) ≤ 1 ∀i, p(x) + c Σ_i x_i^{2d} sos-convex.
When c = 0, we recover the previous problem with convex sets; as c increases, the shape can get less and less convex.

22 Application 2: Difference of convex programming
[INFORMS Computing Society Best Student Paper Prize 2016]

23 Difference of Convex (DC) programming
Problems of the form min f_0(x) s.t. f_i(x) ≤ 0, where f_i(x) ≔ g_i(x) − h_i(x) with g_i, h_i convex. Applications: machine learning (sparse PCA, kernel selection, feature selection in SVM). Studied for quadratics [Hiriart-Urruty, 1985] and polynomials [Tuy, 1995]; a nice question to study computationally.

24 Difference of Convex (dc) decomposition
Difference of convex (dc) decomposition: given a polynomial f, find g and h such that f = g − h, where g, h are convex polynomials. Questions: Does such a decomposition always exist? Can such a decomposition be obtained efficiently? Is this decomposition unique?

25 Existence of dc decomposition (1/3)
Theorem: Any polynomial can be written as the difference of two sos-convex polynomials. Corollary: Any polynomial can be written as the difference of two convex polynomials.
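In one variable, where convexity is just f'' ≥ 0, the corollary can be illustrated directly: adding a sufficiently convex polynomial h to f makes g = f + h convex, and then f = g − h. A sketch with an arbitrary example f and the hypothetical choice h(x) = c(x^4 + x^2):

```python
# f(x) = x^4 - 3x^2 + 2x - 2 is not convex (f''(x) = 12x^2 - 6 < 0 near 0).
# Take h(x) = c*(x^4 + x^2); then g = f + h has
# g''(x) = (12 + 12c) x^2 + 2c - 6, which is >= 0 everywhere once c >= 3.
c = 3.0

def f_pp(x):
    return 12 * x**2 - 6

def h_pp(x):
    return 12 * c * x**2 + 2 * c

grid = [-3 + 6 * i / 1000 for i in range(1001)]
assert min(f_pp(x) for x in grid) < 0             # f alone is not convex
assert min(h_pp(x) for x in grid) >= 0            # h is convex
assert min(f_pp(x) + h_pp(x) for x in grid) >= 0  # g = f + h is convex
print("f = g - h with g, h convex")
```

In n variables the theorem replaces this hand-picked h by an sos-convex polynomial found by an SDP, as the next slides explain.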

26 Existence of dc decomposition (2/3)
Lemma: let K be a full-dimensional cone in a vector space E. Then any v ∈ E can be written as v = k_1 − k_2 with k_1, k_2 ∈ K.
Proof sketch: pick any k in the interior of K. Since K is full dimensional, there exists α < 1 such that k' ≔ (1 − α)v + αk ∈ K. Rearranging, v = (1/(1 − α)) k' − (α/(1 − α)) k, and both k_1 ≔ (1/(1 − α)) k' and k_2 ≔ (α/(1 − α)) k lie in K.

27 Existence of dc decomposition (3/3)
Here, E = {polynomials of degree 2d in n variables} and K = {sos-convex polynomials of degree 2d in n variables}. It remains to show that K is full dimensional: Σ_i x_i^{2d} can be shown to be in the interior of K. This also shows that a decomposition can be obtained efficiently: solving f = g − h with g, h sos-convex is an SDP. In fact, we show that a decomposition can also be found via LP and SOCP (not covered here).

28 Uniqueness of dc decomposition
Dc decomposition: given a polynomial f, find convex polynomials g and h such that f = g − h.
Does such a decomposition always exist? Yes. Can it be obtained efficiently? Yes, through sos-convexity. Is it unique? No: from an initial decomposition f(x) = g(x) − h(x), any convex p(x) yields the alternative decomposition f(x) = (g(x) + p(x)) − (h(x) + p(x)). Which is the "best" decomposition?

29 Convex-Concave Procedure (CCP)
A heuristic for minimizing DC programming problems. Idea:
Input: k ≔ 0, an initial point x_0, and decompositions f_i = g_i − h_i, i = 0, …, m.
Convexify by linearizing h: f_i^k(x) ≔ g_i(x) − (h_i(x_k) + ∇h_i(x_k)^T (x − x_k)); this is convex, since g_i is convex and the linearization of h_i is affine.
Solve the convex subproblem: take x_{k+1} to be the solution of min f_0^k(x) s.t. f_i^k(x) ≤ 0, i = 1, …, m. Set k ≔ k + 1 and repeat.

30 Convex-Concave Procedure (CCP)
Toy example: min_x f(x), where f(x) ≔ g(x) − h(x). Initial point: x_0 = 2. Convexify f to obtain f^0(x); minimize f^0(x) to obtain x_1; reiterate to produce x_2, x_3, x_4, …, converging to a point x_∞.
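The toy iteration can be sketched in code. Assuming the univariate decomposition g(x) = x^4 + x^2, h(x) = 4x^2 − 2x + 2 (so f(x) = x^4 − 3x^2 + 2x − 2), each CCP step minimizes the convex surrogate g(x) − h(x_k) − h'(x_k)(x − x_k), i.e., solves g'(x) = h'(x_k):

```python
# CCP on the univariate toy problem f = g - h with
# g(x) = x^4 + x^2 and h(x) = 4x^2 - 2x + 2.
def g_prime(x):
    return 4 * x**3 + 2 * x

def h_prime(x):
    return 8 * x - 2

def argmin_surrogate(xk, lo=-10.0, hi=10.0):
    # The surrogate is convex; its minimizer solves g'(x) = h'(xk),
    # found here by bisection (g' is strictly increasing).
    target = h_prime(xk)
    for _ in range(200):
        mid = (lo + hi) / 2
        if g_prime(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

x = 2.0  # initial point x_0
for _ in range(100):
    x = argmin_surrogate(x)
print(x)  # converges to a stationary point of f (here x = 1)
```

The fixed points of this map are exactly the points where g'(x) = h'(x), i.e., the stationary points of f; like the slide's picture, the iterates from x_0 = 2 descend monotonically to the nearest one.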

31 Picking the “best” decomposition for CCP
Algorithm: linearize h(x) around a point x_k to obtain a convexified version of f(x).
Idea: pick h such that it is as close as possible to affine around x_k.
Mathematical translation: minimize the curvature of h at x_k.
Worst-case curvature: min_{g,h} λ_max(H_h(x_k)) s.t. f = g − h, g, h convex, where λ_max(H_h(x_k)) = max_{y ∈ S^{n−1}} y^T H_h(x_k) y.
Average curvature: min_{g,h} Tr H_h(x_k) s.t. f = g − h, g, h convex, noting that Tr H_h(x_k) is proportional to ∫_{S^{n−1}} y^T H_h(x_k) y dσ.

32 Undominated decompositions (1/2)
Definition: a decomposition f = g − h (with h ≔ g − f) is undominated if no other decomposition of f can be obtained by subtracting a (nonaffine) convex function from g.
Example: f(x) = x^4 − 3x^2 + 2x − 2. The decomposition g(x) = x^4 + x^2, h(x) = 4x^2 − 2x + 2 (convexified around x_0 = 2 to get f^0(x)) is dominated by g'(x) = x^4, h'(x) = 3x^2 − 2x + 2 (convexified around x_0 = 2 to get f^{0'}(x)): one cannot subtract something convex from g' and get something convex again. If g' dominates g, then the next iterate in CCP obtained using g' always beats the one obtained using g.

33 Undominated decompositions (2/2)
Theorem: given a polynomial f, consider
(⋆) min_{g,h} (1/A_n) ∫_{S^{n−1}} Tr H_g dσ s.t. f = g − h, g convex, h convex,
where A_n = 2π^{n/2}/Γ(n/2). Any optimal solution is an undominated dc decomposition of f (and an optimal solution always exists).
Theorem: if f has degree 4, it is strongly NP-hard to solve (⋆).
Idea: replace "f = g − h, g, h convex" by "f = g − h, g, h sos-convex."

34 Comparing different decompositions (1/2)
Solving the problem min_B f_0 over the ball B = {x : ||x|| ≤ R}, where f_0 has n = 8 and d = 4. Decompose f_0, run CCP for 4 minutes, and compare objective values. Three decompositions are compared:
Feasibility: min_{g,h} 0 s.t. f_0 = g − h, g, h sos-convex.
λ_max(H_h(x_0)): min_{g,h,t} t s.t. f_0 = g − h, g, h sos-convex, tI − H_h(x_0) ⪰ 0.
Undominated: min_{g,h} (1/A_n) ∫_{S^{n−1}} Tr H_g dσ s.t. f_0 = g − h, g, h sos-convex.

35 Comparing different decompositions (2/2)
Results averaged over 30 instances (solver: Mosek; computer: 8 GB RAM, 2.40 GHz processor) for the Feasibility, λ_max(H_h(x_0)), and Undominated decompositions. Conclusion: the performance of CCP is strongly affected by the initial decomposition.

36 Main messages (1/2)
Optimizing over nonnegative polynomials has many applications. Sum of squares techniques are powerful relaxations but expensive (SDP). We presented more scalable versions of sum of squares (iterative LPs and SOCPs) based on a Cholesky change of basis between the DD, SDD, and PSD cones.

37 Main messages (2/2)
Imposing convexity can be done using sum of squares techniques, leading to sos-convexity: p(x) convex ⇔ y^T H_p(x) y ≥ 0 ∀x, y ∈ ℝ^n, relaxed to "y^T H_p(x) y sos". Sos-convexity can be used for 3D geometry problems, and in DC programming to decompose a polynomial into a difference of convex polynomials. The choice of decomposition impacts the performance of CCP.

38 Thank you for listening
Questions? Want to learn more?

