Amir Ali Ahmadi (Princeton University) Iterative LP and SOCP-based approximations to semidefinite and sum of squares programs Georgina Hall Princeton University Joint work with: Amir Ali Ahmadi (Princeton University) Sanjeeb Dash (IBM)
Semidefinite programming: definition A semidefinite program is an optimization problem of the form: Problem data: 𝐶, 𝐴 𝑖 ∈ 𝑆 𝑛×𝑛 , 𝑏 𝑖 ∈ℝ. min X∈ S 𝑛×𝑛 𝑇𝑟(𝐶𝑋) s.t. 𝑇𝑟 𝐴 𝑖 𝑋 = 𝑏 𝑖 , 𝑖=1,…,𝑚 𝑋≽0
Semidefinite programming: two applications Nonnegativity of a polynomial 𝒑 of degree 𝟐𝒅 Copositivity of a square matrix 𝑴 𝑝(𝑥) nonnegative 𝑀 copositive ⇔ 𝑥 𝑇 𝑀𝑥≥0 ∀𝑥≥0 𝑝 𝑥 1 ,…, 𝑥 𝑛 =𝑧 𝑥 𝑇 𝑄𝑧 𝑥 , 𝑄≽0 (𝑧 𝑥 : monomial vector of degree 𝑑) 𝑀=𝑃+𝑁 with 𝑃≽0, 𝑁≥0 Polynomial optimization Stability number of a graph min 𝑥∈ ℝ 𝑛 𝑝(𝑥) max 𝛾 𝛾 s.t. 𝑝 𝑥 −𝛾≥0, ∀𝑥 𝛼 𝐺 =min 𝜆 s.t. 𝜆 𝐼+𝐴 −𝐽 copositive ⇔ ⇒ ⇐ min 𝜆 s.t. 𝜆 𝐼+𝐴 −𝐽−N≽0,𝑁≥0 max 𝛾 𝛾 s.t. 𝑝 𝑥 −𝛾=𝑧 𝑥 𝑇 𝑄𝑧 𝑥 , 𝑄≽0 (De Klerk, Pasechnik)
Semidefinite programming: pros and cons +: high-quality bounds for non convex problems. - : can take too long to solve if psd constraint is too big. Polynomial optimization 𝑝 𝑥 =𝑧 𝑥 𝑇 𝑄𝑧 𝑥 , 𝑄≽0, (𝑝 degree 2𝑑 and in 𝑛 vars) 𝑄 : size 𝑛+𝑑 𝑑 × 𝑛+𝑑 𝑑 Stability number of a graph 𝑃:=𝜆 𝐼+𝐴 −𝐽−𝑁≽0 𝑃 : size 𝑛×𝑛 (𝑛: # of nodes in the graph) Typical laptop cannot minimize a degree-4 polynomial in more than a couple dozen variables. (PC: 3.4 GHz, 16 Gb RAM)
LP and SOCP-based inner approximations (1/2) Problem SDPs do not scale well Idea Replace psd cone by LP or SOCP representable cones Candidate (Ahmadi, Majumdar) LP: A dd SOCP: 𝐴 sdd A is diagonally dominant (dd) if 𝑎 𝑖𝑖 ≥ 𝑗≠𝑖 |𝑎 𝑖𝑗 | , ∀𝑖. A is scaled diagonally dominant (sdd) if ∃ a diagonal matrix 𝐷>0 s.t. 𝐷𝐴𝐷 is dd. 𝐷 𝐷 𝑛 ⊆𝑆𝐷 𝐷 𝑛 ⊆𝑃𝑆 𝐷 𝑛
LP and SOCP-based inner approximations (2/2) Replacing psd cone by dd/sdd cones: +: fast bounds - : not always as good quality (compared to SDP) Iteratively construct a sequence of improving LP/SOCP-based cones Initialization: Start with the DD/SDD cone Method 1: Cholesky change of basis Method 2: Column Generation Method 3: r-s/dsos hierarchy
Method 1:Cholesky change of basis (1/7) dd in the “right basis” Goal: iteratively improve on basis 𝑧 𝑥 . psd but not dd
Method 1:Cholesky change of basis (2/7) ? 𝑝 𝑥 =𝑧 𝑥 𝑇 𝑄𝑧 𝑥 𝑄≽0 𝑝 𝑥 = 𝑧 𝑥 𝑇 𝑄 𝑧 𝑥 𝑄 diagonal or dd Cholesky decomposition 𝑄=𝐿 𝐿 𝑇 ⇒𝑝 𝑥 = 𝐿 𝑇 𝑧 𝑥 𝑇 𝐼( 𝐿 𝑇 𝑧 𝑥 ) LDL decomposition 𝑄=𝐿𝐷 𝐿 𝑇 ⇒𝑝 𝑥 = 𝐿 𝑇 𝑧 𝑥 𝑇 𝐷( 𝐿 𝑇 𝑧 𝑥 ) Eigenvalue decomposition 𝑄= Ω 𝑇 ΔΩ⇒𝑝 𝑥 = Ω𝑧 𝑥 𝑇 Δ(Ω𝑧 𝑥 ) Gives 𝑄 =𝐼 Computationally efficient …
Method 1: Cholesky change of basis (3/7) SDP to solve min 𝑇𝑟(𝐶𝑋 ) s.t. 𝑇𝑟 𝐴 𝑖 𝑋 = 𝑏 𝑖 , 𝑋≽0 Initialize Solve: min 𝑇𝑟(𝐶𝑋 ) s.t. 𝑇𝑟 𝐴 𝑖 𝑋 = 𝑏 𝑖 , 𝑋 dd/sdd Step 1 Compute: 𝑈 𝑘 =𝑐ℎ𝑜𝑙( 𝑋 ∗ ) Step 2 Solve: min 𝑇𝑟(𝐶𝑋 ) s.t. 𝑇𝑟 𝐴 𝑖 𝑋 = 𝑏 𝑖 , 𝑋= 𝑈 𝑘 𝑇 𝑄 𝑈 𝑘 , 𝑄 dd/sdd 𝒌≔𝒌+𝟏 One iteration of this method on 𝐼+𝑥𝐴+𝑦𝐵, 𝐴,𝐵 fixed 10×10, generated randomly
Method 1: Cholesky change of basis (4/7) Example 1: minimizing a form over the sphere Theorem: The 1st iteration of Cholesky is always feasible for this problem. Proof: max 𝛾 𝑠.𝑡. 𝑝 𝑥 −𝛾 𝑖 𝑥 𝑖 2 𝑑 dsos First iteration of the method Need to show: is feasible. min 𝑝(𝑥) 𝑠.𝑡. 𝑖 𝑥 𝑖 2 =1 Initial problem max 𝛾 𝑠.𝑡. 𝑝 𝑥 −𝛾 𝑖 𝑥 𝑖 2 𝑑 sos SOS relaxation Idea: 𝑫𝑺𝑶 𝑺 𝒏,𝟐𝒅 ∃𝛼>0 s.t. 𝛼𝑠 𝑥 + 1−𝛼 𝑝 (𝑥) is dsos ⇒∃𝛼>0 s.t. 𝛼 1−𝛼 𝑠 𝑥 + 𝑝 (𝑥) is dsos 𝑠 𝑥 ≔ 𝑖 𝑥 𝑖 2 𝑑
Method 1: Cholesky change of basis (5/7) Example 1: minimizing a degree-4 polynomial in 4 variables (cont’d)
Method 1: Cholesky change of basis (6/7) Example 2: SDP relaxation for computing the stability number Theorem: The 1st iteration of Cholesky is always feasible for this problem. Proof: min 𝜆 𝑠.𝑡. 𝜆(𝐼+𝐴)−𝐽=𝑃+𝑁 𝑃≽0, 𝑁≥0 Initial problem min 𝜆 𝑠.𝑡. 𝜆(𝐼+𝐴)−𝐽=𝑃+𝑁 𝑃 𝑑𝑑, 𝑁≥0 1st iteration of the method Need to show: is feasible. = 𝜆−1 if 𝑖,𝑗 ∈𝐸 (𝑎𝑡 𝑚𝑜𝑠𝑡 𝑑 𝑖 ) −1 if 𝑖,𝑗 ∉𝐸 (𝑎𝑡 𝑚𝑜𝑠𝑡 𝑛− 𝑑 𝑖 ) 𝜆(𝐼+𝐴)−𝐽= 𝜆−1 ∗ ∗ ∗ ⋱ ∗ ∗ ∗ 𝜆−1 = 𝜆−1 ∗ ∗ ∗ ⋱ ∗ ∗ ∗ 𝜆−1 + 0 ∗ ∗ ∗ 0 ∗ ∗ ∗ 0 = 0 if 𝑖,𝑗 ∈𝐸 −1 if 𝑖,𝑗 ∉𝐸 , 𝜆=𝑛− min 𝑖 𝑑 𝑖 +1 = 𝜆−1 if 𝑖,𝑗 ∈𝐸 0 if 𝑖,𝑗 ∉𝐸 𝑷 𝑵
Method 1: Cholesky change of basis (7/7) Example 2: stability number of the Petersen graph 100 instances of an Erdös-Rényi graph with 𝑛,𝑝 =(20,0.5)
Method 2: Column generation (1/6) Facet description 𝑃= 𝑥 𝐴𝑥≤𝑏}, where 𝐴= 𝑎 1 𝑇 ⋮ 𝑎 𝑚 𝑇 , 𝑏∈ ℝ 𝑚 𝑃 𝑎 𝑖 𝑎 𝑗 Corner description 𝑃={ 𝑖 𝛼 𝑖 𝑦 𝑖 | 𝛼 𝑖 ≥0 , 𝑖 𝛼 𝑖 =1} 𝑦 𝑖 𝑦 𝑗 Polytope 𝑃 DD cone SDD cone Facet description 𝐷 𝐷 𝑛 = 𝑀 𝑀 𝑖𝑖 ≥ 𝑗≠𝑖 |𝑀 𝑖𝑗 |, ∀𝑖 } 𝑆𝐷 𝐷 𝑛 = 𝑀 ∃ diagonal 𝐷>0 s.t. 𝐷𝑀𝐷 dd} Extreme ray description 𝑣 𝑖 has at most two nonzero components =±1 𝑉 𝐷𝐷 :={ 𝑣 𝑖 } 𝑉 𝑖 is all zeros except for one component in each row=1 𝑉 𝑆𝐷𝐷 :={ 𝑉 𝑖 } 𝐷 𝐷 𝑛 = 𝑖 𝑣 𝑖 𝑣 𝑖 T | 𝛼 𝑖 ≥0 + 𝛼 𝑖 𝑆𝐷 𝐷 𝑛 = 𝑖 + 𝑉 𝑖 Λ 𝑖 𝑉 𝑖 𝑇 | Λ 𝑖 ≽0
Method 2: Column Generation (2/6) Main idea: add new atoms to the 𝐷 𝐷 𝑛 /𝑆𝐷 𝐷 𝑛 cones. Initial SDP max 𝑦∈ ℝ 𝑚 𝑏 𝑇 𝑦 𝑠.𝑡. 𝐶− 𝑖=1 𝑚 𝑦 𝑖 𝐴 𝑖 ≽0 Inner approximate with 𝐒𝑫 𝑫 𝒏 max 𝑦∈ ℝ 𝑚 𝑏 𝑇 𝑦 𝑠.𝑡. 𝐶− 𝑖=1 𝑚 𝑦 𝑖 𝐴 𝑖 = 𝑽 𝒊 𝚲 𝒊 𝑽 𝒊 𝑻 𝚲 𝒊 ≽𝟎, 𝑽 𝒊 ∈ 𝑽 𝑺𝑫𝑫 Inner approximate with 𝐷 𝐷 𝑛 max 𝑦∈ ℝ 𝑚 𝑏 𝑇 𝑦 𝑠.𝑡. 𝐶− 𝑖=1 𝑚 𝑦 𝑖 𝐴 𝑖 = 𝛼 𝑖 𝑣 𝑖 𝑣 𝑖 𝑇 𝛼 𝑖 ≥0, 𝑣 𝑖 ∈ 𝑉 𝐷𝐷 Take the dual min 𝑋∈ 𝑆 𝑛 𝑡𝑟(𝐶𝑋) 𝑠.𝑡. 𝑡𝑟 𝐴 𝑖 𝑋 = 𝑏 𝑖 𝑣 𝑖 𝑇 𝑋 𝑣 𝑖 ≥0 Take the dual min 𝑋∈ 𝑆 𝑛 𝑡𝑟(𝐶𝑋) 𝑠.𝑡. 𝑡𝑟 𝐴 𝑖 𝑋 = 𝑏 𝑖 𝑽 𝒊 𝑻 𝑿 𝑽 𝒊 ≽𝟎 Find a new “atom” Pick 𝑉∈ ℝ 𝑛×2 s.t. 𝑽 𝑻 𝑿𝑽 𝟎. Find a new “atom” Pick 𝑣 s.t. 𝑣 𝑇 𝑋𝑣<0. Add 𝑣 to primal DD inner approx. max 𝑦∈ ℝ 𝑚 𝑏 𝑇 𝑦 𝑠.𝑡. 𝐶− 𝑖=1 𝑚 𝑦 𝑖 𝐴 𝑖 = 𝛼 𝑖 𝑣 𝑖 𝑣 𝑖 𝑇 +𝜶𝒗 𝒗 𝑻 𝛼 𝑖 ≥0, 𝑣 𝑖 ∈ 𝑉 𝐷𝐷 , 𝜶≥𝟎 Add 𝑉 to primal SDD inner approx. max 𝑦∈ ℝ 𝑚 𝑏 𝑇 𝑦 𝑠.𝑡. 𝐶− 𝑖=1 𝑚 𝑦 𝑖 𝐴 𝑖 = 𝑽 𝒊 𝚲 𝒊 𝑽 𝒊 𝑻 +𝑽𝚲 𝐕 𝐓 𝚲 𝒊 ≽𝟎, 𝑽 𝒊 ∈ 𝑽 𝑺𝑫𝑫 , 𝚲≽𝟎 𝑷𝑺𝑫=𝑷𝑺 𝑫 ∗ 𝑫𝑫+1CG 𝑫 𝑫 ∗ 𝑫𝑫
Method 2: Column Generation (3/6) Suggestion 1: Take the eigenvector corresponding to the most negative eigenvalue Idea: Add an atom 𝑣 that satisfies 𝑣 𝑇 𝑋𝑣<0. How to pick such a 𝒗? Suggestion 2: Take 𝑣 with three nonzero elements corresponding to ±1. …
Method 2: Column Generation (4/6) Five iterations of column generation for 𝐼+𝑥𝐴+𝑦𝐵, 𝐴,𝐵 fixed 10×10, generated randomly 𝑦 𝑥 Column generation method: adds atoms to the initial cones Cholesky change of basis method: “linearly transforms” existing atoms 𝑣 𝑖 → 𝑈 𝑇 𝑣 𝑖 , where 𝑈=𝑐ℎ𝑜𝑙( 𝑋 ∗ )
Method 2: Column Generation (5/6) Example 1: stability number of the Petersen graph Petersen Graph Upper bound on the stability number through SDP: min 𝜆 s.t. 𝜆 𝐼+𝐴 −𝐽−N≽0,𝑁≥0 Apply Column Generation technique to this SDP
Method 2: Column Generation (6/6) Example 2: minimizing a degree-4 polynomial 𝒏=𝟏𝟓 𝒏=𝟐𝟎 𝒏=𝟐𝟓 𝒏=𝟑𝟎 𝒏=𝟒𝟎 bd t(s) DSOS -10.96 0.38 -18.01 0.74 -26.45 15.51 -36.52 7.88 -62.30 10.68 SDSOS -10.43 0.53 -17.33 1.06 -25.79 8.72 -36.04 5.65 -61.25 18.66 𝐷𝑆𝑂 𝑆 𝑘 -5.57 31.19 -9.02 471.39 -20.08 600 -32.28 -35.14 SOS -3.26 5.60 -3.58 82.22 -3.71 1068.66 NA 𝒏=𝟏𝟓 𝒏=𝟐𝟎 𝒏=𝟐𝟓 𝒏=𝟑𝟎 𝒏=𝟒𝟎 bd t(s) 𝐷𝑆𝑂 𝑆 𝑘 -6.20 10.96 -12.38 70.70 -20.08 508.63 NA 1SDSOS -8.97 14.39 -15.29 82.30 -23.14 538.54
Method 3: r-s/dsos hierarchy (1/3) A polynomial 𝑝 is r-dsos if 𝑝 𝑥 𝑖 𝑥 𝑖 2 𝑟 is dsos. A polynomial 𝑝 is r-sdsos if 𝑝 𝑥 𝑖 𝑥 𝑖 2 𝑟 is sdsos. Defines a hierarchy based on 𝑟. Proof of positivity using LP. Theorem [Ahmadi, Majumdar] Any even positive definite form p is r-dsos for some r. Proof: Follows from a result by Polya.
Method 3: r-s/dsos hierarchy (2/3) Example: certifying stability of a switched linear system 𝑥 𝑘+1 = 𝐴 𝜎 𝑘 𝑥 𝑘 where 𝐴 𝜎 𝑘 ∈{ 𝐴 1 ,…, 𝐴 𝑚 } Recall: Theorem 2 [Parrilo, Jadbabaie]: 𝜌 𝐴 1 ,…, 𝐴 𝑚 <1 ⇔ ∃ a pd polynomial Lyapunov function 𝑉(𝑥) such that 𝑉 𝑥 −𝑉 𝐴 𝑖 𝑥 >0, ∀𝑥≠0. Theorem 1: A switched linear system is stable if and only if 𝜌 𝐴 1 …, 𝐴 𝑚 <1.
Method 3: r-s/dsos hierarchy (3/3) Theorem: For nonnegative A 1 ,…, A m , 𝜌 𝐴 1 ,…, 𝐴 𝑚 <1⇔ ∃ 𝑟∈ ℕ and a polynomial Lyapunov function 𝑉(𝑥) such that 𝑉 𝑥 . 2 r-dsos and 𝑉 𝑥 . 2 −𝑉 𝐴 𝑖 𝑥 . 2 r-dsos. Proof: ⇐ ⋆ ⇒ 𝑉 𝑥 ≥0 and 𝑉 𝑥 −𝑉 𝐴 𝑖 𝑥 ≥0 for any 𝑥≥0. Combined to 𝐴 𝑖 ≥0, this implies that 𝑥 𝑘+1 = 𝐴 𝜎(𝑘) 𝑥 𝑘 is stable for 𝑥 0 ≥0. This can be extended to any 𝑥 0 by noting that 𝑥 0 = 𝑥 0 + − 𝑥 0 − , 𝑥 0 + , 𝑥 0 − ≥0. (⇒) From Theorem 2, and using Polya’s result as 𝑉(𝑥 . 2 ) and 𝑉 𝑥 . 2 −𝑉 𝐴 𝑖 𝑥 . 2 are even forms. (⋆)
Cholesky change of basis Main messages Can construct iterative inner approximations of the cone of nonnegative polynomials using LPs and SOCPs. Presented three methods: Cholesky change of basis Column Generation r-s/dsos hierarchies Initialization Initialize with dsos/sdsos polynomials (or dd/sdd matrices) Method Rotate existing “atoms” of the cone of dsos/sdsos polynomials Add new atoms to the extreme rays of the cone of dsos/sdsos polynomials Use multipliers to certify nonnegativity of more polynomials. Size of the LP/SOCPs obtained Does not grow (but possibly denser) Grows slowly Grows quickly Objective taken into consideration Yes No Can beat the SOS bound
Thank you for listening Questions? Want to learn more? http://scholar.princeton.edu/ghall/home