Difference of Convex (DC) Decomposition of Nonconvex Polynomials with Algebraic Techniques
Georgina Hall, Princeton ORFE
Joint work with Amir Ali Ahmadi
MOPTA 2015, 7/13/2015

Difference of Convex (DC) programming
Problems of the form
$$\min_x \; f_0(x) \quad \text{s.t.} \quad f_i(x) \le 0, \;\; i = 1,\dots,m,$$
where $f_i(x) := g_i(x) - h_i(x)$, $i = 0,\dots,m$, and $g_i:\mathbb{R}^n \to \mathbb{R}$, $h_i:\mathbb{R}^n \to \mathbb{R}$ are convex.

Concave-Convex Computational Procedure (CCCP)
A heuristic for minimizing DC programming problems. It has been used extensively in:
- machine learning (sparse support vector machines (SVMs), transductive SVMs, sparse principal component analysis),
- statistical physics (minimizing Bethe and Kikuchi free energies).
Idea. Input: $k := 0$, an initial point $x_0$, and $f_i = g_i - h_i$, $i = 0,\dots,m$.
1. Convexify by linearizing $h_i$ around $x_k$:
$$f_i^k(x) := g_i(x) - \big(h_i(x_k) + \nabla h_i(x_k)^T (x - x_k)\big),$$
which is convex (convex minus affine).
2. Solve the convex subproblem: take $x_{k+1}$ to be a solution of $\min f_0^k(x)$ s.t. $f_i^k(x) \le 0$, $i = 1,\dots,m$.
3. Set $k := k+1$ and go to step 1.
(The surrogate $f_i^k(x)$ is a convex majorant of $f_i(x)$ that agrees with it at $x_k$.)

Concave-Convex Computational Procedure (CCCP): toy example
$\min_x f(x)$, where $f(x) := g(x) - h(x)$.
- Initial point: $x_0 = 2$.
- Convexify $f(x)$ around $x_0$ to obtain $f^0(x)$.
- Minimize $f^0(x)$ and obtain $x_1$.
- Reiterate: $x_0, x_1, x_2, x_3, x_4, \dots \to x_\infty$.
(Figure: successive convexifications and iterates converging to a local minimum $x_\infty$.)
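As an illustration (not from the original slides), here is a minimal sketch of one CCCP run on a toy problem of this kind, using the decomposition $g(x) = x^4$, $h(x) = 3x^2 - 2x + 2$ of the polynomial $f(x) = x^4 - 3x^2 + 2x - 2$ from the next slide, and the initial point $x_0 = 2$:

```python
# Minimal CCCP sketch on f(x) = x^4 - 3x^2 + 2x - 2 with g(x) = x^4, h(x) = 3x^2 - 2x + 2.
# Each iteration linearizes h around the current iterate and minimizes the convex surrogate.
from scipy.optimize import minimize_scalar

g = lambda x: x**4
h = lambda x: 3*x**2 - 2*x + 2
dh = lambda x: 6*x - 2                      # derivative of h

x_k = 2.0                                   # initial point x_0 = 2
for k in range(20):
    # Convex surrogate: g(x) minus the linearization of h at x_k (convex minus affine).
    surrogate = lambda x, x_k=x_k: g(x) - (h(x_k) + dh(x_k) * (x - x_k))
    x_k = minimize_scalar(surrogate).x      # solve the convex subproblem
print(x_k)                                  # approaches 1.0, a stationary point of f
```

From $x_0 = 2$ the iterates decrease monotonically toward the stationary point $x = 1$; as with any CCCP run, this is a local minimum only, not necessarily the global one.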

CCCP for nonconvex polynomial optimization problems (1/2)
CCCP relies on the input functions being given as differences of convex functions. We will consider polynomials in $n$ variables and of degree $d$.
Any polynomial can be written as a difference of convex polynomials:
- proof by Wang, Schwing, and Urtasun;
- an alternative proof is given later in this presentation, as a corollary of a stronger theorem.
What if we don't have access to such a decomposition?

CCCP for nonconvex polynomial optimization problems (2/2)
In fact, any polynomial admits infinitely many decompositions $f(x) = g(x) - h(x)$.
Example: $f(x) = x^4 - 3x^2 + 2x - 2$. Possible decompositions:
- $g(x) = x^4$, $h(x) = 3x^2 - 2x + 2$;
- $g(x) = x^4 + x^2$, $h(x) = 3x^2 + x^2 - 2x + 2$; etc.
Which one would be a natural choice for CCCP?
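A quick sanity check of these two decompositions (mine, not from the slides), using SymPy: each pair satisfies $f = g - h$, and since $f$ is univariate, convexity of $g$ and $h$ follows from their second derivatives being nonnegative.

```python
# Verify that both decompositions on this slide satisfy f = g - h with g, h convex.
import sympy as sp

x = sp.symbols('x')
f = x**4 - 3*x**2 + 2*x - 2
decompositions = [
    (x**4,        3*x**2 - 2*x + 2),
    (x**4 + x**2, 3*x**2 + x**2 - 2*x + 2),
]
for g, h in decompositions:
    assert sp.simplify(g - h - f) == 0               # f = g - h
    print(sp.diff(g, x, 2), "|", sp.diff(h, x, 2))   # 12*x**2 | 6, then 12*x**2 + 2 | 8
    # Both second derivatives are nonnegative for all x, so g and h are convex.
```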

Picking the "best" decomposition (1/2)
Recall the algorithm: linearize $h(x)$ around a point $x_k$ to obtain a convexified version of $f(x)$.
Idea: pick $h(x)$ to be as close as possible to affine.
Mathematical translation: minimize the curvature of $h$ (here $H_h$ denotes the Hessian of $h$).
At a point $a$:
$$\min_{g,h} \; \lambda_{\max}(H_h(a)) \quad \text{s.t.} \quad f = g - h, \;\; g, h \text{ convex}.$$
Over a region $\Omega$:
$$\min_{g,h} \; \max_{x \in \Omega} \lambda_{\max}(H_h(x)) \quad \text{s.t.} \quad f = g - h, \;\; g, h \text{ convex}.$$
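To make the criterion concrete (my illustration, not from the slides): in the univariate toy example two slides back, the Hessian of $h$ is just the scalar $h''$, so the pointwise criterion simply prefers the decomposition whose $h$ has the smaller curvature.

```python
# Curvature lambda_max(H_h(a)) for the two candidate h's of the toy example; in one
# variable this is just h''(a), which here happens to be constant in a.
import sympy as sp

x = sp.symbols('x')
for h in (3*x**2 - 2*x + 2, 3*x**2 + x**2 - 2*x + 2):
    print(sp.diff(h, x, 2))   # 6, then 8: the first decomposition gives the tighter surrogate
```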

Picking the "best" decomposition (2/2)
Theorem: Finding the "best" decomposition of a degree-4 polynomial over a box is NP-hard.
Proof idea: reduction from testing convexity of quartic polynomials, which is NP-hard (Ahmadi, Olshevsky, Parrilo, Tsitsiklis).
The same is likely to hold for the point version, but we have been unable to prove it.
How can we efficiently find a good decomposition instead?

Convex relaxations for DC decompositions (1/6)
SOS, DSOS, and SDSOS polynomials (Ahmadi, Majumdar): families of nonnegative polynomials.

Type | Characterization | Testing membership
Sum of squares (sos) | $\exists$ polynomials $q_i$ s.t. $p(x) = \sum_i q_i^2(x)$ | SDP
Scaled diagonally dominant sum of squares (sdsos) | $p = \sum_i \alpha_i m_i^2 + \sum_{i,j} (\beta_i^+ m_i + \gamma_j^+ m_j)^2 + (\beta_i^- m_i - \gamma_j^- m_j)^2$, with $m_i, m_j$ monomials and $\alpha_i \ge 0$ | SOCP
Diagonally dominant sum of squares (dsos) | $p = \sum_i \alpha_i m_i^2 + \sum_{i,j} \beta_{ij}^+ (m_i + m_j)^2 + \beta_{ij}^- (m_i - m_j)^2$, with $m_i, m_j$ monomials and $\alpha_i, \beta_{ij}^{+,-} \ge 0$ | LP

Each family is contained in the one above it: dsos $\subseteq$ sdsos $\subseteq$ sos.
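As a small aside (an assumption on my part, not stated on the slide): an equivalent characterization is that $p$ is dsos if and only if it admits a diagonally dominant Gram matrix $Q$, i.e. $p(x) = z(x)^T Q z(x)$ for some vector of monomials $z(x)$, which is why membership reduces to finitely many linear inequalities. A minimal numerical check for $p(x_1, x_2) = 2x_1^2 + 2x_1x_2 + 2x_2^2$:

```python
# Certify that p(x1, x2) = 2*x1^2 + 2*x1*x2 + 2*x2^2 is dsos by exhibiting a diagonally
# dominant Gram matrix Q with p = z^T Q z for z = (x1, x2).
import numpy as np

def is_diagonally_dominant(Q):
    """Check q_ii >= sum_{j != i} |q_ij| for every row i."""
    Q = np.asarray(Q, dtype=float)
    off_diag = np.sum(np.abs(Q), axis=1) - np.abs(np.diag(Q))
    return bool(np.all(np.diag(Q) >= off_diag))

Q = np.array([[2.0, 1.0],
              [1.0, 2.0]])          # Gram matrix of p in the monomial basis z = (x1, x2)
print(is_diagonally_dominant(Q))    # True, so p is dsos (and hence nonnegative)
```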

Convex relaxations for DC decompositions (2/6)
DSOS-convex, SDSOS-convex, and SOS-convex polynomials. Definitions (with $H_p$ the Hessian of $p$, and $y^T H_p(x) y$ viewed as a polynomial in $(x, y)$):
- $p$ is dsos-convex if $y^T H_p(x) y$ is dsos (LP);
- $p$ is sdsos-convex if $y^T H_p(x) y$ is sdsos (SOCP);
- $p$ is sos-convex if $y^T H_p(x) y$ is sos (SDP).
Relationship to convexity: $p$ convex $\Leftrightarrow$ $H_p(x) \succeq 0$ for all $x$ $\Leftrightarrow$ $y^T H_p(x) y \ge 0$ for all $x, y \in \mathbb{R}^n$; and $y^T H_p(x) y$ sos/sdsos/dsos $\Rightarrow$ $p$ convex.
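A small worked example (mine, not from the slides): $p(x_1, x_2) = x_1^4 + x_2^4$ is dsos-convex, since $y^T H_p(x) y$ expands into a nonnegative combination of squared monomials, which is dsos of the simplest kind.

```python
# Expand y^T H_p(x) y for p = x1^4 + x2^4; the result, 12*(x1*y1)^2 + 12*(x2*y2)^2, is dsos.
import sympy as sp

x1, x2, y1, y2 = sp.symbols('x1 x2 y1 y2')
p = x1**4 + x2**4
H = sp.hessian(p, (x1, x2))
y = sp.Matrix([y1, y2])
print(sp.expand((y.T * H * y)[0]))   # 12*x1**2*y1**2 + 12*x2**2*y2**2
```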

Convex relaxations for DC decompositions (3/6)
Comparison of these sets on a parametric family of polynomials:
$$p(x_1, x_2) = 2x_1^4 + 2x_2^4 + a\,x_1^3 x_2 + b\,x_1^2 x_2^2 + c\,x_1 x_2^3.$$
(Figure: in the $(a, b)$ plane, for $c = -0.5$, $c = 0$, and $c = 1$, the nested regions of dsos-convex, sdsos-convex, and sos-convex (= convex) polynomials.)

Convex relaxations for DC decompositions (4/6)
How to use these concepts to do a DC decomposition at a point $a$?
Original problem:
$$\min \; \lambda_{\max}(H_h(a)) \;\; \text{s.t.} \;\; f = g - h, \; g, h \text{ convex} \quad\Leftrightarrow\quad \min \; t \;\; \text{s.t.} \;\; H_h(a) \preceq tI, \; f = g - h, \; g, h \text{ convex}.$$
- Relaxation 1 (sos-convex): $\min t$ s.t. $H_h(a) \preceq tI$, $f = g - h$, $g, h$ sos-convex. [SDP]
- Relaxation 2 (sdsos-convex): same, with $g, h$ sdsos-convex. [SOCP + "small" SDP]
- Relaxation 3 (dsos-convex): same, with $g, h$ dsos-convex. [LP + "small" SDP]
- Relaxation 4 (sdsos-convex + sdd): $\min t$ s.t. $tI - H_h(a)$ sdd (**), $f = g - h$, $g, h$ sdsos-convex. [SOCP]
- Relaxation 5 (dsos-convex + dd): $\min t$ s.t. $tI - H_h(a)$ dd (*), $f = g - h$, $g, h$ dsos-convex. [LP]
(*) $Q$ is diagonally dominant (dd) $\Leftrightarrow$ $q_{ii} \ge \sum_{j \ne i} |q_{ij}|$ for all $i$.
(**) $Q$ is scaled diagonally dominant (sdd) $\Leftrightarrow$ $\exists$ a diagonal matrix $D \succ 0$ s.t. $DQD$ is dd.
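To see why the dd constraint in Relaxation 5 only adds linear inequalities (my illustration, with the simplification that the Hessian is a fixed numerical matrix rather than a function of the decision variables $g$ and $h$): for a fixed symmetric $H$, the smallest $t$ such that $tI - H$ is dd is $t^\star = \max_i \big(H_{ii} + \sum_{j \ne i} |H_{ij}|\big)$.

```python
# Smallest t such that t*I - H is diagonally dominant, for a fixed symmetric matrix H.
import numpy as np

def smallest_t_for_dd(H):
    H = np.asarray(H, dtype=float)
    off_diag = np.sum(np.abs(H), axis=1) - np.abs(np.diag(H))
    return float(np.max(np.diag(H) + off_diag))

H = np.array([[ 4.0, -2.0,  1.0],
              [-2.0,  0.0,  3.0],
              [ 1.0,  3.0, -5.0]])
print(smallest_t_for_dd(H))   # 7.0; one can check that 7*I - H is diagonally dominant
```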

Convex relaxations for DC decompositions (5/6)
Can any polynomial be written as the difference of two dsos-/sdsos-/sos-convex polynomials?
Lemma (about cones): Let $K \subseteq E$ be a full-dimensional cone in a vector space $E$. Then any $v \in E$ can be written as $v = k_1 - k_2$ with $k_1, k_2 \in K$.
Proof sketch: Take $k$ in the interior of $K$. For $\alpha < 1$ close enough to $1$, the point $k' := (1-\alpha)v + \alpha k$ lies in $K$. Then
$$v = \frac{1}{1-\alpha}\,k' - \frac{\alpha}{1-\alpha}\,k,$$
and both terms belong to $K$ since $K$ is a cone.

Convex relaxations for DC decompositions (6/6)
Theorem: Any polynomial can be written as the difference of two dsos-convex polynomials.
Corollary: The same holds with sdsos-convex, sos-convex, and convex polynomials.
Proof idea: By the lemma, it suffices to show that the dsos-convex polynomials form a full-dimensional cone, i.e., to exhibit a polynomial in its interior. "Obvious" choices (e.g., $p(x) = (\sum_i x_i^2)^{d/2}$) do not work. Instead, proceed by induction on $n$; for $n = 2$, take
$$p(x_1, x_2) = a_0 x_1^d + a_1 x_1^{d-2} x_2^2 + \dots + a_{d/4}\, x_1^{d/2} x_2^{d/2} + \dots + a_1 x_1^2 x_2^{d-2} + a_0 x_2^d,$$
with $a_1 = 1$, $a_{k+1} = \frac{d-2k}{2k+2}\, a_k$ for $k = 1, \dots, d/4 - 1$, and $a_0 > \frac{2^{d-2}}{d(d-1)} + \frac{d}{4(d-1)}\, a_{d/4}$.

Comparing the different relaxations (1/4)
Impact of the choice of relaxation on solve time and optimal value, for random $f$ with $d = 4$:
$$\min_{t,g,h} \; t \quad \text{s.t.} \quad tI - H_h(a) \;\text{psd/sdd/dd}, \;\; f = g - h, \;\; g, h \;\text{sos-/sdsos-/dsos-convex}.$$

Type of relaxation        | n=6: time (s), opt. value | n=10: time (s), opt. value | n=16: time (s), opt. value
dsos-convex + dd          | 1.05, 17578.54            | 2.79, 21191.55             | 20.80, 168327.89
dsos-convex + psd         | 1.19, 15855.77            | 3.19, 19426.13             | 25.36, 146847.73
sdsos-convex + sdd        | 1.21, 1089.41             | 5.17, 1962.64              | 34.66, 7936.57
sdsos-convex + psd        | —, 1069.79                | 5.29, 1957.03              | 39.43, 7935.72
sos-convex + psd (MOSEK)  | 2.02, 193.07              | 93.74, 317.63              | +∞, —
sos-convex + psd (SeDuMi) | 11.48, 193.06             | 10324.12 (nearly 3 hrs), — | —

Computer: 8 GB RAM, 2.40 GHz processor.

Comparing the different relaxations (2/4)
Iterative decomposition algorithm implemented for unconstrained $f$: at the current point $x_k$, decompose $f = g - h$ using one of the relaxations, then minimize the convexified $f^k$ using an SDP subroutine [Lasserre; de Klerk and Laurent], and repeat.
(Figure: value of the objective after 3 minutes for the algorithm above with the 5 different relaxations; $f$ random with $n = 9$, $d = 4$; average over 25 iterations; solver: MOSEK.)

Comparing the different relaxations (3/4)
Constrained case: $\min_{x \in B} f(x)$, where $B = \{x : \sum_i x_i^2 \le R^2\}$. Three strategies are compared.
- Single decomposition: decompose $f = g - h$ once, at $x_0$, using the relaxation $\min t$ s.t. $H_h(a) \preceq tI$, $f = g - h$, $g, h$ sdsos-convex; then repeatedly minimize the convexified $f^k$.
- Iterative decomposition: re-decompose $f = g - h$ at each point $x_k$, using the same relaxation, and minimize the convexified $f^k$.
- One min-max decomposition: decompose $f = g - h$ once over the whole ball $B$, then minimize the convexified $f^k$. What relaxation to use?
  Original problem: $\min_{g,h} \max_{x \in \Omega} \lambda_{\max}(H_h(x))$ s.t. $f = g - h$, $g, h$ convex.
  Equivalent formulation: $\min_{t,g,h} t$ s.t. $x \in B \Rightarrow tI - H_h(x) \succeq 0$, $f = g - h$, $g, h$ convex.
  First relaxation: $\min_{t,g,h} t$ s.t. $x \in B \Rightarrow tI - H_h(x) \succeq 0$, $f = g - h$, $g, h$ sdsos-convex.
  Second relaxation: $\min_{t,g,h} t$ s.t. $tI - H_h(x) \succeq (R^2 - \sum_i x_i^2)\,\tau(x)$, $y^T \tau(x) y$ sos, $f = g - h$, $g, h$ sdsos-convex.
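A brief justification (mine; the slide states the certificate without spelling this out) of why the second relaxation's constraint implies the constraint of the equivalent formulation:
$$y^T \tau(x) y \;\text{sos} \;\Rightarrow\; \tau(x) \succeq 0 \;\;\forall x, \qquad x \in B \;\Rightarrow\; R^2 - \sum_i x_i^2 \ge 0, \qquad \text{hence} \quad tI - H_h(x) \succeq \Big(R^2 - \sum_i x_i^2\Big)\tau(x) \succeq 0 \;\;\text{for all } x \in B.$$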

Comparing the different relaxations (4/4)
Constrained case: single decomposition vs. iterative decomposition vs. min-max decomposition.
(Figure: value of the objective after 3 minutes for the algorithms described above; $f$ random with $n = 10$, $d = 4$; radius $R$ a random integer between 100 and 400; average over 200 iterations.)

Main messages
- To apply CCCP to polynomial optimization, a DC decomposition is needed, and the choice of decomposition impacts the convergence speed.
- It is not computationally tractable to find the "best" decomposition.
- Efficient convex relaxations can be obtained from the concepts of dsos-convex (LP), sdsos-convex (SOCP), and sos-convex (SDP) polynomials.
- The dsos-convex and sdsos-convex relaxations scale to a larger number of variables.

Thank you for listening. Questions?