Published by Roman Domański. Modified over 6 years ago.
1
Nonnegative polynomials and applications to learning
Georgina Hall (Princeton, ORFE). Joint work with Amir Ali Ahmadi (Princeton, ORFE) and Mihaela Curmei (ex-Princeton, ORFE).
2
Nonnegative polynomials
A polynomial p(x) ∈ ℝ[x₁, …, xₙ] is nonnegative if p(x) ≥ 0 for all x ∈ ℝⁿ. Example: p(x) = x⁴ − 5x² − x + 10. Is this polynomial nonnegative?
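As a quick numerical sanity check (not the certificate machinery discussed later in the talk): this p is univariate with even degree and positive leading coefficient, so its global minimum is attained at a real critical point, and evaluating p there settles the question. A minimal sketch with numpy:

```python
import numpy as np

# p(x) = x^4 - 5x^2 - x + 10, coefficients in order of decreasing degree.
p = np.array([1.0, 0.0, -5.0, -1.0, 10.0])

# Even degree, positive leading coefficient: the global minimum is attained
# where p'(x) = 0, so evaluate p at all real critical points.
crit = np.roots(np.polyder(p))
real_crit = crit[np.abs(crit.imag) < 1e-9].real
min_val = np.polyval(p, real_crit).min()
print(min_val > 0)  # True: the minimum value is about 2.14, so p >= 0
```

So the answer for this example is yes; deciding nonnegativity in general, especially in the multivariate case, is the hard problem the talk addresses.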
3
Optimizing over nonnegative polynomials
Interested in more than checking nonnegativity of a given polynomial. Problems of the type:

min_p C(p) s.t. Ap = b, p(x) ≥ 0 ∀x

where the decision variables are the coefficients of the polynomial p, the objective C is linear and the constraints Ap = b are affine in those coefficients (e.g., sum of coefficients = 1), and p(x) ≥ 0 ∀x is the nonnegativity condition. Why would we be interested in problems of this type?
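To make the template concrete, here is a hypothetical toy instance in which the intractable constraint "p(x) ≥ 0 for all x" is relaxed to nonnegativity on a finite sample grid, turning the problem into an LP (the grid, the objective, and the affine constraint below are all made up for illustration; the SOS machinery later in the talk replaces this crude relaxation):

```python
import numpy as np
from scipy.optimize import linprog

# Variables: coefficients (c0, c1, c2) of p(x) = c0 + c1*x + c2*x^2.
# Objective: minimize p(-1) = c0 - c1 + c2  (linear in the coefficients).
# Affine constraint: sum of coefficients = 1, i.e., p(1) = 1.
# Relaxed nonnegativity: p(t) >= 0 at sample points t in [-2, 2].
t = np.linspace(-2.0, 2.0, 41)
V = np.vander(t, 3, increasing=True)        # rows are [1, t, t^2]

c = np.array([1.0, -1.0, 1.0])              # p(-1) as a linear objective
res = linprog(c, A_ub=-V, b_ub=np.zeros(len(t)),
              A_eq=np.ones((1, 3)), b_eq=[1.0],
              bounds=[(None, None)] * 3)
print(res.status, abs(res.fun) < 1e-8)      # 0 True: optimal value is 0
```

The optimum 0 is attained, e.g., by p(x) = (1 + x)²/4, which is nonnegative, sums to 1, and vanishes at x = −1.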
4
1. Shape-constrained regression
Impose, e.g., monotonicity or convexity on the regressor. Example: price of a car with respect to age. How does this relate to optimizing over nonnegative polynomials? Monotonicity of a polynomial regressor over a range amounts to nonnegativity of its partial derivatives over that range. Convexity of a polynomial regressor amounts to H(x) ⪰ 0 ∀x, i.e., yᵀH(x)y ≥ 0 ∀(x, y), where H is the Hessian of the regressor.
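The Hessian condition can at least be spot-checked numerically. A sketch (the bivariate polynomial is a made-up example, and a sampled check is only a necessary condition; certifying yᵀH(x)y ≥ 0 for all x is exactly what the SOS tools later in the talk provide):

```python
import numpy as np

# Candidate convex regressor f(x1, x2) = x1^4 + x2^4 + 3*x1^2*x2^2.
def hessian(x1, x2):
    # Second partial derivatives of f, computed by hand.
    return np.array([[12 * x1**2 + 6 * x2**2, 12 * x1 * x2],
                     [12 * x1 * x2, 12 * x2**2 + 6 * x1**2]])

pts = np.linspace(-1.0, 1.0, 21)
min_eig = min(np.linalg.eigvalsh(hessian(a, b)).min()
              for a in pts for b in pts)
print(min_eig >= 0)  # True: the Hessian is PSD at every sampled point
```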
5
2. Difference of Convex (DC) programming
Problems of the form min f₀(x) s.t. fᵢ(x) ≤ 0, where fᵢ(x) ≔ gᵢ(x) − hᵢ(x) with gᵢ, hᵢ convex. (Recall: f convex ⟺ yᵀH_f(x)y ≥ 0 ∀x, y.) ML applications: sparse PCA, kernel selection, feature selection in SVM. Well studied for quadratics (Hiriart-Urruty, 1985; Tuy, 1995); for polynomials, a nice question to study computationally.
6
Outline of the rest of the talk
Very brief introduction to sum of squares Revisit shape-constrained regression Revisit difference of convex programming
7
Sum of squares polynomials
Is this polynomial nonnegative? NP-hard to decide for degree ≥ 4. What if p can be written as a sum of squares (sos), i.e., p = Σᵢ qᵢ² for some polynomials qᵢ? This is a sufficient condition for nonnegativity, and one can optimize over the set of sos polynomials using SDP.
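In the univariate case, nonnegativity and sos coincide, and an explicit two-square decomposition can be read off the complex roots: a positive monic p factors as |q(x)|² with q collecting one root from each conjugate pair, so p = q₁² + q₂². A numerical sketch for the running example (not a formal certificate; the multivariate case needs the SDP-based Gram-matrix approach):

```python
import numpy as np

p = np.array([1.0, 0.0, -5.0, -1.0, 10.0])  # x^4 - 5x^2 - x + 10, monic

# p > 0 has no real roots; take one root from each conjugate pair.
roots = np.roots(p)
q = np.poly(roots[roots.imag > 0])          # complex-coefficient quadratic
q1, q2 = q.real, q.imag                     # q = q1 + i*q2 with q1, q2 real

# On the real line, p(x) = |q(x)|^2 = q1(x)^2 + q2(x)^2.
xs = np.linspace(-3.0, 3.0, 7)
print(np.allclose(np.polyval(q1, xs)**2 + np.polyval(q2, xs)**2,
                  np.polyval(p, xs)))       # True
```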
8
Revisiting Monotone Regression
[Ahmadi, Curmei, GH, 2017]
9
Monotone regression: problem definition
N data points (xᵢ, yᵢ) with xᵢ ∈ ℝⁿ, yᵢ ∈ ℝ: noisy measurements of a monotone function, yᵢ = f(xᵢ) + εᵢ. Feature domain: box B ⊆ ℝⁿ. Monotonicity profile ρ: for j = 1, …, n, ρⱼ = 1 if f is monotonically increasing w.r.t. xⱼ, ρⱼ = −1 if f is monotonically decreasing w.r.t. xⱼ, and ρⱼ = 0 if there is no monotonicity requirement on f w.r.t. xⱼ. Goal: fit a polynomial to the data that has monotonicity profile ρ over B. Can this be done computationally? How good is this approximation?
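A cheap stand-in for the problem, with monotonicity enforced only on a finite grid rather than via SOS certificates (all data and parameters below are synthetic and made up; the talk's method replaces the grid constraints with the SDP-representable conditions on the next slide):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Synthetic "price vs. age" data: decreasing trend plus noise (rho = -1).
age = rng.uniform(0.0, 1.0, 60)
price = 20 * np.exp(-3 * age) + rng.normal(0.0, 0.5, 60)

deg = 4
V = np.vander(age, deg + 1)                        # columns [t^4, ..., t, 1]
grid = np.linspace(0.0, 1.0, 50)
Dv = np.vander(grid, deg) * np.arange(deg, 0, -1)  # rows evaluate p'(t)

# Least squares fit subject to p'(t) <= 0 at every grid point.
res = minimize(lambda c: np.sum((V @ c - price) ** 2),
               np.zeros(deg + 1), method="SLSQP",
               constraints={"type": "ineq",
                            "fun": lambda c: -(Dv @ c[:-1])})
print(res.success, np.all(Dv @ res.x[:-1] <= 1e-6))
```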
10
NP-hardness and SOS relaxation
Theorem: Given a cubic polynomial p, a box B, and a monotonicity profile ρ, it is NP-hard to test whether p has profile ρ over B.

SOS relaxation of the condition ∂p(x)/∂xⱼ ≥ 0 ∀x ∈ B, where B = [b₁⁻, b₁⁺] × … × [bₙ⁻, bₙ⁺]:

If p has odd degree: ∂p(x)/∂xⱼ = σ₀(x) + Σᵢ σᵢ(x)(bᵢ⁺ − xᵢ)(xᵢ − bᵢ⁻), where σᵢ, i = 0, …, n, are sos polynomials.

If p has even degree: ∂p(x)/∂xⱼ = σ₀(x) + Σᵢ σᵢ(x)(bᵢ⁺ − xᵢ) + Σᵢ τᵢ(x)(xᵢ − bᵢ⁻), where the σᵢ and τᵢ are sos polynomials.
11
Approximation theorem
Theorem: For any ε > 0 and any C¹ function f with monotonicity profile ρ, there exists a polynomial p with the same profile ρ such that max_{x∈B} |f(x) − p(x)| < ε. Moreover, one can certify its monotonicity profile using SOS. The proof uses results from approximation theory and Putinar's Positivstellensatz.
12
Numerical experiments (1/2)
[Plots: fitted regressors in a low-noise environment and a high-noise environment.]
13
Numerical experiments (2/2)
[Plots: low-noise and high-noise environments, n = 4, d = 7.]
14
Revisiting difference of convex programming
[Ahmadi, GH*, 2016] * Winner of the 2016 INFORMS Computing Society Best Student Paper Award
15
Difference of Convex (dc) decomposition
Interested in problems of the form min f₀(x) s.t. fᵢ(x) ≤ 0, where fᵢ(x) ≔ gᵢ(x) − hᵢ(x) with gᵢ, hᵢ convex. This leads to the difference of convex (dc) decomposition problem: given a polynomial f, find g and h such that f = g − h, where g and h are convex polynomials. Does such a decomposition always exist? Can it be efficiently computed? Is it unique?
16
Existence of dc decomposition (1/3)
Recall sos-convexity: f(x) is convex ⟺ yᵀH_f(x)y ≥ 0 ∀x, y ∈ ℝⁿ (H_f the Hessian of f), and f is sos-convex if the polynomial yᵀH_f(x)y is a sum of squares, a sufficient condition for convexity that can be checked with SDP.

Theorem: Any polynomial can be written as the difference of two sos-convex polynomials.

Corollary: Any polynomial can be written as the difference of two convex polynomials.
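For intuition, here is the corollary in action on the running univariate example (the particular shift h(x) = 5x² is chosen by hand for this f; it is not the general construction):

```python
import numpy as np

# f(x) = x^4 - 5x^2 - x + 10 is not convex: f''(x) = 12x^2 - 10 < 0 near 0.
f = np.array([1.0, 0.0, -5.0, -1.0, 10.0])
h = np.array([5.0, 0.0, 0.0])                 # h(x) = 5x^2, convex
g = f + np.concatenate([np.zeros(2), h])      # g = f + h = x^4 - x + 10

xs = np.linspace(-5.0, 5.0, 101)
g_convex = np.all(np.polyval(np.polyder(g, 2), xs) >= 0)  # g'' = 12x^2
h_convex = np.all(np.polyval(np.polyder(h, 2), xs) >= 0)  # h'' = 10
print(g_convex and h_convex)  # True: f = g - h with g, h convex
```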
17
Existence of dc decomposition (2/3)
Lemma: Let K be a full-dimensional cone in a vector space E. Then any v ∈ E can be written as v = k₁ − k₂ with k₁, k₂ ∈ K.

Proof sketch: pick k in the interior of K. Then there exists α < 1 such that k′ ≔ (1 − α)v + αk ∈ K. Hence v = (1/(1 − α)) k′ − (α/(1 − α)) k, a difference of two elements of K (as a cone, K contains positive multiples of k′ and k).
18
Existence of dc decomposition (3/3)
Here, E = {polynomials of degree 2d in n variables} and K = {sos-convex polynomials of degree 2d in n variables}. It remains to show that K is full dimensional: the polynomial Σᵢ xᵢ²ᵈ can be shown to be in the interior of K. This also shows that a decomposition can be obtained efficiently: solving f = g − h with g, h sos-convex is an SDP. In fact, we show that a decomposition can also be found via LP and SOCP (not covered here).
19
Uniqueness of dc decomposition
Dc decomposition: given a polynomial f, find convex polynomials g and h such that f = g − h.

Questions: Does such a decomposition always exist? ✓ Yes. Can I obtain such a decomposition efficiently? ✓ Through sos-convexity. Is this decomposition unique? No: from an initial decomposition f(x) = g(x) − h(x), any convex p gives the alternative decomposition f(x) = (g(x) + p(x)) − (h(x) + p(x)). Which is the "best" decomposition?
20
Convex-Concave Procedure (CCP)
A heuristic for dc programming problems min f₀(x) s.t. fᵢ(x) ≤ 0, i = 1, …, m, with fᵢ = gᵢ − hᵢ (gᵢ, hᵢ convex), i = 0, …, m.

Idea: Input an initial point x₀ and set k ← 0. Convexify by linearizing hᵢ around xₖ: fᵢᵏ(x) ≔ gᵢ(x) − (hᵢ(xₖ) + ∇hᵢ(xₖ)ᵀ(x − xₖ)), which is convex (gᵢ convex minus an affine function). Solve the convex subproblem: take xₖ₊₁ to be a solution of min f₀ᵏ(x) s.t. fᵢᵏ(x) ≤ 0, i = 1, …, m. Set k ← k + 1 and repeat.
21
Convex-Concave Procedure (CCP)
Toy example: min_x f(x), where f(x) ≔ g(x) − h(x). Initial point: x₀. Convexify f around x₀ to obtain f⁰(x); minimize f⁰(x) and obtain x₁. Reiterate: the iterates x₀, x₁, x₂, x₃, x₄, … approach a local minimum x*.
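The iteration can be reproduced in a few lines on a concrete dc objective (g, h, and the starting point are chosen here for illustration; the slide's own toy function is not specified): take f(x) = x⁴ − 5x² with g(x) = x⁴ and h(x) = 5x². Linearizing h at xₖ, the convex subproblem min_x x⁴ − 10xₖx has the closed-form minimizer 4x³ = 10xₖ.

```python
import numpy as np

x = 2.0                        # initial point x0
for _ in range(50):
    # Convexified subproblem: minimize x^4 - (h(xk) + 10*xk*(x - xk));
    # setting the derivative to zero gives 4x^3 = 10*xk.
    x = np.cbrt(10 * x / 4)
print(round(x, 6))             # 1.581139 = sqrt(2.5), a local min of f
```

The iterates converge to x* = √2.5, which is indeed a minimizer of x⁴ − 5x².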
22
Picking the βbestβ decomposition for CCP
Algorithm: linearize hᵢ around a point xₖ to obtain the convexified version of fᵢ. Idea: pick hᵢ so that it is as close as possible to affine around xₖ. Mathematical translation: minimize the curvature of h.
23
Undominated decompositions (1/2)
Definition: f = g − h is an undominated decomposition of f if no other decomposition of f can be obtained by subtracting a (nonaffine) convex function from g: one cannot subtract something convex from g and get something convex again. For example, if g(x) = x⁴ + q(x) and h(x) = h′(x) + q(x) for some nonaffine convex quadratic q, then (g, h) is dominated by (g′, h′) = (x⁴, h′): convexifying around x₀ = 2 gives a different, pointwise tighter subproblem for (g′, h′). If g′ dominates g, then the next iterate in CCP obtained using g′ always beats the one obtained using g.
24
Undominated decompositions (2/2)
Theorem: Given a polynomial f, consider

(★) min_{g,h} (1/Aₙ) ∫_{S^{n−1}} Tr H_h(x) dσ(x) s.t. f = g − h, g convex, h convex,

where Aₙ = 2π^{n/2}/Γ(n/2) is the area of the unit sphere S^{n−1} and H_h is the Hessian of h. Any optimal solution is an undominated dcd of f (and an optimal solution always exists).

Theorem: If f has degree 4, it is NP-hard to solve (★).

Idea: Replace "f = g − h, g, h convex" by "f = g − h, g, h sos-convex".
25
Comparing different decompositions (1/2)
Solve the problem min_{x ∈ B} f₀(x), B = {x : ‖x‖ ≤ R}, where f₀ has n = 8 and d = 4. Decompose f₀, run CCP for 4 minutes, and compare objective values. Two decompositions are compared:

Feasibility: min_{g,h} 0 s.t. f₀ = g − h, g, h sos-convex.

Undominated: min_{g,h} (1/Aₙ) ∫_{S^{n−1}} Tr H_h dσ s.t. f₀ = g − h, g, h sos-convex.
26
Comparing different decompositions (2/2)
Average over 30 instances. Solver: Mosek. Computer: 8 GB RAM, 2.40 GHz processor. [Table comparing the feasibility and undominated decompositions omitted.] Conclusion: the performance of CCP is strongly affected by the initial decomposition.
27
Main messages: Optimization over nonnegative polynomials has many applications, and powerful SDP/SOS-based relaxations are available. Two particular applications here: monotone regression and difference of convex programming. Future directions: recent algorithmic developments to improve scalability of SDP; using dc programming for sparse regression, min_x ‖Ax − b‖² + λ‖x‖₀.
28
Thank you for listening
Questions? Want to learn more?
29
Imposing monotonicity
Example: For what values of a and b is the following polynomial monotone? p(x) = x⁴ + a x³ + b x² − (a + b) x.

Theorem: A polynomial p(x) of degree 2d is monotone (nondecreasing) on [0, 1] if and only if p′(x) = x σ₁(x) + (1 − x) σ₂(x), where σ₁(x) and σ₂(x) are sos polynomials of degree 2d − 2. Search for such sos polynomials using SDP!
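A numerical stand-in for the certificate (it checks p′ ≥ 0 on [0, 1] directly by examining the endpoints and the interior critical points of p′, rather than searching for σ₁, σ₂ by SDP):

```python
import numpy as np

def monotone_on_01(p, tol=1e-9):
    """Check p' >= 0 on [0, 1]: the minimum of p' over the interval is
    attained at 0, at 1, or at a real root of p'' inside (0, 1)."""
    dp = np.polyder(p)
    pts = [0.0, 1.0]
    if len(dp) > 1:
        pts += [r.real for r in np.roots(np.polyder(dp))
                if abs(r.imag) < 1e-9 and 0 < r.real < 1]
    return min(np.polyval(dp, t) for t in pts) >= -tol

print(monotone_on_01(np.array([1.0, 0.0, 0.0, 0.0, 0.0])))   # x^4: True
print(monotone_on_01(np.array([1.0, 0.0, -3.0, 0.0, 0.0])))  # x^4 - 3x^2: False
```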
30
1. Polynomial optimization
The polynomial optimization problem min_x p(x) s.t. qᵢ(x) ≤ 0, rᵢ(x) = 0 can be rewritten as max_γ γ s.t. p(x) − γ ≥ 0 ∀x ∈ {x : qᵢ(x) ≤ 0, rᵢ(x) = 0}, i.e., an optimization problem over polynomials nonnegative on a given set.

ML applications: low-rank matrix completion; training deep nets with polynomial activation functions; nonnegative matrix factorization; dictionary learning; sparse recovery with nonconvex regularizers.
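The equivalence can be sanity-checked on the unconstrained univariate example from earlier in the talk: the largest γ with p − γ nonnegative is exactly the minimum value of p.

```python
import numpy as np

p = np.array([1.0, 0.0, -5.0, -1.0, 10.0])   # x^4 - 5x^2 - x + 10

# gamma* = min_x p(x), computed from the real critical points.
crit = np.roots(np.polyder(p))
gamma = np.polyval(p, crit[np.abs(crit.imag) < 1e-9].real).min()

# p - gamma* is nonnegative and its infimum is zero.
shifted = p.copy()
shifted[-1] -= gamma
m = np.polyval(shifted, np.linspace(-3.0, 3.0, 601)).min()
print(0 <= m < 1e-2)  # True: p - gamma* grazes zero near the minimizer
```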