Convex functions Lecture 4 Dr. Zvi Lotker
In the last lecture: operations that preserve convexity; Carathéodory's theorem; Radon's theorem; Helly's theorem; the separating hyperplane theorem; the isolation theorem.
Carathéodory's theorem: If a point $x \in \mathbb{R}^d$ lies in the convex hull of a set $P$, there is a subset $P'$ of $P$ consisting of no more than $d+1$ points such that $x$ lies in the convex hull of $P'$. [Figure: the four points (0,1), (1,1), (0,0), (1,0).]
Carathéodory's theorem, proof: Let $x \in \mathrm{conv}(P)$. Then $x$ is a convex combination of points of $P$, i.e. $x = \lambda_1 x_1 + \dots + \lambda_k x_k$, where every $x_j \in P$, every $\lambda_j \ge 0$ and $\lambda_1 + \dots + \lambda_k = 1$. Suppose $k > d+1$. Then the $k-1 > d$ vectors $x_2 - x_1, \dots, x_k - x_1$ are linearly dependent, so there are real scalars $\mu_2, \dots, \mu_k$, not all zero, such that $\mu_2 (x_2 - x_1) + \dots + \mu_k (x_k - x_1) = 0$.
Carathéodory's theorem, proof (continued): Let $\mu_1 := -(\mu_2 + \dots + \mu_k)$. Then $\mu_1 + \mu_2 + \dots + \mu_k = 0$ and $\mu_1 x_1 + \mu_2 x_2 + \dots + \mu_k x_k = 0$, and not all of the $\mu_j$ are equal to zero. Since they sum to zero, at least one $\mu_j > 0$.
Carathéodory's theorem, proof (continued): For every real $\alpha$, $x = \lambda_1 x_1 + \dots + \lambda_k x_k - \alpha(\mu_1 x_1 + \dots + \mu_k x_k) = \sum_i (\lambda_i - \alpha \mu_i) x_i$. Define $\alpha := \min\{\lambda_i / \mu_i : \mu_i > 0\}$. Then $\lambda_i - \alpha \mu_i \ge 0$ for all $i$, and $\lambda_i - \alpha \mu_i = 0$ for some $i$.
Carathéodory's theorem, proof (continued): What is $(\lambda_1 - \alpha \mu_1) + \dots + (\lambda_k - \alpha \mu_k)$? It equals $1 - \alpha \cdot 0 = 1$, so $x$ is a convex combination of at most $k-1$ of the points. Repeating the argument while $k > d+1$ proves the theorem.
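The reduction step in the proof can be sketched in code. This is a minimal illustration, not part of the lecture: the helper names `null_vector` and `caratheodory` are hypothetical, and exact rational arithmetic (`fractions.Fraction`) is an assumption made to avoid rounding issues.

```python
from fractions import Fraction


def null_vector(rows):
    """Return a nonzero rational vector v with rows @ v = 0, or None if none exists."""
    m = [[Fraction(a) for a in row] for row in rows]
    ncols = len(m[0])
    pivots = []  # pivots[r] is the pivot column of reduced row r
    for c in range(ncols):
        r = len(pivots)
        if r == len(m):
            break
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # column c is a free column
        m[r], m[p] = m[p], m[r]
        m[r] = [a / m[r][c] for a in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
    free = [c for c in range(ncols) if c not in pivots]
    if not free:
        return None
    v = [Fraction(0)] * ncols
    v[free[0]] = Fraction(1)
    for r, c in enumerate(pivots):
        v[c] = -m[r][free[0]]
    return v


def caratheodory(points, weights):
    """Rewrite sum(weights[j] * points[j]) as a convex combination of at most d+1 points."""
    pts = [tuple(Fraction(a) for a in p) for p in points]
    lam = [Fraction(w) for w in weights]
    d = len(pts[0])
    while True:
        # Drop points whose coefficient has become zero.
        kept = [(p, l) for p, l in zip(pts, lam) if l != 0]
        pts, lam = [p for p, _ in kept], [l for _, l in kept]
        if len(pts) <= d + 1:
            return pts, lam
        # mu solves sum_j mu_j x_j = 0 and sum_j mu_j = 0
        # (d+1 equations, more than d+1 unknowns, so a nonzero solution exists).
        rows = [[p[c] for p in pts] for c in range(d)] + [[1] * len(pts)]
        mu = null_vector(rows)
        # alpha = min{lambda_i / mu_i : mu_i > 0}, as in the proof.
        alpha = min(l / m for l, m in zip(lam, mu) if m > 0)
        lam = [l - alpha * m for l, m in zip(lam, mu)]
```

For example, `caratheodory([(0, 0), (1, 0), (1, 1), (0, 1)], [Fraction(1, 4)] * 4)` rewrites the centre of the square using at most three of its corners. Each pass zeroes at least one coefficient, so the loop terminates.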
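Convex hull of a compact set: Theorem. If $S \subset \mathbb{R}^d$ is a compact set, then $\mathrm{conv}(S)$ is a compact set. Proof. Let $\Delta = \{(a_1, \dots, a_{d+1}) : a_i \ge 0,\ a_1 + \dots + a_{d+1} = 1\}$ be the standard simplex in $\mathbb{R}^{d+1}$. $\Delta$ is compact, and $S^{d+1}$ is compact, where $S^j = \{(x_1, \dots, x_j) : x_i \in S\}$. Consider the map $\Phi : S^{d+1} \times \Delta \to \mathbb{R}^d$, $\Phi(u_1, \dots, u_{d+1}; a_1, \dots, a_{d+1}) = a_1 u_1 + \dots + a_{d+1} u_{d+1}$. Carathéodory's theorem implies that the image of $\Phi$ is $\mathrm{conv}(S)$. Since $\Phi$ is continuous and its domain is compact, the image of $\Phi$ is compact.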
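Questions: Is it necessary to have $d+1$ points? Show that $\mathrm{conv}(S) = \mathrm{conv}(\mathrm{conv}(S))$. Show that $A \subset B \Rightarrow \mathrm{conv}(A) \subset \mathrm{conv}(B)$. Is the set $\{t x + (1-t) y : x, y \in P,\ 0 < t < 1\}$ convex?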
Radon's theorem (1887-1956): Any set of $d+2$ points in $\mathbb{R}^d$ can be partitioned into two disjoint sets whose convex hulls intersect.
Radon's theorem (1887-1956): Theorem. Let $S \subset \mathbb{R}^d$ be a set containing at least $d+2$ points. Then there are two non-intersecting subsets $R, B \subset S$ such that $\mathrm{conv}(R) \cap \mathrm{conv}(B) \neq \emptyset$. Proof. Suppose $X = \{x_1, x_2, \dots, x_{d+2}\} \subset \mathbb{R}^d$. Since any set of $d+2$ points in $\mathbb{R}^d$ is affinely dependent, there exist multipliers $a_1, \dots, a_{d+2}$, not all of them 0, such that $a_1 x_1 + \dots + a_{d+2} x_{d+2} = 0$ and $a_1 + \dots + a_{d+2} = 0$.
Radon's theorem, proof (continued): Let $I = \{i : a_i > 0\}$, $J = \{i : a_i < 0\}$, $X_1 = \{x_i : a_i > 0\}$, $X_2 = \{x_i : a_i < 0\}$. Then $\sum_{i \in I} a_i x_i = \sum_{j \in J} (-a_j) x_j$ and $\sum_{i \in I} a_i = \sum_{j \in J} (-a_j) > 0$, so $z = \left(\sum_{i \in I} a_i x_i\right) / \left(\sum_{i \in I} a_i\right) \in \mathrm{conv}(X_1) \cap \mathrm{conv}(X_2)$.
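A small worked instance of the construction in the proof, for $d = 2$. The four points and the multipliers are chosen here for illustration and are not taken from the lecture.

```latex
\begin{aligned}
&x_1=(0,0),\quad x_2=(1,0),\quad x_3=(1,1),\quad x_4=(0,1),\qquad (a_1,a_2,a_3,a_4)=(1,-1,1,-1),\\
&a_1x_1+a_2x_2+a_3x_3+a_4x_4=(0,0),\qquad a_1+a_2+a_3+a_4=0,\\
&I=\{1,3\},\quad J=\{2,4\},\quad X_1=\{x_1,x_3\},\quad X_2=\{x_2,x_4\},\\
&z=\frac{a_1x_1+a_3x_3}{a_1+a_3}=\tfrac12(x_1+x_3)=\left(\tfrac12,\tfrac12\right)=\tfrac12(x_2+x_4)\in\operatorname{conv}(X_1)\cap\operatorname{conv}(X_2).
\end{aligned}
```

Here the two diagonals of the square are the convex hulls of $X_1$ and $X_2$, and $z$ is their crossing point.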
Helly's theorem (1884-1943): Suppose $A_1, \dots, A_m \subset \mathbb{R}^d$ is a family of convex sets, $m \ge d+1$, and every $d+1$ of them have a non-empty intersection. Then $\bigcap_{i=1}^m A_i$ is non-empty.
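Proof of Helly's theorem: The proof is by induction on $m$. If $m = d+1$, then the statement is true. Suppose $m > d+1$ and the statement is true for every family of $m-1$ sets.
Proof of Helly's theorem (continued): The sets $B_j = \bigcap_{i \neq j} A_i$ are non-empty by the inductive hypothesis. Pick a point $p_i$ from each $B_i$, giving $\{p_1, \dots, p_m\}$ with $m \ge d+2$. By Radon's theorem, there is a partition of the $p$'s into two sets $P_1, P_2$ such that $X = \mathrm{conv}(P_1) \cap \mathrm{conv}(P_2) \neq \emptyset$. Let $I_1 = \{i : p_i \in P_1\}$, $I_2 = \{i : p_i \in P_2\}$. Let $x \in X$. We claim that $x \in \bigcap_i A_i$.
Proof of Helly's theorem (continued): Note that for all $j \neq i$, $p_j \in A_i$. Consider $i \in \{1, 2, \dots, m\}$. Then $i \in I_1$ or $i \in I_2$. Assume that $i \in I_1$, so $i \notin I_2$. Then every $p_j \in P_2$ has $j \neq i$, hence $P_2 \subset A_i$, and since $A_i$ is convex, $x \in \mathrm{conv}(P_2) \subset A_i$. The case $i \in I_2$ is symmetric. Therefore $x \in \bigcap_i A_i$.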
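Helly's theorem is easy to check by hand in dimension $d = 1$, where convex sets are intervals and the hypothesis says that every two of them meet. The sketch below is an illustration only: it assumes closed bounded intervals given as `(lo, hi)` pairs, and the function names are hypothetical.

```python
from itertools import combinations


def common_point(intervals):
    """Return a point lying in every closed interval (lo, hi), or None if there is none."""
    lo = max(a for a, _ in intervals)
    hi = min(b for _, b in intervals)
    return lo if lo <= hi else None


def every_pair_meets(intervals):
    """Helly's hypothesis for d = 1: every d + 1 = 2 of the intervals intersect."""
    return all(common_point(pair) is not None for pair in combinations(intervals, 2))


family = [(0, 5), (2, 7), (4, 9), (3, 4)]
if every_pair_meets(family):
    # Helly's theorem guarantees that this is not None.
    print(common_point(family))
```

The largest left endpoint and the smallest right endpoint belong to some pair of intervals, which is why pairwise intersection already forces a common point on the line.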
Separating hyperplane theorem: If $C$ and $D$ are disjoint convex sets, then there exist $a \neq 0$ and $b$ such that $a^T x \le b$ for $x \in C$ and $a^T x \ge b$ for $x \in D$. Strict separation requires additional assumptions (e.g., $C$ is closed and $D$ is a singleton).
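The isolation theorem: Let $A \subset \mathbb{R}^d$ be an open convex set and let $u \notin A$ be a point in $\mathbb{R}^d$. Then there exists an affine hyperplane $H$ which contains $u$ and strictly isolates $A$, i.e. $A$ lies in one of the open half-spaces bounded by $H$. Proof. We can assume $u = 0$.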
Summary of theorems: Carathéodory's theorem: for $P \subset \mathbb{R}^d$ and every $x \in \mathrm{conv}(P)$, there exists a subset $P' \subset P$ consisting of no more than $d+1$ points such that $x \in \mathrm{conv}(P')$. Radon's theorem: let $S \subset \mathbb{R}^d$ be a set containing at least $d+2$ points; then there are two non-intersecting subsets $R, B \subset S$ such that $\mathrm{conv}(R) \cap \mathrm{conv}(B) \neq \emptyset$. Helly's theorem: suppose $A_1, \dots, A_m \subset \mathbb{R}^d$ is a family of convex sets and every $d+1$ of them have a non-empty intersection; then $\bigcap_i A_i$ is non-empty.
Summary of theorems: Separating hyperplane theorem: if $C$ and $D$ are compact disjoint convex sets, then there exist $a \neq 0$ and $b$ such that $a^T x \le b$ for $x \in C$ and $a^T x \ge b$ for $x \in D$. The isolation theorem: let $A \subset \mathbb{R}^d$ be an open convex set and let $u \notin A$ be a point in $\mathbb{R}^d$; then there exists an affine hyperplane $H$ which contains $u$ and strictly isolates $A$. We can prove the separating hyperplane theorem from the isolation theorem by defining $A = C - D$.
How this is connected to optimization: if we can check whether an intersection of convex sets is non-empty, we can search for an optimum. Example
Outline of the lecture: convex functions; examples; first-order condition; second-order conditions; Jensen's inequality; operations that preserve convexity.
Convex function: A real-valued function $f$ defined on an interval (or on any convex subset $C$ of some vector space) is called convex if for any two points $x$ and $y$ in its domain $C$ and any $t \in [0,1]$ we have $f(tx + (1-t)y) \le t f(x) + (1-t) f(y)$. $f$ is concave if $-f$ is convex.
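Examples: convex functions on $\mathbb{R}$. Affine: $ax + b$ on $\mathbb{R}$, for any $a, b \in \mathbb{R}$. Exponential: $e^{ax}$, for any $a \in \mathbb{R}$. Powers: $x^a$ on $\mathbb{R}_{++}$, for $a \ge 1$ or $a < 0$. Powers of absolute value: $|x|^p$ on $\mathbb{R}$, for $p \ge 1$. Negative entropy: $x \log x$ on $\mathbb{R}_{++}$.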
Examples: concave functions on $\mathbb{R}$. Affine: $ax + b$ on $\mathbb{R}$, for any $a, b \in \mathbb{R}$. Powers: $x^a$ on $\mathbb{R}_{++}$, for $0 \le a \le 1$. Logarithm: $\log x$ on $\mathbb{R}_{++}$.
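Examples on $\mathbb{R}^n$ and $\mathbb{R}^{m \times n}$. Affine function: $f(x) = a^T x + b$. Norms: $\|x\|$. Max: $f(x) = \max\{x_1, \dots, x_n\}$. Affine function on matrices: $f(X) = \mathrm{tr}(A^T X) + b = \sum_{i} \sum_{j} A_{ij} X_{ij} + b$. Spectral (maximum singular value) norm: $f(X) = \|X\|_2 = \sigma_{\max}(X) = \left(\lambda_{\max}(X^T X)\right)^{1/2}$.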
Max: $f(x) = \max\{x_1, \dots, x_n\}$ is convex: $f(tx + (1-t)y) = \max_i \{t x_i + (1-t) y_i\} \le t \max_i \{x_i\} + (1-t) \max_i \{y_i\} = t f(x) + (1-t) f(y)$ for $0 \le t \le 1$.
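Extended-value extension: The extended-value extension $\tilde f$ of $f$ is $\tilde f(x) = f(x)$ for all $x \in \mathrm{dom} f$ and $\tilde f(x) = \infty$ for all $x \notin \mathrm{dom} f$. It often simplifies the notation. For example, the condition $\tilde f(tx + (1-t)y) \le t \tilde f(x) + (1-t) \tilde f(y)$ for all $t \in [0,1]$, read as an inequality in $\mathbb{R} \cup \{\infty\}$, says both that $\mathrm{dom} f$ is convex and that $f$ satisfies the convexity inequality on $\mathrm{dom} f$.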
Properties of convex functions: A convex function defined on an open convex subset of $\mathbb{R}^n$ is continuous.
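First-order condition: $f$ is differentiable if $\mathrm{dom} f$ is open and the gradient $\nabla f(x)$ exists at each $x \in \mathrm{dom} f$. 1st-order condition: a differentiable $f$ with convex domain is convex iff $f(y) \ge f(x) + \nabla f(x)^T (y - x)$ for all $x, y \in \mathrm{dom} f$. [Figure: graph of $f$ above its tangent $f(x) + \nabla f(x)^T (y - x)$ at the point $(x, f(x))$.]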
First-order condition, proof for $d = 1$ (convexity implies the inequality): Assume $f$ is convex and $x, y \in \mathrm{dom} f$. For $0 < t \le 1$ we have $x + t(y - x) \in \mathrm{dom} f$ and $(1-t) f(x) + t f(y) \ge f(x + t(y - x))$. So $f(y) \ge f(x) + \frac{f(x + t(y - x)) - f(x)}{t}$, and letting $t \to 0$ gives $f(y) \ge f(x) + f'(x)(y - x)$.
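First-order condition, proof for $d = 1$ (the inequality implies convexity): Assume $f(y) \ge f(x) + f'(x)(y - x)$ for all $x, y \in \mathrm{dom} f$. Let $z = tx + (1-t)y$ with $0 \le t \le 1$. Then $f(y) \ge f(z) + f'(z)(y - z)$ and $f(x) \ge f(z) + f'(z)(x - z)$. Multiplying the first inequality by $(1-t)$ and the second by $t$ gives $(1-t) f(y) \ge (1-t) f(z) + (1-t) f'(z)(y - z)$ and $t f(x) \ge t f(z) + t f'(z)(x - z)$. Adding them, the derivative terms cancel because $tx + (1-t)y - z = 0$, so $t f(x) + (1-t) f(y) \ge f(z) = f(tx + (1-t)y)$.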
First-order condition Now we prove for the general d f:RdR, y,xRd, Consider f to be the line passing through x,y. g(t)=f(ty+(1-t)x) g’(t)=f(ty+(1-t)x)t(y-x), If f is convex then g is convex and we can use d=1 on g g(1)≥g(0)+g’(0)f(y)≥f(x)+f(x)t(y-x)
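First-order condition, proof for general $d$ (the inequality implies convexity): Now assume that $f(y) \ge f(x) + \nabla f(x)^T (y - x)$ for all $x, y \in \mathrm{dom} f$. Let $x, y \in \mathrm{dom} f$ and $t, s \in [0,1]$. Applying the assumption at the points $ty + (1-t)x$ and $sy + (1-s)x$ gives $f(ty + (1-t)x) \ge f(sy + (1-s)x) + \nabla f(sy + (1-s)x)^T (y - x)(t - s)$, i.e. $g(t) \ge g(s) + g'(s)(t - s)$ for $g(t) = f(ty + (1-t)x)$. By the case $d = 1$, $g$ is convex, and since this holds on every segment, $f$ is convex.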
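The first-order condition can be probed numerically by evaluating the gap $f(y) - f(x) - \nabla f(x)^T (y - x)$. The sketch below is illustrative only: the function names are hypothetical, and the test function $f(x) = \|x\|_2^2$ with gradient $2x$ is chosen here as an assumption, for which the gap equals $\|y - x\|_2^2$.

```python
def first_order_gap(f, grad, x, y):
    """Return f(y) - f(x) - grad(x) . (y - x).

    For a differentiable f with convex domain, this is nonnegative
    for all x, y exactly when f is convex.
    """
    slope = sum(g * (yi - xi) for g, xi, yi in zip(grad(x), x, y))
    return f(y) - f(x) - slope


def sq_norm(x):
    """f(x) = ||x||_2^2, a convex function."""
    return sum(xi * xi for xi in x)


def sq_norm_grad(x):
    """Gradient of ||x||_2^2, namely 2x."""
    return [2 * xi for xi in x]


print(first_order_gap(sq_norm, sq_norm_grad, [1, 2], [4, -2]))
```

A negative gap at any pair of points certifies that a function is not convex; nonnegative gaps on a finite sample are only supporting evidence.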
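Restriction of a convex function to a line: $f : \mathbb{R}^n \to \mathbb{R}$ is convex if and only if the function $g : \mathbb{R} \to \mathbb{R}$, $g(t) = f(x + tv)$, $\mathrm{dom}\, g = \{t : x + tv \in \mathrm{dom} f\}$, is convex (in $t$) for any $x \in \mathrm{dom} f$ and $v \in \mathbb{R}^n$. You can check convexity of $f$ by checking convexity of functions of one variable.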
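A minimal sketch of the restriction $g(t) = f(x + tv)$, with a sampled midpoint test on $g$. The function names, the tolerance `1e-12` and the sample grid are assumptions made for illustration; a sampled test can refute convexity but cannot prove it.

```python
def restrict_to_line(f, x, v):
    """Return g(t) = f(x + t v), the restriction of f to the line through x with direction v."""
    return lambda t: f([xi + t * vi for xi, vi in zip(x, v)])


def violates_convexity(g, ts):
    """Return True if some sampled pair s, t has g((s+t)/2) > (g(s) + g(t))/2."""
    return any(g((s + t) / 2) > (g(s) + g(t)) / 2 + 1e-12 for s in ts for t in ts)


ts = [-2, -1, 0, 1, 2]
g_max = restrict_to_line(max, [0, 1], [1, -1])                       # max{x1, x2} is convex
g_prod = restrict_to_line(lambda x: x[0] * x[1], [0, 0], [1, -1])    # x1 * x2 is not convex
print(violates_convexity(g_max, ts), violates_convexity(g_prod, ts))
```

Along the direction $v = (1, -1)$ the product $x_1 x_2$ restricts to $g(t) = -t^2$, which is why a single well-chosen line is enough to rule out convexity.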
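Second-order conditions: $f$ is twice differentiable if $\mathrm{dom} f$ is open and the Hessian $\nabla^2 f(x)$, a symmetric matrix with entries $\partial^2 f(x) / \partial x_i \partial x_j$, exists at each $x \in \mathrm{dom} f$. 2nd-order condition: a twice differentiable $f$ with convex domain is convex if and only if $\nabla^2 f(x) \succeq 0$ for all $x \in \mathrm{dom} f$.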
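Example: $f(x) = x \log x$ has $f'(x) = \log x + 1$ and $f''(x) = 1/x > 0$ for all $x > 0$, so it is convex. Norm: if $f : \mathbb{R}^n \to \mathbb{R}$ is a norm and $0 \le t \le 1$, then by the triangle inequality and homogeneity $f(tx + (1-t)y) \le f(tx) + f((1-t)y) = t f(x) + (1-t) f(y)$.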
Example: quadratic function $f(x) = \tfrac12 x^T P x + q^T x + r$ (with $P \in S^n$): $\nabla f(x) = Px + q$, $\nabla^2 f(x) = P$, so $f$ is convex iff $P \succeq 0$. Least-squares objective: $f(x) = \|Ax - b\|_2^2$, $\nabla f(x) = 2A^T (Ax - b)$, $\nabla^2 f(x) = 2A^T A$, so $f$ is convex for all $A$.
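The claim that the least-squares Hessian $2A^T A$ is positive semidefinite for every $A$ rests on the identity $v^T (2A^T A) v = 2\|Av\|_2^2 \ge 0$. A small sketch checking this identity on a concrete matrix follows; the helper names and the matrix are assumptions for illustration.

```python
def least_squares_hessian(A):
    """Return the n x n matrix 2 A^T A for an m x n matrix A given as a list of rows."""
    n = len(A[0])
    return [[2 * sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]


def quadratic_form(H, v):
    """Return v^T H v."""
    n = len(v)
    return sum(v[i] * H[i][j] * v[j] for i in range(n) for j in range(n))


def twice_sq_norm_Av(A, v):
    """Return 2 * ||A v||_2^2."""
    return 2 * sum(sum(a * vj for a, vj in zip(row, v)) ** 2 for row in A)


A = [[1, 2], [3, 4], [5, 6]]
H = least_squares_hessian(A)
v = [1, -1]
print(quadratic_form(H, v), twice_sq_norm_Av(A, v))
```

When $A$ has a nontrivial null space the quadratic form vanishes in some direction, so the objective is convex but not strictly convex.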
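Example: $f : S^n \to \mathbb{R}$ with $f(X) = \log\det X$, $\mathrm{dom} f = S^n_{++}$. For $V \in S^n$, $g(t) = \log\det(X + tV) = \log\det\left(X^{1/2}(I + t X^{-1/2} V X^{-1/2}) X^{1/2}\right) = \log\det X + \log\det(I + t X^{-1/2} V X^{-1/2}) = \log\det X + \sum_i \log(1 + t\lambda_i)$, where $\lambda_i$ are the eigenvalues of $X^{-1/2} V X^{-1/2}$. Therefore $g'(t) = \sum_i \lambda_i / (1 + t\lambda_i)$ and $g''(t) = -\sum_i \lambda_i^2 / (1 + t\lambda_i)^2 \le 0$, so $g$ is concave for every line and $f$ is concave.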
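Geometric mean is concave: $f(x) = \left(\prod_i x_i\right)^{1/n}$ on $\mathbb{R}^n_{++}$. The Hessian entries are $\nabla^2 f_{kk} = -(n-1)\left(\prod_i x_i\right)^{1/n} / (n^2 x_k^2)$ and $\nabla^2 f_{kl} = \left(\prod_i x_i\right)^{1/n} / (n^2 x_k x_l)$ for $k \neq l$, so $\nabla^2 f = -\frac{\left(\prod_i x_i\right)^{1/n}}{n^2}\left(n\, \mathrm{diag}(x_1^{-2}, \dots, x_n^{-2}) - q q^T\right)$, where $q_i = 1/x_i$. We show that $\nabla^2 f \preceq 0$: for every $v$, $v^T \nabla^2 f\, v = -\frac{\left(\prod_i x_i\right)^{1/n}}{n^2}\left(n \sum_i v_i^2 / x_i^2 - \left(\sum_i v_i / x_i\right)^2\right) \le 0$. This follows from the Cauchy-Schwarz inequality $(a^T a)(b^T b) \ge (a^T b)^2$, applied with $a = \mathbf{1}$ and $b_i = v_i / x_i$.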
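Epigraph and sublevel set: The $\alpha$-sublevel set of $f : \mathbb{R}^n \to \mathbb{R}$ is $C_\alpha = \{x \in \mathrm{dom} f : f(x) \le \alpha\}$. Sublevel sets of convex functions are convex (the converse is false). The epigraph of $f : \mathbb{R}^n \to \mathbb{R}$ is $\mathrm{epi} f = \{(x, t) \in \mathbb{R}^{n+1} : x \in \mathrm{dom} f,\ f(x) \le t\}$. $f$ is convex if and only if $\mathrm{epi} f$ is a convex set.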
Jensen's inequality: Basic inequality: if $f$ is convex, then for $0 \le t \le 1$, $f(tx + (1-t)y) \le t f(x) + (1-t) f(y)$. It can be extended to convex combinations of more than two points: for $0 \le t_i \le 1$ with $\sum_i t_i = 1$, $f\left(\sum_i t_i x_i\right) \le \sum_i t_i f(x_i)$.
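Jensen's inequality, proof by induction: Claim: for $0 \le t_i \le 1$ with $\sum_i t_i = 1$, $f\left(\sum_i t_i x_i\right) \le \sum_i t_i f(x_i)$. Assume the claim is true for $n$ points and consider $n+1$ points with $t_1 < 1$. Then $f\left(\sum_{i=1}^{n+1} t_i x_i\right) = f\left(t_1 x_1 + (1 - t_1) \sum_{i=2}^{n+1} \frac{t_i}{1 - t_1} x_i\right) \le t_1 f(x_1) + (1 - t_1) f\left(\sum_{i=2}^{n+1} \frac{t_i}{1 - t_1} x_i\right)$. The weights $t_i / (1 - t_1)$ are nonnegative and sum to 1, so we can use the induction hypothesis on the last term. Another way to write Jensen's inequality is $f(E[x]) \le E[f(x)]$.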
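The finite form of Jensen's inequality can be checked numerically for scalar points. The sketch below is an illustration; `jensen_gap` is a hypothetical helper name and the sample weights and points are assumptions.

```python
import math


def jensen_gap(f, weights, points):
    """Return sum_i t_i f(x_i) - f(sum_i t_i x_i), which is nonnegative when f is convex."""
    if any(t < 0 for t in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be nonnegative and sum to 1")
    mean = sum(t * x for t, x in zip(weights, points))
    return sum(t * f(x) for t, x in zip(weights, points)) - f(mean)


weights = [0.2, 0.3, 0.5]
points = [-1.0, 0.0, 2.0]
print(jensen_gap(math.exp, weights, points))                 # convex f: gap >= 0
print(jensen_gap(math.log, [0.5, 0.5], [1.0, 4.0]))           # concave f: gap <= 0
```

Reading the weights as probabilities, the gap is $E[f(x)] - f(E[x])$, which matches the expectation form of the inequality.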
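Example: $(ab)^{1/2} \le (a + b)/2$. For $a, b > 0$, we look at the function $-\log x$. This function is convex, so $-\log\left((a + b)/2\right) \le (-\log a - \log b)/2$. Negating and taking the exponential of both sides yields $(ab)^{1/2} \le (a + b)/2$. The inequality is trivial when $a = 0$ or $b = 0$.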
Information theory: If $p(x)$ is the true probability distribution for $x$, and $q(x)$ is another distribution, then applying Jensen's inequality to the random variable $Y(x) = q(x)/p(x)$ and the function $\varphi(y) = -\log y$ gives $E[\varphi(Y)] \ge \varphi(E[Y])$, that is, $\int p(x) \log\frac{p(x)}{q(x)}\, dx \ge -\log \int p(x) \frac{q(x)}{p(x)}\, dx = -\log \int q(x)\, dx = 0$. Therefore $\int p(x) \log p(x)\, dx \ge \int p(x) \log q(x)\, dx$.
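A discrete analogue of this inequality can be sketched as follows. The lecture states it with integrals; replacing them by finite sums, and the name `kl_divergence`, are assumptions made here for illustration.

```python
import math


def kl_divergence(p, q):
    """Return sum_x p(x) log(p(x)/q(x)) for finite distributions p and q.

    Assumes q(x) > 0 wherever p(x) > 0; terms with p(x) = 0 contribute 0.
    """
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)


p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))  # nonnegative by Jensen's inequality
print(kl_divergence(p, p))  # zero when the two distributions coincide
```

The quantity is not symmetric in `p` and `q`, so the order of the arguments matters.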
Operations that preserve convexity: Practical methods for establishing convexity of a function: (1) check the definition (often simplified by restricting to a line); (2) for twice differentiable functions, show $\nabla^2 f(x) \succeq 0$ for all $x \in \mathrm{dom} f$; (3) show that $f$ is obtained from simple convex functions by operations that preserve convexity: nonnegative weighted sum, composition with an affine function, pointwise maximum and supremum, composition, minimization, perspective.