1
Convex functions Lecture 4
Dr. Zvi Lotker
2
In the last lecture
Operations that preserve convexity
Carathéodory's theorem
Radon's theorem
Helly's theorem
Separating hyperplane theorem
The isolation theorem
3
Carathéodory's theorem
If a point $x \in \mathbb{R}^d$ lies in the convex hull of a set $P$, there is a subset $P' \subseteq P$ consisting of no more than $d+1$ points such that $x$ lies in the convex hull of $P'$.
[Figure: the unit square with vertices (0,0), (1,0), (1,1), (0,1).]
4
Carathéodory's theorem
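Let $x \in \operatorname{conv}(P)$. Then $x$ is a convex combination of finitely many points of $P$, i.e. $x = \lambda_1 x_1 + \dots + \lambda_k x_k$, where every $x_j \in P$, every $\lambda_j \ge 0$, and $\lambda_1 + \dots + \lambda_k = 1$.
Suppose $k > d+1$. Then the $k-1 > d$ vectors $x_2 - x_1, \dots, x_k - x_1$ are linearly dependent, so there are real scalars $\mu_2, \dots, \mu_k$, not all zero, such that $\mu_2 (x_2 - x_1) + \dots + \mu_k (x_k - x_1) = 0$.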
6
Carathéodory's theorem
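Suppose $k > d+1$. Then $x_2 - x_1, \dots, x_k - x_1$ are linearly dependent, so there are real scalars $\mu_2, \dots, \mu_k$, not all zero, such that $\mu_2 (x_2 - x_1) + \dots + \mu_k (x_k - x_1) = 0$.
Let $\mu_1 := -(\mu_2 + \dots + \mu_k)$. Then $\mu_1 + \mu_2 + \dots + \mu_k = 0$ and $\mu_1 x_1 + \mu_2 x_2 + \dots + \mu_k x_k = 0$, and not all of the $\mu_j$ are equal to zero. Therefore at least one $\mu_j > 0$.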
8
Carathéodory's theorem
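Let $\mu_1 := -(\mu_2 + \dots + \mu_k)$. Then $\mu_1 + \mu_2 + \dots + \mu_k = 0$ and $\mu_1 x_1 + \mu_2 x_2 + \dots + \mu_k x_k = 0$, and not all of the $\mu_j$ are equal to zero. Therefore at least one $\mu_j > 0$.
Then, for any real $\alpha$, $x = \lambda_1 x_1 + \dots + \lambda_k x_k - \alpha(\mu_1 x_1 + \mu_2 x_2 + \dots + \mu_k x_k)$.
Define $\alpha = \min\{\lambda_i / \mu_i : \mu_i > 0\}$. Then for all $i$, $\lambda_i - \alpha\mu_i \ge 0$, and for some $i$, $\lambda_i - \alpha\mu_i = 0$.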
9
Carathéodory's theorem
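Then $x = \lambda_1 x_1 + \dots + \lambda_k x_k - \alpha(\mu_1 x_1 + \mu_2 x_2 + \dots + \mu_k x_k) = (\lambda_1 - \alpha\mu_1) x_1 + \dots + (\lambda_k - \alpha\mu_k) x_k$, where $\alpha = \min\{\lambda_i / \mu_i : \mu_i > 0\}$.
For all $i$, $\lambda_i - \alpha\mu_i \ge 0$, and for some $i$, $\lambda_i - \alpha\mu_i = 0$.
What is $(\lambda_1 - \alpha\mu_1) + \dots + (\lambda_k - \alpha\mu_k)$? Since $\mu_1 + \dots + \mu_k = 0$, the sum is $1$, so $x$ is a convex combination of at most $k-1$ points of $P$. Repeating the argument until $k \le d+1$ proves the theorem.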
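The reduction step in the proof can be run directly. The following is a minimal sketch, not part of the lecture: the helper names `null_vector` and `caratheodory` are made up for illustration, exact rational arithmetic is assumed so that "some coefficient becomes zero" can be tested exactly, and the affine dependence $\mu$ is found by plain Gaussian elimination.

```python
from fractions import Fraction


def null_vector(rows):
    """Return a nonzero v with rows . v = 0 (Gauss-Jordan), or None if none exists."""
    m = [[Fraction(a) for a in row] for row in rows]
    n = len(m[0])
    pivots = []  # pivot column of each reduced row, in order
    r = 0
    for c in range(n):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        m[r] = [a / m[r][c] for a in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    free = [c for c in range(n) if c not in pivots]
    if not free:
        return None
    v = [Fraction(0)] * n
    v[free[0]] = Fraction(1)
    for row, c in zip(m, pivots):
        v[c] = -row[free[0]]
    return v


def caratheodory(points, weights):
    """Rewrite sum_j weights[j] * points[j] using at most d + 1 of the points."""
    pts = [tuple(Fraction(c) for c in p) for p in points]
    lam = [Fraction(w) for w in weights]
    d = len(pts[0])
    keep = [i for i, l in enumerate(lam) if l != 0]
    pts, lam = [pts[i] for i in keep], [lam[i] for i in keep]
    while len(pts) > d + 1:
        # Rows encode sum_j mu_j x_j = 0 (one row per coordinate) and sum_j mu_j = 0.
        rows = [[p[i] for p in pts] for i in range(d)] + [[1] * len(pts)]
        mu = null_vector(rows)
        alpha = min(l / m for l, m in zip(lam, mu) if m > 0)
        lam = [l - alpha * m for l, m in zip(lam, mu)]
        keep = [i for i, l in enumerate(lam) if l != 0]
        pts, lam = [pts[i] for i in keep], [lam[i] for i in keep]
    return pts, lam


# The centre of the unit square, written with all four vertices, then reduced.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
reduced_points, reduced_weights = caratheodory(square, [Fraction(1, 4)] * 4)
```

Each pass through the loop removes at least one point, which is the "for some $i$, $\lambda_i - \alpha\mu_i = 0$" step of the proof.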
10
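Convex hull of a compact set
Theorem: If $S \subset \mathbb{R}^d$ is a compact set, then $\operatorname{conv}(S)$ is a compact set.
Proof: Let $\Delta$ be the standard simplex in $\mathbb{R}^{d+1}$; $\Delta$ is compact. $S^{d+1}$ is compact, where $S^j = \{(x_1, \dots, x_j) : x_i \in S\}$.
Consider the map $\Phi : S^{d+1} \times \Delta \to \mathbb{R}^d$, $\Phi(u_1, \dots, u_{d+1}; a_1, \dots, a_{d+1}) = a_1 u_1 + \dots + a_{d+1} u_{d+1}$.
Carathéodory's theorem implies that the image of $\Phi$ is $\operatorname{conv}(S)$. Since $\Phi$ is continuous and its domain is compact, the image of $\Phi$ is compact.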
11
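Question
Is it necessary to have $d+1$ points?
$\operatorname{conv}(S) = \operatorname{conv}(\operatorname{conv}(S))$
$A \subseteq B \Rightarrow \operatorname{conv}(A) \subseteq \operatorname{conv}(B)$
Is the set $\{t x + (1-t) y : x, y \in P,\ 0 < t < 1\}$ convex?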
12
Radon's theorem (1887, 1956)
Any set of $d+2$ points in $\mathbb{R}^d$ can be partitioned into two (disjoint) sets whose convex hulls intersect.
13
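Radon's theorem (1887, 1956)
Theorem: Let $S \subset \mathbb{R}^d$ be a set containing at least $d+2$ points. Then there are two non-intersecting subsets $R, B \subset S$ such that $\operatorname{conv}(R) \cap \operatorname{conv}(B) \neq \emptyset$.
Proof: Suppose $X = \{x_1, x_2, \dots, x_{d+2}\} \subset \mathbb{R}^d$. Since any set of $d+2$ points in $\mathbb{R}^d$ is affinely dependent, there exist multipliers $a_1, \dots, a_{d+2}$, not all zero, such that $a_1 x_1 + \dots + a_{d+2} x_{d+2} = 0$ and $a_1 + \dots + a_{d+2} = 0$.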
14
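Radon's theorem (1887, 1956)
Theorem: Let $S \subset \mathbb{R}^d$ be a set containing at least $d+2$ points. Then there are two non-intersecting subsets $R, B \subset S$ such that $\operatorname{conv}(R) \cap \operatorname{conv}(B) \neq \emptyset$.
Proof (continued): We have $a_1 x_1 + \dots + a_{d+2} x_{d+2} = 0$ and $a_1 + \dots + a_{d+2} = 0$, with the $a_i$ not all zero.
Let $I = \{i : a_i > 0\}$, $J = \{i : a_i < 0\}$, $X_1 = \{x_i : a_i > 0\}$, $X_2 = \{x_i : a_i < 0\}$. Both $I$ and $J$ are non-empty because the $a_i$ sum to zero.
Then $z = \frac{\sum_{i \in I} a_i x_i}{\sum_{i \in I} a_i} = \frac{\sum_{j \in J} (-a_j) x_j}{\sum_{j \in J} (-a_j)} \in \operatorname{conv}(X_1) \cap \operatorname{conv}(X_2)$.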
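As a small worked instance of this construction, take the four vertices of the unit square in $\mathbb{R}^2$ (so $d + 2 = 4$). The points and multipliers below are chosen here for illustration and are not from the lecture.

```latex
\begin{align*}
& x_1 = (0,0),\quad x_2 = (1,0),\quad x_3 = (0,1),\quad x_4 = (1,1),
  \qquad a = (1,-1,-1,1) \\
& a_1 x_1 + a_2 x_2 + a_3 x_3 + a_4 x_4 = (0,0) - (1,0) - (0,1) + (1,1) = (0,0),
  \qquad a_1 + a_2 + a_3 + a_4 = 0 \\
& I = \{1,4\},\quad J = \{2,3\},
  \qquad z = \tfrac{1}{2}(x_1 + x_4) = \left(\tfrac{1}{2},\tfrac{1}{2}\right) = \tfrac{1}{2}(x_2 + x_3)
\end{align*}
```

The Radon point $z$ is where the two diagonals of the square cross.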
15
Helly's theorem
Suppose $A_1, \dots, A_m \subset \mathbb{R}^d$ is a family of convex sets, and every $d+1$ of them have a non-empty intersection. Then $\bigcap_{i=1}^{m} A_i$ is non-empty.
16
Proof of Helly's theorem
The proof is by induction on $m$. If $m = d+1$, then the statement is true by assumption. Suppose $m > d+1$ and the statement is true for every family of $m-1$ sets.
17
Proof of Helly's theorem
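The sets $B_j = \bigcap_{i \neq j} A_i$ are non-empty by the inductive hypothesis.
Pick a point $p_i$ from each $B_i$, giving $\{p_1, \dots, p_m\}$.
Since $m \ge d+2$, by Radon's lemma there is a partition of the $p$'s into two sets $P_1, P_2$ such that $X = \operatorname{conv}(P_1) \cap \operatorname{conv}(P_2) \neq \emptyset$.
Let $I_1 = \{i : p_i \in P_1\}$ and $I_2 = \{i : p_i \in P_2\}$.
Let $x \in X$. We claim that $x \in \bigcap_i A_i$.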
18
Proof of Helly's theorem
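Note that for all $j \neq i$, $p_j \in A_i$.
Consider $i \in \{1, 2, \dots, m\}$. Then $i \in I_1$ or $i \in I_2$.
Assume that $i \in I_1$, so $i \notin I_2$. Then every point of $P_2$ lies in $A_i$, and since $A_i$ is convex, $x \in \operatorname{conv}(P_2) \subseteq A_i$. The case $i \in I_2$ is symmetric.
Therefore $x \in \bigcap_i A_i$.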
19
Separating hyperplane theorem
If $C$ and $D$ are disjoint convex sets, then there exist $a \neq 0$ and $b$ such that $a^T x \le b$ for $x \in C$ and $a^T x \ge b$ for $x \in D$.
Strict separation requires additional assumptions (e.g., $C$ is closed, $D$ is a singleton).
20
The isolation theorem
Let $A \subset \mathbb{R}^d$ be an open convex set and let $u \notin A$ be a point in $\mathbb{R}^d$. Then there exists an affine hyperplane $H$ which contains $u$ and strictly isolates $A$.
Proof: We can assume $u = 0$.
21
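Summary of theorems
Carathéodory's theorem: For every $x \in \operatorname{conv}(P)$ with $P \subseteq \mathbb{R}^d$, there exists a subset $P' \subseteq P$ consisting of no more than $d+1$ points such that $x \in \operatorname{conv}(P')$.
Radon's theorem: Let $S \subset \mathbb{R}^d$ be a set containing at least $d+2$ points. Then there are two non-intersecting subsets $R, B \subset S$ such that $\operatorname{conv}(R) \cap \operatorname{conv}(B) \neq \emptyset$.
Helly's theorem: Suppose $A_1, \dots, A_m \subset \mathbb{R}^d$ is a family of convex sets, and every $d+1$ of them have a non-empty intersection. Then $\bigcap_{i=1}^{m} A_i$ is non-empty.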
22
Summary of theorems
Separating hyperplane theorem: If $C$ and $D$ are compact disjoint convex sets, then there exist $a \neq 0$ and $b$ such that $a^T x \le b$ for $x \in C$ and $a^T x \ge b$ for $x \in D$.
The isolation theorem: Let $A \subset \mathbb{R}^d$ be an open convex set and let $u \notin A$ be a point in $\mathbb{R}^d$. Then there exists an affine hyperplane $H$ which contains $u$ and strictly isolates $A$.
We can prove the separating hyperplane theorem from the isolation theorem by defining $A = C - D$.
23
How this is connected to optimization
If we can check whether the intersection is non-empty, we can search for the optimum.
[Figure: example.]
25
Outline of the lecture
Convex function
Examples
First-order condition
Second-order conditions
Jensen's inequality
Operations that preserve convexity
26
Convex function
A real-valued function $f$ defined on an interval (or on any convex subset $C$ of some vector space) is called convex if, for any two points $x$ and $y$ in its domain $C$ and any $t \in [0,1]$, we have $f(tx + (1-t)y) \le t f(x) + (1-t) f(y)$.
$f$ is concave if $-f$ is convex.
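The defining inequality can be spot-checked numerically on sample points. This is only a sketch under stated assumptions: the helper name `convexity_violation`, the sample grid and the tolerance are illustrative choices, and passing the check on finitely many points does not prove convexity (a failure, however, does disprove it).

```python
import math


def convexity_violation(f, points, ts, tol=1e-12):
    """Return (x, y, t) violating f(tx+(1-t)y) <= t f(x) + (1-t) f(y), or None."""
    for x in points:
        for y in points:
            for t in ts:
                lhs = f(t * x + (1 - t) * y)
                rhs = t * f(x) + (1 - t) * f(y)
                if lhs > rhs + tol:
                    return (x, y, t)
    return None


points = [i / 4 for i in range(-8, 9)]   # sample points in [-2, 2]
ts = [i / 10 for i in range(11)]         # t = 0, 0.1, ..., 1

square_witness = convexity_violation(lambda x: x * x, points, ts)  # convex: no witness expected
exp_witness = convexity_violation(math.exp, points, ts)            # convex: no witness expected
sin_witness = convexity_violation(math.sin, points, ts)            # not convex on [-2, 2]
```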
27
Examples: convex functions on R
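Affine: $ax + b$ on $\mathbb{R}$, for any $a, b \in \mathbb{R}$.
Exponential: $e^{ax}$, for any $a \in \mathbb{R}$.
Powers: $x^a$ on $\mathbb{R}_{++}$, for $a \ge 1$ or $a < 0$.
Powers of absolute value: $|x|^p$ on $\mathbb{R}$, for $p \ge 1$.
Negative entropy: $x \log x$ on $\mathbb{R}_{++}$.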
28
Examples: concave functions on R
Affine: $ax + b$ on $\mathbb{R}$, for any $a, b \in \mathbb{R}$.
Powers: $x^a$ on $\mathbb{R}_{++}$, for $0 \le a < 1$.
Logarithm: $\log x$ on $\mathbb{R}_{++}$.
29
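Examples on $\mathbb{R}^n$ and $\mathbb{R}^{m \times n}$
Affine function: $f(x) = a^T x + b$
Norms: $\|x\|$
Max: $f(x) = \max\{x_1, \dots, x_n\}$
Affine function on matrices: $f(X) = \operatorname{tr}(A^T X) + b = \sum_{i,j} A_{ij} X_{ij} + b$
Spectral (maximum singular value) norm: $f(X) = \|X\|_2 = \sigma_{\max}(X) = (\lambda_{\max}(X^T X))^{1/2}$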
30
Max: $f(x) = \max\{x_1, \dots, x_n\}$ is convex:
$f(tx + (1-t)y) = \max_i \{t x_i + (1-t) y_i\} \le t \max_i \{x_i\} + (1-t) \max_i \{y_i\} = t f(x) + (1-t) f(y)$
31
Extended-value extension
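The extended-value extension $\tilde f$ of $f$ is $\tilde f(x) = f(x)$ for all $x \in \operatorname{dom} f$ and $\tilde f(x) = \infty$ for all $x \notin \operatorname{dom} f$.
It often simplifies the notation. For example, the condition $\tilde f(tx + (1-t)y) \le t \tilde f(x) + (1-t) \tilde f(y)$ for all $t \in [0,1]$ (as an inequality in $\mathbb{R} \cup \{\infty\}$) expresses both that $\operatorname{dom} f$ is convex and that $f$ satisfies the convexity inequality on it.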
32
Properties of Convex functions
A convex function on an open convex subset of $\mathbb{R}^n$ is continuous on that set.
33
First-order condition
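$f$ is differentiable if $\operatorname{dom} f$ is open and the gradient $\nabla f(x)$ exists at each $x \in \operatorname{dom} f$.
1st-order condition: differentiable $f$ with convex domain is convex iff $f(y) \ge f(x) + \nabla f(x)^T (y - x)$ for all $x, y \in \operatorname{dom} f$.
[Figure: the graph of $f$ lies above its tangent $f(x) + \nabla f(x)^T (y - x)$ at the point $(x, f(x))$.]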
34
First-order condition
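1st-order condition: differentiable $f$ with convex domain is convex iff $f(y) \ge f(x) + \nabla f(x)^T (y - x)$ for all $x, y \in \operatorname{dom} f$.
Proof: First we prove this for $d = 1$. Assume $f$ is convex and $x, y \in \operatorname{dom} f$. For $0 < t \le 1$ we have $x + t(y - x) \in \operatorname{dom} f$ and $(1-t) f(x) + t f(y) \ge f(x + t(y - x))$.
Dividing by $t$ gives $f(y) \ge f(x) + \frac{f(x + t(y - x)) - f(x)}{t}$, and letting $t \to 0$ yields $f(y) \ge f(x) + f'(x)(y - x)$.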
35
First-order condition
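1st-order condition: differentiable $f$ with convex domain is convex iff $f(y) \ge f(x) + \nabla f(x)^T (y - x)$ for all $x, y \in \operatorname{dom} f$.
Proof (converse, $d = 1$): Assume $f(y) \ge f(x) + f'(x)(y - x)$ for all $x, y \in \operatorname{dom} f$. Let $z = tx + (1-t)y$ with $t \in [0,1]$.
Then $f(y) \ge f(z) + f'(z)(y - z)$ and $f(x) \ge f(z) + f'(z)(x - z)$.
Hence $(1-t) f(y) \ge (1-t) f(z) + (1-t) f'(z)(y - z)$ and $t f(x) \ge t f(z) + t f'(z)(x - z)$.
Adding the two inequalities and using $t(x - z) + (1-t)(y - z) = 0$ gives $t f(x) + (1-t) f(y) \ge f(z) = f(tx + (1-t)y)$.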
36
First-order condition
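Now we prove the general case. Let $f : \mathbb{R}^d \to \mathbb{R}$ and $x, y \in \operatorname{dom} f$.
Consider $f$ restricted to the line passing through $x$ and $y$: $g(t) = f(ty + (1-t)x)$, $g'(t) = \nabla f(ty + (1-t)x)^T (y - x)$.
If $f$ is convex then $g$ is convex, and we can apply the case $d = 1$ to $g$: $g(1) \ge g(0) + g'(0)$, that is, $f(y) \ge f(x) + \nabla f(x)^T (y - x)$.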
37
First-order condition
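Now we prove the converse for general $d$. Assume that $f(y) \ge f(x) + \nabla f(x)^T (y - x)$ for all $x, y \in \operatorname{dom} f$.
Let $x, y \in \operatorname{dom} f$ and $t, s \in [0,1]$. Applying the assumption to the points $ty + (1-t)x$ and $sy + (1-s)x$, whose difference is $(t - s)(y - x)$, gives $f(ty + (1-t)x) \ge f(sy + (1-s)x) + \nabla f(sy + (1-s)x)^T (y - x)(t - s)$,
i.e. $g(t) \ge g(s) + g'(s)(t - s)$ for $g(t) = f(ty + (1-t)x)$. By the case $d = 1$, $g$ is convex, and since $x, y$ were arbitrary, $f$ is convex.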
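The first-order condition says the "gap" $f(v) - f(u) - \nabla f(u)^T (v - u)$ is never negative for a convex $f$. A minimal numerical sketch on a grid in $\mathbb{R}^2$ follows; the two test functions and the helper names `first_order_gap` and `min_gap` are assumptions made for illustration, not taken from the lecture.

```python
import math
from itertools import product


def first_order_gap(f, grad, u, v):
    """Return f(v) - f(u) - grad(u) . (v - u)."""
    g = grad(u)
    return f(v) - f(u) - sum(gi * (vi - ui) for gi, ui, vi in zip(g, u, v))


def min_gap(f, grad, points):
    """Smallest first-order gap over all ordered pairs of sample points."""
    return min(first_order_gap(f, grad, u, v) for u in points for v in points)


def convex_f(p):
    return math.exp(p[0]) + p[1] ** 2      # convex on R^2


def convex_grad(p):
    return (math.exp(p[0]), 2 * p[1])


def saddle_f(p):
    return p[0] * p[1]                     # not convex (saddle)


def saddle_grad(p):
    return (p[1], p[0])


grid = [i / 2 for i in range(-4, 5)]       # -2, -1.5, ..., 2
points = list(product(grid, repeat=2))

convex_min = min_gap(convex_f, convex_grad, points)   # expected >= 0 up to rounding
saddle_min = min_gap(saddle_f, saddle_grad, points)   # expected < 0
```

For the saddle function the gap simplifies to $(v_1 - u_1)(v_2 - u_2)$, which is negative whenever the two coordinate differences have opposite signs.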
38
Restriction of a convex function to a line
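$f : \mathbb{R}^n \to \mathbb{R}$ is convex if and only if the function $g : \mathbb{R} \to \mathbb{R}$, $g(t) = f(x + tv)$, $\operatorname{dom} g = \{t : x + tv \in \operatorname{dom} f\}$, is convex (in $t$) for any $x \in \operatorname{dom} f$, $v \in \mathbb{R}^n$.
You can check convexity of $f$ by checking convexity of functions of one variable.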
39
Second-order conditions
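$f$ is twice differentiable if $\operatorname{dom} f$ is open and the Hessian $\nabla^2 f(x)$, a symmetric matrix, exists at each $x \in \operatorname{dom} f$.
2nd-order conditions: for twice differentiable $f$ with convex domain, $f$ is convex if and only if $\nabla^2 f(x) \succeq 0$ for all $x \in \operatorname{dom} f$.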
40
Example: $f(x) = x \log x$: $f'(x) = \log x + 1$, $f''(x) = 1/x > 0$ for all $x > 0$.
Norm: if $f : \mathbb{R}^n \to \mathbb{R}$ is a norm and $0 \le t \le 1$, then $f(tx + (1-t)y) \le f(tx) + f((1-t)y) = t f(x) + (1-t) f(y)$.
41
Example
Quadratic function: $f(x) = \frac{1}{2} x^T P x + q^T x + r$ (with $P \in S^n$): $\nabla f(x) = Px + q$, $\nabla^2 f(x) = P$. Convex iff $P \succeq 0$.
Least-squares objective: $f(x) = \|Ax - b\|_2^2$: $\nabla f(x) = 2A^T(Ax - b)$, $\nabla^2 f(x) = 2A^T A$. $f$ is convex for any $A$.
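The least-squares formulas can be sanity-checked with a few lines of plain Python. This is a sketch, with the matrix `A`, the vectors `b` and `x`, and all helper names chosen here purely for illustration: the analytic gradient $2A^T(Ax - b)$ is compared against a central finite difference, and the Hessian quadratic form $v^T (2A^T A) v = 2\|Av\|_2^2$ is evaluated to show it cannot be negative.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def least_squares(A, b, x):
    """f(x) = ||Ax - b||_2^2."""
    return sum((ri - bi) ** 2 for ri, bi in zip(matvec(A, x), b))


def least_squares_gradient(A, b, x):
    """Analytic gradient 2 A^T (Ax - b)."""
    r = [ri - bi for ri, bi in zip(matvec(A, x), b)]
    return [2 * sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(x))]


def numerical_gradient(f, x, h=1e-5):
    """Central finite-difference approximation of the gradient of f at x."""
    grad = []
    for j in range(len(x)):
        up = x[:j] + [x[j] + h] + x[j + 1:]
        down = x[:j] + [x[j] - h] + x[j + 1:]
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def hessian_quadratic_form(A, v):
    """v^T (2 A^T A) v, computed as 2 ||Av||^2, hence never negative."""
    return 2 * sum(c * c for c in matvec(A, v))


A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
b = [1.0, 0.0, -1.0]
x = [0.5, -0.25]

analytic = least_squares_gradient(A, b, x)
numeric = numerical_gradient(lambda z: least_squares(A, b, z), x)
curvature = hessian_quadratic_form(A, [1.0, -1.0])
```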
42
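Example: $f : S^n \to \mathbb{R}$ with $f(X) = \log\det X$, $\operatorname{dom} f = S^n_{++}$.
Restrict $f$ to a line $X + tV$:
$g(t) = \log\det(X + tV) = \log\det\left(X^{1/2}(I + tX^{-1/2} V X^{-1/2}) X^{1/2}\right) = \log\det X + \log\det(I + tX^{-1/2} V X^{-1/2}) = \log\det X + \sum_{i=1}^{n} \log(1 + t\lambda_i)$,
where $\lambda_i$ are the eigenvalues of $X^{-1/2} V X^{-1/2}$.
Therefore $g'(t) = \sum_i \frac{\lambda_i}{1 + t\lambda_i}$ and $g''(t) = -\sum_i \frac{\lambda_i^2}{(1 + t\lambda_i)^2} \le 0$, so $g$, and hence $f$, is concave.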
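The restriction-to-a-line argument can be illustrated for $2 \times 2$ matrices, where the determinant has a closed form. In this sketch the matrices `X` and `V`, the step `h` and the range of `t` are assumptions picked so that `X + tV` stays positive definite; a negative second difference $g(t-h) - 2g(t) + g(t+h)$ is the discrete counterpart of $g''(t) < 0$.

```python
import math


def log_det_2x2(m):
    """log det of a symmetric positive definite 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("matrix is not positive definite")
    return math.log(det)


def g(t, X, V):
    """Restriction of log det to the line X + tV."""
    return log_det_2x2([[X[i][j] + t * V[i][j] for j in range(2)] for i in range(2)])


X = [[2.0, 0.5], [0.5, 1.0]]    # positive definite base point (illustrative)
V = [[1.0, 2.0], [2.0, -1.0]]   # symmetric direction (illustrative)
h = 0.05

second_differences = [
    g(t - h, X, V) - 2 * g(t, X, V) + g(t + h, X, V)
    for t in [k / 10 for k in range(-5, 3)]   # t = -0.5, ..., 0.2
]
```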
43
geometric mean is concave
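$f(x) = \left(\prod_{k=1}^{n} x_k\right)^{1/n}$ on $\mathbb{R}^n_{++}$
$\nabla^2 f(x)_{ii} = -(n-1)\frac{\left(\prod_k x_k\right)^{1/n}}{n^2 x_i^2}$, $\quad \nabla^2 f(x)_{ij} = \frac{\left(\prod_k x_k\right)^{1/n}}{n^2 x_i x_j}$ for $i \neq j$
$\nabla^2 f(x) = -\frac{\left(\prod_k x_k\right)^{1/n}}{n^2}\left(n \operatorname{diag}(x_1^{-2}, \dots, x_n^{-2}) - q q^T\right)$, where $q_i = 1/x_i$.
We show that $\nabla^2 f(x) \preceq 0$: for any $v$, $v^T \nabla^2 f(x) v = -\frac{\left(\prod_k x_k\right)^{1/n}}{n^2}\left(n \sum_i \frac{v_i^2}{x_i^2} - \left(\sum_i \frac{v_i}{x_i}\right)^2\right) \le 0$.
This follows from the Cauchy-Schwarz inequality $(a^T a)(b^T b) \ge (a^T b)^2$, applied with $a = \mathbf{1}$, $b_i = v_i / x_i$.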
44
Epigraph and sublevel set
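$\alpha$-sublevel set of $f : \mathbb{R}^n \to \mathbb{R}$: $C_\alpha = \{x \in \operatorname{dom} f : f(x) \le \alpha\}$
Sublevel sets of convex functions are convex (the converse is false).
Epigraph of $f : \mathbb{R}^n \to \mathbb{R}$: $\operatorname{epi} f = \{(x, t) \in \mathbb{R}^{n+1} : x \in \operatorname{dom} f,\ f(x) \le t\}$
$f$ is convex if and only if $\operatorname{epi} f$ is a convex set.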
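To see why the converse fails, one possible counterexample (an assumption chosen here, not taken from the lecture) is $f(x) = \sqrt{|x|}$ on $\mathbb{R}$: every sublevel set is an interval $[-\alpha^2, \alpha^2]$, yet $f$ violates the convexity inequality. The sketch below checks both facts on a sample grid.

```python
import math


def f(x):
    """Illustrative function with convex sublevel sets that is not convex."""
    return math.sqrt(abs(x))


def sublevel_indices(values, alpha):
    """Indices of the grid points whose value is at most alpha."""
    return [i for i, v in enumerate(values) if v <= alpha]


def is_contiguous(indices):
    """True if the indices form one unbroken run, i.e. a grid interval."""
    return all(b - a == 1 for a, b in zip(indices, indices[1:]))


grid = [i / 10 for i in range(-40, 41)]   # -4.0, -3.9, ..., 4.0
values = [f(x) for x in grid]

all_sublevel_sets_are_intervals = all(
    is_contiguous(sublevel_indices(values, alpha))
    for alpha in [0.0, 0.5, 1.0, 1.5, 2.0]
)

# Convexity would require f(0.5) <= 0.5 * f(0) + 0.5 * f(1) = 0.5.
midpoint_violation = f(0.5 * 0.0 + 0.5 * 1.0) > 0.5 * f(0.0) + 0.5 * f(1.0)
```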
45
Jensen's inequality
Basic inequality: if $f$ is convex, then for $0 \le t \le 1$, $f(tx + (1-t)y) \le t f(x) + (1-t) f(y)$.
It can be extended to convex combinations of more than two points: for $0 \le t_i \le 1$ with $\sum_i t_i = 1$, $f\left(\sum_i t_i x_i\right) \le \sum_i t_i f(x_i)$.
46
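Jensen's inequality
For $0 \le t_i \le 1$ with $\sum_i t_i = 1$: $f\left(\sum_i t_i x_i\right) \le \sum_i t_i f(x_i)$.
Proof by induction: Assume the theorem is true for $n$ points. For $n+1$ points with $t_1 < 1$,
$f\left(\sum_{i=1}^{n+1} t_i x_i\right) = f\left(t_1 x_1 + (1 - t_1) \sum_{i=2}^{n+1} \frac{t_i}{1 - t_1} x_i\right) \le t_1 f(x_1) + (1 - t_1) f\left(\sum_{i=2}^{n+1} \frac{t_i}{1 - t_1} x_i\right)$,
and we can use the induction hypothesis on the last term, because the weights $t_i / (1 - t_1)$ are nonnegative and sum to 1.
Another way to write Jensen's inequality is $f(E[x]) \le E[f(x)]$.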
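A quick numerical illustration of the finite form of Jensen's inequality follows. The helper name `jensen_gap` and the sample points and weights are assumptions made for this sketch. For $f(x) = x^2$ the gap $E[f(x)] - f(E[x])$ is exactly the variance of the distribution.

```python
import math


def jensen_gap(f, xs, ts):
    """Return sum_i t_i f(x_i) - f(sum_i t_i x_i) for weights t_i >= 0 summing to 1."""
    if any(t < 0 for t in ts) or abs(sum(ts) - 1.0) > 1e-12:
        raise ValueError("weights must be nonnegative and sum to 1")
    mean = sum(t * x for t, x in zip(ts, xs))
    return sum(t * f(x) for t, x in zip(ts, xs)) - f(mean)


xs = [1.0, 2.0, 3.0, 4.0]
ts = [0.1, 0.2, 0.3, 0.4]

gap_square = jensen_gap(lambda x: x * x, xs, ts)  # convex: gap is the variance of x
gap_exp = jensen_gap(math.exp, xs, ts)            # convex: gap should be nonnegative
gap_log = jensen_gap(math.log, xs, ts)            # concave: gap should be nonpositive
```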
47
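Example: $(ab)^{1/2} \le (a+b)/2$ for $a, b \ge 0$.
For $a, b > 0$ we look at the function $-\log x$. This function is convex, so $-\log\left(\frac{a+b}{2}\right) \le \frac{-\log a - \log b}{2}$.
Multiplying by $-1$ and taking the exponential of both sides yields $(ab)^{1/2} \le (a+b)/2$. If $a = 0$ or $b = 0$, the inequality is immediate.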
48
Information theory
If $p(x)$ is the true probability distribution for $x$, and $q(x)$ is another distribution, then applying Jensen's inequality to the random variable $Y(x) = q(x)/p(x)$ and the function $\varphi(y) = -\log(y)$ gives $E[\varphi(Y)] \ge \varphi(E[Y])$:
$\int p(x) \log\frac{p(x)}{q(x)}\,dx \ge -\log \int p(x) \frac{q(x)}{p(x)}\,dx = -\log \int q(x)\,dx = 0$
And therefore $\int p(x) \log p(x)\,dx \ge \int p(x) \log q(x)\,dx$.
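For discrete distributions the integrals become sums, and the inequality can be checked directly. This is a minimal sketch with made-up distributions `p` and `q`; it assumes $q(x) > 0$ wherever $p(x) > 0$ and skips terms with $p(x) = 0$.

```python
import math


def kl_divergence(p, q):
    """Return sum_x p(x) log(p(x) / q(x)) for discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def expected_log(p, q):
    """Return sum_x p(x) log q(x), skipping outcomes with p(x) = 0."""
    return sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.25, 0.25]
q = [1 / 3, 1 / 3, 1 / 3]

divergence = kl_divergence(p, q)   # nonnegative by Jensen's inequality
self_divergence = kl_divergence(p, p)
lhs = expected_log(p, p)           # sum p log p
rhs = expected_log(p, q)           # sum p log q
```

The quantity `divergence` equals `lhs - rhs`, so its nonnegativity is the same statement as the final inequality on the slide.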
49
Operations that preserve convexity
Practical methods for establishing convexity of a function:
Check the definition (often simplified by restricting to a line).
For twice differentiable functions, show $\nabla^2 f(x) \succeq 0$ for all $x \in \operatorname{dom} f$.
Show that $f$ is obtained from simple convex functions by operations that preserve convexity:
nonnegative weighted sum
composition with affine function
pointwise maximum and supremum
composition
minimization
perspective