Convex functions Lecture 4

1 Convex functions Lecture 4
Dr. Zvi Lotker

2 In the last lecture
Operations that preserve convexity
Carathéodory's theorem
Radon's theorem
Helly's theorem
Separating hyperplane theorem
The isolation theorem

3 Carathéodory's theorem
If a point $x \in \mathbb{R}^d$ lies in the convex hull of a set $P$, there is a subset $P' \subseteq P$ consisting of no more than $d+1$ points such that $x$ lies in the convex hull of $P'$.
[Figure: the unit square with vertices (0,0), (1,0), (1,1), (0,1).]

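4 Carathéodory's theorem
Let $x \in \mathrm{Conv}(P)$. Then $x$ is a convex combination of points in $P$, i.e. $x = \lambda_1 x_1 + \dots + \lambda_k x_k$ where every $x_j \in P$, every $\lambda_j \ge 0$ and $\lambda_1 + \dots + \lambda_k = 1$.
Suppose $k > d+1$. Then $x_2 - x_1, \dots, x_k - x_1$ are linearly dependent (they are $k-1 > d$ vectors in $\mathbb{R}^d$), so there are real scalars $\mu_2, \dots, \mu_k$, not all zero, s.t. $\mu_2 (x_2 - x_1) + \dots + \mu_k (x_k - x_1) = 0$.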

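6 Carathéodory's theorem
Suppose $k > d+1$. Then $x_2 - x_1, \dots, x_k - x_1$ are linearly dependent, so there are real scalars $\mu_2, \dots, \mu_k$, not all zero, s.t. $\mu_2 (x_2 - x_1) + \dots + \mu_k (x_k - x_1) = 0$.
Let $\mu_1 := -(\mu_2 + \dots + \mu_k)$. Then $\mu_1 + \mu_2 + \dots + \mu_k = 0$ and $\mu_1 x_1 + \mu_2 x_2 + \dots + \mu_k x_k = 0$, and not all of the $\mu_j$ are equal to zero. Therefore, at least one $\mu_j > 0$.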

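8 Carathéodory's theorem
Let $\mu_1 := -(\mu_2 + \dots + \mu_k)$. Then $\mu_1 + \mu_2 + \dots + \mu_k = 0$ and $\mu_1 x_1 + \mu_2 x_2 + \dots + \mu_k x_k = 0$, and not all of the $\mu_j$ are equal to zero. Therefore, at least one $\mu_j > 0$.
Then, for any real $\alpha$, $x = \lambda_1 x_1 + \dots + \lambda_k x_k - \alpha(\mu_1 x_1 + \mu_2 x_2 + \dots + \mu_k x_k)$.
Define $\alpha = \min\{\lambda_i/\mu_i : \mu_i > 0\}$. For all $i$, $\lambda_i - \alpha\mu_i \ge 0$, and for some $i$, $\lambda_i - \alpha\mu_i = 0$.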

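9 Carathéodory's theorem
Then, for any real $\alpha$, $x = \lambda_1 x_1 + \dots + \lambda_k x_k - \alpha(\mu_1 x_1 + \mu_2 x_2 + \dots + \mu_k x_k)$.
Define $\alpha = \min\{\lambda_i/\mu_i : \mu_i > 0\}$. For all $i$, $\lambda_i - \alpha\mu_i \ge 0$, and for some $i$, $\lambda_i - \alpha\mu_i = 0$.
What is $(\lambda_1 - \alpha\mu_1) + \dots + (\lambda_k - \alpha\mu_k)$? It equals 1, because $\mu_1 + \dots + \mu_k = 0$. So $x$ is a convex combination of at most $k-1$ of the points, and the step can be repeated until $k \le d+1$.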
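
The reduction step above can be sketched in Python. This is only an illustration, not part of the original slides: the function name `caratheodory_step` and the example data (the unit square with a hand-picked dependence vector) are assumptions, and the dependence $\mu$ is supplied by the caller rather than computed.

```python
from fractions import Fraction

def caratheodory_step(lambdas, mus):
    """One reduction step: given convex weights lambdas and a dependence mus
    (sum(mus) == 0, sum(mu_i * x_i) == 0, some mu_i > 0), return new convex
    weights lambda_i - alpha * mu_i with at least one entry equal to zero."""
    alpha = min(l / m for l, m in zip(lambdas, mus) if m > 0)
    return [l - alpha * m for l, m in zip(lambdas, mus)]

# Hypothetical example: the centre of the unit square written with 4 corners.
points = [(0, 0), (1, 0), (1, 1), (0, 1)]
lambdas = [Fraction(1, 4)] * 4
mus = [Fraction(m) for m in (1, -1, 1, -1)]   # x1 - x2 + x3 - x4 = 0

new_lambdas = caratheodory_step(lambdas, mus)
# The represented point is unchanged, but only two corners are now used.
x = tuple(sum(w * p[k] for w, p in zip(new_lambdas, points)) for k in range(2))
```

Exact fractions are used so that the zero weight is exactly zero rather than a rounding residue.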

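10 Convex hull of a compact set
Theorem: If $S \subset \mathbb{R}^d$ is a compact set, then $\mathrm{conv}(S)$ is a compact set.
Proof: Let $\Delta$ be the standard simplex in $\mathbb{R}^{d+1}$. $\Delta$ is compact. $S^{d+1}$ is compact, where $S^j = \{(x_1, \dots, x_j) : x_i \in S\}$. Consider the map $\Phi : S^{d+1} \times \Delta \to \mathbb{R}^d$, $\Phi(u_1, \dots, u_{d+1}; a_1, \dots, a_{d+1}) = a_1 u_1 + \dots + a_{d+1} u_{d+1}$. Carathéodory's theorem implies that the image of $\Phi$ is $\mathrm{conv}(S)$. Since $\Phi$ is continuous and its domain is compact, the image of $\Phi$ is compact.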

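11 Question
Is it necessary to have $d+1$ points?
$\mathrm{Conv}(S) = \mathrm{Conv}(\mathrm{Conv}(S))$
$A \subseteq B \Rightarrow \mathrm{Conv}(A) \subseteq \mathrm{Conv}(B)$
Is the set $\{t x + (1-t) y : x, y \in P,\ 0 < t < 1\}$ convex?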

12 Radon's theorem (1887, 1956)
Any set of $d+2$ points in $\mathbb{R}^d$ can be partitioned into two (disjoint) sets whose convex hulls intersect.

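13 Radon's theorem (1887, 1956)
Theorem: Let $S \subset \mathbb{R}^d$ be a set containing at least $d+2$ points. Then there are two non-intersecting subsets $R, B \subset S$ s.t. $\mathrm{conv}(R) \cap \mathrm{conv}(B) \ne \emptyset$.
Proof: Suppose $X = \{x_1, x_2, \dots, x_{d+2}\} \subset \mathbb{R}^d$. Since any set of $d+2$ points in $\mathbb{R}^d$ is affinely dependent, there exists a set of multipliers $a_1, \dots, a_{d+2}$, not all of them 0, s.t. $a_1 x_1 + \dots + a_{d+2} x_{d+2} = 0$ and $a_1 + \dots + a_{d+2} = 0$.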

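14 Radon's theorem (1887, 1956)
Theorem: Let $S \subset \mathbb{R}^d$ be a set containing at least $d+2$ points. Then there are two non-intersecting subsets $R, B \subset S$ s.t. $\mathrm{conv}(R) \cap \mathrm{conv}(B) \ne \emptyset$.
Proof: $a_1 x_1 + \dots + a_{d+2} x_{d+2} = 0$, $a_1 + \dots + a_{d+2} = 0$. Let $I = \{i : a_i > 0\}$, $J = \{i : a_i < 0\}$, $X_1 = \{x_i : a_i > 0\}$, $X_2 = \{x_i : a_i < 0\}$. Then
$z = \frac{\sum_{i \in I} a_i x_i}{\sum_{i \in I} a_i} \in \mathrm{Conv}(X_1) \cap \mathrm{Conv}(X_2)$,
because $\sum_{i \in I} a_i x_i = -\sum_{j \in J} a_j x_j$ and $\sum_{i \in I} a_i = -\sum_{j \in J} a_j$, so $z$ is a convex combination of the points of $X_2$ as well.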
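
The construction in this proof can be sketched in Python. This is a minimal illustration under stated assumptions: the helper names `null_vector` and `radon_point` are hypothetical, exact fractions replace floating point, and points with $a_i = 0$ are simply placed in the second set.

```python
from fractions import Fraction

def null_vector(rows):
    """Non-zero solution a of rows * a = 0, by Gauss-Jordan elimination.
    Assumes more unknowns (columns) than equations (rows)."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0])
    pivots, r = [], 0
    for c in range(n_cols):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue                       # no pivot in this column
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [v / pivot for v in m[r]]
        for i in range(len(m)):
            if i != r:
                factor = m[i][c]
                m[i] = [vi - factor * vr for vi, vr in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    free = next(c for c in range(n_cols) if c not in pivots)
    a = [Fraction(0)] * n_cols
    a[free] = Fraction(1)                  # set one free variable to 1
    for row, c in zip(m, pivots):
        a[c] = -row[free]
    return a

def radon_point(points):
    """Return (z, pos, rest): a point z in both convex hulls, the indices
    with a_i > 0, and the remaining indices."""
    d = len(points[0])
    # d coordinate equations plus the equation sum(a_i) = 0
    rows = [[p[k] for p in points] for k in range(d)] + [[1] * len(points)]
    a = null_vector(rows)
    pos = [i for i, ai in enumerate(a) if ai > 0]
    rest = [i for i, ai in enumerate(a) if ai <= 0]
    total = sum(a[i] for i in pos)
    z = tuple(sum(a[i] * points[i][k] for i in pos) / total for k in range(d))
    return z, pos, rest

z, pos, rest = radon_point([(0, 0), (1, 0), (1, 1), (0, 1)])
```

For the four corners of the unit square, the two sets are the two diagonals and `z` is their crossing point.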

15 Helly's theorem
Suppose $A_1, \dots, A_m \subset \mathbb{R}^d$ is a family of convex sets, and every $d+1$ of them have a non-empty intersection. Then $\bigcap_{i=1}^m A_i$ is non-empty.

16 Proof of Helly's theorem
The proof is by induction on $m$. If $m = d+1$, then the statement is true. Suppose the statement is true for every family of $m-1$ sets, where $m-1 > d$.

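17 Proof of Helly's theorem
The sets $B_j = \bigcap_{i \ne j} A_i$ are non-empty by the inductive hypothesis. Pick a point $p_i$ from each $B_i$, giving $\{p_1, \dots, p_m\}$. Since $m \ge d+2$, by Radon's lemma there is a partition of the $p$'s into two sets $P_1, P_2$ s.t. $X = \mathrm{conv}(P_1) \cap \mathrm{conv}(P_2) \ne \emptyset$. Let $I_1 = \{i : p_i \in P_1\}$, $I_2 = \{i : p_i \in P_2\}$. Let $x \in X$. We claim that $x \in \bigcap_i A_i$.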

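18 Proof of Helly's theorem
Note that for all $j \ne i$, $p_j \in A_i$. Consider $i \in \{1, 2, \dots, m\}$. Then $i \in I_1$ or $i \in I_2$. Assume that $i \in I_1$, $i \notin I_2$. Then every point of $P_2$ lies in $A_i$, so $x \in \mathrm{conv}(P_2) \subseteq A_i$ because $A_i$ is convex. The case $i \in I_2$ is symmetric. Therefore $x \in \bigcap_i A_i$.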
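
For $d = 1$ the theorem says that closed intervals which intersect pairwise have a common point. A small Python sketch of that special case follows; the function name `common_point` and the sample intervals are assumptions made for illustration.

```python
from itertools import combinations

def common_point(intervals):
    """Helly's theorem for d = 1: if every 2 of the closed intervals (lo, hi)
    intersect, return a point lying in all of them; otherwise return None."""
    pairwise = all(max(a[0], b[0]) <= min(a[1], b[1])
                   for a, b in combinations(intervals, 2))
    if not pairwise:
        return None
    point = max(lo for lo, _ in intervals)    # the largest left endpoint
    assert all(lo <= point <= hi for lo, hi in intervals)
    return point

intervals = [(0, 5), (2, 7), (4, 9), (3, 4.5)]
shared = common_point(intervals)
```

The internal assertion never fires: the largest left endpoint is at most every right endpoint exactly when all pairs intersect.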

19 Separating hyperplane theorem
If $C$ and $D$ are disjoint convex sets, then there exist $a \ne 0$ and $b$ such that $a^T x \le b$ for $x \in C$ and $a^T x \ge b$ for $x \in D$.
Strict separation requires additional assumptions (e.g., $C$ is closed, $D$ is a singleton).
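
As a concrete instance, two disjoint closed balls can be separated explicitly. The sketch below is an illustration only; the function name `separate_balls` and the choice of balls as the convex sets are assumptions, not something the slides prescribe.

```python
import math

def separate_balls(c1, r1, c2, r2):
    """Hyperplane a.x = b separating the ball (centre c1, radius r1) from the
    ball (centre c2, radius r2), assumed disjoint: |c1 - c2| > r1 + r2."""
    a = [v2 - v1 for v1, v2 in zip(c1, c2)]          # direction c2 - c1
    norm = math.sqrt(sum(ai * ai for ai in a))
    hi_1 = sum(ai * v for ai, v in zip(a, c1)) + r1 * norm   # max of a.x on ball 1
    lo_2 = sum(ai * v for ai, v in zip(a, c2)) - r2 * norm   # min of a.x on ball 2
    return a, (hi_1 + lo_2) / 2

a, b = separate_balls((0.0, 0.0), 1.0, (4.0, 0.0), 1.0)
```

Here the hyperplane is the vertical line $4 x_1 = 8$, i.e. $x_1 = 2$, midway between the two unit discs.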

20 The isolation theorem
Let $A \subset \mathbb{R}^d$ be an open convex set, and let $u \notin A$ be a point in $\mathbb{R}^d$. Then there exists an affine hyperplane $H$ which contains $u$ and strictly isolates $A$.
Proof. We can assume $u = 0$.

21 Summary theorem
Carathéodory's theorem: For all $x \in \mathrm{Conv}(P) \subset \mathbb{R}^d$, there exists a subset $P' \subseteq P$ consisting of no more than $d+1$ points, s.t. $x \in \mathrm{Conv}(P')$.
Radon's theorem: Let $S \subset \mathbb{R}^d$ be a set containing at least $d+2$ points. Then there are two non-intersecting subsets $R, B \subset S$ s.t. $\mathrm{conv}(R) \cap \mathrm{conv}(B) \ne \emptyset$.
Helly's theorem: Suppose $A_1, \dots, A_m \subset \mathbb{R}^d$ is a family of convex sets, and every $d+1$ of them have a non-empty intersection. Then $\bigcap_{i=1}^m A_i$ is non-empty.

22 Summary theorem
Separating hyperplane theorem: If $C$ and $D$ are compact disjoint convex sets, then there exist $a \ne 0$ and $b$ such that $a^T x \le b$ for $x \in C$ and $a^T x \ge b$ for $x \in D$.
The isolation theorem: Let $A \subset \mathbb{R}^d$ be an open convex set, and let $u \notin A$ be a point in $\mathbb{R}^d$. Then there exists an affine hyperplane $H$ which contains $u$ and strictly isolates $A$.
We can prove the separating hyperplane theorem from the isolation theorem by defining $A = C - D$.

23 How this is connected to optimization
If we can check whether the intersection is non-empty, we can search for the optimum. Example

25 Outline of the lecture
Convex function
Examples
First-order condition
Second-order conditions
Jensen's inequality
Operations that preserve convexity

26 Convex function
A real-valued function $f$ defined on an interval (or on any convex subset $C$ of some vector space) is called convex if for any two points $x$ and $y$ in its domain $C$ and any $t \in [0,1]$, we have
$f(tx + (1-t)y) \le t f(x) + (1-t) f(y)$.
$f$ is concave if $-f$ is convex.
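
The defining inequality can be probed numerically. The sketch below is an illustration, not part of the slides: the function name `convexity_violation`, the sampling scheme and the tolerance are assumptions, and a search that finds nothing is evidence of convexity, not a proof.

```python
import math
import random

def convexity_violation(f, lo, hi, trials=2000, seed=0, tol=1e-12):
    """Search random (x, y, t) with x, y in [lo, hi], t in [0, 1) for a
    violation of f(t*x + (1-t)*y) <= t*f(x) + (1-t)*f(y).
    Return the first violating triple, or None if none is found."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y, t = rng.uniform(lo, hi), rng.uniform(lo, hi), rng.random()
        if f(t * x + (1 - t) * y) > t * f(x) + (1 - t) * f(y) + tol:
            return x, y, t
    return None

exp_result = convexity_violation(math.exp, -3, 3)   # exp is convex
sin_result = convexity_violation(math.sin, 0, 3)    # sin is concave on [0, 3]
```

A returned triple is a certificate that the function is not convex on the interval.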

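27 Examples: convex functions on R
Affine: $ax + b$ on $\mathbb{R}$, for any $a, b \in \mathbb{R}$.
Exponential: $e^{ax}$, for any $a \in \mathbb{R}$.
Powers: $x^a$ on $\mathbb{R}_{++}$, for $a \ge 1$ or $a < 0$.
Powers of absolute value: $|x|^p$ on $\mathbb{R}$, for $p \ge 1$.
Negative entropy: $x \log x$ on $\mathbb{R}_{++}$.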

28 Examples: concave functions on R
Affine: $ax + b$ on $\mathbb{R}$, for any $a, b \in \mathbb{R}$.
Powers: $x^a$ on $\mathbb{R}_{++}$, for $0 \le a < 1$.
Logarithm: $\log x$ on $\mathbb{R}_{++}$.

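29 Examples on $\mathbb{R}^n$ and $\mathbb{R}^{m \times n}$
Affine function: $f(x) = a^T x + b$
Norms: $\|x\|$
Max: $f(x) = \max\{x_1, \dots, x_n\}$
Affine function on matrices: $f(X) = \mathrm{tr}(A^T X) + b = \sum_{i,j} A_{ij} X_{ij} + b$
Spectral (maximum singular value) norm: $f(X) = \|X\|_2 = \sigma_{\max}(X) = (\lambda_{\max}(X^T X))^{1/2}$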

30 Max: $f(x) = \max\{x_1, \dots, x_n\}$
$f(tx + (1-t)y) = \max_i \{t x_i + (1-t) y_i\} \le t \max_i \{x_i\} + (1-t) \max_i \{y_i\} = t f(x) + (1-t) f(y)$

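31 Extended-value extension
The extended-value extension $\tilde f$ of $f$ is
$\tilde f(x) = f(x)$ for all $x \in \mathrm{dom}\, f$,
$\tilde f(x) = \infty$ for all $x \notin \mathrm{dom}\, f$.
It often simplifies the notation. For example, the single condition $\tilde f(tx + (1-t)y) \le t \tilde f(x) + (1-t) \tilde f(y)$ for all $t \in [0,1]$, read as an inequality in $\mathbb{R} \cup \{\infty\}$, says both that $\mathrm{dom}\, f$ is convex and that $f$ is convex on it.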
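
A floating-point version of the extended-value extension can use `math.inf`. The sketch below is an assumption-laden illustration: it picks $f(x) = -\log x$ with $\mathrm{dom}\, f = (0, \infty)$ as the example, and the names `f_ext` and `holds` are hypothetical. It restricts to $0 < t < 1$ because `0 * inf` is `nan` in floating point.

```python
import math

def f_ext(x):
    """Extended-value extension of f(x) = -log(x), dom f = (0, inf)."""
    return -math.log(x) if x > 0 else math.inf

def holds(x, y, t):
    """Check f_ext(t*x + (1-t)*y) <= t*f_ext(x) + (1-t)*f_ext(y), 0 < t < 1."""
    assert 0 < t < 1
    return f_ext(t * x + (1 - t) * y) <= t * f_ext(x) + (1 - t) * f_ext(y)

inside = holds(1.0, 4.0, 0.3)       # both points in dom f
mixed = holds(-1.0, 2.0, 0.5)       # one point outside: right side is inf
outside = holds(-1.0, -2.0, 0.5)    # both outside: inf <= inf
```

All three cases evaluate to `True`, so no separate domain check is needed in the inequality.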

32 Properties of convex functions
A convex function defined on an open convex set in $\mathbb{R}^n$ is continuous on that set.

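33 First-order condition
$f$ is differentiable if $\mathrm{dom}\, f$ is open and the gradient $\nabla f(x)$ exists at each $x \in \mathrm{dom}\, f$.
1st-order condition: differentiable $f$ with convex domain is convex iff $f(y) \ge f(x) + \nabla f(x)^T (y - x)$ for all $x, y \in \mathrm{dom}\, f$.
[Figure: graph of $f$ lying above its tangent $f(x) + \nabla f(x)^T (y - x)$ at the point $(x, f(x))$.]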

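34 First-order condition
1st-order condition: differentiable $f$ with convex domain is convex iff $f(y) \ge f(x) + \nabla f(x)^T (y - x)$ for all $x, y \in \mathrm{dom}\, f$.
Proof: first we prove this for $d = 1$. Assume $f$ is convex and $x, y \in \mathrm{dom}\, f$. For $0 < t \le 1$, $x + t(y - x) \in \mathrm{dom}\, f$ and
$(1-t) f(x) + t f(y) \ge f(x + t(y - x))$.
So $f(y) \ge f(x) + \frac{f(x + t(y - x)) - f(x)}{t}$, and letting $t \to 0$ gives $f(y) \ge f(x) + f'(x)(y - x)$.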

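35 First-order condition
1st-order condition: differentiable $f$ with convex domain is convex iff $f(y) \ge f(x) + \nabla f(x)^T (y - x)$ for all $x, y \in \mathrm{dom}\, f$.
Proof: first we prove this for $d = 1$. Assume $f(y) \ge f(x) + f'(x)(y - x)$ for all $x, y \in \mathrm{dom}\, f$. Let $z = tx + (1-t)y$. Then
$f(y) \ge f(z) + f'(z)(y - z)$, $f(x) \ge f(z) + f'(z)(x - z)$,
$(1-t) f(y) \ge (1-t) f(z) + (1-t) f'(z)(y - z)$, $t f(x) \ge t f(z) + t f'(z)(x - z)$.
Adding the last two inequalities, the derivative terms cancel because $t(x - z) + (1-t)(y - z) = 0$, so $t f(x) + (1-t) f(y) \ge f(tx + (1-t)y)$.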

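36 First-order condition
Now we prove the case for a general $d$. Let $f : \mathbb{R}^d \to \mathbb{R}$ and $x, y \in \mathbb{R}^d$. Consider $f$ restricted to the line passing through $x, y$:
$g(t) = f(ty + (1-t)x)$, $g'(t) = \nabla f(ty + (1-t)x)^T (y - x)$.
If $f$ is convex then $g$ is convex and we can use the case $d = 1$ on $g$:
$g(1) \ge g(0) + g'(0) \Rightarrow f(y) \ge f(x) + \nabla f(x)^T (y - x)$.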

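37 First-order condition
Now we prove the case for a general $d$. Now we assume that $f(y) \ge f(x) + \nabla f(x)^T (y - x)$ for all $x, y \in \mathrm{dom}\, f$. Let $x, y \in \mathrm{dom}\, f$ and $t, s \in [0,1]$. Applying the assumption to the points $ty + (1-t)x$ and $sy + (1-s)x$ gives
$f(ty + (1-t)x) \ge f(sy + (1-s)x) + \nabla f(sy + (1-s)x)^T (y - x)(t - s)$,
i.e. $g(t) \ge g(s) + g'(s)(t - s)$, so $g$ is convex by the case $d = 1$.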
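
The first-order condition says the tangent line is a global underestimator. A short numerical sketch for $d = 1$ follows; the function name `tangent_gap`, the example $f(x) = x \log x$ and the grid are assumptions chosen for illustration.

```python
import math

def tangent_gap(f, df, x, y):
    """f(y) - [f(x) + f'(x) * (y - x)]; non-negative for convex f."""
    return f(y) - (f(x) + df(x) * (y - x))

def f(x):
    return x * math.log(x)          # convex on x > 0

def df(x):
    return math.log(x) + 1

grid = [0.1 * k for k in range(1, 51)]          # 0.1, 0.2, ..., 5.0
min_gap = min(tangent_gap(f, df, x, y) for x in grid for y in grid)

# For a non-convex function the gap can be negative:
sin_gap = tangent_gap(math.sin, math.cos, 1.0, 2.0)
```

The smallest gap is attained at $x = y$, where the tangent touches the graph.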

38 Restriction of a convex function to a line
$f : \mathbb{R}^n \to \mathbb{R}$ is convex if and only if the function $g : \mathbb{R} \to \mathbb{R}$,
$g(t) = f(x + tv)$, $\mathrm{dom}\, g = \{t : x + tv \in \mathrm{dom}\, f\}$,
is convex (in $t$) for any $x \in \mathrm{dom}\, f$, $v \in \mathbb{R}^n$.
You can check convexity of $f$ by checking convexity of functions of one variable.
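
A sketch of this reduction in Python: build $g(t) = f(x + tv)$ and test it on a grid. The helper names `restrict_to_line` and `midpoint_convex_on`, the base point and the direction are all assumptions, and the midpoint test on finitely many points is a necessary check only.

```python
def restrict_to_line(f, x, v):
    """Return g(t) = f(x + t*v), the restriction of f to a line."""
    return lambda t: f([xi + t * vi for xi, vi in zip(x, v)])

def midpoint_convex_on(g, ts, tol=1e-12):
    """Check g((s+t)/2) <= (g(s) + g(t))/2 for all pairs s, t from ts."""
    return all(g((s + t) / 2) <= (g(s) + g(t)) / 2 + tol
               for s in ts for t in ts)

x0 = [1.0, -2.0, 0.5]                 # hypothetical base point
v0 = [-1.0, 3.0, 0.25]                # hypothetical direction
ts = [k / 4 for k in range(-20, 21)]  # t in [-5, 5]

g = restrict_to_line(max, x0, v0)                     # f(x) = max_i x_i
h = restrict_to_line(lambda z: -max(z), x0, v0)       # -f is not convex
```

Here `g` is a maximum of three affine functions of `t` and passes the check, while `h` fails it.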

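39 Second-order conditions
$f$ is twice differentiable if $\mathrm{dom}\, f$ is open and the Hessian $\nabla^2 f(x)$, a symmetric matrix, exists at each $x \in \mathrm{dom}\, f$.
2nd-order conditions: for twice differentiable $f$ with convex domain, $f$ is convex if and only if $\nabla^2 f(x) \succeq 0$ for all $x \in \mathrm{dom}\, f$.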

40 Example
$f(x) = x \log x$: $f'(x) = \log x + 1$, $f''(x) = 1/x > 0$ for all $x > 0$.
Norm: if $f : \mathbb{R}^n \to \mathbb{R}$ is a norm and $0 \le t \le 1$ then $f(tx + (1-t)y) \le f(tx) + f((1-t)y) = t f(x) + (1-t) f(y)$.

41 Example
Quadratic function: $f(x) = \frac{1}{2} x^T P x + q^T x + r$ (with $P \in S^n$):
$\nabla f(x) = Px + q$, $\nabla^2 f(x) = P$. Convex iff $P \succeq 0$.
Least-squares objective: $f(x) = \|Ax - b\|_2^2$:
$\nabla f(x) = 2A^T (Ax - b)$, $\nabla^2 f(x) = 2A^T A$. Convex for all $A$.
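
For $n = 2$ the condition $P \succeq 0$ is easy to test by hand or in code. The sketch below is an illustration; the helper names `is_psd_2x2` and `gram` and the sample matrices are assumptions, and the $2 \times 2$ criterion used (non-negative diagonal and determinant) does not extend as-is to larger matrices.

```python
def is_psd_2x2(p):
    """A symmetric 2x2 matrix is positive semidefinite iff its diagonal
    entries and its determinant are all >= 0."""
    (a, b), (c, d) = p
    assert b == c, "matrix must be symmetric"
    return a >= 0 and d >= 0 and a * d - b * c >= 0

def gram(a_rows):
    """A^T A for a matrix A with two columns, given as a list of rows."""
    return [[sum(r[i] * r[j] for r in a_rows) for j in range(2)]
            for i in range(2)]

convex_quadratic = is_psd_2x2([[2, 1], [1, 2]])      # P with det 3
nonconvex_quadratic = is_psd_2x2([[1, 2], [2, 1]])   # P with det -3
least_squares = is_psd_2x2(gram([[1, 2], [3, 4], [5, 6]]))
```

The last line reflects the slide's claim that the least-squares Hessian $2A^T A$ is positive semidefinite for any $A$ (the factor 2 does not affect the sign).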

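42 Example
$f : S^n \to \mathbb{R}$ with $f(X) = \log\det X$, $\mathrm{dom}\, f = S^n_{++}$.
$g(t) = \log\det(X + tV)$
$= \log\det(X^{1/2}(I + t X^{-1/2} V X^{-1/2}) X^{1/2})$
$= \log\det X + \log\det(I + t X^{-1/2} V X^{-1/2})$
$= \log\det X + \sum_i \log(1 + t\lambda_i)$,
where $\lambda_i$ are the eigenvalues of $X^{-1/2} V X^{-1/2}$. Therefore
$g'(t) = \sum_i \frac{\lambda_i}{1 + t\lambda_i}$, $g''(t) = -\sum_i \frac{\lambda_i^2}{(1 + t\lambda_i)^2} \le 0$,
so $g$ is concave, and hence $f$ is concave.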
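
The concavity of $g(t) = \log\det(X + tV)$ can be checked numerically for $2 \times 2$ matrices. This is only a sketch: the matrices `X` and `V`, the range of `t` and the helper names are assumptions, and midpoint concavity on a grid is a necessary check, not a proof.

```python
import math

def logdet2(m):
    """log det of a 2x2 matrix with positive determinant."""
    (a, b), (c, d) = m
    return math.log(a * d - b * c)

def g(t, X, V):
    """g(t) = log det(X + t V) for 2x2 matrices given as nested lists."""
    return logdet2([[X[i][j] + t * V[i][j] for j in range(2)]
                    for i in range(2)])

X = [[2.0, 0.5], [0.5, 1.0]]      # hypothetical positive definite matrix
V = [[1.0, -1.0], [-1.0, 0.5]]    # hypothetical symmetric direction
ts = [k / 10 for k in range(-3, 4)]   # X + tV stays positive definite here

concave = all(g((s + t) / 2, X, V) >= (g(s, X, V) + g(t, X, V)) / 2 - 1e-12
              for s in ts for t in ts)
```

For these matrices $\det(X + tV) = 1.75 + 3t - 0.5t^2$, which is positive on the chosen range.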

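43 Geometric mean is concave
$f(x) = (\prod_k x_k)^{1/n}$ on $\mathbb{R}^n_{++}$
$\frac{\partial^2 f}{\partial x_i^2} = -\frac{(n-1)(\prod_k x_k)^{1/n}}{n^2 x_i^2}$, $\frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{(\prod_k x_k)^{1/n}}{n^2 x_i x_j}$ for $i \ne j$
$\nabla^2 f(x) = -\frac{(\prod_k x_k)^{1/n}}{n^2} \left( n\, \mathrm{diag}[x_1^{-2}, \dots, x_n^{-2}] - q q^T \right)$, where $q_i = 1/x_i$.
We show that $\nabla^2 f \preceq 0$:
$v^T \nabla^2 f(x)\, v = -\frac{(\prod_k x_k)^{1/n}}{n^2} \left( n \sum_i \frac{v_i^2}{x_i^2} - \left( \sum_i \frac{v_i}{x_i} \right)^2 \right) \le 0$.
This follows from the Cauchy-Schwarz inequality $(a^T a)(b^T b) \ge (a^T b)^2$, applied with $a = \mathbf{1}$ (the all-ones vector) and $b_i = v_i/x_i$.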
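
The sign of the Hessian quadratic form of the geometric mean $f(x) = (\prod_k x_k)^{1/n}$ can be spot-checked numerically. The sketch below evaluates the closed form $-\frac{f(x)}{n^2}\left(n \sum_i (v_i/x_i)^2 - (\sum_i v_i/x_i)^2\right)$; the function name, the sampling ranges and the seed are assumptions made for illustration.

```python
import math
import random

def hessian_quadratic_form(x, v):
    """v^T (Hessian of f at x) v for f(x) = (prod x_i)^(1/n), x_i > 0,
    using -(f(x)/n^2) * (n * sum(r_i^2) - (sum r_i)^2) with r_i = v_i/x_i."""
    n = len(x)
    f = math.prod(x) ** (1.0 / n)
    ratios = [vi / xi for vi, xi in zip(v, x)]
    return -(f / n**2) * (n * sum(r * r for r in ratios) - sum(ratios) ** 2)

rng = random.Random(0)
worst = max(
    hessian_quadratic_form([rng.uniform(0.1, 5) for _ in range(4)],
                           [rng.uniform(-1, 1) for _ in range(4)])
    for _ in range(1000)
)
# Directions v proportional to x give equality in Cauchy-Schwarz:
flat = hessian_quadratic_form([1.0, 2.0], [1.0, 2.0])
```

The largest sampled value stays non-positive, and the form vanishes along $v = x$, where the geometric mean is linear.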

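44 Epigraph and sublevel set
$\alpha$-sublevel set of $f : \mathbb{R}^n \to \mathbb{R}$: $C_\alpha = \{x \in \mathrm{dom}\, f : f(x) \le \alpha\}$.
Sublevel sets of convex functions are convex (the converse is false).
Epigraph of $f : \mathbb{R}^n \to \mathbb{R}$: $\mathrm{epi}\, f = \{(x, t) \in \mathbb{R}^{n+1} : x \in \mathrm{dom}\, f,\ f(x) \le t\}$.
$f$ is convex if and only if $\mathrm{epi}\, f$ is a convex set.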
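
A quick counterexample for the false converse, sketched in Python: $f(x) = \sqrt{|x|}$ has intervals as sublevel sets but is not convex. The choice of this function and the helper name `sublevel_interval` are assumptions made for illustration.

```python
import math

def f(x):
    """sqrt(|x|): every sublevel set is an interval, yet f is not convex."""
    return math.sqrt(abs(x))

def sublevel_interval(alpha):
    """{x : f(x) <= alpha} = [-alpha^2, alpha^2] for alpha >= 0."""
    return (-alpha**2, alpha**2)

# Convexity fails between x = 0 and y = 4 at t = 1/2:
lhs = f(0.5 * 0 + 0.5 * 4)       # f(2) = sqrt(2)
rhs = 0.5 * f(0) + 0.5 * f(4)    # 1.0
```

Since `lhs > rhs`, the defining inequality of convexity is violated even though each sublevel set is convex.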

45 Jensen's inequality
Basic inequality: if $f$ is convex, then for $0 \le t \le 1$, $f(tx + (1-t)y) \le t f(x) + (1-t) f(y)$.
It can be extended to convex combinations of more than two points: for all $t_i$ with $0 \le t_i \le 1$ and $\sum_i t_i = 1$,
$f(\sum_i t_i x_i) \le \sum_i t_i f(x_i)$.

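46 Jensen's inequality
For all $t_i$ with $0 \le t_i \le 1$ and $\sum_i t_i = 1$: $f(\sum_i t_i x_i) \le \sum_i t_i f(x_i)$.
Proof by induction. Assume the theorem is true for $n$ points, and take $n+1$ points with $t_1 < 1$. Then
$f(\sum_{i=1}^{n+1} t_i x_i) = f\left(t_1 x_1 + (1 - t_1) \sum_{i=2}^{n+1} \frac{t_i}{1 - t_1} x_i\right) \le t_1 f(x_1) + (1 - t_1) f\left(\sum_{i=2}^{n+1} \frac{t_i}{1 - t_1} x_i\right)$,
and we can use the induction hypothesis on the last term, since the weights $t_i/(1 - t_1)$ sum to 1.
Another way to write Jensen's inequality is $f(E[x]) \le E[f(x)]$.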
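
The finite form of Jensen's inequality is easy to evaluate directly. In this sketch the function name `jensen_sides`, the weights and the sample points are assumptions chosen for illustration.

```python
import math

def jensen_sides(f, ts, xs):
    """Return (f(sum t_i x_i), sum t_i f(x_i)) for weights t_i >= 0
    that sum to 1."""
    assert all(t >= 0 for t in ts) and abs(sum(ts) - 1) < 1e-12
    mean = sum(t * x for t, x in zip(ts, xs))
    return f(mean), sum(t * f(x) for t, x in zip(ts, xs))

# Convex f = exp with three points:
lhs, rhs = jensen_sides(math.exp, [0.2, 0.3, 0.5], [-1.0, 0.0, 2.0])

# Convex f = -log with equal weights on a = 1, b = 9 (the AM-GM setting):
lhs_log, rhs_log = jensen_sides(lambda x: -math.log(x), [0.5, 0.5], [1.0, 9.0])
```

In both cases the first returned value is at most the second.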

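47 Example: $(ab)^{1/2} \le (a+b)/2$
For $a, b > 0$, we look at the function $-\log x$. This function is convex, so
$-\log\left(\frac{a+b}{2}\right) \le \frac{-\log a - \log b}{2}$.
Negating both sides and taking the exponential yields $(ab)^{1/2} \le (a+b)/2$. If $a = 0$ or $b = 0$ the inequality is immediate.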

48 Information theory
If $p(x)$ is the true probability distribution for $x$, and $q(x)$ is another distribution, then applying Jensen's inequality for the random variable $Y(x) = q(x)/p(x)$ and the function $\varphi(y) = -\log(y)$ gives $E[\varphi(Y)] \ge \varphi(E[Y])$:
$\int p(x) \log\frac{p(x)}{q(x)}\, dx \ge -\log \int p(x) \frac{q(x)}{p(x)}\, dx = -\log \int q(x)\, dx = 0$.
And therefore $\int p(x) \log p(x)\, dx \ge \int p(x) \log q(x)\, dx$.
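
The same inequality can be checked for discrete distributions, with sums in place of the integrals. This swap to the discrete case, the function name `kl_divergence` and the sample distributions are assumptions made for illustration.

```python
import math

def kl_divergence(p, q):
    """sum_x p(x) * log(p(x)/q(x)) for discrete distributions with q(x) > 0
    wherever p(x) > 0; terms with p(x) = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]
q = [0.2, 0.3, 0.5]

gap = kl_divergence(p, q)        # non-negative by Jensen's inequality
self_gap = kl_divergence(p, p)   # zero when the distributions coincide

# Equivalent form from the slide: sum p log p >= sum p log q
entropy_side = sum(pi * math.log(pi) for pi in p)
cross_side = sum(pi * math.log(qi) for pi, qi in zip(p, q))
```

The quantity `gap` is the difference `entropy_side - cross_side`.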

49 Operations that preserve convexity
Practical methods for establishing convexity of a function:
check the definition (often simplified by restricting to a line)
for twice differentiable functions, show $\nabla^2 f(x) \succeq 0$ for all $x \in \mathrm{dom}\, f$
show that $f$ is obtained from simple convex functions by operations that preserve convexity:
  nonnegative weighted sum
  composition with affine function
  pointwise maximum and supremum
  composition
  minimization
  perspective

