Convex functions Lecture 4 Dr. Zvi Lotker

In the last lecture:
Operations that preserve convexity
Carathéodory's theorem
Radon's theorem
Helly's theorem
Separating hyperplane theorem
The isolation theorem

Carathéodory's theorem: If a point x ∈ R^d lies in the convex hull of a set P, there is a subset P′ of P consisting of no more than d+1 points such that x lies in the convex hull of P′. (Figure: the unit square with vertices (0,0), (1,0), (0,1), (1,1).)

Carathéodory's theorem (proof): Let x ∈ Conv(P). Then x is a convex combination of points in P, i.e. x = λ1x1 + … + λkxk, where every xj ∈ P, every λj ≥ 0, and λ1 + … + λk = 1. Suppose k > d+1. Then x2−x1, …, xk−x1 are k−1 > d vectors in R^d, hence linearly dependent, so there are real scalars μ2, …, μk, not all zero, s.t. μ2(x2−x1) + … + μk(xk−x1) = 0.

Carathéodory's theorem (proof, cont.): Let μ1 := −(μ2 + … + μk). Then μ1 + μ2 + … + μk = 0 and μ1x1 + μ2x2 + … + μkxk = 0, and not all of the μj are equal to zero. Since the μj sum to zero and are not all zero, at least one μj > 0.

Carathéodory's theorem (proof, cont.): For any α, x = λ1x1 + … + λkxk − α(μ1x1 + μ2x2 + … + μkxk). Define α := min{λi/μi : μi > 0}. Then for all i, λi − αμi ≥ 0, and for some i, λi − αμi = 0.

Carathéodory's theorem (proof, end): What is (λ1 − αμ1) + … + (λk − αμk)? Since μ1 + … + μk = 0, the sum is λ1 + … + λk = 1. So x is a convex combination of at most k−1 of the points, and repeating the argument we reach k ≤ d+1.
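The proof is constructive; below is a minimal NumPy sketch of one reduction step (the function name is ours), assuming the points are given as the rows of an array:

```python
import numpy as np

def caratheodory_step(points, lam):
    """One step of the proof: if x = sum_i lam[i]*points[i] uses k > d+1
    points, rewrite x as a convex combination of at most k-1 of them."""
    k, d = points.shape
    if k <= d + 1:
        return points, lam
    # x2-x1, ..., xk-x1 are k-1 > d vectors in R^d, hence linearly
    # dependent: take mu_2..mu_k from the null space of their matrix.
    diffs = (points[1:] - points[0]).T            # d x (k-1)
    mu = np.zeros(k)
    mu[1:] = np.linalg.svd(diffs)[2][-1]          # null-space vector
    mu[0] = -mu[1:].sum()                         # now sum(mu)=0 and sum(mu_i x_i)=0
    # alpha = min{lam_i/mu_i : mu_i > 0} keeps lam - alpha*mu >= 0,
    # zeroes at least one coefficient, and preserves the sum 1.
    pos = mu > 0
    alpha = np.min(lam[pos] / mu[pos])
    new_lam = lam - alpha * mu
    keep = new_lam > 1e-12
    return points[keep], new_lam[keep]
```

Iterating this step until at most d+1 points remain realizes the theorem.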

Convex hull of a compact set. Theorem: If S ⊆ R^d is a compact set, then Conv(S) is a compact set. Proof: Let Δ be the standard simplex in R^{d+1}; Δ is compact. S^{d+1} is compact (a finite product of compact sets), where S^j = {(x1, …, xj) : xi ∈ S}. Consider the map Φ : S^{d+1} × Δ → R^d, Φ(u1, …, u_{d+1}; a1, …, a_{d+1}) = a1u1 + … + a_{d+1}u_{d+1}. Carathéodory's theorem implies that the image of Φ is exactly Conv(S). Since Φ is continuous and its domain is compact, the image of Φ is compact.

Questions: Is it necessary to have d+1 points? Is Conv(S) = Conv(Conv(S))? Does A ⊆ B imply Conv(A) ⊆ Conv(B)? Is the set {tx + (1−t)y : x, y ∈ P, 0 < t < 1} convex?

Radon's theorem (J. Radon, 1887–1956): Any set of d+2 points in R^d can be partitioned into two disjoint sets whose convex hulls intersect.

Radon's theorem. Theorem: Let S ⊆ R^d be a set containing at least d+2 points. Then there are two disjoint subsets R, B ⊆ S s.t. Conv(R) ∩ Conv(B) ≠ ∅. Proof: Suppose X = {x1, x2, …, x_{d+2}} ⊆ R^d. Since any set of d+2 points in R^d is affinely dependent, there exists a set of multipliers a1, …, a_{d+2}, not all of them 0, s.t. a1x1 + … + a_{d+2}x_{d+2} = 0 and a1 + … + a_{d+2} = 0.

Radon's theorem (proof, cont.): From a1x1 + … + a_{d+2}x_{d+2} = 0 and a1 + … + a_{d+2} = 0, let I = {i : ai > 0}, J = {i : ai < 0}, X1 = {xi : ai > 0}, X2 = {xi : ai < 0}. Since ∑ai = 0 and not all ai are zero, both I and J are non-empty and ∑_{i∈I} ai = −∑_{j∈J} aj. Then z = (∑_{i∈I} ai xi)/(∑_{i∈I} ai) = (∑_{j∈J} (−aj) xj)/(∑_{j∈J} (−aj)) ∈ Conv(X1) ∩ Conv(X2).
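The proof translates directly into code; a small NumPy sketch (the function name is ours), assuming the d+2 points are the rows of an array:

```python
import numpy as np

def radon_partition(points):
    """Partition d+2 points in R^d into two sets whose convex hulls
    intersect; returns the index sets and the common point z."""
    k, d = points.shape                           # k = d + 2
    # Affine dependence: a nontrivial solution of
    # sum_i a_i x_i = 0 and sum_i a_i = 0.
    A = np.vstack([points.T, np.ones(k)])         # (d+1) x (d+2)
    a = np.linalg.svd(A)[2][-1]                   # null-space vector
    I, J = np.where(a > 0)[0], np.where(a <= 0)[0]
    z = points[I].T @ a[I] / a[I].sum()           # lies in both hulls
    return I, J, z

# e.g. 4 points in the plane; here z comes out as (1, 1):
I, J, z = radon_partition(np.array([[0.0, 0], [2, 0], [0, 2], [1, 1]]))
```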

Helly's theorem (E. Helly, 1884–1943): Suppose A1, …, Am ⊆ R^d is a family of convex sets, and every d+1 of them have a non-empty intersection. Then ∩i Ai is non-empty.

Proof of Helly's theorem: The proof is by induction on m. If m = d+1, the statement is trivially true. Suppose the statement is true for m−1 sets, where m−1 > d.

Proof of Helly's theorem (cont.): The sets Bj = ∩_{i≠j} Ai are non-empty by the inductive hypothesis. Pick a point pj from each Bj, giving {p1, …, pm}. By Radon's theorem, there is a partition of the p's into two sets P1, P2 s.t. X = Conv(P1) ∩ Conv(P2) ≠ ∅. Let I1 = {i : pi ∈ P1} and I2 = {i : pi ∈ P2}. Let x ∈ X. We claim that x ∈ ∩i Ai.

Proof of Helly's theorem (cont.): Note that for all j ≠ i, pj ∈ Ai. Consider i ∈ {1, 2, …, m}. Then i ∈ I1 or i ∈ I2. Assume i ∈ I1, so i ∉ I2; then every point of P2 lies in Ai, hence x ∈ Conv(P2) ⊆ Ai. Therefore x ∈ ∩i Ai.

Separating hyperplane theorem: If C and D are disjoint convex sets, then there exist a ≠ 0 and b such that a'x ≤ b for x ∈ C and a'x ≥ b for x ∈ D. Strict separation requires additional assumptions (e.g., C is closed and D is a singleton).

The isolation theorem: Let A ⊆ R^d be an open convex set and let u ∉ A be a point in R^d. Then there exists an affine hyperplane H which contains u and strictly isolates A (A lies strictly on one side of H). Proof: We can assume u = 0.

Summary of theorems: Carathéodory's theorem: For all x ∈ Conv(P) ⊆ R^d, there exists a subset P′ ⊆ P consisting of no more than d+1 points s.t. x ∈ Conv(P′). Radon's theorem: Let S ⊆ R^d be a set containing at least d+2 points. Then there are two disjoint subsets R, B ⊆ S s.t. Conv(R) ∩ Conv(B) ≠ ∅. Helly's theorem: Suppose A1, …, Am ⊆ R^d is a family of convex sets, and every d+1 of them have a non-empty intersection. Then ∩i Ai is non-empty.

Summary of theorems (cont.): Separating hyperplane theorem: If C and D are disjoint convex sets, then there exist a ≠ 0 and b such that a'x ≤ b for x ∈ C and a'x ≥ b for x ∈ D. The isolation theorem: Let A ⊆ R^d be an open convex set and let u ∉ A. Then there exists an affine hyperplane H which contains u and strictly isolates A. We can prove the separating hyperplane theorem from the isolation theorem by defining A = C − D.

How this is connected to optimization: If we can check whether an intersection of convex sets is non-empty (a feasibility problem), we can search for an optimum, e.g., by bisecting on the objective value, as in the sketch below.
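A minimal sketch, assuming a hypothetical feasibility oracle is_feasible(t) that can only answer whether some feasible point has objective value at most t:

```python
def minimize_by_bisection(is_feasible, lo, hi, tol=1e-6):
    """Binary search for the optimal value of a convex problem, given
    only an oracle for 'is {x feasible : objective(x) <= t} non-empty?'."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if is_feasible(mid):
            hi = mid        # value mid is achievable: tighten from above
        else:
            lo = mid        # no feasible point does this well
    return hi

# Hypothetical example: minimize x subject to x >= 2; the optimum is 2.
print(minimize_by_bisection(lambda t: t >= 2, lo=0.0, hi=10.0))  # ~2.0
```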

Outline of the lecture:
Convex functions
Examples
First-order condition
Second-order conditions
Jensen's inequality
Operations that preserve convexity

Convex function: A real-valued function f defined on an interval (or on any convex subset C of some vector space) is called convex if, for any two points x and y in its domain C and any t ∈ [0,1], we have f(tx + (1−t)y) ≤ t f(x) + (1−t) f(y). f is concave if −f is convex.
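The definition can be probed numerically; a small Python sketch (note that random sampling can refute convexity but never certify it):

```python
import random

def seems_convex(f, sample, trials=10_000, tol=1e-9):
    """Check f(t*x + (1-t)*y) <= t*f(x) + (1-t)*f(y) on random chords.
    False on any violation; True only means 'no counterexample found'."""
    for _ in range(trials):
        x, y, t = sample(), sample(), random.random()
        if f(t * x + (1 - t) * y) > t * f(x) + (1 - t) * f(y) + tol:
            return False
    return True

uniform = lambda: random.uniform(-10, 10)
print(seems_convex(abs, uniform))               # True: |x| is convex
print(seems_convex(lambda x: x ** 3, uniform))  # False: x^3 is not convex on R
```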

Examples: convex functions on R
Affine: ax + b on R, for any a, b ∈ R
Exponential: e^{ax}, for any a ∈ R
Powers: x^a on R++, for a ≥ 1 or a ≤ 0
Powers of absolute value: |x|^p on R, for p ≥ 1
Negative entropy: x log(x) on R++

Examples: concave functions on R
Affine: ax + b on R, for any a, b ∈ R
Powers: x^a on R++, for 0 ≤ a ≤ 1
Logarithm: log(x) on R++

Examples on Rn and Rmn Affine function f(x) = a’x + b Norms: ||x|| Max: f(x)=max{x1,…,xn} f(X) = tr(A’X) + b =Ai,j Xi,j+b spectral (maximum singular value) norm f(X) = ||X||2 = max(X)=max(X’X)

Max is convex: f(x) = max{x1, …, xn}. f(tx + (1−t)y) = max_i{t xi + (1−t) yi} ≤ t max_i{xi} + (1−t) max_i{yi} = t f(x) + (1−t) f(y).

Extended-value extension: The extended-value extension f̃ of f is f̃(x) = f(x) for all x ∈ dom f, and f̃(x) = ∞ for all x ∉ dom f. This often simplifies the notation; for example, the condition f̃(tx + (1−t)y) ≤ t f̃(x) + (1−t) f̃(y), for all t ∈ [0,1], can be stated without restricting x, y to dom f.

Properties of convex functions: A convex function defined on an open convex set is continuous.

First-order condition: f is differentiable if dom f is open and the gradient ∇f(x) exists at each x ∈ dom f. 1st-order condition: a differentiable f with convex domain is convex iff f(y) ≥ f(x) + ∇f(x)'(y−x) for all x, y ∈ dom f. (Figure: the graph of f lies above the tangent line f(x) + ∇f(x)'(y−x) at the point (x, f(x)).)

First-order condition (proof, d = 1): We first prove the one-dimensional case. Assume f is convex and x, y ∈ dom f. Then x + t(y−x) ∈ dom f, and convexity gives f(x + t(y−x)) ≤ (1−t)f(x) + t f(y). So f(y) ≥ f(x) + (f(x + t(y−x)) − f(x))/t, and letting t → 0+ yields f(y) ≥ f(x) + f′(x)(y−x).

First-order condition (proof, d = 1, converse): Assume f(y) ≥ f(x) + f′(x)(y−x) for all x, y ∈ dom f. Let z = tx + (1−t)y. Then f(y) ≥ f(z) + f′(z)(y−z) and f(x) ≥ f(z) + f′(z)(x−z), hence (1−t)f(y) ≥ (1−t)f(z) + (1−t)f′(z)(y−z) and t f(x) ≥ t f(z) + t f′(z)(x−z). Adding, and noting t(x−z) + (1−t)(y−z) = tx + (1−t)y − z = 0, gives t f(x) + (1−t)f(y) ≥ f(tx + (1−t)y).

First-order condition (proof, general d): f : R^d → R, x, y ∈ R^d. Consider the restriction of f to the line through x and y: g(t) = f(ty + (1−t)x), so g′(t) = ∇f(ty + (1−t)x)'(y−x). If f is convex then g is convex, and applying the d = 1 case to g gives g(1) ≥ g(0) + g′(0), i.e. f(y) ≥ f(x) + ∇f(x)'(y−x).

First-order condition (proof, general d, converse): Now assume f(y) ≥ f(x) + ∇f(x)'(y−x) for all x, y ∈ dom f. Let x, y ∈ dom f and t, s ∈ [0,1]. Applying the inequality to the points ty + (1−t)x and sy + (1−s)x gives f(ty + (1−t)x) ≥ f(sy + (1−s)x) + ∇f(sy + (1−s)x)'(y−x)(t−s), i.e. g(t) ≥ g(s) + g′(s)(t−s), so g is convex along every such line, and hence f is convex.
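Numerically, the first-order condition says every tangent underestimates f; a quick sketch for f(x) = e^x, whose derivative is known in closed form:

```python
import math, random

f = df = math.exp          # f(x) = e^x and f'(x) = e^x

# First-order condition: f(y) >= f(x) + f'(x)*(y - x) for all x, y.
for _ in range(10_000):
    x, y = random.uniform(-5, 5), random.uniform(-5, 5)
    assert f(y) >= f(x) + df(x) * (y - x) - 1e-9
print("tangent lines of e^x always underestimate it")
```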

Restriction of a convex function to a line: f : R^n → R is convex if and only if the function g : R → R, g(t) = f(x + tv), with dom g = {t : x + tv ∈ dom f}, is convex (in t) for any x ∈ dom f and v ∈ R^n. So you can check convexity of f by checking convexity of functions of one variable.
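This reduction is also how one probes convexity in practice; a NumPy sketch (names ours) that checks the midpoint inequality for g(t) = f(x + tv) along random lines:

```python
import numpy as np

def convex_along_random_lines(f, n, lines=200, tol=1e-9):
    """Probe convexity of f: R^n -> R via g(t) = f(x + t*v)
    on random lines, using the midpoint inequality."""
    rng = np.random.default_rng(0)
    for _ in range(lines):
        x, v = rng.normal(size=n), rng.normal(size=n)
        g = lambda t: f(x + t * v)
        s, t = rng.uniform(-1, 1, size=2)
        if g((s + t) / 2) > (g(s) + g(t)) / 2 + tol:
            return False       # convexity fails along this line
    return True

print(convex_along_random_lines(np.linalg.norm, n=3))         # True
print(convex_along_random_lines(lambda x: x[0] * x[1], n=3))  # False
```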

Second-order conditions: f is twice differentiable if dom f is open and the Hessian ∇²f(x) exists at each x ∈ dom f (it is then a symmetric matrix). 2nd-order condition: a twice differentiable f with convex domain is convex if and only if ∇²f(x) ⪰ 0 (positive semidefinite) for all x ∈ dom f.

Examples: f(x) = x log(x): f′(x) = log(x) + 1, f″(x) = 1/x > 0 for all x > 0, so f is convex on R++. Norm: if f : R^n → R is a norm and 0 ≤ t ≤ 1, then f(tx + (1−t)y) ≤ f(tx) + f((1−t)y) = t f(x) + (1−t) f(y), by the triangle inequality and homogeneity.

Examples: Quadratic: f(x) = (1/2)x'Px + q'x + r (with P ∈ S^n): ∇f(x) = Px + q, ∇²f(x) = P; convex iff P ⪰ 0. Least-squares objective: f(x) = ||Ax − b||_2^2: ∇f(x) = 2A'(Ax − b), ∇²f(x) = 2A'A; f is convex for all A.
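The second-order condition for the least-squares objective can be verified mechanically, since its Hessian 2A'A is the same at every x; a NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 5))

# f(x) = ||Ax - b||_2^2 has constant Hessian 2*A'A.
H = 2 * A.T @ A
print(np.linalg.eigvalsh(H).min() >= -1e-10)  # True: PSD for any A, so f is convex
```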

Example: f : S^n → R with f(X) = log det X, dom f = S^n++. Restrict to a line: g(t) = log det(X + tV) = log det(X^{1/2}(I + tX^{−1/2}VX^{−1/2})X^{1/2}) = log det X + log det(I + tX^{−1/2}VX^{−1/2}) = log det X + ∑_i log(1 + tλi), where the λi are the eigenvalues of X^{−1/2}VX^{−1/2}. Therefore g′(t) = ∑_i λi/(1 + tλi) and g″(t) = −∑_i λi²/(1 + tλi)² ≤ 0, so g is concave along every line and f is concave.
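A numerical replay of this computation (a sketch under stated assumptions: X is built to be positive definite, V is a random symmetric direction, and the step range keeps X + tV positive definite):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
M = rng.normal(size=(n, n))
X = M @ M.T + n * np.eye(n)                      # positive definite
V = rng.normal(size=(n, n)); V = (V + V.T) / 2   # symmetric direction

def g(t):                                        # log det(X + t*V)
    return np.linalg.slogdet(X + t * V)[1]

# Midpoint concavity along the line: g((s+t)/2) >= (g(s)+g(t))/2.
for s, t in rng.uniform(-0.5, 0.5, size=(1000, 2)):
    assert g((s + t) / 2) >= (g(s) + g(t)) / 2 - 1e-9
print("log det is concave along this line")
```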

The geometric mean is concave: f(x) = (∏_i xi)^{1/n} on R^n++. The Hessian has entries ∇²f_{i,i} = −(∏_k xk)^{1/n}(n−1)/(n²xi²) and ∇²f_{i,j} = (∏_k xk)^{1/n}/(n²xixj) for i ≠ j, so ∇²f = −((∏_k xk)^{1/n}/n²)(n diag[x1^{−2}, …, xn^{−2}] − qq'), where qi = 1/xi. We show that ∇²f ⪯ 0: v'∇²f v = −((∏_k xk)^{1/n}/n²)(n ∑_i vi²/xi² − (∑_i vi/xi)²) ≤ 0. This follows from the Cauchy–Schwarz inequality (a'b)² ≤ (a'a)(b'b), applied with a = (1, …, 1) and bi = vi/xi.

Epigraph and sublevel set: The α-sublevel set of f : R^n → R is Cα = {x ∈ dom f : f(x) ≤ α}. Sublevel sets of convex functions are convex (the converse is false). The epigraph of f : R^n → R is epi f = {(x, t) ∈ R^{n+1} : x ∈ dom f, f(x) ≤ t}. f is convex if and only if epi f is a convex set.

Jensen's inequality: Basic inequality: if f is convex, then for 0 ≤ t ≤ 1, f(tx + (1−t)y) ≤ t f(x) + (1−t) f(y). It extends to convex combinations of more than two points: for all i, 0 ≤ ti ≤ 1 with ∑i ti = 1, we have f(∑i ti xi) ≤ ∑i ti f(xi).

Jensen's inequality (proof by induction): Assume the inequality holds for n points. Then f(∑_{i=1}^{n+1} ti xi) = f(t1 x1 + (1−t1) ∑_{i=2}^{n+1} (ti/(1−t1)) xi) ≤ t1 f(x1) + (1−t1) f(∑_{i=2}^{n+1} (ti/(1−t1)) xi), and we can apply the induction hypothesis to the second term, since the coefficients ti/(1−t1) sum to 1. Another way to write Jensen's inequality is f(E[x]) ≤ E[f(x)].
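A small Monte Carlo sketch of the probabilistic form f(E[x]) ≤ E[f(x)], with the convex function f(x) = x²:

```python
import random

xs = [random.gauss(0, 1) for _ in range(100_000)]
f = lambda x: x * x                      # convex

lhs = f(sum(xs) / len(xs))               # f(E[x]) ~ f(0) = 0
rhs = sum(map(f, xs)) / len(xs)          # E[f(x)] ~ Var(x) = 1
print(lhs <= rhs)                        # True: Jensen's inequality
```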

Example: (ab)^{1/2} ≤ (a+b)/2 for a, b ≥ 0. Consider the function −log x, which is convex. Then −log((a+b)/2) ≤ (−log a − log b)/2, i.e. log((a+b)/2) ≥ (log a + log b)/2 = log((ab)^{1/2}). Taking the exponential of both sides yields (ab)^{1/2} ≤ (a+b)/2.

Information theory: If p(x) is the true probability distribution for x and q(x) is another distribution, then applying Jensen's inequality to the random variable Y(x) = q(x)/p(x) and the convex function φ(y) = −log(y) gives E[φ(Y)] ≥ φ(E[Y]): ∫ p(x) log(p(x)/q(x)) dx ≥ −log ∫ p(x)(q(x)/p(x)) dx = −log ∫ q(x) dx = 0. Therefore ∫ p(x) log p(x) dx ≥ ∫ p(x) log q(x) dx.
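This is the nonnegativity of the Kullback–Leibler divergence (Gibbs' inequality); a discrete NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.random(10); p /= p.sum()         # 'true' distribution
q = rng.random(10); q /= q.sum()         # any other distribution

kl = np.sum(p * np.log(p / q))           # E_p[log(p/q)] = KL(p || q)
print(kl >= 0)                           # True, equality iff p == q
```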

Operations that preserve convexity: practical methods for establishing convexity of a function:
1. check the definition (often simplified by restricting to a line)
2. for twice differentiable functions, show ∇²f(x) ⪰ 0 for all x ∈ dom f
3. show that f is obtained from simple convex functions by operations that preserve convexity:
nonnegative weighted sum
composition with affine function
pointwise maximum and supremum
composition
minimization
perspective