Chapter 3: Convex Functions and Separation Theorems

In this chapter we focus mainly on convex functions and their properties in relation to optimization.
Convex functions

Defn: Let $S \subseteq \mathbb{R}^n$ be a convex set. A functional $f : S \to \mathbb{R}$ is said to be convex (over $S$) if for any $x_1, x_2 \in S$ and any $\lambda \in [0,1]$,
$f(\lambda x_1 + (1-\lambda)x_2) \le \lambda f(x_1) + (1-\lambda) f(x_2)$.

$f$ is said to be concave iff $-f$ is convex [so, $f$ is concave iff $f(\lambda x_1 + (1-\lambda)x_2) \ge \lambda f(x_1) + (1-\lambda) f(x_2)$].

$f$ is strictly convex if for every distinct $x_1, x_2 \in S$ and $\lambda \in (0,1)$,
$f(\lambda x_1 + (1-\lambda)x_2) < \lambda f(x_1) + (1-\lambda) f(x_2)$.

[Figure: graphs of $y = f(x)$ for a convex function, a concave function, and a function that is neither convex nor concave.]
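Examples:

1. Given $x_0 \in \mathbb{R}^n$, let $d : \mathbb{R}^n \to \mathbb{R}$ be given by $d(x) = \|x - x_0\|$. Then $d$ is a convex function.

2. Let $q(x) = x^T A x$, where $A$ is an $n \times n$ symmetric matrix.
(i) $q$ is convex if $A$ is positive semidefinite.
(ii) $q$ is strictly convex if $A$ is positive definite.
[Note: $A$ is PSD $\Rightarrow$ $2x^T A y \le x^T A x + y^T A y$, since $(x-y)^T A (x-y) \ge 0$; the inequalities are strict when $A$ is PD and $x \ne y$.]

3. Let $f(x) = a^T x + b$, where $a \in \mathbb{R}^n$, $b \in \mathbb{R}$ (an affine function). Then $f$ is convex. (In fact, an affine function is also concave.)

Note: $f : S \to \mathbb{R}$ is convex iff for any points $x_1, x_2, \dots, x_k$ in $S$ and any $\lambda_1, \dots, \lambda_k \ge 0$ with $\sum_{i=1}^k \lambda_i = 1$,
$f\left(\sum_{i=1}^k \lambda_i x_i\right) \le \sum_{i=1}^k \lambda_i f(x_i)$.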
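Example 2 can be checked numerically. The sketch below is only an illustration, not part of the original slides: the helper names `q`, `is_psd`, and `convexity_gap`, the random test matrix, and the tolerances are all assumptions. It builds a positive semidefinite matrix and samples the convexity inequality for $q(x) = x^T A x$.

```python
import numpy as np

def q(A, x):
    """Quadratic form q(x) = x^T A x."""
    return float(x @ A @ x)

def is_psd(A, tol=1e-10):
    """True if the symmetric matrix A has no eigenvalue below -tol."""
    return bool(np.all(np.linalg.eigvalsh(A) >= -tol))

def convexity_gap(A, x1, x2, lam):
    """lam*q(x1) + (1-lam)*q(x2) - q(lam*x1 + (1-lam)*x2).

    This quantity is nonnegative whenever q is convex.
    """
    x_mix = lam * x1 + (1 - lam) * x2
    return lam * q(A, x1) + (1 - lam) * q(A, x2) - q(A, x_mix)

rng = np.random.default_rng(0)
B = rng.standard_normal((3, 3))
A = B.T @ B  # B^T B is always symmetric positive semidefinite

# Sample the convexity inequality at random points and random lambda in [0, 1).
gaps = [
    convexity_gap(A, rng.standard_normal(3), rng.standard_normal(3), rng.uniform())
    for _ in range(1000)
]
print(is_psd(A), min(gaps) >= -1e-9)
```

Sampling can only refute convexity, never prove it; the proof is the inequality in the note of Example 2.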
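Defn: Let $f : S \to \mathbb{R}$.
1. The epigraph of $f$ is the set $\mathrm{epi}(f) := \{(x,y) \in S \times \mathbb{R} : f(x) \le y\}$.
2. The hypograph of $f$ is the set $\mathrm{hyp}(f) := \{(x,y) \in S \times \mathbb{R} : f(x) \ge y\}$.

[Figure: graphs of $y = f(x)$ with the regions $\mathrm{epi}(f)$ (above the graph) and $\mathrm{hyp}(f)$ (below the graph) marked.]

Theorem: Let $S \subseteq \mathbb{R}^n$ be convex and $f : S \to \mathbb{R}$. Then $f$ is a convex function $\iff$ $\mathrm{epi}(f)$ is a convex set.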
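4.2: Some Properties of Convex functions

In this section, let $S \subseteq \mathbb{R}^n$ be a convex set unless stated otherwise.

Theorem 1: If $f : S \to \mathbb{R}$ is a convex function, then the level set $S_\alpha = \{x \in S \mid f(x) \le \alpha\}$, where $\alpha \in \mathbb{R}$, is a convex set.

Proof: Take $x_1, x_2 \in S_\alpha$, $\lambda \in [0,1]$ and $x = \lambda x_1 + (1-\lambda)x_2$. By convexity,
$f(\lambda x_1 + (1-\lambda)x_2) \le \lambda f(x_1) + (1-\lambda) f(x_2) \le \lambda\alpha + (1-\lambda)\alpha = \alpha$
$\Rightarrow x = \lambda x_1 + (1-\lambda)x_2 \in S_\alpha$.

Theorem 2: Let $g_i : S \to \mathbb{R}$, $i = 1, 2, \dots, m$, be convex functions. Then $S_0 = \{x \in S \mid g_i(x) \le 0,\ i = 1, 2, \dots, m\}$ is a convex set.

Proof: Let $S_{i0} = \{x \in S \mid g_i(x) \le 0\}$, $i = 1, 2, \dots, m$. Each $S_{i0}$ is convex (Thm 1) $\Rightarrow$ $S_0 = \bigcap_i S_{i0}$ is convex.

Theorem 3: If $f_1, f_2 : S \to \mathbb{R}$ are convex functions, then $f(x) = \alpha_1 f_1(x) + \alpha_2 f_2(x)$, where $\alpha_1, \alpha_2 \ge 0$, is a convex function.

Proof: (Direct computations.)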
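Theorem 4: Let $S$ be open and $f : S \to \mathbb{R}$ be differentiable. Then $f$ is convex iff
$f(x) \ge f(x_0) + \nabla f(x_0)^T (x - x_0)$ for every $x_0, x \in S$
(called the subgradient inequality).

Proof:
($\Rightarrow$) For $\lambda \in (0,1]$, $f(x_0 + \lambda(x - x_0)) \le f(x_0) + \lambda[f(x) - f(x_0)]$. Dividing by $\lambda$ and letting $\lambda \to 0^+$ gives $Df(x_0; x - x_0) \le f(x) - f(x_0)$, where $Df(x_0; x - x_0) = \nabla f(x_0)^T (x - x_0)$ is the directional derivative.
($\Leftarrow$) Let $x_0 = \lambda x_1 + (1-\lambda)x_2$. Then
$f(x_1) \ge f(x_0) + \nabla f(x_0)^T (x_1 - x_0)$,
$f(x_2) \ge f(x_0) + \nabla f(x_0)^T (x_2 - x_0)$.
Multiply the first inequality by $\lambda$ and the second by $(1-\lambda)$ and add. The gradient terms cancel because $\lambda(x_1 - x_0) + (1-\lambda)(x_2 - x_0) = 0$, so
$\lambda f(x_1) + (1-\lambda) f(x_2) \ge f(x_0)$.

Theorem 5: Let $S$ be open and $f : S \to \mathbb{R}$ be twice differentiable. Then $f$ is convex iff its Hessian $\nabla^2 f(x)$ is positive semidefinite at each $x \in S$.

Proof: Follows from the 2nd-order Taylor theorem, continuity of $\nabla^2 f(x)$, and Theorem 4 above.

Example:
1. Show that $f(x,y,z) = x^4 + y^2 + e^z - 5y$ is convex on $\mathbb{R}^3$.
2. Find a domain (set) on which $f(x,y) = x^3 + y^2 + y$ is convex.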
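The Hessian test of Theorem 5 can be tried out on the two example functions. This is a rough numerical sketch, not a proof. The function names and sample ranges are assumptions, the Hessians were computed by hand, and the sign of the linear term $5y$ (lost in the source) does not affect the Hessian.

```python
import numpy as np

def hessian_f(x, y, z):
    """Hand-computed Hessian of f(x, y, z) = x**4 + y**2 + exp(z) - 5*y."""
    return np.diag([12.0 * x**2, 2.0, np.exp(z)])

def hessian_g(x, y):
    """Hand-computed Hessian of g(x, y) = x**3 + y**2 + y."""
    return np.diag([6.0 * x, 2.0])

def is_psd(H, tol=1e-12):
    """True if the symmetric matrix H has no eigenvalue below -tol."""
    return bool(np.all(np.linalg.eigvalsh(H) >= -tol))

rng = np.random.default_rng(1)
samples = rng.uniform(-5.0, 5.0, size=(500, 3))

# f: the Hessian should be PSD at every sampled point of R^3.
f_ok = all(is_psd(hessian_f(*p)) for p in samples)

# g: the Hessian is PSD where x >= 0 and indefinite where x < 0.
g_right = is_psd(hessian_g(1.0, 0.0))
g_left = is_psd(hessian_g(-1.0, 0.0))
print(f_ok, g_right, g_left)
```

The second check suggests looking for a domain of convexity of $x^3 + y^2 + y$ inside the half-plane $x \ge 0$.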
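4.3: Minimizing Convex functions

Theorem 6: Let $S$ be a convex set and $f : S \to \mathbb{R}$ a convex function. Then $x_0$ is a local minimum of $f$ over $S$ iff $x_0$ is a global minimum of $f$ over $S$.

Proof: A global minimum is clearly a local minimum. Conversely, suppose $x_0$ is a local minimum but $f(\hat{x}) < f(x_0)$ for some $\hat{x} \in S$. Then for every $\lambda \in (0,1]$,
$f(x_0 + \lambda(\hat{x} - x_0)) \le (1-\lambda) f(x_0) + \lambda f(\hat{x}) < f(x_0)$,
which contradicts local minimality when $\lambda$ is small.

The problem of minimizing a convex function over a convex set is called convex programming. That is, if $S$ is a convex set and $f$ is a convex function (on $S$), then
min $f(x)$ s.t. $x \in S$
is a convex programming problem.

In particular, if $f, g_i : \mathbb{R}^n \to \mathbb{R}$, $i = 1, 2, \dots, m$, are all convex functions and $h_j : \mathbb{R}^n \to \mathbb{R}$, $j = 1, 2, \dots, k$, are all affine functions, then
min $f(x)$
s.t. $g_i(x) \le 0$, $i = 1, 2, \dots, m$
$h_j(x) = 0$, $j = 1, 2, \dots, k$
$x \ge 0$
is a convex programming problem. (Follows from Theorem 2.)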
Theorem 7: Let $f : \mathbb{R}^n \to \mathbb{R}$ be a differentiable convex function. Then $x_0$ is a minimum of $f$ (over $\mathbb{R}^n$) iff $\nabla f(x_0) = 0$.

Theorem 8 (Convex Programming Optimality Conditions): Let $S \subseteq \mathbb{R}^n$ be convex and $f : S \to \mathbb{R}$ be a differentiable convex function. Then,
1. $x_0 \in S$ minimizes $f$ on $S$ iff $\langle \nabla f(x_0), x - x_0 \rangle \ge 0$, $\forall x \in S$.
2. $x_0 \in \mathrm{int}(S)$ minimizes $f$ on $S$ iff $\nabla f(x_0) = 0$.
(In particular, if $\nabla f(x_0) = 0$ at $x_0 \in \mathbb{R}^n$, then $x_0$ is a minimizer of $f$ on $\mathbb{R}^n$.)
Quasi-Convex functions

Definition: Let $S$ be a convex set and $f : S \to \mathbb{R}$. $f$ is said to be quasi-convex if the level set $S_\alpha(f)$ is convex for every $\alpha \in \mathbb{R}$.

Examples:
1. $f : \mathbb{R} \to \mathbb{R}$, $f(x) = x^3$ is quasi-convex.
2. Every convex function is quasi-convex.
3. If $f : \mathbb{R} \to \mathbb{R}$ is monotonic (increasing or decreasing), then $f$ is quasi-convex.

A quasi-convex function can also be characterized as follows:

Theorem 8: Let $S$ be convex and $f : S \to \mathbb{R}$. Then $f$ is quasi-convex iff for each $x_1, x_2 \in S$ and $\lambda \in [0,1]$,
$f(\lambda x_1 + (1-\lambda)x_2) \le \max\{f(x_1), f(x_2)\}$.
Note: Some important properties of convex functions also hold for quasi-convex functions, for instance the following two theorems.

Theorem 9: Let $S$ be convex and $f : S \to \mathbb{R}$ be a quasi-convex function. Suppose $M = \{x_0 \in S \mid f(x_0) \le f(x),\ \forall x \in S\}$ (the set of minimal points). Then $M$ is convex.

Proof: Let $\alpha = f(x_0)$, where $x_0$ is a minimizer of $f$ on $S$. Notice that $M = S_\alpha(f)$, and hence $M$ is convex since $f$ is quasi-convex.

Definition: Let $S$ be convex and $f : S \to \mathbb{R}$. $f$ is said to be strictly quasi-convex if for every $x_1, x_2 \in S$ with $f(x_1) \ne f(x_2)$ and every $\lambda \in (0,1)$, we have
$f(\lambda x_1 + (1-\lambda)x_2) < \max\{f(x_1), f(x_2)\}$.

Theorem 10: Let $S$ be convex and $f : S \to \mathbb{R}$ be strictly quasi-convex. If $x_0 \in S$ is a local minimizer of $f$ over $S$, then it is a global minimizer of $f$ over $S$.

Proof: Similar to the proof of Theorem 6.
Approximations and Separation Theorems

In the sequel, $V$ is a normed real vector space.

Given a nonempty $S \subseteq V$ and $y \in V \setminus S$, the theory of best approximation deals with the problem of finding an $\bar{x} \in S$ which is closest to $y$.

Definition: Let $S \subseteq V$ and $y \in V \setminus S$. $\bar{x}$ is called the best approximation of $y$ in $S$ if $\bar{x} \in S$ and
$\|y - \bar{x}\| \le \|y - x\|$, $\forall x \in S$;
i.e., the best approximation of $y$ in $S$ is a solution of the minimization problem
min $\{\|y - x\| : x \in S\}$.
Examples of best approximation problems:

1. Let $V = \{f : [-1,1] \to \mathbb{R} \mid |f(x)| < \infty\}$, $S = C^1[-1,1]$.
a. Let $y_1(x) = \dots$ What is the best approximation of $y_1(x)$ in $S$?
b. Let $y_2(x) = |x|$. What is the best approximation of $y_2(x)$ in $S$?

2. Let $V = \mathbb{R}^2$, $S = \{X \in \mathbb{R}^2 \mid \|X\| \le 1\}$ and $Y = (2,3)$. What is the best approximation of the point $Y$ in $S$?

Definition: Let $S \subseteq V$. $S$ is said to be proximinal if for every $y \in V$ there is a best approximation of $y$ in $S$. That is, a set $S \subseteq V$ is proximinal if the problem
min $\{\|y - x\| : x \in S\}$
has a solution for any $y \in V$.
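For Example 2, a small numerical sketch can suggest the answer. It assumes the Euclidean norm on $\mathbb{R}^2$, which the slide does not state. The function name `project_unit_ball` and the random comparison points are illustrative choices only.

```python
import numpy as np

def project_unit_ball(y):
    """Euclidean projection of y onto the closed unit ball {x : ||x|| <= 1}."""
    norm = np.linalg.norm(y)
    return y if norm <= 1.0 else y / norm

Y = np.array([2.0, 3.0])
X_bar = project_unit_ball(Y)          # candidate best approximation of Y in S
best_dist = np.linalg.norm(Y - X_bar)

# Compare against random points of S: none should be closer to Y than X_bar.
rng = np.random.default_rng(2)
pts = rng.uniform(-1.0, 1.0, size=(5000, 2))
pts = pts[np.linalg.norm(pts, axis=1) <= 1.0]
no_closer_point = bool(np.min(np.linalg.norm(Y - pts, axis=1)) >= best_dist - 1e-12)

print(X_bar, best_dist, no_closer_point)
```

Under this assumption the candidate is $Y / \|Y\|$, at distance $\|Y\| - 1 = \sqrt{13} - 1$ from $Y$.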
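The following theorem gives a sufficient condition for the existence of a solution to the approximation problem in $\mathbb{R}^n$.

Theorem 11: Let $S \subseteq \mathbb{R}^n$ be nonempty and closed. Then,
1. $S$ is proximinal.
2. If, additionally, $S$ is convex, then for any $y \in \mathbb{R}^n$ its best approximation in $S$ is unique.

Proof:
(1) Given any $y \in \mathbb{R}^n$, define $d : S \to \mathbb{R}$ by $d(x) = \|x - y\|$. Pick an $x_0 \in S$, and let $\alpha = d(x_0) = \|x_0 - y\|$. The set $S_\alpha = \{x \in S : d(x) \le \alpha\}$ is closed and bounded, hence compact. Since $d$ is continuous, it has a minimizer on $S_\alpha$. Every point of $S$ outside $S_\alpha$ has $d(x) > \alpha$, so this is also a minimizer of $d$ on $S$.
(2) The uniqueness follows from the fact that $d^2$ (the squared Euclidean distance to $y$) is strictly convex and has the same minimizers as $d$.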
Definition: Let $V$ be an inner product space and $S \subseteq V$. A nonzero $u \in V$ is said to be normal to the set $S$ at $\bar{x} \in S$ if
$\langle u, x - \bar{x} \rangle \le 0$, $\forall x \in S$.

Examples: Let $S = \{(x,y)^T \in \mathbb{R}^2 : 0 \le x, y \le 2\}$.
1) $u = (-1,0)^T$ is normal to $S$ at $\bar{x} = (0,1)^T$.
2) $u = (0,1)^T$ is normal to $S$ at $\bar{x} = (1,2)^T$.
3) $u = (a,b)^T$, where $a, b \le 0$ (but not both 0), is normal to $S$ at $\bar{x} = (0,0)^T$.

Theorem 12: Let $S$ be a nonempty closed convex subset of $V$ and $y \in V \setminus S$. Then $\bar{x} \in S$ is the best approximation of $y$ in $S$ iff $y - \bar{x}$ is normal to $S$ at $\bar{x}$.
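The three examples of normal vectors can be checked on a grid of points of the square. This is an illustrative sketch with assumed names (`is_normal`, `S_pts`) and an assumed grid resolution. It tests the defining inequality $\langle u, x - \bar{x} \rangle \le 0$ at sampled points only.

```python
import numpy as np

def is_normal(u, x_bar, points, tol=1e-12):
    """Check <u, x - x_bar> <= 0 for every sampled point x of S."""
    return bool(np.all((points - x_bar) @ u <= tol))

# Grid sample of the square S = [0, 2] x [0, 2], corners included.
g = np.linspace(0.0, 2.0, 21)
S_pts = np.array([(a, b) for a in g for b in g])

ex1 = is_normal(np.array([-1.0, 0.0]), np.array([0.0, 1.0]), S_pts)
ex2 = is_normal(np.array([0.0, 1.0]), np.array([1.0, 2.0]), S_pts)
ex3 = is_normal(np.array([-2.0, -0.5]), np.array([0.0, 0.0]), S_pts)  # one choice of a, b <= 0
counter = is_normal(np.array([1.0, 0.0]), np.array([0.0, 1.0]), S_pts)  # points into S
print(ex1, ex2, ex3, counter)
```

Because $\langle u, x - \bar{x} \rangle$ is linear in $x$ and $S$ is a polygon, checking the four corners would already be enough; the grid is used only to keep the sketch generic.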
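Proof of Theorem 12: (By Thm 11 the best approximation exists.)

($\Leftarrow$) Let $y - \bar{x}$ be normal to $S$ at $\bar{x}$, and take an arbitrary $x \in S$. Then $\langle y - \bar{x}, x - \bar{x} \rangle \le 0$. Now,
$\|y - x\|^2 = \|y - \bar{x} - (x - \bar{x})\|^2 = \|y - \bar{x}\|^2 + \|x - \bar{x}\|^2 - 2\langle y - \bar{x}, x - \bar{x} \rangle \ge \|y - \bar{x}\|^2$.

($\Rightarrow$) Let $\|y - \bar{x}\| \le \|y - x\|$, $\forall x \in S$. Take an arbitrary $x \in S$, $\lambda \in (0,1]$, and let $x_\lambda = \bar{x} + \lambda(x - \bar{x})$. Since $S$ is convex, $x_\lambda \in S$, so
$\|y - \bar{x}\|^2 \le \|y - x_\lambda\|^2 = \|y - \bar{x} - \lambda(x - \bar{x})\|^2 = \|y - \bar{x}\|^2 - 2\lambda\langle y - \bar{x}, x - \bar{x} \rangle + \lambda^2\|x - \bar{x}\|^2$
$\Rightarrow \langle y - \bar{x}, x - \bar{x} \rangle \le (\lambda/2)\|x - \bar{x}\|^2$.
Thus, taking $\lambda \to 0^+$, we get $\langle y - \bar{x}, x - \bar{x} \rangle \le 0$.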
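Definition: Let $S \subseteq V$, and let $H = \{x \in V \mid \langle u, x \rangle = \alpha\}$ be a hyperplane for some nonzero $u \in V^*$ and $\alpha \in \mathbb{R}$. $H$ is said to support $S$ at $x_0$ iff $x_0 \in S \cap H$ and either $S \subseteq H^-$ or $S \subseteq H^+$, where $H^- = \{x \in V \mid \langle u, x \rangle \le \alpha\}$ and $H^+ = \{x \in V \mid \langle u, x \rangle \ge \alpha\}$ are the closed half-spaces determined by $H$.

Examples:
[Figure: four planar sets $S$ illustrating (a) one unique supporting line at $x_0$, (b) a supporting line that touches $S$ at more than one point ($x_1$, $x_2$), (c) several supporting lines at $x_0$, (d) no supporting line at $x_0$.]

Exercise: Let $S$ be a nonempty closed convex subset of $\mathbb{R}^n$ and $y \in \mathbb{R}^n \setminus S$. Show that $x_0 \in S$ is the best approximation of $y$ in $S$ if and only if $H = \{x \in \mathbb{R}^n \mid \langle u, x - x_0 \rangle = 0\}$ supports $S$ at $x_0$, where $u = y - x_0$ and $S \subseteq H^-$.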
8/7/2019 Berhanu G (Dr)

Definition: Let $S_1$ and $S_2$ be nonempty subsets of $V$. $S_1$ and $S_2$ are said to be
1) separable if there is a nonzero $u \in V^*$ and $\alpha \in \mathbb{R}$ such that $\langle u, x \rangle \le \alpha \le \langle u, y \rangle$, $\forall x \in S_1$, $y \in S_2$.
2) strongly separable if there is a nonzero $u \in V^*$ such that $\sup\{\langle u, x \rangle \mid x \in S_1\} < \inf\{\langle u, x \rangle \mid x \in S_2\}$.

That is, considering the hyperplane $H = \{x \in V \mid \langle u, x \rangle = \alpha\}$, $S_1$ and $S_2$ are
1) separable if $S_1 \subseteq H^-$ and $S_2 \subseteq H^+$. (In this case, we say $H$ separates $S_1$ and $S_2$.)
2) strongly separable if $S_1 \subseteq H^-$, $S_2 \subseteq H^+$ and $H$ supports neither $S_1$ nor $S_2$. (In this case, we say $H$ strongly separates $S_1$ and $S_2$.)

[Figure: three pairs of sets $S_1$, $S_2$ with a hyperplane $H$ and normal $u$: separable, strongly separable, and not separable.]
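Theorem 13: Let $S \subseteq \mathbb{R}^n$ be convex and $0 \notin \mathrm{cl}(S)$. Then,
1) If $a \in \mathrm{cl}(S)$ is the element of minimal norm, then $\langle a, x \rangle \ge \|a\|^2 > 0$, $\forall x \in \mathrm{cl}(S)$.
2) $\{0\}$ and $S$ are strongly separable.

Proof: (1) The element $a$ is the best approximation of $0$ in $\mathrm{cl}(S)$, so by Theorem 12 the vector $0 - a$ is normal to $\mathrm{cl}(S)$ at $a$: $\langle -a, x - a \rangle \le 0$, i.e. $\langle a, x \rangle \ge \|a\|^2$, for all $x \in \mathrm{cl}(S)$. Moreover $\|a\| > 0$ because $0 \notin \mathrm{cl}(S)$.
(2) follows directly from (1). Notice that if we take $\alpha = \tfrac{1}{2}\|a\|^2$, then $H = \{x \in \mathbb{R}^n \mid \langle a, x \rangle = \alpha\}$ strongly separates $\{0\}$ and $S$. ($\{0\} \subseteq H^-$, $S \subseteq H^+$ and $H$ supports neither of them.)

Theorem 14: Let $S \subseteq \mathbb{R}^n$ be a closed convex set and $y \notin S$. Then $\{y\}$ and $S$ are strongly separable.

Theorem 15: Let $S \subseteq \mathbb{R}^n$ be a closed convex set and $x_0 \in \mathrm{bd}(S)$. Then there is a hyperplane that supports $S$ at $x_0$.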
Theorem 16: Let $S \subseteq \mathbb{R}^n$ be a convex set and $x_0 \in \partial(S)$ (the boundary of $S$). Then there is a nonzero $u \in \mathbb{R}^n$ such that $\langle u, x - x_0 \rangle \le 0$, $\forall x \in S$.

Theorem 17 (Separation Theorem for two sets): Let $S_1$ and $S_2$ be two disjoint convex subsets of $\mathbb{R}^n$. Then there is a hyperplane that separates $S_1$ and $S_2$.

Proof: Follows directly from Theorem …

Corollary 18: Let $S_1$ and $S_2$ be two disjoint convex subsets of $\mathbb{R}^n$. Then there is a nonzero $u \in \mathbb{R}^n$ such that
$\sup\{\langle u, x \rangle \mid x \in S_1\} \le \inf\{\langle u, x \rangle \mid x \in S_2\}$.
Subdifferentials

Defn: Let $S \subseteq V$ be convex, $x_0 \in S$, and $f : S \to \mathbb{R}$ a function. A vector $\xi \in V$ (or $V^*$) is called a subgradient of $f$ at $x_0$ if
$f(x) \ge f(x_0) + \langle \xi, x - x_0 \rangle$, for all $x \in S$.

The set of all subgradients of $f$ at $x_0$, denoted by $\partial f(x_0)$, is called the subdifferential of $f$ at $x_0$; i.e.,
$\partial f(x_0) = \{\xi \in V : f(x) \ge f(x_0) + \langle \xi, x - x_0 \rangle \text{ for all } x \in S\}$.
If $\partial f(x_0) \ne \emptyset$, then $f$ is said to be subdifferentiable at $x_0$.

Example: Let $f(x) = \|x\|$ on $\mathbb{R}^n$. Then $\partial f(0) = \{\xi \in \mathbb{R}^n : \|\xi\| \le 1\}$.

Theorem 19: $\partial f(x_0)$ is a convex set.

Note: A subgradient at a point may not be unique. We will show that the subgradient is unique at a point where $f$ is differentiable.
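The example $\partial f(0)$ for $f(x) = \|x\|$ can be probed numerically. This is a sketch under the assumption that the norm is Euclidean, and the helper name and test vectors are illustrative. It tests the subgradient inequality $\|x\| \ge \langle \xi, x \rangle$ (recall $f(0) = 0$) on sample points.

```python
import numpy as np

def is_subgradient_at_zero(xi, test_points, tol=1e-12):
    """Check ||x|| >= <xi, x> on the sampled points x (subgradient inequality at 0)."""
    lhs = np.linalg.norm(test_points, axis=1)
    rhs = test_points @ xi
    return bool(np.all(lhs >= rhs - tol))

rng = np.random.default_rng(3)
points = rng.standard_normal((2000, 3))

xi_inside = np.array([0.6, 0.0, 0.8])    # Euclidean norm 1, so in the unit ball
xi_outside = np.array([1.0, 1.0, 0.0])   # Euclidean norm sqrt(2) > 1

# The point x = xi_outside itself is a witness: ||x|| = sqrt(2) < 2 = <xi_outside, x>.
points_with_witness = np.vstack([points, xi_outside])

inside_ok = is_subgradient_at_zero(xi_inside, points)
outside_ok = is_subgradient_at_zero(xi_outside, points_with_witness)
print(inside_ok, outside_ok)
```

As with any sampling check, a passing result is only consistent with the claim; the inequality for $\|\xi\| \le 1$ is the Cauchy-Schwarz inequality.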
Theorem 20: Let $S \subseteq V$ be a convex set and $f : S \to \mathbb{R}$. If $f$ is subdifferentiable on $\mathrm{int}(S)$, then $f$ is convex.

Theorem 21: Let $S \subseteq V$ be convex and $f : S \to \mathbb{R}$ be a convex functional. Then $\partial f(x_0) \ne \emptyset$ at every $x_0 \in \mathrm{int}(S)$.

Corollary 22: Let $S \subseteq V$ be convex and open, and let $f : S \to \mathbb{R}$ be a functional. Then $f$ is convex iff $\partial f(x_0) \ne \emptyset$ at each $x_0 \in S$.

Theorem 23: Let $S \subseteq V$ be convex and $f : S \to \mathbb{R}$ be a convex functional. Then $x_0 \in S$ minimizes $f$ on $S$ iff $\exists\, \xi \in \partial f(x_0)$ s.t. $\langle \xi, x - x_0 \rangle \ge 0$, $\forall x \in S$.

Corollary 24: Let $S \subseteq V$ be convex and $f : S \to \mathbb{R}$ be a convex functional. Then $x_0 \in \mathrm{int}(S)$ minimizes $f$ on $S$ iff $0 \in \partial f(x_0)$.
Subgradient Optimization Method

Let $S$ be a convex set. Consider a convex programming problem:
(P) min $\{f(x) : x \in S\}$
where $f : S \to \mathbb{R}$ is convex, but not necessarily differentiable.

The subgradient method to solve (P):
Step 1: Start with an initial point $x_0 \in S$.
Step 2: At the current iterate $x_k$, find a subgradient $\xi_k$ of $f$ at $x_k$.
Step 3: If $\xi_k = 0$, STOP ($x_k$ is an optimal solution). Otherwise, set $d_k = -\xi_k / \|\xi_k\|$, and let $y = x_k + \lambda_k d_k$, where $\lambda_k > 0$ is a suitable step length. The next iterate is $x_{k+1} = y$ if $y \in S$; else $x_{k+1} = P_S(y)$, where $P_S(y) \in S$ is the best approximation of $y$ in $S$.
Step 4: Repeat Steps 2 and 3 until a stopping condition holds.
Note:
1. For the subgradient method to be practical, there should be a tractable way to identify a subgradient of $f$ at every iterate, and to perform the projection operation $P_S(y)$. This depends on the specific problem.
2. The step direction at each iterate $x_k$ is $d_k = -\xi_k / \|\xi_k\|$, where $\xi_k \in \partial f(x_k)$. This direction need not be a descent direction. However, if $x^*$ is an optimal solution of (P) and $\lambda_k > 0$ is small, we get
$\|x_{k+1} - x^*\| < \|x_k - x^*\|$, for each $k$;
i.e., for each $k$, $x_{k+1}$ is closer to $x^*$ than $x_k$ is. This is so because for every non-optimal $x_k$ we have $\langle -\xi_k, x^* - x_k \rangle > 0$, since $f(x^*) \ge f(x_k) + \langle \xi_k, x^* - x_k \rangle$ and $f(x^*) < f(x_k)$.
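The subgradient method of Steps 1 to 4 can be sketched in a few lines. Everything specific here is an assumption made for illustration and does not come from the slides: the test problem (minimize $\|x - c\|_1$ over the Euclidean unit ball), the diminishing step rule $\lambda_k = 1/(k+1)$, the fixed iteration count used as the stopping condition, and all function names. Because the direction need not be a descent direction, the sketch keeps track of the best iterate seen so far.

```python
import numpy as np

def project_unit_ball(y):
    """P_S(y) for S = Euclidean unit ball: best approximation of y in S."""
    norm = np.linalg.norm(y)
    return y if norm <= 1.0 else y / norm

def projected_subgradient(f, subgrad, project, x0, n_iter=2000):
    """Sketch of Steps 1-4 with the assumed step rule lambda_k = 1 / (k + 1)."""
    x = project(np.asarray(x0, dtype=float))      # Step 1: initial point in S
    best_x, best_val = x.copy(), f(x)
    for k in range(n_iter):                       # Step 4: fixed iteration budget
        xi = subgrad(x)                           # Step 2: a subgradient at x_k
        xi_norm = np.linalg.norm(xi)
        if xi_norm == 0.0:                        # Step 3: zero subgradient, x_k is optimal
            break
        d = -xi / xi_norm                         # normalized step direction d_k
        lam = 1.0 / (k + 1)                       # assumed diminishing step length
        x = project(x + lam * d)                  # x_{k+1} = P_S(x_k + lambda_k d_k)
        val = f(x)
        if val < best_val:                        # f need not decrease at every step
            best_x, best_val = x.copy(), val
    return best_x, best_val

# Hypothetical test problem: f(x) = ||x - c||_1 with c inside the unit ball,
# so the minimizer over S is x* = c with f(x*) = 0.
c = np.array([0.3, -0.4])
f = lambda x: float(np.sum(np.abs(x - c)))
subgrad = lambda x: np.sign(x - c)                # a valid subgradient of the 1-norm term

x_best, f_best = projected_subgradient(f, subgrad, project_unit_ball, x0=np.zeros(2))
print(x_best, f_best)
```

For this test problem both requirements of Note 1 are met: a subgradient is available in closed form through `np.sign`, and the projection onto the ball is a simple rescaling. Other step rules and stopping conditions are possible; the ones used here are only one common choice.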
