Seminar in Advanced Machine Learning Rong Jin
Course Description: an introduction to state-of-the-art techniques in machine learning. Focus of this semester: convex optimization and semi-supervised learning.
Course Description
Course Organization: Each group has 1 to 3 students and covers one or two topics; usually each topic takes two lectures. Please send me the information about your group and the topics you would like to cover by the end of this week. You may take 1 to 2 credits by enrolling in independent study (CSE890).
Course Organization: Course website: http://www.cse.msu.edu/~rongjin/adv_ml. The best way to learn is discussion, discussion, and discussion. Never hesitate to raise questions; never ignore any details. Let's have fun!
Convex Programming and Classification Problems Rong Jin
Outline: The connection between classification and linear programming (LP) and convex quadratic programming (QP) has a long history. Recent progress in convex optimization includes conic and semidefinite programming and robust optimization. The purpose of this lecture is to outline some connections between convex optimization and classification problems.
Support Vector Machine (SVM). Training examples: $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, +1\}$.
$\min_{w, b} \|w\|_2^2 \quad \text{s.t.} \quad y_i (w^\top x_i - b) \ge 1, \ i = 1, 2, \ldots, n$
This can be solved efficiently by quadratic programming.
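Below is a minimal sketch (not part of the lecture) of this hard-margin SVM solved as a QP with cvxpy on a small synthetic, linearly separable dataset; the data and variable names are illustrative assumptions.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, d = 40, 2
X_pos = rng.normal(loc=+3.0, size=(n // 2, d))   # positive class, y = +1
X_neg = rng.normal(loc=-3.0, size=(n // 2, d))   # negative class, y = -1
X = np.vstack([X_pos, X_neg])
y = np.hstack([np.ones(n // 2), -np.ones(n // 2)])

w = cp.Variable(d)
b = cp.Variable()
# min ||w||_2^2   s.t.   y_i (w^T x_i - b) >= 1
constraints = [cp.multiply(y, X @ w - b) >= 1]
prob = cp.Problem(cp.Minimize(cp.sum_squares(w)), constraints)
prob.solve()

print("w =", w.value, " b =", b.value)
print("geometric margin 1/||w||_2 =", 1.0 / np.linalg.norm(w.value))
```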
SVM: Robust Optimization. SVMs are a way to handle noise in the data points: assume each data point is unknown-but-bounded in a sphere of radius $\rho$ centered at $x_i$, and find the largest $\rho$ such that separation is still possible between the two classes of perturbed points.
SVM: Robust Optimization. How to solve it?
$\max \rho \quad \text{s.t.} \quad \forall i = 1, 2, \ldots, n: \ \|x - x_i\|_2 \le \rho \ \Rightarrow \ y_i (w^\top x - b) \ge 1$
SVM: Robust Optimization. The implication $\|x - x_i\|_2 \le \rho \Rightarrow y_i (w^\top x - b) \ge 1$ is equivalent to the single constraint $y_i (w^\top x_i - b) - \rho \|w\|_2 \ge 1$, so the problem becomes
$\max \rho \quad \text{s.t.} \quad \forall i = 1, 2, \ldots, n: \ y_i (w^\top x_i - b) - \rho \|w\|_2 \ge 1$
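As a sanity check (not from the slides), the equivalence can be verified numerically: the minimum of $y_i (w^\top (x_i + \delta) - b)$ over $\|\delta\|_2 \le \rho$ is attained at $\delta^* = -\rho\, y_i\, w / \|w\|_2$ and equals $y_i (w^\top x_i - b) - \rho \|w\|_2$. The numbers below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
d, rho = 5, 0.3
w = rng.normal(size=d)
x_i = rng.normal(size=d)
b, y_i = 0.7, -1.0

# Closed-form worst case: delta* = -rho * y_i * w / ||w||_2 lowers the margin
# term by exactly rho * ||w||_2.
closed_form = y_i * (w @ x_i - b) - rho * np.linalg.norm(w)
delta_star = -rho * y_i * w / np.linalg.norm(w)
value_at_star = y_i * (w @ (x_i + delta_star) - b)

# No random feasible perturbation does worse than the closed-form value.
dirs = rng.normal(size=(100_000, d))
dirs = rho * dirs / np.linalg.norm(dirs, axis=1, keepdims=True)
assert np.all(y_i * ((x_i + dirs) @ w - b) >= closed_form - 1e-9)
print(value_at_star, closed_form)   # identical up to rounding
```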
Robust Optimization. Linear programming (LP):
$\min c^\top x \quad \text{s.t.} \quad a_i^\top x \le b_i, \ i = 1, 2, \ldots, n$
Assume the $a_i$'s are unknown-but-bounded in ellipsoids $E_i = \{a_i \mid (a_i - \hat{a}_i)^\top \Sigma_i^{-1} (a_i - \hat{a}_i) \le 1\}$. Robust LP:
$\min c^\top x \quad \text{s.t.} \quad \forall a_i \in E_i: \ a_i^\top x \le b_i, \ i = 1, 2, \ldots, n$
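Since $\sup_{a_i \in E_i} a_i^\top x = \hat{a}_i^\top x + \|\Sigma_i^{1/2} x\|_2$, each robust constraint becomes a second-order cone constraint. The sketch below (illustrative data and cvxpy as the solver, both assumptions, not from the slides) solves the robust LP in that form.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(2)
n, d = 4, 3
c = rng.normal(size=d)
a_hat = rng.normal(size=(n, d))
b = rng.uniform(1.0, 2.0, size=n)

# Symmetric PSD matrices used directly as Sigma_i^{1/2}, so Sigma_i = S_i @ S_i.
sqrt_sigmas = []
for _ in range(n):
    A = rng.normal(size=(d, d))
    sqrt_sigmas.append(0.2 * (A @ A.T) + 0.1 * np.eye(d))

x = cp.Variable(d)
# Worst case of a_i^T x over the ellipsoid E_i is  a_hat_i^T x + ||Sigma_i^{1/2} x||_2.
constraints = [a_hat[i] @ x + cp.norm(sqrt_sigmas[i] @ x, 2) <= b[i] for i in range(n)]
constraints += [cp.norm(x, 2) <= 10]   # keep the toy problem bounded
prob = cp.Problem(cp.Minimize(c @ x), constraints)
prob.solve()
print("robust optimal x:", x.value)
```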
Minimax Probability Machine (MPM). How do we decide the decision boundary $x^\top a = b$? Positive class: $x_+ \sim \mathcal{N}(\bar{x}_+, \Sigma_+)$, classified by $x^\top a \le b$. Negative class: $x_- \sim \mathcal{N}(\bar{x}_-, \Sigma_-)$, classified by $x^\top a \ge b$.
Minimax Probability Machine (MPM).
$\min \max(\epsilon_+, \epsilon_-) \quad \text{s.t.} \quad \Pr(x_+^\top a \le b) = 1 - \epsilon_+, \ \ \Pr(x_-^\top a \ge b) = 1 - \epsilon_-$
where $x_- \sim \mathcal{N}(\bar{x}_-, \Sigma_-)$ and $x_+ \sim \mathcal{N}(\bar{x}_+, \Sigma_+)$.
Minimax Probability Machine (MPM).
$\min \epsilon \quad \text{s.t.} \quad \Pr(x_+^\top a \le b) \ge 1 - \epsilon, \ \ \Pr(x_-^\top a \ge b) \ge 1 - \epsilon$
where $x_- \sim \mathcal{N}(\bar{x}_-, \Sigma_-)$ and $x_+ \sim \mathcal{N}(\bar{x}_+, \Sigma_+)$.
Minimax Probability Machine (MPM). Assume $x$ follows the Gaussian distribution $\mathcal{N}(\bar{x}, \Sigma)$. Then
$\Pr(x^\top a \le b) \ge 1 - \epsilon \ \iff \ \bar{x}^\top a + \kappa \|\Sigma^{1/2} a\|_2 \le b, \quad \text{where } \kappa = \Phi^{-1}(1 - \epsilon)$
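A quick Monte Carlo check of this reduction (illustrative numbers, not from the slides): choose $b$ on the boundary $\bar{x}^\top a + \kappa \|\Sigma^{1/2} a\|_2 = b$ and verify that the empirical probability $\Pr(x^\top a \le b)$ is close to $1 - \epsilon$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
d, eps = 3, 0.05
xbar = rng.normal(size=d)
A = rng.normal(size=(d, d))
Sigma = A @ A.T + 0.1 * np.eye(d)
a = rng.normal(size=d)

kappa = norm.ppf(1 - eps)                 # Phi^{-1}(1 - eps)
std = np.sqrt(a @ Sigma @ a)              # = ||Sigma^{1/2} a||_2
b = xbar @ a + kappa * std                # boundary case of the constraint

samples = rng.multivariate_normal(xbar, Sigma, size=200_000)
print("empirical Pr(x^T a <= b):", np.mean(samples @ a <= b))   # ~ 1 - eps = 0.95
```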
Minimax Probability Machine (MPM).
$\max \kappa \quad \text{s.t.} \quad \bar{x}_+^\top a + \kappa \|\Sigma_+^{1/2} a\|_2 \le b, \ \ \bar{x}_-^\top a - \kappa \|\Sigma_-^{1/2} a\|_2 \ge b$
These are second-order cone constraints. The problem can be rewritten as
$\min \alpha + \beta \quad \text{s.t.} \quad a^\top (\bar{x}_- - \bar{x}_+) = 1, \ \alpha \ge \|\Sigma_+^{1/2} a\|_2, \ \beta \ge \|\Sigma_-^{1/2} a\|_2$
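A minimal sketch of the second formulation (synthetic two-Gaussian data and cvxpy, both assumptions, not from the lecture): eliminating the epigraph variables $\alpha, \beta$ gives the SOCP $\min \|\Sigma_+^{1/2} a\|_2 + \|\Sigma_-^{1/2} a\|_2$ s.t. $a^\top (\bar{x}_- - \bar{x}_+) = 1$, and the optimal $\kappa$ is the reciprocal of the optimal objective value.

```python
import numpy as np
import cvxpy as cp
from scipy.linalg import sqrtm

rng = np.random.default_rng(4)
X_pos = rng.normal(loc=+1.5, size=(200, 2))
X_neg = rng.normal(loc=-1.5, size=(200, 2))
xbar_p, xbar_n = X_pos.mean(axis=0), X_neg.mean(axis=0)
S_p = np.real(sqrtm(np.cov(X_pos.T)))    # Sigma_+^{1/2}
S_n = np.real(sqrtm(np.cov(X_neg.T)))    # Sigma_-^{1/2}

a = cp.Variable(2)
# min ||Sigma_+^{1/2} a||_2 + ||Sigma_-^{1/2} a||_2   s.t.   a^T (xbar_- - xbar_+) = 1
prob = cp.Problem(cp.Minimize(cp.norm(S_p @ a, 2) + cp.norm(S_n @ a, 2)),
                  [a @ (xbar_n - xbar_p) == 1])
prob.solve()

kappa = 1.0 / prob.value                                       # optimal worst-case kappa
b = xbar_p @ a.value + kappa * np.linalg.norm(S_p @ a.value)   # boundary offset
print("a =", a.value, " b =", b, " kappa =", kappa)
```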
Second Order Cone Programming (SOCP). A second-order cone constraint in $\mathbb{R}^3$: $x_0 \ge \sqrt{x_1^2 + x_2^2}$. The constraint $\alpha \ge \|\Sigma_-^{1/2} a\|_2$ can be written as $y = \Sigma_-^{1/2} a$, $\alpha \ge \|y\|_2$. Cone: $Q = \{z \mid z_0 \ge \|\bar{z}\|_2\}$, and $z \in Q \iff z \succeq_Q 0$.
SOCP vs. LP: SOCP generalizes LP by generalizing the inequality definition, replacing the componentwise inequality over the nonnegative orthant with the generalized inequality $\succeq_Q$ over the second-order cone.
Minimax Probability Machine (MPM). Under the Gaussian assumption:
$\min \epsilon \quad \text{s.t.} \quad \Pr(x_+^\top a \le b) \ge 1 - \epsilon, \ \ \Pr(x_-^\top a \ge b) \ge 1 - \epsilon$
where $x_- \sim \mathcal{N}(\bar{x}_-, \Sigma_-)$ and $x_+ \sim \mathcal{N}(\bar{x}_+, \Sigma_+)$. Dropping the Gaussian assumption and keeping only the means and covariances:
$\min \epsilon \quad \text{s.t.} \quad \inf_{x_+ \sim (\bar{x}_+, \Sigma_+)} \Pr(x_+^\top a \le b) \ge 1 - \epsilon, \ \ \inf_{x_- \sim (\bar{x}_-, \Sigma_-)} \Pr(x_-^\top a \ge b) \ge 1 - \epsilon$
MPM. Chebyshev inequality:
$\inf_{x \sim (\bar{x}, \Sigma)} \Pr(x^\top a \le b) = \dfrac{[\,b - \bar{x}^\top a\,]_+^2}{[\,b - \bar{x}^\top a\,]_+^2 + a^\top \Sigma a}$
where $[x]_+$ outputs 0 when $x < 0$ and $x$ when $x \ge 0$. The resulting problem has the same second-order cone form:
$\min \alpha + \beta \quad \text{s.t.} \quad a^\top (\bar{x}_- - \bar{x}_+) = 1, \ \alpha \ge \|\Sigma_+^{1/2} a\|_2, \ \beta \ge \|\Sigma_-^{1/2} a\|_2$
Pattern Invariance in Images: translation, rotation, shear.
Learning from Invariance Transformations
Incorporating Invariance Transformations. Invariance transformation: $x(\theta) = T(x, \theta): \mathbb{R}^d \times \mathbb{R} \to \mathbb{R}^d$, with $T(x, \theta = 0) = x$. SVM incorporating the invariance transformation (an infinite number of examples):
$\min \|w\|_2^2 \quad \text{s.t.} \quad \forall \theta \in \mathbb{R}, \ i = 1, 2, \ldots, n: \ y_i (w^\top x_i(\theta) - b) \ge 1$
Taylor Approximation of Invariance. A Taylor expansion of $x(\theta) = T(x, \theta)$ about $\theta = 0$ gives a polynomial approximation of $x(\theta)$ in $\theta$.
Polynomial Approximation. What is the necessary and sufficient condition for a polynomial to always be non-negative? Here we need
$\forall \theta \in \mathbb{R}: \ y\, w^\top x(\theta) - 1 \ge 0$
Non-Negative Polynomials (I). Theorem (Nesterov, 2000): If $r = 2l$, the necessary and sufficient condition for a polynomial $p(\theta)$ of degree $r$ to be non-negative everywhere is
$\exists P \succeq 0 \ \text{s.t.} \ p(\theta) = v(\theta)^\top P\, v(\theta)$, where $v(\theta) = (1, \theta, \ldots, \theta^l)^\top$ is the vector of monomials.
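A minimal sketch of the theorem used as a computational test (assuming cvxpy; the polynomial is a made-up example, not from the slides): global non-negativity of an even-degree univariate polynomial reduces to feasibility of a small SDP over the matrix $P$.

```python
import numpy as np
import cvxpy as cp

# p(theta) = theta^4 - 2 theta^2 + 1 = (theta^2 - 1)^2 >= 0, so the SDP should be feasible.
p = np.array([1.0, 0.0, -2.0, 0.0, 1.0])    # coefficients p_0, ..., p_r with r = 2l
l = (len(p) - 1) // 2

# Find P >= 0 (PSD) with  sum_{i+j=k} P[i,j] = p_k,  i.e.  p(theta) = v(theta)^T P v(theta)
# for the monomial vector v(theta) = (1, theta, ..., theta^l).
P = cp.Variable((l + 1, l + 1), PSD=True)
constraints = [
    sum(P[i, k - i] for i in range(max(0, k - l), min(k, l) + 1)) == p[k]
    for k in range(2 * l + 1)
]
prob = cp.Problem(cp.Minimize(0), constraints)
prob.solve()
print("SDP status:", prob.status)   # 'optimal' means p is non-negative everywhere
```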
Semidefinite Programming Machines. Collecting the per-example non-negativity conditions into block-diagonal matrices $A_j$ (with blocks $G_{i,j}$) and $B$ turns the training problem into a semidefinite program.
Semidefinite Programming (SDP). As with SOCP, SDP generalizes LP by generalizing the inequality definition: the componentwise inequality is replaced by the generalized inequality $X \succeq 0$ over the cone of positive semidefinite matrices.
Beyond Convex Programming. In most cases, problems are non-convex. Approximation strategies: linear programming approximation; LMI relaxation (drop rank constraints); submodular function approximation; difference of two convex functions (DC).
Example: the MAXCUT Problem.
$\min x^\top Q x \quad \text{s.t.} \quad x_i \in \{-1, +1\}$
There are exponentially many feasible points, so this is an NP-hard problem. It can be rewritten as
$\min x^\top Q x \quad \text{s.t.} \quad x_i^2 = 1$
LMI Relaxation. Let $X = x x^\top$, so that $x_i^2 = 1$ becomes $X_{i,i} = 1$:
$\min x^\top Q x \ \text{s.t.} \ x_i \in \{-1, +1\} \quad \Longleftrightarrow \quad \min \sum_{i,j} Q_{i,j} X_{i,j} \ \text{s.t.} \ X_{i,i} = 1, \ X \succeq 0, \ \mathrm{rank}(X) = 1$
Dropping the rank constraint gives the convex relaxation
$\min \sum_{i,j} Q_{i,j} X_{i,j} \quad \text{s.t.} \quad X_{i,i} = 1, \ X \succeq 0$
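A minimal sketch of the relaxation (a tiny random symmetric $Q$ and cvxpy, both assumptions, not from the slides), compared with the exact combinatorial optimum by brute force on a small instance; the SDP value lower-bounds the exact minimum.

```python
import itertools
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(5)
n = 6
Q = rng.normal(size=(n, n))
Q = (Q + Q.T) / 2                        # symmetric cost matrix for a toy instance

# SDP relaxation: drop the rank-one constraint X = x x^T.
X = cp.Variable((n, n), symmetric=True)
prob = cp.Problem(cp.Minimize(cp.trace(Q @ X)),
                  [cp.diag(X) == 1, X >> 0])
prob.solve()
g_star = prob.value

# Exact minimum d* by brute force over all 2^n sign vectors (only feasible for tiny n).
d_star = min(np.array(x) @ Q @ np.array(x)
             for x in itertools.product([-1, 1], repeat=n))

print("SDP lower bound g* =", g_star, "  exact d* =", d_star)   # g* <= d*
```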
How Good is the Approximation? Nesterov recently proved that, with
$d^* = \min x^\top Q x \ \text{s.t.} \ x_i \in \{-1, +1\}$ and $g^* = \min \sum_{i,j} Q_{i,j} X_{i,j} \ \text{s.t.} \ X_{i,i} = 1, \ X \succeq 0$,
$1 \ \ge \ \dfrac{g^*}{d^*} \ \ge \ \dfrac{2}{\pi} \approx 0.6366$
What should you learn? Basic concepts of convex sets and functions; basic theory of convex optimization; how to formulate a problem as a standard convex optimization problem; how to efficiently approximate the solution for large datasets; and (optional) how to approximate non-convex programming problems by convex ones.