Seminar in Advanced Machine Learning Rong Jin
Course Description: an introduction to state-of-the-art techniques in machine learning. Focus of this semester: convex optimization and semi-supervised learning.
Course Description
Course Organization: Each group has 1 to 3 students and covers one or two topics; usually each topic takes two lectures. Please send me the information about your group and the topics you would like to cover by the end of this week. You may take 1 to 2 credits by enrolling in independent study (CSE890).
Course Organization: Course website: http://www.cse.msu.edu/~rongjin/adv_ml. The best way to learn is discussion, discussion, and discussion. Never hesitate to raise questions; never ignore any details. Let's have fun!
Convex Programming and Classification Problems Rong Jin
Outline: The connection between classification and linear programming (LP) and convex quadratic programming (QP) has a long history. Recent progress in convex optimization includes conic and semidefinite programming and robust optimization. The purpose of this lecture is to outline some connections between convex optimization and classification problems.
Support Vector Machine (SVM). Training examples: $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, +1\}$.
$\min_{w, b} \|w\|_2^2 \quad \text{s.t.} \quad y_i (w^\top x_i - b) \ge 1, \ i = 1, 2, \ldots, n$
This can be solved efficiently by quadratic programming.
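Below is a minimal sketch (not part of the lecture) of this hard-margin SVM solved as a QP with cvxpy on a small synthetic, linearly separable dataset; the data and variable names are illustrative assumptions.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, d = 40, 2
X_pos = rng.normal(loc=+3.0, size=(n // 2, d))   # positive class, y = +1
X_neg = rng.normal(loc=-3.0, size=(n // 2, d))   # negative class, y = -1
X = np.vstack([X_pos, X_neg])
y = np.hstack([np.ones(n // 2), -np.ones(n // 2)])

w = cp.Variable(d)
b = cp.Variable()
# min ||w||_2^2   s.t.   y_i (w^T x_i - b) >= 1
constraints = [cp.multiply(y, X @ w - b) >= 1]
prob = cp.Problem(cp.Minimize(cp.sum_squares(w)), constraints)
prob.solve()

print("w =", w.value, " b =", b.value)
print("geometric margin 1/||w||_2 =", 1.0 / np.linalg.norm(w.value))
```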
SVM: Robust Optimization. SVMs are a way to handle noise in the data points: assume each data point is unknown-but-bounded in a sphere of radius $\rho$ centered at $x_i$, and find the largest $\rho$ such that separation is still possible between the two classes of perturbed points.
SVM: Robust Optimization. How to solve it?
$\max \rho \quad \text{s.t.} \quad \forall i = 1, 2, \ldots, n: \ \|x - x_i\|_2 \le \rho \ \Rightarrow \ y_i (w^\top x - b) \ge 1$
SVM: Robust Optimization. The implication $\|x - x_i\|_2 \le \rho \Rightarrow y_i (w^\top x - b) \ge 1$ is equivalent to the single constraint $y_i (w^\top x_i - b) - \rho \|w\|_2 \ge 1$, so the problem becomes
$\max \rho \quad \text{s.t.} \quad \forall i = 1, 2, \ldots, n: \ y_i (w^\top x_i - b) - \rho \|w\|_2 \ge 1$
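As a sanity check (not from the slides), the equivalence can be verified numerically: the minimum of $y_i (w^\top (x_i + \delta) - b)$ over $\|\delta\|_2 \le \rho$ is attained at $\delta^* = -\rho\, y_i\, w / \|w\|_2$ and equals $y_i (w^\top x_i - b) - \rho \|w\|_2$. The numbers below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
d, rho = 5, 0.3
w = rng.normal(size=d)
x_i = rng.normal(size=d)
b, y_i = 0.7, -1.0

# Closed-form worst case: delta* = -rho * y_i * w / ||w||_2 lowers the margin
# term by exactly rho * ||w||_2.
closed_form = y_i * (w @ x_i - b) - rho * np.linalg.norm(w)
delta_star = -rho * y_i * w / np.linalg.norm(w)
value_at_star = y_i * (w @ (x_i + delta_star) - b)

# No random feasible perturbation does worse than the closed-form value.
dirs = rng.normal(size=(100_000, d))
dirs = rho * dirs / np.linalg.norm(dirs, axis=1, keepdims=True)
assert np.all(y_i * ((x_i + dirs) @ w - b) >= closed_form - 1e-9)
print(value_at_star, closed_form)   # identical up to rounding
```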
Robust Optimization. Linear programming (LP):
$\min c^\top x \quad \text{s.t.} \quad a_i^\top x \le b_i, \ i = 1, 2, \ldots, n$
Assume the $a_i$'s are unknown-but-bounded in ellipsoids $E_i = \{a_i \mid (a_i - \hat{a}_i)^\top \Sigma_i^{-1} (a_i - \hat{a}_i) \le 1\}$. Robust LP:
$\min c^\top x \quad \text{s.t.} \quad \forall a_i \in E_i: \ a_i^\top x \le b_i, \ i = 1, 2, \ldots, n$
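Since $\sup_{a_i \in E_i} a_i^\top x = \hat{a}_i^\top x + \|\Sigma_i^{1/2} x\|_2$, each robust constraint becomes a second-order cone constraint. The sketch below (illustrative data and cvxpy as the solver, both assumptions, not from the slides) solves the robust LP in that form.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(2)
n, d = 4, 3
c = rng.normal(size=d)
a_hat = rng.normal(size=(n, d))
b = rng.uniform(1.0, 2.0, size=n)

# Symmetric PSD matrices used directly as Sigma_i^{1/2}, so Sigma_i = S_i @ S_i.
sqrt_sigmas = []
for _ in range(n):
    A = rng.normal(size=(d, d))
    sqrt_sigmas.append(0.2 * (A @ A.T) + 0.1 * np.eye(d))

x = cp.Variable(d)
# Worst case of a_i^T x over the ellipsoid E_i is  a_hat_i^T x + ||Sigma_i^{1/2} x||_2.
constraints = [a_hat[i] @ x + cp.norm(sqrt_sigmas[i] @ x, 2) <= b[i] for i in range(n)]
constraints += [cp.norm(x, 2) <= 10]   # keep the toy problem bounded
prob = cp.Problem(cp.Minimize(c @ x), constraints)
prob.solve()
print("robust optimal x:", x.value)
```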
Minimax Probability Machine (MPM). How do we decide the decision boundary $x^\top a = b$? Positive class: $x_+ \sim \mathcal{N}(\bar{x}_+, \Sigma_+)$, classified by $x^\top a \le b$. Negative class: $x_- \sim \mathcal{N}(\bar{x}_-, \Sigma_-)$, classified by $x^\top a \ge b$.
Minimax Probability Machine (MPM).
$\min \max(\epsilon_+, \epsilon_-) \quad \text{s.t.} \quad \Pr(x_+^\top a \le b) = 1 - \epsilon_+, \ \ \Pr(x_-^\top a \ge b) = 1 - \epsilon_-$
where $x_- \sim \mathcal{N}(\bar{x}_-, \Sigma_-)$ and $x_+ \sim \mathcal{N}(\bar{x}_+, \Sigma_+)$.
Minimax Probability Machine (MPM).
$\min \epsilon \quad \text{s.t.} \quad \Pr(x_+^\top a \le b) \ge 1 - \epsilon, \ \ \Pr(x_-^\top a \ge b) \ge 1 - \epsilon$
where $x_- \sim \mathcal{N}(\bar{x}_-, \Sigma_-)$ and $x_+ \sim \mathcal{N}(\bar{x}_+, \Sigma_+)$.
Minimax Probability Machine (MPM). Assume $x$ follows the Gaussian distribution $\mathcal{N}(\bar{x}, \Sigma)$. Then
$\Pr(x^\top a \le b) \ge 1 - \epsilon \ \iff \ \bar{x}^\top a + \kappa \|\Sigma^{1/2} a\|_2 \le b, \quad \text{where } \kappa = \Phi^{-1}(1 - \epsilon)$
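A quick Monte Carlo check of this reduction (illustrative numbers, not from the slides): choose $b$ on the boundary $\bar{x}^\top a + \kappa \|\Sigma^{1/2} a\|_2 = b$ and verify that the empirical probability $\Pr(x^\top a \le b)$ is close to $1 - \epsilon$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
d, eps = 3, 0.05
xbar = rng.normal(size=d)
A = rng.normal(size=(d, d))
Sigma = A @ A.T + 0.1 * np.eye(d)
a = rng.normal(size=d)

kappa = norm.ppf(1 - eps)                 # Phi^{-1}(1 - eps)
std = np.sqrt(a @ Sigma @ a)              # = ||Sigma^{1/2} a||_2
b = xbar @ a + kappa * std                # boundary case of the constraint

samples = rng.multivariate_normal(xbar, Sigma, size=200_000)
print("empirical Pr(x^T a <= b):", np.mean(samples @ a <= b))   # ~ 1 - eps = 0.95
```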
Minimax Probability Machine (MPM).
$\max \kappa \quad \text{s.t.} \quad \bar{x}_+^\top a + \kappa \|\Sigma_+^{1/2} a\|_2 \le b, \ \ \bar{x}_-^\top a - \kappa \|\Sigma_-^{1/2} a\|_2 \ge b$
These are second-order cone constraints. The problem can be rewritten as
$\min \alpha + \beta \quad \text{s.t.} \quad a^\top (\bar{x}_- - \bar{x}_+) = 1, \ \alpha \ge \|\Sigma_+^{1/2} a\|_2, \ \beta \ge \|\Sigma_-^{1/2} a\|_2$
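A minimal sketch of the second formulation (synthetic two-Gaussian data and cvxpy, both assumptions, not from the lecture): eliminating the epigraph variables $\alpha, \beta$ gives the SOCP $\min \|\Sigma_+^{1/2} a\|_2 + \|\Sigma_-^{1/2} a\|_2$ s.t. $a^\top (\bar{x}_- - \bar{x}_+) = 1$, and the optimal $\kappa$ is the reciprocal of the optimal objective value.

```python
import numpy as np
import cvxpy as cp
from scipy.linalg import sqrtm

rng = np.random.default_rng(4)
X_pos = rng.normal(loc=+1.5, size=(200, 2))
X_neg = rng.normal(loc=-1.5, size=(200, 2))
xbar_p, xbar_n = X_pos.mean(axis=0), X_neg.mean(axis=0)
S_p = np.real(sqrtm(np.cov(X_pos.T)))    # Sigma_+^{1/2}
S_n = np.real(sqrtm(np.cov(X_neg.T)))    # Sigma_-^{1/2}

a = cp.Variable(2)
# min ||Sigma_+^{1/2} a||_2 + ||Sigma_-^{1/2} a||_2   s.t.   a^T (xbar_- - xbar_+) = 1
prob = cp.Problem(cp.Minimize(cp.norm(S_p @ a, 2) + cp.norm(S_n @ a, 2)),
                  [a @ (xbar_n - xbar_p) == 1])
prob.solve()

kappa = 1.0 / prob.value                                       # optimal worst-case kappa
b = xbar_p @ a.value + kappa * np.linalg.norm(S_p @ a.value)   # boundary offset
print("a =", a.value, " b =", b, " kappa =", kappa)
```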
Second Order Cone Programming (SOCP). A second-order cone constraint in $\mathbb{R}^3$: $x_0 \ge \sqrt{x_1^2 + x_2^2}$. The constraint $\alpha \ge \|\Sigma_-^{1/2} a\|_2$ can be written as $y = \Sigma_-^{1/2} a$, $\alpha \ge \|y\|_2$. Cone: $Q = \{z \mid z_0 \ge \|\bar{z}\|_2\}$, and $z \in Q \iff z \succeq_Q 0$.
SOCP vs. LP: SOCP generalizes LP by generalizing the inequality definition, replacing the componentwise inequality over the nonnegative orthant with the generalized inequality $\succeq_Q$ over the second-order cone.
Minimax Probability Machine (MPM). Under the Gaussian assumption:
$\min \epsilon \quad \text{s.t.} \quad \Pr(x_+^\top a \le b) \ge 1 - \epsilon, \ \ \Pr(x_-^\top a \ge b) \ge 1 - \epsilon$
where $x_- \sim \mathcal{N}(\bar{x}_-, \Sigma_-)$ and $x_+ \sim \mathcal{N}(\bar{x}_+, \Sigma_+)$. Dropping the Gaussian assumption and keeping only the means and covariances:
$\min \epsilon \quad \text{s.t.} \quad \inf_{x_+ \sim (\bar{x}_+, \Sigma_+)} \Pr(x_+^\top a \le b) \ge 1 - \epsilon, \ \ \inf_{x_- \sim (\bar{x}_-, \Sigma_-)} \Pr(x_-^\top a \ge b) \ge 1 - \epsilon$
MPM. Chebyshev inequality:
$\inf_{x \sim (\bar{x}, \Sigma)} \Pr(x^\top a \le b) = \dfrac{[\,b - \bar{x}^\top a\,]_+^2}{[\,b - \bar{x}^\top a\,]_+^2 + a^\top \Sigma a}$
where $[x]_+$ outputs 0 when $x < 0$ and $x$ when $x \ge 0$. The resulting problem has the same second-order cone form:
$\min \alpha + \beta \quad \text{s.t.} \quad a^\top (\bar{x}_- - \bar{x}_+) = 1, \ \alpha \ge \|\Sigma_+^{1/2} a\|_2, \ \beta \ge \|\Sigma_-^{1/2} a\|_2$
Pattern Invariance in Images: translation, rotation, shear.
Learning from Invariance Transformations
Incorporating Invariance Transformations. Invariance transformation: $x(\theta) = T(x, \theta): \mathbb{R}^d \times \mathbb{R} \to \mathbb{R}^d$, with $T(x, \theta = 0) = x$. SVM incorporating the invariance transformation (an infinite number of examples):
$\min \|w\|_2^2 \quad \text{s.t.} \quad \forall \theta \in \mathbb{R}, \ i = 1, 2, \ldots, n: \ y_i (w^\top x_i(\theta) - b) \ge 1$
Taylor Approximation of Invariance. A Taylor expansion of $x(\theta) = T(x, \theta)$ about $\theta = 0$ gives a polynomial approximation of $x(\theta)$ in $\theta$.
Polynomial Approximation. What is the necessary and sufficient condition for a polynomial to always be non-negative? Here we need
$\forall \theta \in \mathbb{R}: \ y\, w^\top x(\theta) - 1 \ge 0$
Non-Negative Polynomials (I). Theorem (Nesterov, 2000): If $r = 2l$, the necessary and sufficient condition for a polynomial $p(\theta)$ of degree $r$ to be non-negative everywhere is
$\exists P \succeq 0 \ \text{s.t.} \ p(\theta) = v(\theta)^\top P\, v(\theta)$, where $v(\theta) = (1, \theta, \ldots, \theta^l)^\top$ is the vector of monomials.
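A minimal sketch of the theorem used as a computational test (assuming cvxpy; the polynomial is a made-up example, not from the slides): global non-negativity of an even-degree univariate polynomial reduces to feasibility of a small SDP over the matrix $P$.

```python
import numpy as np
import cvxpy as cp

# p(theta) = theta^4 - 2 theta^2 + 1 = (theta^2 - 1)^2 >= 0, so the SDP should be feasible.
p = np.array([1.0, 0.0, -2.0, 0.0, 1.0])    # coefficients p_0, ..., p_r with r = 2l
l = (len(p) - 1) // 2

# Find P >= 0 (PSD) with  sum_{i+j=k} P[i,j] = p_k,  i.e.  p(theta) = v(theta)^T P v(theta)
# for the monomial vector v(theta) = (1, theta, ..., theta^l).
P = cp.Variable((l + 1, l + 1), PSD=True)
constraints = [
    sum(P[i, k - i] for i in range(max(0, k - l), min(k, l) + 1)) == p[k]
    for k in range(2 * l + 1)
]
prob = cp.Problem(cp.Minimize(0), constraints)
prob.solve()
print("SDP status:", prob.status)   # 'optimal' means p is non-negative everywhere
```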
Semidefinite Programming Machines. Collecting the per-example non-negativity conditions into block-diagonal matrices $A_j$ (with blocks $G_{i,j}$) and $B$ turns the training problem into a semidefinite program.
Semidefinite Programming (SDP). As with SOCP, SDP generalizes LP by generalizing the inequality definition: the componentwise inequality is replaced by the generalized inequality $X \succeq 0$ over the cone of positive semidefinite matrices.
Beyond Convex Programming. In most cases, problems are non-convex. Approximation strategies: linear programming approximation; LMI relaxation (drop rank constraints); submodular function approximation; difference of two convex functions (DC).
Example: the MAXCUT Problem.
$\min x^\top Q x \quad \text{s.t.} \quad x_i \in \{-1, +1\}$
There are exponentially many feasible points, so this is an NP-hard problem. It can be rewritten as
$\min x^\top Q x \quad \text{s.t.} \quad x_i^2 = 1$
LMI Relaxation. Let $X = x x^\top$, so that $x_i^2 = 1$ becomes $X_{i,i} = 1$:
$\min x^\top Q x \ \text{s.t.} \ x_i \in \{-1, +1\} \quad \Longleftrightarrow \quad \min \sum_{i,j} Q_{i,j} X_{i,j} \ \text{s.t.} \ X_{i,i} = 1, \ X \succeq 0, \ \mathrm{rank}(X) = 1$
Dropping the rank constraint gives the convex relaxation
$\min \sum_{i,j} Q_{i,j} X_{i,j} \quad \text{s.t.} \quad X_{i,i} = 1, \ X \succeq 0$
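A minimal sketch of the relaxation (a tiny random symmetric $Q$ and cvxpy, both assumptions, not from the slides), compared with the exact combinatorial optimum by brute force on a small instance; the SDP value lower-bounds the exact minimum.

```python
import itertools
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(5)
n = 6
Q = rng.normal(size=(n, n))
Q = (Q + Q.T) / 2                        # symmetric cost matrix for a toy instance

# SDP relaxation: drop the rank-one constraint X = x x^T.
X = cp.Variable((n, n), symmetric=True)
prob = cp.Problem(cp.Minimize(cp.trace(Q @ X)),
                  [cp.diag(X) == 1, X >> 0])
prob.solve()
g_star = prob.value

# Exact minimum d* by brute force over all 2^n sign vectors (only feasible for tiny n).
d_star = min(np.array(x) @ Q @ np.array(x)
             for x in itertools.product([-1, 1], repeat=n))

print("SDP lower bound g* =", g_star, "  exact d* =", d_star)   # g* <= d*
```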
How Good is the Approximation? Nesterov recently proved that, with
$d^* = \min x^\top Q x \ \text{s.t.} \ x_i \in \{-1, +1\}$ and $g^* = \min \sum_{i,j} Q_{i,j} X_{i,j} \ \text{s.t.} \ X_{i,i} = 1, \ X \succeq 0$,
$1 \ \ge \ \dfrac{g^*}{d^*} \ \ge \ \dfrac{2}{\pi} \approx 0.6366$
What should you learn? Basic concepts of convex sets and functions; basic theory of convex optimization; how to formulate a problem as a standard convex optimization problem; how to efficiently approximate the solution for large datasets; and (optional) how to approximate non-convex programming problems by convex ones.