1
Linear Programming Boosting by Column and Row Generation. Kohei Hatano and Eiji Takimoto, Kyushu University, Japan. DS 2009
2
1. Introduction 2. Preliminaries 3. Our algorithm 4. Experiment 5. Summary
3
Example: given a web page, predict whether its topic is "DS 2009". The hypothesis set consists of words, each used as a yes/no test (e.g. "DS 2009?", "ALT?", "Porto?"), and the prediction is a weighted majority vote such as 0.5*(DS 2009?) + 0.3*(ALT?) + 0.2*(Porto?). Modern machine learning approach: find a weighted majority of hypotheses (i.e., a hyperplane) which enlarges the margin.
4
1-norm soft margin optimization (a.k.a. 1-norm SVM). A popular formulation, like the 2-norm soft margin optimization: "find a hyperplane which separates positive and negative instances well", maximizing the margin ρ while paying a loss ξ_i for each instance that falls short of it. Notes: it is a Linear Program; it has a good generalization guarantee [Schapire et al. 98]; the margin loss is normalized with the 1-norm.
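For concreteness, the optimization can be written as the following LP; this is the standard LPBoost-style formulation [Demiriz et al., 2003] in the notation used throughout (margin ρ, losses ξ_i, soft margin parameter ν), reconstructed here since the slide's own formula is not legible in this transcript.

\begin{align*}
\max_{\rho,\,\alpha,\,\xi}\ \ & \rho - \frac{1}{\nu}\sum_{i=1}^{m}\xi_i \\
\text{s.t.}\ \ & y_i \sum_{j=1}^{n} \alpha_j h_j(x_i) \;\ge\; \rho - \xi_i \qquad (i=1,\dots,m), \\
& \sum_{j=1}^{n}\alpha_j = 1,\qquad \alpha \ge 0,\qquad \xi \ge 0.
\end{align*}

The weights α form a convex combination of the hypotheses (the 1-norm normalization), ρ is the margin to be maximized, and ν trades the margin off against the total loss.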
5
1-norm soft margin optimization (2). Advantage of the 1-norm soft margin optimization: the solution is likely to be sparse, which is useful for feature selection. The 1-norm soft margin opt. yields a sparse hyperplane such as 0.5*(DS 2009?) + 0.3*(ALT?) + 0.2*(Porto?), whereas the 2-norm soft margin opt. tends to yield a non-sparse hyperplane such as 0.2*(DS 2009?) + 0.1*(ALT?) + 0.1*(Porto?) + 0.1*(wine?) + 0.05*(tasting?) + 0.05*(discovery?) + 0.03*(science?) + 0.02*(…) + …
6
Recent Results. 2-norm soft margin optimization: Quadratic Programming; there are state-of-the-art solvers: ・ SMO [Platt, 1999] ・ SVM light [Joachims, 1999] ・ SVM Perf [Joachims, 2006] ・ Pegasos [Shalev-Shwartz et al., 2007]. 1-norm soft margin optimization: Linear Programming; ・ LPBoost [Demiriz et al., 2003] ・ Entropy Regularized LPBoost [Warmuth et al., 2008] ・ others [Mangasarian 2006][Sra 2006]; these are not efficient enough for large data, so there is room for improvement. Our result: a new algorithm for 1-norm soft margin optimization.
7
1. Introduction 2. Preliminaries 3. Our algorithm 4. Experiment 5. Summary
8
Boosting. Classification task: frog is "+1", everything else is "-1"; the hypotheses are simple tests on color and size (the slide shows the instances plotted in the color–size plane). Algorithm: 1. d_1: uniform distribution. 2. For t = 1,…,T: (i) choose the hypothesis h_t maximizing the edge w.r.t. d_t; (ii) update the distribution d_t to d_{t+1}. 3. Output a weighting of the chosen hypotheses.
9
Boosting (2). (The same algorithm, now with the first chosen hypothesis h_1 and its predictions h_1(x_i) drawn on the color–size plot.) Edge of hypothesis h w.r.t. distribution d: since h(x_i) is -1 or +1, the term y_i h(x_i) is positive exactly when instance i is classified correctly, and the edge lies in [-1, +1].
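In symbols, consistent with this description, the edge of h w.r.t. d over the m instances is

\mathrm{edge}_d(h) \;=\; \sum_{i=1}^{m} d_i\, y_i\, h(x_i) \;\in\; [-1,\,+1],

so a large edge means h performs well on the sample weighted by d.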
10
Boosting (2, continued). Update step: put more weight on the instances misclassified by h_1.
11
Boosting (3). (Under the updated distribution, the next hypothesis h_2 is chosen.) Note: more weight is placed on the "difficult" instances.
12
Boosting (4). Output: a weighting of the chosen hypotheses, e.g. 0.4 h_1 + 0.6 h_2.
13
Boosting & 1-norm soft margin optimization. The primal task, "find the large-margin hyperplane which separates positive and negative instances as much as possible", is equivalent, by LP duality, to the dual task: find the distribution which minimizes the maximum edge of the hypotheses (i.e., the most "difficult" distribution w.r.t. the hypotheses). The dual is (approximately) solvable using Boosting.
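A sketch of that dual LP in the same notation (d: capped distribution over instances, γ: maximum edge), written out here for concreteness:

\begin{align*}
\min_{\gamma,\,d}\ \ & \gamma \\
\text{s.t.}\ \ & \sum_{i=1}^{m} d_i\, y_i\, h_j(x_i) \;\le\; \gamma \qquad (j=1,\dots,n), \\
& \sum_{i=1}^{m} d_i = 1, \qquad 0 \le d_i \le \tfrac{1}{\nu}.
\end{align*}

By LP duality its optimal value equals that of the primal above, and the optimal d is the hardest capped distribution for the hypothesis set.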
14
LPBoost [Demiriz et al., 2003]. Update: solve the dual problem restricted to the hypothesis set {h_1,…,h_t}. Output: the convex combination Σ_t α_t h_t, where α is the solution of the corresponding primal problem. Theorem [Demiriz et al.]: Given ε > 0, LPBoost outputs an ε-approximation of the optimal solution.
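Concretely, at iteration t the restricted dual is the LP above with j ranging only over the chosen hypotheses 1,…,t, giving (d^t, γ_t). The usual stopping rule for this style of column generation, consistent with the ε-approximation guarantee, is to stop once no hypothesis beats γ_t by more than ε:

\max_{h \in \mathcal{H}}\ \sum_{i=1}^{m} d_i^{t}\, y_i\, h(x_i) \;\le\; \gamma_t + \epsilon.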
15
Properties of the optimal solution. (Figure: the optimal margin ρ* and the losses ξ*_i of instances falling below it.) The KKT conditions imply: the optimal solution can be recovered using only the instances with positive weights d*_i > 0. With ν = 0.2m, at most 20% of the instances have margin below ρ*; and since d_i ≤ 1/ν, at least ν instances are needed.
16
Properties of the optimal solution (2). By the KKT conditions, the optimal solution is sparse in two ways: 1. sparseness w.r.t. the hypothesis weighting; 2. sparseness w.r.t. the instances. Note: the optimal solution can be recovered using only the hypotheses with positive coefficients.
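A sketch of the complementary slackness conditions behind both kinds of sparseness (standard LP optimality conditions for the primal–dual pair above, written in the slides' notation):

\begin{align*}
\alpha_j^* > 0 \;&\Rightarrow\; \textstyle\sum_i d_i^*\, y_i\, h_j(x_i) = \gamma^* && \text{(only maximum-edge hypotheses receive weight)}\\
d_i^* > 0 \;&\Rightarrow\; y_i \textstyle\sum_j \alpha_j^*\, h_j(x_i) = \rho^* - \xi_i^* && \text{(only tight margin constraints receive weight)}\\
\xi_i^* > 0 \;&\Rightarrow\; d_i^* = \tfrac{1}{\nu} && \text{(instances strictly below the margin are capped)}
\end{align*}

So the optimum is supported only on hypotheses attaining the maximum edge γ* and on instances whose margin is at most ρ*.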
17
1. Introduction 2. Preliminaries 3. Our algorithm 4. Experiment 5. Summary
18
Our idea: Sparse LPBoost. Take advantage of the sparseness w.r.t. both hypotheses and instances. 2. For t = 1,…: (i) pick up instances with margin < ρ_t; (ii) solve the dual problem w.r.t. the instances chosen so far by Boosting (ρ_{t+1}: the solution). 3. Output the solution of the primal problem. Theorem: Given ε > 0, Sparse LPBoost outputs an ε-approximation of the optimal solution.
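Below is a minimal, self-contained sketch of this row-and-column generation idea in Python (numpy + scipy). It is an illustration, not the authors' implementation: the inner boosting run is replaced by directly solving the restricted primal and dual LPs with scipy's linprog; the initial working set (which must hold at least ⌈ν⌉ instances because d_i ≤ 1/ν), the doubling schedule, and the toy data are assumptions; U[i, j] = y_i * h_j(x_i) is a precomputed hypothesis-output matrix.

    import numpy as np
    from scipy.optimize import linprog

    def restricted_primal(U, rows, cols, nu):
        # max rho - (1/nu)*sum(xi)  s.t.  sum_j a_j*U[i,j] >= rho - xi_i (i in rows),
        #     sum_j a_j = 1, a >= 0, xi >= 0.   Returns (rho, alpha).
        q, p = len(rows), len(cols)
        c = np.r_[-1.0, np.zeros(p), np.full(q, 1.0 / nu)]   # minimize -rho + (1/nu)*sum(xi)
        A_ub = np.hstack([np.ones((q, 1)), -U[np.ix_(rows, cols)], -np.eye(q)])
        A_eq = np.r_[0.0, np.ones(p), np.zeros(q)].reshape(1, -1)
        bounds = [(None, None)] + [(0, None)] * (p + q)      # rho free, alpha/xi >= 0
        res = linprog(c, A_ub, np.zeros(q), A_eq, [1.0], bounds, method="highs")
        return res.x[0], res.x[1:1 + p]

    def restricted_dual(U, rows, cols, nu):
        # min gamma  s.t.  sum_i d_i*U[i,j] <= gamma (j in cols),
        #     sum_i d_i = 1, 0 <= d_i <= 1/nu.   Returns (gamma, d).
        q, p = len(rows), len(cols)
        c = np.r_[1.0, np.zeros(q)]
        A_ub = np.hstack([-np.ones((p, 1)), U[np.ix_(rows, cols)].T])
        A_eq = np.r_[0.0, np.ones(q)].reshape(1, -1)
        bounds = [(None, None)] + [(0, 1.0 / nu)] * q        # gamma free, 0 <= d_i <= 1/nu
        res = linprog(c, A_ub, np.zeros(p), A_eq, [1.0], bounds, method="highs")
        return res.x[0], res.x[1:]

    def sparse_lpboost(U, nu, eps=1e-3):
        # U[i, j] = y_i * h_j(x_i).  Returns (rho, weights over the chosen hypotheses).
        m, n = U.shape
        rows = list(range(int(np.ceil(nu))))   # dual needs >= ceil(nu) instances (d_i <= 1/nu)
        cols, batch = [0], 1
        while True:
            rho, alpha = restricted_primal(U, rows, cols, nu)
            gamma, d = restricted_dual(U, rows, cols, nu)
            margins = U[:, cols] @ alpha       # margin of every instance under current alpha
            edges = d @ U[rows, :]             # edge of every hypothesis under current d
            bad_rows = [i for i in np.argsort(margins) if margins[i] < rho - eps and i not in rows]
            bad_cols = [j for j in np.argsort(-edges) if edges[j] > gamma + eps and j not in cols]
            if not bad_rows and not bad_cols:  # no violated constraint is left
                return rho, dict(zip(cols, alpha))
            rows += bad_rows[:batch]           # add worst-margin instances ...
            cols += bad_cols[:batch]           # ... and largest-edge hypotheses
            batch *= 2                         # doubling schedule (an assumption)

    # toy run on random data (purely illustrative)
    rng = np.random.default_rng(0)
    U = rng.choice([-1.0, 1.0], size=(200, 50)) * rng.uniform(0.2, 1.0, size=(200, 50))
    print(sparse_lpboost(U, nu=0.2 * 200))

The termination test is sound: when no instance outside the working set has margin below ρ and no hypothesis outside it has edge above γ, the restricted primal and dual solutions are feasible, and hence optimal, for the full problem.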
19
Our idea (matrix form). View the inequality constraints of the dual problem as a matrix: each row i corresponds to instance i and each column j corresponds to hypothesis j. Part of the constraint matrix used by each method:
・ LP: the whole matrix
・ LPBoost: columns
・ Sparse LPBoost: intersections of columns and rows
・ "effective" constraints for the optimal solution: intersections of columns and rows
20
How to choose examples (hypotheses)? 1st attempt: add instances one by one; this is less efficient than just running the LP solver! Assumptions: ・ the # of hypotheses is constant ・ the time complexity of an LP solver is m^k (m: # of instances). Our method: choose at most 2^t instances with margin < ρ (t: # of iterations), and suppose the algorithm terminates after it has chosen cm instances (0 < c < 1). Note: the same argument holds for hypotheses.
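An illustrative calculation of why the doubling schedule pays off, assuming the LP-solver cost m^k stated above and reading the per-round budget as 2^t: the working set after round t holds O(2^t) instances, so the restricted LP costs form a geometric series dominated by the last term,

\sum_{t=1}^{T} \big(2^{t}\big)^{k} \;\le\; \frac{2^{k}}{2^{k}-1}\,\big(2^{T}\big)^{k} \;=\; O\big((cm)^{k}\big) \qquad (2^{T} \approx cm,\ k \ge 1),

which is within a constant factor of a single LP solve on the final cm instances and far below the m^k cost of solving the full LP when c is small.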
21
1. Introduction 2. Preliminaries 3. Our algorithm 4. Experiment 5. Summary
22
Experiments (new experiments, not in the proceedings). Data sets:
・ Reuters-21578: # of examples m = 10,170, # of hypotheses n = 30,389
・ RCV1: m = 20,242, n = 47,237
・ news20: m = 19,996, n = 1,335,193
Parameters: ν = 0.2m, ε = 0.001. Each algorithm is implemented in C++ with the LP solver CPLEX 11.0.
23
Experimental results (computation times in seconds). Sparse LPBoost improves the computation time by a factor of 3 to 100.
24
1. Introduction 2. Preliminaries 3. Our algorithm 4. Experiment 5. Summary
25
Summary & open problems. Our result: Sparse LPBoost, a provable decomposition algorithm which ε-approximates the 1-norm soft margin optimization and is 3 to 100 times faster than an LP solver or LPBoost. Open problems: a theoretical guarantee on the # of iterations; a better method for choosing instances (hypotheses).