Trading Convexity for Scalability
Marco A. Alvarez
CS 7680, Department of Computer Science, Utah State University
Paper
Collobert, R., Sinz, F., Weston, J., and Bottou, L. Trading convexity for scalability. In Proceedings of the 23rd International Conference on Machine Learning (ICML '06), Pittsburgh, Pennsylvania, June 2006. ACM Press, New York, NY.
Introduction
- Previously in machine learning:
  - Non-convex cost functions in MLPs: difficult to optimize, yet they work efficiently in practice
  - SVMs are defined by a convex function: easier optimization (algorithms) and a unique solution (we can write theorems)
- Goal of the paper: sometimes non-convexity has benefits
  - Faster training and testing (fewer support vectors)
  - Non-convex SVMs (faster and sparser)
  - Fast transductive SVMs
From SVMs
- Decision function: $f(x) = w \cdot \Phi(x) + b$
- Primal formulation: $\min_{w,b}\ \frac{1}{2}\|w\|^2 + C \sum_{l=1}^{L} H_1\big(y_l f(x_l)\big)$, where $H_1(z) = \max(0, 1 - z)$ is the hinge loss
  - Minimizing $\|w\|$ maximizes the margin
  - $w$ is a combination of a small number of training examples (sparsity)
  - The decision boundary is determined by the support vectors
- Dual formulation: $\max_{\alpha}\ \sum_{l} \alpha_l - \frac{1}{2}\sum_{k,l} \alpha_k \alpha_l y_k y_l K(x_k, x_l)$ s.t. $0 \le \alpha_l \le C$ and $\sum_l \alpha_l y_l = 0$ (sketch below)
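To make the primal concrete, here is a minimal numpy sketch of the hinge loss $H_s$ and a linear decision function $f(x) = w \cdot x + b$ (i.e., $\Phi$ is the identity); the weights and data are made-up values for illustration.

```python
import numpy as np

# Hinge loss H_s(z) = max(0, s - z), as in the primal above.
def hinge(z, s=1.0):
    return np.maximum(0.0, s - z)

# Linear decision function f(x) = w.x + b.
def decision(w, b, X):
    return X @ w + b

# Toy usage: margins z = y * f(x) and the per-example hinge cost.
w, b = np.array([1.0, -0.5]), 0.1
X = np.array([[2.0, 1.0], [0.2, 0.1], [-1.0, 0.5]])
y = np.array([1, 1, -1])
print(hinge(y * decision(w, b, X)))  # zero once z >= 1
```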
The SVM Scalability Problem
- The number of support vectors increases linearly with the number of training examples L
- From the primal formulation, the cost attributed to one example $(x, y)$ is the hinge loss $H_1\big(y f(x)\big) = \max\big(0, 1 - y f(x)\big)$
- The hinge loss never saturates: outliers ($y f(x) \ll 1$) keep pulling on the solution, and every one of them becomes a support vector
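The linear growth of the SV count is easy to observe empirically. The sketch below is my own illustration, not the paper's experiment: it trains scikit-learn's SVC on a noisy toy problem of increasing size and prints the number of support vectors.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
for L in [200, 400, 800, 1600]:
    X = rng.normal(size=(L, 2))
    # Noisy labels: sign of the first feature plus label noise.
    y = np.where(X[:, 0] + 0.5 * rng.normal(size=L) > 0, 1, -1)
    clf = SVC(kernel="rbf", C=1.0).fit(X, y)
    print(L, int(clf.n_support_.sum()))  # SV count grows roughly linearly in L
```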
Ramp Loss Function
- Given the hinge loss $H_s(z) = \max(0, s - z)$, define the ramp loss $R_s(z) = H_1(z) - H_s(z)$ for some $s < 1$
- The ramp clips the hinge at height $1 - s$: all outliers ($z < s$) pay the same constant cost
- Consequence: outliers exert no force on the solution and do not become support vectors (non-SVs); see the sketch below
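A minimal sketch of the ramp loss as the difference of two hinges; the clipping point $s = -1$ here is an arbitrary example value.

```python
import numpy as np

def hinge(z, s=1.0):
    return np.maximum(0.0, s - z)

# Ramp loss R_s(z) = H_1(z) - H_s(z): the hinge clipped at height 1 - s.
def ramp(z, s=-1.0):
    return hinge(z, 1.0) - hinge(z, s)

z = np.array([-3.0, -1.0, 0.0, 0.5, 1.0, 2.0])
print(ramp(z))  # [2. 2. 1. 0.5 0. 0.]: constant for z <= s, zero for z >= 1
```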
Concave-Convex Procedure (CCCP)
- Given a cost function $J(\theta)$, decompose it into a convex part and a concave part: $J(\theta) = J_{vex}(\theta) + J_{cav}(\theta)$
- Iterate: $\theta^{t+1} = \arg\min_{\theta}\ J_{vex}(\theta) + J'_{cav}(\theta^t) \cdot \theta$, i.e., minimize the convex part plus a linearization of the concave part at the current iterate
- The cost $J(\theta^t)$ is guaranteed to decrease at each iteration
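To see the procedure in isolation, here is a one-dimensional toy example (my own, not from the paper): $J(x) = x^2 - \sqrt{1 + x^2}$, with convex part $x^2$ and concave part $-\sqrt{1 + x^2}$. Each CCCP step minimizes the convex part plus the tangent of the concave part, which here has a closed form, and $J$ decreases monotonically.

```python
import numpy as np

def J(x):
    return x**2 - np.sqrt(1 + x**2)  # convex part + concave part

def cccp_step(x_t):
    g = -x_t / np.sqrt(1 + x_t**2)   # J_cav'(x_t), slope of the tangent
    return -g / 2.0                  # closed-form argmin of x**2 + g*x

x = 3.0
for t in range(6):
    print(t, round(x, 6), round(J(x), 6))  # J(x) never increases
    x = cccp_step(x)
```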
Using the Ramp Loss
- Replace the hinge by the ramp in the SVM primal: $\min_{w,b}\ \frac{1}{2}\|w\|^2 + C \sum_{l=1}^{L} R_s\big(y_l f(x_l)\big)$
- The problem is no longer convex, but since $R_s = H_1 - H_s$ it decomposes exactly into a convex part (the hinge terms) and a concave part ($-C \sum_l H_s$), so CCCP applies
CCCP for the Ramp Loss
- At iteration $t$, set $\beta_l = C$ if $y_l f^t(x_l) < s$, and $\beta_l = 0$ otherwise (the linearization of the concave part)
- Solve the convex subproblem: the standard hinge-loss SVM plus the linear term $\sum_l \beta_l\, y_l f(x_l)$; in the dual this only shifts the box constraints on the $\alpha_l$
- Repeat until the set of flagged outliers stabilizes; flagged examples exert no net force on the solution, so they never become support vectors (see the sketch below)
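Below is a hedged primal-space sketch of the whole procedure for a linear model without bias: the outer loop recomputes the $\beta$ flags, and the inner loop solves the convex subproblem with plain subgradient descent. The paper instead solves each subproblem as a standard dual QP; the inner solver, learning rate, and the value of s here are illustrative assumptions.

```python
import numpy as np

def cccp_ramp_svm(X, y, C=1.0, s=-0.3, outer=10, inner=300, lr=0.01):
    w = np.zeros(X.shape[1])
    for _ in range(outer):
        # Linearize the concave part: flag current outliers (y*f(x) < s).
        beta = C * (y * (X @ w) < s)
        for _ in range(inner):
            z = y * (X @ w)
            # Subgradient of 0.5*||w||^2 + C*sum H_1(z_l) + sum beta_l*y_l*f(x_l).
            g = w - C * (((z < 1.0) * y) @ X) + ((beta * y) @ X)
            w -= lr * g
    return w
```

Note the cancellation that makes the result sparse: for a flagged outlier with $z_l < 1$, the hinge subgradient $-C\, y_l x_l$ and the linear term's gradient $+C\, y_l x_l$ cancel, so the point stops pulling on $w$.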
Results (non-convex SVMs)
Speedup
Time and Number of SVs
Transductive SVMs (TSVMs)
- Use unlabeled examples in addition to labeled ones: the margin is maximized over the unlabeled points as well
Loss Function
- Unlabeled examples are penalized through a symmetric loss on $|f(x)|$, pushing them away from the margin
- Cost to be minimized: $J(\theta) = \frac{1}{2}\|w\|^2 + C \sum_{l=1}^{L} H_1\big(y_l f(x_l)\big) + C^{*} \sum_{l=L+1}^{L+U} R_s\big(|f(x_l)|\big)$, where the first sum runs over the L labeled examples and the second over the U unlabeled ones (sketch below)
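A small sketch of this objective for a linear model without bias; the hyperparameter values for C, C_star, and s are placeholder assumptions.

```python
import numpy as np

def hinge(z, s=1.0):
    return np.maximum(0.0, s - z)

def ramp(z, s):
    return hinge(z, 1.0) - hinge(z, s)

# TSVM cost: hinge on labeled points, symmetric ramp on unlabeled ones.
def tsvm_cost(w, X_lab, y_lab, X_unl, C=1.0, C_star=1.0, s=-0.3):
    return (0.5 * w @ w
            + C * hinge(y_lab * (X_lab @ w)).sum()
            + C_star * ramp(np.abs(X_unl @ w), s).sum())
```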
Balancing Constraint
- Necessary for TSVMs: without it, the optimizer can trivially push all unlabeled examples far onto one side of the boundary
- Constrain the average prediction on unlabeled data to match the average label on labeled data: $\frac{1}{U} \sum_{l=L+1}^{L+U} f(x_l) = \frac{1}{L} \sum_{l=1}^{L} y_l$
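A one-line check of the constraint for the same linear model; how a solver enforces it (for instance through the offset b or a Lagrange multiplier) is left open here.

```python
import numpy as np

# Balancing residual: zero when the mean prediction on unlabeled data
# equals the mean label on labeled data.
def balance_residual(w, y_lab, X_unl):
    return (X_unl @ w).mean() - y_lab.mean()
```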
Results (TSVMs)
Training Time
Quadratic Fit
- The measured training times are well approximated by a quadratic fit in the number of examples