Published by Camilla Reed. Modified over 9 years ago.
1
Towards Scalable Support Vector Machines Using Squashing. Authors: Dmitry Pavlov, Darya Chudova, Padhraic Smyth (Information and Computer Science, University of California). Advisor: Dr. Hsu. Reporter: Hung Ching-Wen
2
Outline 1. Motivation 2. Objective 3. Introduction 4. SVM 5. Squashing for SVM 6. Experiments 7. Conclusion
3
Motivation SVMs provide a classification model with a strong theoretical foundation and excellent empirical performance. Their major drawback is the need to solve a large-scale quadratic programming problem.
4
Objective This paper combines likelihood-based squashing with a probabilistic formulation of SVMs, enabling fast training on squashed data sets.
5
Introduction The applicability of SVMs to large datasets is limited because of the high computational cost. Speed-up training algorithms: chunking, Osuna's decomposition method, SMO. These accelerate training but do not scale well with the size of the training data.
6
Introduction Ways of reducing the computational cost: sampling, boosting, squashing (DuMouchel et al., Madigan et al.). The authors of this paper propose Squashing-SMO to address the high computational cost of SVMs.
7
SVM Training data: D = {(x_i, y_i): i = 1, …, N}, where x_i is a vector and y_i ∈ {+1, −1}. In a linear SVM, the separating classifier is y = ⟨w, x⟩ + b, where w is the normal vector and b is the intercept of the hyperplane.
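As a minimal sketch of the linear classifier described on this slide, the decision value ⟨w, x⟩ + b can be computed and thresholded at zero to obtain a label. The function names and the example values of w and b are hypothetical and chosen only for illustration.

```python
def decision_value(w, x, b):
    """Return <w, x> + b for one input vector x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, x, b):
    """Map the decision value to a class label in {+1, -1}."""
    return 1 if decision_value(w, x, b) >= 0 else -1


w = [2.0, -1.0]  # hypothetical normal vector
b = -0.5         # hypothetical intercept

print(predict(w, [1.0, 0.5], b))  # decision value is 2.0 - 0.5 - 0.5 = 1.0
print(predict(w, [0.0, 1.0], b))  # decision value is -1.0 - 0.5 = -1.5
```

The sign of the decision value tells which side of the hyperplane a point falls on.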
8
SVM(non-separable)
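The formula on this slide was not preserved in the transcript. For reference, the standard soft-margin formulation for non-separable data is given here in the notation of the previous slide. The slack variables ξ_i and the penalty constant C are symbols assumed here, and the slide's own constants may have differed.

```latex
\min_{w,\,b,\,\xi}\; \|w\|^2 + C \sum_{i=1}^{N} \xi_i
\quad \text{subject to} \quad
y_i \left( \langle w, x_i \rangle + b \right) \ge 1 - \xi_i,
\qquad \xi_i \ge 0, \quad i = 1, \dots, N
```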
9
SVM(a prior on w)
10
Squashing for SVM (1) Select a probabilistic model P((X, Y) ∣ θ). (2) The objective is to find the maximum likelihood estimate θ_ML.
11
Squashing for SVM (3) The training data D = {(x_i, y_i): i = 1, …, N} can be grouped into N_c groups. (X_c, Y_c)^sq: the squashed data point placed at cluster c. β_c: the weight of that point.
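The grouping step can be illustrated with a small sketch. It assumes the cluster assignments are already given, places each squashed point at the mean of its group, and sets the weight β_c to the group size. These are simplifying assumptions for illustration, not necessarily the construction used in the paper, and the function name `squash` is hypothetical.

```python
from collections import defaultdict


def squash(points, labels, cluster_ids):
    """Collapse each (cluster, label) group into one weighted pseudo-point.

    Returns a list of (x_sq, y_sq, beta) tuples, where x_sq is the group
    mean and beta is the number of original points the group represents.
    """
    groups = defaultdict(list)
    for x, y, c in zip(points, labels, cluster_ids):
        # Group by cluster and label so each squashed point has one label.
        groups[(c, y)].append(x)

    squashed = []
    for (c, y), members in sorted(groups.items()):
        dim = len(members[0])
        mean = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        squashed.append((mean, y, len(members)))
    return squashed


points = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 10.0], [11.0, 13.0]]
labels = [-1, -1, 1, 1, 1]
cluster_ids = [0, 0, 1, 1, 1]  # hypothetical cluster assignments

for x_sq, y_sq, beta in squash(points, labels, cluster_ids):
    print(x_sq, y_sq, beta)
```

Five training points are reduced to two weighted points, and the weights sum to the original N.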
12
Squashing for SVM If the prior on w is taken to be P(w) ∝ exp(−∥w∥²)
13
Squashing for SVM (4) The optimization model for the squashed data:
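The formula on this slide was not preserved in the transcript. A minimal sketch follows, assuming the objective is the usual soft-margin one with each squashed point's hinge loss scaled by its weight β_c, and with the regularizer ∥w∥² matching the prior exp(−∥w∥²). The function name, the constant C, and the example values are assumptions.

```python
def squashed_objective(w, b, squashed, C):
    """Evaluate ||w||^2 + C * sum_c beta_c * hinge_c on squashed data.

    `squashed` is a list of (x, y, beta) tuples: a pseudo-point, its
    label in {+1, -1}, and the weight it carries.
    """
    reg = sum(wi * wi for wi in w)
    loss = 0.0
    for x, y, beta in squashed:
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        loss += beta * max(0.0, 1.0 - margin)  # weighted hinge loss
    return reg + C * loss


# Hypothetical squashed data: two pseudo-points with weights 2 and 3.
squashed = [([1.0, 0.0], -1, 2), ([11.0, 11.0], 1, 3)]
print(squashed_objective([0.1, 0.1], -1.0, squashed, C=1.0))
```

Under this assumption, a point with weight β_c contributes exactly as much as β_c identical unweighted copies, so the optimizer sees N_c terms instead of N.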
14
Squashing for SVM Important design issues for the squashing algorithm: (1) choose the number and location of the squashing points; (2) sample values of w from the prior p(w); (3) obtain b from the optimization model; (4) with w and b fixed, evaluate the likelihood of each training point, and repeat the sampling procedure L times (L is the length of each point's likelihood profile).
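Steps (2) to (4) can be sketched as follows. The sketch assumes a log-likelihood equal to the negative hinge loss scaled by C, a Gaussian prior with standard deviation sqrt(1/2) (which corresponds to exp(−∥w∥²)), and an intercept fixed at 0 instead of being obtained from the optimization model. All function names are hypothetical.

```python
import math
import random


def hinge_log_likelihood(w, b, x, y, C=1.0):
    """Assumed log-likelihood: -C times the hinge loss of (x, y) under (w, b)."""
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    return -C * max(0.0, 1.0 - margin)


def likelihood_profiles(points, labels, L, sigma=math.sqrt(0.5), seed=0):
    """Return one length-L likelihood profile per training point.

    Each of the L entries corresponds to one w sampled from an isotropic
    Gaussian prior. The intercept is fixed at 0 here for simplicity,
    which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    profiles = [[] for _ in points]
    for _ in range(L):
        w = [rng.gauss(0.0, sigma) for _ in range(dim)]
        b = 0.0
        for profile, x, y in zip(profiles, points, labels):
            profile.append(hinge_log_likelihood(w, b, x, y))
    return profiles


points = [[1.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
labels = [1, 1, -1]
for profile in likelihood_profiles(points, labels, L=4):
    print(profile)
```

Points with similar profiles behave alike under the model, so they are natural candidates for the same squashed group.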
15
EXPERIMENTS Experiment datasets: synthetic data; benchmark data from the UCI machine learning and UCI KDD repositories
16
EXPERIMENTS Methods evaluated: full-SMO, srs-SMO (simple random sampling), squash-SMO, boost-SMO. Runs: results are taken over 100 runs. Performance measures: misclassification rate, learning time, memory.
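The misclassification rate and its average over repeated runs can be computed as in the sketch below. The function names and the example labels are hypothetical, and the slide does not say how the runs were aggregated, so the plain mean is an assumption.

```python
def misclassification_rate(y_true, y_pred):
    """Fraction of predictions that differ from the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    errors = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return errors / len(y_true)


def mean_over_runs(rates):
    """Average a list of per-run error rates (assumed aggregation)."""
    return sum(rates) / len(rates)


y_true = [1, -1, 1, 1]
run_predictions = [
    [1, 1, 1, -1],   # hypothetical run with two errors
    [1, -1, 1, 1],   # hypothetical run with no errors
]
rates = [misclassification_rate(y_true, p) for p in run_predictions]
print(rates, mean_over_runs(rates))
```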
17
EXPERIMENTS (Results on Synthetic data) (w_f, b_f): estimated by full-SMO. (w_s, b_s): estimated from squashed or sampled data.
18
EXPERIMENTS(Results on Synthetic data)
20
EXPERIMENTS(Results on Benchmark data)
22
EXPERIMENTS(Results on Benchmark data)
23
EXPERIMENTS(Results on Benchmark data)
24
Conclusion 1. The paper describes how squashing makes SVM training applicable to large datasets. 2. Comparison with full-SMO shows that squash-SMO and boost-SMO achieve near-optimal performance with much lower time and memory requirements. 3. srs-SMO has a higher misclassification rate. 4. squash-SMO and boost-SMO make it feasible to tune parameters by cross-validation, which is impractical for full-SMO.
25
Conclusion 5. The performance of squash-SMO and boost-SMO is similar on the benchmark problems. 6. However, squash-SMO offers better interpretability of the model and can be expected to run faster than SMO on datasets that do not reside in memory.
26
Opinion It is a good idea that the authors describe how squashing makes SVM training applicable to large datasets. We could change the prior distribution of w according to the nature of the data, for example to an exponential or log-normal distribution, or use a nonparametric method.