Slide 1. Support Vector Machines: Optimization objective (Machine Learning)
Slide 2. Alternative view of logistic regression (Andrew Ng)
Hypothesis: h_θ(x) = 1 / (1 + e^(−θ^T x)).
If y = 1, we want h_θ(x) ≈ 1, i.e. θ^T x ≫ 0.
If y = 0, we want h_θ(x) ≈ 0, i.e. θ^T x ≪ 0.
Slide 3. Alternative view of logistic regression
Cost of example: −( y log h_θ(x) + (1 − y) log(1 − h_θ(x)) ).
If y = 1 (want θ^T x ≫ 0): the cost is −log(1 / (1 + e^(−θ^T x))); the SVM replaces this curve with a flat-then-linear approximation cost_1(θ^T x) that is zero for θ^T x ≥ 1.
Similarly, if y = 0 (want θ^T x ≪ 0): cost_0(θ^T x) is zero for θ^T x ≤ −1.
Slide 4. Support vector machine
Logistic regression: min_θ (1/m) Σ_i [ y^(i) (−log h_θ(x^(i))) + (1 − y^(i)) (−log(1 − h_θ(x^(i)))) ] + (λ/2m) Σ_j θ_j².
Support vector machine: min_θ C Σ_i [ y^(i) cost_1(θ^T x^(i)) + (1 − y^(i)) cost_0(θ^T x^(i)) ] + (1/2) Σ_j θ_j².
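The SVM objective on this slide can be sketched in Python (the course itself uses Octave). The hinge-shaped `cost1`/`cost0` are an assumption about the exact form, since the slides only draw the straight-line approximations:

```python
import numpy as np

def cost1(z):
    # Cost for a y = 1 example: zero once z = theta^T x >= 1,
    # growing linearly as z falls below 1 (hinge shape).
    return np.maximum(0.0, 1.0 - z)

def cost0(z):
    # Cost for a y = 0 example: zero once z <= -1.
    return np.maximum(0.0, 1.0 + z)

def svm_objective(theta, X, y, C):
    # C * sum of per-example costs + (1/2) * sum_j theta_j^2
    # (theta_0 is conventionally not regularized).
    z = X @ theta
    data_term = C * np.sum(y * cost1(z) + (1 - y) * cost0(z))
    reg_term = 0.5 * np.sum(theta[1:] ** 2)
    return data_term + reg_term
```

Note how C plays the role that 1/λ plays in regularized logistic regression: it scales the data-fitting term instead of the regularization term.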
Slide 5. SVM hypothesis
Hypothesis: h_θ(x) = 1 if θ^T x ≥ 0, and h_θ(x) = 0 otherwise.
Slide 6. Support Vector Machines: Large margin intuition (Machine Learning)
Slide 7. Support Vector Machine
If y = 1, we want θ^T x ≥ 1 (not just ≥ 0).
If y = 0, we want θ^T x ≤ −1 (not just < 0).
Slide 8. SVM decision boundary
When C is very large, the optimization drives the cost terms to zero, leaving:
min_θ (1/2) Σ_j θ_j², subject to θ^T x^(i) ≥ 1 whenever y^(i) = 1, and θ^T x^(i) ≤ −1 whenever y^(i) = 0.
Slide 9. SVM decision boundary: linearly separable case
(Plot over features x_1, x_2.) Among the many separating lines, the SVM picks the one with the largest margin to the nearest examples: a large margin classifier.
Slide 10. Large margin classifier in the presence of outliers
(Plot over features x_1, x_2.) With C very large, a single outlier can swing the decision boundary; with C not too large, the boundary stays close to the large-margin solution and is not distorted by the outlier.
Slide 11. Support Vector Machines: The mathematics behind large margin classification (optional) (Machine Learning)
Slide 12. Vector inner product
For u, v ∈ ℝ²: u^T v = u_1 v_1 + u_2 v_2 = p · ‖u‖, where ‖u‖ = √(u_1² + u_2²) is the length of u and p is the signed length of the projection of v onto u.
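The inner-product identity on this slide is easy to check numerically; a small Python sketch (the vectors are illustrative, not from the slides):

```python
import numpy as np

u = np.array([4.0, 3.0])
v = np.array([2.0, 1.0])

# Inner product two ways: componentwise sum, and p * ||u||,
# where p is the signed length of v's projection onto u.
inner = u @ v                       # u1*v1 + u2*v2
p = (u @ v) / np.linalg.norm(u)     # signed projection of v onto u
assert np.isclose(inner, p * np.linalg.norm(u))
```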
Slide 13. SVM decision boundary
With θ_0 = 0 and two features, the objective is (1/2)(θ_1² + θ_2²) = (1/2)‖θ‖², and θ^T x^(i) = p^(i) · ‖θ‖, where p^(i) is the signed projection of x^(i) onto the vector θ.
Slide 14. SVM decision boundary
The constraints become p^(i) · ‖θ‖ ≥ 1 when y^(i) = 1, and p^(i) · ‖θ‖ ≤ −1 when y^(i) = 0. Since the objective makes ‖θ‖ small, the projections p^(i) must be large in magnitude; θ is perpendicular to the decision boundary, so large |p^(i)| means a large margin.
Slide 15. Support Vector Machines: Kernels I (Machine Learning)
Slide 16. Non-linear decision boundary
(Plot over features x_1, x_2.) One option is a high-order polynomial hypothesis: predict y = 1 if θ_0 + θ_1 x_1 + θ_2 x_2 + θ_3 x_1 x_2 + θ_4 x_1² + θ_5 x_2² + … ≥ 0. Is there a different / better choice of the features f_1, f_2, f_3, …?
Slide 17. Kernel
(Plot over features x_1, x_2.) Given x, compute new features depending on proximity to landmarks l^(1), l^(2), l^(3):
f_i = similarity(x, l^(i)) = exp( −‖x − l^(i)‖² / (2σ²) ).
Slide 18. Kernels and similarity
If x ≈ l^(i): f_i ≈ exp(0) ≈ 1.
If x is far from l^(i): f_i ≈ exp(−(large number)² / (2σ²)) ≈ 0.
This similarity function is the Gaussian kernel; each landmark defines one new feature.
Slide 19. Example
f_1 = exp(−‖x − l^(1)‖² / (2σ²)) is a bump centered at the landmark l^(1): it equals 1 at x = l^(1) and falls toward 0 with distance. Larger σ² makes the bump wider (f_1 falls off more slowly); smaller σ² makes it narrower.
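The behavior described on slides 18–19 can be checked numerically; a Python sketch with illustrative landmark and σ values:

```python
import numpy as np

def gaussian_kernel(x, l, sigma):
    # f = exp(-||x - l||^2 / (2 * sigma^2))
    return np.exp(-np.sum((x - l) ** 2) / (2.0 * sigma ** 2))

l1 = np.array([3.0, 5.0])

near = gaussian_kernel(np.array([3.0, 5.0]), l1, sigma=1.0)    # x == l1 -> 1
far = gaussian_kernel(np.array([20.0, -20.0]), l1, sigma=1.0)  # far away -> ~0

# Same point, different sigma: larger sigma -> the feature
# falls off more slowly from the landmark (wider bump).
wide = gaussian_kernel(np.array([5.0, 5.0]), l1, sigma=3.0)
narrow = gaussian_kernel(np.array([5.0, 5.0]), l1, sigma=0.5)
```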
Slide 20. (Plot over features x_1, x_2: the resulting non-linear decision boundary obtained from the landmark features.)
Slide 21. Support Vector Machines: Kernels II (Machine Learning)
Slide 22. Choosing the landmarks
(Plot over features x_1, x_2.) Given x, compute f_i = similarity(x, l^(i)). Predict y = 1 if θ^T f ≥ 0. Where do the landmarks l^(1), l^(2), l^(3), … come from?
Slide 23. SVM with kernels
Given (x^(1), y^(1)), …, (x^(m), y^(m)), choose the landmarks l^(1) = x^(1), l^(2) = x^(2), …, l^(m) = x^(m).
Given example x: compute f_1 = similarity(x, l^(1)), …, f_m = similarity(x, l^(m)); with f_0 = 1 this gives a feature vector f ∈ ℝ^(m+1).
For training example (x^(i), y^(i)): compute f^(i) the same way, using x^(i) in place of x.
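The feature construction on this slide can be sketched as follows (Python; a sketch of the mapping only, not a full SVM solver):

```python
import numpy as np

def gaussian_similarity(x, l, sigma=1.0):
    # Gaussian kernel: exp(-||x - l||^2 / (2 * sigma^2)).
    return np.exp(-np.sum((x - l) ** 2) / (2.0 * sigma ** 2))

def to_kernel_features(x, landmarks, sigma=1.0):
    # f_0 = 1, f_i = similarity(x, l^(i)).  With the landmarks taken
    # to be the m training examples, f lives in R^(m+1).
    f = [1.0] + [gaussian_similarity(x, l, sigma) for l in landmarks]
    return np.array(f)

X_train = np.array([[1.0, 1.0], [2.0, 2.0], [5.0, 5.0]])  # m = 3 examples
landmarks = X_train                                        # l^(i) = x^(i)
f = to_kernel_features(X_train[0], landmarks)
# f has m + 1 = 4 entries; f_1 compares x^(1) with itself, so f_1 = 1.
```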
Slide 24. SVM with kernels
Hypothesis: given x, compute features f ∈ ℝ^(m+1). Predict “y = 1” if θ^T f ≥ 0.
Training: min_θ C Σ_i [ y^(i) cost_1(θ^T f^(i)) + (1 − y^(i)) cost_0(θ^T f^(i)) ] + (1/2) Σ_j θ_j².
Slide 25. SVM parameters
C ( = 1/λ ):
Large C: lower bias, higher variance.
Small C: higher bias, lower variance.
σ²:
Large σ²: features f_i vary more smoothly. Higher bias, lower variance.
Small σ²: features f_i vary less smoothly. Lower bias, higher variance.
Slide 26. Support Vector Machines: Using an SVM (Machine Learning)
Slide 27. Using an SVM
Use an SVM software package (e.g. liblinear, libsvm, …) to solve for the parameters θ. You need to specify:
The choice of parameter C.
The choice of kernel (similarity function), e.g.:
No kernel (“linear kernel”): predict “y = 1” if θ^T x ≥ 0.
Gaussian kernel: f_i = exp(−‖x − l^(i)‖² / (2σ²)), where l^(i) = x^(i); need to choose σ².
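As an example of such a package from Python, scikit-learn wraps libsvm/liblinear, so the choices above map onto `SVC` arguments (a sketch; the dataset is invented for illustration, and `gamma` corresponds to 1/(2σ²) in the slides' notation):

```python
import numpy as np
from sklearn.svm import SVC

# Tiny illustrative dataset: class 1 clustered at the origin, class 0 on
# a ring around it (not linearly separable, so the Gaussian kernel helps).
rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 40)
inner = rng.normal(0, 0.3, (40, 2))                    # y = 1 examples
outer = np.c_[3 * np.cos(angles), 3 * np.sin(angles)]  # y = 0 examples
X = np.vstack([inner, outer])
y = np.r_[np.ones(40), np.zeros(40)]

# "No kernel" = linear kernel; Gaussian kernel = kernel='rbf'.
linear_clf = SVC(kernel="linear", C=1.0).fit(X, y)
rbf_clf = SVC(kernel="rbf", C=1.0, gamma=0.5).fit(X, y)  # gamma = 1/(2*sigma^2)

rbf_acc = rbf_clf.score(X, y)  # rbf separates the ring; linear cannot
```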
Slide 28. Kernel (similarity) functions
A Gaussian kernel function in Octave/MATLAB, taking feature vectors x1 and x2 (sigma must be made available, e.g. as a global variable or an extra argument):

function f = kernel(x1, x2)
  f = exp(-sum((x1 - x2) .^ 2) / (2 * sigma^2));
return

Note: do perform feature scaling before using the Gaussian kernel; otherwise the feature with the largest range dominates ‖x1 − x2‖².
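The feature-scaling note matters because ‖x1 − x2‖² is dominated by whichever feature has the largest range; a Python sketch with invented numbers (house size vs. number of bedrooms):

```python
import numpy as np

# Two houses: size in square feet (large range) and bedrooms (small range).
a = np.array([2000.0, 3.0])
b = np.array([1000.0, 5.0])

raw_sq_dist = np.sum((a - b) ** 2)  # ~1e6: the size term swamps bedrooms

# Scale each feature to a comparable range before applying the kernel.
scale = np.array([1000.0, 2.0])     # e.g. per-feature spread estimates
scaled_sq_dist = np.sum(((a - b) / scale) ** 2)  # both features now count
```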
Slide 29. Other choices of kernel
Note: not all similarity functions make valid kernels. A kernel needs to satisfy a technical condition called Mercer’s Theorem to make sure SVM packages’ optimizations run correctly and do not diverge. Many off-the-shelf kernels are available:
Polynomial kernel: k(x, l) = (x^T l + constant)^degree.
More esoteric: string kernel, chi-square kernel, histogram intersection kernel, …
Slide 30. Multi-class classification
Many SVM packages already have built-in multi-class classification functionality. Otherwise, use the one-vs.-all method: train K SVMs, one to distinguish y = i from the rest (for i = 1, 2, …, K), obtaining θ^(1), …, θ^(K). Pick the class i with the largest (θ^(i))^T x.
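The one-vs.-all recipe can be sketched in Python (scikit-learn's SVC already handles multi-class internally; this manual version just mirrors the slide, with `decision_function` standing in for (θ^(i))^T x, and the three-cluster dataset invented for illustration):

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_one_vs_all(X, y, classes, C=1.0):
    # One binary linear SVM per class: class i vs. the rest.
    return {i: LinearSVC(C=C).fit(X, (y == i).astype(int)) for i in classes}

def predict_one_vs_all(models, x):
    # Pick the class whose classifier gives the largest decision value.
    scores = {i: m.decision_function(x.reshape(1, -1))[0]
              for i, m in models.items()}
    return max(scores, key=scores.get)

# Three well-separated clusters, classes 0, 1, 2.
X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9],
              [0.0, 5.0], [0.1, 5.2]])
y = np.array([0, 0, 1, 1, 2, 2])
models = train_one_vs_all(X, y, classes=[0, 1, 2])
pred = predict_one_vs_all(models, np.array([5.0, 5.1]))
```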
Slide 31. Logistic regression vs. SVMs
Let n = number of features (x ∈ ℝ^(n+1)) and m = number of training examples.
If n is large (relative to m): use logistic regression, or an SVM without a kernel (“linear kernel”).
If n is small and m is intermediate: use an SVM with a Gaussian kernel.
If n is small and m is large: create/add more features, then use logistic regression or an SVM without a kernel.
A neural network is likely to work well for most of these settings, but may be slower to train.