Machine Learning Queens College Lecture 13: SVM Again.

1 Machine Learning Queens College Lecture 13: SVM Again

2 Today –Completion of Support Vector Machines –Project description and topics

3 Support Vectors Support vectors are the input points (vectors) closest to the decision boundary. 1. They are vectors. 2. They “support” the decision hyperplane.

4 Support Vectors Define this as a decision problem. The decision hyperplane needs no fancy math: it is just the equation of a hyperplane.
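
The equation shown on this slide is not preserved in the transcript. Assuming the standard notation (a weight vector w and a bias b, the latter not named anywhere in the transcript), the decision hyperplane and the resulting decision rule are usually written as:

```latex
\[
\vec{w}^{\top}\vec{x} + b = 0,
\qquad
f(\vec{x}) = \mathrm{sign}\!\left(\vec{w}^{\top}\vec{x} + b\right)
\]
```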

5 Support Vectors The decision hyperplane: scale invariance

6 Support Vectors The decision hyperplane: scale invariance. Rescaling the hyperplane parameters by a positive constant changes neither the decision hyperplane nor the support vector hyperplanes, but fixing the scale lets us eliminate a variable from the optimization.
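
A minimal sketch of the scale invariance described on this slide. The weights, bias, and data points are hypothetical toy values, and the normalization at the end (closest point at functional margin 1) is the usual canonical choice, assumed here rather than taken from the transcript:

```python
import numpy as np

def decide(w, b, X):
    """Classify each row of X by which side of the hyperplane w.x + b = 0 it lies on."""
    return np.sign(X @ w + b)

# Hypothetical toy values, chosen only for illustration.
w = np.array([2.0, -1.0])
b = 0.5
X = np.array([[1.0, 1.0], [-1.0, 2.0], [0.0, -3.0], [3.0, 0.5]])

# Multiplying w and b by any positive constant leaves every decision unchanged.
c = 7.0
labels = decide(w, b, X)
labels_scaled = decide(c * w, c * b, X)

# Use that freedom to fix the scale: make the closest point satisfy |w.x + b| = 1.
scale = np.min(np.abs(X @ w + b))
w_canon, b_canon = w / scale, b / scale
closest = np.min(np.abs(X @ w_canon + b_canon))
```

Because the scale is now pinned down, the size of the margin depends only on the norm of the weight vector.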

7 What are we optimizing? We will represent the size of the margin in terms of w. This will allow us to simultaneously: –Identify a decision boundary –Maximize the margin
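
A sketch of how the margin can be written in terms of w. It assumes the canonical scaling in which the closest points satisfy |w·x + b| = 1; the bias b and this normalization are standard conventions, not preserved in the transcript:

```latex
\[
\text{margin} = \frac{2}{\lVert \vec{w} \rVert},
\qquad
\max_{\vec{w},\, b} \frac{2}{\lVert \vec{w} \rVert}
\;\Longleftrightarrow\;
\min_{\vec{w},\, b} \tfrac{1}{2} \lVert \vec{w} \rVert^{2}
\]
```

Both forms are subject to every training point lying on or outside its own support vector hyperplane.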

8 Max Margin Loss Function This is a constrained optimization problem, so we use Lagrange multipliers and optimize the “primal”.
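
The formulas on this slide are not preserved in the transcript. As a hedged reconstruction in the textbook form, assuming targets t_i in {-1, +1}, a bias b, and one multiplier alpha_i per training point (all notation assumed), the primal problem and its Lagrangian are:

```latex
\begin{align*}
\min_{\vec{w},\, b}\; & \tfrac{1}{2} \lVert \vec{w} \rVert^{2}
  \quad \text{subject to } t_i \left( \vec{w}^{\top}\vec{x}_i + b \right) \ge 1 \text{ for all } i \\
L(\vec{w}, b, \alpha) &= \tfrac{1}{2} \lVert \vec{w} \rVert^{2}
  - \sum_{i} \alpha_i \left[ t_i \left( \vec{w}^{\top}\vec{x}_i + b \right) - 1 \right],
  \qquad \alpha_i \ge 0 \\
\frac{\partial L}{\partial \vec{w}} = 0 &\;\Rightarrow\; \vec{w} = \sum_{i} \alpha_i t_i \vec{x}_i,
  \qquad
\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i} \alpha_i t_i = 0
\end{align*}
```

Only the points with alpha_i > 0 contribute to w; these are the support vectors.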

9 Visualization of Support Vectors

10 Interpretability of SVM parameters What else can we tell from the alphas? –If alpha is large, the associated data point is quite important. –It is either an outlier or incredibly important. But this only gives us the best solution for linearly separable data sets…

11 Basis of Kernel Methods The decision process doesn’t depend on the dimensionality of the data, so we can map the data to a higher-dimensional space. Note: data points only appear within a dot product. The error is based on the dot product of data points, not the data points themselves.

12 Basis of Kernel Methods Since data points only appear within a dot product, we can map to another space by replacing each dot product with the dot product of the mapped points. The error is based on the dot product of data points, not the data points themselves.
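
A small illustration of that replacement, using a degree-2 polynomial kernel as an assumed example (the transcript does not say which mapping the slide used). The explicit feature map `phi` and the kernel `poly_kernel` are hypothetical names; the point is that the kernel returns the dot product in the mapped space without ever computing the mapping:

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for a 2-D input (illustrative choice)."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

def poly_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: k(x, z) = (x . z)^2."""
    return float(np.dot(x, z)) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = float(np.dot(phi(x), phi(z)))  # dot product after mapping to 3-D
implicit = poly_kernel(x, z)              # same value, computed in the original 2-D space
```

The two values agree, so any algorithm that touches the data only through dot products can switch spaces by swapping in the kernel.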

13 Learning Theory bases of SVMs Theoretical bounds on testing error. –The upper bound doesn’t depend on the dimensionality of the space. –The upper bound is tightened (lowered) by maximizing the margin, γ, associated with the decision boundary.

14 Why we like SVMs They work. –Good generalization. They are easily interpreted. –The decision boundary is based on the data, in the form of the support vectors; this is not so in multilayer perceptron networks. There are principled bounds on testing error from learning theory (VC dimension).

15 SVM vs. MLP SVMs have many fewer parameters. –SVM: maybe just a kernel parameter. –MLP: number and arrangement of nodes, plus the learning rate eta. SVM training is a convex optimization task. –MLP: the likelihood is non-convex, so training can end in local minima.

16 Soft margin classification There can be outliers that fall on the wrong side of the decision boundary or that force a small margin. Solution: introduce a penalty term into the constraint function.
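
The slide's formula is not preserved in the transcript. A hedged sketch of the usual way to add such a penalty, assuming targets t_i in {-1, +1}, a bias b, slack variables xi_i, and a trade-off constant C (all notation assumed):

```latex
\begin{align*}
\min_{\vec{w},\, b,\, \xi}\; & \tfrac{1}{2} \lVert \vec{w} \rVert^{2} + C \sum_{i} \xi_i \\
\text{subject to}\; & t_i \left( \vec{w}^{\top}\vec{x}_i + b \right) \ge 1 - \xi_i,
  \qquad \xi_i \ge 0 \text{ for all } i
\end{align*}
```

A point with xi_i = 0 respects the margin, 0 < xi_i <= 1 puts it inside the margin but on the correct side, and xi_i > 1 means it is misclassified.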

17 Soft Margin Dual Still quadratic programming!
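
The dual itself is not preserved in the transcript. In the textbook form, assuming targets t_i in {-1, +1}, multipliers alpha_i, and a penalty constant C (all notation assumed), it reads:

```latex
\begin{align*}
\max_{\alpha}\; & \sum_{i} \alpha_i
  - \tfrac{1}{2} \sum_{i} \sum_{j} \alpha_i \alpha_j t_i t_j \, \vec{x}_i^{\top}\vec{x}_j \\
\text{subject to}\; & 0 \le \alpha_i \le C,
  \qquad \sum_{i} \alpha_i t_i = 0
\end{align*}
```

The objective is the same quadratic as in the hard-margin case; only the upper bound C on each alpha_i is new, which is why the problem remains a quadratic program.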

18 Soft margin example Points are allowed within the margin, but a cost is introduced: the hinge loss.
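
A minimal sketch of the hinge loss, assuming targets in {-1, +1} and raw classifier scores w·x + b. The scores below are hypothetical values picked to show each case:

```python
import numpy as np

def hinge_loss(scores, targets):
    """Per-point hinge loss: max(0, 1 - t * score)."""
    return np.maximum(0.0, 1.0 - targets * scores)

# Hypothetical scores: outside the margin, inside the margin,
# on the wrong side, and outside the margin for the negative class.
scores = np.array([2.0, 0.5, -0.3, -1.5])
targets = np.array([1.0, 1.0, 1.0, -1.0])

losses = hinge_loss(scores, targets)
```

Points beyond the margin on the correct side cost nothing, points inside the margin pay a cost below 1, and misclassified points pay more than 1.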

19 Probabilities from SVMs Support Vector Machines are discriminant functions. –Discriminant functions: f(x) = c –Discriminative models: f(x) = argmax_c p(c|x) –Generative models: f(x) = argmax_c p(x|c)p(c)/p(x) No (principled) probabilities from SVMs: SVMs are not based on probability distribution functions of class instances.

20 Efficiency of SVMs Not especially fast. Training: O(n^3), where n is the number of training points. –Quadratic programming efficiency. Evaluation: O(n). –Need to evaluate against each support vector (potentially n).
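
A sketch of why evaluation cost grows with the number of support vectors: each prediction sums one kernel evaluation per support vector. The RBF kernel, the function names, and the toy alphas below are assumptions for illustration, not the output of any real training run:

```python
import numpy as np

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel, used here only as an example kernel."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

def decision_value(x, support_vectors, alphas, targets, b, kernel):
    """f(x) = sum_i alpha_i * t_i * k(x_i, x) + b, one kernel call per support vector."""
    total = 0.0
    for sv, alpha, t in zip(support_vectors, alphas, targets):
        total += alpha * t * kernel(sv, x)
    return total + b

# Hypothetical "trained" model with two support vectors.
support_vectors = np.array([[0.0, 0.0], [2.0, 2.0]])
alphas = np.array([1.0, 1.0])
targets = np.array([1.0, -1.0])
b = 0.0

near_positive = decision_value(np.array([0.0, 0.0]), support_vectors, alphas, targets, b, rbf_kernel)
near_negative = decision_value(np.array([2.0, 2.0]), support_vectors, alphas, targets, b, rbf_kernel)
midpoint = decision_value(np.array([1.0, 1.0]), support_vectors, alphas, targets, b, rbf_kernel)
```

If every training point ends up as a support vector, the loop runs n times per prediction, which is the worst case named on the slide.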

21 Research Projects
Run a machine learning experiment
–Identify a problem/task.
–Find appropriate data.
–Implement one or more ML algorithms.
–Evaluate the performance.
Write a report of the experiment
–4 pages including references
–Abstract: one paragraph describing the experiment
–Introduction: describe the problem/task
–Data: describe the data set, features extracted, cleaning processes
–Method: describe the algorithm/approach
–Results: present and discuss results
–Conclusion: summarize the experiment and results
Teams of two people are acceptable.
–Requires a report from each participant (written independently) describing who was responsible for the components of the work.

22 Sample Problems/Tasks
Vision/Graphics
–Object classification
–Facial recognition
–Fingerprint identification
–Handwriting recognition (non-English languages?)
Language
–Topic classification
–Sentiment analysis
–Speech recognition
–Speaker identification
–Punctuation restoration
–Semantic segmentation
–Recognition of emotion, sarcasm, etc.
–SMS text normalization
–Chat participant ID
–Twitter classification
–Twitter threading

23 Sample Problems/Tasks
Games
–Chess
–Checkers
–Poker
–Blackjack
–Go
Recommenders (Collaborative Filtering)
–Netflix
–Courses
–Jokes
–Books
–Facebook
Video Classification
–Motion classification
–Segmentation

24 ML Topics to explore in the project
L1-regularization
Non-linear kernels
Loopy belief propagation
Non-parametric belief propagation
Soft-decision trees
Analysis of neural network hidden layers
Structured learning
Generalized expectation
One-class learning
Evaluation measures
–Cluster evaluation
–Semi-supervised evaluation
–Skewed data
Graph embedding
Dimensionality reduction
Feature selection
Graphical model construction
Non-parametric Bayesian methods
Latent Dirichlet Allocation
Deep learning: Boltzmann machines
SVM regression

25 Data UCI Machine Learning Repository –http://archive.ics.uci.edu/ml/ Ask me. Collect some of your own.

26 Next Time Kernel Methods

