CHAPTER 10: Linear Discrimination


CHAPTER 10: Linear Discrimination Eick/Alpaydin: Topic 13

Likelihood- vs. Discriminant-based Classification
Likelihood-based: Assume a model for p(x|Ci), use Bayes' rule to calculate P(Ci|x); gi(x) = log P(Ci|x)
Discriminant-based: Assume a model for gi(x|Φi); no density estimation
Estimating the boundaries is enough; no need to accurately estimate the densities inside the boundaries

Linear Discriminant
Linear discriminant: gi(x) = wiTx + wi0, a weighted sum of the attributes plus a bias
Advantages:
Simple: O(d) space/computation
Knowledge extraction: weighted sum of attributes; positive/negative weights and their magnitudes are interpretable (e.g., credit scoring)
Optimal when the p(x|Ci) are Gaussian with a shared covariance matrix; useful when classes are (almost) linearly separable
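A minimal sketch of evaluating such a discriminant in Python; the weights w and bias w0 below are made-up illustrative values, not learned from any particular dataset.

import numpy as np

w = np.array([0.8, -0.5])   # hypothetical weights (one per attribute)
w0 = 0.1                    # hypothetical bias term

def g(x):
    """Linear discriminant: weighted sum of the attributes plus a bias."""
    return w @ x + w0

x = np.array([1.2, 0.4])
label = "C1" if g(x) > 0 else "C2"
print(g(x), label)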

Generalized Linear Model
Quadratic discriminant: gi(x) = xTWix + wiTx + wi0
Higher-order (product) terms, e.g. z1 = x1, z2 = x2, z3 = x1², z4 = x2², z5 = x1x2
Map from x to z using nonlinear basis functions and use a linear discriminant in z-space
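As an illustration (the weights v, v0 below are arbitrary assumptions), a discriminant that is linear in the quadratic z-space but non-linear in the original x-space:

import numpy as np

def phi(x):
    """Nonlinear basis functions mapping x = (x1, x2) to z-space."""
    x1, x2 = x
    return np.array([x1, x2, x1 * x1, x2 * x2, x1 * x2])

v = np.array([0.5, -0.3, 1.0, 1.0, -2.0])  # hypothetical weights in z-space
v0 = -0.7                                  # hypothetical bias

def g(x):
    return v @ phi(x) + v0   # linear in z, quadratic in x

print(g(np.array([0.5, 1.5])))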

Two Classes
For two classes a single discriminant suffices: g(x) = g1(x) − g2(x) = wTx + w0; choose C1 if g(x) > 0 and C2 otherwise.

Geometry
The weight vector w is normal to the hyperplane g(x) = wTx + w0 = 0; w0 determines its offset from the origin, and the signed distance of a point x to the hyperplane is g(x)/||w||.

Support Vector Machines One Possible Solution

Support Vector Machines Another possible solution

Support Vector Machines Other possible solutions

Support Vector Machines Which one is better? B1 or B2? How do you define better?

Support Vector Machines Find the hyperplane that maximizes the margin => B1 is better than B2

Support Vector Machines Examples are (x1,..,xn,y) with y ∈ {-1, 1}

Support Vector Machines
We want to maximize the margin 2/||w||, which is equivalent to minimizing ||w||²/2, subject to the following N constraints: yi(w·xi + b) ≥ 1 for i = 1,..,N.
This is a constrained convex quadratic optimization problem that can be solved in polynomial time.
Numerical approaches to solve it (e.g., quadratic programming) exist.
The function to be optimized has only a single (global) minimum, so there is no local-minimum problem.
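A hedged sketch of solving this in practice, using scikit-learn rather than writing out the quadratic program by hand; the toy data and the very large C (which approximates the hard-margin, linearly separable case) are assumptions for illustration only.

import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],        # class +1
              [-1.0, -1.0], [-2.0, -1.5], [-1.5, -2.0]])  # class -1
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6)   # very large C ~ hard margin
clf.fit(X, y)
print(clf.coef_, clf.intercept_)    # the learned w and b
print(clf.predict([[0.5, 0.5]]))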

Support Vector Machines What if the problem is not linearly separable?

Linear SVM for Non-linearly Separable Problems
What if the problem is not linearly separable? Introduce slack variables ξi: a slack variable allows a constraint to be violated to a certain degree, and Σξi measures the prediction error.
Need to minimize ||w||²/2 + C·Σξi (the first term is the inverse size of the margin between the hyperplanes, the second the prediction error, traded off by the parameter C), subject to (i = 1,..,N): yi(w·xi + b) ≥ 1 − ξi and ξi ≥ 0.
C is chosen using a validation set, trying to keep the margins wide while keeping the training error low.
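A sketch of the validation-set selection of C that the slide describes; the synthetic two-blob data and the candidate grid of C values are assumptions made for the example, and scikit-learn is assumed to be available.

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2) + [2, 2], rng.randn(50, 2) - [2, 2]])
y = np.array([1] * 50 + [-1] * 50)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

best_C, best_acc = None, -1.0
for C in [0.01, 0.1, 1, 10, 100]:    # small C: wider margin, larger C: lower training error
    acc = SVC(kernel="linear", C=C).fit(X_tr, y_tr).score(X_val, y_val)
    if acc > best_acc:
        best_C, best_acc = C, acc
print(best_C, best_acc)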

Nonlinear Support Vector Machines What if the decision boundary is not linear? Alternative 1: use a technique that employs non-linear decision boundaries directly (a non-linear function).

Nonlinear Support Vector Machines Alternative 2: Transform the data into a higher dimensional attribute space and find linear decision boundaries (the best hyperplane) in this space using the methods introduced earlier.

Nonlinear Support Vector Machines
Choose a non-linear mapping F to transform the examples into a different, usually higher dimensional, attribute space.
Minimize ||w||²/2, but subject to the following N constraints: yi(w·F(xi) + b) ≥ 1 for i = 1,..,N.
Find a good hyperplane in the transformed space.

Example: Polynomial Kernel Function
F(x1,x2) = (x1², x2², sqrt(2)·x1·x2, sqrt(2)·x1, sqrt(2)·x2, 1)
K(u,v) = F(u)·F(v) = (u·v + 1)²
A support vector machine with a polynomial kernel function classifies a new example z as follows:
sign(Σ λi·yi·F(xi)·F(z) + b) = sign(Σ λi·yi·(xi·z + 1)² + b)
Remark: the λi and b are determined using the methods for linear SVMs that were discussed earlier.
Kernel function trick: perform the computations in the original space, although we solve an optimization problem in the transformed space => more efficient. More details: Topic 14.
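A quick numerical check, not from the slides, that this explicit mapping F really reproduces the kernel K(u,v) = (u·v + 1)² for 2-D inputs:

import numpy as np

def F(x):
    """Explicit feature map for the degree-2 polynomial kernel in 2-D."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([x1 * x1, x2 * x2, s * x1 * x2, s * x1, s * x2, 1.0])

def K(u, v):
    return (u @ v + 1.0) ** 2

u, v = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(F(u) @ F(v), K(u, v))   # both print 4.0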

Summary Support Vector Machines
Support vector machines learn hyperplanes that separate two classes while maximizing the margin between them (the empty space between the instances of the two classes).
Support vector machines introduce slack variables in the case that the classes are not linearly separable, trying to maximize the margins while keeping the training error low.
The most popular versions of SVMs use non-linear kernel functions to map the attribute space into a higher dimensional space, to facilitate finding "good" linear decision boundaries in the modified space.
Support vector machines find "margin optimal" hyperplanes by solving a convex quadratic optimization problem. However, this optimization process is quite slow, and support vector machines tend to become impractical once the number of examples goes beyond a few thousand (500/2000/5000, depending on the implementation).
In general, support vector machines achieve quite high accuracy compared to other techniques.

Useful Support Vector Machine Links
Lecture notes that are helpful for understanding the basic ideas:
http://www.ics.uci.edu/~welling/teaching/KernelsICS273B/Kernels.html
http://cerium.raunvis.hi.is/~tpr/courseware/svm/kraekjur.html
Tools that are often used in publications:
libsvm: http://www.csie.ntu.edu.tw/~cjlin/libsvm/
spider: http://www.kyb.tuebingen.mpg.de/bs/people/spider/index.html
Tutorial slides: http://www.support-vector.net/icml-tutorial.pdf
Surveys: http://www.svms.org/survey/Camp00.pdf
More general material:
http://www.learning-with-kernels.org/
http://www.kernel-machines.org/
http://kernel-machines.org/publications.html
http://www.support-vector.net/tutorial.html
Remark: Thanks to Chaofan Sun for providing these links!

Optimal Separating Hyperplane
Alpaydin transparencies on Support Vector Machines—not used in lecture! (Cortes and Vapnik, 1995; Vapnik, 1995)
The following slides are from: Lecture Notes for E. Alpaydın, Introduction to Machine Learning, © 2004 The MIT Press (V1.1).

Margin
Distance from the discriminant to the closest instances on either side.
Distance of xt to the hyperplane is |wTxt + w0| / ||w||.
We require yt(wTxt + w0) / ||w|| ≥ ρ for all t.
For a unique solution, fix ρ||w|| = 1; then, to maximize the margin, minimize ||w||.
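A minimal sketch of the distance formula with made-up numbers; the signed version (without the absolute value) also keeps track of which side of the hyperplane the point lies on.

import numpy as np

w = np.array([2.0, 1.0])   # hypothetical hyperplane parameters
w0 = -1.0

def signed_distance(x):
    return (w @ x + w0) / np.linalg.norm(w)

for x in [np.array([1.0, 1.0]), np.array([0.0, 0.0])]:
    print(x, signed_distance(x))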

Writing the problem as minimizing ||w||²/2 subject to yt(wTxt + w0) ≥ 1, introducing a Lagrange multiplier αt for each constraint, and switching to the dual gives: maximize Σt αt − ½ Σt Σs αt αs yt ys (xt)Txs subject to Σt αt yt = 0 and αt ≥ 0, with the solution w = Σt αt yt xt.

Most αt are 0 and only a small number have αt > 0; they are the support vectors.
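To see this in practice, a small sketch assuming scikit-learn is available; the non-zero αt·yt values are exposed as dual_coef_ and the corresponding instances as support_vectors_ (the two-blob data is synthetic and illustrative only).

import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(1)
X = np.vstack([rng.randn(100, 2) + [2, 0], rng.randn(100, 2) - [2, 0]])
y = np.array([1] * 100 + [-1] * 100)

clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(len(X), "training points,", len(clf.support_vectors_), "support vectors")
print(clf.dual_coef_)   # the non-zero alpha^t * y^t values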

Soft Margin Hyperplane
If the data is not linearly separable, allow a soft error: require yt(wTxt + w0) ≥ 1 − ξt with slack variables ξt ≥ 0, and use Σt ξt as the soft error.
The new primal is: minimize ||w||²/2 + C Σt ξt subject to these constraints.

Kernel Machines
Preprocess input x by basis functions: z = φ(x), g(z) = wTz, so g(x) = wTφ(x).
The SVM solution is w = Σt αt yt φ(xt), so g(x) = Σt αt yt φ(xt)Tφ(x) = Σt αt yt K(xt, x).

Kernel Functions
Polynomials of degree q: K(xt, x) = (xTxt + 1)^q
Radial-basis functions: K(xt, x) = exp(−||xt − x||² / (2s²))
Sigmoidal functions: K(xt, x) = tanh(2xTxt + 1)
(Cherkassky and Mulier, 1998)
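A sketch of the three kernel families on a toy data matrix, using scikit-learn's pairwise kernel helpers; the gamma/degree/coef0 values are chosen only to match the formulas above and are not prescribed by the slides.

import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel, sigmoid_kernel

X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])

print(polynomial_kernel(X, degree=2, gamma=1.0, coef0=1.0))  # (x.x' + 1)^2
print(rbf_kernel(X, gamma=0.5))                              # exp(-||x - x'||^2 / (2 s^2)) with s = 1
print(sigmoid_kernel(X, gamma=2.0, coef0=1.0))               # tanh(2 x.x' + 1)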

SVM for Regression
Use a linear model (possibly kernelized): f(x) = wTx + w0
Use the ε-insensitive error function: eε(rt, f(xt)) = 0 if |rt − f(xt)| < ε, and |rt − f(xt)| − ε otherwise, so errors smaller than ε are ignored.
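A sketch of kernelized support vector regression with the ε-insensitive loss, assuming scikit-learn; the noisy 1-D sine data and the hyperparameters C and epsilon are illustrative assumptions.

import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 5, size=(40, 1)), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(40)

reg = SVR(kernel="rbf", C=10.0, epsilon=0.1)   # residuals smaller than epsilon cost nothing
reg.fit(X, y)
print(reg.predict([[1.0], [2.5]]))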