Recent Results in Support Vector Machines
Dave Musicant

Graphic generated with Lucent Technologies Demonstration 2-D Pattern Recognition Applet at

Slide 2: Simple Linear Perceptron
[Figure: training points in two classes, labeled "Class 1" and "Class -1", separated by a line]
Goal: find the best line (or hyperplane) to separate the training data. How do we formalize this?
–In two dimensions, the equation of the line is given by: w_1 x_1 + w_2 x_2 + b = 0.
–Better notation for n dimensions: treat each data point x and the coefficients w as vectors. Then the equation is given by: w^T x + b = 0.

Slide 3: Simple Linear Perceptron (cont.)
The simple linear perceptron is a classifier, as shown in the picture:
–Points that fall on the right of the line are classified as "1".
–Points that fall on the left are classified as "-1".
Therefore, using the training set, find a hyperplane (line) so that w^T x_i + b > 0 for points in class 1 and w^T x_i + b < 0 for points in class -1.
This is a good starting point, but we can do better!
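This decision rule fits in a few lines of code. A minimal sketch in Python with NumPy (the weights w and offset b are assumed to come from training; the toy values below are invented):

```python
import numpy as np

def perceptron_predict(X, w, b):
    """Classify each row of X as +1 or -1 using the hyperplane w^T x + b = 0."""
    scores = X @ w + b                  # positive on one side of the plane, negative on the other
    return np.where(scores >= 0, 1, -1)

# Toy usage with the 2-D line x1 + x2 - 1 = 0:
w, b = np.array([1.0, 1.0]), -1.0
X = np.array([[2.0, 2.0], [-1.0, -1.0]])
print(perceptron_predict(X, w, b))      # [ 1 -1]
```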

Slide 4: Finding the Best Plane
Not all separating planes are equal. Which of the two planes shown is better?
–Both planes classify the training set perfectly.
–The solid green plane is the better choice: it is further away from the data, so it is more likely to do well on future test data.

Slide 5: Separating the Planes
Construct the bounding planes:
–Draw two planes parallel to the classification plane.
–Push them as far apart as possible, until they hit data points.
–The classification plane whose bounding planes are furthest apart is the best one.

Slide 6: Recap: Finding the Best Plane
Details:
–All points in class 1 should be on or to the right of bounding plane 1: w^T x_i + b >= 1.
–All points in class -1 should be on or to the left of bounding plane -1: w^T x_i + b <= -1.
–Pick y_i to be +1 or -1 depending on the classification. Then the above two inequalities can be written as one: y_i (w^T x_i + b) >= 1.
–The distance between the bounding planes should be maximized.
–The distance between the bounding planes is given by 2 / ||w||.
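A one-line justification of that distance formula (standard, though not spelled out on the slide):

```latex
% Distance between the parallel planes w^T x + b = 1 and w^T x + b = -1:
% for parallel planes w^T x = c_1 and w^T x = c_2 it is |c_1 - c_2| / \|w\|.
d = \frac{|(1-b) - (-1-b)|}{\|w\|} = \frac{2}{\|w\|}
```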

Slide 7: The Optimization Problem
The previous slide can be rewritten as: minimize (1/2) ||w||^2 over w and b, subject to y_i (w^T x_i + b) >= 1 for all training points i. (Maximizing the margin 2/||w|| is equivalent to minimizing ||w||^2 / 2.)
This is a mathematical program:
–an optimization problem subject to constraints;
–more specifically, a quadratic program (quadratic objective, linear constraints);
–there are high-powered software tools for solving this kind of problem (both commercial and academic);
–but these general-purpose tools are slow for this particular problem.
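To make the quadratic program concrete, here is a minimal sketch using the cvxpy modeling library (my choice of tool, not the slides'; the separable toy data is invented):

```python
import cvxpy as cp
import numpy as np

# Two well-separated point clouds, so a hard-margin plane exists.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2, 0.5, (20, 2)), rng.normal(-2, 0.5, (20, 2))])
y = np.hstack([np.ones(20), -np.ones(20)])

w = cp.Variable(2)
b = cp.Variable()
objective = cp.Minimize(0.5 * cp.sum_squares(w))   # equivalent to maximizing 2/||w||
constraints = [cp.multiply(y, X @ w + b) >= 1]     # y_i (w^T x_i + b) >= 1
cp.Problem(objective, constraints).solve()
print(w.value, b.value)
```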

Slide 8: Data Which Is Not Linearly Separable
What if a separating plane does not exist? Find the plane that maximizes the margin and minimizes the errors on the training points.
Take the original inequality and add a nonnegative slack variable ξ_i to measure the error: y_i (w^T x_i + b) >= 1 - ξ_i, with ξ_i >= 0.

Slide 9: The Support Vector Machine
Push the planes apart and minimize the error at the same time: minimize (1/2) ||w||^2 + C * sum_i ξ_i, subject to y_i (w^T x_i + b) >= 1 - ξ_i and ξ_i >= 0.
C is a positive number chosen to balance these two goals. This problem is called a Support Vector Machine, or SVM.
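In practice one rarely writes this program out by hand. A sketch using scikit-learn's SVC (the overlapping toy data and the value of C are invented for illustration):

```python
import numpy as np
from sklearn.svm import SVC

# Overlapping classes: no separating plane exists, so slack is needed.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1, 1.0, (50, 2)), rng.normal(-1, 1.0, (50, 2))])
y = np.hstack([np.ones(50), -np.ones(50)])

# Large C punishes training errors (narrower margin); small C tolerates
# errors in exchange for a wider margin.
clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(clf.score(X, y))   # training accuracy
```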

Slide 10: Terminology
Those points that touch a bounding plane, or lie on the wrong side of it, are called support vectors. If all the data points except the support vectors were removed, the solution would turn out the same. The SVM is mathematically equivalent to a force and torque equilibrium (hence the name "support vectors").
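Continuing the scikit-learn sketch above, a fitted model exposes its support vectors directly, which makes the removal claim easy to check:

```python
print(clf.support_vectors_.shape)   # the points that actually determine the plane
print(clf.n_support_)               # number of support vectors in each class

# Refitting on the support vectors alone yields (essentially) the same plane.
sv_clf = SVC(kernel="linear", C=1.0).fit(clf.support_vectors_, y[clf.support_])
print(sv_clf.coef_, clf.coef_)      # the coefficients should agree closely
```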

Slide 11: Example from Carleton College
–1,850 students
–4-year undergraduate liberal arts college
–Ranked 4th in the nation by US News and World Report
–computer science majors per year
–All research assistants are full-time undergraduates

Slide 12: Student Research Example
Goal: automatically generate a "frequently asked questions" list from discussion groups.
Subgoal #1: given a corpus of discussion-group postings, identify those messages that contain questions.
–Recruit student volunteers to label the questions.
–Learn a classifier from the labeled data.
Work by students Sarah Allen, Janet Campbell, Ester Gubbrud, Rachel Kirby, and Lillie Kittredge.

Slide 13: Building a Training Set

Slide 14: Building a Training Set
Which sentences are questions in the following text?

From: (Wonko the Sane)
I was recently talking to a possible employer ( mine! :-) ) and he made a reference to a 48-bit graphics computer/image processing system. I seem to remember it being called IMAGE or something akin to that. Anyway, he claimed it had 48-bit color + a 12-bit alpha channel. That's 60 bits of info--what could that possibly be for? Specifically the 48-bit color? That's 280 trillion colors, many more than the human eye can resolve. Is this an anti-aliasing thing? Or is this just some magic number to make it work better with a certain processor.

Slide 15: Representing the Training Set
–Each document is a point.
–Each potential word is a column (the "bag of words" representation).
–Other pre-processing tricks:
–Remove punctuation.
–Remove "stop words" such as "is", "a", etc.
–Use stemming to strip suffixes such as "ing" and "ed", so that similar words share a column.
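A sketch of this representation with scikit-learn's CountVectorizer (the two toy documents are invented; note that stemming is not built in and would need an external stemmer such as NLTK's):

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["Is this an anti-aliasing thing?",
        "He claimed it had 48-bit color."]

# Tokenization drops punctuation; stop_words="english" removes "is", "a", etc.
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)        # one row per document, one column per word
print(vectorizer.get_feature_names_out())
print(X.toarray())
```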

Slide 16: Results
–If you just guess brain-dead ("every message contains a question"), you get 55% right.
–If you use a Support Vector Machine, you get 66.5% right.
What words do you think were strong indicators of questions?
–anyone, does, any, what, thanks, how, help, know, there, do, question
What words do you think were strong contra-indicators of questions?
–re, sale, m, references, not, your

Slide 17: Nonlinear SVMs
Some datasets may not be best separated by a plane. How can we build nonlinear separating surfaces?
Simple method: map the data into a higher-dimensional space, and do the same thing we have already done.
Generated with Lucent Technologies Demonstration 2-D Pattern Recognition Applet at

Slide 18: Finding Nonlinear Surfaces
How do we modify the algorithm to find nonlinear surfaces?
First idea (simple and effective): map each data point into a higher-dimensional space, and find a linear fit there.
Example: to find a quadratic surface for x = (x_1, x_2), map each point to the new coordinates (x_1, x_2, x_1^2, x_1 x_2, x_2^2).
–Use the new coordinates in a regular linear SVM.
–A plane in this quadratic space is equivalent to a quadratic surface in our original space.
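A sketch of this explicit mapping (the circular toy dataset is invented; the expansion matches the quadratic coordinates above):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import LinearSVC

# A circle-shaped class boundary: not linearly separable in the original 2-D space.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (200, 2))
y = np.where(X[:, 0]**2 + X[:, 1]**2 < 0.5, 1, -1)

# (x1, x2) -> (x1, x2, x1^2, x1*x2, x2^2)
Phi = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
clf = LinearSVC(C=1.0).fit(Phi, y)   # an ordinary linear SVM in the quadratic space
print(clf.score(Phi, y))             # a plane here = a quadratic surface in 2-D
```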

Slide 19: Problems with This Method
If the dimensionality of the space is high, there are lots of calculations:
–for a high-degree polynomial space, the number of coordinate combinations explodes;
–all these calculations must be done for every training point, and again for each testing point;
–infinite-dimensional spaces are impossible to compute explicitly.
Nonlinear surfaces can be used without these problems through the use of a kernel function.
–Demonstration:
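A sketch of the kernel route, continuing the toy circle data above: scikit-learn's SVC never forms the high-dimensional coordinates, it only evaluates the polynomial kernel k(x, z) = (gamma * x^T z + r)^d between pairs of points:

```python
from sklearn.svm import SVC

# Same data as the explicit-mapping sketch, but the quadratic space stays implicit.
clf = SVC(kernel="poly", degree=2, coef0=1.0, C=1.0).fit(X, y)
print(clf.score(X, y))   # comparable accuracy, without ever building Phi
```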

Slide 20: Example: Checkerboard

Slide 21: 5-Nearest Neighbor

Slide 22: Sixth-Degree Polynomial Kernel
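Slides 20-22 showed figures only: the checkerboard dataset, then the decision regions found by 5-nearest-neighbor and by an SVM with a sixth-degree polynomial kernel. A rough reconstruction of that comparison (entirely a sketch; the dataset and parameters are invented):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# A 4x4 checkerboard: class is determined by which square a point lands in.
rng = np.random.default_rng(0)
X = rng.uniform(0, 4, (2000, 2))
y = np.where((np.floor(X[:, 0]) + np.floor(X[:, 1])) % 2 == 0, 1, -1)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
svm = SVC(kernel="poly", degree=6, coef0=1.0, C=10.0).fit(X_tr, y_tr)
print("5-NN:              ", knn.score(X_te, y_te))
print("degree-6 poly SVM: ", svm.score(X_te, y_te))
```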