Linear Methods, cont’d; SVMs intro

Straw poll: Which would you rather do first?
Unsupervised learning: clustering, structure of data, scientific discovery (genomics, taxonomy, etc.)
Reinforcement learning: control, robot navigation, learning behavior

Reminder... Finally, can write the squared-error loss: L(w) = sum_i (y_i - w^T x_i)^2 = ||Xw - y||^2. Want the “best” set of w: the weights that minimize the above. Q: how do you find the minimum of a function w.r.t. some parameter?

Reminder... Derive the vector derivative expressions d/dw (b^T w) = b and d/dw (w^T A w) = (A + A^T) w, then use them to find an expression for the minimum squared error weight vector w in the loss function L(w) = ||Xw - y||^2.

Solution to LSE regression Differentiating the loss function gives grad_w L(w) = 2 X^T X w - 2 X^T y. Setting the gradient to zero yields the normal equation X^T X w = X^T y, so w = (X^T X)^{-1} X^T y.

LSE followup
The quantity X^T X is called a Gram matrix and is positive semidefinite and symmetric.
The quantity (X^T X)^{-1} X^T is the pseudoinverse of X.
The complete “learning algorithm” is 2 whole lines of Matlab code (see the sketch below).
So far, we have a regressor -- it estimates a real-valued y for each x. Can convert it to a classifier by assigning y = +1 or -1 to binary class training data.
Q: How do you handle non-binary data?
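As a rough sketch of those two lines, here is the same pseudoinverse solution in Python/NumPy rather than Matlab; the toy X, y, and test point are made up for illustration:

    import numpy as np

    # Toy training data: rows of X are examples, y holds real-valued targets.
    X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.5]])
    y = np.array([1.0, -1.0, 1.0])

    # The whole "learning algorithm": w = pinv(X) y, i.e. (X^T X)^{-1} X^T y when X^T X is invertible.
    w = np.linalg.pinv(X) @ y
    # Regressor: x @ w; classifier: threshold the output at zero to get a +1/-1 label.
    y_hat = np.sign(np.array([2.0, 1.0]) @ w)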

Handling non-binary data
DTs and k-NN can handle multi-class data; linear discriminants (and many other learners) only work on binary data.
3 ways to “hack” binary classifiers to p-ary data:
1 against many: train p classifiers to recognize “class 1 vs everything else”, “class 2 vs everything else”, ... Cheap and easy, but may drastically unbalance the classes for each classifier. What if two classifiers make different predictions? (See the sketch below.)
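A minimal one-against-many sketch, reusing the least-squares classifier from above; the function names are mine, not from the lecture, and X and labels are assumed to be NumPy arrays:

    import numpy as np

    def train_one_vs_rest(X, labels, classes):
        # One weight vector per class: targets are +1 for that class, -1 for everything else.
        return {c: np.linalg.pinv(X) @ np.where(labels == c, 1.0, -1.0) for c in classes}

    def predict_one_vs_rest(models, x):
        # If several classifiers claim the point (or none do), take the largest raw score.
        return max(models, key=lambda c: x @ models[c])

Taking the largest raw score is one common way around the “two classifiers disagree” problem.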

Multiclass trouble

Handling non-binary data
All against all: train O(p^2) classifiers, one for each pair of classes. Run every test point through all classifiers and take a majority vote for the final label.
More stable than 1 vs many, and the data for each classifier may be more balanced, but there is a lot more overhead, especially for large p, and each classifier is trained on a very small part of the data. (Sketch below.)
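An all-against-all sketch in the same spirit (again, the helper names and pairwise training details are one illustrative choice, not the lecture's code):

    import numpy as np
    from itertools import combinations
    from collections import Counter

    def train_all_vs_all(X, labels, classes):
        # One pairwise classifier per (class a, class b), trained only on examples of those two classes.
        models = {}
        for a, b in combinations(classes, 2):
            mask = np.isin(labels, [a, b])
            models[(a, b)] = np.linalg.pinv(X[mask]) @ np.where(labels[mask] == a, 1.0, -1.0)
        return models

    def predict_all_vs_all(models, x):
        # Majority vote over all O(p^2) pairwise decisions.
        votes = Counter(a if x @ w >= 0 else b for (a, b), w in models.items())
        return votes.most_common(1)[0][0]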

Handling non-binary data
Coding theory approach: given p classes, choose b ≥ lg(p) and assign each class a b-bit “code word”. Train one classifier for each bit. Apply each classifier to a test instance => new code => reconstruct the class.
Example: a fruit data set with features x1, x2, x3 and class y in {apple, lemon, banana, grape, pear}; the single class column y is replaced by three binary columns y1, y2, y3 holding each class's code word, and one binary classifier is trained per column. (Sketch below.)
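A sketch of the code-word idea on that fruit example; the specific 3-bit codes below are invented for illustration, and any set of distinct codes would do:

    import numpy as np

    # Illustrative 3-bit code words for p = 5 classes (b >= lg p).
    codes = {"apple": (0, 0, 0), "lemon": (0, 0, 1), "banana": (0, 1, 0),
             "grape": (0, 1, 1), "pear": (1, 0, 0)}

    def train_bit_classifiers(X, labels):
        # One least-squares classifier per bit position of the code word.
        targets = np.array([codes[c] for c in labels], dtype=float)
        return [np.linalg.pinv(X) @ np.where(targets[:, b] == 1, 1.0, -1.0) for b in range(3)]

    def predict_by_code(bit_models, x):
        # Predict each bit, then pick the class whose code word is closest in Hamming distance.
        pred = tuple(1 if x @ w >= 0 else 0 for w in bit_models)
        return min(codes, key=lambda c: sum(p != q for p, q in zip(pred, codes[c])))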

Support Vector Machines

Linear separators are nice... but what if your data looks like this? (Figure: a linearly nonseparable data set.)

Linearly nonseparable data
2 possibilities:
Use nonlinear separators (a different hypothesis space), possibly the intersection of multiple linear separators, etc. (e.g., a decision tree).
Change the data: apply a nonlinear projection of the data.
These turn out to be flip sides of each other. Easier to think about (do the math for) the 1st case.

Nonlinear data projection
Suppose you have a “projection function” φ: R^d → R^D from the original feature space to a “projected” space. Usually D >> d. Do learning with a linear model in the projected space.
Ex: a degree-2 polynomial map such as φ([x1, x2]) = [x1, x2, x1*x2, x1^2, x2^2].

Common projections
Degree-k polynomials: φ(x) contains all monomials of the components of x up to degree k, e.g. [1, x1, x2, x1*x2, x1^2, x2^2, ...].
Fourier expansions: φ(x) contains sinusoids of the components of x at multiple frequencies, e.g. [sin(x1), cos(x1), sin(2*x1), cos(2*x1), ...].
(A degree-2 sketch follows.)
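A small sketch of a degree-2 polynomial projection for 2-D inputs (one standard choice of feature map, not necessarily the one used in lecture):

    import numpy as np

    def phi_poly2(x):
        # Degree-2 polynomial projection of a 2-D point: all monomials up to degree 2.
        x1, x2 = x
        return np.array([1.0, x1, x2, x1 * x2, x1 ** 2, x2 ** 2])

    # Weights that implement "inside the unit circle" as a *linear* rule in the projected space:
    # score = 1 - x1^2 - x2^2, positive inside the circle, negative outside.
    w = np.array([1.0, 0.0, 0.0, 0.0, -1.0, -1.0])
    print(np.sign(w @ phi_poly2(np.array([0.5, -0.5]))))   # +1: inside the circle
    print(np.sign(w @ phi_poly2(np.array([2.0, 0.0]))))    # -1: outside the circle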

Example nonlinear surfaces (SVM images from lecture notes by S. Dreiseitl).

The catch... How many dimensions does φ(x) have? For degree-k polynomial expansions of d-dimensional inputs, the projected space has C(d+k, k) dimensions, roughly d^k of them. E.g., for k=4, d=256 (16x16 images), that is on the order of 10^8 dimensions. Yike! For “radial basis functions”, the projected space is infinite-dimensional.
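To make that blow-up concrete, a quick check of the count for k=4, d=256 using the standard monomial-counting formula:

    from math import comb

    d, k = 256, 4
    # Number of monomials of degree <= k in d variables: C(d+k, k).
    print(comb(d + k, k))   # 186043585 -- roughly 1.9e8 projected dimensions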