CSSE463: Image Recognition Day 15


CSSE463: Image Recognition Day 15

Today:
- Your feedback: projects/labs reinforce theory; interesting examples, topics, presentation; favorite class; enjoying. Lecture and assignments OK or slightly too fast.
- Project intro
- Wrap up SVM and do demo, start Lab 5 on your own

Tuesday: neural nets
Thursday: lightning talks (see other slides now), Lab 5 due
Friday: lab for sunset detector
7th week: mid-term exam

Review: SVMs: "best" decision boundary
- The "best" hyperplane is the one that maximizes the margin between the classes.
- Equivalent to a constrained optimization problem: solve using quadratic programming.
(Figure: two classes separated by a hyperplane, with the margin labeled.)
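For reference, the standard textbook form of that optimization (the slide's exact notation may differ):

```latex
\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\|\mathbf{w}\|^{2}
\quad \text{subject to} \quad
y_i\,(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1,\qquad i = 1,\dots,N
```

Maximizing the margin 2/||w|| is the same as minimizing ||w||^2, which gives a convex quadratic objective with linear constraints, exactly the kind of problem quadratic-programming solvers handle.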

Non-separable data
- Allow data points to be misclassified, but assign a cost to each misclassified point.
- The cost is bounded by the parameter C (which you can set).
- You can set different bounds for each class. Why?
  - So you can weigh false positives and false negatives differently. (A small sketch follows below.)
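A minimal sketch of these two ideas using MATLAB's built-in fitcsvm (an assumption of convenience: this is not the lab's downloaded SVM code, and Xtrain/ytrain are hypothetical toy data):

```matlab
% Toy 2-class data (hypothetical stand-in for your own feature vectors).
Xtrain = [randn(20,2) + 1; randn(20,2) - 1];
ytrain = [ones(20,1); -ones(20,1)];

% Soft-margin SVM: the box constraint C bounds the penalty each
% misclassified point can contribute.
model = fitcsvm(Xtrain, ytrain, 'KernelFunction', 'linear', ...
                'BoxConstraint', 10);

% Different bounds per class: charge false negatives (true +1 predicted -1)
% five times as much as false positives, via a misclassification-cost matrix
% (rows = true class, columns = predicted class, ordered as in ClassNames = [-1, 1]).
modelAsym = fitcsvm(Xtrain, ytrain, 'KernelFunction', 'linear', ...
                    'BoxConstraint', 10, 'Cost', [0 1; 5 0]);
```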

Can we do better?
- Cover's Theorem from information theory says that we can map nonseparable data in the input space to a feature space where the data is separable, with high probability, if:
  - the mapping is nonlinear, and
  - the feature space has a higher dimension.
- The mapping is called a kernel function.
- Replace every instance of x_i . x_j in the derivation with K(x_i, x_j).
- Lots of math would follow here to show it works.
- Example: separate x1 XOR x2 by adding a dimension x3 = x1*x2 (a small sketch of this follows below).
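A minimal sketch of the XOR example in plain MATLAB (the +/-1 coordinate encoding is an assumption; the slide does not specify one): the four XOR points cannot be separated by a line in 2-D, but after adding the product feature x3 = x1*x2 a single threshold on x3 separates them.

```matlab
% XOR points: class +1 when x1 and x2 differ, class -1 when they agree.
X = [ 1  1;    % class -1
     -1 -1;    % class -1
      1 -1;    % class +1
     -1  1];   % class +1
y = [-1; -1; 1; 1];

% No line in the (x1, x2) plane separates the classes, but the added
% dimension x3 = x1 .* x2 does: x3 = +1 for class -1 and -1 for class +1.
x3 = X(:,1) .* X(:,2);
predicted = -sign(x3);          % threshold the new dimension at zero
disp(isequal(predicted, y));    % prints 1: perfectly separated
```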

Most common kernel functions
- Polynomial
- Gaussian radial-basis function (RBF)
- Two-layer perceptron
- You choose the parameters: p, σ, or β_i.
- My experience with real data: use the Gaussian RBF!
(Diagram: kernel choice vs. difficulty of problem, from easy to hard: p=1, p=2, higher p, RBF.)
Q5
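The usual textbook forms of these three kernels, written as MATLAB anonymous functions (a hedged reconstruction: the slide's exact parameterization may differ slightly):

```matlab
% x and z are row vectors (feature vectors) of the same length.
polyKernel = @(x, z, p)       (x*z' + 1).^p;                       % polynomial of degree p
rbfKernel  = @(x, z, sigma)   exp(-sum((x - z).^2) / (2*sigma^2)); % Gaussian RBF, width sigma
mlpKernel  = @(x, z, b0, b1)  tanh(b0*(x*z') + b1);                % two-layer perceptron (sigmoid)
```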

Demo
- Software courtesy of (GNU public license).
- Lab 5 (start today!):
  - Download the Matlab functions that train and apply the SVM.
  - The demo script contains examples of how to call the system.
  - Write a similar script to classify data in another toy problem (a rough sketch of the workflow follows below).
  - Directly applicable to the sunset detector.
Q1-2
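A rough sketch of what such a script looks like. This uses MATLAB's built-in fitcsvm/predict rather than the lab's downloaded training/apply functions (whose names and arguments will differ), and the toy data is invented, so treat it only as an outline of the train/test workflow:

```matlab
% Toy 2-class problem: two Gaussian blobs (hypothetical data).
Xtrain = [randn(50,2) + 2; randn(50,2) - 2];
ytrain = [ones(50,1); -ones(50,1)];
Xtest  = [randn(20,2) + 2; randn(20,2) - 2];
ytest  = [ones(20,1); -ones(20,1)];

% Train an RBF-kernel SVM, then apply it to held-out data.
model = fitcsvm(Xtrain, ytrain, 'KernelFunction', 'rbf', ...
                'BoxConstraint', 1, 'KernelScale', 1);
ypred = predict(model, Xtest);
fprintf('Test accuracy: %.2f%%\n', 100 * mean(ypred == ytest));
```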

Kernel functions
- Note that a hyperplane (which by definition is linear) in the feature space corresponds to a nonlinear boundary in the input space.
- Recall the RBFs.
- Note how the choice of σ affects the classifier (a small experiment follows below).
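One way to see the effect of σ, again sketched with the built-in fitcsvm on invented toy data (MATLAB's KernelScale plays the role of the RBF width here, which is an approximation of the slide's σ):

```matlab
% Toy 2-D data (hypothetical), just to have something to train on.
Xtrain = [randn(50,2) + 2; randn(50,2) - 2];
ytrain = [ones(50,1); -ones(50,1)];

% Sweep the RBF width and watch the model complexity change.
for sigma = [0.1 1 10]
    model = fitcsvm(Xtrain, ytrain, 'KernelFunction', 'rbf', ...
                    'KernelScale', sigma);
    fprintf('sigma = %5.1f: %3d support vectors\n', ...
            sigma, size(model.SupportVectors, 1));
end
% Small sigma: nearly every training point becomes a support vector and the
% boundary is very wiggly (overfitting). Large sigma: the boundary smooths
% out toward something close to linear.
```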

Comparison with neural nets
- Expensive: training can take a long time with large data sets. Consider that you'll want to experiment with parameters...
- But the classification runtime and space are O(sd), where s is the number of support vectors and d is the dimensionality of the feature vectors.
  - In the worst case, s = the size of the whole training set (like nearest neighbor).
  - But no worse than implementing a neural net with s perceptrons in the hidden layer.
- Empirically shown to have good generalizability even with relatively small training sets and no domain knowledge.
Q3

Speaking of neural nets:
- Back to a demo of matlabNeuralNetDemo.m
- Project discussion?