Linear Discriminators


Linear Discriminators (Chapter 20, only relevant parts)

Concerns
- Generalization
- Accuracy
- Efficiency
- Noise
- Irrelevant features
- Generality: when does this work?

Linear Model
Let f1, ..., fn be the feature values of an example, and let the class be denoted {+1, -1}. Define an extra bias feature f0 = -1.
The linear model defines weights w0, w1, ..., wn; w0 is the threshold.
Classification rule: if w0*f0 + w1*f1 + ... + wn*fn > 0, predict class +1; else predict class -1.
Briefly: predict +1 when W*F > 0, where * is the inner product of the weight vector and the feature vector, and F has been augmented with the extra -1.
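
The rule is a single inner product followed by a sign test. A minimal sketch in plain Python (the function name and the example weights are illustrative, not from the slides):

def predict(w, f):
    """w and f are equal-length lists; f[0] is the bias feature -1."""
    score = sum(wi * fi for wi, fi in zip(w, f))
    return +1 if score > 0 else -1

# Example: the classifier 2*f1 + 3*f2 > 4 from the next slide.
w = [4, 2, 3]                        # w0 = 4 is the threshold
print(predict(w, [-1, 1.0, 1.5]))    # 2*1.0 + 3*1.5 = 6.5 > 4  -> +1
print(predict(w, [-1, 0.5, 0.5]))    # 2*0.5 + 3*0.5 = 2.5 < 4  -> -1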

Augmentation Trick
Suppose the data has features f1 and f2, and the classifier is 2*f1 + 3*f2 > 4.
Equivalently: <4, 2, 3> * <-1, f1, f2> > 0.
Mapping the data <f1, f2> to <-1, f1, f2> allows the threshold to be learned and represented as just another feature weight.
Mapping data into higher dimensions is the key idea behind SVMs.
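
A one-line sketch of the augmentation map (the helper name is illustrative):

def augment(features):
    """Prepend the constant bias feature -1 so the threshold becomes weight w0."""
    return [-1] + list(features)

print(augment([1.0, 1.5]))   # <f1, f2> = <1.0, 1.5>  ->  <-1, 1.0, 1.5>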

Mapping to Enable Linear Separation
Let x1, ..., xm be m vectors in R^N. Map each xi into R^(N+m) by appending m extra coordinates that are all 0 except for a 1 in position N+i.
For any labelling of the xi as +/-, this embedding makes the data linearly separable: define wj = 0 for j <= N, and w(N+i) = +1 if xi is positive, -1 if xi is negative.
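
A small sketch that builds this embedding and checks separability (the helper names and sample points are assumptions for illustration):

def embed(points):
    """Append m indicator coordinates: point i gets a 1 in position N+i."""
    m = len(points)
    return [list(p) + [1 if j == i else 0 for j in range(m)] for i, p in enumerate(points)]

def separating_weights(labels, N):
    """Zero weights on the original N coordinates, +/-1 on the indicator coordinates."""
    return [0] * N + [1 if y > 0 else -1 for y in labels]

points = [[0.2, 0.7], [0.9, 0.1], [0.5, 0.5]]
labels = [+1, -1, +1]                          # any labelling works
w = separating_weights(labels, len(points[0]))
for x, y in zip(embed(points), labels):
    score = sum(wi * xi for wi, xi in zip(w, x))
    assert (score > 0) == (y > 0)              # every example lands on the correct side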

Representational Power
- "Or" of n features: wi = 1, threshold = 0.
- "And" of n features: wi = 1, threshold = n - 1.
- "K of n" features (prototype): wi = 1, threshold = k - 1 (see the sketch below).
- Can't do XOR.
- Combining linear threshold units yields any boolean function.
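
A sketch of OR, AND, and k-of-n as threshold units over 0/1 features (the helper name and example are illustrative):

def threshold_unit(weights, threshold, f):
    return 1 if sum(w * x for w, x in zip(weights, f)) > threshold else 0

n = 4
f = [1, 0, 1, 0]
print(threshold_unit([1] * n, 0, f))       # OR: fires if any feature is 1   -> 1
print(threshold_unit([1] * n, n - 1, f))   # AND: fires only if all are 1    -> 0
print(threshold_unit([1] * n, 2 - 1, f))   # "2 of n": at least two are 1    -> 1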

Classical Perceptron
Goal: any W which separates the data.
Algorithm (each X is augmented with the bias feature):
  W = 0
  Repeat:
    If X is positive and W*X is wrong, W = W + X
    Else if X is negative and W*X is wrong, W = W - X
  Until no errors, or a very large number of passes.
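
A minimal training loop matching the update above, assuming examples are already augmented; the function name and the y * score <= 0 test for "wrong" are illustrative choices:

def train_perceptron(examples, labels, max_epochs=1000):
    """examples: feature lists already augmented with the bias feature; labels: +1 / -1."""
    w = [0.0] * len(examples[0])
    for _ in range(max_epochs):
        errors = 0
        for x, y in zip(examples, labels):
            score = sum(wi * xi for wi, xi in zip(w, x))
            if y * score <= 0:                                # misclassified (or on the boundary)
                w = [wi + y * xi for wi, xi in zip(w, x)]     # W = W + X  or  W = W - X
                errors += 1
        if errors == 0:
            break
    return w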

Classical Perceptron
Theorem: if the concept is linearly separable, then the algorithm finds a solution.
Training time can be exponential in the number of features. An epoch is a single pass through the entire data set; convergence can take exponentially many epochs, but it is guaranteed.
Mistake bound: if |xi| < R and the margin is m, then the number of mistakes is < R^2 / m^2.

Hill-Climbing Search
This is an optimization problem, solved by hill-climbing, so there is no guarantee of finding the optimal solution.
While the derivatives tell you the direction to move (the negative gradient), they do not tell you how much to change each weight wj.
On the plus side, it is fast. On the negative side, there is no guarantee of separation.

Hill-climbing View
Goal: minimize the squared error Err^2. Let the class yi be +1 or -1, and let Err = sum(W*Xi - yi), where Xi is the ith example. This is a function only of the weights.
Use calculus: take partial derivatives with respect to each wj. To move to a lower value, move in the direction of the negative gradient, i.e. the change in wj is proportional to -2*Err*Xj.
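
A sketch of this gradient step in per-example form (the learning rate and the per-example updates are assumptions; the slide does not fix a step size):

def train_lms(examples, labels, rate=0.01, epochs=100):
    """examples: augmented feature lists; labels: +1 / -1; rate is the step size."""
    w = [0.0] * len(examples[0])
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            err = sum(wi * xi for wi, xi in zip(w, x)) - y        # W*X - y
            # Move opposite the gradient: delta wj = -rate * 2 * err * xj
            w = [wi - rate * 2 * err * xi for wi, xi in zip(w, x)]
    return w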

Support Vector Machine
Goal: maximize the margin. Assuming the line separates the data, the margin is the minimum distance from the line to the closest positive and negative examples.
Good news: this can be solved by a quadratic program. Implemented in Weka as SMO.
If the data is not linearly separable, an SVM will add more features.
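
As a rough illustration of the maximum-margin formulation, a sketch using scikit-learn's SVC (assuming scikit-learn is available; this is not the Weka SMO implementation mentioned above, and the toy data is made up):

from sklearn.svm import SVC

X = [[0.0, 0.0], [0.0, 1.0], [2.0, 2.0], [2.0, 3.0]]
y = [-1, -1, +1, +1]

clf = SVC(kernel="linear", C=1e6)    # a large C approximates a hard margin
clf.fit(X, y)
print(clf.support_vectors_)          # the examples that define the margin
print(clf.predict([[1.5, 2.0]]))     # -> [1]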

If not Linearly Separable
Add more nodes: neural nets
- Can represent any boolean function (why?)
- No guarantees about learning
- Slow
- Incomprehensible
Add more features: SVM
- Can represent any boolean function
- Learning guarantees
- Fast
- Semi-comprehensible

Adding Features
Suppose a point (x, y) is positive if it lies in the unit disk and negative otherwise. This is clearly not linearly separable.
Map (x, y) -> (x, y, x^2 + y^2). Now, in 3-space, the data is easily separable.
This works for any learning algorithm, but an SVM (with the kernel parameters set appropriately) will almost do it for you.
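
A sketch of this feature map on the unit-disk example (plain Python; the helper name and sample points are illustrative):

def lift(point):
    """(x, y) -> (x, y, x^2 + y^2): distance from the origin becomes a linear coordinate."""
    x, y = point
    return (x, y, x * x + y * y)

# In the lifted space, the plane z = 1 separates the two classes.
for p in [(0.1, 0.2), (0.5, -0.5), (1.5, 0.0), (-1.0, 1.0)]:
    x, y, z = lift(p)
    label = +1 if z < 1 else -1      # inside the unit disk iff x^2 + y^2 < 1
    print(p, "->", label)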