Linear Discriminators (Chapter 20, From Data to Knowledge)

Presentation transcript:

Linear Discriminators (Chapter 20, From Data to Knowledge)

Concerns
– Generalization accuracy
– Efficiency
– Noise
– Irrelevant features
– Generality: when does this work?

Linear Model
Let f1, …, fn be the feature values of an example, and let the class labels be {+1, -1}. Define f0 = 1 (the bias feature). A linear model defines weights w0, w1, …, wn:
– -w0 plays the role of the threshold.
– Classification rule: if w0*f0 + w1*f1 + … + wn*fn > 0, predict class +1; else predict class -1.
Briefly: predict +1 when W*F > 0, where * is the inner product of the weight vector W and the feature vector F, and F has been augmented with the extra constant feature 1.
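For concreteness, here is a minimal Python sketch of this classification rule (the function name predict and the example weights are illustrations, not from the chapter):

```python
import numpy as np

# Minimal sketch of the rule W*F > 0, with F augmented by a leading constant
# feature f0 = 1 so the threshold is carried by w0.
def predict(w, f):
    """w: weights (w0, w1, ..., wn); f: raw features (f1, ..., fn)."""
    f_aug = np.concatenate(([1.0], f))          # prepend the bias feature f0 = 1
    return 1 if np.dot(w, f_aug) > 0 else -1

# Example: weights (-4, 2, 3) encode the rule 2*f1 + 3*f2 > 4.
print(predict(np.array([-4.0, 2.0, 3.0]), np.array([1.0, 1.0])))   # 2 + 3 = 5 > 4   -> +1
print(predict(np.array([-4.0, 2.0, 3.0]), np.array([0.5, 0.5])))   # 1 + 1.5 = 2.5 < 4 -> -1
```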

Augmentation Trick
Suppose the data have features f1 and f2, and the classifier is 2*f1 + 3*f2 > 4.
Equivalently: (-4, 2, 3) * (1, f1, f2) > 0.
Mapping each example (f1, f2) to (1, f1, f2) allows the threshold to be learned and represented as just another feature weight.
Mapping data into higher dimensions is the key idea behind SVMs.
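A quick numerical check of this equivalence (a sketch; the random points are only for illustration):

```python
import numpy as np

# The classifier 2*f1 + 3*f2 > 4 and its augmented form (-4, 2, 3)*(1, f1, f2) > 0
# agree on every point, so the threshold is just the weight of the constant feature.
rng = np.random.default_rng(0)
F = rng.uniform(0, 3, size=(100, 2))                                  # random (f1, f2) points
original  = 2 * F[:, 0] + 3 * F[:, 1] > 4
augmented = np.hstack([np.ones((100, 1)), F]) @ np.array([-4.0, 2.0, 3.0]) > 0
print((original == augmented).all())                                  # True
```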

Mapping to Enable Linear Separation
Let x1, …, xm be m vectors in R^N. Map each xi into R^{N+m} by xi -> (xi, ei), where ei has a 1 in position N+i and zeros elsewhere. For any labelling of the xi by classes +/-, this embedding makes the data linearly separable:
– Define wj = 0 for j <= N.
– Define w(N+i) = +1 if xi is positive.
– Define w(N+i) = -1 if xi is negative.
Then W * (xi, ei) = w(N+i), which is +1 on the positives and -1 on the negatives.
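A small sketch of this construction (the random data and labels are arbitrary; the point is that any labelling becomes separable):

```python
import numpy as np

# Embed m points of R^N into R^{N+m} by appending an indicator coordinate, then
# build the separating weight vector described on the slide.
rng = np.random.default_rng(0)
m, N = 5, 3
X = rng.normal(size=(m, N))                  # arbitrary points
y = np.array([1, -1, 1, 1, -1])              # arbitrary +/- labelling

X_emb = np.hstack([X, np.eye(m)])            # x_i -> (x_i, e_i) in R^{N+m}
w = np.concatenate([np.zeros(N), y])         # w_j = 0 for j <= N, w_{N+i} = label of x_i

print(np.sign(X_emb @ w) == y)               # all True: the embedded data are separated
```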

Representational Power
Assume boolean features in {0, 1} and a strict threshold.
– "Or" of n features: wi = 1, threshold = 0.
– "And" of n features: wi = 1, threshold = n - 1.
– "k of n" features (prototype): wi = 1, threshold = k - 1.
– Can't do XOR.
– Combining linear threshold units yields any boolean function.
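A brute-force check of these three threshold units over all boolean inputs (a sketch; n and k are chosen arbitrarily):

```python
import itertools
import numpy as np

# Verify the OR / AND / k-of-n threshold units on every boolean input of length n.
def threshold_unit(x, w, theta):
    return int(np.dot(w, x) > theta)          # strict threshold, as on the slide

n, k = 3, 2
w = np.ones(n)
for bits in itertools.product([0, 1], repeat=n):
    x = np.array(bits)
    assert threshold_unit(x, w, 0) == int(any(bits))             # OR:     threshold 0
    assert threshold_unit(x, w, n - 1) == int(all(bits))         # AND:    threshold n-1
    assert threshold_unit(x, w, k - 1) == int(sum(bits) >= k)    # k-of-n: threshold k-1
print("all truth tables match")
```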

Classical Perceptron
Goal: find any W that separates the data.
Algorithm (each X is augmented with the constant feature 1):
– W = 0
– Repeat, for each training example X:
– If X is positive and W*X <= 0 (wrong), W = W + X;
– Else if X is negative and W*X > 0 (wrong), W = W - X.
– Until there are no errors, or a very large number of passes has been made.
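A minimal runnable sketch of this algorithm (the OR data at the end is only an illustration):

```python
import numpy as np

# Classical perceptron update. Assumes X is already augmented with a constant-1
# bias feature and that the labels y are +1 or -1.
def perceptron(X, y, max_epochs=1000):
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * np.dot(w, xi) <= 0:      # example is misclassified (or on the boundary)
                w += yi * xi                 # W = W + X for positives, W = W - X for negatives
                errors += 1
        if errors == 0:                      # a full error-free pass: stop
            break
    return w

# Example: learn OR of two boolean features (first column is the bias feature).
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
y = np.array([-1, 1, 1, 1])
print(perceptron(X, y))
```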

Classical Perceptron
Theorem: if the concept is linearly separable, then the algorithm finds a solution.
– Training time can be exponential in the number of features.
– An epoch is a single pass through the entire data set; convergence can take exponentially many epochs.
– If |xi| < R for every example and the margin is m, then the number of mistakes is at most R^2/m^2.
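The mistake bound follows from the standard argument sketched below (assuming a unit-norm separator w* achieving margin m, which the slide does not spell out):

```latex
% After k mistaken updates starting from W = 0, with \|x_i\| \le R and
% y_i (w^* \cdot x_i) \ge m for a unit-norm separator w^*:
\begin{align}
  W \cdot w^* &\ge k\,m,   && \text{each update adds at least } m \text{ to } W \cdot w^*, \\
  \|W\|^2     &\le k\,R^2, && \text{each update adds at most } R^2 \text{ to } \|W\|^2, \\
  k\,m \;\le\; W \cdot w^* \;&\le\; \|W\| \;\le\; R\sqrt{k}
  &&\Longrightarrow\; k \le \frac{R^2}{m^2}.
\end{align}
```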

Neural Net View
Goal: minimize the squared error E = sum_i (W*Xi - yi)^2, where Xi is the i-th example and the class yi is +1 or -1.
This is a function of the weights only. Use calculus: take the partial derivative with respect to each weight wj.
To move to a lower value, move in the direction of the negative gradient; for a single example, the change to wj is proportional to -2*(W*Xi - yi)*Xij.
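A sketch of this gradient-descent update in Python (the learning rate lr and the OR data are assumptions for illustration; the slide gives no step size):

```python
import numpy as np

# Stochastic gradient descent on the squared error sum_i (W.X_i - y_i)^2.
def squared_error_fit(X, y, lr=0.01, epochs=200):
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = np.dot(w, xi) - yi
            w -= lr * 2 * err * xi           # step along the negative gradient for this example
    return w

X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
y = np.array([-1, 1, 1, 1])
w = squared_error_fit(X, y)
print(np.sign(X @ w))                        # predicted classes after training
```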

Neural Net View
This is an optimization problem. The solution is found by hill-climbing (gradient descent), so in general there is no guarantee of finding the optimal solution. While the derivatives tell you the direction to move (the negative gradient), they do not tell you how far to change each weight. On the plus side, it is fast. On the negative side, there is no guarantee of separation, even when the data are linearly separable.

Support Vector Machine
Goal: maximize the margin. Assuming the hyperplane separates the data, the margin is the minimum distance from the hyperplane to the closest positive or negative example.
Good news: this can be solved by a quadratic program. It is implemented in Weka as SMO (sequential minimal optimization).
If the data are not linearly separable, the SVM will add more features.
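As an illustration in Python rather than Weka (a sketch assuming scikit-learn is available; a large C approximates the hard-margin, separable case):

```python
import numpy as np
from sklearn.svm import SVC

# A maximum-margin linear SVM on a tiny separable data set.
X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 3.0], [4.0, 4.0]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear", C=1e6)    # solved internally as a quadratic program (SMO-style solver)
clf.fit(X, y)
print(clf.coef_, clf.intercept_)     # the separating hyperplane W and bias w0
print(clf.support_vectors_)          # the examples that determine the margin
```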

If Not Linearly Separable
1. Add more nodes: neural nets
– Can represent any boolean function (why?)
– No guarantees about learning
– Slow
– Incomprehensible
2. Add more features: SVM
– Can represent any boolean function
– Learning guarantees
– Fast
– Semi-comprehensible

Adding Features
Suppose a point (x, y) is positive if it lies in the unit disk and negative otherwise. This is clearly not linearly separable in the plane.
Map (x, y) -> (x, y, x^2 + y^2). Now, in 3-space, the classes are easily separable (by the plane where the third coordinate equals 1).
This works for any learning algorithm, but an SVM will almost do it for you (just set the kernel parameters).
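A sketch of this example in Python (the sampled points and the particular separating plane are illustrations):

```python
import numpy as np

# Unit-disk example: not linearly separable in (x, y), but separable after adding
# the feature x^2 + y^2.
rng = np.random.default_rng(0)
pts = rng.uniform(-2, 2, size=(200, 2))
y = np.where(pts[:, 0]**2 + pts[:, 1]**2 <= 1, 1, -1)             # positive inside the unit disk

pts3 = np.hstack([pts, (pts[:, 0]**2 + pts[:, 1]**2)[:, None]])   # (x, y) -> (x, y, x^2 + y^2)

# The plane "third coordinate = 1" separates the classes: weights (0, 0, -1), bias +1.
w, w0 = np.array([0.0, 0.0, -1.0]), 1.0
preds = np.where(pts3 @ w + w0 >= 0, 1, -1)
print((preds == y).all())                                         # True
```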