Perceptrons “From the heights of error, To the valleys of Truth” Piyush Kumar Computational Geometry

Reading Material
- Duda/Hart/Stork: 5.4 / 5.5 / 9.6.8
- Any neural network book (Haykin, Anderson…)
- Look at papers of related people: Santosh Vempala, A. Blum, J. Dunagan, F. Rosenblatt, T. Bylander

Introduction: Supervised learning. [Diagram: an input pattern is fed to the learner, which produces an output pattern; the output is compared with the desired output and the learner is corrected if necessary.]

Linear discriminant functions
Definition: a linear discriminant function is a linear combination of the components of x,
g(x) = w^T x + w_0   (1)
where w is the weight vector and w_0 the bias.
A two-category classifier with a discriminant function of the form (1) uses the following rule:
- Decide ω1 if g(x) > 0 and ω2 if g(x) < 0, i.e. decide ω1 if w^T x > -w_0 and ω2 otherwise.
- If g(x) = 0, x can be assigned to either class.
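As a quick illustration (not from the slides), a minimal MATLAB/Octave sketch of rule (1); the weight vector, bias, and sample point are hypothetical:

  w  = [2; -1];          % hypothetical weight vector
  w0 = 0.5;              % hypothetical bias
  x  = [1; 1];           % a sample point
  g  = w'*x + w0;        % g(x) = w^T x + w_0
  label = sign(g);       % +1 -> omega_1, -1 -> omega_2 (0 means "on the boundary")
  disp(label)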

LDFs: The equation g(x) = 0 defines the decision surface that separates points assigned to category ω1 from points assigned to category ω2. When g(x) is linear, the decision surface is a hyperplane.

Classification using LDFs
Two main approaches:
- Fisher's Linear Discriminant: project the data onto a line with 'good' discrimination, then classify on the real line.
- Linear discrimination in d dimensions: classify the data using suitable hyperplanes. (We'll use perceptrons to construct these.)

Perceptron: The first NN
- Proposed by Frank Rosenblatt in 1956
- Neural net researchers accused Rosenblatt of promising 'too much'
- Numerous variants; we'll cover the one that's most geometric to explain
- One of the simplest neural networks

Perceptrons: A Picture. [Diagram: inputs x_0 = -1, x_1, x_2, …, x_n feed through weights w_0, w_1, …, w_n into a summing unit with a ±1 threshold output; the output is compared with the target and the weights are corrected.]

Where is the geometry? Class 1: (+1), Class 2: (-1). [Figure: a hyperplane separating the two classes.] Is this unique?

Assumption: Let's assume for this talk that the red and green points in 'feature space' are separable using a hyperplane (the two-category, linearly separable case).

Whatz the problem?
- Why not just take the convex hull of one of the sets and find one of the 'right' facets? Because it's too much work to do in d dimensions.
- What else can we do?
- Linear programming == Perceptrons
- Quadratic programming == SVMs

Perceptrons, aka Learning Half Spaces: Can be solved in polynomial time using interior-point (IP) algorithms. Can also be solved using a simple and elegant greedy algorithm (which I present today).

In math notation: N samples (x_1, y_1), …, (x_N, y_N), where the labels y_j = ±1 mark the two classes. Can we find a hyperplane that separates the two classes? I.e., find w and w_0 such that
- for all j such that y_j = +1: w^T x_j + w_0 > 0, and
- for all j such that y_j = -1: w^T x_j + w_0 < 0.

Further assumption 1 (which we will relax later!): Let's assume that the hyperplane we are looking for passes through the origin.

Further assumption 2: Let's assume that we are looking for a halfspace that contains a set of points. Relax now!! (Multiplying the y = -1 points by -1 turns the two-class condition into this single-set condition: we then just need w with w^T x > 0 for every point.)

Let's relax FA 1 now: "Homogenize" the coordinates by adding a new coordinate to the input. Think of it as moving all the red and blue points one dimension higher; from 2D to 3D it is just the x-y plane shifted to z = 1. This takes care of the "bias", i.e. our assumption that the halfspace can pass through the origin.

Further assumption 3: Assume all points lie on the unit sphere! If they don't after applying the transformations for FA 1 and FA 2, make them so (scaling a point to unit norm does not change which side of a halfspace through the origin it lies on). Relax now! (See the preprocessing sketch below.)
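A minimal MATLAB/Octave sketch of this preprocessing; the point matrices pos and neg are hypothetical and the variable names are mine, not from the slides:

  pos = [2 1; 3 2; 2.5 3];            % hypothetical +1 points, one per row
  neg = [-1 -2; -2 -1; -1.5 -3];      % hypothetical -1 points, one per row
  P = [pos ones(size(pos,1),1)];      % FA1: append the homogenizing coordinate
  N = [neg ones(size(neg,1),1)];
  X = [P; -N];                        % FA2: flip the -1 points through the origin
  X = X ./ sqrt(sum(X.^2, 2));        % FA3: put every point on the unit sphere (implicit expansion)
  % Goal: find a row vector w with X*w' > 0 for every row of X.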

Restatement 1. Given: a set of points on a sphere in d dimensions, such that all of them lie in a halfspace. Output: find one such halfspace. Note: if you can solve this LP feasibility problem, you can solve any general LP!! Take Estie's class if you want to know why.

Restatement 2. Given a convex body (in V-form), find a halfspace passing through the origin that contains it.

Support Vector Machines A small break from perceptrons

Support Vector Machines
- Linear learning machines, like perceptrons.
- Map the data non-linearly to a higher dimension to overcome the linearity constraint.
- Select between separating hyperplanes using the margin as the criterion (this is what perceptrons don't do).
- From learning theory, maximum margin is good.
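A standard definition, not spelled out in the transcript: the geometric margin of a hyperplane w^T x + b = 0 with respect to a labeled point (x_i, y_i) is y_i (w^T x_i + b) / ||w||; the margin of the hyperplane is the minimum of this quantity over the training set, and the SVM picks the separating hyperplane that maximizes it.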

SVMs Margin

Another reformulation: Unlike perceptrons, SVMs have a unique solution, but they are harder to solve.
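The reformulation itself is not reproduced in the transcript; the standard hard-margin SVM program it presumably refers to is

  minimize   (1/2) ||w||^2
  subject to y_i (w^T x_i + b) >= 1,   i = 1, …, N,

a quadratic program whose unique optimum maximizes the margin 2/||w|| between the two classes.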

Support Vector Machines: There are very simple algorithms to solve SVMs (as simple as perceptrons). More soon…

Back to perceptrons

Perceptrons
So how do we solve the LP?
- Simplex
- Ellipsoid
- IP (interior-point) methods
- Perceptrons = gradient descent
So we could solve the classification problem using any LP method.

Why learn perceptrons? You can write an LP solver in 5 minutes! A very slight modification can give you a polynomial-time guarantee (using smoothed analysis)!

Why learn perceptrons?
- Multiple perceptrons clubbed together are used to learn almost anything in practice (the idea behind multi-layer neural networks).
- Perceptrons have a finite capacity and so cannot represent all classifications; the amount of training data required needs to be larger than the capacity. We'll talk about capacity when we introduce VC-dimension.
- From learning theory, limited capacity is good.

Another twist: Linearization. If the data is separable by, say, a sphere, how would you use a perceptron to separate it? (Ellipsoids?)

Linearization (Delaunay!??): Lift the points to a paraboloid one dimension higher. For instance, if the data is in 2D, (x, y) -> (x, y, x^2 + y^2).
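A minimal MATLAB/Octave sketch of this lift on my own synthetic example (perceptron is the function given a few slides later), for points separable by a circle:

  % Class 1: points inside the unit disk.  Class 2: points in an annulus of radius 2..3.
  t1 = 2*pi*rand(50,1);  r1 = rand(50,1);
  t2 = 2*pi*rand(50,1);  r2 = 2 + rand(50,1);
  inside  = [r1.*cos(t1), r1.*sin(t1)];
  outside = [r2.*cos(t2), r2.*sin(t2)];
  lift = @(P) [P, sum(P.^2, 2)];           % (x,y) -> (x, y, x^2 + y^2)
  % After the lift the classes are linearly separable (e.g. by the plane z = 2),
  % so the plain perceptron applies to the lifted points:
  w = perceptron(lift(inside), lift(outside));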

The kernel matrix: Another trick that the ML community uses for linearization is to use a kernel function that redefines inner products (and hence distances) between points (example: see the sketch below). There are even papers on how to learn kernels from data!
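The slide's own example is not reproduced in the transcript; one standard choice is the Gaussian (RBF) kernel, sketched here on hypothetical data:

  % Gaussian (RBF) kernel matrix: K(i,j) = exp(-||x_i - x_j||^2 / (2*sigma^2))
  X = randn(5, 3);                 % hypothetical data: 5 points in 3 dimensions
  sigma = 1.0;                     % kernel width (a free parameter)
  sq = sum(X.^2, 2);               % squared norm of every row
  D2 = sq + sq' - 2*(X*X');        % pairwise squared distances (implicit expansion)
  K  = exp(-D2 / (2*sigma^2));     % the kernel ("Gram") matrix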

Perceptron Smoothed Complexity
Let L be a linear program and let L' be the same linear program under a Gaussian perturbation of variance sigma^2, where sigma^2 <= 1/(2d). For any delta, with probability at least 1 - delta, either
- the perceptron finds a feasible solution of L' in poly(d, m, 1/sigma, 1/delta), or
- L' is infeasible or unbounded.

The Algorithm In one line

The 1-line LP solver! Start with a random vector w, and while some point x is misclassified, do: w <- w + x (until done). One of the most beautiful LP solvers I've ever come across…

A better description
  Initialize w = 0, i = 0
  do
    i = (i + 1) mod n
    if x_i is misclassified by w then
      w = w + x_i
  until all patterns are classified
  Return w

An even better description

  function w = perceptron(r,b)
    r = [r (zeros(length(r),1)+1)];    % Homogenize: append a 1 to each red point
    b = -[b (zeros(length(b),1)+1)];   % Homogenize and flip the blue points
    data = [r;b];                      % Make one point set
    s = size(data);                    % Size of the data
    w = zeros(1,s(1,2));               % Initialize w to the zero vector
    is_error = true;
    while is_error
      is_error = false;
      for k=1:s(1,1)
        if dot(w,data(k,:)) <= 0       % Misclassified (or on the boundary)?
          w = w+data(k,:);             % Perceptron update
          is_error = true;
        end
      end
    end
  end

That's the entire code! Written in 10 mins. And it can solve any LP!
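A minimal usage sketch (my own test data, not from the slides):

  red  = rand(100,2) + 2;            % hypothetical class 1: both coordinates in [2,3]
  blue = -rand(100,2) - 2;           % hypothetical class 2: both coordinates in [-3,-2]
  w = perceptron(red, blue);         % w(1:2) is the normal, w(3) the bias coordinate
  % Sanity check: the homogenized points land on the right sides.
  disp(all([red  ones(100,1)] * w' > 0))
  disp(all([blue ones(100,1)] * w' < 0))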

An output

In other words: At each step, the algorithm picks any vector x that is misclassified, i.e. on the wrong side of the halfspace, and brings the normal vector w into closer agreement with that point; after the update, w^T x increases by ||x||^2.

Still: why the hell does it work? Back to the most advanced presentation tools available on earth: the blackboard. (Wait, lemme try the whiteboard.) The math behind it: the convergence proof.

Proof
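The proof itself was done on the board; for completeness, here is a sketch of the standard Novikoff-style argument under the assumptions above (all points on the unit sphere after the flips, and some unit vector w* with w*^T x >= gamma > 0 for every point; as in the code, w starts at 0):
- Each update is w <- w + x for a point with w^T x <= 0.
- Progress: (w + x)^T w* >= w^T w* + gamma, so after M updates w^T w* >= M*gamma.
- Bounded growth: ||w + x||^2 = ||w||^2 + 2 w^T x + ||x||^2 <= ||w||^2 + 1, so after M updates ||w||^2 <= M.
- Combining: M*gamma <= w^T w* <= ||w|| <= sqrt(M), hence M <= 1/gamma^2, i.e. the algorithm stops after at most 1/gamma^2 updates.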

That’s all folks