Perceptrons and Linear Classifiers William Cohen 2-4-2008.

Announcement: no office hours for William this Friday 2/8

Dave Touretzky’s Gallery of CSS Descramblers

Linear Classifiers Let's simplify life by assuming: –Every instance is a vector of real numbers, x = (x_1, …, x_n). (Notation: boldface x is a vector.) –There are only two classes, y = (+1) and y = (-1). A linear classifier is a vector w of the same dimension as x that is used to make this prediction: ŷ = sign(w · x).
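A minimal sketch of that prediction rule (the weight vector and instances below are illustrative, not from the slides; Python is used for all the sketches in this transcript):

```python
import numpy as np

def predict(w, x):
    """Linear classifier: +1 or -1 depending on which side of the hyperplane
    through the origin (perpendicular to w) the instance x falls on."""
    return 1 if np.dot(w, x) >= 0 else -1

w = np.array([1.0, -2.0])                  # an illustrative weight vector
print(predict(w, np.array([3.0, 1.0])))    # +1, since w . x = 1 > 0
print(predict(w, np.array([0.0, 1.0])))    # -1, since w . x = -2 < 0
```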

[Figure: the weight vector w (and −w), two instances x1, x2, and their projections x1 · w, x2 · w.] Visually, x · w is the distance you get if you "project x onto w". The line perpendicular to w divides the vectors classified as positive from the vectors classified as negative. In 3d: line → plane. In 4d: plane → hyperplane. …

[Figures (image sources: Wolfram MathWorld, Mediaboost.com, Geocities.com/bharatvarsha1947).]

Notice that the separating hyperplane goes through the origin… if we don't want this we can preprocess our examples: for instance, add a constant feature x_0 = 1 to every instance, so that its weight w_0 acts as a learned offset.
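A small sketch of that preprocessing step (hypothetical helper name, building on the snippet above):

```python
import numpy as np

def add_bias_feature(X):
    """Prepend a constant 1 to every row, so a hyperplane through the origin
    in (n+1) dimensions acts like an offset hyperplane in the original n."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

X = np.array([[3.0, 1.0], [0.0, 1.0]])
print(add_bias_feature(X))   # [[1. 3. 1.] [1. 0. 1.]]
```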

What have we given up? [Figure: a small symbolic rule/decision tree, e.g. Outlook = overcast, Humidity = normal → +1.]

What have we given up? Not much! –Practically, it's a little harder to understand a particular example (or classifier). –Practically, it's a little harder to debug. You can still express the same information, and you can analyze things mathematically much more easily.

Naïve Bayes as a Linear Classifier Consider Naïve Bayes with two classes (+1, -1) and binary features (0,1).

Naïve Bayes as a Linear Classifier

“log odds”

Naïve Bayes as a Linear Classifier [Equation image: the weights written in terms of p_i and q_i.]
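The equations on these three slides were images in the original deck; a reconstruction of the standard derivation, writing p_i = P(x_i = 1 | y = +1) and q_i = P(x_i = 1 | y = -1) as on the slide, is:

```latex
\log\frac{P(y=+1 \mid \mathbf{x})}{P(y=-1 \mid \mathbf{x})}
  = \log\frac{P(y=+1)}{P(y=-1)}
  + \sum_i \left[ x_i \log\frac{p_i}{q_i} + (1-x_i)\log\frac{1-p_i}{1-q_i} \right]
  = b + \sum_i w_i x_i,
\qquad
w_i = \log\frac{p_i\,(1-q_i)}{q_i\,(1-p_i)},
\quad
b = \log\frac{P(y=+1)}{P(y=-1)} + \sum_i \log\frac{1-p_i}{1-q_i}.
```

Predicting +1 exactly when this log odds is positive is the linear rule sign(w · x + b).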

Summary: –NB is a linear classifier. –The weights w_i have a closed form which is fairly simple when expressed in log odds. (See also Proceedings of ECML-98, 10th European Conference on Machine Learning.)

An Even Older Linear Classifier 1957: The perceptron algorithm: Rosenblatt –WP: "A handsome bachelor, he drove a classic MGA sports car and was often seen with his cat named Tobermory. He enjoyed mixing with undergraduates, and for several years taught an interdisciplinary undergraduate honors course entitled "Theory of Brain Mechanisms" that drew students equally from Cornell's Engineering and Liberal Arts colleges…this course was a melange of ideas.. experimental brain surgery on epileptic patients while conscious, experiments on.. the visual cortex of cats,... analog and digital electronic circuits that modeled various details of neuronal behavior (i.e. the perceptron itself, as a machine)." –Built on work of Hebb (1949); also developed by Widrow-Hoff (1960). 1960: Perceptron Mark 1 Computer – hardware implementation.

[Figures: Bell Labs TM, Datamation 1961, April Special Edition of CACM.]

An Even Older Linear Classifier 1957: The perceptron algorithm: Rosenblatt. –Built on work of Hebb (1949); also developed by Widrow-Hoff (1960). 1960: Perceptron Mark 1 Computer – hardware implementation. 1969: Minsky & Papert's book shows perceptrons are limited to linearly separable data, and Rosenblatt dies in a boating accident. 1970's: learning methods for two-layer neural networks. Mid-late 1980's (Littlestone & Warmuth): mistake-bounded learning & analysis of the Winnow method; early-mid 1990's: analyses of perceptron/Widrow-Hoff.

Experimental evaluation of Perceptron vs. Widrow-Hoff and Experts (Winnow-like methods) in SIGIR-1996 (Lewis, Schapire, Callan, Papka) and in (Cohen & Singer). Freund & Schapire showed that the "kernel trick" and averaging/voting worked.

The voted perceptron [online protocol between A and B]: given an instance x_i, compute the prediction ŷ_i = sign(v_k · x_i); if it is a mistake (ŷ_i ≠ y_i), update v_{k+1} = v_k + y_i x_i.
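A runnable sketch of that mistake-driven loop (plain perceptron training; the names and the toy data are mine, not the slides'). It also records, for each intermediate hypothesis v_k, how many examples it survived, which the voted/averaged variants sketched below use:

```python
import numpy as np

def perceptron_train(X, y, epochs=5):
    """On every mistake, store the old hypothesis and add y_i * x_i."""
    v = np.zeros(X.shape[1])
    hypotheses = []      # list of (v_k, m_k): hypothesis and its survival count
    survived = 0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            y_hat = 1 if np.dot(v, x_i) >= 0 else -1
            if y_hat != y_i:                      # mistake
                hypotheses.append((v.copy(), survived))
                v = v + y_i * x_i
                survived = 0
            else:
                survived += 1
    hypotheses.append((v.copy(), survived))       # the final hypothesis
    return hypotheses

# Toy linearly separable data: the label is the sign of the first coordinate.
X = np.array([[2.0, 1.0], [1.0, -1.0], [-2.0, 1.0], [-1.0, -2.0]])
y = np.array([1, 1, -1, -1])
hyps = perceptron_train(X, y)
print(hyps[-1][0])   # final weight vector
```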

[Figure: (1) a target u, with margin γ on either side; (2) the guess v_1 after one positive example x_1.]

[Figure: (3a) the guess v_2 after two positive examples: v_2 = v_1 + x_2; (3b) the guess v_2 after one positive and one negative example: v_2 = v_1 − x_2.] I want to show two things: 1. The v's get closer and closer to u: v · u increases with each mistake. 2. The v's do not get too large: v · v grows slowly.

[Figure: the same picture, annotated to show that each mistake increases v · u by more than γ.]
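Those two claims combine into the usual mistake bound. A sketch of the standard argument, assuming ||u|| = 1, every example satisfies ||x|| ≤ R, and y (x · u) ≥ γ (the margin from the figures):

```latex
\text{Progress: } v_{k+1}\cdot u = (v_k + y_i x_i)\cdot u \ \ge\ v_k\cdot u + \gamma
  \;\Longrightarrow\; v_{k+1}\cdot u \ \ge\ k\gamma .
\text{Slow growth: a mistake means } y_i\,(v_k\cdot x_i) \le 0, \text{ so }
  \|v_{k+1}\|^2 = \|v_k\|^2 + 2\,y_i\,(v_k\cdot x_i) + \|x_i\|^2 \ \le\ \|v_k\|^2 + R^2
  \;\Longrightarrow\; \|v_{k+1}\|^2 \ \le\ kR^2 .
\text{Combining: } k\gamma \ \le\ v_{k+1}\cdot u \ \le\ \|v_{k+1}\| \ \le\ R\sqrt{k}
  \;\Longrightarrow\; k \ \le\ (R/\gamma)^2 .
```

So the number of mistakes is at most (R/γ)², independent of the number of examples.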


On-line to batch learning: 1. Pick a v_k at random according to m_k/m, the fraction of examples it was used for. 2. Predict using the v_k you just picked. 3. (Actually, use some sort of deterministic approximation to this, as sketched below.)
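A sketch of both the randomized rule and the usual deterministic approximation (a weighted vote), reusing the (v_k, m_k) list produced by perceptron_train above; the function names are mine:

```python
import numpy as np

def predict_random(hypotheses, x, rng=None):
    """Randomized on-line-to-batch: pick one v_k with probability m_k / m."""
    if rng is None:
        rng = np.random.default_rng()
    vs, ms = zip(*hypotheses)
    m = float(sum(ms))
    k = rng.choice(len(vs), p=[mk / m for mk in ms])
    return 1 if np.dot(vs[k], x) >= 0 else -1

def predict_voted(hypotheses, x):
    """Deterministic approximation: every v_k casts m_k votes for sign(v_k . x)."""
    total = sum(mk * (1 if np.dot(vk, x) >= 0 else -1) for vk, mk in hypotheses)
    return 1 if total >= 0 else -1

print(predict_voted(hyps, np.array([1.5, 0.5])))   # hyps from the snippet above
```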

The voted perceptron

Some more comments Perceptrons are like support vector machines (SVMs): 1. SVMs search for something that looks like u: i.e., a vector w where ||w|| is small and the margin for every example is large. 2. You can use "the kernel trick" with perceptrons: replace x · w with (x · w + 1)^d.
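Since v is always a sum of signed training examples, v · x can be written as a sum of inner products, and each inner product can be replaced by a kernel. A sketch of that kernelized loop with the polynomial kernel from the slide (the XOR-style data and names are illustrative):

```python
import numpy as np

def poly_kernel(a, b, d=2):
    """The kernel from the slide: replace a . b with (a . b + 1)^d."""
    return (np.dot(a, b) + 1.0) ** d

def kernel_perceptron_train(X, y, kernel=poly_kernel, epochs=5):
    """Keep the mistakes (x_j, y_j) instead of an explicit v,
    since v . x = sum_j y_j * K(x_j, x)."""
    mistakes = []
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            score = sum(y_j * kernel(x_j, x_i) for x_j, y_j in mistakes)
            if (1 if score >= 0 else -1) != y_i:
                mistakes.append((x_i, y_i))
    return mistakes

def kernel_predict(mistakes, x, kernel=poly_kernel):
    score = sum(y_j * kernel(x_j, x) for x_j, y_j in mistakes)
    return 1 if score >= 0 else -1

# XOR-style labels (sign of x1*x2): not linearly separable in 2-d,
# but separable with the degree-2 polynomial kernel.
X = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]])
y = np.array([1, 1, -1, -1])
mistakes = kernel_perceptron_train(X, y)
print([kernel_predict(mistakes, x) for x in X])   # [1, 1, -1, -1]
```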

Experimental Results

Task: classifying hand-written digits for the post office

More Experimental Results (Linear kernel, one pass over the data)