Linear Discrimination Reading: Chapter 2 of textbook.

Slides from: Doug Gray, David Poole

Framework Assume our data consists of instances x = (x_1, x_2, ..., x_n). Assume the data can be separated into two classes, positive and negative, by a linear decision surface. Learning: assuming the data is n-dimensional, learn an (n−1)-dimensional hyperplane to classify the data into the two classes.

Linear Discriminant [figures: two classes of data points plotted in the (Feature 1, Feature 2) plane, separated by a linear discriminant]

Example where a line won’t work? [figure: data in the (Feature 1, Feature 2) plane that is not linearly separable]

Perceptrons Discriminant function: f(x) = w_1 x_1 + w_2 x_2 + ... + w_n x_n + w_0 = w · x + w_0, where w_0 is called the “bias” and −w_0 is called the “threshold”. Classification: output +1 if f(x) > 0, otherwise −1.
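A minimal Python sketch of this discriminant and classification rule (the function names and example weights are illustrative, not from the slides):

```python
# Sketch of a perceptron discriminant function: f(x) = w . x + w0.
# Classify as +1 if f(x) > 0, otherwise -1.

def discriminant(x, w, w0):
    """Compute f(x) = w . x + w0 for input vector x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0

def classify(x, w, w0):
    """Return +1 if the discriminant is positive, else -1."""
    return 1 if discriminant(x, w, w0) > 0 else -1

# Example: a 2-input perceptron with weights (0.5, 0.5) and bias -0.7.
print(classify((1, 1), (0.5, 0.5), -0.7))   # prints 1  (class +1)
print(classify((1, 0), (0.5, 0.5), -0.7))   # prints -1 (class -1)
```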

Perceptrons as simple neural networks [diagram: inputs x_1, x_2, ..., x_n plus a constant +1 input, weighted by w_1, w_2, ..., w_n and w_0, feeding a single unit that produces output o]

Example What is the class y?

Geometry of the perceptron [figure: the hyperplane w · x + w_0 = 0 separating the two classes in the (Feature 1, Feature 2) plane]. In 2d the hyperplane is the line w_1 x_1 + w_2 x_2 + w_0 = 0.

In-class exercise. Work with one neighbor on this: (a) Find weights for a perceptron that separates “true” and “false” in x_1 ∧ x_2. Find the slope and intercept, and sketch the separation line defined by this discriminant. (b) Do the same, but for x_1 ∨ x_2. (c) What (if anything) might make one separation line better than another?

To simplify notation, assume a “dummy” coordinate (or attribute) x_0 = 1. Then we can write: f(x) = w · x = w_0 x_0 + w_1 x_1 + ... + w_n x_n. We can generalize the perceptron to cases where we project data points x_n into “feature space”, φ(x_n): f(x) = w · φ(x_n).
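A small Python sketch of the dummy-coordinate trick (helper names are my own, not the lecture’s): prepending x_0 = 1 lets the bias w_0 be treated as an ordinary weight, so f(x) = w · x.

```python
# Absorb the bias into the weight vector by prepending a dummy input x0 = 1.

def augment(x):
    """Prepend the dummy coordinate x0 = 1 to an input vector."""
    return (1.0,) + tuple(x)

def discriminant(x_aug, w):
    """f(x) = w . x with w = (w0, w1, ..., wn) and x_aug = (1, x1, ..., xn)."""
    return sum(wi * xi for wi, xi in zip(w, x_aug))

w = (-0.7, 0.5, 0.5)                       # (w0, w1, w2)
print(discriminant(augment((1, 1)), w))    # about 0.3 -> class +1
```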

Notation Let S = {(x_k, t_k): k = 1, 2, ..., m} be a training set. Note: x_k is a vector of inputs, and t_k ∈ {+1, −1} for binary classification, t_k ∈ ℝ for regression. Output o: the perceptron’s output o_k on input x_k. Error of a perceptron on a single training example (x_k, t_k): E_k = ½ (t_k − o_k)².

Example S = {((0,0), −1), ((0,1), 1)}. Let w = (w_0, w_1, w_2) = (0.1, 0.1, −0.3). What is E_1? What is E_2? [diagram: a perceptron with inputs +1, x_1, x_2, weights 0.1, 0.1, −0.3, and output o]
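As a check on this example, a short sketch (my own, assuming the unthresholded output o = w · x_k and the per-example error E_k = ½(t_k − o_k)² defined above):

```python
# Per-example squared error for the example slide, assuming o = w . x
# (unthresholded output) and the dummy coordinate x0 = 1.

w = (0.1, 0.1, -0.3)                       # (w0, w1, w2)
S = [((0, 0), -1), ((0, 1), 1)]            # training set

for k, (x, t) in enumerate(S, start=1):
    x_aug = (1.0,) + x                     # prepend x0 = 1
    o = sum(wi * xi for wi, xi in zip(w, x_aug))
    E = 0.5 * (t - o) ** 2
    print(f"E_{k} = {E:.3f}")              # E_1 = 0.605, E_2 = 0.720
```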

How do we train a perceptron? Gradient descent in weight space. [figure: the error surface over weight space, from T. M. Mitchell, Machine Learning]

Perceptron learning algorithm Start with random weights w = (w_1, w_2, ..., w_n). Do gradient descent in weight space, in order to minimize error E: given error E, we want to modify the weights w so as to take a step in the direction of steepest descent.

Gradient descent We want to find w so as to minimize the sum-squared error: E(w) = ½ Σ_k (t_k − o_k)². To minimize, take the derivative of E(w) with respect to w. A vector derivative is called a “gradient”: ∇E(w) = [∂E/∂w_0, ∂E/∂w_1, ..., ∂E/∂w_n].
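Filling in the differentiation step the slide leaves implicit (a standard derivation, assuming the unthresholded output o_k = w · x_k), the i-th component of the gradient is:

```latex
\frac{\partial E}{\partial w_i}
  = \frac{\partial}{\partial w_i}\,\frac{1}{2}\sum_{k=1}^{m}(t_k - o_k)^2
  = \sum_{k=1}^{m}(t_k - o_k)\,\frac{\partial}{\partial w_i}\bigl(t_k - \mathbf{w}\cdot\mathbf{x}_k\bigr)
  = -\sum_{k=1}^{m}(t_k - o_k)\,x_{k,i}
```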

Here is how we change each weight: w_i ← w_i + Δw_i, where Δw_i = −η ∂E/∂w_i = η Σ_k (t_k − o_k) x_{k,i}, x_{k,i} is the i-th component of x_k, and η is the learning rate.
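A sketch of one batch update with this rule (variable names are mine; each per-example gradient is computed with the old weights, then accumulated into the new ones):

```python
# One batch gradient-descent step: w_i <- w_i + eta * sum_k (t_k - o_k) * x_ki.

def batch_update(w, examples, eta=0.1):
    """examples is a list of (x_aug, t) pairs with the dummy x0 = 1 included."""
    new_w = list(w)
    for x, t in examples:
        o = sum(wi * xi for wi, xi in zip(w, x))      # output with old weights
        for i, xi in enumerate(x):
            new_w[i] += eta * (t - o) * xi            # accumulate the update
    return tuple(new_w)

w = (0.1, 0.1, -0.3)
examples = [((1, 0, 0), -1), ((1, 0, 1), 1)]
print(batch_update(w, examples))   # approximately (0.11, 0.1, -0.18)
```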

Error function has to be differentiable, so output function o also has to be differentiable.

Activation functions [figure: two activation-to-output curves ranging from 0 to 1: a step function (not differentiable) and a smooth sigmoid-like curve (differentiable)]

This is called the perceptron learning rule, with “true gradient descent”.

Problem with true gradient descent: the search process can land in a local optimum. Common approach to this: use stochastic gradient descent: instead of doing the weight update after all training examples have been processed, do a weight update after each training example has been processed (i.e., after the perceptron output has been calculated). Stochastic gradient descent approximates true gradient descent increasingly well as the learning rate η → 0.

Training a perceptron
1. Start with random weights, w = (w_1, w_2, ..., w_n).
2. Select a training example (x_k, t_k).
3. Run the perceptron with input x_k and weights w to obtain the output o.
4. Let η be the learning rate (a user-set parameter). Now update each weight: w_i ← w_i + η (t_k − o) x_{k,i}.
5. Go to 2.
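A runnable Python sketch of this loop with per-example (stochastic) updates; the fixed number of epochs and the variable names are my additions, not part of the slides:

```python
import random

# Per-example (stochastic) perceptron training, following steps 1-5 above.
# Inputs are augmented with the dummy coordinate x0 = 1, so w includes the bias.

def train_perceptron(S, eta=0.1, epochs=100):
    n = len(S[0][0]) + 1                                  # weights incl. bias
    w = [random.uniform(-0.5, 0.5) for _ in range(n)]     # step 1: random weights
    for _ in range(epochs):                               # stopping rule added for the sketch
        for x, t in S:                                    # step 2: select training example
            x_aug = (1.0,) + tuple(x)
            o = sum(wi * xi for wi, xi in zip(w, x_aug))  # step 3: run the perceptron
            for i, xi in enumerate(x_aug):                # step 4: update each weight
                w[i] += eta * (t - o) * xi
    return w                                              # step 5: loop back to step 2

# Example: learn to separate the two points from the earlier example slide.
S = [((0, 0), -1), ((0, 1), 1)]
print(train_perceptron(S))
```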