Machine Learning
Perceptron: Linearly Separable

Course outline:
Supervised Learning
- Classification and Regression
- K-Nearest Neighbor Classification
- Fisher’s Criteria & Linear Discriminant Analysis
- Perceptron: Linearly Separable
- Multilayer Perceptron & EBP & Deep Learning, RBF Network
- Support Vector Machine
- Ensemble Learning: Voting, Boosting (AdaBoost)
Unsupervised Learning
- Principal Component Analysis
- Independent Component Analysis
- Clustering: K-means
Semi-supervised Learning & Reinforcement Learning
Classification
Credit scoring example:
- Inputs are income and savings
- Output is low-risk vs. high-risk
Formally speaking: the output is a class C in {low-risk, high-risk}, and the decision rule thresholds the posterior probability of the class, if we know it (see below).
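Written out, with the convention (an assumption here) that C = 1 denotes a high-risk applicant, x_1 income and x_2 savings, the rule is:

  choose C = 1 if P(C = 1 \mid x_1, x_2) > 0.5, and C = 0 otherwise.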
Bayes’ Rule
- Bayes rule for one concept
- Bayes rule for K > 1 concepts
- Decision rule using Bayes rule (Bayes optimal classifier)
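For reference, the standard forms these bullets refer to:

  P(C \mid x) = \frac{p(x \mid C)\, P(C)}{p(x)}    (one concept)

  P(C_i \mid x) = \frac{p(x \mid C_i)\, P(C_i)}{\sum_{k=1}^{K} p(x \mid C_k)\, P(C_k)}    (K classes)

  Bayes optimal classifier: choose C_i if P(C_i \mid x) = \max_k P(C_k \mid x)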
Losses and Risks
Back to the credit scoring example:
- An accepted low-risk applicant increases profit; a rejected high-risk applicant decreases loss.
- In general, the loss caused by an accepted high-risk applicant ≠ the potential gain from a rejected low-risk applicant. Errors are not symmetric!
Define the expected risk of each action; the decision rule (minimum-risk classifier) takes the action with the smallest expected risk (see below).
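In the usual notation, with actions \alpha_i and loss \lambda_{ik} incurred by taking action \alpha_i when the true class is C_k, the expected risk and the resulting rule are:

  R(\alpha_i \mid x) = \sum_{k=1}^{K} \lambda_{ik}\, P(C_k \mid x)

  choose \alpha_i if R(\alpha_i \mid x) = \min_k R(\alpha_k \mid x)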
More on Losses and Risks
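One special case this presumably covers: with 0/1 loss (\lambda_{ik} = 0 if i = k, 1 otherwise), R(\alpha_i \mid x) = 1 - P(C_i \mid x), so minimizing risk reduces to the Bayes optimal classifier above. Adding a reject action with cost \lambda, 0 < \lambda < 1, gives: reject if \max_i P(C_i \mid x) < 1 - \lambda.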
Discriminant Functions
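In the standard setup, classification uses a set of discriminant functions g_i(x), i = 1, \dots, K, with the rule: choose C_i if g_i(x) = \max_k g_k(x). Any of the following serve as discriminants:

  g_i(x) = P(C_i \mid x),\quad g_i(x) = p(x \mid C_i)\, P(C_i),\quad g_i(x) = \log p(x \mid C_i) + \log P(C_i)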
Likelihood-based vs. Discriminant-based
- Likelihood-based classification: estimate the class densities p(x | C_i), then use Bayes rule to obtain the posteriors and the discriminants.
- Discriminant-based classification: estimate the boundaries (discriminants) directly. Estimating the boundaries is enough; no need to accurately estimate the densities inside the boundaries!
Linear Discriminant
Advantages:
- Simple: O(d) space/computation
- Knowledge extraction: a weighted sum of attributes; positive/negative weights and their magnitudes are interpretable (credit scoring)
- Optimal when the class densities p(x | C_i) are Gaussian with a shared covariance matrix; useful when classes are (almost) linearly separable
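The linear discriminant itself, for d-dimensional input x with weight vector w_i and bias w_{i0}:

  g_i(x \mid w_i, w_{i0}) = w_i^T x + w_{i0} = \sum_{j=1}^{d} w_{ij} x_j + w_{i0}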
Two Class Case
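With two classes a single discriminant suffices:

  g(x) = g_1(x) - g_2(x) = w^T x + w_0,\qquad \text{choose } C_1 \text{ if } g(x) > 0,\ C_2 \text{ otherwise}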
Geometric View
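Geometrically, g(x) = 0 defines a hyperplane whose normal is w; the signed distance from a point x to this hyperplane is g(x) / \lVert w \rVert, and its sign tells which side of the boundary (hence which class) x falls on.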
Multiple Classes (One-vs-All)
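One-vs-all uses K linear discriminants, g_i(x) = w_i^T x + w_{i0}, each trained to separate C_i from all the other classes combined; the label is then chosen as C_i if g_i(x) = \max_k g_k(x).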
Pairwise Separation (One-vs-One)
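One-vs-one uses K(K-1)/2 pairwise discriminants g_{ij}(x), each trained only on the instances of C_i and C_j; x is assigned to C_i if g_{ij}(x) > 0 for all j \neq i.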
Single-Layer Perceptron Classification
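For a single output (two classes), the perceptron computes a weighted sum of the inputs and squashes it with a sigmoid, which can be read as a posterior estimate:

  y = \operatorname{sigmoid}(w^T x) = \frac{1}{1 + \exp(-w^T x)},\qquad \text{choose } C_1 \text{ if } y > 0.5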
Single-Layer Perceptron with K Outputs
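With K outputs, each unit computes o_i = w_i^T x and a softmax turns the K sums into posterior estimates:

  y_i = \frac{\exp(o_i)}{\sum_{k=1}^{K} \exp(o_k)},\qquad \text{choose } C_i \text{ if } y_i = \max_k y_k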
Gradient Descent
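Gradient descent on an error function E(w \mid \mathcal{X}) starts from a random w and repeatedly steps against the gradient, with learning rate \eta:

  \Delta w_j = -\eta \frac{\partial E}{\partial w_j},\qquad w_j \leftarrow w_j + \Delta w_j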
Training Perceptron
- Regression: linear output
- Classification: single sigmoid output
- K > 2 classes: softmax outputs
(The update rules share a common form; see below.)
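In all three cases the per-pattern update takes the same form (derived from squared error for the linear output and from cross-entropy for the sigmoid/softmax outputs):

  \Delta w_j = \eta\,(r^t - y^t)\, x_j^t    (single output)
  \Delta w_{ij} = \eta\,(r_i^t - y_i^t)\, x_j^t    (K softmax outputs)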
Training Perceptron (cont.)
- Online learning (instances seen one by one) vs. batch learning (whole sample):
  - No need to store the whole sample
  - The problem may change in time
  - Wear and degradation in system components
- Stochastic gradient descent: update after each single pattern
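Concretely, the homework code below follows this stochastic/online scheme: the weights are updated inside the per-pattern loop, after each of the four OR training samples, rather than once per epoch from an accumulated batch gradient.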
Expressiveness of Perceptrons
- Consider a perceptron whose activation is the step function.
- It can represent AND, OR, NOT, majority, etc., but not XOR.
- It represents a linear separator in input space: it outputs 1 exactly when w^T x + w_0 > 0, i.e. on one side of the hyperplane w^T x + w_0 = 0.
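A concrete instance: with a step activation, w_1 = w_2 = 1 and bias w_0 = -1.5 implements AND (the weighted sum is positive only for input (1,1)), and changing the bias to w_0 = -0.5 gives OR. XOR is not representable because no single line can separate {(0,1), (1,0)} from {(0,0), (1,1)} in the input plane.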
Homework: Perceptron Learning for OR Problem (Sigmoid Output)

/* Perceptron Learning for OR problem with Linear Output */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* parameters */
#define NUMEPOCH 100   // number of training epochs
#define NUMIN 2        // number of inputs
#define NUMP 4         // number of training samples
#define RWMAX 0.1      // max magnitude of initial weights
#define wseed 100      // weight seed
#define eta 0.1        // learning rate

double rw[NUMIN+1];    // weights; rw[NUMIN] is the bias weight

int main(void)
{
    double x[NUMP][NUMIN+1], y[NUMP], t[NUMP], mse;
    int i, p, epoch;

    /* OR truth table; x[p][NUMIN] = 1 is the constant bias input */
    x[0][0]=0.; x[0][1]=0.; x[0][2]=1.; t[0]=0.;
    x[1][0]=0.; x[1][1]=1.; x[1][2]=1.; t[1]=1.;
    x[2][0]=1.; x[2][1]=0.; x[2][2]=1.; t[2]=1.;
    x[3][0]=1.; x[3][1]=1.; x[3][2]=1.; t[3]=1.;

    /* initialize weights uniformly in [-RWMAX, RWMAX] */
    srand(wseed);
    for (i=0; i<NUMIN+1; i++)
        rw[i] = RWMAX*((double)rand()*2/RAND_MAX - 1.);

    /* Begin Training: online (per-pattern) gradient descent */
    for (epoch=0; epoch<NUMEPOCH; epoch++) {
        mse = 0.;
        for (p=0; p<NUMP; p++) {
            y[p] = 0.;                                   /* forward pass: y = w.x */
            for (i=0; i<NUMIN+1; i++) y[p] += rw[i]*x[p][i];
            mse += pow(t[p]-y[p], 2.0)/NUMP;             /* accumulate mean squared error */
            for (i=0; i<NUMIN+1; i++)                    /* delta rule: w += eta*(t-y)*x */
                rw[i] += eta*(t[p]-y[p])*x[p][i];
        } // end of p
        printf("%d %e\n", epoch, mse);
    } // end of epoch
    return 0;
} // end of main
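The homework title asks for a sigmoid output while the starter code above uses a linear output. A minimal sketch of the required change, assuming a squared-error loss (which brings in the sigmoid derivative y*(1-y); with cross-entropy loss that factor cancels and the update reduces to eta*(t-y)*x[i]). The helper name update_sigmoid is hypothetical, not part of the handout:

/* One online update for a sigmoid-output perceptron (sketch, not from the handout).
   w: weights (length n, including the bias weight), x: input pattern (with bias input 1),
   t: target in {0,1}, lr: learning rate. Assumes squared-error loss. */
#include <math.h>

static void update_sigmoid(double *w, const double *x, double t, double lr, int n)
{
    double net = 0., y;
    int i;
    for (i = 0; i < n; i++) net += w[i]*x[i];   /* weighted sum */
    y = 1.0/(1.0 + exp(-net));                  /* logistic sigmoid output */
    for (i = 0; i < n; i++)                     /* delta rule with sigmoid derivative */
        w[i] += lr*(t - y)*y*(1. - y)*x[i];
}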