Lecture Slides for Introduction to Machine Learning 2e, Ethem Alpaydın © The MIT Press
Likelihood- vs. Discriminant-based Classification
- Likelihood-based: assume a model for $p(x|C_i)$, use Bayes' rule to calculate $P(C_i|x)$, and take $g_i(x) = \log P(C_i|x)$.
- Discriminant-based: assume a model for $g_i(x|\Phi_i)$ directly; no density estimation.
- Estimating the boundaries is enough; there is no need to accurately estimate the densities inside the boundaries.
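For reference, the Bayes' rule step that the likelihood-based approach relies on:

$$
P(C_i \mid x) = \frac{p(x \mid C_i)\,P(C_i)}{p(x)} = \frac{p(x \mid C_i)\,P(C_i)}{\sum_j p(x \mid C_j)\,P(C_j)}
$$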
Linear Discriminant
- Advantages:
  - Simple: O(d) space and computation.
  - Knowledge extraction: the discriminant is a weighted sum of attributes; the signs and magnitudes of the weights are interpretable (e.g., credit scoring).
  - Optimal when the $p(x|C_i)$ are Gaussian with a shared covariance matrix; useful when classes are (almost) linearly separable.
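In the chapter's usual notation, the linear discriminant is

$$
g_i(x \mid w_i, w_{i0}) = w_i^T x + w_{i0} = \sum_{j=1}^{d} w_{ij} x_j + w_{i0}
$$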
Generalized Linear Model
- Quadratic discriminant: add higher-order (product) terms of the inputs.
- Equivalently, map from x to z using nonlinear basis functions and use a linear discriminant in z-space.
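As a worked instance: the quadratic discriminant

$$
g_i(x \mid W_i, w_i, w_{i0}) = x^T W_i x + w_i^T x + w_{i0}
$$

is linear in the z-space obtained, for $d = 2$, from the basis functions

$$
z_1 = x_1, \quad z_2 = x_2, \quad z_3 = x_1^2, \quad z_4 = x_2^2, \quad z_5 = x_1 x_2
$$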
Two Classes
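For two classes, a single discriminant suffices: subtracting the two linear discriminants gives

$$
g(x) = g_1(x) - g_2(x) = (w_1 - w_2)^T x + (w_{10} - w_{20}) = w^T x + w_0
$$

and we choose $C_1$ if $g(x) > 0$ and $C_2$ otherwise.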
Geometry
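The geometry of $g(x) = w^T x + w_0$: the weight vector $w$ is normal to the decision hyperplane $g(x) = 0$, and

$$
r = \frac{g(x)}{\lVert w \rVert}, \qquad r_0 = \frac{w_0}{\lVert w \rVert}
$$

give the signed distance to the hyperplane of a point $x$ and of the origin, respectively.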
Multiple Classes
- Classes are linearly separable.
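With one linear discriminant per class, we assign x to the class whose discriminant is largest:

$$
g_i(x \mid w_i, w_{i0}) = w_i^T x + w_{i0}, \qquad \text{choose } C_i \text{ if } g_i(x) = \max_j g_j(x)
$$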
Pairwise Separation
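When the classes are not separable one from all the rest but are separable in pairs, a discriminant per pair of classes can be used:

$$
g_{ij}(x \mid w_{ij}, w_{ij0}) = w_{ij}^T x + w_{ij0}
$$

choosing $C_i$ if $g_{ij}(x) > 0$ for all $j \ne i$.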
From Discriminants to Posteriors
- When $p(x \mid C_i) \sim \mathcal{N}(\mu_i, \Sigma)$ (Gaussian classes with a shared covariance matrix), the posterior can be written in terms of a linear discriminant.
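Concretely, the log-odds is then linear in x, so the posterior is the sigmoid of a linear discriminant:

$$
\log \frac{P(C_1 \mid x)}{P(C_2 \mid x)} = w^T x + w_0
\;\Rightarrow\;
P(C_1 \mid x) = \frac{1}{1 + \exp\!\left(-(w^T x + w_0)\right)}
$$

with $w = \Sigma^{-1}(\mu_1 - \mu_2)$ and $w_0 = -\tfrac{1}{2}(\mu_1 + \mu_2)^T \Sigma^{-1} (\mu_1 - \mu_2) + \log \dfrac{P(C_1)}{P(C_2)}$.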
Sigmoid (Logistic) Function
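The sigmoid maps the linear discriminant value to a probability in (0, 1); its inverse, the logit, recovers the linear form:

$$
\operatorname{sigmoid}(a) = \frac{1}{1 + e^{-a}}, \qquad \operatorname{logit}(y) = \log \frac{y}{1 - y}
$$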
Gradient-Descent
- $E(w \mid X)$ is the error with parameters $w$ on sample $X$; we seek $w^* = \arg\min_w E(w \mid X)$.
- Gradient: $\nabla_w E = \left[\dfrac{\partial E}{\partial w_1}, \dfrac{\partial E}{\partial w_2}, \ldots, \dfrac{\partial E}{\partial w_d}\right]^T$
- Gradient descent starts from a random $w$ and updates it iteratively in the negative direction of the gradient: $\Delta w_i = -\eta \dfrac{\partial E}{\partial w_i}$, then $w_i \leftarrow w_i + \Delta w_i$.
Gradient-Descent
[Figure: error curve $E(w)$; one step moves from $w^t$ to $w^{t+1}$ in the negative gradient direction with step size $\eta$, decreasing the error from $E(w^t)$ to $E(w^{t+1})$.]
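A minimal sketch of the update rule in code, assuming a generic differentiable error function (the names `gradient_descent` and `grad` are illustrative, not from the book):

```python
import numpy as np

def gradient_descent(grad, w0, eta=0.1, n_iters=100):
    """Minimize E(w) by repeatedly stepping against its gradient.

    grad -- function returning the gradient of E at w (assumed given)
    w0   -- initial parameter vector, e.g. drawn at random
    eta  -- learning rate (step size)
    """
    w = np.asarray(w0, dtype=float)
    for _ in range(n_iters):
        w = w - eta * grad(w)  # w^{t+1} = w^t - eta * dE/dw
    return w

# Toy usage: E(w) = ||w - 3||^2 has gradient 2(w - 3) and minimum at w = 3.
w_star = gradient_descent(lambda w: 2 * (w - 3.0), w0=[0.0, 0.0])
```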
Logistic Discrimination
- Two classes: assume the log likelihood ratio is linear.
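Spelling out the assumption: the model is placed on the ratio of the class likelihoods, and Bayes' rule then yields a logistic posterior:

$$
\log \frac{p(x \mid C_1)}{p(x \mid C_2)} = w^T x + w_0^{\circ}
\;\Rightarrow\;
y = \hat{P}(C_1 \mid x) = \frac{1}{1 + \exp\!\left(-(w^T x + w_0)\right)}
$$

where $w_0 = w_0^{\circ} + \log \dfrac{P(C_1)}{P(C_2)}$.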
Training: Two Classes
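Treating the labels $r^t \in \{0, 1\}$ as Bernoulli variables with parameter $y^t$, maximizing the sample likelihood is equivalent to minimizing the cross-entropy error:

$$
E(w, w_0 \mid X) = -\sum_t \left[ r^t \log y^t + (1 - r^t) \log (1 - y^t) \right],
\qquad y^t = \operatorname{sigmoid}(w^T x^t + w_0)
$$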
Training: Gradient-Descent
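Differentiating the cross-entropy error gives the simple batch updates $\Delta w_j = \eta \sum_t (r^t - y^t) x_j^t$ and $\Delta w_0 = \eta \sum_t (r^t - y^t)$. A minimal runnable sketch (the names `sigmoid` and `train_logistic` are illustrative, and the gradient is averaged over the sample here):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_logistic(X, r, eta=0.5, n_epochs=1000):
    """Two-class logistic discrimination trained by batch gradient descent.

    X -- (N, d) array of inputs
    r -- (N,) array of labels in {0, 1}
    Returns the weight vector w and bias w0.
    """
    N, d = X.shape
    rng = np.random.default_rng(0)
    w = rng.uniform(-0.01, 0.01, d)   # small random initial weights
    w0 = 0.0
    for _ in range(n_epochs):
        y = sigmoid(X @ w + w0)       # current posterior estimates y^t
        # Cross-entropy gradient: Delta w_j = eta * sum_t (r^t - y^t) x_j^t
        w += eta * (X.T @ (r - y)) / N
        w0 += eta * np.mean(r - y)
    return w, w0
```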
K>2 Classes
- Use the softmax function to turn K linear discriminants into posterior probabilities.
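Taking one class, say $C_K$, as the reference and assuming linear log likelihood ratios gives posteriors in softmax form:

$$
y_i = \hat{P}(C_i \mid x) = \frac{\exp(w_i^T x + w_{i0})}{\sum_{j=1}^{K} \exp(w_j^T x + w_{j0})}, \qquad i = 1, \ldots, K
$$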
Generalizing the Linear Model
- Quadratic: $\log \dfrac{p(x \mid C_1)}{p(x \mid C_2)} = x^T W x + w^T x + w_0$
- Sum of basis functions: $\log \dfrac{p(x \mid C_1)}{p(x \mid C_2)} = \sum_j w_j \phi_j(x)$, where the $\phi_j(x)$ are basis functions, e.g.:
  - Hidden units in neural networks (Chapters 11 and 12)
  - Kernels in SVM (Chapter 13)
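The same idea in code: expand x into a hand-built z-space and reuse any linear trainer there. This sketch uses the quadratic basis from earlier (the function name `quadratic_basis` is illustrative):

```python
import numpy as np

def quadratic_basis(X):
    """Map 2-D inputs x = (x1, x2) to z = (x1, x2, x1^2, x2^2, x1*x2).

    A discriminant that is linear in z is quadratic in the original x.
    """
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1**2, x2**2, x1 * x2])

# Usage: train the two-class logistic discriminant from earlier in z-space.
# w, w0 = train_logistic(quadratic_basis(X), r)
```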
Discrimination by Regression
- Classes are NOT mutually exclusive and exhaustive; each label is fit as a separate regression target.
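The standard formulation uses a sigmoid output with Gaussian noise on the labels, so maximizing the likelihood reduces to least squares:

$$
r^t = y^t + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma^2), \qquad
y^t = \operatorname{sigmoid}(w^T x^t + w_0)
$$

$$
E(w, w_0 \mid X) = \frac{1}{2} \sum_t (r^t - y^t)^2
$$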