INTRODUCTION TO Machine Learning, 2nd Edition
ETHEM ALPAYDIN © The MIT Press, 2010
Lecture Notes (V1.0), CHAPTER 8: Nonparametric Methods


Parametric Estimation
Parametric (single global model)
Advantage: It reduces the problem of estimating a probability density function (pdf), discriminant, or regression function to estimating the values of a small number of parameters.
Disadvantage: The assumed model does not always hold, and we may incur a large error if it does not.
Semiparametric (small number of local models)
Mixture models (Chapter 7): the density is written as a mixture (weighted sum) of a small number of parametric component densities.

Nonparametric Estimation
Assumption: similar inputs have similar outputs; the functions of interest (pdf, discriminant, regression) change smoothly.
Keep the training data; “let the data speak for itself.”
Given x, find a small number of the closest training instances and interpolate from these.
Nonparametric methods are also called memory-based or instance-based learning algorithms.

Density Estimation
Given a training set X = {x^t}_t drawn i.i.d. (independent and identically distributed) from p(x).
Divide the data into bins of size h.
Histogram estimator (Figure 8.1):
p̂(x) = #{x^t in the same bin as x} / (N h)
Note: a collection of random variables is i.i.d. if each random variable has the same probability distribution as the others and all are mutually independent.
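As a concrete illustration (not part of the original slides), here is a minimal NumPy sketch of the histogram estimator; the bin origin, the random sample, and all names are arbitrary choices made for this example.

import numpy as np

def histogram_estimator(x, data, h, origin=0.0):
    # p_hat(x) = (number of training points in the bin containing x) / (N * h)
    data = np.asarray(data)
    bin_index = np.floor((x - origin) / h)                 # bin containing the query x
    in_same_bin = np.floor((data - origin) / h) == bin_index
    return in_same_bin.sum() / (len(data) * h)

# Toy data reused by the sketches throughout these notes (made up for illustration)
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=200)
print(histogram_estimator(0.3, sample, h=0.5))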

Figure 8.1: Histogram estimator.

Density Estimation (cont.)
Given the training set X = {x^t}_t drawn i.i.d. from p(x).
Naive estimator: x is always taken as the center of a bin of size h:
p̂(x) = #{x − h/2 < x^t ≤ x + h/2} / (N h)
or, equivalently,
p̂(x) = (1/(N h)) Σ_t w((x − x^t)/h), where w(u) = 1 if |u| < 1/2 and 0 otherwise.
(Figure 8.2)
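A corresponding sketch of the naive estimator (illustrative only; it reuses np and the sample array defined in the histogram example above).

def naive_estimator(x, data, h):
    # Fraction of training points within h/2 of x, scaled by 1/h
    data = np.asarray(data)
    w = np.abs((x - data) / h) < 0.5
    return w.sum() / (len(data) * h)

print(naive_estimator(0.3, sample, h=0.5))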

Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0) h=0.5 h=1 Naïve estimator: h=2 8 Naive estimator:

Kernel Estimator
Kernel function, e.g., the Gaussian kernel:
K(u) = (1/√(2π)) exp(−u²/2)
Kernel estimator (Parzen windows), Figure 8.3:
p̂(x) = (1/(N h)) Σ_t K((x − x^t)/h)
If K is Gaussian, then p̂(x) will be smooth, having derivatives of all orders.
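A minimal Parzen-window sketch with a Gaussian kernel (an illustration under the same setup; it again reuses np and sample from the first sketch).

def gaussian_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kernel_estimator(x, data, h):
    # Average of kernels centered at each training point
    data = np.asarray(data)
    return gaussian_kernel((x - data) / h).sum() / (len(data) * h)

print(kernel_estimator(0.3, sample, h=0.5))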

Figure 8.3: Kernel estimator.

k-Nearest Neighbor Estimator
Instead of fixing the bin width h and counting how many instances fall inside, fix the number of neighbors k and compute the bin width needed to contain them:
p̂(x) = k / (2 N d_k(x))
where d_k(x) is the distance from x to its kth closest training instance.
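A one-dimensional sketch of the k-NN density estimate (illustrative; reuses np and sample from above).

def knn_estimator(x, data, k):
    # k neighbors fall inside a bin of width 2 * d_k(x) centered at x
    data = np.asarray(data)
    d_k = np.sort(np.abs(data - x))[k - 1]     # distance to the kth closest instance
    return k / (2.0 * len(data) * d_k)

print(knn_estimator(0.3, sample, k=10))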

Figure: k-nearest neighbor estimator.

Generalization to Multivariate Data
Kernel density estimator for d-dimensional x:
p̂(x) = (1/(N h^d)) Σ_t K((x − x^t)/h)
with the requirement that ∫ K(x) dx = 1.
Multivariate Gaussian kernel:
spheric: K(u) = (1/(2π)^{d/2}) exp(−‖u‖²/2)
ellipsoid: K(u) = (1/((2π)^{d/2} |S|^{1/2})) exp(−(1/2) uᵀ S⁻¹ u), where S is a covariance (scaling) matrix.
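A sketch of the spheric multivariate case (illustrative; X2d is made-up two-dimensional data and the code reuses np and rng from the first sketch).

def multivariate_kernel_estimator(x, data, h):
    # Spheric Gaussian kernel density estimate in d dimensions
    data = np.asarray(data)
    n, d = data.shape
    u = (x - data) / h
    k = np.exp(-0.5 * np.sum(u**2, axis=1)) / (2.0 * np.pi) ** (d / 2)
    return k.sum() / (n * h**d)

X2d = rng.normal(size=(300, 2))
print(multivariate_kernel_estimator(np.zeros(2), X2d, h=0.5))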

Nonparametric Classification
Estimate the class-conditional densities p(x|C_i) and use Bayes’ rule.
Kernel estimator:
p̂(x|C_i) = (1/(N_i h^d)) Σ_t K((x − x^t)/h) r_i^t,   P̂(C_i) = N_i / N
where r_i^t = 1 if x^t ∈ C_i and 0 otherwise, and N_i = Σ_t r_i^t.
The discriminant function:
g_i(x) = p̂(x|C_i) P̂(C_i) = (1/(N h^d)) Σ_t K((x − x^t)/h) r_i^t
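A one-dimensional sketch of this kernel-based discriminant (the two-class toy data, class labels, and all names are invented for illustration; it reuses np, rng, and gaussian_kernel from the sketches above).

def kernel_discriminant(x, data, labels, cls, h):
    # g_i(x): sum of kernel weights over training points of class cls
    data, labels = np.asarray(data), np.asarray(labels)
    mask = labels == cls
    return gaussian_kernel((x - data[mask]) / h).sum() / (len(data) * h)

# Toy two-class problem: class 0 centered at -1, class 1 centered at +1
data = np.concatenate([rng.normal(-1.0, 1.0, 100), rng.normal(+1.0, 1.0, 100)])
labels = np.array([0] * 100 + [1] * 100)
scores = [kernel_discriminant(0.4, data, labels, c, h=0.5) for c in (0, 1)]
print("predicted class:", int(np.argmax(scores)))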

Nonparametric Classification: k-NN Estimator
For the special case of the k-NN estimator,
p̂(x|C_i) = k_i / (N_i V^k(x))
where
k_i : the number of neighbors among the k nearest that belong to C_i
V^k(x) : the volume of the d-dimensional hypersphere centered at x with radius d_k(x), i.e., V^k(x) = c_d d_k(x)^d
c_d : the volume of the unit sphere in d dimensions; for example, c_1 = 2, c_2 = π, c_3 = 4π/3.

Nonparametric Classification: k-NN Estimator (cont.)
From
p̂(x|C_i) = k_i / (N_i V^k(x)),   P̂(C_i) = N_i / N,   p̂(x) = k / (N V^k(x))
it follows that
P̂(C_i|x) = p̂(x|C_i) P̂(C_i) / p̂(x) = k_i / k
i.e., the k-NN classifier assigns the input to the class having the most examples among its k nearest neighbors.
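A short k-NN classification sketch following the k_i/k rule (illustrative; reuses np and the toy data/labels arrays from the previous sketch).

def knn_classify(x, data, labels, k):
    # Assign x to the class with the most representatives among its k nearest neighbors
    data = np.asarray(data)
    nearest = np.argsort(np.abs(data - x))[:k]
    votes = np.bincount(np.asarray(labels)[nearest])
    return int(np.argmax(votes))

print(knn_classify(0.4, data, labels, k=5))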

Condensed Nearest Neighbor
The time/space complexity of k-NN is O(N).
Find a subset Z of X that is small and is accurate in classifying X (Hart, 1968).
Error function:
E′(Z|X) = E(X|Z) + λ|Z|
where E(X|Z) is the error on X when storing only Z, and |Z| is the cardinality of Z.

Condensed Nearest Neighbor (cont.)
Incremental algorithm: add an instance to the stored subset only if it is needed, i.e., if the instances stored so far misclassify it.
This is a greedy method and a local search; the result depends on the order of the training instances. A sketch is given below.
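A minimal sketch of this incremental condensing idea using 1-NN (one possible reading of the procedure, not the book's exact pseudocode; it reuses np, rng, and the toy data/labels arrays above).

def condense(data, labels, rng):
    # Keep only the instances that the currently stored subset Z misclassifies with 1-NN
    data, labels = np.asarray(data), np.asarray(labels)
    order = rng.permutation(len(data))             # the result depends on this order
    Z_idx = [order[0]]                             # start by storing one instance
    changed = True
    while changed:                                 # pass over the data until Z stops growing
        changed = False
        for t in order:
            nearest = Z_idx[np.argmin(np.abs(data[Z_idx] - data[t]))]
            if labels[nearest] != labels[t]:       # Z misclassifies x^t, so store it
                Z_idx.append(t)
                changed = True
    return Z_idx

Z_idx = condense(data, labels, rng)
print(len(Z_idx), "of", len(data), "instances stored")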

Nonparametric Regression
Smoothing models: a nonparametric regression estimator is also called a smoother.
Running mean smoother (regressogram vs. naive estimator)
Kernel smoother
Running line smoother
Regressogram (see the figure on the next slide):
ĝ(x) = Σ_t b(x, x^t) r^t / Σ_t b(x, x^t)
where b(x, x^t) = 1 if x^t is in the same bin as x, and 0 otherwise.
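A small regressogram sketch (illustrative; the toy regression data xr, rr is made up and the code reuses np and rng from the first sketch).

# Toy regression data: r^t = sin(x^t) + noise
xr = np.sort(rng.uniform(0.0, 2.0 * np.pi, 100))
rr = np.sin(xr) + rng.normal(0.0, 0.1, 100)

def regressogram(x, xs, rs, h, origin=0.0):
    # Average r^t over training points whose x^t falls in the same bin as x
    b = np.floor((xs - origin) / h) == np.floor((x - origin) / h)
    return rs[b].mean() if b.any() else np.nan

print(regressogram(1.0, xr, rr, h=0.5))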

Figure: Regressogram.


Running Mean Smoother
Running mean smoother (the regression analogue of the naive estimator), Figure 8.8:
ĝ(x) = Σ_t w((x − x^t)/h) r^t / Σ_t w((x − x^t)/h)
where w(·) is a rectangular window: 1 for training instances falling in the window of width proportional to h around x, and 0 otherwise; ĝ(x) is thus the average output of the instances in that window.
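An illustrative running mean sketch (it assumes a window of half-width h, which is one common convention; reuses np and the toy xr, rr arrays above).

def running_mean_smoother(x, xs, rs, h):
    # Average r^t over training points within h of x (rectangular window)
    w = np.abs((x - xs) / h) < 1.0
    return rs[w].mean() if w.any() else np.nan

print(running_mean_smoother(1.0, xr, rr, h=0.5))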

Figure 8.8: Running mean smoother.

Kernel Smoother
Kernel smoother (Figure 8.9):
ĝ(x) = Σ_t K((x − x^t)/h) r^t / Σ_t K((x − x^t)/h)
where K(·) is a kernel function, e.g., the Gaussian kernel.
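A kernel smoother sketch (illustrative; reuses np, gaussian_kernel, and the toy xr, rr arrays above).

def kernel_smoother(x, xs, rs, h):
    # Weighted average of r^t with Gaussian kernel weights
    w = gaussian_kernel((x - xs) / h)
    return np.sum(w * rs) / np.sum(w)

print(kernel_smoother(1.0, xr, rr, h=0.5))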

Figure 8.9: Kernel smoother.

Running Line Smoother
Instead of taking an average and giving a constant fit at a point, we can take one more term of the Taylor expansion into account and calculate a linear fit:
F(x) = F(a) + F′(a)(x − a)/1! + F″(a)(x − a)²/2! + …
Running line smoother (Figure 8.10):
Use the data points in the neighborhood of x, as defined by h or k.
Fit a local regression line.
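A local linear fit sketch of the running line idea (illustrative; np.polyfit computes the local least-squares line, the half-width-h window is again an assumed convention, and the code reuses np and the toy xr, rr arrays above).

def running_line_smoother(x, xs, rs, h):
    # Fit a line to the training points within h of x and evaluate it at x
    w = np.abs((x - xs) / h) < 1.0
    if w.sum() < 2:
        return np.nan
    slope, intercept = np.polyfit(xs[w], rs[w], deg=1)
    return slope * x + intercept

print(running_line_smoother(1.0, xr, rr, h=0.5))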

Figure 8.10: Running line smoother.

How to Choose k or h?
When k or h is small:
Single instances matter; bias is small, variance is large.
Undersmoothing; high complexity.
As k or h increases:
We average over more instances; variance decreases but bias increases.
Oversmoothing; low complexity.
Cross-validation is used to fine-tune k or h (a sketch follows below).
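A hedged sketch of choosing h by validation-set log-likelihood for the kernel density estimator (the train/validation split, the candidate grid, and all names are arbitrary illustrations; it reuses np, sample, and kernel_estimator from the earlier sketches).

# Split the one-dimensional sample into training and validation halves
train, valid = sample[:100], sample[100:]

def validation_log_likelihood(valid_pts, train_pts, h):
    # Log-likelihood of the validation points under the Parzen-window estimate built on train_pts
    densities = np.array([kernel_estimator(v, train_pts, h) for v in valid_pts])
    return np.sum(np.log(densities + 1e-12))      # small constant guards against log(0)

candidates = [0.1, 0.25, 0.5, 1.0, 2.0]
best_h = max(candidates, key=lambda h: validation_log_likelihood(valid, train, h))
print("selected h:", best_h)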

Figure: Kernel estimates for various bin lengths for a two-class problem. Plotted are the conditional densities p(x|C_i). It seems that the top one oversmooths and the bottom one undersmooths, but which is best depends on where the validation data points are.