MACHINE LEARNING 9. Nonparametric Methods
Introduction
Estimating distributions is needed for classification, regression, and clustering.
Parametric: assume a model and find its optimum parameters from the data (ML, MAP, least squares).
Semi-parametric: assume a mixture of distributions; use EM/clustering to find the parameters.
Non-parametric estimation
We cannot assume a model for the densities: the true model might be very complicated, with a large number of parameters, and assuming the wrong model leads to large error.
Nonparametric estimation principle: "similar inputs have similar outputs."
Find similar instances in the training data and interpolate/average their outputs.
Nonparametric estimation
Parametric estimation: all data instances affect the final global estimate (global methods).
Non-parametric estimation: no single global model; local models are created as needed and are affected only by nearby instances.
Also called instance-based or memory-based methods.
Memory-based methods
Lazy method: store the training data of size N; O(N) memory and O(N) search to find similar data.
Eager method (parametric methods): d parameters with d < N; O(d) memory and processing.
Local methods and the "curse of dimensionality"
500 two-dimensional points in the unit square give a fairly good picture of the density.
A single 1000-dimensional vector does not carry enough information about the joint distribution of 1000 random variables.
Many more samples are needed in higher-dimensional spaces.
Density Estimator
Given a training set X = {x^t}_t drawn iid from p(x).
Estimate the cumulative distribution first; for density estimation, select a window length h.
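In the standard form these estimators are F̂(x) = #{x^t ≤ x} / N for the cumulative distribution and p̂(x) = #{x < x^t ≤ x + h} / (N h) for the density. A minimal NumPy sketch of both (function names are my own):

```python
import numpy as np

def cdf_estimate(x, X):
    """Empirical cumulative distribution: fraction of training points <= x."""
    return np.mean(X <= x)

def density_estimate(x, X, h):
    """Finite-difference density estimate: points falling in (x, x + h], divided by N*h."""
    return np.sum((X > x) & (X <= x + h)) / (len(X) * h)

X = np.random.randn(500)          # synthetic 1-D training sample
print(cdf_estimate(0.0, X), density_estimate(0.0, X, h=0.5))
```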
Histogram Estimator
Divide the input space into equal-size bins of width h, starting from an origin x₀.
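With bins [x₀ + m·h, x₀ + (m+1)·h), the usual histogram estimate is p̂(x) = #{x^t in the same bin as x} / (N h); a small sketch under those assumptions:

```python
import numpy as np

def histogram_estimate(x, X, x0=0.0, h=0.5):
    """Density estimate from the bin containing x; bins are [x0 + m*h, x0 + (m+1)*h)."""
    m = np.floor((x - x0) / h)                     # index of the bin that contains x
    in_same_bin = np.floor((X - x0) / h) == m      # training points in that same bin
    return in_same_bin.sum() / (len(X) * h)

X = np.random.randn(500)
print(histogram_estimate(0.3, X, x0=0.0, h=0.5))
```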
Naïve Estimator
Each training sample has a symmetric region of influence of size h and contributes 1 to any x that falls into its region of influence.
This region of influence is "hard", not continuous.
Soft influence: each sample contributes as a function of distance, so training samples close to the input contribute more.
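Equivalently, the hard version is p̂(x) = #{x − h/2 < x^t ≤ x + h/2} / (N h): a weight of 1 for every sample whose window covers x. A small sketch of that hard-window estimator:

```python
import numpy as np

def naive_estimate(x, X, h=0.5):
    """Naive estimator: every training point whose hard window of width h covers x
    contributes 1; the result is that count divided by N*h."""
    w = np.abs((x - X) / h) < 0.5      # hard, non-continuous region of influence
    return w.sum() / (len(X) * h)

X = np.random.randn(500)
print(naive_estimate(0.0, X, h=0.5))
```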
Kernel Estimator
Use a kernel function, e.g. a Gaussian kernel, and sum the kernel contributions of all training samples: the kernel estimator (Parzen windows).
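With the Gaussian kernel K(u) = exp(−u²/2) / √(2π), the Parzen-window estimate is p̂(x) = (1/(N h)) Σ_t K((x − x^t)/h); a minimal sketch:

```python
import numpy as np

def gaussian_kernel(u):
    """Gaussian kernel K(u) = exp(-u^2 / 2) / sqrt(2*pi)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

def parzen_estimate(x, X, h=0.5):
    """Kernel (Parzen-window) estimate: sum of K((x - x^t) / h) over all t, divided by N*h."""
    return gaussian_kernel((x - X) / h).sum() / (len(X) * h)

X = np.random.randn(500)
print(parzen_estimate(0.0, X, h=0.5))
```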
K-Nearest Neighbors
Histogram/kernel methods use a fixed bin size; what we actually want is a small bin size where there are many samples in the neighborhood.
Sort the distances from x to the training samples: d₁(x) ≤ d₂(x) ≤ … ≤ d_N(x).
k-Nearest Neighbor Estimator
Instead of fixing the bin width h and counting the number of instances, fix the number of nearest instances k and use the bin width d_k(x), the distance from x to its kth closest instance.
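The resulting estimate is p̂(x) = k / (2 N d_k(x)), since k instances fall in a window of width 2·d_k(x) around x; a small sketch (k and the query point are placeholders):

```python
import numpy as np

def knn_density_estimate(x, X, k=10):
    """k-NN density estimate: p_hat(x) = k / (2 * N * d_k(x)),
    with d_k(x) the distance from x to its k-th closest training point."""
    d = np.sort(np.abs(X - x))         # distances to all training points, ascending
    return k / (2 * len(X) * d[k - 1])

X = np.random.randn(500)
print(knn_density_estimate(0.0, X, k=10))
```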
Multivariate Data
Kernel density estimator with a multivariate Gaussian kernel, either spheric (one bandwidth in every direction) or ellipsoid (shaped by a covariance matrix).
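For d-dimensional inputs the estimate becomes p̂(x) = (1/(N h^d)) Σ_t K((x − x^t)/h), with the spheric Gaussian kernel K(u) = (2π)^(−d/2) exp(−‖u‖²/2); the ellipsoid version replaces ‖u‖² with uᵀ S⁻¹ u for a covariance matrix S and divides by |S|^(1/2). A sketch of the spheric case:

```python
import numpy as np

def spheric_gaussian_kernel(u):
    """Spheric multivariate Gaussian kernel: (2*pi)^(-d/2) * exp(-||u||^2 / 2)."""
    d = u.shape[-1]
    return np.exp(-0.5 * np.sum(u ** 2, axis=-1)) / (2 * np.pi) ** (d / 2)

def multivariate_kernel_estimate(x, X, h=0.5):
    """p_hat(x) = (1 / (N * h^d)) * sum_t K((x - x^t) / h)."""
    N, d = X.shape
    return spheric_gaussian_kernel((x - X) / h).sum() / (N * h ** d)

X = np.random.randn(500, 2)                        # N=500 points in d=2 dimensions
print(multivariate_kernel_estimate(np.zeros(2), X, h=0.5))
```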
Nonparametric Classification
The training data "vote" for the input's label; closer points get more influence.
Kernel: weight the votes according to distance.
k-NN: weight the k closest points.
1-NN: find the closest point and assign its label to the input.
Nonparametric Classification
Estimate p(x | C_i) with a kernel estimator or a k-NN estimator and use Bayes' rule.
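With the kernel estimator, p̂(x | C_i) is the kernel-weighted count of class-i training points divided by N_i h^d, and multiplying by P̂(C_i) = N_i / N gives a discriminant proportional to the total kernel-weighted vote of class i; with k-NN, P̂(C_i | x) = k_i / k, i.e. majority vote among the k nearest neighbours. A 1-D sketch of both classifiers (the data and bandwidth are illustrative):

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

def kernel_classify(x, X, y, h=0.5):
    """Kernel classifier: each class discriminant is the kernel-weighted count of its
    training points, proportional to p_hat(x | C_i) * P_hat(C_i)."""
    classes = np.unique(y)
    scores = [gaussian_kernel((x - X[y == c]) / h).sum() for c in classes]
    return classes[int(np.argmax(scores))]

def knn_classify(x, X, y, k=5):
    """k-NN classifier: P_hat(C_i | x) = k_i / k, so predict the majority class
    among the k closest training points."""
    idx = np.argsort(np.abs(X - x))[:k]
    vals, counts = np.unique(y[idx], return_counts=True)
    return vals[int(np.argmax(counts))]

X = np.concatenate([np.random.randn(50) - 2, np.random.randn(50) + 2])
y = np.array([0] * 50 + [1] * 50)
print(kernel_classify(0.5, X, y), knn_classify(0.5, X, y, k=7))
```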
Condensed Nearest Neighbor
The time/space complexity of k-NN is O(N).
Find a subset Z of X that is small and still classifies X accurately, and use 1-NN on Z.
Condensed Nearest Neighbor
Incremental algorithm: add an instance to Z only if it is needed, i.e. if the current Z misclassifies it.
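A sketch of the incremental condensing loop (seeding Z with the first instance and using plain 1-NN are my own concrete choices): keep cycling through X, add any instance that the current Z misclassifies, and stop when a full pass adds nothing.

```python
import numpy as np

def one_nn_predict(x, Z_X, Z_y):
    """1-NN prediction using the condensed subset Z."""
    return Z_y[int(np.argmin(np.abs(Z_X - x)))]

def condensed_nn(X, y):
    """Greedy incremental condensing: grow Z only with instances that Z misclassifies."""
    Z_X, Z_y = [X[0]], [y[0]]            # seed Z with the first instance
    changed = True
    while changed:
        changed = False
        for xt, yt in zip(X, y):
            if one_nn_predict(xt, np.array(Z_X), np.array(Z_y)) != yt:
                Z_X.append(xt)
                Z_y.append(yt)
                changed = True
    return np.array(Z_X), np.array(Z_y)

X = np.concatenate([np.random.randn(50) - 2, np.random.randn(50) + 2])
y = np.array([0] * 50 + [1] * 50)
Z_X, Z_y = condensed_nn(X, y)
print(len(Z_X), "instances kept out of", len(X))
```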
Nonparametric Regression
Also known as smoothing models: take several of the closest points and weight/average their outputs.
(A parametric model would instead find polynomial coefficients and evaluate the input on the fitted function.)
Regressogram: take the average of the outputs in the same bin.
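The regressogram is ĝ(x) = Σ_t b(x, x^t) r^t / Σ_t b(x, x^t), where b(x, x^t) = 1 if x^t falls in the same bin as x and 0 otherwise; a small sketch (the synthetic data is only illustrative):

```python
import numpy as np

def regressogram(x, X, r, x0=0.0, h=0.5):
    """Average the outputs r^t of all training inputs x^t in the same bin as x."""
    same_bin = np.floor((X - x0) / h) == np.floor((x - x0) / h)
    if not same_bin.any():
        return np.nan                     # empty bin: no estimate
    return r[same_bin].mean()

X = np.random.uniform(0, 5, 200)
r = np.sin(X) + 0.1 * np.random.randn(200)
print(regressogram(2.0, X, r, h=0.5))
```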
Kernel Smoother
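In its usual (Nadaraya–Watson) form the kernel smoother is ĝ(x) = Σ_t K((x − x^t)/h) r^t / Σ_t K((x − x^t)/h), a distance-weighted average of all outputs; a minimal sketch:

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

def kernel_smoother(x, X, r, h=0.5):
    """Weighted average of all outputs; the weight of r^t decays smoothly with |x - x^t|."""
    w = gaussian_kernel((x - X) / h)
    return np.sum(w * r) / np.sum(w)

X = np.random.uniform(0, 5, 200)
r = np.sin(X) + 0.1 * np.random.randn(200)
print(kernel_smoother(2.0, X, r, h=0.3))
```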
Running Line Smoother
Fit a line locally; distances can also be taken into account with a kernel.
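One way to sketch this is to fit an ordinary least-squares line to the neighbours of x and evaluate it at x (using the k closest points here is an assumption; a fixed window of width h works the same way):

```python
import numpy as np

def running_line_smoother(x, X, r, k=20):
    """Fit a straight line by least squares to the k training points closest to x
    and evaluate that local line at x."""
    idx = np.argsort(np.abs(X - x))[:k]
    a, b = np.polyfit(X[idx], r[idx], deg=1)   # local slope and intercept
    return a * x + b

X = np.random.uniform(0, 5, 200)
r = np.sin(X) + 0.1 * np.random.randn(200)
print(running_line_smoother(2.0, X, r, k=25))
```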
How to Choose k or h?
When k or h is small, single instances matter: bias is small, variance is large (undersmoothing), i.e. high complexity.
As k or h increases, we average over more instances: variance decreases but bias increases (oversmoothing), i.e. low complexity.
Cross-validation is used to fine-tune k or h.
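As one way to carry out that tuning, a K-fold cross-validation over a grid of candidate bandwidths for a kernel smoother could look like this (the grid, fold count, and smoother choice are illustrative):

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

def kernel_smoother(x, X, r, h):
    w = gaussian_kernel((x - X) / h)
    return np.sum(w * r) / np.sum(w)

def cv_error(X, r, h, n_folds=5):
    """Mean squared K-fold cross-validation error of the kernel smoother for bandwidth h."""
    folds = np.array_split(np.random.permutation(len(X)), n_folds)
    errors = []
    for val in folds:
        train = np.setdiff1d(np.arange(len(X)), val)
        preds = np.array([kernel_smoother(x, X[train], r[train], h) for x in X[val]])
        errors.append(np.mean((preds - r[val]) ** 2))
    return np.mean(errors)

X = np.random.uniform(0, 5, 200)
r = np.sin(X) + 0.1 * np.random.randn(200)
hs = [0.05, 0.1, 0.2, 0.5, 1.0]                 # candidate bandwidths
best_h = min(hs, key=lambda h: cv_error(X, r, h))
print("selected bandwidth:", best_h)
```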
Classifications