CHAPTER 8: Nonparametric Methods
Alpaydın transparencies, significantly modified and extended by Ch. Eick. Last updated: March 4, 2011.
Non-Parametric Density Estimation
The goal is to obtain a density function from the data.
- Parametric: a single global model. Semiparametric: a small number of local models.
- Nonparametric: similar inputs have similar outputs. Keep the training data and "let the data speak for itself": given a query x, find a small number of the closest training instances and interpolate from these.
- Also known as lazy / memory-based / case-based / instance-based learning.
Histograms
A histogram usually shows the distribution of values of a single variable.
- Divide the values into bins and show a bar plot of the number of objects in each bin; the height of each bar indicates the number of objects in that bin.
- The shape of the histogram depends on the number of bins.
- Example: Petal Width with 10 and 20 bins, respectively.
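As a quick illustration of how the bin count changes a histogram's shape, here is a minimal numpy sketch; the sample data is made up for illustration (it is not the actual Petal Width data).

```python
import numpy as np

# Made-up stand-in for 150 petal-width measurements (not the real Iris data).
rng = np.random.default_rng(0)
petal_width = rng.uniform(0.1, 2.5, size=150)

# Same data, two different bin counts: the histogram's shape changes.
counts10, edges10 = np.histogram(petal_width, bins=10)
counts20, edges20 = np.histogram(petal_width, bins=20)
print(counts10)  # bar heights with 10 bins
print(counts20)  # bar heights with 20 bins
```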
Density Estimation
Given a training set X = {x^t}_t drawn i.i.d. from p(x), divide the data into bins of size h, starting from an origin x_0.
Histogram:
\hat{p}(x) = \frac{\#\{x^t \text{ in the same bin as } x\}}{Nh}
Naive estimator:
\hat{p}(x) = \frac{\#\{x - h < x^t \le x + h\}}{2Nh}
or, equivalently,
\hat{p}(x) = \frac{1}{Nh}\sum_t w\left(\frac{x - x^t}{h}\right), with w(u) = 1/2 if |u| < 1 and 0 otherwise.
[Figure: worked histogram example with origin x_0 = 0: \hat{p}(1) = 4/16, \hat{p}(1.25) = 1/8]
[Figure: worked naive-estimator example: \hat{p}(2) = 2/(2 \cdot 8) = 0.125]
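A minimal Python sketch of these two estimators, assuming 1-D training values in a numpy array; the sample values below are made up for illustration.

```python
import numpy as np

def histogram_estimate(x, data, h, origin=0.0):
    """Histogram estimate: p(x) = #{x^t in the same bin as x} / (N*h)."""
    data = np.asarray(data, dtype=float)
    same_bin = np.floor((data - origin) / h) == np.floor((x - origin) / h)
    return same_bin.sum() / (len(data) * h)

def naive_estimate(x, data, h):
    """Naive estimate: p(x) = #{x - h < x^t <= x + h} / (2*N*h)."""
    data = np.asarray(data, dtype=float)
    count = np.sum((data > x - h) & (data <= x + h))
    return count / (2 * len(data) * h)

sample = [0.2, 0.5, 1.1, 1.4, 1.6, 2.3, 2.9, 3.0]   # made-up data
print(histogram_estimate(1.0, sample, h=1.0))        # estimate from the bin containing x = 1
print(naive_estimate(2.0, sample, h=1.0))            # estimate from the bin centered at x = 2
```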
Gaussian Kernel Estimator
Kernel function, e.g., the Gaussian kernel:
K(u) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right)
Kernel estimator (Parzen windows):
\hat{p}(x) = \frac{1}{Nh}\sum_{t=1}^{N} K\left(\frac{x - x^t}{h}\right)
Gaussian influence functions in general:
influence(x^t, x) = \exp\left(-\frac{d(x, x^t)^2}{2h^2}\right)
This measures the influence of x^t on the query point x; h determines how quickly the influence decreases as the distance between x^t and x increases. h is called the "width" of the kernel.
Example: Kernel Density Estimation
D = {x_1, x_2, x_3, x_4}
f_D^{Gaussian}(x) = influence(x_1, x) + influence(x_2, x) + influence(x_3, x) + influence(x_4, x) = 0.78
[Figure: four data points x_1, ..., x_4 on a line, with two query points x and y]
Remark: the density value at y would be larger than the one at x.
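A short Python sketch of the Parzen-window estimator and the Gaussian influence function defined above; the function names, the 1-D setting, and the sample values D are illustrative assumptions, not the slide's actual data.

```python
import numpy as np

def gaussian_kernel_estimate(x, data, h):
    """Parzen-window estimate with a Gaussian kernel:
    p(x) = 1/(N*h) * sum_t K((x - x^t)/h), K(u) = exp(-u^2/2)/sqrt(2*pi)."""
    data = np.asarray(data, dtype=float)
    u = (x - data) / h
    return np.mean(np.exp(-0.5 * u**2)) / (h * np.sqrt(2 * np.pi))

def influence(xt, x, h):
    """Gaussian influence of training point x^t on query x (unnormalized)."""
    return np.exp(-((x - xt) ** 2) / (2 * h**2))

D = [1.0, 1.5, 4.0, 4.5]                              # made-up stand-in for {x1,...,x4}
f_D = sum(influence(xt, 2.0, h=1.0) for xt in D)      # sum of influences at x = 2
print(gaussian_kernel_estimate(2.0, D, h=1.0), f_D)
```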
[Figure: density functions estimated with different values of h]
k-Nearest Neighbor Estimator
Instead of fixing the bin width h and counting the number of instances, fix the number of instances (neighbors) k and let the bin width adapt: d_k(x) is the distance from x to its k-th closest training instance.
\hat{p}(x) = \frac{k}{2N\,d_k(x)}
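A minimal 1-D sketch of this estimator; the function and parameter names are assumptions.

```python
import numpy as np

def knn_density_estimate(x, data, k):
    """k-NN estimate: p(x) = k / (2 * N * d_k(x)), where d_k(x) is the
    distance from x to its k-th closest training instance."""
    data = np.asarray(data, dtype=float)
    d_k = np.sort(np.abs(data - x))[k - 1]    # distance to the k-th nearest neighbor
    return k / (2 * len(data) * d_k)
```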
Multivariate Data
Kernel density estimator:
\hat{p}(x) = \frac{1}{Nh^d}\sum_{t=1}^{N} K\left(\frac{x - x^t}{h}\right)
Multivariate Gaussian kernel:
- spheric: K(u) = \left(\frac{1}{\sqrt{2\pi}}\right)^{d} \exp\left(-\frac{\|u\|^2}{2}\right)
- ellipsoid: K(u) = \frac{1}{(2\pi)^{d/2}|S|^{1/2}} \exp\left(-\frac{1}{2} u^{T} S^{-1} u\right)
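A sketch of the multivariate estimator with the spheric Gaussian kernel (the ellipsoid variant would replace the squared norm with the Mahalanobis distance u^T S^{-1} u); the array shapes and names are illustrative assumptions.

```python
import numpy as np

def multivariate_kde(x, data, h):
    """Spheric Gaussian kernel estimate in d dimensions:
    p(x) = 1/(N*h^d) * sum_t (2*pi)^(-d/2) * exp(-||(x - x^t)/h||^2 / 2)."""
    data = np.asarray(data, dtype=float)   # training set, shape (N, d)
    x = np.asarray(x, dtype=float)         # query point, shape (d,)
    N, d = data.shape
    u = (x - data) / h                     # scaled differences, shape (N, d)
    K = (2 * np.pi) ** (-d / 2) * np.exp(-0.5 * np.sum(u**2, axis=1))
    return K.sum() / (N * h**d)
```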
Nonparametric Classification
Estimate p(x|C_i) and use Bayes' rule.
Kernel estimator:
\hat{p}(x|C_i) = \frac{1}{N_i h^d}\sum_t K\left(\frac{x - x^t}{h}\right) r_i^t, \quad \hat{P}(C_i) = \frac{N_i}{N}
g_i(x) = \hat{p}(x|C_i)\,\hat{P}(C_i) = \frac{1}{N h^d}\sum_t K\left(\frac{x - x^t}{h}\right) r_i^t
k-NN estimator:
\hat{p}(x|C_i) = \frac{k_i}{N_i V^k(x)}, \quad \hat{P}(C_i|x) = \frac{k_i}{k}
(Skip: the idea is to use a density function for each class.)
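A 1-D sketch of the kernel-based classifier: the score below sums the kernel values over the training points of class i, which is proportional to \hat{p}(x|C_i)\hat{P}(C_i); the names and the 1-D setting are assumptions.

```python
import numpy as np

def kernel_classify(x, data, labels, h):
    """Pick the class i maximizing g_i(x) = sum over {t: label = i} of K((x - x^t)/h)."""
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)
    K = np.exp(-0.5 * ((x - data) / h) ** 2)        # Gaussian kernel values (unnormalized)
    classes = np.unique(labels)
    scores = [K[labels == c].sum() for c in classes]
    return classes[int(np.argmax(scores))]
```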
Condensed Nearest Neighbor
The time/space complexity of k-NN is O(N).
Find a subset Z of X that is small and classifies X accurately (Hart, 1968). (skip)
Condensed Nearest Neighbor
Incremental algorithm: add an instance to Z only if it is needed, i.e., if the current Z misclassifies it. (skip)
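A sketch of one common variant of Hart's condensed 1-NN; the exact pass order and stopping rule vary between presentations, so treat this as an illustration under those assumptions rather than the slide's exact algorithm.

```python
import numpy as np

def condensed_nn(X, y):
    """Keep a subset Z; add an instance whenever the current Z misclassifies it;
    repeat passes over the data until nothing is added."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    Z = [0]                                             # start with the first instance
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            dists = np.linalg.norm(X[Z] - X[i], axis=1)  # 1-NN search within Z
            if y[Z][np.argmin(dists)] != y[i]:           # misclassified by Z
                Z.append(i)
                changed = True
    return X[Z], y[Z]
```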
Nonparametric Regression
Also known as smoothing models.
Regressogram. Idea: use the average of the output variable over a neighborhood of x:
\hat{g}(x) = \frac{\sum_t b(x, x^t)\, r^t}{\sum_t b(x, x^t)}, with b(x, x^t) = 1 if x^t is in the same bin as x and 0 otherwise.
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) 18 uses bins to define neighborhoods
Running Mean / Kernel Smoother
Running mean smoother:
\hat{g}(x) = \frac{\sum_t w\left(\frac{x - x^t}{h}\right) r^t}{\sum_t w\left(\frac{x - x^t}{h}\right)}, with w(u) = 1 if |u| < 1 and 0 otherwise.
Kernel smoother, where K(\cdot) is Gaussian:
\hat{g}(x) = \frac{\sum_t K\left(\frac{x - x^t}{h}\right) r^t}{\sum_t K\left(\frac{x - x^t}{h}\right)}
Idea: weight examples by their distance to x, so that closer examples get larger weight.
Running line smoother: fit a local regression line to the instances in the neighborhood.
Additive models (Hastie and Tibshirani, 1990).
Example: Computing ĝ(2)
[Table of training pairs (x^t, r^t) shown in the figure]
Running mean smoother for x = 2, h = 2.1: the three neighbors in the bin get equal weight, so ĝ(2) = (6 + 11 + 7)/3 = 8.
Kernel smoother: the weights are computed using influence(x, x^t), giving ĝ(2) = (6·0.2 + 11·0.2 + 7·0.1 + 3·0.05)/0.55 ≈ 7.7.
[Figure] Idea: the kernel smoother uses an influence function to determine each training point's weight.
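A sketch of the running mean and kernel smoothers from the preceding slides (1-D inputs, Gaussian weights); the exact weights in the worked example come from the slide's own influence table, not from this code.

```python
import numpy as np

def running_mean_smoother(x, data_x, data_r, h):
    """Equal-weight average of r^t over training points with |x - x^t| < h."""
    data_x, data_r = np.asarray(data_x, dtype=float), np.asarray(data_r, dtype=float)
    near = np.abs(x - data_x) < h
    return data_r[near].mean() if near.any() else np.nan

def kernel_smoother(x, data_x, data_r, h):
    """Weighted average g(x) = sum_t K((x - x^t)/h) * r^t / sum_t K((x - x^t)/h)."""
    data_x, data_r = np.asarray(data_x, dtype=float), np.asarray(data_r, dtype=float)
    K = np.exp(-0.5 * ((x - data_x) / h) ** 2)    # Gaussian weights
    return (K * data_r).sum() / K.sum()
```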
Smoothness / Smooth Functions
Smoothness depends on a function's discontinuities and on how quickly its first/second/third/... derivatives change. A smooth function is a continuous function whose derivatives change slowly.
Smooth functions are frequently preferred as models of systems and for decision making: small changes in the input result in small changes in the output.
Design: smooth surfaces are more appealing, e.g., in car or sofa design.
Choosing h/k
When k or h is small, single instances matter: bias is small, variance is large (undersmoothing); high complexity.
As k or h increases, we average over more instances, so variance decreases but bias increases (oversmoothing); low complexity.
h/k large: very few hills, a smooth estimate. h/k small: many hills, many changes/discontinuities in the first/second/third/... derivatives.
Cross-validation is used to fine-tune k or h, as sketched below.
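A sketch of choosing h for the kernel smoother by n-fold cross-validation; it reuses the kernel_smoother function from the earlier sketch, and the candidate grid and fold count are arbitrary assumptions.

```python
import numpy as np

def cv_error_for_h(data_x, data_r, h, n_folds=5, seed=0):
    """Mean squared validation error of the kernel smoother for a given width h."""
    data_x, data_r = np.asarray(data_x, dtype=float), np.asarray(data_r, dtype=float)
    idx = np.random.default_rng(seed).permutation(len(data_x))
    errors = []
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)              # indices outside the held-out fold
        for i in fold:
            pred = kernel_smoother(data_x[i], data_x[train], data_r[train], h)
            errors.append((pred - data_r[i]) ** 2)
    return float(np.mean(errors))

# Pick the width with the smallest cross-validated error from a candidate grid.
# best_h = min([0.25, 0.5, 1.0, 2.0], key=lambda h: cv_error_for_h(x, r, h))
```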