Kernel Methods
Arie Nakhmani
Outline
- Kernel Smoothers
- Kernel Density Estimators
- Kernel Density Classifiers
Kernel Smoothers – The Goal
- Estimate a function from noisy observations when no parametric model for the function is known
- The resulting function should be smooth
- The level of "smoothness" should be set by a single parameter
Example
N = 100 sample points. What does "smooth enough" mean?
Example
N = 100 sample points.
Exponential Smoother
The smoother updates recursively: $\hat y_t = \alpha\, y_t + (1 - \alpha)\, \hat y_{t-1}$. A smaller $\alpha$ gives a smoother line, but with more delay.
Exponential Smoother
- Simple
- Sequential
- Single parameter
- Single-value memory
- Too rough
- Delayed
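As a concrete illustration, here is a minimal Python sketch of the exponential smoother described above, assuming the standard recursion $\hat y_t = \alpha\, y_t + (1 - \alpha)\, \hat y_{t-1}$; the function name, the data, and the choice $\alpha = 0.1$ are illustrative, not from the slides.

```python
import numpy as np

def exponential_smoother(y, alpha=0.1):
    """Sequential exponential smoothing: y_hat[t] = alpha*y[t] + (1-alpha)*y_hat[t-1]."""
    y = np.asarray(y, dtype=float)
    y_hat = np.empty_like(y)
    y_hat[0] = y[0]                          # initialize with the first observation
    for t in range(1, len(y)):
        y_hat[t] = alpha * y[t] + (1 - alpha) * y_hat[t - 1]
    return y_hat

# Illustrative data (not the slides' example): noisy samples of a smooth function
x = np.linspace(0, 1, 100)
y = np.sin(4 * x) + 0.3 * np.random.randn(100)
smoothed = exponential_smoother(y, alpha=0.1)
```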
Moving Average Smoother
m = 11. A larger m gives a smoother, but more straightened (flattened), line.
Moving Average Smoother
- Sequential
- Single parameter: the window size m
- Memory for m values
- Irregularly smooth
- What if we have a p-dimensional problem with p > 1?
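A minimal sketch of the moving-average smoother with window size m; the centered window and the boundary truncation are assumptions, since the slides do not specify how the edges are handled.

```python
import numpy as np

def moving_average_smoother(y, m=11):
    """Centered moving average with window size m, truncated at the boundaries."""
    y = np.asarray(y, dtype=float)
    half = m // 2
    y_hat = np.empty_like(y)
    for i in range(len(y)):
        lo, hi = max(0, i - half), min(len(y), i + half + 1)
        y_hat[i] = y[lo:hi].mean()           # average over the (possibly truncated) window
    return y_hat
```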
Nearest Neighbors Smoother
m = 16. A larger m gives a smoother, but more biased, line.
Nearest Neighbors Smoother
- Not sequential
- Single parameter: the number of neighbors m
- Trivially extended to any number of dimensions
- Memory for m values
- Depends on the metric definition
- Not smooth enough
- Biased end-points
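A sketch of the m-nearest-neighbor smoother: the estimate at a query point is the average of the responses of its m closest training points. The Euclidean metric and all names are assumptions.

```python
import numpy as np

def knn_smoother(x_train, y_train, x_query, m=16):
    """Average the responses of the m nearest training points for each query point."""
    y_train = np.asarray(y_train, dtype=float)
    x_train = np.asarray(x_train, dtype=float).reshape(len(y_train), -1)
    x_query = np.asarray(x_query, dtype=float).reshape(-1, x_train.shape[1])
    y_hat = np.empty(len(x_query))
    for i, x0 in enumerate(x_query):
        d = np.linalg.norm(x_train - x0, axis=1)   # Euclidean metric (assumption)
        y_hat[i] = y_train[np.argsort(d)[:m]].mean()
    return y_hat
```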
Low Pass Filter
2nd-order Butterworth filter:
$H(s) = \frac{\omega_c^2}{s^2 + \sqrt{2}\,\omega_c s + \omega_c^2}$
Why do we need kernel smoothers?
Low Pass Filter
The same filter, applied to a log function.
Low Pass Filter
- Smooth
- Simply extended to any number of dimensions
- Effectively three parameters: type, order, and bandwidth
- Biased end-points
- Inappropriate for some functions (depending on the bandwidth)
Kernel Average Smoother
Nadaraya-Watson kernel-weighted average:
$\hat f(x_0) = \frac{\sum_{i=1}^N K_\lambda(x_0, x_i)\, y_i}{\sum_{i=1}^N K_\lambda(x_0, x_i)}$
with the kernel
$K_\lambda(x_0, x) = D\!\left(\frac{|x - x_0|}{h_\lambda(x_0)}\right)$
where $h_\lambda(x_0) = |x_0 - x_{[m]}|$ (the distance to the m-th nearest neighbor) for the nearest-neighbor smoother, and $h_\lambda(x_0) = \lambda$ for the locally weighted average.
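A sketch of the Nadaraya-Watson estimator above for the metric-window case $h_\lambda(x_0) = \lambda$; the kernel profile D is passed in (see the kernel definitions on the next slide), and the function name is illustrative.

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x0, D, lam):
    """Kernel-weighted average at x0 with metric window width lam: h_lambda(x0) = lam."""
    t = np.abs(np.asarray(x_train, dtype=float) - x0) / lam
    w = D(t)                              # weights K_lambda(x0, x_i) = D(|x_i - x0| / lam)
    if w.sum() == 0.0:                    # no points in the support of a compact kernel
        return np.nan
    return np.dot(w, y_train) / w.sum()
```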
Popular Kernels
- Epanechnikov kernel: $D(t) = \frac{3}{4}(1 - t^2)$ for $|t| \le 1$, and $0$ otherwise
- Tri-cube kernel: $D(t) = (1 - |t|^3)^3$ for $|t| \le 1$, and $0$ otherwise
- Gaussian kernel: $D(t) = \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}$
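The three kernel profiles above written as Python functions of the scaled distance t; they can be plugged into the Nadaraya-Watson sketch given earlier. This is a sketch, not code from the presentation.

```python
import numpy as np

def epanechnikov(t):
    """D(t) = 3/4 (1 - t^2) for |t| <= 1, else 0."""
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

def tricube(t):
    """D(t) = (1 - |t|^3)^3 for |t| <= 1, else 0."""
    return np.where(np.abs(t) <= 1, (1 - np.abs(t)**3)**3, 0.0)

def gaussian(t):
    """D(t) = standard normal density."""
    return np.exp(-0.5 * t**2) / np.sqrt(2 * np.pi)

# e.g. f_hat = nadaraya_watson(x, y, x0=0.5, D=epanechnikov, lam=0.2)  # using the earlier sketch
```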
Non-Symmetric Kernel
Kernel example: which kernel is that?
Kernel Average Smoother
- Single parameter: the window width
- Smooth
- Trivially extended to any number of dimensions
- Memory-based method: little or no training is required
- Depends on the metric definition
- Biased end-points
Local Linear Regression
The kernel-weighted average solves
$\min_{\theta} \sum_{i=1}^N K_\lambda(x_0, x_i)\,[y_i - \theta]^2$
Local linear regression instead minimizes
$\min_{\alpha(x_0),\,\beta(x_0)} \sum_{i=1}^N K_\lambda(x_0, x_i)\,[y_i - \alpha(x_0) - \beta(x_0)\, x_i]^2$
and the estimate is $\hat f(x_0) = \hat\alpha(x_0) + \hat\beta(x_0)\, x_0$.
Local Linear Regression
Solution:
$\hat f(x_0) = b(x_0)^T \big(B^T W(x_0) B\big)^{-1} B^T W(x_0)\, y$
where $b(x)^T = (1, x)$, $B$ is the $N \times 2$ matrix with $i$-th row $b(x_i)^T$, and $W(x_0) = \mathrm{diag}\big(K_\lambda(x_0, x_i)\big)$.
Other representation:
$\hat f(x_0) = \sum_{i=1}^N l_i(x_0)\, y_i$
where the weights $l_i(x_0)$ define the equivalent kernel.
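A sketch of the local linear fit defined above, solving the weighted least-squares problem afresh at each query point; the tri-cube weights and the helper name are assumptions.

```python
import numpy as np

def local_linear(x_train, y_train, x0, lam):
    """Local linear regression estimate f_hat(x0) with metric window width lam."""
    x_train = np.asarray(x_train, dtype=float)
    t = np.abs(x_train - x0) / lam
    sw = np.sqrt(np.where(t <= 1, (1 - t**3)**3, 0.0))      # sqrt of tri-cube weights (assumption)
    B = np.column_stack([np.ones_like(x_train), x_train])    # rows b(x_i)^T = (1, x_i)
    theta, *_ = np.linalg.lstsq(B * sw[:, None], np.asarray(y_train) * sw, rcond=None)
    return theta[0] + theta[1] * x0                           # alpha_hat + beta_hat * x0
```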
Local Linear Regression
Equivalent Kernels
Local Polynomial Regression
Why stop at local linear fits? Let's minimize
$\min_{\alpha(x_0),\,\beta_j(x_0)} \sum_{i=1}^N K_\lambda(x_0, x_i)\,\Big[y_i - \alpha(x_0) - \sum_{j=1}^d \beta_j(x_0)\, x_i^j\Big]^2$
with the fit $\hat f(x_0) = \hat\alpha(x_0) + \sum_{j=1}^d \hat\beta_j(x_0)\, x_0^j$.
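A degree-d generalization of the previous sketch, under the same assumptions (tri-cube weights, illustrative names); d = 1 recovers the local linear fit and d = 2 the local quadratic fit.

```python
import numpy as np

def local_polynomial(x_train, y_train, x0, lam, d=2):
    """Local degree-d polynomial fit evaluated at x0 (d=1: linear, d=2: quadratic)."""
    x_train = np.asarray(x_train, dtype=float)
    t = np.abs(x_train - x0) / lam
    sw = np.sqrt(np.where(t <= 1, (1 - t**3)**3, 0.0))       # tri-cube weights (assumption)
    B = np.vander(x_train, N=d + 1, increasing=True)          # columns 1, x, ..., x^d
    theta, *_ = np.linalg.lstsq(B * sw[:, None], np.asarray(y_train) * sw, rcond=None)
    return np.polyval(theta[::-1], x0)                         # evaluate the local fit at x0
```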
Local Polynomial Regression
Variance Compromise
Conclusions
- Local linear fits can reduce bias dramatically at the boundaries, at a modest cost in variance.
- Local linear fits are more reliable for extrapolation.
- Local quadratic fits do little for bias at the boundaries, but increase the variance a lot.
- Local quadratic fits tend to be most helpful in reducing bias due to curvature in the interior of the domain.
- λ controls the tradeoff between bias and variance: a larger λ gives lower variance but higher bias.
Local Regression in $\mathbb{R}^p$
Radial kernel:
$K_\lambda(x_0, x) = D\!\left(\frac{\lVert x - x_0 \rVert}{\lambda}\right)$
Popular Kernels
- Epanechnikov kernel
- Tri-cube kernel
- Gaussian kernel
Example
Higher Dimensions
- Boundary estimation is problematic
- Many sample points are needed to reduce the bias
- Local regression is less useful for p > 3
- It is impossible to maintain localness (low bias) and sizeable samples (low variance) at the same time
Structured Kernels
Non-radial kernel:
$K_{\lambda,A}(x_0, x) = D\!\left(\frac{\sqrt{(x - x_0)^T A\,(x - x_0)}}{\lambda}\right)$
- Coordinates or directions can be downgraded or omitted by imposing restrictions on A.
- The covariance can be used to adapt the metric A (related to the Mahalanobis distance).
- Projection-pursuit model
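A sketch of the non-radial (structured) kernel above, with the matrix A supplied by the caller, for example the inverse sample covariance for a Mahalanobis-type metric; the Gaussian profile and all names are assumptions.

```python
import numpy as np

def structured_kernel(x0, x, A, lam, D=lambda t: np.exp(-0.5 * t**2)):
    """Non-radial kernel D(sqrt((x - x0)^T A (x - x0)) / lam); Gaussian profile by default."""
    diff = np.asarray(x, dtype=float) - np.asarray(x0, dtype=float)
    t = np.sqrt(diff @ A @ diff) / lam
    return D(t)

# Example of a Mahalanobis-type choice (assumption): A = inverse sample covariance
# A = np.linalg.inv(np.cov(X, rowvar=False))
```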
Structured Regression
Divide the predictors into a set $(X_1, X_2, \ldots, X_q)$ with $q < p$, and collect the remaining variables in the vector $Z$.
Conditionally linear model:
$f(X) = \alpha(Z) + \beta_1(Z)\, X_1 + \cdots + \beta_q(Z)\, X_q$
For a given $Z = z_0$, fit the model by locally weighted least squares:
$\min_{\alpha(z_0),\,\beta(z_0)} \sum_{i=1}^N K_\lambda(z_0, z_i)\,\big(y_i - \alpha(z_0) - \beta_1(z_0)\, x_{1i} - \cdots - \beta_q(z_0)\, x_{qi}\big)^2$
Density Estimation
Figure: a mixture of two normal distributions, showing the original distribution, a sample set drawn from it, and a constant-window estimate.
Kernel Density Estimation
Smooth Parzen estimate:
$\hat f_X(x_0) = \frac{1}{N\lambda} \sum_{i=1}^N K_\lambda(x_0, x_i)$
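A sketch of the Parzen estimate above with a Gaussian kernel; the kernel choice and the names are assumptions.

```python
import numpy as np

def parzen_density(x_train, x0, lam):
    """Parzen estimate f_hat(x0) = (1/(N*lam)) * sum_i D((x0 - x_i) / lam)."""
    t = (x0 - np.asarray(x_train, dtype=float)) / lam
    k = np.exp(-0.5 * t**2) / np.sqrt(2 * np.pi)   # Gaussian kernel profile D (assumption)
    return k.sum() / (len(x_train) * lam)
```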
Comparison
Mixture of two normal distributions. Usually, bandwidth selection is more important than the choice of kernel function.
Kernel Density Estimation
Gaussian kernel density estimate:
$\hat f_X(x) = \frac{1}{N} \sum_{i=1}^N \phi_\lambda(x - x_i)$
where $\phi_\lambda$ denotes the Gaussian density with mean zero and standard deviation $\lambda$.
Generalization to $\mathbb{R}^p$:
$\hat f_X(x_0) = \frac{1}{N\,(2\lambda^2\pi)^{p/2}} \sum_{i=1}^N e^{-\frac{1}{2}\left(\lVert x_i - x_0\rVert / \lambda\right)^2}$
This estimate acts as a low-pass filter (LPF) applied to the empirical distribution.
Kernel Density Classification
For a J-class problem, fit a density estimate $\hat f_j(X)$ separately in each class and estimate the class priors $\hat\pi_j$ (usually the sample proportions); then
$\hat P(G = j \mid X = x_0) = \frac{\hat\pi_j\, \hat f_j(x_0)}{\sum_{k=1}^J \hat\pi_k\, \hat f_k(x_0)}$
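A sketch of the classification rule above, using a Gaussian Parzen estimate per class and the sample class proportions as priors $\hat\pi_j$; the names and data layout are assumptions.

```python
import numpy as np

def gaussian_parzen(x_train, x0, lam):
    """Gaussian kernel density estimate f_hat_j(x0) for one class."""
    t = (x0 - np.asarray(x_train, dtype=float)) / lam
    return np.mean(np.exp(-0.5 * t**2) / (np.sqrt(2 * np.pi) * lam))

def kde_classify(x0, class_samples, lam):
    """class_samples: dict mapping class label j -> 1-D array of training points of class j."""
    n_total = sum(len(s) for s in class_samples.values())
    scores = {j: (len(s) / n_total) * gaussian_parzen(s, x0, lam)   # pi_hat_j * f_hat_j(x0)
              for j, s in class_samples.items()}
    z = sum(scores.values())
    return {j: v / z for j, v in scores.items()}                    # P_hat(G = j | X = x0)
```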
Radial Basis Functions
The function f(x) is represented as an expansion in basis functions:
$f(x) = \sum_{j=1}^M \beta_j\, h_j(x)$
Radial basis function (RBF) expansion:
$f(x) = \sum_{j=1}^M K_{\lambda_j}(\xi_j, x)\, \beta_j = \sum_{j=1}^M D\!\left(\frac{\lVert x - \xi_j\rVert}{\lambda_j}\right) \beta_j$
For the Gaussian kernel, the sum of squares
$\sum_{i=1}^N \Big(y_i - \sum_{j=1}^M \beta_j\, e^{-\lVert x_i - \xi_j\rVert^2 / (2\lambda_j^2)}\Big)^2$
is minimized with respect to all the parameters $\{\beta_j, \xi_j, \lambda_j\}$.
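A sketch of fitting the RBF expansion above in the simplified case where the centers $\xi_j$ and a common width $\lambda$ are fixed in advance, so only the coefficients $\beta_j$ are found by least squares; in the general case described on the slide, all parameters are optimized jointly.

```python
import numpy as np

def fit_rbf(x_train, y_train, centers, lam):
    """Least-squares fit of beta_j in f(x) = sum_j beta_j * exp(-(x - xi_j)^2 / (2 lam^2))."""
    x_train = np.asarray(x_train, dtype=float)
    centers = np.asarray(centers, dtype=float)
    H = np.exp(-0.5 * ((x_train[:, None] - centers[None, :]) / lam) ** 2)  # N x M basis matrix
    beta, *_ = np.linalg.lstsq(H, np.asarray(y_train, dtype=float), rcond=None)
    return beta

def rbf_predict(x, centers, lam, beta):
    H = np.exp(-0.5 * ((np.atleast_1d(x)[:, None] - np.asarray(centers)[None, :]) / lam) ** 2)
    return H @ beta
```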
Radial Basis Functions
Assuming a constant width $\lambda_j = \lambda$ can create the problem of "holes": regions where none of the kernels has appreciable support.
The solution is the renormalized RBF basis:
$h_j(x) = \frac{D\big(\lVert x - \xi_j\rVert / \lambda\big)}{\sum_{k=1}^M D\big(\lVert x - \xi_k\rVert / \lambda\big)}$
Additional Applications
- Local likelihood
- Mixture models for density estimation and classification
- Mean-shift
Conclusions
- Memory-based methods: the model is the entire training data set
- Infeasible for many real-time applications
- Provide good smoothing results for arbitrarily sampled functions
- Appropriate for interpolation and extrapolation
- When a parametric model is known, it is better to use other fitting methods