1
Machine learning, pattern recognition and statistical data modelling
Lecture 6. Kernel methods and additive models
Coryn Bailer-Jones
2
Topics
Think globally, act locally: kernel methods
Generalized Additive Models (GAMs) for regression (classification next week)
Confidence intervals
3
Kernel methods
In the first lecture we looked at kernel methods for density estimation, e.g. a Gaussian kernel of width 2h in d dimensions estimated from N data points:
$$\hat{p}(x) = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{(2\pi h^2)^{d/2}} \exp\left( -\frac{\|x - x_n\|^2}{2h^2} \right)$$
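A minimal R sketch of this estimator, assuming the data are stored as an N x d matrix X with one point per row (function and variable names are illustrative):

## Gaussian kernel density estimate:
## p(x) = (1/N) * sum_n (2*pi*h^2)^(-d/2) * exp(-||x - x_n||^2 / (2*h^2))
gauss_kde <- function(x, X, h) {
  d  <- ncol(X)
  sq <- rowSums((X - matrix(x, nrow(X), d, byrow = TRUE))^2)  # ||x - x_n||^2 for every data point
  mean((2 * pi * h^2)^(-d / 2) * exp(-sq / (2 * h^2)))
}

## Example: density of 200 points from a 2-d standard Gaussian, evaluated at the origin
X <- matrix(rnorm(400), ncol = 2)
gauss_kde(c(0, 0), X, h = 0.5)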
4
K-NN kernel density estimation
Overcome the fixed kernel size: vary the search volume V until it contains K neighbours.
$$\hat{p}(x) = \frac{K}{NV}$$
K = number of neighbours, N = total number of points, V = volume occupied by the K neighbours.
© Bishop (1995)
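A sketch of the k-NN estimate in R, assuming Euclidean distance so that V is the volume of the d-ball whose radius is the distance to the K-th nearest point:

## k-NN density estimate: p(x) = K / (N * V)
knn_density <- function(x, X, K) {
  d <- ncol(X)
  r <- sort(sqrt(rowSums((X - matrix(x, nrow(X), d, byrow = TRUE))^2)))[K]  # distance to the K-th neighbour
  V <- pi^(d / 2) * r^d / gamma(d / 2 + 1)   # volume of a d-ball of radius r
  K / (nrow(X) * V)
}

knn_density(c(0, 0), X, K = 10)   # X as in the previous sketch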
5
One-dimensional kernel smoothers: k-nn
k-nn smoother:
$$\hat{f}(x) = E(Y \mid X = x) = \mathrm{Ave}\left( y_i \mid x_i \in N_k(x) \right)$$
$N_k(x)$ is the set of the $k$ points nearest to $x$ in (e.g.) squared distance.
Drawback: the estimator is not smooth in $x$. ($k = 30$)
© Hastie, Tibshirani, Friedman (2001)
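A one-dimensional k-nn smoother is only a few lines of R (a sketch with illustrative names):

## k-nn smoother: average the y of the k training points nearest to x0
knn_smooth <- function(x0, x, y, k = 30) {
  idx <- order((x - x0)^2)[1:k]   # indices of the k nearest points in squared distance
  mean(y[idx])
}

## Example on noisy 1-d data; the fitted curve is piecewise constant, hence not smooth in x
x <- sort(runif(100))
y <- sin(2 * pi * x) + rnorm(100, sd = 0.3)
fhat_knn <- sapply(x, function(x0) knn_smooth(x0, x, y, k = 30))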
6
One-dimensional kernel smoothers: Epanechnikov
Instead give more distant points less weight, e.g. with the Nadaraya–Watson kernel-weighted average
$$\hat{f}(x_0) = \frac{\sum_{i=1}^{N} K_\lambda(x_0, x_i)\, y_i}{\sum_{i=1}^{N} K_\lambda(x_0, x_i)}$$
using the Epanechnikov kernel $K_\lambda(x_0, x_i) = D\left( \frac{|x_i - x_0|}{\lambda} \right)$, where
$$D(t) = \begin{cases} 1 - t^2 & \text{if } |t| \le 1 \\ 0 & \text{otherwise} \end{cases}$$
The kernel could be generalized to have a variable width: $K_\lambda(x_0, x_i) = D\left( \frac{|x_i - x_0|}{h_\lambda(x_0)} \right)$.
© Hastie, Tibshirani, Friedman (2001)
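A sketch of the Nadaraya–Watson estimate with the Epanechnikov kernel, reusing x and y from the k-nn sketch above:

## Epanechnikov kernel profile D(t)
epan <- function(t) ifelse(abs(t) <= 1, 1 - t^2, 0)

## Nadaraya-Watson kernel-weighted average at x0 with fixed width lambda
nw_smooth <- function(x0, x, y, lambda = 0.2) {
  w <- epan(abs(x - x0) / lambda)
  sum(w * y) / sum(w)   # assumes at least one training point lies within lambda of x0
}

fhat_nw <- sapply(x, function(x0) nw_smooth(x0, x, y, lambda = 0.2))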
7
Kernel comparison
Epanechnikov: $D(t) = (1 - t^2)$ if $|t| \le 1$, $0$ otherwise
Tri-cube: $D(t) = (1 - |t|^3)^3$ if $|t| \le 1$, $0$ otherwise
© Hastie, Tibshirani, Friedman (2001)
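The two profiles can be compared directly in R (reusing epan() from the sketch above):

## Tri-cube kernel profile
tricube <- function(t) ifelse(abs(t) <= 1, (1 - abs(t)^3)^3, 0)

t <- seq(-1.5, 1.5, length.out = 300)
plot(t, epan(t), type = "l", ylab = "D(t)")   # Epanechnikov: parabolic, kinked at |t| = 1
lines(t, tricube(t), lty = 2)                 # tri-cube: flatter top, smoother at the boundary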
8
k-nn and Epanechnikov kernels
$k = 30$, $\lambda = 0.2$
The Epanechnikov kernel has fixed width (bias approximately constant, variance not); k-nn has adaptive width (constant variance, bias varies as 1/density).
Free parameters: $k$ or $\lambda$.
© Hastie, Tibshirani, Friedman (2001)
9
Locally-weighted averages can be biased at boundaries
The kernel is asymmetric at the boundary.
© Hastie, Tibshirani, Friedman (2001)
10
Local linear regression
Solve a weighted linear least-squares problem in a local region to predict at a single point.
Green points: the effective kernel.
© Hastie, Tibshirani, Friedman (2001)
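A sketch of local linear regression in R, fitting a weighted least-squares line around each target point (reusing epan(), x and y from above):

## Local linear regression: weighted least squares around x0, evaluated at x0
loclin <- function(x0, x, y, lambda = 0.2) {
  w   <- epan(abs(x - x0) / lambda)        # kernel weights
  fit <- lm(y ~ x, weights = w)            # points with zero weight do not influence the fit
  predict(fit, newdata = data.frame(x = x0))
}

fhat_ll <- sapply(x, function(x0) loclin(x0, x, y, lambda = 0.2))

Replacing y ~ x with y ~ x + I(x^2) gives the local quadratic fit shown on the next slide.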
11
Local quadratic regression
© Hastie, Tibshirani, Friedman (2001)
12
Bias-variance trade-off
Higher-order local fits reduce bias at the cost of increased variance, especially at the boundary (see the previous slide).
© Hastie, Tibshirani, Friedman (2001)
13
Kernels in higher dimensions
Kernel smoothing and local regression generalize to higher dimensions...
...but the curse of dimensionality is not overcome: we cannot simultaneously retain localness (= low bias) and a sufficient sample size (= low variance) without increasing the total sample exponentially with dimension.
In general we need to make assumptions about the underlying data / true function and use structured regression/classification.
14
Generalized Additive Model
Could model a p-dimensional set of data using
$$f(X_1, X_2, \ldots, X_p) = \alpha + f_1(X_1) + f_2(X_2) + \ldots + f_p(X_p) + \varepsilon$$
The idea is to fit each 1D function separately and then provide an algorithm to iteratively combine them. Do this by minimizing the penalized residual sum of squares
$$\mathrm{PRSS} = \sum_{i=1}^{N} \left( y_i - \alpha - \sum_{j=1}^{p} f_j(x_{ij}) \right)^2 + \sum_{j=1}^{p} \lambda_j \int f_j''(t_j)^2 \, dt_j$$
Could use a variety of smoothers for each $f_j(\cdot)$ and the corresponding penalty; here we use cubic splines. To make the solution unique we must fix $\alpha$, e.g.
$$\alpha = \frac{1}{N} \sum_{i=1}^{N} y_i \quad \Rightarrow \quad \sum_{i=1}^{N} f_j(x_{ij}) = 0 \;\; \forall j$$
Avoiding the curse: split the p-dimensional problem into p one-dimensional ones.
15
Backfitting algorithm for additive models
In principle this step is not required.
$f_j$ is a smoothing-spline fit, as a function of $x_{ij}$, to the residuals, i.e. to what should be explained by $f_j$.
© Hastie, Tibshirani, Friedman (2001)
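A minimal backfitting sketch in R using smoothing splines as the per-coordinate smoother; X, y and the function names are illustrative, and no convergence check is done:

## Backfitting for an additive model y ~ alpha + f_1(x_1) + ... + f_p(x_p)
backfit <- function(X, y, n_iter = 20) {
  n <- nrow(X); p <- ncol(X)
  alpha <- mean(y)                      # fix the intercept to make the solution unique
  f <- matrix(0, n, p)                  # f[, j] holds f_j evaluated at the x_ij
  for (it in seq_len(n_iter)) {
    for (j in seq_len(p)) {
      r  <- y - alpha - rowSums(f[, -j, drop = FALSE])  # partial residuals: what f_j should explain
      sj <- smooth.spline(X[, j], r)                    # smooth the residuals against x_j
      fj <- predict(sj, X[, j])$y
      f[, j] <- fj - mean(fj)           # re-centre so that each fitted f_j averages to zero
    }
  }
  list(alpha = alpha, f = f)
}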
16
Generalized additive models on the rock data
Application of the gam{gam} package on the rock{MASS} data set. See the R scripts on the lecture web site
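The fit itself is short. A sketch, assuming the gam package is installed (in current R the rock data ship with the base datasets package) and modelling log permeability:

library(gam)
data(rock)   # area, peri, shape, perm
fit <- gam(log(perm) ~ s(area) + s(peri) + s(shape), data = rock)
summary(fit)
plot(fit, se = TRUE)   # one panel per smooth term, with pointwise standard-error bands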
17
Confidence intervals with splines
The spline function estimate is
$$\hat{f}(x) = B \hat{\theta} = B \left( B^T B + \lambda \Omega_B \right)^{-1} B^T \mathbf{y} = S_\lambda \mathbf{y}$$
The smoother matrix $S_\lambda$ depends only on the $x_i$ and $\lambda$, but not on $\mathbf{y}$.
$$V = \mathrm{Var}\left[ \hat{f}(x) \right] = S_\lambda S_\lambda^T \sigma^2$$
$\mathrm{diag}(V)$ gives the pointwise error estimates on either the training data or new data.
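Because the fit is a linear smoother, S_lambda can be recovered column by column by smoothing unit vectors at a fixed smoothing parameter, which gives the pointwise errors directly. An illustrative sketch with smooth.spline, assuming i.i.d. noise of variance sigma^2 (this is not the lecture's own script):

set.seed(1)
x <- sort(runif(100))
y <- sin(2 * pi * x) + rnorm(100, sd = 0.3)

fit <- smooth.spline(x, y)               # smoothing parameter chosen by (generalized) cross-validation
n   <- length(x)
S   <- sapply(seq_len(n), function(j) {  # column j of S_lambda = smoother applied to unit vector e_j
  e <- numeric(n); e[j] <- 1
  predict(smooth.spline(x, e, spar = fit$spar), x)$y
})

sigma2 <- sum((y - predict(fit, x)$y)^2) / (n - fit$df)   # residual variance estimate
se     <- sqrt(diag(S %*% t(S)) * sigma2)                 # pointwise standard errors of f-hat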
18
R packages for Generalized Additive Models
gam{gam}: the same as the package implemented in S-PLUS
gam{mgcv}: a variant on the above
bruto{mda}: automatically selects between a smooth fit (cubic spline), a linear fit, and omitting the variable altogether
19
Summary
Kernel methods:
improvements over nearest neighbours to reduce (or control) bias
local linear and quadratic regression
Generalized Additive Models:
defeat (cheat?) the curse of dimensionality by dividing the problem into p one-dimensional fitting problems
typically use kernel or spline smoothers
iterative backfitting algorithm
MARS (multivariate adaptive regression splines):
piecewise linear basis functions
if pairwise interactions between dimensions are prevented, it reduces to an additive model