Kernel Methods
Dept. of Computer Science & Engineering, Shanghai Jiao Tong University
Outline
– One-Dimensional Kernel Smoothers
– Local Regression
– Local Likelihood
– Kernel Density Estimation
– Naive Bayes
– Radial Basis Functions
– Mixture Models and EM
One-Dimensional Kernel Smoothers
k-NN average: $\hat f(x) = \mathrm{Ave}(y_i \mid x_i \in N_k(x))$. The 30-NN curve is bumpy, since the neighborhood $N_k(x)$ is discontinuous in $x$: as $x$ moves, observations enter and leave the neighborhood one at a time, so the average changes in a discrete way, leading to a discontinuous $\hat f(x)$.
One-Dimensional Kernel Smoothers
Nadaraya-Watson kernel-weighted average:
$\hat f(x_0) = \frac{\sum_{i=1}^N K_\lambda(x_0, x_i)\, y_i}{\sum_{i=1}^N K_\lambda(x_0, x_i)}$
Epanechnikov quadratic kernel:
$K_\lambda(x_0, x) = D\!\left(\frac{|x - x_0|}{\lambda}\right)$, with $D(t) = \frac{3}{4}(1 - t^2)$ if $|t| \le 1$, and 0 otherwise.
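The Nadaraya-Watson average with the Epanechnikov kernel can be sketched in a few lines of NumPy; the function names (`epanechnikov`, `nadaraya_watson`) and the toy sine data are illustrative, not from the text:

```python
import numpy as np

def epanechnikov(t):
    """Epanechnikov quadratic kernel D(t) = 3/4 (1 - t^2) for |t| <= 1, else 0."""
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

def nadaraya_watson(x0, x, y, lam):
    """Kernel-weighted average of y at the target point x0 with bandwidth lam."""
    w = epanechnikov((x - x0) / lam)
    s = w.sum()
    return np.dot(w, y) / s if s > 0 else np.nan

# toy data: a noisy sine curve
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 100))
y = np.sin(4 * x) + rng.normal(0, 0.3, 100)
fhat = np.array([nadaraya_watson(x0, x, y, lam=0.2) for x0 in x])
```

Unlike the k-NN running mean, the weights here decay smoothly to zero with distance, so `fhat` is continuous in the target point.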
One-Dimensional Kernel Smoothers
More general kernel: $K_\lambda(x_0, x) = D\!\left(\frac{|x - x_0|}{h_\lambda(x_0)}\right)$
– $h_\lambda(x_0)$: a width function that determines the width of the neighborhood at $x_0$.
– For the quadratic kernel, $h_\lambda(x_0) = \lambda$ (constant).
– For the k-NN kernel, $h_k(x_0) = |x_0 - x_{[k]}|$, the distance to the k-th closest point; this adaptive width keeps the variance approximately constant.
– The Epanechnikov kernel has compact support.
One-Dimensional Kernel Smoothers
Three popular kernels for local smoothing:
– The Epanechnikov and tri-cube kernels have compact support, but the tri-cube kernel, $D(t) = (1 - |t|^3)^3$ for $|t| \le 1$, has two continuous derivatives at the boundary of its support.
– The Gaussian kernel has infinite support.
Local Linear Regression
Boundary issue:
– The kernel average is badly biased on the boundaries because of the asymmetry of the kernel in that region.
– Fitting a local linear model removes this bias exactly to first order.
Local Linear Regression
Locally weighted linear regression makes a first-order correction. Solve a separate weighted least squares problem at each target point $x_0$:
$\min_{\alpha(x_0), \beta(x_0)} \sum_{i=1}^N K_\lambda(x_0, x_i)\,[y_i - \alpha(x_0) - \beta(x_0)\, x_i]^2$
The estimate:
$\hat f(x_0) = b(x_0)^T (\mathbf{B}^T \mathbf{W}(x_0) \mathbf{B})^{-1} \mathbf{B}^T \mathbf{W}(x_0) \mathbf{y} = \sum_{i=1}^N l_i(x_0)\, y_i$
where $b(x)^T = (1, x)$; $\mathbf{B}$ is the $N \times 2$ regression matrix with i-th row $b(x_i)^T$; and $\mathbf{W}(x_0)$ is the $N \times N$ diagonal matrix with i-th diagonal element $K_\lambda(x_0, x_i)$.
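The per-target weighted least squares fit above can be sketched directly with NumPy; a minimal version, assuming tri-cube weights (the helper names are ours):

```python
import numpy as np

def tricube(t):
    """Tri-cube kernel: (1 - |t|^3)^3 on |t| <= 1, else 0."""
    return np.where(np.abs(t) <= 1, (1 - np.abs(t)**3)**3, 0.0)

def local_linear(x0, x, y, lam):
    """Weighted least squares of y on (1, x) at target x0; returns the fit at x0."""
    w = tricube((x - x0) / lam)                  # K_lambda(x0, x_i)
    B = np.column_stack([np.ones_like(x), x])    # N x 2 regression matrix
    W = np.diag(w)                               # W(x0)
    # beta = (B^T W B)^{-1} B^T W y; fitted value is b(x0)^T beta
    beta = np.linalg.solve(B.T @ W @ B, B.T @ W @ y)
    return np.array([1.0, x0]) @ beta
```

On data that are exactly linear, the weighted fit recovers the line regardless of the kernel, which is exactly the first-order bias correction described above.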
Local Linear Regression
The weights $l_i(x_0)$ combine the weighting kernel $K_\lambda(x_0, \cdot)$ and the least squares operations; they are referred to as the equivalent kernel.
Local Linear Regression
The expansion for $\mathrm{E}\hat f(x_0)$, using the linearity of local regression and a series expansion of the true function $f$ around $x_0$:
$\mathrm{E}\hat f(x_0) = \sum_{i=1}^N l_i(x_0) f(x_i) = f(x_0) \sum_{i=1}^N l_i(x_0) + f'(x_0) \sum_{i=1}^N (x_i - x_0)\, l_i(x_0) + \frac{f''(x_0)}{2} \sum_{i=1}^N (x_i - x_0)^2\, l_i(x_0) + R$
For local linear regression, $\sum_{i=1}^N l_i(x_0) = 1$ and $\sum_{i=1}^N (x_i - x_0)\, l_i(x_0) = 0$, so the bias $\mathrm{E}\hat f(x_0) - f(x_0)$ depends only on quadratic and higher-order terms in the expansion of $f$.
Local Polynomial Regression
Fit local polynomials of any degree d:
$\min_{\alpha(x_0),\, \beta_j(x_0),\, j=1,\dots,d} \sum_{i=1}^N K_\lambda(x_0, x_i)\,\Big[y_i - \alpha(x_0) - \sum_{j=1}^d \beta_j(x_0)\, x_i^j\Big]^2$
Local Polynomial Regression
The bias now only has components of degree d+1 and higher. This reduction in bias comes at the cost of increased variance.
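The degree-d fit is the same weighted least squares computation with a larger basis; a sketch using centered coordinates $u = x - x_0$, so the fitted value at $x_0$ is simply the intercept (the function name and defaults are ours):

```python
import numpy as np

def local_poly(x0, x, y, lam, d=2):
    """Local degree-d polynomial fit at x0 (d=1 recovers local linear regression)."""
    t = np.abs((x - x0) / lam)
    w = np.where(t <= 1, (1 - t**3)**3, 0.0)       # tri-cube weights
    B = np.vander(x - x0, d + 1, increasing=True)  # columns 1, (x-x0), ..., (x-x0)^d
    WB = w[:, None] * B
    beta = np.linalg.solve(B.T @ WB, WB.T @ y)
    return beta[0]  # intercept = fitted value at x0 in centered coordinates
```

With d = 2 the estimator reproduces quadratics exactly, which is the sense in which the bias starts at degree d+1.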
Selecting the Width of the Kernel
In the kernel $K_\lambda$, $\lambda$ is a parameter that controls the kernel width:
– For a kernel with compact support, $\lambda$ is the radius of the support region.
– For the Gaussian kernel, $\lambda$ is the standard deviation.
– For the k-nearest neighborhood, $\lambda$ is the fraction k/N.
The window width induces a bias-variance tradeoff:
– A narrow window gives a large variance and a small bias.
– A wide window gives a small variance and a large bias.
Structured Local Regression
Structured kernels:
– Introduce structure by imposing appropriate restrictions on the weighting matrix A.
Structured regression functions:
– Introduce structure by eliminating some of the higher-order terms.
Local Likelihood & Other Models
Any parametric model can be made local:
– Parameter associated with $x_i$: $\theta(x_i) = x_i^T \beta$.
– Log-likelihood: $l(\beta) = \sum_{i=1}^N l(y_i, x_i^T \beta)$.
– Model likelihood local to $x_0$: $l(\beta(x_0)) = \sum_{i=1}^N K_\lambda(x_0, x_i)\, l(y_i, x_i^T \beta(x_0))$.
– This yields a varying coefficient model $\theta(x) = x^T \beta(x_0)$.
Local Likelihood & Other Models
Logistic Regression:
– Local log-likelihood for the J-class model: $\sum_{i=1}^N K_\lambda(x_0, x_i) \log \Pr(G = g_i \mid X = x_i)$, with linear logits for each class.
– Center the local regressions at $x_0$, i.e. use logits $\beta_{j0}(x_0) + \beta_j(x_0)^T (x - x_0)$.
Kernel Density Estimation
A natural local estimate:
$\hat f_X(x_0) = \frac{\#\{x_i \in \mathcal{N}(x_0)\}}{N\lambda}$, where $\mathcal{N}(x_0)$ is a small metric neighborhood of width $\lambda$ around $x_0$.
The smooth Parzen estimate:
$\hat f_X(x_0) = \frac{1}{N\lambda} \sum_{i=1}^N K_\lambda(x_0, x_i)$
– For the Gaussian kernel, $K_\lambda(x_0, x) = \phi(|x - x_0|/\lambda)$.
– The estimate becomes $\hat f_X(x) = \frac{1}{N} \sum_{i=1}^N \phi_\lambda(x - x_i)$, the convolution of the sample empirical distribution with the Gaussian density $\phi_\lambda$.
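The Gaussian Parzen estimate is a one-liner once the scaled normal density is written out; a minimal sketch (the function name is ours):

```python
import numpy as np

def parzen_gaussian(x0, x, lam):
    """Smooth Parzen estimate f(x0) = (1/N) sum_i phi_lam(x0 - x_i),
    where phi_lam is the N(0, lam^2) density."""
    z = (x0 - x) / lam
    phi = np.exp(-0.5 * z**2) / (lam * np.sqrt(2 * np.pi))
    return phi.mean()
```

Evaluating `parzen_gaussian` on a grid of `x0` values gives the smooth density curve; each sample contributes one Gaussian bump of width `lam`.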
Kernel Density Estimation
A kernel density estimate for systolic blood pressure: the density estimate at each point is the average contribution from each of the kernels at that point.
Kernel Density Classification
Bayes' theorem:
$\hat\Pr(G = j \mid X = x_0) = \frac{\hat\pi_j \hat f_j(x_0)}{\sum_{k=1}^J \hat\pi_k \hat f_k(x_0)}$
The estimate for CHD uses the tri-cube kernel with a k-NN bandwidth.
Kernel Density Classification
The population class densities and the posterior probabilities.
Naïve Bayes
The naïve Bayes model assumes that, given a class G = j, the features $X_k$ are independent:
$f_j(X) = \prod_{k=1}^p f_{jk}(X_k)$
– Each $f_{jk}$ is a kernel density estimate, or a Gaussian, for coordinate $X_k$ in class j.
– If $X_k$ is categorical, use a histogram instead.
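Using the Gaussian option for each coordinate density, the model reduces to per-class, per-coordinate means and variances; a minimal sketch (function names and the variance floor `1e-9` are our choices):

```python
import numpy as np

def fit_gaussian_nb(X, g):
    """Per-class priors and per-coordinate Gaussian (mean, var) estimates."""
    classes = np.unique(g)
    priors = {j: np.mean(g == j) for j in classes}
    params = {j: (X[g == j].mean(axis=0), X[g == j].var(axis=0) + 1e-9)
              for j in classes}
    return classes, priors, params

def predict_nb(X, classes, priors, params):
    """Argmax over classes of log prior + sum of coordinate Gaussian log densities."""
    scores = []
    for j in classes:
        mu, var = params[j]
        logpdf = -0.5 * (np.log(2 * np.pi * var) + (X - mu)**2 / var)
        scores.append(np.log(priors[j]) + logpdf.sum(axis=1))
    return classes[np.argmax(np.array(scores), axis=0)]
```

The sum of per-coordinate log densities is exactly the log of the product $\prod_k f_{jk}(X_k)$; replacing the Gaussian with a 1-D kernel density estimate per coordinate gives the more flexible variant in the text.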
Radial Basis Function & Kernel
Radial basis functions combine the local nature of kernel methods with the flexibility of basis expansions.
– Each basis element is indexed by a location (or prototype) parameter $\xi_j$ and a scale parameter $\lambda_j$.
– For the profile $D$, a popular choice is the standard Gaussian density function.
Radial Basis Function & Kernel
For simplicity, focus on least squares methods for regression, and use the Gaussian kernel. The RBF network model:
$f(x) = \sum_{j=1}^M D\!\left(\frac{\|x - \xi_j\|}{\lambda_j}\right) \beta_j$
Estimating the $\{\lambda_j, \xi_j\}$ separately from the $\{\beta_j\}$ simplifies the fit, but has an undesirable side effect of creating holes: regions of $\mathbb{R}^p$ where none of the kernels has appreciable support.
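With the prototypes and scale fixed, estimating the $\beta_j$ is plain least squares on the basis matrix; a 1-D sketch (the grid of prototypes and the scale value are illustrative assumptions):

```python
import numpy as np

def rbf_design(x, xi, lam):
    """Gaussian radial basis matrix: H[i, j] = exp(-(x_i - xi_j)^2 / (2 lam^2))."""
    return np.exp(-(x[:, None] - xi[None, :])**2 / (2 * lam**2))

def fit_rbf(x, y, xi, lam):
    """With prototypes xi and scale lam fixed, the beta_j are ordinary least squares."""
    H = rbf_design(x, xi, lam)
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    return beta

# five prototypes on a grid; too small a lam leaves 'holes' between them
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x)
xi = np.linspace(0, 1, 5)
beta = fit_rbf(x, y, xi, lam=0.15)
yhat = rbf_design(x, xi, 0.15) @ beta
```

Shrinking `lam` well below the prototype spacing makes every column of `H` nearly zero between prototypes, which is the hole phenomenon the renormalized basis on the next slide is designed to avoid.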
Radial Basis Function & Kernel
Gaussian radial basis functions with a fixed width can leave holes. Renormalized Gaussian radial basis functions,
$h_j(x) = \frac{D(\|x - \xi_j\|/\lambda)}{\sum_{k=1}^M D(\|x - \xi_k\|/\lambda)},$
avoid this and produce basis functions similar in some respects to B-splines. The expansion in renormalized RBFs is $f(x) = \sum_{j=1}^M h_j(x)\, \beta_j$.
Mixture Models & EM
Gaussian mixture model:
$f(x) = \sum_{m=1}^M \alpha_m \phi(x; \mu_m, \mathbf{\Sigma}_m)$
– The $\alpha_m$ are mixture proportions, with $\sum_m \alpha_m = 1$.
EM algorithm for mixtures:
– The log-likelihood $l(\theta; Z) = \sum_{i=1}^N \log\big[\sum_{m=1}^M \alpha_m \phi_{\theta_m}(x_i)\big]$ is hard to maximize directly.
– Suppose instead we observe latent binary indicators $\Delta_i$ recording which component generated each observation; then the maximization decouples.
Mixture Models & EM
Given the current $\hat\theta$, compute the responsibilities (E-step):
$\hat\gamma_i(m) = \frac{\hat\alpha_m \phi_{\hat\theta_m}(x_i)}{\sum_{k=1}^M \hat\alpha_k \phi_{\hat\theta_k}(x_i)}$
then update the weighted means, variances and mixing proportions (M-step).
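The alternation of E- and M-steps can be sketched for a two-component 1-D Gaussian mixture; a minimal version, assuming means initialized at the data extremes and a small variance floor (both our choices):

```python
import numpy as np

def em_gmm2(x, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture."""
    mu = np.array([x.min(), x.max()])
    var = np.array([x.var(), x.var()])
    alpha = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibilities gamma[i, m]
        pdf = np.exp(-0.5 * (x[:, None] - mu)**2 / var) / np.sqrt(2 * np.pi * var)
        num = alpha * pdf
        gamma = num / num.sum(axis=1, keepdims=True)
        # M-step: weighted means, variances, and mixing proportions
        nm = gamma.sum(axis=0)
        mu = (gamma * x[:, None]).sum(axis=0) / nm
        var = (gamma * (x[:, None] - mu)**2).sum(axis=0) / nm + 1e-9
        alpha = nm / len(x)
    return alpha, mu, var
```

Each iteration increases the observed-data log-likelihood; on well-separated clusters the component means converge to the cluster centers in a few iterations.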
Kernel Methods27 Mixture Models & EM Application of mixtures to the heart disease risk factor study.
Kernel Methods28 Mixture Models & EM Mixture model used for classification of the simulated data