Kernel Methods
Dept. Computer Science & Engineering, Shanghai Jiao Tong University

Outline
One-Dimensional Kernel Smoothers
Local Regression
Local Likelihood
Kernel Density Estimation
Naive Bayes
Radial Basis Functions
Mixture Models and EM

One-Dimensional Kernel Smoothers
k-NN: the 30-nearest-neighbor running mean $\hat{f}(x) = \text{Ave}(y_i \mid x_i \in N_k(x))$ is bumpy, since the neighborhood $N_k(x)$ is discontinuous in $x$. As $x$ moves, points enter and leave the neighborhood one at a time, so the average changes in a discrete way, leading to a discontinuous $\hat{f}(x)$.

One-Dimensional Kernel Smoothers
Nadaraya-Watson kernel-weighted average:
$$\hat{f}(x_0) = \frac{\sum_{i=1}^N K_\lambda(x_0, x_i)\, y_i}{\sum_{i=1}^N K_\lambda(x_0, x_i)}$$
Epanechnikov quadratic kernel:
$$K_\lambda(x_0, x) = D\!\left(\frac{|x - x_0|}{\lambda}\right), \qquad D(t) = \begin{cases} \tfrac{3}{4}(1 - t^2) & \text{if } |t| \le 1 \\ 0 & \text{otherwise} \end{cases}$$
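
As a minimal sketch of the two formulas above (not from the slides), assuming NumPy; the helper names epanechnikov and nadaraya_watson and the toy sine data are mine:

    import numpy as np

    def epanechnikov(t):
        # D(t) = 3/4 (1 - t^2) for |t| <= 1, else 0
        return np.where(np.abs(t) <= 1, 0.75 * (1.0 - t**2), 0.0)

    def nadaraya_watson(x0, x, y, lam):
        # Kernel-weighted average: sum_i K(x0, x_i) y_i / sum_i K(x0, x_i)
        w = epanechnikov((x - x0) / lam)
        return np.sum(w * y) / np.sum(w)

    # Toy data: noisy sine curve
    rng = np.random.default_rng(0)
    x = np.sort(rng.uniform(0.0, 1.0, 100))
    y = np.sin(4.0 * x) + rng.normal(0.0, 1.0 / 3.0, 100)
    print(nadaraya_watson(0.5, x, y, lam=0.2))

Unlike the k-NN running mean, this estimate is continuous in x0 because the weights decay smoothly as points leave the window.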

One-Dimensional Kernel Smoothers
More general kernel:
$$K_\lambda(x_0, x) = D\!\left(\frac{|x - x_0|}{h_\lambda(x_0)}\right)$$
where $h_\lambda(x_0)$ is a width function that determines the width of the neighborhood at $x_0$. For the quadratic kernel, $h_\lambda(x_0) = \lambda$ is constant, so the bias stays roughly constant while the variance varies with the local density. For the k-NN kernel, $h_k(x_0) = |x_0 - x_{[k]}|$, the distance to the k-th closest point, so the variance stays roughly constant while the bias varies. The Epanechnikov kernel has compact support.

One-Dimensional Kernel Smoothers
Three popular kernels for local smoothing: the Epanechnikov and tri-cube kernels have compact support, but the tri-cube kernel $D(t) = (1 - |t|^3)^3$ for $|t| \le 1$ has two continuous derivatives at the boundary of its support, while the Epanechnikov kernel has none. The Gaussian kernel $D(t) = \phi(t)$ has infinite support.

Local Linear Regression
Boundary issue: the locally weighted average is badly biased on the boundaries of the domain because of the asymmetry of the kernel in that region. Fitting straight lines locally, rather than constants, removes this bias exactly to first order.

Local Linear Regression
Locally weighted linear regression makes a first-order correction. Solve a separate weighted least squares problem at each target point $x_0$:
$$\min_{\alpha(x_0),\, \beta(x_0)} \sum_{i=1}^N K_\lambda(x_0, x_i)\,\big[y_i - \alpha(x_0) - \beta(x_0)\, x_i\big]^2$$
The estimate:
$$\hat{f}(x_0) = \hat\alpha(x_0) + \hat\beta(x_0)\, x_0 = b(x_0)^T \big(\mathbf{B}^T \mathbf{W}(x_0) \mathbf{B}\big)^{-1} \mathbf{B}^T \mathbf{W}(x_0)\, \mathbf{y} = \sum_{i=1}^N l_i(x_0)\, y_i$$
where $b(x)^T = (1, x)$; $\mathbf{B}$ is the $N \times 2$ regression matrix with $i$-th row $b(x_i)^T$; and $\mathbf{W}(x_0)$ is the $N \times N$ diagonal matrix with $i$-th diagonal element $K_\lambda(x_0, x_i)$.
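
A minimal sketch of this weighted least squares solve, assuming NumPy; the function name local_linear is mine, and the bandwidth must be wide enough that the window contains at least two points:

    import numpy as np

    def local_linear(x0, x, y, lam):
        # Epanechnikov weights K_lam(x0, x_i)
        t = np.abs(x - x0) / lam
        w = np.where(t <= 1, 0.75 * (1.0 - t**2), 0.0)
        B = np.column_stack([np.ones_like(x), x])  # N x 2 matrix, rows b(x_i)^T
        W = np.diag(w)                             # W(x0): diagonal kernel weights
        # (alpha_hat, beta_hat) = (B^T W B)^{-1} B^T W y
        theta = np.linalg.solve(B.T @ W @ B, B.T @ W @ y)
        return theta[0] + theta[1] * x0            # f_hat(x0) = b(x0)^T theta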

Local Linear Regression
The weights $l_i(x_0)$ combine the weighting kernel $K_\lambda(x_0, \cdot)$ and the least squares operations; they define the equivalent kernel.

Local Linear Regression
Expanding $\mathrm{E}\hat{f}(x_0)$, using the linearity of local regression and a series expansion of the true function $f$ around $x_0$:
$$\mathrm{E}\hat{f}(x_0) = \sum_{i=1}^N l_i(x_0) f(x_i) = f(x_0) \sum_{i=1}^N l_i(x_0) + f'(x_0) \sum_{i=1}^N (x_i - x_0)\, l_i(x_0) + \frac{f''(x_0)}{2} \sum_{i=1}^N (x_i - x_0)^2\, l_i(x_0) + R$$
For local linear regression, $\sum_{i=1}^N l_i(x_0) = 1$ and $\sum_{i=1}^N (x_i - x_0)\, l_i(x_0) = 0$, so the bias $\mathrm{E}\hat{f}(x_0) - f(x_0)$ depends only on quadratic and higher-order terms in the expansion of $f$.

Local Polynomial Regression
Fit local polynomials of any degree $d$:
$$\min_{\alpha(x_0),\, \beta_j(x_0),\, j=1,\dots,d} \sum_{i=1}^N K_\lambda(x_0, x_i)\,\Big[y_i - \alpha(x_0) - \sum_{j=1}^d \beta_j(x_0)\, x_i^j\Big]^2$$
with solution $\hat{f}(x_0) = \hat\alpha(x_0) + \sum_{j=1}^d \hat\beta_j(x_0)\, x_0^j$.

Local Polynomial Regression
The bias has only components of degree $d+1$ and higher. The reduction in bias comes at the cost of increased variance.

Selecting the Width of the Kernel
In the kernel $K_\lambda$, $\lambda$ is the parameter that controls the kernel width:
For a kernel with compact support, $\lambda$ is the radius of the support region.
For the Gaussian kernel, $\lambda$ is the standard deviation.
For the k-nearest-neighbor method, $\lambda$ is the fraction $k/N$.
The window width implies a bias-variance tradeoff:
A narrow window gives high variance and low bias in the estimated mean.
A wide window gives low variance and high bias in the estimated mean.
A sketch of one common way to navigate this tradeoff follows.
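
The slides do not say how to pick $\lambda$ in practice; one standard approach is leave-one-out cross-validation. A minimal sketch under that assumption, using NumPy and the Nadaraya-Watson smoother; the helper names are mine:

    import numpy as np

    def epanechnikov(t):
        return np.where(np.abs(t) <= 1, 0.75 * (1.0 - t**2), 0.0)

    def loocv_bandwidth(x, y, lams):
        # Pick the bandwidth minimizing leave-one-out squared prediction error.
        best, best_err = None, np.inf
        for lam in lams:
            err = 0.0
            for i in range(len(x)):
                keep = np.arange(len(x)) != i      # hold out observation i
                w = epanechnikov((x[keep] - x[i]) / lam)
                if w.sum() == 0.0:                 # empty window: reject this lam
                    err = np.inf
                    break
                err += (y[i] - np.sum(w * y[keep]) / w.sum()) ** 2
            if err < best_err:
                best, best_err = lam, err
        return best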

Structured Local Regression
Structured kernels: introduce structure by imposing appropriate restrictions on the matrix $A$ in
$$K_{\lambda, A}(x_0, x) = D\!\left(\frac{(x - x_0)^T A\, (x - x_0)}{\lambda}\right)$$
Structured regression functions: introduce structure by eliminating some of the higher-order interaction terms in the regression function.

Local Likelihood & Other Models
Any parametric model can be made local, provided the fitting method accommodates observation weights. Parameter associated with $y_i$: $\theta(x_i) = x_i^T \beta$. Log-likelihood: $l(\beta) = \sum_{i=1}^N l(y_i, x_i^T \beta)$. Model likelihood local to $x_0$:
$$l(\beta(x_0)) = \sum_{i=1}^N K_\lambda(x_0, x_i)\, l(y_i, x_i^T \beta(x_0))$$
A varying coefficient model maximizes $l(\theta(z_0)) = \sum_{i=1}^N K_\lambda(z_0, z_i)\, l(y_i, \eta(x_i, \theta(z_0)))$, where the parameters vary with a subset $z$ of the predictors.

Local Likelihood & Other Models
Logistic regression: the local log-likelihood for the $J$-class model is
$$\sum_{i=1}^N K_\lambda(x_0, x_i) \Big[ \beta_{g_i 0}(x_0) + \beta_{g_i}(x_0)^T (x_i - x_0) - \log\Big( 1 + \sum_{k=1}^{J-1} \exp\big(\beta_{k0}(x_0) + \beta_k(x_0)^T (x_i - x_0)\big) \Big) \Big]$$
Centering the local regressions at $x_0$ means the fitted posterior probabilities at $x_0$ are carried by the intercepts alone: $\hat{\Pr}(G = j \mid X = x_0) = e^{\hat\beta_{j0}(x_0)} \big/ \big(1 + \sum_{k=1}^{J-1} e^{\hat\beta_{k0}(x_0)}\big)$.
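
A minimal two-class sketch of this idea, assuming NumPy and scikit-learn's LogisticRegression as the weighted fitter (a tooling choice of mine, not the slides'); the Gaussian weights and the function name are also mine:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def local_logistic_prob(x0, X, g, lam):
        # Gaussian kernel weights K_lam(x0, x_i)
        w = np.exp(-np.sum((X - x0) ** 2, axis=1) / (2.0 * lam**2))
        # Center the regression at x0: the intercept then carries the local fit
        clf = LogisticRegression().fit(X - x0, g, sample_weight=w)
        # Evaluating at the centered origin reads off Pr(G = 1 | X = x0)
        return clf.predict_proba(np.zeros((1, X.shape[1])))[0, 1]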

Kernel Density Estimation
A natural local estimate:
$$\hat{f}_X(x_0) = \frac{\#\{x_i \in \mathcal{N}(x_0)\}}{N\lambda}$$
where $\mathcal{N}(x_0)$ is a small metric neighborhood of width $\lambda$ around $x_0$. The smooth Parzen estimate:
$$\hat{f}_X(x_0) = \frac{1}{N\lambda} \sum_{i=1}^N K_\lambda(x_0, x_i)$$
For the Gaussian kernel $K_\lambda(x_0, x) = \phi(|x - x_0|/\lambda)$, the estimate becomes
$$\hat{f}_X(x) = \frac{1}{N} \sum_{i=1}^N \phi_\lambda(x - x_i)$$
the convolution of the sample empirical distribution with the Gaussian density $\phi_\lambda$.
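
A minimal sketch of the Gaussian Parzen estimate, assuming NumPy; the function name is mine:

    import numpy as np

    def parzen_gaussian(x0, x, lam):
        # f_hat(x0) = (1/N) sum_i phi_lam(x0 - x_i)
        z = (x0 - x) / lam
        return np.mean(np.exp(-0.5 * z**2) / (lam * np.sqrt(2.0 * np.pi)))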

Kernel Density Estimation A kernel density estimate for systolic blood pressure. The density estimate at each point is the average contribution from each of the kernels at that point. 2018/7/4 Kernel Methods

Kernel Density Classification
Bayes' theorem:
$$\hat{\Pr}(G = j \mid X = x_0) = \frac{\hat\pi_j\, \hat{f}_j(x_0)}{\sum_{k=1}^J \hat\pi_k\, \hat{f}_k(x_0)}$$
The estimate for CHD uses the tri-cube kernel with k-NN bandwidth.
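
A minimal one-dimensional sketch of this classifier, assuming NumPy; it swaps in a Gaussian kernel (rather than the tri-cube/k-NN choice the slide mentions) and estimates priors by class proportions; the function names are mine:

    import numpy as np

    def gaussian_kde_1d(x0, x, lam):
        z = (x0 - x) / lam
        return np.mean(np.exp(-0.5 * z**2) / (lam * np.sqrt(2.0 * np.pi)))

    def kde_posteriors(x0, samples_by_class, lam):
        # Priors pi_hat_j estimated by class sample proportions
        n = sum(len(s) for s in samples_by_class)
        scores = np.array([(len(s) / n) * gaussian_kde_1d(x0, s, lam)
                           for s in samples_by_class])
        return scores / scores.sum()  # Bayes' theorem: normalize pi_j * f_j(x0)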

Kernel Density Classification The population class densities and the posterior probabilities 2018/7/4 Kernel Methods

Naïve Bayes
The naïve Bayes model assumes that given a class $G = j$, the features $X_k$ are independent:
$$f_j(X) = \prod_{k=1}^p f_{jk}(X_k)$$
Each $f_{jk}$ is a one-dimensional kernel density estimate, or Gaussian, for coordinate $X_k$ in class $j$. If $X_k$ is categorical, an appropriate histogram estimate is used instead.
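
A minimal sketch scoring classes by $\log \hat\pi_j + \sum_k \log \hat{f}_{jk}(x_{0k})$ with per-coordinate Gaussian kernel density estimates, assuming NumPy; the function name and the shared bandwidth lam are mine:

    import numpy as np

    def naive_bayes_log_scores(x0, X_by_class, priors, lam):
        # One-dimensional Gaussian KDE for a single coordinate, on the log scale
        def log_f1d(v0, v):
            z = (v0 - v) / lam
            dens = np.mean(np.exp(-0.5 * z**2)) / (lam * np.sqrt(2.0 * np.pi))
            return np.log(dens + 1e-300)          # guard against log(0)
        # log pi_j + sum_k log f_jk(x0[k]) for each class j
        return np.array([np.log(p) + sum(log_f1d(x0[k], Xj[:, k])
                                         for k in range(Xj.shape[1]))
                         for Xj, p in zip(X_by_class, priors)])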

Radial Basis Functions & Kernels
Radial basis functions combine the local behavior of kernel methods with the flexibility of basis expansions. Each basis element is indexed by a location (prototype) parameter $\xi_j$ and a scale parameter $\lambda_j$; a popular choice for $D$ is the standard Gaussian density function:
$$f(x) = \sum_{j=1}^M K_{\lambda_j}(\xi_j, x)\, \beta_j = \sum_{j=1}^M D\!\left(\frac{\|x - \xi_j\|}{\lambda_j}\right) \beta_j$$

Radial Basis Functions & Kernels
For simplicity, focus on least squares methods for regression and use the Gaussian kernel. RBF network model:
$$f(x) = \sum_{j=1}^M \beta_j \exp\!\left( -\frac{\|x - \xi_j\|^2}{2\lambda_j^2} \right)$$
Estimating the $\{\xi_j, \lambda_j\}$ separately from the $\beta_j$ (e.g., by unsupervised clustering) greatly simplifies the optimization, but it has an undesirable side effect of creating holes: regions of $\mathbb{R}^p$ where none of the kernels has appreciable support.
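
A minimal sketch of this two-stage fit, assuming NumPy and scikit-learn's KMeans for the unsupervised stage (my tooling choice); a single shared width lam and the function names are also mine:

    import numpy as np
    from sklearn.cluster import KMeans

    def fit_rbf(X, y, M, lam):
        # Stage 1: choose prototypes xi_j by unsupervised k-means clustering
        xi = KMeans(n_clusters=M, n_init=10, random_state=0).fit(X).cluster_centers_
        # Stage 2: solve ordinary least squares for the beta_j
        H = np.exp(-((X[:, None, :] - xi[None, :, :]) ** 2).sum(-1) / (2.0 * lam**2))
        beta, *_ = np.linalg.lstsq(H, y, rcond=None)
        return xi, beta

    def predict_rbf(Xnew, xi, beta, lam):
        H = np.exp(-((Xnew[:, None, :] - xi[None, :, :]) ** 2).sum(-1) / (2.0 * lam**2))
        return H @ beta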

Radial Basis Function & Kernel Renormalized radial basis functions. The expansion in renormalized RBF Gaussian radial basis function with fixed width can leave holes. Renormalized Gaussian radial basis function produce basis functions similar in some respects to B-splines. 2018/7/4 Kernel Methods

Mixture Models & EM
Gaussian mixture model:
$$f(x) = \sum_{m=1}^M \alpha_m\, \phi(x;\, \mu_m, \Sigma_m)$$
where the $\alpha_m$ are the mixture proportions, $\sum_m \alpha_m = 1$, and $\phi$ is the Gaussian density. The parameters are fit by maximum likelihood using the EM algorithm. In the two-component case, suppose we observe $y_1, \dots, y_N$ with log-likelihood
$$l(\theta; \mathbf{Z}) = \sum_{i=1}^N \log\big[ (1 - \pi)\, \phi_{\theta_1}(y_i) + \pi\, \phi_{\theta_2}(y_i) \big]$$
Direct maximization is difficult, so introduce a latent binary indicator $\Delta_i \in \{0, 1\}$: if $\Delta_i = 1$, observation $y_i$ comes from component 2, otherwise from component 1.

Mixture Models & EM
E-step: given the current parameter estimates $\hat\theta$, compute the responsibilities
$$\hat\gamma_i = \frac{\hat\pi\, \phi_{\hat\theta_2}(y_i)}{(1 - \hat\pi)\, \phi_{\hat\theta_1}(y_i) + \hat\pi\, \phi_{\hat\theta_2}(y_i)}$$
M-step: update the means, variances, and mixing proportion by responsibility-weighted maximum likelihood, and iterate to convergence, as in the example sketched below.
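
A minimal sketch of EM for the two-component one-dimensional Gaussian mixture above, assuming NumPy; the crude initialization and the function name are mine:

    import numpy as np

    def em_two_gaussians(y, n_iter=100):
        # Crude initialization: extreme observations as means, pooled spread
        mu1, mu2 = y.min(), y.max()
        s1 = s2 = y.std()
        pi = 0.5
        phi = lambda v, m, s: np.exp(-0.5 * ((v - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))
        for _ in range(n_iter):
            # E-step: gamma_i = Pr(Delta_i = 1 | y_i, theta_hat)
            g = pi * phi(y, mu2, s2) / ((1.0 - pi) * phi(y, mu1, s1)
                                        + pi * phi(y, mu2, s2))
            # M-step: responsibility-weighted means, variances, and proportion
            mu1 = np.sum((1.0 - g) * y) / np.sum(1.0 - g)
            mu2 = np.sum(g * y) / np.sum(g)
            s1 = np.sqrt(np.sum((1.0 - g) * (y - mu1) ** 2) / np.sum(1.0 - g))
            s2 = np.sqrt(np.sum(g * (y - mu2) ** 2) / np.sum(g))
            pi = np.mean(g)
        return (mu1, s1), (mu2, s2), pi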

Mixture Models & EM
Application of mixtures to the heart disease risk factor study.

Mixture Models & EM
Mixture model used for classification of the simulated data.
