RECITATION 2, APRIL 28
  Spline and Kernel Methods
  Gaussian Processes
  Mixture Modeling for Density Estimation

Penalized Cubic Regression Splines
  gam() in library "mgcv"
    gam(y ~ s(x, bs="cr", k=n.knots), knots=list(x=c(…)), data=dataset)
  By default, the optimal smoothing parameter is selected by GCV
  R Demo 1

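The gam() call above can be tried on simulated data. This is a minimal sketch, not the recitation's demo: the data, the number of knots, and the knot locations are made-up assumptions.

```r
library(mgcv)

set.seed(1)
# Simulated data (hypothetical, for illustration only)
n <- 200
dat <- data.frame(x = sort(runif(n, 0, 1)))
dat$y <- sin(2 * pi * dat$x) + rnorm(n, sd = 0.3)

# Penalized cubic regression spline; for bs = "cr" the number of
# supplied knots must equal k
n.knots <- 10
knot.locs <- seq(0, 1, length.out = n.knots)   # assumed knot locations
fit <- gam(y ~ s(x, bs = "cr", k = n.knots),
           knots = list(x = knot.locs), data = dat)

fit$sp         # smoothing parameter chosen by the default GCV criterion
fit$gcv.ubre   # minimized GCV score

# Fitted curve on a grid
grid <- data.frame(x = seq(0, 1, length.out = 100))
grid$fhat <- as.numeric(predict(fit, newdata = grid))
```

Calling plot(fit) draws the estimated smooth with its confidence band.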
Kernel Method
  Nadaraya-Watson: locally constant model
  Local linear / local polynomial model
  How to define "local"? By a kernel function, e.g. the Gaussian kernel
  R package: "locfit"
    Function: locfit(y~x, kern="gauss", deg=, alpha=)
    Bandwidth selected by GCV: gcvplot(y~x, kern="gauss", deg=, alpha=), with alpha set to the range of bandwidths to compare
  R Demo 1

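To make "local" concrete, the two estimators can be written by hand in base R. This is a sketch under assumptions: it does not use locfit, and the function names nw and loclin, the simulated data, and the bandwidth h = 0.05 are all illustrative choices.

```r
# Nadaraya-Watson (locally constant) estimate with a Gaussian kernel.
# h is the bandwidth (kernel standard deviation).
nw <- function(x0, x, y, h) {
  sapply(x0, function(u) {
    w <- dnorm((x - u) / h)   # kernel weights: large for x near u
    sum(w * y) / sum(w)       # weighted average of the responses
  })
}

# Local linear estimate: weighted least squares line centered at u;
# the intercept is the fitted value at u.
loclin <- function(x0, x, y, h) {
  sapply(x0, function(u) {
    w <- dnorm((x - u) / h)
    coef(lm(y ~ I(x - u), weights = w))[[1]]
  })
}

set.seed(1)
x <- sort(runif(200, 0, 1))
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)
x0 <- seq(0.05, 0.95, length.out = 50)

fhat.nw <- nw(x0, x, y, h = 0.05)
fhat.ll <- loclin(x0, x, y, h = 0.05)
```

A larger h averages over more neighbours and gives a smoother but more biased curve, which is why the bandwidth is chosen by a criterion such as GCV.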
Gaussian Processes
  Distribution on functions: f ~ GP(m, κ)
    m: mean function
    κ: covariance function
  p(f(x_1), ..., f(x_n)) = N_n(μ, K)
    μ = [m(x_1), ..., m(x_n)]
    K_ij = κ(x_i, x_j)
  Idea: if x_i and x_j are similar according to the kernel, then f(x_i) is similar to f(x_j)

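Drawing functions from the prior shows what the definition means in practice. The sketch below assumes a zero mean function and a squared-exponential kernel with length-scale ell; neither choice is fixed by the slide, and se.kernel is a hypothetical helper name.

```r
# Squared-exponential covariance; ell is the length-scale (assumed kernel)
se.kernel <- function(a, b, ell = 1) {
  exp(-outer(a, b, "-")^2 / (2 * ell^2))
}

set.seed(1)
x <- seq(-5, 5, length.out = 50)
mu <- rep(0, length(x))          # m(x) = 0 (assumption)
K <- se.kernel(x, x, ell = 1)    # K_ij = kappa(x_i, x_j)

# Draw f ~ N_n(mu, K) through a Cholesky factor. The small jitter on the
# diagonal keeps the matrix numerically positive definite.
R <- chol(K + 1e-6 * diag(length(x)))    # upper triangular, t(R) %*% R = K + jitter
f.draws <- mu + t(R) %*% matrix(rnorm(length(x) * 3), ncol = 3)   # 3 sample paths
```

matplot(x, f.draws, type = "l") plots the three sample paths; nearby inputs have covariance close to 1, so the paths are smooth.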
Gaussian Processes: noise-free observations
  Example task: learn a function f(x) to estimate y from data (x, y)
  A function can be viewed as an infinite-dimensional random variable
  A GP provides a distribution over functions

Gaussian Processes: noise-free observations
  Model
    (x, f) are the observed locations and values (training data)
    (x*, f*) are the test or prediction locations and values
  After observing some noise-free data (x, f), the prediction for f* follows from conditioning on f
  Length-scale
  R Demo 2

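For reference, the standard noise-free result (Rasmussen and Williams, Chapter 2) can be written as follows. The notation is an assumption: K(a, b) denotes the matrix with entries κ(a_i, b_j), and m(x) is the vector of prior means at the locations x. The squared-exponential kernel in the last line is one common choice in which the length-scale ℓ appears; it is given only as an example.

```latex
\begin{bmatrix} f \\ f_* \end{bmatrix}
\sim N\!\left(
\begin{bmatrix} m(x) \\ m(x_*) \end{bmatrix},
\begin{bmatrix} K(x,x) & K(x,x_*) \\ K(x_*,x) & K(x_*,x_*) \end{bmatrix}
\right)

f_* \mid x_*, x, f \sim N\!\Big(
m(x_*) + K(x_*,x)\,K(x,x)^{-1}\,\big(f - m(x)\big),\;
K(x_*,x_*) - K(x_*,x)\,K(x,x)^{-1}\,K(x,x_*)
\Big)

\kappa(x_i, x_j) = \exp\!\left(-\frac{(x_i - x_j)^2}{2\ell^2}\right)
```

At a training location the conditional mean reproduces the observed value and the conditional variance is zero, so the posterior interpolates the data.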
Gaussian Processes: noisy observations (GP for Regression)
  Model
    (x, y) are the observed locations and values (training data)
    (x*, f*) are the test or prediction locations and values
  After observing some noisy data (x, y), the prediction for f* follows from conditioning on y
  R Demo 3

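A base-R sketch of GP regression follows. It assumes the usual noisy model y = f(x) + ε with ε ~ N(0, σ²), a zero prior mean, and a squared-exponential kernel; the helper names se.kernel and gp.predict and all parameter values are illustrative. The predictive mean is K(x*, x) [K(x, x) + σ² I]^{-1} y and the predictive covariance is K(x*, x*) - K(x*, x) [K(x, x) + σ² I]^{-1} K(x, x*).

```r
se.kernel <- function(a, b, ell = 1) exp(-outer(a, b, "-")^2 / (2 * ell^2))

gp.predict <- function(x, y, xstar, ell = 1, sigma2 = 0.1) {
  Ky  <- se.kernel(x, x, ell) + sigma2 * diag(length(x))   # K(x,x) + sigma^2 I
  Ks  <- se.kernel(xstar, x, ell)                          # K(x*, x)
  Kss <- se.kernel(xstar, xstar, ell)                      # K(x*, x*)
  list(mean = as.numeric(Ks %*% solve(Ky, y)),
       cov  = Kss - Ks %*% solve(Ky, t(Ks)))
}

set.seed(1)
# Simulated noisy observations of sin(x) (hypothetical data)
x <- runif(30, -5, 5)
y <- sin(x) + rnorm(30, sd = sqrt(0.1))
xstar <- seq(-5, 5, length.out = 100)

post <- gp.predict(x, y, xstar, ell = 1, sigma2 = 0.1)

# Pointwise 95% band for f*; pmax guards against tiny negative variances
sd.star <- sqrt(pmax(diag(post$cov), 0))
lower <- post$mean - 1.96 * sd.star
upper <- post$mean + 1.96 * sd.star
```

Unlike the noise-free case, the posterior mean no longer passes exactly through the observations; it smooths them, and the band stays wider than zero at the training locations.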
Reference
  Carl Edward Rasmussen and Christopher K. I. Williams, Gaussian Processes for Machine Learning, Chapter 2
  527 lecture notes by Emily Fox

Mixture Models: Density Estimation
  EM algorithm vs. Bayesian Markov chain Monte Carlo (MCMC)
  Remember:
    EM algorithm = iterative algorithm that MAXIMIZES the LIKELIHOOD
    MCMC DRAWS FROM the POSTERIOR (i.e. posterior ∝ likelihood × prior)

EM algorithm
  Iterative procedure that attempts to maximize the log-likelihood, yielding maximum likelihood estimates (MLEs) of the mixture model parameters
  I.e. one final density estimate

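As an illustration of the iteration, here is a minimal EM sketch for a two-component univariate Gaussian mixture. The function name em.mix2, the starting values, the stopping rule, and the simulated data are all assumptions made for this example.

```r
em.mix2 <- function(y, n.iter = 200, tol = 1e-8) {
  # Crude starting values (an arbitrary choice for this sketch)
  p <- 0.5
  mu <- as.numeric(quantile(y, c(0.25, 0.75)))
  s <- rep(sd(y), 2)
  loglik <- -Inf
  for (it in seq_len(n.iter)) {
    # E-step: responsibility of component 1 for each observation
    d1 <- p * dnorm(y, mu[1], s[1])
    d2 <- (1 - p) * dnorm(y, mu[2], s[2])
    r <- d1 / (d1 + d2)
    # M-step: weighted maximum likelihood updates
    p <- mean(r)
    mu <- c(sum(r * y) / sum(r), sum((1 - r) * y) / sum(1 - r))
    s <- sqrt(c(sum(r * (y - mu[1])^2) / sum(r),
                sum((1 - r) * (y - mu[2])^2) / sum(1 - r)))
    # Log-likelihood of the parameters that entered this iteration;
    # stop once it no longer increases by more than tol
    new.loglik <- sum(log(d1 + d2))
    if (new.loglik - loglik < tol) break
    loglik <- new.loglik
  }
  list(p = p, mu = mu, sigma = s, loglik = new.loglik)
}

set.seed(1)
# Simulated data from a known mixture (hypothetical)
y <- c(rnorm(150, -2, 1), rnorm(100, 3, 0.7))
fit <- em.mix2(y)

# The single final density estimate implied by the MLEs
dens <- function(u) {
  fit$p * dnorm(u, fit$mu[1], fit$sigma[1]) +
    (1 - fit$p) * dnorm(u, fit$mu[2], fit$sigma[2])
}
```

EM only guarantees a local maximum, so in practice it is common to rerun it from several starting values and keep the fit with the highest log-likelihood.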
Bayesian Mixture Modeling (MCMC)
  Uses an iterative procedure to DRAW SAMPLES from the posterior (then you can average the draws, etc.)
  You don't need to understand the fine details, but know that every iteration gives you one set of parameter values drawn from the posterior distribution
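One simple way to see "one set of parameter values per iteration" is a Gibbs sampler for a two-component mixture. This is a sketch under strong simplifying assumptions, not necessarily the sampler used in the course: component variances are fixed at 1, the priors are p ~ Beta(1, 1) and mu_k ~ N(0, 10^2), and the function name gibbs.mix2 and the simulated data are hypothetical.

```r
# Gibbs sampler for y_i ~ p N(mu1, 1) + (1 - p) N(mu2, 1)
gibbs.mix2 <- function(y, n.iter = 2000) {
  n <- length(y)
  p <- 0.5
  mu <- as.numeric(quantile(y, c(0.25, 0.75)))   # crude starting values
  draws <- matrix(NA_real_, n.iter, 3,
                  dimnames = list(NULL, c("p", "mu1", "mu2")))
  for (it in seq_len(n.iter)) {
    # 1. Component labels given the parameters (z = 1 means component 1)
    w1 <- p * dnorm(y, mu[1], 1)
    w2 <- (1 - p) * dnorm(y, mu[2], 1)
    z <- rbinom(n, 1, w1 / (w1 + w2))
    # 2. Mixing weight given the labels (Beta prior is conjugate)
    p <- rbeta(1, 1 + sum(z), 1 + n - sum(z))
    # 3. Component means given the labels (normal prior is conjugate)
    for (k in 1:2) {
      yk <- y[z == (2 - k)]              # k = 1 uses z == 1, k = 2 uses z == 0
      prec <- length(yk) + 1 / 100       # posterior precision
      mu[k] <- rnorm(1, sum(yk) / prec, sqrt(1 / prec))
    }
    draws[it, ] <- c(p, mu)              # one posterior draw per iteration
  }
  draws
}

set.seed(1)
y <- c(rnorm(150, -2, 1), rnorm(100, 3, 1))   # simulated data (hypothetical)
draws <- gibbs.mix2(y)
keep <- draws[-(1:500), ]                      # discard burn-in
post.mean <- colMeans(keep)

# Each draw gives a density; averaging them gives a posterior mean density
grid <- seq(-6, 7, length.out = 200)
dens.draws <- apply(keep, 1, function(th) {
  th[["p"]] * dnorm(grid, th[["mu1"]], 1) +
    (1 - th[["p"]]) * dnorm(grid, th[["mu2"]], 1)
})
dens.mean <- rowMeans(dens.draws)
```

In contrast to EM's single estimate, the spread of the columns of dens.draws at each grid point shows the posterior uncertainty about the density.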