RECITATION 2 (APRIL 28): Spline and Kernel Methods, Gaussian Processes, Mixture Modeling for Density Estimation

Penalized Cubic Regression Splines
gam() in library "mgcv":
gam(y ~ s(x, bs = "cr", k = n.knots), knots = list(x = c(…)), data = dataset)
By default, the smoothing parameter is selected by GCV.
R Demo 1
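
A minimal sketch of that call, assuming a data frame named dataset with columns x and y and an arbitrary placeholder choice of 10 knot locations:

# Penalized cubic regression spline with mgcv ('dataset', 'x', 'y', and the
# knot locations are illustrative placeholders).
library(mgcv)

n.knots <- 10
fit <- gam(y ~ s(x, bs = "cr", k = n.knots),
           knots = list(x = seq(min(dataset$x), max(dataset$x), length.out = n.knots)),
           data = dataset)

summary(fit)                 # smoothing parameter chosen by GCV by default
plot(fit, residuals = TRUE)  # fitted smooth with partial residuals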

Kernel Method
Nadaraya-Watson: locally constant model; locally linear / polynomial model.
How to define "local"? By a kernel function, e.g. the Gaussian kernel.
R package "locfit"; function: locfit(y ~ x, kern = "gauss", deg = , alpha = )
Bandwidth selected by GCV: gcvplot(y ~ x, kern = "gauss", deg = , alpha = bandwidth range)
R Demo 1
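
A short locfit sketch under placeholder names (numeric vectors x and y; the alpha grid is an arbitrary illustrative range of bandwidths):

# Nadaraya-Watson (deg = 0) and locally linear (deg = 1) fits with a Gaussian kernel.
library(locfit)

fit.nw  <- locfit(y ~ x, kern = "gauss", deg = 0, alpha = 0.3)  # locally constant
fit.lin <- locfit(y ~ x, kern = "gauss", deg = 1, alpha = 0.3)  # locally linear

# Evaluate a grid of candidate bandwidths by GCV and inspect the minimizer.
gcv.out <- gcvplot(y ~ x, kern = "gauss", deg = 1, alpha = seq(0.1, 0.9, by = 0.1))
plot(gcv.out)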

Gaussian Processes
Distribution over functions: f ~ GP(m, κ), where m is the mean function and κ is the covariance function.
For any finite set of inputs, p(f(x_1), ..., f(x_n)) ∼ N_n(μ, K), with μ = [m(x_1), ..., m(x_n)] and K_ij = κ(x_i, x_j).
Idea: if x_i and x_j are similar according to the kernel, then f(x_i) is similar to f(x_j).
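
To make the "distribution over functions" idea concrete, a small sketch (not one of the course R demos) that draws sample functions from a zero-mean GP prior with a squared-exponential kernel:

# Draw functions from a GP prior: (f(x_1), ..., f(x_n)) ~ N(0, K), K_ij = kappa(x_i, x_j).
set.seed(1)
x.grid <- seq(0, 5, length.out = 100)

# Squared-exponential covariance with length-scale ell.
kappa <- function(xi, xj, ell = 1) exp(-(xi - xj)^2 / (2 * ell^2))
K <- outer(x.grid, x.grid, kappa)

# Sample 3 functions via the Cholesky factor (small jitter for numerical stability).
L <- chol(K + 1e-8 * diag(length(x.grid)))
f.samples <- t(L) %*% matrix(rnorm(length(x.grid) * 3), ncol = 3)

matplot(x.grid, f.samples, type = "l", lty = 1, ylab = "f(x)",
        main = "Samples from a GP prior")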

Gaussian Processes – Noise-free observations
Example task: learn a function f(x) to estimate y from data (x, y).
A function can be viewed as an infinite-dimensional random variable; a GP provides a distribution over functions.

Gaussian Processes – Noise-free observations
Model: (x, f) are the observed locations and values (training data); (x*, f*) are the test/prediction locations and values.
After observing some noise-free data (x, f), the predictive distribution is
f* | x*, x, f ~ N( K(x*, x) K(x, x)^(-1) f,  K(x*, x*) − K(x*, x) K(x, x)^(-1) K(x, x*) )
The length-scale of the covariance function controls how quickly correlation decays with the distance between inputs.
R Demo 2
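
A sketch of this noise-free conditioning (in the spirit of R Demo 2, but not the actual demo code); the training data here are placeholder values from sin(x):

# GP posterior for noise-free observations:
#   f* | x*, x, f ~ N( K*' K^{-1} f,  K** - K*' K^{-1} K* )
kappa <- function(xi, xj, ell = 1) exp(-(xi - xj)^2 / (2 * ell^2))

x.tr <- c(0.5, 1.5, 2.5, 3.5)        # observed locations (placeholder data)
f.tr <- sin(x.tr)                    # observed noise-free values
x.te <- seq(0, 4, length.out = 200)  # prediction locations

K   <- outer(x.tr, x.tr, kappa)      # K(x, x)
Ks  <- outer(x.tr, x.te, kappa)      # K(x, x*)
Kss <- outer(x.te, x.te, kappa)      # K(x*, x*)

Kinv      <- solve(K + 1e-8 * diag(length(x.tr)))   # jitter for numerical stability
post.mean <- t(Ks) %*% Kinv %*% f.tr
post.cov  <- Kss - t(Ks) %*% Kinv %*% Ks

plot(x.te, post.mean, type = "l", ylab = "f(x)")
points(x.tr, f.tr, pch = 19)         # the posterior mean interpolates the data exactly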

Gaussian Processes – Noisy observations (GP for Regression)
Model: y = f(x) + ε, ε ~ N(0, σ_n^2); (x, y) are the observed locations and values (training data); (x*, f*) are the test/prediction locations and values.
After observing some noisy data (x, y), the predictive distribution is
f* | x*, x, y ~ N( K(x*, x) [K(x, x) + σ_n^2 I]^(-1) y,  K(x*, x*) − K(x*, x) [K(x, x) + σ_n^2 I]^(-1) K(x, x*) )
R Demo 3
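
With noisy observations, the only change from the noise-free sketch above is adding the noise variance to the diagonal of K(x, x); this fragment reuses K, Ks, Kss, x.tr, and f.tr from that sketch, with sigma.n as a placeholder noise level:

# GP regression with noisy observations y = f(x) + eps, eps ~ N(0, sigma.n^2).
sigma.n <- 0.1
y.tr    <- f.tr + rnorm(length(x.tr), sd = sigma.n)   # noisy training responses

Kinv.n    <- solve(K + sigma.n^2 * diag(length(x.tr)))
post.mean <- t(Ks) %*% Kinv.n %*% y.tr
post.cov  <- Kss - t(Ks) %*% Kinv.n %*% Ks             # no longer interpolates exactly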

References
Chapter 2 of Gaussian Processes for Machine Learning, Carl Edward Rasmussen and Christopher K. I. Williams.
527 lecture notes by Emily Fox.

Mixture Models – Density Estimation
EM algorithm vs. Bayesian Markov chain Monte Carlo (MCMC)
Remember: the EM algorithm is an iterative algorithm that MAXIMIZES the LIKELIHOOD, while MCMC DRAWS samples FROM the POSTERIOR (proportional to likelihood × prior).

EM Algorithm
An iterative procedure that attempts to maximize the log-likelihood, yielding MLE estimates of the mixture-model parameters, i.e. one final density estimate.
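
A minimal EM sketch for a two-component univariate Gaussian mixture (placeholder simulated data; a real demo might instead use a package such as mixtools or mclust):

# EM for a 2-component Gaussian mixture: alternate the E-step (responsibilities)
# and the M-step (weighted MLE updates of the weights, means, and sds).
set.seed(1)
x <- c(rnorm(150, mean = 0, sd = 1), rnorm(100, mean = 4, sd = 0.7))

pi.k <- c(0.5, 0.5); mu <- c(-1, 1); sig <- c(1, 1)   # starting values
for (iter in 1:200) {
  # E-step: posterior probability that each point came from component 1
  d1 <- pi.k[1] * dnorm(x, mu[1], sig[1])
  d2 <- pi.k[2] * dnorm(x, mu[2], sig[2])
  r1 <- d1 / (d1 + d2)
  # M-step: weighted maximum-likelihood updates
  pi.k <- c(mean(r1), 1 - mean(r1))
  mu   <- c(sum(r1 * x) / sum(r1), sum((1 - r1) * x) / sum(1 - r1))
  sig  <- c(sqrt(sum(r1 * (x - mu[1])^2) / sum(r1)),
            sqrt(sum((1 - r1) * (x - mu[2])^2) / sum(1 - r1)))
}
c(pi.k, mu, sig)   # one final set of MLE estimates, i.e. one density estimate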

Bayesian Mixture Modeling (MCMC)
Uses an iterative procedure to DRAW SAMPLES from the posterior (which you can then average, etc.).
You don't need to understand the fine details, but know that at every iteration you get a draw of the parameter values from the posterior distribution.
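
For contrast, a compact Gibbs-sampler sketch for the same two-component mixture, simplified for illustration (known unit variances, flat priors on the means, a Beta(1, 1) prior on the weight); every iteration yields one draw of the parameters from the posterior:

# Gibbs sampler for a 2-component Gaussian mixture with known sd = 1 (sketch).
set.seed(1)
x <- c(rnorm(150, mean = 0), rnorm(100, mean = 4))
n <- length(x); n.iter <- 2000
mu <- c(-1, 1); w <- 0.5
draws <- matrix(NA, n.iter, 3, dimnames = list(NULL, c("w", "mu1", "mu2")))

for (it in 1:n.iter) {
  # 1. Draw component labels given the current parameters.
  p1 <- w * dnorm(x, mu[1]) / (w * dnorm(x, mu[1]) + (1 - w) * dnorm(x, mu[2]))
  z  <- rbinom(n, 1, p1)                         # 1 = component 1
  # 2. Draw the mixture weight from its Beta full conditional.
  w  <- rbeta(1, 1 + sum(z), 1 + n - sum(z))
  # 3. Draw each mean from its Normal full conditional (flat prior, known sd = 1).
  for (k in 1:2) {
    idx <- if (k == 1) z == 1 else z == 0
    nk  <- sum(idx)
    m.k <- if (nk > 0) mean(x[idx]) else 0       # fallback if a component is empty
    mu[k] <- rnorm(1, mean = m.k, sd = 1 / sqrt(max(nk, 1)))
  }
  draws[it, ] <- c(w, mu)
}
colMeans(draws[-(1:500), ])   # e.g. average the post-burn-in draws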