Gaussian Processes Nando de Freitas University of British Columbia June 2010

GP resources Wikipedia is an excellent resource for the matrix inversion lemma, also known as the Sherman–Morrison–Woodbury formula or simply the Woodbury matrix identity. "Gaussian processes in machine learning" by Carl Edward Rasmussen is a nice brief introductory tutorial with Matlab code. Ed Snelson's PhD thesis is an excellent resource on Gaussian processes, and I will use some of its introductory material. The book by Carl Edward Rasmussen and Christopher Williams, Gaussian Processes for Machine Learning, is the ultimate resource. Other good resources: tutorials by Alex Smola, Zoubin Ghahramani, …
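For reference, the Woodbury matrix identity mentioned above can be stated as follows (standard form, with $A$, $U$, $C$, $V$ conformable and $A$, $C$ invertible):

$$(A + UCV)^{-1} = A^{-1} - A^{-1} U \,(C^{-1} + V A^{-1} U)^{-1}\, V A^{-1}.$$

In GP computations it is useful when the covariance has low-rank-plus-diagonal structure, since the inverse can then be computed in $O(nm^2)$ rather than $O(n^3)$ time.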

Learning and Bayesian inference [Figure: the posterior over the "sheep" class is obtained by combining the likelihood of the data d with the prior of the "sheep" class; h denotes the hypothesis/class.]
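As a reminder, the figure illustrates standard Bayes' rule, with $h$ the hypothesis (here the "sheep" class) and $d$ the observed data:

$$p(h \mid d) = \frac{p(d \mid h)\,p(h)}{p(d)} \;\propto\; \text{likelihood} \times \text{prior}.$$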

Nonlinear regression

Sampling from P(f)

N = 5;           % The number of training points.
sigma = 0.1;     % Noise variance.
h = 1;           % Kernel width parameter.
% Randomly generate training points on [-5,5].
X = -5 + 10*rand(N,1);
x = (-5:0.1:5)'; % The test points.
n = size(x,1);   % The number of test points.
% Construct the mean and covariance functions.
m = inline('0.25*x.^2', 'x');   % another example: m = inline('sin(0.9*x)','x');
K = inline('exp((-1/(h^2))*(repmat(transpose(p),size(q)) - repmat(q,size(transpose(p)))).^2)', 'p', 'q', 'h');
% Demonstrate how to sample functions from the prior:
L = 5;           % Number of functions sampled from the prior P(f).
f = zeros(n,L);
for i = 1:L
  f(:,i) = m(x) + sqrtm(K(x,x,h) + sigma*eye(n))'*randn(n,1);
end
plot(x,f,'linewidth',2);
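In the notation of the code above, the sampled functions are drawn from a GP prior with mean $m(x)$ and squared-exponential covariance, written here with the same width parameter $h$ as in the Matlab snippet:

$$f \sim \mathcal{GP}\big(m(x),\, k(x,x')\big), \qquad k(x,x') = \exp\!\left(-\frac{(x-x')^2}{h^2}\right),$$

so that any finite set of function values $f(x_1),\dots,f(x_n)$ is jointly Gaussian with mean $m(x_i)$ and covariance $K_{ij} = k(x_i, x_j)$.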

[Snelson, 2007]

Active learning with GPs

Expected improvement [Figure: the actual unknown function, the GP function approximation fitted to the data points, and the next evaluation point.]

Expected Improvement [Figure: the cost function plotted against the parameter being optimized.]
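A common form of the expected-improvement criterion, stated here for minimizing a cost function (this closed form is standard, not reproduced on the slide): with $\mu(x)$ and $\sigma(x)$ the GP posterior mean and standard deviation and $f_{\min}$ the best cost observed so far,

$$\mathrm{EI}(x) = \mathbb{E}\big[\max(0,\, f_{\min} - f(x))\big] = (f_{\min} - \mu(x))\,\Phi(z) + \sigma(x)\,\phi(z), \qquad z = \frac{f_{\min} - \mu(x)}{\sigma(x)},$$

where $\Phi$ and $\phi$ are the standard normal CDF and PDF; the next evaluation point is chosen as the maximizer of $\mathrm{EI}(x)$.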

2D sensor placement application by Andreas Krause and Carlos Guestrin

Sensor network scheduling: can we automatically schedule sensors to obtain the best understanding of the environment while minimizing resource expenditure (power, bandwidth, need for human intervention)?
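For context, a criterion along the lines of Krause and Guestrin's work selects the set of sensing locations $A$ that is most informative about the unsensed locations; a sketch of that mutual-information objective (my paraphrase, not taken from the slides):

$$A^{\star} = \arg\max_{A \subseteq V,\ |A| = k} \; I(X_A; X_{V \setminus A}) = H(X_{V \setminus A}) - H(X_{V \setminus A} \mid X_A),$$

where the entropies are computed under the GP model; greedy selection enjoys near-optimality guarantees because, under conditions given by Krause and Guestrin, this objective is submodular.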

GP regression. Gaussian noise / likelihood: $y_i = f(x_i) + \epsilon_i$ with $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$, i.e. $p(\mathbf{y} \mid \mathbf{f}) = \mathcal{N}(\mathbf{f}, \sigma^2 I)$. Zero-mean GP prior: $p(\mathbf{f}) = \mathcal{N}(\mathbf{0}, K)$. The marginal likelihood (evidence) is Gaussian: $p(\mathbf{y}) = \int p(\mathbf{y} \mid \mathbf{f})\, p(\mathbf{f})\, d\mathbf{f} = \mathcal{N}(\mathbf{0},\, K + \sigma^2 I)$.

Proof:
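The slide's proof is not reproduced in this transcript; a sketch of the standard argument: since $\mathbf{y} = \mathbf{f} + \boldsymbol{\epsilon}$ with $\mathbf{f} \sim \mathcal{N}(\mathbf{0}, K)$ and independent $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 I)$, the sum of independent Gaussians is Gaussian, with means and covariances adding:

$$\mathbb{E}[\mathbf{y}] = \mathbf{0}, \qquad \mathrm{Cov}[\mathbf{y}] = \mathbb{E}\big[(\mathbf{f}+\boldsymbol{\epsilon})(\mathbf{f}+\boldsymbol{\epsilon})^\top\big] = K + \sigma^2 I,$$

hence $p(\mathbf{y}) = \mathcal{N}(\mathbf{0}, K + \sigma^2 I)$.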

GP regression. Both sets are, by definition, jointly Gaussian. With train inputs $X$ (noisy observations $\mathbf{y}$) and test inputs $X_*$ (function values $\mathbf{f}_*$), the joint distribution of the measurements is

$$\begin{bmatrix} \mathbf{y} \\ \mathbf{f}_* \end{bmatrix} \sim \mathcal{N}\!\left(\mathbf{0},\; \begin{bmatrix} K(X,X) + \sigma^2 I & K(X,X_*) \\ K(X_*,X) & K(X_*,X_*) \end{bmatrix}\right).$$

GP regression. The predictive conditional distribution is Gaussian too:

$$p(\mathbf{f}_* \mid \mathbf{y}) = \mathcal{N}(\boldsymbol{\mu}_*, \Sigma_*), \qquad \boldsymbol{\mu}_* = K(X_*,X)\,[K(X,X)+\sigma^2 I]^{-1}\mathbf{y}, \qquad \Sigma_* = K(X_*,X_*) - K(X_*,X)\,[K(X,X)+\sigma^2 I]^{-1} K(X,X_*).$$

Proof sketch:
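Again the slide's derivation is not in the transcript; the standard Gaussian conditioning identity it relies on is: if

$$\begin{bmatrix} \mathbf{a} \\ \mathbf{b} \end{bmatrix} \sim \mathcal{N}\!\left(\begin{bmatrix} \boldsymbol{\mu}_a \\ \boldsymbol{\mu}_b \end{bmatrix},\; \begin{bmatrix} A & C \\ C^\top & B \end{bmatrix}\right), \quad \text{then} \quad \mathbf{b} \mid \mathbf{a} \sim \mathcal{N}\big(\boldsymbol{\mu}_b + C^\top A^{-1}(\mathbf{a}-\boldsymbol{\mu}_a),\; B - C^\top A^{-1} C\big).$$

Applying this with $\mathbf{a} = \mathbf{y}$ and $\mathbf{b} = \mathbf{f}_*$ gives the predictive mean and covariance above.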

GP regression

% Generate random training labels.
F = m(X) + chol(K(X,X,h) + sigma*eye(N))'*randn(N,1);
M = m(X);
% COMPUTE POSTERIOR MEAN AND VARIANCE
S = eye(N)*sigma + K(X,X,h);
y = m(x) + K(X,x,h)*inv(S)*(F - M);
y = y';
c = zeros(1,n);   % Posterior predictive variance at each test point.
for i = 1:n
  xi = x(i);
  c(i) = sigma + K(xi,xi,h) - K(X,xi,h)*inv(S)*K(xi,X,h);
end
% Plot the mean and 95% confidence intervals
% (c holds variances, so the interval is +/- 2*sqrt(c)).
hold on;
plot(x, y - 2*sqrt(c), 'g-', 'linewidth', 3);
plot(x, y + 2*sqrt(c), 'g-', 'linewidth', 3);
plot(x, y, 'r', 'linewidth', 3);
plot(X, F, 'bx', 'linewidth', 15);
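A minimal alternative sketch (my addition, not from the slides): in practice one usually avoids inv(S) and instead solves with the Cholesky factor, which is cheaper and numerically more stable. Something along these lines, reusing the variables defined above (Lchol is named to avoid clashing with L from the earlier snippet):

% Cholesky-based posterior computation (assumes S, F, M, X, x, K, h, sigma, m as above).
Lchol = chol(S, 'lower');                    % S = Lchol*Lchol'
alpha = Lchol' \ (Lchol \ (F - M));          % alpha = inv(S)*(F - M), via two triangular solves.
y2 = m(x) + K(X,x,h)*alpha;                  % Posterior mean at the test points.
V = Lchol \ K(x,X,h);                        % Solve Lchol*V = K(X_train, X_test).
c2 = sigma + diag(K(x,x,h)) - sum(V.^2, 1)'; % Posterior variance at the test points.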

Parameter learning for GPs: maximum likelihood. For example, we can parameterize the mean and covariance, say a parametric mean $m_\theta(x)$ and a kernel $k_\theta(x,x')$ with hyperparameters $\theta$ (such as the kernel width $h$ and the noise variance $\sigma^2$), and learn $\theta$ by maximizing the marginal likelihood $p(\mathbf{y} \mid \theta)$.
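The objective being maximized is the log of the Gaussian marginal likelihood from the earlier slide (standard expression, written here with explicit dependence on the hyperparameters $\theta$):

$$\log p(\mathbf{y} \mid \theta) = -\tfrac{1}{2}\,(\mathbf{y}-\mathbf{m}_\theta)^\top \big(K_\theta + \sigma^2 I\big)^{-1} (\mathbf{y}-\mathbf{m}_\theta) \;-\; \tfrac{1}{2}\log\big|K_\theta + \sigma^2 I\big| \;-\; \tfrac{N}{2}\log 2\pi,$$

which can be maximized over $\theta$ with gradient-based optimization.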