Gaussian process regression Bernád Emőke 2007

Gaussian processes

Definition: A Gaussian process is a collection of random variables, any finite number of which have (consistent) joint Gaussian distributions. A Gaussian process is fully specified by its mean function m(x) and covariance function k(x,x'):

f ~ GP(m, k)

Generalization from distribution to process

Consider the Gaussian process given by

f ~ GP(m, k),  m(x) = (1/4) x^2,  k(x, x') = exp(-(1/2)(x - x')^2).

We can draw samples from the function f by evaluating it at a vector of inputs x; the resulting values are jointly Gaussian with

μi = m(xi),  Σij = k(xi, xj),  i, j = 1, ..., n,  so f ~ N(μ, Σ).

The algorithm

…
xs = (-5:0.2:5)'; ns = size(xs,1); keps = 1e-9;   % test inputs, number of points, jitter
% the mean function
m = inline('0.25*x.^2');
% the covariance function
K = inline('exp(-0.5*(repmat(p'',size(q))-repmat(q,size(p''))).^2)');
% draw one sample function from the prior N(m(xs), K(xs,xs))
fs = m(xs) + chol(K(xs,xs)+keps*eye(ns))'*randn(ns,1);
plot(xs,fs,'.')
…

The result

The dots are the values generated with the algorithm above; the two other curves have (less correctly) been drawn by connecting sampled points.

Posterior Gaussian Process

The GP will be used as a prior for Bayesian inference. The primary goal of computing the posterior is that it can be used to make predictions for unseen test cases. This is useful if we have enough prior information about the dataset at hand to confidently specify the prior mean and covariance functions.

Notation:
f : function values of the training cases (inputs x)
f* : function values of the test cases (inputs x*)
μ : training means m(x)
μ* : test means m(x*)
Σ : training set covariances
Σ* : training-test set covariances
Σ** : test set covariances

Posterior Gaussian Process

Writing the joint distribution of training and test function values as

[f; f*] ~ N( [μ; μ*], [Σ, Σ*; Σ*^T, Σ**] ),

the formula for conditioning a joint Gaussian distribution gives the conditional distribution

f* | f ~ N( μ* + Σ*^T Σ^-1 (f − μ),  Σ** − Σ*^T Σ^-1 Σ* ).

This is the posterior distribution for a specific set of test cases. It is easy to verify that the corresponding posterior process is

f | D ~ GP(mD, kD),
mD(x) = m(x) + Σ(X,x)^T Σ^-1 (f − m),
kD(x,x') = k(x,x') − Σ(X,x)^T Σ^-1 Σ(X,x'),

where Σ(X,x) is a vector of covariances between every training case and x.
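Below is a minimal Octave/MATLAB sketch of these prediction equations. It reuses the mean function m and covariance function K from the sampling slide; the training inputs x, targets y and test inputs xs are made-up illustration values, not data from the slides.

x  = [-4 -3 -1 0 2]';                 % training inputs (assumed, for illustration)
y  = [-2 0 1 2 -1]';                  % training targets (assumed, for illustration)
xs = (-5:0.2:5)';                     % test inputs
Kxx = K(x,x) + 1e-9*eye(numel(x));    % Sigma: training covariances (with jitter)
Ksx = K(x,xs);                        % Sigma*^T: covariances between test and training cases
Kss = K(xs,xs);                       % Sigma**: test covariances
postMean = m(xs) + Ksx*(Kxx\(y - m(x)));   % mu* + Sigma*^T Sigma^-1 (y - mu)
postCov  = Kss - Ksx*(Kxx\Ksx');           % Sigma** - Sigma*^T Sigma^-1 Sigma*
plot(x, y, '+', xs, postMean, '-')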

Gaussian noise in the training outputs

In practice we observe noisy targets y = f(x) + ε with ε ~ N(0, σn^2). Every f(x) then has an extra covariance with itself only, with a magnitude equal to the noise variance:

ky(x, x') = k(x, x') + σn^2 δxx'.

[Figure: GP posterior fitted to 20 training data points, noise level 0.7]
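A sketch of how the noise term changes the prediction snippet above (σn = 0.7, as in the figure, is an assumed value): the noise variance is simply added to the diagonal of the training covariance, and the rest of the computation is unchanged.

sn2 = 0.7^2;                          % noise variance sigma_n^2 (assumed)
Ky  = K(x,x) + sn2*eye(numel(x));     % k(x,x') + sigma_n^2 * delta_xx'
postMean = m(xs) + Ksx*(Ky\(y - m(x)));
postCov  = Kss - Ksx*(Ky\Ksx');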

Training a Gaussian Process

The mean and covariance functions are parameterized in terms of hyperparameters. For example:

f ~ GP(m, k),
m(x) = a x^2 + b x + c,
k(x, x') = σy^2 exp(−(x − x')^2 / (2 ℓ^2)) + σn^2 δxx'.

The hyperparameters are θ = {a, b, c, σy, σn, ℓ}.

The log marginal likelihood of the training data is

L = log p(y | x, θ) = −(1/2) log|Σ| − (1/2) (y − μ)^T Σ^-1 (y − μ) − (n/2) log(2π).
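As a sketch, the log marginal likelihood can be evaluated like this, assuming the training set x, y, the simple mean m and the noisy covariance Ky from the previous snippets (rather than the richer parameterization above):

mu    = m(x);                          % training means
n     = numel(x);
Lc    = chol(Ky)';                     % lower Cholesky factor, Lc*Lc' = Sigma
alpha = Lc'\(Lc\(y - mu));             % Sigma^-1 (y - mu)
logML = -sum(log(diag(Lc))) - 0.5*(y - mu)'*alpha - 0.5*n*log(2*pi);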

Optimizing the marginal likelihood

Calculating the partial derivatives of L with respect to the hyperparameters:

∂L/∂θm = (y − μ)^T Σ^-1 ∂μ/∂θm,
∂L/∂θk = (1/2) (y − μ)^T Σ^-1 (∂Σ/∂θk) Σ^-1 (y − μ) − (1/2) tr(Σ^-1 ∂Σ/∂θk),

where θm and θk denote hyperparameters of the mean and covariance functions respectively. These gradients are passed to a numerical optimization routine, e.g. conjugate gradients, to find good hyperparameter settings.
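For instance, a sketch of the partial derivative with respect to the length-scale ℓ of the squared exponential covariance used in the earlier code (where ℓ = 1); alpha and Ky are taken from the previous snippet:

ell    = 1;                                              % length-scale of the earlier covariance
D      = repmat(x,1,numel(x)) - repmat(x',numel(x),1);   % pairwise differences x_i - x_j
dSigma = K(x,x) .* D.^2 / ell^3;                         % dSigma/d ell for the squared exponential
dL     = 0.5*alpha'*dSigma*alpha - 0.5*trace(Ky\dSigma); % dL/d ell
% dL (together with the other partial derivatives) is what a gradient-based
% optimizer such as conjugate gradients would use.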

2-dimensional regression

The training data has unknown Gaussian noise and is shown in Figure 1. With an MLP network and Bayesian learning we needed 2500 samples; with Gaussian processes we needed only 350 samples to reach the "right" distribution. The CPU time needed to draw the 350 samples on a 2400 MHz Intel Pentium workstation was approximately 30 minutes.

References

Carl Edward Rasmussen: Gaussian Processes in Machine Learning. In Advanced Lectures on Machine Learning, Springer, 2004.
Carl Edward Rasmussen and Christopher K. I. Williams: Gaussian Processes for Machine Learning. MIT Press, 2006.