Efficient Gaussian Process Regression for Large Data Sets. ANJISHNU BANERJEE, DAVID DUNSON, SURYA TOKDAR. Biometrika, 2013.

Presentation transcript:

Efficient Gaussian Process Regression for Large Data Sets. ANJISHNU BANERJEE, DAVID DUNSON, SURYA TOKDAR. Biometrika, 2013

Introduction
– We have noisy observations y_i = f(x_i) + ε_i, ε_i ~ N(0, σ²), from the unknown function f, observed at locations x_1, …, x_n respectively.
– The unknown function f is assumed to be a realization of a Gaussian process (GP).
– The prediction for a new input x is E{f(x) | y} = k_{x,f}^T (K_{f,f} + σ² I)^{-1} y, where K_{f,f} is the n x n covariance matrix at the observed locations and k_{x,f} collects the covariances between x and the observed locations.
– Problem: O(n^3) cost in performing the necessary matrix inversions, with n denoting the number of data points.
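The sketch below (my own illustration, not the authors' code; se_kernel, gp_predict and all parameter values are assumed) makes the bottleneck concrete: exact GP prediction requires factorizing the n x n matrix K_{f,f} + σ²I, which is the O(n^3) step.

```python
import numpy as np

def se_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared exponential covariance: variance * exp(-|a - b|^2 / (2 lengthscale^2))."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_predict(X, y, X_new, noise_var=0.1):
    """Exact GP posterior mean: k_{x,f}^T (K_{f,f} + sigma^2 I)^{-1} y."""
    K_ff = se_kernel(X, X)                      # n x n covariance matrix
    K_xf = se_kernel(X_new, X)                  # covariances with the new inputs
    # The O(n^3) bottleneck: factorizing an n x n matrix.
    L = np.linalg.cholesky(K_ff + noise_var * np.eye(len(X)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return K_xf @ alpha

# Toy usage
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(200, 1))
y = np.sin(X[:, 0]) + 0.3 * rng.standard_normal(200)
print(gp_predict(X, y, np.array([[2.5], [7.0]])))
```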

Key idea
– Existing solutions: "knot" or "landmark" based approaches. A key difficulty is determining the location and spacing of the knots, with the choice having substantial impact.
– Methods have been proposed for allowing uncertain numbers and locations of knots in the predictive process using reversible jump. Unfortunately, such free-knot methods increase the computational burden substantially, partially eliminating the computational savings due to a low-rank method.
– Motivated by the literature on compressive sensing, the authors propose an alternative: a random projection of all the data points onto a lower-dimensional subspace.

Nystrom Approximation (Williams and Seeger, 2000)
– For a set of m landmark points X*, the n x n covariance matrix is approximated by the reduced-rank form K_{f,f} ≈ K_{f,*} K_{*,*}^{-1} K_{*,f}, where K_{*,*} is the m x m covariance among the landmarks and K_{f,*} is the n x m cross-covariance.

Landmark based Method
– Let X* = {x*_1, …, x*_m} be a set of m knots (landmarks), and let f* = f(X*) denote the process values at the knots.
– Defining f̃(x) = E{f(x) | f*} = k_{x,*}^T K_{*,*}^{-1} f*, we obtain an approximation to f (the predictive process), whose induced covariance is the Nystrom form K_{f,*} K_{*,*}^{-1} K_{*,f}.
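A sketch of the landmark (predictive process / subset of regressors) predictive mean under these definitions; the function name and the passed-in kernel callable (e.g. se_kernel above) are assumptions for illustration, not the authors' code. Only an m x m system is solved, giving O(n m^2) cost.

```python
import numpy as np

def landmark_gp_predict(X, y, X_new, X_star, kernel, noise_var=0.1, jitter=1e-8):
    """Predictive-process mean using m knots X_star; kernel is any covariance callable."""
    m = len(X_star)
    K_star = kernel(X_star, X_star) + jitter * np.eye(m)   # m x m covariance among knots
    K_fstar = kernel(X, X_star)                            # n x m cross-covariance
    K_xstar = kernel(X_new, X_star)                        # cross-covariance for new inputs
    # Only an m x m system is solved, so the cost is O(n m^2) instead of O(n^3):
    # mean(x) = k_{x,*} (sigma^2 K_{*,*} + K_{*,f} K_{f,*})^{-1} K_{*,f} y
    A = noise_var * K_star + K_fstar.T @ K_fstar
    return K_xstar @ np.linalg.solve(A, K_fstar.T @ y)
```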

Random projection method
– The key idea for random projection is to condition on Φf (m random linear combinations of the process values) instead of f* (values at m knots), where Φ is an m x n random projection matrix with m << n.
– Let f̃_Φ(x) = E{f(x) | Φf} = k_{x,f}^T Φ^T (Φ K_{f,f} Φ^T)^{-1} Φ f be the random projection approximation to f(x); the induced covariance is K_{f,f} Φ^T (Φ K_{f,f} Φ^T)^{-1} Φ K_{f,f}.
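A sketch of the corresponding predictive mean, obtained by substituting Φf for f* in the landmark formula; the function name and defaults are illustrative assumptions. Forming ΦK_{f,f} still touches the full n x n covariance (an O(n^2 m) multiplication), but the only matrix inversion is m x m.

```python
import numpy as np

def random_projection_gp_predict(X, y, X_new, kernel, m=50, noise_var=0.1, seed=0):
    """Random projection GP mean: condition on Phi @ f instead of values at knots."""
    n = len(X)
    rng = np.random.default_rng(seed)
    Phi = rng.standard_normal((m, n)) / np.sqrt(m)         # m x n random projection
    K = kernel(X, X)                                       # n x n covariance (no inversion)
    K_xf = kernel(X_new, X)
    PK = Phi @ K                                           # m x n, costs O(n^2 m)
    # Only an m x m system is solved:
    # mean(x) = k_{x,f} Phi^T (sigma^2 Phi K Phi^T + Phi K K Phi^T)^{-1} Phi K y
    A = noise_var * (PK @ Phi.T) + PK @ PK.T
    return (K_xf @ Phi.T) @ np.linalg.solve(A, PK @ y)
```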

Some properties
– When m = n and Φ has full rank, Φ K_{f,f} Φ^T is invertible and f̃_Φ = f, so we get back the original process with a full-rank random projection.
– Relation to Nystrom approximation: approximations in the machine learning literature were viewed as reduced-rank approximations to the covariance matrix. It is easy to see that taking Φ to be a selection matrix (rows of the identity picking out the knots) makes K_{f,f} Φ^T (Φ K_{f,f} Φ^T)^{-1} Φ K_{f,f} correspond to a Nystrom approximation to K_{f,f}.
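A quick numerical check of the last point (my own, not from the paper): with Φ built from rows of the identity, the projected covariance coincides with the Nystrom approximation.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 5.0, size=(60, 1))
K = np.exp(-0.5 * (X - X.T) ** 2)                 # squared exponential kernel, lengthscale 1
idx = rng.choice(60, size=10, replace=False)
Phi = np.eye(60)[idx]                             # selection matrix: rows of the identity

projected = K @ Phi.T @ np.linalg.solve(Phi @ K @ Phi.T, Phi @ K)
nystrom = K[:, idx] @ np.linalg.solve(K[np.ix_(idx, idx)], K[idx, :])
print(np.allclose(projected, nystrom))            # True: the two approximations coincide
```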

Choice of Φ

Low Distortion embeddings
– Embed the matrix K from an n-dimensional space into an m-dimensional one (m << n) using a random projection matrix Φ.
– Embeddings with low distortion properties have been well studied, and Johnson-Lindenstrauss (J-L) transforms are among the most popular: with high probability they nearly preserve pairwise distances after projection.
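Two standard J-L style constructions for Φ, sketched for illustration (the paper's specific choice may differ): dense Gaussian entries and an Achlioptas-style sparse variant.

```python
import numpy as np

def gaussian_jl(m, n, rng):
    """Dense J-L transform: i.i.d. N(0, 1/m) entries."""
    return rng.standard_normal((m, n)) / np.sqrt(m)

def sparse_jl(m, n, rng):
    """Achlioptas-style sparse transform: entries -1, 0, +1 with probabilities 1/6, 2/3, 1/6."""
    vals = rng.choice([-1.0, 0.0, 1.0], size=(m, n), p=[1 / 6, 2 / 3, 1 / 6])
    return np.sqrt(3.0 / m) * vals

# Either choice gives an m x n projection Phi with low-distortion guarantees.
Phi = gaussian_jl(50, 1000, np.random.default_rng(0))
```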

Given a fixed rank m, what is the near-optimal projection for that rank m?

Finding the range for a given target error condition (adaptively choosing the projection dimension m so that the approximation error meets a target tolerance)
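A sketch of an adaptive randomized range finder in the spirit of Halko, Martinsson and Tropp; the paper's exact stopping rule may differ, and the function name and defaults are assumptions for illustration. The basis grows in blocks until a probe-based estimate of the residual norm falls below the target tolerance.

```python
import numpy as np

def adaptive_range(K, tol, block=10, n_probes=5, max_rank=None, seed=0):
    """Grow an orthonormal basis Q until ||K - Q Q^T K|| is estimated to be below tol."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    max_rank = max_rank if max_rank is not None else n
    Q = np.zeros((n, 0))
    while Q.shape[1] < max_rank:
        Omega = rng.standard_normal((n, block))
        KO = K @ Omega
        Y = KO - Q @ (Q.T @ KO)                       # project out the current range
        Q = np.linalg.qr(np.hstack([Q, Y]))[0]        # re-orthonormalize the enlarged basis
        # Estimate the residual norm by probing with fresh random vectors.
        probes = K @ rng.standard_normal((n, n_probes))
        err = np.max(np.linalg.norm(probes - Q @ (Q.T @ probes), axis=0))
        if err < tol:
            break
    return Q                                          # K is approximated by Q (Q^T K)
```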

Results: Condition numbers
– The full covariance matrix for a smooth GP tracked at a dense set of locations will be ill-conditioned and nearly rank-deficient in practice.
– Its inverses may be highly unstable and can severely degrade inference.
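A short illustration of the conditioning point (my own, not the paper's experiment): the covariance matrix of a smooth GP on a dense grid is numerically near-singular, with an effective rank far below n.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 500)[:, None]
K = np.exp(-0.5 * (x - x.T) ** 2 / 0.2 ** 2)      # smooth GP on a dense grid, lengthscale 0.2
print(np.linalg.cond(K))                          # enormous condition number
eigvals = np.linalg.eigvalsh(K)
print(np.sum(eigvals > 1e-10 * eigvals.max()))    # effective rank is far below n = 500
```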

Results: Parameter estimation (toy data)

Results: Parameter estimation (real data)