Lecture 09: Gaussian Processes

Lecture 09: Gaussian Processes
CS489/698: Intro to ML
10/24/17, Yao-Liang Yu

Outline
- Gaussian Distribution
- Gaussian Process
- Gaussian Linear Regression
- Advanced

Announcement
Assignment 3 is due on Oct 31.

Gaussian distribution
A random vector X in R^d (d = dimension) has distribution N(μ, Σ), with mean μ ∈ R^d and covariance Σ ∈ R^{d×d} (symmetric, PSD), if its density is

  p(x) = (2π)^{-d/2} |Σ|^{-1/2} exp( -(1/2) (x - μ)^T Σ^{-1} (x - μ) )

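The density can be sanity-checked numerically. The sketch below (plain NumPy; the `gaussian_pdf` helper is ours, not from the lecture) evaluates the N(μ, Σ) density and checks it against the known 1-D standard normal value 1/sqrt(2π) at x = 0:

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Density of N(mu, Sigma) at x, for a d-dimensional Gaussian."""
    d = len(mu)
    diff = x - mu
    # (2*pi)^(-d/2) |Sigma|^(-1/2) exp(-0.5 * diff^T Sigma^{-1} diff)
    norm = (2 * np.pi) ** (-d / 2) * np.linalg.det(Sigma) ** (-0.5)
    quad = diff @ np.linalg.solve(Sigma, diff)
    return norm * np.exp(-0.5 * quad)

# 1-D standard normal: density at 0 should be 1/sqrt(2*pi)
p = gaussian_pdf(np.array([0.0]), np.array([0.0]), np.array([[1.0]]))
```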
Carl Friedrich Gauss (1777 – 1855)

Important facts
If (X, Y) is jointly Gaussian,

  (X, Y) ~ N( (μ_x, μ_y), [[A, C], [C^T, B]] ),

then the marginal and the conditional are Gaussian too:

  X ~ N(μ_x, A)                                              (marginal)
  Y | X = x ~ N( μ_y + C^T A^{-1} (x - μ_x), B - C^T A^{-1} C )   (conditional)

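These facts translate directly into code. A minimal sketch (NumPy; the block means and covariances are made-up illustrative numbers) conditions a 2-D joint Gaussian:

```python
import numpy as np

# Joint Gaussian over (x, y): means mu_x, mu_y and block covariance [[A, C], [C.T, B]]
mu_x, mu_y = np.array([0.0]), np.array([1.0])
A = np.array([[2.0]])
B = np.array([[1.0]])
C = np.array([[0.5]])

def condition(x_obs):
    """Parameters of y | x = x_obs for a jointly Gaussian (x, y)."""
    mean = mu_y + C.T @ np.linalg.solve(A, x_obs - mu_x)
    cov = B - C.T @ np.linalg.solve(A, C)
    return mean, cov

# Observing x = 2 shifts the mean of y and shrinks its variance
m, S = condition(np.array([2.0]))
```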
Derivation

Outline
- Gaussian Distributions
- Gaussian Processes
- Gaussian Linear Regression
- Advanced

What is a random variable?
A random variable is a function Z(ω): it maps each outcome ω in the sample space to a real number.

Gaussian process
A collection of random variables {Z_t : t ∈ T} such that for every finite subset N ⊆ T, {Z_t : t ∈ N} is jointly Gaussian.
- A Gaussian process is a function of two variables, Z(t, ω)
- For any finite N ⊆ T, {Z_t := Z(t, ·) : t ∈ N} is a Gaussian random vector
- For any fixed ω, Z_ω := Z(·, ω) : T → R is a function of one variable t (a sample path)
Does a Gaussian process exist?

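The finite-dimensional part of the definition can be simulated directly: pick a finite index set N, form the covariance matrix, and draw a jointly Gaussian vector, which is one sample path restricted to N. A sketch assuming an RBF covariance function (the choice of kernel and grid is illustrative):

```python
import numpy as np

def rbf(s, t, ell=1.0):
    """Squared-exponential (RBF) covariance function with lengthscale ell."""
    return np.exp(-(s - t) ** 2 / (2 * ell ** 2))

rng = np.random.default_rng(0)
ts = np.linspace(0, 1, 50)                       # a finite index set N
K = rbf(ts[:, None], ts[None, :])                # covariance matrix over N
L = np.linalg.cholesky(K + 1e-8 * np.eye(50))    # small jitter for stability
path = L @ rng.standard_normal(50)               # one sample path on N
```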
Example

Mean and covariance function
For each t, Z(t, ω) is a Gaussian random variable, hence has a mean. For each s and t, the covariance between Z_s and Z_t is defined. Say

  m(t) := E[Z_t]                                            (mean function)
  κ(s, t) := Cov(Z_s, Z_t) = E[(Z_s - m(s))(Z_t - m(t))]    (covariance function)

Recap: verifying a kernel
For any n and any x_1, x_2, …, x_n, the kernel matrix K with K_ij = κ(x_i, x_j) is symmetric and positive semidefinite:
- Symmetric: K_ij = K_ji
- Positive semidefinite (PSD): α^T K α ≥ 0 for all α ∈ R^n

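Both properties are easy to check numerically. A quick sketch, using a `kernel_matrix` helper of our own and an RBF kernel (symmetry is checked directly; PSD is checked via the smallest eigenvalue, allowing for floating-point round-off):

```python
import numpy as np

def kernel_matrix(kappa, xs):
    """Gram matrix K with K[i, j] = kappa(x_i, x_j)."""
    return np.array([[kappa(a, b) for b in xs] for a in xs])

kappa = lambda a, b: np.exp(-(a - b) ** 2)       # RBF kernel
K = kernel_matrix(kappa, np.linspace(-1, 1, 6))

sym = np.allclose(K, K.T)                        # symmetric?
min_eig = np.linalg.eigvalsh(K).min()            # PSD iff all eigenvalues >= 0
```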
What is a covariance function?
For t_1, t_2, …, t_n, by definition K_ij = κ(t_i, t_j) is the covariance between Z_{t_i} and Z_{t_j}, so K is symmetric and PSD. Thus, the covariance function κ is a kernel!

Conversely
Theorem. Given any mean function m : T → R and any kernel function κ : T × T → R, there exists a Gaussian process with mean function m and covariance function κ.
This may not hold for other distributions!

Andrey Kolmogorov (1903 – 1987)
Foundations of the Theory of Probability

Effect of kernel

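One way to see the kernel's effect is to compare sample-path roughness under different RBF lengthscales: a shorter lengthscale decorrelates nearby points and gives wigglier paths. A sketch (the two lengthscales are arbitrary illustration choices):

```python
import numpy as np

def rbf(s, t, ell):
    return np.exp(-(s - t) ** 2 / (2 * ell ** 2))

rng = np.random.default_rng(0)
ts = np.linspace(0, 1, 50)

def sample_path(ell):
    """Draw one GP sample path on ts under an RBF kernel with lengthscale ell."""
    K = rbf(ts[:, None], ts[None, :], ell)
    L = np.linalg.cholesky(K + 1e-8 * np.eye(len(ts)))
    return L @ rng.standard_normal(len(ts))

# Mean squared increment as a crude roughness measure
rough_short = np.mean(np.diff(sample_path(0.1)) ** 2)   # wiggly path
rough_long = np.mean(np.diff(sample_path(2.0)) ** 2)    # smooth path
```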
Example: Linear Regression
  Y(x) = <w, x> + ε(x),
where w is unknown but deterministic, and the noise ε(x) ~ N(0, σ²) is independent of each other for different x.
Y is a Gaussian process (indexed by x), with mean function <w, x> and covariance function σ²·1[x = x'].

Maximum Likelihood
Having observed (x_1, y_1), …, (x_n, y_n), we need to estimate w. Choose the w that explains the observations best, i.e., maximize the i.i.d. likelihood of the data:

  p(y_1, …, y_n | x_1, …, x_n, w) = ∏_i N(y_i; <w, x_i>, σ²)

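For Gaussian noise, maximizing this likelihood is the same as minimizing squared error, so the MLE is the least-squares solution. A sketch on synthetic data (the true weights and noise level are made up for the illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.standard_normal(100)  # Gaussian observation noise

# Maximizing the Gaussian likelihood = minimizing squared error:
# w_mle solves the normal equations (X^T X) w = X^T y
w_mle = np.linalg.solve(X.T @ X, X.T @ y)
```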
Example: Bayesian Linear Regression
  Y(x) = <w, x> + ε(x),
where now w is random, and the noise ε(x) ~ N(0, σ²) is independent of w and independent of each other for different x.
Y is a Gaussian process, e.g. with w ~ N(0, I): mean function 0 and covariance function <x, x'> + σ²·1[x = x'].

Maximum A Posteriori
  posterior = (likelihood of data × prior) / normalization
Different priors on w lead to different regularizations:
- w ~ N(0, I) is equivalent to ridge regression
- w ~ Lap(0, I) is equivalent to Lasso

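The ridge equivalence can be checked directly: the MAP estimate under w ~ N(0, τ²I) with noise variance σ² solves a ridge problem with λ = σ²/τ². A sketch (the data and the two variances are illustrative; the gradient of the ridge objective should vanish at the MAP solution):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 3))
y = X @ np.array([1.0, -1.0, 0.5]) + 0.1 * rng.standard_normal(50)

sigma2, tau2 = 0.01, 1.0                         # noise and prior variances
lam = sigma2 / tau2                              # effective ridge penalty

# MAP under w ~ N(0, tau2 * I): ridge regression in closed form
w_map = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

# Gradient of the ridge objective; ~0 at the optimum
grad = X.T @ (X @ w_map - y) + lam * w_map
```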
Outline
- Gaussian Distributions
- Gaussian Process
- Gaussian Linear Regression
- Advanced

Abstract View
Let {Z_t : t ∈ T} be a Gaussian process with mean function m and covariance function κ. Having observed Z_{t_1} = z_1, …, Z_{t_n} = z_n, we need to predict Z_{t*} at a new index t*.

Familiar view
  Z_t = <w, φ(t)>, with w ~ N(0, I) and φ a feature map of the kernel κ.
This weight-space view is equivalent in finite dimensions, and incorrect but "intuitive" in infinite dimensions.

Back to abstract
(Z_{t_1}, …, Z_{t_n}, Z_{t*}) is jointly Gaussian. What is the conditional distribution of Z_{t*} given Z_{t_1} = z_1, …, Z_{t_n} = z_n?

Recap: Important facts
If (X, Y) ~ N( (μ_x, μ_y), [[A, C], [C^T, B]] ) jointly, then
  X ~ N(μ_x, A)                                              (marginal)
  Y | X = x ~ N( μ_y + C^T A^{-1} (x - μ_x), B - C^T A^{-1} C )   (conditional)

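Putting the pieces together, prediction in a Gaussian process is just Gaussian conditioning, with the kernel supplying all the covariances. A minimal sketch, assuming an RBF kernel, a zero mean function, a tiny observation-noise jitter, and a few made-up observations:

```python
import numpy as np

def kappa(s, t):
    return np.exp(-(s - t) ** 2 / 2)             # RBF kernel

ts = np.array([0.0, 1.0, 2.0])                   # observed indices t_1..t_n
zs = np.sin(ts)                                  # observed values z_1..z_n
t_star = 1.5                                     # new index to predict at
sigma2 = 1e-6                                    # tiny noise for stability

K = kappa(ts[:, None], ts[None, :]) + sigma2 * np.eye(3)
k_star = kappa(ts, t_star)

# Gaussian conditioning: Z_{t*} | observations is Gaussian with
mean_star = k_star @ np.linalg.solve(K, zs)
var_star = kappa(t_star, t_star) - k_star @ np.linalg.solve(K, k_star)
```

Conditioning can only reduce uncertainty: the posterior variance is below the prior variance κ(t*, t*) = 1.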
Recap: Example

Application
t = (position, velocity, acceleration), Z = torque

Outline
- Gaussian Distributions
- Gaussian Process
- Gaussian Linear Regression
- Advanced

Gaussian process classification
Binary label Y generated from Z_t, where Z_t is a Gaussian process (real-valued). Given observations (t_1, y_1), …, (t_n, y_n), we need to predict y* at a new index t*: integrate out the latent variable,

  p(y* | data) = ∫ p(y* | z*) p(z* | data) dz*

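This integral over the latent variable usually has no closed form. A simple Monte Carlo sketch, assuming a sigmoid likelihood P(Y = 1 | z) = 1/(1 + e^{-z}) and a Gaussian latent posterior at the test point (the posterior mean and variance below are hypothetical numbers, standing in for the output of GP regression):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical latent posterior at the test point, Z_{t*} | data ~ N(mu, var)
mu_star, var_star = 0.8, 0.5

# Monte Carlo estimate of P(Y = 1 | data) = E[ sigmoid(Z_{t*}) | data ]
z = rng.normal(mu_star, np.sqrt(var_star), size=100_000)
p_plus = np.mean(1.0 / (1.0 + np.exp(-z)))
```

A positive latent mean pushes the predicted probability above 1/2, but averaging the sigmoid over the latent uncertainty pulls it back toward 1/2 compared to plugging in the mean directly.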
Questions?