Lecture 09: Gaussian Processes CS489/698: Intro to ML Lecture 09: Gaussian Processes 10/24/17 Yao-Liang Yu
Outline Gaussian Distribution Gaussian Process Gaussian Linear Regression Advanced 10/24/17 Yao-Liang Yu
Announcement Assignment 3 due on Oct 31. 10/24/17 Yao-Liang Yu
Gaussian distribution covariance, PSD mean dimension 10/24/17 Yao-Liang Yu
Carl Friedrich Gauss (1777 – 1855) 10/24/17 Yao-Liang Yu
Important facts marginal joint conditional 10/24/17 Yao-Liang Yu
Derivation 10/24/17 Yao-Liang Yu
Outline Gaussian Distributions Gaussian Processes Gaussian Linear Regression Advanced 10/24/17 Yao-Liang Yu
What is a random variable? A random variable is a function Z(ω) ω Z(ω) 10/24/17 Yao-Liang Yu
Gaussian process A collection of Gaussian random variables {Zt : t in T} such that for any finite N, {Zt : t in N} is jointly Gaussian A Gaussian process is a function of two variables Z(t,ω) For any finite N, {Zt := Z(t,ω) | t in N} is a Gaussian random vector For any ω, Zω := Z(t, ω) : T R is a function of one variable t (sample path) Does Gaussian process exist? 10/24/17 Yao-Liang Yu
Example 10/24/17 Yao-Liang Yu
Mean and covariance function For each t, Z(t,ω) is a Gaussian random variable hence has mean For each s and t, the covariance between Zs and Zt: Say mean function covariance function 10/24/17 Yao-Liang Yu
Recap: verifying a kernel For any n, for any x1, x2, …, xn, the kernel matrix K with is symmetric and positive semidefinite ( ) Symmetric: Kij = Kji Positive semidefinite (PSD): for all 10/24/17 Yao-Liang Yu
What is a covariance function? For t1, t2, …, tn, by definition is the covariance between and K is symmetric and PSD Thus, the covariance functionκis a kernel ! 10/24/17 Yao-Liang Yu
Conversely Theorem. Given any function and kernel function , exist This may not hold for other distributions !!! 10/24/17 Yao-Liang Yu
Andrey Kolmogorov (1903 - 1987) Foundations of the theory probability 10/24/17 Yao-Liang Yu
Effect of kernel 10/24/17 Yao-Liang Yu
Example: Linear Regression unknown but deterministic independent of each other for different x Z t Y is a Gaussian process 10/24/17 Yao-Liang Yu
Maximum Likelihood Having observed Need to estimate w Choose w that explains the observations best i.i.d. likelihood of data 10/24/17 Yao-Liang Yu
Example: Bayesian Linear Regression independent of w independent of each other for different x Z t Y is a Gaussian process 10/24/17 Yao-Liang Yu
Maximum A Posteriori likelihood of data prior posterior normalization Different prior on w leads to different regularization w ~ N(0, I) is equivalent as ridge regression w ~ Lap(0, I) is equivalent as Lasso 10/24/17 Yao-Liang Yu
Outline Gaussian Distributions Gaussian Process Gaussian Linear Regression Advanced 10/24/17 Yao-Liang Yu
Abstract View Let be a Gaussian process Having observed Need to predict 10/24/17 Yao-Liang Yu
Familiar view Equivalent in finite dimensions Feature map of k Equivalent in finite dimensions Incorrect but “intuitive” in infinite dimensions 10/24/17 Yao-Liang Yu
Back to abstract is jointly Gaussian What is the distribution of ? 10/24/17 Yao-Liang Yu
Recap: Important facts marginal joint conditional 10/24/17 Yao-Liang Yu
Recap: Example 10/24/17 Yao-Liang Yu
Application t = (position, velocity, acceleration) Z = torque 10/24/17 Yao-Liang Yu
Outline Gaussian Distributions Gaussian Process Gaussian Linear Regression Advanced 10/24/17 Yao-Liang Yu
Gaussian process classification Binary label Y generated by Zt is a Gaussian process (real-valued) Given observations Need to predict integrate out latent variable 10/24/17 Yao-Liang Yu
Questions? 10/24/17 Yao-Liang Yu