Introduction to Gaussian Processes
CS 478 / CS 778
Chris Tensmeyer
What Topic?
Machine Learning
Regression
Bayesian ML
Bayesian Regression
Bayesian Non-parametric
Gaussian Process (GP)
GP – Regression
GP – Regression with Squared Exponential Kernel
Regression
Paper Titles
Gaussian process classification for segmenting and annotating sequences
Discovering hidden features with Gaussian processes regression
Active learning with Gaussian processes for object categorization
Semi-supervised learning via Gaussian processes
Discriminative Gaussian process latent variable model for classification
Computation with infinite neural networks
Fast sparse Gaussian process methods: The informative vector machine
Approximate Dynamic Programming with Gaussian Processes
Paper Titles
Learning to control an octopus arm with Gaussian process temporal difference methods
Gaussian Processes and Reinforcement Learning for Identification and Control of an Autonomous Blimp
Gaussian process latent variable models for visualization of high dimensional data
Bayesian unsupervised signal classification by Dirichlet process mixtures of Gaussian processes
Bayesian ML
Running Example
Prior (General / Example)
Data Likelihood (General / Example)
Posterior (General / Example)
Posterior Predictive (General / Example)
Other Regression Methods
Other Regression Methods
Other Regression Methods
Gaussian Process
GP = a stochastic process in which any finite collection of the random variables is jointly Gaussian distributed
(Figure: at x = 1, the distribution of f(1))
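In symbols, a standard way to write the definition above (the mean function is often taken to be zero, as in the regression formulas later in these slides):

```latex
% f is a Gaussian process with mean function m and covariance function k
f(x) \sim \mathcal{GP}\big(m(x),\, k(x, x')\big)

% meaning: for ANY finite set of inputs x_1, \dots, x_n,
\begin{bmatrix} f(x_1) \\ \vdots \\ f(x_n) \end{bmatrix}
\sim \mathcal{N}(\boldsymbol{\mu}, K),
\qquad \mu_i = m(x_i), \quad K_{ij} = k(x_i, x_j)
```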
Gaussian Process - MVN
Gaussian Process - Background
First theorized in the 1940s
Used in geostatistics and meteorology in the 1970s for time series prediction, under the name kriging
Smooth Functions
Dist. Over Functions
Covariance Function
Covariance Functions
Covariance Functions
Hyperparameters
Length-Scale
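As a concrete reference for the squared exponential covariance function and its length-scale hyperparameter, here is a minimal NumPy sketch; the specific hyperparameter values are illustrative assumptions, not the ones used on these slides:

```python
import numpy as np

def squared_exponential_kernel(X1, X2, length_scale=1.0, signal_var=1.0):
    """k(x, x') = signal_var * exp(-||x - x'||^2 / (2 * length_scale^2)).

    X1: array of shape (n1, d), X2: array of shape (n2, d).
    """
    # Pairwise squared Euclidean distances, shape (n1, n2)
    sq_dists = (np.sum(X1**2, axis=1)[:, None]
                + np.sum(X2**2, axis=1)[None, :]
                - 2.0 * X1 @ X2.T)
    return signal_var * np.exp(-0.5 * sq_dists / length_scale**2)

# The length-scale controls how quickly correlation decays with distance:
# a larger length-scale gives smoother, slower-varying sample functions.
X = np.linspace(0.0, 5.0, 6).reshape(-1, 1)
K_wiggly = squared_exponential_kernel(X, X, length_scale=0.5)
K_smooth = squared_exponential_kernel(X, X, length_scale=2.0)
```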
Inference/Prediction
The joint distribution of the training targets and a test target is MVN
We condition the MVN on the observed training targets
The resulting conditional distribution is Gaussian
Use the distribution's mean/covariance for prediction
Magic Math
From Wikipedia
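For reference, the multivariate-normal conditioning identities this slide refers to, written here for the noise-free GP regression case (these are the standard results given in Wikipedia's articles and in Rasmussen & Williams):

```latex
% Joint prior over training targets f and test targets f_*
\begin{bmatrix} \mathbf{f} \\ \mathbf{f}_* \end{bmatrix}
\sim \mathcal{N}\!\left( \mathbf{0},
\begin{bmatrix} K(X, X) & K(X, X_*) \\ K(X_*, X) & K(X_*, X_*) \end{bmatrix} \right)

% Conditioning on the observed training targets gives another Gaussian
\mathbf{f}_* \mid X_*, X, \mathbf{f} \sim \mathcal{N}(\boldsymbol{\mu}_*, \Sigma_*)

\boldsymbol{\mu}_* = K(X_*, X)\, K(X, X)^{-1}\, \mathbf{f}

\Sigma_* = K(X_*, X_*) - K(X_*, X)\, K(X, X)^{-1}\, K(X, X_*)
```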
Discussion
PROS
- Learned function is smooth but not constrained to a fixed parametric form
- Confidence intervals
- Good prior for other models (e.g., bias N-degree polynomials to be smooth)
- Can predict at multiple test points that co-vary appropriately
CONS
- Math/optimization is messy; lots of approximations are used
- Requires inverting an NxN matrix
- Numerical stability issues
- Scalability issues in the number of instances (like SVMs)
- High-dimensional data is challenging
Implementation
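A minimal NumPy sketch of the prediction recipe above and of the worked example on the following slides (build K, invert it, multiply by the targets, then use k* for each test point); the toy data, length-scale, and noise level here are illustrative assumptions, not the numbers from the original slides:

```python
import numpy as np

def sq_exp(X1, X2, length_scale=1.0):
    """Squared exponential covariance for 1-D inputs: exp(-(x - x')^2 / (2 l^2))."""
    d = X1.reshape(-1, 1) - X2.reshape(1, -1)
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_predict(X_train, y_train, X_test, length_scale=1.0, noise_var=0.0):
    # K: covariance among training inputs (plus optional observation noise on the diagonal)
    K = sq_exp(X_train, X_train, length_scale) + noise_var * np.eye(len(X_train))
    K_s = sq_exp(X_test, X_train, length_scale)    # k*: test-vs-train covariances
    K_ss = sq_exp(X_test, X_test, length_scale)    # k**: test-vs-test covariances
    K_inv = np.linalg.inv(K)   # the slides invert K directly; a Cholesky solve is safer in practice
    mean = K_s @ K_inv @ y_train                   # predictive mean
    cov = K_ss - K_s @ K_inv @ K_s.T               # predictive covariance
    return mean, cov

# Assumed toy 1-D data (noise-free observations of an unknown function)
X_train = np.array([1.0, 3.0, 5.0])
y_train = np.array([0.8, 0.1, -0.9])
mean, cov = gp_predict(X_train, y_train, np.array([2.0, 4.0]))
print(mean, np.diag(cov))
```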
Example (small table of training inputs X and targets f(X), with two query points marked ?)
Example – Covariance Matrix
Example – Matrix Inversion (next step: invert K)
Example – Multiply by Targets
Example – Predicting f(2) (now need k* for f(2); k** = 1 since there is no observation noise)
Example – Predicting f(2)
Example – Predicting f(4)
Example – Plot of X vs. f(X)
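The same steps written out explicitly, one per slide above; the training pairs here are made up for illustration, since the original example's numbers are not recoverable from this text:

```python
import numpy as np

# Assumed toy training set (not the original slide's values)
X = np.array([1.0, 3.0, 5.0])
f = np.array([0.8, 0.1, -0.9])

def k(a, b, length_scale=1.0):
    return np.exp(-0.5 * ((a - b) / length_scale) ** 2)

# Step 1: covariance matrix K over the training inputs (no observation noise)
K = k(X.reshape(-1, 1), X.reshape(1, -1))
# Step 2: invert K
K_inv = np.linalg.inv(K)
# Step 3: multiply by the targets (this vector is reused for every test point)
alpha = K_inv @ f
# Step 4: predict f(2); k* is the covariance of x*=2 with each training input, k** = 1
k_star = k(2.0, X)
mean_2 = k_star @ alpha
var_2 = 1.0 - k_star @ K_inv @ k_star
# Step 5: predict f(4) the same way
k_star_4 = k(4.0, X)
mean_4 = k_star_4 @ alpha
var_4 = 1.0 - k_star_4 @ K_inv @ k_star_4
print(mean_2, var_2, mean_4, var_4)
```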
My Experiments
Results - Linear
  Model   MSE
  GP      0.009
  Lasso   0.019
  KNN     0.058
Results - Cubic
  Model   MSE
  GP      0.046
  Lasso   0.037
  KNN     0.101
Results - Periodic
  Model   MSE
  GP      0.087
  Lasso   1.444
  KNN     0.350
Results - Sigmoid
  Model   MSE
  GP      0.034
  Lasso   0.045
  KNN     0.060
Results - Complex
  Model   MSE
  GP      0.194
  Lasso   0.574
  KNN     0.327
Real Dataset
Concrete Slump Test: how fast does concrete flow given its materials?
103 instances; 7 input features; 3 outputs; all real-valued
Input features are kg per cubic meter of concrete: Cement, Slag, Fly Ash, Water, SP, Coarse Agg, Fine Agg
Output variables are Slump, Flow, and 28-day Compressive Strength
Ran 10-fold CV with our three models on each output variable (sketched below)
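A sketch of how such a comparison could be run with scikit-learn; the file name, kernel choice, and model hyperparameters are assumptions for illustration, not the exact setup behind the reported numbers:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: the UCI Concrete Slump Test data saved locally as a CSV with the
# 7 input features followed by the 3 output columns (Slump, Flow, Strength).
data = np.loadtxt("slump_test.csv", delimiter=",", skiprows=1)
X, Y = data[:, :7], data[:, 7:]

models = {
    "GP": make_pipeline(StandardScaler(),
                        GaussianProcessRegressor(kernel=RBF() + WhiteKernel())),
    "Lasso": make_pipeline(StandardScaler(), Lasso(alpha=0.1)),
    "KNN": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)),
}

for out_idx, out_name in enumerate(["Slump", "Flow", "Strength"]):
    for name, model in models.items():
        scores = cross_val_score(model, X, Y[:, out_idx], cv=10,
                                 scoring="neg_mean_squared_error")
        print(f"{out_name:8s} {name:5s} MSE = {-scores.mean():.3f} +- {scores.std():.3f}")
```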
Concrete Results
(Tables of Avg MSE +- Std for GP, Lasso, and KNN on each of Slump, Flow, and Strength)
Learning Hyperparameters
Find hyperparameters $\theta$ to maximize the (log) probability of the data
Can use gradient descent (on the negative log likelihood) by computing, for each hyperparameter $\theta_j$:
$\frac{\partial}{\partial \theta_j} \log p(\mathbf{y} \mid X, \theta) = \frac{1}{2}\, \mathbf{y}^\top K^{-1} \frac{\partial K}{\partial \theta_j} K^{-1} \mathbf{y} - \frac{1}{2}\, \mathrm{tr}\!\left(K^{-1} \frac{\partial K}{\partial \theta_j}\right)$
$\frac{\partial K}{\partial \theta_j}$ is just the elementwise derivative of the covariance matrix, i.e., the derivative of the covariance function evaluated at each pair of inputs
Learning Hyperparameters
For the squared exponential covariance function in this common parameterization (signal variance $\sigma_f$, length-scale $\ell$, noise $\sigma_n$):
$k(x_p, x_q) = \sigma_f^2 \exp\!\left(-\frac{(x_p - x_q)^2}{2\ell^2}\right) + \sigma_n^2\, \delta_{pq}$
We get the following derivatives:
$\frac{\partial k}{\partial \sigma_f} = 2\sigma_f \exp\!\left(-\frac{(x_p - x_q)^2}{2\ell^2}\right)$
$\frac{\partial k}{\partial \ell} = \sigma_f^2 \exp\!\left(-\frac{(x_p - x_q)^2}{2\ell^2}\right) \frac{(x_p - x_q)^2}{\ell^3}$
$\frac{\partial k}{\partial \sigma_n} = 2\sigma_n\, \delta_{pq}$
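A small NumPy sketch of these gradient computations for a zero-mean GP with the squared exponential kernel in the parameterization above; the toy data is assumed, and in practice the gradients would be handed to an optimizer:

```python
import numpy as np

def log_marginal_likelihood_and_grad(X, y, sigma_f, length_scale, sigma_n):
    """Log marginal likelihood of a zero-mean GP with squared exponential kernel,
    plus its gradient w.r.t. (sigma_f, length_scale, sigma_n)."""
    n = len(X)
    d2 = (X.reshape(-1, 1) - X.reshape(1, -1)) ** 2      # pairwise squared distances
    E = np.exp(-0.5 * d2 / length_scale**2)
    K = sigma_f**2 * E + sigma_n**2 * np.eye(n)
    K_inv = np.linalg.inv(K)
    alpha = K_inv @ y

    lml = -0.5 * y @ alpha - 0.5 * np.linalg.slogdet(K)[1] - 0.5 * n * np.log(2 * np.pi)

    # dK/dtheta for each hyperparameter (elementwise derivative of the covariance)
    dK = {
        "sigma_f": 2 * sigma_f * E,
        "length_scale": sigma_f**2 * E * d2 / length_scale**3,
        "sigma_n": 2 * sigma_n * np.eye(n),
    }
    # d/dtheta log p(y|X) = 0.5 y^T K^-1 dK K^-1 y - 0.5 tr(K^-1 dK)
    grad = {name: 0.5 * alpha @ dKj @ alpha - 0.5 * np.trace(K_inv @ dKj)
            for name, dKj in dK.items()}
    return lml, grad

# Assumed toy data
X = np.linspace(0.0, 5.0, 8)
y = np.sin(X)
print(log_marginal_likelihood_and_grad(X, y, sigma_f=1.0, length_scale=1.0, sigma_n=0.1))
```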
Covariance Function Tradeoffs
Classification
Many regression methods become binary classifiers by passing their output through a sigmoid, softmax, or other threshold (e.g., MLP, SVM, Logistic Regression)
The same holds for GPs, but with scarier math
Classification Math
$p(y_* = 1 \mid X, \mathbf{y}, x_*) = \int \sigma(f_*)\, p(f_* \mid X, \mathbf{y}, x_*)\, df_*$
$\sigma(f_*)$ is the vote for class 1; $p(f_* \mid X, \mathbf{y}, x_*)$ is the weight for that vote
The latent posterior combines the GP prior with a non-Gaussian likelihood (normalizer ignored): $p(\mathbf{f} \mid X, \mathbf{y}) \propto p(\mathbf{y} \mid \mathbf{f})\, p(\mathbf{f} \mid X)$
Unlike GP regression, this is intractable and must be approximated
Bells and Whistles
Multiple regression – assume correlation between multiple outputs
Correlated noise – especially useful for time series
Training set values + derivatives (e.g., a max or min occurs at...)
Noise in the x-direction
Mixture (ensemble) of local GP experts
Integral approximation with GPs
Covariance over structured inputs (strings, trees, etc.)
Bibliography
Wikipedia: Gaussian Process; Multivariate Normal Distribution
Scikit-Learn documentation
C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006, ISBN 0-262-18253-X. (c) 2006 Massachusetts Institute of Technology.
S. Marsland, Machine Learning: An Algorithmic Perspective, 2nd Edition, CRC Press, 2015.