Introduction to Gaussian Process
CS 778, Chris Tensmeyer
What Topic?
Machine Learning → Regression → Bayesian ML → Bayesian Regression → Bayesian Non-parametrics → Gaussian Processes (GP) → GP Regression → GP Regression with a Squared Exponential Kernel
Regression
Paper Titles
Gaussian process classification for segmenting and annotating sequences
Discovering hidden features with Gaussian processes regression
Active learning with Gaussian processes for object categorization
Semi-supervised learning via Gaussian processes
Discriminative Gaussian process latent variable model for classification
Computation with infinite neural networks
Fast sparse Gaussian process methods: The informative vector machine
Approximate dynamic programming with Gaussian processes
Paper Titles
Learning to control an octopus arm with Gaussian process temporal difference methods
Gaussian processes and reinforcement learning for identification and control of an autonomous blimp
Gaussian process latent variable models for visualization of high-dimensional data
Bayesian unsupervised signal classification by Dirichlet process mixtures of Gaussian processes
Bayesian ML
Running Example
Prior
Data Likelihood
Posterior
Posterior Predictive
Other Regression Methods
Gaussian Process
GP = a stochastic process whose random variables are (jointly) Gaussian distributed
E.g., at x = 1, f(1) has a Gaussian distribution
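In symbols (the standard definition, matching the MVN view on the next slide): a GP with mean function m and covariance function k is exactly the statement that every finite restriction of f is multivariate normal,

$$
f \sim \mathcal{GP}(m, k)
\iff
\bigl(f(x_1), \ldots, f(x_n)\bigr)^\top \sim \mathcal{N}(\boldsymbol{\mu}, K),
\qquad \mu_i = m(x_i), \quad K_{ij} = k(x_i, x_j)
$$

for any finite set of inputs $x_1, \ldots, x_n$.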
Gaussian Process – MVN
Gaussian Process – Background
First theorized in the 1940s
Used in geostatistics and meteorology in the 1970s for time-series prediction, under the name "kriging"
Smooth Functions
Distribution Over Functions
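A minimal sketch of drawing "functions" from a GP prior, assuming a zero mean and a squared exponential kernel (the se_kernel name and hyperparameter defaults are mine, for illustration):

```python
import numpy as np

def se_kernel(x1, x2, length_scale=1.0, signal_var=1.0):
    # k(x, x') = sigma_f^2 * exp(-(x - x')^2 / (2 * l^2))
    sqdist = (x1[:, None] - x2[None, :]) ** 2
    return signal_var * np.exp(-sqdist / (2.0 * length_scale ** 2))

# Draw sample functions from the zero-mean GP prior on a dense grid.
xs = np.linspace(-5, 5, 200)
K = se_kernel(xs, xs) + 1e-8 * np.eye(len(xs))  # jitter for numerical stability
samples = np.random.multivariate_normal(np.zeros(len(xs)), K, size=3)
# Each row of `samples` is one smooth function drawn from the prior.
```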
Covariance Function
Covariance Functions
Hyperparameters: Length-Scale
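To see what the length-scale does, a quick sketch (unit signal variance assumed): the correlation between f(0) and f(1) under the squared exponential kernel.

```python
import numpy as np

# Correlation between f(0) and f(1) for several length-scales:
for ell in (0.3, 1.0, 3.0):
    corr = np.exp(-1.0 / (2.0 * ell ** 2))  # k(0,1) with unit signal variance
    print(f"length-scale {ell}: corr(f(0), f(1)) = {corr:.3f}")
# Small length-scales give wiggly sample functions;
# large ones give slowly varying, nearly linear samples.
```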
Inference/Prediction
The joint distribution of the training targets and the test target is MVN
We condition the MVN on the observed training targets
The resulting conditional distribution is Gaussian
Use that distribution's mean/covariance for prediction
Magic Math (from Wikipedia)
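The "magic" is the standard conditioning identity for the multivariate normal (as on Wikipedia's MVN page). For a zero-mean GP with noise-free training targets f, cross-covariances k*, and test variance k**:

$$
\begin{bmatrix} \mathbf{f} \\ f_* \end{bmatrix}
\sim \mathcal{N}\!\left(\mathbf{0},
\begin{bmatrix} K & \mathbf{k}_* \\ \mathbf{k}_*^\top & k_{**} \end{bmatrix}\right)
\;\Longrightarrow\;
f_* \mid \mathbf{f} \sim \mathcal{N}\!\left(\mathbf{k}_*^\top K^{-1}\mathbf{f},\;
k_{**} - \mathbf{k}_*^\top K^{-1}\mathbf{k}_*\right)
$$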
Discussion
Pros:
- Learned function is smooth but not constrained to a fixed parametric form
- Gives confidence intervals
- Good prior for other models (e.g., biasing n-degree polynomials to be smooth)
- Can predict at multiple test points that co-vary appropriately
Cons:
- The math/optimization is messy; lots of approximations in practice
- Requires an NxN matrix inversion, with numerical stability issues
- Scalability issues in the number of instances, as with SVMs
- High-dimensional data is challenging
Implementation
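A minimal sketch of GP regression with the squared exponential kernel (the kernel choice and default hyperparameters here are assumptions; a Cholesky solve replaces the explicit inverse for numerical stability):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def se_kernel(x1, x2, length_scale=1.0, signal_var=1.0):
    # k(x, x') = sigma_f^2 * exp(-(x - x')^2 / (2 * l^2))
    sqdist = (x1[:, None] - x2[None, :]) ** 2
    return signal_var * np.exp(-sqdist / (2.0 * length_scale ** 2))

def gp_predict(X_train, y_train, X_test, noise_var=1e-8):
    # Posterior mean/variance at X_test, conditioning on (X_train, y_train).
    K = se_kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
    k_star = se_kernel(X_train, X_test)              # cross-covariances k*
    k_ss = se_kernel(X_test, X_test)                 # test covariances k**
    chol = cho_factor(K)                             # Cholesky beats explicit inv(K)
    alpha = cho_solve(chol, y_train)                 # K^{-1} y
    mean = k_star.T @ alpha                          # k*^T K^{-1} y
    cov = k_ss - k_star.T @ cho_solve(chol, k_star)  # k** - k*^T K^{-1} k*
    return mean, np.diag(cov)
```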
Example
X    f(X)
0    -5
1     0
2     ?
3     5
4     ?
Example – Covariance Matrix (evaluate k at every pair of training inputs {0, 1, 3} to build K)
Example – Matrix Inversion (next step: invert K)
Example – Multiply by Targets (compute K⁻¹y)
Example – Predicting f(2): now we need k* for x = 2; k** = 1 (no observation noise)
Example – Predicting f(4)
Example – Plot
X    f(X)
0    -5
1     0
2     2.8
3     5
4     2.3
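A sketch that re-runs this example with the gp_predict helper above. The slides do not show the exact kernel hyperparameters they used, so whether the output matches f(2) ≈ 2.8 and f(4) ≈ 2.3 depends on that choice:

```python
import numpy as np

# Training points from the example; f(2) and f(4) are the unknowns.
X_train = np.array([0.0, 1.0, 3.0])
y_train = np.array([-5.0, 0.0, 5.0])
X_test = np.array([2.0, 4.0])

# Assumes gp_predict from the implementation sketch above.
mean, var = gp_predict(X_train, y_train, X_test)
for x, m, v in zip(X_test, mean, var):
    print(f"f({x:g}) ~ N({m:.2f}, {v:.2f})")
```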
My Experiments
Results – Linear
Model    MSE
GP       0.009
Lasso    0.019
KNN      0.058

Results – Cubic
Model    MSE
GP       0.046
Lasso    0.037
KNN      0.101

Results – Periodic
Model    MSE
GP       0.087
Lasso    1.444
KNN      0.350

Results – Sigmoid
Model    MSE
GP       0.034
Lasso    0.045
KNN      0.060

Results – Complex
Model    MSE
GP       0.194
Lasso    0.574
KNN      0.327
Real Dataset: Concrete Slump Test
How fast does concrete flow, given its materials?
103 instances; 7 input features; 3 outputs; all real-valued
Input features are kg per cubic meter of concrete: Cement, Slag, Fly Ash, Water, SP, Coarse Agg., Fine Agg.
Output variables are Slump, Flow, and 28-day Compressive Strength
Ran 10-fold CV with our three models on each output variable
Concrete Results

Slump
Model    Avg MSE ± Std
GP       0.103 ± 0.037
Lasso    0.072 ± 0.024
KNN      0.067 ± 0.040

Flow
Model    Avg MSE ± Std
GP       0.091 ± 0.026
Lasso    0.051 ± 0.020
KNN      0.063 ± 0.036

Strength
Model    Avg MSE ± Std
GP       0.030 ± 0.015
Lasso    0.004 ± 0.002
KNN      0.008 ± 0.005
Learning Hyperparameters
Find hyperparameters $\theta$ that maximize the (log) probability of the data,
$$\log p(\mathbf{y} \mid X, \theta) = -\tfrac{1}{2}\mathbf{y}^\top K^{-1}\mathbf{y} - \tfrac{1}{2}\log|K| - \tfrac{n}{2}\log 2\pi$$
Can use gradient-based optimization by computing
$$\frac{\partial}{\partial \theta_j}\log p(\mathbf{y} \mid X, \theta) = \tfrac{1}{2}\,\mathbf{y}^\top K^{-1}\frac{\partial K}{\partial \theta_j}K^{-1}\mathbf{y} - \tfrac{1}{2}\operatorname{tr}\!\left(K^{-1}\frac{\partial K}{\partial \theta_j}\right)$$
where $\partial K / \partial \theta_j$ is just the element-wise derivative of the covariance matrix, i.e. the derivative of the covariance function.
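A small sketch of these two quantities for a zero-mean GP (np.linalg.inv is fine at this scale; Cholesky solves are preferred for larger n):

```python
import numpy as np

def log_marginal_likelihood(K, y):
    # log p(y | X, theta) for a zero-mean GP with covariance matrix K
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()                   # = -0.5 * log|K|
            - 0.5 * len(y) * np.log(2.0 * np.pi))

def lml_gradient(K, dK, y):
    # d/dtheta_j log p(y | X, theta), where dK is the elementwise dK/dtheta_j
    K_inv = np.linalg.inv(K)  # fine for small n; use Cholesky solves at scale
    alpha = K_inv @ y
    return 0.5 * (alpha @ dK @ alpha - np.trace(K_inv @ dK))
```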
Learning Hyperparameters
For the squared exponential covariance function, written with signal variance $\sigma_f^2$ and length-scale $\ell$, we get the derivatives shown below.
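Assuming the common parameterization (the exact form used on the slide may differ):

$$
k(x, x') = \sigma_f^2 \exp\!\left(-\frac{(x - x')^2}{2\ell^2}\right), \qquad
\frac{\partial k}{\partial \sigma_f^2} = \exp\!\left(-\frac{(x - x')^2}{2\ell^2}\right), \qquad
\frac{\partial k}{\partial \ell} = k(x, x')\,\frac{(x - x')^2}{\ell^3}
$$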
Covariance Function Tradeoffs
Classification
Many regression methods become binary classifiers by passing their output through a sigmoid, softmax, or other threshold (MLP, SVM, logistic regression)
Same with GP, but scarier math
Classification Math
The predictive probability averages the sigmoid's "vote for class 1" over the latent function value, weighting each vote by its posterior probability:
$$p(y_* = 1 \mid X, \mathbf{y}, x_*) = \int \sigma(f_*)\, p(f_* \mid X, \mathbf{y}, x_*)\, df_*$$
The weight $p(f_* \mid X, \mathbf{y}, x_*)$ is built from the GP prior, but with a sigmoid likelihood the integrals are intractable (unlike GP regression, where everything stays Gaussian), so approximations are required.
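A sketch of the naive shortcut alluded to on the previous slide, not the full approximate treatment (the example data and the reuse of gp_predict are mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Naive shortcut, NOT a proper GP classifier: fit GP *regression* to labels
# coded as +/-1, then squash the posterior mean through a sigmoid.
# Assumes gp_predict from the implementation sketch above.
X_train = np.array([-2.0, -1.0, 1.0, 2.0])
y_train = np.array([-1.0, -1.0, 1.0, 1.0])
mean, _ = gp_predict(X_train, y_train, np.array([0.5]))
print("p(class 1 | x = 0.5) ~", sigmoid(mean[0]))
```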
Bells and Whistles
Multiple regression: assume correlation between multiple outputs
Correlated noise: especially useful for time series
Training-set values + derivatives (a max or min occurs at…)
Noise in the x-direction
Mixture (ensemble) of local GP experts
Integral approximation with GPs
Covariance over structured inputs (strings, trees, etc.)
Bibliography
Wikipedia: Gaussian Process; Multivariate Normal
Scikit-Learn documentation
www.gaussianprocess.org
C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006. ISBN 026218253X. www.GaussianProcess.org/gpml
S. Marsland, Machine Learning: An Algorithmic Perspective, 2nd Edition, CRC Press, 2015