Introduction to Gaussian Process
CS 778, Chris Tensmeyer
What Topic?
Machine Learning → Regression → Bayesian ML → Bayesian Regression → Bayesian Non-parametrics → Gaussian Processes (GP) → GP Regression → GP Regression with a Squared Exponential Kernel
Regression
Paper Titles
Gaussian process classification for segmenting and annotating sequences
Discovering hidden features with Gaussian processes regression
Active learning with Gaussian processes for object categorization
Semi-supervised learning via Gaussian processes
Discriminative Gaussian process latent variable model for classification
Computation with infinite neural networks
Fast sparse Gaussian process methods: The informative vector machine
Approximate dynamic programming with Gaussian processes
Paper Titles
Learning to control an octopus arm with Gaussian process temporal difference methods
Gaussian processes and reinforcement learning for identification and control of an autonomous blimp
Gaussian process latent variable models for visualization of high-dimensional data
Bayesian unsupervised signal classification by Dirichlet process mixtures of Gaussian processes
Bayesian ML
Running Example
Prior
Data Likelihood
Posterior
Posterior Predictive
Other Regression Methods
Gaussian Process
GP = a stochastic process whose random variables are (jointly) Gaussian distributed
E.g., at x = 1, f(1) has a Gaussian distribution
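In symbols (the standard definition, matching the MVN view on the next slide): a GP with mean function m and covariance function k is exactly the statement that every finite restriction of f is multivariate normal,

$$
f \sim \mathcal{GP}(m, k)
\iff
\bigl(f(x_1), \ldots, f(x_n)\bigr)^\top \sim \mathcal{N}(\boldsymbol{\mu}, K),
\qquad \mu_i = m(x_i), \quad K_{ij} = k(x_i, x_j)
$$

for any finite set of inputs $x_1, \ldots, x_n$.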
Gaussian Process – MVN
Gaussian Process – Background
First theorized in the 1940s
Used in geostatistics and meteorology in the 1970s for time-series prediction, under the name "kriging"
Smooth Functions
Distribution Over Functions
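A minimal sketch of drawing "functions" from a GP prior, assuming a zero mean and a squared exponential kernel (the se_kernel name and hyperparameter defaults are mine, for illustration):

```python
import numpy as np

def se_kernel(x1, x2, length_scale=1.0, signal_var=1.0):
    # k(x, x') = sigma_f^2 * exp(-(x - x')^2 / (2 * l^2))
    sqdist = (x1[:, None] - x2[None, :]) ** 2
    return signal_var * np.exp(-sqdist / (2.0 * length_scale ** 2))

# Draw sample functions from the zero-mean GP prior on a dense grid.
xs = np.linspace(-5, 5, 200)
K = se_kernel(xs, xs) + 1e-8 * np.eye(len(xs))  # jitter for numerical stability
samples = np.random.multivariate_normal(np.zeros(len(xs)), K, size=3)
# Each row of `samples` is one smooth function drawn from the prior.
```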
Covariance Function
Covariance Functions
Hyperparameters: Length-Scale
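To see what the length-scale does, a quick sketch (unit signal variance assumed): the correlation between f(0) and f(1) under the squared exponential kernel.

```python
import numpy as np

# Correlation between f(0) and f(1) for several length-scales:
for ell in (0.3, 1.0, 3.0):
    corr = np.exp(-1.0 / (2.0 * ell ** 2))  # k(0,1) with unit signal variance
    print(f"length-scale {ell}: corr(f(0), f(1)) = {corr:.3f}")
# Small length-scales give wiggly sample functions;
# large ones give slowly varying, nearly linear samples.
```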
Inference/Prediction
The joint distribution of the training targets and the test target is MVN
We condition the MVN on the observed training targets
The resulting conditional distribution is Gaussian
Use that distribution's mean/covariance for prediction
Magic Math (from Wikipedia)
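The "magic" is the standard conditioning identity for the multivariate normal (as on Wikipedia's MVN page). For a zero-mean GP with noise-free training targets f, cross-covariances k*, and test variance k**:

$$
\begin{bmatrix} \mathbf{f} \\ f_* \end{bmatrix}
\sim \mathcal{N}\!\left(\mathbf{0},
\begin{bmatrix} K & \mathbf{k}_* \\ \mathbf{k}_*^\top & k_{**} \end{bmatrix}\right)
\;\Longrightarrow\;
f_* \mid \mathbf{f} \sim \mathcal{N}\!\left(\mathbf{k}_*^\top K^{-1}\mathbf{f},\;
k_{**} - \mathbf{k}_*^\top K^{-1}\mathbf{k}_*\right)
$$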
Discussion
Pros:
- Learned function is smooth but not constrained to a fixed parametric form
- Gives confidence intervals
- Good prior for other models (e.g., biasing n-degree polynomials to be smooth)
- Can predict at multiple test points that co-vary appropriately
Cons:
- The math/optimization is messy; lots of approximations in practice
- Requires an NxN matrix inversion, with numerical stability issues
- Scalability issues in the number of instances, as with SVMs
- High-dimensional data is challenging
Implementation
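A minimal sketch of GP regression with the squared exponential kernel (the kernel choice and default hyperparameters here are assumptions; a Cholesky solve replaces the explicit inverse for numerical stability):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def se_kernel(x1, x2, length_scale=1.0, signal_var=1.0):
    # k(x, x') = sigma_f^2 * exp(-(x - x')^2 / (2 * l^2))
    sqdist = (x1[:, None] - x2[None, :]) ** 2
    return signal_var * np.exp(-sqdist / (2.0 * length_scale ** 2))

def gp_predict(X_train, y_train, X_test, noise_var=1e-8):
    # Posterior mean/variance at X_test, conditioning on (X_train, y_train).
    K = se_kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
    k_star = se_kernel(X_train, X_test)              # cross-covariances k*
    k_ss = se_kernel(X_test, X_test)                 # test covariances k**
    chol = cho_factor(K)                             # Cholesky beats explicit inv(K)
    alpha = cho_solve(chol, y_train)                 # K^{-1} y
    mean = k_star.T @ alpha                          # k*^T K^{-1} y
    cov = k_ss - k_star.T @ cho_solve(chol, k_star)  # k** - k*^T K^{-1} k*
    return mean, np.diag(cov)
```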
Example
X    f(X)
0    -5
1     0
2     ?
3     5
4     ?
Example – Covariance Matrix (evaluate k at every pair of training inputs {0, 1, 3} to build K)
Example – Matrix Inversion (next step: invert K)
Example – Multiply by Targets (compute K⁻¹y)
Example – Predicting f(2): now we need k* for x = 2; k** = 1 (no observation noise)
Example – Predicting f(4)
Example – Plot
X    f(X)
0    -5
1     0
2     2.8
3     5
4     2.3
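A sketch that re-runs this example with the gp_predict helper above. The slides do not show the exact kernel hyperparameters they used, so whether the output matches f(2) ≈ 2.8 and f(4) ≈ 2.3 depends on that choice:

```python
import numpy as np

# Training points from the example; f(2) and f(4) are the unknowns.
X_train = np.array([0.0, 1.0, 3.0])
y_train = np.array([-5.0, 0.0, 5.0])
X_test = np.array([2.0, 4.0])

# Assumes gp_predict from the implementation sketch above.
mean, var = gp_predict(X_train, y_train, X_test)
for x, m, v in zip(X_test, mean, var):
    print(f"f({x:g}) ~ N({m:.2f}, {v:.2f})")
```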
My Experiments
Results – Linear
Model    MSE
GP       0.009
Lasso    0.019
KNN      0.058

Results – Cubic
Model    MSE
GP       0.046
Lasso    0.037
KNN      0.101

Results – Periodic
Model    MSE
GP       0.087
Lasso    1.444
KNN      0.350

Results – Sigmoid
Model    MSE
GP       0.034
Lasso    0.045
KNN      0.060

Results – Complex
Model    MSE
GP       0.194
Lasso    0.574
KNN      0.327
Real Dataset: Concrete Slump Test
How fast does concrete flow, given its materials?
103 instances; 7 input features; 3 outputs; all real-valued
Input features are kg per cubic meter of concrete: Cement, Slag, Fly Ash, Water, SP, Coarse Agg., Fine Agg.
Output variables are Slump, Flow, and 28-day Compressive Strength
Ran 10-fold CV with our three models on each output variable
Concrete Results

Slump
Model    Avg MSE ± Std
GP       0.103 ± 0.037
Lasso    0.072 ± 0.024
KNN      0.067 ± 0.040

Flow
Model    Avg MSE ± Std
GP       0.091 ± 0.026
Lasso    0.051 ± 0.020
KNN      0.063 ± 0.036

Strength
Model    Avg MSE ± Std
GP       0.030 ± 0.015
Lasso    0.004 ± 0.002
KNN      0.008 ± 0.005
Learning Hyperparameters
Find hyperparameters $\theta$ that maximize the (log) probability of the data,
$$\log p(\mathbf{y} \mid X, \theta) = -\tfrac{1}{2}\mathbf{y}^\top K^{-1}\mathbf{y} - \tfrac{1}{2}\log|K| - \tfrac{n}{2}\log 2\pi$$
Can use gradient-based optimization by computing
$$\frac{\partial}{\partial \theta_j}\log p(\mathbf{y} \mid X, \theta) = \tfrac{1}{2}\,\mathbf{y}^\top K^{-1}\frac{\partial K}{\partial \theta_j}K^{-1}\mathbf{y} - \tfrac{1}{2}\operatorname{tr}\!\left(K^{-1}\frac{\partial K}{\partial \theta_j}\right)$$
where $\partial K / \partial \theta_j$ is just the element-wise derivative of the covariance matrix, i.e. the derivative of the covariance function.
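A small sketch of these two quantities for a zero-mean GP (np.linalg.inv is fine at this scale; Cholesky solves are preferred for larger n):

```python
import numpy as np

def log_marginal_likelihood(K, y):
    # log p(y | X, theta) for a zero-mean GP with covariance matrix K
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()                   # = -0.5 * log|K|
            - 0.5 * len(y) * np.log(2.0 * np.pi))

def lml_gradient(K, dK, y):
    # d/dtheta_j log p(y | X, theta), where dK is the elementwise dK/dtheta_j
    K_inv = np.linalg.inv(K)  # fine for small n; use Cholesky solves at scale
    alpha = K_inv @ y
    return 0.5 * (alpha @ dK @ alpha - np.trace(K_inv @ dK))
```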
Learning Hyperparameters
For the squared exponential covariance function, written with signal variance $\sigma_f^2$ and length-scale $\ell$, we get the derivatives shown below.
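Assuming the common parameterization (the exact form used on the slide may differ):

$$
k(x, x') = \sigma_f^2 \exp\!\left(-\frac{(x - x')^2}{2\ell^2}\right), \qquad
\frac{\partial k}{\partial \sigma_f^2} = \exp\!\left(-\frac{(x - x')^2}{2\ell^2}\right), \qquad
\frac{\partial k}{\partial \ell} = k(x, x')\,\frac{(x - x')^2}{\ell^3}
$$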
Covariance Function Tradeoffs
Classification
Many regression methods become binary classifiers by passing their output through a sigmoid, softmax, or other threshold (MLP, SVM, logistic regression)
Same with GP, but scarier math
Classification Math
The predictive probability averages the sigmoid's "vote for class 1" over the latent function value, weighting each vote by its posterior probability:
$$p(y_* = 1 \mid X, \mathbf{y}, x_*) = \int \sigma(f_*)\, p(f_* \mid X, \mathbf{y}, x_*)\, df_*$$
The weight $p(f_* \mid X, \mathbf{y}, x_*)$ is built from the GP prior, but with a sigmoid likelihood the integrals are intractable (unlike GP regression, where everything stays Gaussian), so approximations are required.
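A sketch of the naive shortcut alluded to on the previous slide, not the full approximate treatment (the example data and the reuse of gp_predict are mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Naive shortcut, NOT a proper GP classifier: fit GP *regression* to labels
# coded as +/-1, then squash the posterior mean through a sigmoid.
# Assumes gp_predict from the implementation sketch above.
X_train = np.array([-2.0, -1.0, 1.0, 2.0])
y_train = np.array([-1.0, -1.0, 1.0, 1.0])
mean, _ = gp_predict(X_train, y_train, np.array([0.5]))
print("p(class 1 | x = 0.5) ~", sigmoid(mean[0]))
```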
Bells and Whistles
Multiple regression: assume correlation between multiple outputs
Correlated noise: especially useful for time series
Training-set values + derivatives (a max or min occurs at…)
Noise in the x-direction
Mixture (ensemble) of local GP experts
Integral approximation with GPs
Covariance over structured inputs (strings, trees, etc.)
Bibliography
Wikipedia: Gaussian Process; Multivariate Normal
Scikit-Learn documentation
www.gaussianprocess.org
C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006. ISBN 026218253X. www.GaussianProcess.org/gpml
S. Marsland, Machine Learning: An Algorithmic Perspective, 2nd Edition, CRC Press, 2015