1 Introduction to Gaussian Process (CS 778, Chris Tensmeyer)

2 What Topic? Machine Learning → Regression → Bayesian ML → Bayesian Regression → Bayesian Non-parametric → Gaussian Process (GP) → GP Regression → GP Regression with Squared Exponential Kernel

3 Regression

4 Paper Titles
Gaussian process classification for segmenting and annotating sequences
Discovering hidden features with Gaussian processes regression
Active learning with Gaussian processes for object categorization
Semi-supervised learning via Gaussian processes
Discriminative Gaussian process latent variable model for classification
Computation with infinite neural networks
Fast sparse Gaussian process methods: The informative vector machine
Approximate Dynamic Programming with Gaussian Processes

5 Paper Titles
Learning to control an octopus arm with Gaussian process temporal difference methods
Gaussian Processes and Reinforcement Learning for Identification and Control of an Autonomous Blimp
Gaussian process latent variable models for visualization of high dimensional data
Bayesian unsupervised signal classification by Dirichlet process mixtures of Gaussian processes

6 Bayesian ML

7 Running Example

8 Prior (two-column slide: General | Example)

9 Data Likelihood (two-column slide: General | Example)

10 Posterior (two-column slide: General | Example)

11 Posterior Predictive (two-column slide: General | Example)

12 Other Regression Methods

13 Other Regression Methods

14 Other Regression Methods

15 Gaussian Process: a GP is a stochastic process whose random variables are jointly Gaussian distributed. Fixing an input, e.g. x = 1, gives a (univariate Gaussian) distribution over the function value f(1).
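A minimal sketch of this idea in Python, assuming a zero-mean prior and a unit-scale squared exponential kernel (neither is specified on the slide): sample the GP jointly on a grid and check that a single coordinate such as f(1) is marginally Gaussian.

```python
import numpy as np

# Assumed kernel for illustration: squared exponential with unit
# signal variance and length-scale (the slide does not specify one).
def k(a, b):
    return np.exp(-0.5 * (a - b) ** 2)

xs = np.linspace(0.0, 5.0, 101)
K = k(xs[:, None], xs[None, :])           # covariance of f on the grid
K += 1e-8 * np.eye(len(xs))               # jitter for numerical stability

rng = np.random.default_rng(0)
fs = rng.multivariate_normal(np.zeros(len(xs)), K, size=1000)

# The values of f at x = 1 across samples are approximately N(0, k(1,1)) = N(0, 1).
i = np.argmin(np.abs(xs - 1.0))
print(fs[:, i].mean(), fs[:, i].std())    # ~0 and ~1
```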

16 Gaussian Process - MVN

17 Gaussian Process - Background: First theorized in the 1940s. Used in geostatistics and meteorology in the 1970s for time series prediction, under the name kriging.

18 Smooth Functions

19 Dist. Over Functions

20 Covariance Function

21 Covariance Functions

22 Covariance Functions

23 Hyperparameters: Length-Scale
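A sketch of the kernel with its hyperparameters exposed, in the standard parameterization (the slide's exact form isn't in the transcript): the length-scale controls how quickly the correlation between nearby function values decays.

```python
import numpy as np

def sq_exp(a, b, sigma_f=1.0, length_scale=1.0):
    """Squared exponential kernel: sigma_f^2 * exp(-(a-b)^2 / (2 * length_scale^2))."""
    return sigma_f ** 2 * np.exp(-0.5 * ((a - b) / length_scale) ** 2)

# Shorter length-scale: correlation decays faster, so sampled functions wiggle more.
print(sq_exp(0.0, 1.0, length_scale=0.5))   # ~0.135 (nearly uncorrelated)
print(sq_exp(0.0, 1.0, length_scale=2.0))   # ~0.882 (strongly correlated)
```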

24 Inference/Prediction: The joint distribution of the training targets and the test target is MVN. We condition the MVN on the observed training targets. The resulting conditional distribution is Gaussian. Use that distribution's mean/covariance for prediction.

25 Magic Math From Wikipedia
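The "magic math" is presumably the standard Gaussian conditioning identities (Wikipedia's Multivariate normal article; Rasmussen & Williams, ch. 2). For a zero-mean, noise-free GP:

$$
\begin{bmatrix} \mathbf{y} \\ f_* \end{bmatrix}
\sim \mathcal{N}\!\left(\mathbf{0},
\begin{bmatrix} K & \mathbf{k}_* \\ \mathbf{k}_*^\top & k_{**} \end{bmatrix}\right)
\quad\Longrightarrow\quad
f_* \mid \mathbf{y} \sim \mathcal{N}\!\left(\mathbf{k}_*^\top K^{-1}\mathbf{y},\;
k_{**} - \mathbf{k}_*^\top K^{-1}\mathbf{k}_*\right)
$$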

26 Discussion
PROS:
- Learned function is smooth but not constrained to a parametric form
- Confidence intervals come for free
- Good prior for other models (e.g., biasing N-degree polynomials to be smooth)
- Can predict at multiple test points that co-vary appropriately
CONS:
- Math/optimization is messy and relies on lots of approximations
- Requires an NxN matrix inversion, with numerical stability issues
- Scalability in # instances is an issue, as with SVMs
- High-dimensional data is challenging

27 Implementation
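The implementation itself isn't in the transcript; here is a minimal Python sketch of the conditioning formulas above, assuming a zero-mean GP and using a Cholesky factorization in place of an explicit inverse for numerical stability.

```python
import numpy as np

def gp_predict(X_train, y_train, X_test, kernel, noise_var=0.0):
    """GP regression: condition a zero-mean GP on (X_train, y_train)."""
    K = kernel(X_train[:, None], X_train[None, :])        # n x n train covariance
    K = K + noise_var * np.eye(len(X_train))              # observation noise
    K_star = kernel(X_train[:, None], X_test[None, :])    # n x m cross-covariance
    K_ss = kernel(X_test[:, None], X_test[None, :])       # m x m test covariance

    # Cholesky is preferred to a direct inverse for stability.
    L = np.linalg.cholesky(K + 1e-10 * np.eye(len(X_train)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^{-1} y
    v = np.linalg.solve(L, K_star)

    mean = K_star.T @ alpha            # k_*^T K^{-1} y
    cov = K_ss - v.T @ v               # k_** - k_*^T K^{-1} k_*
    return mean, cov
```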

28 Example
X:    0   1   2   3   4
f(X): -5  0   ?   5   ?

29 Example – Covariance Matrix
X:    0   1   2   3   4
f(X): -5  0   ?   5   ?

30 Example – Matrix Inversion. Next step: invert K.
X:    0   1   2   3   4
f(X): -5  0   ?   5   ?

31 Example – Multiply by Targets
X:    0   1   2   3   4
f(X): -5  0   ?   5   ?

32 Example – Predicting f(2). Now we need k_* for f(2); k_** = 1 (no observation noise).
X:    0   1   2   3   4
f(X): -5  0   ?   5   ?

33 Example – Predicting f(2)
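Using the gp_predict sketch above on this running example. The kernel and hyperparameters here are assumptions, since the slides don't state them, so the numbers need not match the 2.8 / 2.3 shown on the plot slide.

```python
import numpy as np

X_train = np.array([0.0, 1.0, 3.0])      # rows where f(X) is observed
y_train = np.array([-5.0, 0.0, 5.0])
X_test = np.array([2.0, 4.0])            # f(2) and f(4) are unknown

kernel = lambda a, b: np.exp(-0.5 * (a - b) ** 2)   # assumed unit hyperparameters
mean, cov = gp_predict(X_train, y_train, X_test, kernel)
print(mean)            # predictive means for f(2) and f(4)
print(np.diag(cov))    # predictive variances (k_** minus explained variance)
```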

34 Example – Predicting f(4)

35 Example - Plot
X:    0   1    2    3   4
f(X): -5  0    2.8  5   2.3

36 My Experiments

37 Results - Linear
Model   MSE
GP      0.009
Lasso   0.019
KNN     0.058

38 Results - Cubic
Model   MSE
GP      0.046
Lasso   0.037
KNN     0.101

39 Results - Periodic
Model   MSE
GP      0.087
Lasso   1.444
KNN     0.350

40 Results - Sigmoid
Model   MSE
GP      0.034
Lasso   0.045
KNN     0.060

41 Results - Complex
Model   MSE
GP      0.194
Lasso   0.574
KNN     0.327

42 Real Dataset: Concrete Slump Test. How fast does concrete flow given its materials? 103 instances; 7 input features; 3 outputs; all real-valued. Input features are kg per cubic meter of concrete: Cement, Slag, Fly Ash, Water, SP, Coarse Agg, Fine Agg. Output variables are Slump, Flow, and 28-day Compressive Strength. Ran 10-fold CV with our three models on each output variable.

43 Concrete Results (Avg MSE ± Std, 10-fold CV)
Slump:    GP 0.103 ± 0.037   Lasso 0.072 ± 0.024   KNN 0.067 ± 0.040
Flow:     GP 0.091 ± 0.026   Lasso 0.051 ± 0.020   KNN 0.063 ± 0.036
Strength: GP 0.030 ± 0.015   Lasso 0.004 ± 0.002   KNN 0.008 ± 0.005

44 Learning Hyperparameters: Find hyperparameters θ to maximize the (log) marginal likelihood of the data. Can use gradient descent (on the negative log likelihood) by computing

$$\frac{\partial}{\partial \theta_j} \log p(\mathbf{y} \mid X, \theta) = \frac{1}{2}\,\mathbf{y}^\top K^{-1} \frac{\partial K}{\partial \theta_j} K^{-1} \mathbf{y} - \frac{1}{2}\operatorname{tr}\!\left(K^{-1} \frac{\partial K}{\partial \theta_j}\right)$$

where ∂K/∂θ_j is just the elementwise derivative of the covariance matrix, i.e. the derivative of the covariance function evaluated at each pair of inputs.

45 Learning Hyperparameters: For a squared exponential covariance function in this form (the standard parameterization; the slide's exact form isn't in the transcript)

$$k(x_p, x_q) = \sigma_f^2 \exp\!\left(-\frac{(x_p - x_q)^2}{2\ell^2}\right) + \sigma_n^2\,\delta_{pq}$$

we get the following derivatives:

$$\frac{\partial k}{\partial \sigma_f} = 2\sigma_f \exp\!\left(-\frac{(x_p - x_q)^2}{2\ell^2}\right), \qquad
\frac{\partial k}{\partial \ell} = \sigma_f^2 \,\frac{(x_p - x_q)^2}{\ell^3} \exp\!\left(-\frac{(x_p - x_q)^2}{2\ell^2}\right), \qquad
\frac{\partial k}{\partial \sigma_n} = 2\sigma_n\,\delta_{pq}$$
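A sketch of the core computation for one gradient step, using the equivalent trace form of the derivative above (Rasmussen & Williams, eq. 5.9); dK_dtheta is the elementwise derivative of K from the previous slide.

```python
import numpy as np

def lml_gradient(K, dK_dtheta, y):
    """Derivative of log p(y | X, theta) w.r.t. one hyperparameter:
    0.5 * tr((alpha alpha^T - K^{-1}) dK/dtheta), where alpha = K^{-1} y."""
    K_inv = np.linalg.inv(K)      # fine for small n; use Cholesky at scale
    alpha = K_inv @ y
    return 0.5 * np.trace((np.outer(alpha, alpha) - K_inv) @ dK_dtheta)
```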

46 Covariance Function Tradeoffs

47 Classification: Many regression methods become binary classifiers by passing their output through a sigmoid, softmax, or other threshold (e.g., MLP, SVM, logistic regression). Same with GP, but scarier math.

48 Classification Math

$$p(y_* = 1 \mid X, \mathbf{y}, x_*) = \int \sigma(f_*)\, p(f_* \mid X, \mathbf{y}, x_*)\, df_*$$

Here σ(f_*) is the vote for class 1, and p(f_* | X, y, x_*) is the weight for that vote, obtained GP-regression-style from the GP prior over the latent f. The integral is intractable because the sigmoid likelihood is non-Gaussian.

49 Bells and Whistles
- Multiple regression: assume correlation between multiple outputs
- Correlated noise (especially useful for time)
- Training set values + derivatives (e.g., a max or min occurs at...)
- Noise in the x-direction
- Mixture (ensemble) of local GP experts
- Integral approximation with GPs
- Covariance over structured inputs (strings, trees, etc.)

50 Bibliography
Wikipedia: Gaussian Process; Multivariate Normal
Scikit-Learn Documentation
www.gaussianprocess.org
C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006, ISBN 026218253X. www.GaussianProcess.org/gpml
S. Marsland, Machine Learning: An Algorithmic Perspective, 2nd Edition, CRC Press, 2015

