
SAMSI March 2007 GASP Models and Bayesian Regression. David M. Steinberg (Tel Aviv University) and Dizza Bursztyn (Ashkelon College)

SAMSI March 2007 PREVIEW
1. GASP: The Random Field Regression Model
2. The RFR Model and Bayesian Regression
3. RFR to Bayes – What is the Model?
4. Example 1: A Simple One-Factor Model
5. Example 2: Nuclear Waste Repository
6. From Bayes to RFR
7. Conclusions

SAMSI March 2007 GASP: The Random Field Regression Model What kind of model should be used for data from computer experiments? We need to consider:
- Attention to bias, not to variance.
- Nonlinear effects and interactions.
- High-dimensional inputs.
The RFR model is one possible solution.

SAMSI March 2007 The Random Field Regression Model
- Also known as the kriging model, from its roots in geostatistics.
- Also known as GASP, for GAussian Stochastic Process.

SAMSI March 2007 The Random Field Regression Model Let y denote a response and x a vector of factor settings or covariates. Treat y as the realization of a random field with a fixed regression component: $y(x) = \beta_0 + \sum_j \beta_j f_j(x) + Z(x)$. The regression part is often limited to just the constant term.

SAMSI March 2007 The Random Field Regression Model The random field $Z(x)$ is used to represent the departure of the true response function from the regression model. Typical assumptions: $E\{Z(x)\} = 0$ and $E\{Z(x_1)\,Z(x_2)\} = C(x_1, x_2) = \sigma^2 R(x_1, x_2)$.

SAMSI March 2007 The Random Field Regression Model We can estimate the response at a new input site using the Best Linear Unbiased Predictor. The estimator is also the posterior mean if we assume that all random terms have normal distributions. The estimator is much more flexible than the standard regression model. It smoothly interpolates the output data.

SAMSI March 2007 The Random Field Regression Model Typically the correlation function R includes parameters that can be estimated by maximum likelihood or by cross-validation. One popular recommendation is the power exponential family: $R(x_1, x_2) = \exp\{-\sum_j \theta_j\,|x_{1,j} - x_{2,j}|^{p_j}\}$.
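To make this concrete, here is a minimal NumPy sketch of the power exponential correlation and the resulting constant-mean interpolating predictor. This is an illustration, not the authors' code; the toy data, parameter values, and all names are invented for the example.

```python
import numpy as np

def power_exp_corr(X1, X2, theta, p):
    """Power exponential correlation: R(x1, x2) = exp{-sum_j theta_j |x1j - x2j|^pj}."""
    diffs = np.abs(X1[:, None, :] - X2[None, :, :])        # shape (n1, n2, d)
    return np.exp(-np.sum(theta * diffs ** p, axis=2))     # shape (n1, n2)

def rfr_predict(Xnew, X, y, theta, p, nugget=1e-10):
    """Posterior mean (BLUP) for a constant-mean GASP model; interpolates (X, y)."""
    n = X.shape[0]
    R = power_exp_corr(X, X, theta, p) + nugget * np.eye(n)  # tiny nugget for stability
    one = np.ones(n)
    Rinv = np.linalg.inv(R)
    beta0 = (one @ Rinv @ y) / (one @ Rinv @ one)            # GLS estimate of the constant
    r = power_exp_corr(Xnew, X, theta, p)                    # cross-correlations to new sites
    return beta0 + r @ Rinv @ (y - beta0 * one)

# Toy usage: interpolate a smooth 1-D function from 10 runs.
X = np.linspace(-1.0, 1.0, 10).reshape(-1, 1)
y = np.sin(3.0 * X[:, 0])
Xnew = np.linspace(-1.0, 1.0, 5).reshape(-1, 1)
print(rfr_predict(Xnew, X, y, theta=np.array([5.0]), p=np.array([2.0])))
```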

SAMSI March 2007 The Random Field Regression Model An example from Welch et al., Technometrics, 1992.

SAMSI March 2007 The Random Field Regression Model The RFR model has been found to generate good predictors in applications. But it:
- is difficult to interpret;
- does not relate to “classical” models;
- does not make clear “what it does to the data”.

SAMSI March 2007 RFR Model and Bayesian Regression We will show that the RFR model can be understood as a Bayesian regression model. Suppose we want to represent the response y using a regression model: $y(x) = \beta_0 + \sum_j \beta_j f_j(x)$.

SAMSI March 2007 RFR Model and Bayesian Regression Take a Bayesian view and assign priors to the coefficients. Assign a vague prior to the constant. Assume that the remaining terms are independent, with $\beta_j \sim N(0, \tau_j^2)$.

SAMSI March 2007 RFR Model and Bayesian Regression We now have $y(x) = \beta_0 + \sum_j \beta_j f_j(x) = \beta_0 + Z(x)$.

SAMSI March 2007 RFR Model and Bayesian Regression The term $Z(x)$ is a random field whose distribution is induced by the prior assumptions on the regression coefficients: $E\{Z(x)\} = 0$ and $E\{Z(x_1)\,Z(x_2)\} = C(x_1, x_2) = \sum_j \tau_j^2 f_j(x_1) f_j(x_2)$.
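This identity is easy to check numerically. The sketch below (with an arbitrary cubic basis and made-up prior variances) draws many coefficient vectors from the prior and compares the empirical covariance of the resulting $Z$ values with $\sum_j \tau_j^2 f_j(x_1) f_j(x_2)$.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 5)
F = np.column_stack([x, x**2, x**3])     # f_j(x): an arbitrary illustrative basis
tau2 = np.array([1.0, 0.5, 0.25])        # prior variances tau_j^2 (made up)

# Draw beta_j ~ N(0, tau_j^2) repeatedly and form Z(x) = sum_j beta_j f_j(x).
beta = rng.normal(scale=np.sqrt(tau2), size=(200_000, 3))
Z = beta @ F.T                           # each row is one realization of Z at the 5 sites

C_empirical = np.cov(Z, rowvar=False)
C_theory = F @ np.diag(tau2) @ F.T       # sum_j tau_j^2 f_j(x1) f_j(x2)
print(np.abs(C_empirical - C_theory).max())   # ~0 up to Monte Carlo error
```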

SAMSI March 2007 RFR Model and Bayesian Regression The RFR model is equivalent to a Bayesian regression model. The number of regression functions can be as large as we desire, even a full series expansion.

SAMSI March 2007 RFR Model and Bayesian Regression The importance of each regression function in the Bayesian model is reflected by its prior variance, with important terms assigned large variances. A regression component in the RFR model corresponds to assigning diffuse priors to the appropriate coefficients (i.e., giving them “infinite” prior variances) and then leaving those terms out of the random field component.

SAMSI March 2007 RFR to Bayes – What is the Model? Suppose we fit an RFR model to data from a computer experiment. Can we find an associated Bayesian regression model? Finding the Bayes model may be helpful in understanding the RFR model.

SAMSI March 2007 RFR to Bayes – What is the Model? Some simple data analysis provides an answer. Our algorithm:
1. Compute the correlation matrix $R(x_i, x_j)$ at all pairs of design points.
2. Compute the eigenvalues and eigenvectors of the correlation matrix.
3. For the leading eigenvalues, find out how the associated eigenvectors are related to the design factors.
A code sketch of these steps, applied to Example 1, follows the next slide.

SAMSI March 2007 Example 1: A One-factor Design Consider a computer experiment with just one factor. The design includes 50 points spread uniformly on the interval [-1,1]. The correlation function is estimated from the power exponential family; the fitted exponent is 2, giving the Gaussian form $R(x_1, x_2) = \exp\{-\theta\,|x_1 - x_2|^2\}$.
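A minimal sketch of the three-step algorithm on this design ($\theta = 5.0$ is an invented value, since the slides do not report the fitted parameter):

```python
import numpy as np

theta = 5.0                                            # illustrative scale parameter
x = np.linspace(-1.0, 1.0, 50)
R = np.exp(-theta * (x[:, None] - x[None, :]) ** 2)    # Gaussian correlation at the 50 points

# Step 2: eigendecomposition (eigh returns ascending eigenvalues; flip to descending).
eigvals, eigvecs = np.linalg.eigh(R)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
print(eigvals[:8])                                     # the eight leading eigenvalues

# Step 3: regress each leading eigenvector on polynomials in x and report R^2.
P = np.column_stack([x ** d for d in range(6)])        # 1, x, ..., x^5
for k in range(4):
    v = eigvecs[:, k]
    coef, *_ = np.linalg.lstsq(P, v, rcond=None)
    resid = v - P @ coef
    r2 = 1.0 - resid @ resid / np.sum((v - v.mean()) ** 2)
    print(f"eigenvector {k + 1}: R^2 on a degree-5 polynomial = {r2:.3f}")
```

The printed R² values indicate how well each leading eigenvector is described by a low-order function of the input, which is the relationship the plots on the following slides display.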

SAMSI March 2007 Example 1: A One-factor Design The eight leading eigenvalues of the correlation matrix:

SAMSI March 2007 Example 1: A One-factor Design The first eigenvector, plotted against the input factor from our design:

SAMSI March 2007 Example 1: A One-factor Design The second eigenvector:

SAMSI March 2007 Example 1: A One-factor Design The third eigenvector:

SAMSI March 2007 Example 1: A One-factor Design The fourth eigenvector:

SAMSI March 2007 RFR to Bayes – What is the Model? Why does the algorithm work? Let Y denote the output vector. The Bayesian regression model says that $Y = \beta_0 \mathbf{1} + \beta_1 f_1 + \cdots + \beta_T f_T$, where $f_1, f_2, \dots, f_T$ are the columns of the regression matrix. Then $Y \sim N(\beta_0 \mathbf{1}, C)$, where $C = \sum_j \tau_j^2 f_j f_j'$.

SAMSI March 2007 RFR to Bayes – What is the Model? The algorithm merely reverses the logic. Given the correlation matrix, it identifies regression vectors and prior variances. The regression vectors depend on intrinsic properties of the correlation function and on the experimental design. For example, if the design “confounds” two effects, we might get a regression vector that is explained by either of the two or by a linear combination of them.
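To spell out the reversal (a reconstruction of the reasoning, not text from the slides): since the covariance matrix at the $n$ design points is symmetric and positive semidefinite, it has the spectral decomposition

$$ C = \sigma^2 R = \sum_{k=1}^{n} \lambda_k\, v_k v_k', \qquad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0, $$

which has exactly the form $C = \sum_j \tau_j^2 f_j f_j'$ of an induced Bayesian-regression covariance. The eigenvectors $v_k$ play the role of regression vectors evaluated at the design points, and the eigenvalues $\lambda_k$ play the role of prior variances, so a large $\lambda_k$ flags an important term in the equivalent regression.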

SAMSI March 2007 Example 2: Nuclear Waste Repository We included 26 input factors. The design is a 900 point Latin Hypercube, generated automatically by RESRAD. Several pairs of factors should be equal to one another. RESRAD allowed us to enforce a 0.99 rank correlation between such pairs. Other pairs should be similar and we used a 0.3 rank correlation for them.

SAMSI March 2007 Example 2: Nuclear Waste Repository The response is the maximal equivalent annual dose of radiation in the drinking water (in millirem) during a 10,000 year time window. IAEC standards stipulate that this dose should be at most 30 millirem. The goal is to identify factors that affect the outcome and should be subject to further study at a proposed repository site.

SAMSI March 2007 Example 2: Nuclear Waste Repository The output data show no leaching at all for more than 75% of the input vectors. When leaching does occur, the maximal annual dose has a highly skewed distribution:

SAMSI March 2007 Example 2: Nuclear Waste Repository We fitted an RFR model to the log of the maximal annual dose, using only those input vectors with an outcome of at least 0.1 (n=163). We selected the 8 strongest input factors as predictors. Most of these factors are also related to the presence/absence of leaching, so the design for the RFR model is no longer uniform in the input space.

SAMSI March 2007 Example 2: Nuclear Waste Repository Two of the strongest factors related to presence or absence of leaching.

SAMSI March 2007 Example 2: Nuclear Waste Repository An RFR model was fitted to the 8 strongest factors using Brian Williams's PErK software. The power exponential correlation function was used.

SAMSI March 2007 Example 2: Nuclear Waste Repository The fitted model (the slide's table lists the estimated scale parameter theta and exponent for each factor). The eight factors, in order:
1. Kd U238 Unsaturated
2. Thickness
3. Kd U238 Saturated
4. Effective Porosity
5. Eff. Porosity Saturated
6. Hydraulic Conductivity
7. Precipitation
8. Kd T230 Contaminated

SAMSI March 2007 Example 2: Nuclear Waste Repository The eigenvalues:

SAMSI March 2007 Example 2: Nuclear Waste Repository The leading eigenvector versus Thickness:

SAMSI March 2007 Example 2: Nuclear Waste Repository The next 5 eigenvectors are almost linear functions of the 5 input factors with the largest scale parameters. (Table: E-vector, contributing factors, and R² in %, with dominant factors marked in red; the first row shows R² = 97.7% for factors 1, 2, 3.)

SAMSI March 2007 Example 2: Nuclear Waste Repository Adding a few nonlinear effects increases the R² values to above 95%. The first vector has small quadratic effects of the first 3 factors. The 6th vector has clear nonlinear effects of factor 7 (Precipitation – low exponent in the model).

SAMSI March 2007 Example 2: Nuclear Waste Repository The 7th eigenvector is not a linear function of the input factors. Adding second-order effects shows a strong quadratic effect of Precipitation.

SAMSI March 2007 Example 2: Nuclear Waste Repository Plot of the vector against Precipitation:

SAMSI March 2007 Example 2: Nuclear Waste Repository Regressing the e-vector against a “tent function with a plateau” in Precipitation gives an R² of 89.9%. The remaining scatter is most closely related to a linear effect in factor 5 (Effective Porosity in the Saturated Zone) and a quadratic effect in factor 3 (Kd for U238 in the Saturated Zone).
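For concreteness, here is one way such a fit can be set up (a sketch only; the breakpoint parameterization and all names are hypothetical, since the slides do not give the actual function):

```python
import numpy as np

def tent_plateau(x, lo, plateau_lo, plateau_hi, hi):
    """Piecewise-linear 'tent with a plateau': rises on [lo, plateau_lo],
    is flat on [plateau_lo, plateau_hi], and falls on [plateau_hi, hi]."""
    rise = np.clip((x - lo) / (plateau_lo - lo), 0.0, 1.0)
    fall = np.clip((hi - x) / (hi - plateau_hi), 0.0, 1.0)
    return np.minimum(rise, fall)

def tent_r2(precip, v, knots):
    """R^2 from regressing an eigenvector v on an intercept plus the tent function;
    knots = (lo, plateau_lo, plateau_hi, hi)."""
    X = np.column_stack([np.ones_like(precip), tent_plateau(precip, *knots)])
    coef, *_ = np.linalg.lstsq(X, v, rcond=None)
    resid = v - X @ coef
    return 1.0 - resid @ resid / np.sum((v - v.mean()) ** 2)
```

In practice the four knots could be chosen by a small grid search that maximizes the R².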

SAMSI March 2007 Example 2: Nuclear Waste Repository The 8th eigenvector is not a linear function of the input factors. It can be largely explained by a linear term in Effective Porosity (Saturated Zone), a quadratic dependence on the Kd for U238 (Saturated Zone), the interaction of the last factor with Thickness, and nonlinear terms in Precipitation.

SAMSI March 2007 Example 2: Nuclear Waste Repository Plot vs. Effective Porosity (Saturated Zone); residuals vs. Precipitation. The outlier is in a “corner” of the Thickness-by-Kd projection.

SAMSI March 2007 Example 2: Nuclear Waste Repository We can also apply the idea “in reverse”. Suppose there is a linear effect in one of the input factors. Is the effect a part of the RFR model?

SAMSI March 2007 Example 2: Nuclear Waste Repository Results from regressing linear effects on the 12 leading eigenvectors. (Table: Factor, contributing E-vectors, and R² in %.)

SAMSI March 2007 Example 2: Nuclear Waste Repository Results from regressing pure cubic effects on the 12 leading eigenvectors. (Table: Factor, contributing E-vectors, and R² in %.)
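In code, this “reverse” check is just a projection onto the span of the leading eigenvectors. In the sketch below, `V` and `g` are assumed names for the matrix whose columns are the 12 leading eigenvectors and for the candidate effect evaluated at the design points:

```python
import numpy as np

def effect_r2(g, V):
    """R^2 from regressing a candidate effect g (e.g., a linear or pure cubic
    term in one input factor, evaluated at the design points) on the columns
    of V, the leading eigenvectors of the correlation matrix."""
    coef, *_ = np.linalg.lstsq(V, g, rcond=None)
    resid = g - V @ coef
    return 1.0 - resid @ resid / np.sum((g - g.mean()) ** 2)
```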

SAMSI March 2007 From Bayes to RFR The ideas here can also be used to derive covariance functions for RFR models:
1. Write down a Bayesian regression model.
2. Compute the resulting covariance function.

SAMSI March 2007 From Bayes to RFR Example 1: Hermite polynomials.
- The basis functions decay to 0 away from the origin.
- The priors on the coefficients shrink exponentially.
- The result is the power exponential family with all exponents equal to 2.

SAMSI March 2007 From Bayes to RFR Example 2: Fourier series.
- The priors on the coefficients shrink polynomially.
- The result is the family of spline covariances.

SAMSI March 2007 Some Special Models 1. Gaussian correlation and Hermite polynomials. Consider a single univariate term in this product.

SAMSI March 2007 The scaled Hermite polynomials are orthonormal with respect to the N(0,1) density. Define the basis functions to be scaled Hermite polynomials damped by a Gaussian factor, so that they decay to 0 away from the origin, and assume prior variances that shrink exponentially, proportional to $w^k$ with $0 < w < 1$.

SAMSI March 2007 Then the induced covariance $\sum_k \tau_k^2 f_k(x_1) f_k(x_2)$ sums to a kernel of Gaussian (exponent-2) form.
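The classical identity behind this result is Mehler's formula, shown here as a presumed reconstruction of the slide's display rather than its exact content ($H_k$ denotes the physicists' Hermite polynomials, $0 < w < 1$):

$$ \sum_{k=0}^{\infty} \frac{w^k}{2^k\,k!}\, H_k(x_1) H_k(x_2) \;=\; \left(1 - w^2\right)^{-1/2} \exp\!\left\{\frac{2 w x_1 x_2 - w^2\left(x_1^2 + x_2^2\right)}{1 - w^2}\right\}. $$

Damping each polynomial by the factor $e^{-x^2/2}$, so that the basis functions decay away from the origin, multiplies the right-hand side by $e^{-(x_1^2 + x_2^2)/2}$, and the combined exponent rearranges into separate Gaussian factors in $x_1 - x_2$ and $x_1 + x_2$: a covariance of exponent-2 type.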

SAMSI March 2007 Plot of $J_1$ for $w = 0.35$.

SAMSI March 2007 Plot of $J_2$ for $w = 0.35$.

SAMSI March Trigonometric regression and splines. Assume x is in [0,1] and The first sum has polynomials and all coefficients except the last one have vague priors. The remaining terms have priors with mean 0 and

SAMSI March 2007 The contribution of the trigonometric terms to the covariance function is

$$ \tau^2 \sum_{k=1}^{\infty} \frac{\cos\{2\pi k (x_1 - x_2)\}}{k^{2m}} \;=\; \tau^2\,\frac{(-1)^{m-1}(2\pi)^{2m}}{2\,(2m)!}\, B_{2m}\!\left(\{x_1 - x_2\}\right). $$

Here $B_{2m}$ is a Bernoulli polynomial, $\{u\}$ denotes the fractional part of $u$, and the right-hand side is a spline in one argument for fixed values of the second argument.

SAMSI March 2007 The estimator produced by this model is exactly the spline of degree $2m-1$ that minimizes the integrated squared $m$-th derivative of the estimate among all functions that interpolate the data.

SAMSI March 2007 Conclusions RFR models offer great flexibility for modeling data from computer experiments. RFR models have an equivalent interpretation as Bayesian regression models. The Bayesian regression framework can be helpful for understanding how the RFR model is modeling the data. Some straightforward data analysis can uncover a Bayesian model associated with the RFR.