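Preliminaries
Multivariate normal model (Section 3.6, Gelman)
– For a multi-parameter vector $y$ of size $d$, the multivariate normal distribution is $y \sim N(\mu, \Sigma)$, with density
  $p(y \mid \mu, \Sigma) \propto |\Sigma|^{-1/2} \exp\left\{ -\tfrac{1}{2} (y-\mu)^T \Sigma^{-1} (y-\mu) \right\}$,
  where $\Sigma$ is the $d \times d$ covariance matrix, whose off-diagonal entries carry the correlation between components $i$ and $j$.
– Compare this with the single-parameter (univariate) distribution, which corresponds to $\Sigma = \sigma^2$.
– Compare this with the multi-parameter but independent distribution, which corresponds to $\Sigma = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_d^2)$. The components are not correlated with each other.
– Multivariate normal distribution with two parameters ($\rho$: correlation coefficient):
  $\Sigma = \begin{pmatrix} \sigma_1^2 & \rho\,\sigma_1\sigma_2 \\ \rho\,\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}$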
Preliminaries
Multivariate normal model example
– Two random variables $x = [x_1, x_2]$ follow a multivariate normal distribution with zero means, unit variances, and correlation $\rho$.
    • Simulate $x$ by drawing N = 1000 samples.
    • When there is no correlation, i.e., $\rho = 0$, this is the same as sampling $x_1$ and $x_2$ independently and pairing them.
    • As the correlation gets stronger, a trend appears: if $x_1$ is high, $x_2$ is also likely to be high.
[Figure: samples of $x$. Panels: independent or $\rho = 0.01$; $\rho = 0.5$; $\rho = 0.9$; $\rho = 1.0$]
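The sampling experiment described on this slide can be sketched in a few lines of NumPy. This is an illustrative sketch, not the code used to produce the original figures; the function name `sample_bivariate_normal` and the fixed seed are assumptions introduced here.

```python
import numpy as np

def sample_bivariate_normal(rho, n=1000, seed=0):
    """Draw n samples of x = [x1, x2] with zero means, unit variances,
    and correlation rho (hypothetical helper for illustration)."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho],
                    [rho, 1.0]])
    return rng.multivariate_normal(mean=np.zeros(2), cov=cov, size=n)

# Compare the requested correlation with the sample correlation.
for rho in (0.01, 0.5, 0.9):
    x = sample_bivariate_normal(rho)
    sample_rho = np.corrcoef(x[:, 0], x[:, 1])[0, 1]
    print(f"rho = {rho:4.2f}, sample correlation = {sample_rho:5.2f}")
```

A scatter plot of `x[:, 0]` against `x[:, 1]` for each value of `rho` should show the cloud of points narrowing toward a line as the correlation increases.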
Correlated regression
Departures from ordinary linear regression
– Previously, a number of assumptions were made:
    • Linearity of the response w.r.t. the explanatory variables
    • Normality of the error terms
    • Independent (uncorrelated) observations with equal variance
– None of these holds exactly in practice.
Correlations
– Consider the error between the data and the regression. Often the error cannot be -1 at one point and +1 at the next; the change should be gradual. In this sense, errors at two adjacent points can be correlated.
– If we ignore this, the posterior inference may give wrong results, and predictions of future observations will be wrong as well.
Classical regression review
Important equations
– Functional form of the regression
– Regression coefficients
– Standard error
– Coefficient of determination
– Variance of coefficients
– Variance of regression
– Variance of prediction
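The equations on this slide did not survive extraction. As a reminder of what the listed quantities look like in practice, here is a sketch using the standard least-squares formulas for a straight-line model $y = \beta_1 + \beta_2 x$. The model form, the function names, and the synthetic data are assumptions made for illustration; they are not taken from the slides.

```python
import numpy as np

def classical_regression(x, y):
    """Least-squares fit of y = b1 + b2*x with the usual summary quantities."""
    X = np.column_stack([np.ones_like(x), x])       # functional form: y = X @ beta
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y                    # regression coefficients
    resid = y - X @ beta_hat
    s2 = resid @ resid / (n - k)                    # squared standard error
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)  # coefficient of determination
    return {"X": X, "beta_hat": beta_hat, "s2": s2, "r2": r2,
            "XtX_inv": XtX_inv, "var_beta": s2 * XtX_inv}  # variance of coefficients

def prediction_variances(fit, x_new):
    """Variance of the regression line and of a new observation at x_new."""
    X_new = np.column_stack([np.ones_like(x_new), x_new])
    # leverage_i = x_i^T (X^T X)^{-1} x_i for each new point
    leverage = np.einsum("ij,jk,ik->i", X_new, fit["XtX_inv"], X_new)
    var_regression = fit["s2"] * leverage           # uncertainty of the fitted line
    var_prediction = fit["s2"] * (1.0 + leverage)   # adds the observation noise
    return var_regression, var_prediction

# Synthetic data (assumed for this example only).
x = np.linspace(0.0, 1.0, 6)
y = 1.0 + 0.5 * x + 0.05 * np.array([1, -1, 1, -1, 1, -1])
fit = classical_regression(x, y)
var_reg, var_pred = prediction_variances(fit, np.array([0.5, 1.5]))
print(fit["beta_hat"], fit["s2"], fit["r2"])
print(var_reg, var_pred)
```

The prediction variance always exceeds the regression variance by $s^2$, and both grow as `x_new` moves away from the center of the data.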
Classical regression review: Bayesian version
Analytical procedure
– Joint posterior pdf of $\beta$ and $\sigma^2$
– Factorization
– Marginal pdf of $\sigma^2$
– Conditional pdf of $\beta$
– Posterior prediction, version 1
– Posterior prediction, version 2
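The factorization listed above can be turned into a direct sampler. The sketch below assumes the standard noninformative prior $p(\beta, \sigma^2) \propto \sigma^{-2}$, under which $\sigma^2 \mid y$ is scaled inverse chi-square with $n-k$ degrees of freedom and $\beta \mid \sigma^2, y$ is normal around the least-squares estimate. The prior, the function name `sample_posterior`, and the synthetic data are assumptions; the slide's own formulas were lost in extraction.

```python
import numpy as np

def sample_posterior(X, y, n_draws=5000, seed=0):
    """Draw (beta, sigma2) from p(beta, sigma2 | y) = p(beta | sigma2, y) p(sigma2 | y),
    assuming the prior p(beta, sigma2) proportional to 1/sigma2."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    V = np.linalg.inv(X.T @ X)
    beta_hat = V @ X.T @ y
    resid = y - X @ beta_hat
    s2 = resid @ resid / (n - k)

    # Marginal: sigma2 | y ~ Inv-chi2(n - k, s2), drawn as (n - k) s2 / chi2_{n-k}.
    sigma2 = (n - k) * s2 / rng.chisquare(n - k, size=n_draws)

    # Conditional: beta | sigma2, y ~ N(beta_hat, sigma2 * V).
    L = np.linalg.cholesky(V)
    z = rng.standard_normal((n_draws, k))
    beta = beta_hat + np.sqrt(sigma2)[:, None] * (z @ L.T)
    return beta, sigma2

# Synthetic data (assumed for this example only).
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 30)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 0.5 * x + 0.1 * rng.standard_normal(30)
beta, sigma2 = sample_posterior(X, y)
print(beta.mean(axis=0), np.median(sigma2))
```

Each row of `beta` pairs with the same-index entry of `sigma2`, so the draws can be pushed through the regression function to obtain posterior predictive samples.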
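Correlated regression
Likelihood of correlated regression model
– Consider $n$ observed data $y$ at $x$, where the errors are correlated. Then the likelihood of $y$ is given by
  $p(y \mid \beta, \Sigma) \propto |\Sigma|^{-1/2} \exp\left\{ -\tfrac{1}{2} (y - X\beta)^T \Sigma^{-1} (y - X\beta) \right\}$,
  where $X$ is the matrix of explanatory variables and $\Sigma$ is the $n \times n$ covariance matrix.
– Compare this with classical regression, where $\Sigma = \sigma^2 I$, i.e., a diagonal matrix with a common variance.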
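To make the comparison with classical regression concrete, the log-likelihood of the correlated model can be evaluated as follows. This is a sketch under the assumption of a multivariate normal likelihood with mean $X\beta$; the function name `log_likelihood` and the numbers in the example are illustrative.

```python
import numpy as np

def log_likelihood(y, X, beta, Sigma):
    """Log of the multivariate normal likelihood N(y | X beta, Sigma),
    including the normalizing constant."""
    n = len(y)
    r = y - X @ beta
    L = np.linalg.cholesky(Sigma)             # Sigma = L L^T
    z = np.linalg.solve(L, r)                 # z^T z = r^T Sigma^{-1} r
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + z @ z)

# Illustrative numbers (not from the slides).
x = np.linspace(0.0, 1.0, 5)
X = np.column_stack([np.ones_like(x), x])
y = np.array([1.0, 1.2, 1.1, 1.5, 1.4])
beta = np.array([1.0, 0.5])
sigma2 = 0.04

# With Sigma = sigma2 * I this reduces to the classical regression likelihood.
ll = log_likelihood(y, X, beta, sigma2 * np.eye(5))
print(ll)
```

Passing a full covariance matrix in place of `sigma2 * np.eye(5)` gives the correlated case without any change to the function.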
Bayesian correlated regression
Covariance matrix
– We often consider the covariance matrix in the form $\Sigma = \sigma^2 Q$, where $Q$ represents the correlation structure and $\sigma^2$ accounts for the global variance. In the matrix $Q$, $d$ is a factor that controls the degree of correlation.
– Behavior of the function $Q = \exp\{ -(x/d)^2 \}$
– Behavior of the correlation matrix $Q$ (11×11) over x = 0:0.1:1, with entries $Q_{ij} = \exp\{ -((x_i - x_j)/d)^2 \}$
    • If $d$ is small, the function is narrow, the correlation is weak, and the data are nearly independent.
    • If $d$ is large, the function is wide and the correlation is strong.
[Figure: correlation function and matrix $Q$. Panels: d = 0.01, d = 0.2, d = 5]
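The effect of $d$ on the correlation matrix can be checked numerically. The sketch below builds $Q$ on the grid x = 0:0.1:1 from the function $\exp\{-(x/d)^2\}$ applied to pairwise distances; the helper name `correlation_matrix` is an assumption introduced for this example.

```python
import numpy as np

def correlation_matrix(x, d):
    """Q[i, j] = exp(-((x_i - x_j) / d)**2) for all pairs of points in x."""
    diff = x[:, None] - x[None, :]
    return np.exp(-(diff / d) ** 2)

x = np.linspace(0.0, 1.0, 11)        # the grid x = 0:0.1:1
for d in (0.01, 0.2, 5.0):
    Q = correlation_matrix(x, d)
    # Correlation between neighbors and between the two end points.
    print(f"d = {d:5.2f}: Q[0,1] = {Q[0, 1]:.4f}, Q[0,10] = {Q[0, -1]:.4f}")
```

For d = 0.01 the off-diagonal entries are numerically zero, so $Q$ is effectively the identity; for d = 5 every entry is close to one.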
Bayesian correlated regression
Correlated y example
– $y$ follows a multivariate normal distribution with zero mean and covariance $Q$ over $x \in (0, 1)$.
    • Simulate $y(x)$ over x = 0:0.1:1 with N = 4 realizations, each drawn from the multivariate normal distribution.
    • When there is no correlation, the values of $y(x)$ at adjacent $x$ are independent of each other.
    • As the correlation gets stronger, we get the trend shown in the figure.
[Figure: simulated $y(x)$. Panels: independent or d = 0.01; d = 0.2; d = 0.5; d = 5; d = 100]
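The simulation on this slide can be reproduced approximately as follows. This is a sketch, not the original script: the function names and the seed are assumptions, and NumPy's default SVD-based sampler is relied on because $Q$ becomes nearly singular for large $d$.

```python
import numpy as np

def correlation_matrix(x, d):
    """Q[i, j] = exp(-((x_i - x_j) / d)**2)."""
    diff = x[:, None] - x[None, :]
    return np.exp(-(diff / d) ** 2)

def simulate_correlated(x, d, n_curves=4, seed=0):
    """Draw n_curves realizations of y(x) ~ N(0, Q) on the grid x."""
    rng = np.random.default_rng(seed)
    Q = correlation_matrix(x, d)
    # The default (SVD-based) method tolerates a nearly singular Q.
    return rng.multivariate_normal(np.zeros(len(x)), Q, size=n_curves)

x = np.linspace(0.0, 1.0, 11)        # the grid x = 0:0.1:1
for d in (0.01, 0.2, 0.5, 5.0, 100.0):
    y = simulate_correlated(x, d)
    # Largest jump between adjacent points, as a rough measure of smoothness.
    print(f"d = {d:6.2f}: max |y(x_i+1) - y(x_i)| = {np.abs(np.diff(y, axis=1)).max():.3f}")
```

Plotting each row of `y` against `x` should show jagged, independent values for small $d$ and nearly flat curves for very large $d$.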
Bayesian correlated regression
Sampling of posterior distribution by factorization
– Data
    y=[0.95, 1.08, 1.28, 1.23, 1.42, 1.45]';
    x=[ ]';
– Samples of the 3 parameters ($\beta_1$, $\beta_2$ and $\sigma^2$) are drawn with N = 5000 using the factorization approach.
[Figure: posterior distribution of b1 & b2; posterior distribution of s2]
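For a fixed correlation factor $d$, the factorization approach carries over to the correlated model by replacing the least-squares quantities with their generalized least-squares counterparts. The sketch below assumes $\Sigma = \sigma^2 Q$ with $Q_{ij} = \exp\{-((x_i - x_j)/d)^2\}$, a straight-line model, and the prior $p(\beta, \sigma^2) \propto \sigma^{-2}$. Because the $x$ values of the slide's data set did not survive extraction, the example uses synthetic data, and the function name is hypothetical.

```python
import numpy as np

def sample_correlated_posterior(x, y, d, n_draws=5000, seed=0):
    """Draw (beta1, beta2, sigma2) for y ~ N(X beta, sigma2 * Q) with known d,
    assuming the prior p(beta, sigma2) proportional to 1/sigma2."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones_like(x), x])
    n, k = X.shape
    Q = np.exp(-((x[:, None] - x[None, :]) / d) ** 2)

    # Generalized least squares: weight by Q^{-1} instead of the identity.
    V = np.linalg.inv(X.T @ np.linalg.solve(Q, X))
    beta_hat = V @ (X.T @ np.linalg.solve(Q, y))
    r = y - X @ beta_hat
    s2 = r @ np.linalg.solve(Q, r) / (n - k)

    # Marginal of sigma2, then conditional of beta given sigma2.
    sigma2 = (n - k) * s2 / rng.chisquare(n - k, size=n_draws)
    L = np.linalg.cholesky(V)
    z = rng.standard_normal((n_draws, k))
    beta = beta_hat + np.sqrt(sigma2)[:, None] * (z @ L.T)
    return beta, sigma2

# Synthetic data (assumed; these are not the slide's data).
x = np.linspace(0.0, 1.0, 6)
y = 1.0 + 0.5 * x + 0.05 * np.array([1, -1, 1, -1, 1, -1])
beta, sigma2 = sample_correlated_posterior(x, y, d=0.1)
print(beta.mean(axis=0), np.median(sigma2))
```

When $d$ is very small, $Q$ is numerically the identity and the draws match those of the classical Bayesian regression.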
Bayesian correlated regression
[Figure: panels d = 0.01, d = 0.1, d = 0.5, and classical regression]
Discussions on the posterior prediction
Interpolation
– At every observation point, any realization of the predicted y interpolates the data y, and the predictive variance at that point vanishes.
– The reason is explained in the following.
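One way to see the interpolation property is through the standard conditional distribution of a multivariate normal. Given $\beta$ and $\sigma^2$, the prediction at a new point $\tilde{x}$ has mean $\tilde{x}\beta + q^T Q^{-1}(y - X\beta)$ and variance $\sigma^2 (1 - q^T Q^{-1} q)$, where $q$ holds the correlations between $\tilde{x}$ and the data points. If $\tilde{x}$ coincides with a data point $x_i$, then $q$ is the $i$-th column of $Q$, so $Q^{-1} q$ is the $i$-th unit vector, the mean reduces to $y_i$, and the variance is zero. The sketch below checks this numerically; it is offered as an explanation consistent with the slide, not as the slide's own derivation, and all names and numbers in it are assumptions.

```python
import numpy as np

def kernel(a, b, d):
    """Correlation between every point in a and every point in b."""
    return np.exp(-((a[:, None] - b[None, :]) / d) ** 2)

def conditional_prediction(x_new, x, y, beta, sigma2, d):
    """Mean and variance of y(x_new) given the data, beta, sigma2 and d."""
    X = np.column_stack([np.ones_like(x), x])
    X_new = np.column_stack([np.ones_like(x_new), x_new])
    Q = kernel(x, x, d)
    q = kernel(x_new, x, d)                    # one row per new point
    w = np.linalg.solve(Q, q.T).T              # rows are q^T Q^{-1} (Q is symmetric)
    mean = X_new @ beta + w @ (y - X @ beta)
    var = sigma2 * (1.0 - np.sum(w * q, axis=1))
    return mean, var

# Illustrative values (not from the slides).
x = np.linspace(0.0, 1.0, 6)
y = np.array([0.3, -0.1, 0.4, 0.2, 0.6, 0.5])
beta = np.array([0.1, 0.4])
sigma2, d = 0.04, 0.2

mean_at_data, var_at_data = conditional_prediction(x, x, y, beta, sigma2, d)
mean_mid, var_mid = conditional_prediction(np.array([0.1]), x, y, beta, sigma2, d)
print(np.abs(mean_at_data - y).max(), np.abs(var_at_data).max())
print(mean_mid, var_mid)
```

The result holds for any value of `beta` and `sigma2`, which is why every posterior realization passes through the data.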
Discussions on the posterior prediction
Choice of correlation factor
– The shape of the predictive distribution depends largely on the value of d.
– For small d, i.e., weak correlation, the values of y(x) at adjacent x are nearly independent of each other, which gives a result closer to classical regression.
– For large d, the strong correlation between adjacent points produces the distinctive knotted, smoother shape seen in the figure, because the prediction still passes through the data.
[Figure: panels d = 0.01, d = 0.1, d = 0.5]
Discussions on the posterior prediction
Choice of correlation factor
– In Bayesian inference, d can also be treated as an unknown. From the joint posterior pdf, samples of d are obtained along with those of $\beta$ and $\sigma^2$.
– In this case the posterior of d has no standard form to sample from directly, so MCMC is used.
[Figure: panels d = 0.01, d = 0.1, d = 0.5]
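As one possible way to treat $d$ as an unknown, the sketch below runs a random-walk Metropolis sampler on $\log d$. It makes several assumptions that are not in the slides: the prior $p(\beta, \sigma^2) \propto \sigma^{-2}$, a prior on $d$ that is uniform in $\log d$ between hypothetical bounds, and analytical integration over $\beta$ and $\sigma^2$ so that only $d$ is sampled by MCMC. Under these assumptions the marginal posterior is $p(d \mid y) \propto |Q|^{-1/2}\, |X^T Q^{-1} X|^{-1/2}\, (s^2)^{-(n-k)/2}$. The data are synthetic.

```python
import numpy as np

def log_marginal_posterior(d, x, y):
    """log p(d | y) up to a constant, with beta and sigma2 integrated out
    (assumes p(beta, sigma2) proportional to 1/sigma2 and a flat prior on log d)."""
    X = np.column_stack([np.ones_like(x), x])
    n, k = X.shape
    Q = np.exp(-((x[:, None] - x[None, :]) / d) ** 2) + 1e-10 * np.eye(n)  # small jitter
    A = X.T @ np.linalg.solve(Q, X)
    beta_hat = np.linalg.solve(A, X.T @ np.linalg.solve(Q, y))
    r = y - X @ beta_hat
    s2 = r @ np.linalg.solve(Q, r) / (n - k)
    if not np.isfinite(s2) or s2 <= 0.0:
        return -np.inf                          # reject numerically invalid values
    _, logdet_Q = np.linalg.slogdet(Q)
    _, logdet_A = np.linalg.slogdet(A)
    return -0.5 * logdet_Q - 0.5 * logdet_A - 0.5 * (n - k) * np.log(s2)

def metropolis_d(x, y, n_draws=2000, d0=0.1, step=0.3, bounds=(0.01, 0.5), seed=0):
    """Random-walk Metropolis on log d; bounds define the assumed prior support."""
    rng = np.random.default_rng(seed)
    lo, hi = np.log(bounds[0]), np.log(bounds[1])
    log_d = np.log(d0)
    lp = log_marginal_posterior(d0, x, y)
    chain = np.empty(n_draws)
    for i in range(n_draws):
        prop = log_d + step * rng.standard_normal()
        if lo <= prop <= hi:                    # proposals outside the prior are rejected
            lp_prop = log_marginal_posterior(np.exp(prop), x, y)
            if np.log(rng.uniform()) < lp_prop - lp:
                log_d, lp = prop, lp_prop
        chain[i] = np.exp(log_d)
    return chain

# Synthetic data (assumed; these are not the slide's data).
x = np.linspace(0.0, 1.0, 6)
y = 1.0 + 0.5 * x + 0.05 * np.array([1, -1, 1, -1, 1, -1])
chain = metropolis_d(x, y)
print(np.median(chain))
```

Given each sampled value of $d$, draws of $\sigma^2$ and $\beta$ can then be generated with the factorization approach for known $d$, which yields samples from the full joint posterior.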
Practice example
Sampling by MCMC
– This will be implemented and discussed depending on availability.
– Only the results are shown here.
[Figure: panels d unknown, d = 0.1, d = 0.2]
Kriging surrogate model