1
Multivariate Linear Regression
BMTRY 726 7/10/2018
2
Linear Regression Analysis
We are interested in predicting values of one or more responses from a set of predictors. Regression analysis is an extension of what we discussed with ANOVA and MANOVA, and allows for inclusion of continuous predictors in place of (or in addition to) treatment indicators in MANOVA.
3
Why Use Such Models
There are many reasons to consider regression approaches. Models are simple and interpretable. Linear models can outperform non-linear methods when there are a limited number of training observations or a low signal-to-noise ratio. Such models can also be made more flexible (i.e. non-linear) by applying transformations to the data, for example the use of polynomials.
4
f(x) for Linear Regression
Given our features, x, the regression function takes the following form. Recall we want to identify the estimate of f(x) that minimizes the prediction error for the output. We can define a loss function L(Y, f(x)); the most common choice of loss function for regression is L2, squared error loss.
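In the usual notation, the regression function and squared error loss referred to here are:
f(x) = b0 + b1 x1 + b2 x2 + … + bp xp
L(Y, f(x)) = (Y - f(x))^2 (squared error loss)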
5
Notation & Data
Consider a set of j = 1, 2, …, p variables (or features) collected in a study and an outcome y, where i = 1, 2, …, n indexes the samples.
6
Univariate Regression Analysis
Univariate regression models a single response y as a mean dependent on a set of independent predictors z_i plus random error e_i. Model assumptions:
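A sketch of the standard form, in the notation above, with r predictors:
y_i = b0 + b1 z_i1 + b2 z_i2 + … + br z_ir + e_i, i = 1, …, n
with E(e_i) = 0, Var(e_i) = s^2, and Cov(e_i, e_k) = 0 for i ≠ k.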
7
Least Squares Estimation
Based on this loss function, we can develop an estimate of f(x) by finding the value that minimizes the loss.
8
Least Squares Estimation
This approach is referred to as the method of least squares for estimating our model parameters.
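In matrix form, with Z the n × (r + 1) design matrix, least squares minimizes (y - Zb)'(y - Zb), giving the familiar solution
b̂ = (Z'Z)^{-1} Z'y, with fitted values ŷ = Zb̂ = Z(Z'Z)^{-1} Z'y.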
9
Geometry of Least Squares
10
Least Squares Estimation
Based on this loss function, we can develop an estimate of f(x) by finding the value that minimizes the loss.
11
Least Squares Estimation
We estimate the variance using the residuals
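With r predictors plus an intercept, the usual unbiased estimate is
s^2 = (y - Zb̂)'(y - Zb̂) / (n - r - 1) = SSE / (n - r - 1).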
12
Least Squares Estimation
13
LRT for individual bi’s
First we may test whether any predictors affect the response. The LRT is based on the difference in sums of squares between the full and null models…
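For reference, the familiar F form of this comparison (full model with r predictors vs. the intercept-only null model) is
F = [(SS_null - SS_full) / r] / [SS_full / (n - r - 1)], compared with F_{r, n-r-1} under H0: b1 = … = br = 0.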
14
LRT for individual bi’s
Difference in SS between the full and null models…
16
Model Building
If we have a large number of predictors, we want to identify the “best” subset. There are many methods of selecting the “best”:
-Examine all possible subsets of predictors
-Forward stepwise selection
-Backward stepwise selection
-Shrinkage approaches
17
Model Building
Though we can consider predictors that are significant, this may not yield the “best” subset (some models may yield similar results). The “best” choice is made by examining some criterion:
-R^2
-adjusted R^2
-Mallows’ Cp
-AIC
Since R^2 increases as predictors are added, Mallows’ Cp and AIC are better choices for selecting the “best” predictor subset (their usual forms are sketched below).
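With p the number of fitted parameters and s^2 estimated from the full model, the usual forms are
Cp = SSE_p / s^2 - (n - 2p)
AIC = n log(SSE_p / n) + 2p (up to an additive constant)
adjusted R^2 = 1 - (1 - R^2)(n - 1) / (n - p).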
18
Model Checking
Always good to check if the model is “correct” before using it to make decisions… Information about fit is contained in the residuals. If the model fits well, the estimated error terms should mimic N(0, s^2). So how can we check?
19
Model Checking Studentized residuals plot
Plot residuals versus predicted values (see the sketch below):
-Ideally points should be scattered (i.e. no pattern)
-If a pattern exists, it can point to a problem with the model
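A minimal Python sketch of this plot, together with the QQ plot of studentized residuals, using statsmodels; the data here are simulated purely for illustration:

import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
Z = rng.normal(size=(50, 2))                        # illustrative predictors
y = 1 + Z @ np.array([2.0, -1.0]) + rng.normal(size=50)

fit = sm.OLS(y, sm.add_constant(Z)).fit()
stud = fit.get_influence().resid_studentized_internal

fig, ax = plt.subplots(1, 2, figsize=(9, 4))
ax[0].scatter(fit.fittedvalues, stud)               # residuals vs fitted: want no pattern
ax[0].axhline(0, color="grey")
ax[0].set(xlabel="Fitted values", ylabel="Studentized residuals")
stats.probplot(stud, dist="norm", plot=ax[1])       # QQ plot: should track the line if ~ N(0, 1)
plt.show()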
20
Model Checking Plot residuals versus predictors
QQ plot of studentized residuals plot
21
Model Checking
While residual analysis is useful, it may miss outliers, i.e. observations that are very influential on predictions.
Leverage:
-How far is the jth observation from the others?
-How much pull does j exert on the fit?
Observations that affect inferences are influential.
22
Collinearity
If Z is not of full rank, some linear combination Za of the columns of Z equals 0. In such a case the columns are collinear, and the inverse of Z’Z does not exist. It is rare that Za is exactly 0, but if a combination exists that is nearly zero, (Z’Z)^{-1} is numerically unstable. This results in very large estimated variances of the model parameters, making it difficult to identify significant regression coefficients.
23
Collinearity We can check for severity of multicollinearity using the variance inflation factor (VIF)
24
Extending Univariate Regression
Consider a regression problem now where we have m outcomes for each of our n individuals and want to find the association with r predictors. If we assume that each response follows its own regression model, we get the multivariate regression model.
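A sketch of its stacked matrix form, in the notation of these slides:
Y = ZB + e
where Y is n × m (one column per outcome), Z is n × (r + 1) (predictors plus intercept), B is (r + 1) × m (one coefficient column per outcome), and the rows of e have mean 0 and common covariance matrix S.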
25
In Classical Regression Terms
When we write the model this way, we can see it is equivalent to classical linear regression.
26
In Classical Regression Terms
If we consider the ith response, we can estimate b. What about variance?
27
Sum of Squares
It follows from the univariate case that the SSE is… We want to minimize the SSE, so our solution for the ith outcome is… Our sum of squares decomposition for the model is…
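For reference, the standard least squares results in matrix form are
B̂ = (Z'Z)^{-1} Z'Y (each column is the univariate solution for that outcome),
Ŷ = ZB̂, ê = Y - Ŷ, and
Y'Y = Ŷ'Ŷ + ê'ê (total = model + error sums of squares and cross-products).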
28
Example Let’s develop our regression equations for the following data
[Data table: z1 = 1, 2, 3, 4, …; y1 = 8, 9, …; y2 = -1, …]
29
Example Let’s develop our regression equations for the following data
[Data table: z1 = 1, 2, 3, 4, …; y1 = 8, 9, …; y2 = -1, …]
30
Example Let’s develop our regression equations for the following data
[Data table: z1 = 1, 2, 3, 4, …; y1 = 8, 9, …; y2 = -1, …]
31
Example Use all this information to find our sum of squares…
32
Properties The same properties we had in univariate regression hold here
33
Properties The same properties we had in univariate regression hold here
34
Estimate of S
35
So far we haven’t made any assumptions about the distribution of e… what if normality holds?
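Under multivariate normality of the errors, the MLEs take the standard form:
B̂ = (Z'Z)^{-1} Z'Y (the same as the least squares solution), and
Ŝ = (1/n) ê'ê = (1/n)(Y - ZB̂)'(Y - ZB̂),
with the unbiased version dividing by n - r - 1 instead of n.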
36
LRT What do we do with this information? Naturally we can develop LRT for our regression parameters
37
Other Hypotheses?
Consider specific hypotheses about the association between our predictors and our p outcomes in Y:
-Hypotheses about levels of a categorical predictor
-Hypotheses about the magnitude of a predictor’s effect on multiple outcomes
MV regression offers the opportunity to evaluate whether or not predictors have a similar impact on correlated outcomes.
38
Inference
We can make certain inferences about the elements of our parameter matrix using the generalized LRT procedure (Wilks, 1932):
-Make comparisons across groups (rows of b)
-Make comparisons across traits
39
Compute the corrected sums of squares and cross-products matrix for the model:
As we’ve noted previously, this reduces to… When M = I, the diagonal elements of H are the model sums of squares.
40
Compute a matrix of residual sums of squares and cross-products:
When M = I:
-Diagonal elements of E are sums of the squared residuals
-Off-diagonal elements are sums of cross-products of residuals
41
Likelihood Ratio Test
Reject the null hypothesis if Wilks’ criterion is too small. Large-sample chi-square approximation: reject H0 if…
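In terms of the hypothesis and error matrices H and E defined above, one common form (following Johnson & Wichern; m outcomes, r predictors in the full model, q in the reduced model) is
Λ* = |E| / |E + H|,
reject H0 at level a if -[n - r - 1 - (m - r + q + 1)/2] ln Λ* > χ²_{m(r-q)}(a).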
42
A more accurate approximation is given by Rao (C.R. Rao (1951), Bull Int Stat Inst 33(2)). When H0 is true, Rao’s statistic follows an approximate F distribution, where…
43
Exact distribution of Wilks’ criterion:
Assumptions: (1) independence, (2) homogeneity of covariance matrices, (3) multivariate normality. When H0 is true: … when either … (Table 6.3 in J & W).
44
Example: DNA Methylation
An investigator wants to know if exposure to environmental contaminants impacts DNA methylation in humans. Pilot study on 11 subjects.
Outcomes:
-% methylated DNA
-% hydroxymethylated DNA
Serum levels of exposure:
-Multiple perfluorinated compounds (we will consider PFNA)
46
Model of DNA Methylation
What does the model look like?
47
Model of DNA Methylation
Consider the LRT to evaluate the coefficients for PFNA
48
Model of DNA Methylation
Alternative approach to evaluate the coefficients for PFNA?
49
Model of DNA Methylation
Alternative approach to evaluate the coefficients for PFNA?
50
Model of DNA Methylation
What if the PI wants to evaluate if the impact of PFNA on each outcome is the same? What is the hypothesis?
51
Model of DNA Methylation
Let’s test this hypothesis…
52
Model of Methylation on PFNA
Let’s test this hypothesis…
53
Model of Methylation on PFNA
What did we fail to consider in examining whether the impact of PFNA on each outcome is the same?
54
Model of Methylation on PFNA
How do our results/test change to address these issues?
55
Predictions from Multivariate Regression
Often we are interested in using regression models to make predictions for new data. We can do this with MV regression models…
56
Predictions from Multivariate Regression
We can use this information to construct 100(1-a)% confidence regions
57
Predictions from Multivariate Regression
We can also construct 100(1-a)% prediction regions
58
Predictions from Multivariate Regression
We can also construct 100(1-a)% confidence Intervals and prediction intervals
59
Concept of Linear Regression
Up until now, we have been focusing on fixed covariates. Suppose instead that the response and all covariates are random with some joint distribution. What if we want to predict Y using Z?
60
Concept of Linear Regression
We select b0 and b to minimize the MSE, E(Y - b0 - b’Z)^2. The MSE is minimized over b0 and b when…
[Figure: Y, the linear predictor b0 + b’Z, and the error Y - b0 - b’Z plotted against Z]
61
We select b0 and b to minimize the MSE. The MSE is minimized over b0 and b when… where…
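In the standard notation, with μ and Σ the joint mean and covariance of (Y, Z), the minimizers are
b = Σ_ZZ^{-1} σ_ZY and b0 = μ_Y - b’μ_Z,
so the best linear predictor is b0 + b’Z = μ_Y + σ_ZY’ Σ_ZZ^{-1}(Z - μ_Z),
with minimized MSE σ_YY - σ_ZY’ Σ_ZZ^{-1} σ_ZY.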
62
So how do we use this? Useful if we want to use Z to interpret Y…
63
We’ve made no distributional assumptions so far. If a general form of f(Z) is used to approximate Y such that E[Y - f(Z)]^2 is minimized, what will f(Z) be? Special case: Y and Z are jointly normal:
64
Example Find the MLE of the regression function for a single response
65
Example Find the best linear predictor, its mean square error, and the multiple correlation coefficient (assume n = 10)
66
Prediction of Several Variables
What if we are considering more than a single response? Consider responses Y1, Y2, …, Ym (assumed MVN). It is easy to see that the regression equation takes the form…
67
Prediction of Several Variables
The maximum likelihood estimators look very similar to the single response case…
68
Example Find the MLE of the regression function for two responses
69
Example Find the MLE of the regression function for two responses
70
Partial Correlation
We may also be interested in determining the association between the Y’s after removing the effect of Z. We can define a partial correlation between the Y’s removing the effect of Z as follows. The corresponding sample partial correlation coefficient is:
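In the usual notation, with Σ_YY·Z = Σ_YY - Σ_YZ Σ_ZZ^{-1} Σ_ZY the covariance of the Y’s after removing the effect of Z,
ρ_{ik·Z} = σ_{ik·Z} / sqrt(σ_{ii·Z} σ_{kk·Z}),
and the sample version uses the corresponding elements of the estimated Σ_YY·Z.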
71
Testing Correlations May be interested in determining whether all correlations are 0
72
Testing Correlations
We then consider the -2 log likelihood (using a large-sample approximation). Bartlett correction:
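The standard corrected statistic, for p variables with sample correlation matrix R: reject H0 (all correlations are 0) if
-[n - 1 - (2p + 5)/6] ln|R| > χ²_{p(p-1)/2}(a).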
73
Testing Correlations: Example
Say we have an estimated correlation matrix and want to test if all correlations are 0
74
Inference for Individual Correlations
Now what if we are interested in testing whether individual or partial correlations are 0? Using the sample covariance matrix we can compute a t-test.
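For a single sample correlation r, the standard statistic is
t = r sqrt(n - 2) / sqrt(1 - r^2), compared with t_{n-2} under H0: ρ = 0;
for a partial correlation given k other variables, n - 2 is replaced by n - 2 - k.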
75
Inference for Individual Correlations
We can also find an approximate (1-a)100% CI for correlation:
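The usual construction uses Fisher’s z-transformation:
z = (1/2) ln[(1 + r) / (1 - r)], which is approximately N((1/2) ln[(1 + ρ) / (1 - ρ)], 1/(n - 3)),
so an approximate (1 - a)100% CI for ρ is tanh(z ± z_{a/2} / sqrt(n - 3)).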
76
Example From our earlier correlation matrix r13 = 0.24: