Assumptions of Multiple Regression

Assumptions of Multiple Regression
1. Form of relationship:
– Linear vs. nonlinear
– Main effects vs. interaction effects
2. All relevant variables present
– Perfect reliability of predictors
3. Homoscedasticity (homogeneity) of errors of prediction
4. Independence of errors of prediction
5. Normality of errors of prediction (residuals)

REMEDIES 1
Linear vs. nonlinear modeling:
– Linearity means, for our purposes, that the terms in a model are added together and that each term has only one parameter attached to it
– E.g., y = b1 x1 + b2 x2 + b0 + e is linear because each variable (x1, x2, the intercept, e) has only one parameter with it: b1, b2, b0, and 1, respectively
– An alternative definition of nonlinearity is that x1 (or any of the other variables) is actually best modeled by some transformation, such as x1^2

FORM OF RELATIONSHIP
Linearity: fit a lowess (locally weighted regression) line to the scatterplot to provide evidence for or against a linear relationship

Relationship problems in MR Counseling
Linearity: transform using power functions:
– If X is curvilinear in relation to Y, change X to X^2: Y = b1 X^2 + e
– If there is a possible interaction effect of two predictors, X1 and X2, create a new variable equal to their product (chapter 7)
– Transform Y using log(Y) or SQRT(Y) in SPSS: Compute

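REMEDIES 1
x1^2 is still linear in the sense of the first definition, but when we plot it we see a different prediction line.
[Figure: y plotted against x1 with the linear prediction line (b linear), and y plotted against x1^2 with the quadratic prediction line (b quadratic)]
Data:
y     x     x^2
1     1     1
4     2     4
9     3     9
16    4     16
9     5     25
4     6     36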
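The contrast between the two prediction lines can be checked numerically on the six data points from this slide. The following is a minimal sketch, assuming NumPy is available; the helper name `fit_r_squared` is hypothetical, and the quadratic model here includes both x and x^2 (an assumption; the earlier slide writes the simpler Y = b1 X^2 + e).

```python
import numpy as np

# Data copied from the slide's table (y and x); x^2 is computed below.
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([1, 4, 9, 16, 9, 4], dtype=float)


def fit_r_squared(design, outcome):
    """Fit by ordinary least squares; return (coefficients, R-square)."""
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    residuals = outcome - design @ coefs
    ss_res = float(residuals @ residuals)
    ss_tot = float(((outcome - outcome.mean()) ** 2).sum())
    return coefs, 1.0 - ss_res / ss_tot


ones = np.ones_like(x)
linear_design = np.column_stack([ones, x])             # b0 + b1*x
quadratic_design = np.column_stack([ones, x, x ** 2])  # b0 + b1*x + b2*x^2

b_linear, r2_linear = fit_r_squared(linear_design, y)
b_quadratic, r2_quadratic = fit_r_squared(quadratic_design, y)

print("linear R-square:   ", round(r2_linear, 3))
print("quadratic R-square:", round(r2_quadratic, 3))
```

Both models are linear in their parameters, so the same least-squares routine fits each; only the columns of the design matrix differ.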

Main Effects and Interactions
Main effects: the effect of one predictor is consistent across all values of the second predictor
Standard regression model: y = b1 x1 + b2 x2 + b0
Interaction: an effect additional to the main effects
Defined as the product of two variables: x1 * x2 becomes a new predictor variable
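As an illustration of the product-term idea, here is a minimal sketch, assuming NumPy and made-up noise-free data. The true weights (1.0, 2.0, -1.0, 0.5) are hypothetical values chosen only so the recovered interaction weight can be checked.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)

# Hypothetical true model with an interaction weight of 0.5 (no error term).
y = 1.0 + 2.0 * x1 - 1.0 * x2 + 0.5 * x1 * x2

# The interaction is simply a new predictor: the product of x1 and x2.
interaction = x1 * x2
design = np.column_stack([np.ones_like(x1), x1, x2, interaction])

b, *_ = np.linalg.lstsq(design, y, rcond=None)
b0, b1, b2, b3 = b
print("main effects:", round(b1, 3), round(b2, 3), " interaction:", round(b3, 3))
```

With the product term in the model, b1 and b2 describe the effect of each predictor when the other equals zero, which is why the interaction is read as an effect additional to the main effects.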

2. The Missing Variable
ALL models can change seriously if we are missing an important variable Z:
– b-weights will usually change when Z is added if Z is correlated with the predictors already in the model
– The standard error of estimate and the standard errors of the b-weights will be reduced if Z is related to Y and not to the other predictors
– All of the above will change depending on the combination of relationships

Relevant Variables
Theory, theory, theory
Test additional variables for change in R-square, change in the b- and beta weights of the remaining predictors, and change in residual characteristics.

Omitted relevant variable
Effect of the omitted predictor on beta weights, R-square, and path coefficients
Added-variable plots: regress the new predictor on the original predictors, then graph its errors of prediction against the errors of prediction of Y from the original predictors
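The two sets of residuals behind an added-variable plot can be computed directly. This is a sketch under stated assumptions: NumPy is available, the data are simulated, and the names `x1`, `x2`, `z`, and `residuals` are hypothetical. It also checks a known property: the slope of the line through the plot equals the b-weight that Z would receive in the full model.

```python
import numpy as np


def residuals(design, target):
    """Errors of prediction of `target` from the columns of `design`."""
    coefs, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coefs


rng = np.random.default_rng(1)
n = 300
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
z = 0.6 * x1 + rng.normal(size=n)  # candidate predictor, correlated with x1
y = 1.0 + 0.5 * x1 - 0.3 * x2 + 0.4 * z + rng.normal(size=n)

original = np.column_stack([np.ones(n), x1, x2])
e_y = residuals(original, y)  # vertical axis of the plot
e_z = residuals(original, z)  # horizontal axis of the plot

# Slope of the regression line through the added-variable plot.
av_slope = float(e_z @ e_y) / float(e_z @ e_z)

# b-weight of z when it is added to the original model.
full = np.column_stack([original, z])
b_full, *_ = np.linalg.lstsq(full, y, rcond=None)
print(round(av_slope, 4), round(float(b_full[3]), 4))
```

A flat cloud in the plot therefore indicates that the candidate variable adds little beyond the predictors already in the model.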

Measurement error in IVs
Validity (the correlation between predictor and outcome) is attenuated if the predictor is measured unreliably
The disattenuated correlation between predictor and dependent variable is estimated by dividing the observed correlation by the square root of the reliability of the predictor (and also by the square root of the reliability of the dependent variable if we want a construct-level estimate of the correlation)
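The correction described on this slide amounts to one line of arithmetic. A minimal sketch follows; the function name `disattenuate` and the example values (observed r = 0.40, reliabilities 0.64 and 0.81) are hypothetical.

```python
import math


def disattenuate(r_xy, rel_x, rel_y=1.0):
    """Observed correlation divided by the square root of the reliabilities.

    Leave rel_y at 1.0 to correct for unreliability in the predictor only;
    pass the reliability of the dependent variable for a construct-level estimate.
    """
    if not (0.0 < rel_x <= 1.0 and 0.0 < rel_y <= 1.0):
        raise ValueError("reliabilities must be in the interval (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical values: observed r = 0.40, predictor reliability = 0.64.
r_predictor_only = disattenuate(0.40, rel_x=0.64)          # 0.40 / 0.8
r_construct = disattenuate(0.40, rel_x=0.64, rel_y=0.81)   # 0.40 / (0.8 * 0.9)
print(round(r_predictor_only, 3), round(r_construct, 3))
```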

Measurement error in IVs
Structural Equation Modeling (SEM) correctly estimates the standard errors of parameters under ML estimation, but assumes large samples (at least 100)
OLS does not correctly adjust the standard error of the b-weight when the disattenuated correlation is used

Omitted relevant variable
[Figure: scatterplot with LLR smoother, "Error of Initial Prediction by Error of Social Stress Prediction". x-axis: Residual of Social Stress; y-axis: Residual in Attitude to School. Predictors: Sensation Seeking, Locus of Control, Self Reliance]

Omitted relevant variable
[Figure: scatterplot with linear regression line, "Error of Initial Prediction by Error of Social Stress Prediction". x-axis: Residual of Social Stress; y-axis: Residual of Attitude to School. Fitted line: Unstandardized Residual = -0.00 + 0.25 * RES_2, R-Square = 0.01. Predictors: Sensation Seeking, Locus of Control, Self Reliance]

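Measurement Error
Requires a Structural Equation Model approach to include measurement error in the predictors
– Requires an independent estimate of the reliability of each predictor, appropriate for the sample
[Path diagram: T1 and T2 predict T3 through paths β31 and β32; the paths from errors e1 and e2 are labeled √(1 - Rel. T1) and √(1 - Rel. T2); T3 has error e3]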

MRA ASSUMPTIONS-3
Homoscedasticity: errors of the prediction hyperplane are assumed homogeneous across the hyperplane (think of normal distributions of errors sticking out above and below the hyperplane).
[Figure: prediction plane for y over the predictor values for x1 and x2]

3. Nonhomogeneity of Variance
If variances are unequal due to known groupings or clustering, each group’s variance can be estimated separately in SEM to fit the regression model correctly
If variances change linearly with a predictor (or set of predictors):
– Weighted least squares can be used
– Nonlinear modeling (hierarchical linear model) of the variance can be conducted using SEM
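Weighted least squares can be sketched as ordinary least squares on rescaled rows. The example below assumes NumPy, simulated data, and an error variance proportional to x, so that each case is weighted by 1/x; in practice the weights would have to come from an estimate of how the variance changes.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
x = rng.uniform(1.0, 10.0, size=n)

# Hypothetical data: error variance grows linearly with x (sd = sqrt(x)).
y = 2.0 + 0.5 * x + rng.normal(size=n) * np.sqrt(x)

design = np.column_stack([np.ones(n), x])
weights = 1.0 / x  # weight = 1 / assumed error variance of each case

# WLS is OLS after multiplying every row by the square root of its weight.
root_w = np.sqrt(weights)
b_wls, *_ = np.linalg.lstsq(design * root_w[:, None], y * root_w, rcond=None)

# Unweighted fit for comparison.
b_ols, *_ = np.linalg.lstsq(design, y, rcond=None)
print("WLS b-weights:", np.round(b_wls, 3))
print("OLS b-weights:", np.round(b_ols, 3))
```

Cases with large error variance receive small weights, so they pull less on the fitted line than they would under OLS.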

Homoscedasticity of residuals:
SPSS Regression: SAVE: Unstandardized Predicted Values, Unstandardized Residuals
SPSS Graph: Interactive: Scatterplot: Fit: Smoother
[Figure: scatterplot with LLR smoother, titled "Error of Initial Prediction by Error of Social Stress Prediction". x-axis: Unstandardized Predicted Value; y-axis: Unstandardized Residual]
[Figure: the same scatterplot with a linear regression line, titled "Error of Initial Prediction by Error of Social Stress Prediction". Fitted line: Unstandardized Residual = 0.00 + -0.00 * PRE_1, R-Square = 0.00]

Error bars suggest increasing variance with age and poor estimates for ages 11 and 19

Nonhomogeneous Variance Correction
Levene’s test, Brown-Forsythe correction for ANOVA
No comparable correction exists for regression; divide the predictor into 5 subgroups based on its distribution and use ANOVA
Use Weighted Least Squares (see text)
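The subgroup approach can be sketched as follows: sort cases on the predictor, cut the residuals into 5 equal-sized groups, and run a one-way ANOVA on absolute deviations from each group's median (the Brown-Forsythe variant of Levene's test). This is an illustration under assumptions: the data are simulated, the function names are hypothetical, and only the F statistic and its degrees of freedom are computed, not a p-value.

```python
import random
import statistics


def brown_forsythe_f(groups):
    """One-way ANOVA F on absolute deviations from each group's median."""
    deviations = [[abs(v - statistics.median(g)) for v in g] for g in groups]
    n_total = sum(len(d) for d in deviations)
    k = len(deviations)
    grand_mean = sum(sum(d) for d in deviations) / n_total
    ss_between = sum(
        len(d) * (statistics.fmean(d) - grand_mean) ** 2 for d in deviations
    )
    ss_within = sum(
        (v - statistics.fmean(d)) ** 2 for d in deviations for v in d
    )
    f_value = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    return f_value, k - 1, n_total - k


def split_by_predictor(x, residuals, n_groups=5):
    """Sort cases on x and cut the residuals into nearly equal-sized groups."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    size = len(order) / n_groups
    return [
        [residuals[i] for i in order[round(j * size):round((j + 1) * size)]]
        for j in range(n_groups)
    ]


random.seed(3)
x = [random.uniform(0, 10) for _ in range(500)]
# Hypothetical residuals whose spread grows with x (heteroscedastic).
residuals = [random.gauss(0, 0.5 + 0.5 * xi) for xi in x]

groups = split_by_predictor(x, residuals)
f_stat, df1, df2 = brown_forsythe_f(groups)
print("F =", round(f_stat, 2), "with df", df1, "and", df2)
```

A large F relative to the F distribution with these degrees of freedom suggests that the residual spread differs across levels of the predictor.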

Dependent Errors
Clustering: data occur in groups (classrooms, clinics, etc.) so that errors are more similar within clusters than between clusters
Autocorrelation: data are correlated within person over time (or between clustered pairs, triplets, etc.)

Dependent Errors
Clustering becomes a category variable in the analysis through dummy coding; separate group models can be constructed through SEM, or the category variable can be included in the path model (which is better depends on whether the variances are homogeneous)

Dependent Errors
Autocorrelation is computed for time data by correlating pairs of data; each pair is the data at one time point with its successor, e.g. (x1, x2), (x2, x3), (x3, x4), etc.
The autocorrelation r, obtained from the regression of y on itself (its own lagged values), is used to form a new dependent variable: y_t - r y_(t-1) = y*_t
The new dependent variable y*_t is now modeled in a regression
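The pairing and the transformed dependent variable can be written out in a few lines. This is a minimal sketch in plain Python; the series `y` is made up for illustration and the function names are hypothetical.

```python
def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


def lag1_autocorrelation(series):
    """Correlate each value with its successor: (y1, y2), (y2, y3), ..."""
    return pearson(series[:-1], series[1:])


def quasi_difference(series, r):
    """New dependent variable y*_t = y_t - r * y_(t-1), for t = 2 .. n."""
    return [series[t] - r * series[t - 1] for t in range(1, len(series))]


# Hypothetical repeated measurements on one person.
y = [1.0, 2.0, 1.5, 2.5, 2.0, 3.0, 2.5, 3.5]
r = lag1_autocorrelation(y)
y_star = quasi_difference(y, r)
print("r =", round(r, 3))
print("y* =", [round(v, 3) for v in y_star])
```

The transformed series has one fewer observation than the original, because the first time point has no predecessor.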

Dependent Errors
Means correlation among errors in subsets of the data (e.g., some siblings within an adolescent sample for the BASC)
– Likely a problem only if the related cases are a meaningful fraction of the total sample size
Difficult to detect, because it is hard to separate from random pairings of errors
Other data (cluster information) are needed, such as age for BASC Anxiety

Dependent Errors
Time-related or longitudinal data may have autocorrelation (correlation over time of residuals); the Durbin-Watson test gives an omnibus test; ARMA model given below:
[Path diagram: Time1, Time2, Time3, ... with errors e2, e3, ...]
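The Durbin-Watson statistic itself is simple to compute from time-ordered residuals: the sum of squared successive differences divided by the sum of squared residuals. The sketch below uses made-up residuals. Values near 2 indicate no first-order autocorrelation, values toward 0 indicate positive autocorrelation, and values toward 4 indicate negative autocorrelation.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals in time order.
alternating = [1.0, -1.0, 1.0, -1.0]  # strong negative autocorrelation
constant = [1.0, 1.0, 1.0, 1.0]       # strong positive autocorrelation

dw_alternating = durbin_watson(alternating)
dw_constant = durbin_watson(constant)
print(dw_alternating, dw_constant)
```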

MRA ASSUMPTIONS-5
NORMALITY OF RESIDUALS: Violation of this assumption is not a problem
– unless SKEWNESS is severe (> ±4 or 5, maybe even larger) and
– KURTOSIS is severe (> 3)
Combinations of skewness and kurtosis at the edge of these values may cause problems
Effects of violation:
– Type I error rates increase, sometimes greatly
– Estimates can be biased
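Skewness and kurtosis of saved residuals can be computed from their central moments. The sketch below makes two assumptions worth flagging: it uses the simple moment formulas rather than the small-sample-corrected versions that statistical packages typically report, and it returns excess kurtosis (a normal distribution scores 0). The residual values are made up.

```python
def central_moment(values, order):
    """Average of (value - mean) raised to the given power."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Third central moment divided by the 1.5 power of the second."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth central moment over the squared second, minus 3."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


# Hypothetical residuals with one large positive value.
residuals = [-1.2, 0.3, 0.8, -0.4, 6.0, -0.7, 0.1, -0.9, 0.5, -0.4]
print("skewness:", round(skewness(residuals), 3))
print("excess kurtosis:", round(excess_kurtosis(residuals), 3))
```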

Normality of residuals
[Figure: histogram. x-axis: Residual of Predicted Attitude to School; y-axis: Count]

Q-Q Plot
Plots the quantiles of a variable's distribution against the quantiles of any of a number of test distributions. Probability plots are generally used to determine whether the distribution of a variable matches a given distribution. If the selected variable matches the test distribution, the points cluster around a straight line. Available test distributions include beta, chi-square, exponential, gamma, half-normal, Laplace, logistic, lognormal, normal, Pareto, Student's t, Weibull, and uniform. Depending on the distribution selected, you can specify degrees of freedom and other parameters. You can also obtain probability plots for transformed values. Transformation options include natural log, standardize values, difference, and seasonally difference. You can specify the method for calculating expected distributions, and for resolving "ties," or multiple observations with the same value.
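The points of a normal Q-Q plot can be computed without any plotting software. This sketch uses only the Python standard library; the plotting positions (i - 0.5) / n are one common convention and are an assumption here, as are the made-up residuals and the function name.

```python
from statistics import NormalDist, fmean, stdev


def normal_qq_points(values):
    """Return (observed, expected) pairs for a normal Q-Q plot."""
    ordered = sorted(values)
    n = len(ordered)
    # Test distribution: normal with the sample's own mean and SD.
    reference = NormalDist(mu=fmean(ordered), sigma=stdev(ordered))
    # Assumed plotting positions: (i - 0.5) / n for the i-th ordered value.
    expected = [reference.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(ordered, expected))


# Hypothetical residuals of prediction.
residuals = [-1.2, 0.3, 0.8, -0.4, 1.9, -0.7, 0.1, -0.9, 0.5, -0.4]
points = normal_qq_points(residuals)
for observed, expected in points:
    print(round(observed, 2), round(expected, 2))
```

If the residuals are approximately normal, the observed and expected values in each pair are close, which is what clustering around a straight line means in the plot.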

Q-Q plot: SPSS GRAPH: Q-Q
[Figure: Normal Q-Q Plot of ATTITUDE TO SCHOOL, residual of prediction. x-axis: Observed Value; y-axis: Expected Normal Value]

P-P plots
Plots a variable's cumulative proportions against the cumulative proportions of any of a number of test distributions. Probability plots are generally used to determine whether the distribution of a variable matches a given distribution. If the selected variable matches the test distribution, the points cluster around a straight line. Available test distributions include beta, chi-square, exponential, gamma, half-normal, Laplace, logistic, lognormal, normal, Pareto, Student's t, Weibull, and uniform. Depending on the distribution selected, you can specify degrees of freedom and other parameters. You can also obtain probability plots for transformed values. Transformation options include natural log, standardize values, difference, and seasonally difference. You can specify the method for calculating expected distributions, and for resolving "ties," or multiple observations with the same value.

P-P Plot: SPSS: Graph: P-P Residual of prediction
