Download presentation
Presentation is loading. Please wait.
Published byCorey Griffith Modified over 9 years ago
1
LOGO Chapter 4 Multiple Regression Analysis Devilia Sari - Natalia
2
LOGO Learning Objectives Determine when regression analysis is the appropriate statistical tool in analyzing a problem. 1 Understand how regression helps us make predictions using the least squares concept. 2 Use dummy variables with an understanding of their interpretation. 3 Be aware of the assumptions underlying regression analysis and how to assess them. 4 Select an estimation technique & explain the difference between stepwise & simultaneous regression 5 Interpret the results of regression 6 Apply the diagnostic procedures necessary to assess “influential” observations. 7 Multiple Regression Analysis
3
LOGO Types of Regression Models Multiple Regression Analysis
4
LOGO Definition Multiple regression analysis... is a statistical technique that can be used to analyze the relationship between a single dependent (criterion) variable and several independent (predictor) variables. Multiple Regression Analysis
5
LOGO Linear Multiple Regression Model Dependent (response) variable Independent (explanatory) variables Population slopes Population Y-intercept Random error Relationship between 1 dependent & 2 or more independent variables is a linear function Multiple Regression Analysis
6
LOGO Population Linear Multiple Regression Model Bivariate model Multiple Regression Analysis
7
LOGO A Decision Process for Multiple Regression Analysis Objectives of Multiple RegressionResearch Design of Multiple RegressionAssumptions in Multiple Regression Analysis Estimating the Regression Model and Assessing Overall Fit Interpreting the Regression Variate Validation of the Results Stage 1: Stage 2: Stage 3: Stage 4: Stage 5: Stage 6: Multiple Regression Analysis
8
LOGO Stage 1 Objectives of Multiple Regression Research Problems Appropriate for Multiple Regression Prediction : to predict the dependent variable. Explanation : to examine the regression coefficients for each independent variable and attempts to develop a subtantive or theoretical reason for the effects of the independent variable. Specifying a Statistical Relationship [no functional relationship] Selection of Dependent and Independent Variables The theory that supports using the variables, Measurement error, especially in the dependent variable [Summated scales and Structural equation modeling procedures] Specification error, especially in the independent variable Multiple Regression Analysis
9
LOGO Rules of Thumb 4-1 Only structural equation modeling (SEM) can directly accommodate measurement error, but using summated scales can mitigate it when using multiple regression. When in doubt, include potentially irrelevant variables (as they can only confuse interpretation) rather than possibly omitting a relevant variable (which can bias all regression estimates). Meeting Multiple Regression Objectives Multiple Regression Analysis
10
LOGO Stage 2 Research Design of Multiple Regression Sample Size Statistical Power and Sample Size Generalizability and Sample Size Creating Additional Variables Incorporating Nonmetric Data with Dummy Variables Representing Curvilinear Effects with Polynomials Representing Interaction or Moderator Effects Multiple Regression Analysis
11
LOGO Rules of Thumb 4-2 Simple regression can be effective with a sample size of 20, but maintaining power at.80 in multiple regression requires a minimum sample of 50 and preferably 100 observations for most research situations. The minimum ratio of observations to variables is 5 to 1, but the preferred ratio is 15 or 20 to 1, and this should increase when stepwise estimation is used. Maximizing the degrees of freedom improves generalizability and addresses both model parsimony and sample size concerns. Sample Size Considerations Multiple Regression Analysis
12
LOGO Rules of Thumb 4-3 Non metric variables can only be included in a regression analysis by creating dummy variables. Dummy variables can only be interpreted in relation to their reference category. Adding an additional polynomial term represents another inflection point in the curvilinear relationship. Quadratic and cubic polynomials are generally sufficient to represent most curvilinear relationships. Assessing the significance of a polynomial or interaction term is accomplished by evaluating incremental R 2, not the significance of individual coefficients, due to high multicollinearity. Variable Transformations Multiple Regression Analysis
13
LOGO Stage 3 Assumptions in Multiple Regression Analysis Assessing Individual Variables Versus the Variate Methods of diagnosis [studentized residual] Linearity of the Phenomenon Constant Variance of the Error Term Independence of the Error Terms Normality of the Error Term Distribution Multiple Regression Analysis
14
LOGO Graphical Analysis of Residual Multiple Regression Analysis
15
LOGO Rules of Thumb 4-4 Testing assumptions must be done not only for each dependent and independent variable, but for the variate as well. Graphical analyses (i.e., partial regression plots, residual plots and normal probability plots) are the most widely used methods of assessing assumptions for the variate. Remedies for problems found in the variate must be accomplished by modifying one or more independent variables as described in Chapter 2. Assessing Statistical Assumptions Multiple Regression Analysis
16
LOGO Stage 4 Estimating the Regression Model & Assessing Overall Fit How to choose Regression Model? Use your substantive knowledge of research context Theoretical foundation Selecting an estimation technique Confirmatory Specification (simultaneous) Sequential Search Methods Stepwise (variables not removed once included in regression equation). Forward Inclusion & Backward Elimination. Hierarchical. Combinatorial Approach (All-Possible-Subsets) Testing the Regression Variate for Meeting the Regression Assumptions Multiple Regression Analysis
17
LOGO Rules of Thumb 4-5 No matter which estimation technique is chosen, theory must be a guiding factor in evaluating the final regression model because: Confirmatory Specification, the only method to allow direct testing of a pre-specified model, is also the most complex from the perspectives of specification error, model parsimony and achieving maximum predictive accuracy. Sequential search (e.g., stepwise), while maximizing predictive accuracy, represents a completely “automated” approach to model estimation, leaving the researcher almost no control over the final model specification. Combinatorial estimation, while considering all possible models, still removes control from the researcher in terms of final model specification even though the researcher can view the set of roughly equivalent models in terms of predictive accuracy. No single method is “Best” and the prudent strategy is to use a combination of approaches to capitalize on the strengths of each to reflect the theoretical basis of the research question. Estimation Techniques Multiple Regression Analysis
18
LOGO Types of Influential Observations There are three basic types based upon the nature of their impact on the regression results: Outliers are observations that have large residual values and can be identified only with respect to a specific regression model. Leverage points are observations that are distinct from the remaining observations based on their independent variable values. Influential observations are the broadest category, including all observations that have a disproportionate effect on the regression results. Influential observations potentially include outliers and leverage points but may include other observations as well. Influential observations... include all observations that have a disproportionate effect on the regression results. Multiple Regression Analysis
19
LOGO Corrective Actions for Influential Influentials, outliers, and leverage points are based on one of four conditions, each of which has a specific course of corrective action: 1.An error in observations or data entry – remedy by correcting the data or deleting the case, 2. A valid but exceptional observation that is explainable by an extraordinary situation – remedy by deletion of the case unless variables reflecting the extraordinary situation are included in the regression equation, 3.An exceptional observation with no likely explanation – presents a special problem because there is no reason for deleting the case, but its inclusion cannot be justified either, suggesting analyses with and without the observations to make a complete assessment, 4.An ordinary observation in its individual characteristics but exceptional in its combination of characteristics – indicates modifications to the conceptual basis of the regression model and should be retained. Multiple Regression Analysis
20
LOGO Rules of Thumb 4-6 Always ensure practical significance when using large sample sizes, as the model results and regression coefficients could be deemed irrelevant even when statistically significant due just to the statistical power arising from large sample sizes. Use the adjusted R2 as your measure of overall model predictive accuracy. Statistical significance is required for a relationship to have validity, but statistical significance without theoretical support does not support validity. While outliers may be easily identifiable, the other forms of influential observations requiring more specialized diagnostic methods can be equal to or even more impactful on the results. Statistical Significance and Influential Observations Multiple Regression Analysis
21
LOGO Stage 5 Interpreting the Regression Variate Using the Regression Coefficients Standardizing the Regression Coefficients: Beta Coefficients [most common mean =0; std.dev =1] Assessing Multicollinearity Multiple Regression Analysis
22
LOGO Assessing Multicollinearity 1.High Correlation Between X Variables 2.Coefficients Measure Combined Effect 3.Leads to Unstable Coefficients Depending on X Variables in Model 4.Always Exists -- Matter of Degree 5.Example: Using Both Age & Height as Explanatory Variables in Same Model Multiple Regression Analysis
23
LOGO Detecting Multicollinearity 1.Examine Correlation Matrix Correlations Between Pairs of X Variables Are More than With Y Variable 2.Examine Variance Inflation Factor (VIF) measures how much the variance of the regression coefficients is inflated by multicollinearity problems. If VIF j > 10, Multicollinearity Exists 3.Few Remedies Obtain New Sample Data Eliminate One Correlated X Variable Multiple Regression Analysis
24
LOGO Rules of Thumb 4-7 Interpret the impact of each independent variable relative to the other variables in the model, as model respecification can have a profound effect on the remaining variables: Use beta weights when comparing relative importance among independent variables. Regression coefficients describe changes in the dependent variable, but can be difficult in comparing across independent variables if the response formats vary. Multicollinearity may be considered “good” when it reveals a suppressor effect, but generally it is viewed as harmful since increases in multicollinearity: reduce the overall R2 that can be achieved, confound estimation of the regression coefficients, and negatively impact the statistical significance tests of coefficients. Generally accepted levels of multicollinearity (tolerance values up to.10, corresponding to a VIF of 10) should be decreased in smaller samples due to increases in standard error attributable to multicollinearity. Interpreting the Regression Variate Multiple Regression Analysis
25
LOGO Stage 6 Validation of the Results Additional or Split Samples Calculating the PRESS Statistics [similar to r 2 ] Comparing Regression Models [adjusted r 2 ] Predicting with the Model [several factor] Multiple Regression Analysis
26
LOGO Ilustration of a Regression Analysis Multiple Regression Analysis Objectives of Multiple Regression STAGE 1
27
LOGO Ilustration of a Regression Analysis Multiple Regression Analysis Research Design of Multiple Regression STAGE 2 Sample = 100, with 13 potential independent variables. R 2 approximately 23% at power of 0.80 with significance set at 0.01 If significance level = 0.05, so R 2 about 18% of variance. Guideline of the minimum ratio of observations to independent variable = 5:1 100 observation with 13 independent variable 100 : 13 ≈ 7 : 1 (ok!) 13
28
LOGO Ilustration of a Regression Analysis Multiple Regression Analysis Assumptions in Multiple Regression Analysis STAGE 3 1.Testing individual dependent variable and independent variables Linearity scatter-plots(ok!) Constant variance test heteroscedasticity ONLY X 6 & X 17 Normality ONLY X 12 [makes transformations for others] 2.Testing overall relationship after model estimation
29
LOGO Ilustration of a Regression Analysis Multiple Regression Analysis Estimating the Regression Model and Assessing Overall Fit STAGE 4 The Stepwise Procedure
30
LOGO www.themegallery.com Multiple Regression Analysis
31
LOGO www.themegallery.com Multiple Regression Analysis
32
LOGO www.themegallery.com Multiple Regression Analysis
33
LOGO www.themegallery.com Multiple Regression Analysis
34
LOGO www.themegallery.com Multiple Regression Analysis
35
LOGO www.themegallery.com Multiple Regression Analysis
36
LOGO www.themegallery.com Multiple Regression Analysis
37
LOGO www.themegallery.com Multiple Regression Analysis
38
LOGO www.themegallery.com Multiple Regression Analysis
39
LOGO www.themegallery.com Multiple Regression Analysis
40
LOGO www.themegallery.com Multiple Regression Analysis
41
LOGO Ilustration of a Regression Analysis Multiple Regression Analysis Interpreting the Regression Variate STAGE 5
42
LOGO Ilustration of a Regression Analysis Multiple Regression Analysis Validation of the Results STAGE 6 Examining the Adjusted R 2 Split-Sample Validation
43
LOGO www.themegallery.com Multiple Regression Analysis
44
LOGO Using Confirmatory Estimation approach www.themegallery.com Multiple Regression Analysis
45
LOGO Using Dummy Variable www.themegallery.com Multiple Regression Analysis
46
LOGO A Butterfly Flaps its Wings in Japan, Which Causes It to Rain in Nebraska. -- Anonymous
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.