Chapter 4 Multiple Regression Analysis Devilia Sari - Natalia


1 Chapter 4 Multiple Regression Analysis Devilia Sari - Natalia

2 Learning Objectives
1. Determine when regression analysis is the appropriate statistical tool for analyzing a problem.
2. Understand how regression helps us make predictions using the least squares concept.
3. Use dummy variables with an understanding of their interpretation.
4. Be aware of the assumptions underlying regression analysis and how to assess them.
5. Select an estimation technique and explain the difference between stepwise and simultaneous regression.
6. Interpret the results of regression.
7. Apply the diagnostic procedures necessary to assess "influential" observations.

3 Types of Regression Models

4 Definition
Multiple regression analysis is a statistical technique that can be used to analyze the relationship between a single dependent (criterion) variable and several independent (predictor) variables.

5 Linear Multiple Regression Model
The relationship between one dependent variable and two or more independent variables is a linear function.
Labels on the equation (the equation itself is not reproduced in the transcript): dependent (response) variable, population Y-intercept, population slopes, independent (explanatory) variables, random error.
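The equation on this slide did not survive in the transcript. A conventional way to write the model that the labels describe is sketched below. The symbols (\beta, \varepsilon, the count p of independent variables, and the observation index i) are standard textbook notation assumed here, not taken from the slide.

```latex
Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_p X_{pi} + \varepsilon_i
```

Here Y_i is the dependent (response) variable, \beta_0 the population Y-intercept, \beta_1, \dots, \beta_p the population slopes, X_{1i}, \dots, X_{pi} the independent (explanatory) variables, and \varepsilon_i the random error.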

6 Population Linear Multiple Regression Model
[Figure: bivariate model]
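The least squares concept behind this model can be illustrated with a short sketch. It assumes NumPy is available. The function names `fit_ols` and `predict` and the demo data are hypothetical, chosen only for illustration.

```python
import numpy as np

def fit_ols(X, y):
    """Return least-squares estimates [b0, b1, ..., bp] for y regressed on X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict(coef, X):
    """Predicted values: intercept plus the weighted sum of the predictors."""
    X = np.asarray(X, dtype=float)
    return coef[0] + X @ coef[1:]

# Made-up data generated exactly from y = 1 + 2*x1 + 3*x2 (no error term).
X_demo = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]], dtype=float)
y_demo = 1 + 2 * X_demo[:, 0] + 3 * X_demo[:, 1]
coef_demo = fit_ols(X_demo, y_demo)
```

Because the demo data contain no random error, the estimated coefficients recover the generating intercept and slopes. With real data the estimates minimize the sum of squared residuals instead of fitting exactly.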

7 A Decision Process for Multiple Regression Analysis
Stage 1: Objectives of Multiple Regression
Stage 2: Research Design of Multiple Regression
Stage 3: Assumptions in Multiple Regression Analysis
Stage 4: Estimating the Regression Model and Assessing Overall Fit
Stage 5: Interpreting the Regression Variate
Stage 6: Validation of the Results

8 Stage 1: Objectives of Multiple Regression
- Research Problems Appropriate for Multiple Regression
  - Prediction: to predict the dependent variable.
  - Explanation: to examine the regression coefficients for each independent variable and attempt to develop a substantive or theoretical reason for the effects of the independent variables.
- Specifying a Statistical Relationship [not a functional relationship]
- Selection of Dependent and Independent Variables
  - The theory that supports using the variables
  - Measurement error, especially in the dependent variable [summated scales and structural equation modeling procedures]
  - Specification error, especially in the independent variables

9 Rules of Thumb 4-1: Meeting Multiple Regression Objectives
- Only structural equation modeling (SEM) can directly accommodate measurement error, but using summated scales can mitigate it when using multiple regression.
- When in doubt, include potentially irrelevant variables (as they can only confuse interpretation) rather than possibly omitting a relevant variable (which can bias all regression estimates).

10 Stage 2: Research Design of Multiple Regression
- Sample Size
  - Statistical Power and Sample Size
  - Generalizability and Sample Size
- Creating Additional Variables
  - Incorporating Nonmetric Data with Dummy Variables
  - Representing Curvilinear Effects with Polynomials
  - Representing Interaction or Moderator Effects
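As a minimal sketch of incorporating nonmetric data with dummy variables, the snippet below builds one 0/1 indicator column per category, leaving out a chosen reference category. The function name `dummy_code` and the region data are hypothetical.

```python
def dummy_code(values, reference):
    """Return {category: 0/1 column} for every category except the reference."""
    if reference not in values:
        raise ValueError("reference category does not occur in the data")
    levels = sorted(set(values) - {reference})
    # One indicator column per non-reference category.
    return {level: [1 if v == level else 0 for v in values] for level in levels}

# Made-up nonmetric variable with three categories; "north" is the reference.
regions = ["north", "south", "east", "north", "east"]
dummies = dummy_code(regions, reference="north")
```

A variable with k categories yields k - 1 dummy columns. Each dummy coefficient is then read as the difference from the omitted reference category, so the choice of reference shapes the interpretation.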

11 Rules of Thumb 4-2: Sample Size Considerations
- Simple regression can be effective with a sample size of 20, but maintaining power at .80 in multiple regression requires a minimum sample of 50 and preferably 100 observations for most research situations.
- The minimum ratio of observations to variables is 5 to 1, but the preferred ratio is 15 or 20 to 1, and this should increase when stepwise estimation is used.
- Maximizing the degrees of freedom improves generalizability and addresses both model parsimony and sample size concerns.

12 Rules of Thumb 4-3: Variable Transformations
- Nonmetric variables can only be included in a regression analysis by creating dummy variables.
- Dummy variables can only be interpreted in relation to their reference category.
- Adding an additional polynomial term represents another inflection point in the curvilinear relationship.
- Quadratic and cubic polynomials are generally sufficient to represent most curvilinear relationships.
- Assessing the significance of a polynomial or interaction term is accomplished by evaluating incremental R², not the significance of individual coefficients, due to high multicollinearity.
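The incremental R² check for a polynomial term can be sketched as follows, assuming NumPy. The helper `r_squared` and the data are hypothetical. The example only computes the change in R² and does not run the accompanying significance test.

```python
import numpy as np

def r_squared(design, y):
    """R-squared of a least-squares fit of y on the given design matrix."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

# Made-up data that roughly follow y = x^2.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([4.1, 0.9, 0.2, 1.1, 3.8, 9.2])
ones = np.ones_like(x)

r2_linear = r_squared(np.column_stack([ones, x]), y)
r2_quadratic = r_squared(np.column_stack([ones, x, x ** 2]), y)
incremental_r2 = r2_quadratic - r2_linear  # gain from adding the squared term
```

The same comparison applies to an interaction term: fit the model without it, fit the model with it, and evaluate the gain in R².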

13 Stage 3: Assumptions in Multiple Regression Analysis
- Assessing Individual Variables Versus the Variate
- Methods of Diagnosis [studentized residuals]
- Linearity of the Phenomenon
- Constant Variance of the Error Term
- Independence of the Error Terms
- Normality of the Error Term Distribution

14 Graphical Analysis of Residuals

15 Rules of Thumb 4-4: Assessing Statistical Assumptions
- Testing assumptions must be done not only for each dependent and independent variable, but for the variate as well.
- Graphical analyses (i.e., partial regression plots, residual plots, and normal probability plots) are the most widely used methods of assessing assumptions for the variate.
- Remedies for problems found in the variate must be accomplished by modifying one or more independent variables as described in Chapter 2.

16 Stage 4: Estimating the Regression Model and Assessing Overall Fit
- How to Choose a Regression Model?
  - Use your substantive knowledge of the research context
  - Theoretical foundation
- Selecting an Estimation Technique
  - Confirmatory Specification (simultaneous)
  - Sequential Search Methods
    - Stepwise (variables can be added or removed at each step)
    - Forward Inclusion and Backward Elimination (a variable is not reconsidered once it has been added or deleted)
    - Hierarchical
  - Combinatorial Approach (All-Possible-Subsets)
- Testing the Regression Variate for Meeting the Regression Assumptions

17 Rules of Thumb 4-5: Estimation Techniques
No matter which estimation technique is chosen, theory must be a guiding factor in evaluating the final regression model because:
- Confirmatory specification, the only method to allow direct testing of a pre-specified model, is also the most complex from the perspectives of specification error, model parsimony, and achieving maximum predictive accuracy.
- Sequential search (e.g., stepwise), while maximizing predictive accuracy, represents a completely "automated" approach to model estimation, leaving the researcher almost no control over the final model specification.
- Combinatorial estimation, while considering all possible models, still removes control from the researcher in terms of final model specification, even though the researcher can view the set of roughly equivalent models in terms of predictive accuracy.
No single method is "best," and the prudent strategy is to use a combination of approaches to capitalize on the strengths of each to reflect the theoretical basis of the research question.

18 Types of Influential Observations
There are three basic types, based upon the nature of their impact on the regression results:
- Outliers are observations that have large residual values and can be identified only with respect to a specific regression model.
- Leverage points are observations that are distinct from the remaining observations based on their independent variable values.
- Influential observations are the broadest category, including all observations that have a disproportionate effect on the regression results. Influential observations potentially include outliers and leverage points but may include other observations as well.
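Two common diagnostics for these categories can be sketched with NumPy: hat values to flag leverage points and internally studentized residuals to flag outliers. This is one conventional formulation and is not necessarily the one used in the chapter. The function name and the data are hypothetical.

```python
import numpy as np

def leverage_and_studentized(X, y):
    """Return (hat values, internally studentized residuals) for an OLS fit."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    k = design.shape[1]                      # number of estimated coefficients
    hat = design @ np.linalg.pinv(design)    # projection ("hat") matrix
    h = np.diag(hat)                         # leverage of each observation
    resid = y - hat @ y
    mse = float(resid @ resid) / (n - k)     # residual variance estimate
    studentized = resid / np.sqrt(mse * (1.0 - h))
    return h, studentized

# Made-up data: the last case sits far from the others on the predictor.
x_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 20.0])
y_demo = np.array([1.1, 1.9, 3.2, 3.9, 5.1, 8.0])
h_demo, studentized_demo = leverage_and_studentized(x_demo, y_demo)
```

The hat values sum to the number of estimated coefficients, so a case whose leverage is far above the average stands out on its independent variable values. A large studentized residual marks an outlier relative to this particular model.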

19 Corrective Actions for Influentials
Influentials, outliers, and leverage points are based on one of four conditions, each of which has a specific course of corrective action:
1. An error in observations or data entry: remedy by correcting the data or deleting the case.
2. A valid but exceptional observation that is explainable by an extraordinary situation: remedy by deleting the case unless variables reflecting the extraordinary situation are included in the regression equation.
3. An exceptional observation with no likely explanation: presents a special problem because there is no reason for deleting the case, but its inclusion cannot be justified either, suggesting analyses with and without the observation to make a complete assessment.
4. An ordinary observation in its individual characteristics but exceptional in its combination of characteristics: indicates modifications to the conceptual basis of the regression model, and the case should be retained.

20 Rules of Thumb 4-6: Statistical Significance and Influential Observations
- Always ensure practical significance when using large sample sizes, as the model results and regression coefficients could be deemed irrelevant even when statistically significant, due just to the statistical power arising from large sample sizes.
- Use the adjusted R² as your measure of overall model predictive accuracy.
- Statistical significance is required for a relationship to have validity, but statistical significance without theoretical support does not support validity.
- While outliers may be easily identifiable, the other forms of influential observations, which require more specialized diagnostic methods, can be equally or even more impactful on the results.
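For reference, the usual adjusted R² formula can be sketched as a small function. The name `adjusted_r2` and the input values are hypothetical and chosen only to show the penalty for the number of predictors.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / dof

# Illustrative values only: R^2 = 0.50 with 100 observations and 13 predictors.
adj_demo = adjusted_r2(0.50, n_obs=100, n_predictors=13)
```

With these illustrative inputs the adjusted value is about 0.42, noticeably below the unadjusted 0.50, because 13 predictors use up degrees of freedom.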

21 Stage 5: Interpreting the Regression Variate
- Using the Regression Coefficients
- Standardizing the Regression Coefficients: Beta Coefficients [most common → mean = 0; std. dev. = 1]
- Assessing Multicollinearity
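Beta coefficients can be obtained by standardizing every variable to mean 0 and standard deviation 1 before fitting, as in the sketch below. It assumes NumPy. The function name `beta_weights` and the data are hypothetical.

```python
import numpy as np

def beta_weights(X, y):
    """Standardized (beta) coefficients from a fit on z-scored variables."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    # No intercept column is needed: z-scored variables have mean zero.
    betas, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return betas

# Made-up predictors on very different scales; y = 2*x1 + 0.01*x2 exactly.
X_demo = np.array([[1, 100], [2, 300], [3, 200], [4, 500], [5, 400]], dtype=float)
y_demo = 2 * X_demo[:, 0] + 0.01 * X_demo[:, 1]
betas_demo = beta_weights(X_demo, y_demo)
```

Each beta weight equals the raw coefficient multiplied by the ratio of the predictor's standard deviation to the dependent variable's standard deviation. This puts predictors measured on different scales on a common footing.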

22 Assessing Multicollinearity
1. High Correlation Between X Variables
2. Coefficients Measure Combined Effect
3. Leads to Unstable Coefficients Depending on X Variables in Model
4. Always Exists -- Matter of Degree
5. Example: Using Both Age & Height as Explanatory Variables in Same Model

23 Detecting Multicollinearity
1. Examine the Correlation Matrix
   - Correlations between pairs of X variables are higher than their correlations with the Y variable
2. Examine the Variance Inflation Factor (VIF), which measures how much the variance of the regression coefficients is inflated by multicollinearity
   - If VIF_j > 10, multicollinearity exists
3. Few Remedies
   - Obtain new sample data
   - Eliminate one correlated X variable
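The VIF check can be sketched as follows, using the standard definition VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing predictor j on the other predictors. The sketch assumes NumPy. The function name `vif` and both data sets are hypothetical.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of the predictor matrix X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    factors = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        r2 = 1.0 - float(resid @ resid) / float(((target - target.mean()) ** 2).sum())
        # Perfectly collinear predictors give r2 == 1 and an infinite VIF.
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

# Made-up examples: uncorrelated predictors versus nearly identical ones.
X_uncorrelated = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
X_collinear = np.array([[1, 1.1], [2, 1.9], [3, 3.0], [4, 4.1], [5, 4.9], [6, 6.0]])
vif_uncorrelated = vif(X_uncorrelated)
vif_collinear = vif(X_collinear)
```

Uncorrelated predictors give VIF values of 1, while the nearly identical pair lands far above the threshold of 10 quoted on the slide. Tolerance is the reciprocal, 1 / VIF.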

24 Rules of Thumb 4-7: Interpreting the Regression Variate
Interpret the impact of each independent variable relative to the other variables in the model, as model respecification can have a profound effect on the remaining variables:
- Use beta weights when comparing relative importance among independent variables.
- Regression coefficients describe changes in the dependent variable, but can be difficult to compare across independent variables if the response formats vary.
Multicollinearity may be considered "good" when it reveals a suppressor effect, but generally it is viewed as harmful since increases in multicollinearity:
- reduce the overall R² that can be achieved,
- confound estimation of the regression coefficients, and
- negatively impact the statistical significance tests of coefficients.
Generally accepted levels of multicollinearity (tolerance values up to .10, corresponding to a VIF of 10) should be decreased in smaller samples due to increases in standard error attributable to multicollinearity.

25 Stage 6: Validation of the Results
- Additional or Split Samples
- Calculating the PRESS Statistic [similar to R²]
- Comparing Regression Models [adjusted R²]
- Predicting with the Model [several factors]
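The PRESS statistic sums the squared prediction errors obtained when each observation is left out of the fit in turn. The sketch below assumes NumPy. It uses the standard shortcut e_i / (1 - h_i), which avoids refitting the model n times, and also returns an R²-like summary, 1 - PRESS / SS_total. The function name and the data are hypothetical.

```python
import numpy as np

def press_statistic(X, y):
    """Return (PRESS, predicted R-squared) for an OLS fit of y on X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    hat = design @ np.linalg.pinv(design)        # projection ("hat") matrix
    residuals = y - hat @ y
    # Leave-one-out prediction error for case i is e_i / (1 - h_ii).
    loo_errors = residuals / (1.0 - np.diag(hat))
    press = float(np.sum(loo_errors ** 2))
    ss_total = float(np.sum((y - y.mean()) ** 2))
    return press, 1.0 - press / ss_total

# Made-up, nearly linear data with one predictor.
X_demo = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y_demo = np.array([1.2, 1.9, 3.2, 3.8, 5.1, 5.9])
press_demo, predicted_r2_demo = press_statistic(X_demo, y_demo)
```

Because every case is predicted from a model that never saw it, the R²-like value is never higher than the ordinary R². This is why PRESS serves as a validation measure.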

26 Illustration of a Regression Analysis
STAGE 1: Objectives of Multiple Regression

27 Illustration of a Regression Analysis
STAGE 2: Research Design of Multiple Regression
- Sample = 100, with 13 potential independent variables.
- At a power of 0.80 with significance set at 0.01, the sample can detect an R² of approximately 23%.
- If the significance level is 0.05, the detectable R² is about 18% of the variance.
- Guideline for the minimum ratio of observations to independent variables = 5:1.
- 100 observations with 13 independent variables → 100 : 13 ≈ 7.7 : 1 (ok!)
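The ratio check on this slide is simple arithmetic and can be sketched as below. The thresholds 5 and 15 come from Rules of Thumb 4-2. The function name and the labels it returns are hypothetical.

```python
def ratio_guideline(n_obs, n_vars, minimum=5.0, preferred=15.0):
    """Classify the observations-to-variables ratio against the rules of thumb."""
    ratio = n_obs / n_vars
    if ratio < minimum:
        label = "below minimum"
    elif ratio < preferred:
        label = "meets minimum, below preferred"
    else:
        label = "meets preferred"
    return ratio, label

# The illustration's design: 100 observations, 13 independent variables.
ratio_demo, label_demo = ratio_guideline(100, 13)
```

For the illustration, the ratio is about 7.7 to 1, which clears the 5-to-1 minimum but falls short of the preferred 15-to-1.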

28 Illustration of a Regression Analysis
STAGE 3: Assumptions in Multiple Regression Analysis
1. Testing the individual dependent variable and independent variables
   - Linearity → scatterplots (ok!)
   - Constant variance → test for heteroscedasticity → ONLY X6 & X17
   - Normality → ONLY X12 [transformations made for the others]
2. Testing the overall relationship after model estimation

29 Illustration of a Regression Analysis
STAGE 4: Estimating the Regression Model and Assessing Overall Fit
The Stepwise Procedure

41 Illustration of a Regression Analysis
STAGE 5: Interpreting the Regression Variate

42 Illustration of a Regression Analysis
STAGE 6: Validation of the Results
- Examining the Adjusted R²
- Split-Sample Validation

44 Using Confirmatory Estimation Approach

45 Using Dummy Variable

46 A Butterfly Flaps its Wings in Japan, Which Causes It to Rain in Nebraska. -- Anonymous

