Chapter 17: Basic Multivariate Techniques
Winston Jackson and Norine Verberg, Methods: Doing Social Research, 4e
© 2007 Pearson Education Canada
Testing Three-Variable Causal Models
Demonstrating causality is more difficult in nonexperimental research than in experimental research. To establish causality, you must show that:
1. The variables are associated
2. There is a plausible causal sequence
3. The variables are not spuriously connected
Identical analyses can be used to test different three-variable models.
Testing Three-Variable Causal Models (cont’d)
Begin with a standard contingency table; three-variable models elaborate on bivariate models.
TABLE 17.1 Percentage of Senior High-School Students with Plans for Further Education by Socioeconomic Status (SES)
Columns: Type of plan | Low SES background (N, %) | High SES background (N, %) | Total (N, %)
Rows: Some plans; No plans; Total
χ² (df = 1) significant at the .001 level.
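A minimal sketch of how the χ² statistic reported under a table like Table 17.1 can be computed for a 2 × 2 crosstab. The counts are hypothetical (the table's actual frequencies are not reproduced here), and the function name `chi_square_2x2` is illustrative. For df = 1, the upper-tail p-value equals erfc(√(χ²/2)), so no statistics library is needed.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square for a 2 x 2 table [[a, b], [c, d]].

    Rows = type of plan (some plans, no plans); columns = SES background (low, high).
    Returns (chi2, df, p).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # expected count if no association
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)  # 1 for a 2 x 2 table
    # Upper-tail p-value for chi-square with df = 1: P(chi2 > c) = erfc(sqrt(c / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, df, p


# Hypothetical counts, NOT the Table 17.1 frequencies.
counts = [[190, 240],   # some plans: low SES, high SES
          [70, 30]]     # no plans:   low SES, high SES
chi2, df, p = chi_square_2x2(counts)
print(f"chi2 = {chi2:.2f}, df = {df}, p = {p:.6f}")
```

With these made-up counts, χ² is about 21.63 with df = 1, well past the .001 level.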
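Testing for Intervening Variables: The Intervening Variable Model
X → I → Y
A number of possible intervening variables can be proposed. For example, the SES of a student’s best friend, or financial support for further education, could each link SES (X) to plans for further education (Y).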
The Rationale
If the model is correct, X can influence Y only through the intervening variable (I). If we hold I constant, X can no longer influence Y, so the X/Y relationship should disappear, or be greatly reduced, within each category of I.
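The rationale can be illustrated with a small simulation, assuming a hypothetical chain in which X affects I and only I affects Y (all probabilities are made up). The overall X/Y difference is clearly positive (about 24 percentage points in expectation), but within each category of I it falls to about zero.

```python
import random


def simulate_chain(n=100_000, seed=17):
    """Hypothetical X -> I -> Y model with binary variables (probabilities are assumptions)."""
    rng = random.Random(seed)
    cases = []
    for _ in range(n):
        x = rng.random() < 0.5                    # e.g., high vs. low SES
        i = rng.random() < (0.8 if x else 0.2)    # X influences I
        y = rng.random() < (0.7 if i else 0.3)    # Y depends on I only, not directly on X
        cases.append((x, i, y))
    return cases


def x_difference(cases):
    """% with Y = True when X is True minus % with Y = True when X is False."""
    def pct_y(group):
        return 100 * sum(y for _, _, y in group) / len(group)
    return (pct_y([c for c in cases if c[0]])
            - pct_y([c for c in cases if not c[0]]))


cases = simulate_chain()
original = x_difference(cases)
# Hold I constant: recompute the X/Y difference within each category of I.
within_i = {level: x_difference([c for c in cases if c[1] == level])
            for level in (True, False)}
print(f"original difference: {original:.1f}")
for level, diff in within_i.items():
    print(f"I = {level}: difference = {diff:.1f}")
```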
Jackson’s Rule of Thirds
Compare the original difference with the difference observed in each category of the intervening variable once it is controlled.
To do so, divide the original difference into thirds to set the cut-points for interpreting the new differences.
Table 17.3, Applying Jackson’s Rule of Thirds, shows how to calculate and interpret the results (see next slide).
Jackson’s Rule of Thirds: Using Crosstabs to Test for an Intervening Variable
Original difference in % with plans: Low SES 73.1, High SES 88.9
Difference: 15.8
Jackson’s Rule of Thirds: Calculating and Interpreting Thirds
Original difference = 15.8 (88.9 – 73.1)
One third = 15.8 / 3 = 5.3
Interpretation of results:
If the difference is > 21.1 (15.8 + 5.3): increased
If it is between 10.5 and 21.1 (15.8 ± 5.3): same
If it is < 10.5 (15.8 – 5.3): decreased
If the two categories of the control variable give different results: mixed
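The cut-points and decision rule above can be sketched as follows. This is an illustration under one assumption: the third and the cut-points are rounded to the precision of the reported data, as the slide does (5.3, 10.5, 21.1). The function names are hypothetical.

```python
def thirds_cutpoints(original, ndigits=1):
    """Lower and upper cut-points: original difference minus/plus one third.

    Rounded to the precision of the reported data (assumption), e.g. 5.3, 10.5, 21.1.
    """
    third = round(original / 3, ndigits)
    return round(original - third, ndigits), round(original + third, ndigits)


def rule_of_thirds(original, partial_differences, ndigits=1):
    """Classify the differences found within each category of the control variable."""
    lower, upper = thirds_cutpoints(original, ndigits)

    def classify(diff):
        if diff > upper:
            return "increased"
        if diff < lower:
            return "decreased"
        return "same"

    labels = {classify(d) for d in partial_differences}
    # If the categories disagree, the outcome is "mixed".
    return labels.pop() if len(labels) == 1 else "mixed"


print(thirds_cutpoints(15.8))             # (10.5, 21.1)
print(rule_of_thirds(15.8, [1.0, -2.0]))  # decreased (Outcome 1, Table 17.4)
```

For the means example (original difference 1.40), passing `ndigits=2` gives cut-points of .93 and 1.87.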
Outcome 1 (from Table 17.4, p. 446)
Original difference (% with plans, High SES – Low SES): 15.8
Crosstabs, Outcome 1: % with plans by SES level (Low SES, High SES), controlling for best friend’s SES
Best Friend High SES: difference = 1.0
Best Friend Low SES: difference = –2.0
Interpretation: The relationship decreased/disappeared; this outcome is consistent with an intervening variable model.
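Computing the partial differences from case-level data can be sketched like this. The field names (`ses`, `friend_ses`, `plans`) and the counts are hypothetical, chosen only to show the mechanics: % with plans for high-SES minus low-SES students, first overall and then within each category of best friend’s SES.

```python
from collections import defaultdict


def make_cases(friend_ses, ses, n_with_plans, n_without_plans):
    """Build hypothetical case records for one cell of the three-variable table."""
    return ([{"friend_ses": friend_ses, "ses": ses, "plans": 1} for _ in range(n_with_plans)]
            + [{"friend_ses": friend_ses, "ses": ses, "plans": 0} for _ in range(n_without_plans)])


def pct_with_plans(cases):
    """% with plans for high-SES and low-SES cases."""
    tallies = defaultdict(lambda: [0, 0])  # ses -> [with plans, total]
    for case in cases:
        tallies[case["ses"]][0] += case["plans"]
        tallies[case["ses"]][1] += 1
    return {ses: 100 * w / t for ses, (w, t) in tallies.items()}


def ses_difference(cases):
    pct = pct_with_plans(cases)
    return pct["high"] - pct["low"]


def partial_differences(cases, control):
    """High-SES minus low-SES % with plans within each category of the control variable."""
    levels = sorted({case[control] for case in cases})
    return {level: ses_difference([c for c in cases if c[control] == level])
            for level in levels}


cases = (make_cases("high", "low", 8, 2) + make_cases("high", "high", 17, 3)
         + make_cases("low", "low", 12, 8) + make_cases("low", "high", 6, 4))
print(f"original difference: {ses_difference(cases):.1f}")   # 10.0
print(partial_differences(cases, "friend_ses"))                # {'high': 5.0, 'low': 0.0}
```

The partial differences can then be compared with the thirds cut-points computed from the original difference.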
Outcome 2 (from Table 17.4, p. 446)
Original difference (% with plans, High SES – Low SES): 15.8
Crosstabs, Outcome 2: % with plans by SES level (Low SES, High SES) within Best Friend High SES and Best Friend Low SES, with the difference in each category
Interpretation: The relationship stays the same; reject the intervening variable model.
Outcome 3 (from Table 17.4, p. 446)
Original difference (% with plans, High SES – Low SES): 15.8
Crosstabs, Outcome 3: % with plans by SES level (Low SES, High SES) within Best Friend High SES and Best Friend Low SES, with the difference in each category
Interpretation: The relationship decreased/disappeared; this outcome supports an intervening variable model.
Outcome 4 (from Table 17.4, p. 446)
Original difference (% with plans, High SES – Low SES): 15.8
Crosstabs, Outcome 4: % with plans by SES level (Low SES, High SES) within Best Friend High SES and Best Friend Low SES, with the difference in each category
Interpretation: The relationship strengthened; reject the intervening variable model.
Outcome 5 (from Table 17.4, p. 446)
Original difference (% with plans, High SES – Low SES): 15.8
Crosstabs, Outcome 5: % with plans by SES level (Low SES, High SES) within Best Friend High SES and Best Friend Low SES, with the difference in each category
Interpretation: The results are mixed; reject the intervening variable model.
Using Means to Test for an Intervening Variable Model
Original difference in % with plans (from bottom of Table 17.5, p. 448), Low SES vs. High SES: 1.40
One third = 1.40 / 3 = .47
Increased if ≥ 1.88 (above 1.40 + .47 = 1.87)
Stayed the same if within 1.40 ± .47 (.93 to 1.87)
Decreased if < .93 (1.40 – .47)
Mixed if the two categories give different results
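The same mechanics apply to means. A minimal sketch, assuming each case is a hypothetical `(ses, control, score)` tuple; the scores are made up and are not the Table 17.5 data.

```python
from statistics import mean


def mean_differences(cases):
    """High-SES minus low-SES mean score within each category of the control variable.

    Each case is a hypothetical (ses, control, score) tuple.
    """
    diffs = {}
    for level in sorted({control for _, control, _ in cases}):
        low = [score for ses, control, score in cases if control == level and ses == "low"]
        high = [score for ses, control, score in cases if control == level and ses == "high"]
        diffs[level] = mean(high) - mean(low)
    return diffs


# Made-up scores, NOT the Table 17.5 data.
cases = [("low", "no support", 2), ("low", "no support", 3),
         ("high", "no support", 4), ("high", "no support", 5),
         ("low", "support", 5), ("low", "support", 6),
         ("high", "support", 6), ("high", "support", 7)]
print(mean_differences(cases))  # {'no support': 2.0, 'support': 1.0}
```

Against the 1.40 cut-points (.93 to 1.87), 2.0 counts as increased and 1.0 as the same, so this made-up example would be classed as mixed.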
Outcome 1 (from Table 17.5, p. 448)
Original difference (Low SES vs. High SES): 1.40
Means, Outcome 1: % Plans by SES level (Low, High) within No Support and Support (financial support), with the difference in each category
Interpretation: The relationship intensified (shown by increases); reject the intervening variable model. Financial support likely has an independent influence on the dependent variable.
Outcome 2 (from Table 17.5, p. 448)
Original difference (Low SES vs. High SES): 1.40
Means, Outcome 2: % Plans by SES level (Low, High) within No Support and Support (financial support), with the difference in each category
Interpretation: The relationship stayed the same; reject the intervening variable model.
Outcome 3 (from Table 17.5, p. 448)
Original difference (Low SES vs. High SES): 1.40
Means, Outcome 3: % Plans by SES level (Low, High) within No Support and Support (financial support), with the difference in each category
Interpretation: The relationship decreased/disappeared; the evidence supports the intervening variable model.
Outcome 4 (from Table 17.5, p. 448)
Original difference (Low SES vs. High SES): 1.40
Means, Outcome 4: % Plans by SES level (Low, High) within No Support and Support (financial support), with the difference in each category
Interpretation: The relationship decreased/disappeared; this outcome supports an intervening variable model.
Outcome 5 (from Table 17.5, p. 448)
Original difference (Low SES vs. High SES): 1.40
Means, Outcome 5: % Plans by SES level (Low, High) within No Support and Support (financial support), with the difference in each category
Interpretation: The results are mixed; reject the intervening variable model.
Testing for Sources of Spuriousness: The Source of Spuriousness Model
The researcher proposes that, although there is a statistically significant relationship between X and Y, the relationship may not be causal: it may exist only because some third variable influences both.
Source of Spuriousness: Rationale
If X and Y are spuriously associated, they vary together because a third variable (the source of spuriousness, S/S) influences both X and Y.
If we control for the S/S, there should no longer be any association between X and Y.
Use the same test and steps: test the original X/Y relationship; if it is significant, apply Jackson’s Rule of Thirds. If the original difference disappears, the source of spuriousness model is supported.
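A hypothetical simulation of the source of spuriousness model: S influences both X and Y, and X has no effect on Y. The probabilities are assumptions. The overall X/Y difference is positive (about 12 percentage points in expectation), but it disappears within each category of S. The pattern looks exactly like the one produced when an intervening variable is controlled, which is why the choice between the two models has to come from theory.

```python
import random


def simulate_spurious(n=100_000, seed=21):
    """Hypothetical S -> X and S -> Y model; X has no effect on Y (probabilities are assumptions)."""
    rng = random.Random(seed)
    cases = []
    for _ in range(n):
        s = rng.random() < 0.5                   # third variable, e.g., urban vs. rural
        x = rng.random() < (0.7 if s else 0.3)   # S influences X
        y = rng.random() < (0.8 if s else 0.5)   # S influences Y; X plays no role
        cases.append((s, x, y))
    return cases


def x_difference(cases):
    """% with Y = True when X is True minus % with Y = True when X is False."""
    def pct_y(group):
        return 100 * sum(y for _, _, y in group) / len(group)
    return (pct_y([c for c in cases if c[1]])
            - pct_y([c for c in cases if not c[1]]))


cases = simulate_spurious()
original = x_difference(cases)
# Control for the source of spuriousness: recompute the X/Y difference within each category of S.
within_s = {level: x_difference([c for c in cases if c[0] == level])
            for level in (True, False)}
print(f"original difference: {original:.1f}")
for level, diff in within_s.items():
    print(f"S = {level}: difference = {diff:.1f}")
```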
Source of Spuriousness: Dilemma
The results of the two models are not empirically distinguishable.
Suppose two researchers propose different causal models to explain the X/Y relationship: one proposes a source of spuriousness model, the other an intervening variable model. If the original difference disappears when the third variable is controlled, each finds support for their own model.
This stresses the importance of a priori theorizing: interpretation must be guided by theory.
Using Means to Test for a Source of Spuriousness Model
Table 17.8, on page 452, shows five different outcomes that illustrate using Jackson’s Rule of Thirds to test for a source of spuriousness.
The proposed source of spuriousness is rural versus urban background: the researcher suggests that both SES and plans for post-secondary education are explained by this third variable (urban/rural residence).
The slides that follow show the five outcomes.
Using Means to Test for a Source of Spuriousness Model (cont’d)
Original difference in % with plans (from bottom of Table 17.5, p. 448), Low SES vs. High SES: 1.40
One third = 1.40 / 3 = .47
Increased if ≥ 1.88; stayed the same if .93 to 1.87 (1.40 ± .47); decreased if < .93; mixed if the two categories give different results
Outcome 1 (from Table 17.8, p. 452)
Original difference (Low SES vs. High SES): 1.40
Means, Outcome 1: % Plans by SES level (Low, High) within Rural Background and Urban Background, with the difference in each category
Interpretation: The relationship intensified (shown by increases); reject the source of spuriousness model. Rural/urban background likely has an independent influence on the dependent variable.
Outcome 2 (from Table 17.8, p. 452)
Original difference (Low SES vs. High SES): 1.40
Means, Outcome 2: % Plans by SES level (Low, High) within Rural Background and Urban Background, with the difference in each category
Interpretation: The difference remains the same after controlling for rural/urban background; reject the source of spuriousness model.
Outcome 3 (from Table 17.8, p. 452)
Original difference (Low SES vs. High SES): 1.40
Means, Outcome 3: % Plans by SES level (Low, High) within Rural Background and Urban Background, with the difference in each category
Interpretation: The difference decreased; we find support for the source of spuriousness model.
Outcome 4 (from Table 17.8, p. 452)
Original difference (Low SES vs. High SES): 1.40
Means, Outcome 4: % Plans by SES level (Low, High) within Rural Background and Urban Background, with the difference in each category
Interpretation: The original difference is reduced to less than one-third of its original value; this supports the source of spuriousness model.
Outcome 5 (from Table 17.8, p. 452)
Original difference (Low SES vs. High SES): 1.40
Means, Outcome 5: % Plans by SES level (Low, High) within Rural Background and Urban Background, with the difference in each category
Interpretation: The results are mixed: the difference disappears among rural students but is only slightly reduced among urban students. We reject the source of spuriousness model.
Multiple Regression
Multiple regression is used to examine the impact of several variables on a dependent variable when the dependent variable and, preferably, most of the independent variables are measured at the ratio level.
Multiple Regression (cont’d)
Multiple regression is a powerful tool because it allows the researcher to:
estimate the relative importance of each independent variable in predicting variation in a dependent variable
identify a linear equation describing the relationship between the independent and dependent variables
The Linear Regression Equation
Elements in the equation tell us the relative importance of each factor in predicting the dependent variable.
Recall, from Chapter 8, the regression formula for two variables: $Y = a + bX$
Multiple regression extends the equation to k independent variables: $Y = a + b_1X_1 + b_2X_2 + \dots + b_kX_k$
The Linear Equation (cont’d)
$Y = a + b_1X_1 + b_2X_2 + \dots + b_kX_k$
Y is the dependent variable
a is the constant: the point where the regression line crosses the Y axis (the predicted value of Y when every X is 0)
$b_1 \dots b_k$ are the (unstandardized) regression coefficients for each of the independent variables
$X_1 \dots X_k$ are the values of the independent variables
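To make the equation concrete, here is a minimal least-squares fit in plain Python that estimates a and $b_1 \dots b_k$ by solving the normal equations. This is a teaching sketch (SPSS or a statistics library would normally be used), and the function names are hypothetical. The data are made up so that Y = 2 + 3X₁ − 1.5X₂ exactly, so the fit should recover those values and an R² of 1.

```python
def fit_ols(xs, y):
    """Least-squares estimates for Y = a + b1*X1 + ... + bk*Xk.

    xs: list of cases, each a list of k predictor values; y: list of Y values.
    Solves the normal equations (X'X) coef = X'y by Gauss-Jordan elimination.
    Returns (a, [b1, ..., bk]).
    """
    rows = [[1.0] + list(case) for case in xs]   # leading 1 carries the constant a
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    m = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    for col in range(p):
        # Partial pivoting: bring the largest remaining entry in this column up.
        pivot = max(range(col, p), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for k in range(p):
            if k != col:
                factor = m[k][col]
                m[k] = [vk - factor * vc for vk, vc in zip(m[k], m[col])]
    coef = [m[i][p] for i in range(p)]
    return coef[0], coef[1:]


def r_squared(xs, y, a, b):
    """Proportion of the variance in Y explained by the fitted equation."""
    predicted = [a + sum(bj * xj for bj, xj in zip(b, case)) for case in xs]
    mean_y = sum(y) / len(y)
    ss_residual = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_residual / ss_total


# Made-up data generated from Y = 2 + 3*X1 - 1.5*X2 with no error.
xs = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]]
y = [2 + 3 * x1 - 1.5 * x2 for x1, x2 in xs]
a, b = fit_ols(xs, y)
print(round(a, 6), [round(bj, 6) for bj in b], round(r_squared(xs, y, a, b), 6))
```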
The Linear Equation (cont’d)
$Z_Y = \beta_1 Z_{X_1} + \beta_2 Z_{X_2} + \dots + \beta_k Z_{X_k}$
The β values are known as beta weights. A beta weight is simply a standardized version of a b coefficient: think of the βs as the coefficients you would obtain if every variable were first converted to Z scores. Recall that Z scores standardize variables; because standardized variables have a mean of 0, the constant drops out.
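Beta weights can be obtained from the b coefficients by rescaling each one by a ratio of standard deviations, β_j = b_j × s_Xj / s_Y, which is equivalent to running the regression on Z scores. A minimal sketch; the function name is hypothetical. With a single predictor, the beta weight equals Pearson’s r; in the made-up example, b = 0.7 and β = r ≈ .72.

```python
from statistics import stdev


def beta_weights(b, xs, y):
    """Convert unstandardized coefficients to beta weights: beta_j = b_j * s_Xj / s_Y.

    b: [b1, ..., bk]; xs: list of cases, each a list of k predictor values; y: Y values.
    """
    s_y = stdev(y)
    columns = list(zip(*xs))  # one tuple of values per predictor
    return [bj * stdev(col) / s_y for bj, col in zip(b, columns)]


# Single-predictor check with made-up data: here b = 0.7, and beta equals Pearson's r.
xs = [[1], [2], [3], [4]]
y = [2, 4, 5, 4]
betas = beta_weights([0.7], xs, y)
print(round(betas[0], 3))
```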
The Linear Equation (cont’d)
To compute the relative importance of the variables once we have the betas, we can use the following formula:
$\%\ \text{variance explained by } X_i = \frac{\beta_i}{\sum \beta} \times R^2 \times 100$
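The formula can be sketched directly. This is a minimal illustration with hypothetical betas and R²; it assumes all betas share the same sign, since the formula as given does not say how negative weights should be handled. The shares always add up to R² × 100.

```python
def percent_variance_explained(betas, r_squared):
    """(beta_i / sum of betas) * R^2 * 100 for each independent variable.

    Assumes the betas share the same sign; the formula as given does not say
    how negative weights should be handled.
    """
    total = sum(betas)
    return [beta / total * r_squared * 100 for beta in betas]


# Hypothetical betas and R^2, not taken from the text.
shares = percent_variance_explained([0.40, 0.25, 0.15], r_squared=0.32)
print([round(s, 1) for s in shares])  # [16.0, 10.0, 6.0]
```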
Multiple Regression (cont’d)
SPSS will produce both b and β values. The a value (called the constant) will also be printed.
R²: this value is also reported; it tells you the proportion of the variance in the dependent variable explained by the equation.
Using Non-Ratio Level Variables
Ordinal variables may be included in their raw (un-recoded) form, but the equation will underestimate the relative importance of non-ratio variables.
Nominal variables may be included by transforming them into “dummy variables”.
Dummy variables are recoded into presence/absence (1/0) variables.
Creating Dummy Variables
Create new variables to replace the nominal variable so that you have one fewer new variable than there are categories in the original variable.
For example, if you have a four-category religion variable (Christian, Jewish, Muslim, Other), recode it into three new variables coded as presence (1) / absence (0), as shown in Table 17.9, p. 462.
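A minimal sketch of the recode for the four-category religion example. Which category is left out is an assumption here (“Other” is used as the reference category, coded 0 on all three dummies); the function and constant names are hypothetical.

```python
RELIGIONS = ["Christian", "Jewish", "Muslim", "Other"]
REFERENCE = "Other"  # assumed reference category: coded 0 on every dummy


def dummy_code(value, categories=RELIGIONS, reference=REFERENCE):
    """Recode one nominal value into presence (1) / absence (0) dummy variables.

    Produces one variable per category except the reference category,
    i.e., one fewer variable than there are categories.
    """
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return {cat: int(value == cat) for cat in categories if cat != reference}


print(dummy_code("Jewish"))  # {'Christian': 0, 'Jewish': 1, 'Muslim': 0}
print(dummy_code("Other"))   # {'Christian': 0, 'Jewish': 0, 'Muslim': 0}
```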
Tips for Regression Analysis
1. Ensure that the independent variables are theoretically independent of one another
2. Watch out for highly correlated independent variables (multicollinearity): either combine them into an index (if that makes sense) or simply select one of them
3. Try to achieve ratio-level measurement
4. Use raw data: do not use recoded forms of ordinal or ratio variables
Tips (cont’d)
5. Use the backward solution so that the least important variables drop out first
6. Interpret weightings with care
7. Monitor the number of cases. When missing values are a concern, try:
repeating the analysis keeping problem variables
“pairwise” treatment of missing values
a “means” solution, where missing values are set to the mean for the variable
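The “means” solution can be sketched in a few lines. This is a minimal illustration that assumes missing values are stored as `None`; the function name is hypothetical.

```python
def mean_substitute(values):
    """'Means' solution: replace missing values (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to average")
    mean_value = sum(observed) / len(observed)
    return [mean_value if v is None else v for v in values]


print(mean_substitute([2, None, 4, None]))  # [2, 3.0, 4, 3.0]
```

Every case is kept, at the cost of pulling the substituted cases toward the mean of the variable.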
Tips (cont’d)
8. Deal with interactions among independent variables. If two variables have little independent impact on the dependent variable, but you expect their interaction to explain variation in Y, create an interaction variable.
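A common way to build an interaction variable is to multiply the two variables. The construction is not specified here, so treat the product term as an assumption; the field names are hypothetical.

```python
def add_interaction(cases, var_a, var_b, name=None):
    """Return copies of the cases with an added product term var_a * var_b."""
    name = name or f"{var_a}_x_{var_b}"
    return [{**case, name: case[var_a] * case[var_b]} for case in cases]


# Hypothetical variables: an interaction of X1 and X2.
cases = [{"x1": 2, "x2": 3}, {"x1": 0, "x2": 5}]
print(add_interaction(cases, "x1", "x2"))
# [{'x1': 2, 'x2': 3, 'x1_x_x2': 6}, {'x1': 0, 'x2': 5, 'x1_x_x2': 0}]
```

The new variable is then entered into the regression alongside (or in place of) the two original variables.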
Discriminant Function Analysis
Similar to regression analysis, but used when the dependent variable is either measured at the nominal level or not normally distributed.
Discriminant function analysis attempts to predict the category of the Y variable into which each case falls by using the combined information from the X variables.
e.g., predict whether someone will participate in post-secondary education, based on information about grade 11 average, SES, family size, etc.
Comparison with Multiple Regression
Similar: both can examine the impact of several X variables.
Discriminant analysis calculates discriminant coefficients, similar to a regression equation:
$D = B_0 + B_1X_1 + B_2X_2 + \dots + B_kX_k$
$B_0$ = the constant; $B_1$ = the coefficient for the first variable, and so on.
To compute the “discriminant score”, multiply each coefficient by the case’s observed value on that variable, sum the products, and add the constant (see Table 17.11, p. 466).
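The discriminant score for one case can be sketched as below. The coefficients and observed values are hypothetical (they are not the Table 17.11 figures), and the function name is illustrative.

```python
def discriminant_score(b0, coefficients, values):
    """D = B0 + B1*X1 + B2*X2 + ... + Bk*Xk for one case (unstandardized coefficients)."""
    if len(coefficients) != len(values):
        raise ValueError("need exactly one observed value per coefficient")
    return b0 + sum(b * x for b, x in zip(coefficients, values))


# Hypothetical coefficients (NOT Table 17.11) for grade 11 average, SES, and family size.
d = discriminant_score(-4.0, [0.05, 0.30, -0.20], [80, 3, 4])
print(round(d, 2))  # 0.1
```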
Discriminant Analysis (cont’d)
Discriminant analysis assumes ratio-level independent variables (as does regression) and, as in regression, dummy variables may be included.
Both standardized and unstandardized coefficients are provided in the output. If you want to calculate relative contributions, use the standardized version.
Discriminant Analysis (cont’d)
When discriminant analysis is run, you get a report on the percentage of cases that can be correctly classified using the information on the independent variables.
The analysis relies on Wilks’ lambda, which measures the proportion of variance in the discriminant scores that is not explained by differences between the groups; smaller values indicate better discrimination.
Table: Discriminant Analysis, Sample Presentation
Columns: Actual group | Number of cases | Predicted group membership: 1, 2
Rows: Participate (1); Not participate (2); Total
Percentage of “grouped” cases correctly classified: 293 out of 344 cases = 85.2%.
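The summary line under the table can be reproduced from lists of actual and predicted groups. A minimal sketch; the split between groups is made up and only matches the 293-of-344 total (the individual cell counts are not given here). The function name is hypothetical.

```python
def percent_correctly_classified(actual, predicted):
    """Share of cases whose predicted group matches their actual group, as a percentage."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and the same length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return 100 * correct / len(actual)


# Hypothetical split: group 1 = participate, group 2 = not participate.
actual = [1] * 200 + [2] * 144
predicted = [1] * 175 + [2] * 25 + [1] * 26 + [2] * 118  # 175 + 118 = 293 correct
print(round(percent_correctly_classified(actual, predicted), 1))  # 85.2
```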