Multiple Independent Variables POLS 300 Butz
Multivariate Analysis Problem with bivariate analysis in nonexperimental designs: – Spuriousness and causality. Need for techniques that allow the researcher to control for other independent variables.
Multivariate Analysis Employed to see how large sets of variables are interrelated. Idea is that if one can find a relationship between x and y after accounting for other variables (w and z) we may be able to make “causal inference”.
Multivariate Analysis We know that X and Y may both be caused by Z, which produces a spurious relationship. Multivariate analysis allows us to include other variables and test whether there is still a relationship between X and Y.
Multivariate Analysis Must ask whether a third variable (and maybe others) is the “true” cause of both the IV and the DV. Experimental analyses “prove” causation, but only in a laboratory setting…must use multivariate statistical analyses in the “real world”. Need to “control” or “hold constant” other variables to isolate the effect of the IV on the DV!
Controlling for Other Independent Variables Multivariate Crosstabulation – evaluate bivariate relationship within subsets of sample defined by different categories of third variable (“control by grouping”) At what level(s) of measurement would we use Multivariate Crosstabulation??
Multivariate Crosstabulation Control by grouping: group the observations according to their values on the third variable and… then observe the original relationship within each of these groups. P. 407/506 – Spending Attitudes and Voting…controlling for Income! – spurious Occupational Status and Voter Turnout P. 411/510…control for “education”!
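Control by grouping can be sketched in a few lines of Python. The counts below are made up for illustration (they are not the textbook's table), and the names `counts` and `share_voting_a` are hypothetical. The data are built so that spending attitude appears related to vote choice overall, but the relationship disappears within each income group, which is the pattern of a spurious relationship.

```python
from collections import Counter

# Made-up counts of (income, spending_attitude, vote); NOT real survey data.
counts = Counter({
    ("low", "favor", "A"): 64, ("low", "favor", "B"): 16,
    ("low", "oppose", "A"): 16, ("low", "oppose", "B"): 4,
    ("high", "favor", "A"): 4, ("high", "favor", "B"): 16,
    ("high", "oppose", "A"): 16, ("high", "oppose", "B"): 64,
})

def share_voting_a(attitude, income=None):
    """Share voting for candidate A among people with this attitude,
    optionally restricted to one income group (the control variable)."""
    a_votes = total = 0
    for (inc, att, vote), n in counts.items():
        if att != attitude or (income is not None and inc != income):
            continue
        total += n
        if vote == "A":
            a_votes += n
    return a_votes / total

# Bivariate table: attitude appears to matter.
print("All respondents:",
      share_voting_a("favor"), "vs", share_voting_a("oppose"))

# Same comparison within each income group: the gap is gone.
for group in ("low", "high"):
    print(group, "income:",
          share_voting_a("favor", group), "vs", share_voting_a("oppose", group))
```

If the gap between "favor" and "oppose" had survived inside each income group, income would not account for the original relationship.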
Quick Review: Regression In general, the goal of linear regression is to find the line that best predicts Y from X. Linear regression does this by estimating the line that minimizes the sum of the squared errors, that is, the squared vertical distances of the data points from the line.
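As a minimal sketch of what "minimizing the sum of squared errors" means, the code below fits a line with the standard closed-form least-squares formulas and then checks that nudging the slope or intercept only makes the squared error larger. The data and the function names (`fit_line`, `sum_squared_errors`) are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx             # slope
    a = mean_y - b * mean_x   # intercept
    return a, b

def sum_squared_errors(xs, ys, a, b):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Made-up illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
best_sse = sum_squared_errors(xs, ys, a, b)
print(f"a = {a:.2f}, b = {b:.2f}, SSE = {best_sse:.2f}")

# Any other line does worse on squared error.
for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    worse = sum_squared_errors(xs, ys, a + da, b + db)
    print(f"shift a by {da:+}, b by {db:+}: SSE = {worse:.3f}")
```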
Regression vs. Correlation The purpose of regression analysis is to determine exactly what the line is (i.e. to estimate the equation for the line) The regression line represents predicted values of Y based on the values of X
Equation for a Line (Perfect Linear Relationship) \(Y_i = a + BX_i\). a = Intercept, or Constant = the value of Y when X = 0. B = Slope coefficient = the change (+ or -) in Y given a one-unit change in X.
Slope \(Y_i = a + BX_i\). B = Slope coefficient. If B is positive, then you have a positive relationship. If it is negative, you have a negative relationship. The larger the absolute value of B, the steeper the slope of the line…greater (more dramatic) change in Y for a one-unit change in X. General interpretation: for a one-unit change in X, we expect a change of B in Y on average.
Calculating the Regression Equation For “Threat Hypothesis” The estimated regression equation is: E(welfare benefit1995) = a + [(-6.292) * %black(1995)] Number of obs = 50 F( 1, 64) = Prob < = R-squared = welfare1995 | Coef. Std. Err. t P< [95% Conf. Interval] Black1995(b)| _cons(a)|
Regression Example: “Threat Hypothesis” To generate a predicted value for various % of AA in 1995, we could simply plug in the appropriate X values and solve for Y. 10%: E(welfare benefit1995) = a + [(-6.292) * 10] = $… ; 20%: E(welfare benefit1995) = a + [(-6.292) * 20] = $… ; 30%: E(welfare benefit1995) = a + [(-6.292) * 30] = $234.09
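The plug-in arithmetic can be sketched as follows. The intercept value does not appear in the text above, so this sketch backs it out from the one prediction that is given ($234.09 at 30%) and the slope of -6.292. That implied intercept is an inference from those two numbers, not a reported estimate, and the function name `predict_benefit` is hypothetical.

```python
SLOPE = -6.292                 # slope coefficient stated in the text
KNOWN_X, KNOWN_Y = 30, 234.09  # the one complete prediction in the text

# ASSUMPTION: intercept inferred by solving KNOWN_Y = a + SLOPE * KNOWN_X for a.
implied_intercept = KNOWN_Y - SLOPE * KNOWN_X

def predict_benefit(pct_black):
    """Predicted 1995 welfare benefit for a given % Black population."""
    return implied_intercept + SLOPE * pct_black

for pct in (10, 20, 30):
    print(f"{pct}%: ${predict_benefit(pct):.2f}")
```

Each additional 10 percentage points lowers the predicted benefit by 10 × 6.292 = $62.92, which is simply the slope interpretation applied to a ten-unit change in X.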
Regression Analysis and Statistical Significance Testing for statistical significance for the slope – The p-value: the probability of observing a sample slope (Beta coefficient) at least as far from zero as the one in our sample IF THE NULL HYPOTHESIS IS TRUE – P-values closer to 0 indicate stronger evidence against the null hypothesis (P < .05 is usually the threshold for statistical significance) – Based on the t-value: t = Beta / S.E.
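To make t = Beta / S.E. concrete, here is a rough sketch that computes the slope, its standard error, the t-value, and a two-sided p-value for a bivariate regression. It uses only the standard library: the p-value comes from numerically integrating the Student's t density with Simpson's rule, which is a stand-in for the lookup a statistics package would do. The data and function names are made up for illustration.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t_value, df, steps=2000):
    """P(|T| >= |t_value|), using Simpson's rule on [0, |t_value|]."""
    upper = abs(t_value)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1 - 2 * area)

def slope_test(xs, ys):
    """Return (b, se, t, p) for the slope of a bivariate regression."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    res_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    df = n - 2                          # two estimated parameters: a and b
    se = math.sqrt(res_ss / df / sxx)   # standard error of the slope
    t_value = b / se
    return b, se, t_value, two_sided_p(t_value, df)

# Made-up illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b, se, t_value, p_value = slope_test(xs, ys)
print(f"b = {b:.3f}, SE = {se:.3f}, t = {t_value:.3f}, p = {p_value:.3f}")
```

With only five observations the slope here is positive but its p-value is above .05, so we would not reject the null hypothesis of no relationship.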
Multiple Regression At what level(s) of measurement would we employ multiple regression??? Interval and Ratio DVs. Now working with a new model: \(Y_i = a + b_1 X_{1i} + b_2 X_{2i} + \dots + b_k X_{ki} + e_i\)
Multiple Regression \(Y_i = a + b_1 X_{1i} + b_2 X_{2i} + \dots + b_k X_{ki} + e_i\). The b's are “Partial” slope coefficients. a is the Y-Intercept. e is the Error Term.
Slope Coefficients Slope coefficients are now Partial Slope Coefficients, although we still refer to them generally as slope coefficients. They have a new interpretation: “The expected change in Y given a one-unit change in X1, holding all other variables constant”
Multiple Regression By “holding constant” all other X’s, we are therefore “controlling for” all other X’s, and thus isolating the “independent effect” of the variable of interest. “Holding Constant”: group observations according to levels of X2, X3, etc.…then look at the impact of X1 on Y! This is what Multiple Regression is doing in practice!!! Make everyone “equal” in terms of the “control” variable, then examine the impact of X1 on Y!
“Holding Constant” other IVs Income (Y) as a function of Education (X1) and Seniority (X2). Look at the relationship between Seniority and Income WITHIN different levels of Education!!! “Holding Education Constant”. Look at the relationship between Education and Income WITHIN different levels of Seniority!!! “Holding Seniority Constant”.
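One way to see "holding constant" numerically is the residual-on-residual approach (the Frisch-Waugh-Lovell result): strip out of both Income and Education whatever Seniority can predict, then regress what is left of Income on what is left of Education. The slope from that step equals the partial slope for Education in the multiple regression. The sketch below uses made-up data in which income is constructed as exactly 5000 + 400(education) + 300(seniority), so the right answer is known in advance; all names and numbers are illustrative assumptions.

```python
def mean(values):
    return sum(values) / len(values)

def slope(xs, ys):
    """Bivariate least-squares slope of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def residuals(xs, ys):
    """What is left of ys after removing the part predicted by xs."""
    b = slope(xs, ys)
    a = mean(ys) - b * mean(xs)
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Made-up data: education and seniority are positively related.
education = [8, 10, 12, 12, 14, 16, 16, 18]
seniority = [2, 1, 5, 3, 8, 6, 10, 9]
income = [5000 + 400 * e + 300 * s for e, s in zip(education, seniority)]

# Ignoring seniority: the education slope also absorbs seniority's effect.
bivariate_slope = slope(education, income)

# Holding seniority constant: remove seniority from both variables first.
income_resid = residuals(seniority, income)
education_resid = residuals(seniority, education)
partial_slope = slope(education_resid, income_resid)

print(f"bivariate slope for education: {bivariate_slope:.1f}")
print(f"partial slope for education:   {partial_slope:.1f}")
```

The bivariate slope overstates the effect of education because more-educated people in this made-up sample also tend to have more seniority; the partial slope recovers the 400 built into the data.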
The Intercept \(Y_i = a + b_1 X_{1i} + b_2 X_{2i} + \dots + b_k X_{ki} + e_i\). Y-Intercept (Constant) value…(a)…is now the expected value of Y when ALL the Independent Variables are set to 0.
Testing for Statistical Significance Proceeds as before: a p-value (the probability of observing a slope at least this far from zero if the null hypothesis is true) is generated for each sample slope coefficient. Based on the “t-value” (Beta / S.E.) and the Degrees of Freedom!
Fit of the Regression R-squared value: the proportion of variation in the dependent variable explained by ALL of the independent variables combined. \(R^2 = (TSS - ResSS) / TSS\)…“Explained Variation in DV divided by Total Variation in DV”
R-square R-square ranges from 0 to 1. 0 is no relationship. 1 is a perfect relationship…IVs explain 100% of the variance in the DV.
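The R-squared formula can be worked through directly. In this sketch TSS is the total sum of squares (squared deviations of Y from its mean) and ResSS is the residual sum of squares (squared deviations of Y from the fitted line). The data are made up, and a single independent variable is used only to keep the example short; the formula is the same with several.

```python
def r_squared(xs, ys):
    """Return (tss, res_ss, r2) for a bivariate least-squares fit."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    tss = sum((y - mean_y) ** 2 for y in ys)                       # total variation
    res_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))   # unexplained
    return tss, res_ss, (tss - res_ss) / tss

# Made-up illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
tss, res_ss, r2 = r_squared(xs, ys)
print(f"TSS = {tss:.2f}, ResSS = {res_ss:.2f}, R-squared = {r2:.2f}")
```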
R-square Doesn’t tell us WHY the dependent variable varies or explain the results…This is why we need Theory!!! Simply a measure of how well your model fits the dependent variable. How well are the Xs predicting Y! How much variation in Y is explained by Xs!
Multiple Regression Y = Income in dollars; X1 = Education in years; X2 = Seniority in years. Y = a + b1(Education) + b2(Seniority) + e
Example \(Y = a + 432X_1 + 281X_2 + e\). Both coefficients are statistically significant at the P < .05 level… Because the Beta for Education is positive, the expected change in Income (Y) given a one-unit increase in Education is +$432, holding seniority in years constant.
Predicted Values Let’s predict income for someone with 10 years of education and 5 years of seniority. \(\hat{Y} = a + 432(10) + 281(5)\). The predicted value of Y for this case is $11,391.
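The prediction can be reproduced as a short calculation. The intercept is not legible in the text above, so the sketch backs it out from the stated prediction of $11,391 and the two coefficients (432 for education, 281 for seniority). Treat the resulting intercept as an inference from those figures rather than a reported estimate; `predict_income` is a hypothetical helper name.

```python
B_EDUCATION = 432   # dollars per additional year of education (from the text)
B_SENIORITY = 281   # dollars per additional year of seniority (from the text)

# ASSUMPTION: intercept inferred from the stated prediction of $11,391
# for 10 years of education and 5 years of seniority.
implied_intercept = 11391 - (B_EDUCATION * 10 + B_SENIORITY * 5)

def predict_income(education_years, seniority_years):
    """Predicted income in dollars from the estimated equation."""
    return (implied_intercept
            + B_EDUCATION * education_years
            + B_SENIORITY * seniority_years)

print(predict_income(10, 5))
# One more year of education, seniority held constant at 5 years:
print(predict_income(11, 5) - predict_income(10, 5))
```

The second line of output is the partial slope interpretation in action: changing education by one year while leaving seniority fixed moves the prediction by exactly the education coefficient.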
R-squared The R-squared for this model is .56. Education and Seniority explain 56% of the variation in income.