Review
Statistics Needed Need to find the best place to draw the regression line on a scatter plot Need to quantify the cluster of scores around this regression line (i.e., the correlation coefficient)
Computational formula
Correlation
Hypothesis testing of r Is there a significant relationship between X and Y (or are they independent)? Are two independent correlations significantly different from each other?
Regression Equation: Y = a + bX, where Y = value predicted from a particular X value; a = point at which the regression line intersects the Y axis; b = slope of the regression line; X = X value for which you wish to predict a Y value
Regression
How to draw the regression line
Hypothesis Testing Have learned: how to calculate r as an estimate of the relationship between two variables; how to calculate b as a measure of the rate of change of Y as a function of X. Next: determine if these values are significantly different from 0
Testing b The significance tests for r and b are equivalent. If X and Y are related (r), then it must be true that Y varies with X (b). It is important to learn the significance test for b because it carries over to multiple regression
Calculate t-observed b = Slope Sb = Standard error of slope
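The formula itself did not survive on this slide. A sketch of the standard form, assuming the null hypothesis is a slope of 0 and a simple regression with one predictor (so the degrees of freedom are N - 2):

```latex
t_{\text{obs}} = \frac{b}{S_b}, \qquad df = N - 2
```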
Multiple Regression Good news! No Math Bad news! Too complicated to do by hand Bad news! Almost all conceptual
Causal Models X (IV) is the cause of Y (DV). This is an assumption: causation is not demonstrated with statistics! X → Y
Remember
Name       Candy      Depression
Charlie    5          55
Augustus   7          43
Veruca     4          59
Mike       [missing]  108
Violet     [missing]  65
Remember Y = 127 - 13.26(X); COV = -30.5; N = 5; r = -.81; Sx = 1.52; Sy = 24.82
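As a quick check on how these summary values fit together, the sketch below recomputes r and b from the covariance and the two standard deviations. The variable names are illustrative, not from the slides, and the inputs are the rounded values shown above.

```python
# Summary statistics copied from the slide (rounded values).
cov_xy = -30.5   # covariance of candy (X) and depression (Y)
s_x = 1.52       # standard deviation of X
s_y = 24.82      # standard deviation of Y

# Correlation: the covariance rescaled by both standard deviations.
r = cov_xy / (s_x * s_y)

# Slope: the covariance divided by the variance of X.
b = cov_xy / s_x ** 2

print(f"r = {r:.2f}")
print(f"b = {b:.2f}")
```

Because Sx is rounded to two decimals here, the slope comes out slightly different from the slide's -13.26; the correlation matches to two decimals.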
Causal Models Candy → Depression (b = -13.26)
Example Data collected from 15 people Salary Years since Ph.D. Publications
Example Predict the salary of a person from the time since their Ph.D. (in years): Y = 51,670 + 1,218(X). What do these mean? $51,670 is what a person tends to earn right after graduating (Years = 0). For each year after that, a person's salary tends to increase by $1,218
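The equation can be applied directly to any number of years. A minimal sketch, where the function name `predict_salary` is illustrative only:

```python
def predict_salary(years_since_phd: float) -> float:
    """Predicted salary from Y = 51,670 + 1,218(X), the equation on the slide."""
    intercept = 51_670  # predicted salary at Years = 0
    slope = 1_218       # predicted increase per year since the Ph.D.
    return intercept + slope * years_since_phd


for years in (0, 1, 10):
    print(years, predict_salary(years))
```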
Causal Models Years since Ph.D. → Salary (b = 1,218)
Example Predict the salary of a person from the number of publications they have
Causal Models Publications → Salary (b = 334)
What if we have two IVs? It is possible to use two IVs at the same time to predict a DV Use both publications and years since Ph.D. to predict salary
Causal Models Publications → Salary (b = 334); Years since Ph.D. → Salary (b = 1,218). This is how to interpret the values if the IVs are independent. Problem: the information provided by Publications and Years is probably somewhat redundant
Causal Models Publications → Salary; Years since Ph.D. → Salary; Publications ↔ Years since Ph.D. (r = .66). Must estimate these regression coefficients so that this relationship is taken into account (called “partial regression coefficients”)
Regression Coefficients Basic logic is exactly the same as normal regression Least squares Has one intercept and each of the IVs has one slope
Regression Coefficients b0 = the intercept; b1 = the slope of the first IV; b2 = the slope of the second IV; bp = the slope of the pth IV. Y = b0 + b1(X1) + b2(X2) + . . . + bp(Xp)
Example Predict the salary of a person from the number of publications they have and the years since they got their Ph.D.
Regression Coefficients Current Problem Y = Salary X1 = Years since Ph.D.; X2 = Publications
Causal Models Publications → Salary (b = 122); Years since Ph.D. → Salary (b = 977); Publications ↔ Years since Ph.D. (r = .66)
Regression Coefficients Current Problem Y = Salary X1 = Years since Ph.D.; X2 = Publications What does a person who just graduated (years = 0) with 2 publications likely earn?
Regression Coefficients Current Problem Y = Salary X1 = Years since Ph.D.; X2 = Publications What does a person who graduated 10 years ago with no publications make?
Question Current Problem Y = Salary X1 = Years since Ph.D.; X2 = Publications Which IV has a greater “effect” on salary?
Standardized Regression Coefficients Conceptually the same as standardizing all variables and then doing regression analysis Why does this work? Example with Years predicting Salary
Standardized Regression Coefficients With a single predictor, unstandardized: Years since Ph.D. → Salary (b = 1,218). With a single predictor, standardized: Years since Ph.D. → Salary (β = .71)
Standardized Regression Coefficients With single IV Correlation between years and salary (r = .71) is the SAME as the standardized regression weight!
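One way to see why this holds: with a single predictor, the standardized weight is the raw slope rescaled by the ratio of the two standard deviations, which is algebraically the correlation. The slides do not give the standard deviations for years and salary, so the sketch below uses the candy and depression values from earlier instead (b = -13.26, Sx = 1.52, Sy = 24.82, r = -.81). Variable names are illustrative.

```python
# Values from the candy/depression example earlier in the slides.
b = -13.26    # unstandardized slope
s_x = 1.52    # standard deviation of candy (X)
s_y = 24.82   # standard deviation of depression (Y)
r = -0.81     # correlation reported on the slide

# Standardizing both variables rescales the slope by s_x / s_y.
beta = b * s_x / s_y

print(f"standardized slope = {beta:.2f}, r = {r:.2f}")
```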
Standardized Regression Coefficients β1 = standardized regression coefficient of the first IV; β2 = standardized regression coefficient of the second IV; βp = standardized regression coefficient of the pth IV; β0 = intercept (always = 0)
Remember (unstandardized) Publications → Salary (b = 122); Years since Ph.D. → Salary (b = 977); Publications ↔ Years since Ph.D. (r = .66)
Standardized Publications → Salary (β = .21); Years since Ph.D. → Salary (β = .57); Publications ↔ Years since Ph.D. (r = .66)
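The slides do not show how these standardized weights were obtained. For two predictors there is a standard closed-form expression in terms of the three pairwise correlations, sketched below as an assumption about the computation. The inputs are r = .71 for years and salary, r = .66 between the predictors, and the square root of r² = .35 (reported elsewhere in these slides) for publications and salary. The function name is hypothetical.

```python
def standardized_betas(r_y1: float, r_y2: float, r_12: float) -> tuple:
    """Standardized partial regression weights for two predictors.

    r_y1, r_y2: correlation of each predictor with the DV.
    r_12: correlation between the two predictors.
    """
    denom = 1 - r_12 ** 2
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    return beta_1, beta_2


r_salary_years = 0.71        # years and salary
r_salary_pubs = 0.35 ** 0.5  # publications and salary, from r² = .35
r_years_pubs = 0.66          # the two predictors

beta_years, beta_pubs = standardized_betas(
    r_salary_years, r_salary_pubs, r_years_pubs
)

# With standardized variables, R² is the sum of each weight times its correlation.
r_squared = beta_years * r_salary_years + beta_pubs * r_salary_pubs

print(f"beta_years = {beta_years:.3f}")
print(f"beta_pubs  = {beta_pubs:.3f}")
print(f"R squared  = {r_squared:.3f}")
```

The results land close to the .57 and .21 in the diagram and to the R² of .53 reported for this example; small differences come from the rounded correlations used as input.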
Regression Coefficients Current Problem Yz = Standardized Salary Z1 = Years since Ph.D. (Standardized) Z2 = Publications (Standardized) Which IV has a greater “effect” on salary? Can interpret in SD units
Regression Coefficients Current Problem Yz = Standardized Salary Z1 = Years since Ph.D. (Standardized) Z2 = Publications (Standardized) What would you predict the salary to be if a person’s Years = 1.2 and the person’s Publications = -.50? Interpret what these values mean!
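The standardized equation itself is not preserved on this slide. Assuming it uses the standardized weights from the diagram (.57 for years, .21 for publications) and an intercept of 0, the prediction can be sketched as follows; the function name is illustrative.

```python
BETA_YEARS = 0.57  # standardized weight for years since Ph.D.
BETA_PUBS = 0.21   # standardized weight for publications


def predict_z_salary(z_years: float, z_pubs: float) -> float:
    """Predicted salary in SD units; the standardized intercept is 0."""
    return BETA_YEARS * z_years + BETA_PUBS * z_pubs


z_salary = predict_z_salary(1.2, -0.50)
print(f"predicted standardized salary = {z_salary:.3f}")
```

Under these assumptions, a person 1.2 SD above the mean on years and half an SD below the mean on publications is predicted to earn a bit more than half an SD above the mean salary.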
Testing the full model How well does the model predict? The fit test for the full model and its significance are equal for both standardized and unstandardized models
Person   Z1      Z2      ZY      Pred
1        -1.26   .35     -.83    -.477
2        [three of four values survive: -.53, -.24, -.287]
3        -.63    -1.40   -.84    -1.097
4        .63     1.23    1.56    1.01
5        [three of four values survive: 1.26, .36, .862]
r = .902 (correlation between ZY and Pred)
Multiple R
Testing for Significance Once an equation is created (standardized or unstandardized) typically test for significance. Two levels 1) Level of each regression coefficient 2) Level of the entire model
Multiple R Commonly used as R². Pros and cons. Can be tested for significance: does the set of variables (taken together) predict Y at better than chance levels? H1: R* > 0; H0: R* = 0
Significance testing for Multiple R p = number of predictors N = total number of observations
Significance testing for Multiple R Fcrit (page 737): need two df. Numerator df = p; denominator df = N - p - 1. Fcrit (2, 2) = 19.00
Multiple R If F > Fcrit, reject H0 and accept H1. If F ≤ Fcrit, fail to reject H0. Current problem: fail to reject H0. There is no evidence that these two variables predict the outcome
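The F formula did not survive on these slides. The sketch below assumes the standard F test for R², which uses the same degrees of freedom given above (p and N - p - 1), and applies it to the five-person example with R = .902. The function name is hypothetical.

```python
def f_for_multiple_r(r_squared: float, n_obs: int, n_predictors: int):
    """F statistic for H0: R* = 0, with its numerator and denominator df."""
    df_num = n_predictors
    df_den = n_obs - n_predictors - 1
    f_obs = (r_squared / df_num) / ((1 - r_squared) / df_den)
    return f_obs, df_num, df_den


# Five-person example: R = .902, N = 5, p = 2.
f_obs, df_num, df_den = f_for_multiple_r(0.902 ** 2, n_obs=5, n_predictors=2)

f_crit = 19.00  # critical value for (2, 2) df, from the slide
reject_h0 = f_obs > f_crit

print(f"F({df_num}, {df_den}) = {f_obs:.2f}, reject H0: {reject_h0}")
```

With only five observations the observed F falls well short of the critical value, which agrees with the decision stated on the slide even though R itself is large.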
Practice The teaching salary example Based on 15 people Two IVs
Significance testing for Multiple R p = number of predictors N = total number of observations
Significance testing for Multiple R F crit (2,12) = 3.89
Multiple R If F > Fcrit, reject H0 and accept H1. If F ≤ Fcrit, fail to reject H0. Current problem: reject H0 and accept H1. These two variables do predict the outcome
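The arithmetic behind this decision is not preserved on the slides. A worked sketch, assuming the standard F test for R² and the R² = .53 that these slides report for the salary example (N = 15, p = 2):

```latex
F = \frac{R^2 / p}{(1 - R^2) / (N - p - 1)}
  = \frac{.53 / 2}{(1 - .53) / (15 - 2 - 1)}
  = \frac{.265}{.0392}
  \approx 6.77 > F_{\text{crit}}(2, 12) = 3.89
```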
Detour Moving back to issues of correlation This will help with . . .
Testing for Significance Once an equation is created (standardized or unstandardized) typically test for significance. Two levels 1) Level of each regression coefficient 2) Level of the entire model
How strong is the relationship between publications and salary if we partial out the effect of years? What this is saying
[Diagram: Salary and Publications, overlapping] r²_SP = .35. r² is a ratio = variance explained / total variance. Total variance of Salary = 1 (standardized), so the part of Salary not explained by Publications = 1 - .35 = .65
[Diagram: Salary overlapping Publications and Years, with regions a, b, c, and e] How strong is the relationship between publications and salary if we partial out the effect of years?
Semipartial correlation of publications and salary: Multiple R² = a + b + c
Multiple R
R² = .53 or a + b + c; r²_SY = .50 or b + c; r²_SP = .35 or a + c. So what is just “a”? a = (a + b + c) - (b + c) = R² - r²_SY = .53 - .50 = .03. Thus the semipartial correlation = √.03 = .17
What is the correlation between years and salary controlling for publications?
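Both semipartial correlations follow from the same subtraction. The sketch below uses the three values from the slides and reads the regions from the sums given there: a is unique to publications, b is unique to years, and c is shared. The variable names are illustrative.

```python
import math

r2_full = 0.53          # R² with both predictors (a + b + c)
r2_salary_years = 0.50  # r² for years alone (b + c)
r2_salary_pubs = 0.35   # r² for publications alone (a + c)

# Unique contribution of each predictor: full model minus the other predictor.
a = r2_full - r2_salary_years  # unique to publications
b = r2_full - r2_salary_pubs   # unique to years
c = r2_full - a - b            # shared by both predictors

# The semipartial correlation is the square root of the unique contribution.
sr_pubs = math.sqrt(a)
sr_years = math.sqrt(b)

print(f"a = {a:.2f}, b = {b:.2f}, c = {c:.2f}")
print(f"semipartial (publications) = {sr_pubs:.2f}")
print(f"semipartial (years)        = {sr_years:.2f}")
```

On these numbers, years keeps a noticeably larger unique share of the salary variance than publications once the overlap between the two predictors is removed.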