Yesterday
Correlation
- Definition
- Deviation score formula, z-score formula
- Hypothesis test
Regression
- Intercept and slope
- Unstandardized regression line
- Standardized regression line
- Hypothesis tests
Summary
- Correlation: Pearson’s r
- Unstandardized regression line
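For reference, the two summary items can be written out as below. The notation is an assumption rather than something recoverable from the slide: deviation scores are taken from the means, and s_X and s_Y are the standard deviations of X and Y.

```latex
\[
r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}
         {\sqrt{\sum (X - \bar{X})^2 \; \sum (Y - \bar{Y})^2}}
\]
\[
Y' = a + bX, \qquad b = r\,\frac{s_Y}{s_X}, \qquad a = \bar{Y} - b\,\bar{X}
\]
```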
Some issues with r
- Outliers have strong effects
- Restriction of range can suppress or augment r
- Correlation is not causation
- The absence of a linear correlation does not mean the absence of an association
Outliers
- Child 19 is lowering r
- Child 18 is increasing r
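A minimal sketch of how a single point can move r in either direction. The data are hypothetical (the children's scores from the plot are not available), and `pearson_r` is a helper written for this illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed from deviation scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical data (not from the slides): a fairly strong positive trend.
base_x = [1, 2, 3, 4, 5, 6, 7, 8]
base_y = [2, 1, 4, 3, 6, 5, 8, 7]

r_base = pearson_r(base_x, base_y)

# One point far off the trend (high X, low Y) pulls r down.
r_lowered = pearson_r(base_x + [9], base_y + [0])

# One extreme point in line with the trend (high X, high Y) pushes r up.
r_raised = pearson_r(base_x + [20], base_y + [20])

print(f"no outlier:          r = {r_base:.3f}")
print(f"off-trend outlier:   r = {r_lowered:.3f}")
print(f"on-trend outlier:    r = {r_raised:.3f}")
```

Both added points are "outliers" in X, but only the off-trend one weakens the correlation; the on-trend one inflates it.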
The restricted range problem
- The relationship you see between X and Y may depend on the range of X.
- For example, the size of a child’s vocabulary has a strong positive association with the child’s age.
- But if all of the children in your data set are in the same grade in school, you may not see much association.
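The vocabulary example can be sketched with a small simulation. Everything here is hypothetical: the age range, the growth rate of 1000 words per year, and the noise level are made-up values chosen only to show the effect.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson's r computed from deviation scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical model: vocabulary grows with age, plus individual noise.
ages = [random.uniform(5, 15) for _ in range(2000)]
vocab = [1000 * age + random.gauss(0, 1500) for age in ages]

r_full = pearson_r(ages, vocab)

# Keep only children within one year of age, a stand-in for "same grade".
same_grade = [(a, v) for a, v in zip(ages, vocab) if 9 <= a < 10]
r_restricted = pearson_r([a for a, _ in same_grade],
                         [v for _, v in same_grade])

print(f"full age range:   r = {r_full:.2f}")
print(f"one grade only:   r = {r_restricted:.2f}")
```

The underlying relationship is identical in both cases; only the range of X that was sampled differs.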
Common causes, confounds
- Two variables might be associated because they share a common cause.
- For example, there is a positive correlation between ice cream sales and drownings; hot weather plausibly drives both.
- In many cases there is also the question of reverse causality: Y may be causing X rather than X causing Y.
Non-linearity
- Some variables are not linearly related, though a relationship obviously exists.
- For monotonic relationships that are not linear, we use Spearman’s rank correlation (r_s, also written rho).
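A minimal sketch of why the rank-based coefficient helps: Spearman's coefficient is Pearson's r applied to the ranks. The data are hypothetical (an exponential curve), and the `ranks` helper assumes there are no tied values.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed from deviation scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

def ranks(values):
    """Rank from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = float(rank)
    return result

def spearman_r(xs, ys):
    """Spearman's coefficient: Pearson's r on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

# Hypothetical data: Y rises with X every time, but not along a line.
xs = [1, 2, 3, 4, 5, 6]
ys = [math.exp(x) for x in xs]

r_pearson = pearson_r(xs, ys)
r_spearman = spearman_r(xs, ys)

print(f"Pearson:  {r_pearson:.3f}")   # below 1 because the trend is curved
print(f"Spearman: {r_spearman:.3f}")  # 1.000 because the order is preserved
```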
Regression: Analyzing the “Fit”
- How well does the regression line describe the data?
- Assessing “fit” relies on analysis of residuals:
  - Are the residuals randomly distributed? (If not, perhaps a linear model is inappropriate.)
  - How large are the residuals? Too big? (A low correlation means big residuals.)
Assumptions of Regression
- The residuals have a mean of 0 and a variance of s²_resid.
- The residuals are uncorrelated with X.
- The residuals are homoscedastic (similarly sized across the range of X).
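The first two properties can be checked numerically. The sketch below uses the six (X, Y) pairs from the exercise in these slides; dividing by N - 2 for the residual variance is an assumption, chosen because it matches the error row of the ANOVA table in the exercise. For a least-squares line, the zero mean and zero covariance with X hold by construction, so homoscedasticity is the property that has to be judged from a residual plot.

```python
# Exercise data from the slides.
xs = [1, 1, 5, 5, 9, 9]
ys = [3, 4, 4, 6, 6, 7]
n = len(xs)

mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares slope and intercept from deviation scores.
sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
ss_x = sum((x - mean_x) ** 2 for x in xs)
b = sp / ss_x
a = mean_y - b * mean_x

residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Property 1: the residuals average to zero.
resid_mean = sum(residuals) / n

# Property 2: the residuals have zero cross-product (hence zero correlation) with X.
resid_x_cross = sum((x - mean_x) * e for x, e in zip(xs, residuals))

# Residual variance, dividing by N - 2 (assumed convention).
s2_resid = sum(e ** 2 for e in residuals) / (n - 2)

print(f"b = {b}, a = {a}")
print(f"mean of residuals = {resid_mean}")
print(f"cross-product with X = {resid_x_cross}")
print(f"residual variance = {s2_resid}")
```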
Residual Diagnostics I: Graphing
- [Residual plot] Problem: curvilinearity
- [Plot; axis label: Agreeableness Time 2]
- [Residual plot] Problem: heteroscedasticity
Regression: Analyzing the “Fit”
- Are the residuals randomly distributed? Check with residual plots.
- How large are the residuals? Check with ANOVA.
Regression ANOVA
[Figure labels: SSY, SSmodel, SSresid, Y, Y’]
Regression ANOVA
Source   SS   df   s²
Model
Error
Total
- F = t²
- SSmodel: “the amount of variance in Y explained by our model”
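The cell formulas did not survive on this slide. One common way to fill them in for a single predictor is sketched below; N is the number of cases, and the subscript names follow the slide labels (SSY, SSmodel, SSresid). Treat this as the standard textbook layout, assumed rather than recovered from the slide.

```latex
\begin{align*}
SS_{model} &= \sum (Y' - \bar{Y})^2, & df_{model} &= 1 \\
SS_{resid} &= \sum (Y - Y')^2,       & df_{resid} &= N - 2 \\
SS_{Y}     &= \sum (Y - \bar{Y})^2 = SS_{model} + SS_{resid}, & df_{total} &= N - 1 \\
s^2 &= \frac{SS}{df}, & F &= \frac{s^2_{model}}{s^2_{resid}}
\end{align*}
```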
Exercise
Fill in the ANOVA table for the following data.

X    Y    Y’
1    3    3.5
1    4    3.5
5    4    5
5    6    5
9    6    6.5
9    7    6.5

          X      Y
Mean:     5      5
Stdevp:   3.27   1.41
r = 0.866025404   b = 0.375   a = 3.125

Exercise (worked)
- Y’ is the predicted value.
- (Y - Y’) is the residual (unpredicted deviation).
- (Y’ - Ȳ) is the predicted deviation.

X    Y    Y’     (Y - Y’)²    (Y’ - Ȳ)²
1    3    3.5    (-0.5)²      (-1.5)²
1    4    3.5    (0.5)²       (-1.5)²
5    4    5      (-1)²        (0)²
5    6    5      (1)²         (0)²
9    6    6.5    (-0.5)²      (1.5)²
9    7    6.5    (0.5)²       (1.5)²

SSresid = 3
SSmodel = 9
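The two sums of squares can be reproduced with a few lines of code. This sketch takes the data, the slope b = 0.375, and the intercept a = 3.125 from the slide; the variable names are chosen for illustration.

```python
# Exercise data and regression coefficients from the slides.
xs = [1, 1, 5, 5, 9, 9]
ys = [3, 4, 4, 6, 6, 7]
a, b = 3.125, 0.375

mean_y = sum(ys) / len(ys)
predicted = [a + b * x for x in xs]  # Y'

# Squared residuals (unpredicted deviations) and squared predicted deviations.
sq_resid = [(y - yp) ** 2 for y, yp in zip(ys, predicted)]
sq_model = [(yp - mean_y) ** 2 for yp in predicted]

ss_resid = sum(sq_resid)
ss_model = sum(sq_model)
ss_y = sum((y - mean_y) ** 2 for y in ys)

for x, y, yp, e2, m2 in zip(xs, ys, predicted, sq_resid, sq_model):
    print(f"{x:>2} {y:>2} {yp:>4} {e2:>5} {m2:>5}")

print(f"SSresid = {ss_resid}, SSmodel = {ss_model}, SSY = {ss_y}")
```

The last line also shows that the two parts add up to the total variation in Y.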
Regression ANOVA
Source   SS   df   s²     F
Model    9    1    9      12
Error    3    4    .75
Total    12   5
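A short check of the completed table, including the F = t² relation. The sums of squares, degrees of freedom, and r are the values from the slides; the t formula used here, t = r√(N - 2) / √(1 - r²), is assumed to be the hypothesis test for r that the slides refer to.

```python
import math

# Values from the slides.
ss_model, ss_resid = 9.0, 3.0
df_model, df_resid = 1, 4
r = 0.866025404
n = 6

# Mean squares (the s² column) and the F ratio.
ms_model = ss_model / df_model
ms_resid = ss_resid / df_resid
f_stat = ms_model / ms_resid

# t statistic for r (assumed form of the test); its square should match F.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Proportion of variance in Y explained by the model; should match r².
r_squared_from_ss = ss_model / (ss_model + ss_resid)

print(f"s² model = {ms_model}, s² error = {ms_resid}, F = {f_stat}")
print(f"t = {t_stat:.4f}, t² = {t_stat ** 2:.4f}")
print(f"SSmodel / SSY = {r_squared_from_ss}, r² = {r ** 2:.4f}")
```

With 1 and 4 degrees of freedom, F can then be compared against a critical value from an F table in the usual way.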