Tutorial 4 MBP 1010 Kevin Brown
Correlation Review
Pearson’s correlation coefficient
– Varies between −1 (perfect negative linear correlation) and 1 (perfect positive linear correlation). 0 indicates no linear association.
– Location and scale independent: unchanged when either variable is shifted or rescaled by a positive constant
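The properties above can be checked numerically. The following is a minimal sketch in base R; the vectors x and y are made-up illustrative values, not data from the tutorial.

```r
# Hypothetical paired measurements (illustrative values only).
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.3, 3.1, 5.2, 5.9, 8.4, 8.8, 11.1, 12.5)

# Pearson correlation; cor() uses method = "pearson" by default.
r <- cor(x, y)

# Shifting and positively rescaling either variable leaves r unchanged.
r_rescaled <- cor(10 + 5 * x, -3 + 0.5 * y)

# Multiplying one variable by a negative constant only flips the sign.
r_flipped <- cor(-2 * x, y)

# An exact increasing linear relationship gives r = 1.
r_perfect <- cor(x, 3 + 2 * x)

print(c(r = r, rescaled = r_rescaled, flipped = r_flipped, perfect = r_perfect))
```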
Linear Regression
Requires you to define?
– Y: dependent (response) variable
– X: independent (predictor) variable(s)
Allows you to answer what questions?
– Is there an association? (same question as the Pearson correlation coefficient)
– What is the association? Measured as the slope.
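Both questions can be illustrated with a small fit. This is a sketch on made-up data (x and y are hypothetical). It shows that, in a regression with a single predictor, the slope equals cov(x, y) / var(x), and that the p-value for the slope matches the p-value of the Pearson correlation test.

```r
# Hypothetical data (illustrative values only).
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(2.3, 4.1, 5.2, 8.9, 9.4, 12.8, 13.1, 17.2, 17.9, 21.0)

# Y (response) goes on the left of the formula, X (predictor) on the right.
fit <- lm(y ~ x)

# "What is the association?" The slope: change in y per unit change in x.
slope <- coef(fit)[["x"]]
slope_by_hand <- cov(x, y) / var(x)

# "Is there an association?" The test of slope = 0 ...
p_slope <- summary(fit)$coefficients["x", "Pr(>|t|)"]
# ... gives the same p-value as the test of Pearson correlation = 0.
p_cor <- cor.test(x, y)$p.value

print(c(slope = slope, slope_by_hand = slope_by_hand))
print(c(p_slope = p_slope, p_cor = p_cor))
```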
Assumes
– Linearity
– Constant residual variance (homoscedasticity)
– Residuals normal
– Errors are independent (i.e. not clustered)
Homogeneity of variance
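One informal way to look for non-constant variance is to compare the spread of the residuals at low and high fitted values. The sketch below builds a deliberately heteroscedastic example; the data and the deterministic "noise" term are assumptions made purely for illustration, and the low/high split is just one simple choice, not a formal test.

```r
x <- 1:40
# Deterministic "noise" whose size grows with x, so the variance is NOT constant.
noise <- 0.5 * x * rep(c(-1, 1), times = 20)
y <- 2 + 3 * x + noise

fit <- lm(y ~ x)
res <- residuals(fit)

# Compare residual spread for the lower and upper halves of the fitted values.
low_half <- fitted(fit) <= median(fitted(fit))
sd_low <- sd(res[low_half])
sd_high <- sd(res[!low_half])
spread_ratio <- sd_high / sd_low  # well above 1 suggests a fan-shaped residual plot

print(c(sd_low = sd_low, sd_high = sd_high, ratio = spread_ratio))

# Graphical check (uncomment in an interactive session):
# plot(fitted(fit), res); abline(h = 0, lty = 2)
```

Under homogeneity of variance the two spreads should be similar, so the ratio would be close to 1.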
Outputs
– “estimates”: intercept, slope
– standard errors
– t values
– p-values
– residual standard error (SSE – what is this?)
– R²
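Each of these outputs can be pulled out of a fitted model object. The sketch below uses made-up data (x and y are hypothetical) and also recomputes some quantities by hand: SSE is the sum of squared residuals, the residual standard error is sqrt(SSE / residual degrees of freedom), and R² is 1 − SSE / SST.

```r
# Hypothetical data (illustrative values only).
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(2.3, 4.1, 5.2, 8.9, 9.4, 12.8, 13.1, 17.2, 17.9, 21.0)

fit <- lm(y ~ x)
s <- summary(fit)

# Coefficient table with columns: Estimate, Std. Error, t value, Pr(>|t|).
coef_table <- s$coefficients
print(coef_table)

# The t value is the estimate divided by its standard error.
t_by_hand <- coef_table[, "Estimate"] / coef_table[, "Std. Error"]

# SSE: sum of squared errors (residuals).
sse <- sum(residuals(fit)^2)

# Residual standard error: sqrt(SSE / residual df), where df = n - 2 here.
rse_by_hand <- sqrt(sse / df.residual(fit))

# R-squared: proportion of the total variation in y explained by the model.
sst <- sum((y - mean(y))^2)
r2_by_hand <- 1 - sse / sst

print(c(rse = s$sigma, rse_by_hand = rse_by_hand))
print(c(r2 = s$r.squared, r2_by_hand = r2_by_hand))
```

In a regression with one predictor, the residual degrees of freedom and the denominator degrees of freedom of the F-statistic are both n − 2.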
Linear regression example: height vs. weight
Extract information:
> summary(lm(HW[,2] ~ HW[,1]))

Call:
lm(formula = HW[, 2] ~ HW[, 1])

Residuals:
Min 1Q Median 3Q Max

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)
HW[, 1] e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: on 48 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 1 and 38 DF, p-value: 5.022e-05
Example
Televisions, Physicians and Life Expectancy (World Almanac Factbook 1993) example
– Residuals & Outliers
– High leverage points & influential observations
– Dummy variable coding
– Transformations
Take home messages
– Regression is a very flexible tool
– correlation ≠ causation
Dummy coding
Creates an alternate variable that’s used for analysis
For 2 categories you set values of …
– reference level to 0
– level of interest to 1
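A minimal sketch of 0/1 dummy coding for a two-level variable follows. The group labels ("control", "treated") and the response values are hypothetical. With this coding the intercept is the mean of the reference level and the slope is the difference between the two group means.

```r
# Hypothetical two-category variable; "control" is the reference level.
group <- factor(c("control", "control", "control", "treated", "treated", "treated"),
                levels = c("control", "treated"))
y <- c(5.1, 4.9, 5.3, 6.8, 7.2, 7.0)

# Manual dummy coding: reference level -> 0, level of interest -> 1.
dummy <- as.integer(group == "treated")

fit <- lm(y ~ dummy)
intercept <- coef(fit)[["(Intercept)"]]  # mean of the reference group
slope <- coef(fit)[["dummy"]]            # difference in group means

mean_control <- mean(y[group == "control"])
mean_treated <- mean(y[group == "treated"])

# R builds the same 0/1 column automatically when a factor is used in a formula.
auto_dummy <- model.matrix(~ group)[, "grouptreated"]

print(c(intercept = intercept, slope = slope))
print(c(mean_control = mean_control, difference = mean_treated - mean_control))
```

Passing the factor directly, as in lm(y ~ group), should give the same fit, with the coefficient labelled grouptreated.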
Residuals and Outliers
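One common way to screen for outliers is to look at standardized residuals. The sketch below uses made-up data with one planted outlier; the cutoff of 2 is a widely used rule of thumb, chosen here as an assumption rather than taken from the tutorial.

```r
# Hypothetical data lying close to a line ...
x <- 1:20
y <- 3 + 2 * x + rep(c(-0.5, 0.5), times = 10)
# ... with observation 8 moved far above the line to act as an outlier.
y[8] <- y[8] + 15

fit <- lm(y ~ x)

# Standardized residuals: residuals scaled by their estimated standard deviation.
std_res <- rstandard(fit)

# Flag observations whose standardized residual exceeds 2 in absolute value.
flagged <- which(abs(std_res) > 2)
print(flagged)

# Graphical check (uncomment in an interactive session):
# plot(fitted(fit), std_res); abline(h = c(-2, 2), lty = 2)
```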
High Leverage Points and Influential Observations
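Leverage measures how unusual an observation's X value is, while influence measures how much the fit changes when that observation is removed. The sketch below uses made-up data with one planted point. The 2p/n leverage cutoff and the "Cook's distance above 1" cutoff are common rules of thumb, used here as assumptions.

```r
# Hypothetical data: ten points exactly on y = 2x, plus one point far out in x
# (x = 30) that does not follow the same line.
x <- c(1:10, 30)
y <- 2 * x
y[11] <- 20

fit <- lm(y ~ x)

# Leverage (hat values) depends only on the x values.
h <- hatvalues(fit)
p <- length(coef(fit))  # number of estimated coefficients (intercept + slope)
n <- length(x)
high_leverage <- which(h > 2 * p / n)

# Cook's distance combines leverage and residual size to measure influence.
d <- cooks.distance(fit)
most_influential <- which.max(d)

# Influence shown directly: refit without observation 11 and compare slopes.
slope_with <- coef(fit)[["x"]]
slope_without <- coef(lm(y[-11] ~ x[-11]))[[2]]

print(high_leverage)
print(most_influential)
print(c(slope_with = slope_with, slope_without = slope_without))
```

A high leverage point pulls the fitted line toward itself, so its raw residual can look modest even when its influence is large.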