Download presentation
Presentation is loading. Please wait.
Published byTyler Poole Modified over 9 years ago
1
June 30, 2008Stat 111 - Lecture 16 - Regression1 Inference for relationships between variables Statistics 111 - Lecture 16
2
June 30, 2008Stat 111 - Lecture 16 - Regression2 Administrative Notes Homework 5 due on Wednesday, July 1 st
3
Final is this Thursday –Will start at exactly 10:40 –Will end at exactly 12:10 –Get here a bit early or you will be wasting time on your exam Final will focus on material from after midterm There will be at least two questions that require knowing material from before midterm Calculators are definitely needed! Single 8.5 x 11 cheat sheet (two-sided) allowed June 30, 2008Stat 111 - Lecture 16 - Regression3 Final Exam
4
June 30, 2008Stat 111 - Lecture 16 - Regression4 Inference Thus Far Tests and intervals for a single variable Tests and intervals to compare a single variable between two samples For the last couple of classes, we have looked at count data and inference for population proportions Before that, we looked at continuous data and inference for population means Next couple of classes: inference for a relationship between two continuous variables
5
June 30, 2008Stat 111 - Lecture 16 - Regression5 Two Continuous Variables Remember linear relationships between two continuous variables? Scatterplots Correlation Best Fit Lines
6
June 30, 2008Stat 111 - Lecture 16 - Regression6 Scatterplots and Correlation Visually summarize the relationship between two continuous variables with a scatterplot If our X and Y variables show a linear relationship, we can calculate a best fit line between Y and X Education and Mortality: r = -0.51 Draft Order and Birthdayr = -0.22
7
June 30, 2008Stat 111 - Lecture 16 - Regression7 Linear Regression Best fit line is called Simple Linear Regression Model: Coefficients: is the intercept and is the slope Other common notation: 0 for intercept, 1 for slope Our Y variable is a linear function of the X variable but we allow for error (ε i ) in each prediction We approximate the error by using the residual Observed Y i Predicted Y i = + X i
8
June 30, 2008Stat 111 - Lecture 16 - Regression8 Residuals and Best Fit Line 0 and 1 that give the best fit line are the values that give smallest sum of squared residuals: Best fit lineis also called the least-squares line e i = residual i
9
June 30, 2008Stat 111 - Lecture 16 - Regression9 Best values for Regression Parameters The best fit line has these values for the regression coefficients: Also can estimate the average squared residual: Best estimate of slope Best estimate of intercept
10
June 30, 2008Stat 111 - Lecture 16 - Regression10 Example: Education and Mortality Mortality = 1353.16 - 37.62 · Education Negative association means negative slope b
11
June 30, 2008Stat 111 - Lecture 16 - Regression11 Interpreting the regression line The estimated slope b is the average change you get in the Y variable if you increased the X variable by one Eg. increasing education by one year means an average decrease of ≈ 38 deaths per 100,000 The estimated intercept a is the average value of the Y variable when the X variable is equal to zero Eg. an uneducated city (Education = 0) would have an average mortality of 1353 deaths per 100,000
12
June 30, 2008Stat 111 - Lecture 16 - Regression12 Example: Vietnam Draft Order Draft Order = 224.9 - 0.226 · Birthday Slightly negative slope means later birthdays have a lower draft order
13
June 30, 2008Stat 111 - Lecture 16 - Regression13 Significance of Regression Line Does the regression line show a significant linear relationship between the two variables? If there is not a linear relationship, then we would expect zero correlation (r = 0) So the estimated slope b should also be close to zero Therefore, our test for a significant relationship will focus on testing whether our true slope is significantly different from zero: H 0 : = 0 versus H a : 0 Our test statistic is based on the estimated slope b
14
June 30, 2008Stat 111 - Lecture 16 - Regression14 Test Statistic for Slope Our test statistic for the slope is similar in form to all the test statistics we have seen so far: The standard error of the slope SE(b) has a complicated formula that requires some matrix algebra to calculate We will not be doing this calculation manually because the JMP software does this calculation for us!
15
June 30, 2008Stat 111 - Lecture 16 - Regression15 Example: Education and Mortality
16
June 30, 2008Stat 111 - Lecture 16 - Regression16 p-value for Slope Test Is T = -4.53 significantly different from zero? To calculate a p-value for our test statistic T, we use the t distribution with n-2 degrees of freedom For testing means, we used a t distribution as well, but we had n-1 degrees of freedom before For testing slopes, we use n-2 degrees of freedom because we are estimating two parameters (intercept and slope) instead of one (a mean) For cities dataset, n = 60, so we have d.f. = 58 Looking at a t-table with 58 df, we discover that the P(T < -4.53) < 0.0005
17
June 30, 2008Stat 111 - Lecture 16 - Regression17 Conclusion for Cities Example Two-sided alternative: p-value < 2 x 0.0005 = 0.001 We could get the p-value directly from the JMP output, which is actually more accurate than t-table Since our p-value is far less than the usual -level of 0.05, we reject our null hypothesis We conclude that there is a statistically significant linear relationship between education and mortality
18
June 30, 2008Stat 111 - Lecture 16 - Regression18 Next Class - Lecture 17 More Inference for Regression Moore, McCabe and Craig: 10.1, 2.4-2.5
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.