Download presentation
Presentation is loading. Please wait.
Published byKarin Dean Modified over 9 years ago
1
July 1, 2008Lecture 17 - Regression Testing1 Testing Relationships between Variables Statistics 111 - Lecture 17
2
July 1, 2008Lecture 17 - Regression Testing2 Administrative Notes Homework 5 due tomorrow Lecture on Wednesday will be review of entire course
3
July 1, 2008Lecture 17 - Regression Testing3 Final Exam Thursday from 10:40-12:10 It’ll be right here in this room Calculators are definitely needed! Single 8.5 x 11 cheat sheet (two-sided) allowed I’ve put a sample final on the course website
4
July 1, 2008Lecture 17 - Regression Testing4 Outline Review of Regression coefficients Hypothesis Tests Confidence Intervals Examples
5
July 1, 2008Lecture 17 - Regression Testing5 Two Continuous Variables Visually summarize the relationship between two continuous variables with a scatterplot Numerically, we focus on best fit line (regression) Education and MortalityDraft Order and Birthday Mortality = 1353.16 - 37.62 · EducationDraft Order = 224.9 - 0.226 · Birthday
6
June 30, 2008Stat 111 - Lecture 16 - Regression6 Best values for Regression Parameters The best fit line has these values for the regression coefficients: Also can estimate the average squared residual: Best estimate of slope Best estimate of intercept
7
July 1, 2008Lecture 17 - Regression Testing7 Significance of Regression Line Does the regression line show a significant linear relationship between the two variables? If there is not a linear relationship, then we would expect zero correlation (r = 0) So the slope b should also be zero Therefore, our test for a significant relationship will focus on testing whether our slope is significantly different from zero H 0 : = 0 versus H a : 0
8
June 30, 2008Stat 111 - Lecture 16 - Regression8 Linear Regression Best fit line is called Simple Linear Regression Model: Coefficients: is the intercept and is the slope Other common notation: 0 for intercept, 1 for slope Our Y variable is a linear function of the X variable but we allow for error (ε i ) in each prediction We approximate the error by using the residual Observed Y i Predicted Y i = + X i
9
June 30, 2008Stat 111 - Lecture 16 - Regression9 Test Statistic for Slope Our test statistic for the slope is similar in form to all the test statistics we have seen so far: The standard error of the slope SE(b) has a complicated formula that requires some matrix algebra to calculate We will not be doing this calculation manually because the JMP software does this calculation for us!
10
June 30, 2008Stat 111 - Lecture 16 - Regression10 Example: Education and Mortality
11
July 1, 2008Lecture 17 - Regression Testing11 Confidence Intervals for Coefficients JMP output also gives the information needed to make confidence intervals for slope and intercept 100·C % confidence interval for slope : b +/- t n-2 * SE(b) The multiple t * comes from a t distribution with n-2 degrees of freedom 100·C % confidence interval for intercept : a +/- t n-2 * SE(a) Usually, we are less interested in intercept but it might be needed in some situations
12
July 1, 2008Lecture 17 - Regression Testing12 Confidence Intervals for Example We have n = 60, so our multiple t * comes from a t distribution with d.f. = 58. For a 95% C.I., t * = 2.00 95 % confidence interval for slope : -37.6 ± 2.0*8.307 = (-54.2,-21.0) Note that this interval does not contain zero! 95 % confidence interval for intercept : 1353± 2.0*91.42 = (1170,1536)
13
July 1, 2008Lecture 17 - Regression Testing13 Another Example: Draft Lottery Is the negative linear association we see between birthday and draft order statistically significant? p-value
14
July 1, 2008Lecture 17 - Regression Testing14 Another Example: Draft Lottery p-value < 0.0001 so we reject null hypothesis and conclude that there is a statistically significant linear relationship between birthday and draft order Statistical evidence that the randomization was not done properly! 95 % confidence interval for slope : -.23±1.98*.05 = (-.33,-.13) Multiple t * = 1.98 from t distribution with n-2 = 363 d.f. Confidence interval does not contain zero, which we expected from our hypothesis test
15
July 1, 2008Lecture 17 - Regression Testing15 Dataset of 78 seventh-graders: relationship between IQ and GPA Clear positive association between IQ and grade point average Education Example
16
July 1, 2008Lecture 17 - Regression Testing16 Education Example Is the positive linear association we see between GPA and IQ statistically significant? p-value
17
July 1, 2008Lecture 17 - Regression Testing17 Education Example p-value < 0.0001 so we reject null hypothesis and conclude that there is a statistically significant positive relationship between IQ and GPA 95 % confidence interval for slope :.101±1.99*.014 = (.073,.129) Multiple t * = 1.99 from t distribution with n-2 = 76 d.f. Confidence interval does not contain zero, which we expected from our hypothesis test
18
July 1, 2008Lecture 17 - Regression Testing18 Next Class - Lecture 18 Review of course material
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.