Inference for Regression


Chapter 27 Inference for Regression

Objectives: review of regression coefficients, hypothesis tests, and confidence intervals.

Inference for Regression In the last few topics we explored inference for proportions and means. The same principles apply in our study of inference for regression. Our inference for regression focuses on the slope β of the least-squares regression equation. There are two forms of inference dealing with the slope β: a t-test for the slope β, and a confidence interval for the slope β.

Two Continuous Variables Visually, we summarize the relationship between two continuous variables with a scatterplot; numerically, we focus on the least-squares regression line (LSRL). Our two running examples: Education and Mortality, with fitted line Mortality = 1353.16 − 37.62 · Education, and Draft Order and Birthday, with fitted line Draft Order = 224.9 − 0.226 · Birthday.

Regression Parameters When we fit an LSRL, ŷ = a + bx, to a data set of two quantitative variables, we hope to be able to predict values of the response variable y for given values of the explanatory variable x. Different samples produce different estimates. The mean μ_y of all of the possible responses has a linear relationship with x that represents the true regression line: μ_y = α + βx.

Regression Parameters The parameters β and α are estimated by the slope b and intercept a of the sample regression equation. Estimate of slope: b = r · (s_y / s_x). Estimate of intercept: a = ȳ − b·x̄.
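The two estimates can be checked numerically. A minimal sketch with hypothetical data (the x and y values below are illustrative, not from the slides):

```python
import numpy as np

# Hypothetical (x, y) data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

r = np.corrcoef(x, y)[0, 1]             # sample correlation
b = r * y.std(ddof=1) / x.std(ddof=1)   # slope: b = r * s_y / s_x
a = y.mean() - b * x.mean()             # intercept: a = y-bar - b * x-bar

# Cross-check against a direct least-squares fit
b_ls, a_ls = np.polyfit(x, y, 1)
print(round(b, 4), round(b_ls, 4))
```

Both routes give the same line, because b = r·s_y/s_x is exactly the least-squares slope.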

Degrees of Freedom True regression line: μ_y = α + βx. The parameters α and β are estimated by a and b of the regression equation ŷ = a + bx, which is derived from sample data. The appropriate test is a t-test for the slope. Since there are two estimated parameters in our equation, the degrees of freedom are n − 2.

Hypothesis Test for the Slope β of an LSRL

Significance of Regression Line Does the regression line show a significant linear relationship between the two variables? If there is no linear relationship, then we would expect zero correlation (r = 0), so the slope b should also be zero. Therefore, our test for a significant relationship focuses on whether the slope β is significantly different from zero: H0: β = 0 versus Ha: β ≠ 0 (or β > 0, or β < 0).

Standard Error The slope b varies with the sample taken. The sampling distribution model for the regression slopes is centered at the slope of the true regression line, β. We can estimate the standard error of the regression slope b as SE_b = s / √(Σ(x − x̄)²), where s is the standard error about the line. You won't usually have to calculate SE_b; it is typically provided as part of a computer printout.

Standard Error The standard error about the line, s = √(Σ(y − ŷ)² / (n − 2)), tells us how far our sample data fall from our predicted values. The standard error of the slope, SE_b, tells us how our regression slope might vary from sample to sample. We use s in the calculation of SE_b: the variability of our data affects the reliability of our slope.

Test Statistic The test statistic t, with df = n − 2, is calculated in the same way as the test statistics for means and proportions: t = (b − 0) / SE_b.
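Putting s, SE_b, and the t statistic together, a sketch on made-up data (the values are illustrative, not the chapter's data):

```python
import numpy as np

# Hypothetical data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8, 12.1])

n = len(x)
b, a = np.polyfit(x, y, 1)        # fitted slope and intercept
resid = y - (a + b * x)           # residuals y - y-hat

# s: standard error about the line, with n - 2 degrees of freedom
s = np.sqrt(np.sum(resid**2) / (n - 2))
# SE_b: standard error of the slope
se_b = s / np.sqrt(np.sum((x - x.mean())**2))
# t statistic for H0: beta = 0
t = b / se_b
print(f"b = {b:.3f}, SE_b = {se_b:.4f}, t = {t:.2f}, df = {n - 2}")
```

A large |t| (here the data hug the line, so t is far above any t* cutoff for df = 4) leads to rejecting H0: β = 0.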

Example of Computer Output for Regression Analysis [Regression printout shown on slide; the slope b, its standard error SE_b, the t statistic, and the p-value are read from the coefficients table.]

Assumptions and Conditions In Chapter 8 when we fit lines to data, we needed to check only the Straight Enough Condition. Now, when we want to make inferences about the coefficients of the line, we’ll have to make more assumptions (and thus check more conditions). We need to be careful about the order in which we check conditions. If an initial assumption is not true, it makes no sense to check the later ones.

Assumptions and Conditions (cont.) Linearity Assumption: Straight Enough Condition: Check the scatterplot—the shape must be linear or we can’t use regression at all.

Assumptions and Conditions (cont.) Linearity Assumption: If the scatterplot is straight enough, we can go on to some assumptions about the errors. If not, stop here, or consider re-expressing the data to make the scatterplot more nearly linear. Check the Quantitative Data Condition. The data must be quantitative for this to make sense.

Assumptions and Conditions (cont.) Independence Assumption: Randomization Condition: the individuals are a representative sample from the population. Check the residual plot (part 1)—the residuals should appear to be randomly scattered.

Assumptions and Conditions (cont.) Equal Variance Assumption: Does The Plot Thicken? Condition: Check the residual plot (part 2)—the spread of the residuals should be uniform.

Assumptions and Conditions (cont.) Normal Population Assumption: Nearly Normal Condition: Check a histogram of the residuals. The distribution of the residuals should be unimodal and symmetric. Outlier Condition: Check for outliers.

Conditions: Straight Enough Condition; Quantitative Data Condition; Randomization Condition; Does The Plot Thicken? Condition; Nearly Normal Condition; Outlier Condition.
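Several of these conditions can be screened with the residuals. A rough sketch on simulated data (the generating line y = 3 + 2x and the noise level are assumptions of the example, not from the slides):

```python
import numpy as np

# Simulate data from a known straight line with Normal noise
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 3.0 + 2.0 * x + rng.normal(0.0, 1.5, size=x.size)

b, a = np.polyfit(x, y, 1)
resid = y - (a + b * x)

# Straight Enough / Randomization: residuals should scatter around 0
print(f"mean residual = {resid.mean():.4f}")
# "Does the plot thicken?": compare residual spread over the two halves of x
left, right = resid[: x.size // 2], resid[x.size // 2:]
print(f"spread: left = {left.std(ddof=1):.2f}, right = {right.std(ddof=1):.2f}")
```

In practice you would look at the residual plot and a histogram of the residuals rather than these summary numbers; the point is that all the residual-based checks come from the same `resid` array.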

Assumptions and Conditions (cont.) If all four assumptions are true, the idealized regression model would look like this: At each value of x there is a distribution of y-values that follows a Normal model, and each of these Normal models is centered on the line and has the same standard deviation.

Example: Draft Lottery Is the negative linear association we see between birthday and draft order statistically significant? H0: β = 0 (there is no linear relationship between birthday and draft order). Ha: β ≠ 0 (there is a linear relationship between birthday and draft order). [The p-value is read from the printout on the slide.]

Example: Draft Lottery Conclusion: the p-value = 0.0001, so we reject the null hypothesis and conclude that there is a statistically significant linear relationship between birthday and draft order. This is statistical evidence that the randomization was not done properly!

Example: Education Dataset of 78 seventh-graders: relationship between IQ and GPA. The scatterplot shows a clear positive association between IQ and grade point average.

Example: Education Is the positive linear association we see between GPA and IQ statistically significant? H0: β = 0 (there is no linear relationship between GPA and IQ). Ha: β ≠ 0 (there is a linear relationship between GPA and IQ). [The p-value is read from the printout on the slide.]

Example: Education Conclusion: the p-value = 0.0001, so we reject the null hypothesis and conclude that there is a statistically significant positive relationship between IQ and GPA.

TI-84: Stat → Tests → LinRegTTest. Enter Xlist, Ylist, and Freq; choose the alternative for β & ρ (≠0, <0, or >0); optionally store RegEQ; then Calculate. df = n − 2.

Confidence Interval for the Slope 𝛽

Confidence Interval To construct a confidence interval for β, the true slope of the LSRL, we use the usual format for a confidence interval: statistic ± (critical value)(standard error of statistic). CI for β: b ± t*·SE_b. The values of b and SE_b are obtained from a computer printout. The critical value t* is obtained from the t-distribution with n − 2 degrees of freedom.

Example: Education and Mortality

Confidence Interval for Example We have n = 60, so our critical value t* comes from a t distribution with df = 58. For a 95% CI, t* ≈ 2.00. 95% confidence interval for the slope β: −37.6 ± (2.00)(8.307) = (−54.2, −21.0). Conclusion: We are 95% confident that the true slope is between −54.2 and −21.0. Note that this interval does not contain zero!
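The arithmetic of this interval is easy to verify directly, using the slide's rounded values b = −37.6, SE_b = 8.307, and t* = 2.00:

```python
# Check the interval from the example: b ± t* · SE_b, df = 58
b, se_b, t_star = -37.6, 8.307, 2.00
lo = b - t_star * se_b   # lower endpoint
hi = b + t_star * se_b   # upper endpoint
print(f"95% CI for the slope: ({lo:.1f}, {hi:.1f})")  # → (-54.2, -21.0)
```

The exact df = 58 critical value is slightly above 2.00, so the slide's interval is a hair narrow, but the rounded endpoints match.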