
1 The Practice of Statistics Third Edition Chapter 15: Inference for Regression Copyright © 2008 by W. H. Freeman & Company

2 Chapter Objectives Compute a confidence interval for the slope of the regression line. Conduct a test of the hypothesis that the slope of the regression line is 0 (or that the correlation is 0) in the population. We are doing inference on the LSRL, the least-squares regression line: we will be using the sample line ŷ = a + bx to estimate the population (true) regression line μy = α + βx.

3 Example: Crying and IQ. Child development researchers explored the relationship between the crying of infants four to ten days old and their later IQ test scores. A snap of a rubber band on the sole of the foot caused the infants to cry. The researchers recorded the crying and measured its intensity by the number of peaks in the most active 20 seconds. They later measured the children’s IQ at the age of three years using the Stanford-Binet IQ test.

4 Infants’ crying and IQ scores

Crying  IQ    Crying  IQ    Crying  IQ    Crying  IQ
  10    87      20    90      17    94      12    94
  12    97      16   100      19   103      12   103
   9   103      23   103      13   104      14   106
  16   106      27   108      18   109      10   109
  18   109      15   112      18   112      23   113
  15   114      21   114      16   118       9   119
  12   119      12   120      19   120      16   124
  20   132      15   133      22   135      31   135
  16   136      17   141      30   155      22   157
  33   159      13   162

Discuss the W5HW (who, what, when, where, why, how) for the data.

5 Create a scatterplot of the data and calculate the correlation. The correlation between crying and IQ is r = 0.455. LSRL: ŷ = 91.27 + 1.4929(crying). Recall: to get the LSRL on the scatterplot, enter LinReg(a+bx) L1, L2, Y1. Interpret the LSRL.
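The same fit can be reproduced off the calculator. Below is a minimal Python sketch (assuming NumPy and SciPy are available; the array names crying and iq are ours) that computes the correlation and the LSRL for these data:

```python
import numpy as np
from scipy import stats

# Number of crying peaks (x) and later Stanford-Binet IQ score (y) for the 38 infants
crying = np.array([10, 20, 17, 12, 12, 16, 19, 12,  9, 23, 13, 14, 16, 27, 18, 10, 18, 15, 18,
                   23, 15, 21, 16,  9, 12, 12, 19, 16, 20, 15, 22, 31, 16, 17, 30, 22, 33, 13])
iq = np.array([ 87,  90,  94,  94,  97, 100, 103, 103, 103, 103, 104, 106, 106, 108, 109, 109,
               109, 112, 112, 113, 114, 114, 118, 119, 119, 120, 120, 124, 132, 133, 135, 135,
               136, 141, 155, 157, 159, 162])

fit = stats.linregress(crying, iq)          # least-squares regression of IQ on crying
print(f"r = {fit.rvalue:.3f}")              # correlation, about 0.455
print(f"LSRL: IQ-hat = {fit.intercept:.2f} + {fit.slope:.4f}(crying)")
```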

6 Conditions for the Regression Model

7 Inference for Regression The heart of this model is that there is an “on the average” straight-line relationship between y and x. The true regression line, μy = α + βx, says that the mean response moves along a straight line as the explanatory variable x changes. We can’t observe the true regression line. The values of y that we do observe vary about their means according to a Normal distribution. If we hold x fixed and take many observations of y, the Normal pattern will eventually appear in a stemplot or a histogram.

8 Inference for Regression In practice, we observe y for many different values of x, so we see an overall linear pattern formed by points scattered about the true line. The standard deviation σ determines whether the points fall close to the true regression line (small σ) or are widely scattered about it (large σ).

9 The line in the figure is the true regression line. The mean of the response y moves along this line as the explanatory variable x takes different values. The Normal curves show how y will vary when x is held fixed at different values. All of the curves have the same σ, so the variability of y is the same for all values of x.
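A short simulation can make this model concrete: fix a true line, let responses vary Normally about it with a common σ, and the points scatter about the line exactly as described above. This is only an illustrative sketch; the parameter values below are made up.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

alpha, beta, sigma = 90.0, 1.5, 17.0        # made-up "true" parameters, for illustration only
x = rng.uniform(5, 35, size=200)            # many different values of the explanatory variable
mu_y = alpha + beta * x                     # true regression line: the mean response at each x
y = rng.normal(loc=mu_y, scale=sigma)       # observed responses vary Normally about their means

# At any fixed x the y-values are Normal with the same spread sigma; across all x the
# points form an overall linear pattern scattered about the true line.
```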

10 Checking Conditions Before we do inference, we must check these conditions one by one.
1. The observations are independent. Repeated observations on the same individual are not allowed.
2. The true relationship is linear. Always check your residual plot and the original scatterplot.
3. The standard deviation of the response about the true line is the same everywhere. Check the residuals for fanning (fanning is bad); the standard deviation needs to remain fixed, not change with x as the mean response changes with x.
4. The response varies Normally about the true regression line. Make a histogram of the residuals and check for clear skewness or other major departures from Normality.

11 Getting the residuals After you perform linear regression using your graphing calculator, the residuals are automatically stored as a list. To get them into L3 so you can use them, highlight L3, press 2nd STAT (LIST), scroll down until you see RESID, and press ENTER. Or you can enter L2 – Y1(L1).
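Off the calculator, the residuals are just observed minus predicted values. A short sketch, reusing the crying, iq, and fit objects from the earlier block:

```python
predicted = fit.intercept + fit.slope * crying   # y-hat from the LSRL for each infant
residuals = iq - predicted                       # the same values the calculator stores in RESID
```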

12 Checking the residuals There does not seem to be a pattern, so a linear model is appropriate. The spread about the line does not seem to be changing as x increases, which tells us the standard deviation is constant.

13 Checking the residuals A stemplot or a histogram can be used to demonstrate that the residuals are approximately Normal. There is a slight right-skew, but we see no gross violations of the condition. HW: pg. 894 #15.1, 15.2, 15.4
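The two graphical checks on the last two slides can be sketched with matplotlib (assumed available), using the crying and residuals arrays from the earlier blocks:

```python
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))

# Residual plot: look for no pattern (linearity) and no fanning (constant spread).
ax1.scatter(crying, residuals)
ax1.axhline(0, color="gray")
ax1.set_xlabel("crying peaks")
ax1.set_ylabel("residual")

# Histogram of the residuals: look for rough Normality, with no strong skew or outliers.
ax2.hist(residuals, bins=8)
ax2.set_xlabel("residual")

plt.show()
```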

14 Estimating the Parameters The first step in inference is to estimate the unknown parameters α, β, and σ. The slope b of the LSRL is an unbiased estimator of β. The intercept a of the LSRL is an unbiased estimator of α. We will use s (the sample standard deviation of the residuals) to estimate σ.

15 The standard error about the line is s = √(Σ(residual²)/(n − 2)); the degrees of freedom of s are n − 2. To get Σ(residual²), perform 1-Var Stats on L3 (where you stored your residuals) and read off Σx². This is a very “reserved” method; you will get a better estimate if you perform a LinRegTTest (later).
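The same estimate comes directly from the residuals. A sketch, reusing the residuals array from the earlier blocks:

```python
n = len(residuals)
s = np.sqrt(np.sum(residuals**2) / (n - 2))   # standard error about the line, df = n - 2
print(f"s = {s:.4f}")                         # should match the value LinRegTTest reports (about 17.5)
```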

16 Confidence Intervals for the Regression Slope A level C confidence interval for the slope β of the true regression line is b ± t*SE_b, where SE_b = s/√(Σ(x − x̄)²) and t* is the critical value for the t distribution with n − 2 degrees of freedom. This is the same slope-interval formula that appears on the formula sheet.

17 Confidence Intervals for the Regression Slope You should rarely have to calculate the standard error by hand; regression software will give the standard error along with b itself. Press STAT, then TESTS, then LinRegTTest, and use the crying and IQ data. In the output, s = 17.4987 is the estimate for σ. The calculator does not report SE_b directly, but since t = b/SE_b, you can recover it as SE_b = b/t.

18 Construct and interpret a 95% confidence interval for the mean IQ increase for each additional peak in crying. SE_b = b/t = 1.4929/3.0655 = 0.4870. For t*, either invT(.975, 36) = 2.0281 or the table with df = 30, t* = 2.042. b ± t*SE_b = 1.4929 ± (2.042)(0.4870) = 1.4929 ± 0.9944 = (0.4985, 2.4873). We are 95% confident that the true mean IQ increases by between 0.5 and 2.5 points for each additional peak in crying.
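A quick check of the interval in Python (scipy.stats.t.ppf plays the role of invT; using the exact t* for 36 df gives a slightly narrower interval than the table value 2.042 used above):

```python
from scipy import stats

b, se_b, n = 1.4929, 0.4870, 38
t_star = stats.t.ppf(0.975, df=n - 2)            # invT(.975, 36), about 2.028
margin = t_star * se_b
print(f"95% CI for the slope: ({b - margin:.4f}, {b + margin:.4f})")
```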

19 Minitab output You are going to have to be able to read different computer outputs for the exam.

20 Testing the Hypothesis of No Linear Relationship The most common hypothesis about the slope is H0: β = 0. The regression line with slope 0 is horizontal: the mean of y does not change at all when x changes. So this H0 says that there is no true linear relationship between x and y. We can use the test for zero slope to test the hypothesis of zero correlation between any two quantitative variables. Note: testing correlation only makes sense if the observations are a random sample.

21 The test statistic is t = b/SE_b, which has the t distribution with n − 2 degrees of freedom when H0: β = 0 is true. Note: most software gives t and its P-value for a two-sided test.

22 Crying and IQ H0: β = 0, Ha: β ≠ 0. Our test statistic is t = 3.0655 and our P-value is 0.004. We have very strong evidence that IQ is correlated with crying.
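A quick check of this P-value (the two-sided P-value doubles the upper-tail area of the t distribution with 36 degrees of freedom):

```python
from scipy import stats

t_stat, df = 3.0655, 36
p_two_sided = 2 * stats.t.sf(t_stat, df)   # twice the upper-tail area; about 0.004
print(f"t = {t_stat}, P-value = {p_two_sided:.4f}")
```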

23 Beer and BAC A previous example looked at how well the number of beers a student drinks predicts his or her blood alcohol content (BAC). Sixteen student volunteers at Ohio State University drank a randomly assigned number of cans of beer. Thirty minutes later, their BAC was measured by a police officer.

Student  Beers  BAC      Student  Beers  BAC
   1       5    0.10        9       3    0.02
   2       2    0.03       10       5    0.05
   3       9    0.19       11       4    0.07
   4       8    0.12       12       6    0.10
   5       3    0.04       13       5    0.085
   6       7    0.095      14       7    0.09
   7       3    0.07       15       1    0.01
   8       5    0.06       16       4    0.05
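A sketch that fits this regression in Python (the array names beers and bac are ours); it should reproduce the slope, standard error, and t statistic that appear in the Minitab output on a later slide:

```python
import numpy as np
from scipy import stats

beers = np.array([5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4])
bac = np.array([0.10, 0.03, 0.19, 0.12, 0.04, 0.095, 0.07, 0.06,
                0.02, 0.05, 0.07, 0.10, 0.085, 0.09, 0.01, 0.05])

fit = stats.linregress(beers, bac)
print(f"b = {fit.slope:.6f}, SE_b = {fit.stderr:.6f}")   # about 0.017964 and 0.002402
print(f"t = {fit.slope / fit.stderr:.2f}")               # about 7.48
print(f"one-sided P-value = {fit.pvalue / 2:.6f}")       # pvalue is two-sided; halve it for Ha: beta > 0
```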

24 1) Perform a hypothesis test to determine whether the number of beers has a positive effect on BAC. 2) Construct and interpret a 95% confidence interval for the true mean increase in blood alcohol content for each additional beer. First check that the 4 conditions are satisfied. 1) Independence: we can treat the sample of students as an independent random sample, since the number of beers was assigned randomly. 2) Linear relationship: check the scatterplot and residual plot.

25 2) Checking linearity: the scatterplot shows a strong linear relationship and the residual plot shows no pattern, so a linear model is appropriate. 3) Checking standard deviation: the residuals overall seem to be about the same distance away from y = 0.

26 4) Checking Normality: a histogram of the residuals shows that the data are slightly skewed right, but there do not seem to be any gross violations. We can now proceed with our calculations.

27 Here is the output from Minitab. What is the t test statistic? (Show your work, not just read it from the chart.) t = b/SE_b = 0.017964/0.002402 ≈ 7.48

28 The P-value for a two-sided test is 0.000 (to three decimal places). The one-sided P-value is half of that, so it is also close to 0. We can reject H0 and conclude that there is strong evidence that an increased number of beers does increase BAC. The number of beers predicts blood alcohol content quite well: five beers produce an average BAC of ŷ = -0.0127 + (0.0180)(5) = 0.077, which is close to the legal driving limit of 0.08 in many states.

29 The 95% confidence interval is b ± t*SE_b, with b = 0.017964, SE_b = 0.002402, and t*(df = 14) = invT(.975, 14) = 2.145. 0.017964 ± (2.145)(0.002402) = 0.017964 ± 0.00515 = (0.012814, 0.023114). We are 95% confident that the true mean increase in BAC for each additional beer consumed is between 0.012814 and 0.023114. HW: pg. 900 #15.6, 15.8 / pg. 908 #15.11

