Inference for Linear Regression

1 Inference for Linear Regression

2 LINER: Conditions for Regression Inference
Linear: Examine the scatterplot to check that the overall pattern is roughly linear. Check that the residuals center on the “residual = 0” line at each x-value in the residual plot.
Independent: Look at how the data were produced. If sampling is done without replacement, remember to check the 10% condition.
Normal: Make a stemplot, histogram, or Normal probability plot of the residuals and check for clear skewness or other major departures from Normality.
Equal variance: Look at the scatter of the residuals above and below the “residual = 0” line in the residual plot. The amount of scatter should be roughly the same from the smallest to the largest x-value.
Random: See if the data were produced by random sampling or a randomized experiment.

3 Does seat location matter?
Many people believe that students learn better if they sit closer to the front of the classroom. Does sitting closer cause higher achievement, or do better students simply choose to sit in the front? To investigate, an AP Statistics teacher randomly assigned students to seat locations in his classroom for a particular chapter and recorded the test score for each student at the end of the chapter. The explanatory variable in this experiment is the row each student was assigned to (row 1 is closest to the front and row 7 is the farthest away). Here are the results:
Row 1: 76, 77, 94, 99
Row 2: 83, 85, 74, 79
Row 3: 90, 88, 68, 78
Row 4: 94, 72, 101, 70, 79
Row 5: 76, 65, 90, 67, 96
Row 6: 88, 79, 90, 83
Row 7: 79, 76, 77, 63
Construct a scatterplot of the data and find the equation of the least-squares regression line. Interpret the slope and the y intercept in the context of the problem.
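The least-squares line can be found from x̄, ȳ, and the sums of squares. Below is a minimal Python sketch that uses only the standard library and the data as typed on this slide. The names `scores_by_row` and `least_squares` are illustrative and do not come from the presentation.

```python
# Seat-location data: row number -> test scores
scores_by_row = {
    1: [76, 77, 94, 99],
    2: [83, 85, 74, 79],
    3: [90, 88, 68, 78],
    4: [94, 72, 101, 70, 79],
    5: [76, 65, 90, 67, 96],
    6: [88, 79, 90, 83],
    7: [79, 76, 77, 63],
}

# Flatten into paired lists: x = row number, y = test score
x = [row for row, ys in scores_by_row.items() for _ in ys]
y = [score for ys in scores_by_row.values() for score in ys]


def least_squares(x, y):
    """Return (intercept a, slope b) of the least-squares line y-hat = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar  # the line passes through (x-bar, y-bar)
    return a, b


a, b = least_squares(x, y)
print(f"predicted score = {a:.3f} + ({b:.4f}) * row")
```

With these data the sketch gives an intercept of about 85.706 and a slope of about −1.1171. The predicted score therefore drops by roughly 1.12 points for each row farther from the front. The intercept describes a "row 0" that does not exist, so interpreting it would be an extrapolation.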

4 A scatterplot, residual plot, histogram and Normal probability plot of the residuals are shown below. Check whether the conditions for performing inference about the regression model are met.
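The Linear and Equal variance conditions are judged from the residual plot. A rough numeric companion can help, though it does not replace the plot. The sketch below assumes the seat-location data from slide 3 and uses hypothetical variable names. It groups the residuals by row and prints each row's mean and standard deviation. The Normal condition still needs the histogram or the Normal probability plot.

```python
from statistics import mean, stdev

# Seat-location data: row number -> test scores (slide 3)
scores_by_row = {
    1: [76, 77, 94, 99],
    2: [83, 85, 74, 79],
    3: [90, 88, 68, 78],
    4: [94, 72, 101, 70, 79],
    5: [76, 65, 90, 67, 96],
    6: [88, 79, 90, 83],
    7: [79, 76, 77, 63],
}
x = [row for row, ys in scores_by_row.items() for _ in ys]
y = [score for ys in scores_by_row.values() for score in ys]

# Least-squares fit
x_bar, y_bar = mean(x), mean(y)
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
a = y_bar - b * x_bar

# Residual = observed score - predicted score
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Group residuals by x-value (row)
by_row = {}
for xi, r in zip(x, residuals):
    by_row.setdefault(xi, []).append(r)

# Linear: each row's mean residual should be near 0.
# Equal variance: the SDs of the residuals should be roughly similar across rows.
for row in sorted(by_row):
    rs = by_row[row]
    print(f"row {row}: mean residual {mean(rs):6.2f}, SD {stdev(rs):5.2f}")
```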

5 Here is computer output for the least-squares regression analysis on the seating chart data.
Regression Analysis: Score versus Row
Predictor   Coef   SE Coef   T   P
Constant
Row
S = ___   R-Sq = 4.7%   R-Sq(adj) = 1.3%
(a) State the equation of the least-squares regression line. Define any variables you use.
ŷ = ___ + ___x, where ŷ = predicted score and x = row number
(b) Interpret the slope, y intercept (if possible), and standard deviation of the residuals.
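The summary line of the output can be recomputed from the slide 3 data. The sketch below uses standard-library Python and hypothetical variable names. Its R-Sq and R-Sq(adj) values should agree with the 4.7% and 1.3% shown, which acts as a check. S is the standard deviation of the residuals, computed with n − 2 degrees of freedom.

```python
import math

# Seat-location data: row number -> test scores (slide 3)
scores_by_row = {
    1: [76, 77, 94, 99],
    2: [83, 85, 74, 79],
    3: [90, 88, 68, 78],
    4: [94, 72, 101, 70, 79],
    5: [76, 65, 90, 67, 96],
    6: [88, 79, 90, 83],
    7: [79, 76, 77, 63],
}
x = [row for row, ys in scores_by_row.items() for _ in ys]
y = [score for ys in scores_by_row.values() for score in ys]
n = len(x)

# Least-squares fit
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b = sxy / sxx
a = y_bar - b * x_bar

# Residual sum of squares and total sum of squares
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
sst = sum((yi - y_bar) ** 2 for yi in y)

# S: standard deviation of the residuals, with n - 2 degrees of freedom
s = math.sqrt(sse / (n - 2))

# R-Sq and the adjusted version that Minitab also reports
r_sq = 1 - sse / sst
r_sq_adj = 1 - (1 - r_sq) * (n - 1) / (n - 2)

print(f"S = {s:.4f}  R-Sq = {r_sq:.1%}  R-Sq(adj) = {r_sq_adj:.1%}")
```

Under these assumptions s comes out near 10.07. That would mean actual test scores typically differ from the scores predicted by the line by about 10 points.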

6 Regression Analysis: Score versus Row
Predictor   Coef   SE Coef   T   P
Constant
Row
S = ___   R-Sq = 4.7%   R-Sq(adj) = 1.3%
(a) Identify the standard error of the slope SEb from the computer output. Interpret this value in context.
SEb = ___. If we repeated the random assignment many times, the slope of the estimated regression line would typically vary by about ___ from the slope of the true regression line for predicting test score from row number.
(b) Calculate and interpret the 95% confidence interval for the true slope.
We are 95% confident that the interval from −___ to ___ captures the slope of the true regression line relating a student’s test score y and the student’s row number x.
(c) Based on your interval, is there convincing evidence that seat location affects scores?
Because the interval of plausible slopes includes 0, we do not have convincing evidence that there is an association between test score and row number.
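The interval has the form b ± t*·SEb, where SEb = s / √Σ(x − x̄)², equivalently s / (s_x √(n − 1)), and df = n − 2 = 28. The sketch below recomputes these quantities from the slide 3 data. Python's standard library has no t table, so t* = 2.048 (95% confidence, df = 28) is typed in from a t table rather than computed. Variable names are hypothetical.

```python
import math

# Seat-location data: row number -> test scores (slide 3)
scores_by_row = {
    1: [76, 77, 94, 99],
    2: [83, 85, 74, 79],
    3: [90, 88, 68, 78],
    4: [94, 72, 101, 70, 79],
    5: [76, 65, 90, 67, 96],
    6: [88, 79, 90, 83],
    7: [79, 76, 77, 63],
}
x = [row for row, ys in scores_by_row.items() for _ in ys]
y = [score for ys in scores_by_row.values() for score in ys]
n = len(x)

# Least-squares fit and residual standard deviation s
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
a = y_bar - b * x_bar
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

# Standard error of the slope
se_b = s / math.sqrt(sxx)

# t* for 95% confidence with df = n - 2 = 28 (value taken from a t table)
t_star = 2.048
lower, upper = b - t_star * se_b, b + t_star * se_b

print(f"SE_b = {se_b:.4f}")
print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")
```

The computed interval runs from about −3.057 to 0.823. It contains 0, which agrees with the conclusion in part (c).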

7 Fresh flowers? For their second-semester project, two AP Statistics students decided to investigate the effect of sugar on the life of cut flowers. They went to the local grocery store and randomly selected 12 carnations. All the carnations seemed equally healthy when they were selected. When they got home, the students prepared 12 identical vases with exactly the same amount of water in each vase. They put one tablespoon of sugar in 3 vases, two tablespoons of sugar in 3 vases, and three tablespoons of sugar in 3 vases. In the remaining 3 vases, they put no sugar. After the vases were prepared and placed in the same location, the students randomly assigned one flower to each vase and observed how many hours each flower continued to look fresh. Here are the data:
Sugar (tbs)   Freshness (hours)
0             168
0             180
0             192
1             192
1             204
2             204
2             210
3             222
3             228
3             234

8 Don’t forget the four steps
(a) Construct and interpret a 99% confidence interval for the slope of the true regression line.
Conclude: We are 99% confident that the interval from 9.04 to ___ captures the slope of the true regression line relating hours of freshness y to amount of sugar x.
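The data table lists 10 (sugar, freshness) pairs, but the design uses 12 vases with three at each sugar level. The sketch below assumes that the two missing rows repeat (1, 204) and (2, 210). Under that assumption it reproduces the lower bound of 9.04 quoted in the Conclude step. The value t* = 3.169 (99% confidence, df = 12 − 2 = 10) is typed in from a t table. Variable names are hypothetical.

```python
import math

# ASSUMPTION: full 12-vase data set. The (1, 204) and (2, 210) rows are
# assumed to occur twice; the slide's table shows each of them only once.
sugar = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
fresh = [168, 180, 192, 192, 204, 204, 204, 210, 210, 222, 228, 234]
n = len(sugar)

# Least-squares fit
x_bar, y_bar = sum(sugar) / n, sum(fresh) / n
sxx = sum((x - x_bar) ** 2 for x in sugar)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(sugar, fresh)) / sxx
a = y_bar - b * x_bar

# Residual standard deviation and standard error of the slope
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(sugar, fresh))
s = math.sqrt(sse / (n - 2))
se_b = s / math.sqrt(sxx)

t_star = 3.169  # t table value for df = 10, 99% confidence
lower, upper = b - t_star * se_b, b + t_star * se_b
print(f"slope = {b:.1f}, SE_b = {se_b:.4f}")
print(f"99% CI for the slope: ({lower:.2f}, {upper:.2f})")
```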

9 Significance test for β
Crying and IQ
Infants who cry easily may be more easily stimulated than others. This may be a sign of higher IQ. Child development researchers explored the relationship between the crying of infants 4 to 10 days old and their later IQ test scores. A snap of a rubber band on the sole of the foot caused the infants to cry. The researchers recorded the crying and measured its intensity by the number of peaks in the most active 20 seconds. They later measured the children’s IQ at age three years using the Stanford-Binet IQ test. The table below contains data from a random sample of 38 infants.

10 (a) Here is a scatterplot of the data with the least-squares line added, along with the Minitab output. Describe what this graph tells you about the relationship between these two variables.
(b) What is the equation of the least-squares regression line for predicting IQ at age 3 from the number of crying peaks (crying count)? Define any variables you use.
predicted IQ score = ___ + ___(cry count)
(c) Interpret the slope and y intercept of the regression line in context.
(d) Do these data provide convincing evidence that there is a positive linear relationship between crying counts and IQ in the population of infants? Carry out an appropriate test to help answer this question.
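Part (d) is a one-sided t test of H0: β = 0 against Ha: β > 0, with t = b / SEb and df = n − 2 = 36. Minitab's P column is two-sided, so for a one-sided test it is halved when the slope has the predicted sign. Python's standard library has no t distribution. The sketch below therefore integrates the t density numerically with Simpson's rule. This is a swapped-in technique, not how Minitab computes P-values, and the function names are hypothetical.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T >= t) = 0.5 minus the integral of the density from 0 to t (Simpson's rule).

    Also works for negative t, because the integral from 0 to t is then negative.
    """
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def slope_t_test(b, se_b, df, alternative="greater"):
    """Return (t statistic, P-value) for H0: beta = 0."""
    t = b / se_b
    if alternative == "greater":
        p = t_upper_tail(t, df)
    elif alternative == "less":
        p = 1 - t_upper_tail(t, df)
    else:  # "two-sided"
        p = 2 * t_upper_tail(abs(t), df)
    return t, p


# Sanity check against a t table: P(T >= 3.169) with df = 10 is about 0.005
print(round(t_upper_tail(3.169, 10), 4))
```

For the crying data, you would pass the cry count row's Coef and SE Coef from the output to `slope_t_test(b, se_b, df=36)`.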

11 Tipping at a buffet
Do customers who stay longer at buffets give larger tips? Charlotte, an AP Statistics student who worked at an Asian buffet, decided to investigate this question for her second-semester project. While she was doing her job as a hostess, she obtained a random sample of receipts, which included the length of time (in minutes) the party was in the restaurant and the amount of the tip (in dollars). Do these data provide convincing evidence that customers who stay longer give larger tips? Here are the data:
Time (minutes)   Tip (dollars)
23               5.00
39               2.75
44               7.75
55
61               7.00
65               8.88
67               9.01
70
74               7.29
85               7.50
90               6.00
99               6.50

12 (a) Here is a scatterplot of the data with the least-squares regression line added. Describe what this graph tells you about the relationship between the two variables.
Regression Analysis: Tip (dollars) versus Time (minutes)
Predictor        Coef   SE Coef   T   P
Constant
Time (minutes)
S = ___   R-Sq = 13.2%   R-Sq(adj) = 4.5%
(b) What is the equation of the least-squares regression line for predicting the amount of the tip from the length of the stay? Define any variables you use.
(c) Interpret the slope and y intercept of the least-squares regression line in context.
(d) Carry out an appropriate test to answer Charlotte’s question.
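The "Do" step of part (d) can also be worked directly from raw (x, y) pairs, using df = n − 2. The tips for the 55- and 70-minute parties are blank in the table. For that reason the sketch below is demonstrated on the seat-location data from slide 3 (two-sided test), where the P-value should come out near 0.25. For Charlotte's one-sided question you would pass her complete times and tips with `alternative="greater"`. The function names are hypothetical. The t-distribution tail is computed by Simpson's-rule integration, a swapped-in technique rather than a library call.

```python
import math


def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for Student's t, via Simpson's rule on the density from 0 to t."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)

    def pdf(u):
        return coef * (1 + u * u / df) ** (-(df + 1) / 2)

    h = t / steps  # even number of steps; h is negative when t < 0
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3


def regression_slope_test(x, y, alternative="greater"):
    """Slope b, SE_b, t = b / SE_b, and P-value for H0: beta = 0 with df = n - 2."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    t = b / se_b
    df = n - 2
    if alternative == "greater":
        p = t_upper_tail(t, df)
    elif alternative == "less":
        p = 1 - t_upper_tail(t, df)
    else:  # "two-sided"
        p = 2 * t_upper_tail(abs(t), df)
    return b, se_b, t, p


# Demonstration on the seat-location data (row number, test score)
scores_by_row = {
    1: [76, 77, 94, 99],
    2: [83, 85, 74, 79],
    3: [90, 88, 68, 78],
    4: [94, 72, 101, 70, 79],
    5: [76, 65, 90, 67, 96],
    6: [88, 79, 90, 83],
    7: [79, 76, 77, 63],
}
rows = [row for row, ys in scores_by_row.items() for _ in ys]
scores = [score for ys in scores_by_row.values() for score in ys]

b, se_b, t, p = regression_slope_test(rows, scores, alternative="two-sided")
print(f"b = {b:.4f}, SE_b = {se_b:.4f}, t = {t:.2f}, P = {p:.3f}")
```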

13 Exercises on page 759, #5-19 odds

