Published by Curtis Dalton. Modified over 9 years ago.
1) Use the data in the table to find the value of the linear correlation coefficient r.
2nd Day: Bear Example
Length (in): 53, 67.5, 72, (missing), 73.5, 68.5, 73, 37
Weight (lb): 80, 344, 416, 348, 262, 360, 332, 34
2) Based on this data, does there appear to be a relationship between the length of a bear and its weight? If so, what is the relationship? Comment on the form, direction, and strength.
3) Find the residuals and make a residual plot. Is the LSRL a good model for the data? Why?
4) If a researcher anesthetizes a bear and uses a tape measure to find that it is 71 inches long, how do we use that length to predict the bear's weight?
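Questions 1 and 4 are meant to be answered on the calculator. As a minimal sketch of the same arithmetic, the Python below computes r as the sum of products of z-scores divided by n - 1, builds the LSRL from b = r * s_y / s_x and a = y-bar - b * x-bar, and predicts y at a new x. The function names are hypothetical, and because one length value is missing from the table above, the usage runs on a small made-up data set rather than on the bear data.

```python
from statistics import mean, stdev


def correlation(x, y):
    """Pearson r: sum of products of z-scores, divided by n - 1."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)  # sample standard deviations (divide by n - 1)
    return sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
               for xi, yi in zip(x, y)) / (n - 1)


def lsrl(x, y):
    """Least-squares regression line y-hat = a + b*x; returns (a, b)."""
    b = correlation(x, y) * stdev(y) / stdev(x)  # slope
    a = mean(y) - b * mean(x)                    # line passes through (x-bar, y-bar)
    return a, b


def predict(a, b, x_new):
    """Predicted y (y-hat) for a new x value."""
    return a + b * x_new


# Hypothetical data (NOT the bear table): every point lies on y = 1 + 2x.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
r = correlation(x, y)
a, b = lsrl(x, y)
print(round(r, 4), round(a, 4), round(b, 4))  # 1.0 1.0 2.0
print(round(predict(a, b, 71), 4))            # 143.0
```

With the complete bear table, question 4 would follow the same pattern: fit `lsrl(lengths, weights)`, then call `predict(a, b, 71)`.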
Residual = observed y – predicted y
A residual plot plots the residuals on the vertical axis against the explanatory variable on the horizontal axis. The plot magnifies the residuals and makes patterns easier to see. The mean of the least-squares residuals is always zero.
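As a minimal sketch of these definitions, assuming made-up data, the Python below fits the LSRL, computes each residual as observed y minus predicted y, and confirms that the residuals average to zero (up to floating-point rounding). The helper names `fit_lsrl` and `residuals` are hypothetical, and the plot itself is left out so the sketch needs only the standard library; the printed (x, residual) pairs are the points a residual plot would show against the line residual = 0.

```python
from statistics import mean


def fit_lsrl(x, y):
    """Return (intercept, slope) of the least-squares regression line."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b


def residuals(x, y):
    """Residual = observed y - predicted y, for each data point."""
    a, b = fit_lsrl(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
res = residuals(x, y)
for xi, ri in zip(x, res):
    print(xi, round(ri, 2))  # one point of the residual plot
print("mean of residuals:", round(mean(res), 10))
```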
Residual Plot: the horizontal line y = 0 (residual = 0) helps orient us.
TI-83: Enter the data from the table (p. 234). Find the vital stats. Find the residuals for the data.
Coefficient of Determination
A numerical quantity that tells us how well the LSRL predicts values of y. r-sq has 2 components (SSM and SSE). It shows us how much better the LSRL is at predicting y than if we just used y-bar as our prediction for every point. If we have little info for predicting y (or if r is weak), we use y-bar as a predictor of y instead of y-hat.
Example (data set A):
x: 0, 3, 6
y: (missing)
Association between x and y: positive, but weak. x-bar = 3, y-bar = 4.
Some use y-bar as a predictor of y, since r is weak and we have little info for predicting y.
SSM (Sum of Squares about the Mean) measures the total variation of the y-values if we use y-bar = 4 to make predictions. The total area of these 3 squares is a measure of the total sample variability: $\text{SSM} = \sum (y - \bar{y})^2 = 56$.
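The slide's y-values did not survive extraction. One data set consistent with every number the slides give (x = 0, 3, 6; y-bar = 4; SSM = 56; and, on the following slides, a y-intercept of 3 and SSE = 54) is y = 0, 10, 2. The sketch below assumes those values, so treat them as a reconstruction rather than the original table; it simply adds up the three squared deviations from y-bar.

```python
from statistics import mean

# ASSUMED y-values: reconstructed to match the slides' summary numbers, not copied from them.
y = [0, 10, 2]

y_bar = mean(y)
# SSM: squared deviation of each y from y-bar (the total area of the three squares).
ssm = sum((yi - y_bar) ** 2 for yi in y)
print(y_bar, ssm)  # 4 56
```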
Sum of Squares for Error (SSE): the sum of the squares of the deviations of the points about the LSRL.
If x is a good predictor of y, then the deviations and SSE will be small. If all the points fall exactly on a regression line, SSE = 0. LSRL: y-intercept = 3, passing through (x-bar, y-bar) = (3, 4) (always the case for the LSRL). $\text{SSE} = \sum (y - \hat{y})^2 = 54$.
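Assuming y = 0, 10, 2 (one reconstruction consistent with the slides' x-bar, y-bar, SSM, intercept, and SSE; the original y-values are not in the source), the sketch below fits the LSRL and sums the squared deviations about it. It reproduces the slide's y-intercept of 3 and SSE = 54, and shows that the line goes through (x-bar, y-bar) by construction.

```python
from statistics import mean

# ASSUMED y-values reconstructed from the slides' summary numbers.
x = [0, 3, 6]
y = [0, 10, 2]

x_bar, y_bar = mean(x), mean(y)
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar  # forces the line through (x-bar, y-bar)
predicted = [intercept + slope * xi for xi in x]
# SSE: squared deviations of the observed y-values about the LSRL.
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, predicted))
print(round(intercept, 6), round(slope, 6), round(sse, 6))  # 3.0 0.333333 54.0
```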
Coefficient of Determination
The difference SSM − SSE measures the amount of variation of y that can be explained by the regression line of y on x. The ratio of this difference to SSM is the proportion of the total sample variability that is explained by the least-squares regression of y on x:
$r^2 = \dfrac{\text{SSM} - \text{SSE}}{\text{SSM}}$
For data set A, $r^2 = (56 - 54)/56 \approx 0.0357$. That is, 3.57% of the variation in y is explained by the least-squares regression of y on x. Check with calculator.
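The calculator check can be sketched in Python, again assuming the reconstructed y-values 0, 10, 2 (consistent with the slides' numbers but not taken from the source). The ratio (SSM − SSE)/SSM gives 2/56 ≈ 0.0357, and squaring the correlation computed from the same sums gives the same value. The resulting r ≈ 0.189 is derived here, not quoted from the slides.

```python
from math import sqrt
from statistics import mean

# ASSUMED y-values reconstructed from the slides' summary numbers.
x = [0, 3, 6]
y = [0, 10, 2]

x_bar, y_bar = mean(x), mean(y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
ssm = sum((yi - y_bar) ** 2 for yi in y)  # total variation about y-bar
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

r_sq = (ssm - sse) / ssm     # proportion of variation explained by the LSRL
r = s_xy / sqrt(s_xx * ssm)  # correlation from the same sums of deviations
print(round(r_sq, 4), round(r, 4), round(r ** 2, 4))  # 0.0357 0.189 0.0357
```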
Points ….
If x is a poor predictor of y, then SSM and SSE are about the same. In our example, SSM = 56 and SSE = 54, so the LSRL is a poor prediction line.
Understanding Regression
When you report a regression, r-sq is a measure of how successful the regression was in explaining the response (y). When you see a correlation, square it to get a better feel for the strength of the association. Perfect correlation (r = −1 or r = 1) means r-sq = 1, so 100% of the variation in one variable is accounted for by the linear relationship with the other variable. If r = −0.7 or +0.7, r-sq = 0.49, and about half the variation is accounted for by the linear relationship.
Another Ex.
3.3: Correlation and Regression Wisdom (Residual Plot: Helps identify outliers)
Outlier vs. Influential point
Child 19 = outlier (but doesn't affect the regression line much, because other points with similar x-values are nearby)
Child 18 = influential point with a small residual (close to the line but far out in the x direction, so it has a strong influence on the regression line)
Misc. Not all outliers are influential
The LSRL is most likely to be heavily influenced by observations that are outliers in the x direction. Influential points often have small residuals since they pull the LSRL towards themselves. Find the LSRL with and without the suspect point. If the line moves more than a small amount, the point is influential.
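The "with and without" check can be sketched in Python. The data are hypothetical (not the child data from the earlier slide): five points lie exactly on y = x, and a sixth point sits far out in the x direction. Refitting without that point shows how far it drags the slope, and under the full fit its residual stays smaller than the largest residual among the other points, which matches the claim that influential points often have small residuals.

```python
from statistics import mean


def fit_lsrl(x, y):
    """Return (intercept, slope) of the least-squares regression line."""
    x_bar, y_bar = mean(x), mean(y)
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b * x_bar, b


# Hypothetical data: five points exactly on y = x, plus one suspect point
# that is an outlier in the x direction.
x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 5]
suspect = (20, 5)

a0, b0 = fit_lsrl(x, y)                                      # without the suspect point
a1, b1 = fit_lsrl(x + [suspect[0]], y + [suspect[1]])        # with the suspect point
print("without:", round(a0, 3), round(b0, 3))
print("with:   ", round(a1, 3), round(b1, 3))

# The suspect point pulls the line toward itself, so its own residual stays modest.
res_suspect = suspect[1] - (a1 + b1 * suspect[0])
print("residual of suspect point:", round(res_suspect, 3))
```

Here the slope falls from 1 to roughly 0.15 when the one point is added, so by the rule above the point is influential.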
Strong positive linear association. The correlation is r = 0.9749. Since r-sq = 0.9504, the regression of y on x will explain about 95% of the variation in the values of y.
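As a quick arithmetic check, squaring the reported correlation reproduces the reported r-sq:

$$r^2 = (0.9749)^2 \approx 0.9504$$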
The AP Statistics exam was first administered in May 1997 to the largest first-year group in any discipline in the AP program. Since that time, the number of students taking the exam has grown at an impressive rate. Here are the actual data. Begin by entering them into your calculator lists.
Year # students ,667 ,486 ,240 ,118 ,259 ,824 ,230 ,878 ,786
1. Use your calculator to construct a scatterplot of these data using 1997 as Year 1, 1998 as Year 2, etc. Describe what you see.
2. Find the equation of the least-squares line on your calculator. Record the equation below. Be sure to define any variables used.
3. Interpret the slope of the least-squares line in context.
4. How many students would you predict took the AP Statistics exam in 2006? Show your method.
5. Construct a residual plot. Sketch it in the space below. Comment on what the residual plot tells you about the quality of your linear model.
6. Interpret the value of r-sq from your calculator in the context of this problem.