1. Analyzing patterns in scatterplots 2. Correlation and linearity 3. Least-squares regression line 4. Residual plots, outliers, and influential points.

2 1. Analyzing patterns in scatterplots 2. Correlation and linearity 3. Least-squares regression line 4. Residual plots, outliers, and influential points

3 • Often computer output and graphs are provided with bivariate data analysis questions, but you cannot be sure that these will be provided.
• You need to know how to use your calculator to make a scatterplot, find the equation of a least-squares regression line, and make a residual plot.

4 • Comment on the direction (positive or negative), shape (linear or curved), and strength of the relationship.
• As always, answer IN CONTEXT.
• Also comment on unusual features (such as outliers).

5 • The Least Squares Regression Line (LSRL) passes through the point (x-bar, y-bar) and has slope b = r(s_y/s_x).
• When you write the equation of an LSRL, be sure to include the “hat” on the y-variable.
• Be sure to identify the variables in the equation; that is, tell what x and y stand for in the problem.
• You must be able to read computer output: the coefficient in the “Constant” row is the y-intercept, and the coefficient in the row named for the explanatory variable is the slope.
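A standard result is that the LSRL passes through (x-bar, y-bar) and has slope b = r(s_y/s_x). The sketch below checks both facts numerically. It is only an illustration: the data values and the helper names `correlation` and `lsrl` are invented for this example and do not come from the presentation.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r: sum of z-score products divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    z_products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                  for x, y in zip(xs, ys)]
    return sum(z_products) / (len(xs) - 1)

def lsrl(xs, ys):
    """Return (a, b) for y-hat = a + b*x, using b = r * s_y / s_x."""
    b = correlation(xs, ys) * stdev(ys) / stdev(xs)
    a = mean(ys) - b * mean(xs)  # forces the line through (x-bar, y-bar)
    return a, b

# Hypothetical foot lengths and heights in inches (illustration only).
foot = [9.0, 9.5, 10.0, 10.5, 11.0, 12.0]
height = [64.0, 66.0, 67.0, 69.0, 70.0, 73.0]

a, b = lsrl(foot, height)
print(f"y-hat = {a:.2f} + {b:.2f}x")
print("prediction at x-bar:", a + b * mean(foot))
print("y-bar:              ", mean(height))
```

The last two printed values should agree up to rounding, which is the "passes through (x-bar, y-bar)" property.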

6 Always put your explanation in context. Do NOT make a deterministic statement. The y-intercept provides an estimate for the value of y when x is zero. The slope gives the estimated amount that the y-variable changes (or the amount that the y-variable changes on average) for each one-unit increase in the x-variable.

7 • Suppose we took a series of measurements of students’ foot length (X) and height (Y). We suspect that there is a positive, linear relationship between these two variables.

8 (“Height” is the dependent variable)
Predictor   Coef   StDev   t-ratio   p-value
Constant    58.2   3.21    1.17      0.069
Foot        1.24   0.17    5.88      <0.00001
* Here, the LSRL would be Y-hat = 1.24X + 58.2, where X is the foot length and Y-hat is the predicted height.
* Slope interpretation: “On average, for every one-inch increase in foot length, there is a 1.24-inch increase in predicted height.”
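To make the slope interpretation concrete, the sketch below plugs two foot lengths into the slide's equation, Y-hat = 58.2 + 1.24X. The function name `predict_height` and the input values 10 and 11 are assumptions chosen for illustration; the presentation does not list the observed foot lengths, so there is no way to tell from the slide whether these inputs would be interpolation or extrapolation.

```python
def predict_height(foot_length):
    """Predicted height from the slide's LSRL: Y-hat = 58.2 + 1.24 * X."""
    return 58.2 + 1.24 * foot_length

# Hypothetical foot lengths (inches), one unit apart.
at_10 = predict_height(10)
at_11 = predict_height(11)

print(f"Predicted height at X = 10: {at_10:.1f}")    # 70.6
print(f"Predicted height at X = 11: {at_11:.1f}")    # 71.8
print(f"Change per one-inch increase: {at_11 - at_10:.2f}")  # 1.24
```

The difference between the two predictions equals the slope, which is what "a 1.24-inch increase in predicted height" means.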

9 • Residual = observed y-value – predicted y-value (Y minus Y-hat). Positive residuals are associated with points lying above the line; negative residuals are associated with points lying below the line. To determine whether the model is a good fit for the data, examine a residual plot: the residuals should be randomly scattered above and below the horizontal axis, with no pattern such as curvature.
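As a minimal sketch of the residual calculation, the code below fits a least-squares line to a small made-up data set and lists each residual with its sign. The data and the helper name `fit_line` are hypothetical. Plotting is omitted; a residual plot would graph these residuals against x.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Hypothetical foot lengths and heights in inches (illustration only).
xs = [9.0, 9.5, 10.0, 10.5, 11.0, 12.0]
ys = [64.0, 66.0, 67.0, 69.0, 70.0, 73.0]

a, b = fit_line(xs, ys)

# Residual = observed y minus predicted y.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

for x, res in zip(xs, residuals):
    side = "above" if res > 0 else "below" if res < 0 else "on"
    print(f"x = {x:5.1f}  residual = {res:+.3f}  ({side} the line)")

# For a least-squares line the residuals sum to zero (up to rounding).
print("sum of residuals:", round(sum(residuals), 10))
```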

10 • Extrapolation is predicting outside the observed set of data and can be risky. Interpreting the y-intercept is often meaningless because x = 0 frequently lies outside the observed data, so the interpretation involves extrapolation.
• Interpolation is predicting within the observed data.
• An influential point is one that noticeably affects the equation of the least squares line when it is added to or removed from the data set.
• An outlier is a point that noticeably stands apart from the other points.
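One way to see what "influential" means is to fit the line with and without a suspect point and compare the equations. The sketch below does this for an invented data set: five points on a line plus one extra point far to the right. All values and the helper name `fit_line` are hypothetical.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Hypothetical data: five points that lie exactly on y = 2x ...
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]
# ... plus one extra point with an extreme x-value that breaks the pattern.
extra_x, extra_y = 15.0, 5.0

a_without, b_without = fit_line(xs, ys)
a_with, b_with = fit_line(xs + [extra_x], ys + [extra_y])

print(f"without the point: y-hat = {a_without:.2f} + {b_without:.2f}x")
print(f"with the point:    y-hat = {a_with:.2f} + {b_with:.2f}x")
```

Here a single added point pulls the slope from 2 down to less than 0.1, so it is influential in the sense defined above.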

11 The correlation coefficient (r) gives information about the strength and direction of the linear relationship between two variables; that is, it tells how tightly the points are clustered about a line. r is always between -1 and 1, inclusive. Special values: r = 1 (perfect positive linear relationship), r = -1 (perfect negative linear relationship), r = 0 (no linear relationship). The correlation coefficient is sensitive to the effect of outliers. Correlation NEVER has units, is the same for (x, y) and (y, x), and is unchanged when either variable is multiplied by a positive constant k, as in (kx, y) or (x, ky); a negative k reverses the sign of r. Correlation IS NOT CAUSATION. A correlation value close to 1 or -1 does not guarantee that a linear model is appropriate.
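The symmetry and rescaling properties of r can be checked directly. The sketch below uses invented data and a hypothetical helper named `correlation`. It swaps the variables, converts one variable from inches to centimeters (multiplying by 2.54), and multiplies by -1 to show that rescaling preserves r only when the multiplier is positive.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r: sum of z-score products divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    z_products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                  for x, y in zip(xs, ys)]
    return sum(z_products) / (len(xs) - 1)

# Hypothetical foot lengths and heights in inches (illustration only).
foot = [9.0, 9.5, 10.0, 10.5, 11.0, 12.0]
height = [64.0, 66.0, 67.0, 69.0, 70.0, 73.0]

r = correlation(foot, height)
r_swapped = correlation(height, foot)                  # (y, x) instead of (x, y)
r_cm = correlation([2.54 * x for x in foot], height)   # positive multiplier
r_negated = correlation([-x for x in foot], height)    # negative multiplier

print(f"r(x, y)     = {r:.4f}")
print(f"r(y, x)     = {r_swapped:.4f}")
print(f"r(2.54x, y) = {r_cm:.4f}")
print(f"r(-x, y)    = {r_negated:.4f}")
```

The first three values match, while the last has the same magnitude with the opposite sign.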

12 • r² is the “Coefficient of Determination”: it tells what percent of the variation in the observed y-values is explained by the linear relationship with the x-variable.
• Sometimes it’s just pronounced “r-squared”!
• Be sure that you can interpret r² in context.

13 FOR EXAMPLE:
Predictor   Coef   StDev   t-ratio   p-value
Constant    58.2   3.21    1.17      0.069
Foot        1.24   0.17    5.88      <0.00001
R-sq = 0.934
“93.4% of the variation in height (the Y’s) can be explained by our linear model with foot length being the explanatory variable (the X’s).”
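The phrase "percent of the variation explained" can be read as 1 minus the ratio of leftover variation (squared residuals) to total variation in y. The sketch below computes that quantity for an invented data set and compares it with the square of r. The data are hypothetical, so the result will not match the slide's R-sq of 0.934, which comes from data the presentation does not list.

```python
from statistics import mean

# Hypothetical foot lengths and heights in inches (illustration only).
xs = [9.0, 9.5, 10.0, 10.5, 11.0, 12.0]
ys = [64.0, 66.0, 67.0, 69.0, 70.0, 73.0]

x_bar, y_bar = mean(xs), mean(ys)
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)   # total variation in y (SST)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Least-squares line y-hat = a + b*x.
b = sxy / sxx
a = y_bar - b * x_bar

# Variation left unexplained by the line (sum of squared residuals, SSE).
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

r = sxy / (sxx * syy) ** 0.5
r_squared = 1 - sse / syy

print(f"r           = {r:.4f}")
print(f"r * r       = {r * r:.4f}")
print(f"1 - SSE/SST = {r_squared:.4f}")
print(f"{100 * r_squared:.1f}% of the variation in y is explained by the line")
```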

