Chapter 10, Part 2: Linear Regression
Predictions with Scatterplots
Last time: a scatterplot gives a picture of the relationship between two quantitative variables. One variable is explanatory, and the other is the response.
Today: if we know the value of the explanatory variable, can we predict the value of the response variable?
The Regression Line
To make predictions, we’ll find a straight line that is the “best fit” for the points in the scatterplot. This is not so simple…
Regression Line in JMP
Start by making a scatterplot. Then, from the red-triangle menu, choose “Fit Line.” The equation of the regression line appears under the “Linear Fit” group. JMP uses column headings as variable names (instead of x and y). Example from the Cars 1993 file:
MaxPrice = 2.3139014 + 1.1435971*MinPrice
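If you want to check a fit like this outside JMP, the same least-squares line is easy to reproduce. Here is a minimal Python sketch using numpy; the minprice and maxprice arrays are hypothetical stand-ins for the Cars 1993 columns, so the fitted coefficients will differ from those above.

```python
import numpy as np

# Hypothetical stand-ins for the Cars 1993 MinPrice/MaxPrice columns
# (prices in thousands of dollars) -- replace with the real data.
minprice = np.array([12.9, 29.2, 25.9, 30.8, 23.7, 14.2])
maxprice = np.array([18.8, 38.7, 32.3, 44.6, 30.0, 17.3])

# Least-squares fit of a straight line: maxprice ~ intercept + slope*minprice
slope, intercept = np.polyfit(minprice, maxprice, 1)
print(f"MaxPrice = {intercept:.7f} + {slope:.7f}*MinPrice")
```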
Predicted Values
We use the equation of the regression line to make predictions about:
– Individuals not in the original data set.
– Later measurements of the same individuals.
Example: In 1994, a vehicle had a Min. Price of $15,000. Use the previous data to predict the Max. Price. You can do this by hand from the equation (prices are in thousands of dollars):
MaxPrice = 2.3139014 + 1.1435971*MinPrice
2.3139014 + 1.1435971*(15) = 19.4678579, a predicted Max. Price of about $19,468.
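The same plug-in computation, as a quick Python check (the coefficients come straight from the fitted line above):

```python
# Coefficients copied from the fitted line; prices are in $1000s.
intercept, slope = 2.3139014, 1.1435971

min_price = 15.0                           # the 1994 vehicle's Min. Price
max_price = intercept + slope * min_price  # plug into the regression line
print(max_price)                           # 19.4678579 -> about $19,468
```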
Are the Predictions Useful?
In some cases, the regression line is more useful for predicting values than in others. Consider the following examples (from Cars 1993):
Coefficient of Determination
If the scatterplot is well-approximated by a straight line, the regression equation is more useful for making predictions. Correlation is one measure of this. The square of the correlation has a more intuitive meaning: what proportion of the variation in the response variable is explained by variation in the explanatory variable?
JMP: “RSquare” under “Summary of Fit”
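To see the connection between the two quantities, here is a small Python sketch on made-up x/y data (not the Cars 1993 values): for a straight-line fit, the RSquare that JMP reports equals the square of the correlation.

```python
import numpy as np

# Made-up explanatory/response data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8])

r = np.corrcoef(x, y)[0, 1]     # correlation between x and y
print(f"r = {r:.4f}")
print(f"RSquare = {r**2:.4f}")  # what "Summary of Fit" would report
```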
Coefficient of Determination
In predicting Max. Price from Min. Price, we had RSquare = 0.822202: about 82% of the variation in Max. Price is explained by variation in Min. Price.
In predicting Highway MPG from Engine Size, we had RSquare = 0.392871: only 39% of the variation in Highway MPG is explained by variation in Engine Size.
Coefficient of Determination
RSquare takes values from 0 to 1. For values close to 0, the regression line is not very useful for predictions; for values close to 1, it is more useful. Because RSquare is a square, it makes no distinction between positive and negative association: r = 0.9 and r = -0.9 both give RSquare = 0.81.
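A quick illustration in Python with made-up data: flipping the sign of the response reverses the association but leaves RSquare unchanged.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 2.1, 2.9, 4.2, 4.8])   # made-up, positively associated data

r_pos = np.corrcoef(x, y)[0, 1]    # positive association
r_neg = np.corrcoef(x, -y)[0, 1]   # flip the response: negative association
print(r_pos, r_neg)                # opposite signs...
print(r_pos**2, r_neg**2)          # ...but the same RSquare
```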
Residuals
For each individual in the data set we can compute the difference (error) between the actual and predicted values of the response variable. This difference is called a residual:
Residual = (actual value) – (predicted value)
In JMP: click the red triangle by “Linear Fit” and select “Save Residuals” from the drop-down menu. You can also “Plot Residuals.”
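The same computation outside JMP, as a hedged Python sketch on made-up data: fit the line, predict each individual's response, and subtract.

```python
import numpy as np

# Made-up data; x is explanatory, y is the response.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 2.8, 4.1, 4.9, 6.2])

slope, intercept = np.polyfit(x, y, 1)   # fit the regression line
predicted = intercept + slope * x        # predicted value for each individual
residuals = y - predicted                # residual = actual - predicted
print(residuals)
```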
How does JMP find the Regression Line?
JMP uses the most popular method, Ordinary Least Squares (OLS). To measure how well a given line fits the data:
– Compute all the residuals and square each one.
– Add up the results to get a “total error.” The closer this total is to zero, the better the line fits the data.
OLS chooses the line with the smallest “total error.” (Thankfully) JMP takes care of the details.
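The criterion is easy to write down directly. This Python sketch (made-up data again) spells out the “total error” and confirms that the least-squares line beats an arbitrary alternative on it:

```python
import numpy as np

def total_squared_error(x, y, intercept, slope):
    """Sum of squared residuals for the candidate line y-hat = intercept + slope*x."""
    residuals = y - (intercept + slope * x)
    return np.sum(residuals ** 2)

# Made-up data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 2.8, 4.1, 4.9, 6.2])

# The OLS line (what "Fit Line" computes)...
ols_slope, ols_intercept = np.polyfit(x, y, 1)

# ...has a smaller total error than any other candidate line:
print(total_squared_error(x, y, ols_intercept, ols_slope))  # smallest possible
print(total_squared_error(x, y, 1.0, 1.1))                  # some other line: larger
```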
Limitations of Correlation and Linear Regression
– Both describe linear relationships only.
– Both are sensitive to outliers.
– Beware of extrapolation: predicting outside the given range of the explanatory variable.
– Beware of lurking variables: other factors that may explain a strong correlation.
– Correlation does not imply causality!
Beware Extrapolation!
A child’s height was plotted against her age.
Can you predict her height at age 8 (96 months)?
Can you predict her height at age 30 (360 months)?
Beware Extrapolation!
Regression line: y = 71.95 + 0.383x (height y in cm, age x in months)
Height at 96 months? y ≈ 108.7 cm (about 3' 7'')
Height at 360 months? y ≈ 209.8 cm (about 6' 10'')
Height at birth (x = 0)? y = 71.95 cm (about 2' 4'')
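As a worked check of the arithmetic, a short Python loop evaluates the line at the three ages above:

```python
# The height-vs-age line from the slide: height in cm, age in months.
intercept, slope = 71.95, 0.383

for months in (96, 360, 0):
    height = intercept + slope * months
    print(f"{months:3d} months -> {height:.1f} cm")
# The predictions at 360 months (6'10'') and at birth (2'4'' -- far taller
# than any newborn) show what goes wrong outside the plotted age range.
```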
Beware Lurking Variables!
Although there may be a strong correlation (statistical relationship) between two variables, there might not be a direct practical (cause-and-effect) relationship. A lurking variable is a third variable (not in the scatterplot) that might cause the apparent relationship between the explanatory and response variables.
Example: Pizza vs. Subway Fare
A regression of these data shows a strong correlation (0.9878) between the cost of a slice of pizza and the cost of a subway fare.
Q: Does the price of pizza affect the price of the subway?
Caution: Correlation Does Not Imply Causation
In a study of emergency services, it was noted that larger fires tend to have more firefighters present. Suppose we used:
– Explanatory Variable: Number of firefighters
– Response Variable: Size of the fire
We would expect a strong correlation. But it’s ludicrous to conclude that having more firefighters present causes the fire to be larger.