Published by Arthur Snow. Modified over 8 years ago.
1
Least-Squares Regression (Textbook Section 3.2)
2
Regression Line
A regression line describes how the response variable (y) changes as an explanatory variable (x) changes. We use the regression line to PREDICT the value of y for a given value of x.
3
Interpreting a Regression Line
A regression line is a model for the data, like the density curves in Chapter 2. It is a compact description of the relationship between the variables.
The equation for a regression line is $\hat{y} = a + bx$, where $\hat{y}$ (read “y hat”) is the predicted value of the response variable for a given value $x$ of the explanatory variable, $a$ is the y-intercept, and $b$ is the slope.
The regression line for our data from before would be
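As a minimal sketch, the equation $\hat{y} = a + bx$ can be evaluated directly in Python. The intercept and slope below are hypothetical values chosen only for illustration (the deck's truck-price equation is not reproduced here); the sketch shows that increasing x by one unit changes the prediction by exactly the slope b.

```python
def predict(a: float, b: float, x: float) -> float:
    """Predicted response y-hat = a + b*x for a line with intercept a and slope b."""
    return a + b * x


# Hypothetical intercept and slope, for illustration only.
a = 2.2
b = 0.6

y_hat_at_3 = predict(a, b, 3)
y_hat_at_4 = predict(a, b, 4)

print(f"predicted y at x = 3: {y_hat_at_3:.2f}")
print(f"predicted y at x = 4: {y_hat_at_4:.2f}")
# Predictions one x-unit apart differ by the slope b.
print(f"change per unit of x: {y_hat_at_4 - y_hat_at_3:.2f}")
```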
4
Prediction
The accuracy of predictions from a regression line depends on how much the data are scattered about the line.
Using our line, suppose we want to predict the selling price of a truck with 100,000 miles. This is a fairly reasonable prediction.
What if we wanted to predict the price of a truck with 300,000 miles? This is called extrapolation, because the value lies outside the range of our data set. The predicted price is so low that we'd have to pay someone else to take our truck! Always check that a prediction is reasonable.
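One way to build the "check for a reasonable answer" habit into a calculation is to flag any x outside the observed range before trusting the prediction. This is a minimal sketch: the helper name `predict_with_range_check`, the fitted line, and the x values are all hypothetical and are not the deck's truck data.

```python
def predict_with_range_check(a, b, x, observed_x):
    """Return (y_hat, is_extrapolation) for the line y-hat = a + b*x.

    is_extrapolation is True when x lies outside the range of the
    explanatory-variable values the line was fitted to.
    """
    y_hat = a + b * x
    is_extrapolation = not (min(observed_x) <= x <= max(observed_x))
    return y_hat, is_extrapolation


# Hypothetical fitted line with a negative slope, and the x values it was fitted to.
a, b = 20.0, -1.5
observed_x = [2, 4, 6, 8, 10]

inside = predict_with_range_check(a, b, 6, observed_x)    # within the data range
outside = predict_with_range_check(a, b, 20, observed_x)  # far beyond the data range

for label, (y_hat, extrapolated) in [("x = 6", inside), ("x = 20", outside)]:
    warning = "  <- extrapolation: check that this is reasonable" if extrapolated else ""
    print(f"{label}: predicted y = {y_hat:.1f}{warning}")
```

The negative prediction at x = 20 is the same kind of impossible answer as a truck price we would have to pay someone to accept.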
5
Least-Squares Regression & Residuals
In most cases, no line will pass exactly through all the data points. A good regression line makes the vertical distances between the actual data points and the line as small as possible.
The red lines denote the residuals: the vertical distance between each actual data point and the predicted point on the line (residual = observed y − predicted y). The least-squares regression line (LSRL) is the line that minimizes the sum of the squared residuals.
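The least-squares line has a closed form: slope $b = \sum (x - \bar{x})(y - \bar{y}) / \sum (x - \bar{x})^2$ and intercept $a = \bar{y} - b\bar{x}$. The Python sketch below computes both for a small hypothetical data set, lists the residuals (observed y − predicted y), and checks that nudging the intercept or slope away from the least-squares values only increases the sum of squared residuals.

```python
def least_squares(x, y):
    """Return (a, b), the intercept and slope of the least-squares line y-hat = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def sum_squared_residuals(x, y, a, b):
    """Sum of (observed y - predicted y)^2 for the line y-hat = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Small hypothetical data set, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = least_squares(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
best_sse = sum_squared_residuals(x, y, a, b)

print(f"least-squares line: y-hat = {a:.2f} + {b:.2f}x")
print("residuals:", [round(e, 2) for e in residuals])
print(f"sum of squared residuals: {best_sse:.2f}")

# Any nearby line with a different intercept or slope leaves a larger sum of squared residuals.
nudged_sse = [
    sum_squared_residuals(x, y, a + da, b + db)
    for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
]
print("nudged lines:", [round(s, 2) for s in nudged_sse])
```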
6
Residual Plots
Residual plots help us determine whether our regression model is appropriate for our data. If the points on a residual plot are scattered evenly above and below the zero line with no distinct pattern, that suggests a linear model is appropriate; a curved pattern suggests it is not.
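A residual plot can be sketched without any plotting library. The Python sketch below uses hypothetical data, and its text layout and scale are arbitrary choices. It fits a least-squares line to two small data sets and prints the residuals against x: for the roughly linear data the residuals fall on both sides of zero with no clear shape, while for data generated from the curve y = x² they form a U shape, a sign that a linear model is not appropriate.

```python
def least_squares(x, y):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def residuals(x, y):
    """Residuals (observed y - predicted y) from the least-squares line."""
    a, b = least_squares(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def text_residual_plot(x, res, half_width=20, scale=5):
    """Print one row per x value: '|' marks residual = 0, '*' marks the residual.

    Marks left of '|' are negative residuals, marks right of it are positive.
    'scale' is characters per unit of residual; values are clipped to the row width.
    """
    for xi, ei in sorted(zip(x, res)):
        offset = max(-half_width, min(half_width, round(ei * scale)))
        row = [" "] * (2 * half_width + 1)
        row[half_width] = "|"
        row[half_width + offset] = "*"
        print(f"x = {xi:>3}  " + "".join(row))


# Hypothetical data sets, for illustration only.
x = [1, 2, 3, 4, 5]
roughly_linear_y = [2, 4, 5, 4, 5]
curved_y = [xi ** 2 for xi in x]

linear_res = residuals(x, roughly_linear_y)
curved_res = residuals(x, curved_y)

print("Roughly linear data:")
text_residual_plot(x, linear_res)
print("Curved data y = x^2 (U-shaped pattern):")
text_residual_plot(x, curved_res)
```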
7
The Role of $r^2$
$r^2$ is called the coefficient of determination. It is the proportion of the variation in the y values that is accounted for by the least-squares regression line.
For our truck prices example, $r^2 = 0.664$. Therefore, we say that “66.4% of the variation in price is accounted for by the linear model relating price to miles driven.”
We're still describing how well the line fits the data: the closer $r^2$ is to 1, the better the line fits.
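The "proportion of variation accounted for" reading of $r^2$ comes from the identity $r^2 = 1 - \text{SSE}/\text{SST}$, where SSE is the sum of squared residuals and $\text{SST} = \sum (y - \bar{y})^2$ is the total variation of y about its mean. The Python sketch below works on hypothetical data (the truck data are not reproduced in this deck), computes $r^2$ that way, and confirms it equals the square of the correlation r.

```python
import math


def least_squares(x, y):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def correlation(x, y):
    """Pearson correlation r between x and y."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = least_squares(x, y)
y_bar = sum(y) / len(y)
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # variation left over after the fit
sst = sum((yi - y_bar) ** 2 for yi in y)                     # total variation in y
r_squared = 1 - sse / sst
r = correlation(x, y)

print(f"r^2 = 1 - SSE/SST = {r_squared:.3f}")
print(f"r = {r:.3f}, r*r = {r * r:.3f}")
print(f"{r_squared:.1%} of the variation in y is accounted for by the line")
```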
8
Interpreting Computer Regression Output
Computer regression output reports the y-intercept (a), the slope (b), the standard deviation of the residuals, and $r^2$. We almost always ignore the remaining values in the output; we'll discuss them in Chapter 12.
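The four quantities the output highlights can be recomputed by hand. The Python sketch below uses hypothetical data and assumes the usual textbook definition of the standard deviation of the residuals, $s = \sqrt{\sum \text{residual}^2 / (n - 2)}$; the labels for these quantities differ between calculators and software packages.

```python
import math


def regression_summary(x, y):
    """Return a dict with intercept a, slope b, residual standard deviation s, and r^2."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return {
        "a (y-intercept)": a,
        "b (slope)": b,
        # Divide by n - 2 because two quantities (a and b) were estimated from the data.
        "s (std. dev. of residuals)": math.sqrt(sse / (n - 2)),
        "r^2": 1 - sse / syy,
    }


# Hypothetical data, for illustration only.
summary = regression_summary([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for name, value in summary.items():
    print(f"{name:>28}: {value:.4f}")
```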
9
Putting It All Together
Does the age at which a child begins to talk predict a later score on a test of mental ability? A study of the development of young children recorded the age in months at which each of 21 children spoke their first word and their Gesell Adaptive Score, the result of an aptitude test taken much later.
Should we use a linear model to predict a child's Gesell score from his or her age at first word? If so, how accurate will the predictions be?
Ages and scores are listed in the same child order:
Age (months): 15, 26, 10, 9, 15, 20, 18, 11, 8, 20, 7, 9, 10, 11, 11, 10, 12, 42, 17, 11, 10
Score: 95, 71, 83, 91, 102, 87, 93, 100, 104, 94, 113, 96, 83, 84, 102, 100, 105, 57, 121, 86, 100
10
Continued…
How do we decide whether a linear model is appropriate and, if so, how well the least-squares regression line fits the data?
1. Make a scatterplot of the data and describe what you see (FODS): a negative, moderately strong, linear pattern. There appear to be two outliers: one child has a very high score for his or her age at first word (age 17, score 121), and another child didn't speak until much later (age 42).
2. Check the residuals: no distinct pattern, so a linear model seems appropriate.
3. The calculator output gives $r^2 = 0.41$, which means that only 41% of the variation in Gesell score is accounted for by the linear model relating score to age at first word.
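The calculator's numbers can be checked with a short Python sketch. The ages and scores below are the 21 (age, score) pairs from the Gesell data table, in child order; the fit should reproduce $r^2 \approx 0.41$, with a slope of about −1.13 points per month.

```python
def least_squares_fit(x, y):
    """Return (a, b, r_squared) for the least-squares line y-hat = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    b = sxy / sxx
    a = y_bar - b * x_bar
    r_squared = sxy ** 2 / (sxx * syy)
    return a, b, r_squared


# Age at first word (months) and Gesell Adaptive Score for the 21 children, in the same order.
age = [15, 26, 10, 9, 15, 20, 18, 11, 8, 20, 7, 9, 10, 11, 11, 10, 12, 42, 17, 11, 10]
score = [95, 71, 83, 91, 102, 87, 93, 100, 104, 94, 113, 96, 83, 84, 102, 100, 105, 57, 121, 86, 100]

a, b, r_squared = least_squares_fit(age, score)
print(f"predicted score = {a:.2f} + ({b:.3f}) * age")
print(f"r^2 = {r_squared:.2f}")
```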
11
Correlation and Regression Wisdom
1. The distinction between response and explanatory variables is important in regression!
2. Correlation and regression lines describe only linear relationships (see the pictures on p. 188).
3. Correlation and least-squares regression lines are not resistant.
4. Strong association does NOT imply causation!
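Non-resistance is easy to see with the Gesell data from earlier in this deck. The Python sketch below computes the correlation r for all 21 children and again after dropping only the child who first spoke at 42 months; removing that single point weakens the correlation noticeably.

```python
import math


def correlation(x, y):
    """Pearson correlation r between x and y."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


age = [15, 26, 10, 9, 15, 20, 18, 11, 8, 20, 7, 9, 10, 11, 11, 10, 12, 42, 17, 11, 10]
score = [95, 71, 83, 91, 102, 87, 93, 100, 104, 94, 113, 96, 83, 84, 102, 100, 105, 57, 121, 86, 100]

r_all = correlation(age, score)

# Drop the child who first spoke at 42 months (the most extreme x value).
keep = [i for i, months in enumerate(age) if months != 42]
r_without = correlation([age[i] for i in keep], [score[i] for i in keep])

print(f"r with all 21 children:       {r_all:.2f}")
print(f"r without the 42-month child: {r_without:.2f}")
```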
12
Outliers vs. Influential Points
One point in the figure has a large residual but does not change the LSRL much. It is an outlier, but not very influential.
Another point has a small residual but changes the LSRL quite a bit. It is very influential.
Influential points often have small residuals because they pull the line toward themselves.
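The same contrast shows up in the Gesell data. The Python sketch below is an illustration using that data set, not a reproduction of the figure on this slide. It refits the line without each of the two unusual children noted earlier: the child with age 17 and score 121 has a large residual but moves the slope much less than the child with age 42 and score 57, who has a small residual but pulls the line strongly toward itself.

```python
def fit(x, y):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b


age = [15, 26, 10, 9, 15, 20, 18, 11, 8, 20, 7, 9, 10, 11, 11, 10, 12, 42, 17, 11, 10]
score = [95, 71, 83, 91, 102, 87, 93, 100, 104, 94, 113, 96, 83, 84, 102, 100, 105, 57, 121, 86, 100]
points = list(zip(age, score))

a, b = fit(age, score)

# For each unusual child: residual on the full-data line, and how much the slope moves without it.
results = {}
for point in [(17, 121), (42, 57)]:
    i = points.index(point)
    residual = point[1] - (a + b * point[0])
    rest = points[:i] + points[i + 1:]
    _, b_without = fit([p[0] for p in rest], [p[1] for p in rest])
    results[point] = (residual, b_without - b)
    print(f"(age, score) = {point}: residual = {residual:+.1f}, "
          f"slope {b:.3f} -> {b_without:.3f} without this child")
```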