Least-Squares Regression: Linear Regression Section 3.2 Reference Text: The Practice of Statistics, Fourth Edition. Starnes, Yates, Moore
Warm up / quiz Draw a quick sketch of three scatterplots: –Draw a plot with r ≈ 0.9 –Draw a plot with r ≈ -0.5 –Draw a plot with r ≈ 0
Today’s Objective
Your Poster! Take a look at your poster: Do you think you could draw a straight line through the middle of your points, with ½ of your points above the line and ½ below? –Calculate your line: slope m = (y2 - y1) / (x2 - x1) Point-slope form: y - y1 = m(x - x1) In math-land this is known as a “line of best fit”
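The poster calculation can be sketched in Python. This is only an illustration: the two points and the function names `slope` and `line_through` are hypothetical, standing in for whatever two points you read off your own hand-drawn line.

```python
def slope(x1, y1, x2, y2):
    """Slope between two points: m = (y2 - y1) / (x2 - x1)."""
    if x2 == x1:
        raise ValueError("vertical line: slope is undefined")
    return (y2 - y1) / (x2 - x1)


def line_through(x1, y1, x2, y2):
    """Return (m, c) so that y = m*x + c, starting from point-slope form."""
    m = slope(x1, y1, x2, y2)
    # y - y1 = m(x - x1)  rearranges to  y = m*x + (y1 - m*x1)
    c = y1 - m * x1
    return m, c


# Hypothetical points read off a poster (not data from the slides).
m, c = line_through(2, 5, 6, 13)
print(f"y = {m}x + {c}")
```

Note the parentheses in the slope formula: the whole difference in y is divided by the whole difference in x.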
The local ice cream shop keeps track of how much ice cream they sell versus the noon temperature on that day. [Table: Temperature (°C) vs. Ice Cream Sales ($), 12 rows. Only the first temperature (14.2 °C) and the last sales figure ($408) survive; the other values are missing.]
Algebra …Line of Best Fit We can draw a “Line of Best Fit” on our scatter plot: When creating a line of best fit we try to have the line as close as possible to all points, and as many points above the line as below.
Statistics… Regression Line In algebra our line is known as a “line of best fit.” In statistics, this is called a regression line: a line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x.
Definition: Suppose that y is a response variable (plotted on the vertical axis) and x is an explanatory variable (plotted on the horizontal axis). A regression line relating y to x has an equation of the form ŷ = a + bx In this equation, ŷ (read “y hat”) is the predicted value of the response variable y for a given value of the explanatory variable x. b is the slope, the amount by which y is predicted to change when x increases by one unit. a is the y intercept, the predicted value of y when x = 0.
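As a rough sketch of the definition above, the Python below evaluates ŷ = a + bx. The intercept and slope values are hypothetical, chosen only so the arithmetic is easy to follow; `predict` is an illustrative name.

```python
def predict(a, b, x):
    """Predicted response y-hat = a + b*x."""
    return a + b * x


# Hypothetical intercept and slope, for illustration only.
a, b = 10.0, 2.5

y_hat_at_0 = predict(a, b, 0)  # prediction at x = 0 is the intercept a
y_hat_at_4 = predict(a, b, 4)
y_hat_at_5 = predict(a, b, 5)

print("prediction at x = 0:", y_hat_at_0)
print("change in prediction when x goes from 4 to 5:", y_hat_at_5 - y_hat_at_4)
```

The two printed values match the interpretations in the definition: the prediction at x = 0 equals a, and increasing x by one unit changes the prediction by b.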
Equation of Regression
The Meaning of Slope
Example in Context
Least-Squares Regression Interpreting a Regression Line Consider the regression line from the example “ Does Fidgeting Keep You Slim? ” Identify the slope and y - intercept and interpret each value in context. The y-intercept a = kg is the fat gain estimated by this model if NEA does not change when a person overeats. The slope b = tells us that the amount of fat gained is predicted to go down by kg for each added calorie of NEA.
Least-Squares Regression Prediction We can use a regression line to predict the response ŷ for a specific value of the explanatory variable x. Use the NEA and fat gain regression line to predict the fat gain for a person whose NEA increases by 400 cal when she overeats. We predict a fat gain of 2.13 kg for a person whose NEA increases by 400 calories.
Extrapolation Take a look at your poster! Take a look at the range of your data along the x-axis. Your line is linear, so it goes on and on even past your data points… Predict an output value when you input a large number outside your data range. Put it into context: examples? You just extrapolated! You went outside your range of data! Definition: Extrapolation is the use of a regression line for prediction far outside the interval of values of the explanatory variable x used to obtain the line. Such predictions are often not accurate.
Extrapolation Warning Predictions made with a regression line outside the interval of values of the explanatory variable x used to obtain the line are often not accurate. “Just because your line behaves the way it does within the confines does not mean it won’t get all squirrelly later on! We can’t predict the behavior of data at the extremes.” Don’t make predictions using values of x that are much larger or much smaller than those that actually appear in your data.
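One way to keep this warning in mind is to check whether an x value lies inside the observed data range before trusting a prediction. The sketch below uses hypothetical data, coefficients, and function name. It flags any x outside the observed range, which is stricter than the "far outside" in the definition.

```python
def predict_with_check(a, b, x, x_data):
    """Return (y_hat, is_extrapolation) for the line y-hat = a + b*x.

    is_extrapolation is True when x lies outside the range of x values
    that were used to obtain the line.
    """
    lo, hi = min(x_data), max(x_data)
    is_extrapolation = x < lo or x > hi
    return a + b * x, is_extrapolation


# Hypothetical x values and coefficients (not from the slides).
x_data = [1, 3, 4, 7, 9]
a, b = 2.0, 0.5

inside = predict_with_check(a, b, 5, x_data)    # x within 1..9
outside = predict_with_check(a, b, 50, x_data)  # x well beyond 9

for y_hat, flagged in (inside, outside):
    note = "extrapolation, treat with caution" if flagged else "within data range"
    print(y_hat, note)
```

The line still produces a number for x = 50. The flag is a reminder that nothing in the data supports that number.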
Example in Context
Least-Squares Regression Residuals In most cases, no line will pass exactly through all the points in a scatterplot. A good regression line makes the vertical distances of the points from the line as small as possible. Definition: A residual is the difference between an observed value of the response variable and the value predicted by the regression line. That is, residual = observed y – predicted y = y – ŷ [Figure: scatterplot with regression line; points above the line have positive residuals, points below the line have negative residuals.]
Residuals Look at your graph: how far away are your points from your line? A residual is the difference between an observed value of the response variable and the value predicted by the regression line. Residual = observed y – predicted y
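The residual formula can be tried out with a few lines of Python. The data points and the line's coefficients below are hypothetical, picked so the residuals come out as whole numbers.

```python
def residuals(xs, ys, a, b):
    """Residual for each point: observed y minus predicted y (a + b*x)."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Hypothetical data and line y-hat = 1 + 2x (not from the slides).
xs = [1, 2, 3, 4]
ys = [3.0, 4.0, 8.0, 9.0]
a, b = 1.0, 2.0

res = residuals(xs, ys, a, b)
for x, y, r in zip(xs, ys, res):
    position = "above" if r > 0 else "below" if r < 0 else "on"
    print(f"x={x}, observed y={y}, residual={r} ({position} the line)")
```

A positive residual means the point sits above the line, a negative residual means it sits below, and a residual of zero means the line passes through the point.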
Finding a Residual
Least-Squares Regression Line
LSRL TI-83/ TI-89 TI-83 –Put your data in L1 and L2 –STAT > CALC > #8 (LinReg(a+bx)) > Enter –Did you know your TI-83 defaults to using L1 and L2 as its lists? As long as you put your data in L1 and L2, you don’t have to tell it! TI-89 –Statistics/List Editor > F4 (CALC) > #3 > #1 –Practice on the next slide!
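If you want to cross-check the calculator, the sketch below computes a least-squares line in Python. It assumes the standard least-squares formulas, which are not shown on these slides: b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and a = ȳ - b·x̄. The data and the function name `least_squares_line` are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal: slope is undefined")
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical data, playing the role of L1 (xs) and L2 (ys).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
print(f"y-hat = {a:.3f} + {b:.3f}x")
```

Entering the same two lists in L1 and L2 and running the calculator steps above should report matching values of a and b, up to rounding.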
Practice with your TI Calculator Body Weight (lbs) Backpack Weight (lbs)
Today’s Objective
Homework Worksheet