Published by Alicia Newton. Modified over 9 years ago.
1
+ The Practice of Statistics, 4th edition, For AP* (Starnes, Yates, Moore). Chapter 3: Describing Relationships. Section 3.2: Least-Squares Regression
2
+ On your Calculator. ALWAYS graph the data first (use a scatterplot)! Ask yourself: is the relationship linear? If it is not, linear regression is not appropriate! If it is, press STAT/CALC/8:LinReg(a+bx) L1, L2, Y1. The output will tell you the equation of the regression line as well as the correlation, r. Choose ZOOM 9 to see the line and the scatterplot together.
3
+ Let’s use the TI to calculate the LSR line for the body weight/backpack data.

Body Weight   Backpack Weight
120           26
187           30
109           26
103           24
131           29
165           35
158           31
116           28
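For readers without a TI at hand, the same calculation can be sketched in Python. This is a minimal sketch using only the textbook sums of squares; the names `body`, `pack`, and `least_squares` are illustrative and do not come from the slides.

```python
from math import sqrt

# Data from the slide: body weight (x) and backpack weight (y).
body = [120, 187, 109, 103, 131, 165, 158, 116]
pack = [26, 30, 26, 24, 29, 35, 31, 28]

def least_squares(x, y):
    """Return (intercept a, slope b, correlation r) for y-hat = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    b = sxy / sxx
    a = y_bar - b * x_bar
    r = sxy / sqrt(sxx * syy)
    return a, b, r

a, b, r = least_squares(body, pack)
print(f"y-hat = {a:.2f} + {b:.4f}x, r = {r:.3f}")
```

The printed intercept, slope, and r should agree with the calculator's LinReg output up to rounding.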
4
+ Least-Squares Regression: Residuals. In most cases, no line will pass exactly through all the points in a scatterplot. A good regression line makes the vertical distances of the points from the line as small as possible. Definition: A residual is the difference between an observed value of the response variable and the value predicted by the regression line. That is, residual = observed y - predicted y = y - ŷ. Positive residuals lie above the line; negative residuals lie below the line.
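As a small illustration of the definition, the sketch below computes y - ŷ for each point of the body weight/backpack data. It assumes the line ŷ = 16.26 + 0.0908x, which is a rounded least-squares fit to that data and is not stated on the slides; the function name `residuals` is likewise illustrative.

```python
# Data from the earlier slide: body weight (x) and backpack weight (y).
body = [120, 187, 109, 103, 131, 165, 158, 116]
pack = [26, 30, 26, 24, 29, 35, 31, 28]

# Assumed line (rounded least-squares fit, not given on the slides).
A, B = 16.26, 0.0908

def residuals(x, y, a, b):
    """residual = observed y - predicted y, for the line y-hat = a + b*x."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

res = residuals(body, pack, A, B)
for xi, e in zip(body, res):
    side = "above" if e > 0 else "below"
    print(f"x = {xi}: residual = {e:+.3f} ({side} the line)")
```

A positive value means the observed backpack weight sits above the line; a negative value means it sits below.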
5
+ Least-Squares Regression: Least-Squares Regression Line. Different regression lines produce different residuals. The regression line we want is the one that minimizes the sum of the squared residuals. Definition: The least-squares regression line of y on x is the line that makes the sum of the squared residuals as small as possible. Luckily, technology calculates the LSR line for us!
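To see the "as small as possible" claim in action, the following sketch compares the sum of squared residuals (SSE) of the least-squares line with a few nearby lines on the body weight/backpack data. The competing lines are hypothetical nudges chosen only for illustration, and the helper names are not from the slides.

```python
body = [120, 187, 109, 103, 131, 165, 158, 116]
pack = [26, 30, 26, 24, 29, 35, 31, 28]

def sse(x, y, a, b):
    """Sum of squared residuals for the line y-hat = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

def least_squares(x, y):
    """Return (intercept, slope) of the least-squares line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b

a, b = least_squares(body, pack)
best = sse(body, pack, a, b)

# Hypothetical competing lines: nudge the intercept or the slope slightly.
candidates = [(a + 1, b), (a - 1, b), (a, b + 0.01), (a, b - 0.01)]
for ca, cb in candidates:
    print(f"a = {ca:.2f}, b = {cb:.4f}: SSE = {sse(body, pack, ca, cb):.2f}")
print(f"least-squares line: SSE = {best:.2f}")
```

Every nudged line produces a larger SSE than the least-squares line, which is exactly what the definition requires.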
6
+ Least-Squares Regression: Least-Squares Regression Line. We can use technology to find the equation of the least-squares regression line. We can also write it in terms of the means and standard deviations of the two variables and their correlation. Definition: Equation of the least-squares regression line. We have data on an explanatory variable x and a response variable y for n individuals. From the data, calculate the means x̄ and ȳ, the standard deviations s_x and s_y of the two variables, and their correlation r. The least-squares regression line is the line ŷ = a + bx with slope b = r(s_y / s_x) and y-intercept a = ȳ - b·x̄. Interpretation: the LSR line always passes through the point (x̄, ȳ).
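The summary-statistics form of the line can be checked numerically. This is a minimal sketch on the body weight/backpack data, using the standard formulas b = r(s_y / s_x) and a = ȳ - b·x̄ with sample standard deviations (dividing by n - 1); variable names are illustrative.

```python
from statistics import mean, stdev

body = [120, 187, 109, 103, 131, 165, 158, 116]
pack = [26, 30, 26, 24, 29, 35, 31, 28]

n = len(body)
x_bar, y_bar = mean(body), mean(pack)
s_x, s_y = stdev(body), stdev(pack)  # sample standard deviations

# Correlation: sum of products of z-scores, divided by n - 1.
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(body, pack)) / (n - 1)

b = r * s_y / s_x      # slope
a = y_bar - b * x_bar  # y-intercept

print(f"slope = {b:.4f}, intercept = {a:.2f}")
# Predicted y at x-bar equals y-bar: the line passes through (x-bar, y-bar).
print(f"prediction at x-bar: {a + b * x_bar:.3f}, y-bar: {y_bar:.3f}")
```

The last line confirms the interpretation on the slide: plugging in the mean of x returns the mean of y.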
7
+ Least-Squares Regression: Residual Plots. One of the first principles of data analysis is to look for an overall pattern and for striking departures from the pattern. A regression line describes the overall pattern of a linear relationship between two variables. We see departures from this pattern by looking at the residuals. Definition: A residual plot is a scatterplot of the residuals against the explanatory variable. Residual plots help us assess how well a regression line fits the data.
8
+ Least-Squares Regression: Interpreting Residual Plots. A residual plot magnifies the deviations of the points from the line, making it easier to see unusual observations and patterns. 1) The residual plot should show no obvious patterns. 2) The residuals should be relatively small in size. If there is no discernible pattern in the residual plot, then we will conclude that the model is appropriate. How you remember this: SNOW = GOOD. How you articulate this: Since there is no discernible pattern on the residual plot, we conclude that the linear model is appropriate. A pattern in the residuals means the linear model is not appropriate.
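A quick way to see what a "pattern" looks like is to fit a line to data that are clearly curved. The sketch below uses made-up data (y = x² for x = 1 to 9, not from the slides) and prints the sign of each residual in order of x.

```python
# Hypothetical curved data: y = x^2, so a straight line is the wrong model.
x = list(range(1, 10))
y = [xi ** 2 for xi in x]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))
a = y_bar - b * x_bar

# Residuals listed in order of increasing x, as on a residual plot.
resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
signs = "".join("+" if e > 0 else "-" for e in resid)
print(signs)
```

The residuals are positive at both ends and negative in the middle, a U-shaped pattern, so the linear model is not appropriate for these data.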
9
+ The Coefficient of Determination. Man, these names are all so similar! r is the correlation coefficient. It tells us the direction and strength of the linear relationship. Interpreting r involves simply describing form, direction, and strength. r², on the other hand, is the coefficient of determination. It tells us what percent of the variation in the response variable can be explained by regression on the explanatory variable. Interpret r² = 0.721 if x is hours spent studying and y is GPA: 72.1% of the variation in GPA can be explained by regression on hours spent studying.
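The "percent of variation explained" reading comes from comparing the squared residuals around the regression line (SSE) with the squared deviations around the mean of y (SST). A minimal sketch on the body weight/backpack data, assuming the standard identity r² = 1 - SSE/SST; the names are illustrative.

```python
from math import sqrt

body = [120, 187, 109, 103, 131, 165, 158, 116]
pack = [26, 30, 26, 24, 29, 35, 31, 28]

n = len(body)
x_bar, y_bar = sum(body) / n, sum(pack) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(body, pack))
sxx = sum((x - x_bar) ** 2 for x in body)
sst = sum((y - y_bar) ** 2 for y in pack)  # total variation in y

b = sxy / sxx
a = y_bar - b * x_bar
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(body, pack))  # unexplained

r = sxy / sqrt(sxx * sst)
r_squared = 1 - sse / sst

print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```

For these data, roughly 63% of the variation in backpack weight is explained by regression on body weight.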
10
+ Interpreting r². It should make sense that the closer r² is to 100%, the better the LSR line is at predicting values of the response variable. Think of it this way: suppose we’re examining the relationship between IQ and GPA. If r² is 0.967, then that means that 96.7% of the variation in GPAs can be explained by differences in IQ. Would you spend much time studying? What if r² is 0.128?
11
+ Interpreting s. s stands for standard deviation. When we are studying regression, s is specifically the standard deviation of the RESIDUALS. So, it’s roughly the typical distance between our actual y-values and the predicted y-values.
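For a concrete value, s can be computed as the square root of the sum of squared residuals divided by n - 2, which is the usual regression convention (the slides do not spell out the formula). A minimal sketch on the body weight/backpack data:

```python
from math import sqrt

body = [120, 187, 109, 103, 131, 165, 158, 116]
pack = [26, 30, 26, 24, 29, 35, 31, 28]

n = len(body)
x_bar, y_bar = sum(body) / n, sum(pack) / n
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(body, pack))
     / sum((x - x_bar) ** 2 for x in body))
a = y_bar - b * x_bar

# Residuals from the least-squares line.
resid = [y - (a + b * x) for x, y in zip(body, pack)]

# Standard deviation of the residuals: divide by n - 2, not n - 1.
s = sqrt(sum(e ** 2 for e in resid) / (n - 2))
print(f"s = {s:.2f}")
```

Here predictions of backpack weight from body weight are typically off by about s, measured in the same units as backpack weight.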
12
+ Reading Generic Computer Output: number of manatee deaths vs. registered boats. The cell that represents the constant coefficient is the y-intercept. The cell that represents the boats coefficient is the slope. So, my least-squares regression equation is
13
More on Computer Output: Natural Gas Usage vs. Degree Days (a measure of temperature)

Predictor   Coef     Stdev      t-ratio   p
Constant    1.0892   0.1389     7.84      0.000
D-days      0.1889   0.004934   38.31     0.000

s = 0.339   R-sq = 99.1%   R-sq(adj) = 99.0%

The Constant coefficient (1.0892) is the y-intercept. The D-days coefficient (0.1889) is the slope. R-sq is r²; we can take the square root to find r, giving it the same sign as the slope. s = 0.339 is the standard deviation of the residuals.
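Once the cells are identified, the output can be turned into a prediction equation. The sketch below copies the coefficients and R-sq from the output above; the input of 20 degree-days and the function name `predict_gas` are hypothetical, chosen only to illustrate the arithmetic.

```python
from math import copysign, sqrt

# Values read from the computer output.
intercept = 1.0892   # "Constant" coefficient
slope = 0.1889       # "D-days" coefficient
r_squared = 0.991    # R-sq = 99.1%

def predict_gas(degree_days):
    """Predicted natural gas usage from y-hat = intercept + slope * x."""
    return intercept + slope * degree_days

# r is the square root of r^2, carrying the sign of the slope.
r = copysign(sqrt(r_squared), slope)

print(f"predicted usage at 20 degree-days: {predict_gas(20):.4f}")
print(f"r = {r:.4f}")
```

Because the slope is positive, r is the positive square root of 0.991.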
14
+ Least-Squares Regression: Correlation and Regression Wisdom. Correlation and regression are powerful tools for describing the relationship between two variables. When you use these tools, be aware of their limitations. 1. The distinction between explanatory and response variables is important in regression.
15
+ Least-Squares Regression Correlation and Regression Wisdom 2. Correlation and regression lines describe only linear relationships. 3. Correlation and least-squares regression lines are not resistant. Definition: An outlier is an observation that lies outside the overall pattern of the other observations. Points that are outliers in the y direction but not the x direction of a scatterplot have large residuals. Other outliers may not have large residuals. An observation is influential for a statistical calculation if removing it would markedly change the result of the calculation. Points that are outliers in the x direction of a scatterplot are often influential for the least-squares regression line.
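The effect of an outlier in the x direction can be sketched with made-up numbers. In the hypothetical data below (not from the slides), five points lie exactly on the line y = 2x, and a sixth point at (15, 5) sits far to the right; the helper `fit` is illustrative.

```python
def fit(x, y):
    """Return (intercept, slope) of the least-squares line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b * x_bar, b

# Hypothetical data: five points on y = 2x plus one outlier in x.
x = [1, 2, 3, 4, 5, 15]
y = [2, 4, 6, 8, 10, 5]

a_all, b_all = fit(x, y)
a_wo, b_wo = fit(x[:-1], y[:-1])  # refit without the outlier

# Residuals from the line fitted to all six points.
resid = [yi - (a_all + b_all * xi) for xi, yi in zip(x, y)]

print(f"slope with outlier:    {b_all:.3f}")
print(f"slope without outlier: {b_wo:.3f}")
print(f"outlier residual: {resid[-1]:+.2f}")
print(f"largest other residual: {max(abs(e) for e in resid[:-1]):.2f}")
```

Removing the single point changes the slope markedly, so it is influential, yet its own residual is smaller than those of some ordinary points because it pulls the line toward itself.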
16
+ Least-Squares Regression Correlation and Regression Wisdom 4. Association does not imply causation. An association between an explanatory variable x and a response variable y, even if it is very strong, is not by itself good evidence that changes in x actually cause changes in y. Association Does Not Imply Causation A serious study once found that people with two cars live longer than people who only own one car. Owning three cars is even better, and so on. There is a substantial positive correlation between number of cars x and length of life y. Why?