Published by Ophelia Welch. Modified over 9 years ago.
1
CHAPTER 3: EXAMINING RELATIONSHIPS
2
SECTION 3.3: LEAST-SQUARES REGRESSION
Correlation measures the strength and direction of the linear relationship between two variables.
Least-squares regression
  A method for finding a line that summarizes the relationship between two variables in a specific setting.
Regression line
  Describes how a response variable y changes as an explanatory variable x changes.
  Used to predict the value of y for a given value of x.
  Unlike correlation, requires an explanatory variable and a response variable.
3
LEAST-SQUARES REGRESSION LINE (LSRL)
If the data show a linear trend, it is appropriate to fit an LSRL to the data.
We use the line to predict y from x, so the LSRL should be as close as possible to all the points in the vertical direction.
That is because any prediction errors we make are errors in y, the vertical direction of the scatterplot.
Error = actual y – predicted y
5
LEAST-SQUARES REGRESSION LINE (LSRL)
The least-squares regression line of y on x is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.
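The least-squares criterion can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the data values and the helper names `fit_lsrl` and `sum_squared_errors` are invented for illustration. It computes the line from the usual closed-form formulas and checks that nudging the intercept or the slope only increases the sum of squared vertical distances.

```python
def fit_lsrl(xs, ys):
    """Return (intercept a, slope b) of the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def sum_squared_errors(xs, ys, a, b):
    """Sum of squared vertical distances from the points to the line a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_lsrl(xs, ys)
best = sum_squared_errors(xs, ys, a, b)

# Every nudged line should do worse than the least-squares line.
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]
worse = [sum_squared_errors(xs, ys, a + da, b + db) > best for da, db in nudges]
print(all(worse))
```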
6
LEAST-SQUARES REGRESSION LINE (LSRL)
7
EXAMPLE 1 – FINDING THE LSRL
r = -0.946
8
FINDING THE LSRL AND OVERLAYING IT ON YOUR SCATTERPLOT
Press the STAT key.
Scroll over to CALC.
Use option 8.
After the command is on your home screen, enter the following: L1, L2, Y1
  To get Y1, press VARS, Y-VARS, Function.
Press ENTER.
The equation is now stored in Y1.
Press ZOOM 9 to see the scatterplot with the LSRL.
9
USE THE LSRL TO PREDICT
With an equation stored on the calculator, it is easy to calculate a value of y for any known x.
Using the trace button
  2nd TRACE, Value, x = 18
Using the table
  2nd GRAPH
  Go to 2nd WINDOW if you need to change TblStart.
Example 2 – Use the LSRL to predict the overall grade for a student who has had 18 absences. Also, interpret the slope and intercept of the regression line.
A student who has had 18 absences is predicted to have an overall grade of about 14%.
The slope is -4.81, which in this scenario means that for each additional day a student misses, the predicted overall grade decreases by about 4.81 percentage points.
The intercept is 101.04, which means that a student who has not missed any days is predicted to have a grade of about 101%.
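The prediction in Example 2 can be reproduced by hand or in code. The short Python sketch below uses only the slope (-4.81) and intercept (101.04) quoted in the slide; the function name `predict_grade` is a hypothetical helper introduced here for illustration.

```python
# Coefficients quoted in the slide: y-hat = 101.04 - 4.81 * x
INTERCEPT = 101.04
SLOPE = -4.81


def predict_grade(absences):
    """Predicted overall grade (percent) for a given number of absences."""
    return INTERCEPT + SLOPE * absences


predicted = predict_grade(18)
print(round(predicted, 2))  # 14.46, i.e. about 14%

# The slope is the change in the prediction per additional absence.
change_per_absence = predict_grade(1) - predict_grade(0)
print(round(change_per_absence, 2))  # -4.81
```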
10
READING MINITAB OUTPUT
11
THE ROLE OF r² IN REGRESSION
Coefficient of determination
  The proportion of the total sample variability in y that is explained by the least-squares regression of y on x.
  It is the square of the correlation coefficient (r), and is therefore referred to as r².
In the student absence vs. overall grade example, the correlation was r = -0.946.
  The coefficient of determination is r² = 0.8949.
  This means that about 89% of the variation in y (overall grade) is explained by the LSRL.
  It does not mean that 89% of the data values fall on the line; r² describes variation in y, not individual points.
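As a quick check of the arithmetic, and of what "variation explained" means, here is a small Python sketch. The value r = -0.946 comes from the slide; the data set and the helper name `r_squared_two_ways` are invented for illustration. The helper computes r² once as the square of the correlation and once as 1 - SSE/SST, where SSE is the sum of squared residuals and SST is the total variation of y about its mean.

```python
import math

# From the slide: correlation for absences vs. overall grade.
r_example = -0.946
r_squared_example = r_example ** 2
print(round(r_squared_example, 4))  # 0.8949


def r_squared_two_ways(xs, ys):
    """Return (r squared, 1 - SSE/SST) for the least-squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)  # total variation in y (SST)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return r ** 2, 1 - sse / syy


# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [3, 5, 4, 8, 7, 10]
from_r, from_variation = r_squared_two_ways(xs, ys)
print(abs(from_r - from_variation) < 1e-9)
```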
12
FACTS ABOUT LEAST-SQUARES REGRESSION
1. The distinction between explanatory and response variables is essential.
   a. If we reverse the roles of the two variables, we get a different LSRL.
2. There is a close connection between correlation and the slope of the regression line.
   a. Along the regression line, a change of one standard deviation in x corresponds to a change of r standard deviations in y.
3. The LSRL always passes through the point (x̄, ȳ).
   a. We can describe regression entirely in terms of basic descriptive measures.
4. The coefficient of determination is the fraction of the variation in the values of y that is explained by the least-squares regression of y on x.
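Facts 1 to 3 can be illustrated with a few lines of Python. This is a sketch under stated assumptions: the data are hypothetical, and the helper name `summarize` is introduced here only for illustration. It builds the line from the means, standard deviations, and correlation alone (slope b = r·s_y/s_x, intercept a = ȳ - b·x̄), then shows that regressing x on y gives a different line.

```python
import math


def summarize(xs, ys):
    """Return (x_bar, y_bar, s_x, s_y, r) for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_x = math.sqrt(sxx / (n - 1))
    s_y = math.sqrt(syy / (n - 1))
    r = sxy / math.sqrt(sxx * syy)
    return x_bar, y_bar, s_x, s_y, r


# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [3, 5, 4, 8, 7, 10]
x_bar, y_bar, s_x, s_y, r = summarize(xs, ys)

# Fact 2: the slope is r standard deviations of y per standard deviation of x.
b = r * s_y / s_x
# Fact 3: choosing the intercept this way puts (x_bar, y_bar) on the line.
a = y_bar - b * x_bar

# Fact 1: regressing x on y gives x-hat = a_rev + b_rev * y.
b_rev = r * s_x / s_y
a_rev = x_bar - b_rev * y_bar

# Drawn on the same axes, the reversed line has slope 1 / b_rev,
# which equals b only when r is exactly 1 or -1.
print(round(b, 4), round(1 / b_rev, 4))
```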
13
RESIDUALS
Residuals
  Deviations from the overall pattern.
  Measured as vertical distances.
  The difference between an observed value of the response variable and the value predicted by the regression line.
  Residual = observed y – predicted y
The mean of the least-squares residuals is always zero.
  If you compute the mean from rounded residuals, you may get a value that is very close to, but not exactly, zero.
  This small discrepancy is known as roundoff error.
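The zero-mean property and the effect of rounding can be seen in a short Python sketch. The data and the helper name `fit_lsrl` are hypothetical, chosen only to illustrate the point; they are not from the slides.

```python
def fit_lsrl(xs, ys):
    """Return (intercept a, slope b) of the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_lsrl(xs, ys)

# Residual = observed y - predicted y
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
total = sum(residuals)  # zero up to floating-point error

# Rounding each residual to one decimal place before adding
# introduces roundoff error: the sum is no longer (nearly) zero.
rounded = [round(e, 1) for e in residuals]
rounded_total = sum(rounded)

print(abs(total) < 1e-9)
print(round(rounded_total, 1))
```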
14
RESIDUAL PLOT
A residual plot is a scatterplot of the regression residuals against the explanatory variable.
Residual plots help us assess the fit of a regression line.
[Residual plot showing that a linear model is a good fit to the original data]
Reason
  There is a uniform scatter of points.
15
RESIDUAL PLOT
[Two residual plots showing that a linear model is not a good fit to the original data]
Reasons
  Curved pattern
  Residuals get larger with larger values of x
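A curved pattern in the residuals can be seen even without drawing the plot. In the Python sketch below, the data are hypothetical (y = x² for x = 1 to 7) and `fit_lsrl` is an illustrative helper, not something from the slides. Fitting a straight line to clearly curved data leaves residuals that are positive at both ends and negative in the middle, which is the kind of pattern that signals a poor linear fit.

```python
def fit_lsrl(xs, ys):
    """Return (intercept a, slope b) of the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Hypothetical curved data: y = x squared.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [x ** 2 for x in xs]

a, b = fit_lsrl(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Residuals listed in order of x: positive, then negative, then positive.
print(residuals)  # [5.0, 0.0, -3.0, -4.0, -3.0, 0.0, 5.0]
```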
16
INFLUENTIAL OBSERVATIONS
Outlier
  An observation that lies outside the overall pattern of the other observations in the y direction.
Influential point
  An observation is influential if removing it would markedly change the result of the LSRL.
  Influential points are often outliers in the x direction of a scatterplot.
  They often have small residuals, because they pull the regression line toward themselves.
  If you look only at residuals, you can miss influential points.
  They can greatly change the interpretation of the data.
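The effect of an influential point can be sketched in Python. The data here are hypothetical (five clustered points plus one point far out in the x direction) and `fit_lsrl` is an illustrative helper; none of it comes from the slides. The sketch fits the line with and without the extreme point, then compares that point's residual with the others.

```python
def fit_lsrl(xs, ys):
    """Return (intercept a, slope b) of the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Hypothetical data: the last point (15, 12) is an outlier in the x direction.
xs = [1, 2, 3, 4, 5, 15]
ys = [3, 1, 4, 2, 3, 12]

a_all, b_all = fit_lsrl(xs, ys)              # fit using every point
a_trim, b_trim = fit_lsrl(xs[:-1], ys[:-1])  # fit without the extreme point

# Removing one point changes the slope markedly, so the point is influential.
print(round(b_all, 2), round(b_trim, 2))

# Yet its residual is the smallest in size, because it pulls the line toward itself.
residuals = [y - (a_all + b_all * x) for x, y in zip(xs, ys)]
print([round(e, 2) for e in residuals])
```

Looking only at the residual list would not flag the last point, which is why the slide warns against relying on residuals alone.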
17
LOCATION OF INFLUENTIAL OBSERVATIONS
Child 19: outlier
Child 18: influential point
18
SEE ALL OF THE RESIDUALS AT ONCE
The calculator computes the residuals for all points every time it runs a linear regression command.
To see them, press 2nd STAT and, under NAMES, scroll down to RESID.
The residuals are listed in the same order as the data.