Verbal SAT vs Math SAT V: mean=596.3 st.dev=99.5 M: mean=612.2 st.dev=96.1 r = Write the equation of the LSRL Interpret the slope of this line Interpret the intercept of this line. Slope = 0.685(96.1/99.5) = Y-int = – (0.662*596.3) = Math SAT = (Verbal SAT) For every point on the Verbal SAT, your Math SAT increases by approx pts If you get a zero Verbal score, you are predicted to get a on the Math
Linear Regression Continued AP Statistics Chapter 8 Day 2
Learning Target What tools do we have to make sure our model is a good fit to our data?
Coefficient of Determination Measures the success of the regression model in terms of the fraction of the variation of y accounted for by the regression.
Interpreting r-squared: _____% of the variation in _y_ that is explained by __x__.
Residuals Residuals are the difference between an observed value of the response variable and the value predicted by the regression line. Residual = observed – predicted e = y – ŷ The mean of the residuals is always zero.
Residual Plot Plots the residuals on the vertical axis against the explanatory variable on the horizontal axis. Extreme points have a strong influence on the position of the regression line.
Extreme Points Outliers lie outside the overall pattern (y- direction) Influential- if removing it would markedly change the position of the regression line, they have small residuals (x-direction)
Examining Residuals Look for the following: Patterns and increasing or decreasing spread about the line as x increases show it is not linear BAD The residual plot shows points randomly distributed above/below the line GOOD Individual points with large residuals (outliers) BAD Individual points that are extreme in the x direction (influentials) BAD
What do these Residual Plots tell us? All of these show residual plots for bad models. What is the biggest problem with each of the graphs? Patterns indicate that we shouldn’t use a linear model
Example The data below consists of the chest sizes (in inches) and weights (in pounds) of a sample of male bears. Graph a scatterplot, find the LSRL, find r-squared, and do a residual plot. Chest Weight
Cost-to-charge ratio(the percentage of the amount billed that represents the actual cost) for inpatient and outpatient services at 11 Oregon hospitals is shown below.
OutpatientInpatient
1. Is the observation for Hospital 11 influential or an outlier? Justify. 2. Is the observation for Hospital 5 influential or an outlier? Justify.
Review The LSRL is,y = 11.6 and we want to find x. Find the residual for (4.5, 32) from Iffind r.
Homework Page P. 192 # 3, 9, 12, 14, 17, 21