Published by Dwight Gaines. Modified over 9 years ago.
1
Chapter 8: Linear Regression
* The Linear Model
* Residuals
* Best Fit Line
* Correlation and the Line
* Predicted Values
* Regression
2
Burger King: Fat vs. Protein
* x variable = protein
* y variable = fat
* Goal: predict how much fat is in a menu item based on its protein.
* Example question: how much fat is in a sandwich that has 25 grams of protein?
3
Linear Model
To predict values, we need the equation of the “best fit” line for our scatterplot.
Note: this is a MODEL. It will not tell you exactly how much fat is in a sandwich based on its protein content; it gives you a predicted value.
Linear model: the equation of a straight line through the data. The line
* summarizes the general pattern
* helps us understand how the variables are associated
* will not hit all the points, and might not hit any of them
4
Notation
“Putting a hat on it” is standard statistics notation for indicating that something has been predicted by a model. Whenever you see a hat over a variable name or symbol (for example, $\hat{y}$), you can assume it is the predicted version of that variable or symbol.
5
Residuals
A residual is the difference between an observed value and its associated predicted value. It tells us how far off the model’s prediction is at that point.
residual = observed value – predicted value
residual = $y - \hat{y}$, where $\hat{y}$ (y hat) is the predicted value
* A negative residual means the predicted value is too big (an overestimate).
* A positive residual means the predicted value is too small (an underestimate).
6
Residual: Example
In the figure, the estimated fat of the BK Broiler chicken sandwich is 36 grams, while the true value of fat is 25 grams. The residual is 25 – 36 = -11 grams.
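The arithmetic on this slide can be checked with a short Python sketch. The function name `residual` is only an illustrative choice; the two numbers are the ones quoted on the slide.

```python
def residual(observed, predicted):
    """Residual = observed value minus predicted value."""
    return observed - predicted


# Values from the slide: true fat is 25 g, the model's estimate is 36 g.
bk_broiler_residual = residual(25, 36)
print(bk_broiler_residual)  # -11: negative, so the model overestimated
```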
7
Best Fit Line
When we draw a line through a scatterplot, some residuals are positive and some are negative. We can’t simply add them up, because the positives and negatives would cancel each other out.
What can we do to our residuals so that adding them up gives a number that actually tells us something?
* Squaring all the residuals makes them all positive.
* Squaring also emphasizes the larger ones.
When we add up all these squared residuals, the sum tells us how well the line we drew fits the data:
* the smaller the sum, the better the fit
* the larger the sum, the worse the fit
8
Line of Best Fit
The line of best fit is the line for which the sum of the squared residuals is the smallest.
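To make the “smallest sum of squared residuals” idea concrete, here is a minimal Python sketch that scores two candidate lines on the same data. The data points and both candidate lines are made up for illustration; they are not the Burger King values.

```python
def sum_squared_residuals(xs, ys, intercept, slope):
    """Add up (observed - predicted)^2 over all points for one candidate line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, invented for this sketch.
protein = [10, 20, 30, 40]
fat = [15, 28, 34, 47]

# Two candidate lines: y-hat = 5 + 1.0x and y-hat = 0 + 1.5x.
candidate_a = sum_squared_residuals(protein, fat, 5.0, 1.0)
candidate_b = sum_squared_residuals(protein, fat, 0.0, 1.5)

print(candidate_a, candidate_b)  # 14.0 294.0
print("better fit:", "A" if candidate_a < candidate_b else "B")
```

Line A has the smaller sum, so of these two candidates it fits better. The line of best fit is the one that beats every other possible line on this score.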
9
Correlation and the Line The figure shows the scatterplot of z-scores for fat and protein. If a burger has average protein content, it should have about average fat content too. Moving one standard deviation away from the mean in x moves us r standard deviations away from the mean in y.
10
Looking at standardized data
Scatterplot: $z_y$ (standardized fat) vs. $z_x$ (standardized protein)
The line must go through the point of means, ($\bar{x}$, $\bar{y}$). When plotting the z-scores, that point is the origin (0, 0), because the mean of a set of z-scores is 0.
Put generally, moving any number of standard deviations away from the mean in x moves us r times that number of standard deviations away from the mean in y.
11
How big can a predicted value get?
r cannot be bigger than 1 in absolute value, so each predicted y tends to be closer to its mean (in standard deviations) than its corresponding x was.
This property is called regression to the mean, and the line is called the regression line.
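The rule “r standard deviations in y per standard deviation in x” can be sketched in Python as follows. The correlation value 0.8 is an assumption chosen for illustration; the slides do not state r for the Burger King data.

```python
def predicted_z_y(r, z_x):
    """In standardized units, the regression line predicts z_y = r * z_x."""
    return r * z_x


r = 0.8  # hypothetical correlation, not taken from the slides

for z_x in (-2.0, 0.0, 1.0, 2.0):
    z_y_hat = predicted_z_y(r, z_x)
    # Because |r| <= 1, the prediction is never further from the mean than z_x.
    print(z_x, z_y_hat, abs(z_y_hat) <= abs(z_x))
```

An item 2 standard deviations above the mean in x is predicted to be only 1.6 standard deviations above the mean in y, which is the regression to the mean described on the slide.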
12
The Regression Line in Real Units
Remember from Algebra that a straight line can be written as: $y = mx + b$
In Statistics we use a slightly different notation: $\hat{y} = b_0 + b_1 x$
We write $\hat{y}$ to emphasize that the points that satisfy this equation are just our predicted values, not the actual data values.
This model says that our predictions follow a straight line. If the model is a good one, the data values will scatter closely around it.
13
The Regression Line in Real Units (cont.)
We write $b_1$ and $b_0$ for the slope and intercept of the line.
* $b_1$ is the slope, which tells us how rapidly $\hat{y}$ changes with respect to x.
* $b_0$ is the y-intercept, which tells where the line crosses (intercepts) the y-axis.
14
The Regression Line in Real Units (cont.)
In our model, we have a slope ($b_1$). The slope is built from the correlation and the standard deviations: $b_1 = r \frac{s_y}{s_x}$
Our slope is always in units of y per unit of x.
15
The Regression Line in Real Units (cont.)
In our model, we also have an intercept ($b_0$). The intercept is built from the means and the slope: $b_0 = \bar{y} - b_1 \bar{x}$
Our intercept is always in units of y.
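As a rough sketch of how the slope and intercept are assembled from summary statistics, the Python below computes r, the standard deviations, and the means, then applies slope = r × (SD of y) / (SD of x) and intercept = (mean of y) − slope × (mean of x). The data are hypothetical and the function name `regression_coefficients` is an illustrative choice.

```python
from statistics import mean, stdev


def regression_coefficients(xs, ys):
    """Return (b0, b1) built from r, the standard deviations, and the means."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    n = len(xs)
    # Correlation: sum of products of z-scores, divided by n - 1.
    r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
            for x, y in zip(xs, ys)) / (n - 1)
    b1 = r * s_y / s_x         # slope, in units of y per unit of x
    b0 = y_bar - b1 * x_bar    # intercept, in units of y
    return b0, b1


# Hypothetical data, invented for this sketch.
protein = [10, 20, 30, 40]
fat = [15, 28, 34, 47]

b0, b1 = regression_coefficients(protein, fat)
print(round(b0, 2), round(b1, 2))  # 5.5 1.02
```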
16
Fat Versus Protein: An Example
The regression line for the Burger King data fits the data well.
The equation is $\widehat{\text{fat}} = 6.8 + 0.97\,\text{protein}$
The predicted fat content for a BK Broiler chicken sandwich (with 30 g of protein) is 6.8 + 0.97(30) = 35.9 grams of fat.
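The prediction above is a one-line calculation. Here is a small Python sketch using the intercept 6.8 and slope 0.97 from the slide; the function name `predicted_fat` is illustrative.

```python
def predicted_fat(protein_g):
    """Predicted fat (g) from protein (g), using the slide's 6.8 and 0.97."""
    return 6.8 + 0.97 * protein_g


# BK Broiler chicken sandwich, 30 g of protein.
print(round(predicted_fat(30), 1))  # 35.9

# The question from the start of the chapter: 25 g of protein.
print(round(predicted_fat(25), 2))  # 31.05
```

These are predicted values from the model, not the actual fat content of any particular sandwich.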
17
The Regression Line in Real Units (cont.)
Since regression and correlation are closely related, we need to check the same conditions for regressions as we did for correlations:
* Quantitative Variables Condition
* Straight Enough Condition
* Outlier Condition
18
Checking In
Let’s look at the relationship between house prices (in thousands of $) and house size (in thousands of ft²). The regression model is:
price = 9.564 + 122.74 size
* What does the slope of 122.74 mean? What are the units?
* How much can a homeowner expect the value of his house to increase if he builds on an additional 2000 square feet?
* How much would you expect to pay for a house of 3000 ft²?
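One way to check your answers to the last two questions is a short Python sketch like the one below. It assumes only the model stated on the slide, with price in thousands of dollars and size in thousands of square feet; the function and variable names are illustrative.

```python
def predicted_price(size_thousand_sqft):
    """Predicted price (thousands of $) for a size given in thousands of ft^2."""
    return 9.564 + 122.74 * size_thousand_sqft


slope = 122.74  # thousands of $ per thousand ft^2

# 2000 extra square feet is 2 units of size, so the predicted change is 2 slopes.
increase_for_2000_sqft = slope * 2
print(f"{increase_for_2000_sqft:.2f}")  # 245.48 (thousands of $)

# A 3000 ft^2 house has size = 3.
price_of_3000_sqft = predicted_price(3)
print(f"{price_of_3000_sqft:.3f}")  # 377.784 (thousands of $)
```

The unit conversion is the main design point here: sizes must be entered in thousands of square feet, and the results come out in thousands of dollars.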
19
Example Fill in the missing information in the table.