1
Chapter 9: Regression Wisdom
AP Statistics
2
Issues and Problems with Regression
Subsets and curves
Dangers of extrapolation
Possible effects of outliers, high leverage, and influential points
Problems with regression of summary data
Mistakes in inferring causation
3
What else can residuals tell us?
Histograms (and other graphs) of residuals can reveal subsets of the data that improve our understanding of the original data. This may lead us to analyze the subsets separately.
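A minimal Python sketch of how residuals expose subsets, using invented data: two hypothetical groups, "A" and "B", share a slope of 2 but sit 10 units apart. One line is fit to everything, and the residuals split into two clumps, one per group. The helper name `fit_line` is our own, not from any library.

```python
# Hypothetical data: two subsets (groups "A" and "B") with the same slope
# but different heights. All values are invented for illustration.
xs = list(range(1, 11)) * 2
groups = ["A"] * 10 + ["B"] * 10
ys = [2 * x + (10 if g == "A" else 0) for x, g in zip(xs, groups)]


def fit_line(xs, ys):
    """Least-squares (slope, intercept) for y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return b1, y_bar - b1 * x_bar


b1, b0 = fit_line(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# A histogram of these residuals would show two separate clumps;
# tagging each residual with its group shows where the clumps come from.
for g in ("A", "B"):
    rs = [r for r, gg in zip(residuals, groups) if gg == g]
    print(g, round(sum(rs) / len(rs), 2))
```

Every residual in group A is about +5 and every residual in group B is about -5, which suggests fitting each subset separately.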
4
What else can residuals tell us?
[Figures: histogram of residuals; scatterplot of residuals]
5
Hard to See Curves
Sometimes the scatterplot looks “straight enough,” but a nonlinear relationship comes to light only after you look at the residual plot.
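A small Python illustration, assuming hypothetical noise-free data on the curve y = x² for x = 1 to 10. The correlation is high enough that the scatterplot looks straight, but the residuals from the fitted line run positive, then negative, then positive again: the signature of a curve.

```python
import math

# Hypothetical data following a curve, y = x**2, with no noise.
xs = list(range(1, 11))
ys = [x ** 2 for x in xs]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b1 = sxy / sxx               # least-squares slope
b0 = y_bar - b1 * x_bar      # least-squares intercept
r = sxy / math.sqrt(sxx * syy)

residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(f"r = {r:.3f}")                      # high: looks "straight enough"
print([round(e, 1) for e in residuals])    # U-shaped pattern: a curve
```

Here r is about 0.97, yet the residuals go 12, 4, -2, -6, -8, -8, -6, -2, 4, 12.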
6
Extrapolation
The farther our x value is from the mean of x, the less we trust our predicted value. Once we venture into new x territory, our predicted value is an extrapolation. Extrapolations are not reliable because they assume that the relationship between x and y has not changed, even for these extreme values of x. Don't extrapolate into the future!
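A Python sketch of why extrapolation fails, under a hypothetical assumption: the true relationship is curved (y = x²), but we only observed x between 1 and 10, where a straight line fits well. The absolute prediction error grows as the new x value moves farther from the data.

```python
# Hypothetical truth: y = x**2. We only observed x = 1..10.
xs = list(range(1, 11))
ys = [x ** 2 for x in xs]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar

# Absolute prediction error at the mean of x, at the edge of the data,
# and at increasingly distant (extrapolated) x values.
errors = {}
for x_new in (5.5, 10, 15, 20, 30):
    predicted = b0 + b1 * x_new
    errors[x_new] = abs(x_new ** 2 - predicted)
    print(f"x = {x_new:>4}: predicted {predicted:6.1f}, error {errors[x_new]:6.1f}")
```

At x = 30 the line predicts 308 while the assumed true value is 900.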
7
Extrapolation
8
Outliers, Leverage and Influence
Unusual point vocabulary:
High leverage points: points whose x value is far from the mean of the x values.
Influential points: points that change the model (change the slope of the line) when they are included or removed.
High leverage points can also be influential, but they do not need to be.
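Leverage can be measured. In simple linear regression, the standard leverage of point i is h_i = 1/n + (x_i - x̄)² / Σ(x_j - x̄)², so it depends only on how far x_i is from the mean of x. A minimal Python sketch with hypothetical x values, the last of which sits far from the rest:

```python
# Leverage in simple linear regression:
#   h_i = 1/n + (x_i - x_bar)**2 / sum_j (x_j - x_bar)**2
# The x values are hypothetical; x = 20 is far from the others.
xs = [1, 2, 3, 4, 5, 20]

n = len(xs)
x_bar = sum(xs) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
leverage = [1 / n + (x - x_bar) ** 2 / sxx for x in xs]

for x, h in zip(xs, leverage):
    print(f"x = {x:>2}  leverage = {h:.3f}")
```

The point at x = 20 has leverage of about 0.97, far above the others. In simple regression the leverages always sum to 2, one per estimated coefficient.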
9
Outliers, Leverage and Influence
Three types of unusual points:
1. High leverage points with small residuals. These points confirm the pattern but are extreme values. The slope and intercept are mostly unaffected, but the R-squared value will increase; don't be misled into thinking the model is now stronger.
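A Python sketch of the first type, using invented, fairly noisy data. We add a far-out x value whose y lies exactly on the fitted line (high leverage, zero residual). Because the new point has no residual, the least-squares slope and intercept do not move at all, but R-squared jumps. The helper `fit` is our own name.

```python
def fit(xs, ys):
    """Return (slope, intercept, R-squared) for a least-squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return b1, y_bar - b1 * x_bar, sxy ** 2 / (sxx * syy)


# Hypothetical, fairly noisy data.
xs = [1, 2, 3, 4, 5]
ys = [4, 2, 7.5, 6.5, 10]
b1, b0, r2 = fit(xs, ys)

# High leverage point placed exactly on the fitted line (zero residual).
x_far = 30
b1_new, b0_new, r2_new = fit(xs + [x_far], ys + [b0 + b1 * x_far])

print(f"without: slope={b1:.3f} intercept={b0:.3f} R^2={r2:.3f}")
print(f"with:    slope={b1_new:.3f} intercept={b0_new:.3f} R^2={r2_new:.3f}")
```

R-squared rises from about 0.71 to about 0.99 even though the fitted line and its fit to the original five points are unchanged.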
10
Outliers, Leverage, and Influence
2. Outliers: not high leverage, not influential, with a large residual. Their x value is near the mean of the x values, so they have little effect on the slope, but they aren't consistent with the pattern. They will change the intercept. Don't throw them away.
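A Python sketch of the second type, with hypothetical data lying exactly on y = 2x. We add an outlier at x = 3, which is exactly the mean of x. A point at the mean of x contributes nothing to the slope calculation, so the slope stays at 2, while the intercept shifts upward. (A point merely near the mean would change the slope only slightly.)

```python
def fit(xs, ys):
    """Least-squares (slope, intercept)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return b1, y_bar - b1 * x_bar


# Hypothetical data lying exactly on y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
slope, intercept = fit(xs, ys)

# Add an outlier at x = 3 (the mean of x) with a large residual.
slope_out, intercept_out = fit(xs + [3], ys + [16])

print(f"without outlier: slope={slope:.3f}, intercept={intercept:.3f}")
print(f"with outlier:    slope={slope_out:.3f}, intercept={intercept_out:.3f}")
```

The intercept moves from 0 to about 1.667 while the slope stays at 2.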
11
Outliers, Leverage, and Influence
3. Influential points: also high leverage, and probably a small residual (they pull the line toward themselves). These are the most troublesome. They aren't consistent with the pattern of the other points, and if the point is removed, the slope of the line changes dramatically; it changes the model. Don't throw one out without thinking.
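A Python sketch of the third type, with hypothetical data on y = 2x plus one high leverage point at (20, 5) that does not follow the pattern. Including it nearly flattens the line, and because the line is pulled toward it, its own residual ends up smaller than those of several ordinary points.

```python
def fit(xs, ys):
    """Least-squares (slope, intercept)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return b1, y_bar - b1 * x_bar


# Hypothetical data on y = 2x, plus one influential point at (20, 5).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
xs_inf, ys_inf = xs + [20], ys + [5]

slope, intercept = fit(xs, ys)
slope_inf, intercept_inf = fit(xs_inf, ys_inf)
residuals_inf = [y - (intercept_inf + slope_inf * x)
                 for x, y in zip(xs_inf, ys_inf)]

print(f"slope without the point: {slope:.3f}")
print(f"slope with the point:    {slope_inf:.3f}")
print("residuals with the point:", [round(e, 2) for e in residuals_inf])
```

The slope drops from 2 to about 0.02, while the influential point's residual is only about -1.16.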
12
Lurking Variables and Causation
With observational data, as opposed to designed experiments, there is no way to be sure that a lurking variable is not the cause of an apparent association. A lurking variable is some third variable (neither the explanatory nor the response variable) that is driving both of the variables you have observed.
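A Python simulation of a lurking variable, under invented assumptions: z drives both x and y, and x has no effect on y. The x and y values are strongly correlated anyway. In this simulation we happen to know z, so we can remove its linear effect from both variables; the leftover correlation is near zero. With real observational data the lurking variable is often unmeasured, which is exactly why causation can't be inferred. The helpers `corr` and `residuals_on` are our own names.

```python
import math
import random


def corr(a, b):
    """Pearson correlation coefficient."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    sab = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    saa = sum((x - a_bar) ** 2 for x in a)
    sbb = sum((y - b_bar) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def residuals_on(v, z):
    """Residuals from a least-squares line predicting v from z."""
    n = len(v)
    v_bar, z_bar = sum(v) / n, sum(z) / n
    b1 = (sum((zi - z_bar) * (vi - v_bar) for zi, vi in zip(z, v))
          / sum((zi - z_bar) ** 2 for zi in z))
    b0 = v_bar - b1 * z_bar
    return [vi - (b0 + b1 * zi) for zi, vi in zip(z, v)]


rng = random.Random(9)
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]      # lurking variable
x = [zi + rng.gauss(0, 0.5) for zi in z]     # driven by z only
y = [zi + rng.gauss(0, 0.5) for zi in z]     # driven by z only, not by x

r_xy = corr(x, y)
r_partial = corr(residuals_on(x, z), residuals_on(y, z))
print(f"corr(x, y)             = {r_xy:.2f}")
print(f"corr after removing z  = {r_partial:.2f}")
```

With these settings the population correlation of x and y is 0.8, entirely due to z.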
13
Lurking Variables and Causation
[Figure: z is the lurking variable]
14
Lurking Variables and Causation
There have been many studies showing a strong positive association between hours spent in religious activities (going to church, attending religious classes, praying, etc.) and life expectancy. This is NOT CAUSATION. There is confounding: on average, people who attend religious activities also take better care of themselves than people who do not attend. They are less likely to smoke, more likely to exercise, and less likely to be overweight. The effects of these good habits (lurking variables) are confounded with the direct effects of attending religious activities.
15
Working With Summary Values
Be cautious when working with data values that are summaries, such as means and medians. Summary values vary less than the individual data values they summarize, so a regression on them tends to overstate the strength of the relationship (correlation).
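A Python simulation of this effect, with invented settings: 10 groups (x = 1 to 10), 50 individuals per group, and y = 2x plus individual noise with standard deviation 10. Averaging within each group cancels most of the noise, so the correlation computed from the 10 group means is much stronger than the correlation computed from all 500 data points.

```python
import math
import random


def corr(a, b):
    """Pearson correlation coefficient."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    sab = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    saa = sum((x - a_bar) ** 2 for x in a)
    sbb = sum((y - b_bar) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


rng = random.Random(42)
# Hypothetical individual-level data.
xs, ys = [], []
for x in range(1, 11):
    for _ in range(50):
        xs.append(x)
        ys.append(2 * x + rng.gauss(0, 10))

# Summary data: one mean y per group.
group_x = list(range(1, 11))
group_mean_y = [sum(y for xi, y in zip(xs, ys) if xi == g) / 50
                for g in group_x]

r_individual = corr(xs, ys)
r_means = corr(group_x, group_mean_y)
print(f"r using all data points: {r_individual:.2f}")
print(f"r using group means:     {r_means:.2f}")
```

Under these assumptions, r for the individuals is around 0.5, while r for the group means is typically above 0.9.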
16
Summary Data
17
All Data Points