Published by Annis Floyd. Modified over 9 years ago.
1
Regression
2
Correlation measures the strength of the linear relationship.
Great! But what is that relationship? How do we describe it?
– regression, regression line, regression equation
The regression line is used for prediction.
3
Predicting weights from heights
Independent variable: height
Dependent variable: weight
How can we predict one from the other?
Regression is to a scatter plot as the mean is to a histogram.
4
Weights vs. Heights
5
Salary by years employed
6
Regression by local averages
Approximation of local averages by the regression line
Inappropriate use of the regression line (use other methods)
7
The equation of a line: y = a + bx
a represents the y-intercept
– when x equals zero, y equals a
– Is this always meaningful in the context of a problem?
– Is it always useful in defining a line?
b represents the slope of the line (rise/run)
– for every unit change in x, y changes by b
– Does this mean that if we physically change x by one unit, y will change by b units? Say we gain another year of experience. Will our salary go up by 1107?
8
Regression equation
What is the predicted weight of somebody whose height is h cm?
w = intercept + slope × h
This is known as the regression equation.
How do we get this formula? We have a statistical model.
9
A residual
Regression line by minimising residual errors
e_i = error of the i-th observation from the regression line
The best candidate line will minimise these errors.
No line can make all errors vanish (some positive, some negative).
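As a minimal sketch of what minimising the residual errors can look like in practice, the Python below (standard library only, used here in place of S) computes the least-squares intercept and slope in closed form, assuming "minimise" means minimising the sum of squared errors. The height and weight numbers and the function names `fit_line` and `sum_squared_residuals` are made up for illustration and do not come from the slides.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared residuals (observed - predicted) for a candidate line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical heights (cm) and weights (kg), for illustration only.
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 69, 74, 87]

a, b = fit_line(heights, weights)

# Residuals of the fitted line: some are positive, some negative.
residuals = [w - (a + b * h) for h, w in zip(heights, weights)]

best = sum_squared_residuals(heights, weights, a, b)
# Any other candidate line, e.g. a slightly steeper one, does worse.
worse = sum_squared_residuals(heights, weights, a, b + 0.1)
```

With these made-up numbers the residuals have mixed signs and sum to zero, and nudging the slope away from the fitted value increases the sum of squares.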
10
Regression and correlation
Want to predict weight for people who are 1 SD above average height.
SD line says: predicted wt. = overall avg. wt. + SD of wt.
Regression line says: predicted wt. = overall avg. wt. + r × SD of wt.
For people who are k SDs away from avg. height: predicted wt. = overall avg. wt. + r × k × SD of wt.
Clearly valid for r = 0 or r = 1.
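The rule on this slide can be checked numerically. The sketch below, in plain Python rather than S and with made-up height and weight numbers, predicts the weight of someone k SDs above average height as average weight + r × k × SD of weight, then compares that with the prediction from the fitted line. All function names are hypothetical.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sd(values):
    """Population SD (divide by n), used consistently throughout."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))


def correlation(xs, ys):
    """Correlation coefficient r: covariance divided by the product of SDs."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (sd(xs) * sd(ys))


def predict_by_sd_rule(xs, ys, k):
    """Predicted y for a point k SDs above the average x."""
    return mean(ys) + correlation(xs, ys) * k * sd(ys)


# Hypothetical heights (cm) and weights (kg), for illustration only.
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 69, 74, 87]

r = correlation(heights, weights)
slope = r * sd(weights) / sd(heights)        # slope of the regression line
intercept = mean(weights) - slope * mean(heights)

k = 1
h = mean(heights) + k * sd(heights)          # a height 1 SD above average
by_rule = predict_by_sd_rule(heights, weights, k)
by_line = intercept + slope * h
```

The two predictions agree because the regression slope equals r × (SD of weight) / (SD of height), so moving k SDs along the height axis moves the prediction r × k SDs along the weight axis.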
11
RMS error of regression
RMS error = √(1 − r²) × SD of y
RMS error is inversely related to correlation.
RMS error is to regression what SD is to average.
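The RMS error of the regression line is commonly written as √(1 − r²) × SD of y. As a hedged check, the Python sketch below (made-up data, not from the slides) computes the RMS error directly from the residuals and again from that shortcut, so the two can be compared.

```python
from math import sqrt

# Hypothetical heights (cm) and weights (kg), for illustration only.
xs = [150, 160, 170, 180, 190]
ys = [52, 58, 69, 74, 87]
n = len(xs)

mean_x = sum(xs) / n
mean_y = sum(ys) / n
sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
r = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n * sd_x * sd_y)

# Least-squares line, written in terms of r and the two SDs.
slope = r * sd_y / sd_x
intercept = mean_y - slope * mean_x

# RMS error computed directly from the residuals ...
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
rms_direct = sqrt(sum(e ** 2 for e in residuals) / n)

# ... and from the shortcut formula.
rms_shortcut = sqrt(1 - r ** 2) * sd_y
```

The stronger the correlation, the smaller the factor √(1 − r²), and so the smaller the RMS error relative to the SD of y.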
12
Residuals
residual = observed − predicted
13
Example: ozone vs. temperature
> air[,c(1,3)]
  ozone temperature
   3.45          67
   3.30          72
   2.29          74
   2.62          62
   2.84          65
...
> cor(ozone,temperature)
[1] 0.7531038
14
Fitting a regression model in S
> ozone.lm <- lm(ozone ~ temperature, data = air)
> summary(ozone.lm)
Coefficients:
              Value Std. Error t value Pr(>|t|)
(Intercept)   -2.23       0.46   -4.82   0.0000
temperature    0.07       0.01   11.95   0.0000
Multiple R-Squared: 0.5672
> var(ozone)
[1] 0.7928069
> var(resid(ozone.lm))
[1] 0.3431544
> cor(ozone,temperature)
[1] 0.7531038
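The numbers printed on this slide are tied together: the Multiple R-Squared is the share of the variance in ozone that the line explains, and for a single predictor it equals the square of the correlation. A small Python check (used here instead of S), with values copied from the output above and variable names of my own choosing:

```python
# Values copied from the S output above.
var_ozone = 0.7928069      # var(ozone)
var_resid = 0.3431544      # var(resid(ozone.lm))
r = 0.7531038              # cor(ozone, temperature)

# Share of the variance in ozone explained by the regression line.
r_squared_from_variances = 1 - var_resid / var_ozone

# Square of the correlation coefficient.
r_squared_from_correlation = r ** 2

# RMS error of the regression relative to the SD of ozone,
# which matches sqrt(1 - r^2) up to rounding in the printed values.
rms_ratio = (var_resid / var_ozone) ** 0.5
```

Both routes round to the reported 0.5672, and the residual SD is roughly two thirds of the SD of ozone.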
15
Checking model appropriateness
What assumptions have we made in the regression model?
Checking model assumptions in S-plus
> par(mfrow=c(2,3))
> plot(ozone.lm)
16
Residual diagnostics for ozone data
17
Pizza party at the Frat.
How many laps would you predict a pledge could run if he ate 6 slices of pizza?
How many laps if he ate 9 slices of pizza?
A pledge shows off and eats 35 slices of pizza. How many laps would you predict he would run?
Beware of extrapolation.
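One way to guard against extrapolation in code is to refuse predictions outside the range of x values the line was fitted on. The Python sketch below illustrates the idea. The pizza data are not in this transcript, so the coefficients and the observed range are hypothetical placeholders, as is the function name `predict_within_range`.

```python
def predict_within_range(x, intercept, slope, x_min, x_max):
    """Predict y from a fitted line, refusing to extrapolate beyond the data."""
    if not (x_min <= x <= x_max):
        raise ValueError(
            f"x={x} is outside the observed range [{x_min}, {x_max}]; "
            "the fitted line says nothing reliable here"
        )
    return intercept + slope * x


# Hypothetical fitted line and observed range of slices eaten (not from the slides).
INTERCEPT, SLOPE = 30.0, -2.0
MIN_SLICES, MAX_SLICES = 2, 10

# 6 slices lies inside the observed range, so a prediction is returned.
laps_for_6 = predict_within_range(6, INTERCEPT, SLOPE, MIN_SLICES, MAX_SLICES)

# 35 slices lies far outside it, so the function raises instead of predicting.
try:
    predict_within_range(35, INTERCEPT, SLOPE, MIN_SLICES, MAX_SLICES)
    refused = False
except ValueError:
    refused = True
```

With these placeholder coefficients, applying the line blindly at 35 slices would give a negative number of laps, which is the kind of nonsense the slide warns about.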