Published by Anabel Mason. Modified over 9 years ago.
1
Regression
2
The Basic Problem
- How do we predict one variable from another?
- How does one variable change as the other changes?
- Cause and effect
3
An Example
- Cigarettes and CHD Mortality from Chapter 9
- Data repeated on next slide
- We want to predict level of CHD mortality in a country averaging 10 cigarettes per day.
4
The Data
6
Why Do We Care?
- We may want to make a prediction.
- More likely, we want to understand the relationship.
  - How fast does CHD mortality rise with a one unit increase in smoking?
  - Note we speak about predicting, but often don’t actually predict.
7
Regression Line
- Formula: $\hat{Y} = bX + a$
  - $\hat{Y}$ = the predicted value of Y (CHD mortality)
  - X = smoking incidence for that country
8
Regression Coefficients
- “Coefficients” are a and b
- b = slope
  - Change in predicted Y for one unit change in X
- a = intercept
  - Value of $\hat{Y}$ when X = 0
9
Calculation
- Slope: $b = \dfrac{\text{cov}_{XY}}{s_X^2}$
- Intercept: $a = \bar{Y} - b\bar{X}$
10
For Our Data
- $\text{cov}_{XY} = 11.13$
- $s_X^2 = 2.33^2 = 5.43$
- b = 11.13/5.43 = 2.04
- a = 14.52 - 2.04*5.95 = 2.37
- See SPSS printout on next slide
- Answers are not exact due to rounding error and desire to match SPSS.
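The arithmetic on this slide can be sketched in a few lines of Python. This is only an illustration built from the summary statistics quoted above (covariance, standard deviation of X, and the two means); the function and variable names are assumptions, and the raw data are not used. Unrounded arithmetic gives roughly b = 2.05 and a = 2.32, slightly different from the slide's 2.04 and 2.37, which were adjusted to match SPSS.

```python
# Summary statistics as quoted on the slide (not recomputed from raw data).
cov_xy = 11.13   # covariance of smoking (X) and CHD mortality (Y)
s_x = 2.33       # standard deviation of X
mean_x = 5.95    # mean cigarettes per adult per day
mean_y = 14.52   # mean CHD mortality per 10,000


def slope(cov_xy: float, s_x: float) -> float:
    """Slope: b = cov_XY / s_X^2."""
    return cov_xy / s_x ** 2


def intercept(mean_y: float, mean_x: float, b: float) -> float:
    """Intercept: a = mean(Y) - b * mean(X)."""
    return mean_y - b * mean_x


b = slope(cov_xy, s_x)
a = intercept(mean_y, mean_x, b)
print(f"b = {b:.2f}, a = {a:.2f}")
```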
11
SPSS Printout
12
Note:
- The values we obtained are shown on printout.
- The intercept is labeled “constant.”
- Slope is labeled by name of predictor variable.
13
Making a Prediction
- Assume that we want CHD mortality when cigarette consumption is 6.
  - $\hat{Y} = bX + a = 2.04(6) + 2.37 = 14.61$
- We predict 14.61 people/10,000 in that country will die of coronary heart disease.
14
Accuracy of Prediction
- Finnish smokers smoke 6 cigarettes/adult/day
- We predict 14.61 deaths/10,000
- They actually have 23 deaths/10,000
- Our error (“residual”) = 23 - 14.61 = 8.39
  - a large error
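As a minimal sketch, the prediction and residual for Finland can be reproduced with the rounded slope and intercept from the slides (b = 2.04, a = 2.37). The function names are illustrative assumptions. The last line applies the same line to the 10-cigarettes-per-day country posed in the opening example.

```python
B = 2.04  # slope from the slides
A = 2.37  # intercept from the slides


def predict_chd(cigarettes_per_day: float) -> float:
    """Predicted CHD mortality per 10,000: Y-hat = b*X + a."""
    return B * cigarettes_per_day + A


def residual(observed: float, cigarettes_per_day: float) -> float:
    """Residual = observed Y minus predicted Y."""
    return observed - predict_chd(cigarettes_per_day)


finland_pred = predict_chd(6)
finland_resid = residual(23, 6)

print(f"{finland_pred:.2f}")     # 14.61
print(f"{finland_resid:.2f}")    # 8.39
print(f"{predict_chd(10):.2f}")  # 22.77
```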
15
[Figure: CHD Mortality per 10,000 (y-axis) plotted against Cigarette Consumption per Adult per Day (x-axis), with the Prediction and the Residual labeled.]
16
Errors of Prediction
- Residual variance
  - The variability of the observed values around the predicted values
- Standard error of estimate
  - The standard deviation of the observed values around the predicted values
17
Standard Error of Estimate
- A common measure of the accuracy of our predictions
  - We want it to be as small as possible.
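For reference, the usual definition of the standard error of estimate is given below. The notation is assumed here rather than taken from the slide: $s_{Y \cdot X}$ for the statistic, $N$ for the number of (X, Y) pairs, and $SS_{\text{residual}}$ for the sum of squared residuals.

```latex
s_{Y \cdot X}
  = \sqrt{\frac{\sum (Y - \hat{Y})^2}{N - 2}}
  = \sqrt{\frac{SS_{\text{residual}}}{N - 2}}
```

The divisor is $N - 2$ because two coefficients, a and b, are estimated from the data.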
18
$r^2$ as % Predictable Variability
- Define Sum of Squares
- The remaining error divided by the original error
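One common way to write this out is in terms of sums of squares. The symbols are assumed notation: $SS_Y = \sum (Y - \bar{Y})^2$ is the original error before X is used, and $SS_{\text{residual}} = \sum (Y - \hat{Y})^2$ is the error that remains.

```latex
r^2 = \frac{SS_Y - SS_{\text{residual}}}{SS_Y},
\qquad
1 - r^2 = \frac{SS_{\text{residual}}}{SS_Y}
```

Read this way, the remaining error divided by the original error is the unpredictable share, $1 - r^2$, and $r^2$ is the predictable share.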
19
For Our Data
- r = .713
- $r^2 = .713^2 = .508$
- Approximately 50% of the variability in incidence of CHD mortality is associated with variability in smoking.
- Elaborate on what this means.
20
Hypothesis Testing
- Null hypotheses
  - b* = 0
  - a* = 0
  - population correlation ($\rho$) = 0
- Define b* and a*
- We saw how to test the last one in Chapter 9.
21
Testing Slope and Intercept
- These are given in computer printout as a t test.
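As a sketch of what such a printout typically computes for the slope, the standard t test of the null hypothesis b* = 0 is shown below. The notation is assumed, not taken from the slides: $s_b$ is the standard error of the slope, $s_{Y \cdot X}$ the standard error of estimate, $s_X$ the standard deviation of X, and $N$ the number of (X, Y) pairs.

```latex
t = \frac{b}{s_b},
\qquad
s_b = \frac{s_{Y \cdot X}}{s_X \sqrt{N - 1}},
\qquad
df = N - 2
```

The intercept is tested the same way, dividing a by its own standard error.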
22
Testing
- The t values in the second from right column are tests on slope and intercept.
- The associated p values are next to them.
- The slope is significantly different from zero, but not the intercept.
- Why do we care?
Cont.
23
Testing--cont.
- What does it mean if slope is not significant?
  - How does that relate to test on r?
- What if the intercept is not significant?
- Does significant slope mean we predict quite well?