Published by Naomi Boyd. Modified over 8 years ago.
1 Experimental Statistics - week 11 Chapter 11: Linear Regression and Correlation
2 Association Between Two Variables
Regression Analysis -- we want to predict the dependent variable using the independent variable
Correlation Analysis -- measures the strength of the linear association between 2 quantitative variables
3 11.7 Correlation Analysis
5 Calculating the Correlation Coefficient
6 Notation: So --
7 Testing Statistical Significance of Correlation Coefficient
Test Statistic: $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$
Rejection Region: $t > t_{\alpha/2}$ or $t < -t_{\alpha/2}$, df = n – 2
8 The data below are the study times and the test scores on an exam given over the material covered during the two weeks.
Study Time (hours) (X)    Exam Score (Y)
10                        92
15                        81
12                        84
20                        74
8                         85
16                        80
14                        84
22                        80
10 Correlation Between Study Time and Score
$H_0$: There is No Correlation Between Study Time and Score
$H_a$: There is a Correlation Between Study Time and Score
Rejection Region:
Test Statistic:
Conclusion:
P-value: SAS: p
11 Study Time by Score
The CORR Procedure
2 Variables: score time

Simple Statistics
Variable   N   Mean       Std Dev   Sum         Minimum    Maximum
score      8   82.50000   5.18239   660.00000   74.00000   92.00000
time       8   14.62500   4.74906   117.00000   8.00000    22.00000

Pearson Correlation Coefficients, N = 8
Prob > |r| under H0: Rho=0
           score      time
score      1.00000    -0.77490
                      0.0239
time       -0.77490   1.00000
           0.0239
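The correlation and its significance test for the study time data can be reproduced by hand. The following is a minimal Python sketch, not part of the original slides; the variable names are illustrative, and the critical value 2.447 is an assumed table value for $t_{0.025}$ with 6 df rather than something computed here.

```python
import math

# Study-time data from the slides: hours studied (X) and exam score (Y).
study_time = [10, 15, 12, 20, 8, 16, 14, 22]
score = [92, 81, 84, 74, 85, 80, 84, 80]
n = len(study_time)

x_bar = sum(study_time) / n
y_bar = sum(score) / n

# Corrected sums of squares and cross-products.
s_xx = sum((x - x_bar) ** 2 for x in study_time)
s_yy = sum((y - y_bar) ** 2 for y in score)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(study_time, score))

# Pearson correlation coefficient.
r = s_xy / math.sqrt(s_xx * s_yy)

# t statistic for H0: rho = 0, with n - 2 degrees of freedom.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Assumption: two-sided critical value t_{0.025} with 6 df, taken from a t table.
T_CRIT = 2.447
reject_h0 = abs(t_stat) > T_CRIT

print(f"r = {r:.5f}")
print(f"t = {t_stat:.2f}, reject H0 at alpha = 0.05: {reject_h0}")
```

The computed r should agree with the -0.77490 reported by PROC CORR; the sketch compares against a critical value instead of reporting a p-value because the Python standard library has no t distribution.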
13 11.1-5 Regression Analysis
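14 Notation
Theoretical Model: $y = \beta_0 + \beta_1 x + \varepsilon$
Regression line: $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$ -- $\hat{\beta}_0$ and $\hat{\beta}_1$ are evaluated from the data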
15 Data we write
17 Least Squares Estimates Computation Formula
18 The data below are the study times and the test scores on an exam given over the material covered during the two weeks. Find the equation of the regression line for predicting exam score from study time.
Study Time (hours) (X)    Exam Score (Y)
10                        92
15                        81
12                        84
20                        74
8                         85
16                        80
14                        84
22                        80
19 Calculations: Study Time Data
Equation of Regression Line: $\hat{y} = 94.867 - 0.8456x$
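The calculations for the study time data can be sketched in a few lines of Python. This is an illustration rather than the slide's own worked computation; it assumes the usual least squares formulas $\hat{\beta}_1 = S_{xy}/S_{xx}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$, and the names `s_xx`, `s_xy`, `b0`, `b1` are chosen here for readability.

```python
# Study-time data from the slides: hours studied (X) and exam score (Y).
study_time = [10, 15, 12, 20, 8, 16, 14, 22]
score = [92, 81, 84, 74, 85, 80, 84, 80]
n = len(study_time)

x_bar = sum(study_time) / n
y_bar = sum(score) / n

# Corrected sum of squares for X and corrected cross-product of X and Y.
s_xx = sum((x - x_bar) ** 2 for x in study_time)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(study_time, score))

# Least squares estimates of slope and intercept.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

print(f"S_xx = {s_xx:.3f}, S_xy = {s_xy:.3f}")
print(f"score_hat = {b0:.4f} + ({b1:.4f}) * time")
```

The estimates should match the Intercept and time rows of the PROC GLM parameter table.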
20 The GLM Procedure
Dependent Variable: score

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             1    112.8883610      112.8883610   9.02      0.0239
Error             6    75.1116390       12.5186065
Corrected Total   7    188.0000000

R-Square   Coeff Var   Root MSE   score Mean
0.600470   4.288684    3.538164   82.50000

Source   DF   Type I SS     Mean Square   F Value   Pr > F
time     1    112.8883610   112.8883610   9.02      0.0239

Source   DF   Type III SS   Mean Square   F Value   Pr > F
time     1    112.8883610   112.8883610   9.02      0.0239

Parameter   Estimate      Standard Error   t Value   Pr > |t|
Intercept   94.86698337   4.30408629       22.04     <.0001
time        -0.84560570   0.28159265       -3.00     0.0239

PROC GLM; MODEL score=time; RUN;   (score = Y, time = X)
21 To Predict Y for a given x:
-- plug x into the regression equation and solve for Y
Example: If a student studied 10 hours, then the predicted score would be $\hat{y} = 94.867 - 0.8456(10) \approx 86.41$
22 Notes:
- $SSE = \sum (y_i - \hat{y}_i)^2$ is called the sum-of-squared residuals -- SS(Residuals)
- $s^2 = SSE/(n-2)$ is the estimate of the error variance
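To make the residual sum of squares concrete, here is a hedged Python sketch for the study time data. It assumes the error variance is estimated by dividing SSE by n - 2, which is what the Error row of the PROC GLM table (Mean Square = 12.5186065 on 6 df) suggests; the helper name `fit_line` is hypothetical.

```python
import math

study_time = [10, 15, 12, 20, 8, 16, 14, 22]
score = [92, 81, 84, 74, 85, 80, 84, 80]


def fit_line(xs, ys):
    """Return least squares (intercept, slope) for a simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


b0, b1 = fit_line(study_time, score)
n = len(study_time)

# Residuals: observed score minus fitted score.
residuals = [y - (b0 + b1 * x) for x, y in zip(study_time, score)]

sse = sum(e ** 2 for e in residuals)  # SS(Residuals)
mse = sse / (n - 2)                   # estimate of the error variance
root_mse = math.sqrt(mse)             # estimate of the error standard deviation

print(f"SSE = {sse:.4f}, MSE = {mse:.4f}, Root MSE = {root_mse:.4f}")
```

These three values correspond to the Error sum of squares, the Error mean square, and Root MSE in the SAS output.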
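23 Testing for Significance of the Regression
If knowing x is of absolutely no help in predicting Y, then it seems reasonable that the regression line for predicting Y from x should have slope ________.
That is, to test for a “significant regression” we test $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$
Test Statistic: $t = \hat{\beta}_1 / SE(\hat{\beta}_1)$, with $SE(\hat{\beta}_1) = s/\sqrt{S_{xx}}$ and $S_{xx} = \sum (x_i - \bar{x})^2$
Rejection Region: $|t| > t_{\alpha/2}$, where t has n – 2 df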
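A short Python sketch of this slope test on the study time data follows. It is an illustration under stated assumptions: the standard error is taken as $s/\sqrt{S_{xx}}$, and the critical value 2.447 is an assumed table value for $t_{0.025}$ with 6 df.

```python
import math

study_time = [10, 15, 12, 20, 8, 16, 14, 22]
score = [92, 81, 84, 74, 85, 80, 84, 80]
n = len(study_time)

x_bar = sum(study_time) / n
y_bar = sum(score) / n
s_xx = sum((x - x_bar) ** 2 for x in study_time)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(study_time, score))

# Least squares fit.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Error variance estimate from the residuals (n - 2 degrees of freedom).
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(study_time, score))
s = math.sqrt(sse / (n - 2))

# Standard error of the slope and the t statistic for H0: beta_1 = 0.
se_b1 = s / math.sqrt(s_xx)
t_stat = b1 / se_b1

# Assumption: two-sided critical value t_{0.025} with 6 df, taken from a t table.
T_CRIT = 2.447
significant = abs(t_stat) > T_CRIT

print(f"SE(b1) = {se_b1:.5f}, t = {t_stat:.2f}, significant at 0.05: {significant}")
```

The standard error and t value should agree with the time row of the PROC GLM parameter table.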
24 Study Time Data
25 The GLM Procedure
Dependent Variable: score

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             1    112.8883610      112.8883610   9.02      0.0239
Error             6    75.1116390       12.5186065
Corrected Total   7    188.0000000

R-Square   Coeff Var   Root MSE   score Mean
0.600470   4.288684    3.538164   82.50000

Source   DF   Type I SS     Mean Square   F Value   Pr > F
time     1    112.8883610   112.8883610   9.02      0.0239

Source   DF   Type III SS   Mean Square   F Value   Pr > F
time     1    112.8883610   112.8883610   9.02      0.0239

Parameter   Estimate      Standard Error   t Value   Pr > |t|
Intercept   94.86698337   4.30408629       22.04     <.0001
time        -0.84560570   0.28159265       -3.00     0.0239

PROC GLM; MODEL score=time; RUN;
26 Study Time by Score
The CORR Procedure
2 Variables: score time

Simple Statistics
Variable   N   Mean       Std Dev   Sum         Minimum    Maximum
score      8   82.50000   5.18239   660.00000   74.00000   92.00000
time       8   14.62500   4.74906   117.00000   8.00000    22.00000

Pearson Correlation Coefficients, N = 8
Prob > |r| under H0: Rho=0
           score      time
score      1.00000    -0.77490
                      0.0239
time       -0.77490   1.00000
           0.0239
27 Note: The t values for testing $H_0: \rho = 0$ and for testing $H_0: \beta_1 = 0$ are the same.
- both tests depend on the normality assumption
28 Recall: One-sample Test about a Mean In general: df = n – 1
29 $(1 - \alpha)100\%$ Confidence Interval for $\mu$
df = n – 1
30 Similarly
31 df = n – 2
Can also find confidence interval for $\beta_0$ - not as useful
Alternative form
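As an illustration of an interval with n – 2 df, the sketch below builds a 95% confidence interval for the slope from the study time data. It assumes the interval has the form estimate ± $t_{\alpha/2}$ × standard error, by analogy with the one-sample interval for a mean, and it uses 2.447 as an assumed table value for $t_{0.025}$ with 6 df.

```python
import math

study_time = [10, 15, 12, 20, 8, 16, 14, 22]
score = [92, 81, 84, 74, 85, 80, 84, 80]
n = len(study_time)

x_bar = sum(study_time) / n
y_bar = sum(score) / n
s_xx = sum((x - x_bar) ** 2 for x in study_time)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(study_time, score))

b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Residual standard deviation with n - 2 degrees of freedom.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(study_time, score))
s = math.sqrt(sse / (n - 2))
se_b1 = s / math.sqrt(s_xx)

# Assumption: t_{0.025} with 6 df from a t table, giving a 95% interval.
T_CRIT = 2.447
margin = T_CRIT * se_b1
lower, upper = b1 - margin, b1 + margin

print(f"95% CI for the slope: ({lower:.4f}, {upper:.4f})")
```

The interval excludes zero, which is consistent with rejecting a zero slope at the 0.05 level in the SAS output.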
32 Prediction Setting:
33 2 Intervals
1. Confidence Interval on $\mu_{Y|x_{n+1}}$
36 2. Prediction Interval for $y_{n+1}$
Notes:
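As an illustration of the two intervals, the following Python sketch computes both at a study time of 10 hours. The formulas are the standard textbook ones and are an assumption here, since the slide formulas did not survive: $\hat{y} \pm t_{\alpha/2}\, s \sqrt{1/n + (x_{n+1} - \bar{x})^2/S_{xx}}$ for the mean response, and the same expression with an extra 1 under the square root for a single new observation. The value 2.447 is an assumed table value for $t_{0.025}$ with 6 df.

```python
import math

study_time = [10, 15, 12, 20, 8, 16, 14, 22]
score = [92, 81, 84, 74, 85, 80, 84, 80]
n = len(study_time)

x_bar = sum(study_time) / n
y_bar = sum(score) / n
s_xx = sum((x - x_bar) ** 2 for x in study_time)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(study_time, score))

b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(study_time, score))
s = math.sqrt(sse / (n - 2))

# Assumption: t_{0.025} with 6 df from a t table, giving 95% intervals.
T_CRIT = 2.447

x_new = 10                      # study time at which to predict
y_hat = b0 + b1 * x_new         # point prediction

# Leverage-type term shared by both intervals.
h = 1 / n + (x_new - x_bar) ** 2 / s_xx

ci_half = T_CRIT * s * math.sqrt(h)      # half-width for the mean response
pi_half = T_CRIT * s * math.sqrt(1 + h)  # half-width for a new observation

print(f"predicted score at x = {x_new}: {y_hat:.2f}")
print(f"95% CI for mean score: ({y_hat - ci_half:.2f}, {y_hat + ci_half:.2f})")
print(f"95% PI for one score:  ({y_hat - pi_half:.2f}, {y_hat + pi_half:.2f})")
```

The prediction interval is always the wider of the two, because it accounts for the variability of an individual score in addition to the uncertainty in the estimated line.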
38 Extrapolation
- Predicting beyond the range of predictor variables
39 Predict the price of a car that weighs 3500 lbs. - extrapolation would say it’s about $16,000
40 Extrapolation
- Predicting beyond the range of predictor variables
- NOT a good idea -- called extrapolation penalty
41 Predict the price of a car that weighs 3500 lbs. - extrapolation would say it’s about $16,000 oops!!!