1
TOPIC 12: Simple Linear Regression and Correlation
2
Models
- A model is a representation of some phenomenon
- A mathematical model is a mathematical expression of some phenomenon
- Often describe relationships between variables
- Types
  - Deterministic models
  - Probabilistic models
3
Deterministic Models
- Hypothesize exact relationships
- Suitable when prediction error is negligible
- Example: force is exactly mass times acceleration, F = m·a
4
Probabilistic Models
- Hypothesize two components
  - Deterministic
  - Random error
- Example: sales volume (y) is 10 times advertising spending (x) plus random error, $y = 10x + \varepsilon$
- Random error may be due to factors other than advertising
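The two components of the sales example can be sketched in a few lines. This is only an illustration: the normal error distribution, the `sigma` value, the seed, and the function name `simulate_sales` are assumptions made for the sketch, not part of the slides.

```python
import random

def simulate_sales(ad_spend, sigma=1.0, seed=0):
    """Return (deterministic, observed) sales for each advertising level.

    deterministic = 10 * x is the exact component from the slide;
    observed adds a random error drawn from Normal(0, sigma).
    sigma and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    deterministic = [10 * x for x in ad_spend]
    observed = [d + rng.gauss(0, sigma) for d in deterministic]
    return deterministic, observed

ad = [1, 2, 3, 4, 5]
det, obs = simulate_sales(ad)
for x, d, o in zip(ad, det, obs):
    print(f"x={x}  10x={d}  y={o:.2f}  random error={o - d:+.2f}")
```

Setting `sigma=0` removes the random component and leaves the deterministic model.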
5
Regression Models
[Diagram: Probabilistic Models branch into Regression Models and Correlation Models.]
6
Regression Models
- Answer ‘What is the relationship between the variables?’
- Equation used
  - One numerical dependent (response) variable: what is to be predicted
  - One or more numerical or categorical independent (explanatory) variables
- Used mainly for prediction and estimation
7
Regression Modeling Steps
1. Hypothesize deterministic component
2. Estimate unknown model parameters
3. Specify probability distribution of random error term
4. Estimate standard deviation of error
5. Evaluate model
6. Use model for prediction and estimation
8
Types of Regression Models
[Diagram: Regression Models divide into Simple (1 explanatory variable) and Multiple (2+ explanatory variables); each may be Linear or Non-Linear.]
Note: This classification is based on the number of explanatory variables and the nature of the relationship between x and y.
9
Linear Regression Model
- Relationship between variables is a linear function: $y = \beta_0 + \beta_1 x + \varepsilon$
  - $y$: dependent (response) variable
  - $x$: independent (explanatory) variable
  - $\beta_0$: population y-intercept
  - $\beta_1$: population slope
  - $\varepsilon$: random error
10
Population & Sample Regression Model
[Figure: a random sample is drawn from a population in which the relationship between the variables is unknown.]
11
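Population Linear Regression Model
[Figure: observed values scattered around the population regression line (line of means), $E(y) = \beta_0 + \beta_1 x$; each observed value is $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, where $\varepsilon_i$ is the random error.]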
12
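Sample Linear Regression Model
[Figure: the fitted sample line $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$; each observed value is $y_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + \hat{\varepsilon}_i$, where $\hat{\varepsilon}_i$ is the random error; an unsampled observation is also marked.]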
13
Scatter Diagram
- Plot of all (xi, yi) pairs
- Suggests how well model will fit
[Figure: scatter plot of y against x.]
14
Thinking Challenge
- How would you draw a line through the points?
- How do you determine which line ‘fits best’?
[Figure: scatter plot of y against x.]
15
Least Squares Method
- ‘Best fit’ means the differences between actual y values and predicted y values are a minimum
  - But positive differences offset negative ones
- Least squares minimizes the sum of the squared differences, the sum of squared errors (SSE)
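A small sketch of why the differences are squared, using the five advertising and sales pairs from the Hasbro example later in this deck. The helper names `sse` and `sum_of_errors` and the flat comparison line are illustrative choices, not taken from the slides.

```python
# Advertising (x) and sales (y) pairs from the deck's Hasbro example.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]

def sse(b0, b1, xs, ys):
    """Sum of squared differences between actual y and predicted b0 + b1*x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(xs, ys))

def sum_of_errors(b0, b1, xs, ys):
    """Plain (unsquared) sum of differences; positives cancel negatives."""
    return sum(yi - (b0 + b1 * xi) for xi, yi in zip(xs, ys))

# A flat line through the mean of y: the errors cancel, yet the fit is poor.
flat_sum, flat_sse = sum_of_errors(2.0, 0.0, x, y), sse(2.0, 0.0, x, y)
# The least squares line quoted in the deck (intercept -.1, slope .7).
ls_sum, ls_sse = sum_of_errors(-0.1, 0.7, x, y), sse(-0.1, 0.7, x, y)

print(f"flat line:          sum of errors={flat_sum:.2f}  SSE={flat_sse:.2f}")
print(f"least squares line: sum of errors={ls_sum:.2f}  SSE={ls_sse:.2f}")
```

Both lines have errors that sum to about zero, so the unsquared sum cannot tell them apart; the SSE can.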
16
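Least Squares Graphically
[Figure: a fitted line plotted against x with the vertical deviations $\hat{\varepsilon}_1$, $\hat{\varepsilon}_2$, $\hat{\varepsilon}_3$, $\hat{\varepsilon}_4$ of four data points from the line.]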
17
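Coefficient Equations
- Prediction equation: $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$
- Slope: $\hat{\beta}_1 = SS_{xy} / SS_{xx}$, where $SS_{xy} = \sum x_i y_i - (\sum x_i)(\sum y_i)/n$ and $SS_{xx} = \sum x_i^2 - (\sum x_i)^2/n$
- y-intercept: $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$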
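A minimal sketch of the slope and y-intercept computations, assuming the standard computational formulas slope = SSxy / SSxx and intercept = mean(y) - slope × mean(x). The function name `least_squares` and the test points are hypothetical.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line y-hat = b0 + b1*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    # SSxy and SSxx from the column totals of an (x, y, x^2, xy) table.
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    ss_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    slope = ss_xy / ss_xx
    intercept = sum_y / n - slope * sum_x / n
    return intercept, slope

# Hypothetical points lying exactly on y = 1 + 2x.
b0, b1 = least_squares([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)  # → 1.0 2.0
```

The same function applied to the deck's advertising data, `least_squares([1, 2, 3, 4, 5], [1, 1, 2, 2, 4])`, returns approximately (-0.1, 0.7).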
18
Least Squares Example
You’re a marketing analyst for Hasbro Toys. You gather the following data:

Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4

Find the least squares line relating sales and advertising.
19
Parameter Estimation Solution Table

xi     yi     xi²    yi²    xiyi
1      1      1      1      1
2      1      4      1      2
3      2      9      4      6
4      2      16     4      8
5      4      25     16     20
15     10     55     26     37     (column totals)
20
Parameter Estimation Solution
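The worked arithmetic for this slide can be sketched from the column totals in the solution table, assuming the standard shortcut formulas for $SS_{xy}$ and $SS_{xx}$. The results agree with the estimates $\hat{\beta}_0 = -.1$ and $\hat{\beta}_1 = .7$ quoted later in the deck.

```latex
\begin{aligned}
SS_{xy} &= \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n} = 37 - \frac{(15)(10)}{5} = 7 \\
SS_{xx} &= \sum x_i^2 - \frac{(\sum x_i)^2}{n} = 55 - \frac{15^2}{5} = 10 \\
\hat{\beta}_1 &= \frac{SS_{xy}}{SS_{xx}} = \frac{7}{10} = .70 \\
\hat{\beta}_0 &= \bar{y} - \hat{\beta}_1 \bar{x} = 2 - (.70)(3) = -.10 \\
\hat{y} &= -.1 + .7x
\end{aligned}
```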
21
Parameter Estimation Computer Output
Parameter Estimates

Variable    DF    Parameter Estimate    Standard Error    T for H0: Param=0    Prob>|T|
INTERCEP          ($\hat{\beta}_0$)
ADVERT            ($\hat{\beta}_1$)
22
Regression Line Fitted to the Data
[Figure: scatter plot of Sales against Advertising with the fitted regression line.]
23
Least Squares Thinking Challenge
You’re an economist for the county cooperative. You gather the following data:

Fertilizer (lb.)    Yield (lb.)
4                   3.0
6                   5.5
10                  6.5
12                  9.0

Find the least squares line relating crop yield and fertilizer.
24
Parameter Estimation Solution Table

xi     yi      xi²    yi²       xiyi
4      3.0     16     9.00      12
6      5.5     36     30.25     33
10     6.5     100    42.25     65
12     9.0     144    81.00     108
32     24.0    296    162.50    218     (column totals)
25
Parameter Estimation Solution
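A sketch of the worked arithmetic for the fertilizer data, using the column totals from the solution table and assuming the standard shortcut formulas for $SS_{xy}$ and $SS_{xx}$:

```latex
\begin{aligned}
SS_{xy} &= \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n} = 218 - \frac{(32)(24)}{4} = 26 \\
SS_{xx} &= \sum x_i^2 - \frac{(\sum x_i)^2}{n} = 296 - \frac{32^2}{4} = 40 \\
\hat{\beta}_1 &= \frac{SS_{xy}}{SS_{xx}} = \frac{26}{40} = .65 \\
\hat{\beta}_0 &= \bar{y} - \hat{\beta}_1 \bar{x} = 6 - (.65)(8) = .80 \\
\hat{y} &= .8 + .65x
\end{aligned}
```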
26
Regression Line Fitted to the Data*
[Figure: scatter plot of Yield (lb.) against Fertilizer (lb.) with the fitted regression line.]
27
Linear Regression Assumptions
1. Mean of probability distribution of error, ε, is 0
2. Probability distribution of error has constant variance
3. Probability distribution of error, ε, is normal
4. Errors are independent
28
Error Probability Distribution
[Figure: probability distributions of the error centered on the line $E(y) = \beta_0 + \beta_1 x$ at x1, x2, and x3.]
29
Random Error Variation
- Variation of actual y from predicted y, $\hat{y}$
- Measured by standard error of regression model
  - Sample standard deviation of $\hat{\varepsilon}$: $s$
- Affects several factors
  - Parameter significance
  - Prediction accuracy
30
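Variation Measures
[Figure: for an observation (xi, yi), the deviation of yi from the mean $\bar{y}$ is split at the fitted line.]
- Total sum of squares: $\sum (y_i - \bar{y})^2$
- Unexplained sum of squares: $\sum (y_i - \hat{y}_i)^2$
- Explained sum of squares: $\sum (\hat{y}_i - \bar{y})^2$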
31
Estimation of σ²
- $s^2 = \dfrac{SSE}{n - 2}$, where $SSE = \sum (y_i - \hat{y}_i)^2$
- $s = \sqrt{s^2}$
32
Calculating SSE, s², s Example
You’re a marketing analyst for Hasbro Toys. You gather the following data:

Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4

Find SSE, s², and s.
33
Calculating SSE Solution

xi    yi    ŷi = -.1 + .7xi    yi - ŷi    (yi - ŷi)²
1     1     .6                 .4         .16
2     1     1.3                -.3        .09
3     2     2.0                0          0
4     2     2.7                -.7        .49
5     4     3.4                .6         .36

SSE = 1.1
34
Calculating s² and s Solution
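The calculation for this slide can be sketched as follows, assuming the usual estimator s² = SSE / (n - 2) and the estimates -.1 and .7 quoted in the deck. Variable names are illustrative.

```python
import math

x = [1, 2, 3, 4, 5]   # Ad $
y = [1, 1, 2, 2, 4]   # Sales (units)
b0, b1 = -0.1, 0.7    # least squares estimates quoted in the deck

# Residuals: actual y minus predicted y.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)
s2 = sse / (len(x) - 2)   # two estimated parameters use up 2 degrees of freedom
s = math.sqrt(s2)

print(f"SSE={sse:.2f}  s^2={s2:.4f}  s={s:.4f}")
```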
35
Test of Slope Coefficient
- Shows if there is a linear relationship between x and y
- Involves population slope $\beta_1$
- Hypotheses
  - $H_0$: $\beta_1 = 0$ (no linear relationship)
  - $H_a$: $\beta_1 \neq 0$ (linear relationship)
- Theoretical basis is sampling distribution of slope
36
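Sampling Distribution of Sample Slopes
[Figure: Sample 1 Line and Sample 2 Line plotted against x alongside the Population Line.]
- All possible sample slopes: Sample 1: 2.5, Sample 2: 1.6, Sample 3: 1.8, Sample 4: ..., and so on for a very large number of sample slopes
[Figure: sampling distribution of $\hat{\beta}_1$, centered at $\beta_1$ with standard deviation $S_{\hat{\beta}_1}$.]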
37
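Slope Coefficient Test Statistic
- $t = \dfrac{\hat{\beta}_1}{S_{\hat{\beta}_1}}$, where $S_{\hat{\beta}_1} = \dfrac{s}{\sqrt{SS_{xx}}}$ and $SS_{xx} = \sum x_i^2 - (\sum x_i)^2/n$
- df = n - 2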
38
Test of Slope Coefficient Example
You’re a marketing analyst for Hasbro Toys. You find $\hat{\beta}_0 = -.1$, $\hat{\beta}_1 = .7$ and $s = .6055$.

Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4

Is the relationship significant at the .05 level of significance?
39
Solution Table

xi     yi     xi²    yi²    xiyi
1      1      1      1      1
2      1      4      1      2
3      2      9      4      6
4      2      16     4      8
5      4      25     16     20
15     10     55     26     37     (column totals)
40
Test Statistic Solution
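A sketch of the test statistic arithmetic, assuming $S_{\hat{\beta}_1} = s / \sqrt{SS_{xx}}$ and taking $s = \sqrt{SSE/(n-2)} = \sqrt{1.1/3} \approx .6055$ from the SSE computed earlier in the deck:

```latex
\begin{aligned}
SS_{xx} &= \sum x_i^2 - \frac{(\sum x_i)^2}{n} = 55 - \frac{15^2}{5} = 10 \\
S_{\hat{\beta}_1} &= \frac{s}{\sqrt{SS_{xx}}} = \frac{.6055}{\sqrt{10}} \approx .1915 \\
t &= \frac{\hat{\beta}_1}{S_{\hat{\beta}_1}} = \frac{.70}{.1915} \approx 3.66
\end{aligned}
```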
41
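Test of Slope Coefficient Solution
- $H_0$: $\beta_1 = 0$
- $H_a$: $\beta_1 \neq 0$
- $\alpha = .05$
- df = 5 - 2 = 3
- Critical value(s): $t = \pm 3.182$ (area .025 in each tail); reject $H_0$ if $t < -3.182$ or $t > 3.182$
- Test statistic: $t = .70 / .1915 \approx 3.66$
- Decision: Reject at $\alpha = .05$
- Conclusion: There is evidence of a relationship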
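The decision above can be checked with a short script. It is a sketch that assumes the standard error formula s / sqrt(SSxx) and takes the critical value 3.182 directly from the slide rather than computing it.

```python
import math

x = [1, 2, 3, 4, 5]   # Ad $
y = [1, 1, 2, 2, 4]   # Sales (units)
n = len(x)
b0, b1 = -0.1, 0.7    # estimates quoted in the deck
T_CRIT = 3.182        # two-tailed critical value from the slide (alpha = .05, df = 3)

ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

se_b1 = s / math.sqrt(ss_xx)   # estimated standard error of the slope
t = b1 / se_b1                 # test statistic for H0: slope = 0
reject = abs(t) > T_CRIT

print(f"SE(slope)={se_b1:.4f}  t={t:.3f}  reject H0: {reject}")
```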
42
Test of Slope Coefficient Computer Output
Parameter Estimates

Variable    DF    Parameter Estimate    Standard Error    T for H0: Param=0    Prob>|T|
INTERCEP
ADVERT

- Parameter Estimate (ADVERT row): $\hat{\beta}_1$
- ‘Standard Error’ is the estimated standard deviation of the sampling distribution, $S_{\hat{\beta}_1}$
- T for H0: Param=0 is $t = \hat{\beta}_1 / S_{\hat{\beta}_1}$
- Prob>|T| is the P-value
43
Correlation Model Map
[Diagram: Probabilistic Models branch into Regression Models and Correlation Models.]
44
Correlation Models
- Answer ‘How strong is the linear relationship between two variables?’
- Coefficient of correlation
  - Sample correlation coefficient denoted r
  - Values range from –1 to +1
  - Measures degree of association
  - Does not indicate cause–effect relationship
45
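Coefficient of Correlation
$r = \dfrac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}}$
where
- $SS_{xx} = \sum x_i^2 - (\sum x_i)^2/n$
- $SS_{yy} = \sum y_i^2 - (\sum y_i)^2/n$
- $SS_{xy} = \sum x_i y_i - (\sum x_i)(\sum y_i)/n$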
46
Coefficient of Correlation Values
[Figure: scale of r running from –1.0 through –.5, 0, and +.5 to +1.0.]
- –1.0: perfect negative correlation
- 0: no linear correlation
- +1.0: perfect positive correlation
- The degree of negative correlation increases toward –1.0; the degree of positive correlation increases toward +1.0
47
Coefficient of Correlation Example
You’re a marketing analyst for Hasbro Toys.

Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4

Calculate the coefficient of correlation.
48
Solution Table

xi     yi     xi²    yi²    xiyi
1      1      1      1      1
2      1      4      1      2
3      2      9      4      6
4      2      16     4      8
5      4      25     16     20
15     10     55     26     37     (column totals)
49
Coefficient of Correlation Solution
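A sketch of the computation, assuming the usual formula r = SSxy / sqrt(SSxx × SSyy). The result rounds to the value r = .904 that the deck uses in the coefficient of determination example.

```python
import math

x = [1, 2, 3, 4, 5]   # Ad $
y = [1, 1, 2, 2, 4]   # Sales (units)
n = len(x)

# Sums of squares built from the column totals of the solution table.
ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
ss_yy = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n

r = ss_xy / math.sqrt(ss_xx * ss_yy)
print(f"SSxy={ss_xy}  SSxx={ss_xx}  SSyy={ss_yy}  r={r:.3f}")
```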
50
Coefficient of Correlation Thinking Challenge
You’re an economist for the county cooperative. You gather the following data:

Fertilizer (lb.)    Yield (lb.)
4                   3.0
6                   5.5
10                  6.5
12                  9.0

Find the coefficient of correlation.
51
Solution Table

xi     yi      xi²    yi²       xiyi
4      3.0     16     9.00      12
6      5.5     36     30.25     33
10     6.5     100    42.25     65
12     9.0     144    81.00     108
32     24.0    296    162.50    218     (column totals)
52
Coefficient of Correlation Solution
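A sketch of the worked arithmetic for the fertilizer data, using the column totals from the solution table and assuming $r = SS_{xy} / \sqrt{SS_{xx}\, SS_{yy}}$:

```latex
\begin{aligned}
SS_{xy} &= 218 - \frac{(32)(24)}{4} = 26 \\
SS_{xx} &= 296 - \frac{32^2}{4} = 40 \\
SS_{yy} &= 162.5 - \frac{24^2}{4} = 18.5 \\
r &= \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}} = \frac{26}{\sqrt{(40)(18.5)}} \approx .956
\end{aligned}
```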
53
Coefficient of Determination
- Proportion of variation ‘explained’ by relationship between x and y
- $0 \le r^2 \le 1$
- $r^2$ = (coefficient of correlation)²
54
Coefficient of Determination Example
You’re a marketing analyst for Hasbro Toys. You know r = .904.

Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4

Calculate and interpret the coefficient of determination.
55
Coefficient of Determination Solution
- r² = (coefficient of correlation)²
- r² = (.904)²
- r² = .817
Interpretation: About 81.7% of the sample variation in Sales (y) can be explained by using Ad $ (x) to predict Sales (y) in the linear model.
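The same value can be reached from the variation measures, as the share of the total sum of squares that is explained. This sketch assumes the identity r² = 1 - SSE / SSyy for simple linear regression and reuses the deck's estimates; variable names are illustrative.

```python
x = [1, 2, 3, 4, 5]   # Ad $
y = [1, 1, 2, 2, 4]   # Sales (units)
n = len(x)
b0, b1 = -0.1, 0.7    # estimates quoted in the deck

y_bar = sum(y) / n
ss_total = sum((yi - y_bar) ** 2 for yi in y)                            # total sum of squares
ss_unexplained = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # SSE
ss_explained = ss_total - ss_unexplained

r_squared = ss_explained / ss_total
print(f"total={ss_total:.1f}  unexplained={ss_unexplained:.1f}  r^2={r_squared:.3f}")
```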
56
r² Computer Output

Root MSE    R-square
Dep Mean    Adj R-sq
C.V.

- R-square: r²
- Adj R-sq: r² adjusted for number of explanatory variables & sample size
57
Prediction With Regression Models
- Types of predictions
  - Point estimates
  - Interval estimates
- What is predicted
  - Population mean response E(y) for given x
    - Point on population regression line
  - Individual response (yi) for given x
58
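What is Predicted?
[Figure: at x = xp, the fitted line $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x$ gives the prediction $\hat{y}$; it is shown against the mean of y on the line $E(y) = \beta_0 + \beta_1 x$ and against an individual value of y.]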
59
Confidence Interval Estimate for Mean Value of y at x = xp
$\hat{y} \pm t_{\alpha/2}\, s \sqrt{\dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{SS_{xx}}}$
df = n - 2
60
Factors Affecting Interval Width
1. Level of confidence $(1 - \alpha)$
   - Width increases as confidence increases
2. Data dispersion (s)
   - Width increases as variation increases
3. Sample size
   - Width decreases as sample size increases
4. Distance of xp from mean
   - Width increases as distance increases
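The distance and dispersion effects can be seen numerically. This sketch assumes the standard half-width t × s × sqrt(1/n + (xp - x̄)² / SSxx) for the confidence interval of the mean, and uses the deck's advertising data with SSE = 1.1 and the t value 3.182; the function name is hypothetical.

```python
import math

def ci_half_width(x_p, xs, s, t_crit):
    """Half-width of the confidence interval for mean y at x = x_p."""
    n = len(xs)
    x_bar = sum(xs) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in xs)
    return t_crit * s * math.sqrt(1 / n + (x_p - x_bar) ** 2 / ss_xx)

xs = [1, 2, 3, 4, 5]     # advertising values from the deck's example (mean = 3)
s = math.sqrt(1.1 / 3)   # s = sqrt(SSE / (n - 2)) with SSE = 1.1
t_crit = 3.182           # t value for .025 in each tail, df = 3

for x_p in (3, 4, 5):
    print(f"x_p={x_p}: half-width={ci_half_width(x_p, xs, s, t_crit):.3f}")
```

The half-width is smallest at the mean of x and grows as xp moves away from it; doubling `s` doubles every half-width.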
61
Why Distance from Mean?
[Figure: Sample 1 Line and Sample 2 Line plotted against x, with x1, the mean $\bar{x}$, and x2 marked on the x-axis; the two lines show greater dispersion at x2 than at x1.]
Note: The closer to the mean, the less variability. This is due to the variability in estimated slope parameters.
62
Confidence Interval Estimate Example
You’re a marketing analyst for Hasbro Toys. You find $\hat{\beta}_0 = -.1$, $\hat{\beta}_1 = .7$ and $s = .6055$.

Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4

Find a 95% confidence interval for the mean sales when advertising is $4.
63
Solution Table

xi     yi     xi²    yi²    xiyi
1      1      1      1      1
2      1      4      1      2
3      2      9      4      6
4      2      16     4      8
5      4      25     16     20
15     10     55     26     37     (column totals)
64
Confidence Interval Estimate Solution
- x to be predicted: $x_p = 4$
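A sketch of the interval calculation at xp = 4, assuming the standard formula ŷ ± t × s × sqrt(1/n + (xp - x̄)² / SSxx) and the t value 3.182 for df = 3 used in the slope test. Variable names are illustrative.

```python
import math

x = [1, 2, 3, 4, 5]   # Ad $
n = len(x)
b0, b1 = -0.1, 0.7    # estimates quoted in the deck
s = math.sqrt(1.1 / (n - 2))   # from SSE = 1.1
t_crit = 3.182        # .025 in each tail, df = 3
x_p = 4               # advertising level to be predicted

x_bar = sum(x) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)

y_hat = b0 + b1 * x_p
half_width = t_crit * s * math.sqrt(1 / n + (x_p - x_bar) ** 2 / ss_xx)
ci_low, ci_high = y_hat - half_width, y_hat + half_width

print(f"y-hat={y_hat:.1f}  95% CI for mean sales: ({ci_low:.3f}, {ci_high:.3f})")
```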
65
Prediction Interval of Individual Value of y at x = xp
$\hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{SS_{xx}}}$
df = n - 2
Note: There is a 1 under the radical in the standard error formula. The effect of the extra Syx is to increase the width of the interval. This will be seen in the interval bands.
66
Why the Extra ‘S’?
[Figure: at x = xp, the prediction $\hat{y}$ from the fitted line $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$, the expected (mean) y on the line $E(y) = \beta_0 + \beta_1 x$, and the y we're trying to predict, which differs from its mean by the random error $\varepsilon$.]
Note: The error in predicting some future value of y is the sum of 2 errors:
1. the error of estimating the mean of y, E(y|x)
2. the random error that is a component of the value of y to be predicted
Even if we knew the population regression line exactly, we would still make an error.
67
Prediction Interval Example
You’re a marketing analyst for Hasbro Toys. You find $\hat{\beta}_0 = -.1$, $\hat{\beta}_1 = .7$ and $s = .6055$.

Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4

Predict the sales when advertising is $4. Use a 95% prediction interval.
68
Solution Table

xi     yi     xi²    yi²    xiyi
1      1      1      1      1
2      1      4      1      2
3      2      9      4      6
4      2      16     4      8
5      4      25     16     20
15     10     55     26     37     (column totals)
69
Prediction Interval Solution
- x to be predicted: $x_p = 4$
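A sketch of the prediction interval at xp = 4, assuming the standard formula with the extra 1 under the radical, ŷ ± t × s × sqrt(1 + 1/n + (xp - x̄)² / SSxx), and the t value 3.182 for df = 3. Variable names are illustrative.

```python
import math

x = [1, 2, 3, 4, 5]   # Ad $
n = len(x)
b0, b1 = -0.1, 0.7    # estimates quoted in the deck
s = math.sqrt(1.1 / (n - 2))   # from SSE = 1.1
t_crit = 3.182        # .025 in each tail, df = 3
x_p = 4               # advertising level to be predicted

x_bar = sum(x) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
y_hat = b0 + b1 * x_p

# Same radical as the confidence interval for the mean, plus 1 for the
# random error of the individual value being predicted.
mean_term = 1 / n + (x_p - x_bar) ** 2 / ss_xx
pi_half = t_crit * s * math.sqrt(1 + mean_term)
ci_half = t_crit * s * math.sqrt(mean_term)

pi_low, pi_high = y_hat - pi_half, y_hat + pi_half
print(f"95% prediction interval: ({pi_low:.3f}, {pi_high:.3f})")
print(f"half-widths: prediction={pi_half:.3f}  mean={ci_half:.3f}")
```

The prediction interval is wider than the confidence interval for the mean at the same xp.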
70
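Interval Estimate Computer Output

Obs    Dep Var SALES    Pred Value    Std Err Predict    Low95% Mean    Upp95% Mean    Low95% Predict    Upp95% Predict

- Pred Value: predicted y when x = 4
- Std Err Predict: $S_{\hat{y}}$
- Low95% Mean, Upp95% Mean: confidence interval
- Low95% Predict, Upp95% Predict: prediction interval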
71
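Confidence Intervals vs. Prediction Intervals
[Figure: the fitted line $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ plotted against x with confidence interval bands and prediction interval bands around it; the mean $\bar{x}$ is marked on the x-axis.]
Note:
1. As we move farther from the mean, the bands get wider.
2. The prediction interval bands are wider. Why? (extra Syx)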
72
Any Questions?