1
Statistics for Business and Economics
Chapter 10 Simple Linear Regression
2
Learning Objectives
As a result of this class, you will be able to:
1. Describe the Linear Regression Model
2. State the Regression Modeling Steps
3. Explain Least Squares
4. Compute Regression Coefficients
5. Explain Correlation
6. Predict Response Variable
3
What is Regression?
1. Method of modeling relationships between a response variable Y and one or more predictors X
2. A way of “fitting a line through data”
4
Example 1: Store Site Selection
Model sales Y at existing sites as a function of demographic variables:
X1 = Population in store vicinity
X2 = Income in area
X3 = Age of houses in area
X4 =
X5 =
From the equation, predict sales at new sites.
5
Example 2: Marketing Research
Model consumer response to a product on the basis of product characteristics:
Y = Taste score on soft drink
X1 = Sugar level
X2 = Carbonation level
X3 =
X4 =
6
Example 3: Operations/Quality
Model product quality in a plant as a function of manufacturing process characteristics:
Y = Quality score of sheet metal
X1 = Raw material purity score
X2 = Molten aluminum temp
X3 = Line speed
X4 =
7
Example 4: Real Estate Pricing
Y = Selling price of houses
X1 = Square feet
X2 = Taxes
X3 = Lot acreage
X4 =
X5 =
X6 =
8
One-Predictor Regression
25 Homes Sold in Essex County, New Jersey, 1996
9
One-Predictor Regression
Regression Equation: Y= X
10
Regression Models
1. Answers ‘What is the relationship between the variables?’
2. Equation used
   One numerical dependent (response) variable: what is to be predicted
   One or more numerical or categorical independent (explanatory) variables
3. Used mainly for prediction and estimation
11
Prediction Using Regression
Y= (4000)= $300,999
12
Regression Modeling Steps
1. Hypothesize deterministic component
2. Estimate unknown model parameters
3. Specify probability distribution of random error term
4. Estimate standard deviation of error
5. Evaluate model
6. Use model for prediction and estimation
13
Specifying the Model
1. Define variables
   Conceptual (e.g., advertising, price)
   Empirical (e.g., list price, regular price)
   Measurement (e.g., $, units)
2. Hypothesize nature of relationship
   Expected effects (i.e., coefficients’ signs)
   Functional form (linear or non-linear)
   Interactions
14
Model Specification Is Based on Theory
1. Theory of field (e.g., Sociology)
2. Mathematical theory
3. Previous research
4. ‘Common sense’
15
Thinking Challenge: Which Is More Logical?
[Figure: four candidate plots of Sales (vertical axis) against Advertising (horizontal axis).]
Note: With a positive linear relationship, sales increase without limit as advertising increases. Discuss the concept of ‘relevant range’.
16
Types of Regression Models
Regression Models
   Simple (1 explanatory variable): Linear, Non-Linear
   Multiple (2+ explanatory variables): Linear, Non-Linear
Note: This typology is based on the number of explanatory variables and the nature of the relationship between X and Y.
17
Linear Regression Model
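Relationship between variables is a linear function:
\[ y = \beta_0 + \beta_1 x + \varepsilon \]
where y = dependent (response) variable, x = independent (explanatory) variable, \(\beta_0\) = population y-intercept, \(\beta_1\) = population slope, and \(\varepsilon\) = random error.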
18
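Line of Means
[Figure: the line of means, \(E(y) = \beta_0 + \beta_1 x\), plotted with y on the vertical axis and x on the horizontal axis; panel label: High school teacher.]
\(\beta_1\) = slope = (change in y) / (change in x)
\(\beta_0\) = y-intercept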
19
Linear Regression Model
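1. Relationship between variables is a linear function:
\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]
where \(Y_i\) = dependent (response) variable, \(X_i\) = independent (explanatory) variable, \(\beta_0\) = population Y-intercept, \(\beta_1\) = population slope, and \(\varepsilon_i\) = random error.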
20
Population Linear Regression Model
[Figure: observed values plotted around the population regression line; \(\varepsilon_i\) = random error for an observed value.]
21
Sample Linear Regression Model
[Figure: observed values plotted as Y against X, with the true, unknown population line.]
22
Sample Linear Regression Model
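[Figure: observed values plotted as Y against X, with the fitted sample line; \(\hat{\varepsilon}_i\) = estimated random error for an observed value.]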
23
Regression Modeling Steps
1. Hypothesize deterministic component
2. Estimate unknown model parameters
3. Specify probability distribution of random error term
4. Estimate standard deviation of error
5. Evaluate model
6. Use model for prediction and estimation
24
Scattergram
1. Plot of all (xi, yi) pairs
2. Suggests how well model will fit
[Figure: scattergram of y against x.]
25
Thinking Challenge
How would you draw a line through the points?
How do you determine which line ‘fits best’?
[Figure: scattergram of y against x.]
26
Least Squares
1. ‘Best fit’ means the differences between actual y values and predicted y values are a minimum
   But positive differences offset negative ones
2. Least squares minimizes the sum of the squared differences (SSE): \(\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2\)
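The point about offsetting differences can be checked numerically. The sketch below is illustrative only: it borrows the advertising and sales values from the worked example later in the deck, and the two alternative lines ("flat at mean" and "steeper") are hypothetical candidates chosen for comparison, not lines taken from the slides.

```python
# Advertising (x) and sales (y) values from the deck's worked example.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]

def sum_of_errors(b0, b1):
    """Plain sum of (actual y - predicted y) for the line y = b0 + b1*x."""
    return sum(yi - (b0 + b1 * xi) for xi, yi in zip(x, y))

def sse(b0, b1):
    """Sum of squared (actual y - predicted y) for the line y = b0 + b1*x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# (intercept, slope) pairs; only the first is the least squares line from the deck.
candidates = {
    "least squares": (-0.1, 0.7),
    "flat at mean (hypothetical)": (2.0, 0.0),
    "steeper (hypothetical)": (-1.0, 1.0),
}

for name, (b0, b1) in candidates.items():
    print(f"{name}: sum of errors = {sum_of_errors(b0, b1):.3f}, SSE = {sse(b0, b1):.3f}")
```

All three lines pass through the point of means, so their plain error sums all cancel to zero and cannot separate them; the squared errors can.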
27
Least Squares Graphically
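[Figure: residuals \(\hat{\varepsilon}_1\), \(\hat{\varepsilon}_2\), \(\hat{\varepsilon}_3\), \(\hat{\varepsilon}_4\) drawn as vertical distances between the observed points and the fitted line, plotted against x.]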
28
Coefficient Equations
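Prediction equation:
\[ \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i \]
Slope:
\[ \hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}} \]
y-intercept:
\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]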
29
Computation Table
xi      yi      xi²     yi²     xiyi
x1      y1      x1²     y1²     x1y1
x2      y2      x2²     y2²     x2y2
:       :       :       :       :
xn      yn      xn²     yn²     xnyn
Σxi     Σyi     Σxi²    Σyi²    Σxiyi
30
Interpretation of Coefficients
1. Slope (\(\hat{\beta}_1\))
   Estimated y changes by \(\hat{\beta}_1\) for each 1-unit increase in x
   If \(\hat{\beta}_1 = 2\), then Sales (y) is expected to increase by 2 for each 1-unit increase in Advertising (x)
2. Y-intercept (\(\hat{\beta}_0\))
   Average value of y when x = 0
   If \(\hat{\beta}_0 = 4\), then average Sales (y) is expected to be 4 when Advertising (x) is 0
31
Least Squares Example
You’re a marketing analyst for Hasbro Toys. You gather the following data:
Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4
Find the least squares line relating sales and advertising.
32
Scattergram Sales vs. Advertising
[Figure: scattergram of Sales (vertical axis, 1 to 4) against Advertising (horizontal axis, 1 to 5).]
33
Parameter Estimation Solution Table
xi      yi      xi²     yi²     xiyi
1       1       1       1       1
2       1       4       1       2
3       2       9       4       6
4       2       16      4       8
5       4       25      16      20
Totals:
15      10      55      26      37
34
Parameter Estimation Solution
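\[ \hat{\beta}_1 = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}} = \frac{37 - \frac{(15)(10)}{5}}{55 - \frac{(15)^2}{5}} = \frac{7}{10} = .70 \]
\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 2 - (.70)(3) = -.10 \]
\[ \hat{y} = -.1 + .7x \]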
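The same arithmetic can be reproduced in a few lines of code. This is a minimal sketch using only the column totals from the solution table; the variable names (`ss_xy`, `ss_xx`, `b0`, `b1`) are illustrative choices rather than notation from the slides.

```python
# Advertising (x) and sales (y) values from the solution table.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)

# Column totals of the computation table.
sum_x = sum(x)                                   # 15
sum_y = sum(y)                                   # 10
sum_x2 = sum(xi * xi for xi in x)                # 55
sum_xy = sum(xi * yi for xi, yi in zip(x, y))    # 37

# Shortcut formulas for the corrected sums of squares and cross-products.
ss_xy = sum_xy - sum_x * sum_y / n
ss_xx = sum_x2 - sum_x ** 2 / n

b1 = ss_xy / ss_xx                    # estimated slope
b0 = sum_y / n - b1 * (sum_x / n)     # estimated y-intercept

print(f"slope = {b1:.2f}, intercept = {b0:.2f}")
```

The result agrees with the slope of .7 and intercept of -.10 interpreted later in the deck.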
35
Parameter Estimation Computer Output
Parameter Estimates
Variable    DF    Parameter Estimate    Standard Error    T for H0: Param=0    Prob>|T|
INTERCEP
ADVERT
The INTERCEP estimate is \(\hat{\beta}_0\); the ADVERT estimate is \(\hat{\beta}_1\).
36
JMP “Fit Model” Results
37
Coefficient Interpretation Solution
1. Slope (\(\hat{\beta}_1\))
   Sales Volume (y) is expected to increase by .7 units for each $1 increase in Advertising (x)
2. Y-intercept (\(\hat{\beta}_0\))
   Average value of Sales Volume (y) is -.10 units when Advertising (x) is 0
   Difficult to explain to marketing manager
   Expect some sales without advertising
38
Regression Modeling Steps
1. Hypothesize deterministic component
2. Estimate unknown model parameters
3. Specify probability distribution of random error term
4. Estimate standard deviation of error
5. Evaluate model
6. Use model for prediction and estimation
39
Linear Regression Assumptions
1. Mean of probability distribution of error, ε, is 0
2. Probability distribution of error has constant variance
3. Probability distribution of error, ε, is normal
4. Errors are independent
40
Error Probability Distribution
[Figure: probability distribution of the error, f(ε), drawn around the regression line at X1 and X2; axes Y and X.]
43
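Estimating \(\sigma^2\)
Recall that:
\[ s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1} \]
where \(\bar{X}\) is our estimator of the mean.
Now substitute (1) \(Y_i\) for \(X_i\), (2) \(\hat{Y}_i\) for \(\bar{X}\), and n – 2 df for n – 1:
\[ s^2 = \frac{\sum (Y_i - \hat{Y}_i)^2}{n - 2} = \frac{SSE}{n - 2} \]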
44
Why are df = n – 2?
1. Two statistics are estimated to compute the regression line, \(\hat{\beta}_0\) and \(\hat{\beta}_1\)
2. As a result, if I know any n – 2 residuals, the other two can be computed from \(\sum \hat{\varepsilon}_i = 0\) and \(\sum \hat{\varepsilon}_i X_i = 0\).
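A short numerical check may make the n – 2 argument concrete. The sketch below assumes the advertising data and the fitted line (intercept -.1, slope .7) from the deck's example; it estimates s, confirms the two residual constraints, and then recovers the first two residuals from the remaining three. The variable names are illustrative.

```python
from math import sqrt

# Advertising (x) and sales (y) data, with the fitted line from the deck's example.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
b0, b1 = -0.1, 0.7
n = len(x)

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Estimate of the error variance: SSE divided by n - 2 degrees of freedom.
sse = sum(e ** 2 for e in residuals)
s2 = sse / (n - 2)
s = sqrt(s2)
print(f"SSE = {sse:.2f}, s^2 = {s2:.4f}, s = {s:.4f}")

# The two constraints that least squares residuals satisfy.
sum_e = sum(residuals)
sum_ex = sum(e * xi for e, xi in zip(residuals, x))
print(f"sum of residuals = {sum_e:.6f}, sum of residual*x = {sum_ex:.6f}")

# Pretend only the last n - 2 residuals are known and solve the two
# constraint equations for the first two.
known = residuals[2:]
rhs_plain = -sum(known)                                      # e1 + e2
rhs_weighted = -sum(e * xi for e, xi in zip(known, x[2:]))   # e1*x1 + e2*x2
e2 = (rhs_weighted - x[0] * rhs_plain) / (x[1] - x[0])
e1 = rhs_plain - e2
print(f"recovered e1 = {e1:.2f}, e2 = {e2:.2f}")
```

Because two of the five residuals are pinned down by the constraints, only three are free to vary, which is why SSE is divided by 3 here.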
45
Regression Modeling Steps
1. Hypothesize deterministic component
2. Estimate unknown model parameters
3. Specify probability distribution of random error term
4. Estimate standard deviation of error
5. Evaluate model
6. Use model for prediction and estimation
46
Test of Slope Coefficient
1. Shows if there is a linear relationship between x and y
2. Involves population slope \(\beta_1\)
3. Hypotheses
   H0: \(\beta_1 = 0\) (no linear relationship)
   Ha: \(\beta_1 \neq 0\) (linear relationship)
4. Theoretical basis is sampling distribution of slope
47
Sampling Distribution of Sample Slopes
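[Figure: Sample 1 line and Sample 2 line drawn around the population line, y against x.]
All possible sample slopes:
Sample 1: 2.5
Sample 2: 1.6
Sample 3: 1.8
Sample 4:
:
Very large number of sample slopes
[Figure: sampling distribution of \(\hat{\beta}_1\), labeled with \(\beta_1\) and \(S_{\hat{\beta}_1}\).]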
48
Slope Coefficient Test Statistic
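\[ t = \frac{\hat{\beta}_1}{S_{\hat{\beta}_1}}, \qquad S_{\hat{\beta}_1} = \frac{s}{\sqrt{SS_{xx}}}, \qquad SS_{xx} = \sum x_i^2 - \frac{(\sum x_i)^2}{n}, \qquad \text{df} = n - 2 \]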
49
Test of Slope Coefficient Example
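You’re a marketing analyst for Hasbro Toys. You find \(\hat{\beta}_0 = -.1\), \(\hat{\beta}_1 = .7\) and \(s = .6055\) (from SSE = 1.1 with n – 2 = 3 df).
Ad $ (x):            1   2   3   4   5
Sales in units (y):  1   1   2   2   4
Is the relationship significant at the .05 level of significance?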
50
Test Statistic Solution
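One way to work through the test is sketched below, assuming the advertising data and fitted slope from the example. The critical value 3.182 is the standard two-tailed t-table entry for .025 in each tail with 3 df; it is hard-coded here as an assumption so the sketch needs no statistics library, and the variable names are illustrative.

```python
from math import sqrt

# Advertising (x) and sales (y) data, with the fitted line from the deck's example.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
b0, b1 = -0.1, 0.7
n = len(x)

# Estimated standard deviation of the error, s = sqrt(SSE / (n - 2)).
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = sqrt(sse / (n - 2))

# Standard error of the slope, s / sqrt(SSxx).
x_bar = sum(x) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
se_b1 = s / sqrt(ss_xx)

# Test statistic for H0: beta1 = 0 against Ha: beta1 != 0.
t_stat = b1 / se_b1

# Assumed table value: t(.025) with n - 2 = 3 df.
t_crit = 3.182
reject_h0 = abs(t_stat) > t_crit

print(f"s = {s:.4f}, SE(slope) = {se_b1:.4f}, t = {t_stat:.3f}")
print("Reject H0 at the .05 level" if reject_h0 else "Do not reject H0 at the .05 level")
```

Since the computed t exceeds the table value, the data give evidence of a linear relationship at the .05 level, provided the regression assumptions hold.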
51
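JMP Fit Model Results
[Screenshot: JMP output annotated with \(\hat{\beta}_1\), its standard error \(S_{\hat{\beta}_1}\), the test statistic \(t = \hat{\beta}_1 / S_{\hat{\beta}_1}\), and the p-value.]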
52
Computing a CI for a Slope
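A \((1 - \alpha)100\%\) confidence interval for the true slope parameter \(\beta_1\) is given by:
\[ \hat{\beta}_1 \pm t_{\alpha/2}\, S_{\hat{\beta}_1}, \qquad \text{df} = n - 2 \]
This assumes that all of the standard regression assumptions are met!
Example from previous slide (95% interval, \(t_{.025} = 3.182\) with 3 df): \(.70 \pm (3.182)(.1915) = .70 \pm .609\), or (.091, 1.309).
‘Standard Error’ is the estimated standard deviation of the sampling distribution, \(S_{\hat{\beta}_1}\).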
53
Coefficient of Determination
Proportion of variation ‘explained’ by relationship between x and y
\(0 \le r^2 \le 1\)
\(r^2 = (\text{coefficient of correlation})^2\)
54
Coefficient of Determination Examples
55
Coefficient of Determination Example
You’re a marketing analyst for Hasbro Toys. You know r = .904.
Ad $ (x):            1   2   3   4   5
Sales in units (y):  1   1   2   2   4
Calculate and interpret the coefficient of determination.
56
Coefficient of Determination Solution
\(r^2 = (\text{coefficient of correlation})^2\)
\(r^2 = (.904)^2\)
\(r^2 = .817\)
Interpretation: About 81.7% of the sample variation in Sales (y) can be explained by using Ad $ (x) to predict Sales (y) in the linear model.
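The values of r and r² can be recomputed from the raw data as a cross-check. This is a minimal sketch assuming the advertising data and fitted line from the example; it also computes r² a second way, as one minus SSE over the total variation in y, which is an equivalent expression for simple linear regression. Variable names are illustrative.

```python
from math import sqrt

# Advertising (x) and sales (y) data, with the fitted line from the deck's example.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
b0, b1 = -0.1, 0.7
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Corrected sums of squares and cross-products.
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Pearson coefficient of correlation and coefficient of determination.
r = ss_xy / sqrt(ss_xx * ss_yy)
r2 = r ** 2

# Alternative route: proportion of total variation not left in the residuals.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
r2_from_sse = 1 - sse / ss_yy

print(f"r = {r:.3f}, r^2 = {r2:.3f}, 1 - SSE/SSyy = {r2_from_sse:.3f}")
```

Both routes give about .817, matching the slide.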
57
r² Computer Output
Root MSE        R-square
Dep Mean        Adj R-sq
C.V.
R-square is r²; Adj R-sq is r² adjusted for the number of explanatory variables and sample size.
58
JMP Fit Model Results
[Screenshot: JMP output annotated with r² and with r² adjusted for the number of explanatory variables and sample size.]
59
Correlation Models
1. Answer ‘How strong is the linear relationship between 2 variables?’
2. Coefficient of correlation used
   Population coefficient denoted ρ (rho)
   Values range from -1 to +1
   Measures degree of association
3. Used mainly for understanding
60
Sample Coefficient of Correlation
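1. Pearson product moment coefficient of correlation, r:
\[ r = \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}} \]
where \(SS_{xy} = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}\), \(SS_{xx} = \sum x_i^2 - \frac{(\sum x_i)^2}{n}\), and \(SS_{yy} = \sum y_i^2 - \frac{(\sum y_i)^2}{n}\).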
61
Coefficient of Correlation Values
[Figure: scale of r running from -1.0 through -.5 and +.5 to +1.0.]
r = -1.0: perfect negative correlation
r = 0: no correlation
r = +1.0: perfect positive correlation
62
Coefficient of Correlation Examples
63
Regression Modeling Steps
1. Hypothesize deterministic component
2. Estimate unknown model parameters
3. Specify probability distribution of random error term
4. Estimate standard deviation of error
5. Evaluate model
6. Use model for prediction and estimation
64
Prediction With Regression Models
1. Types of predictions
   Point estimates
   Interval estimates
2. What is predicted
   Population mean response E(y) for given x (point on population regression line)
   Individual response (yi) for given x
65
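What Is Predicted
[Figure: y against x, showing at x = xp the prediction \(\hat{y}\) on the fitted line \(\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x\), the mean of y on the population line \(E(y) = \beta_0 + \beta_1 x\), and an individual response y.]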
66
Confidence Interval Estimate for Mean Value of y at x = xp
\[ \hat{y} \pm t_{\alpha/2}\, s \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}} \]
df = n – 2
67
Factors Affecting Interval Width
1. Level of confidence \((1 - \alpha)\)
   Width increases as confidence increases
2. Data dispersion (s)
   Width increases as variation increases
3. Sample size
   Width decreases as sample size increases
4. Distance of xp from mean \(\bar{x}\)
   Width increases as distance increases
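The four factors can be illustrated by varying one input at a time in the interval half-width, t·s·√(1/n + (xp − x̄)²/SSxx). The sketch below assumes baseline values from the deck's advertising example; the helper `half_width` and the altered scenarios are hypothetical, and the t values 3.182 (95%) and 5.841 (99%) are standard table entries for 3 df. Changing n alone is a simplification, since in practice t and SSxx would change with it.

```python
from math import sqrt

def half_width(x_p, t_crit, s, n, x_bar, ss_xx):
    """Half-width of the confidence interval for the mean of y at x = x_p."""
    return t_crit * s * sqrt(1 / n + (x_p - x_bar) ** 2 / ss_xx)

# Baseline from the advertising example: n = 5, x-bar = 3, SSxx = 10, 95% confidence.
base = dict(x_p=4, t_crit=3.182, s=sqrt(1.1 / 3), n=5, x_bar=3, ss_xx=10)

baseline = half_width(**base)
higher_confidence = half_width(**{**base, "t_crit": 5.841})    # 99% instead of 95%
more_dispersion = half_width(**{**base, "s": 2 * base["s"]})   # hypothetical: s doubled
larger_sample = half_width(**{**base, "n": 10})                # hypothetical: n doubled, rest held fixed
farther_from_mean = half_width(**{**base, "x_p": 5})           # x_p two units from x-bar
at_mean = half_width(**{**base, "x_p": 3})                     # x_p equal to x-bar

for label, value in [
    ("baseline", baseline),
    ("higher confidence", higher_confidence),
    ("more dispersion", more_dispersion),
    ("larger sample", larger_sample),
    ("farther from mean", farther_from_mean),
    ("at the mean", at_mean),
]:
    print(f"{label}: half-width = {value:.3f}")
```

The interval is narrowest when xp equals the mean of x, which is why the bands in the later plots bow outward toward the ends.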
68
Confidence Interval Estimate Example
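You’re a marketing analyst for Hasbro Toys. You find \(\hat{\beta}_0 = -.1\), \(\hat{\beta}_1 = .7\) and \(s = .6055\) (from SSE = 1.1 with n – 2 = 3 df).
Ad $ (x):            1   2   3   4   5
Sales in units (y):  1   1   2   2   4
Find a 95% confidence interval for the mean sales when advertising is $4.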
69
Confidence Interval Estimate Solution
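x to be predicted: \(x_p = 4\)
\[ \hat{y} = -.1 + (.7)(4) = 2.7 \]
\[ \hat{y} \pm t_{\alpha/2}\, s \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}} = 2.7 \pm (3.182)(.6055)\sqrt{\frac{1}{5} + \frac{(4 - 3)^2}{10}} = 2.7 \pm 1.055 \]
\[ 1.645 \le E(y) \le 3.755 \]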
70
Prediction Interval of Individual Value of y at x = xp
\[ \hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}} \]
Note the 1 under the radical in the standard error formula. The effect of the extra s (Syx) is to increase the width of the interval. This will be seen in the interval bands.
Note! df = n – 2
71
Why the Extra ‘S’?
[Figure: y against x, showing at x = xp the fitted line \(\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i\), the population line \(E(y) = \beta_0 + \beta_1 x\), the expected (mean) y, the prediction \(\hat{y}\), the y we're trying to predict, and the random error ε.]
The error in predicting some future value of Y is the sum of 2 errors:
1. the error of estimating the mean Y, E(Y|X)
2. the random error that is a component of the value of Y to be predicted.
Even if we knew the population regression line exactly, we would still make an error.
72
Prediction Interval Solution
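x to be predicted: \(x_p = 4\)
\[ \hat{y} = -.1 + (.7)(4) = 2.7 \]
\[ \hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}} = 2.7 \pm (3.182)(.6055)\sqrt{1 + \frac{1}{5} + \frac{(4 - 3)^2}{10}} = 2.7 \pm 2.197 \]
\[ .503 \le y \le 4.897 \]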
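The confidence interval for the mean and the prediction interval for an individual value differ only by the extra 1 under the radical, which a side-by-side computation makes visible. The sketch below assumes the advertising data and fitted line from the example, and hard-codes 3.182 as the assumed t-table value for 95% confidence with 3 df; the variable names are illustrative.

```python
from math import sqrt

# Advertising (x) and sales (y) data, with the fitted line from the deck's example.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
b0, b1 = -0.1, 0.7
n = len(x)

x_bar = sum(x) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = sqrt(sse / (n - 2))

t_crit = 3.182   # assumed table value: t(.025) with n - 2 = 3 df
x_p = 4          # advertising level of interest
y_hat = b0 + b1 * x_p

# Interval for the MEAN of y at x_p: no extra 1 under the radical.
ci_half = t_crit * s * sqrt(1 / n + (x_p - x_bar) ** 2 / ss_xx)
conf_int = (y_hat - ci_half, y_hat + ci_half)

# Interval for an INDIVIDUAL y at x_p: the extra 1 accounts for the random error.
pi_half = t_crit * s * sqrt(1 + 1 / n + (x_p - x_bar) ** 2 / ss_xx)
pred_int = (y_hat - pi_half, y_hat + pi_half)

print(f"predicted y at x = {x_p}: {y_hat:.2f}")
print(f"95% CI for mean y:       ({conf_int[0]:.3f}, {conf_int[1]:.3f})")
print(f"95% PI for individual y: ({pred_int[0]:.3f}, {pred_int[1]:.3f})")
```

The prediction interval comes out roughly twice as wide as the confidence interval at the same x, consistent with the wider prediction bands described in the slides that follow.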
73
Interval Estimate Computer Output
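Obs    Dep Var SALES    Pred Value    Std Err Predict    Low95% Mean    Upp95% Mean    Low95% Predict    Upp95% Predict
Pred Value: predicted y when x = 4
Std Err Predict: \(S_{\hat{y}}\)
Low95% Mean to Upp95% Mean: confidence interval
Low95% Predict to Upp95% Predict: prediction interval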
74
Confidence Intervals v. Prediction Intervals
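[Figure: fitted line \(\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i\) with confidence interval bands and prediction interval bands, y against x.]
Note:
1. As we move farther from the mean, the bands get wider.
2. The prediction interval bands are wider. Why? (extra Syx)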
75
Confidence Intervals vs. Prediction Intervals in JMP
[Screenshot: JMP plot of the fitted line with confidence interval and prediction interval bands.]
Note:
1. As we move farther from the mean, the bands get wider.
2. The prediction interval bands are wider. Why? (extra Syx)
76
Conclusion
1. Described the Linear Regression Model
2. Stated the Regression Modeling Steps
3. Explained Least Squares
4. Computed Regression Coefficients
5. Used the model for prediction and estimation
6. Evaluated the model on the basis of significance, R-square, and size of prediction intervals
7. Discussed R-square and the correlation coefficient