Chapter 21: The Simple Regression Model. Copyright © 2014, 2011 Pearson Education, Inc.

Slide 1. Chapter 21: The Simple Regression Model

Slide 2. 21.1 The Simple Regression Model
What is the turnaround time for small orders of specialized parts?
- Use a simple regression model with response time as y and order size as x.
- Use inference related to regression: standard errors, confidence intervals, and hypothesis tests.

Slide 3. 21.1 The Simple Regression Model
Estimated Production Time = 172 + 2.44 × Number of Units

Slide 4. 21.1 The Simple Regression Model
- Simple Regression Model (SRM): a model for the association in the population between an explanatory variable x and a response y.
- Consider the data to be a sample from a population.

Slide 5. 21.1 The Simple Regression Model: Linear on Average
- The equation of the SRM describes how the conditional mean of Y depends on X.
- The SRM shows that these means lie on a line with intercept $\beta_0$ and slope $\beta_1$: $\mu_{y|x} = E(Y \mid X = x) = \beta_0 + \beta_1 x$

Slide 6. 21.1 The Simple Regression Model: Deviations from the Mean
- The deviations of the responses around the conditional mean $\mu_{y|x}$ are called errors.
- The error is denoted by $\varepsilon$, with $\varepsilon = y - \mu_{y|x}$ and $E(\varepsilon) = 0$.

Slide 7. 21.1 The Simple Regression Model: Deviations from the Mean
The SRM makes three assumptions about the errors $\varepsilon$:
1. Independent. Errors are independent of each other.
2. Equal variance. All errors have the same variance, $\mathrm{Var}(\varepsilon) = \sigma_\varepsilon^2$.
3. Normal. The errors are normally distributed.

Slide 8. 21.1 The Simple Regression Model: Data Generating Process
- Let Y denote monthly sales of a company and let X denote its spending on advertising (both in thousands of dollars).
- Assume the following population model: [equation not captured in transcript]

Slide 9. 21.1 The Simple Regression Model: Data Generating Process
The SRM assumes a normal distribution at each x.

Slide 10. 21.1 The Simple Regression Model: Data Generating Process
Eventually the data are observed. [scatterplot not captured in transcript]

Slide 11. 21.1 The Simple Regression Model: Data Generating Process
- The true regression line is a characteristic of the population, not the observed data.
- The SRM is a model and offers a simplified view of reality.

Slide 12. 21.1 The Simple Regression Model: Simple Regression Model (SRM)
Observed values of the response Y are linearly related to the values of the explanatory variable X by the equation $y = \beta_0 + \beta_1 x + \varepsilon$, with $\varepsilon \sim N(0, \sigma_\varepsilon^2)$. The observations are independent of one another, have equal variance around the regression line, and are normally distributed around the regression line.
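The data generating process described by the SRM can be sketched in a few lines of Python. This is only an illustration: the parameter values (`beta0=100`, `beta1=3`, `sigma=20`) and the x values are hypothetical and are not the population model used on the slides.

```python
import random

def simulate_srm(xs, beta0, beta1, sigma, seed=0):
    """Draw one response per x from y = beta0 + beta1*x + eps, eps ~ N(0, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]

# Hypothetical population values, chosen only for illustration.
xs = [50 + 5 * i for i in range(40)]
ys = simulate_srm(xs, beta0=100.0, beta1=3.0, sigma=20.0)

# The errors are the deviations from the true (population) line.
errors = [y - (100.0 + 3.0 * x) for x, y in zip(xs, ys)]
mean_error = sum(errors) / len(errors)
print(f"average simulated error: {mean_error:.2f}")
```

Because the errors have mean zero, their average in a simulated sample should be small relative to `sigma`, although it will not be exactly zero.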

Slide 13. 21.2 Conditions for the SRM: Checklist
- Is the association between Y and X linear?
- Have we ruled out lurking variables?
- Are the errors evidently independent?
- Are the variances of the residuals similar?
- Are the residuals nearly normal?

Slide 14. 21.2 Conditions for the SRM: Production Time Example
- Linearity satisfied; no pattern in the residuals.
- Similar variances satisfied; spread of residuals constant around the horizontal line.

Slide 15. 21.2 Conditions for the SRM: Production Time Example
- No obvious lurking variable. Without knowing more about the context, we can only guess at a lurking variable (e.g., complexity of the parts ordered).
- Evidently independent. Is there any reason to believe that the time needed for one run influences those of others? If the data are a time series, plot the residuals over time.

Slide 16. 21.2 Conditions for the SRM: Production Time Example
The nearly normal condition is satisfied. If it were not, the sample size condition (also satisfied here) would be needed in order to rely on the central limit theorem.

Slide 17. 21.2 Conditions for the SRM: Modeling Process
Before looking at plots, ask two questions:
1. Does a linear relationship make sense?
2. Is the relationship free of lurking variables?
Then begin working with data.

Slide 18. 21.2 Conditions for the SRM: Modeling Process
- Plot y versus x and verify a linear association.
- Fit the least squares line and obtain residuals.
- Plot the residuals versus x.
- If the data are a time series, construct a timeplot of the residuals.
- Inspect the histogram and quantile plot of the residuals.
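The "fit the least squares line and obtain residuals" step can be sketched as follows. The five data points are made up for illustration, and the function names `fit_line` and `residuals` are hypothetical helpers, not part of any library.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return the least squares intercept b0 and slope b1."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def residuals(xs, ys, b0, b1):
    """Observed response minus fitted value, one residual per observation."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_line(xs, ys)
res = residuals(xs, ys, b0, b1)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
```

The residuals from a least squares fit always sum to zero (up to rounding), so the plots in the later steps look for patterns and changes in spread rather than for a nonzero average.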

Slide 19. 21.3 Inference in Regression: Parameters and Estimates for SRM

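Slide 20. 21.3 Inference in Regression: Standard Errors
- Standard errors describe the sample-to-sample variability of $b_0$ and $b_1$.
- The estimated standard error of $b_1$ is $\mathrm{se}(b_1) = \dfrac{s_e}{\sqrt{n-1}\; s_x}$, where $s_e$ is the standard deviation of the residuals and $s_x$ is the standard deviation of x.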

Slide 21. 21.3 Inference in Regression: Estimated Standard Error of $b_1$
Influenced by:
- Standard deviation of the residuals. As it increases, the standard error increases.
- Sample size. As it increases, the standard error decreases.
- Standard deviation of x. As it increases, the standard error decreases.

Slide 22. 21.3 Inference in Regression
More variation in x leads to a better estimate of the slope.
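A small sketch makes the three influences on the standard error of the slope concrete. It assumes the usual textbook formula se(b1) = s_e / (sqrt(n - 1) * s_x); the numeric inputs are hypothetical.

```python
from math import sqrt

def se_slope(s_e, n, s_x):
    """Estimated standard error of the slope: s_e / (sqrt(n - 1) * s_x)."""
    return s_e / (sqrt(n - 1) * s_x)

# Hypothetical baseline: residual SD 10, 26 observations, SD of x equal to 2.
base = se_slope(s_e=10.0, n=26, s_x=2.0)
more_noise = se_slope(s_e=20.0, n=26, s_x=2.0)    # larger residual SD
more_data = se_slope(s_e=10.0, n=101, s_x=2.0)    # larger sample
more_spread = se_slope(s_e=10.0, n=26, s_x=4.0)   # more variation in x

print(base, more_noise, more_data, more_spread)
```

Doubling the residual standard deviation doubles the standard error, while doubling the spread of x halves it, which is why more variation in x yields a better estimate of the slope.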

Slide 23. 21.3 Inference in Regression: Results for Production Time Example

Slide 24. 21.3 Inference in Regression: Confidence Intervals
- The 95% confidence interval for $\beta_1$ is $b_1 \pm t_{0.025,\,n-2}\,\mathrm{se}(b_1)$.
- The 95% confidence interval for $\beta_0$ is $b_0 \pm t_{0.025,\,n-2}\,\mathrm{se}(b_0)$.

Slide 25. 21.3 Inference in Regression: Confidence Intervals, Production Time Example
The 95% confidence intervals for $\beta_1$ and $\beta_0$. [interval values not captured in transcript]

Slide 26. 21.3 Inference in Regression: Hypothesis Tests
- To test $H_0: \beta_1 = 0$, use $t = b_1 / \mathrm{se}(b_1)$.
- To test $H_0: \beta_0 = 0$, use $t = b_0 / \mathrm{se}(b_0)$.

Slide 27. 21.3 Inference in Regression: Hypothesis Tests, Production Time Example
- The t-statistic of 16.37, with a p-value below 0.0001, indicates that the slope is significantly different from zero.
- The t-statistic of 4.2, with a p-value of 0.0001, indicates that the intercept is significantly different from zero.

Slide 28. 21.3 Inference in Regression: Equivalent Inferences for SRM
We reject the claim that a parameter in the SRM equals zero with 95% confidence (or a 5% chance of Type I error) if
- zero lies outside the 95% confidence interval;
- the absolute value of the associated t-statistic is larger than 2; or
- the p-value reported with the t-statistic is less than 0.05.
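The equivalence between the interval check and the t-statistic check can be illustrated with the production time results quoted on the slides (slope 2.44 with t = 16.37, intercept 172 with t = 4.2). The sketch below backs out each standard error as estimate / t and uses a multiplier of 2 in place of the exact t quantile; both steps are approximations, and the helper names are hypothetical.

```python
def implied_standard_error(estimate, t_stat):
    """Back out the standard error from t = estimate / se."""
    return estimate / t_stat

def approx_interval(estimate, se, multiplier=2.0):
    """Approximate 95% interval: estimate +/- 2 standard errors."""
    return estimate - multiplier * se, estimate + multiplier * se

results = {}
for name, estimate, t_stat in [("slope", 2.44, 16.37), ("intercept", 172.0, 4.2)]:
    se = implied_standard_error(estimate, t_stat)
    lo, hi = approx_interval(estimate, se)
    results[name] = {
        "se": se,
        "interval": (lo, hi),
        "zero_outside_interval": not (lo <= 0.0 <= hi),
        "abs_t_above_2": abs(t_stat) > 2.0,
    }
    print(name, round(se, 3), (round(lo, 2), round(hi, 2)))
```

For both parameters the two checks agree: zero falls outside the approximate interval exactly when the absolute t-statistic exceeds 2.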

Slide 29. 4M Example 21.1: Locating a Franchise Outlet (Motivation)
Does traffic volume affect gasoline sales? How much more gasoline can be expected to be sold at a franchise location with an average of 40,000 drive-bys compared to one with an average of 32,000 drive-bys?

Slide 30. 4M Example 21.1: Locating a Franchise Outlet (Method)
Use sales data from a recent month obtained from 80 franchise outlets. The 95% confidence interval for 8,000 times the estimated slope will indicate how much more gas is expected to sell at the busier location.

Slide 31. 4M Example 21.1: Locating a Franchise Outlet (Method)
Association is linear; no obvious lurking variable.

Slide 32. 4M Example 21.1: Locating a Franchise Outlet (Mechanics)

Slide 33. 4M Example 21.1: Locating a Franchise Outlet (Mechanics)
Residual plot confirms similar variances.

Slide 34. 4M Example 21.1: Locating a Franchise Outlet (Mechanics)
Residuals appear normally distributed.

Slide 35. 4M Example 21.1: Locating a Franchise Outlet (Mechanics)
The 95% confidence interval for $\beta_1$ is approximately 0.188 to 0.285 gallons/car. Hence, a difference of 8,000 cars in daily traffic volume implies a difference in average daily sales of approximately 1,507 to 2,281 more gallons per day.
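The step from the slope interval to the sales interval is a rescaling: multiplying both endpoints by the 8,000-car difference. A minimal sketch, using the rounded endpoints quoted above (`scale_interval` is a hypothetical helper):

```python
def scale_interval(lo, hi, factor):
    """Scale a confidence interval by a positive constant."""
    return lo * factor, hi * factor

traffic_difference = 40_000 - 32_000  # cars per day
lo, hi = scale_interval(0.188, 0.285, traffic_difference)
print(f"{lo:,.0f} to {hi:,.0f} gallons per day")
```

With the rounded endpoints this gives roughly 1,504 to 2,280 gallons per day. The slide's 1,507 to 2,281 presumably comes from scaling the unrounded endpoints reported by software.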

Slide 36. 4M Example 21.1: Locating a Franchise Outlet (Message)
Based on a sample of 80 gas stations, we expect that a station located at a site with 40,000 drive-bys will sell, on average, from 1,507 to 2,281 more gallons of gas daily than a location with 32,000 drive-bys.

Slide 37. 21.4 Prediction Intervals: Leveraging the SRM
- Prediction interval: an interval designed to hold a fraction (usually 95%) of the values of the response for a given value of x.
- A prediction interval differs from a confidence interval because it makes a statement about the location of a new observation rather than a parameter of a population.

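Slide 38. 21.4 Prediction Intervals: Leveraging the SRM
The 95% prediction interval for $y_{new}$ is $\hat{y}_{new} \pm t_{0.025,\,n-2}\,\mathrm{se}(\hat{y}_{new})$, where $\hat{y}_{new} = b_0 + b_1 x_{new}$ and $\mathrm{se}(\hat{y}_{new}) = s_e \sqrt{1 + \dfrac{1}{n} + \dfrac{(x_{new} - \bar{x})^2}{(n-1)\,s_x^2}}$.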

Slide 39. 21.4 Prediction Intervals: Leveraging the SRM
- A simple approximation for a 95% prediction interval is $\hat{y}_{new} \pm 2 s_e$.
- Prediction intervals are reliable within the range of observed data. They are also sensitive to the assumptions of constant variance and normality.
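A prediction interval can be sketched from scratch as below. The sketch assumes the standard formula se(y_new) = s_e * sqrt(1 + 1/n + (x_new - x_bar)^2 / ((n - 1) * s_x^2)) and, to stay within the Python standard library, uses a multiplier of 2 by default instead of the exact t quantile. The data and the function name `prediction_interval` are hypothetical.

```python
from math import sqrt
from statistics import mean

def prediction_interval(xs, ys, x_new, t_crit=2.0):
    """Return (lower, y_hat, upper) for a new observation at x_new.

    t_crit defaults to 2, the rule-of-thumb multiplier; pass the exact
    t quantile with n - 2 degrees of freedom for an exact interval.
    """
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)  # equals (n - 1) * s_x^2
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_e = sqrt(sse / (n - 2))  # standard deviation of the residuals
    y_hat = b0 + b1 * x_new
    se_pred = s_e * sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)
    return y_hat - t_crit * se_pred, y_hat, y_hat + t_crit * se_pred

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9]
near = prediction_interval(xs, ys, x_new=3.5)   # at the mean of x
far = prediction_interval(xs, ys, x_new=12.0)   # well outside the data
print(near, far)
```

The interval is narrowest at the mean of x and widens as `x_new` moves away from it. The widening alone understates the risk of extrapolation, since the formula still assumes the SRM holds outside the observed range.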

Slide 40. 21.4 Prediction Intervals: Leveraging the SRM, Production Time Example
At x = 300 units, $\hat{y}$ = 904.65 minutes. The resulting 95% prediction interval is 660.9 to 1,148.4 minutes.
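The numbers in this example can be checked against the fitted equation quoted earlier in the deck (172 + 2.44 × Number of Units). The sketch below is plain arithmetic; the final line's "implied standard error" assumes a multiplier of about 2 and is only a rough figure.

```python
b0, b1 = 172.0, 2.44          # rounded coefficients from the fitted line
x_new = 300                   # order size in units

y_hat_rounded = b0 + b1 * x_new          # prediction from rounded coefficients
lower, upper = 660.9, 1148.4             # interval reported on the slide
center = (lower + upper) / 2             # midpoint of the reported interval
half_width = (upper - lower) / 2         # distance from midpoint to either end
rough_se = half_width / 2                # assumes a multiplier of about 2

print(y_hat_rounded, center, half_width, rough_se)
```

The rounded coefficients give about 904 minutes, while the reported interval is centered on 904.65, so the small gap is consistent with rounding in the displayed equation. The half-width of about 244 minutes suggests a prediction standard error of roughly 120 minutes.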

Slide 41. 21.4 Prediction Intervals: Leveraging the SRM, Production Time Example
95% prediction intervals hold about 95% of the data if the SRM holds.

Slide 42. 21.4 Prediction Intervals: Reliability of Prediction Intervals
Prediction intervals fail when the SRM does not hold. This is the problem with extrapolation: outside the range of the observed data there is no way to check that the SRM still applies.

Slide 43. 4M Example 21.2: Managing Natural Resources (Motivation)
In managing commercial fishing fleets, the level of effort (number of boat-days) is assumed to influence the size of the catch. What is the predicted crab catch in a season with 7,500 days of effort?

Slide 44. 4M Example 21.2: Managing Natural Resources (Method)
Use regression with Y equal to the catch of Dungeness crabs near Vancouver Island from 1980 to 2007, measured in thousands of pounds, and X equal to the level of effort (total number of days by boats catching Dungeness crabs).

Slide 45. 4M Example 21.2: Managing Natural Resources (Method)
Linear association is evident.

Slide 46. 4M Example 21.2: Managing Natural Resources (Mechanics)

Slide 47. 4M Example 21.2: Managing Natural Resources (Mechanics)
Evidently independent.

Slide 48. 4M Example 21.2: Managing Natural Resources (Mechanics)
Similar variances confirmed.

Slide 49. 4M Example 21.2: Managing Natural Resources (Mechanics)
Nearly normal condition could be satisfied.

Slide 50. 4M Example 21.2: Managing Natural Resources (Mechanics)
The t-statistic (and p-value) indicate that the slope is significantly different from zero. The predicted catch in a year with x = 7,500 days of effort is 1,173.25 thousand pounds. The exact 95% prediction interval (from software) is from 908.44 to 1,438.11 thousand pounds.

Slide 51. 4M Example 21.2: Managing Natural Resources (Message)
There is a statistically significant linear association between days of effort and total catch. On average, each additional day of effort (per boat) increases the harvest by about 160 pounds. In a season with 7,500 days of effort, there is an expected total harvest of about 1.2 million pounds. There is a 95% probability that the catch will be between 910,000 and 1.4 million pounds.

Slide 52. Best Practices
- Verify that your model makes sense, both visually and substantively.
- Consider other possible explanatory variables.
- Check the conditions, in the listed order.

Slide 53. Best Practices (Continued)
- Use confidence intervals to express what you know about the slope and intercept.
- Check the assumptions of the SRM carefully before using prediction intervals.
- Be careful when extrapolating.

Slide 54. Pitfalls
- Don't overreact to residual plots.
- Do not mistake varying amounts of data for unequal variances.
- Do not confuse confidence intervals with prediction intervals.
- Do not expect that $r^2$ and $s_e$ must improve with a larger sample.

