Lecture 8: Simple Linear Regression (cont.). Section 10.1. Objectives: statistical model for linear regression, data for simple linear regression, estimation.


Lecture 8 Simple Linear Regression (cont.)

Section 10.1 Objectives:
- Statistical model for linear regression
- Data for simple linear regression
- Estimation of the parameters
- Confidence intervals and significance tests
- Confidence intervals for the mean response vs. prediction intervals (for a future observation)

Settings of Simple Linear Regression We now think of the least-squares regression line computed from the sample as an estimate of the true regression line for the population. The notation differs from Ch. 2: think b0 = a, b1 = b.

The statistical model for simple linear regression: Data: n observations in the form (x1, y1), (x2, y2), …, (xn, yn). The model is yi = β0 + β1 xi + εi, where the deviations εi are assumed to be independent and normally distributed with mean 0 and constant standard deviation σ. The parameters of the model are β0, β1, and σ.
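To make the model concrete, here is a minimal Python sketch (the lecture's own code is in R) that simulates data from yi = β0 + β1 xi + εi with made-up parameter values and then recovers the least-squares estimates:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" population parameters (assumptions for this demo)
beta0, beta1, sigma = 2.0, 0.5, 1.0

n = 30
x = rng.uniform(0, 10, size=n)
eps = rng.normal(0, sigma, size=n)     # independent N(0, sigma) deviations
y = beta0 + beta1 * x + eps            # the simple linear regression model

# Least-squares estimates b0, b1 of beta0, beta1
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
print(b0, b1)   # estimates should land near the true values 2.0 and 0.5
```

Note that the fitted line always passes through the point of means (x̄, ȳ), which is a quick sanity check on any hand computation.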

ANOVA: groups with same SD and different means:

Linear regression: many groups with means depending linearly on quantitative x

Example 10.1, page 636. See R code.

Verifying the conditions for inference: Look at the errors. They are assumed to be independent, normally distributed, and of constant variance. The errors are estimated using the residuals: (y − ŷ).
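Computing the residuals can be sketched as follows (a Python illustration with hypothetical toy data; the lecture itself uses R):

```python
import numpy as np

# Hypothetical toy data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 3.4, 4.4, 4.9, 6.2, 6.5, 7.3])

# Least-squares fit
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

yhat = b0 + b1 * x
resid = y - yhat    # residuals (y - y-hat) estimate the unobserved errors

# Two algebraic facts about least-squares residuals:
print(round(resid.sum(), 10))         # they sum to zero
print(round(np.sum(resid * x), 10))   # and are uncorrelated with x
```

These residuals are what one would plot against x (residual plot) and in a normal quantile plot to check the conditions.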

Residual plot: The spread of the residuals is reasonably random, with no clear pattern; the relationship is indeed linear. But we see one low residual (3.8, −4) and one potentially influential point (2.5, 0.5). Normal quantile plot for the residuals: The plot is fairly straight, supporting the assumption of normally distributed residuals. ⇒ Data okay for inference.

Confidence intervals for the regression parameters Estimating the regression parameters β0, β1 is a case of one-sample inference with unknown population variance, so we rely on the t distribution with n − 2 degrees of freedom. A level C confidence interval for the slope β1 is b1 ± t* SE_b1, where SE_b1 is the standard error of the least-squares slope. A level C confidence interval for the intercept β0 is b0 ± t* SE_b0, where SE_b0 is the standard error of the least-squares intercept. Here t* is the critical value of the t(n − 2) distribution with area C between −t* and +t*.
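As an illustration of these interval formulas, here is a Python sketch with hypothetical toy data (the lecture itself obtains these intervals with R's confint()); the standard-error formulas used are the standard textbook ones, which the slide does not spell out:

```python
import numpy as np
from scipy import stats

# Hypothetical toy data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 3.4, 4.4, 4.9, 6.2, 6.5, 7.3])
n = len(x)

# Least-squares estimates
Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()

# Residual standard error s (the estimate of sigma), on n - 2 df
resid = y - (b0 + b1 * x)
s = np.sqrt(np.sum(resid ** 2) / (n - 2))

# Standard errors of slope and intercept
SE_b1 = s / np.sqrt(Sxx)
SE_b0 = s * np.sqrt(1 / n + x.mean() ** 2 / Sxx)

# 95% confidence intervals: estimate ± t* SE, with t* from t(n - 2)
tstar = stats.t.ppf(0.975, df=n - 2)
ci_b1 = (b1 - tstar * SE_b1, b1 + tstar * SE_b1)
ci_b0 = (b0 - tstar * SE_b0, b0 + tstar * SE_b0)
print(ci_b1, ci_b0)
```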

Significance test for the slope We can test the hypothesis H0: β1 = 0 against a one- or two-sided alternative. We calculate t = b1 / SE_b1, which has the t(n − 2) distribution under H0, and use it to find the p-value of the test. Note: software typically provides two-sided p-values.
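The test statistic and its two-sided p-value can be sketched in Python (hypothetical toy data; the lecture uses R):

```python
import numpy as np
from scipy import stats

# Hypothetical toy data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 3.4, 4.4, 4.9, 6.2, 6.5, 7.3])
n = len(x)

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)
s = np.sqrt(np.sum(resid ** 2) / (n - 2))
SE_b1 = s / np.sqrt(Sxx)

# Test H0: beta1 = 0 against the two-sided alternative
t = b1 / SE_b1
p = 2 * stats.t.sf(abs(t), df=n - 2)   # two-sided p-value from t(n - 2)
print(t, p)
```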

Testing the hypothesis of no relationship We may look for evidence of a significant relationship between the variables x and y in the population from which our data were drawn. For that, we test whether the regression slope parameter is equal to zero: H0: β1 = 0 vs. Ha: β1 ≠ 0. Testing H0: β1 = 0 also allows us to test the hypothesis of no correlation between x and y in the population. Note: a test of hypothesis for β0 is usually irrelevant (β0 is often not even meaningful, since x = 0 may lie outside the range of the data).

Using technology Computer software runs all the computations for regression analysis. Here is software output for the car speed/gas efficiency example, showing the coefficient estimates and p-values for the slope and intercept. The t-test for the regression slope is highly significant (p < 0.001): there is a significant relationship between average car speed and gas efficiency. To obtain confidence intervals in R, use the confint() function.

Exercise: Calculate (by hand) a confidence interval for the mean increase in gas consumption per unit increase in log(mph). Compare with the output of confint(model.2_logmodel), which reports the 2.5 % and 97.5 % limits for LOGMPH.

Confidence interval for μ_y Using inference, we can also calculate a confidence interval for the population mean μ_y of all responses y when x takes the value x* (within the range of the data tested). This interval is centered on ŷ, the unbiased estimate of μ_y. The true value of the population mean μ_y at a given value of x will be within our confidence interval in C% of all intervals calculated from many different random samples.

The level C confidence interval for the mean response μ_y at a given value x* of x is centered on ŷ (the unbiased estimate of μ_y): ŷ ± t* SE_μ̂, where t* is the critical value of the t(n − 2) distribution with area C between −t* and +t*. A separate confidence interval is calculated for μ_y at each value that x takes; graphically, the series of confidence intervals appears as a continuous band on either side of ŷ. (Figure: 95% confidence interval for μ_y.)
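A Python sketch of this interval, with hypothetical toy data; it uses the standard textbook formula SE_μ̂ = s · sqrt(1/n + (x* − x̄)² / Σ(xi − x̄)²), which the slide does not spell out:

```python
import numpy as np
from scipy import stats

# Hypothetical toy data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 3.4, 4.4, 4.9, 6.2, 6.5, 7.3])
n = len(x)

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)
s = np.sqrt(np.sum(resid ** 2) / (n - 2))

# 95% CI for the mean response mu_y at a chosen x* inside the data range
xstar = 5.0
yhat = b0 + b1 * xstar
SE_mu = s * np.sqrt(1 / n + (xstar - x.mean()) ** 2 / Sxx)
tstar = stats.t.ppf(0.975, df=n - 2)
ci_mu = (yhat - tstar * SE_mu, yhat + tstar * SE_mu)
print(ci_mu)
```

Repeating this at every x in the data range traces out the confidence band around the regression line.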

Inference for prediction One use of regression is predicting the value of y, ŷ, for any value of x within the range of the data tested: ŷ = b0 + b1 x. But the regression equation depends on the particular sample drawn, so more reliable predictions require statistical inference. To estimate an individual response y for a given value of x, we use a prediction interval. If we randomly sampled many times, the values of y obtained at a particular x would vary following N(0, σ) around the mean response μ_y.

The level C prediction interval for a single observation on y when x takes the value x* is: ŷ ± t* SE_ŷ, where t* is the critical value of the t(n − 2) distribution with area C between −t* and +t*. The prediction interval mainly reflects the error from the normal distribution of the residuals εi. Graphically, the series of prediction intervals appears as a continuous band on either side of ŷ. (Figure: 95% prediction interval for ŷ.)
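A sketch of the prediction interval alongside the mean-response interval, with the same hypothetical toy data; the prediction SE uses the standard formula with an extra "1 +" under the square root, which accounts for the scatter of an individual observation around μ_y:

```python
import numpy as np
from scipy import stats

# Hypothetical toy data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 3.4, 4.4, 4.9, 6.2, 6.5, 7.3])
n = len(x)

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)
s = np.sqrt(np.sum(resid ** 2) / (n - 2))
tstar = stats.t.ppf(0.975, df=n - 2)

xstar = 5.0
yhat = b0 + b1 * xstar
# Mean-response SE vs. prediction SE: note the extra "1 +" under the root
SE_mu = s * np.sqrt(1 / n + (xstar - x.mean()) ** 2 / Sxx)
SE_pred = s * np.sqrt(1 + 1 / n + (xstar - x.mean()) ** 2 / Sxx)

ci = (yhat - tstar * SE_mu, yhat + tstar * SE_mu)
pi = (yhat - tstar * SE_pred, yhat + tstar * SE_pred)
print(ci, pi)   # the prediction interval is always the wider one
```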

The confidence interval for μ_y contains, with C% confidence, the population mean μ_y of all responses at a particular value of x. The prediction interval contains C% of all the individual values taken by y at a particular value of x. Estimating μ_y therefore uses a narrower interval than predicting an individual response in the population, since the sampling distribution of the mean is narrower than the population distribution. (Figure: 95% confidence interval for μ_y and 95% prediction interval for ŷ.)

1918 flu epidemic The line graph suggests that 7 to 9% of those diagnosed with the flu died within about a week of diagnosis. We look at the relationship between the number of deaths in a given week and the number of new diagnosed cases one week earlier.

1918 flu epidemic: relationship between the number of deaths in a given week and the number of new diagnosed cases one week earlier. (Excel regression output shown; recoverable figures: r = 0.91, adjusted R² = 0.82.) The p-value for H0: β1 = 0 is very small ⇒ reject H0 ⇒ β1 is significantly different from 0. There is a significant relationship between the number of flu cases and the number of deaths from flu a week later.

(Figure: least-squares regression line with 95% confidence band for μ_y and 95% prediction band for ŷ.) CI for the mean weekly death count one week after 4000 flu cases are diagnosed: μ_y within about 300–380 deaths. Prediction interval for a single weekly death count one week after 4000 flu cases are diagnosed: ŷ within about 180–500 deaths.

What is this? A 90% prediction interval for the height (above) and a 90% prediction interval for the weight (below) of male children, ages 3 to 18.