Inference for Regression


STAT E-150 Statistical Methods

We have discussed how to find a simple linear regression model and use it to make predictions. Now we will investigate the model further:
- How do we evaluate the effectiveness of the model?
- How do we know that the relationship is significant?
- How much of the variability in the response variable can be explained by its relationship to the predictor?
The simple linear model has the form y = β0 + β1x + ε, where ε ~ N(0, σε). Recall that inference methods can address questions about the population based on our sample data.

Consider our earlier example: Medical researchers have noted that adolescent females are more likely to deliver low-birthweight babies than are adult females. Because LBW babies tend to have higher mortality rates, studies have been conducted to examine the relationship between birthweight and the mother's age. We found the fitted regression model ŷ = -1163.45 + 245.15x, where x is the mother's age in years and ŷ is the predicted birthweight in grams.

How can we assess this model? If the slope β1 is equal to zero, then there is no change in the mean response associated with a change in the predictor. We will therefore test the value of the slope to investigate whether there is a linear relationship between the two variables.

t-Test for the Slope of a Simple Linear Model
H0: β1 = 0
Ha: β1 ≠ 0
The test statistic is t = b1 / SE(b1), which follows a t-distribution with n - 2 degrees of freedom when H0 is true.
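As a quick sketch, the slope t statistic can be computed directly from the regression output. The numbers below (b1 = 245.15, SE = 45.908, n = 10) are the summary values from the birthweight example in these slides; the comparison against the tabled critical value stands in for looking up the p-value.

```python
# Slope t-test sketch using summary values from the birthweight example.
b1 = 245.15      # sample slope
se_b1 = 45.908   # standard error of the slope
n = 10           # sample size

t_stat = b1 / se_b1              # t = b1 / SE(b1)
t_crit = 2.306                   # two-sided 5% critical value, df = n - 2 = 8

print(round(t_stat, 2))          # observed t statistic
print(abs(t_stat) > t_crit)      # is the slope significant at the 5% level?
```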

Assumptions for the model and the errors:
1. Linearity Assumption
Straight Enough Condition: Does the scatterplot appear linear?
Check the residuals to see if they appear to be randomly scattered.
Quantitative Data Condition: Are the data quantitative?

Assumptions for the model and the errors:
2. Independence Assumption: the errors must be mutually independent.
Randomization Condition: the individuals are a random sample.
Check the residuals for patterns, trends, or clumping.

Assumptions for the model and the errors:
3. Equal Variance Assumption: the variability of y should be about the same for all values of x.
Does the Plot Thicken? Condition: Is the spread about the line nearly constant in the scatterplot?
Check the residuals for any patterns.

Assumptions for the model and the errors:
4. Normal Population Assumption: the errors follow a Normal model at each value of x.
Nearly Normal Condition: Look at a histogram or NPP of the residuals.

We can check the conditions to see whether all of the assumptions are reasonable. If so, the idealized regression line has a distribution of y-values for each x-value, and these distributions are approximately normal with equal variation and with means along the regression line.

Here are the steps:
1. Create a scatterplot to see if the data are "straight enough".
2. Fit a regression model and find the residuals and predicted values (ŷ).
3. Draw a scatterplot of the residuals vs. x or ŷ; it should show no pattern, bend, thickening, thinning, or outliers.
4. If the scatterplot is "straight enough", create a histogram and NPP of the residuals to check the "nearly normal" condition.
5. Continue with the inference if all conditions are reasonably satisfied.
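The fitting step can be sketched in pure Python. The (x, y) pairing below is hypothetical, since the slides list the ages and birthweights without showing the full pairing; it only illustrates the mechanics of computing fitted values and residuals for the diagnostic plots.

```python
# Hypothetical maternal-age / birthweight pairs (pairing is illustrative only).
x = [15, 17, 18, 16, 19, 15, 17, 16, 18, 19]
y = [2289, 3393, 3271, 2648, 2897, 3327, 2970, 2535, 3138, 3573]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares estimates of the slope and intercept
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Fitted values and residuals, for the diagnostic plots described above
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
print(round(sum(residuals), 6))  # least-squares residuals sum to (nearly) zero
```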

Here are the results for our data: - The scatterplot of the data indicates a positive linear relationship between the variables  

- The scatterplot of the residuals shows no particular pattern  

- The Normal Probability Plot for the residuals indicates a Normal distribution

Here are our data and hypotheses:
H0: β1 = 0
Ha: β1 ≠ 0
Observation: 1 2 3 4 5 6 7 8 9 10
Maternal Age (in years): 15 17 18 16 19
Birthweight (in grams): 2289 3393 3271 2648 2897 3327 2970 2535 3138 3573

The SPSS output included this table. What does the value -1163.45 represent? This is the observed (sample) value of the y-intercept.

What does the value 245.15 represent? This is the observed (sample) value of the slope.

What does the value 45.908 represent? This is the standard error for the slope: how much we expect the sample slope to vary from one sample to another.

What is the value of the test statistic? t = b1 / SE(b1) = 245.15 / 45.908 = 5.34

What is the p-value? p = .001
What is your decision? Since p is small, reject the null hypothesis.
What can you conclude? The data indicate that there is a linear relationship between the mother's age and the baby's birthweight.

Confidence Interval for the Slope
The slope of the population regression line, β1, is the rate of change of the mean response as the explanatory variable increases. The slope of the least squares line, b1, is an estimate of β1. A confidence interval for the slope shows how accurate this estimate is.

What is the confidence interval for the slope of the regression line? (139.285, 351.015)

Confidence Interval for the Slope of a Simple Linear Model
The confidence interval has the form b1 ± t* · SE(b1), where t* is the critical value for the t(n-2) density curve corresponding to the desired confidence level. How can we construct this?

Confidence Interval for the Slope of a Simple Linear Model
The confidence interval has the form b1 ± t* · SE(b1). We know that b1 = 245.15 and that SE(b1) = 45.908, but how do we find t*?

Confidence Interval for the Slope of a Simple Linear Model
The confidence interval has the form b1 ± t* · SE(b1). We know that df = n - 2 = 8. On the line for df = 8 in the t-table, the value of t* for a 95% confidence interval is 2.306.

Calculate: b1 ± t* · SE(b1) = 245.15 ± 2.306(45.908) = 245.15 ± 105.86 = (139.29, 351.01)
This agrees with the interval found by SPSS, (139.285, 351.015), up to rounding.
What does this interval tell us? Based on the sample data, we are 95% confident that the true average increase in the weight of the baby associated with a one-year increase in the age of the mother is between 139.29 and 351.01 g.
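The hand calculation for the confidence interval is easy to reproduce in a few lines of Python, using the slope estimate and standard error from the SPSS table:

```python
b1, se_b1 = 245.15, 45.908   # slope estimate and its standard error
t_star = 2.306               # t critical value for df = 8, 95% confidence

margin = t_star * se_b1
lower, upper = b1 - margin, b1 + margin
print(round(lower, 2), round(upper, 2))   # 95% CI for the slope
```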

Partitioning Variability - ANOVA   ANOVA measures the effectiveness of the model by measuring how much of the variability in the response variable y is explained by the predictions based on the fitted model. We can partition this variability into two parts: the variability explained by the model, and the unexplained variability due to error, as measured by the residuals.

In our SPSS output:
SS(Model) = 1201970.45
SS(Error) = 337212.45
SS(Total) = 1539182.90
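A quick check, using the sums of squares above, that the variability partitions as claimed and that the resulting F statistic matches the SPSS value:

```python
ss_model = 1201970.45
ss_error = 337212.45
ss_total = 1539182.90
n = 10

# SS(Model) + SS(Error) should equal SS(Total)
print(abs(ss_model + ss_error - ss_total) < 1e-6)

ms_model = ss_model / 1        # model df = 1 (one predictor)
ms_error = ss_error / (n - 2)  # error df = n - 2 = 8
f_stat = ms_model / ms_error
print(round(f_stat, 3))        # F statistic
```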

What is the value of the test statistic? F = MS(Model) / MS(Error) = 28.515
What is the p-value? .001
Decision: Since p is small, reject the null hypothesis.
Conclusion: The data indicate that there is a linear relationship between the mother's age and the baby's birthweight.

The Coefficient of Determination
This represents the proportion of variation in the response variable that can be explained by the model: r² = SS(Model) / SS(Total). In our example, r² = 1201970.45 / 1539182.90 ≈ .781. This tells us that about 78.1% of the variation in the birthweights can be explained by the model.
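Computed directly from the ANOVA sums of squares:

```python
ss_model, ss_total = 1201970.45, 1539182.90
r_squared = ss_model / ss_total
print(round(r_squared, 3))   # proportion of variation explained by the model
```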

Inference for Correlation
There is a relationship between the correlation between the variables and the slope of the least squares line: b1 = r(sy / sx). So testing that the slope is zero is the same as testing that there is no correlation between the variables; that is, that the population correlation coefficient ρ = 0.

t-Test for Correlation
H0: ρ = 0
Ha: ρ ≠ 0
The test statistic is t = r√(n - 2) / √(1 - r²). If the conditions of the simple linear model hold, we find the p-value using the t-distribution with n - 2 degrees of freedom.

The three tests we have discussed are equivalent; it can be shown that the test statistic F is the square of the test statistic t, F = t². In our t-test, we found t = 5.34; in the ANOVA, we found F = 28.515; and t² = (5.34)² = 28.5156 ≈ F. Also, for both tests we found p = .001.
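These equivalences can be verified numerically; r below is recovered from r² = SS(Model)/SS(Total) ≈ 0.781, taking the positive root since the slope is positive:

```python
import math

t_stat = 5.34
f_stat = 28.515
print(round(t_stat ** 2, 4))   # t squared, matching F up to rounding

# The correlation t-test reproduces the slope t statistic
n = 10
r = math.sqrt(0.781)           # correlation recovered from r-squared
t_from_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
print(round(t_from_r, 2))
```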

Standard Errors for Predicted Values
Let xν be a specific value of x (the subscript is the Greek letter nu). The predicted value of y at xν is ŷν = b0 + b1xν. We can create two different intervals:
- a prediction interval for an individual value of y at xν
- a confidence interval for the mean predicted value at xν

The basic format for an interval is ŷν ± t* · SE. When we want to estimate the mean predicted value, SE(μ̂ν) = sε √(1/n + (xν - x̄)² / Σ(x - x̄)²). When we want to predict an individual value, SE(ŷν) = sε √(1 + 1/n + (xν - x̄)² / Σ(x - x̄)²). Since individual values vary more than means, SE(ŷν) > SE(μ̂ν), and the prediction interval will be wider than the confidence interval.
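A sketch of the two standard errors: sε comes from the ANOVA table (sε = √MS(Error)), while the x summaries (xν, x̄, Σ(x - x̄)²) are hypothetical placeholders, since the slides don't report them.

```python
import math

s_e = math.sqrt(337212.45 / 8)      # residual standard error from the ANOVA table
n = 10
x_nu, x_bar, sxx = 17, 17.0, 20.0   # hypothetical x summaries

leverage = 1 / n + (x_nu - x_bar) ** 2 / sxx
se_mean = s_e * math.sqrt(leverage)            # for the CI of the mean response
se_individual = s_e * math.sqrt(1 + leverage)  # for the prediction interval

print(se_individual > se_mean)   # prediction interval is always wider
```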