Simple Linear Regression: Conditions, Confidence Intervals, Prediction Intervals. Sections 9.1, 9.2, 9.3. Professor Kari Lock Morgan, Duke University


To Do: Homework 8 (due Monday, 4/9); Project 2 Proposal (due Wednesday, 4/11)

Hypothesis Test > 2*pt(16.21, 5, lower.tail=FALSE) [1] 1.6e-05 There is strong evidence that the slope is significantly different from 0, and that there is an association between cricket chirp rate and temperature.
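The R output above can be cross-checked in plain Python. This is a sketch using the closed-form CDF of the t-distribution with 5 degrees of freedom (a closed form exists for odd degrees of freedom), so no statistics library is needed:

```python
import math

def t5_upper_tail(t):
    """P(T > t) for a t-distribution with 5 degrees of freedom.
    Uses the closed-form CDF for df = 5, with x = t / sqrt(5):
    F(t) = 1/2 + (8/(3*pi)) * (x/(4(1+x^2)^2) + 3x/(8(1+x^2)) + (3/8)*atan(x))
    """
    x = t / math.sqrt(5)
    cdf = 0.5 + (8 / (3 * math.pi)) * (
        x / (4 * (1 + x ** 2) ** 2)
        + 3 * x / (8 * (1 + x ** 2))
        + (3 / 8) * math.atan(x)
    )
    return 1.0 - cdf

# Two-sided p-value for the slope test, as in R's 2*pt(16.21, 5, lower.tail=FALSE)
p_value = 2 * t5_upper_tail(16.21)
print(p_value)  # about 1.6e-05
```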

Correlation Test: Test for a correlation between temperature and cricket chirps (r = ). > 2*pt(16.21, 5, lower.tail=FALSE) [1] 1.6e-05

Two Quantitative Variables The t-statistic (and p-value) for a test for a non-zero slope and a test for a non-zero correlation are identical! They are equivalent ways of testing for an association between two quantitative variables.

Simple Linear Regression: Simple linear regression estimates the population model y = β₀ + β₁x + ε with the sample model ŷ = b₀ + b₁x

Conditions: Inference based on the simple linear model is only valid if the following conditions hold: 1) Linearity 2) Constant variability of residuals 3) Normality of residuals 4) Independence

The relationship between x and y is linear (it makes sense to draw a line through the scatterplot) Linearity

Dog Years: 1 dog year = 7 human years. Linear: human age = 7 × dog age. Happy Birthday Charlie! From: “The old rule-of-thumb that one dog year equals seven years of a human life is not accurate. The ratio is higher with youth and decreases a bit as the dog ages.” [Plot: the linear 7-to-1 rule vs. actual dog aging] A linear model can still be useful, even if it doesn’t perfectly fit the data.

“All models are wrong, but some are useful” - George Box

Simple Linear Model

Residual Plot: A residual plot is a scatterplot of the residuals against the predicted responses. It should have: 1) no obvious pattern or trend 2) constant variability

Residual Plots: [two example residual plots: one with an obvious pattern, one with non-constant variability]

Residuals (errors): Conditions for residuals: The errors are normally distributed (check with a histogram). The average of the errors is 0 (always true for least squares regression). The standard deviation of the errors is constant for all cases (check for constant vertical spread in the residual plot).

Independence: The independence assumption is that the residuals are independent from case to case. The residual of one case is not associated with the residual of another case. This condition is confirmed by thinking about the data, not necessarily by looking at a plot. Can you think of data for which the independence assumption doesn’t hold?

Data over Time: Data collected over time usually violates the independence condition, because the residual of one case may be similar to the residuals of the cases directly before or after. [Plot: Dow Jones index over the past month]

Conditions not Met? If the association isn’t linear: try to make it linear (next week); if it can’t be made linear, then simple linear regression isn’t a good fit for the data. If variability is not constant, residuals are not normal, or cases are not independent: the model itself is still valid, but inference may not be accurate.

1) Plot your data! (linear association?) 2) Fit the model (least squares) 3) Use the model (interpret coefficients and/or make predictions) 4) Look at residual plot (no obvious pattern? constant variability?) 5) Look at histogram of residuals (normal?) 6) Cases independent? 7) Inference (extend to population) Simple Linear Regression
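Steps 2 through 5 can be sketched in a few lines of Python using the closed-form least-squares formulas. The data here are hypothetical stand-ins, not the course’s cricket or election data:

```python
# Hypothetical data; in practice x and y come from your sample.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.2, 2.1, 2.8, 4.1, 5.2, 5.8, 7.1, 7.9]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# Step 2: fit the model by least squares.
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar

# Step 3: use the model to make a prediction.
x_new = 4.5
y_hat_new = b0 + b1 * x_new

# Steps 4-5: residuals for the residual plot and the histogram.
# Least-squares residuals always average to (numerically) zero.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print(b0, b1, y_hat_new)
```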

Can we use Obama’s approval rating to predict his margin of victory (or defeat) when he runs for re-election? Data on all* incumbent U.S. presidential candidates since 1940 (11 cases) Response: margin of victory (or defeat) Explanatory: approval rating President Approval and Re-Election Source: Silver, Nate, “Approval Ratings and Re-Election Odds", fivethirtyeight.com, posted on 1/28/11. *Except 1944 because Gallup went on a wartime hiatus

President Approval and Re-Election 1. Plot the data: Is the trend linear? (a) Yes (b) No

President Approval and Re-Election 2. Fit the Model: predicted margin of victory = -36.5 + 0.84 × approval

President Approval and Re-Election 3. Use the model: Which of the following is a correct interpretation? a) For every percentage point increase in margin of victory, approval increases by 0.84 percentage points b) For every percentage point increase in approval, predicted margin of victory increases by 0.84 percentage points c) For every 0.84 increase in approval, predicted margin of victory increases by 1

President Approval and Re-Election 3. Use the model: Obama’s current approval rating (based on polls 3/26/12 – 4/1/12) is 46%. Based only on this information, do you think Obama will a) Win b) Lose Predicted margin of victory = -36.5 + 0.84 × 46 = 2.14, which is positive. Source: Gallup presidential job approval tracking (presidential-job-approval.aspx)
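As a quick check of the arithmetic, here is the prediction in Python. The slope 0.84 is from the slide; the intercept of -36.5 is the value implied by the stated prediction (2.14 = intercept + 0.84 × 46):

```python
# Slope from the slide; intercept backed out from the stated prediction of
# 2.14 at an approval rating of 46 (2.14 - 0.84*46 = -36.5).
intercept = -36.5
slope = 0.84

approval = 46
predicted_margin = intercept + slope * approval
print(predicted_margin)  # about 2.14 -> positive, so the model predicts a win
```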

President Approval and Re-Election 4. Look at residual plot: Is there no obvious trend? (a) Yes (b) No Is the variability constant? (a) Yes (b) No

President Approval and Re-Election 5. Look at histogram of residuals: Are the residuals approximately normally distributed? (a) Yes (b) No

President Approval and Re-Election 6. Cases independent? Are the cases independent? a) Yes b) No

President Approval and Re-Election 7. Inference Should we do inference? (a) Yes (b) No Due to the non-constant variability, the non-normal residuals, and the small sample size, our inferences may not be entirely accurate. However, they are still better than nothing!

President Approval and Re-Election 7. Inference Give a 95% confidence interval for the slope coefficient. Is it significantly different than 0? (a) Yes (b) No

President Approval and Re-Election 7. Inference: We don’t really care about the slope coefficient; we care about Obama’s margin of victory in the upcoming election! How do we create a 95% interval predicting Obama’s margin of victory?

Prediction: We would like to use the regression equation to predict y for a certain value of x. This includes not only a point estimate, but also interval estimates. We will predict the value of y at x = x*.

Point Estimate: The point estimate for the average y value at x = x* is simply the predicted value ŷ = b₀ + b₁x*. Alternatively, you can think of it as the value on the line above the x value. The uncertainty in this point estimate comes from the uncertainty in the coefficients.

We can calculate a confidence interval for the average y value for a certain x value “We are 95% confident that the average y value for x=x* lies in this interval” Equivalently, the confidence interval is for the point estimate, or the predicted value This is the amount the line is free to “wiggle,” and the width of the interval decreases as the sample size increases Confidence Intervals

We need a way to assess the uncertainty in predicted y values for a certain x value… any ideas? Take repeated samples, with replacement, from the original sample data (bootstrap) Each sample gives a slightly different fitted line If we do this repeatedly, take the middle P% of predicted y values at x* for a confidence interval of the predicted y value at x* Bootstrapping
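The bootstrap procedure above can be sketched in pure Python. The (x, y) pairs below are hypothetical placeholders for whatever sample you have; the case-resampling and percentile logic are the point:

```python
import random

# Hypothetical (x, y) pairs standing in for the real data; replace with your own.
data = [(1.0, 2.1), (2.0, 2.9), (3.0, 4.2), (4.0, 4.8), (5.0, 6.1),
        (6.0, 7.0), (7.0, 7.8), (8.0, 9.2), (9.0, 9.9), (10.0, 11.1)]

def fit(pairs):
    """Least-squares intercept and slope for a list of (x, y) pairs."""
    n = len(pairs)
    xbar = sum(x for x, _ in pairs) / n
    ybar = sum(y for _, y in pairs) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in pairs)
    sxx = sum((x - xbar) ** 2 for x, _ in pairs)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1  # (intercept, slope)

def bootstrap_ci(pairs, x_star, reps=5000, level=0.95, seed=0):
    """Percentile bootstrap CI for the average y at x = x_star:
    resample cases with replacement, refit the line, record the prediction."""
    rng = random.Random(seed)
    preds = []
    for _ in range(reps):
        sample = [rng.choice(pairs) for _ in pairs]
        # A resample can have all x's equal, making the slope undefined; skip it.
        if len({x for x, _ in sample}) < 2:
            continue
        b0, b1 = fit(sample)
        preds.append(b0 + b1 * x_star)
    preds.sort()
    lo = preds[int((1 - level) / 2 * len(preds))]
    hi = preds[int((1 + level) / 2 * len(preds)) - 1]
    return lo, hi

lo, hi = bootstrap_ci(data, x_star=5.5)
print(lo, hi)  # middle 95% of bootstrap predictions at x = 5.5
```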

Bootstrap CI Middle 95% of predicted values gives the confidence interval for average (predicted) margin of victory for an incumbent president with an approval rating of 46%

Confidence Interval

For x = 46%: (-2.89, 6.80). For an approval rating of 46%, we are 95% confident that the average margin of victory for incumbent U.S. presidents is between -2.89 and 6.80 percentage points. But wait, this still doesn’t tell us about Obama! We don’t care about the average, we care about an interval for one incumbent president with an approval rating of 46%!

We can also calculate a prediction interval for y values for a certain x value “We are 95% confident that the y value for x = x* lies in this interval” This takes into account the variability in the line (in the predicted value) AND the uncertainty around the line (the random errors) Prediction Intervals

Intervals

A confidence interval has a given chance of capturing the mean y value at a specified x value A prediction interval has a given chance of capturing the y value for a particular case at a specified x value For a given x value, which will be wider? a)Confidence interval b)Prediction interval Intervals
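The standard interval formulas (a textbook result, not shown on these slides) make the comparison concrete. The only difference is the extra 1 under the square root in the prediction interval, which captures the scatter of individual cases around the line and does not shrink as n grows:

```latex
% Confidence interval for the mean response at x = x*
\hat{y}^* \pm t^*_{n-2}\, s_\varepsilon
  \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}

% Prediction interval for an individual response at x = x*
\hat{y}^* \pm t^*_{n-2}\, s_\varepsilon
  \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}
```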

[Plot: fitted line with prediction interval bands (---) and confidence interval bands (---); the prediction bands are wider]

Intervals: As the sample size increases: the standard errors of the coefficients decrease; we are more sure of the equation of the line; the widths of the confidence intervals decrease; for a huge n, the width of the CI will be almost 0. The prediction interval may be wide even for large n, and depends more on the correlation between x and y (how well y can be linearly predicted by x).

Prediction Interval Based on the data and the simple linear model (and using Obama’s current approval rating): The predicted margin of victory for Obama is 2.14 percentage points We are 95% confident that Obama’s margin of victory (or defeat) will be between and percentage points