Regression model Y represents a value of the response variable.


Regression model: Y = β₀ + β₁x + ε. Y represents a value of the response variable. β₀ + β₁x represents the population mean response for a given value of the explanatory variable, x. ε represents the random error. For each individual in our population we can model the value of the variable of interest (in our example, global temperature) as a population mean value that is related to the carbon dioxide concentration, plus some random error (the random error can be a positive or a negative value).

Linear Regression Model: mean response = β₀ + β₁x. β₀ is the Y-intercept parameter. β₁ is the slope parameter. The linear model says that the mean response for values of x is linearly related to the x values. The linear relationship has a Y-intercept parameter, β₀, and a slope parameter, β₁.
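
A minimal sketch of this model in Python. The intercept, slope, error standard deviation, and the range of CO2 values below are made-up numbers for illustration, not estimates from the actual data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameter values (for illustration only)
beta0, beta1, sigma = -3.0, 0.01, 0.05

# e.g., 20 carbon dioxide concentrations (ppm)
x = np.linspace(310, 380, 20)

# Y = beta0 + beta1*x + epsilon, with epsilon ~ N(0, sigma^2)
epsilon = rng.normal(0.0, sigma, size=x.size)
y = beta0 + beta1 * x + epsilon
```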

Residual (Observed Y − Fitted Y). Fit: ŷ = b₀ + b₁x, the fitted value from the estimated line. Residual: e = y − ŷ. The residual is the difference between an observed value of the response and the fitted value.
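
Continuing the sketch above, the fit and the residuals can be computed with scipy.stats.linregress (one of several least-squares routines that would work):

```python
from scipy import stats

fit = stats.linregress(x, y)             # least-squares estimates b0, b1
y_hat = fit.intercept + fit.slope * x    # fitted values
residuals = y - y_hat                    # observed Y minus fitted Y
```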

Conditions The relationship is linear. The random error term, ε, is: independent; identically distributed; normally distributed with standard deviation σ. The relationship being linear will be looked at separately from the distributional conditions. These are the usual normal model conditions mentioned in Stat 101; they are really no different from the one-sample or two-sample model conditions. The last condition is actually two conditions in one: first, the errors should be normally distributed; second, all error terms have a constant standard deviation (variance). We will come back to these later, after we have fit a linear model to the data and examined its usefulness. Draw picture on the blackboard.

Residual vs. Explanatory To begin to examine whether the conditions are satisfied, we need to look at different plots of residuals. The first plot is residuals versus values of the explanatory variable. This is used to assess whether or not a linear model is adequate for explaining the relationship between the explanatory and the response variables.
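
For instance, a residual-versus-explanatory plot for the sketch above could be drawn with matplotlib:

```python
import matplotlib.pyplot as plt

plt.scatter(x, residuals)
plt.axhline(0.0, linestyle="--")   # reference line at zero
plt.xlabel("Explanatory variable, x")
plt.ylabel("Residual")
plt.show()
```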

Residual vs. Predicted

Interpretation Random scatter around the zero line indicates that the linear model is adequate for the relationship between carbon dioxide and temperature. A random scatter indicates that a linear model is adequate: about as good as we can do.

Patterns Over/Under/Over or Under/Over/Under The linear model may not be adequate. We could do better by accounting for curvature with a different model. If the plot shows a distinct pattern, it may indicate that a different model should be used. For example, an over/under/over or under/over/under pattern can indicate that a model that accounts for curvature would fit better.
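
One possible remedy, sketched here with numpy.polyfit, is to add a squared term so the fitted mean response can bend:

```python
# Fit mean response = c0 + c1*x + c2*x^2
# (np.polyfit returns coefficients from highest degree down)
c2, c1, c0 = np.polyfit(x, y, deg=2)

y_hat_quad = c0 + c1 * x + c2 * x ** 2
residuals_quad = y - y_hat_quad   # replot these; the curvature pattern
                                  # should be gone if the quadratic helps
```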

Speed and Stopping Distance The linear relationship between the speed of a car and the stopping distance for that speed is not adequate. A curved relationship would fit the data better.

Patterns Two, or more, groups May require separate regression models for each group. If the plot reveals two or more different groups, this suggests that separate regression models may do a better job.

Gas used vs. Temperature The linear relationship between the amount of natural gas used and the outdoor temperature is not adequate because some of the data come from before the house was insulated and some from after. Separate regressions, one for before and one for after the insulation was added, would fit the data better.
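
One way to handle two groups in a single model is an indicator (0/1) variable; a sketch below, where `insulated` is a hypothetical array marking the post-insulation observations (in the gas example, x would be outdoor temperature and y the gas used; here the x and y from the earlier sketch stand in purely for shape):

```python
# Hypothetical group labels: first half before insulation, second half after
insulated = np.array([0] * 10 + [1] * 10)

# Design matrix: intercept, explanatory variable, insulation indicator
X = np.column_stack([np.ones_like(x), x, insulated])

# Least squares; the indicator coefficient shifts the intercept for the
# post-insulation group (add an x*insulated column to also shift the slope)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b_ins = coef
```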

Checking Conditions Independence. This is hard to check, but the fact that we obtained the data through a random sample of years assures us that the statistical methods should work.

Checking Conditions Identically distributed. Check using an outlier box plot; unusual points may come from a different distribution. Check using a histogram; a bi-modal shape could indicate two different distributions. The outlier box plot sets up “fences” beyond which individual values are considered unusual when compared to the rest of the sample.
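
The box plot’s fences can also be computed directly with the usual 1.5 × IQR rule; a sketch, using the residuals from the earlier code:

```python
q1, q3 = np.percentile(residuals, [25, 75])
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Residuals beyond the fences are flagged as unusual
unusual = residuals[(residuals < lower_fence) | (residuals > upper_fence)]
```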

Checking Conditions Normally distributed. Check with a histogram: symmetric and mounded in the middle. Check with a normal quantile plot: points falling close to a diagonal line. Histograms can be misleading; different groupings (bar placements) can give different impressions. Do not always rely on the default histogram given to you by JMP; you may have to fool with the horizontal axis settings or use the grabber tool. The normal quantile plot is a more reliable means of assessing whether the sample could have come from a normal distribution.
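
Outside of JMP, an analogous normal quantile plot can be drawn with scipy.stats.probplot (using the residuals from the sketch above):

```python
from scipy import stats
import matplotlib.pyplot as plt

# Points that track the straight reference line are consistent
# with a normal model for the errors
stats.probplot(residuals, dist="norm", plot=plt)
plt.show()
```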

Residuals from the Temperature vs. CO2 data For each observed Temp value, subtract the predicted Temp value from the fitted linear regression model. Display all 20 residuals in one analysis.

Residuals The histogram is skewed right and mounded to the left of zero. The box plot is skewed right with no unusual points. The normal quantile plot has points that do not follow the diagonal (normal model) line very well.

Checking Conditions Constant variance. Check the plot of residuals versus the explanatory variable or the predicted values. Points should show the same spread for all values of the explanatory variable.

Non-constant variance (plot: Residual vs. Explanatory or Predicted) There is more variation for larger explanatory (predicted) values and less variability for smaller explanatory (predicted) values. This means that predictions for smaller values of the explanatory variable will be more precise, closer to the true value, than predictions for larger values of the explanatory variable.
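
A rough numeric companion to the plot, sketched below with the x and residuals from the earlier code: split the residuals at the median of the explanatory variable and compare the two spreads (a formal version of this idea is the Goldfeld-Quandt test):

```python
mid = np.median(x)
spread_low = residuals[x <= mid].std(ddof=1)
spread_high = residuals[x > mid].std(ddof=1)

# Markedly different spreads suggest non-constant variance
print(spread_low, spread_high)
```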

Residual vs. Explanatory Note that if you have a megaphone pattern, this indicates unequal variation for different values of the explanatory variable. So predictions for some values of the explanatory variable will be better than predictions for others.

Residual vs. Predicted

Constant Variance Points show about the same amount of spread for all values of the explanatory variable. Draw on the blackboard a picture of a residual plot that would indicate a non-constant variance.

Conclusion The independence, identical distribution, and common variance conditions appear to be satisfied. The normal distribution condition may not be met for these data.

Consequences The P-values for tests may not be correct. However, the P-value was so small that there is still strong evidence for a linear relationship between carbon dioxide and temperature.

Consequences The stated confidence level may not give the true coverage rate. We have confidence in the intervals, but it may not be the stated 95%.