Regression model Y represents a value of the response variable.

$Y = \mu_{y|x} + \varepsilon$, where $\mu_{y|x}$ represents the population mean response for a given value of the explanatory variable, x, and $\varepsilon$ represents the random error. For each individual in our population, we can model the value of the variable of interest (in our example, global temperature) as a population mean value that is related to the carbon dioxide concentration, plus some random error (the random error could be positive or negative).

Linear Regression Model
$\mu_{y|x} = \beta_0 + \beta_1 x$, where $\mu_{y|x}$ is the mean response at x, $\beta_0$ is the Y-intercept parameter and $\beta_1$ is the slope parameter. The linear model says that the mean response for values of x is linearly related to the x values.

Residual = Observed Y – Fitted Y
The residual is the difference between an observed value of the response and the fitted value.
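The fitted values and residuals can be sketched in a few lines of Python. This is a minimal illustration that assumes the line is fitted by ordinary least squares (the slides do not say how the fit is obtained); the helper names `fit_line` and `residuals` and the five data points are hypothetical, not the temperature and carbon dioxide data.

```python
def fit_line(x, y):
    """Least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx            # estimated slope
    b0 = y_bar - b1 * x_bar   # estimated Y-intercept
    return b0, b1


def residuals(x, y, b0, b1):
    """Observed Y minus fitted Y for every observation."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

b0, b1 = fit_line(x, y)
res = residuals(x, y, b0, b1)
print(b0, b1)
print(res)
```

With a least-squares fit that includes an intercept, the residuals always sum to zero (up to rounding), which is why residual plots are read relative to the zero line.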

Conditions
The relationship is linear. The random error term, $\varepsilon$, is independent, identically distributed, and normally distributed with standard deviation $\sigma$.
The relationship being linear will be looked at separately from the distributional conditions. These are the usual normal model conditions mentioned in Stat. They are really no different from the one-sample or two-sample model conditions. The last condition is actually two conditions in one. The first is that the errors should be normally distributed. The second is that all error terms have a constant standard deviation (variance). We will come back to these later, after we have fit a linear model to the data and examined its usefulness. Draw picture on the blackboard.

Residual vs. Explanatory
To begin to examine whether the conditions are satisfied we need to look at different plots of residuals. The first plot is residuals versus values of the explanatory variable. This is used to assess whether or not a linear model is adequate for explaining the relationship between the explanatory and the response variables.

Residual vs. Predicted

Interpretation
Random scatter around the zero line indicates that the linear model is adequate for the relationship between carbon dioxide and temperature; it is about as good as we can do.

Patterns Over/Under/Over or Under/Over/Under
The linear model may not be adequate. If the plot shows a distinct pattern, it may indicate that a different model should be used. For example, an over/under/over or under/over/under pattern can indicate that a model that accounts for curvature would do better.
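The over/under/over pattern can be reproduced with a small sketch: fit a straight line to data that actually follow a curve and look at the signs of the residuals. The data below are hypothetical (y is exactly x squared), and least squares is assumed as the fitting method.

```python
def fit_line(x, y):
    """Least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical curved relationship: the response grows with the square of x.
x = list(range(11))
y = [xi ** 2 for xi in x]

b0, b1 = fit_line(x, y)
res = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# '+' means the observed value is over the line, '-' means it is under.
signs = "".join("+" if r > 0 else "-" for r in res)
print(signs)
```

The residuals are positive at both ends and negative in the middle, so the points sit over, then under, then over the fitted line. A residual plot with that shape suggests adding a curvature term rather than keeping the straight line.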

Speed and Stopping Distance
The linear relationship between speed of a car and the stopping distance for that speed is not adequate. A curved relationship would fit the data better.

Patterns Two, or more, groups
If the plot reveals two or more different groups, separate regression models for each group may do a better job.

Gas used vs. Temperature
A single linear relationship between the amount of natural gas used and the outdoor temperature is not adequate, because some of the data come from before the house was insulated and some from after. Separate regressions, one for before the insulation was added and one for after, would fit the data better.
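One way to see the benefit of separate regressions is to compare the sum of squared residuals from a single pooled line with the total from one line per group. The sketch below assumes least-squares fits; the temperatures and gas amounts are hypothetical numbers chosen only to mimic a before/after shift, not the data from the slide.

```python
def fit_line(x, y):
    """Least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def sse(x, y):
    """Sum of squared residuals around the least-squares line for (x, y)."""
    b0, b1 = fit_line(x, y)
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical outdoor temperatures and gas usage, before and after insulation.
temp_before = [30, 40, 50, 60]
gas_before = [7.1, 5.9, 5.0, 4.0]
temp_after = [30, 40, 50, 60]
gas_after = [4.6, 3.9, 3.0, 2.2]

sse_pooled = sse(temp_before + temp_after, gas_before + gas_after)
sse_separate = sse(temp_before, gas_before) + sse(temp_after, gas_after)
print(sse_pooled, sse_separate)
```

In this made-up example the pooled line runs between the two groups, so every point has a large residual, while the separate lines leave very little unexplained variation.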

Checking Conditions Independence.
This is hard to check, but the fact that we obtained the data through a random sample of years assures us that the statistical methods should work.

Checking Conditions Identically distributed.
Check using an outlier box plot. Unusual points may come from a different distribution. Check using a histogram. A bimodal shape could indicate two different distributions. The outlier box plot sets up “fences” beyond which individual values are considered unusual when compared to the rest of the sample.
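The fences can be sketched as follows. The slide does not state how far out the fences sit; this example assumes the conventional rule of 1.5 times the interquartile range beyond the quartiles. The function name `fences` and the data are hypothetical, and quartile conventions differ slightly between software packages.

```python
from statistics import quantiles


def fences(values, k=1.5):
    """Lower and upper fences: k * IQR beyond the first and third quartiles."""
    q1, _median, q3 = quantiles(values, n=4)  # default "exclusive" method
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical sample with one value far from the rest.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]

low, high = fences(sample)
unusual = [v for v in sample if v < low or v > high]
print(low, high, unusual)
```

Any value outside the fences is plotted as an individual point in an outlier box plot and deserves a closer look before the identically distributed condition is accepted.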

Checking Conditions Normally distributed.
Check with a histogram: it should be symmetric and mounded in the middle. Check with a normal quantile plot: points should fall close to a diagonal line. Histograms can be misleading. Different groupings (bar placements) can give different impressions. Do not always rely on the default histogram given to you by JMP. You may have to adjust the horizontal axis settings or use the grabber tool. The normal quantile plot is a more reliable means of assessing whether the sample could have come from a normal distribution.
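The idea behind the normal quantile plot can be sketched without any plotting: pair each sorted value with the standard normal quantile for its rank and see how close the pairs are to a straight line. The plotting positions (i + 0.5) / n used here are one common convention and an assumption on my part (JMP may use a slightly different one); the two samples are hypothetical, and the correlation is only a rough numerical summary of "close to the diagonal".

```python
import math
from statistics import NormalDist


def normal_quantile_points(values):
    """Pair each sorted value with the standard normal quantile for its rank."""
    n = len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, sorted(values)))


def correlation(pairs):
    """Pearson correlation between the two coordinates of the plotted points."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    saa = sum((a - mean_a) ** 2 for a, _ in pairs)
    sbb = sum((b - mean_b) ** 2 for _, b in pairs)
    sab = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    return sab / math.sqrt(saa * sbb)


# Two hypothetical samples of 20 values: one bell-shaped, one skewed right.
n = 20
positions = [(i + 0.5) / n for i in range(n)]
bell_shaped = [0.3 * NormalDist().inv_cdf(p) for p in positions]
right_skewed = [-math.log(1 - p) - 1 for p in positions]

r_bell = correlation(normal_quantile_points(bell_shaped))
r_skew = correlation(normal_quantile_points(right_skewed))
print(r_bell, r_skew)
```

The bell-shaped sample gives points that fall essentially on a line, while the right-skewed sample bends away from it, which shows up as a noticeably lower correlation.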

Residuals from the Temperature vs. CO2 data
For each observed Temp value, subtract the predicted Temp value from the fitted linear regression model. Display all 20 residuals in one analysis.

Residuals
The histogram is skewed right and mounded to the left of zero. The box plot is skewed right with no unusual points. The normal quantile plot has points that do not follow the diagonal (normal model) line very well.

Checking Conditions Constant variance.
Check the plot of residuals versus the explanatory variable or the predicted values. Points should show the same spread for all values of the explanatory variable.

Non-constant variance
(Plot: residuals versus explanatory or predicted values.) There is more variation for larger explanatory (predicted) values and less variability for small explanatory (predicted) values. This means that predictions for smaller values of the explanatory variable will be more precise, closer to the true value, than predictions for larger values of the explanatory variable.

Residual vs. Explanatory
Note that if you have a megaphone pattern, this indicates unequal variation for different values of the explanatory variable. So predictions for some values of the explanatory variable will be better than predictions for others.
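A rough numerical companion to the megaphone check is to compare the spread of the residuals for small and large values of the explanatory variable. This is only an informal sketch, not a formal test; the function name `spread_by_half` and the residuals (whose size grows with x) are hypothetical.

```python
from statistics import stdev


def spread_by_half(x, residuals):
    """Standard deviation of residuals for the lower and upper half of x."""
    ordered = [r for _, r in sorted(zip(x, residuals))]  # order residuals by x
    middle = len(ordered) // 2
    return stdev(ordered[:middle]), stdev(ordered[middle:])


# Hypothetical residuals that alternate in sign and grow in size with x.
x = list(range(1, 11))
res = [0.1 * xi * (1 if i % 2 == 0 else -1) for i, xi in enumerate(x)]

sd_low, sd_high = spread_by_half(x, res)
ratio = sd_high / sd_low
print(sd_low, sd_high, ratio)
```

A ratio well above 1 matches the megaphone picture: the residuals fan out as the explanatory variable increases. A ratio near 1 is what the constant variance condition leads us to expect.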

Residual vs. Predicted

Constant Variance
Points show about the same amount of spread for all values of the explanatory variable. Draw on the blackboard a picture of a residual plot that would indicate a non-constant variance.

Conclusion
The independence, identically distributed and common variance conditions appear to be satisfied. The normal distribution condition may not be met for these data.

Consequences
The P-values for tests may not be correct. However, the P-value was so small that there is still strong evidence for a linear relationship between carbon dioxide and temperature.

Consequences
The stated confidence level may not give the true coverage rate. We still have confidence in the intervals, but it may not be 95%.
