Model validation and prediction


Chapter 7: Model validation and prediction

Example: Stearic acid and digestibility Digestibility of fat for different proportions of stearic acid in the fat. The fitted line is y = −0.93·x + 96.53.

Example: Stearic acid and digestibility Residuals for the dataset on digestibility and stearic acid. The vertical lines between the model (the straight line) and the observations are the residuals.

Residual standard error The sample standard deviation measures the average distance from the observations to their mean. In linear regression the residuals measure the distance from each observed value to the predicted value, so we can compute a standard error of the residuals in the same way. The residual standard error describes the precision of our predictions: if it is small the observations lie close to the fitted line, and if it is large they scatter further from it.
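The residual standard error described above can be sketched in a few lines. This is a minimal illustration with made-up data, not the stearic acid dataset from the example:

```python
# Sketch: residual standard error in simple linear regression.
# The data below are illustrative, not the digestibility data.

def fit_line(x, y):
    """Least-squares estimates of intercept and slope."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope

def residual_std_error(x, y):
    """s = sqrt(SS_resid / (n - 2)); the divisor is n - 2 because
    two parameters (intercept and slope) were estimated."""
    a, b = fit_line(x, y)
    ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return (ss_resid / (len(x) - 2)) ** 0.5

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(residual_std_error(x, y))
```

A small value here means the fitted line predicts the observations closely, matching the interpretation on the slide.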

Residual analysis The residuals are standardized by dividing each residual by its standard error. If the model assumptions hold, the standardized residuals behave approximately like draws from the normal distribution with mean zero and standard deviation one. The model is usually validated with a residual plot of the standardized residuals.
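The standardization on this slide can be sketched as follows. Statistical software usually refines this with leverage corrections; the plain version below, dividing each residual by the residual standard error, matches the slide's description. The data are illustrative:

```python
# Sketch: standardized residuals = residuals / residual standard error.
# Illustrative data; real software often also corrects for leverage.

def standardized_residuals(x, y):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s = (sum(r ** 2 for r in resid) / (n - 2)) ** 0.5
    return [r / s for r in resid]

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
z = standardized_residuals(x, y)
print(z)  # if the model fits, these behave roughly like N(0, 1) draws
```

Plotting `z` against the predicted values gives the residual plot used for model validation.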

Example: Stearic acid and digestibility Residual analysis for the digestibility data: residual plot (left) and QQ-plot (right) of the standardized residuals. The straight line has intercept zero and slope one.

Model validation based on residuals Plot the standardized residuals against the predicted values. The points should be spread randomly in the vertical direction, without any systematic patterns. In particular: points should be roughly equally distributed between positive and negative values in all parts of the plot (from left to right); the variation in the vertical direction should be roughly the same in all parts of the plot; and there should be no extreme points. Systematic deviations from these three requirements indicate problems with the mean structure, the variance homogeneity, or the normality assumption, respectively.

Example: Stearic acid and digestibility There seem to be both positive and negative residuals in all parts of the plot (from left to right; for small, medium, and large predicted values). This indicates that specifying the mean digestibility as a linear function of the stearic acid level is appropriate. There seems to be roughly the same vertical variation for small, medium, and large predicted values, indicating that the standard deviation is the same for all observations. Finally, there are neither very small nor very large standardized residuals. This indicates that there are no outliers and that it is not unreasonable to use the normal distribution.

Example: Growth of duckweed Top panel: the original duckweed data. Bottom left: the data and fitted regression line after logarithmic transformation. Bottom right: the fitted line transformed back to the original scale.
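The transformation idea in this example can be sketched in code: exponential growth y = c·exp(b·t) becomes linear after taking logarithms, log(y) = log(c) + b·t, so ordinary linear regression applies on the log scale and the fitted line can be transformed back. The counts below are synthetic, not the actual duckweed data:

```python
import math

# Sketch: linear regression on the log scale, then back-transform.
# Synthetic counts that roughly double each time step.

t = [0, 1, 2, 3, 4]
y = [10.0, 22.0, 45.0, 91.0, 180.0]

log_y = [math.log(v) for v in y]
n = len(t)
tbar = sum(t) / n
lbar = sum(log_y) / n
b = sum((ti - tbar) * (li - lbar) for ti, li in zip(t, log_y)) / \
    sum((ti - tbar) ** 2 for ti in t)
a = lbar - b * tbar

def predict(t0):
    """Fitted value transformed back to the original (count) scale."""
    return math.exp(a + b * t0)

print(b)            # growth rate per time step on the log scale
print(predict(2))   # fitted count at t = 2, on the original scale
```

On the original scale the back-transformed fit is a curve, which is why the bottom-right panel of the figure is not a straight line.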

Example: Growth of duckweed Residual plots for the duckweed data. Left panel: linear regression with the leaf counts as response. Right panel: linear regression with the logarithmic leaf counts as response.

Example: Chlorophyll concentration Upper left panel: scatter plot of the data. Remaining panels: residual plots for the regression of nitrogen concentration (N) predicted by chlorophyll content (C) in the plants (upper right), for the regression of log(N) on C (lower left), and for the regression of the square root of N on C (lower right).

Confidence interval for prediction The expected value of the prediction is obtained from the model with the estimated intercept and slope: ŷ0 = α̂ + β̂·x0. This estimate is subject to estimation error, which gives rise to the confidence interval for the expected value y0 = α + β·x0.

Prediction interval However, a new observation at x0 is also subject to observation error, which has standard deviation σ, and the prediction interval must take this source of variation into account too. Intuitively, this corresponds to adding the residual variance s² to the squared standard error of the estimated mean before taking the square root. The resulting 95% prediction interval has the interpretation that a (new) random observation with x = x0 will belong to this interval with probability 95%.

Confidence and prediction intervals Interpretation. The confidence interval includes the expected values that are in accordance with the data (with a certain degree of confidence), whereas a new observation will be within the prediction interval with a certain probability. Interval widths. The prediction interval is wider than the corresponding confidence interval. Dependence on sample size. The confidence interval can be made as narrow as we want by increasing the sample size. This is not the case for the prediction interval.
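The contrast between the two intervals can be made concrete with a short sketch. The data are illustrative, and 1.96 is the normal approximation to the relevant t quantile (with only n − 2 = 3 degrees of freedom here, the exact quantile would be noticeably larger):

```python
# Sketch: pointwise 95% confidence vs. prediction interval at x0.
# Illustrative data; 1.96 approximates the t quantile with n - 2 df.

def intervals(x, y, x0, q=1.96):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    s2 = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    yhat = a + b * x0
    # Standard error of the estimated mean at x0 ...
    se_mean = (s2 * (1 / n + (x0 - xbar) ** 2 / sxx)) ** 0.5
    # ... and of a new observation: the residual variance s2 is added.
    se_pred = (s2 * (1 + 1 / n + (x0 - xbar) ** 2 / sxx)) ** 0.5
    ci = (yhat - q * se_mean, yhat + q * se_mean)
    pi = (yhat - q * se_pred, yhat + q * se_pred)
    return ci, pi

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
ci, pi = intervals(x, y, 3.0)
print(ci, pi)  # the prediction interval is strictly wider
```

The `(x0 - xbar)**2 / sxx` term also shows why both bands widen away from the mean of x, while the constant `1` inside `se_pred` is the reason the prediction interval cannot shrink to zero width as the sample grows.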

Example: Stearic acid and digestibility Predicted values (solid line), pointwise 95% prediction intervals (dashed lines), and pointwise 95% confidence intervals (dotted lines) for the digestibility data. The prediction intervals are wider than the confidence intervals. Also notice that the confidence bands and the prediction bands are not straight lines: the closer x0 is to the mean value, the more precise the prediction—reflecting that there is more information close to the mean.