
Cross-Validation vs. Bootstrap Estimates of Prediction Error in Statistical Modeling
Kaniz Rashid Lubana Mamun, MS Student, CSU Hayward
Dr. Eric A. Suess, Assistant Professor of Statistics

Regression Analysis
To find the regression line for data (xi, yi), minimize the sum of squared errors, S(b0, b1) = Σi (yi − b0 − b1 xi)².
Regression estimates linear relationships between dependent and independent variables.
Applications: prediction and forecasting.
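As a minimal illustration of the least-squares criterion (made-up numbers, using numpy's generic solver rather than anything from the original talk), the intercept and slope that minimize the sum of squared errors can be computed as:

```python
import numpy as np

# Hypothetical data: fit y = b0 + b1*x by minimizing
# the sum of squared errors sum_i (y_i - b0 - b1*x_i)^2.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Design matrix with a column of ones for the intercept b0.
X = np.column_stack([np.ones_like(x), x])

# np.linalg.lstsq returns the coefficients minimizing ||y - X b||^2.
(b0, b1), *_ = np.linalg.lstsq(X, y, rcond=None)

print(round(b0, 3), round(b1, 3))  # 0.05 1.99
```

The same estimates follow from the usual closed-form formulas b1 = Sxy/Sxx and b0 = ȳ − b1 x̄; the matrix form simply generalizes to more predictors.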

Classical Regression Procedure
Choose a model: y = b0 + b1 x1 + b2 x2 + e.
Verify assumptions: normality of the data.
Fit the model, checking for significance of parameters.
Check the model’s predictive capability.
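The fit-and-check steps above can be sketched with simulated data (everything below is a hypothetical example, not the talk's own analysis): fit by least squares, then compute R² and t-statistics for the coefficients.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data for y = b0 + b1*x1 + b2*x2 + e,
# where x2 in truth contributes nothing.
n = 30
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 10, n)
y = 4.0 + 1.5 * x1 + rng.normal(scale=1.0, size=n)

# Fit the model by least squares.
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Model check: R^2, the fraction of variation in y explained.
r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

# Significance check: t-statistics for each coefficient.
s2 = (resid @ resid) / (n - X.shape[1])   # residual variance
cov = s2 * np.linalg.inv(X.T @ X)         # covariance of the estimates
t_stats = beta / np.sqrt(np.diag(cov))

print(np.round(r2, 3), np.round(t_stats, 2))
```

A large |t| for a coefficient (roughly |t| > 2) marks it as significantly different from 0; here the x1 term is strongly significant while the x2 term is not.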

Mean Squared Error of Prediction
MSEP measures how well a model predicts the response value of a future observation.
For our regression model, the MSEP of a new observation y(n+1) is
MSEP = E[(y(n+1) − ŷ(n+1))²].
Small values of MSEP indicate good predictive capability.

What is Cross-Validation?
Divide the data into two sub-samples: a treatment set (to fit the model) and a validation set (to assess predictive value).
A non-parametric approach: mainly used when the normality assumption is not met.
Criterion for a model’s prediction ability: usually the MSEP statistic.
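The treatment/validation split can be sketched as follows (a minimal example on simulated data; the split sizes are arbitrary choices, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: y roughly linear in x with noise.
x = np.linspace(0.0, 10.0, 20)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=x.size)

# Split: first 15 points as the treatment (fitting) set,
# remaining 5 as the validation set.
x_fit, y_fit = x[:15], y[:15]
x_val, y_val = x[15:], y[15:]

# Fit the line on the treatment set only.
b1, b0 = np.polyfit(x_fit, y_fit, 1)

# MSEP estimate: mean squared prediction error on the validation set.
msep = np.mean((y_val - (b0 + b1 * x_val)) ** 2)
print(msep)
```

Because the validation points played no part in the fit, their squared prediction errors give an honest estimate of MSEP.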

CV For Linear Regression: The “Withhold-1” Algorithm
Use the model: y = b0 + b1 x1 + b2 x2 + e.
Withhold one observation (x1i, x2i, yi).
Fit the regression model to the remaining n − 1 observations.
For each i, calculate the prediction error e(i) = yi − ŷ(i), where ŷ(i) is the prediction for the withheld observation.
Finally, calculate MSEP(CV) = (1/n) Σi e(i)².
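The withhold-1 (leave-one-out) steps can be sketched as follows, on hypothetical two-predictor data of size n = 12 to echo the heart example (the data themselves are made up):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data for the model y = b0 + b1*x1 + b2*x2 + e.
n = 12
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.5, size=n)
X = np.column_stack([np.ones(n), x1, x2])

# Withhold-1 cross-validation estimate of MSEP.
errors = []
for i in range(n):
    keep = np.arange(n) != i                      # drop observation i
    beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    y_hat_i = X[i] @ beta                         # predict the withheld point
    errors.append((y[i] - y_hat_i) ** 2)          # squared prediction error

msep_cv = np.mean(errors)                         # (1/n) * sum of e(i)^2
print(msep_cv)
```

Each observation is predicted by a model that never saw it, so the averaged squared errors estimate out-of-sample prediction error rather than in-sample fit.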

What is the Bootstrap?
The Bootstrap is a computationally intensive technique involving simulation and resampling.
It is used here to assess the accuracy of statistical estimates for a model: confidence intervals, standard errors, and the estimate of MSEP.

Algorithm For a Bootstrap
From a data set of size n, randomly draw B samples with replacement, each of size n.
Find the estimate of MSEP for each of the B samples: θ̂(b), for b = 1, …, B.
Average these B estimates of θ to obtain the overall bootstrap estimate:
θ̂(boot) = (1/B) Σb θ̂(b).
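A minimal sketch of the resampling loop, on simulated one-predictor data (the data, B, and the choice to score each bootstrap fit against the original sample are illustrative assumptions, not details from the talk):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data set of size n.
n, B = 12, 200
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)

boot_estimates = []
for _ in range(B):
    # Draw a sample of size n with replacement.
    idx = rng.integers(0, n, n)
    b1, b0 = np.polyfit(x[idx], y[idx], 1)
    # MSEP estimate for this bootstrap sample: squared error of the
    # bootstrap-fitted line on the original data.
    boot_estimates.append(np.mean((y - (b0 + b1 * x)) ** 2))

# Average the B estimates to get the overall bootstrap estimate.
msep_boot = np.mean(boot_estimates)
print(msep_boot)
```

Averaging over the B resamples smooths out the variability of any single resampled fit, which is what makes the bootstrap usable even at small n.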

Schematic Diagram of Bootstrap
[Diagram: Population F → (sampling variability) → data X = (x1, x2, …, xn) → (resampling variability) → bootstrap samples X1*, X2*, …, XB* → estimates Θ(X1*), Θ(X2*), …, Θ(XB*).]

Application: Heart Measurements on Children Study: Catheterize 12 children with heart defects and take measurements. Variables measured: y: observed catheter length in cm w: patient’s weight in pounds h: patient’s height in inches Goal: To predict y from w and h. Difficulties: Small n, non-normal data.

Model and Fitted Model
Model: y = b0 + b1 w + b2 h + e.
Fitted model: ŷ = 25.6 + 0.277 w.
Parameter estimates for the heart data: b0 estimated as 25.6, b1 estimated as 0.277; the b2 (height) term was eliminated from the model (not useful).

Regression Results
Both parameters b0 and b1 are significantly different from 0 (important to the model); p-values: 0.000 (for b0) and 0.000 (for b1).
R² = 80% (of the variation in y explained).
Once weight is known, height does not provide additional useful information.
Example: for a child weighing 50 lbs., the estimated catheter length is 25.6 + 0.277 × 50 = 39.45 cm.
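The example prediction is just the fitted line evaluated at w = 50, using the estimates reported on the slide:

```python
# Fitted model from the heart data: y_hat = 25.6 + 0.277 * w.
b0, b1 = 25.6, 0.277

weight = 50  # pounds
y_hat = b0 + b1 * weight
print(round(y_hat, 2))  # 39.45 (cm)
```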

Comparison of CV and Bootstrap
MSEP estimates: CV: MSEP = 18.05; Bootstrap: MSEP = 12.04 (smaller is better).
For this example, the bootstrap estimate indicates the better prediction capability.
In general, CV methods work well for large samples; the bootstrap is effective even for small samples.
