Model Building III – Remedial Measures KNNL – Chapter 11

Unequal (Independent) Error Variances – Weighted Least Squares (WLS)
- Case 1 – Error variances known exactly (very rare)
- Case 2 – Error variances known up to a constant of proportionality. Occasionally the relative magnitudes are known from information about the experimental units (unusual). If the "observations" are means of differing numbers of units (each with equal variance) at the various X levels, the variance of observation i is σ²/nᵢ, where nᵢ is known.
- Case 3 – The variance (or standard deviation) is related to one or more predictors, and the relation can be modeled (see the Breusch-Pagan test in Chapter 3)
- Case 4 – Ordinary least squares point estimates combined with variance estimates that remain correct under unequal error variances

WLS – Case 1 Known Variances - I

WLS – Case 1 Known Variances - II
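The formulas on the two Case 1 slides were images in the original and did not survive the transcript. As a reconstruction sketch in KNNL's notation, the WLS criterion and estimator when the error variances σᵢ² are known (weights wᵢ = 1/σᵢ², W = diag(w₁, …, wₙ)) are:

```latex
Q_w = \sum_{i=1}^{n} w_i \left( Y_i - \beta_0 - \beta_1 X_{i1} - \cdots - \beta_{p-1} X_{i,p-1} \right)^2
\qquad
\mathbf{b}_w = \left( \mathbf{X}'\mathbf{W}\mathbf{X} \right)^{-1} \mathbf{X}'\mathbf{W}\mathbf{Y}
\qquad
\sigma^2\{\mathbf{b}_w\} = \left( \mathbf{X}'\mathbf{W}\mathbf{X} \right)^{-1}
```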

WLS – Case 2 – Variance Known up to Constant - I
When the data are means of unequal numbers of replicates at the X levels, the weights are proportional to the numbers of replicates; more generally, the weights need only be known up to a constant of proportionality. A reconstruction of the weights follows.
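The slide's formula was an image; as a sketch of the standard result, when observation i is the mean of nᵢ equal-variance units:

```latex
\sigma_i^2 = \frac{\sigma^2}{n_i}
\quad\Longrightarrow\quad
w_i \propto n_i,
\qquad
MSE_w = \frac{\sum_i w_i e_i^2}{n - p} \ \text{ estimates the proportionality constant } \sigma^2
```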

WLS – Case 2 – Variance Known up to Constant - II
Japanese dose-response study of Rosuvastatin for patients with high cholesterol. n = 6 doses; the number of patients per dose (rᵢ) varied: (15, 17, 12, 14, 18, 13). After preliminary plots, Y = change in LDL and X = ln(Dose) were chosen.
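A minimal sketch of this Case 2 fit, assuming statsmodels is available. The per-dose patient counts rᵢ come from the slide; the dose levels and responses below are hypothetical stand-ins, since the slide reproduces neither:

```python
import numpy as np
import statsmodels.api as sm

r = np.array([15, 17, 12, 14, 18, 13])              # patients per dose (from the slide)
dose = np.array([1.0, 2.5, 5.0, 10.0, 20.0, 40.0])  # hypothetical mg levels
rng = np.random.default_rng(0)
# Simulated mean LDL changes, with variance sigma^2 / r_i at each dose
y = -40 - 8 * np.log(dose) + rng.normal(scale=1 / np.sqrt(r))

X = sm.add_constant(np.log(dose))
# Each Y_i is a mean of r_i patients, so Var(Y_i) = sigma^2 / r_i
# and the WLS weights are proportional to r_i
fit = sm.WLS(y, X, weights=r).fit()
print(fit.params, fit.bse)
```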

WLS - Case 3 – Estimated Variances
Use squared residuals (estimated variances) or absolute residuals (estimated standard deviations) from OLS to model the error variance as a function of the predictor variables:
- Plot the residuals, squared residuals, and absolute residuals versus the fitted values and the predictors
- Use the model-building techniques previously applied to Y to obtain a model for either the variances or the standard deviations as functions of the Xs or the mean
- Fit WLS with the estimated weights (1/estimated variances)
- Iterate until the regression coefficients stabilize (iteratively re-weighted least squares); a sketch follows
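One way to implement this iteration, as a sketch: the variance function here is modeled by regressing log squared residuals on the same predictors (the log keeps fitted variances positive); the slide leaves the choice of variance model open, so this is an assumption for illustration. `X` is assumed to include an intercept column.

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares via the normal equations."""
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)

def iterated_wls(X, y, n_iter=5):
    """Case 3: model log squared residuals on X, use fitted variances as weights."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]      # start from OLS
    for _ in range(n_iter):
        e2 = (y - X @ b) ** 2
        # Regress log(e^2) on the predictors to get a smooth variance function
        g = np.linalg.lstsq(X, np.log(e2 + 1e-12), rcond=None)[0]
        var_hat = np.exp(X @ g)
        b = wls(X, y, 1.0 / var_hat)              # weights = 1 / estimated variances
    return b
```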

Case 4 – OLS with Estimated Variances
Keep the OLS point estimates, but estimate their variances in a way that allows for unequal error variances. This is referred to as White's estimator; it is robust to the form of the error variance (no variance model needs to be specified).
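The slide's formula was an image; a minimal numpy sketch of the sandwich (HC0) form it refers to, assuming `X` includes an intercept column:

```python
import numpy as np

def white_se(X, y):
    """OLS coefficients with White (HC0) heteroskedasticity-robust SEs."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    # Sandwich estimator: (X'X)^{-1} X' diag(e_i^2) X (X'X)^{-1}
    meat = (X * (e ** 2)[:, None]).T @ X
    cov = XtX_inv @ meat @ XtX_inv
    return b, np.sqrt(np.diag(cov))
```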

Multicollinearity – Remedial Measures - I
- Goal: Prediction – Multicollinearity is not an issue if new cases have multicollinearity among the predictors similar to that in the original data. Example: a firm uses shipment size (number of pallets) and weight (tons) to predict unloading times; as long as future shipments have a similar correlation between the predictors, predictions and PIs are valid.
- Linear combinations of the predictors can be generated that are uncorrelated (principal components). Good: the multicollinearity is gone. Bad: the new variables may have no "physical" interpretation. (A sketch follows this list.)
- Use of cross-section and time-series data in economics. Model: Demand = f(Income, Price). Step 1: Cross-section (a single time point): regress Demand on Income (b_I). Step 2: Time series: regress the residuals from Step 1 on Price (b_P).
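A sketch of the principal-components idea, assuming the usual practice of standardizing the predictors first; the function name and interface are illustrative, not from the slides:

```python
import numpy as np

def pc_regression(X, y, n_components):
    """Regress y on the leading principal components of the standardized X."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize predictors
    # Rows of Vt are the principal-component directions; the scores are uncorrelated
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:n_components].T
    A = np.column_stack([np.ones(len(y)), scores])
    coef = np.linalg.lstsq(A, y, rcond=None)[0]
    return coef, Vt[:n_components]
```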

Multicollinearity – Ridge Regression - I
- Mean squared error of an estimator = Variance + Bias²
- Goal: add a small amount of bias to the estimator in order to reduce its MSE
- Makes use of the standard regression model with the correlation transformation
- Goal: choose a small biasing constant c such that the estimated coefficients stabilize (flatten) and the VIFs become small
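The slide's formula was an image; reconstructed as a sketch in KNNL's notation, the ridge estimator for the correlation-transformed model is:

```latex
% r_XX: correlation matrix of the predictors; r_YX: correlations of Y with each predictor
\mathbf{b}^{R} = \left( \mathbf{r}_{XX} + c\,\mathbf{I} \right)^{-1} \mathbf{r}_{YX}, \qquad c \ge 0
```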

Multicollinearity – Ridge Regression - II
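Slide II's content was also an image. As an illustration of the ridge-trace idea the slides describe, a small sketch that computes the ridge coefficients over a grid of c values so they can be plotted against c; the function name and grid are illustrative:

```python
import numpy as np

def ridge_trace(X, y, c_grid):
    """Ridge coefficients on correlation-transformed variables, one row per c."""
    n = len(y)
    Zx = (X - X.mean(axis=0)) / (X.std(axis=0, ddof=1) * np.sqrt(n - 1))
    Zy = (y - y.mean()) / (y.std(ddof=1) * np.sqrt(n - 1))
    rXX = Zx.T @ Zx        # correlation matrix of the predictors
    rYX = Zx.T @ Zy        # correlations of Y with each predictor
    p = X.shape[1]
    return np.array([np.linalg.solve(rXX + c * np.eye(p), rYX) for c in c_grid])

# Example: inspect how the standardized coefficients flatten as c grows
# trace = ridge_trace(X, y, c_grid=np.linspace(0.0, 1.0, 50))
```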

Robust Regression for Influential Cases
Influential cases, once they have been ruled out as recording errors or as indicating the need for new predictors, can have their impact on the regression model reduced. Two commonly applied robust regression approaches (sketches follow):
- Least Absolute Residuals (LAR) Regression – choose the coefficients that minimize the sum of absolute deviations (as opposed to squared deviations in OLS). There is no closed-form solution; specialized programs must be used (it can be run in R).
- IRLS Robust Regression – uses iteratively re-weighted least squares, where the weights are based on how much of an outlier each case is (lower weights for larger outliers).
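The slides name no Python software; one way to fit LAR, as a sketch, uses the fact that minimizing the sum of absolute residuals is the same as fitting the 0.5 conditional quantile (median regression):

```python
import statsmodels.api as sm

def lar_fit(X, y):
    """LAR/LAD regression via median (0.5-quantile) regression."""
    model = sm.QuantReg(y, sm.add_constant(X))
    return model.fit(q=0.5).params
```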

IRLS – Robust Regression
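This slide's formulas were images in the original. A minimal numpy sketch of the procedure KNNL describes, using the Huber weight function with its usual tuning constant 1.345 and residuals scaled by the MAD estimate of sigma:

```python
import numpy as np

def huber_weights(u, c=1.345):
    """Huber weight function: full weight for small scaled residuals, down-weighted otherwise."""
    au = np.abs(u)
    return np.where(au <= c, 1.0, c / au)

def irls_robust(X, y, n_iter=20):
    """Iteratively re-weighted least squares with Huber weights."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]      # start from OLS
    for _ in range(n_iter):
        e = y - X @ b
        scale = np.median(np.abs(e - np.median(e))) / 0.6745   # MAD estimate of sigma
        w = huber_weights(e / scale)
        Xw = X * w[:, None]
        b = np.linalg.solve(Xw.T @ X, Xw.T @ y)
    return b
```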

Nonparametric Regression
Locally Weighted Regressions (Lowess):
- Works best with few predictors, when the data have been transformed toward normality with constant error variance, and when the predictors have been scaled by their standard deviations (sqrt(MSE), or MAD when outliers are present)
- Applies WLS with weights based on the distances from the individual points in a neighborhood to the target point (q = the proportion of the data used in the neighborhood, typically 0.40 to 0.60)
Regression Trees:
- The X-space is broken down into multiple sub-regions (one step at a time), and each region is modeled by its mean response
- Each step is chosen to minimize the within-region sum of squares

Lowess Method
Assumes the model has been selected and built so that the assumptions of approximately normal errors with constant variance hold. Predictors measured in different units should be scaled by their SD or MAD in the distance measure. A sketch using a standard implementation follows.
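A minimal one-predictor sketch using statsmodels' lowess smoother; the data here are simulated for illustration, and `frac` plays the role of the neighborhood proportion q from the previous slide:

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 10, 100))
y = np.sin(x) + rng.normal(scale=0.3, size=100)   # simulated data

# frac = proportion of the data used in each local fit (typically 0.40 to 0.60)
smoothed = lowess(y, x, frac=0.5)   # returns sorted (x, fitted y) pairs
```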

Regression Trees
- Splits the region covered by the predictor variables into an increasing number of sub-regions, where the predicted value is the mean response of all cases falling in the sub-region
- Goal: choose the "cut-points" that minimize the error sum of squares for the current number of sub-regions – computationally intensive, with no closed-form solution
- Problem: SSE keeps dropping until there are as many sub-regions as distinct combinations of the predictors (SSE → 0), so the tree cannot simply be grown out fully
- Validation samples can be used to determine the number of nodes (= sub-regions – 1) that minimizes the mean squared prediction error (MSPR); a sketch follows
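The slides name no software; a sketch using scikit-learn's DecisionTreeRegressor, with the tree size chosen by validation-set MSPR as the slide describes (the function name and `max_leaves` cap are illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

def pick_tree_size(X_train, y_train, X_val, y_val, max_leaves=20):
    """Grow trees of increasing size; keep the leaf count with the lowest MSPR."""
    best = (np.inf, None)
    for leaves in range(2, max_leaves + 1):
        tree = DecisionTreeRegressor(max_leaf_nodes=leaves).fit(X_train, y_train)
        mspr = mean_squared_error(y_val, tree.predict(X_val))
        if mspr < best[0]:
            best = (mspr, tree)
    return best[1]
```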

Bootstrapping to Estimate Precision
- Computationally intensive methods used to estimate the precision of estimators in non-standard situations
- Based on re-sampling (with replacement) from the observed sample and re-computing the estimate repeatedly (the nonparametric bootstrap). Parametric versions also exist (not covered here)
- Once the original sample has been observed and the quantity of interest estimated, many samples of size n are drawn with replacement, each providing a new estimate. The standard deviation of these bootstrap estimates is an estimate of the standard error

Two Basic Approaches
Fixed X Sampling (the model is a good fit with constant variance, and the predictor variables are fixed, i.e. a controlled experiment):
- Fit the regression; obtain all fitted values and residuals
- Keeping the corresponding X level(s) and fitted values, re-sample the n residuals (with replacement)
- Add the bootstrapped residuals to the fitted values, and re-fit the regression (repeat the process many times)
Random X Sampling (not sure of the adequacy of the model: fit, variance, or random predictor variables):
- After fitting the regression and estimating the quantities of interest, sample n cases (with replacement) and re-estimate the quantities of interest from the "new" datasets (repeat many times)
A sketch of both schemes follows.
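A minimal numpy sketch of both schemes, assuming `X` includes an intercept column; the function name and `n_boot` default are illustrative:

```python
import numpy as np

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def bootstrap_coefs(X, y, n_boot=1000, fixed_x=True, seed=0):
    """Bootstrap distribution of the coefficient vector under either scheme."""
    rng = np.random.default_rng(seed)
    n = len(y)
    b = ols(X, y)
    fitted, resid = X @ b, y - X @ b
    out = np.empty((n_boot, X.shape[1]))
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)
        if fixed_x:
            # Fixed X: keep X, re-sample residuals, rebuild Y* = fitted + e*
            out[i] = ols(X, fitted + resid[idx])
        else:
            # Random X: re-sample whole cases (rows) with replacement
            out[i] = ols(X[idx], y[idx])
    return out   # the column SDs estimate the standard errors
```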

Bootstrap Confidence Intervals – Reflection Method
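This last slide's content was an image in the original. Reconstructed as a sketch from KNNL's description of the reflection method: with b₁ the original estimate and b₁*(α/2), b₁*(1-α/2) the percentiles of the bootstrap distribution,

```latex
d_1 = b_1 - b_1^*(\alpha/2), \qquad d_2 = b_1^*(1-\alpha/2) - b_1
```

and the approximate 1-α confidence limits reflect these distances about b₁:

```latex
b_1 - d_2 \;\le\; \beta_1 \;\le\; b_1 + d_1
```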