Fixing problems with the model: transforming the data so that the simple linear regression model is okay for the transformed data.


Options for fixing problems with the model
Abandon the simple linear regression model and find a more appropriate, but typically more complex, model.
Transform the data so that the simple linear regression model works for the transformed data.

Abandoning the model
If not linear: try a different function, like a quadratic (Ch. 7) or an exponential function (Ch. 13).
If unequal error variances: use weighted least squares (Ch. 10).
If error terms are not independent: try fitting a time series model (Ch. 12).
If important predictor variables are omitted: try fitting a multiple regression model (Ch. 6).
If there is an outlier: use a robust estimation procedure (Ch. 10).

Choices for transforming the data
Transform the X values only.
Transform the Y values only.
Transform both the X and the Y values.

Transforming the X values only

Appropriate when non-linearity is the only problem with the model, that is, when the normality and equal variance assumptions look okay. Transforming the Y values would likely change the well-behaved error terms into badly-behaved error terms.

Example 1: Memory retention
Subjects were asked to memorize a list of disconnected items, and were then asked to recall them at various times up to a week later.
Predictor time = time, in minutes, since the list was initially memorized.
Response prop = proportion of items recalled correctly.

Example 1: Fitted line plot

Example 1: Residuals vs. fits plot

Example 1: Normal probability plot

Example 1: Transform the X values
Change (“transform”) the predictor time to log10(time), stored as a new column log10_time alongside time and prop.
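The X-only transformation can be sketched in a few lines of Python. This is a minimal illustration, not the analysis from the slides: the data values are hypothetical (the original worksheet is not reproduced here), and `fit_line` and `predict_prop` are helper names introduced only for this example.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Hypothetical data (NOT the values from the slides): the proportion
# recalled drops by 0.2 for every tenfold increase in time.
time = [1, 10, 100, 1000, 10000]
prop = [0.9, 0.7, 0.5, 0.3, 0.1]

# Transform the predictor only; the response is left untouched.
log10_time = [math.log10(t) for t in time]
b0, b1 = fit_line(log10_time, prop)

def predict_prop(minutes):
    """Predicted proportion at a given time, via the transformed predictor."""
    return b0 + b1 * math.log10(minutes)
```

A new time must be transformed the same way before predicting, which is why `predict_prop` applies `math.log10` to its argument.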

Example 1: Fitted line plot using transformed X values

Example 1: Residuals vs. fits plot using transformed X values

Example 1: Normal probability plot using transformed X values

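Example 1: Predicting new proportion
Estimated regression function: $\widehat{prop} = b_0 + b_1 \log_{10}(time)$, where $b_0$ and $b_1$ are the fitted coefficients from the regression on the transformed predictor.
Therefore, since $\log_{10}(1000) = 3$, we predict the proportion of words recalled after 1000 minutes to be $b_0 + 3 b_1$.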

Example 1: Predicting new proportion
Predicted Values for New Observations (Minitab output, predictor log10tim):
95.0% CI: (0.282, 0.316)
95.0% PI: (0.245, 0.353)
We can be 95% confident that a person will recall between 24.5% and 35.3% of the words after 1000 minutes.

Transforming the Y values only

Appropriate when non-normality and/or unequal variances are the problems. The transformation on Y may also help to “straighten out” a curved relationship.

Example 2: Gestation time and birth weight for mammals
Mammals in the data set: Goat, Sheep, Deer, Porcupine, Bear, Hippo, Horse, Camel, Zebra, Giraffe, Elephant.
Predictor Birthwgt = birth weight, in kg, of mammal.
Response Gestation = number of days until birth.

Example 2: Fitted line plot

Example 2: Residuals vs. fits plot

Example 2: Normal probability plot

Example 2: Transform the Y values
Change (“transform”) the response Gestation to log10(Gestation), stored as a new column log10Gest alongside Mammal, Birthwgt, and Gestation.

Example 2: Fitted line plot using transformed Y values

Example 2: Residuals vs. fits plot using transformed Y values

Example 2: Normal probability plot using transformed Y values

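Example 2: Predicting new gestation
Estimated regression function: $\widehat{\log_{10}(Gestation)} = b_0 + b_1 \, Birthwgt$, where $b_0$ and $b_1$ are the fitted coefficients from the regression on the transformed response.
Therefore, since $Gestation = 10^{\log_{10}(Gestation)}$, we predict the gestation length of another mammal at 50 kg to be $10^{b_0 + 50 b_1}$ days.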

Example 2: Predicting new gestation
Predicted Values for New Observations (Minitab output, predictor Birthwgt), on the log10(Gestation) scale:
95.0% CI: (2.4494, )
95.0% PI: (2.2951, )
We can be 95% confident that the gestation length for a new mammal at 50 kg will be between and days.
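Because the response was transformed with log10, interval endpoints reported on that scale have to be raised as powers of 10 to get back to days. The sketch below does this for the two lower endpoints that appear in the output above; the upper endpoints are not available here, so they are left out. The function name `back_transform_log10` is introduced only for this illustration.

```python
def back_transform_log10(value):
    """Undo a log10 transformation: returns 10 ** value."""
    return 10 ** value

# Lower endpoints as reported on the log10(Gestation) scale.
ci_lower_log10 = 2.4494
pi_lower_log10 = 2.2951

# The same endpoints expressed in days.
ci_lower_days = back_transform_log10(ci_lower_log10)
pi_lower_days = back_transform_log10(pi_lower_log10)
```

Back-transforming the endpoints of an interval is valid because 10 ** x is an increasing function, so the ordering of the endpoints is preserved.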

Transforming both the X and Y values

Appropriate when the error terms are not normal, have unequal variances, and the function is not linear.
Transforming the Y values corrects the problems with the error terms (and may help the non-linearity).
Transforming the X values corrects the non-linearity.

Example 3: Diameter (inches) and volume (cu. ft.) of 70 shortleaf pines

Example 3: Residuals vs. fits plot

Example 3: Normal probability plot

Example 3: Transform the Y values only
Transform the response volume to ln(volume), the natural log, stored as a new column logVol alongside Diameter and Volume.

Example 3: Fitted line plot using transformed Y values

Example 3: Residuals vs. fits plot using transformed Y values

Example 3: Normal probability plot using transformed Y values

Example 3: Transform both the X and Y values
Transform the predictor diameter to ln(diameter), stored as logDiam.
Transform the response volume to ln(volume), stored as logVol.

Example 3: Fitted line plot using transformed X and Y values

Example 3: Residual plot using transformed X and Y values

Example 3: Normal probability plot using transformed X and Y values

Transformation strategies

Effects of transformations
Transforming the Y values corrects the problems with the error terms, and may simultaneously help non-linearity.
Transforming the X values can only correct non-linearity.

Transformation strategies
If the form of the relationship between x and y is known, then it may be possible to find a linearizing transformation analytically.
Fitting a regression model empirically generally requires trial and error: try different transformations to see which does best.

Transformation strategies: finding a linearizing transformation analytically

Knowing the functional relationship is of the power form
If the relationship between x and y is of the power form $y = \alpha x^{\beta}$, taking the log of both sides transforms it into a linear form: $\ln y = \ln \alpha + \beta \ln x$.

Knowing the functional relationship is of the exponential form
If the relationship between x and y is of the exponential form $y = \alpha e^{\beta x}$, taking the log of both sides transforms it into a linear form: $\ln y = \ln \alpha + \beta x$.
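Both linearizations can be checked numerically. The sketch below generates noise-free data from each form and recovers the parameters by fitting a straight line on the transformed scale. The parameter values and x values are hypothetical, and `fit_line` is a small least squares helper written for this example.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Hypothetical parameter values, chosen only for illustration.
alpha, beta = 2.0, 1.5
x = [1.0, 2.0, 4.0, 8.0]

# Power form y = alpha * x**beta: regress ln(y) on ln(x).
y_power = [alpha * xi ** beta for xi in x]
b0, b1 = fit_line([math.log(xi) for xi in x], [math.log(yi) for yi in y_power])
power_alpha_hat, power_beta_hat = math.exp(b0), b1

# Exponential form y = alpha * exp(beta * x): regress ln(y) on x itself.
y_exp = [alpha * math.exp(beta * xi) for xi in x]
b0, b1 = fit_line(x, [math.log(yi) for yi in y_exp])
exp_alpha_hat, exp_beta_hat = math.exp(b0), b1
```

In both cases the slope estimates beta directly, while the intercept estimates ln(alpha) and must be exponentiated. The power form needs both variables logged; the exponential form needs only the response logged.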

Transformation strategies: finding a transformation by trial and error

Family of power transformations
The most common transformation involves transforming the response by taking it to some power λ. That is: $y' = y^{\lambda}$.
Most commonly, for interpretation reasons, λ is a number between -1 and 2, such as -1, -0.5, 0, 0.5, (1), 1.5, and 2.
When λ = 0, the transformation is taken to be the log transformation. That is: $y' = \ln y$.
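The power family, including the λ = 0 special case, fits in one small function. This is a sketch under one added assumption: the response is required to be strictly positive, since the log and the negative powers are undefined otherwise. The name `power_transform` is introduced only for this example.

```python
import math

def power_transform(y, lam):
    """Return y ** lam, or ln(y) when lam == 0.

    Assumes y > 0 (an added restriction for this sketch) so that the log
    and negative-power cases are always defined.
    """
    if y <= 0:
        raise ValueError("power_transform expects a strictly positive response")
    if lam == 0:
        return math.log(y)
    return y ** lam

# Trying several candidate powers on one response value.
candidates = [-1, -0.5, 0, 0.5, 1, 1.5, 2]
transformed = {lam: power_transform(4.0, lam) for lam in candidates}
```

In a trial-and-error search, each candidate λ would be applied to the whole response column and the resulting residual plots compared.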

Effect of the natural log (ln) transformation

Some guidelines for specifying λ
To make smaller values more spread out, use a smaller λ.
To make larger values more spread out, use a larger λ.

Possible transformations

Transformation strategies: variance stabilizing transformations

Common variance stabilizing transformations
If the response is a Poisson count, so that the variance is proportional to the mean, use the square root transformation: $y' = \sqrt{y}$.
If the response is a binomial proportion, use the arcsine square root transformation: $y' = \sin^{-1}(\sqrt{y})$.

Common variance stabilizing transformations
If the variance is proportional to the mean squared, use the natural log transformation: $y' = \ln y$.
If the variance is proportional to the mean to the fourth power, use the reciprocal transformation: $y' = 1/y$.
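The four variance stabilizing transformations above are one-liners. The sketch below writes them as plain Python functions; the function names are made up for this example, and the arcsine result is in radians.

```python
import math

def sqrt_transform(count):
    """For Poisson counts (variance proportional to the mean)."""
    return math.sqrt(count)

def arcsine_sqrt_transform(proportion):
    """For binomial proportions in [0, 1]; result is in radians."""
    return math.asin(math.sqrt(proportion))

def log_transform(y):
    """For variance proportional to the mean squared; requires y > 0."""
    return math.log(y)

def reciprocal_transform(y):
    """For variance proportional to the mean to the fourth power; requires y != 0."""
    return 1.0 / y
```

After applying one of these to the response, the residuals vs. fits plot should be rechecked to confirm that the spread is roughly constant.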

Transforming data in Minitab
1. Select Calc >> Calculator …
2. In the box labeled “Store result in variable,” tell Minitab in which column (variable) you want the transformed data stored.
3. Type (input) the expression for the desired transformation in the box labeled Expression. (Use the available functions.)
4. Select OK. The data will appear in the column of the worksheet that you specified.