Fixing problems with the model: transforming the data so that the simple linear regression model is okay for the transformed data.

Where does this topic fit in?
Model building
–Model formulation
–Model estimation
–Model evaluation
Model use

Just a reminder: data analysis is an artful (subjective decisions!) science (objective tools!). Data transformation definitely requires a “trial and error” process.

Options for fixing problems with the model
–Abandon the simple linear regression model and find a more appropriate, but typically more complex, model.
–Transform the data so that the simple linear regression model works for the transformed data.

Abandoning the model
–If not linear: try a different function, like a quadratic or an exponential function.
–If unequal error variances: use weighted least squares.
–If error terms are not independent: try fitting a time series model.
–If important predictor variables are omitted: try fitting a multiple regression model.
–If there is an outlier: use a robust estimation procedure.

Choices for transforming the data
–Transform the predictor (x) values only.
–Transform the response (y) values only.
–Transform both the predictor (x) values and the response (y) values.

Transforming the predictor (x) values only: building the model.

Transforming the x values only
–Appropriate when non-linearity is the problem and the normality and equal variance assumptions are okay.
–It may be necessary to correct the non-linearity before you can assess the normality and equal variance assumptions.
–If the error terms are well-behaved, transforming the y values could change them into badly-behaved error terms.

Memory retention (Example 1)
Subjects were asked to memorize a list of disconnected items, then asked to recall them at various times up to a week later.
–Predictor time = time, in minutes, since the subject initially memorized the list.
–Response prop = proportion of items recalled correctly.

Fitted line plot Example 1

Residual vs. fits plot Example 1

Normal probability plot Example 1

The logarithmic transformation
–Most useful transformation.
–Most common scale for scientific work is the natural logarithm (denoted ln or log) based on the number e = …
General rules:
–ln(e) = 1
–ln(1) = 0
–ln(e^x) = x
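The three general rules for the natural logarithm can be checked numerically. The snippet below is a minimal sketch using Python's standard math module; the value x = 2.5 is an arbitrary choice for illustration and does not come from the slides.

```python
import math

# Base of the natural logarithm.
print(math.e)

ln_e = math.log(math.e)            # rule: ln(e) = 1
ln_one = math.log(1.0)             # rule: ln(1) = 0
x = 2.5                            # arbitrary illustrative value
ln_exp_x = math.log(math.exp(x))   # rule: ln(e^x) = x

print(ln_e, ln_one, ln_exp_x)
```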

Transform the x values (Example 1): change (“transform”) the predictor time to ln(time), adding a column lntime to the data table (time, prop, lntime).
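As a rough sketch of what transforming the predictor does, the code below fits a least-squares line twice: once with the raw predictor and once with its natural log. The data are made up for illustration (generated so that the response is exactly linear in ln(time)); they are not the memory-retention data from the slides, and the helper names fit_line and r_squared are hypothetical.

```python
import math

def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def r_squared(x, y, b0, b1):
    """Proportion of the variation in y explained by the fitted line."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst

# Made-up illustration, NOT the data from the slides:
# the response is constructed to be exactly linear in ln(time).
time = [1, 10, 100, 1000, 10000]
prop = [0.85 - 0.08 * math.log(t) for t in time]

# The transformation itself: replace the predictor with its natural log.
lntime = [math.log(t) for t in time]

b0_raw, b1_raw = fit_line(time, prop)
b0_log, b1_log = fit_line(lntime, prop)
r2_raw = r_squared(time, prop, b0_raw, b1_raw)
r2_log = r_squared(lntime, prop, b0_log, b1_log)

print("raw x:   R-Sq =", round(r2_raw, 3))
print("ln(x):   R-Sq =", round(r2_log, 3))
```

Only the predictor column changes; the response is left alone, so the behavior of the error terms on the y scale is not disturbed.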

Fitted line plot using transformed x values Example 1

Residuals vs. fits plot using transformed x values Example 1

Normal probability plot using transformed x values Example 1

What if we transform the y values instead when non-linearity is the main problem? (Example 1)

The residuals are an improvement… (although not great)… Example 1

…but we now have non-normal error terms. Example 1

Transforming the predictor (x) values only: using the model to answer your research question.

What is the nature of the association between time since memorized and effectiveness of recall?

Is there an association between time since memorized and effectiveness of recall?
Regression of prop on lntime: R-Sq = 99.0%, R-Sq(adj) = 98.9%.
[Regression output: coefficient table (Predictor, Coef, SE Coef, T, P) for Constant and lntime, and analysis of variance table (Source, DF, SS, MS, F, P) for Regression, Residual Error, and Total.]

What proportion of words can we expect a person to recall after 1000 minutes?
Predicted values for a new observation at lntime = ln(1000): 95.0% CI (0.282, 0.316), 95.0% PI (0.245, 0.353).
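A small sketch of how to read this output: the new predictor value is ln(1000), and both intervals should be centered on the same fitted value, with the prediction interval wider than the confidence interval. This assumes the usual symmetric t-based intervals; the two intervals are the ones quoted on the slide.

```python
import math

# Predictor value on the transformed scale for time = 1000 minutes.
lntime_new = math.log(1000)

# Intervals quoted on the slide (proportion of items recalled).
ci = (0.282, 0.316)   # 95% confidence interval for the mean response
pi = (0.245, 0.353)   # 95% prediction interval for a new response

# Assuming symmetric intervals, each midpoint recovers the fitted value.
ci_mid = (ci[0] + ci[1]) / 2
pi_mid = (pi[0] + pi[1]) / 2

# Half-widths: the PI also accounts for the variation of a single new response.
ci_half = (ci[1] - ci[0]) / 2
pi_half = (pi[1] - pi[0]) / 2

print(lntime_new, ci_mid, pi_mid, ci_half, pi_half)
```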

How much does expected recall change if time increases ten-fold?
–A ten-fold increase in x is associated with a β1 × ln(10) change in the mean of y.
–A two-fold increase in x is associated with a β1 × ln(2) change in the mean of y.
–Choose the multiple so that it makes sense for the scope of the model.
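The ln(10) and ln(2) multipliers follow directly from the form of the model. A short derivation, writing \mu_Y(x) for the mean response at predictor value x and k for the chosen multiple (both symbols introduced here for illustration):

```latex
\begin{align*}
\mu_Y(x) &= \beta_0 + \beta_1 \ln(x) \\
\mu_Y(kx) - \mu_Y(x) &= \beta_1 \ln(kx) - \beta_1 \ln(x) = \beta_1 \ln(k) \\
k = 10:&\quad \beta_1 \ln(10) \approx 2.303\,\beta_1 \\
k = 2:&\quad \beta_1 \ln(2) \approx 0.693\,\beta_1
\end{align*}
```

The change depends only on the multiple k, not on the starting value x.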

How much does expected recall change if time increases ten-fold? We expect the proportion of recalled words to change by b1 × ln(10), where b1 is the estimated coefficient of lntime, for each ten-fold increase in time since memorization took place.

How much does expected recall change if time increases ten-fold? Multiply both endpoints of the 95% confidence interval for β1 by ln(10): we can be 95% confident that the mean proportion of recalled words changes by an amount between the two resulting values for each ten-fold increase in time since memorization took place.
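The interval for a ten-fold change can be sketched as a rescaling of the slope's confidence interval. The function name change_interval and the slope limits used below are hypothetical; the slides' actual coefficient estimates are not reproduced here.

```python
import math

def change_interval(beta1_lower, beta1_upper, fold):
    """Interval for the change in mean y per `fold`-fold increase in x,
    given a confidence interval for the slope on ln(x)."""
    if fold <= 1:
        raise ValueError("fold must be greater than 1")
    scale = math.log(fold)   # positive for fold > 1, so endpoint order is kept
    return beta1_lower * scale, beta1_upper * scale

# Hypothetical slope interval, NOT taken from the slides.
lower, upper = change_interval(-0.09, -0.07, fold=10)
print(lower, upper)
```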

Transforming the y values only: building the model.

Transforming the y values only
–You should consider transforming the y values when non-normality and/or unequal variances are problems.
–As an added bonus, the transformation on y may also help to “straighten out” a curved relationship.

Gestation time and birth weight for mammals (Example 2)
–Predictor Birthwgt = birth weight, in kg, of mammal.
–Response Gestation = number of days until birth.
Mammals in the data set: Goat, Sheep, Deer, Porcupine, Bear, Hippo, Horse, Camel, Zebra, Giraffe, Elephant.

Fitted line plot Example 2

Residual vs. fits plot Example 2

Normal probability plot Example 2

Transform the y values (Example 2): change (“transform”) the response Gestation to ln(Gestation), adding a column lnGest to the data table (Mammal, Birthwgt, Gestation, lnGest).

Fitted line plot using transformed y values Example 2

Residual vs. fits plot using transformed y values Example 2

Normal probability plot using transformed y values Example 2

Transforming the response (y) values only: using the model to answer your research question.

What is the nature of the association between birth weight and length of gestation?

Is there an association between birth weight and length of gestation?
Regression of lnGest on Birthwgt: R-Sq = 80.3%, R-Sq(adj) = 78.1%.
[Regression output: coefficient table (Predictor, Coef, SE Coef, T, P) for Constant and Birthwgt, and analysis of variance table (Source, DF, SS, MS, F, P) for Regression, Residual Error, and Total.]

What is the expected gestation length of a new 50 kg mammal? (Example 2) The estimated regression function has the form predicted lnGest = b0 + b1 × Birthwgt. Therefore, since Gestation = e^lnGest, we predict the gestation length of another mammal at 50 kg to be e^(b0 + b1 × 50) days.

What is the expected gestation length of a new 50 kg mammal? (Example 2) Predicted values for a new observation at Birthwgt = 50, on the ln(Gestation) scale: the 95.0% CI has lower limit 5.6401 and the 95.0% PI has lower limit 5.2847. Exponentiating the endpoints of the prediction interval converts it to days: we can be 95% confident that the gestation length for a new mammal at 50 kg will be between e^(lower PI limit) and e^(upper PI limit) days.

What is the expected change in length of gestation for each one kilogram increase in birth weight? The median changes by a factor of e^β1 for each one unit increase in the predictor x.
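The multiplicative interpretation comes from undoing the log on the response. A short derivation, assuming the error terms on the ln(y) scale are symmetric about zero (so that exponentiating the fitted mean of ln(y) gives the median of y) and writing c for the size of the increase in x (a symbol introduced here):

```latex
\begin{align*}
\ln(Y) &= \beta_0 + \beta_1 x + \varepsilon \\
\operatorname{median}(Y \mid x) &= e^{\beta_0 + \beta_1 x} \\
\frac{\operatorname{median}(Y \mid x + c)}{\operatorname{median}(Y \mid x)}
  &= \frac{e^{\beta_0 + \beta_1 (x + c)}}{e^{\beta_0 + \beta_1 x}} = e^{c\,\beta_1}
\end{align*}
```

With c = 1 the ratio is e^{\beta_1}, regardless of the starting value x.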

What is the expected change in length of gestation for each one kilogram increase in birth weight? The estimated regression line tells us:
–The median gestation for a mammal weighing 3 kg is e^b1 = 1.01 times the median gestation for a mammal weighing 2 kg.
–The median gestation for a mammal weighing 30 kg is e^(10 × b1) times the median gestation for a mammal weighing 20 kg.
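The per-kilogram factor compounds over larger weight differences. The sketch below starts from the rounded factor 1.01 quoted on the slide, so the results are approximate and may differ slightly from values computed with the unrounded slope; the function name median_ratio is hypothetical.

```python
import math

# Per-kilogram factor e^{b1}, using the rounded value quoted on the slide.
factor_per_kg = 1.01

# Slope implied by that rounded factor (approximate).
b1_implied = math.log(factor_per_kg)

def median_ratio(weight_from, weight_to):
    """Ratio of median gestation at weight_to versus weight_from (kg)."""
    return math.exp(b1_implied * (weight_to - weight_from))

# A 10 kg difference compounds the one-kilogram factor ten times.
factor_per_10kg = median_ratio(20, 30)

print(b1_implied, median_ratio(2, 3), factor_per_10kg)
```

The ratio depends only on the difference in weights, so comparing 30 kg with 20 kg gives the same factor as comparing 10 kg with 0 kg.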

What is the expected change in length of gestation for each one kilogram increase in birth weight? Exponentiate both endpoints of the 95% confidence interval for β1: we can be 95% confident that the median gestation will increase by a factor between e^(lower limit) and e^(upper limit) for each one kilogram increase in birth weight.
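A minimal sketch of converting a slope interval into an interval for the multiplicative change in the median. The function name factor_interval and the slope limits below are hypothetical; they are not the estimates from the slides.

```python
import math

def factor_interval(beta1_lower, beta1_upper, step=1.0):
    """Interval for the factor by which the median of y changes per
    `step`-unit increase in x, when ln(y) is regressed on x."""
    if step <= 0:
        raise ValueError("step must be positive")
    # exp is increasing, so the endpoint order is preserved.
    return math.exp(step * beta1_lower), math.exp(step * beta1_upper)

# Hypothetical slope interval, NOT taken from the slides.
low_factor, high_factor = factor_interval(0.005, 0.015)
print(low_factor, high_factor)
```

If both limits for the slope are positive, both factors exceed 1, which corresponds to an increase in the median.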

Transforming both the x and y values: building the model.

Transforming both the x and y values
–You might have to do this when everything seems wrong: the error terms are not normal, have unequal variances, and the function is not linear.
–Transforming the y values corrects problems with the error terms (and may help the non-linearity).
–Transforming the x values primarily corrects the non-linearity.

Diameter (inches) and volume (cu. ft.) of 70 shortleaf pines Example 3

Residuals vs. fits plot Example 3

Normal probability plot Example 3

Transform the x values only (Example 3): transform predictor diameter to ln(diameter), adding a column lnDiam to the data table (Diameter, Volume, lnDiam).

Fitted line plot using transformed x values Example 3

Residuals vs. fitted plot using transformed x values Example 3

Normal probability plot using transformed x values Example 3

Transform both the x and y values (Example 3)
–Transform predictor diameter to ln(diameter).
–Transform response volume to ln(volume).
The data table now has columns Diameter, Volume, lnDiam, lnVol.

Fitted line plot using transformed x and y values Example 3

Residual plot using transformed x and y values Example 3

Normal probability plot using transformed x and y values Example 3

Transforming both the x and y values: using the model to answer your research question.

What is the nature of the association between diameter and volume of pines?

Is there an association between diameter and volume of pines?
Regression of lnVol on lnDiam: R-Sq = 97.4%, R-Sq(adj) = 97.3%.
[Regression output: coefficient table (Predictor, Coef, SE Coef, T, P) for Constant and lnDiam, and analysis of variance table (Source, DF, SS, MS, F, P) for Regression, Residual Error, and Total.]

What is the median volume of all pine trees that are 10" in diameter? (Example 3) The estimated regression function has the form predicted lnVol = b0 + b1 × lnDiam. Therefore, since Volume = e^lnVol, we predict the median volume of all 10" shortleaf pines to be e^(b0 + b1 × ln(10)) cubic feet.

What is the median volume of all pine trees that are 10" in diameter? (Example 3) Predicted values for a new observation at lnDiam = ln(10), on the ln(volume) scale: the 95.0% CI has lower limit 2.9922 and the 95.0% PI has lower limit 2.6908. Exponentiating the endpoints of the confidence interval, we can be 95% confident that the median volume of all 10" diameter shortleaf pines is between 19.9 and 21.6 cubic feet.
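The step from the log-scale interval to cubic feet is a plain exponentiation of the endpoints. The sketch below checks this with the lower confidence limit 2.9922 quoted on the slide; the upper limit on the log scale is not shown in this transcript, so the code only works backwards from the stated 21.6 cubic feet to the value it implies.

```python
import math

# Lower limit of the 95% CI on the ln(volume) scale, quoted on the slide.
ci_lower_log = 2.9922

# Back-transform to cubic feet.
ci_lower_volume = math.exp(ci_lower_log)

# The stated upper limit of 21.6 cubic feet implies this log-scale limit
# (an inference from the stated interval, not a value read from the output).
ci_upper_log_implied = math.log(21.6)

# The same back-transformation applies to the prediction interval's lower limit.
pi_lower_volume = math.exp(2.6908)

print(round(ci_lower_volume, 1), ci_upper_log_implied, pi_lower_volume)
```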

What is the expected change in volume for a two-fold increase in diameter? The median changes by a factor of 2^β1 for each two-fold increase in the predictor x.
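When both variables are logged, a multiplicative change in x produces a multiplicative change in the median of y. A short derivation, assuming error terms symmetric about zero on the ln(y) scale and writing k for the chosen multiple (a symbol introduced here):

```latex
\begin{align*}
\ln(Y) &= \beta_0 + \beta_1 \ln(x) + \varepsilon \\
\operatorname{median}(Y \mid x) &= e^{\beta_0 + \beta_1 \ln(x)} = e^{\beta_0}\, x^{\beta_1} \\
\frac{\operatorname{median}(Y \mid kx)}{\operatorname{median}(Y \mid x)}
  &= \frac{e^{\beta_0} (kx)^{\beta_1}}{e^{\beta_0}\, x^{\beta_1}} = k^{\beta_1}
\end{align*}
```

With k = 2 the ratio is 2^{\beta_1}, whatever the starting value x.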

What is the expected change in volume for a two-fold increase in diameter? The estimated regression line tells us:
–The median volume of a 20" diameter tree is estimated to be 5.92 times the median volume of a 10" diameter tree.
–The median volume of a 10" diameter tree is estimated to be 5.92 times the median volume of a 5" diameter tree.
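Both comparisons give the same factor because each is a doubling of diameter. The sketch below works backwards from the quoted factor 5.92 to the slope it implies and then reuses that slope for other diameter ratios; since 5.92 is a rounded figure, the implied slope is approximate, and the function name median_volume_ratio is hypothetical.

```python
import math

# Factor for a two-fold increase in diameter, quoted on the slide: 2^{b1}.
doubling_factor = 5.92

# Slope on the log-log scale implied by that rounded factor (approximate).
b1_implied = math.log2(doubling_factor)

def median_volume_ratio(diam_from, diam_to):
    """Ratio of median volumes for two diameters under the log-log model."""
    return (diam_to / diam_from) ** b1_implied

print(b1_implied)
print(median_volume_ratio(10, 20))   # one doubling
print(median_volume_ratio(5, 10))    # another doubling, same factor
print(median_volume_ratio(5, 20))    # two doublings, factor squared
```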

What is the expected change in volume for a two-fold increase in diameter? Raise 2 to each endpoint of the 95% confidence interval for β1: we can be 95% confident that the median volume will increase by a factor between 2^(lower limit) and 2^(upper limit) for each two-fold increase in diameter.