Transformations in Statistical Analysis

Presentation transcript:

14-1 Transformations in Statistical Analysis
Outline: assumptions of linear statistical models; types of transformations; alternatives to transformations.
Model assumptions: effect additivity, normality, homoscedasticity, independence.

14-2 Order of Importance
Experimental analysis models (ANOVA): homoscedasticity, normality, additivity, independence.
Observational analysis models (regression): additivity, homoscedasticity, normality, independence.
All four are so interrelated that which is "most" important may be immaterial!

14-3 Independence
When is this important? Measurements over time on the same individual: time series data (rainfall, temperature, etc.); repeated measures (split plots in time); growth curves. Measurements near each other in space: split-plot designs; spatial data.
How do I know it's a problem? By design (how the data were collected), or by temporal/spatial autocorrelation analysis (see the sketch below).
Rectifying a dependence problem: modify the type of model fitted to the data.
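A minimal sketch of an autocorrelation check in R, using simulated data (not from the slides): fit the model, then inspect the autocorrelation function of the residuals.

set.seed(1)
x <- 1:50
y <- 2 + 0.5 * x + as.numeric(arima.sim(list(ar = 0.7), n = 50))  # AR(1) errors
fm <- lm(y ~ x)
acf(residuals(fm))  # slowly decaying spikes suggest serial dependence
# A formal alternative is the Durbin-Watson test: lmtest::dwtest(fm)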

14-4 Homoscedasticity
How do I know I have a problem? Plot predicted (fitted) values versus residuals. What is the pattern of the spread in the residuals as the predicted values increase? Spread constant: acceptable. Spread increases: a problem. Spread decreases then increases: a problem.
[Figure: three residual-versus-fitted plots illustrating the constant, increasing, and decreasing-then-increasing spread patterns.]
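A minimal sketch of this diagnostic plot, on simulated data whose spread grows with the mean:

set.seed(2)
x <- runif(100, 1, 10)
y <- 1 + 2 * x + rnorm(100, sd = 0.5 * x)  # error spread increases with x
fm <- lm(y ~ x)
plot(fitted(fm), residuals(fm), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)  # a fan shape indicates increasing spread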

14-5 Lack of Homogeneity in Regression
What to do? Attempt a transformation; weighted regression; incorporate additional covariates; non-linear regression.
What to do if the spread of the residuals plotted versus X still shows systematic structure? You may need another x variable.
[Figure: two residuals-versus-X plots showing patterned spread.]
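A sketch of one remedy from the list above, weighted regression, continuing the simulated data from the previous sketch and assuming the error standard deviation is roughly proportional to x:

fmw <- lm(y ~ x, weights = 1 / x^2)  # weight each point by its inverse error variance
summary(fmw)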

14-6 Transforming the Response to Achieve Linearity
If a scatterplot of y versus x curves upward, proceed down the ladder of powers (for example, from y toward sqrt(y) and log(y)) to choose a transformation.
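A small sketch of this rule on simulated exponential-growth data; stepping down from y to log(y) straightens the plot:

set.seed(3)
x <- 1:40
y <- exp(0.1 * x + rnorm(40, sd = 0.1))  # curves upward on the raw scale
par(mfrow = c(1, 2))
plot(x, y)        # convex, curving upward
plot(x, log(y))   # roughly linear after a step down the ladder
par(mfrow = c(1, 1))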

14-7 [Figure-only slide; no transcript text.]

14-8 Handling Heterogeneity (flowchart)
Is the model a regression? If no (ANOVA), work from the group means; if yes, fit the linear model and plot the residuals. Test for homoscedasticity: if the test accepts, OK; if it rejects, choose a type of transformation, transform the observations (Box-Cox family, power family, or a traditional transformation), and re-check.
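A sketch of the Box-Cox step using the MASS package, on a simulated positive response (not the slides' data):

library(MASS)
set.seed(4)
x <- runif(60, 1, 10)
y <- exp(1 + 0.3 * x + rnorm(60, sd = 0.2))  # positive, right-skewed response
bc <- boxcox(lm(y ~ x))                      # profile log-likelihood over lambda
bc$x[which.max(bc$y)]                        # lambda near 0 suggests a log transform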

14-9 Transformations to Achieve Normality (flowchart)
Is the model a regression? If yes, fit the linear model; if no (ANOVA), estimate the group means. Are the residuals normal (judged by a Q-Q plot and formal tests)? If yes, OK; if no, transform the data or move to a different model.

14-10 Transformations to Achieve Normality
How can we determine if observations are normally distributed? Graphical examination: normal quantile-quantile plot (Q-Q plot); histogram or boxplot. Goodness-of-fit tests: Kolmogorov-Smirnov test; Shapiro-Wilk test; D'Agostino's test.
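A minimal sketch of these checks in R, applied to the residuals of a stand-in model (D'Agostino's test is not in base R; it is available in add-on packages such as moments):

set.seed(5)
fm <- lm(rnorm(50) ~ 1)  # stand-in fit; substitute your own model
r <- residuals(fm)
qqnorm(r); qqline(r)     # Q-Q plot
hist(r)                  # histogram
shapiro.test(r)          # Shapiro-Wilk
ks.test(as.numeric(scale(r)), "pnorm")  # Kolmogorov-Smirnov; approximate, since the parameters are estimated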

14-11 Non-normal! So what?
Only very skewed distributions will have a marked effect on the significance level of the F-test for the overall model or for model effects. Often the same transformations used to achieve homoscedasticity will produce more normal-looking observations (residuals).
Transformations to Achieve Model Simplicity
GOAL: to provide as simple a mathematical form as possible for the relationship between the response and the explanatory variables. This may require transforming both response and explanatory variables.

14-12 Alternative Models (ordered from high to low complexity)
Generalized linear models; non-linear regression; non-parametric methods; weighted least squares; regular least squares.
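A sketch of the first alternative: a generalized linear model can replace a transform-then-fit approach by modeling a skewed positive response directly. Here a Gamma error with a log link is assumed (one plausible choice, not the slides'):

set.seed(6)
x <- runif(60, 1, 10)
y <- rgamma(60, shape = 5, rate = 5 / exp(0.5 + 0.2 * x))  # mean exp(0.5 + 0.2 x)
fg <- glm(y ~ x, family = Gamma(link = "log"))
summary(fg)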

14-13 Example: Predicting brain weight from body weight in mammals via SLR
Data are average brain (Y, g) and body (X, kg) weights for 62 species of mammals (2 omitted). Source: Allison & Cicchetti (1976), Science.
[Table: species (common name) with body and brain weights: Arctic fox, owl monkey, horse, kangaroo, human, African elephant, Asian elephant, ..., chimpanzee, tree shrew, red fox. The numeric values were not transcribed; the tree shrew and the red fox are the two species marked "Omit" and held out for later prediction.]

14-14 The scatterplot of the raw data is uninformative: most species have small weights compared to the elephants. Viewing only those mammals with body weight below 300 kg suggests transforming to a log scale to linearize the relationship.
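A sketch of the two plots, assuming vectors x (body weight, kg) and y (brain weight, g) have been loaded:

par(mfrow = c(1, 2))
plot(x, y, xlab = "Body weight (kg)", ylab = "Brain weight (g)")           # dominated by the elephants
plot(log(x), log(y), xlab = "log body weight", ylab = "log brain weight")  # roughly linear
par(mfrow = c(1, 1))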

14-15 On the log scale the scatterplot looks linear. The fitted regression equation has the form log(brain) = b0 + b1 log(body), and body weight is a very significant predictor of brain weight (p-value < 0.0001). Also, R² = 0.922.

14-16 The residual plot shows no obvious violations of the zero-mean and constant-variance assumptions. The Q-Q plot shows that the normality assumption for the residuals is plausible.
[Figure: residual plot and Q-Q plot, with the human and the opossum labeled as the most extreme points.]

14-17 Checking for influential observations (R)
> fm <- lm(log(y) ~ log(x))
> influence.measures(fm)
Influence measures of lm(formula = log(y) ~ log(x)):
   dfb.1.  dfb.lg..  dffit  cov.r  cook.d  hat  inf
(numeric values not transcribed; the rows flagged "*" in the inf column include the tree shrew, Asian elephant, human, African elephant, opossum, rhesus monkey, and brown bat; the owl monkey is listed but not flagged)
In MTB: Stat > Regression > Regression > Storage.
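A compact base-R way to list only the flagged rows, assuming the same fm as above:

summary(influence.measures(fm))  # prints just the observations marked as potentially influential
# The individual statistics are also available directly:
# cooks.distance(fm); hatvalues(fm); dffits(fm)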

14-18 Decision: Leave out man (he doesn't really fit in with the rest of the mammals) and re-run the analysis.
[Table comparing the full model with the omit-human model on R², slope, and p-value; the numeric entries were not transcribed, though both p-values are reported with "<" thresholds.]
Even though the results don't change much, we will go with this last model.
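A sketch of the re-fit, assuming the human is observation 32 (as in the prediction code on slide 14-19):

fm2 <- lm(log(y) ~ log(x), subset = -32)
summary(fm2)  # compare R-squared, slope, and p-value with the full-model fit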

14-19 Predicting the brain weights of the omitted mammals (R)
This illustrates the idea of cross-validation in regression: it is often recommended that the data be split into two (equal?) portions, one used for model fitting and the other for model checking/verification.

Mammal      Predicted Brain Wt   Prediction Interval   Actual Brain Wt
Tree Shrew  ...                  (0.396, 5.667)        ...
Red Fox     ...                  (6.359, ...)          ...

> xh <- x[-32]; yh <- y[-32]  # drop the human (observation 32)
> fmh <- lm(log(yh) ~ log(xh))
> new <- data.frame(xh = c(0.104, 4.235))  # body weights (kg) of the two omitted species
> predict(fmh, newdata = new, interval = "prediction")       # fit, lwr, upr on the log scale
> exp(predict(fmh, newdata = new, interval = "prediction"))  # fit, lwr, upr on the original scale
Exponentiate final results!

14-20 Predicting the brain weights of the omitted mammals (MTB)
[Screenshot of the Minitab regression dialogs; influence measures can be selected here.]

14-21 MTB output (with man)
The regression equation is lbrain = ... + ... lbody

Predictor  Coef  SE Coef  T  P
Constant   ...
lbody      ...

S = ...   R-Sq = 92.2%   R-Sq(adj) = 92.0%

Analysis of Variance
Source          DF  SS  MS  F  P
Regression      ...
Residual Error  ...
Total           ...

Unusual Observations
Obs  lbody  lbrain  Fit  SE Fit  Residual  St Resid
(four observations flagged R, X, R, R; values not transcribed)
R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large influence.

Predicted Values for New Observations
New Obs  Fit  SE Fit  95% CI          95% PI
1        ...  ...     (0.1249, ...)   (..., ...)
2        ...  ...     (3.0201, ...)   (..., ...)

Only available influence measures are: standardized/Studentized residuals; hat matrix; Cook's distance; and DFFITS.