Transforming the data. Modified from: Gotelli and Ellison 2004, Chapter 8; Sokal and Rohlf 2000, Chapter 13.



What is a transformation? It is a mathematical function that is applied to every observation of a given variable: Y* = f(Y), where Y represents the original variable, Y* is the transformed variable, and f is the mathematical function applied to the data.

Most transformations are monotonic: monotonic functions do not change the rank order of the data, but they do change the relative spacing of the observations, and therefore affect the variance and shape of the probability distribution.
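A quick sketch in R (the language used later in these slides) of what "monotonic" means in practice: the rank order survives the transformation, but the spacing does not.

```r
y <- c(1, 10, 100, 1000)
y.star <- log10(y)                 # a monotonic transformation

# rank order is unchanged
stopifnot(identical(order(y), order(y.star)))

# relative spacing changes: equal ratios become equal differences
diff(y)       # 9 90 900 : gaps grow on the raw scale
diff(y.star)  # 1  1   1 : evenly spaced after the transformation
```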

There are two legitimate reasons to transform your data before analysis: (1) the patterns in the transformed data may be easier to understand and communicate than the patterns in the raw data; (2) the transformation may be necessary for the assumptions of the analysis to be valid.

Transformations are often useful for converting curves into straight lines: the logarithmic function is very useful when two variables are related to each other by multiplicative or exponential functions.

Logarithmic (X):

Example: Asi’s growth (50% each year): table of Year vs. Weight.
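A minimal R sketch of this example (the starting weight of 10 kg and the span of 6 years are assumptions; only the 50% yearly growth comes from the slide): on a log scale the exponential growth curve becomes a straight line whose slope equals log(1.5).

```r
year   <- 0:6
weight <- 10 * 1.5^year            # hypothetical: 10 kg at year 0, +50% per year

fit <- lm(log(weight) ~ year)      # the log transformation linearizes the curve
coef(fit)[["year"]]                # slope = log(1.5), about 0.405
```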

Exponential:

Example: Species richness in the Galapagos Islands
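The species-area relation is a classic power function, S = c · A^z, so a log-log transformation turns it into a straight line. A sketch in R with made-up numbers (the values of c, z, and the island areas are assumptions, not the actual Galapagos data):

```r
area <- c(1, 10, 100, 1000)        # hypothetical island areas (km^2)
S    <- 20 * area^0.25             # richness following S = c * A^z, with c = 20, z = 0.25

fit <- lm(log(S) ~ log(area))      # power curve -> straight line on log-log axes
coef(fit)                          # intercept = log(20), slope = z = 0.25
```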

Power:

Statistics and transformation. Data to be analyzed using analysis of variance must meet two assumptions: the data must be homoscedastic (the variances of the treatment groups need to be approximately equal), and the residuals, or deviations from the mean, must be normal random variables.
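Both assumptions can be checked directly in R; a minimal sketch with simulated data (the three groups, their means, and the sample size are assumptions):

```r
set.seed(1)
group <- gl(3, 10)                                # 3 treatment groups, 10 observations each
y <- rnorm(30, mean = rep(c(5, 6, 7), each = 10)) # simulated responses
fit <- aov(y ~ group)

bartlett.test(y ~ group)          # homoscedasticity: are the group variances equal?
shapiro.test(residuals(fit))      # normality of the residuals
```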

Let’s look at an example. A single variate of the simplest type of ANOVA (completely randomized, single classification) decomposes as follows: Y_ij = μ + α_i + ε_ij, where μ is the parametric mean, α_i is the treatment effect of group i, and ε_ij is the error term. In this model the components are additive, with the error term ε_ij distributed normally.

However… we might encounter a situation in which the components are multiplicative in effect, where Y_ij = μ · α_i · ε_ij. If we fitted a standard ANOVA model, the observed deviations from the group means would lack normality and homoscedasticity.

The logarithmic transformation. We can correct this situation by transforming our model into logarithms. Wherever the mean is positively correlated with the variance, the logarithmic transformation is likely to remedy the situation and make the variance independent of the mean.

Taking logarithms of the multiplicative model, we would obtain log Y_ij = log μ + log α_i + log ε_ij, which is additive and homoscedastic.

The square root transformation. It is used most frequently with count data. Such distributions are likely to be Poisson distributed rather than normally distributed: in the Poisson distribution the variance is the same as the mean. Transforming the variates to square roots generally makes the variances independent of the means for this type of data. When counts include zero values, it is desirable to code all variates by adding 0.5, i.e., to use √(Y + 0.5).
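A short R sketch with simulated Poisson counts (the three group means are assumptions): on the raw scale the group variances track the means, while after the √(Y + 0.5) transformation they are roughly constant.

```r
set.seed(2)
counts <- lapply(c(2, 8, 32), function(m) rpois(200, m))   # groups with means 2, 8, 32

sapply(counts, var)                               # raw variances track the means
sapply(counts, function(x) var(sqrt(x + 0.5)))    # roughly equal after transformation
```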

The Box-Cox transformation. Often one does not have an a priori reason for selecting a specific transformation. Box and Cox (1964) developed a procedure for estimating the best transformation to normality within the family of power transformations Y* = (Y^λ − 1)/λ for λ ≠ 0, and Y* = ln Y for λ = 0.

The Box-Cox transformation. The value of λ that maximizes the log-likelihood function L = −(ν/2) ln s²_T + (λ − 1)(ν/n) Σ ln Y yields the best transformation to normality within the family of transformations. Here s²_T is the variance of the transformed values (based on ν degrees of freedom), and the second term involves the sum of the natural logarithms of the untransformed values.

box-cox in R (for a vector of data Y):
> library(MASS)
> lamb <- seq(0, 2.5, 0.5)
> bc <- boxcox(Y ~ 1, lambda = lamb, plotit = TRUE)
> lamb.best <- bc$x[which.max(bc$y)]   # lambda maximizing the log-likelihood
> library(car)
> transform_Y <- bcPower(Y, lamb.best) # box.cox() in older versions of car is now bcPower()
What do you conclude from this plot? Read more in Sokal and Rohlf 2000, page 417.

The arcsine transformation. Also known as the angular transformation, it is especially appropriate for percentages.

The arcsine transformation Y* = arcsin √p is appropriate only for data expressed as proportions p between 0 and 1. (The slide tabulates original proportions against their transformed values.)
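In R the transformation is asin(sqrt(p)), with p a proportion between 0 and 1 (divide percentages by 100 first):

```r
p <- c(0.01, 0.25, 0.50, 0.75, 0.99)   # proportions
p.star <- asin(sqrt(p))                # angular transformation, in radians

# the transformed scale runs from 0 to pi/2; p = 0.5 maps exactly to pi/4
stopifnot(isTRUE(all.equal(asin(sqrt(0.5)), pi / 4)))
```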

Since the transformations discussed are NON-LINEAR, confidence limits computed on the transformed scale and changed back to the original scale will be asymmetrical.
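A short R illustration of that asymmetry (the data are simulated log-normal values, purely for demonstration): a symmetric confidence interval for the mean of log Y becomes an asymmetric interval once back-transformed with exp().

```r
set.seed(3)
y <- rlnorm(50, meanlog = 1, sdlog = 0.8)   # skewed data

ci.log  <- t.test(log(y))$conf.int          # symmetric interval on the log scale
ci.back <- exp(ci.log)                      # back-transformed interval
m.back  <- exp(mean(log(y)))                # back-transformed (geometric) mean

# the distances from the mean to the two limits are unequal
c(lower = m.back - ci.back[1], upper = ci.back[2] - m.back)
```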