Transforming the data Modified from:


1 Transforming the data. Modified from:
Gotelli and Ellison, Chapter 8; Sokal and Rohlf (2000), Chapter 13

2 What is a transformation?
It is a mathematical function that is applied to all the observations of a given variable: $Y^* = f(Y)$, where $Y$ represents the original variable, $Y^*$ is the transformed variable, and $f$ is the mathematical function applied to the data.

3 Most transformations are monotonic: monotonic functions do not change the rank order of the data, but they do change the relative spacing of the observations, and therefore affect the variance and the shape of the probability distribution.
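A quick way to see the point of slide 3 is to transform a small vector and compare ranks and spacing. This is a minimal R sketch; the data values are hypothetical and chosen only for illustration.

```r
# Hypothetical positive measurements (illustrative values, not from the slides)
y <- c(1, 2, 5, 10, 50, 100)

y_log  <- log10(y)   # a monotonic increasing transformation
y_sqrt <- sqrt(y)    # another monotonic increasing transformation

# The rank order is unchanged by a monotonic transformation...
print(rank(y))
print(rank(y_log))

# ...but the spacing between observations changes, and so does the variance
print(diff(y))
print(round(diff(y_log), 3))
print(c(raw = var(y), log10 = var(y_log)))
```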

4 There are two legitimate reasons to transform your data before analysis:
(1) The patterns in the transformed data may be easier to understand and communicate than the patterns in the raw data.
(2) The transformation may be necessary so that the assumptions of the statistical analysis are met and the analysis is valid.

5 Transformations are often useful for converting curves into straight lines:
The logarithmic function is very useful when two variables are related to each other by multiplicative or exponential functions

6 Logarithmic (X):

7 Example: Asi’s growth (50 % each year)
Year   Weight
1      10.0
2      15.0
3      22.5
4      33.8
5      50.6
6      75.9
7      113.9
8      170.9
9      256.3
10     384.4
11     576.7
12     865.0
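Because weight grows by a constant factor each year, weight = a·b^year, and taking logs gives a straight line, log(weight) = log(a) + year·log(b). A minimal R sketch of this, fitting an ordinary linear regression to the logged weights from the table; the object names are illustrative, and the regression is one convenient way to recover the growth factor, not necessarily how the slide's figure was produced.

```r
# Weights from the table (years 1 to 12, 50 % growth per year)
year   <- 1:12
weight <- c(10.0, 15.0, 22.5, 33.8, 50.6, 75.9,
            113.9, 170.9, 256.3, 384.4, 576.7, 865.0)

# Exponential growth: weight = a * b^year, so log(weight) = log(a) + year * log(b)
fit <- lm(log(weight) ~ year)

b_hat <- exp(coef(fit)[["year"]])          # estimated annual growth factor
a_hat <- exp(coef(fit)[["(Intercept)"]])   # weight extrapolated back to year 0

print(round(c(growth_factor = b_hat, a = a_hat), 3))
print(summary(fit)$r.squared)              # the logged data lie almost on a line
```

The estimated growth factor should come out very close to 1.5, matching the 50 % annual growth in the table title; the small discrepancy comes from the rounding of the tabulated weights.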

8 Exponential:

9 Example: Species richness in the Galapagos Islands

10 Power:
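A power function Y = aX^b (the form often used for species-area relationships) becomes a straight line when both axes are logged: log Y = log a + b·log X. The R sketch below uses hypothetical parameters (a = 3, b = 0.3) and noise-free values, so it illustrates the linearization only and is not an analysis of the Galapagos data.

```r
# Hypothetical power-law relationship S = c * A^z (illustrative parameters only)
c_true <- 3
z_true <- 0.3
A <- c(1, 10, 100, 1000, 10000)   # e.g. island areas
S <- c_true * A^z_true            # e.g. species counts

# Logging both sides gives a straight line: log(S) = log(c) + z * log(A)
fit <- lm(log(S) ~ log(A))
print(coef(fit))

z_hat <- coef(fit)[[2]]        # slope on the log-log scale is the exponent z
c_hat <- exp(coef(fit)[[1]])   # back-transformed intercept is the constant c
```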

11 Statistics and transformation
Data to be analyzed using analysis of variance must meet two assumptions:
(1) The data must be homoscedastic: the variances of the treatment groups need to be approximately equal.
(2) The residuals, or deviations from the group means, must be normally distributed random variables.
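Both assumptions are commonly checked with a formal test alongside plots of the residuals. The R sketch below uses simulated, hypothetical data and two common choices of test, Bartlett's test for equal group variances and the Shapiro-Wilk test on the ANOVA residuals; the slides do not prescribe these particular tests.

```r
# Hypothetical data: three treatment groups with equal variances (simulated)
set.seed(1)
dat <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 20)),
  y     = c(rnorm(20, 10, 2), rnorm(20, 12, 2), rnorm(20, 15, 2))
)

fit <- aov(y ~ group, data = dat)

# Assumption 1: homoscedasticity (Bartlett's test of equal group variances)
bt <- bartlett.test(y ~ group, data = dat)

# Assumption 2: normality of the residuals (Shapiro-Wilk test)
sw <- shapiro.test(residuals(fit))

print(tapply(dat$y, dat$group, var))
print(c(bartlett_p = bt$p.value, shapiro_p = sw$p.value))
```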

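12 Let's look at an example. A single variate of the simplest type of ANOVA (completely randomized, single classification) decomposes as follows: $Y_{ij} = \mu + \alpha_i + \epsilon_{ij}$, where $\mu$ is the overall mean and $\alpha_i$ is the effect of group $i$. In this model the components are additive, with the error term $\epsilon_{ij}$ normally distributed.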

13 However… we might encounter a situation in which the components are multiplicative in effect, where $Y_{ij} = \mu \alpha_i \epsilon_{ij}$. If we fitted a standard ANOVA model, the observed deviations from the group means would lack normality and homoscedasticity.

14 The logarithmic transformation
We can correct this situation by transforming our model into logarithms. Wherever the mean is positively correlated with the variance, the logarithmic transformation is likely to remedy the situation and make the variance independent of the mean.

15 We would obtain $\ln Y_{ij} = \ln \mu + \ln \alpha_i + \ln \epsilon_{ij}$, which is additive and homoscedastic.
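A small simulation makes slides 13 to 15 concrete. The R sketch below builds hypothetical groups from the multiplicative model Y_ij = μ·α_i·ε_ij with lognormal errors (all parameter values are assumptions for illustration): on the raw scale the group standard deviations grow in proportion to the group means, while on the log scale they are identical.

```r
# Hypothetical multiplicative model Y_ij = mu * alpha_i * eps_ij
mu    <- 5
alpha <- c(1, 4, 16)                          # multiplicative group effects
eps   <- exp(qnorm(ppoints(30), sd = 0.3))    # lognormal errors, reused in every group

Y <- sapply(alpha, function(a) mu * a * eps)  # 30 x 3 matrix, one column per group

# Raw scale: the group SD grows in proportion to the group mean (heteroscedastic)
raw_sd <- apply(Y, 2, sd)
print(rbind(mean = colMeans(Y), sd = raw_sd))

# Log scale: ln Y = ln mu + ln alpha_i + ln eps_ij, so every group has the same SD
log_sd <- apply(log(Y), 2, sd)
print(log_sd)
```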

16 The square root transformation
It is used most frequently with count data. Such data are likely to follow a Poisson distribution rather than a normal distribution, and in the Poisson distribution the variance is equal to the mean. Transforming the variates to square roots generally makes the variances independent of the means for this type of data. When the counts include zero values, it is desirable to add 0.5 to all variates before taking square roots, i.e. $Y^* = \sqrt{Y + 0.5}$.
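The variance-stabilizing effect can be checked by simulation. In this R sketch the group means are hypothetical; for Poisson counts the raw variances track the means, while the variances of √(Y + 0.5) stay roughly constant, close to 0.25.

```r
# Simulated Poisson counts for three hypothetical group means
set.seed(42)
lambdas <- c(5, 20, 80)
counts  <- lapply(lambdas, function(l) rpois(5000, l))

# Raw scale: the variance is roughly equal to the mean
raw_var <- sapply(counts, var)

# Square-root scale (adding 0.5 because counts can be zero): roughly constant variance
sqrt_var <- sapply(counts, function(y) var(sqrt(y + 0.5)))

print(rbind(mean = lambdas, raw_var = round(raw_var, 2), sqrt_var = round(sqrt_var, 3)))
```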

17 The Box-Cox transformation
Often one does not have an a priori reason for selecting a specific transformation. Box and Cox (1964) developed a procedure for estimating the best transformation to normality within the family of power transformations.

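18 The Box-Cox transformation
The value of $\lambda$ that maximizes the log-likelihood function
$L = -\frac{\nu}{2}\ln s_T^2 + (\lambda - 1)\frac{\nu}{n}\sum \ln Y$
yields the best transformation to normality within the family of transformations
$Y^T = (Y^\lambda - 1)/\lambda$ for $\lambda \neq 0$, and $Y^T = \ln Y$ for $\lambda = 0$.
Here $s_T^2$ is the variance of the transformed values (based on $\nu$ degrees of freedom) and $n$ is the number of observations. The second term involves the sum of the natural logarithms of the untransformed values.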
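The profile log-likelihood L(λ) = -(ν/2)·ln s²_T + (λ - 1)·(ν/n)·Σ ln Y can be computed directly over a grid of λ values. This R sketch assumes a single sample (so ν = n - 1) and uses hypothetical, exactly lognormal data, for which the best power should be λ ≈ 0, i.e. the log transformation. The function names are illustrative, not from any package.

```r
# Box-Cox transformation (natural log when lambda = 0)
bc_transform <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# L = -(nu/2) ln(s2_T) + (lambda - 1) (nu/n) sum(ln Y), single sample: nu = n - 1
bc_loglik <- function(y, lambda) {
  n    <- length(y)
  nu   <- n - 1
  s2_T <- var(bc_transform(y, lambda))   # variance of transformed values, nu df
  -(nu / 2) * log(s2_T) + (lambda - 1) * (nu / n) * sum(log(y))
}

# Hypothetical, exactly lognormal data (requires strictly positive values)
y <- exp(qnorm(ppoints(200)))

lambdas <- seq(-2, 2, by = 0.01)
ll <- sapply(lambdas, function(l) bc_loglik(y, l))
best_lambda <- lambdas[which.max(ll)]
print(best_lambda)
```

Plotting `ll` against `lambdas` gives the same kind of curve that the plot on slide 19 shows.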

19 Box-Cox in R (for a vector Y of strictly positive data)
> library(MASS)
> lamb <- seq(0, 2.5, 0.5)
> bc <- boxcox(Y ~ 1, lambda = lamb, plotit = TRUE)
> best_lambda <- bc$x[which.max(bc$y)]
> library(car)
> transform_Y <- bcPower(Y, best_lambda)
What do you conclude from this plot? Read more in Sokal and Rohlf (2000), page 417.

20 The arcsine transformation
Also known as the angular transformation. It is especially appropriate for percentages.

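21 The arcsine transformation
The transformed data are $\theta = \arcsin\sqrt{p}$, where $p$ is the original data expressed as a proportion. The transformation is appropriate only for data expressed as proportions, so percentages must first be divided by 100.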
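In R the angular transformation is simply `asin(sqrt(p))`. The sketch below uses hypothetical percentages, converts them to proportions first, and reports the result both in radians (R's default) and in degrees, the unit used in some printed tables.

```r
# Hypothetical percentages, converted to proportions before transforming
pct <- c(0, 10, 25, 50, 75, 90, 100)
p   <- pct / 100

theta     <- asin(sqrt(p))      # angular (arcsine) transformation, in radians
theta_deg <- theta * 180 / pi   # the same values in degrees

print(round(cbind(pct, p, theta, theta_deg), 3))
```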

22 Because the transformations discussed are nonlinear, confidence limits computed on the transformed scale and converted back to the original scale are asymmetrical about the back-transformed mean.
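For the log transformation this is easy to see: a symmetric t-based interval on the log scale, back-transformed with `exp()`, lies farther above the back-transformed mean (the geometric mean) than below it. The data in this R sketch are hypothetical.

```r
# Hypothetical positive data, analysed on the log scale
y  <- c(12, 15, 9, 30, 22, 18, 45, 11, 26, 16)
ly <- log(y)
n  <- length(ly)

m      <- mean(ly)
se     <- sd(ly) / sqrt(n)
ci_log <- m + c(-1, 1) * qt(0.975, df = n - 1) * se   # symmetric on the log scale

# Back-transform to the original scale
center <- exp(m)        # geometric mean
ci     <- exp(ci_log)   # no longer symmetric about the center

print(c(lower = ci[1], center = center, upper = ci[2]))
print(c(below = center - ci[1], above = ci[2] - center))
```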

23 Evaluating Ecological Responses to Hydrologic Changes in a Payment-for-environmental-services Program on Florida Ranchlands Patrick Bohlen, Elizabeth Boughton, John Fauth, David Jenkins, Pedro Quintana-Ascencio, Sanjay Shukla and Hilary Swain G08K10487


25 Palaez Ranch Wetland Water Retention


28 Call:
glm(formula = mosqct ~ depth + depth^2, data = pointdata)

Deviance Residuals:
Min 1Q Median 3Q Max

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)
depth
depth^

(Dispersion parameter for gaussian family taken to be )

Null deviance: on 663 degrees of freedom
Residual deviance: on 661 degrees of freedom
AIC:

Number of Fisher Scoring iterations: 2
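A caution about the model call on slides 28 and 30: in R's formula language `^` means crossing of terms, not arithmetic squaring, so `depth + depth^2` on its own collapses to `depth`. A quadratic term is usually written `I(depth^2)` (or via `poly(depth, 2)`). The drop from 663 to 661 degrees of freedom suggests two slope terms were estimated, so the printed call may not show exactly what was fitted; the transcript does not say. The R sketch below uses hypothetical depths only to compare the two design matrices.

```r
# Hypothetical depths, used only to inspect the design matrices
depth <- c(5, 10, 20, 35, 50)

X_cross <- model.matrix(~ depth + depth^2)     # ^ is crossing: reduces to ~ depth
X_quad  <- model.matrix(~ depth + I(depth^2))  # I() gives an arithmetic square

print(colnames(X_cross))
print(colnames(X_quad))
```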


30 Call:
zeroinfl(formula = mosqct ~ depth + depth^2, data = pointdata, dist = "poisson", EM = TRUE)

Pearson residuals:
Min 1Q Median 3Q Max
-6.765e e e e e+05

Count model coefficients (poisson with log link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) <2e-16 ***
depth <2e-16 ***
depth^ <2e-16 ***

Zero-inflation model coefficients (binomial with logit link):
(Intercept)
depth *
depth^ **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Number of iterations in BFGS optimization: 1
Log-likelihood: e+04 on 6 Df
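The zero-inflated Poisson model on slide 30 combines a count part (Poisson mean μ, log link) with a zero-inflation part (probability π of an excess zero, logit link). Its probability mass function is P(Y = 0) = π + (1 - π)e^(-μ) and P(Y = k) = (1 - π)μ^k e^(-μ)/k! for k ≥ 1. The R sketch below writes this out by hand; the function name `dzip` and the parameter values are hypothetical, and in a fitted model μ and π would each depend on depth through their link functions.

```r
# Zero-inflated Poisson probability mass function
#   P(Y = 0) = p_zero + (1 - p_zero) * exp(-mu)
#   P(Y = k) = (1 - p_zero) * dpois(k, mu),  k = 1, 2, ...
# p_zero: probability of an excess (structural) zero; mu: Poisson mean
dzip <- function(y, mu, p_zero) {
  ifelse(y == 0,
         p_zero + (1 - p_zero) * dpois(0, mu),
         (1 - p_zero) * dpois(y, mu))
}

# Compare an ordinary Poisson with a zero-inflated one (hypothetical parameters)
mu     <- 3
p_zero <- 0.4
k      <- 0:10
print(round(rbind(poisson = dpois(k, mu), zip = dzip(k, mu, p_zero)), 3))
```

The zero-inflated version puts much more probability on zero counts than a Poisson with the same μ, which is why it can suit count data with many zeros.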
