Published by Jaycee Laurence. Modified over 9 years ago.
1
Transformations: Getting normal or using the linear model
2
Two Reasons to Transform
– Variables do not fit a normal distribution and parametric tests are desired
– A relationship between two variables is non-linear, but transformation would allow the use of linear regression
3
Non-Normal Data
Reasons real data can fail to follow a normal distribution:
– Errors in measurement are multiplicative rather than additive, e.g. ± 2% rather than ± 2 mm
– Constraints on the dimensions of an artifact feature are not symmetrical, e.g. point length must exceed haft length but can be as long as the material allows
4
Non-Normal Data 2
– Measurements are products rather than sums of other measurements, e.g. area, volume
– Counts follow binomial, Poisson, or negative binomial distributions, which are often asymmetrical unless sample sizes are large
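The effect of multiplicative error can be sketched with simulated numbers. Everything here is an assumption made for illustration: the 50 mm "true" value, the error sizes (the multiplicative error is exaggerated so the skew is easy to see), and the small `skewness()` helper, which is not a base R function.

```r
set.seed(42)   # arbitrary seed so the sketch is repeatable
n <- 5000

# Additive error: a true value of 50 mm plus or minus about 2 mm
additive <- 50 + rnorm(n, mean = 0, sd = 2)

# Multiplicative error: the true value is multiplied by a random factor
multiplicative <- 50 * exp(rnorm(n, mean = 0, sd = 0.5))

# Moment-based sample skewness (hypothetical helper, not part of base R)
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

skew_additive <- skewness(additive)
skew_multiplicative <- skewness(multiplicative)
skew_logged <- skewness(log(multiplicative))

# Near 0 means symmetrical; clearly positive means a tail to the right
round(c(additive = skew_additive,
        multiplicative = skew_multiplicative,
        logged = skew_logged), 2)
```

Taking logs turns the multiplicative error back into an additive one, which is why the logged values come out roughly symmetrical.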
5
Solutions
– Use non-parametric methods that do not depend on the normality of the data (increasingly easy to do)
– Use data transformations that shift the distribution to one that is normal
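One way to see how the two solutions differ is to run a rank-based test and a t test on the same data before and after a log transformation. This is a minimal sketch on simulated lognormal samples; the group names and parameters are made up, and `wilcox.test()` is just one example of a non-parametric method.

```r
set.seed(7)   # arbitrary seed so the sketch is repeatable

# Two simulated right-skewed samples (made-up lognormal measurements)
group_a <- rlnorm(20, meanlog = 3.0, sdlog = 0.6)
group_b <- rlnorm(20, meanlog = 3.5, sdlog = 0.6)

# Rank-based test: uses only the order of the values
p_rank_raw <- wilcox.test(group_a, group_b)$p.value
p_rank_log <- wilcox.test(log(group_a), log(group_b))$p.value

# Parametric test: uses the values themselves
p_t_raw <- t.test(group_a, group_b)$p.value
p_t_log <- t.test(log(group_a), log(group_b))$p.value

# TRUE when the rank test gives the same answer on both scales
same_rank_result <- isTRUE(all.equal(p_rank_raw, p_rank_log))
same_rank_result
```

Because the log keeps the order of the values, the rank test is unaffected by it, while the t test result generally changes with the scale.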
6
Transformation
– The goal is to change the spacing of the data to compress a long tail and draw out a flat tail
– The transformation must preserve the order of the original data; we only change the spacing between data points
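A small sketch of "same order, different spacing", using made-up values with a long right tail:

```r
# Made-up right-skewed values
x <- c(1, 2, 4, 8, 16, 32, 64, 512)
logx <- log(x)

# The ranks are unchanged, so the order of the data is preserved
order_kept <- identical(rank(x), rank(logx))
order_kept

# Gaps between neighbouring values before and after the transformation
gaps_raw <- diff(x)
gaps_log <- diff(logx)
gaps_raw
round(gaps_log, 2)
```

On the original scale the last gap is hundreds of times larger than the first; after the log it is only three times larger, so the long tail has been pulled in.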
7
Transformation
– Right-skewed data with many zeros cannot be transformed effectively, since nothing can stretch out observations that have the same value
– E.g. artifact counts by site or grid square are often Poisson distributed with many zeros
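The problem with tied zeros can be sketched with simulated counts. The Poisson mean of 0.5 and the sample size are arbitrary assumptions chosen so that most counts are zero.

```r
set.seed(1)   # arbitrary seed so the sketch is repeatable

# Simulated artifact counts for 200 hypothetical grid squares
counts <- rpois(200, lambda = 0.5)

# Share of observations sitting at the lowest value, before and after
prop_zero_raw <- mean(counts == 0)

sqrt_counts <- sqrt(counts)
prop_zero_sqrt <- mean(sqrt_counts == 0)

log_counts <- log(counts + 1)     # +1 because log(0) is -Inf
prop_tied_log <- mean(log_counts == min(log_counts))

c(raw = prop_zero_raw, sqrt = prop_zero_sqrt, log1p = prop_tied_log)
```

The same share of observations stays piled up at the minimum whatever transformation is applied.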
8
An Example
– Using the DartPoints data set, we saw that Length was asymmetrical
– Plot the kernel density of Length with and without a log scale to see the difference
– To transform Length we would use
  logLength <- log(DartPoints$Length)
9
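# Kernel density of Length on the original scale (DartPoints must already be loaded)
plot(density(DartPoints$Length), main="Dart Point Length", xlab="Normal scale")
# The same density drawn with a logarithmic x-axis
plot(density(DartPoints$Length), main="Dart Point Length", xlab="Log scale", log="x")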
10
Common Transformations
Tail to the right
– Natural or common (base 10) logarithm – no zero values
– Square root, cube root, etc. – zeros ok
– Inverse, -1/x, -1/x^2, etc. – no zero values
Tail to the left
– Exponential, e^x, 10^x (low values)
– Square, cube, etc.
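The right-tail options differ in strength, which can be sketched by applying each to the same simulated sample. The lognormal data and the `skewness()` helper are assumptions made for illustration; `skewness()` is not a base R function.

```r
set.seed(11)   # arbitrary seed so the sketch is repeatable

# Simulated positive values with a tail to the right
x <- rlnorm(2000, meanlog = 3, sdlog = 0.5)

# Moment-based sample skewness (hypothetical helper, not part of base R)
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

skew_raw  <- skewness(x)
skew_sqrt <- skewness(sqrt(x))    # mild
skew_log  <- skewness(log(x))     # stronger; needs x > 0
skew_inv  <- skewness(-1 / x)     # strongest; the minus sign keeps the order

round(c(raw = skew_raw, sqrt = skew_sqrt, log = skew_log, inverse = skew_inv), 2)
```

For this sample the square root leaves some right skew, the log removes most of it, and the inverse overshoots into a left tail, so the choice depends on how long the original tail is.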
13
Other Transformations
– Arctangent (inverse tangent) to handle values between 0 and 1; used for population studies of non-metric traits
14
Transforming to Linear
– By transforming variables before using linear regression we can fit nonlinear equations
– In some cases we can express the fitted equation in terms of the original untransformed variables
15
Polynomial
– y = a + b1*x + b2*x^2 + b3*x^3 + b4*x^4 + ...
– Create polynomial values or use the function poly() within lm()
– Begin with linear and then work up to quadratic, cubic, and so on until the new terms are not significant
– E.g. lm(y~x+I(x^2)+I(x^3))
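The work-up from linear to cubic can be sketched on simulated data where the true curve is known to be quadratic. The coefficients, noise level, and object names are all made up for illustration.

```r
set.seed(10)   # arbitrary seed so the sketch is repeatable

# Simulated data: the true relationship is quadratic
x <- seq(1, 20, length.out = 60)
y <- 5 + 2 * x - 0.3 * x^2 + rnorm(60, sd = 1)

fit_linear    <- lm(y ~ x)
fit_quadratic <- lm(y ~ x + I(x^2))
fit_cubic     <- lm(y ~ x + I(x^2) + I(x^3))

# poly(..., raw = TRUE) builds the same raw powers as the I() terms
fit_quadratic_poly <- lm(y ~ poly(x, 2, raw = TRUE))
same_coefficients <- isTRUE(all.equal(unname(coef(fit_quadratic)),
                                      unname(coef(fit_quadratic_poly))))

# Sequential F tests: does each added term improve the fit?
steps <- anova(fit_linear, fit_quadratic, fit_cubic)
steps
p_quadratic <- steps[["Pr(>F)"]][2]
```

Note that `poly()` without `raw = TRUE` uses orthogonal polynomials: the fitted curve is the same, but the coefficients are not the b1, b2 of the equation above.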
17
Power Function
– Log-log transformation
– Use log() to transform dependent and independent variables
– Compute linear regression
  – log(y) = a + b * log(x)
  – y = A * x^b (where A = exp(a))
– If b = 1, the curve is a straight line through the origin (y = A * x)
– Requires x, y > 0
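A minimal sketch of the log-log fit and the back-transformation, using simulated data with a known power curve (A = 2.5, b = 0.7 and the noise level are arbitrary assumptions):

```r
set.seed(3)   # arbitrary seed so the sketch is repeatable

# Simulated data from y = 2.5 * x^0.7 with multiplicative noise
x <- runif(100, min = 1, max = 100)
y <- 2.5 * x^0.7 * exp(rnorm(100, sd = 0.1))

# Linear regression on the logged variables
fit_power <- lm(log(y) ~ log(x))

a <- coef(fit_power)[[1]]   # intercept on the log scale
b <- coef(fit_power)[[2]]   # exponent of the power function
A <- exp(a)                 # multiplier on the original scale

round(c(A = A, b = b), 3)
```

The fitted curve can then be drawn over the original data with `plot(x, y)` followed by `curve(A * x^b, add = TRUE)`.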
19
Exponential Function
– Semi-log transformation
– Use log() to transform dependent variable, y > 0
– Compute linear regression
  – log(y) = a + b * x
  – y = A * e^(b*x) (where A = exp(a))
– Fits data with asymptotes
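The semi-log fit can be sketched the same way, here with simulated exponential decay (A = 100, b = -0.3 and the noise level are arbitrary assumptions):

```r
set.seed(5)   # arbitrary seed so the sketch is repeatable

# Simulated data from y = 100 * exp(-0.3 * x) with multiplicative noise
x <- seq(0, 10, length.out = 80)
y <- 100 * exp(-0.3 * x) * exp(rnorm(80, sd = 0.1))

# Only the dependent variable is logged
fit_exp <- lm(log(y) ~ x)

A <- exp(coef(fit_exp)[[1]])   # value of y at x = 0
b <- coef(fit_exp)[[2]]        # rate of growth (b > 0) or decay (b < 0)

# Predictions on the original scale
y_hat <- A * exp(b * x)

round(c(A = A, b = b), 3)
```

With b below zero the fitted curve falls toward an asymptote at y = 0 as x increases.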
20
Inverse Function
– Reciprocal transformation: 1/x where x ≠ 0
– Used for distance models: marriage, trade, and social interaction decline with distance
– Fits data with asymptotes
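A sketch of a reciprocal fit for a distance model. The data are simulated and the variable names (`distance_km`, `contacts`) and the true values (5 and 40) are hypothetical, chosen only to illustrate the form y = a + b * (1/x).

```r
set.seed(8)   # arbitrary seed so the sketch is repeatable

# Simulated interaction that declines with distance toward a floor of 5
distance_km <- seq(1, 50, length.out = 60)
contacts <- 5 + 40 / distance_km + rnorm(60, sd = 1)

# Regress on the reciprocal of distance
fit_inverse <- lm(contacts ~ I(1 / distance_km))

asymptote <- coef(fit_inverse)[[1]]   # level approached at large distances
slope     <- coef(fit_inverse)[[2]]   # strength of the distance effect

round(c(asymptote = asymptote, slope = slope), 2)
```

The intercept is the asymptote: as distance grows, 1/x goes to zero and the prediction settles at that value.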
22
Other Functions
– Logarithmic – no zeros or negative values in x
  – y = a + b * log(x)
– Square Root – no negative values in x
  – y = a + b * sqrt(x)
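Both forms transform only x, so they can be fitted to the same data and compared directly. This sketch uses simulated data whose true form is logarithmic; the coefficients and noise level are made up.

```r
set.seed(9)   # arbitrary seed so the sketch is repeatable

# Simulated data from y = 10 + 3 * log(x) plus noise
x <- seq(1, 100, length.out = 80)
y <- 10 + 3 * log(x) + rnorm(80, sd = 0.5)

fit_log  <- lm(y ~ log(x))    # logarithmic: needs x > 0
fit_sqrt <- lm(y ~ sqrt(x))   # square root: needs x >= 0

# y is untransformed in both models, so R-squared values are comparable
r2_log  <- summary(fit_log)$r.squared
r2_sqrt <- summary(fit_sqrt)$r.squared

round(c(logarithmic = r2_log, square_root = r2_sqrt), 3)
```

This comparison would not be valid against a log-log or semi-log model, because there the dependent variable is on a different scale.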
24
Examples
– Human cranial capacity over the last 1.8 million years
– Number of Identified Specimens (NISP) and Minimum Number of Individuals (MNI) at Chucalissa (Middle Mississippian site)
26
# BrainsCC.RData
# Explore logs with scatterplot
RegModel.1 <- lm(BrainCC~AgeKa, data=BrainsCC) # Rcmdr
summary(RegModel.1) # Rcmdr
BrainsCC$logAge <- with(BrainsCC, log(AgeKa)) # Rcmdr
BrainsCC$logBrain <- with(BrainsCC, log(BrainCC)) # Rcmdr
RegModel.2 <- lm(logBrain~logAge, data=BrainsCC) # Rcmdr
summary(RegModel.2) # Rcmdr
RegModel.3 <- lm(BrainCC~logAge, data=BrainsCC) # Rcmdr
summary(RegModel.3) # Rcmdr
plot(BrainCC~AgeKa, data=BrainsCC, pch="+")
abline(RegModel.1, lty=1, lwd=2, col="black")
x <- seq(10, 1800, 10) # start above 0 because log(0) is -Inf
logx <- log(x)
lines(x, exp(predict(RegModel.2, data.frame(logAge=logx))), lty=1, lwd=2, col="red")
lines(x, predict(RegModel.3, data.frame(logAge=logx)), lty=1, lwd=2, col="blue")
legend("topright", c("Linear", "Power", "Logarithmic"), lty=1, lwd=2, col=c("black", "red", "blue"))
28
LinearModel.4 <- lm(BrainCC ~ AgeKa + I(AgeKa^2), data=BrainsCC)
summary(LinearModel.4)
LinearModel.5 <- lm(BrainCC ~ AgeKa + I(AgeKa^2) + I(AgeKa^3), data=BrainsCC)
summary(LinearModel.5)
LinearModel.6 <- lm(BrainCC ~ AgeKa + I(AgeKa^2) + I(AgeKa^3) + I(AgeKa^4), data=BrainsCC)
summary(LinearModel.6)
plot(BrainCC~AgeKa, data=BrainsCC, pch="+")
abline(RegModel.1, lty=1, lwd=2, col="black")
x <- seq(0, 1800, 10)
lines(x, predict(LinearModel.4, data.frame(AgeKa=x)), lty=1, lwd=2, col="red")
lines(x, predict(LinearModel.5, data.frame(AgeKa=x)), lty=1, lwd=2, col="blue")
lines(x, predict(LinearModel.6, data.frame(AgeKa=x)), lty=1, lwd=2, col="green")
legend("topright", c("Linear", "Quadratic", "Cubic", "Quartic"), lty=1, lwd=2, col=c("black", "red", "blue", "green"))
30
load("C:/Users/DCarlson/Documents/anth642/R/Data/Chucalissa.rda") #Rcmdr
plot(mni~nisp, data=Chucalissa)
RegModel.1 <- lm(mni~nisp, data=Chucalissa) #Rcmdr
summary(RegModel.1) #Rcmdr
abline(RegModel.1)
plot(mni~nisp, data=Chucalissa, log="xy") # Plot log-log transform
plot(mni~nisp, data=Chucalissa, log="y") # Plot semi-log transform
Chucalissa$logMNI <- log(Chucalissa$mni) # Create logged variables
Chucalissa$logNISP <- log(Chucalissa$nisp)
plot(logMNI~logNISP, data=Chucalissa)
RegModel.2 <- lm(logMNI~logNISP, data=Chucalissa) #Rcmdr
summary(RegModel.2) #Rcmdr
abline(RegModel.2)
plot(mni~nisp, data=Chucalissa) # plot log-log equation on original data
a2 <- exp(RegModel.2$coefficients[[1]]) # Convert a to exp(a)
b2 <- RegModel.2$coefficients[[2]]
a1 <- RegModel.1$coefficients[[1]]
b1 <- RegModel.1$coefficients[[2]]
curve(a2*x^b2, 0, 3250, add=TRUE)
abline(RegModel.1, lty=3)
# locator(1) waits for one mouse click on the plot to position each label
text(locator(1), as.expression(substitute(MNI == a*NISP^b, list(a=round(a2, 4), b=round(b2, 4)))), pos=2)
text(locator(1), as.expression(substitute(MNI == a+b*NISP, list(a=round(a1, 4), b=round(b1, 4)))), pos=4)