Transformations
Transformations to Linearity
Many non-linear curves can be put into a linear form by appropriate transformations of either the dependent variable Y or some (or all) of the independent variables X1, X2, ... , Xp. This leads to the wide utility of the Linear Model. We have seen that, through the use of dummy variables, categorical independent variables can be incorporated into a Linear Model. We will now see that, through variable transformation, many examples of non-linear behaviour can also be converted to linear behaviour.
Intrinsically Linear (Linearizable) Curves
1. Hyperbolas
y = x/(ax - b)
Linear form: 1/y = a - b(1/x) or Y = β0 + β1 X
Transformations: Y = 1/y, X = 1/x, β0 = a, β1 = -b
2. Exponential
y = a e^(bx) = a B^x
Linear form: ln y = ln a + bx = ln a + (ln B) x or Y = β0 + β1 X
Transformations: Y = ln y, X = x, β0 = ln a, β1 = b = ln B
3. Power Functions
y = a x^b
Linear form: ln y = ln a + b ln x or Y = β0 + β1 X
Transformations: Y = ln y, X = ln x, β0 = ln a, β1 = b
4. Logarithmic Functions
y = a + b ln x
Linear form: y = a + b ln x or Y = β0 + β1 X
Transformations: Y = y, X = ln x, β0 = a, β1 = b
5. Other special functions
y = a e^(b/x)
Linear form: ln y = ln a + b(1/x) or Y = β0 + β1 X
Transformations: Y = ln y, X = 1/x, β0 = ln a, β1 = b
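The five linearizable curves above all follow the same recipe: transform y and x, fit a straight line, then map the intercept and slope back to a and b. A minimal sketch of that recipe follows; the helper names (fit_line, CURVES, fit_curve) and the noise-free sample data are illustrative assumptions, not part of the original material.

```python
import math

def fit_line(X, Y):
    """Ordinary least squares for Y = b0 + b1*X; returns (b0, b1)."""
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(X, Y))
    sxx = sum((x - mx) ** 2 for x in X)
    b1 = sxy / sxx
    return my - b1 * mx, b1

def identity(v):
    return v

# curve name -> (transform of y, transform of x, map (b0, b1) back to (a, b))
CURVES = {
    "hyperbola":      (lambda y: 1 / y, lambda x: 1 / x, lambda b0, b1: (b0, -b1)),
    "exponential":    (math.log, identity, lambda b0, b1: (math.exp(b0), b1)),
    "power":          (math.log, math.log, lambda b0, b1: (math.exp(b0), b1)),
    "logarithmic":    (identity, math.log, lambda b0, b1: (b0, b1)),
    "exp_reciprocal": (math.log, lambda x: 1 / x, lambda b0, b1: (math.exp(b0), b1)),
}

def fit_curve(kind, xs, ys):
    """Fit one of the curves by least squares on the transformed scale."""
    ty, tx, back = CURVES[kind]
    b0, b1 = fit_line([tx(x) for x in xs], [ty(y) for y in ys])
    return back(b0, b1)

# Made-up, noise-free data, so the true parameters are recovered.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
hyp = fit_curve("hyperbola", xs, [x / (2.0 * x - 0.5) for x in xs])  # a=2, b=0.5
pw = fit_curve("power", xs, [3.0 * x ** 1.5 for x in xs])            # a=3, b=1.5
ex = fit_curve("exponential", xs, [2.0 * math.exp(0.3 * x) for x in xs])
```

With noisy data the fit minimizes squared error on the transformed scale (for example in ln y), which is not the same as minimizing squared error in y itself.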
Polynomial Models
y = β0 + β1 x + β2 x^2 + β3 x^3
Linear form: Y = β0 + β1 X1 + β2 X2 + β3 X3
Variables: Y = y, X1 = x, X2 = x^2, X3 = x^3
Exponential Models with a polynomial exponent
y = e^(β0 + β1 x + β2 x^2 + β3 x^3 + β4 x^4)
Linear form: ln y = β0 + β1 X1 + β2 X2 + β3 X3 + β4 X4
Variables: Y = ln y, X1 = x, X2 = x^2, X3 = x^3, X4 = x^4
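Both models become ordinary multiple regressions once the powers of x are treated as separate regressors. The sketch below builds those regressors and solves the normal equations in plain Python; poly_row and ols are hypothetical helper names, and the data are made up and noise-free.

```python
import math

def poly_row(x, degree):
    """Regressors [1, x, x^2, ..., x^degree] for one observation."""
    return [x ** p for p in range(degree + 1)]

def ols(rows, y):
    """Least squares via the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [u - f * v for u, v in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]

# Cubic polynomial model: regress y on x, x^2, x^3.
ys = [1.0 + 2.0 * x - 0.5 * x ** 2 + 0.25 * x ** 3 for x in xs]
cubic = ols([poly_row(x, 3) for x in xs], ys)

# Exponential model with a quadratic exponent: regress ln y on x, x^2.
zs = [math.exp(0.5 + 0.2 * x - 0.1 * x ** 2) for x in xs]
expo = ols([poly_row(x, 2) for x in xs], [math.log(z) for z in zs])
```

Solving the normal equations directly is fine for a low-degree illustration; for high degrees the powers of x become nearly collinear and a QR-based solver would be the safer choice.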
Trigonometric Polynomials
β0, δ1, γ1, … , δk, γk are parameters that have to be estimated; ν1, ν2, ν3, … , νk are known constants (the frequencies in the trig polynomial).
Trigonometric Polynomial Models
y = β0 + γ1 cos(2πν1x) + δ1 sin(2πν1x) + … + γk cos(2πνkx) + δk sin(2πνkx)
Linear form: Y = β0 + γ1 C1 + δ1 S1 + … + γk Ck + δk Sk
Variables: Y = y, C1 = cos(2πν1x), S1 = sin(2πν1x), … , Ck = cos(2πνkx), Sk = sin(2πνkx)
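Because the frequencies are known, each cosine and sine term is just another computed regressor, and the model is linear in the parameters. A small sketch, with made-up frequencies and parameter values and a hypothetical helper name trig_row:

```python
import math

def trig_row(x, freqs):
    """Regressors [1, C1, S1, ..., Ck, Sk] for one x, given known frequencies."""
    row = [1.0]
    for nu in freqs:
        row.append(math.cos(2 * math.pi * nu * x))  # C_j
        row.append(math.sin(2 * math.pi * nu * x))  # S_j
    return row

freqs = [1, 2]                         # assumed known frequencies nu1, nu2
params = [10.0, 3.0, -1.0, 0.5, 2.0]   # beta0, gamma1, delta1, gamma2, delta2 (made up)

def predict(x):
    """Linear form: the model value is a dot product of parameters and regressors."""
    return sum(p * r for p, r in zip(params, trig_row(x, freqs)))
```

Rows produced this way can be passed to any linear least-squares routine to estimate the parameters from data.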
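Response Surface models
Dependent variable Y and two independent variables x1 and x2. (These ideas are easily extended to more than two independent variables.)
The Model (a cubic response surface model):
Y = β0 + β1 x1 + β2 x2 + β3 x1^2 + β4 x1 x2 + β5 x2^2 + β6 x1^3 + β7 x1^2 x2 + β8 x1 x2^2 + β9 x2^3 + ε
or
Y = β0 + β1 X1 + β2 X2 + β3 X3 + β4 X4 + β5 X5 + β6 X6 + β7 X7 + β8 X8 + β9 X9 + ε
where X1 = x1, X2 = x2, X3 = x1^2, X4 = x1 x2, X5 = x2^2, X6 = x1^3, X7 = x1^2 x2, X8 = x1 x2^2, X9 = x2^3.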
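A cubic response surface in two variables has nine regressors, one for each monomial in x1 and x2 of total degree 1, 2 or 3. The sketch below generates them; the function name and the ordering of the monomials (by degree, then by increasing power of x2) are assumptions made for illustration.

```python
def cubic_surface_row(x1, x2):
    """Regressors X1..X9: all monomials x1^i * x2^j with 1 <= i + j <= 3,
    ordered by total degree and then by increasing power of x2."""
    row = []
    for degree in range(1, 4):
        for j in range(degree + 1):
            row.append(x1 ** (degree - j) * x2 ** j)
    return row

# x1, x2, x1^2, x1*x2, x2^2, x1^3, x1^2*x2, x1*x2^2, x2^3 at (x1, x2) = (2, 3)
example = cubic_surface_row(2, 3)
```

Changing the upper limit of the degree loop gives a quadratic or higher-order surface with the same construction.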
The Box-Cox Family of Transformations
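The formula for this family did not survive in the text here. The sketch below assumes the usual definition, (x^λ - 1)/λ for λ ≠ 0 and ln x for λ = 0, defined for x > 0; the function name box_cox is illustrative.

```python
import math

def box_cox(x, lam):
    """Box-Cox transform of a positive value x with power parameter lam
    (usual definition assumed): (x**lam - 1)/lam, or ln(x) when lam == 0."""
    if x <= 0:
        raise ValueError("the Box-Cox transform requires x > 0")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam

identity_like = box_cox(4.0, 1.0)   # lam = 1 only shifts x by 1
root_like = box_cox(4.0, 0.5)       # lam = 1/2 behaves like a square root
log_case = box_cox(4.0, 0.0)        # lam = 0 is the logarithm
```

Subtracting 1 and dividing by λ makes the family continuous in λ, so the logarithm appears as the limiting case as λ approaches 0.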
The Transformation Staircase
The Bulging Rule
[Figure: bulging-rule diagram with its four directions labelled "x up", "y up", "y down", "x down".]
Non-Linear Models
Nonlinearizable models
Non-Linear Growth models
Many models cannot be transformed into a linear model.
The Mechanistic Growth Model
Equation: Y = α(1 - β e^(-kx)) + ε
or (ignoring ε) "rate of increase in Y" = dY/dx = k(α - Y)
The Logistic Growth Model
Equation: Y = α/(1 + β e^(-kx)) + ε
or (ignoring ε) "rate of increase in Y" = dY/dx = kY(α - Y)/α
The Gompertz Growth Model
Equation: Y = α exp(-β e^(-kx)) + ε
or (ignoring ε) "rate of increase in Y" = dY/dx = kY ln(α/Y)
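Each growth curve can be read either as an equation for Y or as a statement about the rate of increase in Y. The sketch below assumes the commonly used parametrizations with parameters α, β and k (an assumption, with made-up values) and checks numerically that each curve satisfies its rate law.

```python
import math

ALPHA, BETA, K = 100.0, 0.9, 0.3   # made-up parameter values

def mechanistic(x):
    return ALPHA * (1.0 - BETA * math.exp(-K * x))

def logistic(x):
    return ALPHA / (1.0 + BETA * math.exp(-K * x))

def gompertz(x):
    return ALPHA * math.exp(-BETA * math.exp(-K * x))

# (curve, rate of increase written as a function of the current value y)
MODELS = [
    (mechanistic, lambda y: K * (ALPHA - y)),
    (logistic,    lambda y: K * y * (ALPHA - y) / ALPHA),
    (gompertz,    lambda y: K * y * math.log(ALPHA / y)),
]

def slope(f, x, h=1e-5):
    """Central-difference approximation to dY/dx."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Largest gap between the numerical slope and the rate law over a few x values.
max_gap = max(abs(slope(f, x) - rate(f(x)))
              for f, rate in MODELS for x in (0.5, 2.0, 6.0))
```

In all three models the rate of increase goes to zero as Y approaches α, which is why α plays the role of the limiting size. None of them can be made linear in α, β and k by transforming Y or x.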
Example: daily auto accidents in Saskatchewan from 1984 to 1992
Data collected:
1. Date
2. Number of Accidents
Factors we want to consider:
1. Trend
2. Yearly Cyclical Effect
3. Day of the week effect
4. Holiday effects
Trend
This will be modeled by a linear function: Y = β0 + β1 X
(more generally a polynomial): Y = β0 + β1 X + β2 X^2 + β3 X^3 + …
Yearly Cyclical Trend
This will be modeled by a trig polynomial (sin and cos functions with differing frequencies, i.e. periods): Y = δ1 sin(2πf1X) + γ1 cos(2πf1X) + δ2 sin(2πf2X) + γ2 cos(2πf2X) + …
Day of the week effect
This will be modeled using "dummy" variables: α1 D1 + α2 D2 + α3 D3 + α4 D4 + α5 D5 + α6 D6, where Di = 1 if day of week = i, 0 otherwise.
Holiday Effects
These will also be modeled using "dummy" variables.
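Independent variables
X = day, D1, D2, D3, D4, D5, D6, S1, S2, S3, S4, S5, S6, C1, C2, C3, C4, C5, C6, NYE, HW, V1, V2, cd, T1, T2.
Si = sin(0.017202423838959*i*day).
Ci = cos(0.017202423838959*i*day).
(The constant 0.017202423838959 is 2π/365.25, so the i = 1 terms have a period of one year.)
Dependent variable
Y = daily accident frequency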
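The trend, day-of-week and cyclical regressors for one day can be assembled as in the sketch below. The coding of the weekday (0 for the baseline day with all dummies zero, 1 to 6 for D1 to D6) and the function name are assumptions; the holiday dummies are left out because their definitions are not given here.

```python
import math

OMEGA = 0.017202423838959   # constant from the variable definitions, 2*pi/365.25

def design_row(day, weekday, n_harmonics=6):
    """Regressors [day, D1..D6, S1..S6, C1..C6] for one observation.

    weekday is assumed to be coded 0..6, with 0 the baseline day
    (all dummies zero) and i in 1..6 switching on dummy Di.
    """
    dummies = [1.0 if weekday == i else 0.0 for i in range(1, 7)]
    sines = [math.sin(OMEGA * i * day) for i in range(1, n_harmonics + 1)]
    cosines = [math.cos(OMEGA * i * day) for i in range(1, n_harmonics + 1)]
    return [float(day)] + dummies + sines + cosines

row = design_row(day=10, weekday=3)
```

Only six dummies are needed for seven weekdays: the seventh day is absorbed into the intercept, which avoids exact collinearity among the regressors.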
ANALYSIS OF VARIANCE
             SUM OF SQUARES     DF   MEAN SQUARE   F RATIO
REGRESSION        976292.38     18      54238.46    114.60
RESIDUAL          1547102.1   3269      473.2646

VARIABLES IN EQUATION FOR PACC
(Y-INTERCEPT 60.48909)
VARIABLE   COEFFICIENT   STD. ERROR OF COEFF   STD REG COEFF   TOLERANCE   F TO REMOVE   LEVEL
day    1   0.11107E-02            0.4017E-03           0.038     0.99005          7.64       1
D1     9       4.99945                1.4272           0.063     0.57785         12.27       1
D2    10       9.86107                1.4200           0.124     0.58367         48.22       1
D3    11       9.43565                1.4195           0.119     0.58311         44.19       1
D4    12      13.84377                1.4195           0.175     0.58304         95.11       1
D5    13      28.69194                1.4185           0.363     0.58284        409.11       1
D6    14      21.63193                1.4202           0.273     0.58352        232.00       1
S1    15      -7.89293                0.5413          -0.201     0.98285        212.65       1
S2    16      -3.41996                0.5385          -0.087     0.99306         40.34       1
S4    18      -3.56763                0.5386          -0.091     0.99276         43.88       1
C1    21      15.40978                0.5384           0.393     0.99279        819.12       1
C2    22       7.53336                0.5397           0.192     0.98816        194.85       1
C3    23      -3.67034                0.5399          -0.094     0.98722         46.21       1
C4    24      -1.40299                0.5392          -0.036     0.98999          6.77       1
C5    25      -1.36866                0.5393          -0.035     0.98955          6.44       1
NYE   27      32.46759                7.3664           0.061     0.97171         19.43       1
HW    28      35.95494                7.3516           0.068     0.97565         23.92       1
T2    33     -18.38942                7.4039          -0.035     0.96191          6.17       1

VARIABLES NOT IN EQUATION
VARIABLE   PARTIAL CORR.   TOLERANCE   F TO ENTER   LEVEL
IACC   7         0.49837     0.78647      1079.91       0
Dths   8         0.04788     0.93491         7.51       0
S3    17        -0.02761     0.99511         2.49       1
S5    19        -0.01625     0.99348         0.86       1
S6    20        -0.00489     0.99539         0.08       1
C6    26        -0.02856     0.98788         2.67       1
V1    29        -0.01331     0.96168         0.58       1
V2    30        -0.02555     0.96088         2.13       1
cd    31         0.00555     0.97172         0.10       1
T1    32         0.00000     0.00000         0.00       1

***** F LEVELS( 4.000, 3.900) OR TOLERANCE INSUFFICIENT FOR FURTHER STEPPING
Day of the week effects
D1 4.99945
D2 9.86107
D3 9.43565
D4 13.84377
D5 28.69194
D6 21.63193
Holiday Effects
NYE 32.46759
HW 35.95494
T2 -18.38942
Cyclical Effects
S1 -7.89293
S2 -3.41996
S4 -3.56763
C1 15.40978
C2 7.53336
C3 -3.67034
C4 -1.40299
C5 -1.36866
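The estimated effects combine additively into a fitted value for any day. The sketch below uses the coefficients reported in the regression output for a day that is not one of the holidays; the function names, the weekday coding (0 for the baseline day, 1 to 6 for D1 to D6) and the choice of day values are assumptions, since the origin of the day count is not stated.

```python
import math

OMEGA = 0.017202423838959          # 2*pi/365.25, as in the definitions of Si and Ci
INTERCEPT = 60.48909
TREND = 0.11107e-2                 # coefficient of day
DAY_EFFECT = {1: 4.99945, 2: 9.86107, 3: 9.43565,
              4: 13.84377, 5: 28.69194, 6: 21.63193}
SIN_COEF = {1: -7.89293, 2: -3.41996, 4: -3.56763}
COS_COEF = {1: 15.40978, 2: 7.53336, 3: -3.67034, 4: -1.40299, 5: -1.36866}

def cyclical(day):
    """Yearly cyclical component: sum of the retained sine and cosine terms."""
    s = sum(c * math.sin(OMEGA * i * day) for i, c in SIN_COEF.items())
    c = sum(c * math.cos(OMEGA * i * day) for i, c in COS_COEF.items())
    return s + c

def fitted(day, weekday):
    """Fitted daily accident count on a non-holiday.
    weekday 0 is taken to be the baseline day (assumed coding)."""
    return INTERCEPT + TREND * day + DAY_EFFECT.get(weekday, 0.0) + cyclical(day)

baseline = fitted(0, 0)
d5_value = fitted(0, 5)
```

The gap between two weekdays at the same value of day is just the difference of their dummy coefficients, so D5 sits about 28.7 accidents above the baseline day.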