1
Remedial Procedure in SLR: transformation
We will introduce remedial procedures in simple linear regression, specifically focusing on transformation techniques.
2
Overview of remedial measures
If the simple linear regression model is not appropriate for a data set, there are two basic choices: abandon the regression model and develop a more appropriate one, or employ some transformation on the data so that the regression model is appropriate for the transformed data. The standard remedies by type of departure are:

Nonlinearity of the regression function: transformations
Non-constancy of error variance: transformations and weighted least squares
Non-independence of error terms: autocorrelation and time series analysis
Non-normality of error terms: transformations
Outliers: transformations or robust regression

Each approach has advantages and disadvantages. The first approach may entail a more complex model that could yield better insights, but it may also lead to more complex procedures for estimating the parameters. Successful use of transformations, on the other hand, leads to relatively simple methods of estimation and may involve fewer parameters than a complex model, an advantage when the sample size is small. Yet transformations may obscure the fundamental interconnection between the variables. We consider the use of transformations in this chapter and the use of more complex models in later chapters.
3
Transformation for a better linear model: transform Y
Y = β0 e^(β1X + ε)
With simple algebra, we can transform both predictors and response to turn some nonlinear equations into a linear form. If the cause of the nonlinear relationship is Y rather than X, we can transform Y. Consider the log-linear model Y = β0 e^(β1X + ε). Taking the log of both sides gives a linear model: log(Y) = log(β0) + β1X + ε, or Y' = β0' + β1X + ε. Note that when we transform the dependent variable Y, the assumptions about the error term change with the transformation: εi = log(Yi) − log(Ŷi) is now the deviation between the true log(Y) and its estimate. This may change the shape of the distribution of the error terms away from the normal distribution, so the errors no longer follow a normal distribution and cannot be diagnosed as usual.
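As a minimal R sketch (the data are simulated; the seed and coefficient values are illustrative assumptions, not from the lecture), the log-linear model can be fit by regressing log(Y) on X:

```r
# Simulate from Y = b0 * exp(b1*X + e), then fit on the log scale
set.seed(1)
X <- runif(50, 1, 10)
Y <- 2 * exp(0.3 * X + rnorm(50, sd = 0.2))

fit_log <- lm(log(Y) ~ X)   # log(Y) = log(b0) + b1*X + e
coef(fit_log)               # intercept estimates log(b0); slope estimates b1
exp(coef(fit_log)[1])       # recover b0 on the original scale
```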
4
Transformation for a better linear model: transform X
If we want to maintain the normality of the error terms without transforming Y, we can change X instead. Luckily, the "linear" in linear regression means linear with respect to the parameters, not the predictor X, so we can transform X while maintaining linearity in the parameters and leaving the error term alone. For example, the quadratic function Y = β0 + β1X + β2X² + ε becomes a linear model by defining two new predictors, X1 = X and X2 = X². Likewise, Y = β0 + β1 log(X) + ε becomes a simple linear function by defining the new predictor X1 = log(X). The error term εi = Yi − Ŷi does not change with X transformations, so we can diagnose as usual.
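A brief R sketch (simulated data, illustrative names) showing that both forms are ordinary linear models once the predictors are transformed:

```r
# Both fits below are linear in the parameters, so lm() handles them directly
set.seed(2)
X <- runif(50, 1, 10)
Y_quad <- 1 + 2 * X - 0.3 * X^2 + rnorm(50)
Y_logx <- 1 + 2 * log(X) + rnorm(50)

fit_quad <- lm(Y_quad ~ X + I(X^2))   # X1 = X, X2 = X^2
fit_logx <- lm(Y_logx ~ log(X))       # X1 = log(X)
```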
5
Prototype nonlinear regression patterns with constant error variance and transformations on X
The three plots referenced here show prototype nonlinear regression relations with constant error variance, together with simple transformations on X that may help linearize the model. We can try several alternative transformations and decide which is most effective using scatter plots and residual plots. Based on the prototype in figure (a), we would initially consider the square root or log transformation on X. At times it may be helpful to introduce a constant into the transformation. For example, if some of the X data are near zero and the reciprocal transformation is desired, we can shift the origin by using X' = 1/(X + k), where k is an appropriately chosen constant. Since the error variance is constant, we should not transform Y; doing so would change the residual distribution and could violate the constant-variance assumption.
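A tiny R sketch of the shifted reciprocal (the X values and k below are made up for illustration):

```r
# Shift the origin before taking reciprocals when some X are near zero
X_raw <- c(0.01, 0.2, 1, 2, 5, 10)
k <- 1                       # an appropriately chosen constant
X_recip <- 1 / (X_raw + k)   # X' = 1/(X + k)
```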
6
Prototype nonlinear regression patterns with unequal variance and non-normality of errors: transform Y (and X, if necessary)
Unequal error variances and non-normality of the error terms frequently appear together. To remedy these departures from the simple linear regression model, we need a transformation on Y, since the shapes and spreads of the distributions of Y need to be changed; this can be combined with a transformation on X. If Y can be negative, use Y' = log10(Y + k) for a constant k large enough to make Y + k positive. The Box-Cox procedure automatically identifies a transformation from the family of power transformations on Y, which has the form Y' = Y^λ, where λ is a parameter to be determined from the data. For example, when λ = 2, Y' = Y²; when λ = 0.5, Y' = √Y. It is worth noting that when λ = 0, the transformation is defined as ln(Y), not Y⁰ = 1 (see the sketch below). Such a transformation on Y may help linearize a curvilinear regression relation. When unequal error variances are present but the regression relation is already linear, a transformation of Y alone may not be sufficient: while it may stabilize the error variance, it will also bend the linear relationship into a curvilinear one, so a transformation on X may also be required. This case can also be handled by the weighted least squares procedure in a later chapter.
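A one-function R sketch of the power family, with the λ = 0 convention made explicit:

```r
# The Box-Cox power family; lambda = 0 is defined as the natural log, not Y^0
power_tx <- function(Y, lambda) {
  if (lambda == 0) log(Y) else Y^lambda
}
power_tx(c(1, 4, 9), 2)     # Y^2
power_tx(c(1, 4, 9), 0.5)   # sqrt(Y)
power_tx(c(1, 4, 9), 0)     # ln(Y)
```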
7
Box-Cox Procedure: Y' = Y^λ, then Y' = β0 + β1Xi + error
Transformations on Y sometimes help with variance issues: non-normality and non-constant variance. Box-Cox considers a family of so-called "power transformations" Y' = Y^λ, so that Y' = β0 + β1Xi + error. The procedure uses the method of maximum likelihood to estimate λ, along with the other parameters β0, β1, and σ, and it also provides a confidence interval for λ. In this way it suggests the best estimate of λ so that Y^λ can be modeled with a linear model; it is used when you need to transform Y (and not X). It only "suggests" transformations to try: scatter and residual plots should still be utilized to examine their appropriateness.
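A minimal sketch using MASS::boxcox, which profiles the log-likelihood over a grid of λ (the data here are simulated so that √Y is linear in X):

```r
library(MASS)
set.seed(3)
X <- runif(40, 1, 20)
Y <- (5 + 0.4 * X + rnorm(40, sd = 0.5))^2   # sqrt(Y) is linear in X

bc <- boxcox(lm(Y ~ X), lambda = seq(-2, 2, by = 0.1))
bc$x[which.max(bc$y)]   # lambda with the largest log-likelihood
```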
8
The plasma example: age (X) and the plasma level of a polyamine (Y) are studied for 25 healthy children, a portion of a larger study. The scatter plot shows greater variability for younger children than for older ones, especially between the first and the last age groups. The impact of age on plasma level is significant: b1 = −2.18, meaning that as age increases by one year, the average plasma level decreases by 2.18 units, with a very small p-value (on the order of 10⁻⁸). R² is 75%, meaning 75% of the variance is explained by the model. Although R² is not small, we cannot conclude the model is a good one without checking the assumptions.
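A runnable stand-in for this analysis (the data below are simulated to mimic the described pattern; the real study values are not reproduced here):

```r
# Simulated stand-in: 25 children, variability shrinks as age increases
set.seed(4)
age    <- rep(0:4, each = 5)
plasma <- 13 - 1.5 * age + rnorm(25, sd = 2 / (1 + age))

fit_plasma <- lm(plasma ~ age)
summary(fit_plasma)                   # slope, p-value, and R^2
plot(age, plasma); abline(fit_plasma)
```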
9
Check normality and constancy of error variance on the residuals
Residual plot: the Shapiro test is significant, meaning we should conclude that the error terms are not normal. The BF (Brown-Forsythe) test is a rather conservative test; the difference among groups needs to be very pronounced to reject. As we can see, the major difference is between the first and the last group, not among the middle three groups. Although the BF test does not reject, the departure from normality is distinct in the QQ plot, and we should deal with the variance difference between the first and the last group.
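A sketch of these checks on the simulated fit above (car::leveneTest with center = median is the Brown-Forsythe variant):

```r
e <- residuals(fit_plasma)
shapiro.test(e)               # Shapiro-Wilk test of normality
qqnorm(e); qqline(e)          # QQ plot
plot(fitted(fit_plasma), e)   # residuals vs fitted values

library(car)   # Brown-Forsythe: Levene's test centered at the median
leveneTest(e, factor(age), center = median)
```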
10
Confidence interval band
Ŷh ± tc·s{Ŷh}, where s{Ŷh} = s·√(1/n + (Xh − X̄)² / Σ(Xi − X̄)²)
The 95% confidence band shows that one point falls outside the estimate, suggesting an extreme outlier with a large Y value. Thus, we consider transforming the Y variable to bring the outlying value down a bit.
Ŷh ± tc·s{pred}, where s²{pred} = s² + s²{Ŷh}
After a careful examination of the experimental procedure, no mistake was found, hence we keep this observation.
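In R, predict() computes both interval types for the fit above:

```r
new <- data.frame(age = c(1, 3))
predict(fit_plasma, new, interval = "confidence")   # CI for the mean response
predict(fit_plasma, new, interval = "prediction")   # PI for a single new response
```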
11
Box-Cox procedure: the best λ = −0.515 (largest log-likelihood)
Now we apply the Box-Cox transformation to Y. Two Box-Cox functions are commonly used in R; the idea of both is to compare linear models built on a series of λ values. The first fits Y^λ = β0 + β1X for λ ranging from −3 to 3 in increments of 0.1, then identifies the λ that maximizes the log-likelihood; the best λ is −0.515, and the plot shows that the λ values within the best 5% range from about −1.1 to 0.1. The second function, boxcox.sse, searches for the λ that minimizes the SSE; from its plot, it is clear that a power near −0.5 has the smallest SSE. The two methods give similar suggestions for λ, so we transform Y to 1/√Y and refit the model. Both searches are sketched below.
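Continuing the simulated example (MASS::boxcox is the likelihood version; the SSE loop is a rough stand-in for what ALSM::boxcox.sse does, using a crude standardization rather than the textbook's geometric-mean version):

```r
library(MASS)
bc <- boxcox(fit_plasma, lambda = seq(-3, 3, by = 0.1))
bc$x[which.max(bc$y)]   # lambda with the largest log-likelihood

# SSE-based search: standardize the transformed Y so SSEs are comparable
lams <- seq(-3, 3, by = 0.1)
sse <- sapply(lams, function(l) {
  Yt <- if (l == 0) log(plasma) else plasma^l
  Yt <- as.numeric(scale(Yt))
  sum(residuals(lm(Yt ~ age))^2)
})
lams[which.min(sse)]    # lambda with the smallest SSE
```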
12
The results before and after the transformation are both presented here.
The linear trend changes from negative to positive because of the reciprocal transformation, and the outlying observation in the first group is now closer to the mean. Comparing the two models: R² increases from 75% to 87%, and the F statistic, the ratio of MSR to MSE, increases from 70 to 149. The model after the transformation explains more variation in Y and is more useful; both R² and the F statistic can be used as criteria to compare models. Why is s{residual} not comparable? The residual standard error cannot be used as a comparison criterion because it measures the deviation in the residuals and has the same units as the response variable, which the transformation alters. Before the transformation, s is the deviation between Y and Ŷ; after transforming to Y' = 1/√Y, s is the deviation between Y' and Ŷ'. Comparing these two values is like comparing Fahrenheit with Celsius, apples with oranges. The refit is sketched below.
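Refitting on the transformed scale, continuing the simulated example:

```r
plasma_t <- 1 / sqrt(plasma)   # Y' = 1/sqrt(Y), i.e. lambda = -0.5
fit_t <- lm(plasma_t ~ age)
summary(fit_t)   # compare R^2 and F with fit_plasma; the residual standard
                 # error is on the Y' scale and is not comparable
```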
13
Re-check normality and constancy of error variance on the residuals
Residual plot: we still need to check the assumptions for the new model. The residuals now appear normal, with no obvious outliers (before vs. after).
14
Back-transformations
Transformations can improve model performance, but they can make interpretation harder. If we want to predict Y, the plasma level, we need to back-transform from Y' = 1/√Y. Back-transformation lets us make inferences (and graphs!) on the original scale, which is very helpful for communicating results to the public.
15
Back-transform the end products of the analysis
In general, if we transform Y, let Y' = f(Y) denote the transformation function that turns Y into Y', and let f' denote the back-transformation function that turns Y' back into Y. For example, if f(Y) = Y², the back-transformation f' turns Y' into Y, so f' is the square root function: f'(Y') = √Y' = Y. When we have a confidence interval based on Y', we can apply the back-transformation directly to the upper and lower bounds of the interval, rather than back-transforming each item (Ŷ' and its standard error) separately:

f'(Ŷ' ± tc·s{Ŷ'}) is the CI for Y
f'(Ŷ' ± tc·s{pred}) is the PI for Yh(new)

If we transform X to f(X), we do not need to back-transform at all; our concern is to explain Y, and it does not matter whether we explain it by X or by f(X), i.e., Ŷ = β0 + β1·f(X), since Y|X = Y|f(X). Whether we transform X or Y, we usually do not back-transform the parameter estimates themselves, since that is quite complicated; in particular, Ŷ is not simply f'(β0) + f'(β1)·X.
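A tiny sketch of the bound-by-bound back-transformation, using the slide's f(Y) = Y² example (the interval values are hypothetical):

```r
f_inv <- function(yp) sqrt(yp)                   # back-transform for Y' = Y^2
ci_Yprime <- c(fit = 4.0, lwr = 3.2, upr = 4.9)  # hypothetical CI on the Y' scale
f_inv(ci_Yprime)                                 # CI for Y on the original scale
```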
16
Back-transform Y' = 1/√Y:
1. The back-transformation function is f'(Y') = 1/Y'² = Y'⁻².
2. The predicted value is Ŷ = (Ŷ')⁻².
3. The confidence interval for the prediction, whether for the mean or for a single response, is back-transformed the same way, raising each bound to the power −2.
Now let's complete the plasma example with back-transformation: the fitted linear function on the transformed scale is 1/√Y = b0 + b1·X, and predictions from it are raised to the power −2.
17
Back-transform Y' = 1/√Y, then Y = (Y')⁻²
We apply the back-transformation function to the upper and lower bounds of both the mean-response and single-response confidence intervals. You can see this in the R call sketched below: the lower and upper bounds are all raised to the power −2. Now it is easier to interpret the result: the relationship between plasma level and age follows a nonlinear pattern; overall, the older the children get, the lower the plasma level; the variation for younger children is larger than for older ones; and the confidence band is wider at the start than at the end.
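Continuing the simulated example:

```r
new <- data.frame(age = c(0, 2, 4))
conf_int <- predict(fit_t, new, interval = "confidence")^(-2)
pred_int <- predict(fit_t, new, interval = "prediction")^(-2)
# Y' = 1/sqrt(Y) is decreasing, so after raising to the power -2 the column
# labeled "lwr" holds the upper plasma bound and "upr" the lower one
```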
18
Summary of remedial measures
To summarize the topic on the last two pages: For nonlinear functional relationships with well-behaved residuals, try transforming X; this may require a polynomial or piecewise fit (we will cover these later). For non-constant variance or non-normal errors, possibly with a nonlinear functional form, try transforming Y; the Box-Cox procedure may be helpful. If the transformation on Y doesn't fix the non-constant variance problem, weighted least squares can be used (we will cover this later).
19
Transformations of X and Y can be used together.
Any time you consider a transformation, remember to recheck all the diagnostics, and consider whether you gain enough to justify losing interpretability; reciprocal transformations make interpretation especially hard. Consider back-transforming the results of the final model for presentation. For very non-normal errors, especially those arising from discrete responses, generalized linear models are often a better option, but linear regression may be "good enough."
20
Transformation – our primary tool to improve model fit
Always repeat diagnostic process after transformation