Stat 112 Notes 11 Today: –Transformations for fitting Curvilinear Relationships (Chapter 5)

1 Stat 112 Notes 11 Today: –Transformations for fitting Curvilinear Relationships (Chapter 5)

2 Curvilinear Relationship For many problems, E(Y|X) is not linear. Curvilinear relationship: E(Y|X) is a curve, not a straight line; the increase in the mean of Y|X is not the same for all x. Two approaches to modeling curvilinear relationships: –Polynomial Regression. –Transformations. Transformations: Perhaps E(f(Y)|g(X)) is a straight line, where f(Y) and g(X) are transformations of Y and X, and a simple linear regression model holds for the response variable f(Y) and explanatory variable g(X).
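The two approaches above can be sketched side by side on synthetic data (a minimal illustration; the data-generating model, seed, and coefficients below are invented for the sketch, not the course dataset):

```python
import numpy as np

# Hypothetical curvilinear data: E(Y|X) flattens as X grows (invented for illustration).
rng = np.random.default_rng(0)
x = rng.uniform(1, 100, 200)
y = 10 + 5 * np.log(x) + rng.normal(0, 1, 200)

# Approach 1: polynomial regression (here a quadratic in x).
poly_coefs = np.polyfit(x, y, deg=2)

# Approach 2: transformation -- simple linear regression of y on g(x) = log(x).
b1, b0 = np.polyfit(np.log(x), y, deg=1)

# With the right transformation, the relationship is a straight line on the
# transformed scale, so the residuals show no pattern in their mean vs. x.
resid = y - (b0 + b1 * np.log(x))
print(round(b1, 1))  # estimated slope; the true value used to generate y is 5
```

The transformation approach uses one fewer parameter than the quadratic here, an early hint of the parsimony comparison made later in these notes.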

3 Curvilinear Relationship Y = Life Expectancy in 1999, X = Per Capita GDP (in US dollars) in 1999. Data in gdplife.JMP. The linearity assumption of simple linear regression is clearly violated: the increase in mean life expectancy for each additional dollar of GDP is smaller for large GDPs than for small GDPs. Decreasing returns to increases in GDP.

4 The mean of Life Expectancy | Log Per Capita appears to be approximately a straight line.

5 How do we use the transformation? Testing for association between Y and X: If the simple linear regression model holds for f(Y) and g(X), then Y and X are associated if and only if the slope in the regression of f(Y) on g(X) does not equal zero. The p-value for the test that the slope is zero is <.0001: strong evidence that per capita GDP and life expectancy are associated. Prediction and mean response: What would you predict the life expectancy to be for a country with a per capita GDP of $20,000?
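The prediction mechanics can be sketched as follows. The fitted JMP coefficients are on the slide image and are not reproduced in this transcript, so the values of b0 and b1 below are hypothetical placeholders, not the real gdplife.JMP fit:

```python
import math

# Hypothetical fitted model (illustrative coefficients only):
#   predicted life expectancy = b0 + b1 * ln(per-capita GDP)
b0, b1 = 20.0, 5.5

gdp = 20000
pred = b0 + b1 * math.log(gdp)  # plug the transformed X into the fitted line
print(round(pred, 1))
```

Because only X was transformed here, the prediction is already on the original y-scale; no back-transformation is needed.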

6 How do we choose a transformation? Tukey’s Bulging Rule. See handout. Match the curvature in the data to the shape of one of the curves drawn in the four quadrants of the figure in the handout, then use the associated transformations, selecting one for X, Y, or both.

7 Transformations in JMP
1. Use Tukey’s Bulging Rule (see handout) to determine transformations which might help.
2. After Fit Y by X, click the red triangle next to Bivariate Fit and click Fit Special. Experiment with transformations suggested by Tukey’s Bulging Rule.
3. Make plots of the residuals for the transformed model vs. the original X by clicking the red triangle next to Transformed Fit to … and clicking Plot Residuals. Choose transformations which make the residual plot have no pattern in the mean of the residuals vs. X.
4. Compare different transformations by looking for the transformation with the smallest root mean square error on the original y-scale. If using a transformation that involves transforming y, look at the root mean square error for the fit measured on the original scale.
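Steps 3 and 4 above can be sketched in code: fit several candidate transformations and compare root mean square error on the original y-scale. The data, candidate set, and helper function below are illustrative assumptions, not output from JMP:

```python
import numpy as np

# Synthetic data whose true relationship is linear in log(x) (invented for illustration).
rng = np.random.default_rng(1)
x = rng.uniform(1, 50, 150)
y = 3 + 2 * np.log(x) + rng.normal(0, 0.5, 150)

def rmse_original_scale(y_t, x_t, back=lambda v: v):
    """Fit y_t ~ x_t by least squares; return RMSE with predictions
    mapped back to the original y-scale via `back`."""
    b1, b0 = np.polyfit(x_t, y_t, 1)
    yhat = back(b0 + b1 * x_t)
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

candidates = {
    "linear": rmse_original_scale(y, x),
    "log x":  rmse_original_scale(y, np.log(x)),
    "sqrt x": rmse_original_scale(y, np.sqrt(x)),
    "log y":  rmse_original_scale(np.log(y), x, back=np.exp),  # back-transform to y-scale
}
best = min(candidates, key=candidates.get)
print(best)
```

The key point mirrors step 4: when y itself is transformed, the RMSE must be computed after back-transforming the predictions, or the comparison across candidates is not apples to apples.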

8

9 By looking at the root mean square error on the original y-scale, we see that all of the transformations improve upon the untransformed model and that the transformation to log x is by far the best.

10 The transformation to Log X appears to have mostly removed the trend in the mean of the residuals, meaning the linearity assumption is now approximately satisfied. There is still a problem of nonconstant variance.

11 Interpreting the Coefficient on Log X
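The slide content here is an image not reproduced in this transcript. The standard interpretation of the slope when only X is log-transformed can be stated as follows:

```latex
% In the model  E(Y \mid X) = b_0 + b_1 \ln X,
% a 1% increase in X changes the mean of Y by about b_1/100, since
b_1\left[\ln(1.01x) - \ln(x)\right] = b_1 \ln(1.01) \approx 0.01\, b_1
```

For the gdplife example, this would read: a 1% increase in per capita GDP is associated with an increase of about b1/100 years in mean life expectancy.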

12 Comparing models for curvilinear relationships In comparing two transformations, use the one with the lower RMSE; if y was transformed, use the RMSE for the fit measured on the original y-scale. In comparing transformations to polynomial regression models, compare the RMSE of the best transformation to that of the best polynomial regression model. If the transformation’s RMSE is larger than the polynomial regression’s RMSE but within 1% of it, then it is still a good idea to use the transformation on the grounds of parsimony.
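The 1% parsimony rule can be written as a one-line check, here applied to the RMSE values from the Display.JMP comparison on slide 13:

```python
# RMSE values from the Display.JMP comparison (slide 13).
rmse_best_transform = 40.04  # best transformation: 1/x
rmse_best_poly = 37.79       # best polynomial: fourth order

# Parsimony rule: prefer the transformation if its RMSE is within 1% of the polynomial's.
within_one_percent = rmse_best_transform <= 1.01 * rmse_best_poly
print(within_one_percent)  # False: the polynomial's advantage exceeds 1%, so keep the polynomial
```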

13 Transformations and Polynomial Regression for Display.JMP

Model                    RMSE
Linear                   51.59
log x                    41.31
1/x                      40.04
(label lost in transcription)  46.02
Fourth-order polynomial  37.79

The fourth-order polynomial is the best polynomial regression model using the criterion in Notes 10. The fourth-order polynomial is the best model overall: it has the smallest RMSE by a considerable amount (more than a 1% advantage over the best transformation, 1/x).

14 Log Transformation of Both X and Y variables It is sometimes useful to transform both the X and Y variables. A particularly common transformation is to transform X to log(X) and Y to log(Y).

15 Heart Disease-Wine Consumption Data (heartwine.JMP)

16 Evaluating Transformed Y Variable Models The log-log transformation provides slightly better predictions than the simple linear regression model.

17 Interpreting Coefficients in Log-Log Models
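The slide content here is an image not reproduced in this transcript. The standard interpretation of the slope in a log-log model can be stated as follows:

```latex
% In the log-log model  E(\ln Y \mid X) = b_0 + b_1 \ln X,
% multiplying X by a factor c multiplies the predicted level of Y by c^{b_1}:
\exp\{b_0 + b_1 \ln(cx)\} = c^{b_1}\,\exp\{b_0 + b_1 \ln x\}
% So b_1 acts as an elasticity: a 1% increase in X is associated with
% approximately a b_1 percent change in Y.
```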

18 Another interpretation of coefficients in log-log models

19 Another Example of Transformations: Y = count of tree seeds, X = weight of tree

20

21 By looking at the root mean square error on the original y-scale, we see that both of the transformations improve upon the untransformed model and that the transformation to log y and log x is by far the best.

22 Comparison of Transformations to Polynomials for Tree Data

23 Prediction using the log y/log x transformation What is the predicted seed count of a tree that weighs 50 mg? Math trick: exp{log(y)} = y (remember, by log we always mean the natural log, ln).
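The back-transformation trick above can be sketched in code. The fitted coefficients for the tree data are on the slide image and not reproduced in this transcript, so b0 and b1 below are hypothetical placeholders:

```python
import math

# Hypothetical log-log fit for the tree data (illustrative coefficients only):
#   ln(seed count) = b0 + b1 * ln(weight in mg)
b0, b1 = 2.0, 1.5

weight = 50
log_pred = b0 + b1 * math.log(weight)  # prediction on the log(y) scale
pred = math.exp(log_pred)              # exp{log(y)} = y undoes the log transformation
print(round(pred, 1))
```

The exponentiation step is what puts the predicted seed count back on the original count scale; forgetting it would leave the prediction on the ln(seed count) scale.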

