Generalized Additive Models: An Introduction and Example
William R. Shadish, University of California, Merced
This research was supported in part by grants R305D100046 and R305U070003 from the Institute of Education Sciences, U.S. Department of Education, and by a grant from the University of California Office of the President to the University of California Educational Evaluation Consortium. The opinions expressed here are those of the author and do not represent the opinions of the U.S. Department of Education or the UC Office of the President.
This Topic Is About Modeling Trend: Do the data trend up or down over time? Do they trend in a straight line or a curve? If nonlinear, what is the shape of the curve?
Sometimes Trend Looks Pretty Linear
Sometimes It Looks Pretty Nonlinear
Sometimes It Just Looks Like A Mess
How Do We Find the Right Shape of the Trend? Most common statistical methods assume that trend is linear, or they require the researcher to know how to specify the nonlinearity correctly. For example, is it $x + x^2 + x^3$? $\log(x)$? Or something else? But the researcher rarely (if ever) knows! If the guess is wrong, so are the estimates and inferences.
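The cost of a wrong guess can be sketched in a few lines. The example below is a hypothetical illustration in Python with NumPy, not the R code used in the talk. It simulates a quadratic trend, fits both a straight line and a quadratic by ordinary least squares, and compares the two fits. The data and coefficients are made up for the demonstration.

```python
import numpy as np

# Hypothetical data: a quadratic trend plus a little noise (not the study data).
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 0.5 * x + 0.3 * x**2 + rng.normal(0, 0.5, x.size)

def fit_poly(x, y, degree):
    """OLS polynomial fit; returns the coefficients and the residual sum of squares."""
    X = np.vander(x, degree + 1, increasing=True)  # columns 1, x, x^2, ...
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)

beta_lin, sse_lin = fit_poly(x, y, 1)    # wrong guess: straight line
beta_quad, sse_quad = fit_poly(x, y, 2)  # correct guess: quadratic

print("linear slope:", beta_lin[1], "SSE:", sse_lin)
print("quadratic coefficients:", beta_quad, "SSE:", sse_quad)
```

The straight-line slope comes out near 3.5, which describes no part of the true curve well. The true instantaneous slope, 0.5 + 0.6x, runs from 0.5 to 6.5 over this range.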
Generalized Additive Models to the Rescue: GAMs model trend with smoothing splines. They let the data suggest the shape of the trend, while penalizing for over-fitting a curve to the data.
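Here is a minimal sketch of the penalty idea in Python with NumPy. It assumes a truncated-line basis with a ridge penalty on the knot coefficients, which is one textbook form of penalized regression spline. It does not reproduce mgcv's default thin plate regression splines or its automatic choice of the smoothing parameter, and the function names are hypothetical. Larger values of the smoothing parameter `lam` shrink the curve toward a straight line. That shrinkage shows up as a smaller trace of the smoother matrix H, where the fitted values are H times y.

```python
import numpy as np

def truncated_line_basis(x, knots):
    """Design matrix with columns 1, x, and (x - k)_+ for each knot k."""
    cols = [np.ones_like(x), x] + [np.maximum(x - k, 0.0) for k in knots]
    return np.column_stack(cols)

def penalized_spline_fit(x, y, knots, lam):
    """Minimize ||y - X b||^2 + lam * (sum of squared knot coefficients).

    The intercept and slope are unpenalized, so a very large lam shrinks the
    fit toward a straight line, while lam = 0 gives an unpenalized
    piecewise-linear fit with a kink allowed at every knot.
    Returns the fitted values and trace(H), the effective degrees of freedom.
    """
    X = truncated_line_basis(x, knots)
    penalty = np.diag([0.0, 0.0] + [1.0] * len(knots))
    H = X @ np.linalg.solve(X.T @ X + lam * penalty, X.T)  # y_hat = H y
    return H @ y, float(np.trace(H))

# Hypothetical wiggly data: a sine curve plus noise.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 100)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, x.size)
knots = np.linspace(0.05, 0.95, 20)

lams = [0.0, 0.01, 1.0, 1e6]
edfs = [penalized_spline_fit(x, y, knots, lam)[1] for lam in lams]
print([round(e, 2) for e in edfs])  # edf falls from 22 toward 2 as lam grows
```

A GAM chooses how much to penalize from the data rather than from a fixed list like `lams`. This sketch only shows the trade-off that the choice controls.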
What is a Smoother?
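A smoother estimates the trend at each point by combining nearby observations. The simplest example is a centered running mean. The Python sketch below is a hypothetical illustration of that idea only. Loess and splines, discussed next, are more refined smoothers.

```python
import numpy as np

def running_mean(y, window):
    """Centered moving-average smoother; window must be odd.

    Each smoothed value is the mean of the `window` observations centered on it;
    near the ends the window is truncated to the points that exist.
    """
    if window % 2 != 1:
        raise ValueError("window must be odd")
    y = np.asarray(y, dtype=float)
    half = window // 2
    out = np.empty_like(y)
    for i in range(y.size):
        lo, hi = max(0, i - half), min(y.size, i + half + 1)
        out[i] = y[lo:hi].mean()
    return out

smoothed = running_mean([1, 5, 3, 8, 4], window=3)
print(smoothed)  # values: 3.0, 3.0, 5.33..., 5.0, 6.0
```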
What Is a Spline? Originally, a spline was a flexible strip fixed at certain points (knots) and then bent into a smooth curve; it was used to draw curves in drafting and carpentry. Statistical splines improve on loess and related smoothers: they have a stronger analytic basis, are better at preventing over-smoothing, have superior software implementations, and are easier to make part of GA(M)Ms.
More on Splines (Keele, 2008): A very simple example. Imagine modeling these data: clearly, a straight line is not a good fit. The point at which the line changes direction (x = 60) is called a knot, denoted c.
A Tentative Model: In a very simple case, a spline joins two linear regressions together. The relationship between x and the predicted y changes depending on whether x is above or below the knot.
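The slide's own equation did not survive. Assuming it is the standard continuous two-piece linear spline with one knot at $c$, the model can be written as

$$
y_i = \beta_0 + \beta_1 x_i + \beta_2 (x_i - c)_+ + \varepsilon_i,
\qquad
(x_i - c)_+ =
\begin{cases}
0, & x_i \le c \\
x_i - c, & x_i > c
\end{cases}
$$

Under this form, the slope is $\beta_1$ below the knot and $\beta_1 + \beta_2$ above it, and the two lines meet at $x = c$.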
Spline Basis Functions: For a spline, the second column of the design matrix X (the column holding x) is replaced, in this simple case, by two columns. The resulting design matrix is:
X for Simple Spline
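The design matrix for this one-knot spline can be built directly. Below is a minimal Python/NumPy sketch. It assumes the truncated-line basis $(x - c)_+ = \max(0, x - c)$ with the knot at $c = 60$ from the example. The x values and coefficients are hypothetical, chosen only so that the fit can be checked exactly.

```python
import numpy as np

def simple_spline_design(x, c):
    """Columns: intercept, x, and (x - c)_+ = max(0, x - c) for one knot at c."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones_like(x), x, np.maximum(x - c, 0.0)])

x = np.arange(0, 101, 10)          # hypothetical x values from 0 to 100
X = simple_spline_design(x, c=60)  # rows above the knot get a nonzero third column

# Noise-free hypothetical outcome: slope 0.5 below the knot, 0.5 - 1.0 = -0.5 above it.
y = 10 + 0.5 * x - 1.0 * np.maximum(x - 60, 0)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # recovers approximately [10, 0.5, -1.0]
```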
This creates the following spline:
More Advanced Matters: Many kinds of splines exist, and their bases are rarely linear. One can also fit generalized additive mixed models (GAMMs), model autocorrelation or autoregressive error structures, analyze Poisson, binomial, and other outcomes, and include parametric covariates. GAMs are not limited to longitudinal data; for example, they have been used in the analysis of regression discontinuity designs.
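To illustrate a nonlinear basis, here is a truncated cubic power basis in Python/NumPy. Its columns are $1, x, x^2, x^3$, plus $(x - k)_+^3$ for each knot $k$. This is a hypothetical sketch chosen because the basis is easy to write down; it is not the basis mgcv uses by default (thin plate regression splines). A fit in this basis is a piecewise cubic whose value and first and second derivatives are continuous at every knot.

```python
import numpy as np

def truncated_cubic_basis(x, knots):
    """Columns 1, x, x^2, x^3, and (x - k)_+^3 for each knot k."""
    x = np.asarray(x, dtype=float)
    cols = [x**p for p in range(4)] + [np.maximum(x - k, 0.0) ** 3 for k in knots]
    return np.column_stack(cols)

def sse(X, y):
    """Residual sum of squares from an OLS fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

# Hypothetical curved data.
rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 80)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, x.size)

X_cubic = truncated_cubic_basis(x, knots=[0.25, 0.5, 0.75])
X_line = X_cubic[:, :2]  # intercept and slope only

print(sse(X_line, y), sse(X_cubic, y))  # the cubic spline fits far better
```

Because the cubic basis contains the intercept and slope columns, its residual sum of squares can never exceed that of the straight line.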
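GAM Degree of Nonlinearity: The degree of nonlinearity is measured by the estimated degrees of freedom (edf), where edf = trace(H) and H is the hat (smoother) matrix that maps the observed y to the fitted values. The edf is approximately (polynomial degree + 1). Some examples of data and edf: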
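For an ordinary least squares polynomial fit, the hat matrix $H = X(X^\top X)^{-1}X^\top$ is a projection. Its trace therefore equals the number of columns, which is the degree + 1. The Python/NumPy sketch below checks this with hypothetical x values. A penalized smoother shrinks its hat matrix, so its trace can fall between these integers.

```python
import numpy as np

def hat_matrix(X):
    """OLS hat matrix H = X (X'X)^{-1} X', so that fitted values are H @ y."""
    return X @ np.linalg.solve(X.T @ X, X.T)

x = np.linspace(-1.0, 1.0, 30)
edf = {}
for degree in (1, 2, 3):
    X = np.vander(x, degree + 1, increasing=True)  # columns 1, x, ..., x^degree
    edf[degree] = float(np.trace(hat_matrix(X)))

print(edf)  # each trace equals degree + 1 (up to rounding): 2, 3, 4
```

In mgcv's summary output, the edf reported for a smooth term such as s(x) does not count the intercept, which is estimated separately. A straight-line trend therefore shows there with an edf of about 1 rather than 2.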
edf for linear data
edf for quadratic data
edf for very wiggly data
An Example: Lambert et al. (2006). Number of Intervals of Disruptive Behavior Recorded during single-student responding (SSR) and response card treatment (RC) conditions
Computations and Data: Computations were done in R with the mgcv package. Data snapshot:
Some Output: A significant treatment effect.
Some Output: Cases differ significantly from each other in their starting levels on the outcome.
Some Output: The treatment effect varies significantly across cases.
Some Output: 7 of 9 cases show a significant nonlinear trend.
Some Output: Case 2 shows a significant linear trend.
Some Output: Case 6 shows no significant trend, linear or nonlinear.
Graphical Output
Autocorrelations: This was not an autoregressive (AR) model. Small n makes AR models difficult to fit, and we are working on a Bayesian approach to autocorrelation (AC). In the meantime, one can compute the AC of the residuals to get a sense of the size of the problem:
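The residual autocorrelation check can be sketched as follows, in Python/NumPy rather than the R used for the talk. The function `acf` computes the usual sample autocorrelations of a residual series. The function `acf_bound` gives the common large-sample ±1.96/√n reference band for white noise. With the short series typical of single-case designs, that band is only a rough guide. Both function names are hypothetical.

```python
import numpy as np

def acf(resid, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag of a residual series."""
    e = np.asarray(resid, dtype=float)
    e = e - e.mean()
    denom = float(e @ e)
    return np.array([float(e[:-k] @ e[k:]) / denom for k in range(1, max_lag + 1)])

def acf_bound(n, z=1.96):
    """Approximate 95% band for r_k when there is no autocorrelation: +/- z / sqrt(n)."""
    return z / np.sqrt(n)

resid = np.array([1.0, -1.0] * 5)  # hypothetical residuals that alternate in sign
r = acf(resid, max_lag=2)
print(r)                      # lag 1: -0.9, lag 2: 0.8
print(acf_bound(resid.size))  # about 0.62 for n = 10
```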
Autocorrelations Among Residuals: Only lag 1 is significant, but it is significant in 4 of the 9 cases, so the standard errors could be wrong. gam cannot estimate autoregressive models, so we are looking at Bayesian GAMMs, which can.
Discussion: All these models can be implemented in regression or mixed models without smoothers. For single-case designs (SCDs), GA(M)Ms provide the information about level, trend, variability, overlap, immediacy of effect, and phase consistency that SCD researchers want when interpreting a functional relation. GA(M)Ms probably have wide application to other longitudinal data sets as well. I can send R syntax for using GAMs.
GAM Is a Method Whose Time Has Come
Further Readings (in order from least to most complex):
Shadish, W. R., Zuur, A. F., & Sullivan, K. J. (2014). Using generalized additive (mixed) models to analyze single case designs. Journal of School Psychology, 52, 149-178.
Sullivan, K. J., Shadish, W. R., & Steiner, P. M. (in press). Analyzing longitudinal data with generalized additive models: Applications to single-case designs. Psychological Methods.
Keele, L. (2008). Semiparametric regression for the social sciences. Chichester, UK: Wiley.
Zuur, A. F. (2012). A beginner’s guide to generalized additive models with R. Newburgh, UK: Highland Statistics.
Zuur, A. F., Saveliev, A. A., & Ieno, E. N. (2014). A beginner’s guide to generalized additive mixed models with R. Newburgh, UK: Highland Statistics.
Wood, S. N. (2006). Generalized additive models: An introduction with R. Boca Raton, FL: Chapman and Hall/CRC.
THE END