Polynomial regression models
Possible models for when the response function is “curved.”
Uses of polynomial models
– When the true response function really is a polynomial function. (Very common!)
– When the true response function is unknown or complex, but a polynomial function approximates the true function well.
Example
What is the impact of exercise on the human immune system? Is the amount of immunoglobulin in the blood (y) related, perhaps in a curved manner, to maximal oxygen uptake (x)?
Scatter plot
A quadratic polynomial regression function
Y_i = β_0 + β_1 X_i + β_11 X_i^2 + ε_i
where:
Y_i = amount of immunoglobulin in blood (mg)
X_i = maximal oxygen uptake (ml/kg)
and the error terms ε_i satisfy the typical assumptions (“INE”: independent, normally distributed, equal variances).
Estimated quadratic function
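The fit itself can be reproduced in any regression package. Below is a minimal sketch in Python with statsmodels: the data are simulated from an arbitrary concave-down quadratic (the real IgG/oxygen observations and estimates are not reproduced here), and the column names igg, oxygen, and oxygensq simply mirror the Minitab output that follows.

```python
# Minimal sketch: fit a quadratic polynomial regression like igg ~ oxygen + oxygen^2.
# The data below are simulated with arbitrary coefficients, not the real study data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
oxygen = rng.uniform(30, 70, size=30)                                   # predictor values
igg = 800 + 40 * oxygen - 0.3 * oxygen**2 + rng.normal(0, 50, size=30)  # simulated response

df = pd.DataFrame({"igg": igg, "oxygen": oxygen, "oxygensq": oxygen**2})

fit = smf.ols("igg ~ oxygen + oxygensq", data=df).fit()
print(fit.params)     # b0, b1 (tangent slope at x = 0), b11 (curvature)
print(fit.rsquared)   # compare with the R-Sq reported by Minitab
```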
Interpretation of the regression coefficients
– If x = 0 is a possible predictor value, then b_0 is the predicted response at x = 0. Otherwise, the interpretation of b_0 is meaningless.
– b_1 does not have a very helpful interpretation: it is the slope of the tangent line at x = 0 (see the sketch below).
– b_2 indicates the up/down direction of the curve:
  b_2 < 0 means the curve is concave down
  b_2 > 0 means the curve is concave up
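As a quick numeric check on the tangent-line reading of b_1: the slope of the fitted curve b_0 + b_1 x + b_2 x^2 at any x is b_1 + 2 b_2 x, so b_1 is the slope only at x = 0. The coefficient values below are made up purely for illustration.

```python
# Sketch: slope of a fitted quadratic yhat = b0 + b1*x + b2*x^2 at a given x.
# Coefficient values here are illustrative, not estimates from the IgG data.
b0, b1, b2 = 800.0, 40.0, -0.3

def slope(x):
    # derivative of b0 + b1*x + b2*x^2 with respect to x
    return b1 + 2 * b2 * x

print(slope(0.0))    # 40.0 -> b1 is the slope only at x = 0
print(slope(50.0))   # 10.0 -> slope in the middle of a typical x range
```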
The regression equation is igg = … + … oxygen + … oxygensq
[Minitab output: coefficient table (Coef, SE Coef, T, P, VIF for Constant, oxygen, oxygensq) and Analysis of Variance table omitted. S = …, R-Sq = 93.8%, R-Sq(adj) = 93.3%]
A multicollinearity problem
Pearson correlation of oxygen and oxygensq = 0.995
“Center” the predictors
Subtract the sample mean of oxygen from each observation to get oxcent, and square the centered values to get oxcentsq.
Mean of oxygen = …
[Table of oxygen, oxcent, and oxcentsq values omitted]
Does it really work?
Pearson correlation of oxcent and oxcentsq = 0.219
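A minimal sketch of why this works, using simulated predictor values rather than the actual oxygen data: the raw predictor is almost perfectly correlated with its square, while the centered predictor is not.

```python
# Sketch: centering the predictor sharply reduces its correlation with its square.
import numpy as np

rng = np.random.default_rng(1)
oxygen = rng.uniform(30, 70, size=30)         # simulated predictor values
oxcent = oxygen - oxygen.mean()               # centered predictor

print(np.corrcoef(oxygen, oxygen**2)[0, 1])   # close to 1: severe multicollinearity
print(np.corrcoef(oxcent, oxcent**2)[0, 1])   # much closer to 0 after centering
```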
A better quadratic polynomial regression function
Y_i = β*_0 + β*_1 x_i + β*_11 x_i^2 + ε_i
where x_i = X_i - X̄ denotes the centered predictor, and:
β*_0 = mean response at the predictor mean
β*_1 = “linear effect coefficient”
β*_11 = “quadratic effect coefficient”
The regression equation is igg = … + … oxcent + … oxcentsq
[Minitab output: coefficient table (Coef, SE Coef, T, P, VIF for Constant, oxcent, oxcentsq) and Analysis of Variance table omitted. S = …, R-Sq = 93.8%, R-Sq(adj) = 93.3%]
Interpretation of the regression coefficients
– b_0 is the predicted response at the predictor mean.
– b_1 is the estimated slope of the tangent line at the predictor mean; typically, it is also close to the estimated slope in the simple linear regression model.
– b_2 indicates the up/down direction of the curve:
  b_2 < 0 means the curve is concave down
  b_2 > 0 means the curve is concave up
Estimated regression function
Similar estimates
The relationship between the two forms of the model
Centered model: Y_i = β*_0 + β*_1 x_i + β*_11 x_i^2 + ε_i, with x_i = X_i - X̄
Original model: Y_i = β_0 + β_1 X_i + β_11 X_i^2 + ε_i
where:
β_0 = β*_0 - β*_1 X̄ + β*_11 X̄^2
β_1 = β*_1 - 2 β*_11 X̄
β_11 = β*_11
Mean of oxygen =
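A tiny sketch of this back-conversion; the centered-model estimates and the predictor mean below are made-up illustrative values, not the estimates from the IgG fit.

```python
# Sketch: convert centered-model coefficients back to the original predictor scale.
# b0 = b0s - b1s*xbar + b11s*xbar^2, b1 = b1s - 2*b11s*xbar, b11 = b11s,
# where the "s" coefficients come from the centered fit and xbar is the predictor mean.
def uncenter_quadratic(b0_star, b1_star, b11_star, xbar):
    b0 = b0_star - b1_star * xbar + b11_star * xbar**2
    b1 = b1_star - 2 * b11_star * xbar
    b11 = b11_star
    return b0, b1, b11

# Illustrative (made-up) centered estimates and predictor mean:
print(uncenter_quadratic(b0_star=1600.0, b1_star=10.0, b11_star=-0.5, xbar=50.0))
```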
What is the predicted IgG if maximal oxygen uptake is 90?
There is an even greater danger in extrapolation when modeling data with a polynomial function, because the fitted curve can change direction outside the range of the data.
Predicted Values for New Observations
New Obs: Fit and SE Fit omitted; 95.0% CI = (1689.8, 2589.5); 95.0% PI = (1639.6, 2639.7); flagged XX
X denotes a row with X values away from the center
XX denotes a row with very extreme X values
Values of Predictors for New Observations: oxcent and oxcentsq for the new observation (values omitted)
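To see the extrapolation danger concretely, the sketch below fits a quadratic to simulated data with x roughly between 30 and 70 and then predicts at x = 90, far outside that range: the concave-down curve has already turned downward, and the prediction interval is much wider than it is inside the data range. (Simulated data only; not the IgG study values.)

```python
# Sketch: extrapolating a fitted quadratic well beyond the observed x range.
# Simulated data from an arbitrary concave-down quadratic, not the real study data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(30, 70, size=30)                         # observed x range ~ 30-70
y = 800 + 40 * x - 0.3 * x**2 + rng.normal(0, 50, 30)

X = sm.add_constant(np.column_stack([x, x**2]))
fit = sm.OLS(y, X).fit()

for x_new in (50.0, 90.0):                               # 90 is far outside the data
    X_new = sm.add_constant(np.array([[x_new, x_new**2]]), has_constant="add")
    frame = fit.get_prediction(X_new).summary_frame(alpha=0.05)
    print(x_new, frame[["mean", "obs_ci_lower", "obs_ci_upper"]].round(1).to_numpy())
```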
It is possible to “overfit” the data with polynomial models.
It is even theoretically possible to fit the data perfectly: if you have n data points, then a polynomial of order n - 1 will fit the data perfectly, that is, it will pass through each data point. Good statistical software, however, will keep an unsuspecting user from fitting such a model:
** Error ** Not enough non-missing observations to fit a polynomial of this order; execution aborted
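A sketch of that perfect-fit claim with numpy: given n arbitrary points, a polynomial of degree n - 1 passes through all of them exactly. The five points below are made up.

```python
# Sketch: a polynomial of order n - 1 passes exactly through n data points.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 0.7, 4.2, 2.5, 6.0])      # five arbitrary points

coefs = np.polyfit(x, y, deg=len(x) - 1)     # degree-4 polynomial for 5 points
print(np.polyval(coefs, x) - y)              # residuals are (numerically) zero
```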
The hierarchical approach to model fitting
A widely accepted approach is to fit a higher-order model and then explore whether a lower-order (simpler) model is adequate. For example: is a first-order linear model (a “line”) adequate?
The hierarchical approach to model fitting
But then… if a polynomial term of a given order is retained, then all related lower-order terms are also retained. That is, if the quadratic term is significant, you would use this regression function:
Y_i = β_0 + β_1 X_i + β_11 X_i^2 + ε_i
and not this one:
Y_i = β_0 + β_11 X_i^2 + ε_i
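One common way to carry out this check is a partial F-test comparing the nested first-order and second-order fits. A sketch on simulated data, using statsmodels' anova_lm for the comparison:

```python
# Sketch: does the quadratic term add anything beyond the straight line?
# Compare nested models with a partial F-test; data are simulated, not the course data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(1)
x = rng.uniform(30, 70, size=30)
df = pd.DataFrame({"x": x, "y": 800 + 40 * x - 0.3 * x**2 + rng.normal(0, 50, 30)})
df["xsq"] = df["x"] ** 2

line = smf.ols("y ~ x", data=df).fit()         # reduced, first-order model
quad = smf.ols("y ~ x + xsq", data=df).fit()   # full, second-order model

print(anova_lm(line, quad))                    # small p-value -> keep the quadratic term
```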
Example
Quality of a product (y): a score between 0 and 100
Temperature (x_1): degrees Fahrenheit
Pressure (x_2): pounds per square inch
A two-predictor, second-order polynomial regression function
Y_i = β_0 + β_1 X_i1 + β_2 X_i2 + β_11 X_i1^2 + β_22 X_i2^2 + β_12 X_i1 X_i2 + ε_i
where:
Y_i = quality
X_i1 = temperature
X_i2 = pressure
β_12 = “interaction effect coefficient”
The regression equation is quality = … + … temp + … pressure + … tempsq + … presssq + … tp
[Minitab output: coefficient table (Coef, SE Coef, T, P, VIF for Constant, temp, pressure, tempsq, presssq, tp) omitted. S = …, R-Sq = 99.3%, R-Sq(adj) = 99.1%]
Again, some correlation
[Pearson correlation matrix of quality, temp, pressure, tempsq, presssq, and tp; numeric entries omitted]
A better two-predictor, second-order polynomial regression function
Y_i = β*_0 + β*_1 x_i1 + β*_2 x_i2 + β*_11 x_i1^2 + β*_22 x_i2^2 + β*_12 x_i1 x_i2 + ε_i
where:
Y_i = quality
x_i1 = centered temperature
x_i2 = centered pressure
β*_12 = “interaction effect coefficient”
Reduced correlation
[Pearson correlation matrix of quality, tcent, pcent, tpcent, tcentsq, and pcentsq; numeric entries omitted]
The regression equation is quality = … + … tcent + … pcent + … tpcent + … tcentsq + … pcentsq
[Minitab output: coefficient table (Coef, SE Coef, T, P, VIF for Constant, tcent, pcent, tpcent, tcentsq, pcentsq) omitted. S = …, R-Sq = 99.3%, R-Sq(adj) = 99.1%]
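A sketch of fitting this centered two-predictor, second-order model in Python with statsmodels. The temperature and pressure data are simulated from an arbitrary response surface (the real quality dataset is not reproduced), and the column names tcent, pcent, tpcent, tcentsq, and pcentsq mirror the Minitab output above.

```python
# Sketch: centered two-predictor, second-order polynomial regression (quality example).
# Data are simulated from an arbitrary response surface, not the real quality data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
temp = rng.uniform(80, 100, size=27)     # degrees Fahrenheit, simulated
press = rng.uniform(50, 60, size=27)     # pounds per square inch, simulated
quality = (90 - 0.1 * (temp - 90) ** 2 - 0.5 * (press - 55) ** 2
           + 0.2 * (temp - 90) * (press - 55) + rng.normal(0, 1, size=27))

df = pd.DataFrame({"quality": quality,
                   "tcent": temp - temp.mean(),
                   "pcent": press - press.mean()})
df["tpcent"] = df["tcent"] * df["pcent"]
df["tcentsq"] = df["tcent"] ** 2
df["pcentsq"] = df["pcent"] ** 2

fit = smf.ols("quality ~ tcent + pcent + tpcent + tcentsq + pcentsq", data=df).fit()
print(fit.params)
print(fit.rsquared, fit.rsquared_adj)    # compare with R-Sq and R-Sq(adj) above
```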
Predicted Values for New Observations
New Obs: Fit and SE Fit omitted; 95.0% CI = (93.424, 96.428); 95.0% PI = (91.125, 98.726)
Values of Predictors for New Observations: tcent, pcent, tpcent, tcentsq, and pcentsq for the new setting (values omitted)
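The same kind of predicted value, with confidence and prediction intervals, can be obtained in Python. The sketch below refits the centered model on simulated data using patsy's I() and * formula shortcuts instead of pre-computed squared and interaction columns; the new (temperature, pressure) setting is arbitrary.

```python
# Sketch: predicted quality, with 95% CI and PI, at a new (temperature, pressure) setting.
# Simulated data and an arbitrary new setting; variable names mirror these notes.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
temp = rng.uniform(80, 100, size=27)
press = rng.uniform(50, 60, size=27)
quality = (90 - 0.1 * (temp - 90) ** 2 - 0.5 * (press - 55) ** 2
           + 0.2 * (temp - 90) * (press - 55) + rng.normal(0, 1, size=27))
df = pd.DataFrame({"quality": quality,
                   "tcent": temp - temp.mean(),
                   "pcent": press - press.mean()})

# tcent * pcent expands to tcent + pcent + tcent:pcent; I() builds the squared terms.
fit = smf.ols("quality ~ tcent * pcent + I(tcent**2) + I(pcent**2)", data=df).fit()

new = pd.DataFrame({"tcent": [92.0 - temp.mean()], "pcent": [54.0 - press.mean()]})
pred = fit.get_prediction(new).summary_frame(alpha=0.05)
print(pred[["mean", "mean_ci_lower", "mean_ci_upper", "obs_ci_lower", "obs_ci_upper"]])
```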