Polynomial regression models
Possible models for when the response function is “curved”

Uses of polynomial models
- When the true response function really is a polynomial function. (Very common!)
- When the true response function is unknown or complex, but a polynomial function approximates the true function well.

Example
What is the impact of exercise on the human immune system?
Is the amount of immunoglobulin in the blood (y) related to maximal oxygen uptake (x) in a curved manner?

Scatter plot

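A quadratic polynomial regression function
$$Y_i = \beta_0 + \beta_1 X_i + \beta_{11} X_i^2 + \varepsilon_i$$
where:
- $Y_i$ = amount of immunoglobulin in blood (mg)
- $X_i$ = maximal oxygen uptake (ml/kg)
- the error terms $\varepsilon_i$ satisfy the typical assumptions (“INE”: independent, normally distributed, equal variance)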

Estimated quadratic function

Interpretation of the regression coefficients
- If 0 is a possible x value, then $b_0$ is the predicted response at x = 0. Otherwise, $b_0$ has no meaningful interpretation.
- $b_1$ does not have a very helpful interpretation. It is the slope of the tangent line at x = 0.
- $b_{11}$, the coefficient of the squared term, indicates the up/down direction of the curve:
  - $b_{11} < 0$ means the curve is concave down
  - $b_{11} > 0$ means the curve is concave up

The regression equation is
igg = - 1464 + 88.3 oxygen - 0.536 oxygensq
Predictor      Coef  SE Coef      T      P   VIF
Constant    -1464.4    411.4  -3.56  0.001
oxygen        88.31    16.47   5.36  0.000  99.9
oxygensq    -0.5362   0.1582  -3.39  0.002  99.9
S = 106.4   R-Sq = 93.8%   R-Sq(adj) = 93.3%
Analysis of Variance
Source          DF       SS       MS       F      P
Regression       2  4602211  2301105  203.16  0.000
Residual Error  27   305818    11327
Total           29  4908029
Source    DF   Seq SS
oxygen     1  4472047
oxygensq   1   130164

A multicollinearity problem
Pearson correlation of oxygen and oxygensq = 0.995

“Center” the predictors
Mean of oxygen = 50.637
oxygen   oxcent  oxcentsq
  34.6  -16.037   257.185
  45.0   -5.637    31.776
  62.3   11.663   136.026
  58.9    8.263    68.277
  42.5   -8.137    66.211
  44.3   -6.337    40.158
  67.9   17.263   298.011
  58.5    7.863    61.827
  35.6  -15.037   226.111
  49.6   -1.037     1.075
  33.0  -17.637   311.064
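The centering step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the 11 oxygen values listed above (a subset of the 30 observations), takes the full-sample mean 50.637 as given, and the helper name `pearson` is hypothetical. Because only a subset is available, the printed correlations will not reproduce the reported values exactly.

```python
from math import sqrt

# The 11 oxygen values listed above (a subset of the 30 observations).
oxygen = [34.6, 45.0, 62.3, 58.9, 42.5, 44.3, 67.9, 58.5, 35.6, 49.6, 33.0]
OXYGEN_MEAN = 50.637  # full-sample mean, taken from the slide


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (hypothetical helper)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    return s_ab / sqrt(s_aa * s_bb)


# Raw predictor and its square.
oxygensq = [x ** 2 for x in oxygen]

# Centered predictor and its square.
oxcent = [x - OXYGEN_MEAN for x in oxygen]
oxcentsq = [x ** 2 for x in oxcent]

r_raw = pearson(oxygen, oxygensq)
r_centered = pearson(oxcent, oxcentsq)
print(f"raw: {r_raw:.3f}, centered: {r_centered:.3f}")
```

On this subset the raw predictor and its square remain almost perfectly correlated, while the centered pair is only weakly correlated, which is the qualitative pattern the slides report.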

Does it really work?
Pearson correlation of oxcent and oxcentsq = 0.219
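The two correlations can be tied to the VIF column of the regression output. With exactly two predictors, the variance inflation factor of either one is 1 / (1 - r²), where r is their correlation. The sketch below applies that standard identity to the rounded correlations reported on the slides; the function name is hypothetical, and small differences from the printed VIFs (99.9 and 1.1) are expected because r is rounded to three decimals.

```python
def vif_two_predictors(r):
    """VIF for either predictor in a two-predictor model with correlation r."""
    return 1.0 / (1.0 - r ** 2)


# Correlations as reported on the slides (rounded to three decimals).
vif_raw = vif_two_predictors(0.995)       # oxygen vs. oxygensq
vif_centered = vif_two_predictors(0.219)  # oxcent vs. oxcentsq

print(f"VIF, raw predictors:      {vif_raw:.1f}")
print(f"VIF, centered predictors: {vif_centered:.2f}")
```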

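A better quadratic polynomial regression function
$$Y_i = \beta_0^* + \beta_1^* x_i + \beta_{11}^* x_i^2 + \varepsilon_i$$
where $x_i = X_i - \bar{X}$ denotes the centered predictor, and
- $\beta_0^*$ = mean response at the predictor mean
- $\beta_1^*$ = “linear effect coefficient”
- $\beta_{11}^*$ = “quadratic effect coefficient”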

The regression equation is
igg = 1632 + 34.0 oxcent - 0.536 oxcentsq
Predictor      Coef  SE Coef      T      P  VIF
Constant    1632.20    29.35  55.61  0.000
oxcent       34.000    1.689  20.13  0.000  1.1
oxcentsq    -0.5362   0.1582  -3.39  0.002  1.1
S = 106.4   R-Sq = 93.8%   R-Sq(adj) = 93.3%
Analysis of Variance
Source          DF       SS       MS       F      P
Regression       2  4602211  2301105  203.16  0.000
Residual Error  27   305818    11327
Total           29  4908029
Source    DF   Seq SS
oxcent     1  4472047
oxcentsq   1   130164

Interpretation of the regression coefficients
- $b_0$ is the predicted response at the predictor mean.
- $b_1$ is the estimated slope of the tangent line at the predictor mean and, typically, is also approximately the estimated slope in the simple linear model.
- $b_{11}$, the coefficient of the squared term, indicates the up/down direction of the curve:
  - $b_{11} < 0$ means the curve is concave down
  - $b_{11} > 0$ means the curve is concave up

Estimated regression function

Similar estimates

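The relationship between the two forms of the model
Centered model:
$$Y_i = \beta_0^* + \beta_1^* (X_i - \bar{X}) + \beta_{11}^* (X_i - \bar{X})^2 + \varepsilon_i$$
Original model:
$$Y_i = \beta_0 + \beta_1 X_i + \beta_{11} X_i^2 + \varepsilon_i$$
Where:
$$\beta_0 = \beta_0^* - \beta_1^* \bar{X} + \beta_{11}^* \bar{X}^2, \qquad \beta_1 = \beta_1^* - 2 \beta_{11}^* \bar{X}, \qquad \beta_{11} = \beta_{11}^*$$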

Mean of oxygen = 50.637
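As a check on the link between the two fits, expanding the centered quadratic in powers of the uncentered predictor gives b0 = b0* - b1*·x̄ + b11*·x̄², b1 = b1* - 2·b11*·x̄, and b11 = b11*. The sketch below plugs in the centered estimates and the mean from the output above; the variable names are illustrative, and small discrepancies from the printed uncentered coefficients (-1464.4 and 88.31) come from rounding in the displayed values.

```python
X_BAR = 50.637  # mean of oxygen, from the slide

# Estimates from the centered fit.
b0_star, b1_star, b11_star = 1632.20, 34.000, -0.5362

# Expand b0* + b1*(X - X_BAR) + b11*(X - X_BAR)^2 and collect powers of X.
b0 = b0_star - b1_star * X_BAR + b11_star * X_BAR ** 2
b1 = b1_star - 2 * b11_star * X_BAR
b11 = b11_star

print(f"b0  = {b0:.1f}")
print(f"b1  = {b1:.2f}")
print(f"b11 = {b11:.4f}")
```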

What is predicted IgG if maximal oxygen uptake is 90?
There is an even greater danger in extrapolation when modeling data with a polynomial function, because of changes in direction.
Predicted Values for New Observations
New Obs     Fit  SE Fit          95.0% CI          95.0% PI
      1  2139.6   219.2  (1689.8,2589.5)  (1639.6,2639.7) XX
X denotes a row with X values away from the center
XX denotes a row with very extreme X values
Values of Predictors for New Observations
New Obs  oxcent  oxcentsq
      1    39.4      1549
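The “change in direction” can be made concrete with the fitted centered equation. The sketch below recomputes the point prediction at an oxygen uptake of 90 and locates where the fitted parabola peaks; the function and variable names are hypothetical, and the coefficients are the rounded values printed in the output, so the fit differs slightly from the reported 2139.6.

```python
X_BAR = 50.637  # mean of oxygen, from the slide

# Rounded estimates from the centered fit.
b0_star, b1_star, b11_star = 1632.20, 34.000, -0.5362


def predict_igg(oxygen):
    """Point prediction from the centered quadratic fit."""
    x = oxygen - X_BAR
    return b0_star + b1_star * x + b11_star * x ** 2


fit_at_90 = predict_igg(90.0)

# The fitted parabola peaks where its derivative b1* + 2*b11*x is zero.
turning_point = X_BAR - b1_star / (2 * b11_star)

print(f"fit at oxygen = 90: {fit_at_90:.1f}")
print(f"fitted curve peaks near oxygen = {turning_point:.1f}")
```

Under these estimates the fitted curve peaks at an oxygen value below 90 and above every oxygen value listed in this transcript, so the prediction at 90 sits on the descending side of the parabola. That is a region where the shape of the curve is dictated by the polynomial form rather than by nearby data.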

It is possible to “overfit” the data with polynomial models.

It is even theoretically possible to fit the data perfectly.
If you have n data points with distinct x values, then a polynomial of order n-1 will fit the data perfectly; that is, it will pass through each data point.
But good statistical software will keep an unsuspecting user from fitting such a model:
** Error ** Not enough non-missing observations to fit a polynomial of this order; execution aborted
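A small sketch shows why an order n-1 polynomial reproduces n points exactly. It builds the polynomial through the points by Lagrange interpolation, which (because the polynomial of that order through n distinct x values is unique) coincides with what a least-squares fit of the same order would return. The five data points and the function name are made up for illustration.

```python
def lagrange_fit(xs, ys):
    """Return a function evaluating the unique polynomial of order len(xs) - 1
    that passes through every (x, y) pair. Assumes the xs are distinct."""
    def poly(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return poly


# Hypothetical data: n = 5 points, so the polynomial has order 4.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.4]

poly = lagrange_fit(xs, ys)
residuals = [y - poly(x) for x, y in zip(xs, ys)]
print(residuals)
```

Every residual is zero up to floating-point error, which leaves no degrees of freedom for estimating the error variance.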

The hierarchical approach to model fitting
A widely accepted approach is to fit a higher-order model and then explore whether a lower-order (simpler) model is adequate.
Is a first-order linear model (“line”) adequate?
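For the immunoglobulin example, one way to answer this question is a partial F test of the quadratic term, using the sequential sum of squares from the output above. The sketch below assumes that test is the intended one; it also checks the standard fact that, for a single added term, the partial F statistic equals the square of that term's t statistic. Variable names are illustrative.

```python
# Values taken from the regression output for the immunoglobulin example.
seq_ss_quadratic = 130164.0  # Seq SS for the squared term, given the linear term
mse_full = 11327.0           # MS for Residual Error in the quadratic model (27 df)

# Partial F statistic for adding the squared term (1 and 27 df).
f_stat = seq_ss_quadratic / mse_full

# t statistic for the squared term: coefficient divided by its standard error.
t_stat = -0.5362 / 0.1582

print(f"F   = {f_stat:.2f}")
print(f"t^2 = {t_stat ** 2:.2f}")
```

The output reports P = 0.002 for the squared term, so on this evidence the straight-line model would not be judged adequate.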

The hierarchical approach to model fitting
But then, if a polynomial term of a given order is retained, all related lower-order terms are also retained. That is, if the quadratic term is significant, you would use this regression function:
$$E(Y_i) = \beta_0 + \beta_1 X_i + \beta_{11} X_i^2$$
and not this one:
$$E(Y_i) = \beta_0 + \beta_{11} X_i^2$$

Example
- Quality of a product (y): a score between 0 and 100
- Temperature (x_1): degrees Fahrenheit
- Pressure (x_2): pounds per square inch

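A two-predictor, second-order polynomial regression function
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_{11} X_{i1}^2 + \beta_{22} X_{i2}^2 + \beta_{12} X_{i1} X_{i2} + \varepsilon_i$$
where:
- $Y_i$ = quality
- $X_{i1}$ = temperature
- $X_{i2}$ = pressure
- $\beta_{12}$ = “interaction effect coefficient”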

The regression equation is
quality = - 5128 + 31.1 temp + 140 pressure - 0.133 tempsq - 1.14 presssq - 0.145 tp
Predictor       Coef   SE Coef       T      P     VIF
Constant     -5127.9     110.3  -46.49  0.000
temp          31.096     1.344   23.13  0.000  1154.5
pressure     139.747     3.140   44.50  0.000  1574.5
tempsq     -0.133389  0.006853  -19.46  0.000   973.0
presssq     -1.14422   0.02741  -41.74  0.000  1453.0
tp         -0.145500  0.009692  -15.01  0.000   304.0
S = 1.679   R-Sq = 99.3%   R-Sq(adj) = 99.1%

Again, some correlation
          quality    temp  pressure  tempsq  presssq
temp       -0.423
pressure    0.182   0.000
tempsq     -0.434   0.999     0.000
presssq     0.162   0.000     1.000  -0.000
tp         -0.227   0.773     0.632   0.772    0.632
Cell Contents: Pearson correlation

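A better two-predictor, second-order polynomial regression function
$$Y_i = \beta_0^* + \beta_1^* x_{i1} + \beta_2^* x_{i2} + \beta_{11}^* x_{i1}^2 + \beta_{22}^* x_{i2}^2 + \beta_{12}^* x_{i1} x_{i2} + \varepsilon_i$$
where:
- $Y_i$ = quality
- $x_{i1}$ = centered temperature
- $x_{i2}$ = centered pressure
- $\beta_{12}^*$ = “interaction effect coefficient”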

Reduced correlation
          quality   tcent   pcent  tpcent  tcentsq
tcent      -0.423
pcent       0.182   0.000
tpcent     -0.274   0.000   0.000
tcentsq    -0.355  -0.000   0.000   0.000
pcentsq    -0.762   0.000   0.000   0.000   -0.000
Cell Contents: Pearson correlation

The regression equation is
quality = 94.9 - 0.916 tcent + 0.788 pcent - 0.146 tpcent - 0.133 tcentsq - 1.14 pcentsq
Predictor       Coef   SE Coef       T      P  VIF
Constant     94.9259    0.7224  131.40  0.000
tcent       -0.91611   0.03957  -23.15  0.000  1.0
pcent        0.78778   0.07913    9.95  0.000  1.0
tpcent     -0.145500  0.009692  -15.01  0.000  1.0
tcentsq    -0.133389  0.006853  -19.46  0.000  1.0
pcentsq     -1.14422   0.02741  -41.74  0.000  1.0
S = 1.679   R-Sq = 99.3%   R-Sq(adj) = 99.1%

Predicted Values for New Observations
New Obs     Fit  SE Fit          95.0% CI          95.0% PI
      1  94.926   0.722  (93.424,96.428)  (91.125,98.726)
Values of Predictors for New Observations
New Obs   tcent   pcent  tpcent  tcentsq  pcentsq
      1  0.0000  0.0000  0.0000   0.0000   0.0000
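The prediction above is at the predictor means, where every centered term is zero, so the fit equals the intercept. A short sketch of the fitted centered function makes that explicit and also shows what the interaction coefficient does: the slope in temperature depends on pressure. The function names and the `COEF` dictionary are hypothetical; the coefficients are copied from the output of the centered fit.

```python
# Estimates from the centered two-predictor fit.
COEF = {
    "const": 94.9259,
    "tcent": -0.91611,
    "pcent": 0.78778,
    "tpcent": -0.145500,
    "tcentsq": -0.133389,
    "pcentsq": -1.14422,
}


def predict_quality(tcent, pcent):
    """Point prediction from the centered second-order fit."""
    return (COEF["const"]
            + COEF["tcent"] * tcent
            + COEF["pcent"] * pcent
            + COEF["tpcent"] * tcent * pcent
            + COEF["tcentsq"] * tcent ** 2
            + COEF["pcentsq"] * pcent ** 2)


def temp_slope(tcent, pcent):
    """Partial derivative of the fitted surface with respect to centered temperature."""
    return COEF["tcent"] + COEF["tpcent"] * pcent + 2 * COEF["tcentsq"] * tcent


print(predict_quality(0.0, 0.0))  # fit at the predictor means
print(temp_slope(0.0, 0.0))       # temperature slope at mean pressure
print(temp_slope(0.0, 1.0))       # temperature slope one psi above mean pressure
```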
