Polynomial regression models Possible models for when the response function is “curved”


1 Polynomial regression models Possible models for when the response function is “curved”

2 Uses of polynomial models
- When the true response function really is a polynomial function. (Very common!)
- When the true response function is unknown or complex, but a polynomial function approximates the true function well.

3 Example
What is the impact of exercise on the human immune system? Is the amount of immunoglobulin in the blood (y) related to maximal oxygen uptake (x), in a curved manner?

4 Scatter plot

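5 A quadratic polynomial regression function
$$Y_i = \beta_0 + \beta_1 X_i + \beta_{11} X_i^2 + \varepsilon_i$$
where:
- $Y_i$ = amount of immunoglobulin in blood (mg)
- $X_i$ = maximal oxygen uptake (ml/kg)
- the error terms $\varepsilon_i$ satisfy the typical assumptions (“INE”: independent, normally distributed, with equal variances)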
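To make the model concrete, here is a minimal sketch of how the coefficients of a quadratic regression function can be estimated by ordinary least squares: build design columns $1, X, X^2$ and solve the normal equations $(X^\top X)b = X^\top y$. The function names `fit_quadratic` and `solve_linear_system` are our own, and the data in the usage example are made up; statistical software generally uses more numerically stable matrix decompositions than the normal equations.

```python
def solve_linear_system(a, b):
    """Solve the small dense system a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the caller's lists are not modified.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - tail) / m[r][r]
    return beta


def fit_quadratic(xs, ys):
    """Least-squares estimates [b0, b1, b11] for y = b0 + b1*x + b11*x**2."""
    rows = [(1.0, x, x * x) for x in xs]  # design matrix columns: 1, x, x^2
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve_linear_system(xtx, xty)


# Usage with made-up data lying exactly on y = 2 + 3x - 0.5x^2.
xs = [0, 1, 2, 3, 4, 5]
ys = [2 + 3 * x - 0.5 * x ** 2 for x in xs]
b0, b1, b11 = fit_quadratic(xs, ys)
print(b0, b1, b11)
```

Because the made-up points lie exactly on a parabola, the recovered coefficients should equal the generating ones up to floating-point round-off.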

6 Estimated quadratic function

7 Interpretation of the regression coefficients
- If 0 is a possible x value, then $b_0$ is the predicted response at x = 0. Otherwise, $b_0$ has no meaningful interpretation.
- $b_1$ does not have a very helpful interpretation: it is the slope of the tangent line at x = 0.
- $b_2$ indicates the up/down direction of the curve:
  - $b_2 < 0$ means the curve is concave down
  - $b_2 > 0$ means the curve is concave up
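Both statements follow from differentiating the fitted function: the slope of the tangent line at x is $b_1 + 2b_2x$, which equals $b_1$ at x = 0 and changes with x in the direction given by the sign of $b_2$. A small sketch using the uncentered estimates reported on slide 8 (the function names are ours) also shows why $b_0$ is not interpretable here: no one has a maximal oxygen uptake of 0, and the fitted value there is a negative amount of IgG.

```python
# Uncentered coefficients reported on slide 8 (rounded as printed by Minitab).
B0, B1, B2 = -1464.4, 88.31, -0.5362


def fitted_igg(x):
    """Fitted quadratic b0 + b1*x + b2*x^2 at oxygen uptake x."""
    return B0 + B1 * x + B2 * x ** 2


def tangent_slope(x):
    """Slope of the tangent line to the fitted curve at x: the derivative b1 + 2*b2*x."""
    return B1 + 2 * B2 * x


print(fitted_igg(0))      # the intercept b0, a negative amount of IgG
print(tangent_slope(0))   # equals b1, the slope at x = 0
# Because b2 < 0 the slope shrinks as x grows (the curve is concave down).
print(tangent_slope(40), tangent_slope(60))
```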

8 The regression equation is
igg = - 1464 + 88.3 oxygen - 0.536 oxygensq

Predictor     Coef   SE Coef      T      P   VIF
Constant   -1464.4     411.4  -3.56  0.001
oxygen       88.31     16.47   5.36  0.000  99.9
oxygensq   -0.5362    0.1582  -3.39  0.002  99.9

S = 106.4   R-Sq = 93.8%   R-Sq(adj) = 93.3%

Analysis of Variance
Source          DF       SS       MS       F      P
Regression       2  4602211  2301105  203.16  0.000
Residual Error  27   305818    11327
Total           29  4908029

Source    DF   Seq SS
oxygen     1  4472047
oxygensq   1   130164
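The summary statistics in this output can be recomputed from the ANOVA table alone. The sketch below does that arithmetic (variable names are ours): MSE = SSE/df, F = MSR/MSE, R-Sq = SSR/SSTO, R-Sq(adj) = 1 - MSE/(SSTO/(n-1)), and S = √MSE.

```python
import math

# Sums of squares and degrees of freedom from the ANOVA table on slide 8.
ss_regression, df_regression = 4602211, 2
ss_error, df_error = 305818, 27
ss_total, df_total = ss_regression + ss_error, df_regression + df_error

ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error                      # MSE
f_stat = ms_regression / ms_error                   # overall F statistic
r_sq = ss_regression / ss_total                     # R-Sq
r_sq_adj = 1 - ms_error / (ss_total / df_total)     # R-Sq(adj)
s = math.sqrt(ms_error)                             # S, the residual standard error

print(f"F = {f_stat:.2f}, R-Sq = {r_sq:.1%}, R-Sq(adj) = {r_sq_adj:.1%}, S = {s:.1f}")
```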

9 A multicollinearity problem Pearson correlation of oxygen and oxygensq = 0.995

10 “Center” the predictors
Mean of oxygen = 50.637

oxygen   oxcent  oxcentsq
  34.6  -16.037   257.185
  45.0   -5.637    31.776
  62.3   11.663   136.026
  58.9    8.263    68.277
  42.5   -8.137    66.211
  44.3   -6.337    40.158
  67.9   17.263   298.011
  58.5    7.863    61.827
  35.6  -15.037   226.111
  49.6   -1.037     1.075
  33.0  -17.637   311.064
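The centering step is simple arithmetic: subtract the mean, then square. This sketch recomputes the oxcent and oxcentsq columns for the 11 rows shown on the slide and compares correlations within that subset. It uses only 11 of the 30 observations, so its correlations will not exactly match the full-data values reported on slides 9 and 11; the `pearson` helper is our own.

```python
# The 11 oxygen values shown on slide 10 (the full data set has 30 rows).
oxygen = [34.6, 45.0, 62.3, 58.9, 42.5, 44.3, 67.9, 58.5, 35.6, 49.6, 33.0]
MEAN_OXYGEN = 50.637  # mean of all 30 observations, as reported on the slide

oxcent = [x - MEAN_OXYGEN for x in oxygen]      # centered predictor
oxcentsq = [c * c for c in oxcent]              # square of the centered predictor


def pearson(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


r_raw = pearson(oxygen, [x * x for x in oxygen])
r_centered = pearson(oxcent, oxcentsq)
print(f"raw: {r_raw:.3f}, centered: {r_centered:.3f}")
```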

11 Does it really work? Pearson correlation of oxcent and oxcentsq = 0.219
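These correlations connect directly to the VIF column in the Minitab output. With exactly two predictors, regressing one on the other gives $R^2 = r^2$, so each VIF is $1/(1 - r^2)$. The sketch below (function name ours) gives about 100 for r = 0.995 and about 1.05 for r = 0.219, which agree, up to the rounding of r, with the VIFs of 99.9 on slide 8 and 1.1 on slide 13.

```python
def vif_two_predictors(r):
    """Variance inflation factor when a model has exactly two predictors with correlation r."""
    # With two predictors, R^2 from regressing one on the other is r^2, so VIF = 1 / (1 - r^2).
    return 1.0 / (1.0 - r ** 2)


vif_raw = vif_two_predictors(0.995)       # oxygen and oxygensq (slide 9)
vif_centered = vif_two_predictors(0.219)  # oxcent and oxcentsq (slide 11)
print(f"{vif_raw:.1f} {vif_centered:.1f}")
```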

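12 A better quadratic polynomial regression function
$$Y_i = \beta_0^* + \beta_1^* x_i + \beta_{11}^* x_i^2 + \varepsilon_i$$
where $x_i = X_i - \bar{X}$ denotes the centered predictor, and:
- $\beta_0^*$ = mean response at the predictor mean
- $\beta_1^*$ = “linear effect coefficient”
- $\beta_{11}^*$ = “quadratic effect coefficient”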

13 The regression equation is
igg = 1632 + 34.0 oxcent - 0.536 oxcentsq

Predictor      Coef  SE Coef      T      P  VIF
Constant    1632.20    29.35  55.61  0.000
oxcent       34.000    1.689  20.13  0.000  1.1
oxcentsq    -0.5362   0.1582  -3.39  0.002  1.1

S = 106.4   R-Sq = 93.8%   R-Sq(adj) = 93.3%

Analysis of Variance
Source          DF       SS       MS       F      P
Regression       2  4602211  2301105  203.16  0.000
Residual Error  27   305818    11327
Total           29  4908029

Source    DF   Seq SS
oxcent     1  4472047
oxcentsq   1   130164

14 Interpretation of the regression coefficients
- $b_0$ is the predicted response at the predictor mean.
- $b_1$ is the estimated slope of the tangent line at the predictor mean; it is typically also close to the estimated slope in the simple linear model.
- $b_2$ indicates the up/down direction of the curve:
  - $b_2 < 0$ means the curve is concave down
  - $b_2 > 0$ means the curve is concave up

15 Estimated regression function

16 Similar estimates

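17 The relationship between the two forms of the model
Centered model: $\hat{Y} = b_0^* + b_1^* x + b_{11}^* x^2$, with $x = X - \bar{X}$
Original model: $\hat{Y} = b_0 + b_1 X + b_{11} X^2$
where:
$$b_0 = b_0^* - b_1^* \bar{X} + b_{11}^* \bar{X}^2, \qquad b_1 = b_1^* - 2 b_{11}^* \bar{X}, \qquad b_{11} = b_{11}^*$$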

18 Mean of oxygen = 50.637
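Expanding the centered fit $b_0^* + b_1^*(X - \bar{X}) + b_{11}^*(X - \bar{X})^2$ and collecting powers of X converts it back to the original form. Applying that conversion to the centered estimates from slide 13 with $\bar{X}$ = 50.637 recovers the uncentered coefficients on slide 8 up to rounding: this sketch gives $b_0 \approx$ -1464.3 and $b_1 \approx$ 88.30, versus the printed -1464.4 and 88.31. The small gaps come from the rounded coefficients, not from the method.

```python
X_BAR = 50.637  # mean of oxygen (slide 18)

# Centered-model estimates from slide 13: b0*, b1*, b11*.
b0_star, b1_star, b11_star = 1632.20, 34.000, -0.5362

# Expand b0* + b1*(X - X_BAR) + b11*(X - X_BAR)^2 and collect powers of X.
b0 = b0_star - b1_star * X_BAR + b11_star * X_BAR ** 2
b1 = b1_star - 2 * b11_star * X_BAR
b11 = b11_star

print(f"b0 = {b0:.1f}, b1 = {b1:.2f}, b11 = {b11}")
```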

19

20

21 What is the predicted IgG if maximal oxygen uptake is 90?
There is an even greater danger in extrapolation when modeling data with a polynomial function, because of changes in direction.

Predicted Values for New Observations
New Obs     Fit  SE Fit          95.0% CI            95.0% PI
1        2139.6   219.2  (1689.8, 2589.5)  (1639.6, 2639.7) XX
X denotes a row with X values away from the center
XX denotes a row with very extreme X values

Values of Predictors for New Observations
New Obs  oxcent  oxcentsq
1          39.4      1549
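The point prediction in this output is just the centered fit evaluated at oxcent = 90 - 50.637 = 39.363. The sketch below (names ours) also locates the vertex of the fitted parabola, at about 82.3; beyond that point the fitted curve turns downward, which is exactly the kind of change in direction that makes extrapolating to 90 risky. It reproduces only the point estimate; the standard error and intervals require the full data.

```python
MEAN_OXYGEN = 50.637
# Centered-model estimates from slide 13.
B0_STAR, B1_STAR, B11_STAR = 1632.20, 34.000, -0.5362


def predict_igg(oxygen):
    """Point prediction from the centered quadratic model at a raw oxygen value."""
    oxcent = oxygen - MEAN_OXYGEN
    return B0_STAR + B1_STAR * oxcent + B11_STAR * oxcent ** 2


# The fitted parabola turns over where its derivative b1* + 2*b11*(X - mean) is zero.
turning_point = MEAN_OXYGEN - B1_STAR / (2 * B11_STAR)

print(round(predict_igg(90), 1))   # close to the Fit of 2139.6 reported by Minitab
print(round(turning_point, 1))     # fitted maximum; beyond it predictions decrease
```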

22 It is possible to “overfit” the data with polynomial models.

23 It is even theoretically possible to fit the data perfectly. If you have n data points (with distinct x values), then a polynomial of order n-1 will fit the data perfectly; that is, it will pass through each data point.
** Error ** Not enough non-missing observations to fit a polynomial of this order; execution aborted
But good statistical software will keep an unsuspecting user from fitting such a model.
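One way to see why: with distinct x values, the Lagrange interpolating polynomial through the n points has degree at most n-1 and passes through every point by construction. The sketch below uses exact rational arithmetic on made-up data, so every residual is exactly zero. With n coefficients for n points, no degrees of freedom remain for error (SSE = 0 and MSE = SSE/(n - n) is undefined). The function name is ours.

```python
from fractions import Fraction


def interpolating_poly(xs, ys, x):
    """Value at x of the polynomial of degree <= n-1 through n points with distinct xs (Lagrange form)."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = Fraction(yi)
        for j, xj in enumerate(xs):
            if j != i:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total


# Hypothetical data: n = 5 points, so a degree-4 polynomial fits them exactly.
xs = [1, 2, 3, 4, 5]
ys = [3, 1, 4, 1, 5]
residuals = [y - interpolating_poly(xs, ys, x) for x, y in zip(xs, ys)]
print(residuals)  # every residual is exactly zero
```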

24 The hierarchical approach to model fitting
A widely accepted approach is to fit a higher-order model and then explore whether a lower-order (simpler) model is adequate. For example: is a first-order linear model (a “line”) adequate?
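For the IgG data, asking whether a line is adequate amounts to testing $H_0: \beta_{11} = 0$ in the quadratic model. A sketch of the partial F statistic built from slide 8's sequential sum of squares and MSE: F* = SSR(oxygensq | oxygen)/MSE ≈ 11.49 on 1 and 27 degrees of freedom. Its square root, about 3.39, matches the magnitude of the t statistic for oxygensq, as it must for a test of a single coefficient, so its P-value is the 0.002 already reported and the line is not adequate. Variable names are ours.

```python
import math

# From slide 8: sequential SS for oxygensq given oxygen, and the full-model MSE.
seq_ss_quadratic = 130164          # SSR(oxygensq | oxygen), 1 df
mse_full = 305818 / 27             # SSE / df for the quadratic model

# General linear F test of H0: beta_11 = 0 (is the line adequate?), on 1 and 27 df.
f_star = seq_ss_quadratic / mse_full
print(f"F* = {f_star:.2f}, sqrt(F*) = {math.sqrt(f_star):.2f}")
```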

25 The hierarchical approach to model fitting
But then, if a polynomial term of a given order is retained, all related lower-order terms are also retained. That is, if the quadratic term is significant, you would use this regression function:
$$Y_i = \beta_0 + \beta_1 X_i + \beta_{11} X_i^2 + \varepsilon_i$$
and not this one:
$$Y_i = \beta_0 + \beta_{11} X_i^2 + \varepsilon_i$$

26 Example
- Quality of a product (y): a score between 0 and 100
- Temperature ($x_1$): degrees Fahrenheit
- Pressure ($x_2$): pounds per square inch

27

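28 A two-predictor, second-order polynomial regression function
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_{11} X_{i1}^2 + \beta_{22} X_{i2}^2 + \beta_{12} X_{i1} X_{i2} + \varepsilon_i$$
where:
- $Y_i$ = quality
- $X_{i1}$ = temperature
- $X_{i2}$ = pressure
- $\beta_{12}$ = “interaction effect coefficient”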

29 The regression equation is
quality = - 5128 + 31.1 temp + 140 pressure - 0.133 tempsq - 1.14 presssq - 0.145 tp

Predictor        Coef   SE Coef       T      P     VIF
Constant      -5127.9     110.3  -46.49  0.000
temp           31.096     1.344   23.13  0.000  1154.5
pressure      139.747     3.140   44.50  0.000  1574.5
tempsq      -0.133389  0.006853  -19.46  0.000   973.0
presssq      -1.14422   0.02741  -41.74  0.000  1453.0
tp          -0.145500  0.009692  -15.01  0.000   304.0

S = 1.679   R-Sq = 99.3%   R-Sq(adj) = 99.1%

30 Again, some correlation
          quality    temp  pressure  tempsq  presssq
temp       -0.423
pressure    0.182   0.000
tempsq     -0.434   0.999     0.000
presssq     0.162   0.000     1.000  -0.000
tp         -0.227   0.773     0.632   0.772    0.632

Cell Contents: Pearson correlation

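31 A better two-predictor, second-order polynomial regression function
$$Y_i = \beta_0^* + \beta_1^* x_{i1} + \beta_2^* x_{i2} + \beta_{11}^* x_{i1}^2 + \beta_{22}^* x_{i2}^2 + \beta_{12}^* x_{i1} x_{i2} + \varepsilon_i$$
where:
- $Y_i$ = quality
- $x_{i1}$ = centered temperature
- $x_{i2}$ = centered pressure
- $\beta_{12}^*$ = “interaction effect coefficient”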

32 Reduced correlation
         quality   tcent   pcent  tpcent  tcentsq
tcent     -0.423
pcent      0.182   0.000
tpcent    -0.274   0.000   0.000
tcentsq   -0.355  -0.000   0.000   0.000
pcentsq   -0.762   0.000   0.000   0.000   -0.000

Cell Contents: Pearson correlation
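Why can centering drive these correlations to essentially zero? When the predictor levels are symmetric about their mean, as in a balanced factorial design, a centered predictor is uncorrelated with its own square and with the centered cross-product. The sketch below illustrates this with a hypothetical 3 × 3 design of our own choosing; it is not the data behind slides 30 and 32, whose design the slides do not show.

```python
import itertools

# Hypothetical 3 x 3 factorial design (NOT the data behind slides 30 and 32):
# three temperature levels crossed with three pressure levels.
temps = [80.0, 90.0, 100.0]
pressures = [50.0, 55.0, 60.0]
design = list(itertools.product(temps, pressures))
temp = [t for t, _ in design]
press = [p for _, p in design]


def pearson(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return sab / (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5


def centered(values):
    """Subtract the sample mean from every value."""
    m = sum(values) / len(values)
    return [v - m for v in values]


tcent, pcent = centered(temp), centered(press)
print(pearson(temp, [t * t for t in temp]))                   # raw: close to 1
print(pearson(tcent, [t * t for t in tcent]))                 # centered: essentially 0
print(pearson(tcent, [t * p for t, p in zip(tcent, pcent)]))  # centered interaction vs tcent: essentially 0
```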

33 The regression equation is
quality = 94.9 - 0.916 tcent + 0.788 pcent - 0.146 tpcent - 0.133 tcentsq - 1.14 pcentsq

Predictor       Coef   SE Coef       T      P  VIF
Constant     94.9259    0.7224  131.40  0.000
tcent       -0.91611   0.03957  -23.15  0.000  1.0
pcent        0.78778   0.07913    9.95  0.000  1.0
tpcent     -0.145500  0.009692  -15.01  0.000  1.0
tcentsq    -0.133389  0.006853  -19.46  0.000  1.0
pcentsq     -1.14422   0.02741  -41.74  0.000  1.0

S = 1.679   R-Sq = 99.3%   R-Sq(adj) = 99.1%

34

35

36 Predicted Values for New Observations
New Obs     Fit  SE Fit          95.0% CI          95.0% PI
1        94.926   0.722  (93.424, 96.428)  (91.125, 98.726)

Values of Predictors for New Observations
New Obs   tcent   pcent  tpcent  tcentsq  pcentsq
1        0.0000  0.0000  0.0000   0.0000   0.0000
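At the center of the design every centered term is 0, so the fitted value is just the estimated constant $b_0^*$ = 94.9259, which is the Fit reported here. A small sketch of the fitted surface as a function of the centered predictors, using the slide 33 coefficients (names ours); converting to raw temperature and pressure would need their means, which the slides do not list.

```python
# Centered second-order model, coefficients from slide 33.
COEF = {
    "const": 94.9259,
    "tcent": -0.91611,
    "pcent": 0.78778,
    "tpcent": -0.145500,
    "tcentsq": -0.133389,
    "pcentsq": -1.14422,
}


def predict_quality(tcent, pcent):
    """Point prediction of quality at centered temperature tcent and centered pressure pcent."""
    return (COEF["const"]
            + COEF["tcent"] * tcent
            + COEF["pcent"] * pcent
            + COEF["tpcent"] * tcent * pcent
            + COEF["tcentsq"] * tcent ** 2
            + COEF["pcentsq"] * pcent ** 2)


print(predict_quality(0.0, 0.0))  # at the center, only the constant term remains
```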

