6-4 Other Aspects of Regression

1 6-4 Other Aspects of Regression
6-4.1 Polynomial Models

2 6-4 Other Aspects of Regression
6-4.1 Polynomial Models

3 6-4 Other Aspects of Regression
6-4.1 Polynomial Models Suppose that we want to test the contribution of the second-order terms to this model. In other words, what is the value of expanding the model to include the additional terms beyond the first-order model Y = β0 + β1T + β2R + ε?

4 6-4 Other Aspects of Regression
6-4.1 Polynomial Models
Full Model: Y = β0 + β1T + β2R + β12TR + β11T² + β22R² + ε
Reduced Model: Y = β0 + β1T + β2R + ε
H0: β12 = β11 = β22 = 0
H1: at least one of these β's ≠ 0
f0 = [(SSE(RM) − SSE(FM))/3] / [SSE(FM)/(16 − 6)] = [(170.73 − 11.37)/3] / [11.37/10] = 46.72
Tabled F: f0.05;3,10 = 3.71, so we reject H0: the second-order terms contribute significantly to the model.
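The slide's arithmetic can be checked with a quick sketch (Python rather than SAS; the function name is ours):

```python
def partial_f(sse_reduced, sse_full, df_extra, df_error):
    """Partial F statistic for testing the extra terms in the full model."""
    return ((sse_reduced - sse_full) / df_extra) / (sse_full / df_error)

# SSE(RM) = 170.73, SSE(FM) = 11.37, 3 extra terms, n - p = 16 - 6 = 10
f0 = partial_f(170.73, 11.37, df_extra=3, df_error=10)
print(round(f0, 2))  # 46.72
```

Since 46.72 far exceeds the tabled F value, the conclusion to keep the second-order terms follows immediately.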

5 6-4 Other Aspects of Regression
Example 6-9

OPTIONS NOOVP NODATE NONUMBER;
DATA ex69;
   INPUT YIELD TEMP RATIO;
   TEMPC=TEMP ;
   RATIOC=RATIO ;
   TEMRATC=TEMPC*RATIOC;
   TEMPCSQ=TEMPC**2;
   RATIOCSQ=RATIOC**2;
CARDS;
PROC REG DATA=EX69;
   MODEL YIELD=TEMPC RATIOC TEMRATC TEMPCSQ RATIOCSQ/VIF;
   TITLE 'QUADRATIC REGRESSION MODEL - FULL MODEL';
   MODEL YIELD=TEMPC RATIOC/VIF;
   TITLE 'LINEAR REGRESSION MODEL - REDUCED MODEL';
RUN; QUIT;

12 6-4 Other Aspects of Regression
Residual Plots
(b) The variance of the observations may be increasing with time or with the magnitude of yi or xi. A transformation of the response y (√y, ln y, or 1/y) is often used to eliminate this problem.
(c) Plots of residuals against ŷi and xi also indicate inequality of variance.
(d) Indicates model inadequacy: higher-order terms should be added to the model, a transformation of the x-variable or the y-variable (or both) should be considered, or other regressors (e.g., a quadratic or exponential term) should be considered.
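The transformations listed in (b) can be illustrated with a small Python sketch (the response values are made up):

```python
import math

# Hypothetical positive response values whose spread grows with the mean:
y = [1.2, 3.4, 8.9, 15.6, 30.2]

sqrt_y = [math.sqrt(v) for v in y]  # sqrt(y): variance proportional to the mean
log_y  = [math.log(v) for v in y]   # ln(y): std. deviation proportional to the mean
rec_y  = [1.0 / v for v in y]       # 1/y: still more severe variance growth

# Each transform compresses the large values relative to the small ones:
print(max(y) / min(y), max(sqrt_y) / min(sqrt_y))
```

After a transformation like these, the residual plot is redrawn on the transformed scale to check whether the funnel shape has disappeared.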

13 6-4 Other Aspects of Regression
Example

OPTIONS NOOVP NODATE NONUMBER;
DATA BIDS;
   INFILE 'C:\Users\korea\Desktop\Working Folder 2017\imen214-stats\ch06\data\bids.dat';
   INPUT PRICE QUANTITY BIDS;
   LOGPRICE=LOG(PRICE);
   RECPRICE=1/PRICE;
   QUANSQ=QUANTITY**2;
ODS GRAPHICS ON;
PROC SGPLOT;
   SCATTER X=QUANTITY Y=PRICE;
   TITLE 'Scatter Plot of PRICE vs. QUANTITY';
PROC REG DATA=BIDS;
   MODEL PRICE=QUANTITY;
   TITLE 'LINEAR REGRESSION OF PRICE VS. QUANTITY';
   MODEL LOGPRICE=QUANTITY;
   TITLE 'LINEAR REGRESSION OF LOGPRICE VS. QUANTITY';
   MODEL RECPRICE=QUANTITY;
   TITLE 'LINEAR REGRESSION OF RECPRICE VS. QUANTITY';
   MODEL PRICE=QUANTITY QUANSQ;
   TITLE 'QUADRATIC REGRESSION OF PRICE VS. QUANTITY';
RUN;
ODS GRAPHICS OFF;
QUIT;

153.32 1 4 74.11 7.2 10 29.72 16.7 5 54.67 11.9 68.39 9.3 119.04 3.7 116.14 1.7 6 146.49 0.1 9 81.81 7.8 19.58 18.4 141.08 2.9 101.72 4.7 24.88 17.4 19.43 39.63 11.2 151.13 1.6 7 79.18 7.3 204.94 0.2 81.06 6.8 37.62 11.4 8 17.13 20 3 37.81 13.4 130.72 1.8 2 26.07 18.5 39.59 14.7 66.2 9.1

30 6-4 Other Aspects of Regression
6-4.2 Categorical Regressors Many problems may involve qualitative or categorical variables. The usual method for incorporating the different levels of a qualitative variable into a regression model is to use indicator variables. For example, to introduce the effect of two different operators into a regression model, we could define an indicator variable x = 0 for operator 1 and x = 1 for operator 2.

31 6-4 Other Aspects of Regression
Example 6-10 Y = gas mileage, x1 = engine displacement, x2 = horsepower,
x3 = 0 if automatic transmission, 1 if manual transmission
Y = β0 + β1x1 + β2x2 + β3x3 + ε
If automatic (x3 = 0): Y = β0 + β1x1 + β2x2 + ε
If manual (x3 = 1): Y = β0 + β1x1 + β2x2 + β3 + ε = (β0 + β3) + β1x1 + β2x2 + ε
This may be unreasonable, because the model forces the effect of transmission type to be the same at every value of x1 and x2; no interaction of x3 with x1 or x2 is allowed.
Interaction model: Y = β0 + β1x1 + β2x2 + β3x3 + β13x1x3 + β23x2x3 + ε
If manual (x3 = 1): Y = β0 + β1x1 + β2x2 + β3 + β13x1 + β23x2 + ε = (β0 + β3) + (β1 + β13)x1 + (β2 + β23)x2 + ε
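A numerical check of this derivation (Python sketch with made-up coefficients; none of the numbers come from the text): fitting the interaction model to noise-free data recovers the separate intercepts and slopes exactly.

```python
import numpy as np

# Hypothetical coefficients: manual cars (x3 = 1) get a different intercept
# AND different slopes through the interaction terms.
b0, b1, b2, b3, b13, b23 = 40.0, -0.01, -0.05, 5.0, 0.002, 0.01

rng = np.random.default_rng(0)
x1 = rng.uniform(100, 400, 40)              # displacement
x2 = rng.uniform(60, 250, 40)               # horsepower
x3 = rng.integers(0, 2, 40).astype(float)   # 0 = automatic, 1 = manual
y = b0 + b1*x1 + b2*x2 + b3*x3 + b13*x1*x3 + b23*x2*x3

X = np.column_stack([np.ones_like(x1), x1, x2, x3, x1*x3, x2*x3])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# With no noise, least squares recovers the coefficients exactly, so the
# manual line has intercept b0 + b3 and slopes b1 + b13 and b2 + b23,
# just as derived on the slide.
```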

32 6-4 Other Aspects of Regression
Dummy Variables Many times a qualitative variable seems to be needed in a regression model. This can be accomplished by creating dummy variables or indicator variables. If a qualitative variable has r levels, you will need r−1 dummy variables. Notice that in ANOVA, a treatment with r levels had r−1 degrees of freedom. The ith dummy variable is defined as
Xi = 1 if the observation is in the ith level of the qualitative variable, 0 otherwise, for i = 1, 2, ⋯, r−1.
This can be done automatically in PROC GLM by using the CLASS statement, as we did in ANOVA. Any dummy variables defined with respect to a qualitative variable must be treated as a group. Individual t-tests are not meaningful; partial F-tests must be performed on the group of dummy variables.
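A minimal sketch of the r−1 coding (Python; the level names are hypothetical):

```python
def dummies(levels, value):
    """r-1 indicator variables for a qualitative variable; the last
    level in `levels` is the baseline and codes as all zeros."""
    return [1 if value == lev else 0 for lev in levels[:-1]]

levels = ['EAST', 'CENTRAL', 'WEST']   # r = 3 levels -> 2 dummies
print(dummies(levels, 'EAST'))     # [1, 0]
print(dummies(levels, 'CENTRAL'))  # [0, 1]
print(dummies(levels, 'WEST'))     # [0, 0]  (baseline level)
```

The baseline level is absorbed into the intercept, which is why only r−1 columns are needed, mirroring the r−1 degrees of freedom in ANOVA.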

33 6-4 Other Aspects of Regression
Example 6-11

OPTIONS NOOVP NODATE NONUMBER;
DATA EX611;
   INPUT FORM SCENT COLOR RESIDUE REGION QUALITY;
   IF REGION=1 THEN REGION1=0; ELSE REGION1=1;
   /* If REGION=1 then REGION1=0 (East); if REGION=2 then REGION1=1 (West) */
   FR=FORM*REGION1;
   RR=RESIDUE*REGION1;
CARDS;
PROC REG DATA=EX611;
   MODEL QUALITY=FORM RESIDUE REGION1;
   TITLE 'MODEL WITH DUMMY VARIABLE';
   MODEL QUALITY=FORM RESIDUE REGION1 FR RR;
   TITLE 'INTERACTION MODEL WITH DUMMY VARIABLE';
RUN; QUIT;

40 6-3 Multiple Regression Example

OPTIONS NOOVP NODATE NONUMBER;
DATA appraise;
   INPUT price units age size parking area cond$;
   IF COND='F' THEN COND1=1; ELSE COND1=0;
   IF COND='G' THEN COND2=1; ELSE COND2=0;
CARDS;
F G G E G G G G G G G F E G G G F E G F G E F E
ods graphics on;
PROC REG DATA=APPRAISE;
   MODEL PRICE=UNITS AGE AREA COND1 COND2/R;
   TITLE 'REDUCED MODEL WITH DUMMY VARIABLE';
RUN;
ods graphics off;
QUIT;

41 6-3 Multiple Regression

Model          R²      Adj R²   MSE
Full Model     0.9801  0.9746   34123
Reduced Model  0.9771  0.9737   34721
With dummy     0.9860  0.9821   28673

46 Analysis of Covariance (ANCOVA)
6-4 Other Aspects of Regression Analysis of Covariance
Suppose we have the following setup: r treatments, each with n paired observations (X, Y), giving (X_{i,j}, Y_{i,j}) for treatment i = 1, ⋯, r and replicate j = 1, ⋯, n. Suppose X and Y are linearly related. We are interested in comparing the means of Y at the different levels of the treatment. Suppose a plot of the data looks like the scatter plot on the slide, with each treatment's points falling along its own line.
Analysis of covariance (ANCOVA) is a form of analysis that combines ANOVA and regression. As in ANOVA, it is used to test whether the r independent population means (treatment effects) differ, but it is especially useful when the response Y is believed to be functionally related to another variable (here only a first-order, linear relationship is considered).

47 6-4 Other Aspects of Regression
Why Use Covariates? Concomitant variables or covariates are used to adjust for factors that influence the Y measurements. In randomized block designs, we did the same thing, but there we could control the value of the block variable. Now we assume we can measure the variable, but not control it. The plot on the previous page demonstrates why we need covariates in some situations. If the covariate (X) were ignored, we would most likely conclude that treatment level 3 resulted in a larger mean than levels 1 and 4 but not different from level 2. If the linear relation is extended, we see that the value of Y in level 3 could very well be less than that of 1, nearly equal to that of 4, and surely less than that of 2. One assumption we need, equivalent to the no-interaction assumption in two-way ANOVA, is that the slopes of the linear relationship between X and Y are the same in each treatment level.

48 Checking for Equal Slopes
6-4 Other Aspects of Regression Checking for Equal Slopes
The model we fit first:
Treatment i (i = 1, ⋯, r−1): Y_ij = β0 + αi + βXj + (αβ)i Xj + ε_ij, with Y-intercept β0 + αi and slope β + (αβ)i
Treatment r: Y_rj = β0 + βXj + ε_rj, with Y-intercept β0 and slope β
The test of equal slopes is
H0: (αβ)1 = (αβ)2 = ⋯ = (αβ)r−1 = 0
H1: at least one not zero
If we fail to reject this, we return to the model without the interaction term and test
H0: α1 = α2 = ⋯ = αr−1 = 0
H1: not all zero
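The parallelism test can be sketched numerically (Python with numpy on synthetic data whose true slopes are equal; this is not the glue data): fit the separate-slopes and common-slope models and form the partial F statistic.

```python
import numpy as np

rng = np.random.default_rng(1)
r, n = 3, 10
x = rng.uniform(0, 10, r * n)
trt = np.repeat(np.arange(r), n)
# Equal true slope 1.5 in every treatment; only intercepts differ:
y = 5.0 + np.array([2.0, -1.0, 0.0])[trt] + 1.5 * x + rng.normal(0, 0.3, r * n)

def fit_sse(X):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(((y - X @ beta) ** 2).sum())

ones = np.ones_like(x)
D = np.column_stack([(trt == i).astype(float) for i in range(r - 1)])  # dummies
reduced = np.column_stack([ones, D, x])              # common slope
full = np.column_stack([reduced, D * x[:, None]])    # treatment-specific slopes

df_extra, df_err = r - 1, r * n - full.shape[1]
f0 = ((fit_sse(reduced) - fit_sse(full)) / df_extra) / (fit_sse(full) / df_err)
# A small f0 (compared with F(0.05; r-1, df_err)) means the slopes are
# plausibly equal, so we would drop the interaction and test the intercepts.
```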

49 6-4 Other Aspects of Regression
EXAMPLE Four different formulations of an industrial glue are being tested. The tensile strength of the glue is also related to its thickness. Five observations on strength (Y) and thickness (X, in 0.01 inches) are obtained for each formulation. The data are shown in the following table.
Glue Formulation 1 2 3 4 y x 46.5 45.9 49.8 46.1 44.3 13 14 12 48.7 49.0 50.1 48.5 45.2 10 11 46.3 47.1 48.9 48.2 50.3 15 44.7 43.0 51.0 48.1 48.6 16

50 6-4 Other Aspects of Regression
Example

OPTIONS NOOVP NODATE NONUMBER;
DATA GLUE;
   INPUT FORMULA STRENGTH THICK;
CARDS;
ODS GRAPHICS ON;
PROC GLM DATA=GLUE;
   CLASS FORMULA;
   MODEL STRENGTH=FORMULA THICK FORMULA*THICK;
   TITLE 'ANALYSIS OF COVARIANCE WITH INTERACTION'; /* TEST FOR LINEARITY */
   MODEL STRENGTH=FORMULA THICK/SOLUTION;
   LSMEANS FORMULA/PDIFF STDERR;
   TITLE 'ANALYSIS OF COVARIANCE WITHOUT INTERACTION'; /* TEST FOR PARALLELISM */
RUN; QUIT;

SOLUTION produces a solution to the normal equations (parameter estimates). PROC GLM displays a solution by default when your model involves no classification variables, so you need this option only if you want to see the solution for models with classification effects.
PDIFF requests that p-values for differences of the LS-means be produced.
STDERR produces the standard error of the LS-means and the probability level for the hypothesis H0: LS-mean = 0.

51 6-4 Other Aspects of Regression
PROC GLM DATA=GLUE;
   CLASS FORMULA;
   MODEL STRENGTH=FORMULA THICK FORMULA*THICK;
   TITLE 'ANALYSIS OF COVARIANCE WITH INTERACTION'; /* TEST FOR LINEARITY */
Are the four formulation lines parallel, i.e., is there no FORMULA-by-THICK interaction?

53 6-4 Other Aspects of Regression
PROC GLM DATA=GLUE;
   CLASS FORMULA;
   MODEL STRENGTH=FORMULA THICK/SOLUTION;
   LSMEANS FORMULA/PDIFF STDERR;
   TITLE 'ANALYSIS OF COVARIANCE WITHOUT INTERACTION'; /* TEST FOR PARALLELISM */
Do the parallel lines determined above share the same intercept? Is the common slope of those parallel lines zero?

58 6-4 Other Aspects of Regression
6-4.3 Variable Selection Procedures Best Subsets Regressions
Selection criteria: R², MSE, and Cp, where
Cp = SSE(p)/σ̂²(FM) − n + 2p
and σ̂²(FM) is the mean squared error of the full model.
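As a sketch of the Cp formula (Python; the SSE, MSE, n, and p values are invented for illustration):

```python
def mallows_cp(sse_p, mse_full, n, p):
    """Mallows' Cp for a p-parameter subset model (p counts the intercept);
    mse_full is sigma-hat^2 from the full model."""
    return sse_p / mse_full - n + 2 * p

# A subset whose Cp is not much larger than p fits about as well as the
# full model. Hypothetical numbers: SSE(p) = 250, MSE(FM) = 10, n = 30, p = 3.
print(mallows_cp(sse_p=250.0, mse_full=10.0, n=30, p=3))  # 1.0
```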

59 6-4 Other Aspects of Regression
6-4.3 Variable Selection Procedures Backward Elimination
Start with all regressors in the model; at each step the regressor with the smallest absolute t-value is eliminated first. With Minitab's cut-off α = 0.10, the procedure retains form, residue, and region.

60 6-4 Other Aspects of Regression
6-4.3 Variable Selection Procedures Forward Selection
Start with no regressors in the model; at each step the regressor with the largest absolute t-value is added first. With Minitab's cut-off α = 0.25, the procedure selects form, residue, region, and scent.

61 6-4 Other Aspects of Regression
6-4.3 Variable Selection Procedures Stepwise Regression
Begins with a forward step, then considers backward elimination; here t-to-enter = t-to-remove. With Minitab's cut-off α = 0.15, the procedure selects form, residue, and region.

62 6-4 Other Aspects of Regression
Example

OPTIONS NODATE NOOVP NONUMBER;
DATA SALES;
   INFILE 'C:\Users\korea\Desktop\Working Folder 2017\imen214-stats\ch06\data\sales.dat';
   INPUT SALES TIME POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING;
PROC CORR DATA=SALES;
   VAR SALES;
   WITH TIME POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING;
   TITLE 'CORRELATIONS OF DEPENDENT WITH INDEPENDENTS';
PROC CORR DATA=SALES;
   VAR TIME POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING;
   TITLE 'CORRELATIONS BETWEEN INDEPENDENT VARIABLES';
PROC REG DATA=SALES;
   MODEL SALES=TIME POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING/VIF R;
   TITLE 'REGRESSION MODEL WITH ALL VARIABLES';
PROC RSQUARE DATA=SALES CP;
   MODEL SALES=TIME POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING/ADJRSQ RMSE SSE SELECT=10;
   TITLE 'ALL POSSIBLE REGRESSIONS';
PROC STEPWISE DATA=SALES;
   MODEL SALES=TIME POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING/FORWARD;
   MODEL SALES=TIME POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING/BACKWARD;
   TITLE 'STEPWISE REGRESSION USING BACKWARD ELIMINATION';
   MODEL SALES=TIME POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING;
   TITLE 'STEPWISE REGRESSION USING THE STEPWISE TECHNIQUE';
   MODEL SALES=POTENT ADVERT SHARE ACCOUNTS/R;
   MODEL SALES=POTENT ADVERT SHARE CHANGE ACCOUNTS/R;
   MODEL SALES=TIME POTENT ADVERT SHARE CHANGE/R;
   MODEL SALES=TIME POTENT ADVERT SHARE ACCOUNTS/R;
   MODEL SALES=TIME POTENT ADVERT SHARE CHANGE WORKLOAD/R;
RUN; QUIT;

64 6-4 Other Aspects of Regression
Example

71 All Possible Regressions
6-4 Other Aspects of Regression All Possible Regressions
This is the brute-force method of modeling. It is feasible if the number of independent variables is small (less than 10 or so) and the sample size is not too large. Some of the common quantities to look at are:
1) R-square should be large and should increase adequately when an additional variable is added.
2) Adj R-square should not be much less than R-square. It should show an increase if a variable is added.
3) Mallows Cp should be approximately the number of parameters in the model (including the y-intercept). This is a good measure for narrowing down the possible models quickly; then use 1) and 2) to pick the final models.
4) The model should make sense.
Note: Many of the better methods of model selection are too time-consuming to use on all possible regressions. A number of good models can be chosen first, and the better methods then applied to those.
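The brute-force idea can be sketched in a few lines (Python with numpy on synthetic data, not the sales data): enumerate every nonempty subset of the regressors, fit it, and record R².

```python
from itertools import combinations
import numpy as np

rng = np.random.default_rng(2)
n, k = 40, 4
X = rng.normal(size=(n, k))
# Only columns 0 and 2 actually drive the hypothetical response:
y = 3 + 2 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(0, 0.5, n)
sst = float(((y - y.mean()) ** 2).sum())

results = {}
for m in range(1, k + 1):
    for cols in combinations(range(k), m):
        Xs = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        sse = float(((y - Xs @ beta) ** 2).sum())
        results[cols] = 1 - sse / sst      # R^2 for this subset

print(len(results))  # 15 = 2^4 - 1 candidate models
```

With k regressors there are 2^k − 1 candidate models, which is why the method is only feasible for small k.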

76 6-4 Other Aspects of Regression
Stepwise Regression
Forward Selection: Begins with no variables in the model. Fits a simple linear model for each X and adds the most significant one (if its p-value is below the stated entry level). Then fits all models containing the already-added variables plus each remaining variable, and again adds the most significant (if its p-value is below the stated entry level). This process continues until no variables can be added.
Backward Elimination: The model with all variables is fit. The least significant variable is removed (if its p-value is greater than the specified limit) and the model is refit without it. This process continues until no variables can be removed.
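A forward-selection loop can be sketched as follows (Python with numpy on synthetic data; an F-to-enter threshold stands in for the p-value cut-off, and the threshold value is our choice):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 50, 5
X = rng.normal(size=(n, k))
# Only regressors 1 and 3 drive the hypothetical response:
y = 1 + 3 * X[:, 1] + 2 * X[:, 3] + rng.normal(0, 1, n)

def sse(cols):
    Xs = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    return float(((y - Xs @ beta) ** 2).sum())

selected, f_enter = [], 4.0          # rough F-to-enter threshold
while len(selected) < k:
    rest = [c for c in range(k) if c not in selected]
    best = min(rest, key=lambda c: sse(selected + [c]))  # biggest SSE drop
    df_err = n - len(selected) - 2
    f0 = (sse(selected) - sse(selected + [best])) / (sse(selected + [best]) / df_err)
    if f0 < f_enter:                 # candidate not significant enough
        break
    selected.append(best)

print(sorted(selected))              # the strong regressors enter first
```

Backward elimination runs the same loop in reverse: start with all columns and drop the one whose removal increases SSE the least, while that increase stays insignificant.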

77 6-4 Other Aspects of Regression
Stepwise Regression Stepwise Technique: This technique is a variation on the forward selection technique. After a variable is added, the least significant is also removed if it has a p-value greater than the specified limit. This accounts for multicollinearity to some degree. Typically you do not do a stepwise procedure if you do an all possible regressions and vice versa. Stepwise procedures are more economical than all possible regressions in large data sets. There is no guarantee that the stepwise procedures will end up with the same model or the โ€œbestโ€ model.

98 6-4 Other Aspects of Regression
Press Statistic The main purpose of many regression analyses is to predict Y for a future set of X's. The problem is that we have only the present Y's and X's to build a model, but we would like to evaluate the model by how well it estimates Y's for new X's. The PRESS statistic tries to overcome this problem. It is similar to DFFITS in that you remove one observation at a time: the parameters are estimated without observation i, and Ŷ is calculated for the X's of the removed observation. Once these predictions (call them Ŷi*) are calculated for each observation, the PRESS statistic is
PRESS = Σ_{i=1}^{n} (Yi − Ŷi*)²
Notice that this is very similar to SSE; computed naively, however, it is very computation-intensive. The PRESS statistic is obtained in SAS by using the R option on the MODEL statement.
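A sketch of the computation (Python with numpy, synthetic data): for least squares the leave-one-out residual equals e_i/(1 − h_ii), so PRESS needs only a single fit; the literal remove-one-at-a-time loop gives the same value.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 25
x = rng.uniform(0, 10, n)
y = 2 + 0.5 * x + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T        # hat matrix
e = y - H @ y                               # ordinary residuals
press = float(((e / (1 - np.diag(H))) ** 2).sum())

# Check against the literal leave-one-out computation:
loo = 0.0
for i in range(n):
    mask = np.arange(n) != i
    beta, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
    loo += float((y[i] - X[i] @ beta) ** 2)
# press and loo agree to rounding error
```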

99 6-4 Other Aspects of Regression
Validation Data Split Split data into a fitting portion and a validation portion. This should be done randomly. Perform the model fitting routine as discussed earlier using data in the fitting portion only. For each viable model compute the SSE using the observations in the validation data portion. The best model is the one that minimizes the SSE. Recalculate the chosen model using the entire data set. Notice this procedure requires a large enough data set to enable you to split a validation portion off and still have adequate data to evaluate models. The process is tedious in SAS, requiring multiple runs or fancy programming.
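The split-fit-validate steps above can be sketched as (Python with numpy; synthetic data and an arbitrary 2/3 split):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(0, 1, n)

# 1) Random split into a fitting portion and a validation portion:
idx = rng.permutation(n)
fit_idx, val_idx = idx[:40], idx[40:]

# 2) Fit the candidate model on the fitting portion only:
beta, *_ = np.linalg.lstsq(X[fit_idx], y[fit_idx], rcond=None)

# 3) Score it by SSE on the held-out validation portion:
val_sse = float(((y[val_idx] - X[val_idx] @ beta) ** 2).sum())
# Compare val_sse across candidate models, then refit the winner on all n rows.
```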

125 6-4 Other Aspects of Regression
Model                                       R²      Adj R²   MSE   PRESS
potent advert share accounts                0.9004  0.8805
potent advert share change accounts         0.9119  0.8888
time potent advert share change             0.9108  0.8873
time potent advert share accounts           0.9064  0.8817
time potent advert share change workload    0.9109  0.8812
No single model clearly dominates; confidence intervals for prediction, or parsimony, might help decide on the best model.
