
1 Don't Be Loopy: Re-Sampling and Simulation the SAS® Way David L. Cassell Design Pathways Corvallis, OR

2 Introduction
    Bootstrapping
    Jackknifing
    Cross-validation
    Simulations
    Monte Carlo
    … and on and on…

3 First, the BAD WAY
    The typical bootstrap code – a huge macro loop
    Slow
    Awkward
    Very complex code
    Log-filling
    Output-clogging
    Did I mention ‘slow’?

4 The typical BAD bootstrap code – a huge macro loop
    %macro bootie ( input=, reps= );
    %do i = 1 %to &REPS ;
        %* steps to generate one data set;
        %* the proc to do the analysis;
        %* some way of appending the new results;
    %end;
    %* a proc to compute the bootstrap estimates;
    %mend;

5 Interlude – What is a bootstrap?
    Types of Re-sampling:
    Random draws
    Designed subsets
    Exchange labels

6 Interlude – What is a bootstrap?
    Want to approximate sampling distribution
    Simple: SRS with replacement from original sample
    Non-parametric (mostly)
    Want: bias, std error, CI, or …
    Assumptions: exchangeability, …

7 Interlude – What is a bootstrap?
    We’ll start with the simple bootstrap
    Get a URS sample (unrestricted random sampling, i.e. with replacement) of size N
    Compute your statistic
    Repeat B=1000 or 10,000 or … times
    Look at the behavior of your B values

8 Interlude – What is a bootstrap?
    Warning: do not forget exchangeability!
    The simple / naïve bootstrap doesn’t work right on:
    Time series data
    Repeated measures data
    Survey sample data
    Data with analytic weights
    ........

9 Interlude – What is a bootstrap?
    A common approach is the bootstrap percentile interval:
    Take your B values from before
    Pull the 2.5th and 97.5th percentiles to get a 95% percentile interval as your CI

10 The typical BAD bootstrap code – a huge macro loop
    %macro bootie ( input=, reps= );
    %do i = 1 %to &REPS ;
        %* steps to generate one data set;
        %* the proc to do the analysis;
        %* some way of appending the new results;
    %end;
    %* a proc to compute the bootstrap estimates;
    %mend;

11 A Better Bootstrap
    1. Generate ALL of the bootstrap samples as one data set
    2. Use the same proc as before, but use by-processing
    3. Use the same computations to get the bootstrap estimates

12 A Better Bootstrap
    proc surveyselect data=YourData
         out=outboot
         seed=30459584
         method=urs
         samprate=1
         outhits
         rep=1000;
    run;

13 A Better Bootstrap
    proc univariate data=outboot;
        var x;
        by Replicate;
        output out=out1 q1=q1 median=med q3=q3;
    run;
    data out2;
        set out1;
        trimean = (q1 + 2*med + q3) / 4;
    run;

14 A Better Bootstrap
    proc univariate data=out2;
        var trimean;
        output out=final
               pctlpts=2.5, 97.5
               pctlpre=ci;
    run;
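Slide 6 lists bias and standard error among the things a bootstrap can estimate. The sketch below shows one way the same B replicate values might be summarized for those two quantities. It is a sketch under assumptions, not the author's code: the generated YourData, the data set names orig, bootstats and bootsummary, and the variables theta_hat, boot_mean, boot_se and bias are all illustrative.

```sas
/* Hypothetical input: 50 values standing in for your own data. */
data YourData;
   call streaminit(2024);
   do i = 1 to 50;
      x = rand('normal', 10, 2);
      output;
   end;
   drop i;
run;

/* Trimean of the original sample. */
proc univariate data=YourData noprint;
   var x;
   output out=orig q1=q1 median=med q3=q3;
run;

data orig;
   set orig;
   theta_hat = (q1 + 2*med + q3) / 4;
   keep theta_hat;
run;

/* All bootstrap samples in one data set, as on slide 12. */
proc surveyselect data=YourData out=outboot noprint
     seed=30459584 method=urs samprate=1 outhits rep=1000;
run;

/* One trimean per replicate, as on slide 13. */
proc univariate data=outboot noprint;
   var x;
   by Replicate;
   output out=out1 q1=q1 median=med q3=q3;
run;

data out2;
   set out1;
   trimean = (q1 + 2*med + q3) / 4;
run;

/* Mean and standard deviation of the B bootstrap trimeans. */
proc means data=out2 noprint;
   var trimean;
   output out=bootstats mean=boot_mean std=boot_se;
run;

/* Bias estimate: mean of the bootstrap values minus the original estimate. */
data bootsummary;
   merge orig bootstats(keep=boot_mean boot_se);
   bias = boot_mean - theta_hat;
run;
```

The standard deviation of the replicate values serves as the bootstrap standard error, so the whole summary is still a fixed handful of steps regardless of B.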

15 A Better Bootstrap – More
    sasfile YourData load;
    proc surveyselect data=YourData out=outboot
         seed=30459584
         method=urs samprate=1 outhits
         rep=1000;
    run;
    sasfile YourData close;

16 A Better Bootstrap – More
    ods listing close;
    proc univariate data=outboot;
        var x;
        by Replicate;
        output out=out1 q1=q1 median=med q3=q3;
    run;
    ods listing;

17 A Better Bootstrap – ODS OUTPUT
    ods output Modes=modal;
    proc univariate data=outboot modes;
        var YourVariable;
        by Replicate;
    run;
    ods output close;

18 Case Resampling
    Simple bootstrap, as before
    Apply to: PROC REG, PROC LOGISTIC, ….
    The approach can be criticized on several grounds

19 Case Resampling
    data test;
        x=1; y=45; output;
        do x = 2 to 29;
            y = 3*x + 6*rannor(1234);
            output;
        end;
        x=30; y=45; output;
    run;

20 Case Resampling

21 Case Resampling
    ods listing close;
    /* temp1: presumably a copy of the TEST data set from slide 19 */
    proc surveyselect data=temp1 out=boot1
         seed=38474
         method=urs samprate=1 outhits rep=1000;
    run;
    proc reg data=boot1 outest=est1(drop=_:);
        model y=x;
        by replicate;
    run;
    ods listing;

22 Case Resampling
    /* in est1, the variable x holds the estimated slope for each replicate */
    proc univariate data=est1;
        var x;
        output out=final pctlpts=2.5, 97.5
               pctlpre=ci;
    run;

23 Case Resampling
    proc robustreg data=temp1 method=MM;
        model y=x;
    run;

24 Case Resampling
    PROC REG                       (1.74, 2.80)
    bootstrap (case resampling)    (1.65, 2.90)
    PROC ROBUSTREG                 (2.39, 3.13)

25 Resampling residuals
    Fit the model
    Bootstrap sample for the residuals
    Add the randomly resampled e to Y-hat
    Fit the model for each of the B reps
    Compute bootstrap estimates

26 Resampling residuals
    1 perform the regression, get Y-hat and e
    2 split the data
    3 copy the FIT data set repeatedly
    4 URS sample of residuals for each replicate
    5 merge residuals with records
    6 fit the model on each replicate
    7 compute bootstrap estimates

27 Resampling residuals
    proc reg data=test;
        model y=x;
        output out=out1 p=yhat r=res;
    run;

28 Resampling residuals
    data fit(keep=yhat x order) resid(keep=res);
        set out1;
        order+1;
    run;

29 Resampling residuals
    /* SRS at samprate=1 selects every record, so this copies FIT 1000 times */
    proc surveyselect data=fit out=outfit
         method=srs samprate=1 rep=1000;
    run;

30 Resampling residuals
    data outres2;
        do replicate = 1 to 1000;
            do order = 1 to numrecs;
                /* pick one residual at random, with replacement */
                p = ceil( numrecs * ranuni(394747373) );
                set resid nobs=numrecs point=p;
                output;
            end;
        end;   /* closes the replicate loop */
        stop;
    run;

31 Resampling residuals
    data prepped;
        merge outfit outres2;
        by replicate order;
        new_y = yhat + res;
    run;

32 Resampling residuals
    proc reg data=prepped
         outest=est1( drop=_: );
        model new_y = x;
        by replicate;
    run;

33 Resampling residuals
    proc univariate data=est1;
        var x;
        output out=final pctlpts=2.5, 97.5
               pctlpre=ci;
    run;

34 “The” Bootstrap?
    Simple bootstrap
    Residual resampling
    Parametric bootstrap
    Smooth bootstrap
    Wild bootstrap
    Double bootstrap
    Various ‘adjusted’ bootstraps
    ..........

35 The Jackknife
    Non-parametric
    N systematic samples of size N-1
    Less general than the bootstrap
    Easier to apply to complex sampling schemes

36 The Jackknife
    data outb;
        do replicate = 1 to numrecs;
            do rec = 1 to numrecs;
                set test nobs=numrecs point=rec;
                /* replicate k keeps every record except record k */
                if replicate ^= rec then output;
            end;
        end;   /* closes the replicate loop */
        stop;
    run;

37 The Jackknife
    ods listing close;
    proc univariate data=outb;
        var y;
        by replicate;
        output out=outall kurtosis=curt;
    run;
    ods listing;

38 The Jackknife
    proc univariate data=outall;
        var curt;
        output out=final mean=jmean std=jstd;
    run;
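The mean and standard deviation of the leave-one-out values are raw ingredients rather than final jackknife estimates. Under the standard jackknife formulas, the standard error is sqrt((n-1)/n * sum of squared deviations), which works out to (n-1)/sqrt(n) times the ordinary standard deviation, and the bias-corrected estimate is n*theta_hat - (n-1)*jmean. The sketch below applies those formulas to the kurtosis example; the data sets full and jack and the variables theta_hat, jse and jack_est are illustrative names, not part of the original slides.

```sas
/* The TEST data set from slide 19. */
data test;
   x=1; y=45; output;
   do x = 2 to 29;
      y = 3*x + 6*rannor(1234);
      output;
   end;
   x=30; y=45; output;
run;

/* All N leave-one-out samples in one data set, as on slide 36. */
data outb;
   do replicate = 1 to numrecs;
      do rec = 1 to numrecs;
         set test nobs=numrecs point=rec;
         if replicate ^= rec then output;
      end;
   end;
   stop;
run;

/* Kurtosis of each leave-one-out sample. */
proc univariate data=outb noprint;
   var y;
   by replicate;
   output out=outall kurtosis=curt;
run;

/* Count, mean and standard deviation of the N leave-one-out values. */
proc univariate data=outall noprint;
   var curt;
   output out=final n=n mean=jmean std=jstd;
run;

/* Kurtosis of the full sample. */
proc univariate data=test noprint;
   var y;
   output out=full kurtosis=theta_hat;
run;

data jack;
   merge final full;
   /* jstd divides by n-1, so rescale to the jackknife standard error */
   jse = (n - 1) / sqrt(n) * jstd;
   /* bias-corrected jackknife estimate */
   jack_est = n * theta_hat - (n - 1) * jmean;
run;
```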

39 Randomization Tests
    Resampling plan
    Re-label the data points randomly
    Compare against original
    Random subset of full permutation test
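The slides give no code for a randomization test, so here is a minimal sketch of how the same "build every replicate once, then use by-processing" pattern might apply. Everything in it is an assumption made for illustration: the two-group data set groups, the group size of 20, the macro variables REPS and NA, and the choice of the absolute difference in means as the test statistic. Replicate 0 keeps the original labels, so the observed statistic is computed by the same steps as the relabeled ones.

```sas
%let REPS = 999;   /* number of random relabelings (assumed) */
%let NA   = 20;    /* size of group A in the toy data (assumed) */

/* Hypothetical two-group data. */
data groups;
   call streaminit(777);
   do id = 1 to 40;
      if id <= &NA then grp = 'A'; else grp = 'B';
      y = rand('normal', 10 + 1.5*(grp = 'B'), 3);
      output;
   end;
run;

/* Copy every record into each replicate with a random sort key. */
data shuffled;
   if _n_ = 1 then call streaminit(4567);
   set groups;
   do replicate = 0 to &REPS;
      /* replicate 0 keeps the original order */
      if replicate = 0 then key = id; else key = rand('uniform');
      output;
   end;
run;

proc sort data=shuffled;
   by replicate key;
run;

/* Re-label: the first NA records in each shuffled replicate become group A. */
data relabeled;
   set shuffled;
   by replicate;
   if first.replicate then pos = 0;
   pos + 1;
   if pos <= &NA then newgrp = 'A'; else newgrp = 'B';
run;

proc means data=relabeled noprint nway;
   class replicate newgrp;
   var y;
   output out=gm mean=mean_y;
run;

data diffs;
   merge gm(where=(newgrp='A') rename=(mean_y=mean_a))
         gm(where=(newgrp='B') rename=(mean_y=mean_b));
   by replicate;
   diff = mean_b - mean_a;
   keep replicate diff;
run;

/* Two-sided p-value; replicate 0 counts itself in numerator and denominator. */
data pvalue;
   retain obs_diff;
   set diffs end=last;
   if replicate = 0 then obs_diff = diff;
   if abs(diff) >= abs(obs_diff) then extreme + 1;
   if last then do;
      p_value = extreme / (&REPS + 1);
      output;
   end;
   keep obs_diff p_value;
run;
```

Because the relabelings are drawn at random, this is the "random subset of the full permutation test" that the slide describes, not an exact test.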

40 Cross-Validation
    Another type of resampling plan
    K replicate samples
    Each sample uses (K-1)/K to model and 1/K for testing

41 Cross-Validation
    LOOCV – Leave-One-Out Cross-Validation
    K-fold Cross-Validation
    Random K-fold Cross-Validation
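Only the random K-fold variant gets code in the slides. As a rough sketch, leave-one-out cross-validation could reuse the jackknife data step from slide 36: instead of dropping the held-out record, blank its response so PROC REG ignores it when fitting but still scores it. The data set names loo, pred, err and cv and the variables new_y, sq, mse, mae and rmse are assumptions chosen for this illustration.

```sas
/* The TEST data set from slide 19. */
data test;
   x=1; y=45; output;
   do x = 2 to 29;
      y = 3*x + 6*rannor(1234);
      output;
   end;
   x=30; y=45; output;
run;

/* N replicates; replicate k hides the response of record k. */
data loo;
   do replicate = 1 to numrecs;
      do rec = 1 to numrecs;
         set test nobs=numrecs point=rec;
         if replicate = rec then new_y = .;
         else new_y = y;
         output;
      end;
   end;
   stop;
run;

/* Fit on the N-1 visible records, keep the prediction for the hidden one. */
proc reg data=loo noprint;
   model new_y = x;
   by replicate;
   output out=pred(where=(new_y=.)) p=yhat;
run;
quit;

data err;
   set pred;
   d = y - yhat;
   sq = d*d;
   absd = abs(d);
run;

proc means data=err noprint;
   var sq absd;
   output out=cv mean(sq)=mse mean(absd)=mae;
run;

data cv;
   set cv;
   rmse = sqrt(mse);
run;
```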

42 Random K-Fold Cross-Validation
    %let K=10;
    %let rate= %sysevalf( (&K-1) / &K );
    proc surveyselect data=temp1 out=xv
         seed=495857
         samprate=&RATE outall rep=&K ;
    run;
    data xv;
        set xv;
        if selected then new_y=y;
    run;

43 Random K-Fold Cross-Validation
    proc reg data=xv;
        model new_y=x;
        by replicate;
        output out=out1(where=(new_y=.)) p=yhat;
    run;

44 Random K-Fold Cross-Validation
    data out2;
        set out1;
        d=y-yhat;
        absd=abs(d);
    run;
    /* std(d) is centered on the mean error, so it matches the RMSE only when that mean is near zero */
    proc summary data=out2;
        var d absd;
        output out=out3 std(d)=rmse mean(absd)=mae;
    run;

45 Monte Carlo Simulations
    Sample from theoretical distributions
    Sample from population of data points
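For the first case, sampling from a theoretical distribution, the same pattern appears to carry over: generate every simulated sample in one data step with a replicate variable, then analyze with by-processing. The sketch below is an illustration under assumed settings (1000 replicates of 25 draws from a unit-mean exponential distribution, with the sample mean as the statistic); REPS, N, sim, means and simsummary are made-up names.

```sas
%let REPS = 1000;   /* number of simulated samples (assumed) */
%let N    = 25;     /* size of each sample (assumed) */

/* All simulated samples in one data set, indexed by replicate. */
data sim;
   call streaminit(12345);
   do replicate = 1 to &REPS;
      do i = 1 to &N;
         x = rand('exponential');   /* exponential with mean 1 */
         output;
      end;
   end;
   drop i;
run;

/* One sample mean per replicate. */
proc means data=sim noprint;
   by replicate;
   var x;
   output out=means mean=xbar;
run;

/* Behavior of the statistic across replicates. */
proc univariate data=means noprint;
   var xbar;
   output out=simsummary mean=mean_xbar std=sd_xbar
          pctlpts=2.5 97.5 pctlpre=p;
run;
```

Theory gives a mean of 1 and a standard deviation of 1/sqrt(25) = 0.2 for the sample mean here, so mean_xbar and sd_xbar offer a quick sanity check on the simulation.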

46 Simulations
    proc surveyselect data=largefile out=process_set
         seed=45884743 method=srs sampsize=1000;
    run;
    data processor;
        /* the slide omits the array name; "a" is supplied here so the statement compiles */
        array a{5,5} a1-a25;
        set process_set;
        ..........
    run;

47 Simulations
    proc plan seed=4958584;
        factors replicate=100 ordered
                SiteNo = 30 of 200 / noprint;
        output out=plan9;
    run;

48 CONCLUSIONS
    Cassell’s “7 Habits of Highly Effective SAS-ers”
    KNOW YOUR PROBLEM
    USE THE RIGHT TOOL
    FEWER STEPS GET YOU FARTHER
    STAY TALL AND THIN
    TOO MUCH OF A GOOD THING IS BAD
    SKIP THE EXPENSIVE STUFF
    SHARPEN THE SAW

49 CONCLUSIONS
    SAS is great at resampling and simulations.
    You just have to code it in SAS instead of something else!
    Don’t run 5003 steps when 3 steps will do it.
    Don’t assume everything is a macro problem.

50 CONCLUSIONS
    Resampling methods and simulations do not solve all your problems.
    Use your brain before you use your keyboard.

51 Contact Information
    David L. Cassell
    Design Pathways
    3115 NW Norwood Pl.
    Corvallis, OR 97330
    DavidLCassell@msn.com
    541-754-1304

