Using Simulation to Evaluate Statistical Techniques.

1. Coverage of Confidence Intervals
2. Robustness
3. Power and Sample Size
4. Simulating p-values

1. Examine coverage for confidence intervals

Suppressing output created by BY-group processing
/* from the do loop, adapted */
%macro ODSOff;     /* Call prior to BY-group processing */
   options nonotes;
   ods graphics off; ods exclude all; ods noresults;
%mend;
%macro ODSOn();    /* Call after BY-group processing */
   options notes;
   ods graphics on; ods exclude none; ods results;
%mend;

Confidence Interval for a Mean, Normal Data
Whenever possible, it is useful to simulate a scenario for which the correct answer is known. Here the scenario is a confidence interval for the mean of a sample from a normal population, where a 95% interval should contain the true mean in 95% of samples.
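For reference, the interval being checked is the usual t interval. The symbols n (sample size), x-bar (sample mean) and s (sample standard deviation) are introduced here for illustration, and the 0.975 quantile assumes a 95% confidence level:

$$\bar{x} \;\pm\; t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}}$$

Coverage is the probability that this interval contains the true mean, so a simulation estimates it as the fraction of simulated intervals that contain 0.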

Generate samples from a N(0,1) distribution.
%let obs = 50;       /* size of each sample */
%let reps = 10000;   /* number of samples */
%let seed = 54321;
data Normal(keep=rep x);
   call streaminit(&seed);
   do rep = 1 to &reps;
      do i = 1 to &obs;
         x = rand("Normal");
         output;
      end;
   end;
run;

Compute the confidence interval for each sample
proc means data=Normal noprint;
   by rep;
   var x;
   output out=OutStats mean=Meanx lclm=Lower uclm=Upper;
run;

Graph the first 100 confidence intervals
proc sgplot data=OutStats(obs=100);
   title "95% Confidence Intervals for the Mean";
   scatter x=rep y=Meanx;
   highlow x=rep low=Lower high=Upper / legendlabel="95% CI";
   refline 0 / axis=y;
   yaxis display=(nolabel);
run;

Calculate coverage
data OutStats;
   set OutStats;
   In_CI = (Lower<0 & Upper>0);   /* 1 if the interval contains the true mean 0 */
run;
proc freq data=OutStats;
   tables In_CI / nocum;
run;
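Because each interval either contains the mean or does not, the estimated coverage is a binomial proportion, and its Monte Carlo standard error can be gauged with a quick calculation. This sketch assumes the true coverage equals the nominal 0.95 and writes R for the number of replications (a symbol introduced here; R = 10000 in the code above):

$$\mathrm{SE}(\hat{p}) = \sqrt{\frac{p(1-p)}{R}} = \sqrt{\frac{0.95 \times 0.05}{10000}} \approx 0.0022$$

Under that assumption, an estimated coverage within roughly 0.004 of 0.95 (about two standard errors) is consistent with the nominal level.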

Non-normal data -- Coverage for exponential data

Generate exponential samples
%let obs = 10; %let reps = 10000; %let seed = 54321;
data Exp(keep=rep x);
   call streaminit(&seed);
   do rep = 1 to &reps;
      do i = 1 to &obs;
         x = rand("Expo");   /* standard exponential, mean = 1 */
         output;
      end;
   end;
run;

Obtain confidence limits for each sample.
proc means data=Exp noprint;
   by rep;
   var x;
   output out=OutStats mean=Meanx lclm=Lower uclm=Upper;
run;

Calculate coverage
data OutStats;
   set OutStats;
   In_CI = (Lower<1 & Upper>1);   /* the true mean of the standard exponential is 1 */
run;
proc freq data=OutStats;
   tables In_CI / nocum;
run;
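To judge whether the simulated coverage for the exponential samples differs from the nominal 95% by more than simulation noise, one option is a binomial test on the indicator variable. This is a minimal sketch, assuming the OutStats data set and its 0/1 variable In_CI from the step above. The choice of test is an illustration and not part of the original slides.

```sas
/* Sketch: confidence limits for the coverage proportion and a test of */
/* H0: coverage = 0.95. Assumes OutStats contains the 0/1 variable In_CI. */
proc freq data=OutStats;
   tables In_CI / binomial(level='1' p=0.95);
run;
```

The LEVEL= suboption selects In_CI=1 as the event, so the reported proportion is the estimated coverage.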

Calculate coverage with IML

Mean and Std functions in IML
proc iml;
   x = shape(1:12, 6);   /* 6 x 2 matrix */
   m = mean(x);          /* mean of each column */
   s = std(x);           /* std dev of each column */
   print x, m, s;
quit;

%let obs = 50; %let reps = 10000; %let seed = 54321;
proc iml;
   call randseed(&seed);
   x = j(&obs, &reps);                  /* each column is a sample */
   call randgen(x, "Normal");
   SampleMean = mean(x);                /* mean of each column */
   s = std(x);                          /* std dev of each column */
   talpha = quantile("t", 0.975, &obs-1);
   Lower = SampleMean - talpha * s / sqrt(&obs);
   Upper = SampleMean + talpha * s / sqrt(&obs);
   ParamInCI = (Lower<0 & Upper>0);     /* indicator variable */
   PctInCI = ParamInCI[:];              /* proportion that contain the parameter */
   print PctInCI;
quit;

2. Examine Robustness

Assessing two-sample t Test Robustness to unequal variances

Generate two-sample data for two cases: equal and unequal variance.
/* Scenario 1: (x1 | c=1) ~ N(0,1); (x1 | c=2) ~ N(0,1);                    */
/* Scenario 2: (x2 | c=1) ~ N(0,1); (x2 | c=2) ~ N(0,10), i.e. std dev = 10 */
%let n1 = 10; %let n2 = 10; %let reps = 10000; %let seed = 54321;
data twosamp(drop=i);
   label x1 = "Normal data, same variance"
         x2 = "Normal data, different variance";
   call streaminit(&seed);
   do rep = 1 to &reps;
      c = 1;
      do i = 1 to &n1;
         x1 = rand("Normal");
         x2 = rand("Normal");
         output;
      end;
      c = 2;
      do i = 1 to &n2;
         x1 = rand("Normal");
         x2 = rand("Normal", 0, 10);
         output;
      end;
   end;
run;

Examine t-test output
ods trace on;    /* write the names of the ODS tables to the log */
proc ttest data=fram.frex4 plots=none;
   class male;
   var chol;
run;
ods trace off;

Get the p-value from the t test assuming equal variance.
%ODSOff
proc ttest data=twosamp;
   by rep;
   class c;        /* compare c=1 to c=2 */
   var x1 x2;
   ods output ttests=TTests(where=(method="Pooled"));
run;
%ODSOn
proc print data=TTests(obs=10);
run;

Calculate proportion rejected.
data Results;
   set TTests;
   RejectH0 = (Probt <= 0.05);
run;
proc sort data=Results;
   by Variable;
run;
proc freq data=Results;
   by Variable;    /* one table for x1 (equal variance), one for x2 (unequal) */
   tables RejectH0 / nocum;
run;

Assessing t test in IML

Two Sample t-test Robustness to non-normal populations

/* Assessing t test in IML */
%let n1 = 10; %let n2 = 10;
%let reps = 10000;                      /* number of samples */
%let seed = 54321;
proc iml;
   /* 1. Simulate the data; each column is a sample */
   call randseed(&seed);
   x = j(&n1, &reps);                   /* allocate space for Group 1 */
   y = j(&n2, &reps);                   /* allocate space for Group 2 */
   call randgen(x, "Normal", 0, 10);    /* fill from N(0, std dev 10) */
   call randgen(y, "Exponential", 10);  /* fill from Exp(scale 10): mean 10, std dev 10 */
   /* Note: as written the population means differ (0 vs. 10). To estimate the */
   /* Type I error rate, give Group 1 a mean of 10 as well.                    */
   /* 2. Compute the t statistics; MEAN and VAR operate on columns */
   meanX = mean(x); varX = var(x);      /* mean & var of each sample */
   meanY = mean(y); varY = var(y);
   /* compute pooled standard deviation from n1 and n2 */
   poolStd = sqrt( ((&n1-1)*varX + (&n2-1)*varY)/(&n1+&n2-2) );
   /* compute the t statistic */
   t = (meanX - meanY) / (poolStd*sqrt(1/&n1 + 1/&n2));
   /* 3. Construct indicator var for tests that reject H0 */
   alpha = 0.05;
   RejectH0 = (abs(t)>quantile("t", 1-alpha/2, &n1+&n2-2));   /* 0 or 1 */
   /* 4. Compute proportion: (# that reject H0)/NumSamples */
   Prob = RejectH0[:];
   print Prob;
quit;
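Written out, the quantities computed in the IML step are the pooled standard deviation and the equal-variance t statistic. The subscripted symbols below are notation introduced here to mirror the program's variables (varX, varY, meanX, meanY, poolStd):

$$s_p = \sqrt{\frac{(n_1-1)\,s_X^2 + (n_2-1)\,s_Y^2}{n_1+n_2-2}}, \qquad t = \frac{\bar{X}-\bar{Y}}{s_p\,\sqrt{\dfrac{1}{n_1}+\dfrac{1}{n_2}}}$$

The null hypothesis is rejected when $|t| > t_{1-\alpha/2,\;n_1+n_2-2}$, and the reported value Prob is the fraction of the simulated samples for which that happens.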

3. Examine Power and Sample Size

Evaluating power of the t test

PROC POWER
proc power;
   twosamplemeans
      power = .                    /* missing ==> "compute this" */
      meandiff = 0 to 2 by 0.1     /* delta = 0, 0.1, ..., 2 */
      stddev = 1                   /* N(delta, 1) */
      npergroup = 10;              /* 10 in each group */
   plot x=effect markers=none;
   ods output Output=Power;        /* output results to data set */
run;
proc print data=Power; run;

Simulated Power.

Simulate the data.
%let n1 = 10; %let n2 = 10; %let reps = 10000; %let seed = 54321;
data PowerSim(drop=i);
   call streaminit(&seed);
   do Delta = 0 to 2 by 0.1;
      do rep = 1 to &reps;
         c = 1;
         do i = 1 to &n1;
            x1 = rand("Normal");
            output;
         end;
         c = 2;
         do i = 1 to &n2;
            x1 = rand("Normal", Delta, 1);
            output;
         end;
      end;
   end;
run;

Compute statistics
%ODSOff
proc ttest data=PowerSim;
   by Delta rep;
   class c;
   var x1;
   ods output ttests=TTests(where=(method="Pooled"));
run;
%ODSOn

Calculate percent rejected.
data Results;
   set TTests;
   RejectH0 = (Probt <= 0.05);
run;
proc freq data=Results noprint;
   by Delta;
   tables RejectH0 / out=SimPower(where=(RejectH0=1));
run;
proc print data=SimPower; run;

Plot simulated vs PROC POWER results.
data Combine;
   set SimPower Power;
   p = percent / 100;
   label p="Power";
run;
proc sgplot data=Combine noautolegend;
   title "Power of the t Test";
   title2 "Samples are N(0,1) and N(delta,1), n1=n2=10";
   series x=MeanDiff y=Power;     /* PROC POWER curve */
   scatter x=Delta y=p;           /* simulated estimates */
   xaxis label="Difference in Population Means (mu2 - mu1)";
run;
title;

Effect of Sample Size on Power

%let reps = 1000; %let delta = .5; %let seed = 54321;
data power(drop=i);
   call streaminit(&seed);
   do obs = 25 to 100 by 5;          /* observations per group */
      do rep = 1 to &reps;
         do i = 1 to obs;
            grp = 1; x = rand("Normal"); output;
            grp = 2; x = rand("Normal", &delta, 1); output;
         end;
      end;
   end;
run;

%ODSOff
proc ttest data=power;
   by obs rep;
   class grp;
   var x;
   ods output ttests=TTests(where=(method="Pooled"));
run;
%ODSOn

data Results;
   set TTests;
   Reject = (Probt <= 0.05);
run;
proc print data=Results(obs=50); run;

proc freq data=Results noprint;
   by obs;
   tables Reject / out=sampsize(where=(Reject=1));
run;
proc print data=sampsize(obs=5); run;

proc power;
   twosamplemeans
      meandiff = &delta
      stddev = 1
      alpha = 0.05
      npergroup = 25 to 100 by 5
      power = .;
   plot markers=none;
   ods output Output=Power;
run;
proc print data=Power; run;

data total;
   set Power sampsize;
   pct = percent/100;     /* simulated power as a proportion */
   n = npergroup;
run;

proc sgplot data=total noautolegend;
   title "Power of the t Test by Sample Size";
   title2 "Samples are N(0,1) and N(&delta,1), n1=n2=N";
   label N="Size of each sample" pct="Power";
   refline 0.8 / axis=y;
   series x=N y=Power;     /* PROC POWER curve */
   scatter x=obs y=pct;    /* simulated estimates */
run;
title;

4. Simulating p-values

A six-sided die is tossed 36 times, and the number of times each face appears is recorded.
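Face   1  2  3  4  5  6
Count  8  4  4  3  6  11
(Only the counts 8 and 11 survive in the source. The counts shown for faces 2 to 5 are an assumption: they are the only set consistent with a total of 36 and the chi-square statistic of 7.67 used below, and their order is assumed.)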

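data tmp;
   value = _n_;        /* face of the die: 1, 2, ..., 6 */
   input count @@;
   /* Counts other than 8 and 11 are missing in the source; the values below */
   /* are an assumption consistent with N=36 and a chi-square of 7.67.       */
   datalines;
8 4 4 3 6 11
;
run;
proc freq data=tmp;
   tables value / chisq;
   weight count;
run;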
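The statistic that the CHISQ option reports for this one-way table is the Pearson chi-square for a fair die. The symbols below (O_i for the observed count of face i, E_i for its expected count, k for the number of faces, N for the number of tosses) are notation introduced here:

$$X^2 = \sum_{i=1}^{k}\frac{(O_i - E_i)^2}{E_i}, \qquad E_i = \frac{N}{k} = \frac{36}{6} = 6$$

With k = 6 faces the reference distribution has k - 1 = 5 degrees of freedom.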

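%let reps = 10000;
proc iml;
   /* Counts other than 8 and 11 are missing in the source and are assumed here. */
   /* Only k and N are used below, so the simulation depends on 6 faces, 36 tosses. */
   Observed = {8 4 4 3 6 11};
   k = ncol(Observed);                    /* number of faces */
   N = sum(Observed);                     /* number of tosses */
   p = j(1, k, 1/k);                      /* null hypothesis: fair die */
   freq = RandMultinomial(&reps, N, p);   /* each row is one simulated set of counts */
   x = repeat(1:k, &reps);                /* face labels for each row */
   rep = repeat(T(1:&reps), 1, k);        /* replicate number for each row */
   create die var {"rep" "Freq" "x"}; append; close;
quit;
proc print data=die(obs=18); run;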

proc freq data=die noprint;
   by rep;
   weight Freq;
   tables x / chisq;
   output out=chi2 chisq;
run;
proc contents data=chi2; run;

proc sql;
   title "Proportion of simulated statistics >= observed";
   select count(*)/&reps from chi2 where _pchi_ >= 7.67;
quit;
title;
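The proportion computed above is the simulated p-value. As a rough cross-check, it can be compared with the asymptotic p-value from a chi-square distribution. This is a minimal sketch, assuming the observed statistic 7.67 from above and 5 degrees of freedom (6 faces minus 1); the variable name pAsymp is made up for illustration.

```sas
/* Sketch: asymptotic p-value for comparison with the simulated proportion. */
/* Assumes observed statistic 7.67 and df = 6 - 1 = 5.                      */
data _null_;
   pAsymp = 1 - cdf("ChiSquare", 7.67, 5);
   put pAsymp=;     /* written to the SAS log */
run;
```

If the two values are close, the chi-square approximation is adequate for this sample size; a large gap would favor the simulated value.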

proc sgplot data=chi2;
   title "Simulated Distribution of Test Statistic under Null Hypothesis";
   histogram _pchi_ / binstart=0 binwidth=1;
   refline 7.67 / axis=x;     /* observed statistic */
   xaxis label="Test Statistic";
run;
title;
