The sampling distribution of a statistic
The basic idea is to characterize the distribution of a statistic over all possible samples. When we draw a single random sample, we usually use it to draw inferences about a population quantity (e.g., the mean). Those inferences rest on what would (theoretically) happen if we drew many (infinitely many) random samples. The distribution of the values of a statistic over all possible samples is the sampling distribution of the statistic. Sometimes theoretical arguments lead to a mathematical expression for the sampling distribution. In general, however, we cannot enumerate the values of the statistic in all possible samples.
Sampling distribution of the mean of a sample.
Monte Carlo Estimates
Simulation lets us actually draw many random samples and examine empirically what happens in them. From these samples we can calculate p-values, standard errors, confidence intervals, and so on.
The process for simulating the sampling distribution of a statistic:
1. Generate many random samples.
2. Compute the statistic for each sample.
3. The collection of computed statistics is an approximate sampling distribution (ASD).
4. Analyze the ASD to draw conclusions.
Simulation using the Data Step
Now we have to draw multiple samples of a given size. The “classic” example is the sampling distribution of the mean.
Sampling Distribution of the Mean, uniform(0,1)
1. Generate random samples.

```sas
%let obs = 10;     /* size of each sample */
%let reps = 1000;  /* number of samples */
%let seed = 54321;

data SimUni;
   call streaminit(&seed);
   do rep = 1 to &reps;
      do i = 1 to &obs;
         x = rand("Uniform");
         output;
      end;
   end;
run;
```

Note the order of the DO loops. REP is the outer loop, so the observations are already grouped by REP for the BY-group processing in the next step. This saves a sort step. A good habit is to start with a small number of reps (I usually use 10) and check the code.
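Following that advice, a minimal check might look like this. It is a sketch, and the data set names SimUniCheck and CheckCounts are hypothetical. It reruns step 1 with only 10 reps and uses PROC SQL to confirm that every sample has &obs observations, all inside (0,1). It resets the macro variable reps to 10, so set it back to 1000 before the full run.

```sas
%let obs = 10;
%let reps = 10;       /* small number of reps while debugging */
%let seed = 54321;

data SimUniCheck;     /* hypothetical name for the debugging run */
   call streaminit(&seed);
   do rep = 1 to &reps;
      do i = 1 to &obs;
         x = rand("Uniform");
         output;
      end;
   end;
run;

/* one row per sample: n should equal &obs, and x should lie in (0,1) */
proc sql;
   create table CheckCounts as
   select rep, count(*) as n, min(x) as MinX, max(x) as MaxX
   from SimUniCheck
   group by rep;
quit;

proc print data=CheckCounts noobs; run;
```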
2. Compute the mean of each sample.

```sas
proc means data=SimUni noprint;
   by rep;
   var x;
   output out=OutUni mean=MeanX;
run;

proc print data=OutUni(obs=10); run;
```

This could also be done with PROC SQL.
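Here is a sketch of the PROC SQL alternative. It assumes the SimUni data set from step 1 already exists. Grouping by rep gives one mean per sample. The output name OutUniSQL is illustrative. Its MeanX values should match those in OutUni, which also carries the _TYPE_ and _FREQ_ columns that PROC MEANS adds.

```sas
/* Assumes SimUni (variables rep, i, x) from step 1 already exists. */
proc sql;
   create table OutUniSQL as
   select rep, mean(x) as MeanX      /* one mean per sample */
   from SimUni
   group by rep
   order by rep;
quit;

proc print data=OutUniSQL(obs=10); run;
```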
3. Analyze the ASD: summarize and create a histogram.

```sas
proc means data=OutUni N Mean Std P5 P95;
   var MeanX;
run;

proc univariate data=OutUni;
   var MeanX;
   label MeanX = "Sample Mean of U(0,1) Data";
   histogram MeanX / normal;
   ods select Histogram Moments GoodnessOfFit;
run;
```

These results summarize the approximate sampling distribution of the mean of a sample of 10 independent U(0,1) observations.
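For comparison, the exact moments of this sampling distribution are known, so you can check the simulated Mean and Std against them. For X ~ U(0,1), E(X) = 1/2 and Var(X) = 1/12. For a sample of n = 10 independent observations:

```latex
E(\bar X) = \tfrac{1}{2}, \qquad
\mathrm{SD}(\bar X) = \sqrt{\frac{1}{12n}} = \sqrt{\frac{1}{120}} \approx 0.0913
```

By the central limit theorem, the distribution of the sample mean is approximately normal. That is the reason for overlaying a normal curve on the histogram.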
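Examine Percentiles

```sas
proc univariate data=OutUni noprint;
   var MeanX;
   /* 2.5th and 97.5th percentiles (a 95% interval). These values are  */
   /* missing in the slide and are assumed from the output name Pctl95. */
   output out=Pctl95 N=N mean=MeanX pctlpts=2.5 97.5 pctlpre=Pctl;
run;

proc print data=Pctl95 noobs; run;
```

PROC UNIVARIATE can estimate custom percentiles through the PCTLPTS= and PCTLPRE= options.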
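As a rough theoretical check, you can use a normal approximation from the central limit theorem. This is not the method used above. Under it, the 2.5th and 97.5th percentiles of the mean of 10 U(0,1) observations are approximately

```latex
0.5 \pm 1.96\sqrt{\tfrac{1}{120}} \approx 0.5 \pm 0.179 = (0.321,\ 0.679)
```

The simulated percentiles should fall close to these values.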
Estimate Probabilities from the ASD
For example, what is the probability that the mean of a sample exceeds 0.7?

```sas
proc sql;
   /* (MeanX > 0.7) evaluates to 1 or 0, so the sum counts the exceedances */
   select sum(MeanX > 0.7) / count(*) as Prob
   from OutUni;
quit;
```

Questions like this are often of interest. For example, if you got a sample mean of 0.9, what is the probability of this occurring if the true mean is 0?
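You can compare the simulated probability with theory. In this sketch, the normal approximation uses the SDF function. The exact value uses the Irwin-Hall distribution of a sum of n independent U(0,1) variables, whose CDF is F(t) = sum over k = 0..floor(t) of (-1)^k C(n,k) (t-k)^n / n!. Because that distribution is symmetric, P(mean > c) = P(sum < n(1-c)). For n = 10 and c = 0.7, the two values come out near 0.0142 (normal) and 0.0135 (exact). The data set name TheoryProb is illustrative.

```sas
/* Theoretical P(MeanX > c) for the mean of n independent U(0,1) values. */
data TheoryProb;
   n = 10;
   c = 0.7;
   /* normal (CLT) approximation: MeanX ~ N(0.5, sqrt(1/(12n))) */
   p_normal = sdf("Normal", c, 0.5, sqrt(1/(12*n)));
   /* exact: Irwin-Hall CDF of the sum, evaluated at t = n*(1-c) by symmetry */
   t = n*(1 - c);
   p_exact = 0;
   do k = 0 to floor(t);
      p_exact = p_exact + (-1)**k * comb(n, k) * (t - k)**n / fact(n);
   end;
   drop k;
run;

proc print data=TheoryProb noobs;
   var n c p_normal p_exact;
run;
```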
Sampling Distribution of statistics from normal data
1. Simulate data.

```sas
%let obs = 31;
%let reps = 10000;
%let seed = 54321;

data Normals(drop=i);
   call streaminit(&seed);
   do rep = 1 to &reps;
      do i = 1 to &obs;
         x = rand("Normal");
         output;
      end;
   end;
run;
```
2. Compute statistics for each sample.

```sas
proc means data=Normals noprint;
   by rep;
   var x;
   output out=StatsNorm mean=SampleMean median=SampleMedian var=SampleVar;
run;
```
3. Analyze Approximate Sampling Distribution
Calculate the variances of the sampling distributions of the mean and the median.

```sas
proc means data=StatsNorm Var;
   var SampleMean SampleMedian;
run;
```
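Theory gives a benchmark for these variances. For N(0,1) data with n = 31, the variance of the sample mean is exact. The variance of the sample median (written here as X-tilde) is only a large-sample approximation:

```latex
\mathrm{Var}(\bar X) = \frac{\sigma^2}{n} = \frac{1}{31} \approx 0.0323,
\qquad
\mathrm{Var}(\tilde X) \approx \frac{\pi\,\sigma^2}{2n} = \frac{\pi}{62} \approx 0.0507
```

So for normal data the mean should have the smaller variance, by a factor of roughly 2/pi, or about 0.64.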
3. Analyze Approximate Sampling Distribution
Plot kernel density estimates.

```sas
proc sgplot data=StatsNorm;
   title "Sampling Distributions of Mean and Median for N(0,1) Data";
   density SampleMean   / type=kernel legendlabel="Mean";
   density SampleMedian / type=kernel legendlabel="Median";
   refline 0 / axis=x;
run;
```
3. Analyze Approximate Sampling Distribution
Examine the sampling distribution of the variance and fit a chi-square distribution.

```sas
/* scale the sample variances by (N-1)/sigma^2; here sigma^2 = 1 */
data StatsNorm;
   set StatsNorm;
   ScaledVar = SampleVar * (&obs - 1) / 1;
run;

/* Fit chi-square distribution to data */
proc univariate data=StatsNorm;
   var ScaledVar;
   label ScaledVar = "Variance of Normal Data (Scaled)";
   histogram ScaledVar / gamma(alpha=15 sigma=2);
   ods select Histogram;
run;
```
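The GAMMA parameters come from standard theory. For normal data with variance sigma^2, the scaled sample variance has a chi-square distribution with n - 1 degrees of freedom. A chi-square distribution with nu degrees of freedom is a gamma distribution with shape nu/2 and scale 2. In the HISTOGRAM statement's GAMMA option, SIGMA= is the scale parameter. With n = 31 and sigma^2 = 1:

```latex
\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}
= \mathrm{Gamma}\!\left(\alpha = \tfrac{n-1}{2},\ \text{scale} = 2\right)
= \mathrm{Gamma}(\alpha = 15,\ \text{scale} = 2)
```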
The effect of sample size
Generate samples.

```sas
%let reps = 1000;
%let seed = 54321;

data SimUniSize;
   call streaminit(&seed);
   do obs = 10, 30, 50, 100;      /* sample sizes */
      do rep = 1 to &reps;
         do i = 1 to obs;
            x = rand("Uniform");
            output;
         end;
      end;
   end;
run;
```
Compute the mean of each sample.

```sas
proc means data=SimUniSize noprint;
   by obs rep;
   var x;
   output out=OutStats mean=SampleMean;
run;

proc print data=OutStats(obs=10); run;
```
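Summarize the approximate sampling distribution of the statistic.

```sas
proc means data=OutStats Mean Std;
   class obs;
   var SampleMean;
run;

/* save the mean and standard deviation of the ASD for each sample size */
proc means data=OutStats noprint;
   class obs;
   var SampleMean;
   output out=out(where=(_TYPE_=1)) Mean=Mean Std=Std;
run;
```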
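For comparison with the Std column, the theoretical standard deviation of the mean of n U(0,1) observations is sqrt(1/(12n)). A short DATA step tabulates it for the four sample sizes; the data set name TheorySE is illustrative. The simulated values should be close, and both shrink like 1/sqrt(n).

```sas
/* Theoretical mean and standard deviation of the sample mean of */
/* n independent U(0,1) observations: 0.5 and sqrt(1/(12n)).      */
data TheorySE;
   do obs = 10, 30, 50, 100;
      TheoryMean = 0.5;
      TheoryStd  = sqrt(1 / (12*obs));
      output;
   end;
run;

proc print data=TheorySE noobs;
   format TheoryStd 8.4;
run;
```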
Use IML to create data to graph

```sas
proc iml;
   use out;                               /* output data set from PROC MEANS */
   read all var {obs Mean Std};           /* create vectors */
   close out;                             /* close the data set */
   NN = obs;                              /* sample sizes */
   x = T( do(0.1, 0.9, 0.005) );          /* grid; step size assumed, missing in the slide */
   N   = j(nrow(x), 1, .);                /* placeholders so CREATE knows the columns */
   pdf = j(nrow(x), 1, .);
   create Convergence var {N x pdf};      /* create an empty data set */
   do i = 1 to nrow(NN);
      N = j(nrow(x), 1, NN[i]);
      pdf = pdf("Normal", x, Mean[i], Std[i]);
      append;                             /* add these observations to the data set */
   end;
   close Convergence;                     /* close the data set */
quit;
```
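If SAS/IML is not available, a DATA step can build the same Convergence data set. This sketch assumes the `out` data set from the PROC MEANS step, with one row per sample size and the variables obs, Mean, and Std. The grid step of 0.005 is also an assumption.

```sas
/* Alternative to PROC IML: one row per (sample size, grid point). */
/* Assumes data set out has variables obs, Mean, and Std.          */
data Convergence;
   set out(keep=obs Mean Std);
   N = obs;                                  /* sample size, used as the group variable */
   do x = 0.1 to 0.9 by 0.005;               /* evaluation grid (step size assumed) */
      pdf = pdf("Normal", x, Mean, Std);     /* normal density with the ASD's mean and std */
      output;
   end;
   keep N x pdf;
run;
```

The DATA step loops over the grid inside each row of `out`, so no matrix language is needed. The resulting data set has the same N, x, and pdf columns that the SGPLOT step expects.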
Graph Created Data

```sas
ods graphics / antialiasmax=1300;   /* raise the anti-aliasing threshold for many points */
proc sgplot data=Convergence;
   title "Sampling Distribution of Sample Mean";
   label pdf = "Density" N = "Sample Size";
   series x=x y=pdf / group=N;
run;
```