The sampling distribution of a statistic

The basic idea is that we can characterize the distribution of a statistic over all possible samples. When we draw a single random sample, we usually use a statistic computed from it (e.g. the sample mean) to draw inferences about the population, based on what (theoretically) would happen if we drew many (infinitely many) random samples. The distribution of the values of a statistic over all possible samples is the sampling distribution of the statistic. Sometimes theoretical arguments lead to a mathematical representation of the sampling distribution but, in general, we can't enumerate the values of the statistic in all possible samples.

[Figure: Sampling distribution of the mean of a sample.]

Monte Carlo Estimates

Simulation allows us to draw multiple random samples and examine empirically what occurs in them. We can estimate p-values, standard errors, confidence intervals, etc. from the multiple samples.

The process for simulating the sampling distribution for some statistic:

1. Generate multiple random samples.
2. Compute the statistic for each sample.
3. The collection of these calculated statistics provides an approximate sampling distribution (ASD).
4. Analyze the ASD to draw conclusions.
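One way to make the ASD precise is as an empirical distribution function. The symbols here are assumptions introduced for illustration and are not in the slides: B is the number of simulated samples and T_b is the statistic computed from sample b.

```latex
\hat{F}_B(t) \;=\; \frac{1}{B}\sum_{b=1}^{B} \mathbf{1}\{T_b \le t\},
\qquad
\hat{F}_B(t) \;\longrightarrow\; F_T(t) = P(T \le t)
\quad \text{as } B \to \infty .
```

Quantities such as standard errors, percentiles, and tail probabilities are then read off from the collection T_1, ..., T_B, and their accuracy improves as B grows.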

Simulation using the Data Step

Now we have to draw multiple samples of a given size. The "classic" example is the sampling distribution of the mean.

Sampling Distribution of the Mean, uniform(0,1)

1. Generate random samples.

    %let obs  = 10;      /* size of each sample */
    %let reps = 1000;    /* number of samples   */
    %let seed = 54321;
    data SimUni;
       call streaminit(&seed);
       do rep = 1 to &reps;        /* outer loop: one pass per sample      */
          do i = 1 to &obs;        /* inner loop: observations in a sample */
             x = rand("Uniform");
             output;
          end;
       end;
    run;

Note the order of the do loops. We will use BY-group processing, so generating the data already ordered by rep saves a sort step. A good habit is to start with a small number of reps (I usually use 10) and check the code.

2. Compute the mean for each sample.

    proc means data=SimUni noprint;
       by rep;
       var x;
       output out=OutUni mean=MeanX;
    run;
    proc print data=OutUni(obs=10); run;

This could also be done with PROC SQL.
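A minimal sketch of the SQL alternative, assuming the SimUni data set created in step 1. The output name OutUniSQL is hypothetical, chosen here so that OutUni is not overwritten.

```sas
/* Sketch: per-sample means with PROC SQL instead of PROC MEANS. */
/* OutUniSQL is an illustrative name, not from the slides.       */
proc sql;
   create table OutUniSQL as
   select rep, mean(x) as MeanX
   from SimUni
   group by rep;
quit;
```

Unlike the PROC MEANS output, this table has no _TYPE_ or _FREQ_ columns, and GROUP BY does not require the input to be sorted by rep.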

3. Analyze ASD: summarize and create histogram.

    proc means data=OutUni N Mean Std P5 P95;
       var MeanX;
    run;
    proc univariate data=OutUni;
       var MeanX;
       label MeanX = "Sample Mean of U(0,1) Data";
       histogram MeanX / normal;     /* overlay a fitted normal curve */
       ods select Histogram Moments GoodnessOfFit;
    run;

These are our simulated estimates of the mean of a sample of 10 independent U(0,1) values.
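For comparison, this is a case where theory gives the first two moments of the sampling distribution exactly, so the simulated Mean and Std can be checked against known values. The notation below (n for the sample size, an overbar for the sample mean) is introduced here for illustration.

```latex
X_i \sim U(0,1):\quad E(X_i) = \tfrac{1}{2},\qquad \operatorname{Var}(X_i) = \tfrac{1}{12}
\\[4pt]
E(\bar{X}) = \tfrac{1}{2},\qquad
\operatorname{SD}(\bar{X}) = \sqrt{\frac{1}{12\,n}} = \sqrt{\frac{1}{120}} \approx 0.0913
\quad (n = 10)
```

The Mean and Std reported by PROC MEANS for MeanX should be close to 0.5 and 0.0913, up to Monte Carlo error.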

Examine Percentiles

    proc univariate data=OutUni noprint;
       var MeanX;
       output out=Pctl95 N=N mean=MeanX pctlpts=2.5 97.5 pctlpre=Pctl;
    run;
    proc print data=Pctl95 noobs; run;

PROC UNIVARIATE allows estimating custom percentiles.

Estimate Probabilities from ASD

E.g., what is the probability that the mean of a sample is greater than .7?

    proc sql;
       select sum(MeanX > .7) / count(*) as prob
       from OutUni;
    quit;

Things like this are often of interest, e.g., you got a mean of .9; what is the probability of this occurring if the true mean is 0?
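As a rough cross-check on the simulated probability, a normal (central limit theorem) approximation can be computed directly. This is a sketch, not part of the original slides; it assumes samples of size 10 from U(0,1), and the variable names n, se, and p are illustrative.

```sas
/* Sketch: normal approximation to P(sample mean > 0.7) for U(0,1) data. */
/* Assumes n = 10 observations per sample; names are illustrative.       */
data _null_;
   n  = 10;
   se = sqrt(1/(12*n));                  /* theoretical SD of the sample mean */
   p  = 1 - cdf("Normal", 0.7, 0.5, se); /* upper-tail probability            */
   put se= p=;
run;
```

The approximation gives a probability of roughly 0.014. With a sample size of only 10 the normal curve is an approximation to the true sampling distribution, so the simulated proportion need not match it exactly.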

Sampling Distribution of statistics from normal data

1. Simulate data.

    %let obs  = 31;       /* size of each sample */
    %let reps = 10000;    /* number of samples   */
    %let seed = 54321;
    data Normals(drop=i);
       call streaminit(&seed);
       do rep = 1 to &reps;
          do i = 1 to &obs;
             x = rand("Normal");
             output;
          end;
       end;
    run;

2. Compute statistics for each sample.

    proc means data=Normals noprint;
       by rep;
       var x;
       output out=StatsNorm mean=SampleMean median=SampleMedian var=SampleVar;
    run;

3. Analyze Approximate Sampling Distribution. Calculate variances of the sampling distributions of the mean and median.

    proc means data=StatsNorm Var;
       var SampleMean SampleMedian;
    run;
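The two simulated variances can be compared with standard theoretical results for N(0,1) data. The result for the mean is exact; the one for the median is a large-sample approximation. Here n denotes the sample size (31 in this example) and sigma squared the population variance (1).

```latex
\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n} = \frac{1}{31} \approx 0.0323,
\qquad
\operatorname{Var}(\text{median}) \approx \frac{\pi\,\sigma^2}{2n} = \frac{\pi}{62} \approx 0.0507
```

The ratio of about pi/2 is why the median's sampling distribution is visibly wider than the mean's for normal data.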

3. Analyze Approximate Sampling Distribution. Plot kernel density estimates.

    proc sgplot data=StatsNorm;
       title "Sampling Distributions of Mean and Median for N(0,1) Data";
       density SampleMean   / type=kernel legendlabel="Mean";
       density SampleMedian / type=kernel legendlabel="Median";
       refline 0 / axis=x;
    run;

3. Analyze Approximate Sampling Distribution. Examine the sampling distribution of the variance and compare it to a chi-square distribution.

    /* scale the sample variances by (N-1)/sigma^2, with N = &obs and sigma^2 = 1 */
    data StatsNorm;
       set StatsNorm;
       ScaledVar = SampleVar * (&obs-1)/1;
    run;
    /* Overlay the chi-square density, written as a gamma distribution */
    proc univariate data=StatsNorm;
       var ScaledVar;
       label ScaledVar = "Variance of Normal Data (Scaled)";
       histogram ScaledVar / gamma(alpha=15 sigma=2);
       ods select Histogram;
    run;
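The gamma parameters in the HISTOGRAM statement come from the standard result for the variance of normal samples, together with the fact that a chi-square distribution is a special case of the gamma distribution. The notation is introduced here: n is the sample size, S squared the sample variance, and the gamma distribution is written with shape alpha and scale sigma, matching the option names in the code.

```latex
\frac{(n-1)\,S^2}{\sigma^2} \sim \chi^2_{\,n-1},
\qquad
\chi^2_{k} \equiv \operatorname{Gamma}\!\left(\alpha = \tfrac{k}{2},\ \sigma = 2\right)
\\[4pt]
n = 31 \;\Rightarrow\; k = 30 \;\Rightarrow\; \alpha = 15,\ \sigma = 2
```

Changing the sample size therefore requires changing alpha to (n-1)/2 in the gamma option.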

The effect of sample size

Generate samples

    %let reps = 1000;    /* number of samples per sample size */
    %let seed = 54321;
    data SimUniSize;
       call streaminit(&seed);
       do obs = 10, 30, 50, 100;      /* sample sizes to compare */
          do rep = 1 to &reps;
             do i = 1 to obs;
                x = rand("Uniform");
                output;
             end;
          end;
       end;
    run;

Compute mean for each sample

    proc means data=SimUniSize noprint;
       by obs rep;
       var x;
       output out=OutStats mean=SampleMean;
    run;
    proc print data=OutStats(obs=10); run;

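Summarize approx. sampling distribution of statistic

    proc means data=OutStats Mean Std;
       class obs;
       var SampleMean;
    run;
    /* Save the per-sample-size mean and std; rename obs to N for the graphing step */
    proc means data=OutStats noprint;
       class obs;
       var SampleMean;
       output out=out(where=(_TYPE_=1) rename=(obs=N)) Mean=Mean Std=Std;
    run;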
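The simulated standard deviations can be compared with the theoretical value for the mean of U(0,1) data, sqrt(1/(12 n)). The following is a sketch for that comparison; the data set name TheoryStd and its variable names are hypothetical and not from the slides.

```sas
/* Sketch: theoretical mean and SD of the sample mean of U(0,1) data */
/* for each sample size used in the simulation. Names are illustrative. */
data TheoryStd;
   do obs = 10, 30, 50, 100;
      TheoryMean = 0.5;
      TheorySD   = sqrt(1/(12*obs));
      output;
   end;
run;
proc print data=TheoryStd noobs; run;
```

The standard deviation falls in proportion to one over the square root of the sample size, so going from 10 to 100 observations shrinks it by a factor of about 3.16.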

Use IML to create data to graph

    proc iml;
    use out;                             /* output data set from proc means */
    read all var {N Mean Std};           /* create vectors                  */
    close out;                           /* close the data set              */
    NN = N;
    x = T( do(0.1, 0.9, 0.0025) );       /* grid of x values                */
    create Convergence var {N x pdf};    /* create an empty data set        */
    do i = 1 to nrow(NN);
       N = j(nrow(x), 1, NN[i]);
       pdf = pdf("Normal", x, Mean[i], Std[i]);
       append;                           /* add the rows for this sample size */
    end;
    close Convergence;                   /* close the data set              */
    quit;

Graph Created Data

    ods graphics / ANTIALIASMAX=1300;
    proc sgplot data=Convergence;
       title "Sampling Distribution of Sample Mean";
       label pdf = "Density" N = "Sample Size";
       series x=x y=pdf / group=N;
    run;