1 Introduction to Biostatistics (PUBHLTH 540): Estimating Parameters
Which estimator is best? Study possible samples, determine expected values, bias, variance, MSE
–with replacement example
–without replacement example (Exam 1)
Estimate population mean
–point estimator (sample mean)
–interval estimator (95% central width)
Central Limit Theorem
Interval estimators based on a sample
–estimating the standard error
–determining the multiplier (normal and t-distributions)
2 Sampling with replacement
Program ejs09b540p19.sas
–Uses Arrays, Outputs, and Transpose
–Select SRS with replacement from N=5 with n=3
–Uniform random number generator
Program ejs09b540p20.sas
–Replaces sample size, population size, and trials with macro variables (gives flexibility)
–Uses functions of arrays to get mean, var, min, max
–Select SRS with replacement from N=5 with n=3
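The selection step described above can be sketched in Python (the course programs themselves are in SAS and are not reproduced here). This is a minimal sketch under stated assumptions: the population labels 1 to 5, the seed, and the function name `srs_with_replacement` are all hypothetical.

```python
import itertools
import random
from statistics import mean

# Hypothetical population of N=5 subject labels (not from the course data).
population = [1, 2, 3, 4, 5]
n = 3  # sample size

def srs_with_replacement(pop, n, rng):
    """Draw n subjects with replacement using a uniform random number generator."""
    N = len(pop)
    # int(N * u) maps a uniform u in [0, 1) to an index 0..N-1 with equal probability.
    return [pop[int(N * rng.random())] for _ in range(n)]

rng = random.Random(540)  # fixed seed (an assumption) so the run is reproducible
one_sample = srs_with_replacement(population, n, rng)

# All possible ordered samples with replacement: N**n = 125, each equally likely.
all_samples = list(itertools.product(population, repeat=n))
expected_sample_mean = mean(mean(s) for s in all_samples)

print(one_sample)
print(len(all_samples), expected_sample_mean)
```

Because every ordered sample is equally likely, averaging the sample mean over all 125 samples gives its expected value, which can be compared with the population mean.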
3 SRS without Replacement
Program ejs09b540p21.sas
–Process of selecting subjects without replacement
–Do loops, shifting indices, etc.
Program ejs09b540p22.sas
–Implementable version with macro variables
Program ejs09b540p23.sas
–Check that all sample sets have equal probability
–n=3 from N=3 with functions to get sets
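The equal-probability check can be illustrated by enumeration. The sketch below is in Python rather than SAS and assumes a hypothetical population of N=5 labels with n=3; it is not the logic of the course programs.

```python
import itertools
from collections import Counter
from fractions import Fraction

population = [1, 2, 3, 4, 5]  # hypothetical subject labels, N=5
n = 3

# Every ordered selection without replacement: 5*4*3 = 60, each equally likely.
ordered = list(itertools.permutations(population, n))
p_ordered = Fraction(1, len(ordered))

# Collapse ordered selections to unordered sample sets and add up their probabilities.
set_prob = Counter()
for selection in ordered:
    set_prob[tuple(sorted(selection))] += p_ordered

for sample_set, prob in sorted(set_prob.items()):
    print(sample_set, prob)
```

Each unordered set arises from the same number of orderings, so every set ends up with the same total probability.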
4 Which Estimator of Population Median is Best?
Program ejs09b540p24.sas
–Add data from population, and link response for sample subject sets
–Evaluate sample median, mean, (min+max)/2
Program ejs09b540p25.sas
–Summarize results of samples using expected value, variance, MSE of estimators
–Use PROC MEANS options VARDEF=N and MAXDEC=2
–Sample mean has smallest MSE
–Is this always true?
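A small enumeration shows how expected value, bias, variance, and MSE can be compared across the three estimators. This is a Python sketch, not the SAS programs; the population values and the helper `summarize` are hypothetical, and the variance uses the number of samples as divisor, in the spirit of VARDEF=N.

```python
import itertools
from statistics import mean, median

# Hypothetical population responses for N=5 subjects (illustrative, not course data).
population = [2, 3, 5, 8, 20]
n = 3
target = median(population)  # parameter being estimated: the population median

estimators = {
    "median": median,
    "mean": mean,
    "midrange": lambda s: (min(s) + max(s)) / 2,
}

# All 10 equally likely sample sets under SRS without replacement.
samples = list(itertools.combinations(population, n))

def summarize(estimates, target):
    """Expected value, bias, variance (divisor = number of samples), and MSE."""
    ev = mean(estimates)
    var = mean((e - ev) ** 2 for e in estimates)
    bias = ev - target
    return {"expected": ev, "bias": bias, "variance": var, "mse": var + bias ** 2}

results = {name: summarize([f(s) for s in samples], target)
           for name, f in estimators.items()}
for name, r in results.items():
    print(name, {k: round(v, 2) for k, v in r.items()})
```

Changing the `population` list may change which estimator has the smallest MSE, which is the point of the question on the slide.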
5 Estimate Pop Median Age in Seasons Study Data
Program ejs09b540p26.sas
–use basev2.sas7bdat with “Age”
–include histograms of distribution of estimator over possible samples
–best estimator is not the mean! BEST depends on the population…
Program ejs09b540p27.sas
–estimate Pop Mean using sample mean from SRS w/o replacement of n=25
–How does the variance of sample means relate to the population variance?
6 Relating Population Variance to the Variance of the Sample Means
Population Variance
Variance of Sample Mean (without replacement: with T=10,000 trials…
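The slide's formulas did not survive extraction. For reference, the standard results are given below; the symbols are assumptions: a population of size N with mean μ, population variance σ² defined with divisor N, and a sample of size n with mean ȳ.

```latex
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (y_i - \mu)^2
\qquad
\operatorname{Var}(\bar{y}) = \frac{\sigma^2}{n} \quad \text{(SRS with replacement)}
\qquad
\operatorname{Var}(\bar{y}) = \frac{\sigma^2}{n}\cdot\frac{N-n}{N-1} \quad \text{(SRS without replacement)}
```

The factor (N-n)/(N-1) is the finite population correction; it is close to 1 when n is small relative to N. A simulation with T trials approximates the left-hand side by the variance of the T simulated sample means.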
7 Interval Estimate
The idea is to place an interval around an estimate to approximate the width of the estimator's sampling distribution.
Usually, the width is the central 95% of the estimator's sampling distribution.
How wide is this?
–measure width in terms of the standard error of the mean
8 How good is the Approximation?
Program ejs09b540p28.sas
–SRS w/o replacement of n=5 to estimate mean LDL cholesterol from the Seasons study using the sample mean, 10 samples.
–Determine the 2.5th percentile and 97.5th percentile of the distribution of sample means.
–Determine how many multiples of the standard error of the mean the percentiles are from the population mean
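The percentile-to-multiplier calculation can be sketched as follows. This is a Python illustration, not the SAS program: the population is a simulated stand-in (the Seasons LDL data are not reproduced here), and the seed, the 10,000 trials, and the helper `percentile` are assumptions.

```python
import random
from statistics import mean, pstdev

rng = random.Random(28)  # fixed seed (an assumption) for reproducibility

# Hypothetical stand-in population; NOT the Seasons study LDL values.
population = [rng.gauss(130, 35) for _ in range(500)]
mu = mean(population)

n, trials = 5, 10_000  # trials is an assumption chosen for a stable estimate

# rng.sample draws without replacement, i.e. an SRS w/o replacement of size n.
sample_means = sorted(sum(rng.sample(population, n)) / n for _ in range(trials))

# SD of the simulated sample means estimates the standard error of the mean.
se = pstdev(sample_means)

def percentile(sorted_values, p):
    """Simple percentile by index into an already sorted list (p in 0..100)."""
    k = round(p / 100 * (len(sorted_values) - 1))
    return sorted_values[k]

low = percentile(sample_means, 2.5)
high = percentile(sample_means, 97.5)

# How many standard errors each percentile lies from the population mean.
lower_multiplier = (low - mu) / se
upper_multiplier = (high - mu) / se
print(round(lower_multiplier, 2), round(upper_multiplier, 2))
```

With a skewed population the two multipliers need not be symmetric, which is one reason to check the approximation by simulation.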
9 Example of 95% Width
Program ejs09b540p28.sas
Change number of samples to
Determine multiples for standard error
–Lower 2.5% multiplier is
–Upper 97.5% multiplier is 2.02
–Standard deviation of sample means = se(Mean) = 15.94
Program ejs09b540p30.sas
–select SRS w/o replacement of n=5, estimate mean
sample mean=166.7
Low= (15.94)
High= (15.94)
10 Example of Triglycerides- Seasons Study
11 Example of Triglycerides- Seasons Study
Take 10,000 SRS w/o replacement of size n=5 (program ejs09b540p31.sas)
Population: SourceSim Source ejs09b540p31.sas
Multiplier of for 2.5th percentile
Multiplier of for 97.5th percentile
18 Conclusions
With larger sample size, the distribution of sample means is more bell shaped (i.e. ‘normal’) (Central Limit Theorem)
Central 95% of the distribution is around + or - 2 standard errors from the true population mean
In practice we don’t know the SE
In practice we don’t know the multiplier
Solution: Estimate SE from sample
Solution: Approximate multiplier assuming a distribution (Normal if σ is known, or t-distribution if σ is not known)
19 Normal Distribution
With larger sample sizes, the distribution of SRS means is approximately normal:
Standard Normal Distribution
20 Transforming a Random Variable
Standardization is an example of transforming a random variable.
Suppose we have a random variable:
What is the expected value and variance of X=a+bY?
21 Transforming a Random Variable
Variance of X=a+bY?
22 Transforming a Random Variable
Application for Standardizing
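The derivations on these slides were lost in extraction. A standard worked version follows, assuming Y has expected value μ and variance σ² (these symbols are assumptions) and a, b are constants.

```latex
\begin{align*}
E(X) &= E(a + bY) = a + b\,E(Y) = a + b\mu \\
\operatorname{Var}(X) &= E\big[(X - E(X))^2\big] = E\big[b^2 (Y - \mu)^2\big] = b^2\sigma^2 \\
Z &= \frac{Y - \mu}{\sigma} = -\frac{\mu}{\sigma} + \frac{1}{\sigma}\,Y
  \quad\Rightarrow\quad
  E(Z) = -\frac{\mu}{\sigma} + \frac{\mu}{\sigma} = 0, \qquad
  \operatorname{Var}(Z) = \frac{1}{\sigma^2}\,\sigma^2 = 1
\end{align*}
```

Standardizing is therefore the special case a = -μ/σ and b = 1/σ, which always yields mean 0 and variance 1.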
23 Conclusions- Practical
Assume Central Limit Theorem holds (usually if n>30)
Use multiplier based on centered distribution of standard normal (if σ is known); see Table A3 in Text
–central 60%: -0.84 to 0.84
–central 80%: -1.28 to 1.28
–central 90%: -1.64 to 1.64
–central 95%: -1.96 to 1.96
–central 99%: -2.58 to 2.58
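Multipliers like those in the table above can be computed from the standard normal quantile function. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `central_multiplier` is hypothetical.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

def central_multiplier(coverage):
    """Upper endpoint m such that P(-m < Z < m) = coverage for standard normal Z."""
    # The central region leaves (1 - coverage) / 2 in each tail.
    return z.inv_cdf(0.5 + coverage / 2)

for coverage in (0.60, 0.80, 0.90, 0.95, 0.99):
    m = central_multiplier(coverage)
    print(f"central {coverage:.0%}: {-m:.2f} to {m:.2f}")
```

Because the standard normal is symmetric about 0, the lower endpoint is simply the negative of the upper one.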
24 Conclusions- Practical
In practice we don’t know σ
Estimate σ using the sample standard deviation s
Use a t-distribution with (n-1) degrees of freedom for multipliers (see Table A4 in text).
–assumes underlying normal distribution and SRS
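25 Conclusions- Practical
t-distribution examples for 95% interval estimator (confidence interval):
–n=2, df=1: -12.71 to 12.71
–n=5, df=4: -2.78 to 2.78
–n=10, df=9: -2.26 to 2.26
–n=20, df=19: -2.09 to 2.09
–n=30, df=29: -2.05 to 2.05
–n=50, df=49: -2.01 to 2.01
–n=120, df=119: -1.98 to 1.98
–n=500, df=499: -1.96 to 1.96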
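Putting the pieces together, an interval estimate from a single sample can be sketched as below. This is a Python illustration under stated assumptions: the sample values and the function name `t_interval` are hypothetical, the multiplier is taken from a t table for df = n-1 = 4, and no finite population correction is applied.

```python
from math import sqrt
from statistics import mean, stdev

def t_interval(sample, t_multiplier):
    """Interval estimate: sample mean +/- t_multiplier * estimated standard error."""
    n = len(sample)
    ybar = mean(sample)
    s = stdev(sample)   # sample standard deviation, divisor n-1
    se = s / sqrt(n)    # estimated standard error of the mean
    return ybar - t_multiplier * se, ybar + t_multiplier * se

# Hypothetical sample of n=5 values (illustrative only, not course data).
sample = [150.0, 172.0, 161.0, 185.0, 166.0]
T_975_DF4 = 2.776  # t-table value for df=4, central 95%

low, high = t_interval(sample, T_975_DF4)
print(round(low, 1), round(high, 1))
```

Passing the multiplier as an argument keeps the sketch independent of any particular statistics library; in practice it would be looked up for the sample's degrees of freedom.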