June 16
In Chapter 8: 8.1 Concepts 8.2 Sampling Behavior of a Mean 8.3 Sampling Behavior of a Count and Proportion
§8.1: Concepts Statistical inference is the act of generalizing from a sample to a population with calculated degree of certainty. We calculate statistics in the sample We are curious about parameters in the population
ParametersStatistics SourcePopulationSample Calculated?NoYes Constant?YesNo Notation (examples)μ, σ, p Parameters and Statistics It is essential to draw the distinction between parameters and statistics.
§8.2 Sampling Behavior of a Mean How precisely does a given sample mean reflect the underlying population mean? To answer this question, we must establish the sampling distribution of x- bar The sampling distribution of x-bar is the hypothetical distribution of means from all possible samples of size n taken from the same population
Simulation Experiment Population: N = 10,000 with lognormal distribution (positive skew), μ = 173, and σ = 30 (Figure A, next slide) Take repeated SRSs, each of n = 10, from this population Calculate x-bar in each sample Plot x-bars (Figure B, next slide)
A. Population (individual values) B. Sampling distribution of x-bars
Findings 1.Distribution B is Normal even though Distribution A is not (Central Limit Theorem) 2.Both distributions are centered on µ (“unbiasedness”) 3.The standard deviation of Distribution B is much less than the standard deviation of Distribution A (square root law)
Results from Simulation Experiment Finding 1 (central limit theorem) says the sampling distribution of x-bar tends toward Normality even when the population distribution is not Normal. This effect is strong in large samples. Finding 2 (unbiasedness) means that the expected value of x-bar is μ Finding 3 is related to the square root law which says:
Standard Deviation (Error) of the Mean The standard deviation of the sampling distribution of the mean has a special name: it is called the “standard error of the mean” (SE) The square root law says the SE is inversely proportional to the square root of the sample size:
Example, the Weschler Adult Intelligence Scale has σ = 15 Quadrupling the sample size cut the SE in half Square root law! For n = 1 For n = 4 For n = 16
Putting it together: The sampling distribution of x-bar is Normal with mean µ and standard deviation (SE) = σ / √n (when population Normal or n is large) These facts make inferences about µ possible Example: Let X represent Weschler adult intelligence scores: X ~ N(100, 15). Take an SRS of n = 10 SE = σ / √n = 15/√10 = 4.7 xbar ~ N(100, 4.7) 68% of sample mean will be in the range µ ± SE = 100 ± 4.7 = 95.3 to x ~ N(µ, SE)
Law of Large Numbers As a sample gets larger and larger, the sample mean tends to get closer and closer to the μ This tendency is known as the Law of Large Numbers This figure shows results from a sampling experiment in a population with μ = As n increased, the sample mean became a better reflection of μ = 173.3
8.3 Sampling Behavior of Counts and Proportions Recall (from Ch 6) that binomial random variable represents the random number of successes (X) in n independent “success/failure” trials; the probability of success for each trial is p Notation X~b(n,p) The sampling distribution X~b(10,0.2) is shown on the next slide: μ = 2 when the outcome is expressed as a count and μ = 0.2 when the outcome is expressed as a proportion.
Normal Approximation to the Binomial When n is large, the binomial distribution takes on Normal properties How large does the sample have to be to apply the Normal approximation? One rule says that the Normal approximation applies when npq ≥ 5
Top figure: X~b(10,0.2) npq = 10 ∙ 0.2 ∙ (1–0.2) = 1.6 (less than 5) → Normal approximation does not apply Bottom figure: X~b(100,0.2) npq = 100 ∙ 0.2 ∙ (1−0.2) = 16 (greater than 5) → Normal approximation applies
Normal Approximation for a Binomial Count When Normal approximation applies:
Normal Approximation for a Binomial Proportion
“p-hat” is the symbol for the sample proportion
Illustrative Example: Normal Approximation to the Binomial Suppose the prevalence of a risk factor in a population is p = 0.2 Take an SRS of n = 100 from this population A variable number of cases in a sample will follow a binomial distribution with n = 20 and p =.2
Illustrative Example, cont. The Normal approximation for the binomial count is: The Normal approximation for the binomial proportion is:
1. Statement of a problem: Suppose we see a sample with 30 cases. What is the probability of see at least 30 cases under these circumstance, i.e., Pr(X ≥ 30) = ? assuming X ~ N(20, 4) 2. Standardize: z = (30 – 20) / 4 = Sketch: next slide 4. Table B: Pr(Z ≥ 2.5) = Illustrative Example, cont.
Binomial and superimposed Normal sampling distributions for the problem. Our Normal approximation suggests that only.0062 of samples will see at least this many cases.