Review 49% of all people in the world are male. Interested whether physics majors are more likely to be male than the general population, you survey 10 of your friends in physics. Here are the probabilities from a binomial distribution with n = 10 and q = .49. How many of your friends need to be male to support your hypothesis? 1 2 3 4 5 6 7 8 9 10 .001 .011 .494 .127 .213 .246 .197 .108 .041 .008
Sampling Distributions 10/04
Same Thought Experiment Known population Sample n members Compute some statistic What is probability distribution of the statistic? Replication Doing exactly the same experiment but with a new sample Sampling variability means each replication will result in different value of statistic
Sampling Distributions The probability distribution of some statistic over repeated replication of an experiment Distribution of sample means: the probability distribution for M Sampling Distribution Population Sample(s) X = [ -.70 1.10 -.36 -.68 -.08], M = -.144 X = [ .09 -.88 1.16 -1.72 .40], M = -.019 X = [ -.99 .47 .65 1.52 .20], M = .370 X = [ -.84 -2.06 1.06 -.24 2.49], M = .082 X = [ 1.88 .57 .38 .16 .85], M = .768
Reliability of the Sample Mean How close is M to m? Tells how much we can rely on M as estimator of m Standard Error (SE or sM) Typical distance from M to m Standard deviation of p(M) Estimating m from M If we know M, we can assume m is within about 2 SEs If m were further, we probably wouldn’t have gotten this value of M SE determines reliability Low SE high reliability; high SE low reliability Depends on sample size and variability of individual scores Law of Large Numbers The larger the sample, the closer M will be to m Formally: as n goes to infinity, SE goes to 0 Implication: more data means more reliability m m m 2sM M
Central Limit Theorem Characterizes distribution of sample mean Deep mathematical result Works for any population distribution Three properties of p(M) Mean: The mean of p(M) always equals m Standard error: The standard deviation of p(M), sM, equals Variance equals Shape: As n gets large, p(M) approaches a Normal distribution All in one equation: Only Normal if n is large enough! Rule of thumb: Normal if n 30
Distribution of Sample Variances Same story for s as for M Probability distribution for s over repeated replication Chi-square distribution (c2) Probability distribution for sample variance Positive skew; variance sensitive to outliers Mean equals s2, because s2 is unbiased Recall: Distribution of ?