Hypergeometric Random Variables
Sampling without replacement
When sampling with replacement, each trial remains independent. For example,… If balls are replaced, P(red ball on 2nd draw) = P(red ball on 2nd draw | first ball was red). If balls are not replaced, then given the first ball is red, there is less chance of a red ball on the 2nd draw, though for a large population of balls the effect may be minimal.
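To make the effect of replacement concrete, here is a minimal sketch that computes P(red on 2nd draw | red on 1st draw) exactly. The function name and the ball counts are hypothetical choices for illustration, not values from the slides.

```python
from fractions import Fraction

def second_red_given_first_red(N, r, replace):
    """P(red on 2nd draw | red on 1st draw) with r red balls among N."""
    if replace:
        # The first ball goes back, so the urn is unchanged.
        return Fraction(r, N)
    # One red ball has been removed from the urn.
    return Fraction(r - 1, N - 1)

# Hypothetical urns with the same red fraction (40%) but different sizes.
for N, r in [(10, 4), (10_000, 4_000)]:
    with_repl = second_red_given_first_red(N, r, replace=True)
    without_repl = second_red_given_first_red(N, r, replace=False)
    print(N, float(with_repl), float(without_repl), float(with_repl - without_repl))
```

With the small urn the two conditional probabilities differ noticeably; with the large urn the gap is tiny, which is the "minimal effect" described above.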
n trials, y red balls
Suppose there are r red balls and N – r other balls. Consider Y, the number of red balls in n selections, where now the trials may be dependent (as in sampling without replacement, when the sample size is significant relative to the population). The probability that y of the n selected balls are red is
$$P(Y = y) = \frac{\binom{r}{y}\binom{N-r}{n-y}}{\binom{N}{n}}.$$
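Hypergeometric R. V.
A random variable Y has a hypergeometric distribution with parameters N, n, and r if its probability function is given by
$$p(y) = \frac{\binom{r}{y}\binom{N-r}{n-y}}{\binom{N}{n}},$$
where y is an integer with 0 ≤ y ≤ min(n, r) and n – y ≤ N – r.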
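As an illustration, the hypergeometric probability function C(r, y) C(N – r, n – y) / C(N, n) can be sketched in a few lines of Python. The function name `hypergeom_pmf` and the example values N = 10, r = 4, n = 3 are assumptions made here, not part of the slides.

```python
from math import comb

def hypergeom_pmf(y, N, n, r):
    """P(Y = y): y red balls in n draws without replacement
    from N balls, of which r are red."""
    # Outside the support the probability is zero.
    if y < max(0, n - (N - r)) or y > min(n, r):
        return 0.0
    return comb(r, y) * comb(N - r, n - y) / comb(N, n)

# Hypothetical example: N = 10 balls, r = 4 red, n = 3 draws.
probs = [hypergeom_pmf(y, 10, 3, 4) for y in range(4)]
print(probs)
print(sum(probs))  # the probabilities over the support sum to 1 (up to rounding)
```

The guard on y enforces both restrictions at once: no more red balls than were drawn or exist, and no more non-red balls than exist.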
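Hypergeometric mean, variance
If Y is a hypergeometric random variable with parameters N, n, and r, the expected value and variance of Y are given by
$$E(Y) = \frac{nr}{N}, \qquad V(Y) = n\left(\frac{r}{N}\right)\left(\frac{N-r}{N}\right)\left(\frac{N-n}{N-1}\right).$$
(The proof is not as easy as for the previous distributions and is not given at this time.)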
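Since the proof is omitted, a numerical check may be reassuring. The sketch below computes the mean and variance directly from the probability function using exact fractions and compares them with the standard closed forms E(Y) = nr/N and V(Y) = n(r/N)((N – r)/N)((N – n)/(N – 1)). The parameter values are hypothetical, and a check on one case is of course not a proof.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(y, N, n, r):
    """Exact P(Y = y) as a Fraction (y assumed to lie in the support)."""
    return Fraction(comb(r, y) * comb(N - r, n - y), comb(N, n))

N, n, r = 20, 5, 8  # hypothetical parameters for the check
support = range(max(0, n - (N - r)), min(n, r) + 1)

# Mean and variance straight from the definition.
mean = sum(y * hypergeom_pmf(y, N, n, r) for y in support)
var = sum((y - mean) ** 2 * hypergeom_pmf(y, N, n, r) for y in support)

# Closed-form expressions.
mean_formula = Fraction(n * r, N)
var_formula = Fraction(n * r * (N - r) * (N - n), N * N * (N - 1))

print(mean, mean_formula)
print(var, var_formula)
```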
Sounds like…
If we let p = r/N and q = 1 – p = (N – r)/N, then the hypergeometric measures
$$E(Y) = np, \qquad V(Y) = npq\left(\frac{N-n}{N-1}\right)$$
look quite similar to the expressions for the binomial distribution, E(Y) = np and V(Y) = npq.
Rule of Thumb
For cases when n/N < 0.05, it may be reasonable to approximate the hypergeometric probabilities using a binomial distribution. Suppose each hour, 1000 bottles are filled by a machine and on average 10% are “underfilled”. Each hour, 20 of the bottles are randomly selected. Find the probability that at least 3 of the 20 are underfilled. Since 20/1000 = 0.02, perhaps we could use the binomial distribution to approximate the answer.
Easy binomial probability?
Let p = 0.10, the probability of the “success” of underfilling. Then P(at least 3 underfilled) = 1 – P(0, 1, or 2 underfilled) = 1 – [P(Y = 0) + P(Y = 1) + P(Y = 2)], which is approximately equal to 1 – binomialcdf(20, 0.10, 2) ≈ 0.3231. How close is this to the actual hypergeometric probability?
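The binomial calculation can be reproduced without a calculator's cdf function. The following is a minimal sketch using only the standard library; the helper name `binom_pmf` is an assumption made here.

```python
from math import comb

def binom_pmf(y, n, p):
    """P(Y = y) for a binomial random variable with n trials, success probability p."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)

n, p = 20, 0.10
# P(at least 3) = 1 - [P(Y = 0) + P(Y = 1) + P(Y = 2)]
p_at_least_3 = 1 - sum(binom_pmf(y, n, p) for y in range(3))
print(round(p_at_least_3, 4))  # 0.3231
```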
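A hypergeometric probability
Taking N = 1000, n = 20, and r = 100 underfilled bottles (10% of 1000), P(at least 3 underfilled) = 1 – P(0, 1, or 2 underfilled) = 1 – [P(Y = 0) + P(Y = 1) + P(Y = 2)], where
$$P(Y = y) = \frac{\binom{100}{y}\binom{900}{20-y}}{\binom{1000}{20}}.$$
This exact value can be compared with the binomial approximation.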
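To compare the two answers side by side, here is a sketch that evaluates both the exact hypergeometric probability and its binomial approximation for the bottle example. It assumes exactly r = 100 of the N = 1000 bottles are underfilled (the slides say 10% "on average"); the helper names are hypothetical.

```python
from math import comb

def hypergeom_pmf(y, N, n, r):
    """P(Y = y) for sampling n items without replacement from N, r of them 'successes'."""
    if y < max(0, n - (N - r)) or y > min(n, r):
        return 0.0
    return comb(r, y) * comb(N - r, n - y) / comb(N, n)

def binom_pmf(y, n, p):
    """P(Y = y) for n independent trials with success probability p."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)

N, n, r = 1000, 20, 100  # assumption: exactly 10% of the 1000 bottles are underfilled
p = r / N

hyper = 1 - sum(hypergeom_pmf(y, N, n, r) for y in range(3))
binom = 1 - sum(binom_pmf(y, n, p) for y in range(3))

print(f"hypergeometric: {hyper:.4f}")
print(f"binomial:       {binom:.4f}")
print(f"difference:     {abs(hyper - binom):.4f}")
```

In this example n/N = 0.02, and the two probabilities differ by well under 0.01, which is consistent with the rule of thumb.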
The Binomial Approximation
[Figure: the hypergeometric distribution, and a very similar binomial distribution]
As population increases
Let N get large while n and p = r/N remain constant. The hypergeometric probabilities then converge to the binomial probabilities, as the trials become “almost independent”. Proof?
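Short of a proof, the convergence can at least be observed numerically. The sketch below holds n = 20 and p = 0.10 fixed, lets N grow, and records the largest gap between the hypergeometric and binomial probabilities over all y. The population sizes and helper names are assumptions chosen for illustration.

```python
from math import comb

def hypergeom_pmf(y, N, n, r):
    """P(Y = y) for n draws without replacement from N items, r of them 'successes'."""
    if y < max(0, n - (N - r)) or y > min(n, r):
        return 0.0
    return comb(r, y) * comb(N - r, n - y) / comb(N, n)

def binom_pmf(y, n, p):
    """P(Y = y) for n independent trials with success probability p."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)

n, p = 20, 0.10
populations = [100, 1_000, 10_000, 100_000]  # hypothetical population sizes

gaps = []
for N in populations:
    r = N // 10  # keeps r/N = 0.10 as N grows
    gap = max(abs(hypergeom_pmf(y, N, n, r) - binom_pmf(y, n, p))
              for y in range(n + 1))
    gaps.append(gap)
    print(f"N = {N:>7}: largest pmf gap = {gap:.6f}")
```

The largest gap shrinks as N increases, which is what the limiting statement above predicts.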