The hypergeometric and negative binomial distributions
Relationship of hypergeometric and binomial distributions The binomial distribution gives exact probabilities for the number of successes in samples from a dichotomous population (S-F) when sampling with replacement. The binomial distribution gives approximate probabilities when sampling without replacement provided that the sample size n is small relative to the population size N. The hypergeometric distribution gives exact probabilities in the last case.
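The approximation claim can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the two population sizes are illustrative choices, not part of the slides. It holds the sample size and the success proportion fixed (n = 10, p = 0.2) and compares the exact hypergeometric pmf with the binomial pmf for a small and a large population.

```python
from math import comb

def hypergeom_pmf(x, n, M, N):
    """P(X = x) for a sample of n drawn without replacement from N items, M of them successes."""
    if x < max(0, n - (N - M)) or x > min(n, M):
        return 0.0
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

def binom_pmf(x, n, p):
    """P(X = x) for n independent trials with success probability p."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p**x * (1 - p)**(n - x)

def max_abs_gap(n, M, N):
    """Largest pointwise difference between the two pmfs over x = 0..n."""
    p = M / N
    return max(abs(hypergeom_pmf(x, n, M, N) - binom_pmf(x, n, p))
               for x in range(n + 1))

# Same n and same proportion of successes; only the population size changes.
small_pop_gap = max_abs_gap(n=10, M=5, N=25)       # n is 40% of N
large_pop_gap = max_abs_gap(n=10, M=1000, N=5000)  # n is 0.2% of N
print(small_pop_gap, large_pop_gap)
```

The gap shrinks as n becomes small relative to N, which is the situation in which the binomial is an acceptable stand-in for sampling without replacement.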
The hypergeometric distribution Assumptions: The population or set to be sampled consists of N (finite) individuals, objects, etc. Each individual can be classified as S or F, and there are M successes in the population. A sample of n individuals is selected without replacement in such a way that each subset of size n is equally likely to be chosen. The random variable X of interest is the number of successes in the sample. The distribution is denoted by P(X=x) = h(x;n,M,N).
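Definition of distribution If X is the number of S’s in a random sample of size n drawn from a population consisting of M S’s and (N-M) F’s, then the probability distribution of X, called the hypergeometric distribution, is
$$P(X=x) = h(x;n,M,N) = \frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}}$$
for x an integer satisfying $\max(0,\, n-N+M) \le x \le \min(n,\, M)$.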
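Mean and variance of a hypergeometric rv X The mean and variance of a hypergeometric random variable X having pmf h(x;n,M,N) are
$$E(X) = n\cdot\frac{M}{N}, \qquad V(X) = \left(\frac{N-n}{N-1}\right) n\cdot\frac{M}{N}\left(1-\frac{M}{N}\right)$$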
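As a sanity check, the standard closed forms E(X) = nM/N and V(X) = ((N-n)/(N-1)) n (M/N)(1 - M/N) can be compared with moments computed directly from the pmf. This is a sketch only; the helper names and the parameter values n = 6, M = 8, N = 20 are hypothetical and chosen purely for illustration.

```python
from math import comb, isclose

def hypergeom_pmf(x, n, M, N):
    """h(x; n, M, N): probability of x successes in a sample of n without replacement."""
    if x < max(0, n - (N - M)) or x > min(n, M):
        return 0.0
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

def moments_by_summation(n, M, N):
    """Mean and variance obtained by summing over the support of X."""
    support = range(max(0, n - (N - M)), min(n, M) + 1)
    mean = sum(x * hypergeom_pmf(x, n, M, N) for x in support)
    var = sum((x - mean) ** 2 * hypergeom_pmf(x, n, M, N) for x in support)
    return mean, var

def moments_closed_form(n, M, N):
    """Mean and variance from the closed-form expressions."""
    p = M / N
    return n * p, (N - n) / (N - 1) * n * p * (1 - p)

summed = moments_by_summation(n=6, M=8, N=20)
closed = moments_closed_form(n=6, M=8, N=20)
print(summed, closed)
```

The two pairs should agree up to floating-point rounding.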
Comparison of hypergeometric and binomial mean and variance The ratio M/N is the proportion of S’s in the population. If we replace M/N by p, the hypergeometric mean is np, the same as the binomial mean. The hypergeometric variance is the binomial variance np(1-p) multiplied by the factor (N-n)/(N-1), the finite population correction factor. For n > 1 the correction factor is less than 1, so the hypergeometric has a smaller variance than the binomial; the factor is close to 1 when n is small relative to N.
Example Five individuals from an animal population of 25 are caught, tagged, and released. After they have mixed with the general population, a sample of 10 of these animals is selected. Let X be the number of tagged animals in this sample of size 10. Compute E(X) and V(X), the expected number of tagged animals and the variance of the number of tagged animals.
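Example: Solutions Here n = 10, M = 5, and N = 25, so X has pmf h(x;10,5,25). The expected number of tagged animals is E(X) = n(M/N) = 10(5/25) = 2. The variance is V(X) = ((N-n)/(N-1)) n (M/N)(1 - M/N) = (15/24)(10)(.2)(.8) = 1.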
Example (continued) If the population size N is actually unknown, it makes sense to equate the observed sample proportion of tagged animals x/n with the population proportion M/N. Solving x/n = M/N for N, the estimate of N if x=2 would then be
$$\hat{N} = \frac{M \cdot n}{x} = \frac{(5)(10)}{2} = 25$$
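The tagged-animal example can be worked through numerically. The sketch below assumes the values stated in the example (n = 10, M = 5, N = 25, observed x = 2); the function name `estimate_population_size` is a hypothetical helper, and the probability P(X = 2) is included only as an extra illustration.

```python
from math import comb

n, M, N = 10, 5, 25  # sample size, tagged animals, population size

# Mean and variance of the number of tagged animals in the sample.
mean = n * M / N
fpc = (N - n) / (N - 1)                   # finite population correction factor
variance = fpc * n * (M / N) * (1 - M / N)

# Probability that exactly two tagged animals appear in the sample.
p_two = comb(M, 2) * comb(N - M, n - 2) / comb(N, n)

def estimate_population_size(M, n, x):
    """Solve x/n = M/N for N; undefined when no tagged animals are observed."""
    if x == 0:
        raise ValueError("x must be positive to estimate N")
    return M * n / x

n_hat = estimate_population_size(M=5, n=10, x=2)
print(mean, variance, p_two, n_hat)
```

Note that the estimate is undefined when x = 0, which is why the helper raises an error in that case.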
The Negative Binomial Distribution This random variable and distribution are based on an experiment that satisfies the conditions for a binomial random variable and one additional condition: The experiment continues until a total of r successes have been observed, where r is a positive integer. The random variable of interest is X = the number of failures that precede the r-th success. The distribution is denoted by nb(x;r,p) in the text.
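Definition of distribution The pmf of a negative binomial rv X with parameters r = number of S’s and p = P(S) is
$$nb(x;r,p) = \binom{x+r-1}{r-1}\, p^{r} (1-p)^{x}, \qquad x = 0, 1, 2, \ldots$$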
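Mean and variance of a negative binomial random variable If X is a negative binomial rv with parameters r and p, then
$$E(X) = \frac{r(1-p)}{p}, \qquad V(X) = \frac{r(1-p)}{p^{2}}$$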
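The standard pmf for the number of failures before the r-th success, and the closed forms E(X) = r(1-p)/p and V(X) = r(1-p)/p^2, can be checked against a truncated sum. This is a sketch under stated assumptions: the function names, the parameter values r = 3 and p = 0.4, and the truncation point of 500 terms are all illustrative choices.

```python
from math import comb, isclose

def neg_binom_pmf(x, r, p):
    """P(X = x), where X counts the failures that precede the r-th success."""
    if x < 0:
        return 0.0
    return comb(x + r - 1, r - 1) * p**r * (1 - p)**x

def truncated_moments(r, p, terms=500):
    """Total probability, mean, and variance summed over x = 0..terms-1."""
    probs = [neg_binom_pmf(x, r, p) for x in range(terms)]
    total = sum(probs)
    mean = sum(x * q for x, q in enumerate(probs))
    var = sum((x - mean) ** 2 * q for x, q in enumerate(probs))
    return total, mean, var

r, p = 3, 0.4
total, mean, var = truncated_moments(r, p)
closed_mean = r * (1 - p) / p
closed_var = r * (1 - p) / p**2
print(total, mean, closed_mean, var, closed_var)
```

The support is infinite, so the sum is only approximate, but the neglected tail is negligible for these parameter values.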
Example Suppose that p=P(female birth)=.5. A couple wishes to have exactly two female children in their family. They will have children until this condition is fulfilled. What is the probability that the family has x male children? What is the probability that the family has four children? How many male children would you expect this family to have? How many children would you expect this family to have?
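Solutions for the example Here X, the number of male children, is negative binomial with r = 2 and p = .5. P(x male children) = P(X=x) = $\binom{x+1}{1}(.5)^{2}(.5)^{x} = (x+1)(.5)^{x+2}$ for x = 0, 1, 2, .... P(four children) = P(X=2) = $3(.5)^{4} = 3/16 \approx .188$. The expected number of male children before the second female is E(X) = r(1-p)/p = 2(.5)/(.5) = 2. The expected number of children is then E(X+2) = four.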
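The family example can be reproduced with a few lines of Python. The sketch assumes the standard negative binomial pmf for the number of failures (male births) before the r-th success (female birth); the variable names are illustrative.

```python
from math import comb

def neg_binom_pmf(x, r, p):
    """P(X = x), where X counts the failures that precede the r-th success."""
    if x < 0:
        return 0.0
    return comb(x + r - 1, r - 1) * p**r * (1 - p)**x

r, p = 2, 0.5  # two female children wanted, P(female birth) = .5

# Four children in total means exactly two males before the second female.
p_four_children = neg_binom_pmf(2, r, p)

expected_males = r * (1 - p) / p
expected_children = expected_males + r  # males plus the two females

# First few values of P(x male children) = (x + 1)(.5)^(x + 2).
for x in range(5):
    print(x, neg_binom_pmf(x, r, p))
print(p_four_children, expected_males, expected_children)
```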
Geometric distribution When r=1, X is the number of failures before the first success. The random variable is called geometric in this case (though Y = X + 1 is also called geometric).
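The two conventions can be written side by side. In the sketch below (function names are hypothetical), the failures version has pmf p(1-p)^x for x = 0, 1, 2, ..., which is the negative binomial pmf with r = 1, and the trials version Y = X + 1 has pmf p(1-p)^(y-1) for y = 1, 2, 3, ....

```python
from math import comb, isclose

def neg_binom_pmf(x, r, p):
    """P(X = x), where X counts the failures that precede the r-th success."""
    if x < 0:
        return 0.0
    return comb(x + r - 1, r - 1) * p**r * (1 - p)**x

def geometric_failures_pmf(x, p):
    """P(X = x): x failures before the first success, x = 0, 1, 2, ..."""
    return p * (1 - p)**x if x >= 0 else 0.0

def geometric_trials_pmf(y, p):
    """P(Y = y): the first success occurs on trial y = x + 1, y = 1, 2, 3, ..."""
    return p * (1 - p)**(y - 1) if y >= 1 else 0.0

p = 0.3  # illustrative success probability
matches_r1 = all(
    isclose(geometric_failures_pmf(x, p), neg_binom_pmf(x, 1, p))
    for x in range(20)
)
shifted = all(
    isclose(geometric_trials_pmf(x + 1, p), geometric_failures_pmf(x, p))
    for x in range(20)
)
print(matches_r1, shifted)
```

When reading other sources or using software, it is worth checking which of the two conventions is meant, since the means differ by exactly 1.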