Presentation on theme: "Virtual University of Pakistan"— Presentation transcript:

1 Virtual University of Pakistan
Lecture No. 28 of the course on Statistics and Probability by Miss Saleha Naghmi Habibullah

2 IN THE LAST LECTURE, YOU LEARNT
Properties of Expected Values in the case of Bivariate Probability Distributions (detailed discussion)
Covariance & Correlation
Some Well-known Discrete Probability Distributions:
Discrete Uniform Distribution
An Introduction to the Binomial Distribution

3 TOPICS FOR TODAY
Binomial Distribution
Fitting a Binomial Distribution to Real Data
An Introduction to the Hypergeometric Distribution

4 We begin with the discussion of the BINOMIAL DISTRIBUTION.

5 The binomial distribution is a very important discrete probability distribution.
We illustrate this distribution with the help of the following example:

6 EXAMPLE
Suppose that we toss a fair coin 5 times, and we are interested in determining the probability distribution of X, where X represents the number of heads that we obtain. Now, in 5 tosses of the coin, there can be 0, 1, 2, 3, 4 or 5 heads, and the number of heads is thus a random variable which can take any one of these six values. In order to compute the probabilities of these X-values, we use the binomial formula developed on the following slides.

7 We note that in tossing a fair coin 5 times:
1) every toss results in either a head or a tail,
2) the probability of heads (denoted by p) is equal to ½ every time (in other words, the probability of heads remains constant),
3) every throw is independent of every other throw, and
4) the total number of tosses, i.e. 5, is fixed in advance.

8 The above four points represent the four basic and vitally important PROPERTIES of a binomial experiment.

9 Binomial Distribution:
P(X = x) = C(n, x) p^x q^(n-x),   x = 0, 1, 2, …, n
where
n = the total no. of trials,
p = probability of success in each trial,
q = probability of failure in each trial (i.e. q = 1 - p), and
x = no. of successes in n trials.

10 The binomial distribution has two parameters, n and p.

11 In this example, n = 5 since the coin was thrown 5 times, p = ½ since it is a fair coin, and q = 1 - p = 1 - ½ = ½. Hence
P(X = x) = C(5, x) (½)^x (½)^(5-x) = C(5, x) (½)^5 = C(5, x)/32,   x = 0, 1, …, 5.

12 Putting x = 0:
P(X = 0) = C(5, 0)/32 = 1/32.

13 Putting x = 1:
P(X = 1) = C(5, 1)/32 = 5/32.

14 Similarly, we have:
P(X = 2) = 10/32, P(X = 3) = 10/32, P(X = 4) = 5/32 and P(X = 5) = 1/32.

15 Hence, the binomial distribution for this particular example is as follows:

16 Binomial Distribution in the case of tossing a fair coin five times:
X      P(x)
0      1/32
1      5/32
2      10/32
3      10/32
4      5/32
5      1/32

17 Graphical Representation of the above binomial distribution:
[Bar chart of P(x) against X = 0, 1, 2, 3, 4, 5; vertical axis marked at 2/32, 4/32, 6/32, 8/32 and 10/32.]
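The same probabilities can be checked with a few lines of code. The sketch below is illustrative only (the function name binomial_pmf is ours, not the lecture's), using n = 5 and p = ½ as in the example:

```python
from math import comb

# Binomial formula P(X = x) = C(n, x) * p^x * q^(n - x),
# applied to 5 tosses of a fair coin.
def binomial_pmf(x, n, p):
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

n, p = 5, 0.5
for x in range(n + 1):
    print(f"P(X = {x}) = {binomial_pmf(x, n, p):.5f}")
# 0.03125, 0.15625, 0.31250, 0.31250, 0.15625, 0.03125
# i.e. 1/32, 5/32, 10/32, 10/32, 5/32, 1/32 -- matching the table above.
```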

18 The next question is: What about the mean and the standard deviation of this distribution? We can calculate them just as before, using the formulas

19 Mean of X = E(X) = Σ X P(X)
Var(X) = Σ X² P(X) - [Σ X P(X)]²

20 but it has been mathematically proved that for a binomial distribution given by
P(X = x) = C(n, x) p^x q^(n-x),

21 For a binomial distribution,
E(X) = np and Var(X) = npq, so that
S.D.(X) = √(npq).

22 Coefficient of Variation:
For the above example, n = 5, p = ½ and q = ½. Hence
Mean = E(X) = np = 5(½) = 2.5,
S.D.(X) = √(npq) = √(5 × ½ × ½) = √1.25 ≈ 1.12, and
Coefficient of Variation = (S.D./Mean) × 100 = (1.12/2.5) × 100 ≈ 44.7%.
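These quantities are easy to verify in code. A minimal sketch with the same n and p (variable names are ours):

```python
from math import sqrt

# Mean, standard deviation and coefficient of variation of the
# binomial distribution with n = 5 and p = 1/2.
n, p = 5, 0.5
q = 1 - p

mean = n * p              # E(X) = np = 2.5
sd = sqrt(n * p * q)      # S.D.(X) = sqrt(npq), about 1.12
cv = sd / mean * 100      # coefficient of variation, about 44.7 %

print(f"Mean = {mean}, S.D. = {sd:.2f}, C.V. = {cv:.1f}%")
```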

23 We would have got exactly the same answers if we had applied the LENGTHIER procedure:
E(X) = Σ X P(X) and Var(X) = Σ X² P(X) - [Σ X P(X)]²

24 Graphical Representation of the Mean and Standard Deviation of the Binomial Distribution (n = 5, p = ½):
[Bar chart of P(x) against X = 0, 1, 2, 3, 4, 5, with E(X) = 2.5 and S.D.(X) = 1.12 marked on the horizontal axis; vertical axis marked at 2/32 to 10/32.]

25 What does this mean?
What this means is that if 5 fair coins are tossed an INFINITE no. of times, sometimes we will get no heads out of 5, sometimes 1 head, …, sometimes all 5 heads. But on the AVERAGE we should expect to get 2.5 heads in 5 tosses of the coin, or a total of 25 heads in 50 tosses of the coin.

26 And 1.12 gives a measure of the possible variability in the various numbers of heads that can be obtained in 5 tosses. (As you know, in this problem, the number of heads can range from 0 to 5. Had the coin been tossed 10 times, the number of heads could have varied from 0 to 10, and the standard deviation would have been different.)
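As a quick check of that last remark, here is a short sketch (assuming a fair coin, p = ½) comparing the standard deviation √(npq) for 5 and 10 tosses:

```python
from math import sqrt

# Standard deviation of the number of heads for 5 versus 10 tosses of a fair coin.
for n in (5, 10):
    print(f"n = {n:2d}:  S.D. = {sqrt(n * 0.5 * 0.5):.2f}")
# n =  5:  S.D. = 1.12
# n = 10:  S.D. = 1.58
```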

27 In this example:

28 Note that the binomial distribution is not always symmetrical as in the above example.

29 It will be symmetrical only when p = q = ½ (as in the above example).
[Symmetric bar chart of P(x) against X = 0, 1, 2, 3, 4, 5.]

30 It is skewed to the right if p < q:
[Right-skewed bar chart of P(x) against X = 0, 1, …, 7.]

31 It is skewed to the left if p > q:
[Left-skewed bar chart of P(x) against X = 0, 1, …, 7.]

32 But the degree of skewness (or asymmetry) decreases as n increases.

33 Next, we consider the Fitting of a Binomial Distribution to Real Data.
We illustrate this concept with the help of the following example:

34 EXAMPLE
The following data have been obtained by repeatedly tossing a LOADED die 5 times and noting, on each occasion, the number of sixes obtained. Fit a binomial distribution to these data.

35 [Observed frequency distribution of the number of sixes per set of 5 tosses (total frequency 200); table not reproduced.]

36 SOLUTION
To fit a binomial distribution, we need to find n and p. Here n = 5, the largest x-value. To find p, we use the relationship X̄ = np.

37 The rationale of this step is that, as indicated in the last lecture, the mean of a binomial probability distribution is equal to np, i.e. μ = np. But, here, we are not dealing with a probability distribution i.e. the entire population of all possible sets of throws of a loaded die --- we only have a sample of throws at our disposal.

38 As such, μ is not available to us, and all we can do is to replace it by its estimate X̄.
Hence, our equation becomes X̄ = np.

39 Now, we have:
X̄ = ΣfX / Σf = 398/200 = 1.99.

40 Using the relationship x = np, we get 5p = 1.99 or p = 0.398.
This value of p seems to indicate clearly that the die is not fair at all! (Had it been a fair die, the probability of getting a six would have been 1/6 i.e ; a value of p = is very different from )28

41 Letting the random variable X represent the number of sixes, the above calculations yield the fitted binomial distribution as
P(X = x) = C(5, x) (0.398)^x (0.602)^(5-x),   x = 0, 1, …, 5.

42 Hence the probabilities and expected frequencies are calculated as below:

43 [Table of fitted probabilities P(x) and expected frequencies 200 × P(x) for x = 0, 1, …, 5; see the calculation sketch after slide 44.]

44 In the above table, the expected frequencies are obtained by multiplying each of the probabilities by 200, the total observed frequency. In the entire above procedure, we are assuming that the given frequency distribution has the characteristics of the fitted theoretical binomial distribution.
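The fitting step can be reproduced with a short sketch. Only the sample mean (1.99) and the total frequency (200) from the example are used; the variable names are ours:

```python
from math import comb

# Fit a binomial distribution by the method of moments: estimate p from
# x-bar = n*p, then compute fitted probabilities and expected frequencies.
n = 5
x_bar = 1.99
total_freq = 200

p_hat = x_bar / n          # 0.398
q_hat = 1 - p_hat          # 0.602

for x in range(n + 1):
    prob = comb(n, x) * p_hat**x * q_hat**(n - x)
    print(f"x = {x}:  P(x) = {prob:.4f},  expected frequency = {total_freq * prob:.1f}")
```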

45 Comparing the observed frequencies with the expected frequencies, we obtain:

46 [Table comparing the observed and expected frequencies for x = 0, 1, …, 5; not reproduced.]

47 The graphical representation of the observed frequencies as well as the expected frequencies is as follows:

48 Graphical Representation of the Observed and Expected Frequencies:
[Bar chart of observed and expected frequencies against X = 0, 1, …, 5; vertical axis marked at 15, 30, 45, 60 and 75.]

49 The above graph quite clearly indicates that there is not much discrepancy between the observed and the expected frequencies. Hence, we can say that it is a reasonably good fit.

50 There is a procedure known as the Chi-Square Test of Goodness of Fit which enables us to determine in a formal, mathematical manner whether or not the theoretical distribution fits the observed distribution reasonably well. This test comes under the realm of Inferential Statistics --- that area which we will deal with during the last 15 lectures of this course.

51 Let us consider a real-life application of the binomial distribution:

52 AN EXAMPLE FROM INDUSTRY:
Suppose that past records indicate that the proportion of defective articles produced by a certain factory is 7%, and suppose that a law NEWLY instituted in that country states that there should not be more than 5% defectives.

53 Suppose that the factory-owner makes the statement that his machinery has been overhauled so that the number of defectives has DECREASED. In order to examine this claim, the relevant government department decides to send an inspector to examine a sample of 20 items. What is the probability that the inspector will find 2 or more defective items in his sample (so that a fine will be imposed on the factory)?

54 SOLUTION
The first step is to identify the NATURE of the situation. If we study this problem closely, we realize that we are dealing with a binomial experiment, because all four properties of a binomial experiment are fulfilled:

55 Properties of a Binomial Experiment
1. Every item selected will either be defective (i.e. a success) or not defective (i.e. a failure).
2. Every item drawn is independent of every other item.
3. The probability of obtaining a defective item, i.e. 7%, is the same (constant) for all items. (This probability figure is according to the relative frequency definition of probability.)
4. The number of items drawn is fixed in advance, i.e. 20.

56 Hence, we are in a position to apply the binomial formula
P(X = x) = C(n, x) p^x q^(n-x).
Substituting n = 20 and p = 0.07 (so that q = 0.93), we obtain:
P(X = x) = C(20, x) (0.07)^x (0.93)^(20-x),   x = 0, 1, …, 20.

57 Now
P(X ≥ 2) = 1 - P(X < 2) = 1 - [P(X = 0) + P(X = 1)]
= 1 - [(0.93)^20 + 20(0.07)(0.93)^19]
= 1 - [0.2342 + 0.3526] ≈ 0.4131.
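This figure is easy to verify; a minimal sketch using only the numbers stated above (n = 20, p = 0.07):

```python
from math import comb

# Probability that the inspector finds 2 or more defectives among 20 items,
# when the proportion of defectives is 0.07.
n, p = 20, 0.07
q = 1 - p

p0 = q ** n                          # P(X = 0)
p1 = comb(n, 1) * p * q ** (n - 1)   # P(X = 1)

print(f"P(X >= 2) = {1 - (p0 + p1):.4f}")   # about 0.41
```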

58 Hence the probability is SUBSTANTIAL, i.e. more than 40%, that the inspector will find two or more defective articles among the 20 that he will inspect. In other words, there is a CONSIDERABLE chance that the factory will be fined.

59 The point to be realized is that, generally speaking, whenever we are dealing with a ‘success / failure’ situation, we are dealing with what can be a binomial experiment.

60 (For EXAMPLE, if we are interested in determining any of the following proportions, we are dealing with a BINOMIAL situation:
1. Proportion of smokers in a city: smoker → success, non-smoker → failure.
2. Proportion of literates in a community, i.e. the literacy rate: literate → success, illiterate → failure.
3. Proportion of males in a city, i.e. the sex ratio.)

61 The next distribution that we are going to discuss is the HYPERGEOMETRIC PROBABILITY DISTRIBUTION.
PROPERTIES OF A HYPERGEOMETRIC EXPERIMENT
i) The outcomes of each trial may be classified into one of two categories, success and failure.
ii) The probability of success changes on each trial.
iii) The successive trials are not independent.
iv) The experiment is repeated a fixed number of times.

62 There are many experiments in which the condition of independence is violated and the probability of success does not remain constant for all trials. Such experiments are called hypergeometric experiments. In other words, a hypergeometric experiment has the four properties listed above.

63 The number of successes, X, in a hypergeometric experiment is called a hypergeometric random variable, and its probability distribution is called the hypergeometric distribution.

64 When the hypergeometric random variable X assumes a value x, the hypergeometric probability distribution is given by the formula
P(X = x) = [C(k, x) C(N - k, n - x)] / C(N, n).

65 The hypergeometric probability distribution has three parameters: N, n and k,
where
N = number of units in the population,
n = number of units in the sample, and
k = number of successes in the population.

66 The hypergeometric probability distribution is appropriate when
i) a random sample of size n is drawn WITHOUT REPLACEMENT from a finite population of N units; and
ii) k of the units are of one kind (classified as success) and the remaining N - k of another kind (classified as failure).
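The formula above translates directly into code. The sketch below is illustrative only; the values N = 50, k = 10 and n = 5 are assumptions for the sake of the example, not figures from the lecture:

```python
from math import comb

# Hypergeometric formula P(X = x) = C(k, x) * C(N - k, n - x) / C(N, n):
# x successes in a sample of n drawn without replacement from a population
# of N units containing k successes.
def hypergeom_pmf(x, N, n, k):
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

N, n, k = 50, 5, 10    # illustrative values
for x in range(n + 1):
    print(f"P(X = {x}) = {hypergeom_pmf(x, N, n, k):.4f}")
```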

67 IN TODAY’S LECTURE, YOU LEARNT
Binomial Distribution
Fitting a Binomial Distribution to Real Data
An Introduction to the Hypergeometric Distribution

68 IN THE NEXT LECTURE, YOU WILL LEARN
Hypergeometric Distribution (in some detail)
Poisson Distribution
Limiting Approximation to the Binomial
Poisson Process
Continuous Uniform Distribution

