Chapter 5: Joint Probability Distributions and Random Samples
Copyright © Cengage Learning. All rights reserved.
5.3 Statistics and Their Distributions
Statistics and Their Distributions
Consider selecting two different samples of size n from the same population distribution. The xi’s in the second sample will virtually always differ at least a bit from those in the first sample. For example, a first sample of n = 3 cars of a particular type might result in fuel efficiencies x1 = 30.7, x2 = 29.4, x3 = 31.1, whereas a second sample may give x1 = 28.8, x2 = 30.0, and x3 = 32.5. Before we obtain data, there is uncertainty about the value of each xi.
Statistics and Their Distributions
Because of this uncertainty, before the data become available we view each observation as a random variable and denote the sample by X1, X2, …, Xn (uppercase letters for random variables). This variation in observed values in turn implies that the value of any function of the sample observations, such as the sample mean, sample standard deviation, or sample fourth spread, also varies from sample to sample. That is, prior to obtaining x1, …, xn, there is uncertainty as to the value of x̄, the value of s, and so on.
Statistics and Their Distributions
Definition
A statistic is any quantity whose value can be calculated from sample data. Prior to obtaining data, there is uncertainty as to what value of any particular statistic will result. Therefore, a statistic is a random variable and will be denoted by an uppercase letter; a lowercase letter is used to represent the calculated or observed value of the statistic.
Statistics and Their Distributions
Thus the sample mean, regarded as a statistic (before a sample has been selected or an experiment carried out), is denoted by X̄; the calculated value of this statistic is x̄. Similarly, S represents the sample standard deviation thought of as a statistic, and its computed value is s. If samples of two different types of bricks are selected and the individual compressive strengths are denoted by X1, …, Xm and Y1, …, Yn, respectively, then the statistic X̄ − Ȳ, the difference between the two sample mean compressive strengths, is often of great interest.
Statistics and Their Distributions
Any statistic, being a random variable, has a probability distribution. In particular, the sample mean X̄ has a probability distribution. Suppose, for example, that n = 2 components are randomly selected and the number of breakdowns while under warranty is determined for each one. Possible values for the sample mean number of breakdowns X̄ are 0 (if X1 = X2 = 0), .5 (if either X1 = 0 and X2 = 1 or X1 = 1 and X2 = 0), 1, 1.5, and so on.
Statistics and Their Distributions
The probability distribution of X̄ specifies P(X̄ = 0), P(X̄ = .5), and so on, from which other probabilities such as P(1 ≤ X̄ ≤ 3) and P(X̄ ≥ 2.5) can be calculated. Similarly, if for a sample of size n = 2 the only possible values of the sample variance are 0, 12.5, and 50 (which is the case if X1 and X2 can each take on only the values 40, 45, or 50), then the probability distribution of S² gives P(S² = 0), P(S² = 12.5), and P(S² = 50).
Statistics and Their Distributions
The probability distribution of a statistic is sometimes referred to as its sampling distribution to emphasize that it describes how the statistic varies in value across all samples that might be selected.
Random Samples
Random Samples
Definition
The random variables X1, X2, …, Xn are said to form a (simple) random sample of size n if
1. The Xi’s are independent random variables.
2. Every Xi has the same probability distribution.
Random Samples
Conditions 1 and 2 can be paraphrased by saying that the Xi’s are independent and identically distributed (iid). If sampling is either with replacement or from an infinite (conceptual) population, Conditions 1 and 2 are satisfied exactly. These conditions will be approximately satisfied if sampling is without replacement but the sample size n is much smaller than the population size N.
Random Samples
In practice, if n/N ≤ .05 (at most 5% of the population is sampled), we can proceed as if the Xi’s form a random sample. The virtue of this sampling method is that the probability distribution of any statistic can be more easily obtained than for any other sampling method. There are two general methods for obtaining information about a statistic’s sampling distribution. One method involves calculations based on probability rules, and the other involves carrying out a simulation experiment.
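As a concrete illustration of the simulation method, the following Python sketch (not from the text; the exponential population with mean 10 is an arbitrary choice) approximates the sampling distribution of X̄ by drawing many samples and recording the mean of each:

```python
import random
from statistics import mean

random.seed(0)

# Simulation approach: draw many samples of size n from some population
# (here, arbitrarily, an exponential distribution with mean 10) and record
# x-bar for each sample; the collection of simulated means approximates
# the sampling distribution of X-bar.
n, reps = 10, 10_000
xbars = [mean(random.expovariate(1 / 10) for _ in range(n)) for _ in range(reps)]

print(mean(xbars))             # close to the population mean 10
print(min(xbars), max(xbars))  # the spread of the simulated sample means
```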
Deriving a Sampling Distribution
Deriving a Sampling Distribution
Probability rules can be used to obtain the distribution of a statistic provided that it is a “fairly simple” function of the Xi’s and either there are relatively few different X values in the population or else the population distribution has a “nice” form. Our next example illustrates such a situation.
Example 5.21
A certain brand of MP3 player comes in three configurations: a model with 2 GB of memory, costing $80, a 4 GB model priced at $100, and an 8 GB version with a price tag of $120. If 20% of all purchasers choose the 2 GB model, 30% choose the 4 GB model, and 50% choose the 8 GB model, then the probability distribution of the cost X of a single randomly selected MP3 player purchase is given by

x | 80 | 100 | 120
p(x) | .20 | .30 | .50

with μ = 106 and σ² = 244. (5.2)
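As a quick check, μ and σ² follow directly from the definitions of the mean and variance of a discrete random variable:

```latex
\mu = E(X) = \sum_x x\,p(x) = 80(.20) + 100(.30) + 120(.50) = 106
\sigma^2 = E(X^2) - \mu^2 = 80^2(.20) + 100^2(.30) + 120^2(.50) - 106^2 = 11{,}480 - 11{,}236 = 244
```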
Example 5.21 cont’d
Suppose that on a particular day only two MP3 players are sold. Let X1 = the revenue from the first sale and X2 = the revenue from the second. Suppose that X1 and X2 are independent, each with the probability distribution shown in (5.2) [so that X1 and X2 constitute a random sample from the distribution (5.2)].
Example 5.21 cont’d
Table 5.2 lists the possible (x1, x2) pairs, the probability of each [computed using (5.2) and the assumption of independence], and the resulting x̄ and s² values. [Note that when n = 2, s² = (x1 − x̄)² + (x2 − x̄)².]

Table 5.2: Outcomes, Probabilities, and Values of x̄ and s² for Example 5.21

(x1, x2) | probability | x̄ | s²
(80, 80) | .04 | 80 | 0
(80, 100) | .06 | 90 | 200
(80, 120) | .10 | 100 | 800
(100, 80) | .06 | 90 | 200
(100, 100) | .09 | 100 | 0
(100, 120) | .15 | 110 | 200
(120, 80) | .10 | 100 | 800
(120, 100) | .15 | 110 | 200
(120, 120) | .25 | 120 | 0
Example 5.21 cont’d
Now to obtain the probability distribution of X̄, the sample average revenue per sale, we must consider each possible value and compute its probability. For example, x̄ = 100 occurs three times in the table, with probabilities .10, .09, and .10, so
pX̄(100) = P(X̄ = 100) = .10 + .09 + .10 = .29
Similarly,
pS²(800) = P(S² = 800) = P(X1 = 80, X2 = 120 or X1 = 120, X2 = 80) = .10 + .10 = .20
Example 5.21 cont’d
The complete sampling distributions of X̄ and S² appear in (5.3) and (5.4):

x̄ | 80 | 90 | 100 | 110 | 120
pX̄(x̄) | .04 | .12 | .29 | .30 | .25     (5.3)

s² | 0 | 200 | 800
pS²(s²) | .38 | .42 | .20     (5.4)
Example 5.21 cont’d
Figure 5.8 pictures a probability histogram for both the original distribution (5.2) and the X̄ distribution (5.3). The figure suggests first that the mean (expected value) of the X̄ distribution is equal to the mean 106 of the original distribution, since both histograms appear to be centered at the same place.
Figure 5.8: Probability histograms for the underlying distribution and the x̄ distribution in Example 5.21
Example 5.21 cont’d
From (5.3),
E(X̄) = (80)(.04) + (90)(.12) + (100)(.29) + (110)(.30) + (120)(.25) = 106 = μ
Second, it appears that the X̄ distribution has smaller spread (variability) than the original distribution, since probability mass has moved in toward the mean. Again from (5.3),
V(X̄) = (80²)(.04) + (90²)(.12) + (100²)(.29) + (110²)(.30) + (120²)(.25) − (106)² = 11,358 − 11,236 = 122 = 244/2 = σ²/2
Example 5.21 cont’d
The variance of X̄ is precisely half the original variance (because n = 2). Using (5.4), the mean value of S² is
E(S²) = Σ s² · pS²(s²) = (0)(.38) + (200)(.42) + (800)(.20) = 244 = σ²
That is, the X̄ sampling distribution is centered at the population mean μ, and the S² sampling distribution is centered at the population variance σ².
Example 5.21 cont’d
If there had been four purchases on the day of interest, the sample average revenue X̄ would be based on a random sample of four Xi’s, each having the distribution (5.2). More calculation eventually yields the pmf of X̄ for n = 4 as

x̄ | 80 | 85 | 90 | 95 | 100 | 105 | 110 | 115 | 120
pX̄(x̄) | .0016 | .0096 | .0376 | .0936 | .1761 | .2340 | .2350 | .1500 | .0625
Example 5.21 cont’d
From this, E(X̄) = 106 = μ and V(X̄) = 61 = σ²/4. Figure 5.9 is a probability histogram of this pmf.
Figure 5.9: Probability histogram for X̄ based on n = 4 in Example 5.21
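The n = 2 and n = 4 distributions above can be reproduced by brute-force enumeration of all possible samples. The following Python sketch (an illustration, not part of the text) does exactly that for the distribution in (5.2):

```python
from itertools import product
from collections import defaultdict
from statistics import mean, variance

# Population distribution (5.2): cost of a single MP3 player purchase
pmf = {80: 0.20, 100: 0.30, 120: 0.50}

def sampling_distributions(n):
    """Enumerate every possible sample of size n and accumulate the exact
    sampling distributions of the sample mean and the sample variance."""
    dist_xbar, dist_s2 = defaultdict(float), defaultdict(float)
    for sample in product(pmf, repeat=n):  # all n-tuples of population values
        p = 1.0
        for x in sample:                   # independence: multiply probabilities
            p *= pmf[x]
        dist_xbar[mean(sample)] += p
        dist_s2[variance(sample)] += p     # statistics.variance uses divisor n - 1
    tidy = lambda d: {k: round(v, 4) for k, v in sorted(d.items())}
    return tidy(dist_xbar), tidy(dist_s2)

xbar2, s2_2 = sampling_distributions(2)
print(xbar2)  # {80: .04, 90: .12, 100: .29, 110: .30, 120: .25} -- matches (5.3)
print(s2_2)   # {0: .38, 200: .42, 800: .20}                     -- matches (5.4)
print(sampling_distributions(4)[0])  # matches the n = 4 pmf above
```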
5.4 The Distribution of the Sample Mean
The Distribution of the Sample Mean
The importance of the sample mean X̄ springs from its use in drawing conclusions about the population mean μ. Some of the most frequently used inferential procedures are based on properties of the sampling distribution of X̄. A preview of these properties appeared in the calculations and simulation experiments of the previous section, where we noted relationships between E(X̄) and μ and also among V(X̄), σ², and n.
The Distribution of the Sample Mean
Proposition
Let X1, X2, …, Xn be a random sample from a distribution with mean value μ and standard deviation σ. Then
1. E(X̄) = μ
2. V(X̄) = σ²/n and σX̄ = σ/√n
In addition, with To = X1 + … + Xn (the sample total), E(To) = nμ, V(To) = nσ², and σTo = √n σ.
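For reference, both results follow in one line each from the rules for means and variances of linear combinations (independence of the Xi’s is needed only for the variance):

```latex
E(\bar{X}) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{n\mu}{n} = \mu,
\qquad
V(\bar{X}) = \frac{1}{n^{2}}\sum_{i=1}^{n} V(X_i) = \frac{n\sigma^{2}}{n^{2}} = \frac{\sigma^{2}}{n}
```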
The Distribution of the Sample Mean
According to Result 1, the sampling (i.e., probability) distribution of X̄ is centered precisely at the mean of the population from which the sample has been selected. Result 2 shows that the X̄ distribution becomes more concentrated about μ as the sample size n increases. In marked contrast, the distribution of To becomes more spread out as n increases. Averaging moves probability in toward the middle, whereas totaling spreads probability out over a wider and wider range of values.
The Distribution of the Sample Mean
The standard deviation σX̄ = σ/√n is often called the standard error of the mean; it describes the magnitude of a typical or representative deviation of the sample mean from the population mean.
Example 5.25
In a notched tensile fatigue test on a titanium specimen, the expected number of cycles to first acoustic emission (used to indicate crack initiation) is μ = 28,000, and the standard deviation of the number of cycles is σ = 5000. Let X1, X2, …, X25 be a random sample of size 25, where each Xi is the number of cycles on a different randomly selected specimen. Then the expected value of the sample mean number of cycles until first emission is E(X̄) = μ = 28,000, and the expected total number of cycles for the 25 specimens is E(To) = nμ = 25(28,000) = 700,000.
Example 5.25 cont’d
The standard deviation of X̄ (the standard error of the mean) and of To are
σX̄ = σ/√n = 5000/√25 = 1000    σTo = √n σ = √25 (5000) = 25,000
If the sample size increases to n = 100, E(X̄) is unchanged, but σX̄ = 500, half of its previous value (the sample size must be quadrupled to halve the standard deviation of X̄).
The Case of a Normal Population Distribution
The Case of a Normal Population Distribution
Proposition
Let X1, X2, …, Xn be a random sample from a normal distribution with mean μ and standard deviation σ. Then for any n, X̄ is normally distributed (with mean μ and standard deviation σ/√n), as is To (with mean nμ and standard deviation √n σ).
We know everything there is to know about the X̄ and To distributions when the population distribution is normal. In particular, probabilities such as P(a ≤ X̄ ≤ b) and P(c ≤ To ≤ d) can be obtained simply by standardizing.
The Case of a Normal Population Distribution
Figure 5.15 illustrates the proposition.
Figure 5.15: A normal population distribution and X̄ sampling distributions
Example 5.26
The distribution of egg weights (g) of a certain type is normal with mean value 53 and standard deviation .3 (consistent with data in the article “Evaluation of Egg Quality Traits of Chickens Reared under Backyard System in Western Uttar Pradesh” (Indian J. of Poultry Sci., 2009: 261–262)). Let X1, X2, …, X12 denote the weights of a dozen randomly selected eggs; these Xi’s constitute a random sample of size 12 from the specified normal distribution.
Example 5.26 cont’d
The total weight of the 12 eggs is To = X1 + … + X12; it is normally distributed with mean value E(To) = nμ = 12(53) = 636 and variance V(To) = nσ² = 12(.3)² = 1.08, so σTo = 1.039. The probability that the total weight is between 635 and 640 is now obtained by standardizing and referring to Appendix Table A.3:
P(635 < To < 640) = P((635 − 636)/1.039 < Z < (640 − 636)/1.039) = P(−.96 < Z < 3.85) = Φ(3.85) − Φ(−.96) ≈ 1 − .1685 = .8315
Example 5.26 cont’d
If cartons containing a dozen eggs are repeatedly selected, in the long run slightly more than 83% of the cartons will have a total egg weight between 635 g and 640 g. Notice that 635 < To < 640 is equivalent to 52.92 < X̄ < 53.33 (divide each term in the original system of inequalities by 12). Thus P(52.92 < X̄ < 53.33) ≈ .8315. This latter probability can also be obtained by standardizing X̄ directly.
Example 5.26 cont’d
Now consider randomly selecting just four of these eggs. The sample mean weight X̄ is then normally distributed with mean value μ = 53 and standard deviation σX̄ = σ/√n = .3/√4 = .15. The probability that the sample mean weight exceeds 53.5 g is then
P(X̄ > 53.5) = P(Z > (53.5 − 53)/.15) = P(Z > 3.33) = 1 − Φ(3.33) = .0004
Because 53.5 is 3.33 standard deviations (of X̄) larger than the mean value 53, it is exceedingly unlikely that the sample mean will exceed 53.5.
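The table lookups above can be checked in a few lines of Python; this sketch (an illustration, not part of the text) uses the standard library’s NormalDist:

```python
from statistics import NormalDist
from math import sqrt

mu, sigma, n = 53, 0.3, 12

# Total weight of a dozen eggs: To ~ N(n*mu, sqrt(n)*sigma)
total = NormalDist(n * mu, sqrt(n) * sigma)
print(total.cdf(640) - total.cdf(635))  # ~0.832; the z-table version rounds to .8315

# Sample mean of four eggs: X-bar ~ N(mu, sigma / sqrt(4))
xbar = NormalDist(mu, sigma / sqrt(4))
print(1 - xbar.cdf(53.5))               # ~0.0004
```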
The Central Limit Theorem
The Central Limit Theorem
When the Xi’s are normally distributed, so is X̄ for every sample size n. The derivations in Example 5.21 and the simulation experiment of Example 5.24 suggest that even when the population distribution is highly nonnormal, averaging produces a distribution more bell-shaped than the one being sampled. A reasonable conjecture is that if n is large, a suitable normal curve will approximate the actual distribution of X̄. The formal statement of this result is the most important theorem of probability.
Figure: Sampling distributions of X̄ for different populations and different sample sizes
The Central Limit Theorem
Theorem (The Central Limit Theorem)
Let X1, X2, …, Xn be a random sample from a distribution with mean μ and variance σ². Then if n is sufficiently large, X̄ has approximately a normal distribution with mean μ and variance σ²/n, and To also has approximately a normal distribution, with mean nμ and variance nσ². The larger the value of n, the better the approximation.
The Central Limit Theorem
Figure 5.16 illustrates the Central Limit Theorem.
Figure 5.16: The Central Limit Theorem illustrated
Example 5.27 The amount of a particular impurity in a batch of a certain chemical product is a random variable with mean value 4.0 g and standard deviation 1.5 g. If 50 batches are independently prepared, what is the (approximate) probability that the sample average amount of impurity is between 3.5 and 3.8 g? According to the rule of thumb to be stated shortly, n = 50 is large enough for the CLT to be applicable.
Example 5.27 cont’d
X̄ then has approximately a normal distribution with mean value μ = 4.0 and σX̄ = σ/√n = 1.5/√50 = .2121, so
P(3.5 ≤ X̄ ≤ 3.8) ≈ P((3.5 − 4.0)/.2121 ≤ Z ≤ (3.8 − 4.0)/.2121) = Φ(−.94) − Φ(−2.36) = .1736 − .0091 = .1645
Example 5.27 cont’d
Now consider randomly selecting 100 batches, and let To represent the total amount of impurity in these batches. Then the mean value and standard deviation of To are 100(4.0) = 400 and √100 (1.5) = 15, respectively, and the CLT implies that To has approximately a normal distribution. The probability that this total is at most 425 g is
P(To ≤ 425) ≈ P(Z ≤ (425 − 400)/15) = Φ(1.67) = .9525
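Only the mean and standard deviation of the impurity distribution are given, so the CLT answer can be sanity-checked by simulation under some assumed batch distribution. This sketch is hypothetical: it arbitrarily uses a gamma distribution whose parameters are chosen to give mean 4.0 and standard deviation 1.5:

```python
import random
from statistics import mean

random.seed(1)

# Hypothetical batch-impurity distribution with mean 4.0 and sd 1.5:
# a gamma with shape k and scale theta has mean k*theta and variance k*theta^2,
# so k = (4.0/1.5)**2 ~ 7.11 and theta = 1.5**2/4.0 = 0.5625.
k, theta = (4.0 / 1.5) ** 2, 1.5 ** 2 / 4.0

def sample_mean(n):
    return mean(random.gammavariate(k, theta) for _ in range(n))

trials = 20_000
hits = sum(3.5 <= sample_mean(50) <= 3.8 for _ in range(trials))
print(hits / trials)  # close to the CLT approximation .1645 (exact value
                      # depends on the assumed shape of the distribution)
```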
The Central Limit Theorem
The CLT provides insight into why many random variables have probability distributions that are approximately normal. For example, the measurement error in a scientific experiment can be thought of as the sum of a number of underlying perturbations and errors of small magnitude. A practical difficulty in applying the CLT is in knowing when n is sufficiently large. The problem is that the accuracy of the approximation for a particular n depends on the shape of the original underlying distribution being sampled.
The Central Limit Theorem
If the underlying distribution is close to a normal density curve, then the approximation will be good even for a small n, whereas if it is far from being normal, then a large n will be required. A widely used rule of thumb (the one alluded to in Example 5.27) is that the CLT can safely be applied when n > 30. There are population distributions for which even an n of 40 or 50 does not suffice, but such distributions are rarely encountered in practice.
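The speed of convergence for a skewed population can be seen directly by simulation; this sketch (an illustration, not from the text) uses an exponential population with mean 1, for which the normal approximation gives P(X̄ ≤ 1) = Φ(0) = .5 at every n:

```python
import random
from statistics import mean

random.seed(2)

# For an exponential population with mean 1, the CLT approximation of
# P(X-bar <= 1) is exactly .5 for every n; the simulated (true) probability
# starts noticeably above .5 for small n and approaches .5 as n grows.
for n in (5, 15, 30, 50):
    sims = [mean(random.expovariate(1.0) for _ in range(n)) for _ in range(20_000)]
    print(n, sum(x <= 1 for x in sims) / len(sims))  # ~.56, .53, .52, .52
```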
Chapter 6: Point Estimation
Copyright © Cengage Learning. All rights reserved.
6.1 Some General Concepts of Point Estimation
Some General Concepts of Point Estimation
Statistical inference is almost always directed toward drawing some type of conclusion about one or more parameters (population characteristics). To do so requires that an investigator obtain sample data from each of the populations under study. Conclusions can then be based on the computed values of various sample quantities. For example, let μ (a parameter) denote the true average breaking strength of wire connections used in bonding semiconductor wafers.
Some General Concepts of Point Estimation
A random sample of n = 10 connections might be made, and the breaking strength of each one determined, resulting in observed strengths x1, x2, …, x10. The sample mean breaking strength x̄ could then be used to draw a conclusion about the value of μ. Similarly, if σ² is the variance of the breaking strength distribution (the population variance, another parameter), the value of the sample variance s² can be used to infer something about σ².
Some General Concepts of Point Estimation
When discussing general concepts and methods of inference, it is convenient to have a generic symbol for the parameter of interest. We will use the Greek letter θ for this purpose. The objective of point estimation is to select a single number, based on sample data, that represents a sensible value for θ. As an example, the parameter of interest might be μ, the true average lifetime of batteries of a certain type.
Some General Concepts of Point Estimation
A random sample of n = 3 batteries might yield observed lifetimes (hours) x1 = 5.0, x2 = 6.4, x3 = 5.9. The computed value of the sample mean lifetime is x̄ = 5.77, and it is reasonable to regard 5.77 as a very plausible value of μ, our “best guess” for the value of μ based on the available sample information. Suppose we want to estimate a parameter of a single population (e.g., μ or σ) based on a random sample of size n.
Some General Concepts of Point Estimation
The difference between the two sample mean strengths is X̄ − Ȳ, the natural statistic for making inferences about μ1 − μ2, the difference between the population mean strengths.
Definition
A point estimate of a parameter θ is a single number that can be regarded as a sensible value for θ. A point estimate is obtained by selecting a suitable statistic and computing its value from the given sample data. The selected statistic is called the point estimator of θ.
Some General Concepts of Point Estimation
In the foregoing battery example, the estimator used to obtain the point estimate of μ was X̄, and the point estimate of μ was 5.77. If the three observed lifetimes had instead been x1 = 5.6, x2 = 4.5, and x3 = 6.1, use of the estimator X̄ would have resulted in the estimate x̄ = (5.6 + 4.5 + 6.1)/3 = 5.40. The symbol θ̂ (“theta hat”) is customarily used to denote both the estimator of θ and the point estimate resulting from a given sample.
Some General Concepts of Point Estimation
Thus μ̂ = X̄ is read as “the point estimator of μ is the sample mean X̄.” The statement “the point estimate of μ is 5.77” can be written concisely as μ̂ = 5.77. Notice that in writing θ̂ = 72.5, there is no indication of how this point estimate was obtained (what statistic was used). It is recommended that both the estimator and the resulting estimate be reported.
Example 6.2
Reconsider the accompanying 20 observations on dielectric breakdown voltage for pieces of epoxy resin. The pattern in the normal probability plot given there is quite straight, so we now assume that the distribution of breakdown voltage is normal with mean value μ. Because normal distributions are symmetric, μ is also the median of the distribution.
Example 6.2 cont’d
The given observations are then assumed to be the result of a random sample X1, X2, …, X20 from this normal distribution. Consider the following estimators and resulting estimates for μ:
a. Estimator = X̄, estimate = x̄ = Σxi/n
b. Estimator = X̃, the sample median; estimate = x̃, the average of the two middle ordered observations
c. Estimator = [min(Xi) + max(Xi)]/2, the average of the two extreme observations; estimate = [min(xi) + max(xi)]/2
Example 6.2 cont’d
d. Estimator = X̄tr(10), the 10% trimmed mean (discard the smallest and largest 10% of the sample and then average); estimate = x̄tr(10)
Each one of the estimators (a)–(d) uses a different measure of the center of the sample to estimate μ. Which of the estimates is closest to the true value?
Example 6.2 cont’d
We cannot answer this without knowing the true value. A question that can be answered is, “Which estimator, when used on other samples of Xi’s, will tend to produce estimates closest to the true value?” We will shortly consider this type of question.
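The four estimators are easy to compare on any sample. The sketch below uses a small made-up data set (not the textbook’s 20 breakdown-voltage observations, which are not reproduced here) to compute all four estimates:

```python
from statistics import mean, median

# Hypothetical sample; NOT the 20 breakdown-voltage observations from the text
x = [24.5, 25.6, 26.3, 26.4, 27.2, 27.3, 27.9, 28.0, 28.5, 30.9]

def trimmed_mean(data, frac=0.10):
    """Discard the smallest and largest frac of the sample, then average."""
    k = round(len(data) * frac)
    return mean(sorted(data)[k:len(data) - k])

print(mean(x))                # (a) sample mean
print(median(x))              # (b) sample median
print((min(x) + max(x)) / 2)  # (c) midrange: average of the two extremes
print(trimmed_mean(x))        # (d) 10% trimmed mean
```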
Some General Concepts of Point Estimation
In the best of all possible worlds, we could find an estimator θ̂ for which θ̂ = θ always. However, θ̂ is a function of the sample Xi’s, so it is a random variable. For some samples, θ̂ will yield a value larger than θ, whereas for other samples θ̂ will underestimate θ. If we write
θ̂ = θ + error of estimation
then an accurate estimator would be one resulting in small estimation errors, so that estimated values will be near the true value.
Unbiased Estimators
Unbiased Estimators
Imagine taking repeated measurements with each of two instruments. The first yields observations whose distribution is centered at the true value of the quantity being measured; the second instrument yields observations that have a systematic error component, or bias.
Definition
A point estimator θ̂ is said to be an unbiased estimator of θ if E(θ̂) = θ for every possible value of θ. If θ̂ is not unbiased, the difference E(θ̂) − θ is called the bias of θ̂.
That is, θ̂ is unbiased if its probability (i.e., sampling) distribution is always “centered” at the true value of the parameter.
Unbiased Estimators
Proposition
When X is a binomial random variable with parameters n and p, the sample proportion p̂ = X/n is an unbiased estimator of p.
Thus E(p̂) = E(X/n) = (1/n)E(X) = (1/n)(np) = p. No matter what the true value of p is, the distribution of the estimator p̂ will be centered at the true value.
Unbiased Estimators
Proposition
Let X1, X2, …, Xn be a random sample from a distribution with mean μ and variance σ². Then the estimator
S² = Σ(Xi − X̄)² / (n − 1)
is an unbiased estimator of σ²: E(S²) = σ².
The estimator that uses divisor n can be expressed as (n − 1)S²/n, so
E[(n − 1)S²/n] = ((n − 1)/n) E(S²) = ((n − 1)/n) σ² ≠ σ²
This estimator is therefore biased; it tends to underestimate σ².
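A quick simulation makes the bias of the divisor-n version visible; this sketch (an illustration, not from the text) samples repeatedly from a normal population with σ² = 4:

```python
import random
from statistics import mean

random.seed(3)

def s2(data, divisor_n=False):
    """Sample variance with divisor n - 1 (default) or divisor n."""
    xbar = mean(data)
    ss = sum((x - xbar) ** 2 for x in data)
    return ss / len(data) if divisor_n else ss / (len(data) - 1)

n, reps = 5, 50_000
samples = [[random.gauss(10, 2) for _ in range(n)] for _ in range(reps)]
print(mean(s2(s) for s in samples))                  # ~4.0: unbiased
print(mean(s2(s, divisor_n=True) for s in samples))  # ~(n-1)/n * 4 = 3.2: biased low
```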
Unbiased Estimators
In Example 6.2, we proposed several different estimators for the mean μ of a normal distribution. If there were a unique unbiased estimator for μ, the estimation problem would be resolved by using that estimator. Unfortunately, this is not the case.
Proposition
If X1, X2, …, Xn is a random sample from a distribution with mean μ, then X̄ is an unbiased estimator of μ. If in addition the distribution is continuous and symmetric, then the sample median X̃ and any trimmed mean are also unbiased estimators of μ.
Reporting a Point Estimate: The Standard Error
Reporting a Point Estimate: The Standard Error
Besides reporting the value of a point estimate, some indication of its precision should be given. The usual measure of precision is the standard error of the estimator used.
Definition
The standard error of an estimator θ̂ is its standard deviation σθ̂ = √V(θ̂). If the standard error itself involves unknown parameters whose values can be estimated, substituting these estimates into σθ̂ yields the estimated standard error of the estimator, denoted σ̂θ̂ or sθ̂.
Example 6.9 (Example 6.2 continued)
Assuming that breakdown voltage is normally distributed, X̄ is the best estimator of μ. If the value of σ is known to be 1.5, the standard error of X̄ is
σX̄ = σ/√n = 1.5/√20 = .335
If, as is usually the case, the value of σ is unknown, the estimate σ̂ = s is substituted into σX̄ to obtain the estimated standard error sX̄ = s/√20.
Reporting a Point Estimate: The Standard Error
When the point estimator θ̂ has approximately a normal distribution, which will often be the case when n is large, then we can be reasonably confident that the true value of θ lies within approximately 2 standard errors (standard deviations) of θ̂. Thus if a sample of n = 36 component lifetimes gives θ̂ = x̄ = 28.50 and s = 3.60, then sX̄ = 3.60/√36 = .60, so the interval extending 2 estimated standard errors on each side of θ̂ is 28.50 ± (2)(.60) = (27.30, 29.70).
Reporting a Point Estimate: The Standard Error
If θ̂ is not necessarily approximately normal but is unbiased, then it can be shown that the estimate will deviate from θ by as much as 4 standard errors at most 6% of the time (this follows from Chebyshev’s inequality: P(|θ̂ − θ| ≥ 4σθ̂) ≤ 1/4² = .0625). We would then expect the true value to lie within 4 standard errors of θ̂ (and this is a very conservative statement, since it applies to any unbiased θ̂). Summarizing, the standard error tells us roughly within what distance of θ̂ we can expect the true value of θ to lie.