Distribution of the Sample Means
Topics:
Essentials
Distributions
Sampling Error
Distribution of the Sample Means
Properties of the Distribution: 1) Mean 2) Std. Dev. 3) Central Limit Theorem
A large sample in statistics
Example: Calculating the mean and standard error of the sample means
Additional Topic
Essentials: Distribution of Sample Means (A distribution unlike others)
Be able to explain what the Distribution of Sample Means represents.
Know the three characteristics of this distribution.
Be able to use a set of data to demonstrate the calculation of the mean and standard deviation of this distribution.
What is a statistically large sample?
Some Common Distribution Shapes
CHAPTER 6: NORMALLY DISTRIBUTED VARIABLES (6.1)
Recall from our work in Ch. 2 some common distribution shapes. At any given time, in statistics and in the world in general, we can observe an enormous variety of variables, and many differ in the distribution they form. For example, the generation of random numbers follows a uniform distribution. Our exam scores and project scores thus far have followed a left-skewed distribution.
Some variables, though (actually many, especially naturally occurring ones), follow what is called a Normal Distribution. You may be more familiar with the term “bell curve.” This is a special distribution, and probably the most important distribution in statistics. In statistics, we call variables whose distributions have this shape Normally Distributed Variables, and we call the distribution shape a Normal Curve.
Distribution of the Sample Means
Sampling Error: the difference between the sample measure and the population measure, due to the fact that a sample is not a perfect representation of the population. Put another way, it is the error resulting from using a sample to estimate a population characteristic.
Recall: so far, we’ve been talking about variables (which we’ve usually called x) and the distribution of these variables. We’ve stated that the distribution of x has a mean we call mu and a standard deviation we call sigma. We’ve also stated that we can use a sample to acquire information about a population, and that this is most often preferable, since an entire census is often impossible. This, however, poses a problem, since the sample provides data for only a portion of the entire population. We cannot expect one sample to give us perfectly accurate information about the population of interest. There is a certain amount of error that will result simply because we are sampling. As you will hopefully recall from earlier in the course, this type of error is called sampling error.
For example: the Census Bureau publishes figures on the mean income of U.S. households. In 1993, the figure published was $41,428. This figure is the sample mean (x-bar) income of the 60,000 households sampled, NOT the population mean mu of all U.S. households. We may ask ourselves: how accurate are such estimates likely to be? Is the estimate within $1,000, $5,000, etc.? To answer this question, we would need to know the distribution of all possible sample means that could be obtained by sampling the incomes of 60,000 households. This distribution is called the distribution of the sample mean. Let’s look at an example.
Distribution of the Sample Means
Distribution of the Sample Means: a distribution obtained by using the means computed from random samples of a specific size taken from a population.
Distribution of the Sample Mean (x-bar): the distribution of all possible sample means for a variable x, and for a given sample size.
Recall: so far, we’ve been talking about variables (which we’ve usually called x) and the distribution of these variables. In other words, how they vary about the mean. In addition to knowing how individual data values vary about the mean for a population, we are sometimes also interested in knowing about the distribution of the means of samples taken from a population.
For example, suppose a researcher selects 100 samples of a given size from a large population and computes the mean for each of the 100 samples. The values of these 100 means constitute a sampling distribution of sample means. If the samples are randomly selected, the sample means will, for the most part, be somewhat different from the population mean mu. These differences are caused by sampling error.
Properties of the Distribution of Sample Means The mean of the sample means will be the same as the population mean. The standard error of the sample means will be smaller than the standard deviation of the population, and it will be equal to the population standard deviation divided by the square root of the sample size.
Standard Error vs. Standard Deviation
Put simply, the standard error of the sample mean is an estimate of how far the sample mean is likely to be from the population mean, whereas the standard deviation of the sample is the degree to which individuals within the sample differ from the sample mean.
A Third Property of the Distribution of Sample Means A third property of the distribution of the sample means concerns the shape of the distribution, and is explained by the Central Limit Theorem.
The Central Limit Theorem
As the sample size n increases, the shape of the distribution of the sample means taken from a population with mean $\mu$ and standard deviation $\sigma$ will approach a normal distribution. This distribution will have mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.
We can use the Central Limit Theorem to answer questions about sample means in the same way that the normal distribution can be used to answer questions about individual values. The only difference is that a new formula must be used to obtain z-scores: $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
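As a minimal sketch of the z-score calculation for a sample mean, the Python snippet below divides the distance between x-bar and mu by the standard error sigma/sqrt(n). All the numbers are hypothetical (a population with mean 100 and standard deviation 15, a sample of size 36, an observed sample mean of 104) and do not come from these slides; the function name z_for_sample_mean is also just illustrative.

```python
import math
from statistics import NormalDist

def z_for_sample_mean(x_bar, mu, sigma, n):
    """z-score of a sample mean: (x_bar - mu) / (sigma / sqrt(n))."""
    standard_error = sigma / math.sqrt(n)
    return (x_bar - mu) / standard_error

# Hypothetical values, chosen only for illustration.
z = z_for_sample_mean(x_bar=104, mu=100, sigma=15, n=36)

# Probability that a sample mean falls below 104, read from the standard normal curve.
p_below = NormalDist().cdf(z)
print(round(z, 2), round(p_below, 4))
```

Note that the same x-bar value treated as an individual score would give a much smaller z, because the divisor would be sigma rather than sigma/sqrt(n).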
Two Important Things to Remember When Using The Central Limit Theorem
1) When the original variable is normally distributed, the distribution of the sample means will be normally distributed for any sample size n.
2) When the distribution of the original variable departs from normality, a sample size of 30 or more is needed to use the normal distribution to approximate the distribution of the sample means. The larger the sample, the better the approximation will be.
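One way to see the second point is with a small simulation. The sketch below is an illustration only: it assumes an exponential population (mean 1, standard deviation 1) as a stand-in for a clearly non-normal, right-skewed variable, draws 2,000 samples of size 30 with a fixed seed, and compares the mean and spread of the resulting sample means with mu and sigma/sqrt(n). The population, the seed, and the function name are all choices made for this example, not part of the slides.

```python
import random
import statistics

def simulate_sample_means(n, num_samples, rng):
    """Draw num_samples samples of size n from an exponential population
    (mean 1, standard deviation 1) and return the list of sample means."""
    means = []
    for _ in range(num_samples):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.mean(sample))
    return means

rng = random.Random(0)  # fixed seed so the run is repeatable
means = simulate_sample_means(n=30, num_samples=2000, rng=rng)

mean_of_means = statistics.mean(means)   # should be close to mu = 1
sd_of_means = statistics.pstdev(means)   # should be close to 1 / sqrt(30)
print(round(mean_of_means, 3), round(sd_of_means, 3), round(1 / 30 ** 0.5, 3))
```

Plotting a histogram of `means` (not done here) would show a roughly bell-shaped curve even though the individual values are strongly skewed.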
An Example Suppose I give an 8-point quiz to a small class of four students. The results of the quiz were 2, 6, 4, and 8. We will assume that the four students constitute the population.
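The Mean and Standard Deviation of the Population (the four scores)
The mean of the population is:
$\mu = \dfrac{2 + 6 + 4 + 8}{4} = 5$
The standard deviation of the population is:
$\sigma = \sqrt{\dfrac{(2-5)^2 + (6-5)^2 + (4-5)^2 + (8-5)^2}{4}} = \sqrt{5} \approx 2.236$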
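The same two population values can be checked quickly in Python. This is a small sketch using the standard library; the variable names are illustrative. Because the four scores are treated as the whole population, it uses pstdev, which divides by N rather than N - 1.

```python
import statistics

quiz_scores = [2, 6, 4, 8]  # the four scores, treated as the whole population

mu = statistics.mean(quiz_scores)
# pstdev uses the population formula (divide by N, not N - 1).
sigma = statistics.pstdev(quiz_scores)
print(mu, round(sigma, 3))
```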
Distribution of Quiz Scores A graph of the distribution of quiz scores.
All Possible Samples of Size 2 Taken With Replacement
SAMPLE   MEAN     SAMPLE   MEAN
2,2      2        6,2      4
2,4      3        6,4      5
2,6      4        6,6      6
2,8      5        6,8      7
4,2      3        8,2      5
4,4      4        8,4      6
4,6      5        8,6      7
4,8      6        8,8      8
Frequency Distribution of the Sample Means
MEAN   f
2      1
3      2
4      3
5      4
6      3
7      2
8      1
Shows the number of times each mean occurred.
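The list of samples and the frequency distribution above can be reproduced with a short enumeration. The sketch below treats order as significant (so 2,4 and 4,2 count as different samples), which is what gives 16 samples; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

quiz_scores = [2, 6, 4, 8]

# Ordered samples of size 2 drawn with replacement: 4 * 4 = 16 of them.
samples = list(product(quiz_scores, repeat=2))
sample_means = [sum(s) / len(s) for s in samples]

# Frequency of each distinct sample mean, sorted by the mean.
frequency = dict(sorted(Counter(sample_means).items()))
for sample_mean, f in frequency.items():
    print(f"{sample_mean:g}  {f}")
```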
Distribution of the Sample Means Note the shape here.
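The Mean of the Sample Means
Denoted $\mu_{\bar{x}}$
In our example:
$\mu_{\bar{x}} = \dfrac{2 + 3 + \cdots + 8}{16} = \dfrac{80}{16} = 5$
So $\mu_{\bar{x}} = \mu$, which in this case = 5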
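The Standard Error of the Sample Means
Denoted $\sigma_{\bar{x}}$
In our example:
$\sigma_{\bar{x}} = \sqrt{\dfrac{(2-5)^2 + (3-5)^2 + \cdots + (8-5)^2}{16}} = \sqrt{\dfrac{40}{16}} \approx 1.581$
Which is the same as the population standard deviation divided by $\sqrt{2}$: $\dfrac{\sigma}{\sqrt{n}} = \dfrac{2.236}{\sqrt{2}} \approx 1.581$
Comment on sampling without replacement, which is more the norm, and the Finite Population Correction Factor.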
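Both results can be verified by brute force. The sketch below first checks the with-replacement case from the quiz example, then illustrates the comment about sampling without replacement. The finite population correction used here, sqrt((N - n)/(N - 1)), is the standard textbook formula and is not spelled out on the slide, so treat that part as a supplement rather than the slide's own calculation.

```python
import math
import statistics
from itertools import combinations, product

quiz_scores = [2, 6, 4, 8]
N, n = len(quiz_scores), 2
mu = statistics.mean(quiz_scores)
sigma = statistics.pstdev(quiz_scores)

# With replacement: all 16 ordered samples of size 2.
means_with = [statistics.mean(s) for s in product(quiz_scores, repeat=n)]
mu_xbar = statistics.mean(means_with)
sigma_xbar = statistics.pstdev(means_with)

# Without replacement: the 6 unordered samples of size 2.
means_without = [statistics.mean(s) for s in combinations(quiz_scores, n)]
sigma_xbar_without = statistics.pstdev(means_without)

# Finite population correction factor (standard formula, not from the slides).
fpc = math.sqrt((N - n) / (N - 1))

print(mu_xbar, round(sigma_xbar, 3), round(sigma / math.sqrt(n), 3))
print(round(sigma_xbar_without, 3), round(sigma / math.sqrt(n) * fpc, 3))
```

In each printed line the directly computed standard error and the formula value should agree, with the without-replacement value being the smaller of the two.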
Additional Topics:
Calculating the Standard Error
Procedure:
Step 1: Calculate the mean (total of all measurements divided by the number of measurements).
Steps 2 – 6 use the definition formula to calculate the standard deviation.
Step 2: Calculate each measurement's deviation from the mean (mean minus the individual measurement).
Step 3: Square each deviation from the mean. Squared negatives become positive.
Step 4: Sum the squared deviations (add up the numbers from step 3).
Step 5: Divide the sum from step 4 by one less than the sample size (n - 1, that is, the number of measurements minus one).
Step 6: Take the square root of the number in step 5. That gives you the "standard deviation (S.D.)."
Step 7: Divide the standard deviation by the square root of the sample size (n). That gives you the “standard error”.
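The seven steps translate almost line for line into code. The following is a minimal sketch; the function name standard_error is illustrative, and the data reuse the four quiz scores as if they were a sample rather than a population, purely to have numbers to run. Because the procedure divides by n - 1, the result is an estimate based on the sample standard deviation, not the population value sigma/sqrt(n).

```python
import math

def standard_error(measurements):
    """Standard error of the mean, following the seven-step procedure."""
    n = len(measurements)
    mean = sum(measurements) / n                    # Step 1
    deviations = [mean - x for x in measurements]   # Step 2
    squared = [d ** 2 for d in deviations]          # Step 3
    total = sum(squared)                            # Step 4
    variance = total / (n - 1)                      # Step 5
    sd = math.sqrt(variance)                        # Step 6
    return sd / math.sqrt(n)                        # Step 7

# Quiz scores treated as a sample here, only to exercise the function.
data = [2, 6, 4, 8]
se = standard_error(data)
print(round(se, 3))
```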
Heights of Five Starting Players on a Men’s Basketball Team (inches)
Demonstration showing increasing sample size yielding better estimations of the population value.
Using these five players, let’s obtain the distribution of the sample means for samples of size n = 2. This means that we need to determine all possible samples of size 2 from this population of five players. How many samples of size 2 are there? Recall $\binom{5}{2} = 10$. Since there are only 10 possible samples of size 2, we can list them quite easily. Let’s look, but before we do, let’s calculate the mean height mu (why mu as opposed to x-bar?) of our five starting players: $\mu = 400/5 = 80$ inches.
Possible Samples of Size n = 2 From a Population of Size N = 5
Here we see the 10 possible samples of size 2 in the first column. The heights for each member of the sample are listed in the second column. The mean of each sample is listed in the third column.
We can make some simple but significant observations about sampling error here, when the mean height of a random sample of 2 players is used to estimate the population mean height. We’ve said before that it is unlikely that a sample will produce exactly the same mean as the population (remember, here it is 80). We see, in fact, that in only 1/10 or 10% of the samples does the sample mean x-bar equal the population mean mu. This is an example of sampling error. Now let’s look at what would happen if we used samples of size n = 4.
Possible Samples of Size n = 4 From a Population of Size N = 5
There are 5 possible samples of size 4. Here, none of the samples of size 4 has a mean equal to the population mean of 80, but in general the sample means are all closer to the population mean than were the sample means for samples of size 2. Looking at what is going on here using dotplots will help make this clearer.
Dot plots of the Sampling Distributions for Various Sample Sizes (N = 5)
Explain what the dotplots represent.
Notice what happens here: as sample size increases, the sample means cluster closer around the population mean. Thus, the larger the sample size, the smaller the sampling error tends to be in estimating a population mean mu by a sample mean x-bar.
Recall from earlier in the course: bias (consistent over- or under-representation) and precision (how scattered or spread out the values of the sample statistic are). Increasing the sample size increases precision. Bias comes from how the sample is selected, so a larger sample does not remove it by itself.
Let’s look at the dotplot for samples of size 4: we see that exactly 4 of the 5 possible samples have means within one inch of the population mean. So the probability is 4/5 or 80% that the sampling error made in estimating mu by x-bar will be 1 inch or less.
Share of samples with a mean within one inch of mu, by sample size:
n = 1: 2/5 = 40%
n = 2: 3/10 = 30%
n = 3: 5/10 = 50%
n = 4: 4/5 = 80%
n = 5: 100%
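The counts behind these percentages can be reproduced by listing every possible sample of each size. Important caveat: the table of player heights is not reproduced in this text, so the five heights in the sketch below are an ASSUMPTION. They were chosen because they total 400 inches (giving mu = 80) and reproduce the fractions quoted above; the actual heights on the slide may differ. The function name share_within is illustrative.

```python
from itertools import combinations
from statistics import mean

# ASSUMED heights (inches): not given in the text; chosen to total 400
# and to match the fractions quoted on the slide.
heights = [76, 78, 79, 81, 86]
mu = mean(heights)

def share_within(n, tolerance=1.0):
    """Count the size-n samples (without replacement) whose mean is
    within `tolerance` inches of the population mean.
    Returns (number of such samples, total number of samples)."""
    sample_means = [mean(s) for s in combinations(heights, n)]
    hits = sum(abs(m - mu) <= tolerance for m in sample_means)
    return hits, len(sample_means)

for n in range(1, 6):
    hits, total = share_within(n)
    print(f"n = {n}: {hits}/{total} = {hits / total:.0%}")
```

Under these assumed heights, the share dips from n = 1 to n = 2 before rising, so the trend toward smaller sampling error holds in general rather than at every single step.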