Chapter 13 Sampling Distributions
Sampling Distributions
A summary measure such as x̄, s, R, or a proportion that is calculated from sample data is called a sample statistic. The sampling distribution of a statistic is obtained by taking all possible samples of a given size and calculating the value of the statistic for each sample.
Example 13.1
A class has 6 students who have just purchased new laptops. They paid the following: $1800, $2100, $2400, $1200, $2100, $2100. Let x denote the cost of a computer.
μ = $1950.00, σ = $377.49, N = 6
What if we sample 3 students at random? What could we expect?
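Sample          Prices              x̄        R        s
1               1800  2100  2400    2100     600      300.00
2               1800  2100  1200    1700     900      458.26
3               1800  2100  2100    2000     300      173.21
4               1800  2100  2100    2000     300      173.21
5               1800  2400  1200    1800     1200     600.00
6               1800  2400  2100    2100     600      300.00
7               1800  2400  2100    2100     600      300.00
8               1800  1200  2100    1700     900      458.26
9               1800  1200  2100    1700     900      458.26
10              1800  2100  2100    2000     300      173.21
11              2100  2400  1200    1900     1200     624.50
12              2100  2400  2100    2200     300      173.21
13              2100  2400  2100    2200     300      173.21
14              2100  1200  2100    1800     900      519.62
15              2100  1200  2100    1800     900      519.62
16              2100  2100  2100    2100     0        0.00
17              2400  1200  2100    1900     1200     624.50
18              2400  1200  2100    1900     1200     624.50
19              2400  2100  2100    2200     300      173.21
20              1200  2100  2100    1800     900      519.62
Mean                                1950     690      367.32
Std. Dev. Pop.                      168.82   368.65   189.94
Sigma           377.49              292.40   407.56   398.69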
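The enumeration behind this table can be sketched in a few lines of Python. This is a minimal sketch, not the slide author's method: it assumes the six prices are $1800, $2100, $2400, $1200, $2100, $2100 (the set consistent with μ = $1950.00 and σ = $377.49), and the variable names are illustrative.

```python
from itertools import combinations
from statistics import mean, pstdev, stdev

# Assumed population: the six prices consistent with mu = 1950 and sigma = 377.49.
prices = [1800, 2100, 2400, 1200, 2100, 2100]
n = 3

# All C(6, 3) = 20 samples of size 3, drawn without replacement.
samples = list(combinations(prices, n))

sample_means = [mean(s) for s in samples]
sample_ranges = [max(s) - min(s) for s in samples]
sample_sds = [stdev(s) for s in samples]      # sample standard deviation (n - 1)

mu = mean(prices)
sigma = pstdev(prices)                        # population standard deviation
mean_of_means = mean(sample_means)
sd_of_means = pstdev(sample_means)            # spread of the sampling distribution
mean_range = mean(sample_ranges)
mean_sd = mean(sample_sds)

print(f"mu = {mu}, sigma = {sigma:.2f}")
print(f"mean of x-bar = {mean_of_means}, sd of x-bar = {sd_of_means:.2f}")
print(f"mean R = {mean_range}, mean s = {mean_sd:.2f}")
```

The mean of the 20 sample means equals the population mean, while their standard deviation is well below σ.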
Sampling Errors
Different samples selected from the same population give different results because they contain different elements. The difference between the value of a sample statistic and the value of the corresponding population parameter is called the sampling error.
Sampling error = x̄ − μ
Sampling errors occur purely because of chance. Errors that arise from other causes are called nonsampling errors.
Nonsampling Error
Nonsampling errors occur because of human mistakes, not chance. Common causes are:
- The sample is nonrandom.
- A question is phrased so that it is not fully understood.
- Respondents intentionally give false information.
- The poll taker records or keys in a wrong answer.
How do errors look?
Sampling Error
In our computer example, μ = $1950.00. Say we got a sample of $2100, $1200, $2100; then x̄ = $1800. So our sampling error is 1800 − 1950 = −$150.
Nonsampling Error
Same example, but the pollster writes down $2100, $1500, $2100. Then x̄ = $1900. Even though this is closer to the population mean, the sampling error is still −$150. The nonsampling error is 1900 − 1800 = +$100, so the total error is 1900 − 1950 = −$50.
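The arithmetic of this example can be checked with a short Python sketch. It assumes one particular sign convention, namely that the nonsampling error is the recorded sample mean minus the correct sample mean, so that the two errors add up to the total error. Variable names are illustrative.

```python
mu = 1950                                # population mean from Example 13.1

true_sample = [2100, 1200, 2100]         # what the students actually paid
recorded_sample = [2100, 1500, 2100]     # second value written down incorrectly

true_mean = sum(true_sample) / len(true_sample)
recorded_mean = sum(recorded_sample) / len(recorded_sample)

# Assumed convention: each error is "observed minus reference".
sampling_error = true_mean - mu                  # chance alone
nonsampling_error = recorded_mean - true_mean    # recording mistake alone
total_error = recorded_mean - mu                 # what the pollster actually sees

print(sampling_error, nonsampling_error, total_error)
```

Under this convention the total error is the sum of the sampling and nonsampling errors.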
Population vs. Sample μ, σ
Remember that in the whole population: μ = $1950.00, σ = $377.49
While across our samples: μ_x̄ = $1950.00, σ_x̄ = $168.82
The mean of the sampling distribution of x̄ (for a specific sample size) is the same as the mean of the population. Thus x̄ is called an estimator of the population mean. The standard deviation of the sample mean, σ_x̄, is always smaller than the population standard deviation σ, as long as the sample size is greater than 1.
Sample Mean Distribution
[Figure; surviving label: Population]
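Sample Mean Distribution
Take into account the size of the sample relative to the population when calculating σ_x̄:
σ_x̄ = σ/√n   if n/N ≤ 0.05
σ_x̄ = (σ/√n) · √((N − n)/(N − 1))   if n/N > 0.05
The factor √((N − n)/(N − 1)) is the finite population correction factor. In Example 13.3, n/N is more than 5%, so the corrected formula applies.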
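As a check on the laptop example (N = 6, n = 3, σ = $377.49), the calculation can be sketched in Python. The sketch assumes the usual form of the rule: σ/√n, multiplied by the finite population correction factor √((N − n)/(N − 1)) whenever n/N exceeds 5%. The function name is hypothetical.

```python
import math

def sd_of_sample_mean(sigma, n, N=None):
    """Standard deviation of x-bar; applies the finite population
    correction when the population size N is known and n/N > 0.05."""
    se = sigma / math.sqrt(n)
    if N is not None and n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se

corrected = sd_of_sample_mean(377.49, n=3, N=6)   # n/N = 0.5, correction applies
uncorrected = sd_of_sample_mean(377.49, n=3)      # population treated as very large

print(f"with correction:    {corrected:.2f}")
print(f"without correction: {uncorrected:.2f}")
```

With the correction, the result matches the $168.82 obtained by enumerating all 20 samples. Without it, the value is noticeably too large.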
Sample Range Distribution
n     d2
2     1.128
3     1.693
4     2.059
5     2.326
6     2.534
7     2.704
8     2.847
9     2.970
10    3.078
11    3.173
12    3.258
13    3.336
14    3.407
15    3.472
20    3.735
25    3.931
Sample Std Deviation Distribution
n     c4
2     0.7979
3     0.8862
4     0.9213
5     0.9400
6     0.9515
7     0.9594
8     0.9650
9     0.9693
10    0.9727
11    0.9754
12    0.9776
13    0.9794
14    0.9810
15    0.9823
20    0.9869
25    0.9896
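These constants are commonly used to turn an average range or an average sample standard deviation into an estimate of σ, as R̄/d2 and s̄/c4. The sketch below assumes that usage and applies it to the Example 13.1 averages (R̄ = 690, s̄ = 367.32, n = 3). The dictionaries hold only the first few table rows, and the function names are hypothetical.

```python
# Constants copied from the two tables above (first rows only).
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}
C4 = {2: 0.7979, 3: 0.8862, 4: 0.9213, 5: 0.9400}

def sigma_from_range(r_bar, n):
    """Estimate sigma from the average sample range (assumed rule: R-bar / d2)."""
    return r_bar / D2[n]

def sigma_from_sd(s_bar, n):
    """Estimate sigma from the average sample std deviation (assumed rule: s-bar / c4)."""
    return s_bar / C4[n]

est_from_range = sigma_from_range(690, 3)
est_from_sd = sigma_from_sd(367.32, 3)

print(f"sigma from R-bar: {est_from_range:.2f}")
print(f"sigma from s-bar: {est_from_sd:.2f}")
```

The range-based estimate reproduces the 407.56 in the Sigma row of the Example 13.1 table. With c4 for n = 3 the second estimate is about 414.49. The 398.69 shown in that row corresponds to dividing by 0.9213, the constant for n = 4.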
Sampling Distribution of a Sample Proportion
Example: You ask 10 classmates if they have change for a dollar, so you can buy a Jolt Cola before class. 4 people had change for a dollar. We denote the sample proportion using the symbol p̂.
p̂ = (number of people with change for a dollar) / (number of people asked) = 4/10 = 0.4
Unlike that of x̄, the sampling distribution of p̂ is based on the binomial distribution. The following conditions must be met:
- There are n identical trials, each performed under identical conditions.
- Each trial has two and only two mutually exclusive outcomes, usually called a success and a failure.
- The probability of success is denoted by p and the probability of failure by q, with p + q = 1. The probabilities p and q remain constant from trial to trial.
- The trials are independent: the outcome of one trial does not affect the outcome of another.
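To see what a binomial-based sampling distribution of p̂ looks like, here is a minimal Python sketch for n = 10. The population proportion p = 0.4 is a hypothetical value chosen for illustration (the example only gives the observed p̂ = 0.4), and the function name is an assumption.

```python
from math import comb

def phat_distribution(n, p):
    """Map each possible p-hat = x/n to its probability,
    where the number of successes x is Binomial(n, p)."""
    q = 1 - p
    return {x / n: comb(n, x) * p**x * q**(n - x) for x in range(n + 1)}

n, p = 10, 0.4          # p = 0.4 is a hypothetical population proportion
dist = phat_distribution(n, p)

total = sum(dist.values())                               # probabilities sum to 1
mean_phat = sum(ph * pr for ph, pr in dist.items())      # expected value of p-hat
prob_observed = dist[4 / 10]                             # P(p-hat = 0.4)

for ph, pr in dist.items():
    print(f"p-hat = {ph:.1f}  probability = {pr:.4f}")
```

The expected value of p̂ comes out equal to p, which mirrors the way the mean of x̄ equals μ.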
Sample Proportion Distribution
[Figure; surviving labels: LSL, USL, Population, x, p]
Sample Proportion Distribution
Sample   n_i    D_i    p_i
1        n1     D1     p1
2        n2     D2     p2
.        .      .      .
k        nk     Dk     pk
Total    n      D
Central-Limit Theorem (CLT)
The Central Limit Theorem (CLT) states that, irrespective of the underlying distribution of a population (with mean μ and standard deviation σ), if samples of size n are taken from the population, the distribution of the sample mean is approximately normal with a mean of μ and a standard deviation of σ/√n. The approximation gets better as the sample size n increases.
Central-Limit Theorem (CLT)
Central Limit Theorem: For a large sample size (n ≥ 30), the shape of the sampling distribution of the sample mean is approximately normal. Also, the shape of the sampling distribution of p̂ is approximately normal for a sample for which np ≥ 10 and nq ≥ 10. The sampling distribution does not become a normal distribution when n reaches 30; instead, it takes on a shape that is close to a normal curve. http://gaussianwaves.blogspot.com/2010/01/central-limit-theorem.html
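A small simulation can make the theorem concrete. The sketch below is an illustration under assumed settings, not part of the original slides: it draws samples of size n = 30 from a strongly skewed exponential population (mean 1, standard deviation 1) and checks that the sample means centre on μ, spread out like σ/√n, and fall within ±1.96 σ/√n about 95% of the time.

```python
import math
import random
from statistics import mean, pstdev

random.seed(0)           # fixed seed so the run is repeatable

mu, sigma = 1.0, 1.0     # Exponential(rate = 1): mean 1, standard deviation 1
n, reps = 30, 5000       # sample size and number of simulated samples

sample_means = [
    mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

center = mean(sample_means)
spread = pstdev(sample_means)
predicted_spread = sigma / math.sqrt(n)

# Fraction of sample means inside mu +/- 1.96 * sigma / sqrt(n).
within = sum(abs(m - mu) <= 1.96 * predicted_spread for m in sample_means) / reps

print(f"mean of sample means: {center:.3f} (population mean {mu})")
print(f"sd of sample means:   {spread:.3f} (sigma/sqrt(n) = {predicted_spread:.3f})")
print(f"fraction within 1.96 standard errors: {within:.3f}")
```

Changing n to a small value such as 2 or 5 shows the skew of the population carrying over into the sample means, which is why the approximation is described as improving with n.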
Example 13.9
A factory produces a new synthetic motor oil for older cars that lasts longer than traditional motor oils. The distance the oil should last follows a distribution with a mean of 4800 miles and a standard deviation of 300 miles. A random sample of 35 older vehicles was tested. What is the approximate probability that the average distance traveled between oil changes will exceed 4900 miles?
SOLUTION:
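One way to work this out is sketched below. The sketch assumes the CLT applies because n = 35 ≥ 30, and that the population of vehicles is large enough that no finite population correction is needed. The sample mean is then treated as approximately normal with mean 4800 and standard deviation 300/√35, and the upper-tail probability is computed from the standard normal distribution.

```python
import math

mu, sigma, n = 4800, 300, 35
threshold = 4900

# Standard deviation of the sample mean (assumes a large population).
se = sigma / math.sqrt(n)

# Standardize the threshold.
z = (threshold - mu) / se

# Upper-tail probability P(Z > z) for a standard normal variable.
p_exceed = 0.5 * math.erfc(z / math.sqrt(2))

print(f"sigma_xbar = {se:.2f}")
print(f"z = {z:.2f}")
print(f"P(x-bar > {threshold}) = {p_exceed:.4f}")
```

Under these assumptions σ_x̄ ≈ 50.71 miles and z ≈ 1.97, giving a probability of roughly 0.024, or about 2.4%.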