Chapter 7 Sampling Distributions


1 Chapter 7 Sampling Distributions

2 Introduction In real life, calculating population parameters is usually prohibitive because populations are very large. Rather than investigating the whole population, we take a sample, calculate a statistic related to the parameter of interest, and make an inference. The sampling distribution of the statistic is the tool that tells us how close the statistic is likely to be to the parameter.

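3 [Diagram] Sampling techniques lead from the population (described by parameters) to a sample (described by statistics); statistical procedures then lead from the sample back to the population by statistical inference.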

4 Sampling Examples: A pollster is sure that the responses to his “agree/disagree” question will follow a binomial distribution, but p, the proportion of those who “agree” in the population, is unknown. An agronomist believes that the yield per acre of a variety of wheat is approximately normally distributed, but the mean $\mu$ and the standard deviation $\sigma$ of the yields are unknown. If you want the sample to provide reliable information about the population, you must select your sample in a certain way!

5 Types of Sampling Methods/Techniques
The sampling plan or experimental design determines the amount of information you can extract, and often allows you to measure the reliability of your inference.
Probability samples: simple random, stratified, cluster, systematic.
Non-probability samples: judgement, chunk, quota.

6 Simple Random Sampling
Sampling Plan: Simple random sampling is a method of sampling that allows each possible sample of size n an equal probability of being selected.

7 Example There are 89 students in a statistics class. The instructor wants to choose 5 students to form a project group. How should he proceed? Give each student a number from 01 to 89. Choose 5 pairs of random digits from the random number table. If a pair falls outside 01 to 89 (that is, 90 to 99 or 00), or repeats a number already chosen, discard it and choose another pair. The five students with those numbers form the group.
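The random number table procedure above can be mimicked in software. The following is a minimal sketch, assuming Python's standard `random` module stands in for the table; the function name `choose_group` and its parameters are illustrative, not part of the original slides.

```python
import random

def choose_group(class_size=89, group_size=5, seed=None):
    """Return a simple random sample of student numbers 1..class_size."""
    rng = random.Random(seed)
    # random.sample draws without replacement, so every possible set of
    # group_size students has the same chance of being selected.
    return sorted(rng.sample(range(1, class_size + 1), group_size))

# Different seeds (or no seed) give different groups.
print(choose_group(seed=7))
```

Because `random.sample` never returns duplicates and only draws from 1 to 89, the "discard and choose again" step of the table method is not needed here.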

8 Other Sampling Techniques
There are several other sampling plans that still involve randomization:
Stratified random sample: Divide the population into subpopulations or strata and select a simple random sample from each stratum.
Cluster sample: Divide the population into subgroups called clusters; select a simple random sample of clusters and take a census of every element in each selected cluster.
1-in-k systematic sample: Randomly select one of the first k elements in an ordered population, and then select every k-th element thereafter.

9 Examples
Divide West Malaysia into states and take a simple random sample within each state. (Stratified)
Divide West Malaysia into states and take a simple random sample of 5 states. (Cluster)
Divide a city into city blocks, choose a simple random sample of 10 city blocks, and interview all who live there. (Cluster)
Choose an entry at random from the phone book, and select every 50th number thereafter. (1-in-50 systematic)
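The three plans illustrated by these examples can be sketched in a few lines. This is only an illustration under stated assumptions: the function names and the toy "states" data are hypothetical, and the population is assumed to be already organised into groups.

```python
import random

def stratified_sample(strata, n_per_stratum, rng):
    """Simple random sample of n_per_stratum units from every stratum."""
    return {name: rng.sample(units, n_per_stratum)
            for name, units in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Pick n_clusters clusters at random and keep every unit in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

def systematic_sample(ordered_units, k, rng):
    """1-in-k sample: random start among the first k, then every k-th unit."""
    start = rng.randrange(k)
    return ordered_units[start::k]

rng = random.Random(0)
# Hypothetical data: 4 "states", each with 10 numbered households.
states = {f"state_{i}": [f"s{i}_h{j}" for j in range(10)] for i in range(4)}
print(stratified_sample(states, 2, rng))   # a few units from every state
print(cluster_sample(states, 2, rng))      # every unit from a few states
print(systematic_sample(list(range(1, 201)), 50, rng))  # every 50th entry
```

The contrast to notice is that a stratified sample touches every group, while a cluster sample takes whole groups and ignores the rest.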

10 Non-Random Sampling Plans
There are several other sampling plans that do not involve randomization. They should NOT be used for statistical inference!
Convenience sample: A sample that can be taken easily without random selection (for example, people walking by on the street).
Judgment sample: The sampler decides who will and won’t be included in the sample.
Quota sample: The makeup of the sample must reflect the makeup of the population on some selected characteristic (race, ethnic origin, gender, etc.).

11 Types of Samples Sampling can occur in two types of practical situations: 1. Observational studies: The data existed before you decided to study it. Watch out for:
Nonresponse: Are the responses biased because only opinionated people responded?
Undercoverage: Are certain segments of the population systematically excluded?
Wording bias: The question may be too complicated or poorly worded.

12 Types of Samples Sampling can occur in two types of practical situations: 2. Experimentation: The data are generated by imposing an experimental condition or treatment on the experimental units. Hypothetical populations can make random sampling difficult if not impossible. Samples must sometimes be chosen so that the experimenter believes they are representative of the whole population. Samples must behave like random samples!

13 Sampling Distributions
Numerical descriptive measures calculated from the sample are called statistics. Statistics vary from sample to sample and hence are random variables. The probability distributions for statistics are called sampling distributions. In repeated sampling, they tell us what values of the statistics can occur and how often each value occurs.

14 Sampling Distributions
Definition: The sampling distribution of a statistic is the probability distribution for the possible values of the statistic that results when random samples of size n are repeatedly drawn from the population.
Population: 3, 5, 2, 1. Draw samples of size n = 3 without replacement.
Possible samples and their means:
3, 5, 2: $\bar{x} = 10/3 \approx 3.33$
3, 5, 1: $\bar{x} = 9/3 = 3$
3, 2, 1: $\bar{x} = 6/3 = 2$
5, 2, 1: $\bar{x} = 8/3 \approx 2.67$
Each value of $\bar{x}$ is equally likely, so $p(\bar{x}) = 1/4$ for $\bar{x}$ = 2, 2.67, 3, 3.33.
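For a population this small the sampling distribution can be enumerated exactly. A minimal sketch, assuming every sample of size n drawn without replacement is equally likely; the function name is illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

def sampling_distribution_of_mean(population, n):
    """Exact distribution of the sample mean over all equally likely
    samples of size n drawn without replacement."""
    samples = list(combinations(population, n))
    counts = Counter(Fraction(sum(s), n) for s in samples)
    return {mean: Fraction(c, len(samples))
            for mean, c in sorted(counts.items())}

dist = sampling_distribution_of_mean([3, 5, 2, 1], 3)
for mean, prob in dist.items():
    print(f"x-bar = {float(mean):.2f}  p = {prob}")
```

Using `Fraction` keeps the means and probabilities exact, so 8/3 and 10/3 are not blurred by rounding.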

15 1. Sampling Distribution of the Mean…
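A fair die is thrown infinitely many times, with the random variable X = number of spots on any throw. The probability distribution of X is P(x) = 1/6 for x = 1, 2, 3, 4, 5, 6, and the mean and variance are $\mu = \sum x P(x) = 3.5$ and $\sigma^2 = \sum (x - \mu)^2 P(x) = 35/12 \approx 2.92$.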

16 Sampling Distribution of Two Dice
A sampling distribution is created by looking at all samples of size n = 2 (i.e. two dice) and their means. While there are 36 possible samples of size 2, there are only 11 values for $\bar{x}$, and some (e.g. $\bar{x} = 3.5$) occur more frequently than others (e.g. $\bar{x} = 1$).

17 Sampling Distribution of Two Dice…
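The sampling distribution of $\bar{x}$ is shown below:
$\bar{x}$: 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0
$P(\bar{x})$: 1/36, 2/36, 3/36, 4/36, 5/36, 6/36, 5/36, 4/36, 3/36, 2/36, 1/36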

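18 Example Throw two fair dice. Using all 36 possible samples, the mean and standard deviation of $\bar{x}$ can also be calculated directly from its sampling distribution: $\mu_{\bar{x}} = \sum \bar{x} P(\bar{x}) = 3.5$, $\sigma^2_{\bar{x}} = \sum (\bar{x} - \mu_{\bar{x}})^2 P(\bar{x}) = 35/24 \approx 1.46$, and $\sigma_{\bar{x}} \approx 1.21$.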
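The two-dice calculation can be checked by brute-force enumeration. The sketch below lists all 36 ordered outcomes, builds the distribution of the sample mean, and compares its variance with the single-die variance; all variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# All 36 equally likely ordered outcomes of two dice, reduced to their means.
counts = Counter(Fraction(a + b, 2) for a, b in product(faces, repeat=2))
dist = {xbar: Fraction(c, 36) for xbar, c in sorted(counts.items())}

# Mean and variance of the sample mean, from its own distribution.
mean_xbar = sum(x * p for x, p in dist.items())
var_xbar = sum((x - mean_xbar) ** 2 * p for x, p in dist.items())

# Mean and variance of a single fair die, for comparison.
mu = Fraction(sum(faces), 6)
sigma2 = sum((x - mu) ** 2 for x in faces) / 6

print("values of x-bar:", len(dist))
print("mean of x-bar:", mean_xbar, " population mean:", mu)
print("var of x-bar:", var_xbar, " population var / 2:", sigma2 / 2)
```

The last line illustrates the general pattern for samples of size n = 2: the variance of the sample mean is the population variance divided by n.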

19 Sampling Distribution of the Mean
[Figure not recovered in this transcript.]

20 Sampling Distribution of the Mean
Notice that $\sigma^2_{\bar{x}} = \sigma^2/n$ is smaller than $\sigma^2$. The larger the sample size, the smaller $\sigma^2_{\bar{x}}$. Therefore, $\bar{x}$ tends to fall closer to $\mu$ as the sample size increases.

21 Central Limit Theorem Central Limit Theorem: If random samples of n observations are drawn from a nonnormal population with finite mean $\mu$ and standard deviation $\sigma$, then, when n is large, the sampling distribution of the sample mean $\bar{x}$ is approximately normally distributed, with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. The approximation becomes more accurate as n becomes large.
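The theorem can be explored by simulation. The sketch below is an illustration only: it assumes an exponential population with mean 1 and standard deviation 1 (a strongly skewed choice made for this example, not one taken from the slides) and compares the spread of simulated sample means with $\sigma/\sqrt{n}$.

```python
import random
import statistics
from math import sqrt

def simulate_sample_means(n, reps, rng):
    """Draw `reps` samples of size n from an exponential population with
    mean 1 and standard deviation 1, returning each sample's mean."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

rng = random.Random(42)
for n in (2, 30, 200):
    means = simulate_sample_means(n, 2000, rng)
    print(f"n={n:3d}  mean of x-bar={statistics.fmean(means):.3f}  "
          f"sd of x-bar={statistics.stdev(means):.3f}  "
          f"sigma/sqrt(n)={1 / sqrt(n):.3f}")
```

Plotting a histogram of `means` for each n would show the shape moving from skewed toward bell-shaped as n grows.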

22 How Large is Large? If the population is normal, then the sampling distribution of $\bar{x}$ will also be normal, no matter what the sample size. When the population is approximately symmetric, the distribution of $\bar{x}$ becomes approximately normal for relatively small values of n. When the population is skewed, a common rule of thumb is that the sample size must be at least 30 before the sampling distribution of $\bar{x}$ becomes approximately normal.

23 The Sampling Distribution of the Sample Mean
A random sample of size n is selected from a population with mean $\mu$ and standard deviation $\sigma$. The sampling distribution of the sample mean $\bar{x}$ will have mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. If the original population is normal, the sampling distribution will be normal for any sample size. If the original population is nonnormal, the sampling distribution will be approximately normal when n is large. The standard deviation of $\bar{x}$ is sometimes called the STANDARD ERROR (SE).

24 Sampling Distribution of the Mean
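Central Limit Theorem: Given a population with mean $\mu$ and variance $\sigma^2$, the sampling distribution of $\bar{x}$ will have:
Mean: $\mu_{\bar{x}} = \mu$
Variance: $\sigma^2_{\bar{x}} = \sigma^2/n$
Standard deviation = standard error (of the mean): $\sigma_{\bar{x}} = \sigma/\sqrt{n}$
As n increases, the shape of the distribution becomes normal (whatever the shape of the population).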

25 Example

26 Finding Probabilities for the Sample Mean
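If the sampling distribution of $\bar{x}$ is normal or approximately normal, standardize or rescale the interval of interest in terms of $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$. Find the appropriate area using the Z table. Example: A random sample of size n = 16 from a normal distribution with $\mu = 10$ and $\sigma = 8$, so $\bar{x}$ is normal with mean 10 and standard error $8/\sqrt{16} = 2$.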

27 Example A soda filling machine is supposed to fill cans of
soda with 12 fluid ounces. Suppose that the fills are actually normally distributed with a mean of 12.1 oz and a standard deviation of 0.2 oz. What is the probability that the average fill for a 6-pack of soda is less than 12 oz?
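The six-pack question can be worked with the standardization above. A minimal sketch, assuming Python's `statistics.NormalDist` is used in place of the Z table; the helper name `prob_mean_below` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_below(x, mu, sigma, n):
    """P(x-bar < x) when x-bar is normal with mean mu and
    standard error sigma / sqrt(n)."""
    return NormalDist(mu, sigma / sqrt(n)).cdf(x)

# Soda example: fills are N(12.1, 0.2), and a six-pack is a sample of n = 6.
z = (12.0 - 12.1) / (0.2 / sqrt(6))
p = prob_mean_below(12.0, mu=12.1, sigma=0.2, n=6)
print(f"z = {z:.2f}, P(x-bar < 12) = {p:.4f}")
```

With these inputs z is about -1.22, so roughly 11% of six-packs would average under 12 oz even though the machine overfills on average.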

28 Exercise The time that a laptop’s battery pack can function before recharging is needed is normally distributed with a mean of 6 hours and a standard deviation of 1.8 hours. A random sample of 25 laptops with this type of battery pack is selected and tested. What is the probability that the mean time until recharging is needed is at least 7 hours?

29 The characteristics of the sampling distribution of a statistic:
The distribution of values is obtained by means of repeated sampling.
The samples are all of size n.
The samples are drawn from the same population.

30 2. Sampling Distribution of a Proportion…
The proportion in the sample is denoted "p-hat" The proportion in the population (parameter) is denoted p

31 Types of response variables
Response type: Quantitative (sums, averages) or Categorical (counts, proportions).
Prior chapters have focused on quantitative response variables. We now focus on categorical response variables.

32 The Sampling Distribution of the Sample Proportion
The Central Limit Theorem can be used to conclude that the binomial random variable X, with mean np and standard deviation $\sqrt{npq}$ (where q = 1 - p), is approximately normal when n is large. The sample proportion, $\hat{p} = X/n$, is simply a rescaling of the binomial random variable X, dividing it by n. From the Central Limit Theorem, the sampling distribution of $\hat{p}$ will also be approximately normal, with a rescaled mean and standard deviation.

33 Approximating Normal from the Binomial
Under certain conditions, a binomial random variable has a distribution that is approximately normal. When n is large, and p is not too close to zero or one, areas under the normal curve with mean np and standard deviation $\sqrt{npq}$ can be used to approximate binomial probabilities. Make sure that np and n(1-p) are both greater than 5 to avoid inaccurate approximations!

34 The Sampling Distribution of the Sample Proportion
A random sample of size n is selected from a binomial population with parameter p. The sampling distribution of the sample proportion, $\hat{p} = x/n$, will have mean p and standard deviation $\sqrt{pq/n}$. If n is large, and p is not too close to zero or one, the sampling distribution of $\hat{p}$ will be approximately normal. The standard deviation of $\hat{p}$ is sometimes called the STANDARD ERROR (SE) of $\hat{p}$.

35 Finding Probabilities for the Sample Proportion
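If the sampling distribution of $\hat{p}$ is normal or approximately normal (that is, if both np > 5 and n(1-p) > 5), standardize or rescale the interval of interest in terms of $z = \dfrac{\hat{p} - p}{\sqrt{pq/n}}$. Find the appropriate area using the Z table. Example: A random sample of size n = 100 from a binomial population with p = 0.4. Here np = 40 and nq = 60, and the standard error is $\sqrt{(0.4)(0.6)/100} \approx 0.049$.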

36 Example The soda bottler in the previous example claims that only 5% of the soda cans are underfilled. A quality control technician randomly samples 200 cans of soda. What is the probability that more than 10% of the cans are underfilled?
n = 200; U: underfilled can; p = P(U) = 0.05; q = 0.95; np = 10; nq = 190, so it is OK to use the normal approximation.
$P(\hat{p} > 0.10) = P\left(z > \dfrac{0.10 - 0.05}{\sqrt{(0.05)(0.95)/200}}\right) \approx P(z > 3.24) \approx 0.0006$
This would be very unusual, if indeed p = .05!
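The underfilled-can probability can be checked numerically. A minimal sketch, assuming the normal approximation to the sampling distribution of the sample proportion and using `statistics.NormalDist` instead of the Z table; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def prob_phat_above(threshold, p, n):
    """Approximate P(p-hat > threshold) using the normal approximation,
    refusing to answer when np or nq is too small for it to be reliable."""
    q = 1 - p
    if n * p <= 5 or n * q <= 5:
        raise ValueError("np and nq should both exceed 5")
    se = sqrt(p * q / n)  # standard error of p-hat
    return 1 - NormalDist(p, se).cdf(threshold)

prob = prob_phat_above(0.10, p=0.05, n=200)
print(f"P(p-hat > 0.10) is about {prob:.4f}")
```

A probability this small is what makes an observed underfill rate above 10% strong evidence against the bottler's 5% claim.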

37 Example

38 Sampling Distribution of the Difference Between Two Averages
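Theorem: If independent samples of sizes $n_1$ and $n_2$ are drawn at random from two populations with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively, then the sampling distribution of the difference of sample means, $\bar{x}_1 - \bar{x}_2$, is approximately normally distributed with mean and variance given by $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$ and $\sigma^2_{\bar{x}_1 - \bar{x}_2} = \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$.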

39 Example Starting salaries for MBA grads at two universities are normally distributed with the following means and standard deviations. Samples from each school are taken.
                 University 1     University 2
Mean             RM 62,000 /yr    RM 60,000 /yr
Std. dev.        RM 14,500 /yr    RM 18,300 /yr
Sample size, n   50               60
What is the sampling distribution of $\bar{x}_1 - \bar{x}_2$? What is the probability that the sample mean of U1 students will exceed the sample mean of U2 students?

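40 Example mean: $\mu_{\bar{x}_1 - \bar{x}_2} = 62{,}000 - 60{,}000 = 2000$ and standard deviation: $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{14{,}500^2}{50} + \dfrac{18{,}300^2}{60}} \approx 3128.3$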
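The second question in the salary example, whether the U1 sample mean exceeds the U2 sample mean, amounts to asking for $P(\bar{x}_1 - \bar{x}_2 > 0)$. A minimal sketch under the theorem's assumptions, with illustrative function and variable names:

```python
from math import sqrt
from statistics import NormalDist

def diff_of_means_distribution(mu1, sigma1, n1, mu2, sigma2, n2):
    """Normal sampling distribution of x-bar1 - x-bar2 for
    independent samples of sizes n1 and n2."""
    mean = mu1 - mu2
    sd = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    return NormalDist(mean, sd)

d = diff_of_means_distribution(62_000, 14_500, 50, 60_000, 18_300, 60)
p_exceeds = 1 - d.cdf(0)  # P(x-bar1 - x-bar2 > 0)
print(f"mean = {d.mean:.0f}, sd = {d.stdev:.1f}, P(U1 > U2) = {p_exceeds:.4f}")
```

Even though University 1 has the higher population mean, the large spread in salaries leaves a sizeable chance (a little over one in four) that the U2 sample mean comes out higher.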

41 Sampling Distribution of the Difference Between Two Sample Proportions
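Theorem: If independent samples of sizes $n_1$ and $n_2$ are drawn at random from two populations, where the proportions of observations with the characteristic of interest in the two populations are $p_1$ and $p_2$ respectively, then the sampling distribution of the difference between sample proportions, $\hat{p}_1 - \hat{p}_2$, is approximately normally distributed with mean and variance given by $\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$ and $\sigma^2_{\hat{p}_1 - \hat{p}_2} = \dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}$, where $q_i = 1 - p_i$. Then $z = \dfrac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{p_1 q_1/n_1 + p_2 q_2/n_2}}$ is approximately standard normal.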

42 Sampling Distribution of the Difference Between Two Sample Proportions
Example: It is known that 16% of the households in Community A and 11% of the households in Community B have internet access at home. If 200 households and 225 households are selected at random from Community A and Community B respectively, compute the probability that the difference between the two sample proportions is at least 0.10.
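One way to work this example is sketched below. It assumes the question is one-sided, that is, it asks for $P(\hat{p}_A - \hat{p}_B \ge 0.10)$; that reading, and the function name, are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def diff_of_proportions_distribution(p1, n1, p2, n2):
    """Approximate normal sampling distribution of p-hat1 - p-hat2
    for independent samples of sizes n1 and n2."""
    mean = p1 - p2
    sd = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return NormalDist(mean, sd)

d = diff_of_proportions_distribution(0.16, 200, 0.11, 225)
z = (0.10 - d.mean) / d.stdev
p_at_least = 1 - d.cdf(0.10)  # one-sided: P(p-hatA - p-hatB >= 0.10)
print(f"sd = {d.stdev:.4f}, z = {z:.2f}, probability = {p_at_least:.4f}")
```

If the intended question were two-sided (an absolute difference of at least 0.10), the lower tail `d.cdf(-0.10)` would need to be added as well.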

43 Sampling Distribution of S2
Theorem: If $S^2$ is the variance of a random sample of size n taken from a normal population having the variance $\sigma^2$, then the statistic $\chi^2 = \dfrac{(n-1)S^2}{\sigma^2}$ has a chi-squared distribution with v = n - 1 degrees of freedom.
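A quick simulation gives a sanity check of this theorem. It relies on the standard fact that a chi-squared variable with v degrees of freedom has mean v and variance 2v; the sample size, $\sigma$, and function name below are arbitrary choices for illustration.

```python
import random
import statistics

def simulate_chi2_statistic(n, sigma, reps, rng):
    """Simulate (n - 1) * S^2 / sigma^2 for `reps` samples of size n
    drawn from a normal population with standard deviation sigma."""
    values = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        s2 = statistics.variance(sample)  # sample variance, divisor n - 1
        values.append((n - 1) * s2 / sigma ** 2)
    return values

rng = random.Random(3)
n = 10
values = simulate_chi2_statistic(n, sigma=2.0, reps=4000, rng=rng)
print(f"simulated mean = {statistics.fmean(values):.2f} (theory: {n - 1})")
print(f"simulated variance = {statistics.variance(values):.2f} "
      f"(theory: {2 * (n - 1)})")
```

The simulated mean and variance should land close to v = 9 and 2v = 18, whatever value of `sigma` is chosen, because the statistic is scaled by $\sigma^2$.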

