Sampling Distributions
Chapter 7, ST210
Nutan S. Mishra
Department of Mathematics and Statistics
University of South Alabama
Useful links
http://oak.cats.ohiou.edu/~wallacd1/ssample.html
pling_dist/
Sampling distribution
In chapter 2 we defined a population parameter as a function of all the population values. Suppose the population consists of N observations. Then the population mean $\mu = \frac{\sum x}{N}$ and the population standard deviation $\sigma = \sqrt{\frac{\sum (x-\mu)^2}{N}}$ are parameters. For a given population, the parameters are fixed values.
Sampling distribution
On the other hand, if we draw a sample of size n from a population of size N, then a function of the sample values is called a statistic. For example, the sample mean and the sample standard deviation are sample statistics. Since we can draw a large number of different samples from the population, the value of a sample statistic varies from sample to sample.
Sampling distribution
Since the value of a sample statistic varies from sample to sample, the statistic itself is a random variable and has a probability distribution. For example, the sample mean is a random variable and has a probability distribution.
Example: start with a toy example. Let the population consist of 5 students who took a math quiz worth 5 points. The names of the students and their scores are as follows:

| Name of the student | A | B | C | D | E |
|---|---|---|---|---|---|
| Score | 2 | 3 | 4 | 4 | 5 |

For this population the mean is $\mu = 3.6$ and the standard deviation is $\sigma = 1.02$.
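Here is a minimal Python sketch that checks these two parameters. It assumes the scores A = 2, B = 3, C = 4, D = 4, E = 5 from the table above. `statistics.pstdev` divides by N (the population standard deviation), and that divisor is what reproduces $\sigma = 1.02$. By contrast, `statistics.stdev` would divide by N - 1.

```python
import statistics

# Quiz scores of the five students: this is the whole population
scores = {"A": 2, "B": 3, "C": 4, "D": 4, "E": 5}
values = list(scores.values())

mu = statistics.mean(values)       # population mean: sum / N
sigma = statistics.pstdev(values)  # population SD: divides by N, not N - 1

print(f"mu = {mu}, sigma = {sigma:.2f}")  # mu = 3.6, sigma = 1.02
```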
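Sampling distribution
Now we repeatedly draw samples of size three from the population of size 5. There are $\binom{5}{3} = 10$ possible samples, listed below. The population parameters are $\mu = 3.6$ and $\sigma = 1.02$.

| Sample no. | Sample | Sample values | $\bar{x}$ | s |
|---|---|---|---|---|
| 1 | A,B,C | 2,3,4 | 3 | 1 |
| 2 | A,B,D | 2,3,4 | 3 | 1 |
| 3 | A,B,E | 2,3,5 | 3.33 | 1.53 |
| 4 | A,C,D | 2,4,4 | 3.33 | 1.15 |
| 5 | A,C,E | 2,4,5 | 3.67 | 1.53 |
| 6 | A,D,E | 2,4,5 | 3.67 | 1.53 |
| 7 | B,C,D | 3,4,4 | 3.67 | .58 |
| 8 | B,C,E | 3,4,5 | 4 | 1 |
| 9 | B,D,E | 3,4,5 | 4 | 1 |
| 10 | C,D,E | 4,4,5 | 4.33 | .58 |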
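The list of samples can also be generated by code instead of by hand. This is a minimal sketch that uses the same five scores. `itertools.combinations` yields the $\binom{5}{3} = 10$ unordered samples drawn without replacement, in the same order as the table. `statistics.stdev` computes s with the n - 1 divisor.

```python
from itertools import combinations
import statistics

scores = {"A": 2, "B": 3, "C": 4, "D": 4, "E": 5}

# Every sample of size 3 drawn without replacement: C(5, 3) = 10 samples
samples = list(combinations(scores, 3))

for i, names in enumerate(samples, start=1):
    values = [scores[name] for name in names]
    xbar = statistics.mean(values)  # sample mean
    s = statistics.stdev(values)    # sample SD: divides by n - 1
    print(f"{i:2d}  {','.join(names)}  {values}  xbar = {xbar:.2f}  s = {s:.2f}")
```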
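Sampling distribution
X = score of a student in the math quiz. We see that the sample mean $\bar{x}$ is a new random variable with its own probability distribution.
Question: What is the mean of this random variable, and what is its variance?

Population distribution

| x | f | P(x) |
|---|---|---|
| 2 | 1 | .2 |
| 3 | 1 | .2 |
| 4 | 2 | .4 |
| 5 | 1 | .2 |

Sampling distribution of sample mean

| $\bar{x}$ | f | $P(\bar{x})$ |
|---|---|---|
| 3 | 2 | .2 |
| 3.33 | 2 | .2 |
| 3.67 | 3 | .3 |
| 4 | 2 | .2 |
| 4.33 | 1 | .1 |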
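The frequency table of $\bar{x}$ can be tallied directly from the 10 samples. This sketch uses the same five scores. It stores each sample mean as an exact `Fraction`, so two samples with mean 10/3 are counted as one value rather than as two slightly different floats.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

scores = {"A": 2, "B": 3, "C": 4, "D": 4, "E": 5}
n = 3

# Exact sample mean of every possible sample of size n
means = [Fraction(sum(scores[k] for k in s), n) for s in combinations(scores, n)]

freq = Counter(means)  # f for each distinct value of xbar
total = len(means)     # 10 samples in all

print(" xbar   f   P(xbar)")
for xbar in sorted(freq):
    print(f"{float(xbar):5.2f}  {freq[xbar]:2d}   {freq[xbar] / total:.1f}")
```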
Exercise 7.8
Here are some guidelines for solving it:
1. Let X = teaching experience of a faculty member.
2. Write the two columns x and P(x).
3. The total number of samples of size 4 from a population of size 5 is $\binom{5}{4} = 5$.
4. List all 5 samples and compute their sample means.
5. Compute the quantities asked for in parts b and c.
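Sampling distribution
Let N be the size of the population and n be the size of the sample. In both cases below, $\mu_{\bar{x}} = \mu$.
If $n/N > .05$: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$
If $n/N \le .05$: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$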
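The two cases can be wrapped in one small helper. `sd_of_sample_mean` is a hypothetical name used only in this sketch. It applies the correction factor $\sqrt{(N-n)/(N-1)}$ only when N is supplied and n/N > .05.

```python
import math
from typing import Optional


def sd_of_sample_mean(sigma: float, n: int, N: Optional[int] = None) -> float:
    """Standard deviation of x-bar (hypothetical helper for this example)."""
    sd = sigma / math.sqrt(n)
    # Finite population correction only when sampling a sizable fraction of N
    if N is not None and n / N > 0.05:
        sd *= math.sqrt((N - n) / (N - 1))
    return sd


print(round(sd_of_sample_mean(math.sqrt(1.04), n=3, N=5), 4))  # 0.4163 (quiz example, n/N = .6)
print(sd_of_sample_mean(10, n=25, N=10_000))                   # 2.0 (hypothetical large population)
```

In the quiz example n/N = 3/5 = .6, so the correction applies. With a hypothetical population of 10,000 and n = 25, n/N = .0025, so only $\sigma/\sqrt{n}$ is used.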
Sampling distribution of sample mean
Theorem. Let X be a random variable with population mean $\mu$ and population standard deviation $\sigma$. If we collect samples of size n, then the new random variable $\bar{x}$ (the sample mean) has mean equal to $\mu$ and standard deviation $\sigma/\sqrt{n}$ (the latter when $n/N \le .05$). We denote them as follows:
$\mu_{\bar{x}} = \mu, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
Sampling distribution of sample mean
The standard deviation of the sample mean decreases as the sample size increases. The mean of the sample mean, which equals $\mu$, is unaffected by a change in sample size. The sample mean is called an estimator of the population mean, because whenever the population mean is unknown we use the sample mean in its place.
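Exercise 7.13
X has a large population with $\mu = 60$ and $\sigma = 10$. Assuming $n/N \le .05$, the parameters of the sample mean are
$\mu_{\bar{x}} = \mu = 60, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{n}}$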
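For Exercise 7.13, $\sigma_{\bar{x}} = 10/\sqrt{n}$ can be tabulated for a few sample sizes. The sizes 4, 25 and 100 below are illustrative assumptions and may not be the ones in the textbook exercise. The output shows the standard deviation of $\bar{x}$ shrinking as n grows while its mean stays at 60.

```python
import math

mu, sigma = 60, 10  # parameters of the large population in Exercise 7.13

# Illustrative sample sizes only; n / N <= .05 is assumed, so no correction factor
sd_by_n = {n: sigma / math.sqrt(n) for n in (4, 25, 100)}

for n, sd_xbar in sd_by_n.items():
    print(f"n = {n:3d}:  mean of xbar = {mu},  sd of xbar = {sd_xbar:.2f}")
```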
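Sampling distribution of sample mean
Using the table of $\bar{x}$ and $P(\bar{x})$ above, compute the mean and variance of $\bar{x}$ (complete this with the help of the chapter 5 slides):
$\mu_{\bar{x}} = \sum \bar{x}\,P(\bar{x}), \qquad \sigma^2_{\bar{x}} = \sum \bar{x}^2 P(\bar{x}) - \mu_{\bar{x}}^2$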
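Following the chapter 5 formulas, the mean and variance of $\bar{x}$ can be computed from the $P(\bar{x})$ table of the quiz example. This sketch uses exact fractions. The mean comes out equal to $\mu = 3.6$. The standard deviation (about .4163) agrees with $\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = \frac{1.02}{\sqrt{3}}\sqrt{\frac{2}{4}}$, as expected since n/N = .6 > .05.

```python
from fractions import Fraction as F

# Sampling distribution of x-bar for the quiz example (n = 3 from N = 5)
dist = {F(3): F(2, 10), F(10, 3): F(2, 10), F(11, 3): F(3, 10),
        F(4): F(2, 10), F(13, 3): F(1, 10)}

mean_xbar = sum(x * p for x, p in dist.items())                   # sum of xbar * P(xbar)
var_xbar = sum(x**2 * p for x, p in dist.items()) - mean_xbar**2  # shortcut formula
sd_xbar = var_xbar ** 0.5                                         # float square root

print(f"mean = {float(mean_xbar)}, variance = {float(var_xbar):.4f}, sd = {sd_xbar:.4f}")
# mean = 3.6, variance = 0.1733, sd = 0.4163
```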
Sampling distribution of sample mean
We have seen that the distribution of the sample mean $\bar{x}$ is derived from the distribution of x. For this reason, the distribution of x is called the parent distribution. The next question is: what is the relationship between the parent distribution and the sampling distribution of $\bar{x}$?
Sampling distribution of sample mean
Suppose the distribution of x is normal with mean $\mu$ and standard deviation $\sigma$. Equivalently, the parent population is normal with mean $\mu$ and standard deviation $\sigma$. If we draw a sample of size n from such a population, then:
1. The mean of $\bar{x}$ is equal to the population mean $\mu$.
2. The standard deviation of $\bar{x}$ is equal to $\sigma/\sqrt{n}$.
3. The shape of the distribution of $\bar{x}$ is normal, whatever the value of n.
Sampling distribution of sample mean
If $X \sim N(\mu, \sigma)$, then $\bar{x} \sim N(\mu, \sigma/\sqrt{n})$, where n is the size of the sample drawn from the population.
Central Limit Theorem
For a large sample size, the sampling distribution of $\bar{x}$ is approximately normal, irrespective of the shape of the population distribution. What sample size is considered large? A sample of size $n \ge 30$ is considered large.
Useful link: htm
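The central limit theorem can be seen in a small simulation. This sketch assumes a right-skewed exponential parent population with mean 1 and standard deviation 1, chosen only for illustration. It draws 2000 samples of size n = 30 and summarizes the resulting sample means. The `skewness` helper is a hypothetical moment-based measure written for this example.

```python
import random
import statistics

random.seed(210)  # fixed seed so the run is reproducible


def draw_sample(n):
    """One sample of size n from an exponential population (mean 1, SD 1)."""
    return [random.expovariate(1.0) for _ in range(n)]


def skewness(data):
    """Moment-based skewness: 0 for a symmetric distribution."""
    m = statistics.fmean(data)
    s = statistics.pstdev(data)
    return sum((x - m) ** 3 for x in data) / (len(data) * s ** 3)


n, reps = 30, 2000
sample_means = [statistics.fmean(draw_sample(n)) for _ in range(reps)]

print(f"mean of sample means = {statistics.fmean(sample_means):.3f}  (population mean 1)")
print(f"sd of sample means   = {statistics.stdev(sample_means):.3f}  (sigma/sqrt(n) = {1 / n ** 0.5:.3f})")
print(f"skewness: parent = 2, sample means = {skewness(sample_means):.2f}")
```

The sample means center near 1 with a spread near $1/\sqrt{30} \approx .183$. Their skewness is far below the parent's value of 2, so the distribution of $\bar{x}$ is much closer to normal than the population itself.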
Exercise 7.28
The population distribution is skewed to the left; that is, X is not normally distributed.
a. When n = 400, what is the distribution of $\bar{x}$? (That is, we repeatedly draw samples of size 400 from the population and compute the sample mean for every such sample.)
Answer: since the sample size is large, by the central limit theorem the distribution of $\bar{x}$ is approximately normal, that is, $\bar{x} \sim N(\mu, \sigma/\sqrt{400}) = N(\mu, \sigma/20)$.
Sampling distribution of sample mean
1. If the random sample comes from a normal population, the sampling distribution of the sample mean is normal regardless of the sample size.
2. If the shape of the parent population is unknown or not normal, the distribution of the sample mean is approximately normal whenever n is large ($n \ge 30$). This is the central limit theorem.
3. If the shape of the parent population is unknown or not normal and the sample size is small, we cannot readily say anything about the shape of the sampling distribution.
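Estimators
The sample mean $\bar{x}$ is an estimator of the population mean $\mu$: whenever the value of $\mu$ is not available, we use $\bar{x}$ in its place. Moreover, $\bar{x}$ is an unbiased estimator of $\mu$. Unbiased means that the mean of the sampling distribution of $\bar{x}$ equals the true value of $\mu$, that is, $E(\bar{x}) = \mu$. In other words, averaged over all possible samples (in the long run), the values of $\bar{x}$ neither overestimate nor underestimate $\mu$.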
Sampling error
Recall that for a given population the value of $\mu$ is fixed, whereas $\bar{x}$ is a random variable whose value varies from sample to sample. When we use $\bar{x}$ in place of $\mu$, some error is inevitable. The difference between $\bar{x}$ and $\mu$ is called the sampling error:
Sampling error $= \bar{x} - \mu$
The sampling error occurs purely due to chance, that is, it depends on which particular sample happens to be selected. Other types of errors may also occur in estimation, for example an error in recording a value, or a missing value. Such errors are called non-sampling errors.
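Example of sampling error
Now we repeatedly draw samples of size three from the population of size 5. The 10 possible samples are listed below. The population parameters are $\mu = 3.6$ and $\sigma = 1.02$.

| Sample no. | Sample | Sample values | $\bar{x}$ | Sampling error $\bar{x} - \mu$ |
|---|---|---|---|---|
| 1 | A,B,C | 2,3,4 | 3 | -.60 |
| 2 | A,B,D | 2,3,4 | 3 | -.60 |
| 3 | A,B,E | 2,3,5 | 3.33 | -.27 |
| 4 | A,C,D | 2,4,4 | 3.33 | -.27 |
| 5 | A,C,E | 2,4,5 | 3.67 | .07 |
| 6 | A,D,E | 2,4,5 | 3.67 | .07 |
| 7 | B,C,D | 3,4,4 | 3.67 | .07 |
| 8 | B,C,E | 3,4,5 | 4 | .40 |
| 9 | B,D,E | 3,4,5 | 4 | .40 |
| 10 | C,D,E | 4,4,5 | 4.33 | .73 |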
Example of sampling error
The last column of the table above gives the error in estimation. Suppose we draw a sample of size 3 from this population and get, say, sample number 3 (A, B, E, with $\bar{x} = 3.33$). If we use that value of $\bar{x}$ to estimate the population mean $\mu$, the error in estimation is $3.33 - 3.6 = -.27$ units.
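The sampling-error column can be reproduced for all 10 samples. This sketch uses exact fractions, so the average of the errors comes out as exactly 0. That result is another way of saying that $\bar{x}$ is an unbiased estimator of $\mu$ for this population.

```python
from fractions import Fraction
from itertools import combinations

scores = {"A": 2, "B": 3, "C": 4, "D": 4, "E": 5}
mu = Fraction(sum(scores.values()), len(scores))  # 18/5 = 3.6

errors = []
for i, names in enumerate(combinations(scores, 3), start=1):
    xbar = Fraction(sum(scores[k] for k in names), 3)
    error = xbar - mu  # sampling error = xbar - mu
    errors.append(error)
    print(f"{i:2d}  {','.join(names)}  xbar = {float(xbar):.2f}  error = {float(error):+.2f}")

# Averaged over all possible samples the errors cancel, since E(xbar) = mu
print("average sampling error:", sum(errors) / len(errors))
```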
Exercise 7.4
The population consists of six numbers: 15, 13, 8, 17, 9, 12.
a. Population mean $\mu = 74/6 = 12.33$
b. Liza selected a sample with n = 4 and values 13, 8, 9, 12. The sample mean is $\bar{x} = 42/4 = 10.50$, so the sampling error $= \bar{x} - \mu = 10.50 - 12.33 = -1.83$.
c. While calculating the sample mean, Liza mistakenly entered a 6 in place of 9; that is, she entered 13, 8, 6, 12. A non-sampling error has occurred, and the sample mean is $39/4 = 9.75$.
Total error = sampling error + non-sampling error.
Total error $= 9.75 - 12.33 = -2.58$, of which $-1.83$ is the sampling error.
Thus non-sampling error $= -2.58 - (-1.83) = -.75$.
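The arithmetic of Exercise 7.4 can be checked in a few lines. The variable names are illustrative only.

```python
from statistics import mean

population = [15, 13, 8, 17, 9, 12]
correct_sample = [13, 8, 9, 12]
entered_sample = [13, 8, 6, 12]  # the 9 was mistakenly entered as 6

mu = mean(population)                           # 74/6 = 12.33
sampling_error = mean(correct_sample) - mu      # 10.50 - 12.33
total_error = mean(entered_sample) - mu         # 9.75 - 12.33
nonsampling_error = total_error - sampling_error

print(f"mu = {mu:.2f}")
print(f"sampling error = {sampling_error:.2f}")
print(f"total error = {total_error:.2f}")
print(f"non-sampling error = {nonsampling_error:.2f}")
```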
Exercise 7.49
X = GPA of a student enrolled at a large university, $X \sim N(3.02, .29)$. This X describes the whole population of students: the average GPA of all the students in the population is 3.02 and the standard deviation is .29.
We draw a sample of size n = 20 from this population and compute the sample mean $\bar{x}$. To find $P(\bar{x} > 3.10)$ (as asked in part a), we must know the distribution of $\bar{x}$. The sample is small, but the parent population is normal, hence $\bar{x} \sim N(3.02, .29/\sqrt{20}) = N(3.02, .0648)$.
At this point we convert the probability statement about $\bar{x}$ into a probability statement about z, using the transformation $z = (\bar{x} - \mu)/\sigma_{\bar{x}}$:
$P(\bar{x} > 3.10) = P\left(z > \frac{3.10 - 3.02}{.0648}\right) = P(z > 1.23) = .1093$
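The same calculation can be sketched with `statistics.NormalDist` from the Python standard library (Python 3.8+). The code keeps z unrounded (about 1.2337 instead of 1.23), so it prints roughly .1087 rather than the table value .1093.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 3.02, 0.29, 20
sd_xbar = sigma / sqrt(n)  # about 0.0648; xbar is normal because the parent is normal

z = (3.10 - mu) / sd_xbar    # about 1.23
p = 1 - NormalDist().cdf(z)  # P(z > 1.23) under the standard normal

print(f"sd of xbar = {sd_xbar:.4f}, z = {z:.2f}, P(xbar > 3.10) = {p:.4f}")
```

Equivalently, `1 - NormalDist(mu, sd_xbar).cdf(3.10)` skips the explicit z step.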
Exercise 7.52
X = time spent studying per week by a college student. The distribution of X is right skewed with $\mu = 8.4$ and $\sigma = 2.7$. That is, the population of all college students spends 8.4 hrs/week on average, with a standard deviation of 2.7 hrs, and the distribution is right skewed (i.e. not normal).
If we draw a sample of n = 45 students from this population and compute the sample mean $\bar{x}$, we are asked to find $P(8 < \bar{x} < 9)$. To find this probability we must know the distribution of $\bar{x}$. Although the parent distribution is right skewed, the sample size is large, so we apply the CLT to conclude that approximately $\bar{x} \sim N(8.4, 2.7/\sqrt{45}) = N(8.4, .4025)$.
$P(8 < \bar{x} < 9) = P\left(\frac{8 - 8.4}{.4025} < z < \frac{9 - 8.4}{.4025}\right) = P(-.99 < z < 1.49) = .9319 - .1611 = .7708$
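Here is a sketch of the same computation with `statistics.NormalDist`. It treats $\bar{x}$ as approximately $N(8.4, 2.7/\sqrt{45})$, as the CLT suggests. With unrounded z-values the result is about .7718, close to the table answer of .7708.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8.4, 2.7, 45
sd_xbar = sigma / sqrt(n)  # about 0.4025; CLT applies since n >= 30

xbar_dist = NormalDist(mu, sd_xbar)      # approximate distribution of xbar
p = xbar_dist.cdf(9) - xbar_dist.cdf(8)  # P(8 < xbar < 9)

z_low, z_high = (8 - mu) / sd_xbar, (9 - mu) / sd_xbar
print(f"z from {z_low:.2f} to {z_high:.2f}, P(8 < xbar < 9) = {p:.4f}")
```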
Population and sample proportions
Consider a categorical variable with just two categories. Let the population size be N, of which X units fall in category I. The population proportion of category I is denoted by p:
$p = X/N$
If we draw a sample of size n from this population and observe that x of the n units fall in category I, the sample proportion of category I is denoted by $\hat{p}$:
$\hat{p} = x/n$
Population and sample proportions
A population consists of 9000 families in a small town. Of these, 3600 families have their houses insured. The population proportion of families with house insurance is then $p = 3600/9000 = .4$.
Suppose we draw a sample of size 100 from this population and observe that 42 of the 100 families have house insurance. The sample proportion of families with house insurance is then $\hat{p} = 42/100 = .42$.
Sampling error $= \hat{p} - p = .42 - .40 = .02$
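Here is a minimal sketch of these definitions applied to the house-insurance example. `proportion` is a hypothetical helper name.

```python
def proportion(successes: int, size: int) -> float:
    """Share of units in category I (hypothetical helper for this example)."""
    return successes / size


p = proportion(3600, 9000)  # population proportion
p_hat = proportion(42, 100) # sample proportion
sampling_error = p_hat - p  # p_hat - p

print(f"p = {p:.2f}, p_hat = {p_hat:.2f}, sampling error = {sampling_error:.2f}")
```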
Sampling distribution of $\hat{p}$
Exercise 7.60
N = 1000, X = 640. The population proportion is then $p = 640/1000 = .64$.
n = 40, x = 24. The sample proportion is then $\hat{p} = 24/40 = .60$.
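The sampling distribution of $\hat{p}$ can be explored by simulation. This sketch codes the Exercise 7.60 population as 640 ones (category I) and 360 zeros. It draws 5000 samples of size n = 40 without replacement using `random.sample`. The values of $\hat{p}$ vary from sample to sample around p = .64. The sample in the exercise, with $\hat{p} = .60$, is just one of them (sampling error $.60 - .64 = -.04$).

```python
import random
import statistics

random.seed(760)  # fixed seed so the run is reproducible

# Exercise 7.60 population: N = 1000 units, 640 in category I (coded 1), 360 not (coded 0)
population = [1] * 640 + [0] * 360
p = sum(population) / len(population)  # 0.64

n, reps = 40, 5000
# Each replicate draws n units without replacement and records p_hat = x / n
p_hats = [sum(random.sample(population, n)) / n for _ in range(reps)]

print(f"p = {p}")
print(f"mean of p_hat over {reps} samples = {statistics.fmean(p_hats):.3f}")
print(f"smallest and largest p_hat seen: {min(p_hats)}, {max(p_hats)}")
```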
Exercise 7.70