Lecture 7 Sampling and Sampling Distributions


1 Lecture 7 Sampling and Sampling Distributions

2 N= Population size, can be very large or even infinite
Population = the complete set of all items about which information is desired.
N = population size; it can be very large or even infinite.
Parameter = a specific characteristic of the population, such as the mean, variance, or standard deviation.
If the data set is the entire population, then the population mean is $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$ and the population variance is $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$.
Collect data: sampling. In finance, economics, or any other field, it is usually impossible to access data on the entire population, mainly because of money and time constraints.
Sample = an observed SUBSET of the POPULATION.
n = sample size, with n < N.
Random sampling = a procedure for selecting n objects from the N members of the population such that every member has an equal chance (probability) of being selected.
Sample statistic = a specific characteristic of a sample.
If the data set is a sample, then the sample mean and sample variance are $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$.
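The difference between the population formulas (divide by N) and the sample formulas (divide by n - 1) can be sketched with Python's standard `statistics` module. The data below are illustrative assumptions, not taken from the slide: a tiny population and one hand-picked sample from it.

```python
import statistics

# Illustrative data (an assumption for this sketch)
population = [1, 2, 3, 4]   # treated as the ENTIRE population, N = 4
sample = [1, 3]             # one possible sample of size n = 2

mu = statistics.mean(population)            # population mean
sigma2 = statistics.pvariance(population)   # population variance, divides by N
x_bar = statistics.mean(sample)             # sample mean
s2 = statistics.variance(sample)            # sample variance, divides by n - 1

print("population mean:", mu, "population variance:", sigma2)
print("sample mean:", x_bar, "sample variance:", s2)
```

Note that `pvariance` and `variance` differ only in the divisor, which mirrors the two variance formulas on this slide.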

3 Sampling Distributions of SAMPLE MEANS
Different samples may result in different sample means.
Example: consider the population 1, 2, 3, 4, so N = 4, $\mu = 2.5$ and $\sigma^2 = 1.25$.
Consider all possible samples of size 2: 4C2 = 4!/[2!(4-2)!] = 6 is the total number of possible samples.
Sample 1: 1, 2  Mean of sample 1: 1.5
Sample 2: 1, 3  Mean of sample 2: 2
Sample 3: 1, 4  Mean of sample 3: 2.5
Sample 4: 2, 3  Mean of sample 4: 2.5
Sample 5: 2, 4  Mean of sample 5: 3
Sample 6: 3, 4  Mean of sample 6: 3.5
As we see, different samples may result in different sample means. Each sample has an equal chance of occurring, so the selection probability of each sample is 1/6.
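The enumeration above can be reproduced in a few lines. This is a minimal sketch using `itertools.combinations`, which lists samples drawn without replacement and without regard to order; the variable names are chosen for illustration.

```python
from itertools import combinations
from statistics import mean

population = [1, 2, 3, 4]
n = 2

# All possible samples of size n (order ignored, no repeats)
samples = list(combinations(population, n))
sample_means = [mean(s) for s in samples]

for s, m in zip(samples, sample_means):
    print(s, "-> sample mean", m)
print("number of samples:", len(samples))
```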

4 Different samples may result in different sample means
Different samples may result in different sample means. What is the average of the sample means, i.e. the average of 1.5, 2, 2.5, 2.5, 3, 3.5?
$E(\bar{X}) = (1.5 + 2 + 2.5 + 2.5 + 3 + 3.5)/6 = 2.5 = \mu$
We can generalize this result as: $E(\bar{X}) = \mu$
Sample   Sample mean   Probability
1,2      1.5           1/6
1,3      2             1/6
1,4      2.5           1/6
2,3      2.5           1/6
2,4      3             1/6
3,4      3.5           1/6
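As a check on the result that the average of the sample means equals the population mean, the sketch below recomputes it with exact fractions for the population 1, 2, 3, 4 and samples of size 2. The variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

population = [1, 2, 3, 4]
n = 2

samples = list(combinations(population, n))
prob = Fraction(1, len(samples))          # each sample is equally likely

# Expected value of the sample mean: sum of (sample mean * probability)
expected_x_bar = sum(prob * Fraction(sum(s), n) for s in samples)
mu = Fraction(sum(population), len(population))

print("E(sample mean) =", expected_x_bar)
print("population mean =", mu)
```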

5 What is the variance of the sample means:
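$\sigma_{\bar{X}}^2 = \sum_j (\bar{x}_j - \mu)^2 P(\bar{x}_j) = [(1.5-2.5)^2 + (2-2.5)^2 + (2.5-2.5)^2 + (2.5-2.5)^2 + (3-2.5)^2 + (3.5-2.5)^2]/6 = 2.5/6 \approx 0.4167$
What is the relation between $\sigma_{\bar{X}}^2$ and $\sigma^2$?
If the population size is small, then $\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1}$. Here (N-n)/(N-1) is the correction factor for a finite population.
If the population size is large, then $\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$.
In our example: $\sigma_{\bar{X}}^2 = \frac{1.25}{2} \cdot \frac{4-2}{4-1} \approx 0.4167$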
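The finite population correction can be verified directly for the example. The sketch below computes the variance of the six sample means by enumeration and compares it with $\frac{\sigma^2}{n} \cdot \frac{N-n}{N-1}$, using exact fractions; names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

population = [1, 2, 3, 4]
N, n = len(population), 2

mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance

samples = list(combinations(population, n))
prob = Fraction(1, len(samples))

# Variance of the sample mean, by direct enumeration of all samples
var_x_bar = sum(prob * (Fraction(sum(s), n) - mu) ** 2 for s in samples)

# Same quantity from the formula with the finite population correction
var_formula = sigma2 / n * Fraction(N - n, N - 1)

print("by enumeration:", var_x_bar)
print("by formula:    ", var_formula)
print("without correction (sigma^2 / n):", sigma2 / n)
```

For this tiny population the uncorrected value $\sigma^2/n$ overstates the variance, which is why the correction matters when n is not small relative to N.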

6 What is the distribution of the Sample Mean
Consider our example. In the figure on the right-hand side, we see that the distribution of the sample mean is symmetric around the mean (it looks similar to the Normal distribution!).
In general, when the population size is large, we will see that the sample mean is (approximately) distributed as $N(\mu, \sigma^2/n)$.
Sample   Sample mean   Probability
1,2      1.5           1/6
1,3      2             1/6
1,4      2.5           1/6
2,3      2.5           1/6
2,4      3             1/6
3,4      3.5           1/6

7 A Bunch of Proofs and the Central Limit Theorem:
Consider a population composed of the elements X1, X2, …, XN with mean $\mu$ and variance $\sigma^2$.
1) When we pick a RANDOM sample of size n, namely X1, X2, …, Xn, these random variables are INDEPENDENT of each other. The sample mean is nothing but a linear combination of independent random variables: $\bar{X} = \frac{1}{n}(X_1 + X_2 + \dots + X_n)$
So: $E(\bar{X}) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \mu$ and $Var(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^{n} Var(X_i) = \frac{\sigma^2}{n}$
2) If the population is composed of the elements X1, X2, …, XN with mean $\mu$ and variance $\sigma^2$, and is Normally distributed, then $\bar{X} \sim N(\mu, \sigma^2/n)$.

8 3) Central Limit Theorem (CLT): Generalizes this property.
IF the SAMPLE SIZE n is LARGE (n >= 30), then approximately $\bar{X} \sim N(\mu, \sigma^2/n)$.
Note: the true distribution of X does not have to be known, nor does it have to be Normal. The larger n is, the better the approximation $\bar{X} \sim N(\mu, \sigma^2/n)$.
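A small simulation can illustrate the CLT. The sketch below is an assumption-laden illustration, not part of the lecture: it draws samples of size n = 30 from an exponential distribution with mean 1 and variance 1 (clearly not Normal), and checks that the sample means have mean close to $\mu$, variance close to $\sigma^2/n$, and roughly 95% of values within 1.96 standard errors of $\mu$. The seed and the number of repetitions are arbitrary choices.

```python
import random
import statistics
from math import sqrt

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 30                   # sample size
reps = 5000              # number of simulated samples

# Population: exponential with rate 1, so mu = 1 and sigma^2 = 1 (skewed, not Normal)
mu, sigma2 = 1.0, 1.0

sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.pvariance(sample_means)

# Share of sample means within 1.96 standard errors of mu
se = sqrt(sigma2 / n)
share_within = sum(abs(m - mu) <= 1.96 * se for m in sample_means) / reps

print("mean of sample means:", round(mean_of_means, 4), "vs mu =", mu)
print("variance of sample means:", round(var_of_means, 4), "vs sigma^2/n =", round(sigma2 / n, 4))
print("share within 1.96 SE:", round(share_within, 3))
```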

9 How do we benefit from the CLT? If n is large, then $\bar{X} \sim N(\mu, \sigma^2/n)$
Example 1: The weights of people traveling by air in some region have a mean of 163 pounds and a standard deviation of 18 pounds. What is the probability that the average weight of 36 people will be greater than 167 pounds?
Information about the population: $\mu = 163$ pounds, $\sigma = 18$ pounds.
n = 36 > 30, so by the CLT $\bar{X} \sim N(\mu = 163, \sigma^2/n = 18^2/36 = 9)$, i.e. the standard deviation of $\bar{X}$ is 3 pounds.
$P(\bar{X} > 167) = P(Z > (167 - 163)/3) = P(Z > 1.33) \approx 0.09$
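The probability in Example 1 can be computed without a table using `statistics.NormalDist` from the Python standard library. This is a sketch under the CLT approximation stated on the slide; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu = 163      # population mean (pounds)
sigma = 18    # population standard deviation (pounds)
n = 36        # sample size

se = sigma / sqrt(n)                    # standard deviation of the sample mean
sampling_dist = NormalDist(mu, se)      # approximate distribution of the sample mean

p_greater = 1 - sampling_dist.cdf(167)  # P(sample mean > 167)
print("standard error:", se)
print("P(sample mean > 167) is about", round(p_greater, 4))
```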

10 ACCEPTANCE INTERVALS
When we observe a sample mean $\bar{x}$, we know that it comes from a Normal distribution when n is large: $\bar{X} \sim N(\mu, \sigma^2/n)$. So we can use the “Empirical Rule”.
EMPIRICAL RULE: For many LARGE populations the empirical rule provides the following approximations (in our case with mean $\mu$ and standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n}$):
Approximately 68% of the observations are in the interval $\mu \pm 1\sigma_{\bar{X}}$
Approximately 95% of the observations are in the interval $\mu \pm 2\sigma_{\bar{X}}$
Almost all of the observations are in the interval $\mu \pm 3\sigma_{\bar{X}}$
The third rule says that $\bar{X}$ will be in the interval $\mu \pm 3\sigma_{\bar{X}}$ with almost 100% probability.
For the Normal distribution we can find the EXACT boundaries of the confidence intervals.
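For a Normal distribution the three empirical-rule percentages can be computed exactly. The sketch below uses the standard Normal distribution, since the coverage of $\mu \pm k\sigma_{\bar{X}}$ does not depend on the particular mean or standard deviation.

```python
from statistics import NormalDist

z = NormalDist()   # standard Normal: mean 0, standard deviation 1

# Probability of falling within k standard deviations of the mean
coverage = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, p in coverage.items():
    print(f"within {k} standard deviation(s): {p:.4f}")
```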

11 Confidence Intervals
Example: Suppose we are told that health insurance claims have a historical mean of $4000 and a standard deviation of $2000. You take a random sample of 100 claims. What is the 95% confidence interval for the sample mean? Interpret the result.

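12 $\mu = \$4000$, $\sigma = \$2000$
Here we will find the 95% confidence interval. In general, the $100(1-\alpha)$% confidence interval for the sample mean is $\mu \pm z_{\alpha/2}\,\sigma/\sqrt{n}$.
Here $z_{\alpha/2}$ is the standard Normal table value for which the upper-tail probability is $\alpha/2$.
In our case $\alpha = 1 - 0.95 = 0.05$, so $\alpha/2 = 0.025$. Thus $P(-z_{0.025} \le Z \le z_{0.025}) = 0.95$, where $z_{0.025} = 1.96$ and $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 2000/\sqrt{100} = 200$.
Thus with 95% probability (confidence) we can say that the sample mean lies between $4000 - 1.96(200) = 3608$ and $4000 + 1.96(200) = 4392$ dollars.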
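The interval for the insurance-claims example can be reproduced as follows. This sketch obtains the critical value from `NormalDist().inv_cdf` instead of a printed table, so it uses a slightly more precise z than the rounded 1.96; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu = 4000        # historical mean claim (dollars)
sigma = 2000     # historical standard deviation (dollars)
n = 100          # sample size
confidence = 0.95

alpha = 1 - confidence
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # upper-tail probability alpha/2
se = sigma / sqrt(n)                           # standard deviation of the sample mean

lower = mu - z_crit * se
upper = mu + z_crit * se
print("z critical value:", round(z_crit, 3))
print("interval for the sample mean:", round(lower, 1), "to", round(upper, 1))
```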

13 Sampling Distributions of Sample Variance
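The variance of the population is $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$.
The sample variance is $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$.
If n is a small proportion of N, i.e. n/N is small (N is large), then $E(s^2) \approx \sigma^2$.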
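The role of the "N is large" condition can be illustrated with the small population 1, 2, 3, 4 and n = 2. In the sketch below, sampling with replacement stands in for the large-population case (draws are independent), while sampling without replacement shows what happens when n is not small relative to N. This framing is an illustrative assumption, not part of the slide.

```python
from fractions import Fraction
from itertools import combinations, product

population = [1, 2, 3, 4]
N, n = len(population), 2

mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance (divide by N)

def sample_variance(sample):
    """Sample variance s^2 with the n - 1 divisor, as an exact fraction."""
    x_bar = Fraction(sum(sample), len(sample))
    return sum((x - x_bar) ** 2 for x in sample) / (len(sample) - 1)

def average(values):
    values = list(values)
    return sum(values) / len(values)

# Independent draws (with replacement): all N**n ordered samples are equally likely
mean_s2_with = average(sample_variance(s) for s in product(population, repeat=n))

# Draws without replacement from a small population
mean_s2_without = average(sample_variance(s) for s in combinations(population, n))

print("sigma^2 =", sigma2)
print("E(s^2), with replacement:   ", mean_s2_with)
print("E(s^2), without replacement:", mean_s2_without)
```

With independent draws the average of $s^2$ equals $\sigma^2$ exactly; without replacement from this tiny population it equals $N\sigma^2/(N-1)$, which approaches $\sigma^2$ as N grows.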

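14 CONFIDENCE INTERVALS
The $100(1-\alpha)$% confidence interval for the sample mean: $\mu \pm z_{\alpha/2}\,\sigma/\sqrt{n}$
It means: $P(\mu - z_{\alpha/2}\,\sigma/\sqrt{n} \le \bar{X} \le \mu + z_{\alpha/2}\,\sigma/\sqrt{n}) = 1 - \alpha$
NOTE: we assume that the n observations are taken from a NORMALLY distributed POPULATION.
The $100(1-\alpha)$% confidence interval for the population mean: $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$
It means: $P(\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n} \le \mu \le \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}) = 1 - \alpha$
If we know the population standard deviation $\sigma$, then plug it into the CONFIDENCE INTERVAL and USE the standard NORMAL table to find $z_{\alpha/2}$.
If we do NOT know $\sigma$, then we can use the SAMPLE VARIANCE, $s^2$, as an estimator of the population variance. As we know, $s^2$ is a consistent estimator, i.e. $s^2 \to \sigma^2$ as $n \to \infty$.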

15 If we do NOT know $\sigma$ and use the SAMPLE VARIANCE, $s^2$, as an estimator, then we do NOT use the standard Normal distribution but the “Student’s t” distribution with (n-1) degrees of freedom to find the critical value $t_{n-1,\alpha/2}$ in the interval $\bar{x} \pm t_{n-1,\alpha/2}\, s/\sqrt{n}$
NOTE: we assume that the n observations are taken from a NORMALLY distributed POPULATION.
We cannot use the N(0,1) table since the population variance is NOT known.
Some properties of Student’s t distribution:
It is symmetric around its mean, 0.
It approaches the Normal distribution as n increases (in practice the two are close for n > 30).

16 Examples Example 8.3 from textbook: (if we know population variance)
Suppose that shopping times for customers at a local grocery store are normally distributed. A random sample of 16 shoppers in the local grocery store had a mean of 25 minutes. Assume $\sigma = 6$ minutes. Find the standard error of the sample mean, the margin of error, and the width of a 95% confidence interval for the population mean.
Standard error = standard deviation of the sample mean
Standard error of the sample mean = $\sigma/\sqrt{n} = 6/\sqrt{16} = 1.5$
Margin of error = $z_{0.025}\,\sigma/\sqrt{n} = 1.96(1.5) = 2.94$
Width of the 95% confidence interval = 2 * margin of error = 2*(2.94) = 5.88
The 95% confidence interval is $25 \pm 2.94$, i.e. $22.06 \le \mu \le 27.94$.
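The quantities in Example 8.3 follow from three lines of arithmetic. The sketch below uses the rounded table value z = 1.96 so that the results match the figures on the slide; variable names are illustrative.

```python
from math import sqrt

x_bar = 25     # sample mean (minutes)
sigma = 6      # known population standard deviation (minutes)
n = 16         # sample size
z_crit = 1.96  # standard Normal table value for 95% confidence

standard_error = sigma / sqrt(n)
margin_of_error = z_crit * standard_error
width = 2 * margin_of_error
interval = (x_bar - margin_of_error, x_bar + margin_of_error)

print("standard error:", standard_error)
print("margin of error:", round(margin_of_error, 2))
print("width:", round(width, 2))
print("95% confidence interval:", tuple(round(v, 2) for v in interval))
```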

17 Example 8.5 from textbook: (if we do NOT know population variance)
Gasoline prices rose drastically during the early years of this century. Suppose that a recent study was conducted using truck drivers with equivalent years of experience to test run 24 trucks of a particular model over the same highway. Estimate the population mean fuel consumption for this truck model with 90% confidence if the fuel consumption, in miles per gallon, for these 24 trucks was: 15.5, 21, 18.5, 19.3, 19.7, …., 21.8
What do we know about the population? Nothing; in particular, we do not know the population variance. With n = 24, we will use the sample variance to estimate the population variance, so the interval is based on the Student’s t distribution with n - 1 = 23 degrees of freedom.
Note: we should assume that the population is Normal. How can we test this assumption?
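The mechanics of a t-based interval can be sketched as below. Two loud caveats: the transcript lists only six of the 24 observations, so the code uses just those six visible values and its output is NOT the textbook answer; and because the Python standard library has no t quantile function, the critical value is passed in from a t table (2.015 for 5 degrees of freedom at 90% confidence). With the full data set one would use n = 24 and the table value for 23 degrees of freedom. The function name `t_interval` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def t_interval(data, t_crit):
    """Confidence interval x_bar +/- t_crit * s / sqrt(n) for the population mean.

    t_crit must be looked up for len(data) - 1 degrees of freedom.
    """
    n = len(data)
    x_bar = mean(data)
    standard_error = stdev(data) / sqrt(n)   # stdev uses the n - 1 divisor
    margin = t_crit * standard_error
    return x_bar - margin, x_bar + margin

# Only the six values visible in the transcript (illustration only, not the full sample)
visible_mpg = [15.5, 21.0, 18.5, 19.3, 19.7, 21.8]

# t table value: upper-tail probability 0.05, 5 degrees of freedom
T_CRIT_DF5_90 = 2.015

low, high = t_interval(visible_mpg, T_CRIT_DF5_90)
print("90% interval from the six visible values:", round(low, 2), "to", round(high, 2))
```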

