Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-1 Lesson 6: Sampling Methods and the Central Limit Theorem.


Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-2 Outline: Point estimate; Why sample the population?; Probability sampling; Choice of sampling method: Sampling straws; Sampling distribution of the sample means; Probability histograms and empirical histograms; Central Limit Theorem; Normal approximation to Binomial

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-3 Inferential Statistics: Making statements about a population by examining sample results. Sample statistics (known) are used to draw inferences about population parameters (unknown, but can be estimated from sample evidence).

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-4 Inferential Statistics: Estimation; Hypothesis Testing

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-5 Point Estimates: A point estimate is one value (a single point) that is used to estimate a population parameter. Examples of point estimates are the sample mean, the sample standard deviation, the sample variance, and the sample proportion.

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-6 Estimating the percentage of Earth covered by water. Experiment: Paint a dot on your thumb. Catch the globe and tell me whether the dot on your thumb lands on water. Estimate the percentage of Earth covered by water by the average of all trials. Idea: If we draw many observations with replacement, the sample average will approach the population proportion. Code water as 1 and land as 0; the sample average is then an estimate of the proportion of Earth covered by water. Truth: Water covers 71% of the Earth's surface.
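The idea on this slide can be sketched as a small simulation. This is only a minimal sketch, assuming each catch is an independent draw that lands on water with probability 0.71 (the figure quoted on the slide); the function name `estimate_water_share` and the seed are illustrative choices, not part of the lesson.

```python
import random

def estimate_water_share(n_trials, p_water=0.71, seed=0):
    """Average of 0/1 outcomes (1 = water, 0 = land) over n_trials catches."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    outcomes = [1 if rng.random() < p_water else 0 for _ in range(n_trials)]
    return sum(outcomes) / n_trials

# The estimate is noisy for few catches and settles near 0.71 for many.
for n in (10, 100, 10_000):
    print(n, estimate_water_share(n))
```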

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-7 Why Sample the Population? The physical impossibility of checking all items in the population. The cost of studying all the items in a population. The sample results are usually adequate. Contacting the whole population would often be time-consuming. The destructive nature of certain tests.

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-8 Probability Sampling A probability sample is a sample selected such that each item or person in the population being studied has a known likelihood of being included in the sample.

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-9 Methods of Probability Sampling Simple Random Sample: A sample formulated so that each item or person in the population has the same chance of being included. Systematic Random Sampling: The items or individuals of the population are arranged in some order. A random starting point is selected and then every k-th member of the population is selected for the sample.
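The two methods above might be sketched as follows. The population of items numbered 1 to 100 and the helper names are hypothetical; the simple random sample here is drawn without replacement, which is one common reading of the definition.

```python
import random

def simple_random_sample(population, n, rng):
    """Each item has the same chance of inclusion (drawn without replacement)."""
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    """Random start among the first k items, then every k-th item."""
    start = rng.randrange(k)
    return population[start::k]

rng = random.Random(1)
population = list(range(1, 101))  # hypothetical population: items numbered 1..100
print(simple_random_sample(population, 10, rng))
print(systematic_sample(population, 10, rng))
```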

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-10 Methods of Probability Sampling Stratified Random Sampling: A population is first divided into subgroups, called strata, and a sample is selected from each stratum. Stratification is the process of grouping members of the population into relatively homogeneous subgroups before sampling. The strata should be mutually exclusive: every element in the population must be assigned to only one stratum. The strata should also be collectively exhaustive: no population element can be excluded. Then random or systematic sampling is applied within each stratum. This often improves the representativeness of the sample by reducing sampling error.
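A minimal sketch of stratified random sampling, assuming proportional allocation (the same sampling fraction in every stratum). The strata names, sizes, and the `stratified_sample` helper are made up for illustration.

```python
import random

def stratified_sample(strata, fraction, rng):
    """Draw a simple random sample of the given fraction within each stratum."""
    sample = {}
    for name, members in strata.items():
        n = max(1, round(fraction * len(members)))  # at least one item per stratum
        sample[name] = rng.sample(members, n)
    return sample

# Hypothetical strata: every unit belongs to exactly one stratum, none is left out.
strata = {
    "year1": [f"y1_{i}" for i in range(60)],
    "year2": [f"y2_{i}" for i in range(30)],
    "year3": [f"y3_{i}" for i in range(10)],
}
picked = stratified_sample(strata, 0.1, random.Random(2))
print({name: len(units) for name, units in picked.items()})
```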

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-11 Methods of Probability Sampling Cluster Sampling: A population is first divided into primary units, then samples are selected from the primary units. Cluster sampling is an example of 'two-stage sampling' or 'multistage sampling': in the first stage a sample of areas is chosen; in the second stage a sample of respondents within those areas is selected. This can reduce travel and other administrative costs. It also means that one does not need a sampling frame for the entire population, but only for the selected clusters.

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-12 Independent identically distributed (iid): "Random draws from any population, with replacement" are independent and identically distributed (i.i.d.). Independent: the probability of drawing the current observation does not depend on what has been drawn previously. Identically distributed: the probability of drawing the current observation is the same as for what has been drawn previously and what will be drawn in the future. Most of the results covered in this Lesson hold even when we do not have iid observations.

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-13 Choice of sampling method -- "Sampling Straws": The choice of sampling method is important. The "Sampling Straws" experiment illustrates that some sampling methods can produce a biased estimate of the population parameters. The bag contains a total of 12 straws, 4 of which are 4 inches in length, 4 are 2 inches long, and 4 are 1 inch long. The population mean length is 2.33 (= 4*(1+2+4)/12). Randomly draw 4 straws one by one with replacement. Compute the sample mean. The average of the sample means across experiments is generally larger than 2.33.

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-14 Choice of sampling method -- "Sampling Straws": The sampling scheme is biased because the longer straws have a higher chance of being drawn, even if the draw is truly random (say, you draw the first straw you touch). The draw may also fail to be random because we can feel the length of a straw before we pull it out.

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-15 Choice of sampling method -- “ Sampling Straws ” Alternative sampling scheme: Label the straws 1 to 12. Label 12 identical balls 1 to 12. Draw four balls with replacement. Measure the corresponding straws and compute the sample mean
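The contrast between the two straw-sampling schemes can be sketched with a simulation. This is a sketch under a stated assumption: "drawing the first straw you touch" is modelled as a draw whose probability is proportional to the straw's length, which is an illustrative choice rather than something the lesson specifies. The function names are hypothetical.

```python
import random

straws = [4] * 4 + [2] * 4 + [1] * 4   # lengths in inches, as described in the lesson

def mean_of_sample_means(draw, n_experiments, rng, sample_size=4):
    """Average the sample mean over many repeated experiments."""
    total = 0.0
    for _ in range(n_experiments):
        total += sum(draw(rng) for _ in range(sample_size)) / sample_size
    return total / n_experiments

def draw_by_touch(rng):
    # Assumption: the chance of touching a straw first is proportional to its length.
    return rng.choices(straws, weights=straws, k=1)[0]

def draw_by_label(rng):
    # Numbered-ball scheme: every straw is equally likely.
    return rng.choice(straws)

rng = random.Random(3)
print(mean_of_sample_means(draw_by_touch, 20_000, rng))  # tends toward 3.0 under this assumption
print(mean_of_sample_means(draw_by_label, 20_000, rng))  # tends toward 28/12, about 2.33
```

Under the length-proportional assumption the expected drawn length is (16 + 4 + 1)/7 = 3 inches, which is why the first scheme overshoots the population mean of 2.33.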

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-16 Choice of sampling method -- "Telephone interview": Suppose we are interested in estimating the unemployment rate by a phone survey. 1. Interview a group selected based on a random sample of mobile phone numbers. 2. Interview a group selected based on a random sample of residential phone numbers. 3. Interview a group selected based on a random sample of mobile and residential phone numbers. Which sampling method will yield a good estimate of the population unemployment rate?

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-17 Non-Probability Sampling: In a nonprobability sample, whether an observation is included in the sample is based on the judgment of the person selecting the sample. The sampling error is the difference between a sample statistic and its corresponding population parameter. Sampling error is almost always nonzero.

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-18 Property of sample means Unbiasedness Consistency Central Limit Theorem

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-19 Unbiasedness: A point estimator $\hat{\theta}$ is said to be an unbiased estimator of the parameter $\theta$ if the expected value, or mean, of the sampling distribution of $\hat{\theta}$ is $\theta$: $E(\hat{\theta}) = \theta$. Examples: The sample mean is an unbiased estimator of $\mu$. The sample variance is an unbiased estimator of $\sigma^2$. The sample proportion is an unbiased estimator of P.

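Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-20 Bias: Let $\hat{\theta}$ be an estimator of $\theta$. The bias in $\hat{\theta}$ is defined as the difference between its mean and $\theta$: $\mathrm{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta$. The bias of an unbiased estimator is 0.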

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-21 Sample mean is an unbiased estimator of $\mu$: Let $x_1, \dots, x_n$ be drawn from a population with mean $\mu$, with replacement (i.e., $X_1, \dots, X_n$ are i.i.d.). Sample mean $m = (x_1 + \dots + x_n)/n$. $E(m) = E[(x_1 + \dots + x_n)/n] = [E(x_1) + \dots + E(x_n)]/n = [\mu + \dots + \mu]/n = \mu$. So m is an unbiased estimator (with zero bias) of the population mean.

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-22 A biased estimator of $\mu$: Let $x_1, \dots, x_n$ be drawn from a population with mean $\mu$, with replacement (i.e., $X_1, \dots, X_n$ are i.i.d.). Consider the estimator $m = (x_1 + \dots + x_n + 100)/n$. $E(m) = E[(x_1 + \dots + x_n + 100)/n] = [E(x_1) + \dots + E(x_n) + 100]/n = [\mu + \dots + \mu + 100]/n = \mu + 100/n$. So m is a biased estimator of the population mean, with upward bias 100/n. m is asymptotically unbiased: the bias approaches zero as n increases.

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-23 Consistency A consistent estimator is an estimator that converges in probability to the quantity being estimated as the sample size grows.

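Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-24 Consistency: Let $x_1, \dots, x_n$ be drawn from a population with mean $\mu$, with replacement (i.e., $X_1, \dots, X_n$ are i.i.d.). Sample mean $m = (x_1 + \dots + x_n)/n$. $E(m) = \mu$. $\mathrm{Var}(m) = \mathrm{Var}[(x_1 + \dots + x_n)/n] = [\mathrm{Var}(x_1) + \dots + \mathrm{Var}(x_n)]/n^2 = \mathrm{Var}(x)/n$, where the second step uses independence. Because the mean of m is $\mu$ and its variance shrinks to zero, m will approach $\mu$ as n increases.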

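Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-25 Consistency: Let $x_1, \dots, x_n$ be drawn from a population with mean $\mu$, with replacement (i.e., $X_1, \dots, X_n$ are i.i.d.). Consider the estimator $m = (x_1 + \dots + x_n + 100)/n$. $E(m) = \mu + 100/n$, which converges to $\mu$ as n increases. $\mathrm{Var}(m) = \mathrm{Var}[(x_1 + \dots + x_n + 100)/n] = [\mathrm{Var}(x_1) + \dots + \mathrm{Var}(x_n)]/n^2 = \mathrm{Var}(x)/n$, since adding the constant 100 does not change the variance. Hence m will approach $\mu$ as n increases, even though it is biased for every finite n.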
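The behaviour of the shifted estimator m = (x_1 + ... + x_n + 100)/n can be illustrated numerically. This is a rough sketch, assuming a population of the values 0 to 4 with mean 2 (borrowed from the five-ball examples later in the lesson); the function name `shifted_mean` is hypothetical.

```python
import random

population = [0, 1, 2, 3, 4]           # illustrative population with mean 2
mu = sum(population) / len(population)

def shifted_mean(sample):
    """The estimator m = (x_1 + ... + x_n + 100) / n."""
    return (sum(sample) + 100) / len(sample)

rng = random.Random(4)
for n in (10, 100, 1000, 10_000):
    sample = rng.choices(population, k=n)   # draws with replacement
    m = shifted_mean(sample)
    print(n, round(m, 3), "theoretical bias:", 100 / n)
```

For small n the estimate is far above 2 because of the 100/n bias; as n grows both the bias and the sampling variability shrink, so the estimate settles near 2.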

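Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-26 Most Efficient Estimator: Suppose there are several unbiased estimators of $\theta$. The most efficient estimator, or the minimum variance unbiased estimator, of $\theta$ is the unbiased estimator with the smallest variance. Let $\hat{\theta}_1$ and $\hat{\theta}_2$ be two unbiased estimators of $\theta$, based on the same number of sample observations. Then $\hat{\theta}_1$ is said to be more efficient than $\hat{\theta}_2$ if $\mathrm{Var}(\hat{\theta}_1) < \mathrm{Var}(\hat{\theta}_2)$. The relative efficiency of $\hat{\theta}_1$ with respect to $\hat{\theta}_2$ is the ratio of their variances: $\mathrm{Var}(\hat{\theta}_2)/\mathrm{Var}(\hat{\theta}_1)$.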

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-27 Sampling Distribution of the Sample Means The sampling distribution of the sample mean is a probability distribution consisting of all possible sample means of a given sample size selected from a population.

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-28 EXAMPLE 1 The law firm of Hoya and Associates has five partners. At their weekly partners meeting each reported the number of hours they billed clients for their services last week. The population mean is 25.2 hours.

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-29 Example 1: If two partners are selected randomly, how many different samples are possible? This is the combination of 5 objects taken 2 at a time. That is: $\binom{5}{2} = \frac{5!}{2!\,3!} = 10$. There are a total of 10 different samples.

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-30 Example 1 continued

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-31 EXAMPLE 1 continued: Organize the sample means into a frequency distribution. The mean of the sample means is 25.2 hours. The mean of the sample means is exactly equal to the population mean.

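Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-32 Example 1: Population variance $= \frac{1}{5}\sum_{i=1}^{5}(x_i - 25.2)^2 = 8.96$, where $x_i$ is the number of hours billed by partner i. Variance of the sample means $= \frac{1}{10}\sum_{j} f_j(\bar{x}_j - 25.2)^2 = 3.36$, where the four distinct sample means $\bar{x}_j$ occur with frequencies $f_j$ = 1, 4, 3, 2 among the 10 samples. The variance of the sample means is smaller than the population variance: 3.36/8.96 = 0.375 < 1/2. The ratio differs from 1/2 (i.e., 1/n) because the partners are sampled without replacement: with N = 5 and n = 2, the finite population correction (N − n)/(N − 1) = 3/4 gives (1/2)(3/4) = 0.375.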
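The numbers in Example 1 can be checked by enumerating all samples of size 2. The transcript does not show the partners' billed hours, so the list below is an assumption: the values 22, 26, 30, 26, 22 are chosen because they reproduce the stated population mean (25.2) and variance (8.96). Treat the data and the helper names as illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Assumed billed hours for the five partners (not shown in this transcript).
hours = [22, 26, 30, 26, 22]

def mean(values):
    return Fraction(sum(values), len(values))

def variance(values):
    """Population-style variance: average squared deviation from the mean."""
    m = mean(values)
    return sum((Fraction(v) - m) ** 2 for v in values) / len(values)

# Every unordered pair of partners is one possible sample (no replacement).
sample_means = [mean(pair) for pair in combinations(hours, 2)]

print(len(sample_means))                                      # number of possible samples
print(float(mean(hours)), float(mean(sample_means)))          # population mean vs mean of sample means
print(float(variance(hours)), float(variance(sample_means)))  # population variance vs variance of sample means
print(float(variance(sample_means) / variance(hours)))        # ratio of the two variances
```

Exact fractions are used so that the comparison with 8.96, 3.36 and 0.375 is not blurred by floating-point rounding.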

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-33 Example: Suppose we had a uniformly distributed population containing equal proportions (hence equally probable instances) of (0,1,2,3,4). If you were to draw a very large number of random samples from this population, each of size n=2, the possible combinations of drawn values and their sums are:
Sum   Combinations
0     0,0
1     0,1  1,0
2     1,1  2,0  0,2
3     1,2  2,1  3,0  0,3
4     1,3  3,1  2,2  4,0  0,4
5     1,4  4,1  3,2  2,3
6     3,3  4,2  2,4
7     3,4  4,3
8     4,4
Note that this is sampling with replacement.

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-34 Example: Population mean = mean of sample means. Population mean = (0+1+2+3+4)/5 = 2. Mean of sample means = [ (1)(0) + (2)(0.5) + … + (1)(4) ] / 25 = 2. Variance of sample means = population variance / sample size. Population variance = [ (0-2)^2 + … + (4-2)^2 ] / 5 = 2. Variance of sample means = [ (1)(0-2)^2 + (2)(0.5-2)^2 + … + (1)(4-2)^2 ] / 25 = 1.
Mean  Combinations
0.0   0,0
0.5   0,1  1,0
1.0   1,1  2,0  0,2
1.5   1,2  2,1  3,0  0,3
2.0   1,3  3,1  2,2  4,0  0,4
2.5   1,4  4,1  3,2  2,3
3.0   3,3  4,2  2,4
3.5   3,4  4,3
4.0   4,4
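The same calculation can be reproduced by listing all 25 equally likely ordered samples. A minimal sketch, using exact fractions; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

population = [0, 1, 2, 3, 4]
n = 2

# All 25 equally likely ordered samples of size 2, drawn with replacement.
means = [Fraction(sum(s), n) for s in product(population, repeat=n)]

def average(values):
    return sum(values) / len(values)

pop_mean = average([Fraction(x) for x in population])
pop_var = average([(Fraction(x) - pop_mean) ** 2 for x in population])
mean_of_means = average(means)
var_of_means = average([(m - mean_of_means) ** 2 for m in means])

print(pop_mean, mean_of_means)   # population mean and mean of sample means
print(pop_var, var_of_means)     # population variance and variance of sample means
```

Because the draws are with replacement, the variance of the sample means equals the population variance divided by n exactly, with no finite population correction.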

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-35 Probability Histograms: In a probability histogram, the area of each bar represents the chance of a value happening as a result of the random (chance) process. Empirical histograms (from observed data) for a process converge to the probability histogram.

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-36 Examples of empirical histograms: Roll a fair die 50 times and 200 times. [Figure: empirical histograms for 50 rolls and for 200 rolls] The empirical histogram will approach the probability histogram as the number of draws increases.
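A small simulation can stand in for the die-rolling histograms. This is a sketch only: the seed, the extra run of 100,000 rolls, and the helper name `empirical_histogram` are illustrative additions.

```python
import random
from collections import Counter

def empirical_histogram(n_rolls, rng):
    """Relative frequency of each face after n_rolls of a fair die."""
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, 7)}

rng = random.Random(5)
for n_rolls in (50, 200, 100_000):
    hist = empirical_histogram(n_rolls, rng)
    worst_gap = max(abs(freq - 1 / 6) for freq in hist.values())
    print(n_rolls, round(worst_gap, 4))   # the gap to 1/6 typically shrinks as n_rolls grows
```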

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-37 Empirical histogram #1: Two balls in the bag. Draw 1 ball 1000 times with replacement. Plot a relative frequency histogram (empirical probability histogram). The empirical histogram looks like the population distribution! What is the probability of getting a red ball in any single draw? 0.5

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-38 Empirical histogram #2: 5 balls in the bag. Draw 1 ball 1000 times with replacement. Plot a relative frequency histogram (empirical probability histogram). The empirical histogram looks like the population distribution! What is the probability of getting a red ball in any single draw? 0.6

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-39 Empirical histogram #3: 5 balls in the bag. Draw 1 ball 1000 times with replacement. Plot a relative frequency histogram (empirical probability histogram). The empirical histogram looks like the population distribution! What is the probability of getting a "three" in any single draw? What is the expected value (i.e., population mean) of a single draw? 0.2*0 + 0.2*1 + … + 0.2*4 = 2. Variance = 0.2*(-2)^2 + 0.2*(-1)^2 + … + 0.2*(2)^2 = 2.

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-40 Empirical histogram #3 continued: 5 balls in the bag. Draw 2 balls 1000 times with replacement. Compute the sample mean. Plot a relative frequency histogram (empirical probability histogram) of the 1000 sample means. All combinations are equally likely.
Mean  Combinations
0.0   0,0
0.5   0,1  1,0
1.0   1,1  2,0  0,2
1.5   1,2  2,1  3,0  0,3
2.0   1,3  3,1  2,2  4,0  0,4
2.5   1,4  4,1  3,2  2,3
3.0   3,3  4,2  2,4
3.5   3,4  4,3
4.0   4,4

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-41 Empirical histogram #3 continued: 5 balls in the bag. Draw 2 balls 1000 times with replacement. Compute the sample mean. Plot a relative frequency histogram (empirical probability histogram) of the 1000 sample means. What is the probability of getting a sample mean of 2.5 in any single draw? 0.16 (4 of the 25 combinations). What is the expected sample mean of a single draw? 0.04*0 + 0.08*0.5 + … + 0.04*4 = 2. Variance of sample mean = 0.04*(-2)^2 + 0.08*(-1.5)^2 + … + 0.04*(2)^2 = 1.

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-42 Empirical histogram of sum Roll a fair die 20 times and sum the outcome of the 20 rolls

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-43 Empirical histogram of average: Roll a fair die 20 times and average the outcomes of the 20 rolls

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-44 Distribution of sample means for different sample sizes and different population distributions

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-45 Central Limit Theorem #1: 5 balls in the bag. Draw n (n>30) balls 1000 times with replacement. Compute the sample mean. Plot a relative frequency histogram (empirical probability histogram) of the 1000 sample means. The Central Limit Theorem says: 1. The empirical histogram looks like a normal density. 2. Expected value (mean of the normal distribution) = mean of the original population = 2. 3. Variance of the sample means = variance of the original population / n = 2/n.
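The five-ball experiment can be imitated in a few lines. This is a sketch under assumptions: the balls are taken to be numbered 0 to 4 (consistent with the mean of 2 and variance of 2 quoted in the lesson), n = 36 is an arbitrary choice above 30, and the "within one standard error" check is just one informal way to compare the histogram with a normal density.

```python
import random
import statistics

population = [0, 1, 2, 3, 4]   # mean 2, variance 2, as in the five-ball example
n, repetitions = 36, 1000
rng = random.Random(6)

# Each repetition: draw n balls with replacement and record the sample mean.
sample_means = [
    statistics.fmean(rng.choices(population, k=n)) for _ in range(repetitions)
]

se = (2 / n) ** 0.5
within_one_se = sum(abs(m - 2) <= se for m in sample_means) / repetitions

print(statistics.fmean(sample_means))      # close to the population mean, 2
print(statistics.pvariance(sample_means))  # close to 2 / n
print(within_one_se)                       # roughly 0.68 if the normal approximation is good
```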

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-46 Central Limit Theorem #2: Some unknown number of numbered balls in the bag. We know only that the population mean is $\mu$ and the variance is $\sigma^2$. Draw n (n>30) balls 1000 times with replacement. Compute the sample mean. Plot a relative frequency histogram (empirical probability histogram) of the 1000 sample means. The Central Limit Theorem says: 1. The empirical histogram looks like a normal density. 2. Expected value (mean of the normal distribution) = $\mu$. 3. Variance of the sample means = $\sigma^2/n$.

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-47 Confidence interval #1: Some unknown number of numbered balls in the bag. We know only that the population mean is $\mu$ and the variance is $\sigma^2$. The Central Limit Theorem says: 1. The empirical histogram looks like a normal density. 2. Expected value (mean of the normal distribution) = $\mu$. 3. Variance of the sample means = $\sigma^2/n$. What is the probability that the sample mean of a randomly drawn sample lies between $\mu \pm \sigma/\sqrt{n}$?

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-48 Central Limit Theorem: For a population with mean $\mu$ and variance $\sigma^2$, the sampling distribution of the means of all possible samples of size n generated from the population will be approximately normally distributed, provided n is large. The mean of the sampling distribution is equal to $\mu$ and the variance is equal to $\sigma^2/n$. [Figure: the population distribution and the distribution of the sample mean of n observations]

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-49 Central Limit Theorem: Sums. For a large number of random draws, with replacement, the distribution of the sum approximately follows the normal distribution. The mean of the normal distribution is n × (expected value of one random draw). The SD for the sum (SE) is $\sqrt{n}$ × (SD of one random draw). This holds even if the underlying population is not normally distributed.

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-50 Central Limit Theorem: Averages. For a large number of random draws, with replacement, the distribution of the average = (sum)/n approximately follows the normal distribution. The mean of this normal distribution is the expected value of one random draw. The SD for the average (SE) is (SD of one random draw)/$\sqrt{n}$. This holds even if the underlying population is not normally distributed.
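For the die-rolling example used earlier (20 rolls of a fair die), the mean and standard error of the sum and of the average follow from the standard results that the SD of a sum of n independent draws is the square root of n times the SD of one draw, and the SD of the average is the SD of one draw divided by the square root of n. A minimal sketch with illustrative variable names:

```python
import math

faces = [1, 2, 3, 4, 5, 6]
mu = sum(faces) / len(faces)                                        # expected value of one roll
sigma = math.sqrt(sum((f - mu) ** 2 for f in faces) / len(faces))   # SD of one roll

n = 20
print("sum:     mean", n * mu, "SE", math.sqrt(n) * sigma)
print("average: mean", mu, "SE", sigma / math.sqrt(n))
```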

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-51 Law of large numbers: The sample mean converges to the population mean as n gets large. For a large number of random draws from any population, with replacement, the distribution of the average = (sum)/n approximately follows the normal distribution. The mean of this normal distribution is the expected value of one random draw. The SD for the average (SE) is (SD of one random draw)/$\sqrt{n}$, which tends to zero as n increases. This holds even if the underlying population is not normally distributed.

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-52 Central Limit Theorem Simulation

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-53 Effect of Sample Size Regardless of the underlying population, the larger the sample size, the more nearly normally distributed is the population of all possible sample means.

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-54 Central Limit Theorem: If a population follows the normal distribution, the sampling distribution of the sample mean will also follow the normal distribution. To determine the probability that a sample mean falls within a particular region, use: $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-55 Central Limit Theorem: If the population does not follow the normal distribution, but the sample is of at least 30 observations, the sample means will approximately follow the normal distribution. To determine the probability that a sample mean falls within a particular region, use: $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-56 Example 2 Suppose the mean selling price of a gallon of gasoline in the United States is $1.30. Further, assume the distribution is positively skewed, with a standard deviation of $0.28. What is the probability of selecting a sample of 35 gasoline stations and finding the sample mean within $.08 of the population mean ($1.30)?

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-57 Example 2 continued: The first step is to find the z-values corresponding to $1.22 (= 1.30 - 0.08) and $1.38 (= 1.30 + 0.08). These are the two points within $0.08 of the population mean.

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-58 Example 2 continued: Next we determine the probability of a z-value between -1.69 and 1.69, where 1.69 = 0.08/(0.28/sqrt(35)). It is: P(-1.69 <= z <= 1.69) = 2(0.4545) = 0.9090. We would expect about 91 percent of the sample means to be within $0.08 of the population mean.
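The calculation in Example 2 can be reproduced with Python's standard library instead of a z-table. This is a sketch of the arithmetic only; it relies on the normal approximation from the Central Limit Theorem, since the population itself is skewed.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 1.30, 0.28, 35
standard_error = sigma / sqrt(n)

# z-value for a sample mean $0.08 above (or below) the population mean.
z = 0.08 / standard_error
probability = NormalDist().cdf(z) - NormalDist().cdf(-z)

print(round(z, 2), round(probability, 4))
```

The printed probability may differ from the table-based figure in the last decimal place, because the table rounds z to two decimals before the lookup.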

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-59 Sampling Distribution of Sample Proportion: If a random sample of size n is taken from a population in which the proportion of successes is $\pi$, then the sampling distribution of the sample proportion is approximately normal if n is large, has mean $\pi$, and has standard deviation $\sqrt{\pi(1-\pi)/n}$. It is approximately normal because the sample proportion is a simple average of zeros and ones from different trials.

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-60 The Normal Approximation to the Binomial revisited: The normal distribution (a continuous distribution) yields a good approximation of the binomial distribution (a discrete distribution) for large values of n. The normal probability distribution is generally a good approximation to the binomial probability distribution when $n\pi$ and $n(1-\pi)$ are both greater than 5, where $\pi$ is the probability of success on each trial. Recall for the binomial experiment: There are only two mutually exclusive outcomes (success or failure) on each trial. A binomial distribution results from counting the number of successes. Each trial is independent. The probability is fixed from trial to trial, and the number of trials n is also fixed. In other words, the trials are iid.

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-61 The Normal Approximation to the Binomial revisited Recoding: Failure as 0 and success as 1. x/n is simply the proportion of success and hence the simple average of the outcomes from the n trials. x/n will be approximately normal according to CLT. Hence x (=n*x/n) will also be approximately normal according to CLT.
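The quality of the approximation can be checked by comparing an exact binomial probability with its normal counterpart. This is a sketch with illustrative values of n, p and k; the continuity correction of 0.5 is a common refinement that the slides above do not mention, so treat it as an added assumption.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Normal approximation to P(X <= k), with a continuity correction of 0.5."""
    return NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p))).cdf(k + 0.5)

n, p, k = 50, 0.3, 18      # illustrative values; n*p = 15 and n*(1-p) = 35 both exceed 5
print(binomial_cdf(k, n, p), normal_approx_cdf(k, n, p))
```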

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-62 Chi-square distribution: If $z_1, z_2, z_3, \dots, z_n$ are i.i.d. standard normal variables, then $X^2 = z_1^2 + z_2^2 + z_3^2 + \dots + z_n^2$ has a $\chi^2(n)$ distribution, with n degrees of freedom.

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-63 Sample Variance: Let $x_1, x_2, \dots, x_n$ be a random sample from a population. The sample variance is $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$. The square root of the sample variance is called the sample standard deviation. The sample variance is different for different random samples from the same population.
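The sample variance uses the divisor n − 1 rather than n. A short sketch, with a made-up three-point sample, that computes it by hand and compares the result with `statistics.variance` from the standard library:

```python
import statistics

def sample_variance(xs):
    """s^2 = sum((x - xbar)^2) / (n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

data = [7, 8, 9]   # illustrative sample
print(sample_variance(data), statistics.variance(data))
print(sample_variance(data) ** 0.5)   # sample standard deviation
```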

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-64 Sampling Distribution of Sample Variances: The sampling distribution of $s^2$ has mean $\sigma^2$. If the population distribution is normal, then $\mathrm{Var}(s^2) = \frac{2\sigma^4}{n-1}$. If the population distribution is normal, then $\frac{(n-1)s^2}{\sigma^2}$ has a $\chi^2$ distribution with n – 1 degrees of freedom.

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-65 The Chi-square Distribution: The chi-square distribution is a family of distributions, depending on degrees of freedom: d.f. = n – 1. Text Table 7 contains chi-square probabilities. [Figure: chi-square densities for d.f. = 1, d.f. = 5, and d.f. = 15]

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-66 Degrees of Freedom (df). Idea: Number of observations that are free to vary after the sample mean has been calculated. Example: Suppose the mean of 3 numbers is 8.0. Let X1 = 7. Let X2 = 8. What is X3? If the mean of these three values is 8.0, then X3 must be 9 (i.e., X3 is not free to vary). Here, n = 3, so degrees of freedom = n – 1 = 3 – 1 = 2 (2 values can be any numbers, but the third is not free to vary for a given mean).

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-67 Chi-square Example: A commercial freezer must hold a selected temperature with little variation. Specifications call for a standard deviation of no more than 4 degrees (a variance of 16 degrees squared). A sample of 14 freezers is to be tested. What is the upper limit (K) for the sample variance such that the probability of exceeding this limit, given that the population standard deviation is 4, is less than 0.05?

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-68 Finding the Chi-square Value: $\chi^2 = \frac{(n-1)s^2}{\sigma^2}$ is chi-square distributed with (n – 1) = 13 degrees of freedom. Use the chi-square distribution with area 0.05 in the upper tail: $\chi^2_{13} = 22.36$ (α = 0.05 and 14 – 1 = 13 d.f.).

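Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson6-69 Chi-square Example: $\chi^2_{13} = 22.36$ (α = 0.05 and 14 – 1 = 13 d.f.). So: $P(s^2 > K) = P\left(\frac{(n-1)s^2}{16} > 22.36\right) = 0.05$, so $\frac{(n-1)K}{16} = 22.36$ (where n = 14), or $K = \frac{(22.36)(16)}{14-1} = 27.52$. If $s^2$ from the sample of size n = 14 is greater than 27.52, there is strong evidence to suggest the population variance exceeds 16.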
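The limit K can be recomputed and then checked by simulation. This is a sketch under assumptions: 22.36 is the standard chi-square table value for 13 degrees of freedom at α = 0.05, typed in by hand because the Python standard library has no chi-square quantile function; freezer temperatures are assumed normal with standard deviation 4; and the seed and number of trials are arbitrary.

```python
import random
import statistics

n, sigma2 = 14, 16
chi2_upper_05 = 22.36           # chi-square table value, 13 d.f., upper-tail area 0.05
K = chi2_upper_05 * sigma2 / (n - 1)
print(round(K, 2))

# Monte Carlo check, assuming freezer temperatures are normal with SD 4.
rng = random.Random(7)
trials = 20_000
exceed = sum(
    statistics.variance([rng.gauss(0, 4) for _ in range(n)]) > K
    for _ in range(trials)
)
print(exceed / trials)          # should be near 0.05
```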

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson END - Lesson 6: Sampling Methods and the Central Limit Theorem