Lecture 10. Random Sampling and Sampling Distributions David R. Merrell 90-786 Intermediate Empirical Methods for Public Policy and Management.


2 Agenda
Normal approximation to the binomial
Poisson process
Random sampling
Sampling statistics and sampling distributions
Expected values and standard errors of sample sums and sample means

3 Binomial Random Variable
A binomial random variable X is the number of “successes” in n trials, where:
The probability of success remains the same from trial to trial.
The trials are independent.

4 Binomial Probability Distribution
Discrete distribution with
$P(X = x) = \frac{n!}{x!\,(n-x)!}\, p^x q^{n-x}$
where
n is the number of trials
x is the number of successes in n trials (x = 0, 1, 2, ..., n)
p is the probability of success on a single trial
q = 1 - p is the probability of failure on a single trial

5 Properties of the Binomial RV
Mean: $\mu = np$
Variance: $\sigma^2 = npq$
Standard deviation: $\sigma = \sqrt{npq}$

6 Binomial(n = 10, p = 0.4)
x    P(X = x)
0    0.006047
1    0.040311
2    0.120932
3    0.214991
4    0.250823
5    0.200658
6    0.111477
7    0.042467
8    0.010617
9    0.001573
10   0.000105
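As a cross-check outside Minitab, the minimal Python sketch below (an assumption: Python's standard library stands in for Minitab, and `binom_pmf` is a hypothetical helper name) rebuilds the Binomial(n = 10, p = 0.4) table from the formula on slide 4 and confirms that the table's mean and variance equal np and npq from slide 5.

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) = n!/(x!(n-x)!) * p^x * q^(n-x), with q = 1 - p."""
    q = 1.0 - p
    return comb(n, x) * p**x * q**(n - x)

n, p = 10, 0.4
table = {x: binom_pmf(x, n, p) for x in range(n + 1)}
for x, prob in table.items():
    print(f"{x:2d}  {prob:.6f}")

# Mean and variance computed directly from the table should match np and npq.
mean = sum(x * prob for x, prob in table.items())
var = sum((x - mean) ** 2 * prob for x, prob in table.items())
print(mean, var)  # approximately 4.0 and 2.4
```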

7 Normal Approximation to the Binomial Distribution
Use the normal distribution when n is large, specifically when np > 10 and n(1 - p) > 10.
The parameters of the approximating normal distribution are the mean and standard deviation of the binomial distribution: $\mu = np$ and $\sigma = \sqrt{npq}$.

8 Normal Approximation of the Binomial Distribution (figure: n = 80, p = 0.4)

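9 How Good Is the Approximation?
Normal with mean = 32.0000 and standard deviation = 4.38000
x         P(X <= x)
28.0000   0.1806
28.5000   0.2121
Binomial with n = 80 and p = 0.400000
x         P(X <= x)
28.00     0.2131   (that is, P(X < 29))
Evaluating the normal distribution at 28.5 rather than 28 (a continuity correction) moves the approximation from 0.1806 to 0.2121, much closer to the exact binomial value of 0.2131.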
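The comparison on slide 9 can be reproduced with a short Python sketch (assumptions: `binom_cdf`, `normal_cdf`, and `normal_approx_ok` are hypothetical helper names, and the normal CDF is built from `math.erf` rather than from Minitab). It checks the rule of thumb from slide 7, then computes the exact binomial probability and the normal approximation with and without the continuity correction.

```python
from math import comb, erf, sqrt

def binom_cdf(k, n, p):
    """Exact P(X <= k) for a binomial(n, p) random variable."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k + 1))

def normal_cdf(x, mu, sigma):
    """P(Y <= x) for a normal(mu, sigma) random variable."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def normal_approx_ok(n, p):
    """Rule of thumb from the slides: np > 10 and n(1 - p) > 10."""
    return n * p > 10 and n * (1 - p) > 10

n, p = 80, 0.4
mu, sigma = n * p, sqrt(n * p * (1 - p))   # 32 and about 4.38

exact = binom_cdf(28, n, p)                # P(X <= 28)
plain = normal_cdf(28.0, mu, sigma)        # no continuity correction
corrected = normal_cdf(28.5, mu, sigma)    # with continuity correction
print(normal_approx_ok(n, p), exact, plain, corrected)
```

The corrected value lands much nearer the exact one, which is why the half-unit shift is used in the application that follows.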

10 Application 1
The Chicago Equal Employment Commission believes that the Chicago Transit Authority (CTA) discriminates against Republicans. The records show that 37.5% of the individuals listed as passing the CTA exam were Republicans; the remainder were Democrats (no one registers as an independent in Illinois). CTA hired 30 people last year, and 25 of them were Democrats. What is the probability of an outcome like this (5 or fewer Republicans hired) if CTA did not discriminate?

11 Application 1 (cont.)
Success: a Republican is hired
Probability of success: p = 0.375
Number of trials: n = 30
Number of successes: x = 5
$P(X \le 5) = ?$

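12 Application 1 (cont.)
Mean: $\mu = np = 30 \times 0.375 = 11.25$
Variance: $\sigma^2 = npq = 30 \times 0.375 \times 0.625 = 7.03$
Standard deviation: $\sigma = \sqrt{7.03} = 2.65$
Normal with mean = 11.25 and standard deviation = 2.65 (evaluated at 5.5, using the continuity correction for $P(X \le 5)$)
x        P(X <= x)
5.5000   0.0150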
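The slide answers the question with the normal approximation only. The Python sketch below (assumptions: hypothetical helper names, standard library only) reproduces that value and, for comparison, also computes the exact binomial probability, which the slides do not report.

```python
from math import comb, erf, sqrt

def binom_cdf(k, n, p):
    """Exact P(X <= k) for a binomial(n, p) random variable."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k + 1))

def normal_cdf(x, mu, sigma):
    """P(Y <= x) for a normal(mu, sigma) random variable."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

n, p, k = 30, 0.375, 5
mu = n * p                                # 11.25
sigma = sqrt(n * p * (1 - p))             # about 2.65
approx = normal_cdf(k + 0.5, mu, sigma)   # continuity-corrected P(X <= 5)
exact = binom_cdf(k, n, p)                # exact binomial P(X <= 5)
print(round(approx, 4), round(exact, 4))
```

Both numbers come out near 1%, so either route suggests that hiring 5 or fewer Republicans would be unusual if each hire were a Republican with probability 0.375.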

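13 Poisson Process: Assumptions
Time homogeneity: events occur at a constant rate $\lambda$
Independence: counts in non-overlapping time intervals are independent
No clumping: events occur one at a time
(figure: events marked with x's along a time axis starting at 0)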

14 Poisson Process
Earthquakes strike randomly over time at a rate of $\lambda = 4$ per year. Model the times of earthquake strikes as a Poisson process.
Count: How many earthquakes will strike in the next six months?
Duration: How long will it take before the next earthquake hits?
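One way to see how the count and duration questions fit together is to simulate the process. The sketch below is an illustration under stated assumptions (a fixed seed, 100,000 simulated six-month windows, and a hypothetical helper `count_in_window`); it builds each window from exponential gaps between earthquakes at 4 per year and tallies how often exactly 3 strike.

```python
import random

def count_in_window(rate, t, rng):
    """Count events in [0, t] of a Poisson process built from exponential gaps."""
    count, clock = 0, rng.expovariate(rate)
    while clock <= t:
        count += 1
        clock += rng.expovariate(rate)
    return count

rng = random.Random(786)              # fixed seed so the run is reproducible
rate, t, reps = 4.0, 0.5, 100_000     # 4 per year, six-month window
counts = [count_in_window(rate, t, rng) for _ in range(reps)]
p3 = counts.count(3) / reps
print(p3)   # should be close to the Poisson probability for lambda * t = 2
```

Exponential waiting times produce Poisson counts, which is why the same process answers both the “Count” and the “Duration” questions.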

15 Count: Poisson Distribution
What is the probability that 3 earthquakes will strike during the next six months?

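16 Poisson Distribution
Count in a time period of length t:
$P(X = x) = \frac{e^{-\lambda t} (\lambda t)^x}{x!}, \quad x = 0, 1, 2, \ldots$
The mean count is $\lambda t$; for six months, $\lambda t = 4 \times 0.5 = 2$.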

17 Minitab Probability Calculation
Click: Calc > Probability Distributions > Poisson
Enter: For mean 2, input constant 3
Output:
Probability Density Function
Poisson with mu = 2.00000
x      P(X = x)
3.00   0.1804
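The same Poisson probability can be computed directly from the formula for a count in a period of length t. This is a minimal Python sketch (the helper name `poisson_pmf` is hypothetical), with the mean set to $\lambda t = 4 \times 0.5 = 2$ as in the Minitab dialog.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) = e^(-mu) * mu^x / x!, where mu = lambda * t."""
    return exp(-mu) * mu**x / factorial(x)

rate_per_year, t_years = 4, 0.5
mu = rate_per_year * t_years          # 2 expected earthquakes in six months
print(round(poisson_pmf(3, mu), 4))   # 0.1804, matching the Minitab output
```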

18 Duration: Exponential Distribution
Time between occurrences in a Poisson process
Continuous probability distribution
Mean = $1/\lambda$
$P(T > t) = e^{-\lambda t}$, the probability that no event occurs in a period of length t

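19 Exponential Probability Problem
What is the probability that 9 months will pass with no earthquake?
Measure time in months: t = 1/12 year, so the rate is $\lambda t = 4 \times \frac{1}{12} = \frac{1}{3}$ earthquake per month, and the mean time between earthquakes is $1/(\lambda t) = 3$ months.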

20 Minitab Probability Calculation
Click: Calc > Probability Distributions > Exponential
Enter: For mean 3, input constant 9
Output:
Cumulative Distribution Function
Exponential with mean = 3.00000
x        P(X <= x)
9.0000   0.9502
So P(no earthquake in 9 months) = P(X > 9) = 1 - 0.9502 = 0.0498.
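A minimal Python sketch of the same calculation (the helper `expon_cdf` is a hypothetical name), using the exponential CDF $P(T \le t) = 1 - e^{-t/\text{mean}}$ with a mean of 3 months:

```python
from math import exp

def expon_cdf(t, mean):
    """P(T <= t) for an exponential waiting time with the given mean (mean = 1/lambda)."""
    return 1.0 - exp(-t / mean)

mean_months = 3.0                 # 1 / (1/3 earthquake per month)
cdf = expon_cdf(9.0, mean_months)
no_quake = 1.0 - cdf              # P(T > 9) = e^(-3)
print(round(cdf, 4), round(no_quake, 4))   # 0.9502 0.0498
```

The answer e^(-3) is also the Poisson probability of zero earthquakes when the expected count over 9 months is 3, so the count and duration views agree.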

21 Exponential Probability Density Function
MTB > set c1
DATA> 0:12000
DATA> end
MTB > let c1 = c1/1000
Click: Calc > Probability Distributions > Exponential > Probability density > Input column
Enter: Input column c1 > Optional storage c2
Click: OK > Graph > Plot
Enter: Y c2 > X c1
Click: Display > Connect > OK
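For readers without Minitab, the sketch below mirrors the column steps in Python. The slide does not show which mean was entered in the density dialog, so a mean of 3 (matching the previous slide) is an assumption; the names `c1` and `c2` simply echo the worksheet columns, and the plotting step is left to whatever tool is at hand.

```python
from math import exp

MEAN = 3.0   # assumption: same mean (3 months) as the previous slide

# Equivalent of "set c1 / 0:12000 / end" followed by "let c1 = c1/1000":
# a grid from 0 to 12 in steps of 0.001.
c1 = [i / 1000 for i in range(0, 12001)]

# Exponential density f(t) = (1/mean) * exp(-t/mean), stored like column c2.
c2 = [exp(-t / MEAN) / MEAN for t in c1]

# The (c1, c2) pairs can be handed to any plotting tool to draw the connected curve.
print(c1[0], c2[0], c1[-1], c2[-1])
```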

22 Exponential Probability Density Function (figure: plot of the density curve)

23 Sampling
Population: the entire set of objects that we are interested in studying
Sample: a chosen subset of a population

24 Some Samples Are...
Random: each item in the population has an equal chance of being selected to be part of the sample
Representative: has the same characteristics as the population under study; a microcosm of the population

25 Population Parameters and Sample Statistics
Population parameter
  Numerical descriptor of a population
  Values usually uncertain
  e.g., population mean ($\mu$), population standard deviation ($\sigma$)
Sample statistic
  Numerical descriptor of a sample
  Calculated from the observations in the sample
  e.g., sample mean ($\bar{X}$), sample standard deviation ($S$)

26 What Is a Sampling Distribution?
Sample statistics are random variables.
Sample statistics have probability distributions.
The “sampling distribution” is the probability distribution of a sample statistic.
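A small simulation makes the idea concrete: draw many random samples, compute the same statistic on each, and look at how the results are distributed. The sketch below uses a hypothetical normal population of 10,000 values (not the restaurant data), simple random samples of size 50 drawn without replacement, and a fixed seed; the spread of the sample means should be close to the standard error $\sigma/\sqrt{n}$.

```python
import random
from statistics import mean, pstdev, stdev

rng = random.Random(90786)  # fixed seed for reproducibility

# Hypothetical population of 10,000 values (not the restaurant data).
population = [rng.gauss(5.2, 1.1) for _ in range(10_000)]
mu, sigma = mean(population), pstdev(population)

n, reps = 50, 2_000
# Each simple random sample: 50 distinct units drawn without replacement.
sample_means = [mean(rng.sample(population, n)) for _ in range(reps)]

print(mu, mean(sample_means))                 # centre of the sampling distribution
print(sigma / n ** 0.5, stdev(sample_means))  # standard error vs. observed spread
```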

27 MTB > Retrieve 'C:\MTBWIN\DATA\RESTRNT.MTW'.
Retrieving worksheet from file: C:\MTBWIN\DATA\RESTRNT.MTW
Worksheet was saved on 5/31/1994
MTB > info
Information on the Worksheet
Column  Name      Count  Missing
C1      ID          279        0
C2      OUTLOOK     279        1
C3      SALES       279       25
C4      NEWCAP      279       55
C5      VALUE       279       39
C6      COSTGOOD    279       42
C7      WAGES       279       44
C8      ADS         279       44
C9      TYPEFOOD    279       12
C10     SEATS       279       11
C11     OWNER       279       10
C12     FT.EMPL     279       14
C13     PT.EMPL     279       13
C14     SIZE        279       16

28 MTB > desc 'sales'
Descriptive Statistics
Variable  N    N*  Mean   Median  TrMean  StDev  SEMean
SALES     254  25  332.6  200.0   248.9   650.5  40.8
Variable  Min  Max     Q1    Q3
SALES     0.0  8064.0  83.7  382.7
MTB > boxp 'sales'
* NOTE * N missing = 25
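The SEMean column is the estimated standard error of the sample mean, StDev divided by the square root of N, where N counts only the non-missing values (254 here, not 279). A minimal Python check against the printed output (`se_mean` is a hypothetical helper name):

```python
from math import sqrt

def se_mean(stdev, n):
    """Estimated standard error of the sample mean: S / sqrt(N)."""
    return stdev / sqrt(n)

print(round(se_mean(650.5, 254), 1))   # SALES: 40.8
print(se_mean(1.1387, 254))            # logsales: close to the reported 0.0715
```

Because Minitab rounds StDev before printing it, recomputing SEMean from the displayed StDev can differ from the displayed SEMean in the last digit.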

29 MTB > hist 'sales'
* NOTE * N missing = 25

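30 MTB > let c15 = loge('sales')
J *** Values out of bounds during operation at J
Missing returned 1 times
MTB > let c15 = loge('sales' + 1)
MTB > name c15 'logsales'
MTB > desc 'logsales'
Descriptive Statistics
Variable  N    N*  Mean    Median  TrMean  StDev   SEMean
logsales  254  25  5.1830  5.3033  5.2134  1.1387  0.0715
Variable  Min     Max     Q1      Q3
logsales  0.0000  8.9953  4.4394  5.9500
MTB > boxp 'logsales'
* NOTE * N missing = 25
The first command fails for the restaurant with SALES = 0 (the minimum on slide 28), because the logarithm of 0 is undefined. Adding 1 before taking logs avoids the problem: a value of 0 becomes loge(1) = 0.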
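The same transformation in Python, as a minimal sketch using a few values taken from the SALES summary (the minimum, median, and maximum). `math.log` raises an error on 0, much as Minitab returned a missing value, while `math.log1p(x)` computes log(1 + x) directly.

```python
from math import log, log1p

sales = [0.0, 200.0, 8064.0]  # Min, Median, and Max of SALES from the summary

# log(0) is undefined; Python raises ValueError instead of returning a value.
try:
    log(sales[0])
except ValueError:
    print("log(0) is undefined")

# Shifting by 1 keeps zeros in the data: log(0 + 1) = 0.
logsales = [log1p(x) for x in sales]   # log1p(x) computes log(1 + x)
print([round(v, 4) for v in logsales])  # [0.0, 5.3033, 8.9953]
```

The printed values line up with the Min, Median, and Max reported for logsales.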


33 Four Samples of Size 50 from Restaurant “Logsales” Data: Histograms

34 Random Samples from Restaurant “Logsales” Data: Summary
MTB > Desc c16-c19
Descriptive Statistics
Variable  N   N*  Mean   Median  TrMean  StDev  SEMean
C16       43  7   5.246  5.375   5.280   0.867  0.132
C17       43  7   5.351  5.352   5.383   1.223  0.186
C18       48  2   5.366  5.461   5.388   0.888  0.128
C19       43  7   5.244  5.198   5.253   0.937  0.143
Variable  Min    Max    Q1     Q3
C16       2.773  6.621  4.625  5.787
C17       1.099  8.456  4.710  6.176
C18       2.485  7.091  4.961  5.994
C19       3.434  6.868  4.595  6.089
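The slides do not show the commands used to draw the four samples, so the following Python sketch is only one plausible way to produce this kind of summary. It assumes simple random sampling of 50 rows without replacement, represents missing values as `None`, and uses a hypothetical stand-in column (254 generated values plus 25 missing), not the real logsales data. As in the Minitab output, N + N* always equals 50 and SEMean equals StDev divided by the square root of N.

```python
import random
from statistics import mean, median, stdev

def describe_sample(column, size, rng):
    """Draw `size` rows at random and summarize the non-missing values.

    Missing entries are represented as None (an assumption for this sketch).
    """
    rows = rng.sample(column, size)
    values = [v for v in rows if v is not None]
    n, n_missing = len(values), size - len(values)
    s = stdev(values)
    return {"N": n, "N*": n_missing, "Mean": mean(values),
            "Median": median(values), "StDev": s, "SEMean": s / n ** 0.5}

rng = random.Random(34)
# Hypothetical stand-in for the 279-row logsales column with 25 missing values.
column = [rng.gauss(5.2, 1.1) for _ in range(254)] + [None] * 25

for name in ["C16", "C17", "C18", "C19"]:
    print(name, describe_sample(column, 50, rng))
```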

35 Next Time...
Central Limit Theorem: “Sample averages are approximately normally distributed”

