1
Paul Cornwell, March 31, 2011

2
• Let X_1, …, X_n be independent, identically distributed random variables with mean μ and finite, positive variance σ². Averages of these variables will be approximately normally distributed with mean μ and standard deviation σ/√n when n is large.
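The statement above can be checked with a small simulation. This is only a sketch: the exponential distribution with rate 1 (so μ = 1 and σ = 1), the sample size n = 30, the seed, and the helper name `sample_means` are all assumptions chosen for illustration, not part of the original slides.

```python
import math
import random
import statistics

def sample_means(draw, n, reps, rng):
    """Return `reps` averages, each taken over `n` independent draws from `draw(rng)`."""
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]

rng = random.Random(2011)   # fixed seed so the run is repeatable
n, reps = 30, 20_000

# Exponential(rate 1) has mu = 1 and sigma = 1; it is used here only as an example.
means = sample_means(lambda r: r.expovariate(1.0), n, reps, rng)

print("mean of averages:", statistics.fmean(means))   # should be close to mu = 1
print("sd of averages:  ", statistics.stdev(means))   # should be close to sigma / sqrt(n)
print("sigma / sqrt(n): ", 1 / math.sqrt(n))
```

The mean and standard deviation of the averages settle near μ and σ/√n for any n; the shape of the distribution of the averages is what depends on n being large.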

3
• How large a sample size is required for the Central Limit Theorem (CLT) approximation to be good?
• What is a ‘good’ approximation?

4
• Permits analysis of random variables even when the underlying distribution is unknown
• Estimating parameters
• Hypothesis testing
• Polling

5
• Performing a hypothesis test to determine whether a set of data came from a normal distribution
• Considerations
  ◦ Power: the probability that a test will reject the null hypothesis when it is false
  ◦ Ease of use

6
• Problems
  ◦ No test is desirable in every situation (there is no universally most powerful test)
  ◦ Some tests cannot handle the composite hypothesis of normality (i.e. a normal distribution with unspecified mean and variance rather than the standard normal)
  ◦ The reliability of tests is sensitive to sample size; with enough data, even small departures from normality lead to rejection of the null hypothesis

7
• Symmetric
• Unimodal
• Bell-shaped
• Continuous

8
• Skewness: measures the asymmetry of a distribution
  ◦ Defined as the third standardized moment
  ◦ Skewness of the normal distribution is 0
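Written out, the third standardized moment is the expected cube of the standardized variable. The symbol γ₁ is a common notation assumed here; it does not appear on the slide.

```latex
\gamma_1 = \operatorname{E}\!\left[\left(\frac{X-\mu}{\sigma}\right)^{3}\right]
         = \frac{\operatorname{E}\!\left[(X-\mu)^{3}\right]}{\sigma^{3}}
```

For any distribution that is symmetric about its mean and has a finite third moment, the positive and negative cubes cancel, which is why the normal distribution has skewness 0.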

9
• Kurtosis: measures the peakedness or tail heaviness of a distribution
  ◦ Defined as the fourth standardized moment
  ◦ Kurtosis of the normal distribution is 3 (so its excess kurtosis, kurtosis minus 3, is 0)
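Both definitions translate directly into sample estimates. The sketch below is one simple way to compute them, using the divide-by-n standard deviation; the function names are hypothetical, and statistical packages often apply small-sample corrections that this version omits.

```python
import statistics

def standardized_moment(data, k):
    """k-th standardized sample moment: mean of ((x - mean) / sd) ** k,
    using the population (divide-by-n) standard deviation."""
    mu = statistics.fmean(data)
    sd = statistics.pstdev(data, mu)
    return statistics.fmean(((x - mu) / sd) ** k for x in data)

def skewness(data):
    """Third standardized moment."""
    return standardized_moment(data, 3)

def kurtosis(data):
    """Fourth standardized moment (3 for a normal distribution)."""
    return standardized_moment(data, 4)

data = [1, 2, 3, 4, 5]
print(skewness(data))       # symmetric data, so this is 0 up to rounding
print(kurtosis(data) - 3)   # excess kurtosis; negative for this flat sample
```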

10
• Cumulative distribution function:
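The formula from this slide did not survive in the transcript. Assuming the slide introduces the binomial distribution (the table that follows is parameterized by n and p), the standard binomial cumulative distribution function would be:

```latex
F(x) = P(X \le x) = \sum_{i=0}^{\lfloor x \rfloor} \binom{n}{i}\, p^{i} (1-p)^{n-i},
\qquad 0 \le x \le n
```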

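11

| Parameters | Kurtosis | Skewness | % outside 1.96*sd | K-S distance | Mean | Std Dev |
|---|---|---|---|---|---|---|
| n = 20, p = .2 | -.0014 (.25) | .3325 (1.5) | .0434 | .128 | 3.9999 | 1.786 |
| n = 25, p = .2 | .002 | .3013 | .0743 | .116 | 5.0007 | 2.002 |
| n = 30, p = .2 | .0235 | .2786 | .0363 | .106 | 5.997 | 2.188 |
| n = 50, p = .2 | .0106 | .209 | .0496 | .083 | 10.001 | 2.832 |
| n = 100, p = .2 | .005 | .149 | .05988 | .0574 | 19.997 | 4.0055 |

*from R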
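The K-S distance column can be approximated without simulation. The sketch below assumes the K-S distance means the largest gap between the Binomial(n, p) cumulative distribution function and that of a normal distribution with the same mean and standard deviation; the slides do not say how the R values were produced, so the results should only land near the tabulated ones. The function names are hypothetical.

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of the normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def binomial_ks_distance(n, p):
    """Largest gap between the Binomial(n, p) CDF and the CDF of the normal
    distribution with the same mean and standard deviation."""
    mu, sigma = n * p, math.sqrt(n * p * (1 - p))
    cdf_before = 0.0   # binomial CDF just to the left of k
    worst = 0.0
    for k in range(n + 1):
        cdf_at = cdf_before + math.comb(n, k) * p**k * (1 - p) ** (n - k)
        phi = normal_cdf(k, mu, sigma)
        # A step function can be farthest from the curve on either side of a jump.
        worst = max(worst, abs(cdf_at - phi), abs(cdf_before - phi))
        cdf_before = cdf_at
    return worst

for n in (20, 25, 30, 50, 100):
    print(n, round(binomial_ks_distance(n, 0.2), 4))
```

Because the binomial is discrete, this distance shrinks only slowly with n, on the order of 1/√n.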

12
• Cumulative distribution function:
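The formula from this slide is also missing from the transcript. Assuming the slide introduces the continuous uniform distribution on (a, b), which is how the following table is parameterized, the standard cumulative distribution function would be:

```latex
F(x) =
\begin{cases}
0, & x < a \\[4pt]
\dfrac{x-a}{b-a}, & a \le x \le b \\[4pt]
1, & x > b
\end{cases}
```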

13

| Parameters | Kurtosis | Skewness | % outside 1.96*sd | K-S distance | Mean | Std Dev |
|---|---|---|---|---|---|---|
| n = 5, (a,b) = (0,1) | -.236 (-1.2) | .004 (0) | .0477 | .0061 | .4998 | .1289 (.129) |
| n = 5, (a,b) = (0,50) | -.234 | 0 | .04785 | .0058 | 24.99 | 6.468 (6.455) |
| n = 5, (a,b) = (0,.1) | -.238 | -.0008 | .048 | .0060 | .0500 | .0129 (.0129) |
| n = 3, (a,b) = (0,50) | -.397 | -.001 | .0468 | .01 | 24.99 | 8.326 (8.333) |

*from R
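The parenthesized standard deviations in this table appear to be theoretical values. Assuming the rows describe averages of n independent Uniform(a, b) variables, the standard deviation of the average is (b − a)/√(12n), which the sketch below evaluates for the four parameter settings. The function name is hypothetical.

```python
import math

def sd_of_uniform_mean(a, b, n):
    """Standard deviation of the average of n i.i.d. Uniform(a, b) variables."""
    return (b - a) / math.sqrt(12 * n)

for n, a, b in [(5, 0, 1), (5, 0, 50), (5, 0, 0.1), (3, 0, 50)]:
    print(f"n={n}, (a,b)=({a},{b}): {sd_of_uniform_mean(a, b, n):.4f}")
```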

14
• Cumulative distribution function:
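The formula is again missing from the transcript. Assuming the slide introduces the exponential distribution with rate λ, which matches the λ parameter in the table that follows, the standard cumulative distribution function would be:

```latex
F(x) = 1 - e^{-\lambda x}, \qquad x \ge 0
```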

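15

| Parameters | Kurtosis | Skewness | % outside 1.96*sd | K-S distance | Mean | Std Dev |
|---|---|---|---|---|---|---|
| n = 5, λ = 1 | 1.239 (6) | .904 (2) | .0434 | .0598 | .9995 | .4473 (.4472) |
| n = 10 | .597 | .630 | .045 | .042 | 1.0005 | .316 (.316) |
| n = 15 | .396 | .515 | .0464 | .034 | .9997 | .258 (.2581) |

*from R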
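The kurtosis and skewness columns can be compared with a standard result for averages of n i.i.d. variables: skewness shrinks like 1/√n and excess kurtosis like 1/n. The notation below (γ₁ for the skewness and γ₂ for the excess kurtosis of a single observation) is an assumption for illustration and does not come from the slides.

```latex
\operatorname{Skew}(\bar{X}_n) = \frac{\gamma_1}{\sqrt{n}}, \qquad
\operatorname{ExKurt}(\bar{X}_n) = \frac{\gamma_2}{n}
% Exponential distribution: \gamma_1 = 2, \gamma_2 = 6
% n = 5:  2/\sqrt{5}  \approx 0.894, \quad 6/5  = 1.2
% n = 10: 2/\sqrt{10} \approx 0.632, \quad 6/10 = 0.6
% n = 15: 2/\sqrt{15} \approx 0.516, \quad 6/15 = 0.4
```

These theoretical values sit close to the tabulated ones, and they show why skewness fades more slowly than kurtosis as n grows.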

16
• Find n values for more distributions
• Refine criteria for quality of approximation
• Explore distributions with no mean
• Classify distributions in order to have more general guidelines for minimum sample size

17
Paul Cornwell, May 2, 2011

18
• Central Limit Theorem: averages of i.i.d. variables become normally distributed as sample size increases
• Rate of convergence depends on the underlying distribution
• What sample size is needed to produce a good approximation from the CLT?

19
• Real-life applications of the Central Limit Theorem
• What does kurtosis tell us about a distribution?
• What is the rationale for requiring np ≥ 5?
• What about distributions with no mean?

20
• The distribution of the total distance covered in a random walk tends towards normal
• Hypothesis testing
• Confidence intervals (polling)
• Signal processing, noise cancellation

21
• Measures the “peakedness” of a distribution
• Higher peaks mean fatter tails

22
• The traditional condition for approximating a binomial with a normal is np > 5 (or np > 10)
• Skewness of the binomial distribution increases as p moves away from .5
• A larger n is required for convergence for skewed distributions
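The link between p and skewness can be made explicit with the standard formula for the skewness of a Binomial(n, p) variable:

```latex
\operatorname{Skew}\bigl(\mathrm{Bin}(n,p)\bigr) = \frac{1-2p}{\sqrt{np(1-p)}}
% p = 0.5:          skewness is 0 for every n
% p = 0.2, n = 20:  0.6 / \sqrt{3.2} \approx 0.335
% p = 0.1, n = 20:  0.8 / \sqrt{1.8} \approx 0.596
```

The numerator grows as p moves away from .5 while the denominator shrinks, so a fixed skewness target requires a larger n for more extreme p.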

23
• Has no moments (including mean, variance)
• The distribution of averages looks like the original distribution
• CLT does not apply
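The slide does not name the distribution in the transcript. A minimal sketch, assuming it is the standard Cauchy distribution (the usual textbook example of a distribution with no mean), shows the behaviour described: averages are spread out just as widely as single draws. The helper names, seed, and sample sizes are illustrative choices.

```python
import math
import random
import statistics

def cauchy(rng):
    """One standard Cauchy draw via the inverse-CDF transform tan(pi * (U - 1/2))."""
    return math.tan(math.pi * (rng.random() - 0.5))

def iqr(values):
    """Interquartile range, a spread measure that exists even without a variance."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

rng = random.Random(23)   # fixed seed so the run is repeatable
reps, n = 10_000, 30
singles = [cauchy(rng) for _ in range(reps)]
averages = [statistics.fmean(cauchy(rng) for _ in range(n)) for _ in range(reps)]

# The standard Cauchy has quartiles at -1 and +1, so its interquartile range is 2.
print("IQR of single draws:", iqr(singles))
print("IQR of averages of", n, "draws:", iqr(averages))
```

If the CLT applied, the second figure would be smaller than the first by a factor of about √n; here the two stay roughly equal.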

24
• α = β = 1/3
• Distribution is symmetric and bimodal
• Convergence to normal is fast in averages

25
• Heavier-tailed, bell-shaped curve
• Approaches the normal distribution as degrees of freedom increase
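Assuming this slide describes Student's t distribution, which appears later in the deck with 2.5 and 4.1 degrees of freedom, the standard moment formulas show how tail heaviness depends on the degrees of freedom ν:

```latex
\operatorname{Var}(T_\nu) = \frac{\nu}{\nu-2} \quad (\nu > 2), \qquad
\operatorname{Skew}(T_\nu) = 0 \quad (\nu > 3), \qquad
\operatorname{ExKurt}(T_\nu) = \frac{6}{\nu-4} \quad (\nu > 4)
% \nu = 4.1: excess kurtosis is 6 / 0.1 = 60
% \nu = 2.5: variance is 2.5 / 0.5 = 5, but the kurtosis is not finite
```

As ν grows, the variance tends to 1 and the excess kurtosis to 0, matching the standard normal.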

26
• 4 statistics: K-S distance, tail probabilities, skewness and kurtosis
• Different thresholds for “adequate” and “superior” approximations
• Both sets of thresholds are fairly conservative

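27

| Distribution | ∣Kurtosis∣ < .5 | ∣Skewness∣ < .25 | Tail Prob. .04 < x < .06 | K-S Distance < .05 | max |
|---|---|---|---|---|---|
| Uniform | 3 | 1 | 2 | 2 | 3 |
| Beta (α=β=1/3) | 4 | 1 | 3 | 3 | 4 |
| Exponential | 12 | 64 | 5 | 8 | |
| Binomial (p=.1) | 11 | 114 | 14 | 332 | |
| Binomial (p=.5) | 4 | 1 | 12 | 68 | |
| Student’s t with 2.5 df | NA | NA | 13 | 20 | |
| Student’s t with 4.1 df | 120 | 1 | 1 | 2 | |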

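28

| Distribution | ∣Kurtosis∣ < .3 | ∣Skewness∣ < .15 | Tail Prob. .04 < x < .06 | K-S Distance < .02 | max |
|---|---|---|---|---|---|
| Uniform | 4 | 1 | 2 | 2 | 4 |
| Beta (α=β=1/3) | 6 | 1 | 3 | 4 | 6 |
| Exponential | 20 | 178 | 5 | 45 | 178 |
| Binomial (p=.1) | 18 | 317 | 14 | 1850 | |
| Binomial (p=.5) | 7 | 1 | 12 | 390 | |
| Student’s t with 2.5 df | NA | NA | 13 | 320 | |
| Student’s t with 4.1 df | 200 | 1 | 1 | 5 | |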
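The kurtosis and skewness sample sizes can be reproduced from theory under two assumptions that the slides do not state: the skewness of an average of n i.i.d. variables is the single-observation skewness divided by √n, its excess kurtosis is the single-observation value divided by n, and a threshold counts as met when the value is at or below it. The helper `min_n` and the dictionary of per-observation moments are hypothetical names for this sketch.

```python
import math

def min_n(moment, threshold, power):
    """Smallest n with |moment| / n**power <= threshold.
    power = 0.5 for skewness, 1 for excess kurtosis."""
    n = 1
    while abs(moment) / n**power > threshold + 1e-12:   # tolerance for float rounding
        n += 1
    return n

# (skewness, excess kurtosis) of a single observation; standard textbook values.
# The binomial entries are per-trial (Bernoulli) values.
single_obs = {
    "Uniform": (0.0, -1.2),
    "Exponential": (2.0, 6.0),
    "Binomial (p=.1)": ((1 - 2 * 0.1) / math.sqrt(0.1 * 0.9),
                        (1 - 6 * 0.1 * 0.9) / (0.1 * 0.9)),
    "Binomial (p=.5)": (0.0, (1 - 6 * 0.25) / 0.25),
}

for name, (skew, exkurt) in single_obs.items():
    adequate = (min_n(exkurt, 0.5, 1), min_n(skew, 0.25, 0.5))   # (kurtosis n, skewness n)
    superior = (min_n(exkurt, 0.3, 1), min_n(skew, 0.15, 0.5))
    print(name, "adequate:", adequate, "superior:", superior)
```

The tail-probability and K-S columns have no comparably simple closed form, so they are left out of this sketch.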

29
• Skewness is difficult to shake
• Tail probabilities are fairly accurate for small sample sizes
• Traditional recommendation is small for many common distributions

