Published by Virgil Cox. Modified over 9 years ago.
1
Paul Cornwell, March 31, 2011
2
Let X₁, …, Xₙ be independent, identically distributed random variables with mean μ and finite, positive variance σ². Their average X̄ₙ = (X₁ + ⋯ + Xₙ)/n is approximately normally distributed with mean μ and standard deviation σ/√n when n is large.
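A quick way to see the statement in action is to simulate it. The sketch below (Python standard library only) averages n = 30 draws from an exponential distribution with rate 1, chosen here purely as an example, so μ = σ = 1. The mean and standard deviation of the simulated averages should land close to μ and σ/√n ≈ 0.183. The seed, n, and the number of repetitions are arbitrary choices.

```python
import random
import statistics

def sample_means(draw, n, reps, rng):
    """Return `reps` averages, each of `n` independent draws from `draw(rng)`."""
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]

rng = random.Random(2011)            # fixed seed so the run is reproducible
n, reps = 30, 20_000
# Exponential with rate 1 (an example distribution): mu = 1, sigma = 1
means = sample_means(lambda r: r.expovariate(1.0), n, reps, rng)

print("mean of averages:", statistics.fmean(means))   # close to mu = 1
print("sd of averages:  ", statistics.stdev(means))   # close to sigma / sqrt(n)
print("sigma / sqrt(n): ", 1.0 / n ** 0.5)
```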
3
How large a sample size is required for the Central Limit Theorem (CLT) approximation to be good? What counts as a ‘good’ approximation?
4
The CLT permits analysis of random variables even when the underlying distribution is unknown, for example: ◦ Estimating parameters ◦ Hypothesis testing ◦ Polling
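As an illustration of the polling use case, the sketch below builds the usual large-sample 95% confidence interval for a proportion, p̂ ± 1.96·√(p̂(1 − p̂)/n), which relies on the CLT applied to the average of 0/1 responses. The poll numbers (540 of 1,000) are hypothetical.

```python
import math

def poll_interval(successes, n, z=1.96):
    """Approximate 95% confidence interval for a population proportion.

    By the CLT, the sample proportion p_hat is roughly normal with
    standard error sqrt(p_hat * (1 - p_hat) / n) when n is large.
    """
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

# Hypothetical poll: 540 of 1,000 respondents favor a candidate
low, high = poll_interval(540, 1000)
print(f"{low:.3f} to {high:.3f}")   # prints 0.509 to 0.571
```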
5
Performing a hypothesis test to determine whether a set of data came from a normal distribution. Considerations: ◦ Power: the probability that a test rejects the null hypothesis when it is false ◦ Ease of use
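The slides do not name a specific test. As one concrete example, the Jarque-Bera test builds its statistic from the sample skewness S and kurtosis K (defined on later slides): JB = (n/6)(S² + (K − 3)²/4), which under normality is approximately chi-squared with 2 degrees of freedom. This is a sketch for illustration, not the procedure used in this study, and its chi-squared approximation is rough for small samples.

```python
import math
import random

def jarque_bera(data):
    """Jarque-Bera statistic and asymptotic p-value.

    Under normality JB is approximately chi-squared with 2 degrees of
    freedom, whose upper-tail probability is exp(-x / 2).
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    return jb, math.exp(-jb / 2)

rng = random.Random(1)
normal_data = [rng.gauss(0, 1) for _ in range(500)]
skewed_data = [rng.expovariate(1.0) for _ in range(500)]
print(jarque_bera(normal_data))   # typically a large p-value
print(jarque_bera(skewed_data))   # tiny p-value: normality rejected
```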
6
Problems: ◦ No test is best in every situation (there is no uniformly most powerful test) ◦ Some tests cannot handle the composite hypothesis of normality (i.e., a normal distribution with unspecified mean and variance) ◦ Test results are sensitive to sample size; with enough data, the null hypothesis will almost always be rejected
7
Characteristics of the normal distribution: ◦ Symmetric ◦ Unimodal ◦ Bell-shaped ◦ Continuous
8
Skewness: measures the asymmetry of a distribution. ◦ Defined as the third standardized moment, E[(X − μ)³]/σ³ ◦ The skewness of a normal distribution is 0
9
Kurtosis: measures the peakedness of a distribution or the heaviness of its tails. ◦ Defined as the fourth standardized moment, E[(X − μ)⁴]/σ⁴ ◦ The kurtosis of a normal distribution is 3, so its excess kurtosis (kurtosis − 3) is 0
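Both definitions translate directly into sample estimates by replacing expectations with averages. The sketch below uses the simple (biased) moment estimators; statistical software may apply small-sample bias corrections, so its numbers can differ slightly.

```python
def standardized_moments(data):
    """Return (skewness, kurtosis): the 3rd and 4th standardized sample moments.

    Plain moment estimators; subtract 3 from the kurtosis to get the
    excess kurtosis, which is 0 for a normal distribution.
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

# Symmetric example: skewness 0, kurtosis below 3 (lighter tails than a normal)
print(standardized_moments([1, 2, 3, 4, 5]))   # (0.0, 1.7)
```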
10
Cumulative distribution function: 10
11
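Kurtosis is excess kurtosis (0 for a normal distribution); values in parentheses are those of a single Bernoulli(.2) trial.
parameters | Kurtosis | Skewness | Proportion outside mean ± 1.96 sd | K-S distance | Mean | Std Dev
n = 20, p = .2 | −.0014 (.25) | .3325 (1.5) | .0434 | .128 | 3.9999 | 1.786
n = 25, p = .2 | .002 | .3013 | .0743 | .116 | 5.0007 | 2.002
n = 30, p = .2 | .0235 | .2786 | .0363 | .106 | 5.997 | 2.188
n = 50, p = .2 | .0106 | .209 | .0496 | .083 | 10.001 | 2.832
n = 100, p = .2 | .005 | .149 | .05988 | .0574 | 19.997 | 4.0055
*from R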
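The binomial table can be checked against exact results. Assuming the slide's distribution is Binomial(n, p), viewed as a sum of n Bernoulli(p) trials, the sketch below computes the exact CDF, the largest gap between it and the matching normal CDF (a population version of the K-S distance, with no continuity correction), and the exact skewness (1 − 2p)/√(np(1 − p)) and excess kurtosis (1 − 6p(1 − p))/(np(1 − p)). The function names are illustrative.

```python
import math

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_cdf(x, mu, sigma):
    """CDF of Normal(mu, sigma) via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def binomial_summary(n, p):
    """Exact shape statistics plus the sup-distance to Normal(np, sqrt(np(1-p)))."""
    q = 1 - p
    var = n * p * q
    mu, sigma = n * p, math.sqrt(var)
    # The binomial CDF jumps at each integer k, so the largest gap occurs at a jump:
    # compare the normal CDF with both the left limit F(k - 1) and the value F(k).
    ks, prev = 0.0, 0.0
    for k in range(n + 1):
        cur = binom_cdf(k, n, p)
        phi = normal_cdf(k, mu, sigma)
        ks = max(ks, abs(cur - phi), abs(prev - phi))
        prev = cur
    return {
        "mean": mu,
        "sd": sigma,
        "skewness": (q - p) / sigma,
        "excess_kurtosis": (1 - 6 * p * q) / var,
        "ks_distance": ks,
    }

for n in (20, 25, 30, 50, 100):
    print(n, {key: round(val, 4) for key, val in binomial_summary(n, 0.2).items()})
```

For n = 20 the exact gap comes out at about .13, in line with the .128 in the table.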
12
Cumulative distribution function: 12
13
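Kurtosis is excess kurtosis; values in parentheses are those of a single Uniform(a, b) draw (kurtosis, skewness) and the theoretical standard deviation of the average, (b − a)/√(12n).
parameters | Kurtosis | Skewness | Proportion outside mean ± 1.96 sd | K-S distance | Mean | Std Dev
n = 5, (a,b) = (0,1) | −.236 (−1.2) | .004 (0) | .0477 | .0061 | .4998 | .1289 (.129)
n = 5, (a,b) = (0,50) | −.234 | 0 | .04785 | .0058 | 24.99 | 6.468 (6.455)
n = 5, (a,b) = (0,.1) | −.238 | −.0008 | .048 | .0060 | .0500 | .0129 (.0129)
n = 3, (a,b) = (0,50) | −.397 | −.001 | .0468 | .01 | 24.99 | 8.326 (8.333)
*from R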
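The parenthesized values in the uniform table follow from exact results: one Uniform(a, b) draw has standard deviation (b − a)/√12 and excess kurtosis −1.2, and averaging n independent draws divides the standard deviation by √n and the excess kurtosis by n, while the skewness stays 0. A minimal check:

```python
import math

def uniform_average_theory(a, b, n):
    """Exact sd and excess kurtosis of the average of n i.i.d. Uniform(a, b) draws."""
    sd = (b - a) / math.sqrt(12) / math.sqrt(n)   # single-draw sd divided by sqrt(n)
    excess_kurtosis = -1.2 / n                    # single-draw value divided by n
    return sd, excess_kurtosis

for a, b, n in [(0, 1, 5), (0, 50, 5), (0, 0.1, 5), (0, 50, 3)]:
    sd, kurt = uniform_average_theory(a, b, n)
    print((a, b), n, round(sd, 4), round(kurt, 3))
```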
14
Cumulative distribution function: 14
15
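Kurtosis is excess kurtosis; values in parentheses are those of a single Exponential(1) draw (kurtosis, skewness) and the theoretical standard deviation of the average, 1/(λ√n).
parameters | Kurtosis | Skewness | Proportion outside mean ± 1.96 sd | K-S distance | Mean | Std Dev
n = 5, λ = 1 | 1.239 (6) | .904 (2) | .0434 | .0598 | .9995 | .4473 (.4472)
n = 10 | .597 | .630 | .045 | .042 | 1.0005 | .316 (.316)
n = 15 | .396 | .515 | .0464 | .034 | .9997 | .258 (.2581)
*from R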
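For the exponential case the tail probability can be computed exactly rather than simulated: the sum of n independent Exponential(λ) variables has an Erlang (gamma) distribution with survival function e^(−λx) Σ_{k<n} (λx)^k/k!. The sketch below uses that to get P(|X̄ − μ| > 1.96·σ/√n) for comparison with the tail-probability column; the function names are illustrative.

```python
import math

def erlang_sf(x, n, rate):
    """P(S > x) for S = sum of n i.i.d. Exponential(rate) variables."""
    if x <= 0:
        return 1.0
    lx = rate * x
    return math.exp(-lx) * sum(lx**k / math.factorial(k) for k in range(n))

def exact_tail_prob(n, lam=1.0, z=1.96):
    """Exact P(|Xbar - mu| > z * sd) for the average of n Exponential(lam) draws."""
    mu, sd = 1 / lam, 1 / (lam * math.sqrt(n))
    lower, upper = mu - z * sd, mu + z * sd
    # Xbar <= c exactly when the sum S <= n * c
    p_low = 1 - erlang_sf(n * lower, n, lam)
    p_high = erlang_sf(n * upper, n, lam)
    return p_low + p_high

for n in (5, 10, 15):
    print(n, round(exact_tail_prob(n), 4))
```

For n = 5 this gives about .044, close to the simulated .0434; nearly all of it comes from the upper tail, a direct effect of the skewness.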
16
Next steps: ◦ Find n values for more distributions ◦ Refine the criteria for the quality of the approximation ◦ Explore distributions that have no mean ◦ Classify distributions in order to give more general guidelines for the minimum sample size
17
Paul Cornwell, May 2, 2011
18
Central Limit Theorem: averages of i.i.d. variables with finite variance become approximately normally distributed as the sample size increases ◦ The rate of convergence depends on the underlying distribution ◦ What sample size is needed to produce a good approximation from the CLT?
19
◦ Real-life applications of the Central Limit Theorem ◦ What does kurtosis tell us about a distribution? ◦ What is the rationale for requiring np ≥ 5? ◦ What about distributions with no mean?
20
Real-life applications: ◦ The distribution of a random walk's total displacement tends toward a normal distribution ◦ Hypothesis testing ◦ Confidence intervals (polling) ◦ Signal processing and noise cancellation
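The random-walk example can be checked by simulation. In the sketch below each step is +1 or −1 with equal probability (an assumed, simple walk), so the position after 100 steps is a sum of 100 i.i.d. terms with mean 0 and variance 1, and the CLT predicts it is roughly Normal(0, 10). The seed and the number of walks are arbitrary.

```python
import random
import statistics

def walk_position(steps, rng):
    """Position after `steps` moves of +1 or -1, each with probability 1/2."""
    return sum(rng.choice((-1, 1)) for _ in range(steps))

rng = random.Random(7)
steps, walks = 100, 4_000
positions = [walk_position(steps, rng) for _ in range(walks)]

sd = steps ** 0.5   # CLT: sd of the position is sqrt(steps) = 10 here
inside = sum(abs(x) <= 1.96 * sd for x in positions) / walks
print(statistics.fmean(positions))    # near 0
print(statistics.pstdev(positions))   # near 10
print(inside)                         # close to 0.95 (discreteness shifts it a little)
```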
21
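Kurtosis measures the “peakedness” of a distribution ◦ Higher kurtosis means fatter tails; a sharper peak usually accompanies them, but tail weight is what kurtosis mainly reflects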
22
The traditional condition for using the normal approximation to the binomial is np ≥ 5 (some texts use 10), along with n(1 − p) ≥ 5 ◦ The skewness of the binomial distribution increases as p moves away from .5 ◦ Skewed distributions require a larger n for convergence
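The rule of thumb and the skewness argument can be put side by side. The sketch below finds the smallest n with np ≥ 5 and n(1 − p) ≥ 5 and prints the skewness of a single Bernoulli(p) trial, (1 − 2p)/√(p(1 − p)), which grows as p moves away from .5. The threshold is a parameter so the “10” variant can be tried as well.

```python
from fractions import Fraction
import math

def rule_of_thumb_n(p, threshold=5):
    """Smallest n with n*p >= threshold and n*(1 - p) >= threshold.

    p is a Fraction so the division is exact (no floating-point rounding).
    """
    return math.ceil(threshold / min(p, 1 - p))

def bernoulli_skewness(p):
    """Skewness of one Bernoulli(p) trial; Binomial(n, p) has this divided by sqrt(n)."""
    return (1 - 2 * p) / math.sqrt(p * (1 - p))

for p in (Fraction(1, 2), Fraction(1, 5), Fraction(1, 10), Fraction(1, 50)):
    print(f"p = {float(p):.2f}: n >= {rule_of_thumb_n(p)}, "
          f"Bernoulli skewness = {bernoulli_skewness(p):.3f}")
```

For p = .1 the rule asks for n = 50, whereas the skewness criterion in the results tables calls for n = 114 (adequate) or 317 (superior).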
23
Has no finite moments (not even a mean or variance) ◦ The distribution of averages looks like the original distribution ◦ The CLT does not apply
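Assuming the slide refers to the Cauchy distribution (the standard example of a distribution with no mean), the average of n i.i.d. standard Cauchy draws is again standard Cauchy, so its spread never shrinks. The sketch below measures spread with the interquartile range (IQR), since the standard deviation does not exist here; the quartiles of a standard Cauchy are ±1, so the IQR should stay near 2 for every n.

```python
import math
import random
import statistics

def cauchy(rng):
    """Standard Cauchy draw via the inverse CDF: tan(pi * (U - 1/2))."""
    return math.tan(math.pi * (rng.random() - 0.5))

def iqr(values):
    """Interquartile range from the sample quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

rng = random.Random(3)
reps = 4_000
spreads = {}
for n in (1, 10, 100):
    averages = [statistics.fmean(cauchy(rng) for _ in range(n)) for _ in range(reps)]
    spreads[n] = iqr(averages)   # does not shrink like 1/sqrt(n)
    print(n, round(spreads[n], 2))
```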
24
Beta distribution with α = β = 1/3 ◦ The distribution is symmetric and bimodal ◦ Averages converge to normal quickly
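The fast convergence can be quantified with kurtosis. Beta(α, α) has excess kurtosis −6/(2α + 3), which is −18/11 ≈ −1.64 for α = 1/3, and averaging n draws divides this by n. The sketch compares that prediction with the sample excess kurtosis of simulated averages, using `random.betavariate` from Python's standard library; the seed and repetition count are arbitrary.

```python
import random

def excess_kurtosis(data):
    """Sample excess kurtosis (moment estimator): m4 / m2**2 - 3."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m4 / m2 ** 2 - 3

ALPHA = 1 / 3
single = -6 / (2 * ALPHA + 3)   # exact excess kurtosis of one Beta(1/3, 1/3) draw

rng = random.Random(11)
results = {}
for n in (1, 2, 4, 8):
    avgs = [sum(rng.betavariate(ALPHA, ALPHA) for _ in range(n)) / n
            for _ in range(20_000)]
    results[n] = excess_kurtosis(avgs)
    print(n, round(results[n], 3), "predicted", round(single / n, 3))
```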
25
Student's t distribution ◦ Heavier-tailed, bell-shaped curve ◦ Approaches the normal distribution as the degrees of freedom increase
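To see the heavier tails and the approach to normality, the sketch below simulates Student's t variables as T = Z/√(V/ν), with Z standard normal and V chi-squared with ν degrees of freedom (generated as `random.gammavariate(ν/2, 2)`), and estimates P(|T| > 1.96), which is .05 for a normal. The degrees of freedom 2.5 and 4.1 match the results tables; 30 is added only for contrast.

```python
import math
import random

def student_t(df, rng):
    """One Student's t draw: standard normal divided by sqrt(chi-squared / df)."""
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(df / 2, 2.0)   # chi-squared with df degrees of freedom
    return z / math.sqrt(v / df)

rng = random.Random(5)
draws = 20_000
tail = {}
for df in (2.5, 4.1, 30):
    tail[df] = sum(abs(student_t(df, rng)) > 1.96 for _ in range(draws)) / draws
    print(f"df = {df}: P(|T| > 1.96) ≈ {tail[df]:.3f}")
```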
26
Four statistics: Kolmogorov-Smirnov (K-S) distance, tail probabilities, skewness, and kurtosis ◦ Different thresholds for “adequate” and “superior” approximations ◦ Both sets of thresholds are fairly conservative
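The four statistics can be computed from a batch of simulated averages. In the sketch below the averages are standardized with the known mean and σ/√n (an assumption; the slides do not say whether estimated or theoretical parameters were used), and the K-S distance is the largest vertical gap between the empirical CDF and the standard normal CDF. The example uses averages of five Exponential(1) draws, one of the cases in the tables.

```python
import math
import random

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def clt_diagnostics(averages, mu, sd):
    """K-S distance, tail probability, skewness and excess kurtosis vs Normal(mu, sd)."""
    n = len(averages)
    z = sorted((x - mu) / sd for x in averages)
    # Empirical CDF steps from i/n to (i+1)/n at the i-th sorted value
    ks = max(max((i + 1) / n - normal_cdf(v), normal_cdf(v) - i / n)
             for i, v in enumerate(z))
    tail = sum(abs(v) > 1.96 for v in z) / n   # about 0.05 for a normal
    m = sum(z) / n
    m2 = sum((v - m) ** 2 for v in z) / n
    skew = sum((v - m) ** 3 for v in z) / n / m2 ** 1.5
    kurt = sum((v - m) ** 4 for v in z) / n / m2 ** 2 - 3   # excess kurtosis
    return {"ks": ks, "tail": tail, "skewness": skew, "excess_kurtosis": kurt}

rng = random.Random(42)
n_obs, reps = 5, 10_000
averages = [sum(rng.expovariate(1.0) for _ in range(n_obs)) / n_obs for _ in range(reps)]
diag = clt_diagnostics(averages, mu=1.0, sd=1.0 / math.sqrt(n_obs))
print(diag)
```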
27
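Adequate approximation: minimum n meeting each criterion
Distribution | ∣Kurtosis∣ < .5 | ∣Skewness∣ < .25 | Tail prob. .04 < x < .06 | K-S distance < .05 | max
Uniform | 3 | 1 | 2 | 2 | 3
Beta (α=β=1/3) | 4 | 1 | 3 | 3 | 4
Exponential | 12 | 64 | 5 | 8 | 64
Binomial (p=.1) | 11 | 114 | 14 | 332 | 332
Binomial (p=.5) | 4 | 1 | 12 | 68 | 68
Student’s t with 2.5 df | NA | 1 | 3 | 20 | 20
Student’s t with 4.1 df | 120 | 1 | 1 | 2 | 120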
28
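Superior approximation: minimum n meeting each criterion
Distribution | ∣Kurtosis∣ < .3 | ∣Skewness∣ < .15 | Tail prob. .04 < x < .06 | K-S distance < .02 | max
Uniform | 4 | 1 | 2 | 2 | 4
Beta (α=β=1/3) | 6 | 1 | 3 | 4 | 6
Exponential | 20 | 178 | 5 | 45 | 178
Binomial (p=.1) | 18 | 317 | 14 | 1850 | 1850
Binomial (p=.5) | 7 | 1 | 12 | 390 | 390
Student’s t with 2.5 df | NA | 1 | 3 | 320 | 320
Student’s t with 4.1 df | 200 | 1 | 1 | 5 | 200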
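The kurtosis and skewness columns can be reproduced without simulation. Averaging n i.i.d. draws divides the excess kurtosis by n and the skewness by √n, so the smallest n meeting a threshold follows from the single-draw values. The sketch below does this with exact fractions, reading each threshold as ≤ (which appears to be what reproduces entries such as 12 and 64 for the exponential) and using n = 1 when the skewness is 0. The Student's t with 2.5 df is left out because its kurtosis is infinite.

```python
from fractions import Fraction as F
import math

def bernoulli(p):
    """(squared skewness, excess kurtosis) of one Bernoulli(p) trial."""
    q = 1 - p
    return (q - p) ** 2 / (p * q), (1 - 6 * p * q) / (p * q)

# (squared skewness, excess kurtosis) of ONE observation, as exact fractions
DISTRIBUTIONS = {
    "Uniform": (F(0), F(-6, 5)),
    "Beta (α=β=1/3)": (F(0), F(-18, 11)),
    "Exponential": (F(4), F(6)),
    "Binomial (p=.1)": bernoulli(F(1, 10)),
    "Binomial (p=.5)": bernoulli(F(1, 2)),
    "Student's t with 4.1 df": (F(0), 6 / (F(41, 10) - 4)),
}

def min_n(value_per_obs, threshold):
    """Smallest n >= 1 with |value_per_obs| / n <= threshold."""
    return max(1, math.ceil(abs(value_per_obs) / threshold))

for name, (skew_sq, kurt) in DISTRIBUTIONS.items():
    # Skewness test |g| / sqrt(n) <= t is the same as g**2 / n <= t**2
    adequate = (min_n(kurt, F(1, 2)), min_n(skew_sq, F(1, 4) ** 2))
    superior = (min_n(kurt, F(3, 10)), min_n(skew_sq, F(3, 20) ** 2))
    print(f"{name:26} (kurtosis n, skewness n): adequate {adequate}, superior {superior}")
```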
29
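Skewness is difficult to shake: for the skewed distributions (exponential, binomial with p = .1), the skewness criterion requires a much larger n than the kurtosis or tail criteria ◦ Tail probabilities are fairly accurate even for small sample sizes ◦ The traditional recommendation is too small for many common distributions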