Published by Andrea George. Modified over 9 years ago.
1
Statistics: Unlocking the Power of Data Lock 5
Normal Distribution
STAT 250, Dr. Kari Lock Morgan
Chapter 5
Normal distribution
Central limit theorem
Normal distribution for confidence intervals
Normal distribution for p-values
Standard normal
2
Bootstrap and Randomization Distributions
Slope: Restaurant tips
Correlation: Malevolent uniforms
Mean: Body temperatures
Difference in means: Finger taps
Mean: Atlanta commutes
Proportion: Owners/dogs
What do you notice? All bell-shaped distributions!
3
Normal Distribution
The symmetric, bell-shaped curve we have seen for almost all of our bootstrap and randomization distributions is called a normal distribution.
4
Central Limit Theorem!
For a sufficiently large sample size, the distribution of sample statistics for a mean or a proportion is approximately normal.
www.lock5stat.com/StatKey
6
CLT for a Mean
[Figure: population distribution, distribution of sample data, and distribution of sample means for n = 10, n = 30, and n = 50]
7
Central Limit Theorem
The central limit theorem holds for ANY original distribution, although “sufficiently large sample size” varies.
The more skewed the original distribution is (the farther from normal), the larger the sample size has to be for the CLT to work.
For small samples, it is more important that the data itself is approximately normal.
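The effect of sample size on a skewed population can be checked with a small simulation. This is only a sketch: the exponential population, the sample sizes, the repetition count, and the helper names `sample_means` and `skewness` are all assumptions chosen for illustration and do not come from the slides.

```python
import random
import statistics

def sample_means(n, reps=5000, seed=0):
    """Means of `reps` samples of size n from a right-skewed (exponential) population."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]

def skewness(values):
    """Average cubed z-score; roughly 0 for a symmetric, bell-shaped distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

# Assumed sample sizes, picked only to show the trend.
skew_by_n = {n: skewness(sample_means(n)) for n in (2, 10, 50)}
for n, sk in skew_by_n.items():
    print(f"n = {n:>2}: skewness of sample means = {sk:.2f}")
```

The skewness of the sample means should shrink toward 0 as n grows, which is the sense in which a more skewed population needs a larger sample before the normal approximation is reasonable.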
8
Accuracy
The accuracy of intervals and p-values generated using simulation methods (bootstrapping and randomization) depends on the number of simulations (more simulations = more accurate).
The accuracy of intervals and p-values generated using formulas and the normal distribution depends on the sample size (larger sample size = more accurate).
If the distribution of the statistic is truly normal and you have generated many simulated randomizations, the p-values should be very close.
9
Normal Distribution
The normal distribution is fully characterized by its mean and standard deviation.
10
Bootstrap Distributions
If a bootstrap distribution is approximately normally distributed, we can write it as
a) N(parameter, sd)
b) N(statistic, sd)
c) N(parameter, se)
d) N(statistic, se)
sd = standard deviation of variable
se = standard error = standard deviation of statistic
11
Hearing Loss
In a random sample of 1771 Americans aged 12 to 19, 19.5% had some hearing loss (this is a dramatic increase from a decade ago!)
What proportion of Americans aged 12 to 19 have some hearing loss? Give a 95% CI.
Rabin, R. “Childhood: Hearing Loss Grows Among Teenagers,” www.nytimes.com, 8/23/10.
12
Hearing Loss
(0.177, 0.214)
13
Hearing Loss
N(0.195, 0.0095)
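A bootstrap distribution like this one can be approximated with a short simulation. The sketch below makes some assumptions not stated on the slides: the count of 345 cases is back-calculated from 19.5% of 1771, and the number of bootstrap samples (1000) and the random seed are arbitrary choices.

```python
import random
import statistics

n = 1771
successes = round(0.195 * n)  # assumed count (345), back-calculated from 19.5%
sample = [1] * successes + [0] * (n - successes)

rng = random.Random(1)  # arbitrary seed, for reproducibility only
# Each bootstrap sample resamples n values with replacement and records the proportion.
boot_props = [sum(rng.choices(sample, k=n)) / n for _ in range(1000)]

center = statistics.fmean(boot_props)  # should sit near the sample statistic
se = statistics.stdev(boot_props)      # standard error = sd of the bootstrap statistics
print(f"bootstrap distribution is roughly N({center:.3f}, {se:.4f})")
```

The center should land near the sample statistic and the spread near the standard error quoted on the slide, though the exact digits vary from one simulation to the next.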
14
Confidence Intervals
If the bootstrap distribution is normal:
To find a P% confidence interval, we just need to find the middle P% of the distribution N(statistic, SE)
www.lock5stat.com/statkey
15
Hearing Loss
(0.176, 0.214)
www.lock5stat.com/statkey
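The middle 95% of N(0.195, 0.0095) can also be found without StatKey. The sketch below uses Python's standard-library `statistics.NormalDist`; the variable names are illustrative, and this is one possible way to do the calculation rather than the method used in the lecture.

```python
from statistics import NormalDist

# Bootstrap distribution from the slides: N(statistic, SE)
boot = NormalDist(mu=0.195, sigma=0.0095)

level = 0.95
tail = (1 - level) / 2          # area left in each tail
lower = boot.inv_cdf(tail)      # cuts off the lowest 2.5%
upper = boot.inv_cdf(1 - tail)  # cuts off the highest 2.5%
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

Rounded to three decimals, this reproduces the interval (0.176, 0.214).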
16
Randomization Distributions
If a randomization distribution is approximately normally distributed, we can write it as
a) N(null value, se)
b) N(statistic, se)
c) N(parameter, se)
17
p-values
If the randomization distribution is normal:
To calculate a p-value, we just need to find the area in the appropriate tail(s) beyond the observed statistic of the distribution N(null value, SE)
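The tail-area calculation can be sketched as a small helper. The function name `normal_p_value`, its `tail` argument, and the numbers in the usage line are hypothetical; they are made up for illustration and are not data from the lecture.

```python
from statistics import NormalDist

def normal_p_value(observed, null_value, se, tail="two"):
    """Area beyond `observed` under N(null_value, se)."""
    dist = NormalDist(mu=null_value, sigma=se)
    lower = dist.cdf(observed)  # area at or below the observed statistic
    upper = 1 - lower           # area above the observed statistic
    if tail == "lower":
        return lower
    if tail == "upper":
        return upper
    if tail == "two":
        return 2 * min(lower, upper)  # double the smaller tail
    raise ValueError("tail must be 'lower', 'upper', or 'two'")

# Hypothetical numbers: null value 0, SE 2, observed statistic 3.
print(round(normal_p_value(3, 0, 2, tail="upper"), 3))
```

Which tail to use depends on the alternative hypothesis, so the `tail` argument has to be chosen to match the test being run.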
18
First Born Children
19
First Born Children
20
Hypothesis Testing
21
First Born Children
N(0, 37)
p-value = 0.207
www.lock5stat.com/statkey
22
Standardized Data
Often, we standardize the data to have mean 0 and standard deviation 1.
This is done with z-scores.
From x to z: z = (x - mean) / sd
From z to x: x = mean + z × sd
Places everything on a common scale.
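Standardizing and returning to the original scale can be sketched as two small functions. The names `to_z` and `to_x` and the example values (mean 100, sd 15) are hypothetical and chosen only for illustration.

```python
def to_z(x, mean, sd):
    """Standardize: how many standard deviations x lies from the mean."""
    return (x - mean) / sd

def to_x(z, mean, sd):
    """Undo the standardization: return to the original scale."""
    return mean + z * sd

# Hypothetical scale with mean 100 and standard deviation 15.
z = to_z(130, mean=100, sd=15)
x = to_x(z, mean=100, sd=15)
print(z, x)
```

The two functions are inverses of each other, which is why a result found on the standard scale can always be translated back to the original units.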
23
Standard Normal
The standard normal distribution is the normal distribution with mean 0 and standard deviation 1.
24
Standardized Data
Confidence Interval (bootstrap distribution): mean = sample statistic, sd = SE
Bootstrap Distribution: N(statistic, SE)
From z to x (CI): x = statistic + z × SE
25
P% Confidence Interval
1. Find z-scores (-z* and z*) that capture the middle P% of the standard normal
2. Return to original scale with statistic ± z* × SE
26
Confidence Interval using N(0,1)
If a statistic is normally distributed, we find a confidence interval for the parameter using
statistic ± z* × SE
where the area between -z* and +z* in the standard normal distribution is the desired level of confidence.
27
Confidence Intervals
Find z* for a 99% confidence interval.
z* = 2.576
www.lock5stat.com/statkey
28
z*
Why use the standard normal? z* is always the same, regardless of the data!
Common confidence levels:
90%: z* = 1.645
95%: z* = 1.96 (but 2 is close enough)
99%: z* = 2.576
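These z* values can be recomputed from the standard normal distribution. The sketch below uses the standard-library `statistics.NormalDist`; the helper name `z_star` is an assumption made for illustration.

```python
from statistics import NormalDist

def z_star(level):
    """z-score with `level` of the standard normal's area between -z* and +z*."""
    tail = (1 - level) / 2               # area left in each tail
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z* = {z_star(level):.3f}")
```

Because the standard normal never changes, these values only need to be computed once and can be reused for any data set.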
29
Sin Taxes
In March 2011, a random sample of 1000 US adults were asked “Do you favor or oppose ‘sin taxes’ on soda and junk food?”
320 adults responded in favor of sin taxes.
Give a 99% CI for the proportion of all US adults that favor these sin taxes.
From a bootstrap distribution, we find SE = 0.015
30
Sin Taxes
31
Sin Taxes
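The worked answer for these slides did not survive in the transcript. One way the 99% interval could be computed from the numbers given in the question (320 of 1000 in favor, SE = 0.015) is sketched below; the variable names are illustrative, and this is a reconstruction under the formula statistic ± z* × SE, not necessarily the exact working shown in the lecture.

```python
from statistics import NormalDist

p_hat = 320 / 1000  # sample proportion in favor of sin taxes
se = 0.015          # standard error from the bootstrap distribution (given)

level = 0.99
z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 2.576

margin = z_star * se
ci = (p_hat - margin, p_hat + margin)
print(f"99% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```

With these inputs the interval is roughly 0.32 ± 0.039.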
32
Standardized Data
Hypothesis test (randomization distribution): mean = null value, sd = SE
Randomization Distribution: N(null value, SE)
From x to z (test): z = (statistic - null value) / SE
33
p-value using N(0,1)
34
First Born Children
35
z-statistic
If z = -3, using α = 0.05 we would
(a) Reject the null
(b) Not reject the null
(c) Impossible to tell
(d) I have no idea
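The question can be checked numerically against the standard normal. The slide does not state the alternative hypothesis, so the sketch below assumes a two-sided test and also reports the lower-tail area for comparison; the variable names are illustrative.

```python
from statistics import NormalDist

z = -3
alpha = 0.05
std_normal = NormalDist()  # N(0, 1)

p_lower = std_normal.cdf(z)          # area below z (lower-tail alternative)
p_two = 2 * std_normal.cdf(-abs(z))  # both tails (assumed two-sided alternative)
reject = p_two < alpha

print(f"lower-tail p = {p_lower:.4f}, two-tailed p = {p_two:.4f}, reject H0: {reject}")
```

A statistic three standard errors from the null value is far out in the tail, so under either of these alternatives the p-value falls well below 0.05.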
36
z-statistic
Calculating the number of standard errors a statistic is from the null value allows us to assess extremity on a common scale.
37
Confidence Interval Formula
IF SAMPLE SIZES ARE LARGE…
sample statistic ± z* × SE
sample statistic: from original data
z*: from N(0,1)
SE: from bootstrap distribution
38
Formula for p-values
IF SAMPLE SIZES ARE LARGE…
z = (sample statistic - null value) / SE
sample statistic: from original data
null value: from H0
SE: from randomization distribution
Compare z to N(0,1) for p-value
39
Standard Error
Wouldn’t it be nice if we could compute the standard error without doing thousands of simulations?
We can!!! Or at least we’ll be able to next class…
40
To Do
Read Chapter 5
Do HW 5 (due Friday, 4/3)