Normal Distribution Chapter 5 Normal distribution

Presentation transcript:

Normal Distribution STAT 101 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal distribution for confidence intervals Normal distribution for p-values Standard normal

Re-grade Requests 4e potential grading mistake: 0.025 is correct Requests for a re-grade must be submitted in writing by class on Wednesday, March 5th Partial credit will NOT be adjusted Valid re-grade requests: You got points off but believe your answer is correct Points were added incorrectly Warning: scores may go up or down

Bootstrap and Randomization Distributions What do you notice? Correlation: Malevolent uniforms Slope: Restaurant tips Mean: Body temperatures Diff means: Finger taps Proportion: Owners/dogs Mean: Atlanta commutes

Normal Distribution The symmetric, bell-shaped curve we have seen for almost all of our bootstrap and randomization distributions is called a normal distribution

Central Limit Theorem! For a sufficiently large sample size, the distribution of sample statistics for a mean or a proportion is normal www.lock5stat.com/StatKey

Distribution of p̂

CLT for a Mean [Figure: Population, Distribution of Sample Data, and Distribution of Sample Means, for n = 10, n = 30, n = 50]

Central Limit Theorem The central limit theorem holds for ANY original distribution, although “sufficiently large sample size” varies. The more skewed the original distribution is (the farther from normal), the larger the sample size has to be for the CLT to work. For small samples, it is more important that the data themselves are approximately normal.

Central Limit Theorem For distributions of a quantitative variable that are not very skewed and without large outliers, n ≥ 30 is usually sufficient to use the CLT. For distributions of a categorical variable, counts of at least 10 within each category are usually sufficient to use the CLT.
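
The effect of sample size on a skewed population can be checked with a small simulation. This is only a sketch: the Exponential(1) population, the 2000 repetitions, and the helper names `sample_means` and `skewness` are assumptions chosen for illustration, not part of the slides.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

def sample_means(n, reps=2000):
    """Means of `reps` random samples of size n from a skewed Exponential(1) population."""
    return [statistics.fmean([random.expovariate(1.0) for _ in range(n)])
            for _ in range(reps)]

def skewness(values):
    """Average cubed z-score; 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

results = {}
for n in (10, 30, 50):
    means = sample_means(n)
    # center, spread (the standard error), and remaining skew of the sample means
    results[n] = (statistics.fmean(means), statistics.stdev(means), skewness(means))
    print(n, [round(r, 3) for r in results[n]])
```

As n grows, the sample means stay centered on the population mean of 1, their spread shrinks, and the leftover skew moves toward 0, which is the bell shape the CLT describes.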

Accuracy The accuracy of intervals and p-values generated using simulation methods (bootstrapping and randomization) depends on the number of simulations (more simulations = more accurate). The accuracy of intervals and p-values generated using formulas and the normal distribution depends on the sample size (larger sample size = more accurate). If the distribution of the statistic is truly normal and you have generated many simulated randomizations, the p-values should be very close.

Normal Distribution The normal distribution is fully characterized by its mean and standard deviation

Bootstrap Distributions If a bootstrap distribution is approximately normally distributed, we can write it as (a) N(parameter, sd) (b) N(statistic, sd) (c) N(parameter, se) (d) N(statistic, se) sd = standard deviation of variable se = standard error = standard deviation of statistic

Hearing Loss In a random sample of 1771 Americans aged 12 to 19, 19.5% had some hearing loss (this is a dramatic increase from a decade ago!) What proportion of Americans aged 12 to 19 have some hearing loss? Give a 95% CI. Rabin, R. “Childhood: Hearing Loss Grows Among Teenagers,” www.nytimes.com, 8/23/10.

Hearing Loss (0.177, 0.214)

Hearing Loss N(0.195, 0.0095)
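
A bootstrap distribution like this one can be approximated in a few lines. The sketch below assumes the sample contained 345 cases of hearing loss (19.5% of 1771, rounded; the slides give only the percentage) and uses 1000 bootstrap samples; both choices are illustrative.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

n = 1771
successes = round(0.195 * n)  # 345: assumed count, reconstructed from the reported 19.5%
sample = [1] * successes + [0] * (n - successes)

boot_props = []
for _ in range(1000):
    resample = random.choices(sample, k=n)  # draw n cases with replacement
    boot_props.append(sum(resample) / n)    # bootstrap statistic: sample proportion

boot_mean = statistics.fmean(boot_props)  # center: close to the sample statistic
boot_se = statistics.stdev(boot_props)    # spread: the standard error
print(f"N({boot_mean:.3f}, {boot_se:.4f})")
```

The center lands near the sample statistic and the standard deviation of the bootstrap statistics lands near 0.0095, matching the N(statistic, SE) form.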

Confidence Intervals If the bootstrap distribution is normal, N(statistic, SE): to find a P% confidence interval, we just need to find the middle P% of the distribution N(statistic, SE)

Area under a Curve The area under the curve of a normal distribution is equal to the proportion of the distribution falling within that range. Knowing just the mean and standard deviation of a normal distribution allows you to calculate areas in the tails and percentiles. www.lock5stat.com/statkey

Hearing Loss www.lock5stat.com/statkey (0.176, 0.214)
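
The same interval can be reproduced without StatKey. This sketch uses `statistics.NormalDist` from the Python standard library as a stand-in for the StatKey normal-distribution tool; the middle 95% runs from the 2.5th to the 97.5th percentile.

```python
from statistics import NormalDist

# Bootstrap distribution from the slides: N(statistic, SE) = N(0.195, 0.0095)
boot = NormalDist(mu=0.195, sigma=0.0095)

lower = boot.inv_cdf(0.025)  # 2.5th percentile
upper = boot.inv_cdf(0.975)  # 97.5th percentile
print(round(lower, 3), round(upper, 3))  # 0.176 0.214
```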

Standardized Data Often, we standardize the data to have mean 0 and standard deviation 1. This is done with z-scores. From x to z: z = (x − mean) / sd From z to x: x = mean + z · sd Places everything on a common scale
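
The two conversions are inverses of each other, as a short sketch shows. The function names `to_z` and `from_z` are made up for illustration, and the example values reuse the hearing-loss distribution N(0.195, 0.0095).

```python
def to_z(x, mean, sd):
    """From x to z: how many standard deviations x lies from the mean."""
    return (x - mean) / sd

def from_z(z, mean, sd):
    """From z to x: return to the original scale."""
    return mean + z * sd

z = to_z(0.214, mean=0.195, sd=0.0095)    # about 2 standard deviations above the mean
x_back = from_z(z, mean=0.195, sd=0.0095)  # recovers 0.214 (up to rounding)
print(round(z, 3), round(x_back, 3))
```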

Standard Normal The standard normal distribution is the normal distribution with mean 0 and standard deviation 1

Standardized Data Confidence Interval (bootstrap distribution): mean = sample statistic, sd = SE From z to x (CI): x = statistic + z · SE, so the interval endpoints are statistic ± z* · SE

P% Confidence Interval 1. Find z-scores (–z* and z*) that capture the middle P% of the standard normal 2. Return to original scale with statistic ± z* · SE

Confidence Interval using N(0,1) If a statistic is normally distributed, we find a confidence interval for the parameter using statistic ± z* · SE, where the area between –z* and +z* in the standard normal distribution is the desired level of confidence.

Confidence Intervals Find z* for a 99% confidence interval. www.lock5stat.com/statkey z* = 2.576

z* Why use the standard normal? Common confidence levels: 95%: z* = 1.96 (but 2 is close enough) 90%: z* = 1.645 99%: z* = 2.576
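
These z* values come from percentiles of the standard normal: the middle P% leaves (100 − P)/2 % in each tail. A minimal sketch, with `z_star` as a made-up helper name:

```python
from statistics import NormalDist

def z_star(level):
    """z* such that the area between -z* and +z* under N(0,1) equals `level`."""
    return NormalDist().inv_cdf((1 + level) / 2)

critical = {level: round(z_star(level), 3) for level in (0.90, 0.95, 0.99)}
print(critical)  # {0.9: 1.645, 0.95: 1.96, 0.99: 2.576}
```

Because z* depends only on the confidence level, the same three numbers can be reused for any normally distributed statistic.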

Sin Taxes In March 2011, a random sample of 1000 US adults were asked “Do you favor or oppose ‘sin taxes’ on soda and junk food?” 320 adults responded in favor of sin taxes. Give a 99% CI for the proportion of all US adults that favor these sin taxes. From a bootstrap distribution, we find SE = 0.015

Sin Taxes

Sin Taxes
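
The sin-tax interval follows directly from statistic ± z* · SE. The sketch below uses only the numbers stated in the problem (320 of 1000 in favor, SE = 0.015 from the bootstrap distribution); the variable names are illustrative.

```python
from statistics import NormalDist

p_hat = 320 / 1000  # sample proportion in favor of sin taxes
se = 0.015          # standard error, taken from the bootstrap distribution
level = 0.99

z_star = NormalDist().inv_cdf((1 + level) / 2)  # about 2.576
margin = z_star * se
ci = (p_hat - margin, p_hat + margin)
print(round(ci[0], 3), round(ci[1], 3))  # 0.281 0.359
```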

Randomization Distributions If a randomization distribution is approximately normally distributed, we can write it as (a) N(null value, se) (b) N(statistic, se) (c) N(parameter, se)

p-values If the randomization distribution is normal: To calculate a p-value, we just need to find the area in the appropriate tail(s) beyond the observed statistic of the distribution

First Born Children Are first born children actually smarter? Explanatory variable: first born or not Response variable: combined SAT score Based on a sample of college students, we find x̄(first born) − x̄(not first born) = 30.26 From a randomization distribution, we find SE = 37

First Born Children x̄(first born) − x̄(not first born) = 30.26, SE = 37 What normal distribution should we use to find the p-value? (a) N(30.26, 37) (b) N(37, 30.26) (c) N(0, 37) (d) N(0, 30.26)

Hypothesis Testing

First Born Children N(0, 37) p-value = 0.207 www.lock5stat.com/statkey
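
The p-value is the area of N(0, 37) above the observed difference of 30.26. A sketch using `statistics.NormalDist` in place of StatKey, taking the upper tail because the question asks whether first borns score higher:

```python
from statistics import NormalDist

null_dist = NormalDist(mu=0, sigma=37)  # randomization distribution: N(null value, SE)
observed = 30.26                        # observed difference in mean SAT scores

p_value = 1 - null_dist.cdf(observed)   # area in the upper tail beyond the statistic
print(round(p_value, 3))  # 0.207
```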

Standardized Data Hypothesis test (randomization distribution): mean = null value, sd = SE From x to z (test): z = (statistic − null value) / SE

p-value using N(0,1) If a statistic is normally distributed under H0, the p-value is the probability a standard normal is beyond z = (sample statistic − null parameter) / SE

First Born Children x̄(first born) − x̄(not first born) = 30.26, SE = 37 Find the standardized test statistic Compute the p-value

First Born Children
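
Standardizing first and then using N(0,1) gives the same answer as working with N(0, 37) directly. In this sketch the helper `z_test` and its `tail` argument are hypothetical names, not anything defined in the course materials.

```python
from statistics import NormalDist

def z_test(statistic, null_value, se, tail="upper"):
    """Return (z, p-value), comparing z = (statistic - null value) / SE to N(0,1)."""
    z = (statistic - null_value) / se
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "upper":
        p = 1 - std_normal.cdf(z)
    elif tail == "lower":
        p = std_normal.cdf(z)
    elif tail == "two":
        p = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("tail must be 'upper', 'lower', or 'two'")
    return z, p

z, p_value = z_test(30.26, null_value=0, se=37, tail="upper")
print(round(z, 3), round(p_value, 3))  # 0.818 0.207
```

The observed difference is less than one standard error above the null value, which is why the p-value is so large.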

z-statistic If z = –3, using α = 0.05 we would (a) Reject the null (b) Not reject the null (c) Impossible to tell (d) I have no idea

z-statistic Calculating the number of standard errors a statistic is from the null value allows us to assess extremity on a common scale
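
To see how extreme a z-statistic of –3 is on the common scale, the tail areas of N(0,1) can be computed for each form of alternative hypothesis. This is a sketch; the dictionary keys are illustrative labels.

```python
from statistics import NormalDist

z = -3
alpha = 0.05
std_normal = NormalDist()

p_values = {
    "lower-tailed": std_normal.cdf(z),               # area below z
    "upper-tailed": 1 - std_normal.cdf(z),           # area above z
    "two-tailed": 2 * std_normal.cdf(-abs(z)),       # area beyond |z| in both tails
}
for name, p in p_values.items():
    print(f"{name}: p = {p:.4f}, below alpha: {p < alpha}")
```

A statistic three standard errors from the null value leaves well under 1% of N(0,1) beyond it in the lower tail or in both tails combined, so it falls below α = 0.05 for a lower-tailed or two-tailed alternative; only an upper-tailed alternative in the opposite direction gives a large p-value.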

Confidence Interval Formula IF SAMPLE SIZES ARE LARGE… sample statistic ± z* · SE, where the sample statistic comes from the original data, z* from N(0,1), and SE from the bootstrap distribution

Formula for p-values IF SAMPLE SIZES ARE LARGE… z = (sample statistic − null value) / SE, where the sample statistic comes from the original data, the null value from H0, and SE from the randomization distribution. Compare z to N(0,1) for p-value

Standard Error Wouldn’t it be nice if we could compute the standard error without doing thousands of simulations? We can!!! Or at least we’ll be able to next class…

t-distribution For quantitative data, we use a t-distribution instead of the normal distribution. The t-distribution is very similar to the standard normal, but with slightly fatter tails (to reflect the uncertainty in the sample standard deviations).

Degrees of Freedom The t-distribution is characterized by its degrees of freedom (df). Degrees of freedom are based on sample size. Single mean: df = n – 1 Difference in means: df = min(n1, n2) – 1 Correlation: df = n – 2 The higher the degrees of freedom, the closer the t-distribution is to the standard normal.
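
The "fatter tails" can be made concrete by comparing the area beyond 2 under several t-distributions with the same area under N(0,1). The Python standard library has no t-distribution, so this sketch integrates the t density numerically with Simpson's rule; `t_pdf` and `t_upper_tail` are hypothetical helper names, and the cutoff 2 and the df values are arbitrary illustrative choices.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t): 0.5 minus (or plus) the area between 0 and |t|, via Simpson's rule."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 - area if t >= 0 else 0.5 + area

normal_tail = 1 - NormalDist().cdf(2)
tails = {df: t_upper_tail(2, df) for df in (5, 30, 1000)}
for df, tail in tails.items():
    print(f"df = {df}: {tail:.4f}")
print(f"N(0,1):  {normal_tail:.4f}")
```

The tail area shrinks toward the normal value as df increases, which is the sense in which a t-distribution with many degrees of freedom is close to the standard normal.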

t-distribution

Aside: William Sealy Gosset

The Pygmalion Effect Teachers were told that certain children (chosen randomly) were expected to be intellectual “growth spurters,” based on the Harvard Test of Inflected Acquisition (a test that didn’t actually exist). These children were selected randomly. The response variable is change in IQ over the course of one year. Source: Rosenthal, R. and Jacobsen, L. (1968). “Pygmalion in the Classroom: Teacher Expectation and Pupils’ Intellectual Development.” Holt, Rinehart and Winston, Inc.

The Pygmalion Effect
                      n     x̄       s
Control Students      255   8.42    12.0
“Growth Spurters”     65    12.22   13.3
Can this provide evidence that merely expecting a child to do well actually causes the child to do better? If so, how much better? SE = 1.8 (*s1 and s2 were not given, so I set them to give the correct p-value)

Pygmalion Effect

Pygmalion Effect From the paper: “The difference in gains could be ascribed to chance about 2 in 100 times”

Pygmalion Effect
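
The Pygmalion test can be sketched end to end from the summary table. Two assumptions are worth flagging: the standard error uses the usual formula sqrt(s1²/n1 + s2²/n2), which the slides defer to the next class but which reproduces the stated SE = 1.8, and the t tail area is obtained by numerical integration because the Python standard library has no t-distribution. The helper names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_coef) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t), integrating the density from 0 to |t| with Simpson's rule."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 - area if t >= 0 else 0.5 + area

# Summary statistics from the slide (the standard deviations are the instructor's stand-ins)
n_control, mean_control, s_control = 255, 8.42, 12.0
n_spurt, mean_spurt, s_spurt = 65, 12.22, 13.3

se = math.sqrt(s_control**2 / n_control + s_spurt**2 / n_spurt)  # assumed SE formula
t_stat = (mean_spurt - mean_control - 0) / se  # null value: no difference in mean IQ gain
df = min(n_control, n_spurt) - 1               # difference in means: min(n1, n2) - 1
p_value = t_upper_tail(t_stat, df)             # upper tail: "growth spurters" gain more

print(f"SE = {se:.2f}, t = {t_stat:.2f}, df = {df}, p-value = {p_value:.3f}")
```

The resulting p-value is close to 0.02, in line with the paper's statement that the difference could be ascribed to chance about 2 in 100 times.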

To Do Do Project 1 (due 3/7) Read Chapter 5