Statistics: Unlocking the Power of Data (Lock5)
Normal Distribution
STAT 101, Dr. Kari Lock Morgan, 10/18/12
Chapter 5: Normal distribution, Central limit theorem

Chapter 5 outline: Normal distribution; Central limit theorem; Normal distribution for confidence intervals; Normal distribution for p-values; Standard normal

Project 1: Due Tuesday. 5 pages, double spaced, including figures. Hypotheses should not change based on the data. This is a research paper: there should be text and complete sentences.

Bootstrap and Randomization Distributions: Slope (restaurant tips), correlation (malevolent uniforms), mean (body temperatures), difference in means (finger taps), mean (Atlanta commutes), proportion (owners/dogs). What do you notice? All are bell-shaped distributions!

Normal Distribution: The symmetric, bell-shaped curve we have seen for almost all of our bootstrap and randomization distributions is called a normal distribution.

Central Limit Theorem! For a sufficiently large sample size, the distribution of sample statistics for a mean or a proportion is approximately normal.

Central Limit Theorem: The central limit theorem holds for ANY original distribution (with a finite standard deviation), although what counts as a "sufficiently large sample size" varies. The more skewed the original distribution is (the farther from normal), the larger the sample size has to be for the CLT to work.
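A quick simulation can make the role of skewness concrete. The sketch below is an illustration, not part of the original course materials: it assumes an exponential population (strongly right-skewed), draws many samples of size n, and measures how skewed the resulting sample means are. The helper names `sample_means` and `skewness` are hypothetical.

```python
import random
import statistics

def sample_means(n, reps, rng):
    """Means of `reps` independent samples of size n from an Exponential(rate=1) population."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

def skewness(values):
    """Average cubed z-score: 0 for a symmetric distribution, positive for right skew."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

rng = random.Random(101)  # fixed seed so the run is reproducible
skew_small = skewness(sample_means(2, 5000, rng))
skew_large = skewness(sample_means(50, 5000, rng))
print(f"n = 2:  skewness of sample means = {skew_small:.2f}")
print(f"n = 50: skewness of sample means = {skew_large:.2f}")
```

For an exponential population the skewness of a mean of n observations is 2/√n, so the n = 50 distribution should come out much closer to symmetric (about 0.28) than the n = 2 distribution (about 1.4).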

Central Limit Theorem: For distributions of a quantitative variable that are not very skewed and have no large outliers, n ≥ 30 is usually sufficient to use the CLT. For distributions of a categorical variable, counts of at least 10 within each category are usually sufficient to use the CLT.
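The two rules of thumb can be written as simple checks. This is a minimal sketch with hypothetical function names; the slide gives no cutoff for strongly skewed data or data with large outliers, so the sketch simply declines to approve that case rather than inventing a threshold.

```python
def clt_ok_for_mean(n, strongly_skewed_or_outliers=False):
    """Rule of thumb: n >= 30 when the data are not very skewed and have no large outliers."""
    if strongly_skewed_or_outliers:
        # No cutoff is given for this case; flag it for a closer look instead.
        return False
    return n >= 30

def clt_ok_for_proportion(category_counts):
    """Rule of thumb: at least 10 observations in every category."""
    return all(count >= 10 for count in category_counts)

print(clt_ok_for_mean(45))             # True
print(clt_ok_for_proportion([12, 3]))  # False: only 3 in the second category
```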

Normal Distribution: The normal distribution is fully characterized by its mean and standard deviation.

Normal Distribution

Bootstrap Distributions: If a bootstrap distribution is approximately normally distributed, we can write it as (a) N(parameter, sd) (b) N(statistic, sd) (c) N(parameter, se) (d) N(statistic, se), where sd = standard deviation of the variable and se = standard error = standard deviation of the statistic. Answer: (d). A bootstrap distribution is centered at the sample statistic, and its standard deviation is the standard error.

Confidence Intervals: If the bootstrap distribution is normal, then to find a P% confidence interval we just need to find the middle P% of the distribution N(statistic, SE).
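Here is a sketch of that recipe, under stated assumptions: the SE is estimated as the standard deviation of bootstrap statistics, and the interval is the middle P% of N(statistic, SE). The function names are hypothetical, and the data are simulated placeholders, not any of the course examples.

```python
import random
import statistics
from statistics import NormalDist

def bootstrap_se(data, stat=statistics.fmean, reps=2000, seed=0):
    """SE = standard deviation of the statistic across bootstrap resamples."""
    rng = random.Random(seed)
    n = len(data)
    # Each bootstrap sample draws n values from the original data WITH replacement.
    boot_stats = [stat(rng.choices(data, k=n)) for _ in range(reps)]
    return statistics.stdev(boot_stats)

def middle_of_normal(center, se, confidence):
    """Endpoints that cut off the middle `confidence` fraction of N(center, se)."""
    dist = NormalDist(mu=center, sigma=se)
    tail = (1 - confidence) / 2
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)

# Simulated placeholder data (NOT from the course examples).
rng = random.Random(42)
sample = [rng.gauss(50, 10) for _ in range(40)]

statistic = statistics.fmean(sample)
se = bootstrap_se(sample)
low, high = middle_of_normal(statistic, se, 0.95)
print(f"statistic = {statistic:.2f}, SE = {se:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

Because N(statistic, SE) is symmetric, the interval is centered exactly at the statistic; only the SE comes from the simulation.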

Best Picture: What proportion of visitors to thought The Artist should win best picture?

Best Picture

Area under a Curve: The area under the curve of a normal distribution over a range is equal to the proportion of the distribution falling within that range. Knowing just the mean and standard deviation of a normal distribution allows you to calculate tail areas and percentiles.
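Given only a mean and standard deviation, areas and percentiles follow from the normal CDF and its inverse. This sketch uses Python's standard-library `statistics.NormalDist`; the wrapper function names are hypothetical.

```python
from statistics import NormalDist

def area_between(mean, sd, low, high):
    """Proportion of N(mean, sd) that falls between low and high."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(high) - dist.cdf(low)

def upper_tail_area(mean, sd, value):
    """Proportion of N(mean, sd) above value."""
    return 1 - NormalDist(mu=mean, sigma=sd).cdf(value)

def percentile(mean, sd, p):
    """Value below which a fraction p of N(mean, sd) falls."""
    return NormalDist(mu=mean, sigma=sd).inv_cdf(p)

print(f"{area_between(0, 1, -1, 1):.4f}")  # 0.6827
print(f"{upper_tail_area(0, 1, 2):.4f}")   # 0.0228
print(f"{percentile(0, 1, 0.975):.3f}")    # 1.960
```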

Best Picture

Best Picture

Confidence Intervals: For a normal sampling distribution, we can also use the formula statistic ± 2 × SE to give a 95% confidence interval.

Confidence Intervals: For normal bootstrap distributions, the formula statistic ± 2 × SE gives a 95% confidence interval. How would you use the N(0,1) normal distribution to find the appropriate multiplier for other levels of confidence?

Confidence Intervals: For a P% confidence interval, use statistic ± z* × SE, where P% of a N(0,1) distribution is between −z* and z*.

Confidence Intervals: [Figure: N(0,1) curve with the middle P% lying between −z* and z*]

Confidence Intervals: Find z* for a 99% confidence interval. z* ≈ 2.576
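Since P% of N(0,1) must sit between −z* and z*, each tail holds (100 − P)/2 percent, so z* is the (0.5 + P/2) quantile of the standard normal. A minimal sketch using the standard library; `z_star` is a hypothetical name.

```python
from statistics import NormalDist

def z_star(confidence):
    """Multiplier z* such that `confidence` of N(0,1) lies between -z* and z*."""
    # Each tail holds (1 - confidence) / 2, so z* is the (0.5 + confidence / 2) quantile.
    return NormalDist().inv_cdf(0.5 + confidence / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z* = {z_star(level):.3f}")
# 90% confidence: z* = 1.645
# 95% confidence: z* = 1.960
# 99% confidence: z* = 2.576
```

The 95% multiplier of 1.96 is why "statistic ± 2 × SE" works as a rounded 95% rule.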

News Sources: “A new national survey shows that the majority (64%) of American adults use at least three different types of media every week to get news and information about their local community.” The standard error for this statistic is 1%. Find a 99% confidence interval for the true proportion. Source:

News Sources
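Applying statistic ± z* × SE to the News Sources question, with statistic = 0.64 and SE = 0.01 as stated on the slide, gives the 99% interval. The function name `ci_from_formula` is hypothetical; z* is taken from N(0,1).

```python
from statistics import NormalDist

def ci_from_formula(statistic, se, confidence):
    """statistic ± z* × SE, with z* from the standard normal N(0,1)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return statistic - z * se, statistic + z * se

lo, hi = ci_from_formula(0.64, 0.01, 0.99)
print(f"99% CI: ({lo:.3f}, {hi:.3f})")  # 99% CI: (0.614, 0.666)
```

So we are 99% confident that between about 61.4% and 66.6% of American adults use at least three types of media each week for local news.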

Confidence Interval Formula: statistic ± z* × SE, where the statistic comes from the original data, the SE from the bootstrap distribution, and z* from N(0,1).

First Born Children: Are first-born children actually smarter? Based on data from last semester’s class survey, we’ll test whether first-born children score significantly higher on the SAT. From a randomization distribution, we find SE = 37.

First Born Children: What normal distribution should we use to find the p-value? (a) N(30.26, 37) (b) N(37, 30.26) (c) N(0, 37) (d) N(0, 30.26). Answer: (c). Because this is a hypothesis test, we want to see what would happen if the null were true, so the distribution should be centered at the null value. The variability is equal to the standard error.

Hypothesis Testing

p-values: If the randomization distribution is normal, then to calculate a p-value we just need to find the area in the appropriate tail(s) of the distribution N(null value, SE) beyond the observed statistic.
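For the First Born example, the test is upper-tailed (first-born children score higher), so the p-value is the area of N(0, 37) above the observed statistic. This sketch assumes the observed difference in mean SAT scores is 30.26, as the answer choices suggest; the function name is hypothetical.

```python
from statistics import NormalDist

def upper_tail_p_value(observed, null_value, se):
    """Area of N(null_value, se) above the observed statistic."""
    return 1 - NormalDist(mu=null_value, sigma=se).cdf(observed)

# Assumption: observed difference in mean SAT scores = 30.26 (read off the answer choices).
p = upper_tail_p_value(observed=30.26, null_value=0, se=37)
print(f"p-value = {p:.3f}")  # p-value = 0.207
```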

First Born Children: Using N(0, 37), p-value = 0.207

First Born Children

Standard Normal: Sometimes it is easier to use just one normal distribution to do inference. The standard normal distribution is the normal distribution with mean 0 and standard deviation 1.

Standardized Test Statistic: The standardized test statistic is the number of standard errors a statistic is from the null value. The standardized test statistic (also called a z-statistic) is compared to N(0,1).

p-value: (1) Find the standardized test statistic: z = (statistic − null value) / SE. (2) The p-value is the area in the tail(s) beyond z for a standard normal distribution.
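The two steps translate directly into code. In this sketch the names `z_statistic` and `p_value`, and the `alternative` labels, are hypothetical; the First Born numbers assume an observed difference of 30.26 with SE = 37 and an upper-tail alternative.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def z_statistic(statistic, null_value, se):
    """Step 1: number of standard errors the statistic is from the null value."""
    return (statistic - null_value) / se

def p_value(z, alternative="two-sided"):
    """Step 2: area in the tail(s) of N(0,1) beyond z."""
    if alternative == "greater":
        return 1 - STANDARD_NORMAL.cdf(z)
    if alternative == "less":
        return STANDARD_NORMAL.cdf(z)
    if alternative == "two-sided":
        return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

z = z_statistic(30.26, 0, 37)
print(f"z = {z:.2f}, p-value = {p_value(z, 'greater'):.3f}")  # z = 0.82, p-value = 0.207
```

Standardizing first and then using N(0,1) gives the same p-value as working with N(0, 37) directly, because the tail area beyond the statistic does not change when the whole distribution is rescaled.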

First Born Children: (1) Find the standardized test statistic: z = (30.26 − 0) / 37 ≈ 0.82.

First Born Children: (2) Find the area in the tail(s) beyond z for a standard normal distribution: p-value = 0.207.

z-statistic: If z = −3, using α = 0.05 we would (a) Reject the null (b) Not reject the null (c) Impossible to tell (d) I have no idea. Answer: (a). About 95% of z-statistics are between −2 and +2, so anything beyond those values will be in the most extreme 5%, or equivalently will give a p-value less than 0.05.
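The "about 95% within ±2" claim and the z = −3 decision can be checked numerically. This sketch assumes a two-sided alternative, matching the slide's reasoning about the most extreme 5% split across both tails.

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0,1)
within_2 = std_normal.cdf(2) - std_normal.cdf(-2)
p_two_sided = 2 * std_normal.cdf(-3)  # both tails beyond |z| = 3
alpha = 0.05

print(f"P(-2 < Z < 2) = {within_2:.4f}")                  # P(-2 < Z < 2) = 0.9545
print(f"two-sided p-value for z = -3: {p_two_sided:.4f}")  # two-sided p-value for z = -3: 0.0027
print("reject H0" if p_two_sided < alpha else "do not reject H0")  # reject H0
```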

z-statistic: Calculating the number of standard errors a statistic is from the null value allows us to assess extremity on a common scale.

Formula for p-values: z = (statistic − null value) / SE, where the statistic comes from the original data, the null value from H0, and the SE from the randomization distribution. Compare z to N(0,1) for the p-value.

Tests Using N(0,1): IF SAMPLE SIZES ARE LARGE, a p-value is the area in the tail(s) of N(0,1) beyond z = (statistic − null value) / SE.

Standard Error: Wouldn’t it be nice if we could compute the standard error without doing thousands of simulations? We can!!! Or rather, we’ll be able to next week!

To Do: Do Project 1 (due 10/23). Read Chapter 5.