Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan 10/23/12 Sections 6.1-6.3, 6.7-6.9 Single Proportion, p Distribution (6.1)

Presentation transcript:

Inference for Proportions: Normal Distribution
STAT 101, Dr. Kari Lock Morgan, 10/23/12
Sections 6.1-6.3, 6.7-6.9
Single Proportion, p: Distribution (6.1); Intervals and tests (6.2, 6.3)
Difference in proportions, p1 – p2: One proportion or two? (6.7); Distribution (6.7); Intervals and tests (6.8, 6.9)

Central Limit Theorem!
For a sufficiently large sample size, the distribution of sample statistics for a mean or a proportion is approximately normal.

Interval Using N(0,1)
IF SAMPLE SIZES ARE LARGE… A confidence interval can be calculated by
    sample statistic ± z* × SE
where z* is a N(0,1) percentile depending on the level of confidence.

Tests Using N(0,1)
IF SAMPLE SIZES ARE LARGE… A p-value is the area in the tail(s) of a N(0,1) beyond
    z = (sample statistic - null value) / SE

Standard Errors
Today, we’ll learn formulas for the standard errors.

SE of a Proportion
The standard error for a sample proportion can be calculated by
    SE = sqrt(p(1 - p) / n)
*Notice the sample size in the denominator… as the sample size increases, the standard error decreases.
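As a rough illustration of how the standard error of a sample proportion, sqrt(p(1 - p) / n), shrinks as n grows, here is a minimal Python sketch. The function name `se_proportion` and the sample sizes in the loop are illustrative choices, not part of the lecture.

```python
import math

def se_proportion(p: float, n: int) -> float:
    """Standard error of a sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical sample sizes, chosen only to show the trend with p fixed at 0.5.
for n in (8, 100, 500, 2430):
    print(n, round(se_proportion(0.5, n), 4))
```

Each fourfold increase in n halves the standard error, because n sits under the square root.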

Paul the Octopus
If he is truly guessing randomly, then p = 0.5, so the SE of his sample proportion correct out of 8 guesses is
    SE = sqrt(0.5 × (1 - 0.5) / 8) ≈ 0.177

Paul the Octopus
This is the same value we get from a randomization distribution…
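One way to check the match between the formula and a randomization distribution is to simulate it. The sketch below assumes Paul guesses each of 8 matches correctly with probability 0.5, repeats that 10,000 times, and compares the standard deviation of the simulated proportions with sqrt(0.5 × 0.5 / 8). The helper name, the number of repetitions, and the seed are all illustrative assumptions.

```python
import random
import statistics

def simulate_proportions(p, n, reps, seed=0):
    """Simulate `reps` sample proportions, each from n guesses with success chance p."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

props = simulate_proportions(p=0.5, n=8, reps=10_000)
sim_se = statistics.pstdev(props)      # spread of the randomization distribution
formula_se = (0.5 * (1 - 0.5) / 8) ** 0.5

print(round(sim_se, 3), round(formula_se, 3))
```

The two numbers should agree to about two decimal places; the simulated value moves slightly with the seed and the number of repetitions.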

Paul the Octopus
If Paul really does have psychic powers, and can guess the correct team every time, then p = 1, and
    SE = sqrt(1 × (1 - 1) / 8) = 0

CLT for a Proportion
If counts for each category are at least 10 (np ≥ 10 and n(1 – p) ≥ 10), then the sample proportion p̂ is approximately normal, with mean p and standard error sqrt(p(1 - p) / n).

Standard Error
One small problem… if we are doing inference for p, we don’t know p!
For confidence intervals, use your best guess for p, the sample proportion p̂:
    SE = sqrt(p̂(1 - p̂) / n)

Confidence Interval for a Single Proportion
    p̂ ± z* × sqrt(p̂(1 - p̂) / n)

Obama vs Romney
On 10/17/12, a random sample of 500 North Carolina likely voters was polled. 260 said they plan to vote for Mitt Romney. Give a 95% CI for the proportion of likely voters in North Carolina that support Mitt Romney.
Source: n_2012/election_2012_presidential_election/north_carolina/election_2012_n orth_carolina_president

Obama vs Romney
Counts are greater than 10 in each category (260 for Romney, 240 not).
For a 95% confidence interval, z* = 2.
    p̂ = 260/500 = 0.52
    0.52 ± 2 × sqrt(0.52 × 0.48 / 500) = 0.52 ± 0.045 = (0.475, 0.565)
We are 95% confident that between 47.5% and 56.5% of likely voters in North Carolina support Romney.
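The poll interval can be reproduced with a few lines of Python. This is a minimal sketch using the slide's numbers (260 of 500) and its rounded z* = 2; the function name `proportion_ci` is an illustrative choice, and the last digit of each endpoint depends on how intermediate values are rounded.

```python
import math

def proportion_ci(successes, n, z_star=2.0):
    """Normal-based interval for a single proportion: p_hat +/- z* * SE."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # SE uses p_hat as the best guess for p
    return p_hat - z_star * se, p_hat + z_star * se

low, high = proportion_ci(260, 500)
print(f"{low:.3f} to {high:.3f}")
```

Passing `z_star=1.96` instead of 2 narrows the interval only slightly.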

Other Levels of Confidence
Technically, for 95% confidence, z* = 1.96, but 2 is much easier to remember, and close enough.

z* on TI-83
[Figure: N(0,1) curve with the middle P% of the area between -z* and z*]
2nd → DISTR → 3: invNorm( → proportion below z*
(for a 95% CI, the proportion below z* is 0.975)
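For readers without a TI-83, the same lookup can be sketched in Python with the standard library's `statistics.NormalDist`, whose `inv_cdf` method plays the role of invNorm. The helper name `z_star` and the confidence levels in the loop are illustrative.

```python
from statistics import NormalDist

def z_star(confidence):
    """z* such that the middle `confidence` share of N(0,1) lies between -z* and z*."""
    proportion_below = 1 - (1 - confidence) / 2  # e.g. 0.975 for 95% confidence
    return NormalDist().inv_cdf(proportion_below)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_star(level), 3))
```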

Margin of Error
For a single proportion, what is the margin of error?
a) b) c)
CI = statistic ± margin of error

Margin of Error
You can choose your sample size in advance, depending on your desired margin of error!
Given this formula for margin of error, solve for n.
    ME = z* × sqrt(p(1 - p) / n)
    n = (z* / ME)^2 × p(1 - p)

Margin of Error
Suppose we want to estimate a proportion with a margin of error of 0.03 with 95% confidence. How large a sample size do we need?
(a) About 100
(b) About 500
(c) About 1000
(d) About 5000
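The sample-size question can be worked by solving ME = z* × sqrt(p(1 - p) / n) for n. The sketch below assumes p = 0.5 as the guess for the unknown proportion, which is the most conservative choice because it maximizes p(1 - p); that guess and the function name are assumptions of this example, not stated on the slide.

```python
import math

def sample_size(margin, z_star=2.0, p_guess=0.5):
    """Smallest n with z* * sqrt(p(1 - p) / n) <= margin, for a guessed p."""
    return math.ceil((z_star / margin) ** 2 * p_guess * (1 - p_guess))

print(sample_size(0.03))               # using z* = 2
print(sample_size(0.03, z_star=1.96))  # using z* = 1.96
```

Either choice of z* gives a sample size of roughly one thousand.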

Hypothesis Testing
For hypothesis testing, we want the distribution of the sample proportion assuming the null hypothesis is true.
What to use for p? The null value, p0:
    SE = sqrt(p0(1 - p0) / n)

Hypothesis Testing
    z = (p̂ - p0) / sqrt(p0(1 - p0) / n), where p0 is the null value
The p-value is the area in the tail(s) beyond z in a N(0,1).

Baseball Home Field Advantage
Of the 2430 Major League Baseball (MLB) games played in 2009, the home team won in 54.9% of the games. If we consider 2009 as a representative sample of all MLB games, is this evidence of a home field advantage in Major League Baseball?
(a) Yes
(b) No
(c) No idea
Answer: (a) Yes. The p-value is very small, so we have very strong evidence of a home field advantage.

Baseball Home Field Advantage
Counts are greater than 10 in each category.
Based on this data, there is strong evidence of a home field advantage in Major League Baseball.
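A minimal sketch of the test behind this conclusion, assuming the null hypothesis is p = 0.5 (no home field advantage) and an upper-tail alternative p > 0.5. The hypotheses and the function name are this example's assumptions; the sample proportion 0.549 and n = 2430 come from the slide.

```python
import math
from statistics import NormalDist

def one_proportion_z(p_hat, p0, n):
    """z statistic and upper-tail p-value for H0: p = p0 vs Ha: p > p0."""
    se = math.sqrt(p0 * (1 - p0) / n)  # SE is computed under the null value p0
    z = (p_hat - p0) / se
    p_value = 1 - NormalDist().cdf(z)  # area in the upper tail of N(0,1)
    return z, p_value

z, p_value = one_proportion_z(p_hat=0.549, p0=0.5, n=2430)
print(round(z, 2), p_value)
```

A z statistic near 4.8 leaves a tail area far below any common significance level, in line with the slide's "very small" p-value.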

p-value on TI-83
2nd → DISTR → 2: normalcdf( → lower bound, upper bound
Hint: if you want the area greater than 2, just enter 2, 100 (or some other large number as the upper bound).
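The calculator's area-between-two-bounds function can be mimicked in Python as a difference of two cumulative probabilities. This sketch defines its own helper named `normalcdf` for familiarity; it is not a built-in Python function.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under a normal curve between lower and upper (N(0,1) by default)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

# Area above 2, using a large upper bound as the hint suggests.
print(round(normalcdf(2, 100), 4))
```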

One Proportion or Two?
Two proportions: there are two separate categorical variables.
One proportion: there is only one categorical variable.

One Proportion or Two?
Of residents in the Triangle area on Saturday, was the proportion of people cheering for Duke or UNC greater? How much greater?
a) Inference for one proportion
b) Inference for two proportions
(Note: assume no one will be cheering for both)
Answer: a). This is one categorical variable: which team each person will be cheering for on Saturday night.

One Proportion or Two?
Who was more likely to be wearing a blue shirt on Saturday night, a UNC fan or a Duke fan?
a) Inference for one proportion
b) Inference for two proportions
Answer: b). This is two categorical variables: which team each person will be cheering for on Saturday night, and whether each person is wearing a blue shirt.

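If counts within each category (each cell of the two-way table) are at least 10, then the difference in sample proportions, p̂1 - p̂2, is approximately normal, with mean p1 - p2 and standard error
    SE = sqrt(p1(1 - p1)/n1 + p2(1 - p2)/n2)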

Metal Tags and Penguins
Are metal tags detrimental to penguins? A study looked at the 10 year survival rate of penguins tagged either with a metal tag or an electronic tag. 20% of the 167 metal tagged penguins survived, compared to 36% of the 189 electronic tagged penguins.
Give a 90% confidence interval for the difference in proportions.
Source: Saraux, et al. (2011). “Reliability of flipper-banded penguins as indicators of climate change,” Nature, 469.

Metal Tags and Penguins
We are 90% confident that the survival rate is between 0.09 and 0.24 lower for metal tagged penguins, as opposed to electronically tagged.
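A sketch of the 90% interval for the difference in survival proportions (metal minus electronic). The slide gives only percentages, so the survivor counts below (33 of 167 and 68 of 189) are assumptions inferred from 20% and 36%; the function name is also illustrative.

```python
import math
from statistics import NormalDist

def diff_proportions_ci(x1, n1, x2, n2, confidence=0.90):
    """Normal-based interval for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

# Assumed counts: 33/167 is about 20%, 68/189 is about 36%.
low, high = diff_proportions_ci(33, 167, 68, 189)
print(round(low, 3), round(high, 3))
```

Both endpoints are negative, meaning the interval lies entirely on the side of lower survival for metal tagged penguins.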

Hypothesis Testing
What should we use for p1 and p2 in the formula for SE for hypothesis testing?

Pooled Proportion
Overall sample proportion across both groups. It will be in between the two observed sample proportions.

Hypothesis Testing
    z = (p̂1 - p̂2) / sqrt(p̂(1 - p̂)/n1 + p̂(1 - p̂)/n2), where p̂ is the pooled proportion
The p-value is the area in the tail(s) beyond z in a N(0,1).

Metal Tags and Penguins
Are metal tags detrimental to penguins?
(a) Yes
(b) No
(c) Cannot tell from this data
20% of the 167 metal tagged penguins survived, compared to 36% of the 189 electronic tagged penguins.
Answer: (a) Yes. The p-value is very small.

Metal Tags and Penguins
Are metal tags detrimental to penguins?

Metal Tags and Penguins
This is very strong evidence that metal tags are detrimental to penguins.
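The penguin test can be sketched with a pooled proportion, taken here as total survivors divided by total penguins across both groups. As before, the counts 33 of 167 and 68 of 189 are assumptions inferred from the reported 20% and 36%, and the lower-tail alternative (metal tags reduce survival) is this example's reading of the question.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """z statistic and lower-tail p-value for H0: p1 = p2 vs Ha: p1 < p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # overall proportion across both groups
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, NormalDist().cdf(z)   # area in the lower tail of N(0,1)

# Assumed counts: metal 33/167, electronic 68/189.
z, p_value = two_proportion_z(33, 167, 68, 189)
print(round(z, 2), round(p_value, 5))
```

The pooled proportion, about 0.28, falls between the two sample proportions, as the pooled proportion slide says it should.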

Accuracy
The accuracy of intervals and p-values generated using simulation methods (bootstrapping and randomization) depends on the number of simulations (more simulations = more accurate).
The accuracy of intervals and p-values generated using formulas and the normal distribution depends on the sample size (larger sample size = more accurate).
If the distribution of the statistic is truly normal and you have generated many simulated randomizations, the p-values should be very close.
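To see the two approaches side by side, the sketch below builds a bootstrap percentile interval and a normal-formula interval for the North Carolina poll (260 of 500). The number of bootstrap samples, the seed, and the function names are illustrative assumptions; drawing n values that are 1 with probability p̂ is equivalent to resampling with replacement from the 260 ones and 240 zeros.

```python
import math
import random

def bootstrap_ci(successes, n, reps=2000, seed=1):
    """95% percentile interval from `reps` bootstrap sample proportions."""
    rng = random.Random(seed)
    p_hat = successes / n
    # Each bootstrap sample: n draws that are 1 with probability p_hat.
    stats = sorted(
        sum(rng.random() < p_hat for _ in range(n)) / n for _ in range(reps)
    )
    return stats[int(0.025 * reps)], stats[int(0.975 * reps) - 1]

def formula_ci(successes, n, z_star=1.96):
    """95% interval from the normal distribution and the SE formula."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z_star * se, p_hat + z_star * se

boot_low, boot_high = bootstrap_ci(260, 500)
norm_low, norm_high = formula_ci(260, 500)
print(boot_low, boot_high)
print(round(norm_low, 3), round(norm_high, 3))
```

With a sample of 500 and a couple of thousand bootstrap samples, the two intervals should differ only in the third decimal place; raising `reps` reduces the run-to-run wobble in the bootstrap endpoints.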

Summary
For a single proportion: SE = sqrt(p(1 - p) / n)
For a difference in proportions: SE = sqrt(p1(1 - p1)/n1 + p2(1 - p2)/n2)

To Do
Read Sections 6.1, 6.2, 6.3, 6.7, 6.8, 6.9
Do Homework 5 (due Tuesday, 10/30)