Statistics: Unlocking the Power of Data Lock 5 Inference for Proportions STAT 250 Dr. Kari Lock Morgan Chapter 6.1, 6.2, 6.3, 6.7, 6.8, 6.9 Formulas for.

Presentation transcript:

Statistics: Unlocking the Power of Data Lock 5
Inference for Proportions
STAT 250, Dr. Kari Lock Morgan
Chapter 6.1, 6.2, 6.3, 6.7, 6.8, 6.9
Formulas for standard errors
Normal-based inference

Central Limit Theorem
For a sufficiently large sample size, the distribution of sample statistics for a mean or a proportion is approximately normal.
For proportions, “sufficiently large” can be taken as at least 10 in each category.
The larger the sample size, the closer to normal the distribution of the statistic will be.
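The "at least 10 in each category" rule of thumb can be written as a quick check. This is a minimal sketch; the function name and the example counts are hypothetical and not from the slides.

```python
def large_enough_for_normal(successes: int, n: int, minimum: int = 10) -> bool:
    """Return True when both categories contain at least `minimum` observations."""
    failures = n - successes
    return successes >= minimum and failures >= minimum


# Hypothetical counts, chosen only to show both outcomes of the check.
print(large_enough_for_normal(40, 200))  # 40 and 160 in the two categories
print(large_enough_for_normal(3, 50))    # only 3 in one category
```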

Accuracy
The accuracy of simulation methods depends on the number of simulations (more simulations = more accurate).
The accuracy of the normal approximation depends on the sample size (larger sample size = more accurate).
If the distribution of the statistic is normal and you have generated many simulated statistics, the two approaches should give very close answers.

Confidence Interval Formula
sample statistic ± z* × SE (if sample sizes are large)
sample statistic: from original data
SE: from bootstrap distribution
z*: from N(0,1)

Formula for p-values
z = (sample statistic – null value) / SE (if sample sizes are large)
sample statistic: from original data
null value: from H0
SE: from randomization distribution
Compare z to N(0,1) for the p-value.

Standard Error
Wouldn’t it be nice if we could compute the standard error without doing thousands of simulations?
We can!

Standard Error Formulas
Parameter | Distribution | Standard Error
Proportion | Normal | $\sqrt{p(1-p)/n}$
Difference in Proportions | Normal | $\sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}$
Mean | t, df = n – 1 | $\sigma/\sqrt{n}$
Difference in Means | t, df = min(n1, n2) – 1 | $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$

SE Formula Observations
n is always in the denominator (a larger sample size gives a smaller standard error).
The standard error is related to the square root of 1/n.
Standard error formulas use population parameters… (uh oh!)
For intervals, plug in the sample statistic(s) as your best guess at the parameter(s).
For testing, plug in the null value for the parameter(s), because you want the distribution assuming H0 is true (if the parameters have anything to do with H0).

Hitchhiker Snails
A type of small snail (Tornatellides boeningi) is very widespread in Japan, and genetically very similar colonies of the snails have been found very far apart.
How could the snails travel such long distances?
Biologist Shinichiro Wada fed 174 live snails to birds, and found that 26 were excreted live out the other end. (The snails are apparently able to seal their shells shut to keep the digestive fluids from getting in.)
Yong, E. “The Scatological Hitchhiker Snail,” Discover, October 2011, 13.

Question #1 of the Day
What proportion of “hitchhiker” snails survive when ingested by a bird?

Interval for Proportion
1) Calculate sample statistic
2) Find z* for desired level of confidence
3) Calculate standard error
4) Use statistic ± z* × SE
5) Interpret in context

Sample Statistic
26 of 174 snails survived, so the sample proportion is 26/174 ≈ 0.149.

z* for 90%
z* = 1.645

Standard Error
Parameter | Distribution | Standard Error
Proportion | Normal | $\sqrt{p(1-p)/n}$
Difference in Proportions | Normal | $\sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}$
Mean | t, df = n – 1 | $\sigma/\sqrt{n}$
Difference in Means | t, df = min(n1, n2) – 1 | $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$

Bootstrap Standard Error
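To see how close the formula and the bootstrap can be, the sketch below resamples the snail data (26 survivors out of 174) and compares the bootstrap standard error with the formula for a single proportion, with the sample proportion plugged in for p. The seed and the number of resamples (5000) are arbitrary choices made here, not values from the slides.

```python
import math
import random
import statistics

random.seed(250)  # arbitrary seed so the run is repeatable

n, survived = 174, 26
sample = [1] * survived + [0] * (n - survived)  # 1 = survived, 0 = did not
p_hat = survived / n

# Bootstrap: resample n snails with replacement and record each proportion.
boot_props = []
for _ in range(5000):
    resample = random.choices(sample, k=n)
    boot_props.append(sum(resample) / n)

boot_se = statistics.stdev(boot_props)

# Formula: sqrt(p(1 - p) / n), plugging in the sample proportion for p.
formula_se = math.sqrt(p_hat * (1 - p_hat) / n)

print(f"bootstrap SE: {boot_se:.4f}")
print(f"formula SE:   {formula_se:.4f}")
```

Both values should land near 0.027; the bootstrap value changes slightly from run to run unless the seed is fixed.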

Interval for Proportion
statistic ± z* × SE
0.149 ± 1.645 × 0.027 = (0.105, 0.193)
We are 90% confident that between 0.105 and 0.193 of hitchhiker snails survive being ingested by a bird.
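The five steps for the snail interval can be sketched in a few lines. The variable names are illustrative, and `statistics.NormalDist` from the Python standard library stands in for looking up z*. Because this code does not round the intermediate values, the upper endpoint may come out near 0.194 rather than the slide's 0.193.

```python
import math
from statistics import NormalDist

survived, n = 26, 174
confidence = 0.90

# 1) Sample statistic
p_hat = survived / n

# 2) z* for the desired confidence level (about 1.645 for 90%)
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# 3) Standard error, plugging in the sample proportion for p
se = math.sqrt(p_hat * (1 - p_hat) / n)

# 4) statistic ± z* × SE
lower = p_hat - z_star * se
upper = p_hat + z_star * se

print(f"p_hat = {p_hat:.3f}, z* = {z_star:.3f}, SE = {se:.3f}")
print(f"90% interval: ({lower:.3f}, {upper:.3f})")
```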

Bootstrap Interval

Different Birds
The snails were fed to two different birds.
14.3% of the 119 snails fed to Japanese white-eyes survived.
16.4% of the 55 snails fed to brown-eared bulbuls survived.
Wada, S., Kawakami, K., & Chiba, S. “Snails can survive passage through a bird’s digestive system,” Journal of Biogeography, June 21, 2011.

Your Turn!
Give a 95% confidence interval for the difference in proportions, white-eye – bulbul.
Japanese white-eye: 14.3% out of 119
Brown-eared bulbul: 16.4% out of 55
a) (-0.10, 0.06)
b) (-0.12, 0.08)
c) (-0.14, 0.10)
d) (-0.17, 0.13)

Your Turn
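One way to check the white-eye – bulbul interval is to apply statistic ± z* × SE with the standard error for a difference in proportions, plugging in the two sample proportions. This is a sketch with illustrative variable names; it uses the percentages exactly as stated on the slide.

```python
import math
from statistics import NormalDist

p1_hat, n1 = 0.143, 119  # Japanese white-eye
p2_hat, n2 = 0.164, 55   # brown-eared bulbul
confidence = 0.95

diff = p1_hat - p2_hat
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96

# SE for a difference in proportions, using sample proportions for p1 and p2
se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)

lower = diff - z_star * se
upper = diff + z_star * se
print(f"difference = {diff:.3f}, SE = {se:.3f}")
print(f"95% interval: ({lower:.2f}, {upper:.2f})")
```

The result is roughly (-0.14, 0.10). The interval contains 0, so these data do not show a clear difference in survival between the two birds.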

Question #2 of the Day
Does hormone replacement therapy cause increased risk of breast cancer?

Hormone Replacement Therapy
Until 2002, hormone replacement therapy (HRT), estrogen and/or progesterone, was commonly prescribed to post-menopausal women.
This changed in 2002, when the results of a large clinical trial were published.
8506 women were randomized to take HRT, 8102 were randomized to placebo.
166 HRT and 124 placebo women developed invasive breast cancer.

Hypothesis Test
1) State hypotheses
2) Calculate sample statistic
3) Calculate SE
4) Calculate z = (statistic – null)/SE
5) Compare z to standard normal distribution to find p-value
6) Make a conclusion

Hypotheses
Does hormone replacement therapy cause increased risk of invasive breast cancer?
p1 = proportion of women taking HRT who get invasive breast cancer
p2 = proportion of women not taking HRT who get invasive breast cancer
a) H0: p1 = p2, Ha: p1 ≠ p2
b) H0: p1 = p2, Ha: p1 > p2
c) H0: p1 = p2, Ha: p1 < p2

Calculate z-statistic
Null value = 0
What to use for p1 and p2 in the standard error?
We have to assume the null is true…

Testing
When testing, use the null value for p, rather than the sample statistic.
So, if testing whether the proportion of people not taking HRT who develop invasive breast cancer is less than 0.1, the standard error would be $\sqrt{0.1(1-0.1)/8102}$.

Null Values
Testing a difference in proportions: H0: p1 = p2
Use the overall sample proportion from both groups (called the pooled proportion) as an estimate for both p1 and p2.
Note that this is in between the sample proportions for each group.

Standard Error

Standard Error

z-statistic

p-value

p-value
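The test steps for the HRT trial can be put together as a short sketch, using the pooled proportion in the standard error because the null hypothesis says p1 = p2. The function name is hypothetical, and the alternative is taken to be one-sided (Ha: p1 > p2), matching the question of whether HRT increases risk.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Test H0: p1 = p2 against Ha: p1 > p2 using the pooled proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # overall proportion across both groups
    se = math.sqrt(pooled * (1 - pooled) / n1 + pooled * (1 - pooled) / n2)
    z = ((p1_hat - p2_hat) - 0) / se  # null value for the difference is 0
    p_value = 1 - NormalDist().cdf(z)  # upper tail of N(0,1)
    return pooled, se, z, p_value


# Invasive breast cancer: 166 of 8506 on HRT, 124 of 8102 on placebo
pooled, se, z, p_value = two_proportion_z_test(166, 8506, 124, 8102)
print(f"pooled = {pooled:.4f}, SE = {se:.5f}, z = {z:.2f}, p-value = {p_value:.3f}")
```

With these counts, z comes out near 2.07 and the one-sided p-value near 0.02.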

Conclusion
If there were no difference between HRT and placebo regarding invasive breast cancer, we would only see results this extreme about 2 out of 100 times.
We have evidence that HRT increases risk of invasive breast cancer.
(The trial was terminated because of this, and hormone replacement therapy is no longer routinely recommended.)

Your Turn!
Same trial, different variable of interest.
8506 women were randomized to take HRT, 8102 were randomized to placebo.
502 HRT and 458 placebo women developed any kind of cancer.
Does hormone replacement therapy cause increased risk of cancer in general?
a) Yes
b) No

Your Turn

Margin of Error
For a single proportion, what is the margin of error?
a) b) c)

Margin of Error
You can choose your sample size in advance, depending on your desired margin of error!
Given this formula for margin of error, solve for n.

Margin of Error

Margin of Error
Suppose we want to estimate a proportion with a margin of error of 0.03 with 95% confidence. How large a sample size do we need?
a) About 100
b) About 500
c) About 1000
d) About 5000
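Taking the margin of error for a single proportion to be ME = z* × sqrt(p(1 – p)/n), solving for n gives n = (z*/ME)² × p(1 – p). The sketch below applies that to the question above. The function name is hypothetical, and the default p = 0.5 is an assumption made here: the slide gives no value for p, and 0.5 gives the largest (most cautious) sample size.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(margin, confidence=0.95, p_guess=0.5):
    """Smallest n with z* * sqrt(p(1 - p)/n) <= margin, for a guessed p."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z_star / margin) ** 2 * p_guess * (1 - p_guess))


n_needed = sample_size_for_proportion(0.03)
print(n_needed)  # a little over 1000
```

If a rough value of p is known in advance, passing it as `p_guess` gives a smaller required sample size than the default.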

To Do
Read Sections 6.1, 6.2, 6.3, 6.7, 6.8, 6.9
Do HW 6 (due Friday, 11/6)