Lecture 18.

Final Project
Design your own survey!
- Find an interesting question and population
- Design your sampling plan
- Collect data
- Analyze using R
- Write a 5-page paper on your results
Due December 1

Final presentation
- All are required to give a short presentation
- Last two classes (December 1 and 6)
- Select one of the three projects
- Make a PowerPoint presentation (no more than 3-5 slides)
- Present your results to the class

Statistics
Main ideas of statistics:
- Given multiple plausible models, select the one (or several) most consistent with the observed data
- Quantify a measure of belief in our solution
The main idea is that if something looks like a very unlikely coincidence, we would prefer another, more likely explanation.

Parameters and statistics
Example: A random sample of 603 voters is asked whether they prefer cats or dogs.
http://news.nationalgeographic.com/news/2013/06/130619-pets-poll-animals-united-states-nation-dogs-cats/
- 314 (52%) prefer dogs.
- The observed percentage in the sample, 52%, is a statistic.
- The unknown percentage of the population who prefer dogs is a parameter. Presumably it’s close to 52%, but we will never know.
Question we’d like to ask: “How likely is it that the statistic is within a given distance of the parameter?”

Experiment
- Population (urn): our class (x people)
- Question: dogs vs. cats
- We will sample 5
- Assign numbers
- Use R to sample
- Repeat 5 times
- Ascertain the truth and run R
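The sampling step of this experiment can be sketched as follows. This is a Python illustration rather than the R used in class, and the class size of 30 is a made-up stand-in for the unspecified x; the function name is hypothetical.

```python
import random

def draw_class_sample(class_size, sample_size=5, rng=None):
    """Draw student numbers 1..class_size at random, without replacement."""
    rng = rng or random.Random()
    return sorted(rng.sample(range(1, class_size + 1), sample_size))

# Hypothetical class size; the slide leaves x unspecified.
CLASS_SIZE = 30
rng = random.Random(18)  # fixed seed so the run is repeatable

# Repeat the draw 5 times, as in the experiment.
for repeat in range(1, 6):
    print(repeat, draw_class_sample(CLASS_SIZE, rng=rng))
```

Each repeat gives a different set of five students, so the sample percentage of dog people changes from draw to draw even though the class (the urn) stays fixed.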

US Polls
The 2016 estimate of the number of eligible voters is 225,778,000.
We sampled 603 people at random and got 314 who prefer dogs.

Main idea
We learned that, given a model, we can check whether the data are consistent with it.
Idea: find the models that are consistent with the data.

US Polls
The 2016 estimate of the number of eligible voters is 225,778,000.
We sampled 603 people at random and got 314 who prefer dogs.
We will consider various models: the true proportion is p = 0.001, 0.002, 0.003, …
Which of the models is the data in agreement with?

Results

“P-value”
Consider the proportion of fake samples with fewer than 314 people who prefer dogs.
Values close to 0 or 1 are not consistent with the model. (Why?)
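A minimal sketch of this check for a single model, written in Python rather than the course's R. It treats the population as an urn of N numbered voters in which the first K = p·N prefer dogs; the function name, the number of fake samples, and the seed are assumptions made for illustration.

```python
import random

N = 225_778_000   # eligible voters (from the slide)
n = 603           # sample size
observed = 314    # sampled voters who prefer dogs

def fraction_below(p, n_fake=500, rng=None):
    """Fraction of fake samples with fewer than `observed` dog votes under model p."""
    rng = rng or random.Random()
    K = round(p * N)  # dog-preferring voters in the urn under this model
    below = 0
    for _ in range(n_fake):
        # Draw n distinct voters; those numbered below K count as "dogs".
        dogs = sum(1 for voter in rng.sample(range(N), n) if voter < K)
        if dogs < observed:
            below += 1
    return below / n_fake

rng = random.Random(18)
for p in (0.40, 0.52, 0.65):
    print(p, fraction_below(p, rng=rng))
```

Under p = 0.40 almost every fake sample has fewer than 314 dog votes, so the fraction is near 1; under p = 0.65 almost none do, so it is near 0. In both cases the observed 314 sits far out in a tail of what the model produces, which is why such models are judged inconsistent with the data.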

Cutoff
- Use a finer grid of models
- A usual cutoff is 0.05, split between both sides (0.025 and 0.975)
- Models selected: p in [.482, .560]
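The scan over models can be sketched in Python as follows. To keep it fast, this version swaps the simulated fake samples for an exact binomial tail probability, which is a reasonable stand-in only because the population is enormous compared with the sample of 603. The function names and the grid step of 0.001 are assumptions, and the endpoints it prints may differ slightly from the simulated ones on the slide.

```python
from math import comb

n = 603
observed = 314

# Binomial coefficients for k = 0..313, computed once and reused for every model.
COEF = [comb(n, k) for k in range(observed)]

def prob_below(p):
    """P(X < observed) when X ~ Binomial(n, p)."""
    return sum(c * p**k * (1 - p)**(n - k) for k, c in enumerate(COEF))

def consistent_models(step=0.001, lower=0.025, upper=0.975):
    """Keep the models whose tail probability lies between the two cutoffs."""
    grid = [i * step for i in range(1, round(1 / step))]
    return [p for p in grid if lower <= prob_below(p) <= upper]

models = consistent_models()
print(round(models[0], 3), round(models[-1], 3))
```

The first and last surviving models form the range of plausible values for p.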

Fake data
3 sub-groups
The population contains: 46,392,000 (R)
- Sampled 111, 101 yes (R)
- Sampled 166, 30 yes (D)
- Sampled 326, 183 yes (I)

Issue
- Too many levers to “fiddle” with (a different K for each of the three groups)
- It is not easy to simply check how well the data fit under every model
- Needs more sophisticated statistics

Naïve parametric bootstrap
- Compute a number (or numbers) estimated from the data (a point estimate).
- Use this number to simulate a lot of fake data and see how the fake data can vary.
- Use this variability to estimate the uncertainty in our estimator.

Without Stratification
Estimated K = 225,778,000 * (314/603) = 117,569,307
Estimated p in (.481, .561)
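A minimal Python sketch of the unstratified parametric bootstrap (the course uses R; this is only an illustration). It plugs the estimated K into the urn, draws fake samples of 603, and reads off the middle 95% of the fake proportions. The function name, the number of bootstrap replicates, and the seed are assumptions, and the endpoints vary a little from run to run.

```python
import random
import statistics

N = 225_778_000   # eligible voters (from the slide)
n = 603           # sample size
observed = 314    # sampled voters who prefer dogs

def srs_bootstrap_interval(n_boot=1000, rng=None):
    """Parametric bootstrap interval for p under simple random sampling."""
    rng = rng or random.Random()
    p_hat = observed / n
    K_hat = round(N * p_hat)  # point estimate of dog-preferring voters
    fake = []
    for _ in range(n_boot):
        # Fake sample from an urn in which the first K_hat voters prefer dogs.
        dogs = sum(1 for voter in rng.sample(range(N), n) if voter < K_hat)
        fake.append(dogs / n)
    cuts = statistics.quantiles(fake, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return K_hat, cuts[0], cuts[-1]

K_hat, lo, hi = srs_bootstrap_interval(rng=random.Random(18))
print(K_hat, round(lo, 3), round(hi, 3))
```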

Stratified problem
- Estimate K for each group, then combine to get a joint estimate of p
- p_combined = .20*.91 + .29*.18 + .51*.56, reported as .50
- Stratification changes the estimate only slightly
- There is a (small) reduction in uncertainty: the interval is narrower
- Bootstrap interval (.472, .535)
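A rough Python sketch of a stratified parametric bootstrap, using the rounded weights (.20, .29, .51) and the sub-group counts from the slides. Two assumptions are made here: each sub-group's fake sample is drawn as a binomial, since only one sub-group population size is given, and the function names are hypothetical. With these rounded weights the combined point estimate works out to about .52, so the interval will not match the slide's (.472, .535) exactly.

```python
import random
import statistics

# (population weight, number sampled, number of yes) for each sub-group.
STRATA = [(0.20, 111, 101), (0.29, 166, 30), (0.51, 326, 183)]

def combined_estimate(strata):
    """Weighted average of the per-group sample proportions."""
    return sum(w * yes / m for w, m, yes in strata)

def stratified_bootstrap_interval(strata, n_boot=1000, rng=None):
    """Parametric bootstrap interval for the combined proportion."""
    rng = rng or random.Random()
    fake = []
    for _ in range(n_boot):
        total = 0.0
        for w, m, yes in strata:
            p_h = yes / m
            # Assumption: binomial draw, i.e. each group is far larger than its sample.
            fake_yes = sum(1 for _draw in range(m) if rng.random() < p_h)
            total += w * fake_yes / m
        fake.append(total)
    cuts = statistics.quantiles(fake, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]

print(round(combined_estimate(STRATA), 3))
lo, hi = stratified_bootstrap_interval(STRATA, rng=random.Random(18))
print(round(lo, 3), round(hi, 3))
```

Each group is resampled separately, so the large between-group differences no longer contribute to the variability of the estimate; that is the source of the narrower interval.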

Bootstrap Stratified Blah

Bootstrap SRS Blah

Homework
Modify the bootstrap code to get bootstrap intervals for the population proportion of yes in the following two situations:
- Population of 30,000; SRS of 100; observe 32 yes.
- Population of 30,000 stratified into subpopulations of 10,000 and 20,000. Sample 30 from the first subpopulation, observing 27 yes, and sample 70 from the second subpopulation, observing 5 yes.