4.1 (Day 2) 10.10.2017.

Other Sampling Methods
Last week we talked about simple random samples (SRS), the simplest way to eliminate bias. Sometimes, however, there are factors that make more complex sampling methods more appealing: (1) taking an SRS is too difficult or expensive; (2) we are not convinced that an SRS will actually give us a representative sample.

Stratified Random Sample
Probably the most commonly used sampling method. Instead of sampling from the entire population, you divide the population into subgroups (“strata”). You then randomly sample from each of these strata and combine these “sub-samples” to form your overall sample.
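The divide, sample, and combine steps can be sketched in a few lines of Python. This is only an illustration: the roster of 200 students, the grade-level strata, and the choice of 5 students per stratum are all made up for the example and are not from the slides.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, per_stratum, seed=None):
    """Take an SRS of `per_stratum` individuals from each stratum, then combine."""
    rng = random.Random(seed)
    # Step 1: divide the population into strata.
    strata = defaultdict(list)
    for individual in population:
        strata[stratum_of(individual)].append(individual)
    # Steps 2 and 3: sample randomly within each stratum and combine the sub-samples.
    sample = []
    for label in sorted(strata):
        sample.extend(rng.sample(strata[label], per_stratum))
    return sample

# Hypothetical roster: (student id, grade level), 50 students per grade.
grades = ("freshman", "sophomore", "junior", "senior")
students = [(i, grades[i % 4]) for i in range(200)]
sample = stratified_sample(students, stratum_of=lambda s: s[1], per_stratum=5, seed=1)
print(len(sample), "students sampled")
```

Because the sampling happens separately inside each stratum, every grade level is guaranteed to appear in the sample, which an SRS of 20 students would not guarantee.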


Stratified Random Sampling
Recall the two problems with an SRS: (1) taking an SRS is too difficult or expensive; (2) we are not convinced that an SRS will actually give us a representative sample. Which of these problems is stratified random sampling fixing? Answer: #2. Side note: do NOT abbreviate stratified random sampling as SRS. SRS always means simple random sample.

Stratified Random Sampling
When is it most commonly used? It is commonly used to make sure that important groups are all represented, for example groups defined by income, race/ethnicity, gender, age, etc.


Downsides to Stratified Random Sampling
It is harder and more expensive than an SRS. You have to know in advance who falls into which stratum. You have to know how many individuals to sample from each stratum to make the sample representative.
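One common way to decide how many to sample from each stratum is proportional allocation: each stratum gets a share of the sample equal to its share of the population. The slides do not name a method, so treat this as an assumption. The school and its grade sizes below are hypothetical.

```python
def proportional_allocation(stratum_sizes, total_sample_size):
    """Split a total sample size across strata in proportion to stratum size."""
    population_size = sum(stratum_sizes.values())
    # Rounding can make the pieces add up to slightly more or less than the
    # requested total when the shares are not whole numbers.
    return {
        label: round(total_sample_size * size / population_size)
        for label, size in stratum_sizes.items()
    }

# Hypothetical school with 1,000 students.
grade_sizes = {"freshman": 300, "sophomore": 280, "junior": 220, "senior": 200}
allocation = proportional_allocation(grade_sizes, total_sample_size=50)
print(allocation)
```

Notice that this only works if you already know the size of every stratum, which is exactly the "know in advance" cost listed above.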

Cluster Sampling
Cluster sampling is a solution to the other problem: an SRS can be expensive or difficult. Particularly if your population is large and/or very spread out, it is difficult to use an SRS or a stratified random sample. These methods are also very difficult to use if you cannot easily identify each individual in your population. Think of the trees in a forest.

Cluster Sampling to the Rescue
The procedure for carrying out a cluster sample starts out like a stratified random sample. First divide the population into smaller groups (“clusters”). So far this sounds exactly like stratified random sampling. But now we are not dividing into groups to try to make the sample representative; instead we are dividing into groups that can be readily found together, such as homerooms, neighborhoods, or 50x50 sections of the forest.

Cluster Sampling to the Rescue (continued)
Then we randomly select which clusters to sample. Typically, once a cluster is selected, ALL individuals in the cluster are included in the sample. Occasionally you may instead choose to take an SRS within each selected cluster.

Cluster Sampling
As with stratified random sampling, we then combine the selected clusters to form our overall sample. So cluster sampling is NOT necessarily making our sample more representative than an SRS. But it is, in some cases, making the sample easier to carry out and more cost- and time-effective.
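A minimal Python sketch of the cluster sampling procedure, using homerooms as the clusters. The school layout (12 homerooms of 25 students) and the choice of 3 clusters are invented for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and include everyone in the chosen clusters."""
    rng = random.Random(seed)
    # The randomness is in WHICH clusters are picked, not in who is picked inside them.
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = [person for name in chosen for person in clusters[name]]
    return chosen, sample

# Hypothetical school: 12 homerooms with 25 students each.
homerooms = {
    f"HR-{h:02d}": [f"student-{h:02d}-{s:02d}" for s in range(25)]
    for h in range(12)
}
chosen, sample = cluster_sample(homerooms, n_clusters=3, seed=7)
print(chosen, len(sample))
```

Only three rooms have to be visited to collect 75 responses, which is the cost saving. The trade-off is that the other nine homerooms contribute nobody, so the sample is not guaranteed to be more representative than an SRS.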

The Stratified/Cluster Sample Hybrid
On a test, a given method will clearly be an SRS, a stratified random sample, or a cluster sample. But in practice, there is some overlap between cluster samples and stratified random samples. You may even use such a hybrid later this semester.

The Stratified/Cluster Sample Hybrid (continued)
For example, if my population is citizens of the United States, it would be too difficult to take an SRS. So I decide to use a cluster sample to make it easier. But I want to make sure that people in both smaller cities/towns (under 40,000 people) and bigger cities (over 40,000 people) are represented.


The Stratified/Cluster Sample Hybrid (continued)
So I decide to randomly sample individuals from 10 big cities and 5 smaller cities. Notice that I have clustered by city but also stratified by size of the city. So this is not a pure stratified random sample and not a pure cluster sample; it is a hybrid. Your book (as far as I know) does not really talk about this, but I just wanted you to be aware of it.
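The hybrid can be sketched in Python as three steps: stratify cities by size, randomly pick cities (clusters) within each size group, then take an SRS of residents inside each chosen city. Everything numeric here other than the 40,000 cutoff and the 10 big / 5 small split is an assumption made for the example: the city names, their populations, and the 20 residents per city. A city of exactly 40,000 is treated as small, which the slides do not specify.

```python
import random

def hybrid_sample(city_populations, n_big, n_small, per_city, cutoff=40_000, seed=None):
    """Stratify cities by size, pick cities at random, then take an SRS in each."""
    rng = random.Random(seed)
    # Stratify: split the list of cities into big and small.
    big = sorted(c for c, pop in city_populations.items() if pop > cutoff)
    small = sorted(c for c, pop in city_populations.items() if pop <= cutoff)
    # Cluster: randomly choose whole cities from each size group.
    chosen = rng.sample(big, n_big) + rng.sample(small, n_small)
    # SRS within each chosen city; residents are represented by index numbers.
    return {
        city: rng.sample(range(city_populations[city]), per_city)
        for city in chosen
    }

# Hypothetical cities: 30 big ones and 40 small ones.
city_populations = {f"big-{i}": 50_000 + 10_000 * i for i in range(30)}
city_populations.update({f"small-{i}": 2_000 + 500 * i for i in range(40)})
result = hybrid_sample(city_populations, n_big=10, n_small=5, per_city=20, seed=3)
print(sorted(result))
```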

Inference
Why random sampling? Let’s do an activity. I have 3 decks of cards. One at a time, come shuffle a deck, then pick 5 cards from it. Then put them back in the deck and record on the (left) dotplot what percentage were red. What does it look like?

Inference (continued)
Now pick 10 cards. Then record on the (middle) dotplot what percentage were red. What does it look like?

Inference (continued)
Now pick 20 cards. Then record on the (right) dotplot what percentage were red. What does it look like?

Inference
What did this activity tell us? What does it imply for random sampling?

Random Sampling
Random sampling works so well because each individual is equally likely to be chosen. In the activity, each card was equally likely to be chosen. As the sample gets larger, it tends to come closer and closer to representing the entire population.
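The card activity can be rerun on a computer. The sketch below assumes a standard 52-card deck with 26 red cards and draws hands of 5, 10, and 20 cards many times, recording the percentage of red cards each time. The 2,000 repetitions and the seeds are arbitrary choices for the simulation, not part of the classroom activity.

```python
import random
import statistics

def percent_red_samples(sample_size, repetitions, seed=None):
    """Repeatedly draw `sample_size` cards and record the percentage that are red."""
    rng = random.Random(seed)
    deck = ["red"] * 26 + ["black"] * 26  # standard deck: half the cards are red
    results = []
    for _ in range(repetitions):
        hand = rng.sample(deck, sample_size)  # draw without replacement
        results.append(100 * hand.count("red") / sample_size)
    return results

spread = {}
for n in (5, 10, 20):
    percents = percent_red_samples(n, repetitions=2000, seed=n)
    spread[n] = statistics.pstdev(percents)
    print(f"n={n:2d}: mean {statistics.mean(percents):5.1f}% red, "
          f"spread (SD) {spread[n]:4.1f}")
```

All three sample sizes center near the true value of 50% red, but the results cluster more tightly around 50% as the sample size grows, which is the pattern the three dotplots are meant to show.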

Margin of Error
Each sample result has a margin of error. The margin of error sets bounds on the likely range of the true population value, based on the sample.
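The slide does not give a formula. As an assumption, the sketch below uses the common large-sample approximation for a sample proportion at roughly 95% confidence, 1.96 × sqrt(p̂(1 − p̂)/n). It is rough for very small samples and is shown only to illustrate how the margin shrinks as the sample size grows.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Approximate margin of error for a sample proportion (z=1.96 is about 95%)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical surveys that all find 50% in the sample, with growing sample sizes.
for n in (100, 400, 1600):
    moe = margin_of_error(0.5, n)
    print(f"n={n}: about plus or minus {100 * moe:.1f} percentage points")
```

Quadrupling the sample size cuts the margin of error in half, so extra precision gets expensive quickly.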

Sample Surveys: What Can Go Wrong?
Undercoverage occurs when some groups in the population are left out of the process of choosing the sample. Nonresponse occurs when an individual chosen for the sample can’t be contacted or refuses to participate. A systematic pattern of incorrect responses in a sample survey leads to response bias. The wording of questions is the most important influence on the answers given to a sample survey.

Wording of the Question
“Since they are breaking the law, should illegal immigrants be deported?”
“Should illegal immigrants be deported?”
“Despite holding jobs and helping the US economy, should illegal immigrants be deported?”

Order of the Questions
A series of two questions was asked of college students as part of a larger survey. The order of the two questions was randomly assigned. Order #1: first “How happy are you with your life in general? (on a scale of 1 to 5)”, then “How many dates have you been on in the past month?” This order found almost no correlation between happiness and number of dates.

Order of the Questions (continued)
Order #2: first “How many dates have you been on in the past month?”, then “How happy are you with your life in general? (on a scale of 1 to 5)” This order found a moderately strong positive correlation between happiness and number of dates. Sometimes the answer to one question changes the answer to another, so even the order can matter.

Surveys
Moral of the story: there are lots of ways to create bias in a survey. You will explore these in the first semester project. In general, if you want unbiased results: be careful with your wording and make sure that the questions are as neutral as possible; try not to put questions near each other when one would affect the answer to the other; make your sample as representative as possible. It is HARD to know for sure that you’re not getting bias, but you can take precautions. It is EASY to intentionally create bias, but this makes it more PROPAGANDA than statistics.

The End