You want to survey a school You draw your sample from the first day of school student enrollment list This list would be your ____???____ Which students.

2 You want to survey a school. You draw your sample from the first-day-of-school student enrollment list. This list would be your ____???____. Which students are not on this list? A phenomenon known as? Potentially problematic because? (Hint: Dillman, p. 196)

3 Some reminders… Population: the group about whom we want to draw our inference. Sample Frame: the members of the population who could potentially be in our sample. Coverage Error: the extent to which members of the population are excluded from the sample frame (not good).

4 Welcome… …to a hopefully productive lesson on SAMPLING METHODOLOGY! What’s ideal? Nifty tricks?? Common misconceptions??? Limitations of our methods????????? P.S. We are going to do (some) math and it is going to be FUN!!!

5 Simple Random Sampling (what’s ideal). Members of a sample frame, which hopefully includes our entire population, are selected one at a time, at random and without replacement (drawing names out of a hat). The sample is equal in expectation to the population on all outcomes, but there are no guarantees for any single sample.

6 Stratified Random Sampling (possibly even more ideal). Use a criterion to divide the sample frame by group membership (e.g. racial category). Randomly sample within each group. What is the advantage of this procedure?
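
The procedure on this slide can be sketched in a few lines of Python. This is only an illustration: the frame, the group labels "A" and "B", and the helper name `stratified_sample` are hypothetical and do not come from the slides.

```python
import random
from collections import defaultdict

def stratified_sample(frame, stratum_of, n_per_stratum, seed=0):
    """Draw n_per_stratum members at random, without replacement, from each stratum."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    # Step 1: divide the sample frame by group membership.
    for member in frame:
        groups[stratum_of(member)].append(member)
    # Step 2: randomly sample within each group.
    return {name: rng.sample(members, n_per_stratum) for name, members in groups.items()}

# Hypothetical frame: 990 members of group "A" and only 10 of group "B".
frame = [("A", i) for i in range(990)] + [("B", i) for i in range(10)]
sample = stratified_sample(frame, stratum_of=lambda m: m[0], n_per_stratum=5)
sizes = {name: len(members) for name, members in sample.items()}
print(sizes)
```

Each group ends up with the same number of sampled members even though group "B" is only 1% of the frame, which a simple random sample of 10 would rarely deliver.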

7 Scenario… We want to know what percentage of Americans support Obama for president. We need 1100 members from each racial group to be confident about group means (more on this later). American Indians / Alaskan Natives comprise 1% of our population. Through simple random sampling, how large a sample would we theoretically need to reach n = 1100 for this subgroup?
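
A quick way to work out the answer to this question, using only the two figures on the slide (1100 and 1%). The variable names are illustrative, and the result is an expected value: any single random draw could include somewhat more or fewer members of the subgroup.

```python
target_n = 1100          # subgroup members we want (from the slide)
subgroup_share = 0.01    # subgroup's share of the population (from the slide)

# Under simple random sampling the subgroup makes up about 1% of whatever we draw,
# so the total sample must be roughly target_n / subgroup_share.
required_total = round(target_n / subgroup_share)
print(required_total)
```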

8 Scenario cont’d… OR, we could use stratified random sampling and draw 1100 from each subgroup without all this trouble. BUT, now we have oversampled American Indians: they are over-represented in our sample! Implications? Solutions?

9 (This data is very fake) Proportion supporting B.O.: African American: .50; Asian American: .50; Latino: .50; White: .50; American Indian: 0. Unweighted avg: ??

10 Weighting (nifty trick). Now, let’s do a weighted average instead… What’s going on here? 99% (.50) + 1% (0) = 49.50%. Big difference, eh?
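
The two averages can be checked with a short calculation. The support figures are the (very fake) ones from slide 9, and the 99% / 1% population shares are the ones used on this slide. Lumping the four groups at .50 into a single 99% weight is a simplification that works here only because they all share the same value.

```python
# Fake proportions from slide 9.
support = {
    "African American": 0.50,
    "Asian American": 0.50,
    "Latino": 0.50,
    "White": 0.50,
    "American Indian": 0.0,
}

# Unweighted: every group counts equally, as in the raw stratified sample.
unweighted = sum(support.values()) / len(support)

# Weighted: each group counts in proportion to its population share.
weighted = 0.99 * 0.50 + 0.01 * support["American Indian"]

print(f"unweighted: {unweighted:.2%}")
print(f"weighted:   {weighted:.2%}")
```

The unweighted figure treats a 1% group as if it were 20% of the population, which is what drags it down.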

11 So, why was 1100 an ideal subgroup number? Because no matter how large your population, a simple random sample of 1100 will get you within about 3 percentage points of the true population value (at 95% confidence) if your outcome is binary (e.g. Obama: Yes or No). How come?

12 Because this man said so: William Sealy Gosset (1876-1937). Chemist, “math person”, Guinness Brewery worker. A patient man.

13 Yes, a patient man. Using barley (somehow), he spent two years empirically studying the relationship between sample means and population means. “The Probable (Standard) Error of a Mean” (1908). Standard errors are what we use to estimate sampling error.

14 Sampling error. Describes how closely our sample mean allows us to estimate our population mean. Conceptually similar to a confidence interval (Dillman, p. 207; http://www.researchsolutions.co.nz/sample_sizes.htm). Depends on: population variance (“spread”), estimated by sample variance; sample size; population size (to a point).

15 Sampling error: big picture. Larger variances and (to a point) larger population sizes require larger samples to estimate the population mean at a given level of precision. Increasing sample size reduces sampling error, BUT there are diminishing returns to increasing our sample size.
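
One way to see the "to a point" effect of population size is the textbook sample-size formula with a finite population correction. The formula is not given in the slides; the 3-point margin, the 95% z-value of 1.96, and the worst-case proportion of .50 are all assumptions chosen for illustration, as is the function name `required_n`.

```python
import math

def required_n(population_size, margin=0.03, z=1.96, p=0.5):
    """Sample size for a binary outcome, with the finite population correction."""
    # Size needed for an effectively infinite population.
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    # The correction shrinks n0 when the population is small.
    n = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n)

for population_size in (1_000, 10_000, 1_000_000, 300_000_000):
    print(population_size, required_n(population_size))
```

Under these assumptions the required sample grows with the population at first and then levels off a little below 1,070, whether the population is a million or 300 million.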

16 Sampling error: big picture. Diminishing returns? For large populations… Increasing “n” from 100 to 200 is helpful. Increasing from 500 to 600 is less helpful. Increasing from 1200 to 1300 helps very little (no matter how large the population).
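
The three comparisons on this slide can be put into numbers with the usual margin-of-error formula for a proportion. This is a sketch under assumptions not stated in the slides: a 95% confidence level (z = 1.96), a worst-case proportion of .50, and a population large enough to ignore its size.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from a simple random sample."""
    return z * math.sqrt(p * (1 - p) / n)

gains = []
for before, after in [(100, 200), (500, 600), (1200, 1300)]:
    # How much the margin shrinks when we add 100 more respondents.
    gain = margin_of_error(before) - margin_of_error(after)
    gains.append(gain)
    print(f"n {before} -> {after}: margin shrinks by {gain * 100:.2f} percentage points")
```

Adding the same 100 respondents buys roughly 2.9, 0.4, and 0.1 percentage points of precision, respectively.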

17 Why Diminishing Returns? Because there is an upper bound (“ceiling”) on the variance of any sample. For binary (yes/no, “1” or “0”) outcomes, the max variance is .25. Thus, it’s only a matter of time till more “n” in the denominator makes our standard error very low.
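
A small check of the .25 ceiling and of what it implies for n = 1100. The variance of a binary outcome with proportion p is p(1 - p). The 1.96 multiplier (95% confidence) is an assumption added here, not something stated on the slide.

```python
import math

# Variance of a binary outcome for proportions 0.0, 0.1, ..., 1.0.
proportions = [i / 10 for i in range(11)]
variances = [p * (1 - p) for p in proportions]
max_variance = max(variances)  # reached at p = 0.5

# Worst-case standard error and 95% margin of error for a sample of 1100.
standard_error = math.sqrt(max_variance / 1100)
margin_95 = 1.96 * standard_error

print(f"max variance: {max_variance}")
print(f"worst-case margin at n = 1100: {margin_95:.3f}")
```

Because the variance can never exceed .25, the worst-case margin at n = 1100 is about 3 percentage points regardless of the true proportion.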

18 Why Diminishing Returns? Even for continuous outcomes, there is still an upper bound on variance unless the scale is infinite. Thus, there are still diminishing returns on increasing “n”. For more on this topic: take S-012; look up Confidence Intervals in stats books; “You don't need a large sample of users to obtain meaningful data: Continuous Data (e.g. Task Time)” http://www.measuringusability.com/sample_continuous.htm

19 Limitations of sampling error calculations. They do not take coverage error into account! They assume you have drawn a simple random sample (e.g. they do not take “clustering” into account).

20 Clustering??? There are 20,000 students in a city with 40 schools. We want a sample of 1100. Ideally, we would draw students at random from every school. But it would be cheaper and easier if we drew a few schools at random and obtained information from every student in them. Implications?

21 Clustering??? If there is a lot of school-level variation in our outcome, our sample will not be representative and our sample estimate will be biased. The sampling error formula does not account for this possibility.
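
One common way to quantify how much clustering costs, not taken from the slides, is Kish's design effect: 1 + (m - 1) × ICC, where m is the cluster size and ICC is the intraclass correlation (the share of outcome variance that lies between schools). The sketch below assumes equal-sized schools (20,000 / 40 = 500 students each) and uses hypothetical ICC values.

```python
def design_effect(cluster_size, icc):
    """Kish's design effect for whole-cluster sampling with equal-sized clusters."""
    return 1 + (cluster_size - 1) * icc

students, schools = 20_000, 40
cluster_size = students // schools  # 500 per school, assuming equal sizes
nominal_n = 1100

for icc in (0.0, 0.01, 0.05, 0.20):  # hypothetical levels of school-level variation
    deff = design_effect(cluster_size, icc)
    effective_n = nominal_n / deff
    print(f"ICC {icc:.2f}: design effect {deff:.2f}, effective sample size {effective_n:.0f}")
```

With no school-level variation the clustered sample is as good as a simple random one, but even a modest ICC makes 1100 clustered students worth only a few dozen independent ones.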

22 One more limitation of the sampling error formula: non-response bias. Even if you have drawn a beautifully random sample, your sample estimate will be biased if those who do not return your survey are different on your outcome of interest. That’s why Dillman’s advice on getting high response rates is so important!
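
A toy calculation of how non-response can distort an estimate. All numbers are made up for illustration: true support of 50%, and response rates of 60% among supporters and 30% among non-supporters. The function name `observed_support` is likewise hypothetical.

```python
def observed_support(true_support, response_rate_yes, response_rate_no):
    """Share of returned surveys that say 'yes', given differing response rates."""
    yes_returned = true_support * response_rate_yes
    no_returned = (1 - true_support) * response_rate_no
    return yes_returned / (yes_returned + no_returned)

biased = observed_support(0.50, response_rate_yes=0.60, response_rate_no=0.30)
unbiased = observed_support(0.50, response_rate_yes=0.40, response_rate_no=0.40)
print(f"supporters respond more often: {biased:.1%}")
print(f"equal response rates:          {unbiased:.1%}")
```

When both groups respond at the same rate the estimate matches the truth even with a low response rate; the bias comes from the difference between responders and non-responders.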

