Sample Size.

Review: Base Rate Neglect On HW 4, I asked you to find a fallacy on the internet. More than one of you found studies where scientists took data from thousands of people and found correlations.

Misunderstanding You said, “this is the base rate neglect fallacy. There are billions of people in the world, and these studies only looked at thousands of people. They are neglecting the base rate.” But that is not the base rate neglect fallacy. And this is important to know: you shouldn’t ignore good science just because you’re confused about what the base rate fallacy is.

Base Rate Neglect First of all, the base rate neglect fallacy has nothing at all to do with the number of people there are in the world. Nothing. It has to do with the probability of a variable taking on a certain value, for instance, the probability that someone’s height = 1.5m, the probability that terrorist = true (someone is a terrorist)…

Base Rates This is the “base rate” of people who are 1.5m tall, and the “base rate” of terrorists. If 1 in 100 people are terrorists, then the rate of terrorists is 1 in 100 and the probability that a randomly selected person is a terrorist is 1 in 100.

Base Rates We call this the base rate, because it is the probability that someone is a terrorist when we don’t know anything else about them. It might be that the base rate of terrorists is 1 in 100, but the rate of terrorists among people who are holding rocket launchers is 1 in 2, and the rate of terrorists among retirees is 1 in 500.

Tests The base rate neglect fallacy happens when we have a test that is meant to detect the value of a variable. For example we might have a test that tells us whether someone has AIDS or not, or whether someone is driving over the speed limit, or whether they are drunk.

Reliability of Tests Here is the crucial fact. Please learn this: as the base rate of X = x decreases, the share of positive results on tests for X = x that are false positives increases. A positive result is less reliable when the condition we are testing for is rare (low base rate).
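
To make this concrete, here is a minimal Python sketch. The function name, the 1% false positive rate, and the perfect sensitivity are illustrative assumptions, not figures from the lecture. It uses Bayes' theorem to compute the probability that a person who tests positive really has the condition, for several base rates.

```python
def positive_predictive_value(base_rate, false_positive_rate, sensitivity=1.0):
    """Return P(condition | positive test) using Bayes' theorem.

    base_rate: P(condition) in the population
    false_positive_rate: P(positive | no condition)
    sensitivity: P(positive | condition); 1.0 is an assumption for illustration
    """
    true_positives = base_rate * sensitivity
    false_positives = (1.0 - base_rate) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Hypothetical test: 1% false positive rate, never misses a real case.
for base_rate in (0.5, 0.1, 0.01, 0.001):
    ppv = positive_predictive_value(base_rate, false_positive_rate=0.01)
    print(f"base rate {base_rate}: P(condition | positive) = {ppv:.3f}")
```

The test itself never changes here; only the base rate does, and the trustworthiness of a positive result falls with it.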

Base Rate Neglect Fallacy The base rate neglect fallacy happens when: There is a low base rate of some condition. We have a test for that condition. Someone tests positive. We assume that means they have the condition, ignoring the unreliability of tests for conditions with low base rates.

Prosecutor’s Fallacy The base rate neglect fallacy is often called the prosecutor’s fallacy, as I shall explain.

Murder! Let’s suppose that there has been a murder. There is almost no evidence to go on except that the police find one hair at the crime scene.

You are the Suspect If someone is the killer, there is a 100% chance that their DNA will match the hair’s DNA. The police have a database that contains the DNA of everyone in Hong Kong. They run the DNA in the hair through their database and discover that you are a match!

Comprehension Question If you have been following along you should be able to answer this question: What is the probability that you are the murderer, given that you are a DNA match for the hair?

Answer If you said 100%, then you have just committed the base rate neglect fallacy. The correct answer is “Much lower, because the base rate of people who committed this murder out of the Hong Kong population as a whole is 1 in 7 million.”

Perfect Conditions for Fallacy Here’s what we have: A low base rate (only 1 person who committed this murder in the world). A test for whether someone is the murderer. You, who’ve tested positive on this test. And the police who think you did it!

Let’s Look at the Numbers We know that if you are the murderer, then there is a 100% chance of a DNA match. But what is the false positive rate? How likely is it that a randomly selected person will match the DNA?

False Results Here’s a quote from “False result fear over DNA tests,” Nick Paton Walsh, The Guardian: “Researchers had asked the labs to match a series of DNA samples. They knew which ones were from the same person, but found that in over 1 per cent of cases the labs falsely matched samples, or failed to notice a match.”

Let’s assume that half of the cases where “labs falsely matched samples, or failed to notice a match” were cases where they falsely matched samples. So the probability of a false positive is ½ x 1% = 0.5%, or 5 in 1,000.

Since there are 7 million people in Hong Kong, we expect about 0.5% x 7 million = 35,000 of them to match the hair’s DNA. Actually, it’s 35,000 + 1, because the true killer is a match, and not by accident.

So we expect that there are 35,001 DNA matches in all of Hong Kong. And only one of them is the murderer. So what is the probability that you are the murderer? 1 in 35,001. That’s way less than 100%.
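
The arithmetic on the last few slides can be checked with a short Python sketch. The variable names are illustrative; the numbers are the ones used above (7 million people, a 0.5% false positive rate, one true killer).

```python
# Numbers from the Hong Kong example above.
population = 7_000_000
false_positive_rate = 0.005  # half of the 1% error rate, as assumed above

# Innocent people expected to match the hair's DNA by accident.
expected_false_matches = population * false_positive_rate

# The true killer also matches, and not by accident.
total_matches = expected_false_matches + 1

# If you are just one of the matches, this is your chance of being the killer.
prob_guilty = 1 / total_matches

print(f"expected false matches: {expected_false_matches:,.0f}")
print(f"total matches:          {total_matches:,.0f}")
print(f"P(killer | match):      {prob_guilty:.6f}")
```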

Important Things to Remember There are three important things to remember: (1) If the test is more accurate (fewer false positives), then it’s more reliable. (2) If the base rate is higher, the test is more reliable. (3) If the police have other reasons to suspect you, the test is more reliable.

1. If the test is more reliable… Theoretically, DNA tests only return a false positive about 1 in 3 billion times. In that case, we’d expect only about 0.002 false positives in all of Hong Kong (7 million ÷ 3 billion). So your chances of being guilty would be 1 in 1.002, or 99.8%. Still, that’s lower than 100%.

2. If the base rate is higher… Maybe the person who died was stabbed 5,000 times, once each by 5,000 different people. So there are 5,000 murderers. Then with the previous false positive number at 35,000, you have a 5,000 in 40,000 chance of being one of the killers, or 12.5%.
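
The first two points can be compared side by side in a small Python sketch. The function `chance_of_guilt` and its parameter names are hypothetical helpers for illustration; it follows the same counting used in the slides (guilty matches divided by all expected matches).

```python
def chance_of_guilt(population, false_positive_rate, n_guilty=1):
    """Chance that a randomly selected DNA match is guilty.

    Counts expected accidental matches across the whole population,
    as the slides do, and assumes every guilty person matches.
    """
    expected_false_matches = population * false_positive_rate
    return n_guilty / (n_guilty + expected_false_matches)


hong_kong = 7_000_000

baseline = chance_of_guilt(hong_kong, 0.005)                  # 1 in 35,001
better_test = chance_of_guilt(hong_kong, 1 / 3_000_000_000)   # about 99.8%
higher_base_rate = chance_of_guilt(hong_kong, 0.005, 5_000)   # 12.5%

print(f"baseline:         {baseline:.6f}")
print(f"better test:      {better_test:.4f}")
print(f"higher base rate: {higher_base_rate:.4f}")
```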

3. If the police have some other reason to suspect you… To figure out your chances of being guilty, we looked at the probability that a randomly selected person from HK would be a DNA match. We were assuming you were randomly selected. But what if you weren’t randomly selected? What if the police tested you because you had a reason to kill the victim?

Reason to Suspect You Then we would have to look at not the probability that a randomly selected person would match, but the probability that a person who had reason to kill the victim would match. Suppose there are 5 people who had reasons to kill the victim, and the killer is one of them.

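Much Higher Chance Then your chances are much higher. Let K = you’re the killer and M = you’re a match. By Bayes’ theorem: P(K | M) = [P(K) x P(M | K)] ÷ P(M) = [(1/5) x 100%] ÷ P(M) = 0.2 ÷ [(1 + 0.025) ÷ 5] = 97.6%. Here P(M) is the expected number of matches among the 5 suspects (1 true match plus 5 x 0.5% = 0.025 accidental ones) divided by 5.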
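
The same calculation can be written out with Bayes' theorem in Python. This is a sketch with illustrative variable names. It applies the 0.5% false positive rate only to the four innocent suspects, so it gives roughly 98.0% rather than the slide's 97.6%, which counts accidental matches over all five suspects; the conclusion is the same either way.

```python
# Five suspects, one of whom is the killer.
prior_killer = 1 / 5               # P(K)
p_match_given_killer = 1.0         # P(M | K): the killer always matches
p_match_given_innocent = 0.005     # P(M | not K): false positive rate from above

# Total probability of a match: P(M) = P(K)P(M|K) + P(not K)P(M|not K)
p_match = (prior_killer * p_match_given_killer
           + (1 - prior_killer) * p_match_given_innocent)

# Bayes' theorem: P(K | M) = P(K) P(M | K) / P(M)
posterior_killer = prior_killer * p_match_given_killer / p_match

print(f"P(M)     = {p_match:.4f}")
print(f"P(K | M) = {posterior_killer:.4f}")
```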

sampling

Now we know what the base rate neglect bias is (hopefully). But this still doesn’t answer our question: how many people do we need in our scientific study to reliably generalize the results to everyone?

For example, if I want to know whether increased economic dependence in men is correlated with increased infidelity, how many people do I need to study? Surely one is too few. Is 10 fine? Do I need 100? A million?

Sample In statistics, the people who we are studying are called the sample. (Or if I’m studying the outcomes of coin flips, my sample is the coin flips that I’ve looked at. Or if I’m studying penguins, it’s the penguins I’ve studied.) Our question is then: what sample size is needed for a result that applies to the population?

Evaluating Evidence Well, remember what we learned last class. There are two measures of success for a study: Statistical significance: how likely would my results be if they were just due to random chance? Does the study rule out the null hypothesis?

Evaluating Evidence Well, remember what we learned last class. There are two measures of success for a study: Effect size: If I find that A and B are positively correlated, how much does the value of A affect B? What’s the percentage difference in the odds/probability of B as we vary the odds/probability of A?

Two Questions So there are really two questions we’re asking: How many people do I need to study to obtain statistically significant results? How big should my sample be to accurately estimate effect sizes in the population at large?

Law of Large Numbers Luckily, we do know that more is always better. The “Law of Large Numbers” says that if you make a large number of observations, the average of the results should be close to the expected value. (There is no “Law of Small Numbers”.)

Average of Dice Rolls
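
A short simulation illustrates the idea. This is a minimal sketch: the function name and the fixed seed are arbitrary choices for illustration. It rolls a fair die many times and tracks the running average, which should drift toward the expected value of 3.5 as the number of rolls grows.

```python
import random


def running_average_of_rolls(n_rolls, seed=0):
    """Roll a fair six-sided die n_rolls times; return the running averages."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    total = 0
    averages = []
    for i in range(1, n_rolls + 1):
        total += rng.randint(1, 6)  # randint includes both endpoints
        averages.append(total / i)
    return averages


averages = running_average_of_rolls(10_000)
for n in (10, 100, 1_000, 10_000):
    print(f"after {n:>6} rolls: average = {averages[n - 1]:.3f}")
```

With only 10 rolls the average can be well off 3.5; with 10,000 it is typically within a few hundredths.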

Example Let’s think about a particular problem. Suppose we are having an election between Mitt and Barack and we want to know how many people in the population plan to vote for Mitt. How many people do we need to ask?

Non-Random Samples The first thing we should realize is that it’s not going to do us any good to ask a non-random group of people. Suppose everyone who goes to ILoveMitt.com is voting for Mitt. If I ask them, it will seem like 100% of the population will vote for Mitt, even if only 3% will really vote for him.

Internet Polls (Important Critical Thinking Lesson: Internet polls are not trustworthy. They are biased toward people who have the internet, people who visit the site that the poll is on, and people who care enough to vote on a useless internet poll.)

Representative Samples The opposite of a biased sample is a representative sample. A perfectly representative sample is one where if n% of the population is X, then n% of the sample is X, for every X. For example, if 10% of the population smokes, 10% of the sample smokes.

Random Sampling One way to get a representative sample is to randomly select people from the population, so that each has a fair and equal chance of ending up in the sample. For example, when we randomize our experiments, we randomly sample the participants to obtain our experimental group. (Ideally our participants are randomly sampled from the population at large.)

Problems with Random Sampling Random sampling isn’t a cure-all, however. For example, if I randomly select 10 people from a (Western) country, on average I’ll get 5 men and 5 women. On average. But, on any particular occasion, I might select (randomly) 7 men and 3 women, or 4 men and 6 women.

Stratified Sampling One way to fix these problems would be to randomly sample 5 women and randomly sample 5 men. Then I would always have an even split between men and women, and my men would be randomly drawn from the group of men, while my women were randomly drawn from the group of women.
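
The difference between the two approaches can be sketched in Python. The toy population, the function names, and the seed are all assumptions made for illustration. A simple random sample of 10 can come out lopsided, while the stratified sample always contains exactly 5 from each group.

```python
import random


def simple_random_sample(population, k, rng):
    """Draw k people at random from the whole population."""
    return rng.sample(population, k)


def stratified_sample(population, key, per_stratum, rng):
    """Draw per_stratum people at random from each group defined by key."""
    strata = {}
    for person in population:
        strata.setdefault(key(person), []).append(person)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, per_stratum))
    return sample


# Toy population: 500 men and 500 women, each tagged with a group label.
population = [("M", i) for i in range(500)] + [("W", i) for i in range(500)]
rng = random.Random(42)

srs = simple_random_sample(population, 10, rng)
strat = stratified_sample(population, key=lambda p: p[0], per_stratum=5, rng=rng)

print("simple random: men =", sum(1 for p in srs if p[0] == "M"))
print("stratified:    men =", sum(1 for p in strat if p[0] == "M"))
```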

Example Let’s continue with our example. We’re convinced that we should randomly sample n individuals from the population of women and n from the population of men. Still, what is that number n?

We know that, of the people in our sample, X% will vote for Mitt. We want to know, of the people in the population, what percent will vote for Mitt? We can never know that it is exactly X, unless we ask everyone. But we can increase our confidence.

Confidence Interval What we can do is find out, based on our sample, that we are Z% sure (confident) that the percentage of people who will vote for Mitt is between X% and Y%. For example, we can be 90% confident that the percentage of people who vote for Mitt is between 44% and 48%.

Confidence Interval This would mean we think there’s a 10% chance that either less than 44% or more than 48% of people vote for Mitt. The very same data might warrant us in saying that we are 95% confident that the percentage of people who vote for Mitt is between 40% and 52%.
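
The trade-off between confidence and interval width can be sketched in Python using the standard normal approximation for a proportion. The poll numbers (460 of 1,000 respondents) and the function name are hypothetical, chosen only for illustration; they are not meant to reproduce the intervals quoted above.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    # z-value leaving (1 - confidence) / 2 in each tail of the normal curve
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical poll: 460 of 1,000 randomly sampled voters say "Mitt".
low90, high90 = proportion_confidence_interval(460, 1000, 0.90)
low95, high95 = proportion_confidence_interval(460, 1000, 0.95)

print(f"90% confident: between {low90:.1%} and {high90:.1%}")
print(f"95% confident: between {low95:.1%} and {high95:.1%}")
```

The same data gives a wider interval at 95% than at 90%: more confidence costs precision.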

Sample Size Determination So if we want to know how many people to look at, we should determine: (1) what level of confidence we want, and (2) how big we want our confidence interval to be.

Common Choices Common choices for these numbers are: We want to be 95% confident of our estimation. We want our confidence interval to be 6% wide (e.g. between 42% and 48%).

Expected Value, Deviation Each variable has an expected value (for example, 3.5 is the expected value of a die roll, the average of all the sides of a die). Each variable has an expected deviation from its expected value: how far are all the die values (1, 2, 3, 4, 5, 6) from the expected value (3.5)? The answer is 1.5 on average.

Variance The variance is the expected squared deviation: [(1 - 3.5)^2 + (2 - 3.5)^2 + (3 - 3.5)^2 + (4 - 3.5)^2 + (5 - 3.5)^2 + (6 - 3.5)^2] ÷ 6 = 17.5 ÷ 6, or about 2.9 for a die. Don’t worry, you don’t need to know this.

Standard Deviation The standard deviation is the square root of the variance. So √2.9 for a die. The important point is that we can use this number, the standard deviation, to figure out how many people we need in our sample.
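
The die numbers on the last three slides can be reproduced with a few lines of Python (variable names are illustrative):

```python
import math

faces = [1, 2, 3, 4, 5, 6]

# Expected value: the average of all the faces.
expected_value = sum(faces) / len(faces)

# Expected (absolute) deviation from the expected value.
mean_abs_deviation = sum(abs(f - expected_value) for f in faces) / len(faces)

# Variance: the expected squared deviation.
variance = sum((f - expected_value) ** 2 for f in faces) / len(faces)

# Standard deviation: the square root of the variance.
standard_deviation = math.sqrt(variance)

print(f"expected value:     {expected_value}")
print(f"mean abs deviation: {mean_abs_deviation}")
print(f"variance:           {variance:.3f}")
print(f"standard deviation: {standard_deviation:.3f}")
```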

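Solving for Sample Size If we want our 95% confidence interval to be 6% wide, then, since a 95% confidence interval spans about 2 standard deviations on either side of the estimate, we need: 4 x standard deviation = 6%. The standard deviation of an estimate of a proportion from a sample of n people is at most √(0.25/n) (the worst case, when the true proportion is 50%).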

A Little Bit of Math So, 4 x standard deviation = 6%:
4 x √(0.25/n) = 0.06
√(0.25/n) = 0.06/4 = 0.015
0.25/n = 0.015^2 = 0.000225
n = 0.25/0.000225 ≈ 1,111
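
The same algebra can be wrapped in a small Python function so that other interval widths are easy to try. This is a sketch: the function name and parameters are hypothetical. The default z = 2 is the rule of thumb used above (4 standard deviations across a 95% interval), and p = 0.5 is the worst case; the more precise 95% value z = 1.96 gives a slightly smaller n.

```python
def sample_size_for_width(width, z=2.0, p=0.5):
    """Sample size n for which a confidence interval has the given total width.

    Solves 2 * z * sqrt(p * (1 - p) / n) = width for n.
    Returns a float; round up to get a whole number of people.
    """
    half_width = width / 2
    return z ** 2 * p * (1 - p) / half_width ** 2


print(f"6% wide, z = 2:     n = {sample_size_for_width(0.06):.0f}")
print(f"6% wide, z = 1.96:  n = {sample_size_for_width(0.06, z=1.96):.0f}")
print(f"20% wide, z = 2:    n = {sample_size_for_width(0.20):.0f}")
```

Halving the interval width multiplies the required sample size by four, because n grows with the square of the precision.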

The Important Point What’s the point? The point is that you need about 1,100 people to be 95% sure that the vote shares you estimate from the sample are within 3 percentage points of the actual voting behavior of the population (a confidence interval 6% wide).

Things to Note This doesn’t mean that studies with fewer than a thousand people can’t tell us anything. What they tell us will just either come with less than 95% confidence or have error bars wider than 6%. If a confidence interval 20% wide is fine, you only need 100 people.

The Base Rate Still Matters It also doesn’t mean that 100 or 1,000 people is sufficient for any study. When the value of the variable being studied is rare in the population, you need more people. For example, if it’s 1 in 1 million, then most samples of 1,000 won’t contain it, but that doesn’t mean it’s at 0 prevalence.
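
To put a number on "most samples won't contain it", here is a minimal Python sketch. It assumes each person is sampled independently at random, and the function name is hypothetical.

```python
def prob_sample_has_no_cases(prevalence, sample_size):
    """Chance that a random sample contains zero people with the trait.

    Assumes each sampled person independently has the trait with
    probability equal to the prevalence.
    """
    return (1 - prevalence) ** sample_size


one_in_a_million = 1 / 1_000_000

for n in (1_000, 100_000, 1_000_000):
    p_none = prob_sample_has_no_cases(one_in_a_million, n)
    print(f"sample of {n:>9,}: P(no cases at all) = {p_none:.4f}")
```

For a 1 in 1 million trait, a sample of 1,000 comes up empty about 99.9% of the time, and even a sample of a million comes up empty more than a third of the time.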