Today: Hypothesis testing and the p-value
Example: Paul the Octopus
In 2010, Paul the Octopus predicted 8 World Cup games, and predicted them all correctly. Is this evidence that Paul’s chance of guessing correctly, p, is really greater than 50%? What are the null and alternative hypotheses?
a) H 0 : p ≠ 0.5, H 1 : p = 0.5
b) H 0 : p = 0.5, H 1 : p ≠ 0.5
c) H 0 : p = 0.5, H 1 : p > 0.5
d) H 0 : p > 0.5, H 1 : p = 0.5
Statistical Evidence
For testing H 0 : p = 0.5 against H 1 : p > 0.5 with a sample of size n, which of these sample statistics do you think is statistically significant against the null hypothesis?
If it is very unusual, we have statistically significant evidence against the null hypothesis Today’s Question: How do we measure how unusual a sample statistic is, if H 0 is true? How unusual is it to see a sample statistic as extreme as that observed, if H 0 is true?
To see if a statistic provides evidence against H 0, we need to see what kind of sample statistics we would observe, just by random chance, if H 0 were true Measuring Evidence against H 0
We need to know what kinds of statistics we would observe just by random chance, if the null hypothesis were true How could we figure this out??? Simulate many samples of size n = 8 with p = 0.5 Example: Paul the Octopus
Simulate! We can simulate this with a coin! Each coin flip = a guess between two teams (Heads = correct, Tails = incorrect). Flip a coin 8 times, count the number of heads, and calculate the sample proportion of heads. Did you get all 8 heads (correct)? (a) Yes (b) No. How extreme is Paul’s sample proportion of 1?
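The coin-flip exercise can also be run on a computer. The following is a minimal sketch, assuming a fair coin (p = 0.5) and using only Python's standard `random` module; the function name `simulate_sample_proportion` and the seed are illustrative choices, not part of the original slides.

```python
import random

def simulate_sample_proportion(n=8, p=0.5, rng=None):
    """Flip a coin n times (heads = correct guess) and return the proportion of heads."""
    if rng is None:
        rng = random.Random()
    # rng.random() is uniform on [0, 1), so it falls below p with probability p
    heads = sum(1 for _ in range(n) if rng.random() < p)
    return heads / n

rng = random.Random(0)  # fixed seed (an arbitrary choice) so the run is repeatable
p_hat = simulate_sample_proportion(rng=rng)
print("sample proportion of heads:", p_hat)
print("all 8 correct?", p_hat == 1.0)
```

Running it a few times with different seeds plays the role of different students flipping their own coins.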
Simulate the statistics we would observe just by random chance, if the null hypothesis were true
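One way to build this distribution is to repeat the 8-flip experiment many times and tally the resulting sample proportions. The sketch below assumes 1000 repetitions with n = 8 and p = 0.5; the helper name `simulate_null_distribution` and the seed are hypothetical.

```python
import random
from collections import Counter

def simulate_null_distribution(n=8, p0=0.5, reps=1000, seed=1):
    """Return `reps` sample proportions, each from n guesses made with success chance p0."""
    rng = random.Random(seed)
    return [sum(rng.random() < p0 for _ in range(n)) / n for _ in range(reps)]

null_props = simulate_null_distribution()

# Crude text dotplot: one '*' per 10 simulated samples at each possible proportion
counts = Counter(null_props)
for value in sorted(counts):
    print(f"{value:.3f} {'*' * (counts[value] // 10)}")
```

Most simulated proportions pile up near 0.5, which is what makes a sample proportion of 1 stand out.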
Quantifying Evidence We need a way to quantify evidence against the null…
p-value The p-value is the chance of obtaining a sample statistic as extreme (or more extreme) than the observed sample statistic, if the null hypothesis is true The p-value can be calculated as the proportion of statistics in the simulated distribution that are as extreme (or more extreme) than the observed sample statistic
p-value: Paul the Octopus
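[Figure: distribution of sample proportions from 1000 simulations, with the observed statistic marked and the proportion as extreme as the observed statistic (the p-value) highlighted.] If Paul is just guessing, the chance of him getting all 8 correct is (0.5)^8 ≈ 0.0039.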
Calculating a p-value
1. What kinds of statistics would we get, just by random chance, if the null hypothesis were true? (We looked at the simulated distribution)
2. What proportion of these statistics are as extreme as our original sample statistic? (p-value)
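The two steps can be sketched together as follows. This is an illustration under stated assumptions (upper-tail alternative, 1000 simulated samples, n = 8, p = 0.5); the function name `simulated_p_value` and the seed are not from the slides.

```python
import random

def simulated_p_value(observed, n=8, p0=0.5, reps=1000, seed=2):
    """Upper-tail p-value: share of simulated proportions at least as large as `observed`."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(reps):
        # Step 1: one sample statistic generated as if the null hypothesis were true
        p_hat = sum(rng.random() < p0 for _ in range(n)) / n
        # Step 2: check whether it is as extreme as the observed statistic
        if p_hat >= observed:
            extreme += 1
    return extreme / reps

p_val = simulated_p_value(observed=1.0)  # Paul: 8 out of 8 correct
print("simulated p-value:", p_val)
```

Because the result comes from random simulation, it changes a little with the seed and the number of repetitions; more repetitions give a steadier estimate.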
A one-sided alternative contains either > or < A two-sided alternative contains ≠ The p-value is the proportion in the tail in the direction specified by H 1 For a two-sided alternative, the p-value is twice the proportion in a single tail Alternative Hypothesis
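For a count of correct guesses, the tail proportions can also be computed exactly from the binomial distribution instead of by simulation. The sketch below is one possible implementation; it assumes that "a single tail" in the two-sided case means the smaller of the two tails, and it caps the doubled value at 1. The names `binom_pmf` and `p_value` are illustrative.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success chance p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def p_value(k_obs, n, p0=0.5, alternative="greater"):
    """Exact binomial p-value for H0: p = p0, in the direction given by H1."""
    upper = sum(binom_pmf(k, n, p0) for k in range(k_obs, n + 1))  # right tail
    lower = sum(binom_pmf(k, n, p0) for k in range(0, k_obs + 1))  # left tail
    if alternative == "greater":    # H1: p > p0
        return upper
    if alternative == "less":       # H1: p < p0
        return lower
    if alternative == "two-sided":  # H1: p != p0 (assumption: double the smaller tail)
        return min(1.0, 2 * min(upper, lower))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

for alt in ("greater", "less", "two-sided"):
    print(alt, p_value(8, 8, alternative=alt))
```

For Paul's 8 out of 8, the upper-tail value equals (0.5)^8, and the two-sided value is twice that.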
p-value and H 1 Upper-tail (Right Tail) Lower-tail (Left Tail) Two-tailed
p-value and H 0 If the p-value is small, then a statistic as extreme as that observed would be unlikely if the null hypothesis were true, providing significant evidence against H 0 The smaller the p-value, the stronger the evidence against the null hypothesis and in favor of the alternative
p-value and H 0
The smaller the p-value, the stronger the evidence against H 0.
Formal Decisions
If the p-value is small: REJECT H 0
    the sample would be extreme if H 0 were true
    the results are statistically significant
    we have evidence for H 1
If the p-value is not small: DO NOT REJECT H 0
    the sample would not be too extreme if H 0 were true
    the results are not statistically significant
    the test is inconclusive; either H 0 or H 1 may be true
H 0 : X is not guilty H 1 : X is guilty If the evidence proves the defendant guilty beyond a reasonable doubt, the verdict is 'guilty'. If the evidence is not enough to prove the defendant guilty, a 'not guilty' verdict does not mean the judge or jury concluded that the defendant is innocent -- it just means that the evidence was not strong enough to persuade the judge or jury that the defendant was guilty. Guilty or not Guilty
“For the logical fallacy of believing that a hypothesis has been proved to be true, merely because it is not contradicted by the available facts, has no more right to insinuate itself in statistical than in other kinds of scientific reasoning…” -Sir R. A. Fisher Never Accept H 0 “Do not reject H 0 ” is not the same as “accept H 0 ”! Lack of evidence against H 0 is NOT the same as evidence for H 0 !
Formal Decisions
A formal hypothesis test has only two possible conclusions:
1. The p-value is small: reject the null hypothesis in favor of the alternative
2. The p-value is not small: do not reject the null hypothesis
How small?
Significance Level
The significance level, α, is the threshold below which the p-value is deemed small enough to reject the null hypothesis. It is also the probability that a Type I error occurs when H 0 is true.
p-value < α: Reject H 0
p-value ≥ α: Do not reject H 0
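The decision rule can be written as a short function. This is a minimal sketch; the name `decide` and the default α = 0.05 are assumptions made for illustration, and a p-value exactly equal to α is treated here as "do not reject".

```python
def decide(p_value, alpha=0.05):
    """Formal decision of a hypothesis test at significance level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    # Small p-value: the sample would be extreme if H0 were true
    return "reject H0" if p_value < alpha else "do not reject H0"

paul_p_value = 0.5 ** 8  # chance of 8 correct guesses out of 8 if p = 0.5
print(decide(paul_p_value))
print(decide(0.20))
```

Note that the function never returns "accept H0": a p-value that is not small only means the test is inconclusive.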
Type I Error and Type II Error
“Guilty or not Guilty” example: H 0 : X is not guilty, H 1 : X is guilty
Type I error: an innocent person goes to jail
Type II error: a criminal goes unpunished
Choosing α By default, usually α = 0.05 If a Type I error (rejecting a true null) is much worse than a Type II error, we may choose a smaller α, like α = 0.01 If a Type II error (not rejecting a false null) is much worse than a Type I error, we may choose a larger α, like α = 0.10
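The trade-off behind the choice of α can be illustrated with exact binomial calculations. The sketch below assumes the Paul setting (n = 8, H 0 : p = 0.5, H 1 : p > 0.5) and a purely hypothetical true success chance of 0.8 for the Type II calculation; the names `upper_tail` and `error_rates` are illustrative.

```python
from math import comb

def upper_tail(k, n, p):
    """P(X >= k) when X counts successes in n trials with success chance p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

def error_rates(alpha, n=8, p0=0.5, p_true=0.8):
    """Type I and Type II error rates of the upper-tail test at level alpha.

    p_true is a hypothetical alternative value, used only to compute the Type II rate.
    """
    # Counts whose p-value under H0 falls below alpha form the rejection region
    rejecting = [k for k in range(n + 1) if upper_tail(k, n, p0) < alpha]
    if not rejecting:
        return 0.0, 1.0  # the test can never reject
    k_crit = min(rejecting)
    type1 = upper_tail(k_crit, n, p0)          # reject although H0 is true
    type2 = 1 - upper_tail(k_crit, n, p_true)  # fail to reject although p = p_true
    return type1, type2

for alpha in (0.01, 0.05, 0.10):
    t1, t2 = error_rates(alpha)
    print(f"alpha={alpha:.2f}  Type I rate={t1:.4f}  Type II rate={t2:.4f}")
```

With counts this small the actual Type I rate sits below α rather than exactly at it, because only whole numbers of correct guesses are possible.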
Class Project
Find a question that you would like to investigate. (You may find an idea from the news or the web.) Address the following in 3-5 pages:
1. What is the background of the study?
2. Find existing data or collect data.
3. Perform exploratory data analysis.
4. Construct your hypotheses regarding the research question.
5. Perform a hypothesis test.
6. State your conclusion.
Project Proposal due Oct 30th
Submit a half-page document: Describe the question that you want to investigate, i.e., the population parameter of interest. Describe the data that you obtained. If found online, provide the link.