Hypothesis testing
Summer Program
Brian Healy
Last class
Study design
– What is sampling variability?
– How does our sample affect the questions we can answer?
Basics of probability
Central limit theorem
Sample mean
What are we doing today?
Rare event
p-value
Hypothesis test
t-distribution / sample standard deviation
Big picture
We discussed last week that we could estimate the population mean with the sample mean, and that the central limit theorem tells us the distribution of the sample mean.
Now we are going to test whether the population mean is equal to a hypothesized value. The statement that the mean equals this value is called the null hypothesis. This test allows us to compare our sample to the hypothesized value in a statistically meaningful way.
Null hypothesis
We set up the null hypothesis so that we can reject it: the test is designed to disprove the null.
Stating the null is the first and most important step in any problem, and it requires knowledge of the problem.
Notation: H0
H0: My mother can run a 5 minute mile.
– Not: My mother cannot run a 5 minute mile.
H0: The probability of heads on the coin is 0.5.
– Not: The probability is not 0.5.
Alternative hypothesis
Notation: HA or H1
Has two characteristics:
– Must cover all values not included in the null
– Must contain the value that we think is going to happen
HA: My mother runs a mile slower than 5 minutes.
HA: The probability of heads is not 0.5.
Hypothesis test
Definition: a statistical test of a null hypothesis
Carried out under the assumption that the null is true (conditional probability)
We always want to disprove the null hypothesis
– One-sided example: H0: Mom's mean time <= 5:00, HA: Mom's mean time > 5:00
– Two-sided example: H0: Probability of heads = 0.5, HA: Probability of heads != 0.5
The most important step is properly defining the null and alternative hypotheses
How do we test this hypothesis?
Take a sample
As we have discussed, we want to think carefully about how to collect the sample so that we limit bias and confounding and allow the results to be generalized to the proper population.
From this sample, we compute a summary statistic and compare it to the null hypothesis:
– Mean (t-test, linear regression)
– Median (Wilcoxon tests, quantile regression)
What does this have to do with the CLT?
To test a hypothesis, we take a sample and find the sample mean
– Ex. Have my mom run a mile 10 times, or flip the coin 20 times
– Determining the proper sample size is next class
Under the null hypothesis, we know the population mean
We sometimes may know the population variance
Under these conditions, the CLT tells us that the sample mean is (approximately) normally distributed with known mean and variance
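A quick simulation sketch of the CLT claim above, assuming an Exponential(1) population (mean 1, standard deviation 1) chosen only because it is clearly non-normal, and a sample size of 35; the helper name `simulate_sample_means` is hypothetical. The sample means should center on the population mean and have standard deviation close to σ/√n.

```python
import math
import random
import statistics


def simulate_sample_means(n, reps, seed=1):
    """Draw `reps` samples of size n from an Exponential(1) population
    (mean 1, sd 1) and return the list of sample means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]


n = 35
means = simulate_sample_means(n, reps=5000)
print("mean of sample means:", round(statistics.fmean(means), 3))  # near mu = 1
print("sd of sample means:  ", round(statistics.stdev(means), 3))  # near sigma / sqrt(n)
print("sigma / sqrt(n):     ", round(1 / math.sqrt(n), 3))
```

A histogram of `means` would look roughly bell-shaped even though the population is skewed.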
Distribution of test statistic
Under the null hypothesis, we know that the sample mean $\bar{X}$ is (approximately) normally distributed with mean $\mu_0$ and standard deviation $\sigma/\sqrt{n}$, where $\mu_0$ is the hypothesized mean, $\sigma$ the population standard deviation, and $n$ the sample size.
Now we want to find the probability, under the null, of observing our sample mean or a value more extreme (the p-value). This tells us how surprising our data would be if the null hypothesis were true.
Have we observed a rare event? Is it rare enough to reconsider the null?
What is a rare event?
My mom claims that she runs a mile in 5 minutes. I think she can't.
How can I test this?
What happens if she ran a mile in
– 5:15 minutes?
– 6 minutes?
– 10 minutes?
What if she ran 5 separate miles at 10 minutes on average?
What is a rare event?
You play a game against a friend. In this game, you win a dollar if the coin lands heads and you lose a dollar if it lands tails.
What is the null hypothesis?
What if the coin landed on tails 2 consecutive times?
What if the coin landed on tails 10 consecutive times?
At what point would you start to get suspicious?
We want to know whether the event we observed could have happened simply by chance, or whether something else is more likely going on.
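To put numbers on the coin questions, here is a minimal sketch assuming a fair coin (the null hypothesis) and independent tosses; the helper name `prob_all_tails` is hypothetical. A run of tails of length k has probability 0.5^k under the null.

```python
def prob_all_tails(k, p_tails=0.5):
    """Probability that k independent tosses all land tails when P(tails) = p_tails."""
    return p_tails ** k


for k in (2, 5, 10):
    print(f"{k:>2} tails in a row: {prob_all_tails(k)}")
# 2 -> 0.25, 5 -> 0.03125, 10 -> 0.0009765625 (about 1 in 1000)
```

Two tails in a row happens a quarter of the time, so it is not surprising; ten in a row happens about once in a thousand tries.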
P-value
Tells you how rare the event is
Definition: given a null hypothesis, the probability of the observed value or something more extreme
P(event or something more extreme | H0 is true)
Ex. Coin toss problem
– Null hypothesis: P(tails) = 0.5
– Sample: 9 out of 10 tails
– P(9 or more tails | H0 is true) = P(9 tails | H0 is true) + P(10 tails | H0 is true) = 10/1024 + 1/1024 = 0.011
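The coin-toss p-value above is a binomial upper-tail probability. A short sketch (the helper name `binom_upper_tail` is hypothetical; in R, `1 - pbinom(8, 10, 0.5)` gives the same value) that sums P(X = j) for every outcome at least as extreme as the one observed:

```python
from math import comb


def binom_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the one-sided p-value for observing
    k or more successes when H0 says the success probability is p."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))


p_value = binom_upper_tail(9, 10)   # 9 or more tails out of 10 under P(tails) = 0.5
print(round(p_value, 3))            # 0.011, i.e. 11/1024
```

This is a one-sided p-value, matching the calculation on the slide, which only counts "9 or more tails".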
Alpha level: type I error
Definition: the probability of rejecting the null hypothesis when the null hypothesis is in fact true (the rejection probability under the null).
Usually 0.05 or 0.1, but set by the investigator
Compare the p-value to the alpha level to determine whether you have a significant result. The alpha level defines how rare an event needs to be for us to say that the event did not occur by chance.
It is called an error because, when the null hypothesis is true, the conclusion that the result was not due to chance is wrong 100α% of the time (5% of the time for α = 0.05).
Can be one-sided or two-sided
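A simulation sketch of what the alpha level means: generate many samples for which the null hypothesis is exactly true and count how often a two-sided z-test rejects at α = 0.05. It assumes normal data with known σ, and borrows μ0 = 8.5, σ = 2.6, n = 35 from the upcoming example purely for concreteness; the function name is hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean


def simulate_type_i_error(mu0, sigma, n, alpha=0.05, reps=10000, seed=2):
    """Fraction of two-sided z-tests that reject H0: mu = mu0 when the data
    really come from N(mu0, sigma^2), i.e. when H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    se = sigma / math.sqrt(n)                     # sd of the sample mean
    rejections = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu0, sigma) for _ in range(n))
        if abs((xbar - mu0) / se) > z_crit:
            rejections += 1
    return rejections / reps


print(simulate_type_i_error(8.5, 2.6, 35))  # should be close to 0.05
```

Every rejection here is a type I error, so the rejection rate estimates α.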
Steps for hypothesis testing
1) State null and alternative hypotheses
2) State type of test and alpha level
3) Determine and calculate appropriate test statistic
4) Calculate p-value
5) Decide whether to reject or not reject the null hypothesis (NEVER accept the null)
6) Write conclusion
Example
A study in New Bedford looked at pregnant teens to see how long after conception each young woman arrived at the physician's office for her first visit, and how much time passed between the first and second visits.
Questions: Do teens from a low-income area arrive at a clinic later than the average woman? Is there more time between the first and second visits among these teens?
It is known that the average time from conception until a woman first visits her doctor is 8.5 weeks (this number is an estimate because it is difficult to know exactly when conception occurred), and the average time from the first visit to the second visit is 4.3 weeks.
For the moment, let's assume that we know the population standard deviations of these times are 2.6 weeks and 2.2 weeks, respectively.
We have collected a sample of 35 pregnant teens, and we would like to know whether they take longer to get their first visit than the average woman.
Sample data
As with all of the data sets from now on, the data are on the BIO232 website.
Let's determine the mean for this sample and compare it to the hypothesized value.
preg <- read.table("preg.dat", header = TRUE)
first <- preg[, 1]   # first column: time to the first visit (weeks)
mean(first)          # this is the sample mean
[1] 9.74
So the sample mean is clearly not equal to the hypothesized population mean (8.5 weeks), but is it sufficiently different to say that these girls differ from the population?
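Steps for hypothesis testing
1) Null: $\mu = 8.5$ weeks; Alternative: $\mu \neq 8.5$ weeks
2) One-sample z-test (population standard deviation known), alpha = 0.05
3) Test statistic: $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{9.74 - 8.5}{2.6/\sqrt{35}} = 2.82$
4) Area in upper tail = 0.0024, two-sided p-value = 2 × 0.0024 = 0.0048
5) Reject null (0.0048 < 0.05)
6) Conclusion: There is a difference in the amount of time from conception to the first visit to a physician. The time is longer for the pregnant teens.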
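The arithmetic in steps 3 and 4 can be checked with a short Python sketch that uses only the standard library (`statistics.NormalDist().cdf` plays the role of R's `pnorm` for a standard normal); the inputs are the numbers given on the slides.

```python
import math
from statistics import NormalDist

xbar, mu0, sigma, n = 9.74, 8.5, 2.6, 35   # sample mean, null mean, known sd, sample size

se = sigma / math.sqrt(n)                  # standard deviation of the sample mean
z = (xbar - mu0) / se                      # test statistic
upper_area = 1 - NormalDist().cdf(z)       # area in the upper tail
p_two_sided = 2 * upper_area               # two-sided p-value

print(round(z, 2), round(upper_area, 4), round(p_two_sided, 4))
# z = 2.82, upper area = 0.0024, two-sided p = 0.0048
```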
Picture
[Figure: null distribution of the sample mean centered at 8.5 weeks; shaded upper-tail area beyond the observed mean = 0.0024]
Normal hypothesis test in R
To complete a normal hypothesis test in R, you can use the pnorm command with the appropriate mean and standard deviation. Remember that, by default, pnorm gives the area in the lower tail.
Because we are testing a sample mean, the standard deviation to use is that of the sample mean, $\sigma/\sqrt{n}$, not $\sigma$ itself. For the previous problem, to get the two-sided p-value, use
2 * (1 - pnorm(9.74, 8.5, 2.6 / sqrt(35)))
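A small Python analogue of the pnorm call, written as a hypothetical helper named `pnorm` that, like R's default, returns the lower-tail area. It shows that working on the weeks scale with standard deviation σ/√n and working on the standardized z scale give the same p-value.

```python
import math
from statistics import NormalDist


def pnorm(q, mean=0.0, sd=1.0):
    """Hypothetical helper: lower-tail normal probability P(X <= q),
    mirroring the default behaviour of R's pnorm."""
    return NormalDist(mean, sd).cdf(q)


se = 2.6 / math.sqrt(35)                      # sd of the sample mean, not sigma
p_raw = 2 * (1 - pnorm(9.74, 8.5, se))        # on the weeks scale
p_std = 2 * (1 - pnorm((9.74 - 8.5) / se))    # on the standardized z scale
print(round(p_raw, 4), round(p_std, 4))       # both about 0.0048
```

Passing sd = 2.6 instead of 2.6/√35 would describe a single observation rather than the mean of 35, and gives a far larger two-sided p-value (about 0.63).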
Another way to look at the test
Given a specific alpha level, you can find the cut-off beyond which the null hypothesis would be rejected for any more extreme value.
The region beyond the cut-off is called the rejection region.
For our present problem (two-sided, alpha = 0.05, so an area of 0.025 in each tail), the upper cut-off for the rejection region is $8.5 + 1.96 \times 2.6/\sqrt{35} = 9.36$ weeks (and, by symmetry, the lower cut-off is 7.64 weeks). Our sample mean of 9.74 weeks falls in the rejection region.
[Figure: null distribution centered at 8.5 with upper-tail area 0.025 beyond the cut-off 9.36]
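The cut-offs can be found from the inverse normal CDF of the null distribution of the sample mean (in R this is `qnorm`); a minimal sketch with the example's numbers:

```python
import math
from statistics import NormalDist

mu0, sigma, n, alpha = 8.5, 2.6, 35, 0.05
null_dist = NormalDist(mu0, sigma / math.sqrt(n))   # distribution of the sample mean under H0

lower_cut = null_dist.inv_cdf(alpha / 2)        # area alpha/2 below
upper_cut = null_dist.inv_cdf(1 - alpha / 2)    # area alpha/2 above
print(round(lower_cut, 2), round(upper_cut, 2))  # 7.64 9.36

xbar = 9.74
print("reject H0:", xbar < lower_cut or xbar > upper_cut)  # True
```

This gives the same decision as comparing the p-value to alpha; it just expresses the rule on the scale of the sample mean.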
Practice
Here are the times (in minutes) my mom ran in the 10 trials. Test the null hypothesis that she can run a 9:00 mile on average.
mom <- c(9.5, 10, 8.75, 9, 11.2, 8.65, 9.6, 10.2, 8.8, 9.8)
What are the null and alternative hypotheses?
What do you conclude?
What would have happened if we had completed a two-sided test?
Comparison of one-sided and two-sided tests
For a symmetric test statistic such as z or t, the two-sided p-value is twice the one-sided p-value (when the one-sided alternative points in the direction of the observed effect).
The two-sided test is more conservative because the rejection region is split between the high and low sides. For the one-sided test, the rejection region is only on the side of interest.
Two-sided tests are most common in the literature, even though people usually know the direction of the effect they are interested in detecting.
[Figure: rejection regions for one-sided and two-sided tests]
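A sketch comparing the two kinds of p-value for a standard normal test statistic. The value z = 1.8 is a hypothetical choice made to show a case where the one-sided test rejects at 0.05 and the two-sided test does not; z = 2.82 is the pregnant-teens statistic. The helper name is hypothetical.

```python
from statistics import NormalDist


def z_p_values(z):
    """One-sided (upper-tail) and two-sided p-values for a standard normal statistic."""
    upper = 1 - NormalDist().cdf(z)
    two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return upper, two_sided


for z in (1.8, 2.82):
    one, two = z_p_values(z)
    print(f"z = {z}: one-sided p = {one:.4f}, two-sided p = {two:.4f}")
```

For z = 1.8 the one-sided p-value is about 0.036 and the two-sided p-value about 0.072, which is what "more conservative" means in practice.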
Wait a minute
Up to now, we assumed we know the population variance (is this a good assumption?)
How could we estimate the population variance?
– Sample variance!
– Is the sample variance exactly equal to the population variance?
– How can we account for the additional uncertainty?
Now, we need to do a little math
t-distribution
Assume the $X_i$ are iid $N(\mu, \sigma^2)$, $i = 1, \dots, n$
Normal distribution: $U = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$
Chi-square distribution: $V = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$, independent of $U$ (proof of this is given in Casella and Berger and in Inference I)
t-distribution: ratio of the normal ($U$) and the chi-square ($V$): $T = \frac{U}{\sqrt{V/(n-1)}} = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$
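Filling in the algebra: substituting $U$ and $V$ into the ratio, the unknown $\sigma$ cancels, which is why the resulting statistic can be computed from the data alone (this relies on $U$ and $V$ being independent, which holds for iid normal samples).

```latex
T = \frac{U}{\sqrt{V/(n-1)}}
  = \frac{(\bar{X} - \mu)/(\sigma/\sqrt{n})}{\sqrt{\dfrac{(n-1)S^2}{\sigma^2}\Big/(n-1)}}
  = \frac{(\bar{X} - \mu)/(\sigma/\sqrt{n})}{S/\sigma}
  = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}
```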
t-distribution
Heavier tails than the normal distribution
– Accounts for the additional variability from estimating σ
– Tails are heavier with fewer degrees of freedom (dof)
As dof goes to infinity, the t distribution approaches the normal distribution
We can use the t test statistic just as we used the normal test statistic previously
Remember the assumption that the underlying data are normal
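To see the heavier tails numerically, this sketch computes P(T > 2) for several degrees of freedom and compares it with P(Z > 2). It assumes no statistics library beyond Python's standard one, so the t tail area is obtained by integrating the t density with Simpson's rule (in R, `1 - pt(2, df)` gives the same quantity directly); `t_pdf` and `t_upper_tail` are hypothetical helper names.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=20000):
    """P(T > t) for t >= 0: 0.5 minus the integral of the density over [0, t],
    computed with Simpson's rule (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


normal_tail = 1 - NormalDist().cdf(2.0)
for df in (1, 5, 30, 1000):
    print(f"df={df:>4}: P(T > 2) = {t_upper_tail(2.0, df):.4f}")
print(f"normal:   P(Z > 2) = {normal_tail:.4f}")
```

The tail area shrinks as the degrees of freedom grow and approaches the normal value from above.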
Example
We can use a t-test to test the second null hypothesis about our pregnant teens, namely that the mean time from the first visit to the second visit is the same as in the general population (4.3 weeks).
First, we need to check that the underlying distribution is approximately normal.
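Steps for hypothesis testing
1) Null: $\mu = 4.3$ weeks; Alternative: $\mu \neq 4.3$ weeks
2) One-sample t-test, alpha = 0.05
3) Test statistic: $t = \frac{\bar{x} - 4.3}{s/\sqrt{35}}$, compared to a t distribution with 34 degrees of freedom
4) p-value = … (two-sided, from the $t_{34}$ distribution)
5) Reject null
6) Conclusion: There is a difference in the amount of time from the first visit to the second visit. The time is longer for the pregnant teens.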
One sample t-test in R
To complete a t-test in R, use
> second <- preg[, 2]   # assumed: second column holds the time from first to second visit
> t.test(second, mu = 4.3)

        One Sample t-test

data:  second
t = …, df = 34, p-value = …
alternative hypothesis: true mean is not equal to 4.3
95 percent confidence interval:
 …
sample estimates:
mean of x
 …
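A sketch of the calculation behind the two-sided one-sample t-test, in plain Python. It is not R's implementation: the t tail area is computed by Simpson's-rule integration of the t density, used here only as a stand-in for a library t CDF, and the names `t_pdf`, `t_upper_tail`, and `one_sample_t_test` are hypothetical. It assumes the data are roughly normal.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=20000):
    """P(T > t) for t >= 0 via Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def one_sample_t_test(values, mu0):
    """Two-sided one-sample t-test of H0: mu = mu0. Returns (t, df, p-value)."""
    n = len(values)
    xbar = statistics.fmean(values)
    s = statistics.stdev(values)              # sample sd with n - 1 denominator
    t = (xbar - mu0) / (s / math.sqrt(n))     # same form as (xbar - mu0) / (s / sqrt(n))
    df = n - 1
    p = 2 * t_upper_tail(abs(t), df)          # area in both tails
    return t, df, p
```

Called as `one_sample_t_test(values, 4.3)` on the 35 times between visits, it should reproduce the t, df, and p-value that `t.test(second, mu = 4.3)` reports, up to small numerical error.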
Practice
Using the class data set, test the following hypotheses:
– The average age of an incoming student to the biostat program is 25. Is the mean age of this year's class significantly different? Is there anything we need to consider in this analysis?
– The average height of an incoming student is 71 inches. Is the mean height of this year's class significantly shorter?
More practice
The TV watching habits of my seventh grade classes are shown in the dataset TV.dat from the course website. The gender and age of the students are also given. How did my students' TV watching habits compare to the national average for 7th graders of 4 hours/day? Use an alpha level of 0.01.