Stat 251 (2009, Summer) Final Lab TA: Yu, Chi Wai
Hypothesis Testing and Confidence Intervals
One-sample test
Two-sample test
(i) Independent samples
1) Equal variance
2) Unequal variance
(ii) Dependent samples (paired samples)
One sample t test
Two-sided: H0: μ = μ0 versus H1: μ ≠ μ0
One-sided: H0: μ = μ0 versus H1: μ > μ0, OR H0: μ = μ0 versus H1: μ < μ0
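Test statistic
t = (x̄ - μ0) / (s / √n), where x̄ is the sample mean, s is the sample standard deviation and n is the sample size. Under H0, t follows a t distribution with n - 1 degrees of freedom.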
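As a rough illustration, the one-sample statistic t = (x̄ - μ0) / (s / √n) can be computed by hand and compared with what t.test() reports. The vector x and the value mu0 below are made-up numbers for illustration only; they are not the biotest.txt data.

```r
# Hypothetical sample (NOT the biotest.txt data), for illustration only
x <- c(12.1, 11.4, 13.0, 12.7, 11.9, 12.5, 13.3, 11.8)
mu0 <- 12                      # hypothesized mean under H0 (assumed value)

n <- length(x)
t_manual <- (mean(x) - mu0) / (sd(x) / sqrt(n))   # t = (xbar - mu0) / (s / sqrt(n))
df_manual <- n - 1                                # degrees of freedom

res <- t.test(x, mu = mu0)     # two-sided by default
t_builtin <- unname(res$statistic)
df_builtin <- unname(res$parameter)

# The two columns should agree
print(rbind(manual = c(t = t_manual, df = df_manual),
            builtin = c(t = t_builtin, df = df_builtin)))
```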
Based on the test statistic, we have two ways to test the hypothesis:
1) Compare the test statistic with a critical value;
2) Get a p-value, and then compare it with a significance level α, say α = 0.05 or 0.1.
If the p-value is LESS than the significance level, then we conclude that we have enough evidence to reject H0 at the significance level α.
If the p-value is GREATER than the significance level, then we conclude that we DO NOT have enough evidence to reject H0 at the significance level α.
We DO NOT say that we ACCEPT H0, because "we DO NOT have enough evidence to reject H0 at the significance level α" is not the same as "we have enough evidence to accept H0 at the significance level α".
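The two decision rules (critical value and p-value) can be sketched side by side for a two-sided one-sample test. The data x, the value mu0 and the level alpha below are assumptions chosen only for illustration; both rules should always lead to the same decision.

```r
# Hypothetical sample (NOT the biotest.txt data), for illustration only
x <- c(12.1, 11.4, 13.0, 12.7, 11.9, 12.5, 13.3, 11.8)
mu0 <- 12        # hypothesized mean under H0 (assumed value)
alpha <- 0.05    # significance level

n <- length(x)
df <- n - 1
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Way 1: compare |t| with the two-sided critical value
t_crit <- qt(1 - alpha / 2, df)
reject_by_critical <- abs(t_stat) > t_crit

# Way 2: compute the two-sided p-value and compare it with alpha
p_value <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)
reject_by_pvalue <- p_value < alpha

print(c(t = t_stat, critical = t_crit, p = p_value))
print(c(by_critical = reject_by_critical, by_pvalue = reject_by_pvalue))
```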
Example
1) Download the "biotest.txt" data file.
2) Read it into R using the function read.table().
3) Extract the 1st column and store it as 'x1'.
4) Store the 2nd column as 'x2'.
Example
x1 = read.table("biotest.txt")[, 1]
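A self-contained sketch of the same read-and-extract steps is given here. It writes a tiny made-up two-column file to a temporary location so that it runs without biotest.txt; the file contents are hypothetical, and it assumes a file with no header row (if the real file has one, header = TRUE would be needed).

```r
# Write a small hypothetical two-column file so the sketch runs on its own;
# in the lab, "biotest.txt" would take the place of tmp.
tmp <- tempfile(fileext = ".txt")
writeLines(c("10 20", "11 22", "12 24"), tmp)

dat <- read.table(tmp)   # no header assumed; columns are named V1 and V2
x1 <- dat[, 1]           # 1st column
x2 <- dat[, 2]           # 2nd column

print(x1)
print(x2)
```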
Example 1
115 112 107 119 138 126 105 104
Take 'x1' as the sample in this case.
Test H0: μ = 115 versus H1: μ ≠ 115 with the significance level α = 0.05.
Find the p-value of this test.
[R] command: t.test
t.test( , alternative=" ", mu= )
First argument: the data set
alternative: "two.sided", "less", or "greater"
mu: the value of μ under H0, i.e. μ0
Test H0: μ = 115 versus H1: μ ≠ 115 with the significance level α = 0.05.
Alternative hypothesis: two-sided, baby!
t.test(x1, alternative="two.sided", mu=115)
Data set: x1; alternative: two.sided; value of μ under H0: 115
        One Sample t-test

data:  x1
t = 0.1841, df = 9, p-value = 0.858
alternative hypothesis: true mean is not equal to 115
95 percent confidence interval:
 108.2257 122.9743
sample estimates:
mean of x
    115.6
p-value = 0.858 > α = 0.05, so we DO NOT have enough evidence to reject H0 at the significance level α = 0.05.
Conclusion
Testing H0: μ = 115 versus H1: μ ≠ 115 with the significance level α = 0.05, p-value = 0.858.
We DO NOT have enough evidence to say that the population mean of x1 is significantly different from 115 at the level of significance α = 0.05.
95% confidence interval: 108.2257 122.9743
μ0 = 115 is inside this 95% confidence interval for μ, which agrees with not rejecting H0 in the two-sided test at α = 0.05.
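The link between the two-sided test and the confidence interval can be checked directly: μ0 lies inside the (1 - α) interval exactly when the two-sided test does not reject H0 at level α. The sketch below uses made-up data and an assumed mu0; the component names conf.int and p.value are those returned by t.test().

```r
# Hypothetical sample (NOT the biotest.txt data), for illustration only
x <- c(12.1, 11.4, 13.0, 12.7, 11.9, 12.5, 13.3, 11.8)
mu0 <- 12        # hypothesized mean under H0 (assumed value)
alpha <- 0.05

res <- t.test(x, alternative = "two.sided", mu = mu0, conf.level = 1 - alpha)

ci <- res$conf.int                          # lower and upper limits
inside_ci <- ci[1] <= mu0 && mu0 <= ci[2]   # is mu0 inside the interval?
not_rejected <- res$p.value >= alpha        # is H0 not rejected?

print(ci)
print(c(inside_ci = inside_ci, not_rejected = not_rejected))
```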
Example 2
Test H0: μ = 108 versus H1: μ > 108 with the significance level α = 0.05.
One-sided (greater) test with the significance level α = 0.05:
t.test(x1, alternative="greater", mu=108)
Data set: x1; alternative: greater; value of μ under H0: 108
        One Sample t-test

data:  x1
t = 2.3314, df = 9, p-value = 0.02232
alternative hypothesis: true mean is greater than 108
95 percent confidence interval:
 109.6243      Inf
sample estimates:
mean of x
    115.6

p-value = 0.02232 < α = 0.05, so we have enough evidence to reject H0 at the significance level α = 0.05 (i.e. accept H1).
Conclusion
Testing H0: μ = 108 versus H1: μ > 108 with the significance level α = 0.05, p-value = 0.02232.
We have enough evidence to say that the population mean of x1 is significantly greater than 108 at the level of significance α = 0.05.
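To see how the choice of alternative changes the p-value, the same test can be run with all three settings. In this sketch (made-up data and an assumed mu0), the "greater" and "less" p-values add up to 1, and the two-sided p-value is twice the smaller of the two.

```r
# Hypothetical sample (NOT the biotest.txt data), for illustration only
x <- c(12.1, 11.4, 13.0, 12.7, 11.9, 12.5, 13.3, 11.8)
mu0 <- 11.5      # hypothesized mean under H0 (assumed value)

p_two     <- t.test(x, alternative = "two.sided", mu = mu0)$p.value
p_greater <- t.test(x, alternative = "greater",   mu = mu0)$p.value
p_less    <- t.test(x, alternative = "less",      mu = mu0)$p.value

print(c(two.sided = p_two, greater = p_greater, less = p_less))
```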
By default, the function t.test() reports a 95% confidence interval.
How do we get a confidence interval at another level, i.e. how do we change the confidence level to, say, 99%?
One-sided (greater) test with the significance level α = 0.05:
t.test(x1, alternative="greater", mu=108)
Find the 99% confidence interval for μ:
t.test(x1, alternative="greater", mu=108, conf.level=0.99)
Note that with alternative="greater" the reported interval is one-sided (a lower bound and Inf); use alternative="two.sided" to get a two-sided 99% interval.
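The effect of conf.level and of the alternative on the reported interval can be inspected through the conf.int component of the t.test() result. The data and mu below are made-up values for illustration only.

```r
# Hypothetical sample (NOT the biotest.txt data), for illustration only
x <- c(12.1, 11.4, 13.0, 12.7, 11.9, 12.5, 13.3, 11.8)

res_greater <- t.test(x, alternative = "greater",   mu = 12, conf.level = 0.99)
res_two     <- t.test(x, alternative = "two.sided", mu = 12, conf.level = 0.99)

ci_greater <- res_greater$conf.int   # one-sided: lower bound and Inf
ci_two     <- res_two$conf.int       # two-sided: lower and upper limits

print(ci_greater)
print(ci_two)
print(attr(ci_two, "conf.level"))    # the confidence level that was used
```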
Two samples
a) Independent samples
i) Equal variance
ii) Different variances
b) Dependent samples (paired t test)
One sample t test
Two-sided: H0: μ = μ0 versus H1: μ ≠ μ0
One-sided: H0: μ = μ0 versus H1: μ > μ0 (or H1: μ < μ0)
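Two sample t test
Interest: the mean difference μ1 - μ2
Two-sided: H0: μ1 - μ2 = 0 versus H1: μ1 - μ2 ≠ 0
One-sided: H0: μ1 - μ2 = 0 versus H1: μ1 - μ2 > 0 (or H1: μ1 - μ2 < 0)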
a) Independent samples
x1 = read.table("biotest.txt")[, 1]
x2 = read.table("biotest.txt")[, 2]
We want to test whether there is a significant difference between the mean of x1 and that of x2.
i) Equal variance (two-sided, two samples)
t.test(x1, x2, alternative="two.sided", mu=0, var.equal=TRUE)
x1, x2: input the data sets of the two samples.
alternative="two.sided", mu=0: two-sided test; under H0, assume that the true mean difference is 0.
var.equal=TRUE: assume that the two samples have equal variance.
        Two Sample t-test (equal variance, a pooled variance estimate)

data:  x1 and x2
t = -0.9052, df = 18, p-value = 0.3773
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -15.940831   6.340831
sample estimates:
mean of x mean of y
    115.6     120.4
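The pooled-variance statistic behind var.equal=TRUE can be sketched by hand, assuming the usual pooled estimate sp² = ((n1 - 1)s1² + (n2 - 1)s2²) / (n1 + n2 - 2) with n1 + n2 - 2 degrees of freedom. The vectors x_h and y_h are made-up samples, not the biotest.txt columns.

```r
# Hypothetical independent samples (NOT the biotest.txt data)
x_h <- c(5.1, 4.8, 6.0, 5.5, 5.9, 4.7)
y_h <- c(6.2, 5.9, 7.1, 6.4, 6.8, 5.6, 6.0)

n1 <- length(x_h)
n2 <- length(y_h)

# Pooled variance estimate and standard error of the mean difference
sp2 <- ((n1 - 1) * var(x_h) + (n2 - 1) * var(y_h)) / (n1 + n2 - 2)
se  <- sqrt(sp2 * (1 / n1 + 1 / n2))

t_manual  <- (mean(x_h) - mean(y_h)) / se
df_manual <- n1 + n2 - 2

res <- t.test(x_h, y_h, alternative = "two.sided", mu = 0, var.equal = TRUE)
print(c(t_manual = t_manual, t_builtin = unname(res$statistic)))
print(c(df_manual = df_manual, df_builtin = unname(res$parameter)))
```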
ii) Different variances
t.test(x1, x2, alternative="two.sided", mu=0, var.equal=FALSE)

        Welch Two Sample t-test

data:  x1 and x2
t = -0.9052, df = 16.987, p-value = 0.378
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -15.988641   6.388641
sample estimates:
mean of x mean of y
    115.6     120.4
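For the unequal-variance case, the statistic and the fractional degrees of freedom can be reproduced by hand, assuming the standard Welch-Satterthwaite approximation. The vectors x_h and y_h are made-up samples, not the biotest.txt columns.

```r
# Hypothetical independent samples (NOT the biotest.txt data)
x_h <- c(5.1, 4.8, 6.0, 5.5, 5.9, 4.7)
y_h <- c(6.2, 5.9, 7.1, 6.4, 6.8, 5.6, 6.0)

n1 <- length(x_h)
n2 <- length(y_h)
v1 <- var(x_h) / n1    # squared standard error of mean(x_h)
v2 <- var(y_h) / n2    # squared standard error of mean(y_h)

# Welch statistic: no pooling of the two variances
t_manual <- (mean(x_h) - mean(y_h)) / sqrt(v1 + v2)

# Welch-Satterthwaite approximate degrees of freedom (usually not an integer)
df_manual <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

res <- t.test(x_h, y_h, alternative = "two.sided", mu = 0, var.equal = FALSE)
print(c(t_manual = t_manual, t_builtin = unname(res$statistic)))
print(c(df_manual = df_manual, df_builtin = unname(res$parameter)))
```

When the two sample sizes are equal, the pooled and Welch standard errors coincide, which is why the slides show the same t = -0.9052 for both tests and only the degrees of freedom differ.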
Confidence interval for the mean difference (here at the 90% level)
i) Equal variance
t.test(x1, x2, alternative="two.sided", mu=0, var.equal=TRUE, conf.level=0.90)
ii) Different variances
t.test(x1, x2, alternative="two.sided", mu=0, var.equal=FALSE, conf.level=0.90)
b) Dependent samples (paired data)
This test is used when the samples are dependent, i.e. when there is only one sample that has been measured twice (repeated measures) or when there are two samples that have been paired. The data are in the form of pairs (x1, x2).
t.test(x1, x2, alternative="two.sided", mu=0, paired=TRUE)
        Paired t-test

data:  x1 and x2
t = -3.3247, df = 9, p-value = 0.008874
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -8.066013 -1.533987
sample estimates:
mean of the differences
                   -4.8
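Since the paired samples have the same sample size n, we can simply use t = d̄ / (s_d / √n) as the test statistic for H0: μ_d = 0, where d̄ and s_d are the sample mean and the sample standard deviation of the differences. The paired test can therefore be regarded as the one-sample case, i.e. we consider the one sample di = xi - yi, where i = 1, …, n.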
b) Dependent samples (paired data)
t.test(x1, x2, alternative="two.sided", mu=0, paired=TRUE)
Alternatively,
t.test(x1-x2, alternative="two.sided", mu=0)
        One Sample t-test

data:  x1 - x2
t = -3.3247, df = 9, p-value = 0.008874
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -8.066013 -1.533987
sample estimates:
mean of x
     -4.8
This matches the paired t-test output: t = -3.3247, df = 9, p-value = 0.008874, the same 95 percent confidence interval (-8.066013, -1.533987), and a mean of the differences of -4.8.
With paired data there is no need to consider whether the variances of the two samples are equal or not, because the test works on the single sample of differences x1 - x2.
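The equivalence between the paired test and the one-sample test on the differences can be verified numerically. The vectors before and after are made-up paired measurements used only for illustration.

```r
# Hypothetical paired measurements on the same 8 subjects (made-up values)
before <- c(118, 125, 131, 109, 140, 122, 115, 128)
after  <- c(121, 129, 133, 115, 141, 127, 119, 130)

res_paired <- t.test(before, after, alternative = "two.sided", mu = 0, paired = TRUE)
res_diff   <- t.test(before - after, alternative = "two.sided", mu = 0)

# Statistic, degrees of freedom, p-value and interval should all coincide
print(c(paired = unname(res_paired$statistic), diff = unname(res_diff$statistic)))
print(c(paired = res_paired$p.value, diff = res_diff$p.value))
```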
Remark: Using the same data sets x1 and x2:
Two Sample t-test (equal variance): t = -0.9052, df = 18, p-value = 0.3773. Do not reject H0.
Welch Two Sample t-test (different variances): t = -0.9052, df = 16.987, p-value = 0.378. Do not reject H0.
Paired t-test: t = -3.3247, df = 9, p-value = 0.008874. Reject H0.
Final Remarks
Notice that the conclusions from the two-sample t-test and the paired t-test are different even though we are looking at the same data set.
We should check whether the two samples are independent or not before choosing the test.
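A small made-up example shows how this can happen. In the hypothetical data below, the subjects differ a lot from each other while the within-pair differences are small and consistent, so the independent-sample tests see mostly noise and the paired test sees a clear shift. The vectors x1_h and x2_h are invented for illustration and are not the biotest.txt data.

```r
# Hypothetical paired data: large spread between subjects,
# small and consistent difference within each pair
x1_h <- c(100, 110, 120, 130, 140, 150, 160, 170)
x2_h <- x1_h + c(3, 5, 4, 6, 2, 5, 4, 3)

p_pooled <- t.test(x1_h, x2_h, var.equal = TRUE)$p.value    # independent, equal variance
p_welch  <- t.test(x1_h, x2_h, var.equal = FALSE)$p.value   # independent, Welch
p_paired <- t.test(x1_h, x2_h, paired = TRUE)$p.value       # dependent (paired)

print(round(c(pooled = p_pooled, welch = p_welch, paired = p_paired), 4))
```

Which of these tests is appropriate depends on how the data were collected, not on which p-value is smallest.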