From Confidence to Hypothesis-Testing
@UWE_KAR
Practice & Communication of Science
Where We Are/Where We Are Going
- Many things we measure are normally distributed (ND)
- When we sample an ND population, we get…
  - the standard deviation (SD) of the sample
  - the sample mean, an estimate of the population mean
  - plus a measure of how that mean is distributed (the SEM)
  - a 95% CI of the mean, calculated using the t-distribution (via the t-table): 95% CI = $\bar{x} \pm \left(t_{N-1,\,0.05} \times \text{SEM}\right)$, where $t_{N-1,\,0.05}$ is the critical t for N − 1 degrees of freedom at the 5% level
  - this is a range around the estimated mean, built so that 95/100 intervals calculated this way from repeat samples would contain the true population mean
- So far, so descriptive… we are using the data to estimate the population mean
- But what if we already know the population mean?
Comparing to Something Known
If we already know the population mean, then another possibility opens up. E.g. a survey of daily travel time to UWE (min) gave…
26, 33, 65, 28, 34, 55, 25, 44, 50, 36, 26, 37, 43, 62, 35, 38, 45, 32, 28, 34
- mean ± SD is 38.8 ± 11.7 min, n = 20
- the national average for commuting is 45.8 min
We can ask the following question: are our times significantly different to the national average?
- i.e. does the difference of 7 min mean something?
This is hypothesis-testing, where…
- ‘not significantly different’ is the Null Hypothesis
- ‘significantly different’ is the Alternative Hypothesis
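As a quick check, the summary figures above can be reproduced from the raw survey data. This is a minimal sketch using only Python's standard `statistics` module; `statistics.stdev` divides by n − 1 (the sample SD), which is assumed to be the SD the slides report.

```python
import statistics

# Daily travel times to UWE (min), copied from the survey above
times = [26, 33, 65, 28, 34, 55, 25, 44, 50, 36,
         26, 37, 43, 62, 35, 38, 45, 32, 28, 34]

n = len(times)
mean = statistics.mean(times)  # sample mean: our estimate of the population mean
sd = statistics.stdev(times)   # sample SD, using the n - 1 divisor

print(f"mean = {mean:.1f} ± {sd:.1f} min, n = {n}")
```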
Testing Differences
We can extend our use of the t-distribution to help us decide which hypothesis to accept and which to reject.
UWE (38.8 ± 11.7 min, n = 20) vs Nation (45.8 min)
- SEM = 11.7/√20 = 2.62 min
- DoF = N − 1 = 19
$$
\begin{aligned}
95\%\ \text{CI} &= \bar{x} \pm \left(t_{20-1,\,0.05} \times \text{SEM}\right) \\
&= 38.8 \pm (2.093 \times 2.62) \\
&= 38.8 \pm 5.48 \\
&= (33.32,\ 44.28)\ \text{min}
\end{aligned}
$$
We can be 95% confident that this interval contains the true mean UWE travel time (95/100 intervals built this way from repeat UWE surveys would contain it).
- the interval does not include the national average, so we conclude that UWE travel times are significantly different to the national average (at the 5% level)!
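The confidence-interval arithmetic can be scripted the same way. A sketch, assuming the two-tailed critical value 2.093 (19 DoF, α = 0.05) is copied from the t-table as on the slide rather than computed; `T_CRIT_19` is just an illustrative name. Because the SEM is not rounded, the limits come out as (33.33, 44.27) rather than the hand-rounded (33.32, 44.28).

```python
import math
import statistics

times = [26, 33, 65, 28, 34, 55, 25, 44, 50, 36,
         26, 37, 43, 62, 35, 38, 45, 32, 28, 34]
national_mean = 45.8

n = len(times)
mean = statistics.mean(times)
sem = statistics.stdev(times) / math.sqrt(n)  # SEM = SD / sqrt(n)

# Two-tailed critical t for DoF = n - 1 = 19 at alpha = 0.05, read from a t-table
T_CRIT_19 = 2.093

margin = T_CRIT_19 * sem
ci_low, ci_high = mean - margin, mean + margin
print(f"95% CI = ({ci_low:.2f}, {ci_high:.2f}) min")

# Does the interval exclude the national average?
outside = not (ci_low <= national_mean <= ci_high)
print("national average outside the CI:", outside)
```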
Testing Differences
In practice we usually calculate this slightly differently:
- calculate the difference between the sample and population means
- calculate t as difference / SEM (this ‘standardises’ it)
- compare the size of t (ignoring its sign) to the ‘critical t’ in the t-table
- if |t| > critical t, then reject the Null Hypo
UWE (38.8 ± 11.7 min, n = 20) vs Nation (45.8 min)
- diff = 38.8 − 45.8 = −7 min
- SEM = 11.7/√20 = 2.62 min
- t = −7/2.62 = −2.67, so |t| = 2.67 (we ignore the sign)
- critical $t_{20-1,\,0.05}$ = 2.093
- 2.67 > 2.093 (2.093 is the number of SEMs from the centre to the start of each 2.5% ‘tail’)
- so our difference is in a ‘tail’, so p < 0.05, so reject the Null Hypo
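The ‘diff / SEM’ recipe fits in a few lines, again assuming the critical t is copied from the t-table. Comparing `abs(t)` with the critical value is the ‘ignore the sign’ step.

```python
import math
import statistics

times = [26, 33, 65, 28, 34, 55, 25, 44, 50, 36,
         26, 37, 43, 62, 35, 38, 45, 32, 28, 34]
national_mean = 45.8
T_CRIT_19 = 2.093  # two-tailed, alpha = 0.05, DoF = 19 (from the t-table)

n = len(times)
diff = statistics.mean(times) - national_mean   # the 'signal'
sem = statistics.stdev(times) / math.sqrt(n)    # the 'noise'
t = diff / sem

reject_null = abs(t) > T_CRIT_19  # compare the size of t, ignoring its sign
print(f"t = {t:.2f}, reject Null Hypo: {reject_null}")
```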
Testing Differences
Or put the numbers into a stats package like Minitab and ask it to do a 1-sample t-test:

One-Sample T: C1
Test of mu = 45.8 vs not = 45.8
Variable   N   Mean  StDev  SE Mean          95% CI      T      P
UWE time  20  38.80  11.70     2.62  (33.33, 44.27)  -2.68  0.015

Minitab's CI and T differ slightly from our hand calculation because it does not round the SEM.
Note that the actual p-value is calculated, i.e. there is a 1.5% chance that a difference in travel time at least this big would be seen if travel to UWE were no different to the national picture.
- ‘no different’ is the Null Hypo, so reject the Null Hypo in favour of the Alternative
- travel to UWE is significantly quicker than the national average
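Minitab reports the exact p-value rather than just ‘p < 0.05’. Without a stats package, one way to approximate it is to integrate the t-distribution's density numerically. The sketch below does this with Simpson's rule; it is an illustration, not how Minitab computes it, and the helper names `t_pdf` and `two_tailed_p` are hypothetical. For real work a library routine (for example SciPy's `scipy.stats.t.sf`) is the safer choice.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Probability density of Student's t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t: float, df: int, steps: int = 10_000) -> float:
    """Approximate P(|T| >= |t|) by integrating the density from 0 to |t|
    with Simpson's rule (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2        # Simpson weights: 1, 4, 2, 4, ..., 4, 1
        total += weight * t_pdf(i * h, df)
    central = total * h / 3               # area between 0 and |t|
    return 1 - 2 * central                # the t-distribution is symmetric about 0

# Minitab's one-sample result: T = -2.68 with 20 - 1 = 19 DoF
p = two_tailed_p(-2.68, df=19)
print(f"p = {p:.3f}")
```

As a sanity check, `two_tailed_p(2.093, df=19)` should come out very close to 0.05, since 2.093 is the tabulated critical value.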
A Special Case of the 1-sample t-test
Say we surveyed the same people at UWE before and after some road improvements, and are interested in seeing the effect of the improvements.
- for each person, we have two pieces of data: before and after travel times
- if we put the data into two columns, then adjacent cells ‘pair up’: the data are said to be paired
- now calculate the difference between each data pair
  - we end up with a third column containing the differences
  - it will have a mean, an SD and an ‘n’
The Null Hypo says that the mean of our third column is not significantly different to 0 (i.e. the true mean difference is 0).
- this is a 1-sample t-test of the differences against 0: we are doing a paired t-test
The Paired t-test
Before travel time (min): 26, 33, 65, 28, 34, 55, 25, 44, 50, 36, 26, 37, 43, 62, 35, 38, 45, 32, 28, 34
- 38.8 ± 11.7 min, n = 20
After travel time (min): 28, 30, 62, 29, 31, 54, 22, 41, 52, 33, 25, 38, 43, 60, 31, 37, 42, 31, 29, 34
- 37.6 ± 11.5 min, n = 20
At first sight this does not look promising…
- the two means are close, and the SDs overlap hugely
Differences, Before − After (min): −2, 3, 3, −1, 3, 1, 3, 3, −2, 3, 1, −1, 0, 2, 4, 1, 3, 1, −1, 0
- 1.2 ± 1.908 min, n = 20
- so SEM = 1.908/√20 = 0.427 min
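Building the third column of differences is a one-liner once the two columns are paired up. A sketch, assuming (as on the slide) that each difference is Before − After, so a positive value means a faster journey after the improvements.

```python
import math
import statistics

before = [26, 33, 65, 28, 34, 55, 25, 44, 50, 36,
          26, 37, 43, 62, 35, 38, 45, 32, 28, 34]
after = [28, 30, 62, 29, 31, 54, 22, 41, 52, 33,
         25, 38, 43, 60, 31, 37, 42, 31, 29, 34]

# Pair each person's two times; difference = before - after
diffs = [b - a for b, a in zip(before, after)]

n = len(diffs)
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)
sem_diff = sd_diff / math.sqrt(n)  # SEM of the mean difference

print(diffs)
print(f"{mean_diff:.1f} ± {sd_diff:.3f} min, n = {n}, SEM = {sem_diff:.3f} min")
```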
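The Paired t-test
- t = mean diff / SEM = 1.2/0.427 = 2.81
- from the table, critical $t_{20-1,\,0.05}$ = 2.093
- 2.81 > 2.093 (2.093 is the number of SEMs to the start of each 2.5% ‘tail’)
- our Null Hypo says that the expected mean diff is 0
  - our observed mean diff lies 2.81 SEMs from 0, so 0 min (the Null Hypo) is in the ‘tail’
  - so p < 0.05, so reject the Null Hypo
- the shortening of journey time by an average of 1.2 min is significant!
The paired t-test is very powerful because it removes between-subject variation: each person acts as their own control, so the large differences between people's journeys do not add to the ‘noise’.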
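The sketch below runs the paired calculation and then, for contrast only, wrongly treats the same numbers as two independent samples. It assumes the slide's critical t of 2.093 from the table. Ignoring the pairing puts the large person-to-person spread (SDs of about 11.5 min) into the ‘noise’, and t falls well below 1. Pairing removes that spread, which is why the paired test detects the 1.2 min change.

```python
import math
import statistics

before = [26, 33, 65, 28, 34, 55, 25, 44, 50, 36,
          26, 37, 43, 62, 35, 38, 45, 32, 28, 34]
after = [28, 30, 62, 29, 31, 54, 22, 41, 52, 33,
         25, 38, 43, 60, 31, 37, 42, 31, 29, 34]
T_CRIT_19 = 2.093  # two-tailed, alpha = 0.05, DoF = 20 - 1 = 19 (from the t-table)

# Paired t-test: a 1-sample t-test on the differences, against 0
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
t_paired = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
reject_null = abs(t_paired) > T_CRIT_19

# For contrast only (NOT the right test for paired data): treat the columns as
# independent samples, so the between-person spread lands in the 'noise'
se_unpaired = math.sqrt(statistics.variance(before) / n + statistics.variance(after) / n)
t_unpaired = (statistics.mean(before) - statistics.mean(after)) / se_unpaired

print(f"paired t = {t_paired:.2f}, reject Null Hypo: {reject_null}")
print(f"if pairing is ignored, t = {t_unpaired:.2f}")
```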
The Paired t-test
In practice, put the paired data into Minitab and ask it to do a paired t-test…

              N   Mean  StDev  SE Mean
Before       20  38.80  11.70     2.62
After        20  37.60  11.47     2.56
Difference   20  1.200  1.908    0.427

95% CI for mean difference: (0.307, 2.093)
T-Test of mean difference = 0 (vs not = 0): T-Value = 2.81  P-Value = 0.011

Note that the actual p-value is calculated: there is a 1.1% chance that a difference in travel time at least this big would be seen if the road improvements had no effect.
- ‘no effect’ is the Null Hypo, so reject the Null Hypo in favour of the Alternative
- travel times to UWE have been significantly reduced
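The 95% CI in that output is the same CI recipe used earlier, applied to the column of differences. A sketch, assuming the table value 2.093 for 19 DoF. The interval excludes 0, which agrees with p < 0.05.

```python
import math
import statistics

# Before - After differences (min) from the paired survey
diffs = [-2, 3, 3, -1, 3, 1, 3, 3, -2, 3, 1, -1, 0, 2, 4, 1, 3, 1, -1, 0]
T_CRIT_19 = 2.093  # two-tailed, alpha = 0.05, DoF = 19 (from the t-table)

mean_diff = statistics.mean(diffs)
sem_diff = statistics.stdev(diffs) / math.sqrt(len(diffs))
ci_low = mean_diff - T_CRIT_19 * sem_diff
ci_high = mean_diff + T_CRIT_19 * sem_diff
print(f"95% CI for mean difference: ({ci_low:.3f}, {ci_high:.3f}) min")

# A CI that excludes 0 corresponds to rejecting 'mean difference = 0' at the 5% level
excludes_zero = not (ci_low <= 0 <= ci_high)
```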
The 2-sample t-test
The paired t-test has two columns of data, but the test is actually done on a single column: the differences between pairs of data.
- but what if we have two data samples that don’t ‘pair up’ (they are independent of each other)?
  - e.g. data from males and also from females
This is where the 2-sample t-test (aka the unpaired t-test) comes in…
- it is an extension of what we have just looked at
- but we will approach it in a way that also introduces the four general steps that underpin hypothesis-testing
The Basis of the 2-sample t-Test
Say we are looking at the growth of schoolchildren, and we measure their heights…
- the values of height will vary, and their distribution might look pretty ‘normal’
- but science is all about trying to explain variation, so one part of our ‘explanation’ of the variation in height might be that some of it is down to gender
The Basis of the 2-sample t-Test
The Alternative Hypothesis says… when it comes to height, male and female schoolchildren belong to different populations.
The Basis of the 2-sample t-Test
The Null Hypothesis says… no, the two sets of data (male and female) are drawn from the same population (just plain schoolchildren).
How can we decide?
- we know that repeated sampling from a single population yields different means, just by chance
- so how far ‘apart’ must our male and female means be before we conclude that the difference is not just chance?
We need:
- some sort of ‘measure’ of separation, called the ‘test statistic’
- a probability level, p, that we are happy with
  - we generally use p = 0.05 (5%), which corresponds to 95% confidence
The Basis of the 2-sample t-Test
The ‘test statistic’ is a measure of the ‘separation’ of our two putative populations, male and female.
- for the t-test it is the t-value
- it depends on the difference in means and on the variability…
  - a big difference in means implies a strong ‘signal’ that the two populations differ (the difference would be zero if the samples were the same)
  - big variability implies a lot of ‘noise’ masking the difference-in-means ‘signal’
The t-value is like a signal-to-noise ratio!
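The signal-to-noise idea can be checked numerically with the 2-sample form of t used in the worked example below (difference in means divided by the SEM of the difference). The summary figures here are hypothetical, chosen only to show the direction of each effect: doubling the gap between the means doubles t, while doubling both SDs halves it. The function name `two_sample_t` is illustrative.

```python
import math

def two_sample_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """t = difference in means ('signal') / SEM of the difference ('noise')."""
    signal = mean_a - mean_b
    noise = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    return signal / noise

# Hypothetical summary figures, for illustration only
baseline = two_sample_t(10.0, 2.0, 30, 9.0, 2.0, 30)
bigger_signal = two_sample_t(11.0, 2.0, 30, 9.0, 2.0, 30)  # means further apart
noisier = two_sample_t(10.0, 4.0, 30, 9.0, 4.0, 30)        # more spread in each sample

print(f"baseline t = {baseline:.2f}, bigger signal t = {bigger_signal:.2f}, "
      f"noisier t = {noisier:.2f}")
```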
The Basis of the 2-sample t-Test
The bigger our ‘signal-to-noise ratio’, t, the less likely it is that we are dealing with two samples from the same population.
- i.e. we can reject the Null Hypo and…
- accept the Alternative Hypo that male and female schoolchildren differ significantly in their heights
How big t needs to be depends on…
- the degrees of freedom (N − 2 in this case, where N is the total number of children in both samples)
- the significance level we are working at (α, usually 0.05)
We look up the critical value of t in the t-table, just as we did when calculating a Confidence Interval.
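To see why the critical t sits where it does, we can simulate exactly the situation the Null Hypo describes. This is a minimal sketch. It assumes an invented single population (mean 2.3, SD 0.35) and the sample sizes of the worked example that follows (50 and 55, so 103 DoF and a tabulated critical t of about 1.983). Each trial draws two samples from the same population, computes t, and counts how often |t| exceeds the critical value. Roughly 5% of trials come out ‘significant’ by chance alone, which is what working at α = 0.05 means.

```python
import math
import random

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical single population; the values are invented for illustration only
POP_MEAN, POP_SD = 2.3, 0.35
N_A, N_B = 50, 55
T_CRIT_103 = 1.983  # two-tailed, alpha = 0.05, DoF = N_A + N_B - 2 = 103 (t-table)

def mean_and_var(xs):
    """Sample mean and sample variance (n - 1 divisor)."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v

def t_two_sample(a, b):
    """t = difference in means / SEM of the difference."""
    ma, va = mean_and_var(a)
    mb, vb = mean_and_var(b)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

trials = 4000
significant = 0
for _ in range(trials):
    # Both samples come from the SAME population, so the Null Hypo is true here
    a = [random.gauss(POP_MEAN, POP_SD) for _ in range(N_A)]
    b = [random.gauss(POP_MEAN, POP_SD) for _ in range(N_B)]
    if abs(t_two_sample(a, b)) > T_CRIT_103:
        significant += 1

rate = significant / trials
print(f"'significant' purely by chance in {rate:.1%} of trials")
```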
2-sample t-Test – Worked Example
Say we are looking at lung function in schoolchildren, e.g. the forced vital capacity (FVC)…
- 50 boys and 55 girls (the two n’s don’t have to be the same)
Male data…
2.159, 2.065, 1.518, 2.227, 2.09, 2.451, 1.871, 2.571, 2.532, 2.545, 2.538, 2.795, 2.102, 1.804, 2.432, 2.704, 2.258, 2.282, 1.663, 2.795, 2.238, 1.953, 2.382, 2.344, 2.967, 2.68, 2.413, 2.444, 1.953, 2.314, 2.15, 2.634, 2.598, 2.09, 2.641, 2.92, 2.727, 2.307, 2.76, 2.439, 2.259, 2.111, 2.58, 2.602, 2.461, 3.128, 2.241, 2.602, 3.177, 2.419
$\bar{x}$ = 2.399, s = 0.348 L, n = 50
Female data…
1.913, 2.18, 1.56, 1.586, 1.712, 2.038, 1.791, 1.869, 2.296, 1.897, 1.846, 2.246, 2.318, 1.934, 1.92, 1.958, 2.521, 2.04, 2.19, 1.886, 1.734, 2.148, 2.198, 2.351, 2.193, 1.772, 2.38, 1.776, 2.505, 2.438, 2.317, 2.857, 2.604, 2.275, 1.727, 2.185, 2, 2.428, 2.304, 1.775, 2.537, 1.904, 2.519, 2.611, 2.425, 2.302, 2.366, 1.999, 3.111, 1.923, 2.978, 2.673, 2.311, 2.428, 2.407
$\bar{x}$ = 2.185, s = 0.345 L, n = 55
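The group summaries can be checked directly from the raw FVC values (litres). A minimal sketch with the standard `statistics` module, again assuming s is the n − 1 sample SD; the `summary` dictionary is just an illustrative container.

```python
import statistics

male = [2.159, 2.065, 1.518, 2.227, 2.09, 2.451, 1.871, 2.571, 2.532, 2.545,
        2.538, 2.795, 2.102, 1.804, 2.432, 2.704, 2.258, 2.282, 1.663, 2.795,
        2.238, 1.953, 2.382, 2.344, 2.967, 2.68, 2.413, 2.444, 1.953, 2.314,
        2.15, 2.634, 2.598, 2.09, 2.641, 2.92, 2.727, 2.307, 2.76, 2.439,
        2.259, 2.111, 2.58, 2.602, 2.461, 3.128, 2.241, 2.602, 3.177, 2.419]
female = [1.913, 2.18, 1.56, 1.586, 1.712, 2.038, 1.791, 1.869, 2.296, 1.897,
          1.846, 2.246, 2.318, 1.934, 1.92, 1.958, 2.521, 2.04, 2.19, 1.886,
          1.734, 2.148, 2.198, 2.351, 2.193, 1.772, 2.38, 1.776, 2.505, 2.438,
          2.317, 2.857, 2.604, 2.275, 1.727, 2.185, 2.0, 2.428, 2.304, 1.775,
          2.537, 1.904, 2.519, 2.611, 2.425, 2.302, 2.366, 1.999, 3.111, 1.923,
          2.978, 2.673, 2.311, 2.428, 2.407]

summary = {}
for label, fvc in (("Male", male), ("Female", female)):
    # (mean, sample SD, n) for each group
    summary[label] = (statistics.mean(fvc), statistics.stdev(fvc), len(fvc))
    mean, sd, n = summary[label]
    print(f"{label}: mean = {mean:.3f}, s = {sd:.3f} L, n = {n}")
```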
2-sample t-Test – Worked Example
Like any statistical test, there are four stages…
1. Formulate the Null Hypothesis
2. Generate a test statistic
3. Use the test statistic to work out a probability
4. Interpret the probability
1: Formulate the Null Hypothesis
- male and female schoolchildren both belong to a single underlying population called schoolchildren, so any difference in FVC between our two samples arose by chance
- i.e. gender has no influence on FVC in schoolchildren
- we will only reject this if the probability of seeing a difference as big as ours by chance is below 5%
2-sample t-Test – Worked Example
2: Generate the test statistic, t
- previously, t was calculated as the difference in means / SEM
- it is easy to get the simple difference between the male and female means
- but we have two sets of data contributing to the SEM of the difference
- if the variances (squares of the SDs) are similar…
$$\text{SEM}_{\text{diff}} = \sqrt{\frac{\text{SD}_A^2}{n_A} + \frac{\text{SD}_B^2}{n_B}}$$
For our data:
$$t = \frac{2.399 - 2.185}{\sqrt{\dfrac{0.348^2}{50} + \dfrac{0.345^2}{55}}} = \frac{0.214}{0.0677} = 3.16$$
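Step 2 from the summary figures, as a sketch. Alongside the slide's formula it also computes the pooled-SD version (the one behind Minitab's ‘Both use Pooled StDev’ line). With similar SDs and sample sizes the two give practically the same t, which is why the slide's shortcut is reasonable here.

```python
import math

# Summary figures from the worked example (FVC, litres)
mean_m, sd_m, n_m = 2.399, 0.348, 50
mean_f, sd_f, n_f = 2.185, 0.345, 55

diff = mean_m - mean_f

# SEM of the difference, as on the slide: sqrt(SD_A^2/n_A + SD_B^2/n_B)
sem_diff = math.sqrt(sd_m**2 / n_m + sd_f**2 / n_f)
t = diff / sem_diff

# Pooled-SD version: one combined SD, weighted by each group's DoF
sp = math.sqrt(((n_m - 1) * sd_m**2 + (n_f - 1) * sd_f**2) / (n_m + n_f - 2))
t_pooled = diff / (sp * math.sqrt(1 / n_m + 1 / n_f))

print(f"t = {t:.2f}, pooled t = {t_pooled:.2f}, pooled SD = {sp:.4f}")
```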
2-sample t-Test – Worked Example
3: Use the test statistic to look up the probability
- row: DoF is N − 2 = 50 + 55 − 2 = 103
- column: significance level α = 0.05 (5%)
- critical t ≈ 1.98 (many t-tables jump from about 120 DoF straight to ∞, where the value is 1.96)
- our t, 3.16 > 1.98
  - i.e. a mean difference of 0 (the Null Hypo) lies more than 1.98 SEMs away from our observed difference
  - so p < 0.05
2-sample t-Test – Worked Example
4: Interpret the probability
- p < 0.05, but our t-value (3.16) is also > critical t (≈ 2.62) at α = 0.01, so actually p < 0.01
- this means that the chance of ‘randomly’ picking two samples from the same population that are as far apart as ours, with the variability we saw in each sample, is < 1%
  - i.e. if the Null Hypo were true, a difference this large would turn up less than 1% of the time
- so we conclude that these samples represent two different populations
  - i.e. we accept the Alt Hypo that males and females differ
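Steps 3 and 4 amount to comparing t against the critical value at each α of interest. A sketch, assuming the critical values for 103 DoF are read from a t-table (about 1.983 at α = 0.05 and 2.624 at α = 0.01); the dictionary name `CRITICAL_T` is illustrative. No imports are needed.

```python
# Two-tailed critical t for DoF = 50 + 55 - 2 = 103, copied from a t-table
CRITICAL_T = {0.05: 1.983, 0.01: 2.624}
t = 3.16  # test statistic from step 2

for alpha, t_crit in CRITICAL_T.items():
    decision = "reject" if abs(t) > t_crit else "do not reject"
    print(f"alpha = {alpha}: critical t = {t_crit}, {decision} the Null Hypo")

# Smallest tabulated alpha at which the Null Hypo is still rejected
smallest_alpha = min(a for a, c in CRITICAL_T.items() if abs(t) > c)
print(f"so p < {smallest_alpha}")
```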
2-sample t-Test – Worked Example
In practice, put the data into Minitab and ask it to do a 2-sample t-test…

Two-sample T for Male vs Female
         N   Mean  StDev  SE Mean
Male    50  2.399  0.348    0.049
Female  55  2.185  0.345    0.047

Difference = mu (Male) - mu (Female)
Estimate for difference: 0.2140
95% CI for difference: (0.0798, 0.3481)
T-Test of difference = 0 (vs not =): T-Value = 3.16  P-Value = 0.002  DF = 103
Both use Pooled StDev = 0.3462

Note that the actual p-value is calculated: there is a 0.2% chance that a difference in FVC at least this big would be seen if gender had no effect.
- so reject the Null Hypo in favour of the Alternative
- gender significantly affects FVC
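Minitab's 95% CI for the difference uses the pooled SD with 103 DoF. A sketch reproducing it from the rounded summary figures, assuming the table value t ≈ 1.983. Because the inputs are rounded, the limits agree with Minitab's to about three decimal places. The interval excludes 0, consistent with p = 0.002.

```python
import math

# Rounded summary figures from the Minitab output (FVC, litres)
mean_m, sd_m, n_m = 2.399, 0.348, 50
mean_f, sd_f, n_f = 2.185, 0.345, 55
T_CRIT_103 = 1.983  # two-tailed, alpha = 0.05, DoF = 103 (from a t-table)

diff = mean_m - mean_f
# Pooled SD, then the SEM of the difference built from it
sp = math.sqrt(((n_m - 1) * sd_m**2 + (n_f - 1) * sd_f**2) / (n_m + n_f - 2))
se = sp * math.sqrt(1 / n_m + 1 / n_f)

ci_low, ci_high = diff - T_CRIT_103 * se, diff + T_CRIT_103 * se
print(f"95% CI for difference: ({ci_low:.4f}, {ci_high:.4f}) L")
```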
Summary
Hypothesis-testing tests the Null Hypothesis.
Like any statistical test, there are four stages…
1. Formulate the Null Hypothesis
2. Generate a test statistic (in this case, t)
3. Use the test statistic to work out a probability
4. Interpret the probability
For the t-test, t = ‘diff’/SEM (‘signal’/‘noise’)
- 1-sample t-test: tests a sample against an expected mean
- paired t-test: tests ‘before-after’ data (the mean difference against 0)
- 2-sample t-test (unpaired t-test): tests two independent samples to see if their means differ