Having Confidence in our Means: Confidence Intervals
Scientific Practice
Samples are Estimates
- When we sample a population, we end up with a sample mean, x̄
  - it’s our ‘best guess’ of the real population mean, µ
  - the ‘real’ mean of the population is ‘hidden’
- Our sample also has a measure of the variability of the data comprising it
  - the sample Standard Deviation, s
  - which is also an estimate of the population SD, σ
- s can also be used to indicate the variability of the mean itself
  - the Standard Error of the Mean: SEM = s / √N
  - we can then use the SEM to determine confidence limits
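The SEM formula on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `standard_error` and the `readings` values are hypothetical, made up purely to show the arithmetic.

```python
import math
import statistics


def standard_error(sample):
    """Return the standard error of the mean, s / sqrt(N)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations to estimate s")
    s = statistics.stdev(sample)  # sample SD, which divides by N-1
    return s / math.sqrt(n)


# Hypothetical readings, invented for illustration only.
readings = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
x_bar = statistics.mean(readings)  # sample mean, our estimate of µ
sem = standard_error(readings)
print(f"mean = {x_bar:.3f}, s = {statistics.stdev(readings):.3f}, SEM = {sem:.3f}")
```

Note that `statistics.stdev` is the sample SD (N-1 denominator), which matches s on the slide; `statistics.pstdev` would give the population form instead.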
Confidence Limits and the SEM
- The SEM reflects the ‘fit’ of a sample mean, x̄, to the underlying population mean, µ
  - if we calculate two sample means and they are the same, but one has a higher SEM, we are less ‘confident’ about how well that one estimates the population mean
- Just as the ‘raw’ data used to calculate a sample mean follow a distribution, so do repeated estimates of the population mean itself
  - this is the t Distribution
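The idea that repeated estimates of the mean have their own distribution can be checked with a small simulation. The sketch below assumes a Normal population with µ = 0 and σ = 1 (an assumption chosen for convenience, not something the slides specify), and the function name `spread_of_sample_means` is hypothetical. It draws many samples of size N and measures how spread out the resulting sample means are, for comparison with σ/√N.

```python
import random
import statistics


def spread_of_sample_means(n, repeats=2000, mu=0.0, sigma=1.0, seed=1):
    """SD of `repeats` sample means, each computed from n Normal(mu, sigma) draws."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(repeats)
    ]
    return statistics.stdev(means)


for n in (5, 10, 100):
    observed = spread_of_sample_means(n)
    expected = 1.0 / n ** 0.5  # sigma / sqrt(N) with sigma = 1
    print(f"N = {n:3d}: SD of sample means = {observed:.3f}, sigma/sqrt(N) = {expected:.3f}")
```

The observed spread should sit close to σ/√N in each case, which is what the SEM estimates from a single sample.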
The t Distribution
- Yet another distribution!
  - but distributions are important because they define how we expect our data to behave
  - if we know that, then we gain insight into our experiments!
- Generally ‘flatter’ than the Normal Distribution
  - any particular area is more ‘spread out’ (less clear)
  - the more ‘pointed’ a curve, the clearer the peak
t Distribution Pointedness Varies!
- Logic…
  - the sample size, N, influences the ‘accuracy’ of our estimate of the population mean from the sample mean
- As N increases, the ‘peak’ becomes sharper
  - a given area of the curve is less ‘spread out’
- At high N, the t Distribution approaches the Normal Distribution
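The change in ‘pointedness’ can be seen by evaluating the t density directly. The sketch below uses the standard formula for the Student's t density; the function names `t_pdf` and `normal_pdf` are illustrative choices, not from the slides. It prints the height of the peak and the density out in the tail (at 3) for a few degrees of freedom, alongside the Normal curve.

```python
import math


def t_pdf(t, dof):
    """Density of Student's t distribution with `dof` degrees of freedom."""
    # Work in logs so that large dof does not overflow the gamma function.
    log_norm = (
        math.lgamma((dof + 1) / 2)
        - math.lgamma(dof / 2)
        - 0.5 * math.log(dof * math.pi)
    )
    return math.exp(log_norm - (dof + 1) / 2 * math.log1p(t * t / dof))


def normal_pdf(z):
    """Density of the standard Normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


for dof in (1, 4, 9, 99):
    print(f"DoF = {dof:3d}: peak = {t_pdf(0.0, dof):.4f}, density at 3 = {t_pdf(3.0, dof):.4f}")
print(f"Normal   : peak = {normal_pdf(0.0):.4f}, density at 3 = {normal_pdf(3.0):.4f}")
```

With few degrees of freedom the peak is lower and the tails are heavier; as N-1 grows, both values move toward those of the Normal curve.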
Using the t Distribution
- When we calculate a sample mean and call it our estimate of the population mean…
  - it’s nice to know how ‘confident’ we are in that estimate
- One measure of confidence is the 95% Confidence Interval (95%CI)
  - the range over which we are 95% confident the true population mean lies
  - derived from our sample mean (we calculate)
  - and our SEM (we calculate)
  - and the N (though it’s the ‘degrees of freedom’, N-1)
    - we use this to look up a ‘critical t value’
From t Distribution to 95%CI
- The t Distribution is centred around our mean
  - and its shape is influenced by N-1
- 95%CI involves chopping off the two 2.5% tails
- Need a t table to look up how many SEMs along the x-axis this point will be
  - value varies with N-1
  - and with the level of confidence sought
Step 1: The t Table
- t value varies with…
  - Row… DoF is N-1
  - Column… level of ‘confidence’
- 95%CI involves chopping off the two 2.5% tails
  - α = 0.05 (5%)
- For N = 10, α = 0.05
  - t(N-1),0.05 = 2.262
  - when N is large, t = 1.96
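If no t table is to hand, the critical value can be approximated numerically. The sketch below is one possible approach, not the method used in the slides: it integrates the t density with Simpson's rule and then bisects for the t at which the central area reaches 1 - α. The names `t_pdf`, `central_area` and `critical_t` are hypothetical, and only the Python standard library is assumed.

```python
import math


def t_pdf(t, dof):
    """Density of Student's t distribution with `dof` degrees of freedom."""
    log_norm = (
        math.lgamma((dof + 1) / 2)
        - math.lgamma(dof / 2)
        - 0.5 * math.log(dof * math.pi)
    )
    return math.exp(log_norm - (dof + 1) / 2 * math.log1p(t * t / dof))


def central_area(t, dof, steps=2000):
    """Area under the t curve between -t and +t (composite Simpson's rule)."""
    h = t / steps  # integrate 0..t and double it, using symmetry; steps is even
    total = t_pdf(0.0, dof) + t_pdf(t, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    return 2 * total * h / 3


def critical_t(dof, alpha=0.05):
    """Two-tailed critical value: the t that leaves alpha/2 in each tail."""
    target = 1 - alpha
    lo, hi = 0.0, 1.0
    while central_area(hi, dof) < target:  # widen the bracket until it holds the answer
        hi *= 2
    for _ in range(50):  # bisection
        mid = (lo + hi) / 2
        if central_area(mid, dof) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for dof in (9, 99, 10_000):
    print(f"DoF = {dof:6d}: critical t = {critical_t(dof):.3f}")
```

For 9 degrees of freedom this should agree with the tabulated 2.262, and for very large N-1 it should approach 1.96.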
Step 2: Using the t value
- The t value is the number of SEMs along the x-axis (in each direction) that encompasses that % of the t distribution centred on our mean
  - 2.262 in the case of t(N-1),0.05 with N = 10
- Eg we measure the FVC (litres) of 10 people…
  - mean = 3.83, SD = 1.05, N = 10
  - SEM = 1.05/√10 = 0.332 litres
  - t(N-1),0.05 = 2.262 standard errors to cover 95% of the curve
- So, litres either side of the mean = 2.262 * 0.332 = 0.751
  - 0.751 litres either side of the mean covers 95% of the distribution
- So, 95%CI is 3.83 ± 0.751 = 3.079 to 4.581 litres
  - (3.079, 4.581)
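The FVC calculation above can be reproduced from the summary statistics alone. This is a minimal sketch: the function name `confidence_interval` is hypothetical, and the critical t value is passed in by hand, as if read from a t table.

```python
import math


def confidence_interval(mean, sd, n, t_crit):
    """Return (lower, upper) limits: mean ± t_crit * SEM, with SEM = sd / sqrt(n)."""
    sem = sd / math.sqrt(n)
    half_width = t_crit * sem  # distance either side of the mean
    return mean - half_width, mean + half_width


# Values from the FVC example: mean 3.83, SD 1.05, N = 10, t(9),0.05 = 2.262.
low, high = confidence_interval(mean=3.83, sd=1.05, n=10, t_crit=2.262)
print(f"95%CI = ({low:.3f}, {high:.3f}) litres")
```

Keeping `t_crit` as an argument means the same function works for any sample size or confidence level, provided the matching table value is supplied.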
Effect of Bigger N
- A larger sample size gives us greater confidence in any population mean we estimate
  - so 95%CI should be smaller
- In previous example…
  - mean = 3.83, SD = 1.05, N = 10, SEM = 0.332
  - 95%CI is (3.079, 4.581)
- But say we measured 90 more people…
  - mean = 3.55, SD = 0.915, N = 100
  - mean and SD similar to before, but SEM now a lot smaller, at 0.915/√100 = 0.0915
  - so too is t(N-1),0.05 ≈ 1.96 (rather than 2.262)
    - 1.96 is the large-N value; the tabulated value for 99 DoF is about 1.98
  - 95%CI = 3.55 ± (1.96 * 0.0915) = 3.371 to 3.729
Effect of Bigger N
- A bigger N ‘sharpens’ the t distribution
  - so that the 95% boundaries are less far apart
  - ie our confidence interval will become smaller
- 95%CI also shrinks because SEM = SD/√N
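The two effects can be separated using the numbers from the N = 10 and N = 100 examples. The sketch below is illustrative only (the names `half_width`, `t_ratio` and `sem_ratio` are hypothetical); it uses the t values quoted in the slides, 2.262 and 1.96, and shows how much of the shrinkage comes from the smaller t and how much from the smaller SEM.

```python
import math


def half_width(sd, n, t_crit):
    """Distance either side of the mean covered by the interval: t * SD / sqrt(N)."""
    return t_crit * sd / math.sqrt(n)


small_n = half_width(sd=1.05, n=10, t_crit=2.262)   # N = 10 example
large_n = half_width(sd=0.915, n=100, t_crit=1.96)  # N = 100 example

t_ratio = 2.262 / 1.96                                        # effect of the sharper t distribution
sem_ratio = (1.05 / math.sqrt(10)) / (0.915 / math.sqrt(100))  # effect of the smaller SEM

print(f"half-width at N = 10 : {small_n:.3f} litres")
print(f"half-width at N = 100: {large_n:.3f} litres")
print(f"shrinkage = {small_n / large_n:.2f}x (t: {t_ratio:.2f}x, SEM: {sem_ratio:.2f}x)")
```

In this example most of the narrowing comes from the SEM; the change in t contributes a smaller factor.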
Summary
- Sample means are estimates of population means
  - bigger samples give more confident estimates
- SEM reflects the distribution of mean estimates
  - SEM = s / √N
- Estimates of means follow the t Distribution
  - t Distribution becomes ‘sharper’ with higher N
- ‘Width’ of t dist covering 95% is called 95%CI
  - if we repeated the sampling many times, about 95 in 100 of the intervals calculated this way would contain the true population mean
  - 95%CI = mean ± (t(N-1),0.05 * SEM)
  - t is the number of SEMs along the distribution covering that %