Summary Statistics & Confidence Intervals
Annie Herbert, Medical Statistician
Research & Development Support Unit, Salford Royal NHS Foundation Trust
Timetable
- 60 mins: Presentation
- 20 mins: Coffee break
- 90 mins: Practical tasks in IT room
Outline
- Sampling
- Summary statistics
- Confidence intervals
- Statistics packages
‘Population’ and ‘Sample’
- We are studying a population of interest, and usually want to know the typical value and spread of an outcome measure in that population.
- Collecting data from the entire population is usually impossible, or inefficient and expensive, so we take a sample (even census data can have missing values).
- The sample must be representative of the population. Randomise!
E.g. Randomised Controlled Trial (RCT)
[Diagram: Population → Sample → Randomisation → Group 1 / Group 2 → Outcome]
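The randomisation step in the diagram can be sketched as simple 1:1 allocation: shuffle the sample and split it in half, so each participant is equally likely to end up in either group. This is a minimal illustration only; the participant IDs and the `randomise` helper are hypothetical, and real trials normally use concealed allocation systems, often with blocking or stratification.

```python
import random

def randomise(participants, seed=None):
    """Hypothetical helper: simple 1:1 randomisation by shuffling, then splitting in half."""
    rng = random.Random(seed)  # seeded only so the example is reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participant IDs standing in for the sample
sample = [f"P{i:02d}" for i in range(1, 11)]
group_1, group_2 = randomise(sample, seed=2024)
print("Group 1:", group_1)
print("Group 2:", group_2)
```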
Types of Data
- Categorical
  - Examples: Yes/No, blood group
  - Graphs: bar chart, pie chart
  - Summary: frequency (n), proportion (%)
- Numerical/Continuous
  - Examples: weight, pain score
  - Graphs: histogram, box and whisker plot
  - Summary: mean & standard deviation (SD); median & inter-quartile range (IQR)
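A categorical variable is summarised as a count and a percentage per category. A minimal sketch using Python's `collections.Counter`; the yes/no responses and the `frequency_table` helper are made up for illustration and are not study data.

```python
from collections import Counter

def frequency_table(values):
    """Hypothetical helper: return {category: (n, percentage)} for a categorical variable."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, 100 * n / total) for cat, n in counts.most_common()}

# Hypothetical yes/no responses (illustrative only)
responses = ["Yes", "No", "Yes", "Yes", "No", "Yes", "Yes", "No", "Yes", "Yes"]
for category, (n, pct) in frequency_table(responses).items():
    print(f"{category}: {n} ({pct:.0f}%)")
```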
Types of Average (‘average’: a number which typifies a set of numbers)
- Mean = total divided by n
- Median = middle value (when the data are ordered)
- Mode = most common value/group (rarely used)
Types of Average - Example
Pain score data: 10, 8, 7, 7, 1, 7, 6, 5, 3, 4
Ordered: 1, 3, 4, 5, 6, 7, 7, 7, 8, 10
- Mean = (1 + 3 + … + 10) ÷ 10 = 58 ÷ 10 = 5.8
- Median = average of the 5th and 6th ordered values = (6 + 7) ÷ 2 = 6.5
- Mode = 7
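The same three averages can be checked with Python's standard `statistics` module. A short sketch on the pain-score data above:

```python
import statistics

pain = [10, 8, 7, 7, 1, 7, 6, 5, 3, 4]

mean = statistics.mean(pain)      # total divided by n
median = statistics.median(pain)  # sorts the data; with n even, averages the 5th and 6th values
mode = statistics.mode(pain)      # most common value

print(sorted(pain))
print(f"Mean = {mean}, Median = {median}, Mode = {mode}")
# Mean = 5.8, Median = 6.5, Mode = 7
```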
Mean or Median?
- Roughly Normally distributed data: mean or median (mean by convention)
- Skewed data: median (less affected by extreme values)
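To see why the median is less affected by extreme values, replace the largest pain score (10) with a hypothetical outlier of 100 and compare the two averages. A minimal sketch:

```python
import statistics

pain = [10, 8, 7, 7, 1, 7, 6, 5, 3, 4]
with_outlier = [100 if x == 10 else x for x in pain]  # hypothetical extreme value

for label, data in [("original", pain), ("with outlier", with_outlier)]:
    print(f"{label}: mean = {statistics.mean(data)}, median = {statistics.median(data)}")
# original: mean = 5.8, median = 6.5
# with outlier: mean = 14.8, median = 6.5
```

One extreme value more than doubles the mean, while the median does not move.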
Variation and Spread
Standard Deviation (‘SD’)
- Roughly, the typical distance of observations from the mean
- Use alongside the mean
Inter-Quartile Range (‘IQR’)
- Range within which the middle 50% of the ordered data lie (from the lower to the upper quartile)
- Use alongside the median
Range
- Lowest and highest values
- Possibly quote in addition to SD/IQR
Types of Variation - Example
Pain score data: 10, 8, 7, 7, 1, 7, 6, 5, 3, 4
Ordered: 1, 3, 4, 5, 6, 7, 7, 7, 8, 10
- SD = 2.6
- IQR = (3.75, 7.25): the quartiles fall between the 2nd and 3rd, and between the 8th and 9th, ordered values
- Range = (1, 10)
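Python's `statistics` module can reproduce these figures in a short sketch. `stdev` is the sample SD (dividing by n − 1), and `quantiles` with its default `method='exclusive'` gives the quartiles shown above. Other packages use other quartile definitions, so for small samples the IQR may differ slightly from package to package.

```python
import statistics

pain = [10, 8, 7, 7, 1, 7, 6, 5, 3, 4]

sd = statistics.stdev(pain)                  # sample SD (n - 1 denominator)
q1, _, q3 = statistics.quantiles(pain, n=4)  # quartiles, default 'exclusive' method
value_range = (min(pain), max(pain))

print(f"SD = {sd:.1f}")          # SD = 2.6
print(f"IQR = ({q1}, {q3})")     # IQR = (3.75, 7.25)
print(f"Range = {value_range}")  # Range = (1, 10)
```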
Standard Error
- Not the same as standard deviation.
- Calculated from a measure of variability and the sample size (for a mean, SE = SD ÷ √n).
- Used to construct confidence intervals.
- Not very informative on its own, e.g. when given alongside summary statistics or as error bars on a plot.
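For a mean, the standard error is the SD divided by the square root of the sample size. A short sketch using the pain-score data:

```python
import math
import statistics

pain = [10, 8, 7, 7, 1, 7, 6, 5, 3, 4]

sd = statistics.stdev(pain)
se = sd / math.sqrt(len(pain))  # SE of the mean = SD / sqrt(n)

print(f"SD = {sd:.2f}, SE = {se:.2f}")  # SD = 2.62, SE = 0.83
```

The SD describes the spread of individual scores and does not shrink as n grows; the SE does, so quadrupling the sample size halves the SE.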
Sample statistic is the best guess of the (true) population value
- E.g. the sample mean is the best estimate of the mean in the population.
- The mean is likely to be different if we take a new sample from the population.
- So we know the estimate is unlikely to be exactly right.
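A simulation sketch of this point, assuming a hypothetical population that is Normally distributed with mean 50 and SD 10: draw 1000 samples of size 25 and see how much their means vary. The spread of the sample means should come out close to SD ÷ √n = 10 ÷ 5 = 2, i.e. the standard error.

```python
import random
import statistics

rng = random.Random(42)           # fixed seed so the run is reproducible
POP_MEAN, POP_SD, N = 50, 10, 25  # hypothetical population and sample size

sample_means = []
for _ in range(1000):
    sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    sample_means.append(statistics.mean(sample))

print(f"First five sample means: {[round(m, 1) for m in sample_means[:5]]}")
print(f"SD of sample means: {statistics.stdev(sample_means):.2f} (theory: {POP_SD / N ** 0.5})")
```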
Confidence Intervals (CIs)
- Confidence interval = “range of values that we can be confident will contain the true population value”.
- The “give or take a bit” for the best estimate.
- Convention is to use a 95% confidence interval (‘95% CI’).
- But in the long run, about 5% of 95% CIs will not contain the true value.
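A minimal sketch of a 95% CI for a mean, using the pain-score data: estimate ± t × SE. Python's standard library has no t-distribution, so the critical value t(0.975, df = 9) ≈ 2.262 is hard-coded from tables; this is an assumption that holds only for n = 10. For larger samples, `statistics.NormalDist().inv_cdf(0.975)` (about 1.96) is a reasonable approximation.

```python
import math
import statistics

pain = [10, 8, 7, 7, 1, 7, 6, 5, 3, 4]
n = len(pain)

mean = statistics.mean(pain)
se = statistics.stdev(pain) / math.sqrt(n)
T_CRIT = 2.262  # assumed: t quantile for a 95% CI with n - 1 = 9 degrees of freedom

lower, upper = mean - T_CRIT * se, mean + T_CRIT * se
print(f"Mean = {mean}, 95% CI {lower:.1f} to {upper:.1f}")  # Mean = 5.8, 95% CI 3.9 to 7.7
```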
Example: Legislation for smoke-free workplaces and health of bar workers in Ireland: before and after study (Allwright et al; BMJ Oct 2005)
Outcome | Before (N=138) | After (N=138) | Difference (95% CI)
Salivary cotinine (nmol/l), median | … | … | … (-26.7 to -19.0)
Any respiratory symptoms, n (%) | 90 (65%) | 67 (49%) | -16.7 (-26.1 to -7.3)
Runny nose/sneezing, n (%) | 61 (44%) | 48 (35%) | -9.4 (-19.8 to 0.9)
Example: Supplementary feeding with either ready-to-use fortified spread or corn-soy blend in wasted adults starting antiretroviral therapy in Malawi (MacDonald et al; BMJ May 2009)
“After 14 weeks, patients receiving fortified spread had a greater increase in BMI and fat-free body mass than those receiving corn-soy blend: 2.2 (SD 1.9) v 1.7 (SD 1.6) (difference 0.5, 95% confidence interval 0.2 to 0.8), and 2.9 (SD 3.2) v 2.2 (SD 3.0) kg (difference 0.7 kg, 0.2 to 1.2 kg), respectively.”
Example: Sample size matters
What proportion of patients attending clinic are satisfied?
Sample size | Number satisfied | Proportion satisfied | 95% CI for proportion
10 | 7 | 70% | 35% to 93%
… | … | 70% | 50% to 88%
50 | 35 | 70% | 55% to 82%
… | … | 70% | 60% to 79%
… | … | 70% | 67% to 73%
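The slide does not say which method produced these intervals. The first row (7 out of 10, 35% to 93%) matches the exact (Clopper-Pearson) method, so the sketch below assumes that method; the helper names are hypothetical. Each limit is found by bisection on the binomial distribution, using only the standard library, and the loop shows the interval narrowing as n grows while the proportion stays at 70%.

```python
import math

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p), each term computed on the log scale to avoid overflow."""
    total = 0.0
    for k in range(x + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log1p(-p))
        total += math.exp(log_pmf)
    return total

def bisect_root(g, lo=1e-12, hi=1 - 1e-12, iters=60):
    """Root of g on (lo, hi), assuming g(lo) and g(hi) have opposite signs."""
    g_lo = g(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (g(mid) < 0) == (g_lo < 0):
            lo = mid  # same sign as the left end: root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, conf=0.95):
    """Exact (Clopper-Pearson) CI for the proportion x/n."""
    alpha = 1 - conf
    # Lower limit: the p at which P(X >= x) = alpha/2
    lower = 0.0 if x == 0 else bisect_root(lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper limit: the p at which P(X <= x) = alpha/2
    upper = 1.0 if x == n else bisect_root(lambda p: binom_cdf(x, n, p) - alpha / 2)
    return lower, upper

for x, n in [(7, 10), (35, 50), (70, 100), (700, 1000)]:
    lo, hi = clopper_pearson(x, n)
    print(f"n = {n:4d}: {x / n:.0%} satisfied, 95% CI {lo:.0%} to {hi:.0%}")
```

The sample sizes 100 and 1000 are illustrative choices, not values taken from the slide.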
Example: % confidence matters
What proportion of patients attending clinic are satisfied?
Sample size = 50; number satisfied = 35; proportion satisfied = 70%
Confidence level | CI for proportion
90% CI | 58% to 81%
95% CI | 55% to 82%
99% CI | 51% to 85%
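A sketch of why the interval widens as the confidence level rises: for large-sample (Normal-approximation) intervals the half-width is a critical value × SE, and that critical value grows with the confidence level. `statistics.NormalDist` gives these values. Exact intervals like those on the slide are not computed this way, but they widen for the same reason.

```python
from statistics import NormalDist

z = NormalDist()  # standard Normal distribution
for conf in (0.90, 0.95, 0.99):
    crit = z.inv_cdf(1 - (1 - conf) / 2)  # two-sided critical value
    print(f"{conf:.0%} CI: estimate ± {crit:.3f} × SE")
# 90% CI: estimate ± 1.645 × SE
# 95% CI: estimate ± 1.960 × SE
# 99% CI: estimate ± 2.576 × SE
```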
p-values vs. Confidence Intervals
p-value:
- Weight of evidence to reject the null hypothesis
- No clinical interpretation
Confidence interval:
- Can be used to reject the null hypothesis
- Clinical interpretation:
  - Effect size
  - Direction of effect
  - Precision of population estimate
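So… it’s not all about p-values!
- For many hypotheses, the p-value and the CI will agree on whether to reject the null hypothesis (when both come from the same method, a 95% CI that excludes the null value corresponds to p < 0.05).
- A CI also provides an estimate, together with a range for that estimate.
- General medical journals prefer CIs.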
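For an estimate whose 95% CI came from a Normal approximation, an approximate p-value can be recovered from the interval: SE ≈ (upper − lower) ÷ (2 × 1.96), z = estimate ÷ SE, then p from the Normal distribution. A sketch using two differences quoted in the BMJ examples, assuming both CIs are Normal-based and roughly symmetric; `p_from_ci` is a hypothetical helper. The interval that excludes 0 gives p < 0.05, and the one that includes 0 gives p > 0.05.

```python
from statistics import NormalDist

def p_from_ci(estimate, lower, upper, conf=0.95):
    """Hypothetical helper: approximate two-sided p-value (null value 0) from a Normal-based CI."""
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = (upper - lower) / (2 * z_crit)
    z = abs(estimate) / se
    return 2 * (1 - NormalDist().cdf(z))

print(f"BMI difference 0.5 (0.2 to 0.8): p ≈ {p_from_ci(0.5, 0.2, 0.8):.3f}")
print(f"Runny nose -9.4 (-19.8 to 0.9): p ≈ {p_from_ci(-9.4, -19.8, 0.9):.3f}")
```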
Statistical Packages
Package | Summary statistics | Confidence intervals
SPSS | Not user-friendly; gives a large choice of statistics to calculate | Doesn’t provide a CI for some key comparative statistics, e.g. a simple percentage
StatsDirect | One right-click will produce a set of 20 or so of the most commonly used statistics | Provides a CI for most statistics
Thanks for listening!