Can I Believe It? Understanding Statistics in Published Literature Keira Robinson – MOH Biostatistics Trainee David Schmidt – HETI Rural and Remote Portfolio.

HEALTH EDUCATION & TRAINING INSTITUTE Agenda Welcome Understanding the context Data types Presenting data Common tests Tricks and hints Practice Wrap up

HEALTH EDUCATION & TRAINING INSTITUTE Understanding statistics Never consider statistics in isolation Consider the rest of the article  Who was studied  What was measured  Why was that measure used  Where was the study completed  When was it done It is the author’s role to convince you that their results can be believed!

Types of Data

HEALTH EDUCATION & TRAINING INSTITUTE Types of data Numeric  Continuous (height, cholesterol)  Discrete (number of floors in a building) Categorical  Binary (yes/no, ie born in Australia?)  Categorical (cancer type)  Ordinal categorical (cancer stage)

HEALTH EDUCATION & TRAINING INSTITUTE Histograms Represents continuous variables  Areas of the bars represent the frequency (count) or percent Indicates the distribution of the data

Continuous Data

HEALTH EDUCATION & TRAINING INSTITUTE Measures of association

HEALTH EDUCATION & TRAINING INSTITUTE Stem and leaf plot- heights
6*11
6*2
6*3333333
6*44444444444
6*555555555555
6*66666666666666666666666
6*777777777777777777777777777777
6*8888888888888888
6*99999999999999999999999999999999
7*0000000000000000000000000
7*1111111111111111111
7*222222222222
7*333333
7*44
7*55

HEALTH EDUCATION & TRAINING INSTITUTE Skewed Data

HEALTH EDUCATION & TRAINING INSTITUTE Salient features- the mean The average value  Mean birthweight in 1978: 3000 grams  Mean birthweight in 2010: 3500 grams

HEALTH EDUCATION & TRAINING INSTITUTE Salient features- the median The observation in the middle  Example- newborn birth weights: 3100, 3100, 3200, 3300, 3400, 3500, 3600, 3650 g  Median = (3300 + 3400)/2 = 3350 g  Not affected by extreme values  Wastes information

HEALTH EDUCATION & TRAINING INSTITUTE Salient features- the mean and median Birthweight, 2010  Mean: 3356 grams  Median: 3350 grams

HEALTH EDUCATION & TRAINING INSTITUTE Mean and Median Mean is preferable Symmetric distributions mean ~ median  Present the Mean Skewed distributions  Mean is pulled toward the ‘tail’  Present the Median
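
The effect of a skewed tail on the mean can be sketched in a few lines of Python. The first list is the birth weight example from the median slide; the extra 6000 g value is a hypothetical outlier added only to create a right-skewed sample.

```python
from statistics import mean, median

# Birth weights (grams) from the median slide.
weights = [3100, 3100, 3200, 3300, 3400, 3500, 3600, 3650]

print(mean(weights))    # 3356.25
print(median(weights))  # 3350.0

# Hypothetical extreme value (not from the slides) to skew the sample.
skewed = weights + [6000]

# The mean moves toward the tail; the median barely changes.
print(mean(skewed), median(skewed))
```

With the outlier added, the mean rises by roughly 300 g while the median moves by only 50 g, which is why the median is the safer summary for skewed data.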

HEALTH EDUCATION & TRAINING INSTITUTE Mean and Median Body fat  Mean: 13.6  Median: 11.7

HEALTH EDUCATION & TRAINING INSTITUTE Variability – Standard deviation and variance Roughly the average distance between the observations and the mean Standard deviation  In the original units, ie. 0.3 % Variance = standard deviation squared  In the original units squared

HEALTH EDUCATION & TRAINING INSTITUTE Range Example, infant birth weight: 3100, 3100, 3200, 3300, 3400, 3500, 3600, 3650, 3800  Range = (3100 to 3800) grams or 700 grams Interquartile range: the range between the first and third quartiles (Q1 and Q3)  3100, 3100, 3200, 3300, 3400, 3500, 3600, 3650, 3800  IQR = (3200 to 3600) grams or 400 grams
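
A minimal sketch of these measures of variability, using the nine birth weights above and Python's standard library. The quartile convention (`method="inclusive"`) is an assumption chosen because it reproduces the Q1 and Q3 quoted on the slide; other conventions can give slightly different quartiles.

```python
from statistics import quantiles, stdev, variance

# Infant birth weights (grams) from the range slide.
weights = [3100, 3100, 3200, 3300, 3400, 3500, 3600, 3650, 3800]

sd = stdev(weights)      # sample standard deviation, in grams
var = variance(weights)  # sample variance, in grams squared

data_range = max(weights) - min(weights)

# Quartiles; "inclusive" treats the sample as containing its own min and max.
q1, q2, q3 = quantiles(weights, n=4, method="inclusive")
iqr = q3 - q1

print(f"SD = {sd:.0f} g, variance = {var:.0f} g^2")
print(f"Range = {data_range} g, IQR = {q1:.0f} to {q3:.0f} g ({iqr:.0f} g)")
```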

HEALTH EDUCATION & TRAINING INSTITUTE Presenting variability Present standard deviation if the mean is used Present Interquartile range if the median is used

HEALTH EDUCATION & TRAINING INSTITUTE Graphics for Continuous Variables Boxplot [Figure: boxplot labelled with the median, 25th percentile (Q1), 75th percentile (Q3), IQR, minimum, maximum and an outlier]

Categorical Data

HEALTH EDUCATION & TRAINING INSTITUTE Categorical Variables- table summaries
Sport        Frequency   Percent   Cumulative Percent
Soccer          25         12.6         12.6
Football        37         18.7         31.3
Basketball      23         11.6         42.9
Swimming        22         11.1         54.0
Golf            19          9.6         63.6
Rugby           44         22.2         85.9
Cycling         11          5.6         91.4
Tennis          17          8.6        100.0
TOTAL          198        100.0
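
The percent and cumulative percent columns follow directly from the frequencies. A short sketch of that arithmetic, using the counts from the table above (the variable names are illustrative only):

```python
from itertools import accumulate

# Frequencies from the sports table.
counts = {
    "Soccer": 25, "Football": 37, "Basketball": 23, "Swimming": 22,
    "Golf": 19, "Rugby": 44, "Cycling": 11, "Tennis": 17,
}

total = sum(counts.values())
running = accumulate(counts.values())  # cumulative frequencies

# Each row: (sport, frequency, percent, cumulative percent).
rows = [
    (sport, n, round(100 * n / total, 1), round(100 * cum / total, 1))
    for (sport, n), cum in zip(counts.items(), running)
]

for sport, n, pct, cum_pct in rows:
    print(f"{sport:<12}{n:>5}{pct:>8.1f}{cum_pct:>8.1f}")
```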

HEALTH EDUCATION & TRAINING INSTITUTE Bar charts Relative frequency for a categorical or discrete variable

HEALTH EDUCATION & TRAINING INSTITUTE Bar chart vs Histogram Histogram  For continuous variables  The area represents the frequency  Bars join together Bar chart  For categorical variables  The height represents the frequency  The bars don’t join together

HEALTH EDUCATION & TRAINING INSTITUTE Pie chart Areas of “slices” represent the frequency

Precision

HEALTH EDUCATION & TRAINING INSTITUTE Presenting statistics Tables should need no further explanation Means  No more than one decimal place more than the original data  Standard deviations may need an extra decimal place Percentages  Not more than one decimal place (sometimes no decimal place)  Sample size <100, decimal places are not necessary  If sample size <20, may need to report actual numbers

HEALTH EDUCATION & TRAINING INSTITUTE Example of data presentation

Statistical Inference

HEALTH EDUCATION & TRAINING INSTITUTE Sampling [Diagram: a sample is drawn from the population (sampling); conclusions about the population are drawn from the sample (inference)]

HEALTH EDUCATION & TRAINING INSTITUTE Sampling, cont’d A sample statistic is used as an estimate of the population parameter Example: average parity  The sample mean is used to estimate the population mean

HEALTH EDUCATION & TRAINING INSTITUTE Confidence intervals We are confident the true mean lies within a range of values 95% Confidence Interval: We are 95% confident that the true mean lies within the range of values If a study is repeated numerous times, the calculated confidence interval would contain the true mean 95% of the time How does the confidence interval change as the sample size increases?
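
One way to explore the sample size question is to compute the interval for several values of n. This is a sketch only: it uses the large-sample multiplier 1.96, and the mean (3500 g) and standard deviation (500 g) are hypothetical values, not figures from the slides.

```python
from math import sqrt

def ci95(mean, sd, n):
    """Approximate 95% CI for a mean, using the normal multiplier 1.96."""
    half_width = 1.96 * sd / sqrt(n)
    return mean - half_width, mean + half_width

# Hypothetical birth weight summary: mean 3500 g, SD 500 g.
for n in (25, 100, 400):
    low, high = ci95(3500, 500, n)
    print(f"n = {n:>3}: {low:.0f} to {high:.0f} g (width {high - low:.0f} g)")
```

Each fourfold increase in sample size halves the width of the interval, because the standard error shrinks with the square root of n.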

HEALTH EDUCATION & TRAINING INSTITUTE Confidence intervals cont’d

HEALTH EDUCATION & TRAINING INSTITUTE Hypothesis testing Is our sample of babies consistent with the Australian population, which has a known mean birth weight of 3500 grams? Sample mean = 3800 grams, 95% CI of 3650 to 3950 grams 3500 lies outside this confidence interval, indicating our sample mean is higher than the true Australian population mean

HEALTH EDUCATION & TRAINING INSTITUTE Hypothesis testing State a null hypothesis:  There is no difference between the sample mean and the true mean: H0: mean = 3500 grams  Calculate a test statistic from the data: t = 2.65  Report the p-value: p = 0.012

HEALTH EDUCATION & TRAINING INSTITUTE What is a p-value? The probability of obtaining data at least as extreme as those observed, ie a mean weight of 3800 grams or greater, if the null hypothesis is true The smaller the p-value, the stronger the evidence against the null hypothesis  < 0.05 – evidence to reject the null hypothesis (statistically significant difference)  > 0.05 – insufficient evidence to reject the null hypothesis (not statistically significant); this does not prove the null hypothesis is true

HEALTH EDUCATION & TRAINING INSTITUTE Summary – Confidence intervals and p values P-value: indicates statistical significance Confidence interval: range of values within which we are 95% confident the true value lies Recommended to present confidence intervals where possible

Analysing Continuous Outcomes

HEALTH EDUCATION & TRAINING INSTITUTE T tests What are they used for?  Analyse means  Provide estimate of the difference in means between the two groups and the 95% confidence interval of this difference  P-value – a measure of the evidence against the null hypothesis of no difference between the two groups

HEALTH EDUCATION & TRAINING INSTITUTE T tests- paired vs independent Paired:  Outcome is measured on the same individual  Eg: before and after, cross-over trial  Pairs may be two different individuals who are matched on factors like age, sex etc.
Patient   Baseline weight (kg)   3 months weight (kg)
1               85                     82
2               76                     73
3              102                     98
4              110                    100

HEALTH EDUCATION & TRAINING INSTITUTE Paired T-tests Calculate the difference for each of the pairs The mean weight at baseline was 93 kg and the mean weight at 3 months was 88 kg. The weight at 3 months was 5 kg less than the baseline weight, 95% CI (-3, 12)
Patient   Baseline weight (kg)   3 months weight (kg)   Difference (kg)
1               85                     82                    -3
2               76                     73                    -3
3              102                     98                    -4
4              110                    100                   -10
Mean            93                     88                    -5

HEALTH EDUCATION & TRAINING INSTITUTE Paired T-tests There was no evidence of a significant change in weight after 3 months (p value = 0.19) Assumptions  The differences follow a bell shaped curve with no outliers  Assess shape by graphing the differences  Use a histogram or stem and leaf plot
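
The mechanics of a paired t-test can be sketched with the standard library alone. The weights are the four pairs from the difference table; the critical value 3.182 (two-sided 5%, 3 degrees of freedom) is taken from standard t tables. The statistic and interval this prints come directly from these four pairs and need not match the figures quoted on the slides.

```python
from math import sqrt
from statistics import mean, stdev

# Weights (kg) for the four patients in the difference table.
baseline = [85, 76, 102, 110]
month3 = [82, 73, 98, 100]

# A paired test works on the within-patient differences only.
diffs = [after - before for before, after in zip(baseline, month3)]
mean_diff = mean(diffs)
se = stdev(diffs) / sqrt(len(diffs))  # standard error of the mean difference
t_stat = mean_diff / se               # compared against t with n - 1 = 3 df

T_CRIT_DF3 = 3.182  # two-sided 5% critical value, 3 df (from t tables)
ci = (mean_diff - T_CRIT_DF3 * se, mean_diff + T_CRIT_DF3 * se)

print(f"mean difference = {mean_diff} kg, t = {t_stat:.2f}")
print(f"95% CI: {ci[0]:.1f} to {ci[1]:.1f} kg")
```

If the interval includes zero, the data are compatible with no change in weight at the 5% level.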

HEALTH EDUCATION & TRAINING INSTITUTE Independent T tests  Two groups that are unrelated  Eg: weights of different groups of people
Weight (kg)
NW Public School   SW Public School
      52                 45
      51                 54
      71                 82
      14                 15

HEALTH EDUCATION & TRAINING INSTITUTE Independent samples t-tests Same assumptions as for paired t tests plus the assumptions of independence and equal variance
        NW Public School (weight, kg)   SW Public School (weight, kg)   Difference (weight, kg)
              52                              45                              7
              51                              54                              3
              71                              82                             11
              72                              61                             11
Mean          62                              61                              1 (-22, 24)

HEALTH EDUCATION & TRAINING INSTITUTE Interpretation –independent t tests The mean weight in NW Public was 62 kg and the mean weight in SW Public was 61 kg The mean difference in weight between the two schools was 1 kg (-22, 24) There was no evidence of a significant difference in weight between the two schools (p=0.92)
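
A sketch of the equal-variance (pooled) independent samples calculation, using the eight weights from the table of the two schools. The critical value 2.447 (two-sided 5%, 6 degrees of freedom) comes from standard t tables; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

# Weights (kg) from the independent samples table.
nw = [52, 51, 71, 72]
sw = [45, 54, 82, 61]
n1, n2 = len(nw), len(sw)

diff = mean(nw) - mean(sw)

# Pooled variance: assumes both groups share the same true variance.
pooled_var = ((n1 - 1) * variance(nw) + (n2 - 1) * variance(sw)) / (n1 + n2 - 2)
se = sqrt(pooled_var * (1 / n1 + 1 / n2))
t_stat = diff / se  # compared against t with n1 + n2 - 2 = 6 df

T_CRIT_DF6 = 2.447  # two-sided 5% critical value, 6 df (from t tables)
ci = (diff - T_CRIT_DF6 * se, diff + T_CRIT_DF6 * se)

print(f"difference = {diff:.1f} kg, t = {t_stat:.2f}")
print(f"95% CI: {ci[0]:.1f} to {ci[1]:.1f} kg")
```

The interval is wide and straddles zero, in line with the slide's conclusion of no evidence of a difference between the schools.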

HEALTH EDUCATION & TRAINING INSTITUTE One-way Analysis of Variance (ANOVA) What happens when there are more than two groups to compare? Null hypothesis: the means for all groups are equal There is no single difference in means to measure across more than two groups, so the variance between the groups is analysed Can measure variance within a group as well as variance between groups

HEALTH EDUCATION & TRAINING INSTITUTE One-way ANOVA Comparing multiple groups
NW Public School   NE Public School   SW Public School   SE Public School
      42                 39                 46                 56
      53                 52                 51                 45
      46                 58                 56                 41
      75                 41                 44                 32
      56                 65                 63                 56

HEALTH EDUCATION & TRAINING INSTITUTE Interpretations – One-way ANOVA If p<0.05: there was evidence of a difference in the average student weight between the four schools If p>0.05: there was no evidence of a difference in the average student weight between the four schools Not advised to compare all means against each other, because the more tests that are done, the greater the chance of finding at least 1 significant result by chance

HEALTH EDUCATION & TRAINING INSTITUTE Assumptions ANOVA Normality, - observations for all groups are normally distributed, Variance in all groups are equal Independence – all groups are independent of each other
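
The between-group and within-group variances that ANOVA compares can be computed by hand. This sketch uses the four-school table above; the 5% critical value for F with (3, 16) degrees of freedom, about 3.24, is taken from standard F tables.

```python
from statistics import mean

# Student weights (kg) by school, from the one-way ANOVA table.
groups = {
    "NW": [42, 53, 46, 75, 56],
    "NE": [39, 52, 58, 41, 65],
    "SW": [46, 51, 56, 44, 63],
    "SE": [56, 45, 41, 32, 56],
}

all_values = [x for g in groups.values() for x in g]
grand_mean = mean(all_values)
k, n_total = len(groups), len(all_values)

# Between groups: how far each group mean sits from the grand mean.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
# Within groups: spread of observations around their own group mean.
ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)

f_stat = (ss_between / (k - 1)) / (ss_within / (n_total - k))

F_CRIT_3_16 = 3.24  # approximate 5% critical value for F(3, 16), from tables
print(f"F = {f_stat:.2f}; exceeds critical value: {f_stat > F_CRIT_3_16}")
```

An F statistic well below the critical value means the spread between school means is no larger than expected from the spread within schools.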

HEALTH EDUCATION & TRAINING INSTITUTE Extensions of one-way ANOVA Two-way ANOVA:  Multiple factors to be considered. Eg school and type of school (public/private) ANCOVA – Analysis of Covariance  Tests group differences while adjusting for continuous variables (eg. age) and categorical variables

HEALTH EDUCATION & TRAINING INSTITUTE Linear Regression Measures the association between two continuous variables (weight and height) Or one continuous variable and several continuous variables (multiple linear regression) What is the relationship between height and weight?

HEALTH EDUCATION & TRAINING INSTITUTE Scatter plot of weight and height Correlation between height and weight = 0.75

HEALTH EDUCATION & TRAINING INSTITUTE Scatter plot of body fat and height Correlation between body fat and height = -0.23

HEALTH EDUCATION & TRAINING INSTITUTE Linear regression Fits a straight line to describe the relationship Assumes 1. Independence for each measure (each person) 2. Linearity (check with scatter plots) 3. Normality (check residuals with a graph) 4. Homoscedasticity  Variability in weight does not change as height changes
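
A minimal least-squares sketch of fitting the straight line and obtaining the correlation coefficient. The height and weight values are hypothetical, made up for illustration, and `fit_line` is an illustrative helper rather than anything from the slides.

```python
from math import sqrt

def fit_line(xs, ys):
    """Least-squares slope and intercept, plus Pearson correlation r."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy / sqrt(sxx * syy)

# Hypothetical data: height (cm) and weight (kg) for six people.
heights_cm = [150, 160, 165, 170, 180, 185]
weights_kg = [52, 58, 63, 68, 77, 80]

slope, intercept, r = fit_line(heights_cm, weights_kg)

# Residuals are what the normality and equal-variance checks look at.
residuals = [y - (intercept + slope * x) for x, y in zip(heights_cm, weights_kg)]

print(f"weight = {intercept:.1f} + {slope:.2f} * height, r = {r:.2f}")
```

Plotting `residuals` against height is the usual way to check assumptions 3 and 4: the points should scatter evenly around zero with no funnel shape.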

HEALTH EDUCATION & TRAINING INSTITUTE Multiple Linear Regression Extends the simple linear regression Adjusts for confounding variables Example: Does smoking while pregnant affect infant birth weight?  Outcome variable: infant birth weight  Exposure variable: maternal smoking  Covariates (other variables of interest):  Sex of the baby, gestational age

HEALTH EDUCATION & TRAINING INSTITUTE Confounding variables A variable (factor) associated with both the outcome and exposure variables Gestational age is associated with both smoking (exposure) and the outcome (birth weight) Confounders can be assessed by checking the correlation between the variable of interest and the outcome variable Correlation coefficient : -1.0 <r<1.0 Rule of thumb: >0.5 or <-0.5 should be considered a confounder

HEALTH EDUCATION & TRAINING INSTITUTE Summary for continuous outcomes Comparing means from two groups  Use t-tests (paired for same person comparison, independent for independent groups comparison) Comparing means for more than two groups  One-way ANOVA Comparing means for two or more groups and adjusting for other variables  ANCOVA

HEALTH EDUCATION & TRAINING INSTITUTE Summary for continuous outcomes Assessing the relationship between two continuous variables  Simple linear regression Assessing the relationship between two or more variables  Multiple linear regression

Analysing Categorical Outcomes

Chi-square tests What can a chi-square test answer?

HEALTH EDUCATION & TRAINING INSTITUTE Chi-Square tests 2x2 tables: smoking by low birth weight (<2500 grams)
Smoking   <2500 grams   >2500 grams   Total
No             5            100         105
Yes           25             75         100
Total         30            175         205

HEALTH EDUCATION & TRAINING INSTITUTE Chi-square tests Can be used for paired (same person under two different conditions) or independent samples (unrelated people in different groups) Used often in case-control studies where the outcome is categorical (or dichotomous) Tests the null hypothesis of no association between row and column factors  Eg: association between smoking and low birth weight The study design defines the appropriate measure of effect
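
A sketch of the Pearson chi-square calculation for the smoking and low birth weight table, without a continuity correction. For a 2x2 table the statistic has 1 degree of freedom, and in that case the p-value can be written with the complementary error function from the standard library.

```python
from math import erfc, sqrt

# Rows: non-smokers, smokers. Columns: <2500 g, >2500 g (from the 2x2 table).
observed = [[5, 100], [25, 75]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under no association = row total * column total / n.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected

# Upper tail of the chi-square distribution with 1 df.
p_value = erfc(sqrt(chi2 / 2))

print(f"chi-square = {chi2:.2f}, p = {p_value:.5f}")
```

All four expected counts here are above 5, so the chi-square approximation is reasonable; with smaller expected counts Fisher's exact test would be the usual choice.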

HEALTH EDUCATION & TRAINING INSTITUTE Extensions of Chi-square Small sample sizes  Fisher’s exact test  Recommended when n<20 or 20 <n<40 and the smallest expected cell count is <5 Paired data  Exact binomial test for small sample sizes  McNemar’s test Multiple regression:  Logistic regression

HEALTH EDUCATION & TRAINING INSTITUTE Cohort studies Exposure is determined by  Randomisation to different groups  followed over time Outcome is determined at the end of follow up Rate of outcome can be estimated

HEALTH EDUCATION & TRAINING INSTITUTE Cohort studies continued Eg. Rate of low birth weight in:  Smokers: rate = 25/100 = 0.25 = 25%  Non-smokers: rate = 5/105 = 0.048, or about 5% Relative risk (RR) = 25/5 = 5 times higher risk of low birth weight in smokers relative to non-smokers Risk Difference (RD) = 25 - 5 = 20 percentage points If there is no relative difference in the rate of low birth weight between smokers and non-smokers, RR = 1.0
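
The same arithmetic without rounding the non-smoker rate, as a short sketch using the counts from the smoking table. The slide rounds 5/105 to 5%, so the unrounded values differ slightly from the slide's 5 and 20.

```python
# Counts from the smoking and low birth weight table.
risk_smokers = 25 / 100      # 0.25
risk_nonsmokers = 5 / 105    # about 0.048

relative_risk = risk_smokers / risk_nonsmokers    # ratio of the two risks
risk_difference = risk_smokers - risk_nonsmokers  # absolute difference

print(f"RR = {relative_risk:.2f}")
print(f"RD = {100 * risk_difference:.1f} percentage points")
```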

HEALTH EDUCATION & TRAINING INSTITUTE Cross-Sectional Studies People observed at one point in time (questionnaire) Exposure and outcome are measured at the same time Causal associations cannot be deduced Rate ratio (RR) = 25/5 = 5 times higher rate of low birth weight in smokers relative to non-smokers Rate Difference (RD) = 25 - 5 = 20 percentage points If there is no relative difference in the rate of low birth weight between smokers and non-smokers, RR = 1.0

HEALTH EDUCATION & TRAINING INSTITUTE Case-control studies Use for rare outcomes (example: child prodigies) Children are selected based on being a prodigy  Eg. 100 child prodigies and 100 children with normal intelligence Determine exposure retrospectively Cannot obtain a rate Must obtain the odds of the outcome and compare using an odds ratio

HEALTH EDUCATION & TRAINING INSTITUTE Case-Control studies
Fish oil supplements     Child prodigy
during pregnancy         Yes     No     Total
No                        30     50       80
Yes                       70     50      120
Total                    100    100      200

HEALTH EDUCATION & TRAINING INSTITUTE Case-control studies Odds of being a prodigy:  In exposed: 70/50 = 1.4  In unexposed: 30/50 = 0.6  Odds ratio:  1.4/0.6 = 2.3  The odds of being a child prodigy were 2.3 times higher if maternal fish oil supplements were taken during pregnancy  Null hypothesis  No association between the exposure and the outcome  Odds Ratio = 1
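
The odds ratio calculation above, written out as a sketch with the four cell counts from the fish oil table. The cross-product form is an equivalent shortcut that avoids computing the two odds separately.

```python
# Cell counts from the fish oil and child prodigy table.
exposed_cases, exposed_controls = 70, 50      # supplements taken
unexposed_cases, unexposed_controls = 30, 50  # no supplements

odds_exposed = exposed_cases / exposed_controls        # 1.4
odds_unexposed = unexposed_cases / unexposed_controls  # 0.6
odds_ratio = odds_exposed / odds_unexposed

# Equivalent cross-product form: (a * d) / (b * c).
cross_product = (exposed_cases * unexposed_controls) / (
    exposed_controls * unexposed_cases
)

print(f"OR = {odds_ratio:.2f}")
```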

HEALTH EDUCATION & TRAINING INSTITUTE Summary of RR and OR Both compare the relative likelihood of an outcome between 2 groups RR=1 or OR = 1  Outcome is as likely in the exposed and unexposed groups RR>1 or OR >1  The outcome is more likely in the exposed group compared to the unexposed group  The exposure is a risk factor

HEALTH EDUCATION & TRAINING INSTITUTE Summary of RR and OR RR<1 or OR<1  The outcome is less likely in the exposed group compared to the unexposed group  The exposure is protective RR cannot be calculated for a case-control study OR ~ RR when the outcome is rare
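
The last point, that the OR approximates the RR only when the outcome is rare, can be checked numerically. In this sketch the "common" table is the smoking example from the cohort slides, the "rare" table is hypothetical, and both helper functions are illustrative.

```python
def risk_ratio(a, b, c, d):
    """a, b: exposed with/without outcome; c, d: unexposed with/without."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Same cell layout as risk_ratio."""
    return (a / b) / (c / d)

# Common outcome: smoking and low birth weight (25% and about 5% affected).
common = (25, 75, 5, 100)
# Hypothetical rare outcome: 10 and 2 cases per 10,000 people.
rare = (10, 9990, 2, 9998)

for label, table in (("common", common), ("rare", rare)):
    print(f"{label}: RR = {risk_ratio(*table):.2f}, OR = {odds_ratio(*table):.2f}")
```

When the outcome is common the OR overstates the RR; when it is rare the two are almost identical, which is what makes the OR from a case-control study a usable stand-in for the RR.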

HEALTH EDUCATION & TRAINING INSTITUTE Example of multivariate analyses data table

HEALTH EDUCATION & TRAINING INSTITUTE Non-parametric tests
Parametric test               Non-parametric test
Independent samples t-test    Wilcoxon-Mann-Whitney test
Paired t-test                 Wilcoxon signed rank sum test
One-way ANOVA                 Kruskal Wallis
Chi-square test               ?

Spurious statistics

HEALTH EDUCATION & TRAINING INSTITUTE Fact or Fiction Vaccines and autism? Cell phones and brain tumours?

HEALTH EDUCATION & TRAINING INSTITUTE Common errors Reporting measurements with unnecessary precision  60.182 kg or 61 kg? Continuous variables divided into categories  Not explaining why or how  Certain boundaries may be chosen to favour certain results Presenting means and SD for non-normal data  What should be presented instead?

HEALTH EDUCATION & TRAINING INSTITUTE Common Errors Unspecific wording for results  “The effect of more exercise was significant”  “The effect of 40 minutes of exercise per day was statistically significant for decreasing weight (p<0.05)”  “40 minutes of exercise per day lowered the mean weight of the group from 95 kg to 89 kg (95% CI = 75-105 kg, p = 0.03)” Failing to check the distribution of the data to determine the appropriate statistical test  Using parametric tests when data are not normal  Using tests for independent data when the data are paired

HEALTH EDUCATION & TRAINING INSTITUTE Common Errors Using linear regression without confirming linearity Not reporting what happened to all patients  Leads to bias of the results Data dredging  Multiple statistical comparisons until a significant result is found Not accounting for the denominator or adjusting for baseline
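
Why data dredging is a problem can be shown with a short calculation of the chance of at least one false positive. This sketch assumes the tests are independent and each uses a 5% significance level; `familywise_error` is an illustrative helper name.

```python
from math import comb

def familywise_error(n_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests

# Comparing every pair among four groups (eg. four schools) takes 6 tests.
pairs = comb(4, 2)

for n_tests in (1, pairs, 20):
    print(f"{n_tests:>2} tests: {familywise_error(n_tests):.0%}")
```

With six pairwise comparisons the chance of at least one spurious "significant" result is already about one in four, even when no real differences exist.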

HEALTH EDUCATION & TRAINING INSTITUTE Example

HEALTH EDUCATION & TRAINING INSTITUTE Common Errors Selection Bias  Sampling from a bag of candy where the larger candies are more likely to be chosen  On November 13, 2000, Newsweek published the following poll results:

HEALTH EDUCATION & TRAINING INSTITUTE Selection Bias

HEALTH EDUCATION & TRAINING INSTITUTE Common Errors Other biases (measurement bias, intervention bias) Using cross sectional studies to infer causality  Survey on any given day will change the conclusions  Example: More likely to have a c-section if attending a private hospital instead of a public hospital

HEALTH EDUCATION & TRAINING INSTITUTE Practical example Working in groups quickly read the article provided Summarise  What data they used  What test  Do you believe their findings?  Can you explain why?

HEALTH EDUCATION & TRAINING INSTITUTE Summary Statistics must be understood in the context of the whole article Statistical tests must fit the data type Findings should be presented appropriately Beware flashy stats! It’s the author’s job to justify their choices If you don’t believe it- can you base your practice on it?

HEALTH EDUCATION & TRAINING INSTITUTE Questions?
