Introduction to statistics in medicine – Part 1 Arier Lee
Introduction Who am I Who am I Who do I work with Who do I work with What do I do What do I do
Why do we need statistics Population Sample
The important role of statistics in medicine Statisticians pervades every aspect of medical research Statisticians pervades every aspect of medical research Medical practice and research generates lots of data Medical practice and research generates lots of data Research involves asking lots of questions with strong statistical aspects Research involves asking lots of questions with strong statistical aspects The evaluation of new treatments, procedures and preventative measures relies on statistical concepts in both design and analysis The evaluation of new treatments, procedures and preventative measures relies on statistical concepts in both design and analysis Statisticians are consulted at early stage of a medical study Statisticians are consulted at early stage of a medical study
Research process Research question Primary and secondary endpoints Study design Sampling and/or randomisation scheme Power and sample size calculation Pre-define analyses methods Analyse data Interpret results Disseminate
A form of systematic error that can affect scientific research A form of systematic error that can affect scientific research Selection bias – well defined inclusion / exclusion criteria, randomisation Selection bias – well defined inclusion / exclusion criteria, randomisation Assessment bias – blinding Assessment bias – blinding Response bias, lost-to-follow-up bias – maximise response Response bias, lost-to-follow-up bias – maximise response Questionnaire bias – careful wording and good interviewer training Questionnaire bias – careful wording and good interviewer training Bias
Continuous Continuous age, weight, height, blood pressure Percentages Percentages % of households owning a dog Counts Counts Number of pre-term babies Binary Binary yes/no, male/female, sick/healthy Ordinal Ordinal taste of biscuits: strongly dislike, dislike, neutral, like, strongly like Nominal categorical Nominal categorical Ethnicity: European, Maori, Pacific Islander, Chinese etc. Some common data types
Descriptive statistics for continuous data – the average Mean Mean (sum of values)/(number in group) Median Median The middle value, 50 th percentile Mode Mode The value that occurs the most often medianmode=8 mean=11.54
Descriptive statistics for continuous data – the spread Range Range Minimum and maximum numbers Interquartile range Interquartile range Quartiles divide data into quarters Standard deviation Standard deviation A statistic that tells us how far away from the mean the data is spread (95% of the data lies between 2 SD) √ (x i - x) 2 /(n-1) 0, 1, 2, 5, 8, 8, 9, 10, 12, 14, 18, 20 21, 23, 25, 27, 34, numbers Q1 Q2 Q3
– Estimation: determine value of a variable and its likely range (ie. 95% confidence intervals) Statistical inference is a process of generalising results calculated from a sample to a population Statistical inference is a process of generalising results calculated from a sample to a population We are interested in some numerical characteristic of a population (called a parameter). e.g. the mean height or the proportion of pregnant women with hypertension We are interested in some numerical characteristic of a population (called a parameter). e.g. the mean height or the proportion of pregnant women with hypertension We take a sample from the population and calculate an estimate of this parameter We take a sample from the population and calculate an estimate of this parameter Estimation
We want to estimate the mean height of 10 years old boys We want to estimate the mean height of 10 years old boys Take a random sample of 100 ten years old boys and calculate the sample mean Take a random sample of 100 ten years old boys and calculate the sample mean The mean height of my random sample is 141cm The mean height of my random sample is 141cm Based on our random sample, we estimate the mean height of 10 years old boys is 141cm Based on our random sample, we estimate the mean height of 10 years old boys is 141cm Estimation – a simple example
It is essential to know the distribution of your data so you can choose the appropriate statistical method to analyse the data It is essential to know the distribution of your data so you can choose the appropriate statistical method to analyse the data Data can be distributed (spread out) in different ways Data can be distributed (spread out) in different ways Continuous data: There are many cases when the data tends to be around a central value with no bias to the left or right – normal distribution Continuous data: There are many cases when the data tends to be around a central value with no bias to the left or right – normal distribution Distribution of Data
Many parametric methods assumes data is normally distributed Many parametric methods assumes data is normally distributed Bell curve Bell curve Peak at a central value Peak at a central value Symmetric about the centre Symmetric about the centre Mean=median=mode Mean=median=mode The distribution can be described by two parameters – mean and standard deviation The distribution can be described by two parameters – mean and standard deviation Distribution of data – Normal distribution
Standard deviation – shows how much variation or ‘dispersion’ exists in the data. Standard deviation – shows how much variation or ‘dispersion’ exists in the data. 95% of the data are contained within 2 standard deviations 95% of the data are contained within 2 standard deviations Standard deviation
A simulated example – Birth weight Mean=3250gSD=550g Histogram of birth weight
Some common distributions Some common distributions – Binomial distribution – gestational diabetes (Yes/No) – Uniform distribution - throwing a die, equal (uniform) probability for each of the six sides – And many many more… Some other common distributions
Because of random sampling, the estimated value will be just an estimate – not exactly the same as the true value Because of random sampling, the estimated value will be just an estimate – not exactly the same as the true value If repeated samples are taken from a population then each sample and hence sample mean and standard deviation is different. This is known as Sampling Variability If repeated samples are taken from a population then each sample and hence sample mean and standard deviation is different. This is known as Sampling Variability Sampling variability
In practice we do not repeat the sampling to measure sampling variability we endeavour to obtain a random sample and use statistical theory to quantify the error In practice we do not repeat the sampling to measure sampling variability we endeavour to obtain a random sample and use statistical theory to quantify the error Fundamental principle to justify our estimate is reasonable: If it were possible to repeat a study over and over again, in the long run the estimates of each study would be distributed around the true value Fundamental principle to justify our estimate is reasonable: If it were possible to repeat a study over and over again, in the long run the estimates of each study would be distributed around the true value If we have a random sample then the sampling variability depends on the size of the sample and the underlying variability of the variable being measured If we have a random sample then the sampling variability depends on the size of the sample and the underlying variability of the variable being measured Sampling variability