Basic statistics 11/09/13
Topics to cover
Central tendency and spread: mean, median, mode, range, standard deviation, confidence intervals
Incidence and prevalence
Screening tests: positive and negative predictive values, sensitivity and specificity
Central Tendencies and Spread of Data
Measures of central tendency
15 patients with epilepsy were recruited into a trial. They were asked to record the number of seizures they had in a six-month period. The results are presented below.
4 2 1 1 13 1 9 2 1 1 2 3 7 5 8
Measures of central tendency Calculate the: Mean Median Mode Range
Measures of central tendency
Answers: Mean = 4, Median = 2, Mode = 1, Range = 12
Measures of central tendency
Mean: the sum of the observations divided by the number of observations
(1 + 1 + 1 + 1 + 1 + 2 + 2 + 2 + 3 + 4 + 5 + 7 + 8 + 9 + 13) / 15 = 60 / 15 = 4
Median: the middle value when the observations are arranged in order of increasing value
1 1 1 1 1 2 2 2 3 4 5 7 8 9 13 (the 8th of the 15 values is 2)
Mode: the most commonly occurring value (1, which occurs five times)
Range: the difference between the highest and lowest values in a set of data
13 – 1 = 12
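As a quick check, the four measures can be recomputed from the seizure counts with Python's standard `statistics` module. This is only a sketch; the variable names are illustrative and not part of the original slides.

```python
from statistics import mean, median, mode

# Seizure counts for the 15 patients, as listed on the slide.
seizures = [4, 2, 1, 1, 13, 1, 9, 2, 1, 1, 2, 3, 7, 5, 8]

seizure_mean = mean(seizures)                   # sum of values / number of values
seizure_median = median(seizures)               # middle value of the sorted data
seizure_mode = mode(seizures)                   # most commonly occurring value
seizure_range = max(seizures) - min(seizures)   # highest value minus lowest value

print(seizure_mean, seizure_median, seizure_mode, seizure_range)
```

Note how the mean (4) sits well above the median (2): a few patients with many seizures pull the mean upwards.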
Standard deviation A measure of the spread of the data Can be used to calculate confidence intervals
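The slide does not work through a standard deviation, so the sketch below applies it to the seizure data from the earlier example. It assumes the sample standard deviation (dividing by n - 1) is the one intended; the population version (dividing by n) is shown for comparison.

```python
from statistics import pstdev, stdev

# Seizure counts from the central tendency example.
seizures = [4, 2, 1, 1, 13, 1, 9, 2, 1, 1, 2, 3, 7, 5, 8]

# Sample standard deviation: squared deviations from the mean are
# summed, divided by (n - 1), then square-rooted.
sample_sd = stdev(seizures)

# Population standard deviation: same, but divided by n.
population_sd = pstdev(seizures)

print(round(sample_sd, 2), round(population_sd, 2))
```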
Normal Distribution
Confidence intervals
Used to assess statistical significance
Provide a measure of the extent to which a sample estimate is likely to differ from the true population value
Indicate, with a standard level of certainty (usually 95%), the range of values within which the true population mean is likely to lie, e.g. 25 ± 5 (a range of 20 to 30)
Confidence intervals contd.
For a given level of confidence:
a narrow interval indicates that the sample estimate has good (high) precision
a wide interval indicates that the sample estimate has poor (low) precision
Confidence intervals become narrower as:
the sample size increases
the variability of the data decreases
the degree of confidence required decreases (a 90% interval is narrower than a 95% interval, which is narrower than a 99% interval)
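The slides give no formula for the interval. The sketch below assumes the common normal-approximation interval for a mean, mean ± z × SD / √n, to illustrate the three effects listed above. The function names, the SD of 10 and the sample sizes of 50 and 200 are hypothetical values chosen for illustration; for small samples a t-distribution would usually be preferred.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sd, n, confidence=0.95):
    """Normal-approximation confidence interval for a mean (illustrative)."""
    # z is about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin


def width(interval):
    low, high = interval
    return high - low


# Hypothetical inputs: mean 25, SD 10, n = 50, 95% confidence.
baseline = mean_confidence_interval(25, 10, 50)
larger_sample = mean_confidence_interval(25, 10, 200)      # bigger n
less_variable = mean_confidence_interval(25, 5, 50)        # smaller SD
lower_confidence = mean_confidence_interval(25, 10, 50, confidence=0.90)

for name, ci in [("baseline", baseline), ("larger sample", larger_sample),
                 ("less variable", less_variable), ("90% confidence", lower_confidence)]:
    print(f"{name}: {ci[0]:.2f} to {ci[1]:.2f} (width {width(ci):.2f})")
```

Under this formula, quadrupling the sample size halves the width of the interval, as does halving the standard deviation.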
Incidence and Prevalence
Incidence
In the last year there have been 24 new cases of colorectal carcinoma in your practice (list size 10276). What is the incidence of colorectal carcinoma?
Incidence = 24 / 10276 = 0.0023 per year
Incidence per 1000 = (24 / 10276) x 1000 = 2.34 per 1000 per year
Incidence
Number of new cases diagnosed in a population per unit of time
Incidence rate = (number of new cases diagnosed in a given period of time / population size) x 100,000 (or 1000 etc.)
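The incidence formula can be sketched as a small function and applied to the colorectal carcinoma example. The function name `incidence_rate` and its `per` parameter are illustrative only. Multiplying the unrounded proportion gives 2.34 per 1000; rounding to 0.0023 first and then multiplying would give 2.30.

```python
def incidence_rate(new_cases, population, per=1000):
    """New cases in a period divided by population size, scaled to `per` people."""
    return new_cases / population * per


# 24 new cases in one year in a practice of 10276 patients.
colorectal_per_1000 = incidence_rate(24, 10276, per=1000)
colorectal_per_100000 = incidence_rate(24, 10276, per=100_000)

print(round(colorectal_per_1000, 2), round(colorectal_per_100000, 1))
```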
Prevalence 2593 of the 8725 patients registered with your practice have a BMI of 30 or more. What is the prevalence of obesity? Prevalence = (2593 / 8725) x 100 = 29.7% Prevalence = (2593 / 8725) x 1000 = 297.2 per 1000
Prevalence Total number of cases per population at a particular point in time (e.g. number per 100,000 population) Prevalence rate = (number of cases in population / total size of population) x 100,000 (or 1000 etc.) Prevalence = incidence x duration of condition
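The same kind of calculation reproduces the obesity example. Again the function name `prevalence_rate` is hypothetical, and the sketch treats the counts as a snapshot at a single point in time.

```python
def prevalence_rate(existing_cases, population, per=100):
    """All current cases divided by population size, scaled to `per` people."""
    return existing_cases / population * per


# 2593 of 8725 registered patients have a BMI of 30 or more.
obesity_percent = prevalence_rate(2593, 8725, per=100)
obesity_per_1000 = prevalence_rate(2593, 8725, per=1000)

print(round(obesity_percent, 1), round(obesity_per_1000, 1))
```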
Relationship between incidence and prevalence
Prevalence = incidence x duration of condition
Higher incidence → higher prevalence
More patients cured → lower prevalence
More patients die → lower prevalence
Longer survival → higher prevalence
Screening Test Statistics
Screening tests
A blood test to help diagnose cervical cancer has been developed. A study is done on 1000 patients comparing this test to the standard technique.

                     Cervical cancer present     Cervical cancer absent
New test positive    100 True positives (TP)     50 False positives (FP)
New test negative    10 False negatives (FN)     840 True negatives (TN)
Positive and negative predictive values
Positive predictive value (PPV): the proportion of people who test positive who actually have the disease
PPV = TP / (TP + FP)
Negative predictive value (NPV): the proportion of people who test negative who really don't have the disease
NPV = TN / (TN + FN)
PPV and NPV give an indication of the reliability of a positive or negative test result
PPV and NPV

                     Cervical cancer present     Cervical cancer absent
New test positive    100                         50
New test negative    10                          840

PPV = 100 / (100 + 50) = 0.67 = 67%
NPV = 840 / (840 + 10) = 0.99 = 99%
The higher the PPV, the more likely it is that a patient with a positive test result does have the disease
The higher the NPV, the more likely it is that someone who has tested negative really doesn't have the disease
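The PPV and NPV figures can be checked directly from the four cells of the table. A minimal sketch, with variable names chosen for illustration:

```python
# Counts from the cervical cancer screening study (1000 patients).
tp, fp = 100, 50    # test positive: disease present / disease absent
fn, tn = 10, 840    # test negative: disease present / disease absent

# PPV: of everyone who tested positive, the fraction who have the disease.
ppv = tp / (tp + fp)

# NPV: of everyone who tested negative, the fraction who do not have it.
npv = tn / (tn + fn)

print(f"PPV = {ppv:.0%}, NPV = {npv:.0%}")
```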
Sensitivity and specificity
Sensitivity: the proportion of people with a disease who are detected by the test (proportion of positives found)
Sensitivity = TP / (TP + FN)
Specificity: the proportion of people who don't have a disease who test negative (proportion of negatives found)
Specificity = TN / (TN + FP)
Sensitivity and specificity indicate the proportion of the population with/without the disease that will be correctly identified by the test
Sensitivity and specificity

                     Cervical cancer present     Cervical cancer absent
New test positive    100                         50
New test negative    10                          840

Sensitivity = 100 / (100 + 10) = 0.91 = 91%
Specificity = 840 / (840 + 50) = 0.94 = 94%
High sensitivity = few missed diagnoses
High specificity = few false positives
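Sensitivity and specificity use the same four counts but divide by the column totals (disease present or absent) rather than the row totals. The function names below are illustrative, not from the slides.

```python
def sensitivity(tp, fn):
    """Fraction of people with the disease whom the test detects."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of people without the disease who test negative."""
    return tn / (tn + fp)


# Counts from the cervical cancer screening study.
test_sensitivity = sensitivity(tp=100, fn=10)
test_specificity = specificity(tn=840, fp=50)

print(f"Sensitivity = {test_sensitivity:.0%}, Specificity = {test_specificity:.0%}")
```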