LIES, MORE LIES AND STATISTICS AUTHORSHIP WORKSHOP: TRANSLATING YOUR THESIS INTO A PUBLICATION 7th December 2016 Dr James Kigera University of Nairobi
Data Flow and Data Management [diagram; labels: source document, CRF, clinical data, study database (checked & cleaned), database closure, analysis database, analysis, trial report, publication]
Useful description Some people are young and some people are a bit older, and other people are even older and very few people are extremely old, and I forgot .. Some people are extremely young when they finish medical school ???
VARIABLES: DISCRETE, CONTINUOUS, BINARY/DICHOTOMOUS, CONSTANT. SCALES OF MEASUREMENT: NOMINAL, ORDINAL, INTERVAL, RATIO
DESCRIPTIVE. CENTRAL TENDENCY: MODE, MEDIAN, MEAN. VARIATION: RANGE, QUARTILES, VARIANCE, STANDARD DEVIATION. SHAPE: SKEWNESS, KURTOSIS
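The descriptive measures listed above can be computed with Python's standard library. This is a minimal sketch only: the ages are invented for illustration, and skewness and kurtosis use the simple moment-based (population) formulas, which is one of several conventions.

```python
import statistics

# Hypothetical ages (years) at graduation; invented for illustration only.
ages = [23, 24, 24, 25, 25, 25, 26, 27, 29, 41]
n = len(ages)

# Central tendency
mean = statistics.mean(ages)
median = statistics.median(ages)
mode = statistics.mode(ages)

# Variation
value_range = max(ages) - min(ages)
q1, q2, q3 = statistics.quantiles(ages, n=4)  # quartile cut points
variance = statistics.variance(ages)          # sample variance (n - 1 denominator)
sd = statistics.stdev(ages)                   # sample standard deviation

# Shape: moment-based skewness and excess kurtosis (population formulas)
m2 = sum((x - mean) ** 2 for x in ages) / n
m3 = sum((x - mean) ** 3 for x in ages) / n
m4 = sum((x - mean) ** 4 for x in ages) / n
skewness = m3 / m2 ** 1.5
excess_kurtosis = m4 / m2 ** 2 - 3

print("mean", mean, "median", median, "mode", mode)
print("range", value_range, "quartiles", (q1, q2, q3))
print("variance", variance, "sd", sd)
print("skewness", skewness, "excess kurtosis", excess_kurtosis)
```

With one very old graduate in the sample, the mean is pulled above the median and the skewness is positive, which is why the median and quartiles are usually preferred for skewed data.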
Epidemiological measures. Measures of frequency (quantifying the size of the problem, descriptive): prevalence (point prevalence, period prevalence); incidence (risk / cumulative incidence, odds, incidence rate). Measures of association: absolute measures (attributable risk, population attributable risk), quantifying public health impact; relative measures (risk ratio, rate ratio, odds ratio), quantifying strength of association between exposure and outcome, used for evaluation of causality.
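As a rough illustration of the measures of association above, the sketch below works through a 2x2 table. All counts are hypothetical, and the variable names are chosen for this example only.

```python
# Hypothetical cohort 2x2 table (counts invented for illustration):
#                 outcome   no outcome
# exposed           a=30       b=70
# unexposed         c=10       d=90
a, b, c, d = 30, 70, 10, 90

risk_exposed = a / (a + b)               # cumulative incidence in the exposed
risk_unexposed = c / (c + d)             # cumulative incidence in the unexposed
risk_total = (a + c) / (a + b + c + d)   # cumulative incidence in everyone

# Relative measures: strength of association
risk_ratio = risk_exposed / risk_unexposed
odds_ratio = (a * d) / (b * c)

# Absolute measures: public health impact
attributable_risk = risk_exposed - risk_unexposed           # risk difference
population_attributable_risk = risk_total - risk_unexposed

print("risk ratio", risk_ratio)
print("odds ratio", odds_ratio)
print("attributable risk", attributable_risk)
print("population attributable risk", population_attributable_risk)
```

Note that the odds ratio is further from 1 than the risk ratio here; the two only approximate each other when the outcome is rare.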
INFERENTIAL STATISTICS: CONFIDENCE INTERVALS, HYPOTHESIS TESTING. RELATIONSHIP BETWEEN GROUPS: T TEST, ANOVA, CHI SQUARE, FISHER'S EXACT, SIGN TEST, RANK SUM TEST, SIGNED RANK TEST, MANN-WHITNEY, KRUSKAL-WALLIS
Choice of test: Fisher’s exact test if expected value < 5
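The rule about expected values can be checked directly. The sketch below computes the expected cell counts of a 2x2 table under the null hypothesis and, because they fall below 5, a two-sided Fisher's exact p-value from the hypergeometric distribution. The table and both function names are hypothetical, written for this illustration; the two-sided rule used (sum all tables no more probable than the observed one) is one common convention.

```python
from math import comb

def expected_counts(a, b, c, d):
    """Expected 2x2 cell counts under independence: row total * column total / n."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    return [[r * col / n for col in cols] for r in rows]

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for a 2x2 table with fixed margins."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is as probable as, or less probable than, the observed one.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

# Hypothetical small table (counts invented for illustration).
table = (3, 1, 1, 3)
expected = expected_counts(*table)
smallest_expected = min(min(row) for row in expected)
p_value = fisher_exact_two_sided(*table)

print("expected counts", expected)
print("use Fisher's exact test:", smallest_expected < 5)
print("two-sided p-value", p_value)
```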
It must be alpha! Beautiful knight alpha? Ugly knight beta? He saw a relationship where there was none: alpha error. He failed to see a relationship where there was one: beta error. It must be alpha!
Hypothesis testing. The p-value, what is it? The probability of obtaining a value of the test statistic that is at least as extreme as the one that was actually observed, given that the null hypothesis is true. [Cave: in essence, the p-value is an attribute of the test statistic, NOT of the effect measure. This is often wrongly explained in reference textbooks, including Fletcher & Fletcher, the LSHTM intro book and the Gao Smith & Smith book.] Who can tell me what it is? For example, the OR for the association between crowding and TB acquisition is 3.5, the associated χ² is 4.1 and the p-value is 0.10. What does this p-value of 0.10 mean? That if in reality there is no association between crowding and TB, the probability of obtaining a χ² of 4.1 or higher is 0.10. Or: if there were truly no association and we repeated the study many times, about 1 in 10 of those studies would produce a χ² this large or larger by chance alone. Notice that the size of the effect measure is not at all considered in hypothesis testing. This is a major objection against the use of p-values alone.
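The point that the p-value belongs to the test statistic and not to the effect measure can be illustrated with a small sketch. Two hypothetical 2x2 tables have exactly the same odds ratio, but the second is ten times larger. The counts are invented, the Pearson chi-square is computed without continuity correction, and the helper names are assumptions made for this example.

```python
from math import erfc, sqrt

def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for a 2x2 table (no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def p_value_1df(chi2):
    """Upper-tail probability of a chi-square with 1 degree of freedom."""
    # A chi-square with 1 df is a squared standard normal, so
    # P(chi2_1 >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    return erfc(sqrt(chi2 / 2))

# Hypothetical tables: same proportions, second study is 10 times larger.
small = (7, 13, 3, 17)
large = tuple(10 * count for count in small)

or_small, or_large = odds_ratio(*small), odds_ratio(*large)
p_small = p_value_1df(chi_square_2x2(*small))
p_large = p_value_1df(chi_square_2x2(*large))

print("small study: OR", or_small, "p", p_small)
print("large study: OR", or_large, "p", p_large)
```

The effect measure is identical in both studies, yet only the larger one crosses the conventional 0.05 threshold, which is why the p-value should always be reported alongside the effect size and its confidence interval.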
Applied Clinical Research Course, M2, Kigali, 7-11 July 2008. Hypothesis testing: caution with p-values! Hypothesis testing is appealing because it seems to simplify the hard work of interpreting data. However, there are a number of problems involved with hypothesis testing: Black/white, yes/no result. Always state the probability, NEVER write P < 0.05 or P > 0.05. Lends a false air of authority or objectivity. If results are "not significant" they may not be published. Confusion of statistical significance with clinical significance.
COMPARATIVE STATISTICS. RELATIONSHIP BETWEEN VARIABLES: VISUAL, DIRECTION, SHAPE, CORRELATION COEFFICIENT
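For the correlation coefficient, a minimal sketch of Pearson's r computed from first principles is shown here. The paired values and the function name are hypothetical; Pearson's r measures linear association only, so a plot should still be inspected for direction and shape.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired measurements (invented for illustration).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
print("Pearson r", r)  # sign gives direction, magnitude gives strength
```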
Case study. Location: KNH. Study type: intervention study, randomized clinical trial. Subjects: consecutive cholelithiasis patients aged >= 18. Outcome: pain and scar appearance assessed by a blinded nurse.
Two interventions: laparoscopic cholecystectomy and scar massage. Randomization: patients underwent open cholecystectomy; patients underwent laparoscopic cholecystectomy; patients underwent open cholecystectomy and scar massage.
Study 3. N = 412 patients randomized. Results: 223 patients had cholelithiasis. Pain and scar composite scores at 6 months follow-up: open group (n = 75), median 7200 (IQR: 1200,0000 to 21120); laparoscopic group (n = 75), median 28900 (IQR: 15312 to 43812); laparoscopy and massage group (n = 73), median 34800 (IQR: 18800 to 53350).
Study 3: Question 1. Is the data normally distributed? No. How many groups do we have? 3 groups. Is the data paired or unpaired (independent)? Unpaired groups, independent from each other. What type of data is this? Numerical/continuous. Is the sample size large (>20)? Yes (n = 223).
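The answers above (continuous data, not normally distributed, three independent groups) point toward a rank-based comparison of three groups such as the Kruskal-Wallis test. The sketch below implements it with the standard library only. The scores are invented and are not the study data, the function names are assumptions for this example, and the p-value uses the chi-square approximation with 2 degrees of freedom, which is rough for groups this small.

```python
from math import exp

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    tie_counts = {}
    for x in pooled:
        tie_counts[x] = tie_counts.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_counts.values()) / (n ** 3 - n)
    return h / correction

def p_value_three_groups(h):
    """Chi-square upper tail with 2 df (three groups): P = exp(-h / 2)."""
    return exp(-h / 2)

# Hypothetical composite scores for three independent groups (not the study data).
open_group = [1, 2, 3, 4, 5]
laparoscopic = [6, 7, 8, 9, 10]
laparoscopic_massage = [11, 12, 13, 14, 15]

h = kruskal_wallis_h(open_group, laparoscopic, laparoscopic_massage)
p = p_value_three_groups(h)
print("H", h, "p", p)
```

A significant result only says that at least one group differs; pairwise comparisons (for example Mann-Whitney tests with a correction for multiple testing) would be needed to say which.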
Choice of test: Fisher’s exact test if expected value < 5
SUMMARY: DATA ANALYSIS. Collect useful data. Process appropriately. Understand your variables. Use the correct test. Statistical vs clinical significance.