Critical Appraisal Dr. Chris Hall – Facilitator Dr. Dave Dyck R3 March 20/2003
Objectives: Review study design and the advantages/ disadvantages of each Review key concepts in hypothesis, measurement, and analysis Article appraisal –Treatment articles –Diagnosis articles –Harm articles –Overviews/meta-analysis Survive the next hour and still be able to smile
Study Design: Ecological studies Case Reports Case Series Cross-Sectional Studies Case Control and Retrospective Cohort Studies Prospective Cohort Studies Randomized Controlled Trials
Ecological Studies: Studies of a group rather than individual subjects. Supply data on exposure and disease as a summary measure of the total population as an aggregate, eg. incidence studies. Ecological fallacy: ie. the correlation between the variables is not necessarily the same on the individual level as it is for the group. Therefore you cannot link exposures to disease on an individual basis. Also, difficult to account for confounding variables.
Case Reports: Submission of individual cases with rare or interesting findings. Highly subject to bias (selection / submission and publication). Should not be used to infer causality or suggest practice change.
Case Series: A group of “consecutive cases” with unifying features Selection bias = what constitutes a case, is it truly consecutive, response bias Publication bias Measurement bias (presence of ‘disease’ or exposure may be variable)
Cross Sectional Studies: Ie. Prevalence study Presence or absence of a specific disease compared with one or several variables within a defined population at a specific point in time
Cross Sectional Studies disadvantages: Subject to selection bias (see HO) Cause and effect cannot be determined (see HO) (ie. Don’t know whether the exposure occurs before the outcome or the outcome occurs before the exposure) Temporal trends may be missed (seasonal variations) Previous deaths, drop-outs, and migration are not counted; and short lived, transient outcomes are underrepresented. Thus, CSS are best suited to study chronic, non-fatal conditions.
Cross sectional studies – advantages: Can do quickly May provide enough of an association between an exposure/outcome to generate a hypothesis which can be studied by another method. Useful for descriptive/analytical studies
Case Control Studies: Start now and go back in time. Start with the outcome and ask or find out about prior exposure. A specific hypothesis is usually tested. Select all cases of a specific disease during a certain time and select a number of controls who represent the general population, then determine exposure to the factor in each group and calculate the odds ratio. May match controls to patients (but can never be sure of similar baseline states).
Case Control Study:
CCS cont. Odds ratio (OR) provides an estimate of the relative risk. The estimate is only a good approximation when the disease is rare (< 10% of population). OR > 1 = exposure associated with greater risk. OR < 1 = exposure associated with reduced risk.
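A minimal sketch of the odds ratio calculation for a case control study. The function name and the counts are hypothetical and are only meant to illustrate the arithmetic: the odds of exposure among cases divided by the odds of exposure among controls.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    odds_in_cases = exposed_cases / unexposed_cases
    odds_in_controls = exposed_controls / unexposed_controls
    return odds_in_cases / odds_in_controls


# Hypothetical counts: 30 of 100 cases and 10 of 100 controls were exposed.
example_or = odds_ratio(30, 70, 10, 90)
print(f"OR = {example_or:.2f}")  # an OR above 1 suggests exposure is associated with greater risk
```

An OR of exactly 1 would mean the odds of exposure are the same in cases and controls.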
CCS advantages: Small # needed (good for rare diseases or when outcomes are rare or delayed) Quick Inexpensive Can study many factors
CCS disadvantages: Problems selecting/matching controls Only an estimate of relative risk No incidence rates Biases (? Unequal ascertainment of exposure between cases and controls) –Ie recall bias= cases are more likely to remember exposure than controls –Selection bias = cases and controls should be selected according to predetermined, strict, objective criteria
Cohort Study (prospective) Start with 2 groups free of disease and follow forward for a period of time 1 group has the factor (eg. Smoking) the other group does not Define 1 or more outcomes (eg. Lung CA) Tabulate the # of persons who develop the outcome Provides estimates of incidence, relative risk, and attributable risk
Relative risk / Attributable risk Relative risk = measures the strength of association between exposure and disease Attributable risk = measures the number of cases of disease that can be attributed to exposure Given a constant relative risk, attributable risk rises with incidence of the disease in members of the population who are not exposed
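The relationship between the two measures can be sketched as follows. The function names and incidence figures are hypothetical; attributable risk is taken here as the simple difference in incidence between exposed and unexposed groups. The loop holds relative risk constant at 2 to show attributable risk rising with the incidence in the unexposed.

```python
def relative_risk(incidence_exposed, incidence_unexposed):
    """Ratio of incidence in the exposed to incidence in the unexposed."""
    return incidence_exposed / incidence_unexposed


def attributable_risk(incidence_exposed, incidence_unexposed):
    """Difference in incidence between the exposed and the unexposed."""
    return incidence_exposed - incidence_unexposed


# Hypothetical incidences: relative risk fixed at 2, baseline incidence varied.
for unexposed in (0.01, 0.10):
    exposed = 2 * unexposed
    rr = relative_risk(exposed, unexposed)
    ar = attributable_risk(exposed, unexposed)
    print(f"unexposed incidence {unexposed:.0%}: RR = {rr:.1f}, AR = {ar:.0%}")
```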
Cohort Study Cannot by itself establish causation, but can show an association between a factor and an outcome Generally provides stronger evidence for causation than case control studies
Cohort Study advantages: Lack of bias in factor Uncovers natural history Can study many diseases Yields incidence rates, relative, and attributable risk Allows for more control of confounding variables
Cohort Study Disadvantages: Possible bias in ascertainment of disease. Need large numbers and long follow-up Easy to lose patients in follow-up (attrition of subjects). This may introduce bias if lost subjects are different from those who continue to be followed Hard to maintain comparable follow-up for all levels of exposure
Cohort Study disadvantages cont. Expensive Locked into the factor(s) measured Measurement bias (eg. Unblinded physician who looks harder for + outcomes in the exposed pt) Confounding variables still present
Randomized Controlled Trials: To test the hypothesis that an intervention (treatment or manipulation) makes a difference. An experimental group is manipulated while a control group receives a placebo or standard procedure. All other conditions are kept the same between the groups.
RCTs Goals= –Prevention (to decrease risk of disease or death) –Therapeutic (decrease symptoms, prevent recurrences, decrease mortality) –Diagnostic (evaluate new diagnostic procedures)
RCT problems: Ethical issues. Difficult to test an intervention that is already widely used. Randomization. Blinding techniques (may be difficult due to common SE of drugs). Control group (placebo, conventional tx, specific tx). Subject selection and issues of generalizability. Are refusers different in some way?
Causation:
Key Terms for diagnostic tests: Sensitivity= proportion of those with the disease who have a positive test. Specificity= proportion of those without the disease who have a negative test.
Sensitivity= a/(a+c) Specificity= d/(b+d), where a = true positives, b = false positives, c = false negatives, d = true negatives
Other key terms: Positive Predictive Value= the probability of having the disease given a positive test, a/(a+b). Negative Predictive Value= the probability of not having the disease given a negative test, d/(c+d).
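The four measures can be computed from one 2x2 table, as in this minimal sketch. It assumes the usual cell layout implied by the formulas (a = true positives, b = false positives, c = false negatives, d = true negatives); the function name and the counts are hypothetical.

```python
def diagnostic_summary(a, b, c, d):
    """Summary measures from a 2x2 table.

    a = test positive, disease present (true positives)
    b = test positive, disease absent  (false positives)
    c = test negative, disease present (false negatives)
    d = test negative, disease absent  (true negatives)
    """
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "ppv": a / (a + b),
        "npv": d / (c + d),
    }


# Hypothetical counts for 400 patients, 100 of whom have the disease.
summary = diagnostic_summary(a=90, b=30, c=10, d=270)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Sensitivity and specificity are calculated down the disease columns, while the predictive values are calculated across the test rows, so the predictive values change with the prevalence of disease in the sample.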
Statistical Hypothesis: Null Hypothesis –Hypothesis of no difference between a test group and a control group (ie. There is no association between the disease and the risk factor in the population) Alternative Hypothesis –Hypothesis that there is some difference between a test group and control group
Measurements and Analysis: Sampling bias = selecting a sample that does not truly represent the population. Sample size = contributes to the credibility of “positive” studies and the power of “negative” studies. For a given alpha, increasing the sample size decreases the probability of making a type II error (ie. increases power).
Errors: Type I Error (alpha error) = considering a null hypothesis false when it is actually true (ie. declaring an effect to be present when it is not). The probability of this error is alpha, the significance level chosen in advance. The p value is the probability of seeing a difference at least as large as the one observed if chance alone were operating (ie. if the null hypothesis were true).
Errors cont. Type II Error (Beta Error) = accepting a null hypothesis as true when it is actually false (ie. declaring a difference/effect to be absent when it is present) –Its probability is beta –Power (1-Beta) is the probability that a study detects a difference that truly exists
Significance: Statistical Significance: determination by a statistical test that there is evidence against the null hypothesis. The level of significance depends on the value chosen for alpha error. Usually alpha < .05 (and power > 80%)
Significance cont. Clinical Significance: statistical significance is necessary but not sufficient for clinical significance which reflects the meaningfulness of the difference (eg. A statistically significant 1mm Hg BP reduction is not clinically significant) Also includes such factors as cost, SE.
Other terms: Accuracy= how closely a measurement approaches the true value Reliability= how consistent or reproducible a measurement is when performed by different observers under the same conditions or the same observer under different conditions Validity= describes the accuracy and reliability of a test (ie. The extent to which a measurement approaches what it is designed to measure)
Validity and Reliability
Appraising an article (JAMA): 3 basic stages –1) the validity – are the conclusions justified? –2) the message – what are the results? –3) the utility – can I generalise the findings to my patients?
Are the results valid? – (therapy article) Primary guides –Was the assignment of patients to treatment randomized? –Were all patients who entered the trial properly accounted for and attributed at its conclusion? –Was follow-up complete? –Were patients analyzed in the groups to which they were randomized? Ie. Intention to treat analysis
Are the results valid? Secondary guides: –Were patients, their clinicians, and study personnel “blind” to treatment? (avoids bias) –Were the groups similar at the start of the trial? (randomization not always effective if sample size small) –Aside from the experimental intervention, were the groups treated equally? (ie. Cointerventions)
What are the results? How large was the treatment effect? –Relative risk reduction vs absolute risk reduction
Eg. Baseline risk of death without therapy = 20/100 = .20 = 20% (X = .20)
Risk with therapy reduced to 15/100 = .15 = 15% (Y = .15)
Absolute Risk Reduction = (X-Y) = .20 - .15 = .05 (5%)
Relative Risk = (Y/X) = .15/.20 = .75
Relative Risk Reduction = [1-(Y/X)] x 100% = [1-(.75)] x 100% = 25%
Number needed to treat = NNT To calculate simply take the inverse of the absolute risk reduction In last example= 1/.05 = 20 is the NNT
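The worked example (20/100 deaths without therapy, 15/100 with therapy) can be reproduced with a short sketch. The function name is hypothetical, exact fractions are used only to avoid floating point rounding, and rounding the NNT up to a whole patient is a common convention assumed here rather than something the slides specify.

```python
from fractions import Fraction
import math


def treatment_effect(control_events, control_total, treated_events, treated_total):
    """Return ARR, RR, RRR and NNT from event counts in each group."""
    x = Fraction(control_events, control_total)  # baseline risk without therapy
    y = Fraction(treated_events, treated_total)  # risk with therapy
    arr = x - y                                  # absolute risk reduction
    rr = y / x                                   # relative risk
    rrr = 1 - rr                                 # relative risk reduction
    # NNT is the inverse of the ARR; undefined here if therapy shows no benefit.
    nnt = math.ceil(1 / arr) if arr > 0 else None
    return {"ARR": arr, "RR": rr, "RRR": rrr, "NNT": nnt}


effect = treatment_effect(20, 100, 15, 100)
print(f"ARR = {float(effect['ARR']):.2f}")
print(f"RR  = {float(effect['RR']):.2f}")
print(f"RRR = {float(effect['RRR']):.0%}")
print(f"NNT = {effect['NNT']}")
```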
What are the results? Cont. How precise was the estimate of treatment effect? –Use confidence intervals (CI) = a range of values reflecting the statistical precision of an estimate (eg. if a study were repeated many times, about 95% of the 95% CIs would include the true value) –CI narrow as sample size increases, eg. in the last example of 100 patients in each group, with 20 pts dying in the control group and 15 in the tx group, the 95% CI for the RRR was -38% to 59%. If 1000 patients were enrolled in each group with 200 dying in the controls and 150 in the tx group the 95% CI for the RRR is 9% to 41%.
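One way to see the interval narrow with sample size is the sketch below. It assumes the standard log relative risk method for the CI, which is not necessarily the method behind the figures quoted above, so the bounds it gives may differ from them by a few percentage points. The function name is hypothetical.

```python
import math


def rrr_confidence_interval(control_events, control_total,
                            treated_events, treated_total, z=1.96):
    """Approximate 95% CI for the relative risk reduction (log relative risk method)."""
    rr = (treated_events / treated_total) / (control_events / control_total)
    # Standard error of ln(RR) for two independent proportions.
    se_log_rr = math.sqrt(1 / treated_events - 1 / treated_total
                          + 1 / control_events - 1 / control_total)
    rr_low = math.exp(math.log(rr) - z * se_log_rr)
    rr_high = math.exp(math.log(rr) + z * se_log_rr)
    # RRR = 1 - RR, so the bounds swap places.
    return 1 - rr_high, 1 - rr_low


small = rrr_confidence_interval(20, 100, 15, 100)
large = rrr_confidence_interval(200, 1000, 150, 1000)
print(f"100 per group:  RRR 95% CI {small[0]:.0%} to {small[1]:.0%}")
print(f"1000 per group: RRR 95% CI {large[0]:.0%} to {large[1]:.0%}")
```

With 100 patients per group the interval includes 0, so no benefit cannot be excluded; with 1000 per group the lower bound lies above 0.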
CI cont If CI cross 0 they are generally unhelpful in making conclusions When is the sample size big enough? –If the lower boundary of the CI is still clinically significant to you (in + studies) –(or if the upper CI boundary is not clinically significant in negative studies)
What if no CI reported? 1) Use the p value = as the p value decreases below .05, the lower bound of the 95% confidence limit for the RRR rises above 0. 2) If the standard error (SE) of the RRR is presented it is easy to calculate the 95% CI as point estimate (RRR) +/- 2 x SE. 3) Calculate CI yourself or with a statistician.
Will the results help me in caring for my patients? Can the results be applied to my patient population? Were all clinically important outcomes considered? Ie. Mortality, morbidity, quality of life endpoints. Are the likely treatment benefits worth the potential harm and costs? Ie. What is the patient’s baseline risk if left untreated? (NNT is helpful here)
Article about a diagnostic test:
Are the results valid? Primary guides: –Was there an independent, blind comparison with a reference standard? (ie. Gold standard) –Did the patient sample include an appropriate spectrum of patients to whom the diagnostic test will be applied in clinical practice?
Are the results valid? Secondary guides –Did the results of the test being evaluated influence the decision to perform the reference standard? Ie. verification bias, eg. in PIOPED only 69% of patients with normal, near normal, or low probability V/Q scans went on to pulmonary angiogram, whereas 92% of those with more positive V/Q scans went on to angiograms –Were the methods for performing the test described in sufficient detail to permit replication?
What are the results? Are likelihood ratios for the test results presented or data necessary for their calculation included? Likelihood ratio = the likelihood of a given test result in patients with the disease divided by the likelihood of the same result in patients without the disease (for a + test: sensitivity / (1 - specificity))
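When an article reports only sensitivity and specificity, the likelihood ratios for a positive and a negative result can be derived from them. This is a minimal sketch using the standard formulas; the function name and the example values are hypothetical.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR for a positive test, LR for a negative test)."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Hypothetical test with 90% sensitivity and 90% specificity.
lr_pos, lr_neg = likelihood_ratios(0.90, 0.90)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```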
Likelihood Ratios: LR > 10 and < .1 generate large and often conclusive changes from pretest to posttest probability. LR of 5-10 and .1-.2 generate moderate shifts from pretest to posttest probability. LR of 2-5 and .2-.5 generate small (but sometimes important) changes in probability. LR of 1-2 and .5-1 generate changes that are generally insignificant.
Bayesian analysis Makes use of LR to change pretest probabilities to posttest probabilities. (can use Fagan’s nomogram):
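Without a nomogram, the same conversion can be done by hand: turn the pretest probability into odds, multiply by the LR, and turn the result back into a probability. The sketch below assumes a hypothetical pretest probability of 30% and an LR of 9; the function name is illustrative.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a posttest probability using an LR."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


# Hypothetical patient: 30% pretest probability, positive test with LR of 9.
post = posttest_probability(0.30, 9)
print(f"posttest probability = {post:.0%}")
```

An LR of 1 leaves the probability unchanged, which is why results with LRs close to 1 rarely alter management.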
Will the results help me in caring for my patients? Will the reproducibility of the test result and its interpretation be satisfactory in my setting? Are the results applicable to my patient? Will the results change my management? Will patients be better off as a result of the test?
Articles about Harm: 1st – what is the study design? (RCT, cohort, case control, case series, etc) –Most important is that there is an appropriate control population
Are the results valid? Were the exposures and outcomes measured in the same way in the groups being compared? (minimize recall/interviewer bias) Was follow-up sufficiently long and complete? Is the temporal relationship correct? Is there a dose response gradient?
What are the results? How strong is the association between exposure and outcome? Ie. Relative risk (if >1= increase in risk associated with exposure and <1= decrease in risk associated with exposure) How precise is the estimate of risk? Ie. CI
What are the implications for my practice? Are the results applicable to my practice? What is the magnitude of the risk? Should I attempt to stop the exposure?
Overviews, Systematic Reviews, and Meta-analysis: Did the overview address a focussed clinical question? Were the criteria used to select articles for inclusion appropriate? (these should be revealed in the paper) Is it unlikely that important, relevant studies were missed? (guards against publication bias = a higher likelihood for studies with positive results to be published) Was the validity of the included studies appraised? (peer review does not guarantee the validity of published research)
Cont. Were assessments of studies reproducible? (better if there are more reviewers who are deciding which articles to include) Were the results similar from study to study? (can use “tests of homogeneity” statistical analysis)
Results? What are the overall results of the overview? (are studies weighted according to their size?) There should be a summary measure which clearly conveys the practical importance of the result – eg. RRR, LR, NNT etc. How precise were the results? CI still very helpful
Will the results help me in caring for my patients? Can the results be applied to my patient care? (subgroup analysis should be critiqued closely) Were all clinically important outcomes considered? ( a clinical decision will require considering all outcomes both good and bad) Are the benefits worth the harms and costs?
Does your brain ache? THE END