Evidence-Based Medicine: Effective Use of the Medical Literature Edward G. Hamaty Jr., D.O. FACCP, FACOI
Appraising Prognosis Articles
Prognosis TYPES OF REPORTS ON PROGNOSIS Several types of studies can provide information on the prognosis of a group of individuals with a defined problem or risk factor. The best evidence with which to answer our clinical question would come from a systematic review of prognosis studies. A systematic review that searches for and combines all relevant prognosis studies would be particularly useful for retrieving information about relevant patient subgroups. When assessing the validity of a systematic review, we’d need to consider the guides in Table 1. At this time, relevant systematic reviews of prognosis studies are rare, so we’ll focus the discussion in this lecture on individual studies.
Prognosis (For the prognostic factors identified)
Prognosis Cohort studies (in which investigators follow one or more groups of individuals with the target disorder over time and monitor for occurrence of the outcome of interest) represent the best design for answering prognosis questions. Example: packs per day (PPD) smoked by cigarette smokers and incidence of lung cancer, or cholesterol levels and CAD. Randomized trials can also serve as a source of prognostic information (particularly since they usually include detailed documentation of baseline data), although trial participants may not be entirely representative of the population with a disorder. Case–control studies (in which investigators retrospectively assess prognostic factors by determining the exposures of cases who have already suffered the outcome of interest and controls who have not) are particularly useful when the outcome is rare or the required follow-up is long. However, the strength of inference that can be drawn from these studies is limited because of the potential for selection and measurement bias.
Case Control Study Patients who have developed a disorder are identified and their exposure to suspected causative factors is compared with that of controls who do not have the disorder. This permits estimation of odds ratios (but not of absolute risks). The advantages of case-control studies are that they are quick, cheap, and the only way of studying very rare disorders or those with a long time lag between exposure and outcome. Disadvantages include the reliance on records to determine exposure, difficulty in selecting control groups, and difficulty in eliminating confounding variables.
Randomized Controlled Trial Similar subjects are randomly assigned to a treatment group and followed to see if they develop the outcome of interest. RCTs are the most powerful method of eliminating (known and unknown) confounding variables and permit the most powerful statistical analysis (including subsequent meta-analysis). However, they are expensive, sometimes ethically problematic, and may still be subject to selection and observer biases.
Randomized Controlled Trial
Case Control Study A case-control study is an observational, retrospective study which "involves identifying patients who have the outcome of interest (cases) and control patients without the same outcome, and looking back to see if they had the exposure of interest."
Cohort Study Patients with and without the exposure of interest are identified and followed over time to see if they develop the outcome of interest, allowing comparison of risk. Cohort studies are cheaper and simpler than RCTs, can be more rigorous than case-control studies in eligibility and assessment, can establish the timing and sequence of events, and are ethically safe. However, they cannot exclude unknown confounders, blinding is difficult, and identifying a matched control group may also be difficult.
Prognosis - Validity Is the study valid? In asking questions about a patient’s likely prognosis over time, the best individual study type to look for would be a longitudinal cohort study. 1. Is the Sample Representative? Does the study clearly define the group of patients, and is it similar to your patients? Were there clear inclusion and exclusion criteria?
Prognosis - Validity Were they recruited at a common point in their illness? The methodology should include a clear description of the stage and timing of the illness being studied. To avoid missing outcomes, study patients should ideally be recruited at an early stage in the disease. In any case, they should all be recruited at a consistent stage in the disease; if not, this will bias the results. Did the study account for other important factors? The study groups will have different important variables such as sex, age, weight and co-morbidity which could affect their outcome. The investigators should adjust their analysis to take account of these known factors in different sub-groups of patients. You should use your clinical judgment to assess whether any important factors were left out of this analysis and whether the adjustments were appropriate. This information will also help you in deciding how this evidence applies to your patient.
Prognosis - Validity Is the setting representative? Patients who are referred to specialist centers often have more illnesses and are at higher risk than those cared for in the community. This is sometimes called 'referral bias'. 2 Was follow up long enough for the clinical outcome? You have to be sure that the study followed the patients for long enough for the outcomes to manifest themselves. Longer follow up may be necessary in chronic diseases.
Prognosis - Validity 3 Was follow up complete? Most studies will lose some patients to follow up; the question you have to answer is whether so many were lost that the information is of no use to you. You should look carefully in the paper for an account of why patients were lost and consider whether this introduces bias into the result. If follow up is less than 80% the study's validity is seriously undermined. You can ask 'what if' all those patients who were lost to follow up had the outcome you were interested in, and compare this with the study to see if loss to follow up had a significant effect. With low incidence conditions, loss to follow up is more problematic.
Prognosis - Validity We suggest considering the simple “5 and 20” rule: fewer than 5% loss probably leads to little bias, greater than 20% loss seriously threatens validity, and in-between amounts cause intermediate amounts of trouble. While this may be easy to remember, it may oversimplify clinical situations in which the outcomes are infrequent. Alternatively, we could consider the “best” and “worst” case scenarios in an approach that we’ll call a “sensitivity analysis”.
Prognosis - Validity Imagine a study of prognosis wherein 100 patients enter the study, 4 die and 16 are lost to follow-up. A “crude” case-fatality rate would count the 4 deaths among the 84 with full follow-up, calculated as 4/84 = 4.8%. But what about the 16 who are lost? Some or all of them might have died too. In a “worst case” scenario, all would have died, giving a case-fatality rate of (4 known + 16 lost) = 20 out of (84 followed + 16 lost) = 100, or 20/100 (i.e. 20%), which is four times the original rate that we calculated! Note that, for the “worst case” scenario, we’ve added the lost patients to both the numerator and the denominator of the outcome rate.
Prognosis - Validity On the other hand, in the “best case” scenario, none of the lost 16 would have died, yielding a case-fatality rate of 4 out of (84 followed + 16 lost), or 4/100 (i.e. 4%). Note that, for the “best case” scenario, we’ve added the missing cases to just the denominator. While this “best case” of 4% may not differ much from the observed 4.8%, the “worst case” of 20% does differ meaningfully, and we’d probably judge that this study’s follow-up was not sufficiently complete and threatens the validity of the study. By using this simple sensitivity analysis, we can see what effect losses to follow-up might have on study results, which can help us judge whether the follow-up was sufficient to yield valid results.
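The best-case and worst-case arithmetic above can be sketched in a few lines of Python. This is only an illustration under the assumptions stated in the worked example (100 enrolled, 4 deaths, 16 lost); the function name `follow_up_sensitivity` is made up for this sketch and does not come from the lecture.

```python
from fractions import Fraction


def follow_up_sensitivity(enrolled, events, lost):
    """Observed, best-case and worst-case event rates for a prognosis study.

    Hypothetical helper: 'events' are outcomes seen among patients who were
    followed, 'lost' are patients with unknown outcome.
    """
    followed = enrolled - lost
    if enrolled <= 0 or events < 0 or lost < 0 or events > followed or followed <= 0:
        raise ValueError("inconsistent patient counts")
    return {
        # Crude rate: only patients with complete follow-up are counted.
        "observed": Fraction(events, followed),
        # Best case: none of the lost patients had the event,
        # so they are added to the denominator only.
        "best": Fraction(events, enrolled),
        # Worst case: every lost patient had the event,
        # so they are added to numerator and denominator.
        "worst": Fraction(events + lost, enrolled),
    }


rates = follow_up_sensitivity(enrolled=100, events=4, lost=16)
for label, rate in rates.items():
    print(f"{label}: {float(rate):.1%}")
```

If the best-case and worst-case rates would lead to the same clinical conclusion, the losses matter little; if they diverge as widely as they do here, the follow-up is probably too incomplete to trust.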
Prognosis - Validity 4 Were outcomes measured 'blind'? How did the study investigators tell whether or not the patients actually had the outcome? The investigators should have defined the outcome/s of interest in advance and have clear criteria which they used to determine whether the outcome had occurred. Ideally, these should be objective, but often some degree of interpretation and clinical judgment will be required. To eliminate potential bias in these situations, judgments should have been applied without knowing the patient's clinical characteristics and prognostic factors. Outcomes are OBJECTIVE and/or BLINDED.
Prognosis - Validity Are the results important? What is the risk of the outcome over time? Three ways in which outcomes might be presented are: as a percentage of survival at a particular point in time; as a median survival (the length of time by which 50% of study patients have had the outcome); as a survival curve that depicts, at each point in time, the proportion (expressed as a percentage) of the original study sample who have not yet had a specified outcome. Survival curves provide the advantage that you can see how the patient's risk might develop over time.
Figure Prognosis shown as survival curves (dashed line indicates median survival). A: Good prognosis (or too short a study!). B: Poor prognosis early, then slower increase in mortality, with median survival of 3 months. C: Good prognosis early, then worsening, with median survival of 9 months. D: Steady prognosis.
The figure shows four survival curves, each leading to a different conclusion. In panel A of this figure, virtually no patients have had events by the end of the study, which could mean that either prognosis is very good for this target disorder (in which case the study is very useful to us) or the study is too short (in which case this study isn’t very helpful). In panels B, C and D, the proportion of patients surviving to 1 year (20%) is the same in all three graphs, and we could tell our patients that their chance of surviving for a year is 20%. However, the median survival (the point at which half will have died, shown by the dashed line) is very different: 3 months for panel B vs. 9 months for the disorder in panel C. The survival pattern is a steady, uniform decline only in panel D, and the median survival here is approximately 7.5 months. These examples highlight the importance of considering median survival and survival curves in order to fully inform our patient about prognosis.
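To make the idea of median survival concrete, the sketch below reads it off a survival curve by linear interpolation. The curve coordinates are assumptions invented for illustration: they are chosen only to match the 1-year survival of 20% and the medians quoted for panels B, C and D, and are not taken from the figure itself. The function name `median_survival` is likewise hypothetical.

```python
def median_survival(months, percent_surviving):
    """Time at which a survival curve first falls to 50%, by linear interpolation.

    'months' must be ascending and 'percent_surviving' non-increasing.
    Returns None if the curve never reaches 50% (as in panel A).
    """
    points = list(zip(months, percent_surviving))
    for (t0, s0), (t1, s1) in zip(points, points[1:]):
        if s0 >= 50 >= s1 and s0 != s1:
            # Interpolate along the segment that crosses the 50% line.
            return t0 + (s0 - 50) * (t1 - t0) / (s0 - s1)
    return None


# Assumed piecewise-linear curves (months, % surviving); illustrative only.
curves = {
    "A": ([0, 12], [100, 98]),         # almost no events by 1 year
    "B": ([0, 3, 12], [100, 50, 20]),  # early deaths, then slower decline
    "C": ([0, 9, 12], [100, 50, 20]),  # good early, then worsening
    "D": ([0, 12], [100, 20]),         # steady, uniform decline
}

for panel, (months, surviving) in curves.items():
    print(panel, "1-year survival:", surviving[-1], "% median:", median_survival(months, surviving))
```

Panels B, C and D end at the same 20% yet return different medians, which is the point the figure makes.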
Figure 1D-1 shows two survival curves: one of survival after a myocardial infarction and the other depicting the results of hip replacement surgery in terms of when patients needed a revision because something had gone wrong after the initial surgery. Note that the chance of dying after a myocardial infarction is highest shortly after the event (reflected by an initially steep downward slope of the curve, which then becomes flat), whereas very few hip replacements require revision until much later (this curve, by contrast, starts out flat and then steepens).
Prognosis If subgroups with different prognoses are identified, was there adjustment for important prognostic factors and validation of these factors in an independent “test set” of patients? Prognostic factors are demographic (e.g. age, gender), disease-specific (e.g. mitral valve prolapse with mitral regurgitation), or comorbid (e.g. hypertension) variables that are associated with the outcome of interest. Prognostic factors need not be causal (and in fact they often are not), but they must be strongly associated with the development of an outcome to predict its occurrence. For example, although mild hyponatremia does not cause death, serum sodium is an important prognostic marker in congestive heart failure (individuals with congestive heart failure and hyponatremia have higher mortality rates than heart failure patients with normal serum sodium). Risk factors are often considered distinct from prognostic factors, and include lifestyle behaviors and environmental exposures that are associated with the development of a target disorder. For example, smoking is an important risk factor for developing lung cancer, but tumor stage is the most important prognostic factor in individuals who have lung cancer.
Prognosis - Validity How precise are the estimates? Any study looks at a sample of the population, so we would expect some variation between the sample and 'truth'. Prognostic estimates should be accompanied by Confidence Intervals to represent this. You should take account of this range when extracting estimates for your patient.
Prognosis - Validity If the confidence interval is very wide, you would question whether the study had enough patients to provide useful information. The standard error for a proportion (p) is SE = √[p(1 − p)/n], where p is the proportion and n is the number of subjects. Assuming a normal distribution, the 95% confidence interval extends 1.96 times this value on either side of the estimate.
5-Year Survival Rates, Non-Small-Cell Lung Cancer Meta-Analysis: Survival = 70%, SE = 5.1%. 95% confidence interval = 70% ± 1.96 × 5.1% = 70% ± 10%.
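The confidence-interval arithmetic can be checked with a short Python sketch. It uses the survival estimate (70%) and standard error (5.1%) quoted above; the function names, and the choice to clamp the interval to the range 0 to 1, are assumptions of this sketch rather than anything stated in the lecture.

```python
import math


def standard_error(p, n):
    """Standard error of a proportion p observed in n subjects."""
    return math.sqrt(p * (1 - p) / n)


def ci95_from_se(p, se, z=1.96):
    """Normal-approximation 95% CI, clamped to [0, 1] (an assumption of this sketch)."""
    return max(0.0, p - z * se), min(1.0, p + z * se)


# Values quoted in the lung cancer example: survival 70%, SE 5.1%.
low, high = ci95_from_se(0.70, 0.051)
print(f"95% CI: {low:.0%} to {high:.0%}")

# When only p and n are reported, the SE can be derived first.
# p = 0.5 and n = 100 are arbitrary illustrative numbers.
print(f"SE for p=0.5, n=100: {standard_error(0.5, 100):.3f}")
```

The normal approximation is weakest for small samples or proportions close to 0 or 1, so a very wide or clamped interval is itself a warning sign.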
Therapy Articles
Is the study valid? 1 Was there a clearly defined research question? What question has the research been designed to answer? Was the question focused in terms of the population group studied, the intervention received and the outcomes considered? 2 Were the groups randomized? The major reason for randomization is to create two (or more) comparison groups which are similar at the start of the trial. To reduce bias as much as possible, the decision as to which treatment a patient receives should be determined by random allocation.
Therapy Articles Allocation Concealment As a supplementary point, clinicians who are entering patients into a trial may consciously or unconsciously distort the balance between groups if they know the treatments given to previous patients. For this reason, it is preferable that the randomization list be concealed from the clinicians. This is known as allocation concealment and is the most important thing to look for in appraising RCTs (Randomized Controlled Trials).
Therapy Articles 3 Were all patients accounted for at its conclusion? There are three major aspects to assessing the follow up of trials: Did so many patients drop out of the trial that its results are in doubt? Was the study long enough to allow outcomes to become manifest? Were patients analyzed in the groups to which they were originally assigned?
Therapy Articles Drop-out rates Undertaking a clinical trial is usually time-consuming and difficult to complete properly. If less than 80% of patients are adequately followed up then the results should be ignored. You look at the follow-up rate reported in the study and ask yourself 'what if everyone who dropped out had a bad outcome?' Length of study Studies must allow enough time for outcomes to become manifest. You should use your clinical judgment to decide whether this was true for the study you are appraising, and whether the length of follow up was appropriate to the outcomes you are interested in.
Therapy Articles 4 Were the research participants 'blinded'? Ideally, patients and clinicians should not know whether they are receiving the treatment. The assessors may unconsciously bias their assessment of outcomes if they are aware of the treatment. This is known as observer bias. So, the ideal trial would blind patients, care givers, assessors and analysts alike. The terms 'single-', 'double-' and 'triple-blind' are sometimes used to describe these permutations. However, there is some variation in their usage and you should check to see exactly who was blinded in a trial. Of course, it may have been impossible to blind certain groups of participants, depending on the type of intervention. Researchers should endeavor to get around this, for example by blinding outcomes assessors to the patients' treatment allocation.
Therapy Articles Placebo control Patients do better if they think they are receiving a treatment than if they do not. A placebo control should be used so that patients can't tell if they're on the active treatment or not.
Therapy Articles 5 Equal treatment It should be clear from the article that, for example, there were no co-interventions which were applied to one group but not the other and that the groups were followed similarly with similar check-ups. 6 Did randomization produce comparable groups at the start of the trial? The purpose of randomization is to generate two (or more) groups of patients who are similar in all important ways. The authors should allow you to check this by displaying important characteristics of the groups in tabular form.
Therapy Articles Are the results important? Two things you need to consider are how large is the treatment effect and how precise is the finding from the trial. In any clinical therapeutic study there are three explanations for the observed effect: 1 bias; 2 chance variation between the two groups; 3 the effect of the treatment. Could this result have happened if there was no difference between the groups? Once bias has been excluded (by asking if the study is valid), we must consider the possibility that the results are a chance effect. Alongside the results, the paper should report a measure of the likelihood that this result could have occurred if the treatment was no better than the control.
Therapy Articles p values The p value is a commonly used measure of this probability. Conventionally, the value of 0.05 is set as the threshold for statistical significance. If the p value is below 0.05, then the result is statistically significant; it is unlikely to have happened if there was no difference between the groups.
Therapy Articles Look to see if the confidence interval crosses the 'line of no difference' between the interventions. If so, then the result is not statistically significant. The confidence interval is better than the p value because it shows you how much uncertainty there is around the stated result.
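The "line of no difference" check can be written as a one-line test. In this sketch the line of no difference is taken to be 0 for a difference in event rates and 1 for a ratio such as the relative risk; the function name and the two confidence intervals are hypothetical and are not results from any trial discussed in this lecture.

```python
def is_statistically_significant(ci_low, ci_high, no_difference):
    """True when the confidence interval excludes the line of no difference."""
    return not (ci_low <= no_difference <= ci_high)


# Hypothetical 95% CI for a risk difference (line of no difference = 0).
print(is_statistically_significant(0.005, 0.023, no_difference=0.0))

# Hypothetical 95% CI for a relative risk (line of no difference = 1).
print(is_statistically_significant(0.55, 1.02, no_difference=1.0))
```

The first interval excludes 0 and so would be called significant; the second includes 1 and would not, even though most of it lies on the side of benefit.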
Therapy Articles Quantifying the risk of benefit and harm Once chance and bias have been ruled out, we must examine the difference in event rates between the control and experimental groups to see if there is a significant difference. These event rates can be calculated as shown below.
Therapy Articles
Relative risk or risk ratio (RR): RR is the ratio of the risk in the experimental group to the risk in the control group.
Absolute risk reduction (ARR): ARR is the difference between the event rates in the two groups.
Relative risk reduction (RRR): RRR is the ARR expressed as a percentage of the control group risk.
Therapy Articles ARR is a more clinically relevant measure to use than the RR or RRR. This is because relative measures 'factor out' the baseline risk, so that small differences in risk can seem significant when compared to a small baseline risk; see the example below.
Therapy Articles Stroke Risk Reduction secondary to Statins. The benefits of ARR (NNT of 72 vs 7 million) vs RRR (25% in both).
Therapy Articles 1. What is the magnitude of the treatment effect? There are a variety of methods that we can use to describe results; we’ve included the most important ones in Table 5.3, and we’ll illustrate them with the help of the statin study. As you can see from the actual trial results in Table 5.3, at a mean of 5 years’ follow-up, stroke occurred among 5.7% of patients randomized to the control group (we’ll call this the “control event rate”, CER), and in 4.3% of the patients assigned to receive statin therapy (we’ll call this the “experimental event rate”, EER). This difference was statistically significant, but how can it be expressed in a clinically useful way? Most often we see this effect reported in clinical journals as the relative risk reduction (RRR), calculated as |CER − EER|/CER. In this example, the RRR is (5.7% − 4.3%)/5.7% (i.e. 25%), and we can say that statin therapy decreased the risk of stroke by 25% relative to those who received placebo. In a similar way, we can describe the situation in which the experimental treatment increases the risk of a good event as the “relative benefit increase” (RBI; also calculated as |CER − EER|/CER). Finally, if the treatment increases the probability of a bad event, we can use the same formula to generate the “relative risk increase” (RRI).
Therapy Articles One of the disadvantages of the RRR, which makes it unhelpful for our purposes, is revealed in the hypothetical data outlined in the bottom row of Table 5.3. The RRR doesn’t reflect the risk of the event without therapy (the CER, or baseline risk), and therefore cannot discriminate huge treatment effects from small ones. For example, if the stroke risk was trivial ( %) in the control group and similarly trivial ( %) in the experimental group, the RRR remains 25%!
Therapy Articles One measure that overcomes this lack of discrimination between small and large treatment effects looks at the absolute arithmetic difference between the rates in the two groups. This is called the “absolute risk reduction” (ARR) (or the risk difference) and it preserves the baseline risk. In the statin trial, the ARR is 5.7% − 4.3% = 1.4%. In our hypothetical case where the baseline risk is trivial, the ARR is trivial too, at %. Thus, the ARR is a more meaningful measure of treatment effects than is the RRR. When the experimental treatment increases the probability of a good event, we can generate the “absolute benefit increase” (ABI), which is also calculated by finding the absolute arithmetic difference in event rates. Similarly, when the experimental treatment increases the probability of a bad event, we can calculate the “absolute risk increase” (ARI).
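The relationship between the relative and absolute measures can be sketched in Python using the statin trial's event rates (CER 5.7%, EER 4.3%). The low-risk comparison simply divides both rates by an arbitrary factor of 1000; those are illustrative numbers chosen for this sketch, not the hypothetical values from the bottom row of Table 5.3. The function name `effect_measures` is also an assumption.

```python
from fractions import Fraction


def effect_measures(cer, eer):
    """RR, RRR and ARR from control and experimental event rates.

    Rates are passed as decimal strings so they are held as exact fractions.
    """
    cer, eer = Fraction(cer), Fraction(eer)
    return {
        "RR": eer / cer,              # risk ratio
        "RRR": abs(cer - eer) / cer,  # relative risk reduction
        "ARR": abs(cer - eer),        # absolute risk reduction
    }


statin = effect_measures("0.057", "0.043")
# Same rates scaled down by an arbitrary factor of 1000 (illustrative only).
low_risk = effect_measures("0.000057", "0.000043")

for name, result in (("statin trial", statin), ("low baseline risk", low_risk)):
    print(f"{name}: RRR = {float(result['RRR']):.1%}, ARR = {float(result['ARR']):.4%}")
```

The RRR is identical in both rows while the ARR shrinks by the same factor as the baseline risk, which is why the ARR is the more informative number at the bedside.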
Therapy Articles Number needed to treat (NNT) Number needed to treat is the most useful measure of benefit, as it tells you the absolute number of patients who need to be treated to prevent one bad outcome. It is the inverse of the ARR: NNT = 1/ARR. The confidence interval of an NNT is obtained by taking the reciprocals of the limits of the confidence interval of its ARR.
Therapy Articles The inverse of the ARR (1/ARR), rounded up to a whole number, has the useful property of telling us the number of patients that we need to treat (NNT) with the experimental therapy for the duration of the trial in order to prevent one additional bad outcome. In our example, the NNT is 1/1.4% = 72, which means we would need to treat 72 people with a statin (rather than placebo) for 5 years to prevent one additional person from suffering a stroke. In our hypothetical example, in the bottom row of Table 5.3, the clinical usefulness of the NNT is underscored, for this tiny treatment effect means that we would have to treat over 7 million patients for 5 years to prevent one additional bad event!
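A minimal sketch of the NNT calculation follows, using the statin trial's event rates and rounding upwards. The second function converts a confidence interval for the ARR into one for the NNT by taking reciprocals of its limits; the ARR interval of 0.5% to 2.3% used to demonstrate it is hypothetical, and both function names are assumptions of this sketch.

```python
import math
from fractions import Fraction


def nnt_from_rates(cer, eer):
    """Number needed to treat: 1/ARR, rounded up to a whole patient."""
    arr = abs(Fraction(cer) - Fraction(eer))
    if arr == 0:
        raise ValueError("no difference in event rates, so the NNT is undefined")
    return math.ceil(1 / arr)


def nnt_interval(arr_low, arr_high):
    """NNT confidence limits as reciprocals of the ARR confidence limits.

    Only meaningful when the whole ARR interval lies above zero.
    """
    low, high = Fraction(arr_low), Fraction(arr_high)
    if low <= 0 or high < low:
        raise ValueError("ARR interval must be positive and ordered")
    # The larger ARR limit gives the smaller NNT limit, and vice versa.
    return math.ceil(1 / high), math.ceil(1 / low)


print(nnt_from_rates("0.057", "0.043"))  # statin trial, 5 years
print(nnt_interval("0.005", "0.023"))    # hypothetical ARR interval
```

Note that the limits swap places: the upper limit of the ARR becomes the lower limit of the NNT.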
Therapy Articles
Should we be impressed with an NNT of 72? We can get an idea by comparing it with NNTs for other interventions and durations of therapy, tempered by our own clinical experience and expertise. The smaller the NNT is, the more impressive the result. However, we should also consider the seriousness of the outcome that we are trying to prevent. We’ve provided some examples of NNTs in Table 5.4. For example, we’d only need to treat 7 people with mild-to-moderate Alzheimer’s dementia with donepezil to prevent one person from experiencing functional decline at 1 year. In contrast, we’d have to treat over 100 people with hypertension for 5.5 years to prevent one death, stroke or myocardial infarction.
Therapy Articles
We can describe the adverse effects of therapy in an analogous fashion, as the number needed to cause harm to one more patient (NNH) from the therapy. The NNH is calculated as 1/ARI (absolute risk increase). In the statin study, 0.03% of the control group experienced rhabdomyolysis compared with 0.05% of patients who experienced this in the group that received a statin. This absolute risk increase of |0.03% − 0.05%| = 0.02% generates an NNH over 5 years of 1/0.02% = 5000. This means that we’d need to treat 5000 patients with a statin for 5 years to cause one additional patient to have rhabdomyolysis. Thus, the NNT and NNH provide us with a nice measure of the effort we and our patients have to expend to prevent or cause one more bad outcome, and their attractiveness as an effort:yield ratio (or “poor clinicians’ cost-effectiveness analysis”) is easily recognized, i.e. treat 72 patients for 5 years to prevent 1 stroke at the risk of giving one person in 5000 rhabdomyolysis in the same 5 year interval.
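The effort:yield idea can be illustrated by asking what would be expected if 5000 patients were treated for 5 years. The sketch below uses the stroke and rhabdomyolysis rates quoted for the statin study; the cohort size of 5000 is picked only because it equals the NNH, and the helper name `expected_events` is hypothetical. Exact fractions are used so that the results are not disturbed by floating-point rounding.

```python
from fractions import Fraction


def expected_events(patients, absolute_risk_change):
    """Expected number of extra (or prevented) events in a treated cohort."""
    return patients * Fraction(absolute_risk_change)


# Statin study rates, written as decimal strings to keep them exact.
arr = Fraction("0.057") - Fraction("0.043")    # stroke: 1.4% absolute risk reduction
ari = Fraction("0.0005") - Fraction("0.0003")  # rhabdomyolysis: 0.02% absolute risk increase

nnh = 1 / ari
strokes_prevented = expected_events(5000, arr)
rhabdo_caused = expected_events(5000, ari)

print("NNH over 5 years:", nnh)
print("Per 5000 treated: strokes prevented =", strokes_prevented,
      "rhabdomyolysis caused =", rhabdo_caused)
```

Under these figures, treating 5000 patients would be expected to prevent about 70 strokes for every additional case of rhabdomyolysis.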
Therapy Articles To understand NNTs, we need to consider some additional features. First, they always have a dimension of follow-up time associated with them. Quick reference to Table 5.4 reminds us that the NNT of 10 to prevent one more major stroke or death by performing endarterectomy on patients with symptomatic high-grade carotid stenosis refers to outcomes over a 2-year period (in this case, from an operation that is over in minutes). One consequence of this time dimension is that, if we want to compare NNTs for different follow-up times, we have to make an assumption about them and a “time adjustment” to at least one of them. Say that we wanted to compare the NNTs to prevent one additional stroke, myocardial infarction or death with drugs among patients with mild vs. severe hypertension. Another quick look at Table 5.4 gives us an NNT at 1.5 years of just 3 for severe hypertensives (who already have a lot of target organ damage), and an NNT at 5.5 years of 128 for milder hypertensives (most of whom are free of target organ damage). To compare their NNTs, we need to adjust at least one of them so that they relate to the same follow-up time. The assumption that we make here is that the RRR from antihypertensive therapy is constant over time (i.e. we assume that antihypertensive therapy exerts the same relative benefit in year 1 as it does over the next 4 years). If we are comfortable with that assumption (it appears safe for hypertension), we can then proceed to make the time adjustment.
Therapy Articles
Let’s adjust the NNT for the mild hypertensives (128 over the “observed” 5.5 years) to an NNT corresponding to a “hypothetical” 1.5 years. This is done by multiplying the NNT for the “observed” follow-up time by a fraction with the “observed” time in the numerator and the “hypothetical” time in the denominator. In this case, adjusting the NNT of 128 for mild hypertensives to its hypothetical value for 1.5 years becomes: NNT (hypothetical) = NNT (observed) × (observed time/hypothetical time) = 128 × (5.5/1.5) = 470. (By convention, we round any decimal NNT upwards to the next whole number.) Now we can appreciate the vast difference in the yield of clinical efforts to treat mild vs. severe hypertensives: we need to treat 470 of the former, but only 3 of the latter for 1.5 years in order to prevent one additional bad outcome. The explanation lies in the huge difference in CERs (control event rates), which are far higher in severe hypertensives followed for just 1.5 years than in mild hypertensives followed for 5.5 years.
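The time adjustment described above is a single multiplication followed by rounding up, sketched here in Python. It rests on the same assumption the text makes, that the relative risk reduction is constant over time; the function name `time_adjusted_nnt` is made up for this sketch.

```python
import math
from fractions import Fraction


def time_adjusted_nnt(nnt_observed, observed_years, hypothetical_years):
    """Rescale an NNT to a different follow-up time, assuming a constant RRR.

    NNT(hypothetical) = NNT(observed) x observed time / hypothetical time,
    rounded up to the next whole number.
    """
    factor = Fraction(observed_years) / Fraction(hypothetical_years)
    return math.ceil(nnt_observed * factor)


# Mild hypertension: NNT of 128 over 5.5 years, re-expressed for 1.5 years.
print(time_adjusted_nnt(128, "5.5", "1.5"))
```

Shortening the follow-up raises the NNT, because fewer events accumulate in a shorter period and so fewer can be prevented.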
Therapy Articles Is there some quick way of incorporating patient values that doesn’t do too much violence to the truth? Returning to our stroke patient and using the data in Table 5.3, we found that the ARR was 1.4% and the NNT was 72. We could use this to tell our patient that he has a 1 in 72 chance of being helped by a statin and a stroke being prevented. Similarly, looking at his risk of harm from Table 5.3, we could tell him that he has a 1 in 5000 chance of experiencing harm (e.g. rhabdomyolysis) with statin therapy. Our first approximation of his likelihood of being helped vs. harmed then becomes: LHH = (1/NNT):(1/NNH) = (1/72):(1/5000) ≈ 70. We could then tell our patient that statin therapy is 70 times more likely to help him than to harm him.
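The likelihood of being helped vs. harmed is just the ratio of the two reciprocals, as the short sketch below shows for the NNT of 72 and NNH of 5000 quoted above. The function name is an assumption of this sketch, and the sketch covers only this first approximation, without any weighting for patient values.

```python
from fractions import Fraction


def likelihood_helped_vs_harmed(nnt, nnh):
    """LHH = (1/NNT) : (1/NNH), expressed as a single ratio."""
    return Fraction(1, nnt) / Fraction(1, nnh)


lhh = likelihood_helped_vs_harmed(nnt=72, nnh=5000)
print(f"LHH = {float(lhh):.1f}")
```

With the rounded NNT of 72 the ratio comes out a little under 70, which the text rounds to 70 for the conversation with the patient.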
Therapy Articles
PRACTICING EBM IN REAL-TIME Calculating the measures of treatment effect: a short-cut: Rather than memorizing the formulas described above, we could instead use an EBM calculator whenever we need to calculate the measures of the treatment effect (i.e. if the results of the study aren’t presented in the article using these measures). This tool saves us time and decreases the risk of a mathematical error. From the DOMedEd website and on the accompanying CD you can download an EBM calculator for Palm/Pocket PC that we’ve modified from one available at www.cebm.utoronto.ca; this calculator can be loaded onto your PDA. There is also an online (browser-based) calculator as well as an Excel spreadsheet for download for desktop use.
Therapy Articles Calculates 95% CI Mortality in Acute MI with and without Captopril
Therapy Articles
Parametric (Data where there is the assumption of an underlying normal distribution-usually continuous) vs. Non-parametric (“binary” data i.e. alive/dead) Articles
Therapy Articles
(Not Independent)
Therapy Articles
Summary An evidence-based approach to deciding whether a treatment is effective for your patient involves the following steps: 1 Frame the clinical question. 2 Search for evidence concerning the efficacy of the therapy. 3 Assess the methods used to carry out the trial of the therapy. 4 Determine the NNT of the therapy. 5 Decide whether the NNT can apply to your patient, and estimate a particularized NNT. 6 Incorporate your patient's values and preferences into deciding on a course of action.
The Randomized Controlled Trial Evaluation
Randomized Controlled Trial A randomized controlled trial is an experimental, prospective study in which "participants are randomly allocated into an experimental group or a control group and followed over time for the variables/outcomes of interest." Study participants are randomly assigned to ensure that each participant has an equal chance of being assigned to an experimental or control group, thereby reducing potential bias. Outcomes of interest may be death (mortality), a specific disease state (morbidity), or even a numerical measurement such as blood chemistry level. Now let’s look at a diagram of a typical RCT that represents the flow of participants from the start of the study through the study outcome. Notice in all diagrams the study start; studies progressing from left to right represent prospective studies, “collecting data about a population whose outcome lies in the future”.
Randomized Controlled Trial
Similar subjects are randomly assigned to a treatment group and followed to see if they develop the outcome of interest. RCTs are the most powerful method of eliminating (known and unknown) confounding variables and permit the most powerful statistical analysis (including subsequent meta-analysis). However, they are expensive, sometimes ethically problematic, and may still be subject to selection and observer biases.
Five steps in EBM 1.Formulate an answerable question 2.Track down the best evidence 3.Critically appraise the evidence for: – Relevance – Validity – Impact (size of the benefit) – Applicability 4.Integrate with clinical expertise and patient values 5.Evaluate our effectiveness and efficiency – keep a record; improve the process
A CHECKLIST FOR APPRAISING RANDOMIZED CONTROLLED TRIALS
1. Was the objective of the trial sufficiently described?
2. Was a satisfactory statement given of the diagnostic criteria for entry to the trial?
3. Were concurrent controls used (as opposed to historical controls)?
4. Were the treatments well defined?
5. Was random allocation to treatments used?
6. Was the potential degree of blindness used?
7. Was there a satisfactory statement of criteria for outcome measures? Was a primary outcome measure identified?
8. Were the outcome measures appropriate?
9. Was a pre-study calculation of required sample size reported?
10. Was the duration of post-treatment follow-up stated?
11. Were the treatment and control groups comparable in relevant measures?
12. Were a high proportion of the subjects followed up?
13. Were the drop-outs described by treatment and control groups?
14. Were the side-effects of treatment reported?
15. How were the ethical issues dealt with?
16. Was there a statement adequately describing or referencing all statistical procedures used?
17. What tests were used to compare the outcome in test and control patients?
18. Were 95% confidence intervals given for the main results?
19. Were any additional analyses done to see whether baseline characteristics (prognostic factors) influenced the outcomes observed?
20. Were the conclusions drawn from the statistical analyses justified?
A search for “critical appraisal checklists randomized controlled trials” returned 11,100 articles (0.40 seconds).
Clinical Question In people who take long-haul flights, does wearing graduated compression stockings prevent DVT?
[Diagram: VALIDITY of a study answering the QUESTION. Participants (Recruitment) are allocated to an Intervention Group (IG) or a Comparison Group (CG) (Allocation: concealment? comparable groups?), followed (Maintenance: treated equally? compliant?), and assessed for the Outcome (Measurements: blind? OR objective?), giving cells A, B, C and D.]
The RAMMbo Method
Appraisal checklist - RAMMbo Study biases 1. Recruitment Who did the subjects represent? 2. Allocation – Was the assignment to treatments randomised? – Were the groups similar at the trial’s start? 3. Maintenance – Were the groups treated equally? – Were outcomes ascertained & analysed for most patients? 4. Measurements – Were patients and clinicians “blinded” to treatment? OR – Were measurements objective & standardised? Study statistics (p-values & confidence intervals) Guyatt. JAMA, 1993
Scurr et al, Lancet 2001; 357: Randomization: Volunteers were randomized by sealed envelope to one of two groups. Passengers were randomly allocated to one of two groups: one group wore class-I below-knee graduated elastic compression stockings, the other group did not.
Please open your envelopes Blue Bunnies Pink Bunnies Been to New York Argued with your boss
Ensuring Allocation Concealment BEST (most valid technique): central computer randomization. DOUBTFUL: envelopes, etc. NOT RANDOMIZED: date of birth, alternate days, etc.
Were the groups similar at the trial’s start? By chance, a greater proportion of women were included in the stocking group (p < 0.01). Page 96
Effects of non-equal treatment Apart from the actual intervention, groups should receive identical care! – Trial of Vitamin E in pre-term infants (1949) – Vit E "prevented" retrolental fibroplasia – (By removal from oxygen to give the frequent doses of Vit E!) Rx: Give placebo in an identical regimen, with a standard protocol
Equal treatment in DVT study? Table 3: All drugs taken by volunteers who attended for examination before and after air travel*
Follow-up in DVT study? 200 of 231 analyzed (87%) 27 were unable to attend for subsequent ultrasound 2 were excluded from analysis because they were upgraded to business class 2 were excluded from analysis because they were taking anticoagulants Scurr et al, Lancet 2001; 357:
Losses-to-follow-up How many is too many? “5-and-20 rule of thumb”: <5% probably leads to little bias; >20% poses serious threats to validity. Depends on the outcome event rate and the comparative loss rates in the groups. The loss to follow-up rate should not exceed the outcome event rate and should not be differential.
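The rule of thumb can be applied to the DVT study's follow-up figures (200 of 231 analyzed) together with its 12 reported DVTs. The sketch below is one way to do that; the function name, the wording of the verdicts, and the choice to compute the event rate among analyzed participants are assumptions made for illustration.

```python
def assess_follow_up(randomized: int, analyzed: int, events: int) -> dict:
    """Apply the 5-and-20 rule of thumb to a trial's loss to follow-up."""
    lost = randomized - analyzed
    loss_rate = lost / randomized
    # Assumption: outcome event rate is taken among the participants analyzed.
    event_rate = events / analyzed
    if loss_rate < 0.05:
        verdict = "little bias likely"
    elif loss_rate > 0.20:
        verdict = "serious threat to validity"
    else:
        verdict = "intermediate: judge against event rate and group balance"
    return {
        "lost": lost,
        "loss_rate": loss_rate,
        "event_rate": event_rate,
        "verdict": verdict,
        "loss_exceeds_event_rate": loss_rate > event_rate,
    }


# DVT study: 231 randomized, 200 analyzed, 12 DVTs among those analyzed.
dvt = assess_follow_up(randomized=231, analyzed=200, events=12)
print(f"lost {dvt['lost']} ({dvt['loss_rate']:.1%}); {dvt['verdict']}")
print("loss rate exceeds event rate:", dvt["loss_exceeds_event_rate"])
```

The study falls between the two thresholds, and its loss rate is larger than its event rate, so it matters whether the losses were balanced between the groups.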
How important are the losses? Equally distributed? Stocking group: 6 men, 9 women - 15 No stocking group: 7 men, 9 women - 16 Similar characteristics? No information provided
Intention-to-Treat Principle Maintaining the randomization. Principle: Once a patient is randomized, s/he should be analyzed in the group to which s/he was randomized, even if s/he discontinues treatment, never receives it, or crosses over. Exception: If the patient is found on BLIND reassessment to be ineligible based on pre-randomization criteria.
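A toy comparison shows why the principle matters. The six patient records below are entirely hypothetical (they are not data from the DVT study), and the function name is invented for this sketch; the only point is that grouping by the arm assigned and grouping by the treatment actually received can give different answers.

```python
from collections import defaultdict


def event_rates(patients, group_key):
    """Event rate per group, grouping by 'assigned' (ITT) or 'received'."""
    totals = defaultdict(int)
    events = defaultdict(int)
    for patient in patients:
        group = patient[group_key]
        totals[group] += 1
        events[group] += patient["event"]
    return {group: events[group] / totals[group] for group in totals}


# Hypothetical records: 'event' is 1 if the outcome occurred, else 0.
patients = [
    {"assigned": "stockings", "received": "stockings", "event": 0},
    {"assigned": "stockings", "received": "stockings", "event": 0},
    {"assigned": "stockings", "received": "none", "event": 1},   # never wore them
    {"assigned": "none", "received": "none", "event": 1},
    {"assigned": "none", "received": "none", "event": 0},
    {"assigned": "none", "received": "stockings", "event": 0},   # crossed over
]

itt = event_rates(patients, "assigned")         # preserves the randomization
as_treated = event_rates(patients, "received")  # breaks the randomization
print("intention-to-treat:", itt)
print("as-treated:", as_treated)
```

In this made-up example the intention-to-treat rates are identical in both arms, while the as-treated grouping makes the treatment look strongly protective, purely because of who switched.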
Measures in DVT study? Blood was taken from all participants before travel All participants had US once before travel (30 had US twice) All participants were seen within 48 hr of return flight, were interviewed and completed a questionnaire, had repeat US Scurr et al, Lancet 2001; 357:
Measurement Bias - minimizing differential error Blinding – Who? – Participants? – Investigators? – Outcome assessors? – Analysts? Most important to use "blinded" outcome assessors when outcome is not objective! Papers should report WHO was blinded and HOW it was done Schulz and Grimes. Lancet, 2002
Evaluation Most passengers removed their stockings on completion of their journey. The nurse removed the stockings of those passengers who had continued to wear them. A further duplex examination was then undertaken with the technician unaware of the group to which the volunteer had been randomized.
Appraisal checklist Study biases 1. Recruitment Who did the subjects represent? 2. Allocation – Was the assignment to treatments randomised? – Were the groups similar at the trial’s start? 3. Maintenance – Were the groups treated equally? – Were outcomes ascertained & analysed for most patients? 4. Measurements – Were patients and clinicians “blinded” to treatment? OR – Were measurements objective & standardised? 5. Placebo Effect 6. Chance 7. Real Effect Study statistics (p-values & confidence intervals) Guyatt. JAMA, 1993
Placebo effect Trial in patients with chronic severe itching comparing cyproheptadine HCl, trimeprazine tartrate, placebo, and no treatment. [Charts: treatment vs no treatment for itching; treatment vs no treatment vs placebo for itching.] Placebo effect: attributable to the expectation that the treatment will have an effect.
Results DVT = 12/100 for No Stockings and 0/100 for those with Stockings
Using the Online Calculator The original article presents only confidence intervals; you can get more meaningful data using the calculator. Remember that the NNT must be a “whole” number of people, so 1/0.12 = 8.33 is rounded up: NNT = 9.
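The same figures can be reproduced by hand from the study results (12/100 DVTs without stockings, 0/100 with stockings). The sketch below is not the calculator's own code; the function and key names are invented for illustration.

```python
import math


def treatment_effect(control_events, control_n, treated_events, treated_n):
    """Basic measures of treatment effect from a 2x2 table of event counts."""
    cer = control_events / control_n   # control event rate
    eer = treated_events / treated_n   # experimental event rate
    arr = cer - eer                    # absolute risk reduction
    rrr = arr / cer if cer > 0 else float("nan")  # relative risk reduction
    # NNT is rounded up to a whole person; undefined if there is no benefit.
    nnt = math.ceil(1 / arr) if arr > 0 else None
    return {"CER": cer, "EER": eer, "ARR": arr, "RRR": rrr, "NNT": nnt}


# DVT study: 12/100 without stockings, 0/100 with stockings.
dvt = treatment_effect(control_events=12, control_n=100,
                       treated_events=0, treated_n=100)
print(dvt)
```

Rounding the NNT up rather than to the nearest integer is the conservative convention: it never overstates the benefit.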
Two methods of assessing the role of chance P-values (Hypothesis Testing) – use statistical test to examine the ‘null’ hypothesis – associated with “p values” - if p<0.05 then result is statistically significant Confidence Intervals (Estimation) – estimates the range of values that is likely to include the true value
P-values (Hypothesis Testing) - in DVT study Incidence of DVT – Stocking group: 0 – No Stocking group: 0.12. Risk difference (ARR) = 0.12 - 0 = 0.12 (P=0.002). The probability that a result like this would occur by chance alone is 2 in 1000: statistically significant.
Confidence Intervals (Estimation) - in DVT study Incidence of DVT – Stocking group: 0 – No Stocking group: 0.12. Risk difference = 0.12 - 0 = 0.12 (95% CI, ) The true value could be as low as the lower limit of the interval or as high as the upper limit, but is probably closer to 0.12. Since the CI does not include the ‘no effect’ value of ‘0’, the result is statistically significant.
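To show how such an interval is estimated, the sketch below uses the simple normal-approximation (Wald) formula for a difference between two proportions. This is an assumption for illustration: it is not necessarily the method used in the article or in the EBM calculator, it behaves poorly when one group has zero events (as here), and its output should not be read as the published interval.

```python
import math


def risk_difference_ci(control_events, control_n, treated_events, treated_n,
                       z=1.96):
    """Risk difference with a Wald (normal-approximation) confidence interval.

    z = 1.96 corresponds to a two-sided 95% interval.
    """
    p_control = control_events / control_n
    p_treated = treated_events / treated_n
    diff = p_control - p_treated
    standard_error = math.sqrt(
        p_control * (1 - p_control) / control_n
        + p_treated * (1 - p_treated) / treated_n
    )
    return diff, diff - z * standard_error, diff + z * standard_error


# DVT study: 12/100 without stockings, 0/100 with stockings.
diff, lower, upper = risk_difference_ci(12, 100, 0, 100)
excludes_no_effect = lower > 0 or upper < 0
print(f"risk difference {diff:.2f}, approx. 95% CI {lower:.3f} to {upper:.3f}")
print("interval excludes 0:", excludes_no_effect)
```

Whatever method is used, the reading is the same: if the interval excludes 0, the difference is statistically significant at the corresponding level.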
Causes of an "Effect" in a controlled trial Who would now consider wearing stockings on a long-haul flight? Because we should never change our practice patterns based on one study, we ask ourselves: have other studies been done, are they comparable, and do they demonstrate the same outcome? Onward to the Cochrane Library!!
M Clarke, S Hopewell, E Juszczak, A Eisinga, M Kjeldstrøm. Compression stockings for preventing deep vein thrombosis in airline passengers. Cochrane Database of Systematic Reviews 2006, Issue 4. 10 RCTs (n = 2856); nine (n = 2821) compared wearing stockings on both legs versus not wearing them, and one (n = 35) compared wearing a stocking on one leg for the outbound flight and on the other leg on the return flight. Of the nine trials, seven included people judged to be at low or medium risk (n = 1548) and two included high risk participants (n = 1273). All flights lasted at least seven hours. Fifty of 2637 participants in the trials of wearing stockings on both legs had a symptomless DVT; three wore stockings, 47 did not (OR 0.10, 95% CI 0.04 to 0.25, P < ). No deaths, pulmonary emboli or symptomatic DVTs were reported. Wearing stockings had a significant impact in reducing oedema (based on six trials). No significant adverse effects were reported.
[Figure from Clarke et al., Cochrane Database of Systematic Reviews 2006, Issue 4, annotated: heterogeneity check; statistical significance.]
The Cochrane Collaboration
Update: M Clarke, S Hopewell, E Juszczak, A Eisinga, M Kjeldstrøm. Compression stockings for preventing deep vein thrombosis in airline passengers. This is a reprint of a Cochrane review, prepared and maintained by The Cochrane Collaboration and published in The Cochrane Library 2009, Issue 3.