Plan GRADE background certainty in evidence (quality, confidence evidence) evidence profiles strength of recommendation exercises in applying GRADE.

1 Plan
GRADE background; certainty in evidence (quality, confidence evidence); evidence profiles; strength of recommendation; exercises in applying GRADE

2 experience participating guideline panels?
clin epi methodology course? is grading recommendations a good idea? If so, why? experience with grading systems used?

3 Grading good idea, but which grading system to use?
many available: Australian National Health and MRC; Oxford Centre for Evidence-based Medicine; Scottish Intercollegiate Guidelines Network (SIGN); US Preventive Services Task Force; American professional organizations: AHA/ACC, ACCP, AAP, Endocrine Society, etc. Cause of confusion, dismay

4

5 Common international grading system?
GRADE (Grading of Recommendations Assessment, Development and Evaluation); international group: Australian NHMRC, SIGN, USPSTF, WHO, NICE, Oxford CEBM, CDC, CC; ~35 meetings over last 14 years (~10 to 70 attendees)

6 GRADE GUIDANCE
2004 BMJ, first description; 2008 BMJ six-part series for guideline users; 21-part series, 15 published, for systematic review authors, HTA practitioners, guideline developers

7 Grading system – for what?
interventions: management strategy 1 versus 2
what GRADE is not about: individual studies (it addresses a body of evidence)

8 What GRADE is not primarily about
diagnostic accuracy questions: in patients with a sore leg, what is the accuracy of a blood test (D-dimer) in sorting out whether a deep venous thrombosis is the cause of the pain?
prognosis
what it is about: diagnostic impact: are patients better off (improved outcomes) when doctors use the D-dimer test?

9 80+ Organizations
[Figure: organizations, 2005 to 2011]

10 GRADE uptake

11 What are we grading? two components
certainty in estimate of effect adequate to support decision (quality of body of evidence) high, moderate, low, very low

12 Likelihood of and confidence in an outcome
We can look at this as depicted in this cartoon: the likelihood of, and the confidence in, an outcome. In the cartoon, one meteorologist says to another, "I figure there is a 40% chance of showers and a 10% chance we know what we are talking about." This separates the likelihood that an outcome occurs from our confidence in the estimate of that likelihood. For instance, the confidence interval around the 40% chance of showers may be very tight; it may be based on modeling that yields an interval of 35% to 45%. However, the development of the model, or its application from one setting to another, may leave us with very little confidence that the estimate is correct for the particular setting. Just imagine that model being developed in Australia and applied to North America. This is similar to how we look at confidence in evidence in the GRADE approach.

13 Semantic Issue: Label for trustworthiness
Quality: initial choice, defined as confidence; natural to clinicians, but confusion with risk of bias
Confidence: what we actually mean, but confusion with confidence intervals, and experts always confident
Certainty: avoids the confusion of the others; experts might acknowledge uncertainty; current preferred term

14 What are we grading? two components
certainty in evidence adequate to support decision (quality of body of evidence) high, moderate, low, very low strength of recommendation strong and weak weak alternatives conditional, contingent, discretionary

15 Generate an estimate of effect for each outcome
Health care question (PICO); systematic reviews of studies (S1 to S5)
Outcomes OC1 to OC4: important outcomes OC1, OC2; critical outcomes OC3, OC4
Generate an estimate of effect for each outcome
Rate the quality of evidence for each outcome, across studies: RCTs start high, observational studies start low
(-) Study limitations; imprecision; inconsistency of results; indirectness of evidence; publication bias likely
(+) Large magnitude of effect; dose response; plausible confounders would ↓ effect when an effect is present or ↑ effect if effect is absent
Final rating of quality for each outcome: high, moderate, low, or very low
Rate overall quality of evidence (lowest quality among critical outcomes)
Decide on the direction (for/against) and grade strength (strong/weak*) of the recommendation considering: quality of the evidence; balance of desirable/undesirable outcomes; values and preferences
Decide if any revision of direction or strength is necessary considering: resource use
*also labeled “conditional” or “discretionary”

16 Structured question
patients: males over 50 presenting with fatigue, malaise and erectile dysfunction with laboratory evidence of decreased testosterone; intervention: testosterone; comparator: no testosterone; outcomes?

17 Rating certainty Where to start RCTs and observational studies (High, moderate, low, very low)? Recall antioxidant vitamins Observational studies less cancer, CV outcomes RCTs no difference Result observed repeatedly What went wrong?

18 Determinants of confidence
RCTs start high observational studies start low what can lower confidence? risk of bias inconsistency indirectness imprecision publication bias
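The starting points and rating factors listed above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, assuming each factor moves the rating by one or two levels; the function name `rate_certainty` and its arguments are hypothetical, and in practice GRADE ratings are judgments rather than mechanical sums.

```python
# Certainty levels, ordered from lowest to highest.
LEVELS = ["very low", "low", "moderate", "high"]


def rate_certainty(study_design, rate_down=None, rate_up=None):
    """Return a certainty label for one outcome across studies.

    study_design: "rct" (starts high) or "observational" (starts low).
    rate_down / rate_up: dicts mapping a factor name (for example
    "imprecision") to the number of levels moved (1 or 2).
    Hypothetical helper for illustration only.
    """
    start = {"rct": 3, "observational": 1}[study_design]
    down = sum((rate_down or {}).values())
    up = sum((rate_up or {}).values())
    # Clamp so the result always stays within the four levels.
    index = max(0, min(len(LEVELS) - 1, start - down + up))
    return LEVELS[index]


if __name__ == "__main__":
    print(rate_certainty("rct", rate_down={"imprecision": 1}))
    print(rate_certainty("observational",
                         rate_up={"large magnitude of effect": 2}))
```

Keeping the factor names as dictionary keys means the reasons for each rating stay visible next to the result, which mirrors how an evidence profile records them.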

19 Risk of Bias - RCTs
what to consider?
well established: concealment; intention to treat principle observed; blinding; completeness of follow-up
more recent: selective outcome reporting bias; stopping early for benefit

20 RoB – Observational Studies
what to consider? accurate assessment of exposure; adjusted analysis for all important prognostic factors, accurately measured; accurate assessment of outcome; completeness of follow-up

21 Risk of Bias differs – what to do?
6 studies, 100 patients each 3 studies low risk of bias, 3 high rate down for risk of bias?

22 Consistency

23 Consistency

24 Consistency of results
How did you decide? Similarity of point estimates less similar, less happy Overlap of confidence intervals less overlap, less happy

25 [Forest plot; x-axis: RRR (95% CI), from -40 to 56]

26 Homogeneous: test for heterogeneity, what is the p-value?
what is the null hypothesis for the test for heterogeneity? H0: RR1 = RR2 = RR3 = RR4; p=0.99 for heterogeneity

27 Heterogeneous test for heterogeneity what is the p-value?
p-value for heterogeneity < 0.001

28 I2 Interpretation
0% No worries; 25% Only a little concerned; 50% Getting concerned; 75% Very concerned; 100% Why are we pooling?

29 Homogeneous: what is the I2? p=0.99 for heterogeneity, I2=0%

30 Heterogeneous: what is the I2?
p-value for heterogeneity < 0.001; I2=89%
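For readers who want to see where an I2 value comes from, here is a minimal sketch using Cochran's Q with inverse-variance weights on the log relative risk scale, and I2 = max(0, (Q - df) / Q). The function name `heterogeneity` and the study values are hypothetical; they are not the data behind the forest plots in these slides.

```python
import math


def heterogeneity(log_rrs, std_errors):
    """Return (Q, degrees of freedom, I2 in percent).

    log_rrs: log relative risk of each study.
    std_errors: standard error of each log relative risk.
    Uses fixed-effect inverse-variance weights.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, log_rrs)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_rrs))
    df = len(log_rrs) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2


if __name__ == "__main__":
    # Hypothetical studies with identical point estimates.
    similar = [math.log(0.8)] * 4
    # Hypothetical studies with widely differing point estimates.
    scattered = [math.log(rr) for rr in (0.4, 0.6, 1.1, 1.5)]
    errors = [0.1] * 4
    for label, data in (("similar", similar), ("scattered", scattered)):
        q, df, i2 = heterogeneity(data, errors)
        print(f"{label}: Q={q:.1f}, df={df}, I2={i2:.0f}%")
```

The p-value for heterogeneity would come from comparing Q with a chi-square distribution on df degrees of freedom; that step is left out here to keep the sketch free of third-party libraries.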

31 Relative Risk with 95% CI for Vitamin D Non-vertebral Fractures

32 Relative Risk with 95% CI for Vitamin D
(Non-Vertebral Fractures, Dose >400)

33 Relative Risk with 95% CI for Vitamin D
(Non-Vertebral Fractures, Dose = 400)

34 Should we believe sub-group analysis?
within-study comparison? No; unlikely chance? Yes, p = 0.006; consistent across studies? Yes; one of small number of a priori hypotheses, with direction? Yes; biologically compelling? Yes; shall we believe sub-group analysis?

35 Credibility of sub-group analysis
[Scale: from “no way” to “sure thing” (100)]

36 Confidence judgments: Directness
populations: older, sicker or more co-morbidity; interventions: warfarin in trials vs clinical practice; outcomes: important versus surrogate outcomes, glucose control versus CV events

37 Hierarchy of outcomes according to their patient-importance: effect of phosphate-lowering drugs in patients with renal failure and hyperphosphatemia
Importance of endpoints (rated 9 to 1): critical for decision making; important, but not critical for decision making; of low patient-importance
Patient-important outcomes: mortality (9); myocardial infarction (8); fractures; pain due to soft tissue calcification / function (6); flatulence (of low patient-importance)
Surrogates of declining importance: coronary calcification, then Ca2+/P- product; bone density, then Ca2+/P- product; soft tissue calcification, then Ca2+/P- product
Lower by one level for indirectness; lower by two levels for indirectness

38 Directness Alendronate Risedronate Placebo interested in A versus B
available data A vs C, B vs C Alendronate Risedronate Placebo

39 Imprecision small sample size wide confidence intervals
small number of events wide confidence intervals uncertainty about magnitude of effect how do you decide what is too wide? primary criterion: would decisions differ at ends of CI

40 Precision atrial fib at risk of stroke
warfarin increases serious GI bleeding: 3% per year
1,000 patients, 1 fewer stroke: 30 more bleeds for each stroke prevented
1,000 patients, 100 fewer strokes: 3 strokes prevented for each bleed
where is your threshold? how many strokes in 100 with 3% bleeding?

41 1.0%

42 1.0%

43 1.0%

44 1.0%

45 Example: clopidogrel or ASA?
pts with threatened stroke; RCT of clopidogrel vs ASA, 19,185 patients; ischaemic stroke, MI, or vascular death compared: 939 events (5.32%) with clopidogrel, 1021 events (5.83%) with aspirin; RR 0.91 (95% CI 0.83 – 0.99) (p=0.043); rate down for precision?

46 Clopidogrel or ASA for threatened vascular events
RCT 19,185 patients 1.7% – 0.1% RR 0.91 (95% CI 0.83 – 0.99) 1.0%
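The primary criterion for imprecision (would decisions differ at the ends of the CI?) can be sketched as a small check. The 1.7% to 0.1% range on this slide is consistent with applying the RR confidence limits to a baseline risk of about 10%; that baseline is an assumption here, not something the slide states, and the function names are hypothetical.

```python
def absolute_reduction_range(baseline_risk, rr_low, rr_high):
    """Absolute risk reduction at each end of the RR confidence interval.

    Returns (smallest, largest) reduction as proportions.
    """
    largest = baseline_risk * (1 - rr_low)
    smallest = baseline_risk * (1 - rr_high)
    return smallest, largest


def decision_differs(baseline_risk, rr_low, rr_high, threshold):
    """True when the decision threshold lies inside the absolute-effect range."""
    smallest, largest = absolute_reduction_range(baseline_risk, rr_low, rr_high)
    return smallest < threshold <= largest


if __name__ == "__main__":
    # RR 0.91 (95% CI 0.83 to 0.99) from the slide; the 10% baseline risk
    # is an assumption, and 1.0% is the threshold shown on the slides.
    smallest, largest = absolute_reduction_range(0.10, 0.83, 0.99)
    print(f"absolute reduction: {smallest:.1%} to {largest:.1%}")
    print("rate down for imprecision?",
          decision_differs(0.10, 0.83, 0.99, threshold=0.01))
```

With a different threshold the answer can flip, which is the point of the exercise: the judgment depends on where the panel places the smallest effect that would justify the treatment.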

47 Non-inferiority

48 Non-inferiority

49 Non-inferiority

50 small trials, large effect
likely to be overestimate; analogy to stopping early; lack of prognostic balance
solution: optimal information size: # of pts from conventional sample size calculation; specify control group risk, α, β, Δ

51 Fluoroquinolone prophylaxis in neutropenia:
infection-related mortality Total number of events: 47

52 Fluoroquinolone prophylaxis in neutropenia: infection-related mortality
sample size 1,002; α 0.05, β 0.20, Δ 0.25 RRR, CER 7%; N = 6,000
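The optimal information size can be approximated with a conventional two-group sample size formula for comparing proportions. The sketch below uses the normal approximation with a two-sided α; the slide does not say which formula produced N = 6,000, so this particular formula and the function name `optimal_information_size` are assumptions. With the slide's inputs it should land close to 6,000.

```python
import math
from statistics import NormalDist


def optimal_information_size(control_risk, rrr, alpha=0.05, beta=0.20):
    """Total patients (both groups) needed to detect a relative risk reduction.

    control_risk: control event rate (CER) as a proportion.
    rrr: relative risk reduction to detect (delta) as a proportion.
    alpha: two-sided type I error; beta: type II error.
    """
    p1 = control_risk
    p2 = control_risk * (1 - rrr)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(1 - beta)
    per_group = ((z_alpha + z_beta) ** 2
                 * (p1 * (1 - p1) + p2 * (1 - p2))
                 / (p1 - p2) ** 2)
    return 2 * math.ceil(per_group)


if __name__ == "__main__":
    # Inputs from the slide: alpha 0.05, beta 0.20, delta 25% RRR, CER 7%.
    needed = optimal_information_size(control_risk=0.07, rrr=0.25)
    print("optimal information size:", needed)
    print("available sample (1,002) is sufficient?", 1002 >= needed)
```

Comparing the available 1,002 patients with the computed size makes the case for rating down for imprecision even when the pooled effect looks large.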

53 Publication bias high likelihood could lower quality when to suspect
number of small studies industry sponsored

54

55

56 Funnel Plot Fish oil on mortality

57 What can raise confidence?
What do you do high certainty, no RCTs? common criteria everyone used to do badly almost everyone does well quick action insulin for diabetic ketoacidosis? thyroxine for thyroid deficiency? hydrocortisone for adrenal insufficiency?

58 Dose-response gradient
childhood lymphoblastic leukemia risk for CNS malignancies 15 years after cranial irradiation no radiation: 1% (95% CI 0% to 2.1%) 12 Gy: 1.6% (95% CI 0% to 3.4%) 18 Gy: 3.3% (95% CI 0.9% to 5.6%).

59 Certainty assessment criteria

60 Overall level of evidence
What to do when certainty differs across outcomes? options ignore all but primary previous approach least certainty of any outcome some blended approach least certainty of critical outcomes
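The last option, taking the least certainty among the critical outcomes, is simple to express in code. This is a minimal sketch with hypothetical outcome names and ratings; the function name `overall_certainty` is also an assumption.

```python
# Certainty levels, ordered from lowest to highest.
LEVELS = ["very low", "low", "moderate", "high"]


def overall_certainty(outcomes):
    """Return the lowest certainty rating among the critical outcomes.

    outcomes: dict mapping outcome name to (rating, is_critical).
    Outcomes that are important but not critical are ignored.
    """
    critical = [rating for rating, is_critical in outcomes.values()
                if is_critical]
    if not critical:
        raise ValueError("at least one critical outcome is required")
    return min(critical, key=LEVELS.index)


if __name__ == "__main__":
    # Hypothetical ratings, for illustration only.
    example = {
        "mortality": ("moderate", True),
        "myocardial infarction": ("high", True),
        "flatulence": ("very low", False),
    }
    print(overall_certainty(example))
```

Note that the very low rating for the non-critical outcome does not pull the overall rating down, which is what distinguishes this option from "least certainty of any outcome".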

61 Trading off desirable and undesirable
what do patients/clinicians need to know relative risk reduction? absolute risk difference? Toxic treatment, 50% RRR mortality? OK? 1% to 1/2% OK? 40% to 20%, OK? body of evidence how do we get risk difference?

62 How to get absolute? meta-analysis get pooled relative risk
obtain baseline risk and multiply BR 10%, RRR 50%, RD 5% why not get risk difference directly?

63 RR 0.67 RD 10% RR 0.67 RD 3.3% RR 0.67 RD 1%
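The same relative risk gives very different risk differences depending on baseline risk, which is why the pooled relative risk is applied to a baseline risk rather than pooling risk differences directly. A minimal sketch follows; the baseline risks of 30%, 10% and 3% are assumptions chosen because they reproduce the risk differences shown on the slide, and `risk_difference` is a hypothetical helper.

```python
def risk_difference(baseline_risk, relative_risk):
    """Absolute risk difference implied by a relative risk at a baseline risk."""
    return baseline_risk * (1 - relative_risk)


if __name__ == "__main__":
    # Slide 62 example: baseline risk 10%, RRR 50% (RR 0.5).
    print(f"BR 10%, RR 0.50: RD {risk_difference(0.10, 0.50):.1%}")
    # Slide 63: same RR of 0.67 applied to assumed baseline risks.
    for baseline in (0.30, 0.10, 0.03):
        rd = risk_difference(baseline, 0.67)
        print(f"BR {baseline:.0%}, RR 0.67: RD {rd:.1%}")
```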

64

65 High versus low PEEP in ALI and ARDS
Population | No. of participants (trials) † | Higher PEEP | Lower PEEP | Adjusted Relative Risk (95% CI; P-value) ‡ | Adjusted Absolute Risk Difference (95% CI) | Quality
Patients with ARDS | 1892 (3) | 324/951 (34.1%) | 368/941 (39.1%) | 0.90 (0.81 to 1.00; 0.049) | -3.9% (-7.4% to -0.04%) | High
Patients without ARDS | 404 (3) | 50/184 (27.2%) | 41/220 (18.6%) | 1.37 (0.98 to 1.92; 0.065) | 6.9% (-0.4% to 17.1%) | Moderate (imprecision)

66 Strength of Recommendation
strong recommendation benefits clearly outweigh risks/hassle/cost risk/hassle/cost clearly outweighs benefit what can downgrade strength? low confidence in estimates close balance between up and downsides

67 Risk/Benefit tradeoff
aspirin after myocardial infarction: 25% reduction in relative risk; side effects minimal, cost minimal; benefit obviously much greater than risk/cost
warfarin in low risk atrial fibrillation: warfarin reduces stroke vs ASA by 50%, but if risk only 1% per year, ARR 0.5%; increased bleeds by 1% per year
Reason for the clear recommendation in the first example is that benefits are moderate to large and risks or costs are minimal. Reason that the second is equivocal is that benefits are slightly smaller, risks greater, and costs greater; this means a close call (or at least some might think so).

68 Strength of Recommendations
Aspirin after MI – do it Warfarin rather than ASA in Afib -- probably do it -- probably don’t do it

69

70

71

72 Significance of strong vs weak
variability in patient preference: strong, almost all same choice (> 90%); weak, choice varies appreciably
interaction with patient: strong, just inform patient; weak, ensure choice reflects values
use of decision aid: strong, don’t bother; weak, use the aid
quality of care criterion: strong, consider; weak, don’t consider

73 When evidence is low confidence
choice more preference dependent risk aversion steroids for pulmonary fibrosis low quality evidence in support of benefit high quality evidence of toxicity

74 When confidence is low recommendation to the hopeful patient
I’m likely to deteriorate if something might work, let’s try it damn the torpedoes recommendation to the fearful patient doctor, you mean you know it’s toxic diabetes, skin changes, body habitus, infection, osteoporosis you don’t know for sure it works? are you crazy? weak recommendation mandated

75 Presentation
strong: “we recommend…”; weak: “we suggest…”; never: “we recommend (or suggest) you consider…”

76 Challenge Comparator often not clear
Children with suspected or confirmed tuberculous meningitis should be treated with a four-drug regimen (HRZE) for 2 months, followed by a two-drug regimen (HR) for 10 months Offer and promote postpartum and post-abortion contraception to adolescents through multiple home visits and/or clinic visits 76

77 Strong recommendations, Low certainty: Discordant recs
Experts use often Why? What are the possibilities?

78 Why all the inappropriate strong recommendations?
panels don’t believe their own confidence ratings personal conviction trumps evidence believe weak recommendations ignored influence funders

79 Discordant recommendations: What are the possibilities?
good practice mistaken judgment inappropriate exceptional situation they got it right

80 Good Practice Statements
For patients with congenital adrenal hyperplasia, we recommend monitoring patients for signs of glucocorticoid excess Wealth of indirect linked evidence High confidence in net benefit Benefit clear Minimal harms or costs Poor use of guideline panel time effort summarize

81 Summarizing evidence poor use of time
symptoms and signs appear not infrequently Collect cohort studies of incidence Studies of accuracy of symptoms and signs patients suffer if clinicians fail to recognize Reports of untreated glucocorticoid excess clinical action can ameliorate the problem Evidence supporting therapy describe how evidence is linked

82 Questions panels considering good practice statement should ask
Is the statement clear and actionable? Is the message really necessary? Is the net benefit large and unequivocal? Is the evidence difficult to collect and summarize? If a public health guideline, are there specific issues that should be considered (e.g. equity) Have you made the rationale explicit? Is this better to be formally GRADEd?

83 Clear and actionable For patients with congenital adrenal hyperplasia, we recommend monitoring patients for signs of glucocorticoid excess Monitor how often? Nature of monitoring What to do if signs of excess found

84 Really necessary? For patients with congenital adrenal hyperplasia, we recommend monitoring patients for signs of glucocorticoid excess Really plausible that clinicians won’t monitor? If not, not necessary

85 Provide Rationale relevant symptoms and signs appear not infrequently
patients will suffer if clinicians fail to recognize these signs clinical action can ameliorate the problem.

86
1 LQE in a life-threatening situation: fresh frozen plasma and intracranial bleed
2 LQoE benefit and HQoE suggests harm: head-to-toe CT/MRI screening for cancer
3 LQoE suggests equivalence, HQoE less harm for one alternative: Helicobacter pylori eradication in early stage gastric MALT lymphoma
4 HQoE suggests equivalence, LQoE suggests harm in one alternative: ACEI in hypertension in women planning conception and in pregnancy
5 HQoE suggests benefit in one outcome, LQoE suggests harm in more highly valued outcome: testosterone in males with or at risk of prostate cancer

91 Methods systematic survey of all published ES guidelines between 2005 and 2011 screening and extraction in duplicate for each recommendation: confidence in estimates, strength of recommendation strong recommendations based on LQE taxonomy for paradigmatic recommendations applied

92 If they did not fit one of 5 paradigms
Condition: Example
1 Best practice statements: For patients with Congenital Adrenal Hyperplasia, we recommend monitoring patients for signs of glucocorticoid excess
2 Additional research: We recommend additional investigation using rodents and primates to further define the specific targets of androgen action
3 Mistaken judgment: For overweight and obese children and adolescents, intensive lifestyle modification for the patient and entire family
4 Inappropriate strong recommendation: In patients with primary aldosteronism who are unable or unwilling to undergo laparoscopic adrenalectomy, we recommend medical treatment with mineralocorticoids

93 Guidelines Strength and Confidence
Strong recommendations (n=206), n (%): high/moderate confidence in estimates 85 (41%); very low/low confidence in estimates 121 (59%); total 206 (100%)
Weak recommendations (n=151), n (%): high/moderate confidence in estimates 16 (8%); very low/low confidence in estimates 135 (92%); total 151

94 Condition N - 35 1 LQE in a life-threatening situation 13
LQoE benefit and HQoE suggests harm or a very high cost 7 LQoE suggests equivalence, HQoE less harm for one of the competing alternatives. 5 HQoE suggests equivalence of two alternatives and LQoE suggests harm in one alternative 9 HQoE suggests modest benefits and LQoE suggests possibility of catastrophic harm

95 Inappropriate strong recommendation
Condition 43 Best practice 5 Mistaken judgment Additional research 33 Inappropriate strong recommendation

96 Summary majority ES recommendations strong 121 (59%) discordant
35/121 (29%) of discordant appropriate of 86 inappropriate, 43 (50%) best practice statements 33/86 inappropriate, should have been weak recommendations

97 WHO recommendation mental health guideline 2013
8 of 9 confidence low, strong “On a number of occasions, the GDG decided to give a strong recommendation despite a GRADE assessment of the available evidence on effect as being of “very low quality”.

98 Why? “This occurred only when the following conditions applied: (a) there was certainty about the balance of benefits versus harms and burdens; (b) the expected values and preferences were clearly in favour of the recommendation; and (c) there was certainty about the balance between benefits and resources being consumed.”

99 Value and preference statements
underlying values and preferences always present sometimes crucial important to make explicit

100 Values and preferences
Stroke guideline: patients with TIA clopidogrel over aspirin (Grade 2B). Underlying values and preferences: This recommendation to use clopidogrel over aspirin places a relatively high value on a small absolute risk reduction in stroke rates, and a relatively low value on minimizing drug expenditures.

101 Values and preferences
peripheral vascular disease: aspirin be used instead of clopidogrel (Grade 2A). Underlying values and preferences: This recommendation places a relatively high value on avoiding large expenditures to achieve small reductions in vascular events.

102 Values and preferences
Consider UpToDate style of values and preferences. Weak recommendation, low certainty evidence, for trial of testosterone in men with apparent testosterone deficiency and cardiovascular disease: men who place a high value on minimizing risk of an adverse cardiovascular event and a relatively low value on ameliorating the symptoms of testosterone deficiency are likely to choose against testosterone use.

103 Flavonoids for Hemorrhoids
venotonic agents mechanism unclear, increase venous return popularity 90 venotonics commercialized in France none in Sweden and Norway France 70% of world market possibilities French misguided rest of world missing out

104 Systematic Review 14 trials, 1432 patients key outcome
risk not improving/persistent symptoms 11 studies, 1002 patients, 375 events RR 0.4, 95% CI 0.29 to 0.57 minimal side effects is France right? what is the certainty of evidence?

105 What can lower confidence?
risk of bias lack of detail re concealment questionnaires not validated indirectness – no problem inconsistency, need to look at the results

106

107 Publication bias? size of studies 40 to 234 patients, most around 100
all industry sponsored

108

109 What can lower confidence?
risk of bias lack of detail re concealment questionnaires not validated inconsistency almost all show positive effect, trend heterogeneity p < 0.001; I2 65.1% indirectness imprecision RR 0.4, 95% CI 0.29 to 0.57 publication bias 40 to 234 patients, most around 100

110 Is France right? recommendation yes no against use strength strong
weak

111 Beta blockers in non-cardiac surgery
Columns: Outcome; Number of participants (studies); Quality assessment (Risk of Bias, Consistency, Directness, Precision, Publication Bias); Quality; Summary of findings (Relative Effect (95% CI), Absolute risk difference)
Myocardial infarction: 10,125 (9); No serious limitations; No serious limitations; publication bias not detected; quality High; 0.71 (0.57 to 0.86); 1.5% fewer (0.7% fewer to 2.1% fewer)
Mortality: 10,205 (7); No serious limitations; Imprecise; quality Moderate; 1.23 (0.98 to 1.55); 0.5% more (0.1% fewer to 1.3% more)
Stroke: 10,889 (5); No serious limitations; 2.21 (1.37 to 3.55); (0.2% more to 1.3% more)

112 Where to from here? GRADE values and preferences GRADE diagnosis
Aspirin for primary prevention Culprit only vs complete revascularization in STEMI Management of esophageal varices

