Grading evidence and recommendations The GRADE initiative Holger Schünemann, MD, PhD Associate Professor Italian National Cancer Institute Regina Elena, Rome

When to make a recommendation?
– never
– patient values differ
– just lay out benefits and risks
– when evidence strong enough
– when very weak, too uncertain
– clinicians need guidance; intense study demands decision

Two examples from Clinical Evidence Aspirin for acute myocardial infarction “One systematic review in people with acute myocardial infarction has found that aspirin reduces mortality, reinfarction, and stroke at 1 month compared with placebo.”

Two examples from Clinical Evidence Systemic thrombolysis for acute stroke “One systematic review in people with confirmed ischaemic stroke found that thrombolysis reduced the risk of the composite outcome of death or dependency... However, it increased the risk of death from intracranial haemorrhage... The excess in deaths was offset by fewer people being alive but dependent 6 months after stroke onset, and the net effect was a reduction in people who were dead or dependent.... Results of the reviews may not extrapolate to people with the mildest or most severe strokes.”

Two examples from Clinical Evidence Systemic thrombolysis for acute stroke “We found little evidence about which people are most and least likely to benefit from thrombolysis. A subgroup analysis suggested that thrombolysis may be more beneficial if given within 3 hours of symptom onset, but the duration of the “therapeutic time window” could not be determined reliably”

What do users want from recommendations?
Users are looking for different things
– Just tell me what to do: recommendation
– What to do, and on strong or weak grounds: recommendation and grade
– Recommend, grade, evidence summary, values: systematic review, value statement
– Evidence from individual studies

How can we judge the extent of our confidence that adherence to a recommendation will do more good than harm?

Why Grade Recommendations?
strong recommendations
– high quality methods
– large precise effect
– few down sides of therapy
weak recommendations
– low quality methods
– imprecise estimate
– small effect
– substantial down sides

Why bother about grading?
People draw conclusions about the
– quality of evidence
– strength of recommendations
Systematic and explicit approaches can help
– protect against errors
– resolve disagreements
– facilitate critical appraisal
– communicate information

Why bother about grading?
Alternate practitioner behavior
Strong: apply uniformly
– just do it
Weak: think about it
– examine evidence yourself
– consider patient circumstances
– explore with the patient
However, there is wide variation in currently used approaches

Which grading system?
Evidence   Recommendation         Organization
II-2       B                      USPSTF
C+         1                      ACCP
Strong     Strongly recommended   GCPS

Still not confused?
Recommendation for use of oral anticoagulation in patients with atrial fibrillation and rheumatic mitral valve disease
Evidence   Recommendation   Organization
B          Class I          AHA
C+         1                ACCP
IV         C                SIGN

Grading System Current profusion: can there be consensus?

GRADE: Grades of Recommendation Assessment, Development and Evaluation

What is GRADE*?
– Began as informal working group in 2000
– Researchers/guideline developers with interest in methodology
– Aim: to develop a common system for grading the quality of evidence and the strength of recommendations that is sensible and to explore the range of interventions and contexts for which it might be useful*
– 13 meetings (~10 – 35 attendees)
– Evaluation of existing systems and reliability*
– Workshops at Cochrane Colloquia, WHO, GIN and various conferences since 2000
*GRADE Working Group. CMAJ 2003, BMJ 2004, BMC 2004, BMC 2005

BMJ 2004; 328:1490

GRADE Working Group
Members: David Atkins, chief medical officer (a); Dana Best, assistant professor (b); Martin Eccles, professor (d); Francoise Cluzeau, lecturer (x); Yngve Falck-Ytter, associate director (e); Signe Flottorp, researcher (f); Gordon H Guyatt, professor (g); Robin T Harbour, quality and information director (h); Margaret C Haugh, methodologist (i); David Henry, professor (j); Suzanne Hill, senior lecturer (j); Roman Jaeschke, clinical professor (k); Regina Kunz, associate professor; Gillian Leng, guidelines programme director (l); Alessandro Liberati, professor (m); Nicola Magrini, director (n); James Mason, professor (d); Philippa Middleton, honorary research fellow (o); Jacek Mrukowicz, executive director (p); Dianne O’Connell, senior epidemiologist (q); Andrew D Oxman, director (f); Bob Phillips, associate fellow (r); Holger J Schünemann, associate professor (g, s); Tessa Tan-Torres Edejer, medical officer (t); David Tovey, editor (y); Jane Thomas, lecturer, UK; Helena Varonen, associate editor (u); Gunn E Vist, researcher (f); John W Williams Jr, professor (v); Stephanie Zaza, project director (w)
Affiliations: a) Agency for Healthcare Research and Quality, USA; b) Children's National Medical Center, USA; c) Centers for Disease Control and Prevention, USA; d) University of Newcastle upon Tyne, UK; e) German Cochrane Centre, Germany; f) Norwegian Centre for Health Services, Norway; g) McMaster University, Canada; h) Scottish Intercollegiate Guidelines Network, UK; i) Fédération Nationale des Centres de Lutte Contre le Cancer, France; j) University of Newcastle, Australia; k) McMaster University, Canada; l) National Institute for Clinical Excellence, UK; m) Università di Modena e Reggio Emilia, Italy; n) Centro per la Valutazione della Efficacia della Assistenza Sanitaria, Italy; o) Australasian Cochrane Centre, Australia; p) Polish Institute for Evidence Based Medicine, Poland; q) The Cancer Council, Australia; r) Centre for Evidence-based Medicine, UK; s) National Cancer Institute, Italy; t) World Health Organisation, Switzerland; u) Finnish Medical Society Duodecim, Finland; v) Duke University Medical Center, USA; w) Centers for Disease Control and Prevention, USA; x) University of London, UK; y) BMJ Clinical Evidence, UK

Grading System
trade-off benefits and risks
– do it (or don’t do it)
– probably do it (or probably don’t do it)
quality of underlying evidence
– high quality (well done RCT)
– moderate (quasi-RCT)
– low (well done observational)
– very low (anything else)

Moving quality down
poor (RCT) design, implementation
– randomization, blinding, concealment, follow-up, intention to treat principle
inconsistency
indirect
– patients, interventions, outcomes
– A vs B, but have A to C, B to C
reporting bias

Reporting bias
high likelihood of reporting bias could lower quality
reporting of outcomes
reporting of studies
– publication bias
number of small studies
industry sponsored
funnel plots may help

Moving quality up
magnitude of effect
– Strong association: RR > 2 or RR < 0.5
– Very strong association: RR > 5 or RR < 0.2
dose-response
biases favor control
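
The starting levels and the up/down moves on these slides can be sketched in a few lines of Python. This is only an illustration, not the GRADE Working Group's tooling: the names `Quality`, `upgrade_for_effect_size` and `rate_quality` are hypothetical, the RR thresholds are the ones on the slide, and two points are assumptions rather than statements from the slides, namely that a very strong association moves quality up by two levels and that the result is clamped between very low and high.

```python
from enum import IntEnum


class Quality(IntEnum):
    """Four quality levels, ordered from lowest to highest."""
    VERY_LOW = 1
    LOW = 2
    MODERATE = 3
    HIGH = 4


def upgrade_for_effect_size(rr: float) -> int:
    """Return how many levels the magnitude of effect moves quality up.

    Thresholds follow the slide: RR > 2 or RR < 0.5 is a strong association,
    RR > 5 or RR < 0.2 a very strong one. Mapping "very strong" to two
    levels is an assumption made for this sketch.
    """
    if rr <= 0:
        raise ValueError("relative risk must be positive")
    if rr > 5 or rr < 0.2:
        return 2
    if rr > 2 or rr < 0.5:
        return 1
    return 0


def rate_quality(start: Quality, downgrades: int = 0, upgrades: int = 0) -> Quality:
    """Apply downgrades and upgrades to a starting level, clamped to the scale."""
    level = int(start) - downgrades + upgrades
    level = max(int(Quality.VERY_LOW), min(int(Quality.HIGH), level))
    return Quality(level)


if __name__ == "__main__":
    # Well-done observational evidence starts at LOW; a strong association moves it up.
    print(rate_quality(Quality.LOW, upgrades=upgrade_for_effect_size(0.46)).name)
```

Which limitations count as serious enough for a downgrade remains a judgment; the sketch only does the bookkeeping once those judgments have been made.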

Judgements about the overall quality of evidence
Most systems not explicit
Options:
– strongest outcome
– primary outcome
– benefits
– weighted
– separate grades for benefits and harms
– no overall grade
– weakest outcome
Based on lowest of all the critical outcomes
Beyond the scope of a systematic review

Quality assessment criteria

Categories of quality
High: Further research is very unlikely to change our confidence in the estimate of effect.
Moderate: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
Low: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
Very low: Any estimate of effect is very uncertain.

Strength of recommendation
“The extent to which one can be confident that adherence to a recommendation will do more good than harm.”
– quality of the evidence
– translation of the evidence into practice in a specific setting
– uncertainty about baseline risk
– trade-offs (the relative value attached to the expected benefits, harms and costs)

Judgements about the balance between benefits and harms Before considering cost and making a recommendation

Judgment: Benefits vs Risks/Costs
– quality of evidence
– seriousness of outcome
– magnitude of effect
– precision of treatment effect
– risk of target event
– risk of adverse events
– cost of therapy
– values

Value and preference statements
underlying values and preferences always present
sometimes crucial
important to make explicit

Values and preferences Stroke guideline: patients with TIA clopidogrel over aspirin (weak, moderate quality) Underlying values and preferences: This recommendation to use clopidogrel over aspirin places a relatively high value on a small absolute risk reduction in stroke rates, and a relatively low value on minimizing drug expenditures.

Values and preferences Peripheral vascular disease: aspirin be used instead of clopidogrel (weak, high quality). Underlying values and preferences: This recommendation places a relatively high value on avoiding large expenditures to achieve small reductions in vascular events.

Judgements about recommendations

Observational studies – high or moderate quality?
– Strong association
– Dose response relationship
– Plausible confounders would have reduced the effect

Strong recommendation when evidence is weak? Balance of benefits and downsides clearly on one side Not frequent if quality is low or very low

Comparison of GRADE and other systems
– Explicit definitions
– Explicit, sequential judgements
– Components of quality
– Overall quality
– Relative importance of outcomes
– Balance between health benefits and harms
– Balance between incremental health benefits and costs
– Consideration of equity
– Evidence profiles
– International collaboration
– Software
– Consistent judgements?
– Communication?

Who is interested in GRADE
– WHO
– American Endocrine Society
– American College of Chest Physicians (ACCP)
– Norwegian Centre for Health Services
– UpToDate
– Close relationship with Cochrane Collaboration
– American Society of Clinical Oncology (ASCO)
– American Thoracic Society (ATS)
– Urologists worldwide (EBUro)
– NICE

Conclusion
Challenges in grading
– judgment always required
Must consider study design, execution, consistency, directness, reporting bias
– magnitude, precision, only when extreme
Balance of benefits and risks/cost
– magnitude of effects; precision of effects; values and preferences
Separation between recommendation and quality of the evidence
GRADE working group active in obtaining feedback and dissemination

Further GRADE developments
– Diagnostic tests
– Costs
– (Equity)
– Empirical evaluations
– Adoption of some elements by Cochrane software application

The clinical question
Population: In patients with chronic atrial fibrillation and no prior history of stroke
Intervention: does oral anticoagulation
Comparison: compared with no therapy
Outcome: reduce the risk for embolic stroke, hemorrhage and death?

The evidence
– Systematic Review*
– 5 RCTs
– 2,313 patients randomised
– Warfarin in all studies
– 1.5 years mean follow-up
– Outcomes: ischemic stroke, hemorrhage (major, including intracranial), death (vascular and all cause) and dependency
*Systematic Review: Aguilar & Hart. Cochrane Database of Systematic Reviews 2005, Issue 3.

Population: In patients with chronic atrial fibrillation and no prior history of stroke
Intervention: does oral anticoagulation
Comparison: compared with no therapy
Outcome: reduce the risk for embolic stroke, hemorrhage and death?
Different risk groups: Low, moderate, high
Other outcomes: Inconvenience, quality of life

Outcomes/endpoints
– All disabling or fatal stroke (ischemic and hemorrhagic)
– Major hemorrhage (non-intracranial)
– All cause mortality
– Minor bleeding (hematoma, prolonged bleeding of minor wounds)
*Systematic Review: Aguilar & Hart. Cochrane Database of Systematic Reviews 2005, Issue 3.

Outcomes/endpoints: How important is the endpoint for decision making?
Judgment about the relative importance for each endpoint on a scale from 9 (most important) to 1 (least important):
7 – 9: the endpoint is critical for decision making.
4 – 6: the endpoint is important but not critical.
1 – 3: the endpoint is not important.
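
The 9-point importance scale maps directly onto three bands, which a small lookup function can make concrete. This is a minimal sketch; the function name `classify_importance` and the band labels as Python strings are illustrative choices, while the cut-offs (7 – 9, 4 – 6, 1 – 3) are taken from the slide.

```python
def classify_importance(rating: int) -> str:
    """Map a 1-9 importance rating to the band described on the slide."""
    if not 1 <= rating <= 9:
        raise ValueError("rating must be between 1 and 9")
    if rating >= 7:
        return "critical"
    if rating >= 4:
        return "important but not critical"
    return "not important"


if __name__ == "__main__":
    # Example ratings spanning all three bands.
    for rating in (9, 7, 5, 2):
        print(rating, classify_importance(rating))
```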

Outcomes/endpoints (importance rating)
– All disabling or fatal stroke (ischemic and hemorrhagic): 9
– Major hemorrhage (non-intracranial): 7
– All cause mortality: 9
– Minor bleeding (hematoma, prolonged bleeding of minor wounds): 5
*Systematic Review: Aguilar & Hart. Cochrane Database of Systematic Reviews 2005, Issue 3.

Quality assessment criteria

Disabling or fatal stroke Study design:   5 RCTs Quality of evidence for this endpoint:   High

Disabling or fatal stroke
Detailed design and execution: concealment, follow-up
In two studies (CAFA; SPINAF) both patients and outcome assessors were blinded; in the other studies only outcome assessors.
Quality of evidence for this endpoint now: High (or −1 → Moderate)

Disabling or fatal stroke Consistency: No inconsistency Quality of evidence for this endpoint now:   High

Directness of evidence
indirect treatment comparisons
– interested in A versus B
– have A versus C and B versus C
alendronate vs risedronate
– both versus placebo, no head-to-head

Directness - patients
patients meet trials’ eligibility criteria
not included, but no reason to question
– slight age difference, comorbidity, race
some question, bottom line applicable
– valvular atrial fibrillation
serious question about biology
– heart failure trials applicability to aortic stenosis

Directness - interventions
same drugs and doses
– captopril 100 mg tid in heart failure
similar drugs and doses
– captopril in lower doses
same class and biology
– other ACEI in heart failure
questionable class and biology
– ARB in heart failure
– sigmoidoscopy for colon ca prevention

Directness - outcomes
same outcomes
– alendronate over 3 years on fracture
similar but questionable
– alendronate over long-term
serious question
– surrogate outcomes
– bone density; arrhythmia suppression; laboratory exercise capacity; cardiac function

Disabling or fatal stroke Directness of the evidence: Population, Intervention, Outcomes Direct Quality of evidence for this endpoint now:   High

Disabling or fatal stroke Imprecise or sparse data:   Would few additional events or larger studies likely alter the results?

Disabling or fatal stroke Imprecise or sparse data:   No imprecise or sparse data Quality of evidence for this endpoint now:   High

Disabling or fatal stroke   Reporting bias: Not present Quality of evidence for this endpoint now:   High

Disabling or fatal stroke   Strong association? present (RR = 0.46) Quality of evidence for this endpoint now:   High [or +1  High (from moderate)] strong, no plausible confounder, consistent and direkt evidence

Major extracranial hemorrhage
Study design: 4 RCTs → Quality: High
Study details and execution: No serious limitations
No inconsistency and direct
Imprecise or sparse data?

Imprecise or sparse data
There is no empirical basis for defining imprecise or sparse data. Two possible definitions are:
– Data are sparse if the results include just a few events or observations and they are uninformative.
– Data are imprecise if the confidence intervals are sufficiently wide that an estimate is consistent with either important harms or important benefits.
These different definitions can result in different judgments.
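
The second definition (a confidence interval wide enough to be consistent with both important benefit and important harm) can be written as a simple check on the relative-risk scale. The sketch below is an illustration only: the function name `is_imprecise` and the default thresholds of 0.75 for important benefit and 1.25 for important harm are hypothetical values chosen for the example, not values from the presentation, and it assumes that a lower RR means fewer bad events.

```python
def is_imprecise(ci_low: float, ci_high: float,
                 benefit_rr: float = 0.75, harm_rr: float = 1.25) -> bool:
    """Return True if the CI for a relative risk spans both important
    benefit (RR <= benefit_rr) and important harm (RR >= harm_rr).

    The default thresholds are hypothetical; what counts as "important"
    is a judgment that depends on the outcome and setting.
    """
    if not 0 < ci_low <= ci_high:
        raise ValueError("expected 0 < ci_low <= ci_high")
    return ci_low <= benefit_rr and ci_high >= harm_rr


if __name__ == "__main__":
    print(is_imprecise(0.5, 2.0))   # wide interval covering both thresholds
    print(is_imprecise(0.3, 0.7))   # interval lies entirely on the benefit side
```

Changing the two thresholds changes the verdict for the same interval, which is one way the slide's point about different definitions leading to different judgments shows up in practice.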

Major extracranial hemorrhage
Study design: 4 RCTs → Quality: High
Study details and execution: no serious limitations
No inconsistency and direct
Imprecise or sparse data? Imprecise data (wide confidence intervals/few events)

Quality across all endpoints
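
The rule stated earlier in the deck, that overall quality is based on the lowest quality among the critical outcomes, can be sketched as follows. The function name `overall_quality`, the data layout, and the outcome labels and grades in the example are all invented for illustration and are not the grades from this presentation; the only things taken from the slides are the "lowest of the critical outcomes" rule and the cut-off of 7 for a critical outcome.

```python
# Quality levels ordered from lowest to highest.
QUALITY_ORDER = ["very low", "low", "moderate", "high"]


def overall_quality(outcomes: dict[str, tuple[int, str]]) -> str:
    """Return the lowest quality grade among critical outcomes.

    `outcomes` maps an outcome name to (importance rating 1-9, quality grade).
    Outcomes rated below 7 are not critical and are ignored.
    """
    critical = [grade for importance, grade in outcomes.values() if importance >= 7]
    if not critical:
        raise ValueError("no critical outcomes to base the overall grade on")
    return min(critical, key=QUALITY_ORDER.index)


if __name__ == "__main__":
    # Invented example: the non-critical outcome C does not pull the grade down.
    example = {
        "outcome A": (9, "high"),
        "outcome B": (7, "moderate"),
        "outcome C": (5, "very low"),
    }
    print(overall_quality(example))
```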

Risk groups
Risk for cardio-embolic stroke:
High (prior TIA or stroke*, > 75 yrs, reduced LVEF/CHF, HTN or DM): 10%/year
– Benefits greater than downsides: do it, high
Moderate risk (65 to 75 years) or one risk factor: 3 to 4%/year
– Benefits greater than downsides: do it, high
Low risk (< 65 years): 0.5%/year
– Benefits smaller than downsides: values: probably do not do it, high
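
Why the same relative effect leads to different recommendations across risk groups becomes clearer when the baseline risks are turned into absolute risk reductions. The sketch below is a rough illustration under stated assumptions: it applies the RR of 0.46 reported earlier for disabling or fatal stroke to the annual cardio-embolic stroke risks on this slide (a simplification, since the outcome definitions differ), uses 3.5% as a stand-in for the "3 to 4%" range, and ignores bleeding harms entirely. The function names are hypothetical.

```python
import math


def absolute_risk_reduction(baseline_risk: float, relative_risk: float) -> float:
    """ARR = baseline risk x (1 - RR)."""
    return baseline_risk * (1.0 - relative_risk)


def number_needed_to_treat(baseline_risk: float, relative_risk: float) -> int:
    """NNT = 1 / ARR, rounded up to a whole patient."""
    arr = absolute_risk_reduction(baseline_risk, relative_risk)
    if arr <= 0:
        raise ValueError("no risk reduction, NNT is undefined")
    return math.ceil(1.0 / arr)


RR_STROKE = 0.46  # from the disabling-or-fatal-stroke slide

# Annual baseline risks from the slide; 0.035 is an assumed midpoint of "3 to 4%".
BASELINE_RISK = {"high": 0.10, "moderate": 0.035, "low": 0.005}

if __name__ == "__main__":
    for group, risk in BASELINE_RISK.items():
        arr = absolute_risk_reduction(risk, RR_STROKE)
        nnt = number_needed_to_treat(risk, RR_STROKE)
        print(f"{group}: ARR = {arr:.4f} per year, NNT = {nnt} per year")
```

With a fixed relative effect, the absolute benefit shrinks in proportion to baseline risk while the downsides of treatment do not necessarily shrink with it, which is consistent with the slide's judgment that benefits are smaller than downsides in the low-risk group.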

Conclusion
challenges in grading
– judgment always required
must consider study design, execution, consistency, directness, reporting bias
– magnitude, precision, only when extreme
balance of benefits and risks/cost
– magnitude of effects; precision of effects; values and preferences

GRADE Evidence Profile

Clinical Evidence questions
– concerns about what happens when there are 4 high quality RCTs for a given comparison and outcome and one poor one: can the latter drag the GRADE rating down?
– variability between observer rating of the various parameters, and how this might be dealt with
– concerns that our harms data, which uses both RCT and observational studies, might mean that every recommendation (categorisation in CE speak) is downgraded
– how surgical treatments are covered
– extra work involved over and above what we presently do
– whether there are potential savings in respect of people time, e.g. if evidence falls into the very low quality rating, can we reasonably not extract the data or report the findings?
– Technology: extensibility and flexibility of the software; can it be configured to suit our needs / processes?
– IPR of the software; whether open access to the software source code is allowed

GRADEpro©

GRADE Profiler

GRADEpro©
– Visual Studio .NET
– Windows based (Mac version coming)
– Easy installation
– Help file
– Will be integrated with RevMan (trial)
– Free availability
– Beta version

Development of GRADE profiles

8. In two studies (CAFA; SPINAF) patients and outcome assessors were blind to OAC administration, while in the remaining trials treatment was given open label with outcomes verified by those unaware of treatment assignment.

Process of developing recommendations
Prioritise problems, establish panel → Systematic Review → Evidence Profile → Relative importance of outcomes → Overall quality of evidence → Benefit – downside evaluation → Strength of recommendation → Implementation and evaluation of guidelines

GRADEpro
Reproducible
Transparent
– Footnotes
– Judgments
GRADE profiles
– Summary
Integration with RevMan
Real time

Judgements about recommendations
“We recommend” … “should” … “Do it”
“We suggest” … “may” … “Probably do it”
“We suggest not” … “may not” … “Probably don’t do it”
“We recommend not” … “should not” … “Don’t do it”
No recommendation
This could include considerations of costs, i.e. “Is the net gain (benefits − downsides) worth the costs?”
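
The four wordings on this slide form a small lookup table, sketched here in Python. The phrases are the ones on the slide; labelling the "recommend" wordings as strong and the "suggest" wordings as weak, and the keys `"for"` and `"against"`, are assumptions made for the illustration, as are the names `WORDING` and `phrase_for`.

```python
# (strength, direction) -> (panel wording, modal verb, short form), phrases from the slide.
# The strength/direction keys are labels assumed for this sketch.
WORDING = {
    ("strong", "for"): ("We recommend", "should", "Do it"),
    ("weak", "for"): ("We suggest", "may", "Probably do it"),
    ("weak", "against"): ("We suggest not", "may not", "Probably don't do it"),
    ("strong", "against"): ("We recommend not", "should not", "Don't do it"),
}


def phrase_for(strength: str, direction: str) -> tuple[str, str, str]:
    """Look up the wording for a recommendation; raises KeyError if unknown."""
    return WORDING[(strength, direction)]


if __name__ == "__main__":
    print(phrase_for("weak", "against"))
```

The "No recommendation" option on the slide is deliberately left out of the table; a caller would need to handle it separately.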
