Grading evidence and recommendations Holger Schünemann Gunn Vist Gordon Guyatt for the GRADE Working Group.

Slides:



Advertisements
Similar presentations
Introduction to the User’s Guide for Developing a Protocol for Observational Comparative Effectiveness Research Prepared for: Agency for Healthcare Research.
Advertisements

Overall and subgroup analysis If the OVERALL results show highly significant evidence of a worthwhile effect of treatment, but a few subgroups of the overview.
Grading evidence and recommendations 1 February 2005.
Holger Schünemann, MD, PhD From Evidence to EMS Practice: Building the National Model Washington, September 4,
Decision Analysis Prof. Carl Thompson
The Science of Guidelines The 7th ACCP Conference on Antithrombotic and Thrombolytic Therapy: Evidence-Based Guidelines Holger Schünemann, MD, PhD Italian.
Critically Evaluating the Evidence: Tools for Appraisal Elizabeth A. Crabtree, MPH, PhD (c) Director of Evidence-Based Practice, Quality Management Assistant.
Summarising findings about the likely impacts of options Judgements about the quality of evidence Preparing summary of findings tables Plain language summaries.
Grading of Recommendations Assessment, Development and Evaluation (GRADE) Methodology.
Michael Rawlins Chairman, National Institute for Health and Clinical Excellence, London Emeritus Professor, University of Newcastle upon Tyne Honorary.
Journal Club Alcohol, Other Drugs, and Health: Current Evidence January–February 2011.
By Dr. Ahmed Mostafa Assist. Prof. of anesthesia & I.C.U. Evidence-based medicine.
Critical Appraisal of Clinical Practice Guidelines
Illustrating the GRADE Methodology: The Cather Associated-UTI Case Study TEACH Level II Workshop 5 NYAM August 9 th, 2013 Craig A Umscheid, MD, MSCE, FACP.
Grading of Recommendations Assessment, Development, and Evaluation (GRADE) Working Group
The Long Term Multi-Center Extension of Dabigatran Treatment in Patients with Atrial Fibrillation (RELY-ABLE) study To reviewers and moderators: These.
Holger Sch ü nemann Yngve Falck-Ytter NICE, London December 11, 2006 GRADE – An introduction and workshop.
Clinical Trials. What is a clinical trial? Clinical trials are research studies involving people Used to find better ways to prevent, detect, and treat.
Society of General International Medicine 32 nd Annual Meeting, May 14 th 2009 Elie A. Akl, MD, MPH, PhD David Atkins, MD, MPH Eric Bass, MD, MPH Yngve.
Holger Schünemann, MD, PhD Professor Utrecht, NL September ,
AASLD Practice Guidelines Committee Meeting, Chicago 1 May 2009 Yngve Falck-Ytter, M.D. Case Western Reserve University.
Grading evidence and recommendations The GRADE initiative Holger Schünemann, MD, PhD Associate Professor Italian National Cancer Institute Regina Elena,
AGA Practice Guidelines Committee Meeting, Chicago May 31, 2009 Yngve Falck-Ytter, M.D. Assistant Professor of Medicine Case Western Reserve University.
Grading evidence and recommendations The GRADE approach Holger Schünemann, MD, PhD for the GRADE Working Group.
Holger Schünemann, MD, PhD Chair, Department of Clinical Epidemiology & Biostatistics Professor of Clinical Epidemiology, Biostatistics and Medicine McMaster.
Grading evidence and recommendations The GRADE approach
Brief summary of the GRADE framework Holger Schünemann, MD, PhD Chair and Professor, Department of Clinical Epidemiology & Biostatistics Professor of Medicine.
Harmonizing levels of evidence: The Grading of Recommendations Assessment, Development and Evaluation (GRADE) working group Holger Schünemann, Andy Oxman,
Grading evidence and recommendations The GRADE initiative Holger Schünemann, MD, PhD Associate Professor Italian National Cancer Institute Regina Elena,
Grading evidence and recommendations The GRADE approach Holger Schünemann, MD, PhD for the GRADE Working Group.
Placebo-Controls in Short-Term Clinical Trials of Hypertension Sana Al-Khatib, MD, MHS Assistant Professor of Medicine Division of Cardiology Duke University.
AGA Clinical Practice & Quality Management Committee Teleconference 17 Oct 2008 Yngve Falck-Ytter, M.D. Assistant Professor of Medicine Case Western Reserve.
Plan GRADE backgroundGRADE background confidence in estimates (quality of evidence)confidence in estimates (quality of evidence) evidence profilesevidence.
Critical Appraisal Did the study address a clearly focused question? Did the study address a clearly focused question? Was the assignment of patients.
Clinical Writing for Interventional Cardiologists.
Vanderbilt Sports Medicine Chapter 5: Therapy, Part 2 Thomas F. Byars Evidence-Based Medicine How to Practice and Teach EBM.
Why Grade the Evidence? target audience for Cochrane reviewstarget audience for Cochrane reviews –clinicians interested in the question –policy makers,
Two questions in grading recommendations Are you sure?Are you sure? –Yes: Grade 1 –No: Grade 2 What is the methodological quality of the underlying evidenceWhat.
Grading evidence and recommendations Holger Schünemann Andy Oxman for the GRADE Working Group.
Stakeholder Summit on Using Quality Systematic Reviews to Inform Evidence-based Guidelines US Cochrane Center June 4 and 5, 2009 Yngve Falck-Ytter, M.D.
Why Grade Recommendations? strong recommendationsstrong recommendations –strong methods –large precise effect –few down sides of therapy weak recommendationsweak.
WHO GUIDANCE FOR THE DEVELOPMENT OF EVIDENCE-BASED VACCINE RELATED RECOMMENDATIONS August 2011.
Anne Matthews, Health & Society, School of Nursing and Human Sciences, DCU The paradox of ‘low quality evidence; strong recommendation’: An analysis of.
This material was developed by Oregon Health & Science University, funded by the Department of Health and Human Services, Office of the National Coordinator.
Why Grade Recommendations? strong recommendationsstrong recommendations –strong methods –large precise effect –few down sides of therapy weak recommendationsweak.
Developing evidence-based guidelines at WHO. Evidence-based guidelines at WHO | January 17, |2 |
EVALUATING u After retrieving the literature, you have to evaluate or critically appraise the evidence for its validity and applicability to your patient.
Grading evidence and recommendations Workshop W-069 Congress Hall ABEF Oct
Holger Schünemann Yngve Falck-Ytter AHRQ Annual Conference Washington, September 8,
How Empty Are Empty Reviews? The first report on the Empty Reviews Project sponsored by the Cochrane Opportunities Fund and an invitation to participate.
GDG Meeting Wednesday November 9, :30 – 11:30 am.
Why Grade Recommendations? strong recommendationsstrong recommendations –strong methods –large precise effect –few down sides of therapy weak recommendationsweak.
EBM --- Journal Reading Presenter :林禹君 Date : 2005/10/26.
Considerations in grading a recommendation methodological quality of evidencemethodological quality of evidence likelihood of biaslikelihood of bias trade-off.
GRADE Grading of Recommendations Assessment, Development and Evaluation British Association of Dermatologists April 2014.
Providing evidence for healthcare decision Silvia Pregno Rome, October 15, 2012.
Clinical Practice Guidelines: Can we fix Babel? Eddy Lang Department Chair, Emergency Alberta Health Services Associate Professor University of Calgary.
Evidence-Based Mental Health PSYC 377. Structure of the Presentation 1. Describe EBP issues 2. Categorize EBP issues 3. Assess the quality of ‘evidence’
From evidence to Policy: Paediatric guideline development in Kenya Mercy Mulaku.
Systematic review of the potential adverse effects of caffeine consumption in healthy adults, pregnant women, adolescents, and children: Cardiovascular.
for Overall Prognosis Workshop Cochrane Colloquium, Seoul
Why this talk? you will be seeing a lot of GRADE
Conflicts of interest Major role in development of GRADE
Overview of the GRADE approach – selected slides
Chapter 7 The Hierarchy of Evidence
WHO Guideline development
Plan GRADE background two steps evidence profiles
GRADE – An introduction and workshop
Component 1: Introduction to Health Care and Public Health in the U.S.
Presenter Disclosure Information
Presentation transcript:

Grading evidence and recommendations Holger Schünemann Gunn Vist Gordon Guyatt for the GRADE Working Group

  Introduction to GRADE   Example of applying GRADE   Demonstration of the GRADE profiler sofware Today’s talk

Intro Exercise 0

Where would you prefer to live?

← Option 1 Option 2 →

← Option 1 (pink card) Option 2 → (green card)

Introduction to GRADE 1

When to make a recommendation?   never   patient values differ   guidelines should just lay out benefits and risks   when evidence strong enough   if very weak, too uncertain   clinicians need guidance intense study demands decision

What type of recommendations? strong recommendations – –high quality methods – –large precise effect – –few down sides of therapy weak recommendations – –low quality methods – –imprecise estimate – –small overall effect – –substantial down sides – –people react differently to same outcomes/circumstances

Should we grade recommendations? People draw conclusions about the – –quality of evidence – –strength of recommendations Systematic and explicit approaches can help – –protect against errors – –resolve disagreements – –facilitate critical appraisal – –communicate information

Why grade recommendations? Change practitioner behavior Strong: apply uniformly – –just do it Weak: think about it – –examine evidence yourself – –consider patient circumstances very carefully – –explore with the patient However, there is wide variation in currently used approaches

Which grading system? Evidence Recommendation II-2B C+ 1 StrongStrongly recommended Organization   USPSTF   ACCP   GCPS

Still not confused? EvidenceRecommendation BClass I C+ 1 IVC Organization   AHA   ACCP   SIGN Recommendation for use of oral anticoagulation in patients with atrial fibrillation and rheumatic mitral valve disease

Grading System Current profusion: can there be consensus?

GRADE G rades of R ecommendation A ssessment, D evelopment and E valuation

What do you know about GRADE? Have prepared a guideline Have prepared a guideline Read the BMJ paper Read the BMJ paper Have prepared a systematic review and a summary of findings table Have prepared a systematic review and a summary of findings table Have attended a GRADE meeting, workshop or talk Have attended a GRADE meeting, workshop or talk

About GRADE o Began as informal working group in 2000 o Researchers/guideline developers with interest in methodology o Aim: to develop a common system for grading the quality of evidence and the strength of recommendations that is sensible and to explore the range of interventions and contexts for which it might be useful* o 13 meetings (~10 – 35 attendants) o Evaluation of existing systems and reliability* o Workshops at various places including Cochrane Colloquia, WHO and GIN since 2000 *Grade Working Group. CMAJ 2003, BMJ 2004, BMC 2004, BMC 2005

GRADE Working Group David Atkins, chief medical officer a Dana Best, assistant professor b Peter A Briss, chief c Martin Eccles, professor d Yngve Falck-Ytter, associate director e Signe Flottorp, researcher f Gordon H Guyatt, professor g Robin T Harbour, quality and information director h Margaret C Haugh, methodologist i David Henry, professor j Suzanne Hill, senior lecturer j Roman Jaeschke, clinical professor k Gillian Leng, guidelines programme director l Alessandro Liberati, professor m Nicola Magrini, director n James Mason, professor d Philippa Middleton, honorary research fellow o Jacek Mrukowicz, executive director p Dianne O ’ Connell, senior epidemiologist q Andrew D Oxman, director f Bob Phillips, associate fellow r Holger J Sch ü nemann, associate professor g,s Tessa Tan-Torres Edejer, medical officer/scientist t Helena Varonen, associate editor u Gunn E Vist, researcher f John W Williams Jr, associate professor v Stephanie Zaza, project director w a) Agency for Healthcare Research and Quality, USA b) Children's National Medical Center, USA c) Centers for Disease Control and Prevention, USA d) University of Newcastle upon Tyne, UK e) German Cochrane Centre, Germany f) Norwegian Centre for Health Services, Norway g) McMaster University, Canada h) Scottish Intercollegiate Guidelines Network, UK i) F é d é ration Nationale des Centres de Lutte Contre le Cancer, France j) University of Newcastle, Australia k) McMaster University, Canada l) National Institute for Clinical Excellence, UK m) Universit à di Modena e Reggio Emilia, Italy n) Centro per la Valutazione della Efficacia della Assistenza Sanitaria, Italy o) Australasian Cochrane Centre, Australia p) Polish Institute for Evidence Based Medicine, Poland q) The Cancer Council, Australia r) Centre for Evidence-based Medicine, UK s) National Cancer Institute, Italy t) World Health Organisation, Switzerland u) Finnish Medical Society Duodecim, Finland v) Duke University Medical Center, USA w) Centers for Disease Control and Prevention, USA

How can we judge the extent of our confidence that adherence to a recommendation will do more good than harm?

Grading System Strength of the recommendation do it (or don’t do it)/recommend probably do it (or probably don’t do it)/suggest Quality of underlying evidence high quality (well done RCT) moderate (quasi-RCT) low (well done observational) very low (anything else)

Moving quality down poor (RCT) design, implementation → →randomization, blinding, concealment, follow-up, intention to treat principle, early stopping for benefit inconsistency Indirect evidence → →patients, interventions, outcomes → →A vs B, but have A to C, B to C sparse or imprecise data reporting bias

Reporting bias high likelihood of reporting bias can lower quality reporting of outcomes reporting of studies publication bias

Moving quality up Observational studies – high or moderate quality? Strong association → →strong association: RR > 2 or RR < 0.5 → →very strong association: RR > 5 or RR < 0.2 Dose response relationship – –bleeding risk associated with increasing INR (blood thinning with warfarin) Plausible confounders would have reduced the effect For example, plausible explanatory factors that were not adjusted for in studies comparing mortality rates of for-profit and not-for-profit hospitals would have reduced the observed effect. Thus, the evidence showing that for-profit hospitals have a higher risk of mortality is more convincing

Quality assessment criteria

Categories of quality High: Further research is very unlikely to change our confidence in the estimate of effect. Moderate: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate. Low: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate. Very low: Any estimate of effect is very uncertain. “The extent to which one can be confident that an estimate of effect or association is correct.”

Strength of recommendation “The extent to which one can be confident that adherence to a recommendation will do more good than harm.” quality of the evidence translation of the evidence into practice in a specific setting uncertainty about baseline risk trade-offs (the relative value attached to the expected benefits, harms and costs)

More practice using the voting instrument….

← Option 1 (pink card) Option 2 → (green card) Remember

You are hiking. Which of the following animals would you prefer to encounter?

← Option 1 (pink card) Option 2 → (green card)

You are buying an ice cream. Which flavor do you prefer?

← Option 1 (pink card) Option 2 → (green card) Chocolate Strawberry

You are buying a new car. Which one would you buy?

← Option 1 (pink card) Option 2 → (green card) Yellow fox Red Ferrari

What determines your choices?

Values and preferences underlying values and preferences always present sometimes crucial important to make explicit

Judgements about the balance between benefits and harms Before considering cost and making a recommendation

Judgment: Benefits vs Risks/Costs quality of evidence seriousness of outcome magnitude of effect precision of treatment effect risk of target event risk of adverse events cost of therapy values

Judgements about recommendations

Strong recommendation when evidence is weak? Balance of benefits and downsides clearly on one side Not frequent if quality is low or very low

Comparison of GRADE and other systems Explicit definitions Explicit, sequential judgements Components of quality Overall quality Relative importance of outcomes Balance between health benefits and harms Balance between incremental health benefits and costs Consideration of equity Evidence profiles International collaboration Software Consistent judgements? Communication?

Who is interested in GRADE American College of Chest Physicians (ACCP) WHO American Endocrine Society UpToDate Clinical Evidence American Society of Clinical Oncology (ASCO) American Thoracic Society (ATS) Urologists worldwide (EBUro) NICE

Conclusion Challenges in grading – –judgment always required Must consider study design, execution, consistency, directness, reporting bias – –magnitude, precision Balance of benefits and risks/cost – –magnitude of effects; precision of effects; values and preferences Separation of recommendation from quality of the evidence GRADE working group active in obtaining feedback and dissemination

Making judgments using GRADE 2

The clinical question Population: In patients with chronic atrial fibrillation and no prior history of stroke Intervention: does oral anticoagulation (comparison)compared with no therapy Outcome:reduce the risk for embolic stroke, hemorrhage and death?

The evidence   Systematic Review*   5 RCTs   2,313 Patients randomised   Warfarin in all studies Studien   1.5 years mean follow-up   Outcomes: Ischemic Stroke, hemorrhage (major, including intracranial), death (vascular and all cause) and dependency *Systematic Review: Aguilar & Hart. Cochrane Database of Systematic Reviews 2005, Issue 3.

  All disabling or fatal stroke (isch. and hemorrh.)   Major hemorrhage (non IC)   All cause mortality   Minor bleeding (hematoma, prolonged bleeding of minor wounds) *Systematic Review: Aguilar & Hart. Cochrane Database of Systematic Reviews 2005, Issue 3. Outcomes/endpoints

How important is the endpoint for decision making? Judgment about the relative importance for each endpoint on a scale from 9 (most important) to 1 (least important): 7 – 9: the endpoint is critical for decision making. 4 – 6: the endpoint is important but not critical. 1 – 3: the endpoint is not important. Outcomes/endpoints

  All disabling or fatal stroke (isch. and hemorrh.)   Major hemorrhage (non IC)   All cause mortality   Minor bleeding (hematoma, prolonged bleeding of minor wounds) *Systematic Review: Aguilar & Hart. Cochrane Database of Systematic Reviews 2005, Issue 3. Outcomes/endpoints

Quality assessment criteria

Disabling or fatal stroke Study design:   5 RCTs Quality of evidence for this endpoint:   High

Disabling or fatal stroke Detailed design and execution Concealment Follow-up   In two studies (CAFA; SPINAF) both patients and outcome assessors were blinded; in the other studies only outcomes assessors. Quality of evidence for this endpoint now: High (or -1  Moderate)

Disabling or fatal stroke Consistency:

Disabling or fatal stroke Consistency: No inconsistency Quality of evidence for this endpoint now:   High

Directness of evidence indirect treatment comparisons – –interested in A versus B – –have A versus C and B versus C alendronate vs risedronate (biphosponates) – –both versus placebo, no head-to-head

Directness - patients patients meet trials’ eligibility criteria not included, but no reason to question – –slight age difference, comorbidity, race some question, bottom line applicable – –valvular atrial fibrillation serious question about biology – –heart failure trials applicability to aortic stenosis

Directness - interventions same drugs and doses – –captopril 100 mg. tid in heart failure similar drugs and doses – –captopril in lower doses same class and biology – –other ACEI in heart failure questionable class and biology – –ARB in heart failure

Directness - outcomes same outcomes – –alendronate over 3 years on fracture similar but questionable – –alendronate over long-term serious question – –surrogate outcomes – –bone density; arrhythmia suppression; cholesterol levels (clofibrate)

Disabling or fatal stroke Directness of the evidence: Population, Intervention, Outcomes Direct Quality of evidence for this endpoint now:   High

Disabling or fatal stroke Imprecise or sparse data:   Would few additional events or larger studies likely alter the results?

Disabling or fatal stroke Imprecise or sparse data:   No imprecise or sparse data Quality of evidence for this endpoint now:   High

Disabling or fatal stroke   Reporting bias: Not present Quality of evidence for this endpoint now:   High

Disabling or fatal stroke   Strong association? present (RR = 0.46) Quality of evidence for this endpoint now:   High [or +1  High (from moderate)] strong, no plausible confounder, consistent and direkt evidence

Endpoint: Major extracranial hemorhage Study design: 4 RCTs → Quality: High Study details and execution: No serious limitations No inconsistency and direct Imprecise or sparse data?

Imprecise or sparse data There is not an empirical basis for defining imprecise or sparse data. Two possible definitions are: Data are sparse if the results include just a few events or observations and they are uninformative Data are imprecise if the confidence intervals are sufficiently wide that an estimate is consistent with either important harms or important benefits. These different definitions can result in different judgments.

Major extracranial hemorhage Study design: 4 RCTs → Quality: High Study details and execution: no serious limitations No inconsistency and direct Imprecise or sparse data? Imprecise data (wide confidence intervals benefit and harm uncertain)

Judgements about the overall quality of evidence Most systems not explicit Options: – –strongest outcome – –primary outcome – –benefits – –weighted – –separate grades for benefits and harms – –no overall grade – –weakest outcome Based on lowest of all the critical outcomes Beyond the scope of a systematic review

Quality across all endpoints

Risk groups Risk for cardio-embolic stroke: High (prior TIA or stroke*, > 75 yrs,  LVEF/CHF, HTN or DM): 10%/year Moderate risk (65 to 75 years) or one risk factor: 3 to 4%/year Low risk (< 65 years): 0.5%/year

Risk groups Risk for cardio-embolic stroke: High (prior TIA or stroke*, > 75 yrs,  LVEF/CHF, HTN or DM): 10%/year – –Benefits greater downsides: do it, high Moderate risk (65 to 75 years) or one risk factor: 3 to 4%/year – –Benefits greater downsides: do it, high Low risk (< 65 years): 0.5%/year – –Benefits smaller than downsides: values: probably do not do it, high

Value and preference statements underlying values and preferences always present sometimes crucial important to make explicit

Observational studies – high or moderate quality? Strong association Dose response relationship – –bleeding risk associated with increasing INR (blood thinning with warfarin) Plausible confounders would have reduced the effect For example, plausible explanatory factors that were not adjusted for in studies comparing mortality rates of for-profit and not-for-profit hospitals would have reduced the observed effect. Thus, the evidence that for-profit hospitals have a higher risk of mortality is more convincing

GRADEpro 3

Guideline development process Prioritise Problems, establish panel  Systematic Review  Evidence Profile  Relative importance of outcomes  Overall quality of evidence  Benefit – downside evaluation  Strength of recommendation  Implementation and evaluation of guidelines GRADE

Guideline development process Prioritise Problems, establish panel  Systematic Review  Evidence Profile  Relative importance of outcomes  Overall quality of evidence  Benefit – downside evaluation  Strength of recommendation  Implementation and evaluation of guidelines GRADE Summary of Findings

GRADE Profiler

GRADEpro© Visual studio.net Windows based (Mac version coming) Simple installation Help file Will be integrated with Revman (trial) Free availability Beta version

Development of GRADE profiles

8. In two studies (CAFA; SPINAF) patients and outcome assessors were blind to OAC administration, while in the remaining trials treatment was given open label with outcomes verified by those unaware of treatment assignment.

GRADEpro Reproducible Transparent – –Footnotes – –Judgments GRADE profiles – –Summary Integration with Revman Real time

Judgements about recommendations “We recommend”…”should” …“Do it” “We suggest”…”may” … “Probably do it” “We suggest not”… “may not” …“Probably don’t do it” “We recommend not”…”should not”… “Don’t do it” No recommendation This could include considerations of costs; i.e. “Is the net gain (benefits-downsides) worth the costs?”

Questions?

What are we concerned about? methodological quality of evidence – –likelihood of bias – –high, moderate, low and very low quality strength of recommendations – –must do to might do – –strong recommendation or weak

Strong recommendation when evidence is weak? recommendations against – –uncertainty of benefit – –confidence in down sides whole body CT or MRI screening – –maybe benefit, maybe not – –true positives some harm – –false positive some harm

Strong recommendation when evidence is weak? known benefit, strong recommendation for one of two alternatives – –CABG for left main stem disease – –but venous or mammary artery graft benefit: weak evidence suggests appreciable benefit for mammary artery harm: strong evidence that little difference

Population: In patients with chronic atrial fibrillation and no prior history of stroke Intervention: does oral anticoagulation (comparison)compared with no therapy Outcome:reduce the risk for embolic stroke, hemorrhage and death? Different risk groups: Low, moderate, high Other outcomes: Inconvenience, quality of life

Strong recommendation when evidence is weak? known benefit, strong recommendation for one of two alternatives benefit: strong evidence of equivalence harm: weak evidence that harm differs appreciably