Grading Strength of Evidence
Prepared for: The Agency for Healthcare Research and Quality (AHRQ) Training Modules for Systematic Reviews Methods Guide

Systematic Review Process Overview

Learning Objectives
- To define what “grading strength of evidence (SOE)” is
- To describe why grading SOE is important
- To distinguish between grading SOE and rating the quality of individual articles
- To list primary and additional domains for grading SOE
- To describe options for scoring SOE domains
- To describe how to score and present SOE grades

Grading Strength of Evidence
- Is distinct from rating the quality of individual studies
- Is generally used only to assess:
  - Major outcomes (benefits and harms)
  - Major comparisons, when relevant

Why Grade Strength of Evidence?
- To facilitate use of systematic reviews by diverse decisionmakers and stakeholders
- To give decisionmakers:
  - A comprehensive evaluation of the evidence
  - A sense of how much confidence they can place in the evidence
- To foster transparency and documentation

Three Steps to Grading Strength of Evidence
1. Scoring four required domains
   a. Risk of bias
   b. Consistency
   c. Directness
   d. Precision
2. Considering, and possibly scoring, four additional domains
   a. Dose-response association
   b. Plausible confounders
   c. Strength of association
   d. Publication bias
3. Combining scores from required domains into a single strength-of-evidence score, taking scores on additional domains into account as needed
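To make the bookkeeping concrete, the sketch below shows one hypothetical way a review team might record these domain scores for a single outcome in Python. The class name, field names, and example values are assumptions for illustration only; they are not part of the EPC guidance.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DomainScores:
    """Hypothetical record of domain scores for one outcome and comparison."""
    # Required domains
    risk_of_bias: str                      # "low", "medium", or "high"
    consistency: str                       # "consistent", "inconsistent", or "unknown/NA"
    directness: str                        # "direct" or "indirect"
    precision: str                         # "precise" or "imprecise"
    # Additional (discretionary) domains, scored only when applicable
    dose_response: Optional[str] = None            # "present", "not present", "NA/not tested"
    plausible_confounding: Optional[str] = None    # "present" or "absent"
    strength_of_association: Optional[str] = None  # "strong" or "weak"
    publication_bias_note: Optional[str] = None    # free-text comment rather than a formal score

# Example: one reviewer's scores for a single major outcome (invented values)
severe_diarrhea = DomainScores(
    risk_of_bias="medium",
    consistency="consistent",
    directness="direct",
    precision="imprecise",
)
print(severe_diarrhea)
```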

Four Required Domains: Risk of Bias
- Concerns both study design and study conduct for individual studies, rated by usual methods
- Assesses the aggregate quality of studies within each major study design and integrates those assessments into an overall risk-of-bias score
- Risk-of-bias scores:
  - High: lowers the strength-of-evidence grade
  - Medium
  - Low: raises the strength-of-evidence grade

Four Required Domains: Consistency
- Defined as the degree of similarity in the effect sizes of different studies within an evidence base
- Consistent evidence bases:
  - Have the same direction of effect (same side of “no effect”)
  - Have a narrow range of effect sizes
- Inconsistent evidence bases:
  - Have nonoverlapping confidence intervals
  - Have significant unexplained clinical or statistical heterogeneity

Four Required Domains: Consistency Scores
- Only three possible scores for consistency:
  - Consistent (i.e., no inconsistency)
  - Inconsistent
  - Unknown or not applicable (a single study cannot be assessed)
- Meta-analysis:
  - Use appropriate tests, such as Cochran’s Q test or the I² statistic
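Cochran's Q and the I² statistic can be computed directly from per-study effect estimates and their standard errors. The snippet below is a minimal Python sketch using invented example numbers; in practice, reviewers would typically obtain these values from standard meta-analysis software.

```python
import math

def cochran_q_and_i2(effects, std_errors):
    """Cochran's Q and I^2 for a set of study effect estimates.

    effects: per-study effect estimates (e.g., log risk ratios)
    std_errors: corresponding standard errors
    """
    weights = [1.0 / se**2 for se in std_errors]           # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0    # % of variability beyond chance
    return q, i2

# Hypothetical example: four studies with similar effects (low heterogeneity expected)
effects = [math.log(0.80), math.log(0.75), math.log(0.85), math.log(0.78)]
std_errors = [0.10, 0.12, 0.15, 0.11]
q, i2 = cochran_q_and_i2(effects, std_errors)
print(f"Cochran's Q = {q:.2f}, I^2 = {i2:.1f}%")
```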

Four Required Domains: Directness
- Defined as whether the evidence being assessed:
  - Reflects a single, direct link between the interventions of interest and the ultimate health outcome under consideration
  - Relies on multiple links in a causal chain
- If multiple links are involved, strength of evidence can be only as strong as the weakest link
- Using analytic frameworks* is important

*See the “Analytic Frameworks” module

Four Required Domains: Aspects of Indirectness
- Intermediate or surrogate outcomes instead of health or patient-centered outcomes
  - Example: laboratory test results or radiographic findings versus patient-reported functional outcomes or death
- Indirect comparisons rather than direct, head-to-head comparisons
  - Direct (e.g., A vs. B, A vs. C, and B vs. C):
    - Head-to-head studies in the evidence base
    - Generally assumes use of health outcomes, not surrogate/proxy outcomes
    - Better strength of evidence
  - Indirect (e.g., A vs. B, B vs. C, but not A vs. C):
    - No head-to-head studies that cover all interventions or outcomes of interest
    - Problematic situation for all types of comparisons
    - Strength-of-evidence grades not as strong as with direct evidence

Related Issue of Applicability*
- Applicability is evaluated separately from directness for the Evidence-based Practice Center (EPC) program.
- For decisionmakers, the applicability of evidence depends on the different interests of diverse groups.
- A PICOS framework (patient populations, interventions, comparators, outcomes, and settings) is used for applicability assessment in the EPC program.
- Although the EPC program separates applicability from strength-of-evidence grading, other systems that work with one decisionmaker may incorporate applicability issues into their evaluations of directness.

*See the “Assessing Applicability” module

Four Required Domains: Directness Scores
- Only two possible scores for directness:
  - Direct: evidence is based on a single link between the intervention and health outcomes
  - Indirect: evidence relies on:
    - Surrogate/proxy outcomes
    - More than one body of evidence
    - Both situations

Four Required Domains: Precision
- Defined as the degree of certainty for an estimate of effect with respect to a specific outcome
- Is a complicated concept that:
  - Asks the question: What can decisionmakers conclude about whether one treatment is, clinically speaking, inferior, superior, or equivalent (neither inferior nor superior) to another?
  - Includes considerations of:
    - Statistical significance for effect estimates
    - Confidence intervals for those effect estimates

Four Required Domains: Precision Scores
- Are rated separately for each important outcome or comparison, including for any summary estimate of effect size
- Only two scores are possible:
  - Precise: estimate allows a clinically useful conclusion
  - Imprecise: confidence interval is so wide it could include clinically distinct (even conflicting) conclusions
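One common way to operationalize this judgment is to ask whether the confidence interval crosses a prespecified minimal clinically important difference (MCID). The snippet below is a hypothetical sketch of that logic; the MCID threshold and the example intervals are assumptions for illustration, not values from the module.

```python
def precision_score(ci_lower, ci_upper, mcid):
    """Classify an effect estimate as 'precise' or 'imprecise'.

    The interval is treated as precise if it supports a single clinical
    conclusion: entirely beyond the MCID (clinically important effect)
    or entirely within +/- MCID (no clinically important difference).
    """
    if ci_lower >= mcid or ci_upper <= -mcid:
        return "precise"        # whole interval indicates an important effect
    if -mcid < ci_lower and ci_upper < mcid:
        return "precise"        # whole interval indicates no important difference
    return "imprecise"          # interval spans clinically distinct conclusions

# Hypothetical example: risk difference per 100 patients, assumed MCID of 2 events
print(precision_score(ci_lower=-8, ci_upper=1, mcid=2))   # "imprecise"
print(precision_score(ci_lower=-8, ci_upper=-2, mcid=2))  # "precise"
```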

Additional Domains
- Four “discretionary” domains:
  - Dose-response association
  - Plausible confounders
  - Strength of association
  - Publication bias
- Use when they are:
  - Applicable
  - Helpful in reaching conclusions about overall grades for strength of evidence

Additional Domains: Dose-Response Association
- Pattern of a larger effect with greater exposure (dose, duration, adherence), either across or within studies
- Rate if studies give levels of exposure

Additional Domains: Dose-Response Scores
- Three scores are possible for dose-response:
  - Present: a dose-response pattern is observed
    - In such a case, Evidence-based Practice Center reviewers may want to upgrade the level of evidence.
  - Not present: no dose-response pattern observed
  - Not applicable or not tested

Additional Domains: Plausible Confounding
- In an observational study, plausible confounding factors sometimes work in the direction opposite that of the observed effect.
  - Had such “effect-weakening” confounders not been present, the observed effect would have been even larger.
  - In such a case, Evidence-based Practice Center reviewers may want to upgrade the level of evidence.
- Consider whether plausible confounding exists that would decrease the observed effect.

Additional Domains: Plausible Confounding Scores
- Two scores are possible for plausible confounding:
  - Present: confounding factors that would decrease the observed effect may be present
  - Absent: confounding factors that would decrease the observed effect are not likely to be present

Additional Domains: Strength of Association
- Magnitude of effect:
  - Defined as the likelihood that the observed effect is large enough that it cannot have occurred solely as a result of bias from potential confounding factors
  - Consider when the effect size is particularly large

Additional Domains: Strength of Association Scores
- Two scores are possible for strength of association:
  - Strong: a large effect size that is unlikely to have occurred in the absence of a true effect of the intervention
    - In such a case, Evidence-based Practice Center reviewers may want to upgrade the level of evidence.
  - Weak: an effect size small enough that it could have occurred solely as a result of bias from confounding factors

Additional Domains: Publication Bias
- Studies may have been published selectively.
  - Example: only a small proportion of relevant trials or other studies has been published.
  - Effect estimates that are based only on the published studies then may not reflect the true effect of an intervention.
- Publication bias may undermine the overall robustness of a body of evidence.

Additional Domains: Publication Bias Scores
- Publication bias scores:
  - Need not be formally computed but can influence ratings of the required domains
  - Should be taken into account when:
    - Rating consistency
    - Calculating a summary confidence interval for an effect
- Add comments on publication bias when circumstances suggest that relevant empirical findings, particularly negative or no-difference findings, have not been published or are not otherwise available.

Procedures for Assessing Domains
- Use two or more reviewers with the appropriate clinical and methodological expertise.
- Assess separately:
  - Each required domain (or each optional domain, as relevant)
  - Each major outcome, including benefits and harms
- Resolve differences by consensus or mediation by an additional expert; consensus scores should appear in tables.
- Record and maintain records of each reviewer's individual judgments about domains as background documentation.

Strength of Evidence Grades (I)
- Reflect a global assessment that:
  - Takes the required domains directly into account
  - Incorporates judgments about the additional domains as needed
- Aim to:
  - Provide “actionable” information for a variety of different users, readers, and stakeholders
  - Be transparent in how the strength-of-evidence grades are reached

Strength of Evidence Grades (II)
- For each comparison of interest, rate the strength of evidence for:
  - Each major benefit (e.g., positive effects on health outcomes such as physical function or quality of life, or effects on laboratory measures or other surrogate variables)
  - Each major harm (ranging from rare, serious, or life-threatening adverse events to common but bothersome effects)
- For both benefits and harms:
  - Focus on the outcomes most relevant to patients, clinicians, and policymakers

Strength of Evidence Grades and Definitions
- High: High confidence that the evidence reflects the true effect. Further research is very unlikely to change our confidence in the estimate of effect.
- Moderate: Moderate confidence that the evidence reflects the true effect. Further research may change our confidence in the estimate of effect and may change the estimate.
- Low: Low confidence that the evidence reflects the true effect. Further research is likely to change the confidence in the estimate of effect and is likely to change the estimate.
- Insufficient: Evidence either is unavailable or does not permit a conclusion.

Strength of Evidence Grades: Additional Points (I)
- Using the high, moderate, or low strength-of-evidence grade:
  - Implies that a body of evidence actually exists
  - Is intended to convey how confident reviewers are about decisions that may be made based on evidence graded one way or another
  - Requires the use of only one designation, not a range (e.g., not “low to moderate”)

Strength of Evidence Grades: Additional Points (II)
- The insufficient strength-of-evidence grade:
  - Is applied when reviewers cannot draw conclusions about an outcome, comparison, or other question
  - Is appropriate when:
    - No evidence is available at all
    - Evidence is too insubstantial to permit conclusions to be drawn (e.g., opposing results from studies with a similar risk of bias; wide and overlapping confidence intervals)

Scoring and Reporting: General Guidance
- Use different approaches to incorporate multiple domains into an overall strength-of-evidence grade:
  - GRADE algorithm
  - Weighting system of the Evidence-based Practice Center
  - Some qualitative approach
- Use (at least) two reviewers
- Assess resulting interrater reliability for each domain score, and keep records
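As one way to quantify interrater reliability for a domain score, Cohen's kappa can be computed from two reviewers' categorical ratings. The sketch below is a minimal, unweighted implementation with made-up ratings for illustration; reviews would often use a standard statistical package instead.

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Unweighted Cohen's kappa for two raters' categorical scores."""
    assert len(rater1) == len(rater2) and rater1
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n   # observed agreement
    freq1, freq2 = Counter(rater1), Counter(rater2)
    categories = set(rater1) | set(rater2)
    expected = sum((freq1[c] / n) * (freq2[c] / n) for c in categories)  # chance agreement
    return (observed - expected) / (1 - expected) if expected < 1 else 1.0

# Hypothetical risk-of-bias ratings for eight outcomes by two reviewers
r1 = ["low", "low", "medium", "high", "low", "medium", "medium", "high"]
r2 = ["low", "medium", "medium", "high", "low", "medium", "low", "high"]
print(f"kappa = {cohens_kappa(r1, r2):.2f}")
```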

Guiding Principles: Risk of Bias
- Risk of bias (given design and conduct of available studies) is the essential component in determining the strength-of-evidence grade.
- First, consider which study design is most appropriate to reduce bias for each question.
- Next, consider the risk of bias from available studies.

Guiding Principles: Risk of Bias Example
- Drug comparisons in randomized controlled trials (RCTs), with either placebo or an active comparator as an appropriate design:
  - Evidence from well-conducted RCTs will have less risk of bias than evidence based on observational studies.
  - For RCTs, reviewers can start with a rating of low for risk of bias and change the assessment if the RCTs have important flaws.
  - For observational data, reviewers can start with a rating of high for risk of bias and change the assessment, depending upon how well studies were conducted.

Further Guidance: Principles for Scoring
- Be explicit about how the evidence grade will be determined:
  - A point system for combining ratings of the domains
  - A qualitative consideration of the domains
- Carefully document procedures.
- Keep records of procedures and results for each review so that they may contribute to the overall expertise of the Evidence-based Practice Center and the science of grading evidence.
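The module does not prescribe a specific point system, so the sketch below is purely hypothetical: it starts from the risk-of-bias level implied by the study design and then downgrades or upgrades for the other domains. All starting values, thresholds, and adjustments are assumptions made only to show what an explicit, documented scoring rule could look like.

```python
def overall_soe_grade(design, risk_of_bias, consistency, directness, precision,
                      dose_response=None, strength_of_association=None):
    """Hypothetical point system for combining domain scores into one SOE grade."""
    # Start from the design: 3 (high) for RCT evidence, 1 (low) for observational evidence
    score = 3 if design == "RCT" else 1
    # Downgrade for problems in the required domains
    score -= {"low": 0, "medium": 1, "high": 2}[risk_of_bias]
    if consistency == "inconsistent":
        score -= 1
    if directness == "indirect":
        score -= 1
    if precision == "imprecise":
        score -= 1
    # Possible upgrades from additional domains (applied here to observational evidence only)
    if design != "RCT":
        if dose_response == "present":
            score += 1
        if strength_of_association == "strong":
            score += 1
    score = max(min(score, 3), 0)
    return {3: "high", 2: "moderate", 1: "low", 0: "insufficient"}[score]

# Example: fair-quality, consistent, direct, but imprecise RCT evidence
print(overall_soe_grade("RCT", "medium", "consistent", "direct", "imprecise"))  # low
```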

Further Guidance: Principles for Reporting (I)
- Explain the rationale for the approach used and identify which domains were important in upgrading or downgrading the strength of evidence.
- Explain judgments about the degree to which any additional domains altered the overall strength-of-evidence grade.
- Provide enough detail within the report to ensure that users can grasp the methods.

Further Guidance: Principles for Reporting (II)
- Use the terms high, moderate, low, or insufficient.
- Do not use Roman numerals or other symbols.
- Use or adapt the illustrative tabular approach to reporting (see the publications listed below for examples).
  - Owens DK, Lohr KN, Atkins D, et al. Grading the strength of a body of evidence when comparing medical interventions. In: Methods Guide for Comparative Effectiveness Reviews. Rockville, MD: Agency for Healthcare Research and Quality; posted August 2009. Available at: ahrq.gov/ehc/products/60/318/2009_0805_grading.pdf.
  - Owens DK, Lohr KN, Atkins D, et al. Grading the strength of a body of evidence when comparing medical interventions — Agency for Healthcare Research and Quality and the Effective Health Care Program. J Clin Epidemiol 2010;63.

Grading Strength of Evidence: Presentation of Results — Moderate and High Grades

Columns for each row: Number of Studies (Subjects); domains pertaining to strength of evidence (Risk of Bias: Design/Quality, Consistency, Directness, Precision); magnitude of effect (Absolute Risk Difference per 100 Patients).

Severe Diarrhea (Moderate SOE)
- 4 (256); RCT/Fair; Consistent; Direct; Imprecise; −4 (95% CI −8 to +1)
- 14 (28,400); Cohort/Fair; Consistent; Direct; Precise; −5 (95% CI −8 to −2)

Improved Quality of Life (High SOE)
- 6 (265); RCTs/Good; Consistent; Direct; Precise; −5 (95% CI −7 to −1)

CI = confidence interval; RCT = randomized controlled trial

Grading Strength of Evidence: Presentation of Results — Insufficient and Low

Columns for each row: Number of Studies (Subjects); domains pertaining to strength of evidence (Risk of Bias: Design/Quality, Consistency, Directness, Precision); magnitude of effect (Absolute Risk Difference per 100 Patients).

Mortality (Insufficient SOE)
- 1 (80); RCT/Fair; Unknown; Direct; Imprecise; −1 (95% CI −4 to +3)
- 14 (384); Retrospective cohort/Fair; Inconsistent; Direct; Imprecise; −7 to +5 (range)

Myocardial Infarction (Low SOE)
- 7 (625); Retrospective cohort/Low; Consistent; Direct; Imprecise; −3 (95% CI −5 to −1)

CI = confidence interval; RCT = randomized controlled trial
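The effect column in these illustrative tables is an absolute risk difference per 100 patients with a 95% confidence interval. A minimal sketch of that calculation from two-arm event counts is shown below; the counts are invented for illustration, and a simple Wald (normal-approximation) interval is assumed.

```python
import math

def risk_difference_per_100(events_tx, n_tx, events_ctrl, n_ctrl):
    """Absolute risk difference per 100 patients with a Wald 95% CI."""
    p_tx, p_ctrl = events_tx / n_tx, events_ctrl / n_ctrl
    rd = p_tx - p_ctrl
    se = math.sqrt(p_tx * (1 - p_tx) / n_tx + p_ctrl * (1 - p_ctrl) / n_ctrl)
    lower, upper = rd - 1.96 * se, rd + 1.96 * se
    return rd * 100, lower * 100, upper * 100

# Hypothetical counts: 10/128 events with treatment vs. 15/128 with control
rd, lo, hi = risk_difference_per_100(10, 128, 15, 128)
print(f"ARD per 100 patients: {rd:.1f} (95% CI {lo:.1f} to {hi:.1f})")
```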

Comparison With the GRADE System
- The grading system used by the Evidence-based Practice Centers (EPCs) is similar to the GRADE system.
- The EPC grading system reflects the needs of AHRQ stakeholders for reviews on a wide variety of topics, not for recommendations or guidelines.
- The main differences between the two grading systems:
  - The definitions of domains differ slightly; in the EPC system, “directness” excludes “applicability,” which is handled separately.
  - In the EPC system, observational studies are considered to have less risk of bias for outcomes such as harms, which can raise the initial grade to “moderate.”
  - The definition of the overall grade differs; the EPC system emphasizes confidence in the estimate, whereas the GRADE system emphasizes the effect of future research.
  - The EPC system permits three different ways to reach an overall strength-of-evidence grade; the GRADE system uses a single formula.

Summary: Grading Strength of Evidence
- Is a critical last step in analysis and presentation
- Is done after the quality of articles is rated by at least two independent reviewers
- Helps users of systematic reviews understand the body of evidence and how much confidence they can have in making decisions based on that evidence
- Uses scores on four primary (mandatory) domains and four additional (discretionary) domains
- Focuses on major outcomes and comparisons
- Is denoted in terms of high, moderate, or low strength, or insufficient evidence
- Presents strength-of-evidence grades in tabular form

References
- Atkins D, Best D, Briss PA, et al, for the GRADE Working Group. Grading quality of evidence and strength of recommendations. BMJ 2004;328:1490.
- Owens DK, Lohr KN, Atkins D, et al. Grading the strength of a body of evidence when comparing medical interventions. In: Agency for Healthcare Research and Quality. Methods Guide for Comparative Effectiveness Reviews [posted July 2009]. Rockville, MD. Available at: ahrq.gov/healthInfo.cfm?infotype=rr&ProcessID=60.
- Owens DK, Lohr KN, Atkins D, et al. Grading the strength of a body of evidence when comparing medical interventions — Agency for Healthcare Research and Quality and the Effective Health Care Program. J Clin Epidemiol 2010;63.

Author
- This presentation was prepared by Kathleen N. Lohr, Ph.D., a Distinguished Fellow at RTI International.
- This module is based on an update of chapter 11 in version 1.0 of the Methods Guide for Comparative Effectiveness Reviews (updated chapter available at: ahrq.gov/ehc/products/60/318/2009_0805_grading.pdf).