GRADE – An introduction and workshop Holger Schünemann CADTH, Ottawa March 27 & 28, 2007
Grading evidence and recommendations Why grade? Grading the quality of evidence Grading the strength of recommendations Judgements about importance Comparison of GRADE & other systems
Professional good intentions and plausible theories are insufficient for selecting policies and practices for protecting, promoting and restoring health. Iain Chalmers
How can we judge the extent of our confidence that adherence to a recommendation will do more good than harm?
Why Grade Recommendations? strong recommendations high quality methods large precise effect few down sides of therapy weak recommendations low quality methods imprecise estimate small effect substantial down sides
Why grade recommendations? People draw conclusions about the quality of evidence strength of recommendations Systematic and explicit approaches can help protect against errors resolve disagreements facilitate critical appraisal communicate information However, there is wide variation in currently used approaches and grading can be misused or misunderstood
Which grading system? Evidence Recommendation II-2 B C+ 1 Strong Strongly recommended Organization USPSTF ACCP GCPS
Which grading system? Recommendation for use of oral anticoagulation in patients with atrial fibrillation and rheumatic mitral valve disease Evidence Recommendation B Class I C+ 1 IV C Organization AHA ACCP SIGN
Current profusion: can there be consensus? Grading Systems Current profusion: can there be consensus?
Grades of Recommendation Assessment, Development and Evaluation
About GRADE Working group since 2000 Researchers/guideline developers with interest in methodology Aim: to develop a common system for grading the quality of evidence and the strength of recommendations that is sensible and transparent and to explore the range of interventions and contexts for which it might be useful* Evaluation of existing systems and reliability* Adopted by ATS, ACCP, ACP, WHO, NICE, Cochrane, several other organizations. *Grade Working Group. CMAJ 2003, BMJ 2004, BMC 2004, BMC 2005
GRADE Working Group David Atkins, chief medical officera Dana Best, assistant professorb Peter A Briss, chiefc Martin Eccles, professord Yngve Falck-Ytter, associate directore Signe Flottorp, researcherf Gordon H Guyatt, professorg Robin T Harbour, quality and information director h Margaret C Haugh, methodologisti David Henry, professorj Suzanne Hill, senior lecturerj Roman Jaeschke, clinical professork Gillian Leng, guidelines programme directorl Alessandro Liberati, professorm Nicola Magrini, directorn James Mason, professord Philippa Middleton, honorary research fellowo Jacek Mrukowicz, executive directorp Dianne O’Connell, senior epidemiologistq Andrew D Oxman, directorf Bob Phillips, associate fellowr Holger J Schünemann, associate professorg,s Tessa Tan-Torres Edejer, medical officer/scientistt Helena Varonen, associate editoru Gunn E Vist, researcherf John W Williams Jr, associate professorv Stephanie Zaza, project directorw a) Agency for Healthcare Research and Quality, USA b) Children's National Medical Center, USA c) Centers for Disease Control and Prevention, USA d) University of Newcastle upon Tyne, UK e) German Cochrane Centre, Germany f) Norwegian Centre for Health Services, Norway g) McMaster University, Canada h) Scottish Intercollegiate Guidelines Network, UK i) Fédération Nationale des Centres de Lutte Contre le Cancer, France j) University of Newcastle, Australia k) McMaster University, Canada l) National Institute for Clinical Excellence, UK m) Università di Modena e Reggio Emilia, Italy n) Centro per la Valutazione della Efficacia della Assistenza Sanitaria, Italy o) Australasian Cochrane Centre, Australia p) Polish Institute for Evidence Based Medicine, Poland q) The Cancer Council, Australia r) Centre for Evidence-based Medicine, UK s) National Cancer Institute, Italy t) World Health Organisation, Switzerland u) Finnish Medical Society Duodecim, Finland v) Duke University Medical Center, USA w) Centers for Disease Control and Prevention, USA
a) Agency for Healthcare Research and Quality, USA b) Children's National Medical Center, USA c) Centers for Disease Control and Prevention, USA d) University of Newcastle upon Tyne, UK e) German Cochrane Centre, Germany f) Norwegian Centre for Health Services, Norway g) McMaster University, Canada h) Scottish Intercollegiate Guidelines Network, UK i) Fédération Nationale des Centres de Lutte Contre le Cancer, France j) University of Newcastle, Australia k) McMaster University, Canada l) National Institute for Clinical Excellence, UK m) Università di Modena e Reggio Emilia, Italy n) Centro per la Valutazione della Efficacia della Assistenza Sanitaria, Italy o) Australasian Cochrane Centre, Australia p) Polish Institute for Evidence Based Medicine, Poland q) The Cancer Council, Australia r) Centre for Evidence-based Medicine, UK s) National Cancer Institute, Italy t) World Health Organisation, Switzerland u) Finnish Medical Society Duodecim, Finland v) Duke University Medical Center, USA w) Centers for Disease Control and Prevention, USA x) University of London, UK Y) BMJ Clinical Evidence, UK
Guideline development process Prioritise Problems, establish panel Systematic Review Evidence Profile Relative importance of outcomes Overall quality of evidence Benefit – downside evaluation Strength of recommendation Implementation and evaluation of guidelines GRADE
GRADE Quality of evidence The extent to which one can be confident that an estimate of effect or association is correct. Although the degree of confidence is a continuum, we suggest using four categories: High Moderate Low Very low
Judgements about the quality of evidence The quality of the evidence (i.e. our confidence) depends on: study design (e.g. RCT, case-control study) study quality/limitations (protection against bias; e.g. concealment of allocation, blinding, follow-up) consistency of results directness of the evidence including the populations (those of interest versus similar; for example, older, sicker or more co-morbidity) interventions (those of interest versus similar; for example, drugs within the same class) outcomes (important versus surrogate outcomes) comparison (A - C versus A - B & C - B)
Moving quality down poor (RCT) design, implementation randomization, blinding, concealment, follow-up, intention to treat principle, early stopping for benefit inconsistency Indirect evidence patients, interventions, outcomes A vs B, but have A to C, B to C sparse or imprecise data reporting bias
Moving quality up Observational studies – high or moderate quality? Strong association strong association: RR > 2 or RR < 0.5 very strong association: RR > 5 or RR < 0.2 Dose response relationship bleeding risk associated with increasing INR (blood thinning with warfarin) Plausible confounders would have reduced the effect
Categories of quality High: Further research is very unlikely to change our confidence in the estimate of effect. Moderate: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate. Low: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate. Very low: Any estimate of effect is very uncertain.
Judgements about the overall quality of evidence Most systems not explicit Options: Benefits Primary outcome Highest Lowest Based on lowest of all the critical outcomes Beyond the scope of a systematic review
Levels of evidence: SIGN
Grading of recommendations
Problems with other systems Oversimplified hierarchy based on study design Inadequate consideration of other factors Distinction between study design, quality of evidence and strength of recommendation blurred Systematic reviews included in the hierarchy rather than viewed as the basis for making judgements Expert opinion included in the hierarchy rather than explicitly considering the evidence underlying expert opinions Balance between desirable and undesirable effects Not reflected in the grade Not considered transparently Inadequate consideration of other factors that affect confidence in a recommendation Grading misused when recommendation not separated from the quality of the evidence
Example WHO Avian Influenza guidelines - key clinical questions: Population: H5N1 infected individuals Intervention: Neuraminidase Inhibitors, M2 Inhibitors, other pharmacological agents Comparison: no therapy/alternative management Outcomes: ?
Example WHO Avian Influenza guidelines - key clinical questions: Population: H5N1 infected individuals Intervention: Neuraminidase Inhibitors, M2 Inhibitors, other pharmacological agents Comparison: no therapy/alternative management Outcomes: Mortality?, Hospitalizations? Resource use?, Adverse outcomes?
Clinical Question Refinement Survey of panel members Outcome definition: List of potential outcomes circulated Feedback from panel Concealed rating of importance Consultation with Cochrane Consumers network A list of potential outcomes for each question and intervention that the panel considered was initially developed by two reviewers for each question. The team preparing the evidence summaries independently scored the relative importance of each considered outcome from 1-9, where 7-9 indicated the outcome was critical for a decision or recommendation, 4-6 indicated it was important, and 1-3 indicated it was not important. Because the relative importance of some outcomes depended on whether a drug was being used for treatment or chemoprophylaxis, this was done for two scenarios. The individual scores were discussed and disagreements were resolved by consensus. Outcomes were included in the approximate order of their relative importance in evidence tables and outcomes that were considered not important (a score of 3 or less) were not included. Panel members were also asked before the panel meeting to provide additional outcomes that should be addressed and to rate the relative importance of the outcomes. The panel repeated a concealed rating exercise during the meeting for questions that dealt with chemoprophylaxis because the panel reconsidered whether mortality is a critical outcome under these circumstances. The Cochrane Consumers network was consulted through their electronic discussion list for identification of additional important outcomes not included in the list of potential outcomes developed by the reviewers and panel. This involvement did not reveal additional important outcomes.
Outcomes/endpoints Judgment about the relative importance for each endpoint (scale from 9 to 1): 7 – 9: the endpoint is critical for decision making. 4 – 6: the endpoint is important but not critical. 1 – 3: the endpoint is not important. Only critical (and important) outcomes included Treatment: mortality, duration of hospitalization, incidence of lower respiratory tract complications, antiviral drug resistance and serious adverse events.
Evidence Summary and Quality Ratings Draft summaries sent to panel members for review and identification of gaps Restricted additional evidence at meeting Evidence profiles using GRADE methodology and GRADEpro software (v1.12)
Oseltamivir for treatment of H5N1 infection: Evidence Profiles Oseltamivir for treatment of H5N1 infection: - -
Evidence Summary Summary of findings No clinical trial of oseltamivir for treatment of H5N1 patients. 4 systematic reviews and health technology assessments (HTA) reporting on 5 studies of oseltamivir in seasonal influenza. Healthy adults, high risk adults or children for treatment of seasonal influenza Duration of treatment up to 5 days Several countries in the northern and southern hemispheres (no resource poor countries) 3 published case series describing H5N1 patients treated with oseltamivir. Many in vitro and animal studies. < 400 cases; mortality > 50% worldwide
Strength of recommendation “The extent to which we can be confident that desirable effects of an intervention outweigh undesirable effects.” quality of the evidence translation of the evidence into practice in a specific setting uncertainty about baseline risk trade-offs (the relative value attached to the expected benefits, harms and costs)
Strength of recommendation Undesirable consequences harms more burden costs Desirable consequences health benefits less burden savings
Strength of recommendation Desirable consequences: reduction in morbidity and mortality improvement in quality of life reduction in the burden of treatment (such as having to take medication or the inconvenience of blood tests) reduced resource expenditures Undesirable consequences: adverse effects that have a deleterious impact on these broad outcome categories
Categories of recommendations Although the degree of confidence is a continuum, we suggest using two categories: strong and weak. Strong recommendation: the panel is confident that the desirable effects of adherence to a recommendation outweigh the undesirable effects. Weak recommendation: the panel concludes that the desirable effects of adherence to a recommendation probably outweigh the undesirable effects, but is not confident. Recommend Suggest ? ?
Judgements about recommendations
Judgements about the strength of a recommendation Reasons for not being confident can include: absence of high quality evidence imprecise estimates uncertainty or variation in how different individuals value the outcomes small net benefits uncertainty whether the net benefits are worth the costs (including the costs of implementing the recommendation)
Judgements about the strength of a recommendation No precise threshold for going from a strong to a weak recommendation The presence of important concerns about one or more of the above factors make a weak recommendation more likely. Panels should consider all of these factors and make the reasons for their judgements explicit. Recommendations should specify the perspective that is taken (e.g. individual patient, health system) and which outcomes were considered (including which, if any costs).
Implications of a strong recommendation Patients: Most people in your situation would want the recommended course of action and only a small proportion would not Clinicians: Most patients should receive the recommended course of action Policy makers: The recommendation can be adapted as a policy in most situations
Implications of a weak recommendation Patients: The majority of people in your situation would want the recommended course of action, but many would not Clinicians: Be prepared to help patients to make a decision that is consistent with their own values Policy makers: There is a need for substantial debate and involvement of stakeholders
Recommendations Recommendation: In patients with confirmed or strongly suspected infection with avian influenza A (H5N1) virus, clinicians should administer oseltamivir treatment as soon as possible (????? recommendation, very low quality evidence).
Recommendations Recommendation: In patients with confirmed or strongly suspected infection with avian influenza A (H5N1) virus, clinicians should administer oseltamivir treatment as soon as possible (strong recommendation, very low quality evidence).
Comparison of GRADE and other systems Explicit definitions Explicit, sequential judgements Components of quality defined Quality by outcome and overall quality Relative importance of outcomes Balance between health benefits and harms Balance between incremental health benefits and costs Evidence profiles International collaboration Consistent judgements? Communication?
Summary GRADE system harmonizes grading of recommendations and quality of (overall) evidence Two levels of strength (strong and weak) Four levels of quality (high, moderate, low and very low) Quality for each recommendation based on overall quality of evidence Transparency is key