Data Extraction Prepared for: The Agency for Healthcare Research and Quality (AHRQ) Training Modules for Systematic Reviews Methods Guide www.ahrq.gov.

Data Extraction Prepared for: The Agency for Healthcare Research and Quality (AHRQ) Training Modules for Systematic Reviews Methods Guide www.ahrq.gov

Systematic Review Process Overview

 To describe why data extraction is important  To identify challenges in data extraction  To describe the general layout of a data extraction form  To suggest methods for collecting data accurately and efficiently  To discuss the pros and cons for querying original authors Learning Objectives

 To summarize studies in a common format to facilitate synthesis and coherent presentation of data  To identify numerical data for meta-analyses  To obtain information to assess more objectively the risk of bias in and applicability of studies  To identify systematically missing or incorrectly assessed data, outcomes that are never studied, and underrepresented populations Why Is Data Extraction Important?

 Extracted data should:  Accurately reflect information reported in the publication  Remain in a form close to the original reporting, so that disputes can be easily resolved  Provide sufficient information to understand the studies and to perform analyses  Extract only the data needed, because the extraction process:  Is labor intensive  Can be costly and error prone  Different research questions may have different data needs On Data Extraction (I)

 Data extraction involves more than copying words and numbers from the publication to a form.  Clinical domain, methodological, and statistical knowledge is needed to ensure the right information is captured.  Interpretation of published data is often needed.  What is reported is sometimes not what was carried out.  Data extraction and evaluation of risk of bias and of applicability typically occur at the same time. On Data Extraction (II)

“It is an eye-opening experience to attempt to extract information from a paper that you have read carefully and thoroughly understood only to be confronted with ambiguities, obscurities, and gaps in the data that only an attempt to quantify the results reveals.” — Gurevitch and Hedges (1993) Gurevitch J, Hedges LV. In: Design and analysis of ecological experiments; 1993. Data Extraction: A Boring Task?

 In the Evidence-based Practice Center Program, we often refer to two types of tables:  Evidence Tables  Essentially are data extraction forms  Typically are study specific, with data from each study extracted into a set of such tables  Are detailed and typically not included in main reports  Summary Tables  Are used in main reports facilitate the presentation of the synthesis of the studies  Typically contain context-relevant pieces of the information included in study-specific evidence tables  Address particular research questions Comparative Effectiveness Reviews: Clarifying Research Terminology

 Use key questions and eligibility criteria as a guide  Anticipate what data summary tables should include:  To describe studies  To assess outcomes, risk of bias, and applicability  To conduct meta-analyses  Use the PICOTS framework to choose data elements:  Population  Intervention (or exposure)  Comparator (when applicable)  Outcome (remember numerical data)  Timing  Study design (study setting) What Data To Collect?

 Population-generic elements may include patient characteristics, such as age, gender distribution, and disease stage.  More specific items may be needed, depending upon the topic.  Intervention or exposure and comparator items depend upon the extracted study.  Study types include randomized trial, observational study, diagnostic test study, prognostic factor study, family-based or population-based genetic study, et cetera. Data Elements: Population, Intervention, and Comparator

 Outcomes should be determined a priori with the Technical Expert Panel.  Criteria often are unclear about which outcomes to include and which to discard.  Example: mean change in ejection fraction versus the proportion of subjects with an increase in ejection fraction by ≥5 percent  Record different definitions of “outcome” and consult with content experts before making a decision about which definition to use. Data Elements: Outcome (I)

 Apart from outcome definitions, quantitative data are needed for meta-analysis:  Dichotomous variables (e.g., deaths, patients with at least one stroke)  Count data (e.g., number of strokes, counting multiple ones)  Continuous variables (e.g., mm Hg, pain score)  Survival data  Sensitivity, specificity, receiver operating characteristic  Correlations  Slopes Data Elements: Outcome (II)

 The data elements to be extracted vary by type of study.  Consider collecting this information when recording study characteristics for randomized trials:  Number of centers (multicenter studies)  Method of randomization (adequacy of allocation concealment)  Blinding  Funding source  Whether or not an intention-to-treat analysis was used Data Elements: Timing and Study Design

 Provide “operational definitions” (instructions) indicating exactly what should be extracted in each field of the form.  Make sure that all data extractors understand the operational definitions the same way.  Pilot-test the forms on several published papers.  Encourage communication to clarify even apparently mundane questions. Always Provide Instructions

 Independent extraction of data by at least two experienced reviewers is ideal but is also resource intensive.  There is a tradeoff between cost and the quality of data extraction.  Data extraction often takes longer than 2 hours per paper.  A reduction in the scope of the work may be necessary if independent data extraction is desired.  Careful single extraction by experienced reviewers, with or without crosschecking of selected items by a second reviewer, is a good compromise. Single Versus Double Extraction

 To address all needs, a generic data extraction form will have to be very comprehensive.  Although there are common generic elements, forms need to be adapted to each topic or study design to be most efficient.  Organization of information in the PICOTS (population, intervention, comparator, outcome, timing, and setting) format is highly desirable.  Balance the structure of the form with the flexibility of its use.  Anticipate the need to capture unanticipated data.  Use an iterative process and have several individuals test the form on multiple studies. Developing Data Extraction Forms (Evidence Tables)

 Forms have to be constructed before any serious data extraction is underway.  Original fields may turn out to be inefficient or unusable when coding begins.  Reviewers must:  be as thorough as possible in the initial set-up,  reconfigure the tables as needed, and  use a dual review process to fill in gaps. Common Problems Encountered When Creating Data Extraction Forms (Evidence Tables) (I)

First DraftSecond Draft Evidence Tables: Example (I)

Final Draft Evidence Tables: Example (II)

 Lack of uniformity among outside reviewers:  No matter how clear and detailed are the instructions, data will not be entered identically by one reviewer to the next.  Solutions:  Develop an evidence table guidance document—instructions on how to input data.  Limit the number of core members handling the evidence tables to avoid discrepancies in presentation. Common Problems Encountered When Creating Data Extraction Forms (Evidence Tables) (II)

 In the “country, setting” field, data extractors should list possible settings that could be encountered in the literature:  Academic medical center(s), community, database, tertiary care hospital(s), specialty care treatment center(s), substance abuse center(s), level I trauma center(s), et cetera.  In the “study design” field, data extractors should list one of the following:  Randomized controlled trial, cross-sectional study, longitudinal study, case-control study, et cetera. Sample Fields From a Table Guidance Document: Vanderbilt University Evidence-based Practice Center

Reviewer AReviewer B Example: Two Reviewers Extract Different Data

Trikalinos TA, et al. AHRQ Technology Assessment. Available at: http://www.cms.gov/determinationprocess/downloads/id48TA.pdf.  For evidence reports or technology assessments that have many key questions, data extraction forms may be several pages long.  The next few slides are examples of data extraction forms.  Remember, there is more than one way to structure a data extraction form. Samples of Final Data Extraction Forms (Evidence Tables)

Trikalinos TA, et al. AHRQ Technology Assessment. Available at: http://www.cms.gov/determinationprocess/downloads/id48TA.pdf. Examples: Differential Data Extraction by Two Reviewers

Trikalinos TA, et al. AHRQ Technology Assessment. Available at: http://www.cms.gov/determinationprocess/downloads/id48TA.pdf. Characteristics of the Index Test and Reference Standard

Trikalinos TA, et al. AHRQ Technology Assessment. Available at: http://www.cms.gov/determinationprocess/downloads/id48TA.pdf. Results (Concordance/Accuracy)

Trikalinos TA, et al. AHRQ Technology Assessment. Available at: http://www.cms.gov/determinationprocess/downloads/id48TA.pdf. Results (Nonquantitative)

 Pencil and paper  Word processing software (e.g., Microsoft Word)  Spreadsheet (e.g., Microsoft Excel)  Database software (e.g., Microsoft Access, Epi Info™)  Dedicated off-the-shelf commercial software  Homegrown software Tools Available for Data Extraction and Collection

Berlin J, for the University of Pennsylvania Meta-analysis Blinding Study Group. Lancet 1997;350:185-6.  Who should extract the data?  Domain experts versus methodologists  What extraction method should be used?  Single or double independent extraction followed by reconciliation versus single extraction and independent verification  Should data extraction be blinded (to authors, journal, results)? Extracting the Data

 Problems in data reporting  Inconsistencies in published papers  Data reported in graphs Challenges in Data Extraction

“Data for the 40 patients who were given all 4 doses of medications were considered evaluable for efficacy and safety. The overall study population consisted of 10 (44%) men and 24 (56%) women, with a racial composition of 38 (88%) whites and 5 (12%) blacks.” Examples of Data Reporting Problems (I)

Examples of Data Reporting Problems (II)

Examples of Data Reporting Problems (III)

 Let us extract the number of deaths in two study arms, at 5 years of followup... Inconsistencies in Published Papers

PCI(205)MED(203) Dead2425 Overall Mortality […] 24 deaths occurred in the PCI group, […] and 25 in the MT group […] MED and MT = medical treatment; PCI = percutaneous coronary intervention Results Text

PCI(205)MT(203) Dead2425 28 35 MT = medical treatment PCI = percutaneous coronary intervention Overall Mortality Figure

PCI (205) MT (203) Dead242832253533 CABG = coronary artery bypass graft MT = medical treatment PCI = percutaneous coronary intervention Clinical Events Table

Green BF, Hall JA. Annu Rev Psychol 1984;35:37-53.  Because so few research reports give effect size, standard normal deviates, or exact p-values, the quantitative reviewer must calculate almost all indices of study outcomes.  Little of this calculation is automatic, because results are presented in a bewildering variety of forms and are often obscure. Why Do Such Problems Exist?

Source Forge Web site. Engauge Digitizer. Available at: http://sourceforge.net/projects/digitizer/files/Engauge%20Digitizer/.  Engauge Digitizer, an open-source software:  Each data point is marked with an “X,” and the coordinates are given in a spreadsheet. Using Digitizing Software

 Missing information in published papers  Publications with at least partially overlapping patient subgroups  Potentially fraudulent data Additional Common Issues

 Data extraction is laborious and tedious.  To err is human: data extractors will identify errors and will err themselves.  Interpretation and subjectivity are unavoidable.  Data are often not reported in a uniform manner (e.g., quality, location in paper, metrics, outcomes, numerical value vs. graphs). Conclusions

 Key questions will guide reviewers in choosing which information to extract.  There is no single correct way to record extracted data.  Extracting data requires familiarity with the content and knowledge of epidemiological principles and statistical concepts.  Be persistent:  Often, one can extract more information than the paper initially appears to contain (e.g., by digitizing graphs).  Be comprehensive:  Try to verify the same piece of information from different places in the same article.  Sometimes there are surprising inconsistencies.  Inconsistencies indicate suboptimal reporting quality at least. Key Messages

Berlin J, for the University of Pennsylvania Meta-analysis Blinding Study Group. Does blinding of readers affect the results of meta-analysis? Lancet 1997;350:185-6. Green BF, Hall JA. Quantitative methods for literature reviews. Annu Rev Psychol 1984;35:37-53. Gurevitch J, Hedges LV. Meta-analysis: combining the results of independent experiments. In: Scheiner AM and Gurevich J, eds. Design and analysis of ecological experiments. New York: Chapman & Hall; 1993. p. 347- 70. Source Forge Web site. Engauge Digitizer. Available at: http://sourceforge.net/projects/digitizer/files/Engauge %20Digitizer/. References (I)

 Trikalinos TA, Ip S, Raman G, et al. Home Diagnosis of Obstructive Sleep Apnea-Hypopnea Syndrome. Technology Assessment (Prepared by Tufts–New England Medical Center Evidence- based Practice Center). Rockville, MD: Agency for Healthcare Research and Quality; August 2007. Available at: http://www.cms.gov/determinationprocess/do wnloads/ id48TA.pdf. References (II)

 This presentation was prepared by Joseph Lau, M.D., and Thomas Trikalinos, M.D., Ph.D., members of the Tufts–New England Medical Center Evidence-based Practice Center, and Melissa L. McPheeters, Ph.D., M.P.H., and Jeff Seroogy, B.S., members of the Vanderbilt University Evidence-based Practice Center.  The information in this module is currently not included in Version 1.0 of the Methods Guide for Comparative Effectiveness Reviews (available at: http://www.effective healthcare.ahrq.gov/repFiles/2007_10DraftMethodsG uide.pdf). Authors

Data Extraction Prepared for: The Agency for Healthcare Research and Quality (AHRQ) Training Modules for Systematic Reviews Methods Guide www.ahrq.gov.

Similar presentations

Presentation on theme: "Data Extraction Prepared for: The Agency for Healthcare Research and Quality (AHRQ) Training Modules for Systematic Reviews Methods Guide www.ahrq.gov."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Data Extraction Prepared for: The Agency for Healthcare Research and Quality (AHRQ) Training Modules for Systematic Reviews Methods Guide www.ahrq.gov.

Similar presentations

Presentation on theme: "Data Extraction Prepared for: The Agency for Healthcare Research and Quality (AHRQ) Training Modules for Systematic Reviews Methods Guide www.ahrq.gov."— Presentation transcript:

Similar presentations

About project

Feedback