Data Quality & Validation
Catherine Bauer-Martinez, Indiana University
Alvaro Andres Alvarez, Stanford
Heather Eng, University of Pittsburgh
New York City, Tuesday, August 15, 2017
1. Before Data Collection Starts
2. During Data Collection (internal)
3. During Data Collection (external)
1. Before Data Collection Starts
What is Metadata?
Data [information] that provides information about other data.
GOAL: Design data collection forms to meet study needs and ensure complete, correct, quality data (Data Dictionary and Project Setup).
Good data quality starts with a good database design…
Benefits!
- Reduce the number of issues during the data capture phase.
- Reduce the REDCap administrator's future support burden.
- Reduce the time spent on the data cleaning process.
- Data sharing.
What are your recommendations and good practices before moving a project to production mode?
Inconsistencies in coding for yes/no questions.
Forms not assigned to an event.
LOGIC FIELDS…
- Calculated Fields
- Branching Logic
- Automated Invitation Logic (ASI)
- Survey Queue
The project is sufficiently tested. We recommend creating at least three test records and running at least one export in development mode. This lets you preview the type of results to expect from the project. We also highly recommend reviewing the project's design with a statistician before entering production mode to ensure your data capture is configured properly.
MOST COMMON ISSUES WE FOUND AT STANFORD
Quality Control Before Going to Production (Stanford)
- If research, PI first name and last name.
- If research, IRB information.
- % of validated fields.
- Forms with more fields than recommended.
- Calculations using "Today".
- No fields tagged as identifiers.
- Inconsistencies in coding for positive/negative questions.
- Date format inconsistencies.
- “99” or “98” recommended coding of “other”, “unknown” or similar values in drop-down lists, radio buttons or checkboxes.
- "My First Instrument" form name presence.
Agree? Which other recommendations would you add to the list?
Why NOT automate this? … We created a tool for this. Demo time!
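The Stanford tool itself is not shown in these slides. As a rough illustration of what automating a few of the pre-production checks might look like, here is a minimal Python sketch. It assumes the data dictionary has already been loaded as a list of dicts whose keys (`field_name`, `form_name`, `field_type`, `select_choices_or_calculations`, `identifier`) follow the naming of a REDCap metadata export; verify those names against your own export. The function names are hypothetical.

```python
from collections import defaultdict


def parse_choices(raw):
    """Turn a choice string such as '1, Yes | 0, No' into {label: code}."""
    mapping = {}
    for part in raw.split("|"):
        if "," not in part:
            continue
        code, label = part.split(",", 1)
        mapping[label.strip().lower()] = code.strip()
    return mapping


def check_dictionary(fields):
    """Return a list of human-readable issues found in the data dictionary."""
    issues = []

    # Default form name left over from project creation.
    if any(f["form_name"] == "my_first_instrument" for f in fields):
        issues.append("Default 'My First Instrument' form name is still present")

    # No field at all is flagged as an identifier (assumed flag value: 'y').
    if not any(f.get("identifier", "") == "y" for f in fields):
        issues.append("No fields tagged as identifiers")

    yes_no_codings = defaultdict(list)  # (yes_code, no_code) -> field names
    for f in fields:
        expr = f.get("select_choices_or_calculations", "")
        if f["field_type"] in ("radio", "dropdown"):
            choices = parse_choices(expr)
            if set(choices) == {"yes", "no"}:
                coding = (choices["yes"], choices["no"])
                yes_no_codings[coding].append(f["field_name"])
        elif f["field_type"] == "calc" and "today" in expr.lower():
            issues.append(f"Calculation uses 'today': {f['field_name']}")

    # More than one distinct coding scheme means yes/no is coded inconsistently.
    if len(yes_no_codings) > 1:
        issues.append(f"Inconsistent yes/no coding: {dict(yes_no_codings)}")
    return issues
```

Each check maps to one item on the list above; the remaining items (IRB information, date formats, "99"/"98" codes) would need project-level settings or further conventions that the slides do not specify.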
2. During Data Collection (internal)
Don’t underestimate how important it is to do your data right.
It pays to be patient….
Data validation in REDCap
Data Quality Tool
REDCap has 8 pre-defined data quality rules that you can execute following data entry:
- Missing values (excluding missing values due to branching logic)
- Missing values for required fields only
- Incorrect data type
- Out-of-range values
- Outliers for numerical fields
- Hidden fields that contain values
- Multiple choice fields with invalid values
- Incorrect values for calculated fields
You can create customized rules as well.
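To make three of these rules concrete (missing required values, incorrect data type, out-of-range values), the following Python sketch applies them to exported records outside REDCap. It is not how REDCap implements its rules; the record layout (a list of dicts with string values and a `record_id` key), the field names and the limits are all assumptions chosen for illustration.

```python
def check_records(records, limits, required=()):
    """Return (record_id, field, problem) tuples for simple quality rules.

    records  -- list of dicts with string values, as in a flat CSV export
    limits   -- {field: (minimum, maximum)} for numeric fields
    required -- field names that must not be blank
    """
    findings = []
    for rec in records:
        rid = rec.get("record_id", "?")

        # Rule: missing values for required fields.
        for field in required:
            if rec.get(field, "") == "":
                findings.append((rid, field, "missing required value"))

        for field, (low, high) in limits.items():
            raw = rec.get(field, "")
            if raw == "":
                continue  # blanks are handled by the required-field rule
            try:
                value = float(raw)
            except ValueError:
                # Rule: incorrect data type.
                findings.append((rid, field, "incorrect data type"))
                continue
            # Rule: out-of-range values.
            if not low <= value <= high:
                findings.append((rid, field, "out of range"))
    return findings


# Hypothetical example data.
records = [
    {"record_id": "1", "age": "34", "weight_kg": "70"},
    {"record_id": "2", "age": "340", "weight_kg": ""},
    {"record_id": "3", "age": "abc", "weight_kg": "80"},
]
findings = check_records(
    records,
    limits={"age": (0, 120), "weight_kg": (20, 300)},
    required=("weight_kg",),
)
```

Setting the same minimum and maximum as field validation in REDCap catches these problems at the point of entry; a check like this one only confirms that nothing slipped through.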
Data Exports, Reports and Stats
Create reports to view all your data in a spreadsheet without having to export from the system.
- Serves as the search engine of the REDCap project
- Use reports to check your data quality
- Queries database in real time and displays results in table format
- Choose selected variables
- Use filters to create reports
- Reports are saved in left navigation panel
- Updates every time you click on defined report
- Edit reports as needed
Best Practices
- Avoid “free” text fields
- Define data type for each variable
- Use standard measures and codes
- Do not mix data types (e.g., “428.0 heart failure patient had pneumonia”); put code and comment in separate fields
- Use REDCap validation rules (set minimum and maximum values)
- Reduce the amount of missing data (!)
- Avoid blanks
- Be consistent throughout the study by using the same codes
- Set up your database with the end in mind
3. During Data Collection (external)
Using analysis software for complex data quality programs
Automated overnight process -> SAS Research Repository
- cURL+API export: form-specific .CSV files from REDCap
- “DBLOAD.sas” import: form-specific SAS datasets
- Additional external data (lab, specimen tracking, EMR)
- Other related REDCap projects
- Relate by keys (ID, date, timepoint, …)
- “EDITS.sas” quality control programs
- “REPORTS.sas” administrative reports
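The presenters use cURL for the export step. As an illustration of the same idea in Python, the sketch below requests one form's records as CSV. The parameter names (`token`, `content`, `format`, `type`, `forms[0]`) reflect the REDCap record-export API as commonly documented, but check them against the API documentation of your own REDCap instance. The URL, token and form name in the usage note are placeholders, and the function names are hypothetical.

```python
import urllib.parse
import urllib.request


def build_export_payload(token, form_name):
    """Build the POST fields for a flat CSV export limited to one form."""
    return {
        "token": token,          # project-specific API token (keep secret)
        "content": "record",     # export records
        "format": "csv",         # CSV output, one file per form
        "type": "flat",          # one row per record (or record-event)
        "forms[0]": form_name,   # restrict the export to a single form
    }


def export_form_csv(api_url, token, form_name, out_path):
    """POST the export request and write the CSV response to out_path."""
    payload = build_export_payload(token, form_name)
    data = urllib.parse.urlencode(payload).encode("utf-8")
    # Supplying data makes urllib send a POST request.
    request = urllib.request.Request(api_url, data=data)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = response.read()
    with open(out_path, "wb") as handle:
        handle.write(body)
    return out_path
```

A nightly job could call `export_form_csv("https://redcap.example.edu/api/", token, "demographics", "demographics.csv")` once per form, reading the token from a protected location rather than hard-coding it.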
Using analysis software for complex data quality programs
“EDITS.sas” quality control programs
- Confirm REDCap point-of-entry validations
- Complex longitudinal checks
- Logical checks between multiple REDCap projects
- Consistency checks with non-REDCap data, e.g.
  - laboratory specimen tracking
  - self-reported medications vs EHR
- Reports emailed to coordinators for correction in REDCap
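The presenters run these checks in SAS. As a language-neutral illustration of one consistency check against non-REDCap data, the Python sketch below relates REDCap rows to laboratory specimen-tracking rows by shared keys and reports whatever appears on only one side. The key names (`record_id`, `timepoint`) and the sample rows are assumptions, not taken from the presenters' programs.

```python
def find_unmatched(redcap_rows, lab_rows, keys=("record_id", "timepoint")):
    """Compare two sources by key and list keys present on one side only."""

    def key_of(row):
        return tuple(row[k] for k in keys)

    redcap_keys = {key_of(row) for row in redcap_rows}
    lab_keys = {key_of(row) for row in lab_rows}
    return {
        # Specimen recorded in REDCap but never logged by the lab.
        "missing_in_lab": sorted(redcap_keys - lab_keys),
        # Specimen logged by the lab with no matching REDCap entry.
        "missing_in_redcap": sorted(lab_keys - redcap_keys),
    }


# Hypothetical example: REDCap rows where a specimen was reported as collected.
redcap_rows = [
    {"record_id": "101", "timepoint": "baseline"},
    {"record_id": "101", "timepoint": "month_6"},
    {"record_id": "102", "timepoint": "baseline"},
]
lab_rows = [
    {"record_id": "101", "timepoint": "baseline"},
    {"record_id": "102", "timepoint": "baseline"},
    {"record_id": "103", "timepoint": "baseline"},
]
report = find_unmatched(redcap_rows, lab_rows)
```

The resulting lists are the kind of discrepancy a coordinator would then resolve in REDCap.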
Using analysis software for complex data quality programs
“REPORTS.sas” High-level administrative reports
- Accrual and retention
- Forms and Visit completeness
- Summary of outstanding QC issues
- Reports emailed to PIs and posted on study website
Using analysis software for complex data quality programs
“LogScanner.sas”
- Opens log file before cURL+API export
- Closes log file after Reports emailed and posted
- Scans log file for errors, warnings, unexpected events
- Sends email to DM each morning:
  - Errors found … <details>
  - All is well!
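The scanning step can be sketched in a few lines of Python. This is not the presenters' LogScanner.sas; it assumes only that problem lines in the log begin with `ERROR` or `WARNING`, as SAS log messages do, and it leaves out the emailing step. The function names and the exact summary wording are hypothetical.

```python
import re

# SAS log messages of interest start with ERROR or WARNING,
# e.g. "ERROR: ..." or "ERROR 22-322: ...".
PROBLEM_LINE = re.compile(r"^(ERROR|WARNING)\b")


def scan_log(lines):
    """Return (line_number, text) for every line that looks like a problem."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if PROBLEM_LINE.match(line)
    ]


def summarize(hits):
    """Build the body of the morning message for the data manager."""
    if not hits:
        return "All is well!"
    details = "\n".join(f"line {number}: {text}" for number, text in hits)
    return f"Errors found ({len(hits)}):\n{details}"


# Hypothetical log excerpt.
log_lines = [
    "NOTE: DATA statement used.\n",
    "WARNING: Variable x is uninitialized.\n",
    "ERROR 22-322: Syntax error.\n",
]
hits = scan_log(log_lines)
message = summarize(hits)
```

Sending the "All is well!" message even when nothing is wrong is a useful design choice: a missing morning email then signals that the overnight process itself did not run.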
THANK YOU!
Breakout Session