1
Data Quality & Validation
Catherine Bauer-Martinez, Indiana University
Alvaro Andres Alvarez, Stanford
Heather Eng, University of Pittsburgh
New York City, Tuesday, August 15, 2017
2
1. Before Data Collection Starts
2. During Data Collection (internal)
3. During Data Collection (external)
3
1. Before Data Collection Starts
4
What is Metadata?
Data [information] that provides information about other data.
5
GOAL: Design data collection forms to meet study needs and ensure complete, correct, high-quality data (Data Dictionary and Project Setup).
Good data quality starts with a good database design…
Benefits!
- Fewer issues during the data capture phase.
- Lower future support burden for the REDCap administrator.
- Less time spent on data cleaning.
- Easier data sharing.
What are your recommendations and good practices before moving a project to production mode?
6
Inconsistencies in coding for yes/no questions.
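A minimal Python sketch of how this kind of inconsistency could be spotted, assuming the choices are written in the data dictionary style `1, Yes | 0, No`. The field names and choice strings are invented for illustration, and the helper functions are hypothetical, not part of REDCap.

```python
def parse_choices(raw):
    """Parse a choice string such as '1, Yes | 0, No' into {label: code}."""
    mapping = {}
    for part in raw.split("|"):
        if "," not in part:
            continue
        code, label = part.split(",", 1)
        mapping[label.strip().lower()] = code.strip()
    return mapping


def yes_no_codings(fields):
    """Group yes/no fields by the (yes_code, no_code) pair they use."""
    groups = {}
    for name, raw in fields.items():
        choices = parse_choices(raw)
        if set(choices) == {"yes", "no"}:
            groups.setdefault((choices["yes"], choices["no"]), []).append(name)
    return groups


# Hypothetical field names and choice strings, for illustration only.
fields = {
    "smoker": "1, Yes | 0, No",
    "diabetic": "1, Yes | 2, No",
    "consented": "1, Yes | 0, No",
    "sex": "1, Female | 2, Male",
}

codings = yes_no_codings(fields)
if len(codings) > 1:  # more than one coding scheme means an inconsistency
    for (yes_code, no_code), names in sorted(codings.items()):
        print(f"yes={yes_code}, no={no_code}: {', '.join(names)}")
```

Here `diabetic` is reported separately because it codes "No" as 2 while the other yes/no fields use 0.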
7
Forms not assigned to an event.
9
LOGIC FIELDS…
- Calculated Fields
- Branching Logic
- Automated Invitation Logic (ASI)
- Survey Queue
11
The project is sufficiently tested.
We recommend creating at least three test records and running at least one export in development mode. This lets you preview the kind of results to expect from the project. We also highly recommend reviewing the project's design with a statistician before entering production mode, to ensure your data capture is configured properly.
12
Most Common Issues We Found at Stanford
13
Quality Control Before Going to Production (Stanford)
- If research, PI first and last name.
- If research, IRB information.
- % of validated fields.
- Forms with more fields than recommended.
- Calculations using "Today".
- No fields tagged as identifiers.
- Inconsistencies in coding for positive/negative questions.
- Date format inconsistencies.
- “99” or “98” as the recommended coding of “other”, “unknown” or similar values in drop-down lists, radio buttons or checkboxes.
- Presence of the default "My First Instrument" form name.
Agree? Which other recommendations would you add to the list?
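Several of these checks can be run against the data dictionary itself. The Python sketch below is an illustration only, not the Stanford tool. It assumes each field is a dict keyed like a REDCap metadata export (`field_name`, `form_name`, `field_type`, `select_choices_or_calculations`, `text_validation_type_or_show_slider_number`, `identifier`), that identifiers are flagged with `"y"`, and that 100 fields per form is the limit; the slide does not state a number.

```python
from collections import Counter

MAX_FIELDS_PER_FORM = 100  # assumed threshold; the slide does not give a number


def review_dictionary(rows, max_fields=MAX_FIELDS_PER_FORM):
    """Return a list of human-readable findings for a list of field dicts."""
    findings = []

    # Share of text fields that have a validation type set.
    text_fields = [r for r in rows if r.get("field_type") == "text"]
    if text_fields:
        validated = sum(
            1 for r in text_fields
            if r.get("text_validation_type_or_show_slider_number")
        )
        pct = 100 * validated / len(text_fields)
        findings.append(f"{pct:.0f}% of text fields are validated")

    # No field carries the identifier flag.
    if not any(r.get("identifier") == "y" for r in rows):
        findings.append("No fields tagged as identifiers")

    # Calculated fields whose formula mentions 'today'.
    for r in rows:
        formula = r.get("select_choices_or_calculations", "")
        if r.get("field_type") == "calc" and "today" in formula.lower():
            findings.append(f"Calculation uses 'today': {r['field_name']}")

    # Oversized forms and the leftover default form name.
    for form, count in Counter(r.get("form_name") for r in rows).items():
        if count > max_fields:
            findings.append(f"Form '{form}' has {count} fields")
        if form == "my_first_instrument":
            findings.append("Default form name 'My First Instrument' is still present")

    return findings


# Invented three-field dictionary, for illustration only.
rows = [
    {"field_name": "record_id", "form_name": "my_first_instrument", "field_type": "text"},
    {"field_name": "dob", "form_name": "my_first_instrument", "field_type": "text",
     "text_validation_type_or_show_slider_number": "date_ymd"},
    {"field_name": "age", "form_name": "my_first_instrument", "field_type": "calc",
     "select_choices_or_calculations": "datediff([dob], 'today', 'y')"},
]
findings = review_dictionary(rows)
for line in findings:
    print(line)
```

Checks that need project-level information (PI name, IRB details) are left out because they are not part of the data dictionary.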
14
Why NOT Automate this?
… We created a tool for this.
Demo Time!
15
2. During Data Collection (internal)
16
Don’t underestimate how important it is to do your data right.
17
It pays to be patient….
18
Data validation in REDCap
19
Data Validation in REDCap
20
Data Quality Tool
REDCap has 8 pre-defined data quality rules that you can execute following data entry.
1. Missing values (excluding missing values due to branching logic)
2. Missing values for required fields only
3. Incorrect data type
4. Out-of-range values
5. Outliers for numerical fields
6. Hidden fields that contain values
7. Multiple choice fields with invalid values
8. Incorrect values for calculated fields
You can create customized rules as well.
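To make three of these rule types concrete (missing required values, incorrect data type, out-of-range values), here is a small Python sketch run on exported records. It is not how REDCap implements its rules. The record layout, field names and limits are invented for illustration.

```python
def check_records(records, required, numeric_limits):
    """Return (record_id, field, problem) tuples for three simple rule types."""
    issues = []
    for rec in records:
        rid = rec.get("record_id", "?")

        # Missing values for required fields only.
        for field in required:
            if not str(rec.get(field, "")).strip():
                issues.append((rid, field, "missing required value"))

        for field, (low, high) in numeric_limits.items():
            raw = str(rec.get(field, "")).strip()
            if not raw:
                continue  # blanks are reported by the required-field check only
            try:
                value = float(raw)
            except ValueError:
                issues.append((rid, field, "incorrect data type"))
                continue
            if not low <= value <= high:
                issues.append((rid, field, "out of range"))
    return issues


# Invented records, as they might look after a CSV export (all values are strings).
records = [
    {"record_id": "1", "age": "34", "weight_kg": "72.5"},
    {"record_id": "2", "age": "", "weight_kg": "abc"},
    {"record_id": "3", "age": "140", "weight_kg": "80"},
]
issues = check_records(
    records,
    required=["age"],
    numeric_limits={"age": (0, 120), "weight_kg": (20, 300)},
)
for issue in issues:
    print(issue)
```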
21
Data Exports, Reports and Stats
22
Data Exports, Reports and Stats
- Create reports to view all your data in a spreadsheet without having to export from the system.
- Serves as the search engine of the REDCap project.
- Use reports to check your data quality.
- Queries the database in real time and displays results in table format.
- Choose selected variables.
- Use filters to create reports.
- Reports are saved in the left navigation panel.
- Updates every time you click on a defined report.
- Edit reports as needed.
24
Best Practices
- Avoid “free” text fields.
- Define a data type for each variable.
- Use standard measures and codes.
- Do not mix data types (e.g., “428.0 heart failure patient had pneumonia”); put the code and the comment in separate fields.
- Use REDCap validation rules (set minimum and maximum values).
- Reduce the amount of missing data (!); avoid blanks.
- Be consistent throughout the study by using the same codes.
- Set up your database with the end in mind.
25
3. During Data Collection (external)
26
Using analysis software for complex data quality programs
Automated overnight process -> SAS Research Repository
- cURL+API export: form-specific .CSV files from REDCap
- “DBLOAD.sas” import: form-specific SAS datasets
- Additional external data (lab, specimen tracking, EMR)
- Other related REDCap projects
- Relate by keys (ID, date, timepoint, …)
- “EDITS.sas” quality control programs
- “REPORTS.sas” administrative reports
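The export step can be pictured as one POST request per form. The sketch below uses Python instead of cURL and only builds the request without sending it. The URL, token and form name are placeholders, and it assumes the standard REDCap record-export parameters (`token`, `content`, `format`, `type`, `forms[0]`).

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request


def build_export_request(api_url, token, form):
    """Build (but do not send) a POST request exporting one form as flat CSV."""
    payload = {
        "token": token,
        "content": "record",
        "format": "csv",
        "type": "flat",
        "forms[0]": form,
    }
    return Request(api_url, data=urlencode(payload).encode("utf-8"), method="POST")


# Placeholder URL, token and form name; nothing is sent here.
req = build_export_request(
    "https://redcap.example.org/api/", "PLACEHOLDER_TOKEN", "demographics"
)

# Decode the body again to show which fields would be posted.
sent = parse_qs(req.data.decode("utf-8"))
print(req.get_method(), req.full_url)
print(sorted(sent))
```

An overnight job would send each request (for example with `urllib.request.urlopen`) and write the response to a form-specific .CSV file for the import step.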
27
Using analysis software for complex data quality programs
“EDITS.sas” quality control programs
- Confirm REDCap point-of-entry validations
- Complex longitudinal checks
- Logical checks between multiple REDCap projects
- Consistency checks with non-REDCap data, e.g. laboratory specimen tracking, self-reported medications vs EHR
- Reports emailed to coordinators for correction in REDCap
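As one example of a longitudinal check, the Python sketch below flags visits whose date is not later than the same participant's previous timepoint. It is an illustration in Python, not the contents of “EDITS.sas”. The record layout (`id`, `timepoint`, `visit_date`) and the data are invented.

```python
from datetime import date


def visits_out_of_order(visits):
    """Return (participant_id, timepoint) pairs dated on or before the previous visit."""
    by_id = {}
    for v in visits:
        by_id.setdefault(v["id"], []).append(v)

    problems = []
    for pid, rows in by_id.items():
        ordered = sorted(rows, key=lambda v: v["timepoint"])
        # Compare each visit with the one at the preceding timepoint.
        for prev, cur in zip(ordered, ordered[1:]):
            if cur["visit_date"] <= prev["visit_date"]:
                problems.append((pid, cur["timepoint"]))
    return problems


# Invented visits, related by the keys ID and timepoint.
visits = [
    {"id": "A01", "timepoint": 1, "visit_date": date(2017, 1, 10)},
    {"id": "A01", "timepoint": 2, "visit_date": date(2017, 4, 12)},
    {"id": "B02", "timepoint": 1, "visit_date": date(2017, 2, 1)},
    {"id": "B02", "timepoint": 2, "visit_date": date(2017, 1, 15)},
]
problems = visits_out_of_order(visits)
print(problems)
```

A check like this cannot be expressed as a single-field validation at the point of entry, which is why it is run outside REDCap on the combined data.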
28
Using analysis software for complex data quality programs
“REPORTS.sas”
- High-level administrative reports
- Accrual and retention
- Forms and visit completeness
- Summary of outstanding QC issues
- Reports emailed to PIs and posted on study website
29
Using analysis software for complex data quality programs
“LogScanner.sas”
- Opens log file before cURL+API export
- Closes log file after reports are emailed and posted
- Scans log file for errors, warnings, unexpected events
- Sends email to DM each morning: either “Errors found … <details>” or “All is well!”
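The scanning idea can be sketched in a few lines of Python. This is not the actual “LogScanner.sas”. It assumes that problem lines start with `ERROR` or `WARNING`, as they do in SAS logs, and does not try to detect other "unexpected events". The sample log lines and the exact message wording are invented.

```python
import re

# Lines that begin with ERROR or WARNING (for example "ERROR:" or "ERROR 22-322:").
FLAGGED = re.compile(r"^(ERROR|WARNING)\b", re.IGNORECASE)


def scan_log(lines):
    """Return (line_number, text) for every flagged line in the log."""
    return [
        (number, line.rstrip())
        for number, line in enumerate(lines, start=1)
        if FLAGGED.match(line)
    ]


def morning_message(hits):
    """Build the body of the daily message from the scan results."""
    if not hits:
        return "All is well!"
    details = "\n".join(f"line {number}: {text}" for number, text in hits)
    return f"Errors found ({len(hits)}):\n{details}"


# Invented sample log lines, for illustration only.
log = [
    "NOTE: export step finished.",
    "WARNING: example warning text.",
    "ERROR: example error text.",
    "NOTE: reports step finished.",
]
hits = scan_log(log)
print(morning_message(hits))
```

Sending a message even when nothing is wrong means that a silent morning signals the overnight process itself did not run.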
30
THANK YOU! Breakout Session New York City Tuesday, August 15, 2017