Data Management in Clinical Research. Rosanne M. Pogash, MPA, Manager, PHS Data Management Unit. October 25, 2016. rpogash@psu.edu, 717-531-7689
Data Management Topics: Data collection instruments; Common data elements/standardization; Data dictionary or codebook; Database structure; Data capture; Data entry; Data verification; Data quality; Data errors; Data error resolution; Audit trail; Data quality monitoring; Data integrity; Participant confidentiality/privacy; Database security; Data audits; External data transfers; Data sharing; Data preservation
Presentation Focus…….
Definition of Data Quality (Institute of Medicine): Data that are fit for use; data that give the same result as error-free data. In the context of a research study: data that accurately represent the measurements included in the statistical analysis plan.
Definition of Data Quality (FDA-Clinical Trials Transformation Initiative, CTTI): The ability to effectively and efficiently answer the intended question about the benefits and risks of a medical product (therapeutic or diagnostic) or procedure while assuring protection of human participants.
Team Approach to Research Design. Quality Research Data: Principal Investigator(s); Content Experts; Statistician; Research Coordinator; Data Manager; IT Support; Regulatory Expert; Human Subjects Protection Expert
Public Health Sciences Contribution. Quality Research Data: Principal Investigator(s); Content Experts; Statistician; Research Coordinator; Data Manager; IT Support; Regulatory Expert; Human Subjects Protection Expert
Team Contributions: Statistical analysis; Critical data; Critical processes; Data collection; Electronic interfaces; Database design; Data cleaning; Study monitoring; Regulatory compliance
Sources of Research Data (feeding the Research Database): Face-to-face Interviews; Personal Devices; Scanners/Fax Machines; Forms/Surveys; Computer Assisted Telephone Interview (CATI); Other Databases; Physiological Measurements; Interactive Voice Response System (IVRS); Audio/Video Recordings; Research Team; Medical Records
Improve Data Quality: Prior to data collection; At the point of data entry; After data entry
Improve Data Quality: Prior to data collection
Prior to Data Collection: Define the data type for each variable. Categorical (defined response choices); Numeric; Date/Time; Text; Unique formats (telephone number, ZIP code, e-mail).
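One way to make the data type of each variable explicit is to write it into a small data dictionary and parse every raw value against it. The sketch below is only an illustration: the variable names, the ISO date format, and the U.S. ZIP code pattern are assumptions, not part of the presentation.

```python
import re
from datetime import datetime

# Hypothetical data dictionary: each variable declares its data type.
DATA_DICTIONARY = {
    "smoker": {"type": "categorical", "choices": {"0", "1", "9"}},
    "weight_kg": {"type": "numeric"},
    "visit_date": {"type": "date"},      # assumed format: YYYY-MM-DD
    "zip_code": {"type": "zip"},         # assumed U.S. ZIP or ZIP+4
    "comments": {"type": "text"},
}

def parse_value(variable, raw):
    """Convert a raw string to its declared type, or raise ValueError."""
    spec = DATA_DICTIONARY[variable]
    kind = spec["type"]
    if kind == "categorical":
        if raw not in spec["choices"]:
            raise ValueError(f"{variable}: {raw!r} is not a defined choice")
        return raw
    if kind == "numeric":
        return float(raw)  # float() raises ValueError for non-numeric text
    if kind == "date":
        return datetime.strptime(raw, "%Y-%m-%d").date()
    if kind == "zip":
        if not re.fullmatch(r"\d{5}(-\d{4})?", raw):
            raise ValueError(f"{variable}: {raw!r} is not a valid ZIP code")
        return raw
    return raw  # free text is stored unchanged
```

Declaring the type once, in one place, means the same definition can drive the form design, the entry checks, and the codebook.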
Prior to Data Collection: Minimize the number of variables allowing “free text” responses. Free text cannot be analyzed directly and will need to be coded prior to analysis. It is appropriate for describing “Other” when that choice is selected from a list of responses.
Prior to Data Collection: Define the format. Numeric fields: Integer; Number with one decimal place; Number with two decimal places, etc. Date fields: Month/Day/Year; Day/Month/Year; Year/Month/Day.
Prior to Data Collection: Define the format. Time fields: 12-hour clock with AM and PM designations; 24-hour clock; HH:MM; MM:SS; HH:MM:SS.
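Once a format has been chosen for a field, each entered value can be checked against it. The following is a minimal sketch that assumes a study has settled on Month/Day/Year dates and a 24-hour HH:MM clock; the constant and function names are hypothetical.

```python
import re
from datetime import datetime

DATE_FORMAT = "%m/%d/%Y"  # Month/Day/Year (assumed study convention)
TIME_FORMAT = "%H:%M"     # 24-hour clock, HH:MM (assumed study convention)

def matches_format(raw, fmt):
    """True if raw parses as a date or time written in the given format."""
    try:
        datetime.strptime(raw, fmt)
    except ValueError:
        return False
    return True

def has_decimal_places(raw, places):
    """True if raw is a number written with exactly `places` decimals."""
    if places == 0:
        pattern = r"-?\d+"
    else:
        pattern = rf"-?\d+\.\d{{{places}}}"
    return re.fullmatch(pattern, raw) is not None
```

With these definitions, "25/10/2016" is rejected as a Month/Day/Year date because 25 is not a valid month, which is exactly the kind of ambiguity a fixed format is meant to catch.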
Prior to Data Collection: Define the units of a numeric value. Pounds versus kilograms; inches versus centimeters; g/dL; mg/dL; U/L; mmHg.
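If a value may arrive in more than one unit, recording the unit with the value allows it to be converted to a single study unit before analysis. The sketch below assumes kilograms and centimeters as the study units; the function names and unit labels are hypothetical.

```python
KG_PER_LB = 0.45359237  # exact definition of the international pound
CM_PER_INCH = 2.54      # exact definition of the inch

def to_kilograms(value, unit):
    """Convert a weight recorded in 'kg' or 'lb' to kilograms."""
    if unit == "kg":
        return value
    if unit == "lb":
        return value * KG_PER_LB
    raise ValueError(f"unknown weight unit: {unit!r}")

def to_centimeters(value, unit):
    """Convert a height recorded in 'cm' or 'in' to centimeters."""
    if unit == "cm":
        return value
    if unit == "in":
        return value * CM_PER_INCH
    raise ValueError(f"unknown height unit: {unit!r}")
```

Raising an error for an unrecognized unit, rather than guessing, keeps a mislabeled measurement from silently entering the database.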
Prior to Data Collection: Code categorical variables. Ordinal (represents degree of measurement): 1 = Mild, 2 = Moderate, 3 = Severe; 1 = Strongly Disagree, 2 = Disagree, 3 = Agree, 4 = Strongly Agree.
Prior to Data Collection: Code categorical variables. Nominal (arbitrary; does not represent degree of measurement): 0 = No, 1 = Yes, 9 = Don’t know; 1 = Male, 2 = Female. Be consistent throughout the study by using the same codes.
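Consistency is easier to keep when each code list is defined once and reused by every variable that needs it. The sketch below uses the codes from the slides; the variable names and helper functions are hypothetical.

```python
# Code lists defined once and shared across the study.
SEVERITY = {1: "Mild", 2: "Moderate", 3: "Severe"}   # ordinal
YES_NO = {0: "No", 1: "Yes", 9: "Don't know"}        # nominal
SEX = {1: "Male", 2: "Female"}                       # nominal

# Hypothetical variables; both yes/no questions reuse the same code list.
CODEBOOK = {
    "pain_severity": SEVERITY,
    "smoker": YES_NO,
    "diabetic": YES_NO,
    "sex": SEX,
}

def label(variable, code):
    """Return the text label for a stored code."""
    return CODEBOOK[variable][code]

def encode(variable, text):
    """Return the code for a text response, ignoring case and outer spaces."""
    wanted = text.strip().lower()
    for code, code_label in CODEBOOK[variable].items():
        if code_label.lower() == wanted:
            return code
    raise ValueError(f"{variable}: {text!r} is not a defined response")
```

Because "smoker" and "diabetic" point at the same dictionary, 1 can never mean "Yes" for one question and "No" for another.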
Improve Data Quality: At the point of data entry
At the point of data entry: Choose a software application that minimizes data entry errors by allowing you to define the data type, define the format, and control permissible values (defined codes; ranges for numeric variables).
At the point of data entry: Microsoft Excel is an acceptable option ONLY if you use features that can minimize data entry errors: cell formatting, data validation, freeze panes. Microsoft Excel lacks: the ability for multiple users to enter data at the same time; an adequate audit trail recording who made edits and when; access security; role-based access.
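Whatever application is chosen, controlling permissible values comes down to checking each entry against a defined code list or numeric range. A minimal sketch follows; the variable names are hypothetical and the blood pressure range is illustrative only, not a clinical recommendation.

```python
# Hypothetical entry rules: defined codes or a numeric range per variable.
RULES = {
    "sex": {"codes": {1, 2}},
    "smoker": {"codes": {0, 1, 9}},
    "systolic_mmhg": {"min": 60, "max": 260},  # illustrative range only
}

def check_entry(variable, value):
    """Return a list of problems with one entered value (empty if none)."""
    rule = RULES[variable]
    problems = []
    if "codes" in rule and value not in rule["codes"]:
        problems.append(f"{variable}: {value!r} is not a permitted code")
    if "min" in rule and value < rule["min"]:
        problems.append(f"{variable}: {value} is below {rule['min']}")
    if "max" in rule and value > rule["max"]:
        problems.append(f"{variable}: {value} is above {rule['max']}")
    return problems
```

Returning the problems instead of rejecting the value outright lets the entry screen show a warning and ask the person entering data to confirm or correct it.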
Excel Data Validation
Excel Cell Formatting
At the point of data entry: Research Electronic Data Capture application (REDCap), www.ctsi.psu.edu/research-resources/redcap-home. Developed by a CTSA-supported group at Vanderbilt University; Penn State has its own license; maintained by HMC Research IT; free of charge to use; used by over 2060 institutions in 107 countries.
REDCap allows you to: build electronic data collection forms with variable validation, branching logic, and calculated fields; create a study database structure; create and distribute surveys; import electronic data from other sources; export data to common data analysis packages; review an audit trail of every action completed in the study database; create and view reports; store study-related documents; create and execute data quality rules; resolve data errors across individuals; view a graphical representation of data; control rights based on roles in the study.
Creating Forms in REDCap
Improve Data Quality: After data entry
After data entry: Verify that the data entry was completed accurately. Perform double data entry and compare the two entries for inconsistencies (REDCap has a double data entry option with a data comparison tool). Visually audit the electronic data by selecting a random sample of records.
After data entry: Create data quality rules to identify potential data errors: missing values; out-of-range values; problems with branching logic; illogical/inconsistent data.
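The idea behind double data entry can be sketched in a few lines: two people enter the same records independently, and every field where the two entries differ is listed for resolution. This is a generic illustration, not REDCap's comparison tool; the function names and the record layout (a dictionary of records keyed by record ID) are assumptions. A second helper draws a reproducible random sample of records for a visual audit.

```python
import random

def compare_entries(first, second):
    """List (record_id, field, first_value, second_value) for each mismatch."""
    discrepancies = []
    for record_id in sorted(set(first) | set(second)):
        a = first.get(record_id, {})
        b = second.get(record_id, {})
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                discrepancies.append(
                    (record_id, field, a.get(field), b.get(field))
                )
    return discrepancies

def audit_sample(record_ids, size, seed):
    """Pick `size` record IDs at random; the seed makes the pick repeatable."""
    return random.Random(seed).sample(sorted(record_ids), size)

# Example: the second entry transposed two digits of the weight.
entry_1 = {"001": {"weight_kg": 72.5, "sex": 1}}
entry_2 = {"001": {"weight_kg": 75.2, "sex": 1}}
print(compare_entries(entry_1, entry_2))
```

A record or field present in only one of the two entries is also reported, with None standing in for the missing side.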
After data entry: REDCap has 7 pre-defined data quality rules that you can execute following data entry: missing values (excluding missing values due to branching logic); missing values for required fields only; incorrect data type; out-of-range values; outliers for numerical fields; hidden fields that contain values; multiple choice fields with invalid values. You can create customized rules as well.
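Outside of REDCap, the same kinds of checks can be written as small rules that run over each record after entry. The sketch below covers three of them (missing required values, out-of-range values, and a field that should be hidden by branching logic but contains a value). It is not REDCap's implementation; the field names, the weight range, and the branching rule are assumptions for illustration.

```python
REQUIRED = ["record_id", "sex", "weight_kg"]   # hypothetical required fields
RANGES = {"weight_kg": (30, 250)}              # illustrative range only

def is_blank(value):
    """Treat None and the empty string as missing."""
    return value is None or value == ""

def run_rules(record):
    """Return a list of findings for one record (empty if it passes)."""
    findings = []
    for field in REQUIRED:
        if is_blank(record.get(field)):
            findings.append(f"missing required value: {field}")
    for field, (low, high) in RANGES.items():
        value = record.get(field)
        if not is_blank(value) and not (low <= value <= high):
            findings.append(f"out of range: {field}={value}")
    # Assumed branching logic: 'pregnant' is shown only when sex == 2 (Female).
    if record.get("sex") != 2 and not is_blank(record.get("pregnant")):
        findings.append("hidden field contains a value: pregnant")
    return findings
```

Each finding is a potential error rather than a confirmed one, so the output is a list to review against the source documents, not a set of automatic corrections.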
Example of a Study Design in REDCap
PHS Data Management Services: Develop data management plans; design data collection forms; design administrative forms to facilitate data collection and protocol adherence; provide design assistance to investigators/coordinators creating REDCap projects; create REDCap projects (forms, surveys, and data quality rules); create and implement testing plans for REDCap designs and data quality rules; perform data entry; perform study monitoring activities.
PHS Data Management Services: Up to 10 hours of services may be free using funds supported by the CTSI. Contact: Rosanne Pogash, Manager, PHS Data Management Unit, rpogash@psu.edu, 717-531-7689