Epi 202: Designing Clinical Research
Data Management for Clinical Research
Thomas B. Newman, MD, MPH
Professor of Epidemiology & Biostatistics and Pediatrics, UCSF
September 4,

Outline
- Data management steps
- Advantages of database vs spreadsheet entry
- REDCap demonstration
- Take-home message: Pretest should include data entry and analysis

Data Management Steps
- Design data collection form
- Capture data
- Enter data
- Clean data
- Then can do data analysis

Traditional Paper Method
- Data collection form design: Word
- Data capture: pen
- Data entry: keyboard transcription into Excel
- Data cleaning: painful

Questionnaire from TN’s DCR section
Oophorectomy

ID   oophorectomy
204  no
205  yes
207  no
208  no
209  no
211  no
212  yes
214  no
215  no
216  yes (one)
217  no
218  no
219  no

Advantage of paper form: ability to write in answers you had not anticipated.
Subject might leave it blank or guess if forced to choose.

Questionnaire from DCR
Race coding: Problems

ID   race
204  black
205  hispanic
207  Asian
208  white
209  latina
211  white
212  asian
214  white
215  white
216  black
217  black
218  hispanic
219  white

- Free text for "other": hispanic, latina
- "Asian" and "asian" are different values for a string variable

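To make the string-variable problem concrete, here is a minimal Stata sketch. It re-enters the race column from the slide by hand (the variable names `id`, `race`, and `race_lc` are illustrative, not taken from the course materials) and shows that an exact string match treats the two spellings as different values.

```stata
* Illustrative re-entry of the race column shown on the slide
clear
input int id str10 race
204 "black"
205 "hispanic"
207 "Asian"
208 "white"
209 "latina"
211 "white"
212 "asian"
214 "white"
215 "white"
216 "black"
217 "black"
218 "hispanic"
219 "white"
end

* String comparison is case-sensitive, so this counts only ID 212
count if race == "asian"

* tabulate lists "Asian" and "asian" as separate categories
tabulate race

* Lower-casing into a new variable merges the two spellings
generate race_lc = lower(race)
count if race_lc == "asian"
```

Whether "hispanic" and "latina" should also be merged is a coding decision for the investigator, not something the software can settle.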
Questionnaire from DCR
Weight change

ID   race      weight change  gain/lose
204  black     40             loose
205  hispanic  35             gain
207  Asian     2              blank (+/-)
208  white     10             gain
209  latina    5              gain
211  white     0              lose
212  asian     0
214  white     15             gain
215  white     10             loose
216  black     25             loose
217  black     0
218  hispanic  15             loose
219  white     5-10 pounds    loose

Data cleaning before transcription: study staff
- Different color ink
- Person making changes identified

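Data cleaning (Stata example)

    replace race = "Asian" if race == "asian"
    * weightchange is a string at this point: assign "7.5" as text, then convert
    replace weightchange = "7.5" if weightchange == "5-10 pounds"
    destring weightchange, replace
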
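Before cleaning by hand, it can help to let Stata find the entries that will not convert to numbers. The sketch below uses a few rows of the weight-change data; the variable names `weightchange` and `direction` are illustrative, and treating "loose" as a misspelling of "lose" is an assumption that study staff would need to confirm.

```stata
* A few illustrative rows from the weight-change slide
clear
input int id str12 weightchange str8 direction
204 "40" "loose"
205 "35" "gain"
208 "10" "gain"
211 "0" "lose"
219 "5-10 pounds" "loose"
end

* real() returns missing when a string is not a valid number,
* so this lists the entries that need a cleaning decision
list id weightchange if missing(real(weightchange))

* Apply the documented decision (midpoint of 5-10), then convert to numeric
replace weightchange = "7.5" if weightchange == "5-10 pounds"
destring weightchange, replace

* Assumed misspelling: merge "loose" into "lose"
replace direction = "lose" if direction == "loose"
tabulate direction
```

Keeping these commands in a do-file documents every change made to the raw data, which serves the same purpose as the different-color ink on paper forms.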
Questionnaire from DCR
Exercise

ID   exercise type      exercise frequency
204  walking            2-4times/week
205  stretch/walk       2-3 days/week
207  walking            3x
208  Curves             3-5 x/week
209  biking             every day
211  walking
212  walking            2x/week
     aerobic-resistant  5-6days/week
216  walking            2x/week
                        blank

These variables will be hard to analyze. This is what we are trying to avoid.

Data cleaning before transcription: study staff
Simple coding
Advantages of paper
- Rapid data entry anywhere
- Readily understood
- Permanent record
- Allows ready annotation

Disadvantages of paper
- No immediate quality control
- Branching logic harder
- Data entry required
- Allows you to postpone thinking about data analysis when you should be thinking about it now!

Consider data analysis early
- Restrict options
- Provide range and logic checks
- Include coding on the paper form
- PRETEST data entry and analysis!

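As an illustration of range and logic checks, the following Stata sketch flags suspect records after entry. The variable names, the 0 to 100 pound range, and the rule linking amount and direction are all assumptions chosen for the example, not rules from the course.

```stata
* Illustrative data: weight change in pounds and its direction
clear
input int id double weightchange str4 direction
204 40 "lose"
207 2 ""
211 0 "lose"
212 0 ""
end

* Range check: flag values outside an assumed plausible range.
* inrange() is false for missing values, so those are flagged too.
generate byte bad_range = !inrange(weightchange, 0, 100)

* Logic check: a nonzero change needs a direction,
* and a zero change should not have one
generate byte bad_logic = (weightchange > 0 & direction == "")
replace bad_logic = 1 if weightchange == 0 & direction != ""

* Records for study staff to review
list id weightchange direction if bad_range | bad_logic
```

Here IDs 207 and 211 are listed. An electronic form can apply the same rules at the moment of entry, which is the immediate quality control that paper lacks.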
Data Dictionary
- Variable name
- Type of variable (binary, integer, real, string, etc.)
- Variable label (longer name)
- Value labels (e.g., 0 = No, 1 = Yes)
- Permitted values
- Notes

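The elements of a data dictionary map fairly directly onto Stata commands. This is a minimal sketch for one binary variable; the variable name, label text, and note text are hypothetical.

```stata
* Illustrative data: binary variable stored as byte (0 = No, 1 = Yes)
clear
input int id byte oophorectomy
204 0
205 1
207 0
end

* Variable label (longer name)
label variable oophorectomy "Ever had an oophorectomy"

* Value labels
label define yesno 0 "No" 1 "Yes"
label values oophorectomy yesno

* Permitted values: stop with an error if anything other than
* 0, 1, or missing appears
assert inlist(oophorectomy, 0, 1) | missing(oophorectomy)

* Notes (hypothetical text)
notes oophorectomy: Coded from the paper questionnaire.

* Review name, type, labels, and observed values
describe oophorectomy
codebook oophorectomy
```

Writing the dictionary before data collection forces the decisions about types and permitted values that are otherwise postponed until analysis.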
Research Electronic Data Capture (REDCap)
- Design survey or data collection form
- Creates data dictionary
- Can track subjects and responses
- Exports to statistical packages
- Available with MyResearch account
- Other options: Access (PC), Epi-Info (PC), FileMaker Pro

REDCap demo
Home Page
My Projects
Project Setup
Online Survey Designer
Add New Field
New Question added
REDCap Creates a Stata do file

    clear
    insheet participant_id redcap_survey_timestamp redcap_survey_identifier mas_or_ticr want_attend_review dates_available___1 dates_available___2 dates_available___3 dates_available___4 field comments survey_complete using "DATA_DCR_FINAL_REVIEW_SESSION_SURVEY_COPY_2_TNEWMAN_ CSV", nonames
    label data "DATA_DCR_FINAL_REVIEW_SESSION_SURVEY_COPY_2_TNEWMAN_ CSV"
    label define mas_or_ticr_ 1 "No" 2 "Yes ===> Exit this survey"
    label define want_attend_review_ 1 "No ====> Exit this survey" 2 "Yes"
    label define dates_available___1_ 0 "Unchecked" 1 "Checked"
    label define field_ 1 "Clinical pharmacology" 2 "Community medicine" 3 "Dentistry" 4 "Dermatology" 5 "Emergency medicine" 6 "Endocrinology" 7 "Epidemiology/environmental health" 8 "Family medicine" 9 "Global health" 10 "Hospital medicine" 11 "Infectious disease" 12 …
    label variable mas_or_ticr "Are you in either the Masters Degree in Clinical Research program or the ATCR (Advanced Training in Clinical Research) program?"

Most Important Message:
- Pretest!

Questions and comments
Extra slides
Main decisions
- Electronic capture vs paper
- Optical form reading vs keyboard transcription
- Enter data into database, spreadsheet or statistical package
Highly recommended!

Advantages of database vs spreadsheet
- Restricts choices
- Error checking
- Can track study progress, produce reports, export to statistical package
- Safer: harder to accidentally alter data