1
Research Data Management with REDCap
2
Agenda
What can I do with a database?
Why use a database?
Research Data Management Best Practices
What are some software options and how do they compare?
What is REDCap?
Example of an existing REDCap
Guided Practice
On Own Practice
Review of On Own Practice
3
Databases-What can I do with them?
A collection of data stored in a way that makes them useful
  Systematically organized
Store data
  Many patients
  Variables for individual patients over time
Organize data in a way that makes it easy to enter and easy to use
Query the database (retrieve data)
  How many patients had an MI?
Extract datasets to analyze
  Outcomes, exposures, confounders and effect modifiers
4
Why use a database?
Allows for managing data
  Entry, manipulation, export
Allows access for those who need it
  Data entry
  Data export for analysis
May offer needed security
5
Why use a database?
Allows data for multiple subjects to be collected in a single place
Using a database, you can find associations that you can’t find with just a few patients.
Data can be aggregated
Example: Do antibiotics cause MI?
  We can’t tell from one observation
  Of 1,000 patients who had an MI, 50 (5%) were on antibiotics in the year before their MI
  Of 3,000 who did not have an MI, 160 (5.3%) were on an antibiotic
  No association between MI and antibiotic (p=0.74)
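The p-value in this example can be checked by hand. The sketch below assumes the slide used a chi-square test with Yates' continuity correction (the slide does not name the test); with that assumption the counts give a p-value of about 0.74. The function name chi_square_2x2 is made up for illustration.

```python
import math


def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square test of independence for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    stat = 0.0
    for o, e in zip(observed, expected):
        diff = abs(o - e)
        if yates:
            # Continuity correction: shrink each deviation by 0.5.
            diff = max(diff - 0.5, 0.0)
        stat += diff ** 2 / e
    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Rows: MI, no MI. Columns: on antibiotics, not on antibiotics.
stat, p = chi_square_2x2(50, 950, 160, 2840)
print(f"MI: {50 / 1000:.1%} on antibiotics; no MI: {160 / 3000:.1%}")
print(f"chi-square = {stat:.3f}, p = {p:.2f}")
```

The point of the slide stands either way: the comparison is only possible because all 4,000 patients sit in one queryable place.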
6
Why use a database?
A database allows you to VALIDATE, or set controls for, what data can be entered and how it can be entered.

Spreadsheet:
Study ID   Sex      Date of Birth   Age at enrollment
1          M        14/12/1980      30
2          F        11/26/1976      45
3          female   20/01/12

Database:
Study ID   Sex   Date of Birth   Age at enrollment
1          M     14/12/1980      30
2          F     26/11/1976      45
3          F     20/01/1912      103

The first example is a spreadsheet. Notice how Sex is sometimes just a letter and sometimes written out. The date of birth sometimes has the day first, sometimes the month first, and can have either a 2- or 4-digit year. Age is a calculated field; it comes out wrong in the spreadsheet because a 2-digit year defaults to the current century. The second example is a database, where every value follows one format.
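The kind of control a database applies at entry time can be sketched in a few lines. This is an illustration only, not how REDCap implements validation; the allowed sex codes, the function name validate_record and the enrollment date are assumptions made for the example.

```python
from datetime import date, datetime

ALLOWED_SEX = {"M", "F", "U"}  # hypothetical coding scheme


def validate_record(study_id, sex, dob_text, enrolled_on):
    """Check one row and return the cleaned values plus any errors."""
    errors = []
    if sex not in ALLOWED_SEX:
        errors.append(f"sex must be one of {sorted(ALLOWED_SEX)}, got {sex!r}")

    dob = None
    try:
        # Day first, 4-digit year: rejects 11/26/1976 and 20/01/12.
        dob = datetime.strptime(dob_text, "%d/%m/%Y").date()
    except ValueError:
        errors.append(f"date of birth must be DD/MM/YYYY, got {dob_text!r}")

    age = None
    if dob is not None:
        # Subtract one year if the birthday has not yet occurred.
        had_birthday = (enrolled_on.month, enrolled_on.day) >= (dob.month, dob.day)
        age = enrolled_on.year - dob.year - (0 if had_birthday else 1)

    return {"study_id": study_id, "sex": sex, "dob": dob,
            "age": age, "errors": errors}


enrolled = date(2015, 6, 1)  # assumed enrollment date, for illustration
rows = [(1, "M", "14/12/1980"), (2, "F", "11/26/1976"), (3, "female", "20/01/12")]
results = [validate_record(i, s, d, enrolled) for i, s, d in rows]
for r in results:
    print(r["study_id"], r["age"], r["errors"])
```

Rows 2 and 3 from the spreadsheet are rejected with a message instead of being stored in an inconsistent form, and age is derived from the validated date rather than typed in.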
7
Why use a database?
Databases maintain the INTEGRITY of a record, whereas spreadsheets can be sorted independently by columns.

Spreadsheet, sorted by date of birth:
Study ID   Sex   Date of Birth   Age at enrollment
1          M     20/01/1912      30
2          F     26/11/1976      45
3          F     14/12/1980      103

Database, sorted by date of birth:
Study ID   Sex   Date of Birth   Age at enrollment
3          F     20/01/1912      103
2          F     26/11/1976      45
1          M     14/12/1980      30

In the first example a spreadsheet was sorted by date of birth. Notice that only the date of birth column was sorted, so the study ID, sex and age at enrollment no longer match the date of birth. In the second example, a database was sorted by date of birth. Notice that the database kept the study ID, sex and age at enrollment with the correct corresponding date of birth.
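The difference between the two sorts can be reproduced in a short sketch. The records below are the three rows from the slide; the variable names are chosen for illustration.

```python
from datetime import date

records = [
    {"study_id": 1, "sex": "M", "dob": date(1980, 12, 14), "age": 30},
    {"study_id": 2, "sex": "F", "dob": date(1976, 11, 26), "age": 45},
    {"study_id": 3, "sex": "F", "dob": date(1912, 1, 20), "age": 103},
]

# Spreadsheet-style mistake: sort one column on its own, then put it back.
dobs_only = sorted(r["dob"] for r in records)
broken = [dict(r, dob=d) for r, d in zip(records, dobs_only)]

# Database-style sort: whole records move together.
intact = sorted(records, key=lambda r: r["dob"])

print("single-column sort:", [(r["study_id"], r["dob"].year) for r in broken])
print("record sort:       ", [(r["study_id"], r["dob"].year) for r in intact])
```

In the single-column result, study ID 1 is paired with the 1912 birth date although that patient was born in 1980. In the record sort, each birth date stays with its own study ID.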
8
Why use a database?
Security
  Can require authentication (login and password)
  Can be encrypted
  Can tell who accessed the data and what they did
Practicality
  Multiple users can enter data at the same time
  Allows large numbers of records (patients/visits) and columns (variables) compared to spreadsheets
Reliability
  Eliminates the possibility of multiple different versions of the data (happens frequently with spreadsheets)
9
Research Data Management Best Practices
10
Research Data Management Best Practices
Is there a specific question or hypothesis?
  If no, STOP and determine the key question(s).
Identify key variables to answer the question
  Outcomes
  Explanatory Variables/Exposures
  Secondary Explanatory Variables
    Confounders
    Effect Modifiers
11
Research Data Management Best Practices—Define Variables
Set objective criteria whenever possible
  For example: High blood pressure is defined as 3 consecutive blood pressure measurements with systolic > X.
Determine the source of the data
  Instrument, provider, patient, relative of patient
12
Research Data Management Best Practices—Determine Variable Types
Numeric (integer vs. decimal)
Categorical
Yes/No
Absence/Presence
Date (8JUL2016)
Time (13:42)
Datetime (8JUL2016 13:42)
13
Research Data Management Best Practices—Key Questions
Can the research question be answered with the variables included?
How will the data be analyzed?
What level of granularity is needed?
  Exact values v. groups of values
Will you need to derive analysis variables?
  Calculate age, BMI
Is it acceptable or necessary to specify ‘unknown’?
  Is there a difference between unknown and missing?
Are responses for a given variable mutually exclusive?
  Check boxes or individual questions v. radio buttons/dropdown
Unknown, missing and no, an example:
  No value for date of birth: does this mean that the person has no date of birth, or that this data was not collected?
  Do not allow a missing value to equal no (no tobacco use is different from not asking someone whether they use tobacco)
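The distinction between no, unknown and missing can be made concrete with a small sketch. The codes 0, 1 and 99 and the names TobaccoUse, parse_tobacco and summarize are hypothetical; the idea is only that a blank is never silently turned into 'no'.

```python
from enum import Enum
from typing import Optional


class TobaccoUse(Enum):
    NO = "0"
    YES = "1"
    UNKNOWN = "99"  # asked, but the answer was not known (hypothetical code)


def parse_tobacco(raw: Optional[str]) -> Optional[TobaccoUse]:
    """Blank stays missing (None); it is never converted to NO."""
    if raw is None or raw.strip() == "":
        return None
    return TobaccoUse(raw.strip())


def summarize(raw_answers):
    """Count each category separately so missing is visible in reports."""
    counts = {"yes": 0, "no": 0, "unknown": 0, "missing": 0}
    for raw in raw_answers:
        answer = parse_tobacco(raw)
        key = "missing" if answer is None else answer.name.lower()
        counts[key] += 1
    return counts


counts = summarize(["1", "0", "", "99", None, "0"])
print(counts)
```

Keeping four separate counts means an analysis of non-users is based only on people who actually answered no.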
14
Research Data Management Best Practices—Metadata
Always document!
Record ALL the information for the decisions you make
Ensure that this is ACCESSIBLE and READABLE for those entering, managing, and analyzing the data
15
Research Data Management Best Practices—Form Creation
Consistency
  Questions using the same answer choices should list the answers in the same order
Provide hints, clarifications and definitions
Phrase questions in the positive
  Yes: Did the patient complete the visit?
  No: Did the patient not complete the visit?
Avoid check boxes when possible
  Difficult to tell whether a box is unchecked because it was missed or because it was not applicable
16
Types of applications commonly used as database
Requires a license ($$)
  Microsoft Excel
  Microsoft Access
  SQL Server
  Oracle
Open Source/Free
  REDCap
  EpiInfo
  Open Office Calc
  Postgres
  MySQL
17
Selecting a database
Consider the tasks you will need to perform
  Data collection and editing?
  Data transformation/calculations?
  Basic vs. advanced statistics?
  Making figures?
A single platform may not be enough, but use as few as possible
[Comparison table. Rows: REDCap, Epi Info, Open Office, R/RStudio. Columns: Mac, PC, Data entry/Forms, Data editing, Transform data, Basic stats, Adv. stats, Figures. Most cell ratings were lost in extraction; the surviving fragments are "REDCap: web +++ - +" and "Epi Info: X ++".]
18
Benefits of some applications
REDCap
  Web-based point-and-click interface
  Requires little training
  HIPAA compliant (secure)
EpiInfo
  Easy to set up, well known in stats community
  Can use for data storage AND analysis
Excel/Open Office Calc
  Easy to set up
Postgres
  Can have multiple related tables
  Set acceptable formats within a column
  Powerful (many data points)
19
Potential Problems with some applications
Excel/Open Office Calc
  Uncertain data integrity
  Limited ability to validate data
  Problematic sorting (e.g. might only sort a single column)
  Limited in number of rows and columns
Postgres
  No data entry interface; one must be created elsewhere
  Requires SQL training to manipulate and extract data
EpiInfo
  Limited data validation functionality
  No longitudinal mode; must create multiple variables for each time point
REDCap
  Requires IT support to set up and maintain
  Needs an internet connection, at least for syncing
20
REDCap
A database with data entry forms that is supported by the University of Zimbabwe
Secure platform for data storage (protects patient data)
Accessible via the internet
You can make your own database!
21
REDCap Features
Develop forms through web interface or .csv file
Data formatting (dates v. numbers v. integers v. check boxes v. radio buttons/drop downs)
Hint fields
Data validation checks
Branching logic (controls field availability)
Import data from other sources
Export data to multiple formats (Excel, R, Stata…)
De-identification features
Some built-in reporting and visualization tools
Store documents (PDF, jpeg, doc)
User and group-level security
Demo of each feature to follow
22
Guided Practice Create a new project called ‘my first database’
23
Guided Practice Create a ‘demographics’ form
24
Guided Practice Create a ‘demographics’ form
First, press ‘create’, then the green box at the bottom will appear, where you can enter the name of the form. Once you enter the name, press ‘create’. After it is created, click on the form name, so that you can open it up and modify it.
25
Guided Practice Add the following field: Patient First Name
Text Box (short text) Required Identifier
26
Guided Practice Create the field: Study ID
You should always start the first form with a common identifier that will be used to link multiple forms
27
Guided Practice Add the following field: Patient First Name
Text Box (short text) Required Identifier
28
Guided Practice Repeat Steps for Patient Last Name
Text Box (short text) Required Identifier
29
Guided Practice Add the following field: Birthdate
Text box (short text) Date format ‘D-M-Y’ Required Identifier
30
Guided Practice Add the following field: Sex
Use a drop-down for the options: Male, Female, Unknown. Make this field required. Note that you do not have to assign the numeric values; REDCap will assign them for you. If you want to use different values, you can enter them yourself.
31
Guided Practice Create a new form called ‘Baseline Visit’
32
Guided Practice Add the following field Date of visit
Date format ‘D-M-Y’ Required Identifier
33
Guided Practice Create a section before Date of Visit called ‘Visit Information’
34
Guided Practice Add the following field: Were vitals obtained?
Yes/no field Required
35
Guided Practice Add the following field: Heart Rate Format: integer
Minimum: 0 Maximum: 500
36
Guided Practice Add the following field: Temperature
Format: one decimal place Hint text: Celsius Minimum: 0 Maximum: 45
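The checks configured for Heart Rate and Temperature amount to a format test plus a range test. The sketch below imitates them outside REDCap; it is not REDCap's own code, the function names are made up, and it assumes that 'one decimal place' means exactly one digit after the decimal point.

```python
import re


def check_integer(text, minimum, maximum):
    """Return an error message, or None if text is an in-range integer."""
    if not re.fullmatch(r"-?\d+", text):
        return f"{text!r} is not an integer"
    value = int(text)
    if not minimum <= value <= maximum:
        return f"{value} is outside {minimum} to {maximum}"
    return None


def check_one_decimal(text, minimum, maximum):
    """Return an error message, or None if text has one decimal and is in range."""
    if not re.fullmatch(r"-?\d+\.\d", text):
        return f"{text!r} does not have exactly one decimal place"
    value = float(text)
    if not minimum <= value <= maximum:
        return f"{value} is outside {minimum} to {maximum}"
    return None


# Limits taken from the two guided-practice fields.
print(check_integer("72", 0, 500))        # heart rate, accepted
print(check_integer("501", 0, 500))       # heart rate, out of range
print(check_one_decimal("37.2", 0, 45))   # temperature, accepted
print(check_one_decimal("37", 0, 45))     # temperature, wrong format
```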
37
Guided Practice Add a section between Date of Visit and Heart Rate and call it ‘Vital Signs’
38
Guided Practice This is what the form now looks like in online designer
39
Guided Practice Add branching logic
Click on the double green arrows in the Heart Rate field. In the box that appears, go to the bottom section, called Drag-N-Drop Logic Builder, click on the field/value set: vitals_obtained = Yes (1), and drag this to the next column
40
Guided Practice Add branching logic with advanced syntax to the temperature field
This time, instead of using the ‘drag and drop’ logic builder, use the Advanced logic syntax. Type [vitals_obtained] = '1'
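What the branching rule does at data entry can be imitated with a small lookup. This is a sketch of the behaviour, not REDCap's logic parser: it handles only a single 'field equals value' rule, and the names BRANCHING, FIELDS and visible_fields are hypothetical.

```python
# Hypothetical rule table: field -> (controlling field, required value).
# Mirrors [vitals_obtained] = '1' on heart_rate and temperature.
BRANCHING = {
    "heart_rate": ("vitals_obtained", "1"),
    "temperature": ("vitals_obtained", "1"),
}

FIELDS = ["visit_date", "vitals_obtained", "heart_rate", "temperature"]


def visible_fields(record):
    """Return the fields that should be shown for this record."""
    shown = []
    for field in FIELDS:
        rule = BRANCHING.get(field)
        if rule is None:
            shown.append(field)  # no rule: always shown
            continue
        controller, required = rule
        if record.get(controller) == required:
            shown.append(field)
    return shown


print(visible_fields({"vitals_obtained": "1"}))  # Yes: all four fields
print(visible_fields({"vitals_obtained": "0"}))  # No: vitals stay hidden
print(visible_fields({}))                        # unanswered: vitals stay hidden
```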
41
Guided Practice Here is what the form now looks like in online designer. Notice the text that indicates branching logic exists.
42
Guided Practice Here is what it looks like if you use the preview in Online Designer
43
Guided Practice: Data Entry
Here is what the form looks like when you open it to enter data
44
Guided Practice Once you answer the vitals obtained question, the heart rate and temperature become visible.
45
Guided Practice-Graphical Data View
46
Guided Practice-Graphical Data View
47
Guided Practice-Graphical Data View
Since Patient Last Name is free-text, the graphical view only tells you how many records there are and how many are missing. Since the sex variable is a drop down, you can see a bar chart of the values, in addition to how many unique values are in the data (in this case, only 2 of the 3)
48
Guided Practice-Graphical Data View
If we look at the other form, we see additional types of data. Here, you can see descriptive statistics for a continuous variable (heart rate), with a plot of the points. The median value is in red. Note that it indicates that one value is missing, which is 20% of the data. This is important to note when you are analyzing data, and you may need to go back and collect this missing value, if it is still available.
49
Guided Practice-Graphical Data View
50
Guided Practice-Exporting Data
Click on ‘Data Export Tool’. You can choose either the simple export (all data) or an advanced export, where you choose the fields.
51
Guided Practice-Exporting data
Here are all the formats you can export the data in. Click on the file type(s) that you prefer in the right-hand column
52
Guided Practice-Importing a Data Dictionary
First, export the existing data dictionary. Do this by going to Project Setup and then clicking on the ‘Download current data dictionary’ hypertext.
53
Guided Practice-Importing a Data Dictionary
Open the .csv file (opens in Excel or Open Office Calc). Enter two new fields. The first is sbp (systolic blood pressure):
  Form = baseline_visit
  Field Type = text
  Field Label = Systolic Blood Pressure
  Field Note = mmHg
  Text Validation = integer
  Text Validation min = 10
  Text Validation max = 250
  Branching Logic = [vitals_obtained]='1'
Create the second variable, dbp (diastolic blood pressure), with all the same information except the name and label. Save the file.
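The same edit can be scripted instead of typed into a spreadsheet. The sketch below is an assumption-laden illustration: the column headers in EXPORTED are a shortened stand-in for a real data dictionary (an actual export has more columns, and the exact header text should be copied from the file you downloaded), and the helper names are made up.

```python
import csv
import io

# Stand-in for the exported data dictionary. Header names are assumed for
# illustration; use the exact headers from your own export.
EXPORTED = (
    "Variable / Field Name,Form Name,Field Type,Field Label,Field Note,"
    "Text Validation Type OR Show Slider Number,Text Validation Min,"
    "Text Validation Max,Branching Logic (Show field only if...)\n"
    "vitals_obtained,baseline_visit,yesno,Were vitals obtained?,,,,,\n"
)


def blood_pressure_row(name, label):
    """Build one data dictionary row with the settings from the slide."""
    return {
        "Variable / Field Name": name,
        "Form Name": "baseline_visit",
        "Field Type": "text",
        "Field Label": label,
        "Field Note": "mmHg",
        "Text Validation Type OR Show Slider Number": "integer",
        "Text Validation Min": "10",
        "Text Validation Max": "250",
        "Branching Logic (Show field only if...)": "[vitals_obtained]='1'",
    }


def add_fields(csv_text, new_rows):
    """Append rows to the dictionary, keeping the file's own column order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames
    rows = list(reader)
    for row in new_rows:
        unknown = set(row) - set(fieldnames)
        if unknown:
            raise ValueError(f"columns not in the export: {sorted(unknown)}")
        rows.append(row)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


updated = add_fields(EXPORTED, [
    blood_pressure_row("sbp", "Systolic Blood Pressure"),
    blood_pressure_row("dbp", "Diastolic Blood Pressure"),
])
print(updated)
```

Reading the header from the exported file, rather than hard-coding a column order, keeps the upload compatible with whatever columns your REDCap version produces.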
54
Guided Practice-Importing a Data Dictionary
Next, click on ‘Upload data dictionary’.
55
Guided Practice-Importing a Data Dictionary
Now, import the .csv file you just modified. Click ‘Browse’ and find the file. Then press ‘Upload file’.
56
Guided Practice-Importing a Data Dictionary
You will be asked to commit changes from the import.
57
Guided Practice-Importing a Data Dictionary
You will be told that the import was successful. If you do not receive this message, then you will need to troubleshoot what is wrong.
58
Guided Practice-Importing a Data Dictionary
Here is what the form looks like now in online designer.