Published by Darlene Preston. Modified over 6 years ago.
1
Collaborative Data Management for Longitudinal Studies
Stephen Brehm [coauthors: L. Philip Schumm & Ronald A. Thisted]
University of Chicago
(Supported by National Institute on Aging Grant P01 AG A1)
2
Agenda
1. Background on Study
2. Problem – Data Management Deficiencies
3. Solution – Collaborative Data Management
4. STATA Programs – maketest & makedata
3
Background on Study
NIH-funded Longitudinal Study
Loneliness & Health
Thousands of Measures (Loneliness, Depression)
230 subjects
Repeated Yearly
4
Problem – Data Management Deficiencies
Code Not Modular
…Difficult to manage the data cleaning code
…Limited code reuse from year to year
…Difficult to collaborate among interns
No Established Set of Data Cleaning Steps
…Difficult for research assistants (turn-over)
…Inconsistent data cleaning techniques
…Data cleaning code difficult to read
5
Problem – Data Management Deficiencies
[Figure: diagram labeled with five "Research Assistant" nodes and one "Core File Set"]
8
Solution – Collaborative Data Management
Process
…Established Steps
…File System Layout
…Automated Tests
Collaboration Concepts
…Module (Ex: loneliness)
…Batch (Ex: yr1, yr2, yr3)
…“Data Certification”
STATA Programs
…maketest
…makedata
9
Solution – Collaborative Data Management
Set of Files for Each Module
Step            File(s)
Acquire & Fix   acquire-[module].do & fix-[module].do
Test            test-[module].do
Derive          derive-[module].do
Label           label-[module].do
Year-Specific
60% Code Reuse – Files Shared Between Years
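As a rough illustration of the naming convention on this slide, the sketch below lists the do-files one module would need. It is written in Python rather than Stata purely for illustration; the helper name `module_files` is hypothetical and is not one of the authors' programs.

```python
# Hypothetical helper: expand the per-module file naming convention from the slide.
# The step names and their order are taken from the slide; everything else is assumed.
STEPS = ["acquire", "fix", "test", "derive", "label"]

def module_files(module):
    """Return the do-file names for one module, one file per step."""
    return [f"{step}-{module}.do" for step in STEPS]

print(module_files("loneliness"))
```

Keeping the file names predictable in this way is what lets a driver program locate each step of a module without a hand-maintained list.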
10
STATA Program – maketest
Purpose: Auto-generation of Data Certifying Tests
Functionality:
…Tests Variable Type
…Checks Consistency of Value Labels
…Verifies Existence of Variable
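The three kinds of check listed above can be sketched in a few lines. This is a language-neutral illustration in Python, not the maketest implementation: the dataset model (a dict of variables with a type and optional value labels), the function names `make_spec` and `certify`, and the variables `lonely` and `id` are all assumptions made for the example.

```python
# Illustrative only: a dataset is modeled as {variable: {"type": ..., "labels": {...}}}.

def make_spec(reference):
    """Record the type and value labels of each variable in a reference dataset."""
    return {
        name: {"type": info["type"], "labels": dict(info.get("labels", {}))}
        for name, info in reference.items()
    }

def certify(dataset, spec):
    """Return a list of failure messages; an empty list means the data pass."""
    failures = []
    for name, expected in spec.items():
        if name not in dataset:                      # existence check
            failures.append(f"{name}: variable missing")
            continue
        actual = dataset[name]
        if actual["type"] != expected["type"]:       # type check
            failures.append(f"{name}: type {actual['type']}, expected {expected['type']}")
        if actual.get("labels", {}) != expected["labels"]:   # value label check
            failures.append(f"{name}: value labels differ")
    return failures

# Hypothetical data for two yearly batches.
yr1 = {
    "lonely": {"type": "numeric", "labels": {1: "never", 2: "sometimes"}},
    "id": {"type": "string"},
}
yr2 = {
    "lonely": {"type": "numeric", "labels": {1: "never", 2: "often"}},
}

spec = make_spec(yr1)
print(certify(yr1, spec))
print(certify(yr2, spec))
```

Deriving the specification from one batch and applying it to the next is one way such tests could catch drift between years; whether maketest works this way is not stated in the slides.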
11
STATA Program – maketest
Syntax: maketest [varlist] using filename, [REQuire(varlist) append replace]
Example: maketest using filename.do, replace
Options:
…using: specifies file to write
…REQuire: requires presence of variables in list
…append: add to existing test .do file
…replace: overwrite existing .do file
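The append and replace options describe how an existing test file is treated. The Python sketch below mimics that behavior under stated assumptions: the function `write_test_file` is hypothetical, the REQuire option is not modeled, and the generated lines use Stata's `confirm variable` command only as a plausible stand-in, since the slides do not show what maketest actually writes.

```python
import os
import tempfile

def write_test_file(path, variables, append=False, replace=False):
    """Write one existence check per variable (assumed output format)."""
    if append and replace:
        raise ValueError("choose either append or replace, not both")
    if os.path.exists(path) and not (append or replace):
        # Refuse to clobber an existing file unless told how to handle it.
        raise FileExistsError(f"{path} already exists; pass append or replace")
    mode = "a" if append else "w"
    with open(path, mode) as f:
        for var in variables:
            f.write(f"confirm variable {var}\n")

# Hypothetical usage with made-up variable names.
path = os.path.join(tempfile.mkdtemp(), "test-loneliness.do")
write_test_file(path, ["lonely"])             # file does not exist yet
write_test_file(path, ["id"], append=True)    # add to the existing file
with open(path) as f:
    print(f.read())
```

Refusing to overwrite by default follows the usual Stata convention for commands that write files.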
12
STATA Program – makedata
“Bringing it all together”
13
STATA Program – makedata
Syntax: makedata [namelist], Pattern(string) [replace clear Noisily Batch(namelist) TESTonly]
Example: makedata ats, p("acquire-*.do") b(yr1) clear replace
Options:
…p: pattern – file naming convention
…replace: overwrite existing data file
…clear: clear current data in memory
…Noisily: full output (default = summary)
…b: batch – year, wave, center
…TESTonly: only run tests step
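One plausible reading of the pattern and batch options is sketched below in Python. It assumes that `*` in the pattern stands for a module name and that each batch is a subdirectory; neither assumption is confirmed by the slides, and the function `expand` is hypothetical rather than part of makedata.

```python
import posixpath

def expand(pattern, names, batch=None):
    """Substitute each module name for '*' and prefix the batch directory, if any."""
    files = []
    for name in names:
        filename = pattern.replace("*", name)   # assumed meaning of '*'
        files.append(posixpath.join(batch, filename) if batch else filename)
    return files

# Mirrors the slide's example: makedata ats, p("acquire-*.do") b(yr1)
print(expand("acquire-*.do", ["ats"], batch="yr1"))
```

Under these assumptions the example on the slide would resolve to a single acquire file for the ats module in the yr1 batch.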
14
Other Applications Beyond Longitudinal Data
Teaching Data Cleaning with STATA
Contact Information
Stephen Brehm:
L. Philip Schumm:
Ronald A. Thisted:
Supported by National Institute on Aging Grant P01 AG A1