Collaborative Data Management for Longitudinal Studies


Collaborative Data Management for Longitudinal Studies
Stephen Brehm [coauthors: L. Philip Schumm & Ronald A. Thisted]
University of Chicago
(Supported by National Institute on Aging Grant P01 AG18911-01A1)

Agenda
1. Background on Study
2. Problem – Data Management Deficiencies
3. Solution – Collaborative Data Management
4. STATA Programs – maketest & makedata

Background on Study
  NIH-funded Longitudinal Study
  Loneliness & Health
  Thousands of Measures
    Loneliness
    Depression
  230 subjects
  Repeated Yearly

Problem – Data Management Deficiencies
  Code Not Modular
    …Difficult to manage the data cleaning code
    …Limited code reuse from year to year
    …Difficult to collaborate among interns
  No Established Set of Data Cleaning Steps
    …Difficult for research assistants (turn-over)
    …Inconsistent data cleaning techniques
    …Data cleaning code difficult to read

Problem – Data Management Deficiencies
  [Diagram: five Research Assistant nodes and one Core File Set]

Solution – Collaborative Data Management
  Process
    Established Steps
    File System Layout
    Automated Tests
    Collaboration
  Concepts
    Module (Ex: loneliness)
    Batch (Ex: yr1, yr2, yr3)
    “Data Certification”
  STATA Programs
    maketest
    makedata

Solution – Collaborative Data Management
  Set of Files for Each Module
    acquire-[module].do & fix-[module].do
    test-[module].do
    derive-[module].do
    label-[module].do
  Year-Specific
  60% Code Reuse – Files Shared Between Years
  Steps: Acquire & Fix, Test, Derive, Label
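
The per-module file set above can be run by a small driver loop. The following is only a sketch, not the authors' makedata code: it assumes the files sit in the current working directory, uses the module name "loneliness" from the earlier slide, and assumes that a missing step file is skipped rather than treated as an error.

```stata
* Hypothetical driver: run one module's do-files in the order the slide lists them.
* The module name and the skip-if-missing behaviour are assumptions for illustration.
local module "loneliness"
foreach step in acquire fix test derive label {
    local dofile "`step'-`module'.do"
    capture confirm file "`dofile'"
    if _rc {
        display as text "skipping `dofile' (not found)"
        continue
    }
    display as text "running `dofile'"
    do "`dofile'"
}
```

Because do stops at the first error, a failed assertion in the test step would halt the run before the derive and label steps execute.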

STATA Program – maketest
  Purpose: Auto-generation of Data Certifying Tests
  Functionality:
    Tests Variable Type
    Checks Consistency of Value Labels
    Verifies Existence of Variable

STATA Program – maketest
  Syntax:
    maketest [varlist] using filename [, REQuire(varlist) append replace]
  Example:
    maketest using filename.do, replace
  Options:
    using: specifies file to write
    REQuire: requires presence of variables in list
    append: add to existing test .do file
    replace: overwrite existing .do file
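
The slides do not show what a generated test file looks like. As a rough illustration only, the three kinds of checks listed for maketest (existence, type, value labels) can be written with built-in Stata commands. The variable names, the label name agree4, and the toy data below are invented for this sketch, and reading "consistency of value labels" as "the expected label is attached and covers the observed values" is an assumption.

```stata
* Hypothetical sketch of data-certifying checks; not actual maketest output.
* Toy data so the checks can run on their own.
clear
input byte lonely1 str8 subjid
1 "A001"
3 "A002"
end
label define agree4 1 "Never" 2 "Rarely" 3 "Sometimes" 4 "Always"
label values lonely1 agree4

* 1. Variables exist
confirm variable lonely1 subjid

* 2. Variables have the expected type
confirm numeric variable lonely1
confirm string variable subjid

* 3. The expected value label is attached
local attached : value label lonely1
assert "`attached'" == "agree4"

* 4. Every nonmissing value lies within the range covered by the label
quietly label list agree4
assert inrange(lonely1, r(min), r(max)) if !missing(lonely1)
```

Each confirm or assert exits with an error when its condition fails, so a do-file of such lines certifies the data simply by running to completion.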

STATA Program – makedata “Bringing it all together”

STATA Program – makedata
  Syntax:
    makedata [namelist], Pattern(string) [replace clear Noisily Batch(namelist) TESTonly]
  Example:
    makedata ats, p("acquire-*.do") b(yr1) clear replace
  Options:
    p: pattern – file naming convention
    replace: overwrite existing data file
    clear: clear current data in memory
    Noisily: full output (default = summary)
    b: batch – year, wave, center
    TESTonly: only run tests step
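
How makedata uses its pattern internally is not described in the slides. One plausible way to turn a file-naming pattern such as "acquire-*.do" into a list of modules is Stata's dir extended macro function. The sketch below is an assumption for illustration, not the program's actual logic, and it looks only in the current working directory.

```stata
* Hypothetical: list the do-files matching a pattern and recover the module names.
local pattern "acquire-*.do"
local files : dir "." files "`pattern'"
foreach f of local files {
    * Strip the step prefix and the extension to get the module name
    local module = subinstr("`f'", "acquire-", "", 1)
    local module = subinstr("`module'", ".do", "", 1)
    display as text "`f' -> module `module'"
}
```

If no files match, the list is empty and the loop body never runs.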

Other Applications
  Beyond Longitudinal Data
  Teaching Data Cleaning with STATA

Contact Information
  Stephen Brehm: sbrehm@uchicago.edu
  L. Philip Schumm: pschumm@uchicago.edu
  Ronald A. Thisted: thisted@health.bsd.uchicago.edu
  Supported by National Institute on Aging Grant P01 AG18911-01A1