Collaborative Data Management for Longitudinal Studies
Stephen Brehm [coauthors: L. Philip Schumm & Ronald A. Thisted], University of Chicago

Presentation transcript:

Collaborative Data Management for Longitudinal Studies Stephen Brehm [coauthors: L. Philip Schumm & Ronald A. Thisted] University of Chicago (Supported by National Institute on Aging Grant P01 AG A1)

Agenda
1. Background on Study
2. Problem – Data Management Deficiencies
3. Solution – Collaborative Data Management
4. STATA Programs – maketest & makedata

Background on Study
NIH-funded Longitudinal Study
Loneliness & Health
Thousands of Measures
–Loneliness
–Depression
230 subjects
Repeated Yearly

Problem – Data Management Deficiencies
Code Not Modular
…Difficult to manage the data cleaning code
…Limited code reuse from year to year
…Difficult to collaborate among interns
No Established Set of Data Cleaning Steps
…Difficult for research assistants (turn-over)
…Inconsistent data cleaning techniques
…Data cleaning code difficult to read

Problem – Data Management Deficiencies
[Diagram: five boxes labeled "Research Assistant" and one labeled "Core File Set"]

Solution – Collaborative Data Management
Process
–Established Steps
–File System Layout
–Automated Tests
–Collaboration
Concepts
–Module (Ex: loneliness)
–Batch (Ex: yr1, yr2, yr3)
–“Data Certification”
STATA Programs
–maketest
–makedata

Solution – Collaborative Data Management
Set of Files for Each Module
–acquire-[module].do & fix-[module].do
–test-[module].do
–derive-[module].do
–label-[module].do
[Diagram: Acquire & Fix, Derive, Test, Label; annotated "Year-Specific" and "60% Code Reuse – Files Shared Between Years"]
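The per-module file naming convention above can be checked mechanically. The following is a minimal sketch, not part of the presented programs: the module name loneliness is taken from the slides' example, while the step order and the idea of looking in the current working directory are assumptions made for illustration. It uses only built-in Stata commands.

```stata
* Sketch: report which of the five per-module do-files exist.
* Assumptions: files live in the current directory; the step
* order follows the file list on the slide.
local module "loneliness"
local missing 0
foreach step in acquire fix test derive label {
    local f "`step'-`module'.do"
    * confirm file sets _rc to nonzero when the file is absent
    capture confirm file "`f'"
    if _rc {
        display as text "missing: `f'"
        local ++missing
    }
    else {
        display as text "found:   `f'"
    }
}
display as text "`missing' of 5 expected files are missing"
```

Run from a module's directory, this gives a quick completeness check before any cleaning code is executed.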

STATA Program – maketest
Purpose:
–Auto-generation of Data Certifying Tests
Functionality:
–Tests Variable Type
–Checks Consistency of Value Labels
–Verifies Existence of Variable
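The three kinds of checks listed above could look roughly like the following in a test do-file. This is a sketch and not actual maketest output: the toy data, the variable names id and lonely, and the value label yesno are all hypothetical. Only built-in Stata commands (confirm, assert, and macro extended functions) are used.

```stata
* Hypothetical data standing in for one module's cleaned file
clear
input id lonely
1 1
2 0
3 1
end
label define yesno 0 "No" 1 "Yes"
label values lonely yesno

* Verify existence of the required variables
confirm variable id lonely

* Test variable type
confirm numeric variable lonely

* Check that the expected value label is attached to the variable
local vl : value label lonely
assert "`vl'" == "yesno"

* Check that the label text for a given code is as expected
local txt : label yesno 1
assert "`txt'" == "Yes"
```

Each command stops with an error if its condition fails, so a do-file of this shape either runs to completion or halts at the first failed check.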

STATA Program – maketest
Syntax:
–maketest [varlist] using, [REQuire(varlist) append replace]
Example:
–maketest using filename.do, replace
Options:
–using: specifies file to write
–REQ: requires presence of variables in list
–append: add to existing test.do file
–replace: overwrite existing .do file

STATA Program – makedata
“Bringing it all together”

STATA Program – makedata
Syntax:
–makedata [namelist], Pattern(string) [replace clear Noisily Batch(namelist) TESTonly]
Example:
–makedata ats, p("acquire-*.do") b(yr1) clear replace
Options:
–p: pattern – file naming convention
–replace: overwrite existing data file
–clear: clear current data in memory
–Noisily: full output (default = summary)
–b: batch – year, wave, center
–TESTonly: only run tests step
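One plausible reading of the pattern option is that the "*" stands for the module name, so the example call would resolve to acquire-ats.do. The sketch below illustrates that reading only; it is an assumption, not the documented behavior of makedata, and the slides do not say how the batch value enters the file lookup, so it is only displayed here.

```stata
* Hypothetical values mirroring the example call on the slide
local module  "ats"
local pattern "acquire-*.do"
local batch   "yr1"

* Assumption: "*" in the pattern is replaced by the module name
local dofile : subinstr local pattern "*" "`module'"
display as text "module `module', batch `batch': resolved file `dofile'"

* Report whether the resolved file exists in the current directory
capture confirm file "`dofile'"
if _rc {
    display as text "`dofile' not found"
}
else {
    display as text "`dofile' found"
}
```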

Other Applications
Beyond Longitudinal Data
Teaching Data Cleaning with STATA
Contact Information
–Stephen Brehm:
–L. Philip Schumm:
–Ronald A. Thisted:
Supported by National Institute on Aging Grant P01 AG A1