New Infrastructure for Harmonized Longitudinal Data with MIDUS and DDI 3.2 Project that MIDUS is working on with Colectica using DDI 3.2 to deliver harmonized datasets. IASSIST 2014 - Toronto
Overview of Presentation Background on MIDUS Importance of DDI for Harmonization Facilitating discovery and complex analysis Current Project Goals Implementation of Project Goals Upgrading MIDUS from DDI 3.1 to 3.2 Building on the MIDUS-Colectica Repository
Background on Baseline: 1995-96 Harvard University MacArthur Foundation N=7,108 Ages 25-74 Satellite studies of stress, cognition
MIDUS: Unique Characteristics Multiple waves (9-10 year interval) Multiple samples/cohorts 1 National (MIDUS Core) 2 Milwaukee 3 Japan (MIDJA) 4 National (MIDUS Refresher) 5 Milwaukee (Refresher) Multidisciplinary design Aging as integrated bio-psycho-social process Result N=11,500 34,000 variables
MIDUS: Unique Characteristics Multiple waves (9-10 year interval) Multiple samples/cohorts Multidisciplinary design Wide use of MIDUS – Open Data philosophy #1 data download at NACDA Top 10 data download at ICPSR 530+ publications
Status of Current DDI Efforts MIDUS Metadata Repository http://midus.colectica.org/
Moving Forward National Institutes of Aging project: “Facilitating Secondary Analysis and Archiving of MIDUS through DDI” Timeline: 2013-2015 The proposal is in response to RFA-AG-13-009 “Secondary Analyses and Archiving of Social and Behavioral Datasets in Aging”.
Current Project goals Under a DDI 3.2 rubric… 1. Harmonization (internal, post-hoc) Clarify related nature of longitudinal and cross-cohort survey variables (RepresentedVariable) Provide information/procedures for reconciliation 2. Custom Data Extract (CDE) Allow researchers to focus on variables of interest Facilitate accurate merges across numerous datasets
Harmonization Concordance table MIDUS P1 concordance table (Google Spreadsheet) Includes “Comparability notes” and “Comparability class” Example: Variable A1PA30 “time since last BP test” Comparabililty notes: “M1 is not directly comparable with M2, MKE, MR, MKER, M3: M1 responses were coded as number of months, while other waves broke out number (amount) and unit (days, weeks, months, years) into 2 separate variables.” Offer code/algorithm for reconciliation Also includes notes and descriptions of how the variables differ. READ example. We’re making this table available to our users and researchers as an Excel spreadsheet, so it can be used “manually,” but it is large and unwieldy. Ideally this information will be integrated in our DDI codebooks and the Colectica repository – aside from the goal of harmonizing, it also accomplishes a much more mundane goal of identifying the same variables across datasets, facilitating longitudinal and cross-cohort analysis.
Custom Data Extract Customized dataset Search variables, use shopping basket Include variables from across all MIDUS projects Merge different datasets Different formats (csv, SPSS, SAS, Stata) Associated DDI codebook
Development Milestones 1. Metadata Quality Report 2. Harmonization 3. Web-based Discoverability 4. Data Extraction
Step 1. Metadata Quality Report Compare the harmonization spreadsheet to the Repository Check for: Missing information Inconsistent labels Inconsistent data types Update the metadata to improve quality
Step 2. Harmonization Use the harmonization spreadsheet Create a RepresentedVariable for each row Store these in the repository
Step 3. Web-based Discoverability Build on top of Colectica Portal Searching and information retrieval out-of-the-box Add cross-reference tables for easy discoverability Choose variables or groups of variables to include in the data extract
Step 4. Data Extraction Store master data in Colectica Repository Based on a user’s selected variables, generate: Datasets CSV, R, SAS, SPSS, Stata HTML and PDF codebooks DDI XML
Progress Complete Metadata Quality Report Complete Harmonization Upcoming Web-based Discoverability Data Extraction Approximately 3800 RepresentedVariables
Acknowledgement This research project is supported by a grant from the National Institute on Aging (R03-AG046312).
NADDI 2015 – April 8 - 10 “Research Data Management: Facilitating Discoverability using Open Metadata Standards”
NADDI 2015 – April 8 - 10 “Research Data Management: Facilitating Discoverability using Open Metadata Standards” University of Wisconsin - Madison
Thank you midus.wisc.edu www.colectica.com Jeremy Iverson – Colectica (jeremy@colectica.com) Barry Radler – UW-Madison (bradler@wisc.edu) Dan Smith – Colectica (dan@colectica.com) midus.wisc.edu www.colectica.com