VACS Data Build
Process of:
- Cleaning and versioning data sources
- Creating faster, more efficient data access for analysis
- Providing a single, secure, centralized location for all data sources
Data sources:
- Administrative (Austin) data (1997-2006)
- Survey (full study baseline; follow-ups FU1, FU2, FU3 not yet included?)
- PBM (October 1998 to July 2005)
- ICR (fiscal years 1991-2003, HIV+)
- Pharmacy, labs, diagnoses
- Enrollment (demographics, consent)
- VISTA EMR, direct from sites
VACS Data Build 1
Goals:
- Accurate, clean data
- Correct enrolled full-study cohort: documented consent and completed baseline survey
- Data integrity, to prevent duplication and redundancy
- Centralized
- Documented
- Secure
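The enrollment rule and the integrity goal above can be sketched as database constraints plus one query. This is a minimal illustration using Python's built-in `sqlite3` rather than SQL Server; the table names, column names, and sample rows are hypothetical and are not the actual VACS schema.

```python
import sqlite3

# Hypothetical schema for illustration only; not the actual VACS tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE enrollment (
    subject_id   INTEGER PRIMARY KEY,   -- one row per subject blocks duplicates
    consent_date TEXT                   -- NULL means no documented consent
);
CREATE TABLE baseline_survey (
    subject_id     INTEGER NOT NULL REFERENCES enrollment(subject_id),
    completed_date TEXT NOT NULL,
    UNIQUE (subject_id)                 -- at most one baseline survey per subject
);
""")
conn.executemany("INSERT INTO enrollment VALUES (?, ?)",
                 [(1, "2002-06-01"), (2, None), (3, "2003-01-15")])
conn.executemany("INSERT INTO baseline_survey VALUES (?, ?)",
                 [(1, "2002-06-01"), (2, "2002-07-10")])


def enrolled_full_study(conn):
    """Return subject IDs with documented consent AND a completed baseline survey."""
    rows = conn.execute("""
        SELECT e.subject_id
        FROM enrollment AS e
        JOIN baseline_survey AS b ON b.subject_id = e.subject_id
        WHERE e.consent_date IS NOT NULL
        ORDER BY e.subject_id
    """)
    return [r[0] for r in rows]


def try_duplicate_insert(conn):
    """Attempt to load a second baseline survey for subject 1; report whether it was accepted."""
    try:
        conn.execute("INSERT INTO baseline_survey VALUES (?, ?)", (1, "2002-08-01"))
        return True
    except sqlite3.IntegrityError:
        return False
```

In this sample, subject 2 lacks consent and subject 3 lacks a baseline survey, so only subject 1 counts as enrolled, and the uniqueness constraint rejects a second baseline record for the same subject.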
VACS Data Build Status
Data in SQL Server:
- Master "Enrolled"
- Baseline surveys, cause of death, TLFB
- Admin data: ICR, Austin, PBM
- VISTA: labs (06)
Data in SAS, STATA, Text:
- VACS3,5 surveys, TLFB
- SAS cleaning code
- VISTA other, such as clean site labs
Documentation:
- Metadata for survey questions, choices, lab codes, medication codes (ARV), diagnostic codes
What does this mean?
- Faster turnaround of data requests will be possible
  - Main coordinating staff will have access to data (not just MS)
  - Data will be linkable for ease of combining various data sources
- Easy tracking of methods for papers, abstracts and presentations
What does this mean?
- BUT: data is 'clean' at the row/record level, not at the subject level
- Programming is still required to create data sets for requests such as:
  - selecting specific labs or pharmacy data for subjects within a +/- window around enrollment dates
  - comorbidities (ICDs) at different times: baseline, FUs
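As a sketch of the kind of programming a request still needs, the function below collapses record-level lab rows to one value per subject by keeping the result closest to enrollment within a +/- window. The subject IDs, lab name, values, and the 90-day window are all assumptions made for illustration; the slides do not specify a window size.

```python
from datetime import date, timedelta

# Hypothetical rows for illustration; real field names and window sizes will differ.
enrollment = {101: date(2002, 6, 1), 102: date(2003, 3, 15)}
labs = [
    # (subject_id, lab_name, lab_date, value)
    (101, "CD4", date(2002, 5, 20), 350),
    (101, "CD4", date(2002, 6, 3), 410),
    (101, "CD4", date(2003, 1, 10), 500),
    (102, "CD4", date(2003, 9, 1), 220),
]


def closest_lab_in_window(labs, enrollment, lab_name, window_days):
    """For each subject, keep the single lab result closest to enrollment
    that falls within +/- window_days of the enrollment date."""
    window = timedelta(days=window_days)
    best = {}
    for subject_id, name, lab_date, value in labs:
        if name != lab_name or subject_id not in enrollment:
            continue
        gap = abs(lab_date - enrollment[subject_id])
        if gap > window:
            continue  # outside the +/- window around enrollment
        if subject_id not in best or gap < best[subject_id][0]:
            best[subject_id] = (gap, lab_date, value)
    # Drop the helper gap; return {subject_id: (lab_date, value)}
    return {sid: (d, v) for sid, (_, d, v) in best.items()}


# Assumed 90-day window, chosen only for this example.
baseline_cd4 = closest_lab_in_window(labs, enrollment, "CD4", window_days=90)
```

Subjects with no qualifying result are simply absent from the output, so a request would also need to decide how to report missing baseline values.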
Subsequent Builds
- Data, normalized and cleaned:
  - Surveys FU1, FU2, FU3
  - Other, such as VISTA/EMR site data
- Automated load/extract processes
- Security enhancements
- Stored objects for queries and statistical analysis
- Web interfaces for data analysis and requests using metadata
- Documentation on data sources such as Austin, PBM
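One possible form of a "stored object" is a database view that keeps a commonly requested summary next to the record-level data. The sketch below uses Python's built-in `sqlite3` as a stand-in for SQL Server; the `lab_result` table, the `lab_summary` view, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical table and view names, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lab_result (subject_id INTEGER, lab_name TEXT, lab_date TEXT, value REAL);
INSERT INTO lab_result VALUES
    (101, 'CD4', '2002-05-20', 350),
    (101, 'CD4', '2002-06-03', 410),
    (102, 'CD4', '2003-09-01', 220);
-- Stored object: one row per subject and lab, summarizing the record-level data.
CREATE VIEW lab_summary AS
SELECT subject_id, lab_name, COUNT(*) AS n_results, MAX(lab_date) AS last_date
FROM lab_result
GROUP BY subject_id, lab_name;
""")

# Analysts query the view like a table; the summary logic lives in one place.
summary = conn.execute("SELECT * FROM lab_summary ORDER BY subject_id").fetchall()
```

Keeping the definition in the database means every analyst gets the same subject-level summary without re-implementing it in each request.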
Credits
The VACS team, for their functional and technical input.