HealthNow New York: Journey to Becoming a Test Data Management Center of Excellence
Agenda
- Drivers for our initiative
- Our Team (Past & Present)
- Our High Level Strategy
  - De-identification
  - Subsetting
  - Database Systems
- How we got started
  - Challenges
- Where we are today
- Where we want to be
Drivers
- Data breach risk
- Blues Association mandate
- Third-party obligations
Our Team
3 Years Ago
- 1.5 Optim Administrators
- Run as “support the business”
- Implementation Partner
Today
- 2.5 Optim Administrators
- Run as a project (transitioning to being run as a program)
- Project Team
  - Project Manager
  - Requirements Analyst
  - Membership System SME
  - QA Lead
  - 3 QA Testers
High Level Strategy: De-identification
- De-identify core systems and incoming files in the non-production environments
- Purge downstream systems where possible
- Allow existing batch processes to populate downstream systems
High Level Strategy: De-identification & Subsetting
High Level Strategy: Subsetting
Subset data
Benefits:
- Reduce the amount of data that can fall into the wrong hands
- Reduce the disk space and storage costs required by having full copies of our databases in every environment
Methodology:
- Member-centric subset
- Select every 10th subscriber and all dependents
- Select all related rows based on this smaller (10%) set of members
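The member-centric methodology above can be sketched in a few lines. This is only an illustration under stated assumptions, not the Optim implementation: the schema (`members`, `claims`, `subscriber_id`, `relationship`) and the helper names are hypothetical, a subscriber row is assumed to carry its own ID in `subscriber_id`, and an in-memory SQLite database stands in for the real systems.

```python
import sqlite3


def make_demo_db(n_subscribers=30):
    """Build a tiny hypothetical schema: one dependent and two claims per subscriber."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE members (member_id INTEGER PRIMARY KEY, "
        "subscriber_id INTEGER, relationship TEXT)"
    )
    conn.execute(
        "CREATE TABLE claims (claim_id INTEGER PRIMARY KEY, member_id INTEGER)"
    )
    for s in range(1, n_subscribers + 1):
        sub_id = s * 10      # subscriber row points at itself
        dep_id = sub_id + 1  # one dependent per subscriber
        conn.execute("INSERT INTO members VALUES (?, ?, 'subscriber')", (sub_id, sub_id))
        conn.execute("INSERT INTO members VALUES (?, ?, 'dependent')", (dep_id, sub_id))
        conn.execute("INSERT INTO claims (member_id) VALUES (?)", (sub_id,))
        conn.execute("INSERT INTO claims (member_id) VALUES (?)", (dep_id,))
    return conn


def build_subset(conn, step=10):
    """Return (member_ids, claim_ids) for a member-centric subset."""
    # 1. Order subscribers deterministically and keep every `step`-th one.
    subscribers = [row[0] for row in conn.execute(
        "SELECT member_id FROM members "
        "WHERE relationship = 'subscriber' ORDER BY member_id"
    )]
    chosen = subscribers[step - 1::step]
    if not chosen:
        return [], []

    # 2. Pull the chosen subscribers together with all of their dependents.
    marks = ",".join("?" * len(chosen))
    member_ids = [row[0] for row in conn.execute(
        f"SELECT member_id FROM members "
        f"WHERE subscriber_id IN ({marks}) ORDER BY member_id",
        chosen,
    )]

    # 3. Follow the member IDs into a related table (claims, in this sketch).
    marks = ",".join("?" * len(member_ids))
    claim_ids = [row[0] for row in conn.execute(
        f"SELECT claim_id FROM claims "
        f"WHERE member_id IN ({marks}) ORDER BY claim_id",
        member_ids,
    )]
    return member_ids, claim_ids
```

Selecting whole families first and then following keys outward keeps the subset referentially intact. A real run would repeat step 3 for every related table and would likely stage the chosen IDs in a temporary table rather than a long `IN` list.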
High Level Strategy: Database Systems
- Sybase
- MSSQL
- DB2
- Oracle
- PostgreSQL
How We Got Started
- Implementation Partner
- Selected one database to de-identify
- Selected 3 data elements to be de-identified
  - Member First Name
  - Member Last Name
  - Member SSN
- Manually searched 2000+ tables for the 3 elements
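A search like this across 2000+ tables can often be narrowed by querying the database catalog for suspicious column names before anyone inspects tables by hand. The sketch below is an assumption-laden illustration, not the process the team used: it relies on SQLite's `sqlite_master` and `PRAGMA table_info` as a stand-in for the catalog views of Sybase, DB2, and the other systems, and the name patterns are hypothetical guesses at how such columns might be named.

```python
import re
import sqlite3

# Hypothetical naming patterns for the three starting data elements.
PATTERNS = {
    "Member First Name": re.compile(r"first_?name|fname", re.I),
    "Member Last Name": re.compile(r"last_?name|lname", re.I),
    "Member SSN": re.compile(r"ssn|social", re.I),
}


def find_candidate_columns(conn):
    """Return sorted (element, table, column) triples whose column name matches."""
    hits = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )]
    for table in tables:
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        for col in conn.execute(f'PRAGMA table_info("{table}")'):
            for element, pattern in PATTERNS.items():
                if pattern.search(col[1]):
                    hits.append((element, table, col[1]))
    return sorted(hits)
```

Name matching only produces candidates. Columns with opaque names, or free-text fields that happen to hold a name or SSN, still need data profiling or manual review.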
How We Got Started: Challenges
- No pre-established Data Governance
- Cross-functional support was very limited
- Lack of proven strategy
Where We Are Today
- Gold Copy established for core Membership System
  - De-identified source for refreshes
- De-identification and subsetting implemented in 3 environments
  - All refreshed multiple times with new features added each cycle
  - 2 TB database => 300 GB with still more room for improvement
- Data elements being de-identified include: Names, Address, DOB, Phone, Fax, Email, SSN, Provider Tax ID
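One common way to de-identify elements such as names and SSNs while keeping joins intact is deterministic masking: the same input always maps to the same fake value, so a member still lines up across tables and refreshes. The sketch below illustrates that idea only. It is not the Optim privacy functions or the team's actual rules, and the key, the name list, and the function names are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical key; a real one would live in a secrets store, not in code.
SECRET_KEY = b"demo-key-not-for-real-use"

# Hypothetical replacement pool; a real pool would be far larger.
FIRST_NAMES = ["Alex", "Jordan", "Morgan", "Riley", "Taylor"]


def _digest(value: str) -> int:
    """Keyed hash of a value, returned as a non-negative integer."""
    mac = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return int.from_bytes(mac.digest()[:8], "big")


def mask_ssn(ssn: str) -> str:
    """Map an SSN to a stable fake value in the 900-999 area range.

    The SSA does not issue SSNs with area numbers 900-999, so the
    output cannot be mistaken for a real SSN.
    """
    digits = "".join(ch for ch in ssn if ch.isdigit())  # ignore dashes/spaces
    n = _digest(digits) % 10**8
    area = 900 + n // 10**6
    group = (n // 10**4) % 100
    serial = n % 10**4
    return f"{area:03d}-{group:02d}-{serial:04d}"


def mask_first_name(name: str) -> str:
    """Replace a first name with a stable pick from the replacement pool."""
    return FIRST_NAMES[_digest(name.strip().upper()) % len(FIRST_NAMES)]
```

Because the fake SSN space is smaller than the real one, two real values can collide on the same masked value. Where uniqueness matters, a lookup table of assigned values is a safer design than hashing alone.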
Where We Are Today
- Each subsetted environment is now 300 GB, a saving of 1.7 TB each
Where We Are Today
- Several ad hoc requests have been completed
  - Refresh of Plan data
  - Refresh of Provider data
  - Refresh of Pricing data
  - Subset of Inter-plan data
- REST API is under development
- Self-service portal is under development
- Process for creation of training data is under development
Where We Want To Be
- Continue to address remaining PHI elements until all are completed
- De-identify remaining non-prod environments
- Implement dynamic masking for Production environments
- Implement unstructured data de-identification
- Establish our self-service portal as the one-stop shop for de-identification, data subsetting, & data fabrication
- Migrate flat-file de-identification from MapForce to REST API
Questions?