25th Annual STATS-DC 2012 Data Conference
Virginia Longitudinal Data System (VLDS)
July 12th, 2012
Agenda
- Introductions
- Background: CRM, Portal, Shaker
- CRM: Dashboard Overview, Accounts, Contacts, Entities (RP, Artifacts, Contracts), Workflows
- Portal: User Interface, Data Dictionary and Selection Tool, Data Request Tool
- Shaker
- Questions / Discussion
VLDS Component Overview
SLDS Components: Portal, Security, Workflow, Reporting, Lexicon, Shaker, Data
- Service Oriented Architecture
- Database Agnostic
- Supports Federated or Warehouse Data Models
- Data Security
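One way to read "Database Agnostic" is that every exposed database sits behind the same small service contract, so the Shaker never depends on a particular backend. The sketch below illustrates that idea under loudly stated assumptions: the `ExposedDatabase` protocol, the `fetch` method, the `alt_id` column, and the `doe`/`vec` source names are all hypothetical, not the VLDS interface.

```python
import sqlite3
from typing import Protocol


class ExposedDatabase(Protocol):
    """Hypothetical contract: return requested fields keyed by alternate ID."""

    def fetch(self, alt_ids: list[str], fields: list[str]) -> dict[str, dict]: ...


class DictExposedDatabase:
    """In-memory backend, e.g. for tests or a flat-file extract."""

    def __init__(self, rows: dict[str, dict]):
        self._rows = rows

    def fetch(self, alt_ids, fields):
        return {
            a: {f: self._rows[a][f] for f in fields if f in self._rows[a]}
            for a in alt_ids
            if a in self._rows
        }


class SqliteExposedDatabase:
    """SQL backend; other DB-API drivers could be wrapped the same way."""

    def __init__(self, conn: sqlite3.Connection, table: str, allowed: set[str]):
        self._conn, self._table, self._allowed = conn, table, allowed

    def fetch(self, alt_ids, fields):
        # Column names cannot be bound as parameters, so whitelist them.
        cols = [f for f in fields if f in self._allowed]
        if not cols or not alt_ids:
            return {}
        marks = ",".join("?" * len(alt_ids))  # one placeholder per alternate ID
        sql = f"SELECT alt_id, {', '.join(cols)} FROM {self._table} WHERE alt_id IN ({marks})"
        return {row[0]: dict(zip(cols, row[1:])) for row in self._conn.execute(sql, alt_ids)}


def query_sources(sources: dict[str, ExposedDatabase], plan: dict[str, list[str]],
                  alt_ids: dict[str, list[str]]) -> dict[str, dict[str, dict]]:
    """Caller depends only on the ExposedDatabase protocol, not on any backend."""
    return {name: sources[name].fetch(alt_ids[name], fields) for name, fields in plan.items()}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wages (alt_id TEXT, wage INTEGER)")
conn.execute("INSERT INTO wages VALUES ('B7', 31000)")
sources = {
    "doe": DictExposedDatabase({"A1": {"grade": 12}}),
    "vec": SqliteExposedDatabase(conn, "wages", {"wage"}),
}
result = query_sources(sources, {"doe": ["grade"], "vec": ["wage"]},
                       {"doe": ["A1"], "vec": ["B7"]})
print(result)  # → {'doe': {'A1': {'grade': 12}}, 'vec': {'B7': {'wage': 31000}}}
```

Because the caller only sees `fetch`, a federated setup (query each agency live) and a warehouse setup (query one consolidated store) could both satisfy the same contract.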
Architecture Overview
VLDS Process Overview
Show me the DEMO!
Data/Communication Graph
(Diagram labels: Portal/CRM, Lexicon, Shaker, and three Exp DB / DA pairs)
Shaker Ingredients
- 2 parts Data Request Rx
- 2 parts Data Request Parsing
- 1 part Identity Resolution
- 2 parts Query Execution
Yield: Dataset Output
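The Data Request Parsing step has to turn a researcher's selection into one query per exposed database. A minimal sketch, assuming (hypothetically) that each data element chosen in the Data Dictionary & Selection Tool is named `source.element`; the naming convention, the `parse_request` function, and the `doe`/`vec` source names are illustrative only.

```python
from collections import defaultdict


def parse_request(elements: list[str]) -> dict[str, list[str]]:
    """Split 'source.element' names into one field list per exposed database.

    The dotted naming convention is an assumption for illustration only.
    """
    per_source: dict[str, list[str]] = defaultdict(list)
    for element in elements:
        source, sep, field = element.partition(".")
        if not (sep and source and field):
            raise ValueError(f"malformed data element name: {element!r}")
        if field not in per_source[source]:  # ignore duplicate selections
            per_source[source].append(field)
    return dict(per_source)


plan = parse_request(["doe.grade", "vec.wage", "doe.school"])
print(plan)  # → {'doe': ['grade', 'school'], 'vec': ['wage']}
```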
Shaker Details: Identity Resolution
- Source demographics change over time, so all demographic records are collected in the exposed database
- The demographic records are grouped by local identifier, ranked, and assigned an alternate identifier
- The top-ranked records are hashed and sent to the Shaker
- Matching is performed on deterministic fields first, then by a probabilistic algorithm using:
  - Hashed First Name (modified Jaro-Winkler distance algorithm)
  - Hashed Last Name (modified Jaro-Winkler distance algorithm)
  - Hashed Month of birth
  - Hashed Year of birth
  - Hashed Gender
  - Location (ZIP to FIPS region)
- Product: an ID map of alternate identifiers
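The name comparison above relies on a "modified" Jaro-Winkler distance applied to hashed values; the slide does not say what the modification is or how it works on hashes. For reference, the sketch below is the textbook Jaro-Winkler similarity on plain strings, with the conventional scaling factor 0.1 and a 4-character prefix cap. It covers only the name-similarity component, not the deterministic pass or the weighting of other fields.

```python
def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if equal and within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order (half-transpositions).
    half_transpositions = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Boost the Jaro score for strings sharing a common prefix (up to max_prefix chars)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # → 0.9611
```

The prefix boost rewards names that agree at the start, which suits typos and transpositions later in a name.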
Shaker Details: Query Execution and Dataset Assembly
Query Execution
- Each set of alternate identifiers is sent back to its exposed database to retrieve the data requested by the researcher/user
Dataset Assembly
- A final identifier is generated to replace the alternate identifiers, thereby achieving our super-secret, double-de-identified directive
- The final output is a dataset for distribution to a researcher/user or to back an aggregate report
- Notifications are sent back to the CRM component
- Then everything is reset for the next data request…
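The second de-identification step can be pictured as swapping every alternate identifier for a fresh random key that is not derived from any input. A minimal sketch, assuming a hypothetical ID-map shape (one dict per matched person, source name to alternate ID) and hypothetical `doe`/`vec` source names; unmatched rows are simply dropped here, which is an illustrative choice, not the documented VLDS behavior.

```python
import secrets


def assign_final_ids(id_map: list[dict[str, str]]) -> dict[tuple[str, str], str]:
    """Give each matched person one fresh random final ID.

    Returns {(source, alternate_id): final_id}.
    """
    lookup: dict[tuple[str, str], str] = {}
    used: set[str] = set()
    for person in id_map:
        final_id = secrets.token_hex(8)  # random; not derived from any alternate ID
        while final_id in used:          # guard against an (unlikely) collision
            final_id = secrets.token_hex(8)
        used.add(final_id)
        for source, alt_id in person.items():
            lookup[(source, alt_id)] = final_id
    return lookup


def assemble_dataset(id_map, source_rows):
    """Merge per-source rows under final IDs; alternate IDs never reach the output.

    source_rows: {source: {alternate_id: {field: value}}}, i.e. what each
    exposed database returned during query execution.
    """
    lookup = assign_final_ids(id_map)
    dataset: dict[str, dict] = {}
    for source, rows in source_rows.items():
        for alt_id, fields in rows.items():
            final_id = lookup.get((source, alt_id))
            if final_id is None:
                continue  # no entry in the ID map; this sketch drops the row
            record = dataset.setdefault(final_id, {})
            # Prefix field names with the source so fields from different agencies don't collide.
            record.update({f"{source}.{name}": value for name, value in fields.items()})
    return dataset


id_map = [{"doe": "A1", "vec": "B7"}]
rows = {"doe": {"A1": {"grade": 12}}, "vec": {"B7": {"wage": 31000}}}
ds = assemble_dataset(id_map, rows)
print(list(ds.values()))  # → [{'doe.grade': 12, 'vec.wage': 31000}]
```

Discarding `lookup` after assembly means the output alone cannot be joined back to either source's alternate identifiers.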
Questions & POCs
Sponsors
- Bethann Canada – bethann.canada@doe.virginia.gov
- Tod Massa – todmassa@schev.edu
- Jeremy Deyo – jeremy.deyo@vec.virginia.gov
Program
- Matt Bryant – matthew.bryant@doe.virginia.gov
Technical
- Ajay Rohatgi (Technical PM) – ajay.rohatgi@vita.virginia.gov
- Will Goldschmidt (Workflow & Portal PM) – will.goldschmidt@vita.virginia.gov
- Kathy Graham (Reporting PM) – kathy.graham@vita.virginia.gov
- Aaron Schroeder (Lexicon & Shaker) – aaron.schroeder@vt.edu
Backup Slides
Portal Features (Public Facing)
- General Information
- FAQs
- Aggregated Data Reports
- Links to Agency Reports
- Request for Named User Account (Potentially)
Portal Features (Named Users)
- My VLDS
  - Team Member Management
  - Research Information Management (Who, What, Where, When, Why)
  - Data request and retrieval
  - Document management (NDAs, research papers, etc.)
  - Ability to check status, modify or cancel account and/or data request
  - Help / Training
  - Password reset
- Reports
- Data Request Tool (DRT)
- Data Dictionary & Selection Tool
Workflow Features
- Manage Contacts (Researchers)
- Create / Edit / Approve / Disapprove:
  - Research Purposes
  - Restricted Use Data Agreements
  - Data Packages
  - Artifacts (Documents)
- Automated Email Notifications and Tasks
- Document Storage
- Audit Logs
- Integration with Microsoft Outlook
Infrastructure / Messaging