Virginia’s Longitudinal Data System

Presentation transcript:

Virginia’s Longitudinal Data System A Federated Approach to Longitudinal Data April 4th, 2011

Agenda: The Challenge; Virginia’s Approach; Best Practice and SME Findings; Design Considerations; Proposed Solution; Summary

The Challenge
To develop a Statewide Longitudinal Data System (SLDS) that, without violating privacy policies or law, lets users query, link, download and create reports from record-level or aggregate data across one or more agencies.
Because of existing Commonwealth law, the SLDS could not be based on an underlying data warehouse.
De-identified data may be merged when a viable reason exists. However, the use of persistent, de-identified, linked (merged) data was determined to be highly inefficient and raised political issues that could have endangered the project.
Also contains a BI capability for DOE reports. December 13, 2010

Virginia’s Approach
Virginia undertook a comprehensive investigation of best practices and subject matter experts to determine the feasibility of a federated data model. Between October and December 2010, the Center for Innovative Technology (CIT), the Virginia Information Technologies Agency (VITA) and the Department of Education (DOE) interviewed six best practice organizations and ten subject matter experts. Those findings led to an SLDS Technical Architecture that fulfilled the objective of the grant while adhering to the Commonwealth’s privacy constraints.

Significant Findings (from Best Practice Interviews and Subject Matter Expert Interviews): Stakeholder Management; Federated Systems Perform Poorly; Data Governance; Use of Commercial Solutions; Leveraging Existing Systems; Use of Multiple Hash Keys; Requirements Drive System Architecture; Clearly Defined Security Policies

Important Design Considerations: User friendly; Maximize use of existing technologies/solutions; Minimize sustainment costs; Record level data queries were not time sensitive; Strong central security model

The Solution
A federated data model and technical architecture comprising a web-based user interface (UI), a query/linking engine, a multi-level security module, a rich business intelligence (BI) capability, a Lexicon and integrated workflow.
Components: Data, Security, SLDS Portal, Reporting, Workflow, Lexicon, Shaker

Conceptual Portal: SLDS Portal; Security; Reporting; Workflow; Lexicon

Portal Components
Shaker: Distributed Query Engine (DQE), for use by Agency employees and named users
Reports: Public Facing (Aggregated Data); Named Users (Query Building Tool, QBT)
Lexicon
Workflow: Account request; Data request

Portal Features (Public Facing): Aggregated Data Reports; Lexicon; Links to Agency reports; Help Files; FAQs; Request for Named User Account

Portal Features (Named Users)
Help / Training
Reports: Non-suppressed aggregated data; Query Building Tool (QBT)
Lexicon
Workflow: Account and Data request; Data retrieval; File Attachment for uploading NDAs, etc.; Ability to check status, modify or cancel account and/or data request
Password reset

Security Overview
User types: Anonymous; Named; Schools; Researchers; Agency Employees; System Admin
Portal components: Aggregated Data (Suppressed); Aggregated Data (Non-Suppressed); Unit Record Level Data; Account Management

Security: Authentication; Authorization (Role Based Permission); Viewing; Editing; Suppressed Data; Non-Suppressed Data; Database; Table; Column

Workflow

Reporting: Record Level Linked Data
Diagram elements: Report Creation (Ad Hoc interface); Lexicon; Shell Database; Ad Hoc Metadata; Shaker; Source Data (DOE, SCHEV, VEC); Approval; Query Results; Results
Shell Database: 1. Instantiates the information contained in the Lexicon. 2. Contains dummy data.
Report Creation: The report link will display the report with dummy data. The report will have a button that allows submission of the report to workflow.
Shaker: The distributed query engine generates queries to each of the source data systems and joins the result sets. The engine will interact with the Lexicon.
Query Results: Options for report display include a Logi Analysis Grid (depending on the number of records returned) or a link to download a file. Access may be provided through the Ad Hoc report portal.

Reporting: Aggregate Linked Data
Diagram elements: Source Data (DOE, SCHEV, VEC); ETL; Record Level Linked Data; Aggregate Linked Data; Prebuilt Reports; SLDS Portal; Public Reports; User. Diagram labels also include HTTP and Direct DB Connection.
Prebuilt Reports: There will be prebuilt reports for linked data from the different sources (e.g., DOE to SCHEV, SCHEV to VEC). The prebuilt reports may provide the user with some capabilities to perform analysis on the data (e.g., crosstabbing, grouping, filtering).
ETL: The ETL process will periodically pull source data and load aggregate data tables. The tool used for the ETL process may be SSIS or LogiETL.
Aggregate Linked Data: Data access is through Stored Procedures, which will handle data suppression.
Portal: Prebuilt Reports will be displayed within iFrames in the Portal.
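
The slide says only that stored procedures will handle data suppression for aggregate reports; it does not give the rule. As a rough illustration, the sketch below applies a simple small-cell rule in Python. The threshold value, the function name and the group labels are all assumptions made for this example, and a production rule would typically also need complementary suppression so that hidden cells cannot be recovered from totals.

```python
from typing import Dict, Optional

# Assumed threshold: the slides do not state a minimum cell size.
MIN_CELL_SIZE = 10


def suppress(counts: Dict[str, int],
             min_cell: int = MIN_CELL_SIZE) -> Dict[str, Optional[int]]:
    """Return a copy of the aggregate counts with small cells hidden.

    Cells below min_cell are replaced by None so a public report can
    render them as suppressed instead of showing the real count.
    """
    return {group: (n if n >= min_cell else None)
            for group, n in counts.items()}


# Hypothetical aggregate table; the group labels are placeholders.
aggregate = {"Group A": 124, "Group B": 7, "Group C": 31}
public_view = suppress(aggregate)
```

In this sketch a named user with access to non-suppressed data would read `aggregate` directly, while the public-facing report would read `public_view`.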

Lexicon Defined
For our purposes, the Lexicon is an inventory of every available data field in every available data source, the structure of their storage, the possible values and meanings of the information stored, all possible transformations of each set of field values to another set of field values, methods of data source access, and matching algorithms and how they are to be used in conjunction with possible field value transformations.
Diagram label: Transformations & Matching Algorithms
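
One way to picture the inventory described above is as a small metadata catalog. The Python sketch below is not the project's schema: the class names, the field names (`gender`, `sex_code`) and the code values are hypothetical, chosen only to show fields, their possible values, and a transformation from one set of field values to another.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class LexiconField:
    source: str              # data source holding the field, e.g. "DS 1"
    name: str                # field name as stored in that source
    storage_type: str        # structure of storage, e.g. "CHAR(1)"
    values: Dict[str, str]   # possible values and their meanings


@dataclass
class Transformation:
    from_field: str                 # key of the source field
    to_field: str                   # key of the target field
    apply: Callable[[str], str]     # maps one value set onto the other


@dataclass
class Lexicon:
    fields: Dict[str, LexiconField] = field(default_factory=dict)
    transformations: List[Transformation] = field(default_factory=list)

    def add_field(self, f: LexiconField) -> None:
        self.fields[f"{f.source}.{f.name}"] = f

    def transform(self, from_field: str, to_field: str, value: str) -> str:
        """Apply the registered transformation between two fields."""
        for t in self.transformations:
            if t.from_field == from_field and t.to_field == to_field:
                return t.apply(value)
        raise KeyError(f"no transformation {from_field} -> {to_field}")


# Hypothetical entries: two sources coding the same concept differently.
lexicon = Lexicon()
lexicon.add_field(LexiconField("DS 1", "gender", "CHAR(1)",
                               {"M": "Male", "F": "Female"}))
lexicon.add_field(LexiconField("DS 2", "sex_code", "CHAR(1)",
                               {"1": "Male", "2": "Female"}))
code_map = {"M": "1", "F": "2"}
lexicon.transformations.append(
    Transformation("DS 1.gender", "DS 2.sex_code", lambda v: code_map[v]))
```

Note that the catalog holds only metadata (names, types, code lists, mappings) and no records, which matches the role the slides give the Lexicon in query building.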

Lexicon Maintenance
To maintain accuracy and manage extensibility, the linking module will process all data sources periodically, at a predetermined time or interval, looking for:
Changes in data ranges (e.g., a new code was added for race/ethnicity)
New fields (more data, more data, more data!)
Anything else that would disrupt the probabilistic matching or provide more ways to slice and dice the data
Anomalies found by the linking module will prompt an alert for a system administrator to modify the matching algorithm or add query choices.
For new sources, or those with known common fields/links, this would be the method of entry.
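
The periodic scan described above can be sketched as a comparison between the value sets the Lexicon already knows and the value sets observed in a source. This is a minimal Python illustration under assumed inputs: the function name, the field names and the alert wording are hypothetical, and the real linking module may detect changes quite differently.

```python
from typing import Dict, List, Set


def scan_for_changes(known: Dict[str, Set[str]],
                     observed: Dict[str, Set[str]]) -> List[str]:
    """Compare observed field values against the Lexicon's known values.

    Returns alert messages for a system administrator: one per new field
    and one per previously unseen code in an existing field.
    """
    alerts: List[str] = []
    for field_name, values in sorted(observed.items()):
        if field_name not in known:
            alerts.append(f"new field: {field_name}")
            continue
        for value in sorted(values - known[field_name]):
            alerts.append(f"new value {value!r} in field {field_name}")
    return alerts


# Hypothetical snapshot: a new race/ethnicity code and a new field appear.
known_values = {"race_ethnicity": {"1", "2", "3"}}
observed_values = {
    "race_ethnicity": {"1", "2", "3", "4"},
    "ell_status": {"Y", "N"},
}
alerts = scan_for_changes(known_values, observed_values)
```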

Shaker

Lexicon – Shaker Process
Diagram elements: User Interface / Portal / LogiXML; Lexicon; Shell Database; Workflow Manager; Linking Control; Data Access Control; Sub-Query Optimization; Hashed ID Matrix; Authorized Query; Query Results; Sample Data; data sources DS 1, DS 2 and DS 3, each shown with a table of field names and metadata
Query Building Process (Pre-Authorization)
Linking basis: Common IDs [deterministic] or Common Elements with appropriate Transforms, Matching Algorithms and Thresholds [probabilistic]
A linking engine process will update the Lexicon periodically to allow query building on known available matched data fields. No data is used in this process. Queries are built on the relationships between data fields in the Lexicon.
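
The slide distinguishes deterministic linking on common IDs from probabilistic linking on common elements with matching algorithms and thresholds, but does not name an algorithm. The Python sketch below only illustrates the idea of a threshold: it uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure, and the field names, weights and threshold are invented for the example. It also compares clear-text values for readability, whereas the architecture described in these slides joins on hashed IDs.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional

# Hypothetical weights and threshold; the slides specify neither.
WEIGHTS = {"last_name": 0.4, "first_name": 0.3, "birth_date": 0.3}
THRESHOLD = 0.85


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] (stand-in for a real matching algorithm)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(rec_a: Dict[str, str], rec_b: Dict[str, str]) -> float:
    """Weighted similarity over the common elements of two records."""
    return sum(w * similarity(rec_a[f], rec_b[f]) for f, w in WEIGHTS.items())


def is_match(rec_a: Dict[str, str], rec_b: Dict[str, str],
             common_id: Optional[str] = None) -> bool:
    # Deterministic: an identical, non-empty common ID settles the match.
    if common_id and rec_a.get(common_id) and \
            rec_a.get(common_id) == rec_b.get(common_id):
        return True
    # Probabilistic: accept only if the weighted score clears the threshold.
    return match_score(rec_a, rec_b) >= THRESHOLD


# Hypothetical records for illustration only.
a = {"last_name": "Smith", "first_name": "Jon", "birth_date": "1995-04-02"}
b = {"last_name": "Smith", "first_name": "John", "birth_date": "1995-04-02"}
c = {"last_name": "Garcia", "first_name": "Maria", "birth_date": "2001-11-30"}
```

Raising the threshold trades missed links for fewer false links, which is the kind of adjustment the slides leave to a system administrator.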

Joining Sub-Queries on Hashed-IDs
Connecting additional data sources:
Possible connection using a Web Service: creates a Web Services Data Source (Oracle), which enables application and data integration by turning an external web service into an SQL data source, making external web services appear as regular SQL tables. This table function represents the output of calling external web services and can be used in an SQL query.
Possible connection using a homogeneous link between Oracle DBs: establish synonyms for global names of remote objects in the distributed system so that the Shaker can access them with the same syntax as local objects.
Possible connection using a heterogeneous link, using an available Transparent Gateway or Generic ODBC/OLE.
Sub-query processing priority will be determined for each query to minimize unnecessary data transfer (e.g., not downloading unmatched records unless specifically requested) and to optimize join performance; see Sub-Query Process Optimization.
Matched Hash ID Values: The SLDS server will match records from different agencies using the Hash ID. After records are matched, the SLDS server will delete the Hash ID values and replace them with randomly generated unique IDs.
November 10, 2018
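
The final step on this slide (match on the Hash ID, then delete it and substitute a random unique ID) can be sketched in a few lines of Python. The row layout, the `hash_id` and `record_id` column names and the use of `uuid.uuid4` are assumptions for illustration; the sketch also assumes at most one row per Hash ID in the second result set.

```python
import uuid
from typing import Any, Dict, List

Row = Dict[str, Any]


def link_and_reidentify(left: List[Row], right: List[Row]) -> List[Row]:
    """Inner-join two result sets on 'hash_id', then discard the hash.

    Each linked row gets a freshly generated random ID so that the
    output cannot be traced back to the agency-supplied Hash ID.
    """
    right_by_hash = {row["hash_id"]: row for row in right}
    linked: List[Row] = []
    for row in left:
        match = right_by_hash.get(row["hash_id"])
        if match is None:
            continue  # unmatched records are not carried forward
        merged = {**row, **match}
        del merged["hash_id"]                  # delete the Hash ID value
        merged["record_id"] = uuid.uuid4().hex  # random replacement ID
        linked.append(merged)
    return linked


# Hypothetical sub-query results from two agencies.
doe_rows = [{"hash_id": "a1", "grade": "12"}, {"hash_id": "b2", "grade": "11"}]
schev_rows = [{"hash_id": "a1", "enrolled": True}]
linked_rows = link_and_reidentify(doe_rows, schev_rows)
```

Because the replacement IDs are random for every run, the same person receives a different `record_id` in each query result, which is consistent with the slides' decision not to keep persistent linked data.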

Sub-Query Process Optimization
Diagram elements: Lexicon Query; Derive JOIN Criteria from Lexicon (Common IDs [deterministic] or Common Elements with appropriate Transforms, Matching Algorithm and Thresholds [probabilistic]); Parse Sub-Queries; Create Hash-Key; Run 1st Sub-Query; Run 2nd Sub-Query; Join Sub-Queries on Hashed ID; Query Results; data sources DS 1, DS 2 and DS 3
Agency creates Hash-IDs.
Get COUNTS from each DS Web Service for each set of limiting criteria.
The 1st DS to query is the DS with the least count using the specified criteria. Query the 1st DS using today’s key; it returns a set with hashed IDs.
The 2nd DS to query is the DS with the next least count using the specified criteria (if Inner Join). Query the 2nd DS using today’s key AND the hashed-ID list from the 1st DS.
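
The ordering rule on this slide (query the source with the smallest count first, then pass its hashed-ID list to the next source) is a form of semi-join reduction. The Python sketch below simulates it with in-memory lists standing in for the agency data sources. The slides mention "today's key" but not the hash construction, so HMAC-SHA256 is an assumption here, as are the function names and the row shape; in the described system each agency would compute its own Hash-IDs rather than the central server.

```python
import hashlib
import hmac
from typing import Dict, List, Optional, Set, Tuple

Row = Tuple[str, str]  # (raw identifier, payload): hypothetical row shape


def hash_id(identifier: str, daily_key: bytes) -> str:
    """Keyed hash of an identifier (HMAC-SHA256 is an assumed choice)."""
    return hmac.new(daily_key, identifier.encode("utf-8"),
                    hashlib.sha256).hexdigest()


def order_sources(counts: Dict[str, int]) -> List[str]:
    """Smallest result count first, so later sub-queries stay small."""
    return sorted(counts, key=lambda name: counts[name])


def run_inner_join(sources: Dict[str, List[Row]],
                   daily_key: bytes) -> Dict[str, Dict[str, str]]:
    """Simulate the ordered sub-queries and the join on hashed IDs."""
    counts = {name: len(rows) for name, rows in sources.items()}
    order = order_sources(counts)
    allowed: Optional[Set[str]] = None
    partial: Dict[str, Dict[str, str]] = {}
    for name in order:
        # Each source returns its rows keyed by hashed ID.
        hashed = {hash_id(ident, daily_key): payload
                  for ident, payload in sources[name]}
        # After the first source, only IDs already seen are requested.
        if allowed is not None:
            hashed = {h: p for h, p in hashed.items() if h in allowed}
        allowed = set(hashed)
        partial[name] = hashed
    return {h: {name: partial[name][h] for name in order}
            for h in (allowed or set())}


# Hypothetical data: rows already filtered by the query's criteria.
key = b"example-daily-key"
sources = {
    "DS 1": [("s1", "a"), ("s2", "b"), ("s3", "c")],
    "DS 2": [("s2", "x")],
    "DS 3": [("s2", "p"), ("s3", "q")],
}
joined = run_inner_join(sources, key)
```

Starting with the smallest source means the hashed-ID list sent to each later source can only shrink, which is how the design avoids downloading unmatched records.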

Data Architecture
Diagram elements: source data DS 1, DS 2 and DS 3; ETL; VITA (CESC); Metadata and Security; Workflow; Lexicon; Shell DB; Shaker / De-identified Record Level Data; Aggregate Linked Data (SPs); SLDS Portal with Workflow, Lexicon UI / Admin, Record Level Query / Reports and Aggregate Linked Reports
1. Metadata and Security: Contains DBs for Shaker, Ad Hoc metadata, logging, auditing, etc.
2. Shaker / De-identified Record Level Data: Database for the Shaker process that temporarily stores linked record-level data. The temporary tables will be dropped after a set period of time.
3. Aggregate Linked Data: For canned reports, Stored Procedures will be used for data querying and suppression.

Physical Infrastructure

Physical Infrastructure Shaker – Production Env. (CESC)

SLDS Components Matrix (columns: Component; Custom / COTS; Suggested Product)
Portal: Custom
Security
  Authentication: COV AUTH
  Authorization: Mixed
Workflow: COTS; MS Dynamics
Reports
  Public Facing: Logi Info
  Query Building: Logi Ad-Hoc
Lexicon
Shaker
  Extract, Transform & Load: Logi ETL, SSIS or Informatica
  Distributed Query Engine (DQE): Custom or COTS; Syncsort, Informatica or Custom

Questions?

Back-Up Slides

Security
Authentication: COV AUTH
Authorization: Role Based
Roles: Anonymous User; Named User; System Administrator; Agency Employee; Researcher
Permissions: Workflow; Reports (Suppressed and Non-Suppressed); Query Building Tool; Lexicon; Data elements; User Account Management
Data security enforced by/at: Portal; Lexicon (Viewing, Editing); Reports (Suppressed Data, Non-Suppressed Data); Workflow; Data (Database, Table, Column)