2012 SLDS P-20W Best Practice Conference: FEDERATED & CENTRALIZED MODELS. Tuesday, October 30, 2012. Facilitator: Jim Campbell (SST), Jeff Sellers (SST),

Presentation transcript:

2012 SLDS P-20W Best Practice Conference
FEDERATED & CENTRALIZED MODELS
Tuesday, October 30, 2012
Facilitators: Jim Campbell (SST), Jeff Sellers (SST), Keith Brown (SST)
Panelists:
- Charles McGrew, Kentucky P-20 Data Collaborative
- Mimmo Parisi, National Strategic Planning & Analysis Research Center (nSPARC)
- Neal Gibson, Arkansas Research Center
- Aaron Schroeder, Virginia Tech

AGENDA
- Background/Overview
- Rationale for choosing a model
- Infrastructure and design
- Data access
- Additional information

BACKGROUND/OVERVIEW

KENTUCKY
Model: A centralized data warehouse that brings the data into a neutral (third-party) location, where they are linked together and then de-identified so that no agency has access to any other agency's identifiable data.

VIRGINIA
Model: Virginia is a case study in the difficulties of combining data from multiple agencies while remaining in compliance with federal and state-level privacy requirements.
- Traditional data integration issues
- Public sector specific integration issues
- Virginia specific issues

VIRGINIA
Implementation Environment: Public Sector Statutory and Regulatory Heterogeneity
- Multiple levels of statutory law
- Multiple implementations of regulatory law at each level of statutory law
- The most conservative interpretation of regulatory law becomes the de facto standard

ARKANSAS
Arkansas Research Center

ARKANSAS
Knowledge Base Approach: All known representations are stored to facilitate matching in the future and possibly resolve past matching errors.
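
The knowledge base idea above can be illustrated with a small sketch. This is not the Arkansas Research Center's implementation; the class name `KnowledgeBase`, its methods, and the sample values are hypothetical, and a "representation" is assumed here to be a tuple of demographic fields.

```python
from collections import defaultdict


class KnowledgeBase:
    """Hypothetical store that keeps every representation ever seen for a person."""

    def __init__(self):
        self._person_by_rep = {}                 # representation -> person id
        self._reps_by_person = defaultdict(set)  # person id -> all representations

    def add(self, person_id, representation):
        # Nothing is overwritten: older spellings stay available for matching.
        self._person_by_rep[representation] = person_id
        self._reps_by_person[person_id].add(representation)

    def lookup(self, representation):
        # Returns None when this exact representation has never been seen.
        return self._person_by_rep.get(representation)

    def representations(self, person_id):
        return set(self._reps_by_person.get(person_id, set()))


kb = KnowledgeBase()
kb.add("P-001", ("SMITH", "JON", "1990-04-02"))
kb.add("P-001", ("SMYTH", "JON", "1990-04-02"))  # a later, misspelled submission
```

Because both spellings are retained, a future record arriving with either one resolves to the same person, which is the benefit the slide describes.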

MISSISSIPPI
- Culture of cooperation
- Structural & technical capacity
- Performance-based management

MISSISSIPPI
Figure: SLDS conceptual model

MISSISSIPPI
Figure: Data warehouse model

RATIONALE FOR CHOOSING A MODEL

KENTUCKY
- Not all of the agencies we want to participate have data warehouses, so not all could take part in a federated model.
- Upgrades and infrastructure changes in participating agencies would interrupt "service".
- Political concerns that a change in leadership may allow agencies to simply stop participating.
- Centralizing data allows for shared resources and cost savings over more "silo" data systems.
- Most agencies lack research staff to utilize the system, and there was a desire to centralize some analyses.
- The de-identified model satisfies agency legal concerns.
- The ability to reproduce numbers over time.

VIRGINIA
Implementation Environment and Virginia Specific Limitations:
Structural
- Decentralized authority structure in potential partner agencies (e.g. health, social services), resulting in different data systems, standards, and data collected
Legal
- VA § : Government Data Collection and Dissemination Practices Act
- VA § Restricted use of social security numbers
- Assistant Attorneys General interpretations: "No one person, inside or outside a government agency, should be able to create a set of identified linked data records between partner agencies"

VIRGINIA
Consolidated Data Systems (Warehouse)
- Can be very expensive (to both build and maintain)
- Too difficult to embody (program) the multiple levels of federal and state statutory and regulatory privacy requirements; must have laws in place to allow for centralized collection
- Lack of clear data authority, per data system, between state agencies and between state and local-level agencies; participation is not compulsory
Federated Data Systems
- A system that interacts with multiple data sources on the back-end and presents itself as a single data source on the front-end
- The key to linking up the different data sources is a central linking apparatus
- Allows for the maintenance of existing privacy protection rules and regulations
- Can significantly reduce application development time and cost
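
A minimal sketch of the federated idea described above: several back-end sources, one front-end query, and a central linking directory that maps a common ID to each agency's local ID. Every name here (`FederatedSource`, the agency names, the IDs and fields) is hypothetical and chosen only for illustration; it is not the VLDS design.

```python
class FederatedSource:
    """Hypothetical facade: many agency sources behind one query interface."""

    def __init__(self, sources, linking_directory):
        self.sources = sources                      # agency -> {local id -> record}
        self.linking_directory = linking_directory  # common id -> {agency -> local id}

    def query(self, common_id):
        # Resolve the common ID to each agency's local ID, then fetch each
        # record from the agency that owns it. Data stays in its source.
        local_ids = self.linking_directory.get(common_id, {})
        combined = {}
        for agency, local_id in local_ids.items():
            record = self.sources[agency].get(local_id)
            if record is not None:
                combined[agency] = record
        return combined


sources = {
    "education": {"E-1": {"grad_year": 2010}},
    "workforce": {"W-9": {"employed": True}},
}
linking_directory = {"P-001": {"education": "E-1", "workforce": "W-9"}}

federated = FederatedSource(sources, linking_directory)
result = federated.query("P-001")
```

The caller sees one combined answer, while each agency keeps its own store and can keep applying its own access rules at fetch time.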

INFRASTRUCTURE AND DESIGN

KENTUCKY
- Agencies provide data on a regular schedule and have access to the de-identified system through a standard reporting tool.
- A "Master Person" record matching process that becomes "better" over time by retaining all the different versions of data for matching.
- A staging environment where data are validated and checked.

VIRGINIA - WORKFLOW

LEXICON - SHAKER PROCESS OVERVIEW
Diagram components: data sources DS 1, DS 2 and DS 3; Lexicon; Linking Control; Shaker; Workflow Manager; User Interface/Portal/LogiXML; Sub-Query Optimization; ID Resolution & Query; Authorized Query; Query Results.
Linking uses Common IDs [deterministic] or Common Elements with appropriate Transforms, Matching Algorithms and Thresholds [probabilistic].
Lexicon Query Building Process (Pre-Authorization): A linking engine process will update the Lexicon periodically to allow query building on known available matched data fields. No data is used in this process. Queries are built on the relationships between data fields in the Lexicon.

PRIVACY PROTECTING FEDERATED QUERY
Two steps:
1. Identity Resolution Process
2. Query Execution Process

VIRGINIA
How De-Identification Works in the VLDS System
Diagram: Agency 1 Data Source and Agency 2 Data Source each sit behind an agency firewall and connect to VLDS through a Data Adapter.
The Data Adapters prepare the data (including de-identification) before it leaves the agency firewall.

GETTING DATA READY FOR "DE-IDENTIFIED FEDERATION"
What the Data Adapter Does:
Step labels on the slide: Remove Suffix, Remove Symbols, Substitution Cipher, Order, Hashing.
Example shown on the slide:
- Original: De'Smith-Barney IV
- Remove Suffix: De'Smith-Barney
- Remove Symbols: DeSmithBarney
- Encoded: DeSmithBarney -> CSXBWPAKNOQSH -> C57S78XCEBF9WECP2AA9DK59N1CO27QBES54HFD
Substitution Alphabet (dynamically generated using the Hashing Key below): YDXWKQTAGOLCNSVEFHMRJPBZUI
Hashing Key (dynamically generated and sent from HANDS): b164f11d-aa37-44ca-93c3-82d3e
Result: CLEANED & ENCODED for TRANSPORT
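
The cleaning and encoding steps above might look roughly like the following sketch. It is an illustration under stated assumptions, not the VLDS Data Adapter: the suffix list, the way the substitution alphabet is derived from the key (a seeded shuffle), and the use of HMAC-SHA256 for the final hash are all choices made here, and the slide's "Order" step is not modeled. The function names and the sample key are hypothetical.

```python
import hashlib
import hmac
import random
import re
import string

SUFFIXES = {"JR", "SR", "II", "III", "IV", "V"}  # assumed list, not from the slides


def clean_name(raw):
    """Drop a trailing generational suffix, then every non-letter character."""
    tokens = raw.split()
    if len(tokens) > 1 and tokens[-1].upper().rstrip(".") in SUFFIXES:
        tokens = tokens[:-1]
    return re.sub(r"[^A-Za-z]", "", "".join(tokens))


def substitution_alphabet(key):
    """Derive a key-dependent permutation of A-Z (assumed derivation)."""
    seed = int.from_bytes(hashlib.sha256(key.encode()).digest(), "big")
    letters = list(string.ascii_uppercase)
    random.Random(seed).shuffle(letters)
    return "".join(letters)


def encode_name(raw, key):
    """Clean, apply the substitution cipher, then hash with the shared key."""
    cleaned = clean_name(raw).upper()
    table = str.maketrans(string.ascii_uppercase, substitution_alphabet(key))
    ciphered = cleaned.translate(table)
    digest = hmac.new(key.encode(), ciphered.encode(), hashlib.sha256)
    return digest.hexdigest().upper()


key = "example-session-key"  # hypothetical; the slides say the key is sent dynamically
token_a = encode_name("De'Smith-Barney IV", key)
token_b = encode_name("DeSmith Barney", key)
```

Two adapters holding the same key produce the same token for the same cleaned name, so records can be compared outside the firewall without the name itself ever leaving the agency.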

Cleaned and Encoded Matching Data (Internal ID, First and Last Name)
Many agencies DO NOT have an index of unique individuals. There can be many representations of the same individual.
INTERNAL_ID is the same, but LAST_NAME is NOT. What do we do?
Statistical Log Analysis and Reduction: We dynamically build a new "virtual" record made up of the "most likely" demographics.
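
One way to read the "virtual" record idea is sketched below, assuming that "most likely" means the most frequently observed value of each field among the rows sharing an INTERNAL_ID. The slides do not say how likelihood is actually estimated, so this interpretation, the function name `build_virtual_records`, and the sample values are assumptions.

```python
from collections import Counter, defaultdict


def build_virtual_records(rows, id_field="INTERNAL_ID"):
    """Collapse rows that share an ID into one record of most frequent values."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[id_field]].append(row)

    virtual = {}
    for internal_id, group in grouped.items():
        record = {id_field: internal_id}
        for field in (f for f in group[0] if f != id_field):
            # Most frequent observed value stands in for "most likely".
            record[field] = Counter(r[field] for r in group).most_common(1)[0][0]
        virtual[internal_id] = record
    return virtual


# Values stand for already cleaned and encoded fields (hypothetical tokens).
rows = [
    {"INTERNAL_ID": "17", "FIRST_NAME": "ENC_F1", "LAST_NAME": "ENC_L1"},
    {"INTERNAL_ID": "17", "FIRST_NAME": "ENC_F1", "LAST_NAME": "ENC_L1"},
    {"INTERNAL_ID": "17", "FIRST_NAME": "ENC_F1", "LAST_NAME": "ENC_L2"},
    {"INTERNAL_ID": "42", "FIRST_NAME": "ENC_F9", "LAST_NAME": "ENC_L9"},
]
virtual = build_virtual_records(rows)
```

The result is one record per INTERNAL_ID, which gives each agency dataset the unique person index that the linkage process needs.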

Probabilistic Linkage Process (Creating a Linking Directory)
(After we have a unique person index for each agency dataset)
Steps:
1. Blocking
2. m and u Parameter Calculation
3. Matching-Column Weight Calculations
4. Match Scoring
5. Linkage Determination and addition to Linking Directory
Blocking:
- Matching a 10,000 record set against a 1,000,000 record set = 10 BILLION possible combinations to analyze! It will take many hours/days.
- However, we know there can only be at most 10, Big waste of time and resources.
- Instead, we block, aka "pre-match", on differing criteria, like last name, birth month, etc. This usually cuts down possible combinations into the 10k-100k range, which is much better!
- We block on many different criteria combinations in case one blocking scheme misses a match (e.g. because of misspellings).
m and u Parameter Calculation:
- The m probability (quality): if we have two records that we know match, then all of the paired field values should, in a perfect world, match. If some of the fields don't match, then we have some degree of quality/reliability problem with those fields.
- The u probability (commonness): if we have two records that we know don't match, how likely is it that the values in a particular field pairing (e.g. "City") will match anyway (because of the relative commonness of the values)?
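
The blocking step can be sketched as follows: rather than comparing every record with every other, only records that share a blocking key become candidate pairs, and several blocking passes are unioned so that one misspelled field does not hide a match. The field names, sample records, and the function `candidate_pairs` are hypothetical.

```python
from collections import defaultdict


def candidate_pairs(left, right, block_keys):
    """Union of candidate (left id, right id) pairs over several blocking passes."""
    pairs = set()
    for key_fn in block_keys:
        # Index the right-hand dataset by this pass's blocking key.
        index = defaultdict(list)
        for record in right:
            index[key_fn(record)].append(record["id"])
        # Only records that land in the same block are paired.
        for record in left:
            for right_id in index.get(key_fn(record), []):
                pairs.add((record["id"], right_id))
    return pairs


left = [
    {"id": "a1", "last": "SMITH", "birth_month": 4, "birth_year": 1990},
    {"id": "a2", "last": "JONES", "birth_month": 7, "birth_year": 1985},
]
right = [
    {"id": "b1", "last": "SMYTH", "birth_month": 4, "birth_year": 1990},
    {"id": "b2", "last": "JONES", "birth_month": 7, "birth_year": 1985},
    {"id": "b3", "last": "BROWN", "birth_month": 1, "birth_year": 1970},
]
block_keys = [
    lambda r: r["last"],                            # pass 1: last name
    lambda r: (r["birth_month"], r["birth_year"]),  # pass 2: birth month and year
]
pairs = candidate_pairs(left, right, block_keys)
```

Here the last-name pass misses the SMITH/SMYTH pair, but the birth-date pass recovers it, and only 2 of the 6 possible pairs go on to scoring.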
Matching, Weighting and Scoring
Weights for matches and non-matches are applied, and scores are summed.
Table columns: DS_1_UNQ_ID, DS_2_UNQ_ID, LN_MATCH, FN_MATCH, BD_MATCH, BM_MATCH, BY_MATCH, G_MATCH, F_MATCH, SCORE
Linkage Determination:
- A cutoff score needs to be set for each blocked comparison, below which a link is not accepted as a real "link".
- The best method of establishing this cutoff is for the system operator to work with a content-area expert to determine the peculiarities of the data for that content area.
- In some data sets it may be very unlikely that a birthdate was entered incorrectly, while in another it may happen very regularly; a computer cannot automatically know this.
- Once these cutoffs are set, they don't need to be changed unless something drastic occurs to change the nature of the dataset.
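
A sketch of how per-field weights, a summed score, and a cutoff could fit together. The slides do not give the weight formula; the log-ratio form used here is the one commonly associated with Fellegi-Sunter style record linkage and is an assumption, as are the m and u values, the field names, and the cutoff.

```python
import math


def field_weights(m, u):
    """Return (agreement weight, disagreement weight) for one field."""
    return math.log2(m / u), math.log2((1 - m) / (1 - u))


def match_score(agreements, params):
    """Sum the agreement or disagreement weight of every compared field."""
    total = 0.0
    for field, agrees in agreements.items():
        agree_w, disagree_w = field_weights(*params[field])
        total += agree_w if agrees else disagree_w
    return total


def is_link(score, cutoff):
    # Pairs scoring below the cutoff are not accepted as real links.
    return score >= cutoff


# Hypothetical parameters: (m, u) per field.
params = {
    "last_name": (0.95, 0.01),     # reliable field, rarely agrees by chance
    "birth_month": (0.98, 1 / 12), # agrees by chance about 1 time in 12
}
full = match_score({"last_name": True, "birth_month": True}, params)
partial = match_score({"last_name": False, "birth_month": True}, params)
cutoff = 5.0  # hypothetical; the slides say this is set with a content-area expert
```

A field with high m and low u earns a large positive weight when it agrees and a large negative weight when it does not, which is why disagreement on last name pulls the second score below the cutoff in this example.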

Data Partner Intake Process
Parties: New Agency, VLDS Governance Council, VT
1. Application
2. Proposal/Details (Change Requests)
3. Construct
Also shown: Technical Reqs

ARKANSAS
TrustEd:
- Knowledgebase Identity Management (KIM)
- TrustEd Identifier Management (TIM)

MISSISSIPPI
Diagram entities: Organization, Program, Individual (relationships labeled "has 0..n" and "has 1..n")

MISSISSIPPI
Figure: SLDS entity relationship diagram

MISSISSIPPI
Figure: Data lifecycle

DATA ACCESS

KENTUCKY
- Agencies have access to all the identifiable data they have provided to P-20, but not to each other's.
- Agencies and P-20 staff have access to the de-identified production data. Shared "universes" go live in December.
- P-20 staff respond to multi-agency data requests, with a vetting process to validate for accuracy.
- Access is through Business Objects Web Intelligence (WebI); P-20 staff use WebI, Crystal, SSRS, SPSS, SAS, Arc-View, and other tools.
- Data retention as needed. Long-term retention to be determined.

KENTUCKY
Centralized Reporting for Cross-Agency Issues
- High School Feedback
- Adult Education Feedback
- Employment Outcomes and Earnings
- County Profiles
- Workforce and Training Outcomes

ARKANSAS
TrustEd: KIM & TIM

MISSISSIPPI
Figure: Data access

CONTACTS & ADDITIONAL RESOURCES
Contact information: Charles McGrew, Aaron Schroeder, Neal Gibson, Mimmo Parisi, Jim Campbell, Jeff Sellers, Keith Brown
Resources: