
Education Data Warehouse Building Blocks: Identity Matching and Data Governance IPMA May 21, 2013


1 Education Data Warehouse Building Blocks: Identity Matching and Data Governance IPMA May 21, 2013

2 AGENDA
Vision (Marc Baldwin)
Identity matching (John Sabel)
Data Governance (Melissa Beard)
Questions

3 IDENTITY MATCHING

4 Protecting Personally Identifiable Information (PII)
Step 1: Isolate PII data from all other data. Link PII data in an isolated environment to create linking IDs. Perform data analysis on the linked data in a different environment that has no PII data.
Step 2: Redact data according to FERPA (Family Educational Rights and Privacy Act) requirements, subject to data sharing agreements.
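A minimal sketch of Step 1, assuming a flat record with hypothetical field names (the slides do not give a schema). It only shows the separation of PII from analysis data under a shared linking ID; the matching that decides which records receive the same ID is left out, and the random ID is an assumption chosen so that the ID cannot be derived from the PII itself.

```python
import uuid

# Hypothetical PII field names; the actual schema is not given in the slides.
PII_FIELDS = {"first_name", "last_name", "birth_date", "ssn"}


def split_record(record, link_id=None):
    """Split one source record into a PII part and a non-PII part.

    Both parts carry the same linking ID, so analysis can run on the
    non-PII part alone in an environment that never sees the PII.
    """
    link_id = link_id or uuid.uuid4().hex
    pii = {k: v for k, v in record.items() if k in PII_FIELDS}
    non_pii = {k: v for k, v in record.items() if k not in PII_FIELDS}
    pii["link_id"] = link_id
    non_pii["link_id"] = link_id
    return pii, non_pii


# Invented example record.
record = {
    "first_name": "Ann",
    "last_name": "Lee",
    "birth_date": "1995-03-02",
    "ssn": "000-00-0001",
    "school": "Example High",
    "credits": 22,
}
pii_row, analysis_row = split_record(record)
```

Here `pii_row` would stay in the isolated environment, and only `analysis_row` would move on to the analysis environment.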

5 P-20 Data Warehouse Inputs through Outputs
Personally Identifiable Information (PII) is encapsulated in the MDM Oracle database.
[Diagram. Stages: Input, Identity Matching, Data Store, Output (Business Intelligence). Components: Sector Data Providers (DEL, OSPI, SBCTC, PCHEES, ESD, DRS, NSC, L&I); Master Data Management (MDM) (Oracle Database); Operational Data Store (SQL Server Database). Data flows: PII Data; Linked IDs Only (PII Data Stripped); Non-PII Data (Bulk of Data). Outputs: Stars, Cubes, Data Sets.]

6 Identity Matching Challenges
Most education data involves deduplication (i.e., consolidation).
The number and quality of common identifiers vary between data sources. For example, public post-secondary instruction data has SSNs but K12 data does not.
The data has idiosyncrasies. For example, Jan 1st birth dates are often used when the birth day and month are unknown.
Twins.
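The Jan 1st placeholder can be handled explicitly when comparing birth dates. The sketch below is one possible treatment, not something the slides prescribe: the function names are hypothetical, and falling back to a year-only comparison is an assumption made for illustration.

```python
from datetime import date


def birth_date_is_suspect(birth_date):
    """Return True for Jan 1 dates, a common placeholder for an unknown day and month."""
    return birth_date.month == 1 and birth_date.day == 1


def birth_dates_agree(a, b):
    """Compare two birth dates.

    Assumption: when either date is a Jan 1 placeholder, only the year
    is compared; otherwise the full dates must be equal.
    """
    if birth_date_is_suspect(a) or birth_date_is_suspect(b):
        return a.year == b.year
    return a == b
```

A genuine Jan 1st birthday is treated as a placeholder too, so this comparison is deliberately lenient; other identifiers would have to carry more weight for those records.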

7 Identity Matching Challenge Matrix

                                          Many Common Identifiers (Easy)    Few Common Identifiers (Hard)
Linking Two Data Sources* (Easy)          Easy^2                            Hard x Easy
Deduplicating One Data Source** (Hard)    Easy x Hard                       Hard^2

* Example: Linking birth certificate data to hospitalization data.
** Example: Post-secondary instruction data. A single student can be enrolled in multiple colleges, both longitudinally (over time) and at the same time.

8 Identifiers in the Perfect World
SELECT K12.*, College.*
FROM K12
INNER JOIN College
    ON K12.Bulletproof_Surefire_Global_Student_ID = College.Bulletproof_Surefire_Global_Student_ID

9 Identifiers in the Other Perfect World
SELECT K12.*, College.*
FROM K12
INNER JOIN College
    ON K12.SSN = College.SSN
Note: Every student has a valid, properly assigned SSN.
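To see why the note on this slide matters, the sketch below runs the same SSN join against an in-memory SQLite database. The tables, names, and SSN values are invented for illustration. One K12 row has no SSN, and the inner join silently drops that student.

```python
import sqlite3

# In-memory database with invented rows; one K12 student has no SSN.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE K12 (name TEXT, SSN TEXT);
    CREATE TABLE College (name TEXT, SSN TEXT);
    INSERT INTO K12 VALUES ('Ann Lee', '000-00-0001'), ('Bo Cruz', NULL);
    INSERT INTO College VALUES ('Ann Lee', '000-00-0001'), ('Bo Cruz', '000-00-0002');
    """
)

# Same join shape as the slide: match rows on SSN only.
rows = conn.execute(
    """
    SELECT K12.name, College.name
    FROM K12
    INNER JOIN College ON K12.SSN = College.SSN
    """
).fetchall()
conn.close()
```

Only the Ann Lee pair comes back. In SQL a NULL never compares equal to anything, so the student with the missing SSN becomes a false negative without any error being raised.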

10 Addressing Identity Matching Challenges
Deduplicate each data source first. You can then take advantage of source-specific identifiers. For example, K12 data has the State Student Identifier (SSID).
Merge each deduplicated data source with the rest of the data warehouse.*
* This is itself a deduplication process.
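The first step, deduplicating within one source on its own identifier, can be sketched as follows. The row layout and the function name are hypothetical; the sketch assumes every K12 row carries an SSID and does not cover the later merge into the warehouse.

```python
def deduplicate_by_ssid(k12_rows):
    """Collapse K12 rows that share an SSID into one consolidated record per student."""
    students = {}
    for row in k12_rows:
        student = students.setdefault(
            row["ssid"],
            {"ssid": row["ssid"], "names": set(), "schools": set()},
        )
        # Keep every name and school variant seen for this SSID.
        student["names"].add(row["name"])
        student["schools"].add(row["school"])
    return list(students.values())


# Invented rows: the same student appears twice with a name variant.
k12_rows = [
    {"ssid": "1001", "name": "Ann Lee", "school": "Example Middle"},
    {"ssid": "1001", "name": "Anne Lee", "school": "Example High"},
    {"ssid": "1002", "name": "Bo Cruz", "school": "Example High"},
]
deduplicated = deduplicate_by_ssid(k12_rows)
```

Keeping all observed name variants on the consolidated record gives the later cross-source merge more to match against.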

11 Identity Matching Opportunities
Use name change data. For example, DOH marriage and divorce data.*
* As of 2012, the marriage and divorce data contains inferred name changes for females only.

12 Identity Matching Mechanics
First, deterministically deduplicate the data. Always strive first to minimize false positives, and then try to minimize false negatives. These matches are then auto-merged.
Second, use probabilistic techniques to auto-merge additional data.
Last, use probabilistic techniques to create manual review sets. These are selectively merged.
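The three tiers above can be sketched as a single pair classifier. This is an illustration under stated assumptions, not the presenters' implementation: the deterministic rule (SSID plus birth date), both thresholds, and the birth-date penalty are hypothetical, and a plain string-similarity score from Python's `difflib` stands in for a real probabilistic match score.

```python
from difflib import SequenceMatcher

# Hypothetical thresholds; real values would be tuned against reviewed data.
AUTO_MERGE_THRESHOLD = 0.92
REVIEW_THRESHOLD = 0.75


def classify_pair(a, b):
    """Classify a candidate record pair into one of the three tiers, or no match."""
    # Tier 1: deterministic rule, kept strict to minimize false positives.
    if a["ssid"] and a["ssid"] == b["ssid"] and a["birth_date"] == b["birth_date"]:
        return "auto-merge (deterministic)"

    # Stand-in score: name similarity, halved when birth dates disagree.
    score = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    if a["birth_date"] != b["birth_date"]:
        score *= 0.5

    # Tier 2: high scores are merged automatically.
    if score >= AUTO_MERGE_THRESHOLD:
        return "auto-merge (probabilistic)"
    # Tier 3: middling scores go to a manual review set.
    if score >= REVIEW_THRESHOLD:
        return "manual review"
    return "no match"
```

With these settings, two records with no SSID, the same birth date, and the names "Kathryn Jones" and "Catherine Jones" land in the manual review set, while identical names with the same birth date are auto-merged.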

13 DATA GOVERNANCE

14 ERDC Data Governance
No data warehouse without data governance. Rules of engagement. Goal: Link data so it can be shared. Data contributors.
Data sharing policy workgroup: defined set of tasks; temporary; small group of problem-solvers.

15 P-20W DATA GOVERNANCE COMMITTEE STRUCTURE
Office of Financial Management, Education Research & Data Center (ERDC)
Data Stewards Committee: Experts directly familiar with data from their agency used in research.
Data Custodians Committee: Technical experts responsible for the technical delivery of data to and from the warehouse.
Research Coordination Committee: Policy experts who interact with agency decision-makers, stakeholders, and researchers.
ERDC Guidance Committee: Agency directors or deputies from agencies contributing data.

16 CONTACT INFORMATION
Marc Baldwin, OFM Assistant Director, Forecasting, Marc.baldwin@ofm.wa.gov, 360-902-0590
John Sabel, Education Research Analyst, John.sabel@ofm.wa.gov, 360-902-0943
Melissa Beard, Data Governance Coordinator, Melissa.beard@ofm.wa.gov, 360-902-0584

