Education Data Warehouse Building Blocks: Identity Matching and Data Governance
IPMA, May 21, 2013

Presentation transcript:


AGENDA
- Vision (Marc Baldwin)
- Identity matching (John Sabel)
- Data governance (Melissa Beard)
- Questions

IDENTITY MATCHING

Protecting Personally Identifiable Information (PII)
Step 1: Isolate PII data from all other data
- Link PII data in an isolated environment to create linking IDs.
- Perform data analysis on the linked data in a different environment. This environment has no PII data.
Step 2: Redact data
- FERPA (Family Educational Rights and Privacy Act) requirements.
- Subject to data sharing agreements.
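Step 1 can be illustrated with a minimal Python sketch. The field names in `PII_FIELDS`, the `isolate_pii` function, and the caller-supplied `match_key` are hypothetical and not taken from the slides; the exact-key lookup stands in for whatever identity matching runs inside the isolated environment.

```python
import uuid

# Assumed PII field names; a real system would define its own list.
PII_FIELDS = ("name", "ssn", "birth_date")

def isolate_pii(records, match_key):
    """Split records into PII rows and analysis rows joined only by a linking ID.

    `match_key` is a hypothetical function that returns the same value for
    records judged to belong to the same person.
    """
    link_ids = {}  # match key -> linking ID; never leaves the PII environment
    pii_rows, analysis_rows = [], []
    for rec in records:
        key = match_key(rec)
        if key not in link_ids:
            link_ids[key] = uuid.uuid4().hex  # random ID that encodes no PII
        link_id = link_ids[key]
        pii_rows.append(
            {"link_id": link_id, **{f: rec[f] for f in PII_FIELDS if f in rec}}
        )
        analysis_rows.append(
            {"link_id": link_id,
             **{f: v for f, v in rec.items() if f not in PII_FIELDS}}
        )
    return pii_rows, analysis_rows
```

Only `analysis_rows` would be passed to the analysis environment; `pii_rows` and the key-to-ID mapping stay behind.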

P-20 Data Warehouse: Inputs through Outputs
Personally Identifiable Information (PII) is encapsulated in the MDM Oracle database.
[Diagram]
- Input: Sector Data Providers (DEL, OSPI, SBCTC, PCHEES, ESD, DRS, NSC, L&I)
- Identity Matching: Master Data Management (MDM) (Oracle Database), which receives the PII data
- Data Store: Operational Data Store (SQL Server Database), which receives linked IDs only (PII data stripped) and the non-PII data (bulk of data)
- Output (Business Intelligence): stars, cubes, data sets

Identity Matching Challenges
- Most education data involves deduplication (i.e., consolidation).
- Between sources of data, the number and quality of common identifiers vary.
  - Public post-secondary instruction data has SSNs, but K12 data does not.
- Idiosyncrasies in data
  - For example, Jan 1st birth dates are often used when the birth day and month are unknown.
  - Twins
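One possible way to handle the Jan 1st idiosyncrasy is to treat such a date as "year known, day and month unknown" when comparing records. The function below is a hypothetical sketch of that idea, not a rule stated in the slides.

```python
from datetime import date

def birth_dates_compatible(a: date, b: date) -> bool:
    """Return True if two birth dates could belong to the same person.

    Assumption: a Jan 1 date may be a placeholder, so it is compared
    on year only. Any other pair of dates must match exactly.
    """
    if a == b:
        return True
    either_placeholder = (a.month, a.day) == (1, 1) or (b.month, b.day) == (1, 1)
    return either_placeholder and a.year == b.year
```

A compatible result here is weaker evidence than an exact date match, so it would normally feed a review step rather than an automatic merge. Twins remain indistinguishable by birth date alone.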

Identity Matching Challenge Matrix

                                         | Many Common Identifiers (Easy) | Few Common Identifiers (Hard)
Linking Two Data Sources* (Easy)         | Easy²                          | Hard x Easy
Deduplicating One Data Source** (Hard)   | Easy x Hard                    | Hard²

* Example: Linking birth certificate data to hospitalization data.
** Example: Post-secondary instruction data. A single student can be enrolled in multiple colleges, both longitudinally (over time) and at the same time.

Identifiers in the Perfect World

    SELECT K12.*, College.*
    FROM K12
    INNER JOIN College
      ON K12.Bulletproof_Surefire_Global_Student_ID = College.Bulletproof_Surefire_Global_Student_ID

Identifiers in the Other Perfect World

    SELECT K12.*, College.*
    FROM K12
    INNER JOIN College
      ON K12.SSN = College.SSN

Note: Every student has a valid, properly assigned SSN.
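Outside that perfect world, the same join silently drops students whose SSN is missing on either side. The sketch below runs the query against an in-memory SQLite database from Python; the two-column tables and the obviously fake SSN values are assumptions made for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE K12 (name TEXT, SSN TEXT);
    CREATE TABLE College (name TEXT, SSN TEXT);
    -- Bo has no SSN on file in K12, which the slides note is typical of K12 data.
    INSERT INTO K12 VALUES ('Ann', '000-00-0001'), ('Bo', NULL);
    INSERT INTO College VALUES ('Ann', '000-00-0001'), ('Bo', '000-00-0002');
""")

matched = con.execute(
    "SELECT K12.name FROM K12 INNER JOIN College ON K12.SSN = College.SSN"
).fetchall()
print(matched)  # only Ann is linked; Bo is lost because NULL never equals anything
```

The lost row produces no error, so a false negative of this kind goes unnoticed unless unmatched records are counted separately.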

Addressing Identity Matching Challenges
- Deduplicate each data source first.
  - You can then take advantage of source-specific identifiers.
  - For example, K12 data has the State Student Identifier (SSID).
- Merge the deduplicated data source with the rest of the data warehouse.*
* This is itself a deduplication process.
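A minimal sketch of the first step, assuming each record is a dict with `name` and `birth_date` fields plus a source-specific ID field such as `ssid` (all field names are hypothetical):

```python
def dedupe_by_source_id(records, id_field):
    """Consolidate records that share a source-specific identifier.

    Returns one entry per identifier, keeping every name and birth date
    variant seen for that person within the source.
    """
    people = {}
    for rec in records:
        person = people.setdefault(
            rec[id_field], {"names": set(), "birth_dates": set()}
        )
        person["names"].add(rec["name"])
        person["birth_dates"].add(rec["birth_date"])
    return people
```

Keeping all variants gives the later warehouse merge more evidence to work with than any single source record would.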

Identity Matching Opportunities
- Use name change data.
  - For example, DOH marriage and divorce data.*
* As of 2012, marriage and divorce data contains inferred name changes for females only.
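As a rough illustration of how name change data could widen a match, the sketch below looks up the other surnames recorded for a person. The record layout (`first`, `birth_date`, `old_surname`, `new_surname`) is hypothetical and is not the layout of the DOH data.

```python
def surname_aliases(first, birth_date, surname, name_changes):
    """Return the surnames a person may appear under, given name change records.

    Follows a single recorded change in either direction; chains of
    several changes are not handled in this sketch.
    """
    aliases = {surname}
    for change in name_changes:
        if (change["first"], change["birth_date"]) != (first, birth_date):
            continue
        if change["old_surname"] == surname:
            aliases.add(change["new_surname"])
        elif change["new_surname"] == surname:
            aliases.add(change["old_surname"])
    return aliases
```

Two records could then be treated as surname-compatible when their alias sets overlap, rather than only when the surnames are identical.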

Identity Matching Mechanics
- First, deterministically deduplicate data.
  - Always strive first to minimize false positives and then try to minimize false negatives.
  - These matches are then auto-merged.
- Second, use probabilistic techniques to auto-merge additional data.
- Last, use probabilistic techniques to create manual review sets.
  - These are selectively merged.
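The three tiers can be sketched as a single pair classifier. Everything specific here is an assumption: the strict deterministic rule (SSN and birth date both agree), the field names, the weights, and the thresholds. The weighted string-similarity score is a simple stand-in for a real probabilistic model, not the method used by ERDC.

```python
from difflib import SequenceMatcher

def classify_pair(a, b, auto_threshold=0.9, review_threshold=0.5):
    """Classify two records as 'auto_merge', 'manual_review', or 'no_match'."""
    # Tier 1: deterministic rule, kept strict to minimize false positives.
    if a.get("ssn") and a.get("ssn") == b.get("ssn") \
            and a["birth_date"] == b["birth_date"]:
        return "auto_merge"

    # Tiers 2 and 3: a crude similarity score in [0, 1] (hypothetical weights).
    name_score = SequenceMatcher(
        None, a["name"].lower(), b["name"].lower()
    ).ratio()
    dob_score = 1.0 if a["birth_date"] == b["birth_date"] else 0.0
    score = 0.6 * name_score + 0.4 * dob_score

    if score >= auto_threshold:
        return "auto_merge"      # tier 2: probabilistic auto-merge
    if score >= review_threshold:
        return "manual_review"   # tier 3: queued for selective merging
    return "no_match"
```

Raising `auto_threshold` trades false positives for a larger manual review set, which matches the slide's priority of minimizing false positives first.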

DATA GOVERNANCE

ERDC Data Governance
- No data warehouse without data governance
- Rules of engagement
- Goal: Link data so it can be shared
- Data contributors
- Data sharing policy workgroup
  - Defined set of tasks
  - Temporary
  - Small group of problem-solvers

P-20W DATA GOVERNANCE COMMITTEE STRUCTURE
Office of Financial Management, Education Research & Data Center (ERDC)
- Data Stewards Committee: Experts directly familiar with data from their agency used in research.
- Data Custodians Committee: Technical experts responsible for the technical delivery of data to and from the warehouse.
- Research Coordination Committee: Policy experts who interact with agency decision-makers, stakeholders, and researchers.
- ERDC Guidance Committee: Agency directors or deputies from agencies contributing data.

CONTACT INFORMATION
- Marc Baldwin, OFM Assistant Director, Forecasting
- John Sabel, Education Research Analyst
- Melissa Beard, Data Governance Coordinator