May 21, 2014. Identity Matching: SSNs Are Not Enough! John Sabel, ERDC ARRA SLDS Conference.

Presentation transcript:

Identity Matching: SSNs Are Not Enough!
John Sabel
ERDC ARRA SLDS Conference, May 21, 2014

About the ERDC
The Revised Code of Washington (RCW) established the Education Research & Data Center (ERDC) in the Washington State Office of Financial Management (OFM). In collaboration with statutory partner agencies representing education and employment, and with the Legislative Evaluation and Accountability Program (LEAP) committee, ERDC conducts analyses of early learning, K-12, and higher education programs and of education and workforce issues across the P-20W system.
ERDC Vision
To promote a seamless, coordinated preschool-to-career (P-20W) experience for all learners by providing objective analysis and information.
ERDC Mission
To develop longitudinal information spanning the P-20W system in order to facilitate analyses, provide meaningful reports, collaborate on education research, and share data.
ERDC Values
1. Coordinate, facilitate, build upon and enhance the education data collection and analysis already being done by multiple agencies and institutions.
2. Adhere strictly to both the letter and spirit of privacy laws affecting individual student record data and be sensitive to other privacy concerns.
3. Achieve consensus wherever possible among participating agencies and institutions in determining the best data and research available to help guide the implementation of P-20W goals.
4. Conduct all business, data development and research in an open and transparent fashion (to the extent allowed by privacy laws), with the full inclusion of education agencies, organizations, and institutions as well as legislative participants.

About the P20W Data Warehouse
The ERDC is the owner and user of the State of Washington’s P20W Data Warehouse. The system is hosted by the Department of Enterprise Services.
The P20W Data Warehouse is a statewide longitudinal data system that includes de-identified data about people's early childhood, kindergarten through 12th grade, higher education, and workforce experiences and performance (hence the name P20W). The data are collected and linked from existing state agency data systems.
The warehouse includes data about the kinds of services people receive, the programs in which they participate, their academic performance, and their program or degree completion. It also includes a variety of demographic data, so that different groups of people can be examined.
Personally identifiable information, such as names, Social Security numbers, addresses, and other data that can identify a person as an individual, is not part of the research database.

If SSNs Were Perfect and Ubiquitous…
SELECT K12.*, College.*
FROM K12
INNER JOIN College
    ON K12.SSN = College.SSN
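A minimal sketch of why that join only works in the ideal case, run through Python's built-in sqlite3 module. The table contents, the `student` column, and the SSN values are invented for illustration and are not ERDC data.

```python
import sqlite3

# In-memory database with two tiny, invented tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE K12 (student TEXT, SSN TEXT);
    CREATE TABLE College (student TEXT, SSN TEXT);
    -- A: SSN recorded correctly in both systems.
    -- B: one digit mistyped in the college record.
    -- C: no SSN collected at all.
    INSERT INTO K12 VALUES ('A', '111-11-1111'), ('B', '222-22-2222'), ('C', NULL);
    INSERT INTO College VALUES ('A', '111-11-1111'), ('B', '222-22-2223'), ('C', NULL);
""")

rows = conn.execute("""
    SELECT K12.student, College.student
    FROM K12
    INNER JOIN College ON K12.SSN = College.SSN
""").fetchall()

print(rows)  # only student A survives the join
```

Student B is lost to a transcription error and student C is lost because a NULL SSN never compares equal to anything, which previews the two problems discussed in the following slides.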

SSNs Are Not Perfect
People’s actual SSN can be different from the recorded SSN for any number of reasons:
- Transcription error.
- Wrong SSN recorded. For example, a parent filling in their own SSN on their child’s Running Start application.
- Intentionally filling in an incorrect SSN on a form.

Multiple SSNs per P20ID
In the ERDC P20W data warehouse, individual P20IDs (unique person IDs) sometimes have more than one SSN.

Multiple P20IDs per SSN
Conversely, some SSNs are shared by more than one P20ID.

Ways to Address Imperfect SSNs
ERDC is using or developing several ways to validate or invalidate SSNs:
- Frequency and use analysis of P20IDs and SSNs in the P20W data warehouse.
- Comparison of the last 4 digits of SSNs with Department of Licensing data.
- Using the Social Security Administration's Death Master File and the Washington Department of Health Death Names file to find SSN group/area numbers (the first 5 digits of an SSN) that have never been issued.
- Using the Social Security Administration's High Group list to find when SSN group/area numbers were issued. Data are readily available only from November 2003 to June 24; on June 25, 2011, SSN randomization began.
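Two of these checks can be sketched in a few lines of Python. This is an illustration under stated assumptions, not ERDC's code: the function names are hypothetical, the `never_issued_prefixes` set stands in for whatever list of unissued 5-digit prefixes an agency derives from its own reference files, and the built-in rules are only the widely documented structural ones (area numbers 000, 666, and 900 to 999, group 00, and serial 0000 are not assigned).

```python
import re

def normalize_ssn(raw):
    """Return the 9 digits of an SSN, or None if there are not exactly 9."""
    digits = re.sub(r"\D", "", raw or "")
    return digits if len(digits) == 9 else None

def is_plausible_ssn(raw, never_issued_prefixes=frozenset()):
    """Structural screen only: a True result does not prove the SSN is real."""
    digits = normalize_ssn(raw)
    if digits is None:
        return False
    area, group, serial = digits[:3], digits[3:5], digits[5:]
    # Area numbers 000, 666 and 900-999 are never assigned.
    if area in ("000", "666") or area >= "900":
        return False
    # Group 00 and serial 0000 are never assigned.
    if group == "00" or serial == "0000":
        return False
    # Hypothetical lookup: 5-digit prefixes known never to have been issued.
    return digits[:5] not in never_issued_prefixes

def last4_agrees(ssn, other_last4):
    """Compare an SSN's last 4 digits with a last-4 value from another source."""
    digits = normalize_ssn(ssn)
    return digits is not None and digits[5:] == other_last4
```

A failed check is a reason to distrust the SSN as a matching key, not a reason to discard the record, since the other identifiers may still link it correctly.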

SSNs Are Not Ubiquitous
Even if SSNs were perfect, less than half the P20IDs in the P20W data warehouse have them.

Any Identifier Has Similar Problems. So What to Do?
Like SSNs, any “global” identifier will have some or all of these problems. So what to do? Add additional identifiers for identity matching:
- First, middle, and last names
- Birth date
- Gender
- School/college codes
- District codes
All that said, SSN really is an excellent identity matching variable to have.

Using a Larger Set of Identifiers for Identity Matching
Identity matching is split into three phases:
1. Deterministic matching, automerge: Always strive first to minimize false positives and then try to minimize false negatives. Matches are automatically matched and merged.
2. Probabilistic matching, automerge: Additional matches are matched and merged.
3. Probabilistic matching, manual merge: Additional matches are manually reviewed, and then selectively matched and merged.
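The three phases can be read as a routing decision for each candidate pair. The sketch below is only an illustration of that routing: the `Person` fields, the weights, and the two thresholds are hypothetical, and a simple additive agreement score stands in for a real probabilistic model.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Person:
    ssn: Optional[str]
    first: str
    last: str
    dob: str  # ISO date string, e.g. "2000-01-02"
    gender: Optional[str] = None

# Hypothetical agreement weights and thresholds, chosen only for this example.
WEIGHTS = {"ssn": 5.0, "first": 2.0, "last": 3.0, "dob": 4.0, "gender": 0.5}
AUTO_MERGE_THRESHOLD = 10.0
REVIEW_THRESHOLD = 6.0

def deterministic_match(a, b):
    """Strict rule meant to keep false positives low: every key field agrees."""
    return (a.ssn is not None and a.ssn == b.ssn
            and a.first == b.first and a.last == b.last and a.dob == b.dob)

def score(a, b):
    """Sum the weights of the fields that are present and equal."""
    total = 0.0
    for field, weight in WEIGHTS.items():
        x, y = getattr(a, field), getattr(b, field)
        if x is not None and x == y:
            total += weight
    return total

def classify(a, b):
    """Route a candidate pair to one of the three phases, or reject it."""
    if deterministic_match(a, b):
        return "phase 1: automerge"
    s = score(a, b)
    if s >= AUTO_MERGE_THRESHOLD:
        return "phase 2: automerge"
    if s >= REVIEW_THRESHOLD:
        return "phase 3: manual review"
    return "no match"
```

With these invented weights, a pair that agrees on everything except a missing SSN scores 9.5 and is sent to manual review rather than merged automatically.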

Deterministic Matching, Automerge Example
* Collapsed DOB is a birth date that has been transformed so that birth dates that have the same birth year, but inverted birth months and days, have the same value.
Set of all true positive matches
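One way to implement a collapsed DOB, sketched in Python. The function name and the tuple representation are assumptions; the slide defines only the property the transform must have.

```python
from datetime import date

def collapse_dob(dob):
    """Map a birth date to a key that ignores month/day order.

    Dates with the same year whose month and day are swapped
    (for example 3/7 and 7/3) collapse to the same key.
    """
    low, high = sorted((dob.month, dob.day))
    return (dob.year, low, high)

# March 7 and July 3 of the same year share one key.
print(collapse_dob(date(2001, 3, 7)) == collapse_dob(date(2001, 7, 3)))
```

Comparing on this key lets a deterministic rule tolerate the common month/day transposition without loosening the match on birth year.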

Manual Review of Potential Match Pairs
Potential match pairs are brought into Excel for manual review.
Cell pairs that are not alike are color coded. Red means the cells are different. Yellow means one cell has no data.
Each potential match pair is classified in the “Class” variable according to similarities in the different identifiers. This allows the potential match pairs to be sorted.
(Invented example data)

Other Methods to Improve Match Rate
ERDC uses several other techniques to improve the match rate:
- Rigorous standardization of all name fields.
- Bringing into manual review additional fields such as school history over time.
- County affinity matrices.
- Use of name change data.
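As an illustration of the first technique, a name standardization step might look like the sketch below. The specific rules (strip accents, drop apostrophes and periods, treat other punctuation as spaces, uppercase, remove generational suffixes) and the `SUFFIXES` list are assumptions for this example, not ERDC's actual rules.

```python
import re
import unicodedata

# Hypothetical list of suffixes to drop before comparing names.
SUFFIXES = {"JR", "SR", "II", "III", "IV"}

def standardize_name(raw):
    """Return an uppercase, ASCII-letter-only form of a name field."""
    # Decompose accented characters, then drop the combining marks.
    text = unicodedata.normalize("NFKD", raw or "")
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Delete apostrophes and periods so O'Brien becomes OBrien.
    text = re.sub(r"['.\u2019]", "", text)
    # Any other run of non-letters (hyphens, commas, spaces) becomes one space.
    text = re.sub(r"[^A-Za-z]+", " ", text).upper()
    tokens = [t for t in text.split() if t not in SUFFIXES]
    return " ".join(tokens)

print(standardize_name("  José  O'Brien-Smith, Jr. "))
```

Applying the same function to every source system before matching means that differences in case, spacing, and punctuation no longer count as disagreements.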

How to Obtain P-20W Data
ERDC Data Request Process
Please go to the ERDC’s “Accessing P-20W Data” page at:
1. Fill out the Data Request Form and send it to ERDC.
2. ERDC calls the requestor to clarify the request, if necessary.
3. If the request is changed, ERDC sends the changes to the requestor for approval.
4. ERDC sends the data request, including the study questions and the data requested, to the data contributors.
5. Data contributors have 5 days to review and respond to the requestor about the data requested.
6. Requestor works with ERDC to revise the request based on feedback, if necessary.
7. ERDC creates a data sharing agreement (DSA) with the requestor to share the linked, de-identified data.
   a. Copy of signed DSA will be made available by ERDC via the website or
8. ERDC works to get the data to the requestor.
9. Requestor works with the data and contacts data contributors with questions about their data.
10. Requestor sends a draft report to ERDC for distribution to data contributors.
11. Data contributors have 10 days to review the report and respond to the requestor with comments about use of the data.
12. Requestor releases the report.

Contact the ERDC
ERDC Website
ERDC Mailing Address: P.O. Box Olympia, WA
ERDC Phone/Fax: Phone: (360) Fax: (360)
John Sabel