Achieving Better Data Quality: Reducing Duplicate Records
Sandra Schulthies, M.S.
Yukiko Yoneoka, M.S.
Barry Nangle, Ph.D.
11/20/2018
Many Sources of Data
- Vital Records
- Public Health Clinics
- WIC
- Private Provider Billing Systems
- WebKIDS application
Data Load/Match Procedure
"Big 7" main identifiers:
- First Name
- Middle Name
- Last Name
- Date of Birth
- Mother's First Name
- Mother's Maiden Name
- Social Security Number
New Record OR Match
New Record:
- Does not match
- Matches fewer than 3 of the 7 main identifiers
Match:
- Exactly matches First Name, Last Name, and Date of Birth, and at least 1 more of the 7 main identifiers
Possible Duplicate Records
- Two patient records that may or may not be the same patient
- Determined by matching First Name, Last Name, and Date of Birth but no other main identifier
- Soundex match score of 30 to top cutoff
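The load/match rules on the preceding slides can be sketched in Python. This is only an illustration under stated assumptions, not the registry's actual procedure: the field names, the case-insensitive exact comparison, and the treatment of blank fields as non-matching are all hypothetical, the Soundex score is left out because the slides do not define how it is computed, and the "unclassified" outcome covers a case the slides do not address.

```python
# Hypothetical field names for the "Big 7" main identifiers.
BIG_7 = (
    "first_name", "middle_name", "last_name", "date_of_birth",
    "mother_first_name", "mother_maiden_name", "ssn",
)
CORE = ("first_name", "last_name", "date_of_birth")


def matching_fields(incoming: dict, existing: dict) -> set:
    """Return the Big 7 fields on which the two records agree.

    Assumption: a field matches only when both values are non-blank
    and equal after trimming and lower-casing.
    """
    matched = set()
    for field in BIG_7:
        a = (incoming.get(field) or "").strip().lower()
        b = (existing.get(field) or "").strip().lower()
        if a and a == b:
            matched.add(field)
    return matched


def classify(incoming: dict, existing: dict) -> str:
    """Classify an incoming record against one existing record."""
    matched = matching_fields(incoming, existing)
    if len(matched) < 3:
        return "new"  # fewer than 3 of the 7 identifiers match
    if all(field in matched for field in CORE):
        # First name, last name, and date of birth all agree.
        if len(matched) > len(CORE):
            return "match"  # at least 1 more identifier agrees
        return "possible_duplicate"  # no other identifier agrees
    # Assumption: 3 or more identifiers agree but not all three core
    # fields; the slides do not say how this case is handled.
    return "unclassified"
```

For example, two records that agree only on first name, last name, and date of birth come back as "possible_duplicate", which is the case that feeds the manual de-duplication queue.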
Manual De-duplication
- Possible duplicate records
- Patient records
- One full-time and one part-time staff member
- 60 hours per week
Plan to Reduce Duplicate Records
- New Load/Matching Procedure
- De-duplication with Research Center
- Data Entry Education
- Improve Manual Matching Procedure
New Load/Match Procedure
- Incoming records checked against purged records
- Non-alpha characters ignored
- Additional matching tests
- Algorithms that were used can be traced
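As a small illustration of "non-alpha characters ignored", the sketch below strips everything that is not a letter before names are compared. The function names are hypothetical, and upper-casing the result is an added assumption; the slides do not say how the new load procedure handles case.

```python
def normalize_name(name: str) -> str:
    """Drop non-alphabetic characters (spaces, hyphens, apostrophes,
    digits, punctuation) and upper-case what remains.

    Assumption: str.isalpha() decides what counts as a letter, so
    accented letters are kept rather than removed.
    """
    return "".join(ch for ch in name if ch.isalpha()).upper()


def names_equal(a: str, b: str) -> bool:
    """Compare two names after normalization."""
    return normalize_name(a) == normalize_name(b)
```

With this normalization, "O'Brien" and "OBrien" compare as equal, as do "Smith-Jones" and "Smith Jones", so punctuation differences introduced at data entry no longer produce a failed match on their own.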
Intermountain Injury Control Research Center
- Customized matching
- Advice on new load procedure
- Proprietary
Data Entry Education
- Quarterly Newsletters
- Monthly E-mails
- Website Information
- Semi-annual Meetings
- Data Quality Incentive Awards
- Focus Training on Data Quality
Improve Manual Matching
- Improve de-duplication forms
- Automated manual matching
- Combine Possibles and Patient de-dup forms
- Make forms more time efficient
Results
- Possible duplicate records are expected to be reduced significantly
Future data analysis
- Load/matching: Compare the old load with the new load by loading the same sample of records in each and seeing what percentage matches in each. Use the algorithm data to see which matching methods work best and are used most.
- Research Center: Evaluate the usefulness of the research group by taking a data set and comparing manual intervention with the algorithms used by the research group.
- Data entry education: Evaluate the intervention with users to see how much the number of possible duplicate records from each provider improves.
- Manual matching improvement: Measure the time it takes to de-duplicate a record before and after the changes to the form.
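The planned old-load versus new-load comparison could be tallied along the following lines. This is a minimal sketch: it assumes each load procedure yields one outcome label per record in the shared sample, and the function name and label strings are hypothetical.

```python
from collections import Counter


def outcome_percentages(outcomes: list) -> dict:
    """Return the percentage of records falling under each outcome label.

    `outcomes` holds one label per sample record, for example
    "match", "possible_duplicate", or "new".
    """
    total = len(outcomes)
    if total == 0:
        return {}
    counts = Counter(outcomes)
    return {label: 100.0 * n / total for label, n in counts.items()}


def compare_loads(old_outcomes: list, new_outcomes: list) -> dict:
    """Percentage-point change per label going from the old load to the new."""
    old = outcome_percentages(old_outcomes)
    new = outcome_percentages(new_outcomes)
    labels = set(old) | set(new)
    return {label: new.get(label, 0.0) - old.get(label, 0.0) for label in labels}
```

Running both loads on the same sample and passing the two label lists to `compare_loads` would show, for instance, whether the share of possible duplicates drops under the new procedure.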
"Slaying the dragon of delay is no sport for the short-winded." (Anonymous)
LESSONS LEARNED
- De-duplication is complicated
- Computer programming always takes more time than anticipated
- If you keep at it, good things can happen
For More Information Contact: Sandra Schulthies at 801-538-6114 or sschulthies@utah.gov