Published by Cordelia Dulcie Garrett. Modified over 6 years ago.
1
Achieving Better Data Quality: Reducing Duplicate Records
Sandra Schulthies, M.S.; Yukiko Yoneoka, M.S.; Barry Nangle, Ph.D.
11/20/2018
2
Many Sources of Data
Vital Records
Public Health Clinics
WIC
Private Provider Billing Systems
WebKIDS application
3
Data Load/Match Procedure
The “Big 7” main identifiers:
First Name
Middle Name
Last Name
Date of Birth
Mother’s First Name
Mother’s Maiden Name
Social Security Number
5
New Record OR Match
New Record: does not match (matches fewer than 3 of the 7 main identifiers)
Match: exactly matches First Name, Last Name and Date of Birth and at least 1 more of the 7 main identifiers
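The two rules on this slide can be sketched in code. This is a minimal illustration, not the registry's actual load procedure: the `PatientRecord` class, its field names, the `classify` function and the "unresolved" label are all hypothetical, and the sketch assumes that a missing value never counts as a match and that values are compared exactly as stored.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class PatientRecord:
    """Hypothetical container for the seven main identifiers."""
    first_name: Optional[str] = None
    middle_name: Optional[str] = None
    last_name: Optional[str] = None
    date_of_birth: Optional[str] = None
    mother_first_name: Optional[str] = None
    mother_maiden_name: Optional[str] = None
    ssn: Optional[str] = None


# The three identifiers that a Match must agree on exactly.
CORE = ("first_name", "last_name", "date_of_birth")


def matching_fields(a, b):
    """Names of identifiers present in `a` and equal in both records."""
    return {
        f.name
        for f in fields(PatientRecord)
        if getattr(a, f.name) is not None
        and getattr(a, f.name) == getattr(b, f.name)
    }


def classify(incoming, existing):
    """Apply the slide's two rules; anything else is left unresolved."""
    matched = matching_fields(incoming, existing)
    if len(matched) < 3:
        return "new_record"      # fewer than 3 of the 7 identifiers agree
    if all(name in matched for name in CORE) and len(matched) >= 4:
        return "match"           # core three plus at least one more
    return "unresolved"          # assumption: not covered by this slide
```

Pairs that agree on three or more identifiers without meeting the Match rule fall through to "unresolved" here, because the slide does not say how they are handled.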
6
Possible Duplicate Records
Two patient records that may or may not be the same patient
Determined by matching First Name, Last Name and Date of Birth but no other main identifier
Soundex match score of 30 to top cutoff
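As a rough illustration of the two ideas on this slide, the sketch below implements the standard American Soundex code and a simple test for "core identifiers match but nothing else does". The slides do not explain how the Soundex match score (30 to the top cutoff) is computed, so no scoring is attempted here. The dictionary field names and the `is_possible_duplicate` helper are hypothetical, and names are assumed to use the letters A to Z.

```python
# Standard American Soundex digit groups.
_GROUPS = {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R"}
CODES = {ch: digit for digit, letters in _GROUPS.items() for ch in letters}

BIG_7 = ("first_name", "middle_name", "last_name", "date_of_birth",
         "mother_first_name", "mother_maiden_name", "ssn")
CORE = ("first_name", "last_name", "date_of_birth")


def soundex(name):
    """Return the 4-character Soundex code, or "" if the name has no letters."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue              # H and W do not break a run of equal codes
        code = CODES.get(ch, "")  # vowels and Y have no code
        if code and code != prev:
            result += code
        prev = code               # a vowel resets the run
    return (result + "000")[:4]


def is_possible_duplicate(a, b):
    """True when only First Name, Last Name and Date of Birth agree."""
    def same(field):
        return a.get(field) is not None and a.get(field) == b.get(field)

    core_agrees = all(same(f) for f in CORE)
    other_agrees = any(same(f) for f in BIG_7 if f not in CORE)
    return core_agrees and not other_agrees
```

For example, `soundex("Robert")` and `soundex("Rupert")` both give `R163`, which is why a phonetic code can pair records whose names were spelled differently at data entry.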
7
Manual De-duplication
Possible duplicate records
Patient records
One full-time and one part-time staff
60 hours per week
11
Plan to Reduce Duplicate Records
New Load/Matching Procedure
De-duplication with Research Center
Data Entry Education
Improve Manual Matching Procedure
12
New Load/Match Procedure
Incoming records checked against purged records
Non-alpha characters ignored
Additional matching tests
Algorithms that were used can be traced
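Three of these points (ignoring non-alpha characters, checking purged records, and tracing which test fired) can be sketched together. Everything here is an assumption made for illustration: the function names, the rule labels, the dictionary layout, and in particular the choice to hold an incoming record for review when it matches a purged record, since the slides do not say what happens in that case.

```python
def normalize_name(name):
    """Drop every non-alphabetic character and ignore case."""
    return "".join(ch for ch in name if ch.isalpha()).upper()


def record_key(record):
    """Hypothetical comparison key: normalized names plus date of birth."""
    return (
        normalize_name(record["first_name"]),
        normalize_name(record["last_name"]),
        record["date_of_birth"],
    )


def load_record(incoming, purged_keys, existing_keys, trace_log):
    """Decide an outcome and append the rule that produced it to trace_log."""
    key = record_key(incoming)
    if key in purged_keys:
        # Assumption: a hit against a purged record is held for review.
        outcome, rule = "held_for_review", "purged_record_check"
    elif key in existing_keys:
        outcome, rule = "candidate_match", "normalized_name_dob"
    else:
        outcome, rule = "new_record", "no_rule_fired"
    trace_log.append({"key": key, "rule": rule, "outcome": outcome})
    return outcome
```

With this normalization, `O'Brien-Smith` and `OBrien Smith` produce the same key, and each entry in `trace_log` records which test decided the outcome so that the tests can be compared later.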
13
Intermountain Injury Control Research Center
Customized matching
Advice on new load procedure
Proprietary
14
Data Entry Education
Quarterly Newsletters
Monthly E-mails
Website Information
Semi-annual Meetings
Data Quality Incentive Awards
Focus Training on Data Quality
15
Improve Manual Matching
Improve de-duplication forms
Automated manual matching
Combine Possibles and Patient de-dup forms
Make forms more time efficient
16
Results
Possible duplicate records are expected to be reduced significantly.
Future data analysis:
Load/matching: Compare the old load with the new load by loading the same sample of records in each and seeing what percentage matches in each. Use the algorithm data to see which matching methods work best and are used most.
Research Center: Evaluate the usefulness of the research group by taking a data set and comparing manual intervention with the algorithms used by the research group.
Data entry education: Evaluate the intervention with users to see how much the number of possible duplicate records from each provider improves.
Manual matching improvement: Measure the time it takes to de-duplicate a record before and after the changes to the form.
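The load comparison planned above could be computed along the following lines. This is only a sketch: the outcome labels, the trace-entry layout and both function names are hypothetical, and no figures from the actual loads are implied.

```python
from collections import Counter


def match_rate(outcomes):
    """Percentage of loaded records whose outcome is 'match'."""
    if not outcomes:
        return 0.0
    matched = sum(1 for outcome in outcomes if outcome == "match")
    return 100.0 * matched / len(outcomes)


def rule_usage(trace_log):
    """Count how often each matching rule produced a match."""
    return Counter(
        entry["rule"] for entry in trace_log if entry["outcome"] == "match"
    )
```

Running the same sample of records through the old and the new load and calling `match_rate` on each list of outcomes gives the two percentages to compare, while `rule_usage` summarizes which matching methods were used most.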
17
"Slaying the dragon of delay is no sport for the short-winded." Anonymous
LESSONS LEARNED
De-duplication is complicated.
Computer programming always takes more time than anticipated.
If you keep at it, good things can happen.
18
For More Information Contact:
Sandra Schulthies at or