Andrew Borthwick, Ph.D. Martin Buechi, Ph.D. ChoiceMaker Technologies

Presentation transcript:

Accurate, Customizable Matching: The Heart of the NYC Master Child Index Andrew Borthwick, Ph.D. Martin Buechi, Ph.D. ChoiceMaker Technologies Phone: 212 905 6031 http://www.ChoiceMaker.com Andrew.Borthwick@choicemaker.com Vikki Papadouka, Ph.D. Paul Schaeffer, M.P.A. Deborah Walker, Ph.D. Alexandra Ternier, M.P.H. NYC Department of Health Phone: 212 676-2323 vpapadou@health.nyc.gov

Goals of Talk Background Overview of NYC DOH Master Child Index project History of ChoiceMaker’s use at NYC DOH ChoiceMaker 2.0 ChoiceMaker 2.0 improvements over 1.0 ChoiceMaker’s new “ClueMaker” programming language

Master Child Index (MCI) Integrates the New York Citywide Immunization Registry (CIR) with other child health databases starting with LeadQuest (LQ) to produce a comprehensive Child Health Registry Benefits: Improved surveillance Information sharing with providers Identification of children in need of immunizations and/or lead tests Information sharing among different DOH programs (e.g. addresses, providers, etc.) Improved data quality

CIR and LQ CIR contains 2 million records and >13 million immunization events. 1,250 pediatric provider sites report to it, with ~170,000 immunization events reported monthly. It also loads Vital records: ~125,000 children/year. LeadQuest contains 1.9 million records and 4 million blood tests. 60 laboratories report to it, with ~39,000 test results reported monthly.

MCI system Records in LQ and CIR will be linked with each other through the MCI. New records will be checked against the MCI first. ChoiceMaker sits on the front end. Duplicates are caught before they enter. [Diagram: LQ, CIR, MCI]

Accurate Matching Is Key to MCI The matching system correlates records in the 2 databases so that data from the systems can be linked. Matching not only prevents duplicates from entering the system but also reduces the duplication problem in existing databases: it creates more complete, less fragmented records and hence increases the usefulness of the data. Matching allows for approximate searches, giving more successful identification of records by doctors and DOH staff who are doing lookups. Accurate matching is the heart of the MCI.

Record Matching Challenges No unique IDs: SSN not allowed Incorrectly submitted data: Borthwick vs. Borthwich Data change over time: names change, addresses change Variations in data received: Andrew vs. Andy Incomplete data: fields aren’t filled in or are filled in with generic values Data in wrong fields: first and last names reversed Large volume of data: need for automation

The MCI’s Record Matching Solution ChoiceMaker™ 2.0 (formerly known as “MEDD”) Identifies and merges fragmented database records Searches databases for approximate matches, decides if same child, and merges if appropriate Links related records across multiple databases (MCI-CIR-LQ)

ChoiceMaker 1.0’s Use by CIR Used successfully in Batch mode by the CIR since 1999 Merged over 700,000 records Accuracy measured at 99.7% in tests supervised by NYC DOH

ChoiceMaker 1.0 vs. 2.0 Both ChoiceMaker 1.0 and 2.0 are: holistic: they take many aspects of the record into account simultaneously; modular: clues can be added and taken out depending on the data; based on an Artificial Intelligence technique called “maximum entropy modeling” that lets the system learn from examples (hand-scored by people)

ChoiceMaker 1.0 vs. 2.0 Improvements in ChoiceMaker 2 vs. ChoiceMaker 1 Handles “stacked” data (more than one value for a field) Is written in Java (vs. C++), which makes it more portable Includes “Blocking” component Includes a new programming language called “ClueMaker” for writing clues Can be called online from MCI, CIR, and LQ applications, not just in batch mode

Technology: Production Matching
Search Record -> Blocking -> Many Possible Matches -> Maximum Entropy Matching -> Match Probabilities of Likely Matches
Match Probability: Low -> Non-Match; Intermediate -> Human Review; High -> Match
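The flow on this slide (blocking, then scoring, then a three-way decision on the match probability) can be illustrated with a minimal Python sketch. It is not ChoiceMaker code: the blocking key (exact date of birth), the record layout, and the 0.2 / 0.8 thresholds are all assumptions chosen only for illustration.

```python
from enum import Enum


class Decision(Enum):
    NON_MATCH = "non-match"
    HUMAN_REVIEW = "human review"
    MATCH = "match"


def block(search_record, database):
    """Cheap first pass: keep only candidates that share a blocking key.

    Using the exact date of birth as the key is a hypothetical choice.
    """
    return [rec for rec in database if rec["dob"] == search_record["dob"]]


def decide(match_probability, low=0.2, high=0.8):
    """Map a match probability to one of the three outcomes on the slide.

    The default thresholds are illustrative assumptions, not ChoiceMaker's.
    """
    if not 0.0 <= match_probability <= 1.0:
        raise ValueError("match_probability must be between 0 and 1")
    if low > high:
        raise ValueError("low threshold must not exceed high threshold")
    if match_probability >= high:
        return Decision.MATCH
    if match_probability <= low:
        return Decision.NON_MATCH
    return Decision.HUMAN_REVIEW
```

Moving the two thresholds apart sends more pairs to human review; moving them together automates more decisions at the cost of more errors.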

ClueMaker Clues Encode business rules for matching Take a pair of records and suggest that they match or differ Written in a Java-based language, ClueMaker™ Importance, or weight, of each clue determined by maximum entropy training Clues can also be used as filters (rules) Clue weights are combined to get probability ClueMaker is new to ChoiceMaker with version 2.0
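To make "clue weights are combined to get probability" concrete, here is a minimal Python sketch of one common way a two-outcome maximum entropy model combines binary features. It is an assumption about the general technique, not ChoiceMaker's actual formula: the clue tuples, field names, and weights are hypothetical, and in a real system the weights would come from training on hand-scored pairs.

```python
import math


def match_probability(q, m, clues):
    """Combine the weights of all clues that fire into P(match).

    Each clue is a tuple (predicts, weight, fires) where predicts is
    "match" or "differ" and fires(q, m) returns True or False.
    """
    score = {"match": 0.0, "differ": 0.0}
    for predicts, weight, fires in clues:
        if fires(q, m):
            score[predicts] += weight
    # Two-outcome maximum entropy form: exponentiate and normalize.
    e_match = math.exp(score["match"])
    e_differ = math.exp(score["differ"])
    return e_match / (e_match + e_differ)


# Hypothetical clues with made-up weights.
EXAMPLE_CLUES = [
    ("match", 1.5, lambda q, m: q["first"] == m["first"]),
    ("match", 2.0, lambda q, m: q["dob"] == m["dob"]),
    ("differ", 2.5, lambda q, m: q["last"] != m["last"]),
]
```

When no clue fires, both scores are zero and the sketch returns 0.5, which reflects that the model then has no evidence either way.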

Generic clues
Do first names match?
clue mFirstNames { match same(r.firstName); }
Do first names match approximately using “phonetic matches” such as Soundex?
clue mSoundexFirstNames { match same(Soundex.soundex(r.firstName)); }
Do uncommon first names match?
clue mFrequencyFirstName { match foreach(int freq : {0, 1, 2, 3}; same(r.firstName) && Maps.lookupInt("firstFreq", q.firstName) == freq); }
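For readers unfamiliar with the phonetic matching mentioned above, the following Python sketch implements the standard American Soundex coding. It is an independent illustration, not the `Soundex.soundex` routine that ChoiceMaker calls, whose exact behavior is not shown in the slides.

```python
# Letter-to-digit table of standard American Soundex.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the four-character Soundex code, or "" if name has no letters."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]
```

With this coding, the misspelling pair from the talk, Borthwick and Borthwich, share one code, while the nickname pair Andrew and Andy do not, which is one reason a phonetic clue is combined with other clues rather than used alone.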

Healthcare Clue
Do we have indication of a twin: matching last names and birthdates, but different first names and consecutive medical record numbers?
clue dTwin { differ same(r.last_name) && same(r.birthday) && different(r.first_name) && valid(q.medical_rcrd) && valid(m.medical_rcrd) && Math.abs(q.medical_rcrd - m.medical_rcrd) == 1; }
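The same twin test can be written as an ordinary predicate. The Python sketch below mirrors the clue's conditions on two records represented as dictionaries. Treating a missing medical record number as "not valid", and comparing fields by plain equality, are assumptions, since the slides do not define ClueMaker's `valid`, `same`, or `different`.

```python
def looks_like_twin(q, m):
    """Return True when two records look like twins rather than duplicates."""
    same_family = (q["last_name"] == m["last_name"]
                   and q["birthday"] == m["birthday"])
    different_child = q["first_name"] != m["first_name"]
    q_mrn = q.get("medical_rcrd")
    m_mrn = m.get("medical_rcrd")
    # Assumption: a medical record number is "valid" when it is present.
    have_mrns = q_mrn is not None and m_mrn is not None
    return (same_family and different_child and have_mrns
            and abs(q_mrn - m_mrn) == 1)
```

Because the clue is declared with `differ`, a True result here counts as evidence that the two records belong to different children.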

Clues for Peculiarities of Data
Do month and year of birth match, and does the record come from facility XYZ? Due to some error in their system, they always report the day of birth as '1'.
clue mDobXyz { match same(DateUtils.MonthAndYear(r.dob)) && ("XYZ" == q.facility || "XYZ" == m.facility); }

ClueMaker: Built for Stacked Data
Stacked data: multiple values for single field
Examples: Stacked first names: Name, nickname, misspelling, middle name that grandma prefers and reports as first name, etc. Stacked addresses: address history; some doctors may continue to report old address
Stacking of data improves matching accuracy
ChoiceMaker and ClueMaker built for stacking
// There exists a valid matching first name
clue mFirstNames { match same(r.names.firstName); }
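The "there exists a valid matching first name" test over stacked values amounts to asking whether two sets of names intersect. A minimal Python sketch follows. The case-insensitive, whitespace-trimmed comparison is an assumption, since the slides do not say how ClueMaker's `same` normalizes values.

```python
def any_first_name_matches(q_names, m_names):
    """Return True if any valid first name on one record appears on the other.

    Each argument is the stack (list) of first names held for one record.
    Empty or missing entries are treated as invalid and ignored.
    """
    def valid_names(names):
        return {n.strip().lower() for n in names if n and n.strip()}

    return bool(valid_names(q_names) & valid_names(m_names))
```

For example, a record stacked with "Andrew" and "Andy" matches a record that only ever reported "Andy", which a single-valued first-name field would miss.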

Benefits of ChoiceMaker for MCI Integrates easily into DOH computing environment Can capture peculiarities of data Fast & inexpensive to modify if new peculiarity comes (e.g., new data provider with quirks) Can easily create many clues Designed to handle stacked data well Bottom line: High accuracy