"The findings and conclusions in this report are those of the author(s) and do not necessarily represent the official position of the Centers for Disease Control and Prevention/the Agency for Chronic Disease."

1 Link Plus Version 2: An Essential Central Cancer Registry Linkage Tool
NAACCR 2008 Annual Conference, Denver, Colorado, June 10, 2008
Kathleen Thoburn, David Gu (CDC/NPCR Contractors), Tom Rawson, Joe Rogers, CDC

2 Central Cancer Registry (CCR) Record Linkage
Record linkage is a fundamental activity for CCRs
–Casefinding, linking new reports, duplicate detection, follow-up, special studies
Failure in the linkage process leads to
–Over- or under-counting of cancers for the CCR
–Generation of inaccurate counts and rates
–Missed information obtained via linkage with other data sources (e.g., vital status)

3 Central Cancer Registry Record Linkage
Record linkage is becoming easier
Efficiency is a key feature
–Faster, more efficient linkage process allows more linkages for less $$ and staff time
More accurate counts
More research
Increased utilization of registry data

4 Link Plus Software
Stand-alone probabilistic record linkage program
Combines ease of use and statistical sophistication
Detects duplicates within a data file, or links two data files together
Supports fixed width files, delimited files, and North American Association of Central Cancer Registries files
Provides powerful support for manual review of uncertain matches

5 Link Plus Is Free $0.00

6 Link Plus Is Easy To Use
Designed especially for cancer registry work
–HOWEVER, can be used with any data
Mathematics largely hidden from user
Practical default values supplied for many tasks
Familiar Windows interface
Includes Help and test examples

7 Link Plus Is Easy To Use
Link Plus gets you from HERE:
Cancer Registry data for John Smith:
Last name  First Name  Site  SSN        DOB       Sex  DateDx
SMITH      JOHN        C619  123654789  02111934  1    06152004
Vital Statistics data for John Smith:
Last name  First Name  DOB       Date of Death  COD  Death Cert #
SMITH      JOHN        02011934  03202006       12365478901234
To HERE:
Linked data for John Smith:
Last name  First Name  Site  SSN        DOB       Sex  DateDx    Death Date  COD  Death Cert #
SMITH      JOHN        C619  123654789  02011934  1    06152004  03202006    C10001234

8 Link Plus Is Easy To Use Without having to go HERE:

9 Link Plus Linkage Overview
Two main types of linkage:
External Linkage
–Probabilistically link one file to another file
Deduplication
–Special case of record linkage
–Records in the same file are blocked, compared, and scored against each other
–Result is a ranked list of record pairs
–High-scoring pairs may be duplicates
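The block, compare, score, and rank sequence described for deduplication can be sketched in a few lines of Python. This is an illustrative toy and not Link Plus's actual algorithm: the record fields, the choice of last name as the blocking key, and the scoring rule are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records; field names and values are illustrative only.
records = [
    {"id": 1, "last": "SMITH", "first": "JOHN", "dob": "02111934"},
    {"id": 2, "last": "SMITH", "first": "JOHN", "dob": "02011934"},
    {"id": 3, "last": "SMITH", "first": "JANE", "dob": "07041950"},
    {"id": 4, "last": "JONES", "first": "MARY", "dob": "01011960"},
]

def block_key(rec):
    """Blocking variable: here simply the last name (an assumption)."""
    return rec["last"]

def score(a, b):
    """Toy score: 1 point for an equal first name, 1 point for an equal DOB,
    0.5 points if the DOBs differ in only one digit position."""
    total = 1.0 if a["first"] == b["first"] else 0.0
    if a["dob"] == b["dob"]:
        total += 1.0
    elif sum(x == y for x, y in zip(a["dob"], b["dob"])) >= 7:
        total += 0.5
    return total

def dedup_pairs(recs):
    """Compare records only within the same block; return pairs ranked by score."""
    blocks = defaultdict(list)
    for rec in recs:
        blocks[block_key(rec)].append(rec)
    pairs = []
    for members in blocks.values():
        for a, b in combinations(members, 2):
            pairs.append((score(a, b), a["id"], b["id"]))
    return sorted(pairs, reverse=True)

ranked = dedup_pairs(records)  # highest-scoring candidate duplicates first
```

Blocking keeps the number of comparisons manageable: the JONES record is never compared with the SMITH records, and the top of the ranked list is where a reviewer would look for likely duplicates.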

10 Link Plus Matching Methods
Exact
Generic String
Last Name/First Name
SSN (Social Security Number)
Zip Code
Date
Middle Name
Value-Specific (Frequency-Based)

11 Link Plus Version 2 Overview of Improvements
Improved file import process
Enhanced support for deduplication linkages
New Zip Code Matching Method
–Matches 5 digit zip code to 9 digit zip code
Use of nicknames in First Name Matching Method
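One way to picture matching a 5 digit zip code to a 9 digit zip code, and using nicknames in first name matching, is the sketch below. It is a minimal illustration under stated assumptions, not the Link Plus implementation: the function names, the comparison rules, and the tiny nickname table are hypothetical.

```python
# Hypothetical nickname table; a real one would be far larger.
NICKNAMES = {"BILL": "WILLIAM", "BOB": "ROBERT", "KATHY": "KATHLEEN"}

def zip_agrees(a, b):
    """Compare two zip codes, allowing a 5 digit code to match a 9 digit code."""
    da = "".join(ch for ch in a if ch.isdigit())
    db = "".join(ch for ch in b if ch.isdigit())
    if len(da) < 5 or len(db) < 5:
        return False              # too short to compare
    if len(da) == 9 and len(db) == 9:
        return da == db           # both full ZIP+4: require all 9 digits
    return da[:5] == db[:5]       # otherwise compare the 5 digit part only

def canonical_first_name(name):
    """Map a nickname to its formal name if the table knows it."""
    key = name.strip().upper()
    return NICKNAMES.get(key, key)

def first_name_agrees(a, b):
    """Treat a nickname and its formal name as agreeing."""
    ca, cb = canonical_first_name(a), canonical_first_name(b)
    if not ca or not cb:
        return False              # empty names are not treated as agreement
    return ca == cb
```

For example, `zip_agrees("80202", "80202-1234")` and `first_name_agrees("Bill", "WILLIAM")` both return `True` under these rules.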

12 Link Plus Version 2 Overview of Improvements
SSN Matching Method now accepts 4 digit SSNs
Linkage Process Progress Window
New and powerful manual review
New merged file export functions
Improved context-sensitive and online Help
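As a rough illustration of what accepting 4 digit SSNs could mean for a comparison, the sketch below compares full 9 digit SSNs when both are present and falls back to the last four digits otherwise. The function name and the fallback rule are assumptions for the example; the slides do not describe how Link Plus scores partial SSNs.

```python
def ssn_agrees(a, b):
    """Compare two SSNs given as 9 digit or 4 digit (last four) values.

    Returns True or False when the values can be compared, and None when
    either value is missing or has an unexpected length.
    """
    da = "".join(ch for ch in a if ch.isdigit())
    db = "".join(ch for ch in b if ch.isdigit())
    if len(da) not in (4, 9) or len(db) not in (4, 9):
        return None                   # not comparable; treat as missing
    if len(da) == 9 and len(db) == 9:
        return da == db               # both complete: compare all 9 digits
    return da[-4:] == db[-4:]         # at least one partial: last 4 digits
```

Returning `None` for unusable values keeps "missing" distinct from "disagrees", which matters when missing values are excluded from scoring.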

13 Link Plus Linkage Overview
External Linkage Steps:
1. Select Data Type for File 1
2. Locate/Identify File 1
3. Data Import for File 1
4. Select Data Type for File 2
5. Locate/Identify File 2
6. Data Import for File 2
7. Select Blocking Variables & Phonetic System
8. Select Matching Variables & Matching Methods
9. Select ID Variables
10. Define Missing Values
11. Select Direct/EM Method
12. Enter Cut-off Value
13. Specify Linkage File Name and Location
14. Run Linkage
15. Perform Manual Review of Uncertain Matches
16. Export Merged File
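The roles of matching variables, missing values, and the cut-off value can be illustrated with the standard Fellegi-Sunter style of probabilistic scoring, in which each field contributes log2(m/u) on agreement and log2((1-m)/(1-u)) on disagreement. This is a generic sketch, not a description of Link Plus internals: the m- and u-probabilities, the cut-off, and the record layout are made-up values for the example.

```python
import math

# Hypothetical (m, u) probabilities per matching variable:
# m = P(field agrees | records are a true match)
# u = P(field agrees | records are not a match)
PARAMS = {
    "last":  (0.95, 0.01),
    "first": (0.90, 0.05),
    "dob":   (0.90, 0.01),
    "ssn":   (0.95, 0.001),
}

MISSING = ("", None)   # values treated as missing (an assumption)
CUTOFF = 7.0           # hypothetical cut-off value

def field_weight(field, agrees):
    """Agreement weight log2(m/u) or disagreement weight log2((1-m)/(1-u))."""
    m, u = PARAMS[field]
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))

def pair_score(a, b):
    """Sum field weights over all matching variables, skipping missing values."""
    total = 0.0
    for field in PARAMS:
        va, vb = a.get(field), b.get(field)
        if va in MISSING or vb in MISSING:
            continue
        total += field_weight(field, va == vb)
    return total

registry = {"last": "SMITH", "first": "JOHN", "dob": "02111934", "ssn": "123654789"}
vital    = {"last": "SMITH", "first": "JOHN", "dob": "02011934", "ssn": "123654789"}
other    = {"last": "JONES", "first": "MARY", "dob": "01011960", "ssn": ""}

likely_match = pair_score(registry, vital) >= CUTOFF   # DOB disagrees, rest agree
unlikely     = pair_score(registry, other) >= CUTOFF   # nothing agrees
```

Pairs scoring near the cut-off are the uncertain matches that would go to manual review. In practice the m- and u-probabilities would be supplied or estimated rather than hard-coded as they are here.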

14 Link Plus Linkage Configuration
Identify/Import Data Files
Specify Missing Values
Direct Method/EM Algorithm
Enter Cutoff
Specify Linkage File Name and Location
Save Linkage Configuration
Run Linkage!
Specify Data Type
Select Blocking Variables/Phonetic System
Select Matching Variables/Methods
Select ID Variables

15 Link Plus Manual Review

16 Link Plus File Export

17 Link Plus Future Development
Refine name matching methods
Allow user to provide names frequency file
Allow CRS Plus users to select additional variables for manual review and export
For external linkages, allow user to choose whether to write all comparison pairs, or just comparison pair with highest score, to linkage report
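The difference between writing all comparison pairs and writing only the highest-scoring pair can be shown with a short sketch. The tuple layout, the record identifiers, and the choice to keep one best pair per File 1 record are assumptions for illustration; the slides do not specify how the option would work.

```python
def best_pair_per_record(pairs):
    """Keep only the highest-scoring pair for each File 1 record.

    `pairs` is a list of (file1_id, file2_id, score) tuples (hypothetical layout).
    """
    best = {}
    for file1_id, file2_id, score in pairs:
        current = best.get(file1_id)
        if current is None or score > current[2]:
            best[file1_id] = (file1_id, file2_id, score)
    return sorted(best.values(), key=lambda p: p[2], reverse=True)

# Made-up comparison pairs for two File 1 records.
all_pairs = [
    ("A1", "B7", 17.3),
    ("A1", "B9", 4.2),
    ("A2", "B3", 9.8),
    ("A2", "B4", 11.1),
]

report_all = sorted(all_pairs, key=lambda p: p[2], reverse=True)  # every pair
report_best = best_pair_per_record(all_pairs)                     # one per record
```

The full report is useful when a File 1 record may legitimately link to several File 2 records; the best-pair report is shorter to review when at most one link per record is expected.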

18 Link Plus Future Development
Output NAACCR record format
Develop API; enable call from other software
Develop additional feature to enable use in production mode; including pre-analysis for selection of most effective cut-off
Write papers (including research on record linkage methods)

19 CDC–NPCR Link Plus Contacts
Kathleen K. Thoburn, CDC/NPCR Contractor
E-mail:
David Gu, CDC/NPCR Contractor
E-mail:
Tom Rawson, CDC Computer Programmer

20 Obtaining Link Plus Version 2
1. Go to NPCR Home Page:
2. In the ‘Tools’ Section - click on Registry Plus
3. Under ‘Registry Plus Components’ - click on Link Plus
4. Click on Technical Information and Installation

21 Link Plus Version 2 Linkage Demonstration

