Biolink NL A national infrastructure for linkage of biobanks to medical and socioeconomic registries Adelaide Ariel SHIP Conference 28th-30th August 2013.


1 Biolink NL: A national infrastructure for linkage of biobanks to medical and socioeconomic registries
Adelaide Ariel, SHIP Conference, 28th-30th August 2013

2 The Dutch Biolink Project (Biolink NL)
Main goals:
- To improve the efficiency and quality of linkage of biobanks to medical and socioeconomic registries, in conformity with statutory and consent obligations to participants
- To set up a national infrastructure to enable these linkages
The Biolink Project is a collaboration between Dutch universities, University Medical Centers, Statistics Netherlands, and health care institutions.
www.biolink-nl.eu

3 Linking Challenges in Biolink NL
- A unique identifier is lacking
  - Linking has to be performed on personal identifiers
- Privacy concerns
  - Surname might not be allowed for use
  - Personal identifiers have to be encrypted
- Both the availability and the quality of the personal identifiers may vary across registries
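The requirement that personal identifiers be encrypted can be illustrated with a keyed hash. This is a minimal sketch only: the slides do not say which encryption scheme Biolink NL uses, so HMAC-SHA256, the `normalize` and `pseudonymize` helpers, and the example key are all assumptions made for illustration.

```python
import hashlib
import hmac
import unicodedata


def normalize(value: str) -> str:
    """Strip accents, uppercase, and drop every non-alphanumeric character."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.upper() if ch.isalnum())


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 digest (hex) of the normalized identifier."""
    message = normalize(value).encode("ascii")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


# Hypothetical key for illustration; a real key would be managed by a trusted party.
KEY = b"example-key-not-for-real-use"

a = pseudonymize("van den Berg", KEY)
b = pseudonymize("Van den Bérg ", KEY)   # same name, different spelling conventions
c = pseudonymize("van den Burg", KEY)    # one-letter typo

print(a == b)  # identical after normalization
print(a == c)  # a single typo yields an unrelated digest
```

A digest of this kind only supports exact comparison: any typing error changes it completely, which is one reason the choice of linkage method matters when identifiers are encrypted.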

4 Linking Approaches in Biolink NL
Personal identifiers as linking variables:
- Surname, date of birth, sex, postal code
Take into consideration:
- Surname might not be allowed for use
Research questions:
- Which personal identifier is a ‘must’?
- In which situation does a deterministic or a probabilistic method perform best?
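To make the deterministic option concrete, the sketch below links two files by exact agreement on the linking variables, once with and once without the surname. The field names, record layout and toy records are hypothetical and not taken from the project.

```python
from collections import defaultdict

# Hypothetical field names for the four linking variables named on the slide.
LINK_VARS = ("surname", "date_of_birth", "sex", "postal_code")


def link_key(record, variables=LINK_VARS):
    """Build the comparison key from the chosen linking variables."""
    return tuple(record[v] for v in variables)


def deterministic_link(file_a, file_b, variables=LINK_VARS):
    """Return (id_a, id_b) pairs that agree exactly on every linking variable."""
    index = defaultdict(list)
    for rec in file_b:
        index[link_key(rec, variables)].append(rec["id"])
    pairs = []
    for rec in file_a:
        for id_b in index.get(link_key(rec, variables), []):
            pairs.append((rec["id"], id_b))
    return pairs


# Toy records; a2/b2 could be one person under a maiden and a married name.
biobank = [
    {"id": "a1", "surname": "JANSEN", "date_of_birth": "1970-03-12", "sex": "F", "postal_code": "1012AB"},
    {"id": "a2", "surname": "DEVRIES", "date_of_birth": "1985-11-02", "sex": "F", "postal_code": "3511CD"},
]
registry = [
    {"id": "b1", "surname": "JANSEN", "date_of_birth": "1970-03-12", "sex": "F", "postal_code": "1012AB"},
    {"id": "b2", "surname": "BAKKER", "date_of_birth": "1985-11-02", "sex": "F", "postal_code": "3511CD"},
]

with_surname = deterministic_link(biobank, registry)
without_surname = deterministic_link(
    biobank, registry, variables=("date_of_birth", "sex", "postal_code")
)
print(with_surname)
print(without_surname)
```

Dropping the surname recovers the second pair here, but in a large registry it would also raise the chance of false links between different people who share a date of birth, sex and postal code.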

5 Project Approach
Phases: Development, Testing, Evaluation
- Conduct a literature survey on record linkage methodology and applications
- Develop a prototype for the linkage strategy using simulated data
- Test the linkage strategy on real data
- Evaluate the linking results by means of:
  - another identifier (encrypted Dutch ID)
  - a content variable (content validation)

6 Current Presentation
Phases: Development, Testing, Evaluation
Develop a prototype for the linkage strategy using simulated data. Real data were used as blueprints for the simulated data.
Overview:
- Our motivations
- Factors considered in the simulation
- Findings
- Prototype for the linkage strategy

7 Using Simulated Data
Our motivations:
- We want to experiment with different approaches without violating privacy concerns. The simulated data sets are modelled after the real data sets.
- We want to include “what-if” scenarios:
  - What if not all identifiers are available for linking?
  - What if the number of shared records is small?
  - What if the error rate is high?

8 Factors Considered for the Simulation
The linkages in Biolink NL deal with registries of varying size and population coverage.
[Diagram labels: Pathology Data, Cancer Registry, General Population Registry, Female Cohort, Children Cohort]

9 Factors Considered for the Simulation
The number of shared records (overlap) may vary.
[Diagram: Cancer Registry and General Population Registry (large overlap); Cancer Registry and Female Cohort (small overlap)]
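A controlled overlap between two simulated files could be generated along the following lines. This is a sketch under stated assumptions: the slides do not define how overlap is measured, so here it is taken as the fraction of the smaller file that also appears in the larger one, and `simulate_pair` is a hypothetical helper.

```python
import random


def simulate_pair(size_a, size_b, overlap_fraction, seed=0):
    """Draw two sets of person ids that share a chosen fraction of the smaller file.

    Assumption: overlap is expressed relative to the smaller of the two files.
    """
    rng = random.Random(seed)
    n_shared = round(overlap_fraction * min(size_a, size_b))
    population = size_a + size_b - n_shared
    ids = list(range(population))
    rng.shuffle(ids)
    shared = ids[:n_shared]           # persons present in both files
    only_a = ids[n_shared:size_a]     # persons present in file A only
    only_b = ids[size_a:]             # persons present in file B only
    return set(shared + only_a), set(shared + only_b)


registry_ids, cohort_ids = simulate_pair(size_a=1000, size_b=200, overlap_fraction=0.6)
print(len(registry_ids), len(cohort_ids), len(registry_ids & cohort_ids))
```

Varying `overlap_fraction` while keeping the file sizes fixed gives the small-overlap and large-overlap scenarios separately from the effect of file size.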

10 Factors Considered for the Simulation
Personal identifiers are not 100% accurate or consistent, for instance due to:
- Typing errors
- Changes of address
- Use of different surnames (married vs. maiden name)
We vary the error rate up to 30%.
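Errors of this kind can be injected into simulated records at a chosen rate. The sketch below covers only single-character typing errors; the slides do not describe the project's error model, so `introduce_typo`, `corrupt` and the record layout are illustrative assumptions.

```python
import random
import string


def introduce_typo(value, rng):
    """Replace one character: digits by another digit, anything else by an uppercase letter."""
    if not value:
        return value
    pos = rng.randrange(len(value))
    old = value[pos]
    alphabet = string.digits if old.isdigit() else string.ascii_uppercase
    new = rng.choice([ch for ch in alphabet if ch != old])
    return value[:pos] + new + value[pos + 1:]


def corrupt(records, field, error_rate, seed=0):
    """Return copies of the records in which `field` has a typo with probability `error_rate`."""
    rng = random.Random(seed)
    corrupted = []
    for rec in records:
        rec = dict(rec)  # copy, so the clean file stays untouched
        if rng.random() < error_rate:
            rec[field] = introduce_typo(rec[field], rng)
        corrupted.append(rec)
    return corrupted


clean = [{"id": i, "postal_code": "1012AB"} for i in range(1000)]
noisy = corrupt(clean, "postal_code", error_rate=0.30)
n_changed = sum(c["postal_code"] != n["postal_code"] for c, n in zip(clean, noisy))
print(n_changed)  # roughly 30% of the 1000 records
```

Address changes and surname changes could be modelled the same way by swapping in a different replacement function for `introduce_typo`.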

11 Linking Methods
Preferably practical and applicable to encrypted identifiers.
- Deterministic linkage method
  - Partial matching
- Probabilistic linkage method
  - Simple probabilistic
  - Jaro-Winkler
  - Bigram
Implemented in SAS 9.2 and RecordLinkage (R package).
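As an illustration of the probabilistic side, the sketch below combines a bigram string comparator with an agreement-weight score in the Fellegi-Sunter style. It is not the project's SAS or R implementation: the Dice form of the bigram similarity, the m- and u-probabilities, the 0.8 surname threshold and the field names are all assumptions chosen for the example.

```python
import math


def bigrams(text):
    """Set of adjacent character pairs in the string."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def bigram_similarity(a, b):
    """Dice coefficient on bigram sets: 2 * shared / (count_a + count_b)."""
    set_a, set_b = bigrams(a), bigrams(b)
    if not set_a and not set_b:
        return 1.0 if a == b else 0.0
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


# Hypothetical (m, u) pairs: P(agree | same person), P(agree | different persons).
M_U = {
    "surname": (0.90, 0.01),
    "date_of_birth": (0.95, 0.001),
    "sex": (0.99, 0.5),
    "postal_code": (0.85, 0.001),
}


def match_weight(rec_a, rec_b, m_u=M_U, surname_threshold=0.8):
    """Sum log2 agreement or disagreement weights over the linking variables."""
    total = 0.0
    for field, (m, u) in m_u.items():
        if field == "surname":
            agrees = bigram_similarity(rec_a[field], rec_b[field]) >= surname_threshold
        else:
            agrees = rec_a[field] == rec_b[field]
        total += math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))
    return total


rec_1 = {"surname": "JANSEN", "date_of_birth": "1970-03-12", "sex": "F", "postal_code": "1012AB"}
rec_2 = {"surname": "JANSSEN", "date_of_birth": "1970-03-12", "sex": "F", "postal_code": "1012AB"}
rec_3 = {"surname": "BAKKER", "date_of_birth": "1985-11-02", "sex": "M", "postal_code": "3511CD"}

print(bigram_similarity("JANSEN", "JANSSEN"))
print(match_weight(rec_1, rec_2), match_weight(rec_1, rec_3))
```

Pairs whose total weight exceeds a chosen cut-off would be treated as links. Note that string comparators such as this one need access to the characters of the identifier, so they cannot be applied directly to a whole-value hash.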

12 Simulation Findings (1)
The date of birth should be included as a linking identifier.

13 Simulation Findings (2)
Together, the deterministic and probabilistic methods can help detect the possible overlap size.

14 Simulation Findings (3)
The deterministic method appears particularly suitable for:
- Small overlap size (less than 60%)
The probabilistic method appears to perform best when the following conditions are met:
- Large overlap size (more than 60%)
- All identifiers are used as linkage variables

15 Linking Strategy
[Flowchart: decision nodes "Less than 20,000 records?", "Include surname?" (appears twice) and "Possible overlap size < 50%?", with Yes/No branches ending in either the deterministic or the probabilistic method.]

16 Next Steps
The following linkages will be chosen for testing and evaluation:
- A Dutch female cohort – the Dutch Cancer Registry
- A Dutch twin-children cohort – Health Insurance Database
- A Dutch children cohort – the Dutch National Pharmacy Database

