Challenges in data linkage: error and bias

Slides:



Advertisements
Similar presentations
NHS Health Scotland Linking the Scottish Health Survey to hospitalisation and mortality records in Scotland Bruce Whyte July 2003.
Advertisements

USE OF EVIDENCE IN DECISION MODELS: An appraisal of health technology assessments in the UK Nicola Cooper Centre for Biostatistics & Genetic Epidemiology,
Efficient modelling of record linked data A missing data perspective Harvey Goldstein Record Linkage Methodology Research Group Institute of Child Health.
US Berkeley 2/12/2013 linking population-based data to child welfare records: a public health approach to surveillance Emily Putnam-Hornstein, PhD University.
The Linked PDD-Death Product More than you want to know David Zingmond, MD, PhD Division of General Internal and Health Services Research UCLA School of.
Wisconsin Department of Health Services Richard Miller Research Scientist Wisconsin Office of Health Informatics October 28, 2014 Matching Traffic Crash.
BACKGROUND AND AIM Website: Challenges in conducting a systematic review of the diagnostic accuracy of genetic tests: an example.
Which women are having a Hysterectomy and why? A plain English presentation of the methodology and findings of a database linkage study Dr Helen Stokes-Lampard.
Linked Data Products Vital Statistics Death/PDD Presenter: Jan Morgan.
Linking Mortality and Inpatient Discharge Records: Comparing Deterministic and Probabilistic Methodologies Richard Miller Office of Health Informatics.
Record Linkage Simulation Biolink Meeting June Adelaide Ariel.
Pregnancy-associated Crashes and Birth Outcomes: Linking birth/fetal death records to motor vehicle crash data Lisa Hyde, Larry Cook Lenora Olson, Hank.
Efficient modelling of record linked data A missing data perspective Harvey Goldstein Record Linkage Methodology Research Group Institute of Child Health.
Where have the hospital delivered babies gone? Factors associated with hospital births not recoded in the NSW Admitted Patient Data collection The Faculty.
Capturing Sensitive Data & Data Linkage. Capturing Sensitive Data Data Protection Act 1998 (Section 33) – Allows data to be used for research purposes.
March 2013 ESSnet DWH - Workshop IV DATA LINKING ASPECTS OF COMBINING DATA INCLUDING OPTIONS FOR VARIOUS HIERARCHIES (S-DWH CONTEXT)
Notes  Data are presented as a pair of overlying bars, the outer, wider bar representing the period 1st Oct 2007 to 30th September 2008, and the inner,
Who is having intended births: Analysis of two adolescent birth cohorts ( and ) Isia Rech Nzikou Pembe and Ann Dozier, RN PhD University.
P erinatal P eriods o f R isk Analytic Issues: Frequently Asked Data & Analytic Questions A CityMatCH “How-to-Do” Workshop.
Following lives from birth and through the adult years The Role of Record Linkage in the UK Millennium Cohort Study Eucconet Workshop.
Health Information Solutions Gaining Insights through Data Linkage: The VS-PDD Linked Data Files Presenters: Beate Danielsen & Jan Morgan.
© Nuffield Trust 22 June 2015 Matched Control Studies: Methods and case studies Cono Ariti
Evaluating Risk Adjustment Models Andy Bindman MD Department of Medicine, Epidemiology and Biostatistics.
NIHR using systematic reviews to inform funding decisions Matt Westmore, Director of Finance and Strategy Sheetal Bhurke, Research Fellow NIHR Evaluation,
Developing job linkages for the Health and Retirement Study John Abowd, Margaret Levenstein, Kristin McCue, Dhiren Patki, Ann Rodgers, Matthew Shapiro,
Table 1. Methodological Evaluation of Observational Research (MORE) – observational studies of incidence or prevalence of chronic diseases Tatyana Shamliyan.
A ssociation of Public Health Observatories Hospital Activity data Roy Maxwell SWPHO & Bristol University Dr Richard Wilson Sandwell PCT.
Evaluating the UK Breast Screening Programme Study design and practicalities Louise Johns, Sue Moss, Tony Swerdlow Department of Health Cancer Screening.
Results Introduction There are social inequalities in breast cancer survival with those in more deprived and Black ethnic groups having lower survival.
How can we use geographic variation in unplanned admissions to improve efficiency? John Busby CLAHRC West.
Can Linking Motor Vehicle Crash (MVC) Data Improve MVC Injury Surveillance? Jennifer Jones, MPH Anna Waller, ScD August 8, Traffic Records Forum.
Epidemiological Study Designs And Measures Of Risks (1)
QUALITY OF CARE TRENDS FOR CALIFORNIA CHILDREN
Dr Helen Stokes-Lampard
Funded by the NIHR HSDR Programme
Is retention on ART underestimated due to patient transfers
Parallel Sessions: Pathways & Prediction
Royal Society of Tropical Medicine and Hygiene Manson House
Differentially Private Verification of Regression Model Results
Colin Fischbacher Information Services Division (ISD)
Emergency Department Visits in the Neonatal Period:
Assessing Disclosure Risk in Microdata
Patterns of psychiatric hospital admission for schizophrenia and related psychosis in England: A retrospective cross-sectional survey Thompson A. D.¹,
(95% confidence interval)
Conclusions Context Long-Term Conditions Questionnaire Results
Clinical Trials Research Unit, University of Leeds, UK
School for Primary Care Research
BRIGHTLIGHT: from first glow to now – what, why and how
The NIHR Southampton Clinical Research Facility was established by the Wellcome Trust and the Department of Health in The NIHR Southampton Clinical.
Interim findings from Severe Obesity Study
James Doidge | Administrative Data Research Centre for England
  International comparison of health system performance and self-reported data Nigel Rice, Silvana Robone, Peter Smith Centre for Health Economics, University.
School for Primary Care Research
1. Bristol Medical School, University of Bristol
Title The NIHR Southampton Clinical Research Facility was established by the Wellcome Trust and the Department of Health in The NIHR Southampton.
Jessica Wood, Graeme MacLennan & Jemma Hudson
School for Primary Care Research
Alcoholic liver disease in intensive care
(95% confidence interval)
Professor Deborah Baker
This study/project is funded by/ supported by the National Institute for Health Research (NIHR) [name of NIHR programme (Grant Reference Number XXX)/name.
This study/project is funded by/ supported by the National Institute for Health Research (NIHR) [name of NIHR programme (Grant Reference Number XXX)/name.
The NIHR Southampton Clinical Research Facility was established by the Wellcome Trust and the Department of Health in The NIHR Southampton Clinical.
This study/project is funded by/ supported by the National Institute for Health Research (NIHR) [name of NIHR programme (Grant Reference Number XXX)/name.
Investigating the effects of socioeconomic factors on cancer treatment patterns and outcomes in Canada using individual-level linked national data Presenters:
Measuring Health Disparities in Healthy People 2010
Pnina ZADKA Central Bureau of Statistics Israel
Pnina ZADKA Central Bureau of Statistics Israel
State Consumer Health Information and Policy Advisory Council Meeting
Research Showcase: Combining multiple methods
Presentation transcript:

Challenges in data linkage: error and bias  Katie Harron October 2014 UCL Institute of Child Health k.harron@ucl.ac.uk

The linkage problem Match status Match Non-match Link status Link   Match status Match (pair from same individual) Non-match (pair from different individuals) Link status Link Identified match False match Non-link Missed match Identified non-match 2

Deterministic linkage in Hospital Episode Statistics (HES) Few false-matches 1 Sex Date of Birth NHS Number 2 Postcode Local Patient Identifier within Provider 3 More missed-matches What kind of linkage is this? Which patients might be missed?

Quality of unique identifiers 166,406 records of admissions to paediatric intensive care (PICANet) 85,137 non-matches 81,269 matches 46 (0.1%) same NHS number 3,207 (4%) different NHS numbers Hagger-Johnson et al. Causes and consequences of data linkage errors: False and missed matches following linkage of hospital data (under review)

Deterministic linkage with pseudonymisation at source Courtesy of Peter Jones, ONS Beyond 2011 programme

Probabilistic linkage Highest weight is retained pair 1 pair 2 pair 3 Low match High match weight weight Primary File Ronald Fisher Linking File Ronald Fisher Linking File Karl Pearson Linking File Carl Gauss

Probabilistic linkage Highest weight is retained pair 3 Low match High match weight weight P(γ=1 | M) = m-probability = sensitivity   the probability of agreement given the records from same subject  P(γ=1 | U) = u-probability= 1-specificity the probability of agreement given the records from different subjects Log ratio = w = log2 (m/u) if identifiers agree log2 [(1-m)/(1-u)] if identifiers disagree Match weight = W = ∑wi

Probabilistic linkage Matches agreement on NHS number Non-matches agreement on sex Low match High match weight weight disagreement on date of birth agree on some ids disagree on some ids Chance (same date of birth) Recording errors Missing data

Matches Non-matches Two thresholds Missed matches False matches Links Low match High match weight weight Links Links Two thresholds

Evaluating linkage quality Small amounts of linkage error can result in substantially biased results The impact of linkage error on results is rarely reported Linkage error affects different types of analysis in different ways

Why it’s important to evaluate linkage error Schmidlin et al (2013) Impact of unlinked deaths and coding changes on mortality trends in the Swiss National Cohort. BMC Med Inform Decis Mak 13 (1):1

Why it’s important to evaluate linkage error Hobbs, G. & Vignoles, A., 2007. Is free school meal status a valid proxy for socio-economic status (in schools research)? Centre for the Economics of Education; London School of Economics and Political Science.

Why it’s important to evaluate linkage error Highly sensitive Highly specific Lariscy (2011). Differential Record Linkage by Hispanic Ethnicity and Age in Linked Mortality Studies: Implications for the Epidemiologic Paradox. J Aging Health 23(8):1263-84

Why it’s important to evaluate linkage error Ford et al 2006. Characteristics of unmatched maternal and baby records in linked birth records and hospital discharge data. Paediatric and Perinatal Epidemiology 20(4): 329-337.

Evaluating linkage quality i) Sensitivity analysis using different linkage criteria ii) Subset of gold-standard data to quantify linkage bias Highly sensitive Highly specific iii) Comparisons of linked and unlinked data iv) Imputation for uncertain links

Imputation for linkage Primary file Linking file Variable of interest Record 1 Exact match high Record 2 low Record 3 Prior-informed imputation high low Record 4 Match weight=10 med Match weight=1 low Record n Match weight=5 Match weight=4 high Match weight=3 med high Goldstein et al Stat Med 2012;31(28):3481-3493 Harron et al BMC Med Res Method 2014;14(1):36

Implications for data providers i) Sensitivity analysis using different thresholds Availability of all candidate records (linked and unlinked) ii) Subset of gold-standard data to quantify linkage bias iii) Comparisons of linked and unlinked data Subset of data where true match status is known (gold-standard) iv) Imputation for uncertain links Harron et al 2012. Opening the black box of record linkage. J Epidemiol Commun H 66(12):1198

Summary Data linkage is a powerful tool for enhancing administrative data Linkage error has important effects on analyses Results vary according to choice of thresholds and methods Taking error into account is possible without releasing identifiable data Communication between linkers and data users is vital

Acknowledgements and funding Harvey Goldstein, Ruth Gilbert, Gareth Hagger-Johnson and Angie Wade, UCL Institute of Child Health Berit Muller-Pebody, Public Health England Roger Parslow, Tom Fleming, Lee Norman and the PICANet team, University of Leeds This work was supported by funding from the National Institute for Health Research Health Technology Assessment (NIHR HTA) programme (project number 08/13/47). The views and opinions expressed therein are those of the authors and do not necessarily reflect those of the HTA programme, NIHR, NHS or the Department of Health. The authors state no conflicts of interest.