The Clinical Practice Research Datalink: Methodological Challenges in using Routine Clinical Data. Dr Alison Nightingale, University of Bath
Overview: what is the CPRD and how are data generated; database challenges; methodological challenges caused by missing data.
Clinical Practice Research Datalink
Big Data? 15.6 million patients from 684 general practices since ,773,422,644 records in 10 separate tables; 6 TB storage currently required; 1,605,948,604 consultations; 903,530,787 tests; 1,423,921,076 prescriptions.
Processing and analysis challenges: data loads; downloading datasets from CPRD online; time taken to run complex algorithms; analysis of datasets where the study population is large.
Methodological challenges in using the CPRD
CPRD: missing data. Data are generated through routine clinical care. Missing data: data are censored to the left and right; symptoms / diagnoses / results that are not clinically relevant; characteristics of patients or tests that are ‘normal’; ‘take as directed’ prescriptions; confirmation or exclusion of diagnoses.
Disease epidemiology: incidence; prevalence; mortality; risk factors.
Increasing prevalence. Prevalence = incidence × average disease duration. Increasing incidence rates: awareness; changes in diagnostic criteria. Decreasing mortality (increase in disease duration): improved treatment; earlier diagnosis. Artefact? Misclassification / relapsing-remitting disease.
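The relationship between prevalence, incidence and duration can be illustrated with a small worked sketch. The function name and all figures below are hypothetical and chosen only for illustration; the formula itself is a steady-state approximation that works best when incidence and duration are stable and the disease is uncommon.

```python
def steady_state_prevalence(incidence_per_person_year: float,
                            mean_duration_years: float) -> float:
    """Approximate prevalence as incidence x average disease duration."""
    return incidence_per_person_year * mean_duration_years


# Hypothetical incidence: 40 new cases per 100,000 person-years.
incidence = 40 / 100_000

# Average disease duration of 10 years.
baseline = steady_state_prevalence(incidence, 10)

# Same incidence, but lower mortality extends average duration to 15 years.
longer_duration = steady_state_prevalence(incidence, 15)

print(f"baseline prevalence:        {baseline:.4%}")
print(f"with longer disease course: {longer_duration:.4%}")
```

In this sketch prevalence rises by half with no change in incidence, which is why a rising prevalence on its own does not show that more people are developing the disease.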
Disease epidemiology: What is the truth? Adjustment?
Accuracy of diagnostic tests. Study to estimate the diagnostic accuracy of rheumatoid factor (RF) testing in primary care for the diagnosis of rheumatoid arthritis (RA). Investigate the impact of negative RF tests on time to diagnosis of RA.
                     Disease
Test result     Positive     Negative
Positive        TP           FP
Negative        FN           TN
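The standard accuracy measures derived from a 2×2 table of this kind can be sketched as follows. The class name and the counts are hypothetical and are not taken from the study; they only show how each measure uses the four cells.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from a diagnostic 2x2 table (test result by disease status)."""
    tp: int  # test positive, disease present
    fp: int  # test positive, disease absent
    fn: int  # test negative, disease present
    tn: int  # test negative, disease absent

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        """Positive predictive value."""
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        """Negative predictive value."""
        return self.tn / (self.tn + self.fn)


# Hypothetical counts, for illustration only.
table = TwoByTwo(tp=60, fp=40, fn=40, tn=860)
print(f"sensitivity={table.sensitivity():.3f}, specificity={table.specificity():.3f}")
print(f"PPV={table.ppv():.3f}, NPV={table.npv():.3f}")
```

Every measure depends on at least one of the negative-test cells (FN or TN) or on the positive cells being complete, so missing test results feed directly into the accuracy estimates.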
Accuracy of diagnostic tests. Identified a potential issue of missing data arising from the lack of clinical significance of a negative test. Suspected that the data were missing not at random (MNAR), which is a problem for imputation.
                     Disease
Test result     Positive     Negative
Positive        TP           FP
Negative        FN (Seronegative)     TN
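A minimal sketch of why unrecorded negative results matter, under two loudly stated assumptions that are not from the study: every positive result is recorded, and each negative result is recorded with the same probability regardless of disease status. The function name, parameter names and counts are hypothetical.

```python
def observed_metrics(tp, fp, fn, tn, p_negative_recorded):
    """Expected accuracy measures when only a fraction of negatives is recorded.

    Assumes (for illustration) that all positive results are recorded and that
    negative results are recorded with probability p_negative_recorded.
    """
    fn_obs = fn * p_negative_recorded  # negatives in people with disease
    tn_obs = tn * p_negative_recorded  # negatives in people without disease
    return {
        "sensitivity": tp / (tp + fn_obs),
        "specificity": tn_obs / (tn_obs + fp),
        "pct_positive": 100 * (tp + fp) / (tp + fp + fn_obs + tn_obs),
    }


# Hypothetical true counts: TP=60, FP=40, FN=40, TN=860.
complete = observed_metrics(60, 40, 40, 860, p_negative_recorded=1.0)
partial = observed_metrics(60, 40, 40, 860, p_negative_recorded=0.5)

for label, metrics in (("all negatives recorded", complete),
                       ("half of negatives recorded", partial)):
    print(label, {name: round(value, 3) for name, value in metrics.items()})
```

Under these assumptions, losing negative results inflates both the apparent sensitivity and the percentage of positive tests while lowering the apparent specificity, so the bias cannot be removed by treating the missing results as a random sample of all tests.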
Practice-year stratified data. [Figure axes: % tests with ‘null’ result value; % positive test results.] Results from RF tests submitted to labs in Oxford indicated that on average 7% were positive. The maximum positive rate was 23%, from a rheumatology clinic.
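The two quantities plotted for each practice-year can be computed along the following lines. This is a sketch only: the record layout, the practice identifiers and the choice to compute the percentage positive among tests with a recorded result are all assumptions, not the structure of the CPRD tables or the method used in the study.

```python
from collections import defaultdict

# Hypothetical test records: (practice_id, year, result),
# where result is "positive", "negative", or None for a 'null' result value.
records = [
    ("P1", 2005, "negative"), ("P1", 2005, "negative"),
    ("P1", 2005, "positive"), ("P1", 2005, None),
    ("P2", 2005, "positive"), ("P2", 2005, None),
    ("P2", 2005, None), ("P2", 2005, "positive"),
]


def summarise_by_practice_year(rows):
    """Return % null results and % positive results per (practice, year)."""
    counts = defaultdict(lambda: {"tests": 0, "null": 0, "positive": 0})
    for practice_id, year, result in rows:
        cell = counts[(practice_id, year)]
        cell["tests"] += 1
        if result is None:
            cell["null"] += 1
        elif result == "positive":
            cell["positive"] += 1

    summary = {}
    for key, cell in counts.items():
        recorded = cell["tests"] - cell["null"]
        summary[key] = {
            "pct_null": 100 * cell["null"] / cell["tests"],
            # Assumption: denominator is tests with a recorded result.
            "pct_positive": (100 * cell["positive"] / recorded) if recorded else None,
        }
    return summary


summary = summarise_by_practice_year(records)
for key in sorted(summary):
    print(key, summary[key])
```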
Practice-year stratified data. [Figure axes: % tests with ‘null’ result value; % positive test results. Figure annotations: MNAR; Reference range; MNAR; Tests not recorded; -ve results not recorded.]
What do we do with MNAR data? [Figure axes: % tests with ‘null’ result value; % positive test results. Figure annotations: MNAR; Reference range; MNAR; Tests not recorded; -ve results not recorded; Tests and test results more likely to be MAR.]
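One way to operationalise the comparison against an external reference range is sketched below. It is a heuristic screen, not a statistical test for MNAR, and it is not presented as the method used in the study. Using the 23% maximum from the Oxford laboratory data as the upper bound is an illustrative assumption, and the function name, labels and practice-year values are hypothetical.

```python
# Upper bound taken from the maximum positive rate quoted for the Oxford
# laboratory data (rheumatology clinic); using it as a cut-off is an assumption.
REFERENCE_MAX_PCT = 23.0


def flag_practice_year(pct_positive, upper=REFERENCE_MAX_PCT):
    """Label a practice-year by comparing its % positive to a reference bound."""
    if pct_positive is None:
        return "no recorded results"
    if pct_positive > upper:
        return "above reference range: negative results may be unrecorded"
    return "within reference range"


# Hypothetical % positive values per (practice, year).
practice_years = {
    ("P1", 2005): 6.5,
    ("P2", 2005): 48.0,
    ("P3", 2005): None,
}

flags = {key: flag_practice_year(value) for key, value in practice_years.items()}
for key in sorted(flags):
    print(key, flags[key])
```

A practice-year that falls within the reference range is only more plausibly MAR; the screen cannot confirm it, and it depends on a reference value being available at all.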
Data that are likely to be MNAR: no current statistical methods of identifying MNAR data; reference values not always available; no current consensus on how to handle data that are MNAR. Likely to be present throughout: prescribing records (missing prescribing information); medical diagnoses (missing diagnoses); patient characteristics (smoking, alcohol, body mass index).
Comparative effectiveness & safety: large population; cohort and nested case-control studies; post-marketing safety surveillance (adverse outcomes); comparative effectiveness (treatment outcomes); methodological issues surrounding timing of exposures and the use of cumulative drug exposures; confounding by indication.
CPRD team at Bath: Professor Neil McHugh, Professor of Pharmacoepidemiology and Consultant Rheumatologist; Dr Gavin Shaddick, Reader in Statistics; Dr Anita McGrogan, Lecturer in Pharmacoepidemiology; Dr Rachel Charlton, Research Fellow; Dr Alison Nightingale, Research Fellow; Julia Snowball, Database Manager and Programmer; Amelia Jobling, Research Assistant and PhD student (statistics).
Thank you