Published by Kristina Morrison. Modified over 9 years ago.
1
MISSING DATA IN THE INFECTIOUS DISEASES INSTITUTE CLINIC DATABASE
Agnes N Kiragga
East Africa IeDEA investigators' meeting, 4-5th May 2010
East African Regional consortium
2
Objectives
- Describe the level of missing data for key variables
- Identify factors associated with missingness for patients on antiretroviral therapy (ART)
- Assess missing-data assumptions in observational databases
3
Assumptions of missing data
- "Missing completely at random" (MCAR): missingness does not depend on any observed or unobserved value. Example: a blood sample is lost or not taken in error.
- "Missing at random" (MAR): missingness depends only on other measured factors, not on the missing (unobserved) value itself. Example: the study protocol requires blood pressure to stay below a threshold, so a patient is withdrawn after a high value is recorded; later readings are missing because of the blood pressure observed at this visit.
- "Missing not at random" (MNAR): missingness is related to the missing outcome itself. Example: a patient withdraws from the study because they "didn't feel well".
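The three mechanisms can be illustrated with a small simulation. This is a toy sketch, not IDI clinic data. `x` stands for a fully observed variable, such as the blood pressure at this visit, and `y` for the value that may go missing. Regression imputation from `x` is used here only as one simple example of a MAR-based correction. It is not a method taken from the slides.

```python
import random
import statistics


def simulate(n=100_000, seed=1):
    """Toy data: x is always observed; y = x + noise may go missing. True mean of y is 0."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [x + rng.gauss(0, 1) for x in xs]
    return xs, ys


def complete_case_mean(ys, observed):
    """Mean of y using only records where y was observed (i.e. ignoring missing data)."""
    return statistics.fmean([y for y, o in zip(ys, observed) if o])


def regression_imputed_mean(xs, ys, observed):
    """Fit y = a + b*x on complete cases, fill each missing y from its x, then average."""
    cx = [x for x, o in zip(xs, observed) if o]
    cy = [y for y, o in zip(ys, observed) if o]
    mx, my = statistics.fmean(cx), statistics.fmean(cy)
    b = sum((x - mx) * (y - my) for x, y in zip(cx, cy)) / sum((x - mx) ** 2 for x in cx)
    a = my - b * mx
    filled = [y if o else a + b * x for x, y, o in zip(xs, ys, observed)]
    return statistics.fmean(filled)


def run(seed=1):
    """Return {mechanism: (complete-case mean, regression-imputed mean)}."""
    xs, ys = simulate(seed=seed)
    rng = random.Random(seed + 1)
    mechanisms = {
        "MCAR": [rng.random() >= 0.3 for _ in ys],  # about 30% lost purely by chance
        "MAR": [x <= 0.5 for x in xs],              # depends only on the observed x
        "MNAR": [y <= 0.5 for y in ys],             # depends on the unobserved y itself
    }
    return {
        name: (complete_case_mean(ys, obs), regression_imputed_mean(xs, ys, obs))
        for name, obs in mechanisms.items()
    }


if __name__ == "__main__":
    for name, (cc, imp) in run().items():
        print(f"{name}: complete-case mean {cc:+.2f}, regression-imputed mean {imp:+.2f}")
```

The three mechanisms behave differently:
- Under MCAR both estimates stay close to the true mean of 0.
- Under MAR the complete-case mean is pulled down, but using the observed `x` recovers it.
- Under MNAR neither approach removes the bias, because the information needed is in the missing values themselves.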
4
Study population, 04/2000 - 04/2010
Registered: 23121 (active: 15070)
- Non-ART: 13310
- ART: 9811
  - DART: 300
  - Remaining ART patients: 9511 (ART start before 2005: 1043; 2005 or later: 8468)
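The counts in this flow diagram can be checked for internal consistency. This is a minimal sketch. Treating the 9511 patients as the ART patients outside DART is an assumption, based only on the arithmetic 9811 - 300 = 9511.

```python
# Counts copied from the study-population slide (04/2000 - 04/2010).
REGISTERED = 23121
NON_ART, ART = 13310, 9811
DART = 300
BEFORE_2005, FROM_2005 = 1043, 8468


def check_flow():
    """Check that each split adds up and return the size of the analysed ART cohort."""
    assert NON_ART + ART == REGISTERED, "non-ART + ART should equal registered"
    analysed = ART - DART  # assumption: DART patients are set aside
    assert BEFORE_2005 + FROM_2005 == analysed, "pre/post-2005 split should equal analysed"
    return analysed


print(check_flow())  # 9511
```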
5
Variables and how data are recorded (recorded at every clinic visit, or recorded only when an event occurs)
- Demographic: Gender (a), Date of birth (a), Weight, Height (a)
- Clinical: WHO stage, Karnofsky score
- Laboratory: CD4 T-cell count, CBC (hemoglobin, lymphocytes)
- Other variables: Opportunistic infections, Toxicity, ART regimen, Reason for ART switch/stop, ART switch date, Adherence score
6
Source of CD4 data
- Electronic download: 86146 (95%)
- Recorded: 3085 (5%)
7
Missing baseline variables (N = 9511)
Variable                        | Number missing (%)
--------------------------------+-------------------
Age                             | 0 (0)
Gender                          | 0 (0)
WHO clinical stage              | 0 (0)
Weight (kg)                     | 13 (0.1)
Height (cm)                     | 1032 (10.8)
CD4+ count (cells/μL), window 1 | 3126 (32.8)
CD4+ count (cells/μL), window 2 | 1641 (17.2)
CD4+ count (cells/μL), window 3 | 1350 (14.2)
ART regimen                     | 0 (0)
Note: baseline CD4 window 1 = 3 months pre-ART, 2 = 6 months pre-ART, 3 = 12 months pre-ART.
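The table shows missing baseline CD4 falling as the pre-ART window widens. The sketch below shows how such a window-based baseline could be defined. It rests on several assumptions:
- Baseline is taken as the latest CD4 count within the window before ART start.
- A month is approximated as 30 days.
- The function names and patient records are hypothetical.

The slide does not state the exact rule the clinic used.

```python
from datetime import date, timedelta


def baseline_cd4(cd4_tests, art_start, window_months, days_per_month=30):
    """Latest CD4 count measured within `window_months` before ART start, or None.

    cd4_tests: list of (test_date, cd4_cells_per_uL) tuples.
    A month is approximated as `days_per_month` days (an assumption).
    """
    earliest = art_start - timedelta(days=window_months * days_per_month)
    eligible = [(d, v) for d, v in cd4_tests if earliest <= d <= art_start]
    if not eligible:
        return None  # no CD4 in the window: baseline is missing
    return max(eligible, key=lambda dv: dv[0])[1]


def percent_missing(patients, window_months):
    """Percentage of patients with no baseline CD4 in the given pre-ART window."""
    missing = sum(1 for art_start, tests in patients
                  if baseline_cd4(tests, art_start, window_months) is None)
    return round(100 * missing / len(patients), 1)


# Hypothetical patients: (ART start date, list of pre-ART CD4 tests)
patients = [
    (date(2008, 6, 1), [(date(2008, 4, 15), 210)]),  # 47 days before ART
    (date(2008, 6, 1), [(date(2008, 1, 10), 180)]),  # 143 days before ART
    (date(2008, 6, 1), [(date(2007, 9, 1), 150)]),   # 274 days before ART
    (date(2008, 6, 1), []),                           # never tested
]
for months in (3, 6, 12):
    print(months, percent_missing(patients, months))  # 3 75.0, then 6 50.0, then 12 25.0
```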
8
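Number of missing baseline variables (a), by year of ART start
Year of ART start | 0 missing, n (%) | 1 missing, n (%) | 2 missing, n (%)
------------------+------------------+------------------+-----------------
≤2004             | 198 (3.5)        | 648 (19.2)       | 197 (47.4)
2005              | 1878 (32.9)      | 848 (25.1)       | 100 (24.0)
2006              | 971 (17.0)       | 419 (12.4)       | 24 (5.8)
2007              | 1154 (20.2)      | 627 (18.6)       | 30 (7.2)
2008              | 739 (12.9)       | 379 (11.2)       | 33 (7.9)
2009              | 694 (12.1)       | 385 (11.4)       | 26 (6.3)
2010              | 81 (1.4)         | 74 (2.2)         | 6 (1.4)
------------------+------------------+------------------+-----------------
Total             | 5715 (100)       | 3380 (100)       | 416 (100)
Note: (a) variables include weight, height and CD4 count.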
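The percentages in this table are column percentages: each cell is divided by its column total. They can be reproduced from the counts. The sketch below uses half-up rounding, which is an assumption. It matches the published 6.3 for 26/416 = 6.25 exactly, whereas Python's built-in `round()` sends exact ties to the even digit.

```python
from decimal import Decimal, ROUND_HALF_UP

# Patients by year of ART start and number of missing baseline variables (0, 1, 2),
# copied from the table above.
COUNTS = {
    "<=2004": (198, 648, 197),
    "2005": (1878, 848, 100),
    "2006": (971, 419, 24),
    "2007": (1154, 627, 30),
    "2008": (739, 379, 33),
    "2009": (694, 385, 26),
    "2010": (81, 74, 6),
}


def column_totals(counts):
    """Sum each column (0, 1 and 2 missing variables) across all years."""
    return [sum(col) for col in zip(*counts.values())]


def column_percentages(counts):
    """Percent of each column total, rounded half-up to one decimal place."""
    totals = column_totals(counts)
    return {
        year: tuple(
            (Decimal(n * 100) / Decimal(total)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
            for n, total in zip(row, totals)
        )
        for year, row in counts.items()
    }


print(column_totals(COUNTS))               # [5715, 3380, 416]
print(column_percentages(COUNTS)["2009"])  # (Decimal('12.1'), Decimal('11.4'), Decimal('6.3'))
```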
9
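Factors associated with missing baseline CD4 count
No association with gender, age or weight.
Variable                      | Missing, N=3167 | NOT missing, N=6344 | p
------------------------------+-----------------+---------------------+---------
Missing baseline height, n(%) | 411 (13)        | 621 (9.8)           | <0.0001
ART regimen, n(%)             |                 |                     | <0.0001
  Nevirapine                  | 1798 (56.8)     | 3698 (58.3)         |
  Efavirenz                   | 1143 (36.0)     | 2425 (38.2)         |
  PI                          | 154 (4.9)       | 83 (1.3)            |
  Other                       | 72 (2.3)        | 138 (2.2)           |
Year of ART initiation, n(%)  |                 |                     | <0.0001
  ≤2004                       | 562 (17.7)      | 481 (7.6)           |
  2005                        | 720 (22.7)      | 2106 (33.2)         |
  2006                        | 415 (13.2)      | 999 (15.7)          |
  2007                        | 614 (19.4)      | 1197 (18.9)         |
  2008                        | 390 (12.3)      | 761 (12.0)          |
  2009                        | 387 (12.2)      | 718 (11.3)          |
  2010                        | 79 (2.5)        | 82 (1.3)            |
OUTCOMES: study status, n(%)  |                 |                     | <0.0001
  Active                      | 2365 (33.8)     | 4623 (72.9)         |
  Dead                        | 199 (24.0)      | 629 (9.9)           |
  Lost                        | 76 (65.5)       | 40 (0.6)            |
  Transferred                 | 354 (33.1)      | 716 (11.3)          |
  Missing                     | 173 (34.0)      | 336 (5.3)           |
Note: for study status, percentages in the Missing column are row percentages (e.g. 2365 of the 6988 active patients, 33.8%, had missing baseline CD4). All other percentages are column percentages.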
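The slide reports p-values but does not name the test. A Pearson chi-square test is one common choice for comparing two proportions. The sketch below applies it, without continuity correction (an assumption), to the missing-height row: 411 of 3167 versus 621 of 6344.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p-value). With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, math.erfc(math.sqrt(stat / 2))


# Rows: baseline CD4 missing / not missing; columns: height missing / not missing.
stat, p = chi_square_2x2(411, 3167 - 411, 621, 6344 - 621)
print(f"chi2 = {stat:.1f}, p = {p:.1e}")
```

The statistic comes out at about 22 on 1 degree of freedom, consistent with the reported p<0.0001. The multi-category rows (regimen, year, status) would need the general r x c version of the test.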
10
CD4 counts at follow-up visits
- CD4 tested 6-monthly (± 2 months); baseline CD4 counts excluded
- Complete CD4 data: number of CD4 counts observed ≥ number of tests expected given duration on ART
- Missing CD4 data: number of CD4 counts observed < number of tests expected given duration on ART
- 1423 (15%): insufficient follow-up
- 8088 (85%): assessed for missing CD4
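The completeness rule above can be sketched per patient. The following choices are assumptions, and the function names are hypothetical:
- The expected count is the number of completed 6-month intervals on ART.
- Fewer than 6 months on ART counts as insufficient follow-up.
- "Timely" means every scheduled visit has a test within ± 2 months.

```python
def expected_tests(months_on_art, interval=6):
    """Follow-up CD4 tests expected on a 6-monthly schedule (baseline excluded)."""
    return int(months_on_art // interval)


def classify_followup(months_on_art, test_months, interval=6, tolerance=2):
    """Classify one patient's follow-up CD4 data.

    test_months: months after ART start at which follow-up CD4 tests were done.
    Returns (status, timely); timely is None when follow-up is too short to assess.
    """
    n_expected = expected_tests(months_on_art, interval)
    if n_expected == 0:
        return "insufficient follow-up", None
    status = "complete" if len(test_months) >= n_expected else "missing"
    # Timely: every scheduled visit (6, 12, ... months) has a test within +/- tolerance.
    timely = all(
        any(abs(t - k * interval) <= tolerance for t in test_months)
        for k in range(1, n_expected + 1)
    )
    return status, timely


print(classify_followup(4, []))        # ('insufficient follow-up', None)
print(classify_followup(13, [6, 12]))  # ('complete', True)
print(classify_followup(13, [5, 15]))  # ('complete', False)
print(classify_followup(25, [7, 12]))  # ('missing', False)
```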
11
Categorization of follow-up CD4 data (N = 8088)
                        Categorization |      Freq.     Percent
---------------------------------------+------------------------
complete baseline + complete follow-up |      2,878       35.58
 complete baseline + missing follow-up |      2,529       31.27
 missing baseline + complete follow-up |      1,315       16.26
  missing baseline + missing follow-up |      1,366       16.89
---------------------------------------+------------------------
                                 Total |      8,088      100.00
Complete baseline + complete follow-up + CD4 testing + timely CD4 tests = 864 (10.7%)
Included all nested research cohort patients
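A frequency table like this one can be built from two per-patient flags. The sketch below is hypothetical. It reconstructs the flags from the published frequencies only to check that the percentages follow from the counts.

```python
from collections import Counter

LABELS = {
    (True, True): "complete baseline + complete follow-up",
    (True, False): "complete baseline + missing follow-up",
    (False, True): "missing baseline + complete follow-up",
    (False, False): "missing baseline + missing follow-up",
}


def tabulate(flags):
    """flags: iterable of (baseline_complete, followup_complete) booleans, one per patient.

    Returns {label: (frequency, percent rounded to 2 decimal places)}.
    """
    counts = Counter(flags)
    total = sum(counts.values())
    return {LABELS[key]: (counts[key], round(100 * counts[key] / total, 2)) for key in LABELS}


# Rebuild per-patient flags from the table's frequencies.
flags = ([(True, True)] * 2878 + [(True, False)] * 2529
         + [(False, True)] * 1315 + [(False, False)] * 1366)
for label, (n, pct) in tabulate(flags).items():
    print(f"{label:40s} {n:6d} {pct:6.2f}")
```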
12
[Figure: Categorization of follow-up CD4 data by year of ART initiation, for patients with at least 6 months of follow-up; bar sizes n = 995, 2487, 1174, 1555, 960 and 917]
13
Validation of incident post-ART tuberculosis cases
- Tuberculosis was the most common opportunistic infection in the first 24 months after ART initiation (rate 2.79, 95% CI 2.45-3.16)
- Flagged TB cases were merged with the TB drug database
- Patients on TB treatment were identified: 334 incident post-ART cases
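The merge step can be sketched as a join between flagged cases and TB drug records. This is a hypothetical sketch under two assumptions:
- A case is confirmed as incident if the patient's first TB drug record falls after ART start.
- That record must fall within 730 days (about 24 months) of ART start.

The record layout and IDs are invented for illustration.

```python
from datetime import date


def confirm_incident_tb(flagged_ids, tb_treatment_start, art_start, within_days=730):
    """Keep flagged TB cases whose TB treatment starts after ART, within `within_days`.

    flagged_ids: patient IDs flagged with TB in the clinic database.
    tb_treatment_start: {patient_id: first TB drug date} from the drug database.
    art_start: {patient_id: ART start date}.
    """
    confirmed = set()
    for pid in flagged_ids:
        start = tb_treatment_start.get(pid)
        if start is None:
            continue  # flagged but no TB treatment record: not confirmed
        days_after_art = (start - art_start[pid]).days
        if 0 < days_after_art <= within_days:
            confirmed.add(pid)
    return confirmed


# Hypothetical example records
flagged = {"P1", "P2", "P3"}
tb_drugs = {"P1": date(2009, 3, 1), "P3": date(2007, 1, 5)}
art = {"P1": date(2008, 11, 1), "P2": date(2008, 1, 1), "P3": date(2008, 2, 1)}
print(confirm_incident_tb(flagged, tb_drugs, art))  # {'P1'}
```

P2 has no treatment record, and P3 started TB treatment before ART, so neither counts as an incident post-ART case.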
14
Assumption 1: baseline CD4 data missing completely at random
[Figure: Probability of developing tuberculosis (TB) by baseline CD4 data, complete vs missing baseline CD4 data; y-axis probability 0.00-0.30, x-axis analysis time 0.5-2.5; log-rank P<0.435]
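The comparison of the two curves rests on a log-rank test. Below is a sketch of the standard unweighted two-group statistic, using the 1-df chi-square approximation. It is not the software the authors used, and the example data are hypothetical.

```python
import math


def log_rank(times_a, events_a, times_b, events_b):
    """Unweighted two-group log-rank test.

    times_*: follow-up times; events_*: 1 if the event (e.g. TB) occurred, 0 if censored.
    Returns (chi-square statistic, p-value) on 1 degree of freedom.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({t for t, e, _ in data if e}):
        at_risk = [(tt, e, g) for tt, e, g in data if tt >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == 0)
        d = sum(1 for tt, e, _ in at_risk if tt == t and e)
        d_a = sum(1 for tt, e, g in at_risk if tt == t and e and g == 0)
        # Observed minus expected events in group A at this event time.
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    stat = observed_minus_expected ** 2 / variance
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical groups: complete vs missing baseline CD4, years to TB or censoring.
print(log_rank([0.5, 1.0, 1.5, 2.0], [1, 0, 1, 0], [0.7, 1.2, 1.8, 2.4], [0, 1, 0, 1]))
```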
15
Assumption 2: follow-up CD4 data missing at random
[Figure: Probability of developing tuberculosis (TB) by follow-up CD4 data, missing vs complete follow-up CD4 data; y-axis probability 0.00-0.30, x-axis analysis time 0.5-2.5]
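Curves of the "probability of developing TB" are typically Kaplan-Meier failure estimates, i.e. 1 minus the survival probability. The slides do not state this, so treat it as an assumption. A sketch of that estimator on hypothetical data:

```python
def km_failure(times, events):
    """Kaplan-Meier cumulative probability of the event, 1 - S(t), at each event time.

    times: follow-up times; events: 1 if the event occurred at that time, 0 if censored.
    Observations censored at an event time stay in the risk set for that time.
    """
    survival = 1.0
    curve = []
    for t in sorted({t for t, e in zip(times, events) if e}):
        n_at_risk = sum(1 for tt in times if tt >= t)
        events_at_t = sum(1 for tt, e in zip(times, events) if tt == t and e)
        survival *= 1 - events_at_t / n_at_risk
        curve.append((t, 1 - survival))
    return curve


# Hypothetical follow-up (years since ART start) for five patients; 1 = TB diagnosed.
print(km_failure([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```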
16
Preliminary insights from analysis
- Reconcile local and IeDEA-wide analyses
- Baseline CD4 data appear missing completely at random (MCAR)
- Follow-up CD4 data appear missing at random (MAR)
- Ignoring the missing data will lead to biased estimates of ART outcomes
- Strategies are needed to identify patterns and mechanisms of missing data in observational data prior to analysis
17
Planned analyses
- Missing data and other HIV outcomes, e.g.:
  - immune response
  - incidence of other opportunistic infections
  - toxicity
  - treatment changes/switches
- Use the strength of the nested research cohort to validate imputed data in the large database
- CD4 trajectories versus mortality: estimate the distribution of CD4 marker trajectories and the distribution of log survival time using mixed-effects models, measuring time from the first pre-HAART CD4 count