TCGA Clinical Data: Analysis of Sources of Error. Mary E. Edgerton, MD, PhD, Department of Pathology, UT MD Anderson Cancer Center. October 7, 2010
Ship Date Distribution by Site (Clinical Aliquot MEE)
Tissue Type as a Function of Ship Date: No Biased Relationship (Aliquot_MEE)
Ship Date for Samples Receiving Targeted Therapy from Different Institutions (Clinical_drug_OC_3)
Days to Death Should be Date of Last Follow-up: Scatter Plot of Differences by Institution
Difference between Age entry in years and DAYSTOBIRTH converted to years by Institution
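The comparison on this slide can be sketched as a small check. This is a minimal illustration, not the procedure used for the plot: the record keys AGE, PATIENT, and INSTITUTION, the one-year tolerance, and the assumption that DAYSTOBIRTH is stored as a signed day count (so its absolute value is used) are all hypothetical.

```python
DAYS_PER_YEAR = 365.25  # average year length, including leap years


def age_discrepancy_years(age_years, days_to_birth):
    """Reported age minus the age implied by DAYSTOBIRTH; None if either is missing."""
    if age_years is None or days_to_birth is None:
        return None
    # Assumption: DAYSTOBIRTH may be stored as a negative offset, so take abs().
    implied_years = abs(days_to_birth) / DAYS_PER_YEAR
    return age_years - implied_years


def flag_age_mismatches(records, tolerance_years=1.0):
    """List (patient, institution, difference) where the two age fields disagree."""
    flagged = []
    for rec in records:
        diff = age_discrepancy_years(rec.get("AGE"), rec.get("DAYSTOBIRTH"))
        if diff is not None and abs(diff) > tolerance_years:
            flagged.append((rec["PATIENT"], rec["INSTITUTION"], round(diff, 2)))
    return flagged
```

Grouping the flagged tuples by institution would give the per-institution view described in the slide title.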
DAYSTODEATH Relationship to VITALSTATUS: All DECEASED values have corresponding DAYSTODEATH entries. LIVING entries all have null DAYSTODEATH (correct). There are 18 null entries for vital status, one of which has a DAYSTODEATH value. T/C QC checks or installed procedures to normalize db entries: a non-null value of days to death should trigger a vital status change.
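A QC check of the kind suggested here might look like the following sketch. It assumes each patient row is available as a dictionary keyed by the field names on the slide, with None standing in for a null entry; the issue messages are illustrative wording only.

```python
def check_vital_status(record):
    """Return a list of QC issues relating VITALSTATUS to DAYSTODEATH for one record."""
    status = record.get("VITALSTATUS")
    days_to_death = record.get("DAYSTODEATH")
    issues = []
    if status is None:
        issues.append("VITALSTATUS is null")
        if days_to_death is not None:
            # A non-null DAYSTODEATH should trigger a vital status change.
            issues.append("DAYSTODEATH present: VITALSTATUS should be DECEASED")
    elif status == "DECEASED" and days_to_death is None:
        issues.append("DECEASED without DAYSTODEATH")
    elif status == "LIVING" and days_to_death is not None:
        issues.append("LIVING with DAYSTODEATH")
    return issues
```

Run at data entry, a check like this would catch the null vital status record that carries a DAYSTODEATH value.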
DAYSTOTUMORPROGRESSION should be orthogonal to DAYSTOTUMORRECURRENCE if progression is defined as during therapy and recurrence as after therapy
Mismatch of DAYSTOTUMORPROGRESSION with SITEOFTUMORFIRSTRECURRENCE: Site is METASTASIS; has entry DAYSTOTUMORPROGRESSION; with SITEOFTUMORFIRSTRECURRENCE
Mismatch by Institution: 10, 13, 24, 29, 59
Need to better define Progression and Recurrence: While different disciplines may view these differently, TCGA needs to determine a single definition to use across sites and install DB checks to ensure quality control.
TUMORRESIDUALDISEASE and PRIMARYTHERAPYOUTCOMES: Null, COMPLETE RESPONSE, PARTIAL RESPONSE, PROGRESSIVE DISEASE, and STABLE DISEASE all have as entries No Macroscopic Disease, 1-10, 11-20, and >20.
Residual Disease as a Function of Therapy Outcomes (CR, PR, SD, PD)
Consider: Take TUMORRESIDUALDISEASE to be a measure after surgery and PRIMARYTHERAPYOUTCOMESUCCESS to be the response after adjuvant therapy or after additional therapy (complete response, partial response, stable disease, or progression). PR should never start with No Macroscopic Disease (NMD), as anything with NMD that developed disease would be Progressive Disease (PD).
Consider: Both TUMORRESIDUALDISEASE and PRIMARYTHERAPYOUTCOME are measures after surgery. Complete Response (CR) should only have entries of No Macroscopic Disease (NMD).
Most common definition: TUMORRESIDUALDISEASE in mm would be after primary surgery, and PRIMARYTHERAPYOUTCOME would be after chemotherapy or after additional therapy. CR after chemo can have any starting TUMORRESIDUALDISEASE. PR and Stable Disease (SD) would never have NMD as a starting point. Given these rules, there are incorrect entries.
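The rules of the most common definition can be written down as a small validity check. This is only a sketch of those stated rules: records are assumed to be dictionaries keyed by the field names on the slide, the INSTITUTION key is hypothetical, and null values are simply left unflagged.

```python
from collections import Counter

NMD = "No Macroscopic Disease"
# Under the rules above, PR and SD should never start from NMD;
# CR may follow any starting residual disease.
OUTCOMES_EXCLUDING_NMD = {"PARTIAL RESPONSE", "STABLE DISEASE"}


def is_incorrect(residual, outcome):
    """True when the residual disease / outcome pair violates the stated rules."""
    return residual == NMD and outcome in OUTCOMES_EXCLUDING_NMD


def incorrect_by_institution(records):
    """Count rule violations per institution."""
    counts = Counter()
    for rec in records:
        if is_incorrect(rec.get("TUMORRESIDUALDISEASE"), rec.get("PRIMARYTHERAPYOUTCOME")):
            counts[rec["INSTITUTION"]] += 1
    return counts
```

The alternative definitions on the preceding slides would need different rule sets, for example restricting CR to NMD only.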
Incorrect entries based on my definitions: Number of Incorrect Entries by Institution (values shown: 4, 20, 61, 13, 30, 1, 5)
PERSONNEOPLASMCANCERSTATUS: Is this at the end of primary therapy, as of the last follow-up date, or after additional therapy? It does not appear to correlate with either the end of primary therapy or the last follow-up date, e.g. a patient with CR is WITH TUMOR, and a patient with recurrence is TUMOR FREE. This suggests that it is not computed from follow-up information but is an independent entry. Db lacks normalization or internal QC.
Drug Therapy: The patient clinical data file (clinical_patient_public_OV.txt) has different therapy choices from the drug file (clinical_drug_public_OV.txt), such that targeted therapy becomes other drugs. There is nothing about salvage therapy vs primary therapy in clinical_drug_public_OV.txt. Should Regimen Indication be split into two, so that Regimen has values Neoadjuvant, Adjuvant, and Salvage while Indication has values Primary Diagnosis, Progression, and Recurrence? Can DAYSTODRUGTREATMENT be used to define this as salvage, etc.? What is the meaning of INITIALCOURSE? All of the entries are null.
Adjuvant Therapies: RADIATION, CHEMOTHERAPY, IMMUNOTHERAPY, and TARGETEDMOLECULARTHERAPY. Database normalization queries: Are the entries seen in the clinical_drug_public.txt files generated from the same table elements as clinical_patient_public (i.e. is the database normalized)? Definition issues: Within clinical_drug_public.txt there are entries that do not match NCI definitions for special therapies, e.g. Pt 1666 received OvaRex (oregovomab), an antibody targeted against CA-125, and this was entered as immunotherapy. This does not match the NCI definition of immunotherapy, in which the patient’s immune system is boosted, but matches the definition of targeted therapy. How are these entries being QC’d?
Tumor Residual Disease: Kaplan-Meier curves for TUMORRESIDUALDISEASE field values of No Macroscopic Disease and null overlap. Is null being used for No Macroscopic Disease?
Clinical_slide_public: Null is being used for Not Applicable. In order to track completion of entries, should nulls be avoided as correct responses?
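One way to see why this matters: if Not Applicable were recorded as an explicit value, completion could be tracked by counting nulls alone. The sketch below assumes records are dictionaries with None for null; the field name passed in is up to the caller and nothing here reflects how the TCGA database actually stores these entries.

```python
def completion_rate(records, field):
    """Fraction of records with a non-null entry for the given field.

    An explicit "Not Applicable" counts as completed; only None counts as missing.
    Returns None when there are no records.
    """
    if not records:
        return None
    filled = sum(1 for rec in records if rec.get(field) is not None)
    return filled / len(records)
```

With null doubling as Not Applicable, the same count cannot distinguish a finished entry from one that was never filled in.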
Other General Q/A of Clinical Data Files: Should defaults be employed, such as female for ovarian? Allowable values for initial pathologic diagnosis method are a mix of pathology sample types and procedures; should these be split into procedures and sample types? If Informed Medical Consent Verified is not checked yes, should we be seeing this patient's clinical data?
Where is CA-125?