Research Data Marts in Support of Cancer Personalized Medicine
Jack London, PhD and Devjani Chatterjee, PhD
Jefferson Kimmel Cancer Center, Philadelphia, PA
Mid-Atlantic Healthcare Informatics Symposium, April 25, 2014
“Cancer”
From the late 14th to the late 19th century, the word “apoplexy” referred to any sudden death that began with a sudden loss of consciousness. Ruptured aortic aneurysms, and even heart attacks and strokes, were all called apoplexy. Like “apoplexy,” the word “cancer” was used broadly in the 20th century to describe any patient with unrestrained tumor growth. We now know that “cancer” refers to many different diseases with different cellular mechanisms. Although “cancer” was (and still is) often classified by its anatomic primary site of origin, as in “breast cancer,” the genomic alterations and affected pathways in a given patient are more directly related to the cause and treatment of that patient’s “cancer.” Not all breast cancer patients have the same disease mechanism, and therefore not all will respond to the same treatment.
Cancer Personalized Medicine
Cancers are often highly heterogeneous, with many different subtypes. These subtypes differ in outcome, including prognosis, response to treatment, recurrence, and metastasis. They are often associated with different genetic mutations, epigenetic events, gene expression profiles, molecular signatures, tissue and organ morphologies, and clinical phenotypes. Effective treatment, and the research needed to develop such treatment, requires a personalized characterization of cancer patients, including their genetic, molecular, and clinical data. Cancer research, diagnosis, and treatment also often require biospecimens to obtain the data that characterize the patient.
Research Data Mart
o Cancer translational research and cancer treatment now require clinical data describing the diagnoses, treatments, and outcomes for patient populations. Additionally, genomic and other data (such as available research biospecimens) need to be integrated with the patient’s clinical data.
o A research data mart (RDM) is a data repository (i.e., database) that integrates clinical and research data for use by investigators.
o The data are often de-identified (or sometimes anonymized). De-identified data may be re-identified by an honest broker.
o Possible uses of RDMs are
– hypothesis generation
– cohort identification
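The de-identification and honest-broker steps described above can be sketched in a few lines. This is a minimal illustration, not the Jefferson implementation: the `HonestBroker` class, the `mrn` and `name` field names, and the sample record are all hypothetical. The idea shown is only that the mart stores a random study ID, while the lookup back to the real identifier stays with the broker.

```python
import secrets


class HonestBroker:
    """Hypothetical broker that holds the only link between study IDs and MRNs."""

    # Fields treated as direct identifiers in this sketch (an assumption).
    IDENTIFYING_FIELDS = ("mrn", "name")

    def __init__(self):
        self._mrn_to_study = {}
        self._study_to_mrn = {}

    def deidentify(self, record):
        """Return a copy of the record with identifiers replaced by a study ID."""
        mrn = record["mrn"]
        if mrn not in self._mrn_to_study:
            study_id = "S" + secrets.token_hex(8)  # random, not derived from the MRN
            self._mrn_to_study[mrn] = study_id
            self._study_to_mrn[study_id] = mrn
        clean = {k: v for k, v in record.items() if k not in self.IDENTIFYING_FIELDS}
        clean["study_id"] = self._mrn_to_study[mrn]
        return clean

    def reidentify(self, study_id):
        """Map a study ID back to the MRN; only the broker can do this."""
        return self._study_to_mrn[study_id]


broker = HonestBroker()
# Made-up source record, for illustration only.
source = {"mrn": "000123", "name": "Jane Doe", "site": "C50.9", "age_at_dx": 54}
mart_row = broker.deidentify(source)
print(mart_row)
```

Investigators would see only rows like `mart_row`; a request to contact a patient or pull a chart would go through `broker.reidentify`.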
Information Flow from Source to Data Mart
Work Flow for Research Data Access
Current Jefferson i2b2 RDM
o RDM data are de-identified. Re-identification is possible via an honest broker.
o Currently > 34 million observations on > 400,000 patients.
o Data are refreshed weekly.
o Built on the “Informatics for Integrating Biology and the Bedside” (i2b2) framework from the NIH-funded National Center for Biomedical Computing based at Partners HealthCare System.
Available cancer patient and specimen annotation includes
o Demographics: gender, race, ethnicity, vital status (alive, deceased)
o Primary cancer diagnosis (ICD-O-3): age at diagnosis, date of diagnosis, primary tumor sequence, survival (months from diagnosis), primary disease site (ICD-O-3), histology (ICD-O-3), AJCC stage (clinical and pathological), grade, TNM (clinical and pathological)
o Recurrence (distant, local, regional)
o Multiple primary diagnoses
o Treatment: chemotherapy, diagnostic (biopsy), endocrine, palliative, radiation, surgery, transplant
o Site-specific factors, including ER, PR, HER2, CEA, KRAS, CA 19-9, PSA, Gleason score
o Specimen anatomic origin
o Specimen class (tissue, fluid), pathology (normal, malignant), type (frozen, fixed, paraffin block)
o Reports (surgical pathology, cytology, molecular/genomic diagnostics)
Available Genomic Annotation Includes
o Genes: ABL1, APC, ATM, BRAF, CSF1R, ERBB4, FBXW7, FGFR2, FGFR3, FLT3, G11, GQ, HNF1A, HRAS, IDH1, JAK3, KDR, KRAS, MET, MPL, NOTCH1, NPM1, NRAS, PDGFRA, PIK3CA, PTEN, RB1, RET, SMAD4, SMO, SRC, STK11, TP53, VHL
o Patient genomic annotation includes: gene mutation result (POSITIVE or NEGATIVE), alternate allele frequency, mutation type, nucleotide change, protein change, COSMIC ID, dbSNP ID
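One way to picture the per-patient genomic annotation listed above is as a record with one field per annotation item. The sketch below is an assumption about shape only: the `VariantAnnotation` class, its field names, the study IDs, and the sample values are hypothetical and do not reflect how the RDM actually stores these data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VariantAnnotation:
    """Hypothetical record mirroring the annotation fields listed on the slide."""
    study_id: str
    gene: str
    result: str  # "POSITIVE" or "NEGATIVE"
    alt_allele_freq: Optional[float] = None
    mutation_type: Optional[str] = None
    nucleotide_change: Optional[str] = None
    protein_change: Optional[str] = None
    cosmic_id: Optional[str] = None  # left empty in this sketch
    dbsnp_id: Optional[str] = None   # left empty in this sketch


def patients_positive_for(annotations, gene):
    """Return the sorted study IDs with a POSITIVE result for the given gene."""
    return sorted({a.study_id for a in annotations
                   if a.gene == gene and a.result == "POSITIVE"})


# Made-up rows, for illustration only.
annotations = [
    VariantAnnotation("S001", "KRAS", "POSITIVE", 0.31, "missense", "c.35G>A", "p.G12D"),
    VariantAnnotation("S001", "TP53", "NEGATIVE"),
    VariantAnnotation("S002", "KRAS", "NEGATIVE"),
    VariantAnnotation("S003", "KRAS", "POSITIVE", 0.12, "missense", "c.35G>T", "p.G12V"),
]
kras_positive = patients_positive_for(annotations, "KRAS")
print(kras_positive)
```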
Drag-and-drop i2b2 query tool
Biospecimen Ontology: class (solid, fluid, etc.), type (frozen, paraffin, etc.), pathology (malignant, normal)
Chemo
Tumor Registry – Primary Breast Cancer Annotation
How many patients are ER-PR-Her2 negative, with infiltrating breast cancer, and have frozen tissue available for researchers?
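A cohort question like this one amounts to counting patients who have every required fact. The sketch below shows that idea in SQL run through Python's built-in `sqlite3`. It assumes an i2b2-style fact table named `observation_fact` with `patient_num` and `concept_cd` columns; the concept codes (`ER:NEG`, `SPEC:FROZEN`, and so on) and the four sample patients are made up for illustration, and the real i2b2 query tool builds its queries differently behind the drag-and-drop interface.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observation_fact (patient_num INTEGER, concept_cd TEXT)")

# Made-up facts: one row per patient per concept (hypothetical concept codes).
rows = [
    (1, "ER:NEG"), (1, "PR:NEG"), (1, "HER2:NEG"), (1, "HIST:INFILTRATING"), (1, "SPEC:FROZEN"),
    (2, "ER:POS"), (2, "PR:NEG"), (2, "HER2:NEG"), (2, "HIST:INFILTRATING"), (2, "SPEC:FROZEN"),
    (3, "ER:NEG"), (3, "PR:NEG"), (3, "HER2:NEG"), (3, "HIST:INFILTRATING"), (3, "SPEC:PARAFFIN"),
    (4, "ER:NEG"), (4, "PR:NEG"), (4, "HER2:NEG"), (4, "HIST:INFILTRATING"), (4, "SPEC:FROZEN"),
]
conn.executemany("INSERT INTO observation_fact VALUES (?, ?)", rows)

# Every criterion must be present for a patient to be in the cohort.
required = ["ER:NEG", "PR:NEG", "HER2:NEG", "HIST:INFILTRATING", "SPEC:FROZEN"]
placeholders = ",".join("?" for _ in required)
query = f"""
    SELECT COUNT(*) FROM (
        SELECT patient_num
        FROM observation_fact
        WHERE concept_cd IN ({placeholders})
        GROUP BY patient_num
        HAVING COUNT(DISTINCT concept_cd) = ?
    )
"""
cohort_size = conn.execute(query, (*required, len(required))).fetchone()[0]
print(cohort_size)  # 2 for the sample rows above (patients 1 and 4)
```

Patient 2 drops out because ER is positive, and patient 3 because only paraffin tissue is available.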
Demographics of patients that are ER-PR-Her2 negative, with infiltrating breast cancer, and have frozen tissue available for researchers
Honest brokers have tools to re-identify data.
Current Usage of Jefferson i2b2 RDM
o The i2b2 query interface is the primary access portal for identifying biospecimens available for research.
o RDM provides cohort size estimates for prospective studies
– grant applications
– design phase of clinical trials (estimate of recent patient population satisfying proposed eligibility rules)
o RDM provides comprehensive patient annotation for ongoing research projects.
– Next Generation Sequencing (NGS) studies to discover cancer biomarkers