The PedcBioPortal & DiseaseXpress
CBTTC Investigator Meeting
Pichai Raman on behalf of D3B
Monday, May 8, 2017
Downstream Analytics and Visualization
Diagram labels: Raw Data, SBG Cavatica, Processed Data
DiseaseXpress: Focus on RNA-Seq with a common standard of processing & normalization.
PedcBioPortal: Takes all data types with a focus on integrative analysis & summary.
Downstream Analytics and Visualization
PedcBioPortal (hypothesis-driven analysis):
Original cBioPortal developed at MSKCC for TCGA data and other large-scale cancer profiling efforts.
Lowers the barrier to accessing and visualizing complex genomic data for research.
cBioPortal development is now shared across five teams: DFCI, MSKCC, Princess Margaret, CHOP, and The Hyve.
PedcBioPortal: the CHOP implementation focuses on pediatric cancer data sets.
DiseaseXpress (data mining):
D3B-powered application with more than 20K samples, all processed the same way.
Leverages work performed by UCSD and the Broad and builds upon it by providing web API access and additional data sets.
Accessible by R, cURL, and DEXTER.
Use cases: Cohort analysis and summary, Machine Learning, Pan-cancer Analysis, Sample level views
Example questions:
What genes are most often mutated in my cohort? And what is the association of that mutation with survival?
What are some potential CAR therapy targets for Group 3 Medulloblastomas? Are these targets expressed in adult tumors as well?
What are the known actionable genetic lesions for this sample?
Are there gene expression signatures that are predictive of survival or relapse in brain cancers?
What are the hotspot mutations or domains most commonly mutated on this gene?
PedcBio - Key Functionality & Features
PedcBioPortal+ has a number of visualizations based on one of three entry points.
Study View: Display of frequent / recurrent mutations or lesions within a study. When creating virtual cohorts of molecular subtypes, users will be able to quickly identify “potential” drivers.
Patient View: Get an overview of all of a patient's genetic lesions, with connections to Path Reports, clinical trials, drugs, etc. Has COSMIC data as well as internal statistics to aid in determining if a mutation is likely causal.
Gene View: Look at gene data (mutation, expression, etc.) across or within a study. Correlate genes to other genes within a study or compare to normal tissue expression. Can be used to identify targets for immunotherapy.
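The Study View's ranking of frequent or recurrent mutations can be thought of as a per-gene count of distinct mutated samples. The following is a minimal Python sketch under that assumption; the function name, the (sample, gene) input format, and the toy data are hypothetical and are not taken from PedcBioPortal.

```python
from collections import defaultdict

def mutation_frequency(calls, n_samples):
    """Return [(gene, fraction of samples mutated)], most frequent first.

    `calls` is a hypothetical list of (sample_id, gene) mutation calls.
    """
    samples_by_gene = defaultdict(set)
    for sample_id, gene in calls:
        # A set counts each sample once per gene, even with several mutations.
        samples_by_gene[gene].add(sample_id)
    freqs = [(gene, len(ids) / n_samples) for gene, ids in samples_by_gene.items()]
    # Sort by descending frequency, then by gene name for a stable order.
    return sorted(freqs, key=lambda gf: (-gf[1], gf[0]))

# Toy cohort of 4 samples (illustrative data only).
calls = [
    ("S1", "TP53"), ("S1", "TP53"), ("S2", "TP53"),
    ("S2", "BRAF"), ("S3", "TP53"), ("S4", "BRAF"),
]
ranked = mutation_frequency(calls, n_samples=4)
```

Counting samples rather than raw calls keeps a single heavily mutated sample from inflating a gene's rank.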
PedcBio - Visualizations & Analytics - Gene
Single Gene View (Mutation Mapper, CNA vs mRNA):
Determine mutational hotspots on a gene by looking at the mutational landscape across the gene.
Compare copy number vs mRNA or Risk/Stage vs mRNA.
Multi-Gene Views (OncoPrints, Network, Survival Analysis):
Look at relationships between genes and identify other potentially interesting candidates by association analysis.
Determine if genes are mutually exclusive or recurrently associated. Find patients without canonical lesions.
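One common way to ask whether two genes are mutually exclusive or recurrently associated is a one-sided Fisher's exact test on a 2x2 table of altered samples. The slide does not say which test the portal applies, so the sketch below is an illustration of the idea rather than the portal's method; the function name and the counts are hypothetical.

```python
from math import comb

def fisher_tails(both, only_a, only_b, neither):
    """One-sided Fisher's exact p-values for a 2x2 table of sample counts.

    Returns (p_exclusive, p_cooccur):
      p_exclusive: probability of seeing `both` or fewer co-altered samples
      p_cooccur:   probability of seeing `both` or more co-altered samples
    """
    n = both + only_a + only_b + neither
    a_total = both + only_a          # samples with gene A altered
    b_total = both + only_b          # samples with gene B altered
    denom = comb(n, b_total)

    def prob(k):
        # Hypergeometric probability of exactly k co-altered samples.
        return comb(a_total, k) * comb(n - a_total, b_total - k) / denom

    lowest = max(0, a_total + b_total - n)
    highest = min(a_total, b_total)
    p_exclusive = sum(prob(k) for k in range(lowest, both + 1))
    p_cooccur = sum(prob(k) for k in range(both, highest + 1))
    return p_exclusive, p_cooccur

# Toy example: 10 samples, gene A altered in 5, gene B in the other 5.
p_excl, p_co = fisher_tails(both=0, only_a=5, only_b=5, neither=0)
```

A small `p_exclusive` suggests the two genes are altered in different patients more often than chance would predict; a small `p_cooccur` suggests the opposite.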
PedcBio - Visualizations & Analytics - Patient
Global view of tumor sample lesions & CNA
Relevant mutations
Copy Number Changes
PedcBio - Visualizations & Analytics - Study
Summary
Survival Analysis
Relevant mutations
Cohort Characteristics
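Survival panels of this kind are typically drawn from a Kaplan-Meier estimate. The sketch below shows that estimator in plain Python, assuming per-patient follow-up times and an observed/censored flag; the function name and the toy data are hypothetical, and the slide does not specify how the portal computes its curves.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with an observed event.

    events[i] is True if the event was observed at times[i], False if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after t.
        at_risk -= removed
    return curve

# Toy follow-up data in arbitrary time units (illustrative only).
curve = kaplan_meier(times=[2, 3, 3, 5, 8],
                     events=[True, True, False, True, False])
```

Censored patients lower the number at risk for later time points but do not themselves produce a drop in the curve.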
PedcBio - Current Development & Connectivity
Virtual Cohort Creation: Synthetic or virtual cohorts can serve to bridge together data from different studies to focus on subsets of patients or samples. Examples include:
Patients with rare cancers & diseases
Patients belonging to a specific population or demographic
Cases originating from a particular locale
Biorepository Integration: Users can move seamlessly between visualizations on the portal and samples in the biorepository applications.
Currently connecting to other applications on the cloud.
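Conceptually, a virtual cohort is a filter applied across the samples of several studies. The sketch below illustrates this in Python, assuming samples are available as simple records; the field names, study names, and sample IDs are hypothetical and do not reflect the portal's data model.

```python
def virtual_cohort(samples, **criteria):
    """Select samples from any study whose fields match all given criteria."""
    return [
        s for s in samples
        if all(s.get(field) == value for field, value in criteria.items())
    ]

# Toy records spanning two hypothetical studies.
samples = [
    {"sample_id": "A-1", "study": "study_A", "diagnosis": "Medulloblastoma", "subtype": "Group 3"},
    {"sample_id": "A-2", "study": "study_A", "diagnosis": "Medulloblastoma", "subtype": "SHH"},
    {"sample_id": "B-1", "study": "study_B", "diagnosis": "Medulloblastoma", "subtype": "Group 3"},
    {"sample_id": "B-2", "study": "study_B", "diagnosis": "Ependymoma", "subtype": None},
]

# The cohort bridges both studies because it filters on clinical fields only.
cohort = virtual_cohort(samples, diagnosis="Medulloblastoma", subtype="Group 3")
cohort_ids = [s["sample_id"] for s in cohort]
```

Because the study is just another field, the same filter can also restrict a cohort to one study or one locale when such a field exists.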
DiseaseXpress - Overview
DiseaseXpress - Boxplots by Study / Disease
Collapse a study to represent one box
Set the collapsed study or another disease as the reference box
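The collapse and reference options can be pictured as a grouping step followed by a comparison of medians. The sketch below is one way to express that in Python, assuming expression values arrive as (study, disease, value) records; the function names, labels, and numbers are hypothetical and are not DiseaseXpress code.

```python
from collections import defaultdict
from statistics import quantiles

def box_stats(records, collapse_studies=()):
    """Quartiles per box; studies in `collapse_studies` are pooled into one box."""
    groups = defaultdict(list)
    for study, disease, value in records:
        # A collapsed study ignores the disease label and forms a single box.
        label = study if study in collapse_studies else f"{study}: {disease}"
        groups[label].append(value)
    stats = {}
    for label, values in groups.items():
        q1, median, q3 = quantiles(values, n=4, method="inclusive")
        stats[label] = {"q1": q1, "median": median, "q3": q3, "n": len(values)}
    return stats

def median_shift(stats, reference):
    """Difference between each box's median and the reference box's median."""
    ref = stats[reference]["median"]
    return {label: s["median"] - ref for label, s in stats.items() if label != reference}

# Toy expression values (illustrative only; each box needs at least 2 values).
records = [
    ("StudyA", "Disease1", 1.0), ("StudyA", "Disease1", 2.0),
    ("StudyA", "Disease2", 3.0), ("StudyA", "Disease2", 4.0), ("StudyA", "Disease2", 5.0),
    ("StudyB", "Disease1", 2.0), ("StudyB", "Disease1", 6.0),
]
stats = box_stats(records, collapse_studies={"StudyA"})
shifts = median_shift(stats, reference="StudyA")
```

Here StudyA is collapsed into one box and used as the reference, so every other box is described relative to it.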
DiseaseXpress - Scatterplot of 2 Genes
Disease-specific correlation estimates and p-values
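A disease-specific correlation estimate can be computed by running the same two-gene correlation separately within each disease. The slide does not state which correlation or test DiseaseXpress uses, so the sketch below assumes Pearson correlation and substitutes an exact permutation p-value to stay within the Python standard library; the function names and the data are hypothetical.

```python
from itertools import permutations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def exact_permutation_p(x, y):
    """Two-sided exact permutation p-value; only practical for small n."""
    observed = abs(pearson(x, y))
    hits = 0
    total = 0
    for shuffled in permutations(y):
        total += 1
        # Small tolerance so floating-point ties count as at least as extreme.
        if abs(pearson(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return hits / total

# Toy expression of gene 1 and gene 2 per disease (illustrative only).
expression = {
    "Disease1": ([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]),
    "Disease2": ([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0]),
}
results = {
    disease: (pearson(g1, g2), exact_permutation_p(g1, g2))
    for disease, (g1, g2) in expression.items()
}
```

For realistic sample sizes the exact enumeration is infeasible, and a random-permutation or t-distribution based p-value would be the usual replacement.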
DiseaseXpress - Current Development & Connectivity
Cancer Pathway Analysis (single sample & cohort)
Addition of all available public RNA-Seq datasets (GEO, ArrayExpress)
Increasing speed of access
Enhancing annotation to include tags
Tie to variant, fusion, and copy number warehouses
Acknowledgements
Karthik Kalletla / Anthony Cros: Building DiseaseXpress datastore and engine
Komal Rathi: Developing DEXTER Shiny Application
DAMEU
Bo Zhang
Allison Heath
Yuankun Zhu
D3B Team Members
BACKUPS
Cavatica Mission
To develop a sustainable application ecosystem that supports many of the aspects associated with basic & translational research.
To be the premier portal for pediatric disease research and serve as a central hub to promote collaborative research between investigators to support sharing and creation of …
Pipelines
Hypotheses
Data
Visualizations
Algorithms
Diagram label: Discovery
Cavatica Eco-System
Layers: Services, Applications, Users
Bio Repository [HARVEST, …]: Physical Specimens
Raw Data Storage & Processing [SBG, …]: FASTQ / BAM Files
Processed Data Informatics [GitHub LFS, …]: Matrices, Data Frames
Data Visualizations & Analytics [cBioPortal & DiseaseXpress]: Graphs, tables, summaries
Other diagram labels: Data Tracker, User Management System
Users: Data Scientists, Investigators / Post-Doc, Data Engineers