Environmental Cancer Genomics: from High-Throughput Assays to Prevention
Stefano Monti
Art beCAUSE consortium meeting, 8/4/2015 – 9/1/2015
Overall Goal
Predict long-term in vivo carcinogenicity of chemical compounds from short-term in vitro genomic assays of exposure.
Carcinogenicity Screening
Underlying Hypotheses
Short-term to long-term: short-term exposure assays can predict long-term phenotype.
In-vitro to in-vivo: in-vitro assays can predict in-vivo response.
The Case for Carcinogenicity Screening: prevention vs. care
[Figure: Prevention vs. Care]
Cancer and Exposure to Chemicals
Cancer remains the 2nd leading cause of death in the US.
The role played by environmental (chemical and biological) pollutants in human cancer is under-studied.
Accurate prediction of the consequences of exposure leads to more effective prevention.
Only ~2% of the 80,000+ chemicals have been tested for safety.
Complex mixtures:
Number of compound pairs = 80,000^2 = 6,400,000,000 (6.4e+9)
Number of compound triplets = 80,000^3 = 512,000,000,000,000 (5.12e+14)
Number of compound quadruples = 80,000^4 = 4.1e+19
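The mixture counts on this slide can be checked with a few lines of Python. This is a minimal sketch; `mixture_counts` is a hypothetical helper, not something from the talk. The slide's figures are n^k, which counts ordered selections with repeats allowed and so acts as an upper bound. The number of unordered mixtures of k distinct chemicals is the binomial coefficient C(n, k), smaller by roughly a factor of k!.

```python
from math import comb


def mixture_counts(n_chemicals: int, size: int) -> dict:
    """Count candidate mixtures of a given size in two ways."""
    return {
        # n^k: the figure quoted on the slide (ordered picks, repeats allowed)
        "slide_upper_bound": n_chemicals ** size,
        # C(n, k): unordered mixtures of distinct chemicals
        "distinct_unordered": comb(n_chemicals, size),
    }


if __name__ == "__main__":
    for k in (2, 3, 4):
        counts = mixture_counts(80_000, k)
        print(
            k,
            f"{counts['slide_upper_bound']:.3g}",
            f"{counts['distinct_unordered']:.3g}",
        )
```

Either way of counting makes the same point: exhaustive testing of mixtures is out of reach, which is the motivation for predictive screening.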
High-profile Reports
PCP recommendations (2009)
“A precautionary, prevention-oriented approach should replace current reactionary approaches to environmental contaminants.”
“High-throughput screening technologies and related data interpretation models should be developed and used to evaluate multiple exposures simultaneously.”
IBCERCC recommendations (2013)
Prevention is the key to reducing the burden of breast cancer.
Enhanced investments with the objective of reducing or eliminating harmful environmental exposures.
Utilize high-throughput technologies capable of evaluating multiple risk factors simultaneously.
Carcinogenicity Testing
Epidemiology studies: observational, not randomized trials; incomplete/unstandardized exposure data; difficult to control for confounders.
Rodent assays: two-year rat bioassay (“gold standard”); time and resource consuming; imperfect mapping to human carcinogenicity.
In vitro assays: human cell lines; less time and resource consuming; allow a large sample size of chemical perturbations; translation to in vivo relevance?
Goals
Development of genomic predictive models of the carcinogenic potential of environmental chemical compounds.
Use of in-vitro models (cell lines/iPSC).
Use of high-throughput screens (HTS).
Development of computational models to predict carcinogenicity (and toxicity).
Use of chemical perturbations to study mechanisms of action.
Goals: Development of “Carcinogenicity Biomarker(s)”
[Diagram: Chemical -> Carcinogenicity Prediction Model -> Carcinogen / Non-carcinogen]
Understand why: pathways affected, driver alterations, biomarkers, …
Goals: Development of “Carcinogenicity Biomarker(s)”
(Machine) Learning from Known Examples
Analogy: predict Like / Don’t Like for a movie from its features (Director, Scriptwriter, Genre, Actor, Period, Foreign, Length, Color/BW, …), learning from known examples Movie 1, Movie 2, …, Movie n.
The Carcinogenicity Prediction Model plays the same role for carcinogenicity.
Goals: Development of “Carcinogenicity Biomarker(s)”
[Diagram: Chemical -> Carcinogenicity Prediction Model -> Carcinogen / Non-carcinogen]
[Matrix: genes (gene 1, gene 2, …, gene 7, …) by compounds, grouped into non-carcinogens and carcinogens]
To generate this ‘matrix’, 10,000s to 100,000s of experiments need to be performed.
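The “learning from known examples” idea can be sketched with a toy classifier over a genes-by-compounds matrix. A nearest-centroid rule is swapped in here purely for illustration: the slides do not say which learning algorithm is used. The expression values, function names, and labels below are all made up.

```python
from math import dist


def centroid(rows):
    """Column-wise mean of a list of equal-length expression profiles."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit_nearest_centroid(profiles, labels):
    """Learn one mean expression profile per class from labeled compounds."""
    classes = sorted(set(labels))
    return {
        c: centroid([p for p, lab in zip(profiles, labels) if lab == c])
        for c in classes
    }


def predict(model, profile):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(model, key=lambda c: dist(model[c], profile))


# Toy data (invented): rows are compounds, columns are genes 1..3.
train_profiles = [
    [2.1, 0.3, 1.8],  # known carcinogen
    [1.9, 0.2, 2.2],  # known carcinogen
    [0.2, 1.7, 0.1],  # known non-carcinogen
    [0.4, 2.0, 0.3],  # known non-carcinogen
]
train_labels = ["carcinogen", "carcinogen", "non-carcinogen", "non-carcinogen"]

model = fit_nearest_centroid(train_profiles, train_labels)
new_compound = [1.8, 0.4, 1.9]  # profile of a "new" compound
print(predict(model, new_compound))
```

A real model would need far more compounds than genes to be reliable, which is why the matrix requires so many experiments.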
Expression Profiling: measuring transcriptional “activity” genome-wide
DNA -> RNA -> mRNA (transcription / post-transcription) -> Proteins (translation)
[Color scale: low to high expression]
Expression Profiling to predict chemical carcinogenicity
[Figure: expression heatmap (low to high expression) of Compound 1 through Compound 16, shown before and after sorting into non-carcinogens and carcinogens]
Expression Profiling to predict chemical carcinogenicity
Transcriptional Signatures
[Figure: non-carcinogens vs. carcinogens, with an unlabeled compound “?”]
Stefano Monti − BUSM
Project Design Overview
Input: cell lines/iPSC treated with compounds (Compound 1, Compound 2, Compound 3, …, Compound N), labeled for genotoxicity and carcinogenicity, and profiled on L1000 / 3’DGE / SFL.
Prediction evaluation: classification accuracy, sensitivity/specificity, ROC curve, …
Biology of exposure: exposure MoA, pathways, “drivers”, exposure risk models.
Carcinogenicity prediction: “new” compound -> Carcinogen / Non-Carcinogen.
Project depends on a high-throughput, cost-effective gene expression assay.
Project Design Overview (annotated)
Short-term assay: expression profiles of cell lines/iPSC treated with compounds (L1000 / 3’DGE / SFL).
Long-term phenotypes: genotoxicity, carcinogenicity.
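The evaluation measures named in the Project Design Overview (classification accuracy, sensitivity/specificity, ROC curve) can be made concrete with a small sketch. The labels and scores below are invented, the 0.5 threshold is an arbitrary assumption, and the function names are hypothetical. The area under the ROC curve is computed here as the fraction of (carcinogen, non-carcinogen) pairs that the model ranks correctly.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN), with 1 = carcinogen and 0 = non-carcinogen."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return tp, fp, tn, fn


def evaluate(y_true, y_pred):
    """Accuracy, sensitivity (true positive rate), specificity (true negative rate)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking (ties count as 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (invented): true labels and model scores for 8 compounds.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [int(s >= 0.5) for s in scores]  # assumed decision threshold

print(evaluate(y_true, y_pred))
print(roc_auc(y_true, scores))
```

Sensitivity and specificity depend on the chosen threshold, while the ROC-based measure summarizes performance across all thresholds.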
Deliverables
The Carcinogenome DB (CGDB): genome-wide transcriptional profiles of 10,000s of compounds and mixtures on multiple cell types and at multiple doses/times.
Carcinogenicity Biomarker(s): predictive models of carcinogenicity from in-vitro profiling.
Signatures and Pathways of Carcinogenicity: an annotated compendium of biological pathways whose (aberrant) activation is associated with carcinogenicity/cancer induction.
Can Carcinogenicity be Predicted from Gene Expression Profiling (GEP)?
The DrugMatrix/TG-GATEs answer
Rat-based datasets from NIEHS & Japan (thanks to Scott Auerbach & Ray Tice, NTP)
Can Carcinogenicity be Predicted from GEP? The DrugMatrix/TG-GATEs answer
Same project design, with rats in place of cell lines: rats exposed to compounds and profiled on Affymetrix, instead of cell lines/iPSC treated with compounds and profiled on Luminex-1000.
Gusenleitner et al., PLoS ONE 2014
Long-Term Carcinogenicity can be Predicted from Short-term Expression Assays
[Figure panels: Dose-independent labeling; Dose-dependent labeling]
Gusenleitner et al., PLoS ONE 2014
Carcinogenicity Prediction can be Improved by Increasing the Number of Chemicals Used to Build the Model
[Figure axis: “Predictive Accuracy”]
Gusenleitner et al., PLoS ONE 2014
Genomic Modeling Helps Identify Pathways of Carcinogenicity
[Figure: heatmap of pathways by chemicals, with chemicals grouped into non-carcinogens and carcinogens]
Carcinogenicity can be Captured by in-vitro (human) models
Human lung cell lines exposed to carcinogens and non-carcinogens (Luminex-1000 data): 121 samples (39 C vs. 82 NC).
Statistically significant markers identified: 36 genes (FDR ≤ .05 | FC ≥ 2).
Rat carcinogenicity signature can be mapped to human data: significant similarity of rat and human signatures.
[Figure: Enrichment Score of the DrugMatrix signature genes along the L1000-based gene ranking (carc vs. non-carc); p < …]
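The marker criterion on this slide (FDR ≤ .05, FC ≥ 2) can be illustrated with a short sketch. Several assumptions are made here that the slide does not confirm: the FDR is taken to be a Benjamini-Hochberg adjustment, the two criteria are combined with “and”, and the fold-change cut is applied in both directions (up or down at least 2-fold). Gene names, p-values, and fold changes are invented.

```python
from math import log2


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


def select_markers(genes, p_values, fold_changes, fdr=0.05, min_fc=2.0):
    """Keep genes passing both the FDR and the (two-sided) fold-change cut.

    fold_changes are positive ratios, e.g. carcinogen mean / non-carcinogen mean.
    """
    q_values = benjamini_hochberg(p_values)
    return [
        g
        for g, q, fc in zip(genes, q_values, fold_changes)
        if q <= fdr and abs(log2(fc)) >= log2(min_fc)
    ]


# Toy input (invented values).
genes = ["g1", "g2", "g3", "g4", "g5"]
p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
fold_changes = [3.0, 0.4, 2.5, 1.2, 2.2]

print(select_markers(genes, p_values, fold_changes))
```

In this toy input, `g3` has a large fold change but misses the FDR cut after adjustment, which shows why the two criteria are applied together.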
In Progress: high-throughput data generation
Multi-platform (mirror) experiments
Multiple platform comparison: Luminex-1000 (L1000); 3’ Digital Gene Expression (3’DGE); Sparse Full Length/RNA-tag seq (SFL).
Experimental design: chemicals selection (and dose/concentration); tissue types (liver: HepG2; breast: MCF7, MCF10; lung: A549); challenging set-up (chemical procurement and dose determination).
Multiple funding sources: Evans ARC; BUSRP admin supplement; NIH/LINCS 1-year grant w/ Broad; Art beCAUSE.
Network-Based Analysis of Chemical Perturbations
Goals: discrimination of carcinogens/non-carcinogens; genes driving the response to chemical exposure; predictive model.
New goals: compare the control state to multiple perturbation states; capture aggregate differences that are difficult to see with standard analysis.
Control vs. chemically perturbed, two views: differential expression (standard) and differential connectivity.
2015 ACS Meeting
Network Analysis Overview
[Figure: wild-type network annotated into modules (Module 1, Module 2, …, Module p); matrix of modules by compounds (Compound 1, Compound 2, …, Compound n) showing loss/gain of connectivity]
2015 ACS Meeting
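The notion of loss or gain of connectivity for a gene module can be sketched as follows. The slides do not define the connectivity measure, so this example assumes one: the mean absolute Pearson correlation between all pairs of genes in a module, compared between control and chemically perturbed samples. All names and expression values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def module_connectivity(expression, module):
    """Mean absolute pairwise correlation among the genes of a module."""
    pairs = list(combinations(module, 2))
    total = sum(abs(pearson(expression[g], expression[h])) for g, h in pairs)
    return total / len(pairs)


def differential_connectivity(control, perturbed, module):
    """Positive = gain of connectivity under perturbation, negative = loss."""
    return module_connectivity(perturbed, module) - module_connectivity(control, module)


# Toy data (invented): gene -> expression across 4 samples.
control = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [1, 3, 5, 7]}
perturbed = {"a": [1, 2, 3, 4], "b": [4, 1, 3, 2], "c": [2, 4, 1, 3]}

delta = differential_connectivity(control, perturbed, ["a", "b", "c"])
print("loss" if delta < 0 else "gain", round(delta, 3))
```

Repeating this for every module and every compound would fill a modules-by-compounds matrix of losses and gains. Note that in the toy data gene `a` has identical expression in both conditions, so it would show no differential expression even though its module connectivity changes.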
Results Summary
Network structure captures grouping of compounds with similar functions and genotoxicity/carcinogenicity.
Differentially connected gene modules are enriched for pathways related to the chemicals’ action:
Statins: cholesterol biosynthesis, lipid metabolism, steroid biosynthesis, ...
Chemotherapeutics: cell cycle, DNA replication, DNA damage response (P53), …
The Team
Broad Institute: Aravind Subramanian, Xiaodong Lu, cMap/LINCS team
NTP/NIEHS: Scott Auerbach, Ray Tice
BU CBM/Bioinformatics/SPH: David Sherr, Amy Li, Daniel Gusenleitner, Francesca Mulas
The End