Big Data Network Genomics Network Inference and Perturbation to Study Chemical-Mediated Cancer Induction Stefano Monti Section of Computational.


1 Big Data Network Genomics: Network Inference and Perturbation to Study Chemical-Mediated Cancer Induction
Stefano Monti (smonti@bu.edu)
Section of Computational BioMedicine, Boston University School of Medicine; Biostatistics, BUSPH; Bioinformatics Program, BU; Graduate Program in Genetics & Genomics, BU; Broad Institute of MIT & Harvard

2 Abstract
Development and application of novel methods of network inference and differential analysis from multiple genomic data types toward the elucidation of a chemical's mechanism(s) of cancer induction

3 Abstract (generalization)
Development and application of novel methods of network inference and differential analysis from high-dimensional data types toward the elucidation of functionally relevant modules
Domain-specific elements: high-dimensional data types; functionally relevant modules

4 The Motivating Problem

5 Goals
Development of “Carcinogenicity Biomarker(s)”
- Carcinogenicity Prediction Model: Chemical → Carcinogenicity (Carcinogen / Non-carcinogen)
- Understand Why: pathways affected, driver alterations, biomarkers, …
Manuscript under review

6 Goals
Development of “Carcinogenicity Biomarker(s)”
- Carcinogenicity Prediction Model: Chemical → Carcinogenicity (Carcinogen / Non-carcinogen)
- Expression matrix: rows are genes (gene 1, gene 2, …, gene 7, …); columns are grouped into Non-carcinogens and Carcinogens
- To generate this ‘matrix’, 100,000s of experiments need to be performed
- 1,000s of controls generated

7 In Progress: high-throughput data generation
384-well plate; 100,000s profiles
- Phase I: 24 plates (liver and lung); ~200 compounds; ~10,000 profiles
Future plans …
- Phase II: more tissue types (breast, prostate, etc.); more compounds (~1,500); mixtures; 100,000s profiles
- Phase III: iPSC-derived cells & 3D cultures; “personalized exposure” models

8 Generalization of the Motivating Problem
- Comparison of a control state to multiple perturbation states
- Standard approaches of gene-based differential analysis might miss salient (aggregate) differences
- High-dimensional data (1000s of ‘features’), usually representable as 2D [10K x 1K] matrices
- Large sample size for the ‘control state’: ≥1000 observations
- Small sample size for each of the ‘perturbation states’: ~10-100 observations/perturbation
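
The point that gene-based differential analysis can miss aggregate differences can be illustrated with a small sketch. The data below are invented for illustration and are not from the talk: a perturbation leaves each gene's mean unchanged but flips the coupling between two genes, so a per-gene mean comparison sees nothing while the co-expression (the network edge) changes sign. The names `gene_a`, `control_b`, and `perturbed_b` are hypothetical.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Toy expression values (assumed, for illustration only).
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
control_b = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]   # gene B tracks gene A in controls
perturbed_b = list(reversed(control_b))       # same values, coupling reversed

# Gene-based view: the mean of gene B does not move.
mean_shift_b = mean(perturbed_b) - mean(control_b)

# Network view: the A-B edge goes from strongly positive to strongly negative.
r_control = pearson(gene_a, control_b)
r_perturbed = pearson(gene_a, perturbed_b)

print(f"mean shift of gene B: {mean_shift_b:.3f}")
print(f"A-B correlation, control: {r_control:.3f}, perturbed: {r_perturbed:.3f}")
```

A per-gene test on means would report no change for gene B here, which is the kind of difference a network-level comparison is meant to catch.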

9 Generalization of the Motivating Problem: an example
The Connectivity Map/LINCS project
- Expression profiling of chemical/genetic perturbations
- >10,000 compounds (most FDA-approved drugs)
- ~5,000 genetic perturbations (RNAi, CRISPR)
- 18 cell types, multiple doses, time-points
- >1,000,000 profiles
- Main goal: drug discovery

10 Approach Overview
Diagram labels: Wild-Type Network; Annotation; Module 1, Module 2, …, Module p; Compound 1, Compound 2, …, Compound n; loss/gain of connectivity

11 Approach Overview
Diagram labels: Wild-Type Network; Network construction; Module Identification; Annotation; Module/Network Comparison; Module 1, Module 2, …, Module p; Compound 1, Compound 2, …, Compound n; loss/gain of connectivity

12 Approach Details: networks’ construction
- Correlation networks: clustering vs. topology-based ‘module’ identification
- Gaussian models: inverse covariance matrix → partial correlations
- Correlation networks + “scale-free transformations”: mostly for comparison w/ existing methods
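
The step from an inverse covariance matrix to partial correlations can be sketched as follows. This is a minimal illustration, not the talk's implementation: it uses only the standard library, and the 3x3 covariance matrix is an assumed toy model (a chain x → y → z with unit-variance noise). With precision matrix P, the partial correlation between variables i and j given all others is -P[i][j] / sqrt(P[i][i] * P[j][j]). The helper names `invert` and `partial_correlations` are hypothetical.

```python
from math import sqrt


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Build the augmented matrix [A | I].
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(covariance):
    """Partial correlation matrix derived from the inverse covariance (precision)."""
    precision = invert(covariance)
    n = len(precision)
    return [
        [
            1.0 if i == j
            else -precision[i][j] / sqrt(precision[i][i] * precision[j][j])
            for j in range(n)
        ]
        for i in range(n)
    ]


# Assumed toy covariance for a chain x -> y -> z (y = x + noise, z = y + noise).
covariance = [
    [1.0, 1.0, 1.0],
    [1.0, 2.0, 2.0],
    [1.0, 2.0, 3.0],
]

marginal_xz = covariance[0][2] / sqrt(covariance[0][0] * covariance[2][2])
partial = partial_correlations(covariance)

print(f"marginal correlation x-z: {marginal_xz:.3f}")
print(f"partial correlation x-z given y: {partial[0][2]:.3f}")
```

In this toy model x and z are correlated marginally, but their partial correlation vanishes once y is accounted for, so a Gaussian model keeps only the direct edges x-y and y-z. With ~10K genes and far fewer samples per perturbation, the sample covariance is not invertible, so a regularized estimator would be needed in practice; the slides do not say which one is used.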

13 Approach Details: networks’ comparison
- Covariance matrices comparison
- Probabilistic model selection: Bayes factor
- Network topology: Diffusion State Distance (M. Crovella) and related
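
For the topology-based comparison, here is a rough sketch of Diffusion State Distance (DSD) in its k-step form: He(u, v) is the expected number of visits to v by a k-step random walk starting at u (counting step 0), and DSD(u, v) is the L1 distance between the rows He(u) and He(v). The slides do not say which DSD variant or parameters are used, so the graph (two triangles joined by one edge), `steps=5`, and all function names are assumptions made for illustration.

```python
def transition_matrix(adjacency):
    """Row-normalize an adjacency matrix into random-walk transition probabilities.

    Assumes every node has at least one neighbor.
    """
    return [[a / sum(row) for a in row] for row in adjacency]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def diffusion_state_distance(adjacency, steps):
    """Pairwise k-step DSD: L1 distance between rows of He = I + P + ... + P^steps."""
    n = len(adjacency)
    p = transition_matrix(adjacency)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    he = [row[:] for row in power]  # step 0: the walk starts at its source node
    for _ in range(steps):
        power = matmul(power, p)
        he = [[h + q for h, q in zip(hr, qr)] for hr, qr in zip(he, power)]
    return [
        [sum(abs(he[u][k] - he[v][k]) for k in range(n)) for v in range(n)]
        for u in range(n)
    ]


# Assumed toy network: triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adjacency = [[0.0] * 6 for _ in range(6)]
for u, v in edges:
    adjacency[u][v] = adjacency[v][u] = 1.0

dsd = diffusion_state_distance(adjacency, steps=5)

print(f"DSD within a triangle (0, 1): {dsd[0][1]:.4f}")
print(f"DSD across triangles (0, 5): {dsd[0][5]:.4f}")
```

Nodes in the same triangle end up closer than nodes in different triangles, which is the property that makes a DSD-like metric usable for module identification and comparison.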

14 The Data
- Gene expression profiles: networks’ inference
- Protein-protein interaction: networks’ priors
- “Cell painting” profiles: networks’ annotation
100K samples, 10K features (genes)

15

16 Deliverables
- Computational toolbox
  - Network inference and visualization
  - Module (i.e., sub-network) identification/comparison
  - Network/module-based clustering/annotation
- Analysis and cataloguing of chemical perturbations
  - Chemicals’ putative mechanisms of action
  - Interpretable carcinogenicity predictor(s)
- A sandbox for researchers to develop and test new methods
  - Richly annotated multi-type data
  - Domain expertise to evaluate relevance/usefulness
- Preliminary results for pursuit of further funding

17 The Team
- Stefano Monti, Ph.D. (Assoc. Professor): Computational Biology, Cancer Genomics, Machine Learning (Bayesian Networks)
- Paola Sebastiani, Ph.D. (Professor): Biostatistics, Genetics/Genomics, Bayesian Graphical Models
- Mark Crovella, Ph.D. (Professor): Computer Science, Network Analysis
- Simon Kasif (Professor): Computational Biology, Systems Biology, Machine Learning
- Francesca Mulas, Ph.D. (Post-doctoral Fellow): Computational Biology/Bioinformatics, Computer Science
- Daniel Gusenleitner, M.S. (Ph.D. student): Bioinformatics, Computer Science, Machine Learning

18 “Background” Team
- BU-SRP: David Ozonoff, Basra Komal, Heather Henry (NIEHS)
- Evans Foundation - ARC: Katya Ravid, Robin MacDonald
- NTP/NIEHS: Scott Auerbach, Ray Tice
- Broad Institute: Aravind Subramanian, Xiaodong Lu, Todd Golub, cMAP team
- BU CBM/Bioinformatics/SPH: David Sherr (co-PI), Daniel Gusenleitner, Jessalyn Ubellacker, Tisha Meila, Harold Gomez, Yuxiang Tan, Liye Zhang, Elizabeth Moses, Teresa Wang, Marc Lenburg, Avi Spira

19 The End

