Published by Madison Cummings. Modified over 9 years ago.
1
Big Data Network Genomics
Network Inference and Perturbation to Study Chemical-Mediated Cancer Induction
Stefano Monti, smonti@bu.edu
Section of Computational BioMedicine, Boston University School of Medicine
Biostatistics, BUSPH
Bioinformatics Program, BU
Graduate Program in Genetics & Genomics, BU
Broad Institute of MIT & Harvard
2
Abstract
Development and application of novel methods of network inference and differential analysis from multiple genomic data types toward the elucidation of a chemical's mechanism(s) of cancer induction.
3
Abstract (generalization)
Development and application of novel methods of network inference and differential analysis from high-dimensional data types toward the elucidation of functionally relevant modules.
Domain specific: high-dimensional data types; functionally relevant modules
4
The Motivating Problem
5
Goals
Development of “Carcinogenicity Biomarker(s)”
[Figure: Chemical, Carcinogenicity Prediction Model, Carcinogenicity (Carcinogen / Non-carcinogen)]
Understand why: pathways affected, driver alterations, biomarkers, …
Manuscript under review
6
Goals
Development of “Carcinogenicity Biomarker(s)”
[Figure: Chemical, Carcinogenicity Prediction Model, Carcinogenicity (Carcinogen / Non-carcinogen); expression matrix of genes (gene 1, gene 2, …, gene 7, …) for Non-carcinogens and Carcinogens]
To generate this ‘matrix’, 100,000s of experiments need to be performed
1,000s of controls generated
7
In Progress
High-throughput data generation: 384-well plate, 100,000s of profiles
Phase I: 24 plates (liver and lung), ~200 compounds, ~10,000 profiles
Future plans …
Phase II: more tissue types (breast, prostate, etc.), more compounds (~1,500), mixtures, 100,000s of profiles
Phase III: iPSC-derived cells & 3D cultures, “personalized exposure” models
8
Generalization of the Motivating Problem
Comparison of a control state to multiple perturbation states
Standard approaches of gene-based differential analysis might miss salient (aggregate) differences
High-dimensional data (1000s of ‘features’)
  Usually representable as 2D [10K x 1K] matrices
Large sample size for the ‘control state’
  ≥1000 observations
Small sample size for each of the ‘perturbation states’
  ~10-100 observations/perturbation
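To make the point about missed aggregate differences concrete, here is a minimal sketch with made-up numbers (the gene names and values are hypothetical, not data from the project). Two genes keep exactly the same mean expression under perturbation, so a gene-by-gene comparison of means sees nothing, yet their co-expression flips sign.

```python
import math

def mean(values):
    return sum(values) / len(values)

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy profiles: four samples per state, two genes.
control = {"gene_a": [1.0, 2.0, 3.0, 4.0], "gene_b": [1.0, 2.0, 3.0, 4.0]}
perturbed = {"gene_a": [1.0, 2.0, 3.0, 4.0], "gene_b": [4.0, 3.0, 2.0, 1.0]}

# Gene-based view: difference in mean expression per gene (zero for both).
mean_shift = {g: mean(perturbed[g]) - mean(control[g]) for g in control}

# Network-based view: the gene_a/gene_b edge changes from +1 to -1.
corr_control = pearson(control["gene_a"], control["gene_b"])
corr_perturbed = pearson(perturbed["gene_a"], perturbed["gene_b"])

print(mean_shift, corr_control, corr_perturbed)
```

In real data the change would be noisy and would need a statistical test, and with only ~10-100 observations per perturbation the correlation estimates themselves are uncertain.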
9
Generalization of the Motivating Problem: an example
The Connectivity Map/LINCS project
Expression profiling of chemical/genetic perturbations
>10,000 compounds (most FDA-approved drugs)
~5,000 genetic perturbations (RNAi, CRISPR)
18 cell types, multiple doses, time points
>1,000,000 profiles
Main goal: drug discovery
10
Approach Overview
[Figure: Wild-Type Network with annotation; modules (Module 1, Module 2, …, Module p) against compounds (Compound 1, Compound 2, …, Compound n), showing loss/gain of connectivity]
11
Approach Overview
[Figure: Wild-Type Network with annotation; modules (Module 1, Module 2, …, Module p) against compounds (Compound 1, Compound 2, …, Compound n), showing loss/gain of connectivity]
Network construction
Module identification
Annotation
Module/network comparison
12
Approach Details: networks’ construction
Correlation networks
Clustering vs. topology-based ‘module’ identification
Gaussian models
  Inverse covariance matrix, partial correlations
Correlation networks + “scale-free transformations”
  Mostly for comparison w/ existing methods
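The link between the inverse covariance matrix and partial correlations can be sketched as follows. Under a Gaussian model with precision matrix Ω (the inverse covariance), the partial correlation between variables i and j given all the others is -Ω_ij / sqrt(Ω_ii Ω_jj). The code below is an illustration on a hand-made 3x3 covariance matrix, not the authors' implementation. The helper names are hypothetical, and a real analysis with ~10K genes would need a regularized estimator rather than a plain inverse.

```python
import math

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the corresponding row of the identity matrix.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def partial_correlations(cov):
    """Partial correlation matrix derived from the precision (inverse covariance) matrix."""
    prec = invert(cov)
    n = len(cov)
    return [[1.0 if i == j else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])
             for j in range(n)] for i in range(n)]

# Toy covariance for a chain g1 - g2 - g3: g1 and g3 are marginally
# correlated (0.25) only through g2.
cov = [[1.0, 0.5, 0.25],
       [0.5, 1.0, 0.5],
       [0.25, 0.5, 1.0]]
pcor = partial_correlations(cov)

print(pcor[0][1])  # direct edge g1-g2 stays clearly non-zero
print(pcor[0][2])  # approximately 0: no direct g1-g3 edge once g2 is accounted for
```

This is why a partial-correlation network tends to be sparser than the plain correlation network: indirect associations drop out.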
13
Approach Details: networks’ comparison
Covariance matrices comparison
  Probabilistic model selection
  Bayes factor
Network topology
  Diffusion State Distance (M. Crovella) and related
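As a rough illustration of the topology-based option, the sketch below computes a Diffusion State Distance on a tiny unweighted graph. It assumes the truncated random-walk formulation: for each node, take the vector of expected visit counts to every node over a walk of k steps (counting the start), then take the L1 distance between two nodes' vectors. The function name, the default k, and the 4-node path graph are assumptions made for the example. The slide does not say which variant the project uses.

```python
def diffusion_state_distance(adj, k=5):
    """Pairwise DSD from an adjacency matrix, using k-step random walks."""
    n = len(adj)
    degrees = [sum(row) for row in adj]
    if any(d == 0 for d in degrees):
        raise ValueError("every node needs at least one edge")
    # Row-stochastic transition matrix of the simple random walk.
    trans = [[adj[i][j] / degrees[i] for j in range(n)] for i in range(n)]
    # walk[i][j]: probability of being at j after t steps when starting at i.
    walk = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    # visits[i][j]: expected number of visits to j within k steps from i.
    visits = [row[:] for row in walk]
    for _ in range(k):
        walk = [[sum(walk[i][m] * trans[m][j] for m in range(n))
                 for j in range(n)] for i in range(n)]
        visits = [[visits[i][j] + walk[i][j] for j in range(n)] for i in range(n)]
    # L1 distance between the visit-count vectors of each pair of nodes.
    return [[sum(abs(a - b) for a, b in zip(visits[u], visits[v]))
             for v in range(n)] for u in range(n)]

# Hypothetical path graph 0 - 1 - 2 - 3.
path_graph = [[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]]
dsd = diffusion_state_distance(path_graph, k=2)

print(dsd[0][1], dsd[0][3])  # 2.0 4.0
```

Unlike shortest-path distance, this measure depends on the whole neighborhood a walk explores, which is the property that makes it attractive for comparing network topology.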
14
The Data
Gene expression profiles: networks’ inference
Protein-protein interaction: networks’ priors
“Cell painting” profiles: networks’ annotation
100K samples, 10K features (genes)
16
Deliverables
Computational toolbox
  Network inference and visualization
  Module (i.e., sub-network) identification/comparison
  Network/module-based clustering/annotation
Analysis and cataloguing of chemical perturbations
  Chemicals’ putative mechanisms of action
  Interpretable carcinogenicity predictor(s)
A sandbox for researchers to develop and test new methods
  Richly annotated multi-type data
  Domain expertise to evaluate relevance/usefulness
Preliminary results for pursuit of further funding
17
The Team
Stefano Monti, Ph.D. (Assoc. Professor): Computational Biology, Cancer Genomics, Machine Learning (Bayesian Networks)
Paola Sebastiani, Ph.D. (Professor): Biostatistics, Genetics/Genomics, Bayesian Graphical Models
Mark Crovella, Ph.D. (Professor): Computer Science, Network Analysis
Simon Kasif (Professor): Computational Biology, Systems Biology, Machine Learning
Francesca Mulas, Ph.D. (Post-doctoral Fellow): Computational Biology/Bioinformatics, Computer Science
Daniel Gusenleitner, M.S. (Ph.D. student): Bioinformatics, Computer Science, Machine Learning
18
“Background” Team
BU-SRP: David Ozonoff, Basra Komal, Heather Henry (NIEHS)
Evans Foundation - ARC: Katya Ravid, Robin MacDonald
NTP/NIEHS: Scott Auerbach, Ray Tice
Broad Institute: Aravind Subramanian, Xiaodong Lu, Todd Golub, cMAP team
BU CBM/Bioinformatics/SPH: David Sherr (co-PI), Daniel Gusenleitner, Jessalyn Ubellacker, Tisha Meila, Harold Gomez, Yuxiang Tan, Liye Zhang, Elizabeth Moses, Teresa Wang, Marc Lenburg, Avi Spira
19
The End