Probabilistic Models that uncover the hidden Information Flow in Signalling Networks Achim Tresch.

Presentation transcript:


-2- Which model? A model that explains the data merely finds associations, e.g. in epidemiology (predict colon cancer risk from SNPs). A model that explains the mechanism finds explanations, e.g. in physics or systems biology (predict the signal flow through a cascade of transcription factors).

-3- Which model? Our choice: graphical models, in which nodes correspond to physical entities and arrows correspond to interactions. 1st idea: there is a need for interventional data, so we use two different types of nodes: observable components and perturbed components (signals).

-4- How do marionettes walk?

-5- How do marionettes walk? [Figure: "This is what we observe" vs. "This is the true model?"] Both models explain the observations perfectly. What makes the right model (biologically) more plausible?

-6- How do marionettes walk? [Figure: "This is what we observe" vs. "This is the true model?"] Both models explain the observations perfectly. What makes a model (biologically) more plausible? 2nd idea: signal transmission is expensive! Find a consistent model with the most parsimonious effects graph. [Figure labels: signals, connected by the signal graph Γ; observables, attached by the effects graph Θ]

-7- Nested Effects Models. Signal graph: adjacency matrix Γ (with 1s on the diagonal). Effects graph: adjacency matrix Θ. [Figure: signals, observables and the predicted effects F^t] Parsimony assumption: each observable is linked to exactly one signal. Definition [Markowetz, Bioinformatics 2005]: a Nested Effects Model (NEM) is a model F for which $F = \Gamma\Theta$, i.e. $F_{s,a} = \sum_{s'} \Gamma_{s,s'}\Theta_{s',a}$, which is 0 or 1 because each column of Θ contains exactly one 1.
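A minimal NumPy sketch of the definition F = ΓΘ, assuming Γ and Θ are stored as 0/1 arrays with signals as rows and observables as columns of Θ. The function name `predicted_effects` and the two-signal example are hypothetical illustrations, not code from the NEM or Nessy packages.

```python
import numpy as np

def predicted_effects(gamma, theta):
    """F = Gamma @ Theta for a nested effects model (sketch).

    gamma: (n, n) 0/1 signal-graph adjacency matrix with ones on the diagonal.
    theta: (n, m) 0/1 effects-graph matrix; each column (observable) has
           exactly one 1, i.e. each observable is attached to one signal.
    Returns F with F[s, a] = 1 if perturbing signal s is predicted to
    affect observable a.
    """
    gamma = np.asarray(gamma, dtype=int)
    theta = np.asarray(theta, dtype=int)
    assert np.all(np.diag(gamma) == 1), "signal graph needs 1s on the diagonal"
    assert np.all(theta.sum(axis=0) == 1), "each observable needs exactly one signal"
    # Each column of theta has a single 1, so the product is already 0/1.
    return gamma @ theta

# Hypothetical example: signal s0 -> s1, three observables
gamma = [[1, 1],
         [0, 1]]
theta = [[1, 0, 0],
         [0, 1, 1]]
F = predicted_effects(gamma, theta)
print(F.tolist())  # [[1, 1, 1], [0, 1, 1]]
```

Perturbing s0 is predicted to affect all three observables, because s0 reaches s1 through Γ; perturbing s1 affects only the two observables attached to it.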

-8- Nested Effects Models: why "nested"? If the signal graph is transitively closed, then the observed effects are nested in the sense that a → b implies effects(a) ⊇ effects(b). The present formulation of a NEM drops the transitivity requirement. [Figure: signals, observables and the predicted effects F^t]
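The nesting property can be checked directly. The sketch below, which assumes the same 0/1 matrix convention as above, computes a transitive closure with Warshall's algorithm and tests whether every edge a → b satisfies effects(a) ⊇ effects(b). The helper names and the three-signal chain are hypothetical.

```python
import numpy as np

def transitive_closure(gamma):
    """Reflexive-transitive closure of a 0/1 adjacency matrix (Warshall's algorithm)."""
    closure = np.asarray(gamma, dtype=bool).copy()
    np.fill_diagonal(closure, True)
    for k in range(closure.shape[0]):
        # s reaches t if s reaches k and k reaches t
        closure |= np.logical_and.outer(closure[:, k], closure[k, :])
    return closure.astype(int)

def effects_are_nested(gamma, theta):
    """True if every edge a -> b of gamma implies effects(a) ⊇ effects(b)."""
    gamma = np.asarray(gamma)
    F = gamma @ np.asarray(theta)
    n = gamma.shape[0]
    for a in range(n):
        for b in range(n):
            # effects(b) is not contained in effects(a) if some observable
            # is affected by b but not by a
            if a != b and gamma[a, b] and np.any(F[b] > F[a]):
                return False
    return True

# Hypothetical chain s0 -> s1 -> s2, one observable per signal
chain = np.array([[1, 1, 0],
                  [0, 1, 1],
                  [0, 0, 1]])
theta = np.eye(3, dtype=int)
print(effects_are_nested(chain, theta))                      # False
print(effects_are_nested(transitive_closure(chain), theta))  # True
```

Without the closure, s0 does not reach the observable of s2, so the edge s0 → s1 violates nesting; dropping the transitivity requirement, as this formulation does, allows such non-nested models.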

-9- Nested Effects Models: the final ingredient is a quantification of the measured effect strength. $R_{s,a}$ denotes the effect of signal s on observable a, with $R_{s,a} > 0$ if the data favour an effect of s on a. Predicted effects: $F^t$; measured effects: $R^t$.
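One common way to obtain such a score is a log-likelihood ratio between an "effect" and a "no effect" model. The sketch assumes a hypothetical Gaussian model for a single log fold-change x (mean 0 without an effect, mean `effect_mean` with one). It is an illustration of the sign convention, not the estimator used in the talk.

```python
import math

def normal_logpdf(x, mean, sd):
    """Log density of N(mean, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (x - mean) ** 2 / (2 * sd ** 2)

def effect_log_odds(x, effect_mean=2.0, sd=1.0):
    """R_{s,a} = log P(x | effect) - log P(x | no effect) for one measurement x.

    Hypothetical model: x is a log fold-change that is N(0, sd^2) without an
    effect and N(effect_mean, sd^2) with an effect.
    """
    return normal_logpdf(x, effect_mean, sd) - normal_logpdf(x, 0.0, sd)

# With effect_mean=2 and sd=1 this simplifies to R = 2x - 2.
for x in (0.0, 1.0, 3.0):
    print(x, effect_log_odds(x))
```

R is positive exactly when the measurement is closer to the "effect" mean, which matches the convention that $R_{s,a} > 0$ if the data favour an effect.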

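-10- Nested Effects Models. Assuming independent data, and taking $R_{s,a} = \log P(D_{s,a} \mid \text{effect}) - \log P(D_{s,a} \mid \text{no effect})$, it follows that $\log P(D \mid \Gamma, \Theta) = \sum_{s,a} F_{s,a} R_{s,a} + \text{const} = \operatorname{tr}(\Gamma\Theta R^t) + \text{const}$. Note: missing data are handled easily: set $R_{s,a} = 0$.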
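The score $\operatorname{tr}(\Gamma\Theta R^t)$ is a one-liner in NumPy. This sketch assumes R is a signals × observables array of log-likelihood ratios and that missing measurements are encoded as NaN, which is replaced by 0 as the slide suggests. The function name and numbers are hypothetical.

```python
import numpy as np

def nem_log_likelihood(gamma, theta, R):
    """log P(D | Gamma, Theta) up to an additive constant, i.e. tr(Gamma Theta R^t).

    R[s, a] is the log-likelihood ratio for an effect of signal s on
    observable a. Missing measurements (NaN) are set to 0, so they
    contribute nothing to the score.
    """
    F = np.asarray(gamma) @ np.asarray(theta)
    R = np.nan_to_num(np.asarray(R, dtype=float), nan=0.0)
    return float(np.trace(F @ R.T))  # same as (F * R).sum()

gamma = [[1, 1],
         [0, 1]]
theta = [[1, 0, 0],
         [0, 1, 1]]
R = [[2.0, 1.5, -0.5],
     [-1.0, 3.0, np.nan]]   # last measurement missing
print(nem_log_likelihood(gamma, theta, R))  # 6.0
```

Only entries with a predicted effect ($F_{s,a} = 1$) enter the sum, so a zeroed missing entry neither rewards nor penalises any model.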

-11- NEM Estimation. There are two ways of finding a high-scoring NEM: maximum likelihood, $(\hat\Gamma, \hat\Theta) = \arg\max_{\Gamma,\Theta} P(D \mid \Gamma, \Theta)$, or the Bayesian posterior mode, $(\hat\Gamma, \hat\Theta) = \arg\max_{\Gamma,\Theta} P(D \mid \Gamma, \Theta)\, P(\Gamma, \Theta)$. For n ≤ 5 signals, an exhaustive search of the parameter space is possible. For larger n, apply standard optimization strategies (gradient ascent, simulated annealing) or heuristics tailored to NEMs: module networks [Fröhlich et al., BMC Bioinformatics 2007], triplet search [Markowetz et al., Bioinformatics 2007]. Theorem (Tresch, SAGeMB 2008): for ideal data, the estimate is unique up to reversals (corollary: it is unique if Γ is a DAG).
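A sketch of the exhaustive maximum-likelihood search for small n, assuming the score $\operatorname{tr}(\Gamma\Theta R^t)$. For a fixed Γ the score splits over observables as $\sum_a (\Gamma^t R)_{s',a}$ at the attached signal s', so the best Θ is a column-wise argmax. This maximises over Θ; some NEM formulations instead average over Θ. All names and the test data are hypothetical.

```python
import itertools
import numpy as np

def best_theta(gamma, R):
    """Optimal effects graph for a fixed signal graph.

    The score sum_{s,a} (Gamma Theta)[s,a] R[s,a] equals
    sum_a (Gamma^t R)[s', a] at the signal s' that a is attached to,
    so each observable is attached to the column-wise argmax.
    """
    M = gamma.T @ R
    n, m = R.shape
    theta = np.zeros((n, m), dtype=int)
    theta[M.argmax(axis=0), np.arange(m)] = 1
    return theta, float(M.max(axis=0).sum())

def exhaustive_nem_search(R):
    """Score all 2^(n(n-1)) signal graphs (diagonal fixed to 1); keep the best."""
    R = np.asarray(R, dtype=float)
    n = R.shape[0]
    off_diag = [(i, j) for i in range(n) for j in range(n) if i != j]
    best_score, best_gamma, best_th = -np.inf, None, None
    for bits in itertools.product((0, 1), repeat=len(off_diag)):
        gamma = np.eye(n, dtype=int)
        for (i, j), bit in zip(off_diag, bits):
            gamma[i, j] = bit
        theta, score = best_theta(gamma, R)
        if score > best_score:
            best_score, best_gamma, best_th = score, gamma, theta
    return best_score, best_gamma, best_th

# Hypothetical ideal data: transitively closed chain s0 -> s1 -> s2,
# two observables per signal, R = +1 where an effect is predicted, -1 otherwise.
gamma_true = np.array([[1, 1, 1], [0, 1, 1], [0, 0, 1]])
theta_true = np.kron(np.eye(3, dtype=int), np.ones((1, 2), dtype=int))
F_true = gamma_true @ theta_true
R_ideal = 2.0 * F_true - 1.0

score, gamma_hat, theta_hat = exhaustive_nem_search(R_ideal)
print(score)                                          # 12.0
print(np.array_equal(gamma_hat @ theta_hat, F_true))  # True
```

With ideal ±1 data the maximum possible score is the number of +1 entries, reached only when the predicted effects equal the true ones. The returned Γ itself may still be one of several equally scoring graphs.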

-12- Simulation: from the true graphs Γ, Θ we obtain the ideal measurements (ΓΘ) and the simulated measurements (R). R/Bioconductor package: Nessy
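A possible simulation step, sketched under a hypothetical noise model: simulated $R_{s,a}$ is drawn around +`strength` where the ideal measurement ΓΘ is 1 and around −`strength` where it is 0. The parameters and function name are illustrative assumptions, not the Nessy simulation settings.

```python
import numpy as np

def simulate_measurements(gamma, theta, strength=2.0, noise_sd=1.0, seed=0):
    """Return (ideal, R): ideal = Gamma Theta (0/1) and noisy log-odds R.

    Hypothetical noise model: R[s, a] ~ N(+strength, noise_sd^2) where an
    effect is predicted and N(-strength, noise_sd^2) where it is not.
    """
    rng = np.random.default_rng(seed)
    ideal = np.asarray(gamma) @ np.asarray(theta)
    mean = np.where(ideal == 1, strength, -strength)
    R = mean + rng.normal(0.0, noise_sd, size=ideal.shape)
    return ideal, R

ideal, R = simulate_measurements([[1, 1], [0, 1]], [[1, 0, 0], [0, 1, 1]])
print(ideal)
print(np.round(R, 2))
# Fraction of entries whose sign agrees with the ideal measurement
print(np.mean((R > 0) == (ideal == 1)))
```

Raising `noise_sd` relative to `strength` makes the simulated data drift away from the ideal measurements, which is one way to probe how robust the estimated graph is.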

-13- Simulation [Figure: true graph vs. estimated graph; distribution of the likelihoods]: 12 edges, 2^12 = 4096 signal graphs, ~4 seconds.
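The numbers on this slide follow from counting: with the diagonal fixed, each of the n(n−1) off-diagonal entries of Γ is free, giving $2^{n(n-1)}$ signal graphs, so n = 4 yields 12 edges and 4096 graphs. Assuming the same time per graph as on this slide (~1000 graphs per second), n = 5 would take roughly 17 minutes and n = 6 about 12 days, which is consistent with the n ≤ 5 limit for exhaustive search. The helper name is hypothetical.

```python
def n_signal_graphs(n):
    """Signal graphs on n signals with a fixed diagonal of 1s:
    each of the n(n-1) off-diagonal entries is either 0 or 1."""
    return 2 ** (n * (n - 1))

for n in range(2, 7):
    print(f"n={n}: {n * (n - 1)} edges, {n_signal_graphs(n):,} signal graphs")
```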

-14- Application: Synthetic Lethality (SL). Hypotheses: SL between two genes occurs if the genes are located in different pathways. Genes sharing the same SL partners have an increased chance of being located in the same pathway [Ye, Bader et al., Mol. Systems Biology 2005]. [Figure: genes a and b in Pathway I and Pathway II, linked by synthetic lethality] Consequence: a gene b whose SL partners are nested within the SL partners of another gene a is likely to be located beneath a in the same pathway.
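The nesting criterion on SL partners reduces to a subset test. This sketch assumes SL partners are given as a dictionary of sets and uses a proper subset, since genes with identical partner sets cannot be ordered. The gene names are hypothetical placeholders, not data from the study.

```python
def nested_pairs(sl_partners):
    """Return sorted (a, b) pairs where the SL partners of b form a proper
    subset of the SL partners of a, i.e. candidates for b lying beneath a
    in the same pathway."""
    pairs = []
    for a, pa in sl_partners.items():
        for b, pb in sl_partners.items():
            if a != b and pb and pb < pa:
                pairs.append((a, b))
    return sorted(pairs)

# Hypothetical SL partner sets
sl = {
    "geneA": {"x1", "x2", "x3"},
    "geneB": {"x1", "x2"},
    "geneC": {"x3"},
    "geneD": {"x4"},
}
print(nested_pairs(sl))  # [('geneA', 'geneB'), ('geneA', 'geneC')]
```

In NEM terms the genes play the role of signals and their SL partners the role of observables, so a full NEM fit goes beyond this pairwise check.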

-15- Application: Synthetic Lethality Pan et al., Cell 2006

-16- Application: Synthetic Lethality. 7 of 10 genes are directly linked to DNA repair (Tresch, unpublished).

-17- Software, References
References:
Structure Learning in Nested Effects Models. A. Tresch, F. Markowetz. To appear in SAGeMB 2008, available on arXiv.
Nested Effects Models as a Means to learn Signaling Networks from Intervention Effects. H. Fröhlich, A. Tresch, F. Markowetz, M. Fellmann, R. Spang, T. Beissbarth. In preparation.
Computational identification of cellular networks and pathways. F. Markowetz, O. G. Troyanskaya, D. Kostka, R. Spang. Molecular BioSystems, Bioinformatics 2007.
Non-transcriptional Pathway Features Reconstructed from Secondary Effects of RNA Interference. F. Markowetz, J. Bloch, R. Spang. Bioinformatics 2005.
R/Bioconductor packages: NEM (Markowetz, Fröhlich, Beissbarth); Nessy (Tresch).

-18- Research & Teaching Activities
Research related to the theory of NEMs: integration of multiple data sources; time-dependent NEMs; allowing for arbitrary signalling models.
Teaching: lectures & exercises in Bioinformatics, Machine Learning, Statistics for Physicians, Group Theory, Microarray Analysis; E-learning core group of the faculty; Bachelor's, Master's and PhD theses.
Other research topics: software for data acquisition, processing & visualization for high-density technologies; design and analysis of biological/clinical experiments; consulting.

-19- Acknowledgements: Florian Markowetz (Lewis-Sigler Institute, Princeton); Tim Beissbarth, Holger Fröhlich (German Cancer Research Center, Heidelberg); Rainer Spang (Computational Diagnostics Group, Regensburg).

-20- Conclusion. Thank you! Exercise: Why is this administration model inefficient? Construct a model that scores better!

-21-

-22- What I did not show: automatic feature selection without a control experiment. [Figure: estimated graph (120 genes selected)]

-23- What I did not show: the "observed" graph of the Fellmann estrogen receptor dataset.

-24- What I did not show: [Figure: genes vs. 17 knockdown experiments, 6 of them double knockdowns]

-25- What I did not show: the same data, with prior knowledge.