Cs726 Modeling regulatory networks in cells using Bayesian networks Golan Yona Department of Computer Science Cornell University.

Outline
– Regulatory networks
– Expression data
– Bayesian networks: what, why, how
– Learning networks from expression data
– Using Bayesian networks to analyze expression data (Friedman et al.)

Regulatory networks
(Figure: KEGG Regulatory Pathways)

Metabolic pathways
(Figure: KEGG Metabolic Pathways)

Expression Arrays
Measure the expression levels of thousands of genes in a cell simultaneously, under specific conditions (e.g. during the cell cycle).
Every cell carries the same genomic data, but different subsets of proteins are expressed in different cells, and in the same cell under different conditions.
Protein levels are controlled by regulating:
– transcription initiation
– mRNA transcription
– mRNA transport
– splicing
– post-translational modifications
– degradation of mRNA and proteins
Microarrays measure the level of mRNA, and thus provide only indirect evidence for the control of protein levels.

(Figure: microarray spotting pin)

Some genes are over-expressed (red) and some are under-expressed (green), measured relative to a control group of genes (“fixed” genes).
Different pathways are activated under different conditions.

Goals
Recover protein interactions and sub-networks that correspond to regulatory networks in the cell.
Basic assumption: some genes depend on others, while others exhibit independence or conditional independence.
The means: Bayesian networks.
– Capable of modeling the statistical dependencies between different variables (genes)
– Different from clustering analysis
– Applicable when the dependency between genes is “local”
Problems: the data is noisy, partial and sometimes misleading (translation, activation); there is not enough of it to ensure statistically significant models; time scale.

Bayesian Networks
A compromise between the assumption of complete dependency and the assumption of complete conditional independence (Naïve Bayes).
Less constraining, yet still tractable.
We know something about the statistical dependencies between features, but not necessarily about the type of the underlying distributions.

Example
(Figure: a network over the variables oil pressure in engine, fan speed, coolant temp., engine temp., oil temp., air pressure in tire, smoke)

Bayesian Networks
Also called belief nets.
A graph description of the dependencies and independencies between variables.
– Each node corresponds to a variable (gene).
– The graph is directed and acyclic.
– The variables are discrete: a variable A takes a value from a set of values $\{a_1, a_2, \ldots\}$, e.g. on/off.
– Each value has a probability $P(a_i)$, with $\sum_i P(a_i) = 1$.
– A link joining node A to node C is directional and represents the set of conditional probabilities $P(c_j \mid a_i)$, e.g. the probability that C is on when A is off. The link is read as causality.
The network is described in terms of its nodes and edges and the conditional probability distribution associated with each node.

Network structure
For every node A:
– The parents of A are the immediate predecessors of A.
– The children of A are the immediate successors of A.
– B is a descendant of A if there is a directed path from A to B.
Network assertions:
– The value of a variable depends on its parents.
– A variable is conditionally independent of its non-descendants given its parents.
Conditional probability (table for Coolant temp. given its parents; the columns are the parents' values, the rows are the values of Coolant temp., and the numeric entries are missing from the transcript):
        Eng cold,   Eng cold,   Eng hot,    Eng hot,
        Fan fast    Fan slow    Fan fast    Fan slow
High
Low
(Figure: network over Fan speed, Coolant temp., Engine temp., Oil temp., Smoke)

Calculating the probability of an assignment
The network describes the joint probability distribution of all variables (some are conditionally independent and some are not). It depends on the structure!
The probability of a specific assignment of values $y_1, y_2, \ldots, y_n$ to the variables $Y_1, Y_2, \ldots, Y_n$:
$P(y_1, \ldots, y_n) = \prod_{i=1}^{n} P(y_i \mid \mathrm{Parents}(Y_i))$
This is the likelihood of the data given the model. All you need to know is..
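
The product over nodes can be sketched in a few lines of Python. This is only an illustration: the three-variable network (EngineHot and FanFast as parents of CoolantHigh), the probabilities in `cpt`, and the function name `joint_probability` are all assumptions made up for the example, loosely inspired by the engine example, and are not taken from the slides.

```python
# Hypothetical network: EngineHot -> CoolantHigh <- FanFast.
# All numbers below are invented for illustration.
parents = {
    "EngineHot": (),
    "FanFast": (),
    "CoolantHigh": ("EngineHot", "FanFast"),
}

# cpt[node][parent_values] = P(node is True | parents take parent_values)
cpt = {
    "EngineHot": {(): 0.3},
    "FanFast": {(): 0.6},
    "CoolantHigh": {
        (False, False): 0.2,
        (False, True): 0.05,
        (True, False): 0.9,
        (True, True): 0.5,
    },
}


def joint_probability(assignment, parents, cpt):
    """Multiply P(node value | values of its parents) over all nodes."""
    prob = 1.0
    for node, node_parents in parents.items():
        parent_values = tuple(assignment[p] for p in node_parents)
        p_true = cpt[node][parent_values]
        prob *= p_true if assignment[node] else 1.0 - p_true
    return prob


example = {"EngineHot": True, "FanFast": False, "CoolantHigh": True}
p_example = joint_probability(example, parents, cpt)  # 0.3 * 0.4 * 0.9
```

Only the local tables are stored (1 + 1 + 4 numbers here), yet they define a probability for every one of the 8 joint assignments.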

Learning Bayesian network from data
Given a data set with specific assignments for the variables (on/off for each gene), how can we find the most probable network structure that explains the data (the best “match” to the data)? How do we quantify a match?
Note that there are two aspects of the network that we need to learn:
– Structure (nodes, edges)
– Conditional probability distributions
Common strategy: assign a score to each network G

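Common strategy: assign a score to each network G
$\mathrm{Score}(G) = \log P(D \mid G) + \log P(G)$, where D is the data; the first term is the log likelihood and the second term is the log prior.
Pick the network that maximizes the score.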

Learning
The likelihood of the data given the model is estimated by averaging over all possible assignments of parameters (conditional probabilities) $\Theta$ to G:
$P(D \mid G) = \int P(D \mid G, \Theta)\, P(\Theta \mid G)\, d\Theta$
Summation over all possible assignments for the conditional probabilities. The major contribution comes from the set estimated from the data.
Given a specific structure, for every node we look up its parents and calculate the empirical conditional probability distribution.
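
A minimal sketch of the empirical-estimate step, assuming discrete data stored as a list of dicts. It uses the maximum-likelihood (plug-in) conditional probabilities counted from the data rather than the full average over parameters, so it covers only the "major contribution" mentioned on the slide. The function name `log_likelihood` and the toy variables A and B are illustrative assumptions.

```python
import math
from collections import Counter


def log_likelihood(data, parents):
    """Log-likelihood of the data under empirical conditional probabilities.

    data: list of dicts mapping variable name -> observed value.
    parents: dict mapping variable name -> tuple of parent names.
    """
    total = 0.0
    for node, node_parents in parents.items():
        joint_counts = Counter()   # (parent values, node value) -> count
        parent_counts = Counter()  # parent values -> count
        for row in data:
            pa = tuple(row[p] for p in node_parents)
            joint_counts[(pa, row[node])] += 1
            parent_counts[pa] += 1
        for (pa, _value), n in joint_counts.items():
            # Empirical P(value | pa) = n / N(pa); it is used n times.
            total += n * math.log(n / parent_counts[pa])
    return total


# Toy data in which B always copies A.
data = [{"A": 1, "B": 1}] * 4 + [{"A": 0, "B": 0}] * 4
ll_edge = log_likelihood(data, {"A": (), "B": ("A",)})   # structure A -> B
ll_empty = log_likelihood(data, {"A": (), "B": ()})      # no edges
```

On this toy data the structure with the edge A -> B explains B perfectly, so `ll_edge` is higher than `ll_empty`. Adding parents can never lower this plug-in likelihood, which is why a complexity term is needed when comparing structures.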

Model selection
The second term (log prior) is a measure of the complexity of the model (through uncertainty).
– Occam's razor: entia non sunt multiplicanda praeter necessitatem (entities should not be multiplied beyond necessity)
– MDL principle
In the papers discussed here this term is ignored.

In search for the best network
In theory: test different structures, calculate the probability of the assignment to the variables for each network structure, and output the network that maximizes the likelihood of the data given the network.
Impossible in practice: the number of possible networks over n genes grows super-exponentially with n. For the yeast genome with 6000 genes it is $> 10^{5{,}000{,}000}$.
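
To get a feel for these numbers, the sketch below counts labeled directed acyclic graphs exactly for small n using Robinson's recurrence, and computes a simple lower bound for large n: every subset of the n(n-1)/2 "forward" edges under one fixed node ordering is a distinct DAG. The exact formula shown on the original slide is not available in the transcript, so this bound is an assumption chosen for illustration. The function names are hypothetical.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labeled DAGs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * math.comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


def log10_lower_bound(n):
    """log10 of 2^(n(n-1)/2), the DAGs consistent with one fixed ordering."""
    return n * (n - 1) / 2 * math.log10(2)


small_counts = [count_dags(n) for n in range(1, 6)]  # 1, 3, 25, 543, 29281
yeast_exponent = log10_lower_bound(6000)             # about 5.4 million
```

Even the lower bound for 6000 genes has a base-10 exponent above five million, which is consistent with the figure quoted on the slide.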

Possible solution
Apply a heuristic, local greedy search: start with a random network and improve it locally by testing perturbations of the current structure.
Test one edge at a time by adding, removing or reversing it and measuring the effect on the score. If the score improves, accept the change.
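
The greedy search can be sketched as follows. Several choices here are assumptions made for the example and are not taken from the slides: the search starts from the empty network rather than a random one, the score is the empirical log-likelihood minus a fixed penalty per table row (a simple stand-in for a prior), the variables are binary, and the first improving move is accepted.

```python
import math
from collections import Counter
from itertools import permutations


def score(data, parents, penalty=1.0):
    """Empirical log-likelihood minus a complexity penalty (stand-in score)."""
    total = 0.0
    for node, pa_set in parents.items():
        pa_list = sorted(pa_set)
        joint, marg = Counter(), Counter()
        for row in data:
            pa = tuple(row[p] for p in pa_list)
            joint[(pa, row[node])] += 1
            marg[pa] += 1
        total += sum(n * math.log(n / marg[pa]) for (pa, _), n in joint.items())
        total -= penalty * 2 ** len(pa_list)  # binary variables assumed
    return total


def is_acyclic(parents):
    """Depth-first search along parent links; a revisit in progress is a cycle."""
    state = {}

    def visit(v):
        if state.get(v) == 1:
            return False
        if state.get(v) == 2:
            return True
        state[v] = 1
        ok = all(visit(p) for p in parents[v])
        state[v] = 2
        return ok

    return all(visit(v) for v in parents)


def neighbors(parents):
    """Structures that differ by one edge addition, removal or reversal."""
    for a, b in permutations(parents, 2):
        cand = {v: set(ps) for v, ps in parents.items()}
        if a in parents[b]:
            cand[b].discard(a)      # remove a -> b
            yield cand
            rev = {v: set(ps) for v, ps in cand.items()}
            rev[a].add(b)           # reverse to b -> a
            yield rev
        else:
            cand[b].add(a)          # add a -> b
            yield cand


def hill_climb(data, variables):
    """Accept the first acyclic neighbor that improves the score; repeat."""
    parents = {v: set() for v in variables}
    best = score(data, parents)
    improved = True
    while improved:
        improved = False
        for cand in neighbors(parents):
            if is_acyclic(cand):
                s = score(data, cand)
                if s > best + 1e-9:
                    parents, best, improved = cand, s, True
                    break
    return parents, best
```

For example, on data in which B always equals A and C is independent of both, `hill_climb(data, ["A", "B", "C"])` ends with a single edge between A and B. Like any local search it can stop in a local maximum, so in practice it is usually restarted from several starting points.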

How to learn from expression data
Two types of features are learned from multiple networks:
– First type: a gene Y is in the Markov blanket of X (the two genes are involved in the same biological process, and no other gene mediates the dependence). Problem: unobserved variables can mediate the interaction.
– Second type: a gene X is an ancestor of Y (based on all the networks that are learned).
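
Both features are easy to read off a learned graph. The sketch below assumes the network is stored as a dict mapping each node to the set of its parents; the function names and the four-node example graph are hypothetical. The Markov blanket of a node is taken in the usual sense: its parents, its children, and the other parents of its children.

```python
def markov_blanket(node, parents):
    """Parents, children, and the children's other parents of `node`."""
    children = {v for v, ps in parents.items() if node in ps}
    blanket = set(parents[node]) | children
    for child in children:
        blanket |= set(parents[child])
    blanket.discard(node)
    return blanket


def ancestors(node, parents):
    """All nodes that have a directed path to `node`."""
    found, stack = set(), list(parents[node])
    while stack:
        v = stack.pop()
        if v not in found:
            found.add(v)
            stack.extend(parents[v])
    return found


# Hypothetical DAG: X -> Y <- Z and Y -> W
dag = {"X": set(), "Z": set(), "Y": {"X", "Z"}, "W": {"Y"}}
blanket_of_x = markov_blanket("X", dag)   # Y (child) and Z (co-parent of Y)
ancestors_of_w = ancestors("W", dag)      # Y, X and Z
```

Note that Z is in the Markov blanket of X even though there is no edge between them, because they share the child Y.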

Application to the yeast cell cycle data
Expression levels were measured for 6177 genes at different time points in six cell cycles, altogether 76 measurements for each gene.
Only 800 genes vary during the cell cycle, and 250 of them cluster into 8 fairly distinct classes.
Networks are learned for the 800 genes.
Confidence values are based on the set of networks learned from different bootstrap sets.
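
The bootstrap confidence of a feature can be sketched as the fraction of resampled data sets whose learned network contains that feature. In the sketch below the learner is passed in as a function, and `toy_learner` is a deliberately trivial stand-in (it is not a structure-learning algorithm); both names, the parameter `n_boot`, and the toy data are assumptions made for the example.

```python
import random
from collections import Counter


def bootstrap_confidence(data, learn_features, n_boot=200, seed=0):
    """Fraction of bootstrap replicates in which each feature appears.

    learn_features: any function mapping a data set to a set of hashable
    features (e.g. Markov pairs or ancestor relations of a learned network).
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_boot):
        # Resample the measurements with replacement, keeping the same size.
        resample = [rng.choice(data) for _ in data]
        counts.update(learn_features(resample))
    return {feature: c / n_boot for feature, c in counts.items()}


def toy_learner(rows):
    """Stand-in for structure learning: report A-B if they usually agree."""
    agree = sum(r["A"] == r["B"] for r in rows) / len(rows)
    return {("A", "B")} if agree > 0.8 else set()


toy_data = [{"A": i % 2, "B": i % 2} for i in range(20)]
confidence = bootstrap_confidence(toy_data, toy_learner, n_boot=50)
```

Features that appear in most replicates are the ones worth interpreting; features that appear only occasionally are likely to be artifacts of the particular sample.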

Typical sub-network
(Figure)

Biological significance
Order relations: there are a few dominant genes that appear before many others, e.g. genes that are involved in cell cycle control and initiation.

– Most are nuclear proteins, but there are also cytoplasm membrane proteins (budding and sporulation)
– Some DNA repair proteins (prerequisite for transcription)
– RSR1: initiator of signal transduction cascades in the cell

Biological significance
Markov connection: functionally related

– Most pairs have similar functions (sometimes verified through transitivity).
– Some are physically adjacent on the chromosome.
– Some relations cannot be detected directly from expression data.
– Detecting conditional independence: a group of genes that are expressed similarly, where one gene is a parent of all the others and there are no connections between the others. The parent is a control gene (e.g. CLN2, an early cell cycle control gene that controls RNR3, SVS1, SRO4 and RAD41, which are functionally unrelated).

Conclusions
A powerful tool, but:
– not enough data
– computational problems
– learning algorithms
– authors decompose networks into basic elements again
Many possible extensions