Computational methods for inferring cellular networks. Stat 877, April 15th, 2014. Sushmita Roy.

Goals for today
Introduction
– Different types of cellular networks
Methods for network reconstruction from expression
– Per-gene vs per-module methods
– Sparse Candidate Bayesian networks
– Regression-based methods: GENIE3, L1-DAG-Learn
Assessing confidence in network structure

Why networks? “A system is an entity that maintains its function through the interaction of its parts” – Kohl & Noble

To understand cells as systems: measure, model, predict, refine (Uwe Sauer, Matthias Heinemann, Nicola Zamboni, Science 2007).

Different types of networks
Physical networks
– Transcriptional regulatory networks: interactions between regulatory proteins (transcription factors) and genes
– Protein-protein: interactions among proteins
– Signaling networks: protein-protein and protein-small molecule interactions that relay signals from outside the cell to the nucleus
Functional networks
– Metabolic: reactions through which enzymes convert substrates to products
– Genetic: interactions among genes that, when perturbed together, produce a more significant phenotype than when perturbed individually

Transcriptional regulatory networks
Example: the regulatory network of E. coli, with 153 TFs and 1319 target genes (Vargas and Santillan, 2008).
A directed, signed, weighted graph. Nodes: TFs and target genes. Edges: an edge from TF A to gene B means A regulates B's expression level.

Metabolic networks
Example: reactions associated with galactose metabolism (KEGG).
An unweighted graph. Nodes: metabolic enzymes. Edges: enzymes M and N share a compound.

Protein-protein interaction networks
Example: the yeast protein interaction network (Barabasi et al.).
A weighted or unweighted graph. Nodes: proteins. Edges: protein X physically interacts with protein Y.

Challenges in network biology
1. Network reconstruction/inference (today): identifying the edges and their logic, i.e., the structure and the parameters, e.g. X = f(A, B), Y = g(B).
2. Network structure analysis: hubs, degree distributions, network motifs, node attributes.
3. Network applications: predicting the function and activity of genes from the network.

Goals for today
Introduction
– Different types of cellular networks
Methods for network reconstruction from expression
– Per-gene vs per-module methods
– Sparse Candidate Bayesian networks
– Regression-based methods: GENIE3, L1-DAG-Learn
Assessing confidence in network structure

Computational methods to infer networks
We will focus on transcriptional regulatory networks:
– These networks control which genes get activated when
– Precise gene activation or inactivation is crucial for many biological processes
– Microarrays and RNA-seq allow us to systematically measure gene activity levels
These networks are primarily inferred from gene expression data.

What do we want a model to capture?
Structure: who are the regulators? For example, Hot1 and Sko1 regulate HSP12; HSP12 is a target of Hot1.
Function: how do the regulators determine expression levels? X_3 = ψ(X_1, X_2), where the inputs are transcription factor levels (trans) and the output is the target's mRNA level. The function ψ may be Boolean, linear, differential-equation based, probabilistic, etc.

Mathematical representations of regulatory networks
All representations map the input expression/activity of the regulators (X_1, X_2) to the output expression of the target gene, X_3 = f(X_1, X_2); models differ in the form of f:
– Boolean networks: input-output truth tables
– Differential equations: rate equations
– Probabilistic graphical models: probability distributions

Regulatory network inference from expression
Input: a genes-by-experiments expression matrix, where entry (i, j) is the expression level of gene i in experiment j.
Output: the structure (which regulators influence which targets) and the function, e.g. X_3 = f(X_1, X_2).

Two classes of expression-based methods
– Per-gene/direct methods (today): infer the regulators of each gene separately
– Module-based methods (Thursday): infer the regulators of entire modules of genes

Per-gene methods
Key idea: find the regulators that "best explain" the expression level of a gene.
Probabilistic graphical methods
– Bayesian networks: Sparse Candidate
– Dependency networks: GENIE3, TIGRESS
Information theoretic methods
– Context Likelihood of Relatedness (CLR)
– ARACNE

Module-based methods
Find regulators for an entire module, assuming genes in the same module have the same regulators.
– Module Networks (Segal et al. 2005)
– Stochastic LeMoNe (Joshi et al. 2008)

Goals for today
Introduction
– Different types of cellular networks
Methods for network reconstruction from expression
– Per-gene vs per-module methods
– Sparse Candidate Bayesian networks
– Regression-based methods: GENIE3, L1-DAG-Learn
Assessing confidence in network structure

Notation
V: a set of p network components (the p genes)
E: the edge set connecting V
G = (V, E): the graph we wish to infer
X_v: a random variable for each v ∈ V; X = {X_1, …, X_p}
D: a dataset of N measurements of X, D = {x^1, …, x^N}
Θ: the set of parameters associated with the network
(Spang and Markowetz, BMC Bioinformatics 2005)

Bayesian networks (BN)
Denoted by B = {G, Θ}
– G: a directed acyclic graph (DAG) whose vertices correspond to the random variables X_1, …, X_p and whose edges encode directed influences between them
– Pa(X_i): the parents of X_i in G
– Θ = {θ_1, …, θ_p}: the parameters of the p conditional probability distributions (CPDs) P(X_i | Pa(X_i))

A simple Bayesian network of four variables
Random variables: Cloudy ∈ {T, F}, Sprinkler ∈ {T, F}, Rain ∈ {T, F}, WetGrass ∈ {T, F}
(Adapted from "Introduction to graphical models", Kevin Murphy, 2001)

A simple Bayesian network of four variables, with its conditional probability distributions (CPDs)
Random variables: Cloudy ∈ {T, F}, Sprinkler ∈ {T, F}, Rain ∈ {T, F}, WetGrass ∈ {T, F}
(Adapted from "Introduction to graphical models", Kevin Murphy, 2001)

Bayesian network representation of a regulatory network
Inside the cell, the TFs Hot1 and Sko1 regulate the target gene HSP12. In the Bayesian network, the regulators are the parents (X_1 = Hot1, X_2 = Sko1), the target is the child (X_3 = Hsp12), and each variable gets a distribution: P(X_1), P(X_2), and P(X_3 | X_1, X_2).

Bayesian networks compactly represent joint distributions:
P(X_1, …, X_p) = ∏_i P(X_i | Pa(X_i))

Example Bayesian network of 5 variables
Assume each X_i is binary. With no independence assertions, representing the full joint distribution over X_1, …, X_5 needs on the order of 2^5 numbers. With the independence assertions encoded by the graph (each child depends only on its parents), the largest CPD needs only on the order of 2^3 numbers.

CPD in Bayesian networks
The CPD P(X_i | Pa(X_i)) specifies a distribution over values of X_i for each combination of values of Pa(X_i), and can be parameterized in different ways:
– Discrete X_i: a conditional probability table or tree
– Continuous X_i: e.g., Gaussians or regression trees

Representing CPDs as tables
Consider four binary variables X_1, X_2, X_3, X_4 with Pa(X_4) = {X_1, X_2, X_3}. P(X_4 | X_1, X_2, X_3) can be written as a table with one row per joint assignment (t/f) of X_1, X_2, X_3, and columns giving P(X_4 = t) and P(X_4 = f) under that assignment (e.g., the row X_1 = f, X_2 = t, X_3 = f has P(X_4 = t) = 0.5).

Estimating a CPD table from data
Assume we observe the following N = 7 joint assignments of (X_1, X_2, X_3, X_4):
(T,F,T,T), (T,T,F,T), (T,T,F,T), (T,F,T,T), (T,F,T,F), (T,F,T,F), (F,F,T,F)
For each joint assignment to X_1, X_2, X_3, estimate the probabilities for each value of X_4. For example, consider X_1 = T, X_2 = F, X_3 = T, which occurs in 4 of the 7 samples:
P(X_4 = T | X_1 = T, X_2 = F, X_3 = T) = 2/4
P(X_4 = F | X_1 = T, X_2 = F, X_3 = T) = 2/4
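To make the counting concrete, here is a minimal Python sketch (my illustration, not the lecture's code) that estimates this CPT by maximum likelihood from the seven samples above:

```python
from collections import Counter, defaultdict

# The seven joint observations of (X1, X2, X3, X4) from the slide
samples = [
    ("T", "F", "T", "T"), ("T", "T", "F", "T"), ("T", "T", "F", "T"),
    ("T", "F", "T", "T"), ("T", "F", "T", "F"), ("T", "F", "T", "F"),
    ("F", "F", "T", "F"),
]

def estimate_cpt(samples):
    """Estimate P(X4 | X1, X2, X3) by maximum likelihood: for each observed
    parent assignment, the fraction of samples with X4 = T (resp. F)."""
    counts = defaultdict(Counter)
    for x1, x2, x3, x4 in samples:
        counts[(x1, x2, x3)][x4] += 1
    return {pa: {v: c[v] / sum(c.values()) for v in ("T", "F")}
            for pa, c in counts.items()}

cpt = estimate_cpt(samples)
print(cpt[("T", "F", "T")])  # {'T': 0.5, 'F': 0.5}, i.e. 2/4 and 2/4
```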

A tree representation of a CPD
P(X_4 | X_1, X_2, X_3) can also be written as a decision tree: each internal node tests one parent (here X_1, then X_2, then X_3) and each leaf stores P(X_4 = t) for all assignments that reach it (leaf values 0.9, 0.5, and 0.8 in this example). This allows a more compact representation of CPDs by ignoring some unlikely relationships.

The learning problems in Bayesian networks
– Parameter learning on a known graph structure: given data D and G, learn Θ
– Structure learning: given data D, learn both G and Θ

Structure learning using score-based search
Find the network that maximizes a score: B* = argmax_B Score(B ; D), where Score(B ; D) is a function of how well B describes the data D.

Scores for Bayesian networks
– Maximum likelihood: Score(G ; D) = max_Θ log P(D | G, Θ)
– Regularized maximum likelihood: log P(D | G, Θ̂) − penalty(G), where the penalty grows with the complexity of G (e.g., the BIC penalty (log N / 2) · #parameters)
– Bayesian score: log P(G | D) ∝ log P(D | G) + log P(G), using the marginal likelihood P(D | G)

Decomposability of scores
The score of a Bayesian network B decomposes over individual variables; for the likelihood score,
Score(B ; D) = Σ_i Σ_d log P(x_i^d | pa_i^d)
where pa_i^d is the joint assignment to Pa(X_i) in the d-th sample. Decomposability enables efficient computation of the score change due to local changes: only the term of the affected variable needs to be recomputed.

Search space of graphs is huge
For N variables, the number of possible directed acyclic graphs grows super-exponentially with N (25 DAGs for N = 3, 543 for N = 4, and roughly 4.2 × 10^18 for N = 10). We therefore need approximate methods to search the space of networks.
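For reference, these counts can be computed exactly with Robinson's recurrence for the number of labeled DAGs; a short Python sketch (standard material, not from the slides):

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_dags(n):
    """Number of DAGs on n labeled nodes, via Robinson's recurrence:
    a(n) = sum_{k=1..n} (-1)^(k+1) C(n,k) 2^(k(n-k)) a(n-k), a(0) = 1."""
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
               for k in range(1, n + 1))

for n in range(1, 6):
    print(n, num_dags(n))  # 1, 3, 25, 543, 29281
```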

Greedy hill climbing to search Bayesian network space
Input: D = {x^1, …, x^N}, an initial network B_0 = {G_0, Θ_0}
Output: B_best
Loop until convergence:
– {B_i^1, …, B_i^m} = Neighbors(B_i), obtained by making local changes to B_i
– B_{i+1} = argmax_j Score(B_i^j ; D)
Termination: B_best = B_i
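A minimal Python sketch of this search (my illustration, not the lecture's code). It assumes binary data in a NumPy array (rows = samples, columns = variables) and uses a BIC-penalized log-likelihood as the decomposable score; the optional candidates argument restricts which parents may be added, which the Sparse Candidate sketch below reuses:

```python
import numpy as np

def family_score(i, pa, data):
    """BIC-penalized log-likelihood of variable i with parent set pa."""
    cols = data[:, sorted(pa)]
    score = 0.0
    for pa_val in {tuple(r) for r in cols}:      # observed parent assignments
        mask = np.all(cols == pa_val, axis=1)
        n, n1 = int(mask.sum()), int(data[mask, i].sum())
        for c in (n1, n - n1):
            if c:
                score += c * np.log(c / n)
    # complexity penalty: one free parameter per parent configuration
    return score - 0.5 * np.log(len(data)) * (2 ** len(pa))

def creates_cycle(parents, child, parent):
    """Would adding parent -> child create a cycle? True iff child is
    already an ancestor of parent."""
    stack, seen = [parent], set()
    while stack:
        v = stack.pop()
        if v == child:
            return True
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return False

def greedy_hill_climb(data, candidates=None):
    """Greedily add/delete single edges until no move improves the score."""
    p = data.shape[1]
    parents = {i: set() for i in range(p)}
    scores = {i: family_score(i, set(), data) for i in range(p)}
    while True:
        best = None                               # (gain, child, new parents)
        for c in range(p):
            allowed = candidates[c] if candidates else set(range(p)) - {c}
            moves = [parents[c] - {u} for u in parents[c]]
            moves += [parents[c] | {u} for u in allowed - parents[c]
                      if not creates_cycle(parents, c, u)]
            for new in moves:
                gain = family_score(c, new, data) - scores[c]
                if best is None or gain > best[0]:
                    best = (gain, c, new)
        if best is None or best[0] <= 1e-9:       # converged
            return parents
        _, c, new = best
        parents[c], scores[c] = new, family_score(c, new, data)
```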

Local changes to B_i
The neighbors of the current network B_i are generated by adding an edge or deleting an edge; whenever an edge is added, check that no directed cycle is created.

Challenges with applying Bayesian networks to genome-scale data
– The number of variables p is in the thousands
– The number of samples N is in the hundreds

Extensions to Bayesian networks to handle genome-scale networks
– Sparse Candidate algorithm (Friedman, Nachman, Pe'er)
– Bootstrap to identify high-scoring graph features (Friedman, Linial, Nachman, Pe'er)
– Module networks (subsequent lecture) (Segal, Pe'er, Regev, Koller, Friedman)
– Graph priors (subsequent lecture, hopefully)

The Sparse Candidate algorithm: structure learning in Bayesian networks
Key idea: identify k "promising" candidate parents for each X_i, with k << p (p = number of random variables). The candidates define a "skeleton graph" H, and the graph structure is restricted to select parents from H. Early choices in H might exclude other good parents; this is resolved with an iterative algorithm.

Sparse Candidate algorithm
Input:
– A dataset D
– An initial Bayes net B_0
– A parameter k: the max number of candidate parents per variable
Output: the final network B
Loop until convergence (iteration n):
– Restrict: based on D and B_{n−1}, select candidate parents C_i^n for each X_i; this defines a skeleton directed network H_n
– Maximize: find the network B_n that maximizes Score(B_n ; D) among networks satisfying Pa(X_i) ⊆ C_i^n
Termination: return B_n
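A sketch of this outer loop in Python (my illustration under simplifying assumptions): the Restrict step ranks candidates by marginal mutual information (scikit-learn's mutual_info_score), always retaining the current parents, and the Maximize step reuses the greedy_hill_climb sketch above with its candidates constraint:

```python
from sklearn.metrics import mutual_info_score

def sparse_candidate(data, k, max_iters=10):
    """Sparse Candidate sketch: alternate Restrict and Maximize steps.
    Reuses greedy_hill_climb from the previous sketch."""
    p = data.shape[1]
    parents = {i: set() for i in range(p)}              # B_0: empty network
    for _ in range(max_iters):
        # Restrict: keep current parents; fill the remaining slots (up to k)
        # with the variables of highest mutual information with X_i
        candidates = {}
        for i in range(p):
            rest = [j for j in range(p) if j != i and j not in parents[i]]
            rest.sort(key=lambda j: mutual_info_score(data[:, i], data[:, j]),
                      reverse=True)
            candidates[i] = parents[i] | set(rest[:max(0, k - len(parents[i]))])
        # Maximize: best-scoring network with Pa(X_i) restricted to candidates[i]
        new_parents = greedy_hill_climb(data, candidates)
        if new_parents == parents:                      # converged
            return parents
        parents = new_parents
    return parents
```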

Selecting candidate parents in the Restrict step
– A good parent for X_i is one with a strong statistical dependence on X_i; mutual information I(X_i ; X_j) provides a good measure of statistical dependence
– Mutual information should be used only as a first approximation: candidate parents need to be iteratively refined to avoid missing important dependences
– A good parent for X_i has the highest score improvement when added to Pa(X_i)

Mutual Information
A measure of statistical dependence between two random variables X_i and X_j:
I(X_i ; X_j) = Σ_{x_i, x_j} P(x_i, x_j) log [ P(x_i, x_j) / (P(x_i) P(x_j)) ]
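A direct plug-in implementation of this formula (an illustration; the probabilities are estimated as empirical frequencies from paired discrete samples):

```python
import numpy as np

def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in nats from paired discrete samples."""
    mi = 0.0
    for a in np.unique(x):
        px = np.mean(x == a)
        for b in np.unique(y):
            py = np.mean(y == b)
            pxy = np.mean((x == a) & (y == b))
            if pxy > 0:
                mi += pxy * np.log(pxy / (px * py))
    return mi

x = np.array([0, 0, 1, 1, 1, 0, 1, 0])
print(mutual_information(x, x))      # = H(X) ≈ 0.693 nats: full dependence
print(mutual_information(x, 1 - x))  # same: a deterministic function of x
```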

Mutual information can miss some parents
Consider the true network shown, where B is a parent of A. If I(A;C) > I(A;D) > I(A;B) and we select k ≤ 2 candidate parents, B will never be selected as a parent: using mutual information alone, we might be stuck with C and D. How do we get B as a candidate? By iteratively refining the candidate sets using the current network (the Restrict step), rather than relying on marginal mutual information alone.

Computational savings in Sparse Candidate
Ordinary hill climbing:
– O(2^n) possible parent sets
– O(n^2) initial score change calculations
– O(n) score change calculations in subsequent iterations
Learning constrained on a skeleton directed graph:
– O(2^k) possible parent sets
– O(nk) initial score change calculations
– O(k) score change calculations in subsequent iterations

Sparse Candidate learns good networks faster than hill climbing
On two benchmark datasets (with up to 200 variables), greedy hill climbing takes much longer to reach a high-scoring Bayesian network (score: higher is better).

Some comments about choosing candidates
– How should we select k in the Sparse Candidate algorithm? Should k be the same for all X_i?
– If the data are Gaussian, could we do something better? Regularized regression approaches can be used to estimate the structure of an undirected graph.
– L1-DAG-Learn provides an alternative (Schmidt, Niculescu-Mizil, Murphy 2007): estimate an undirected dependency network G_undir, then learn a Bayesian network constrained on G_undir

Dependency network
A type of probabilistic graphical model. As in Bayesian networks, it has a graph component and a probability component; unlike a Bayesian network, it can have cyclic dependencies.
(Dependency Networks for Inference, Collaborative Filtering and Data Visualization; Heckerman, Chickering, Meek, Rounthwaite, Kadie 2000)

Selecting candidate regulators for the i-th gene using regularized linear regression
Regress X_i (an N × 1 vector of measurements) on the candidates X_{−i} (an N × (p−1) matrix: everything other than X_i):
min_{b_i} || X_i − X_{−i} b_i ||^2 + λ || b_i ||_1
The regularization term || b_i ||_1 is the L1 norm, which imposes sparsity: it sets many regression coefficients to 0. This is also called Lasso regression; the genes with non-zero coefficients are the candidate regulators.
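A small scikit-learn illustration (synthetic data; the gene indices, effect sizes, and alpha value are arbitrary choices for the example):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
N, p = 100, 20                        # N samples, p genes
X = rng.standard_normal((N, p))
# hypothetical target: gene 0 is driven by genes 3 and 7, plus noise
X[:, 0] = 1.5 * X[:, 3] - 2.0 * X[:, 7] + 0.1 * rng.standard_normal(N)

i = 0
others = [j for j in range(p) if j != i]
lasso = Lasso(alpha=0.1)              # alpha = strength of the L1 penalty
lasso.fit(X[:, others], X[:, i])
candidates = [others[j] for j, b in enumerate(lasso.coef_) if abs(b) > 1e-6]
print(candidates)                     # should recover genes 3 and 7
```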

Learning dependency networks
Learning means estimating a set of conditional probability distributions, one per variable. P(X_j | X_{−j}) can be estimated by solving:
– A set of linear regression problems: Meinshausen & Bühlmann, 2006; TIGRESS (Haury et al., 2012)
– A set of non-linear regression problems: non-linearity captured by regression trees (Heckerman et al., 2000) or by random forests: GENIE3 (Huynh-Thu et al., 2010)
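The GENIE3 idea can be sketched in a few lines with scikit-learn (a simplified illustration of the approach, not the authors' implementation; in practice the predictors are usually restricted to known TFs):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def genie3_scores(X, n_trees=100, seed=0):
    """GENIE3-style edge scores: regress each gene on all others with a
    random forest and use feature importances as edge weights.
    X: samples x genes expression matrix; returns W with W[j, i] = score
    of the putative edge from regulator j to target i."""
    n, p = X.shape
    W = np.zeros((p, p))
    for i in range(p):
        others = [j for j in range(p) if j != i]
        rf = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
        rf.fit(X[:, others], X[:, i])
        W[others, i] = rf.feature_importances_
    return W
```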

Where do different methods rank?
In a community-wide comparison (Marbach et al., 2012), methods were benchmarked against a random baseline, and a "community" prediction that aggregates the individual methods ranked among the best.

Goals for today
Introduction
– Why should we care?
– Different types of cellular networks
Methods for network reconstruction from expression
– Per-gene methods: Sparse Candidate Bayesian networks; regression-based methods (GENIE3, L1-DAG-Learn)
– Per-module methods
Assessing confidence in network structure

Assessing confidence in the learned network
Typically the number of training samples is not sufficient to reliably determine the "right" network. One can however estimate the confidence of specific features of the network, i.e., graph features f(G). Examples of f(G):
– An edge between two random variables
– Order relations: is X an ancestor of Y?

How to assess confidence in graph features?
What we want is P(f(G) | D), which is
P(f(G) | D) = Σ_G f(G) P(G | D)
But it is not feasible to compute this sum over all graphs. Instead we will use a "bootstrap" procedure.

Bootstrap to assess graph feature confidence
For i = 1 to m:
– Construct dataset D_i by sampling, with replacement, N samples from dataset D, where N is the size of the original D
– Learn a network B_i from D_i
For each feature of interest f, calculate its confidence as the fraction of the m learned networks that contain it: conf(f) = (1/m) Σ_i f(G_i)
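A generic sketch of this procedure in Python (my illustration; learn_network stands for any structure learner returning a set of directed edges, e.g. a wrapper around the Sparse Candidate sketch above):

```python
import numpy as np

def bootstrap_edge_confidence(data, learn_network, m=200, seed=0):
    """Friedman-style bootstrap confidence for edge features."""
    rng = np.random.default_rng(seed)
    N = data.shape[0]
    counts = {}
    for _ in range(m):
        boot = data[rng.integers(0, N, size=N)]  # N rows, with replacement
        for edge in learn_network(boot):
            counts[edge] = counts.get(edge, 0) + 1
    return {edge: c / m for edge, c in counts.items()}
```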

Does the bootstrap confidence represent real relationships?
Compare the confidence distribution to that obtained from randomized data: in the genes-by-conditions expression matrix, shuffle the columns of each row (gene) separately, so that each gene's values are randomized across experimental conditions independently, and repeat the bootstrap procedure.

Application of Bayesian networks to yeast expression data
– 76 experiments/microarrays, 800 genes
– Bootstrap procedure on 200 subsampled datasets
– Sparse Candidate as the Bayesian network learning algorithm

Bootstrap-based confidence differs between real and randomized data
The distribution of feature confidence f on the real data differs markedly from that on randomized data: high-confidence features are common in the real data and rare in the randomized data.

Example of a high-confidence sub-network
Comparing a single learned Bayesian network with the bootstrap-confidence Bayesian network highlights a subnetwork associated with yeast mating.

Summary
– Network inference from expression provides a promising approach to identifying cellular networks
– Bayesian networks are one representation of networks with both a probabilistic and a graphical component; network inference naturally translates to learning problems in Bayesian networks
– Successful application of Bayesian networks to expression data requires additional considerations: reduce the set of potential parents statistically or using biological knowledge, and estimate confidence with the bootstrap

Linear regression with N inputs
Y = b_0 + b_1 X_1 + … + b_N X_N
where Y is the output, b_0 is the intercept, and b_1, …, b_N are the parameters/coefficients.
Given: data, i.e., paired observations of the inputs and the output
Estimate: the coefficients b_0, …, b_N that best fit the data
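For completeness, a NumPy least-squares fit of such a model (synthetic data; the coefficient values are arbitrary choices for the example):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50
X = rng.standard_normal((N, 3))                 # three inputs
y = 2.0 + X @ np.array([1.0, -0.5, 3.0]) + 0.1 * rng.standard_normal(N)

A = np.hstack([np.ones((N, 1)), X])             # prepend an intercept column
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(beta)                                     # ≈ [2.0, 1.0, -0.5, 3.0]
```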

Information theoretic concepts
– Kullback-Leibler (KL) divergence: a distance between two distributions
– Mutual information: measures statistical dependence between X and Y; equal to the KL divergence between P(X, Y) and P(X)P(Y)
– Conditional mutual information: measures the information between two variables given a third

KL Divergence
For two distributions P(X) and Q(X) over the same variable X:
D_KL(P || Q) = Σ_x P(x) log [ P(x) / Q(x) ]
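In code, the formula is a one-liner (assumes discrete distributions on a shared support with Q > 0 wherever P > 0):

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(P || Q) in nats for discrete distributions on the same support."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0                       # terms with P(x) = 0 contribute 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

print(kl_divergence([0.5, 0.5], [0.9, 0.1]))  # ≈ 0.51 > 0; zero iff P == Q
```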

Measuring relevance of Y to X
– M_Disc(X, Y) = D_KL( P(X, Y) || P_B(X, Y) )
– M_Shield(X, Y) = I(X ; Y | Pa(X))
– M_Score(X, Y) = Score(X ; Y, Pa(X), D)

Conditional Mutual Information
Measures the mutual information between X and Y given Z:
I(X ; Y | Z) = Σ_{x,y,z} P(x, y, z) log [ P(x, y | z) / ( P(x | z) P(y | z) ) ]
If Z captures everything about X, then knowing Y gives no more information about X, and the conditional mutual information is zero.

What do the Bayesian network edges represent?
Is an edge just correlation? No. A high correlation between X and Y could be due to any of three possible regulatory mechanisms: X regulates Y, Y regulates X, or a common regulator drives both. (Spang and Markowetz, BMC Bioinformatics 2005)