Bayesian Networks Applied to Modeling Cellular Networks


Bayesian Networks Applied to Modeling Cellular Networks
BMI/CS 576
www.biostat.wisc.edu/bmi576/
Colin Dewey
cdewey@biostat.wisc.edu
Fall 2009

Probabilistic Model of lac Operon
suppose we represent the system by the following discrete variables:
L (lactose): present, absent
G (glucose): present, absent
I (lacI): present, absent
C (CAP): present, absent
lacI-unbound: true, false
CAP-bound: true, false
Z (lacZ): high, low, absent
suppose (realistically) the system is not completely deterministic
the joint distribution of the variables could be specified by 2^6 × 3 − 1 = 191 parameters (six binary variables and one three-valued variable)

A Bayesian Network for the lac System
[figure: a DAG with nodes L, G, I, C, lacI-unbound, CAP-bound, and Z, annotated with CPTs such as Pr(L) (absent 0.9, present 0.1), Pr(lacI-unbound | L, I), and Pr(Z | lacI-unbound, CAP-bound)]

Bayesian Networks
also known as Directed Graphical Models
a BN is a Directed Acyclic Graph (DAG) in which the nodes denote random variables
each node X has a conditional probability distribution (CPD) representing P(X | Parents(X))
the intuitive meaning of an arc from X to Y is that X directly influences Y
formally: each variable X is independent of its non-descendants given its parents

Bayesian Networks
a BN provides a factored representation of the joint probability distribution; for the lac network:
Pr(L, G, I, C, lacI-unbound, CAP-bound, Z) = Pr(L) Pr(G) Pr(I) Pr(C) Pr(lacI-unbound | L, I) Pr(CAP-bound | G, C) Pr(Z | lacI-unbound, CAP-bound)
this representation of the joint distribution can be specified with 1 + 1 + 1 + 1 + 4 + 4 + (4 × 2) = 20 parameters (vs. 191 for the unfactored representation)
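to make the factorization concrete, here is a minimal Python sketch of the lac network as explicit CPTs; the probability values are illustrative placeholders (only Pr(L) appears in the slides), not the lecture's numbers:

```python
# A sketch of the lac-system BN; CPT values are illustrative placeholders.
p_L = {"present": 0.1, "absent": 0.9}   # Pr(L), from the slide
p_G = {"present": 0.5, "absent": 0.5}   # assumed
p_I = {"present": 0.9, "absent": 0.1}   # assumed
p_C = {"present": 0.9, "absent": 0.1}   # assumed

# Pr(lacI-unbound = True | L, I) and Pr(CAP-bound = True | G, C); assumed values
p_lacI_unbound = {("present", "present"): 0.9, ("present", "absent"): 0.9,
                  ("absent", "present"): 0.1, ("absent", "absent"): 0.9}
p_CAP_bound = {("present", "present"): 0.1, ("present", "absent"): 0.1,
               ("absent", "present"): 0.9, ("absent", "absent"): 0.1}

# Pr(Z | lacI-unbound, CAP-bound), Z in {high, low, absent}; assumed values
p_Z = {(True, True): {"high": 0.8, "low": 0.1, "absent": 0.1},
       (True, False): {"high": 0.1, "low": 0.8, "absent": 0.1},
       (False, True): {"high": 0.1, "low": 0.1, "absent": 0.8},
       (False, False): {"high": 0.05, "low": 0.05, "absent": 0.9}}

def joint(L, G, I, C, lacI_unbound, CAP_bound, Z):
    """The joint probability as a product of one local CPD per node."""
    bern = lambda p, v: p if v else 1.0 - p
    return (p_L[L] * p_G[G] * p_I[I] * p_C[C]
            * bern(p_lacI_unbound[(L, I)], lacI_unbound)
            * bern(p_CAP_bound[(G, C)], CAP_bound)
            * p_Z[(lacI_unbound, CAP_bound)][Z])

# 1+1+1+1 root parameters + 4 + 4 + 4*2 = 20 parameters, vs. 191 unfactored
print(joint("present", "absent", "present", "present", True, True, "high"))
```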

Representing CPDs for Discrete Variables
CPDs can be represented using tables or trees
consider the following case with Boolean variables A, B, C, D
[figure: the CPD Pr(D | A, B, C) shown both as a full table over all parent configurations and as a tree that branches on A, then B, then C, with leaf values Pr(D = T) = 0.9, 0.5, and 0.8; the tree exploits the fact that some parent values make the remaining parents irrelevant]
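a short sketch of the tree representation, assuming the branch order A, then B, then C; the leaf for A=F, B=F, C=F is not recoverable from the slide and is an assumed value:

```python
from itertools import product

def tree_cpd(a, b, c):
    """Tree-structured Pr(D = True | A, B, C)."""
    if a:
        return 0.9    # A = T: B and C are irrelevant in this context
    if b:
        return 0.5    # A = F, B = T
    if c:
        return 0.8    # A = F, B = F, C = T
    return 0.5        # A = F, B = F, C = F: assumed (not recoverable)

# the equivalent full table needs one row per parent configuration (8 rows)
table = {abc: tree_cpd(*abc) for abc in product([True, False], repeat=3)}
```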

Representing CPDs for Continuous Variables
we can also model the distribution of continuous variables in Bayesian networks
one approach: linear Gaussian models
X is normally distributed around a mean that depends linearly on the values of its parents u_i:
P(X | u_1, ..., u_k) = N(a_0 + Σ_i a_i u_i, σ²)
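a minimal sketch of sampling from a linear Gaussian CPD; the coefficient names a0, a, and sigma are generic, not from the lecture:

```python
import random

def sample_linear_gaussian(u, a0, a, sigma):
    """Sample X ~ Normal(a0 + sum_i a[i]*u[i], sigma^2)."""
    mean = a0 + sum(ai * ui for ai, ui in zip(a, u))
    return random.gauss(mean, sigma)

x = sample_linear_gaussian(u=[0.4, 1.2], a0=0.1, a=[0.5, -0.3], sigma=0.2)
```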

The Parameter Learning Task
Given: a set of training instances and the graph structure of a BN
Do: infer the parameters of the CPDs
this is straightforward when there aren't missing values or hidden variables
[table: complete training instances over L, G, I, C, lacI-unbound, CAP-bound, Z, with rows of values such as present/absent, true/false, and low/high]
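with complete data, the maximum-likelihood parameters are just normalized counts; a sketch (the training tuples below are made up for illustration):

```python
from collections import Counter

def mle_cpd(rows, child, parents):
    """Estimate Pr(X_child = x | parent config u) by normalized counts."""
    joint = Counter((tuple(r[i] for i in parents), r[child]) for r in rows)
    marg = Counter(tuple(r[i] for i in parents) for r in rows)
    return {(u, x): n / marg[u] for (u, x), n in joint.items()}

# made-up complete instances with columns (L, lacI-unbound, Z)
rows = [("present", True, "low"), ("present", True, "high"),
        ("absent", False, "low"), ("present", False, "low")]
cpd = mle_cpd(rows, child=2, parents=[0, 1])   # Pr(Z | L, lacI-unbound)
```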

The Structure Learning Task
Given: a set of training instances
Do: infer the graph structure (and perhaps the parameters of the CPDs too)
[table: training instances over L, G, I, C, lacI-unbound, CAP-bound, Z]

The Structure Learning Task
structure learning methods have two main components:
a scheme for scoring a given BN structure
a search procedure for exploring the space of structures

Bayes Net Structure Learning Case Study: Friedman et al., JCB 2000
expression levels in populations of yeast cells
800 genes
76 experimental conditions

Learning Bayesian Network Structure
given a function for scoring network structures, we can cast the structure-learning task as a search problem
[figure from Friedman et al., Journal of Computational Biology, 2000]

Structure Search Operators
three operators take us from one network structure to a neighboring one: add an edge, reverse an edge, delete an edge
[figure: a network over A, B, C, D and the three neighbors produced by applying each operator]

Bayesian Network Structure Learning
we need a scoring function to evaluate candidate networks; Friedman et al. use one of the form
score(G : D) = log P(D | G) + log P(G)
i.e., the log probability of data D given graph G plus the log prior probability of graph G
where they take a Bayesian approach to computing P(D | G), i.e. they don't commit to particular parameters in the Bayes net:
P(D | G) = ∫ P(D | G, Θ) P(Θ | G) dΘ

The Bayesian Approach to Structure Learning
Friedman et al. take a Bayesian approach: how can we calculate the probability of the data without using specific parameters (i.e. probabilities in the CPDs)?
let's consider a simple case of estimating the parameter of a weighted coin...

The Beta Distribution
suppose we're taking a Bayesian approach to estimating the parameter θ of a weighted coin
the Beta distribution provides an appropriate prior:
P(θ) = Beta(θ; α_h, α_t) = [Γ(α_h + α_t) / (Γ(α_h) Γ(α_t))] θ^(α_h − 1) (1 − θ)^(α_t − 1)
where α_h is the number of “imaginary” heads we have seen already, α_t is the number of “imaginary” tails we have seen already, and Γ is the continuous generalization of the factorial function

The Beta Distribution
suppose now we're given a data set D in which we observe M_h heads and M_t tails
the posterior distribution is also Beta:
P(θ | D) = Beta(θ; α_h + M_h, α_t + M_t)
we say that the set of Beta distributions is a conjugate family for binomial sampling

The Beta Distribution
assume we have a distribution P(θ) that is Beta(α_h, α_t)
what's the marginal probability (i.e. over all θ) that our next coin flip would be heads?
P(heads) = ∫ θ P(θ) dθ = α_h / (α_h + α_t)
what if we ask the same question after we've seen M actual coin flips (M_h heads, M_t tails)?
P(heads | D) = (α_h + M_h) / (α_h + α_t + M)
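a small sketch of these Beta calculations; the function name is ours, but the formulas are the standard conjugate-update results given above:

```python
def beta_predictive(alpha_h, alpha_t, m_h=0, m_t=0):
    """Pr(next flip = heads) under a Beta prior, after m_h heads and m_t tails."""
    return (alpha_h + m_h) / (alpha_h + alpha_t + m_h + m_t)

print(beta_predictive(1, 1))          # prior predictive: 0.5
print(beta_predictive(1, 1, 8, 2))    # after 8 heads, 2 tails: 0.75
```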

The Dirichlet Distribution
for discrete variables with more than two possible values, we can use Dirichlet priors
Dirichlet priors are a conjugate family for multinomial data
if P(θ) is Dirichlet(α_1, ..., α_K), then P(θ | D) is Dirichlet(α_1 + M_1, ..., α_K + M_K), where M_i is the number of occurrences of the ith value
conjugate families are used for mathematical convenience, not because they are necessarily the form of prior one would want; they are natural, though, if we interpret the prior as a posterior after some previous (imaginary) observations

Scoring Bayesian Network Structures
when the appropriate priors are used, and all instances in D are complete, the scoring function can be decomposed as follows:
score(G : D) = Σ_i score(X_i, Parents(X_i) : D)
thus we can score a network by summing terms (computed as just discussed) over the nodes in the network, and efficiently score changes in a local search procedure

The Bayesian Approach to Scoring BN Structures
we can evaluate this type of expression fairly easily because:
parameter independence: the integral can be decomposed into a product of terms, one per variable
Beta/Dirichlet are conjugate families (i.e. if we start with Beta priors, we still have Beta distributions after updating with data)
the integrals have closed-form solutions
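a sketch of the resulting closed-form local score: the log marginal likelihood of one variable's data under a symmetric Dirichlet prior, one term per parent configuration. The symmetric Dirichlet(alpha) pseudocounts are an assumption for illustration; this is the standard Bayesian-Dirichlet form, not necessarily the exact prior Friedman et al. used:

```python
from math import lgamma
from collections import Counter, defaultdict

def local_log_score(child_vals, parent_rows, alpha=1.0, arity=2):
    """log P(child's data | its parent columns), symmetric Dirichlet(alpha) prior."""
    counts = defaultdict(Counter)
    for u, x in zip(parent_rows, child_vals):
        counts[tuple(u)][x] += 1
    score = 0.0
    for u, cnt in counts.items():
        n_u = sum(cnt.values())
        # Gamma(a0)/Gamma(a0 + N_u) * prod_k Gamma(a + N_k)/Gamma(a), in log space
        score += lgamma(arity * alpha) - lgamma(arity * alpha + n_u)
        for n in cnt.values():
            score += lgamma(alpha + n) - lgamma(alpha)
    return score

child = [0, 1, 0, 0, 1, 1]
parents = [(0,), (0,), (1,), (1,), (1,), (0,)]
print(local_log_score(child, parents))
```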

Bayesian Network Search: The Sparse Candidate Algorithm [Friedman et al., UAI 1999]
Given: data set D, initial network B_0, parameter k
the algorithm alternates two steps until the score converges: a Restrict step, which selects a set of at most k candidate parents for each variable, and a Maximize step, which searches for a high-scoring network in which each node's parents come from its candidate set

The Restrict Step in Sparse Candidate
to identify candidate parents in the first iteration, we can compute the mutual information between pairs of variables:
I(X; Y) = Σ_{x,y} P̂(x, y) log [ P̂(x, y) / (P̂(x) P̂(y)) ]
where P̂(·) denotes the probabilities estimated from the data set
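a sketch of the mutual-information computation from empirical counts (natural log; no smoothing, though a real implementation might add pseudocounts):

```python
from math import log
from collections import Counter

def mutual_information(xs, ys):
    """I(X;Y) = sum over (x,y) of P(x,y) * log( P(x,y) / (P(x)*P(y)) )."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))  # fully dependent: log 2
print(mutual_information([0, 1, 0, 1], [0, 0, 1, 1]))  # independent: 0.0
```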

The Restrict Step in Sparse Candidate
suppose the true network structure is as shown [figure: a network over A, B, C, D], we're selecting two candidate parents for A, and I(A;C) > I(A;D) > I(A;B)
the candidate parents for A would then be C and D; how could we get B as a candidate parent on the next iteration?

The Restrict Step in Sparse Candidate
Kullback-Leibler (KL) divergence provides a distance measure between two distributions, P and Q:
D_KL(P ‖ Q) = Σ_x P(x) log [ P(x) / Q(x) ]
mutual information can be thought of as the KL divergence between the joint distribution P(X, Y) and the product P(X) P(Y) (which assumes X and Y are independent):
I(X; Y) = D_KL( P(X, Y) ‖ P(X) P(Y) )

The Restrict Step in Sparse Candidate
on subsequent iterations, we can use KL divergence to assess the discrepancy between the current network's estimate P_net(X, Y) and the empirical estimate P̂(X, Y):
D_KL( P̂(X, Y) ‖ P_net(X, Y) )
[figure: the true distribution vs. the current Bayes net over A, B, C, D]

The Restrict Step in Sparse Candidate
it is important to ensure monotonic improvement: each node's current parents are always retained in its new candidate set, so the best network over the new candidates scores at least as well as the current network

The Maximize Step in Sparse Candidate
hill-climbing search with add-edge, delete-edge, and reverse-edge operators
test to ensure that cycles aren't introduced into the graph
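a sketch of the Maximize step's move generation with an acyclicity test; the graph is a set of directed edges, and the restriction of added parents to the candidate sets is left out for brevity:

```python
def has_cycle(nodes, edges):
    """Depth-first search for a back edge in the directed graph."""
    children = {v: [y for (x, y) in edges if x == v] for v in nodes}
    state = {v: "unvisited" for v in nodes}
    def dfs(v):
        state[v] = "active"
        for w in children[v]:
            if state[w] == "active" or (state[w] == "unvisited" and dfs(w)):
                return True
        state[v] = "done"
        return False
    return any(state[v] == "unvisited" and dfs(v) for v in nodes)

def moves(nodes, edges):
    """All graphs one add-edge, delete-edge, or reverse-edge move away."""
    for x in nodes:
        for y in nodes:
            if x != y and (x, y) not in edges:
                yield edges | {(x, y)}                    # add edge
    for e in edges:
        yield edges - {e}                                 # delete edge
        yield (edges - {e}) | {(e[1], e[0])}              # reverse edge

nodes = ["A", "B", "C", "D"]
current = {("A", "B"), ("B", "C")}
legal = [g for g in moves(nodes, current) if not has_cycle(nodes, g)]
```

a hill-climbing loop would score each legal neighbor (rescoring only the local terms that changed, per the decomposition above) and move to the best one until no move improves the score.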

Efficiency of Sparse Candidate
[table: compares ordinary greedy search, greedy search with at most k parents, and Sparse Candidate on three measures: the possible parent sets for each node, the changes scored on the first iteration of search, and the changes scored on subsequent iterations]
rationale for the last column: after we apply an operator, the scores will change only for the parents of the node with the new impinging edge

Bayes Net Structure Learning Case Study: Friedman et al., JCB 2000
expression levels in populations of yeast cells
800 genes
76 experimental conditions
used two representations of the data:
a discrete representation (underexpressed, normal, overexpressed) with CPTs in the models
a continuous representation with linear Gaussians

Bayes Net Structure Learning Case Study: Two Key Issues
Since there are many variables but data is sparse, there is not enough information to determine the “right” model. Instead, can we consider many of the high-scoring networks?
How can we tell if the structure learning procedure is finding real relationships in the data? Is it doing better than chance?

Representing Partial Models
How can we consider many high-scoring models? Use the bootstrap method to identify high-confidence features of interest.
Friedman et al. focus on finding two types of “features” common to lots of models that could explain the data:
Markov relations: is Y in the Markov blanket of X?
order relations: is X an ancestor of Y?

Markov Blankets
the Markov blanket for node X consists of its parents, its children, and its children's parents
every other node Y in the network is conditionally independent of X when conditioned on X's Markov blanket MB(X)
[figure: an example network over A, B, C, X, D, E, F illustrating X's Markov blanket]
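a small sketch of reading a Markov blanket off an edge list (the example edges are made up to mirror the figure):

```python
def markov_blanket(x, edges):
    """MB(X): parents of X, children of X, and the children's other parents."""
    parents = {a for (a, b) in edges if b == x}
    children = {b for (a, b) in edges if a == x}
    coparents = {a for (a, b) in edges if b in children and a != x}
    return parents | children | coparents

edges = {("A", "X"), ("B", "X"), ("X", "D"), ("C", "D"), ("D", "F")}
print(markov_blanket("X", edges))   # {'A', 'B', 'D', 'C'}
```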

Markov Blankets
why are the parents of X's children in its Markov blanket?
suppose we're using the network Rained → Grass-wet ← Sprinkler-on to infer the probability that it rained last night
we observe the grass is wet; is the Sprinkler-on variable now irrelevant?
no – if we observe that the sprinkler is on, this helps to “explain away” the grass being wet

Estimating Confidence in Features: The Bootstrap Method
for i = 1 to m:
    randomly draw sample S_i (with replacement) from the N expression experiments
    learn a Bayesian network B_i from S_i
some expression experiments will be included multiple times in a given sample, some will be left out
the confidence in a feature is the fraction of the m models in which it was represented
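a sketch of the bootstrap loop; learn_network and features are placeholders for the structure learner and the Markov/order feature extractor, not real library calls:

```python
import random
from collections import Counter

def bootstrap_confidence(experiments, m, learn_network, features):
    """Fraction of m bootstrap-trained networks containing each feature."""
    counts = Counter()
    for _ in range(m):
        sample = [random.choice(experiments) for _ in experiments]  # with replacement
        counts.update(features(learn_network(sample)))
    return {f: c / m for f, c in counts.items()}
```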

Permutation Testing: Do the Networks Represent Real Relationships?
how can we tell if the high-confidence features are meaningful?
compare against confidence values for randomized data: randomize each row (gene) of the genes × conditions matrix independently; the genes should then be independent and we shouldn't find “real” features
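a sketch of the row-randomization control:

```python
import random

def randomize_rows(matrix):
    """Shuffle each row (gene) independently across the columns (conditions)."""
    return [random.sample(row, len(row)) for row in matrix]
```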

Confidence Levels of Features: Real vs. Randomized Data
[figure: confidence-level distributions for Markov features and order features, real vs. randomized data; from Friedman et al., Journal of Computational Biology, 2000]

Bayes Net Structure Learning Case Study: Sachs et al., Science 2005
measured levels of key molecules in (thousands of) single cells, discretized to low, medium, high
11 phosphoproteins and phospholipids
9 specific perturbations
[figure from Sachs et al., Science, 2005]

A Signaling Network
[figure from Sachs et al., Science, 2005]

Causality
given only observational data, there are many cases in which we can't determine causality
[table: joint observations of Smokes and Lung-Cancer (T/F counts)]
the two networks Smokes → Lung-Cancer and Lung-Cancer → Smokes explain the observations in the table equally well

How to Get at Causality?
Observing how events are related in time can provide information about causality.
[figure: a timeline on which smoking precedes cancer]

How to Get at Causality? Interventions -- manipulating variables of interest -- can provide information about causality.

Sachs et al. Computational Experiments
simulated-annealing search with add-edge, delete-edge, reverse-edge operators
repeated 500 times with different initial random graphs
the final model includes edges with confidence > 85%
to evaluate the importance of a large data set of single-cell measurements with interventions, they constructed control data sets:
small
observation-only
population-average

Interventions

Evaluating the Model Learned by Sachs et al.

The Value of Interventions, Data Set Size and Single-Cell Data
full data set: 5400 data points (9 conditions × 600 cells each)
observational data: 1200 points from general stimulatory conditions
truncated: 420 data points
averaged: 420 data points, each an average of 20 single-cell measurements

Summary of BN Structure Learning
structure learning is often cast as a search problem
the Sparse Candidate algorithm is more efficient than a naïve greedy search; it takes advantage of the assumption that networks in this domain are sparsely connected
we can score candidate networks efficiently because of parameter independence (the overall score is a sum of local terms)
we can score candidate networks efficiently with a Bayesian approach if we use conjugate priors
high-scoring network structures can be characterized using a bootstrapping approach that considers order and Markov-blanket features
the significance of such features can be calculated using permutation testing
we can gain more information about causality via time-series data and interventions

Scoring Bayesian Network Structures with Interventions
recall that the scoring function can be decomposed as follows:
score(G : D) = Σ_i score(X_i, Parents(X_i) : D)
the score for X_i is computed as before, except that we don't count cases where an intervention manipulated X_i
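a small sketch of the interventional adjustment: when computing X_i's local term, drop the cases in which X_i itself was set by an intervention (its value there tells us nothing about its natural CPD):

```python
def cases_for_local_score(rows, manipulated, i):
    """Keep only the cases in which variable i was NOT set by an intervention."""
    return [row for row, m in zip(rows, manipulated) if i not in m]
```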

Nachman et al., Bioinformatics 2004
this work extends what we've seen so far in three important ways:
important but unmeasured states are represented by hidden variables
the relationships between regulators and the genes they regulate are represented using realistic kinetic functions
the temporal profiles of variables are represented using a dynamic Bayesian network

A Kinetic Model of Transcription Regulation: Single Activator Case
with a single activator H, the cell population divides into two fractions: cells whose promoter is bound by H and cells whose promoter is unbound
assume the cells are at steady state
[figure: the bound and unbound promoter states]

A Kinetic Model of Transcription Regulation: Single Activator Case
the average transcription rate is then a mixture of the rates in the two promoter states:
average transcription rate = P(bound) × (rate when bound) + P(unbound) × (rate when unbound)
[the slide's derivation of P(bound) as a steady-state function of the activator level H is not recoverable]

Kinetic Model of Transcription Regulation: Single Activator Case
[figure from Nachman et al., Bioinformatics, 2004]

Generalizing the Kinetic Model to Multiple Regulators
in the two-regulator case, the average transcription rate sums over the regulators' joint binding states, weighted by a vector indicating which binding states lead to transcription

Viewing an HMM as a Graphical Model
[figure: a three-state HMM with a start state, transition probabilities, and per-state emission distributions over {A, C, G, T}, redrawn as a graphical model with hidden state variables s1, s2, s3, s4, observed variables x1, x2, x3, x4, and CPT Pr(state_i | state_{i−1}) given by the transition matrix]

Dynamic Bayesian Networks
a dynamic Bayesian network is a generalization of an HMM in which there are multiple hidden variables; in the networks of Nachman et al., these correspond to the various regulators being modeled
[figure from Nachman et al., Bioinformatics, 2004]

Nachman's DBNs
hidden variables represent continuous values
the transition functions over time include normally distributed noise
[the transition equations were figures and are not recoverable]

Nachman's DBNs
transcription-rate variables are continuous and are given by the kinetic transcription function of their regulators plus normally distributed noise

Structure Learning: “Ideal” Regulators
operators: add/delete edges
instead of blindly trying edge additions: calculate the “ideal regulator” profile and see if it's similar to an existing profile
suppose gene R_k currently has one regulator; can we find h_new and γ_k,new to optimally model the gene?
yes: the transcription function is invertible once we set γ_k,new (which really acts as a scaling parameter)

Structure Learning: The “Ideal” Regulator Method Illustrated
[figure from Nachman et al., Bioinformatics, 2004]

Learned Regulator Profiles
[figure: learned regulator activity profiles shown alongside log mRNA levels and the average transcription rates of the regulated genes; panels illustrate cases where the expression and activity profiles are shifted in time and cases where the regulator is not itself regulated transcriptionally]