Bayesian Networks Applied to Modeling Cellular Networks


1 Bayesian Networks Applied to Modeling Cellular Networks
BMI/CS 576 Colin Dewey Fall 2009

2 Probabilistic Model of lac Operon
suppose we represent the system by the following discrete variables:
L (lactose): present, absent
G (glucose): present, absent
I (lacI): present, absent
C (CAP): present, absent
lacI-unbound: true, false
CAP-bound: true, false
Z (lacZ): high, low, absent
suppose (realistically) the system is not completely deterministic; the joint distribution of the variables could then be specified by 2^6 × 3 - 1 = 191 parameters

3 A Bayesian Network for the lac System
[figure: the lac network over nodes L, G, I, C, lacI-unbound, CAP-bound, and Z, annotated with CPDs such as Pr(L) (absent 0.9, present 0.1), Pr(lacI-unbound | L, I), and Pr(Z | lacI-unbound, CAP-bound)]

4 Bayesian Networks Also known as Directed Graphical Models
a BN is a Directed Acyclic Graph (DAG) in which the nodes denote random variables
each node X has a conditional probability distribution (CPD) representing P(X | Parents(X))
the intuitive meaning of an arc from X to Y is that X directly influences Y
formally: each variable X is independent of its non-descendants given its parents

5 Bayesian Networks
a BN provides a factored representation of the joint probability distribution
for the lac network (nodes L, G, I, C, lacI-unbound, CAP-bound, Z), the CPDs require 1 + 1 + 1 + 1 + 4 + 4 + (4 × 2) = 20 parameters
this factored representation of the joint distribution can thus be specified with 20 parameters (vs. 191 for the unfactored representation)
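A minimal Python sketch (mine, not from the slides) that tallies these counts; the parent sets below are an assumption matching the network figure on slide 3:

```python
# Sketch: compare parameter counts for the factored (BN) vs. unfactored
# joint distribution of the lac example.
cardinality = {"L": 2, "G": 2, "I": 2, "C": 2,
               "lacI-unbound": 2, "CAP-bound": 2, "Z": 3}
parents = {"L": [], "G": [], "I": [], "C": [],
           "lacI-unbound": ["L", "I"],       # assumed arcs, per the figure
           "CAP-bound": ["G", "C"],
           "Z": ["lacI-unbound", "CAP-bound"]}

def cpd_params(node):
    """(#values - 1) free parameters per joint configuration of the parents."""
    n_parent_configs = 1
    for p in parents[node]:
        n_parent_configs *= cardinality[p]
    return n_parent_configs * (cardinality[node] - 1)

factored = sum(cpd_params(v) for v in cardinality)   # 20
unfactored = 1
for card in cardinality.values():
    unfactored *= card
unfactored -= 1                                       # 2**6 * 3 - 1 = 191
print(factored, unfactored)
```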

6 Representing CPDs for Discrete Variables
CPDs can be represented using tables or trees
consider the following case with Boolean variables A, B, C, D
[figure: Pr(D | A, B, C) shown both as a full table over (A, B, C) and as a decision tree that branches only on the parent values that matter, with leaf probabilities such as Pr(D = T) = 0.9, 0.8, and 0.5]

7 Representing CPDs for Continuous Variables
we can also model the distribution of continuous variables in Bayesian networks
one approach: linear Gaussian models
X is normally distributed around a mean that depends linearly on the values u_i of its parents
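Written out, the standard linear Gaussian CPD the slide refers to (the coefficients a_i and variance σ² are the CPD's parameters):

```latex
P(X \mid u_1, \ldots, u_k) \;=\; \mathcal{N}\!\left(a_0 + \sum_{i=1}^{k} a_i u_i,\; \sigma^2\right)
```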

8 The Parameter Learning Task
Given: a set of training instances, the graph structure of a BN
Do: infer the parameters of the CPDs
this is straightforward when there aren't missing values or hidden variables
[figure: the lac network alongside a table of complete training instances, one column per variable (L, G, I, C, lacI-unbound, CAP-bound, Z)]
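A minimal sketch (my illustration, not course code) of the maximum-likelihood estimation this amounts to with complete data: count each (parent configuration, child value) pair and normalize.

```python
from collections import Counter, defaultdict

def estimate_cpt(data, child, parent_names):
    """Maximum-likelihood estimate of P(child | parents) from complete records.

    data: list of dicts mapping variable name -> observed value.
    Returns {parent_config_tuple: {child_value: probability}}.
    """
    counts = defaultdict(Counter)
    for record in data:
        config = tuple(record[p] for p in parent_names)
        counts[config][record[child]] += 1
    cpt = {}
    for config, counter in counts.items():
        total = sum(counter.values())
        cpt[config] = {value: n / total for value, n in counter.items()}
    return cpt

# toy usage with hypothetical records
data = [
    {"L": "present", "I": "present", "lacI-unbound": "false"},
    {"L": "present", "I": "absent",  "lacI-unbound": "true"},
    {"L": "absent",  "I": "present", "lacI-unbound": "false"},
]
print(estimate_cpt(data, "lacI-unbound", ["L", "I"]))
```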

9 The Structure Learning Task
Given: a set of training instances
Do: infer the graph structure (and perhaps the parameters of the CPDs too)
[figure: a table of training instances over L, G, I, C, lacI-unbound, CAP-bound, Z, with the network structure left to be inferred]

10 The Structure Learning Task
structure learning methods have two main components:
a scheme for scoring a given BN structure
a search procedure for exploring the space of structures

11 Bayes Net Structure Learning Case Study: Friedman et al., JCB 2000
expression levels in populations of yeast cells
800 genes
76 experimental conditions

12 Learning Bayesian Network Structure
given a function for scoring network structures, we can cast the structure-learning task as a search problem
figure from Friedman et al., Journal of Computational Biology, 2000

13 Structure Search Operators
[figure: a four-node network over A, B, C, D and the result of applying each search operator: add an edge, delete an edge, reverse an edge]

14 Bayesian Network Structure Learning
we need a scoring function to evaluate candidate networks; Friedman et al. use one of the form
score(G : D) = log P(D | G) + log P(G)
i.e. the log probability of the data D given graph G plus the log prior probability of graph G
where they take a Bayesian approach to computing P(D | G) = ∫ P(D | G, Θ) P(Θ | G) dΘ, i.e. they don't commit to particular parameters in the Bayes net

15 The Bayesian Approach to Structure Learning
Friedman et al. take a Bayesian approach: how can we calculate the probability of the data without using specific parameters (i.e. the probabilities in the CPDs)?
let's consider a simple case: estimating the parameter of a weighted coin…

16 The Beta Distribution suppose we’re taking a Bayesian approach to estimating the parameter θ of a weighted coin the Beta distribution provides an appropriate prior where # of “imaginary” heads we have seen already # of “imaginary” tails we have seen already 1 continuous generalization of factorial function

17 The Beta Distribution
suppose now we're given a data set D in which we observe M_h heads and M_t tails
the posterior distribution is also Beta:
P(θ | D) = Beta(α_h + M_h, α_t + M_t)
we say that the set of Beta distributions is a conjugate family for binomial sampling

18 The Beta Distribution
assume we have a distribution P(θ) that is Beta(α_h, α_t)
what's the marginal probability (i.e. integrating over all θ) that our next coin flip would be heads?
what if we ask the same question after we've seen M actual coin flips?
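The standard answers, which follow from the Beta prior/posterior above (the slide's own equations were not captured in the transcript); here M = M_h + M_t:

```latex
P(X = \text{heads}) = \int_0^1 \theta \, P(\theta)\, d\theta = \frac{\alpha_h}{\alpha_h + \alpha_t}
\qquad
P(X_{M+1} = \text{heads} \mid D) = \frac{\alpha_h + M_h}{\alpha_h + \alpha_t + M}
```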

19 The Dirichlet Distribution
for discrete variables with more than two possible values, we can use Dirichlet priors
Dirichlet priors are a conjugate family for multinomial data
if P(θ) is Dirichlet(α_1, …, α_K), then P(θ | D) is Dirichlet(α_1 + M_1, …, α_K + M_K), where M_i is the # of occurrences of the i-th value
conjugate families are used for mathematical convenience; they are not necessarily the form of prior one would most want, but they are natural if interpreted as the posterior after some previous (imaginary) observations

20 Scoring Bayesian Network Structures
when the appropriate priors are used, and all instances in D are complete, the scoring function can be decomposed as follows:
score(G : D) = Σ_i score(X_i, Parents(X_i) : D)
thus we can score a network by summing terms (computed as just discussed) over the nodes in the network, and we can efficiently score changes in a local search procedure

21 The Bayesian Approach to Scoring BN Network Structures
we can evaluate this type of expression fairly easily because:
parameter independence: the integral can be decomposed into a product of terms, one per variable
Beta/Dirichlet are conjugate families (i.e. if we start with Beta priors, we still have Beta distributions after updating with data)
the integrals have closed-form solutions
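For the weighted-coin case, the closed form referred to here is the standard Beta marginal likelihood of a particular sequence with M_h heads and M_t tails (written out for concreteness; it is not spelled out in the transcript):

```latex
P(D) \;=\; \int_0^1 P(D \mid \theta)\, P(\theta)\, d\theta
     \;=\; \frac{\Gamma(\alpha_h + \alpha_t)}{\Gamma(\alpha_h)\,\Gamma(\alpha_t)}
     \cdot \frac{\Gamma(\alpha_h + M_h)\,\Gamma(\alpha_t + M_t)}{\Gamma(\alpha_h + \alpha_t + M_h + M_t)}
```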

22 Bayesian Network Search: The Sparse Candidate Algorithm [Friedman et al., UAI 1999]
Given: data set D, initial network B_0, parameter k (the maximum number of candidate parents per node)
loop until convergence: a Restrict step selects at most k candidate parents for each node, and a Maximize step searches for a high-scoring network in which each node's parents come from its candidate set (see the sketch below and the following slides)
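A rough Python sketch of that outer loop as I read the following slides; score, select_candidates, learn_network, and the network.nodes attribute are assumed placeholders, not code from Friedman et al.:

```python
def sparse_candidate(data, initial_network, k, score, select_candidates,
                     learn_network, max_iterations=20):
    """Sketch of the Sparse Candidate outer loop.

    score, select_candidates (the Restrict step), and learn_network (the
    Maximize step) are supplied by the caller.
    """
    network = initial_network
    best = score(network, data)
    for _ in range(max_iterations):
        # Restrict: choose <= k candidate parents per node, always keeping the
        # node's current parents so the score cannot get worse.
        candidates = {node: select_candidates(node, network, data, k)
                      for node in network.nodes}
        # Maximize: search for a high-scoring network whose parent sets are
        # drawn only from the candidate sets.
        network = learn_network(data, candidates, start=network)
        current = score(network, data)
        if current <= best:      # no improvement -> converged
            break
        best = current
    return network
```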

23 The Restrict Step In Sparse Candidate
to identify candidate parents in the first iteration, we can compute the mutual information between pairs of variables:
I(X; Y) = Σ_x Σ_y P̂(x, y) log [ P̂(x, y) / ( P̂(x) P̂(y) ) ]
where P̂ denotes the probabilities estimated from the data set
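A small self-contained sketch (mine, not course code) of this empirical mutual information:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) between two equally long
    sequences of discrete values, using probabilities estimated by counting."""
    n = len(xs)
    p_x = Counter(xs)
    p_y = Counter(ys)
    p_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c_xy in p_xy.items():
        p_joint = c_xy / n
        p_indep = (p_x[x] / n) * (p_y[y] / n)
        mi += p_joint * math.log(p_joint / p_indep)
    return mi

# toy usage
print(mutual_information(["a", "a", "b", "b"], ["T", "T", "F", "F"]))  # high MI
print(mutual_information(["a", "b", "a", "b"], ["T", "T", "F", "F"]))  # ~0
```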

24 The Restrict Step In Sparse Candidate
[figure: the true network structure over A, B, C, D]
suppose we're selecting two candidate parents for A, and I(A;C) > I(A;D) > I(A;B)
the candidate parents for A would then be C and D; how could we get B as a candidate parent on the next iteration?

25 The Restrict Step In Sparse Candidate
Kullback-Leibler (KL) divergence provides a distance measure between two distributions, P and Q:
D_KL(P || Q) = Σ_x P(x) log [ P(x) / Q(x) ]
mutual information can be thought of as the KL divergence between the joint distribution P(X, Y) and the product P(X) P(Y) (which assumes X and Y are independent)

26 The Restrict Step In Sparse Candidate
we can use KL divergence to assess the discrepancy between the network's estimate P_net(X, Y) and the empirical estimate P̂(X, Y): D_KL( P̂(X, Y) || P_net(X, Y) )
[figure: the true distribution's network vs. the current Bayes net over A, B, C, D]

27 The Restrict Step in Sparse Candidate
when choosing each node's candidate parents, it is important to include its current parents; this ensures monotonic improvement of the score from one iteration to the next

28 The Maximize Step in Sparse Candidate
hill-climbing search with add-edge, delete-edge, and reverse-edge operators
test to ensure that cycles aren't introduced into the graph
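A minimal sketch of the acyclicity test such a move needs (my own helper, not the lecture's code); adding the edge new_parent → child creates a cycle exactly when child is already an ancestor of new_parent:

```python
def would_create_cycle(parents, child, new_parent):
    """Return True if adding the edge new_parent -> child would create a cycle.

    parents: dict mapping each node to the set of its current parents.
    """
    stack, seen = [new_parent], set()
    while stack:
        node = stack.pop()
        if node == child:
            return True          # child is an ancestor of new_parent
        if node in seen:
            continue
        seen.add(node)
        stack.extend(parents.get(node, ()))
    return False

# toy usage: edges A -> B -> C
parents = {"B": {"A"}, "C": {"B"}}
print(would_create_cycle(parents, "A", "C"))  # True: C -> A closes A -> B -> C -> A
print(would_create_cycle(parents, "C", "A"))  # False: A -> C adds a shortcut, no cycle
```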

29 Efficiency of Sparse Candidate
[table comparing ordinary greedy search, greedy search with at most k parents, and Sparse Candidate on three counts: the number of possible parent sets for each node, the number of changes scored on the first iteration of search, and the number of changes scored on subsequent iterations]
rationale for the last column: after we apply an operator, the scores will change only for the parents of the node with the new impinging edge

30 Bayes Net Structure Learning Case Study: Friedman et al., JCB 2000
expression levels in populations of yeast cells
800 genes
76 experimental conditions
used two representations of the data:
a discrete representation (underexpressed, normal, overexpressed) with CPTs in the models
a continuous representation with linear Gaussians

31 Bayes Net Structure Learning Case Study: Two Key Issues
Since there are many variables but data is sparse, there is not enough information to determine the "right" model. Instead, can we consider many of the high-scoring networks?
How can we tell if the structure learning procedure is finding real relationships in the data? Is it doing better than chance?

32 Representing Partial Models
How can we consider many high-scoring models? Use the bootstrap method to identify high-confidence features of interest.
Friedman et al. focus on finding two types of "features" common to lots of models that could explain the data:
Markov relations: is Y in the Markov blanket of X?
order relations: is X an ancestor of Y?

33 Markov Blankets
the Markov blanket MB(X) of node X consists of its parents, its children, and its children's parents
every other node Y in the network is conditionally independent of X when conditioned on X's Markov blanket MB(X)
[figure: an example network over nodes A, B, C, X, D, E, F illustrating X's Markov blanket]
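A tiny helper (my own sketch, not from the lecture) that reads this definition off a parent-set representation of the graph:

```python
def markov_blanket(parents, x):
    """Markov blanket of x: its parents, its children, and its children's
    other parents. `parents` maps each node to the set of its parents."""
    children = {node for node, ps in parents.items() if x in ps}
    blanket = set(parents.get(x, set())) | children
    for child in children:
        blanket |= parents[child]
    blanket.discard(x)
    return blanket

# toy usage: edges B -> X, X -> D, E -> D, so MB(X) = {B, D, E}
parents = {"X": {"B"}, "D": {"X", "E"}, "B": set(), "E": set()}
print(markov_blanket(parents, "X"))  # {'B', 'D', 'E'}
```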

34 Markov Blankets why are parents of X’s children in its Markov blanket?
suppose we’re using the following network to infer the probability that it rained last night Rained Grass-wet Sprinkler-on we observe the grass is wet; is the Sprinkler-on variable now irrelevant? no – if we observe that the sprinkler is on, this helps to “explain away” the grass being wet

35 Estimating Confidence in Features: The Bootstrap Method
for i = 1 to m:
  randomly draw sample S_i (with replacement) from the N expression experiments
  learn a Bayesian network B_i from S_i
some expression experiments will be included multiple times in a given sample, some will be left out
the confidence in a feature is the fraction of the m models in which it was represented
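A compact sketch of this procedure; learn_network and has_feature are assumed callables standing in for the structure learner and the feature check, not code from Friedman et al.:

```python
import random

def bootstrap_feature_confidence(experiments, m, learn_network, has_feature):
    """Estimate confidence in a network feature by bootstrapping.

    experiments: list of N expression experiments (the training instances).
    learn_network: callable that learns a Bayesian network from a sample.
    has_feature: callable that reports whether a learned network contains
                 the feature of interest (e.g. a Markov or order relation).
    """
    n = len(experiments)
    hits = 0
    for _ in range(m):
        # resample N experiments with replacement
        sample = [random.choice(experiments) for _ in range(n)]
        network = learn_network(sample)
        if has_feature(network):
            hits += 1
    return hits / m   # fraction of the m models containing the feature
```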

36 Permutation Testing: Do the Networks Represent Real Relationships?
how can we tell if the high-confidence features are meaningful?
compare against confidence values for randomized data – the genes should then be independent, and we shouldn't find "real" features
[figure: the conditions × genes expression matrix; each gene's row is randomized independently]
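A one-function sketch of that randomization, assuming the data are arranged as a genes × conditions matrix with one row per gene (my assumption about the orientation):

```python
import random

def permute_rows(expression_matrix):
    """Randomize each gene's row independently, breaking any real
    dependencies between genes while preserving each gene's own
    distribution of values across conditions."""
    shuffled = []
    for row in expression_matrix:
        new_row = list(row)
        random.shuffle(new_row)
        shuffled.append(new_row)
    return shuffled

# toy usage: 3 genes x 4 conditions
matrix = [[0.1, 0.5, 0.9, 0.2],
          [1.2, 0.8, 0.3, 0.7],
          [0.0, 0.4, 0.6, 1.1]]
print(permute_rows(matrix))
```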

37 Confidence Levels of Features: Real vs. Randomized Data
[figure: confidence levels of Markov features and order features, real vs. randomized data; figure from Friedman et al., Journal of Computational Biology, 2000]

38 Bayes Net Structure Learning Case Study: Sachs et al., Science 2005
measured levels of key molecules in (thousands of) single cells, discretized to low, medium, high
11 phosphoproteins and phospholipids
9 specific perturbations
figure from Sachs et al., Science, 2005

39 A Signaling Network Figure from Sachs et al., Science 2005

40 Causality
given only observational data, there are many cases in which we can't determine causality
[figure: a table of joint observations of Smokes and Lung-Cancer (T/F values) and the two candidate networks Smokes → Lung-Cancer and Lung-Cancer → Smokes]
these two networks explain the observations in the table equally well

41 How to Get at Causality?
Observing how events are related in time can provide information about causality.
[figure: a timeline on which smoking precedes cancer]

42 How to Get at Causality? Interventions -- manipulating variables of interest -- can provide information about causality.

43 Sachs et al. Computational Experiments
simulated-annealing search with add-edge, delete-edge, and reverse-edge operators
repeated 500 times with different initial random graphs
final model includes edges with confidence > 85%
to evaluate the importance of a large data set of single-cell measurements with interventions, they constructed control data sets: small, observation-only, and population-average

44 Interventions

45 Evaluating the Model Learned by Sachs et al.

46 The Value of Interventions, Data Set Size and Single-Cell Data
5400 data points (9 conditions × 600 cells each)
observational data: 1200 points from general stimulatory conditions
truncated: 420 data points
averaged: 420 data points, each an average of 20 single-cell measurements

47 Summary of BN Structure Learning
structure learning is often cast as a search problem
the Sparse Candidate algorithm is more efficient than a naïve greedy search -- it takes advantage of the assumption that networks in this domain are sparsely connected
we can score candidate networks efficiently because of parameter independence (the overall score is a sum of local terms)
we can score candidate networks efficiently with a Bayesian approach if we use conjugate priors
high-scoring network structures can be characterized using a bootstrapping approach that considers order and Markov-blanket features
the significance of such features can be calculated using permutation testing
we can gain more information about causality via time series and interventions

48 Scoring Bayesian Network Structures with Interventions
recall that the scoring function can be decomposed as follows:
score(G : D) = Σ_i score(X_i, Parents(X_i) : D)
when computing the score for X_i, don't count the cases in which an intervention manipulated X_i
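One way to write the restricted local term (my notation, not Sachs et al.'s); D_i denotes the instances in which X_i was not the target of an intervention, and x_i[d], pa_i[d] are the values of X_i and its parents in instance d:

```latex
\mathrm{score}\bigl(X_i, \mathrm{Pa}(X_i) : D\bigr)
  \;=\; \log \int \Bigl[ \prod_{d \in D_i} P\bigl(x_i[d] \mid \mathrm{pa}_i[d],\, \theta_i\bigr) \Bigr] P(\theta_i)\, d\theta_i
```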

49 Nachman et al., Bioinformatics 2004
this work extends what we've seen so far in three important ways:
important but unmeasured states are represented by hidden variables
the relationships between regulators and the genes they regulate are represented using realistic kinetic functions
the temporal profiles of variables are represented using a dynamic Bayesian network

50 A Kinetic Model of Transcription Regulation: Single Activator Case
[figure: a promoter with activator H; the cell population is divided into two fractions, those in which H is bound to the promoter and those in which it is not]
assume the cells are at steady state

51 A Kinetic Model of Transcription Regulation: Single Activator Case
[figure: the average transcription rate is the transcription rate when the activator H is bound, multiplied by P(H is bound to the promoter)]

52 A Kinetic Model of Transcription Regulation: Single Activator Case
[figure: derivation of P(bound) at steady state, giving the average transcription rate as a saturating function of the activator level]
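For concreteness, a Michaelis-Menten-style form consistent with this kind of single-activator model (a sketch under my assumptions about the slide's missing equations; h is the activator level, β the maximal transcription rate, and γ a binding constant):

```latex
P(\text{bound}) = \frac{h}{h + \gamma}
\qquad
\text{average transcription rate} = \beta \,\frac{h}{h + \gamma}
```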

53 Kinetic Model of Transcription Regulation: Single Activator Case
figure from Nachman et al., Bioinformatics, 2004

54 Generalizing the Kinetic Model to Multiple Regulators
the two-regulator case: the average transcription rate sums over the possible joint binding states of the two regulators, weighted by a vector indicating which binding states lead to transcription

55 Viewing an HMM as a Graphical Model
[figure: a three-state HMM (states 1, 2, 3 plus a start state) with emission probabilities for the symbols A, C, G, T and a transition table Pr(state_i | state_{i-1}); unrolled as a graphical model with hidden state variables s1, s2, s3, s4 and observed characters x1, x2, x3, x4]

56 Dynamic Bayesian Networks
a dynamic Bayesian network is a generalization of an HMM in which there are multiple hidden variables
in the networks of Nachman et al., these correspond to the various regulators being modeled
figure from Nachman et al., Bioinformatics, 2004

57 Nachman’s DBNs hidden variables represent continuous values
the transition functions are given by each hidden variable's value at the previous time point plus normally distributed noise

58 Nachman’s DBNs transcription-rate variables are continuous and are given by: where normally distributed noise

59 Structure Learning: “Ideal” Regulators
operators: add/delete edges
instead of blindly trying edge additions, calculate the "ideal regulator" profile and see if it's similar to an existing profile
suppose gene R_k currently has one regulator; can we find h_new and γ_k,new that optimally model the gene?
yes: the transcription function is invertible once we set γ_k,new (which really acts as a scaling parameter)

60 Structure Learning: The “Ideal” Regulator Method Illustrated
figure from Nachman et al., Bioinformatics, 2004

61 Learned Regulator Profiles
[figure: learned regulator activity profiles compared with log mRNA levels and the average transcription rates of the regulated genes; in some cases the expression and activity profiles are shifted in time, and some regulators are not themselves regulated transcriptionally]

