Dynamic Bayesian Networks
Sushmita Roy, sroy@biostat.wisc.edu
Computational Network Biology
Biostatistics & Medical Informatics 826, Computer Sciences 838
https://compnetbiocourse.discovery.wisc.edu
Oct 18th 2016
Goals for today
Dynamic Bayesian networks (DBNs)
  How does a DBN differ from a static BN?
  Learning a dynamic Bayesian network
DBNs with structure priors
Application of DBNs to a phospho-proteomics time course
  Evaluation and insights
Dynamic Bayesian Networks
A dynamic Bayesian network (DBN) is a Bayesian network that can model temporal or sequential data.
A DBN is a Bayes net for dynamic processes.
Like a BN, a DBN has a graph structure and conditional probability distributions.
The DBN specifies how observations at a future time point may arise from observations at previous time points.
Notation
Assume we have a time course with T time points specifying the activity of p different variables.
Let $X^t = \{X_1^t, \dots, X_p^t\}$ denote the set of random variables at time t.
A DBN over these variables defines the joint distribution $P(X)$, where $X = \{X^1, \dots, X^T\}$.
A DBN, like a BN, has a directed acyclic graph G and parameters Θ.
G typically specifies the dependencies between time points.
In addition, we need to specify the dependence (if any) among the variables at the first time point.
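A DBN for p variables and T time points
[Figure: the p variables $X_1^t, \dots, X_p^t$ drawn as one column of nodes per time point t = 1, 2, ..., T. $X^2$ marks the variables at time t=2. Edges within the first column mark the dependency at the first time point.]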
Stationarity assumption in a dynamic Bayesian network
The stationarity assumption states that the dependency structure and parameters do not change with t.
Due to this assumption, we only need to specify the dependencies between two sets of variables: those at time t and those at time t+1.
[Figure: the network unrolled over t=1, t=2, ..., t=T is replaced by a two-slice template linking $X_1^t, \dots, X_p^t$ to $X_1^{t+1}, \dots, X_p^{t+1}$.]
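To make the stationarity assumption concrete, the following minimal sketch unrolls a two-slice template into the full set of edges over T time points. The function name `unroll`, the variable labels and the template edges are hypothetical and chosen only for illustration.

```python
def unroll(template_edges, num_time_points):
    """Unroll a two-slice template over time points 1..num_time_points.

    template_edges: iterable of (parent, child) pairs, each meaning
    "parent at time t -> child at time t+1". Under stationarity the
    same pairs are reused for every consecutive pair of time points.
    """
    unrolled = []
    for t in range(1, num_time_points):
        for parent, child in template_edges:
            unrolled.append(((parent, t), (child, t + 1)))
    return unrolled


# Hypothetical template over two variables.
template = [("X1", "X1"), ("X1", "X2"), ("X2", "X1")]
edges = unroll(template, num_time_points=4)
print(len(edges))  # 3 template edges x 3 transitions
```

The template has a fixed size no matter how long the time course is, which is why stationarity keeps the number of parameters independent of T.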
Computing the joint probability distribution in a DBN
The joint probability distribution can be factored into a product of conditional distributions across time and variables:
$P(X^1, \dots, X^T \mid G, \Theta) = P(X^1) \prod_{t=2}^{T} \prod_{i=1}^{p} P(X_i^t \mid \mathrm{Pa}_G(X_i^t), \Theta)$
Θ: parameters specifying the form of the conditional distributions.
G: graph encoding the dependency structure between variables at consecutive time points.
$\mathrm{Pa}_G(X_i^t)$: parents of $X_i^t$, as defined by the graph G.
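The factorization can be sketched in a few lines of Python for discrete variables. This is an illustration under stated assumptions, not a reference implementation: every parent is taken from the previous time point, the variables at the first time point are treated as independent, and all names and probability values are hypothetical.

```python
import math


def log_joint(time_course, parents, cpds, initial):
    """Log-probability of one time course under a first-order DBN.

    time_course: list of {variable: state} dicts, one per time point.
    parents: {variable: tuple of parent variables at the previous time point}.
    cpds: {variable: {tuple of parent states: {state: probability}}}.
    initial: {variable: {state: probability}} for the first time point
             (assumed independent across variables in this sketch).
    """
    logp = sum(math.log(initial[v][s]) for v, s in time_course[0].items())
    for prev, curr in zip(time_course, time_course[1:]):
        for var, state in curr.items():
            parent_states = tuple(prev[p] for p in parents[var])
            logp += math.log(cpds[var][parent_states][state])
    return logp


# Hypothetical two-variable, two-state example: B and C regulate each other.
parents = {"B": ("C",), "C": ("B",)}
cpds = {
    "B": {("H",): {"H": 0.8, "L": 0.2}, ("L",): {"H": 0.3, "L": 0.7}},
    "C": {("H",): {"H": 0.6, "L": 0.4}, ("L",): {"H": 0.1, "L": 0.9}},
}
initial = {"B": {"H": 0.5, "L": 0.5}, "C": {"H": 0.5, "L": 0.5}}
course = [{"B": "H", "C": "L"}, {"B": "L", "C": "H"}, {"B": "H", "C": "L"}]

# 0.5 * 0.5 * (0.7 * 0.6) * (0.8 * 0.9) = 0.0756
print(math.exp(log_joint(course, parents, cpds, initial)))
```

Working in log space avoids numerical underflow when the time course is long or the number of variables is large.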
Learning problems in DBNs
Parameter learning: given known temporal dependencies between the random variables, estimate the parameters from observed measurements.
Structure learning: given data, learn both the graph structure and the parameters.
The complexity of learning depends upon the order of the model.
An example DBN
Let us consider a simple example of two regulators, B and C, and one target gene, A.
Assume their expression takes on the values H, L and NC (for high, low and no change in expression).
A's expression level depends upon the expression levels of regulators B and C.
B and C mutually regulate each other.
Let $X_A^t$ denote the random variable representing the expression level of gene A at time t.
DBN for a three-node network
[Figure: the DBN over A, B and C drawn across consecutive time points, next to the collapsed network.]
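Specifying the parameters of the DBN for a three-node network
The DBN has one conditional distribution per variable: $P(X_A^{t+1} \mid X_B^t, X_C^t)$, $P(X_B^{t+1} \mid X_C^t)$ and $P(X_C^{t+1} \mid X_B^t)$.
Each of these conditional distributions specifies a distribution over {H, L, NC} given the state of the parent variable(s).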
Specifying the parameters of the DBN
[Tables: conditional probability tables over {H, L, NC} for the three variables. Surviving entries of the first table: 0.5, 0.1, 0.4, 0.2, 0.25. Surviving entries of the second table: 0.8, 0.1, 0.2, 0.6, 0.7, 0.3, 0.05, 0.75.]
Parameter estimation means estimating these numbers.
Assume these CPDs for the three variables in the example that follows.
Computing the probability distribution of an observation
Suppose we are given a new observed time course over T=0, T=1 and T=2.
[Table: observed states of the variables at T=0, T=1 and T=2; surviving entries: NC, L, H.]
Assume P(NC)=0.5 and P(H)=P(L)=0.25 for all variables at T=0.
Using the DBN from the previous slides, what is the probability of this time course?
First, we plug in the formula at the time point level: $P(X^0, X^1, X^2) = P(X^0)\,P(X^1 \mid X^0)\,P(X^2 \mid X^1)$.
Next, we look at the graph structure of the DBN to further decompose these terms.
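Computing the probability distribution of an observation (continued)
The graph structure of the DBN decomposes each transition term into per-variable terms, for example $P(X^1 \mid X^0) = P(X_A^1 \mid X_B^0, X_C^0)\,P(X_B^1 \mid X_C^0)\,P(X_C^1 \mid X_B^0)$, and similarly for $P(X^2 \mid X^1)$.
For the first time point, assume P(NC)=0.5 and P(H)=P(L)=0.25 for each variable at T=0.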
Parameter estimation in DBNs
The parameter estimation approach depends upon the form of the CPD.
If the variables are discrete, we need to estimate the entries of each conditional probability table.
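Parameter estimation example for the three-node DBN
We need to estimate the table $P(X_B^{t+1} \mid X_C^t)$ over the states H, L and NC.
Suppose we had a training time course over T=0, T=1, T=2, T=3 and T=4.
[Table: observed states at T=0 through T=4; surviving entries: NC, L, H.]
To compute these probabilities, we need to look at the joint assignments of $\{X_B^{t+1}, X_C^t\}$ for all 0 ≤ t ≤ 3, since the last time point has no successor.
What is $P(X_B^{t+1}=H \mid X_C^t=L)$?
What is $P(X_B^{t+1}=NC \mid X_C^t=L)$?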
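For discrete variables, the maximum likelihood estimate of each table entry is a ratio of counts over consecutive time point pairs. The sketch below illustrates this for $P(X_B^{t+1} \mid X_C^t)$. The function name `estimate_cpd` and the two training series are hypothetical, since the time course from the slide did not survive.

```python
from collections import Counter, defaultdict


def estimate_cpd(child_series, parent_series, states=("H", "L", "NC")):
    """Maximum likelihood estimate of P(child at t+1 | parent at t).

    Counts each (parent state at t, child state at t+1) pair over one
    time course and normalizes within every observed parent state.
    Parent states that never occur get no row in the result.
    """
    counts = defaultdict(Counter)
    for t in range(len(child_series) - 1):
        counts[parent_series[t]][child_series[t + 1]] += 1
    cpd = {}
    for parent_state, child_counts in counts.items():
        total = sum(child_counts.values())
        cpd[parent_state] = {s: child_counts[s] / total for s in states}
    return cpd


# Hypothetical training time course for T=0..4 (not the data from the slide).
x_b = ["NC", "H", "H", "NC", "L"]
x_c = ["L", "L", "NC", "L", "H"]

cpd = estimate_cpd(x_b, x_c)
# X_C = L occurs at t = 0, 1, 3 and is followed by X_B = H, H, L.
print(cpd["L"])
```

With short time courses many entries come out as exactly zero, so in practice pseudocounts are often added to the counts before normalizing.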
Structure learning in DBNs
We need to learn the dependency structure between two consecutive time points.
We may also want to learn the connectivity within a time point.
The structure search algorithms used for BNs can be used with a simple extension: the parents of a node can come from the previous or the current time step.
DBN with score-based search
The score of a DBN is a function of the data likelihood and a graph prior, for example $P(G \mid D) \propto P(D \mid G)\,P(G)$.
Data D: a collection of time courses.
Graph prior P(G): this can be uniform, or can encode some form of model complexity.
Bayesian Inference of Signaling Network Topology in a Cancer Cell Line (Hill et al. 2012)
Protein signaling networks are important for many cellular diseases.
The networks can differ between normal and disease cell types.
However, knowledge of the network structure remains incomplete.
The activity of proteins of interest can be measured over time, and these measurements can be used to infer the network structure.
The approach builds on prior knowledge of signaling networks to learn a better, predictive network.
Static BNs are limiting because they do not model time.
Applying DBNs to infer signaling network topology
Fig. 1: Data-driven characterization of signaling networks. Reverse-phase protein arrays interrogate signaling dynamics in samples of interest. Network structure is inferred using DBNs, with primary phospho-proteomic data integrated with existing biology, using informative priors objectively weighted by an empirical Bayes approach. Edge probabilities then allow the generation and prioritization of hypotheses for experimental validation.
Hill et al., Bioinformatics 2012
Application of DBNs to signaling networks
Dataset description:
  Phospho-protein levels of 20 proteins
  Eight time points
  Four growth conditions
Use the known signaling network as a graph prior.
Estimate the CPDs as conditional regularized Gaussians.
Assume a first-order Markov model: $X^t$ depends on $X^{t-1}$.
Integrating the prior signaling network into the DBN
A Bayesian approach to graph learning: $P(G \mid D) \propto P(D \mid G)\,P(G)$, where $P(D \mid G)$ is the data likelihood and $P(G)$ is the graph prior.
The graph prior is encoded as $P(G) \propto \exp(\lambda f(G))$ (following Mukherjee & Speed 2008), where λ is the prior strength and f(G) captures the graph features.
Here $f(G) = -|E(G) \setminus E^*|$ is the negative of the number of edges in the graph G, E(G), that are not in the prior edge set E*.
This prior does not promote new edges, but penalizes edges that are not in the prior.
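The behavior of this prior is easy to check numerically. The sketch below computes the unnormalized log prior, λ f(G), for a few graphs. The function name `log_graph_prior`, the node labels and the value of the prior strength are hypothetical.

```python
def log_graph_prior(edges, prior_edges, strength):
    """Unnormalized log P(G) = strength * f(G).

    f(G) is minus the number of edges of G that are absent from the
    prior edge set, so every non-prior edge costs `strength` log units.
    """
    extra = set(edges) - set(prior_edges)
    return -strength * len(extra)


# Hypothetical prior edge set E* and candidate graphs.
prior_edges = {("X", "Y"), ("Y", "Z")}
g_full_prior = {("X", "Y"), ("Y", "Z")}
g_partial_prior = {("X", "Y")}
g_with_novel = {("X", "Y"), ("Y", "Z"), ("Z", "X")}

for g in (g_full_prior, g_partial_prior, g_with_novel):
    print(sorted(g), log_graph_prior(g, prior_edges, strength=3.0))
```

Graphs that use all, some or none of the prior edges receive the same prior value. Only edges outside E* are penalized, so it is the data likelihood that decides which prior edges are kept.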
Calculating posterior probabilities of edges
For each edge e, we need to calculate $P(e \mid D) = \sum_{G : e \in E(G)} P(G \mid D)$.
This is intractable in general, so this work makes some assumptions:
  Allow edges only forward in time. The learning problem then decomposes into smaller per-variable problems that can be solved by variable selection.
  Assume P(G) factorizes over individual edges.
  Assume a node can have no more than dmax parents.
To compute the posterior probability of an edge into a node, the sum goes over all possible parent sets of that node.
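Under these assumptions, the edge posterior for one target variable can be obtained by enumerating every parent set of size at most dmax. The sketch below does this with a caller-supplied scoring function. The name `edge_posteriors`, the `log_score` callable (standing in for the log marginal likelihood plus log prior of a parent set) and the toy score are hypothetical, and this is not the closed-form scoring used by Hill et al.

```python
import math
from itertools import combinations


def edge_posteriors(target, candidates, log_score, d_max):
    """Posterior probability of each candidate -> target edge.

    Enumerates all parent sets with at most d_max members, weights each
    by exp(log_score(target, parent_set)), and sums the normalized
    weights of the parent sets that contain a given candidate.
    """
    parent_sets = [
        frozenset(c)
        for k in range(d_max + 1)
        for c in combinations(candidates, k)
    ]
    log_weights = [log_score(target, ps) for ps in parent_sets]
    offset = max(log_weights)  # subtract the max for numerical stability
    weights = [math.exp(w - offset) for w in log_weights]
    total = sum(weights)
    return {
        c: sum(w for ps, w in zip(parent_sets, weights) if c in ps) / total
        for c in candidates
    }


def toy_log_score(target, parent_set):
    """Hypothetical score: parent sets containing "B" are 9 times more likely."""
    return math.log(9.0) if "B" in parent_set else 0.0


post = edge_posteriors("A", ["A", "B", "C"], toy_log_score, d_max=2)
# Seven parent sets in total: three contain "B" (weight 9 each), four do not.
print(post)  # B: 27/31, A and C: 11/31
```

The number of parent sets grows polynomially in the number of candidates for a fixed dmax, which is what makes exact enumeration feasible for 20 proteins.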
Results on simulated data
20 variables, 4 time courses, 8 time points.
The prior network had 54 extra edges and was missing 10 of the ground truth edges.
Results are not sensitive to prior values
[Figure panels: sensitivity to the choice of hyperparameter; sensitivity to a noisy prior graph.]
Inferred signaling network using a DBN
[Figure panels: inferred signaling network; prior network. The prior also had self-loops that are not shown.]
Using the DBN to make predictions
Although many edges were expected, several were unexpected.
Novel edges are selected based on posterior probability and tested using inhibitors.
For example, if an edge is inferred from X to Y and X is a regulator of Y, then inhibiting X should affect the value of Y.
Example edges tested:
  MAPKp to STAT3p(S727), inferred with high probability (0.98): apply MEKi, a MEK inhibitor that blocks MAPK phosphorylation, and measure MAPKp and STAT3p after inhibition.
  AKTp to p70S6Kp, AKTp to MEKp and AKTp to cJUNp.
Experimental validation of links
Add the MAPK inhibitor and measure MAPK and STAT3.
MAPK is significantly inhibited (P-value $5 \times 10^{-4}$).
STAT3 is also inhibited (P-value $3.3 \times 10^{-4}$).
Success is measured by the difference in the levels of the targets as a function of the levels of the inhibitors.
Fig. 4. Validation of predictions by targeted inhibition in breast cancer cell line MDA-MB-468. (a) MAPK-STAT3 crosstalk. Network inference (Fig. 3a) predicted an unexpected link between phospho-MAPK (MAPKp) and STAT3p(S727) in the breast cancer cell line MDA-MB-468. The hypothesis of MAPK-STAT3 crosstalk was tested by MEK inhibition: this successfully reduced MAPK phosphorylation and resulted in a corresponding decrease in STAT3p(S727). (b) AKTp → p70S6Kp, AKT-MAPK crosstalk and AKT-JNK/JUN crosstalk. AKTp is linked to p70S6Kp, MEKp and cJUNp. In line with these model predictions, use of an AKT inhibitor reduced both p70S6K and MEK phosphorylation and increased JNK phosphorylation. (RPPA data; MEK inhibitor GSK1120212 and AKT inhibitor GSK690693B at 0 uM, 0.625 uM, 2.5 uM and 10 uM; measurements taken 0, 5, 15, 30, 60, 90, 120 and 180 min after EGF stimulation; average values over 3 replicates shown, error bars indicate SEM.)
Take away points
Network dynamics can be defined in multiple ways.
We have seen two ways to capture network dynamics.
Skeleton network-based approaches:
  The universe of networks is fixed.
  Nodes turn on or off.
  There is no assumption or model of how the network changes over time.
Dynamic Bayesian networks:
  A type of probabilistic graphical model.
  Describes how the system transitions from one state to another.
  Assumes that the dependency between t-1 and t is the same for all time points.
Application to cancer signaling data:
  DBNs are powerful for capturing the dynamics.
  However, the prior was important for learning an accurate network.
References
N. Friedman, K. Murphy, and S. Russell, "Learning the structure of dynamic probabilistic networks," in Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, ser. UAI'98. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1998, pp. 139-147. [Online]. Available: http://portal.acm.org/citation.cfm?id=2074111
S. M. Hill, Y. Lu, J. Molina, L. M. Heiser, P. T. Spellman, T. P. Speed, J. W. Gray, G. B. Mills, and S. Mukherjee, "Bayesian inference of signaling network topology in a cancer cell line," Bioinformatics (Oxford, England), vol. 28, no. 21, pp. 2804-2810, Nov. 2012. [Online]. Available: http://dx.doi.org/10.1093/bioinformatics/bts514