Representation, Learning and Inference in Models of Cellular Networks


1 Representation, Learning and Inference in Models of Cellular Networks
BMI/CS 576 Colin Dewey Fall 2010

2 Various Subnetworks within Cells
metabolic: describe reactions through which enzymes convert substrates to products
regulatory (genetic): describe interactions that control expression of particular genes
signaling: describe interactions among proteins and (sometimes) small molecules that relay signals from outside the cell to the nucleus
note: these networks are linked together, and the boundaries among them are not crisp

3 [Figure from the KEGG database: a metabolic pathway diagram; nodes represent gene products and other molecules]

4 Part of the E. coli Regulatory Network
Caption: This figure illustrates the core of the GRN in E. coli, where TFs regulate other TFs [74]. Short horizontal lines from which bent arrows extend represent cis-regulatory elements responsible for the expression of the genes named below the line. When more than one TF regulates a gene, the order of their binding sites is as given in the figure. An arrowhead indicates activation and a horizontal bar indicates repression when the position of the binding site is known. If only the nature of TF regulation is known, without binding site information, ‘+’ and ‘−’ symbols indicate activation and repression respectively. These examples may be indirect rather than direct regulation. The circles with the different colours as given in the key represent the different families of DNA binding domains. The names of dominant regulators are in bold. FIS, factor for inversion stimulation; IHF, integration host factor. Modified with permission, from Madan Babu, M. and Teichmann, S. A., (2003), Nucleic Acids Res. 31, 1234–1244. © Oxford University Press. Figure from Wei et al., Biochemical Journal 2004

5 A Signaling Network Figure from Sachs et al., Science 2005
Classic signaling network and points of intervention. This is a graphical illustration of the conventionally accepted signaling molecule interactions, the events measured, and the points of intervention by small-molecule inhibitors. Signaling nodes in color were measured directly. Signaling nodes in gray were not measured, but are presented to place the measured nodes within their contextual cellular pathways. Interventions classified as activators are colored green and inhibitors are colored red; the site of action of each intervention is indicated in the figure. Arcs illustrate connections between signaling molecules; in some cases the connections may be indirect and may involve specific phosphorylation sites of the signaling molecules (see Table 3 for details of these connections). The figure is a synopsis of signaling in mammalian cells and is not representative of all cell types, with inositol signaling co-relationships being particularly complex.

6 Two Key Tasks learning: given background knowledge and high-throughput data, try to infer the (partial) structure/parameters of a network inference: given a (partial) network model, use it to predict an outcome of biological interest (e.g. will the cells grow faster in medium x or medium y?) both of these are challenging tasks because typically data are noisy data are incomplete – characterize a limited range of conditions important aspects of the system not measured – some unknown structure and/or parameters

7 Transcriptional Regulation Example: the lac Operon in E. coli
E. coli can use lactose as an energy source, but it prefers glucose. How does it switch on its lactose-metabolizing genes?

8 The lac Operon: Repression by LacI
lactose absent ⇒ the protein encoded by lacI represses transcription of the lac operon

9 The lac Operon: Induction by LacI
lactose present ⇒ the protein encoded by lacI won’t bind to the operator (O) region

10 The lac Operon: Activation by Glucose
glucose absent ⇒ the CAP protein promotes binding by RNA polymerase, increasing transcription

11 Network Model Representations
directed graphs
Boolean networks
differential equations
Bayesian networks and related graphical models
etc.

12 Probabilistic Model of lac Operon
suppose we represent the system by the following discrete variables:
L (lactose): present, absent
G (glucose): present, absent
I (lacI): present, absent
C (CAP): present, absent
lacI-unbound: true, false
CAP-bound: true, false
Z (lacZ): high, low, absent
suppose (realistically) the system is not completely deterministic
the joint distribution of these variables could be specified with 2^6 × 3 − 1 = 191 parameters
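As a sanity check on this count, a tiny Python calculation (the variable names are the slide's; the code itself is only an illustration, not part of the slides) reproduces the 191 figure:

```python
from math import prod

# cardinalities of the seven discrete variables in the lac model
card = {"L": 2, "G": 2, "I": 2, "C": 2,
        "lacI-unbound": 2, "CAP-bound": 2, "Z": 3}

# an unfactored joint distribution needs one probability per joint
# assignment, minus one because the probabilities must sum to 1
print(prod(card.values()) - 1)  # 2**6 * 3 - 1 = 191
```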

13 Motivation for Bayesian Networks
Explicitly state (conditional) independencies between random variables
Provide a more compact model (fewer parameters)
Use directed graphs to specify the model
Take advantage of graph algorithms/theory
Provide intuitive visualizations of models

14 A Bayesian Network for the lac System
[Figure: a Bayesian network over the nodes L, G, I, C, lacI-unbound, CAP-bound, and Z, annotated with conditional probability tables such as Pr(L) (absent 0.9, present 0.1), Pr(lacI-unbound | L, I), and Pr(Z | lacI-unbound, CAP-bound)]
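A minimal way to picture these CPTs in code is as nested dictionaries, one per node. The sketch below (Python) is illustrative only: the 0.9/0.1 split for Pr(L) matches the slide, but the remaining probabilities are placeholders, since the full tables are not recoverable from the transcript.

```python
# Pr(L): values from the slide
cpt_L = {"absent": 0.9, "present": 0.1}

# Pr(lacI-unbound | L, I), indexed by (L, I); placeholder probabilities
cpt_lacI_unbound = {
    ("present", "present"): {"true": 0.9, "false": 0.1},
    ("absent",  "present"): {"true": 0.1, "false": 0.9},
    ("present", "absent"):  {"true": 0.9, "false": 0.1},
    ("absent",  "absent"):  {"true": 0.9, "false": 0.1},
}

# Pr(Z | lacI-unbound, CAP-bound), indexed by (lacI-unbound, CAP-bound);
# placeholder probabilities
cpt_Z = {
    ("true",  "true"):  {"absent": 0.1, "low": 0.1, "high": 0.8},
    ("true",  "false"): {"absent": 0.1, "low": 0.8, "high": 0.1},
    ("false", "true"):  {"absent": 0.8, "low": 0.1, "high": 0.1},
    ("false", "false"): {"absent": 0.8, "low": 0.1, "high": 0.1},
}
```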

15 Bayesian Networks Also known as Directed Graphical Models
a BN is a Directed Acyclic Graph (DAG) in which the nodes denote random variables
each node X has a conditional probability distribution (CPD) representing P(X | Parents(X))
the intuitive meaning of an arc from X to Y is that X directly influences Y
formally: each variable X is independent of its non-descendants given its parents (this yields the factorization shown below)
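This conditional-independence property is what licenses the factored form of the joint distribution; the general identity (standard for Bayesian networks, not spelled out on this slide) is:

```latex
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\big(X_i \mid \mathrm{Parents}(X_i)\big)
```

For the lac network this gives, assuming (consistently with the parameter count on the next slide) that CAP-bound has parents G and C:

```latex
P(L, G, I, C, U, B, Z) = P(L)\,P(G)\,P(I)\,P(C)\,P(U \mid L, I)\,P(B \mid G, C)\,P(Z \mid U, B)
```

where U = lacI-unbound and B = CAP-bound.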

16 Bayesian Networks
a BN provides a factored representation of the joint probability distribution
[Figure: the lac network with nodes L, G, I, C, lacI-unbound, CAP-bound, Z, annotated with per-CPD parameter counts summing to 1 + 1 + 1 + 1 + 4 + 4 + (4 × 2) = 20]
this representation of the joint distribution can be specified with 20 parameters (vs. 191 for the unfactored representation)
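The arithmetic behind the two counts can be checked directly. In the sketch below (Python, illustrative only), the parent sets are assumptions chosen to be consistent with the slides: lacI-unbound depends on L and I, CAP-bound on G and C, and Z on lacI-unbound and CAP-bound.

```python
from math import prod

card = {"L": 2, "G": 2, "I": 2, "C": 2, "U": 2, "B": 2, "Z": 3}
# U = lacI-unbound, B = CAP-bound; parent sets assumed as described above
parents = {"L": [], "G": [], "I": [], "C": [],
           "U": ["L", "I"], "B": ["G", "C"], "Z": ["U", "B"]}

def cpd_params(var):
    """Free parameters in one CPD: (cardinality - 1) per parent configuration."""
    n_configs = prod(card[p] for p in parents[var])  # 1 if no parents
    return n_configs * (card[var] - 1)

factored = sum(cpd_params(v) for v in card)   # 1+1+1+1+4+4+8 = 20
unfactored = prod(card.values()) - 1          # 2**6 * 3 - 1 = 191
print(factored, unfactored)
```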

17 Representing CPDs for Discrete Variables
CPDs can be represented using tables or trees
consider the following case with Boolean variables A, B, C, D
[Figure: the CPD Pr(D | A, B, C) shown both as a decision tree (branching on A, then B, then C, with leaf values such as Pr(D = T) = 0.9, 0.5, and 0.8) and as a full table with one row per assignment of A, B, C]
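One way to see why a tree can be more compact than a table: shared structure lets several parent configurations map to the same probability. The sketch below (Python) uses the leaf values visible on the slide (0.9, 0.5, 0.8), but the branching order and directions are assumptions, not a reconstruction of the slide's actual tree.

```python
def pr_D_true(a: bool, b: bool, c: bool) -> float:
    """Tree-structured CPD for Pr(D = T | A, B, C) (illustrative shape)."""
    if not a:          # one branch on A settles the probability outright
        return 0.9
    if b:              # A true: branch on B
        return 0.5
    # A true, B false: branch on C; the second C leaf is not recoverable
    # from the transcript, so 0.5 is used as a placeholder
    return 0.8 if c else 0.5

# the equivalent full table needs 2**3 = 8 rows, one per assignment of A, B, C
```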

18 Representing CPDs for Continuous Variables
we can also model the distribution of continuous variables in Bayesian networks
one approach: linear Gaussian models
[Figure: a node X with continuous parents U1, U2, …, Uk]
X is normally distributed around a mean that depends linearly on the values of its parents u_i
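In the standard linear Gaussian parameterization (a textbook form; the slide's exact notation is not recoverable from the transcript), X given its parents is:

```latex
P(X \mid u_1, \dots, u_k) = \mathcal{N}\!\left(\beta_0 + \sum_{i=1}^{k} \beta_i u_i,\; \sigma^2\right)
```

A minimal sampling sketch in Python, with illustrative weights and noise level:

```python
import numpy as np

def sample_linear_gaussian(u, weights, intercept=0.0, sigma=1.0, rng=None):
    """Draw X ~ Normal(intercept + weights . u, sigma^2)."""
    rng = rng or np.random.default_rng()
    return rng.normal(intercept + np.dot(weights, u), sigma)

x = sample_linear_gaussian(u=[0.3, 1.2], weights=[0.5, -0.8])  # illustrative values
```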

19 The Inference Task in Bayesian Networks
Given: values for some variables in the network (evidence) and a set of query variables
Do: compute the posterior distribution over the query variables
variables that are neither evidence variables nor query variables are hidden variables
the BN representation is flexible enough that any set can be the evidence variables and any set can be the query variables
[Figure: the lac network shown with some variables observed (e.g. L = present, Z = low) and others marked as queried]
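For small discrete networks like this one, the posterior can be computed by brute-force enumeration: sum the factored joint over the hidden variables and renormalize. The sketch below (Python) is a generic illustration of that idea, not the course's own code; the toy two-node network in the usage example is hypothetical.

```python
from itertools import product

# network: {var: (parents, cpd)}, where cpd maps a tuple of parent values
# to a {value: probability} dictionary
def joint_prob(network, assignment):
    """Probability of one complete assignment under the factored model."""
    p = 1.0
    for var, (parents, cpd) in network.items():
        p *= cpd[tuple(assignment[q] for q in parents)][assignment[var]]
    return p

def posterior(network, domains, query, evidence):
    """Pr(query | evidence), summing the joint over all hidden variables."""
    hidden = [v for v in domains if v != query and v not in evidence]
    scores = {}
    for qval in domains[query]:
        total = 0.0
        for combo in product(*(domains[h] for h in hidden)):
            full = {**evidence, query: qval, **dict(zip(hidden, combo))}
            total += joint_prob(network, full)
        scores[qval] = total
    z = sum(scores.values())
    return {v: s / z for v, s in scores.items()}

# toy usage with a hypothetical two-node network
net = {
    "Rain": ((), {(): {"yes": 0.2, "no": 0.8}}),
    "Wet":  (("Rain",), {("yes",): {"t": 0.9, "f": 0.1},
                         ("no",):  {"t": 0.2, "f": 0.8}}),
}
domains = {"Rain": ["yes", "no"], "Wet": ["t", "f"]}
print(posterior(net, domains, "Rain", {"Wet": "t"}))  # Rain "yes" ≈ 0.53
```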

20 The Parameter Learning Task
Given: a set of training instances and the graph structure of a BN
Do: infer the parameters of the CPDs
this is straightforward when there are no missing values or hidden variables
[Figure: the lac network alongside a table of fully observed training instances over L, G, I, C, lacI-unbound, CAP-bound, Z (e.g. present, …, true, false, low; absent, …, high; …)]
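With complete data, each CPD can be fit by simple counting (maximum likelihood estimation). The sketch below (Python) is a hedged illustration; it omits pseudocounts/smoothing and any handling of missing values.

```python
from collections import Counter

def fit_cpd(instances, var, parents):
    """Estimate Pr(var | parents) by relative frequency from complete data.

    instances: iterable of {variable: value} dictionaries.
    Returns {(parent_values_tuple, value): probability}.
    """
    joint, margin = Counter(), Counter()
    for inst in instances:
        key = tuple(inst[p] for p in parents)
        joint[(key, inst[var])] += 1
        margin[key] += 1
    return {(key, val): n / margin[key] for (key, val), n in joint.items()}

# e.g. fit Pr(Z | lacI-unbound, CAP-bound) from instances like those on the slide:
# cpd_Z = fit_cpd(training_instances, "Z", ["lacI-unbound", "CAP-bound"])
```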

21 The Structure Learning Task
Given: a set of training instances
Do: infer the graph structure (and perhaps the parameters of the CPDs too)
[Figure: a table of training instances over L, G, I, C, lacI-unbound, CAP-bound, Z, from which both the edges and the CPDs must be inferred]
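One common family of approaches to this task (not spelled out on this slide) is score-based search: propose edge changes and keep those that improve a scoring function such as BIC or penalized log-likelihood. The sketch below (Python) shows greedy single-edge addition only; `score` and `is_acyclic` are assumed helper functions, not defined here.

```python
def greedy_add_edges(variables, data, score, is_acyclic):
    """Greedy hill-climbing over single-edge additions (illustrative sketch)."""
    dag = {v: set() for v in variables}   # parents of each node
    current = score(dag, data)
    while True:
        best_gain, best_edge = 0.0, None
        for u in variables:
            for v in variables:
                if u == v or u in dag[v]:
                    continue
                dag[v].add(u)             # tentatively add edge u -> v
                if is_acyclic(dag):
                    gain = score(dag, data) - current
                    if gain > best_gain:
                        best_gain, best_edge = gain, (u, v)
                dag[v].discard(u)         # undo the tentative edge
        if best_edge is None:             # no single addition improves the score
            return dag
        u, v = best_edge
        dag[v].add(u)
        current += best_gain
```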

