1 11. Lecture WS 2005/06, Bioinformatics III. V11: Genetic networks
Methods to describe genetic networks:
(1) Boolean networks (today)
(2) clustering gene expression data (→ Bioinformatics II lecture)
Clustering is a relatively easy way to extract useful information from large-scale gene expression data sets. However, it typically tells us only which genes are co-regulated, not what is regulating what. We therefore need to reverse-engineer networks from their activity profiles.
JCell manual, U Tübingen

2 Intergenic interaction matrix M
Since the introduction of microarrays for detecting gene expression, a major problem has been the estimation of the intergenic interaction matrix M. The matrix element $m_{ij}$ of the interaction matrix M is
- positive if gene $G_j$ activates gene $G_i$,
- negative if gene $G_j$ inhibits gene $G_i$,
- equal to 0 if gene $G_j$ and gene $G_i$ have no interaction.
$G_i = +1$ if the gene is expressed, and 0 otherwise.
Aracena & Demongeot, Acta Biotheoretica 52, 391 (2004)

3 Simulating the dynamics of regulatory networks
Given the interaction matrix M, the change of the state $x_i$ of gene $G_i$ between t and t+1 obeys a threshold rule:
$$x_i(t+1) = H\Big(\sum_{j} m_{ij}\, x_j(t) - b_i\Big)$$
where H is the Heaviside function, $H(y) = 1$ if $y \ge 0$ and $H(y) = 0$ if $y < 0$, and the $b_i$ are threshold values.
For small regulatory genetic systems, knowing such a matrix M makes it possible to determine all possible stationary behaviors of the organisms that carry the corresponding genome.
Aracena & Demongeot, Acta Biotheoretica 52, 391 (2004)
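The threshold dynamics can be tried out on a toy network. The following is a minimal sketch, assuming the rule has the standard form x_i(t+1) = H(Σ_j m_ij x_j(t) − b_i) with synchronous updates of all genes. The two-gene "toggle switch" matrix, its thresholds, and the helper names `step` and `run_to_attractor` are hypothetical illustrations and are not taken from Aracena & Demongeot.

```python
def heaviside(y):
    """H(y) = 1 if y >= 0, else 0."""
    return 1 if y >= 0 else 0


def step(M, b, x):
    """One synchronous update of every gene under the threshold rule."""
    return [heaviside(sum(m_ij * x_j for m_ij, x_j in zip(row, x)) - b_i)
            for row, b_i in zip(M, b)]


def run_to_attractor(M, b, x0, max_steps=100):
    """Iterate until a state repeats and return the cycle that was reached.

    A cycle of length 1 is a fixed configuration.
    """
    seen = {}          # state -> time step at which it was first visited
    trajectory = []
    x = list(x0)
    for t in range(max_steps):
        key = tuple(x)
        if key in seen:
            return trajectory[seen[key]:]
        seen[key] = t
        trajectory.append(key)
        x = step(M, b, x)
    return []          # no repeated state within max_steps


# Hypothetical toy network: two genes that inhibit each other.
M = [[0, -1],
     [-1, 0]]
b = [-0.5, -0.5]       # each gene is on unless its inhibitor is on

print(run_to_attractor(M, b, [1, 0]))   # a fixed configuration
print(run_to_attractor(M, b, [0, 0]))   # a cycle of length 2
```

Starting from (1, 0) the state does not change, so it is a fixed configuration; starting from (0, 0) the synchronous update alternates between (0, 0) and (1, 1).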

4 Example
In the genetic regulatory network that governs Arabidopsis thaliana flower morphogenesis (right), the interaction matrix is an 11 × 11 matrix with only 22 nonzero coefficients. Below: a fixed configuration (attractor) of its Boolean dynamics, obtained by propagating $x_i(t)$.
Mendoza, Alvarez-Buylla, JCB, 1998

5 Interaction matrix - interaction graph
For each genetic regulatory network, we can define an interaction graph built from the interaction matrix M by drawing an edge + (resp. -) between the vertices representing the genes j and i iff $m_{ij} > 0$ (resp. $< 0$).
To calculate the $m_{ij}$, we can either determine the s-directional correlation $\rho_{ij}(s)$ between the state vector $\{x_j(t-s)\}_{t \in C}$ of gene j at time t − s and the state vector $\{x_i(t)\}_{t \in C}$ of gene i at time t, with t varying over the cell cycle C of length K = |C|, which corresponds to the observation times of the bio-array images:
[Equation for $\rho_{ij}(s)$ not preserved in the transcript.]
Aracena & Demongeot, Acta Biotheoretica 52, 391 (2004)

6 Interaction matrix
and then take
[Equation for $m_{ij}$ not preserved in the transcript; it involves a de-correlation threshold.]
Alternatively, one may identify the system with a Boolean neural network.
When it is impossible to obtain all the coefficients of M in this manner (either from the literature or from such calculations), it may be possible to complete M by applying a heuristic approach.
Aracena & Demongeot, Acta Biotheoretica 52, 391 (2004)
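Because the two formulas are missing from the transcript, the following is only one plausible reading of the correlation approach, not the authors' definition. It assumes a Pearson correlation between x_j(t − s) and x_i(t), and sets m_ij to the sign of the correlation whenever its absolute value exceeds the de-correlation threshold. The function names, the toy time series, and the threshold value 0.5 are all hypothetical.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (c - mean_v) for a, c in zip(u, v))
    sd_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = sqrt(sum((c - mean_v) ** 2 for c in v))
    if sd_u == 0 or sd_v == 0:
        return 0.0
    return cov / (sd_u * sd_v)


def lagged_correlation(series, i, j, s):
    """Correlation between x_j(t - s) and x_i(t) over the observed time points."""
    x_i = series[i][s:]
    x_j = series[j][:len(series[j]) - s]
    return pearson(x_j, x_i)


def estimate_interactions(series, s=1, threshold=0.5):
    """Assumed rule: m_ij = sign(rho_ij(s)) if |rho_ij(s)| > threshold, else 0."""
    n = len(series)
    M = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            rho = lagged_correlation(series, i, j, s)
            if abs(rho) > threshold:
                M[i][j] = 1 if rho > 0 else -1
    return M


# Hypothetical expression time series (rows = genes, columns = time points).
series = [
    [1, 1, 0, 1, 0, 0, 1, 0],   # gene 0
    [0, 1, 1, 0, 1, 0, 0, 1],   # gene 1 copies gene 0 one step later
    [1, 0, 0, 1, 0, 1, 1, 0],   # gene 2 is the complement of gene 0 one step later
]
M = estimate_interactions(series, s=1)
print(M[1][0], M[2][0])   # activation of gene 1 and inhibition of gene 2 by gene 0
```

Entries other than `M[1][0]` and `M[2][0]` can also be nonzero for such a short series, which is one reason a threshold is needed at all.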

7 Estimation of interaction values
We may randomly choose the missing coefficients by considering
- the connectivity coefficient K(M) = I / N, the ratio between the number I of interactions and the number N of genes, and
- the mean inhibition weight I(M) = R / I, the ratio between the number R of inhibitions and I.
For many known operons and regulation networks, K(M) is between 1.5 and 3, and I(M) is between 1/3 and 2/3.
If M is structurally stable, then the random estimate of M can be used to obtain an approximate picture of the control mechanisms of the regulatory network.
Aracena & Demongeot, Acta Biotheoretica 52, 391 (2004)
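The two coefficients are simple counts, and they can steer a random completion of M. The sketch below computes K(M) and I(M) and fills unknown entries (marked `None`) so that a target K(M) is reached, drawing each new interaction as an inhibition with probability equal to a target I(M). The filling strategy, the function names, and the example matrix are assumptions made for illustration; the slide does not specify how the random choice is made.

```python
import random


def connectivity(M):
    """K(M) = I / N: number of interactions divided by number of genes."""
    interactions = sum(1 for row in M for m in row if m)
    return interactions / len(M)


def inhibition_weight(M):
    """I(M) = R / I: number of inhibitions divided by number of interactions."""
    interactions = sum(1 for row in M for m in row if m)
    inhibitions = sum(1 for row in M for m in row if m and m < 0)
    return inhibitions / interactions if interactions else 0.0


def fill_missing(M, target_k, target_i, rng):
    """Replace None entries so that K(M) reaches target_k where possible."""
    n = len(M)
    known = sum(1 for row in M for m in row if m)
    missing = [(i, j) for i, row in enumerate(M)
               for j, m in enumerate(row) if m is None]
    # Number of additional interactions needed to reach the target connectivity.
    need = min(len(missing), max(0, round(target_k * n) - known))
    chosen = set(rng.sample(missing, need))
    filled = [list(row) for row in M]
    for i, j in missing:
        if (i, j) in chosen:
            filled[i][j] = -1 if rng.random() < target_i else 1
        else:
            filled[i][j] = 0
    return filled


# Hypothetical 4-gene matrix; None marks a coefficient that is not known.
partial = [[0, 1, None, -1],
           [1, 0, None, None],
           [None, -1, 0, 1],
           [0, None, 1, 0]]
completed = fill_missing(partial, target_k=2.0, target_i=0.5, rng=random.Random(0))
print(connectivity(completed), inhibition_weight(completed))
```

Repeating the completion with different seeds gives an ensemble of candidate matrices whose fixed configurations can then be compared.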

8 Mathematical aspects of the inverse problem
A network with two or more connected components, i.e. two or more sub-networks, has as its fixed configurations the combinations (Cartesian product) of the fixed configurations of each sub-network. We say that the fixed configurations are factorizable.
Thus, the inverse problem consists of determining whether a set of fixed configurations is factorizable. In this way, we can obtain some information on the connectivity of the network.
Aracena & Demongeot, Acta Biotheoretica 52, 391 (2004)
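The product structure can be checked by brute force on a small case. This sketch assumes the threshold update x_i(t+1) = H(Σ_j m_ij x_j(t) − b_i) with H(y) = 1 for y ≥ 0, enumerates all fixed configurations of two hypothetical two-gene sub-networks, and compares them with those of the combined, disconnected four-gene network. All matrices, thresholds, and function names are illustrative assumptions.

```python
from itertools import product


def step(M, b, x):
    """Synchronous threshold update; returns the next state as a tuple."""
    return tuple(1 if sum(m * xj for m, xj in zip(row, x)) - bi >= 0 else 0
                 for row, bi in zip(M, b))


def fixed_points(M, b):
    """All states x with step(x) == x, found by enumerating the 2^n states."""
    return {x for x in product((0, 1), repeat=len(M)) if step(M, b, x) == x}


def block_diagonal(A, B):
    """Interaction matrix of two sub-networks with no edges between them."""
    top = [list(row) + [0] * len(B) for row in A]
    bottom = [[0] * len(A) + list(row) for row in B]
    return top + bottom


# Hypothetical sub-network 1: mutual inhibition.
A, b_A = [[0, -1], [-1, 0]], [-0.5, -0.5]
# Hypothetical sub-network 2: mutual activation.
B, b_B = [[0, 1], [1, 0]], [0.5, 0.5]

fp_A = fixed_points(A, b_A)
fp_B = fixed_points(B, b_B)
fp_all = fixed_points(block_diagonal(A, B), b_A + b_B)

print(sorted(fp_all))
print(fp_all == {a + c for a in fp_A for c in fp_B})
```

For these two sub-networks the combined fixed configurations are 0100, 0111, 1000 and 1011, i.e. the product {01, 10} × {00, 11}.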

9 Factorization
Given $S \subseteq \{0,1\}^n$ and a permutation function $\sigma : \{1,\ldots,n\} \to \{1,\ldots,n\}$, we denote by $(S)_\sigma$, or simply $S_\sigma$, the set $\{ s_{\sigma(1)} s_{\sigma(2)} \ldots s_{\sigma(n)} : s_1 s_2 \ldots s_n \in S \}$.
A set $S \subseteq \{0,1\}^n$ is said to be factorizable if there exist sets of vectors $S_1 \subseteq \{0,1\}^{j(1)}$, $S_2 \subseteq \{0,1\}^{j(2)}$, ..., $S_k \subseteq \{0,1\}^{j(k)}$ and a permutation function $\sigma : \{1,\ldots,n\} \to \{1,\ldots,n\}$ such that S can be written as
$$S = (S_1 \times S_2 \times \cdots \times S_k)_\sigma,$$
where the symbol "×" is the Cartesian product between sets.
If S is a factorizable set, then j(1) + j(2) + ... + j(k) = n.
The set $F = \{S_1, S_2, \ldots, S_k\}$ is called a factorization of S, and each $S_j \in F$ a factor of S. F is called a maximal factorization if no factor $S_j \in F$ is itself factorizable.
Aracena & Demongeot, Acta Biotheoretica 52, 391 (2004)

10 Examples
i) S = {0100, 0111, 1000, 1011} = {01, 10} × {00, 11}. Here, the permutation function is the identity.
ii) S = {0010, 0111, 1000, 1101} = $(\{0100, 0111, 1000, 1011\})_{\sigma_{(2,3)}}$ = $(\{01, 10\} \times \{00, 11\})_{\sigma_{(2,3)}}$, where $\sigma_{(2,3)}$ is the function that permutes the second and third coordinates.
Given the sets $I \subseteq \{1,\ldots,n\}$ and $S \subseteq \{0,1\}^n$, let $P_I(S)$ be the projection set defined by
$$P_I(S) = \{ (s_{j(1)}, s_{j(2)}, \ldots, s_{j(|I|)}) : s \in S,\ j(k) \in I,\ k = 1,\ldots,|I|,\ \text{and } j(k) < j(l) \text{ for all } k < l \}.$$
Aracena & Demongeot, Acta Biotheoretica 52, 391 (2004)
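The three set operations used in these definitions (Cartesian product, coordinate permutation, projection) are easy to spell out on bit strings. The following sketch re-derives examples i) and ii); the function names and the encoding of a permutation as a 1-indexed tuple are choices made here for illustration.

```python
from itertools import product


def cartesian(*factors):
    """Cartesian product of sets of bit strings, concatenating the parts."""
    return {"".join(parts) for parts in product(*factors)}


def permute(S, sigma):
    """Apply a permutation: position k of the result holds coordinate sigma[k-1]."""
    return {"".join(s[p - 1] for p in sigma) for s in S}


def project(S, I):
    """P_I(S): keep only the coordinates in I (1-indexed), in increasing order."""
    idx = sorted(I)
    return {"".join(s[i - 1] for i in idx) for s in S}


# Example i): identity permutation.
S_i = cartesian({"01", "10"}, {"00", "11"})
print(sorted(S_i))

# Example ii): swap the second and third coordinates.
swap_2_3 = (1, 3, 2, 4)
S_ii = permute(S_i, swap_2_3)
print(sorted(S_ii))

# Projections of example i) recover the two factors.
print(sorted(project(S_i, {1, 2})), sorted(project(S_i, {3, 4})))
```

Projecting example ii) onto {1, 3} and {2, 4} gives the same two factors, which is what the permutation $\sigma_{(2,3)}$ expresses.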

11 Proposition 2
If a set $S \subseteq \{0,1\}^n$ is factorizable, then the maximal factorization of S is unique.
Proof. Let $F = \{S_1, S_2, \ldots, S_k\}$ and $G = \{T_1, T_2, \ldots, T_m\}$ be two distinct maximal factorizations of S:
$$S = (S_1 \times S_2 \times \cdots \times S_k)_{\sigma_1} = (T_1 \times T_2 \times \cdots \times T_m)_{\sigma_2}.$$
Hence, the permutation $\sigma = (\sigma_1)^{-1} \circ \sigma_2$ is such that
$$S_1 \times S_2 \times \cdots \times S_k = (T_1 \times T_2 \times \cdots \times T_m)_\sigma.$$
Since F and G are distinct maximal factorizations, there is a factor of F that is not included in G, which we suppose to be $S_1 \subseteq \{0,1\}^q$, $q \in \{1,\ldots,n\}$.
Let $T = T_1 \times T_2 \times \cdots \times T_m$, so $S_1 = P_{\{1,\ldots,q\}}(T_\sigma)$.
Hence, if we denote by $I(k) \subseteq \{1,\ldots,n\}$ the set of indices such that $P_{I(k)}(T) = T_k$ for every k = 1,...,m, and by $J = \{ j \in \{1,\ldots,m\} : I(j) \cap \{\sigma(1),\ldots,\sigma(q)\} \neq \emptyset \}$, then there exists a permutation function $\sigma'$ such that
[Equation not preserved in the transcript.]
Therefore, $S_1$ is factorizable, a contradiction.
Aracena & Demongeot, Acta Biotheoretica 52, 391 (2004)

12 Algorithm
Let $\Delta : \{0,1\}^n \times \{0,1\}^n \to \mathcal{P}(\{1,\ldots,n\})$ be the function called the difference function, where $\mathcal{P}(\{1,\ldots,n\})$ is the set of subsets of {1,...,n}, defined by $\Delta(x,y) = \{ i : x_i \neq y_i \}$ for $x, y \in \{0,1\}^n$.
Given $S \subseteq \{0,1\}^n$, the idea of the Factorization algorithm is first to construct a matrix with all the values of $\Delta(x,y)$ for every $x, y \in S$.
Next, for each row i of the matrix we construct a finite, undirected graph $G_i = (V_i, E_i)$, where the set of nodes $V_i$ is equal to the set {1,...,n} and the set of arcs $E_i$ is determined by the values of each row of the matrix, according to the algorithm.
Finally, the connected components of the union of all graphs $G_i$ determine the factors of the maximal factorization of S.
If S is not factorizable, the output of the algorithm is a graph with a single connected component.
Aracena & Demongeot, Acta Biotheoretica 52, 391 (2004)
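The first step, building the matrix of differences, follows directly from the definition Δ(x, y) = {i : x_i ≠ y_i}. The sketch below uses the six vectors of Example 2 from this lecture as input; the function names are chosen here, and the later graph-construction step is not covered because its arc rule is not spelled out in the text.

```python
def difference(x, y):
    """Delta(x, y): the 1-indexed coordinates at which x and y differ."""
    return {i + 1 for i, (a, c) in enumerate(zip(x, y)) if a != c}


def difference_matrix(S):
    """Matrix whose entry (r, c) is Delta(S[r], S[c])."""
    return [[difference(x, y) for y in S] for x in S]


# The vectors x_1 ... x_6 of Example 2.
S = ["000", "001", "100", "010", "011", "110"]
D = difference_matrix(S)

for x, row in zip(S, D):
    print(x, [sorted(cell) for cell in row])
```

The matrix is symmetric with empty sets on the diagonal, so only the entries above the diagonal carry information.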

13 Algorithm
[Pseudocode figure not preserved in the transcript.]
Aracena & Demongeot, Acta Biotheoretica 52, 391 (2004)

14 Theorem 3
Given a set $S \subseteq \{0,1\}^n$, if $\mathcal{I} = \{ I(1), I(2), \ldots, I(k) \}$ is the output of the Factorization algorithm with input S, then $F = \{ P_{I(i)}(S) : i = 1,\ldots,k \}$ is the maximal factorization of S, and the complexity of the algorithm is $O(|S|^3 + n^2)$.
Aracena & Demongeot, Acta Biotheoretica 52, 391 (2004)

15 Example 2
Let S = { $x_1$ = 000, $x_2$ = 001, $x_3$ = 100, $x_4$ = 010, $x_5$ = 011, $x_6$ = 110 }.
[Figures not preserved in the transcript: the difference matrix, the partial graphs, and the output graph of the algorithm.]
The output is I(1) = {1,3} and I(2) = {2}.
→ The maximal factorization of S is given by
$$S = (P_{I(1)}(S) \times P_{I(2)}(S))_{\sigma_{(2,3)}} = (\{00, 01, 10\} \times \{0, 1\})_{\sigma_{(2,3)}},$$
where $\sigma_{(2,3)}$ is the permutation of the second and third coordinates.
Aracena & Demongeot, Acta Biotheoretica 52, 391 (2004)

16 Example 3
The following set of vectors corresponds to the observed fixed points of the A. thaliana regulatory network, considering only genes whose activity is not constant.
Let S = { $x_1$ = 0010000, $x_2$ = 0011011, $x_3$ = 0000100, $x_4$ = 0001111, $x_5$ = 1100000, $x_6$ = 1101011 }.
[Figures not preserved in the transcript: the difference matrix and the graph G of the algorithm.]
The connected components are I(1) = {1,2,3,5} and I(2) = {4,6,7}.
The maximal factorization of S is given by
$$S = (P_{I(1)}(S) \times P_{I(2)}(S))_{\sigma_{(4,5)}} = (\{0010, 0001, 1100\} \times \{000, 111\})_{\sigma_{(4,5)}}.$$
Aracena & Demongeot, Acta Biotheoretica 52, 391 (2004)
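Both examples can be checked independently of the graph-based algorithm, whose details are not preserved here. The sketch below is a brute-force substitute, not the Factorization algorithm of Aracena & Demongeot: it searches for a split of the coordinates into two blocks I and J such that |P_I(S)| · |P_J(S)| equals the size of S on those coordinates (which holds exactly when S is the product of the two projections) and then recurses on each block. It is exponential in n and only meant for small sets; all function names are chosen here.

```python
from itertools import combinations


def project(S, idx):
    """Projection of S onto the (0-indexed) coordinates in idx."""
    return {tuple(s[i] for i in idx) for s in S}


def find_split(S, coords):
    """Return blocks (I, J) with S|coords = S|I x S|J, or None if there are none."""
    whole = len(project(S, coords))
    first, rest = coords[0], coords[1:]
    # I always contains the first coordinate and never all of them.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            I = [first, *extra]
            J = [c for c in rest if c not in extra]
            if len(project(S, I)) * len(project(S, J)) == whole:
                return I, J
    return None


def blocks(S, coords):
    """Recursively split the coordinates until no block can be split further."""
    if len(coords) <= 1:
        return [coords]
    found = find_split(S, coords)
    if found is None:
        return [coords]
    I, J = found
    return blocks(S, I) + blocks(S, J)


def maximal_factorization(S):
    """Coordinate blocks (1-indexed) of the maximal factorization of S."""
    n = len(next(iter(S)))
    return sorted([c + 1 for c in blk] for blk in blocks(S, list(range(n))))


example_2 = {"000", "001", "100", "010", "011", "110"}
example_3 = {"0010000", "0011011", "0000100", "0001111", "1100000", "1101011"}

print(maximal_factorization(example_2))
print(maximal_factorization(example_3))
```

For a set that is not factorizable the function returns a single block containing all coordinates, which mirrors the single connected component produced by the graph-based algorithm.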

17 Design principles of regulatory networks
Wiring diagrams of regulatory networks somewhat resemble electrical circuits.
The aim is to break networks down into basic building blocks.
Search for "network motifs": patterns of interconnections that recur in many different parts of a network at frequencies much higher than those found in randomized networks.
Shen-Orr et al. Nature Gen. 31, 64 (2002)
Uri Alon, Weizmann Institute

18 Detection of motifs
Represent the transcriptional network as a connectivity matrix M such that $M_{ij} = 1$ if operon j encodes a TF that transcriptionally regulates operon i, and $M_{ij} = 0$ otherwise.
Scan all n × n submatrices of M generated by choosing n nodes that lie in a connected graph, for n = 3 and n = 4.
Submatrices were enumerated efficiently by recursively searching for nonzero elements.
Compute a P value for the submatrices representing each type of connected subgraph by comparing the number of times they appear in the real network with the number of times they appear in random networks.
For n = 3, the only significant motif is the feedforward loop.
For n = 4, only the overlapping regulation motif is significant.
SIMs and multi-input modules were identified by searching for identical rows of M.
Shen-Orr et al. Nature Gen. 31, 64 (2002)
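For n = 3 the counting step can be sketched directly. The code below counts feedforward loops (X regulates Y, and both X and Y regulate Z) in a connectivity matrix with the convention M[i][j] = 1 if j regulates i, and estimates an empirical P value against randomized matrices. The null model used here, shuffling all off-diagonal entries so that only the number of edges is preserved, is a deliberately simple assumption and is not claimed to be the randomization used by Shen-Orr et al. The toy matrix and function names are hypothetical.

```python
import random
from itertools import permutations


def count_ffl(M):
    """Number of ordered triples (x, y, z) with x->y, x->z and y->z."""
    n = len(M)
    return sum(1 for x, y, z in permutations(range(n), 3)
               if M[y][x] and M[z][x] and M[z][y])


def randomize(M, rng):
    """Shuffle the off-diagonal entries; the diagonal (autoregulation) is kept."""
    n = len(M)
    cells = [(i, j) for i in range(n) for j in range(n) if i != j]
    values = [M[i][j] for i, j in cells]
    rng.shuffle(values)
    R = [list(row) for row in M]
    for (i, j), v in zip(cells, values):
        R[i][j] = v
    return R


def empirical_p(M, trials=1000, seed=0):
    """Fraction of randomized networks with at least as many loops as M."""
    rng = random.Random(seed)
    real = count_ffl(M)
    hits = sum(count_ffl(randomize(M, rng)) >= real for _ in range(trials))
    return (hits + 1) / (trials + 1)


# Hypothetical 4-operon network: operon 0 regulates 1, 2 and 3; operon 1 regulates 2.
M = [[0, 0, 0, 0],
     [1, 0, 0, 0],
     [1, 1, 0, 0],
     [1, 0, 0, 0]]

print(count_ffl(M))
print(empirical_p(M, trials=200))
```

On a network this small the P value is not meaningful; the point is only the shape of the comparison between the real count and the counts in randomized networks.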

19 DOR detection
Consider all operons regulated by ≥ 2 TFs.
Define a (nonmetric) distance measure between operons k and j, based on the number of TFs regulating both operons:
$$d(k,j) = \frac{1}{\left(1 + \sum_n f_n M_{k,n} M_{j,n}\right)^2}$$
where $f_n = 0.5$ for global TFs and $f_n = 1$ otherwise.
Cluster the operons with the average-linkage algorithm.
DORs correspond to clusters with more than 10 connections and a ratio of connections to TFs > 2.
Shen-Orr et al. Nature Gen. 31, 64 (2002)
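The distance measure can be written out in a few lines. This sketch reads the formula as d(k, j) = 1 / (1 + Σ_n f_n M_{k,n} M_{j,n})², with rows of M indexing operons and columns indexing TFs; the placement of the square is an interpretation of the garbled slide text. The toy matrix, the choice of which TF counts as global, and the function name are hypothetical.

```python
def dor_distance(M, k, j, global_tfs=frozenset()):
    """Distance between operons k and j based on the TFs regulating both.

    M[k][n] is 1 if TF n regulates operon k. Global TFs are down-weighted
    with f_n = 0.5; all other TFs have f_n = 1.
    """
    shared = sum((0.5 if n in global_tfs else 1.0) * M[k][n] * M[j][n]
                 for n in range(len(M[k])))
    return 1.0 / (1.0 + shared) ** 2


# Hypothetical regulation matrix: 3 operons (rows) and 4 TFs (columns).
M = [[1, 1, 1, 0],
     [0, 1, 1, 0],
     [0, 0, 0, 1]]

print(dor_distance(M, 0, 1))                   # two shared TFs
print(dor_distance(M, 0, 1, global_tfs={1}))   # one of them treated as global
print(dor_distance(M, 0, 2))                   # no shared TF
```

Operons that share no TF are at the maximum distance 1. The resulting pairwise distances could then be handed to any average-linkage clustering routine, for example `scipy.cluster.hierarchy.linkage` with `method="average"`.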

20 Network motifs found in the E. coli transcriptional regulation network
a, Feedforward loop: a TF X regulates a second TF Y, and both jointly regulate one or more operons $Z_1 \ldots Z_n$.
b, Example of a feedforward loop (L-arabinose utilization).
c, SIM motif: a single TF, X, regulates a set of operons $Z_1 \ldots Z_n$. X is usually autoregulatory. All regulations are of the same sign. No other transcription factor regulates the operons.
d, Example of a SIM system (arginine biosynthesis).
e, DOR motif: a set of operons $Z_1 \ldots Z_m$ are each regulated by a combination of a set of input transcription factors, $X_1 \ldots X_n$. The DOR algorithm detects dense regions of connections, with a high ratio of connections to transcription factors.
f, Example of a DOR (stationary phase response).
Shen-Orr et al. Nature Gen. 31, 64 (2002)

21 Significance of motifs
Shen-Orr et al. Nature Gen. 31, 64 (2002)

22 Regulatory network
Each TF appears only in a single subgraph, except for global TFs, which can appear in several subgraphs.
Shen-Orr et al. Nature Gen. 31, 64 (2002)

23 Structural organization of transcriptional regulatory networks
Modules: regulatory networks are observed to be highly interconnected; very few modules can be entirely separated from the rest of the network.
Babu et al. Curr Opin Struct Biol. 14, 283 (2004)

24 Evolution of the gene regulatory network
Larger genomes tend to have more TFs per gene.
Babu et al. Curr Opin Struct Biol. 14, 283 (2004)

25 Cross-organism comparison
Many TF families are specific to individual phylogenetic groups or greatly expanded in some genomes.
In contrast to the high level of conservation of other regulatory and signalling systems across the crown group eukaryotes, some of the TF families are dramatically different in the various lineages.
Babu et al. Curr Opin Struct Biol. 14, 283 (2004)

26 Regulatory interactions across organisms
Are regulatory interactions conserved among organisms? Apparently yes: orthologous TFs regulate orthologous target genes.
As expected, the conservation of genes and interactions is related to the phylogenetic distance between organisms.
Above: many interactions of (a) can be mapped to the pathogenic Pseudomonas aeruginosa, which is related to E. coli (b). Very few interactions can be mapped from (a) to (c).
Babu et al. Curr Opin Struct Biol. 14, 283 (2004)

27 Regulatory interactions across organisms
Observation: there is no bias towards conservation of network motifs. Regulatory interactions in motifs are lost or retained at the same rate as the other interactions in the network.
→ The transcriptional network appears to evolve in a step-wise manner, with loss and gain of individual interactions probably playing a greater role than loss and gain of whole motifs or modules.
Observation: TFs are less conserved than target genes, which suggests that the regulation of genes evolves faster than the genes themselves.
Babu et al. Curr Opin Struct Biol. 14, 283 (2004)

28 Analysis of the complexome during the cell cycle
Most research on biological networks has focused on static topological properties, describing networks as collections of nodes and edges rather than as dynamic structural entities.
This study focuses on the temporal aspects of networks, which makes it possible to study the dynamics of protein complex assembly during the Saccharomyces cerevisiae cell cycle.
The integrative approach combines protein-protein interactions with information on the timing of the transcription of specific genes during the cell cycle, obtained from the DNA microarray time series shown before.
→ a quality-controlled set of 600 periodically expressed genes, each assigned to the point in the cell cycle where its expression peaks.
Science 307, 724 (2005)
Ulrik Lichtenberg, Peer Bork

29 Temporal protein interaction network in the yeast cell cycle
Cell cycle proteins that are part of complexes or other physical interactions are shown within the circle. For the dynamic proteins, the time of peak expression is shown by the node color; static proteins are represented as white nodes. Outside the circle, the dynamic proteins without interactions are positioned and colored according to their peak time.
Science 307, 724 (2005)

30 Just-in-time synthesis vs. just-in-time assembly
Transcription of cell cycle-regulated genes is generally thought to be turned on when or just before their protein products are needed; this is often referred to as just-in-time synthesis.
Contrary to the cell cycle in bacteria, however, just-in-time synthesis of entire complexes is rarely observed in the network.
The only large complex to be synthesized in its entirety just in time is the nucleosome, all subunits of which are expressed in S phase to produce nucleosomes during DNA replication.
Instead, the general design principle appears to be that only some subunits of each complex are transcriptionally regulated in order to control the timing of final assembly.
Science 307, 724 (2005)

31 Something spectacular at the end
Integrate transcriptional regulatory information and gene-expression data for multiple conditions in Saccharomyces cerevisiae.
5 conditions:
- cell cycle
- sporulation
- diauxic shift
- DNA damage
- stress response
Luscombe, Babu, … Teichmann, Gerstein, Nature 431, 308 (2004)
Sarah Teichmann, Mark Gerstein

32 SANDY: topological measures + network motifs + some post-analysis
Luscombe et al. Nature 431, 308 (2004)

33 Dynamic representation of the transcriptional regulatory network
a, Schematics and summary of properties for the endogenous and exogenous sub-networks.
b, Graphs of the static and condition-specific networks. Transcription factors and target genes are shown as nodes in the upper and lower sections of each graph respectively, and regulatory interactions are drawn as edges; they are coloured by the number of conditions in which they are active. Different conditions use distinct sections of the network.
c, Standard statistics (global topological measures and local network motifs) describing network structures. These vary between endogenous and exogenous conditions; those that are high compared with other conditions are shaded. (Note, the graph for the static state displays only sections that are active in at least one condition, but the table provides statistics for the entire network including inactive regions.)
Luscombe, Babu, … Teichmann, Gerstein, Nature 431, 308 (2004)

34 Interpretation
Half of the targets are uniquely expressed in only one condition; in contrast, most TFs are used across multiple processes.
The active sub-networks maintain or rewire regulatory interactions; over half of the active interactions are completely supplanted by new ones between conditions.
Only 66 interactions are retained across ≥ 4 conditions. They are always "on" and mostly regulate house-keeping functions.
The calculations divide the 5 condition-specific networks into 2 categories: endogenous and exogenous.
Endogenous processes are multi-stage and operate with an internal transcriptional program.
Exogenous processes are binary events that react to external stimuli with a rapid turnover of expressed genes.
Luscombe et al. Nature 431, 308 (2004)

35 Figure 2: Newly derived 'follow-on' statistics for network structures.
a, TF hub usage in different cellular conditions. The cluster diagram shades cells by the normalized number of genes targeted by TF hubs in each condition. One cluster represents permanent hubs and the others condition-specific transient hubs. Genes are labelled with four-letter names when they have an obvious functional role in the condition, and seven-letter open reading frame names when there is no obvious role. Of the latter, gene names are red and italicised when functions are poorly characterized. Starred hubs show extreme interchange index values, I = 1.
b, Interaction interchange (I) of TFs between conditions. A histogram of I for all active TFs shows a uni-modal distribution with two extremes. Pie charts show five example TFs with different proportions of interchanged interactions. We list the main functions of the distinct target genes regulated by each example transcription factor. Note how the TFs' regulatory functions change between conditions.
c, Overlap in TF usage between conditions. Venn diagrams show the numbers of individual TFs (large intersection) and pair-wise TF combinations (small intersection) that overlap between the two endogenous conditions.
Luscombe et al. Nature 431, 308 (2004)

36 Interpretation
Most hubs (78%) are transient, i.e. they are influential in one condition but less so in others.
Exogenous conditions have fewer transient hubs (different [symbol not preserved in the transcript]).
"Transient hub": capacity to change interactions between connections.
Luscombe et al. Nature 431, 308 (2004)

37 TF inter-regulation during the cell cycle time-course
a, The 70 TFs active in the cell cycle. The diagram shades each cell by the normalized number of genes targeted by each TF in a phase. Five clusters represent phase-specific TFs and one cluster is for ubiquitously active TFs. Both hub and non-hub TFs are included.
b, Serial inter-regulation between phase-specific TFs. Network diagrams show that TFs that are active in one phase regulate TFs in subsequent phases. In the late phases, TFs apparently regulate those in the next cycle.
c, Parallel inter-regulation between phase-specific and ubiquitous TFs in a two-tiered hierarchy. Serial and parallel inter-regulation operate in tandem to drive the cell cycle while balancing it with basic house-keeping processes.
Luscombe et al. Nature 431, 308 (2004)

38 Summary
Integrated analysis of transcriptional regulatory information and condition-specific gene-expression data; post-analysis, e.g.
- identification of permanent and transient hubs
- interchange index
- overlap in TF usage across multiple conditions.
→ Large changes in the underlying network architecture.
→ In response to diverse stimuli, TFs alter their interactions to varying degrees, thereby rewiring the network.
→ Some TFs serve as permanent hubs; most act transiently.
→ Environmental responses facilitate fast signal propagation.
→ Cell cycle and sporulation proceed via multiple stages.
Many of these concepts may also apply to other biological networks.
Luscombe et al. Nature 431, 308 (2004)

