1 CSIE in National Chi-Nan University
How to Reconstruct a Large Genetic Network from n Gene Perturbations in Fewer than n^2 Easy Steps
Speaker: Chuang Chieh Lin
Advisor: Professor R. C. T. Lee
National Chi-Nan University
Andreas Wagner, Bioinformatics, Vol. 17, No. 12, 2001, pp. 1183-1187.

2 Outline
Introduction and basic definitions
Graph theoretical framework
Parsimonious network
Algorithm and complexity
Cycles in genetic networks
Conclusions
References

4 Introduction and basic definitions
Gene activity covers whether a gene is expressed or not, as mRNA, as protein, etc. Gene network: in this paper, a genetic network is defined as a group of genes in which individual genes can influence the activity of other genes. The core task of reconstructing a genetic network is to identify its causal structure.

5 To reconstruct a genetic network is to identify, for each gene in the network, which other genes' activity it influences directly. Now, let’s see an illustration of a genetic network.

6 [Figure: a stretch of DNA carrying Gene 1 to Gene 5, with the labels transcription factor, protein kinase, protein phosphatase, transcription factor and protein, phosphorylation marks P, and the states active, inactive, active.]
This is a hypothetical biochemical pathway involving two transcription factors, a protein kinase and a protein phosphatase, as well as the genes encoding them.

7 Genetic perturbation: an experimental manipulation of gene activity, made by manipulating either a gene itself or its product. Examples include point mutations, gene deletions, and other interference with the activity of the gene product.

8 [Figure: the pathway of slide 6 again, with DNA carrying Gene 1 to Gene 5, the transcription factors, the protein kinase and the protein phosphatase.]
Genetic perturbation: gene deletion
Aspect of gene activity: mRNA expression | Aspect of gene activity: phosphorylation state
G1: G2, G5 | G1: G3, G4
G2: G5 | G2: G3, G4
G3: G5 | G3: G4
G4: G5 | G4:
G5:

10 Graph theoretical framework
As the previous example indicated, we are concerned with qualitative information on gene interactions. We use a “digraph”, a graph representation of a genetic network, to capture this qualitative information. A digraph is a directed graph consisting of nodes and directed edges. Let’s see an example.

11 We use a → b to mean that gene a influences the activity of gene b directly. For brevity, genes will be labeled by numbers from now on.
[Figure: a digraph on 21 nodes labeled 0 to 20.]

12 Adjacency list: for each gene i, it shows which genes’ activity states gene i influences directly. We denote by Adj(G) the adjacency list of graph G and by Adj(i) the set of nodes (genes) adjacent to (directly influenced by) node i.

13 [Figure: the digraph G on nodes 0 to 20.]
Adjacency list of G:
0: 16
1:
2:
3: 2 5 8
4:
5: 12
6: 5 12
7: 2 17
8:
9: 10 15
10: 1 20
11: 20
12: 14
13: 8 17
14: 0
15: 0
16: 2
17: 8
18:
19: 8
20: 6 18

14 Accessibility list: the list of perturbation effects, or the list of regulatory effects. It shows all nodes (genes) that can be accessed (influenced in their activity state) from a given gene by paths of direct interactions. We denote by Acc(G) the accessibility list of the graph G and by Acc(i) the set of nodes that can be reached (influenced) from node (gene) i.

15 [Figure: the digraph G on nodes 0 to 20.]
Accessibility list of G:
0: 2 16
1:
2:
3: 0 2 5 8 12 14 16
4:
5: 0 2 12 14 16
6: 0 2 5 12 14 16
7: 2 8 17
8:
9: 0 1 2 5 6 10 12 14 15 16 18 20
10: 0 1 2 5 6 12 14 16 18 20
11: 0 2 5 6 12 14 16 18 20
12: 0 2 14 16
13: 8 17
14: 0 2 16
15: 0 2 16
16: 2
17: 8
18:
19: 8
20: 0 2 5 6 12 14 16 18
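
As a rough sketch of how the accessibility list follows from the adjacency list, the Python below runs a depth-first search from every node of the 21-node example. The name accessibility_list and the dictionary layout are assumptions made for illustration; only the adjacency data is taken from the slide.

```python
# Adjacency list of the 21-node example graph G, copied from the slide.
ADJ = {
    0: [16], 1: [], 2: [], 3: [2, 5, 8], 4: [], 5: [12], 6: [5, 12],
    7: [2, 17], 8: [], 9: [10, 15], 10: [1, 20], 11: [20], 12: [14],
    13: [8, 17], 14: [0], 15: [0], 16: [2], 17: [8], 18: [], 19: [8],
    20: [6, 18],
}


def accessibility_list(adj):
    """Return {node: sorted nodes reachable by a path of at least one edge}."""
    acc = {}
    for start in adj:
        seen = set()
        stack = list(adj[start])      # begin with the direct targets
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(adj[node])
        acc[start] = sorted(seen)
    return acc


ACC = accessibility_list(ADJ)
print(ACC[9])   # [0, 1, 2, 5, 6, 10, 12, 14, 15, 16, 18, 20]
```

The printed row agrees with the entry for node 9 in the accessibility list of G.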

17 Before proceeding with the algorithm, we first need some concepts and theorems.

18 The most parsimonious network
An acyclic digraph defines its accessibility list, but an accessibility list may have more than one corresponding acyclic digraph. Let’s see an example first.

19 (a) An accessibility list Acc:
0: 1 2 3 4 5
1: 2 3 4 5
2: 3 4 5
3:
4: 5
5:
(b), (c), (d) [Figures: three digraphs on the nodes 0 to 5.]
(d) is the most parsimonious network of Acc, i.e., of (a).

20 An accessibility list Acc and a digraph G are compatible if G has Acc as its accessibility list. Acc is then the accessibility list induced by G. The compatible digraph with the fewest edges, G_pars, is called the most parsimonious network compatible with Acc.

21 Why do we prefer the most parsimonious network?
We prefer the simplest, or most parsimonious, gene network. For any accessibility list Acc of a digraph G, there exists a most parsimonious network G_pars (this follows from a theorem). G_pars is therefore the core of all the corresponding digraphs. More complicated digraphs are harder to interpret.

22 Theorem 1
Let Acc be the accessibility list of an acyclic digraph. Then there exists exactly one graph G_pars that has Acc as its accessibility list and that has fewer edges than any other graph G with Acc as its accessibility list.
Before starting the proof, we need to introduce some terminology.

23 Range and shortcut
Consider two nodes i and j of a digraph that are connected by an edge e. The range r of the edge e is the length of the shortest path between i and j in the absence of e. If there is no other path connecting i and j, then r := ∞. An edge e with range r ≥ 2 but r ≠ ∞ is called a shortcut. Let’s see an example.

24 [Figure: the nodes i and j are joined by the edge e and also by a path through the intermediate nodes z_1, z_2, …, z_(k-1), z_k.]
e is a shortcut. When e is eliminated, i and j are still connected by a path of length k + 1, so r(e) = k + 1.
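
To make the definition of range concrete, here is a small sketch that computes r(e) by breadth-first search while ignoring the edge e itself. The names edge_range and is_shortcut are hypothetical, and the sketch assumes that "path between i and j" means a directed path from i to j.

```python
from collections import deque
import math


def edge_range(adj, i, j):
    """Length of the shortest path from i to j that avoids the edge i -> j.

    Returns math.inf when i and j are connected only by that edge.
    """
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if u == i and v == j:
                continue              # skip the edge e itself
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist.get(j, math.inf)


def is_shortcut(adj, i, j):
    """An edge is a shortcut when its range is at least 2 and finite."""
    return 2 <= edge_range(adj, i, j) < math.inf


# Edge e = i -> j plus a detour through k = 2 intermediate nodes.
example = {"i": ["j", "z1"], "z1": ["z2"], "z2": ["j"], "j": []}
print(edge_range(example, "i", "j"))   # 3, that is k + 1
print(is_shortcut(example, "i", "j"))  # True
```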

25 Lemma 1
For any accessibility list Acc of a digraph, there exists a compatible graph G_pars that is free of shortcuts.

26 Proof of Lemma 1
Assume that there is no such graph G_pars. If there exists a shortcut e_i between x_i and y_i, delete e_i. Then, by the definition of a shortcut, x_i and y_i are still connected via a path P_i whose length is greater than 1.
[Figure: x_i and y_i joined by the edge e_i and by the path P_i; after deleting e_i, only P_i remains.]

27 Suppose that there are n such pairs (x_i, y_i), i.e., (x_1, y_1), …, (x_n, y_n). After repeating the deletion for every pair (x_i, y_i), i = 1, …, n, we obtain a shortcut-free graph compatible with the accessibility list. This contradicts the assumption made at the beginning of the proof.

28 Lemma 2
Assume that Acc is the accessibility list of a digraph G. For each node x, the adjacency list Adj(x) of a shortcut-free graph G_pars compatible with Acc is a subset of the adjacency list Adj(x) of any graph compatible with Acc.

29 Proof of Lemma 2
Assume that Lemma 2 is false. W.L.O.G., suppose that a shortcut-free graph G_pars and some other graph G both induce Acc. By assumption, G_pars contains at least one node x such that Adj(x) of G_pars contains at least one node y that is not in Adj(x) of G.

30 Because G and G_pars have the same accessibility list Acc, there must exist some path x → z_1 → z_2 → … → z_k → y from x to y in G. For the same reason, z_1 is accessible from x in G_pars, z_2 from z_1 in G_pars, …, z_k from z_(k-1) in G_pars, and y from z_k in G_pars. Therefore we can find two paths from x to y in G_pars:
(1) the edge e between x and y
(2) a path from x to y that passes through z_1, z_2, …, z_k
This contradicts the assumption that G_pars is shortcut-free, because e is a shortcut. Let’s see an example!

31 Acc:
x: z_1 z_2 y
z_1: z_2 y
z_2: y
Adj(G_pars):
x: z_1 y
z_1: z_2
z_2: y
Adj(G):
x: z_1 z_2
z_1: z_2
z_2: y
[Figure: the graphs G and G_pars on the nodes x, z_1, z_2, y; one edge of G_pars is marked “A shortcut!”.]

32 Corollary 1
The shortcut-free graph G_pars compatible with Acc is the unique graph with the fewest edges among all graphs G compatible with Acc. This corollary follows immediately from Lemma 2.

33 Now we can proceed to the algorithm.

35 A recursive pruning algorithm to reconstruct the most parsimonious graph from an accessibility list:
 1: for all nodes i of G
 2:     Adj(i) = Acc(i)
 3: for all nodes i of G
 4:     if node i hasn’t been visited
 5:         call PRUNE_ACC(i)
 6:     end if
 7: PRUNE_ACC(i)
 8:     for all nodes j ∈ Acc(i)
 9:         if Acc(j) = ∅
10:             declare j as visited
11:         else
12:             call PRUNE_ACC(j)
13:         end if
14:     for all nodes j ∈ Acc(i)
15:         for all nodes k ∈ Adj(j)
16:             if k ∈ Acc(i)
17:                 delete k from Adj(i)
18:             end if
19:     declare node i as visited
20: end PRUNE_ACC(i)
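
The pseudocode can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: the name most_parsimonious is hypothetical, the input must be the accessibility list of an acyclic digraph, and the sketch skips nodes that are already visited instead of recursing into them again. The three-node chain at the end is a made-up example.

```python
def most_parsimonious(acc):
    """Prune an accessibility list (dict: node -> list of nodes) of an
    acyclic digraph down to the adjacency list of its shortcut-free graph."""
    acc = {i: set(targets) for i, targets in acc.items()}
    adj = {i: set(targets) for i, targets in acc.items()}   # lines 1-2
    visited = set()

    def prune_acc(i):
        for j in acc[i]:                  # lines 8-13
            if j not in visited:          # assumption: finished nodes are skipped
                prune_acc(j)
        for j in acc[i]:                  # lines 14-18
            for k in adj[j]:              # adj[j] is already pruned here
                if k in acc[i]:
                    adj[i].discard(k)
        visited.add(i)                    # line 19

    for i in acc:                         # lines 3-6
        if i not in visited:
            prune_acc(i)
    return {i: sorted(targets) for i, targets in adj.items()}


# Hypothetical chain a -> b -> c, given by its accessibility list.
chain = {"a": ["b", "c"], "b": ["c"], "c": []}
print(most_parsimonious(chain))   # {'a': ['b'], 'b': ['c'], 'c': []}
```

Because the recursion follows the graph, a very deep network could exceed Python's default recursion limit; an explicit stack would avoid that.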

36 This algorithm is based on the following theorem, so we state the theorem first.

37 Theorem 2
Let Acc(G) be the accessibility list of an acyclic digraph, G_pars its most parsimonious graph, and V(G_pars) the set of all nodes of G_pars. Then the following identity holds:
for all i ∈ V(G_pars): Adj(i) = Acc(i) \ ⋃_{j ∈ Acc(i)} Acc(j)
Instead of proving the theorem, we give an example later.

38 Original Acc(G):
0: 1 2 3 4 5
1: 2 3 4 5
2: 3 4 5
3:
4: 5
5:
[Figure: a possible corresponding G on the nodes 0 to 5.]
After the step “0 via 1, 2, 3, 4, 5”:
0: 1
1: 2 3 4 5
2: 3 4 5
3:
4: 5
5:
After the step “1 via 2, 3, 4, 5”:
0: 1
1: 2
2: 3 4 5
3:
4: 5
5:

39 Before the step “2 via 3, 4, 5”:
0: 1
1: 2
2: 3 4 5
3:
4: 5
5:
After the step “2 via 3, 4, 5”:
0: 1
1: 2
2: 3 4
3:
4: 5
5:
After the step “4 via 5”:
0: 1
1: 2
2: 3 4
3:
4: 5
5:
[Figure: the resulting digraph on the nodes 0 to 5.] The most parsimonious network
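
The worked example can also be checked by removing from each Acc(i) everything that is accessible from some node in Acc(i), which is one reading of the pruning steps on these two slides. The sketch below applies that set difference directly to the six-node accessibility list; the function name prune_by_identity is an assumption for illustration.

```python
def prune_by_identity(acc):
    """Adj(i) = Acc(i) minus the union of Acc(j) over all j in Acc(i)."""
    adj = {}
    for i, reachable in acc.items():
        indirect = set()
        for j in reachable:
            indirect.update(acc[j])   # everything reachable via j
        adj[i] = sorted(set(reachable) - indirect)
    return adj


# Original Acc(G) from the example slides.
ACC = {0: [1, 2, 3, 4, 5], 1: [2, 3, 4, 5], 2: [3, 4, 5], 3: [], 4: [5], 5: []}
print(prune_by_identity(ACC))
# {0: [1], 1: [2], 2: [3, 4], 3: [], 4: [5], 5: []}
```

This direct form reads every Acc(j) in full, so it does more work than the recursive algorithm, which only looks at the already pruned Adj(j).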

40 The preceding example is in fact an illustration of our algorithm. From Theorem 2, we can derive Corollary 2.

41 Corollary 2
Let i, j and k be any three pairwise different nodes of an acyclic directed shortcut-free graph G. If j is accessible from i, then no node k accessible from j is adjacent to i.
[Figure: the nodes i, j and k, with an edge marked “A shortcut!!”.]

42 Computational complexity
Assume that there are n genes, so that the accessibility list has n entries, one per gene. Let k < n − 1 be the average number of entries in a node’s accessibility list.

43 During execution, each node accessible from a node j induces one recursive call of PRUNE_ACC, after which the node accessed from j is declared as visited. Thus each entry of the accessibility list of a node is explored no more than once. Line 15 of the algorithm loops over all nodes adjacent to a node j. Let a denote the average number of entries in Adj(j). The overall computational complexity is then O(nka).

44 In practice, large-scale experimental gene perturbations in the yeast Saccharomyces cerevisiae (n ≈ 6300) suggest that k < 50 ([HMJRS2000]) and a ≤ 1 ([W2001a]), and thus nka ≪ n^2.
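
A quick back-of-the-envelope check of that inequality, using the upper bounds quoted on the slide as if they were exact values (an assumption made only to get concrete numbers):

```python
n = 6300   # approximate number of yeast genes, from the slide
k = 50     # upper bound on the average accessibility-list length
a = 1      # upper bound on the average adjacency-list length

pruning_steps = n * k * a
all_pairs = n ** 2
print(pruning_steps, all_pairs, all_pairs // pruning_steps)
# 315000 39690000 126
```

Under these figures the pruning bound is about two orders of magnitude below n^2.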

45 Storage complexity
The algorithm stores two copies of the accessibility list, as well as a list of the nodes that have been visited. Because the graph is acyclic, the recursion depth can be no greater than n − 1. Recall that k < n − 1 is the average number of entries in a node’s accessibility list. The overall storage requirements are O(nk).

47 Dealing with cycles
Everything so far has been restricted to acyclic graphs. Now let us look at the problems that cyclic graphs bring.

48 Problems that single gene perturbations cannot solve
[Figure: two different cyclic digraphs on the nodes 0 to 4.]
They have the same accessibility list:
0: 1 2 3 4
1: 0 2 3 4
2: 0 1 3 4
3: 0 1 2 4
4: 0 1 2 3
Therefore, we cannot reconstruct the gene network uniquely.

49 [Figure: the same two cyclic digraphs on the nodes 0 to 4.]
Note that the order of direct regulatory interactions in these two networks is different, as reflected in the adjacency lists.
Adjacency list of one network:
0: 3
1: 4
2: 1
3: 2
4: 0
Adjacency list of the other network:
0: 1
1: 2
2: 3
3: 4
4: 0
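
The ambiguity is easy to reproduce. The sketch below computes the accessibility lists of the two rings from their adjacency lists and confirms that they coincide; the helper name reachable is hypothetical, and a node is left out of its own list to match the slides.

```python
def reachable(adj, start):
    """Nodes reachable from start by at least one edge, excluding start."""
    seen, stack = set(), list(adj[start])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(adj[node])
    seen.discard(start)   # the slides omit the node itself
    return sorted(seen)


# The two adjacency lists from the slide.
ring_a = {0: [3], 1: [4], 2: [1], 3: [2], 4: [0]}
ring_b = {0: [1], 1: [2], 2: [3], 3: [4], 4: [0]}

acc_a = {i: reachable(ring_a, i) for i in ring_a}
acc_b = {i: reachable(ring_b, i) for i in ring_b}
print(acc_a[0])          # [1, 2, 3, 4]
print(acc_a == acc_b)    # True
```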

50 Instead of solving this problem, we collapse the nodes that form a cycle into a single group of nodes whose order of regulatory interactions is indistinguishable. Such a group is also called a strongly connected component, or strong component, of a directed graph G. Every two nodes in a strong component are mutually accessible. Let us see an example.

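51 [Figure: a digraph G on the nodes 0 to 15 and, next to it, the graph in which the nodes 1, 3, 4, 5, 15 are collapsed into a single group and the nodes 6, 9, 12 into another, while the nodes 0, 2, 7, 8, 10, 11, 13, 14 stay as they are.]
This graph is called a condensation of G.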

52 How do we construct a condensation of a gene network? We need a theorem and a corollary before presenting the algorithm that constructs it.

53 Theorem 3
Let P be the accessibility matrix of a digraph G with n nodes x_1, …, x_n. The strong component containing x_i is determined by the unit entries of the ith row of the matrix P × P^T, the element-wise product of P and its transpose.

54 Corollary 3
Let i and j (i ≠ j) be two nodes of a digraph G. i and j are in the same component iff j ∈ Acc(i) and i ∈ Acc(j).
We use Corollary 3 because we will work with accessibility lists, not matrices. Now we are going to present the algorithm.

55 Algorithm to construct the condensation G* of G and its accessibility list Acc_G*:
for all nodes i of G
    if component[i] has not been defined
        create new node x of G*
        component[i] = x
        for all nodes j ∈ Acc(i)
            if i ∈ Acc(j)
                component[j] = x
            end if
    end if
for all nodes i of G*
    Acc_G*(i) = ∅
for all nodes i of G
    for all nodes j ∈ Acc(i)
        if component[i] ≠ component[j]
            if component[j] ∉ Acc_G*(component[i])
                add component[j] to Acc_G*(component[i])
            end if
        end if

56 [Figure: a digraph G on the nodes 1 to 7 and its condensation with the nodes x_1, x_2, x_3.]
Accessibility list of G:
1: 2 3 4 5 6 7
2: 1 3 4 5 6 7
3: 1 2 4 5 6 7
4: 5 6 7
5: 6 7
6: 5 7
7: 5 6

57 [Figure: the nodes 1 to 7 of G grouped into the condensation nodes x_1, x_2, x_3, shown with the same accessibility list as on the previous slide.]
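
A possible Python rendering of the condensation algorithm, run on the seven-node accessibility list of this example. The name condensation and the 0-based component ids are assumptions for illustration; which of the ids corresponds to x_1, x_2 or x_3 in the figure is not recoverable from the transcript.

```python
def condensation(acc):
    """Collapse mutually accessible nodes.

    Returns (members, acc_star): members[x] lists the nodes of G merged
    into node x of G*, and acc_star[x] is the accessibility list of x.
    """
    acc_sets = {i: set(targets) for i, targets in acc.items()}
    component = {}
    members = []
    for i in acc:
        if i not in component:
            x = len(members)                  # create a new node x of G*
            members.append([i])
            component[i] = x
            for j in acc[i]:
                if j != i and i in acc_sets[j]:   # i and j reach each other
                    component[j] = x
                    members[x].append(j)
    acc_star = {x: set() for x in range(len(members))}
    for i in acc:
        for j in acc[i]:
            if component[i] != component[j]:
                acc_star[component[i]].add(component[j])
    return members, {x: sorted(t) for x, t in acc_star.items()}


ACC = {1: [2, 3, 4, 5, 6, 7], 2: [1, 3, 4, 5, 6, 7], 3: [1, 2, 4, 5, 6, 7],
       4: [5, 6, 7], 5: [6, 7], 6: [5, 7], 7: [5, 6]}
members, acc_star = condensation(ACC)
print(members)    # [[1, 2, 3], [4], [5, 6, 7]]
print(acc_star)   # {0: [1, 2], 1: [2], 2: []}
```

The resulting accessibility list is acyclic, so it could be handed to the pruning algorithm for acyclic graphs.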

58 Storage and time complexity
The graph G* has at most as many nodes and accessibility-list entries as G. The algorithm generates only one copy of G* and its accessibility list. Therefore both time and storage complexity are O(nk), where k is the average number of entries in a node’s accessibility list (nk < n^2).

60 Conclusions
Genetics is concerned with identifying gene interactions and their biological significance. Functional genomics takes this concern to the next level, that is, identifying gene interactions among the thousands of genes in a genome. There are other ways to simplify gene networks, such as Boolean logic design, reduction in symbolic logic, and graph theory.

61 References
[BB2001] Arabidopsis Gene Knockout: Phenotypes Wanted, Bouche, N. and Bouchez, D., Curr. Opin. Plant Biol., Vol. 4, pp. 111-117.
[DIB97] Exploring the Metabolic and Genetic Control of Gene Expression on a Genomic Scale, DeRisi, J. L., Iyer, V. R. and Brown, P. O., Science, Vol. 278, pp. 680-686.
[ESBB98] Cluster Analysis and Display of Genome-Wide Expression Patterns, Eisen, M. B., Spellman, P. T., Brown, P. O. and Botstein, D., Proc. Natl Acad. Sci. USA, Vol. 95, pp. 14863-14868.
[FW2000] The Small World of Metabolism, Fell, D. and Wagner, A., Nature Biotechnology, Vol. 18, pp. 1121-1122.
[FKZMS2000] Functional Genomic Analysis of C. elegans Chromosome I by Systematic RNA Interference, Fraser, A. G., Kamath, R. S., Zipperlen, P., Martinez-Campos, M. and Sohrmann, M., Nature, Vol. 408, pp. 325-330.
[GEOCJ2000] Functional Genomic Analysis of Cell Division in C. elegans Using RNAi of Genes on Chromosome III, Gonczy, P., Echeverri, C., Oegema, K., Coulson, A. and Jones, S. J. M. et al., Nature, Vol. 408, pp. 331-336.
[H69] Graph Theory, Harary, F., Addison-Wesley, Reading, MA, 1969.
[HMJRS2000] Functional Discovery via a Compendium of Expression Profiles, Hughes, T. R., Marton, M. J., Jones, A. R., Roberts, C. J. and Stoughton, R. et al., Cell, Vol. 102, 2000, pp. 109-126.
[JTAOB2000] The Large-Scale Organization of Metabolic Networks, Jeong, H., Tombor, B., Albert, R., Oltvai, Z. N. and Barabási, A. L., Nature, Vol. 407, pp. 651-654.
[MN99] LEDA: a Platform for Combinatorial and Geometric Computing, Mehlhorn, K. and Näher, S., Cambridge University Press, Cambridge, 1999.
[SSBRL99] The Berkeley Drosophila Genome Project Gene Disruption Project: Single P-element Insertions Mutating 25% of Vital Drosophila Genes, Spradling, A. C., Stern, D., Beaton, A., Rhem, E. J. and Laverty, T. et al., Genetics, Vol. 153, 1999, pp. 135-177.
[THCCC99] Systematic Determination of Genetic Network Architecture, Tavazoie, S., Hughes, J. D., Campbell, M. J., Cho, R. J. and Church, G. M., Nature Genet., Vol. 22, 1999, pp. 281-285.
[W2000] Mutational Robustness in Genetic Networks of Yeast, Wagner, A., Nature Genet., Vol. 24, 2000, pp. 355-361.
[W2001a] Genetic Networks Are Sparse: Estimates Based on a Large-Scale Genetic Perturbation Experiment, submitted, Wagner, A., 2001.
[W2001b] The Yeast Protein Interaction Network Evolves Rapidly and Contains Few Redundant Duplicate Genes, Wagner, A., Mol. Bio. Evol., Vol. 18, 2001, pp. 1283-1292.
[WF2001] The Small World Inside Large Metabolic Networks, Wagner, A. and Fell, D., Proceedings of the Royal Society of London, Series B, Vol. 268, pp. 1803-1810.
[W97] The Structure and Dynamics of Small World Networks, Watts, D. J., PhD Dissertation, Cornell University, 1999.
[WSALA99] Functional Characterization of the S. cerevisiae Genome by Gene Deletion and Parallel Analysis, Winzeler, E. A., Shoemaker, D. D., Astromoff, A., Liang, H. and Anderson, K. et al., Science, Vol. 285, pp. 901-906.

