Published by Bradyn Meares. Modified over 9 years ago.
1
Gene duplication models and reconstruction of gene regulatory network evolution from network structure
Juris Viksna, David Gilbert
Riga, IMCS, 10.02.2006
2
Gene regulatory networks [J. Rung, T. Schlitt, A. Brazma, K. Freivalds, J. Vilo, Bioinformatics 18 S2 (ECCB), 202-210]
Yeast network:
3
Gene regulatory networks
- Directed graph
- Graph vertices correspond to genes
- An edge from gene A to gene B means that gene B is (directly) regulated by gene A
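The directed-graph view above can be sketched as a plain adjacency mapping. This is only an illustration: the gene names `A`, `B`, `C` and the helper `regulators_of` are hypothetical and do not come from the yeast data.

```python
# Hypothetical toy network: each gene maps to the set of genes it directly regulates.
network = {
    "A": {"B", "C"},  # edges A -> B and A -> C
    "B": {"C"},       # edge  B -> C
    "C": set(),       # C regulates nothing
}


def regulators_of(network, gene):
    """Return the genes that have an edge pointing to `gene`."""
    return {src for src, targets in network.items() if gene in targets}


def out_degree(network, gene):
    """Number of genes directly regulated by `gene`."""
    return len(network[gene])
```

Storing only outgoing edges keeps the structure small; incoming edges are recovered by a scan, which is adequate for a sketch but would be indexed in a larger network.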
4
Properties of gene networks (1)
Believed to be scale-free (vertex degrees satisfy a so-called power law):
$N(k) \sim k^{-\gamma}$, where N(k) is the number of vertices with degree k
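A minimal sketch of how N(k) can be tabulated from an edge list, with a crude estimate of the power-law exponent as the negated slope of log N(k) against log k. The function names are hypothetical, edges are treated as undirected, and a straight-line fit on a log-log scale is only a rough check, not a rigorous test of scale-freeness.

```python
import math
from collections import Counter


def degree_counts(edges):
    """Return N(k): a mapping from degree k to the number of vertices with that degree."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())


def loglog_slope(n_of_k):
    """Least-squares slope of log N(k) versus log k (needs at least two distinct k)."""
    pts = [(math.log(k), math.log(n)) for k, n in n_of_k.items() if k > 0 and n > 0]
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx


# Toy example: a star with one hub and four leaves.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
counts = degree_counts(star)       # four vertices of degree 1, one of degree 4
exponent = -loglog_slope(counts)   # estimate of the power-law exponent
```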
5
Properties of gene networks (1)
[Plot: N(k) versus k] [F. Chung, L. Lu, T. Dewey, D. Galas, JCB 10, 677-687]
6
Properties of gene networks (2)
Believed to have a noticeable modularity
- $i$: a vertex
- $k_i$: number of neighbours of vertex $i$
- $n_i$: number of direct links between these $k_i$ neighbours
Clustering coefficient (for vertex $i$): $C_i = \dfrac{2 n_i}{k_i (k_i - 1)}$
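The clustering coefficient can be computed directly from its definition: count the links among the neighbours of a vertex and divide by the number of possible neighbour pairs. The sketch below assumes an undirected graph stored as a dictionary of neighbour sets; returning 0.0 for vertices with fewer than two neighbours is a convention chosen here, since the formula is undefined in that case.

```python
from itertools import combinations


def clustering_coefficient(adj, i):
    """C_i = 2 * n_i / (k_i * (k_i - 1)) for vertex i of an undirected graph."""
    neighbours = adj[i]
    k_i = len(neighbours)
    if k_i < 2:
        return 0.0  # convention for this sketch; the formula needs k_i >= 2
    # n_i: number of direct links between pairs of neighbours of i
    n_i = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2 * n_i / (k_i * (k_i - 1))


# Toy graph: triangle 0-1-2 with an extra vertex 3 attached to 0.
toy = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
```

For vertex 0 of the toy graph, only one of the three neighbour pairs (1 and 2) is linked, so the coefficient is 1/3.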
7
Properties of gene networks (2)
Clustering coefficient (for vertex $i$): $C_i = \dfrac{2 n_i}{k_i (k_i - 1)}$ [E. Ravasz, A. Somera, D. Mongru, Z. Oltvai, A. Barabasi, Science 297, 1551-1555]
8
Network evolution models (1) [A. Barabasi, R. Albert, Science 286, 509-512]
(i) networks expand continuously by the addition of new vertices, (ii) new vertices attach preferentially to sites that are already well connected. A model based on these two ingredients reproduces the observed stationary scale-free distributions.
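The two ingredients (growth and preferential attachment) can be sketched in a few lines. This is a simplified illustration, not the cited authors' code: the starting clique, the parameter `m` (edges added per new vertex) and the function name are assumptions made for the example.

```python
import random


def preferential_attachment(n, m, seed=0):
    """Grow an undirected graph to n vertices; each new vertex links to m existing ones."""
    rng = random.Random(seed)
    # Assumed starting point: a clique on m + 1 vertices.
    edges = [(u, v) for u in range(m + 1) for v in range(u)]
    # Every vertex appears in `endpoints` once per incident edge, so a uniform
    # draw from this list picks a vertex with probability proportional to its degree.
    endpoints = [x for edge in edges for x in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


sample = preferential_attachment(50, 2)
```

The repeated-endpoint list is a common trick that avoids recomputing degree-weighted probabilities at every step.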
9
Network evolution models (2)
"Hierarchical" model [E. Ravasz, A. Somera, D. Mongru, Z. Oltvai, A. Barabasi, Science 297, 1551-1555]
Sample hierarchical networks (scale-free and modular)
10
Network evolution models (3)
"Duplication" model
Scale-free with power-law exponent $\gamma < 2$ for ½ < p < 1 [F. Chung, L. Lu, T. Dewey, D. Galas, JCB 10, 677-687]
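A minimal sketch of a partial duplication process of the kind usually meant by a "duplication" model: a random existing vertex is copied, and each of its edges is inherited by the copy independently with probability p. The seed graph (a single edge), the function name and the exact rules are assumptions for illustration; the cited paper should be consulted for the precise model.

```python
import random


def duplication_model(n, p, seed=0):
    """Grow an undirected graph to n vertices by partial duplication."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}  # assumed seed graph: one edge
    while len(adj) < n:
        template = rng.randrange(len(adj))  # vertex chosen for duplication
        copy = len(adj)
        adj[copy] = set()
        for w in list(adj[template]):
            # each edge of the template is kept by the copy with probability p
            if rng.random() < p:
                adj[copy].add(w)
                adj[w].add(copy)
    return adj


grown = duplication_model(30, 0.7, seed=1)
```

With p = 1 every copy is an exact duplicate of its template at the moment of duplication; with smaller p many copies lose some or all of the inherited edges.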
11
Network evolution models (4)
12
Network evolution models (M1) M1
13
M1, p = 0.1, 5000 vertices [plot annotation: 4.5]
14
M1, p = 0.01, 5000 vertices [plot annotation: 3]
15
M1, p=0.05, d=0.2, 5000 vertices
16
[plot annotation: 2.5]
17
Network evolution models (M1)
M1
V       E
20      40
50      200
100     700
500     15000
1000    50000
5000    800000
18
Network evolution models (M2)
[Diagram: genome evolution; node labels A, X, X']
19
Network evolution models (M2)
[Diagram: genome evolution with two alternative outcomes joined by "or"; node labels A, X, X']
20
Network evolution models (M2) M2
21
M2, p = 0.1, 20000 vertices
22
[plot annotation: 1]
23
Network evolution models (M2)
M2
V       E
20      40
50      80
100     150
500     700
1000    1500
5000    7000
24
Evolution graphs
- k+2 vertices
- two types of edges:
  - for swappable events (black)
  - for dependent events (grey)
25
Evolution graphs
26
- Initial graph G
- Graph G' obtained from G after k (in this example k=6) evolution steps
- Intermediate graphs between G and G' correspond to cuts of the evolution graph (G and G' can also be obtained in this way)
- Numbered vertices correspond to evolution steps and are marked by the vertices duplicated in the corresponding steps
27
Evolution graphs – some questions
- Equivalence: decide whether two given evolution graphs are equivalent
- Irreducible networks: networks that can't be obtained from simpler networks by an evolution graph
- Uniqueness of evolution: is it possible that $D(G_1, E_1) = D(G_2, E_2)$ for two different irreducible networks $G_1$ and $G_2$?
28
"Reverse engineering" problems Given: Reconstruct: G' G E
29
"Reverse engineering" problem (1) (Assuming either model M1 or M2.) Reconstruction of evolution graph For a given network N’ find an irreducible network N, the sequence of duplication events D 1,...,D m and the corresponding evolution tree, such that N’=D(N,E).
30
"Reverse engineering" problem (2) (Assuming either model M1 or M2.) Reconstruction of duplication event For a given network N’ find a network N and a duplication event D, such that N’=D(N).
31
"Reverse engineering" problem (3) (Assuming either model M1 or M2.) Reconstruction of the largest duplication event For a given network N’ find a network N with the smallest possible number of genes and a duplication event D, such that N’=D(N).
32
"Reverse engineering" - complexity For a given network N’ find a network N with the smallest possible number of genes and a duplication event D, such that N’=D(N). at least as hard as graph isomorphism problem likely NP-hard (maximum clique for reconstruction graphs) reconstruction graphs are much smaller than networks still might be practically solvable for random graphs of reasonable size (few tens of thousands of vertices).
33
Algorithm – stage 1
Partition G' vertices into orbits
- Can be done e.g. with the nauty package
- One can try to use some property p which is simpler to compute than automorphisms and is such that $p(G_1) = p(G_2)$ for isomorphic graphs $G_1$ and $G_2$
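One cheap isomorphism-invariant property is the stable colouring produced by colour refinement, sketched below. This technique is swapped in here for illustration and is not claimed to be what the authors used. Vertices in the same automorphism orbit always end up in the same class, but the converse can fail, so the classes are only candidate orbits. The function name is hypothetical and the graph is assumed undirected.

```python
from collections import defaultdict


def candidate_orbits(adj):
    """Group vertices by colour refinement; each class is a superset of true orbits."""
    colour = {v: len(adj[v]) for v in adj}  # start from vertex degrees
    while True:
        # A vertex's new signature: its own colour plus the sorted colours of its neighbours.
        signature = {
            v: (colour[v], tuple(sorted(colour[u] for u in adj[v]))) for v in adj
        }
        palette = {s: i for i, s in enumerate(sorted(set(signature.values())))}
        refined = {v: palette[signature[v]] for v in adj}
        if len(set(refined.values())) == len(set(colour.values())):
            break  # no class was split, so the partition is stable
        colour = refined
    classes = defaultdict(list)
    for v, c in colour.items():
        classes[c].append(v)
    return sorted(sorted(c) for c in classes.values())


# Path 0-1-2-3: the two end vertices and the two inner vertices form the classes.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
```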
34
Reconstruction graphs
- Vertices correspond to non-singleton orbits
- Two types of edges:
  - (1) have to participate in the same duplication event (solid)
  - (2) cannot participate in the same duplication event (dotted)
35
Algorithm – stage 2 Find reconstruction graph
36
Algorithm – stage 3
Find the largest independent set (according to type 2 edges) in the reconstruction graph
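For small reconstruction graphs the largest independent set can be found by plain exhaustive search, as in the sketch below. It considers only type 2 ("cannot participate together") edges and ignores type 1 edges; the function name and the toy input are hypothetical, and the running time is exponential in the number of vertices.

```python
from itertools import combinations


def largest_independent_set(vertices, conflict_edges):
    """Return a largest vertex subset containing no conflicting (type 2) pair."""
    conflicts = {frozenset(e) for e in conflict_edges}
    # Try subset sizes from largest to smallest; the first valid subset is optimal.
    for size in range(len(vertices), 0, -1):
        for subset in combinations(vertices, size):
            if all(frozenset(pair) not in conflicts for pair in combinations(subset, 2)):
                return set(subset)
    return set()


# Toy reconstruction graph: four orbits with conflicts forming a path a-b-c-d.
orbits = ["a", "b", "c", "d"]
type2 = [("a", "b"), ("b", "c"), ("c", "d")]
chosen = largest_independent_set(orbits, type2)
```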
37
Algorithm – stage 4
- if all selected orbits contain just 2 nodes, we are practically done
- otherwise we have to find a pair of (largest) sets of vertices from the selected orbits which correspond to a duplication event [currently exhaustive search]
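In the easy case a duplicated pair can be read off directly. As an illustration under a strong assumption (a duplication copies all regulatory edges of a gene exactly, which need not hold for M1 or M2), the sketch below groups genes of a directed network whose sets of targets and regulators coincide. The function name and the toy network are hypothetical.

```python
from collections import defaultdict


def duplicate_candidates(out_edges):
    """Group genes with identical targets and identical regulators.

    `out_edges` maps every gene (including those with no targets) to the set
    of genes it regulates.
    """
    in_edges = defaultdict(set)
    for src, targets in out_edges.items():
        for t in targets:
            in_edges[t].add(src)
    groups = defaultdict(list)
    for gene in out_edges:
        key = (frozenset(out_edges[gene]), frozenset(in_edges[gene]))
        groups[key].append(gene)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Toy network: A regulates X and X2, and both regulate B.
toy_net = {"A": {"X", "X2"}, "X": {"B"}, "X2": {"B"}, "B": set()}
```

Here X and X2 are indistinguishable by their connections, so they form the only candidate pair.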
38
Algorithm
The evolution graph can be reconstructed by repeatedly solving the "largest duplication event" problem
39
Algorithm - efficiency
- using nauty we can deal with networks with < 200 genes
- for larger graphs one can use heuristics to compute orbits
- vertex/edge counts at different DFS levels seem to work quite well
- likely to find a large part of the duplication event
- for < 200 vertices often gives the exact result
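A level-count heuristic of this kind might look like the sketch below. It reads "levels" as distance from the start vertex and computes them by breadth-first search rather than DFS; that reading, the choice of counts per level, and the function names are all assumptions. Vertices with different signatures cannot lie in the same orbit, while equal signatures only suggest that they might.

```python
from collections import defaultdict


def level_signature(adj, start):
    """Per-level (vertex count, edges inside the level, edges to the next level)."""
    dist = {start: 0}
    frontier = [start]
    signature = []
    while frontier:
        d = dist[frontier[0]]
        nxt = []
        for v in frontier:
            for u in adj[v]:
                if u not in dist:
                    dist[u] = d + 1
                    nxt.append(u)
        # All neighbours of this level are labelled now, so edges can be classified.
        inside = sum(1 for v in frontier for u in adj[v] if dist[u] == d) // 2
        down = sum(1 for v in frontier for u in adj[v] if dist[u] == d + 1)
        signature.append((len(frontier), inside, down))
        frontier = nxt
    return tuple(signature)


def group_by_signature(adj):
    """Candidate orbits: vertices of an undirected graph sharing a level signature."""
    groups = defaultdict(list)
    for v in adj:
        groups[level_signature(adj, v)].append(v)
    return sorted(sorted(g) for g in groups.values())


path4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
```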
40
Algorithm – Model 2
- General case: check automorphisms for all k-tuples of vertices
- A serious problem even for k=2
- However, large components are duplicated not that often
- The previous algorithm could be used to find a "large" part of the duplicated genes
- Still an open problem
- Also, a question about good heuristics
41
Model 2 – Component sizes
Model M2, 550 vertices, 132 duplications
42
Model 2 – Component sizes
Constructing random network with 20000 genes:
Component sizes             # of events
1                           177008
2                           342
3                           97
4                           49
5                           37
6                           18
7                           13
8                           10
10, 11, 14                  4
9, 12, 13, 15, 27           3
16, 24                      2
17, 18, 21, 22, 31, 27      1
43
Experiments with yeast network
6270 genes, 106 regulators
44
Experiments with yeast network
p=0.0001, E=106, V=216
45
Experiments with yeast network
- 277 pairs of duplication candidates were discovered
- Few "real": COS5 and COS8, YLR460C and YNL134L
- All 5962 genes were compared all-v-all using Smith-Waterman (SW) alignment
- Normalized compression score: $\text{ssearch\_score}(P_1, P_2) / \min\{\text{length}(P_1), \text{length}(P_2)\}$
- Scores for the found duplication pairs were compared with average values
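The normalization itself is a one-line computation, sketched below together with a comparison against the mean over all pairs. The function names and the numbers in the example are hypothetical placeholders, not values from the yeast experiment; the raw score is assumed to come from an external Smith-Waterman run.

```python
from statistics import mean


def normalized_score(raw_score, length_1, length_2):
    """Raw alignment score divided by the length of the shorter protein."""
    return raw_score / min(length_1, length_2)


def excess_over_average(candidate_scores, all_scores):
    """Difference between the mean candidate score and the mean over all pairs."""
    return mean(candidate_scores) - mean(all_scores)


# Hypothetical numbers for illustration only.
example = normalized_score(300, 150, 200)  # 300 / 150
background = [0.2, 0.3, 0.1, 2.0]
gap = excess_over_average([2.0], background)
```

Dividing by the shorter length keeps a short protein that aligns fully against part of a longer one from being penalized for the unaligned remainder.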
46
Experiments with yeast network Observed distances vs average, all non-adjacent gene pairs