Lecture 3: Genome Rearrangements and Duplications


1 Lecture 3: Genome Rearrangements and Duplications

2 Breakpoint graph 1-dimensional construction
Transform π = < 2, -4, -3, 5, -8, -7, -6, 1 > into γ = < 1, 2, 3, 4, 5, 6, 7, 8 > by reversals.
Vertices: i → ia ib, -i → ib ia, and 0b, 9a.
Edges: match the ends of consecutive blocks in π and in γ.
Superimpose the two matchings.

3 Breakpoint graph Breakpoints
Each reversal acts between 2 breakpoints and so removes at most 2 of them, so d ≥ # breakpoints / 2 = 6/2 = 3.
Theorem (Hannenhalli-Pevzner 1995): d = n + 1 – c + h + f, where c = # cycles; h and f are rather complicated, but can be computed from the graph in polynomial time.
Here, d = 8 + 1 – 5 + 0 + 0 = 4.
Breakpoints are not independent. The breakpoint graph shows the dependencies between the breakpoints.
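As a rough check of the numbers on this slide, the sketch below counts breakpoints and alternating black-grey cycles for a signed permutation. The function name `breakpoint_graph_stats` and the tuple encoding of block ends are illustrative assumptions, not part of the lecture; the sketch does not compute h or f.

```python
def breakpoint_graph_stats(perm):
    """Return (breakpoints, cycles) for a signed permutation of 1..n."""
    n = len(perm)
    ext = [0] + list(perm) + [n + 1]  # frame the permutation with 0 and n+1

    # Block i has ends (i, 'a') and (i, 'b'); +i reads a then b, -i reads b then a.
    def left(x):
        return (abs(x), 'a' if x >= 0 else 'b')

    def right(x):
        return (abs(x), 'b' if x >= 0 else 'a')

    # Black edges join the facing ends of consecutive blocks in the permutation.
    black = {}
    for x, y in zip(ext, ext[1:]):
        u, v = right(x), left(y)
        black[u], black[v] = v, u

    # Grey edges join the facing ends of consecutive blocks in the identity.
    grey = {}
    for i in range(n + 1):
        u, v = (i, 'b'), (i + 1, 'a')
        grey[u], grey[v] = v, u

    # A consecutive pair (x, y) is an adjacency exactly when y = x + 1.
    breakpoints = sum(1 for x, y in zip(ext, ext[1:]) if y - x != 1)

    # Walk alternating black/grey edges to count cycles.
    seen, cycles = set(), 0
    for start in black:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:
            seen.add(v)
            w = black[v]
            seen.add(w)
            v = grey[w]
    return breakpoints, cycles


print(breakpoint_graph_stats([2, -4, -3, 5, -8, -7, -6, 1]))
```

For the permutation on slide 2 this reports 6 breakpoints, matching the 6/2 = 3 lower bound above, and 5 cycles (three of them are the trivial cycles of the adjacencies).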

4 Breakpoint graph ⇒ rearrangement scenario

5 Oriented and Unoriented Cycles
A proper reversal acts on black edges: c(ρ π) – c(π) = 1.
Unoriented cycles: no proper reversal acts on an unoriented cycle.
These are “impediments” in sorting by reversals.

6 Interleaving Edges
Interleaving edges are grey edges that cross each other.
Example: edges (0,1) and (18,19) are interleaving.
Two cycles interleave if a grey edge of one crosses a grey edge of the other.
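A minimal sketch of the crossing test, assuming each grey edge is given as the pair of positions of its endpoints along the line on which the breakpoint graph is drawn. The function names and the position-pair encoding are assumptions made for illustration.

```python
def interleave(e1, e2):
    """True if two arcs drawn above a line cross each other.

    Each edge is a pair of endpoint positions; the arcs cross when
    exactly one endpoint of one lies strictly inside the other.
    """
    a, b = sorted(e1)
    c, d = sorted(e2)
    return (a < c < b < d) or (c < a < d < b)


def cycles_interleave(cycle1, cycle2):
    """True if some grey edge of cycle1 crosses some grey edge of cycle2."""
    return any(interleave(e, f) for e in cycle1 for f in cycle2)


print(interleave((0, 5), (3, 8)))  # the arcs overlap partially, so they cross
print(interleave((0, 5), (1, 4)))  # one arc is nested inside the other
```

Nested and disjoint arcs do not count as interleaving; only partial overlap does.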

7 Interleaving Graphs
The interleaving graph has one vertex per cycle of the breakpoint graph; two vertices are joined by an edge when the corresponding cycles interleave.
(Figure: cycles A to F and their interleaving graph.)

8 Interleaving Graphs
Label oriented cycles. A component is oriented if it contains an oriented cycle.
(Figure: cycles A to F and their interleaving graph.)

9 Interleaving Graphs
Remove oriented components from the interleaving graph.
(Figure: cycles A to F and their interleaving graph.)

10 Hurdles
Hurdle: a minimal or maximal unoriented component under the containment partial order.
(Figure: components A and E.) h(π) = 1

11 Reversal Distance with Hurdles
Hurdles are obstacles in the genome rearrangement problem. They increase the number of reversals required to transform a permutation into the identity permutation.
Example: π = 3 2 1 has c(π) = 2 and h(π) = 1. One sorting scenario: 3 2 1 → 3 -1 -2 → 1 -3 -2 → 1 2 3.
Every hurdle can be transformed into oriented cycles by a reversal on an arbitrary cycle in the hurdle.
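The number sequence on this slide (3 2 1, then 3 -1 -2, then 1 -3 -2, then 1 2 3) can be read as a three-reversal sorting scenario. The sketch below replays it; the helper name `reverse` and the 0-based inclusive indexing are assumptions made for illustration.

```python
def reverse(perm, i, j):
    """Signed reversal: reverse perm[i..j] (0-based, inclusive) and flip signs."""
    return perm[:i] + [-x for x in reversed(perm[i:j + 1])] + perm[j + 1:]


p = [3, 2, 1]
s1 = reverse(p, 1, 2)   # reverse the block "2 1"
s2 = reverse(s1, 0, 1)  # reverse the block "3 -1"
s3 = reverse(s2, 1, 2)  # reverse the block "-3 -2"
print(s1, s2, s3)
```

Three reversals agree with n + 1 – c(π) + h(π) = 3 + 1 – 2 + 1 = 3 for this permutation.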

12 Reversal Distance with Hurdles
Hurdles are obstacles in the genome rearrangement problem. They increase the number of reversals required to transform a permutation into the identity permutation.
Let h(π) be the number of hurdles in permutation π.
Taking hurdles into account gives a tighter bound on reversal distance: d(π) ≥ n + 1 – c(π) + h(π).
Every hurdle can be transformed into oriented cycles by a reversal on an arbitrary cycle in the hurdle.
** Doing so might cause problems with overlapping hurdles.

13 Superhurdles “Protect” non-hurdles
Deletion of superhurdles creates another hurdle

14 Superhurdles “Protect” non-hurdles
Deletion of superhurdles creates another hurdle.
(Figure labels: Superhurdle, Superhurdle.)

15 Superhurdles “Protect” non-hurdles
Deletion of superhurdles creates another hurdle.
(Figure labels: Hurdle, Hurdle.)

16 Fortresses
Fortress: a permutation π with an odd number of hurdles, all of which are superhurdles.
Theorem (Hannenhalli-Pevzner 1995): d(π) = n + 1 – c(π) + h(π) + f, where c = # cycles, h = # hurdles, and f = 1 if π is a fortress (0 otherwise).
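The theorem is plain arithmetic once c, h, and the fortress flag are known. The sketch below only evaluates the formula; it assumes the counts are supplied by the caller, and the function names are illustrative rather than taken from the lecture.

```python
def is_fortress(hurdles, superhurdles):
    """Fortress: an odd number of hurdles, all of which are superhurdles."""
    return hurdles % 2 == 1 and superhurdles == hurdles


def hp_distance(n, cycles, hurdles, fortress=False):
    """Evaluate d = n + 1 - c + h + f for already-computed c, h, and f."""
    return n + 1 - cycles + hurdles + (1 if fortress else 0)


# Slide 11 example: pi = 3 2 1 with c = 2 and h = 1.
print(hp_distance(3, 2, 1))
# Slide 2 permutation (n = 8): c = 5 and h = 0 are consistent with d = 4 on slide 3.
print(hp_distance(8, 5, 0))
```

Finding the hurdles and superhurdles themselves requires the interleaving-graph analysis from the earlier slides, which this sketch does not attempt.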

17 GRIMM-Synteny on X chromosome 2-dimensional breakpoint graph

18 GRIMM-Synteny on X chromosome 2-dimensional breakpoint graph

19 Coming Next
Other rearrangement operations
Duplications
Rearrangements and Phylogeny
Multiple Genomic Distance Problem: Given permutations π1, …, πk, find a permutation ρ such that Σ_{i=1..k} d(πi, ρ) is minimal.

20 Other Types of Rearrangements
So far: discussed reversals.
Also: translocations, fissions, and fusions (modeled as reversals in a concatenation of the chromosomes).

21 Other Types of Rearrangements
Transpositions
Duplications
Duplications are very frequent in cancer genomes.

22 Duplications
What problem to solve?
Given: G ∈ {1, …, n}^N (“permutation with duplicates”).
Find: reversals ρ1, ρ2, …, ρt, duplications δ1, …, δs, and a permutation π such that (ρ1 … ρt δ1 … δs) π = G and s + t is minimal ???
HARD!!! (NP-hard?)

23 Duplications (2) What problem to solve?
Given: G ∈ {1, …, n}^N (“permutation with duplicates”) and H = π G for some permutation π.
Find: reversals ρ1, ρ2, …, ρt such that ρ1 … ρt G = H and t is minimal.
Signed reversal distance with duplicates is NP-hard (Chen et al. 2005).
If there is a 1-1 mapping of repeated elements (orthologs) in G to H, then the problem reduces to reversal distance.

24 Duplications (3)
What problem to solve?
Given: P ∈ {1, …, n}^N (permutation with duplicates).
Find: a permutation π, reversals ρ1, ρ2, …, ρs, and duplications δ1, …, δt such that ρ1 … ρs δ1 … δt π = P and t is minimal.
Solution when there are at most two duplicates per gene: El-Mabrouk and Sankoff (2002).

25 Whole Genome Duplication
The genome is doubled, giving an extra copy of each element. It subsequently undergoes reversals.
Genome Halving Problem. Given a duplicated genome P, recover the ancestral pre-duplicated genome R minimizing the reversal distance from the perfect duplicated genome R ⊕ R to the duplicated genome P. (El-Mabrouk and Sankoff)

26 Whole Genome Duplication
The genome is doubled, giving an extra copy of each element. It subsequently undergoes reversals.
If the copies of each element are labeled uniquely, then the problem reduces to the reversal distance problem.

27 Reversal Distance and Duplications
Let d(G,H) = reversal distance between G and H.
The problem of computing d(P, R ⊕ R) is unsolved.
min_R d(P, R ⊕ R) is solvable in polynomial time.

28 Breakpoint Graph
π = 2 -4 -3 5 -8 -7 -6 1 9, with ends 0h 2t 2h 4h 4t 3h 3t 5t 5h 8h 8t 7h 7t 6h 6t 1t 1h 9t
γ = 1 2 3 4 5 6 7 8 9, with ends 0h 1t 1h 2t 2h 3t 3h 4t 4h 5t 5h 6t 6h 7t 7h 8t 8h 9t
G(π, γ): 2 -4 -3 5 -8 -7 -6 1 9, with vertices 0b 2a 2b 4b 4a 3b 3a 5a 5b 8b 8a 7b 7a 6b 6a 1a 1b 9a
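The vertex sequence on this slide follows mechanically from the signs: a positive gene contributes its tail then its head, a negative gene its head then its tail. A small sketch, assuming the permutation is passed without the framing 0 and n+1 (the function name `end_labels` is illustrative):

```python
def end_labels(perm, head="h", tail="t"):
    """List the gene ends of a signed permutation, framed by 0 and n+1."""
    n = len(perm)
    labels = [f"0{head}"]
    for x in perm:
        g = abs(x)
        # +g is read tail then head; -g is read head then tail.
        labels += [f"{g}{tail}", f"{g}{head}"] if x > 0 else [f"{g}{head}", f"{g}{tail}"]
    labels.append(f"{n + 1}{tail}")
    return labels


print(" ".join(end_labels([2, -4, -3, 5, -8, -7, -6, 1])))
```

Passing `head="b", tail="a"` gives the a/b naming used on the earlier slides.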

29 Genome Halving: Exhaustive
Doubled genome with 2n genes.
Compute the reversal distance for all 2^n labelings of the genes.
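To see where the exponential count comes from, the sketch below enumerates the labelings of a doubled genome in which every gene occurs exactly twice: for each gene, one occurrence is tagged copy 1 and the other copy 2. The list-of-signed-integers encoding and the name `labelings` are assumptions; the reversal-distance computation on each labeling is left out.

```python
from itertools import product


def labelings(doubled):
    """Yield every assignment of copy numbers 1/2 to the two copies of each gene."""
    genes = sorted({abs(g) for g in doubled})
    for choice in product((1, 2), repeat=len(genes)):
        first = dict(zip(genes, choice))  # copy number given to the first occurrence
        seen = set()
        labeled = []
        for g in doubled:
            k = abs(g)
            copy = first[k] if k not in seen else 3 - first[k]
            seen.add(k)
            labeled.append((g, copy))
        yield labeled


toy = [1, -2, 2, 1]  # hypothetical doubled genome with n = 2 genes
all_labelings = list(labelings(toy))
print(len(all_labelings))
```

With n distinct genes the generator yields one labeling per choice in `product((1, 2), repeat=n)`, so this exhaustive approach is only practical for very small genomes.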

30 Genome Halving
Weak Genome Halving Problem. For a given duplicated genome P, find a perfect duplicated genome R ⊕ R and a labeling of gene copies that maximizes the number of black-gray cycles c(G) in the breakpoint graph G(P, R ⊕ R) of the labeled genomes P and R ⊕ R. (Alekseyev and Pevzner 2006)
Theorem (Hannenhalli-Pevzner 1995): d(π) = n + 1 – c(π) + h(π) + f, where c = # cycles, h = # hurdles, and f = 1 if π is a fortress.

31 Contracted Breakpoint Graph
Breakpoint graph construction:
π = 2 -4 -3 5 -8 -7 -6 1 9, with ends 0h 2t 2h 4h 4t 3h 3t 5t 5h 8h 8t 7h 7t 6h 6t 1t 1h 9t
γ = 1 2 3 4 5 6 7 8 9, with ends 0h 1t 1h 2t 2h 3t 3h 4t 4h 5t 5h 6t 6h 7t 7h 8t 8h 9t
G(π, γ): 2 -4 -3 5 -8 -7 -6 1 9, with vertices 0h 2t 2h 4h 4t 3h 3t 5t 5h 8h 8t 7h 7t 6h 6t 1t 1h 9t
The obverse edges (xt, xh) were implicit.
π is a black-obverse alternating path.
γ is a gray-obverse alternating path.

32 Contracted Breakpoint Graph
With duplicates, there are pairs of vertices with the same label. Contract these identical vertices.

33 Contracted Breakpoint Graph
P = −a−b+g+d+f+g+e−a+c−f−c−b−d−e
R = −a−b−d−g+f−c−e
G’(P, R ⊕ R)
Each gray edge is a pair of parallel edges.
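A small sketch of the black edges of the contracted graph for the genome P on this slide. It assumes P is circular, that a positive gene is read tail then head (as on slide 28), and that the two copies of a gene share the vertices xt and xh. The string encoding with ASCII minus signs and the name `black_edges` are illustrative assumptions.

```python
import re
from collections import Counter


def black_edges(genome):
    """Black edges between consecutive gene ends of a circular signed genome."""
    genes = re.findall(r"([+-])([a-z])", genome)

    def entry(sign, g):  # end reached first when reading the gene
        return g + ("t" if sign == "+" else "h")

    def exit_(sign, g):  # end reached last when reading the gene
        return g + ("h" if sign == "+" else "t")

    edges = []
    # Pair each gene with its successor, wrapping around at the end.
    for (s1, g1), (s2, g2) in zip(genes, genes[1:] + genes[:1]):
        edges.append((exit_(s1, g1), entry(s2, g2)))
    return edges


P = "-a-b+g+d+f+g+e-a+c-f-c-b-d-e"
edges = black_edges(P)
degree = Counter(v for e in edges for v in e)
print(len(edges), len(degree), set(degree.values()))
```

Because every gene occurs twice, each contracted vertex ends up with black degree 2, which is what lets the gray edges be added as pairs of parallel edges.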

34 Cycle Decompositions
Genomes P and Q.
G(P,Q): breakpoint graph for some labeling, with its black-gray cycle decomposition.
G’(P,Q): contracted breakpoint graph, with the induced black-gray cycle decomposition.
Labeling Problem. Given a black-gray cycle decomposition of the contracted breakpoint graph G′(P,Q) of duplicated genomes P and Q, find a labeling of P and Q that induces this cycle decomposition.

35 Maximal black-gray cycle decomposition
P = −a−b+g+d+f+g+e−a+c−f−c−b−d−e
R = −a−b−d−g+f−c−e
G’(P, R ⊕ R)
BG graph corresponding to G’
Maximal black-gray cycle decomposition

36 P as black-obverse cycle
Cycle Decomposition
P = −a−b+g+d+f+g+e−a+c−f−c−b−d−e
R = −a−b−d−g+f−c−e

37 Genome Halving Algorithm: Outline
Input: doubled genome P.
Construct the BO (black-obverse) graph for P by gluing identical edges.
Introduce gray edges “optimally” to create the BOG (black-obverse-gray) graph G’ with a single gray-obverse cycle (!!!).
R = gray-obverse cycle in G’.
Find a maximal black-gray cycle decomposition of G’.
Q = R ⊕ R.

