1 Michal Ozery-Flato and Ron Shamir
2 The Genomic Sorting Problem HOW?
3 Overview Preliminaries Reduction to a simpler case The main algorithm (reduced case)
4 Genome Modeling
5 Genome Modeling Chromosome flip
6 Reciprocal Translocations
Exchange non-empty ends between two chromosomes X = X1 X2 and Y = Y1 Y2:
– Prefix-prefix: X1 X2, Y1 Y2 → X1 Y2, Y1 X2
– Prefix-postfix: X1 X2, Y1 Y2 → X1 -Y1, -X2 Y2
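The two translocation types can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes chromosomes are tuples of signed integers, and the names `flip` and `translocate` and the cut-point parameters `i` and `j` are hypothetical. The prefix-postfix result (X1 -Y1, -X2 Y2) follows the usual convention and agrees with the example run on slides 18-23.

```python
def flip(segment):
    """Reverse the gene order and negate every sign."""
    return tuple(-g for g in reversed(segment))


def translocate(x, y, i, j, prefix_prefix=True):
    """Cut x after its first i genes and y after its first j genes,
    then exchange ends. All four pieces must be non-empty."""
    if not (0 < i < len(x) and 0 < j < len(y)):
        raise ValueError("a reciprocal translocation exchanges non-empty ends")
    x1, x2 = x[:i], x[i:]
    y1, y2 = y[:j], y[j:]
    if prefix_prefix:
        # X1 X2, Y1 Y2 -> X1 Y2, Y1 X2 (no sign changes)
        return x1 + y2, y1 + x2
    # X1 X2, Y1 Y2 -> X1 -Y1, -X2 Y2 (exchanged pieces are flipped)
    return x1 + flip(y1), flip(x2) + y2
```

For instance, `translocate((1, -5, 6), (3, -4, 2), 1, 2)` cuts after gene 1 and before gene 2 and joins them into the chromosome (1, 2).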
7 Sorting by Reciprocal Translocations
Tails{(1, 2, -4), (-3, 5), (6, -8, -7, 9)} = {1, 4, -3, -5, 6, -9}
A can be transformed into B by reciprocal translocations if and only if:
– genes(A) = genes(B)
– Tails(A) = Tails(B)
An O(n^3) algorithm (Hannenhalli 96, Bergeron et al. 06)
8 The Cycle Graph
A = {(4, -1), (-3, -2, 5), (6, -7, 8)}, B = {(1, 2, 3), (4, 5), (6, 7, 8)}
[Figure: cycle graph(A, B), with edges labeled external, internal, and adjacency]
#cycles(A, B) = 3
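The cycle count on this slide can be reproduced with a small sketch. It assumes the standard construction: every gene has a tail and a head extremity, neighbouring extremities in A are joined by one kind of edge, neighbouring extremities in B by another, and chromosome ends stay unmatched. All function names are hypothetical, and the genomes must have the same genes and the same tails.

```python
def extremities(gene):
    """(left, right) extremities of a signed gene: +g reads tail, head."""
    g = abs(gene)
    return ((g, "t"), (g, "h")) if gene > 0 else ((g, "h"), (g, "t"))


def adjacency_edges(genome):
    """Map each extremity to its neighbour across an adjacency."""
    edges = {}
    for chrom in genome:
        for left, right in zip(chrom, chrom[1:]):
            u = extremities(left)[1]   # right end of the left gene
            v = extremities(right)[0]  # left end of the right gene
            edges[u] = v
            edges[v] = u
    return edges


def count_cycles(a, b):
    """Number of alternating cycles formed by the edges of A and B."""
    edges_a = adjacency_edges(a)
    edges_b = adjacency_edges(b)
    seen = set()
    cycles = 0
    for start in edges_a:
        if start in seen:
            continue
        v = start
        while v not in seen:
            seen.add(v)
            w = edges_a[v]  # follow an edge of A
            seen.add(w)
            v = edges_b[w]  # then an edge of B
        cycles += 1
    return cycles


A = [(4, -1), (-3, -2, 5), (6, -7, 8)]
B = [(1, 2, 3), (4, 5), (6, 7, 8)]
print(count_cycles(A, B))
```

The adjacency (2, 3), which appears in A as (-3, -2), forms a cycle of length two on its own.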
9 The Overlap Graph (with Chromosomes)
A = (4, -1, -3, -2, 5, 6, -7, 8) (concatenation of A’s chromosomes)
[Figure: overlap graph(A, B, A) on the intervals (1,2), (4,5), (2,3), (6,7), (7,8); legend: edge, chromosome]
10 (Connected) Components
[Figure: overlap graph(A, B, A) on the intervals (1,2), (4,5), (2,3), (6,7), (7,8)]
– bad component = non-trivial internal component
– trivial component = adjacency
11 Overview Preliminaries Reduction to a simpler case The main algorithm (reduced case)
12 The Reciprocal Translocation Distance
d_RT(A,B) = reciprocal translocation distance
Theorem [Hannenhalli 96, Bergeron et al. 06]: d_RT(A,B) = #genes - #chrs - #cycles(A,B) + F(A,B)
– F(A,B) depends on the topology of the bad components. If there are no bad components then F = 0.
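As a worked instance of the theorem, take the genomes from the cycle graph slide (slide 8): 8 genes, 3 chromosomes, and #cycles(A,B) = 3. The term F(A,B) is left symbolic here because it depends on the bad components.

```latex
\begin{aligned}
d_{RT}(A,B) &= \#\text{genes} - \#\text{chrs} - \#\text{cycles}(A,B) + F(A,B) \\
            &= 8 - 3 - 3 + F(A,B) \\
            &= 2 + F(A,B)
\end{aligned}
```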
13 Reduced Case: No Bad Components Result 1: The problem “Sorting by Reciprocal Translocations” can be reduced to the problem “Sorting by Reciprocal Translocations, No Bad Components” in linear time.
14 Reduction’s Main Idea
Isolation: all bad components are found in one chromosome.
Goal: eliminate the bad components without creating new ones
– Maintain two lists of chromosomes: those with exactly one minimal bad component, and those with two or more minimal bad components
– Use prefix-prefix translocations (no sign changes)
15 Overview Preliminaries Reduction to a simpler case The main algorithm (reduced case)
16 Translocations Defined by External Edges
e = external edge
(e) = the translocation that transforms e into an adjacency
– Increases #cycles(A,B)
– May create a bad component
d_RT(A,B) = #genes - #chrs - #cycles(A,B) + F(A,B)
[Figure: the graph G before and after applying (e); labels: 1, 2, e, x, y]
17 The Main Algorithm
1. Mark all edges (except adjacencies) as “unused”; S ← ∅, L ← ∅
2. While there is an unused external edge e:
   a. Mark e as “used”
   b. If (e) (FIRST(L)): apply (e) to A and APPEND(S, e)
3. If all the edges are used, return (S, L)
4. While all the unused edges are internal: undo the last translocation and PREPEND(L, POP(S))
5. Goto 1
Solution = “forward part” (S) + “backward part” (L)
18-23 The Main Algorithm: example run with B = {(1,2), (3,4,5,6)}; edge (i,i+1) identified by i
Slide | L | S   | Unused edges | A
18    |   |     | 1,3,4,5      | (1,-5,6) (3,-4,2)
19    |   | 1   | 3,4,5        | (3,-4,-5,6) (1,2)
20    | 1 |     | 3,4,5        | (1,-5,6) (3,-4,2)
21    | 1 | 4   | 3,5          | (3,6) (1,-5,-4,2)
22    | 1 | 4,3 | 5            | (-2,6) (1,-5,-4,-3)
23    | 1 | 4,3 |              | (-2,6) (1,-5,-4,-3)
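The example run on slides 18-23 can be replayed with a short sketch. It does not implement the main algorithm, because the test in step 2b is not recoverable from the slides. It only applies the translocation defined by an external edge and checks that the returned sequence S followed by L, that is edges 4, 3, 1, turns A into B. The helper names `flip`, `canonical`, and `apply_edge` are hypothetical, and `apply_edge` assumes co-tailed genomes so that both exchanged ends are non-empty.

```python
def flip(chrom):
    """Reverse the gene order and negate every sign."""
    return tuple(-g for g in reversed(chrom))


def canonical(genome):
    """Form that ignores chromosome order and chromosome flips."""
    return frozenset(min(c, flip(c)) for c in genome)


def apply_edge(genome, i):
    """Translocation that makes genes i and i+1 adjacent.
    The edge (i, i+1) must be external (genes on different chromosomes)."""
    x = next(c for c in genome if i in c or -i in c)
    y = next(c for c in genome if i + 1 in c or -(i + 1) in c)
    if x == y:
        raise ValueError("edge is internal")
    rest = [c for c in genome if c not in (x, y)]
    if -i in x:            # orient x so that gene i is positive
        x = flip(x)
    if -(i + 1) in y:      # orient y so that gene i+1 is positive
        y = flip(y)
    cut_x = x.index(i) + 1   # cut right after i
    cut_y = y.index(i + 1)   # cut right before i+1
    return rest + [x[:cut_x] + y[cut_y:], y[:cut_y] + x[cut_x:]]


A = [(1, -5, 6), (3, -4, 2)]
B = [(1, 2), (3, 4, 5, 6)]

state = A
for edge in (4, 3, 1):   # S = 4, 3 followed by L = 1
    state = apply_edge(state, edge)
print(canonical(state) == canonical(B))
```

Applying edge 1 first instead gives (1, 2) and (3, -4, -5, 6), where edges 3, 4, and 5 are all internal; this is the dead end that the run undoes between slides 19 and 20.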
24 Implementation of the Algorithm
Simple O(n^2) time implementation
time implementation using a data structure that:
– Maintains a fragmented signed permutation
– Allows one to find an external edge e and perform the translocation (e) in time
– Based on a data structure by Kaplan & Verbin ’05
25 Thank You !
26 Simulating Translocations by Reversals [Hannenhalli & Pevzner]
A translocation can be simulated by:
– A reversal on A, or
– A chromosome flip in A + a reversal on A
[Figure: cycle graph(A,B)]
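The two cases can be illustrated on a concatenation of two chromosomes X = X1 X2 and Y = Y1 Y2. This is a sketch under assumptions: the concatenation is simply X followed by Y, the cut points `i` and `j` and all function names are hypothetical, and a chromosome is treated as equal to its flip. Reversing X2 Y1 gives the prefix-postfix result; flipping Y first and then reversing X2 -Y2 gives the prefix-prefix result.

```python
def flip(segment):
    """Reverse the gene order and negate every sign."""
    return tuple(-g for g in reversed(segment))


def reversal(perm, i, j):
    """Reverse and negate the segment perm[i:j]."""
    return perm[:i] + flip(perm[i:j]) + perm[j:]


def simulate_prefix_postfix(x, y, i, j):
    """One reversal: X1 X2 Y1 Y2 -> X1 -Y1 -X2 Y2."""
    after = reversal(x + y, i, len(x) + j)
    boundary = i + j  # new chromosome boundary: |X1| + |Y1|
    return after[:boundary], after[boundary:]


def simulate_prefix_prefix(x, y, i, j):
    """Flip Y, then one reversal: X1 X2 -Y2 -Y1 -> X1 Y2 -X2 -Y1."""
    k = len(y) - j  # length of Y2
    after = reversal(x + flip(y), i, len(x) + k)
    boundary = i + k  # new chromosome boundary: |X1| + |Y2|
    # The second chromosome comes out as -X2 -Y1; flip it to read Y1 X2.
    return after[:boundary], flip(after[boundary:])


print(simulate_prefix_prefix((1, -5, 6), (3, -4, 2), 1, 2))
print(simulate_prefix_postfix((1, -5, -4, 2), (3, 6), 3, 1))
```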
27 Working on the overlap graph
H = overlap graph(A, B, A)
H is sorted if every component is trivial
Operations:
– (v): a reversal on an oriented external vertex v (cost = 1)
– (X): a flip on chromosome X (cost = 0)
28 H● (v) (two chromosomes only) [Figure: H before and after the operation on vertex v; legend: unoriented edge, oriented edge, chromosome]
29 H● (X) [Figure: H before and after the flip of chromosome X; legend: unoriented edge, oriented edge, chromosome]