A Linear-Time Algorithm for Computing Inversion Distance between Signed Permutations with an Experimental Study
David Bader, Bernard Moret, Mi Yan
Presented by Anat Heilper
Motivation
The evolutionary process in many single-chromosome organisms consists mostly of inversions.
Phylogenies are therefore reconstructed from gene order, using inversion distance as the measure of evolutionary distance between two genomes.
Model assumptions
Each genome is an ordering (circular or linear) of a fixed set of genes {g1, g2, …, gn}.
Each gene has an orientation: positive (gi) or negative (-gi).
Inversion
Let G be the genome with the signed ordering (linear or circular) g1, g2, …, gn.
Inversion(i, j), with i ≤ j, transforms
g1, g2, …, g(i-1), gi, g(i+1), …, gj, g(j+1), …, gn
into
g1, g2, …, g(i-1), -gj, -g(j-1), …, -gi, g(j+1), …, gn
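A minimal sketch of Inversion(i, j) on a linear ordering, assuming the genome is stored as a Python list of signed integers and that i and j are 1-based and inclusive, as on the slide. The function name `inversion` is only illustrative.

```python
def inversion(genome, i, j):
    """Apply Inversion(i, j) to a linear signed ordering (1-based, inclusive)."""
    if not 1 <= i <= j <= len(genome):
        raise ValueError("need 1 <= i <= j <= n")
    # Reverse the segment g_i..g_j and flip the sign of every gene in it.
    segment = [-g for g in reversed(genome[i - 1:j])]
    return genome[:i - 1] + segment + genome[j:]


print(inversion([1, 2, 3, 4, 5], 2, 4))  # [1, -4, -3, -2, 5]
```

An inversion of a single gene (i = j) only flips its sign, and applying the same inversion twice restores the original ordering.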
Inversion: what happens if j < i?
Circular case: rotate the circular ordering until the indices are in the proper relationship (i ≤ j).
Linear case: not applicable.
Inversion distance
The minimum number of inversions needed to transform one permutation into the other.
Computing the inversion distance between unsigned permutations is NP-hard.
Inversion distance as sorting
Computing the shortest sequence of inversions can also be treated as a sorting problem, also known as “sorting by reversals”.
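To make the definition concrete, here is a brute-force sketch that finds the inversion distance of a tiny linear signed permutation to the identity by breadth-first search. It is exponential in n and is not the algorithm of this talk; the function name `inversion_distance_bfs` is hypothetical and the sketch is meant only as a reference for very small inputs.

```python
from collections import deque


def inversion_distance_bfs(perm):
    """Minimum number of inversions that sort a linear signed permutation."""
    start = tuple(perm)
    target = tuple(range(1, len(perm) + 1))
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        current, dist = queue.popleft()
        if current == target:
            return dist
        n = len(current)
        # Try every inversion of a contiguous segment current[i..j].
        for i in range(n):
            for j in range(i, n):
                flipped = tuple(-g for g in reversed(current[i:j + 1]))
                nxt = current[:i] + flipped + current[j + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    raise ValueError("input is not a signed permutation of 1..n")


print(inversion_distance_bfs([2, 1]))  # 3
```

The permutation +2 +1 needs three inversions, for example +2 +1, then -1 -2, then +1 -2, then +1 +2.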
Previous work
Hannenhalli and Pevzner, 1995: first polynomial-time algorithm.
Berman and Hannenhalli, 1996: runs in O(n α(n)), where α(n) is the inverse Ackermann function.
Bader, Moret, and Yan, 2000: linear-time algorithm.
What we shall see today
A simple and practical linear-time algorithm to compute the connected components of the overlap graph.
Experimental evidence that this algorithm is efficient in practice.
Transformation
Given a signed permutation of {1, …, n}, we transform it into an unsigned permutation π of {0, …, 2n+1}:
Substitute each positive element x by the ordered pair (2x-1, 2x).
Substitute each negative element -x by the ordered pair (2x, 2x-1).
Set π(0) = 0 and π(2n+1) = 2n+1.
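Example
Signed permutation: +3 +9 -7 +5 -10 +8 +4 -6 +11 +2 +1
After substituting pairs: 5 6 17 18 14 13 9 10 20 19 15 16 7 8 12 11 21 22 3 4 1 2
After adding π(0) and π(2n+1): 0 5 6 17 18 14 13 9 10 20 19 15 16 7 8 12 11 21 22 3 4 1 2 23
Rules: +x becomes (2x-1, 2x); -x becomes (2x, 2x-1); π(0) = 0, π(2n+1) = 2n+1.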
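The substitution rules can be sketched in a few lines, assuming the signed permutation is given as a list of nonzero integers. The function name `to_unsigned` is only illustrative.

```python
def to_unsigned(signed):
    """Map a signed permutation of 1..n to an unsigned one of 0..2n+1."""
    n = len(signed)
    unsigned = [0]                           # pi(0) = 0
    for x in signed:
        if x > 0:
            unsigned += [2 * x - 1, 2 * x]   # +x -> (2x-1, 2x)
        else:
            unsigned += [2 * -x, 2 * -x - 1]  # -x -> (2x, 2x-1)
    unsigned.append(2 * n + 1)               # pi(2n+1) = 2n+1
    return unsigned


example = [3, 9, -7, 5, -10, 8, 4, -6, 11, 2, 1]
print(to_unsigned(example))
```

On the example from the slide this yields 0 5 6 17 18 14 13 9 10 20 19 15 16 7 8 12 11 21 22 3 4 1 2 23.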
Assumptions
Both permutations have been turned into unsigned permutations.
Both unsigned permutations are relabeled so that the first one becomes the identity permutation (0, 1, …, 2n, 2n+1).
This is why the problem can be viewed as sorting: find the number of inversions needed to transform the given permutation into the identity permutation.
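Cycle graph
We represent the unsigned permutation by an edge-colored graph called the “cycle graph”.
Properties of the graph:
2n+2 vertices, one per position of the permutation.
For each i, 0 ≤ i ≤ n, there is a black edge between the vertices at positions 2i and 2i+1, that is, between π(2i) and π(2i+1).
For each i, 0 ≤ i ≤ n, there is a gray edge between the vertices holding the values 2i and 2i+1.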
Cycle graph
i:     0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
π(i):  0  5  6 17 18 14 13  9 10 20 19 15 16  7  8 12 11 21 22  3  4  1  2 23
The resulting graph consists of disjoint cycles in which edges alternate colors.
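A possible way to label the cycles of the example, sketched under these assumptions: vertices are named by position, a black edge joins positions 2i and 2i+1, and a gray edge joins the positions holding the values 2i and 2i+1. Each cycle is named by its smallest position, which is the quantity the later slides call C[i].B. The function name `label_cycles` is hypothetical.

```python
PI = [0, 5, 6, 17, 18, 14, 13, 9, 10, 20, 19, 15,
      16, 7, 8, 12, 11, 21, 22, 3, 4, 1, 2, 23]


def label_cycles(pi):
    """Return begin[i]: smallest position on the cycle through position i."""
    pos = [0] * len(pi)
    for p, v in enumerate(pi):
        pos[v] = p                      # inverse permutation
    begin = [None] * len(pi)
    for start in range(len(pi)):
        if begin[start] is not None:
            continue                    # already on a labeled cycle
        p = start
        while begin[p] is None:
            begin[p] = start
            q = p ^ 1                   # follow the black edge (2i <-> 2i+1)
            begin[q] = start
            p = pos[pi[q] ^ 1]          # follow the gray edge (value 2i <-> 2i+1)
    return begin


print(label_cycles(PI))
```

Because positions are scanned in increasing order, the first position that reaches a cycle is its minimum, so every position is visited a constant number of times and the labeling takes linear time.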
Overlapping edges
Two gray edges (π(i), π(j)) and (π(k), π(t)) overlap if the two intervals [i, j] and [k, t] overlap but neither one contains the other.
(Figure: the example permutation, rows i and π(i), with its gray edges.)
Overlapping cycles
Two cycles C1 and C2 overlap if there exist overlapping gray edges e1 ∈ C1 and e2 ∈ C2.
Extent of a cycle C: the interval [C.B, C.E], where C.B = min{i | π(i) ∈ C} (B stands for the beginning of the cycle) and C.E = max{i | π(i) ∈ C} (E stands for the ending of the cycle).
Extent of a set of cycles {C1, …, Ck}: the interval [B, E], where B = min{Ci.B | 1 ≤ i ≤ k} and E = max{Ci.E | 1 ≤ i ≤ k}.
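The definitions above translate almost directly into code. This is a small sketch that assumes a gray edge is represented by the pair of positions of its endpoints and a cycle by the collection of its positions; the helper names are illustrative.

```python
def edges_overlap(e1, e2):
    """True if the position intervals intersect and neither contains the other."""
    (i, j), (k, t) = sorted(e1), sorted(e2)
    return i < k < j < t or k < i < t < j


def extent(positions):
    """Extent [C.B, C.E] of a cycle given the positions it passes through."""
    return min(positions), max(positions)


def combined_extent(extents):
    """Extent [B, E] of a set of cycles given their individual extents."""
    return min(b for b, _ in extents), max(e for _, e in extents)


print(edges_overlap((2, 13), (6, 15)))   # True: 2 < 6 < 13 < 15
print(edges_overlap((0, 21), (2, 13)))   # False: [2, 13] is nested in [0, 21]
print(extent([4, 5, 10, 11]))            # (4, 11)
```

Strict inequalities are enough here because every position is the endpoint of exactly one gray edge, so two distinct gray edges never share an endpoint.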
Overlap graph of a permutation
One vertex for each cycle in the cycle graph.
An edge between any two vertices that correspond to overlapping cycles.
Overlap graph
One vertex for each cycle in the cycle graph.
(Figure: the cycles of the example permutation, labeled a to f.)
Overlap graph
An edge between any two vertices that correspond to overlapping cycles.
(Figure: the overlap graph on the vertices a, b, c, d, e, f.)
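For comparison with the linear-time method, the overlap graph of the example can be built naively by testing every pair of gray edges. This is a quadratic sketch, not the algorithm of the talk; it assumes the cycle labels A to F per position as in the worked example, and the function names are illustrative.

```python
from itertools import combinations

PI = [0, 5, 6, 17, 18, 14, 13, 9, 10, 20, 19, 15,
      16, 7, 8, 12, 11, 21, 22, 3, 4, 1, 2, 23]
LABELS = "AABBCCDDEECCBBDDEEFFAAFF"   # cycle through each position


def gray_edges(pi):
    """Gray edges as sorted position pairs: values 2i and 2i+1 are joined."""
    pos = {v: p for p, v in enumerate(pi)}
    return [tuple(sorted((pos[v], pos[v + 1]))) for v in range(0, len(pi), 2)]


def overlap_graph(pi, labels):
    """Set of unordered pairs of distinct cycles that own overlapping gray edges."""
    edges = set()
    for (i, j), (k, t) in combinations(gray_edges(pi), 2):
        if i < k < j < t or k < i < t < j:
            a, b = labels[i], labels[k]
            if a != b:
                edges.add(tuple(sorted((a, b))))
    return edges


print(sorted(overlap_graph(PI, LABELS)))
```

On the example this gives the edges A-F, B-D, B-E, C-D, C-E and D-E, so the connected components are {A, F} and {B, C, D, E}.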
By now we know how to:
Transform a signed permutation into an unsigned permutation.
Create an overlap graph from the cycle graph.
Compute the minimum number of inversions and an inversion sequence from the overlap graph.
The bottleneck of the algorithm is building the overlap graph, which takes O(n^2) time.
How can we improve the algorithm? Build a much smaller graph that captures the same information as the overlap graph.
Building the overlap forest
The overlap forest is created in two scans of the permutation:
First scan: build a trivial forest F0 in which each node is its own tree.
Second scan: iteratively refine the forest; at each iteration the domain in which we search for overlapping cycles is expanded by one element (2n+2 iterations overall).
Active tree: a tree rooted at f is active at stage j whenever j lies properly within the extent of f; the extents of the active trees are stored in a stack.
Building the overlap forest: refinement stage
Let F(j-1) be the forest constructed from processing the elements 0 through j-1, and let f be the cycle containing element j of the permutation. Build Fj from F(j-1) as follows:
If j is the beginning of its cycle f, then f is the root of a single-node tree.
Otherwise, if cycle f overlaps with a cycle g, add the edge (g, f) and compute the combined extent of g and of the tree rooted at f.
Building the overlap forest: the algorithm
1) Scan the permutation, label each position i with C[i].B, and set up [C[i].B, C[i].E].
2) Initialize an empty stack.
3) For i ← 0 to 2n+1:
   a) If i == C[i].B then push C[i].   (i is the beginning of its own cycle)
   b) extent ← C[i]
      While top.B > C[i].B:   (the cycle that i belongs to overlaps another cycle)
         extent.B ← min{extent.B, top.B}
         extent.E ← max{extent.E, top.E}
         parent[top.B] ← C[i].B
         pop top
      End while
      top.B ← min{extent.B, top.B}
      top.E ← max{extent.E, top.E}
   c) If i == top.E then pop top.   (the tree at the top of the stack is no longer active)
Here top denotes the top element of the stack.
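The pseudocode can be turned into a short runnable sketch. The assumptions here: each cycle is named by its first position C[i].B, the arrays `begin` and `end` give C[i].B and C[i].E for every position i (step 1 is taken as already done), and `extent` is a copy of the cycle's current extent. The function name `overlap_forest` and the dictionary layout are illustrative choices, not taken from the slides.

```python
def overlap_forest(begin, end):
    """Return parent: maps a cycle (named by its first position) to its parent."""
    ext = {b: [b, e] for b, e in zip(begin, end)}   # mutable extent per cycle
    parent = {}
    stack = []                                      # roots of the active trees
    for i in range(len(begin)):
        c = begin[i]
        if i == c:                                  # (a) i starts its own cycle
            stack.append(c)
        lo, hi = ext[c]                             # (b) extent <- C[i] (a copy)
        while stack and ext[stack[-1]][0] > c:      # top.B > C[i].B
            top = stack.pop()
            lo = min(lo, ext[top][0])
            hi = max(hi, ext[top][1])
            parent[top] = c                         # parent[top.B] <- C[i].B
        top = stack[-1]
        ext[top][0] = min(lo, ext[top][0])
        ext[top][1] = max(hi, ext[top][1])
        if i == ext[top][1]:                        # (c) top is no longer active
            stack.pop()
    return parent


BEGIN = [0, 0, 2, 2, 4, 4, 6, 6, 8, 8, 4, 4, 2, 2, 6, 6, 8, 8, 18, 18, 0, 0, 18, 18]
END = [21, 21, 13, 13, 11, 11, 15, 15, 17, 17, 11, 11,
       13, 13, 15, 15, 17, 17, 23, 23, 21, 21, 23, 23]
print(overlap_forest(BEGIN, END))   # {8: 4, 6: 4, 4: 2, 18: 0}
```

On the example permutation the cycles starting at positions 8 and 6 (E and D) become children of the cycle starting at 4 (C), C becomes a child of the cycle starting at 2 (B), and the cycle starting at 18 (F) becomes a child of the cycle starting at 0 (A).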
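Example of building an overlap forest
First stage: setup
i:       0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
π(i):    0  5  6 17 18 14 13  9 10 20 19 15 16  7  8 12 11 21 22  3  4  1  2 23
C[i].B:  0  0  2  2  4  4  6  6  8  8  4  4  2  2  6  6  8  8 18 18  0  0 18 18
C[i].E: 21 21 13 13 11 11 15 15 17 17 11 11 13 13 15 15 17 17 23 23 21 21 23 23
C[i]:    A  A  B  B  C  C  D  D  E  E  C  C  B  B  D  D  E  E  F  F  A  A  F  F
Cycles: a, b, c, d, e, f.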
i = 0:
a) i == C[0].B, so push A.
b) extent ← A. While (top.B = 0 > C[i].B = 0)? No. C[0].B ← 0, C[0].E ← 21.
c) If (0 == 21)? No.
extent: A. Stack (top first): A.
i = 1:
a) i = 1 ≠ C[1].B = 0, so nothing is pushed.
b) extent ← A. While (top.B = 0 > C[i].B = 0)? No. C[1].B ← 0, C[1].E ← 21.
extent: A. Stack (top first): A.
i = 2:
a) i == C[2].B, so push B.
b) extent ← B. While (top.B = 2 > C[i].B = 2)? No. C[2].B ← 2, C[2].E ← 13.
i = 3: very similar to i = 2.
extent: B. Stack (top first): B, A.
i = 4:
a) i == C[4].B, so push C.
b) extent ← C. While (top.B = 4 > C[i].B = 4)? No. C[4].B ← 4, C[4].E ← 11.
i = 5: very similar to i = 4.
extent: C. Stack (top first): C, B, A.
i = 6:
a) i == C[6].B, so push D.
b) extent ← D. While (top.B = 6 > C[i].B = 6)? No. C[6].B ← 6, C[6].E ← 15.
i = 7: very similar to i = 6.
extent: D. Stack (top first): D, C, B, A.
i = 8:
a) i == C[8].B, so push E.
b) extent ← E. While (top.B = 8 > C[i].B = 8)? No. C[8].B ← 8, C[8].E ← 17.
i = 9: very similar to i = 8.
extent: E. Stack (top first): E, D, C, B, A.
i = 10:
a) i = 10 ≠ C[10].B = 4, so nothing is pushed.
b) extent ← C.
   While (top.B = 8 > C[i].B = 4): extent.B ← min{4, 8}, extent.E ← max{11, 17}, parent[8] ← 4, pop top.
   While (top.B = 6 > C[i].B = 4): extent.B ← min{4, 6}, extent.E ← max{17, 15}, parent[6] ← 4, pop top.
extent: [4, 17]. Stack (top first): E, D, C, B, A before the loop; C, B, A after both pops.
Forest so far: c is the parent of e and d.
i = 10, continued:
   While (top.B = 4 > C[i].B = 4)? No.
   top.B ← min{4, 4}, top.E ← max{17, 11}.
c) If (10 == 17)? No.
extent: [4, 17]. Stack (top first): C, B, A.
Forest so far: c is the parent of e and d.
i = 11:
a) i = 11 ≠ C[11].B = 4, so nothing is pushed.
b) extent ← C. While (top.B = 4 > C[i].B = 4)? No. top.B ← min{4, 4}, top.E ← max{17, 17}.
c) If (11 == 17)? No.
extent: [4, 17]. Stack (top first): C, B, A.
i = 12:
a) i = 12 ≠ C[12].B = 2, so nothing is pushed.
b) extent ← B.
   While (top.B = 4 > C[i].B = 2): extent.B ← min{2, 4}, extent.E ← max{13, 17}, parent[4] ← 2, pop top.
   While (top.B = 2 > C[i].B = 2)? No.
   top.B ← min{2, 2}, top.E ← max{17, 13}.
c) If (12 == 17)? No.
extent: [2, 17]. Stack (top first): B, A.
Forest so far: b is the parent of c; c is the parent of e and d.
i = 13:
a) i = 13 ≠ C[13].B = 2, so nothing is pushed.
b) extent ← B. While (top.B = 2 > C[i].B = 2)? No. top.B ← min{2, 2}, top.E ← max{17, 17}.
c) If (13 == 17)? No.
extent: [2, 17]. Stack (top first): B, A.
i = 14:
a) i = 14 ≠ C[14].B = 6, so nothing is pushed.
b) extent ← D. While (top.B = 2 > C[i].B = 6)? No. top.B ← min{6, 2}, top.E ← max{15, 17}.
c) If (14 == 17)? No.
i = 15..16: similar to i = 14.
i = 17: i == top.E, so pop top.
Stack (top first): B, A, where B has extent [2, 17]; after i = 17 only A remains.
Forest so far: b is the parent of c; c is the parent of e and d.
i = 18:
a) i == C[18].B, so push F.
b) extent ← F. While (top.B = 18 > C[i].B = 18)? No. C[18].B ← 18, C[18].E ← 23.
c) If (18 == 23)? No.
i = 19: very similar to i = 18.
extent: F. Stack (top first): F, A.
Forest so far: b is the parent of c; c is the parent of e and d.
i = 20:
a) i = 20 ≠ C[20].B = 0, so nothing is pushed.
b) extent ← A.
   While (top.B = 18 > C[i].B = 0): extent.B ← min{0, 18}, extent.E ← max{21, 23}, parent[18] ← 0, pop top.
   top.B ← min{0, 0}, top.E ← max{23, 21}, so the extent of A becomes [0, 23].
i = 21..22: nothing is pushed or popped.
i = 23: i == top.E, so pop top; the stack is now empty.
Stack (top first): F, A before the loop; A after the pop.
Final forest: a is the parent of f; b is the parent of c; c is the parent of e and d.
Lemma 1
At iteration i of step (3) of the algorithm, if the tree rooted at top is active, i lies on cycle f, and f.B < top.B, then there exists a cycle h in the tree rooted at top such that h overlaps with f.
Proof: since top is active, it must have been pushed onto the stack before the current iteration (top.B < i), and we have not yet reached the end of top's extent (i < top.E). Hence i is contained in top's extent (top.B < i < top.E). Since i lies on the cycle f, which begins before top (f.B < top.B), there must be an edge of cycle f that overlaps with a cycle h in the tree rooted at top.
Theorem
The algorithm produces a forest in which each tree is composed of exactly those nodes that form a connected component of the overlap graph.
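Given the theorem, the connected components can be read off the parent pointers. A small sketch, assuming the forest is stored as a dictionary from a cycle (named by its first position) to its parent, with roots absent from the dictionary. The parent map below is the one obtained for the worked example, and the function name is hypothetical.

```python
def components_from_forest(parent, cycles):
    """Group cycles by the root of the tree that contains them."""
    def root(c):
        while c in parent:      # climb until a node without a parent
            c = parent[c]
        return c

    groups = {}
    for c in cycles:
        groups.setdefault(root(c), set()).add(c)
    return groups


# Forest of the worked example: E and D under C, C under B, F under A.
PARENT = {8: 4, 6: 4, 4: 2, 18: 0}
CYCLES = [0, 2, 4, 6, 8, 18]
print(components_from_forest(PARENT, CYCLES))
```

This yields two components, {0, 18} (cycles A and F) and {2, 4, 6, 8} (cycles B, C, D, E). Climbing to the root for every cycle is not linear in the worst case; it is used here only to keep the sketch short.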
Proof
We show that after each iteration, the trees in the forest correspond exactly to the connected components determined by the permutation values scanned up to that point, by induction on the number of applications of step (3) of the algorithm.
Base: each tree of F0 has a single node, and no two nodes belong to the same connected component.
Step: assume the invariant holds after i-1 iterations, and let i lie on cycle f. We prove that the nodes of the tree containing i form the same set as the nodes of the connected component containing i (other connected components are unaffected and so still obey the invariant).
Proof: a node in the tree containing i must be in the same connected component as i.
Case i = f.B: nothing changes in the overlap graph (and thus in the connected components); in step (3) the forest remains unchanged, so the invariant is preserved.
Case i > f.B: the edge (top, f) is added to the forest whenever f.B < top.B holds, and it joins the subtree rooted at f with that rooted at top into a single subtree. By Lemma 1, f.B < top.B implies that there exists an h in the tree rooted at top such that h and f overlap, so (h, f) must belong to the overlap graph. This edge connects the connected component containing f with that containing top, merging them into a single connected component, so the invariant is preserved.
Proof: a node in the same connected component as i must be in the tree containing i.
Whenever (j, i) and (k, l), with j < k < i < l, are gray edges on cycles f and h, respectively, the edge (f, h) must belong to the overlap graph. In such a case, our algorithm ensures that the edge (h, f) belongs to the overlap forest.
(Figure: positions j < k < i < l, with gray edge (j, i) on cycle f and gray edge (k, l) on cycle h.)
Complexity
1) Setup and initialization of the empty stack: linear time.
2) Main loop (for i ← 0 to 2n+1): each cycle is pushed onto the stack exactly once, and is popped either after adding an edge to the forest or after passing the end of its extent. Thus the loop also takes linear time.
Every step of the algorithm takes linear time, so the entire algorithm runs in worst-case linear time.
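The counting argument can be written out as a short bound. This is a sketch with hypothetical constants c_1 and c_2 for the cost of one loop iteration and one stack operation; it uses the fact that every cycle contains at least one of the n+1 black edges, so there are at most n+1 cycles, each pushed once and popped at most once.

```latex
T(n) \;\le\; \underbrace{c_1\,(2n+2)}_{\text{loop iterations}}
      \;+\; \underbrace{c_2\,(\#\text{pushes} + \#\text{pops})}_{\text{stack operations}}
\;\le\; c_1\,(2n+2) + 2\,c_2\,(n+1) \;=\; O(n)
```

The inner while loop never runs more often in total than the number of pops, which is why it does not add a second factor of n.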
Experimental setup
Signed permutations of length 10, 20, 40, 60, 80, 160, 320, and 640:
For each length, 10 groups of 3 signed permutations were generated from the identity permutation using Nadeau and Taylor's model.
5 evolutionary rates were used: 4, 16, 64, 256, and 1024 inversions per edge.
For each length, 10 groups of 3 random permutations were also generated and used as an extreme test case.
Experimental setup
For each of these test suites, the 3 distances among the 3 genomes in each group were computed 20,000 times in a tight loop.
The average and standard deviation were computed over the 10 groups.
Computed inversion distances are expected to be at most twice the evolutionary rate, since there are 2 edges between each pair of genomes.
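A rough sketch of how one test group might be generated, under assumptions that are not stated on the slides: the three genomes are the leaves of a star-shaped tree whose center is the identity, each edge applies `rate` inversions, and every inversion is chosen uniformly at random among all contiguous segments. The function names and the seeding scheme are illustrative only.

```python
import random


def random_inversions(genome, count, rng):
    """Apply `count` uniformly chosen inversions to a copy of `genome`."""
    g = list(genome)
    n = len(g)
    for _ in range(count):
        # Two distinct cut points among n+1 give a uniform nonempty segment.
        i, j = sorted(rng.sample(range(n + 1), 2))
        g[i:j] = [-x for x in reversed(g[i:j])]
    return g


def make_group(n, rate, seed=0):
    """Three signed permutations, each `rate` inversions away from the identity."""
    rng = random.Random(seed)
    identity = list(range(1, n + 1))
    return [random_inversions(identity, rate, rng) for _ in range(3)]


for genome in make_group(10, 4, seed=1):
    print(genome)
```

With this layout any two genomes in a group are joined by a path of two edges, which is where the bound of twice the evolutionary rate on their inversion distance comes from.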
Run time of connected components computation as a function of the permutation size
Total run time as a function of the permutation size
Inversion distance as a function of the permutation size
Speed comparison of Bader's linear-time algorithm and the union-find (UF) approach (connected components only)
Speed comparison of Bader's linear-time algorithm and the UF algorithm
Summary
Presented a simple, practical, linear-time algorithm for computing the inversion distance between two signed permutations, with a detailed experimental study.