1 Genomic Sorting with Length-Weighted Reversals
Ron Y. Pinter, Technion
Steve Skiena, SUNY Stony Brook
Computer Science Department, Technion – Israel Institute of Technology
2 Genome Rearrangement
events:
  – duplication
  – translocation
  – reversal (inversion)
occur primarily during reproduction
allow large-scale genomic comparisons
3 Sorting by Reversals
genome represented as a permutation on 1, 2, …, n
  – n = # homologous genes among species
assumptions
  – can identify genes
  – genes are distinct
operation: reversal of a subsequence (of genes)
  – models inversion (occurs during crossover)
one of the permutations can be 1, 2, …, n
  – appropriately relabel others
4 Example
4 3 2 8 7 1 5 6 11 10 9
4 3 2 1 7 8 5 6 9 10 11
1 2 3 4 8 7 6 5 9 10 11
1 2 3 4 5 6 7 8 9 10 11
6 reversals in our model (for f(l) = l): cost = 18
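The example's total can be checked by replaying reversals on the first row. The sketch below is one reconstruction: the slide does not list the individual reversals, so the six index pairs in `steps` are an assumption chosen to pass through the rows shown, and the helper names `reverse` and `replay` are hypothetical.

```python
def reverse(perm, i, j):
    """Return a copy of perm with the slice perm[i:j] reversed (0-based, j exclusive)."""
    return perm[:i] + perm[i:j][::-1] + perm[j:]


def replay(perm, reversals, f=lambda l: l):
    """Apply the reversals in order; return the final permutation and the total cost."""
    cost = 0
    for i, j in reversals:
        perm = reverse(perm, i, j)
        cost += f(j - i)  # each reversal is charged f(length of the reversed block)
    return perm, cost


start = [4, 3, 2, 8, 7, 1, 5, 6, 11, 10, 9]
# Assumed sequence of six reversals (lengths 3, 3, 4, 2, 2, 4).
steps = [(3, 6), (8, 11), (0, 4), (4, 6), (6, 8), (4, 8)]
final, cost = replay(start, steps)
print(final, cost)
```

With f(l) = l, the lengths 3 + 3 + 4 + 2 + 2 + 4 add up to the cost of 18 quoted on the slide.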
5 Our Model
unsigned
cost of reversal of a subsequence of length l is f(l)
total sorting cost (or distance) is $\sum_j f(\mathrm{length}(s_j))$
  – the $s_j$ are the reversed subsequences
6 Cost Functions
additive: f(x+y) = f(x) + f(y)
subadditive: f(x+y) < f(x) + f(y)
superadditive: f(x+y) > f(x) + f(y)
other
  – e.g. bitonic f(l)
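The three classes can be told apart numerically on small lengths. This is a minimal sketch, not a proof: `classify` is a hypothetical helper that only tests pairs up to `max_len`, and the last sample function is an arbitrary stand-in for a function that fits none of the three classes.

```python
def classify(f, max_len=50):
    """Classify f by testing all pairs 1 <= x, y <= max_len (a finite check only)."""
    pairs = [(x, y) for x in range(1, max_len + 1) for y in range(1, max_len + 1)]
    if all(f(x + y) == f(x) + f(y) for x, y in pairs):
        return "additive"
    if all(f(x + y) < f(x) + f(y) for x, y in pairs):
        return "subadditive"
    if all(f(x + y) > f(x) + f(y) for x, y in pairs):
        return "superadditive"
    return "other"


print(classify(lambda l: l))                  # linear cost
print(classify(lambda l: 1))                  # unit cost
print(classify(lambda l: l * l))              # quadratic cost
print(classify(lambda l: (l - 10) ** 2 + 1))  # falls, then rises
```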
7 Problems
algorithm to sort any permutation
  – worst-case min cost
approximate min cost for a given permutation
8 Extremal Costs
highly subadditive: e.g. unit cost, f(l) = 1
  – NP-complete [Caprara, ’97]
  – series of approximation ratios: 2, 1.75, 1.375
highly superadditive: $f(l) > l^2$
  – essentially bubblesort
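To illustrate the superadditive extreme, the sketch below sorts with length-2 reversals only (adjacent swaps) and charges f(2) per swap. The function name and the cubic cost f(l) = l³ are assumptions picked for illustration; the slide only states the condition f(l) > l².

```python
def bubble_sort_by_reversals(perm, f):
    """Sort using only length-2 reversals (adjacent swaps); return (sorted list, cost)."""
    perm = list(perm)
    cost = 0
    for end in range(len(perm) - 1, 0, -1):
        for i in range(end):
            if perm[i] > perm[i + 1]:
                # Swapping neighbours is a reversal of a block of length 2.
                perm[i], perm[i + 1] = perm[i + 1], perm[i]
                cost += f(2)
    return perm, cost


cubic = lambda l: l ** 3  # assumed example of a highly superadditive cost
result, cost = bubble_sort_by_reversals([4, 3, 2, 1], cubic)
print(result, cost, cubic(4))
```

For this input the six adjacent swaps cost 6 · 8 = 48, while the single length-4 reversal that also sorts it costs 64, so many short reversals beat one long one under this cost.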
9 Our Results
additive cost function
  – specifically f(l) = l
QuickSort-like algorithm for worst-case
  – complexity: $O(n \lg^2 n)$
min cost approximation ratio of $O(\lg^2 n)$
10 MedianEject(a,b)
find r maximal blocks of wrong-sided elements with respect to median
for lg r do:
  flip every other pair of blocks of wrong-sided and adjacent blocks
move wrong-sided blocks to median boundary
reverse left and right blocks
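Only the first step of MedianEject is specific enough on the slide to sketch. The code below finds the maximal blocks of wrong-sided elements under assumptions the slide does not spell out: distinct values, a left part of size ⌈(b−a+1)/2⌉, an element counting as wrong-sided when its value belongs in the other half, and blocks split at the median boundary. The function name `wrong_sided_blocks` is hypothetical.

```python
def wrong_sided_blocks(perm, a, b):
    """Return maximal runs (start, end), inclusive, of wrong-sided positions in perm[a..b]."""
    window = perm[a:b + 1]
    half = (len(window) + 1) // 2      # assumed size of the left part
    median = sorted(window)[half - 1]  # largest value that belongs on the left
    mid = a + half                     # first position of the right part
    blocks, start = [], None
    for pos in range(a, b + 1):
        # Wrong-sided: a small value sitting on the right, or a large one on the left.
        wrong = (perm[pos] <= median) != (pos < mid)
        if start is not None and (not wrong or pos == mid):
            blocks.append((start, pos - 1))  # close the current run
            start = None
        if wrong and start is None:
            start = pos
    if start is not None:
        blocks.append((start, b))
    return blocks


print(wrong_sided_blocks([1, 6, 2, 5, 3, 4], 0, 5))
print(wrong_sided_blocks([1, 5, 6, 2, 3, 4], 0, 5))
```

The number of blocks returned plays the role of r in the lg r flipping rounds that follow.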
11 Sample Run
complexity: O((b-a) lg r)
12 ReversalSort(a,b)
MedianEject(a, b);
ReversalSort(a, (a+b)/2);
ReversalSort((a+b)/2, b);
Complexity
$T(n) = 2\,T(n/2) + O(f(n) \lg n)$
$\Rightarrow T(n) = O(f(n) \lg^2 n) = O(n \lg^2 n)$ for $f(n) \sim n$
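The growth of the recurrence can be checked numerically for f(n) = n. This sketch makes assumptions that are not on the slide: n is a power of two, the base case is T(1) = 0, and the hidden constant in the O(f(n) lg n) term is 1.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def T(n):
    """T(n) = 2 T(n/2) + n lg n for n a power of two, with assumed base case T(1) = 0."""
    if n == 1:
        return 0
    lg_n = n.bit_length() - 1  # exact lg n when n is a power of two
    return 2 * T(n // 2) + n * lg_n


for k in (1, 2, 3, 10, 20):
    n = 2 ** k
    print(n, T(n), n * k * k)  # T(n) stays below n lg^2 n
```

Under these assumptions the recurrence unrolls to T(n) = n · lg n · (lg n + 1) / 2, which is within a constant factor of n lg² n.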
13 Algorithmic Improvements
I. simplify “short” phases
II. merge 2 last steps of MedianEject when possible (2p+q vs. 3p+q)
III. apply II recursively
[figure: blocks labeled p, q, p]
14 Approximation Ratio
M(p) is the maximal total distance between pairs of out-of-order elements
Lemma 4: min cost is $\Omega(M(p))$
but
  – Lemma 6: # of out-of-order elements < 3 M(p)
  – Lemma 7: MedianEject touches only elements within linear range from out-of-order elements
together these yield:
  – each round of MedianEject takes $O(M(p) \lg^2 n)$
  – ReversalSort costs $O(M(p) \lg^2 n)$
ReversalSort is at most $O(\lg^2 n)$ times optimal
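The final claim follows by dividing the upper bound on ReversalSort by the lower bound of Lemma 4. In the sketch below, $\mathrm{OPT}(p)$ for the minimum sorting cost and the constants $c_1, c_2 > 0$ are notation introduced here for illustration, and $M(p) > 0$ is assumed.

```latex
\[
\frac{\mathrm{cost}_{\mathrm{ReversalSort}}(p)}{\mathrm{OPT}(p)}
\;\le\; \frac{c_2 \, M(p) \lg^2 n}{c_1 \, M(p)}
\;=\; \frac{c_2}{c_1} \lg^2 n
\;=\; O(\lg^2 n)
\]
```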
15 Bioinformatic “Validation”
use our cost (= distance) to build phylogenetic trees
4 plants (chloroplastic genes): Cyanophora, Cyanidium, Guillardia, Porphyra
consistent with [Martin et al., PNAS Sept ’02]
work in progress [M. Shoham]
16 Open Problems: Algorithmic
weighted genes
tighter approximation ratio
  – close to O(lg n)
  – can get to constant?
other cost functions (incl. bitonic)
the signed case
17 Open Problems: Modeling
chromosomal ordering
what is the right cost function?
  – consider $\mathrm{cost}(l) = l^d$
combine with constant-based models
  – restricted regions
  – “undesired” reversal sequences
deal with duplication and translocation events
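How the exponent d changes the price of a fixed reversal sequence can be seen with a few lines of code. This is a minimal sketch: `total_cost` is a hypothetical helper and the list of reversal lengths is made up for illustration.

```python
def total_cost(lengths, d):
    """Total cost of a reversal sequence under cost(l) = l**d."""
    return sum(l ** d for l in lengths)


lengths = [3, 3, 4, 2, 2, 4]  # hypothetical reversal lengths
costs = {d: total_cost(lengths, d) for d in (0, 1, 2)}
print(costs)
```

Here d = 0 recovers the unit-cost model, which only counts reversals, and d = 1 is the additive case f(l) = l; larger d penalizes long reversals more heavily.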