1
Genomic Sorting with Length-Weighted Intervals
236818 - Seminar in Bioinformatics: Advanced Algorithms in Computational Biology
Spring 2005, Technion
Asaf Merschon
2
What we saw so far
Current genome rearrangement algorithms ignore the length of reversals; they only count their number.
Traditionally, such analysis assumes that each reversal has unit cost.
3
Motivation
The assumption of unit-cost reversals is not completely defensible biologically:
  A longer genomic reversal causes more upheaval to the organism, lowering the likelihood that the organism survives to pass on the mutation.
  The mechanics of genome reversal suggest that the probability of a reversal depends on its length (among other factors).
4
The topics covered
On top of the surface:
  Introduction to Genomic Sorting with Length-Weighted Intervals.
  Lower and upper bounds on the complexity of the solution.
  Proofs (partial).
Down under:
  Improved Bounds on Sorting with Length-Weighted Reversals (Extended Abstract).
  Concept and examples.
  Sorting by Length-Weighted Reversals: Dealing with Signs and Circularity.
  General approach to solutions.
5
Goal
Find an algorithm that efficiently sorts one sequence into another by reversals under length-sensitive cost models.
The focus is on sorting unsigned permutations by reversals.
The problem remains NP-hard in the new model, so we aim for approximation results.
6
Definitions (1)
Let the function $f(\ell)$ denote the cost of a reversal of length $\ell$.
Traditionally, $f(\ell) = 1$.
We say a function $f$ is:
  Additive if $f(x) + f(y) = f(x + y)$
  Subadditive if $f(x) + f(y) \ge f(x + y)$
  Superadditive if $f(x) + f(y) \le f(x + y)$
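The three classes can be checked numerically for a candidate cost function. The sketch below is only an empirical test over a finite range of lengths, not a proof; the helper name `classify` and the sample functions are illustrative assumptions, and the non-strict inequalities are the standard textbook definitions.

```python
from typing import Callable


def classify(f: Callable[[int], float], max_len: int = 50, tol: float = 1e-9) -> str:
    """Empirically classify f on lengths 1..max_len (a numeric check, not a proof)."""
    diffs = [f(x) + f(y) - f(x + y)
             for x in range(1, max_len + 1)
             for y in range(1, max_len + 1)]
    if all(abs(d) <= tol for d in diffs):
        return "additive"
    if all(d >= -tol for d in diffs):
        return "subadditive"
    if all(d <= tol for d in diffs):
        return "superadditive"
    return "none"


# Hypothetical sample cost functions of the reversal length l.
unit = lambda l: 1.0            # traditional unit cost
linear = lambda l: float(l)     # cost proportional to length
root = lambda l: l ** 0.5       # grows slower than linear
square = lambda l: float(l * l) # grows faster than linear

for name, f in [("unit", unit), ("linear", linear), ("root", root), ("square", square)]:
    print(name, classify(f))
```

Under these definitions the unit cost is itself subadditive, which is why algorithms that use few long reversals suit it.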
7
Definitions (2)
A Reversal Graph of permutations of length n is a graph where:
  The vertices are all the permutations of length n.
  There is an edge $(p_1, p_2)$ of weight $f(\ell)$ if there exists a single reversal of length $\ell$ that transforms the permutation $p_1$ into the permutation $p_2$.
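For very small n the reversal graph can be explored explicitly. The following sketch assumes an edge of weight f(length) for every reversal of length at least 2 and runs Dijkstra from the identity permutation; the function names are illustrative. Because the graph looks the same from every vertex (relabeling the elements maps any permutation to the identity), the largest distance from the identity equals the diameter.

```python
import heapq


def neighbors(p, f):
    """Yield (q, cost) for every reversal of length >= 2 applied to tuple p."""
    n = len(p)
    for i in range(n):
        for j in range(i + 1, n):
            q = p[:i] + p[i:j + 1][::-1] + p[j + 1:]
            yield q, f(j - i + 1)


def sorting_costs(n, f):
    """Cheapest sorting cost of every permutation of 1..n (Dijkstra from identity).

    Reversals are self-inverse, so the graph is undirected and the distance
    from the identity to p equals the cost of sorting p.
    """
    start = tuple(range(1, n + 1))
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, p = heapq.heappop(heap)
        if d > dist[p]:
            continue  # stale heap entry
        for q, c in neighbors(p, f):
            nd = d + c
            if nd < dist.get(q, float("inf")):
                dist[q] = nd
                heapq.heappush(heap, (nd, q))
    return dist


def diameter(n, f):
    """Largest sorting cost over all permutations of length n."""
    return max(sorting_costs(n, f).values())


print(diameter(3, lambda l: 1))  # unit cost
print(diameter(3, lambda l: l))  # linear cost
```

The graph has n! vertices, so this is only practical for single-digit n.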
8
Wanted Results
1. Bound the cost sufficient to sort any permutation of n elements (that is, establish an upper bound). This is equivalent to computing the diameter of the reversal graph under the shortest-path metric.
2. Approximate the minimum-cost reversal sequence for a given permutation. We would like a heuristic that guarantees the resulting sequence costs no more than a slowly growing function of n times the cost of the optimal sequence.
9
Important notes
The relatively coarse bounds generated by the following techniques limit their direct application to biological data.
The work presented leads to interesting algorithmic results and raises questions that can serve as a basis for further bioinformatics studies.
10
Previous Work
Sorting by unit-cost, unsigned reversals was shown to be NP-hard by Caprara. Our problem inherits hardness under more general metrics from this result.
Kececioglu & Sankoff gave approximation algorithms for reversal distance that guarantee results at most 2 times optimal.
Bafna & Pevzner improved this to a factor of 7/4.
Berman et al. improved this factor to 1.375.
Minimum-cost unsigned reversal sorting has also been studied under models where cost increases so dramatically that only length-2 reversals are affordable.
Experiments were done on the mitochondrial genomes of two fungi as well as on random samples. They suggest that length may play an important role in biasing certain rearrangement patterns.
11
Goal 1 – Bounding the diameter of the Reversal Graph
By bounding the diameter of the Reversal Graph, we establish an upper bound on the cost of sorting any n-element permutation.
Standard sorting algorithms exhibit interesting performance on highly subadditive and superadditive functions, but not on additive measures. The primary result of this section is a new reversal-based sorting algorithm which performs well on additive cost functions. (Examples in the next slides.)
12
Examples on Highly Subadditive & Superadditive functions
Subadditive: A reversal-based version of selection sort performs at most n-1 reversals, a fraction of which are potentially $\Theta(n)$ in length. Thus selection sort gives an $O(n \cdot f(n))$ diameter algorithm.
  Especially efficient for [cost function missing in source]
Superadditive: Bubble sort and insertion sort perform transpositions of neighboring elements, one for each inversion in the input permutation. This gives an $O(n^2 \cdot f(2))$ diameter algorithm.
  Particularly efficient for [cost function missing in source]
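Both baseline algorithms are easy to express with reversals and a cost counter. This is a minimal sketch assuming 0-based positions and a cost function f of the reversal length; the function names are illustrative.

```python
def reverse(p, i, j):
    """Reverse p[i..j] (inclusive) in place."""
    p[i:j + 1] = p[i:j + 1][::-1]


def selection_sort_by_reversals(perm, f):
    """At most n-1 reversals, some of them long: cost is O(n * f(n)) for monotone f."""
    p = list(perm)
    count, cost = 0, 0
    for i in range(len(p)):
        # position of the smallest element not yet placed
        j = min(range(i, len(p)), key=p.__getitem__)
        if j != i:
            reverse(p, i, j)  # brings the minimum to position i
            count += 1
            cost += f(j - i + 1)
    return p, count, cost


def bubble_sort_by_reversals(perm, f):
    """One length-2 reversal per inversion: cost is (#inversions) * f(2)."""
    p = list(perm)
    count, cost = 0, 0
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(p) - 1):
            if p[i] > p[i + 1]:
                reverse(p, i, i + 1)
                count += 1
                cost += f(2)
                swapped = True
    return p, count, cost


linear = lambda l: l
print(selection_sort_by_reversals([3, 1, 4, 2], linear))
print(bubble_sort_by_reversals([3, 1, 4, 2], linear))
```

On this small input both use three reversals, but the costs differ because selection sort uses one reversal of length 3.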
13
The interesting case
Additive functions, particularly $f(\ell) = \ell$.
Presented is an algorithm for sorting any permutation of n elements in $O(n \log^2 n)$ cost using divide and conquer.
The key operation is MedianEject.
14
Definitions (3)
Sorting a permutation involves putting element i in position i.
Let $\pi_i$ denote the element in the $i$-th position of the permutation.
Let $\pi^{-1}_i$ denote the position of element $i$ in the permutation.
An element x is wrong-sided if x and $\pi^{-1}_x$ are on different sides of the median $m$, meaning $x \le m < \pi^{-1}_x$ or vice versa.
15
MedianEject
We apply MedianEject to the portion of the permutation from position a to position b. One round of MedianEject moves all wrong-sided elements in the interval [a,b] to the correct side of its median, as follows:
MedianEject(a,b) =
  Identify the r maximal runs of wrong-sided elements and the median (a+b)/2.
  for (i = 1 to log r): reduce the number of wrong-sided runs by half using non-overlapping reversals, none crossing the median.
  With two reversals, move the remaining wrong-sided runs to the median boundary.
  Reverse the left and right wrong-sided runs using a single reversal.
16
MedianEject – Sample Run
17
Lemmas (1)
Lemma 1: MedianEject costs $O(f(b-a) \log r)$ for any additive cost function f.
Proof (intuitively): There are $O(\log r)$ rounds of reversals, since with each pass there are half as many maximal runs of wrong-sided elements on each side of the median. The non-overlapping reversals of one round together reverse at most b-a elements, so for additive f a round costs $O(f(b-a))$, resulting in a total of $O(f(b-a) \log r)$.
18
Reversal Sort
MedianEject is the partitioning operation of the following Quicksort-like algorithm:
19
Lemmas (2)
Lemma 2: ReversalSort sorts with cost $O(f(n) \log^2 n)$ for any additive cost function f(n).
Proof: By the master theorem, the recurrence $T(n) = 2T(n/2) + O(f(n) \log n)$ evaluates to $O(f(n) \log^2 n)$.
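The recurrence behind Lemma 2 can also be unrolled by hand. The sketch below assumes the recurrence is $T(n) = 2T(n/2) + c\,f(n)\log n$ (one MedianEject call per subproblem, using $r \le n$ in Lemma 1) and takes $f(n) = n$ as the representative additive cost; the constant $c$ and the base-2 logarithm are illustrative choices.

```latex
\begin{align*}
T(n) &= 2\,T(n/2) + c\,n\log n \\
     &= \sum_{i=0}^{\log n - 1} 2^i \cdot c\,\frac{n}{2^i}\,\log\frac{n}{2^i} \;+\; n\,T(1) \\
     &= c\,n \sum_{i=0}^{\log n - 1} (\log n - i) \;+\; O(n) \\
     &= c\,n\,\frac{\log n\,(\log n + 1)}{2} \;+\; O(n) \;=\; O(n \log^2 n).
\end{align*}
```

Each of the $\log n$ recursion levels contributes at most $c\,n\log n$, which is where the second logarithmic factor comes from.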
20
Goal 2 – Approximating Distance
From a biological point of view, constructing the least expensive transformation from a given permutation A to another permutation B is more interesting than minimizing the diameter. This is because we want to reconstruct the evolutionary history from A and B, a history which presumably took the most parsimonious possible path.
21
Definitions (4)
We now show that for all permutations, the reversal sorting algorithm yields a cost which is [factor missing in source] times optimal for any additive cost function.
Our analysis requires the definition of a weighted graph G(p) associated with a given permutation p.
The vertices of G(p) are the n elements (positions) of p. There is an edge (i,j) in G(p) where [condition missing in source]. The weight of this edge is [formula missing in source].
22
Definitions (5)
G(p) may be used to provide lower bounds on the optimal cost of sorting. However, these bounds can be very coarse.
Instead, we bound the optimal cost in terms of the weight of the heaviest non-crossing matching M(G(p)).
We say a matching M(G(p)) (namely a set of edges from G(p)) is non-crossing if the intervals spanned by its edges neither overlap nor nest. Such a maximum-weight matching can be found easily using dynamic programming.
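The dynamic program can be sketched as weighted interval scheduling. This assumes that "non-crossing" means the edge intervals are pairwise disjoint, and it takes the weighted edges as input, since the exact edge rule of G(p) is not given here; the function name and the sample edges are hypothetical.

```python
from bisect import bisect_left


def heaviest_disjoint_edges(edges):
    """Return (total_weight, chosen_edges) for a maximum-weight set of edges
    (i, j, weight), i < j, whose closed intervals [i, j] are pairwise disjoint."""
    edges = sorted(edges, key=lambda e: e[1])  # sort by right endpoint
    ends = [e[1] for e in edges]
    # best[k] = optimum over the first k edges in sorted order
    best = [(0, [])]
    for k, (i, j, w) in enumerate(edges):
        # m = number of earlier edges that end strictly before position i
        m = bisect_left(ends, i, 0, k)
        take = best[m][0] + w
        if take > best[k][0]:
            best.append((take, best[m][1] + [(i, j, w)]))
        else:
            best.append(best[k])
    return best[-1]


# Hypothetical edge list: (left position, right position, weight).
sample = [(1, 3, 3), (2, 5, 4), (4, 6, 3), (6, 8, 3)]
print(heaviest_disjoint_edges(sample))
```

Sorting by right endpoint lets each step choose between skipping the edge and combining it with the best solution that ends before it starts.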
23
Lemmas (3)
Theorem 3: The greedy breakpoint-merging heuristic can yield a reversal sequence whose cost is optimal.
Proof: Won't be provided in this presentation.
Lemma 4: The weight of M(G(p)) is a lower bound on the reversal-sorting cost for permutation p under additive weight functions.
Proof: Consider the simpler task of just placing the elements defining edges from M(G(p)) into their proper positions. This task can be done in cost f(w), where w is the total weight of M(G(p)), by performing the reversals defined by the edges in the matching. Because none of the intervals overlap or nest, no longer reversal can help move multiple elements into their proper positions; because the cost function is additive, we cannot benefit from using shorter reversals.
24
Lemmas (4)
To argue that the weight of M(G(p)) is a good lower bound, we will bound certain properties of p and G(p) in terms of the size of this matching.
Lemma 5: Let $e_k$ denote the $k$-th edge of M(G(p)), where [index range missing in source]. Let $I_k(i,j)$ be a function which equals 1 if $e_k$ intersects the interval [i, …, j] and is zero otherwise. Then edge $(i,j) \notin G(p)$ if the weight of $(i,j)$ exceeds $\sum_k I_k(i,j)\, w(e_k)$, the total weight of the matching edges it intersects.
Proof: By definition, M(G(p)) is the maximum-cost non-crossing matching. Hence such an edge (i, j) cannot exist in G(p), for if it did we could remove all intersected matching edges and insert (i, j) into M(G(p)) to yield a higher-cost non-crossing matching.
25
Lemmas (5)
Lemma 6: The number of out-of-position elements in p is at most [bound missing in source].
Proof: Won't be provided in this presentation.
Definition: The penumbra is the set of positions where out-of-position elements potentially lie, together with all positions overlapped by edges of M(G(p)).
Lemma 7: No element outside of the penumbra moves during the execution of MedianEject.
Proof: Won't be provided in this presentation.
Implied (by Lemma 7): Every round of non-overlapping reversals costs at most [bound missing in source] throughout the execution of ReversalSort.
26
Lemmas (6)
Corollary 1: The cost of each round of MedianEject is [bound missing in source], and therefore ReversalSort costs [bound missing in source].
Theorem 8: The ReversalSort heuristic solution is at most a factor of [factor missing in source] times the optimal solution.
Proof: Derived from the previous lemmas.
27
Coming Up Next
Improved Bounds on Sorting with Length-Weighted Reversals (Extended Abstract).
Sorting by Length-Weighted Reversals: Dealing with Signs and Circularity.
Conclusions, suggestions & questions raised.
Comments!?
28
Improved bounds on Sorting with Length-Weighted Reversals
We now approach the problem of sorting integer sequences by length-weighted reversals using a wider range of cost functions.
For the cost function we consider a wide class of functions, namely $f(\ell) = \ell^\alpha$ with $\alpha \ge 0$, where $\ell$ is the length of the reversal.
So far we have mainly dealt with the case where $\alpha = 1$.
29
Sorting Sequences of 0's and 1's
To sort a sequence of 0's and 1's:
  Recursively sort the left and right halves.
  Perform one more reversal across the median, for a sorting cost of: $T(n) = 2T(n/2) + f(n)$
Pinter and Skiena used this algorithm to obtain an upper bound of $O(n \log^2 n)$ on the diameter for linear-cost reversals, as was shown in the first part of the presentation.
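The divide-and-conquer scheme for 0/1 sequences can be sketched in a few lines. After both halves are sorted, the sequence has the shape 0..0 1..1 0..0 1..1, and one reversal of the middle part (the left 1-block followed by the right 0-block) finishes the job. Names and the 0-based, inclusive reversal coordinates are illustrative assumptions.

```python
def sort01(bits, f):
    """Sort a 0/1 list by reversals.

    Returns (sorted_list, reversals, total_cost), where each reversal is an
    inclusive pair (i, j) and f maps a reversal length to its cost.
    """
    s = list(bits)
    ops = []

    def rec(lo, hi):  # sorts s[lo:hi]
        if hi - lo <= 1:
            return
        mid = (lo + hi) // 2
        rec(lo, mid)
        rec(mid, hi)
        # Both halves now look like 0..0 1..1.
        i = lo
        while i < mid and s[i] == 0:
            i += 1  # i = first 1 in the left half
        j = hi
        while j > mid and s[j - 1] == 1:
            j -= 1  # j = one past the last 0 in the right half
        if i < mid and j > mid:
            # One reversal across the median swaps the 1-block and the 0-block.
            s[i:j] = s[i:j][::-1]
            ops.append((i, j - 1))

    rec(0, len(s))
    cost = sum(f(j - i + 1) for i, j in ops)
    return s, ops, cost


print(sort01([0, 1] * 4, lambda l: l))
```

Each level of the recursion reverses at most n elements in total, which matches the f(n) term of the recurrence for additive f.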
30
Bounds and Approximation Ratios for different values of $\alpha$
The table summarizes the known bounds and approximation ratios for different values of $\alpha$.
Proofs for some of the bounds and approximation ratios will be presented as a proof of concept.
31
Upper Bounds on Diameter (1)
In the case of additive cost functions we saw that the upper bound on sorting any given permutation is $O(f(n) \log^2 n)$.
Similarly, we would like to find such bounds for other functions in the class we are using (i.e. $f(\ell) = \ell^\alpha$).
To do this, we will use the concept of sorting sequences of 0's and 1's.
32
Upper Bounds on Diameter (2)
Case 1 – $0 \le \alpha < 1$:
Consider the divide-and-conquer sorting algorithm described earlier. The recurrence for sorting 0's and 1's becomes: $T(n) = 2T(n/2) + n^\alpha$, so $T(n) = O(n)$.
For permutations, the cost of the recursive sort becomes: $T(n) = 2T(n/2) + O(n)$, so $T(n) = O(n \log n)$.
These results are upper bounds.
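One plausible reading of "for permutations" is a quicksort-like scheme that uses the 0/1 sort as its partition step: label every element by whether it is at most the median, sort those labels by reversals, then recurse on each side. The sketch below follows that assumption; it is not taken from the slides, and all names are illustrative.

```python
import random


def reversal_quicksort(perm, f):
    """Sort a permutation by reversals; return (sorted_list, total_cost)."""
    p = list(perm)
    cost = 0

    def partition(lo, hi, pivot):
        """Bring elements <= pivot to the front of p[lo:hi] (0/1 divide and conquer)."""
        nonlocal cost
        if hi - lo <= 1:
            return
        mid = (lo + hi) // 2
        partition(lo, mid, pivot)
        partition(mid, hi, pivot)
        i = lo
        while i < mid and p[i] <= pivot:
            i += 1  # first "large" element in the left half
        j = hi
        while j > mid and p[j - 1] > pivot:
            j -= 1  # one past the last "small" element in the right half
        if i < mid and j > mid:
            p[i:j] = p[i:j][::-1]
            cost += f(j - i)

    def sort(lo, hi):
        if hi - lo <= 1:
            return
        k = (hi - lo - 1) // 2              # rank of the median in the segment
        pivot = sorted(p[lo:hi])[k]
        partition(lo, hi, pivot)
        mid = lo + k + 1                    # exactly k + 1 elements are <= pivot
        sort(lo, mid)
        sort(mid, hi)

    sort(0, len(p))
    return p, cost


random.seed(0)
perm = random.sample(range(1, 17), 16)
print(reversal_quicksort(perm, lambda l: l ** 0.5))
```

With a partition cost of O(n) for $\alpha < 1$, the outer recursion adds one logarithmic factor.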
33
Upper Bounds on Diameter (3)
Case 2 – $1 < \alpha < 2$:
Consider the divide-and-conquer sorting algorithm described earlier. The recurrence for sorting 0's and 1's becomes: $T(n) = 2T(n/2) + n^\alpha$, so $T(n) = O(n^\alpha)$.
For permutations, the cost of the recursive sort becomes: $T(n) = 2T(n/2) + O(n^\alpha)$, so $T(n) = O(n^\alpha)$.
These results are upper bounds.
34
Upper Bounds on Diameter (4)
Case 3 – $\alpha \ge 2$:
This case has no use for reversals of more than two elements. As such, bubble sort is an asymptotically optimal solution.
As a result, a tight bound (upper and lower) on the diameter is: $\Theta(n^2)$
35
Lower Bounds on Diameter: Concept
Proving the lower bounds on the diameters for different values of $\alpha$ is much more complex than proving the upper bounds.
We will see the proof of a lower bound for a linear cost function, tighter than what we have already seen.
36
Lemmas (7)
Theorem 2.3: The cost to sort n elements by reversals with a linear cost function is $\Omega(n \log n)$, even when all elements are 0's and 1's.
Thus, our bounds for sorting 0/1 sequences are tight (same upper and lower bounds), but a multiplicative gap of $O(\log n)$ exists for sorting permutations.
37
Proof of Lower Bound on Diameter for the Linear Cost Function (1)
We approach the problem by exhibiting a difficult sorting instance.
Specifically, we will prove a lower bound of $\Omega(n \log n)$ on the cost of sorting the length-n sequence 010101…01 by reversals.
The proof follows a potential function argument.
38
Definitions (6)
Before the sorting begins, we match the $i$-th 0 with the $i$-th 1. Throughout the sorting algorithm we keep this matching.
Let $d_i(t)$ be the current distance between the $i$-th 0 and the $i$-th 1 after the $t$-th reversal.
When there is no ambiguity, we abbreviate $d_i(t)$ by $d_i$.
The potential function is: $P(t) = \sum_i \log d_i(t)$
39
Lemmas (8)
Lemma 2.1: The initial value of the potential function is 0, and the final value is $\Omega(n \log n)$.
We will show how a reversal affects the value of $d_i$ in the potential function by considering the $i$-th (0,1) pair.
Observation 2.1: The distance $d_i$ can only change when one element of the pair is inside the reversal and the other is outside.
Lemma 2.2: A reversal of length k increases the potential P(t) by at most 4k.
Together, these two lemmas yield Theorem 2.3.
40
Proof of Lower Bound on Diameter for the Linear Cost Function (2)
Proof: Suppose that for a reversal of length k, one of the elements of a (0,1) pair is inside the reversal and the other is outside, so that $d_i$ is affected by the reversal.
The distance between the two elements of this pair can increase by at most k, because each element is moved by at most a distance k.
41
Proof of Lower Bound on Diameter for the Linear Cost Function (3) Before reversal After reversal
42
Proof of Lower Bound on Diameter for the Linear Cost Function (4)
Let us assume by symmetry that the 0 is outside the reversed sequence and the 1 is inside. Suppose that the distance from the 0 to the closest element in the reversal is l.
The increase of the potential caused by the change in $d_i$ for this pair is at most: $\log(l + k) - \log l = \log\frac{l+k}{l}$
43
Proof of Lower Bound on Diameter for the Linear Cost Function (5)
The distance l must be a natural number, and each value occurs at most twice in one reversal: once on the left side and once on the right side of the reversed sequence.
According to Observation 2.1, there are at most k such pairs whose distance changes the value of the potential function.
44
Proof of Lower Bound on Diameter for the Linear Cost Function (6)
As a result, the value of the potential function increases by at most: $2 \sum_{l=1}^{k} \log\frac{l+k}{l}$
Notice that $\log\frac{l+k}{l}$ grows as l gets smaller.
45
Proof of Lower Bound on Diameter for the Linear Cost Function (7)
By Stirling's approximation, $\binom{2k}{k} = \Theta(4^k / \sqrt{k})$, therefore $\log_2 \binom{2k}{k} \le 2k$ and the potential thus increases by at most $4k$.
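The 4k bound can be sanity-checked numerically. The sketch below assumes the potential is the sum of base-2 logarithms of the pair distances (a form inferred from the initial value 0 and the use of Stirling's approximation, not stated explicitly in the slides) and that at most two pairs sit at each outside distance l = 1..k.

```python
import math


def potential(distances):
    """Assumed potential: sum of log2 of the matched (0,1) pair distances."""
    return sum(math.log2(d) for d in distances)


def max_increase(k):
    """Largest possible potential gain of one length-k reversal under the
    assumptions above: two pairs at each outside distance l = 1..k."""
    return 2 * sum(math.log2((l + k) / l) for l in range(1, k + 1))


# The sum telescopes to 2 * log2(C(2k, k)), which is at most 2 * log2(4^k) = 4k.
for k in (1, 2, 10, 100):
    print(k, round(max_increase(k), 3), 4 * k)

n = 8
print(potential([1] * (n // 2)))        # 0101...01: every pair is adjacent
print(potential([n // 2] * (n // 2)))   # 0..01..1 with the i-th 0 matched to the i-th 1
```

Since the potential has to rise from 0 to order n log n and a reversal of cost k raises it by at most 4k, the total cost is at least a constant fraction of n log n.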
46
Sorting by Length-Weighted Reversals: Dealing with Signs and Circularity
Abstract:
  Sorting linear and circular permutations and 0/1 sequences by reversals in a length-sensitive cost model.
  We consider both the signed and unsigned cases.
47
What Lies Ahead
Lower and upper bounds on the various cases.
Mentions of some approximation algorithms that guarantee the bounds shown.
Partial proofs of some of the bounds and approximations.
Cost functions are still of the class $f(\ell) = \ell^\alpha$.
48
A Word (or Two) on Circularity
Circularity generally offers more opportunities to reduce the optimal cost of sorting a given permutation by reversals.
At the same time, it makes finding an efficient solution more challenging.
A non-unit cost model exacerbates these problems even further.
Take as an example the permutation [example missing in source].
  In the linear case, one can sort it using two reversals.
  In the circular case, where the two ends of the permutation meet, one can sort it using one reversal.
  In the case of a unit cost model, the ratio of the costs is 2.
  However, in the case of a linear cost model, the ratio is [value missing in source].
49
Relationship of Costs for the Different Cases
The following relationships hold for the four different cases: [formula missing in source]
50
Bounds and Approximation Ratios
Lower and upper bounds for SBR of signed or unsigned and linear or circular 0/1 sequences and permutations.
Approximation ratios for SBR of signed linear as well as signed and unsigned circular 0/1 sequences and permutations.
51
Approximation Algorithms for Sorting 0/1 Sequences
We now introduce lower bounds for sorting linear signed as well as circular unsigned 0/1 sequences.
We will see an approximation algorithm for linear signed 0/1 sequences.
We will deal with the case of [range of $\alpha$ missing in source].
52
SBR of Circular Unsigned 0/1 Sequences – Definitions
Given a circular sequence S, denote the lengths of the 0 blocks and 1 blocks contained in S by [symbols missing in source] respectively.
Let [definitions missing in source].
We define the potential function P(S) as follows: [formula missing in source]
53
SBR of Circular Unsigned 0/1 Sequences – Lemmas
Lemma 1: A reversal of length r acting on a circular sequence S increases the value of the potential function P(S) by at most [bound missing in source].
Proof: Won't be provided in this presentation.
Lemma 2: The function V(S) [definition missing in source] is a lower bound on the cost of sorting an unsigned circular sequence S.
Proof: By induction (next slide).
54
SBR of Circular Unsigned 0/1 Sequences – Proof
Let m be the number of reversals in some optimal sorting solution. We want to prove that if a sorting solution uses exactly m reversals, it costs at least V(S).
Base case: m = 0 is trivial.
Induction step: Suppose the claim holds for all $m \le k$. Consider a 0/1 sequence S that has an optimal sorting series of $k+1$ reversals. Take the first reversal, let r be its length, and let S' be the sequence it produces. S' can be sorted by k reversals, and hence V(S') is a lower bound on the cost of sorting S'. By Lemma 1 we get [inequality missing in source], and by the definition of V we get [inequality missing in source]. Therefore: [derivation missing in source], as needed.
55
SBR of Linear Signed 0/1 Sequences – Definitions
Consider a linear signed 0/1 sequence. Define a block in the sequence to be a contiguous segment of 0's or 1's of the same sign.
Notice that there are four kinds of blocks in such a sequence.
We represent the sequence as a series of […]. Let us denote […] as the potential function for such a linear sequence S.
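The block definition above can be illustrated with a short sketch. It assumes, purely for illustration, that each element is a `(bit, sign)` pair with `sign` equal to +1 or -1; the name `signed_blocks` is hypothetical as well. The four kinds of blocks then correspond to the four possible `(bit, sign)` combinations.

```python
from itertools import groupby
from typing import List, Tuple

SignedBit = Tuple[int, int]  # (bit, sign) with bit in {0, 1} and sign in {+1, -1}


def signed_blocks(s: List[SignedBit]) -> List[Tuple[int, int, int]]:
    """Split a linear signed 0/1 sequence into maximal blocks.

    Each block is reported as (bit, sign, length); a block ends whenever
    either the symbol or the sign changes.
    """
    return [(bit, sign, len(list(group))) for (bit, sign), group in groupby(s)]
```

With this representation, +0 +0 -0 -1 -1 +1 splits into four blocks, one of each kind.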
56
SBR of Linear Signed 0/1 Sequences – Lemmas
Lemma 3: The potential V(S) is a lower bound on the cost of sorting linear signed sequences.
Proof: Not provided in this presentation.
Theorem 2: The algorithm signedImprovedDC is an O(1) approximation algorithm.
Proof: Not provided in this presentation.
And the algorithm? On the next slide…
57
SBR of Linear Signed 0/1 Sequences – Approximation Algorithm
Given a signed sequence S, let unsign(S) represent the sequence without the signs.
signedImprovedDC(S)
1. U ← unsign(S)
2. u ← improvedDC(U)
3. Mimic the reversals used to sort U on S. Denote the resulting sequence as S'.
4. Reverse elements of S' with a negative sign. Let s be the cost of this step.
5. Output s + u
improvedDC(S) is an O(1) approximation algorithm for unsigned sorting of linear 0/1 sequences when […]. (Not supplied and not proved in this presentation.)
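The five steps of signedImprovedDC can be sketched as follows. Several parts are assumptions, since the presentation does not supply them: improvedDC is replaced by a naive stand-in (`naive_unsigned_sort`, a bubble sort built from length-2 reversals, which is not the real algorithm and carries no approximation guarantee), the cost of a reversal of length ℓ is taken to be ℓ^α, elements are `(bit, sign)` pairs, and step 4 is read in the simplest way, as one length-1 reversal per negative element.

```python
from typing import Callable, List, Tuple

SignedBit = Tuple[int, int]   # (bit, sign), sign in {+1, -1}
Reversal = Tuple[int, int]    # half-open index interval [i, j)


def cost(length: int, alpha: float = 1.0) -> float:
    """Assumed cost model: a reversal of the given length costs length ** alpha."""
    return float(length) ** alpha


def unsign(s: List[SignedBit]) -> List[int]:
    """Step 1: drop the signs."""
    return [bit for bit, _ in s]


def naive_unsigned_sort(u: List[int]) -> List[Reversal]:
    """Stand-in for improvedDC (NOT the real algorithm): bubble sort via length-2 reversals."""
    u = list(u)
    reversals: List[Reversal] = []
    changed = True
    while changed:
        changed = False
        for i in range(len(u) - 1):
            if u[i] > u[i + 1]:
                u[i], u[i + 1] = u[i + 1], u[i]
                reversals.append((i, i + 2))
                changed = True
    return reversals


def apply_signed_reversal(s: List[SignedBit], i: int, j: int) -> List[SignedBit]:
    """Reverse s[i:j] and flip the sign of every element in it."""
    segment = [(bit, -sign) for bit, sign in reversed(s[i:j])]
    return s[:i] + segment + s[j:]


def signed_improved_dc(
    s: List[SignedBit],
    sorter: Callable[[List[int]], List[Reversal]] = naive_unsigned_sort,
    alpha: float = 1.0,
) -> Tuple[List[SignedBit], float]:
    reversals = sorter(unsign(s))                            # steps 1-2
    u = sum(cost(j - i, alpha) for i, j in reversals)
    s_prime = list(s)
    for i, j in reversals:                                   # step 3: mimic on S
        s_prime = apply_signed_reversal(s_prime, i, j)
    negatives = sum(1 for _, sign in s_prime if sign < 0)
    s_cost = negatives * cost(1, alpha)                      # step 4: fix signs one by one
    sorted_seq = [(bit, 1) for bit, _ in s_prime]
    return sorted_seq, s_cost + u                            # step 5
```

Passing the sorter as a parameter keeps the wrapper independent of how the unsigned sequence is sorted, so a real improvedDC implementation could be substituted without changing the remaining steps.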
58
Summary – What We’ve Seen (1)
The introduction of Length Weighted Models for Sorting By Reversals.
Incentive:
Unit cost isn't biologically defensible.
Experiments show that length weighted models may help substantially in biasing between two evolutionary paths.
Lower and Upper Bounds on sorting with additive cost functions.
Upper Bounds: For any given permutation.
Lower Bounds: For a specific permutation p.
59
Summary – What We’ve Seen (2)
Improved Bounds on Cost of Length Weighted Sorting By Reversals:
Dealing with a wider range of functions.
Improved Upper Bounds on sorting unsigned 0/1 sequences and permutations for all values of […].
Improved Lower Bound for the case of […].
An improvement on something we've already seen.
60
Summary – What We’ve Seen (3)
Sorting By Reversals with Length Weighted Models – Dealing with Signs and Circularity:
Still with the same family of functions.
Lower Bounds for the cases of circular unsigned and linear signed 0/1 sequences.
Approximation algorithm for the sorting of linear signed 0/1 sequences.
Many lemmas, theorems and corollaries.
61
Questions Raised
Aside from what hasn't been covered in this presentation (which is, other than more bounds and approximation algorithms, another gargantuan set of lemmas, theorems and corollaries), there are many questions left open.
What is the right cost function, or what are the right cost functions for various types of sequences?
Is the family of functions presented in this presentation large enough to contain the right one(s)?
Is the real cost function defined differently over different ranges? Should it be species specific?
Should we include more data (other than length) when computing a reversal's cost, e.g., the position of the reversal or the sequences being reversed?
62
And least but not last… (as far as this presentation goes)
Questions?
Comments!?
63
Bibliography
R. Y. Pinter and S. Skiena. "Sorting with length-weighted reversals." Proceedings of the 13th International Conference on Genome Informatics (GIW 2002), December 2002, pp. 103-111.
M. A. Bender, D. Ge, S. He, H. Hu, R. Y. Pinter, S. Skiena, and F. Swidan. "Improved Bounds on Sorting with Length-Weighted Reversals (Extended Abstract)." Proceedings of the 15th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2004, pp. 912-921.
F. Swidan, M. A. Bender, D. Ge, S. He, H. Hu, and R. Y. Pinter. "Sorting by length-weighted reversals: Dealing with signs and circularity." Proceedings of the 15th Annual Symposium on Combinatorial Pattern Matching (CPM), Lecture Notes in Computer Science (LNCS), Vol. 3109, July 2004, pp. 32-46.