16. Lecture WS 2004/05, Bioinformatics III: V16 – genome rearrangement

Presentation transcript:

Slide 1: V16 – genome rearrangement
The order in which genes occur on the genomes of different species carries important information that allows phylogenetic relationships to be inferred. Together with phylogenetic information, ancestral gene order reconstructions give clues about how well the functional organisation of genomes is conserved. → A step towards a global understanding of the evolution of life.
Phylogeny reconstruction techniques that use gene order data often rely on the definition of an evolutionary distance between two gene orders. These distances are usually computed as the minimal number of rearrangement operations needed to transform one genome into the other.
Bergeron et al., WABI 2004 (2004)

Slide 2: V16 – genome rearrangement
Most choices of rearrangement operations quickly lead to hard algorithmic problems. The set of operations is therefore usually restricted to reversals, translocations, fusions and fissions, for which linear-time algorithms have been developed in recent years. This choice is dictated more by algorithmic necessity than by biological reality: in some genomes, for example, transpositions and inverted transpositions can be quite common.
A family of phylogenetic approaches labelled "distance-based" methods relies on pairwise evolutionary distances, which are then fed into an algorithm such as neighbor-joining to infer tree topology and branch lengths. These methods provide no information about the putative ancestral gene order.
Bergeron et al., WABI 2004 (2004)

Slide 3: V16 – genome rearrangement
Parsimony-based approaches attempt to identify the rearrangement scenario (including tree topology and gene orders at the internal nodes) that minimizes the number of evolutionary events required. → This problem is computationally much more difficult than just computing distances.
Heuristic algorithms exist that use either breakpoint or reversal distances. However, these methods provide only one (or a small number of) hypotheses about ancestral gene orders, with no information about alternative optimal or near-optimal solutions.
Today:
- a quick look at the reversal distance problem again
- a new method, "sets of conserved intervals" (Bergeron & Jens Stoye)
Bergeron et al., WABI 2004 (2004)

Slide 4: Breakpoint Graph
The breakpoint graph of a permutation $\pi$ is an edge-colored graph $G(\pi)$ with $n + 2$ vertices $\{\pi_0, \pi_1, \ldots, \pi_n, \pi_{n+1}\} = \{0, 1, \ldots, n, n+1\}$. We join vertices $\pi_i$ and $\pi_{i+1}$ by a black edge for $0 \le i \le n$. We join vertices $\pi_i$ and $\pi_j$ by a gray edge if $\pi_i \sim \pi_j$, i.e. if their values are consecutive ($|\pi_i - \pi_j| = 1$).
Equivalently, a breakpoint graph is obtained by superposing a black path that traverses the vertices $0, 1, \ldots, n, n+1$ in the order given by the permutation $\pi$ and a gray path that traverses the vertices in the order given by the identity permutation.
[Figure: black path, gray path, and their superposition forming the breakpoint graph]

Slide 5: Cycle decomposition
A cycle in an edge-colored graph $G$ is called alternating if the colors of every two consecutive edges of the cycle are distinct. In the following, "cycle" always means alternating cycle.
A vertex $v$ in a graph $G$ is called balanced if the number of black edges incident to $v$ equals the number of gray edges incident to $v$. A balanced graph is a graph in which every vertex is balanced.
$G(\pi)$ is a balanced graph. Therefore, there exists a cycle decomposition of $G(\pi)$ into edge-disjoint alternating cycles (every edge of the graph belongs to exactly one cycle of the decomposition). Cycles in such a decomposition may be self-intersecting. The breakpoint graph of the previous slide can be decomposed into 4 cycles, one of which is self-intersecting.
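As an illustration of the breakpoint graph and the balance property, the following Python sketch builds the black and gray edge sets of $G(\pi)$ for an unsigned permutation and checks that every vertex is balanced. The function names `breakpoint_graph` and `is_balanced` and the example permutation are assumptions made for this sketch; they do not come from the lecture.

```python
def breakpoint_graph(perm):
    """Black and gray edge sets of G(pi) for an unsigned permutation of 1..n."""
    n = len(perm)
    ext = [0] + list(perm) + [n + 1]  # frame the permutation with 0 and n+1
    # Black edges join neighbours in the order given by the permutation.
    black = {frozenset((ext[i], ext[i + 1])) for i in range(n + 1)}
    # Gray edges join consecutive values, i.e. neighbours in the identity.
    gray = {frozenset((v, v + 1)) for v in range(n + 1)}
    return black, gray


def is_balanced(black, gray, n):
    """True if every vertex has as many black as gray incident edges."""
    return all(
        sum(v in e for e in black) == sum(v in e for e in gray)
        for v in range(n + 2)
    )


# Hypothetical example permutation (not taken from the slides).
pi = [2, 1, 3]
black, gray = breakpoint_graph(pi)
print(sorted(sorted(e) for e in black))
print(is_balanced(black, gray, len(pi)))
```

The sketch only constructs the graph; finding a maximum cycle decomposition is a separate and harder task that it does not attempt.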

Slide 6: Effects of reversals on cycles
Here $b$ is the number of breakpoints, $c$ the number of cycles, and $\Delta(b - c)$ the change of $b - c$ caused by a reversal.
(A) For reversals acting on two cycles, $\Delta(b - c) = 1$.
(B) For reversals acting on an unoriented cycle, $\Delta(b - c) = 0$.
(C) For reversals acting on an oriented cycle, $\Delta(b - c) = -1$.
Hannenhalli & Pevzner, Journal of the ACM 46, 1 (1999)

Slide 7: Cycle decomposition
What is the decomposition of the breakpoint graph into a maximum number $c(\pi)$ of edge-disjoint alternating cycles? Here, $c(\pi) = 4$.
Cycle decompositions play an important role in estimating reversal distances. When a reversal is applied to a permutation, the number of cycles in a maximum decomposition can change by at most one (while the number of breakpoints can change by two).
Bafna & Pevzner (1996) proved the bound $d(\pi) \ge n + 1 - c(\pi)$ for the reversal distance $d(\pi)$, which is much tighter than the bound in terms of the number of breakpoints $b(\pi)$, $d(\pi) \ge b(\pi)/2$. For many biological problems, $d(\pi) = n + 1 - c(\pi)$. The reversal distance problem then reduces to the problem of finding a maximum cycle decomposition.
Further concepts: hurdles, super-hurdles, fortresses, ...

Slide 8: Alternative concept: conserved intervals
Distance matrices can be used as data for phylogenetic reconstruction, or to reconstruct ancestral genomes. However, all distances (except for the breakpoint distance) are closely tied to the initial choice of allowable rearrangement operations. They are pure distances, because similarities between genomes are ignored.
The breakpoint distance is based on the notion of conserved adjacencies. These are easy to compute, but the breakpoint distance often fails to capture more global relations between genomes.
A first generalization of adjacencies is the notion of common intervals, which identify subsets of genes that appear consecutively in two or more genomes.
[Photo: Jens Stoye]
Bergeron & Stoye, Report Uni Bielefeld

Slide 9: Permutations, Gene Order, and Rearrangements
Assume that the genes of an organism are ordered and oriented along linear or circular DNA molecules.
Example: mitochondrial genes in insects. The 38 genes are collapsed into a set of 17 blocks; genes within one block do not change order between these species.
Distance approaches focus on the difference between 2 particular genomes. E.g., Fruit Fly differs from Mosquito by the reversal of gene 10 and the transposition of genes 7 and 8.
→ Count the minimal number of reversals and/or transpositions.
→ This yields a distance matrix for the set of species.
Bergeron & Stoye, Report Uni Bielefeld

Slide 10: Permutations, Gene Order, and Rearrangements
The breakpoint distance counts the adjacencies lost between genomes. E.g., given the circularity of the genomes, Fruit Fly and Mosquito have 12 conserved adjacencies out of 17 and thus a breakpoint distance of 5.
E.g., the first 4 species of table 1 share 6 adjacencies: [1,2], [2,3], [11,12], [15,16], [16,17], and [17,1]. When all 6 species are compared, [17,1] is the only adjacency left.
Bergeron & Stoye, Report Uni Bielefeld
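The counting of lost adjacencies on circular genomes can be sketched in Python as follows. This is a minimal illustration under stated assumptions: an adjacency (a, b) is treated as identical to (-b, -a), the helper names `adjacencies` and `breakpoint_distance` are invented for this sketch, and the two example genomes are hypothetical (the insect gene orders of table 1 are not reproduced in the transcript).

```python
def adjacencies(genome):
    """Adjacencies of a circular signed genome, independent of reading direction."""
    n = len(genome)
    adj = set()
    for i in range(n):
        a, b = genome[i], genome[(i + 1) % n]  # wrap around: the genome is circular
        # (a, b) read on one strand equals (-b, -a) read on the other strand.
        adj.add(min((a, b), (-b, -a)))
    return adj


def breakpoint_distance(g1, g2):
    """Number of adjacencies of g1 that are not conserved in g2."""
    return len(adjacencies(g1) - adjacencies(g2))


# Hypothetical genomes: g2 is g1 with the segment (2 3) reversed.
g1 = [1, 2, 3, 4, 5]
g2 = [1, -3, -2, 4, 5]
print(len(adjacencies(g1) & adjacencies(g2)))  # conserved adjacencies
print(breakpoint_distance(g1, g2))             # lost adjacencies
```

For genomes with the same gene content, conserved adjacencies plus breakpoint distance add up to the number of genes, which matches the 12 + 5 = 17 of the Fruit Fly and Mosquito example.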

Slide 11: Permutations, Gene Order, and Rearrangements
Observation: the 6 permutations are very "similar". E.g., the genes in the interval [1,12] are all the same, with small variations in their ordering. This is also true for the genes in the intervals [3,6], [6,9], [9,11], and [12,17].
Such intervals, together with conserved adjacencies, play a fundamental role in rearrangement and distance theories, ancestral genome reconstruction, and phylogeny.
[Figure: family portrait of the conserved intervals of the permutations of table 1. Elements that can be glued together to form larger objects are boxed in rectangles.]
Bergeron & Stoye, Report Uni Bielefeld

Slide 12: Which arrangements are preferable?
All permutations of table 1 fit the representation, with the following conventions: (1) free objects within a rectangle can be reordered or can change sign; (2) connections between rectangles are fixed.
Consider 2 rearrangement scenarios that transform Silkworm into Locust using a minimal number of reversals. [Figure: the two scenarios, left and right]
The two scenarios are fundamentally different, although both use 6 reversals. The right one uses much longer reversals than the left one, and in its intermediate permutations it breaks intervals that are conserved between Silkworm and Locust, namely [3,6], [1,12], and [12,17]. The right scenario looks highly suspicious.
Bergeron & Stoye, Report Uni Bielefeld

Slide 13: Conserved intervals
Definition 1. Let $G$ be a set of signed permutations of $n$ elements. An interval $[a,b]$ is a conserved interval of the set $G$ if:
(1) either $a$ precedes $b$, or $-b$ precedes $-a$, in each permutation, and
(2) the set of unsigned elements that appear between $a$ and $b$ is the same for all permutations in $G$.
→ If $[a,b]$ is a conserved interval, so is $[-b,-a]$.
Consider 2 permutations P = [...] and Q = [...] (values not preserved). Here, [1,5] and [2,3] are conserved intervals, but not [1,6]. The other conserved intervals of P and Q are [1,-4], [1,8], [5,-4], [5,8], and [-4,8].
[Figure: diagram representation of these intervals]
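Definition 1 translates almost directly into a brute-force check. The Python sketch below is only meant to make the two conditions concrete; it is not one of the algorithms of Bergeron & Stoye. The helper names `between` and `is_conserved` are assumptions, and because the permutations P and Q of the slide are not preserved, the example uses two hypothetical permutations.

```python
def between(perm, a, b):
    """Unsigned elements strictly between a and b, or None if condition (1) fails."""
    pos = {g: i for i, g in enumerate(perm)}
    if a in pos and b in pos and pos[a] < pos[b]:
        i, j = pos[a], pos[b]          # a precedes b
    elif -b in pos and -a in pos and pos[-b] < pos[-a]:
        i, j = pos[-b], pos[-a]        # -b precedes -a
    else:
        return None
    return frozenset(abs(g) for g in perm[i + 1:j])


def is_conserved(a, b, perms):
    """Check conditions (1) and (2) of Definition 1 for the interval [a, b]."""
    sets = [between(p, a, b) for p in perms]
    return None not in sets and len(set(sets)) == 1


# Hypothetical permutations (not the P and Q of the slide).
P = [1, 2, 3, 4, 5]
Q = [1, -3, -2, 4, 5]
print(is_conserved(1, 4, [P, Q]))  # same elements {2, 3} in between
print(is_conserved(2, 3, [P, Q]))  # 2 precedes 3 in P, -3 precedes -2 in Q
print(is_conserved(1, 2, [P, Q]))  # fails condition (1) in Q
```

Each call scans every permutation once, so checking all pairs this way takes cubic time for a fixed number of permutations, in contrast to the linear-time algorithm discussed later in the lecture.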

Slide 14: Conserved intervals
When the identity permutation is not in $G$, it is always possible to rename the elements of $G$ such that the conserved intervals become intervals of consecutive elements. E.g., if one composes the permutations P and Q of the example with the inverse permutation $P^{-1}$, one obtains $P' = P^{-1} \circ P$ (the identity) and $Q' = P^{-1} \circ Q$ (values not preserved).
Proposition 1. Let $R$ be a permutation and $G$ a set of permutations, and denote by $R \circ G$ the set of permutations obtained by composing each permutation in $G$ with $R$. The interval $[a,b]$ is conserved in $G$ if and only if the interval $[R(a),R(b)]$ is conserved in $R \circ G$.
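The renaming by composition with $P^{-1}$ can be sketched in Python. The sketch assumes the usual convention for signed permutations, $R(-x) = -R(x)$; the function names `inverse` and `compose` and the example permutations are hypothetical and not taken from the slides.

```python
def inverse(perm):
    """Inverse of a signed permutation, as a mapping from unsigned element to signed position."""
    inv = {}
    for i, g in enumerate(perm, start=1):
        inv[abs(g)] = i if g > 0 else -i
    return inv


def compose(r, perm):
    """R o P = R(p_1) R(p_2) ... R(p_n), with R(-x) = -R(x)."""
    return [r[g] if g > 0 else -r[-g] for g in perm]


# Hypothetical example permutations.
P = [1, -3, -2, 4]
Q = [1, 2, 3, 4]
P_inv = inverse(P)
print(compose(P_inv, P))  # P' is the identity permutation
print(compose(P_inv, Q))  # Q' is Q expressed in the renamed elements
```

After this renaming, every interval conserved between P' and Q' is an interval of consecutive integers, which is the normal form used in the proofs that follow.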

Slide 15: Conserved intervals
Proof of Proposition 1. If a permutation $P$ is written as $P = p_1 p_2 \ldots p_n$, then $R \circ P = R(p_1)\, R(p_2) \ldots R(p_n)$. If $[a,b]$ is conserved in $G$, then each permutation in $G$ has a consecutive block of elements beginning with $a$ and ending with $b$, or beginning with $-b$ and ending with $-a$. These properties also hold for the set $R \circ G$ if one replaces $a$ by $R(a)$ and $b$ by $R(b)$. The converse follows by applying the same argument with $R^{-1}$.
Some intervals, such as [1,7] for the set {P', Q'} in the above example, are the union of smaller intervals: $[1,7] = [1,5] \cup [5,7]$. Intervals that are not unions are especially useful.
Definition 2. Conserved intervals that are not the union of shorter conserved intervals are called irreducible.
Sets of conserved intervals can be characterized by their sets of irreducible intervals.

Slide 16: Irreducible conserved intervals
Proposition 2. Two different irreducible conserved intervals $[a,b]$ and $[c,d]$ of a set $G$ of permutations are either (1) disjoint, (2) nested with different endpoints, or (3) overlapping on one element.
Proof. Without loss of generality, we can assume that $G$ contains the identity permutation and that conserved intervals are intervals of consecutive elements.
Suppose that $[a,b]$ and $[c,d]$ are nested with $a = c$ and $d < b$. Since $[c,d]$ is a conserved interval, it contains all integers between $c$ and $d$. → The interval $[d,b]$ contains all integers between $d$ and $b$, so $[a,b] = [a,d] \cup [d,b]$ is not irreducible.
If $[a,b]$ and $[c,d]$ overlap on more than one element, we can suppose $a < c < b < d$. Since all elements between $c$ and $d$ are greater than $c$, the interval between $a$ and $c$ must contain all elements between $a$ and $c$; thus $[a,b]$ is not irreducible.

Slide 17: Conserved intervals
Overlapping irreducible intervals form chains linked by their successive common elements. A chain of $k-1$ intervals $[a_1,a_2], [a_2,a_3], \ldots, [a_{k-1},a_k]$ is denoted simply by its $k$ links $[a_1, a_2, a_3, \ldots, a_k]$. E.g., [1,5,7,8] is a chain of the set of conserved intervals of P' and Q'. A maximal chain is a chain that cannot be extended.
Proposition 3. Every irreducible conserved interval belongs to a unique maximal chain.
Proof. By Proposition 2, if $[a,b]$ is an irreducible conserved interval, then no other irreducible conserved interval can begin with $a$ or end with $b$.
Maximal chains, as sets of links, together with isolated genes, form a partition of the set of genes.

Slide 18: Conserved intervals
A set of permutations on $n$ elements can have as many as $n(n-1)/2$ conserved intervals, but at most $n-1$ irreducible intervals. These bounds are achieved by sets containing only one permutation.
Proposition 4. Each maximal chain of $k$ links contributes $k(k-1)/2$ to the total number of conserved intervals.
Proof. Conserved intervals $[a,b]$ are in bijection with chains of irreducible intervals of the form $[a, x_1, \ldots, x_k, b]$. Each maximal chain of $k$ links has $k(k-1)/2$ such sub-chains.
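Proposition 4 can be checked on the chain [1,5,7,8] mentioned earlier: choosing any two of its links as endpoints gives one conserved interval. The short Python sketch below enumerates them; the function name `intervals_of_chain` is an assumption made for this illustration.

```python
from itertools import combinations


def intervals_of_chain(links):
    """All conserved intervals (a, b) spanned by a chain given as its ordered list of links."""
    return list(combinations(links, 2))


chain = [1, 5, 7, 8]  # chain of P' and Q' from the lecture
intervals = intervals_of_chain(chain)
k = len(chain)
print(intervals)
print(len(intervals) == k * (k - 1) // 2)
```

Three of the resulting intervals, (1, 5), (5, 7) and (7, 8), are the irreducible ones; the other three are unions of these.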

Slide 19: Conserved intervals
Proposition 5. Let $P$ be a permutation that is contained in both sets $G_1$ and $G_2$. The interval $[a,b]$ is a conserved interval of $G = G_1 \cup G_2$ if and only if there exist two chains of irreducible conserved intervals, with respect to $P$, with $k \ge 0$ and $l \ge 0$:
$[a, x_1, \ldots, x_k, b]$ in $G_1$,
$[a, y_1, \ldots, y_l, b]$ in $G_2$.
The interval $[a,b]$ is irreducible if and only if $\{x_1, \ldots, x_k\}$ and $\{y_1, \ldots, y_l\}$ are disjoint.
Proof. The interval $[a,b]$ is a conserved interval of $G$ if and only if it is a conserved interval in both $G_1$ and $G_2$; therefore there must exist chains beginning with $a$ and ending with $b$ for both sets $G_1$ and $G_2$.
If $[a,b]$ is irreducible in $G$, and if $[a,x]$ and $[x,b]$ are conserved intervals of $G_1$, say, then $x$ cannot belong to the set $\{y_1, \ldots, y_l\}$.
Conversely, if there is an element $x$ common to both sets $\{x_1, \ldots, x_k\}$ and $\{y_1, \ldots, y_l\}$, then $[a,b] = [a,x] \cup [x,b]$, and both $[a,x]$ and $[x,b]$ are conserved intervals of $G$, so $[a,b]$ is not irreducible.

Slide 20: Variable Geometry Genomes
Multi-chromosomal genomes can also be represented by permutations, with special marks that identify the different chromosomes. [Example with each chromosome on a separate line; not preserved]
Even if the adjacency [5,6] is conserved between the 2 permutations, the first genome does not even have those genes on the same chromosome. In the case of multi-chromosomal genomes, conserved intervals $[a,b]$ should therefore have the added requirement that $a$ and $b$ belong to the same chromosome in each genome.
The definition of conserved intervals can also be adapted to genomes other than single linear chromosomes. For circular genomes, one can always align all permutations of the set so that they begin with gene +1.

Slide 21: Algorithms
Bergeron & Stoye present 3 algorithms that:
(1) compute the conserved intervals of two permutations,
(2) compute the conserved intervals of a set of permutations,
(3) compute the conserved intervals of two sets of permutations directly from their two individual sets of conserved intervals.
Conserved intervals of 2 permutations are strongly related to the notion of connected components of the overlap graph of a signed permutation.
Here: a linear-time algorithm that identifies all irreducible intervals $[a,b]$ of a permutation $\pi$ with the identity permutation such that $a > 0$ and $b > 0$ in $\pi$. The case of negative endpoints is treated by reversing $\pi$.
E.g., for the permutation P = [...] (values not preserved), Algorithm 1 identifies the positive irreducible intervals [6,7], [5,9], and [0,10]. It identifies [2,3] and [3,4] on the reversed permutation.

Slide 22: Algorithms
The algorithm assumes that the input permutation is in the form $\pi = (0, \pi_1, \ldots, \pi_{n-1}, n)$.
$M_i$: the nearest unsigned element of the permutation that precedes $\pi_i$ and is greater than $|\pi_i|$.
Lemma 1. If $[\pi_s, \pi_e]$ is a positive conserved interval of $\pi$ and the identity permutation, then $M_s = M_e$.
The algorithm uses two stacks: $S$ contains the possible start positions of conserved intervals, and $M$ contains the possible candidates for $M_i$. The top of $S$ is always denoted by $s$, and the top of $M$ by $m$.
Proposition 6. Algorithm 1 outputs the positive irreducible conserved intervals of a permutation $\pi$ with the identity permutation in $O(n)$ time.
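The values $M_i$ can be computed in one left-to-right pass with a stack, which is the standard "nearest greater element to the left" technique. The Python sketch below shows only this sub-step, not Algorithm 1 itself, whose listing is not part of the transcript. The function name `nearest_greater_before`, the use of `None` when no such element exists, and the example permutation are assumptions of this sketch.

```python
def nearest_greater_before(perm):
    """For each position i, the nearest preceding unsigned element greater than |perm[i]|."""
    result = []
    stack = []  # unsigned values seen so far, strictly decreasing from bottom to top
    for g in perm:
        v = abs(g)
        # Elements not greater than v can never be the answer for later positions.
        while stack and stack[-1] <= v:
            stack.pop()
        result.append(stack[-1] if stack else None)  # None: no greater element precedes
        stack.append(v)
    return result


# Hypothetical framed permutation of the form (0, pi_1, ..., pi_{n-1}, n).
pi = [0, 3, -1, 2, 4]
print(nearest_greater_before(pi))
```

Every element is pushed and popped at most once, so the pass runs in linear time, in line with the $O(n)$ bound of Proposition 6.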

Slide 23: Conserved intervals
The algorithm runs in linear time.

Slide 24: Similarity and distance
The number of conserved intervals of a set of permutations is a measure of similarity, but it can easily be transformed into a distance between two permutations, or between two sets of permutations.
Definition 3. Let $G_1$ and $G_2$ be two sets of permutations on $n$ elements, with $N_1$ and $N_2$ conserved intervals, respectively. Let $N$ be the number of conserved intervals in $G_1 \cup G_2$. The interval distance between $G_1$ and $G_2$ is then defined by
$d(G_1, G_2) = N_1 + N_2 - 2N$.
The interval distance satisfies the fundamental properties of a mathematical distance; e.g., it fulfils the triangle inequality $d(P,Q) + d(Q,R) \ge d(P,R)$.
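The definition of the interval distance can be evaluated by brute force on small inputs. The following Python sketch counts conserved intervals by testing every pair of elements against Definition 1 and then applies $d = N_1 + N_2 - 2N$. It is a naive illustration, not the linear-time method of Bergeron & Stoye; the names `between`, `count_conserved` and `interval_distance` and the example permutations are assumptions. Each interval $[a,b]$ is counted once together with its mirror $[-b,-a]$.

```python
from itertools import combinations


def between(perm, a, b):
    """Unsigned elements strictly between a and b, or None if neither order occurs."""
    pos = {g: i for i, g in enumerate(perm)}
    if a in pos and b in pos and pos[a] < pos[b]:
        i, j = pos[a], pos[b]
    elif -b in pos and -a in pos and pos[-b] < pos[-a]:
        i, j = pos[-b], pos[-a]
    else:
        return None
    return frozenset(abs(g) for g in perm[i + 1:j])


def count_conserved(perms):
    """Number of conserved intervals of a non-empty list of signed permutations."""
    count = 0
    # Candidate intervals are the ordered pairs of the first permutation.
    for a, b in combinations(perms[0], 2):
        sets = [between(p, a, b) for p in perms]
        if None not in sets and len(set(sets)) == 1:
            count += 1
    return count


def interval_distance(g1, g2):
    """d(G1, G2) = N1 + N2 - 2N for two lists of permutations."""
    return count_conserved(g1) + count_conserved(g2) - 2 * count_conserved(g1 + g2)


# Hypothetical permutations on 5 elements.
P = [1, 2, 3, 4, 5]
Q = [1, -2, 3, -4, 5]
print(count_conserved([P]), count_conserved([Q]), count_conserved([P, Q]))
print(interval_distance([P], [Q]))
```

In this example each single permutation has 5·4/2 = 10 conserved intervals, and only [1,3], [3,5] and [1,5] survive in the union, so the sketch yields 10 + 10 − 2·3 = 14.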

Slide 25: Similarity and distance
When comparing two permutations, the interval distance counts the total number of intervals that are unique to one of them. E.g., the distance between P = [...] and Q = [...] (values not preserved) is given by
$d(P,Q) = (11 \times 10)/2 + (11 \times 10)/2 - 2 \times 11 = 88$.
The 2 measures sometimes disagree. The behavior of the interval distance reflects the fact that the length (number of genes) involved in a rearrangement operation matters: short reversals are less disturbing than long ones.

Slide 26: Comparison with other distance measures
The breakpoint distance also gives different results than the interval distance, while the same results are obtained with transposition + reversal distances. [Figures not preserved]

Slide 27: Similarity and distance
Proposition 7. Suppose that $P$ and $Q$ have $n$ elements. Then
(1) if $P$ is obtained from $Q$ by reversing $k$ elements, the reversal breaks $k(n-k)$ conserved intervals (those with exactly one endpoint inside the reversed segment); since Definition 3 counts each broken interval once for $P$ and once for $Q$, the interval distance between $P$ and $Q$ is $2k(n-k)$;
(2) if $P$ is obtained from $Q$ by transposing two consecutive blocks of $a$ and $b$ elements, the transposition breaks $(a+b)(n-(a+b)) + ab$ conserved intervals, and the interval distance between $P$ and $Q$ is twice that number.
Because the interval distance is affected by length, one should question the practice of collapsing identical strips of genes. Why not use all available information?

Slide 28: Link with rearrangement theories
Goal: characterize the rearrangement operations that preserve conserved intervals.
Definition 4. Let $P$ and $Q$ be two permutations, and $\rho$ a rearrangement operation applied to $P$ yielding $P'$. We say that $\rho$ preserves the conserved intervals of $P$ and $Q$ if the conserved intervals of $\{P,Q\}$ are contained in those of $\{P',Q\}$.
Only rearrangements within blocks are preserving. Note that all operations, except fusions, destroy some adjacencies that existed in the original permutation: the number and nature of these adjacencies is a key concept.
Definition 5. Let $\rho$ be a rearrangement operation that transforms $P$ into $P'$. A breakpoint of $\rho$ is a pair of elements that are adjacent in $P$ but not in $P'$.
Breakpoints are where one has to cut $P$ in order to apply $\rho$. Reversals and translocations have 2 breakpoints, transpositions have 3, and fissions have 1.
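The breakpoint counts for reversals and transpositions can be reproduced on a small linear permutation. The Python sketch below applies Definition 5 literally, under the assumption that an adjacency (a, b) is the same as (-b, -a). The helpers `linear_adjacencies`, `op_breakpoints`, `reverse` and `transpose` are hypothetical names, and the operations are applied to interior segments of a hypothetical permutation.

```python
def linear_adjacencies(perm):
    """Adjacencies of a linear signed permutation, independent of reading direction."""
    return {min((a, b), (-b, -a)) for a, b in zip(perm, perm[1:])}


def op_breakpoints(before, after):
    """Pairs adjacent in `before` but not in `after` (Definition 5)."""
    return linear_adjacencies(before) - linear_adjacencies(after)


def reverse(perm, i, j):
    """Signed reversal of the segment perm[i:j]."""
    return perm[:i] + [-g for g in reversed(perm[i:j])] + perm[j:]


def transpose(perm, i, j, k):
    """Exchange the consecutive blocks perm[i:j] and perm[j:k]."""
    return perm[:i] + perm[j:k] + perm[i:j] + perm[k:]


P = [1, 2, 3, 4, 5]  # hypothetical permutation
print(len(op_breakpoints(P, reverse(P, 1, 3))))       # reversal of (2 3)
print(len(op_breakpoints(P, transpose(P, 1, 2, 4))))  # blocks (2) and (3 4) exchanged
```

Operations that touch an end of a linear permutation cut it in fewer places, which is why the example uses interior segments.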

Slide 29: Link with rearrangement theories
Consider the irreducible intervals of $P$ and $P'$ with respect to $P$. Adjacencies in $P$ either belong to a (smallest) irreducible interval, or are free.
E.g., in this diagram [figure not preserved] the adjacency (3,4) belongs to the interval [1,5], (2,3) belongs to [2,3], and (8,9) is free.
When two or more adjacencies belong to the same irreducible interval, none of these adjacencies is conserved between $P$ and $P'$.

Slide 30: Link with rearrangement theories
Theorem 3. Reversals, transpositions, and reverse transpositions are preserving if and only if all their breakpoints belong to the same irreducible interval, or are free. Translocations and fissions are preserving if and only if all their breakpoints are free.
Proof. If the breakpoints of any operation are free, then no conserved interval is cut. If the breakpoints of a reversal, transposition, or reverse transposition belong to the same irreducible interval, then the operation reorders, or reverses, some blocks within that interval, thus preserving conserved intervals.
If a reversal has its two breakpoints in different intervals, it will break those two intervals. If it has only one free breakpoint, it will break the interval containing the other breakpoint. The same kind of argument holds for transpositions and reverse transpositions.
If a breakpoint of a translocation or fission is not free, then it belongs to an irreducible interval whose extremities will end up in two different chromosomes.
It turns out that most rearrangement operations used in optimal scenarios are indeed preserving.

Slide 31: Link with rearrangement theories
E.g. (without proof):
Theorem 4. All the breakpoints of a cycle belong to the same irreducible interval.
In the theory of sorting by reversals, a sorting reversal is defined as a reversal that decreases the reversal distance by 1. The breakpoints of sorting reversals, except for one type called hurdle merging, belong to a single cycle.
Corollary 4. All sorting reversals, except hurdle merging, are preserving.
Corollary 5. All transpositions that create two adjacencies are preserving.

Slides 32 and 33: Applying conserved intervals to reconstruct ancestors
[Figures not preserved]
Bergeron et al., WABI 2004 (2004)

Slide 34: Summary
Linear-time algorithms have been developed for rearrangement problems based on the reversal distance.
It is an open question which distance measures (breakpoint distance, reversal distance, interval distance, ...) are most appropriate for comparing genome architectures.
Experimental evidence provides new insights into which types of rearrangements have likely occurred in the past. → Algorithms need to be adapted to the biological reality.
The concept of "conserved intervals" sounds very promising: it can account for arbitrary types of rearrangements.