Genome Rearrangement By Ghada Badr Part I.

Genome, chromosome, gene, gene order The entire complement of genetic material carried by an individual is called the genome. Each genome contains one or more DNA molecules, one per chromosome.

Genome, chromosome, gene, gene order A gene is a segment of DNA sequence with a specific function.

Genome, chromosome, gene, gene order (Figure: genes A, C, D, F lie on the 5'→3' strand and genes B, E on the complementary 3'→5' strand.) Gene order: A -B C D -E F. Genes can be ordered by their location along the DNA sequence. DNA consists of two complementary strands twisted around each other to form a right-handed double helix. A sign (+/-) is usually used to indicate on which strand a gene is located.

Genome, chromosome, gene, gene order The DNA molecule (chromosome) may be circular or linear.

Genome Rearrangement The genome is structurally specific to each species, and it changes only slowly over time. Therefore, comparing the genomes of different species can provide much evidence about evolution. Genome rearrangements are an important aspect of the evolution of species: even when the gene content of two genomes is almost identical, their gene order can be quite different. Example: Genome 1 = A -B C D -E F; Genome 2 = B -E F -D A C.

Genome Rearrangement Gene order analysis on a set of organisms is a powerful technique for genomic comparison and phylogenetic inference.

Genome Rearrangement General definition of the problem: Given a set of genomes and a set of possible evolutionary events (operations), find a shortest set of events transforming (sorting) those genomes into one another. The diversity of the problem comes from what is meant by a genome and which events are allowed. Since these events are rare, scenarios that minimize their number are more likely to be close to reality. Many models have been proposed.

Genome Models Genes (or blocks of contiguous genes) are a good example of homologous markers: segments of genomes that can be found in several species. The simplest possible model assumes that the order of genes in each genome is known, all the genomes share the same set of genes, all genomes contain a single copy of each gene, and all genomes consist of a single chromosome.

Genome Models Under this model, each gene can be assigned a unique number and is found exactly once in the genome, so genomes can be modeled by permutations. Signed permutation: each gene is assigned a + or - sign to indicate the strand it resides on. Unsigned permutation: used when the corresponding strand is unknown.

Permutations Genes (markers) are represented by integers, with a +/- sign to indicate the strand they lie on. The order and orientation of the genes of one genome relative to the other is represented by a signed permutation $\pi = (\pi_1, \pi_2, \ldots, \pi_{n-1}, \pi_n)$ of size $n$ over $\{-n, \ldots, -1, 1, \ldots, n\}$ such that, for each $i$ from 1 to $n$, exactly one of $i$ or $-i$ is represented.

Permutations Identity permutation: the identity permutation is $\iota_n = (1, 2, 3, \ldots, n)$. When multiple genomes with the same gene content are compared, one of them is chosen as the base (reference), i.e., it is represented as $\iota_n$, and each gene in the other genomes is given the same integer value as its counterpart in the reference.
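
A minimal Python sketch of this relabeling follows. It assumes that genes are written as whitespace-separated names with a leading '-' for the minus strand. The helper names `parse_genome`, `relabel`, and `is_signed_permutation` are hypothetical and chosen only for illustration. The reference genome becomes $\iota_n$, and each gene's sign in the other genome is taken relative to its orientation in the reference (an assumed convention):

```python
def parse_genome(text):
    """'A -B C' -> [('A', 1), ('B', -1), ('C', 1)]; a leading '-' marks the minus strand."""
    genes = []
    for token in text.split():
        if token.startswith("-"):
            genes.append((token[1:], -1))
        else:
            genes.append((token.lstrip("+"), 1))
    return genes


def relabel(reference, other):
    """Number the genes so that `reference` becomes the identity (1, ..., n).

    Hypothetical helper: a gene's position in the reference gives its integer,
    and its sign is the product of its signs in the two genomes.
    """
    index = {name: (k + 1, sign) for k, (name, sign) in enumerate(parse_genome(reference))}
    return tuple(sign * index[name][1] * index[name][0] for name, sign in parse_genome(other))


def is_signed_permutation(perm):
    """True if, for each i in 1..n, exactly one of i or -i occurs (n = len(perm))."""
    n = len(perm)
    return sorted(abs(x) for x in perm) == list(range(1, n + 1))
```

Applied to the two genomes from the Genome Rearrangement example, `relabel("A -B C D -E F", "B -E F -D A C")` gives (-2, 5, 6, -4, 1, 3).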

Permutations Sorted/unsorted permutation: to sort a permutation $\pi$ means to apply some operations to $\pi$ that change it into $\iota_n$. If $\pi_1 = \pi_2$, we say that $\pi_1$ is sorted with respect to $\pi_2$. If $\pi_1 \neq \pi_2$, we say that $\pi_1$ is unsorted with respect to $\pi_2$.

Permutations Example: mitochondrial genomes of 6 Arthropoda (Fruit Fly, Mosquito, Silkworm, Locust, Tick, Centipede):
$\pi_1$ = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
$\pi_2$ = (1, 2, 3, 4, 5, 6, 8, 7, 9, -10, 11, 12, 13, 14, 15, 16, 17)
$\pi_3$ = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 13, 15, 16, 17)
$\pi_4$ = (1, 2, 3, 5, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
$\pi_5$ = (1, 3, 4, 5, 6, 7, 8, 9, 10, 11, -2, 12, 13, 14, 15, 16, 17)
$\pi_6$ = (1, 3, 4, 5, 6, 7, 8, 9, 10, 11, -2, 12, 16, 13, 14, 15, 17)

Permutations Linear and circular permutations: $\pi$ is linear when it represents a linear chromosome, and circular when it represents a circular chromosome. When $\pi = (\pi_1, \pi_2, \ldots, \pi_{n-1}, \pi_n)$ is circular, $\pi$, its reverse $\pi' = (-\pi_n, -\pi_{n-1}, \ldots, -\pi_2, -\pi_1)$, and all permutations obtained by shifting $\pi$ or $\pi'$, $\mathrm{shift}(\pi, i) = (\pi_{n-i+1}, \pi_{n-i+2}, \ldots, \pi_n, \pi_1, \ldots, \pi_{n-i})$, are equivalent. Example: (-3, 2, 1, -4) and (-1, -2, 3, 4) are equivalent, because the reverse of the first is (4, -1, -2, 3), and shifting that gives (-1, -2, 3, 4).
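
The equivalence rule for circular permutations can be checked mechanically. The following is a small sketch that generates every shift of $\pi$ and of its signed reverse $\pi'$ and tests membership. The function names are illustrative, not taken from the slides:

```python
def reverse_signed(perm):
    """pi' = (-pi_n, ..., -pi_1): the same circular chromosome read on the other strand."""
    return tuple(-x for x in reversed(perm))


def shift(perm, i):
    """shift(pi, i) = (pi_{n-i+1}, ..., pi_n, pi_1, ..., pi_{n-i}) for a non-empty pi."""
    n = len(perm)
    i %= n
    return tuple(perm[n - i:]) + tuple(perm[:n - i])


def circular_equivalent(p, q):
    """True if q is a shift of p or of its signed reverse (circular genomes)."""
    p, q = tuple(p), tuple(q)
    if len(p) != len(q):
        return False
    if not p:
        return True
    return any(shift(r, i) == q for r in (p, reverse_signed(p)) for i in range(len(p)))
```

For instance, `circular_equivalent((-3, 2, 1, -4), (-1, -2, 3, 4))` reproduces the slide's example.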

Permutations Points in permutations: for a given permutation $\pi = (\pi_1, \pi_2, \ldots, \pi_{n-1}, \pi_n)$, there is a point between each pair of consecutive values $\pi_i$ and $\pi_{i+1}$ in $\pi$. If $\pi$ is linear, there are two additional points, one before $\pi_1$ and one after $\pi_n$. If $\pi$ is circular, there is one additional point, between $\pi_n$ and $\pi_1$. Hence $\mathrm{pts}(\pi) = n+1$ if $\pi$ is linear, and $\mathrm{pts}(\pi) = n$ if $\pi$ is circular.

Permutations Linear extension of a permutation: for a given $\pi = (\pi_1, \pi_2, \ldots, \pi_{n-1}, \pi_n)$, if $\pi$ is linear, a linear extension of $\pi$ is $\pi' = (0, \pi_1, \pi_2, \ldots, \pi_{n-1}, \pi_n, n+1)$. If $\pi$ is circular (taking an equivalent permutation with $\pi_n = n$), a linear extension of $\pi$ is $\pi' = (0, \pi_1, \pi_2, \ldots, \pi_{n-1}, n)$.

Permutations Example: $\pi = (4, 8, 9, 7, 6, 5, 1, 3, 2)$ is linear with $n = 9$, so $\pi' = (0, 4, 8, 9, 7, 6, 5, 1, 3, 2, 10)$ and $\mathrm{pts}(\pi) = 10$. Now we want to compare our genomes.
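
The following is a sketch of the points count and the linear extension, under the conventions above. For the circular case it first picks an equivalent permutation ending in $+n$ by reversing and/or rotating, which is an implementation choice rather than something the slides prescribe:

```python
def points(perm, circular=False):
    """pts(pi): n + 1 points for a linear permutation, n for a circular one."""
    return len(perm) if circular else len(perm) + 1


def linear_extension(perm, circular=False):
    """Linear: (0, pi_1, ..., pi_n, n + 1).
    Circular: normalize to an equivalent permutation with pi_n = +n, then prepend 0."""
    n = len(perm)
    p = tuple(perm)
    if not circular:
        return (0,) + p + (n + 1,)
    if -n in p:                      # read the other strand: signed reverse
        p = tuple(-x for x in reversed(p))
    k = p.index(n)
    p = p[k + 1:] + p[:k + 1]        # rotate so that +n is the last gene
    return (0,) + p
```

On the example above, `linear_extension((4, 8, 9, 7, 6, 5, 1, 3, 2))` returns (0, 4, 8, 9, 7, 6, 5, 1, 3, 2, 10), and `points` returns 10. In both cases the number of points equals the length of the extension minus one.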

Permutations - similarity/distance Problem: Given two genomes, how do we measure their similarity and/or distance? A related problem: Given two permutations, how do we measure their similarity and/or distance?

Permutations - similarity/distance A distance measure should be a metric on the set of genomes. A metric $d$ on a set $S$ ($d: S \times S \to \mathbb{R}$) satisfies the following three axioms:
Positivity: for all $s, t \in S$, $d(s,t) \geq 0$, and $d(s,t) = 0$ iff $s = t$.
Symmetry: for all $s, t \in S$, $d(s,t) = d(t,s)$.
Triangle inequality: for all $s, t, u \in S$, $d(s,u) \leq d(s,t) + d(t,u)$.

Permutations - similarity/distance Measures of similarity between permutations used in computational biology are numerous in the literature. The first measures used (which will be useful later on) are breakpoints (introduced by Sankoff and Blanchette (1997)) and common intervals.

Permutations-distance - Breakpoints When analyzing $\pi$ with respect to $\sigma$, each point in $\pi$ is either an adjacency or a breakpoint. A point (pair of consecutive values) $(\pi_i, \pi_{i+1})$ in $\pi$ is an adjacency between $\pi$ and $\sigma$ when either $(\pi_i, \pi_{i+1})$ or $(-\pi_{i+1}, -\pi_i)$ are consecutive in $\sigma$. If $\pi$ is linear, there is an adjacency before $\pi_1$ if $\pi_1$ is also the first value in $\sigma$, and an adjacency after $\pi_n$ if $\pi_n$ is also the last value in $\sigma$. If $\pi$ is circular, we assume that $\pi_n$ is also the last value in $\sigma$, and $(\pi_n, \pi_1)$ is an adjacency if $\pi_1$ is also the first value in $\sigma$.

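Permutations-distance - Breakpoints $\mathrm{brp}(\pi) = \mathrm{pts}(\pi) - \mathrm{adj}(\pi)$, where $\mathrm{pts}(\pi)$ is the number of points in $\pi$ and $\mathrm{adj}(\pi)$ is the number of adjacencies. If $\pi$ is sorted ($\pi = \sigma$), $\pi$ has only adjacencies and no breakpoints ($\mathrm{brp}(\pi) = 0$). If $\pi$ is unsorted ($\pi \neq \sigma$), $\pi$ has at least one breakpoint ($\mathrm{brp}(\pi) > 0$). The breakpoint distance counts the adjacencies lost between genomes. The breakpoint distance between $\pi$ and $\sigma$ is $\mathrm{brp}(\pi) = \mathrm{pts}(\pi) - \mathrm{adj}(\pi)$, with adjacencies counted with respect to $\sigma$.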

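Permutations-distance - Breakpoints Back to our example: $\pi = (4, 8, 9, 7, 6, 5, 1, 3, 2)$ and $\pi' = (0, 4, 8, 9, 7, 6, 5, 1, 3, 2, 10)$, so $\mathrm{pts}(\pi) = 10$. What are the adjacencies, and what is $\mathrm{brp}(\pi)$? Compare with the extended identity (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10). Treating $\pi$ as unsigned (so that a decreasing pair such as (7, 6) also counts), the adjacencies are (8, 9), (7, 6), (6, 5), and (3, 2). Hence $\mathrm{adj}(\pi) = 4$ and $\mathrm{brp}(\pi) = \mathrm{pts}(\pi) - \mathrm{adj}(\pi) = 10 - 4 = 6$.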
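
A minimal sketch of the breakpoint count for linear permutations follows. It includes the signed definition with respect to an arbitrary $\sigma$ and the unsigned variant used in the worked example. The function names are illustrative, and the unsigned variant is written only against the identity:

```python
def extend(perm):
    """Linear extension (0, pi_1, ..., pi_n, n + 1)."""
    return (0,) + tuple(perm) + (len(perm) + 1,)


def adjacencies(pi, sigma):
    """Signed: points (a, b) of extended pi with (a, b) or (-b, -a) consecutive in extended sigma."""
    s = extend(sigma)
    sigma_pairs = set(zip(s, s[1:]))
    p = extend(pi)
    return sum(1 for a, b in zip(p, p[1:]) if (a, b) in sigma_pairs or (-b, -a) in sigma_pairs)


def adjacencies_unsigned(pi):
    """Unsigned variant w.r.t. the identity, as in the worked example: |a - b| == 1."""
    p = extend(pi)
    return sum(1 for a, b in zip(p, p[1:]) if abs(a - b) == 1)


def breakpoints(pi, sigma=None, signed=True):
    """brp(pi) = pts(pi) - adj(pi) for linear permutations.
    sigma defaults to the identity; the unsigned variant always uses the identity."""
    pts = len(pi) + 1
    if sigma is None:
        sigma = tuple(range(1, len(pi) + 1))
    adj = adjacencies(pi, sigma) if signed else adjacencies_unsigned(pi)
    return pts - adj
```

With `signed=False` this reproduces adj = 4 and brp = 6 for the example. Read as a signed permutation, the same $\pi$ keeps only the adjacency (8, 9), so `breakpoints(pi)` returns 9, which shows that the sign convention matters.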

Permutations-distance - Breakpoints The breakpoint distance is based on the notion of conserved adjacencies and can be defined on a set of more than two genomes. It is easy to compute. However, it fails to capture more global relations between genomes. The first generalization of adjacencies is the notion of common intervals.

Permutations-distance - Common Intervals Common intervals: subsets of genes that appear consecutively in two or more genomes, where each interval contains the same genes, possibly in a different order or orientation. Example (circular chromosomes):
$\pi_1$ = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
$\pi_2$ = (1, 2, 3, 4, 5, 6, 8, 7, 9, -10, 11, 12, 13, 14, 15, 16, 17)
$\pi_3$ = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 13, 15, 16, 17)
$\pi_4$ = (1, 2, 3, 5, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
$\pi_5$ = (1, 3, 4, 5, 6, 7, 8, 9, 10, 11, -2, 12, 13, 14, 15, 16, 17)
$\pi_6$ = (1, 3, 4, 5, 6, 7, 8, 9, 10, 11, -2, 12, 16, 13, 14, 15, 17)
The first 4 species share 6 adjacencies: {1,2}, {2,3}, {11,12}, {15,16}, {16,17}, {17,1}. All 6 species share only 1 adjacency: {17,1}.

Permutations-distance - Common Intervals The six permutations are very similar. The genes in the interval [1,12] form a common interval of all six genomes, as do the genes in the intervals [3,6], [6,9], [9,11], and [12,17].
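
The following is a brute-force sketch for finding common intervals, which ignores signs as the definition allows. To keep it short, it assumes that all permutations contain the same genes and treats the chromosomes as linear. Intervals that wrap around a circular chromosome, such as {17, 1}, are therefore not reported. The intervals listed above do not wrap, so they are still found. The function names are illustrative:

```python
def is_contiguous(genes, position):
    """True if the given genes occupy consecutive positions in a genome."""
    idx = [position[g] for g in genes]
    return max(idx) - min(idx) + 1 == len(genes)


def common_intervals(perms, min_size=2):
    """Gene sets (signs ignored) that are contiguous in every permutation.

    Candidates are the contiguous segments of the first permutation; each one
    is checked against all the other permutations (linear chromosomes only).
    """
    genomes = [[abs(g) for g in p] for p in perms]
    positions = [{g: k for k, g in enumerate(genome)} for genome in genomes]
    first = genomes[0]
    found = set()
    for i in range(len(first)):
        for j in range(i + min_size, len(first) + 1):
            candidate = frozenset(first[i:j])
            if all(is_contiguous(candidate, pos) for pos in positions[1:]):
                found.add(candidate)
    return found
```

This runs in roughly cubic time per genome, which is fine for 17 genes. Faster dedicated algorithms exist but are beyond this sketch.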

Permutations-distance - Common Intervals Common intervals can be used as a measure of similarity between species. Disadvantage: none of these measures reflects rearrangement operations or explains what happened to the genomes over time.

Rearrangement operations (events) Back to our original problem: Given a set of genomes and a set of possible evolutionary events (operations), find a shortest set of events transforming those genomes into one another. What are the rearrangement events (operations)? These events can be applied to a single gene or to a group of genes (an interval).

Rearrangement operations Example: mitochondrial genomes of 6 Arthropoda (Fruit Fly, Mosquito, Silkworm, Locust, Tick, Centipede), genes 1 to 17. (Figure: gene orders of the six genomes.)

Rearrangement Operations Rearrangement operations affect gene order and gene content. There are various types. For single-chromosome genomes:
• Inversions
• Transpositions
• Reverse transpositions
• Gene duplications
• Gene loss
For multiple-chromosome genomes, we add:
• Translocations
• Fusions
• Fissions

Rearrangement Operations - Single Chro. Inversion: an inversion (also called a reversal) reverses the order of a segment of contiguous genes and flips their orientation, e.g., (1, 2, 3, 4, 5) → (1, -4, -3, -2, 5).

Rearrangement Operations - Single Chro. Example (figure): an inversion in the mitochondrial genomes of 6 Arthropoda (Fruit Fly, Mosquito, Silkworm, Locust, Tick, Centipede).

Rearrangement Operations - Single Chro. Transposition: a transposition moves a segment of contiguous genes to another position on the chromosome without changing their orientation, e.g., (1, 2, 3, 4, 5) → (1, 4, 2, 3, 5).

Rearrangement Operations - Single Chro. Example (figure): a transposition in the mitochondrial genomes of 6 Arthropoda (Fruit Fly, Mosquito, Silkworm, Locust, Tick, Centipede).

Rearrangement Operations - Single Chro. Reverse Transposition: a reverse transposition moves a segment of contiguous genes to another position and also inverts it, e.g., (1, 2, 3, 4, 5) → (1, 4, -3, -2, 5).

Rearrangement Operations - Single Chro. Example (figure): a reverse transposition in the mitochondrial genomes of 6 Arthropoda (Fruit Fly, Mosquito, Silkworm, Locust, Tick, Centipede).
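
The three single-chromosome operations can be sketched on tuples of signed genes, as below. Positions are 0-based, and the block boundaries (`i`, `j`, `k`) are hypothetical parameters chosen for illustration, not notation from the slides:

```python
def inversion(perm, i, j):
    """Reverse genes i..j (inclusive) and flip their signs."""
    return perm[:i] + tuple(-x for x in reversed(perm[i:j + 1])) + perm[j + 1:]


def transposition(perm, i, j, k):
    """Move the block perm[i:j] after perm[j:k] (requires i <= j <= k); orientation kept."""
    return perm[:i] + perm[j:k] + perm[i:j] + perm[k:]


def reverse_transposition(perm, i, j, k):
    """Like transposition, but the moved block is also inverted."""
    block = tuple(-x for x in reversed(perm[i:j]))
    return perm[:i] + perm[j:k] + block + perm[k:]
```

With `p = (1, 2, 3, 4, 5)`, the calls `inversion(p, 1, 3)`, `transposition(p, 1, 3, 4)`, and `reverse_transposition(p, 1, 3, 4)` give (1, -4, -3, -2, 5), (1, 4, 2, 3, 5), and (1, 4, -3, -2, 5) respectively.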

Rearrangement Operations - Multiple Chro. Translocation: a translocation exchanges segments between two different chromosomes.

Rearrangement Operations - Multiple Chro. Fusion and fission: a fusion joins two chromosomes into one, and a fission splits one chromosome into two.

Rearrangement Operations - Multiple Chro. (Figure: from 24 chromosomes to 21 chromosomes. Source: Linda Ashworth, LLNL; DOE Human Genome Program Report.)
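
For multiple-chromosome genomes, the sketch below represents a genome as a list of chromosomes, each a tuple of signed genes. It models a translocation as a reciprocal exchange of chromosome ends, which is one common formulation; other formulations exist. All names and parameters are illustrative:

```python
def fusion(genome, a, b):
    """Join chromosome b onto the end of chromosome a (a != b)."""
    fused = genome[a] + genome[b]
    return [c for k, c in enumerate(genome) if k not in (a, b)] + [fused]


def fission(genome, a, cut):
    """Split chromosome a into two chromosomes before position cut."""
    chrom = genome[a]
    return genome[:a] + [chrom[:cut], chrom[cut:]] + genome[a + 1:]


def translocation(genome, a, b, cut_a, cut_b):
    """Exchange the tails of chromosomes a and b after the given cut points."""
    ca, cb = genome[a], genome[b]
    result = list(genome)
    result[a] = ca[:cut_a] + cb[cut_b:]
    result[b] = cb[:cut_b] + ca[cut_a:]
    return result
```

For example, starting from `[(1, 2, 3), (4, 5)]`, a fusion gives `[(1, 2, 3, 4, 5)]`, and `translocation(g, 0, 1, 2, 1)` gives `[(1, 2, 5), (4, 3)]`.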

Rearrangement Problems Back to our original problem: Given a set of genomes and a set of possible evolutionary events (operations), find a shortest set of events transforming those genomes into one another. Any set of operations yields a distance between genomes: the minimum number of operations needed to transform one genome into the other.

Rearrangement Problems There are two classical problems: (1) computing the distance $d(\pi)$; (2) computing one optimal sorting sequence of events.

Reversal Distance - Sorting by Reversals Given a permutation $\pi$, calculate the reversal distance $d(\pi)$ and find one optimal sequence of reversals sorting $\pi$. Assumptions: only reversals are allowed, there are no gene duplications, and genomes are unichromosomal.

Reversal Distance - Sorting by Reversals A reversal is specified by an interval of genes that appear contiguously in the given genome. Applying it reverses the order of those genes and flips their signs.

Reversal Distance - Sorting by Reversals This approach is symmetric

Reversal Distance - Sorting by Reversals Reversal graph for $n = 3$. Vertices: all signed permutations of size $n = 3$. Edges: $\pi_1$ and $\pi_2$ are connected by an edge if the reversal distance $d(\pi_1, \pi_2) = 1$.

Reversal Distance - Sorting by Reversals Reversal graph for $n = 3$. The reversal distance $d(\pi_i, \pi_k)$ is the length of a shortest path between the vertices $v_i$ and $v_k$.

Reversal Distance - Sorting by Reversals Reversal graph for $n = 3$. The graph grows very quickly: $|V| = n! \cdot 2^n$ (already 48 vertices for $n = 3$), so a graph-search algorithm is not feasible in general!
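
For very small $n$, the reversal graph can still be searched directly, which is a handy way to check hand computations. The sketch below runs a breadth-first search from $\pi$ to $\iota_n$ and returns the distance together with one optimal sequence of reversals, given as 0-based `(i, j)` pairs. This exhaustive search is a swapped-in technique for illustration, not the polynomial algorithm discussed next, and the function names are hypothetical:

```python
from collections import deque
from itertools import permutations, product


def apply_reversal(perm, i, j):
    """Reverse perm[i..j] (0-based, inclusive) and flip the signs of those genes."""
    return perm[:i] + tuple(-x for x in reversed(perm[i:j + 1])) + perm[j + 1:]


def all_signed_permutations(n):
    """Every vertex of the reversal graph: n! orderings times 2^n sign choices."""
    for perm in permutations(range(1, n + 1)):
        for signs in product((1, -1), repeat=n):
            yield tuple(s * x for s, x in zip(signs, perm))


def sort_by_reversals_bfs(perm):
    """Return (d(pi), reversals) by breadth-first search; practical only for tiny n."""
    start = tuple(perm)
    target = tuple(range(1, len(start) + 1))
    parent = {start: None}               # vertex -> (previous vertex, reversal used)
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == target:
            reversals = []
            while parent[current] is not None:
                current, rev = parent[current]
                reversals.append(rev)
            reversals.reverse()
            return len(reversals), reversals
        n = len(current)
        for i in range(n):
            for j in range(i, n):
                nxt = apply_reversal(current, i, j)
                if nxt not in parent:
                    parent[nxt] = (current, (i, j))
                    queue.append(nxt)
    raise ValueError("input is not a signed permutation of 1..n")
```

For example, `sort_by_reversals_bfs((2, 1))` reports a distance of 3, and `all_signed_permutations(3)` yields the 48 vertices mentioned above.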

Reversal Distance - Sorting by Reversals The classical approach for solving these two problems in polynomial time was developed by Hannenhalli and Pevzner (1995). The reversal distance can be computed in $O(n)$ time (Bader et al., 2000). The fastest algorithm to find an optimal sorting sequence runs in less than $O(n^2)$ time (Tannier et al., 2007). Most approaches are based on a special structure called the breakpoint graph.

Reversal Distance - Sorting by Reversals Breakpoint graph: its edges are black or gray. Given $\pi = (\pi_1, \ldots, \pi_{n-1}, \pi_n)$, if $\pi$ is linear, we add the values 0 and $n+1$, which represent the extremities of the chromosome, obtaining $\pi = (0, \pi_1, \ldots, \pi_{n-1}, \pi_n, n+1)$. If $\pi$ is circular, we assume $\pi_n = n$ and add only the value 0, obtaining $\pi = (0, \pi_1, \ldots, \pi_{n-1}, n)$.

Reversal Distance - Sorting by Reversals Black edges: link each pair of consecutive values in $\pi$ by a horizontal edge (one black edge per point of $\pi$). Gray edges: link the extremities of black edges so that the values would be in order, i.e., connect the extremity of $i$ with the extremity of $i+1$ that would be adjacent in the sorted permutation. The graph is a collection of cycles in which black and gray edges alternate. Trivial cycle: one black and one gray edge (an adjacency). Long cycle: four or more edges ($\geq 2$ breakpoints).

Reversal Distance - Sorting by Reversals $\pi = (-3, 2, 1, -4)$, linear vs. circular (figure). The breakpoint graphs of linear and circular permutations are constructed differently, but the analysis is the same.

Reversal Distance - Sorting by Reversals $\pi = (-3, 2, 1, -4)$ and its sorted form (figure). If $\pi$ is sorted, there are only adjacencies and no breakpoints. The breakpoint graph is then a collection of trivial cycles, and the number of cycles in the sorted graph is $\mathrm{cyc}(\pi) = \mathrm{pts}(\pi)$.

Reversal Distance - Sorting by Reversals $\pi = (-3, 2, 1, -4)$ (figure). If $\pi$ is unsorted, there is at least one breakpoint and hence at least one long cycle, so $\mathrm{cyc}(\pi) \leq \mathrm{pts}(\pi) - 1$. Observation: to sort $\pi$, we want to increase the number of cycles in its breakpoint graph.

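Reversal Distance - Sorting by Reversals The effects of a reversal $\rho$ on a breakpoint graph, where $\pi\rho$ denotes the result of applying $\rho$ to $\pi$: for a split reversal, $\mathrm{cyc}(\pi\rho) = \mathrm{cyc}(\pi) + 1$; for a joint reversal, $\mathrm{cyc}(\pi\rho) = \mathrm{cyc}(\pi) - 1$; for a neutral reversal, $\mathrm{cyc}(\pi\rho) = \mathrm{cyc}(\pi)$.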
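
The sketch below counts $\mathrm{cyc}(\pi)$ for a linear signed permutation and classifies a reversal by the change in cycle count. It uses a common encoding of the breakpoint graph, which is an assumption not spelled out on the slides. Gene $+x$ becomes the extremities $(2x-1, 2x)$, gene $-x$ becomes $(2x, 2x-1)$, and the sequence is framed by 0 and $2n+1$. Black edges then join positions $2k$ and $2k+1$, and gray edges join the values $2m$ and $2m+1$. The function names are illustrative:

```python
def unfold(perm):
    """Linear signed permutation -> extremity sequence framed by 0 and 2n + 1."""
    seq = [0]
    for x in perm:
        a = abs(x)
        seq += [2 * a - 1, 2 * a] if x > 0 else [2 * a, 2 * a - 1]
    seq.append(2 * len(perm) + 1)
    return seq


def count_cycles(perm):
    """cyc(pi): number of alternating black/gray cycles in the breakpoint graph."""
    seq = unfold(perm)
    pos = {v: k for k, v in enumerate(seq)}
    seen = [False] * len(seq)
    cycles = 0
    for start in range(0, len(seq), 2):          # one black edge per point of pi
        if seen[start]:
            continue
        cycles += 1
        k = start
        while not seen[k]:
            seen[k] = seen[k ^ 1] = True         # walk the black edge k -- k^1
            k = pos[seq[k ^ 1] ^ 1]              # then the gray edge 2m -- 2m+1
    return cycles


def apply_reversal(perm, i, j):
    """Reverse perm[i..j] (0-based, inclusive) and flip the signs."""
    return perm[:i] + tuple(-x for x in reversed(perm[i:j + 1])) + perm[j + 1:]


def classify_reversal(perm, i, j):
    """'split', 'neutral' or 'joint' according to the change in cyc."""
    delta = count_cycles(apply_reversal(perm, i, j)) - count_cycles(perm)
    return {1: "split", 0: "neutral", -1: "joint"}[delta]


def lower_bound(perm):
    """pts(pi) - cyc(pi) for a linear permutation."""
    return (len(perm) + 1) - count_cycles(perm)
```

For $\pi = (2, 1)$ every possible reversal is neutral, so no split reversal exists. That makes it a small instance of the situation described next.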

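Reversal Distance - Sorting by Reversals Observation: to sort $\pi$, we must maximize the number of split reversals in the sorting sequence $s$. If $s$ contains only split reversals, what is the reversal distance $d(\pi)$? (Hint: express it in terms of $\mathrm{pts}(\pi)$ and $\mathrm{cyc}(\pi)$.) Each split reversal adds exactly one cycle, and the sorted graph has $\mathrm{pts}(\pi)$ cycles, so $d(\pi) = \mathrm{pts}(\pi) - \mathrm{cyc}(\pi)$. Are we done?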

Reversal Distance - Sorting by Reversals A split reversal does not always exist, for example when all black edges in the graph have the same direction. In that case, we need to add some joint and/or neutral reversals to the sorting sequence $s$, so in general $d(\pi) \geq \mathrm{pts}(\pi) - \mathrm{cyc}(\pi)$.

Reversal Distance - Sorting by Reversals It is always possible to calculate the number of non-split reversals in a sorting sequence. It is the number of non-split reversals needed to sort certain hard components of the graph that have no orientation (unoriented components). Unoriented components can form hurdles, counted by $\mathrm{hrd}(\pi)$, or, in the hardest case, a fortress, indicated by $\mathrm{frt}(\pi)$. Hurdles are very rare in permutations that represent real genomes, and fortresses are even rarer. In practice, split reversals are usually sufficient to sort the permutation.

Reversal Distance - Sorting by Reversals Can we choose any split reversal? No, only safe reversals. Safe reversal: a split reversal that does not produce hurdles. (Figure: an unsafe reversal vs. a safe reversal.) There is always a safe reversal for any oriented $\pi$.

Reversal Distance - Sorting by Reversals The final formula for the reversal distance is $d(\pi) = \mathrm{pts}(\pi) - \mathrm{cyc}(\pi) + \mathrm{hrd}(\pi) + \mathrm{frt}(\pi)$, where $\mathrm{frt}(\pi) = 1$ if $\pi$ is a fortress and 0 otherwise. Here $\mathrm{pts}(\pi) = n+1$ if $\pi$ is linear and $\mathrm{pts}(\pi) = n$ if $\pi$ is circular.
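
As a hand-worked illustration (not from the slides), take the linear permutation $\pi = (2, 1)$. Its extension (0, 2, 1, 3) has three points and no adjacency, and its breakpoint graph is a single long cycle. Both genes are positive, so that cycle is unoriented and no split reversal exists. Under the standard definitions it forms one hurdle, and $\pi$ is not a fortress, so

$$
d(\pi) = \mathrm{pts}(\pi) - \mathrm{cyc}(\pi) + \mathrm{hrd}(\pi) + \mathrm{frt}(\pi) = 3 - 1 + 1 + 0 = 3.
$$

This agrees with the three-reversal sequence (2, 1) → (-1, -2) → (1, -2) → (1, 2). An exhaustive check over the 8 signed permutations of size 2 confirms that no shorter sequence exists.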

Reversal Distance - Sorting by Reversals Algorithm: get an optimal sequence $s$ of reversals that sorts $\pi$.
Input: a signed permutation $\pi$.
Output: an optimal sequence of reversals sorting $\pi$.
    Construct the breakpoint graph of $\pi$.
    $s \leftarrow \emptyset$ [empty sequence]
    If $\mathrm{frt}(\pi) = 1$ then
        choose a reversal $\rho$ to eliminate the fortress
        $\pi \leftarrow \pi\rho$; $s \leftarrow s \cdot \rho$ [append $\rho$ to $s$]
    End if
    While there are hurdles in $\pi$ do
        choose a reversal $\rho$ to eliminate a hurdle
        $\pi \leftarrow \pi\rho$; $s \leftarrow s \cdot \rho$
    End while
    While $\pi$ is not sorted do
        choose a safe split reversal $\rho$
        $\pi \leftarrow \pi\rho$; $s \leftarrow s \cdot \rho$
    End while
    Return $s$
Complexity: $O(n^5)$. Tools: GRIMM & GRAPPA.

Reversal Distance - Sorting by Reversals There can be more than one optimal solution (optimal sorting sequence).
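
To see how many optimal solutions a small permutation can have, the sketch below extends a breadth-first search over the reversal graph with shortest-path counting. It is exhaustive, so it is usable only for tiny $n$, and the function names are illustrative. Distinct reversals always produce distinct permutations, so the count equals the number of distinct optimal reversal sequences:

```python
from collections import deque


def neighbors(perm):
    """All permutations reachable from perm by one reversal."""
    n = len(perm)
    for i in range(n):
        for j in range(i, n):
            yield perm[:i] + tuple(-x for x in reversed(perm[i:j + 1])) + perm[j + 1:]


def count_optimal_sequences(perm):
    """Return (d(pi), number of optimal reversal sequences sorting pi)."""
    start = tuple(perm)
    target = tuple(range(1, len(start) + 1))
    dist = {start: 0}
    ways = {start: 1}                    # number of shortest paths from start
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == target:
            return dist[current], ways[current]
        for nxt in neighbors(current):
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                ways[nxt] = ways[current]
                queue.append(nxt)
            elif dist[nxt] == dist[current] + 1:
                ways[nxt] += ways[current]
    raise ValueError("input is not a signed permutation of 1..n")
```

For $\pi = (2, 1)$ this reports distance 3 with 6 different optimal sequences.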

Conclusions We represented linear and circular genomes as permutations in our simple model. We described the first measures of similarity between permutations, breakpoints and common intervals, which have no biological interpretation. We used genome rearrangement events to describe similarities/distances between genomes, which have more biological meaning. We described in detail one distance measure (reversal distance) and the events (reversals) used to sort genomes.

Thank you Questions? Next Lecture?