JAKUB KOVÁĆ, ROBERT WARREN, MARÍLIA D.V. BRAGA and JENS STOYE

Slides:



Advertisements
Similar presentations
A Simpler 1.5-Approximation Algorithm for Sorting by Transpositions Tzvika Hartman Weizmann Institute.
Advertisements

Sorting by reversals Bogdan Pasaniuc Dept. of Computer Science & Engineering.
Bipartite Matching, Extremal Problems, Matrix Tree Theorem.
Walks, Paths and Circuits Walks, Paths and Circuits Sanjay Jain, Lecturer, School of Computing.
School of CSE, Georgia Tech
Introduction to Graphs
Greedy Algorithms CS 466 Saurabh Sinha. A greedy approach to the motif finding problem Given t sequences of length n each, to find a profile matrix of.
Greedy Algorithms CS 6030 by Savitha Parur Venkitachalam.
Gene an d genome duplication Nadia El-Mabrouk Université de Montréal Canada.
1 Discrete Structures & Algorithms Graphs and Trees: II EECE 320.
Bioinformatics Chromosome rearrangements Chromosome and genome comparison versus gene comparison Permutations and breakpoint graphs Transforming Men into.
Genome Halving – work in progress Fulton Wang ACGT Group Meeting.
Data Transmission and Base Station Placement for Optimizing Network Lifetime. E. Arkin, V. Polishchuk, A. Efrat, S. Ramasubramanian,V. PolishchukA. EfratS.
Introduction Sorting permutations with reversals in order to reconstruct evolutionary history of genome Reversal mutations occur often in chromosomes where.
Graph. Undirected Graph Directed Graph Simple Graph.
Greedy Algorithms And Genome Rearrangements
Genome Rearrangements CIS 667 April 13, Genome Rearrangements We have seen how differences in genes at the sequence level can be used to infer evolutionary.
Introduction to Bioinformatics Algorithms Greedy Algorithms And Genome Rearrangements.
Genome Rearrangements CSCI : Computational Genomics Debra Goldberg
Genomic Rearrangements CS 374 – Algorithms in Biology Fall 2006 Nandhini N S.
1 Michal Ozery-Flato and Ron Shamir 2 The Genomic Sorting Problem HOW?
Transforming Cabbage into Turnip: Polynomial Algorithm for Sorting Signed Permutations by Reversals Journal of the ACM, vol. 46, No. 1, Jan 1999, pp
1 Genome Rearrangements João Meidanis São Paulo, Brazil December, 2004.
Efficient Data Structures and a New Randomized Approach for Sorting Signed Permutations by Reversals Haim Kaplan and Elad Verbin.
1 Computer Science Department Technion – Israel Institute of Technology Genomic Sorting with Length-Weighted Reversals Ron Y. Pinter Technion Steve Skiena.
Combinatorial and Statistical Approaches in Gene Rearrangement Analysis Jijun Tang Computer Science and Engineering University of South Carolina
Genome Rearrangement By Ghada Badr Part II. 2  Genomes can be modeled by each gene can be assigned a unique number and is exactly found once in the genome.
A Simplified View of DCJ-Indel Distance Phillip Compeau A Simplified View of DCJ- Indel Distance Phillip Compeau University of California-San Diego Department.
GRAPH Learning Outcomes Students should be able to:
Genome Rearrangements …and YOU!! Presented by: Kevin Gaittens.
1 A Simpler 1.5- Approximation Algorithm for Sorting by Transpositions Combinatorial Pattern Matching (CPM) 2003 Authors: T. Hartman & R. Shamir Speaker:
Binary Encoding and Gene Rearrangement Analysis Jijun Tang Tianjin University University of South Carolina (803)
Genome Rearrangements Anne Bergeron, Comparative Genomics Laboratory Université du Québec à Montréal Belle marquise, vos beaux yeux me font mourir d'amour.
16. Lecture WS 2004/05Bioinformatics III1 V16 – genome rearrangement Important information – contained in the order in which genes occur on the genomes.
Genome Rearrangements [1] Ch Types of Rearrangements Reversal Translocation
Sorting by Cuts, Joins and Whole Chromosome Duplications
Chap. 7 Genome Rearrangements Introduction to Computational Molecular Biology Chapter 7.1~7.2.4.
Greedy Algorithms CS 498 SS Saurabh Sinha. A greedy approach to the motif finding problem Given t sequences of length n each, to find a profile matrix.
Connectivity and Paths 報告人:林清池. Connectivity A separating set of a graph G is a set such that G-S has more than one component. The connectivity of G,
Genome Rearrangement By Ghada Badr Part I.
Genome Rearrangements. Turnip vs Cabbage: Look and Taste Different Although cabbages and turnips share a recent common ancestor, they look and taste different.
Genome Rearrangements. Turnip vs Cabbage: Look and Taste Different Although cabbages and turnips share a recent common ancestor, they look and taste different.
1 Genome Rearrangements (Lecture for CS498-CXZ Algorithms in Bioinformatics) Dec. 6, 2005 ChengXiang Zhai Department of Computer Science University of.
CSE 421 Algorithms Richard Anderson Winter 2009 Lecture 5.
Tzvika Hartman Elad Verbin Bar Ilan University Tel Aviv University
Introduction to Algorithms
An Introduction to Graph Theory
WABI: Workshop on Algorithms in Bioinformatics
CSCI2950-C Genomes, Networks, and Cancer
Original Synteny Vincent Ferretti, Joseph H. Nadeau, David Sankoff, 1996 Presented by: Suzy Sun.
Conservation of Combinatorial Structures in Evolution Scenarios
Genome Rearrangement and Duplication Distance
Hamiltonian cycle part
PC trees and Circular One Arrangements
Chapter 5. Optimal Matchings
Greedy (Approximation) Algorithms and Genome Rearrangements
Lecture 3: Genome Rearrangements and Duplications
Enumerating Distances Using Spanners of Bounded Degree
Instructor: Shengyu Zhang
Analysis of Algorithms
CSCI2950-C Lecture 4 Genome Rearrangements
Greedy Algorithms And Genome Rearrangements
Minimum Spanning Tree Algorithms
A Unifying View of Genome Rearrangement
FanChang Hao, Melvin Zhang, and Hon Wai Leong Review for TCBB
Double Cut and Join with Insertions and Deletions
Greedy Algorithms And Genome Rearrangements
Mutations.
Richard Anderson Lecture 5 Graph Theory
MAGE: Models and Algorithms for Genome Evolution 2013
Presentation transcript:

Restricted DCJ Model: Rearrangement Problems with Chromosome Reincorporation JAKUB KOVÁĆ, ROBERT WARREN, MARÍLIA D.V. BRAGA and JENS STOYE JOURNAL OF COMPUTATIONAL BIOLOGY, 2011 Ron Zeira Group meeting 12.6.13

Outline Motivation DCJ model and solution Restricted DCJ model Restricted DCJ sorting DCJ halving problem Restricted DCJ halving Restricted DCJ median

Motivation During evolution, genomes undergo large-scale mutation: Reversals Translocations Chromosome fusion Chromosome fission Block interchanges Duplications Deletions

Motivation Reversal Fission Block interchange

DCJ model Yancopoulos et al. 2005. A unifying view of genome rearrangements. Anne Bergeron, Julia Mixtacki, and Jens Stoye. Given 2 genomes A and B (no duplications), find a shortest sequence of rearrangement operations that transforms A into B.

Definitions Gene – oriented DNA sequence from tail (5’) and to head (3’). For gene a: at - tail and ah - head. adjacency of 2 consecutive genes a and b, depends on their orientation: Extremity that is not adjacent to any other gene is called a telomere, represented by a singleton set : {at} or {ah}.

Definitions A genome is a set of adjacencies and telomeres such that the tail or the head of any gene appears in exactly one adjacency or telomere. Chromosomes are either linear (bounded by two telomeres) or circular (no telomere).

Genome example

DCJ operation A: {p,q},{r,s}  {p,r},{q,s} or {p,s},{q,r} B: {p,q},{r}  {p,r},{q} or {q,r},{p} C1: {r},{q}  {r,q},({}) C2: {r,q},({})  {r},{q} fusion fission translocations translocations

DCJ operation inversion inversion inversion circular excision

DCJ distance example A= ○ ac-d ○ , eb , ○ fg ○ B= ab , ○ cd ○, ○ e ○ , fg

Adjacency graph Obviously, every vertex in the adjacency graph has degree one or two, therefore it is a union of cycles and paths. Since the graph is bipartite, all cycles have even length

DCJ distance Lemma 1. The application of a single DCJ operation changes the number of circular or linear components by at most one. Lemma 2. Let A and B be two genomes defined on the same set of N genes, then we have A = B if and only if N = C + I/2 where C is the number of cycles and I the number of odd paths in AG(A,B). Lemma 3. The application of a single DCJ operation changes the number of odd paths in the adjacency graph by –2, 0, or 2. Lemma 4. Let A and B be two genomes defined on the same set of N genes, dDCJ (A,B) N − (C + I/2)

Restricted DCJ model Genomes with multiple linear chromosomes Each circular excision immediately followed by its incorporation Effectively, block interchange operation in 2 DCJs operations

Restricted DCJ example

Restricted DCJ sorting THEOREM. Multilinear genomes Π, Γ: dDCJ(Π,Γ)=drestricted-DCJ(Π, Γ) Scenario in O(n*log(n)) Algorithm outline: Capping “left to right” extend chromosomes iteratively: Optimal DCJ increases C by 1 xor OP by 2

Capping Adjoin new markers (caps) to telomere ends Odd paths: Even paths: Distance unchanged Only cycles and 1-paths ○x y○ x◊ y◊ ◊○ ○x y○ y∆ x◊ ∆◊ ○◊ ∆○

Iterative building j,j+1 already properly adjacent  Done j, j+1 on different chromosomes: O.W.  j,j+1 on same chromosome j,j+1 reverse orientation: O.W. j,j+1 same orientation or

Iterative building m+1 on different chromosome: O.W.  m+1 same chromosome m+1 reverse orientation

Iterative building m,m+1 same orientation or

Data structure DS for permutations – Kaplan & Verbin (2005) Balanced tree + additional information (marker, size, orientation flag, reverse flag, chromosome delimiters) Logarithmic time operations: Find ith marker of a chromosome Find position of marker g Reverse operation Mimic DCJ using reversals

DCJ halving problem Warren-Sankoff 2008, Mixtacki 2009 2 paralogous copies per marker Perfectly duplicated genome A: Alternative definition: Linear chromosome Ci : Circular chromosome Ci :

DCJ halving problem Perfectly duplicated genome example: Given Π, a duplicated genome, find a perfectly duplicated genome with minimal DCJ distance

Natural graph

DCJ having distance DCJ halving distance: EC – even cycles, OP – odd paths Min distance perfectly duplicated genome in O(n)

Restricted DCJ halving Find a multilinear perfectly duplicated genome with min distance Find unrestricted DCJ halving genome Transform to multilinear while preserving distance Particular scenario in O(n*log(n))

Restricted DCJ halving Let: Π – original genome, Φ- unrestricted perfectly duplicated genome Add paralogous caps Integrating adjacency {x,y} in Π iff x in linear chro. and y in circular chro. in Φ If {x,p},{y,q} in Φ, replace:

Integrating adjacency

Integrating adjacency Preserving distance Still perfectly duplicated genome Linear time

DCJ median problem Reconstruct common ancestor of Π, Γ Distance is not enough Add outgroup specy Φ Find a median M with minimal distance: M Π Γ Φ

Restricted DCJ median DCJ median for unilinear chro. is NP-hard – Caprara 2003 Let Π1, Π2, Π3 unilinear chro., ΠM multilinear chro. median genome Fuse ΠM telomeres into single chro. Π’M 2 linear chro. can be joined in 4 ways In AG(Πi,ΠM) at most 2 odd paths

Restricted DCJ median Telomeres in different even paths  ∆d=0 Telomeres in same even path  ∆d=-1 Telomeres in different odd paths  ∆d=+1 Case 3 can happen once Can always fuse linear chro. and improve distance

Summary Restricted DCJ model is more realistic Adds block exchange operation Distance is the same as unrestricted da Silva et al. 2012 added indel to model