Double Cut and Join with Insertions and Deletions


Double Cut and Join with Insertions and Deletions
MARÍLIA D.V. BRAGA, EYLA WILLING, and JENS STOYE
JOURNAL OF COMPUTATIONAL BIOLOGY, September 2011
Ron Zeira, group meeting, March 2013

Introduction to DCJ
Yancopoulos et al. 2005.
A unifying view of genome rearrangements. Anne Bergeron, Julia Mixtacki, and Jens Stoye.
Given two genomes A and B (no duplications), the goal is to find a shortest sequence of rearrangement operations that transforms A into B.

Definitions
A gene is an oriented sequence of DNA that starts with a tail (3') and ends with a head (5').
For a gene a, at is the tail and ah is the head.
The adjacency of two consecutive genes a and b depends on their respective orientations: {ah, bt}, {ah, bh}, {at, bt}, or {at, bh}.
An extremity that is not adjacent to any other gene is called a telomere, represented by a singleton set: {at} or {ah}.

Definitions
A genome is a set of adjacencies and telomeres such that the tail or the head of any gene appears in exactly one adjacency or telomere.
Chromosomes are either bounded by two telomeres (linear) or circular (no telomere).
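The definitions above can be sketched in Python. This is a minimal sketch, assuming chromosomes are given as lists of signed gene names where a leading "-" marks a reversed gene. The function names and the input format are illustrative and do not come from the paper.

```python
def extremities(gene):
    """Return the (first, last) extremities of a signed gene, read left to right."""
    name = gene.lstrip("-")
    if gene.startswith("-"):
        return name + "h", name + "t"  # reversed gene: head comes first
    return name + "t", name + "h"      # forward gene: tail comes first


def genome(chromosomes):
    """Build a genome as a set of adjacencies and telomeres.

    chromosomes: list of (genes, is_circular) pairs (hypothetical input format).
    Each adjacency is a 2-element frozenset, each telomere a 1-element frozenset.
    """
    vertices = set()
    for genes, is_circular in chromosomes:
        ends = [extremities(g) for g in genes]
        # One adjacency between each pair of consecutive genes.
        for (_, left_last), (right_first, _) in zip(ends, ends[1:]):
            vertices.add(frozenset({left_last, right_first}))
        if is_circular:
            # Close the circle: last extremity meets the first one.
            vertices.add(frozenset({ends[-1][1], ends[0][0]}))
        else:
            # A linear chromosome is bounded by two telomeres.
            vertices.add(frozenset({ends[0][0]}))
            vertices.add(frozenset({ends[-1][1]}))
    return vertices
```

For instance, `genome([(["a", "-b", "c"], False)])` describes one linear chromosome in which gene b is reversed, so a and b meet head to head.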

Example

DCJ operation

DCJ distance example

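Adjacency graph
The adjacency graph AG(A,B) has the adjacencies and telomeres of A and B as vertices, with one edge between a vertex u of A and a vertex v of B for each extremity they share.
Every vertex in the adjacency graph has degree one or two, so the graph is a union of cycles and paths.
Since the graph is bipartite, all cycles have even length.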

DCJ distance
Lemma 1. The application of a single DCJ operation changes the number of circular or linear components by at most one.
Lemma 2. Let A and B be two genomes defined on the same set of N genes. Then A = B if and only if N = C + I/2, where C is the number of cycles and I the number of odd paths in AG(A,B).
Lemma 3. The application of a single DCJ operation changes the number of odd paths in the adjacency graph by −2, 0, or 2.
Lemma 4. Let A and B be two genomes defined on the same set of N genes. Then dDCJ(A,B) ≥ N − (C + I/2).
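The quantity N − (C + I/2) from the lemmas can be computed directly from the adjacency graph. The sketch below assumes each genome is a set of frozensets of extremity names (adjacencies of size two, telomeres of size one), that both genomes contain the same genes, and that an odd path is a path with an odd number of edges. The function name and the union-find bookkeeping are illustrative choices, not the authors' implementation.

```python
from collections import defaultdict


def dcj_distance(genome_a, genome_b):
    """Return N - (C + I/2) for two genomes over the same gene set."""
    # For each extremity, find the vertex that contains it in each genome.
    where_a = {x: v for v in genome_a for x in v}
    where_b = {x: v for v in genome_b for x in v}
    if where_a.keys() != where_b.keys():
        raise ValueError("genomes must be defined on the same set of genes")

    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    # Each extremity is one edge between a vertex of A and a vertex of B.
    for x in where_a:
        root_a = find(("A", where_a[x]))
        root_b = find(("B", where_b[x]))
        parent[root_a] = root_b

    n_vertices = defaultdict(int)
    n_edges = defaultdict(int)
    for v in list(parent):
        n_vertices[find(v)] += 1
    for x in where_a:
        n_edges[find(("A", where_a[x]))] += 1

    # A component is a cycle when it has as many edges as vertices,
    # and a path when it has one edge fewer.
    cycles = sum(1 for r in n_vertices if n_edges[r] == n_vertices[r])
    odd_paths = sum(
        1 for r in n_vertices
        if n_edges[r] == n_vertices[r] - 1 and n_edges[r] % 2 == 1
    )
    n_genes = len(where_a) // 2  # every gene contributes a tail and a head
    return n_genes - (cycles + odd_paths // 2)
```

With A = a b c and B = a −b c (both linear), the graph has one cycle and two odd paths, so the value is 3 − (1 + 1) = 1, matching the single reversal that separates the two genomes.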

DCJ with indels
No duplications.
The genomes may have unequal marker content.
Allowed operations: DCJ, plus insertions and deletions of segments of contiguous markers.
"◦" represents a telomere.
Markers are separated into groups: those common to A and B, those unique to A, and those unique to B.

Example

l is a (possibly empty) string of markers unique to A or unique to B.
If l is non-empty, the vertex v is labeled; otherwise it is clean.

Operations
DCJ, deletions, insertions.

Adjacency graph

Problem definition
The DCJ-indel distance of A and B is the minimum number of DCJ and indel operations required to transform A into B.

Accumulation

Runs
Run: a maximal substring of markers of the same kind (A or B).
Λ(C): the number of runs in component C.
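Counting runs can be sketched as follows. The sketch assumes that the labels of a component have already been read off in order along the path or cycle, that clean vertices have been dropped, and that each label has been reduced to its kind, "A" or "B". This input format and the function name are assumptions made for illustration.

```python
from itertools import groupby


def count_runs(kinds, is_cycle=False):
    """Count maximal blocks of equal kinds; on a cycle the two ends are neighbours."""
    blocks = [kind for kind, _ in groupby(kinds)]
    if is_cycle and len(blocks) > 1 and blocks[0] == blocks[-1]:
        # The first and last block wrap around the cycle and form a single run.
        return len(blocks) - 1
    return len(blocks)
```

For the label sequence A A B B A, a path has three runs, while a cycle has two because the leading and trailing A-blocks merge.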

More results
The optimal DCJ (no indel) distance can be achieved by optimally sorting each component separately.
λ(C): the indel-potential of component C, the minimum number of runs obtained by a separate optimal DCJ-sorting of C.

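Better bound
We can sort each component C separately with an optimal sequence of DCJ operations and then spend another λ(C) indel operations on it.
This gives the upper bound dDCJ-indel(A,B) ≤ dDCJ(A,B) + Σ λ(C), summed over all components C of AG(A,B).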
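The bound can be evaluated once the number of runs Λ(C) of each component is known. The closed form used for λ(C) below, ⌈(Λ(C) + 1)/2⌉ when Λ(C) ≥ 1 and 0 otherwise, is recalled from the Braga, Willing, and Stoye paper and does not appear on these slides, so it should be checked against the paper. The function names are hypothetical.

```python
def indel_potential(runs):
    """Assumed closed form: 0 if there are no runs, else ceil((runs + 1) / 2)."""
    if runs < 0:
        raise ValueError("number of runs cannot be negative")
    if runs == 0:
        return 0
    return (runs + 2) // 2  # integer form of ceil((runs + 1) / 2)


def dcj_indel_upper_bound(dcj_distance, runs_per_component):
    """DCJ distance plus the indel-potential of every component."""
    return dcj_distance + sum(indel_potential(r) for r in runs_per_component)
```

This only evaluates the upper bound obtained by sorting components independently. It does not compute the exact DCJ-indel distance, which can be smaller.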

Not optimal

Backtracking

Algorithm

DCJs vs. Indels

Experiments
Analyze the evolution of Rickettsia, parasites causing diseases such as typhus and spotted fever.
The genomes are closely related, except for R. bellii, which shows a high level of rearrangement.

Results
[Table not preserved. Column headings: Unique segments, Merged runs, Min #DCJ, Min #DCJ+indel, Runs]

Results 2
1-run cycles cannot be merged.
Identifies deletions that occurred right after differentiation.

Triangle inequality
The triangle inequality doesn't hold for the DCJ-indel distance. [Example not preserved.]
The authors add a surcharge to the distance to solve the problem.