
Presentation transcript:

1 Michal Ozery-Flato and Ron Shamir

2 The Genomic Sorting Problem HOW?

3 Overview
– Preliminaries
– Reduction to a simpler case
– The main algorithm (reduced case)

4 Genome Modeling

5 Genome Modeling Chromosome flip 

6 Reciprocal Translocations
Exchange non-empty ends between two chromosomes X = X1X2 and Y = Y1Y2 (-Y1 denotes Y1 reversed, with every sign flipped):
– Prefix-prefix: X1X2, Y1Y2 → X1Y2, Y1X2
– Prefix-suffix: X1X2, Y1Y2 → X1 -Y1, -X2 Y2
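A minimal Python sketch of the two translocation types, assuming chromosomes are lists of signed integers and that X is cut after its first i genes and Y after its first j genes. The function names `split`, `prefix_prefix` and `prefix_suffix` are ours, not from the slides, and requiring all four pieces to be non-empty is our assumption.

```python
from typing import List, Tuple

Chromosome = List[int]  # signed genes, read left to right


def flip(segment: Chromosome) -> Chromosome:
    """-(s1 ... sk) = (-sk ... -s1): reverse the order and negate every gene."""
    return [-g for g in reversed(segment)]


def split(x: Chromosome, y: Chromosome, i: int, j: int):
    """Cut X after its first i genes and Y after its first j genes."""
    x1, x2, y1, y2 = x[:i], x[i:], y[:j], y[j:]
    # Assumption: all four pieces are non-empty, so no chromosome disappears.
    if not (x1 and x2 and y1 and y2):
        raise ValueError("cut points must leave all four pieces non-empty")
    return x1, x2, y1, y2


def prefix_prefix(x, y, i, j) -> Tuple[Chromosome, Chromosome]:
    """X1X2, Y1Y2 -> X1Y2, Y1X2 (no sign changes)."""
    x1, x2, y1, y2 = split(x, y, i, j)
    return x1 + y2, y1 + x2


def prefix_suffix(x, y, i, j) -> Tuple[Chromosome, Chromosome]:
    """X1X2, Y1Y2 -> X1 -Y1, -X2 Y2."""
    x1, x2, y1, y2 = split(x, y, i, j)
    return x1 + flip(y1), flip(x2) + y2


X, Y = [1, 2, 3], [4, 5, 6]
print(prefix_prefix(X, Y, 1, 2))  # ([1, 6], [4, 5, 2, 3])
print(prefix_suffix(X, Y, 1, 2))  # ([1, -5, -4], [-3, -2, 6])
```

Only the prefix-suffix form changes signs, which is why the reduction on slide 14 can restrict itself to prefix-prefix translocations when signs must be preserved.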

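7 Sorting by Reciprocal Translocations
Tails({(1, 2, -4), (-3, 5), (6, -8, -7, 9)}) = {1, 4, -3, -5, 6, -9}, i.e. the first gene of each chromosome and the negation of its last gene.
A can be sorted into B by reciprocal translocations iff:
– genes(A) = genes(B)
– Tails(A) = Tails(B)
An O(n^3) algorithm for sorting by reciprocal translocations: Hannenhalli 96, Bergeron et al. 06.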
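The tails and the slide's two conditions are easy to compute. This is a short sketch with genomes as lists of tuples of signed genes. The names `tails`, `gene_set` and `sortable_by_translocations` are illustrative.

```python
from typing import FrozenSet, List, Tuple

Genome = List[Tuple[int, ...]]  # each chromosome is a tuple of signed genes


def tails(genome: Genome) -> FrozenSet[int]:
    """First gene of every chromosome plus the negation of its last gene."""
    result = set()
    for chrom in genome:
        result.add(chrom[0])
        result.add(-chrom[-1])
    return frozenset(result)


def gene_set(genome: Genome) -> FrozenSet[int]:
    """Unsigned gene content of a genome."""
    return frozenset(abs(g) for chrom in genome for g in chrom)


def sortable_by_translocations(a: Genome, b: Genome) -> bool:
    """The slide's condition: same genes and same tails."""
    return gene_set(a) == gene_set(b) and tails(a) == tails(b)


print(sorted(tails([(1, 2, -4), (-3, 5), (6, -8, -7, 9)])))  # [-9, -5, -3, 1, 4, 6]
```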

8 The Cycle Graph
A = {(4, -1), (-3, -2, 5), (6, -7, 8)}, B = {(1, 2, 3), (4, 5), (6, 7, 8)}
[Figure: cycle graph(A, B), with external edges, internal edges and adjacencies marked]
#cycles(A, B) = 3
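This sketch shows one way to compute #cycles(A, B) and the edge types. These assumptions are ours, not from the slides:
- Each gene g has a tail extremity (g, 't') and a head extremity (g, 'h').
- An edge of A or B joins the facing extremities of two consecutive genes.
- A B-edge is an adjacency if A has the same edge, internal if its two genes lie on the same chromosome of A, and external otherwise.

The counting loop relies on genes(A) = genes(B) and Tails(A) = Tails(B). Under those conditions, chromosome ends carry no edges, and every other extremity has exactly one A-edge and one B-edge.

```python
from typing import Dict, List, Tuple

Vertex = Tuple[int, str]        # (gene, 't') = tail extremity, (gene, 'h') = head extremity
Genome = List[Tuple[int, ...]]  # chromosomes as tuples of signed genes


def left_end(g: int) -> Vertex:
    """Extremity read first when gene g is read left to right."""
    return (g, 't') if g > 0 else (-g, 'h')


def right_end(g: int) -> Vertex:
    """Extremity read last when gene g is read left to right."""
    return (g, 'h') if g > 0 else (-g, 't')


def edges_of(genome: Genome) -> Dict[Vertex, Vertex]:
    """Map each extremity to the one it is joined to by a pair of consecutive genes."""
    partner: Dict[Vertex, Vertex] = {}
    for chrom in genome:
        for g1, g2 in zip(chrom, chrom[1:]):
            u, v = right_end(g1), left_end(g2)
            partner[u], partner[v] = v, u
    return partner


def count_cycles(a: Genome, b: Genome) -> int:
    """Count the cycles that alternate between A-edges and B-edges."""
    a_edges, b_edges = edges_of(a), edges_of(b)
    seen = set()
    cycles = 0
    for start in a_edges:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:
            w = a_edges[v]      # cross an A-edge ...
            seen.update((v, w))
            v = b_edges[w]      # ... then a B-edge
    return cycles


def classify_b_edges(a: Genome, b: Genome) -> Dict[Tuple[int, int], str]:
    """Label every B-edge as 'adjacency', 'internal' or 'external' with respect to A."""
    chrom_of = {abs(g): idx for idx, chrom in enumerate(a) for g in chrom}
    a_edges = edges_of(a)
    labels: Dict[Tuple[int, int], str] = {}
    for chrom in b:
        for g1, g2 in zip(chrom, chrom[1:]):
            if a_edges.get(right_end(g1)) == left_end(g2):
                labels[(g1, g2)] = "adjacency"
            elif chrom_of[abs(g1)] == chrom_of[abs(g2)]:
                labels[(g1, g2)] = "internal"
            else:
                labels[(g1, g2)] = "external"
    return labels


A = [(4, -1), (-3, -2, 5), (6, -7, 8)]
B = [(1, 2, 3), (4, 5), (6, 7, 8)]
print(count_cycles(A, B))  # 3
```

On this example, (1,2) and (4,5) come out external, (2,3) is an adjacency, and (6,7) and (7,8) are internal.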

9 The Overlap Graph (with Chromosomes)
π_A = (4, -1, -3, -2, 5, 6, -7, 8) (the concatenation of A’s chromosomes)
[Figure: overlap graph(A, B, π_A), with edge vertices (1,2), (4,5), (2,3), (6,7), (7,8) and chromosome vertices]
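The sketch below shows one way to read the overlap graph, under stated assumptions:
- Lay out the extremities of π_A from left to right.
- Each B-edge becomes the interval between its two endpoints.
- Each chromosome becomes the interval it spans.
- Two vertices are joined when their intervals overlap without one containing the other.

The chromosome labels X1, X2, ... are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Genome = List[Tuple[int, ...]]
Vertex = Tuple[int, str]  # (gene, 't' or 'h')


def left_end(g: int) -> Vertex:
    return (g, 't') if g > 0 else (-g, 'h')


def right_end(g: int) -> Vertex:
    return (g, 'h') if g > 0 else (-g, 't')


def interleave(p: Tuple[int, int], q: Tuple[int, int]) -> bool:
    """Two intervals overlap iff they intersect and neither contains the other."""
    (a1, b1), (a2, b2) = p, q
    return a1 < a2 < b1 < b2 or a2 < a1 < b2 < b1


def overlap_graph(a: Genome, b: Genome) -> Set[frozenset]:
    pos: Dict[Vertex, int] = {}
    intervals: Dict[str, Tuple[int, int]] = {}
    k = 0
    for idx, chrom in enumerate(a):          # pi_A: A's chromosomes concatenated
        start = k
        for g in chrom:
            pos[left_end(g)], pos[right_end(g)] = k, k + 1
            k += 2
        intervals[f"X{idx + 1}"] = (start, k - 1)  # hypothetical chromosome labels
    for chrom in b:                          # one vertex per edge of B
        for g1, g2 in zip(chrom, chrom[1:]):
            lo, hi = sorted((pos[right_end(g1)], pos[left_end(g2)]))
            intervals[f"({g1},{g2})"] = (lo, hi)
    return {frozenset(pair) for pair in combinations(intervals, 2)
            if interleave(intervals[pair[0]], intervals[pair[1]])}


A = [(4, -1), (-3, -2, 5), (6, -7, 8)]
B = [(1, 2, 3), (4, 5), (6, 7, 8)]
for edge in sorted(sorted(e) for e in overlap_graph(A, B)):
    print(edge)
```

Under this reading, the external edges (1,2) and (4,5) overlap the first two chromosomes. The internal edges (6,7) and (7,8) overlap only each other, and (2,3) is isolated.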

10 (Connected) Components
[Figure: components of overlap graph(A, B, π_A) on the vertices (1,2), (4,5), (2,3), (6,7), (7,8)]
– Bad component = a non-trivial internal component
– Trivial component = an adjacency
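Given an overlap graph, the components can be found with a plain graph search and then labeled as on the slide. The sketch below makes these assumptions:
- A component counts as internal when it contains no chromosome vertex (our assumption).
- The hard-coded edge list is our own computation for the slide-9 example, reading each vertex as an interval on π_A.
- The vertex labels are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def components(vertices: Iterable[str], edges: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Connected components of an undirected graph."""
    neighbours: Dict[str, Set[str]] = {v: set() for v in vertices}
    for u, w in edges:
        neighbours[u].add(w)
        neighbours[w].add(u)
    seen: Set[str] = set()
    result: List[Set[str]] = []
    for start in neighbours:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:                    # iterative depth-first search
            u = stack.pop()
            comp.add(u)
            for w in neighbours[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        result.append(comp)
    return result


def classify(comp: Set[str], chromosomes: Set[str], adjacencies: Set[str]) -> str:
    if len(comp) == 1 and comp <= adjacencies:
        return "trivial"                # a lone adjacency
    if comp.isdisjoint(chromosomes):
        return "bad"                    # non-trivial internal component
    return "other"                      # touches a chromosome vertex


VERTICES = ["X1", "X2", "X3", "(1,2)", "(2,3)", "(4,5)", "(6,7)", "(7,8)"]
EDGES = [("X1", "(1,2)"), ("X2", "(1,2)"), ("X1", "(4,5)"), ("X2", "(4,5)"), ("(6,7)", "(7,8)")]
CHROMOSOMES = {"X1", "X2", "X3"}
ADJACENCIES = {"(2,3)"}
for comp in components(VERTICES, EDGES):
    print(sorted(comp), classify(comp, CHROMOSOMES, ADJACENCIES))
```

In this example, {(6,7), (7,8)} is the only bad component, so F(A, B) would be non-zero for the slide-8 genomes.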

11 Overview
– Preliminaries
– Reduction to a simpler case
– The main algorithm (reduced case)

12 The Reciprocal Translocation Distance
$d_{RT}(A,B)$ = the reciprocal translocation distance between A and B
Theorem [Hannenhalli 96, Bergeron et al. 06]:
$$d_{RT}(A,B) = \#\text{genes} - \#\text{chrs} - \#\text{cycles}(A,B) + F(A,B)$$
– F(A, B) depends on the topology of the bad components. If there are no bad components, then F(A, B) = 0.
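As a worked check of the formula (our own arithmetic, not on the slides):
- For the slide-8 example we count 3 cycles.
- For the example used in slides 18-23 we count 1 cycle. Reading the overlap graph as intervals on π_A, we find no bad components, so F(A, B) = 0.

The value 3 agrees with the three translocations the walkthrough in slides 18-23 produces (edges 4 and 3 in S, edge 1 in L).

```latex
% Slide 8: 8 genes, 3 chromosomes, 3 cycles
d_{RT}(A,B) = 8 - 3 - 3 + F(A,B) = 2 + F(A,B)

% Slides 18-23: A = \{(1,-5,6),(3,-4,2)\},\ B = \{(1,2),(3,4,5,6)\}
% 6 genes, 2 chromosomes, 1 cycle, no bad components
d_{RT}(A,B) = 6 - 2 - 1 + 0 = 3
```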

13 Reduced Case: No Bad Components Result 1: The problem “Sorting by Reciprocal Translocations” can be reduced to the problem “Sorting by Reciprocal Translocations, No Bad Components” in linear time.

14 Reduction’s Main Idea
Isolation: all bad components are found in one chromosome.
Goal: eliminate the bad components without creating …
– Maintain two lists of chromosomes: those with exactly one minimal bad component and those with two or more minimal bad components
– Use prefix-prefix translocations (no sign changes)

15 Overview
– Preliminaries
– Reduction to a simpler case
– The main algorithm (reduced case)

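16 Translocations Defined by External Edges
For an external edge e, ρ(e) is the translocation that transforms e into an adjacency:
– it increases #cycles(A, B)
– it may create a bad component
$$d_{RT}(A,B) = \#\text{genes} - \#\text{chrs} - \#\text{cycles}(A,B) + F(A,B)$$
[Figure: the cycle graph G before and after ρ(e), with the external edge e joining vertices x and y on chromosomes 1 and 2]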
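Here is a sketch of the translocation defined by an external edge. It assumes (hypothetically) that B's chromosomes are runs of consecutive increasing genes, so edge i joins the head of gene i to the tail of gene i+1. Depending on which side of its gene each endpoint lies, the sketch chooses a prefix-prefix or a prefix-suffix exchange so that the two endpoints become neighbours. The helper names are ours.

```python
from typing import List, Tuple

Genome = List[Tuple[int, ...]]
Vertex = Tuple[int, str]  # (gene, 't' or 'h')


def left_end(g: int) -> Vertex:
    return (g, 't') if g > 0 else (-g, 'h')


def right_end(g: int) -> Vertex:
    return (g, 'h') if g > 0 else (-g, 't')


def flip(segment):
    """Reverse a segment and negate every gene."""
    return [-g for g in reversed(segment)]


def locate(genome: Genome, vertex: Vertex) -> Tuple[int, int, bool]:
    """Find the gene carrying `vertex`.

    Returns (chromosome index, cut, is_right): a right extremity ends
    chrom[:cut], a left extremity starts chrom[cut:].
    """
    for ci, chrom in enumerate(genome):
        for p, g in enumerate(chrom):
            if abs(g) == vertex[0]:
                is_right = right_end(g) == vertex
                return ci, (p + 1 if is_right else p), is_right
    raise KeyError(vertex)


def translocate_edge(genome: Genome, i: int) -> Genome:
    """Hypothetical helper: apply the translocation that makes edge (i, i+1) an adjacency."""
    u, v = (i, 'h'), (i + 1, 't')   # assumption: B contains ..., i, i+1, ...
    cx, px, u_right = locate(genome, u)
    cy, py, v_right = locate(genome, v)
    if cx == cy:
        raise ValueError(f"edge {i} is not external")
    x, y = list(genome[cx]), list(genome[cy])
    x1, x2, y1, y2 = x[:px], x[px:], y[:py], y[py:]
    if u_right != v_right:
        # u ends a prefix and v starts a suffix (or vice versa): prefix-prefix
        new_x, new_y = x1 + y2, y1 + x2
    else:
        # both end prefixes or both start suffixes: prefix-suffix
        new_x, new_y = x1 + flip(y1), flip(x2) + y2
    result = list(genome)
    result[cx], result[cy] = tuple(new_x), tuple(new_y)
    return result


A = [(1, -5, 6), (3, -4, 2)]
for edge in (4, 3, 1):
    A = translocate_edge(A, edge)
print(A)  # [(1, 2), (3, 4, 5, 6)]
```

This uses the example of slide 18 (B = {(1,2), (3,4,5,6)}). Applying edges 4, 3 and 1 in that order yields B exactly. Each step creates one new adjacency.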

17 The Main Algorithm
1. Mark all edges (except adjacencies) as “unused”; S ← ∅, L ← ∅.
2. While there is an unused external edge e:
   a. Mark e as “used”.
   b. If ρ(e) ≠ ρ(FIRST(L)): apply ρ(e) to A and APPEND(S, e).
3. If all the edges are used, return (S, L).
4. While all the unused edges are internal: undo the last translocation and PREPEND(L, POP(S)).
5. Go to 1.
Solution: the “forward part” S followed by the “backward part” L.

18 The Main Algorithm (example)
B = {(1,2), (3,4,5,6)}; edge (i, i+1) is identified by i.
L = ∅ | S = ∅ | unused edges = 1, 3, 4, 5 | A = {(1,-5,6), (3,-4,2)}

19 The Main Algorithm (example)
L = ∅ | S = 1 | unused edges = 3, 4, 5 | A = {(3,-4,-5,6), (1,2)}

20 The Main Algorithm (example)
L = 1 | S = ∅ | unused edges = 3, 4, 5 | A = {(1,-5,6), (3,-4,2)}

21 The Main Algorithm (example)
L = 1 | S = 4 | unused edges = 3, 5 | A = {(3,6), (1,-5,-4,2)}

22 The Main Algorithm (example)
L = 1 | S = 4, 3 | unused edges = 5 | A = {(-2,6), (1,-5,-4,-3)}

23 The Main Algorithm (example)
L = 1 | S = 4, 3 | unused edges = ∅ | A = {(-2,6), (1,-5,-4,-3)}
All edges are used, so (S, L) is returned: the solution is S followed by L, i.e. edges 4, 3, 1.
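The steps of slide 17 can be transcribed into an unoptimized Python sketch that reproduces the run in slides 18-23. All of the following assumptions are ours:
- B's chromosomes are runs of consecutive increasing genes, so edge i joins i and i+1.
- Unused external edges are tried in increasing order.
- The translocation of e is compared with that of FIRST(L) by the genomes they produce, up to chromosome order and flips. The comparison is skipped when FIRST(L) is not currently external.

This is a literal transcription that we only checked on this example. The step-4 loop raises an error rather than undoing past the start.

```python
from typing import List, Tuple

Genome = List[Tuple[int, ...]]
Vertex = Tuple[int, str]


def left_end(g: int) -> Vertex:
    return (g, 't') if g > 0 else (-g, 'h')


def right_end(g: int) -> Vertex:
    return (g, 'h') if g > 0 else (-g, 't')


def flip(seg):
    return [-g for g in reversed(seg)]


def locate(a: Genome, vertex: Vertex):
    """(chromosome, cut, is_right): a right extremity ends chrom[:cut], a left one starts chrom[cut:]."""
    for ci, chrom in enumerate(a):
        for p, g in enumerate(chrom):
            if abs(g) == vertex[0]:
                is_right = right_end(g) == vertex
                return ci, (p + 1 if is_right else p), is_right
    raise KeyError(vertex)


def ends(i: int):
    """Assumption: edge i of B joins the head of gene i to the tail of gene i + 1."""
    return (i, 'h'), (i + 1, 't')


def is_external(a: Genome, i: int) -> bool:
    u, v = ends(i)
    return locate(a, u)[0] != locate(a, v)[0]


def is_adjacency(a: Genome, i: int) -> bool:
    u, v = ends(i)
    cu, pu, ru = locate(a, u)
    cv, pv, rv = locate(a, v)
    return cu == cv and pu == pv and ru != rv


def rho(a: Genome, i: int) -> Genome:
    """Translocation that turns the external edge i into an adjacency."""
    u, v = ends(i)
    cx, px, ru = locate(a, u)
    cy, py, rv = locate(a, v)
    x, y = list(a[cx]), list(a[cy])
    x1, x2, y1, y2 = x[:px], x[px:], y[:py], y[py:]
    if ru != rv:
        nx, ny = x1 + y2, y1 + x2                # prefix-prefix
    else:
        nx, ny = x1 + flip(y1), flip(x2) + y2    # prefix-suffix
    out = list(a)
    out[cx], out[cy] = tuple(nx), tuple(ny)
    return out


def canonical(a: Genome):
    """A genome up to chromosome order and chromosome flips."""
    return sorted(min(tuple(c), tuple(flip(c))) for c in a)


def sort_reduced(a: Genome, b: Genome):
    edges = [g for chrom in b for g in chrom[:-1]]   # edge (g, g+1) identified by g
    s: List[int] = []
    l: List[int] = []
    history: List[Genome] = []                        # genome before each applied step
    while True:
        # Step 1: every edge that is not an adjacency becomes unused.
        unused = {e for e in edges if not is_adjacency(a, e)}
        # Step 2: consume unused external edges, smallest first.
        while True:
            e = next((x for x in sorted(unused) if is_external(a, x)), None)
            if e is None:
                break
            unused.discard(e)
            same_as_first = (bool(l) and is_external(a, l[0])
                             and canonical(rho(a, e)) == canonical(rho(a, l[0])))
            if not same_as_first:
                history.append(a)
                a = rho(a, e)
                s.append(e)
        # Step 3
        if not unused:
            return s, l
        # Step 4: undo while every unused edge is internal.
        while not any(is_external(a, x) for x in unused):
            if not s:
                raise RuntimeError("no translocation left to undo")
            a = history.pop()
            l.insert(0, s.pop())
        # Step 5: go back to step 1.


S, L = sort_reduced([(1, -5, 6), (3, -4, 2)], [(1, 2), (3, 4, 5, 6)])
print(S, L)  # [4, 3] [1]
```

In the second pass, edge 1 is re-marked unused in step 1. It is then consumed without being applied, because its translocation equals that of FIRST(L). Edge 5 is skipped at the end for the same reason.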

24 Implementation of the Algorithm
– A simple O(n^2)-time implementation
– An …-time implementation using a data structure that:
  – maintains a fragmented signed permutation
  – allows one to find an external edge e and perform the translocation it defines in … time
  – is based on a data structure by Kaplan & Verbin 05

25 Thank You!

26 Simulating Translocations by Reversals [Hannenhalli & Pevzner]
A translocation can be simulated by:
– a reversal on π_A, or
– a chromosome flip in π_A followed by a reversal on π_A
[Figure: cycle graph(A, B)]
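The simulation can be checked on small examples. This sketch is our own construction, following the two cases on the slide. X = X1X2 and Y = Y1Y2 are concatenated into π = XY, and then:
- Reversing X2Y1 yields X1 -Y1 | -X2 Y2, a prefix-suffix translocation.
- Flipping Y first and then reversing X2 -Y2 yields X1 Y2 | -X2 -Y1. This is the prefix-prefix result once the second chromosome is flipped back.

```python
from typing import List, Tuple

Chromosome = List[int]


def flip(seg: Chromosome) -> Chromosome:
    """Reverse a segment and negate every gene."""
    return [-g for g in reversed(seg)]


def reversal(perm: Chromosome, i: int, j: int) -> Chromosome:
    """Signed reversal of the segment perm[i:j]."""
    return perm[:i] + flip(perm[i:j]) + perm[j:]


def prefix_suffix_by_reversal(x, y, i, j) -> Tuple[Chromosome, Chromosome]:
    # pi = X1 X2 Y1 Y2; reversing X2 Y1 gives X1 -Y1 -X2 Y2.
    pi = reversal(x + y, i, len(x) + j)
    return pi[:i + j], pi[i + j:]


def prefix_prefix_by_reversal(x, y, i, j) -> Tuple[Chromosome, Chromosome]:
    # Flip Y first: pi = X1 X2 -Y2 -Y1; reversing X2 -Y2 gives X1 Y2 -X2 -Y1.
    r = len(y) - j                       # length of Y2
    pi = reversal(x + flip(y), i, len(x) + r)
    # The second chromosome -X2 -Y1 is flipped back to Y1 X2.
    return pi[:i + r], flip(pi[i + r:])


print(prefix_suffix_by_reversal([1, 2, 3], [4, 5, 6], 1, 2))  # ([1, -5, -4], [-3, -2, 6])
print(prefix_prefix_by_reversal([1, 2, 3], [4, 5, 6], 1, 2))  # ([1, 6], [4, 5, 2, 3])
```

Flipping a chromosome does not change it as an unoriented object, which is why slide 27 can assign a flip cost 0.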

27 Working on the Overlap Graph
H = overlap graph(A, B, π_A); H is sorted if every component is trivial.
Operations:
– a reversal on an oriented external vertex v (cost = 1)
– a flip of chromosome X (cost = 0)

28 H ● reversal on v (two chromosomes only)
[Figure: H before and after a reversal on the oriented vertex v; legend: unoriented edge, oriented edge, chromosome]
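For comparison only, consider the classical Hannenhalli-Pevzner setting of a single signed permutation with no chromosome vertices. There, a reversal on an oriented vertex v complements the subgraph induced by v and its neighbours, and flips the orientation of each of those vertices. The sketch below implements that classical rule. It does not model the chromosome vertices or the two-chromosome restriction shown on the slide, and the function name is ours.

```python
from typing import Dict, Set, Tuple

Graph = Dict[str, Set[str]]  # undirected overlap graph as adjacency sets


def reverse_oriented_vertex(graph: Graph, oriented: Set[str], v: str) -> Tuple[Graph, Set[str]]:
    """Classical effect of the reversal on oriented vertex v (single permutation)."""
    if v not in oriented:
        raise ValueError("v must be an oriented vertex")
    closed = graph[v] | {v}                      # v together with its neighbours
    new_graph = {u: set(ns) for u, ns in graph.items()}
    for p in closed:                             # complement the induced subgraph
        for q in closed:
            if p != q:
                if q in graph[p]:
                    new_graph[p].discard(q)
                else:
                    new_graph[p].add(q)
    return new_graph, oriented ^ closed          # toggle orientation inside `closed`


H = {"a": {"v"}, "v": {"a", "b"}, "b": {"v"}}
print(reverse_oriented_vertex(H, {"v"}, "v"))
```

After the reversal, v is isolated and unoriented (it has become an adjacency), while its former neighbours a and b are joined and become oriented.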

29 H ● flip of chromosome X
[Figure: H before and after flipping chromosome X; legend: unoriented edge, oriented edge, chromosome]