Lecture 3: Genome Rearrangements and Duplications


Breakpoint graph: 1-dimensional construction. Transform π = <2, -4, -3, 5, -8, -7, -6, 1> into γ = <1, 2, 3, 4, 5, 6, 7, 8> by reversals. Vertices: i → ia ib, -i → ib ia, plus 0b and 9a. Edges: match the ends of consecutive blocks in π and in γ. Superimpose the two matchings.

Breakpoint graph: breakpoints. Each reversal goes between 2 breakpoints and so removes at most 2 of them, hence d ≥ # breakpoints / 2 = 6/2 = 3. Theorem (Hannenhalli-Pevzner 1995): d = n + 1 - c + h + f, where c = # cycles; h and f are rather complicated, but can be computed from the graph in polynomial time. Here, d = 8 + 1 - 5 + 0 + 0 = 4. Breakpoints are not independent: the breakpoint graph shows the dependencies between them.
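The counts on this slide can be checked mechanically. The following is a minimal Python sketch, not code from the lecture: the function names `breakpoints` and `cycles` are made up for illustration, and it assumes the common integer encoding in which block i becomes the vertex pair (2i-1, 2i) and -i becomes (2i, 2i-1), in place of the ia/ib labels. It does not compute h or f, so n + 1 - c equals the distance only when both are 0.

```python
def breakpoints(perm):
    """Count breakpoints of a signed permutation framed by 0 and n+1."""
    ext = [0] + list(perm) + [len(perm) + 1]
    # A consecutive pair (a, b) is an adjacency exactly when b - a == 1.
    return sum(1 for a, b in zip(ext, ext[1:]) if b - a != 1)


def cycles(perm):
    """Count alternating black/grey cycles in the breakpoint graph."""
    n = len(perm)
    seq = [0]
    for x in perm:
        # +i -> (2i-1, 2i); -i -> (2i, 2i-1)  (assumed encoding)
        seq += [2 * x - 1, 2 * x] if x > 0 else [-2 * x, -2 * x - 1]
    seq.append(2 * n + 1)

    # Black edges join the ends of consecutive blocks in the permutation.
    black = {}
    for k in range(0, len(seq), 2):
        a, b = seq[k], seq[k + 1]
        black[a], black[b] = b, a

    def grey(v):
        # Grey edges join the ends of consecutive blocks in the identity.
        return v + 1 if v % 2 == 0 else v - 1

    seen, count = set(), 0
    for start in seq:
        if start in seen:
            continue
        count += 1
        v = start
        while v not in seen:
            seen.add(v)
            w = black[v]
            seen.add(w)
            v = grey(w)
    return count


pi = [2, -4, -3, 5, -8, -7, -6, 1]
print(breakpoints(pi), cycles(pi), len(pi) + 1 - cycles(pi))
```

For the slide's permutation this gives 6 breakpoints and 5 cycles, so n + 1 - c = 4, matching the value quoted from the theorem.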

Breakpoint graph ⇒ rearrangement scenario

Oriented and Unoriented Cycles. A proper reversal acts on black edges so that c(ρ π) - c(π) = 1. Unoriented cycles: no proper reversal acts on an unoriented cycle. These are “impediments” in sorting by reversals. [Figure: cycles labeled C, F, and E]

Interleaving Edges. Interleaving edges are grey edges that cross each other. Example: edges (0,1) and (18,19) are interleaving. Cycles are interleaving if they have an interleaving edge. [Figure: two interleaving grey edges over the vertex order 0 5 6 10 9 15 16 12 11 7 8 14 13 17 18 3 4 1 2 19 20 22 21 23]
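The crossing test can be written as an interval check on vertex positions. This is an illustrative sketch only: `ORDER` is the vertex order printed on the slide, while `POS` and `interleave` are names invented here, and "cross" is taken to mean that the two position intervals overlap without one containing the other.

```python
# Vertex order of the breakpoint graph as printed on the slide.
ORDER = [0, 5, 6, 10, 9, 15, 16, 12, 11, 7, 8, 14,
         13, 17, 18, 3, 4, 1, 2, 19, 20, 22, 21, 23]
POS = {v: i for i, v in enumerate(ORDER)}


def interleave(e1, e2, pos=POS):
    """True if grey edges e1 and e2 cross when drawn as arcs over the order."""
    a, b = sorted(pos[v] for v in e1)
    c, d = sorted(pos[v] for v in e2)
    # Crossing means exactly one endpoint of each arc lies inside the other.
    return a < c < b < d or c < a < d < b


print(interleave((0, 1), (18, 19)))   # the slide's example pair
print(interleave((6, 7), (10, 11)))   # nested arcs do not cross
```

Edge (0,1) spans positions 0 to 17 and edge (18,19) spans 14 to 19, so they cross; edge (10,11) lies entirely inside edge (6,7), so that pair does not.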

Interleaving Graphs. The interleaving graph has one vertex for each cycle of the breakpoint graph; two cycles are connected by an edge when they interleave. [Figure: breakpoint graph with cycles A-F over the vertex order 0 5 6 10 9 15 16 12 11 7 8 14 13 17 18 3 4 1 2 19 20 22 21 23, and its interleaving graph on A, B, C, D, E, F]

Interleaving Graphs. Label the oriented cycles. A component is oriented if it contains an oriented cycle. [Figure: breakpoint graph with cycles A-F and its interleaving graph, oriented cycles marked]

Interleaving Graphs. Remove the oriented components from the interleaving graph. [Figure: cycles A-F and the interleaving graph]

Hurdles. Hurdle: a minimal or maximal unoriented component under the containment partial order. [Figure: components A and E; h(π) = 1]

Reversal Distance with Hurdles. Hurdles are obstacles in the genome rearrangement problem: they increase the number of reversals required to transform a permutation into the identity permutation. Example: 3 2 1 → 3 -1 -2 → 1 -3 -2 → 1 2 3, with c(π) = 2, h(π) = 1. Every hurdle can be transformed into oriented cycles by a reversal on an arbitrary cycle in the hurdle.
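The example numbers on this slide (3 2 1, 3 -1 -2, 1 -3 -2, 1 2 3) can be read as a three-step sorting scenario; that reading is an interpretation of the slide's figure. The sketch below replays it, with `reversal` a helper name chosen here for illustration: a signed reversal reverses a segment and flips every sign in it.

```python
def reversal(perm, i, j):
    """Reverse perm[i..j] (inclusive) and negate each element in it."""
    segment = [-x for x in reversed(perm[i:j + 1])]
    return perm[:i] + segment + perm[j + 1:]


pi = [3, 2, 1]
step1 = reversal(pi, 1, 2)      # flip the block "2 1"
step2 = reversal(step1, 0, 1)   # flip the block "3 -1"
step3 = reversal(step2, 1, 2)   # flip the block "-3 -2"
print(step1, step2, step3)
```

Three reversals agree with n + 1 - c(π) + h(π) = 3 + 1 - 2 + 1 = 3, one more than the cycle count alone would suggest.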

Reversal Distance with Hurdles. Let h(π) be the number of hurdles in permutation π. Taking hurdles into account, the following formula gives a tighter bound on reversal distance: d(π) ≥ n + 1 - c(π) + h(π). Every hurdle can be transformed into oriented cycles by a reversal on an arbitrary cycle in the hurdle. ** Doing so might cause problems with overlapping hurdles.

Superhurdles. A superhurdle “protects” a non-hurdle: deleting the superhurdle creates another hurdle.

Fortresses. A fortress is a permutation π with an odd number of hurdles, all of which are superhurdles. Theorem (Hannenhalli-Pevzner 1995): d(π) = n + 1 - c(π) + h(π) + f, where c = # cycles, h = # hurdles, and f = 1 if π is a fortress (f = 0 otherwise).
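As a small arithmetic check, the theorem's formula can be evaluated on the two examples given earlier in the lecture. This is only a sketch: `hp_distance` is a hypothetical helper, and it takes c, h, and the fortress flag as inputs instead of computing them from a permutation.

```python
def hp_distance(n, c, h, is_fortress=False):
    """Evaluate d = n + 1 - c + h + f for given cycle and hurdle counts."""
    f = 1 if is_fortress else 0
    return n + 1 - c + h + f


# 2 -4 -3 5 -8 -7 -6 1: n = 8, c = 5, no hurdles.
print(hp_distance(8, 5, 0))
# 3 2 1: n = 3, c = 2, one hurdle.
print(hp_distance(3, 2, 1))
```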

GRIMM-Synteny on X chromosome: 2-dimensional breakpoint graph

Coming Next. Other rearrangement operations. Duplications. Rearrangements and phylogeny. Multiple Genomic Distance Problem: given permutations π_1, …, π_k, find a permutation σ such that ∑_{i=1}^{k} d(π_i, σ) is minimal.

Other Types of Rearrangements. So far: discussed reversals. Also: translocations, fissions, fusions (modeled as reversals in the concatenation of chromosomes). Example translocation: (5 9 4 10) (-6 -1 11 7 -2) → (5 9 11 7 -2) (-6 -1 4 10).
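To see how a translocation can be modeled as a reversal on concatenated chromosomes, the sketch below reproduces the slide's example. It is one possible modeling, not necessarily the lecture's: the second chromosome is flipped before concatenation, and `flip`, `reversal`, and `translocate` are illustrative names.

```python
def flip(chrom):
    """Read a chromosome from the other end: reverse order, negate signs."""
    return [-g for g in reversed(chrom)]


def reversal(seq, i, j):
    """Signed reversal of seq[i..j] (inclusive)."""
    return seq[:i] + flip(seq[i:j + 1]) + seq[j + 1:]


def translocate(a, b, cut_a, cut_b):
    """Exchange suffixes a[cut_a:] and b[cut_b:] using one reversal."""
    concat = a + flip(b)                 # b's suffix now sits right after a
    tail_b = len(b) - cut_b              # length of b's suffix
    out = reversal(concat, cut_a, len(a) + tail_b - 1)
    split = cut_a + tail_b               # length of the new first chromosome
    return out[:split], flip(out[split:])


a = [5, 9, 4, 10]
b = [-6, -1, 11, 7, -2]
print(translocate(a, b, 2, 2))
```

Here the single reversal acts on the segment 4 10 2 -7 -11 of the concatenation 5 9 4 10 2 -7 -11 1 6, and splitting the result gives the two chromosomes shown on the slide.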

Other Types of Rearrangements. Transposition: 1 2 3 4 5 6 → 1 2 5 3 4 6. Duplication: 1 2 3 4 5 6 → 1 2 3 4 5 3 4 6. Duplications are very frequent in cancer genomes.
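Both operations on this slide are simple list manipulations. In the sketch below, `transpose` swaps two adjacent blocks and `duplicate` inserts a copy of a block elsewhere; the function names and the half-open index convention are assumptions made for illustration.

```python
def transpose(seq, i, j, k):
    """Swap the adjacent blocks seq[i:j] and seq[j:k]."""
    return seq[:i] + seq[j:k] + seq[i:j] + seq[k:]


def duplicate(seq, i, j, k):
    """Insert a copy of the block seq[i:j] at position k."""
    return seq[:k] + seq[i:j] + seq[k:]


genome = [1, 2, 3, 4, 5, 6]
print(transpose(genome, 2, 4, 5))   # blocks "3 4" and "5" trade places
print(duplicate(genome, 2, 4, 5))   # a copy of "3 4" is inserted after 5
```

Unlike a transposition, a duplication changes the length of the sequence, which is why the result is no longer a permutation.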

Duplications. What problem to solve? Given G ∈ {1, …, n}^N (“permutation with duplicates”), find reversals ρ_1, ρ_2, …, ρ_t, duplications δ_1, …, δ_s, and a permutation π such that π (ρ_1, …, ρ_t, δ_1, …, δ_s) i = G and s + t is minimal. Example: 1 2 3 4 5 6 → 1 2 3 4 5 3 4 -2 -3 6 ??? HARD!!! (NP-hard?)

Duplications (2). What problem to solve? Given: G ∈ {1, …, n}^N (“permutation with duplicates”) and H = σ G for some permutation σ. Find: reversals ρ_1, ρ_2, …, ρ_t such that ρ_1 … ρ_t G = H and t is minimal. Signed reversal distance with duplicates is NP-hard (Chen et al. 2005). If there is a 1-1 mapping of repeated elements (orthologs) in G to H, then the problem reduces to reversal distance.

Duplications (3). What problem to solve? Given: P ∈ {1, …, n}^N (permutation with duplicates). Find: a permutation π, reversals ρ_1, ρ_2, …, ρ_s, and duplications δ_1, …, δ_t such that ρ_1 … ρ_s δ_1 … δ_t π = P and t is minimal. Solution when there are at most two duplicates per gene: El-Mabrouk and Sankoff (2002).

Whole Genome Duplication. The genome is doubled, giving an extra copy of each element, and subsequently undergoes reversals. Genome Halving Problem: given a duplicated genome P, recover the ancestral pre-duplicated genome R minimizing the reversal distance from the perfect duplicated genome R ⊕ R to the duplicated genome P. (El-Mabrouk and Sankoff 1998-2003)

Whole Genome Duplication. If the copies of each element are labeled uniquely, then the problem reduces to the reversal distance problem.

Reversal Distance and Duplications. Let d(G,H) = reversal distance between G and H. The problem of computing d(P, R ⊕ R) is unsolved. min_R d(P, R ⊕ R) is solvable in polynomial time.

Breakpoint Graph. [Figure: breakpoint graph G(π, γ). π = 2 -4 -3 5 -8 -7 -6 1 9 with vertices 0h 2t 2h 4h 4t 3h 3t 5t 5h 8h 8t 7h 7t 6h 6t 1t 1h 9t; γ = 1 2 3 4 5 6 7 8 9 with vertices 0h 1t 1h 2t 2h 3t 3h 4t 4h 5t 5h 6t 6h 7t 7h 8t 8h 9t. The same graph is also shown with a/b labels: 0b 2a 2b 4b 4a 3b 3a 5a 5b 8b 8a 7b 7a 6b 6a 1a 1b 9a.]

Genome Halving: Exhaustive. Doubled genome with 2n genes. Compute the reversal distance on all 2^n labelings of the genes.
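The exponential cost comes from having two ways to label the copies of each of the n duplicated genes. The sketch below only enumerates those labelings; it assumes genes are signed integers that each occur exactly twice, and `labelings` is a name invented here. An exhaustive solver would additionally run a reversal-distance routine on every labeling, which is not shown.

```python
from itertools import product


def labelings(genome):
    """Yield every assignment of copy labels 1/2 to a doubled genome."""
    genes = sorted({abs(x) for x in genome})
    for choice in product((1, 2), repeat=len(genes)):
        first_label = dict(zip(genes, choice))  # label of the first occurrence
        seen = set()
        labeled = []
        for x in genome:
            g = abs(x)
            # The second occurrence gets whichever label the first did not.
            label = first_label[g] if g not in seen else 3 - first_label[g]
            seen.add(g)
            labeled.append((x, label))
        yield labeled


doubled = [1, 2, -1, 2]            # n = 2 genes, each present twice
all_labelings = list(labelings(doubled))
print(len(all_labelings))          # 2**n labelings
```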

Genome Halving. Weak Genome Halving Problem: for a given duplicated genome P, find a perfect duplicated genome R ⊕ R and a labeling of gene copies that maximizes the number of black-gray cycles c(G) in the breakpoint graph G(P, R ⊕ R) of the labeled genomes P and R ⊕ R. (Alekseyev and Pevzner 2006) Theorem (Hannenhalli-Pevzner 1995): d(π) = n + 1 - c(π) + h(π) + f, where c = # cycles, h = # hurdles, and f = 1 if π is a fortress.

Contracted Breakpoint Graph. Breakpoint graph construction. [Figure: G(π, γ) for π = 2 -4 -3 5 -8 -7 -6 1 9 and γ = 1 2 3 4 5 6 7 8 9, drawn on the tail/head vertices 0h, 1t, 1h, …, 8h, 9t] Implicit were the obverse edges (xt, xh). π is a black-obverse alternating path; γ is a gray-obverse alternating path.

Contracted Breakpoint Graph. With duplicates, there are pairs of vertices with the same label. Contract these identical vertices.

Contracted Breakpoint Graph. P = −a−b+g+d+f+g+e−a+c−f−c−b−d−e, R = −a−b−d−g+f−c−e. In G’(P, R ⊕ R), each gray edge is a pair of parallel edges.

Cycle Decompositions. Genomes P and Q. G(P,Q): breakpoint graph for some labeling, with its black-gray cycle decomposition. G’(P,Q): contracted breakpoint graph, with the induced black-gray cycle decomposition. Reverse direction: ??? Labeling Problem: given a black-gray cycle decomposition of the contracted breakpoint graph G′(P,Q) of duplicated genomes P and Q, find a labeling of P and Q that induces this cycle decomposition.

Maximal black-gray cycle decomposition. P = −a−b+g+d+f+g+e−a+c−f−c−b−d−e, R = −a−b−d−g+f−c−e. [Figure: G’(P, R ⊕ R), the BG graph corresponding to G’, and a maximal black-gray cycle decomposition]

Cycle Decomposition. P = −a−b+g+d+f+g+e−a+c−f−c−b−d−e, R = −a−b−d−g+f−c−e. [Figure: P as a black-obverse cycle]

Genome Halving Algorithm: Outline
Input: doubled genome P.
Construct the BO (black-obverse) graph for P by gluing identical edges.
Introduce gray edges “optimally” to create the BOG (black-obverse-gray) graph G’ with a single gray-obverse cycle (!!!).
R = gray-obverse cycle in G’.
Find a maximal black-gray cycle decomposition of G’.
Q = R ⊕ R.