Genome Rearrangement and Duplication Distance


Genome Rearrangement and Duplication Distance Crystal L. Kahn 9/18/08

Genome Rearrangement
Over the course of evolution, genomes undergo large structural changes: chromosomal fissions, fusions, inversions, and transpositions.
Genome rearrangement is an area of computational biology that uses parsimony* methods to compute “distances” between pairs of genomes.
The similarity between two genomes is characterized by the number of operations required to transform one into the other.
Point mutations (SNPs) are not of interest here, which makes this different from edit distance.
* Maximum likelihood methods can also be used.

Genome Rearrangements
Humans and mice have similar genomes, but their genes are ordered differently: ~245 rearrangements, ~300 large synteny blocks.

History of Chromosome X (Rat Consortium, Nature, 2004)
Rearrangement events: reversals, fusions, fissions, translocations.

Genome Rearrangement Models
Types of rearrangement operations that have been considered:
Reversal (inversion) [HP, STOC95], [Bader et al., WADS01]
Translocation [Hannenhalli, DAM95]
Duplication transposition [El-Mabrouk, JCSS02]
Duplications are common in cancer. (Figure: genomes G1 and G2.)
Ultimate goal: a generic genome rearrangement model that allows any type of rearrangement.

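Duplication Distance: D_X(Z,Y)
Input: strings X, Y, Z (X non-ambiguous).
Def: a duplication operation, written Z ∘ δ_{s,t,p}(X), copies the substring X_{s,t} of X and pastes it into Z at position p.
Problem: compute D_X(Z,Y) = the minimum number of duplication operations needed to transform Z into Y.
Theorem: there is an O(n^4) algorithm, where n = |Y|.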
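A minimal sketch of a single duplication operation may help make the definition concrete. The function name `duplicate` and the indexing convention (1-indexed, inclusive s..t, with the copy landing between positions p-1 and p of Z) are assumptions chosen for illustration, not notation taken from the slides.

```python
def duplicate(z: str, x: str, s: int, t: int, p: int) -> str:
    """Copy x[s..t] (1-indexed, inclusive) and paste it into z so that
    the copy sits between positions p-1 and p of z (1-indexed).

    Hypothetical helper: the indexing convention is an assumption.
    """
    if not (1 <= s <= t <= len(x)):
        raise ValueError("need 1 <= s <= t <= len(x)")
    if not (1 <= p <= len(z) + 1):
        raise ValueError("need 1 <= p <= len(z) + 1")
    segment = x[s - 1:t]              # substring X_s..X_t
    return z[:p - 1] + segment + z[p - 1:]


if __name__ == "__main__":
    # Build a target from the empty string using two copies from X.
    x = "abcdefg"
    z = duplicate("", x, 2, 4, 1)     # paste "bcd" into the empty string
    z = duplicate(z, x, 5, 6, 2)      # paste "ef" after the first character
    print(z)
```

Counting how many such calls are needed to reach Y from Z is exactly the quantity D_X(Z,Y) asks for.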

Definitions
String: a sequence of characters.
Substring: a contiguous sequence of characters of a string.
Subsequence: a sequence of characters of a string, not necessarily contiguous.
Note: a substring is a subsequence, but not necessarily vice versa.
Example: for T = abcdefg, bcd is a substring and ace is a subsequence (but not a substring).
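The distinction can be checked mechanically. The following sketch uses two hypothetical helper names, `is_substring` and `is_subsequence`, and the example strings from the slide.

```python
def is_substring(s: str, t: str) -> bool:
    """True if s occurs contiguously inside t."""
    return s in t


def is_subsequence(s: str, t: str) -> bool:
    """True if the characters of s occur in t in order, gaps allowed."""
    remaining = iter(t)
    # `ch in remaining` advances the iterator past the first match,
    # so later characters of s must be found further to the right.
    return all(ch in remaining for ch in s)


if __name__ == "__main__":
    T = "abcdefg"
    for candidate in ("bcd", "ace"):
        print(candidate, is_substring(candidate, T), is_subsequence(candidate, T))
```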

Key Insight
W.L.O.G., let Z = Ø.
Figure: X = abcdefghijklmnopqrs with two example targets, Y = abcdjkcdeflopq and Y = abcdcdjekflopq, illustrating “overlapping” subsequences.
Observation: overlapping subsequences interfere with each other.
Lemma: a set of subsequences of Y that are substrings of X and that cover all the characters of Y can be converted into a sequence of duplicate operations iff they are mutually non-overlapping. Such a set is a “feasible set”.
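The slide does not spell out a formal definition of “overlapping”. The sketch below assumes the usual crossing condition: two subsequences overlap when their positions in Y interleave in the pattern a..b..a..b (nesting one entirely inside a gap of the other does not count). The function name `overlap` and the index sets used in the example are illustrative assumptions, one possible reading of the two example targets.

```python
def overlap(a: tuple, b: tuple) -> bool:
    """a and b are non-empty, disjoint tuples of 0-indexed positions in Y.

    Assumed definition: they overlap if, reading positions left to
    right, ownership switches at least three times (a b a b or b a b a).
    """
    labelled = sorted([(i, "a") for i in a] + [(j, "b") for j in b])
    labels = [lab for _, lab in labelled]
    # Collapse consecutive positions owned by the same subsequence.
    runs = [labels[0]]
    for lab in labels[1:]:
        if lab != runs[-1]:
            runs.append(lab)
    return len(runs) >= 4


if __name__ == "__main__":
    # Y = abcdjkcdeflopq: "jkl" at (4, 5, 10) with "cdef" nested at (6, 7, 8, 9)
    print(overlap((4, 5, 10), (6, 7, 8, 9)))
    # Y = abcdcdjekflopq: "jkl" at (6, 8, 10) interleaved with "ef" at (7, 9)
    print(overlap((6, 8, 10), (7, 9)))
```

Under this assumed definition the nested pair is compatible while the interleaved pair is not, which matches the observation that overlapping subsequences interfere with each other.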

Finding a min-cardinality feasible set for Y_{s,t}
Consider the element of the feasible set that includes index s. Two cases:
Case 1: it also includes index t.
Case 2: it does not include index t.

Let d(Y_{s,t}) = D_X(Ø, Y_{s,t}). Then d(Y_{s,t}) = min{a_{s,t}, b_{s,t}}, where a_{s,t} covers Case 1 for Y_{s,t} and b_{s,t} covers Case 2 for Y_{s,t}.

Computing a_{s,t}
Assume, by induction, that the “internal substrings” of Y_{s,t}, the parts left over after removing the chosen element, are already computed.
The element is a substring of X, and the options are its placements in Y_{s,t}. Example: X_{s,t} = abcd, Y_{s,t} = abcbccabcd.
There is a possibly exponential number of “placements”; a_{s,t} is computed with a second recurrence in O(n^2) time.

Computing b_{s,t}
Assume, by induction, that the values for shorter substrings of Y_{s,t} are already computed.
b_{s,t} is then computed in O(n) time.

Running Time
n = |Y|.
For a substring Y_{s,t}: computing a_{s,t} takes O(n^2) time and computing b_{s,t} takes O(n) time.
There are O(n^2) substrings of Y in total.
Total running time: O(n^4).
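The total follows by multiplying the number of substrings by the per-substring cost. Written out, with the two per-substring costs labelled as on the slide:

```latex
\underbrace{O(n^2)}_{\text{substrings } Y_{s,t}}
\cdot
\Bigl(\underbrace{O(n^2)}_{a_{s,t}} + \underbrace{O(n)}_{b_{s,t}}\Bigr)
= O(n^2) \cdot O(n^2)
= O(n^4)
```

The O(n) term for b_{s,t} is dominated by the O(n^2) term for a_{s,t}, so the second recurrence is what determines the overall bound.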

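Duplication Transposition vs. Duplication
Duplication transposition: “paste” into the same string. G ∘ δ_{s,t,p}(G) copies the segment at positions s..t of G and inserts the copy between positions p-1 and p of G, with s < t < p.
Duplication: “paste” into another string. The segment copied from positions s..t of one string is inserted between positions p-1 and p of a different string.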

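Duplication can be more complicated…
When s < p < t, the paste position p lies inside the copied segment s..t of G, so in G ∘ δ_{s,t,p}(G) the copy is inserted between positions p-1 and p and splits the original segment.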

Duplication Transposition Distance in Semi-Ambiguous Genomes
[El-Mabrouk, JCSS02] incorrectly computes the duplication transposition distance.
The implication in the paper is that, given X non-ambiguous and Y semi-ambiguous, DT(X,Y) = the number of maximal repeated segments of Y.
Counterexample: X = abcdefg, Y = abdecdbcefg.
Y0 = abcdefg
Y1 = abcdbcefg
Y2 = abdcdbcefg
Y3 = abdecdbcefg
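For strings this small, the duplication transposition distance can be checked by exhaustive search. The sketch below is a brute-force breadth-first search, not the O(n^4) algorithm from the talk, and there is no known efficient algorithm for this distance. The names `dt_neighbors` and `dt_distance` are hypothetical, and the search assumes a copy may be pasted at any position of the same string. Intermediate strings are pruned when they are not subsequences of Y, which is safe because every operation only inserts characters.

```python
from collections import deque


def is_subsequence(s: str, t: str) -> bool:
    remaining = iter(t)
    return all(ch in remaining for ch in s)


def dt_neighbors(g: str) -> set:
    """All strings reachable from g by one duplication transposition:
    copy a non-empty substring of g and paste it somewhere in g."""
    n = len(g)
    result = set()
    for s in range(n):
        for t in range(s + 1, n + 1):
            segment = g[s:t]
            for p in range(n + 1):
                result.add(g[:p] + segment + g[p:])
    return result


def dt_distance(x: str, y: str):
    """Brute-force BFS from x to y; intended for tiny inputs only."""
    if x == y:
        return 0
    seen = {x}
    queue = deque([(x, 0)])
    while queue:
        g, dist = queue.popleft()
        for h in dt_neighbors(g):
            if h == y:
                return dist + 1
            # Keep only strings that can still grow into y.
            if h not in seen and len(h) < len(y) and is_subsequence(h, y):
                seen.add(h)
                queue.append((h, dist + 1))
    return None  # y is not reachable from x


if __name__ == "__main__":
    steps = ["abcdefg", "abcdbcefg", "abdcdbcefg", "abdecdbcefg"]
    # Each Y_{i+1} on the slide is one operation away from Y_i.
    print(all(b in dt_neighbors(a) for a, b in zip(steps, steps[1:])))
    print(dt_distance("abcdefg", "abdecdbcefg"))
```

The first check confirms that the sequence Y0, Y1, Y2, Y3 on the slide is a valid series of operations; the second searches for anything shorter.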

A Lower Bound for Duplication Transposition Distance
Lemma: if Y has at most 2 copies of every character, X is non-ambiguous, and X is a subsequence of Y, then D_X(X,Y) ≤ DT(X,Y).
There is still no known algorithm for duplication transposition distance.

Conclusions
Duplication distance is a simple model for genome rearrangement and can be computed efficiently.
In a special case, it provides a lower bound on duplication transposition distance.
Thank you! Questions?

New Model for Cancer Mutation: Amplisomes
It can be shown that the minimum amplisome distance can be reframed as min [D_G(A,Ø) + D_A(T,A)], where the min is taken over all possible choices of A.
Duplication distance is a subproblem.

Tumor Amplisomes (Maurer, et al. 1987; Wahl, 1989…)
Other terms: episome, amplicon, double-minute.

D_X(X,Y) ≤ DT(X,Y) when Y is semi-ambiguous
Why is semi-ambiguity necessary? It ensures that all copied substrings are substrings of the original X (not of some intermediate string), so for every DT operation there exists a duplicate operation that produces the same result.
Example: X = A, Y = AAAAAAAA. DT(X,Y) = 3, D_X(X,Y) = 7.
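The gap in the example comes from where the copied material is taken. A duplication transposition copies from the current string, so the length can at most double per step. A duplicate operation copies from the fixed source X = A, so each step adds one character. The helper names below are hypothetical, and the sketch covers only this single-letter case.

```python
def dt_steps_unary(target_len: int) -> int:
    """Fewest duplication transpositions to grow "A" into target_len A's.

    Each operation copies a substring of the current string, so the
    length can at most double; doubling greedily is therefore optimal.
    """
    length, steps = 1, 0
    while length < target_len:
        length = min(2 * length, target_len)
        steps += 1
    return steps


def dx_steps_unary(target_len: int) -> int:
    """Fewest duplicate operations from X = "A", starting at Z = X.

    Every copied substring of X has length 1, so each operation
    adds exactly one character.
    """
    return target_len - 1


if __name__ == "__main__":
    n = len("AAAAAAAA")
    print(dt_steps_unary(n), dx_steps_unary(n))
```

Because Y here contains eight copies of the same character, it is not semi-ambiguous, and the inequality D_X(X,Y) ≤ DT(X,Y) fails.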