1
Genome Rearrangement and Duplication Distance
Crystal L. Kahn 9/18/08
2
Genome Rearrangement
Over the course of evolution, genomes undergo large structural changes: chromosomal fissions, fusions, inversions, and transpositions.
Genome rearrangement is an area of computational biology that uses parsimony* methods to compute “distances” between pairs of genomes.
These distances characterize the similarity between genomes by quantifying the number of operations required to transform one into the other.
Point mutations (SNPs) are not of interest here, so this is different from edit distance.
* Maximum likelihood methods can also be used.
3
Genome Rearrangements
Humans and mice have similar genomes, but their genes are ordered differently.
~245 rearrangements
~300 large synteny blocks
4
History of Chromosome X
Rat Consortium, Nature, 2004
Rearrangement events: reversals, fusions, fissions, translocations
5
Genome Rearrangement Models
Types of rearrangement operations that have been considered:
Reversal (inversion) [HP, STOC95], [Bader et al., WADS01]
Translocation [Hannenhalli, DAM95]
Duplication transposition [El-Mabrouk, JCSS02]
Ultimate goal: a generic genome rearrangement model that allows any type of rearrangement
Duplications are common in cancer
[Figure: genomes G1 and G2]
6
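Duplication Distance: D_X(Z, Y)
Input: strings X, Y, Z (X non-ambiguous)
Def: a duplication operation Z ∘_{s,t,p}(X) copies the substring of X from position s to position t and pastes it into Z at position p
Problem: compute D_X(Z, Y) = the minimum number of duplication operations needed to transform Z into Y
Theorem: there is an O(n^4) algorithm, where n = |Y|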
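A minimal sketch of a single duplication operation may help make the definition concrete. The function name `duplicate` and the index convention (1-based, inclusive s..t, pasting before position p of Z) are assumptions made for this illustration, not notation confirmed by the slides.

```python
def duplicate(x: str, z: str, s: int, t: int, p: int) -> str:
    """Copy x[s..t] (1-based, inclusive) and paste it into z before position p.

    p ranges over 1..len(z)+1, where p = len(z)+1 appends at the end.
    This index convention is an assumption of the sketch.
    """
    if not (1 <= s <= t <= len(x)):
        raise ValueError("need 1 <= s <= t <= len(x)")
    if not (1 <= p <= len(z) + 1):
        raise ValueError("need 1 <= p <= len(z) + 1")
    segment = x[s - 1:t]                  # the copied substring of X
    return z[:p - 1] + segment + z[p - 1:]


x = "abcdefg"
z = duplicate(x, "", 1, 4, 1)   # paste "abcd" into the empty string
z = duplicate(x, z, 2, 3, 5)    # paste "bc" at the end of "abcd"
print(z)
```

Note that X itself is never modified: every operation copies from the same source string, which is what distinguishes this model from duplication transposition.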
7
Definitions
String: sequence of characters
Substring: contiguous sequence of characters
Subsequence: sequence of characters, not necessarily contiguous
Note: a substring is a subsequence, but not necessarily vice versa
Example: for T = abcdefg, bcd is a substring and ace is a subsequence
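The two definitions can be checked mechanically. The helper names below are hypothetical and chosen only for this sketch.

```python
def is_substring(s: str, t: str) -> bool:
    """True if s occurs in t as a contiguous block."""
    return s in t


def is_subsequence(s: str, t: str) -> bool:
    """True if the characters of s occur in t in order, not necessarily contiguously."""
    remaining = iter(t)
    # `ch in remaining` consumes the iterator up to the first match,
    # so later characters of s must be found further right in t.
    return all(ch in remaining for ch in s)


T = "abcdefg"
print(is_substring("bcd", T), is_subsequence("bcd", T))  # substring, hence also a subsequence
print(is_substring("ace", T), is_subsequence("ace", T))  # subsequence only
```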
8
Key Insight
W.L.O.G., let Z = Ø
[Figure: X = a b c d e f g h i j k l m n o p q r s, with two example strings Y, a b c d j k c d e f l o p q and a b c d c d j e k f l o p q, illustrating “overlapping” subsequences]
Observation: overlapping subsequences interfere with each other
Lemma: a set of subsequences of Y that are substrings of X and that cover all the characters of Y can be converted into a sequence of duplicate operations iff they are mutually non-overlapping (a “feasible set”)
9
Finding min-cardinality feasible set for Y_{s,t}
Consider the element of the feasible set that includes index s.
2 cases: it includes index t, or it does not include index t
[Figure: Y with the substring Y_{s,t} marked, once for each case]
10
Let d(Y_{s,t}) = D_X(Ø, Y_{s,t}), where Case 1 and Case 2 each give a candidate value for Y_{s,t}
11
Assume, by induction, already computed
[Figure: Y_{s,t} with the labels “Substring of X” and “Assume, by induction, already computed”]
“Internal substrings” of placements of X_{s,t} in Y_{s,t}
Example: X_{s,t} = abcd, Y_{s,t} = abcbccabcd
There is possibly an exponential number of “placements”
a_{s,t} is computed with a second recurrence in O(n^2) time
12
Assume, by induction, already computed
[Figure: Y_{s,t} with the label “Assume, by induction, already computed”]
b_{s,t} is computed in O(n) time
13
Running Time
n = |Y|
For a substring Y_{s,t}:
Computing a_{s,t} takes O(n^2) time
Computing b_{s,t} takes O(n) time
Total of O(n^2) substrings of Y
Total running time: O(n^4)
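The total follows by multiplying the number of substrings by the per-substring cost. Written out as a short worked bound, using only the quantities stated on the slide:

```latex
\[
\underbrace{O(n^2)}_{\text{substrings } Y_{s,t}}
\times
\Bigl(\underbrace{O(n^2)}_{a_{s,t}} + \underbrace{O(n)}_{b_{s,t}}\Bigr)
= O(n^2) \cdot O(n^2)
= O(n^4).
\]
```

The O(n) cost of b_{s,t} is dominated by the O(n^2) cost of a_{s,t}, so it does not affect the final bound.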
14
Duplication Transposition vs. Duplication
Duplication transposition: “paste” into the same string (s < t < p)
Duplication: “paste” into another string
[Figure: G ∘_{s,t,p} pastes a copy of positions s..t of G between positions p-1 and p]
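The difference from plain duplication is that the source and the target are the same string. A minimal sketch, assuming 1-based inclusive indices and pasting before position p (the function name is hypothetical):

```python
def duplication_transposition(g: str, s: int, t: int, p: int) -> str:
    """Copy g[s..t] (1-based, inclusive) and paste it back into g before position p.

    The index convention is an assumption of this sketch. Any p in
    1..len(g)+1 is accepted, including positions inside the copied segment.
    """
    if not (1 <= s <= t <= len(g)):
        raise ValueError("need 1 <= s <= t <= len(g)")
    if not (1 <= p <= len(g) + 1):
        raise ValueError("need 1 <= p <= len(g) + 1")
    segment = g[s - 1:t]                  # copied from the current string itself
    return g[:p - 1] + segment + g[p - 1:]


print(duplication_transposition("abcdefg", 2, 3, 5))  # s < t < p
print(duplication_transposition("abcdefg", 2, 5, 4))  # s < p < t
```

Because each step copies from the current string, later steps can copy material created by earlier ones, which is not possible when the source X is fixed.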
15
Duplication can be more complicated…
[Figure: G ∘_{s,t,p}(G) with s < p < t, where the paste position p lies inside the copied segment s..t]
16
Duplication Transposition Distance in Semi-Ambiguous Genomes
[El-Mabrouk, JCSS02] incorrectly computes duplication transposition distance
The implication in the paper is that, given X non-ambiguous and Y semi-ambiguous, DT(X,Y) = # maximal repeated segments of Y
Counterexample:
X = abcdefg
Y = abdecdbcefg
Y0 = abcdefg
Y1 = abcdbcefg
Y2 = abdcdbcefg
Y3 = abdecdbcefg
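The chain Y0, Y1, Y2, Y3 can be checked by brute force: each string should be reachable from the previous one by a single duplication transposition. The sketch below only verifies these individual steps. It does not compute the minimum distance, and the helper name is hypothetical.

```python
def reachable_in_one_step(g: str, h: str) -> bool:
    """True if h results from copying some substring of g and pasting it into g."""
    n = len(g)
    for start in range(n):
        for end in range(start + 1, n + 1):
            segment = g[start:end]
            if n + len(segment) != len(h):
                continue                  # wrong length, cannot match h
            for pos in range(n + 1):      # every possible paste position
                if g[:pos] + segment + g[pos:] == h:
                    return True
    return False


chain = ["abcdefg", "abcdbcefg", "abdcdbcefg", "abdecdbcefg"]
steps_ok = [reachable_in_one_step(a, b) for a, b in zip(chain, chain[1:])]
print(steps_ok)
```

The three steps copy "bc", then "d", then "e". The last two cannot be merged into one step from Y1, since "de" is not a contiguous substring of Y1.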
17
A Lower Bound for Duplication Transposition Distance
Lemma: If Y has at most 2 copies of every character, X is non-ambiguous, and X is a subsequence of Y, then D_X(X,Y) ≤ DT(X,Y)
There is still no known algorithm for duplication transposition distance
18
Conclusions
Duplication distance is a simple model for genome rearrangement and can be computed efficiently.
In a special case, it provides a lower bound on duplication transposition distance.
Thank you! Questions?
19
New Model for Cancer Mutation: Amplisomes
Can show that minimum amplisome distance can be reframed as:
min [D_G(A,Ø) + D_A(T,A)], where the min is taken over all possible choices of A
Duplication distance is a subproblem
20
Tumor Amplisomes (Maurer et al., 1987; Wahl, 1989…)
Other terms: episome, amplicon, double-minute
21
D_X(X,Y) ≤ DT(X,Y) when Y is semi-ambiguous
Why is semi-ambiguity necessary?
Semi-ambiguity ensures that all copied substrings are substrings of the original X (not of some intermediate string), so for every DT operation there exists a duplicate operation that produces the same result.
Example: X = A, Y = AAAAAAAA, DT(X,Y) = 3, D_X(X,Y) = 7
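The gap in the example can be reproduced with a small sketch for strings over a single letter. The function names are hypothetical. The doubling strategy is one natural way to reach the target with duplication transpositions, while a duplication from the fixed source X = A can add only one character per operation.

```python
def doubling_steps_unary(target_len: int) -> int:
    """Duplication transpositions used when each step copies as much of the
    current one-letter string as still fits (start length 1)."""
    length, steps = 1, 0
    while length < target_len:
        length += min(length, target_len - length)  # copy at most the whole string
        steps += 1
    return steps


def duplication_steps_unary(target_len: int) -> int:
    """Duplications needed when the source is the single character X = 'A':
    every operation pastes exactly one character."""
    return target_len - 1


n = len("AAAAAAAA")
print(doubling_steps_unary(n), duplication_steps_unary(n))
```

For a target of length 8 the doubling sequence is A, AA, AAAA, AAAAAAAA, which matches the values 3 and 7 given in the example.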