Published by Clifton Dawson. Modified over 9 years ago.
1
Sorting by Cuts, Joins and Whole Chromosome Duplications
Ron Zeira and Ron Shamir Combinatorial Pattern Matching 2015
2
Genome rearrangements
3
Motivation I: evolution
Human genome project
4
Motivation II: cancer
[Figure: karyotype of the MCF-7 breast cancer cell line alongside a normal karyotype. Source: NCI, 2001]
5
Philadelphia chromosome
6
Definitions: gene
A gene is an oriented segment.
A gene has two extremities: head and tail. Positive: tail→head; negative: head→tail.
7
Definitions: chromosome
A chromosome is a series of consecutive genes. Two consecutive extremities form an adjacency. A telomere is an extremity that is not part of an adjacency. A circular chromosome has no telomeres; a linear chromosome has two.
8
Definitions: genome
A genome is a set of chromosomes. Equivalently, a genome is a set of adjacencies.
An ordinary genome has one copy of each gene; otherwise the genome is duplicated.
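The definitions on the last three slides can be made concrete with a small sketch. It assumes genes are encoded as signed integers (sign = orientation) and extremities as `(gene id, "h" or "t")` pairs. These encodings and the function names `ends`, `adjacencies` and `telomeres` are illustrative choices, not notation from the talk.

```python
from typing import FrozenSet, List, Set, Tuple

Extremity = Tuple[int, str]        # (gene id, "h" for head or "t" for tail)
Adjacency = FrozenSet[Extremity]   # unordered pair of consecutive extremities


def ends(gene: int) -> Tuple[Extremity, Extremity]:
    """Left and right extremities of a signed gene, read left to right."""
    g = abs(gene)
    # Positive genes read tail -> head; negative genes read head -> tail.
    return ((g, "t"), (g, "h")) if gene > 0 else ((g, "h"), (g, "t"))


def adjacencies(chromosome: List[int], circular: bool = False) -> Set[Adjacency]:
    """Adjacencies formed by consecutive genes of one chromosome."""
    pairs = list(zip(chromosome, chromosome[1:]))
    if circular and chromosome:
        pairs.append((chromosome[-1], chromosome[0]))  # close the circle
    return {frozenset({ends(a)[1], ends(b)[0]}) for a, b in pairs}


def telomeres(chromosome: List[int], circular: bool = False) -> Set[Extremity]:
    """Extremities of the chromosome that are not part of any adjacency."""
    every = {e for gene in chromosome for e in ends(gene)}
    used = {e for adj in adjacencies(chromosome, circular) for e in adj}
    return every - used


linear = [1, -2, 3]
print(adjacencies(linear))               # two adjacencies
print(telomeres(linear))                 # two telomeres for a linear chromosome
print(telomeres(linear, circular=True))  # no telomeres for a circular chromosome
```

Under this encoding a genome is the union of the adjacency sets of its chromosomes, which matches the "set of adjacencies" view used in the rest of the talk.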
9
GR distance problem
Distance d_op(Π, Σ): the minimal number of operations that transform genome Π into genome Σ.
Operations: reversals, translocations, transpositions, others…
10
The SCJ model
SCJ: Single Cut or Join (Feijão, Meidanis 11):
Cut: split an adjacency into 2 telomeres. Join: merge 2 telomeres into an adjacency.
Simple and practical model. Reflects evolutionary distance (Biller et al. 13).
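A minimal sketch of the two SCJ operations on a genome stored as a set of adjacencies. The encoding (extremities as `(gene id, "h"/"t")` pairs) and the function names are assumptions made for illustration. `scj_distance` uses the symmetric-difference formula from the SCJ paper cited on this slide, which applies to two ordinary genomes over the same gene set.

```python
from typing import FrozenSet, Tuple

Extremity = Tuple[int, str]          # (gene id, "h" or "t")
Adjacency = FrozenSet[Extremity]
Genome = FrozenSet[Adjacency]        # a genome viewed as a set of adjacencies


def cut(genome: Genome, adj: Adjacency) -> Genome:
    """Cut an adjacency, leaving its two extremities as telomeres."""
    if adj not in genome:
        raise ValueError("adjacency not present in genome")
    return genome - {adj}


def join(genome: Genome, x: Extremity, y: Extremity) -> Genome:
    """Join two telomeres into a new adjacency."""
    used = {e for adj in genome for e in adj}
    # Assumption: x and y belong to genes of this genome; only telomere status is checked.
    if x == y or x in used or y in used:
        raise ValueError("both extremities must be distinct telomeres")
    return genome | {frozenset({x, y})}


def scj_distance(a: Genome, b: Genome) -> int:
    """Cuts for adjacencies only in a, plus joins for adjacencies only in b."""
    return len(a - b) + len(b - a)


pi = frozenset({frozenset({(1, "h"), (2, "t")})})
sigma = frozenset({frozenset({(1, "h"), (2, "h")})})
step = cut(pi, frozenset({(1, "h"), (2, "t")}))
step = join(step, (1, "h"), (2, "h"))
print(step == sigma, scj_distance(pi, sigma))
```

Here one cut followed by one join turns `pi` into `sigma`, which agrees with the distance of 2 returned by `scj_distance`.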
11
Example
12
Models with multiple gene copies
Most models with multiple gene copies are NP-hard. Not many models allow duplications or deletions. Many normal and cancer genomes have multiple gene copies.
13
The SCJD model
A duplication takes a linear chromosome and produces an additional copy of it.
An SCJD operation is a cut, a join or a duplication.
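The duplication operation can be sketched as follows, assuming a genome is stored as a list of linear chromosomes and each chromosome as a tuple of signed gene ids. The representation and the helper names `duplicate`, `copy_numbers` and `is_ordinary` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import List, Tuple

Chromosome = Tuple[int, ...]   # a linear chromosome as signed gene ids


def duplicate(genome: List[Chromosome], index: int) -> List[Chromosome]:
    """Return a new genome with one extra copy of the chromosome at `index`."""
    return genome + [genome[index]]


def copy_numbers(genome: List[Chromosome]) -> Counter:
    """Number of copies of each gene, ignoring orientation."""
    return Counter(abs(g) for chrom in genome for g in chrom)


def is_ordinary(genome: List[Chromosome]) -> bool:
    """True if every gene appears exactly once."""
    return all(n == 1 for n in copy_numbers(genome).values())


genome = [(1, -2), (3,)]
after = duplicate(genome, 0)
print(after)
print(copy_numbers(after))
print(is_ordinary(genome), is_ordinary(after))
```

A whole-chromosome duplication raises the copy number of every gene on that chromosome at once, which is what separates it from a cut or a join acting on a single adjacency.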
14
The SCJD distance
The minimal number of SCJD operations that transform an ordinary genome into a duplicated genome.
15
Results outline
Characterize the structure of an optimal solution.
Give a distance optimization function.
Solve the optimization problem.
Study the number of duplications in an optimal scenario.
16
SCJD optimal scenario structure
Theorem: There exists an optimal SCJD sorting scenario consisting, in this order, of:
1. SCJ operations on single-copy genes.
2. Duplications.
3. SCJ operations acting on duplicated genes.
[Figure: SCJs, then duplications, then SCJs]
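The three-phase order in the theorem can be expressed as a simple check on a scenario. This is a sketch only: it assumes each operation has already been labeled with one of the hypothetical tags `"scj_single"`, `"dup"` or `"scj_dup"`, and it tests the ordering, not optimality or validity of the scenario.

```python
from typing import List

# Hypothetical labels for the three phases named in the theorem.
PHASE = {"scj_single": 0, "dup": 1, "scj_dup": 2}


def is_canonical(scenario: List[str]) -> bool:
    """True if all single-copy SCJs precede all duplications,
    and all duplications precede all SCJs on duplicated genes."""
    phases = [PHASE[op] for op in scenario]
    return all(a <= b for a, b in zip(phases, phases[1:]))


print(is_canonical(["scj_single", "scj_single", "dup", "scj_dup"]))
print(is_canonical(["dup", "scj_single", "scj_dup"]))
```

The first scenario respects the order and the second does not, because an SCJ on single-copy genes follows a duplication.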
17
Proof outline
An SCJ operation acts on the extremities of either two duplicated genes or two unduplicated genes.
Moving the SCJ operations on unduplicated genes earlier in the scenario keeps it a valid sorting scenario.
Then move the duplications earlier for as long as the scenario remains valid.
18
Corollary: SCJD distance
Write the distance as a function of Γ’. Find Γ’ that minimizes the distance. η – higher score for adj. in Γ and Δ
20
Distance optimization solution
The following genome maximizes H:
If Γ is not linear, remove an adjacency with η=1 from each circular chromosome in Γ’ to obtain Γ’’.
Theorem: SCJD distance is computable in linear time.
21
Circular genome problem
22
Circular genome example revised
24
Controlling the number of duplications
Duplications are more “radical” events than cut or join. Lemma: Our algorithm gives an optimal sorting scenario with a maximum number of duplications.
25
Optimal solutions can have different numbers of duplications
26
Minimizing duplications is hard
Theorem: Finding an optimal SCJD sorting scenario with a minimum number of duplications is NP-hard.
Reduction from the Hamiltonian path problem on directed graphs with in/out degree 2.
27
Proof outline
For a 2-digraph G and two vertices x, y, there is an Eulerian path P: x→y.
Create a duplicated genome Σ from P and an empty genome Π.
Add auxiliary genes and k copies of Σ, Π.
There is a Hamiltonian path x→y in G iff there is an optimal sorting scenario with k duplications.
28
Summary
Genome rearrangements are important.
Problems with multiple gene copies are hard.
SCJD allows SCJ operations and duplications:
Linear-time algorithm for the SCJD distance.
Study of the number of duplications in an optimal solution.
We hope to generalize the model and apply it to cancer data.
29
Thank You!