Genome Rearrangements


1 Genome Rearrangements

2 Basic Biology: DNA
Genetic information is stored in deoxyribonucleic acid (DNA) molecules. A single DNA molecule is a sequence of nucleotides: adenine (A), cytosine (C), guanine (G), and thymine (T).
[Figure: a nucleotide (phosphate, pentose sugar, nitrogenous base) and a DNA molecule]

3 Basic Biology: DNA
Paired DNA strands are in reverse complementary orientation: one runs in the forward (5’ to 3’) direction, the other in the reverse (3’ to 5’) direction. Both strands are complementary: A pairs with T, and G pairs with C.
[Figure: forward strand 5’ to 3’ paired with reverse strand 3’ to 5’]
Image modified with the permission of the National Human Genome Research Institute (NHGRI), artist Darryl Leja.
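The pairing rule can be illustrated with a minimal Python sketch (not part of the original slides). It assumes a strand is an uppercase string over A, C, G, T; the helper names `complement` and `reverse_complement` are hypothetical. `complement` writes the partner strand base by base under the input (so it reads 3’ to 5’), and `reverse_complement` gives the same partner strand read in its own 5’ to 3’ direction.

```python
# Watson-Crick pairing: A-T and G-C
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Partner strand written under the input, i.e. read 3' to 5'."""
    return "".join(PAIR[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Partner strand read in its own 5' to 3' direction."""
    return complement(strand)[::-1]


forward = "ATGCCTGTACTA"            # 5' -> 3'
print(complement(forward))          # TACGGACATGAT  (3' -> 5')
print(reverse_complement(forward))  # TAGTACAGGCAT  (5' -> 3')
```

The forward strand here is the one used on the Signed Reversals slide; its complement matches the reverse strand shown there.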

4 Basic Biology: Genome
The genome is the entire hereditary information of an organism. Genomes are partitioned into chromosomes. A chromosome can be linear (eukaryotes) or circular (prokaryotes).
Image modified with the permission of the National Human Genome Research Institute (NHGRI), artist Darryl Leja.

5 The Human Karyogram
Karyotype of a human male. Courtesy: National Human Genome Research Institute

6 Changes in Genomic Sequences
Genomes of different species, and even of closely related individuals, differ from one another. These differences are caused by point mutations, in which a single nucleotide is changed, and by genome rearrangements, in which whole segments of DNA change their position or orientation.

7 Point Mutations
Insertion: …ATGGCG… → …ATGTGCG…
Deletion: …ATGTGCG… → …ATGGCG…
Substitution: …ATGTGCG… → …ATGCGCG…
DNA sequence alignment showing matches, mismatches, and insertions/deletions:
…ATG-GCATGTGCGATGTGCG…
…ATGTGCATG-GCGATGCGCG…

8 Genome Rearrangements
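Reversal: a chromosome segment is cut out and reinserted in reverse orientation.
Translocation: two chromosomes exchange segments.
Fission: one chromosome splits into two.
Fusion: two chromosomes join into one.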

9 Levenshtein’s Edit Distance
Let A and B be two sequences (genomes). The minimum number of edit operations that transforms A into B defines the edit distance, d_edit, between A and B. Possible edit operations: point mutations and genome rearrangements.

10 A Word Puzzle
To transform a start word into a target word, change, add, or delete characters until the target is reached.
Example: start “spices”, target “lice”:
spices → slices → slice → lice
spices → spice → slice → lice
How many steps do you need to transform a republican into a democrat? Google into Yahoo?

11 Edit Distance Using Point Mutations
S1 = AGCTT, S2 = AGCCTG, S3 = ACAG
AGCTT → AGCTG → AGCCTG (T→G, insert C): d_edit(S1, S2) = 2
AGCTT → AGCTG → AGCAG → ACAG (T→G, T→A, delete G): d_edit(S1, S3) = 3
AGCCTG → AGCTG → AGCAG → ACAG (delete C, T→A, delete G): d_edit(S2, S3) = 3
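When only point mutations are allowed, d_edit is the Levenshtein distance, and dynamic programming computes it without enumerating edit paths. The Python sketch below assumes every insertion, deletion, and substitution costs 1; the name `edit_distance` is illustrative and not taken from the slides.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of unit-cost insertions,
    deletions, and substitutions that turn a into b."""
    # prev[j] = distance between the first i-1 characters of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


s1, s2, s3 = "AGCTT", "AGCCTG", "ACAG"
print(edit_distance(s1, s2))  # 2
print(edit_distance(s1, s3))  # 3
print(edit_distance(s2, s3))  # 3
```

The same function returns 3 for the word puzzle spices → lice. Rearrangements of whole segments are not captured by this character-by-character recurrence.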

12 Edit Distance and Evolution
The edit distance is often used to infer evolutionary relationships. Parsimony assumption: the minimum number of changes reflects the true evolutionary distance.
[Figure: parsimonious phylogeny inferred from edit distances]


14 Rearrangements and Anagrams
An anagram is a rearrangement of a word or phrase into another word or phrase.
eleven plus two → twelve plus one
forty five → over fifty
Please visit the Internet Anagram web server at
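An anagram uses exactly the same multiset of letters, so checking one only requires counting letters. A small Python sketch, assuming case, spaces, and punctuation are ignored (`is_anagram` is an illustrative name):

```python
from collections import Counter


def is_anagram(x: str, y: str) -> bool:
    """True if x and y use the same letters the same number of times,
    ignoring case, spaces, and punctuation."""
    def letters(s: str) -> Counter:
        return Counter(c for c in s.lower() if c.isalpha())
    return letters(x) == letters(y)


print(is_anagram("eleven plus two", "twelve plus one"))  # True
print(is_anagram("forty five", "over fifty"))            # True
print(is_anagram("spendit", "stipend"))                  # True
```

As with two genomes that carry the same genes in a different order, both sides contain the same pieces; only their order differs.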

15 Rearrangements and Anagrams
[Figure: dot plot of “spendit” vs. “stipend”]
[Figure: dot plot of the mouse genome vs. the human genome]
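A text dot plot shows the rearranged blocks directly. In this Python sketch (the function `dot_plot` is hypothetical, not a specific tool), row k marks with `*` every column where the k-th character of the second word equals a character of the first word. Runs along the main diagonal are blocks kept in the same order; runs along the anti-diagonal are reversed blocks.

```python
def dot_plot(x: str, y: str) -> list[str]:
    """One text row per character of y; '*' where it equals a character of x."""
    return ["".join("*" if cx == cy else "." for cx in x) for cy in y]


# columns: s p e n d i t   rows: s t i p e n d
for row in dot_plot("spendit", "stipend"):
    print(row)
# *......
# ......*
# .....*.
# .*.....
# ..*....
# ...*...
# ....*..
```

Here “pend” forms a diagonal run, while “it” versus “ti” forms a short anti-diagonal run.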

16 Genome Comparison: Human - Mouse
Humans and mice have similar genomes, but their genes are in a different order. How many edits (rearrangements) are needed to transform human into mouse? Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner

17 Transforming Mice into Humans
a) Mouse and human share a common ancestor.
b) They share the same genes, but in a different order.
c) A series of rearrangements transforms one genome into the other.

18 History of Chromosome X
Rat Consortium, Nature, 2004

19 Dobzhansky’s Experiment
Giant polytene chromosomes. Modified from T.S. Painter, J. Hered. 25:465–476, 1934.
Drosophila melanogaster life cycle taken from FlyMove.
Harvesting polytene chromosomes taken from BioPix4U.

20 Dobzhansky’s Experiment
Chromosome 3 of Drosophila pseudoobscura: the Standard and Arrowhead arrangements differ by an inversion spanning segments 70 to 76.
Figures taken from Dobzhansky T, Sturtevant AH. Genetics (1938), 23(1):28-64.

21 Dobzhansky’s Experiment
Configurations observed in various inversion heterozygotes Figures taken from Dobzhansky T, Sturtevant AH. Genetics (1938), 23(1):28-64.

22 Dobzhansky’s Experiment
Single and double inversions; phylogeny for the 3rd chromosome of D. pseudoobscura.
Figures taken from Dobzhansky T, Sturtevant AH. Genetics (1938), 23(1):28-64.

23 Unsigned Reversals
[Figure: chromosome with gene blocks 1 to 10]
Gene order: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner

24 Unsigned Reversals
[Figure: the same chromosome after blocks 4 to 8 are reversed]
Gene order: 1, 2, 3, 8, 7, 6, 5, 4, 9, 10
Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner
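The reversal shown here can be written as an operation ρ(i, j) on a gene order: the blocks at positions i through j (1-based, inclusive) are put in reverse order. A minimal Python sketch; the name `reversal` and the list representation are assumptions for illustration:

```python
def reversal(pi: list[int], i: int, j: int) -> list[int]:
    """Unsigned reversal rho(i, j): reverse positions i..j (1-based, inclusive)."""
    if not 1 <= i <= j <= len(pi):
        raise ValueError("need 1 <= i <= j <= len(pi)")
    return pi[:i - 1] + pi[i - 1:j][::-1] + pi[j:]


identity = list(range(1, 11))
print(reversal(identity, 4, 8))  # [1, 2, 3, 8, 7, 6, 5, 4, 9, 10]
```

Applying the same reversal twice restores the original order, so every reversal is its own inverse.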

25 Unsigned Reversals and Gene Orders
π1 = ρ(1,2), π2 = ρ(2,5), π3 =

26 Reversal Edit Distance
Goal: Given two permutations, find the shortest series of reversals that transforms one into the other.
Input: permutations π and σ.
Output: a series of reversals ρ1, …, ρt transforming π into σ such that t is minimal.
The reversal distance between π and σ, d_rev(π, σ), is the smallest possible value of t, given π and σ.
Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner

27 Sorting by Reversals Problem
Goal: Given a permutation, find a shortest series of reversals that transforms it into the identity permutation (1 2 … n).
Input: permutation π.
Output: a series of reversals ρ1, …, ρt transforming π into the identity permutation such that t is minimal.
The Reversal Distance Problem and the Sorting by Reversals Problem are equivalent. Why?
Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner
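One way to see the equivalence: rename every gene by its position in σ. Then σ becomes the identity, and because a reversal acts on positions rather than on gene names, any series of reversals that sorts the renamed π also transforms π into σ. A Python sketch of this relabelling; the helper names `relabel` and `reversal` are illustrative:

```python
def reversal(pi, i, j):
    """Unsigned reversal rho(i, j) on 1-based positions i..j."""
    return pi[:i - 1] + pi[i - 1:j][::-1] + pi[j:]


def relabel(pi, sigma):
    """Rename each gene of pi by its 1-based position in sigma."""
    position = {gene: k for k, gene in enumerate(sigma, start=1)}
    return [position[gene] for gene in pi]


pi, sigma = [3, 1, 2], [2, 1, 3]
print(relabel(pi, sigma))                  # [3, 2, 1]
print(reversal(relabel(pi, sigma), 1, 3))  # [1, 2, 3]: the renamed pi is sorted
print(reversal(pi, 1, 3))                  # [2, 1, 3]: the same reversal turns pi into sigma
```

Relabelling commutes with every reversal, which is why d_rev(π, σ) equals the number of reversals needed to sort the relabelled π.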

28 Algorithm 1: GreedyReversalSort(π)
for i ← 1 to n − 1
    j ← position of element i in π (i.e., π[j] = i)
    if j ≠ i
        π ← π • ρ(i, j)
        output π
    if π is the identity permutation
        return
Taken from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner
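A direct Python transcription of GreedyReversalSort, offered as a sketch: the permutation is a list of 1..n, positions are 1-based to match the pseudocode, and the function (a hypothetical name) returns each reversal ρ(i, j) together with the permutation it produces. The input 6 1 2 3 4 5 is an assumption of this example, chosen because it reproduces the five steps listed on the next slide.

```python
def greedy_reversal_sort(pi):
    """Place 1, 2, ..., n-1 in turn, each with at most one reversal rho(i, j).
    Returns a list of (i, j, permutation after the reversal)."""
    pi = list(pi)
    n = len(pi)
    identity = list(range(1, n + 1))
    steps = []
    for i in range(1, n):                    # i = 1 .. n-1
        j = pi.index(i) + 1                  # 1-based position of element i
        if j != i:
            pi[i - 1:j] = pi[i - 1:j][::-1]  # pi <- pi . rho(i, j)
            steps.append((i, j, list(pi)))   # "output pi"
        if pi == identity:
            break                            # "return"
    return steps


for i, j, after in greedy_reversal_sort([6, 1, 2, 3, 4, 5]):
    print(f"rho({i},{j}) -> {after}")
```

For this input the sketch performs ρ(1,2), ρ(2,3), ρ(3,4), ρ(4,5), ρ(5,6), one reversal per misplaced element.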

29 GreedyReversalSort is Not Optimal
For the example permutation π, the algorithm needs 5 steps:
Step 1: i = 1, j = 2, ρ(1,2)
Step 2: i = 2, j = 3, ρ(2,3)
Step 3: i = 3, j = 4, ρ(3,4)
Step 4: i = 4, j = 5, ρ(4,5)
Step 5: i = 5, j = 6, ρ(5,6)
However, two reversals are enough.
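For small permutations the true reversal distance can be checked by brute force: breadth-first search over all permutations reachable by reversals finds a shortest series. This is only a verification aid (the state space grows as n!), not an algorithm from the slides; the input 6 1 2 3 4 5 is an assumption chosen to match the step pattern above, and `reversal_distance` is a hypothetical name.

```python
from collections import deque


def reversal_distance(pi):
    """Exact unsigned reversal distance to the identity by breadth-first search.
    Returns (distance, list of reversals (i, j) with 1-based positions)."""
    start, target = tuple(pi), tuple(sorted(pi))
    parent = {start: None}          # state -> (previous state, reversal used)
    queue = deque([start])
    n = len(start)
    while queue:
        state = queue.popleft()
        if state == target:
            moves = []
            while parent[state] is not None:   # walk back to the start
                state, move = parent[state]
                moves.append(move)
            return len(moves), moves[::-1]
        for i in range(n):
            for j in range(i + 1, n):
                nxt = state[:i] + state[i:j + 1][::-1] + state[j + 1:]
                if nxt not in parent:
                    parent[nxt] = (state, (i + 1, j + 1))
                    queue.append(nxt)
    raise ValueError("input is not a permutation")


print(reversal_distance([6, 1, 2, 3, 4, 5]))  # (2, [(1, 6), (1, 5)])
```

Reversing the whole block 6 1 2 3 4 5 gives 5 4 3 2 1 6, and one more reversal of the first five positions sorts it.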

30 Adjacencies & Breakpoints
An adjacency is a pair of adjacent elements that are consecutive numbers (they differ by exactly 1).
A breakpoint is a pair of adjacent elements that are not consecutive.
b(π) is the number of breakpoints in π.
Example: π is extended with π0 = 0 and π7 = 7; it has b(π) = 4 breakpoints.
Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner
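Counting breakpoints is a single pass over the extended permutation. A Python sketch assuming the unsigned definition above (an adjacency is a neighbouring pair whose values differ by exactly 1); since the slide's example permutation is not reproduced in this transcript, the illustration uses 1 9 3 4 7 8 2 6 5, an input of our own choosing.

```python
def breakpoints(pi):
    """b(pi): neighbouring pairs in 0, pi_1, ..., pi_n, n+1
    whose values do not differ by exactly 1."""
    ext = [0, *pi, len(pi) + 1]
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


print(breakpoints([1, 9, 3, 4, 7, 8, 2, 6, 5]))  # 6
print(breakpoints([1, 2, 3, 4, 5]))              # 0
```

The identity permutation is the only one with no breakpoints, so sorting means driving b(π) to 0.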

31 Reversal Distance and Breakpoints
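One reversal eliminates at most 2 breakpoints: a reversal ρ(i, j) changes only the two neighboring pairs at its ends, (πi−1, πi) and (πj, πj+1); pairs inside the reversed block keep the same difference.
Example: b(π) = 5, b(π1) = 4, b(π2) = 2, b(π3) = 0.
This implies: reversal distance d_rev(π) ≥ b(π) / 2.
Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner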
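The “at most 2” claim can be confirmed exhaustively for small n. This Python sketch is an illustrative check, not a proof: it applies every reversal to every permutation of 1..n and records the largest change in b(π). The function names are hypothetical.

```python
from itertools import permutations


def breakpoints(pi):
    ext = [0, *pi, len(pi) + 1]
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


def reversal(pi, i, j):
    """Unsigned reversal rho(i, j) on 1-based positions i..j."""
    return pi[:i - 1] + pi[i - 1:j][::-1] + pi[j:]


def max_breakpoint_change(n):
    """Largest |b(pi) - b(pi . rho)| over all pi of size n and all rho(i, j)."""
    worst = 0
    for perm in permutations(range(1, n + 1)):
        pi = list(perm)
        before = breakpoints(pi)
        for i in range(1, n + 1):
            for j in range(i + 1, n + 1):
                worst = max(worst, abs(before - breakpoints(reversal(pi, i, j))))
    return worst


print(max_breakpoint_change(6))  # 2
```

Since the identity has no breakpoints and each reversal removes at most two, at least b(π)/2 reversals are required.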

32 Strips
An interval between two consecutive breakpoints in a permutation is called a strip. A strip is increasing if its elements increase; otherwise, the strip is decreasing. A single-element strip is considered decreasing, with the exception of the strips [0] and [n+1].
Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner
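A Python sketch of the strip decomposition, following the rules above (the function name `strips` and the example permutation 1 9 3 4 7 8 2 6 5 are assumptions for illustration). It splits the extended permutation at every breakpoint and labels each strip.

```python
def strips(pi):
    """Split 0, pi_1, ..., pi_n, n+1 into strips and label each one."""
    ext = [0, *pi, len(pi) + 1]
    runs, current = [], [ext[0]]
    for a, b in zip(ext, ext[1:]):
        if abs(a - b) == 1:      # adjacency: stay in the same strip
            current.append(b)
        else:                    # breakpoint: start a new strip
            runs.append(current)
            current = [b]
    runs.append(current)
    labelled = []
    for run in runs:
        if len(run) > 1:
            increasing = run[1] > run[0]
        else:  # single elements are decreasing, except [0] and [n+1]
            increasing = run[0] in (0, len(pi) + 1)
        labelled.append((run, "increasing" if increasing else "decreasing"))
    return labelled


for run, kind in strips([1, 9, 3, 4, 7, 8, 2, 6, 5]):
    print(run, kind)
```

For this input the strips are [0, 1], [3, 4], [7, 8], [10] (increasing) and [9], [2], [6, 5] (decreasing).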

33 Strips and Breakpoints
Observation 1: If a permutation contains a decreasing strip, then there exists a reversal that decreases the number of breakpoints.
Observation 2: Otherwise, create a decreasing strip by reversing an increasing strip; this does not increase the number of breakpoints, and the number of breakpoints can be reduced in the next step.
[Figure examples: ρ(3,8) and ρ(6,8)]
Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner

34 Algorithm2: BreakpointReversalSort(π)
while b(π) > 0
    if π has a decreasing strip
        choose a reversal ρ that minimizes b(π • ρ)
    else
        choose a reversal ρ that flips an increasing strip in π
    π ← π • ρ
    output π
return
Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner
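A Python sketch of BreakpointReversalSort under stated assumptions: it works on the extended list 0, π1, …, πn, n+1; in the decreasing-strip case it simply tries every reversal and keeps the first one with the fewest resulting breakpoints; otherwise it flips the first increasing strip that contains neither 0 nor n+1. All function names are hypothetical.

```python
def breakpoints(ext):
    """Breakpoints of an already extended permutation 0, ..., n+1."""
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


def reverse(ext, i, j):
    """Reverse indices i..j of the extended list (0 and n+1 never move)."""
    return ext[:i] + ext[i:j + 1][::-1] + ext[j + 1:]


def strip_ranges(ext):
    """(start, end) index ranges of the maximal runs without a breakpoint."""
    ranges, start = [], 0
    for k in range(len(ext) - 1):
        if abs(ext[k] - ext[k + 1]) != 1:
            ranges.append((start, k))
            start = k + 1
    ranges.append((start, len(ext) - 1))
    return ranges


def is_increasing(ext, start, end):
    if start == end:  # single element: increasing only for 0 and n+1
        return start == 0 or start == len(ext) - 1
    return ext[start + 1] > ext[start]


def breakpoint_reversal_sort(pi):
    """Return the reversals (i, j), 1-based positions in pi, that sort pi."""
    n = len(pi)
    ext = [0, *pi, n + 1]
    reversals = []
    while breakpoints(ext) > 0:
        ranges = strip_ranges(ext)
        if any(not is_increasing(ext, s, e) for s, e in ranges):
            # a decreasing strip exists: pick the reversal minimizing b
            i, j = min(((i, j) for i in range(1, n + 1) for j in range(i + 1, n + 1)),
                       key=lambda r: breakpoints(reverse(ext, *r)))
        else:
            # flip an increasing strip that contains neither 0 nor n+1
            i, j = next((s, e) for s, e in ranges if s > 0 and e < n + 1)
        ext = reverse(ext, i, j)
        reversals.append((i, j))
    return reversals


print(breakpoint_reversal_sort([3, 1, 2]))  # [(1, 2), (2, 3)]
```

Because every flip of an increasing strip is followed by a step that removes a breakpoint, the number of reversals returned never exceeds 2b(π).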

35 Performance Guarantee
BreakpointReversalSort (BRS) is an approximation algorithm that never uses more than four times the minimum number of reversals.
BRS eliminates at least one breakpoint every two steps: d_BRS ≤ 2b(π) steps.
An optimal algorithm eliminates at most two breakpoints every step: d_OPT ≥ b(π)/2 steps.
Performance guarantee: d_BRS / d_OPT ≤ 2b(π) / (b(π)/2) = 4.
Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner

36 Gene Orientation & Genome Representation
modified from

37 Genome Rearrangements

38 Signed Reversals
5’ ATGCCTGTACTA 3’
3’ TACGGACATGAT 5’
Break and invert:
5’ ATGTACAGGCTA 3’
3’ TACATGTCCGAT 5’
Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner
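At the DNA level, a reversal cuts out a segment and reinserts it on the opposite strand, so on the forward strand the segment is replaced by its reverse complement. A Python sketch, assuming the cut points lie after the 3rd and 9th bases (segment CCTGTA); this position is inferred from the two strands above, and `invert_segment` is a hypothetical name.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(PAIR[base] for base in reversed(seq))


def invert_segment(forward: str, start: int, end: int) -> str:
    """Replace forward[start:end] (0-based, end exclusive) by its reverse
    complement, so the forward strand still reads 5' -> 3'."""
    return forward[:start] + reverse_complement(forward[start:end]) + forward[end:]


before = "ATGCCTGTACTA"
after = invert_segment(before, 3, 9)         # segment CCTGTA -> TACAGG
print(after)                                 # ATGTACAGGCTA
print("".join(PAIR[b] for b in after))       # TACATGTCCGAT (reverse strand, 3' -> 5')
```

Both output strands match the “break and invert” strands on the slide, which is why a reversed gene is recorded with a flipped sign.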

39 Signed Reversals
[Figure: chromosome with gene blocks 1 to 10]
Gene order: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner

40 Signed Reversals
[Figure: blocks 4 to 8 reversed and moved to the opposite strand]
Gene order: 1, 2, 3, -8, -7, -6, -5, -4, 9, 10
Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner
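A signed reversal ρ(i, j) reverses the order of the blocks at positions i through j and flips the sign (strand) of each of them. A minimal Python sketch; the name `signed_reversal` and the signed-integer list representation are assumptions for illustration:

```python
def signed_reversal(pi, i, j):
    """Signed reversal rho(i, j): reverse positions i..j (1-based, inclusive)
    and flip the sign of every gene in the block."""
    if not 1 <= i <= j <= len(pi):
        raise ValueError("need 1 <= i <= j <= len(pi)")
    return pi[:i - 1] + [-g for g in reversed(pi[i - 1:j])] + pi[j:]


identity = list(range(1, 11))
print(signed_reversal(identity, 4, 8))  # [1, 2, 3, -8, -7, -6, -5, -4, 9, 10]
```

Unlike the unsigned case, a reversal of a single block is not a no-op here: ρ(i, i) flips that gene's orientation.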

41 Signed Reversals and Breakpoints
[Figure: blocks 4 to 8 reversed and moved to the opposite strand]
Gene order: 1, 2, 3, -8, -7, -6, -5, -4, 9, 10
The reversal introduced two breakpoints: between 3 and -8, and between -4 and 9.
Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner
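Counting signed breakpoints needs a sign-aware rule. This Python sketch assumes the usual signed convention: after extending with 0 and n+1, a neighbouring pair (a, b) is an adjacency only if b − a = 1, so (-8, -7) is an adjacency while (3, -8) is a breakpoint. The function name is hypothetical.

```python
def signed_breakpoints(pi):
    """Breakpoints of a signed permutation extended with 0 and n+1:
    a neighbouring pair (a, b) is an adjacency only if b - a == 1."""
    ext = [0, *pi, len(pi) + 1]
    return sum(1 for a, b in zip(ext, ext[1:]) if b - a != 1)


print(signed_breakpoints([1, 2, 3, -8, -7, -6, -5, -4, 9, 10]))  # 2
print(signed_breakpoints(list(range(1, 11))))                     # 0
```

Under this rule the block -8 … -4 stays internally intact, and only its two borders count, matching the two breakpoints on the slide.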

42 Summary: Complexity Results
Sorting by unsigned reversals: NP-hard; can be approximated within a constant factor.
Sorting by signed reversals: can be solved in polynomial time.

43 Web Tools
GRIMM Web Server: computes signed and unsigned reversal distances between permutations.
Cinteny: a web server for synteny identification and the analysis of genome rearrangements.

44 DCJ Genome Rearrangements
The DCJ model uses Double-Cut-and-Join genome rearrangement operations. DCJ operations break and rejoin one or two intergenic regions (possibly on different chromosomes).

45 Genome Representation
In the DCJ model, a genome is grouped into chromosomes (linear or circular). A gene g on the forward strand is represented by [-g, +g]; a gene g on the reverse strand is represented by [+g, -g]. Telomeres are represented by the special symbol ‘o’. An adjacency (intergenic region) is encoded as the unordered pair of neighboring gene or telomere ends.
Example: linear chromosome c1 = (o … o), circular chromosome c2 = (5 6 7).
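A Python sketch of this encoding, under stated assumptions: the telomere ‘o’ is encoded as 0, each adjacency is a `frozenset` of two ends, and the gene content of c1, which is not recoverable from the transcript, is taken to be the hypothetical (o 1 -2 3 4 o), chosen to be consistent with the DCJ steps in the later sorting example. The function name `adjacencies` is illustrative.

```python
TELOMERE = 0  # encodes the symbol 'o' (assumption of this sketch)


def adjacencies(chromosomes):
    """Adjacency set of a genome.

    chromosomes: list of (kind, genes), kind in {"linear", "circular"},
    genes a list of signed ints. A gene s has left end -s and right end +s,
    so gene g is [-g, +g] and its reversed copy -g is [+g, -g].
    """
    result = set()
    for kind, genes in chromosomes:
        # right end of genes[k] meets left end of genes[k + 1]
        pairs = [(genes[k], -genes[k + 1]) for k in range(len(genes) - 1)]
        if kind == "linear":
            pairs += [(TELOMERE, -genes[0]), (genes[-1], TELOMERE)]
        elif kind == "circular":
            pairs.append((genes[-1], -genes[0]))   # close the circle
        else:
            raise ValueError(f"unknown chromosome kind: {kind}")
        result.update(frozenset(p) for p in pairs)
    return result


genome_a = adjacencies([("linear", [1, -2, 3, 4]), ("circular", [5, 6, 7])])
print(sorted(tuple(sorted(adj)) for adj in genome_a))
```

For these chromosomes the sketch yields {o,-1}, {1,2}, {-2,-3}, {3,-4}, {4,o} for c1 and {5,-6}, {6,-7}, {7,-5} for c2.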

46 DCJ Operations
The double-cut-and-join operation “breaks” two adjacencies and rejoins the fragments: {a, b} {c, d} → {a, d} {c, b}, or {a, c} {b, d}. Here a, b, c, and d are different (signed) gene ends or telomeres (with ‘+o’ = ‘-o’). A special case occurs for c = d = o: {a, b} {o, o} ↔ {a, o} {b, o}.
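A single DCJ step can be sketched in Python as a set update: remove the two cut adjacencies and add the two rejoined ones. Assumptions of this sketch: the telomere ‘o’ is 0, a genome is a set of `frozenset` adjacencies, and {o, o} is the empty adjacency, never stored. The genome `A` is the hypothetical (o 1 -2 3 4 o)(5 6 7); the three calls replay the DCJ1 to DCJ3 steps listed in the DCJSORT example. The name `dcj` and the `rejoin` labels are illustrative.

```python
TELOMERE = 0  # the symbol 'o'; {o, o} is the empty adjacency and is never stored


def dcj(genome, u, v, rejoin):
    """Cut adjacencies u = (a, b) and v = (c, d) (v may be (o, o)) and rejoin:
    rejoin="ad_cb" gives {a, d} {c, b}; rejoin="ac_bd" gives {a, c} {b, d}."""
    (a, b), (c, d) = u, v
    result = set(genome)
    for x, y in (u, v):
        if (x, y) != (TELOMERE, TELOMERE):
            result.remove(frozenset((x, y)))  # KeyError if not in the genome
    if rejoin == "ad_cb":
        joins = [(a, d), (c, b)]
    elif rejoin == "ac_bd":
        joins = [(a, c), (b, d)]
    else:
        raise ValueError(f"unknown rejoin mode: {rejoin}")
    for x, y in joins:
        if (x, y) != (TELOMERE, TELOMERE):
            result.add(frozenset((x, y)))
    return result


A = {frozenset(p) for p in [(0, -1), (1, 2), (-2, -3), (3, -4), (4, 0), (5, -6), (6, -7), (7, -5)]}
step1 = dcj(A, (1, 2), (-2, -3), "ac_bd")     # {1,2}{-2,-3} -> {1,-2}{2,-3}
step2 = dcj(step1, (4, 0), (7, -5), "ad_cb")  # {4,o}{7,-5}  -> {4,-5}{7,o}
step3 = dcj(step2, (3, -4), (0, 0), "ad_cb")  # {3,-4}{o,o}  -> {3,o}{o,-4}
```

The first step is a signed reversal of gene 2, the second merges the circular chromosome into the linear one, and the third splits the linear chromosome between genes 3 and 4.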

47 Signed reversal of genes 2 and 3

48 Chromosome Linearization

49 Weird genme transformation

50 Using Graphs to Sort Genomes
The adjacency graph AG(A, B) = (V, E) is a bipartite graph. V contains one vertex for each adjacency of genome A and one for each adjacency of genome B. Each gene g defines two edges: e1 connects the adjacencies of A and B that contain +g, and e2 connects the adjacencies of A and B that contain -g.
Example: genome A: (o … o) (5 6 7), genome B: (o … o) (o … o)
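A Python sketch of the edge set of AG(A, B), assuming both genomes are sets of `frozenset` adjacencies over the same genes, with the telomere ‘o’ encoded as 0. Each gene end x (+g or -g) yields one edge from the A-vertex containing x to the B-vertex containing x. The example genomes are hypothetical reconstructions, A = (o 1 -2 3 4 o)(5 6 7) and B = (o 1 2 3 o)(o 4 5 6 7 o), since the gene content is not recoverable from the transcript.

```python
TELOMERE = 0  # the symbol 'o'


def adjacency_graph(genome_a, genome_b):
    """Edges (x, vertex of A containing x, vertex of B containing x),
    one per gene end x, so two edges per gene."""
    def vertex_of(genome):
        return {x: adj for adj in genome for x in adj if x != TELOMERE}

    in_a, in_b = vertex_of(genome_a), vertex_of(genome_b)
    return [(x, in_a[x], in_b[x]) for x in sorted(in_a, key=lambda x: (abs(x), x))]


A = {frozenset(p) for p in [(0, -1), (1, 2), (-2, -3), (3, -4), (4, 0), (5, -6), (6, -7), (7, -5)]}
B = {frozenset(p) for p in [(0, -1), (1, -2), (2, -3), (3, 0), (0, -4), (4, -5), (5, -6), (6, -7), (7, 0)]}
edges = adjacency_graph(A, B)
print(len(edges))  # 14: two edges for each of the 7 genes
```

Vertices of A and B are kept apart by their position in each tuple, so an adjacency shared by both genomes, such as {o,-1}, still gives two distinct vertices joined by an edge.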

51 Using Graphs to Sort Genomes
Algorithm 3: DCJSORT(A, B)
Generate the adjacency graph AG(A, B) of A and B
for each adjacency {p, q} in genome B with p, q ≠ o do
    let u = {p, l} be the vertex of A that contains p
    let v = {q, m} be the vertex of A that contains q
    if u ≠ v then
        replace vertices u and v in A by {p, q} and {l, m}
        update edge set
    end if
end for
for each telomere {p, o} in B do
    let u = {p, l} be the vertex of A that contains p
    if l ≠ o then
        replace vertex u in A by {p, o} and {o, l}
        update edge set
    end if
end for
Example: genome A: (o … o) (5 6 7), genome B: (o … o) (o … o)
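This Python sketch follows the two loops of DCJSORT but, as a simplification, looks up the vertex containing an end directly instead of maintaining the edge set of AG(A, B). Genomes are lists of adjacency pairs with the telomere ‘o’ encoded as 0. Because the gene content of the example is not recoverable from the slides, it uses the hypothetical genomes A = (o 1 -2 3 4 o)(5 6 7) and B = (o 1 2 3 o)(o 4 5 6 7 o), reconstructed to be consistent with DCJ1 to DCJ3. The name `dcj_sort` is illustrative.

```python
TELOMERE = 0  # the symbol 'o'


def dcj_sort(genome_a, genome_b):
    """Transform A into B; returns (operations, resulting genome), where each
    operation is (cut adjacencies, joined adjacencies)."""
    genome = {frozenset(p) for p in genome_a}
    operations = []

    def vertex_with(end):   # adjacency of the current A that contains `end`
        return next(adj for adj in genome if end in adj)

    def partner(adj, end):  # the other element of the pair `adj`
        return next(iter(adj - {end}))

    # Loop 1: create every adjacency {p, q} of B with p, q != o
    for p, q in genome_b:
        if TELOMERE in (p, q):
            continue
        u, v = vertex_with(p), vertex_with(q)
        if u != v:
            l, m = partner(u, p), partner(v, q)
            joined = [frozenset((p, q))]
            if (l, m) != (TELOMERE, TELOMERE):  # {o, o} is empty: drop it
                joined.append(frozenset((l, m)))
            genome -= {u, v}
            genome.update(joined)
            operations.append(([u, v], joined))

    # Loop 2: create every telomere {p, o} of B
    for p, q in genome_b:
        if (p == TELOMERE) == (q == TELOMERE):
            continue                            # not a telomere
        end = q if p == TELOMERE else p
        u = vertex_with(end)
        l = partner(u, end)
        if l != TELOMERE:
            joined = [frozenset((end, TELOMERE)), frozenset((TELOMERE, l))]
            genome.remove(u)
            genome.update(joined)
            operations.append(([u], joined))    # second cut is the empty {o, o}
    return operations, genome


A = [(0, -1), (1, 2), (-2, -3), (3, -4), (4, 0), (5, -6), (6, -7), (7, -5)]
B = [(0, -1), (1, -2), (2, -3), (3, 0), (0, -4), (4, -5), (5, -6), (6, -7), (7, 0)]
ops, result = dcj_sort(A, B)
print(len(ops))  # 3
```

With B listed in chromosome order, the three operations found are exactly DCJ1, DCJ2, and DCJ3 of the example.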

59 Using Graphs to Sort Genomes
Example: genome A: (o … o) (5 6 7), genome B: (o … o) (o … o)
DCJ1: {1,2} {-2,-3} → {1,-2} {2,-3}
DCJ2: {4,o} {7,-5} → {4,-5} {7,o}
DCJ3: {3,-4} {o,o} → {3,o} {o,-4}
A → DCJ1 → DCJ2 → DCJ3 → B

60 Summary: Complexity Results
Sorting by unsigned reversals: NP-hard; can be approximated within a constant factor.
Sorting by signed reversals: can be solved in polynomial time.
Sorting by DCJ rearrangements:

61 The End

62 Disclaimer Our presentation is in many parts inspired by the textbook An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner, by lectures from Anne Bergeron and Julia Mixtacki, as well as many review articles from multiple colleagues. As e

