Published by Melinda Watkins. Modified over 9 years ago.
1
1 Transforming Men into Mice: Are there Fragile Regions in Human Genome?
2
2 From Biology to Computing From Biology to Computing: …… Problem Formulation
3
3 Transforming men into mice (and into cats, cows, dogs, rats, chicken…) Pavel Pevzner University of California, San Diego (joint work with Glenn Tesler and Ben Raphael)
4
4 Transforming Men into Mice: Fragile versus Random Breakage Models of Chromosome Evolution Pavel Pevzner University of California, San Diego (joint work with Qian Peng and Glenn Tesler)
5
5 What are the similarity blocks and how to find them? What is the architecture of the ancestral genome? What is the evolutionary scenario for transforming one genome into the other? Approaches to answering these questions: Statistical: Nadeau-Taylor random breakage theory (1984) Combinatorial: Sankoff, et al. (early ‘90s) Unknown ancestor ~ 80 million years ago Mouse (X chrom.) Human (X chrom.) Genome rearrangements
6
6 What are the similarity blocks and how to find them? What is the architecture of the ancestral genome? What is the evolutionary scenario for transforming one genome into the other? Unknown ancestor ~ 80 million years ago Mouse (X chrom.) Human (X chrom.) Genome rearrangements
7
7 Transforming mice into men (X chromosome)
8
8 Genome Rearrangements: Evolutionary “Earthquakes” What is the evolutionary scenario for transforming one genome into the other? What is the organization of the ancestral genome? Are there any rearrangement hotspots in mammalian genomes?
9
9 Collaborators Cat and Cow (Guillaume Bourque, Bill Murphy and Steve O’Brien) Chimpanzee (Ben Raphael and Haixu Tang) Dog (Guillaume Bourque and Gregor Adelfinger) Rat (Guillaume Bourque and Bin Ma) Tumors (Ben Raphael and UCSF Cancer Center group)
10
10 Susumu Ohno: Two Hypotheses Ohno, 1970, 1973 Whole Genome Duplication Hypothesis: Big leaps in evolution would have been impossible without whole genome duplications. Random Breakage Hypothesis: Genomic architectures are shaped by rearrangements that occur randomly (there are no fragile regions).
11
11 Whole Genome Duplication Hypothesis Finally Confirmed After Years of Controversy The Whole Genome Duplication hypothesis first met with skepticism and was only recently confirmed. Kellis, Birren & Lander, Nature, 2004 “Our analysis resolves the long-standing controversy on the ancestry of the yeast genome” “There was a whole-genome duplication.” Wolfe, Nature, 1997 “There was no whole-genome duplication.” Dujon, FEBS, 2000 “Duplications occurred independently” Langkjaer, JMB, 2000 “Continuous duplications” Dujon, Yeast 2003 “Multiple duplications” Friedman, Gen. Res, 2003 “Spontaneous duplications” Koszul, EMBO, 2004
12
12 Random Breakage Hypothesis Meets a Different Fate The random breakage hypothesis was embraced by biologists and has become the de facto theory of chromosome evolution. Nadeau & Taylor, PNAS 1984 First estimate of the number of synteny blocks between human and mouse First convincing arguments in favor of the Random Breakage Model (RBM) RBM was reiterated in hundreds of papers
13
13 Random Breakage Hypothesis Meets a Different Fate The random breakage hypothesis was embraced by biologists and has become the de facto theory of chromosome evolution. Nadeau & Taylor, PNAS 1984 First estimate of the number of synteny blocks between human and mouse First convincing arguments in favor of the Random Breakage Model (RBM) RBM was reiterated in hundreds of papers Pevzner & Tesler, PNAS 2003 Rejected RBM and proposed the Fragile Breakage Model Postulated existence of rearrangement hotspots and vast breakpoint reuse
14
14 Are the Rearrangement Hotspots Real? The Fragile Breakage Model did not live long. In 2004 David Sankoff presented convincing arguments against the Fragile Breakage Model (Sankoff & Trinh, 2004) “… we have shown that breakpoint re-use of the same magnitude as found in Pevzner and Tesler, 2003 may very well be artifacts in a context where NO re-use actually occurred.”
15
15 Random Breakage Theory re-re-re-visited Ohno, 1970, Nadeau & Taylor, 1984 introduced RBM Pevzner & Tesler, 2003 argued against RBM
16
16 Random Breakage Theory re-re-re-visited Ohno, 1970, Nadeau & Taylor, 1984 introduced RBM Pevzner & Tesler, 2003 argued against RBM Sankoff & Trinh, 2004 argued against Pevzner & Tesler, 2003 arguments against RBM
17
17 Random Breakage Theory re-re-re-visited Ohno, 1970, Nadeau & Taylor, 1984 introduced RBM Pevzner & Tesler, 2003 argued against RBM Sankoff & Trinh, 2004 argued against Pevzner & Tesler, 2003 arguments against RBM Today I will argue against Sankoff & Trinh, 2004 arguments against Pevzner & Tesler, 2003 arguments against RBM
18
18 Random Breakage Theory re-re-re-visited Ohno, 1970, Nadeau & Taylor, 1984 introduced RBM Pevzner & Tesler, 2003 argued against RBM Sankoff & Trinh, 2004 argued against Pevzner & Tesler, 2003 arguments against RBM Today I will argue against Sankoff & Trinh, 2004 arguments against Pevzner & Tesler, 2003 arguments against RBM (as if I have nothing better to do)
19
19 Evolution of HerpesViruses
20
20 What are the similarity blocks and how to find them? What is the architecture of the ancestral genome? What is the evolutionary scenario for transforming one genome into the other? Unknown ancestor ~ 80 million years ago Mouse (X chrom.) Human (X chrom.) Genome rearrangements
21
21 Whole genome synteny blocks
22
22 History of Chromosome X Rat Consortium, Nature, 2004
23
23 Human-Mouse-Rat Phylogeny
24
24 Reconstruction of Tumor Genome Architectures Pavel Pevzner and Ben Raphael (UCSD) & Colin Collins and Stas Volik (UCSF Cancer Center)
25
25 Chromosome Painting: Normal Cells
26
26 Tumor Genomes Tumor cells often exhibit chromosomal aberrations:
27
27 Tumor Genomes Thousands of individual rearrangements known for different tumors. Rearrangements may disrupt genes and alter gene regulation. Example: translocation in leukemia yields “Philadelphia” chromosome. [Figure: Chr 9 (promoter, ABL gene) and Chr 22 (promoter, BCR gene) before the translocation; afterwards the BCR gene promoter is joined to the c-abl oncogene.]
28
28 Breast Cancer Tumor Genome MCF7 is human breast cancer cell line. Cytogenetic analysis suggests complex architecture: What is the detailed architecture of MCF7 tumor genome? What sequence of rearrangements produced MCF7?
29
29 Nadeau-Taylor random breakage theory Proposed by Ohno (Nature, 1973). Detailed statistical formulation given by Nadeau and Taylor (PNAS, 1984). Upheld by many studies since then, including Nature, Feb. 2001 human genome paper. Mouse genome paper (Nature, Dec. 2002) : first doubts
30
30 Reversals (also called inversions) 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 Classically, blocks represent conserved genes. In the course of evolution or in a clinical context, blocks 1,…,10 could be misread as 1, 2, 3, -8, -7, -6, -5, -4, 9, 10. Clinical: occurs in many cancers. Evolution: occurred about once or twice every million years on the evolutionary path between human and mouse.
31
31 Reversals (also called inversions) 1, 2, 3, -8, -7, -6, -5, -4, 9, 10 Classically, blocks represent conserved genes. In the course of evolution or in a clinical context, blocks 1,…,10 could be misread as 1, 2, 3, -8, -7, -6, -5, -4, 9, 10. Clinical: occurs in many cancers. Evolution: occurred about once or twice every million years on the evolutionary path between human and mouse.
32
32 Reversals (also called inversions) 1, 2, 3, -8, -7, -6, -5, -4, 9, 10 The inversion introduced two breakpoints (disruptions in gene order).
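The reversal on this slide can be replayed in a few lines. This is a minimal sketch, assuming blocks are stored as signed integers and a breakpoint is any adjacency in the framed sequence 0, π, n+1 that is not of the form (x, x+1); the function names `reverse` and `breakpoints` are illustrative and not taken from the talk.

```python
def reverse(perm, i, j):
    """Reverse the segment perm[i..j] (0-based, inclusive) and flip its signs."""
    return perm[:i] + [-x for x in reversed(perm[i:j + 1])] + perm[j + 1:]


def breakpoints(perm):
    """Count adjacencies of 0, perm, n+1 that are not of the form (x, x+1)."""
    framed = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if b - a != 1)


identity = list(range(1, 11))
inverted = reverse(identity, 3, 7)   # invert blocks 4..8
print(inverted)                      # [1, 2, 3, -8, -7, -6, -5, -4, 9, 10]
print(breakpoints(identity), breakpoints(inverted))  # 0 2
```

The pair (-8, -7) still counts as an intact adjacency because the blocks remain neighbours in the same relative orientation; only the two ends of the inverted segment become breakpoints.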
33
33 Sorting by reversals
34
34 Sorting by reversals Most parsimonious scenarios The reversal distance is the minimum number of reversals required to transform one gene order into another. Here, the distance is 4.
35
35 Sorting by Reversals Breakpoint distance
36
36 Sorting by Reversals Breakpoint distance Sorting by Reversal = breakpoint elimination
37
37 Sorting by Reversals Breakpoint distance Sorting by Reversal = breakpoint elimination How many breakpoints can be eliminated by a single reversal?
38
38 Sorting by Reversals Breakpoint distance Sorting by Reversal = breakpoint elimination reversal distance >= # breakpoints / 2 = 6/2 = 3
39
39 Sorting by Reversals Breakpoint distance Sorting by Reversal = breakpoint elimination reversal distance >= # breakpoints / 2 = 6/2 = 3 This formula vastly underestimates the reversal distance by assuming that breakpoints are never re-used.
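To see the gap between the breakpoint bound and the true reversal distance on a toy case, the sketch below compares the two by brute force. It assumes signed permutations and an exhaustive breadth-first search, which is only feasible for a handful of blocks; the example permutation (+2 +1) is chosen for illustration and is not the one drawn on the slide.

```python
from collections import deque
from math import ceil


def breakpoints(perm):
    """Count adjacencies of 0, perm, n+1 that are not of the form (x, x+1)."""
    framed = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if b - a != 1)


def breakpoint_lower_bound(perm):
    """Each reversal removes at most two breakpoints."""
    return ceil(breakpoints(perm) / 2)


def reversal_distance(perm):
    """Exact signed reversal distance by breadth-first search (tiny inputs only)."""
    start, goal = tuple(perm), tuple(range(1, len(perm) + 1))
    dist, queue = {start: 0}, deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            return dist[cur]
        for i in range(len(cur)):
            for j in range(i, len(cur)):
                nxt = cur[:i] + tuple(-x for x in reversed(cur[i:j + 1])) + cur[j + 1:]
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
    raise ValueError("identity not reachable")


perm = [2, 1]
print(breakpoint_lower_bound(perm), reversal_distance(perm))  # 2 3
```

Here the bound says two reversals, while three are actually needed, so at least one breakpoint is used more than once.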
40
40 Breakpoint graph Breakpoint Graph (Bafna and Pevzner) Duality Theorem (Hannenhalli-Pevzner): d = n + 1 – c + h + f, where c = # cycles; h, f are rather complicated, but can be computed from the graph in polynomial time. Here, d = 8 + 1 – 5 + 0 + 0 = 4
41
41 Breakpoint graph The Reversal Distance Theorem: reversal distance = #blocks + 1 – #cycles + #hurdles. #hurdles is a rather complicated parameter, but can be computed from the breakpoint graph in linear time. Here, reversal distance = 8 blocks + 1 – 5 cycles + 0 = 4
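The cycle term of the theorem is easy to compute directly. The sketch below is an illustration under stated assumptions: each signed block x is split into two vertices 2|x|-1 and 2|x| (swapped for negative x), the permutation is framed by 0 and 2n+1, black edges join neighbours in the permutation, and gray edges join 2k and 2k+1. The hurdle term is not computed, so `cycle_bound` returns only #blocks + 1 – #cycles, which is a lower bound on the reversal distance. The function names and test permutations are illustrative.

```python
def cycle_count(perm):
    """Number of alternating black/gray cycles in the breakpoint graph of perm."""
    n = len(perm)
    ends = [0]
    for x in perm:
        tail, head = 2 * abs(x) - 1, 2 * abs(x)
        ends += [tail, head] if x > 0 else [head, tail]
    ends.append(2 * n + 1)

    # Black edges join consecutive block ends: (ends[0], ends[1]), (ends[2], ends[3]), ...
    black = {}
    for k in range(0, len(ends), 2):
        u, v = ends[k], ends[k + 1]
        black[u], black[v] = v, u

    seen, cycles = set(), 0
    for start in black:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:
            seen.add(v)
            w = black[v]      # follow the black edge
            seen.add(w)
            v = w ^ 1         # follow the gray edge joining 2k and 2k+1
    return cycles


def cycle_bound(perm):
    """#blocks + 1 - #cycles: the reversal distance when there are no hurdles."""
    return len(perm) + 1 - cycle_count(perm)


print(cycle_bound([1, 2, 3, -8, -7, -6, -5, -4, 9, 10]))  # 1
print(cycle_bound([2, 1]))                                # 2
```

For the single inversion the bound is exact. For (+2 +1) the bound gives 2 although three reversals are needed, which is the kind of case the hurdle term accounts for.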
42
42 Breakpoint graph Reversal Distance Theorem (slightly imprecise version): reversal distance = number of blocks – number of cycles
43
43 Complexity of reversal distance
44
44 Types of Rearrangements Reversal: 1 2 3 4 5 6 → 1 2 -5 -4 -3 6. Translocation: (1 2 3)(4 5 6) → (1 2 6)(4 5 3). Fusion and fission: 1 2 3 4 5 6 split over two chromosomes ↔ 1 2 3 4 5 6 on a single chromosome.
45
45 Rearrangements in Multi-Chromosomal Genomes are not limited to reversals… translocations:
46
46 Rearrangements in Multi-chromosomal Genomes Besides reversals… translocations: fusions and fissions of chromosomes
47
Reversals on Circular Genomes reversal P=(+a-b-c+d) Q=(+a-b-d+c) a c d b a c d b
52
Reversals on Circular Genomes reversal P=(+a-b-c+d) Q=(+a-b-d+c) A reversal replaces two black edges with two other black edges a c d b a c d b
53
Reversals on Circular Chromosomes reversal A reversal replaces two black edges with two other black edges a c b a c d b d a b cda bcd
54
Not a Reversal P=(+a-b-c+d) Q=?????? This operation also replaces two black edges with two other black edges. But it is not a reversal. a c d b a c d b
55
Fissions ∗ P=(+a-b-c+d) Q=(+a-b)(-c+d) Fissions split a single chromosome into two – also replace two black edges with two other black edges. a c d b a c d b ∗ fission
56
Translocations/Fusions Translocations/Fusions transform two cycles (chromosomes) into a single one. They also replace two black edges with two other black edges. P=(+a-b-c+d) Q=(+a-b)(-c+d) ∗ a c d b a c d b ∗ fusion
57
2-Breaks 2-Break replaces any pair of black edges with another pair. P=(+a-b-c+d) Q=(+a-b-d+c) 2-break a c d b a c d b ∗ ∗
58
2-Break Distance Problem Given two genomes, find the shortest sequence of 2-Breaks transforming one genome into another.
59
Two Genomes as Black-Red and Green-Red Cycles P=(+a-b-c+d) Q=(+a+c+b-d) a c d b a b d c P Q
60
Common Red Edges a c d b a b d c P Q a b c d
61
Superimposing... a c d b a b d c P Q Q a b c d
65
Breakpoint Graph BG(P,Q) a c d b a b d c P Q a b c d
66
Breakpoint Graph: Red, Black, and Green Matchings Breakpoint graph is formed by red, black and green edges. a b c d
67
Black-Red Cycles (red genome) Breakpoint graph is formed by red, black and green edges. black and red edges form genome P a b c d
68
Green-Red Cycles (green genome) Breakpoint graph is formed by red, black and green edges. green and red edges form genome Q a b c d
69
Black-Green Cycles (breakpoint graph) Breakpoint graph is formed by red, black and green edges. black and green edges form black-green cycles cycle (P,Q) – number of cycles in the breakpoint graph of genomes P and Q a b cc d
70
Breakpoint Graph Breakpoint graph is formed by red, black and green matchings. Every pair of matchings forms a collection of alternating cycles: black-red cycles (genome P) green-red cycles (genome Q) black-green cycles G(P,Q) a b c d
71
Breakpoint Graph of Two Identical Genomes Trivial Breakpoint Graph is a breakpoint graph of two identical genomes. Q=(+a-b-c+d) BG(Q,Q) a b c d
72
Identity Breakpoint Graph Consists of Trivial Black-Green Cycles Identity Breakpoint Graph is a breakpoint graph of two identical genomes. Identity breakpoint graph consists of trivial cycles, each formed by one green and one black edge. a b c d # trivial cycles = # genes
73
Genome Rearrangements Affect Black-Green Cycles cycle(P,Q) = 2 cycles; cycle(Q,Q) = 4 trivial cycles. Transforming genome P into genome Q corresponds to transforming black-green cycles in G(P,Q) into trivial cycles in G(Q,Q). a c d b a b c d
74
Rearrangements Change Breakpoint Graphs and Cycle(P,Q) cycle(P',Q) = 3 cycle(Q,Q) = 4 =#genes a c b d a c b d a c b d cycle(P,Q) = 2 BG(P,Q) BG(P',Q) BG(Q,Q) trivial cycles
75
Sorting by 2-Breaks 2-breaks: P = Q_0 → Q_1 → ... → Q_d = Q; BG(P,Q) → BG(Q_1,Q) → ... → BG(Q,Q); cycle(P,Q) cycles → ... → cycle(Q,Q) = #genes. The # of black-green cycles increases by #genes - cycle(P,Q). How much can each 2-break contribute to this increase?
76
Each 2-Break Increases #Cycles by at Most 1 A 2-Break: adds 2 new black edges and thus creates at most 2 new cycles (containing the two new black edges); removes 2 black edges and thus destroys at least 1 old cycle (containing the two old black edges). Change in the number of cycles ≤ 2 - 1 = 1.
77
There Exist 2-Breaks Increasing #Cycles by 1 A 2-Break increases the number of cycles by at most one, and this bound is attained, since any non-trivial cycle can be split into two cycles with a 2-Break. ∗∗
78
2-Break Distance Any 2-Break increases the number of cycles by at most one. Any non-trivial cycle can be split into two cycles with a 2-Break. Every sorting by 2-Breaks must increase #cycles by #genes - cycle(P,Q). 2-Break distance between genomes P and Q: #genes - cycle(P,Q)
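The formula can be checked on the running example P=(+a-b-c+d), Q=(+a+c+b-d). The sketch below assumes genes a, b, c, d are encoded as 1, 2, 3, 4, every chromosome is circular, and each gene contributes a head and a tail vertex; the names `adjacencies`, `cycles` and `two_break_distance` are illustrative.

```python
def adjacencies(genome):
    """Map every gene end to its neighbour across an adjacency (circular chromosomes)."""
    adj = {}
    for chrom in genome:
        for x, y in zip(chrom, chrom[1:] + chrom[:1]):
            out_end = (abs(x), "h" if x > 0 else "t")   # end where gene x is left
            in_end = (abs(y), "t" if y > 0 else "h")    # end where gene y is entered
            adj[out_end], adj[in_end] = in_end, out_end
    return adj


def cycles(p, q):
    """Number of alternating black-green cycles in the breakpoint graph of p and q."""
    black, green = adjacencies(p), adjacencies(q)
    seen, count = set(), 0
    for start in black:
        if start in seen:
            continue
        count += 1
        v = start
        while v not in seen:
            seen.add(v)
            w = black[v]     # black edge (genome P)
            seen.add(w)
            v = green[w]     # green edge (genome Q)
    return count


def two_break_distance(p, q):
    """#genes - cycle(P,Q)."""
    return sum(len(chrom) for chrom in p) - cycles(p, q)


P = [[1, -2, -3, 4]]   # (+a -b -c +d)
Q = [[1, 3, 2, -4]]    # (+a +c +b -d)
print(cycles(P, Q), two_break_distance(P, Q))  # 2 2
```

Comparing a genome with itself gives one trivial cycle per gene and therefore distance 0, in line with the identity breakpoint graph described earlier.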
79
79 Human-mouse breakpoint graph
80
2-Break Distance between HUMAN and MOUSE Human and mouse genomes can be viewed as strings in the alphabet of 280 synteny blocks (each at least 0.5 million nucleotides in length). The breakpoint graph on these blocks has 35 cycles. 2-Break distance between HUMAN and MOUSE: #genes - cycle(HUMAN,MOUSE) = 280 - 35 = 245
81
Reversals on Circular Genomes reversal P=(+a-b-c+d) Q=(+a-b-d+c) Reversal replaces two black edges with two other black edges a c d b a c d b
82
Not a Reversal P=(+a-b-c+d) Q=?????? This operation also replaces two black edges with two other black edges. But it is not a reversal. a c d b a c d b
83
Fissions ∗ P=(+a-b-c+d) Q=(+a-b)(-c+d) Fissions split a single cycle (chromosome) into two. Fissions replace two black edges with two other black edges. a c d b a c d b ∗ fission
84
Translocations/Fusions Translocations/Fusions transform two cycles (chromosomes) into a single one. They also replace two black edges with two other black edges. P=(+a-b-c+d) Q=(+a-b)(-c+d) ∗ a c d b a c d b ∗ fusion
85
2-Breaks 2-Break replaces any pair of black edges with another pair forming matching on the same 4 vertices. Reversals/translocations/fusions/fissions represent all possible 2-Breaks. P=(+a-b-c+d) Q=(+a-b-d+c) 2-break a c d b a c d b ∗ ∗
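A 2-Break can be written as a plain set operation on black edges. The sketch below assumes genes a, b, c, d are encoded as 1, 2, 3, 4 with head/tail vertices and circular chromosomes; `edges` and `two_break` are illustrative names. It shows that the same two black edges of P=(+a-b-c+d) can be rematched in two ways, giving either the reversal or the fission from the earlier slides.

```python
def edges(genome):
    """Set of black edges (adjacencies) of a genome with circular chromosomes."""
    result = set()
    for chrom in genome:
        for x, y in zip(chrom, chrom[1:] + chrom[:1]):
            out_end = (abs(x), "h" if x > 0 else "t")
            in_end = (abs(y), "t" if y > 0 else "h")
            result.add(frozenset((out_end, in_end)))
    return result


def two_break(edge_set, old1, old2, new1, new2):
    """Replace two black edges by another matching on the same four vertices."""
    old1, old2, new1, new2 = map(frozenset, (old1, old2, new1, new2))
    assert old1 in edge_set and old2 in edge_set
    assert old1 | old2 == new1 | new2 and len(new1) == len(new2) == 2
    return (edge_set - {old1, old2}) | {new1, new2}


P = [[1, -2, -3, 4]]                            # (+a -b -c +d)
cut1, cut2 = ((2, "t"), (3, "h")), ((4, "h"), (1, "t"))

# Rematching one way gives the reversal (+a -b -d +c).
rev = two_break(edges(P), cut1, cut2, ((2, "t"), (4, "h")), ((3, "h"), (1, "t")))
print(rev == edges([[1, -2, -4, 3]]))           # True

# Rematching the other way gives the fission (+a -b)(-c +d).
fis = two_break(edges(P), cut1, cut2, ((2, "t"), (1, "t")), ((3, "h"), (4, "h")))
print(fis == edges([[1, -2], [-3, 4]]))         # True
```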
86
2-Break Distance The 2-Break distance d_2(P,Q) between genomes P and Q is the minimum number of 2-Breaks required to transform P into Q. 2-Break Distance Problem: Given two genomes, find the shortest sequence of 2-Breaks transforming one genome into another.
87
Two Genomes as Black-Red and Green-Red Cycles P=(+a-b-c+d) Q=(+a+c+b-d) a c d b a b d c P Q
88
Common Red Edges a c d b a b d c P Q a b c d
89
Superimposing... a c d b a b d c P Q Q a b c d
90
Breakpoint Graph G(P,Q) a c d b a b d c P Q a b c d
91
Breakpoint Graph: Red, Black, and Green Matchings Breakpoint graph is formed by red, black and green matchings. Every pair of matchings forms a collection of alternating cycles: G(P,Q) a b c d
92
Black-Red Cycles Breakpoint graph is formed by red, black and green matchings. Every pair of matchings forms a collection of alternating cycles: black-red cycles (genome P) a b c d
93
Green-Red Cycles Breakpoint graph is formed by red, black and green matchings. Every pair of matchings forms a collection of alternating cycles: green-red cycles (genome Q) G(P,Q) a b c d
94
Black-Green Cycles Breakpoint graph is formed by red, black and green matchings. Every pair of matchings forms a collection of alternating cycles: black-green cycles a b cc d
95
Breakpoint Graph Breakpoint graph is formed by red, black and green matchings. Every pair of matchings forms a collection of alternating cycles: black-red cycles (genome P) green-red cycles (genome Q) black-green cycles G(P,Q) a b c d
96
Breakpoint Graph of Two Identical Genomes Identity Breakpoint Graph is a breakpoint graph of two identical genomes. P=(+a-b-c+d) G(P,P) a b c d
97
Identity Breakpoint Graph Consists of Trivial Black-Green Cycles Identity Breakpoint Graph is a breakpoint graph of two identical genomes. Identity breakpoint graph consists of trivial cycles. a b c d
98
Genome Rearrangements Affect Black-Green Cycles cycle(P,Q) = 2 cycles; cycle(P,P) = 4 trivial cycles. Transforming genome Q into genome P by 2-breaks corresponds to transforming black-green cycles in G(P,Q) into trivial cycles in G(P,P). a c d b a b c d
99
Sorting by 2-Breaks 2-breaks: Q = Q_0 → Q_1 → ... → Q_d = P; G(P,Q) → G(P,Q_1) → ... → G(P,P); cycle(P,Q) cycles → ... → |P| cycles. The # of black-green cycles increases by |P| - cycle(P,Q). How much can each 2-break contribute to this increase?
100
Each 2-Break Increases #Cycles by at Most 1 A 2-Break: adds 2 new black edges and thus creates at most 2 new cycles (containing the two new black edges); removes 2 black edges and thus destroys at least 1 old cycle (containing the two old black edges). Change in the number of cycles: Δc ≤ 2 - 1 = 1.
101
2-Break increases the number of cycles by at most one (Δc ≤ 1) Any non-trivial cycle can be split into two cycles with a 2-break (Δc = 1) ∗∗ There Exist 2-Breaks Increasing #Cycles by 1
102
2-Break Distance Any 2-Break increases the number of cycles by at most one (Δc ≤ 1). Any non-trivial cycle can be split into two cycles with a 2-Break (Δc = 1). Every sorting by 2-Breaks must increase #cycles by |P| - cycle(P,Q). The 2-Break Distance between genomes P and Q: d_2(P,Q) = |P| - cycle(P,Q)
103
103 Multichromosomal rearrangements Translocation (5 9 4 10) (–6 –1 11 7 –2) (5 9 11 7 –2) (–6 –1 4 10)
104
104 Multichromosomal rearrangements Translocation (5 9 4 10) (–6 –1 11 7 –2) (5 9 11 7 –2) (–6 –1 4 10) By concatenating chromosomes, this may be mimicked by a single reversal:
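The translocation on this slide can be reproduced as one reversal on a concatenate. This is a sketch under an assumption the transcript does not spell out: the second chromosome is flipped before concatenation, which is allowed because a chromosome and its flipped copy represent the same chromosome. The helper names `flip` and `reverse` are illustrative.

```python
def flip(chrom):
    """Read a chromosome from the other end: reverse the order and negate the signs."""
    return [-x for x in reversed(chrom)]


def reverse(seq, i, j):
    """Reverse the segment seq[i..j] (0-based, inclusive) and flip its signs."""
    return seq[:i] + [-x for x in reversed(seq[i:j + 1])] + seq[j + 1:]


chrom_a = [5, 9, 4, 10]
chrom_b = [-6, -1, 11, 7, -2]

concatenate = chrom_a + flip(chrom_b)   # [5, 9, 4, 10, 2, -7, -11, 1, 6]
after = reverse(concatenate, 2, 6)      # one reversal spanning the chromosome junction
new_a, new_b = after[:5], flip(after[5:])

print(new_a)  # [5, 9, 11, 7, -2]
print(new_b)  # [-6, -1, 4, 10]
```

The two resulting chromosomes are exactly the ones shown after the translocation, with the chromosome boundary now falling after the fifth block of the concatenate.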
106
106 Multichromosomal rearrangements Fission and fusion (1 2 3 4 5) ( ) (1 2) (3 4 5) By concatenating chromosomes, this may be mimicked by a single reversal: Evolution: Number of chromosomes in mammals varies widely.
107
107 Multichromosomal rearrangements Concatenates Concatenate all the chromosomes of a genome into a single sequence. Sort the resulting uni- chromosomal genome by reversals
108
108 Multichromosomal rearrangements Concatenates Concatenate together all the chromosomes of a genome into a single sequence. Sort the resulting uni- chromosomal genome by reversals GRIMM web server
109
109 GRIMM Web Server: Multichromosomal rearrangements
110
110 Multichromosomal rearrangements Reduction from the multichromosomal to the unichromosomal case (Hannenhalli & Pevzner): Let d = minimum total number of reversals, translocations, fissions, and fusions among all rearrangement scenarios between two genomes. H&P introduced a canonical concatenate that allows one to mimic a most parsimonious scenario by a d-step reversal scenario on the concatenate. Difficult pathological cases (Tesler 2002, Ozery-Flato & Shamir, 2003): Tesler, 2003 - there are rare pathological multichromosomal cases requiring a (d + 1)-step reversal scenario with one chromosome flip.
111
111 GRIMM web server http://www-cse.ucsd.edu/groups/bioinformatics/GRIMM
112
112 GRIMM-Synteny Analysis of microrearrangements
113
113 Mike Kamal @ Whitehead Institute (MIT) and Ming Li (Waterloo) provided 558,000 anchors (short similarity regions, length 40-9600bp). We combined them into large synteny blocks and applied GRIMM. GRIMM-Synteny Human/mouse comparison
114
114 What are the similarity blocks and how to find them? Unknown ancestor ~ 80 million years ago Mouse (X chrom.) Human (X chrom.) Genome rearrangements
115
115 Finding Synteny Blocks 25,839 anchors Anchors enlarged for visibility. Apparent density may be an illusion. First, separate noise synteny blocks
116
116 GRIMM-Synteny on X chromosome (a) Macro/Micro-rearrangements 25,839 anchors Anchors enlarged for visibility. Apparent density may be an illusion. First, separate noise synteny blocks and then separate microrearrangements (inside synteny blocks) macrorearrangements (of whole blocks)
117
117 GRIMM-Synteny Blowup of a synteny block A single synteny block with 1114 anchors and 85 micro-rearrangements.
118
118 GRIMM-Synteny on X chromosome (a) From anchors to synteny blocks
119
119 Synteny Block Generation GRIMM-Synteny(Genome, w, ∆) w: gap size; ∆: minimum synteny block size. Represent Genome in 2-D and form a graph whose vertex set is the set of genes (anchors) in 2-D. Connect two vertices by an edge if the 2-D distance between them is < w. The connected components in the resulting graph define synteny blocks. Delete small synteny blocks (length < ∆).
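The clustering step described on this slide can be sketched with a union-find structure. Several choices below are assumptions made for illustration and are not stated in the transcript: anchors are (x, y) coordinate pairs in the two genomes, the 2-D distance is taken to be Manhattan distance, the length of a block is measured as its span along the first genome, and all pairs of anchors are compared (quadratic time, fine for a toy input only). The name `grimm_synteny_sketch` is hypothetical and this is not the GRIMM-Synteny implementation.

```python
def grimm_synteny_sketch(anchors, w, min_size):
    """Cluster anchors closer than w into blocks, then drop blocks spanning < min_size."""
    parent = list(range(len(anchors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    # Connect two anchors if their (assumed Manhattan) distance is < w.
    for i, (x1, y1) in enumerate(anchors):
        for j in range(i + 1, len(anchors)):
            x2, y2 = anchors[j]
            if abs(x1 - x2) + abs(y1 - y2) < w:
                parent[find(i)] = find(j)

    # Connected components are the candidate synteny blocks.
    components = {}
    for i, point in enumerate(anchors):
        components.setdefault(find(i), []).append(point)

    blocks = []
    for points in components.values():
        xs = [x for x, _ in points]
        if max(xs) - min(xs) >= min_size:   # delete small blocks
            blocks.append(sorted(points))
    return sorted(blocks)


anchors = [(0, 0), (5, 5), (10, 10), (100, 200), (105, 195), (110, 190), (500, 50)]
for block in grimm_synteny_sketch(anchors, w=15, min_size=8):
    print(block)
```

On this toy input the first three anchors form one block, the next three (an inverted run) form a second, and the isolated anchor at (500, 50) is discarded as noise.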
120
120 Yet Another Synteny Block Generation ST-Synteny(Genome, w, ∆) w: gap size; ∆: minimum synteny block size. Define each gene (anchor) in Genome as a separate block and iteratively amalgamate the resulting blocks. Amalgamate two adjacent blocks if they contain two genes that are separated by less than w genes in another genome. Delete any short block containing < ∆ elements.
121
121 Two Algorithms: Which One is “Better”? GRIMM-Synteny(Genome, w, ∆): Represent Genome in 2-D and form a graph whose vertex set is the set of genes in 2-D. Connect two vertices by an edge if the 2-D distance between them is < w. The connected components in the resulting graph define synteny blocks. Delete small synteny blocks (length < ∆). ST-Synteny(Genome, w, ∆): Define each gene in Genome as a separate block and iteratively amalgamate the resulting blocks. Amalgamate two adjacent blocks if they contain two genes that are separated by less than w genes in another genome. Delete any short block containing < ∆ elements.
122
122 GRIMM-Synteny on X chromosome (b) From anchors to synteny blocks
123
123 GRIMM-Synteny on X chromosome (c) From anchors to synteny blocks
124
124 GRIMM-Synteny on X chromosome (d) From anchors to synteny blocks 11 synteny blocks. 176 micro- rearrangements within these blocks.
125
125 GRIMM-Synteny on X chromosome (e) From anchors to synteny blocks
126
126 GRIMM-Synteny on X chromosome (f) Breakpoint graph
127
127 GRIMM-Synteny on X chromosome (g) Breakpoint graph
128
128 GRIMM-Synteny on X chromosome (h) Breakpoint graph
129
129 GRIMM-Synteny on X chromosome (i) Breakpoint graph
130
130 GRIMM-Synteny Human-mouse breakpoint graph
131
131 Evidence for fragile regions (rearrangement hotspots) in mammalian evolution
132
132 GRIMM determines minimum number of rearrangements is 7 (naked eye gives 6). There are numerous 7-step scenarios. The true scenario may have more than 7 steps. GRIMM on X chromosome
133
133 GRIMM on X chromosome: breakpoint re-uses
134
134 GRIMM on ALL chromosomes GRIMM determines that the minimum number of rearrangements is 245 (naked eye gives 130). There are numerous 245-step scenarios. The true scenario may have more than 245 steps.
135
135 Breakpoint re-use Whole genome
136
136 Are There any Rearrangement Hotspots in Human Genome?
137
137 Are There any Rearrangement Hotspots in Human Genome? Theorem. Yes
138
138 Are There any Rearrangement Hotspots in Human Genome? Theorem. Yes Proof: Every rearrangement creates up to 2 breakpoints
139
139 Are There any Rearrangement Hotspots in Human Genome? Theorem. Yes Proof: Every rearrangement creates up to 2 breakpoints If there were no breakpoint re-use then after k rearrangements we would have 2k breakpoints
140
140 Are There any Rearrangement Hotspots in Human Genome? Theorem. Yes Proof: Every rearrangement creates up to 2 breakpoints If there were no breakpoint re-use then after k rearrangements we would have 2k breakpoints Human-mouse comparison reveals 2k=260 breakpoints
141
141 Are There any Rearrangement Hotspots in Human Genome? Theorem. Yes Proof: Every rearrangement creates up to 2 breakpoints If there were no breakpoint re-use then after k rearrangements we would have 2k breakpoints Human-mouse comparison reveals 2k≈260 breakpoints If there were no breakpoint re-use, how many rearrangements happened on the human-mouse evolutionary path?
142
142 Are There any Rearrangement Hotspots in Human Genome? Theorem. Yes Proof: Every rearrangement creates up to 2 breakpoints If there were no breakpoint re-use then after k rearrangements we would have 2k breakpoints Human-mouse comparison reveals 2k≈260 breakpoints If there were no breakpoint re-use, how many rearrangements happened on the human-mouse evolutionary path? #rearrangements=#breakpoints/2=260/2=130
143
143 Are There any Rearrangement Hotspots in Human Genome? Proof continues: If there were no breakpoint re-use: #rearrangements=#breakpoints/2=260/2=130
144
144 Are There any Rearrangement Hotspots in Human Genome? Proof continues: If there were no breakpoint re-use: #rearrangements=#breakpoints/2=260/2=130 Hannenhalli-Pevzner theorem implies that there were at least 245 rearrangements on the human-mouse evolutionary path
145
145 Are There any Rearrangement Hotspots in Human Genome? Proof continues: If there were no breakpoint re-use: #rearrangements=#breakpoints/2=260/2=130 Hannenhalli-Pevzner theorem implies that there were at least 245 rearrangements on the human-mouse evolutionary path Is 245 larger than 130?
146
146 Are There any Rearrangement Hotspots in Human Genome? Proof continues: If there were no breakpoint re-use: #rearrangements=#breakpoints/2=260/2=130 The Hannenhalli-Pevzner theorem implies that there were at least 245 rearrangements on the human-mouse evolutionary path Is 245 larger than 130? Yes, 245 >> 130
147
147 Are There any Rearrangement Hotspots in Human Genome? Theorem. Yes Proof: Every rearrangement creates up to 2 breakpoints If there were no breakpoint re-use then after k rearrangements we would have 2k breakpoints Human-mouse comparison reveals 2k≈260 breakpoints If there were no breakpoint re-use, how many rearrangements happened on the human-mouse evolutionary path? #rearrangements=#breakpoints/2=260/2=130
148
148 Are There any Rearrangement Hotspots in Human Genome? Proof continues: If there were no breakpoint re-use: #rearrangements=#breakpoints/2=260/2=130 Hannenhalli-Pevzner theorem implies that there were at least 245 rearrangements on the human-mouse evolutionary path Is 245 larger than 130? Yes, 245 >> 130 There was a vast breakpoint re-use – an argument against the random breakage model (according to scan statistics).
149
149 Human/mouse comparison reveals the size of breakpoint regions (regions between consecutive synteny blocks) is small, accounting for ~ 5% of genome breakpoint re-use is very high, approx. 1.9 uses per breakpoint region on average Mouse genome paper (Nature, 2002): The analysis suggests that chromosomal breaks may have a tendency to reoccur in certain regions. High Breakpoint re-use provides evidence against the Random Breakage Model
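The numbers quoted in the talk fit together with simple arithmetic. The sketch below assumes that breakpoint re-use is measured as two breakpoint uses per rearrangement divided by the number of breakpoint regions; that definition is an assumption made here because it reproduces the "approx. 1.9" figure, and the variable names are illustrative.

```python
synteny_blocks = 280        # human-mouse synteny blocks
cycles = 35                 # cycles in the human-mouse breakpoint graph
breakpoint_regions = 260    # observed breakpoint regions

distance = synteny_blocks - cycles               # minimum number of rearrangements
no_reuse_estimate = breakpoint_regions // 2      # rearrangements if no breakpoint were re-used
reuse_rate = 2 * distance / breakpoint_regions   # assumed definition of re-use

print(distance)              # 245
print(no_reuse_estimate)     # 130
print(round(reuse_rate, 2))  # 1.88
```

A rate of 1.0 would mean every breakpoint region was used exactly once; the value near 1.9 is what the talk refers to as vast breakpoint re-use.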
150
150 Breakpoint re-use X chromosome
151
151 History of Chromosome X Rat Consortium, Nature, 2004
152
152 Breakpoint re-use Nadeau-Taylor random breakage theory Human genome paper(Nature, Feb. 2001): [W]e estimate that true number of conserved segments is around 190-230, in good agreement with the original Nadeau-Taylor prediction. Mouse genome paper (Nature, Dec. 2002): The analysis suggests that chromosomal breaks may have a tendency to reoccur in certain regions.
153
153 Breakpoint re-use Fragile breakage model We postulate that mammalian genomes are mosaics of fragile regions with high propensity for rearrangements, and solid regions with low propensity for rearrangements. Estimate > 360 fragile regions, of which 260 are observed in human-mouse breakpoint regions. The others may be revealed by comparison with several mammalian species when such data becomes available.
154
154 Breakpoint re-use Fragile breakage model
155
155 Random Breakage Theory re-re-visited Sankoff and Trinh, 2004 disputed this conclusion and suggested that RBM is correct.
156
156 Random Breakage Theory re-re-visited First, we thank David Sankoff for an insightful criticism If you are not criticized, you may not be doing much Donald Rumsfeld
157
157 Random Breakage Theory re-re-visited First, we thank David Sankoff for an insightful criticism If you are not criticized, you may not be doing much Donald Rumsfeld But how can one criticize a THEOREM???
158
158 Random Breakage Theory re-re-visited If you are not criticized, you may not be doing much Donald Rumsfeld But how can one criticize a THEOREM???
159
159 Are There any Rearrangement Hotspots in Human Genome? Theorem. Yes Proof: Every rearrangement creates 2 breakpoints If there were no breakpoint re-use then after k rearrangements we would have 2k breakpoints Human-mouse comparison reveals 2k≈260 breakpoints If there were no breakpoint re-use, how many rearrangements happened on the human-mouse evolutionary path? #rearrangements=#breakpoints/2=260/2=130
160
160 Are There any Rearrangement Hotspots in Human Genome? Theorem. Yes! Proof: ……………………………………………………… Is 245 larger than 130? Yes, 245 >> 130 ……………………………………………………… ……………………………………………………….
161
161 Are There any Rearrangement Hotspots in Human Genome? Theorem. Yes! Proof: ……………………………………………………… Is 245 larger than 130? Yes, 245 >> 130 ……………………………………………………… Sankoff did not question the validity of the proof - he questioned the validity of the numbers. The computed #breakpoint regions (260) and the rearrangement distance (245) are parameter-dependent and may be wrong
162
162 Sankoff-Trinh Argument Designed a simulation where a series of random rearrangements created the appearance of rearrangement hotspots How can it be???
163
163 Sankoff-Trinh Argument Sankoff & Trinh designed a simulation where a series of random rearrangements created the appearance of rearrangement hotspots How can it be??? S&T emphasized the importance of synteny block generation and parameter choice S&T argued that the breakpoint re-use we observed is caused by artifacts of parameter-dependent synteny block generation and micro-rearrangements
164
164 Criticizing Sankoff-Trinh Argument Sankoff and Trinh designed a simulation where a series of random rearrangements created the appearance of rearrangement hotspots How can it be??? Before you criticize people, you should walk a mile in their shoes. That way, when you criticize them, you are a mile away. And you have their shoes. J.K. Lambert
165
165 Walking in Sankoff-Trinh Shoes Sankoff and Trinh used a simple synteny block generation algorithm (ST-Synteny) and claimed that it is similar to GRIMM-Synteny ST-Synteny indeed appears to be similar to GRIMM-Synteny We reproduced Sankoff-Trinh’s simulation and their ST-Synteny algorithm
166
166 Chromosome Simulation Based on Random Breakage Model Simulation(n, m, k, w): Uni-chromosomal genome of size n = 5000 “genes” (elements). Generate m = 150 random inversions without breakpoint reuse. Generate k micro-inversions of exactly w elements (inversion span) each, randomly placed throughout the genome. Produces a signed permutation of n elements.
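A simulation of this kind can be sketched as follows. This is not the code used in the talk. It assumes that "without breakpoint reuse" means each large inversion cuts only at adjacencies that are still intact relative to the identity, and that micro-inversions start at a uniformly random position; the values of `k`, `w` and `seed` in the example call are placeholders.

```python
import random


def breakpoints(perm):
    """Count adjacencies of 0, perm, n+1 that are not of the form (x, x+1)."""
    framed = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if b - a != 1)


def simulate(n, m, k, w, seed=0):
    """Signed permutation after m inversions without reuse and k micro-inversions of span w."""
    rng = random.Random(seed)
    perm = list(range(1, n + 1))

    for _ in range(m):
        framed = [0] + perm + [n + 1]
        # Gap i sits just before perm[i]; only gaps that are still intact may be cut.
        intact = [i for i in range(n + 1) if framed[i + 1] - framed[i] == 1]
        i, j = sorted(rng.sample(intact, 2))
        perm[i:j] = [-x for x in reversed(perm[i:j])]

    for _ in range(k):
        start = rng.randrange(n - w + 1)
        perm[start:start + w] = [-x for x in reversed(perm[start:start + w])]

    return perm


perm = simulate(n=5000, m=150, k=100, w=2)   # k and w are placeholder values
print(len(perm), breakpoints(perm))
```

With k = 0 every large inversion adds exactly two new breakpoints, so the result has 2m breakpoints; the micro-inversions then add small-scale disorder on top of that.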
167
167 Sankoff-Trinh Synteny Block Generation: ST-Synteny(π, w, Δ). Define each element of the signed permutation π as a block and iteratively amalgamate the resulting blocks. Amalgamate two adjacent blocks if one block contains element i or −i, the other block contains j or −j, and |i − j| ≤ w (signs are ignored). Delete any short block containing < Δ elements (Δ = 3). Assign signs to the remaining blocks according to the majority sign rule.
168
168 Sankoff-Trinh Synteny Block Generation: ST-Synteny(Genome, w, Δ), where w is the gap size and Δ is the minimum synteny block size. Define each gene (anchor) in Genome as a separate block and iteratively amalgamate the resulting blocks. Amalgamate two adjacent blocks if they contain two genes that are separated by less than w genes in another genome. Delete any short block containing < Δ elements.
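The description above can be turned into a small runnable sketch. It is a reconstruction from the slide text only, with several assumptions: "separated by less than w genes" is read as |i − j| ≤ w, amalgamation runs to a fixed point before short blocks are deleted (no re-amalgamation afterwards), and ties in the majority sign rule go to "+". The pairwise check is quadratic and meant for small examples.

```python
def _close(block_a, block_b, w):
    """True if some gene in block_a and some gene in block_b differ by at most w (signs ignored)."""
    return any(abs(abs(x) - abs(y)) <= w for x in block_a for y in block_b)


def st_synteny(perm, w, delta):
    """Sketch of ST-Synteny on a signed permutation; returns (blocks, signs)."""
    if not perm:
        return [], []
    blocks = [[g] for g in perm]  # every gene starts as its own block
    merged = True
    while merged:                 # repeat until no adjacent pair can be amalgamated
        merged = False
        out = [blocks[0]]
        for blk in blocks[1:]:
            if _close(out[-1], blk, w):
                out[-1] = out[-1] + blk
                merged = True
            else:
                out.append(blk)
        blocks = out
    blocks = [b for b in blocks if len(b) >= delta]  # delete short blocks
    # Majority sign rule (assumption: ties resolved as +1).
    signs = [1 if 2 * sum(g > 0 for g in b) >= len(b) else -1 for b in blocks]
    return blocks, signs


# One inversion of the middle third of 1..300.
perm = list(range(1, 101)) + [-g for g in range(200, 100, -1)] + list(range(201, 301))
blocks, signs = st_synteny(perm, w=1, delta=3)
print(len(blocks))  # 1
```

In this sketch a single inversion leaves only one block, because genes 100 and 101 (and 200 and 201) sit in adjacent blocks and differ by 1, regardless of how far apart they are in the other genome.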
169
169 Pevzner-Tesler Synteny Block Generation: GRIMM-Synteny(Genome, w, Δ), where w is the gap size and Δ is the minimum synteny block size. Represent Genome in 2-D and form a graph whose vertex set is the set of genes in 2-D. Connect two vertices by an edge if the 2-D distance between them is < w. The connected components in the resulting graph define synteny blocks. Delete small synteny blocks (those below the minimum size Δ).
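The graph construction above can be sketched with a union-find structure. This is an illustration under stated assumptions rather than the GRIMM-Synteny implementation: the 2-D distance is taken to be Manhattan distance, block size is measured as the span along the first genome, and all anchor pairs are compared (quadratic time). The function name and parameters are hypothetical.

```python
def grimm_synteny_sketch(anchors, w, min_size):
    """anchors: list of (x, y) = (position in genome 1, position in genome 2).

    Returns the connected components (as sorted lists of anchors) whose span
    along genome 1 is at least min_size.
    """
    parent = list(range(len(anchors)))

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a in range(len(anchors)):
        for b in range(a + 1, len(anchors)):
            (x1, y1), (x2, y2) = anchors[a], anchors[b]
            if abs(x1 - x2) + abs(y1 - y2) < w:  # edge if 2-D (Manhattan) distance < w
                parent[find(a)] = find(b)

    components = {}
    for idx, point in enumerate(anchors):
        components.setdefault(find(idx), []).append(point)

    blocks = [sorted(c) for c in components.values()]
    blocks = [c for c in blocks if c[-1][0] - c[0][0] >= min_size]  # drop small blocks
    return sorted(blocks)


# One inversion of the middle third of 1..300, as anchors (position, gene label).
perm = list(range(1, 101)) + [-g for g in range(200, 100, -1)] + list(range(201, 301))
anchors = [(pos, abs(g)) for pos, g in enumerate(perm, start=1)]
print(len(grimm_synteny_sketch(anchors, w=3, min_size=3)))  # 3
```

Because an edge requires closeness in both genomes at once, the inverted segment stays a separate component here: its end anchors are about 100 units away from the flanking segments in the second coordinate.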
172
172 Comparing Two Algorithms. GRIMM-Synteny(Genome, w, Δ): Represent Genome in 2-D and form a graph whose vertex set is the set of genes in 2-D. Connect two vertices by an edge if the 2-D distance between them is < w. The connected components in the resulting graph define synteny blocks. Delete small synteny blocks (those below the minimum size). ST-Synteny(Genome, w, Δ): Define each gene in Genome as a separate block and iteratively amalgamate the resulting blocks. Amalgamate two adjacent blocks if they contain two genes that are separated by less than w genes in another genome. Delete any short block containing < Δ elements. The algorithms look very similar, but do they produce similar results?
173
173 ST-Synteny vs. GRIMM-Synteny on Human/Mouse X Chromosome
Parameters (w, Δ): ST-Synteny (378, 380); GRIMM-Synteny (380, 380)
                          ST-Synteny   GRIMM-Synteny
Number of blocks          44           10
Total block length (Mb)   95.3         139.8
Breakpoint regions (%)    37.98        9.05
177
177 ST-Synteny Flaw I: Amalgamates blocks that should be separate. Identity permutation (genome 1): 1…100 101…200 201…300. With one inversion (genome 2): 1…100 −200…−101 201…300. There are clearly 3 blocks and 2 breakpoints, but ST-Synteny with w ≥ 1 yields [1…100] [−200…−101] [201…300] → [1…100 −200…−101] [201…300] → [1…100 −200…−101 201…300]. The 2-D distance is ignored.
178
178 ST-Synteny Flaw I [Figure: synteny blocks produced by GRIMM-Synteny and by ST-Synteny for hypothetical genome 1 versus hypothetical genome 2; permutation −3 2 −1 −5 4]
189
189 ST-Synteny Flaw II: Small deleted blocks may interrupt a large block. Consider … 100 101 200 102 103 104 300 105 106 107 …. Intuitively, this is one large block [100 101 102 103 104 105 106 107] with two small ones, [200] and [300], that may be deleted. But ST-Synteny with small w and Δ = 3 yields … [100 101] [200] [102 103 104] [300] [105 106 107] …, and after the short blocks are deleted, … [102 103 104] [105 106 107] …, i.e., two blocks but no breakpoint. There is no way to connect the two blocks since there are no consecutive elements.
191
191 GRIMM-Synteny on Human/Mouse X Chromosome GRIMM-Synteny is in agreement with synteny blocks generated by other methods, except …
192
192 ST-Synteny Flaw II: ST-Synteny vs. GRIMM-Synteny
                          ST-Synteny   GRIMM-Synteny
Number of blocks          44           10
Total block length (Mb)   95           140
Breakpoint regions (%)    38           9
Breakpoint re-use         1.97         1.64
Changing parameters does not help ST-Synteny.
193
193 ST-Synteny Flaw III ST-Synteny is not even symmetric, i.e., the number of synteny blocks between human and mouse may differ from the number of synteny blocks between mouse and human
194
194 ST-Synteny Results in Much Higher Breakpoint Reuse than GRIMM-Synteny: an artifact of ST-Synteny rather than any argument against the Fragile Breakage Model.
195
195 Higher Breakpoint Reuse Rate Explained by Larger Breakpoint Regions
196
196 ST-Synteny Deletes Elements and Small Blocks Much Faster
197
197 ST-Synteny vs. GRIMM-Synteny on Simulation(n=5000, m=15, k=500, w=5): breakpoint reuse rate r = 1.31 (ST-Synteny) vs. 1.09 (GRIMM-Synteny)
199
199 Multi-Chromosome Genome Simulation with Random Breakage Model. Previous simulations ignored the length of anchors. Use human coordinates of human/mouse alignment anchors. Concatenate all chromosomes (randomly oriented). Generate 150 inversions/translocations at random locations (simulate 300 blocks). Generate k micro-inversions at random locations with inversion span between 1 and W. Apply GRIMM-Synteny and compute the breakpoint reuse rate (BRR) with G, C = 1 Mb.
201
201 Random Breakage Model re-re-re-visited. Sankoff & Trinh, 2004 emphasized the importance of accurate synteny block generation but fell victim to their own flawed ST-Synteny algorithm. ST-Synteny was never applied to real data. Peng et al., 2006 (PLOS Computational Biology): If Sankoff & Trinh fixed their ST-Synteny algorithm, they would confirm rather than reject Pevzner-Tesler’s Fragile Breakage Model. Sankoff, 2006 (PLOS Computational Biology): “Not only did we foist a hastily conceived and incorrectly executed simulation on an overworked RECOMB conference program committee, but worse—nostra maxima culpa—we obliged a team of high-powered researchers to clean up after us!” (“nostra maxima culpa” = “It’s all our fault” in Latin.)
202
All Recent Studies Support FBM. Kikuta et al., Genome Res. 2007: “... the Nadeau and Taylor hypothesis is not possible for the explanation of synteny in rat.”
203
203 Sankoff-Trinh Argument: “It is much easier to be critical than correct.” (Benjamin Disraeli)
204
204 Where are the rearrangement hotspots located? We demonstrated the existence of rearrangement hotspots but did not answer the question of where they are. We combined forces with Harris Lewin and Bill Murphy, who formed the “Mammalian Genomic Architectures” consortium to find fragile regions in the human genome. The results (and the preliminary answer to the question above) are reported in Murphy et al., Science, 2005.
205
205 Where are the rearrangement hotspots located? We demonstrated the existence of rearrangement hotspots but did not answer the question where they are. We presented the preliminary answer to the question in Murphy et al., Science, 2005 (joint work with “Mammalian Genomic Architectures” consortium). Many groups are currently trying to identify all fragile regions in mammalian genomes (Alekseyev and PP, Genome Biology, 2010)
206
Turnover Fragile Breakage Model. Recent studies reveal evidence for the “birth and death” of the fragile regions, implying that they move to different locations in different lineages. This discovery resulted in the Turnover Fragile Breakage Model (TFBM) that accounts for the “birth and death” of the fragile regions and sheds light on a possible relationship between rearrangements and Matching Segmental Duplications. TFBM points to locations of the currently fragile regions in the human genome.
207
Tests vs. Models. Why did biologists believe in RBM for 20 years? Because RBM implies the exponential distribution of the sizes of the blocks observed in real genomes. A flaw in this logic: RBM is not the only model that complies with the “exponential distribution” test. Why was RBM refuted? Because RBM does not comply with the “breakpoint reuse” test: RBM implies low reuse but real genomes reveal high reuse. FBM complies with both the “exponential distribution” and “breakpoint reuse” tests. But is there a test that both RBM and FBM fail?
Model   Exponential distribution   Breakpoint reuse
RBM     YES                        NO
FBM     YES                        YES
208
Tests vs. Models. RBM and FBM fail the Multispecies Breakpoint Reuse (MBR) test.
Model   Exponential distribution   Breakpoint reuse   MBR
RBM     YES                        NO                 NO
FBM     YES                        YES                NO
209
Tests vs. Models. TFBM passes all three tests.
Model   Exponential distribution   Breakpoint reuse   MBR
RBM     YES                        NO                 NO
FBM     YES                        YES                NO
TFBM    YES                        YES                YES
210
Implications of TFBM Where are the (currently) Fragile Regions in the Human genome?
211
Prediction Power of TFBM. Can we determine currently active regions in the human genome H from comparison with other mammalian genomes? RBM provides no clue. FBM suggests considering the breakpoints between H and any other genome. TFBM suggests considering the closest genome, such as the macaque-human ancestor QH: breakpoints in G(QH,H) are likely to be reused in future rearrangements of H.
212
Validation of Predictions for the Macaque-Human Ancestor (QH,H). Prediction of fragile regions on (QH,H) based on the mouse, rat, and dog genomes. Using the mouse genome M as a proxy: accuracy 34 / 552 ≈ 6%. Using the mouse-rat-dog ancestor genome MRD: accuracy 18 / 162 ≈ 11%. Using the macaque genome Q: accuracy 10 / 68 ≈ 16% (using synteny blocks larger than 500K).
213
Putative Active Fragile Regions in the Human Genome
214
Unsolved Mystery: What Causes Fragility? Zhao and Bourque, Genome Res. 2009, suggested that fragility is promoted by Matching Segmental Duplications (MSDs), pairs of long similar regions located within breakpoint regions flanking a rearrangement. TFBM is consistent with this hypothesis since the similarity between MSDs deteriorates with time, implying that MSDs are also subject to a “birth and death” process.
215
215 Chromosome X: two-way similarities (PatternHunter), synteny blocks (GRIMM-Synteny), rearrangement scenario (GRIMM + MGR)
216
216 Acknowledgements Guillaume Bourque (U. Montreal) Jerry Greenberg (SDSC) Michael Kamal (MIT) Uri Keich (UCSD) Bill Murphy (National Cancer Institute) Stephen O’Brien (National Cancer Inst.) Bin Ma (U. of Western Ontario) Colin Collins (UCSF Cancer Center) Stas Volik (UCSF Cancer Center) David Sankoff (U. Ottawa)
219
219 Collaborators Qian Peng (walked in Sankoff’s shoes) Glenn Tesler (GRIMM/GRIMM-Synteny) David Sankoff (emphasized potential pitfalls of synteny block generation and rearrangement analysis)
220
220 Acknowledgements Rearrangement-based phylogeny Guillaume Bourque (Genomics Institute of Singapore) Rearrangements in Tumors Ben Raphael (UCSD), Colin Collins and Stas Volik (UCSF) Mammalian Genomic Architectures Consortium Bill Murphy (Texas A&M), Harris Lewin (Illinois) …. Mouse Consortium Mike Kamal (Broad)… Rat Consortium Bin Ma (Western Ontario)… Chicken Consortium Pierre Bork, Evgeny Zdobnov (EMBL)… Dog Gregor Adelfinger (U. of Montreal) Cat Bill Murphy and Steven O’Brien (NCI)
221
221 Collaborators Rearrangement-based phylogeny Guillaume Bourque ( Genomics Institute of Singapore) Rearrangements in Tumors Ben Raphael (UCSD), Colin Collins and Stas Volik (UCSF Cancer Center) Mammalian Genomic Architectures Consortium Bill Murphy (Texas A&M), Harris Lewin (Illinois) …. Mouse Mike Kamal (Broad)… Rat Bin Ma (Western Ontario), …. Chicken Pierre Bork and Evgeny Zdobnov (EMBL) Dog Gregor Adelfinger (U. of Montreal) Cat Bill Murphy, Steven O’Brien (NCI)
222
222 Reconstructing Genomic Architecture of Tumor Genomes. 1) Pieces of tumor genome: clones (100-250 kb). 2) Sequence ends of clones (500 bp). 3) Map end sequences to human genome. Each clone corresponds to a pair of end sequences (ES pair) (x,y). [Figure: tumor DNA clones with end sequences x and y mapped onto human DNA]
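The mapping step suggests a simple first filter: an ES pair whose two ends map to the same human chromosome at a distance compatible with the clone size is consistent with the human genome, while any other pair hints at a rearrangement. A minimal sketch, assuming the 100-250 kb clone size quoted above as the valid range and ignoring end-sequence orientation; the type and function names are hypothetical.

```python
from typing import NamedTuple


class ESPair(NamedTuple):
    """Mapped locations of the two end sequences of one tumor clone."""
    chrom_x: str
    x: int
    chrom_y: str
    y: int


def is_valid(pair: ESPair, min_len: int = 100_000, max_len: int = 250_000) -> bool:
    """True if both ends map to one chromosome at a clone-sized distance."""
    same_chromosome = pair.chrom_x == pair.chrom_y
    return same_chromosome and min_len <= abs(pair.y - pair.x) <= max_len


pairs = [
    ESPair("chr1", 1_000_000, "chr1", 1_150_000),   # clone-sized: consistent with human genome
    ESPair("chr1", 1_000_000, "chr1", 9_000_000),   # too far apart: possible inversion or deletion
    ESPair("chr1", 1_000_000, "chr17", 2_000_000),  # different chromosomes: possible translocation
]
invalid = [p for p in pairs if not is_valid(p)]
print(len(invalid))  # 2
```

Only the invalid pairs carry information about rearrangements, so a reconstruction would work with those (typically after clustering pairs that support the same event).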
223
223 Tumor Genome Reconstruction Puzzle. Known: the human genome and the locations of ES pairs in the human genome (map ES pairs to the human genome). Unknown: the tumor genome and the sequence of rearrangements. Goal: reconstruct the tumor genome. [Figure: ES pairs (x1,y1) … (x5,y5) from the tumor genome mapped onto human genome blocks A-E]
224
224 Tumor Genome Reconstruction [Figure: tumor genome versus human genome, blocks A-E]
226
226 Tumor Genome Reconstruction [Figure: tumor genome blocks A-E with ES pairs (x1,y1), (x2,y2), (x3,y3), (x4,y4)]
227
227 2D Representation of ESP Data. Each point is an ES pair. Can we reconstruct the tumor genome from the positions of the ES pairs? [Figure: ESP plot, human genome (blocks A-E) on both axes, with points (x1,y1), (x2,y2), (x3,y3), (x4,y4)]
236
236 Reconstructed Tumor Genome [Figure: ESP plot over human genome blocks A-E and the reconstructed tumor genome]
237
237 Real data are noisy and incomplete!
238
238 Breast Cancer MCF7 Cell Line [Figure: human chromosomes versus MCF7 chromosomes] 5 inversions, 15 translocations. Raphael et al. 2003.
239
239 33/70 clusters Total length: 31Mb Complications with MCF7: Chromosomes 1,3,17, 20
240
240 Rearrangement Signatures [Figure: ES-pair signatures in the human versus tumor genome for inversion, translocation, and duplication/transposition]
241
241 Complex Tumor Genomes
242
242 Structure of Duplications in Tumors? Mechanisms not well understood. Human genome Tumor genome Duplicated segments may co-localize (Guan et al. Nat.Gen.1994)
243
243 Tumor Amplisomes
244
244 Reconstructed MCF7 amplisome: 33 clusters, total length 31 Mb. Explains 24/33 invalid clusters. [Figure: amplisome colored by chromosome: 17, 20, 1, 3] Raphael and Pevzner, 2004.
245
245 Tumor Genomes Projects. 1) Identify recurrent aberrations. 2) Identify temporal sequence of aberrations. 3) Use these data for tumor diagnostics and therapeutics. [Figure: human genome giving rise, through mutation and selection, to multiple tumor genomes]
248
248 Sequencing Tumor Clones Confirms Complex Mosaic Structure Volik et al., Decoding the fine-scale structure of breast cancer genome and transcriptome: Implications for Tumor Genome Project, Genome Res., 2006 Hampton et al., A sequence-level map of chromosome breakpoints yields insights into the evolution of cancer genome. Genome Res, 2008 (157 breakpoints found using next generation sequencing)
249
249 Collaborators Qian Peng (walked in Sankoff’s shoes) Glenn Tesler (GRIMM/GRIMM-Synteny) Ben Raphael (tumor genes) David Sankoff (emphasized potential pitfalls of synteny block generation and rearrangement analysis)