1 Transforming Men into Mice: Are there Fragile Regions in the Human Genome?

2 From Biology to Computing: …… Problem Formulation

3 Transforming men into mice (and into cats, cows, dogs, rats, chickens…). Pavel Pevzner, University of California, San Diego (joint work with Glenn Tesler and Ben Raphael)

4 Transforming Men into Mice: Fragile versus Random Breakage Models of Chromosome Evolution. Pavel Pevzner, University of California, San Diego (joint work with Qian Peng and Glenn Tesler)

5 What are the similarity blocks and how do we find them? What is the architecture of the ancestral genome? What is the evolutionary scenario for transforming one genome into the other? Approaches to answering these questions: statistical (Nadeau-Taylor random breakage theory, 1984) and combinatorial (Sankoff et al., early ’90s). [Figure: mouse and human X chromosomes derived from an unknown ancestor ~80 million years ago through genome rearrangements]

6 What are the similarity blocks and how do we find them? What is the architecture of the ancestral genome? What is the evolutionary scenario for transforming one genome into the other? [Figure: mouse and human X chromosomes derived from an unknown ancestor ~80 million years ago through genome rearrangements]

7 Transforming mice into men (X chromosome)

8 Genome Rearrangements: Evolutionary “Earthquakes”. What is the evolutionary scenario for transforming one genome into the other? What is the organization of the ancestral genome? Are there any rearrangement hotspots in mammalian genomes?

9 Collaborators: cat and cow (Guillaume Bourque, Bill Murphy and Steve O’Brien); chimpanzee (Ben Raphael and Haixu Tang); dog (Guillaume Bourque and Gregor Adelfinger); rat (Guillaume Bourque and Bin Ma); tumors (Ben Raphael and the UCSF Cancer Center group).

10 Susumu Ohno: Two Hypotheses (Ohno, 1970, 1973). Whole Genome Duplication Hypothesis: big leaps in evolution would have been impossible without whole genome duplications. Random Breakage Hypothesis: genomic architectures are shaped by rearrangements that occur randomly (there are no fragile regions).

11 Whole Genome Duplication Hypothesis Finally Confirmed After Years of Controversy. The Whole Genome Duplication hypothesis first met with skepticism and was only recently confirmed. Kellis, Birren & Lander (Nature, 2004): “Our analysis resolves the long-standing controversy on the ancestry of the yeast genome.” The controversy: “There was a whole-genome duplication” (Wolfe, Nature, 1997); “There was no whole-genome duplication” (Dujon, FEBS, 2000); “Duplications occurred independently” (Langkjaer, JMB, 2000); “Continuous duplications” (Dujon, Yeast, 2003); “Multiple duplications” (Friedman, Gen. Res., 2003); “Spontaneous duplications” (Koszul, EMBO, 2004).

12 Random Breakage Hypothesis Meets a Different Fate. The random breakage hypothesis was embraced by biologists and has become the de facto theory of chromosome evolution. Nadeau & Taylor (PNAS, 1984): first estimate of the number of synteny blocks between human and mouse; first convincing arguments in favor of the Random Breakage Model (RBM). RBM was reiterated in hundreds of papers.

13 Random Breakage Hypothesis Meets a Different Fate. The random breakage hypothesis was embraced by biologists and has become the de facto theory of chromosome evolution. Nadeau & Taylor (PNAS, 1984): first estimate of the number of synteny blocks between human and mouse; first convincing arguments in favor of the Random Breakage Model (RBM). RBM was reiterated in hundreds of papers. Pevzner & Tesler (PNAS, 2003): rejected RBM and proposed the Fragile Breakage Model; postulated the existence of rearrangement hotspots and vast breakpoint reuse.

14 Are the Rearrangement Hotspots Real? The Fragile Breakage Model did not live long. In 2004, David Sankoff presented convincing arguments against the Fragile Breakage Model (Sankoff & Trinh, 2004): “… we have shown that breakpoint re-use of the same magnitude as found in Pevzner and Tesler, 2003 may very well be artifacts in a context where NO re-use actually occurred.”

15 Random Breakage Theory re-re-re-visited. Ohno (1970) and Nadeau & Taylor (1984) introduced RBM. Pevzner & Tesler (2003) argued against RBM.

16 Random Breakage Theory re-re-re-visited. Ohno (1970) and Nadeau & Taylor (1984) introduced RBM. Pevzner & Tesler (2003) argued against RBM. Sankoff & Trinh (2004) argued against Pevzner & Tesler’s (2003) arguments against RBM.

17 Random Breakage Theory re-re-re-visited. Ohno (1970) and Nadeau & Taylor (1984) introduced RBM. Pevzner & Tesler (2003) argued against RBM. Sankoff & Trinh (2004) argued against Pevzner & Tesler’s (2003) arguments against RBM. Today I will argue against Sankoff & Trinh’s (2004) arguments against Pevzner & Tesler’s (2003) arguments against RBM.

18 Random Breakage Theory re-re-re-visited. Ohno (1970) and Nadeau & Taylor (1984) introduced RBM. Pevzner & Tesler (2003) argued against RBM. Sankoff & Trinh (2004) argued against Pevzner & Tesler’s (2003) arguments against RBM. Today I will argue against Sankoff & Trinh’s (2004) arguments against Pevzner & Tesler’s (2003) arguments against RBM (as if I have nothing better to do).

19 Evolution of Herpesviruses

21 Whole genome synteny blocks

22 History of Chromosome X (Rat Consortium, Nature, 2004)

23 Human-Mouse-Rat Phylogeny

24 Reconstruction of Tumor Genome Architectures. Pavel Pevzner and Ben Raphael (UCSD), Colin Collins and Stas Volik (UCSF Cancer Center)

25 Chromosome Painting: Normal Cells

26 Tumor Genomes. Tumor cells often exhibit chromosomal aberrations.

27 Tumor Genomes. Thousands of individual rearrangements are known for different tumors. Rearrangements may disrupt genes and alter gene regulation. Example: a translocation between chromosomes 9 and 22 in leukemia yields the “Philadelphia” chromosome. [Figure: the BCR gene and its promoter on chromosome 22 fused to the c-abl (ABL) oncogene from chromosome 9]

28 Breast Cancer Tumor Genome. MCF7 is a human breast cancer cell line. Cytogenetic analysis suggests a complex architecture. What is the detailed architecture of the MCF7 tumor genome? What sequence of rearrangements produced MCF7?

29 Nadeau-Taylor random breakage theory. Proposed by Ohno (Nature, 1973). Detailed statistical formulation given by Nadeau and Taylor (PNAS, 1984). Upheld by many studies since then, including the human genome paper (Nature, Feb. 2001). Mouse genome paper (Nature, Dec. 2002): first doubts.

30 Reversals (also called inversions). Classically, blocks represent conserved genes. In the course of evolution or in a clinical context, blocks 1, …, 10 could be rearranged into 1, 2, 3, -8, -7, -6, -5, -4, 9, 10. Clinical: reversals occur in many cancers. Evolution: reversals occurred about once or twice every million years on the evolutionary path between human and mouse. [Figure: blocks in the order 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

31 Reversals (also called inversions). [Figure: blocks after the reversal: 1, 2, 3, -8, -7, -6, -5, -4, 9, 10] Classically, blocks represent conserved genes. In the course of evolution or in a clinical context, blocks 1, …, 10 could be rearranged into 1, 2, 3, -8, -7, -6, -5, -4, 9, 10. Clinical: reversals occur in many cancers. Evolution: reversals occurred once or twice every million years on the evolutionary path between human and mouse.

32 Reversals (also called inversions). [Figure: 1, 2, 3, -8, -7, -6, -5, -4, 9, 10] The inversion introduced two breakpoints (disruptions in gene order).
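The reversal and its two breakpoints can be reproduced with a minimal Python sketch. It assumes a gene order is a list of signed integers (one per block) and that breakpoints are counted after framing the order with 0 and n + 1; the function names `reverse` and `breakpoints` are illustrative, not taken from GRIMM.

```python
def reverse(perm, i, j):
    """Signed reversal of positions i..j (1-based, inclusive): reverse the order and flip the signs."""
    return perm[:i - 1] + [-x for x in reversed(perm[i - 1:j])] + perm[j:]


def breakpoints(perm):
    """Count neighbors (a, b) in 0, perm, n + 1 that are not of the form (a, a + 1)."""
    framed = [0] + perm + [len(perm) + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if b - a != 1)


identity = list(range(1, 11))
rearranged = reverse(identity, 4, 8)
print(rearranged)               # [1, 2, 3, -8, -7, -6, -5, -4, 9, 10]
print(breakpoints(rearranged))  # 2
```

Note that a pair such as (-8, -7) still counts as an adjacency (since -7 - (-8) = 1): the blocks inside the reversed segment keep their neighbors, and only the two ends of the segment become breakpoints.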

33 Sorting by reversals

34 Sorting by reversals: most parsimonious scenarios. The reversal distance is the minimum number of reversals required to transform one gene order into another. Here, the distance is 4.

35 Sorting by Reversals: breakpoint distance

36 Sorting by Reversals: breakpoint distance. Sorting by reversals = breakpoint elimination.

37 Sorting by Reversals: breakpoint distance. Sorting by reversals = breakpoint elimination. How many breakpoints can be eliminated by a single reversal?

38 Sorting by Reversals: breakpoint distance. Sorting by reversals = breakpoint elimination. A single reversal eliminates at most 2 breakpoints, so reversal distance ≥ #breakpoints / 2 = 6/2 = 3.

39 Sorting by Reversals: breakpoint distance. Sorting by reversals = breakpoint elimination. A single reversal eliminates at most 2 breakpoints, so reversal distance ≥ #breakpoints / 2 = 6/2 = 3. This formula vastly underestimates the reversal distance because it assumes that breakpoints are never re-used.
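To see how loose the breakpoint bound can be, here is a small Python sketch (the helper names are hypothetical) that compares ⌈#breakpoints / 2⌉ with the exact reversal distance found by brute-force breadth-first search over all signed reversals. BFS is exponential in the number of blocks, so this is only an illustration for a handful of blocks, not how GRIMM computes distances.

```python
from collections import deque
from math import ceil


def breakpoints(perm):
    """Breakpoints of a signed permutation framed by 0 and n + 1."""
    framed = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if b - a != 1)


def reversal_distance_bfs(perm):
    """Exact reversal distance by breadth-first search (tiny n only)."""
    start, target = tuple(perm), tuple(range(1, len(perm) + 1))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        p = queue.popleft()
        if p == target:
            return dist[p]
        # Try every signed reversal of the segment p[i:j].
        for i in range(len(p)):
            for j in range(i + 1, len(p) + 1):
                q = p[:i] + tuple(-x for x in reversed(p[i:j])) + p[j:]
                if q not in dist:
                    dist[q] = dist[p] + 1
                    queue.append(q)
    raise ValueError("target not reachable")  # cannot happen for a valid signed permutation


perm = [2, 1]
b = breakpoints(perm)
print(b, ceil(b / 2), reversal_distance_bfs(perm))  # 3 2 3
```

For the two-block order (+2 +1) the breakpoint bound gives 2, but three reversals are actually needed.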

40 Breakpoint graph (Bafna and Pevzner). Duality Theorem (Hannenhalli-Pevzner): d = n + 1 - c + h + f, where n is the number of blocks, c is the number of cycles, and h (hurdles) and f (fortress) are rather complicated parameters that can be computed from the graph in polynomial time. Here, d = 8 + 1 - 5 + 0 + 0 = 4.

41 Breakpoint graph. The Reversal Distance Theorem: reversal distance = #blocks + 1 - #cycles + #hurdles (the fortress term f is omitted here). #hurdles is a rather complicated parameter, but it can be computed from the breakpoint graph in linear time. Here, reversal distance = 8 blocks + 1 - 5 cycles + 0 hurdles = 4.

42 Breakpoint graph. Reversal Distance Theorem (slightly imprecise version): reversal distance = number of blocks - number of cycles.
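The cycle count in the theorem can be checked on small cases with a short Python sketch. It builds the breakpoint graph in one common encoding (each signed block x becomes two numbered endpoints; this encoding is an assumption of the sketch, not the exact drawing on the slides), counts alternating cycles, and returns n + 1 - c, i.e., the Hannenhalli-Pevzner formula with h = f = 0. In general this value is only a lower bound on the reversal distance.

```python
def bp_graph_cycles(perm):
    """Count alternating cycles in the breakpoint graph of a signed permutation.

    Block +x becomes the endpoint pair (2x-1, 2x) and -x becomes (2x, 2x-1);
    the order is framed by endpoint 0 on the left and 2n+1 on the right.
    Black edges join neighboring endpoints of consecutive blocks; gray edges
    join 2i and 2i+1 (the adjacencies of the identity order).
    """
    n = len(perm)
    nodes = [0]
    for x in perm:
        nodes += [2 * x - 1, 2 * x] if x > 0 else [-2 * x, -2 * x - 1]
    nodes.append(2 * n + 1)

    black = {}
    for k in range(0, len(nodes), 2):
        u, v = nodes[k], nodes[k + 1]
        black[u], black[v] = v, u

    seen, cycles = set(), 0
    for start in range(2 * n + 2):
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:
            w = black[v]
            seen.update((v, w))
            v = w + 1 if w % 2 == 0 else w - 1  # follow the gray edge from w
    return cycles


def distance_without_hurdles(perm):
    """n + 1 - c: the Hannenhalli-Pevzner formula with h = f = 0 (a lower bound in general)."""
    return len(perm) + 1 - bp_graph_cycles(perm)


print(bp_graph_cycles([1, 2, 3]), distance_without_hurdles([1, 2, 3]))  # 4 0
slide_example = [1, 2, 3, -8, -7, -6, -5, -4, 9, 10]
print(bp_graph_cycles(slide_example), distance_without_hurdles(slide_example))  # 10 1
print(bp_graph_cycles([2, 1]), distance_without_hurdles([2, 1]))  # 1 2
```

For (+2 +1) the sketch returns 2, while the exact reversal distance is 3; the gap is what the hurdle and fortress terms account for.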

43 Complexity of reversal distance

44 Types of Rearrangements: reversal (1 2 3 4 5 6 → 1 2 -5 -4 -3 6), translocation, fusion, and fission. [Figure: examples of each type of rearrangement on blocks 1-6]

45 Rearrangements in Multi-Chromosomal Genomes are not limited to reversals: translocations.

46 Rearrangements in Multi-chromosomal Genomes. Besides reversals: translocations, fusions and fissions of chromosomes.

47 Reversals on Circular Genomes. A reversal transforms P = (+a -b -c +d) into Q = (+a -b -d +c). [Figure: circular genomes P and Q on genes a, b, c, d]

52 Reversals on Circular Genomes. A reversal transforms P = (+a -b -c +d) into Q = (+a -b -d +c); it replaces two black edges with two other black edges.

53 Reversals on Circular Chromosomes. A reversal replaces two black edges with two other black edges. [Figure: the reversal drawn on circular chromosomes]

54 Not a Reversal. P = (+a -b -c +d), Q = ?????? This operation also replaces two black edges with two other black edges, but it is not a reversal.

55 Fissions. P = (+a -b -c +d), Q = (+a -b)(-c +d). A fission splits a single chromosome into two; it also replaces two black edges with two other black edges.

56 Translocations/Fusions. Translocations/fusions transform two cycles (chromosomes) into a single one. They also replace two black edges with two other black edges. Example: a fusion transforms Q = (+a -b)(-c +d) into P = (+a -b -c +d).

57 2-Breaks. A 2-break replaces any pair of black edges with another pair. Example: P = (+a -b -c +d), Q = (+a -b -d +c).
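The black-edge view of a 2-break can be sketched in Python. Assumptions (not from the slides): each gene x has a tail (x, 't') and a head (x, 'h'); +x is read tail to head and -x head to tail; black edges join the exit end of each gene to the entry end of the next gene on its circular chromosome. The helpers `black_edges` and `two_break` are hypothetical. The example shows that the same two black edges of P can be rejoined in two ways: one gives the reversal Q = (+a -b -d +c), the other gives the fission (+a -b)(-c +d).

```python
def black_edges(genome):
    """Black edges (adjacencies) of a circular genome.

    genome: list of circular chromosomes, each a list of signed genes like '+a' or '-b'.
    """
    edges = set()
    for chrom in genome:
        ends = []
        for gene in chrom:
            sign, name = gene[0], gene[1:]
            tail, head = (name, "t"), (name, "h")
            # (entry end, exit end) of the gene in reading order
            ends.append((tail, head) if sign == "+" else (head, tail))
        for k, (_, exit_end) in enumerate(ends):
            next_entry = ends[(k + 1) % len(ends)][0]  # circular: last gene joins the first
            edges.add(frozenset((exit_end, next_entry)))
    return edges


def two_break(edges, u, v, x, y):
    """Replace black edges {u, v} and {x, y} with {u, x} and {v, y}."""
    old = {frozenset((u, v)), frozenset((x, y))}
    assert old <= edges, "both edges must be present"
    return (edges - old) | {frozenset((u, x)), frozenset((v, y))}


P = black_edges([["+a", "-b", "-c", "+d"]])
bt, ch, at, dh = ("b", "t"), ("c", "h"), ("a", "t"), ("d", "h")
reversal = two_break(P, bt, ch, dh, at)  # new edges {bt, dh}, {ch, at}
fission = two_break(P, bt, ch, at, dh)   # new edges {bt, at}, {ch, dh}
print(reversal == black_edges([["+a", "-b", "-d", "+c"]]))   # True
print(fission == black_edges([["+a", "-b"], ["-c", "+d"]]))  # True
```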

58 2-Break Distance Problem. Given two genomes, find the shortest sequence of 2-breaks transforming one genome into the other.

59 Two Genomes as Black-Red and Green-Red Cycles. P = (+a -b -c +d), Q = (+a +c +b -d). [Figure: P and Q drawn as circular genomes]

60 Common Red Edges. [Figure: P and Q share the red edges (genes) a, b, c, d]

61 Superimposing... [Figure: P and Q superimposed on their common red edges a, b, c, d]

65 Breakpoint Graph BG(P, Q). [Figure: breakpoint graph of P and Q on genes a, b, c, d]

66 Breakpoint Graph: Red, Black, and Green Matchings. The breakpoint graph is formed by red, black and green edges.

67 Black-Red Cycles (black genome). The breakpoint graph is formed by red, black and green edges; black and red edges form genome P.

68 Green-Red Cycles (green genome). The breakpoint graph is formed by red, black and green edges; green and red edges form genome Q.

69 Black-Green Cycles (breakpoint graph). The breakpoint graph is formed by red, black and green edges; black and green edges form black-green cycles. cycle(P, Q) denotes the number of black-green cycles in the breakpoint graph of genomes P and Q.

70 Breakpoint Graph. The breakpoint graph G(P, Q) is formed by red, black and green matchings. Every pair of matchings forms a collection of alternating cycles: black-red cycles (genome P), green-red cycles (genome Q), and black-green cycles.

71 Breakpoint Graph of Two Identical Genomes. The trivial breakpoint graph is the breakpoint graph of two identical genomes, e.g., BG(Q, Q) for Q = (+a -b -c +d).

72 Identity Breakpoint Graph Consists of Trivial Black-Green Cycles. The identity breakpoint graph is the breakpoint graph of two identical genomes. It consists of trivial cycles, each formed by one green and one black edge, so #trivial cycles = #genes.

73 Genome Rearrangements Affect Black-Green Cycles. cycle(P, Q) = 2 cycles; cycle(Q, Q) = 4 trivial cycles. Transforming genome P into genome Q corresponds to transforming the black-green cycles in G(P, Q) into the trivial cycles in G(Q, Q).

74 Rearrangements Change Breakpoint Graphs and cycle(P, Q). A rearrangement transforms P into P': cycle(P, Q) = 2, cycle(P', Q) = 3, and cycle(Q, Q) = 4 = #genes (trivial cycles). [Figure: BG(P, Q) → BG(P', Q) → BG(Q, Q)]

75 Sorting by 2-Breaks. A sequence of 2-breaks $P = Q_0 \to Q_1 \to \dots \to Q_d = Q$ changes the breakpoint graphs $BG(P,Q) \to BG(Q_1,Q) \to \dots \to BG(Q,Q)$ and the number of cycles from $\mathrm{cycle}(P,Q)$ to $\mathrm{cycle}(Q,Q) = \#\text{genes}$. Thus the number of black-green cycles increases by #genes - cycle(P, Q). How much can each 2-break contribute to this increase?

76 Each 2-Break Increases #Cycles by at Most 1. A 2-break adds 2 new black edges and thus creates at most 2 new cycles (those containing the two new black edges); it removes 2 black edges and thus destroys at least 1 old cycle (the cycle or cycles containing the two old black edges). Change in the number of cycles ≤ 2 - 1 = 1.

77 There Exist 2-Breaks Increasing #Cycles by 1. A 2-break increases the number of cycles by at most one, and any non-trivial cycle can be split into two cycles with a 2-break.

78 2-Break Distance. Any 2-break increases the number of cycles by at most one. Any non-trivial cycle can be split into two cycles with a 2-break. Every sorting by 2-breaks must increase #cycles by #genes - cycle(P, Q). 2-break distance between genomes P and Q: #genes - cycle(P, Q).

79 Human-mouse breakpoint graph

80 2-Break Distance between HUMAN and MOUSE. Human and mouse genomes can be viewed as strings in the alphabet of 280 synteny blocks (each at least 0.5 million nucleotides long). The breakpoint graph on these blocks has 35 cycles. 2-break distance between HUMAN and MOUSE: #genes - cycle(HUMAN, MOUSE) = 280 - 35 = 245.

81 Reversals on Circular Genomes. A reversal transforms P = (+a -b -c +d) into Q = (+a -b -d +c); it replaces two black edges with two other black edges.

83 Fissions. P = (+a -b -c +d), Q = (+a -b)(-c +d). Fissions split a single cycle (chromosome) into two. Fissions replace two black edges with two other black edges.

85 2-Breaks. A 2-break replaces any pair of black edges with another pair forming a matching on the same 4 vertices. Reversals/translocations/fusions/fissions represent all possible 2-breaks. Example: P = (+a -b -c +d), Q = (+a -b -d +c).

86 2-Break Distance. The 2-break distance $d_2(P,Q)$ between genomes P and Q is the minimum number of 2-breaks required to transform P into Q. 2-Break Distance Problem: given two genomes, find the shortest sequence of 2-breaks transforming one genome into the other.

90 Breakpoint Graph G(P, Q). [Figure: breakpoint graph of P and Q on genes a, b, c, d]

91 Breakpoint Graph: Red, Black, and Green Matchings. The breakpoint graph is formed by red, black and green matchings. Every pair of matchings forms a collection of alternating cycles.

92 Black-Red Cycles. The breakpoint graph is formed by red, black and green matchings. Every pair of matchings forms a collection of alternating cycles: black-red cycles (genome P).

93 Green-Red Cycles. The breakpoint graph G(P, Q) is formed by red, black and green matchings. Every pair of matchings forms a collection of alternating cycles: green-red cycles (genome Q).

94 Black-Green Cycles. The breakpoint graph is formed by red, black and green matchings. Every pair of matchings forms a collection of alternating cycles: black-green cycles.

96 Breakpoint Graph of Two Identical Genomes. The identity breakpoint graph is the breakpoint graph of two identical genomes, e.g., G(P, P) for P = (+a -b -c +d).

97 Identity Breakpoint Graph Consists of Trivial Black-Green Cycles. The identity breakpoint graph is the breakpoint graph of two identical genomes; it consists of trivial cycles.

98 Genome Rearrangements Affect Black-Green Cycles. cycle(P, Q) = 2 cycles; cycle(P, P) = 4 trivial cycles. Transforming genome Q into genome P by 2-breaks corresponds to transforming the black-green cycles in G(P, Q) into the trivial cycles in G(P, P).

99 Sorting by 2-Breaks. A sequence of 2-breaks $Q = Q_0 \to Q_1 \to \dots \to Q_d = P$ changes the breakpoint graphs $G(P,Q) \to G(P,Q_1) \to \dots \to G(P,P)$ and the number of cycles from $\mathrm{cycle}(P,Q)$ to $|P|$, where $|P|$ is the number of genes in P. Thus the number of black-green cycles increases by $|P| - \mathrm{cycle}(P,Q)$. How much can each 2-break contribute to this increase?

100 Each 2-Break Increases #Cycles by at Most 1. A 2-break adds 2 new black edges and thus creates at most 2 new cycles (those containing the two new black edges); it removes 2 black edges and thus destroys at least 1 old cycle (the cycle or cycles containing the two old black edges). Change in the number of cycles: Δc ≤ 2 - 1 = 1.

101 There Exist 2-Breaks Increasing #Cycles by 1. A 2-break increases the number of cycles by at most one (Δc ≤ 1), and any non-trivial cycle can be split into two cycles with a 2-break (Δc = 1).

102 2-Break Distance. Any 2-break increases the number of cycles by at most one (Δc ≤ 1). Any non-trivial cycle can be split into two cycles with a 2-break (Δc = 1). Every sorting by 2-breaks must increase #cycles by |P| - cycle(P, Q). The 2-break distance between genomes P and Q is $d_2(P,Q) = |P| - \mathrm{cycle}(P,Q)$.
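The formula d_2(P, Q) = |P| - cycle(P, Q) can be sketched in Python. Assumed encoding (not from the slides): each gene x has a tail (x, 't') and a head (x, 'h'); +x is read tail to head and -x head to tail; the black edges are the adjacencies of P, the green edges are the adjacencies of Q, and we count alternating black-green cycles. All function names are illustrative.

```python
def adjacencies(genome):
    """Adjacency edges of a circular genome given as lists of signed genes like '+a'."""
    edges = set()
    for chrom in genome:
        ends = []
        for gene in chrom:
            sign, name = gene[0], gene[1:]
            tail, head = (name, "t"), (name, "h")
            ends.append((tail, head) if sign == "+" else (head, tail))
        for k, (_, exit_end) in enumerate(ends):
            edges.add(frozenset((exit_end, ends[(k + 1) % len(ends)][0])))
    return edges


def partner_map(edges):
    """Turn a perfect matching (set of 2-element frozensets) into a partner lookup."""
    partner = {}
    for e in edges:
        u, v = tuple(e)
        partner[u], partner[v] = v, u
    return partner


def count_cycles(p_genome, q_genome):
    """Number of alternating black (P) / green (Q) cycles in the breakpoint graph."""
    black = partner_map(adjacencies(p_genome))
    green = partner_map(adjacencies(q_genome))
    seen, cycles = set(), 0
    for start in black:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:
            w = black[v]
            seen.update((v, w))
            v = green[w]
    return cycles


def two_break_distance(p_genome, q_genome):
    genes = sum(len(chrom) for chrom in p_genome)
    return genes - count_cycles(p_genome, q_genome)


P = [["+a", "-b", "-c", "+d"]]
Q = [["+a", "+c", "+b", "-d"]]
print(count_cycles(P, Q), two_break_distance(P, Q))  # 2 2
print(count_cycles(P, P), two_break_distance(P, P))  # 4 0
```

With the human-mouse numbers from slide 80, the same formula gives 280 - 35 = 245.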

103 Multichromosomal rearrangements. Translocation: (5 9 4 10)(-6 -1 11 7 -2) → (5 9 11 7 -2)(-6 -1 4 10)

104 Multichromosomal rearrangements. Translocation: (5 9 4 10)(-6 -1 11 7 -2) → (5 9 11 7 -2)(-6 -1 4 10). By concatenating chromosomes, this may be mimicked by a single reversal.
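A small Python sketch of how one reversal on a concatenate can mimic this translocation. Assumption: we concatenate the first chromosome with the second chromosome read in the opposite direction (order reversed, signs flipped). This is a simplified illustration, not the Hannenhalli-Pevzner canonical concatenate, which must also handle chromosome ends; the function names are hypothetical.

```python
def flip(chrom):
    """Read a chromosome in the opposite direction: reverse the order and flip every sign."""
    return [-x for x in reversed(chrom)]


def translocation_via_concatenate(chr1, chr2, cut1, cut2):
    """Exchange the tails chr1[cut1:] and chr2[cut2:] using one reversal on chr1 + flip(chr2)."""
    concat = chr1 + flip(chr2)
    start = cut1                                 # first position of chr1's tail
    end = len(chr1) + len(chr2) - cut2           # one past the flipped tail of chr2
    concat[start:end] = flip(concat[start:end])  # the single reversal
    split = cut1 + len(chr2) - cut2              # length of the new first chromosome
    return concat[:split], flip(concat[split:])


print(translocation_via_concatenate([5, 9, 4, 10], [-6, -1, 11, 7, -2], 2, 2))
# ([5, 9, 11, 7, -2], [-6, -1, 4, 10])
```

Here the concatenate is 5 9 4 10 2 -7 -11 1 6; reversing the segment 4 10 2 -7 -11 and splitting after five blocks (flipping the second part back) recovers both chromosomes of the translocation above.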

106 Multichromosomal rearrangements. Fission and fusion: (1 2 3 4 5)( ) ↔ (1 2)(3 4 5), where ( ) is an empty chromosome. By concatenating chromosomes, this may be mimicked by a single reversal. Evolution: the number of chromosomes in mammals varies widely.

107 Multichromosomal rearrangements: concatenates. Concatenate all the chromosomes of a genome into a single sequence. Sort the resulting unichromosomal genome by reversals.

108 Multichromosomal rearrangements: concatenates. Concatenate all the chromosomes of a genome into a single sequence. Sort the resulting unichromosomal genome by reversals (GRIMM web server).

109 GRIMM Web Server: Multichromosomal rearrangements

110 Multichromosomal rearrangements. Reduction from the multichromosomal to the unichromosomal case (Hannenhalli & Pevzner): let d be the minimum total number of reversals, translocations, fissions, and fusions among all rearrangement scenarios between two genomes. H&P introduced a canonical concatenate that allows one to mimic a most parsimonious scenario by a d-step reversal scenario on the concatenate. Difficult pathological cases (Tesler, 2002; Ozery-Flato & Shamir, 2003): Tesler (2003) showed that there are rare pathological multichromosomal cases requiring a (d + 1)-step reversal scenario with one chromosome flip.

111 GRIMM web server: http://www-cse.ucsd.edu/groups/bioinformatics/GRIMM

112 GRIMM-Synteny: analysis of microrearrangements

113 GRIMM-Synteny: human/mouse comparison. Mike Kamal (Whitehead Institute, MIT) and Ming Li (Waterloo) provided 558,000 anchors (short similarity regions, 40-9600 bp long). We combined them into large synteny blocks and applied GRIMM.

114 What are the similarity blocks and how do we find them? [Figure: mouse and human X chromosomes derived from an unknown ancestor ~80 million years ago through genome rearrangements]

115 Finding Synteny Blocks. [Figure: 25,839 anchors, enlarged for visibility; their apparent density may be an illusion] First, separate noise from synteny blocks.

116 GRIMM-Synteny on X chromosome (a): macro/micro-rearrangements. [Figure: 25,839 anchors, enlarged for visibility; their apparent density may be an illusion] First, separate noise from synteny blocks, and then separate microrearrangements (inside synteny blocks) from macrorearrangements (of whole blocks).

117 GRIMM-Synteny: blowup of a synteny block. A single synteny block with 1114 anchors and 85 microrearrangements.

118 GRIMM-Synteny on X chromosome (a): from anchors to synteny blocks

119 Synteny Block Generation. GRIMM-Synteny(Genome, w, Δ), where w is the gap size and Δ is the minimum synteny block size. Represent Genome in 2-D and form a graph whose vertex set is the set of genes (anchors) in 2-D. Connect two vertices by an edge if the 2-D distance between them is < w. The connected components of the resulting graph define synteny blocks. Delete small synteny blocks (length < Δ).
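A simplified Python sketch of this clustering step, using union-find to merge anchors that are closer than w. Assumptions: anchors are (x, y) points (position in the first and second genome), the 2-D distance is Euclidean, and a block's length is its span along x. GRIMM-Synteny's actual distance measure, length definition, and later steps (such as ordering and orienting the blocks) may differ.

```python
import math


def grimm_synteny_sketch(anchors, w, delta):
    """Cluster 2-D anchors into blocks: link anchors closer than w, keep components spanning >= delta."""
    parent = list(range(len(anchors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Connect two anchors by an edge if their 2-D distance is < w.
    for i in range(len(anchors)):
        for j in range(i + 1, len(anchors)):
            if math.dist(anchors[i], anchors[j]) < w:
                parent[find(i)] = find(j)

    # Connected components define candidate synteny blocks.
    components = {}
    for i, point in enumerate(anchors):
        components.setdefault(find(i), []).append(point)

    # Delete small blocks (span along x below delta).
    blocks = []
    for members in components.values():
        xs = [x for x, _ in members]
        if max(xs) - min(xs) >= delta:
            blocks.append(sorted(members))
    return sorted(blocks)


anchors = [(0, 0), (1, 1), (2, 2), (3, 3), (10, 20), (11, 19), (12, 18), (30, 5)]
for block in grimm_synteny_sketch(anchors, w=2, delta=2):
    print(block)
# [(0, 0), (1, 1), (2, 2), (3, 3)]
# [(10, 20), (11, 19), (12, 18)]
```

The lone anchor (30, 5) forms its own component and is dropped as noise; the second block runs against the diagonal, i.e., it is inverted relative to the first genome.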

120 Yet Another Synteny Block Generation. ST-Synteny(Genome, w, Δ), where w is the gap size and Δ is the minimum synteny block size. Define each gene (anchor) in Genome as a separate block and iteratively amalgamate the resulting blocks: amalgamate two adjacent blocks if they contain two genes that are separated by fewer than w genes in the other genome. Delete any short block containing < Δ elements.

121 Two Algorithms: Which One is “Better”? GRIMM-Synteny(Genome, w, Δ): represent Genome in 2-D and form a graph whose vertex set is the set of genes in 2-D; connect two vertices by an edge if the 2-D distance between them is < w; the connected components of the resulting graph define synteny blocks; delete small synteny blocks (length < Δ). ST-Synteny(Genome, w, Δ): define each gene in Genome as a separate block and iteratively amalgamate the resulting blocks; amalgamate two adjacent blocks if they contain two genes that are separated by fewer than w genes in the other genome; delete any short block containing < Δ elements.

122 GRIMM-Synteny on X chromosome (b): from anchors to synteny blocks

123 GRIMM-Synteny on X chromosome (c): from anchors to synteny blocks

124 GRIMM-Synteny on X chromosome (d): from anchors to synteny blocks. 11 synteny blocks; 176 microrearrangements within these blocks.

125 GRIMM-Synteny on X chromosome (e): from anchors to synteny blocks

126 GRIMM-Synteny on X chromosome (f): breakpoint graph

127 GRIMM-Synteny on X chromosome (g): breakpoint graph

128 GRIMM-Synteny on X chromosome (h): breakpoint graph

129 GRIMM-Synteny on X chromosome (i): breakpoint graph

130 GRIMM-Synteny: human-mouse breakpoint graph

131 Evidence for fragile regions (rearrangement hotspots) in mammalian evolution

132 GRIMM on X chromosome. GRIMM determines that the minimum number of rearrangements is 7 (the naked eye gives 6). There are numerous 7-step scenarios. The true scenario may have more than 7 steps.

133 GRIMM on X chromosome: breakpoint re-uses

134 GRIMM on ALL chromosomes. GRIMM determines that the minimum number of rearrangements is 245 (the naked eye gives 130). There are numerous 245-step scenarios. The true scenario may have more than 245 steps.

135 Breakpoint re-use: whole genome

136 Are There any Rearrangement Hotspots in the Human Genome?

137 Are There any Rearrangement Hotspots in the Human Genome? Theorem. Yes.

138 Are There any Rearrangement Hotspots in the Human Genome? Theorem. Yes. Proof: every rearrangement creates up to 2 breakpoints.

139 Are There any Rearrangement Hotspots in the Human Genome? Theorem. Yes. Proof: every rearrangement creates up to 2 breakpoints. If there were no breakpoint re-use, then after k rearrangements we would have 2k breakpoints.

140 Are There any Rearrangement Hotspots in the Human Genome? Theorem. Yes. Proof: every rearrangement creates up to 2 breakpoints. If there were no breakpoint re-use, then after k rearrangements we would have 2k breakpoints. Human-mouse comparison reveals 2k ≈ 260 breakpoints.

141 Are There any Rearrangement Hotspots in the Human Genome? Theorem. Yes. Proof: every rearrangement creates up to 2 breakpoints. If there were no breakpoint re-use, then after k rearrangements we would have 2k breakpoints. Human-mouse comparison reveals 2k ≈ 260 breakpoints. If there were no breakpoint re-use, how many rearrangements happened on the human-mouse evolutionary path?

142 Are There any Rearrangement Hotspots in the Human Genome? Theorem. Yes. Proof: every rearrangement creates up to 2 breakpoints. If there were no breakpoint re-use, then after k rearrangements we would have 2k breakpoints. Human-mouse comparison reveals 2k ≈ 260 breakpoints. If there were no breakpoint re-use, how many rearrangements happened on the human-mouse evolutionary path? #rearrangements = #breakpoints/2 = 260/2 = 130.

143 Are There any Rearrangement Hotspots in the Human Genome? Proof continues: if there were no breakpoint re-use, #rearrangements = #breakpoints/2 = 260/2 = 130.

144 Are There any Rearrangement Hotspots in the Human Genome? Proof continues: if there were no breakpoint re-use, #rearrangements = #breakpoints/2 = 260/2 = 130. The Hannenhalli-Pevzner theorem implies that there were at least 245 rearrangements on the human-mouse evolutionary path.

145 Are There any Rearrangement Hotspots in the Human Genome? Proof continues: if there were no breakpoint re-use, #rearrangements = #breakpoints/2 = 260/2 = 130. The Hannenhalli-Pevzner theorem implies that there were at least 245 rearrangements on the human-mouse evolutionary path. Is 245 larger than 130?

146 Are There any Rearrangement Hotspots in the Human Genome? Proof continues: if there were no breakpoint re-use, #rearrangements = #breakpoints/2 = 260/2 = 130. The Hannenhalli-Pevzner theorem implies that there were at least 245 rearrangements on the human-mouse evolutionary path. Is 245 larger than 130? Yes, 245 >> 130.

148 Are There any Rearrangement Hotspots in the Human Genome? Proof continues: if there were no breakpoint re-use, #rearrangements = #breakpoints/2 = 260/2 = 130. The Hannenhalli-Pevzner theorem implies that there were at least 245 rearrangements on the human-mouse evolutionary path. Is 245 larger than 130? Yes, 245 >> 130. There was vast breakpoint re-use: an argument against the random breakage model (according to scan statistics).

149 High breakpoint re-use provides evidence against the Random Breakage Model. Human/mouse comparison reveals that breakpoint regions (regions between consecutive synteny blocks) are small, accounting for ~5% of the genome, and that breakpoint re-use is very high, approximately 1.9 uses per breakpoint region on average. Mouse genome paper (Nature, 2002): “The analysis suggests that chromosomal breaks may have a tendency to reoccur in certain regions.”
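The counting argument of slides 137-149 can be written out as a short derivation. We assume here that every rearrangement makes two cuts and that the re-use rate is defined as the number of cuts per breakpoint region, r = 2d/b; with the slides' numbers, this definition reproduces the ≈1.9 figure above.

```latex
\begin{aligned}
b &\approx 260 && \text{human-mouse breakpoint regions}\\
d &\ge 245 && \text{Hannenhalli-Pevzner lower bound}\\
\text{no re-use} &\implies d = b/2 = 130 && \text{contradicts } d \ge 245\\
r &= \frac{2d}{b} \ge \frac{2 \cdot 245}{260} \approx 1.88 && \text{average uses per breakpoint region}
\end{aligned}
```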

150 Breakpoint re-use: X chromosome

152 Breakpoint re-use and the Nadeau-Taylor random breakage theory. Human genome paper (Nature, Feb. 2001): “[W]e estimate that true number of conserved segments is around 190-230, in good agreement with the original Nadeau-Taylor prediction.” Mouse genome paper (Nature, Dec. 2002): “The analysis suggests that chromosomal breaks may have a tendency to reoccur in certain regions.”

153 Breakpoint re-use: fragile breakage model. We postulate that mammalian genomes are mosaics of fragile regions with high propensity for rearrangements and solid regions with low propensity for rearrangements. We estimate > 360 fragile regions, of which ≈260 are observed as human-mouse breakpoint regions. The others may be revealed by comparison with several mammalian species when such data become available.

154 Breakpoint re-use: fragile breakage model

155 Random Breakage Theory re-re-visited. Sankoff and Trinh (2004) refuted this conclusion and suggested that RBM is correct.

156 Random Breakage Theory re-re-visited. First, we thank David Sankoff for an insightful criticism. “If you are not criticized, you may not be doing much.” (Donald Rumsfeld)

157 Random Breakage Theory re-re-visited. First, we thank David Sankoff for an insightful criticism. “If you are not criticized, you may not be doing much.” (Donald Rumsfeld) But how can one criticize a THEOREM???

158 Random Breakage Theory re-re-visited. “If you are not criticized, you may not be doing much.” (Donald Rumsfeld) But how can one criticize a THEOREM???

159 Are There any Rearrangement Hotspots in the Human Genome? Theorem. Yes. Proof: every rearrangement creates 2 breakpoints. If there were no breakpoint re-use, then after k rearrangements we would have 2k breakpoints. Human-mouse comparison reveals 2k ≈ 260 breakpoints. If there were no breakpoint re-use, how many rearrangements happened on the human-mouse evolutionary path? #rearrangements = #breakpoints/2 = 260/2 = 130.

160 Are There any Rearrangement Hotspots in the Human Genome? Theorem. Yes! Proof: … Is 245 larger than 130? Yes, 245 >> 130. …

161 Are There any Rearrangement Hotspots in the Human Genome? Theorem. Yes! Proof: … Is 245 larger than 130? Yes, 245 >> 130. … Sankoff did not question the validity of the proof; he questioned the validity of the numbers. The computed number of breakpoint regions (260) and the rearrangement distance (245) are parameter-dependent and may be wrong.

162 Sankoff-Trinh Argument. Sankoff and Trinh designed a simulation in which a series of random rearrangements created the appearance of rearrangement hotspots. How can it be???

163 Sankoff-Trinh Argument. Sankoff & Trinh designed a simulation in which a series of random rearrangements created the appearance of rearrangement hotspots. How can it be??? S&T emphasized the importance of synteny block generation and parameter choice. S&T argued that the breakpoint re-use we observed is caused by artifacts of parameter-dependent synteny block generation and micro-rearrangements.

164 Criticizing the Sankoff-Trinh Argument. Sankoff and Trinh designed a simulation in which a series of random rearrangements created the appearance of rearrangement hotspots. How can it be??? “Before you criticize people, you should walk a mile in their shoes. That way, when you criticize them, you are a mile away. And you have their shoes.” (J.K. Lambert)

165 Walking in Sankoff-Trinh Shoes. Sankoff and Trinh used a simple synteny block generation algorithm (ST-Synteny) and claimed that it is similar to GRIMM-Synteny. ST-Synteny indeed appears to be similar to GRIMM-Synteny. We reproduced Sankoff and Trinh’s simulation and their ST-Synteny algorithm.

166 Chromosome Simulation Based on the Random Breakage Model. Simulation(n, m, k, w): a uni-chromosomal genome of size n = 5000 "genes" (elements). Generate m = 150 random inversions without breakpoint re-use. Generate k micro-inversions of exactly w elements each (the inversion span), randomly placed throughout the genome. The result is a signed permutation π of n elements.
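
The simulation protocol can be sketched as follows. This is our hypothetical re-implementation, not Sankoff and Trinh's code: `simulate`, `invert`, and `count_breakpoints` are illustrative names, "no breakpoint re-use" is enforced by rejecting any inversion whose two cut points do not both fall on adjacencies that are still intact with respect to the identity, and the k micro-inversions are placed with no such restriction.

```python
import random


def invert(perm, i, j):
    """Signed inversion of perm[i:j]: reverse the segment and flip every sign."""
    perm[i:j] = [-x for x in reversed(perm[i:j])]


def is_adjacency(x, y):
    """(x, y) is unbroken w.r.t. the identity if y == x + 1 (also covers -(a+1), -a)."""
    return y - x == 1


def count_breakpoints(perm):
    """Number of broken adjacencies, framing the permutation with 0 and n + 1."""
    framed = [0] + perm + [len(perm) + 1]
    return sum(not is_adjacency(x, y) for x, y in zip(framed, framed[1:]))


def simulate(n=5000, m=150, k=0, w=5, seed=None):
    """Hypothetical re-implementation of Simulation(n, m, k, w) on a single chromosome."""
    rng = random.Random(seed)
    perm = list(range(1, n + 1))
    done = 0
    while done < m:  # assumes m is well below n / 2, otherwise this may never finish
        i, j = sorted(rng.sample(range(n + 1), 2))  # cut points; cut c sits before perm[c]
        framed = [0] + perm + [n + 1]              # framed[c], framed[c + 1] flank cut c
        # reject any inversion that would cut an existing breakpoint (no re-use)
        if is_adjacency(framed[i], framed[i + 1]) and is_adjacency(framed[j], framed[j + 1]):
            invert(perm, i, j)
            done += 1
    for _ in range(k):  # micro-inversions of exactly w elements, re-use allowed
        start = rng.randrange(n - w + 1)
        invert(perm, start, start + w)
    return perm
```

With k = 0, `count_breakpoints(simulate(m=150, k=0, seed=1))` returns exactly 2m = 300: every accepted inversion breaks two intact adjacencies and leaves all others unchanged. The micro-inversions are what can make random breakage look like re-use.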

167 Sankoff-Trinh Synteny Block Generation. ST-Synteny(π, w, Δ): Define each element of π as a block and iteratively amalgamate the resulting blocks. Amalgamate two adjacent blocks if one block contains element i or -i, the other block contains j or -j, and |i - j| ≤ w (signs are ignored). Delete any short block containing < Δ elements (Δ = 3). Assign signs to the remaining blocks according to the majority-sign rule.

168 Sankoff-Trinh Synteny Block Generation. ST-Synteny(Genome, w, Δ), where w is the gap size and Δ is the minimum synteny block size: Define each gene (anchor) in Genome as a separate block and iteratively amalgamate the resulting blocks. Amalgamate two adjacent blocks if they contain two genes that are separated by fewer than w genes in the other genome. Delete any short block containing < Δ elements.
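
The ST-Synteny rules on these two slides can be sketched in Python. This is a hypothetical reading, not Sankoff and Trinh's code: `st_synteny` and `_mergeable` are illustrative names, amalgamation uses a stack until no two adjacent blocks can merge, short blocks are deleted once without re-amalgamating the survivors, and ties in the majority-sign rule go to + (all three are our assumptions).

```python
def _mergeable(vals_a, vals_b, w):
    """True if some i in one block and j in the other satisfy |i - j| <= w (signs ignored)."""
    small, large = (vals_a, vals_b) if len(vals_a) <= len(vals_b) else (vals_b, vals_a)
    return any(v + d in large for v in small for d in range(-w, w + 1))


def st_synteny(perm, w, delta=3):
    """Hypothetical sketch of ST-Synteny(pi, w, delta); returns (elements, sign) per block."""
    stack = []  # each entry: [signed elements in genome order, set of absolute values]
    for x in perm:
        stack.append([[x], {abs(x)}])
        # amalgamate the two rightmost blocks for as long as the rule allows
        while len(stack) >= 2 and _mergeable(stack[-2][1], stack[-1][1], w):
            top_elems, top_vals = stack.pop()
            below = stack[-1]
            below[0].extend(top_elems)         # keep genome order
            if len(below[1]) < len(top_vals):  # merge the smaller set into the larger
                below[1], top_vals = top_vals, below[1]
            below[1] |= top_vals
    kept = [elems for elems, _ in stack if len(elems) >= delta]  # delete short blocks once
    # majority-sign rule (ties broken toward +, an assumption)
    return [(elems, 1 if 2 * sum(x > 0 for x in elems) >= len(elems) else -1) for elems in kept]
```

On the one-inversion genome 1…100 -200…-101 201…300 with w = 1, this sketch returns a single block of 300 elements; on 100 101 200 102 103 104 300 105 106 107 it returns [102 103 104] and [105 106 107]. Both outcomes match the behavior described for ST-Synteny Flaws I and II.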

169 Pevzner-Tesler Synteny Block Generation. GRIMM-Synteny(Genome, w, Δ), where w is the gap size and Δ is the minimum synteny block size: Represent Genome in 2-D and form a graph whose vertex set is the set of genes in 2-D. Connect two vertices by an edge if the 2-D distance between them is < w. The connected components of the resulting graph define synteny blocks. Delete small synteny blocks (length < Δ).
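
A minimal sketch of the GRIMM-Synteny idea as stated on this slide, under our own assumptions: each gene is a point (position in genome 1, position in genome 2), the 2-D distance is the L1 (Manhattan) distance, orientation is ignored, and "length" is the number of anchors in a component. The actual GRIMM-Synteny may measure these quantities differently, so treat `grimm_synteny_sketch` as a hypothetical illustration; its all-pairs loop is quadratic and only suitable for small inputs.

```python
def grimm_synteny_sketch(points, w, delta):
    """Connected components of the graph joining 2-D points closer than w (L1 distance)."""
    parent = list(range(len(points)))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for a in range(len(points)):
        for b in range(a + 1, len(points)):
            (x1, y1), (x2, y2) = points[a], points[b]
            if abs(x1 - x2) + abs(y1 - y2) < w:  # edge if the 2-D distance is < w
                parent[find(a)] = find(b)

    components = {}
    for a in range(len(points)):
        components.setdefault(find(a), []).append(a)
    # delete small synteny blocks (here: fewer than delta anchors)
    return [members for members in components.values() if len(members) >= delta]


# one-inversion example: genome 2 = 1..100, -200..-101, 201..300
genome2 = list(range(1, 101)) + [-g for g in range(200, 100, -1)] + list(range(201, 301))
points = [(abs(g), pos) for pos, g in enumerate(genome2, start=1)]  # (gene, position in genome 2)
print(len(grimm_synteny_sketch(points, w=3, delta=3)))  # 3
```

Unlike amalgamation by element values alone, the 2-D distance keeps genes 100 and 101 apart (their points are 101 units apart), so the three blocks and two breakpoints survive.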

172 Comparing Two Algorithms. GRIMM-Synteny(Genome, w, Δ): Represent Genome in 2-D and form a graph whose vertex set is the set of genes in 2-D. Connect two vertices by an edge if the 2-D distance between them is < w. The connected components of the resulting graph define synteny blocks. Delete small synteny blocks (length < Δ). ST-Synteny(Genome, w, Δ): Define each gene in Genome as a separate block and iteratively amalgamate the resulting blocks. Amalgamate two adjacent blocks if they contain two genes that are separated by fewer than w genes in the other genome. Delete any short block containing < Δ elements. The algorithms look very similar, but do they produce similar results?

173 ST-Synteny vs. GRIMM-Synteny on the Human/Mouse X Chromosome. GRIMM-Synteny(Genome, 380, 380) vs. ST-Synteny(Genome, 378, 380): number of blocks 44 vs. 10; total block length 95.3 Mb vs. 139.8 Mb; breakpoint regions 37.98% vs. 9.05%.

177 ST-Synteny Flaw I. ST-Synteny amalgamates blocks that should be separate. Identity permutation (genome 1): 1…100 101…200 201…300. With one inversion (genome 2): 1…100 -200…-101 201…300, clearly 3 blocks and 2 breakpoints. But ST-Synteny with w ≥ 1 turns [1…100] [-200…-101] [201…300] into [1…100 -200…-101] [201…300] and then into [1…100 -200…-101 201…300]: the 2-D distance is ignored.

178 ST-Synteny Flaw I (figure: synteny blocks found by GRIMM-Synteny and by ST-Synteny for the permutation -3 2 -1 -5 4, plotted as hypothetical genome 1 vs. hypothetical genome 2)

189 ST-Synteny Flaw II. Small deleted blocks may interrupt a large block. Consider … 100 101 200 102 103 104 300 105 106 107 … Intuitively, this is one large block [100 101 102 103 104 105 106 107] with two small blocks [200] and [300] that may be deleted. But ST-Synteny with small w and Δ = 3 yields … [100 101] [200] [102 103 104] [300] [105 106 107] …, and after the short blocks are deleted, … [102 103 104] [105 106 107] …, i.e., two blocks but no breakpoint. There is no way to connect the two blocks since there are no consecutive elements.

191 GRIMM-Synteny on the Human/Mouse X Chromosome. GRIMM-Synteny agrees with the synteny blocks generated by other methods, except …

192 ST-Synteny Flaw II. GRIMM-Synteny vs. ST-Synteny: number of blocks 44 vs. 10; total block length 95 Mb vs. 140 Mb; breakpoint regions 38% vs. 9%; breakpoint re-use 1.97 vs. 1.64. Changing parameters does not help ST-Synteny.

193 ST-Synteny Flaw III. ST-Synteny is not even symmetric, i.e., the number of synteny blocks between human and mouse may differ from the number of synteny blocks between mouse and human.

194 ST-Synteny Results in Much Higher Breakpoint Reuse than GRIMM-Synteny. This is an artifact of ST-Synteny rather than an argument against the Fragile Breakage Model.

195 Higher Breakpoint Reuse Rate Explained by Larger Breakpoint Regions

196 ST-Synteny Deletes Elements and Small Blocks Much Faster

197 ST-Synteny vs. GRIMM-Synteny. Simulation(n = 5000, m = 15, k = 500, w = 5): breakpoint re-use rate r = 1.31 (ST-Synteny) vs. 1.09 (GRIMM-Synteny).

199 Multi-Chromosome Genome Simulation with the Random Breakage Model. Previous simulations ignored the lengths of anchors. Use the human coordinates of the human/mouse alignment anchors. Concatenate all chromosomes (randomly oriented). Generate 150 inversions/translocations at random locations, simulating 300 blocks. Generate k micro-inversions at random locations with inversion span between 1 and W. Apply GRIMM-Synteny and compute the breakpoint re-use rate (BRR) with G, C = 1 Mb.
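
Two steps of this protocol, random orientation of chromosomes before concatenation and micro-inversions with a random span between 1 and W, can be sketched in Python. The function names are hypothetical and, for simplicity, spans are counted in anchors, whereas the slide's simulation works in human coordinates; the 150 large rearrangements and the GRIMM-Synteny/BRR step are not reproduced here.

```python
import random


def concatenate_randomly_oriented(chromosomes, rng):
    """Join chromosomes into one sequence, flipping each with probability 1/2."""
    genome = []
    for chrom in chromosomes:
        if rng.random() < 0.5:
            chrom = [-a for a in reversed(chrom)]  # reversed, sign-flipped orientation
        genome.extend(chrom)
    return genome


def add_micro_inversions(genome, k, max_span, rng):
    """Apply k inversions at random locations, each spanning 1..max_span anchors."""
    genome = list(genome)
    for _ in range(k):
        span = rng.randint(1, max_span)  # assumes max_span <= len(genome)
        start = rng.randrange(len(genome) - span + 1)
        genome[start:start + span] = [-a for a in reversed(genome[start:start + span])]
    return genome


rng = random.Random(0)
chroms = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]  # toy anchors, numbered in human order
g = add_micro_inversions(concatenate_randomly_oriented(chroms, rng), k=3, max_span=2, rng=rng)
print(sorted(abs(a) for a in g))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Random orientation matters because concatenating chromosomes in a fixed orientation would create artificial adjacencies at chromosome ends.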

201 Random Breakage Model re-re-re-visited. Sankoff & Trinh (2004) emphasized the importance of accurate synteny block generation but fell victim to their own flawed ST-Synteny algorithm; they never applied ST-Synteny to real data. Peng et al., 2006 (PLOS Computational Biology): if Sankoff & Trinh fixed their ST-Synteny algorithm, they would confirm rather than reject Pevzner-Tesler's Fragile Breakage Model. Sankoff, 2006 (PLOS Computational Biology): "Not only did we foist a hastily conceived and incorrectly executed simulation on an overworked RECOMB conference program committee, but worse—nostra maxima culpa—we obliged a team of high-powered researchers to clean up after us!" ("nostra maxima culpa" is Latin for "it's all our fault")

202 All Recent Studies Support FBM. Kikuta et al., Genome Res. 2007: "... the Nadeau and Taylor hypothesis is not possible for the explanation of synteny in rat."

203 Sankoff-Trinh Argument. "It is much easier to be critical than correct." (Benjamin Disraeli)

204 Where Are the Rearrangement Hotspots Located? We demonstrated the existence of rearrangement hotspots but did not answer the question of where they are. We joined forces with Harris Lewin and Bill Murphy, who formed the "Mammalian Genomic Architectures" consortium, to find fragile regions in the human genome. The results (and the preliminary answer to the question above) are reported in Murphy et al., Science, 2005.

205 Where Are the Rearrangement Hotspots Located? We demonstrated the existence of rearrangement hotspots but did not answer the question of where they are. We presented a preliminary answer in Murphy et al., Science, 2005 (joint work with the "Mammalian Genomic Architectures" consortium). Many groups are currently trying to identify all fragile regions in mammalian genomes (Alekseyev and Pevzner, Genome Biology, 2010).

206 Turnover Fragile Breakage Model. Recent studies reveal evidence for the "birth and death" of fragile regions, implying that they move to different locations in different lineages. This discovery resulted in the Turnover Fragile Breakage Model (TFBM), which accounts for the "birth and death" of fragile regions and sheds light on a possible relationship between rearrangements and Matching Segmental Duplications. TFBM points to the locations of the currently fragile regions in the human genome.

207 Tests vs. Models. Why did biologists believe in RBM for 20 years? Because RBM implies the exponential distribution of the sizes of the blocks observed in real genomes. A flaw in this logic: RBM is not the only model that complies with the "exponential distribution" test. Why was RBM refuted? Because RBM does not comply with the "breakpoint reuse" test: RBM implies low reuse, but real genomes reveal high reuse. FBM complies with both the "exponential distribution" and the "breakpoint reuse" tests. But is there a test that both RBM and FBM fail? Model vs. test (exponential distribution / breakpoint reuse): RBM: YES / NO; FBM: YES / YES.

208 Tests vs. Models. RBM and FBM both fail the Multispecies Breakpoint Reuse (MBR) test. Model vs. test (exponential distribution / breakpoint reuse / MBR): RBM: YES / NO / NO; FBM: YES / YES / NO.

209 Tests vs. Models. TFBM passes all three tests. Model vs. test (exponential distribution / breakpoint reuse / MBR): RBM: YES / NO / NO; FBM: YES / YES / NO; TFBM: YES / YES / YES.

210 Implications of TFBM Where are the (currently) Fragile Regions in the Human genome?

211 Predictive Power of TFBM. Can we determine the currently active fragile regions in the human genome H from comparison with other mammalian genomes? RBM provides no clue. FBM suggests considering the breakpoints between H and any other genome. TFBM suggests considering the closest genome, such as the macaque-human ancestor QH. Breakpoints in G(QH, H) are likely to be reused in future rearrangements of H.

212 Validation of Predictions for the Macaque-Human Ancestor (QH, H). Prediction of fragile regions on (QH, H) based on the mouse, rat, and dog genomes: using the mouse genome M as a proxy, accuracy 34/552 ≈ 6%; using the mouse-rat-dog ancestor genome MRD, accuracy 18/162 ≈ 11%; using the macaque genome Q, accuracy 10/68 ≈ 15% (using synteny blocks larger than 500 kb).

213 Putative Active Fragile Regions in the Human Genome

214 Unsolved Mystery: What Causes Fragility? Zhao and Bourque (Genome Res. 2009) suggested that fragility is promoted by Matching Segmental Duplications (MSDs), pairs of long similar regions located within the breakpoint regions flanking a rearrangement. TFBM is consistent with this hypothesis: the similarity between MSDs deteriorates with time, implying that MSDs are also subject to a "birth and death" process.

215 Chromosome X: two-way similarities (PatternHunter), synteny blocks (GRIMM-Synteny), rearrangement scenario (GRIMM + MGR)

216 Acknowledgements: Guillaume Bourque (U. Montreal), Jerry Greenberg (SDSC), Michael Kamal (MIT), Uri Keich (UCSD), Bill Murphy (National Cancer Institute), Stephen O'Brien (National Cancer Inst.), Bin Ma (U. of Western Ontario), Colin Collins (UCSF Cancer Center), Stas Volik (UCSF Cancer Center), David Sankoff (U. Ottawa)

219 Collaborators: Qian Peng (walked in Sankoff's shoes); Glenn Tesler (GRIMM/GRIMM-Synteny); David Sankoff (emphasized potential pitfalls of synteny block generation and rearrangement analysis)

220 Acknowledgements. Rearrangement-based phylogeny: Guillaume Bourque (Genomics Institute of Singapore); Rearrangements in tumors: Ben Raphael (UCSD), Colin Collins and Stas Volik (UCSF); Mammalian Genomic Architectures Consortium: Bill Murphy (Texas A&M), Harris Lewin (Illinois) …; Mouse Consortium: Mike Kamal (Broad) …; Rat Consortium: Bin Ma (Western Ontario) …; Chicken Consortium: Pierre Bork, Evgeny Zdobnov (EMBL) …; Dog: Gregor Adelfinger (U. of Montreal); Cat: Bill Murphy and Stephen O'Brien (NCI)

221 Collaborators. Rearrangement-based phylogeny: Guillaume Bourque (Genomics Institute of Singapore); Rearrangements in tumors: Ben Raphael (UCSD), Colin Collins and Stas Volik (UCSF Cancer Center); Mammalian Genomic Architectures Consortium: Bill Murphy (Texas A&M), Harris Lewin (Illinois) …; Mouse: Mike Kamal (Broad) …; Rat: Bin Ma (Western Ontario) …; Chicken: Pierre Bork and Evgeny Zdobnov (EMBL); Dog: Gregor Adelfinger (U. of Montreal); Cat: Bill Murphy, Stephen O'Brien (NCI)

222 Reconstructing the Genomic Architecture of Tumor Genomes. 1) Pieces of the tumor genome: clones (100-250 kb). 2) Sequence the ends of the clones (500 bp). 3) Map the end sequences to the human genome. Each clone corresponds to a pair of end sequences (ES pair) (x, y).
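
ES pairs whose ends map as an ordinary clone would carry no rearrangement signal; the informative ones are the rest (the "invalid" pairs). A sketch of that test follows, assuming (our assumption, not stated on the slide) that a valid pair maps to one chromosome in convergent orientation with its ends 100-250 kb apart. `EndHit` and `is_valid_es_pair` are hypothetical names.

```python
from dataclasses import dataclass

MIN_CLONE_BP, MAX_CLONE_BP = 100_000, 250_000  # clone lengths quoted on the slide


@dataclass(frozen=True)
class EndHit:
    chrom: str   # human chromosome the end sequence maps to
    pos: int     # mapped coordinate in bp
    strand: str  # '+' or '-'


def is_valid_es_pair(x: EndHit, y: EndHit) -> bool:
    """True if the ES pair looks like an unrearranged clone of the human genome."""
    if x.chrom != y.chrom:
        return False  # ends on different chromosomes
    left, right = (x, y) if x.pos <= y.pos else (y, x)
    if (left.strand, right.strand) != ("+", "-"):
        return False  # assumed convergent orientation violated
    return MIN_CLONE_BP <= right.pos - left.pos <= MAX_CLONE_BP


print(is_valid_es_pair(EndHit("chr1", 1_000_000, "+"), EndHit("chr1", 1_150_000, "-")))  # True
```

A pair that fails any check is a candidate marker of a rearrangement in the tumor, such as a translocation (different chromosomes) or an inversion (unexpected orientation).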

223 Tumor Genome Reconstruction Puzzle. The human genome (known) became the tumor genome (unknown) through an unknown sequence of rearrangements. Mapping the ES pairs to the human genome gives their locations in the human genome (known). Goal: reconstruct the tumor genome. (figure: segments A-E in the human and tumor genomes, with ES pair ends x1…x5 and y1…y5)

224 Tumor Genome Reconstruction (figure: tumor genome segments drawn against the human genome)

226 Tumor Genome Reconstruction (figure: the tumor genome with ES pairs (x1, y1), (x2, y2), (x3, y3), (x4, y4) marked)

227 2D Representation of ESP Data. Each point is an ES pair (figure: ESP plot of the ES pairs (x1, y1) … (x4, y4) against the human genome). Can we reconstruct the tumor genome from the positions of the ES pairs?

236 Reconstructed Tumor Genome (figure: the tumor genome reconstructed from the ESP plot as an arrangement of human segments A-E)

237 Real data are noisy and incomplete!

238 Breast Cancer MCF7 Cell Line (figure: human chromosomes vs. MCF7 chromosomes): 5 inversions, 15 translocations (Raphael et al. 2003).

239 Complications with MCF7: chromosomes 1, 3, 17, and 20. 33/70 clusters; total length: 31 Mb.

240 Rearrangement Signatures (figure: segment orders in the human and tumor genomes illustrating the signatures of an inversion, a translocation, and a duplication/transposition)

241 Complex Tumor Genomes

242 Structure of Duplications in Tumors? The mechanisms are not well understood. Duplicated segments may co-localize (Guan et al., Nat. Genet. 1994). (figure: human genome vs. tumor genome)

243 Tumor Amplisomes

244 Reconstructed MCF7 Amplisome (figure; colors indicate chromosomes 17, 20, 1, and 3): 33 clusters, total length 31 Mb. The reconstruction explains 24/33 invalid clusters (Raphael and Pevzner, 2004).

245 Tumor Genome Projects. 1) Identify recurrent aberrations. 2) Identify the temporal sequence of aberrations. 3) Use these data for tumor diagnostics and therapeutics. (figure: the human genome giving rise to several tumor genomes through mutation and selection)

248 Sequencing Tumor Clones Confirms Complex Mosaic Structure. Volik et al., Decoding the fine-scale structure of breast cancer genome and transcriptome: Implications for Tumor Genome Project, Genome Res., 2006. Hampton et al., A sequence-level map of chromosome breakpoints yields insights into the evolution of cancer genome, Genome Res., 2008 (157 breakpoints found using next-generation sequencing).

249 Collaborators: Qian Peng (walked in Sankoff's shoes); Glenn Tesler (GRIMM/GRIMM-Synteny); Ben Raphael (tumor genomes); David Sankoff (emphasized potential pitfalls of synteny block generation and rearrangement analysis)

