Download presentation
Presentation is loading. Please wait.
Published byJacob Scott Modified over 9 years ago
1
Reconstructing Genomic Architectures of Tumor Genomes Pavel Pevzner and Ben Raphael Department of Computer Science & Engineering University of California, San Diego (joint work with Colin Collins lab at UCSF Cancer Center)
2
Chromosome Painting: Normal Cells
3
Chromosome Painting: Tumor Cells
4
Rearrangements in Tumors Gleevec TM (Novartis 2001) targets BCR-ABL oncogene. Change gene structure and regulatory “wiring” of the genome. Create “bad” novel fusion genes and break “good” old genes. Example: translocation in leukemia. promoter ABL gene BCR genepromoter Chromosome 9 Chromosome 22 BCR-ABL oncogene
5
Complex Tumor Genomes 1)What are detailed architectures of tumor genomes? 2)What rearrangements/duplications produce these architectures and what is the order of these events? 3)What are the novel fusion genes and old “broken” genes?
6
What are the the “architectural blocks” forming the existing genomes and how to find them? What is the architecture of the ancestral genome? What is the evolutionary scenario for transforming one genome into the other? Unknown ancestor ~ 80 million years ago Mouse (X chrom.) Human (X chrom.) Genome rearrangements
7
History of Chromosome X Rat Consortium, Nature, 2004
8
Inversions Blocks represent conserved genes. In the course of evolution or in a clinical context, blocks 1,2,…,10 could be misread as: 1, 2, 3, -8, -7, -6, -5, -4, 9, 10. 1 32 4 10 5 6 8 9 7 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
9
Inversions 1 32 4 10 5 6 8 9 7 1, 2, 3, -8, -7, -6, -5, -4, 9, 10 Blocks represent conserved genes. In the course of evolution or in a clinical context, blocks 1,…,10 could be misread as 1, 2, 3, -8, -7, -6, -5, -4, 9, 10. Evolution: occurred one-two times every million years. Cancer: may occur every month.
10
Inversions 1 32 4 10 5 6 8 9 7 1, 2, 3, -8, -7, -6, -5, -4, 9, 10 The inversion introduced two breakpoints (disruptions in order).
11
Turnip vs Cabbage: Different Gene Order
15
Gene order comparison: Before After Evolution is manifested as the divergence in gene order
16
Human-Mouse-Rat Phylogeny Bourque et al., Genome Research, 2004
17
“Comparative Genomics” of Cancer Tumor genomeHuman genome 1)Identify recurrent aberrations Mitelman Database, >40,000 aberrations 2)Identify temporal sequence of aberrations Linear model: Colorectal cancer (Vogelstein, 1988): -5q 12p* -17p -18q Tree model: (Desper et al.1999) Mutation, selection Tumor genome 2 Tumor genome 4 Tumor genome 3
18
Measuring Structural Changes in Tumors: Cytogenetics Weakness: Physical location of chromosomal junctions not revealed. Low resolution. No/little information about copy number changes. Requires metaphase chromosomes. Directly visualize (fluorescently) labeled chromosomes. Chromosome banding, mFISH, SKY
19
Measuring Copy Number Changes in Tumors: CGH, array CGH Weakness: No information about structural rearrangements (inversions, translocations) or about the positions of duplicated material.
20
End Sequence Profiling (ESP) C. Collins et al. (UCSF Cancer Center) 1)Pieces of tumor genome: clones (100-250kb). Human DNA 2) Sequence ends of clones (500bp). 3) Map end sequences to human genome. Tumor DNA Each clone corresponds to pair of end sequences (ES pair) (x,y). yx
21
Order ES pair such that x < y. ES pair (x,y) is –valid if x,y on same chromosome. and l ≤ y – x ≤ L, min (max) size of clone. x, y have opposite, convergent orientations –invalid, otherwise. Results from rearrangement or experimental “noise”. x2x2 y2y2 x3x3 x4x4 y1y1 x5x5 y5y5 y4y4 y3y3 x1x1 L ES Pairs yx
22
Human genome (known) Tumor genome (unknown) Unknown sequence of rearrangements Location of ES pairs in human genome. (known) Map ES pairs to human genome. -C -D EA B B CEA D x2x2 y2y2 x3x3 x4x4 y1y1 x5x5 y5y5 y4y4 y3y3 x1x1 Tumor Genome Reconstruction Puzzle Reconstruct tumor genome
23
BCEAD -C -D E A B Tumor Human Tumor Genome Reconstruction
24
BCEAD -C -D E A B Tumor Human Tumor Genome Reconstruction
25
BCEAD -C -D E A B Tumor (x 2,y 2 ) (x 3,y 3 ) (x 4,y 4 ) (x 1,y 1 ) y 4 y 3 x 1 x 2 x 3 x 4 y 1 y 2 Tumor Genome Reconstruction
26
B C E A D Human BCEAD 2D Representation of ESP Data Each point is ES pair. Can we reconstruct the tumor genome from the positions of the ES pairs? (x 2,y 2 ) (x 3,y 3 ) (x 4,y 4 ) (x 1,y 1 ) ESP Plot Human
27
B C E A D BCEAD 2D Representation of ESP Data Each point is ES pair. Can we reconstruct the tumor genome from the positions of the ES pairs?
28
B C E A D Human BCEAD 2D Representation of ESP Data Each point is ES pair. Can we reconstruct the tumor genome from the positions of the ES pairs?
29
B C E A D Human BCEAD 2D Representation of ESP Data Each point is ES pair. Can we reconstruct the tumor genome from the positions of the ES pairs?
30
B C E A D Human BCEAD 2D Representation of ESP Data Each point is ES pair. Can we reconstruct the tumor genome from the positions of the ES pairs?
31
B C E A D Human BCEAD 2D Representation of ESP Data Each point is ES pair. Can we reconstruct the tumor genome from the positions of the ES pairs?
32
B C E A D Human BCEAD 2D Representation of ESP Data Each point is ES pair. Can we reconstruct the tumor genome from the positions of the ES pairs?
33
B C E A D Human BCEAD 2D Representation of ESP Data Each point is ES pair. Can we reconstruct the tumor genome from the positions of the ES pairs?
34
B C E A D Human BCEAD 2D Representation of ESP Data Each point is ES pair. Can we reconstruct the tumor genome from the positions of the ES pairs?
35
B C E A D Human B -D E A DAC E -C B -D EA B Reconstructed Tumor Genome
36
Real data noisy and incomplete!
37
Computational Framework Use knowledge of known rearrangement mechanisms –e.g. inversions, translocations, etc. Find simplest explanation for data, given these mechanisms. Motivation: Sorting by Reversals
38
G = [0,M], unichromosomal genome. Reversal s,t (x)= x, if x t, t – (x – s), otherwise. Given: ES pairs (x 1, y 1 ), …, (x n, y n ) Find: Minimum number of reversals s1,t1, …, sn, tn such that if = s1,t1 … sn, tn then ( x 1, y 1 ), …, ( x n, y n ) are valid ES pairs. x1x1 y1y1 G ’ = G x1x1 y1y1 G BCA -BA x2x2 y2y2 x2x2 y2y2 t s ESP Sorting Problem
39
All ES pairs valid. t s Sequence of reversals. st x1x1 y1y1 x1x1 y1y1 BCA -C -B A y3y3 x3x3 y2y2 y3y3 t s x3x3 x2x2 y2y2 x2x2
40
Sources of Invalid Pairs 1.Chimeric clone: random joining of pieces from tumor genome (noise!). rarely use DNA from same genomic regions. Corresponds to isolated, invalid ES pair (x,y): d(x,x’) + d(y,y’) > 2L for all (x’,y’) ≠ (x,y). 2.Composite clone: contain rearrangement breakpoint Give clusters of invalid BES pairs. human y1y1 x2x2 x3x3 y3y3 y2y2 x1x1 y1y1 x2x2 x3x3 y3y3 y2y2 x1x1 tumor
41
ESP Genome Reconstruction: Discrete Approximation 1.Remove isolated invalid ES pairs (x,y). 2.Divide genome into synteny blocks using clusters C 1, …, C k Locations of block “junctions” underdetermined: l ≤ | x i – s| + | y i – t | ≤ L ACEBGDF C2 C3 C1 (x 2, y 2 ) (x 3, y 3 ) (x 1, y 1 ) t s
42
BCDEFGA B CDEFG A C1: ACEBGDF C2 C3 C1 Discrete Approximation C3: C2: B CDEFG AB CDEFG A -CF-DB-EGA Superimpose graphs Signed permutation of A B C D E F G Tumor genome:
43
Reversals (also called inversions) Classically, blocks represent conserved genes. In the course of evolution or in a clinical context, blocks 1,…,10 could be misread as 1, 2, 3, -8, -7, -6, -5, -4, 9, 10. Clinical: occurs in many cancers. Evolution: occurred about once-twice every million years on the evolutionary path between human and mouse. 1 32 4 10 5 6 8 9 7 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
44
Reversals (also called inversions) 1 32 4 10 5 6 8 9 7 1, 2, 3, -8, -7, -6, -5, -4, 9, 10 Classically, blocks represent conserved genes. In the course of evolution or in a clinical context, blocks 1,…,10 could be misread as 1, 2, 3, -8, -7, -6, -5, -4, 9, 10. Clinical: occurs in many cancers. Evolution: occurred one-two times every million years on the evolutionary path between human and mouse.
45
Reversals (also called inversions) 1 32 4 10 5 6 8 9 7 1, 2, 3, -8, -7, -6, -5, -4, 9, 10 The inversion introduced two breakpoints (disruptions in order).
46
Sorting by reversals (Sankoff et al.1990): Sorting signed permutations Reversal (i,j): (i,j) = 1 … i-1 - j... - i j+1 … n Signed permutation: = 1 2 … n Given , find a series of reversals 1, …, t such that 1 2 …. t = (1, 2, …, n) and t is minimal.
47
Sorting by reversals Most parsimonious scenarios The reversal distance is the minimum number of reversals required to transform into . Here, the reversal distance is d=4.
48
Breakpoint graph Breakpoint Graph (Bafna and Pevzner, FOCS 94) DualityTheorem (Hannenhalli-Pevzner, STOC 1995): d = n + 1 – c + h + f where c = # cycles; h,f are rather complicated, but can be computed from graph in polynomial time. Here, d = 8 + 1 – 5 + 0 + 0 = 4
49
Breakpoint graph DualityTheorem for Sorting by Reversals – simple but imprecise version. reversal distance = number of elements + 1 – number of cycles
50
Complexity of reversal distance
51
GRIMM-Synteny on X chromosome From anchors to synteny blocks
52
GRIMM-Synteny on X chromosome 2-dimensional breakpoint graph
56
Breakpoint Graph Signed permutation = A -C F -D B -E G start-CF-DB-EGendA Black edges: adjacent elements of Gray edges: adjacent elements of i = A B C D E F G For = 1 2 … n, d( ) = n+1 - c( ) + h( ) + f( ) Discrete approximation to ESP constructs breakpoint graph from clusters.
57
Multichromosomal Extension Concatenate chromosomes Translocations modeled by reversals in concatenate Fissions/fusions modeled by reversals with “empty” chromosomes Minimal sequence polynomial time ( Hannenhalli & Pevzner 1995, Tesler 2003, Ozery-Flato and Shamir, 2003. ) B2A1 B2A1 -A2-B1 A2B1 A2A1 B2B1 A2A1-B2-B1 concatenation reversal concatenation translocation
58
Breast Cancer Tumor Genome MCF7 is human breast cancer cell line. Cytogenetic analysis (low-resolution) suggests complex architecture. –Many translocations, inversions. From Kytölä, et.al. Genes, Chroms & Cancer 28:308-317 (2000).
59
ESP Data from MCF7 tumor genome Each point (x,y) is ES pair. (Concatenation of 23 human chromosomes) 15005 clones (Oct. 2003) 11240 ES pairs 10453 valid (black) 737 invalid 489 isolated (red) 248 form 70 clusters (blue)
60
Breast Cancer MCF7 Cell Line Human chromosomesMCF7 chromosomes 5 inversions 15 translocations Raphael et al. 2003.
61
Sparse Data Assumptions tumor 1.Each cluster results from single reversal. 2. Each clone contains at most one breakpoint. human y1y1 x2x2 x3x3 y3y3 y2y2 x1x1 y1y1 x2x2 x3x3 y3y3 y2y2 x1x1 tumor
62
33/70 clusters Total length: 31Mb Complications with MCF7: Chromosomes 1,3,17, 20
63
s A t C-B s A t CB inversion Human u A B u A w DBCD v E w CD v E duplication/ transposition u AB w C v E ???? Rearrangement Signatures Tumor s A t -B s A t -CBDCD translocation
64
Complex Tumor Genomes
65
Structure of Duplications in Tumors? Mechanisms not well understood. Human genome Tumor genome Duplicated segments may co-localize (Guan et al. Nat.Gen.1994)
66
Tumor Amplisomes
67
v w u Analyzing Duplications ABCDE u AB w CD v E duplication duple v,w are boundary elements of duple (-,+) (+,-) (+,+) (-,-)
68
Analyzing Duplications v D w u u A B u A w DBCD v E w CD v E duplication A B CDE Path between boundary elements resolves duple. (-,+) (+,-) (+,+) (-,-) AB
69
v w u Duplication Complications u AB w C v E ???? These configurations are frequent in MCF7 ESP data. (-,+) (+,-) (+,+) (-,-)
70
v w u Duplication Complications u AB These configurations are frequent in MCF7 ESP data. (-,+) (+,-) (+,+) (-,-) CD ?? u AB w C v D
71
v w u Resolving Duplication Complications (-,+) (+,-) (+,+) (-,-) u AB u AB wv ECD Path between boundary elements resolves duple. wv
72
v w u Resolving Duplication Complications Multiple paths between duple boundary elements. (-,+) (+,-) (+,+) (-,-) u AB u AB w C v E wv
73
Many Paths in MCF7!
74
Duplication by Amplisome Gives single model for all duplications
75
Amplisome Reconstruction Problem Approach 1.Identify duplicated sequences A 1, …, A m 2.Amplisome is shortest common superstring of A 1, …, A m Assume 1.Tumor genome sequence is known. 2.Insertions are independent, –i.e. no insertions within insertions
76
ESP Graph Black edges: (1)ES pairs. (2)Adjacent blocks in human genome. Red edges: blocks in human genome. Alternating (red/black) path Tumor genome architecture
77
Approach –Identify duplicated sequences A 1, …, A m –Amplisome is shortest common superstring of A 1, …, A m Amplisome Reconstruction Problem Assume 1.Tumor genome sequence is known. 2.Insertions are independent, –i.e. no insertions within insertions
78
Reconstruction with Amplisomes (-,+) (+,-) (+,+) (-,-)
79
Problem: Find amplisome whose insertions best explain ESP data. Approach: Represent ESP data as a graph. Identify sites of duplication (duples). Search in graph to select amplisome structure. Amplisome Reconstruction Problem
80
ESP Graph Black edges: (1)ES pairs. (2)Adjacent blocks in human genome. Red edges: blocks in human genome.
81
Problem: Find amplisome whose insertions best explain ESP data. Approach: Represent ESP data as a graph. Identify sites of duplication (duples). Search in graph to select amplisome structure. Amplisome Reconstruction Problem
82
Problem: Find amplisome whose insertions best explain ESP data. Approach: Search in graph to select amplisome structure. Find shortest path containing subpaths between each pair of duple boundary elements (v i, w i ). Amplisome Reconstruction Problem w1w1 v1v1
83
Alternating Superpath Problem Given: –Edge colored ESP graph H (red/black edges). –Pairs of vertices (v 1, w 1 ), …, (v m, w m ) in H. Find: – Shortest alternating [red/black] path (cycle) A in H such that A contains an alternating path from v i to w i for each i = 1, …, m. (or for largest number of i)
84
Problem: Find amplisome whose insertions best explain ESP data. Approach: Search in graph to select amplisome structure. Find shortest path containing subpaths between each pair of duple boundary elements (v i, w i ). Amplisome Reconstruction Problem
85
Problem: Find amplisome whose insertions best explain ESP data. Approach: Represent ESP data as a graph. Identify sites of duplication. Search in graph to select amplisome structure. Amplisome Reconstruction Problem A1
86
Problem: Find amplisome whose insertions best explain ESP data. Approach: Represent ESP data as a graph. Identify sites of duplication. Search in graph to select amplisome structure. Amplisome Reconstruction Problem A2
87
Problem: Find amplisome whose insertions best explain ESP data. Approach: Represent ESP data as a graph. Identify sites of duplication. Search in graph to select amplisome structure. Amplisome Reconstruction Problem A3
88
33 clusters Total length: 31Mb 172013 Reconstructed MCF7 amplisome Chromosome colors Explains 24/33 invalid clusters. Raphael and Pevzner, 2004.
89
Sequencing Tumor Clones Confirms Complex Mosaic Structure
90
What’s Next? Human Genome Project, 2001 Mouse Genome Project, 2002 Rat Genome Project, 2003 Chicken Genome Project, 2004 Chimp Genome Project, 2005 ???
91
Tumor Genomes Projects Tumor genomeHuman genome 1)Identify recurrent aberrations 2)Identify temporal sequence of aberrations 3)Use these data for tumor diagnostics and therapeutics Mutation, selection Tumor genome 2 Tumor genome 4 Tumor genome 3
92
Current/Future Projects Unified model: Duplications and rearrangements. Combine ESP and array-CGH data. Annotation/experimental verification of genome/amplisome structure. Primary tumors: breast, brain, prostate, ovary…
93
Open Problems Analysis/algorithms for ESP Sorting Problem. Amplisome Reconstruction when allow insertions within insertions. Other models?
94
Acknowledgements Ray Brown University of California, San Diego Colin Collins Stas Volik Joe Gray Cancer Research Center University of California, San Francisco
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.