Reconstructing Genomic Architectures of Tumor Genomes Pavel Pevzner and Ben Raphael Department of Computer Science & Engineering University of California,

1 Reconstructing Genomic Architectures of Tumor Genomes Pavel Pevzner and Ben Raphael Department of Computer Science & Engineering University of California, San Diego (joint work with Colin Collins lab at UCSF Cancer Center)

2 Chromosome Painting: Normal Cells

3 Chromosome Painting: Tumor Cells

4 Rearrangements in Tumors Change gene structure and regulatory “wiring” of the genome. Create “bad” novel fusion genes and break “good” old genes. Example: translocation in leukemia. Gleevec™ (Novartis 2001) targets BCR-ABL oncogene. [Figure: promoter and ABL gene on chromosome 9, promoter and BCR gene on chromosome 22, and the resulting BCR-ABL oncogene]

5 Complex Tumor Genomes 1) What are the detailed architectures of tumor genomes? 2) What rearrangements/duplications produce these architectures and what is the order of these events? 3) What are the novel fusion genes and old “broken” genes?

6 Genome rearrangements What are the “architectural blocks” forming the existing genomes and how to find them? What is the architecture of the ancestral genome? What is the evolutionary scenario for transforming one genome into the other? [Figure: mouse (X chrom.) and human (X chrom.) descending from an unknown ancestor ~80 million years ago]

7 History of Chromosome X Rat Consortium, Nature, 2004

8 Inversions Blocks represent conserved genes. In the course of evolution or in a clinical context, blocks 1, 2, …, 10 could be misread as 1, 2, 3, -8, -7, -6, -5, -4, 9, 10. [Figure: blocks in the original order 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

9 Inversions Blocks represent conserved genes. In the course of evolution or in a clinical context, blocks 1, …, 10 could be misread as 1, 2, 3, -8, -7, -6, -5, -4, 9, 10. Evolution: occurred once or twice every million years. Cancer: may occur every month. [Figure: blocks after the inversion: 1, 2, 3, -8, -7, -6, -5, -4, 9, 10]

10 Inversions The inversion introduced two breakpoints (disruptions in order). [Figure: blocks after the inversion: 1, 2, 3, -8, -7, -6, -5, -4, 9, 10]
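The breakpoint count on this slide can be checked with a few lines of Python. This is a minimal sketch: the function name `count_breakpoints` and the convention of framing the permutation with 0 and n+1 are illustrative assumptions, not taken from the slides.

```python
def count_breakpoints(perm):
    """Count adjacent pairs (a, b) of the framed signed permutation with b - a != 1.

    Assumption: the permutation is framed by 0 on the left and n + 1 on the right,
    so that disruptions at either end are counted as well.
    """
    n = len(perm)
    framed = [0] + list(perm) + [n + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if b - a != 1)


identity = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
inverted = [1, 2, 3, -8, -7, -6, -5, -4, 9, 10]

print(count_breakpoints(identity))  # 0
print(count_breakpoints(inverted))  # 2
```

The two breakpoints sit between 3 and -8 and between -4 and 9; inside the inverted segment consecutive signed values still differ by +1, so no further disruptions are counted.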

11 Turnip vs Cabbage: Different Gene Order

15 Gene order comparison: Evolution is manifested as the divergence in gene order. [Figure: gene order before and after]

16 Human-Mouse-Rat Phylogeny Bourque et al., Genome Research, 2004

17 “Comparative Genomics” of Cancer 1) Identify recurrent aberrations. Mitelman Database, >40,000 aberrations. 2) Identify temporal sequence of aberrations. Linear model: Colorectal cancer (Vogelstein, 1988): -5q → 12p* → -17p → -18q. Tree model: (Desper et al. 1999). [Figure: human genome, under mutation and selection, giving rise to tumor genome, tumor genome 2, tumor genome 3, tumor genome 4]

18 Measuring Structural Changes in Tumors: Cytogenetics Directly visualize (fluorescently) labeled chromosomes. Chromosome banding, mFISH, SKY. Weakness: Physical location of chromosomal junctions not revealed. Low resolution. No/little information about copy number changes. Requires metaphase chromosomes.

19 Measuring Copy Number Changes in Tumors: CGH, array CGH Weakness: No information about structural rearrangements (inversions, translocations) or about the positions of duplicated material.

20 End Sequence Profiling (ESP) C. Collins et al. (UCSF Cancer Center) 1) Pieces of tumor genome: clones (100-250kb). 2) Sequence ends of clones (500bp). 3) Map end sequences to human genome. Each clone corresponds to pair of end sequences (ES pair) (x,y). [Figure: clone from tumor DNA with ends x and y mapped to human DNA]

21 ES Pairs Order ES pair such that x < y. ES pair (x,y) is –valid if x, y are on the same chromosome, l ≤ y – x ≤ L, where l (L) is the min (max) size of a clone, and x, y have opposite, convergent orientations; –invalid, otherwise. Invalid pairs result from rearrangement or experimental “noise”. [Figure: ES pairs (x1,y1), …, (x5,y5) on the genome, with maximum clone size L]
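The validity rule on this slide can be sketched as a small predicate. Everything named here (`EndSeq`, `is_valid_es_pair`, the strand encoding) is a hypothetical illustration rather than the authors' code. In particular, reading "convergent" as "left end on the + strand, right end on the – strand" is an assumption, and the 100-250kb bounds are the clone sizes quoted on the ESP slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndSeq:
    chrom: str   # chromosome name in the human reference
    pos: int     # mapped position in base pairs
    strand: str  # '+' or '-'


def is_valid_es_pair(a, b, min_len, max_len):
    """Return True if the two mapped ends look like an unrearranged clone."""
    # Order the pair so that x is the left end (x < y on the slide).
    x, y = sorted((a, b), key=lambda e: e.pos)
    same_chrom = x.chrom == y.chrom
    size_ok = min_len <= y.pos - x.pos <= max_len
    # Assumption: convergent means the left end points right and vice versa.
    convergent = x.strand == '+' and y.strand == '-'
    return same_chrom and size_ok and convergent


left = EndSeq("chr1", 1_000_000, "+")
right = EndSeq("chr1", 1_150_000, "-")
far = EndSeq("chr1", 3_000_000, "-")

print(is_valid_es_pair(left, right, 100_000, 250_000))  # True
print(is_valid_es_pair(left, far, 100_000, 250_000))    # False: too far apart
```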

22 Tumor Genome Reconstruction Puzzle Human genome (known). Tumor genome (unknown). Unknown sequence of rearrangements. Map ES pairs to human genome: location of ES pairs in human genome (known). Reconstruct tumor genome. [Figure: human genome and tumor genome drawn as blocks A to E, with ES pairs (x1,y1), …, (x5,y5) mapped between them]

23 Tumor Genome Reconstruction [Figure: human and tumor genomes drawn as blocks A to E]

25 Tumor Genome Reconstruction [Figure: tumor genome blocks with ES pairs (x1,y1), (x2,y2), (x3,y3), (x4,y4) and the positions of their ends]

26 2D Representation of ESP Data Each point is ES pair. Can we reconstruct the tumor genome from the positions of the ES pairs? [Figure: ESP plot with human genome blocks A to E on both axes and points (x1,y1), (x2,y2), (x3,y3), (x4,y4)]

35 Reconstructed Tumor Genome [Figure: ESP plot over human genome blocks A to E together with the reconstructed tumor genome]

36 Real data noisy and incomplete!

37 Computational Framework Use knowledge of known rearrangement mechanisms –e.g. inversions, translocations, etc. Find simplest explanation for data, given these mechanisms. Motivation: Sorting by Reversals

38 ESP Sorting Problem G = [0, M], unichromosomal genome. Reversal $\rho_{s,t}(x) = x$ if $x < s$ or $x > t$, and $\rho_{s,t}(x) = t - (x - s)$ otherwise. Given: ES pairs $(x_1, y_1), \dots, (x_n, y_n)$. Find: Minimum number of reversals $\rho_{s_1,t_1}, \dots, \rho_{s_n,t_n}$ such that if $\rho = \rho_{s_1,t_1} \cdots \rho_{s_n,t_n}$ then $(\rho x_1, \rho y_1), \dots, (\rho x_n, \rho y_n)$ are valid ES pairs. [Figure: genome G and G' = $\rho$G after a reversal on [s, t], with ES pairs (x1,y1) and (x2,y2)]
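To make the reversal map concrete, here is a minimal sketch that applies a reversal on [s, t] to the coordinates of one ES pair and re-checks the clone-length condition. The helper names `reversal` and `has_valid_length` and the toy coordinates are assumptions for illustration. The orientation part of the validity test is left out to keep the example short.

```python
def reversal(s, t):
    """Return the coordinate map of a reversal of the interval [s, t]."""
    def rho(x):
        # Points outside [s, t] stay put; points inside are mirrored.
        return x if (x < s or x > t) else t - (x - s)
    return rho


def has_valid_length(x, y, min_len, max_len):
    """Length condition only: min_len <= |y - x| <= max_len."""
    lo, hi = sorted((x, y))
    return min_len <= hi - lo <= max_len


rho = reversal(140, 500)
x, y = 150, 600  # toy coordinates, not real data

print(has_valid_length(x, y, 100, 250))            # False: ends are 450 apart
print(has_valid_length(rho(x), rho(y), 100, 250))  # True: 490 and 600 are 110 apart
```

Here one reversal is enough to bring the two ends within clone range, which is the sense in which a set of reversals "explains" an invalid pair.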

39 Sequence of reversals. All ES pairs valid. [Figure: genome blocks A, B, C and ES pairs (x1,y1), (x2,y2), (x3,y3) under successive reversals on [s, t]]

40 Sources of Invalid Pairs 1. Chimeric clone: random joining of pieces from tumor genome (noise!). Rarely uses DNA from the same genomic regions. Corresponds to isolated, invalid ES pair (x,y): d(x,x’) + d(y,y’) > 2L for all (x’,y’) ≠ (x,y). 2. Composite clone: contains rearrangement breakpoint. Gives clusters of invalid ES pairs. [Figure: ES pairs (x1,y1), (x2,y2), (x3,y3) in the human and tumor genomes]
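The isolation criterion above translates directly into code. The sketch below separates isolated invalid pairs from those that have a neighbor within 2L, then groups the rest. Two things are assumptions here: coordinates are taken on a single concatenated genome, and clusters are formed by single linkage with the same 2L threshold, since the slides do not specify how clusters are grouped.

```python
def dist(p, q):
    """d(x, x') + d(y, y') for two ES pairs p = (x, y) and q = (x', y')."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def split_isolated(invalid_pairs, max_len):
    """Split invalid pairs into (isolated, clustered) using the 2L criterion."""
    isolated, clustered = [], []
    for i, p in enumerate(invalid_pairs):
        has_neighbor = any(
            dist(p, q) <= 2 * max_len
            for j, q in enumerate(invalid_pairs) if j != i
        )
        (clustered if has_neighbor else isolated).append(p)
    return isolated, clustered


def cluster(pairs, max_len):
    """Group pairs by single linkage (an assumption) with a union-find."""
    parent = list(range(len(pairs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(pairs)):
        for j in range(i + 1, len(pairs)):
            if dist(pairs[i], pairs[j]) <= 2 * max_len:
                parent[find(i)] = find(j)

    groups = {}
    for i, p in enumerate(pairs):
        groups.setdefault(find(i), []).append(p)
    return list(groups.values())


# Toy coordinates, not real data.
pairs = [(1000, 5000), (1100, 5100), (1200, 4900), (9000, 20000)]
isolated, clustered = split_isolated(pairs, max_len=250)
print(isolated)                              # [(9000, 20000)]
print(len(cluster(clustered, max_len=250)))  # 1
```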

41 ESP Genome Reconstruction: Discrete Approximation 1. Remove isolated invalid ES pairs (x,y). 2. Divide genome into synteny blocks using clusters C1, …, Ck. Locations of block “junctions” (s, t) underdetermined: l ≤ |x_i – s| + |y_i – t| ≤ L. [Figure: ESP plot with blocks A to G on the axes and clusters C1, C2, C3 containing (x1,y1), (x2,y2), (x3,y3)]

42 Discrete Approximation [Figure: graphs for clusters C1, C2, C3 on blocks A B C D E F G] Superimpose graphs. Tumor genome: signed permutation of A B C D E F G: A -C F -D B -E G

43 Reversals (also called inversions) Classically, blocks represent conserved genes. In the course of evolution or in a clinical context, blocks 1, …, 10 could be misread as 1, 2, 3, -8, -7, -6, -5, -4, 9, 10. Clinical: occurs in many cancers. Evolution: occurred about once or twice every million years on the evolutionary path between human and mouse. [Figure: blocks in the original order 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

44 Reversals (also called inversions) Classically, blocks represent conserved genes. In the course of evolution or in a clinical context, blocks 1, …, 10 could be misread as 1, 2, 3, -8, -7, -6, -5, -4, 9, 10. Clinical: occurs in many cancers. Evolution: occurred about once or twice every million years on the evolutionary path between human and mouse. [Figure: blocks after the inversion: 1, 2, 3, -8, -7, -6, -5, -4, 9, 10]

45 Reversals (also called inversions) The inversion introduced two breakpoints (disruptions in order). [Figure: blocks after the inversion: 1, 2, 3, -8, -7, -6, -5, -4, 9, 10]

46 Sorting by reversals (Sankoff et al. 1990): Sorting signed permutations Signed permutation: $\pi = \pi_1 \pi_2 \dots \pi_n$. Reversal $\rho(i,j)$: $\pi \rho(i,j) = \pi_1 \dots \pi_{i-1}\, {-\pi_j} \dots {-\pi_i}\, \pi_{j+1} \dots \pi_n$. Given $\pi$, find a series of reversals $\rho_1, \dots, \rho_t$ such that $\pi \rho_1 \rho_2 \cdots \rho_t = (1, 2, \dots, n)$ and $t$ is minimal.
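A signed reversal and a brute-force search for the reversal distance can be sketched as follows. The breadth-first search below is a stand-in for illustration only, not the polynomial-time algorithm cited later in the talk, and it is practical only for very short permutations or very small distances. The function names are assumptions.

```python
from collections import deque


def apply_reversal(perm, i, j):
    """Reverse positions i..j (1-based, inclusive) and flip their signs."""
    p = list(perm)
    p[i - 1:j] = [-g for g in reversed(p[i - 1:j])]
    return tuple(p)


def reversal_distance(perm):
    """Minimum number of signed reversals sorting perm to (1, ..., n), by BFS."""
    n = len(perm)
    target = tuple(range(1, n + 1))
    start = tuple(perm)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == target:
            return dist[cur]
        for i in range(1, n + 1):
            for j in range(i, n + 1):
                nxt = apply_reversal(cur, i, j)
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
    return None  # unreachable for a valid signed permutation


print(apply_reversal((1, 2, 3, 4, 5), 2, 4))                         # (1, -4, -3, -2, 5)
print(reversal_distance((1, 2, 3, -8, -7, -6, -5, -4, 9, 10)))       # 1
```

The second call recovers the single inversion from the earlier "Inversions" slides: reversing positions 4 through 8 restores the identity.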

47 Sorting by reversals Most parsimonious scenarios. The reversal distance is the minimum number of reversals required to transform one signed permutation into the other. Here, the reversal distance is d = 4.

48 Breakpoint graph Breakpoint Graph (Bafna and Pevzner, FOCS 94). Duality Theorem (Hannenhalli-Pevzner, STOC 1995): d = n + 1 – c + h + f, where c = # cycles; h, f are rather complicated, but can be computed from the graph in polynomial time. Here, d = 8 + 1 – 5 + 0 + 0 = 4

49 Breakpoint graph Duality Theorem for Sorting by Reversals – simple but imprecise version. reversal distance = number of elements + 1 – number of cycles
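The simple version of the formula can be tried out by counting cycles in a breakpoint graph. The sketch below uses one common construction, stated here as an assumption: each signed element x becomes the vertex pair (2|x|-1, 2|x|), flipped when x is negative, and the sequence is framed by 0 and 2n+1. It ignores the h and f terms, so the value it returns is only the n + 1 – c part of the duality theorem.

```python
def breakpoint_graph_cycles(perm):
    """Count alternating black/gray cycles in the breakpoint graph of perm."""
    n = len(perm)
    seq = [0]
    for x in perm:
        a, b = 2 * abs(x) - 1, 2 * abs(x)
        seq += [a, b] if x > 0 else [b, a]
    seq.append(2 * n + 1)

    # Black edges join neighbors in the permutation; gray edges join
    # neighbors in the identity. Every vertex has one edge of each color.
    black, gray = {}, {}
    for k in range(0, 2 * n + 2, 2):
        u, v = seq[k], seq[k + 1]
        black[u], black[v] = v, u
        gray[k], gray[k + 1] = k + 1, k

    seen, cycles = set(), 0
    for start in range(2 * n + 2):
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:
            seen.add(v)
            w = black[v]      # follow a black edge
            seen.add(w)
            v = gray[w]       # then a gray edge
    return cycles


def simple_reversal_estimate(perm):
    """n + 1 - c, the simple (imprecise) version of the duality formula."""
    return len(perm) + 1 - breakpoint_graph_cycles(perm)


print(simple_reversal_estimate([1, 2, 3, 4, 5]))                       # 0
print(simple_reversal_estimate([1, 2, 3, -8, -7, -6, -5, -4, 9, 10]))  # 1
```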

50 Complexity of reversal distance

51 GRIMM-Synteny on X chromosome From anchors to synteny blocks

52 GRIMM-Synteny on X chromosome 2-dimensional breakpoint graph

56 Breakpoint Graph Signed permutation $\pi$ = A -C F -D B -E G. Black edges: adjacent elements of $\pi$. Gray edges: adjacent elements of i = A B C D E F G. For $\pi = \pi_1 \pi_2 \dots \pi_n$, $d(\pi) = n + 1 - c(\pi) + h(\pi) + f(\pi)$. Discrete approximation to ESP constructs breakpoint graph from clusters. [Figure: breakpoint graph on start, A, -C, F, -D, B, -E, G, end]

57 Multichromosomal Extension Concatenate chromosomes. Translocations modeled by reversals in concatenate. Fissions/fusions modeled by reversals with “empty” chromosomes. Minimal sequence in polynomial time (Hannenhalli & Pevzner 1995, Tesler 2003, Ozery-Flato and Shamir 2003). [Figure: chromosomes A1 A2 and B1 B2 concatenated as A1 A2 -B2 -B1; a reversal gives A1 B2 -A2 -B1, matching the translocation that produces A1 B2 and B1 A2]
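The concatenation trick on this slide can be replayed in a few lines: join the first chromosome with the flipped second one, reverse a segment spanning the junction, and cut the result back into two chromosomes. The function names and the integer encoding of blocks are illustrative assumptions.

```python
def flip(chrom):
    """Reverse a chromosome and negate every block (read it from the other end)."""
    return [-g for g in reversed(chrom)]


def translocate_via_reversal(c1, c2, i, j):
    """Exchange the suffix of c1 after position i with the suffix of c2 after j.

    Implemented as a single reversal in the concatenate c1 + flip(c2).
    """
    concat = c1 + flip(c2)
    tail2 = len(c2) - j                 # length of the suffix of c2
    start, end = i, len(c1) + tail2     # segment: suffix of c1 + flipped suffix of c2
    concat[start:end] = flip(concat[start:end])
    new1 = concat[:i + tail2]
    new2 = flip(concat[i + tail2:])
    return new1, new2


# Encode A1, A2, B1, B2 as 1, 2, 3, 4.
A1, A2, B1, B2 = 1, 2, 3, 4
print(translocate_via_reversal([A1, A2], [B1, B2], 1, 1))  # ([1, 4], [3, 2])
```

The output reads as chromosomes A1 B2 and B1 A2, the same result as the translocation in the figure.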

58 Breast Cancer Tumor Genome MCF7 is a human breast cancer cell line. Cytogenetic analysis (low-resolution) suggests complex architecture. –Many translocations, inversions. From Kytölä et al., Genes, Chroms & Cancer 28:308-317 (2000).

59 ESP Data from MCF7 tumor genome Each point (x,y) is ES pair. (Concatenation of 23 human chromosomes.) 15005 clones (Oct. 2003). 11240 ES pairs: 10453 valid (black), 737 invalid, of which 489 isolated (red) and 248 form 70 clusters (blue).

60 Breast Cancer MCF7 Cell Line 5 inversions, 15 translocations. Raphael et al. 2003. [Figure: human chromosomes and MCF7 chromosomes]

61 Sparse Data Assumptions 1. Each cluster results from single reversal. 2. Each clone contains at most one breakpoint. [Figure: ES pairs (x1,y1), (x2,y2), (x3,y3) in the human and tumor genomes]

62 Complications with MCF7: 33/70 clusters. Total length: 31Mb. Chromosomes 1, 3, 17, 20

63 Rearrangement Signatures [Figure: ESP signatures comparing human and tumor genomes for an inversion, a translocation, and a duplication/transposition, plus an unresolved “????” configuration]

64 Complex Tumor Genomes

65 Structure of Duplications in Tumors? Mechanisms not well understood. Duplicated segments may co-localize (Guan et al. Nat. Gen. 1994). [Figure: human genome and tumor genome]

66 Tumor Amplisomes

67 Analyzing Duplications v, w are boundary elements of duple. [Figure: duplication in a genome with blocks A B C D E; ESP plot quadrants (-,+), (+,-), (+,+), (-,-) with points u, v, w marking the duple]

68 Analyzing Duplications Path between boundary elements resolves duple. [Figure: duplication in a genome with blocks A B C D E; ESP plot quadrants (-,+), (+,-), (+,+), (-,-) with points u, v, w]

69 Duplication Complications These configurations are frequent in MCF7 ESP data. [Figure: ESP plot quadrants (-,+), (+,-), (+,+), (-,-) with points u, v, w and an unresolved “????” configuration of blocks A, B, C, E]

70 Duplication Complications These configurations are frequent in MCF7 ESP data. [Figure: ESP plot quadrants (-,+), (+,-), (+,+), (-,-) with points u, v, w and an unresolved “??” configuration of blocks A, B, C, D]

71 Resolving Duplication Complications Path between boundary elements resolves duple. [Figure: ESP plot quadrants (-,+), (+,-), (+,+), (-,-) with points u, v, w and blocks A, B, C, D, E]

72 Resolving Duplication Complications Multiple paths between duple boundary elements. [Figure: ESP plot quadrants (-,+), (+,-), (+,+), (-,-) with points u, v, w and blocks A, B, C, E]

73 Many Paths in MCF7!

74 Duplication by Amplisome Gives single model for all duplications

75 Amplisome Reconstruction Problem Assume 1. Tumor genome sequence is known. 2. Insertions are independent, –i.e. no insertions within insertions. Approach 1. Identify duplicated sequences A1, …, Am. 2. Amplisome is shortest common superstring of A1, …, Am.
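As an illustration of the superstring step, the sketch below merges strings greedily by largest overlap. This is a standard heuristic, not an exact solver: the shortest common superstring problem is NP-hard, and the slides do not say which method was used. Treating each duplicated sequence as a string of block labels with a fixed orientation is a simplifying assumption.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def greedy_superstring(strings):
    """Greedy approximation to the shortest common superstring."""
    # Drop duplicates and strings already contained in another string.
    strs = sorted(
        s for s in set(strings)
        if not any(s != t and s in t for t in strings)
    )
    while len(strs) > 1:
        best = (-1, 0, 1)
        for i, a in enumerate(strs):
            for j, b in enumerate(strs):
                if i != j:
                    k = overlap(a, b)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        merged = strs[i] + strs[j][k:]
        strs = [s for idx, s in enumerate(strs) if idx not in (i, j)] + [merged]
    return strs[0] if strs else ""


# Toy duplicated segments over block labels A..E (hypothetical data).
print(greedy_superstring(["ABC", "BCD", "CDE"]))  # ABCDE
```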

76 ESP Graph Red edges: blocks in human genome. Black edges: (1) ES pairs. (2) Adjacent blocks in human genome. Alternating (red/black) path corresponds to tumor genome architecture.

77 Amplisome Reconstruction Problem Assume 1. Tumor genome sequence is known. 2. Insertions are independent, –i.e. no insertions within insertions. Approach –Identify duplicated sequences A1, …, Am. –Amplisome is shortest common superstring of A1, …, Am.

78 Reconstruction with Amplisomes (-,+) (+,-) (+,+) (-,-)

79 Amplisome Reconstruction Problem Problem: Find amplisome whose insertions best explain ESP data. Approach: Represent ESP data as a graph. Identify sites of duplication (duples). Search in graph to select amplisome structure.

80 ESP Graph Red edges: blocks in human genome. Black edges: (1) ES pairs. (2) Adjacent blocks in human genome.

81 Problem: Find amplisome whose insertions best explain ESP data. Approach: Represent ESP data as a graph. Identify sites of duplication (duples). Search in graph to select amplisome structure. Amplisome Reconstruction Problem

82 Amplisome Reconstruction Problem Problem: Find amplisome whose insertions best explain ESP data. Approach: Search in graph to select amplisome structure. Find shortest path containing subpaths between each pair of duple boundary elements (v_i, w_i). [Figure: ESP graph with boundary elements v1 and w1]

83 Alternating Superpath Problem Given: –Edge colored ESP graph H (red/black edges). –Pairs of vertices (v 1, w 1 ), …, (v m, w m ) in H. Find: – Shortest alternating [red/black] path (cycle) A in H such that A contains an alternating path from v i to w i for each i = 1, …, m. (or for largest number of i)
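One building block of this problem is finding an alternating path between a single pair (v, w). The sketch below does that with a breadth-first search over (vertex, color of last edge) states. It does not solve the superpath problem itself, and it measures length in number of edges, which is an assumption. The edge list and vertex names are hypothetical.

```python
from collections import defaultdict, deque


def shortest_alternating_path(edges, v, w):
    """Fewest-edge path from v to w whose consecutive edges differ in color.

    edges: iterable of (u, v, color) for an undirected edge-colored graph.
    Returns the list of vertices on the path, or None if no such path exists.
    """
    adj = defaultdict(list)
    for a, b, color in edges:
        adj[a].append((b, color))
        adj[b].append((a, color))

    start = (v, None)            # no edge used yet, so any color may follow
    prev = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        node, last = state
        if node == w and last is not None:
            path = []
            while state is not None:
                path.append(state[0])
                state = prev[state]
            return path[::-1]
        for nxt, color in adj[node]:
            if color != last and (nxt, color) not in prev:
                prev[(nxt, color)] = state
                queue.append((nxt, color))
    return None


edges = [
    ("a", "b", "red"),
    ("b", "c", "black"),
    ("c", "d", "red"),
    ("b", "d", "red"),   # shortcut, but red after red is not alternating
]
print(shortest_alternating_path(edges, "a", "d"))  # ['a', 'b', 'c', 'd']
```

Tracking the color of the last edge in the search state is what lets the same vertex be revisited when it is reached by an edge of the other color.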

84 Problem: Find amplisome whose insertions best explain ESP data. Approach: Search in graph to select amplisome structure. Find shortest path containing subpaths between each pair of duple boundary elements (v i, w i ). Amplisome Reconstruction Problem

85 Problem: Find amplisome whose insertions best explain ESP data. Approach: Represent ESP data as a graph. Identify sites of duplication. Search in graph to select amplisome structure. Amplisome Reconstruction Problem A1

86 Problem: Find amplisome whose insertions best explain ESP data. Approach: Represent ESP data as a graph. Identify sites of duplication. Search in graph to select amplisome structure. Amplisome Reconstruction Problem A2

87 Problem: Find amplisome whose insertions best explain ESP data. Approach: Represent ESP data as a graph. Identify sites of duplication. Search in graph to select amplisome structure. Amplisome Reconstruction Problem A3

88 Reconstructed MCF7 amplisome 33 clusters. Total length: 31Mb. Chromosome colors: 17, 20, 1, 3. Explains 24/33 invalid clusters. Raphael and Pevzner, 2004.

89 Sequencing Tumor Clones Confirms Complex Mosaic Structure

90 What’s Next? Human Genome Project, 2001 Mouse Genome Project, 2002 Rat Genome Project, 2003 Chicken Genome Project, 2004 Chimp Genome Project, 2005 ???

91 Tumor Genomes Projects 1) Identify recurrent aberrations. 2) Identify temporal sequence of aberrations. 3) Use these data for tumor diagnostics and therapeutics. [Figure: human genome, under mutation and selection, giving rise to tumor genome, tumor genome 2, tumor genome 3, tumor genome 4]

92 Current/Future Projects Unified model: Duplications and rearrangements. Combine ESP and array-CGH data. Annotation/experimental verification of genome/amplisome structure. Primary tumors: breast, brain, prostate, ovary…

93 Open Problems Analysis/algorithms for ESP Sorting Problem. Amplisome Reconstruction when allow insertions within insertions. Other models?

94 Acknowledgements Ray Brown University of California, San Diego Colin Collins Stas Volik Joe Gray Cancer Research Center University of California, San Francisco

