CSCI2950-C Lecture 9 Cancer Genomics


1 CSCI2950-C Lecture 9 Cancer Genomics
October 16, 2008

2 Outline Cancer Genomes Paired-end Sequencing Rearrangements
Comparative Genomic Hybridization

3 Cell Division and Mutation
Somatic mutations that occur during cell division are a major contributor to the development of cancer. Types of change: single nucleotide, structural, and copy number. The focus here is on structural changes and, later, copy number, which is not to say that single nucleotide changes are less important. What is the effect of structural changes?

4 Rearrangements in Cancer
1) Change gene structure, create novel fusion genes. Gleevec targets the BCR-ABL fusion. 2) Alter gene regulation: Burkitt's lymphoma. Rearrangements cause changes in gene structure; the classic example is the Philadelphia chromosome. They can also cause more subtle regulatory changes, where a gene is placed in a different regulatory context. Here the cancer-promoting gene myc is activated by being put under the control of an immunoglobulin gene. IMAGE CREDIT: Gregory Schuler, NCBI, NIH, Bethesda, MD

5 Cancer Genomes
Classically, rearrangements were discovered through cytogenetic techniques such as chromosome painting (shown here). The picture is more complicated in solid tumors, and these techniques do not reveal the detailed architecture or the genes affected. Rearrangements were thought to be mostly noise. Recent identification of a new fusion gene: fusion gene in >50% of prostate cancer patients (Tomlins et al. Science 2005). Can we use genome sequencing?

6 Shotgun Sequencing
A genomic segment is cut many times at random (shotgun). Get one or two reads (~500 bp each) from each segment. There is no technology yet for reading a long strand of DNA in its entirety, so the DNA is fragmented instead. New technologies are producing shorter reads. Given that we have the sequence of the human genome, rearrangements can be detected with less sequencing.

7 Sequencing of Cancer Genomes
What to sequence from each tumor? Whole genome: all alterations (if you could assemble it). Specific genes: point mutations. Hybrid approach: structural rearrangements.

8 End Sequence Profiling (ESP) C. Collins and S. Volik (2003)
Pieces of cancer genome: clones (100-250 kb). Sequence ends of clones (500 bp). Map end sequences to the human genome. Because of the end-sequencing protocol, clones have a direction. Each clone corresponds to a pair of end sequences (ES pair) (x,y). Retain clones that correspond to a unique ES pair. [Figure: a clone from the cancer DNA whose ends map to positions x and y in the human DNA]

9 End Sequence Profiling (ESP) C. Collins and S. Volik (2003)
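Pieces of cancer genome: clones (100-250 kb) of length L. Sequence ends of clones (500 bp). Map end sequences to the human genome. Because of the end-sequencing protocol, clones have a direction. Valid ES pairs: Lmin ≤ y – x ≤ Lmax, where Lmin (Lmax) is the minimum (maximum) size of a clone, and the end sequences have convergent orientation. [Figure: a clone from the cancer DNA whose ends map to positions x and y in the human DNA]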
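The validity test on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the `ESPair` record, the '+'/'-' strand encoding, and the reading of "convergent" as a forward-strand left end paired with a reverse-strand right end are assumptions made for the example. The 100-250 kb bounds are the clone sizes quoted on the Clusters and Coverage slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ESPair:
    """Hypothetical record for one mapped end-sequence pair."""
    x: int          # mapped position of the left end sequence
    y: int          # mapped position of the right end sequence
    x_strand: str   # '+' or '-' (assumed encoding)
    y_strand: str   # '+' or '-'


def is_valid(pair: ESPair, l_min: int, l_max: int) -> bool:
    """Valid ES pair: Lmin <= y - x <= Lmax and convergent orientation."""
    convergent = pair.x_strand == "+" and pair.y_strand == "-"
    return convergent and l_min <= pair.y - pair.x <= l_max


L_MIN, L_MAX = 100_000, 250_000  # clone size range in bp (assumed from the slides)

ok = ESPair(1_000_000, 1_150_000, "+", "-")
too_long = ESPair(1_000_000, 1_400_000, "+", "-")
wrong_dir = ESPair(1_000_000, 1_150_000, "+", "+")
print(is_valid(ok, L_MIN, L_MAX), is_valid(too_long, L_MIN, L_MAX), is_valid(wrong_dir, L_MIN, L_MAX))
```

Any pair that fails either the length or the orientation check would be treated as an invalid ES pair, that is, a candidate rearrangement or an experimental error.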

10 End Sequence Profiling (ESP) C. Collins and S. Volik (2003)
Pieces of cancer genome: clones (100-250 kb) of length L. Sequence ends of clones (500 bp). Map end sequences to the human genome. Because of the end-sequencing protocol, clones have a direction. Some pairs cannot be mapped due to repeats in the human genome. Invalid ES pairs: putative rearrangement in cancer. The ES directions point toward the breakpoints (a,b): Lmin ≤ |x-a| + |y-b| ≤ Lmax. [Figure: an invalid ES pair (x,y) and breakpoints a, b in the human DNA]

11 ESP of Normal Cell
All ES pairs are valid: Lmin ≤ y – x ≤ Lmax. It is sometimes useful to have a 2D representation, in which each point (x,y) is an ES pair. All ES pairs lie near the diagonal. [Figure: 2D plot of ES pairs, genome coordinate on both axes]

12 ESP of Tumor Cell
Valid ES pairs satisfy length/direction constraints: Lmin ≤ y – x ≤ Lmax. Invalid ES pairs indicate rearrangements or experimental errors. First, how to deal with the noise.

13 Clusters and Coverage
Pieces of tumor genome: clones (100-250 kb). Sequence ends of clones (500 bp). Map end sequences to the human genome. The primary source of noise is chimeric clones, formed by random concatenation of non-contiguous portions of the cancer genome. Clustering deals with this noise: a rearrangement produces a cluster of invalid pairs, while a chimeric clone produces an isolated invalid pair. Clusters also allow breakpoints to be identified more precisely. [Figure: cancer DNA with a rearrangement and a chimeric clone; the corresponding invalid pairs (x,y) mapped to the human DNA]

14 Clusters
Clone size: Lmin ≤ (a – x1) + (b – y1) ≤ Lmax. Clustering has a nice geometric interpretation: in the 2D representation, the breakpoints (a,b) consistent with a clone (x1,y1) form a trapezoid bounded by Lmin and Lmax. Intersecting the trapezoids of clones (x1,y1) and (x2,y2) localizes their shared breakpoint (a,b). [Figure: two overlapping trapezoids, genome coordinate on both axes]
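The trapezoid intersection can be checked without any geometry library. The sketch below assumes the orientation used in the slide's formula, where both breakpoints lie at or beyond the mapped ends (a ≥ x and b ≥ y), so each clone allows exactly the breakpoints with Lmin ≤ (a – x) + (b – y) ≤ Lmax. The function names and the toy coordinates are hypothetical; other orientations would need the signs flipped.

```python
def breakpoint_consistent(pair, a, b, l_min, l_max):
    """True if breakpoint (a, b) lies in the trapezoid of ES pair (x, y)."""
    x, y = pair
    return a >= x and b >= y and l_min <= (a - x) + (b - y) <= l_max


def shared_breakpoint(pairs, l_min, l_max):
    """Return one breakpoint (a, b) consistent with every pair, or None.

    All constraints depend on a >= max x, b >= max y, and on the sum a + b,
    so the intersection is non-empty exactly when the allowed range of
    a + b is non-empty.
    """
    max_x = max(x for x, _ in pairs)
    max_y = max(y for _, y in pairs)
    sums = [x + y for x, y in pairs]
    lo = max(max_x + max_y, l_min + max(sums))  # smallest feasible a + b
    hi = l_max + min(sums)                      # largest feasible a + b
    if lo > hi:
        return None
    return (max_x, lo - max_x)


# Toy coordinates in kb (assumed), with Lmin = 100 and Lmax = 250.
cluster = [(100, 5000), (120, 5030)]
print(shared_breakpoint(cluster, 100, 250))
print(shared_breakpoint([(100, 5000), (400, 5000)], 100, 250))
```

Two invalid pairs whose trapezoids do not intersect, such as the second call, cannot be explained by a single shared breakpoint and would not be placed in the same cluster.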

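15 Fusion Genes
Fusion genes can be identified by the location of the breakpoint. [Figure: Gene 1 and Gene 2 in the human genome, an ES pair (x,y), and the breakpoint (a,b) that joins the two genes in the tumor]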

16 Fusion Genes
A pair of genes gives a rectangle in the 2D representation, which overlaps with the trapezoids of the clones (x1,y1), (x2,y2). Intersection → probability of fusion gene. The fusion must respect the direction of transcription of Gene1 and Gene2. This representation was used to compute how much sequencing is needed to find fusion genes. Bashir, et al. (2008) PLoS Comp Biol.

17 Results: Fusion Gene in Breast Cancer BCAS3-BCAS4
Probability of Fusion = 1 Note: More precise sizing information available for some clones Bashir, et al. (2008) In Press.

18 ESP Data
Sample     End Sequenced   Uniquely Mapped   Invalid ES Pairs
Breast cancer cell lines:
MCF7       19831           12143             491
BT474      9850            7547              186
SKBR3      9267            6950              187
Tumors:
Breast     9401            6540              164
           7623            5381              113
Prostate   5013            3296              96
Ovary      5570            3714              87
Brain      4198            3051              67
Coverage of human genome: ≈ 0.34 for MCF7, BT474.
The fusion gene is from the MCF7 breast cancer cell line. The technique has also been applied to several other cell lines and tumor samples. Coverage of the genome is still incomplete, but new sequencing technologies should make this more straightforward. Even with low coverage, there are very interesting patterns in the data. Raphael, et al. (2008)

19 Candidate Fusion Genes
Gene1-Gene2 pairs, with probability of fusion where given:
MCF7: ASTN2-PTPRG (1.000), BCAS4-BCAS3, KCND3-PPM1E, NTNG1-BCAS1 (0.996)
BT474: NCOA2-ZNF704, ZFP64-PHACTR3 (0.632)
SKBR3: PTPRT-PHF20, KCNQ3-RIMS2 (0.933)
Confirmed by clone sequencing. [Figure: sequenced clone spanning PTPRG and ASTN2; labels 3, 9, 97kb]

20 Breakpoint Detection
A rearrangement breakpoint is detected when a clone includes the breakpoint. [Figure: breakpoint ζ in the cancer genome, covered by a clone C whose ends map to xC and yC in the normal genome]

21 Lander-Waterman Statistics
Given: N clones of length L from a genome of size G.
$P(\zeta \text{ covered by clone}) = 1 - (1 - L/G)^N \approx 1 - e^{-c}$, where $c = NL/G$ is the coverage.
$P(\text{breakpoint } \zeta \text{ detected}) \approx 1 - e^{-c}$
c    P(detection)
1    0.632
2    0.864
4    0.982
8    0.999
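The detection probabilities can be reproduced numerically. The sketch below evaluates both the exact expression 1 – (1 – L/G)^N and the approximation 1 – e^(–c); the function names and the example values of N, L, and G are illustrative assumptions, not figures from the lecture.

```python
import math


def p_covered_exact(n_clones: int, clone_len: float, genome_size: float) -> float:
    """Exact probability that a fixed point is covered by at least one clone."""
    return 1.0 - (1.0 - clone_len / genome_size) ** n_clones


def p_detect(coverage: float) -> float:
    """Approximation 1 - e^(-c), with c = N * L / G."""
    return 1.0 - math.exp(-coverage)


# Approximate detection probability at the coverages listed on the slide.
for c in (1, 2, 4, 8):
    print(c, round(p_detect(c), 4))

# Illustrative numbers (assumed): 20,000 clones of 150 kb on a 3 Gb genome, c = 1.
n, length, genome = 20_000, 150_000, 3_000_000_000
print(p_covered_exact(n, length, genome), p_detect(n * length / genome))
```

Because L/G is tiny for clones on a human-sized genome, the exact and approximate values agree to several decimal places, which is why the table depends only on the coverage c.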

22 Cancer Genome Organization
Can we do more and reconstruct this picture, rather than just identify single rearrangements? What is the detailed organization of cancer genomes? What sequence of rearrangements produces these architectures?

23 ESP Genome Reconstruction Problem
Human genome (known): A B C D E. Unknown sequence of rearrangements. Tumor genome (unknown). Map ES pairs to the human genome; the locations of the ES pairs (x1,y1), …, (x5,y5) in the human genome are known. Reconstruct the tumor genome. We call this the ESP Genome Reconstruction Problem.

24 ESP Genome Reconstruction Problem
Human genome (known): A B C D E. Unknown sequence of rearrangements. Tumor genome (unknown): -C -D E A B. Map ES pairs to the human genome; the locations of the ES pairs (x1,y1), …, (x5,y5) in the human genome are known. Reconstruct the tumor genome.

25 ESP Plot
2D representation of ESP data: each point is an ES pair. Can we reconstruct the tumor genome from the positions of the ES pairs? [Figure: ES pairs (x1,y1), …, (x4,y4) plotted with the human genome blocks A B C D E on both axes]

26 ESP Plot → Tumor Genome
Reconstructed tumor genome: -C -D E A B. [Figure: ESP plot with the human genome blocks A B C D E on both axes]

27 Real data noisy and incomplete!
Valid ES pairs satisfy length/direction constraints: Lmin ≤ y – x ≤ Lmax. Invalid ES pairs indicate rearrangements or experimental errors. Clustering can help with the noise. Injecting a bit of extra biological knowledge also helps with the noise and addresses incomplete data: namely, rearrangements in genomes result from specific processes that cause particular kinds of rearrangement.

28 Computational Approach
Use known genome rearrangement mechanisms: inversion and translocation. Appeal to parsimony: find the simplest explanation for the ESP data, given these mechanisms. Motivation: genome rearrangement studies in evolution/phylogeny. [Figure: human vs. tumor genomes for an inversion of segment B between points s and t, and for a translocation]

29 ESP Sorting Problem
G = [0,M], unichromosomal genome.
Inversion (reversal) $\rho_{s,t}$: $\rho_{s,t}(x) = x$ if $x < s$ or $x > t$, and $\rho_{s,t}(x) = t - (x - s)$ otherwise. Example: G = A B C becomes G' = $\rho_{s,t}$G = A -B C, where B is the segment between s and t.
Given: ES pairs $(x_1, y_1), \dots, (x_n, y_n)$.
Find: minimum number of reversals $\rho_{s_1,t_1}, \dots, \rho_{s_n,t_n}$ such that if $\rho = \rho_{s_1,t_1} \cdots \rho_{s_n,t_n}$, then $(\rho x_1, \rho y_1), \dots, (\rho x_n, \rho y_n)$ are valid ES pairs.
The complexity of this problem is unknown.
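The coordinate map for a reversal is small enough to write out directly. The following is a minimal sketch of the piecewise definition on this slide (the identity outside [s, t], and t – (x – s) inside). The helper name `reversal` and the sample coordinates are made up for illustration, and the flip in end-sequence orientation that a real inversion also causes is deliberately left out.

```python
def reversal(s, t):
    """Return the map rho_{s,t}: identity outside [s, t], mirror image inside."""
    def rho(x):
        if x < s or x > t:
            return x
        return t - (x - s)
    return rho


rho = reversal(1000, 2000)

# The endpoints of the inverted interval swap, and points outside stay fixed.
print(rho(1000), rho(2000), rho(500), rho(2500))

# A hypothetical ES pair whose ends are 1000 apart before the reversal
# and 200 apart after it (toy units).
x, y = 900, 1900
print(y - x, rho(y) - rho(x))
```

Applying the same reversal twice returns every coordinate to its original position, so each reversal is its own inverse.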

30 Sparse Data Assumptions
1. Each cluster results from a single inversion or translocation. 2. Each clone contains at most one breakpoint. [Figure: ES pairs (x1,y1), (x2,y2), (x3,y3) shown in the human and tumor genomes]

31 ESP Genome Reconstruction: Discrete Approximation
We can get a discrete approximation of this problem as follows. First, remove isolated invalid pairs (x,y). [Figure: ESP plot, human genome on both axes]

32 ESP Genome Reconstruction: Discrete Approximation
Remove isolated invalid pairs (x,y). Define segments from clusters. [Figure: ESP plot, human genome on both axes]

33 ESP Genome Reconstruction: Discrete Approximation
Remove isolated invalid pairs (x,y). Define segments from clusters. ES orientations define links between segment ends. [Figure: ESP plot, human genome on both axes]

34 ESP Genome Reconstruction: Discrete Approximation
Remove isolated invalid pairs (x,y). Define segments from clusters. ES orientations define links between segment ends. [Figure: ESP plot with ES pairs (x1,y1), (x2,y2), (x3,y3) and segment boundaries s and t, human genome on both axes]

35 ESP Graph
Edges: human genome segments and ES pairs. Paths in the graph are tumor genome architectures. A minimal sequence* of translocations and inversions relates the human genome to the tumor genome. *Hannenhalli-Pevzner theory. [Figure: ESP graph on segments 1 to 5, with the tumor and human genomes as paths]

36 Sorting Permutations by Reversals
(Sankoff et al. 1990)
$\pi = \pi_1 \pi_2 \dots \pi_n$: signed permutation.
Reversal $\rho(i,j)$ [inversion]: $\pi_1 \dots \pi_{i-1}\ {-\pi_j} \dots {-\pi_i}\ \pi_{j+1} \dots \pi_n$.
Problem: Given $\pi$, find a sequence of reversals $\rho_1, \dots, \rho_t$ such that $\pi \cdot \rho_1 \cdot \rho_2 \cdots \rho_t = (1, 2, \dots, n)$ and $t$ is minimal.
Solution: analysis of the breakpoint graph ← ESP graph.
Polynomial-time algorithms: $O(n^4)$: Hannenhalli and Pevzner; $O(n^2)$: Kaplan, Shamir, Tarjan, 1997; $O(n)$ [distance $t$]: Bader, Moret, and Yan; $O(n^3)$: Bergeron, 2001.

37 Sorting Permutations
1 -3 -4 2 5 → 1 -3 -2 4 5 → 1 2 3 4 5
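To make the reversal operation concrete, the sketch below applies signed reversals to the permutation 1 -3 -4 2 5 and finds the minimum number of reversals by brute-force breadth-first search. This is not one of the polynomial-time algorithms cited in the lecture: exhaustive search is only feasible for very small n and is used here purely as a check. The function names are illustrative.

```python
from collections import deque


def apply_reversal(perm, i, j):
    """Reverse and negate elements i..j of a signed permutation (1-based, inclusive)."""
    middle = tuple(-v for v in reversed(perm[i - 1:j]))
    return perm[:i - 1] + middle + perm[j:]


def reversal_distance_bfs(perm):
    """Minimum number of reversals to reach (1, 2, ..., n), by exhaustive BFS."""
    start = tuple(perm)
    target = tuple(range(1, len(start) + 1))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == target:
            return dist[cur]
        n = len(cur)
        for i in range(1, n + 1):
            for j in range(i, n + 1):
                nxt = apply_reversal(cur, i, j)
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
    return None  # not reached: every signed permutation can be sorted


pi = (1, -3, -4, 2, 5)
step1 = apply_reversal(pi, 3, 4)      # reverse the block (-4, 2)
step2 = apply_reversal(step1, 2, 3)   # reverse the block (-3, -2)
print(step1, step2, reversal_distance_bfs(pi))
```

The search confirms that no single reversal sorts this permutation, so a two-reversal sequence is minimal.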

38 Breakpoint Graph
$\pi$ = 1 -3 -4 2 5. Black edges: adjacent elements of $\pi$ (start 1 -3 -4 2 5 end). Gray edges: adjacent elements of the identity i = 1 2 3 4 5 (start 1 2 3 4 5 end). Key parameter: black-gray cycles.

39 Breakpoint Graph
ESP Graph → Tumor Permutation and Breakpoint Graph.
Black edges: adjacent elements of $\pi$ (examples: start 1 -3 -4 2 5 end; start 1 -3 -2 4 5 end). Gray edges: adjacent elements of the identity i = 1 2 3 4 5 (start 1 2 3 4 5 end). Key parameter: black-gray cycles.
Theorem: The minimum number of reversals $d(\pi)$ needed to transform $\pi$ to the identity permutation i satisfies $d(\pi) \geq n + 1 - c(\pi)$, where $c(\pi)$ = number of gray-black cycles.

40 Breakpoint Graph
ESP Graph → Tumor Permutation and Breakpoint Graph.
Black edges: adjacent elements of $\pi$ (examples: start 1 -3 -4 2 5 end; start 1 -3 -2 4 5 end). Gray edges: adjacent elements of the identity i = 1 2 3 4 5 (start 1 2 3 4 5 end).
Theorem: The minimum number of reversals needed to transform $\pi$ to the identity permutation i is $d(\pi) = n + 1 - c(\pi) + h(\pi) + f(\pi)$, where $c(\pi)$ = number of gray-black cycles, $h(\pi)$ = number of hurdles, and $f(\pi)$ = 1 if $\pi$ is a fortress and 0 otherwise.
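The cycle count c(π) can be computed with the standard construction in which each signed element x becomes a pair of vertices (2x-1, 2x, in that order if x is positive and swapped if negative), framed by 0 and 2n+1. The sketch below counts alternating black-gray cycles and evaluates only the lower bound n + 1 – c(π); the hurdle and fortress terms h(π) and f(π) are not computed. The function names are illustrative.

```python
def count_cycles(perm):
    """Number of alternating black-gray cycles in the breakpoint graph of perm."""
    n = len(perm)
    seq = [0]
    for v in perm:
        a = abs(v)
        seq += [2 * a - 1, 2 * a] if v > 0 else [2 * a, 2 * a - 1]
    seq.append(2 * n + 1)

    # Black edges join vertices that are adjacent in the permutation.
    black = {}
    for k in range(0, 2 * n + 2, 2):
        u, w = seq[k], seq[k + 1]
        black[u], black[w] = w, u

    def gray(v):
        # Gray edges join 2i and 2i+1, i.e. neighbours in the identity.
        return v + 1 if v % 2 == 0 else v - 1

    seen, cycles = set(), 0
    for start in seq:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:      # walk black edge, then gray edge, until closed
            seen.add(v)
            w = black[v]
            seen.add(w)
            v = gray(w)
    return cycles


def reversal_lower_bound(perm):
    """Lower bound d(pi) >= n + 1 - c(pi)."""
    return len(perm) + 1 - count_cycles(perm)


pi = (1, -3, -4, 2, 5)
print(count_cycles(pi), reversal_lower_bound(pi))
```

For 1 -3 -4 2 5 the graph has four cycles, so the bound is 5 + 1 – 4 = 2 reversals. For the identity permutation every black edge coincides with a gray edge, giving n + 1 cycles and a bound of zero.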

41 Multichromosomal Sorting
Concatenate chromosomes. Translocations are modeled by reversals in the concatenate. Minimal sequence in polynomial time (Hannenhalli & Pevzner 1996; Tesler 2003; Ozery-Flato and Shamir 2003). Example: the translocation (A1 A2), (B1 B2) → (A1 B2), (B1 A2) corresponds, after concatenation, to the reversal A1 A2 -B2 -B1 → A1 B2 -A2 -B1.
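The concatenation trick can be checked on the slide's own example. In the sketch below the segments A1, A2, B1, B2 are encoded as the signed integers 1, 2, 3, 4 (an arbitrary labeling chosen for the example), the second chromosome is flipped before concatenation, and a single reversal of the middle block reproduces the translocation. This only illustrates the one example and is not a general multichromosomal sorting routine.

```python
def flip(chrom):
    """Read a chromosome from the other end: reverse the order and negate the signs."""
    return tuple(-v for v in reversed(chrom))


def apply_reversal(perm, i, j):
    """Reverse and negate elements i..j of a signed sequence (1-based, inclusive)."""
    return perm[:i - 1] + flip(perm[i - 1:j]) + perm[j:]


A1, A2, B1, B2 = 1, 2, 3, 4          # hypothetical integer labels for the segments
chrom_a = (A1, A2)
chrom_b = (B1, B2)

concat = chrom_a + flip(chrom_b)      # A1 A2 -B2 -B1
after = apply_reversal(concat, 2, 3)  # reverse the block (A2, -B2)

# Split the concatenate back into two chromosomes, unflipping the second one.
new_a = after[:2]
new_b = flip(after[2:])
print(concat, after, new_a, new_b)
```

The two resulting chromosomes are (A1 B2) and (B1 A2), which is exactly the outcome of the translocation on the slide.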

42 MCF7 Breast Cancer Cell Line
Applied this technique to MCF7.

43 MCF7 Breast Cancer Cell Line
There can be many minimal sequences. Some events occur early in many minimal sequences. An important event was left out: duplications. Sequence from human chromosomes to MCF7 chromosomes: 5 inversions, 15 translocations. Raphael, et al. (2003) Bioinformatics

44 What about duplications?
Clusters were biased to a very small region of the genome. These regions are duplicated, which can be measured independently. Take a duplication-centric view. 11240 ES pairs: 10453 valid (black); 737 invalid, of which 489 are isolated (red) and 248 form 70 clusters (blue). 33/70 clusters; total length: 31 Mb.

