CSCI2950-C Lecture 9 Cancer Genomics

Presentation transcript:

CSCI2950-C Lecture 9 Cancer Genomics October 16, 2008 http://cs.brown.edu/courses/csci2950-c/

Outline: Cancer Genomes; Paired-end Sequencing; Rearrangements; Comparative Genomic Hybridization

Cell Division and Mutation. Somatic mutations that occur during cell division are a major contributor to the development of cancer. Types of change: single nucleotide change, copy number, structural. We will focus on structural changes and later on copy number changes, which is not to say that single nucleotide changes are less important. What is the effect of structural changes?

Rearrangements in Cancer. 1) Change gene structure, create novel fusion genes. These rearrangements cause changes in gene structure; the classic example is the Philadelphia chromosome. Gleevec targets ABL-BCR fusion. 2) Alter gene regulation. These are more subtle regulatory changes in which a gene is placed in a different regulatory context. In Burkitt’s lymphoma, the cancer-promoting gene myc is activated by being put under the control of an immunoglobulin gene. IMAGE CREDIT: Gregory Schuler, NCBI, NIH, Bethesda, MD

Cancer Genomes. Classically, rearrangements were discovered through cytogenetic techniques like chromosome painting, shown here. The picture is more complicated in solid tumors. These techniques don’t reveal the detailed architecture or the genes affected, and rearrangements were thought to be mostly noise. Recent identification of a new fusion gene: fusion gene in >50% of prostate cancer patients (Tomlins et al. Science 2005). Can we use genome sequencing?

Shotgun Sequencing. Genomic segment is cut many times at random (shotgun). Get one or two reads (~500 bp each) from each segment. There is no technology yet for reading a long strand of DNA in its entirety, so the DNA is fragmented instead. New technologies are producing shorter reads. Given that we have the sequence of the human genome, we can detect rearrangements with less sequencing.

Sequencing of Cancer Genomes. What to sequence from each tumor? Whole genome: all alterations (if you could assemble it). Specific genes: point mutations. Hybrid approach: structural rearrangements.

End Sequence Profiling (ESP), C. Collins and S. Volik (2003). 1) Pieces of cancer genome: clones (100-250kb). 2) Sequence ends of clones (500bp). 3) Map end sequences to human genome. Because of the end sequencing protocol, clones have a direction. Each clone corresponds to a pair of end sequences (ES pair) (x,y) in the human DNA. Retain clones that correspond to a unique ES pair.

End Sequence Profiling (ESP): valid ES pairs. An ES pair (x,y) of a clone of length L is valid if Lmin ≤ y – x ≤ Lmax, where Lmin (Lmax) is the min (max) size of a clone, and the two end sequences have convergent orientation.
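The validity test for an ES pair can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ESPair` record, the '+'/'-' strand encoding of "convergent orientation", and the use of the 100-250kb clone size range as Lmin and Lmax are choices made for the example, not part of the ESP protocol description.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ESPair:
    """Hypothetical record for one mapped clone: positions in bp plus strands."""
    x: int
    y: int
    x_strand: str  # '+' or '-'
    y_strand: str  # '+' or '-'

def is_valid(pair: ESPair, l_min: int, l_max: int) -> bool:
    """Valid pair: convergent orientation and Lmin <= y - x <= Lmax."""
    # Assumption: "convergent" means x maps to '+' and y maps to '-'.
    convergent = pair.x_strand == "+" and pair.y_strand == "-"
    return convergent and l_min <= pair.y - pair.x <= l_max

L_MIN, L_MAX = 100_000, 250_000  # assumed bounds, taken from the clone size range

normal = ESPair(1_000_000, 1_150_000, "+", "-")
too_far = ESPair(1_000_000, 9_000_000, "+", "-")
wrong_way = ESPair(1_000_000, 1_150_000, "+", "+")
print(is_valid(normal, L_MIN, L_MAX), is_valid(too_far, L_MIN, L_MAX),
      is_valid(wrong_way, L_MIN, L_MAX))
```

Pairs that fail either test are the invalid ES pairs that suggest a rearrangement or an experimental error.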

End Sequence Profiling (ESP): invalid ES pairs. Some pairs cannot be mapped due to repeats in the human genome. Invalid ES pairs indicate a putative rearrangement in the cancer genome. With the ES directions pointing toward the breakpoints (a,b): Lmin ≤ |x-a| + |y-b| ≤ Lmax.

ESP of Normal Cell. All ES pairs valid: Lmin ≤ y – x ≤ Lmax. 2D Representation: sometimes it is useful to have a 2D representation, in which both axes are genome coordinates and each point (x,y) is an ES pair. All ES pairs lie near the diagonal.

ESP of Tumor Cell. Valid ES pairs satisfy length/direction constraints: Lmin ≤ y – x ≤ Lmax. Invalid ES pairs indicate rearrangements or experimental errors. First we show how to deal with noise.

Clusters and Coverage. Pieces of tumor genome: clones (100-250kb). Sequence ends of clones (500bp). Map end sequences to human genome. The primary source of noise is chimeric clones, joined from random concatenation of non-contiguous portions of the cancer genome. A rearrangement gives a cluster of invalid pairs; a chimeric clone gives an isolated invalid pair. We can deal with noise by clustering. Clusters also allow us to identify breakpoints more precisely.

Clusters. Clone size: Lmin ≤ (a – x1) + (b – y1) ≤ Lmax. Clustering has a nice geometric interpretation: in the 2D representation (both axes are genome coordinates), each clone (x1,y1), (x2,y2) defines a trapezoid of possible breakpoints (a,b). Intersecting the trapezoids localizes the shared breakpoint (a,b).
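The trapezoid picture can be made concrete with a small Python sketch. It assumes the orientation used in the clone size formula on this slide, so that a ≥ x and b ≥ y for every clone in the cluster; other orientations would need the absolute-value form. Function names and the kb coordinates are hypothetical. Under that assumption, the trapezoids of a set of clones intersect exactly when some sum a + b is at least max(x) + max(y) and lies within [x + y + Lmin, x + y + Lmax] for every clone.

```python
def in_breakpoint_region(pair, a, b, l_min, l_max):
    """True if breakpoint (a, b) lies in the trapezoid of ES pair (x, y)."""
    x, y = pair
    return a >= x and b >= y and l_min <= (a - x) + (b - y) <= l_max

def shares_breakpoint(pairs, l_min, l_max):
    """True if the trapezoids of all pairs have a common point (a, b)."""
    max_x = max(x for x, _ in pairs)
    max_y = max(y for _, y in pairs)
    sums = [x + y for x, y in pairs]
    # Feasible values of a + b: at least max_x + max_y (so a >= x, b >= y for
    # all pairs) and inside every clone's [x + y + l_min, x + y + l_max].
    lowest = max(max_x + max_y, max(sums) + l_min)
    highest = min(sums) + l_max
    return lowest <= highest

L_MIN, L_MAX = 100, 250              # clone size bounds in kb (assumed)
cluster = [(100, 500), (120, 530)]   # two invalid pairs, coordinates in kb
print(shares_breakpoint(cluster, L_MIN, L_MAX))
print(all(in_breakpoint_region(p, 150, 600, L_MIN, L_MAX) for p in cluster))
print(shares_breakpoint([(100, 500), (400, 900)], L_MIN, L_MAX))
```

The first cluster admits a common breakpoint such as (150, 600); the last two pairs are too far apart for any single breakpoint to explain both.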

Fusion Genes. [Figure: Gene 1 and Gene 2 in the human genome, an ES pair (x,y), and the breakpoint (a,b) joining them in the tumor.] Can identify fusion genes by location of breakpoint.

Fusion Genes. A pair of genes (Gene1, Gene2) gives a rectangle in the 2D representation, which can overlap with the trapezoids of the clones (x1,y1), (x2,y2). Intersection → probability of fusion gene. Respect direction of transcription. This representation was used to compute how much sequencing is needed to find fusion genes. Bashir, et al. (2008) PLOS Comp Biol.

Results: Fusion Gene in Breast Cancer. BCAS3-BCAS4: Probability of Fusion = 1. Note: More precise sizing information available for some clones. Bashir, et al. (2008) In Press.

ESP Data. Raphael, et al. (2008). Coverage of human genome: ≈ 0.34 for MCF7, BT474.

Sample      End Sequenced   Uniquely Mapped   Invalid ES Pairs
Breast Cancer Cell Lines
MCF7        19831           12143             491
BT474       9850            7547              186
SKBR3       9267            6950              187
Tumors
Breast      9401            6540              164
            7623            5381              113
Prostate    5013            3296              96
Ovary       5570            3714              87
Brain       4198            3051              67

The fusion gene is from the MCF7 breast cancer cell line. The technique has also been applied to several other cell lines and tumor samples. We still do not have complete coverage of the genome, but new sequencing technologies should make this more straightforward. Even with low coverage, there are still very interesting patterns in the data.

Candidate Fusion Genes. Columns: Gene1, Gene2, Probability Fusion. MCF7: ASTN2 PTPRG 1.000 BCAS4 BCAS3 KCND3 PPM1E NTNG1 BCAS1 0.996. BT474: NCOA2 ZNF704 ZFP64 PHACTR3 0.632. SKBR3: PTPRT PHF20 KCNQ3 RIMS2 0.933. Confirmed by clone sequencing. [Figure: 97kb sequenced clone joining PTPRG and ASTN2, chromosomes 3 and 9.]

Breakpoint Detection. Detect a rearrangement breakpoint ζ in the cancer genome when a clone includes the breakpoint; the ends of that clone map to xC and yC in the normal genome.

Lander-Waterman Statistics. Given: N clones of length L from a genome of size G. $P(\zeta \text{ covered by clone}) = 1 - (1 - L/G)^N \approx 1 - e^{-c}$, where $c = NL/G$ is coverage. $P(\text{breakpoint } \zeta \text{ detected}) \approx 1 - e^{-c}$.

c    P(detection)
1    0.632
2    0.864
4    0.982
8    0.999
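The detection probabilities follow directly from the formula, and a short Python sketch shows how close the exponential approximation is to the exact expression. The clone count, clone length, and genome size used for the exact calculation are hypothetical round numbers chosen to give coverage c = 1; they are not the values of any dataset in this lecture.

```python
import math

def p_covered_exact(n_clones: int, clone_len: float, genome_size: float) -> float:
    """Exact probability that a fixed point is covered by at least one clone."""
    return 1.0 - (1.0 - clone_len / genome_size) ** n_clones

def p_detect(coverage: float) -> float:
    """Approximation 1 - e^{-c}, with c = N L / G."""
    return 1.0 - math.exp(-coverage)

# Probability of detection at the coverages listed on the slide.
for c in (1, 2, 4, 8):
    print(c, p_detect(c))

# Hypothetical example: 20,000 clones of 150 kb on a 3 Gb genome gives c = 1.
n, length, genome = 20_000, 150e3, 3e9
print(n * length / genome, p_covered_exact(n, length, genome))
```

Because L/G is tiny, the exact and approximate values agree to several decimal places, which is why coverage alone is enough to estimate detection power.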

Cancer Genome Organization. Can we do more than identify single rearrangements and reconstruct this picture? What is the detailed organization of cancer genomes? What sequence of rearrangements produces these architectures?

ESP Genome Reconstruction Problem. Human genome (known): A B C D E. Unknown sequence of rearrangements. Tumor genome (unknown). Map ES pairs to human genome; the locations of the ES pairs (x1,y1), …, (x5,y5) in the human genome are known. Reconstruct the tumor genome. We call this the ESP Genome Reconstruction Problem.

ESP Genome Reconstruction Problem: example. Human genome (known): A B C D E. After an unknown sequence of rearrangements, the tumor genome (unknown) is -C -D E A B. The ES pairs are mapped to the human genome, and the tumor genome is reconstructed from their known locations.

ESP Plot. 2D Representation of ESP Data: both axes are the human genome, divided into segments A B C D E, and each point (x1,y1), …, (x4,y4) is an ES pair. Can we reconstruct the tumor genome from the positions of the ES pairs?

ESP Plot → Tumor Genome. [Figure: ESP plot with both axes the human genome, segments A B C D E.] Reconstructed Tumor Genome: -C -D E A B

Real data noisy and incomplete! Valid ES pairs satisfy length/direction constraints: Lmin ≤ y – x ≤ Lmax. Invalid ES pairs indicate rearrangements or experimental errors. Clustering can help us with the noise. Injecting a bit of extra biological knowledge also helps with noise and addresses incomplete data. Namely, the rearrangements in genomes result from specific processes that cause particular rearrangements.

Computational Approach. Use known genome rearrangement mechanisms: inversion and translocation. [Figure: human and tumor genomes for an inversion of segment B between positions s and t, and for a translocation.] Appeal to parsimony: find the simplest explanation for ESP data, given these mechanisms. Motivation: genome rearrangement studies in evolution/phylogeny.

ESP Sorting Problem. G = [0,M], unichromosomal genome. Inversion (Reversal) $\rho_{s,t}$:
$$\rho_{s,t}(x) = \begin{cases} x, & \text{if } x < s \text{ or } x > t, \\ t - (x - s), & \text{otherwise.} \end{cases}$$
Example: G = A B C with ES pairs (x1, y1), (x2, y2); G' = $\rho_{s,t}$ G = A -B C. Given: ES pairs (x1, y1), …, (xn, yn). Find: Minimum number of reversals $\rho_{s_1,t_1}, \ldots, \rho_{s_n,t_n}$ such that if $\rho = \rho_{s_1,t_1} \cdots \rho_{s_n,t_n}$, then $(\rho x_1, \rho y_1), \ldots, (\rho x_n, \rho y_n)$ are valid ES pairs. Complexity of this problem unknown.
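A minimal Python sketch of the coordinate map for a single inversion on [s, t] may help here. The function names, the toy coordinates, and the bounds Lmin = 2, Lmax = 5 are all hypothetical, and the example checks only the length constraint of a valid ES pair, not its orientation.

```python
def reversal(s: int, t: int):
    """Return the map x -> x outside [s, t], and x -> t - (x - s) inside it."""
    def rho(x: int) -> int:
        return x if x < s or x > t else t - (x - s)
    return rho

def length_ok(x: int, y: int, l_min: int, l_max: int) -> bool:
    """Length part of the validity test only (orientation is ignored here)."""
    return l_min <= y - x <= l_max

rho = reversal(10, 20)
print([rho(x) for x in (5, 10, 12, 20, 25)])   # endpoints swap, outside fixed

# An ES pair that looks too long in the human genome becomes consistent
# once the inversion of [10, 20] is applied to both coordinates.
x, y = 8, 19
print(length_ok(x, y, 2, 5), length_ok(rho(x), rho(y), 2, 5))
```

Applying the same map twice returns every coordinate to where it started, which matches the intuition that repeating an inversion undoes it.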

Sparse Data Assumptions. 1. Each cluster results from a single inversion or translocation. 2. Each clone contains at most one breakpoint. [Figure: ES pairs (x1,y1), (x2,y2), (x3,y3) shown on the human genome and on the tumor genome.]

ESP Genome Reconstruction: Discrete Approximation. We can get a discrete approximation of this problem as follows. 1) Remove isolated invalid pairs (x,y). 2) Define segments from clusters. 3) ES orientations define links between segment ends. [Figure: ESP plot with both axes the human genome, showing clusters (x1, y1), (x2, y2), (x3, y3) and segment ends s, t.]

ESP Graph. Edges: human genome segments and ES pairs. Paths in the graph are tumor genome architectures. Example: Tumor Genome (1 -3 -4 2 5), Human (1 2 3 4 5). A minimal sequence of translocations and inversions between them follows from Hannenhalli-Pevzner theory.

Sorting Permutations by Reversals (Sankoff et al. 1990). $\pi = \pi_1 \pi_2 \ldots \pi_n$ signed permutation. Reversal $\rho(i,j)$ [inversion]: $\pi_1 \ldots \pi_{i-1}\; {-\pi_j} \ldots {-\pi_i}\; \pi_{j+1} \ldots \pi_n$. Problem: Given $\pi$, find a sequence of reversals $\rho_1, \ldots, \rho_t$ such that $\pi \cdot \rho_1 \cdot \rho_2 \cdots \rho_t = (1, 2, \ldots, n)$ and $t$ is minimal. Solution: Analysis of breakpoint graph ← ESP graph. Polynomial time algorithms: $O(n^4)$: Hannenhalli and Pevzner, 1995. $O(n^2)$: Kaplan, Shamir, Tarjan, 1997. $O(n)$ [distance t]: Bader, Moret, and Yan, 2001. $O(n^3)$: Bergeron, 2001.
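To make the reversal operation concrete, here is a small Python sketch that applies a signed reversal and finds the reversal distance of a short permutation by brute-force breadth-first search. The search is only an illustration for tiny n; it is not one of the polynomial time algorithms listed above, and the function names are hypothetical.

```python
from collections import deque

def apply_reversal(perm, i, j):
    """Reverse and negate elements i..j (1-based, inclusive) of a signed permutation."""
    return perm[:i - 1] + [-v for v in reversed(perm[i - 1:j])] + perm[j:]

def reversal_distance(perm):
    """Minimum number of reversals to reach (1, ..., n), by exhaustive BFS."""
    n = len(perm)
    target = tuple(range(1, n + 1))
    start = tuple(perm)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == target:
            return dist[cur]
        for i in range(1, n + 1):
            for j in range(i, n + 1):
                nxt = tuple(apply_reversal(list(cur), i, j))
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)

pi = [1, -3, -4, 2, 5]
step1 = apply_reversal(pi, 3, 4)      # reverse and negate (-4 2)
step2 = apply_reversal(step1, 2, 3)   # reverse and negate (-3 -2)
print(step1, step2, reversal_distance(pi))
```

The state space has n! · 2^n signed permutations, so this search is practical only for very small examples, which is the motivation for the breakpoint graph analysis.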

Sorting Permutations. $\pi$ = 1 -3 -4 2 5 → 1 -3 -2 4 5 → 1 2 3 4 5

Breakpoint Graph. Black edges: adjacent elements of $\pi$, where $\pi$ = 1 -3 -4 2 5 is framed as start 1 -3 -4 2 5 end. Gray edges: adjacent elements of i = 1 2 3 4 5, framed as start 1 2 3 4 5 end. Key parameter: black-gray cycles.

Breakpoint Graph. ESP Graph → Tumor Permutation and Breakpoint Graph. Black edges: adjacent elements of $\pi$ (examples: start 1 -3 -4 2 5 end, start 1 -3 -2 4 5 end). Gray edges: adjacent elements of i = 1 2 3 4 5 (start 1 2 3 4 5 end). Key parameter: black-gray cycles. Theorem: The minimum number of reversals to transform $\pi$ to the identity permutation $i$ satisfies $d(\pi) \ge n + 1 - c(\pi)$, where $c(\pi)$ = number of gray-black cycles.
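The cycle count in the lower bound can be computed with a short Python sketch. It uses the standard encoding of a signed permutation as an unsigned one (element x becomes the pair 2x-1, 2x if positive, or 2|x|, 2|x|-1 if negative, framed by 0 and 2n+1), which is an assumption about the construction since the slide shows only the picture. Function names are hypothetical.

```python
def count_cycles(perm):
    """Number of alternating black-gray cycles in the breakpoint graph of perm."""
    n = len(perm)
    seq = [0]                                   # "start" vertex
    for x in perm:
        seq += [2 * x - 1, 2 * x] if x > 0 else [-2 * x, -2 * x - 1]
    seq.append(2 * n + 1)                       # "end" vertex

    black, gray = {}, {}
    for k in range(0, 2 * n + 2, 2):
        u, v = seq[k], seq[k + 1]               # black: adjacent in the permutation
        black[u], black[v] = v, u
        gray[k], gray[k + 1] = k + 1, k         # gray: adjacent in the identity

    seen, cycles = set(), 0
    for start in seq:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:                    # follow black edge, then gray edge
            seen.add(v)
            w = black[v]
            seen.add(w)
            v = gray[w]
    return cycles

def reversal_lower_bound(perm):
    """Lower bound n + 1 - c on the reversal distance."""
    return len(perm) + 1 - count_cycles(perm)

for p in ([1, -3, -4, 2, 5], [1, -3, -2, 4, 5], [1, 2, 3, 4, 5]):
    print(p, count_cycles(p), reversal_lower_bound(p))
```

For the tumor permutation 1 -3 -4 2 5 the graph has four cycles, so at least two reversals are needed; the identity has n + 1 cycles and a bound of zero.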

Breakpoint Graph. ESP Graph → Tumor Permutation and Breakpoint Graph. Theorem: The minimum number of reversals to transform $\pi$ to the identity permutation $i$ is $d(\pi) = n + 1 - c(\pi) + h(\pi) + f(\pi)$, where $c(\pi)$ = number of gray-black cycles, $h(\pi)$ = number of hurdles, and $f(\pi)$ = 1 if $\pi$ is a fortress and 0 otherwise.

Multichromosomal Sorting. Concatenate chromosomes. Translocations are modeled by reversals in the concatenate. Minimal sequence in polynomial time (Hannenhalli & Pevzner 1996, Tesler 2003, Ozery-Flato and Shamir, 2003). Example: the translocation (A1 A2), (B1 B2) → (A1 B2), (B1 A2) corresponds, after concatenation, to the reversal A1 A2 -B2 -B1 → A1 B2 -A2 -B1.
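The correspondence between a translocation and a reversal in the concatenate can be checked with a few lines of Python. The integer labels for A1, A2, B1, B2 and the helper names are hypothetical; the second chromosome is flipped before concatenation, as in the example on this slide.

```python
def flip(chrom):
    """Reverse a chromosome and negate every segment."""
    return [-g for g in reversed(chrom)]

def concatenate(chrom_a, chrom_b):
    """Join two chromosomes into one signed sequence, with the second flipped."""
    return chrom_a + flip(chrom_b)

def reverse_segment(genome, i, j):
    """Reversal of the half-open slice [i, j) (0-based) of a signed sequence."""
    return genome[:i] + flip(genome[i:j]) + genome[j:]

A1, A2, B1, B2 = 1, 2, 3, 4                  # illustrative segment labels
concat = concatenate([A1, A2], [B1, B2])     # A1 A2 -B2 -B1
after = reverse_segment(concat, 1, 3)        # reverse the middle: A1 B2 -A2 -B1
new_a, new_b = after[:2], flip(after[2:])    # split back into two chromosomes
print(concat, after, new_a, new_b)
```

Splitting the result at the same point gives the chromosomes (A1 B2) and (B1 A2), which is exactly the outcome of the translocation.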

MCF7 Breast Cancer Cell Line Applied this technique to MCF7.

MCF7 Breast Cancer Cell Line. Sequence from human chromosomes to MCF7 chromosomes: 5 inversions, 15 translocations. There can be many minimal sequences. Some events are early in many minimal sequences. This leaves out an important event: duplications. Raphael, et al. (2003) Bioinformatics

What about duplications? Clusters were biased to a very small region of the genome. These regions are duplicated, which can be measured independently. Take a duplication-centric view. 11240 ES pairs: 10453 valid (black), 737 invalid. Of the invalid pairs, 489 are isolated (red) and 248 form 70 clusters (blue). 33/70 clusters; total length: 31Mb.