Rearrangements and Duplications in Tumor Genomes.

Tumor Genomes. Compromised genome stability; mutation and selection. Chromosomal aberrations:
–Structural: translocations, inversions, fissions, fusions.
–Copy number changes: gain and loss of chromosome arms, segmental duplications/deletions.

Rearrangements in Tumors. Rearrangements can change gene structure and create novel fusion genes. Gleevec (Novartis, 2001) targets the BCR-ABL fusion.

Rearrangements in Tumors. Rearrangements can also alter gene regulation, e.g. the Burkitt lymphoma translocation (image credit: Gregory Schuler, NCBI, NIH, Bethesda, MD, USA) and a regulatory fusion in prostate cancer (Tomlins et al., Science, Oct. 2005).

Complex Tumor Genomes. 1) What are the detailed architectures of tumor genomes? 2) Which genes are affected? 3) What processes produce these architectures? 4) Can we create custom treatments for tumors based on their mutational spectrum (e.g. Gleevec)?

Common Alterations across Tumors. Mutations activate or repress circuits, so there are multiple points of attack. Some targets are “master genes” (e.g. p53, Myc); others are probably tissue- or tumor-specific. (Figure legend: activation, repression, duplicated genes, deleted genes.)

Human Cancer Genome Project. What tumors to sequence? What to sequence from each tumor? 1. Whole genome: all alterations. 2. Specific genes: point mutations. 3. Hybrid approach: structural rearrangements, etc.

End Sequence Profiling (ESP) (C. Collins and S. Volik, UCSF Cancer Center). 1) Pieces of tumor genome: clones ( kb). 2) Sequence the ends of the clones (500 bp). 3) Map the end sequences to the human genome. Each clone corresponds to a pair of end sequences (ES pair) (x, y). Retain clones that correspond to a unique ES pair.

Valid ES pairs: l ≤ y – x ≤ L, where l (L) is the minimum (maximum) clone size, and the two ends have convergent orientation.

Invalid ES pairs indicate a putative rearrangement in the tumor; the ES directions point toward the breakpoints.
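A minimal sketch of the validity test above, assuming each end is stored as a mapped coordinate plus a strand, where '+' means the end reads toward increasing coordinates (so "convergent" means the left end is '+' and the right end is '-'). The names ESPair, is_valid, and partition, and the clone-size bounds in the example, are hypothetical.

```python
from typing import List, NamedTuple, Tuple

class ESPair(NamedTuple):
    """One clone's end sequences mapped to the human genome (hypothetical encoding)."""
    x: int         # mapped coordinate of one end
    x_strand: str  # '+' = end reads toward increasing coordinates, '-' = decreasing
    y: int         # mapped coordinate of the other end
    y_strand: str

def is_valid(p: ESPair, l: int, L: int) -> bool:
    """Valid iff l <= y - x <= L and the two ends point toward each other."""
    # Order the two ends by coordinate so "left" is the smaller one.
    (lx, ls), (rx, rs) = sorted([(p.x, p.x_strand), (p.y, p.y_strand)])
    convergent = ls == '+' and rs == '-'
    return convergent and l <= rx - lx <= L

def partition(pairs: List[ESPair], l: int, L: int) -> Tuple[List[ESPair], List[ESPair]]:
    """Split ES pairs into (valid, invalid); invalid ones are candidate rearrangements."""
    valid = [p for p in pairs if is_valid(p, l, L)]
    invalid = [p for p in pairs if not is_valid(p, l, L)]
    return valid, invalid

# Hypothetical clone-size bounds l = 100 kb, L = 250 kb.
pairs = [
    ESPair(1_000, '+', 151_000, '-'),   # convergent, span in range -> valid
    ESPair(1_000, '+', 900_000, '-'),   # span too long -> invalid
    ESPair(5_000, '+', 120_000, '+'),   # same orientation -> invalid
]
valid, invalid = partition(pairs, l=100_000, L=250_000)
print(len(valid), len(invalid))  # 1 2
```

Both length and orientation are checked because, as the slides note, either kind of violation can point to a rearrangement.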

Outline. What does ESP reveal about tumor genomes? 1. Identify the locations of rearrangements. 2. Reconstruct the genome architecture and the sequence of rearrangements. 3. Combine ESP with other genome data (CGH).

ESP Data (Jan. 2006). Coverage of the human genome: ≈ 0.34 for MCF7 and BT474. (Table: numbers of ES pairs and clones per sample. Breast cancer cell lines: BT474, MCF7, SKBR3. Other samples: normal brain; tumors Breast1, Breast2, Ovary, Prostate.)

1. Rearrangement breakpoints. Breakpoints lie near known cancer genes (e.g. ZNF217, BCAS3/4, STAT3) and near novel candidate genes. In MCF7 breast cancer, small-scale scrambling of the genome is more extensive than expected.

Structural Polymorphisms. Human genetic variation includes more than nucleotide substitutions: short indels and inversions are also present (Iafrate et al. 2004, Sebat et al. 2004, Tuzun et al. 2005, McCarroll et al. 2006, Conrad et al., etc.). About 3% (53/1570) of invalid ES pairs are explained by known structural variants. (Figure: a 1.6 Mb inversion between positions s and t in a human variant relative to the reference human genome.)

2. Tumor Genome Architecture. 1) What are the detailed architectures of tumor genomes? 2) What sequence of rearrangements produces these architectures?

ESP Genome Reconstruction Problem. The human genome is known; the tumor genome is unknown and was produced by an unknown sequence of rearrangements. Mapping ES pairs to the human genome gives their locations in the human genome (known). Goal: reconstruct the tumor genome. (Figure: human genome segments A–E; tumor genome -C -D E A B; ES pairs (x1, y1), …, (x5, y5).)

ESP Genome Reconstruction: Comparative Genomics. (Figure: the tumor genome -C -D E A B compared with the human genome; the ends of ES pairs (x1, y1), …, (x4, y4) are marked on both.)

2D Representation of ESP Data (ESP Plot). Each point is an ES pair (x, y), plotted by its coordinates in the human genome (segments A–E along both axes). Can we reconstruct the tumor genome from the positions of the ES pairs?
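To see why rearrangements appear as isolated points in the ESP plot, here is a toy sketch that assumes hypothetical human coordinates for segments A–E and the tumor order -C -D E A B. For each pair of adjacent tumor segments it returns the human coordinates (x, y) of the two segment ends that meet at that junction; an ES pair spanning the junction maps near this point. Junctions that are also adjacent in the human genome give x = y; the others mark rearrangements. The names HUMAN, segment_ends, and junction_points are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical human coordinates (start, end) for segments A-E.
HUMAN: Dict[str, Tuple[int, int]] = {
    "A": (0, 10), "B": (10, 20), "C": (20, 30), "D": (30, 40), "E": (40, 50),
}

def segment_ends(signed: str) -> Tuple[int, int]:
    """Human coordinates of the (left, right) ends of a signed segment as read in the tumor."""
    start, end = HUMAN[signed.lstrip("-")]
    # A reversed segment "-X" is traversed from its human end back to its human start.
    return (end, start) if signed.startswith("-") else (start, end)

def junction_points(tumor: List[str]) -> List[Tuple[int, int]]:
    """(x, y) in human coordinates for each junction between consecutive tumor segments."""
    points = []
    for left, right in zip(tumor, tumor[1:]):
        x = segment_ends(left)[1]   # right end of the left segment
        y = segment_ends(right)[0]  # left end of the right segment
        points.append((x, y))
    return points

points = junction_points(["-C", "-D", "E", "A", "B"])
print(points)                       # [(20, 40), (30, 40), (50, 0), (10, 10)]
print([x != y for x, y in points])  # [True, True, True, False]
```

Reading the plot backwards (from points to segment order) is the reconstruction problem posed above.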

ESP Plot → Tumor Genome. (Figure: the tumor genome -C -D E A B reconstructed from the ESP plot.)

Real data are noisy and incomplete! Valid ES pairs satisfy length and direction constraints (l ≤ y – x ≤ L). Invalid ES pairs indicate either rearrangements or experimental errors.

Computational Approach. 1. Use known genome rearrangement mechanisms (inversion, translocation). 2. Find the simplest explanation for the ESP data, given these mechanisms. 3. Motivation: genome rearrangement studies in phylogeny. (Figure: an inversion and a translocation relating the human and tumor genomes.)

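ESP Sorting Problem. Let G = [0, M] be a unichromosomal genome. The reversal $\rho_{s,t}$ acts on a coordinate x by $\rho_{s,t}(x) = x$ if $x < s$ or $x > t$, and $\rho_{s,t}(x) = t - (x - s)$ otherwise.
Given: ES pairs $(x_1, y_1), \ldots, (x_n, y_n)$.
Find: the minimum number m of reversals $\rho_{s_1,t_1}, \ldots, \rho_{s_m,t_m}$ such that, if $\rho = \rho_{s_1,t_1} \cdots \rho_{s_m,t_m}$, then $(\rho x_1, \rho y_1), \ldots, (\rho x_n, \rho y_n)$ are valid ES pairs.
(Figure: $G' = \rho G$; the reversal $\rho_{s,t}$ inverts segment B, and the ES pairs $(x_1, y_1)$, $(x_2, y_2)$ are shown in G and in G'.)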
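A sketch of $\rho_{s,t}$ acting on ES pairs. Beyond the slide, it assumes each end also carries a strand that flips when the end lies inside [s, t], since validity requires convergent orientation. It applies a given list of reversals in list order and re-checks validity; it does not search for the minimum number of reversals. The names End, reverse_end, is_valid, and apply_reversals are hypothetical, and the toy example places segment B at [10, 20] with l = 1, L = 5.

```python
from typing import List, NamedTuple, Tuple

class End(NamedTuple):
    pos: int     # coordinate in G = [0, M]
    strand: str  # '+' or '-' (hypothetical encoding of read direction)

def reverse_end(e: End, s: int, t: int) -> End:
    """Apply rho_{s,t}: x -> t - (x - s) inside [s, t]; the strand flips there too."""
    if s <= e.pos <= t:
        return End(t - (e.pos - s), "-" if e.strand == "+" else "+")
    return e

def is_valid(a: End, b: End, l: int, L: int) -> bool:
    """Convergent orientation and l <= right - left <= L."""
    left, right = sorted((a, b), key=lambda e: e.pos)
    return left.strand == "+" and right.strand == "-" and l <= right.pos - left.pos <= L

def apply_reversals(pairs: List[Tuple[End, End]],
                    reversals: List[Tuple[int, int]]) -> List[Tuple[End, End]]:
    """Apply each reversal (s, t), in list order, to both ends of every ES pair."""
    result = []
    for a, b in pairs:
        for s, t in reversals:
            a, b = reverse_end(a, s, t), reverse_end(b, s, t)
        result.append((a, b))
    return result

# Toy tumor A -B C with B = [10, 20]: both junction-spanning pairs look invalid.
pairs = [(End(9, "+"), End(19, "+")), (End(11, "-"), End(21, "-"))]
print([is_valid(a, b, 1, 5) for a, b in pairs])   # [False, False]
fixed = apply_reversals(pairs, [(10, 20)])
print([is_valid(a, b, 1, 5) for a, b in fixed])   # [True, True]
```

A single reversal over B's interval makes both pairs valid, which is the kind of "simplest explanation" the problem asks for.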

After a sequence of reversals, all ES pairs are valid. (Figure: reversals applied to segments A, B, C make the ES pairs (x1, y1), (x2, y2), (x3, y3) valid.)

Filtering Experimental Noise. A rearrangement produces a cluster of invalid pairs, whereas a chimeric clone (two non-contiguous DNA fragments joined in one clone) produces an isolated invalid pair.
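The slides do not state the clustering rule, so as one plausible sketch, assume two invalid pairs belong to the same cluster when both their x-ends and their y-ends lie within L (the maximum clone size) of each other; clusters are the connected components, and singletons are treated as isolated noise. The function name and the threshold rule are assumptions; a fuller version would also require matching orientations.

```python
from typing import Dict, List, Tuple

def cluster_invalid_pairs(pairs: List[Tuple[int, int]],
                          L: int) -> Tuple[List[List[int]], List[int]]:
    """Group invalid ES pairs whose x-ends and y-ends both lie within L of each other.

    Returns (clusters, isolated): clusters are index lists with >= 2 pairs;
    isolated pairs (singletons) are discarded as likely noise, e.g. chimeric clones.
    """
    parent = list(range(len(pairs)))

    def find(i: int) -> int:  # union-find with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, (xi, yi) in enumerate(pairs):
        for j in range(i + 1, len(pairs)):
            xj, yj = pairs[j]
            if abs(xi - xj) <= L and abs(yi - yj) <= L:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(pairs)):
        groups.setdefault(find(i), []).append(i)
    clusters = [g for g in groups.values() if len(g) > 1]
    isolated = [g[0] for g in groups.values() if len(g) == 1]
    return clusters, isolated

invalid = [(100, 5_000), (150, 5_100), (9_000, 20_000), (120, 5_050)]
print(cluster_invalid_pairs(invalid, L=500))  # ([[0, 1, 3]], [2])
```

The pairwise comparison is quadratic, which is acceptable for a few hundred invalid pairs.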

Sparse Data Assumptions. 1. Each cluster results from a single inversion. 2. Each clone contains at most one breakpoint. (Figure: ES pairs (x1, y1), (x2, y2), (x3, y3) in the tumor genome and in the human genome.)

ESP Genome Reconstruction: Discrete Approximation. 1) Remove isolated invalid pairs (x, y). 2) Define segments from clusters. 3) ES orientations define links between segment ends. (Figure: clustered ES pairs (x1, y1), (x2, y2), (x3, y3) delimiting a segment between s and t.)

ESP Graph. Edges: 1. human genome segments; 2. ES pairs. Paths in the graph are tumor genome architectures: each path spells the tumor genome as a signed permutation of the human genome segments.
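A sketch of reading a tumor architecture off the ESP graph, assuming segments are numbered 1..n in human order, each segment has a tail 't' (left end in the human genome) and a head 'h', and, following the sparse-data assumptions, each segment end lies on at most one ES-pair edge, so the path is forced. With branching, one would have to enumerate paths instead. The encoding and the name walk_esp_graph are hypothetical.

```python
from typing import Dict, List, Tuple

End = Tuple[int, str]  # (segment number, 't' = tail/left end in human order, 'h' = head)

def walk_esp_graph(adjacencies: List[Tuple[End, End]], start: End) -> List[int]:
    """Follow alternating segment and ES-pair edges from `start`; return a signed permutation."""
    link: Dict[End, End] = {}
    for a, b in adjacencies:  # each ES-pair edge joins two segment ends
        link[a] = b
        link[b] = a
    perm: List[int] = []
    seen = set()
    seg, end = start
    while seg not in seen:
        seen.add(seg)
        # Entering at the tail reads the segment forward (+); entering at the head, reversed (-).
        perm.append(seg if end == "t" else -seg)
        exit_end = (seg, "h" if end == "t" else "t")  # traverse the segment edge
        if exit_end not in link:
            break  # no ES pair continues the path
        seg, end = link[exit_end]
    return perm

# Segments 1..5 = A..E; ES-pair edges for the tumor -C -D E A B.
adj = [((3, "t"), (4, "h")), ((4, "t"), (5, "t")), ((5, "h"), (1, "t")), ((1, "h"), (2, "t"))]
print(walk_esp_graph(adj, (3, "h")))  # [-3, -4, 5, 1, 2]
```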

Sorting permutations by reversals (Sankoff et al. 1990). A reversal $\rho(i,j)$ (an inversion) transforms the signed permutation $\pi = \pi_1 \pi_2 \cdots \pi_n$ into $\pi_1 \cdots \pi_{i-1}\, {-\pi_j} \cdots {-\pi_i}\, \pi_{j+1} \cdots \pi_n$.
Problem: given $\pi$, find a sequence of reversals $\rho_1, \ldots, \rho_t$ such that $\pi \cdot \rho_1 \cdot \rho_2 \cdots \rho_t = (1, 2, \ldots, n)$ and t is minimal.
Solution: analysis of the breakpoint graph (derived from the ESP graph).
Polynomial-time algorithms: O(n^4): Hannenhalli and Pevzner; O(n^2): Kaplan, Shamir, Tarjan; O(n) (computing the distance t only): Bader, Moret, and Yan; O(n^3): Bergeron.
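The reversal $\rho(i,j)$ above can be written directly; to check small examples, the sketch below also computes the exact reversal distance by breadth-first search over all signed permutations. That search is exponential and is only a brute-force stand-in for testing, not one of the polynomial-time algorithms listed. Function names are illustrative.

```python
from collections import deque
from typing import Dict, List, Tuple

def reversal(perm: List[int], i: int, j: int) -> List[int]:
    """rho(i, j), 1-indexed and inclusive: reverse pi_i..pi_j and flip their signs."""
    if not 1 <= i <= j <= len(perm):
        raise ValueError("need 1 <= i <= j <= n")
    middle = [-x for x in reversed(perm[i - 1:j])]
    return perm[:i - 1] + middle + perm[j:]

def reversal_distance(perm: List[int]) -> int:
    """Exact reversal distance by breadth-first search (tiny n only)."""
    n = len(perm)
    target = tuple(range(1, n + 1))
    start = tuple(perm)
    dist: Dict[Tuple[int, ...], int] = {start: 0}
    queue = deque([start])
    while queue:
        p = queue.popleft()
        if p == target:
            return dist[p]
        for i in range(1, n + 1):
            for j in range(i, n + 1):
                q = tuple(reversal(list(p), i, j))
                if q not in dist:
                    dist[q] = dist[p] + 1
                    queue.append(q)
    raise ValueError("input is not a signed permutation of 1..n")

print(reversal([1, -3, -2, 4], 2, 3))  # [1, 2, 3, 4]
print(reversal_distance([2, 1]))       # 3
```

Even the two-element permutation (2, 1) needs three reversals, which shows why the exact problem calls for the careful breakpoint-graph analysis.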

Sorting Permutations 

Breakpoint Graph end  start Black edges: adjacent elements of  end Gray edges: adjacent elements of i = Key parameter: Black-gray cycles

Breakpoint Graph end  start Theorem: Minimum number of reversals to transform  to identity permutation i is: d(  ) ≥ n+1 - c(  ) where c(  ) = number of gray-black cycles. Black edges: adjacent elements of  end start end Gray edges: adjacent elements of i = ESP Graph → Tumor Permutation and Breakpoint Graph Key parameter: Black-gray cycles
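A sketch of counting black-gray cycles and evaluating the bound $n + 1 - c(\pi)$. It uses the standard unsigned encoding of a signed permutation (+x becomes 2x-1, 2x and -x becomes 2x, 2x-1, framed by 0 and 2n+1); the function names are illustrative.

```python
from typing import Dict, List

def breakpoint_cycles(perm: List[int]) -> int:
    """Number of black-gray cycles c(pi) in the breakpoint graph of a signed permutation."""
    n = len(perm)
    # Unsigned encoding: +x -> (2x-1, 2x), -x -> (2x, 2x-1), framed by 0 (start) and 2n+1 (end).
    seq = [0]
    for x in perm:
        a = abs(x)
        seq += [2 * a - 1, 2 * a] if x > 0 else [2 * a, 2 * a - 1]
    seq.append(2 * n + 1)
    black: Dict[int, int] = {}  # adjacent elements of pi
    for k in range(0, len(seq), 2):
        black[seq[k]], black[seq[k + 1]] = seq[k + 1], seq[k]
    gray: Dict[int, int] = {}   # adjacent elements of the identity permutation
    for k in range(0, 2 * n + 2, 2):
        gray[k], gray[k + 1] = k + 1, k
    seen, cycles = set(), 0
    for v0 in range(2 * n + 2):
        if v0 in seen:
            continue
        cycles += 1
        v = v0
        while v not in seen:  # alternate black and gray edges until the cycle closes
            seen.add(v)
            w = black[v]
            seen.add(w)
            v = gray[w]
    return cycles

def reversal_lower_bound(perm: List[int]) -> int:
    """d(pi) >= n + 1 - c(pi)."""
    return len(perm) + 1 - breakpoint_cycles(perm)

print(breakpoint_cycles([1, -3, -2, 4]), reversal_lower_bound([1, -3, -2, 4]))  # 4 1
```

The bound is not always tight: for (2, 1) it gives 2, while the true distance is 3. The gap comes from hurdles, which the full Hannenhalli-Pevzner formula accounts for.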

MCF7 Breast Cancer Cell Line. Low-resolution chromosome painting suggests a complex architecture with many translocations and inversions.

ESP Data from the MCF7 tumor genome. Each point (x, y) is an ES pair, plotted by its coordinates in the human genome. 6239 ES pairs (June 2003): 5856 valid (black) and 383 invalid, of which 256 are isolated (red) and 127 form 30 clusters (blue).

MCF7 Genome. Human chromosomes vs. MCF7 chromosomes: the reconstructed sequence of rearrangements contains 5 inversions and 15 translocations (Raphael, Volik, Collins, Pevzner, Bioinformatics).

3. Combining ESP with other genome data: Array Comparative Genomic Hybridization (aCGH).

CGH Analysis. Divide the genome into segments of equal copy number (segmentation), giving a copy number profile (copy number vs. genome coordinate). Numerous methods exist (e.g. clustering, Hidden Markov Models, Bayesian approaches). Segmentation gives no information about:
–Structural rearrangements (inversions, translocations).
–Locations of duplicated material in the tumor genome.
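As a deliberately naive sketch of segmentation (much simpler than the clustering, HMM, or Bayesian methods named above), the code below rounds each probe's copy-number estimate and merges consecutive probes with equal rounded values. The probe positions and values are invented. Its output illustrates the slide's point: it reports segments and copy numbers but nothing about how the copies are joined.

```python
from typing import List, Tuple

def segment_copy_number(positions: List[int],
                        values: List[float]) -> List[Tuple[int, int, int]]:
    """Merge consecutive probes whose rounded copy number is equal.

    Returns (first_probe_position, last_probe_position, copy_number) per segment.
    Note: Python's round() sends exact halves to the even integer.
    """
    segments: List[Tuple[int, int, int]] = []
    for pos, val in zip(positions, values):
        cn = round(val)
        if segments and segments[-1][2] == cn:
            start = segments[-1][0]
            segments[-1] = (start, pos, cn)  # extend the current segment
        else:
            segments.append((pos, pos, cn))  # start a new segment
    return segments

print(segment_copy_number([0, 10, 20, 30, 40, 50], [2.1, 1.9, 2.2, 4.8, 5.1, 2.0]))
# [(0, 20, 2), (30, 40, 5), (50, 50, 2)]
```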

CGH Segmentation. How are the copies of segments linked in the tumor genome? ES pairs link segments. (Axes: copy number vs. genome coordinate.)

ESP + CGH. ES pairs fall near CGH segment boundaries, so ESP breakpoints can be compared with CGH breakpoints. (Axes: copy number vs. genome coordinate.)

ESP and CGH Breakpoints. Overlap of ESP breakpoints and CGH breakpoints in BT474 and MCF7 (figure values: 33; 730; /39 clusters; 8/33 clusters; P = 5.4 x ; P = 1.2 x ).

Microdeletion in BT. An ES pair spans ≈ 600 kb, whereas valid ES pairs span < 250 kb, consistent with a deletion; there are “interesting” genes in this region. (Axis: copy number.)

Combining ESP and CGH. ES pairs link segments. Copy number balances at each segment boundary, e.g. 5 = 3 + 2. (Figure: segments with copy numbers 3, 2, and 5 vs. genome coordinate.)

Combining ESP and CGH. CGH copy numbers are not exact. What genome architecture is “most consistent” with the ESP and CGH data? (Figure: copy number ranges 3 ≤ f(e) ≤ 5, 1 ≤ f(e) ≤ 3, 1 ≤ f(e) ≤ 4 on CGH segments.)

Combining ESP and CGH. Build a graph: 1. an edge for each CGH segment; 2. an edge for each ES pair consistent with the segments; 3. a range of copy number values for each CGH edge, e.g. 3 ≤ f(e) ≤ 5, 1 ≤ f(e) ≤ 3, 1 ≤ f(e) ≤ 4. (Axes: copy number vs. genome coordinate.)

Network Flow Problem. Minimum Cost Circulation with Capacity Constraints (cf. Sequencing by Hybridization, Sequence Assembly). Add a source/sink vertex and solve
$\min \sum_e \gamma(e) f(e)$
subject to
$\sum_{(u,v)} f((u,v)) = \sum_{(v,w)} f((v,w)) \quad \forall v$ (flow in = flow out at each vertex),
$l(e) \le f(e) \le u(e) \quad \forall e$.
Flow constraints: for a CGH edge, l(e) and u(e) come from CGH; for an ESP edge, l(e) = u(e) = 1.
Costs: $\gamma(e) = 0$ if e is an ESP or CGH edge; $\gamma(e) = 1$ if e is incident to the source/sink.
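The constraints above can be made concrete with a feasibility checker. This is a sketch that verifies a candidate flow and computes its cost; it does not solve the minimum cost circulation (a real solver, e.g. network simplex, would search for the flow). The toy graph, the single vertex "S" standing for the source/sink, the large upper bound on source/sink edges, and the function names are all assumptions.

```python
from typing import Dict, List, Tuple

# Edge name -> (tail, head, lower bound l(e), upper bound u(e), cost gamma(e)).
Edge = Tuple[str, str, int, int, int]

def check_circulation(edges: Dict[str, Edge], flow: Dict[str, int]) -> List[str]:
    """Return violated constraints (an empty list means a feasible circulation)."""
    problems: List[str] = []
    balance: Dict[str, int] = {}
    for name, (u, v, lo, hi, _cost) in edges.items():
        f = flow.get(name, 0)
        if not lo <= f <= hi:
            problems.append(f"bound violated on {name}: {f} not in [{lo}, {hi}]")
        balance[u] = balance.get(u, 0) - f  # flow out of u
        balance[v] = balance.get(v, 0) + f  # flow into v
    for vertex, net in sorted(balance.items()):
        if net != 0:
            problems.append(f"conservation violated at {vertex}: in - out = {net}")
    return problems

def cost(edges: Dict[str, Edge], flow: Dict[str, int]) -> int:
    """Objective sum_e gamma(e) * f(e)."""
    return sum(gamma * flow.get(name, 0) for name, (_u, _v, _lo, _hi, gamma) in edges.items())

INF = 10**9  # stand-in for an unbounded capacity
edges: Dict[str, Edge] = {
    "src->a": ("S", "a", 0, INF, 1),
    "cgh1": ("a", "b", 3, 5, 0),      # CGH edge with 3 <= f(e) <= 5
    "cgh2": ("b", "c", 1, 3, 0),      # CGH edge with 1 <= f(e) <= 3
    "esp1": ("b", "d", 1, 1, 0),      # ESP edge: l(e) = u(e) = 1
    "c->sink": ("c", "S", 0, INF, 1),
    "d->sink": ("d", "S", 0, INF, 1),
}
flow = {"src->a": 4, "cgh1": 4, "cgh2": 3, "esp1": 1, "c->sink": 3, "d->sink": 1}
print(check_circulation(edges, flow), cost(edges, flow))  # [] 8
```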

Network Flow Results. Unsatisfied flow marks putative locations of missing ESP data, which helps prioritize further sequencing, e.g. targeted ESP by screening the library with CGH probes.

Network Flow Results. Identify amplified translocations: 14 in MCF7 and 5 in BT474. Flow values give edge multiplicities, and an Eulerian cycle in the combined graph gives the tumor genome architecture.
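A sketch of the last step, expanding each edge into as many copies as its flow value and finding an Eulerian cycle with Hierholzer's algorithm. It treats the graph as directed, which is a simplifying assumption, and assumes every vertex is balanced and all edges are reachable from the start vertex. The toy multiplicities and the function name are illustrative.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Tuple

def eulerian_cycle(multiplicity: Dict[Tuple[Hashable, Hashable], int],
                   start: Hashable) -> List[Hashable]:
    """Hierholzer's algorithm on a directed multigraph given edge multiplicities."""
    adj = defaultdict(list)
    for (u, v), m in multiplicity.items():
        adj[u].extend([v] * m)  # one copy of the edge per unit of flow
    stack, cycle = [start], []
    while stack:
        v = stack[-1]
        if adj[v]:
            stack.append(adj[v].pop())  # follow an unused edge
        else:
            cycle.append(stack.pop())   # dead end: vertex is finished
    return cycle[::-1]

flows = {("a", "b"): 2, ("b", "a"): 1, ("b", "c"): 1, ("c", "a"): 1}
tour = eulerian_cycle(flows, "a")
print(tour)  # ['a', 'b', 'c', 'a', 'b', 'a']
assert Counter(zip(tour, tour[1:])) == Counter(flows)  # each edge used exactly f(e) times
```

Because every edge appears exactly as many times as its flow value, amplified segments and links are traversed once per copy.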
