Tumor Genomes
– Compromised genome stability
– Mutation and selection
– Chromosomal aberrations:
  – Structural: translocations, inversions, fissions, fusions.
  – Copy number changes: gain and loss of chromosome arms, segmental duplications/deletions.

Rearrangements in Tumors
Rearrangements change gene structure and can create novel fusion genes. Example: Gleevec (Novartis, 2001) targets the BCR-ABL fusion.

End Sequence Profiling (ESP) (C. Collins and S. Volik, UCSF Cancer Center)
1) Pieces of tumor genome: clones (… kb).
2) Sequence ends of clones (500 bp).
3) Map end sequences to human genome.
Each clone corresponds to a pair of end sequences (ES pair) (x, y). Retain clones that correspond to a unique ES pair.

Valid ES Pairs
Valid ES pairs satisfy l ≤ y − x ≤ L, where l and L are the minimum and maximum clone size, and have convergent orientation.

Invalid ES Pairs
An invalid ES pair indicates a putative rearrangement in the tumor. The end sequences point toward the breakpoints (a, b), with l ≤ |x − a| + |y − b| ≤ L.
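
The validity test and the breakpoint test above can each be written as a small predicate. This is a minimal sketch and not the authors' code. It assumes, hypothetically, that each end sequence is a `(position, strand)` tuple in human-genome coordinates, with strand `+1` for a read pointing toward increasing coordinates. The names `is_valid_pair` and `breakpoint_consistent`, and the clone sizes used below, are illustrative.

```python
# Sketch: ESP validity tests. Ends are (position, strand) tuples in
# human-genome coordinates; strand +1 points toward increasing positions.
# This representation is an assumption made for illustration.

def is_valid_pair(end1, end2, l, L):
    """Valid ES pair: convergent orientation and l <= y - x <= L."""
    (x, sx), (y, sy) = sorted([end1, end2])  # order ends by position
    return sx == +1 and sy == -1 and l <= y - x <= L


def breakpoint_consistent(x, y, a, b, l, L):
    """Could an invalid pair (x, y) span a tumor breakpoint (a, b)?

    Ends point toward the breakpoints, so the clone length is the distance
    from x to a plus the distance from y to b.
    """
    return l <= abs(x - a) + abs(y - b) <= L


l, L = 30, 60  # hypothetical min/max clone sizes
print(is_valid_pair((100, +1), (140, -1), l, L))        # True
print(is_valid_pair((90, +1), (170, +1), l, L))         # False: same strand
print(breakpoint_consistent(90, 170, 100, 200, l, L))   # True: 10 + 30 = 40
```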

ESP Genome Reconstruction Problem
Known: the human genome and the locations of ES pairs mapped to it. Unknown: the tumor genome and the sequence of rearrangements that produced it. Goal: reconstruct the tumor genome.
(Figure: human genome segments B C E A D; tumor genome -C -D E A B; ES pairs (x1, y1), …, (x5, y5).)

ESP Genome Reconstruction: Comparative Genomics
(Figure: tumor genome -C -D E A B compared with human genome B C E A D, with ES pairs (x1, y1), (x2, y2), (x3, y3), (x4, y4).)

2D Representation of ESP Data (ESP Plot)
Each point is an ES pair (x, y), plotted by its human-genome coordinates (human segments B C E A D along the axes). Can we reconstruct the tumor genome from the positions of the ES pairs?

ESP Plot → Tumor Genome
(Figure: ESP plot over human segments B C E A D and the reconstructed tumor genome -C -D E A B.)

Real data are noisy and incomplete!
Valid ES pairs satisfy the length/direction constraints l ≤ y − x ≤ L. Invalid ES pairs indicate either rearrangements or experimental errors.

Computational Approach
1. Use known genome rearrangement mechanisms (inversion, translocation).
2. Find the simplest explanation for the ESP data, given these mechanisms.
3. Motivation: genome rearrangement studies in evolution/phylogeny.
(Figure: an inversion and a translocation between the human and tumor genomes, with breakpoints s and t.)

ESP Sorting Problem
$G = [0, M]$ is a unichromosomal genome. An inversion (reversal) of the interval $[s, t]$ acts on a coordinate $x$ as
$$\rho_{s,t}(x) = \begin{cases} x, & \text{if } x < s \text{ or } x > t,\\ t - (x - s), & \text{otherwise.} \end{cases}$$
Given: ES pairs $(x_1, y_1), \dots, (x_n, y_n)$.
Find: the minimum number of reversals $\rho_{s_1,t_1}, \dots, \rho_{s_m,t_m}$ such that if $\rho = \rho_{s_1,t_1} \cdots \rho_{s_m,t_m}$, then $(\rho x_1, \rho y_1), \dots, (\rho x_n, \rho y_n)$ are valid ES pairs. The tumor genome is then $G' = \rho G$.
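
The map $\rho_{s,t}$ can be checked on a toy example. This is a sketch under stated assumptions. Ends are `(position, strand)` tuples. An inversion also flips the strand of every end it moves, and $\rho = \rho_{s_1,t_1} \cdots \rho_{s_m,t_m}$ is read as a function composition, so the rightmost reversal is applied first. Both conventions are assumptions for illustration. The sketch only evaluates a given list of reversals and does not search for the minimum one.

```python
# Sketch: the inversion rho_{s,t} acting on ES ends, assuming ends are
# (position, strand) tuples on G = [0, M] and that an inversion flips the
# strand of every end it moves (an illustrative assumption).

def rho(s, t, x):
    """rho_{s,t}(x) = x if x < s or x > t, else t - (x - s)."""
    return x if x < s or x > t else t - (x - s)


def apply_inversion(s, t, end):
    pos, strand = end
    if s <= pos <= t:
        return rho(s, t, pos), -strand
    return pos, strand


def apply_all(inversions, end):
    """Apply rho = rho_{s1,t1} ... rho_{sm,tm} as a composition (rightmost first)."""
    for s, t in reversed(inversions):
        end = apply_inversion(s, t, end)
    return end


def is_valid_pair(end1, end2, l, L):
    (x, sx), (y, sy) = sorted([end1, end2])
    return sx == +1 and sy == -1 and l <= y - x <= L


# An invalid (same-strand) pair becomes valid after inverting [100, 200].
pair = [(90, +1), (170, +1)]
moved = [apply_all([(100, 200)], e) for e in pair]
print(is_valid_pair(*pair, 30, 60))   # False
print(moved)                          # [(90, 1), (130, -1)]
print(is_valid_pair(*moved, 30, 60))  # True
```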

Filtering Experimental Noise
Invalid pairs that form a cluster indicate a rearrangement. An isolated invalid pair is more likely an experimental artifact, such as a chimeric clone.
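
One way to separate clusters from isolated invalid pairs is single-linkage grouping. This is a hypothetical sketch, because the slides do not give a clustering criterion. The rule used here is an assumption: two pairs are linked when their x ends and their y ends both lie within `max_gap` of each other. Strand is ignored.

```python
# Sketch: group invalid ES pairs by single linkage. The max_gap rule is a
# hypothetical criterion, not one given in the slides.

def cluster_invalid_pairs(pairs, max_gap):
    """Return (clusters, isolated) as lists of indices into pairs."""
    parent = list(range(len(pairs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, (xi, yi) in enumerate(pairs):
        for j in range(i + 1, len(pairs)):
            xj, yj = pairs[j]
            # Link pairs whose x ends and y ends are both close together.
            if abs(xi - xj) <= max_gap and abs(yi - yj) <= max_gap:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(pairs)):
        groups.setdefault(find(i), []).append(i)
    clusters = [g for g in groups.values() if len(g) > 1]
    isolated = [g[0] for g in groups.values() if len(g) == 1]
    return clusters, isolated


invalid = [(100, 5000), (120, 5030), (9000, 20000)]
print(cluster_invalid_pairs(invalid, max_gap=100))  # ([[0, 1]], [2])
```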

Sparse Data Assumptions
1. Each cluster results from a single inversion.
2. Each clone contains at most one breakpoint.
(Figure: ES pairs (x1, y1), (x2, y2), (x3, y3) in human and tumor coordinates.)

ESP Genome Reconstruction: Discrete Approximation
1) Remove isolated invalid pairs (x, y).
2) Define segments from clusters.
3) ES orientations define links between segment ends.
(Figure: clustered pairs (x1, y1), (x2, y2), (x3, y3) defining breakpoints s and t in the human genome.)

ESP Graph
Edges: 1. human genome segments; 2. ES pairs. Paths in the graph are tumor genome architectures. Comparing a tumor genome path with the human genome gives a minimal sequence of translocations and inversions.

Breakpoint Graph
ESP graph → tumor permutation π and its breakpoint graph. Each element has a start vertex and an end vertex. Black edges join adjacent elements of π, and gray edges join adjacent elements of the identity permutation i. Key parameter: black-gray cycles.
Theorem: the minimum number of reversals d(π) needed to transform π into the identity permutation i satisfies d(π) ≥ n + 1 − c(π), where c(π) is the number of black-gray cycles.
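
The cycle count c(π) can be computed with the standard unsigned encoding of a signed permutation. Under this encoding, +x becomes 2x−1, 2x and −x becomes 2x, 2x−1, and the sequence is framed by 0 and 2n+1. Black edges then pair positions 2k and 2k+1, and gray edges pair values 2k and 2k+1. This is a sketch of the lower bound only. The exact distance (Hannenhalli–Pevzner) also accounts for hurdles, so the bound is not always tight. For example, (+2 +1) has bound 2 but needs 3 reversals.

```python
def breakpoint_cycles(perm):
    """Count black-gray cycles c(pi) in the breakpoint graph of a signed permutation."""
    n = len(perm)
    # Unsigned encoding: +x -> 2x-1, 2x ; -x -> 2|x|, 2|x|-1 ; framed by 0 and 2n+1.
    ext = [0]
    for x in perm:
        ext += [2 * x - 1, 2 * x] if x > 0 else [-2 * x, -2 * x - 1]
    ext.append(2 * n + 1)
    pos = {v: i for i, v in enumerate(ext)}
    seen = [False] * len(ext)
    cycles = 0
    for start in range(len(ext)):
        if seen[start]:
            continue
        cycles += 1
        i = start
        while not seen[i]:
            seen[i] = True
            j = i ^ 1            # black edge: positions 2k, 2k+1 (adjacent in pi)
            seen[j] = True
            i = pos[ext[j] ^ 1]  # gray edge: values 2k, 2k+1 (adjacent in identity)
    return cycles


def reversal_lower_bound(perm):
    """d(pi) >= n + 1 - c(pi)."""
    return len(perm) + 1 - breakpoint_cycles(perm)


print(reversal_lower_bound([1, 2, 3]))     # 0
print(reversal_lower_bound([-3, -2, -1]))  # 1
print(reversal_lower_bound([2, 1]))        # 2 (true distance is 3)
```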

MCF7 Breast Cancer Cell Line
Low-resolution chromosome painting suggests a complex architecture with many translocations and inversions.

MCF7 Genome
(Figure: human chromosomes vs. MCF7 chromosomes.) 5 inversions and 15 translocations (Raphael et al., Bioinformatics).

3. Rearrangement/Duplication Mechanisms
Does ESP suggest mechanisms that scramble tumor genomes?

Another Look at MCF7 ES Pairs
Valid pairs are shown in black. Of the 737 invalid pairs, 489 are isolated (red) and 248 form 70 clusters (blue). 33 of the 70 clusters are highlighted (total length 31 Mb).

Structure of Duplications in Tumors?
Mechanisms are not well understood. Duplicated segments from the human genome may co-localize in the tumor genome (Guan et al., Nat. Genet. 1994).

Analyzing Duplications
(Figure: human genome u A B w C D v E and tumor genomes containing a duplicated segment, with ES pairs linking the copies.)
A single ES pair does not show where the duplicated material lies. An additional ES pair resolves the duplication (co-duplication). Call this configuration a duple with boundary elements v and w.

Duplications in the ESP Graph
The boundary elements v and w of a duple are vertices in the ESP graph. A path between the boundary elements resolves the duple.

Duplication Complications
(Figure: configuration u A B w C v E, where the linking of duplicated segments is ambiguous.) These configurations are frequent in the MCF7 data.

Resolving Duplications as Paths
A path between boundary elements resolves a duple, but there may be multiple paths between the duple boundary elements.
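
The ambiguity can be made concrete by enumerating every simple path between two boundary elements. This sketch uses a depth-first search on a hypothetical toy adjacency list. It does not model the real ESP graph, whose edges alternate between segments and ES pairs.

```python
def simple_paths(graph, v, w):
    """All simple paths from v to w (depth-first search)."""
    paths, stack = [], [(v, [v])]
    while stack:
        node, path = stack.pop()
        if node == w:
            paths.append(path)
            continue
        for nxt in graph.get(node, []):
            if nxt not in path:          # keep paths simple (no repeated vertex)
                stack.append((nxt, path + [nxt]))
    return paths


# Hypothetical undirected toy graph with two routes between v and w.
toy = {"v": ["A", "B"], "A": ["v", "w"], "B": ["v", "C"],
       "C": ["B", "w"], "w": ["A", "C"]}
print(simple_paths(toy, "v", "w"))  # [['v', 'B', 'C', 'w'], ['v', 'A', 'w']]
```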

Many Paths in MCF7!

Tumor Amplisomes (Maurer et al., 1987; Wahl, 1989; …)
Other terms: episome, amplicon, double minute.

Duplication by Amplisome
Gives a single model for all duplications.

Amplisome Reconstruction Problem
Assume:
1. The tumor genome sequence is known.
2. Insertions are independent, i.e., no insertions within insertions.
Approach:
1. Identify duplicated sequences A1, …, Am.
2. The amplisome is the shortest common superstring of A1, …, Am.
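
Step 2 can be illustrated with an exact but exponential shortest common superstring search. The shortest common superstring problem is NP-hard in general, so this brute-force sketch only suits a handful of sequences. It first drops strings contained in other strings, then tries every ordering and merges consecutive strings with maximal overlap. Treating segment labels as single characters is an illustrative assumption.

```python
from itertools import permutations


def overlap(a, b):
    """Length of the longest suffix of a that is a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def shortest_common_superstring(seqs):
    """Exact SCS by trying every order of the non-redundant strings."""
    uniq = list(dict.fromkeys(seqs))
    # Strings contained in another string never affect the answer.
    core = [s for s in uniq if not any(s != t and s in t for t in uniq)]
    if not core:
        return ""
    best = None
    for order in permutations(core):
        merged = order[0]
        for nxt in order[1:]:
            merged += nxt[overlap(merged, nxt):]
        if best is None or len(merged) < len(best):
            best = merged
    return best


# Duplicated pieces written as segment labels (hypothetical example).
print(shortest_common_superstring(["ABC", "BCD", "CDE"]))  # ABCDE
```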

ESP Amplisome Reconstruction Problem
Assume: insertions are independent, i.e., no insertions within insertions.
Approach:
1. Identify duples with boundary elements (v1, w1), …, (vm, wm).
2. The amplisome is the shortest path in the ESP graph containing the subpaths v1…w1, v2…w2, …, vm…wm.
(Figure: configuration u A B w C v E.)

Reconstructed MCF7 Amplisome
(Figure: chromosomes contributing to the reconstructed amplisome.) 33 clusters, total length 31 Mb. The amplisome model explains 24 of the 33 invalid clusters (Raphael and Pevzner, Bioinformatics 2004).

Duplicated Translocation
Consider a cluster of 20 ES pairs in which one clone was sequenced, revealing breakpoint (a, b). A clone with end sequences $(x_i, y_i)$ spanning this breakpoint has size $(a - x_i) + (b - y_i)$. The breakpoint found in one clone therefore predicts the sizes of the other clones in the cluster. The experimental sizes agreed with the inferred sizes, so all clones share the same breakpoint, and duplication of the region occurred after the translocation.
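
The size prediction is simple arithmetic, sketched below. The ES pair coordinates, the breakpoint, and the tolerance `tol` used to call measured and inferred sizes "in agreement" are hypothetical. The slides do not state an agreement criterion.

```python
def inferred_sizes(pairs, a, b):
    """Clone sizes (a - x_i) + (b - y_i) implied by a shared breakpoint (a, b)."""
    return [(a - x) + (b - y) for x, y in pairs]


def sizes_agree(inferred, measured, tol):
    """True if every measured size is within tol of its inferred size."""
    return all(abs(i - m) <= tol for i, m in zip(inferred, measured))


cluster = [(900, 4950), (850, 4980)]   # hypothetical ES pairs (kb)
pred = inferred_sizes(cluster, a=1000, b=5000)
print(pred)                                   # [150, 170]
print(sizes_agree(pred, [148, 175], tol=10))  # True
```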

Clone Sequencing (joint work with Jan-Fang Cheng, LBNL)
Draft sequencing of 29 clones. (Figure: three clones from MCF7 with indicated lengths, including 117 kb; colors and labels indicate chromosome of origin.) 50 rearrangement breakpoints. Some clones have complex internal organization.

4. Combining ESP with Other Genome Data
Array Comparative Genomic Hybridization (aCGH). Joint work with Z. Yakhini and D. Lipson (Agilent and Technion).

CGH Analysis
Divide the genome into segments of equal copy number, giving a copy number profile (copy number vs. genome coordinate). Numerous segmentation methods exist (e.g., clustering, hidden Markov models, Bayesian methods).
Segmentation gives no information about:
– structural rearrangements (inversions, translocations);
– locations of duplicated material in the tumor genome.
ESP fills this gap!
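
The notion of "segments of equal copy number" can be shown with a deliberately naive segmentation. The sketch rounds each probe to an integer copy number and merges consecutive probes with the same value. This is a toy for illustration only. It is not one of the clustering, HMM, or Bayesian methods named above, and real aCGH data would need one of those.

```python
def segment_copy_number(values):
    """Toy segmentation: round each probe, then merge runs of equal copy number.

    Returns (start_index, end_index_exclusive, copy_number) triples.
    """
    rounded = [round(v) for v in values]
    segments, start = [], 0
    for i in range(1, len(rounded) + 1):
        if i == len(rounded) or rounded[i] != rounded[start]:
            segments.append((start, i, rounded[start]))
            start = i
    return segments


probes = [2.1, 1.9, 2.0, 3.1, 2.9, 1.0]   # hypothetical copy number estimates
print(segment_copy_number(probes))  # [(0, 3, 2), (3, 5, 3), (5, 6, 1)]
```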

CGH Segmentation
How are the copies of segments linked in the tumor genome? ES pairs link segments.
(Figure: copy number vs. genome coordinate.)

ESP + CGH
ES pairs lie near segment boundaries. (Figure: copy number vs. genome coordinate, marking CGH breakpoints and ESP breakpoints.)

ESP and CGH Breakpoints
(Figure: ESP breakpoints vs. CGH breakpoints in BT474 and MCF7.)
ESP clusters with a nearby CGH breakpoint: BT474, …/39 clusters (P = 5.4 × 10^…); MCF7, 8/33 clusters (P = 1.2 × 10^…).
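
Comparing the two breakpoint sets requires some rule for when an ESP breakpoint "matches" a CGH breakpoint. This sketch assumes a hypothetical fixed `window` in genome coordinates, since the slides do not state the matching rule. It does not reproduce the P-value calculation.

```python
from bisect import bisect_left


def matched_breakpoints(esp, cgh, window):
    """ESP breakpoints that have a CGH breakpoint within +/- window."""
    cgh = sorted(cgh)
    hits = []
    for p in esp:
        i = bisect_left(cgh, p - window)   # first CGH breakpoint >= p - window
        if i < len(cgh) and cgh[i] <= p + window:
            hits.append(p)
    return hits


print(matched_breakpoints([100, 500, 900], [110, 880], window=25))  # [100, 900]
```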

Microdeletion in BT474
An ES pair spans ≈600 kb, whereas valid ES pairs span < 250 kb. (Figure: copy number around the ES pair.) “Interesting” genes lie in this region.

Combining ESP and CGH
ES pairs link segments. Copy number balances at each segment boundary, e.g., 5 = 3 + 2.
(Figure: copy number vs. genome coordinate, with segments of copy number 5, 3, and 2.)

Combining ESP and CGH
CGH copy number is not exact. What genome architecture is “most consistent” with the ESP and CGH data?
(Figure: CGH segments with copy number ranges 3 ≤ f(e) ≤ 5, 1 ≤ f(e) ≤ 3, 1 ≤ f(e) ≤ 4.)

Combining ESP and CGH: Build a Graph
1. Edge for each CGH segment.
2. Edge for each ES pair consistent with segments.
3. Range of copy number values for each CGH edge (e.g., 3 ≤ f(e) ≤ 5, 1 ≤ f(e) ≤ 3, 1 ≤ f(e) ≤ 4).

Network Flow Problem
Minimum cost circulation with capacity constraints (as in sequencing by hybridization and sequence assembly), with a source/sink:
$$\min \sum_e \gamma(e)\, f(e)$$
subject to flow in = flow out at each vertex and the capacity bounds on each edge:
$$\sum_{(u,v)} f((u,v)) = \sum_{(v,w)} f((v,w)) \quad \forall v, \qquad l(e) \le f(e) \le u(e) \quad \forall e.$$
Flow constraints: for a CGH edge, l(e) and u(e) come from CGH; for an ESP edge, l(e) = 1 and u(e) = ∞.
Costs: γ(e) = 0 if e is an ESP or CGH edge; γ(e) = 1 if e is incident to the source/sink.
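
The constraints and objective can be made concrete with a checker for a candidate flow. This sketch does not solve the minimum cost problem. It only tests the bounds l(e) ≤ f(e) ≤ u(e) and conservation at every vertex, then evaluates Σ γ(e) f(e). The toy graph is hypothetical. It mirrors the 5 = 3 + 2 boundary example, with free segment ends attached to one combined source/sink node `S`.

```python
from collections import defaultdict

INF = float("inf")


def check_circulation(edges, flow, tol=1e-9):
    """Check bounds and flow conservation; return (feasible, total cost).

    edges: name -> (tail, head, lower, upper, cost); flow: name -> value.
    """
    balance = defaultdict(float)
    for name, (tail, head, lower, upper, _cost) in edges.items():
        f = flow[name]
        if not lower <= f <= upper:      # l(e) <= f(e) <= u(e)
            return False, None
        balance[tail] -= f               # flow out of tail
        balance[head] += f               # flow into head
    conserved = all(abs(b) <= tol for b in balance.values())
    cost = sum(edges[name][4] * flow[name] for name in edges)
    return conserved, cost


edges = {
    "src":   ("S",  "x0", 0, INF, 1),   # source/sink edge, cost 1
    "X":     ("x0", "x1", 3, 5,   0),   # CGH segment, 3 <= f <= 5
    "es1":   ("x1", "y0", 1, INF, 0),   # ES pair, 1 <= f <= inf
    "es2":   ("x1", "z0", 1, INF, 0),
    "Y":     ("y0", "y1", 1, 3,   0),   # CGH segment, 1 <= f <= 3
    "Z":     ("z0", "z1", 1, 4,   0),   # CGH segment, 1 <= f <= 4
    "sinkY": ("y1", "S",  0, INF, 1),
    "sinkZ": ("z1", "S",  0, INF, 1),
}
flow = {"src": 5, "X": 5, "es1": 3, "es2": 2, "Y": 3, "Z": 2, "sinkY": 3, "sinkZ": 2}
print(check_circulation(edges, flow))  # (True, 10)
```

A real solver would search over flows for the minimum cost. A checker like this is still useful for validating a solver's output against the stated constraints.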

Network Flow Results
Edges with unsatisfied flow are putative locations of missing ESP data, so they prioritize further sequencing. Targeted ESP: screen the library with CGH probes.

Network Flow Results
Flow values → edge weights. Paths of high-weight edges suggest amplicon structures. Identified amplified translocations: 14 in MCF7, 5 in BT474.