Genome Rearrangements in Evolution and Cancer. Guillaume Bourque, Genome Institute of Singapore. HKU-Pasteur Research Centre, Hong Kong, August 28th, 2009.



2 Outline Genome Rearrangements in Evolution [ ??? ] Cancer genomics

3 Genome rearrangements in evolution 1999

4 High hopes Explain the physical clustering of gene families (regulation, editing or retention). Understand whether even longer linkage associations were preserved by chance or by selection (developmental or functional). Resolve the mammalian phylogeny using genomic segment exchanges as characters. Discover molecular fossils of precipitous genomic events. Identify genetic determinants of reproductive isolation, adaptation, survival and species formation. O’Brien et al, Science 1999

5 Comparing 2 sequences
GGCACAAATCCAAATCCAAATCCGGGTTGGGGTTGGGGTTGGGGTTGCGACACATTTGGCCTGTCGTCGTCCGTCGTC
GGCACAAATCCAAATCCAAATCCAATGTGTCGCAACCCCAACCCCAACCCCAACCCTGGCCTGTCGTCGTCCGTCGTC
Need to reverse complement
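A minimal sketch of the comparison on this slide, assuming the two sequences have the same length and differ by a single inverted segment. The helper names `reverse_complement` and `find_inverted_segment` are illustrative and do not come from the talk.

```python
# Complement table for unambiguous DNA bases (other symbols are left unchanged).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_inverted_segment(a: str, b: str):
    """Trim the shared prefix and suffix of two equal-length sequences and
    check whether the remaining middle of `b` is the reverse complement of
    the middle of `a`.

    Returns (start, end) as 0-based, end-exclusive coordinates, or None.
    Caveat: if the inverted segment happens to begin or end with bases that
    survive the inversion unchanged, the trimming removes them and the
    check can miss the event. This is a sketch, not a robust detector.
    """
    if len(a) != len(b):
        return None
    n = len(a)
    start = 0
    while start < n and a[start] == b[start]:
        start += 1
    end = n
    while end > start and a[end - 1] == b[end - 1]:
        end -= 1
    if start == end:
        return None  # identical sequences, nothing inverted
    if reverse_complement(a[start:end]) == b[start:end]:
        return start, end
    return None


# The two sequences shown on the slide.
seq_1 = "GGCACAAATCCAAATCCAAATCCGGGTTGGGGTTGGGGTTGGGGTTGCGACACATTTGGCCTGTCGTCGTCCGTCGTC"
seq_2 = "GGCACAAATCCAAATCCAAATCCAATGTGTCGCAACCCCAACCCCAACCCCAACCCTGGCCTGTCGTCGTCCGTCGTC"

print(find_inverted_segment(seq_1, seq_2))
```

The two sequences share their flanks, and only the middle block needs to be reverse complemented to turn one into the other.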

6 If you have 3 sequences… [Figure: pairwise comparisons Seq_1 vs Seq_2, Seq_1 vs Seq_3 and Seq_2 vs Seq_3]

7 Rearrangement Phylogeny [Figure: Seq_1, Seq_2, Seq_3 and ancestor A; events shown: inversion of block 2, inversion of block 4]

8 Synteny blocks

9 Genome rearrangements: reversal, translocation, fusion, fission
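The four operations can be sketched on signed gene orders, assuming each chromosome is a list of signed integers (one integer per synteny block, the sign giving its orientation). The function names and the cut-point convention are illustrative choices, not notation from the talk.

```python
def reversal(chrom, i, j):
    """Reverse the segment chrom[i:j] and flip the sign of every block in it."""
    return chrom[:i] + [-g for g in reversed(chrom[i:j])] + chrom[j:]


def translocation(chrom_a, chrom_b, i, j):
    """Exchange the tails of two chromosomes cut at positions i and j."""
    return chrom_a[:i] + chrom_b[j:], chrom_b[:j] + chrom_a[i:]


def fusion(chrom_a, chrom_b):
    """Join two chromosomes end to end into one."""
    return chrom_a + chrom_b


def fission(chrom, i):
    """Split one chromosome into two at position i."""
    return chrom[:i], chrom[i:]


print(reversal([1, 2, 3, 4, 5], 1, 4))          # blocks 2..4 inverted
print(translocation([1, 2, 3], [4, 5, 6], 1, 2))
print(fusion([1, 2], [3, 4]))
print(fission([1, 2, 3, 4], 2))
```

Only one of several possible translocation variants is shown here (tail for tail, keeping orientation).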

10 Algorithms for sorting genomes. A polynomial algorithm computes the rearrangement distance and a most parsimonious scenario between 2 unichromosomal genomes (Hannenhalli and Pevzner 1995). For example: [example figure not preserved]. Further developed for multi-chromosomal genomes (Tesler 2002) and multiple genomes (Bourque and Pevzner 2002).
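To make the notion of reversal distance concrete, here is a brute-force sketch that finds the minimum number of signed reversals between two small unichromosomal gene orders by breadth-first search. This is not the Hannenhalli-Pevzner algorithm: it is exponential in the number of blocks and only usable on toy inputs. The function name `reversal_distance` is an illustrative choice.

```python
from collections import deque


def reversal_distance(source, target):
    """Minimum number of signed reversals turning `source` into `target`.

    Exhaustive breadth-first search over signed permutations; the state
    space has 2**n * n! elements, so keep n small (n <= 6 or so).
    """
    source, target = tuple(source), tuple(target)
    if sorted(map(abs, source)) != sorted(map(abs, target)):
        raise ValueError("genomes must contain the same blocks")
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        perm, dist = queue.popleft()
        if perm == target:
            return dist
        n = len(perm)
        for i in range(n):
            for j in range(i + 1, n + 1):
                # Reverse perm[i:j] and flip the orientation of each block.
                nxt = perm[:i] + tuple(-g for g in reversed(perm[i:j])) + perm[j:]
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    return None  # unreachable when both genomes contain the same blocks


print(reversal_distance((1, -3, -2, 4), (1, 2, 3, 4)))
```

The polynomial algorithm cited on the slide reaches the same distance without enumerating scenarios, which is what makes whole-genome comparisons feasible.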

11 Chromosome X: two-way similarities (PatternHunter), synteny blocks (GRIMM-Synteny), rearrangement scenario (MGR)

12 History of Chromosome X

13 Mammalian phylogeny (Murphy et al, Science, 2005): cow, pig, dog, cat, rat, mouse, human

14 X chromosome evolution

15 Overview of the Results. Nearly 20% of chromosome breakpoint regions were reused. Gene density is higher in evolutionary breakpoint regions. Segmental duplications populate the majority of primate-specific breakpoints.

16 Human Chromosome 11

17 Debate on ancestral reconstructions

18 Debate on ancestral reconstructions

19 Recovering true ancestral events
Analyses of genome rearrangements are typically evaluated on:
–Quality of the ancestral reconstructions
–Ability to recover the correct topology
–Total number of rearrangements in the scenario recovered (parsimony)
We decided to focus on the accuracy of the rearrangements recovered.
Start by measuring accuracy using simulations and then apply the approach to real data sets.
Why?
–Look for events that could have been involved in speciation
–Look at sequence features associated with these events (e.g. repeats, genes, etc.)
–Gain mechanistic insights into genome rearrangements

20 EMRAE :: Efficient Method to Recover Ancestral Events. Relies on adjacencies conserved in a significant fraction of the genomes. Combines conserved adjacencies (and nearly conserved adjacencies) to predict rearrangement events. Applicable to uni- and multi-chromosomal genomes. Currently models inversions, translocations, fusions, fissions and transpositions, but is also amenable to insertions and deletions. Achieves high specificity with comparable sensitivity.

21 Conserved adjacencies. Define an adjacency $a(c_i, c_{i+1})$ as an ordered pair of integers $c_i\, c_{i+1}$, or its inverse $-c_{i+1}\, {-c_i}$, found in a given genome. For a given edge $e$ of the phylogeny, which splits the genomes into two sets $S_A$ and $S_B$, if the adjacency $a$ is found in every genome of $S_A$ but not in any genome of $S_B$, we say that $a$ is a conserved adjacency of $S_A$.

22 Conserved adjacencies :: example
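The definition of a conserved adjacency can be sketched as follows, assuming each genome is a list of chromosomes and each chromosome is a list of signed integers. The genomes below are toy data, and the function names are illustrative; this is not the EMRAE implementation.

```python
def adjacencies(genome):
    """Set of adjacencies in a genome, each stored in a canonical form.

    The pair (a, b) and its inverse (-b, -a) describe the same adjacency,
    so only the larger of the two tuples is kept.
    """
    result = set()
    for chrom in genome:
        for a, b in zip(chrom, chrom[1:]):
            result.add(max((a, b), (-b, -a)))
    return result


def conserved_adjacencies(set_a, set_b):
    """Adjacencies present in every genome of set_a and in no genome of set_b.

    set_a must contain at least one genome.
    """
    in_all_a = set.intersection(*(adjacencies(g) for g in set_a))
    in_any_b = set().union(*(adjacencies(g) for g in set_b))
    return in_all_a - in_any_b


# Toy genomes (hypothetical): S_A on one side of an edge, S_B on the other.
g1 = [[1, 2, 3, 4]]
g2 = [[1, 2, -4, -3]]   # blocks 3..4 inverted relative to g1
g3 = [[1, -2, 3, 4]]    # block 2 inverted relative to g1

print(conserved_adjacencies([g1, g2], [g3]))
```

Here the adjacency between blocks 1 and 2 is shared by both genomes of S_A and is broken in S_B, so it is the only conserved adjacency of S_A; the adjacency between 3 and 4 appears on both sides and is therefore not informative for this edge.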

23 Simulation results Higher specificity

24 Mammalian rearrangement events (reversals, translocations, transpositions, fusions/fissions). Predicted 1109 events at a 10 Kb resolution: 831 reversals, 237 transpositions, 15 translocations, 26 fusions/fissions.

25 Mammalian rearrangement events (reversals, translocations, transpositions, fusions/fissions). Predicted 1109 events at a 10 Kb resolution: 831 reversals, 237 transpositions, 15 translocations, 26 fusions/fissions.

26 Human-chimp-specific reversal

27 Human-specific breakpoints are enriched in SDs Human-specific breakpoint regions are significantly enriched in SDs as compared to size-matched random regions (p-value < 0.001). Indeed, 93.2% of the human-specific breakpoint regions (69 out of 74) contain SDs. This is true for only approximately 60% of size-matched random regions.

28 Homologous matching pairs of SDs are enriched in human-specific breakpoints Taking the 74 human-specific breakpoints identified in this study, we observed 100 pairs of regions with matching pairs of SDs instead of an average of 25 pairs observed in the random simulated data sets.

29 Primate reversals are associated with SDs. The average percent identity of the SDs that are associated with reversals correlates with the relative age of these events. This helps confirm the direct link between SDs and many rearrangement events.

30 Extension from primate-specific reversals to all the predicted mammalian reversals. We used BLAST to detect homology between breakpoints of the predicted reversals. Many reversals are flanked by regions of high sequence identity (BLAST score >1000). If not SDs, what?

31 Homology flanking mammalian reversals. We found that 58%, 29%, 24%, 42%, 47% and 20% of the human, chimp, rhesus, rat, mouse and dog reversals, respectively, are supported by regions with BLAST scores greater than […]. What is the source of this homology? Is it expected? We restricted our analysis to the reversals with breakpoints defined within 100 Kb and assessed the overlap between these regions of homology and repeats. We assigned each reversal to a particular repeat family when the overlap between the homologous segment identified and a repeat instance was greater than 50%, and compared the results to matched simulated data sets.
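The 50% overlap rule can be sketched as below. Two points are assumptions rather than details from the talk: the overlap is measured as a fraction of the homologous segment's length, and when several repeats pass the threshold the one with the largest overlap wins. Intervals are 0-based and end-exclusive, and the repeat coordinates in the example are made up.

```python
def overlap_fraction(segment, repeat):
    """Fraction of `segment` covered by `repeat`; both are (start, end)."""
    seg_start, seg_end = segment
    rep_start, rep_end = repeat
    length = seg_end - seg_start
    if length <= 0:
        return 0.0
    shared = min(seg_end, rep_end) - max(seg_start, rep_start)
    return max(0, shared) / length


def annotate_repeat_family(segment, repeats, threshold=0.5):
    """Return the family of the best-overlapping repeat, or None.

    `repeats` is an iterable of (start, end, family) on the same chromosome.
    A repeat qualifies only if it covers strictly more than `threshold`
    of the segment.
    """
    best_family, best_fraction = None, threshold
    for start, end, family in repeats:
        fraction = overlap_fraction(segment, (start, end))
        if fraction > best_fraction:
            best_family, best_fraction = family, fraction
    return best_family


# Hypothetical homologous segment and repeat annotations.
segment = (100, 200)
repeats = [(90, 130, "Alu"), (120, 200, "L1")]
print(annotate_repeat_family(segment, repeats))
```

Running the same assignment on matched simulated reversals gives the background against which a repeat family can be called over-represented.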

32 Overrepresentation of paired L1 repeats

33 Outline Genome Rearrangements in Evolution [ ??? ] Cancer genomics

34 Sequencing Revolution. Sanger sequencing (1970s). Next-Generation sequencing (2007-now): Illumina, SOLiD

35 Data Explosion
Sequencing is no longer the rate-limiting step.
This year, we expect:
–2X increase in CPU
–2X increase in memory
–10X increase in sequencing (estimate from Illumina and SOLiD) or even 100X increase (Helicos, Complete Genomics, etc.)
Informatics challenges that we face now will only grow…

36

37 Paradigm Shift
Things that are out:
–Storing all primary data (images)
–“All versus all” types of analysis
–Single large repository (NCBI)
–Careless data management (duplicated files, extra transferring steps, etc.)
Things that are in:
–Clusters and high performance storage
–Cloud computing
–Careful data management & planning
–Bioinformaticians & IT engineers (even for relatively small labs)

38 Sequencing Human Genomes: 1000 Genomes Project ($$$); The Human Genome ($$$$$$); Your Genome ($ ?)

39 New opportunities… in the study of: Evolution, Populations, Cancer

40 Outline Genome Rearrangements in Evolution [ ??? ] Cancer genomics

41 Gene Identification Signature Ng, et al., Nature Methods, 2005

42 PET technology [Figure: cDNA PETs from a cancer cell mapped to the human genome]

43 Highly rearranged cancer genome Provided by Nalla Palanisamy, GIS

44 Impact of rearrangements on PETs [Figure: cancer vs. normal genome for an inversion, a deletion and a translocation]
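The way a rearrangement distorts a PET's mapping can be sketched as a simple classifier. This is a heuristic illustration, not the pipeline from the talk. It assumes that both tags of a concordant PET map to the same chromosome and strand within an expected span; the `PET` record, the `classify_pet` function and the `max_span` default are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PET:
    """Mapping of the two tags of one paired-end ditag (hypothetical record)."""
    chrom5: str
    pos5: int
    strand5: str
    chrom3: str
    pos3: int
    strand3: str


def classify_pet(pet: PET, max_span: int = 10_000) -> str:
    """Label a mapped PET by the simplest rearrangement that would explain it.

    max_span is an assumed upper bound on the normal distance between the
    two tags; a real analysis would derive it from the library's span
    distribution (and, for cDNA PETs, from transcript structure).
    """
    if pet.chrom5 != pet.chrom3:
        return "translocation"   # tags land on different chromosomes
    if pet.strand5 != pet.strand3:
        return "inversion"       # one tag flipped relative to the other
    if abs(pet.pos3 - pet.pos5) > max_span:
        return "deletion"        # tags further apart than the library allows
    return "concordant"


examples = [
    PET("chr1", 1_000, "+", "chr1", 3_000, "+"),
    PET("chr1", 1_000, "+", "chr1", 500_000, "+"),
    PET("chr1", 1_000, "+", "chr1", 3_000, "-"),
    PET("chr20", 1_000, "+", "chr17", 3_000, "+"),
]
for pet in examples:
    print(classify_pet(pet))
```

A single discordant PET is weak evidence on its own; clusters of PETs supporting the same event are what make a call credible.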

45 GIS-PET MCF-7 Transcriptome: 584,624 cDNA equivalents; 135,757 unique PETs. One location (tag1); Unmappable (tag0); 92,928 PETs (69%); 33,097 PETs (24%); 9,732 PETs (7%); Multi-location

46 Sequence-based clustering
All unmappable PETs (tag0)
Cluster based on sequence similarity
---GGAGCCGCGGCCGCC ACGATCCCAC-AGCCTC
----GAGCCGCGGCCGCC---AAGAACGATACCAC-AGCCTC
ATTGGAGCTGCGGCCGC ACGATCCCAC-AGCCTC
--TGGAGCCGCGGCCGCCGA-----ACGATCCCAC-AGCCTC
GCGGCGGCCGCC---AAGAACGATCCCAC-AGCCCC
----GAGCCGCGGCCGCCG---AGCACGATCCCACTAGCCTC
Align
5’ ATTGGAGCCGCGGCCGCCGA AGAACGATCCCACAGCCTC 3’
Extract consensus
Map to human genome (5’ and 3’ tags)
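The "extract consensus" step can be sketched as a column-wise majority vote over an existing alignment. The rows below are toy data, because the spacing of the alignment on the slide was lost in the transcript. The `consensus` function is an illustrative name, and the talk does not say which consensus rule was actually used.

```python
from collections import Counter


def consensus(rows, gap="-"):
    """Majority-vote consensus of aligned sequences.

    Rows are padded with gaps to a common width. Gaps never vote, so a
    column contributes a base as soon as one row has one; columns made
    only of gaps are skipped. Ties go to the base seen first in the column.
    """
    width = max(len(row) for row in rows)
    padded = [row.ljust(width, gap) for row in rows]
    out = []
    for column in zip(*padded):
        counts = Counter(base for base in column if base != gap)
        if counts:
            out.append(counts.most_common(1)[0][0])
    return "".join(out)


# Toy alignment (hypothetical), one row per PET in a cluster.
rows = [
    "--GGAGCC",
    "ATGGAGCC",
    "ATGGTGCC",
    "-TGGAGC-",
]
print(consensus(rows))
```

Letting any covered column contribute a base yields a consensus longer than most individual tags, which helps when mapping it back to the genome.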

47 Largest unmappable cluster: 77 unique PETs, 339 total PETs. 5’ to 3’: 20q13 (BCAS4), 17q23 (BCAS3) …

48 BCAS4-3 fusion transcript

49 Fusion transcript discovery pipeline Ruan et al. Genome Res, 2007

50 Genomic PET (gPET): genomic DNA fragmentation; PET library construction & sequencing; PET sequences mapping to reference genome; PET mapping span for 1 Kb and 10 Kb libraries (1 Kb peak, 10 Kb peak)

51 Putting everything together… GIS-PET & gPET, Tag0s (5’ 3’): prioritize, annotate. Mitelman (342 entries), Fragile sites (118 entries), ChimerDB (848 entries), Sanger (428 entries), aCGH, Exon Array. High-resolution map of aberrations in cancer.

52 Acknowledgments
From my group:
–Zhao Hao, Chi Ho Lin, Johni Masli (NUS)
–Galih Kunarso, Justin Jeyakani
–Woo Xing Yi, Kelson Zawack
With the help of:
–Yijun Ruan, Yao Fei, Axel Hillmer, Chia-Lin Wei
–Charlie Lee, Pramila Ariyaratne, Ken Sung
–Ed Liu
–Jian Ma (UCSC), Pavel Pevzner and Glenn Tesler (UCSD)
–GIS and A*STAR for financial support