The impact of whole genome duplications: insights from Paramecium tetraurelia.

Genome annotation: ab initio gene predictions, comparative approach, 90,000 ESTs.

A compact macronuclear (Mac) genome. Protein-coding regions: 78% of the genome. Intergenic regions are short (average = 352 bp). Introns are short (average = 25 bp) … … but numerous: 80% of genes contain introns (average = 2.9 introns per gene).

39,642 annotated genes. Gene content (figure: number of genes in E. cuniculi, S. cerevisiae, N. crassa, D. discoideum, T. brucei, T. pseudonana, P. falciparum, P. tetraurelia, C. intestinalis, D. melanogaster, C. elegans, X. tropicalis, T. nigroviridis, H. sapiens, M. musculus, A. thaliana and O. sativa; P. tetraurelia: 39,642). This high gene number is not due to annotation artefacts (controls with cDNA data, distribution of protein length, manual curation of chromosome 1, …).

Many genes belong to multigene families. Computing best reciprocal hits (BRH) within Paramecium proteins: Smith-Waterman (SW) comparisons + filtering; proteins and pairs of proteins in BRH.
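A minimal sketch of the within-genome BRH step, assuming all-against-all similarity scores (e.g. SW scores) have already been computed and are symmetric. The function name `best_reciprocal_hits`, the dictionary input format and the toy scores are illustrative assumptions; the filtering step mentioned on the slide is not reproduced.

```python
from typing import Dict, FrozenSet, Set, Tuple


def best_reciprocal_hits(scores: Dict[Tuple[str, str], float]) -> Set[FrozenSet[str]]:
    """Return unordered protein pairs that are each other's best hit.

    `scores` maps a pair of protein IDs to a similarity score; scores are
    assumed symmetric, so each pair needs to appear only once.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for (a, b), score in scores.items():
        if a == b:
            continue  # ignore self-hits
        for query, target in ((a, b), (b, a)):
            # keep the highest-scoring target for each query (first one wins on ties)
            if query not in best or score > best[query][1]:
                best[query] = (target, score)
    pairs: Set[FrozenSet[str]] = set()
    for query, (target, _) in best.items():
        if best[target][0] == query:  # reciprocal: the target's best hit is the query
            pairs.add(frozenset((query, target)))
    return pairs


# Hypothetical toy scores: A-B and C-D are reciprocal best hits.
toy_scores = {("A", "B"): 100.0, ("A", "C"): 50.0, ("C", "D"): 80.0, ("B", "D"): 30.0}
brh_pairs = best_reciprocal_hits(toy_scores)
print(sorted(sorted(p) for p in brh_pairs))  # [['A', 'B'], ['C', 'D']]
```

Requiring reciprocity, rather than keeping every best hit, discards one-directional hits that typically arise inside large multigene families.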

BRH are found in large duplicated blocks (paralogons). Example: scaffolds 1 and 8.

Building paralogons, using a sliding window of w genes:
– for each window, select a paralogous region if at least p% of the w genes have their BRH in the same paralogous sequence;
– merge overlapping windows;
– add syntenic genes that do not have a BRH.
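The window-selection and merging steps above can be sketched as follows, under simplifying assumptions: only one scaffold is scanned against one candidate partner region, and the input set (hypothetical name `has_brh`) already holds the genes whose BRH partner lies in that region. Genes without a BRH that fall inside a merged interval are simply kept in it, which only approximates the final "add syntenic genes" step. Function and parameter names are illustrative.

```python
from typing import List, Set, Tuple


def paralogous_intervals(genes: List[str], has_brh: Set[str],
                         w: int = 10, p: float = 0.61) -> List[Tuple[int, int]]:
    """Merged half-open [start, end) gene-index intervals on one scaffold.

    `genes` is the ordered gene list of the scaffold; `has_brh` holds the
    genes whose BRH partner lies in the candidate paralogous region.
    """
    windows: List[Tuple[int, int]] = []
    for start in range(len(genes) - w + 1):
        hits = sum(1 for g in genes[start:start + w] if g in has_brh)
        if hits >= p * w:  # at least p% of the w genes are BRH
            windows.append((start, start + w))
    merged: List[Tuple[int, int]] = []
    for start, end in windows:  # windows are generated in order of start
        if merged and start < merged[-1][1]:  # overlaps the previous interval
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical scaffold of 30 genes in which g5..g19 have a BRH in the partner region.
genes = [f"g{i}" for i in range(30)]
brh = {f"g{i}" for i in range(5, 20)}
print(paralogous_intervals(genes, brh))  # [(2, 23)]
```

With w = 10 and p = 61%, a window needs at least 7 BRH genes, so the selected region extends a few genes beyond the run of BRH genes on each side.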

Whole genome duplication (WGD). Settings: w = 10, p = 61%. Coverage: 61.3 Mb (85%); 90% of genes. Results: genes in 2 copies (68%), genes in 1 copy (32%); 51% of ancestral genes are still in 2 copies.
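The 51% follows from the 68%: if N ancestral loci existed before the WGD and a fraction f of them is still in two copies, there are now N(1 + f) genes, of which 2Nf are in pairs, so the fraction of genes in pairs is g = 2f/(1 + f), i.e. f = g/(2 − g). A minimal sketch of this conversion (function names are hypothetical):

```python
def ancestral_pair_fraction(g: float) -> float:
    """Fraction f of ancestral loci still in two copies, given the fraction g
    of present-day genes that belong to a pair."""
    # N ancestral loci, a fraction f kept in duplicate:
    #   genes now = N * (1 + f), genes in pairs = 2 * N * f
    #   => g = 2f / (1 + f)  <=>  f = g / (2 - g)
    if not 0.0 <= g <= 1.0:
        raise ValueError("g must be a fraction between 0 and 1")
    return g / (2.0 - g)


def gene_content_factor(g: float) -> float:
    """Gene-content multiplier left by the WGD: 1 + f = 2 / (2 - g)."""
    return 1.0 + ancestral_pair_fraction(g)


print(f"{ancestral_pair_fraction(0.68):.1%}")  # 51.5%
print(f"{gene_content_factor(0.68):.2f}")      # 1.52
```

The small gap with the slide's 51% comes from the 68% input being itself rounded. Applied to the 39% and 15% reported for the intermediary and old WGDs, the same formula gives multipliers of about 1.24 and 1.08.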

Progressive loss of gene duplicates: ~1,500 recognizable recent pseudogenes. The length distributions of genic and intergenic sequences reveal relics of more ancient pseudogenes in intergenic regions (figure: frequency (%) vs. sequence length (bp) for single-copy genes, intergenic regions encompassing a gene loss, and other intergenic regions).

BRH from supercontig 8: a large number of BRH (>3,000) remain outside paralogons.

Inferring ancestral blocks from paralogous genes (blocks shown in arbitrary order), then building paralogons with the 131 ancestral blocks.

Intermediary WGD. Settings: w = 10, p = 40%. Coverage: 31,129 genes (79%). Content before WGD: 20,578 genes; genes in 2 copies (39%), genes in 1 copy (61%).

Old WGD. Settings: w = 20, p = 30%. Coverage: 18,792 genes (47%). Content before WGD: 9,999 genes; genes in 2 copies (15%), genes in 1 copy (85%).

Gene content at each WGD: ×1.1 at the old WGD, ×1.2 at the intermediary WGD and ×1.5 at the recent WGD, i.e. about ×2 overall (not ×8).
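The per-WGD multipliers compound, so three WGDs with full retention would multiply gene content by 2³ = 8. A short check with the rounded multipliers from this slide shows why the observed total is only about ×2:

```python
from math import prod

# Rounded per-WGD gene-content multipliers, as given on the slide.
factors = {"old": 1.1, "intermediary": 1.2, "recent": 1.5}

observed = prod(factors.values())   # most duplicates were lost after each WGD
full_retention = 2 ** len(factors)  # every duplicate kept at every WGD

print(f"observed x{observed:.2f} vs. x{full_retention} with full retention")
# observed x1.98 vs. x8 with full retention
```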

Protein sequence similarity between duplicates (ohnologs) for the old, intermediary and recent WGDs.

Distribution of the rate of synonymous substitution (dS) between ohnologs for the old, intermediary and recent WGDs (dS computed with PAML). Figure annotations: saturation; recent gene conversion.

Recent WGD: distribution of dN/dS (figure: frequency (%) vs. dN/dS) => both ohnologs are under strong negative (purifying) selection. Yet … the fate of most ohnologs is to be pseudogenized! => gene-silencing mutations can be tolerated … … but deleterious mutations affecting the coding sequence of one copy are counterselected (i.e. mutations have a dominant effect, despite the presence of a duplicate). Once a gene has been silenced (e.g. by mutation of regulatory elements), mutations can accumulate in its coding region.

Gene duplicates are evolutionarily unstable: over time after a gene duplication, one copy usually becomes a pseudogene; ancient paralogs persist only where there is selective pressure to maintain 2 copies.

Retention of gene duplicates. Different (non-exclusive) models have been proposed for the retention of gene duplicates:
– robustness against mutations;
– functional changes: neo- or subfunctionalization;
– dosage constraints.
Which genes are preferentially retained after a WGD? How does the pattern of gene retention vary with time?
– compare the pattern of retention after a recent WGD and after a more ancient WGD;
– Paramecium: 3 successive WGDs!

Mutational robustness. Under certain conditions (high mutation rate and very large population size), redundant genes may be maintained by selection acting against double-null alleles (Force et al. 1999). Essential genes (e.g. ribosomal proteins) are retained more often than average … but most of them are present in more than 2 copies, so their high retention rate may be due to other factors (see below).

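Functional changes (Force et al. 1999). Over time, a duplicated gene can follow two paths: neofunctionalization (adaptation), where one copy keeps function F and the other acquires a new function F'; or subfunctionalization (neutral evolution), where an ancestral gene with functions F1F2 gives one copy with function F1 and one with function F2. Functional changes include changes in the gene expression pattern and changes in the encoded protein.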

Prediction of the subfunctionalization model: a gene that has been preserved by subfunctionalization at a given WGD (F1F2 split into F1 and F2 at WGD1) is less likely to be retained in two copies at a subsequent WGD (WGD2), because each copy now carries fewer subfunctions that could be partitioned again (Force et al. 1999).

Test of the subfunctionalization model (1). Retention at the recent WGD of genes grouped by their fate at the intermediate WGD (groups of N = 7,996 and N = 12,582 genes; retention rates shown: 47% and 57%). Apparent contradiction with the subfunctionalization model: could it be due to variations in retention rate between different functional classes?

Test of the subfunctionalization model (2): a gene that has been preserved at a given WGD is less likely to be retained in two copies at a subsequent WGD. Figure: retention at the recent WGD of genes followed through the old and intermediate WGDs (N = 343 gene families; retained: 67% and 60%). The difference is significant (p < 5%), but not very strong. Subfunctionalization is an unlikely evolutionary pathway in species with large population sizes (Lynch 2005).

Test of the neofunctionalization model. Analysis of gene expression (work in progress). Analysis of the rate of protein evolution: outgroup (function F), ohnolog 1 (function F), ohnolog 2 (function F'); relative rate test (PAML), with correction for multiple tests. Frequency of ohnologs with asymmetric substitution rates:
– recent WGD (N = 2,297): 11%;
– intermediate WGD (N = 293): 16%.
=> More functional redundancy among recent duplicates; functional changes account for retention in the long term.
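The slide uses the likelihood-based relative rate test of PAML. As a simpler stand-in that illustrates the same logic, here is Tajima's (1993) nonparametric relative rate test on an alignment of two ohnologs plus an outgroup, followed by a Benjamini-Hochberg correction across many pairs. The slide does not say which multiple-test correction was used, so Benjamini-Hochberg is an assumption, as are all function names and the toy sequences.

```python
import math
from typing import List, Tuple


def tajima_relative_rate(seq1: str, seq2: str, outgroup: str) -> Tuple[int, int, float, float]:
    """Tajima's 1D relative rate test on three aligned sequences.

    Returns (m1, m2, chi2, p): m1/m2 count sites where only seq1/seq2 differs
    from the other two; chi2 = (m1 - m2)^2 / (m1 + m2) with 1 degree of freedom.
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for a, b, o in zip(seq1, seq2, outgroup):
        if "-" in (a, b, o):
            continue  # skip gapped columns
        if a != o and b == o:
            m1 += 1  # change unique to the lineage of ohnolog 1
        elif b != o and a == o:
            m2 += 1  # change unique to the lineage of ohnolog 2
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return m1, m2, chi2, p


def benjamini_hochberg(pvalues: List[float], alpha: float = 0.05) -> List[bool]:
    """Flag tests that remain significant at false discovery rate alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank  # largest rank passing its threshold
    significant = [False] * m
    for rank, i in enumerate(order[:k_max], start=1):
        significant[i] = True
    return significant


# Hypothetical alignment: ohnolog 1 carries 4 unique changes, ohnolog 2 none.
m1, m2, chi2, p = tajima_relative_rate("TCGAACGTTT", "ACGTACGTAC", "ACGTACGTAC")
print(f"m1={m1}, m2={m2}, chi2={chi2:.2f}, p={p:.4f}")  # m1=4, m2=0, chi2=4.00, p=0.0455
```

A likelihood-based test such as PAML's additionally models multiple substitutions per site, which matters for the older, more divergent ohnolog pairs.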

Fate of neofunctionalized genes at a subsequent WGD: for ohnologs from the intermediate WGD (N = 62), retention at the recent WGD is 66% for the slow-evolving copy but only 26% for the fast-evolving copy. Neofunctionalized genes are more prone to pseudogenization at a subsequent WGD.

Retention for dosage constraints (1): high expression level. Genes that have to be expressed at very high levels are often present in multiple copies (e.g. histones). The loss of one copy is counterselected because it cannot be compensated for by the upregulation of the other copies => more retention among highly expressed genes.

Retention rates. For each WGD, the retention rate for a given gene category is:
Ratio = (proportion of genes in this category retained in duplicate) / (proportion of all genes retained in duplicate)
– Ratio = 1: no specific retention above the mean value for all genes;
– Ratio > 1: over-retained category;
– Ratio < 1: under-retained category.
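A minimal sketch of this ratio and of the three-way classification; the function names and the toy counts are hypothetical, not data from the study.

```python
def retention_ratio(cat_retained: int, cat_total: int,
                    all_retained: int, all_total: int) -> float:
    """Proportion of the category retained in duplicate, divided by the
    proportion of all genes retained in duplicate."""
    if cat_total == 0 or all_retained == 0:
        raise ValueError("empty category or no retained genes overall")
    return (cat_retained / cat_total) / (all_retained / all_total)


def classify(ratio: float, tol: float = 1e-9) -> str:
    """Label a category relative to the genome-wide mean (ratio = 1)."""
    if ratio > 1 + tol:
        return "over-retained"
    if ratio < 1 - tol:
        return "under-retained"
    return "no specific retention"


# Hypothetical counts: 30 of 50 genes in a category retained vs. 200 of 500 overall.
r = retention_ratio(30, 50, 200, 500)
print(f"{r:.2f} {classify(r)}")  # 1.50 over-retained
```

The small tolerance avoids labelling a category as over- or under-retained because of floating-point noise when its proportion equals the genome-wide one.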

Expression versus Retention

Retention for dosage constraints (2): the balance hypothesis (Papp et al. 2003). The relative expression levels of proteins involved in the same functional network have to be controlled to ensure the proper stoichiometry of the network. Initially, the loss of one copy is counterselected because it creates an imbalance within the network. In the long term, gene losses may occur because they can be compensated for by the upregulation of other copies.

Testing the balance hypothesis (1): genes involved in multi-protein complexes. Protein complexes predicted by homology with yeast:
– MIPS database (curation from the literature);
– TAP/MS data (Gavin et al. Nature 2006).

Multi-protein complexes: genes encoding subunits of protein complexes are initially over-retained.

Additive effects of Expression and Inclusion in Complex

Proteins involved in complexes are over-retained at the recent WGD. Does this mean that complex stoichiometry tends to be conserved?

Constraint of stoichiometry and fate of duplicates. For subunits A and B of the same complex (MIPS complexes; complexes from Gavin et al. Nature 2006), the number of copies of A is compared with the number of copies of B. Complexes with conserved stoichiometry:
– Recent WGD: 265 (44%); 2.6x (68%); p-value = 4.3 × 10^-4
– Intermediary WGD: 114 (20%); 1.5x (43%); p-value = 2.4 × 10^-4
– Old WGD: 106 (24%); 1.2x (43%); p-value = 2.5 × 10^-3

Testing the balance hypothesis (2): genes involved in central metabolism

Retention of central metabolism gene duplicates: genes involved in central metabolism are initially over-retained and then under-retained (less neofunctionalization?).

Dating genome duplications: phylogenetic analyses of orthologous genes in other ciliate species => date the WGDs relative to speciation events.

Figure: phylogeny of Tetrahymena thermophila and Paramecium species (P. putrinum, P. bursaria, P. polycaryum, P. nephridiatum, P. duboscqui, P. multimicronucleatum, P. caudatum, P. tetraurelia, P. pentaurelia, P. primaurelia, P. sexaurelia, P. jenningsi, P. octaurelia, P. novaurelia, P. tredecaurelia, P. quadecaurelia), showing the Paramecium aurelia complex and the positions of the old, intermediate and recent WGDs. The aurelia complex: 15 sibling species (same kind of habitat, initially thought to correspond to a single species).

How does WGD relate to speciation?

Polyploid paramecia: P. tetraurelia (Ptetra) and P. primaurelia (Pprim). With the kind permission of K. Wolfe.

Polyploid paramecia (Ptetra, Pprim): mating, meiosis.

Dobzhansky-Muller incompatibility by reciprocal gene loss. For 1 locus, 1/4 of the offspring is inviable. For n loci, offspring viability is (3/4)^n => reproductive isolation.
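Assuming, as the (3/4)^n formula implies, that the n loci affected by reciprocal gene loss segregate independently, viability falls off geometrically. A small sketch (function names and the 1% threshold are illustrative choices):

```python
import math


def offspring_viability(n_loci: int) -> float:
    """Fraction of viable offspring when n independent loci underwent
    reciprocal gene loss (1/4 of offspring inviable per locus)."""
    return 0.75 ** n_loci


def loci_for_isolation(max_viability: float) -> int:
    """Smallest n for which viability (3/4)^n drops strictly below max_viability."""
    return math.floor(math.log(max_viability) / math.log(0.75)) + 1


print(offspring_viability(2))    # 0.5625
print(loci_for_isolation(0.01))  # 17
```

About 17 such loci already reduce hybrid viability below 1%, which gives a sense of how quickly reciprocal gene loss after a WGD could isolate populations.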

Conclusions (1). At least 3 WGDs in Paramecium (probably 4). WGDs are rare events … that occurred recurrently in the evolution of eukaryotes (fungi, animals, plants, ciliates …). Major impact on the evolution of the gene repertoire.

Conclusions (2). Dosage constraints appear to be an essential force shaping the gene repertoire after WGD. Functional changes contribute to gene retention in the long term … … but the fate of the vast majority of duplicated genes is to become pseudogenized.

Conclusions (3). Relationship between the number of genes and organism complexity:
– the number of genes is driven by selection …
– … and by contingency (time since the last WGD).
WGDs may be responsible for (non-adaptive) explosive radiation of species (Dobzhansky-Muller incompatibility by reciprocal gene loss).

CNRS-UPR CGM, Gif-sur-Yvette: Jean Cohen, Linda Sperling
CNRS-UMR8541, ENS, Paris: Eric Meyer, Mireille Bétermier
CNRS-UMR8125, IGR, Villejuif: Philippe Dessen
CNRS-UMR5558, PBIL, Lyon: Laurent Duret, Vincent Daubin
Genoscope, CNRS UMR 8030: Jean-Marc Aury, Olivier Jaillon, Benjamin Noel, Betina Porcel, Vincent Schachter, Patrick Wincker, Jean Weissenbach