Comparative analysis of primate genomes: where has natural selection gone? Laurent Duret, Nicolas Galtier, Peter Arndt ACI-IMPBIO 4-5.


Comparative analysis of primate genomes: where has natural selection gone? Laurent Duret, Nicolas Galtier, Peter Arndt. ACI-IMPBIO, 4-5 October 2007

What’s in our genome ? bp Repeated sequences: ~50% 20,000-25,000 protein-coding genes Protein-coding regions : 1.2% Other functional elements in non-coding regions: 4-10%

How to identify functional elements ?

What makes chimps different from us? What are the functional elements responsible for adaptive evolution? point substitutions + indels + duplications (copy number variations)

Genome annotation by comparative genomics
Basic principle:
– Functional elements are constrained by natural selection
– Detect the hallmarks of selection in genomic sequences: negative selection (conservation), positive selection (adaptation)

Evolution: mutation, selection, drift
Diagram, individual level: base modification, replication error, deletion, insertion, ... = premutation; a premutation that escapes DNA repair becomes a mutation
– Soma: no transmission to the offspring
– Germline: transmission to the offspring (polymorphism)
Diagram, population level (N): loss of the allele, or fixation (substitution)

Evolution: mutation, selection, drift
Probability of fixation: p = f(s, N_e)
– s: relative impact on fitness
– s = 0: neutral mutation (random genetic drift)
– s < 0: disadvantageous mutation = negative (purifying) selection
– s > 0: advantageous mutation = positive (directional) selection
– N_e: effective population size; stochastic effects of gamete sampling are stronger in small populations
– |N_e s| < 1: effectively neutral mutation
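
The slide leaves p = f(s, N_e) unspecified. One common choice, used here only as an illustration, is Kimura's diffusion approximation for a new mutation in a diploid population. The sketch assumes census size equals N_e and an initial frequency of 1/(2 N_e); the function name and the example values in the usage note are hypothetical and do not come from the talk.

```python
import math


def fixation_probability(s: float, ne: float) -> float:
    """Kimura's diffusion approximation for a new mutation.

    Assumes a diploid population whose census size equals ne and an
    initial allele frequency of 1 / (2 * ne).
    """
    if abs(s) < 1e-12:
        # Neutral limit: fixation probability equals the initial frequency.
        return 1.0 / (2.0 * ne)
    exponent = -4.0 * ne * s
    if exponent > 700.0:
        # Strongly deleterious: the denominator would overflow, p is ~0.
        return 0.0
    # expm1 keeps precision when s is very small.
    return math.expm1(-2.0 * s) / math.expm1(exponent)
```

With N_e = 10,000, a call such as `fixation_probability(1e-4, 10_000)` (N_e s = 1) returns a value above the neutral 1/(2 N_e), and the same call with s = -1e-4 returns a value below it, matching the ordering described on the slide.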

Demonstrate the action of selection = reject the predictions of the neutral model
Diagram: base modification, replication error, deletion, insertion, etc. -> mutation (individual) -> polymorphism (population, N_e) -> fixation = substitution
– Substitution rate = f(mutation rate, fixation probability)
– |N_e s| < 1: substitution rate = mutation rate

Tracking natural selection...
– Mutation rate: u
– Substitution rate: K
– Negative selection => K < u
– Neutral evolution => K = u
– Positive selection => K > u
How to estimate u? => Use of neutral markers

Tracking natural selection...
– Synonymous substitution rate: Ks
– Non-synonymous substitution rate: Ka
– Hypothesis: synonymous sites evolve (nearly) neutrally => Ks ~ u
– Negative selection => Ka < Ks
– Neutral evolution => Ka = Ks
– Positive selection => Ka > Ks
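
The decision rule above can be written as a small helper. This is a minimal sketch: the function name and the `tolerance` band around Ka/Ks = 1 are assumptions made for illustration. A real analysis would test the departure from 1 statistically (for example with a likelihood ratio test) rather than use a fixed band.

```python
def classify_selection(ka: float, ks: float, tolerance: float = 0.05) -> str:
    """Classify a gene from its Ka/Ks ratio.

    `tolerance` is a hypothetical band around 1 within which the ratio is
    treated as compatible with neutral evolution.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form the Ka/Ks ratio")
    ratio = ka / ks
    if ratio < 1.0 - tolerance:
        return "negative selection"
    if ratio > 1.0 + tolerance:
        return "positive selection"
    return "neutral evolution"
```

For instance, `classify_selection(0.01, 0.1)` corresponds to Ka/Ks = 0.1 and falls in the negative-selection class.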

Tracking natural selection... is not so easy Patterns of neutral substitution vary along chromosomes –Impact of molecular processes (replication, DNA-repair, transcription, recombination, …) –Genomic environment (susceptibility to mutagens)

Mammalian genomic landscapes: large-scale variations of base composition along chromosomes (isochores)
Figure: GC% along chromosome 19 and chromosome 21 (positions in kb, 100 kb scale); sliding windows: 20 kb, step = 2 kb
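
A sliding-window GC profile of this kind can be computed as follows. This is a minimal sketch: the window and step defaults are the values given on the slide, while the function name and the handling of ambiguous bases (excluded from the denominator) are assumptions.

```python
def sliding_gc(seq: str, window: int = 20_000, step: int = 2_000):
    """Return (window_start, GC%) pairs along a sequence.

    Only A, C, G and T are counted; a window with no unambiguous base
    gets None instead of a percentage.
    """
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        acgt = gc + chunk.count("A") + chunk.count("T")
        profile.append((start, 100.0 * gc / acgt if acgt else None))
    return profile
```

On a toy sequence, `sliding_gc("GGCCAATT", window=4, step=2)` scans the windows GGCC, CCAA and AATT.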

GC content variations affect both coding and non-coding regions 3661 human genes from 1652 large genomic sequences (> 50 kb; average = 134 kb). Total = 221 Mb (98% non-coding)

What is the evolutionary process responsible for these large-scale variations in base composition ?

Variation in mutation patterns? Analysis of polymorphism data: in GC-rich regions, AT->GC mutations have a higher probability of fixation than GC->AT mutations (Eyre-Walker 1999; Duret et al. 2002; Spencer et al. 2006) => a fixation bias cannot result from variation in mutation patterns alone

Selection? What could be the selective advantage conferred by a single AT->GC mutation in a Mb-long genomic region?

Biased Gene Conversion ?

Biased Gene Conversion (BGC)
Diagram: molecular events of meiotic recombination (non-crossover and crossover outcomes). Heteroduplex DNA carries a T:G mismatch; mismatch repair yields either T:A (G -> A) or C:G (T -> C)
If DNA mismatch repair is biased (i.e. the probability of repair is not 50% in favor of each base) => BGC

BGC: a neutral process that looks like selection
The dynamics of the fixation process for one locus under BGC are identical to those under directional selection (Nagylaki 1983)
BGC intensity depends on:
– Recombination rate
– Bias in the repair of DNA mismatches
– Effective population size
GC-alleles have a higher probability of fixation than AT-alleles (Eyre-Walker 1999, Duret et al. 2002, Lercher et al. 2002, Spencer et al. 2006)
This fixation bias in favor of GC-alleles increases with recombination rate (Spencer et al. 2006)

Does BGC affect substitution patterns? BGC should affect the relative rates of AT->GC vs GC->AT substitutions in regions of high recombination. Relationship between neutral substitution patterns and recombination rate?

Substitution patterns in the Hominidae lineage
Human, chimpanzee, macaque whole-genome alignments:
– Genomicro: database of whole-genome alignments
– 2700 Mb (introns and intergenic regions)
Substitutions inferred by a maximum likelihood approach (collaboration with Peter Arndt, Berlin)
Substitution rates:
– 4 transversion rates: A->T; C->G; A->C; C->A
– 2 transition rates: A->G; G->A
– transitions at CpG sites: G->A
Crossover rate: HAPMAP
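
The slide lists six rates for the twelve possible point substitutions, which suggests that complementary changes (for example T->C and A->G) are pooled. The sketch below maps any observed substitution onto those six labels under that strand-symmetry assumption. The function name and the `cpg` flag, which the caller sets when the ancestral site lies in a CpG dinucleotide, are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Representative of each strand-symmetric class, as written on the slide.
CANONICAL = {("A", "T"), ("C", "G"), ("A", "C"), ("C", "A"), ("A", "G"), ("G", "A")}


def substitution_class(ancestral: str, derived: str, cpg: bool = False) -> str:
    """Return the strand-symmetric class label of a point substitution."""
    a, d = ancestral.upper(), derived.upper()
    if a not in COMPLEMENT or d not in COMPLEMENT or a == d:
        raise ValueError("expected two different bases among A, C, G, T")
    if (a, d) not in CANONICAL:
        # Read the same event on the complementary strand.
        a, d = COMPLEMENT[a], COMPLEMENT[d]
    label = f"{a}->{d}"
    if cpg and label == "G->A":
        # Transitions at CpG sites are tracked as a separate rate.
        label = "CpG G->A"
    return label
```

For example, `substitution_class("T", "C")` is pooled with A->G, and `substitution_class("C", "T", cpg=True)` falls in the CpG transition class.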

GC-content expected at equilibrium (GC*)
– Equilibrium GC-content: the GC-content that sequences would reach if the pattern of substitution remained constant over time = the future of GC-content
– Derived from the ratio of AT->GC over GC->AT substitution rates (taking into account CpG hypermutability)
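
In the simplest two-state version of this idea, the equilibrium is reached when the flux of AT->GC changes equals the flux of GC->AT changes, which gives GC* = u / (u + v), with u the AT->GC rate per AT site and v the GC->AT rate per GC site. The sketch below implements only this simplified formula; it ignores the CpG hypermutability correction mentioned on the slide, and the function and parameter names are assumptions.

```python
def equilibrium_gc(rate_at_to_gc: float, rate_gc_to_at: float) -> float:
    """Equilibrium GC-content of a two-state (AT vs GC) substitution model.

    At equilibrium (1 - GC*) * u = GC* * v, hence GC* = u / (u + v).
    """
    total = rate_at_to_gc + rate_gc_to_at
    if total <= 0:
        raise ValueError("at least one substitution rate must be positive")
    return rate_at_to_gc / total
```

For instance, if GC->AT substitutions are three times as frequent per site as AT->GC substitutions, `equilibrium_gc(1.0, 3.0)` gives a GC* of one quarter.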

GC-content expected at equilibrium and recombination
Figure: equilibrium GC-content GC* (axis 30% to 60%) vs. crossover rate (cM/Mb); R² = 36%, p < …; N = 2707 non-overlapping windows (1 Mb), from autosomes

GC-content and recombination
Strong correlation: suggests a direct causal relationship
– GC-rich sequences promote recombination? (Gerton et al. 2000, Petes & Merker 2002, Spencer et al. 2006)
– Recombination promotes AT->GC substitutions?

GC-content and recombination
Figure: present GC-content (axis 40% to 70%) vs. crossover rate (cM/Mb); N = 2707, R² = 14%, p < …

GC-content expected at equilibrium and recombination
Figure: equilibrium GC-content GC* (axis 30% to 60%) vs. crossover rate (cM/Mb); R² = 36%, p < …; N = 2707 non-overlapping windows (1 Mb), from autosomes

Recombination and GC-content
– Recombination events: crossover + non-crossover
– Genetic maps: crossover only
Diagram: molecular events of meiotic recombination (non-crossover vs. crossover)
=> The correlation between GC* and crossover rate might underestimate the real correlation between GC* and recombination

Evolution of GC-content: distance to telomeres
Figure: equilibrium GC-content GC* vs. distance to telomere (Mb); N = 2707, R² = 41%, p < …
GC* vs. crossover rate + distance to telomeres: R² = 53%

BGC: a realistic model?
– Recombination occurs predominantly in hotspots that cover only 3% of the genome (Myers et al. 2005)
– Recombination hotspots evolve rapidly: their location is not conserved between human and chimp (Ptak et al. 2005, Winkler et al. 2005)
=> Can BGC affect the evolution of Mb-long isochores?

BGC: a realistic model?
Probability of fixation of an AT-allele and of a GC-allele, given:
– Effective population size: N ~ 10,000
– s: BGC coefficient
– Recombination hotspots: BGC coefficient s taken from Spencer et al. (2006)
– No BGC outside hotspots: s = 0
– Hotspot density: 3% on average, varying along chromosomes (0.05% to 10.7%)
– Pattern of mutation: constant across chromosomes
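
The fixation formulas and the value of s did not survive in this transcript, so the sketch below is only one plausible way to turn the listed ingredients into a predicted GC*. It assumes that BGC acts like directional selection with coefficient +s on AT->GC mutations and -s on GC->AT mutations inside hotspots, uses Kimura's small-s approximation for the fixation probability relative to neutrality, and averages over hotspot and non-hotspot sequence. All names, the mutation rates `u_at_gc` and `v_gc_at`, and any value of s passed in are hypothetical; only N ~ 10,000 and the hotspot density range come from the slide.

```python
import math


def relative_fixation(big_s: float) -> float:
    """Fixation probability relative to a neutral mutation, with S = 4 * Ne * s.

    Small-s form of Kimura's approximation: S / (1 - exp(-S)).
    """
    if abs(big_s) < 1e-9:
        return 1.0
    if big_s < -700.0:
        return 0.0  # avoids overflow; the ratio is effectively zero
    return big_s / -math.expm1(-big_s)


def predicted_gc_star(hotspot_density: float, s: float, ne: float = 10_000,
                      u_at_gc: float = 1.0, v_gc_at: float = 2.0) -> float:
    """Equilibrium GC-content of a window under the sketched BGC model.

    hotspot_density: fraction of the window covered by hotspots (0 to 1).
    u_at_gc, v_gc_at: hypothetical mutation rates, constant along chromosomes.
    """
    big_s = 4.0 * ne * s
    h = hotspot_density
    # Substitution rate = mutation rate x average relative fixation probability.
    at_to_gc = u_at_gc * ((1.0 - h) + h * relative_fixation(big_s))
    gc_to_at = v_gc_at * ((1.0 - h) + h * relative_fixation(-big_s))
    return at_to_gc / (at_to_gc + gc_to_at)
```

With no hotspots the prediction reduces to the mutational equilibrium u / (u + v); raising `hotspot_density` from 0.0005 to 0.107, the range quoted on the slide, increases the predicted GC* for any positive s.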

BGC: a realistic model?
Figure: equilibrium GC-content GC* vs. crossover rate (cM/Mb), comparing the observations with the predictions of the BGC model

Summary (1)
Recombination:
– has a strong impact on patterns of substitution
– drives the evolution of GC-content
Most probably a consequence of BGC:
– Mutation hypothesis: contradicted by the fixation bias favoring GC alleles
– Selection hypothesis: contradicted by the correlation with recombination rate
– BGC: all observations fit the predictions of the model

BGC can affect functional regions
Fxy gene: translocated into the pseudoautosomal region (PAR) of the X chromosome in Mus musculus

                            X-specific      PAR
Recombination rate          normal          extreme
GC at synonymous sites      normal (55%)    very high (90%)

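Amino-acid substitutions in Fxy
Figure: phylogeny of Homo, Rattus, M. spretus and M. musculus with a time axis (Myrs), showing the location of Fxy on the sex chromosomes (X-specific vs. PAR) and the amino-acid substitutions in the 5' part and the 3' part of Fxy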

Amino-acid substitutions in Fxy
Figure: phylogeny of Homo, Rattus, M. spretus and M. musculus with a time axis (Myrs), showing the amino-acid substitutions in the 5' part and the 3' part of Fxy
Non-synonymous substitutions, all AT->GC
NB: strong negative selection (Ka/Ks < 0.1)

Amino-acid substitutions in Fxy BGC can drive the fixation of deleterious mutations

BGC: a neutral process that looks like selection BGC can confound selection tests

HARs: human-accelerated regions
Pollard et al. (Nature, PLoS Genet. 2006): searching for positive selection in non-coding regulatory elements
Identify regulatory elements whose evolution has significantly accelerated in the human lineage = HARs

Positive selection in the human lineage?
49 significant HARs
HAR1: 120 bp
– Rate of evolution >> neutral rate (18 fixed substitutions in the human lineage, vs. 0.7 expected)
– Part of a non-coding RNA gene
– Expressed in the brain
– Involved in the evolution of human-specific brain features?
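
To get a feel for how extreme 18 substitutions are when 0.7 are expected, one can compute a Poisson tail probability. This is only a back-of-the-envelope sketch: it assumes substitutions accumulate as a Poisson process at the neutral rate, which is not the test used by Pollard et al., and the function name is hypothetical. The tail is summed directly because computing 1 minus the cumulative probability would lose all precision at this scale.

```python
import math


def poisson_tail(observed: int, expected: float, extra_terms: int = 200) -> float:
    """P(X >= observed) for X ~ Poisson(expected), with expected > 0.

    Sums the probability mass from `observed` upward; `extra_terms` terms
    are ample when observed is well above the mean.
    """
    # Probability mass at `observed`, computed in log space for stability.
    term = math.exp(-expected + observed * math.log(expected)
                    - math.lgamma(observed + 1))
    total = 0.0
    for k in range(observed, observed + extra_terms):
        total += term
        term *= expected / (k + 1)  # pmf(k + 1) = pmf(k) * expected / (k + 1)
    return total
```

Calling `poisson_tail(18, 0.7)` with the HAR1 figures from the slide yields a vanishingly small probability, which is why the region stands out under a purely neutral mutation model.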

Positive selection?
GC-biased substitution pattern in HARs
– HAR1: the 18 substitutions are all AT->GC changes
– Known functional elements (coding or non-coding) are not GC-rich
HAR1-5: no evidence of selective sweep (Pollard et al. 2006)
HAR1: the accelerated region covers >1 kb, i.e. is not restricted to the functional element

Positive selection or BGC?
– HARs are located in regions of high recombination
– Recombination occurs in hotspots (<2 kb)
– Given known parameters (population size, fixation bias), the BGC model predicts substitution hotspots within recombination hotspots
=> HARs = substitution hotspots caused by BGC in recombination hotspots

Conclusion (1) GC-rich isochores = result of BGC in highly recombining parts of the genome Recombination drives the evolution of GC-content in mammals Probably a universal process: correlation GC / recombination in many taxa (yeast, drosophila, nematode, paramecia, …)

Conclusion (2) Recombination hotspots = the Achilles’ heel of our genome BGC => substitution hotspots in recombination hotspots

Conclusion (3) Probability of fixation depends on: - selection - drift (population size) - BGC Extending the null hypothesis of neutral evolution: mutation + BGC Galtier & Duret (2007) Trends Genet

Thanks Vincent Lombard (Génomicro) Nicolas Galtier (Montpellier) Peter Arndt (Berlin) Katherine Pollard (UC Davis)

Sex-specific effects
Correlation GC* / crossover rate (deCODE genetic map):
– male: R² = 31%
– female: R² = 15%
The crossover rate is a poor predictor of the total recombination rate in females: more variability in the non-crossover / crossover ratio along chromosomes?

Chromosome size, recombination and GC-content
Figure: per-chromosome plots for human and chicken relating chromosome length (Mb), crossover rate (cM/Mb), GC* and current GC; R² values shown on the panels: 0.84, 0.66, 0.82, 0.81

Recombination and GC-content: a universal relationship ?

G+C content vs. chromosome length: yeast
R² = 61% (Bradnam et al. 1999, Mol Biol Evol)

G+C content vs. chromosome length: Paramecium
Figure: GC-content vs. chromosome size (kb); R² = 67%

Evolution of GC-content
Equilibrium GC-content correlates with...
– Crossover rate (HAPMAP): R² = 36%
– Distance to telomere: R² = 41%
– Crossover rate + distance to telomeres: R² = 53%
Recombination pattern: non-crossover / crossover ratio higher near telomeres?

Frequency distribution of GC and AT alleles
Figure: proportion of SNPs per allele frequency class (<5%, 5%-15%, 15%-50%, >50%) for GC->AT and AT->GC mutations; distribution expected in the absence of fixation bias
NB: the shape of the distribution may vary according to population history, but should be identical for GC and AT alleles
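
The comparison described here amounts to binning derived allele frequencies into the four classes of the figure, separately for AT->GC and GC->AT mutations, and comparing the two spectra. A minimal sketch of the binning step follows; the function name and the treatment of values that fall exactly on a class boundary (assigned to the upper class) are assumptions.

```python
from bisect import bisect_right

BIN_EDGES = (0.05, 0.15, 0.50)
BIN_LABELS = ("<5%", "5%-15%", "15%-50%", ">50%")


def frequency_spectrum(derived_freqs):
    """Proportion of SNPs in each derived allele frequency class.

    derived_freqs: iterable of derived allele frequencies between 0 and 1.
    """
    freqs = list(derived_freqs)
    if not freqs:
        raise ValueError("at least one SNP is required")
    counts = [0] * len(BIN_LABELS)
    for f in freqs:
        # bisect_right sends a value equal to an edge into the upper class.
        counts[bisect_right(BIN_EDGES, f)] += 1
    return {label: n / len(freqs) for label, n in zip(BIN_LABELS, counts)}
```

In the absence of fixation bias, `frequency_spectrum` applied to the AT->GC SNPs and to the GC->AT SNPs of the same regions should return similar proportions; an excess of AT->GC SNPs in the high-frequency classes is the signature discussed on the slide.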

Frequency distribution of AT and GC alleles at silent sites
– 410 SNPs with allele frequency (Cargill et al. 1999)
– Chimpanzee as an outgroup to orient mutations
– GC alleles segregate at significantly higher frequencies than AT alleles in GC-median and GC-rich genes (Duret et al. 2002)

Frequency distribution of GC and AT alleles
Spencer et al. (2006): analysis of HAPMAP data (SNPs from 60 unrelated individuals)
The fixation bias in favor of GC increases near recombination hotspots

Frequency distribution of GC and AT alleles
Figure (Spencer et al. 2006): average derived allele frequency for AT->GC, GC->AT, GC->GC and AT->AT alleles