Comparative analysis of primate genomes: where has natural selection gone? Laurent Duret, Nicolas Galtier, Peter Arndt ACI-IMPBIO 4-5.


1 Comparative analysis of primate genomes: where has natural selection gone? Laurent Duret, Nicolas Galtier, Peter Arndt. ACI-IMPBIO, 4-5 October 2007

2 What's in our genome?
– 3.1 × 10^9 bp
– Repeated sequences: ~50%
– 20,000-25,000 protein-coding genes
– Protein-coding regions: 1.2%
– Other functional elements in non-coding regions: 4-10%

3 How to identify functional elements?

4 What makes chimps different from us? What are the functional elements responsible for adaptive evolution? 30 × 10^6 point substitutions + indels + duplications (copy-number variations)

5 Genome annotation by comparative genomics
Basic principle:
– Functional elements are constrained by natural selection
– Detect the hallmarks of selection in genomic sequences: negative selection (conservation), positive selection (adaptation)

6 Evolution: mutation, selection, drift
[Diagram] In an individual, a premutation (base modification, replication error, deletion, insertion, etc.) is either corrected by DNA repair or becomes a mutation. A mutation in the soma is not transmitted to the offspring; a mutation in the germline is transmitted to the offspring (polymorphism). In the population (N), the new allele is eventually either lost or fixed (substitution).

7 Evolution: mutation, selection, drift
Probability of fixation: p = f(s, Ne)
– s: relative impact on fitness
– s = 0: neutral mutation (random genetic drift)
– s < 0: disadvantageous mutation = negative (purifying) selection
– s > 0: advantageous mutation = positive (directional) selection
– Ne: effective population size; the stochastic effects of gamete sampling are stronger in small populations
– |Ne s| < 1: effectively neutral mutation
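The slide leaves f(s, Ne) unspecified. One standard choice, assumed here, is Kimura's (1962) diffusion approximation for a single new mutant under genic (additive) selection, starting at frequency 1/(2N): p = (1 - e^(-4 Ne s / 2N)) / (1 - e^(-4 Ne s)). The function name and the default N = Ne are illustrative choices, not taken from the talk.

```python
import math
from typing import Optional


def fixation_probability(s: float, Ne: float, N: Optional[float] = None) -> float:
    """Kimura's diffusion approximation for the fixation probability of a single
    new mutant (initial frequency 1/(2N)) with selection coefficient s under
    genic selection. N defaults to Ne (idealized population)."""
    if N is None:
        N = Ne
    q0 = 1.0 / (2.0 * N)  # frequency of one new copy in a diploid population
    if s == 0:
        return q0  # neutral: fixation by drift alone
    x = 4.0 * Ne * s
    try:
        # (1 - e^{-x q0}) / (1 - e^{-x}), written with expm1 for accuracy when x is small
        return math.expm1(-x * q0) / math.expm1(-x)
    except OverflowError:
        # strongly deleterious (x very negative): the probability is effectively 0
        return 0.0


if __name__ == "__main__":
    Ne = 10_000
    for s in (-1e-3, -1e-5, 0.0, 1e-5, 1e-3):
        p = fixation_probability(s, Ne)
        print(f"s={s:+.0e}  Ne*s={Ne * s:+.2f}  p={p:.3e}  p/neutral={p * 2 * Ne:.3g}")
```

With |Ne s| < 1 the ratio to the neutral value 1/(2N) stays close to 1, which is the sense in which such mutations are "effectively neutral"; once |Ne s| is well above 1, the probability departs sharply from the neutral value.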

8 Demonstrating the action of selection = rejecting the predictions of the neutral model
[Diagram: base modification, replication error, deletion, insertion, etc. -> mutation (individual) -> polymorphism -> fixation (population, Ne) -> substitution]
Substitution rate = f(mutation rate, fixation probability)
|Ne s| < 1: substitution rate = mutation rate (each generation, 2N × mutation rate new copies arise, and each effectively neutral one fixes with probability 1/(2N))

9 Tracking natural selection...
– Mutation rate: u
– Substitution rate: K
– Negative selection => K < u
– Neutral evolution => K = u
– Positive selection => K > u
How to estimate u? => Use neutral markers
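The K < u / K = u / K > u criteria follow from K = 2N·u·p_fix: 2N·u new mutations enter the population each generation and each fixes with probability p_fix, which is 1/(2N) for a neutral allele, so K = u. A minimal Wright-Fisher simulation can check this numerically. The genic-selection update, the tiny population size (chosen only to keep the run fast) and all function names are illustrative assumptions.

```python
import numpy as np


def fraction_fixed(N: int, s: float = 0.0, replicates: int = 20_000, seed: int = 1) -> float:
    """Simulate `replicates` independent new mutations (one copy each) in a diploid
    Wright-Fisher population of N individuals; return the fraction that fix.
    Selection is genic: the mutant allele has relative fitness 1 + s."""
    rng = np.random.default_rng(seed)
    two_n = 2 * N
    counts = np.ones(replicates, dtype=np.int64)  # each replicate starts with one mutant copy
    active = counts < two_n                        # replicates not yet lost or fixed
    while active.any():
        q = counts[active] / two_n
        q_sel = q * (1 + s) / (1 + s * q)            # deterministic effect of selection
        counts[active] = rng.binomial(two_n, q_sel)  # sampling 2N gametes = genetic drift
        active = (counts > 0) & (counts < two_n)
    return float(np.mean(counts == two_n))


def substitution_rate(u: float, N: int, s: float = 0.0, **kwargs) -> float:
    """K = (2N u new mutations per generation) x (probability that each one fixes)."""
    return 2 * N * u * fraction_fixed(N, s, **kwargs)


if __name__ == "__main__":
    u, N = 1e-8, 20
    for s in (-0.05, 0.0, 0.05):
        K = substitution_rate(u, N, s)
        print(f"s={s:+.2f}  K/u={K / u:.2f}")
```

The neutral run should give K/u close to 1 (up to simulation noise), with K/u well below 1 for s < 0 and well above 1 for s > 0.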

10 Tracking natural selection...
– Synonymous substitution rate: Ks
– Non-synonymous substitution rate: Ka
Hypothesis: synonymous sites evolve (nearly) neutrally => Ks ≈ u
– Negative selection => Ka < Ks
– Neutral evolution => Ka = Ks
– Positive selection => Ka > Ks
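Once synonymous and non-synonymous sites and differences have been counted for a pair of aligned coding sequences (for instance with the Nei-Gojobori counting method, not shown here), Ka and Ks are usually obtained by correcting the observed proportions for multiple hits. The sketch below assumes those counts are already available and uses the Jukes-Cantor correction; the function names and the counts in the example are made up for illustration.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance: corrects a proportion of differing sites for multiple hits."""
    if p >= 0.75:
        raise ValueError("too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(nonsyn_diffs: float, nonsyn_sites: float, syn_diffs: float, syn_sites: float):
    """Return (Ka, Ks, Ka/Ks) from counts of differences and of sites."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    return ka, ks, ka / ks


def interpret(omega: float) -> str:
    # In practice a statistical test is needed before calling Ka/Ks "< 1" or "> 1".
    if omega < 1:
        return "negative selection"
    if omega > 1:
        return "positive selection"
    return "neutral evolution"


if __name__ == "__main__":
    ka, ks, omega = ka_ks(nonsyn_diffs=10, nonsyn_sites=300, syn_diffs=10, syn_sites=100)
    print(f"Ka={ka:.4f}  Ks={ks:.4f}  Ka/Ks={omega:.2f} -> {interpret(omega)}")
```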

11 Tracking natural selection... is not so easy
Patterns of neutral substitution vary along chromosomes:
– Impact of molecular processes (replication, DNA repair, transcription, recombination, …)
– Genomic environment (susceptibility to mutagens)

12 Mammalian genomic landscapes
Large-scale variations of base composition along chromosomes (isochores)
[Figure: GC% (30-60%) along chromosome 19 and chromosome 21; sliding windows of 20 kb, step = 2 kb]
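A GC profile like the one on this slide can be sketched with a simple sliding window; the defaults match the 20 kb window and 2 kb step quoted on the slide, while the function name and the choice to exclude non-ACGT bases (such as N gaps) from the denominator are assumptions.

```python
def gc_sliding_windows(seq: str, window: int = 20_000, step: int = 2_000):
    """Yield (start, GC fraction) for windows of `window` bp every `step` bp.
    Bases other than A/C/G/T (e.g. N gaps) are excluded from the denominator;
    a window with no A/C/G/T at all yields NaN."""
    seq = seq.upper()
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        acgt = gc + chunk.count("A") + chunk.count("T")
        yield start, (gc / acgt if acgt else float("nan"))


if __name__ == "__main__":
    print(list(gc_sliding_windows("GGCCAATT", window=4, step=2)))
    # [(0, 1.0), (2, 0.5), (4, 0.0)]
```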

13 GC-content variations affect both coding and non-coding regions
[Figure data: 3661 human genes from 1652 large genomic sequences (> 50 kb; average = 134 kb); total = 221 Mb (98% non-coding)]

14 What is the evolutionary process responsible for these large-scale variations in base composition?

15 Variation in mutation patterns? Analysis of polymorphism data: in GC-rich regions, AT->GC mutations have a higher probability of fixation than GC->AT mutations (Eyre-Walker 1999; Duret et al. 2002; Spencer et al. 2006), which differences in mutation patterns alone cannot explain

16 Selection? What selective advantage could a single AT->GC mutation confer in a Mb-long genomic region???

17 Biased Gene Conversion?

18 Biased Gene Conversion (BGC)
If DNA mismatch repair is biased (i.e. the probability of repair is not 50% in favor of each base) => BGC
[Diagram: molecular events of meiotic recombination (crossover and non-crossover); a T/G mismatch in heteroduplex DNA is repaired either to T:A (G -> A) or to C:G (T -> C)]

19 BGC: a neutral process that looks like selection
The dynamics of the fixation process at one locus under BGC are identical to those under directional selection (Nagylaki 1983).
BGC intensity depends on:
– Recombination rate
– Bias in the repair of DNA mismatches
– Effective population size
GC alleles have a higher probability of fixation than AT alleles (Eyre-Walker 1999; Duret et al. 2002; Lercher et al. 2002; Spencer et al. 2006).
This fixation bias in favor of GC alleles increases with recombination rate (Spencer 2006).
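Why BGC "looks like selection" can be illustrated deterministically. In the usual approximation, BGC with coefficient b raises the frequency q of the GC allele by about b·q(1 - q) per generation, which has the same form as genic selection with s = b. The sketch below assumes that approximation; b = s = 10^-3, the starting frequency and the function names are arbitrary illustrative choices.

```python
def bgc_step(q: float, b: float) -> float:
    """One generation of biased gene conversion (approximation: dq = b q (1 - q))."""
    return q + b * q * (1 - q)


def selection_step(q: float, s: float) -> float:
    """One generation of genic selection, GC allele with relative fitness 1 + s."""
    return q * (1 + s) / (1 + s * q)


def trajectory(step, q0, coeff, generations):
    """Iterate a one-generation update and return the frequency in every generation."""
    qs = [q0]
    for _ in range(generations):
        qs.append(step(qs[-1], coeff))
    return qs


if __name__ == "__main__":
    bgc = trajectory(bgc_step, 0.01, 1e-3, 5000)
    sel = trajectory(selection_step, 0.01, 1e-3, 5000)
    for t in (0, 1000, 2500, 5000):
        print(f"generation {t}: BGC q={bgc[t]:.3f}  selection q={sel[t]:.3f}")
```

The two trajectories are practically indistinguishable, so allele-frequency dynamics alone cannot tell BGC from directional selection.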

20 Does BGC affect substitution patterns?
BGC should affect the relative rates of AT->GC vs. GC->AT substitutions in regions of high recombination
=> Relationship between neutral substitution patterns and recombination rate?

21 Substitution patterns in the Hominidae lineage
Human, chimp, macaque whole-genome alignments:
– Genomicro: database of whole-genome alignments
– 2700 Mb (introns and intergenic regions)
Substitutions inferred by a maximum-likelihood approach (collaboration with Peter Arndt, Berlin)
Substitution rates:
– 4 transversion rates: A->T; C->G; A->C; C->A
– 2 transition rates: A->G; G->A
– transitions at CpG sites: G->A
Cross-over rate: HAPMAP
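The talk infers substitutions with a maximum-likelihood model. As a much simpler, swapped-in illustration (not the authors' method), the sketch below uses parsimony with macaque as outgroup: a site counts as a human-lineage substitution when chimp and macaque agree and human differs. Each change is folded together with its reverse complement (e.g. T->C is counted as A->G), which gives the six strand-symmetric classes listed on the slide. CpG context is ignored, and the names are hypothetical.

```python
from collections import Counter

# Fold the 12 possible base changes onto 6 strand-symmetric classes
# (a change and its reverse complement are counted together).
STRAND_SYMMETRIC = {
    "A>T": "A->T", "T>A": "A->T",
    "C>G": "C->G", "G>C": "C->G",
    "A>C": "A->C", "T>G": "A->C",
    "C>A": "C->A", "G>T": "C->A",
    "A>G": "A->G", "T>C": "A->G",
    "G>A": "G->A", "C>T": "G->A",
}


def human_lineage_substitutions(human: str, chimp: str, macaque: str) -> Counter:
    """Parsimony count of substitutions on the human branch of a 3-way alignment.
    Sites with gaps or ambiguous bases, or where chimp and macaque differ, are skipped."""
    if not (len(human) == len(chimp) == len(macaque)):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter()
    for h, c, m in zip(human.upper(), chimp.upper(), macaque.upper()):
        if c == m and c in "ACGT" and h in "ACGT" and h != c:
            counts[STRAND_SYMMETRIC[f"{c}>{h}"]] += 1
    return counts


if __name__ == "__main__":
    print(human_lineage_substitutions("ACGTA", "ATGTA", "ATGTG"))
```

In the example, position 2 is a human-lineage T->C change (counted as A->G), while the last position is skipped because chimp and macaque disagree.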

22 GC content expected at equilibrium (GC*)
Equilibrium GC content: the GC content that sequences would reach if the pattern of substitution remained constant over time = the future of GC content
Computed from the ratio of AT->GC over GC->AT substitution rates (taking into account CpG hypermutability)
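Under a two-state (AT vs. GC) approximation, the GC fraction g changes as dg/dt = a(1 - g) - b·g, where a is the per-site AT->GC rate and b the GC->AT rate; setting dg/dt = 0 gives GC* = a / (a + b). The sketch below applies this to the strand-symmetric rates of slide 21 (AT->GC = A->C + A->G; GC->AT = C->A + G->A). It does not model CpG hypermutability, which the talk does take into account, and the rate values and function names are placeholders.

```python
def gc_star(rates: dict) -> float:
    """Equilibrium GC content from strand-symmetric per-site substitution rates."""
    at_to_gc = rates["A->C"] + rates["A->G"]   # changes that create a G or C
    gc_to_at = rates["C->A"] + rates["G->A"]   # changes that remove a G or C
    return at_to_gc / (at_to_gc + gc_to_at)


def gc_after(g0: float, rates: dict, t: float, dt: float = 0.01) -> float:
    """Integrate dg/dt = a (1 - g) - b g with Euler steps to show convergence to GC*."""
    a = rates["A->C"] + rates["A->G"]
    b = rates["C->A"] + rates["G->A"]
    g = g0
    for _ in range(int(t / dt)):
        g += dt * (a * (1 - g) - b * g)
    return g


if __name__ == "__main__":
    rates = {"A->C": 1.0, "A->G": 1.0, "C->A": 1.0, "G->A": 3.0}  # arbitrary units
    print(gc_star(rates))                 # 2 / (2 + 4) = 0.333...
    print(gc_after(0.6, rates, t=10.0))  # starts at 0.60 and relaxes toward GC*
```

A sequence whose current GC content is above GC* will lose GC over time (and vice versa), which is why GC* is described as "the future of GC content".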

23 GC content expected at equilibrium and recombination
[Figure: equilibrium GC content (GC*, 30-60%) vs. cross-over rate (0-9 cM/Mb); N = 2707 non-overlapping 1 Mb windows from autosomes; R² = 36%, p < 0.0001]
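The R² and p-value reported for such window-level plots come from a least-squares fit across windows. A minimal sketch using SciPy's `scipy.stats.linregress`; the helper name is hypothetical and the data are synthetic placeholders, not the 2707 windows of the study.

```python
import numpy as np
from scipy import stats


def r_squared_and_p(x, y):
    """Ordinary least-squares fit of y on x; returns (R^2, two-sided p-value for slope != 0)."""
    fit = stats.linregress(x, y)
    return fit.rvalue ** 2, fit.pvalue


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    crossover = rng.uniform(0, 9, size=2707)                            # synthetic cM/Mb
    gc_star = 0.40 + 0.02 * crossover + rng.normal(0, 0.05, size=2707)  # synthetic GC*
    r2, p = r_squared_and_p(crossover, gc_star)
    print(f"R^2 = {r2:.2f}, p = {p:.2g}")
```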

24 GC content and recombination
Strong correlation: suggests a direct causal relationship
– GC-rich sequences promote recombination? (Gerton et al. 2000; Petes & Merker 2002; Spencer et al. 2006)
– Recombination promotes AT->GC substitutions?

25 GC content and recombination
[Figure: present GC content (40-70%) vs. cross-over rate (0-9 cM/Mb); N = 2707; R² = 14%, p < 0.001]

26 GC content expected at equilibrium and recombination
[Figure: equilibrium GC content (GC*, 30-60%) vs. cross-over rate (0-9 cM/Mb); N = 2707 non-overlapping 1 Mb windows from autosomes; R² = 36%, p < 0.0001]

27 Recombination and GC content
Recombination events: crossover + non-crossover; genetic maps measure crossover only
[Diagram: molecular events of meiotic recombination (crossover and non-crossover)]
=> The correlation between GC* and crossover rate might underestimate the real correlation between GC* and recombination

28 Evolution of GC content: distance to telomeres
[Figure: equilibrium GC content (GC*, 0.30-0.60) vs. distance to telomere (0.1-100 Mb, log scale); N = 2707; R² = 41%, p < 0.0001]
GC* vs. crossover rate + distance to telomeres: R² = 53%
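Combining two predictors (crossover rate and distance to telomere) is an ordinary multiple regression. A NumPy sketch on synthetic data; using log10 of the distance is an assumption suggested by the figure's log-scale axis, and the helper name is hypothetical.

```python
import numpy as np


def r_squared(y, *predictors):
    """R^2 of an ordinary least-squares fit of y on the given predictors (plus an intercept)."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(y)] + [np.asarray(p, dtype=float) for p in predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    return 1.0 - residuals @ residuals / np.sum((y - y.mean()) ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 2707
    crossover = rng.uniform(0, 9, n)   # synthetic cM/Mb
    log_dist = rng.uniform(-1, 2, n)   # synthetic log10(distance to telomere in Mb)
    gc = 0.45 + 0.01 * crossover - 0.03 * log_dist + rng.normal(0, 0.04, n)
    print("crossover only:", round(r_squared(gc, crossover), 2))
    print("distance only: ", round(r_squared(gc, log_dist), 2))
    print("both:          ", round(r_squared(gc, crossover, log_dist), 2))
```

When the two predictors carry partly independent information, the combined R² exceeds either single-predictor R², as with the 36% and 41% vs. 53% on the slide.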

29 BGC: a realistic model?
– Recombination occurs predominantly in hotspots that cover only 3% of the genome (Myers et al. 2005)
– Recombination hotspots evolve rapidly: their location is not conserved between human and chimp (Ptak et al. 2005; Winkler et al. 2005)
=> Can BGC affect the evolution of Mb-long isochores?

30 BGC: a realistic model?
[Equations: probability of fixation of an AT allele; probability of fixation of a GC allele]
– Effective population size: N ≈ 10,000
– s: BGC coefficient
– Recombination hotspots: s = 1.3 × 10^-4 (Spencer et al. 2006)
– No BGC outside hotspots: s = 0
– Hotspot density: 3% on average, varying along chromosomes (0.05% to 10.7%)
– Pattern of mutation: constant across chromosomes
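The model on this slide can be sketched numerically under extra assumptions not stated on the slide: fixation probabilities follow Kimura's formula with s = +b for the GC allele and s = -b for the AT allele; hotspots move fast enough that every site spends a fraction h of its time inside one; and GC->AT mutations are twice as frequent as AT->GC mutations (mut_bias = 2, a hypothetical illustrative value). The function names are also hypothetical.

```python
import math


def relative_fixation(s: float, Ne: float) -> float:
    """Fixation probability of a new allele relative to a neutral one
    (2N x Kimura's formula, with N = Ne)."""
    if s == 0:
        return 1.0
    x = 4.0 * Ne * s
    q0 = 1.0 / (2.0 * Ne)
    return 2.0 * Ne * math.expm1(-x * q0) / math.expm1(-x)


def gc_star_bgc(h: float, b: float = 1.3e-4, Ne: float = 10_000, mut_bias: float = 2.0) -> float:
    """Equilibrium GC content of a region whose sites spend a fraction h of the time in a hotspot.

    h: hotspot density; b: BGC coefficient inside hotspots;
    mut_bias: ratio of GC->AT to AT->GC mutation rates (hypothetical value)."""
    gain = (1 - h) + h * relative_fixation(+b, Ne)  # time-averaged boost of AT->GC substitutions
    loss = (1 - h) + h * relative_fixation(-b, Ne)  # time-averaged damping of GC->AT substitutions
    return gain / (gain + mut_bias * loss)


if __name__ == "__main__":
    for h in (0.0005, 0.03, 0.107):
        print(f"hotspot density {h:.2%}: GC* = {gc_star_bgc(h):.3f}")
```

With 4Ne·b = 5.2, a GC allele inside a hotspot fixes about five times more often than a neutral one, so even a few percent of hotspot coverage shifts GC* upward, and GC* rises with hotspot density.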

31 BGC: a realistic model?
[Figure: equilibrium GC content (GC*) vs. crossover rate (cM/Mb); observations compared with the predictions of the BGC model]

32 Summary (1)
Recombination:
– has a strong impact on patterns of substitution
– drives the evolution of GC content
Most probably a consequence of BGC:
– Mutation? Does not explain the fixation bias favoring GC alleles
– Selection? Does not explain the correlation with recombination rate
– BGC: all observations fit the predictions of the model

33 BGC can affect functional regions
Fxy gene: translocated into the pseudoautosomal region (PAR) of the X chromosome in Mus musculus
– Recombination rate: normal (X-specific) vs. extreme (PAR)
– GC content at synonymous sites: normal, 55% (X-specific) vs. very high, 90% (PAR)

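34 Amino-acid substitutions in Fxy
[Figure: phylogeny of Homo, Rattus, M. spretus and M. musculus (time scale 0-80 Myrs), showing the number of amino-acid substitutions per branch in the 5' and 3' parts of Fxy, with X-linked, Y-linked and PAR copies indicated]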

35 Amino-acid substitutions in Fxy
[Figure: same phylogeny as slide 34]
28 non-synonymous substitutions, all AT->GC
NB: strong negative selection (Ka/Ks < 0.1)
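How surprising is "28 out of 28 substitutions are AT->GC"? A binomial tail probability gives a rough feel for it under a hypothetical null in which each substitution is AT->GC with probability p0. The value of p0 is an assumption (0.5 and 0.33 are used only as examples), and the function name is illustrative.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p0: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0 ** i * (1 - p0) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    for p0 in (0.5, 0.33):
        print(f"p0={p0}: P(28 of 28 AT->GC) = {binomial_upper_tail(28, 28, p0):.2e}")
```

With p0 = 0.5 the probability is 0.5^28, about 3.7 × 10^-9; any null with p0 < 0.5 makes the observation even less likely.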

36 Amino-acid substitutions in Fxy
BGC can drive the fixation of deleterious mutations

37 BGC: a neutral process that looks like selection
BGC can confound selection tests

38 HARs: human-accelerated regions
Pollard et al. (Nature; PLoS Genet. 2006): searching for positive selection in non-coding regulatory elements
Identify regulatory elements whose evolution has significantly accelerated in the human lineage = HARs

39 Positive selection in the human lineage?
49 significant HARs
HAR1: 120 bp
– Rate of evolution >> neutral rate (18 fixed substitutions in the human lineage vs. 0.7 expected)
– Part of a non-coding RNA gene
– Expressed in the brain
– Involved in the evolution of human-specific brain features?
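To see how far 18 observed substitutions are from 0.7 expected, a Poisson tail probability is a quick back-of-the-envelope check. This is only an illustration, assuming substitutions arise as a Poisson process with the neutral expectation as its mean; it is not the test used by Pollard et al., and the function name is hypothetical.

```python
import math


def poisson_upper_tail(k: int, lam: float, extra_terms: int = 200) -> float:
    """P(X >= k) for X ~ Poisson(lam), with lam > 0, summed directly from k upward.
    This avoids the cancellation error of 1 - P(X < k) when the tail is tiny;
    truncating after `extra_terms` terms is adequate when lam is small."""
    return sum(
        math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
        for i in range(k, k + extra_terms)
    )


if __name__ == "__main__":
    print(f"P(X >= 18 | mean 0.7) = {poisson_upper_tail(18, 0.7):.1e}")
```

The result is vanishingly small, which is why HAR1 stands out; the question raised on the next slides is whether positive selection or BGC produced the excess.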

40 Positive selection?
– GC-biased substitution pattern in HARs: in HAR1, all 18 substitutions are AT->GC changes, whereas known functional elements (coding or non-coding) are not GC-rich
– HAR1-5: no evidence of a selective sweep (Pollard et al. 2006)
– HAR1: the accelerated region covers >1 kb, i.e. it is not restricted to the functional element

41 Positive selection or BGC?
– HARs are located in regions of high recombination
– Recombination occurs in hotspots (<2 kb)
– Given known parameters (population size, fixation bias), the BGC model predicts substitution hotspots within recombination hotspots
=> HARs = substitution hotspots caused by BGC in recombination hotspots

42 Conclusion (1)
– GC-rich isochores = result of BGC in highly recombining parts of the genome
– Recombination drives the evolution of GC content in mammals
– Probably a universal process: GC/recombination correlation in many taxa (yeast, Drosophila, nematodes, Paramecium, …)

43 Conclusion (2)
Recombination hotspots = the Achilles' heel of our genome
BGC => substitution hotspots in recombination hotspots

44 Conclusion (3)
Probability of fixation depends on:
– selection
– drift (population size)
– BGC
Extending the null hypothesis of neutral evolution: mutation + BGC (Galtier & Duret (2007) Trends Genet)

45 Thanks Vincent Lombard (Génomicro) Nicolas Galtier (Montpellier) Peter Arndt (Berlin) Katherine Pollard (UC Davis)

46 Sex-specific effects
Correlation GC*/crossover rate (deCODE genetic map):
– male: R² = 31%
– female: R² = 15%
The crossover rate is a poor predictor of the total recombination rate in females: more variability in the non-crossover/crossover ratio along chromosomes?

47 Chromosome size, recombination and GC content
[Figure: human and chicken; crossover rate (cM/Mb) vs. chromosome length (Mb), and GC* and current GC content vs. crossover rate (cM/Mb); panel R² values: 0.84, 0.66, 0.82, 0.81]

48 Recombination and GC content: a universal relationship?

49 G+C content vs. chromosome length: yeast
[Figure: R² = 61%; Bradnam et al. (1999) Mol Biol Evol]

50 G+C content vs. chromosome length: Paramecium
[Figure: GC content vs. chromosome size (kb); R² = 67%]

51 Evolution of GC content
Equilibrium GC content correlates with:
– Cross-over rate (HAPMAP): R² = 36%
– Distance to telomere: R² = 41%
– Cross-over rate + distance to telomeres: R² = 53%
Recombination pattern: is the non-crossover/crossover ratio higher near telomeres?

52 Frequency distribution of GC and AT alleles
[Figure: proportion of SNPs (0-0.6) per allele-frequency class (<5%, 5%-15%, 15%-50%, >50%) for GC->AT and AT->GC mutations, with the distribution expected in the absence of a fixation bias]
NB: the shape of the distribution may vary according to population history, but should be identical for GC and AT alleles
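The frequency-class comparison can be sketched as below. The class labels follow the figure (<5%, 5%-15%, 15%-50%, >50%); which class receives exactly 5%, 15% or 50% is an assumption, as are the function names and the toy input.

```python
from collections import Counter

CLASSES = ("<5%", "5%-15%", "15%-50%", ">50%")


def frequency_class(f: float) -> str:
    """Assign a derived-allele frequency (0-1) to one of the four classes."""
    if f < 0.05:
        return CLASSES[0]
    if f < 0.15:
        return CLASSES[1]
    if f <= 0.50:
        return CLASSES[2]
    return CLASSES[3]


def class_proportions(snps):
    """snps: iterable of (derived_frequency, mutation_type) pairs, e.g. (0.12, "AT->GC").
    Returns {mutation_type: {class: proportion of that type's SNPs}}."""
    by_type = {}
    for f, mut in snps:
        by_type.setdefault(mut, Counter())[frequency_class(f)] += 1
    return {
        mut: {c: counts[c] / sum(counts.values()) for c in CLASSES}
        for mut, counts in by_type.items()
    }


if __name__ == "__main__":
    toy = [(0.02, "GC->AT"), (0.03, "GC->AT"), (0.10, "GC->AT"), (0.60, "GC->AT"),
           (0.02, "AT->GC"), (0.20, "AT->GC"), (0.40, "AT->GC"), (0.70, "AT->GC")]
    for mut, props in class_proportions(toy).items():
        print(mut, props)
```

Without a fixation bias, the two rows should have the same shape; a shift of AT->GC alleles toward the higher classes is the signature discussed on this slide.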

53 Frequency distribution of AT and GC alleles at silent sites
– 410 SNPs with allele frequencies (Cargill et al. 1999)
– Chimpanzee as an outgroup to orient mutations
– GC alleles segregate at significantly higher frequencies than AT alleles in GC-median and GC-rich genes (Duret et al. 2002)
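Orienting a SNP with the chimpanzee as outgroup amounts to calling the allele shared with chimp ancestral and the other one derived. A minimal sketch, assuming biallelic SNPs with A/C/G/T alleles and ignoring possible recurrent mutation in the chimp lineage; the function name and argument layout are hypothetical.

```python
from typing import Optional, Tuple

WEAK = set("AT")  # A and T; G and C are the "strong" bases


def polarize(allele1: str, freq1: float, allele2: str, chimp: str) -> Optional[Tuple[str, float]]:
    """Return (mutation class, derived-allele frequency) for a biallelic human SNP,
    or None if the chimp base matches neither allele (the SNP cannot be oriented)."""
    if chimp == allele1:
        ancestral, derived, derived_freq = allele1, allele2, 1.0 - freq1
    elif chimp == allele2:
        ancestral, derived, derived_freq = allele2, allele1, freq1
    else:
        return None
    anc = "AT" if ancestral in WEAK else "GC"
    der = "AT" if derived in WEAK else "GC"
    return f"{anc}->{der}", derived_freq


if __name__ == "__main__":
    # allele A at 30% and G at 70% in humans; chimp carries A, so G is the derived allele
    print(polarize("A", 0.3, "G", "A"))
```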

54 Frequency distribution of GC and AT alleles
Spencer (2006): analysis of HAPMAP data (SNPs from 60 unrelated individuals)
The fixation bias in favor of GC increases near recombination hotspots

55 Frequency distribution of GC and AT alleles
[Figure from Spencer (2006): average derived allele frequency for AT->GC, GC->AT, GC->GC and AT->AT alleles]
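One simple way to ask whether AT->GC derived alleles reach higher average frequencies than GC->AT ones is a one-sided permutation test on the difference in means. This is an illustrative choice, not necessarily the test used by Spencer (2006) or Duret et al. (2002); the function name and the toy frequencies are invented.

```python
import random
from statistics import fmean


def permutation_test(freqs_at_to_gc, freqs_gc_to_at, n_perm=10_000, seed=0):
    """One-sided test of mean(AT->GC) > mean(GC->AT) by shuffling class labels.
    Returns (observed difference in means, permutation p-value)."""
    rng = random.Random(seed)
    observed = fmean(freqs_at_to_gc) - fmean(freqs_gc_to_at)
    pooled = list(freqs_at_to_gc) + list(freqs_gc_to_at)
    n1 = len(freqs_at_to_gc)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # small tolerance so that float rounding does not hide exact ties
        if fmean(pooled[:n1]) - fmean(pooled[n1:]) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    at_gc = [0.6, 0.7, 0.8, 0.9, 0.65, 0.75]   # toy derived frequencies
    gc_at = [0.05, 0.1, 0.02, 0.08, 0.04, 0.06]
    diff, p = permutation_test(at_gc, gc_at)
    print(f"difference in mean derived frequency = {diff:.3f}, p = {p:.4f}")
```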
