Evidence of Selection on Genomic GC Content in Bacteria. Falk Hildebrand, Adam Eyre-Walker.


1 Evidence of Selection on Genomic GC Content in Bacteria. Falk Hildebrand, Adam Eyre-Walker

2 Genomic G+C content

3 Genomic GC content

4 Codons: ATA CCC CTA CCT. Sites: non-synonymous, synonymous. 2-fold: TTT, TTC. 4-fold: CCT, CCC, CCA, CCG.

5 Genomic GC content

6 Variation

7 Correlations

8 Explanations: Mutation bias, intrinsic and/or extrinsic (Sueoka 1961; Freese 1962). Selection (many authors). Biased gene conversion (anonymous referees).

9 Correlates: Genome size (positive correlation). Lifestyle (higher GC in free-living species). Aerobiosis (higher GC in aerobes). Nitrogen utilization (higher amongst N fixers). Temperature (higher amongst thermophiles?).

10 Evidence of selection I: Escherichia coli. Mutation pattern: 273 GC → AT versus 131 AT → GC. Predicted GC content = 0.32. Observed GC content = 0.50. Observed GC at neutral sites = 0.58. Lynch (2007) Origins of genome architecture.
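
The predicted value on slide 10 is consistent with a simple mutation-only equilibrium, in which GC content settles where the GC → AT and AT → GC fluxes balance. The sketch below is a minimal illustration, not the calculation from Lynch (2007): it assumes the two mutation counts can be read directly as relative per-site rates (plausible only because E. coli is close to 50% GC), and the function name is hypothetical.

```python
def equilibrium_gc(gc_to_at, at_to_gc):
    """Equilibrium GC fraction under mutation alone.

    Assumption: the arguments are relative per-site mutation rates, so the
    equilibrium is the AT->GC rate divided by the sum of both rates.
    """
    return at_to_gc / (gc_to_at + at_to_gc)


# Counts from the slide, treated as relative rates (an assumption).
predicted = equilibrium_gc(gc_to_at=273, at_to_gc=131)
print(f"predicted GC = {predicted:.2f}")
```

Comparing this figure with the observed GC content on the same slide is the point of the argument: the observed values are well above what mutation alone would give.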

11 Evidence of selection II: Phylogenetic analyses. Mycobacterium leprae (Lynch 2007). Escherichia coli (Balbi et al. 2009). 5 pathogenic bacteria (Hershberg and Petrov 2010).

12 Phylogenetic analysis [figure; sequence labels: GAAGGG]

13 Evidence of selection II: Phylogenetic analyses. Mycobacterium leprae (Lynch 2007). Escherichia coli (Balbi et al. 2009). 5 pathogenic bacteria (Hershberg and Petrov 2010). Excess of GC → AT.

14 Test of mutation bias: If GC content is (1) due to mutation bias alone, (2) stationary, and (3) the infinite sites assumption holds, then # GC → AT mutations = # AT → GC mutations.

15 Why? If GC content is stationary, #GC → AT subs = #AT → GC subs. All neutral mutations have the same chance of fixation, so #GC → AT muts = #AT → GC muts.

16 Identifying mutations
Strain 1  ACT GCT TTG GCT TTA TGG
Strain 2  ACT GCT TTG GCT TTA TGA
Strain 3  ACT GCT TTG GCT TTA TGG
Strain 4  ACT GCT TTC GCT TTA TGA
Strain 5  ACC GCT TTC GCT TTA TGG
Strain 6  ACT GCT TTG GCT TTA TGG
Polymorphic sites: T/C, C/G, G/A

17 Orienting mutations
Outgroup  ACT GCT TTC GCT TTA TGG
Strain 1  ACT GCT TTG GCT TTA TGG
Strain 2  ACT GCT TTG GCT TTA TGA
Strain 3  ACT GCT TTG GCT TTA TGG
Strain 4  ACT GCT TTC GCT TTA TGA
Strain 5  ACC GCT TTC GCT TTA TGG
Strain 6  ACT GCT TTG GCT TTA TGG
Changes: T → C, C → G, G → A
GC → AT = 1, AT → GC = 1
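
The orientation step on slide 17 can be sketched in a few lines. This is an illustrative reading of the slide, not the authors' pipeline: it assumes the outgroup base is the ancestral state, considers only sites with exactly two alleles among the strains, and ignores changes that do not alter GC content (such as C ↔ G). The function name is hypothetical.

```python
GC, AT = set("GC"), set("AT")


def count_oriented_changes(outgroup, strains):
    """Count GC->AT and AT->GC changes, taking the outgroup base as ancestral."""
    gc_to_at = at_to_gc = 0
    for i, ancestral in enumerate(outgroup):
        alleles = {s[i] for s in strains}
        derived = alleles - {ancestral}
        # Keep only biallelic sites where the outgroup state is one of the alleles.
        if len(alleles) != 2 or len(derived) != 1:
            continue
        d = derived.pop()
        if ancestral in GC and d in AT:
            gc_to_at += 1
        elif ancestral in AT and d in GC:
            at_to_gc += 1
        # C<->G and A<->T changes leave GC content unchanged and are not counted.
    return gc_to_at, at_to_gc


# Sequences copied from the slide, with codon spacing removed.
outgroup = "ACT GCT TTC GCT TTA TGG".replace(" ", "")
strains = [s.replace(" ", "") for s in (
    "ACT GCT TTG GCT TTA TGG",
    "ACT GCT TTG GCT TTA TGA",
    "ACT GCT TTG GCT TTA TGG",
    "ACT GCT TTC GCT TTA TGA",
    "ACC GCT TTC GCT TTA TGG",
    "ACT GCT TTG GCT TTA TGG",
)]

print(count_oriented_changes(outgroup, strains))  # (1, 1), matching the slide
```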

18 Orienting mutations
Strain 1  ACT GCT TTG GCT TTA TGG
Strain 2  ACT GCT TTG GCT TTA TGA
Strain 3  ACT GCT TTG GCT TTA TGG
Strain 4  ACT GCT TTC GCT TTA TGA
Strain 5  ACC GCT TTC GCT TTA TGG
Strain 6  ACT GCT TTG GCT TTA TGG
Changes: T → C, G → C, G → A
GC → AT = 1, AT → GC = 1

19 Test of mutation bias: If GC content is (1) due to mutation bias alone, (2) stationary, and (3) the infinite sites assumption holds, then # GC → AT = # AT → GC.

20 Four-fold synonymous sites

21 Codons: ATA CCC CTA CCT. Sites: non-synonymous, synonymous. 2-fold: TTT, TTC. 4-fold: CCT, CCC, CCA, CCG.
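
To make the 2-fold versus 4-fold distinction concrete, the sketch below classifies codons by third-position degeneracy and computes GC4, the GC content at four-fold degenerate third positions. It is an illustration under stated assumptions: it uses the standard genetic code (whose amino acid assignments match the bacterial code), and the function names are hypothetical rather than taken from the talk.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def third_position_fold(codon):
    """Number of third-position bases that leave the amino acid unchanged."""
    return sum(CODE[codon[:2] + b] == CODE[codon] for b in BASES)


def gc4(cds):
    """GC fraction at third positions of four-fold degenerate codons."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    thirds = [c[2] for c in codons if third_position_fold(c) == 4]
    return sum(b in "GC" for b in thirds) / len(thirds) if thirds else None


print(third_position_fold("TTT"), third_position_fold("CCT"))  # 2 4
print(gc4("ATACCCCTACCT"))  # codons from the slide: ATA CCC CTA CCT
```

In the slide's example sequence, ATA is three-fold degenerate (isoleucine) and is excluded, so GC4 is computed from the third positions of CCC, CTA and CCT only.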

22 Data: Popset, keyword “bacteria”. 8 or more sequences from the same species. 149 bacterial species (8 phyla, 15 classes and 77 genera). 1 or more genes. 10 or more synonymous polymorphisms. 4-fold diversity < 0.1.

23 Overall result
No. of SNPs: GC → AT 11045; AT → GC 8309; P < 0.0001
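
The overall counts can be checked with a quick calculation. The slide does not say which test gave the P-value, so the sketch below is only one plausible reading: it takes Z to be the proportion of GC → AT changes among all GC ↔ AT changes and tests it against 0.5 with a normal approximation to the binomial. The function names are hypothetical.

```python
import math


def z_statistic(gc_to_at, at_to_gc):
    """Proportion of GC->AT changes among all GC<->AT changes (assumed definition of Z)."""
    return gc_to_at / (gc_to_at + at_to_gc)


def two_sided_p_normal(k, n):
    """Two-sided P-value for k successes in n trials against p = 0.5 (normal approximation)."""
    z = (k - n / 2) / math.sqrt(n / 4)
    return math.erfc(abs(z) / math.sqrt(2))


gc_to_at, at_to_gc = 11045, 8309  # SNP counts from the slide
Z = z_statistic(gc_to_at, at_to_gc)
p = two_sided_p_normal(gc_to_at, gc_to_at + at_to_gc)
print(f"Z = {Z:.3f}, P = {p:.2e}")
```

With almost twenty thousand SNPs the normal approximation is adequate; an exact binomial test would be preferable for the much smaller per-species counts.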

24 Bias versus GC4
Z = #(GC → AT) / (#(GC → AT) + #(AT → GC))
          No. species   Z > 0.5   P-value
GC-rich   82            69        <0.0001
GC-poor   67            25        0.050
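
The species-level comparison on slide 24 has the shape of a sign test: under the null hypothesis, a species is equally likely to have Z above or below 0.5. The slide does not name the test, so the following is a sketch of an exact two-sided sign test under that assumption, with a hypothetical function name.

```python
from math import comb


def sign_test_p(k, n):
    """Exact two-sided binomial P-value for k of n species against p = 0.5."""
    tail = min(k, n - k)
    # The null distribution is symmetric, so double the smaller tail.
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, p)


print(f"GC-rich: P = {sign_test_p(69, 82):.2e}")  # 69 of 82 species with Z > 0.5
print(f"GC-poor: P = {sign_test_p(25, 67):.3f}")  # 25 of 67 species with Z > 0.5
```

Note the direction of the two results: GC-rich species mostly have Z above 0.5, whereas the GC-poor species lean the other way.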

25 Phylogenetic distribution
Phylum          Class                 No. of species   GC4 range     Mean Z (GC4<0.34)   Mean Z (GC4>0.34)
Actinobacteria  -                     3                0.64-0.93     no species          0.64
Bacteroidetes   -                     3                0.12-0.46     0.43                0.36
Chlamydiae+     Chlamydiae            2                0.21-0.30     0.45                no species
Cyanobacteria   Chroococcales         2                0.38-0.51     no species          0.53
Cyanobacteria   Nostocales            3                0.26-0.31     0.45                no species
Cyanobacteria   Oscillatoriales       2                0.41          no species          0.38
Cyanobacteria   Stigonemales          1                0.40          no species          0.59
Firmicutes      Bacilli               27               0.085-0.68    0.44                0.58
Firmicutes      Clostridia            5                0.050-0.28    0.34                no species
Proteobacteria  Alphaproteobacteria   16               0.099-0.94    0.43                0.65
Proteobacteria  Betaproteobacteria    6                0.66-0.96     no species          0.67
Proteobacteria  delta/epsilon         6                0.15-0.99     0.49                0.78
Proteobacteria  Gammaproteobacteria   62               0.095-0.95    0.50                0.66
Spirochaetes    -                     7                0.12-0.60     0.45                0.54
Tenericutes     Mollicutes            4                0.023-0.24    0.33                no species

26 Potential problems Infinite sites assumption Sequencing error

27 Infinite sites assumption Each mutation occurs at a site which is not polymorphic

28 Infinite sites assumption: If GC content is stationary, #GC → AT subs = #AT → GC subs. All neutral mutations have the same chance of fixation, so #GC → AT muts = #AT → GC muts.

29 Finite sites assumption: If GC content is stationary, #GC → AT subs = #AT → GC subs. All neutral mutations have the same chance of fixation, so #GC → AT muts = #AT → GC muts. But some mutations are not evident as polymorphisms.

30 Finite sites: A GC-rich sequence implies rate of AT → GC > rate of GC → AT. Mutation rate low: #AT → GC poly = #GC → AT poly. Mutation rate high: #AT → GC poly < #GC → AT poly.

31 Finite sites theory: GC ↔ AT, with mutation rates uμ and vμ. Assume: stationary population, stationary GC.

32 Finite sites theory

33 [Figure; labels: 0.6, 0.7, 0.8, 0.9, 0.95]

34 Predicting Z: Assume finite sites neutrality. Use GC4 to get f. Use observed diversity to estimate μ. Predict Z.

35 Z pred

36 Z-Z pred
          No. of species   Z-Z pred > 0   P-value
GC-rich   82               61             <0.0001
GC-poor   67               38             0.33

37 Mutation rate variation

38 Z-Z pred (exponential rates)
          No. of species   Z-Z pred > 0   P-value
GC-rich   82               56             0.0012
GC-poor   67               46             0.003

39 Sequencing error
          No. of species   Z > 0.5   P-value
GC-rich   82               60        <0.0001

40 Explanations: Non-stationary base composition. Selection for translational efficiency. Biased gene conversion. Selection upon base composition.

41 Explanations: Non-stationary base composition. Selection for translational efficiency. Biased gene conversion. Selection upon base composition.

42 Non-stationary GC content

43 Non-stationary base composition

44 Explanations: Non-stationary base composition. Selection for translational efficiency. Biased gene conversion. Selection upon base composition.

45 Selection on codon usage
Amino acid      Codon   High usage   Low usage
Phenylalanine   UUU     0.22         0.71
                UUC     0.78         0.29
Valine          GUU     0.46         0.36
                GUC     0.09         0.19
                GUA     0.24         0.23
                GUG     0.21         0.23

46 Translational efficiency
          No. of species   Z > 0.5   P-value
GC-rich   31               29        <0.0001

47 Explanations: Non-stationary base composition. Selection for translational efficiency. Biased gene conversion. Selection upon base composition.

48 Biased gene conversion [diagram: AT, CG → AG, CT → CG, CG]

49 Four gamete test: Haplotypes GA, GT, CA: no recombination. Haplotypes GA, GT, CA, CT: recombination.
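
The four gamete test can be stated in a few lines of code: for two biallelic sites, seeing all four allele combinations cannot be explained by single mutations alone under the infinite sites assumption, so it is taken as evidence of recombination. The sketch below is a minimal illustration with a hypothetical function name and two-site example haplotypes.

```python
def four_gamete_test(haplotypes, i, j):
    """Return True if all four allele combinations occur at biallelic sites i and j."""
    gametes = {(h[i], h[j]) for h in haplotypes}
    return len(gametes) == 4


# Three distinct gametes: compatible with no recombination.
print(four_gamete_test(["GA", "GT", "CA"], 0, 1))        # False
# All four gametes present: recombination inferred.
print(four_gamete_test(["GA", "GT", "CA", "CT"], 0, 1))  # True
```

Applied to every pair of polymorphic sites in an alignment, a test like this can separate genes with detectable recombination from those without.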

50 Biased gene conversion
          No. species   Z > 0.5   P-value
GC-rich   28            19        0.087

              GC → AT   AT → GC   P-value
No. of SNPs   1079      844       <0.0001

51 Biased gene conversion: GC ↔ AT, labelled -w and w. If N_e w >> 1, BGC is effective. If N_e w << 1, BGC is ineffective.

52 Biased gene conversion
           r / m     p-value
GC4        -0.076    0.67
Z          0.003     0.99
Z-Z pred   0.026     0.88
GC4 pred   -0.115    0.52
34 species with estimate of r / m. Vos & Didelot (2009) ISME J.

53 Biased gene conversion
           θ r / m   p-value
GC4        0.039     0.83
Z          0.11      0.55
Z-Z pred   0.18      0.30
GC4 pred   -0.031    0.86

54 Explanations: Non-stationary base composition. Selection for translational efficiency. Biased gene conversion. Selection upon base composition.

55 Selection on GC content: GC ↔ AT, with mutation rates uμ and vμ and selection coefficients +s and -s.

56 Selection on GC content

57 Selection on GC4

58 f = α + β GC4; f = 0.2 + 0.35 GC4

59 Selection on GC4: f = α + β GC4; f = 0.2 + 0.35 GC4

60 Summary: Large excess of GC → AT mutations at 4-fold sites, particularly in GC-rich species. Not due to: infinite sites, sequencing error, translational selection, biased gene conversion. Therefore: selection on GC4.

61 Selection on genomic GC [figure: genomic GC versus GC4]

62 Environmental meta-genomics Foerstner et al. (2005) EMBO Reports

63 Environmental meta-genomics

64 Correlates: Genome size (positive correlation). Lifestyle (higher GC in free-living species). Aerobiosis (higher GC in aerobes). Nitrogen utilization (higher GC amongst N fixers). Temperature (higher amongst thermophiles?).

65 Thanks: Falk Hildebrand, Axel Meyer

66 Further reading: Hildebrand et al. (2010) PLoS Genetics. Hershberg and Petrov (2010) PLoS Genetics. Rocha and Feil (2010) PLoS Genetics.

67 Protein coding sites

