Published by Scot Sutton. Modified over 9 years ago.
1
Evidence of Selection on Genomic GC Content in Bacteria. Falk Hildebrand, Adam Eyre-Walker
2
Genomic G+C content
3
Genomic GC content
4
Codons. Non-synonymous difference: ATA vs CTA. Synonymous difference: CCC vs CCT. Synonymous sites: 2-fold (TTT, TTC); 4-fold (CCT, CCC, CCA, CCG).
5
Genomic GC content
6
Variation
7
Correlations
8
Explanations: mutation bias, intrinsic and/or extrinsic (Sueoka 1961; Freese 1962); selection (many authors); biased gene conversion (anonymous referees).
9
Correlates: genome size (positive correlation); lifestyle (higher GC in free-living species); aerobiosis (higher GC in aerobes); nitrogen utilization (higher GC amongst N fixers); temperature (higher amongst thermophiles?).
10
Evidence of selection I: Escherichia coli. Mutation pattern: 273 GC→AT versus 131 AT→GC. Predicted GC content = 0.32; observed GC content = 0.50; observed GC at neutral sites = 0.58. Lynch (2007) The Origins of Genome Architecture.
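The predicted value on this slide can be reproduced with a short calculation. This is a minimal sketch, not the calculation from Lynch (2007): it assumes the two mutation counts arose in a genome whose GC content was about 0.5, so that each count can be scaled to a per-site rate. The function name `equilibrium_gc` and the `current_gc` parameter are illustrative.

```python
def equilibrium_gc(n_gc_to_at, n_at_to_gc, current_gc=0.5):
    """Equilibrium GC content expected under mutation pressure alone.

    Assumption: counts are scaled by the fraction of sites at which each
    type of mutation could have occurred (current_gc for GC->AT).
    """
    u = n_gc_to_at / current_gc          # GC -> AT rate, arbitrary units
    v = n_at_to_gc / (1.0 - current_gc)  # AT -> GC rate, arbitrary units
    return v / (u + v)                   # two-state equilibrium

print(round(equilibrium_gc(273, 131), 2))  # 0.32
```

The gap between this mutational expectation (0.32) and the observed values (0.50 genome-wide, 0.58 at neutral sites) is the point of the slide.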
11
Evidence of selection II: phylogenetic analyses of Mycobacterium leprae (Lynch 2007), Escherichia coli (Balbi et al. 2009) and 5 pathogenic bacteria (Hershberg and Petrov 2010).
12
Phylogenetic analysis [Figure: tree with nucleotide states G, A, A, G, G, G]
13
Evidence of selection II: phylogenetic analyses of Mycobacterium leprae (Lynch 2007), Escherichia coli (Balbi et al. 2009) and 5 pathogenic bacteria (Hershberg and Petrov 2010). Result: excess of GC→AT.
14
Test of mutation bias: if GC content is due to mutation bias alone and is stationary, and the infinite sites assumption holds, then # GC→AT mutations = # AT→GC mutations.
15
Why? If GC content is stationary, #GC→AT subs = #AT→GC subs. All neutral mutations have the same chance of fixation, so #GC→AT muts = #AT→GC muts.
16
Identifying mutations
Strain 1  ACT GCT TTG GCT TTA TGG
Strain 2  ACT GCT TTG GCT TTA TGA
Strain 3  ACT GCT TTG GCT TTA TGG
Strain 4  ACT GCT TTC GCT TTA TGA
Strain 5  ACC GCT TTC GCT TTA TGG
Strain 6  ACT GCT TTG GCT TTA TGG
Polymorphic sites: T/C, C/G, G/A
17
Orienting mutations
Outgroup  ACT GCT TTC GCT TTA TGG
Strain 1  ACT GCT TTG GCT TTA TGG
Strain 2  ACT GCT TTG GCT TTA TGA
Strain 3  ACT GCT TTG GCT TTA TGG
Strain 4  ACT GCT TTC GCT TTA TGA
Strain 5  ACC GCT TTC GCT TTA TGG
Strain 6  ACT GCT TTG GCT TTA TGG
Mutations: T→C, C→G, G→A
GC→AT = 1, AT→GC = 1
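The orientation step on this slide can be sketched in a few lines. The code below is an illustration rather than the authors' pipeline: it takes the outgroup base as ancestral at every biallelic site and skips sites where the outgroup base is not one of the two alleles. The function name `count_oriented` is hypothetical; the sequences are the ones shown on the slide.

```python
GC = set("GC")
AT = set("AT")

def count_oriented(outgroup, strains):
    """Count polymorphisms by direction, treating the outgroup base as ancestral."""
    counts = {"GC->AT": 0, "AT->GC": 0, "other": 0}
    for pos, anc in enumerate(outgroup):
        alleles = {s[pos] for s in strains}
        # Keep biallelic sites whose ancestral state is one of the two alleles.
        if len(alleles) != 2 or anc not in alleles:
            continue
        (der,) = alleles - {anc}
        if anc in GC and der in AT:
            counts["GC->AT"] += 1
        elif anc in AT and der in GC:
            counts["AT->GC"] += 1
        else:
            counts["other"] += 1  # G<->C or A<->T changes do not alter GC content
    return counts

outgroup = "ACTGCTTTCGCTTTATGG"
strains = [
    "ACTGCTTTGGCTTTATGG",  # strain 1
    "ACTGCTTTGGCTTTATGA",  # strain 2
    "ACTGCTTTGGCTTTATGG",  # strain 3
    "ACTGCTTTCGCTTTATGA",  # strain 4
    "ACCGCTTTCGCTTTATGG",  # strain 5
    "ACTGCTTTGGCTTTATGG",  # strain 6
]
print(count_oriented(outgroup, strains))
# {'GC->AT': 1, 'AT->GC': 1, 'other': 1}
```

The C→G change in the third codon falls in the "other" class, which is why the slide reports only one mutation in each GC-changing direction.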
18
Orienting mutations
Strain 1  ACT GCT TTG GCT TTA TGG
Strain 2  ACT GCT TTG GCT TTA TGA
Strain 3  ACT GCT TTG GCT TTA TGG
Strain 4  ACT GCT TTC GCT TTA TGA
Strain 5  ACC GCT TTC GCT TTA TGG
Strain 6  ACT GCT TTG GCT TTA TGG
Mutations: T→C, G→C, G→A
GC→AT = 1, AT→GC = 1
19
Test of mutation bias: if GC content is due to mutation bias alone and is stationary, and the infinite sites assumption holds, then # GC→AT = # AT→GC.
20
Four-fold synonymous sites
21
Codons. Non-synonymous difference: ATA vs CTA. Synonymous difference: CCC vs CCT. Synonymous sites: 2-fold (TTT, TTC); 4-fold (CCT, CCC, CCA, CCG).
22
Data: PopSet, keyword “bacteria”; 8 or more sequences from the same species; 149 bacterial species (8 phyla, 15 classes and 77 genera); 1 or more genes; 10 or more synonymous polymorphisms; 4-fold diversity < 0.1.
23
Overall result
Mutation | No. of SNPs
GC→AT | 11045
AT→GC | 8309
P < 0.0001
24
Bias versus GC4
Z = (GC→AT) / (GC↔AT), the proportion of GC↔AT polymorphisms that are GC→AT
Group | No. of species | Z > 0.5 | P-value
GC-rich | 82 | 69 | <0.0001
GC-poor | 67 | 25 | 0.050
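The Z statistic and the species-level comparison against 0.5 can be illustrated as follows. The slides do not state which test produced the P-values, so the exact two-sided binomial sign test used here is an assumption, and the function names are illustrative. The pooled SNP counts come from the overall result slide; the species counts come from this slide.

```python
from math import comb

def z_statistic(n_gc_to_at, n_at_to_gc):
    """Proportion of GC<->AT polymorphisms that are GC->AT."""
    return n_gc_to_at / (n_gc_to_at + n_at_to_gc)

def sign_test_two_sided(k, n):
    """Exact two-sided binomial test of k successes in n trials against p = 0.5."""
    tail = min(k, n - k)
    # The null distribution is symmetric, so double the smaller tail.
    p_one_tail = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * p_one_tail)

print(round(z_statistic(11045, 8309), 3))  # 0.571
print(sign_test_two_sided(69, 82))  # GC-rich: 69 of 82 species with Z > 0.5
print(sign_test_two_sided(25, 67))  # GC-poor: 25 of 67 species with Z > 0.5
```

Under mutation bias alone with stationary GC content, Z is expected to be 0.5, so the test asks whether species fall above or below 0.5 more often than chance would allow.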
25
Phylogenetic distribution
Phylum | Class | No. of species | GC4 range | Mean Z (GC4<0.34) | Mean Z (GC4>0.34)
Actinobacteria | | 3 | 0.64-0.93 | no species | 0.64
Bacteroidetes | | 3 | 0.12-0.46 | 0.43 | 0.36
Chlamydiae+Chlamydiae | | 2 | 0.21-0.30 | 0.45 | no species
Cyanobacteria | Chroococcales | 2 | 0.38-0.51 | no species | 0.53
Cyanobacteria | Nostocales | 3 | 0.26-0.31 | 0.45 | no species
Cyanobacteria | Oscillatoriales | 2 | 0.41 | no species | 0.38
Cyanobacteria | Stigonemales | 1 | 0.40 | no species | 0.59
Firmicutes | Bacilli | 27 | 0.085-0.68 | 0.44 | 0.58
Firmicutes | Clostridia | 5 | 0.050-0.28 | 0.34 | no species
Proteobacteria | Alphaproteobacteria | 16 | 0.099-0.94 | 0.43 | 0.65
Proteobacteria | Betaproteobacteria | 6 | 0.66-0.96 | no species | 0.67
Proteobacteria | delta/epsilon | 6 | 0.15-0.99 | 0.49 | 0.78
Proteobacteria | Gammaproteobacteria | 62 | 0.095-0.95 | 0.50 | 0.66
Spirochaetes | | 7 | 0.12-0.60 | 0.45 | 0.54
Tenericutes | Mollicutes | 4 | 0.023-0.24 | 0.33 | no species
26
Potential problems: infinite sites assumption; sequencing error.
27
Infinite sites assumption: each mutation occurs at a site that is not already polymorphic.
28
Infinite sites assumption: if GC content is stationary, #GC→AT subs = #AT→GC subs. All neutral mutations have the same chance of fixation, so #GC→AT muts = #AT→GC muts.
29
Finite sites assumption: if GC content is stationary, #GC→AT subs = #AT→GC subs. All neutral mutations have the same chance of fixation, so #GC→AT muts = #AT→GC muts. But some mutations are not evident as polymorphisms.
30
Finite sites: a GC-rich sequence implies rate of AT→GC > rate of GC→AT. Mutation rate low: #AT→GC poly = #GC→AT poly. Mutation rate high: #AT→GC poly < #GC→AT poly.
31
Finite sites theory: two-state model GC ⇌ AT with mutation rates uμ and vμ. Assume: stationary population, stationary GC content.
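The stationarity condition for this two-state model can be written out explicitly. The derivation below is a sketch under two assumptions that the transcript does not confirm: that uμ is the per-site GC→AT rate and vμ the per-site AT→GC rate, and that f denotes the stationary GC content under mutation alone.

```latex
\frac{df}{dt} = -u\mu f + v\mu\,(1 - f) = 0
\quad\Longrightarrow\quad
f = \frac{v}{u + v},
\qquad
\underbrace{u\mu f}_{\text{GC}\to\text{AT flux}}
  = \underbrace{v\mu\,(1 - f)}_{\text{AT}\to\text{GC flux}} .
```

With these definitions a GC-rich stationary sequence (f > 0.5) requires v > u, which matches the statement that a GC-rich sequence implies a higher AT→GC rate.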
32
Finite sites theory
33
[Figure: curves labelled 0.6, 0.7, 0.8, 0.9 and 0.95]
34
Predicting Z: assume finite sites neutrality; use GC4 to get f; use observed diversity to estimate μ; predict Z.
35
Z_pred
36
Z - Z_pred
Group | No. of species | Z - Z_pred > 0 | P-value
GC-rich | 82 | 61 | <0.0001
GC-poor | 67 | 38 | 0.33
37
Mutation rate variation
38
Z - Z_pred (exponential rates)
Group | No. of species | Z - Z_pred > 0 | P-value
GC-rich | 82 | 56 | 0.0012
GC-poor | 67 | 46 | 0.003
39
Sequencing error
Group | No. of species | Z > 0.5 | P-value
GC-rich | 82 | 60 | <0.0001
40
Explanations: non-stationary base composition; selection for translational efficiency; biased gene conversion; selection upon base composition.
42
Non-stationary GC content
43
Non-stationary base composition
44
Explanations: non-stationary base composition; selection for translational efficiency; biased gene conversion; selection upon base composition.
45
Selection on codon usage
Amino acid | Codon | High usage | Low usage
Phenylalanine | UUU | 0.22 | 0.71
Phenylalanine | UUC | 0.78 | 0.29
Valine | GUU | 0.46 | 0.36
Valine | GUC | 0.09 | 0.19
Valine | GUA | 0.24 | 0.23
Valine | GUG | 0.21 | 0.23
46
Translational efficiency
Group | No. of species | Z > 0.5 | P-value
GC-rich | 31 | 29 | <0.0001
47
Explanations: non-stationary base composition; selection for translational efficiency; biased gene conversion; selection upon base composition.
48
Biased gene conversion [Figure: A:T and C:G alleles form A:G and C:T mismatches, which are both repaired to C:G]
49
Four gamete test. No recombination: haplotypes GA, GT, CA. Recombination: haplotypes GA, GT, CA, CT.
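The four gamete test shown here can be sketched as a scan over pairs of sites. This is a minimal illustration, assuming haplotypes are given as equal-length strings of segregating sites; the function name `four_gamete_violations` is hypothetical.

```python
from itertools import combinations

def four_gamete_violations(haplotypes):
    """Return pairs of sites at which all four two-site gametes are present."""
    n_sites = len(haplotypes[0])
    violations = []
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        alleles_i = {g[0] for g in gametes}
        alleles_j = {g[1] for g in gametes}
        # Only pairs of biallelic sites are informative. Seeing all four
        # gametes implies recombination (or a repeat mutation) between them.
        if len(alleles_i) == 2 and len(alleles_j) == 2 and len(gametes) == 4:
            violations.append((i, j))
    return violations

print(four_gamete_violations(["GA", "GT", "CA"]))        # []
print(four_gamete_violations(["GA", "GT", "CA", "CT"]))  # [(0, 1)]
```

A species with no violating pair shows no evidence of recombination under this test.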
50
Biased gene conversion
Group | No. of species | Z > 0.5 | P-value
GC-rich | 28 | 19 | 0.087
 | GC→AT | AT→GC | P-value
No. of SNPs | 1079 | 844 | <0.0001
51
Biased gene conversion: GC ⇌ AT with bias terms -w and w. If N_e w >> 1, BGC is effective; if N_e w << 1, BGC is ineffective.
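Why the product of effective population size and conversion bias matters can be illustrated with the standard diffusion approximation for fixation probability. This formula is not given in the slides; it is brought in here as an assumption. S stands for a scaled bias proportional to N_e w, with the exact constant depending on ploidy and on how w is defined.

```python
from math import expm1

def relative_fixation(S):
    """Fixation probability of a new allele relative to a neutral one.

    Assumption: diffusion approximation S / (1 - exp(-S)), where S is the
    scaled advantage (proportional to N_e * w).
    """
    if abs(S) < 1e-12:
        return 1.0  # neutral limit
    return S / (-expm1(-S))  # -expm1(-S) equals 1 - exp(-S)

for S in (0.01, -0.01, 10.0, -10.0):
    print(S, relative_fixation(S))
```

For |S| much smaller than 1 the ratio stays close to 1, so favoured and disfavoured alleles fix almost like neutral ones. For |S| much larger than 1 the favoured allele fixes far more often than the disfavoured one, which is the regime in which a bias of this kind can shift base composition.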
52
Biased gene conversion
Variable | Correlation with r/m | P-value
GC4 | -0.076 | 0.67
Z | 0.003 | 0.99
Z - Z_pred | 0.026 | 0.88
GC4_pred | -0.115 | 0.52
34 species with an estimate of r/m: Vos & Didelot (2009) ISME J.
53
Biased gene conversion
Variable | Correlation with θ r/m | P-value
GC4 | 0.039 | 0.83
Z | 0.11 | 0.55
Z - Z_pred | 0.18 | 0.30
GC4_pred | -0.031 | 0.86
54
Explanations: non-stationary base composition; selection for translational efficiency; biased gene conversion; selection upon base composition.
55
Selection on GC content: two-state model GC ⇌ AT with mutation rates uμ and vμ and selection coefficients +s and -s.
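One way to see how selection shifts GC content in a two-state model like this is the classic mutation-selection-drift equilibrium, in which the odds of the favoured state are multiplied by exp(S). This is a sketch under stated assumptions, not the model fitted in the talk: u is taken as the GC→AT rate, v as the AT→GC rate, and S as a scaled selection coefficient favouring GC. The function name is illustrative.

```python
from math import exp

def gc_equilibrium(u, v, S=0.0):
    """Equilibrium GC content with mutation rates u, v and scaled selection S.

    Assumption: at equilibrium the GC:AT odds equal (v / u) * exp(S).
    S = 0 recovers the mutation-only equilibrium v / (u + v).
    """
    odds = (v / u) * exp(S)
    return odds / (1.0 + odds)

# Mutation-only expectation for an AT-biased mutation pattern (u > v)
print(round(gc_equilibrium(273, 131), 2))  # 0.32
# Selection favouring GC raises the equilibrium above that value
print(gc_equilibrium(273, 131, S=1.0))
```

In this framework, an excess of GC→AT polymorphisms in a GC-rich genome is what one expects when mutation pushes towards AT and selection pushes back towards GC.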
56
Selection on GC content
57
Selection on GC4
58
f = α + β GC4; f = 0.2 + 0.35 GC4
59
Selection on GC4: f = α + β GC4; f = 0.2 + 0.35 GC4
60
Summary: large excess of GC→AT mutations at 4-fold sites, particularly in GC-rich species. Not due to: infinite sites; sequencing error; translational selection; biased gene conversion. Therefore: selection on GC4.
61
Selection on genomic GC [Figure: genomic GC and GC4]
62
Environmental meta-genomics Foerstner et al. (2005) EMBO Reports
63
Environmental meta-genomics
64
Correlates: genome size (positive correlation); lifestyle (higher GC in free-living species); aerobiosis (higher GC in aerobes); nitrogen utilization (higher GC amongst N fixers); temperature (higher amongst thermophiles?).
65
Thanks: Falk Hildebrand, Axel Meyer
66
Further reading: Hildebrand et al. (2010) PLoS Genetics; Hershberg and Petrov (2010) PLoS Genetics; Rocha and Feil (2010) PLoS Genetics.
67
Protein coding sites