Evidence of Selection on Genomic GC Content in Bacteria
Falk Hildebrand, Adam Eyre-Walker

Genomic G+C content

Genomic GC content

Codons: ATA → CTA is non-synonymous; CCC → CCT is synonymous. 2-fold degenerate: TTT, TTC. 4-fold degenerate: CCT, CCC, CCA, CCG.
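
The codon distinctions on this slide can be computed directly from the genetic code. The sketch below is minimal and makes two assumptions: the standard genetic code (NCBI translation table 1) and codons written in the DNA alphabet. The names `classify_change` and `third_position_degeneracy` are our own, not the authors'.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_change(codon_a, codon_b):
    """Label a codon-to-codon change as synonymous or non-synonymous."""
    return "synonymous" if CODE[codon_a] == CODE[codon_b] else "non-synonymous"


def third_position_degeneracy(codon):
    """Count third-position bases (including the current one) that keep the amino acid."""
    aa = CODE[codon]
    return sum(CODE[codon[:2] + base] == aa for base in BASES)


print(classify_change("ATA", "CTA"))   # non-synonymous (Ile -> Leu)
print(classify_change("CCC", "CCT"))   # synonymous (Pro -> Pro)
print(third_position_degeneracy("TTT"), third_position_degeneracy("CCT"))  # 2 4
```

ATA → CTA swaps isoleucine for leucine. Every third-position change in CCN keeps proline, which is what makes CCN a 4-fold family.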

Genomic GC content

Variation

Correlations

Explanations: mutation bias, intrinsic and/or extrinsic (Sueoka 1961; Freese 1962); selection (many authors); biased gene conversion (anonymous referees).

Correlates: genome size (positive correlation); lifestyle (higher GC in free-living species); aerobiosis (higher GC in aerobes); nitrogen utilization (higher GC amongst N fixers); temperature (higher GC amongst thermophiles?).

Evidence of selection I: Escherichia coli. Mutation pattern: 273 GC → AT versus 131 AT → GC. Predicted GC content = 0.32; observed GC content = 0.50; observed GC at neutral sites = 0.58. Lynch (2007) The Origins of Genome Architecture.
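
One way to reproduce the predicted 0.32 has two steps. First, convert the mutation counts into per-site rates. Then solve for the GC content at which the GC → AT and AT → GC fluxes balance. This sketch assumes the counts were collected on a background whose GC content equals the observed 0.50. The function name is hypothetical.

```python
def equilibrium_gc(n_gc_to_at, n_at_to_gc, background_gc):
    """GC content expected from mutation alone, given mutation counts.

    Counts are turned into per-site rates (up to a common constant) by dividing
    by the fraction of GC sites (for GC->AT) or AT sites (for AT->GC).
    """
    rate_gc_to_at = n_gc_to_at / background_gc
    rate_at_to_gc = n_at_to_gc / (1.0 - background_gc)
    # Equilibrium: f * rate_gc_to_at == (1 - f) * rate_at_to_gc
    return rate_at_to_gc / (rate_at_to_gc + rate_gc_to_at)


print(round(equilibrium_gc(273, 131, 0.50), 2))  # 0.32
```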

Evidence of selection II: phylogenetic analyses. Mycobacterium leprae (Lynch 2007); Escherichia coli (Balbi et al. 2009); 5 pathogenic bacteria (Hershberg and Petrov 2010).

Phylogenetic analysis [figure: tree with tip states G, A, A, G, G, G]

Evidence of selection II: phylogenetic analyses. Mycobacterium leprae (Lynch 2007); Escherichia coli (Balbi et al. 2009); 5 pathogenic bacteria (Hershberg and Petrov 2010). Result: an excess of GC → AT.

Test of mutation bias: if GC content is due to mutation bias alone and is stationary, and the infinite sites assumption holds, then #GC → AT mutations = #AT → GC mutations.

Why? If GC content is stationary, #GC → AT substitutions = #AT → GC substitutions. All neutral mutations have the same chance of fixation, so #GC → AT mutations = #AT → GC mutations.

Identifying mutations
Strain 1: ACT GCT TTG GCT TTA TGG
Strain 2: ACT GCT TTG GCT TTA TGA
Strain 3: ACT GCT TTG GCT TTA TGG
Strain 4: ACT GCT TTC GCT TTA TGA
Strain 5: ACC GCT TTC GCT TTA TGG
Strain 6: ACT GCT TTG GCT TTA TGG
Polymorphic sites: T/C, C/G, G/A

Orienting mutations
Outgroup: ACT GCT TTC GCT TTA TGG
Strain 1: ACT GCT TTG GCT TTA TGG
Strain 2: ACT GCT TTG GCT TTA TGA
Strain 3: ACT GCT TTG GCT TTA TGG
Strain 4: ACT GCT TTC GCT TTA TGA
Strain 5: ACC GCT TTC GCT TTA TGG
Strain 6: ACT GCT TTG GCT TTA TGG
Oriented changes: T → C, C → G, G → A
GC → AT = 1; AT → GC = 1
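
The counting on this slide can be sketched in Python:

1. Take each segregating column.
2. Treat the outgroup base as ancestral.
3. Classify the change.

The sketch uses the toy alignment above with the codon spaces removed. It assumes a single outgroup and biallelic sites. Sites where the outgroup base is absent from the strains are skipped. All names are ours.

```python
from collections import Counter

OUTGROUP = "ACTGCTTTCGCTTTATGG"
STRAINS = [
    "ACTGCTTTGGCTTTATGG",  # strain 1
    "ACTGCTTTGGCTTTATGA",  # strain 2
    "ACTGCTTTGGCTTTATGG",  # strain 3
    "ACTGCTTTCGCTTTATGA",  # strain 4
    "ACCGCTTTCGCTTTATGG",  # strain 5
    "ACTGCTTTGGCTTTATGG",  # strain 6
]


def change_class(ancestral, derived):
    """Classify a base change as GC->AT, AT->GC, or other (G<->C, A<->T)."""
    if ancestral in "GC" and derived in "AT":
        return "GC->AT"
    if ancestral in "AT" and derived in "GC":
        return "AT->GC"
    return "other"


def count_oriented_changes(outgroup, strains):
    counts = Counter()
    for i, ancestral in enumerate(outgroup):
        alleles = {s[i] for s in strains}
        # Keep biallelic sites whose ancestral (outgroup) base is still segregating
        if len(alleles) != 2 or ancestral not in alleles:
            continue
        (derived,) = alleles - {ancestral}
        counts[change_class(ancestral, derived)] += 1
    return counts


counts = count_oriented_changes(OUTGROUP, STRAINS)
print(counts["GC->AT"], counts["AT->GC"])  # 1 1
```

The C → G change at the third codon is counted as "other" because it does not alter GC content.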

Orienting mutations
Strain 1: ACT GCT TTG GCT TTA TGG
Strain 2: ACT GCT TTG GCT TTA TGA
Strain 3: ACT GCT TTG GCT TTA TGG
Strain 4: ACT GCT TTC GCT TTA TGA
Strain 5: ACC GCT TTC GCT TTA TGG
Strain 6: ACT GCT TTG GCT TTA TGG
Oriented changes: T → C, G → C, G → A
GC → AT = 1; AT → GC = 1
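
Here no outgroup row is shown. One simple rule reproduces the slide's labels (T → C, G → C, G → A): treat the minor allele at each biallelic site as derived. That rule is our assumption, not necessarily the authors' method. The sketch below applies it to the same toy alignment. Ties are skipped because their orientation is ambiguous.

```python
from collections import Counter

STRAINS = [
    "ACTGCTTTGGCTTTATGG",
    "ACTGCTTTGGCTTTATGA",
    "ACTGCTTTGGCTTTATGG",
    "ACTGCTTTCGCTTTATGA",
    "ACCGCTTTCGCTTTATGG",
    "ACTGCTTTGGCTTTATGG",
]


def orient_by_minor_allele(strains):
    """Yield (ancestral, derived) per biallelic column, taking the minor allele as derived."""
    for column in zip(*strains):
        freq = Counter(column)
        if len(freq) != 2:
            continue  # monomorphic or multi-allelic columns are skipped
        (major, n_major), (minor, n_minor) = freq.most_common()
        if n_major == n_minor:
            continue  # equal frequencies: orientation ambiguous
        yield major, minor


def tally(pairs):
    counts = Counter()
    for anc, der in pairs:
        if anc in "GC" and der in "AT":
            counts["GC->AT"] += 1
        elif anc in "AT" and der in "GC":
            counts["AT->GC"] += 1
        else:
            counts["other"] += 1
    return counts


pairs = list(orient_by_minor_allele(STRAINS))
print(pairs)  # [('T', 'C'), ('G', 'C'), ('G', 'A')]
print(tally(pairs))
```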

Test of mutation bias: if GC content is due to mutation bias alone and is stationary, and the infinite sites assumption holds, then #GC → AT = #AT → GC.

Four-fold synonymous sites
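
GC4 is the GC content at third positions of 4-fold degenerate codons. The sketch below is minimal and assumes the standard genetic code and an in-frame coding sequence. Codons with ambiguous bases are skipped, and the function names are ours.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def is_fourfold(codon):
    """True if every base at the third position encodes the same amino acid."""
    return all(CODE[codon[:2] + b] == CODE[codon] for b in BASES)


def gc4(cds):
    """GC content at third positions of 4-fold degenerate codons in an in-frame CDS."""
    gc = total = 0
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3].upper()
        if codon not in CODE or not is_fourfold(codon):
            continue  # skip ambiguous bases and codons that are not 4-fold
        total += 1
        gc += codon[2] in "GC"
    return gc / total if total else float("nan")


print(gc4("CCTCCGGCAATG"))  # CCT, CCG, GCA are 4-fold; ATG is not
```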

Data: PopSet, keyword “bacteria”; 8 or more sequences from the same species; 1 or more genes; 10 or more synonymous polymorphisms; 4-fold diversity < 0.1. In total: 149 bacterial species from 8 phyla, 15 classes and 77 genera.
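
The slides state the diversity threshold only as "4-fold diversity < 0.1". Reading it as nucleotide diversity (π) is our assumption. The sketch below computes π as the mean pairwise proportion of differing sites, and assumes the 4-fold sites have already been extracted into gap-free, equal-length sequences. `passes_filters` is a hypothetical helper that mirrors the listed thresholds.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean proportion of differing sites over all pairs of aligned sequences (pi)."""
    n_sites = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * n_sites)


def passes_filters(n_sequences, n_synonymous_polymorphisms, pi_4fold):
    """Hypothetical inclusion check using the thresholds listed on the slide."""
    return n_sequences >= 8 and n_synonymous_polymorphisms >= 10 and pi_4fold < 0.1


print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))  # 4 differences / (3 pairs * 4 sites)
```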

Overall result (no. of SNPs): GC → AT 11045; AT → GC 8309; P < …
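
The counts above can be tested against the 50:50 split expected under mutation bias alone. The slide's P-value is truncated, so the test choice here is ours: a normal approximation to a two-sided binomial test. Z is computed as the fraction of oriented SNPs that are GC → AT, which is our reading of the Z used on the slides.

```python
import math


def gc_at_bias(n_gc_to_at, n_at_to_gc):
    """Return Z (fraction of oriented SNPs that are GC->AT), a normal score, and a two-sided P."""
    n = n_gc_to_at + n_at_to_gc
    z_fraction = n_gc_to_at / n
    # Null: each oriented SNP is GC->AT with probability 0.5
    score = (n_gc_to_at - 0.5 * n) / math.sqrt(0.25 * n)
    p_two_sided = math.erfc(abs(score) / math.sqrt(2.0))
    return z_fraction, score, p_two_sided


z, score, p = gc_at_bias(11045, 8309)
print(f"Z = {z:.3f}, score = {score:.1f}, P = {p:.1e}")
```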

Bias versus GC4: Z = #GC → AT / (#GC → AT + #AT → GC). GC-rich species: 82, of which 69 have Z > 0.5 (P < …). GC-poor species: …
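
The species-level count (how many species have Z > 0.5) can be checked with a sign test, that is, an exact binomial test against p = 0.5. The slides do not name the test behind the P-value, so this choice is an assumption. The sketch needs only `math.comb`, which is available from Python 3.8.

```python
from math import comb


def sign_test(k, n):
    """Exact two-sided binomial test of k 'successes' out of n trials with p = 0.5."""
    k = max(k, n - k)  # fold onto the upper tail
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


print(f"{sign_test(69, 82):.1e}")  # 69 of 82 GC-rich species with Z > 0.5
```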

Phylogenetic distribution
Phylum | Class | No. of species | GC4 range | Mean Z (GC4 < 0.34) | Mean Z (GC4 > 0.34)
Actinobacteria | … | … | … | no species | 0.64
Bacteroidetes | … | … | … | … | …
Chlamydiae+ | Chlamydiae | … | … | … | no species
Cyanobacteria | Chroococcales | … | … | no species | 0.53
Cyanobacteria | Nostocales | … | … | … | no species
Cyanobacteria | Oscillatoriales | 2 | 0.41 … | no species | 0.38
Cyanobacteria | Stigonemales | 1 | 0.40 | no species | 0.59
Firmicutes | Bacilli | … | … | … | …
Firmicutes | Clostridia | … | … | … | no species
Proteobacteria | Alphaproteobacteria | … | … | … | …
Proteobacteria | Betaproteobacteria | … | … | no species | 0.67
Proteobacteria | delta/epsilon | … | … | … | …
Proteobacteria | Gammaproteobacteria | … | … | … | …
Spirochaetes | … | … | … | … | …
Tenericutes | Mollicutes | … | … | … | no species

Potential problems: the infinite sites assumption; sequencing error.

Infinite sites assumption: each mutation occurs at a site that is not already polymorphic.

Infinite sites assumption: if GC content is stationary, #GC → AT substitutions = #AT → GC substitutions. All neutral mutations have the same chance of fixation, so #GC → AT mutations = #AT → GC mutations.

Finite sites assumption: if GC content is stationary, #GC → AT substitutions = #AT → GC substitutions. All neutral mutations have the same chance of fixation, so #GC → AT mutations = #AT → GC mutations. But some mutations are not evident as polymorphisms.

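Finite sites: a GC-rich sequence implies that the rate of AT → GC mutation (per AT site) is greater than the rate of GC → AT mutation (per GC site). If the mutation rate is low, #AT → GC polymorphisms = #GC → AT polymorphisms. If the mutation rate is high, #AT → GC polymorphisms < #GC → AT polymorphisms.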

Finite sites theory: two-state model, GC ⇄ AT, with mutation rates uμ and vμ. Assume a stationary population and stationary GC content.
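
The following is a quick numerical check of the stationarity logic in a two-state model. We write the per-site GC → AT rate as a and the AT → GC rate as b. These labels are ours, and the sketch does not assign the slide's uμ and vμ to a direction. Integrating dG/dt = b(1 − G) − aG shows the GC fraction G converging to b/(a + b). At that point the GC → AT and AT → GC fluxes are equal. The rates and the function name are hypothetical.

```python
def evolve_gc(g0, a, b, dt=0.01, steps=20_000):
    """Forward-Euler integration of dG/dt = b*(1 - G) - a*G for the GC fraction G."""
    g = g0
    for _ in range(steps):
        g += dt * (b * (1.0 - g) - a * g)
    return g


a, b = 0.7, 0.3  # hypothetical per-site rates: GC->AT and AT->GC
g_eq = evolve_gc(0.9, a, b)
print(round(g_eq, 6), round(a * g_eq, 6), round(b * (1 - g_eq), 6))  # 0.3 0.21 0.21
```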

Finite sites theory

Predicting Z: assume finite sites and neutrality; use GC4 to estimate f; use observed diversity to estimate μ; then predict Z.
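
Why finite sites push Z above 0.5 in GC-rich species can be shown with a toy model. This is our simplification, not the authors' derivation. It assumes:

- neutral two-state mutation at each site;
- stationary GC content f;
- a star genealogy of n lineages, each of length t;
- an ancestral base known without error.

Rates are scaled so that GC → AT is 1 − f and AT → GC is f, which makes f the stationary GC content. The function name is hypothetical.

```python
import math


def predicted_z(f, t, n):
    """Toy expected Z = GC->AT / (GC->AT + AT->GC) among polymorphisms.

    Assumptions (ours): two-state neutral mutation, stationary GC content f,
    star genealogy of n lineages of length t, ancestral base known.
    """
    a, b = 1.0 - f, f              # GC->AT and AT->GC rates (a + b = 1)
    x = -math.expm1(-(a + b) * t)  # 1 - exp(-(a + b) * t)
    p_gc_to_at = a * x             # P(lineage starting GC ends AT), using a + b = 1
    p_at_to_gc = b * x             # P(lineage starting AT ends GC)

    def p_polymorphic(p):
        # The site segregates unless no tip or every tip has changed state
        return 1.0 - (1.0 - p) ** n - p ** n

    gc_to_at = f * p_polymorphic(p_gc_to_at)
    at_to_gc = (1.0 - f) * p_polymorphic(p_at_to_gc)
    return gc_to_at / (gc_to_at + at_to_gc)


for t in (1e-4, 0.1, 0.5):
    print(t, round(predicted_z(0.7, t, 8), 3))  # Z rises above 0.5 as t grows
```

At small t the model returns Z ≈ 0.5, the infinite sites limit. At larger t it departs from 0.5 in the direction of GC → AT when f > 0.5. The sketch does not map observed diversity onto t, because that step depends on the genealogy assumed.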

Z_pred

Z - Z_pred: GC-rich species: 82, of which 61 have Z - Z_pred > 0 (P < …). GC-poor species: …

Mutation rate variation

Z - Z_pred (exponential rates): GC-rich species: …; GC-poor species: …

Sequencing error: GC-rich species: 82, of which 60 have Z > 0.5 (P < 0.0001).

Explanations: non-stationary base composition; selection for translational efficiency; biased gene conversion; selection upon base composition.

Non-stationary GC content

Non-stationary base composition

Explanations: non-stationary base composition; selection for translational efficiency; biased gene conversion; selection upon base composition.

Selection on codon usage
Amino acid | Codon | High usage | Low usage
Phenylalanine | UUU, UUC | … | …
Valine | GUU, GUC, GUA, GUG | … | …

Translational efficiency: GC-rich species: 31, of which 29 have Z > 0.5 (P < 0.0001).

Explanations: non-stationary base composition; selection for translational efficiency; biased gene conversion; selection upon base composition.

Biased gene conversion [figure: an A:T and a C:G allele form A/G and C/T mismatches in heteroduplex DNA; repair yields C:G in both]

Four-gamete test: haplotypes GA, GT, CA: no recombination. Haplotypes GA, GT, CA, CT: recombination.
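
The four-gamete test flags pairs of biallelic sites at which all four two-site haplotypes occur. Under the infinite sites model, that pattern cannot arise on a single tree without recombination. The sketch below is minimal, uses the slide's two-site haplotypes, and the function name is ours.

```python
from itertools import combinations


def four_gamete_pairs(haplotypes):
    """Return (i, j) pairs of biallelic sites at which all four gametes occur."""
    n_sites = len(haplotypes[0])
    biallelic = [i for i in range(n_sites) if len({h[i] for h in haplotypes}) == 2]
    hits = []
    for i, j in combinations(biallelic, 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            hits.append((i, j))  # under infinite sites, this pair needs recombination
    return hits


print(four_gamete_pairs(["GA", "GT", "CA"]))        # []
print(four_gamete_pairs(["GA", "GT", "CA", "CT"]))  # [(0, 1)]
```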

Biased gene conversion: by species (no. of species, Z > 0.5, P-value): GC-rich: …. By SNP: GC → AT: …; AT → GC: …; P < 0.0001.

Biased gene conversion: two-state model, GC ⇄ AT, with conversion bias -w and w. If N_e w >> 1, BGC is effective; if N_e w << 1, BGC is ineffective.
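
Biased gene conversion can be treated like selection with coefficient w, which shows why N_e w is the quantity that matters. The sketch uses Kimura's diffusion approximation for the fixation probability of a single new mutant. Haploidy and N_e equal to census size are our assumptions, and the function names are hypothetical. The ratio of fixation probabilities for favoured and disfavoured mutants is close to 1 when N_e w << 1. It grows roughly as exp(2 N_e w) once N_e w >> 1.

```python
import math


def fixation_probability(s, n):
    """Kimura's diffusion approximation for one new mutant in a haploid
    Wright-Fisher population of size n (assuming N_e = n)."""
    if s == 0:
        return 1.0 / n
    return math.expm1(-2.0 * s) / math.expm1(-2.0 * n * s)


def bias_ratio(w, n):
    """Fixation probability of a favoured (+w) mutant relative to a disfavoured (-w) one."""
    return fixation_probability(w, n) / fixation_probability(-w, n)


for w in (1e-7, 1e-5, 1e-3):
    print(f"N_e w = {10_000 * w:g}: ratio = {bias_ratio(w, 10_000):.3g}")
```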

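Biased gene conversion: correlations of r/m (the ratio of nucleotide changes due to recombination to those due to mutation) with GC, Z, Z - Z_pred and predicted GC4, with p-values, for species with an estimate of r/m from Vos & Didelot (2009) ISME J.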

Biased gene conversion: θ × r/m, with p-values, against GC, Z, Z - Z_pred and predicted GC4.

Explanations: non-stationary base composition; selection for translational efficiency; biased gene conversion; selection upon base composition.

Selection on GC content: two-state model, GC ⇄ AT, with mutation rates uμ and vμ and selection coefficients +s and -s.
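
The following is a sketch of stationary GC content when selection favours GC. It assumes weak mutation: sites fix one change at a time, and the substitution rate equals the mutation rate times the fixation probability. The ratio of fixation probabilities for favoured and disfavoured mutants is taken as approximately exp(S), with S = 2 N_e s for a haploid population. The direction of selection, the scaling of S and the function name are all our assumptions. The example plugs in the E. coli mutation counts quoted earlier as relative per-site rates.

```python
import math


def equilibrium_gc_with_selection(rate_gc_to_at, rate_at_to_gc, big_s):
    """Stationary GC content under weak mutation, with GC favoured by big_s = 2*N_e*s.

    Substitution rate = mutation rate x fixation probability; the ratio of fixation
    probabilities (favoured / disfavoured) is approximated by exp(big_s).
    """
    up = rate_at_to_gc * math.exp(big_s / 2)     # AT->GC substitutions (favoured)
    down = rate_gc_to_at * math.exp(-big_s / 2)  # GC->AT substitutions (disfavoured)
    return up / (up + down)


print(round(equilibrium_gc_with_selection(273, 131, 0.0), 3))  # mutation alone
print(round(equilibrium_gc_with_selection(273, 131, math.log(273 / 131)), 3))  # S balancing the bias
```

In this toy calculation, a scaled selection strength S = ln(273/131) ≈ 0.73 is enough to offset the mutational bias and give GC = 0.5.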

Selection on GC content

Selection on GC4

f = α + β GC4 f = GC4

Selection on GC4 f = α + β GC4 f = GC4

Summary: a large excess of GC → AT mutations at 4-fold sites, particularly in GC-rich species. This is not due to the infinite sites assumption, sequencing error, translational selection or biased gene conversion. Therefore: selection on GC4.

Selection on genomic GC [figure: genomic GC against GC4]

Environmental meta-genomics: Foerstner et al. (2005) EMBO Reports

Environmental meta-genomics

Correlates: genome size (positive correlation); lifestyle (higher GC in free-living species); aerobiosis (higher GC in aerobes); nitrogen utilization (higher GC amongst N fixers); temperature (higher GC amongst thermophiles?).

Thanks: Falk Hildebrand, Axel Meyer

Further reading: Hildebrand et al. (2010) PLoS Genetics; Hershberg and Petrov (2010) PLoS Genetics; Rocha and Feil (2010) PLoS Genetics.

Protein coding sites