A genome-wide perspective on translation of proteins. Regulatory Genomics, Dec 2012. Lecturer: Prof. Yitzhak Pilpel
Teaching assistant: Idan Frumkin. Submit Sunday at midnight.
The Central Dogma of Molecular Biology: expressing the genome, DNA → mRNA → Protein (figure labels: inactive DNA, RNA)
The Lac Operon (Jacob and Monod): in the presence of lactose
The basic logic of metabolic control:
Catabolism (breakdown of molecules, e.g. lactose): the gene is ON when the substrate is present and OFF when it is absent.
Anabolism (synthesis of molecules, e.g. amino acids): the gene is ON when the product is absent and OFF when it is present.
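The two rules above form a small truth table. The sketch below is minimal and illustrative. The function `gene_is_on` and the pathway labels `"catabolic"` and `"anabolic"` are hypothetical names.

```python
def gene_is_on(pathway: str, molecule_present: bool) -> bool:
    """Hypothetical helper: is a metabolic gene expressed?

    pathway: "catabolic" (e.g. lactose breakdown) or "anabolic"
             (e.g. amino acid synthesis).
    molecule_present: whether the relevant molecule is present (the
             substrate for a catabolic pathway, the end product for an
             anabolic one).
    """
    if pathway == "catabolic":
        # Worth breaking the molecule down only when it is available.
        return molecule_present
    if pathway == "anabolic":
        # Worth synthesizing the molecule only when it is missing.
        return not molecule_present
    raise ValueError(f"unknown pathway type: {pathway!r}")


# Print the full truth table.
for pathway in ("catabolic", "anabolic"):
    for present in (True, False):
        state = "ON" if gene_is_on(pathway, present) else "OFF"
        print(f"{pathway:9s} molecule present={present!s:5s} -> gene {state}")
```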
A combined transcription-translation control switch: the attenuation mechanism (Charles Yanofsky)
The trp operon in E. coli
Negative control at the transcription level (similar to, and different from, the lac operon)
How not to make too much tryptophan? A fail-safe mechanism complements transcription control, at the translation level!
The upstream ORF (uORF) structure of the trp operon leader. Mutual palindromes: regions 1-2, 2-3 and 3-4 are complementary.
The various palindromic pairings: 1-2 together with 3-4 (3-4 is a transcription terminator!), versus 2-3 (not a terminator!).
The structure of the attenuation switch at high Trp and at low Trp (figure: ribosome, RNA pol)
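The logic of the switch can be sketched under a few simplifying assumptions:

- The leader RNA has four regions, and only adjacent regions pair (1-2, 2-3, 3-4).
- A hairpin forms as soon as its downstream partner is transcribed, provided both partners are free.
- The ribosome's position is reduced to the set of regions it covers:
  - none when the leader is not translated;
  - region 1 when the ribosome stalls at the Trp codons under low Trp;
  - regions 1 and 2 when it reaches the leader stop codon under high Trp.

The scenario names and the function `attenuation` are hypothetical. Real attenuation depends on the relative speeds of the polymerase and the ribosome, which this sketch ignores.

```python
# Hypothetical scenarios: regions of the trp leader RNA covered by the
# ribosome (a simplification of its footprint).
RIBOSOME_FOOTPRINT = {
    "none": set(),        # leader not translated
    "low_trp": {1},       # ribosome stalls at the tandem Trp codons in region 1
    "high_trp": {1, 2},   # ribosome reaches the leader stop codon, masking region 2
}


def attenuation(scenario: str) -> tuple[list[tuple[int, int]], bool]:
    """Return (hairpins formed, whether transcription terminates)."""
    covered = RIBOSOME_FOOTPRINT[scenario]
    paired: set[int] = set()
    hairpins: list[tuple[int, int]] = []
    # Regions are transcribed 5' -> 3'. When region j appears, it pairs with
    # region j - 1 if neither is already paired or hidden under the ribosome.
    for j in (2, 3, 4):
        i = j - 1
        if {i, j}.isdisjoint(covered) and {i, j}.isdisjoint(paired):
            paired |= {i, j}
            hairpins.append((i, j))
    terminates = (3, 4) in hairpins  # the 3-4 hairpin is the terminator
    return hairpins, terminates


for scenario in RIBOSOME_FOOTPRINT:
    hairpins, stop = attenuation(scenario)
    print(scenario, hairpins, "terminate" if stop else "continue")
```

Running it gives the following outcomes:

- Untranslated leader: [(1, 2), (3, 4)], so transcription terminates.
- Low Trp: [(2, 3)], so transcription reads through.
- High Trp: [(3, 4)], so transcription terminates.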
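Could this be implemented in eukaryotes as well? No, because it requires coupled transcription and translation, whereas in eukaryotes transcription takes place in the nucleus and translation in the cytoplasm.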
Where does translation take place?
Spatial organization of the flow of genetic information in bacteria (Llopis et al., Nature 2010). Figure labels: DNA, mRNA, protein.
Translation consists of initiation, elongation and termination (figure: mRNA from 5' to 3', codon, anticodon, STOP codon)
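The three phases can be sketched as a scan over an mRNA string:

1. Initiation at the first AUG.
2. Elongation one codon at a time, using the standard genetic code.
3. Termination at the first in-frame STOP codon.

This is a minimal sketch that assumes the standard code and ignores the ribosome binding site and other initiation signals. The name `translate` is illustrative.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code; codons ordered UUU, UUC, UUA, UUG, UCU, ...; '*' = STOP.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna: str) -> str:
    """Illustrative helper: translate from the first AUG to the first STOP."""
    start = mrna.find("AUG")                      # initiation
    if start == -1:
        return ""
    protein = []
    for pos in range(start, len(mrna) - 2, 3):    # elongation, codon by codon
        aa = CODE[mrna[pos:pos + 3]]
        if aa == "*":                             # termination
            break
        protein.append(aa)
    return "".join(protein)


print(translate("GGAUGUUUGGCUAAAC"))  # AUG UUU GGC | UAA -> "MFG"
```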
The dynamics of translation
The ribosome reads a nucleotide sequence and produces an amino acid sequence based on the genetic code. Some important properties of the code: the code is (almost) universal; there are 61 amino acid (sense) codons and 3 STOP codons; the code is "redundant", since many amino acids have more than one codon; and the genetic code is optimal with respect to many properties, such as error tolerance.
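The counts above can be checked directly from a table of the standard genetic code. This is a minimal sketch: it encodes the standard code as a 64-letter string in UCAG order and tallies STOP codons and codons per amino acid.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code; codons ordered UUU, UUC, UUA, UUG, UCU, ...; '*' = STOP.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

stops = [codon for codon, aa in CODE.items() if aa == "*"]
degeneracy = Counter(aa for aa in CODE.values() if aa != "*")

print(len(CODE) - len(stops), "sense codons;", len(stops), "STOP codons:", stops)
print(len(degeneracy), "amino acids")

# Group amino acids by how many codons encode them.
by_size: dict[int, list[str]] = {}
for aa, n in sorted(degeneracy.items()):
    by_size.setdefault(n, []).append(aa)
print(by_size)
```

The grouping shows how the code's redundancy is distributed across amino acids:

- Two amino acids have a single codon (Met, Trp).
- Nine have two codons.
- One has three (Ile).
- Five have four.
- Three have six (Leu, Ser, Arg).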
The tRNA: the generic form, a specific form, and in 3D
Aminoacyl-tRNA synthetases: the really “smart” part. 20 amino acids, 61 codons, 20 aminoacyl-tRNA synthetases. Error rate: 1/10,000 to 1/100,000 (in vitro; higher in vivo).
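A back-of-the-envelope calculation gives a feel for these error rates. As a simplifying assumption, errors are independent at each position, so the probability that a protein of L residues carries no error is (1 - e)^L. The 300-residue length is a hypothetical example.

```python
def p_error_free(error_rate: float, length: int) -> float:
    """Probability of zero errors in `length` independent incorporations
    (assumes a constant, independent per-residue error rate)."""
    return (1.0 - error_rate) ** length


# Hypothetical 300-residue protein at the two ends of the quoted range.
for e in (1e-4, 1e-5):
    print(f"error rate {e:g}: P(300-aa protein error-free) = {p_error_free(e, 300):.4f}")
```

The result is about 0.970 at 1/10,000 and 0.997 at 1/100,000. At these rates, roughly 3% versus 0.3% of such proteins would carry at least one misincorporated amino acid.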
The 20 canonical amino acids
Possible mechanisms of translational regulation: optimality of the ribosome attachment site, mRNA secondary structure, codon usage
Multiple codons for the same amino acid (synonymous codons C1 to C6):
Serine: UCU UCC UCA UCG AGC AGU
Cysteine: UGU UGC
Methionine: AUG
Tryptophan: UGG
STOP: UAA UAG UGA
An amino acid sequence such as G T R Y E C Q A S F D can be encoded with many different codon choices: C1 at every position, C2 at every position, or mixtures such as C1 C1 C2 C1 C1 C2 C1 C1 C2 C1 C1. For a hypothetical protein of 300 amino acids, each with two codons, there are 2^300 possible nucleotide sequences. These variants code for the same protein and are thus considered "synonymous"; indeed, evolution could easily exchange between them. But are they all really equivalent?
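The counting argument generalizes. The number of synonymous coding sequences for a protein is the product, over its positions, of the number of codons for each amino acid. This is a minimal sketch that assumes the standard genetic code and ignores the choice of STOP codon. The helper name `n_synonymous_sequences` is hypothetical.

```python
from collections import Counter
from math import prod

# Standard genetic code as a 64-letter string (codons in UCAG order,
# '*' = STOP); each letter's frequency is that amino acid's codon count.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
DEGENERACY = Counter(AMINO_ACIDS.replace("*", ""))


def n_synonymous_sequences(protein: str) -> int:
    """Hypothetical helper: count coding sequences (STOP excluded) for `protein`."""
    return prod(DEGENERACY[aa] for aa in protein)


print(n_synonymous_sequences("MPKSNFRFGE"))  # -> 18432
print(n_synonymous_sequences("K" * 300) == 2 ** 300)  # 300 two-codon residues -> True
```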
The codon bias in genomes
Two potential sources of codon bias: mutation pattern (neutral) and selection.
The effect of (or on?) GC content: codon bias relates to the nucleotide composition of coding and intergenic regions, and intergenic composition (especially in bacteria) explains codon bias. Diagram: mutation pressure and selection shaping nucleotide composition, codon bias and amino acid composition.
Selection on codons might affect: accuracy, throughput, costs, folding, and RNA structure.
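A simple model for translation efficiency: a coding sequence (5' AAACCAGAAUCGAAG … 3') is scored by the average amount of tRNA available for its codons (average on the slide: 4). tRNA amount per codon: Lys AAA 8; Asn AAC 6; Lys AAG 1; Asn AAU …; Thr ACA …; Thr ACC …; … Phe UUU …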
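The averaging model can be sketched in a few lines, assuming a lookup table of tRNA amounts per codon. Only the three amounts listed on the slide are used, so the example sequence is restricted to those codons. The function and table names are hypothetical.

```python
def average_trna_score(cds: str, trna_amount: dict[str, float]) -> float:
    """Hypothetical helper: arithmetic mean of tRNA amounts over the codons
    of an in-frame coding sequence (a trailing partial codon is ignored)."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    return sum(trna_amount[c] for c in codons) / len(codons)


# The three amounts given on the slide; the other codons' amounts are not listed.
TRNA_AMOUNT = {"AAA": 8, "AAC": 6, "AAG": 1}
print(average_trna_score("AAAAACAAG", TRNA_AMOUNT))  # (8 + 6 + 1) / 3 -> 5.0
```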
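The same protein can be encoded in many ways. For the amino acid sequence MPKSNFRFGE, the first two residues alone can be encoded as ATGCCT, ATGCCC, ATGCCA or ATGCCG. These range from most efficient, through intermediate efficiency, to least efficient, according to the relative concentration of the matching tRNA in the cell.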
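Picking the "most efficient" encoding can be sketched as a per-residue choice of the synonymous codon with the highest tRNA level. This is a minimal sketch. The tRNA levels are hypothetical placeholders, since the slide gives no numbers, and the code uses RNA letters (U instead of T). The function name is illustrative.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code; codons ordered UUU, UUC, UUA, UUG, UCU, ...; '*' = STOP.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def most_efficient_encoding(protein: str, trna_level: dict[str, float]) -> str:
    """Illustrative helper: for each residue, choose the synonymous codon
    with the highest tRNA level (codons missing from the table count as 0)."""
    cds = []
    for aa in protein:
        synonyms = [codon for codon, a in CODE.items() if a == aa]
        cds.append(max(synonyms, key=lambda c: trna_level.get(c, 0.0)))
    return "".join(cds)


# Hypothetical relative tRNA levels for the four proline codons.
TRNA_LEVEL = {"CCU": 2.0, "CCC": 0.5, "CCA": 10.0, "CCG": 1.0}
print(most_efficient_encoding("MP", TRNA_LEVEL))  # -> AUGCCA
```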
Scoring coding sequences for translation efficiency: each codon of a coding sequence (e.g. ATCCCAAAATCGAAT …) is assigned the number of tRNA gene copies that decode it (efficient, intermediate or non-efficient). The translation efficiency score is the (geometric) average over all its codons (dos Reis et al., Nucleic Acids Res 2004).
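The tRNA Adaptation Index (tAI) (dos Reis et al., NAR 2004): a simple model for translation efficiency that also accounts for wobble interactions. For each codon i (e.g. in ATCCCAAAATCGAAT …):
$$w_i = \begin{cases} W_i / W_{\max} & \text{if } W_i \neq 0 \\ w_{\text{mean}} & \text{otherwise} \end{cases}$$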
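The weighting rule and the geometric-mean score can be sketched together.

In dos Reis et al. (2004), each absolute adaptiveness value W_i combines tRNA gene copy numbers with wobble-pairing penalties. This sketch skips that step and takes the W_i values as given, which is a simplifying assumption. Here w_mean is taken as the geometric mean of the non-zero w_i. The W values are toy numbers, not real data, and all function names are hypothetical.

```python
from math import exp, log


def geometric_mean(values: list[float]) -> float:
    return exp(sum(log(v) for v in values) / len(values))


def relative_adaptiveness(W: dict[str, float]) -> dict[str, float]:
    """Hypothetical helper: w_i = W_i / W_max; codons with W_i = 0 get w_mean,
    the geometric mean of the non-zero w_i."""
    w_max = max(W.values())
    nonzero = {c: v / w_max for c, v in W.items() if v > 0}
    w_mean = geometric_mean(list(nonzero.values()))
    return {c: nonzero.get(c, w_mean) for c in W}


def tai(cds: str, w: dict[str, float]) -> float:
    """Hypothetical helper: geometric mean of w over the codons of a gene."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    return geometric_mean([w[c] for c in codons])


# Toy absolute adaptiveness values W_i (hypothetical, not real data).
W = {"AAA": 8.0, "AAC": 6.0, "AAG": 1.0, "AAU": 0.0}
w = relative_adaptiveness(W)
print(round(tai("AAAAAG", w), 4))  # sqrt(1.0 * 0.125)
```

The geometric mean means a single poorly adapted codon pulls the score down more than it would under an arithmetic average.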
Correlation of tAI with experimentally determined protein levels: r = 0.63 between predicted translation efficiency and measured protein abundance (Ghaemmaghami et al. Nature 2003). Figure label: physiological.
The correlation is quite high, but why not even higher? Limitations of the model: tRNA gene copy numbers are only a proxy for tRNA levels; the model captures only elongation; mRNA levels differ between genes; and proteins are also degraded at different rates.
The effective number of codons (Nc): a measure of overall synonymous codon usage bias (Wright, F. (1990). "The 'effective number of codons' used in a gene." Gene 87(1): 23-9). Example with the glycine codons GGT, GGC, GGA, GGG: Gene1 shows highly biased synonymous codon usage (Nc = 20, the minimum, reached when each amino acid uses a single codon); Gene2 uses its Gly codons evenly, i.e. no bias in synonymous codon usage (Nc = 61, the maximum).
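Wright's Nc can be computed from a gene's codon counts in three steps:

1. For each amino acid with n observed codons and codon frequencies p_i, compute the homozygosity F = (n Σ p_i² - 1) / (n - 1).
2. Average F within each degeneracy class of the standard code (2-, 3-, 4- and 6-fold).
3. Combine the class averages as Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, capped at 61.

This is a minimal sketch. It assumes every degeneracy class is observed with a non-zero average F, and it does not implement Wright's corrections for missing classes (e.g. a gene with no isoleucine). The function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "UCAG"
# Standard genetic code; codons ordered UUU, UUC, UUA, UUG, UCU, ...; '*' = STOP.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
DEGENERACY = Counter(aa for aa in CODE.values() if aa != "*")


def homozygosity(counts: list[int]) -> float:
    """Wright's F for one amino acid: (n * sum(p_i^2) - 1) / (n - 1)."""
    n = sum(counts)
    return (n * sum((c / n) ** 2 for c in counts) - 1) / (n - 1)


def effective_number_of_codons(codon_counts: dict[str, int]) -> float:
    """Hypothetical helper: Nc from a codon -> count mapping (one gene)."""
    # Collect counts of all synonymous codons (zeros included) per amino acid,
    # skipping single-codon amino acids (Met, Trp), which add the constant 2.
    per_aa = defaultdict(list)
    for codon, aa in CODE.items():
        if aa != "*" and DEGENERACY[aa] > 1:
            per_aa[aa].append(codon_counts.get(codon, 0))
    # Average F within each degeneracy class.
    f_by_class = defaultdict(list)
    for aa, counts in per_aa.items():
        if sum(counts) > 1:  # F is undefined for fewer than two observations
            f_by_class[DEGENERACY[aa]].append(homozygosity(counts))
    f_mean = {k: sum(v) / len(v) for k, v in f_by_class.items()}
    nc = 2 + 9 / f_mean[2] + 1 / f_mean[3] + 5 / f_mean[4] + 3 / f_mean[6]
    return min(nc, 61.0)
```

A gene that uses one codon per amino acid gives Nc = 20. Large, perfectly even counts give 61 after the cap.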
Codon usage bias is correlated with translation efficiency: r = -0.79 (p < 0.001). The negative sign is expected if bias is measured by Nc, where lower values mean stronger bias. This is consistent with selection, rather than mutation pattern alone, shaping codon bias.
But not in all species: in A. gossypii, r = -0.48 (p = 0.218, not significant).
Species compared: S. cerevisiae, S. bayanus, C. glabrata, A. gossypii, D. hansenii, C. albicans, Y. lipolytica, S. pombe (table of r and p values per species). Translation selection acts in some but not all species (e.g. the debate on human…).
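Correlation does not imply causality! The r = 0.63 between predicted translation efficiency and measured protein abundance (Ghaemmaghami et al. Nature 2003) admits several explanations:
- a physiological one (efficient codons raise protein levels);
- an evolutionary one (highly expressed genes are selected for efficient codons);
- a third factor Z that affects both.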