Translation elongation, amino acid usage, and codon usage indices
Xuhua Xia
xxia@uottawa.ca
http://dambe.bio.uottawa.ca
Objectives
- Understand how amino acid and codon usage biases affect translation efficiency and gene expression
  - Biomedical and biopharmaceutical relevance
    - Protein drug production in the pharmaceutical industry
    - Transgenic experiments in agriculture
  - Factors affecting amino acid and codon usage bias
  - Indices measuring codon usage bias
- Develop bioinformatic skills to study genomic codon usage
Energetic Cost
Amino acid  1-letter code  Precursor metabolites       ~P    H     Total ~P
Ala         A              pyr                         1.0   5.3   11.7
Cys         C              3pg                         7.3   8.7   24.7
Asp         D              oaa                         1.3   5.7   12.7
Glu         E              αkg                         2.7   6.3   15.3
Phe         F              2 pep, eryP                 13.3  19.3  52.0
Gly         G              3pg                         2.3   4.7   11.7
His         H              penP                        20.3  9.0   38.3
Ile         I              pyr, oaa                    4.3   14.0  32.3
Lys         K              oaa, pyr                    4.3   13.0  30.3
Leu         L              2 pyr, acCoA                2.7   12.3  27.3
Met         M              oaa, Cys, -pyr              9.7   12.3  34.3
Asn         N              oaa                         3.3   5.7   14.7
Pro         P              αkg                         3.7   8.3   20.3
Gln         Q              αkg                         3.7   6.3   16.3
Arg         R              αkg                         10.7  8.3   27.3
Ser         S              3pg                         2.3   4.7   11.7
Thr         T              oaa                         3.3   7.7   18.7
Val         V              2 pyr                       2.0   10.7  23.3
Trp         W              2 pep, eryP, PRPP, -pyr     27.7  23.3  74.3
Tyr         Y              eryP, 2 pep                 13.3  18.3  50.0
Hiroshi Akashi and Takashi Gojobori 2002, PNAS 99:3695–3700
Numerical Prediction
Prediction: Usage of energetically expensive (and also rare) amino acids should decrease with gene expression. A large ~P/Copy should be associated with a small NumCopy, and a small ~P/Copy with a large NumCopy.
AA usage and tRNA abundance
Saccharomyces cerevisiae; Salmonella typhimurium
Xia, X. 1998. Genetics 149:37-44
AA usage and tRNA gene copies
[Figure: amino acid frequency in 11 ssDNA coliphages (y) vs. number of tRNA genes in E. coli (x); points labeled by amino acid (F, G, H, I, K, L1, L2, N, P, Q, R1, R2, S1, S2, T, V, W, Y); regression y = 231.88x + 244.93, r = 0.8426, p < 0.0001]
Chithambaram, S. et al. 2014. Genetics 197:301-315
Number of synonymous codons
Mutation bias
Amino acid usage in E. coli K12 (NC_000913) and S. cerevisiae (NC_001133-NC_001148) coding sequences. Amino acids encoded by AT-rich codons are in red, and those encoded by GC-rich codons are in bold blue. Yeast (Saccharomyces cerevisiae) is relatively more AT-rich (0.3090, 0.1917, 0.1913, and 0.3080 for A, C, G, and T, respectively).
AA   Codon                     Ecoli    Yeast    Ecoli%    Yeast%
Ala  GCT,GCC,GCA,GCG           125332   160810   9.5527    5.4966
Arg  CGT,CGC,CGA,CGG,AGA,AGG   72502    130068   5.5260    4.4458
Asn  AAT,AAC                   51075    179836   3.8929    6.1469
Asp  GAT,GAC                   67349    171072   5.1333    5.8473
Cys  TGT,TGC                   15188    37093    1.1576    1.2679
Gln  CAA,CAG                   58360    115741   4.4481    3.9561
Glu  GAA,GAG                   75786    191267   5.7763    6.5376
Gly  GGT,GGC,GGA,GGG           96701    145433   7.3705    4.9710
His  CAT,CAC                   29751    63505    2.2676    2.1706
Ile  ATT,ATC,ATA               78845    191677   6.0095    6.5516
Leu  TTG,TTA,CTT,CTC,CTA,CTG   140571   277988   10.7142   9.5017
Lys  AAA,AAG                   57620    214842   4.3917    7.3434
Met  ATG                       37093    60672    2.8272    2.0738
Phe  TTT,TTC                   51131    129516   3.8972    4.4269
Pro  CCT,CCC,CCA,CCG           58293    128177   4.4430    4.3811
Ser  TCT,TCC,TCA,TCG,AGT,AGC   75661    263096   5.7668    8.9927
Thr  ACT,ACC,ACA,ACG           70494    173084   5.3730    5.9161
Trp  TGG                       20060    30387    1.5290    1.0386
Tyr  TAT,TAC                   37134    98746    2.8303    3.3752
Val  GTT,GTC,GTA,GTG           93061    162642   7.0930    5.5592
Summary of AA usage
Selection:
- Energetic cost: mass-produced proteins should use cheap amino acids.
- Translation efficiency: mass-produced proteins should use abundant amino acids and amino acids carried by many tRNAs (the number of synonymous codons must be controlled for to evaluate this effect).
Mutation:
- AT-rich codons increase with AT-biased mutation.
- GC-rich codons increase with GC-biased mutation.
Codon Usage Bias
Observation: Strongly biased codon usage in a variety of species, ranging from viruses, mitochondria and plastids to prokaryotes and eukaryotes.
Hypotheses:
- Differential mutation hypothesis, e.g., the transcriptional hypothesis of codon usage (Xia 1996 Genetics 144:1309-1320)
- Differential selection hypothesis, e.g., Xia 1998 Genetics 149:37-44
Predictions:
- From the mutation hypothesis: concordance between codon usage and mutation pressure.
- From the selection hypothesis: concordance between differential availability of tRNA and differential codon usage. The concordance is stronger in highly expressed genes than in lowly expressed genes (CAI is positively correlated with gene expression).
[Figure labels: UCC~tRNA~Gly, GCC~tRNA~Gly; polycistronic mRNA carrying Gene 1, Gene 2 and Gene 3; ribosome; RNA polymerase; protein]
Codon usage of HEGs in yeast
You may be wondering about the Cys codon family, which has 4 tRNAs matching UGC but none matching UGU. We would have predicted that UGC should be preferred, but the opposite is true. Why? One might think that, because Cys is rarely used, the codon family is not under selection, so that codon usage is at the mercy of mutation bias. Because the yeast genome is AT-biased, we would then expect U-ending codons to outnumber C-ending codons. Unfortunately, this explanation is wrong because 1) the mutation bias is not sufficient to account for the 3/39 ratio, and 2) the lowly expressed genes, which should be even more affected by mutation bias, do not exhibit a strong bias comparable to 3/9. This criticism also applies to another explanation, which states that the GCA anticodon can decode C-ending and U-ending codons equally well. In short, one should use UGU to code Cys to improve translation efficiency, but we do not know why.
Xia 2007. Bioinformatics and the cell.
Major and minor codons
Major codon: the codon in a synonymous codon family that can be most efficiently translated in a species, typically with three associated properties:
- It is over-represented in highly expressed genes relative to lowly expressed genes.
- It corresponds to the most abundant isoacceptor tRNA.
- Replacing it with another codon leads to reduced translation efficiency (reduced protein production).
Minor codon: the opposite.
Their identification is NOT based on the codon frequencies of all coding sequences in a species.
Different species may have different major and minor codons in the same synonymous codon family.
Calculation of RSCU
RSCU and proportion: different scaling.
RSCU (Sharp et al. 1986) is codon-specific.
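The difference in scaling between RSCU and a simple proportion can be illustrated with a minimal sketch. It assumes the standard definition of RSCU (the observed count of a codon divided by the mean count across its synonymous codon family); the function name `rscu_and_proportion` and the codon counts are hypothetical and chosen only for illustration.

```python
def rscu_and_proportion(counts):
    """Return {codon: (RSCU, proportion)} for one synonymous codon family.

    RSCU       = count / (family total / number of codons in the family)
    proportion = count / family total
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("codon family has no observations")
    mean_count = total / len(counts)
    return {c: (x / mean_count, x / total) for c, x in counts.items()}


# Hypothetical counts for a 4-fold codon family (illustration only).
ala = {"GCU": 10, "GCC": 20, "GCA": 30, "GCG": 40}
result = rscu_and_proportion(ala)
for codon, (rscu, prop) in result.items():
    print(f"{codon}: RSCU = {rscu:.2f}, proportion = {prop:.2f}")
```

Under this definition the RSCU values of a family sum to the number of codons in the family, and an RSCU of 1 means no bias, whereas proportions sum to 1 and the no-bias value depends on family size.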
Codon adaptation: E. coli & phage
[Figure: scatter plot with axes E. coli RSCU and phage TLS RSCU; regression y = 0.4046x + 0.5954, R² = 0.672]
Problem with mutation bias shared by both host and phage
Calculation of CAI
N2,3,4: number of 2-, 3-, 4-fold codon families. Compound 6- or 8-fold codon families should be broken into two codon families.
CAI is gene-specific. 0 ≤ CAI ≤ 1. CAI values computed with different reference sets are not comparable.
Problem with computing w as Fi/Fi.max: Suppose an amino acid is rarely used in highly expressed genes. Then there is little selection on it, and its codon usage might be close to even, with wi ≈ 1 for each codon. Now if a lowly expressed gene happens to be made entirely of this amino acid, then the CAI for this lowly expressed gene would be close to 1, which is misleading. There has been no good alternative. Further research is needed.
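The problem can be made concrete with a minimal sketch. It assumes CAI is computed as the frequency-weighted geometric mean of w over the codons of a gene; all weights and codon counts below are hypothetical and serve only to reproduce the scenario described above.

```python
import math


def cai(codon_counts, w):
    """CAI = exp(sum(F_i * ln w_i) / sum(F_i)) over the codons of one gene."""
    total = sum(codon_counts.values())
    log_sum = sum(f * math.log(w[c]) for c, f in codon_counts.items())
    return math.exp(log_sum / total)


# Hypothetical weights, w_i = F_i / F_i.max in highly expressed genes.
w = {
    "CAU": 1.0, "CAC": 0.95,  # rarely used amino acid, nearly even usage
    "AAA": 1.0, "AAG": 0.2,   # frequently used amino acid, strong bias
}

# Hypothetical lowly expressed gene made entirely of the rare amino acid.
odd_gene = {"CAU": 50, "CAC": 50}
# Hypothetical gene using major and minor codons equally often.
typical_gene = {"AAA": 50, "AAG": 50}

cai_odd = cai(odd_gene, w)
cai_typical = cai(typical_gene, w)
print(f"CAI of the rare-amino-acid gene: {cai_odd:.3f}")
print(f"CAI of the typical gene:         {cai_typical:.3f}")
```

Both genes use their two codons equally often, yet the first receives a CAI near 1 simply because its codon family carries almost no information about selection.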
Weak mRNA predictive power
[Figure: protein abundance (y) vs. mRNA abundance (x); regression y = 5.6507x + 4.1367, R² = 0.1936; ENO1 and FRS2 are labeled]
Effect of Codon Usage Bias
[Figure: protein abundance (y) vs. codon usage bias (x); regression y = 70.398x - 11.739, R² = 0.5668; ENO1 and FRS2 are labeled]
Hypothesis and Predictions
Amino acid      Met          Leu          Glu          Lys          Gln          Arg          Trp
tRNA/anticodon  tRNAMet/CAU  tRNALeu/UAA  tRNAGlu/UUC  tRNALys/UUU  tRNAGln/UUG  tRNAArg/UCU  tRNATrp/UCA
G-ending codon  AUG          UUG          GAG          AAG          CAG          AGG          UGG
A-ending codon  AUA          UUA          GAA          AAA          CAA          AGA          UGA
A-ending codons are favoured by both mutation and tRNA-mediated selection. AUA is favoured by mutation, but not by tRNA-mediated selection.
Predictions:
1. The proportion of A-ending codons (P_NNA = N_NNA/N_NNG) or RSCU should be smaller in the Met codon family than in other R-ending codon families.
2. Availability of tRNAMet/UAU should increase P_AUA.
Xia et al. 2007
Testing prediction 1 Carullo, M. and Xia, X. 2008 J Mol Evol 66:484–493.
Testing prediction 2
Fig. 5. Relationship between P_AUA and P_UUA, highlighting the observation that P_AUA is greater when both a tRNAMet/CAU and a tRNAMet/UAU are present than when only tRNAMet/CAU is present in the mtDNA, for bivalve species (a) and chordate species (b). The filled squares are for mtDNA containing both tRNAMet/CAU and tRNAMet/UAU genes, and the open triangles are for mtDNA without a tRNAMet/UAU gene.
Why a systems biology perspective?
... in his correct but somewhat awkward English: “No aphorism is more frequently repeated in connection with field trials, than that we must ask Nature few questions, or ideally, one question at a time. The writer is convinced that this view is wholly mistaken. Nature, he suggests, will respond to a logical and carefully thought-out questionnaire; indeed, if we ask her a single question, she will often refuse to answer until some other topic has been discussed.” --Ronald A. Fisher (1926). Journal of the Ministry of Agriculture of Great Britain 33: 503–513
In short, you should consider everything that is relevant. Of course, his statement did not come out of a vacuum. At that time, a lot of data involving unbalanced experimental designs and multi-factor interactions had accumulated, and one is prone to draw wrong conclusions if one does not use balanced factorial designs and does not think broadly and critically. Here is one real data set to illustrate this point: Simpson’s paradox.
Simpson’s paradox
               Treatment A      Treatment B
Small stones   93% (81/87)      87% (234/270)
Large stones   73% (192/263)    69% (55/80)
Pooled         78% (273/350)    83% (289/350)
C. R. Charig et al. 1986. Br Med J (Clin Res Ed) 292 (6524): 879–882
Treatment A: all open procedures
Treatment B: percutaneous nephrolithotomy
Question: which treatment is better?
This example is from a study of the efficacy of two treatments for kidney stones. In the first cell, 87 is the total number of patients in this category, 81 is the number of successes, and 93% is the success rate. As we can see clearly, treatment A is more efficacious than treatment B in both the “Small stones” group and the “Large stones” group. However, if we pool these two groups together, we see that treatment B has a greater success rate than treatment A. We thus would draw a wrong conclusion if we failed to consider the confounding effect of stone size.
But can we now conclude that treatment A is better than treatment B? Such a conclusion would be highly significant because it could guide our choice of treatment if we happened to have a kidney stone. Unfortunately, we cannot draw this conclusion because the success rate of both treatments changes over time. We can only say that treatment A was better than treatment B at the time of data collection, which cannot provide us any guidance today. Such a conclusion, albeit scientifically correct, seems quite useless and trivial. You see that a correct conclusion is often trivial, and a potentially wrong generalization that treatment A is better than treatment B appears much more significant. So if you want your conclusions to be highly significant, don’t be too correct, because they will then be trivial.
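The reversal can be checked directly from the counts on this slide. The following is a minimal sketch; the variable names are illustrative, and the numbers are the successes and patient totals reported for Charig et al. (1986).

```python
# (successes, patients) per stone size and treatment, from the slide.
data = {
    "Small stones": {"A": (81, 87), "B": (234, 270)},
    "Large stones": {"A": (192, 263), "B": (55, 80)},
}


def rate(successes, patients):
    """Success rate as a fraction."""
    return successes / patients


# Within each stratum, compare the two treatments.
stratum_rates = {
    group: {t: rate(*arms[t]) for t in ("A", "B")}
    for group, arms in data.items()
}
for group, r in stratum_rates.items():
    print(f"{group}: A = {r['A']:.1%}, B = {r['B']:.1%}")

# Pool the strata, ignoring stone size.
pooled = {}
for t in ("A", "B"):
    successes = sum(data[g][t][0] for g in data)
    patients = sum(data[g][t][1] for g in data)
    pooled[t] = rate(successes, patients)
print(f"Pooled: A = {pooled['A']:.1%}, B = {pooled['B']:.1%}")
```

The ranking flips on pooling because the two treatments were applied to very different mixes of patients: 263 of the 350 patients given treatment A had large stones, whereas 270 of the 350 given treatment B had small stones.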
RSCU (HIV-1 vs Human)
[Figure, panel (a): scatter plot with axes RSCU (Human) and RSCU (HIV-1); separate symbols for A-ending, C-ending, G-ending and U-ending codons; A-ending points labeled E, G, I, K, L, P, Q, R, S, T, V]
Fig. 1. Relative synonymous codon usage (RSCU) of HIV-1 compared to RSCU of highly expressed human genes. Data points for codons ending with A, C, G or U are annotated with different combinations of colors and symbols. A-ending codons exhibit strong discordance in their usage between HIV-1 and human and are annotated with their coded amino acids.
van Weringh et al. 2011. MBE.
RSCU (HTLV-1 vs Human) Relative synonymous codon usage (RSCU) of HTLV-1 compared to RSCU of highly expressed human genes. Data points for codons ending with A, C, G or U are annotated with different combinations of colors and symbols. A-ending codons exhibit strong discordance in their usage between HIV-1 and human and are annotated with their coded amino acids.
Differential adaptation: early & late genes
[Figure panels: the conventional codon-anticodon prediction; the new one]
The HIV-1 Rev response element (RRE) is a highly structured, ~350 nucleotide RNA segment present in the Env coding region of unspliced and partially spliced viral mRNAs. In the presence of the HIV-1 accessory protein Rev, HIV-1 mRNAs that contain the RRE can be exported from the nucleus to the cytoplasm for downstream events such as translation and virion packaging.
Any problem with the mutation hypothesis?
Table 2. Frequency of A residues, length and codon adaptation index (CAI) for the three HIV-1 early (tat, rev and nef) and five late (gag-pol, vif, vpu, vpr, and env) coding sequences (CDS).
Gene  CDS (bp)  CAI
tat   261       0.66875
rev   351       0.66211
nef   621       0.67523
gag   1503      0.62784
pol   3012      0.58139
vif   579       0.61941
vpr   291       0.64272
vpu   249       0.49068
env   2571      0.61924
CAI values may change depending on which reference set of highly expressed genes is used, but the relative magnitude should be maintained (unless the reference set is not of highly expressed genes).
van Weringh et al. 2011. Molecular Biology and Evolution 28:1827-1834.
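The contrast between early and late genes in Table 2 can be summarized with a minimal sketch. The CAI values are those listed in the table; grouping gag and pol as two separate late entries follows the table rows rather than the caption, and no significance test is implied.

```python
from statistics import mean

# CAI values from Table 2.
cai = {
    "tat": 0.66875, "rev": 0.66211, "nef": 0.67523,   # early genes
    "gag": 0.62784, "pol": 0.58139, "vif": 0.61941,   # late genes
    "vpr": 0.64272, "vpu": 0.49068, "env": 0.61924,
}
early = ("tat", "rev", "nef")
late = tuple(g for g in cai if g not in early)

mean_early = mean(cai[g] for g in early)
mean_late = mean(cai[g] for g in late)
lowest_early = min(cai[g] for g in early)
highest_late = max(cai[g] for g in late)

print(f"Mean CAI, early genes: {mean_early:.4f}")
print(f"Mean CAI, late genes:  {mean_late:.4f}")
print(f"Lowest early CAI {lowest_early} vs highest late CAI {highest_late}")
```

Every early gene has a higher CAI than every late gene, a pattern that a mutation bias acting uniformly on the whole genome would not readily produce.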
tRNA
Here we have two variables that need some explanation. I_codon measures the deviation of HIV-1 codon usage from human codon usage, and I_tRNA measures the selective tRNA enrichment. For example, human genes use few AUA codons for Ile, and human cells have few tRNAs with the UAU anticodon. HIV-1, in contrast, uses a lot of AUA codons even though human cells do not have many AUA-decoding tRNAs. One would expect that, if there is any selective tRNA enrichment, then the tRNA for AUA should be particularly enriched, which is true. These two variables are significantly and positively correlated.
van Weringh et al. 2011. MBE.
I/A wobble pair is error-prone
Translation rate & codon adaptation Kudla et al. (2009, Science) engineered a synthetic library of 154 genes, all encoding the same protein but differing in degrees of codon adaptation, to quantify the effect of differential codon usage on protein production in E. coli. They concluded that “codon bias did not correlate with gene expression” and that “translation initiation, not elongation, is rate-limiting for gene expression”
Problem with CAI and a new ITE
AA  Codon  CFnon-HEG  CFHEG  tRNA
A   GCA    20         40     3
    GCG    80         60
Identification of major and minor codons
                             CAI   ITE
AA  Codon  CFnon-HEG  CFHEG  w     pHEG  pnon-HEG  s     w
A   GCA    20         40     2/3   0.4   0.2       2     1
    GCG    80         60     1     0.6   0.8       0.75  0.375
           50                            0.5       1.2
CAI is a special case of ITE (when there is no background codon usage bias)
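The statement that CAI is a special case of ITE can be checked numerically with a minimal sketch. It assumes, following the columns on this slide, that the CAI weight is the HEG codon frequency divided by the family maximum, and that the ITE weight is s = pHEG/pnon-HEG divided by the family maximum of s. The function names and the uniform 50/50 background are assumptions introduced for illustration.

```python
def cai_weights(cf_heg):
    """w_i = F_i / F_max within a codon family, using HEG counts only."""
    f_max = max(cf_heg.values())
    return {codon: f / f_max for codon, f in cf_heg.items()}


def ite_weights(cf_heg, cf_non_heg):
    """s_i = p_HEG,i / p_non-HEG,i, then w_i = s_i / s_max."""
    n_heg = sum(cf_heg.values())
    n_non = sum(cf_non_heg.values())
    s = {c: (cf_heg[c] / n_heg) / (cf_non_heg[c] / n_non) for c in cf_heg}
    s_max = max(s.values())
    return {c: v / s_max for c, v in s.items()}


heg = {"GCA": 40, "GCG": 60}       # codon frequencies in HEGs (slide)
non_heg = {"GCA": 20, "GCG": 80}   # codon frequencies in non-HEGs (slide)
uniform = {"GCA": 50, "GCG": 50}   # assumed: no background codon usage bias

w_cai = cai_weights(heg)
w_ite = ite_weights(heg, non_heg)
w_ite_uniform = ite_weights(heg, uniform)

print("CAI weights:                 ", w_cai)
print("ITE weights:                 ", w_ite)
print("ITE weights, even background:", w_ite_uniform)
```

With the biased background, GCA becomes the major codon under ITE although GCG is the more frequent codon in HEGs; with an even background the two sets of weights coincide.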
Problem with CAI and a new ITE
AA  Codon  CFnon-HEG  CFHEG  w    Gene1  Gene2
A   GCA    20         40     2/3  10     20
    GCG    80         60     1    40     30
$CAI = e^{\frac{\sum_i F_i \ln(w_i)}{\sum_i F_i}}$
CAI1 = 0.9221; CAI2 = 0.8503
Wrong conclusions:
1. Excellent codon adaptation in the codon family (high CAI values)
2. Gene 1 has better codon adaptation than Gene 2.
AA  Codon  CFnon-HEG  CFHEG  pHEG  pnon-HEG  s     w      Gene1  Gene2
A   GCA    20         40     0.4   0.2       2     1      10     20
    GCG    80         60     0.6   0.8       0.75  0.375  40     30
E. coli data
$I_{TE} = e^{\frac{\sum_i F_i \ln(w_i)}{\sum_i F_i}}$
ITE.1 = 0.4563; ITE.2 = 0.5552
Correct conclusions:
1. Poor codon adaptation in the codon family (low ITE values)
2. Gene 2 has better codon adaptation than Gene 1.
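The CAI and ITE values reported on this slide can be reproduced with a minimal sketch. Both indices are computed as exp(Σ F_i ln w_i / Σ F_i), differing only in the weights. The codon counts assumed here for Gene 1 (GCA 10, GCG 40) and Gene 2 (GCA 20, GCG 30) are a reconstruction: they are the counts that reproduce the four reported index values.

```python
import math


def codon_index(gene_counts, w):
    """exp(sum(F_i * ln w_i) / sum(F_i)); used for both CAI and ITE."""
    total = sum(gene_counts.values())
    log_sum = sum(f * math.log(w[c]) for c, f in gene_counts.items())
    return math.exp(log_sum / total)


w_cai = {"GCA": 2 / 3, "GCG": 1.0}    # CAI weights from the slide
w_ite = {"GCA": 1.0, "GCG": 0.375}    # ITE weights from the slide

gene1 = {"GCA": 10, "GCG": 40}        # assumed codon counts
gene2 = {"GCA": 20, "GCG": 30}        # assumed codon counts

cai1, cai2 = codon_index(gene1, w_cai), codon_index(gene2, w_cai)
ite1, ite2 = codon_index(gene1, w_ite), codon_index(gene2, w_ite)

print(f"CAI: Gene1 = {cai1:.4f}, Gene2 = {cai2:.4f}")
print(f"ITE: Gene1 = {ite1:.4f}, Gene2 = {ite2:.4f}")
```

The two indices rank the genes in opposite order because they disagree on which codon is the major one: CAI treats the more frequent GCG as optimal, while ITE treats GCA as optimal once the background usage in non-HEGs is taken into account.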