Translation elongation, amino acid usage, and codon usage indices

Xuhua Xia
xxia@uottawa.ca
http://dambe.bio.uottawa.ca

Objectives
- Understand how amino acid and codon usage biases affect translation efficiency and gene expression
- Biomedical and biopharmaceutical relevance
  - Protein drug production in the pharmaceutical industry
  - Transgenic experiments in agriculture
- Factors affecting amino acid and codon usage bias
- Indices measuring codon usage bias
- Develop bioinformatic skills to study genomic codon usage

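Energetic Cost

| Amino acid | 1-letter code | Precursor metabolites | Energetic cost: ~P | Energetic cost: H | Total ~P |
|---|---|---|---|---|---|
| Ala | A | pyr | 1.0 | 5.3 | 11.7 |
| Cys | C | 3pg | 7.3 | 8.7 | 24.7 |
| Asp | D | oaa | 1.3 | 5.7 | 12.7 |
| Glu | E | αkg | 2.7 | 6.3 | 15.3 |
| Phe | F | 2 pep, eryP | 13.3 | 19.3 | 52.0 |
| Gly | G | | 2.3 | 4.7 | |
| His | H | penP | 20.3 | 9.0 | 38.3 |
| Ile | I | pyr, oaa | 4.3 | 14.0 | 32.3 |
| Lys | K | oaa, pyr | | 13.0 | 30.3 |
| Leu | L | 2 pyr, acCoA | | 12.3 | 27.3 |
| Met | M | oaa, Cys, −pyr | 9.7 | | 34.3 |
| Asn | N | | 3.3 | | 14.7 |
| Pro | P | | 3.7 | 8.3 | |
| Gln | Q | | | | 16.3 |
| Arg | R | | 10.7 | | |
| Ser | S | | | | |
| Thr | T | | | 7.7 | 18.7 |
| Val | V | 2 pyr | 2.0 | | 23.3 |
| Trp | W | 2 pep, eryP, PRPP, −pyr | 27.7 | | 74.3 |
| Tyr | Y | eryP, 2 pep | 18.3 | | 50.0 |

Hiroshi Akashi and Takashi Gojobori 2002, PNAS 99:3695–3700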

Numerical Prediction
Prediction: Usage of energetically expensive (and also rare) amino acids should decrease with gene expression. A large ~P/Copy should be associated with a small NumCopy, and a small ~P/Copy with a large NumCopy.
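To make ~P/Copy concrete, the average energetic cost per residue of a protein can be computed from the Total ~P column of the energetic cost table. The following is a minimal Python sketch, not the authors' procedure. It assumes a partial lookup table holding only the seven amino acids whose rows are complete in this transcript, and the two peptide strings are made-up illustrations, not real genes.

```python
# Total energetic cost (~P equivalents) per amino acid, copied from the
# energetic cost table (Akashi & Gojobori 2002). Only rows that are
# complete in this transcript are included, so the table is partial.
TOTAL_COST = {
    "A": 11.7, "C": 24.7, "D": 12.7, "E": 15.3,
    "F": 52.0, "H": 38.3, "I": 32.3,
}


def mean_cost_per_residue(protein: str) -> float:
    """Average ~P cost per residue.

    Raises KeyError for residues missing from the partial table.
    """
    return sum(TOTAL_COST[aa] for aa in protein) / len(protein)


# Hypothetical peptides, for illustration only.
cheap = "AADEAADE"    # built from inexpensive residues
costly = "FHFIFHFI"   # built from expensive residues

print("cheap peptide :", round(mean_cost_per_residue(cheap), 2))
print("costly peptide:", round(mean_cost_per_residue(costly), 2))
```

Under the prediction above, a gene whose product resembles the second peptide would be expected to have a small number of protein copies per cell.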

AA usage and tRNA abundance
[Figure: panels for Saccharomyces cerevisiae and Salmonella typhimurium.]
Xia, X. 1998. Genetics 149:37-44.

AA usage and tRNA gene copies
[Figure: amino acid frequency in 11 ssDNA coliphages (y-axis) against the number of tRNA genes in E. coli (x-axis). Points are labeled by amino acid: F, G, H, I, K, L1, L2, N, P, Q, R1, R2, S1, S2, T, V, W, Y. Regression: y = 231.88x + 244.93, r = 0.8426, p < 0.0001.]
Chithambaram, S. et al. 2014. Genetics 197:301-315.

Number of synonymous codons

Mutation bias
Amino acid usage in E. coli K12 (NC_000913) and S. cerevisiae (NC_001133-NC_001148) coding sequences. Amino acids encoded by AT-rich codons are in red, and those encoded by GC-rich codons are in bold blue. Yeast (Saccharomyces cerevisiae) is relatively more AT-rich (0.3090, 0.1917, 0.1913, and 0.3080 for A, C, G, and T, respectively).

| AA | Codon | Ecoli | Yeast | Ecoli% | Yeast% |
|---|---|---|---|---|---|
| Ala | GCT,GCC,GCA,GCG | 125332 | 160810 | 9.5527 | 5.4966 |
| Arg | CGT,CGC,CGA,CGG,AGA,AGG | 72502 | 130068 | 5.5260 | 4.4458 |
| Asn | AAT,AAC | 51075 | 179836 | 3.8929 | 6.1469 |
| Asp | GAT,GAC | 67349 | 171072 | 5.1333 | 5.8473 |
| Cys | TGT,TGC | 15188 | 37093 | 1.1576 | 1.2679 |
| Gln | CAA,CAG | 58360 | 115741 | 4.4481 | 3.9561 |
| Glu | GAA,GAG | 75786 | 191267 | 5.7763 | 6.5376 |
| Gly | GGT,GGC,GGA,GGG | 96701 | 145433 | 7.3705 | 4.9710 |
| His | CAT,CAC | 29751 | 63505 | 2.2676 | 2.1706 |
| Ile | ATT,ATC,ATA | 78845 | 191677 | 6.0095 | 6.5516 |
| Leu | TTG,TTA,CTT,CTC,CTA,CTG | 140571 | 277988 | 10.7142 | 9.5017 |
| Lys | AAA,AAG | 57620 | 214842 | 4.3917 | 7.3434 |
| Met | ATG | 37093 | 60672 | 2.8272 | 2.0738 |
| Phe | TTT,TTC | 51131 | 129516 | 3.8972 | 4.4269 |
| Pro | CCT,CCC,CCA,CCG | 58293 | 128177 | 4.4430 | 4.3811 |
| Ser | TCT,TCC,TCA,TCG,AGT,AGC | 75661 | 263096 | 5.7668 | 8.9927 |
| Thr | ACT,ACC,ACA,ACG | 70494 | 173084 | 5.3730 | 5.9161 |
| Trp | TGG | 20060 | 30387 | 1.5290 | 1.0386 |
| Tyr | TAT,TAC | 37134 | 98746 | 2.8303 | 3.3752 |
| Val | GTT,GTC,GTA,GTG | 93061 | 162642 | 7.0930 | 5.5592 |

Summary of AA usage
Selection:
- Energetic cost: mass-produced proteins should use cheap amino acids.
- Translation efficiency: mass-produced proteins should use abundant amino acids and amino acids carried by many tRNAs (the number of synonymous codons must be controlled for to evaluate this effect).
Mutation:
- Usage of AT-rich codons increases with AT-biased mutation.
- Usage of GC-rich codons increases with GC-biased mutation.

Codon Usage Bias
Observation: Strongly biased codon usage in a variety of species, ranging from viruses, mitochondria, and plastids to prokaryotes and eukaryotes.
Hypotheses:
- Differential mutation hypothesis, e.g., the transcriptional hypothesis of codon usage (Xia 1996, Genetics 144:1309-1320)
- Differential selection hypothesis, e.g., Xia 1998, Genetics 149:37-44
Predictions:
- From the mutation hypothesis: concordance between codon usage and mutation pressure.
- From the selection hypothesis: concordance between differential availability of tRNA and differential codon usage. The concordance is stronger in highly expressed genes than in lowly expressed genes (CAI is positively correlated with gene expression).
[Figure: RNA polymerase transcribing a polycistronic mRNA (Gene 1, Gene 2, Gene 3) that ribosomes translate into protein; two tRNA-Gly species are labeled UCC and GCC.]

Codon usage of HEGs in yeast
You may be wondering about the Cys codon family, which has 4 tRNAs matching UGC but none matching UGU. We would have predicted that UGC should be preferred, but the opposite is true. Why? One might think that, because Cys is rarely used, the codon family is not under selection, so that codon usage is at the mercy of mutation bias. Because the yeast genome is AT-biased, we would then expect U-ending codons to be more frequent than C-ending codons. Unfortunately, this explanation is wrong because 1) the mutation bias is not sufficient to account for the 3/39 ratio, and 2) the lowly expressed genes, which should be even more affected by mutation bias, do not exhibit a strong bias comparable to 3/9. This criticism also applies to another explanation, which states that the GCA anticodon can decode C-ending and U-ending codons equally well. In short, one should use UGU to code for Cys to improve translation efficiency, but we do not know why.
Xia 2007. Bioinformatics and the cell.

Major and minor codons
Major codon: the codon in a synonymous codon family that can be most efficiently translated in a species, typically with three associated properties:
- It is over-represented in highly expressed genes relative to lowly expressed genes.
- It corresponds to the most abundant isoacceptor tRNA.
- Replacing it with another codon leads to reduced translation efficiency (reduced protein production).
Minor codon: the opposite.
Their identification is NOT based on the codon frequencies of all coding sequences in a species.
Different species may have different major and minor codons in the same synonymous codon family.

Calculation of RSCU
RSCU and proportion: different scaling.
RSCU (Sharp et al. 1986) is codon-specific.
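The slide's formulas did not survive extraction. As defined by Sharp et al. (1986), the RSCU of a codon is its observed count divided by the mean count of all codons in its synonymous family, whereas the proportion divides by the family total. The two therefore differ only by a factor equal to the family size. The following minimal Python sketch illustrates this; the codon counts are hypothetical, and every codon in the family is assumed to be listed even if its count is zero.

```python
from typing import Dict


def rscu(family_counts: Dict[str, int]) -> Dict[str, float]:
    """RSCU = observed count / mean count within one synonymous family."""
    mean_count = sum(family_counts.values()) / len(family_counts)
    return {codon: n / mean_count for codon, n in family_counts.items()}


def proportion(family_counts: Dict[str, int]) -> Dict[str, float]:
    """Proportion = observed count / total count within the family."""
    total = sum(family_counts.values())
    return {codon: n / total for codon, n in family_counts.items()}


# Hypothetical counts for a 4-fold family (Ala), for illustration only.
ala_counts = {"GCU": 10, "GCC": 20, "GCA": 30, "GCG": 40}

rscu_values = rscu(ala_counts)
prop_values = proportion(ala_counts)

for codon in ala_counts:
    # RSCU equals the family size times the proportion.
    print(codon, round(rscu_values[codon], 3), round(prop_values[codon], 3))
```

An RSCU of 1 means no bias in a family of any size, whereas the unbiased proportion depends on the family size (0.5 for 2-fold, 0.25 for 4-fold families).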

Codon adaptation: E. coli & phage
[Figure: phage TLS RSCU (y-axis) against E. coli RSCU (x-axis). Regression: y = 0.4046x + 0.5954, R² = 0.672.]
Problem with mutation bias shared by both host and phage.

Calculation of CAI
N2,3,4: number of 2-, 3-, and 4-fold codon families. Compound 6- or 8-fold codon families should be broken into two codon families.
CAI is gene-specific, with 0 ≤ CAI ≤ 1.
CAI values computed with different reference sets are not comparable.
Problem with computing w as Fi/Fi.max: suppose an amino acid is rarely used in highly expressed genes. There is then little selection on it, and its codon usage might be close to even, with wi ≈ 1. If a lowly expressed gene happens to be made entirely of this amino acid, the CAI for this gene would be close to 1, which is misleading. There has been no good alternative; further research is needed.
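The pitfall described above can be reproduced with a few lines of Python. This is a sketch under stated assumptions: w is computed as Fi/Fi.max from a reference set of highly expressed genes, CAI is the frequency-weighted geometric mean of w over the codons of a gene, all reference counts are positive, and every count below is hypothetical.

```python
import math
from typing import Dict


def relative_adaptiveness(ref_family: Dict[str, int]) -> Dict[str, float]:
    """w = F_i / F_i.max within one codon family of the reference set."""
    f_max = max(ref_family.values())
    return {codon: f / f_max for codon, f in ref_family.items()}


def cai(gene_counts: Dict[str, int], w: Dict[str, float]) -> float:
    """Frequency-weighted geometric mean of w over the codons of a gene."""
    total = sum(gene_counts.values())
    log_sum = sum(n * math.log(w[codon]) for codon, n in gene_counts.items())
    return math.exp(log_sum / total)


# Hypothetical reference set (highly expressed genes).
w = {}
w.update(relative_adaptiveness({"UGU": 5, "UGC": 5}))    # rare amino acid, even usage
w.update(relative_adaptiveness({"GCU": 90, "GCC": 10}))  # common amino acid, strong bias

# Hypothetical lowly expressed gene made entirely of the rare amino acid.
rare_only_gene = {"UGU": 10, "UGC": 10}
# Hypothetical gene using the strongly biased family.
biased_family_gene = {"GCU": 15, "GCC": 5}

print("CAI, rare-amino-acid gene:", round(cai(rare_only_gene, w), 4))
print("CAI, biased-family gene  :", round(cai(biased_family_gene, w), 4))
```

The first gene reaches the maximum CAI even though its codon usage carries no evidence of adaptation, which is the misleading outcome the slide warns about.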

Weak mRNA predictive power
[Figure: protein abundance (y-axis) against mRNA abundance (x-axis), with ENO1 and FRS2 labeled. Regression: y = 5.6507x + 4.1367, R² = 0.1936.]

Effect of Codon Usage Bias
[Figure: protein abundance (y-axis) against codon usage bias (x-axis), with ENO1 and FRS2 labeled. Regression: y = 70.398x - 11.739, R² = 0.5668.]

Hypothesis and Predictions

| Amino acid | tRNA | G-ending codon | A-ending codon |
|---|---|---|---|
| Met | tRNAMet/CAU | AUG | AUA |
| Leu | tRNALeu/UAA | UUG | UUA |
| Glu | tRNAGlu/UUC | GAG | GAA |
| Lys | tRNALys/UUU | AAG | AAA |
| Gln | tRNAGln/UUG | CAG | CAA |
| Arg | tRNAArg/UCU | AGG | AGA |
| Trp | tRNATrp/UCA | UGG | UGA |

A-ending codons are favoured by both mutation and tRNA-mediated selection. AUA is favoured by mutation, but not by tRNA-mediated selection.
Predictions:
1. The proportion of A-ending codons (PNNA = NNNA/NNNG), or RSCU, should be smaller in the Met codon family than in other R-ending codon families.
2. Availability of tRNAMet/UAU should increase PAUA.
Xia et al. 2007

Testing prediction 1
Carullo, M. and Xia, X. 2008. J Mol Evol 66:484–493.

Testing prediction 2 Fig. 5. Relationship between PAUA and PUUA, highlighting the observation that PAUA is greater when both a tRNAMet/CAU and a tRNAMet/UAU are present than when only tRNAMet/CAU is present in the mtDNA, for bivalve species (a) and chordate species (b). The filled squares are for mtDNA containing both tRNAMet/CAU and tRNAMet/UAU genes, and the open triangles are for mtDNA without a tRNAMet/UAU gene.

Why a systems biology perspective?
Ronald A. Fisher made the point in his correct but somewhat awkward English: “No aphorism is more frequently repeated in connection with field trials, than that we must ask Nature few questions, or ideally, one question at a time. The writer is convinced that this view is wholly mistaken. Nature, he suggests, will respond to a logical and carefully thought-out questionnaire; indeed, if we ask her a single question, she will often refuse to answer until some other topic has been discussed.” (Ronald A. Fisher 1926. Journal of the Ministry of Agriculture of Great Britain 33: 503–513)
In short, you should consider everything that is relevant. Of course, his statement did not come out of a vacuum. By that time, a lot of data involving unbalanced experimental designs and multi-factor interactions had accumulated, and one is prone to draw wrong conclusions if one does not use balanced factorial designs and does not think broadly and critically. Here is one real data set to illustrate this point: Simpson’s paradox.

Simpson’s paradox

| | Treatment A | Treatment B |
|---|---|---|
| Small stones | 93% (81/87) | 87% (234/270) |
| Large stones | 73% (192/263) | 69% (55/80) |
| Pooled | 78% (273/350) | 83% (289/350) |

C. R. Charig et al. 1986. Br Med J (Clin Res Ed) 292 (6524): 879–882
Treatment A: all open procedures
Treatment B: percutaneous nephrolithotomy
Question: which treatment is better?
This example is from a study of the efficacy of two treatments for kidney stones. In the first cell, 87 is the total number of patients in this category, 81 is the number of successes, and 93% is the success rate. The other cells read the same way. As we can see clearly, treatment A is more efficacious than treatment B in both the “Small stones” group and the “Large stones” group. However, if we pool these two groups together, treatment B has a greater success rate than treatment A. We would thus draw a wrong conclusion if we failed to consider the confounding effect of stone size.
But can we now conclude that treatment A is better than treatment B? Such a conclusion would be highly significant because it could guide our choice of treatment if we happen to have a kidney stone. Unfortunately, we cannot draw this conclusion because the success rate of both treatments changes over time. We can only say that treatment A was better than treatment B at the time of data collection, which cannot provide us any guidance today. Such a conclusion, albeit scientifically correct, seems quite useless and trivial. You see that a correct conclusion is often trivial, and a potentially wrong generalization that treatment A is better than treatment B appears much more significant. So if you want your conclusions to be highly significant, don’t be too correct, because they will then be trivial.
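The reversal can be checked directly from the counts in the table. A minimal Python sketch, using only the success and patient counts reported by Charig et al. (1986) as quoted above; the variable and function names are illustrative:

```python
# (successes, patients) for each treatment and stone size, from the table.
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def rate(successes: int, patients: int) -> float:
    return successes / patients


def pooled_rate(treatment: str) -> float:
    """Success rate after pooling small and large stones."""
    successes = sum(s for s, _ in data[treatment].values())
    patients = sum(n for _, n in data[treatment].values())
    return successes / patients


for size in ("small", "large"):
    a = rate(*data["A"][size])
    b = rate(*data["B"][size])
    print(f"{size:5s} stones: A = {a:.3f}, B = {b:.3f}")

print(f"pooled      : A = {pooled_rate('A'):.3f}, B = {pooled_rate('B'):.3f}")
```

Treatment A wins within each stratum but loses after pooling because it was applied mostly to large stones (263 of 350 patients), which are harder to treat, while treatment B was applied mostly to small stones (270 of 350).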

RSCU (HIV-1 vs Human)
[Figure, panel (a): RSCU (HIV-1) on the y-axis against RSCU (Human) on the x-axis; the legend distinguishes A-ending, C-ending, G-ending, and U-ending codons; labeled amino acids: E, G, I, K, L, P, Q, R, S, T, V.]
Fig. 1. Relative synonymous codon usage (RSCU) of HIV-1 compared to RSCU of highly expressed human genes. Data points for codons ending with A, C, G or U are annotated with different combinations of colors and symbols. A-ending codons exhibit strong discordance in their usage between HIV-1 and human and are annotated with their coded amino acids.
van Weringh et al. 2011. MBE.

RSCU (HTLV-1 vs Human)
Relative synonymous codon usage (RSCU) of HTLV-1 compared to RSCU of highly expressed human genes. Data points for codons ending with A, C, G or U are annotated with different combinations of colors and symbols. A-ending codons exhibit strong discordance in their usage between HIV-1 and human and are annotated with their coded amino acids.

Differential adaptation: early & late genes
The conventional codon-anticodon prediction
The new one
The HIV-1 Rev response element (RRE) is a highly structured, ~350 nucleotide RNA segment present in the Env coding region of unspliced and partially spliced viral mRNAs. In the presence of the HIV-1 accessory protein Rev, HIV-1 mRNAs that contain the RRE can be exported from the nucleus to the cytoplasm for downstream events such as translation and virion packaging.

Any problem with the mutation hypothesis?
Table 2. Frequency of A residues, length and codon adaptation index (CAI) for the three HIV-1 early (tat, rev and nef) and five late (gag-pol, vif, vpu, vpr, and env) coding sequences (CDS).

| Gene | CDS (bp) | CAI |
|---|---|---|
| tat | 261 | 0.66875 |
| rev | 351 | 0.66211 |
| nef | 621 | 0.67523 |
| gag | 1503 | 0.62784 |
| pol | 3012 | 0.58139 |
| vif | 579 | 0.61941 |
| vpr | 291 | 0.64272 |
| vpu | 249 | 0.49068 |
| env | 2571 | 0.61924 |

CAI values may change depending on what reference set of highly expressed genes is used, but the relative magnitude should be maintained (unless the reference set is not of highly expressed genes).
van Weringh et al. 2011. Molecular Biology and Evolution 28:1827-1834.

tRNA
Here we have two variables that need some explanation. Icodon measures the deviation of HIV-1 codon usage from human codon usage, and ItRNA measures the selective enrichment of tRNA. For example, human genes use few AUA codons for Ile, and human cells have few tRNAs with the UAU anticodon. HIV-1, in contrast, uses many AUA codons. One would expect that, if there is any selective tRNA enrichment, the tRNA for AUA should be particularly enriched, which is true. These two variables are significantly and positively correlated.
van Weringh et al. 2011. MBE.

I/A wobble pair is error-prone

Translation rate & codon adaptation
Kudla et al. (2009, Science) engineered a synthetic library of 154 genes, all encoding the same protein but differing in degrees of codon adaptation, to quantify the effect of differential codon usage on protein production in E. coli. They concluded that “codon bias did not correlate with gene expression” and that “translation initiation, not elongation, is rate-limiting for gene expression”.

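Problem with CAI and a new ITE

Identification of major and minor codons:

| AA | Codon | CFnon-HEG | CFHEG | tRNA |
|---|---|---|---|---|
| A | GCA | 20 | 40 | 3 |
| A | GCG | 80 | 60 | |

Codon weights under CAI and ITE:

| AA | Codon | CFnon-HEG | CFHEG | w (CAI) | pHEG | pnon-HEG | s | w (ITE) |
|---|---|---|---|---|---|---|---|---|
| A | GCA | 20 | 40 | 2/3 | 0.4 | 0.2 | 2 | 1 |
| A | GCG | 80 | 60 | 1 | 0.6 | 0.8 | 0.75 | 0.375 |

Here s = pHEG/pnon-HEG, and the ITE weight w is s divided by the largest s in the codon family.
CAI is a special case of ITE (when there is no background codon usage bias): with CFnon-HEG = 50 for GCG, pnon-HEG = 0.5 and s = 1.2, so that GCA has s = 0.4/0.5 = 0.8 and w = 0.8/1.2 = 2/3, the same as its CAI weight.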

Problem with CAI and a new ITE

CAI:

| AA | Codon | CFnon-HEG | CFHEG | w | Gene1 | Gene2 |
|---|---|---|---|---|---|---|
| A | GCA | 20 | 40 | 2/3 | 10 | 20 |
| A | GCG | 80 | 60 | 1 | 40 | 30 |

$$\mathrm{CAI} = \exp\left(\frac{\sum_i F_i \ln w_i}{\sum_i F_i}\right)$$

CAI1 = 0.9221; CAI2 = 0.8503
Wrong conclusions:
1. Excellent codon adaptation in the codon family (high CAI values).
2. Gene1 has better codon adaptation than Gene2.

ITE:

| AA | Codon | CFnon-HEG | CFHEG | pHEG | pnon-HEG | s | w | Gene1 | Gene2 |
|---|---|---|---|---|---|---|---|---|---|
| A | GCA | 20 | 40 | 0.4 | 0.2 | 2 | 1 | 10 | 20 |
| A | GCG | 80 | 60 | 0.6 | 0.8 | 0.75 | 0.375 | 40 | 30 |

E. coli data

$$I_{TE} = \exp\left(\frac{\sum_i F_i \ln w_i}{\sum_i F_i}\right)$$

ITE.1 = 0.4563; ITE.2 = 0.5552
Correct conclusions:
1. Poor codon adaptation in the codon family (low ITE values).
2. Gene2 has better codon adaptation than Gene1.
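The four index values on this slide can be recomputed with a short Python sketch. It assumes that CAI weights are CFHEG divided by the family maximum, that ITE weights are s = pHEG/pnon-HEG divided by the family maximum of s, and that both indices are the same frequency-weighted geometric mean of w. The Gene1 and Gene2 codon counts used here (GCA 10 / GCG 40 and GCA 20 / GCG 30) are an assumption chosen to be consistent with the reported CAI and ITE values; the function names are illustrative.

```python
import math
from typing import Dict

# Codon frequencies in highly expressed genes (HEG) and in other genes.
CF_HEG = {"GCA": 40, "GCG": 60}
CF_NON_HEG = {"GCA": 20, "GCG": 80}


def cai_weights(heg: Dict[str, int]) -> Dict[str, float]:
    """CAI weight: frequency in HEGs scaled by the family maximum."""
    f_max = max(heg.values())
    return {c: f / f_max for c, f in heg.items()}


def ite_weights(heg: Dict[str, int], non_heg: Dict[str, int]) -> Dict[str, float]:
    """ITE weight: HEG proportion relative to non-HEG proportion, scaled by the maximum."""
    heg_total = sum(heg.values())
    non_total = sum(non_heg.values())
    s = {c: (heg[c] / heg_total) / (non_heg[c] / non_total) for c in heg}
    s_max = max(s.values())
    return {c: value / s_max for c, value in s.items()}


def adaptation_index(gene: Dict[str, int], w: Dict[str, float]) -> float:
    """exp(sum(F_i * ln w_i) / sum(F_i)), shared by CAI and ITE."""
    total = sum(gene.values())
    return math.exp(sum(n * math.log(w[c]) for c, n in gene.items()) / total)


# Assumed codon counts of the two example genes (see the lead-in).
gene1 = {"GCA": 10, "GCG": 40}
gene2 = {"GCA": 20, "GCG": 30}

w_cai = cai_weights(CF_HEG)
w_ite = ite_weights(CF_HEG, CF_NON_HEG)

for name, gene in (("Gene1", gene1), ("Gene2", gene2)):
    print(name,
          "CAI =", round(adaptation_index(gene, w_cai), 4),
          "ITE =", round(adaptation_index(gene, w_ite), 4))
```

The ranking of the two genes flips between the indices because CAI treats GCG as the major codon (it is more frequent in HEGs), whereas ITE treats GCA as the major codon (it is enriched in HEGs relative to the background).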