CAI and the most biased genes. Zinovyev Andrei, Institut des Hautes Études Scientifiques.

1 CAI and the most biased genes. Zinovyev Andrei, Institut des Hautes Études Scientifiques

2 Introduction: For bacterial genomes, the main source of heterogeneity of the genetic text is the signal corresponding to the presence of coding information. It can be measured by the mutual information in three consecutive letters, where f_ijk is the frequency of triplet ijk and f_i is the frequency of letter i.
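The formula on this slide did not survive in the transcript. A minimal sketch, assuming the usual form I = Σ f_ijk · ln(f_ijk / (f_i · f_j · f_k)) over non-overlapping triplets in one reading frame; the function name, the `frame` parameter, and the choice of natural logarithm are illustrative assumptions, not taken from the slides.

```python
from collections import Counter
from math import log


def triplet_mutual_information(seq, frame=0):
    """Mutual information in three consecutive letters (assumed form).

    I = sum over triplets ijk of f_ijk * ln(f_ijk / (f_i * f_j * f_k)),
    where f_ijk is the triplet frequency and f_i the letter frequency.
    """
    seq = seq.upper()
    # Non-overlapping triplets read in the chosen frame.
    triplets = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    # Skip triplets containing ambiguous letters such as N.
    triplets = [t for t in triplets if set(t) <= set("ACGT")]
    if not triplets:
        raise ValueError("sequence contains no complete ACGT triplet")

    n = len(triplets)
    triplet_counts = Counter(triplets)
    letter_counts = Counter("".join(triplets))
    total_letters = 3 * n

    mi = 0.0
    for triplet, count in triplet_counts.items():
        f_ijk = count / n
        expected = 1.0
        for letter in triplet:
            expected *= letter_counts[letter] / total_letters
        mi += f_ijk * log(f_ijk / expected)
    return mi
```

A text whose triplets are no more frequent than their letter composition predicts gives a value of zero; a strongly periodic text gives a large value.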

3 Example: codon bias in E. coli. Panels: overall codon usage; highly expressed genes.

4 Different types of codon bias: translational (mainly in fast-growing bacteria); GC-rich (or AT-rich) codons are preferred; codons with G or C (or A or T) in the 3rd position are preferred; influenced by GC-skew, (G-C)/(G+C), or AT-skew; influenced by strand (leading or lagging); codon bias connected with genes from other organisms (horizontally transferred).
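Two of the quantities named on this slide are simple to compute. The sketch below assumes a plain DNA string as input; the function names `gc_skew` and `gc3` are illustrative. Note that `gc3` counts every third position, whereas the GC3s statistic mentioned later in the talk is usually restricted to synonymous third positions.

```python
def gc_skew(seq):
    """GC-skew (G - C) / (G + C) of a DNA sequence."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    if g + c == 0:
        raise ValueError("sequence contains no G or C")
    return (g - c) / (g + c)


def gc3(cds):
    """Fraction of G or C at the third position of each complete codon."""
    third_positions = cds.upper()[2::3]
    if not third_positions:
        raise ValueError("sequence is shorter than one codon")
    gc = third_positions.count("G") + third_positions.count("C")
    return gc / len(third_positions)
```

AT-skew can be computed the same way by swapping in A and T.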

5 Questions: How is the codon usage of different genes in different genomes organized? How can codon bias be described quantitatively? How can the main source of codon bias be detected?

6 Qualitative study of codon usage: Every gene can be described by its codon frequencies, a vector with 64 components (59 of which are informative for studying codon bias). PCA (principal component analysis) and CA (correspondence analysis) are the most common techniques for exploratory study of codon usage. Points that lie close together correspond to genes with similar codon usage.
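A minimal sketch of the per-gene vector, assuming the standard genetic code. The 59 informative codons are the 64 minus the three stop codons and the single codons for Met (ATG) and Trp (TGG), which offer no synonymous choice. The names `INFORMATIVE_CODONS` and `codon_vector` are illustrative; the PCA or CA step itself is not shown and would be applied to the matrix of such vectors, one row per gene.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Drop stop codons (*) and the single-codon amino acids Met (M) and Trp (W).
INFORMATIVE_CODONS = sorted(c for c, aa in CODON_TABLE.items() if aa not in "*MW")


def codon_vector(cds):
    """Return the 59 codon frequencies of a coding sequence, in sorted codon order."""
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    total = sum(counts[c] for c in INFORMATIVE_CODONS)
    if total == 0:
        raise ValueError("no informative codons in sequence")
    return [counts[c] / total for c in INFORMATIVE_CODONS]
```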

7 Common pattern of fast-growing bacteria (figure with regions I, II, III, IV). Legend: genes of class I (most genes); genes of class II (highly expressed); genes of class III (unusual); genes of class IV (hydrophobic).

8 Typical case of fast-growing bacterium: Bacillus subtilis. Legend: genes of class I (most genes); genes of class II (highly expressed); genes of class III (unusual); genes of class IV (hydrophobic).

9 Escherichia coli. Legend: genes of class I (most genes); genes of class II (highly expressed); genes of class III (unusual); genes of class IV (hydrophobic).

10 Lower-eukaryotic organism: Saccharomyces cerevisiae. Legend: genes of class I (most genes); genes of class II (highly expressed); genes of class III (unusual); genes of class IV (hydrophobic).

11 Higher-eukaryotic organism: Caenorhabditis elegans. Legend: genes of class I (most genes); genes of class II (highly expressed); genes of class III (unusual); genes of class IV (hydrophobic).

12 Slow-growing bacterium: Helicobacter pylori. Legend: genes of class I (most genes); genes of class IV (hydrophobic).

13 Slow-growing bacterium: Borrelia burgdorferi. Legend: leading strand; lagging strand.

14 Some conclusions: sources of sequence heterogeneity are hydrophobicity; evolutionary pressure (translational bias); horizontal transfer; different GC (AT) content; strand heterogeneity.

15 Quantitative measures of bias: effective number of codons N_c; Relative Synonymous Codon Usage (RSCU); Relative Codon Adaptiveness, in the range [0..1].
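The formulas for these measures are missing from the transcript. The sketch below assumes the commonly used definitions: RSCU is a codon's observed count divided by the mean count of its synonymous family, and relative adaptiveness is RSCU divided by the largest RSCU in that family. The function names are illustrative, and N_c is not covered here.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(codon_counts):
    """RSCU per codon, for amino acids that occur at least once."""
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)
    result = {}
    for codons in families.values():
        total = sum(codon_counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid not observed: RSCU undefined
        mean = total / len(codons)
        for c in codons:
            result[c] = codon_counts.get(c, 0) / mean
    return result


def relative_adaptiveness(codon_counts):
    """RSCU scaled by the best synonymous codon, so values lie in [0, 1]."""
    values = rscu(codon_counts)
    best = defaultdict(float)
    for codon, value in values.items():
        aa = CODON_TABLE[codon]
        best[aa] = max(best[aa], value)
    return {codon: value / best[CODON_TABLE[codon]] for codon, value in values.items()}
```

For example, with three GCT and one GCC observed for alanine, GCT gets RSCU 3 and adaptiveness 1, while GCC gets RSCU 1 and adaptiveness 1/3.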

16 Codon Adaptation Index (CAI): codon bias with respect to some small set of genes (the Reference Set). f_i is the frequency of codon i, calculated over the reference set S; L is the number of all codons in a gene; g_i is the frequency of codon i in a gene.
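The CAI formula itself is not in the transcript. A minimal sketch, assuming the widely used definition: w_i = f_i / max f_j over the synonymous codons of i, and CAI as the geometric mean of w over the L codons of the gene, which equals exp(Σ g_i ln w_i). The `floor` value that replaces zero weights, the exclusion of Met, Trp and stop codons, and all function names are assumptions made for illustration.

```python
from collections import Counter
from itertools import product
from math import exp, log

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_codons(cds):
    """Count codons of a coding sequence, skipping stop, Met and Trp codons."""
    cds = cds.upper()
    codons = (cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    return Counter(c for c in codons if CODON_TABLE.get(c, "*") not in "*MW")


def weights(reference_genes, floor=0.01):
    """w_i = f_i / max f_j over synonymous codons, with f counted over the reference set."""
    counts = Counter()
    for cds in reference_genes:
        counts.update(count_codons(cds))
    w = {}
    for aa in set(AMINO) - set("*MW"):
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        best = max(counts[c] for c in family)
        for c in family:
            # Assumption: unseen codons get a small floor so log() stays defined;
            # families absent from the reference set are treated as neutral.
            w[c] = max(counts[c] / best, floor) if best else 1.0
    return w


def cai(cds, w):
    """Geometric mean of w over the L codons of the gene."""
    counts = count_codons(cds)
    length = sum(counts.values())  # L
    if length == 0:
        raise ValueError("no scorable codons in sequence")
    return exp(sum(n * log(w[c]) for c, n in counts.items()) / length)
```

A gene that uses only the preferred codon of every amino acid scores 1; each use of a less frequent synonymous codon pulls the score down.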

17 An expert chooses the Reference Set: ribosomal proteins; elongation factors; glycolytic proteins; …

18 Problems: The functions of genes need to be known. The expert needs to know the type of codon bias already (otherwise the results will be meaningless). The genes in the Reference Set may not have the highest CAIs. Instead, we use as the Reference Set the genes most biased with respect to the dominating codon bias, which is not necessarily translational.

19 The most biased set of genes S_R: Calculate the CAI (with w_i calculated over S_R) for every gene in the genome. S_R is a most biased set if every gene in S_R then has a higher CAI than any gene not in S_R. One genome can have several such sets S_R, each reflecting the presence of some type of codon bias.
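The defining property of S_R can be checked directly once the CAIs are available. A small sketch, assuming the CAIs were already computed with w_i taken over the candidate set; the function name and the dictionary layout (gene identifier to CAI) are hypothetical.

```python
def is_most_biased_set(cai_by_gene, candidate_ids):
    """True if every gene in the candidate set has a higher CAI than every gene outside it.

    cai_by_gene: dict mapping gene id -> CAI computed with weights from the candidate set.
    candidate_ids: ids of the genes forming the candidate set S_R.
    """
    candidate_ids = set(candidate_ids)
    inside = [v for gene, v in cai_by_gene.items() if gene in candidate_ids]
    outside = [v for gene, v in cai_by_gene.items() if gene not in candidate_ids]
    if not inside or not outside:
        return False  # the property is only meaningful for a proper, non-empty subset
    return min(inside) > max(outside)
```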

20 Algorithm for detecting dominating codon bias:
1. Calculate w_i over 100% of the genes, and the CAIs for all genes.
2. Select the 50% of genes with the highest CAIs, calculate w_i over them, and recalculate the CAIs.
3. Select the 25% of genes with the highest CAIs, calculate w_i over them, and recalculate the CAIs.
…
Once the selection reaches 1% of the genes or less, repeat with 1% until convergence.
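The halving procedure can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the CAI definition, the `floor` for unseen codons, the `max_iter` safeguard, the tie-breaking by input order, and all function names are choices made here. Convergence is taken to mean that the selected 1% set stops changing.

```python
from collections import Counter
from itertools import product
from math import exp, log

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_codons(cds):
    """Count codons, skipping stop, Met and Trp codons."""
    cds = cds.upper()
    codons = (cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    return Counter(c for c in codons if CODON_TABLE.get(c, "*") not in "*MW")


def weights(reference_genes, floor=0.01):
    """w_i = f_i / max f_j over synonymous codons (floor for unseen codons is an assumption)."""
    counts = Counter()
    for cds in reference_genes:
        counts.update(count_codons(cds))
    w = {}
    for aa in set(AMINO) - set("*MW"):
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        best = max(counts[c] for c in family)
        for c in family:
            w[c] = max(counts[c] / best, floor) if best else 1.0
    return w


def cai(cds, w):
    """Geometric mean of w over the codons of the gene."""
    counts = count_codons(cds)
    length = sum(counts.values())
    if length == 0:
        raise ValueError("no scorable codons in sequence")
    return exp(sum(n * log(w[c]) for c, n in counts.items()) / length)


def dominant_reference_set(genes, final_fraction=0.01, max_iter=100):
    """Halve the reference set (100%, 50%, 25%, ...) down to final_fraction,
    then repeat at that size until the selected set no longer changes.

    genes: dict mapping gene id -> coding sequence.
    Returns (reference set ids, CAI per gene from the last weights used).
    """
    ids = list(genes)
    reference = set(ids)  # step 1 starts from 100% of the genes
    fraction = 1.0
    scores = {}
    for _ in range(max_iter):
        w = weights([genes[i] for i in reference])
        scores = {i: cai(genes[i], w) for i in ids}
        fraction = max(fraction / 2, final_fraction)
        size = max(1, round(fraction * len(ids)))
        ranked = sorted(ids, key=lambda i: scores[i], reverse=True)
        selected = set(ranked[:size])
        if fraction <= final_fraction and selected == reference:
            break  # converged: the final-size set reproduces itself
        reference = selected
    return reference, scores
```

On a real genome `final_fraction` would stay at the 1% stated on the slide; a larger value is only useful for toy inputs with a handful of genes.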

21 Example: Bacillus subtilis

22 How it works for fast-growers (figure label: reference set)

23 Dominating bias, connected with translation

24 Dominating bias, connected with GC3s

25 Dominating bias, connected with strand

26 Example of non-dominating bias: genes in class III (possibly horizontally transferred genes) of Bacillus subtilis. We can detect and measure this bias by finding the most biased genes in class III with an analog of the proposed algorithm.

27 Reference: A. Carbone, A. Zinovyev, F. Képès, “Codon Adaptation Index as a measure of dominating codon bias”, preprint of Institut des Hautes Études Scientifiques, 2003. http://www.ihes.fr/~materials

