CAI and the most biased genes
Andrei Zinovyev, Institut des Hautes Études Scientifiques


Introduction
For bacterial genomes, the main source of heterogeneity in the genetic text is the signal corresponding to the presence of coding information.
Mutual information in three consecutive letters: $I = \sum_{i,j,k} f_{ijk} \ln \frac{f_{ijk}}{f_i f_j f_k}$, where $f_{ijk}$ is the frequency of triplet $ijk$ and $f_i$ is the frequency of letter $i$.
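As a rough illustration, the mutual information of three consecutive letters can be estimated directly from a sequence. The sketch below assumes the standard form, the sum over triplets of f_ijk * ln(f_ijk / (f_i * f_j * f_k)), with natural logarithms; the function name `triplet_mutual_information` and the `step` parameter are hypothetical and do not come from the slides.

```python
import math
from collections import Counter


def triplet_mutual_information(seq, step=1):
    """Estimate sum_ijk f_ijk * ln(f_ijk / (f_i * f_j * f_k)).

    step=1 scans overlapping triplets; step=3 scans a single reading frame.
    Letter frequencies f_i are taken over the whole sequence (an assumption).
    """
    seq = seq.upper()
    triplets = [seq[p:p + 3] for p in range(0, len(seq) - 2, step)]
    triplet_counts = Counter(triplets)
    letter_counts = Counter(seq)
    total = 0.0
    for triplet, count in triplet_counts.items():
        f_ijk = count / len(triplets)
        # Expected triplet frequency if the three letters were independent.
        expected = 1.0
        for letter in triplet:
            expected *= letter_counts[letter] / len(seq)
        total += f_ijk * math.log(f_ijk / expected)
    return total
```

A sequence with strong triplet structure gives a large value, while a homopolymer gives zero, since its only triplet is exactly as frequent as independence predicts.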

Example: codon bias in E. coli
Figure panels: overall codon usage; highly expressed genes

Different types of codon bias
- Translational (mainly in fast-growing bacteria)
- GC-rich (or AT-rich) codons are preferred
- Codons with G or C (or with A or T) in the 3rd position are preferred
- Influenced by GC-skew, (G−C)/(G+C), or AT-skew
- Influenced by strand (leading or lagging)
- Codon bias connected with genes from other organisms (horizontally transferred)
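Two of the quantities named above are simple to compute. The sketch below reads the skew as (G−C)/(G+C) and measures third-position GC content as the fraction of complete codons ending in G or C; the function names `gc_skew` and `gc3_content` are hypothetical.

```python
def gc_skew(seq):
    """Return (G - C) / (G + C); 0.0 when the sequence has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def gc3_content(cds):
    """Fraction of complete codons whose third base is G or C."""
    # Every third base starting at index 2 is the last base of a complete codon.
    third_bases = cds.upper()[2::3]
    if not third_bases:
        return 0.0
    return sum(base in "GC" for base in third_bases) / len(third_bases)
```

The AT-skew can be obtained the same way by counting A and T instead of G and C.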

Questions
- How is the codon usage of different genes in different genomes organized?
- How can codon bias be described quantitatively?
- How can the main source of codon bias be detected?

Qualitative study of codon usage
- Every gene can be described by its codon frequencies: a vector with 64 components (59 are of interest for studying codon bias, since the 3 stop codons and the single codons for Met and Trp carry no synonymous choice)
- PCA (principal component analysis) and CA (correspondence analysis) are the most common techniques for exploratory study of codon usage
- Close points correspond to genes with similar codon usage
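A minimal sketch of the per-gene description: each coding sequence is turned into a 59-component vector of codon frequencies, assuming the standard genetic code. The names `INFORMATIVE` and `codon_frequency_vector` are hypothetical. A matrix of such vectors, one row per gene, is the kind of input PCA or CA would then take (the projection itself is not shown).

```python
from collections import Counter

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
GENETIC_CODE = dict(zip(CODONS, AMINO))

# Drop the 3 stop codons and the single-codon amino acids Met (M) and Trp (W).
INFORMATIVE = [c for c in CODONS if GENETIC_CODE[c] not in "*MW"]
INFORMATIVE_SET = set(INFORMATIVE)


def codon_frequency_vector(cds):
    """Return the 59 codon frequencies of a coding sequence, summing to 1."""
    cds = cds.upper()
    codons = (cds[p:p + 3] for p in range(0, len(cds) - 2, 3))
    counts = Counter(c for c in codons if c in INFORMATIVE_SET)
    total = sum(counts.values())
    return [counts[c] / total if total else 0.0 for c in INFORMATIVE]
```

Genes with similar codon usage give nearby vectors, which is what makes the points in a PCA or CA plot cluster.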

Common pattern of fast-growing bacteria
Figure: four clusters of genes, labeled I, II, III, and IV
Figure legend: genes of class I (most genes), class II (highly expressed), class III (unusual), class IV (hydrophobic)

Typical case of a fast-growing bacterium: Bacillus subtilis
Figure legend: genes of class I (most genes), class II (highly expressed), class III (unusual), class IV (hydrophobic)

Escherichia coli
Figure legend: genes of class I (most genes), class II (highly expressed), class III (unusual), class IV (hydrophobic)

Lower-eukaryotic organism: Saccharomyces cerevisiae
Figure legend: genes of class I (most genes), class II (highly expressed), class III (unusual), class IV (hydrophobic)

Higher-eukaryotic organism: Caenorhabditis elegans
Figure legend: genes of class I (most genes), class II (highly expressed), class III (unusual), class IV (hydrophobic)

Slow-growing bacterium: Helicobacter pylori
Figure legend: genes of class I (most genes), class IV (hydrophobic)

Slow-growing bacterium: Borrelia burgdorferi
Figure legend: leading strand, lagging strand

Some conclusions: sources of sequence heterogeneity
- Hydrophobicity
- Evolutionary pressure (translational bias)
- Horizontal transfer
- Different GC (AT) content
- Strand heterogeneity

Quantitative measures of bias
- Effective number of codons, $N_c$
- Relative Synonymous Codon Usage (RSCU)
- Relative codon adaptiveness, with values in [0, 1]
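Two of these measures can be sketched briefly. The code below assumes the commonly used definitions, which the transcript itself does not spell out: RSCU is a codon's count divided by the mean count of its synonymous family, and relative adaptiveness is the count divided by the largest count in the family. It assumes the standard genetic code; all function names are hypothetical. The effective number of codons is not shown.

```python
from collections import Counter, defaultdict

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
GENETIC_CODE = dict(zip(CODONS, AMINO))

# Group codons into synonymous families, leaving out stop codons.
SYNONYMS = defaultdict(list)
for codon, amino_acid in GENETIC_CODE.items():
    if amino_acid != "*":
        SYNONYMS[amino_acid].append(codon)


def count_codons(cds):
    """Count in-frame codons of one coding sequence."""
    cds = cds.upper()
    return Counter(cds[p:p + 3] for p in range(0, len(cds) - 2, 3))


def rscu(counts):
    """Observed count divided by the mean count of the synonymous family."""
    values = {}
    for family in SYNONYMS.values():
        mean = sum(counts.get(c, 0) for c in family) / len(family)
        for c in family:
            values[c] = counts.get(c, 0) / mean if mean else 0.0
    return values


def relative_adaptiveness(counts):
    """Observed count divided by the largest count in the family, in [0, 1]."""
    values = {}
    for family in SYNONYMS.values():
        top = max(counts.get(c, 0) for c in family)
        for c in family:
            values[c] = counts.get(c, 0) / top if top else 0.0
    return values
```

With these definitions an RSCU of 1 means a codon is used exactly as often as expected under uniform synonymous usage, and the most used codon of each family always has adaptiveness 1.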

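Codon Adaptation Index (CAI)
Codon bias measured with respect to a small set of genes (the Reference Set):
$w_i = f_i / \max_j f_j$ (maximum over the codons $j$ synonymous with $i$), $\mathrm{CAI} = \left( \prod_i w_i^{g_i} \right)^{1/L}$
- $f_i$: frequency of codon $i$, calculated over the reference set $S$
- $L$: number of all codons in a gene
- $g_i$: frequency (number of occurrences) of codon $i$ in a gene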
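A minimal sketch of a CAI computation: the geometric mean, over the codons of a gene, of weights derived from a reference set. Several details are assumptions not stated on the slide: the standard genetic code, a floor of 0.01 for codons absent from the reference set, a weight of 1 for families the reference set never uses, and the exclusion of stop codons. The names `adaptiveness` and `cai` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
SYNONYMS = defaultdict(list)
for codon, amino_acid in zip(CODONS, AMINO):
    if amino_acid != "*":
        SYNONYMS[amino_acid].append(codon)


def count_codons(sequences):
    """Count in-frame codons over a list of coding sequences."""
    counts = Counter()
    for cds in sequences:
        cds = cds.upper()
        counts.update(cds[p:p + 3] for p in range(0, len(cds) - 2, 3))
    return counts


def adaptiveness(reference_genes, floor=0.01):
    """Weights w_i = f_i / max f_j over synonymous codons of the reference set."""
    f = count_codons(reference_genes)
    w = {}
    for family in SYNONYMS.values():
        top = max(f[c] for c in family)
        for c in family:
            # Assumption: unused families are neutral, unused codons get a floor.
            w[c] = max(f[c] / top, floor) if top else 1.0
    return w


def cai(gene, w):
    """Geometric mean of w over the sense codons of one gene."""
    g = count_codons([gene])
    length = sum(n for c, n in g.items() if c in w)
    if length == 0:
        return 0.0
    log_sum = sum(n * math.log(w[c]) for c, n in g.items() if c in w)
    return math.exp(log_sum / length)
```

For example, with a reference set that uses GCT three times as often as GCC, a gene made only of GCT scores 1, and a gene with one GCT and one GCC scores the square root of 1/3.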

An expert chooses the Reference Set
- Ribosomal proteins
- Elongation factors
- Glycolytic proteins
- …

Problems:
- The functions of the genes need to be known
- The expert needs to know the type of codon bias in advance (otherwise the results will be meaningless)
- The genes in the Reference Set may not have the highest CAIs
As the Reference Set, we use the genes that are most biased with respect to the dominating codon bias, which is not necessarily translational.

The most biased set of genes $S_R$
- Calculate CAI (with $w_i$ calculated over $S_R$) for every gene in the genome
- Then every gene in $S_R$ has a higher CAI than any gene that is not in $S_R$
- One genome can have several sets $S_R$, each reflecting the presence of some type of codon bias

Algorithm for detecting dominating codon bias
1. Calculate $w_i$ over 100% of the genes, and CAIs for all genes
2. Select the 50% of genes with the highest CAIs, calculate $w_i$, and recalculate CAIs
3. Select the 25% of genes with the highest CAIs, calculate $w_i$, and recalculate CAIs
…
Once the selection would be 1% of the genes or less, repeat with 1% until convergence.
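The steps above can be sketched as a halving loop. This is an illustration under stated assumptions, not the authors' implementation: convergence is taken to mean that the selected set stops changing, codons absent from the current set get a floor weight of 0.01, and the standard genetic code is used. All names, including `dominating_bias_reference_set`, are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
SYNONYMS = defaultdict(list)
for codon, amino_acid in zip(CODONS, AMINO):
    if amino_acid != "*":
        SYNONYMS[amino_acid].append(codon)


def count_codons(sequences):
    """Count in-frame codons over a list of coding sequences."""
    counts = Counter()
    for cds in sequences:
        cds = cds.upper()
        counts.update(cds[p:p + 3] for p in range(0, len(cds) - 2, 3))
    return counts


def adaptiveness(reference_genes, floor=0.01):
    """Weights w_i relative to the most used synonymous codon in the set."""
    f = count_codons(reference_genes)
    w = {}
    for family in SYNONYMS.values():
        top = max(f[c] for c in family)
        for c in family:
            w[c] = max(f[c] / top, floor) if top else 1.0
    return w


def cai(gene, w):
    """Geometric mean of w over the sense codons of one gene."""
    g = count_codons([gene])
    length = sum(n for c, n in g.items() if c in w)
    if length == 0:
        return 0.0
    return math.exp(sum(n * math.log(w[c]) for c, n in g.items() if c in w) / length)


def dominating_bias_reference_set(genes, min_fraction=0.01, max_iter=100):
    """Halve the selected fraction down to min_fraction, then iterate to a fixed point."""
    fraction = 1.0
    selected = list(range(len(genes)))
    for _ in range(max_iter):
        w = adaptiveness([genes[i] for i in selected])
        scores = [cai(gene, w) for gene in genes]
        fraction = max(fraction / 2, min_fraction)
        size = max(1, round(fraction * len(genes)))
        ranked = sorted(range(len(genes)), key=lambda i: scores[i], reverse=True)
        new_selected = sorted(ranked[:size])
        # Assumed convergence test: the set no longer changes at the final fraction.
        if fraction <= min_fraction and new_selected == selected:
            break
        selected = new_selected
    return selected, adaptiveness([genes[i] for i in selected])
```

The function returns the indices of the selected genes together with the weights computed over them, so the CAI of any gene with respect to the detected bias can be obtained by passing those weights to `cai`.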

Example: Bacillus subtilis

How it works for fast-growers
Figure label: reference set

Dominating bias, connected with translation

Dominating bias, connected with GC3s

Dominating bias, connected with strand

Example of non-dominating bias
- Genes in class III (possibly horizontally transferred genes) of Bacillus subtilis
- This bias can be detected and measured by finding the most biased genes in class III with an analog of the proposed algorithm

REFERENCE
A. Carbone, A. Zinovyev, F. Képès, “Codon Adaptation Index as a measure of dominating codon bias”, preprint of Institut des Hautes Études Scientifiques,