1 Towards a model for -1 frameshift sites Alain Denise 1,2, Michaël Bekaert 1, Laure Bidou 1, Guillemette Duchateau-Nguyen 1, Jean-Paul Forest 2, Christine.

Presentation transcript:

1 Towards a model for -1 frameshift sites Alain Denise 1,2, Michaël Bekaert 1, Laure Bidou 1, Guillemette Duchateau-Nguyen 1, Jean-Paul Forest 2, Christine Froidevaux 2, Isabelle Hatin 1, Jean-Pierre Rousset 1, Michel Termier 1 1 IGM (Institut de Génétique et Microbiologie) 2 LRI (Laboratoire de Recherche en Informatique) Université Paris-Sud, Orsay

2 Translation CAU AUG GAU UAC AUG GUC UAA GAU 5’3’ mRNA

3 Translation CAU AUG GAU UAC AUG GUC UAA GAU The ribosome reads bases by triplets (or codons) from a START codon ribosome 5’3’

4 Translation CAU AUG GAU UAC AUG GUC UAA GAU The ribosome synthesizes one amino acid per codon 5’ 3’

5 Translation CAU AUG GAU UAC AUG GUC UAA GAU 5’3’

6 Translation CAU AUG GAU UAC AUG GUC UAA GAU 5’3’

7 Translation CAU AUG GAU UAC AUG GUC UAA GAU 5’3’

8 Translation CAU AUG GAU UAC AUG GUC UAA GAU 5’3’

9 Translation CAU AUG GAU UAC AUG GUC UAA GAU The synthesis goes on until a STOP codon is read 5’3’ 1 mRNA gives 1 protein
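The reading process on slides 2 to 9 can be sketched in a few lines of Python. This is only an illustration: the function name `translate` is hypothetical, the codon table is deliberately partial (it holds only the codons of the slide's example mRNA), and taking the first AUG found in the string as the START codon is a simplification.

```python
# Partial codon table: only the codons that occur in the slide's example mRNA.
# Assumption: a real tool would use the full 64-codon standard genetic code.
CODON_TABLE = {
    "AUG": "Met", "GAU": "Asp", "UAC": "Tyr", "GUC": "Val", "CAU": "His",
}
STOP_CODONS = {"UAA", "UAG", "UGA"}


def translate(mrna: str) -> list[str]:
    """Read triplets from the first AUG until a STOP codon is met."""
    start = mrna.find("AUG")  # simplification: first AUG in the string
    if start == -1:
        return []
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        # Codons missing from the partial table are marked as unknown.
        protein.append(CODON_TABLE.get(codon, "???"))
    return protein


if __name__ == "__main__":
    # mRNA from slide 2, written without spaces.
    print(translate("CAUAUGGAUUACAUGGUCUAAGAU"))
```

On the slide's mRNA the reading starts at the first AUG and stops at UAA, so the codons after the STOP are never read.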

10 Experimental fact Some mRNAs encode two distinct proteins with the same 5’ end

11 Programmed -1 frameshifting Non-deterministic event –usual translation: ORF1a, 0 phase, from START 0 to STOP 0 –-1 frameshift: ORF1b, -1 phase, up to STOP -1 –1 mRNA gives 2 distinct proteins in a precise ratio

12 Typical -1 frameshift site [Brierley, 1989] 5’ AUG ... NNX XXY YYZ (slippery sequence, P) –spacer (SP) –secondary structure: S1, L1, S2, L2, L’1 3’
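A scan for the slippery heptamer pattern X XXY YYZ can be sketched with a regular expression. This is a minimal illustration, not the authors' program: the name `find_slippery_sites` is hypothetical, no restriction is placed on which bases X, Y and Z may be (the slide states none), and the reading frame and downstream structure are ignored.

```python
import re

# X XXY YYZ: three identical bases, three identical bases, then any base.
# The lookahead makes overlapping candidates visible to finditer.
_SLIPPERY = re.compile(r"(?=(([ACGU])\2\2([ACGU])\3\3[ACGU]))")


def find_slippery_sites(rna: str) -> list[tuple[int, str]]:
    """Return (0-based position, heptamer) for every X XXY YYZ match."""
    return [(m.start(), m.group(1)) for m in _SLIPPERY.finditer(rna)]


if __name__ == "__main__":
    # Start of the IBV site from slide 13: UAU UUA AAC GGG UAC
    print(find_slippery_sites("UAUUUAAACGGGUAC"))
```

A scan like this returns many candidates on a real genome, which is why the spacer and the secondary structure are needed to narrow them down.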

13 IBV frameshift site 5’ AUG ... UAU UUA AAC (slippery sequence) GGGUAC (spacer) Pseudoknot with stems S1 and S2 3’ [Figure: pseudoknot drawing; strands as drawn: UGACGAUGGGG, GCUGAUACCCC, A G G C U C G U C C G A G C G, UUGC, GAAA]

14 PK picture ?

15 Translation with frameshift UAU UUA AAC GGG UACAUG 5’ 3’ UGACGAUGGGGUGACGAUGGGG GCUGAUACCCCGCUGAUACCCC A G G C U C G U C C G A G C G UUGC GAAA

16 Translation with frameshift UAU UUA AAC GGG UAC 5’ 3’ UGACGAUGGGGUGACGAUGGGG GCUGAUACCCCGCUGAUACCCC A G G C U C G U C C G A G C G UUGC GAAA

17 Translation with frameshift UAU UUA AAC GGG UAC 5’ 3’ UGACGAUGGGGUGACGAUGGGG GCUGAUACCCCGCUGAUACCCC A G G C U C G U C C G A G C G UUGC GAAA -1 shift

18 UA UUU AAA CGG GUA CGG GGU AGC AGU Translation with frameshift 5’ 3’

19 UA UUU AAA CGG GUA CGG GGU AGC AGU Translation with frameshift 5’ 3’

20 UA UUU AAA CGG GUA CGG GGU AGC AGU Translation with frameshift 5’ 3’

21 UA UUU AAA CGG GUA CGG GGU AGC AGU Translation with frameshift 5’ 3’

22 Goals –To improve the known model for viral frameshift sites –To identify new frameshift sites in viral and non-viral genomes

23 Our approach Biological sequences Formal models Prediction tools In silico and in vivo validation Applications to other genomes represent explain predict

24 IBV frameshift site: spacer 5’ 3’ GGGUAC

25 Spacer consensus
  Virus           Spacer
  HAST-1          UAC AAA
  BEV             UGU UG
  EAV             UGA GAG
  HCV             GAG UC
  IBV             GGG UAC
  MHV             GGG UU
  TGEV            GAG
  RCNMV           UAG GC
  BWYV            GGA GUG
  PLRV            GGG CAA
  BLV             UAA UAG A
  FIV             UGG AAG GC
  HIV-1           GGG AAG AU
  HTLV-2          UCC UUA A
  JSR             UGG GUG A
  MMTV gag-pro    UUG UAA A
  MMTV pro-pol    UGA U
  RSV             UAG GGA
  SRV-1           GGA CUG A
  Consensus       UGG UAG A
                  GAA GUA

26 Lab experiments –Test construct: pSV40, lacZ, FS signal, luc in the -1 phase –Control construct: pSV40, lacZ, FS signal N, luc in the 0 phase –lacZ: expression reporter –luc: FS reporter

27 Spacer: lab experiments
  Construct        Spacer   Relative FS rate
  wild-type IBV    GGGUA    100
  U mutant         UGGUA    100
  A mutant         AGGUA    55
  C mutant         CGGUA    32
  CC mutant        CCGUA    70
  CCU mutant       CCUUA    49

28 Refining the model: Machine learning To identify relevant properties that characterize FS sites Disjunctive learning: not all sequences frameshift for the same reasons [Giedroc et al., 2000]

29 Annotating data: spacer 5’ 3’ GGGUAC

30 Example of data: SP SP = GGGUAC –number of A = 1; C = 1; G = 3; U = 1; –% of A = 17; C = 17; G = 50; U = 17; –first = G; –last = C;
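The spacer attributes listed on this slide can be computed as follows. This is a minimal sketch: the function name `annotate_spacer` and the dictionary layout are assumptions, and percentages are taken as the count divided by the spacer length, rounded to the nearest integer.

```python
from collections import Counter


def annotate_spacer(sp: str) -> dict:
    """Compute per-base counts and percentages, plus first and last base."""
    counts = Counter(sp)
    n = len(sp)
    return {
        "length": n,
        "count": {b: counts.get(b, 0) for b in "ACGU"},
        # Percentage of each base relative to the spacer length.
        "percent": {b: round(100 * counts.get(b, 0) / n) if n else 0
                    for b in "ACGU"},
        "first": sp[0] if sp else None,
        "last": sp[-1] if sp else None,
    }


if __name__ == "__main__":
    print(annotate_spacer("GGGUAC"))
```

The same kind of function could be applied to the other annotated regions (S1, S2, loops) to produce the attributes that appear in the learned rules.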

31 Annotating data: stem 1 UGACGAUGGGGUGACGAUGGGG GCUGAUACCCCGCUGAUACCCC 5’ 3’

32 Example of data: stem 1 S1 = –5' side: GGGGUAGCAGU –3' side: CCCCAUAGUCG –stability: -20.7 kcal/mol

33 Annotating data: full sequence U UUA AAC 5’ 3’ GGGUAC UGACGAUGGGGUGACGAUGGGG GCUGAUACCCCGCUGAUACCCC A G G C U C G U C C G A G C G UUGC GAAA

34 Example of data : FS rate FS rate = 22 %

35 GloBo –Disjunctive learning algorithm –Suited to small amounts of data –Won the PTE challenge on analogous data

36 Example of rules If SP length  5 and number of G in S1.5’ bottom half  3 and number of G in S1.5’  4 and %T in S2.5’  30 and %G in S2.5’  70 then FS rate  5% If %G in S1.5' bottom half  80 and %C in L1  45 then FS rate  5% If SP length  5 and S1.3' length  6 and %C in S1.3'  45 then FS rate  5%...

37 Covering and prediction If SP length  5 and number of G in S1.5’ bottom half  3 and number of G in S1.5’  4 and %T in S2.5’  30 and %G in S2.5’  70 then FS rate  5% Covering of examples : 70 % Examples predicted in test set :80 %

38 Is R1 relevant for frameshift ?
  Construct        Stem 1 5’-side   Relative FS rate   R1
  wild-type IBV    GGGGU AUCAGU     100                yes
  mutant 1         GGUCG AUCAGU     41                 yes
  mutant 2         GGGGU UCUACA     55                 yes
  mutant 3         GCUCG AUCAGU     36                 no
  mutant 4         GCCCU AUCAGU     73                 no

39 Covering and prediction If SP length  5 and S1.3' length  6 and %C in S1.3'  45 then FS rate  5% Covering of examples : 45 % Examples predicted in test set :40 %

40 Conclusion Spacer: –correlation between primary sequence and FS rate has been established –systematic experimentation going on

41 Conclusion Biological sequences Formal models Prediction tools In silico and in vivo validation Applications to other genomes

42 GloBo rule covering Run 1 Run 2 Run 3... Rule 1 70 % 80 % 80 % Rule 2 35 % 35 % 40 % Rule 3 45 % 45 % 65 % Rule 4 40 % 50 % 40 % Rule 5 55 % 45 % Rule 6 40 % Average covering of Rule 1 = 80 %

43 Examples of rule 1 SP length  5 and number of G in S1.5’ bottom half  3 and number of G in S1.5’  4 and %T in S2.5’  30 and %C in S2.3’  % SP length  5 and number of G in S1.5’ bottom half  3 and %C in S1.5’  45 and number of T in S2.5’  1 80 % SP length  5 and S1.5' length  6 and number of G in S1.5’  4 and number of T in S2.5'  1 and %C in S2.3’  %

44 Examples of rule 1 SP length  5 and number of G in S1.5’ bottom half  3 and number of G in S1.5’  4 and %T in S2.5’  30 and %C in S2.3’  % SP length  5 and number of G in S1.5’ bottom half  3 and %C in S1.5’  45 and number of T in S2.5’  1 80 % SP length  5 and S1.5' length  6 and number of G in S1.5’  4 and number of T in S2.5'  1 and %C in S2.3’  %

45 Conclusion and perspectives Spacer: –correlation between primary sequence and FS rate has been established –systematic experimentation going on Learning: –relevant rules –experimentation enriches data –quantitative approach

46 Future work Interaction between sub-sequences Kinetics of frameshift

47 Current model and future work 5’ AUG ... NNX XXY YYZ (slippery sequence, P) NNN –spacer (SP) –secondary structure: S1, L1, S2, L2, L’1 3’

48 Outline Biological problem and motivation of study Existing work Towards building a finer model Conclusion and future work

49 Translation CAU AUG GAU UAC AUG GUC UAA GAU Protein synthesis begins with a START triplet Each codon then gives an amino acid The process ends with a STOP triplet 1 mRNA gives 1 protein mRNA protein

50 Spacer Only its length has been systematically studied so far Its primary sequence is relevant as well

51 On-going work Program that looks for potential frameshift sites Main issues: –to select a reasonable number of candidate sequences –to find actual pseudoknots in a reliable way [Isambert and Siggia, 2001]

52  3G  4G
  Observations (in vitro)
    Construct   Stem 1 (5' side .... 3' side)      FS rate
    IBV         ..gggguaucagu....gcugauacccc..     30%
    MHV         ..cgggguacaag....cuuguacccug..     30%
    RSV         ..gggccacug....caguggccc..         5%
  Constructs respecting the guanine distribution (in vivo)
    Construct   Stem 1 (5' side .... 3' side)      ΔG° (kcal/mol)   FS rate
    IBV         ..gggguaucagu....gcugguacccc..     -20.7            22%
    mutant1     ..ggucgaucagu....gcuggucgacc..     -20.3            9%
    mutant2     ..gggguucuaca....uguagaacccc..     -22.4            12%
    mutant3     ..gcgcgcccgcc....ggcgggcgcgc..     -30.7            x%
  Constructs NOT respecting the guanine distribution (in vivo)
    mutant4     ..gcucgaucagu....gcuggucgagc..     -20.3            8%
    mutant5     ..gcccuaucagu....gcugguagggc..     -20.7            16%
    mutant6     ..gccggcccccc....ggggggccggc..     -31.7            x%

53 Spacer: lab experiments (mouse)
  Spacer    FS efficiency
  GGGTAC    14 ± %
  AGGTAC    13 ± %
  CGGTAC    8.9 ± %
  CCGTAC    12.5 ± %
  CCTTAC    21 ± %

54 Recent studies Scanning databases to count frameshift-like sites: [Hammell et al. 1999] Using Stochastic Context-Free Grammars: [Liphardt 1999]

55 Why do we study frameshifting ? To properly annotate genomes To find frameshift sites in other organisms

56 First results Pointed to new relevant attributes, such as the position of the first mismatch in S1

57 Example of data
  IBV
  family= Coronaviridae
  genus= Coronavirus
  name= Infectious avian bronchitis virus
  gene1= ORF1a
  gene2= ORF1b
  article= Review Brierley 1995
  wild type= yes
  modified part= none
  P= {UUUAAAC}
  SP= {GGGUAC}
  S1.5'= {GGGGUAGCAGU}
  L1= {G}
  S2.5'= {GAGGCUCG}
  L1'= {}
  S1.3'= {GCUGAUACCCC}
  L2= {UUGCUAGUGGAUGUGAUCCUGAUGUUGUAAAG}
  S2.3'= {CGAGCCUU}
  S1= { stem1= GGGGTAGCAGT stem2= CCCCATAGTCG stability= -20.7 }
  S2= { stem1= GAGGCTCG stem2= TTCCGAGC stability= unknown }
  global stability= unknown
  definite secondary structure= yes
  L1.folding= no
  L1'.folding= no
  L2.folding= no
  efficiency= RRL 30%
  efficiency= XO 30%

58 Spacer

59 Example of rules if SP length  5 and number of Gs in S1.5’ bottom half  3 and number of Gs in S1.5’  4 and %T in S2.5’  30 and %C in S2.3’  75 or % G in S1.5' bottom half  80 and %C in L1  45 or SP length  5 and S1.3' length  6 and %C in S1.3' or SP length  5 and number of Gs in S1.5’ bottom half  3 and %C in S1.3’  70 and %G in S2.3’  45 or number of As in S1.5' = 0 and number of As in S2.3' = 0 then %FS  5

60 GloBo: main ideas –Takes each example as a seed –Agglomerates other examples into a subset if the least general generalization does not cover counterexamples –Heuristically selects subsets to cover all examples
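The three ideas above can be sketched as follows. This is not the GloBo implementation: it assumes purely numeric attributes, takes the least general generalization of a set of examples to be the per-attribute [min, max] interval, and uses a greedy set cover as the selection heuristic. All names (`lgg`, `covers`, `grow_subset`, `globo_sketch`) and the toy data are hypothetical.

```python
def lgg(examples):
    """Least general generalization (assumed): per-attribute [min, max]."""
    keys = examples[0].keys()
    return {k: (min(e[k] for e in examples), max(e[k] for e in examples))
            for k in keys}


def covers(rule, example):
    """True if every attribute of the example lies inside the rule's interval."""
    return all(lo <= example[k] <= hi for k, (lo, hi) in rule.items())


def grow_subset(seed_idx, positives, negatives):
    """Agglomerate examples around a seed while no counterexample is covered."""
    members = [seed_idx]
    for j in range(len(positives)):
        if j == seed_idx:
            continue
        trial = members + [j]
        rule = lgg([positives[i] for i in trial])
        if not any(covers(rule, n) for n in negatives):
            members = trial
    return members


def globo_sketch(positives, negatives):
    """One subset per seed, then a greedy choice of subsets covering all examples."""
    subsets = [grow_subset(i, positives, negatives)
               for i in range(len(positives))]
    uncovered = set(range(len(positives)))
    rules = []
    while uncovered:
        best = max(subsets, key=lambda s: len(uncovered & set(s)))
        rules.append(lgg([positives[i] for i in best]))
        uncovered -= set(best)
    return rules


if __name__ == "__main__":
    # Toy data with two invented attributes; not taken from the slides.
    pos = [{"sp_length": 6, "g_in_s1": 4},
           {"sp_length": 5, "g_in_s1": 5},
           {"sp_length": 7, "g_in_s1": 1}]
    neg = [{"sp_length": 6, "g_in_s1": 2}]
    for rule in globo_sketch(pos, neg):
        print(rule)
```

Each returned interval rule is one disjunct of the learned description, which matches the idea that different sequences may frameshift for different reasons.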