1 Towards a model for -1 frameshift sites Alain Denise 1,2, Michaël Bekaert 1, Laure Bidou 1, Guillemette Duchateau-Nguyen 1, Jean-Paul Forest 2, Christine.

1 1 Towards a model for -1 frameshift sites
Alain Denise 1,2, Michaël Bekaert 1, Laure Bidou 1, Guillemette Duchateau-Nguyen 1, Jean-Paul Forest 2, Christine Froidevaux 2, Isabelle Hatin 1, Jean-Pierre Rousset 1, Michel Termier 1
1 IGM (Institut de Génétique et Microbiologie); 2 LRI (Laboratoire de Recherche en Informatique), Université Paris-Sud, Orsay

2 2 Translation. mRNA: 5’ CAU AUG GAU UAC AUG GUC UAA GAU 3’

3 3 Translation. 5’ CAU AUG GAU UAC AUG GUC UAA GAU 3’. The ribosome reads bases in triplets (codons), starting from a START codon.

4 4 Translation. 5’ CAU AUG GAU UAC AUG GUC UAA GAU 3’. The ribosome synthesizes one amino acid per codon.

9 9 Translation. 5’ CAU AUG GAU UAC AUG GUC UAA GAU 3’. Synthesis continues until a STOP codon is read: 1 mRNA gives 1 protein.
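
The reading process described on the translation slides can be sketched in a few lines of Python. This is only an illustration, not part of the talk: the function name `read_orf` is hypothetical, and the stop codon set (UAA, UAG, UGA) is the standard one, which the slides do not list.

```python
# Standard stop codons (an assumption: the slides only show UAA).
STOP_CODONS = {"UAA", "UAG", "UGA"}

def read_orf(mrna, start="AUG"):
    """Return the codons read from the first START codon up to,
    but not including, the first in-frame STOP codon."""
    begin = mrna.find(start)
    if begin == -1:
        return []  # no START codon: nothing is translated
    codons = []
    for i in range(begin, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
    return codons

# The example mRNA from the slides, with spaces removed.
mrna = "CAUAUGGAUUACAUGGUCUAAGAU"
orf = read_orf(mrna)
print(orf)
```

Each codon in the returned list stands for one amino acid of the single protein that this mRNA encodes.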

10 10 Experimental fact: some mRNAs encode two distinct proteins with the same 5’ end

11 11 Programmed -1 frameshifting: a non-deterministic event. Figure: ORF1a runs from START to STOP in the 0 phase (usual translation); ORF1b continues in the -1 phase to its own STOP (-1 frameshift). 1 mRNA gives 2 distinct proteins in a precise ratio.

12 12 Typical -1 frameshift site [Brierley, 1989]. Figure, 5’ to 3’: AUG, then the slippery sequence NNX XXY YYZ (P), a spacer (SP), and a secondary structure with stems S1 and S2 and loops L1, L’1 and L2.
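
As an illustration of the slippery-sequence part of this model, the following Python sketch scans an RNA string for X XXY YYZ heptamers. It rests on assumptions that the slide does not spell out: X and Y are each read as "the same base three times", Z as any base, and the reading frame and the downstream spacer and secondary structure are ignored. The name `find_slippery` is hypothetical.

```python
import re

# X XXY YYZ: three identical bases, three identical bases, then any base.
# The lookahead lets overlapping candidates be reported as well.
SLIPPERY = re.compile(r"(?=(([ACGU])\2\2([ACGU])\3\3[ACGU]))")

def find_slippery(rna):
    """Return (position, heptamer) pairs for every X XXY YYZ candidate.
    Positions are 0-based; frame and downstream structure are not checked."""
    return [(m.start(), m.group(1)) for m in SLIPPERY.finditer(rna)]

# Slippery region and spacer of the IBV site shown on the following slides.
hits = find_slippery("UAUUUAAACGGGUAC")
print(hits)
```

A real search tool would also have to test for a spacer and a plausible pseudoknot downstream of each hit, which this sketch does not attempt.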

13 13 IBV frameshift site. Figure, 5’ to 3’: AUG, then the slippery sequence UAU UUA AAC, the spacer GGGUAC, and a pseudoknot with stems S1 and S2.

14 14 PK picture ?

15 15 Translation with frameshift. Figure, 5’ to 3’: AUG, then UAU UUA AAC GGG UAC, followed by the pseudoknot.

16 16 Translation with frameshift. Figure, 5’ to 3’: UAU UUA AAC GGG UAC, followed by the pseudoknot.

17 17 Translation with frameshift. Figure, 5’ to 3’: UAU UUA AAC GGG UAC, followed by the pseudoknot; a -1 shift occurs.

18 18 Translation with frameshift. 5’ UA UUU AAA CGG GUA CGG GGU AGC AGU 3’
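
The change of reading frame shown on these slides can be reproduced with a short Python sketch. It is a simplification: it only re-splits the same bases with a different offset and does not model the ribosome slipping on the slippery sequence. The function name `codons` is hypothetical.

```python
def codons(rna, offset=0):
    """Split rna into triplets starting at offset, dropping any incomplete tail."""
    return [rna[i:i + 3] for i in range(offset, len(rna) - 2, 3)]

# Slippery sequence, spacer and start of stem 1 of the IBV site, spaces removed.
seq = "UAUUUAAACGGGUACGGGGUAGCAGU"

zero_phase = codons(seq, 0)       # UAU UUA AAC GGG UAC ...
minus_one_phase = codons(seq, 2)  # UA | UUU AAA CGG GUA ...
print(zero_phase)
print(minus_one_phase)
```

Starting two bases later is equivalent to stepping one base back relative to the 0 phase, which is why an offset of 2 yields the -1 phase codons.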

22 22 Goals
– To improve the known model for viral frameshift sites
– To identify new frameshift sites in viral and non-viral genomes

23 23 Our approach. Diagram: biological sequences, formal models, prediction tools, in silico and in vivo validation, applications to other genomes; arrows labelled represent, explain, predict.

24 24 IBV frameshift site: spacer 5’ GGGUAC 3’

25 25 Spacer consensus
HAST-1          UAC AAA
BEV             UGU UG
EAV             UGA GAG
HCV             GAG UC
IBV             GGG UAC
MHV             GGG UU
TGEV            GAG
RCNMV           UAG GC
BWYV            GGA GUG
PLRV            GGG CAA
BLV             UAA UAG A
FIV             UGG AAG GC
HIV-1           GGG AAG AU
HTLV-2          UCC UUA A
JSR             UGG GUG A
MMTV gag-pro    UUG UAA A
MMTV pro-pol    UGA U
RSV             UAG GGA
SRV-1           GGA CUG A
Consensus       UGG UAG A GAA GUA
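
One way to look at a consensus like this is to count the bases found at each spacer position. The Python sketch below does that for the spacers listed on the slide. It is an illustration only: the slide does not say how its consensus was derived, the split of each entry into virus name and spacer is read off the transcript, and the name `column_counts` is hypothetical.

```python
from collections import Counter

# Spacers as listed on the slide, spaces removed.
spacers = {
    "HAST-1": "UACAAA", "BEV": "UGUUG", "EAV": "UGAGAG", "HCV": "GAGUC",
    "IBV": "GGGUAC", "MHV": "GGGUU", "TGEV": "GAG", "RCNMV": "UAGGC",
    "BWYV": "GGAGUG", "PLRV": "GGGCAA", "BLV": "UAAUAGA", "FIV": "UGGAAGGC",
    "HIV-1": "GGGAAGAU", "HTLV-2": "UCCUUAA", "JSR": "UGGGUGA",
    "MMTV gag-pro": "UUGUAAA", "MMTV pro-pol": "UGAU", "RSV": "UAGGGA",
    "SRV-1": "GGACUGA",
}

def column_counts(seqs):
    """Count bases at each position; shorter sequences simply stop contributing."""
    longest = max(len(s) for s in seqs)
    return [Counter(s[i] for s in seqs if len(s) > i) for i in range(longest)]

counts = column_counts(list(spacers.values()))
# The two most frequent bases at each position, most frequent first.
top_two = [[base for base, _ in c.most_common(2)] for c in counts]
print(top_two[:3])
```

Because the spacers differ in length, the counts at later positions rest on fewer sequences and should be read with more caution.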

26 26 Lab experiments. Figure: test construct pSV40, lacZ, FS signal, luc in the -1 phase; control construct pSV40, lacZ, FS signal N, luc in the 0 phase. lacZ is the expression reporter and luc is the FS reporter.
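
The slide does not give the formula used to turn reporter activities into a frameshift rate. A common way to exploit a dual-reporter design of this kind is to normalise the FS reporter by the expression reporter and compare the test construct with the control. The Python sketch below follows that assumption; the function names, parameter names, and numbers are all hypothetical.

```python
def frameshift_rate(luc_test, lacz_test, luc_control, lacz_control):
    """FS rate in percent, assuming it is the luc/lacZ ratio of the test
    construct divided by the luc/lacZ ratio of the control construct."""
    return 100.0 * (luc_test / lacz_test) / (luc_control / lacz_control)

def relative_rate(rate, wild_type_rate):
    """Rate of a mutant expressed as a percentage of the wild-type rate."""
    return 100.0 * rate / wild_type_rate

# Made-up activity values, for illustration only (not data from the study).
wild_type = frameshift_rate(22.0, 100.0, 100.0, 100.0)
mutant = frameshift_rate(11.0, 100.0, 100.0, 100.0)
print(wild_type, relative_rate(mutant, wild_type))
```

Dividing by the lacZ activity corrects for differences in overall expression between constructs, so that only the effect of the FS signal remains.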

27 27 Spacer: lab experiments
Construct        Spacer   Relative FS rate
wild-type IBV    GGGUA    100
U mutant         UGGUA    100
A mutant         AGGUA    55
C mutant         CGGUA    32
CC mutant        CCGUA    70
CCU mutant       CCUUA    49

28 28 Refining the model: machine learning
– To identify relevant properties that characterize FS sites
– Disjunctive learning: not all sequences frameshift for the same reasons [Giedroc et al., 2000]

29 29 Annotating data: spacer 5’ GGGUAC 3’

30 30 Example of data: SP
SP = GGGUAC
– number of A = 1; C = 1; G = 3; U = 1
– % of A = 17; C = 17; G = 50; U = 17
– first = G
– last = C
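
Attributes of this kind are straightforward to compute from the sequence itself. The Python sketch below derives base counts, rounded percentages, and the first and last base for a spacer; the function name `annotate` and the dictionary layout are hypothetical and not taken from the authors' tools.

```python
def annotate(seq):
    """Compute simple composition attributes for an RNA sub-sequence."""
    counts = {base: seq.count(base) for base in "ACGU"}
    # Percentages are relative to the sequence length, rounded to integers.
    percents = {base: round(100 * n / len(seq)) for base, n in counts.items()}
    return {
        "length": len(seq),
        "count": counts,
        "percent": percents,
        "first": seq[0],
        "last": seq[-1],
    }

sp = annotate("GGGUAC")
print(sp)
```

The same function could be applied to each annotated part of a site (stems, loops, spacer) to build one attribute vector per example.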

31 31 Annotating data: stem 1. Figure: the two strands of stem 1, 5’ side GGGGUAGCAGU and 3’ side GCUGAUACCCC.

32 32 Example of data: stem 1
S1 =
– 5' side: GGGGUAGCAGU
– 3' side: CCCCAUAGUCG
– stability: -20.7 kcal/mol
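
In this record the 3' side appears to be written 3’ to 5’, so that each base sits opposite its partner on the 5' side. Under that reading, a stem can be checked position by position. The Python sketch below looks for the first position that is not a Watson-Crick or G-U wobble pair; treating G-U as a valid pair is an assumption, and the name `first_mismatch` is hypothetical.

```python
# Watson-Crick pairs plus the G-U wobble pair (the wobble is an assumption).
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}

def first_mismatch(five_prime, three_prime_reversed):
    """Return the 0-based index of the first unpaired position, or None.
    The second strand must be given 3' to 5', aligned with the first."""
    for i, pair in enumerate(zip(five_prime, three_prime_reversed)):
        if pair not in PAIRS:
            return i
    return None

pos = first_mismatch("GGGGUAGCAGU", "CCCCAUAGUCG")
print(pos)
```

A position index like this is one more attribute that can be added to the description of each example.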

33 33 Annotating data: full sequence. Figure, 5’ to 3’: U UUA AAC, GGGUAC, then the pseudoknot.

34 34 Example of data: FS rate = 22 %

35 35 GloBo
– Disjunctive learning algorithm
– Suited to small amounts of data
– Won the PTE challenge on analogous data

36 36 Example of rules
– If SP length  5 and number of G in S1.5’ bottom half  3 and number of G in S1.5’  4 and %T in S2.5’  30 and %G in S2.5’  70 then FS rate  5%
– If %G in S1.5' bottom half  80 and %C in L1  45 then FS rate  5%
– If SP length  5 and S1.3' length  6 and %C in S1.3'  45 then FS rate  5%
– ...

37 37 Covering and prediction
If SP length  5 and number of G in S1.5’ bottom half  3 and number of G in S1.5’  4 and %T in S2.5’  30 and %G in S2.5’  70 then FS rate  5%
– Covering of examples: 70 %
– Examples predicted in test set: 80 %
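
To make the idea of covering concrete, here is a minimal Python sketch that applies a conjunctive rule to a set of annotated examples and reports the share of examples satisfying it. Several things are assumptions: that "covering" means this share, the comparison operators (the transcript does not show them, so `>=` is a placeholder), the shortened two-condition rule, and the example values, which are made up.

```python
import operator

# Hypothetical rule: attribute names follow the slide, but the comparison
# operators are placeholders because the transcript does not show them.
rule = [
    ("SP length", operator.ge, 5),
    ("number of G in S1.5'", operator.ge, 4),
]

def satisfies(example, rule):
    """True if the example meets every condition of the conjunctive rule."""
    return all(op(example[attr], value) for attr, op, value in rule)

def covering(examples, rule):
    """Percentage of examples that satisfy the rule."""
    return 100.0 * sum(satisfies(e, rule) for e in examples) / len(examples)

# Made-up examples, for illustration only (not data from the study).
examples = [
    {"SP length": 6, "number of G in S1.5'": 5},
    {"SP length": 5, "number of G in S1.5'": 4},
    {"SP length": 3, "number of G in S1.5'": 6},
    {"SP length": 7, "number of G in S1.5'": 2},
]
pct = covering(examples, rule)
print(pct)
```

The same `satisfies` function could be run on a held-out test set to obtain the second figure on the slide, the share of test examples that the rule predicts.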

38 38 Is R1 relevant for frameshift?
Construct        Stem 1 5’-side   Relative FS rate   R1
wild-type IBV    GGGGU AUCAGU     100                yes
mutant 1         GGUCG AUCAGU     41                 yes
mutant 2         GGGGU UCUACA     55                 yes
mutant 3         GCUCG AUCAGU     36                 no
mutant 4         GCCCU AUCAGU     73                 no

39 39 Covering and prediction
If SP length  5 and S1.3' length  6 and %C in S1.3'  45 then FS rate  5%
– Covering of examples: 45 %
– Examples predicted in test set: 40 %

40 40 Conclusion
Spacer:
– a correlation between primary sequence and FS rate has been established
– systematic experimentation is ongoing

41 41 Conclusion Biological sequences Formal models Prediction tools In silico and in vivo validation Applications to other genomes

42 42 GloBo rule covering, per run (Run 1, Run 2, Run 3, ...)
Rule 1: 70 %, 80 %, 80 %
Rule 2: 35 %, 35 %, 40 %
Rule 3: 45 %, 45 %, 65 %
Rule 4: 40 %, 50 %, 40 %
Rule 5: 55 %, 45 %
Rule 6: 40 %
Average covering of Rule 1 = 80 %

43 43 Examples of rule 1
– SP length  5 and number of G in S1.5’ bottom half  3 and number of G in S1.5’  4 and %T in S2.5’  30 and %C in S2.3’  75 (covering 70 %)
– SP length  5 and number of G in S1.5’ bottom half  3 and %C in S1.5’  45 and number of T in S2.5’  1 (covering 80 %)
– SP length  5 and S1.5' length  6 and number of G in S1.5’  4 and number of T in S2.5'  1 and %C in S2.3’  70 (covering 80 %)

45 45 Conclusion and perspectives
Spacer:
– a correlation between primary sequence and FS rate has been established
– systematic experimentation is ongoing
Learning:
– relevant rules
– experimentation enriches data
– quantitative approach

46 46 Future work
– Interaction between sub-sequences
– Kinetics of frameshift

47 47 Current model and future work. Figure labels: AUG, NNN, NNX XXY YYZ (P), SP, S1, L1, S2, L2, L’1.

48 48 Outline
– Biological problem and motivation of study
– Existing work
– Towards building a finer model
– Conclusion and future work

49 49 Translation. mRNA: CAU AUG GAU UAC AUG GUC UAA GAU. Protein synthesis begins at a START triplet, each codon then gives one amino acid, and the process ends at a STOP triplet: 1 mRNA gives 1 protein.

50 50 Spacer: only its length has been systematically studied so far; its primary sequence is relevant as well

51 51 Ongoing work: a program that looks for potential frameshift sites
Main issues:
– to select a reasonable number of candidate sequences
– to find actual pseudoknots in a reliable way [Isambert and Siggia, 2001]

52 52  3G  4G
Observations (in vitro)
IBV       ..gggguaucagu....gcugauacccc..    30%
MHV       ..cgggguacaag....cuuguacccug..    30%
RSV       ..gggccacug....caguggccc..        5%
Constructs that respect the guanine distribution (ΔG° in kcal/mol, rate in vivo)
IBV       ..gggguaucagu....gcugguacccc..    -20.7    22%
mutant1   ..ggucgaucagu....gcuggucgacc..    -20.3    9%
mutant2   ..gggguucuaca....uguagaacccc..    -22.4    12%
mutant3   ..gcgcgcccgcc....ggcgggcgcgc..    -30.7    x%
Constructs that do NOT respect the guanine distribution
mutant4   ..gcucgaucagu....gcuggucgagc..    -20.3    8%
mutant5   ..gcccuaucagu....gcugguagggc..    -20.7    16%
mutant6   ..gccggcccccc....ggggggccggc..    -31.7    x%

53 53 Spacer: lab experiments (mouse)
Spacer    FS efficiency
GGGTAC    14 ± %
AGGTAC    13 ± %
CGGTAC    8.9 ± %
CCGTAC    12.5 ± %
CCTTAC    21 ± %

54 54 Recent studies
– Scanning databases to count frameshift-like sites: [Hammell et al. 1999]
– Using Stochastic Context-Free Grammars: [Liphardt 1999]

55 55 Why do we study frameshifting?
– To properly annotate genomes
– To find frameshift sites in other organisms

56 56 First results: pointed to new relevant attributes, such as the position of the first mismatch in S1

57 57 Example of data: IBV
family= Coronaviridae
genus= Coronavirus
name= Infectious avian bronchitis virus
gene1= ORF1a
gene2= ORF1b
article= Review Brierley 1995
wild type= yes
modified part= none
P= {UUUAAAC}
SP= {GGGUAC}
S1.5'= {GGGGUAGCAGU}
L1= {G}
S2.5'= {GAGGCUCG}
L1'= {}
S1.3'= {GCUGAUACCCC}
L2= {UUGCUAGUGGAUGUGAUCCUGAUGUUGUAAAG}
S2.3'= {CGAGCCUU}
S1= { stem1= GGGGTAGCAGT stem2= CCCCATAGTCG stability= -20,7 }
S2= { stem1= GAGGCTCG stem2= TTCCGAGC stability= unknown }
global stability= unknown
definite secondary structure= yes
L1.folding= no
L1'.folding= no
L2.folding= no
efficiency= RRL 30%
efficiency= XO 30%

58 58 Spacer

59 59 Example of rules
if SP length  5 and number of Gs in S1.5’ bottom half  3 and number of Gs in S1.5’  4 and %T in S2.5’  30 and %C in S2.3’  75
or % G in S1.5' bottom half  80 and %C in L1  45
or SP length  5 and S1.3' length  6 and %C in S1.3'
or SP length  5 and number of Gs in S1.5’ bottom half  3 and %C in S1.3’  70 and %G in S2.3’  45
or number of As in S1.5' = 0 and number of As in S2.3' = 0
then %FS  5

60 60 GloBo: main ideas
– Takes each example as a seed
– Agglomerates other examples into a subset if the least general generalization does not cover counterexamples
– Heuristically selects subsets to cover all examples
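
The three ideas above can be illustrated with a small Python sketch. It is not the GloBo implementation: it assumes purely numeric attributes, takes the least general generalization of a set of examples to be the per-attribute [min, max] interval, and uses a plain greedy set cover as the selection heuristic. All function names, attribute names, and data are hypothetical.

```python
def lgg(examples):
    """Least general generalization, here the [min, max] interval per attribute."""
    return {k: (min(e[k] for e in examples), max(e[k] for e in examples))
            for k in examples[0]}

def covers(box, example):
    """True if every attribute of the example falls inside the box."""
    return all(lo <= example[k] <= hi for k, (lo, hi) in box.items())

def grow_from_seed(seed, positives, negatives):
    """Add examples to the seed's subset while the generalization
    of the subset covers no counterexample."""
    subset = [seed]
    for candidate in positives:
        if candidate is seed:
            continue
        trial = subset + [candidate]
        if not any(covers(lgg(trial), n) for n in negatives):
            subset = trial
    return subset

def learn_rules(positives, negatives):
    """One subset per seed, then a greedy choice of subsets covering all positives."""
    boxes = [lgg(grow_from_seed(s, positives, negatives)) for s in positives]
    covered_by = [{i for i, p in enumerate(positives) if covers(b, p)} for b in boxes]
    uncovered = set(range(len(positives)))
    chosen = []
    while uncovered:
        best = max(range(len(boxes)), key=lambda j: len(covered_by[j] & uncovered))
        chosen.append(boxes[best])
        uncovered -= covered_by[best]
    return chosen

# Made-up data, for illustration only (not data from the study).
positives = [{"sp_len": 6, "g_s1": 4}, {"sp_len": 5, "g_s1": 3}, {"sp_len": 1, "g_s1": 0}]
negatives = [{"sp_len": 3, "g_s1": 2}]
rules = learn_rules(positives, negatives)
print(rules)
```

Each returned box reads as one conjunctive rule, and the list as a whole is a disjunction of rules, which matches the form of the rules shown on the earlier slides.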

