Alain Denise, Bioinformatique, LRI Orsay, UMR CNRS 8623, Université Paris-Sud 11. Algorithms for comparing RNA secondary structures.


1 Alain Denise, Bioinformatique, LRI Orsay, UMR CNRS 8623, Université Paris-Sud 11. Algorithms for comparing RNA secondary structures

2 © Ebbe Sloth Andersen. The multiple roles of RNA

3 © Ebbe Sloth Andersen. The multiple roles of RNA

4 Why RNA? Present in all cellular processes. The only molecule that can act both as a genome and as a catalyst. Origin of life (?): the RNA world. Frequent target for antibiotics. © E. Westhof 2005

5 RNA structure: tRNA. Primary structure: GCGGAUUUAGCUCAGUUGGGAGAGCGCCAGACUGAAUAUCUGGAGGUCCUGUGUUCGAUCCCACAGAAUUCGCACCA. [Figures: secondary structure; tertiary structure]

6 RNA structure levels. RNA structure ~ graph of bounded degree containing a (known) Hamiltonian path. Arc-annotated sequences, from most to least general: General (tertiary structure); Crossing (secondary structure with pseudoknots); Nested (secondary structure without pseudoknots); Plain (primary structure).
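The four levels can be told apart mechanically from the set of arcs. The following is a minimal sketch, not taken from the slides: the function name `structure_level` and the 0-based `(i, j)` arc encoding are hypothetical, and it assumes the usual reading of the hierarchy in which "general" means some base carries more than one arc, "crossing" means arcs may cross but never share a base, and "nested" means neither happens.

```python
from itertools import combinations


def structure_level(arcs):
    """Classify an arc-annotated sequence by its arcs.

    `arcs` is an iterable of (i, j) position pairs with i < j
    (hypothetical encoding, chosen for this sketch).
    """
    arcs = sorted(set(arcs))
    if not arcs:
        return "plain"  # no arcs at all: primary structure only

    # Assumption: a base involved in two or more arcs makes the structure "general".
    endpoints = [p for arc in arcs for p in arc]
    if len(endpoints) != len(set(endpoints)):
        return "general"

    # Arcs are sorted and endpoints are distinct, so i < k for every pair;
    # the two arcs cross exactly when k falls inside (i, j) and l falls outside.
    for (i, j), (k, l) in combinations(arcs, 2):
        if i < k < j < l:
            return "crossing"
    return "nested"
```

For instance, `structure_level([(0, 5), (1, 4)])` gives the nested case and `structure_level([(0, 3), (1, 5)])` the crossing (pseudoknot) case.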

7 RNA « Bio-Algorithmics ». Structure prediction (given a sequence). Design: sequence prediction (given a structure). Structural pattern matching. Comparison of two or several structures.

8 Why compare RNA structures? How similar (or different) are they? Applications: classification, phylogeny. Which parts are the most similar between the two structures? Is the small one similar to a part of the large one? Output: a comparison score + a correspondence between the structures.

9 Editing and alignment. We are given a set of basic operations and a score function associated with each of them. Data: two structures $S_1$ and $S_2$. Edit($S_1$, $S_2$): find a best-scoring sequence of operations that changes $S_1$ into $S_2$. Align($S_1$, $S_2$): find a structure $S$ that contains $S_1$ and $S_2$ as substructures, so as to maximize Score(Edit($S_1$, $S$) + Edit($S_2$, $S$)).

10 Example: sequence comparison. Two sequences $v = v_1 v_2 \dots v_n$ and $w = w_1 w_2 \dots w_m$. Edit operations: ins(x,i), del(x,i), subs(x,y,i). CHAT -> del(C,1) -> HAT -> subs(H,R,1) -> RAT. (For sequences, editing ~ alignment: CHAT / -RAT)
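The CHAT / RAT example can be reproduced with the classical dynamic program for sequence editing. This is a minimal sketch under stated assumptions: it minimizes a unit cost per operation rather than maximizing a general score function, the name `edit_script` is hypothetical, and positions in the reported operations are 1-based positions in the original sequences (so the substitution is reported at position 2 of CHAT, not at position 1 of the intermediate HAT).

```python
def edit_script(v, w):
    """Return (cost, operations) for turning v into w with unit-cost ins/del/subs."""
    n, m = len(v), len(w)
    # dist[i][j] = minimum number of operations turning v[:i] into w[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist[i][j] = min(
                dist[i - 1][j - 1] + (v[i - 1] != w[j - 1]),  # match or subs
                dist[i - 1][j] + 1,                           # del
                dist[i][j - 1] + 1,                           # ins
            )

    # Trace back one optimal path to recover the operations.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (v[i - 1] != w[j - 1]):
            if v[i - 1] != w[j - 1]:
                ops.append(("subs", v[i - 1], w[j - 1], i))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("del", v[i - 1], i))
            i -= 1
        else:
            ops.append(("ins", w[j - 1], j))
            j -= 1
    ops.reverse()
    return dist[n][m], ops


cost, ops = edit_script("CHAT", "RAT")
print(cost, ops)
```

Several optimal scripts may exist for a given pair; the traceback above returns one of them.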

11 Example: tree comparison

12 Editing vs alignment. [Figure: the same pair of trees compared by alignment and by editing, with operations Ins, Del and Subs on node labels.] Ancestor relations are conserved.

13 The nested case: comparing secondary structures (without pseudoknots) amounts to tree comparison.
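To make the correspondence between nested structures and trees concrete, here is a minimal sketch that turns a pseudoknot-free structure into an ordered forest: each base pair becomes an internal node and each unpaired base a leaf. The dot-bracket input notation, the function name `dot_bracket_to_forest` and the `(label, children)` node encoding are assumptions made for this example; the slides do not prescribe any of them.

```python
def dot_bracket_to_forest(seq, structure):
    """Convert a nested structure into an ordered forest.

    A node is a pair (label, children). Base pairs are labelled with their
    two bases (e.g. "GC"); unpaired bases are leaves labelled with one base.
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")

    stack = [[]]   # stack of child lists; stack[0] is the top-level forest
    opens = []     # positions of '(' not yet closed
    for pos, (base, sym) in enumerate(zip(seq, structure)):
        if sym == "(":
            opens.append(pos)
            stack.append([])          # children of the pair being opened
        elif sym == ")":
            if not opens:
                raise ValueError("unbalanced ')' at position %d" % pos)
            i = opens.pop()
            children = stack.pop()
            stack[-1].append((seq[i] + base, children))
        elif sym == ".":
            stack[-1].append((base, []))
        else:
            raise ValueError("unexpected symbol %r" % sym)
    if opens:
        raise ValueError("unbalanced '('")
    return stack[0]


forest = dot_bracket_to_forest("GAAACU", "(...).")
print(forest)
```

Because the input has no crossing arcs, a single stack suffices; crossing structures cannot be encoded this way.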

14

15 Tree editing algorithm [Zhang, Shasha 1989]

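16 Tree editing algorithm [Zhang, Shasha 1989]; $O(n^3 \log n)$ [Klein 1998]. [Figure: the forest $\alpha(f) \circ t_1 \circ t_2 \circ \dots \circ t_p$]
$$\mathrm{Score}(\alpha(f), \beta(f')) = \max \begin{cases} \mathrm{Subs}(\alpha, \beta) + \mathrm{Score}(f, f') \\ \mathrm{Ins}(\beta) + \mathrm{Score}(\alpha(f), f') \\ \mathrm{Del}(\alpha) + \mathrm{Score}(f, \beta(f')) \end{cases}$$
$$\mathrm{Score}([\alpha(f) \circ t_1 \circ \dots \circ t_p], [\beta(f') \circ t'_1 \circ \dots \circ t'_q]) = \max \begin{cases} \mathrm{Score}(\alpha(f), \beta(f')) + \mathrm{Score}([t_1 \circ \dots \circ t_p], [t'_1 \circ \dots \circ t'_q]) \\ \mathrm{Ins}(\beta) + \mathrm{Score}([\alpha(f) \circ t_1 \circ \dots \circ t_p], [f' \circ t'_1 \circ \dots \circ t'_q]) \\ \mathrm{Del}(\alpha) + \mathrm{Score}([f \circ t_1 \circ \dots \circ t_p], [\beta(f') \circ t'_1 \circ \dots \circ t'_q]) \end{cases}$$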
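The forest recurrence on this slide can be sketched as a memoized recursion on the leftmost roots of the two forests. This is a naive illustration only: it merges the tree/tree and forest/forest cases into one function, it does not implement the decomposition strategies of Zhang and Shasha or of Klein and makes no claim about their complexity bounds, and the score values `MATCH`, `MISMATCH`, `INDEL` as well as the `(label, children)` tuple encoding are hypothetical choices for the example.

```python
from functools import lru_cache

# Hypothetical scores (assumption): higher is better, as on the slide.
MATCH, MISMATCH, INDEL = 2, -1, -1


def subs(a, b):
    """Score of substituting node label a by node label b."""
    return MATCH if a == b else MISMATCH


@lru_cache(maxsize=None)
def forest_score(F, G):
    """Best editing score between ordered forests F and G.

    A forest is a tuple of trees; a tree is (label, children) where
    children is itself a tuple of trees (tuples keep everything hashable).
    """
    if not F and not G:
        return 0
    if not F:
        # Only insertions remain: insert the leftmost root of G.
        (_, g_children), g_rest = G[0], G[1:]
        return INDEL + forest_score((), g_children + g_rest)
    if not G:
        # Only deletions remain: delete the leftmost root of F.
        (_, f_children), f_rest = F[0], F[1:]
        return INDEL + forest_score(f_children + f_rest, ())

    (a, f_children), f_rest = F[0], F[1:]
    (b, g_children), g_rest = G[0], G[1:]
    return max(
        # Map the two leftmost roots onto each other.
        subs(a, b) + forest_score(f_children, g_children) + forest_score(f_rest, g_rest),
        # Insert the leftmost root of G: its children join the remaining forest.
        INDEL + forest_score(F, g_children + g_rest),
        # Delete the leftmost root of F: its children join the remaining forest.
        INDEL + forest_score(f_children + f_rest, G),
    )


leaf = lambda label: (label, ())
t1 = ("a", (leaf("b"), leaf("c")))
t2 = ("a", (leaf("b"),))
print(forest_score((t1,), (t2,)))
```

With these scores, comparing a(b, c) with a(b) keeps the two matching nodes and deletes c.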

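17 Tree alignment algorithm [Jiang, Wang, Zhang 1995]; $O(n^4)$. [Figure: the forest $\alpha(f) \circ t_1 \circ t_2 \circ \dots \circ t_p$]
$$\mathrm{Score}(\alpha(f), \beta(f')) = \max \begin{cases} \mathrm{Subs}(\alpha, \beta) + \mathrm{Score}(f, f') \\ \mathrm{Ins}(\beta) + \mathrm{Score}(\alpha(f), f') \\ \mathrm{Del}(\alpha) + \mathrm{Score}(f, \beta(f')) \end{cases}$$
$$\mathrm{Score}(\alpha(f) \circ t_1 \circ \dots \circ t_p \,;\, \beta(f') \circ t'_1 \circ \dots \circ t'_q) = \max \begin{cases} \mathrm{Score}(\alpha(f) ; \beta(f')) + \mathrm{Score}(t_1 \circ \dots \circ t_p \,;\, t'_1 \circ \dots \circ t'_q) \\ \mathrm{Ins}(\beta) + \max_i \{ \mathrm{Score}(\alpha(f) \circ \dots \circ t_i \,;\, f') + \mathrm{Score}(t_{i+1} \circ \dots \circ t_p \,;\, t'_1 \circ \dots \circ t'_q) \} \\ \mathrm{Del}(\alpha) + \max_j \{ \mathrm{Score}(f \,;\, \beta(f') \circ t'_1 \circ \dots \circ t'_j) + \mathrm{Score}(t_1 \circ \dots \circ t_p \,;\, t'_{j+1} \circ \dots \circ t'_q) \} \end{cases}$$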

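18 Editing vs alignment: the insertion case.
Editing: $$\mathrm{Score}([\alpha(f), t_1, \dots, t_p], [\beta(f'), t'_1, \dots, t'_q]) = \max \{ \dots,\ \mathrm{Ins}(\beta) + \mathrm{Score}([\alpha(f), t_1, \dots, t_p], [f', t'_1, \dots, t'_q]),\ \dots \}$$
Alignment: $$\mathrm{Score}([\alpha(f), t_1, \dots, t_p], [\beta(f'), t'_1, \dots, t'_q]) = \max \{ \dots,\ \mathrm{Ins}(\beta) + \max_i \{ \mathrm{Score}([\alpha(f), \dots, t_i], f') + \mathrm{Score}([t_{i+1}, \dots, t_p], [t'_1, \dots, t'_q]) \},\ \dots \}$$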

19 Editing vs alignment. [Figure: the insertion cases of the two recurrences, with the forests drawn as pictures; the alignment case splits the first forest between positions i and i+1.]

20 Editing vs alignment. [Figure: same as slide 19, with the annotation "Can be inserted anywhere".]

21 Complexity.
Editing [Zhang, Shasha 1989, Klein 1998]. Worst case: $O(n^4)$ [Zhang-Shasha 1989], $O(n^3 \log n)$ [Klein 1998, Dulucq-Touzet 2003]. On average: $O(n^3)$ [Dulucq-Tichit 2003].
Alignment [Jiang, Wang, Zhang 1995]. Worst case: $O(n^4)$.

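22 Editing operations: problem. [Figure: the structure on AUGG…….UCAU (pairs A-U, U-A, G-C, C-U) and the structure on AUGG…….UCUU (pairs A-U, G-C, C-U and an unpaired U); with the tree operations Delete and Insert, going from one to the other takes 3 operations!]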

23 Editing operations on RNA.
Operations on bases: substitution; deletion / insertion.
Operations on arcs: arc-substitution; arc-deletion / arc-insertion; arc-breaking (new); arc-altering (new).
[Figure: one example per operation on bases A, C, G, U]

24 A first solution [Höchsmann, Töller, Giegerich, Kurtz 2003 (RNAforester)]. [Figure: the structures on AUGG…….UCAU encoded as trees in which each base of a pair is also a node.] But this implies some constraints on the scores. For example: Arc-deletion = Arc-breaking + 2 Base-deletion.

25 Editing operations on RNA.
Operations on bases: substitution; deletion / insertion.
Operations on arcs: arc-substitution; arc-deletion / arc-insertion; arc-breaking; arc-altering.
[Figure: one example per operation on bases A, C, G, U]

26 Complexity of the editing problem, by pair of structure levels: General, Crossing, Nested, Plain.

27 Complexity of the editing problem (rows and columns: General, Crossing, Nested, Plain).
General row: NP-complete.
Crossing row: NP-complete.
Nested row: NP-complete; $O(nm^3)$.
Plain row: $O(nm / \log n)$.
References: Jiang, Lin, Ma, Zhang 2002; Blin, Fertin, Rusu, Sinoquet 2003; Crochemore, Landau, Ziv-Ukelson 2002.
If 2 Score(Arc-altering) = Score(Arc-breaking) + Score(Arc-removing), then there is an $O(n^3 m)$ algorithm for Edit(crossing, nested) and Edit(nested, nested).

28 Complexity of secondary structure comparison.
            Tree operations                                     RNA operations
Editing     $O(n^3 \log n)$ [Zhang-Shasha 1989, Klein 1998]     NP-complete [Blin, Fertin, Sinoquet, Rusu 2003]
Alignment   $O(n^4)$ [Jiang, Wang, Zhang 1995]                  ?

29 Secondary structure alignment. Sequences: ABCDEFG and ABBDFFG. [Figure, two panels labelled Editing and Alignment, showing the pairings A-BCD-EFG / ABB-DF-FG and AB---CDEFG / ABBDF---FG.]

30 New editing operations on trees: arc-breaking; arc-altering. [Figure: each operation applied to a C-G pair node]

31-45 Alignment algorithm, steps 1/5 to 5/5. [Animation frames on a forest f and a tree t: step 1/5 on slide 31, step 2/5 on slides 32-36, step 3/5 on slides 37-41, step 4/5 on slide 42, step 5/5 on slides 43-45.]

46 Complexity of secondary structure comparison.
            Tree operations                                     RNA operations
Editing     $O(n^3 \log n)$ [Zhang-Shasha 1989, Klein 1998]     NP-complete [Blin, Fertin, Sinoquet, Rusu 2003]
Alignment   $O(n^4)$ [Jiang, Wang, Zhang 1995]                  $O(n^4)$ [Herrbach, AD, Dulucq, Touzet 2005]

47 Complexity of secondary structure comparison.
            Tree operations                                     RNA operations
Editing     $O(n^3 \log n)$ [Zhang-Shasha 1989, Klein 1998]     NP-complete [Blin, Fertin, Sinoquet, Rusu 2003]
Alignment   $O(n^4)$ [Jiang, Wang, Zhang 1995]                  $O(n^4)$ [Herrbach, AD, Dulucq, Touzet 2005]
Complexity of the alignment problem for the other structure levels: [Blin, Touzet 2006]

48 Example: two tRNAs, Homo sapiens and Bacillus subtilis. Drawing: Tulip (David Auber et al., LaBRI). Legend: base-substitutions / arc-substitutions; deletions / insertions; arc-breaking; arc-altering.

49 And in real life?

50 Alignment of RNases P

51

52

53

54 To do… Biological validation: test on real data; comparison with other software (RNAforester, MiGal [J. Allali, M.F. Sagot]); combined approaches ([J. Allali, A. Ouangraoua, P. Ferraro]). Parameters: substitution matrices, etc. Statistical evaluation of results. Relevant algorithms and parameters. Useful and user-friendly programs. Sequence/structure alignment. Multiple alignment. …

55 Credits: Julien Allali, David Auber, Serge Dulucq, Claire Herrbach, Rym Kachouri, Yann Ponty, Michel Termier, Laurent Tichit, Hélène Touzet, Eric Westhof

