Alain Denise, Bioinformatique, LRI Orsay, UMR CNRS 8623, Université Paris-Sud 11. Algorithms for the comparison of RNA secondary structures
The multiple roles of RNA (© Ebbe Sloth Andersen)
Why RNA? Present in all cellular processes. The only molecule that can act both as a genome and as a catalyst. Origin of life (?): the RNA world. A frequent target for antibiotics. © E. Westhof 2005
RNA structure: tRNA. Primary structure: GCGGAUUUAGCUCAGUUGGGAGAGCGCCAGACUGAAUAUCUGGAGGUCCUGUGUUCGAUCCCACAGAAUUCGCACCA. Secondary structure and tertiary structure: [figures].
RNA structure levels. RNA structure ~ graph of bounded degree, containing a (known) Hamiltonian path. Arc-annotated sequences: General (tertiary structure); Crossing (secondary structure with pseudoknots); Nested (secondary structure without pseudoknots); Plain (primary structure).
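The four levels can be told apart mechanically from the set of arcs. The sketch below assumes the usual definitions from the arc-annotated sequence literature, which the slide does not spell out: Plain has no arc, Nested has no two arcs sharing a base and no two arcs crossing, Crossing allows crossing arcs but still at most one arc per base, and General has no restriction. The function name `classify` and the representation of arcs as pairs of positions are illustrative choices.

```python
def classify(arcs):
    """Return the most restrictive level that fits a set of arcs.

    arcs: iterable of (i, j) position pairs on the sequence.
    Definitions of the levels are an assumption (see lead-in).
    """
    arcs = [tuple(sorted(arc)) for arc in arcs]
    if not arcs:
        return "plain"
    # A base incident to more than one arc is only allowed at the general level.
    endpoints = [p for arc in arcs for p in arc]
    if len(endpoints) != len(set(endpoints)):
        return "general"
    # Two arcs (i, j) and (k, l) cross when i < k < j < l.
    for i, j in arcs:
        for k, l in arcs:
            if i < k < j < l:
                return "crossing"
    return "nested"


print(classify([(0, 5), (1, 4)]))  # nested
print(classify([(0, 3), (1, 5)]))  # crossing
```

The quadratic crossing test is kept for readability; it is not meant as an efficient implementation.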
RNA « Bio-Algorithmics »: structure prediction (given a sequence); design, i.e. sequence prediction (given a structure); structural pattern matching; comparison of two or several structures.
Why compare RNA structures? How similar (or different) are they? Applications: classification, phylogeny. Which parts are the most similar between the two structures? Is the small one similar to a part of the large one? Output: comparison score + correspondence between the structures.
Edition and alignment. We are given a set of basic operations and a score function associated with each of them. Data: two structures S1 and S2. Edit(S1, S2): find a best-scoring sequence of operations that changes S1 into S2. Align(S1, S2): find a structure S that contains S1 and S2 as substructures, so as to maximize Score(Edit(S1, S) + Edit(S2, S)).
Example: sequence comparison. Two sequences v = v1 v2 … vn and w = w1 w2 … wm. Edit operations: ins(x,i), del(x,i), subs(x,y,i). CHAT → del(C,1) → HAT → subs(H,R,1) → RAT. (For sequences, edition ~ alignment: CHAT aligned with -RAT.)
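For sequences, the best edit script can be found with the classical dynamic programming table. The sketch below assumes unit costs (each insertion, deletion or substitution costs 1) and minimizes the total cost, whereas the slides speak of maximizing a score; the function name `edit_distance` is illustrative.

```python
def edit_distance(v, w):
    """Minimum number of insertions, deletions and substitutions
    turning v into w (unit costs, an assumption of this sketch)."""
    n, m = len(v), len(w)
    # d[i][j] = distance between the prefixes v[:i] and w[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i  # delete the first i letters of v
    for j in range(1, m + 1):
        d[0][j] = j  # insert the first j letters of w
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + 1,                           # deletion
                d[i][j - 1] + 1,                           # insertion
                d[i - 1][j - 1] + (v[i - 1] != w[j - 1]),  # (mis)match
            )
    return d[n][m]


print(edit_distance("CHAT", "RAT"))  # 2: one deletion, one substitution
```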
Example: tree comparison
Edition vs Alignment [figure: the same pair of trees compared by alignment and by edition]. Operations: Ins, Del, Subs. Ancestor relations are conserved.
The nested case. Secondary structures (without pseudoknots): tree comparison.
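One way to see why the nested case reduces to trees: a pseudoknot-free structure written in dot-bracket notation parses directly into an ordered forest. The sketch below uses one possible encoding, chosen for illustration and not taken from the slides: each base pair becomes an internal node labelled with its two bases, each unpaired base becomes a leaf. The function name and the tuple representation `(label, children)` are assumptions.

```python
def dot_bracket_to_forest(sequence, structure):
    """Parse a pseudoknot-free structure into an ordered forest.

    Each tree is a tuple (label, children); children is a tuple of trees.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    stack = [[]]   # child lists under construction, outermost first
    opening = []   # 5' bases of the pairs that are still open
    for base, symbol in zip(sequence, structure):
        if symbol == "(":
            opening.append(base)
            stack.append([])
        elif symbol == ")":
            if not opening:
                raise ValueError("unbalanced structure")
            children = tuple(stack.pop())
            stack[-1].append((opening.pop() + "-" + base, children))
        elif symbol == ".":
            stack[-1].append((base, ()))
        else:
            raise ValueError("unexpected symbol: " + symbol)
    if opening:
        raise ValueError("unbalanced structure")
    return tuple(stack[0])


print(dot_bracket_to_forest("GAAAC", "(...)"))
# (('G-C', (('A', ()), ('A', ()), ('A', ()))),)
```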
Tree edition algorithm Zhang, Shasha 1989
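Tree edition algorithm [Zhang, Shasha 1989]; O(n^3 log n) [Klein 1998]. Notation: a(f) is the tree with root a and forest of children f; t_1 ∘ … ∘ t_p is a forest of trees; primes mark the second structure.

\[
\mathrm{Score}\big(a(f),\, b(f')\big) = \max
\begin{cases}
\mathrm{Subs}(a,b) + \mathrm{Score}(f,\, f') \\
\mathrm{Ins}(b) + \mathrm{Score}\big(a(f),\, f'\big) \\
\mathrm{Del}(a) + \mathrm{Score}\big(f,\, b(f')\big)
\end{cases}
\]

\[
\mathrm{Score}\big(a(f) \circ t_1 \circ \dots \circ t_p,\; b(f') \circ t'_1 \circ \dots \circ t'_q\big) = \max
\begin{cases}
\mathrm{Score}\big(a(f),\, b(f')\big) + \mathrm{Score}\big(t_1 \circ \dots \circ t_p,\; t'_1 \circ \dots \circ t'_q\big) \\
\mathrm{Ins}(b) + \mathrm{Score}\big(a(f) \circ t_1 \circ \dots \circ t_p,\; f' \circ t'_1 \circ \dots \circ t'_q\big) \\
\mathrm{Del}(a) + \mathrm{Score}\big(f \circ t_1 \circ \dots \circ t_p,\; b(f') \circ t'_1 \circ \dots \circ t'_q\big)
\end{cases}
\]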
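The recurrence can be transcribed almost literally with memoization. This is a minimal sketch, not the optimized Zhang-Shasha or Klein algorithm: it illustrates the case analysis only. The tuple encoding `(label, children)`, the names `score`, `subs`, `size`, and the score values (0 for a match, -1 for everything else) are assumptions made for the example.

```python
from functools import lru_cache

INS = DEL = -1  # hypothetical scores, not from the slides


def subs(a, b):
    return 0 if a == b else -1


def size(forest):
    """Number of nodes in a forest of (label, children) tuples."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def score(F, G):
    """Best edit score between two ordered forests (tuples of trees)."""
    if not F:
        return INS * size(G)   # everything in G is inserted
    if not G:
        return DEL * size(F)   # everything in F is deleted
    (a, f), ts = F[0], F[1:]
    (b, g), us = G[0], G[1:]
    if not ts and not us:
        # tree against tree: the roots are substituted
        match = subs(a, b) + score(f, g)
    else:
        # forest against forest: leftmost trees, then the remaining trees
        match = score(F[:1], G[:1]) + score(ts, us)
    return max(
        match,
        INS + score(F, g + us),   # insert b: its children join the forest
        DEL + score(f + ts, G),   # delete a: its children join the forest
    )


leaf = lambda x: (x, ())
T1 = (("a", (("e", (leaf("b"), leaf("c"))), leaf("d"))),)
T2 = (("a", (leaf("b"), ("f", (leaf("c"), leaf("d"))))),)
print(score(T1, T2))  # -2: delete e, insert f
```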
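Tree alignment algorithm [Jiang, Wang, Zhang 1995]; O(n^4). Notation: a(f) is the tree with root a and forest of children f; t_1 ∘ … ∘ t_p is a forest of trees; primes mark the second structure.

\[
\mathrm{Score}\big(a(f),\, b(f')\big) = \max
\begin{cases}
\mathrm{Subs}(a,b) + \mathrm{Score}(f,\, f') \\
\mathrm{Ins}(b) + \mathrm{Score}\big(a(f),\, f'\big) \\
\mathrm{Del}(a) + \mathrm{Score}\big(f,\, b(f')\big)
\end{cases}
\]

\[
\mathrm{Score}\big(a(f) \circ t_1 \circ \dots \circ t_p \,;\; b(f') \circ t'_1 \circ \dots \circ t'_q\big) = \max
\begin{cases}
\mathrm{Score}\big(a(f) \,;\, b(f')\big) + \mathrm{Score}\big(t_1 \circ \dots \circ t_p \,;\; t'_1 \circ \dots \circ t'_q\big) \\
\mathrm{Ins}(b) + \max_i \big\{ \mathrm{Score}\big(a(f) \circ \dots \circ t_i \,;\; f'\big) + \mathrm{Score}\big(t_{i+1} \circ \dots \circ t_p \,;\; t'_1 \circ \dots \circ t'_q\big) \big\} \\
\mathrm{Del}(a) + \max_j \big\{ \mathrm{Score}\big(f \,;\; b(f') \circ t'_1 \circ \dots \circ t'_j\big) + \mathrm{Score}\big(t_1 \circ \dots \circ t_p \,;\; t'_{j+1} \circ \dots \circ t'_q\big) \big\}
\end{cases}
\]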
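The alignment recurrence differs from edition in the Ins and Del cases, where a split point among the sibling trees has to be chosen. The sketch below is a direct memoized transcription under stated assumptions: trees are tuples `(label, children)`, the scores are hypothetical (0 for a match, -1 otherwise), the split index is allowed to range over all prefixes including the empty one, and the root-against-root case is written as a substitution. It is meant to illustrate the case analysis, not to reproduce the published O(n^4) implementation.

```python
from functools import lru_cache

INS = DEL = -1  # hypothetical scores, not from the slides


def subs(a, b):
    return 0 if a == b else -1


def size(forest):
    """Number of nodes in a forest of (label, children) tuples."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def align(F, G):
    """Best alignment score between two ordered forests (tuples of trees)."""
    if not F:
        return INS * size(G)
    if not G:
        return DEL * size(F)
    (a, f), ts = F[0], F[1:]
    (b, g), us = G[0], G[1:]
    # the two leftmost roots are aligned with each other
    best = subs(a, b) + align(f, g) + align(ts, us)
    # b is inserted: a prefix F[:i] of whole trees goes below b
    for i in range(len(F) + 1):
        best = max(best, INS + align(F[:i], g) + align(F[i:], us))
    # a is deleted: a prefix G[:j] of whole trees goes below a
    for j in range(len(G) + 1):
        best = max(best, DEL + align(f, G[:j]) + align(ts, G[j:]))
    return best


leaf = lambda x: (x, ())
T1 = (("a", (("e", (leaf("b"), leaf("c"))), leaf("d"))),)
T2 = (("a", (leaf("b"), ("f", (leaf("c"), leaf("d"))))),)
print(align(T1, T2))  # -4
```

On this pair of trees an edit script needs only two operations (delete e, insert f), but no common supertree keeps b, c and d all matched, so the alignment pays for four operations.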
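Edition vs Alignment: the Ins case (a and b are the roots of the leftmost trees, f and f' their forests of children).

Edition:
\[
\mathrm{Score}\big(a(f) \circ t_1 \circ \dots \circ t_p,\; b(f') \circ t'_1 \circ \dots \circ t'_q\big) = \max \big\{ \dots,\; \mathrm{Ins}(b) + \mathrm{Score}\big(a(f) \circ t_1 \circ \dots \circ t_p,\; f' \circ t'_1 \circ \dots \circ t'_q\big),\; \dots \big\}
\]

Alignment:
\[
\mathrm{Score}\big(a(f) \circ t_1 \circ \dots \circ t_p,\; b(f') \circ t'_1 \circ \dots \circ t'_q\big) = \max \big\{ \dots,\; \mathrm{Ins}(b) + \max_i \big\{ \mathrm{Score}\big(a(f) \circ \dots \circ t_i,\; f'\big) + \mathrm{Score}\big(t_{i+1} \circ \dots \circ t_p,\; t'_1 \circ \dots \circ t'_q\big) \big\},\; \dots \big\}
\]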
Edition vs Alignment [figure: the Ins case of the edition recurrence, Ins + Score, next to the Ins case of the alignment recurrence, Ins + Max_i { Score + Score }, with the split between siblings i and i+1]. Can be inserted anywhere.
Complexity. Edition [Zhang, Shasha 1989; Klein 1998]: worst case O(n^4) [Zhang-Shasha 1989], O(n^3 log n) [Klein 1998, Dulucq-Touzet 2003]; on average O(n^3) [Dulucq-Tichit 2003]. Alignment [Jiang, Wang, Zhang 1995]: worst case O(n^4).
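Edition operations: problem [figure: a helix with pairs A-U, U-A, G-C, C-U (sequence AUGG…….UCAU) next to the same helix where the second pair is replaced by unpaired U bases (sequence AUGG…….UCUU); tree nodes AU, GC, GU, UA, UU]. With the tree operations Delete and Insert: 3 operations!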
Edition operations on RNA. Operations on bases: substitution; deletion / insertion. Operations on arcs: arc-substitution; arc-deletion / arc-insertion; arc-breaking / …; arc-altering / …. New: arc-breaking and arc-altering. [Figure: each operation illustrated on bases A, C, G, U.]
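To make the arc operations concrete, the sketch below reads them off a given alignment of two arc-annotated sequences. It assumes the usual reading of these operations, which the slide shows only as pictures: an arc whose two bases both face gaps is an arc-deletion, an arc with exactly one base facing a gap is an arc-altering, an arc whose bases face two bases not joined by an arc is an arc-breaking, and an arc facing an arc is a match or an arc-substitution. Operations and their inverses are counted under the same name. All function names and the column-index representation of arcs are illustrative.

```python
from collections import Counter

GAP = "-"


def classify_arc(arc, own_row, other_row, other_arcs):
    """Name the operation undergone by one arc of one structure."""
    i, j = arc
    gaps = (other_row[i] == GAP) + (other_row[j] == GAP)
    if gaps == 2:
        return "arc-deletion"      # both bases disappear with the arc
    if gaps == 1:
        return "arc-altering"      # one base disappears with the arc
    if arc in other_arcs:
        same = own_row[i] == other_row[i] and own_row[j] == other_row[j]
        return "arc-match" if same else "arc-substitution"
    return "arc-breaking"          # both bases kept, arc lost


def arc_operations(row1, arcs1, row2, arcs2):
    """Count arc operations implied by an alignment.

    row1, row2: aligned sequences of equal length, '-' for gaps.
    arcs1, arcs2: sets of (i, j) column indices, i < j.
    """
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    arcs1, arcs2 = set(arcs1), set(arcs2)
    ops = Counter()
    for arc in arcs1:
        ops[classify_arc(arc, row1, row2, arcs2)] += 1
    for arc in arcs2 - arcs1:      # arcs present in both were counted above
        ops[classify_arc(arc, row2, row1, arcs1)] += 1
    return ops


print(arc_operations("GAAAC", {(0, 4)}, "GAAAU", set()))
# Counter({'arc-breaking': 1})
```

Base substitutions, insertions and deletions on unpaired positions are deliberately left out to keep the example short.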
A first solution [Höchsmann, Töller, Giegerich, Kurtz 2003 (RNAforester)] [figure: the helix A-U, U-A, G-C, C-U (sequence AUGG…….UCAU) and its variant with unpaired U and A, each encoded as a tree whose nodes carry the bases]. But this implies some constraints on the scores. For example: Arc-deletion = Arc-breaking + 2 × Base-deletion.
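A small numeric illustration of this constraint, with hypothetical score values that are not taken from the slides: once the scores of arc-breaking and base-deletion are chosen, the score of arc-deletion is no longer a free parameter.

```latex
\[
\text{Arc-breaking} = -1,\quad \text{Base-deletion} = -2
\;\Longrightarrow\;
\text{Arc-deletion} = \text{Arc-breaking} + 2 \times \text{Base-deletion} = -1 + 2 \times (-2) = -5 .
\]
```

A scoring scheme in which removing a whole base pair should cost less than -5 in absolute value, for instance, cannot be expressed under this constraint.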
Complexity of the edition problem (levels: General, Crossing, Nested, Plain)
Edit(General, any level): NP-complete
Edit(Crossing, Crossing), Edit(Crossing, Nested), Edit(Crossing, Plain): NP-complete
Edit(Nested, Nested): NP-complete
Edit(Nested, Plain): O(nm^3)
Edit(Plain, Plain): O(nm / log n)
References: Jiang, Lin, Ma, Zhang 2002; Blin, Fertin, Rusu, Sinoquet 2003; Crochemore, Landau, Ziv-Ukelson 2002.
If 2 Score(Arc-altering) = Score(Arc-breaking) + Score(Arc-removing), then there is an algorithm in O(n^3 m) for Edit(Crossing, Nested) and Edit(Nested, Nested).
Complexity of secondary structure comparison
Edition, tree operations: O(n^3 log n) [Zhang-Shasha 1989, Klein 1998]
Edition, RNA operations: NP-complete [Blin, Fertin, Sinoquet, Rusu 2003]
Alignment, tree operations: O(n^4) [Jiang, Wang, Zhang 1995]
Alignment, RNA operations: ?
Secondary structure alignment. Sequences ABCDEFG and ABBDFFG. Edition: A-BCD-EFG over ABB-DF-FG. Alignment: AB---CDEFG over ABBDF---FG.
New edition operations on trees: arc-breaking / …; arc-altering / …. [Figure: examples on a C-G pair.]
Alignment algorithm (steps 1/5 to 5/5) [figures: the successive cases of the algorithm, drawn on structures labelled f and t].
Complexity of secondary structure comparison
Edition, tree operations: O(n^3 log n) [Zhang-Shasha 1989, Klein 1998]
Edition, RNA operations: NP-complete [Blin, Fertin, Sinoquet, Rusu 2003]
Alignment, tree operations: O(n^4) [Jiang, Wang, Zhang 1995]
Alignment, RNA operations: O(n^4) [Herrbach, Denise, Dulucq, Touzet 2005]
Complexity of the alignment problem for the other structure levels: [Blin, Touzet 2006]
Example: two tRNAs, Homo sapiens and Bacillus subtilis. Drawing: Tulip (David Auber et al., LaBRI). Legend: base-subs / arc-subs; deletions / insertions; arc-breaking; arc-altering.
And in real life?
Alignment of RNases P
To do… Biological validation: test on real data; comparison with other software (RNAforester, MiGal [J. Allali, M.F. Sagot]); combined approaches ([J. Allali, A. Ouangraoua-P. Ferraro]). Parameters: substitution matrices etc. Statistical evaluation of results. Relevant algorithms and parameters. Useful and user-friendly programs. Sequence/structure alignment. Multiple alignment. …
Credits: Julien Allali, David Auber, Serge Dulucq, Claire Herrbach, Rym Kachouri, Yann Ponty, Michel Termier, Laurent Tichit, Hélène Touzet, Eric Westhof