Published by Nathaniel Chase. Modified over 11 years ago.
1
Algorithms for comparing RNA secondary structures
Alain Denise, Bioinformatique, LRI Orsay (UMR CNRS 8623), Université Paris-Sud 11
2
The multiple roles of RNA [figures, slides 2 to 3; © Ebbe Sloth Andersen]
4
Why RNA?
- Present in all cellular processes
- The only molecule that can act both as a genome and as a catalyst
- Origin of life (?): the RNA world
- Frequent target for antibiotics
© E. Westhof 2005
5
RNA structure: tRNA
Primary structure: GCGGAUUUAGCUCAGUUGGGAGAGCGCCAGACUGAAUAUCUGGAGGUCCUGUGUUCGAUCCCACAGAAUUCGCACCA
[Figures: secondary structure; tertiary structure]
6
RNA structure levels
RNA structure ~ graph of bounded degree containing a (known) Hamiltonian path.
Arc-annotated sequences:
- General (tertiary structure)
- Crossing (secondary structure with pseudoknots)
- Nested (secondary structure without pseudoknots)
- Plain (primary structure)
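The four levels can be told apart from the arcs alone. Here is a minimal Python sketch, assuming the usual definitions for arc-annotated sequences (plain: no arc; nested: no two arcs cross or share a position; crossing: arcs may cross but no position carries two arcs; general: no restriction). The function name `classify` and the 0-based `(i, j)` arc encoding are illustrative choices, not taken from the slides.

```python
from itertools import combinations

def classify(arcs):
    """Return the level of an arc-annotated sequence from its arcs (i, j)."""
    arcs = [tuple(sorted(arc)) for arc in arcs]
    if not arcs:
        return "plain"
    endpoints = [pos for arc in arcs for pos in arc]
    if len(endpoints) != len(set(endpoints)):
        return "general"            # some position carries more than one arc
    for (i, j), (k, l) in combinations(arcs, 2):
        if i < k < j < l or k < i < l < j:
            return "crossing"       # two arcs interleave (pseudoknot)
    return "nested"

print(classify([(0, 5), (1, 4)]))   # nested
print(classify([(0, 3), (1, 4)]))   # crossing
```

The pairwise test is quadratic in the number of arcs, which is enough for illustration; a stack-based scan would do the nested check in linear time.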
7
RNA "Bio-Algorithmics"
- Structure prediction (given the sequence)
- Design: sequence prediction (given the structure)
- Structural pattern matching
- Comparison of two or several structures
8
Why compare RNA structures?
- How similar (or different) are they? Applications: classification, phylogeny
- Which parts of the two structures are the most similar?
- Is the small one similar to a part of the large one?
Result: a comparison score + a correspondence between the structures
9
Edit and alignment
We are given a set of basic operations, each with an associated score.
Data: two structures S1 and S2.
Edit(S1, S2): find a best-scoring sequence of operations that transforms S1 into S2.
Align(S1, S2): find a structure S that contains both S1 and S2 as substructures and maximizes Score(Edit(S1, S) + Edit(S2, S)).
10
Example: sequence comparison
Two sequences v = v1 v2 … vn and w = w1 w2 … wm
Edit operations: ins(x, i), del(x, i), subs(x, y, i)
CHAT → del(C, 1) → HAT → subs(H, R, 1) → RAT
(For sequences, edit ~ alignment: CHAT aligned with -RAT)
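The CHAT to RAT example can be checked with the classical dynamic programme for sequence editing. The sketch below is only an illustration: it assumes unit costs for every operation and minimizes a cost instead of maximizing a score, and the name `edit_distance` is not from the slides.

```python
def edit_distance(v, w, ins=1, dele=1, sub=1):
    """Minimum total cost of insertions, deletions and substitutions turning v into w."""
    n, m = len(v), len(w)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * dele                      # delete the first i letters of v
    for j in range(1, m + 1):
        d[0][j] = j * ins                       # insert the first j letters of w
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = 0 if v[i - 1] == w[j - 1] else sub
            d[i][j] = min(d[i - 1][j] + dele,       # delete v[i-1]
                          d[i][j - 1] + ins,        # insert w[j-1]
                          d[i - 1][j - 1] + match)  # keep or substitute
    return d[n][m]

print(edit_distance("CHAT", "RAT"))  # 2: one deletion (C) and one substitution (H by R)
```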
11
Example: tree comparison
12
Edit vs. alignment
Operations: Ins( ), Del( ), Subs( , )
Ancestor relations are conserved.
[Figure: the same pair of trees compared by edit and by alignment]
13
The nested case
Secondary structures (without pseudoknots) ⇔ tree comparison
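To make the correspondence concrete, here is a small Python sketch that turns a nested structure into an ordered forest. It assumes the structure is given in dot-bracket notation, and it uses one simple encoding among several possible ones: a base pair becomes an internal node labelled with its two bases, an unpaired base becomes a leaf. The name `to_forest` and the `(label, children)` tuple encoding are illustrative assumptions.

```python
def to_forest(seq, dotbracket):
    """Convert a pseudoknot-free structure to a tuple of (label, children) trees."""
    if len(seq) != len(dotbracket):
        raise ValueError("sequence and structure must have the same length")
    levels = [[]]    # levels[-1] collects the children of the innermost open pair
    opening = []     # bases found at the pending '(' positions
    for base, symbol in zip(seq, dotbracket):
        if symbol == "(":
            opening.append(base)
            levels.append([])
        elif symbol == ")":
            if not opening:
                raise ValueError("unbalanced structure")
            children = tuple(levels.pop())
            levels[-1].append((opening.pop() + base, children))
        elif symbol == ".":
            levels[-1].append((base, ()))
        else:
            raise ValueError("unexpected symbol: " + symbol)
    if opening:
        raise ValueError("unbalanced structure")
    return tuple(levels[0])

# One G-C pair enclosing three unpaired A, followed by an unpaired U.
print(to_forest("GAAACU", "(...)."))
```

Nested tuples are hashable, so forests built this way can be used directly as keys in a memoized recurrence.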
15
Tree edit algorithm [Zhang, Shasha 1989]
16
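Tree edit algorithm [Zhang, Shasha 1989]; O(n^3 log n) [Klein 1998]
Notation: α(f) is the tree with root α and child forest f; ∘ is forest concatenation.
[Figure: the forest α(f) ∘ t_1 ∘ t_2 ∘ … ∘ t_p]

Trees:
\[
\mathrm{Score}(\alpha(f), \beta(f')) = \max
\begin{cases}
\mathrm{Subs}(\alpha, \beta) + \mathrm{Score}(f, f') \\
\mathrm{Ins}(\beta) + \mathrm{Score}(\alpha(f), f') \\
\mathrm{Del}(\alpha) + \mathrm{Score}(f, \beta(f'))
\end{cases}
\]

Forests:
\[
\mathrm{Score}(\alpha(f) \circ t_1 \circ \dots \circ t_p,\ \beta(f') \circ t'_1 \circ \dots \circ t'_q) = \max
\begin{cases}
\mathrm{Score}(\alpha(f), \beta(f')) + \mathrm{Score}(t_1 \circ \dots \circ t_p,\ t'_1 \circ \dots \circ t'_q) \\
\mathrm{Ins}(\beta) + \mathrm{Score}(\alpha(f) \circ t_1 \circ \dots \circ t_p,\ f' \circ t'_1 \circ \dots \circ t'_q) \\
\mathrm{Del}(\alpha) + \mathrm{Score}(f \circ t_1 \circ \dots \circ t_p,\ \beta(f') \circ t'_1 \circ \dots \circ t'_q)
\end{cases}
\]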
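The forest recurrence can be run as written with memoization. This is a plain top-down sketch, not the Zhang-Shasha or Klein algorithm: it makes no attempt to bound the number of subforests visited. The scores (0 for a match, -1 for a mismatch, an insertion or a deletion) and the `(label, children)` tuple encoding are assumptions, and the match case is written as Subs + Score(f, f') + Score(rest, rest'), which folds the tree case into the forest case.

```python
from functools import lru_cache

# A tree is (label, children); a forest is a tuple of trees.
INS = DEL = -1                       # assumed unit penalties

def subs(a, b):
    return 0 if a == b else -1       # assumed substitution score

def size(forest):
    return sum(1 + size(children) for _, children in forest)

@lru_cache(maxsize=None)
def edit_score(F, G):
    """Best edit score between two ordered forests."""
    if not F:
        return INS * size(G)         # insert everything that is left
    if not G:
        return DEL * size(F)         # delete everything that is left
    (a, f), T = F[0], F[1:]
    (b, g), U = G[0], G[1:]
    return max(
        subs(a, b) + edit_score(f, g) + edit_score(T, U),  # pair the two roots
        INS + edit_score(F, g + U),   # insert b: its children join the forest
        DEL + edit_score(f + T, G),   # delete a: its children join the forest
    )

leaf = lambda x: (x, ())
T1 = (("a", (("e", (leaf("b"), leaf("c"))), leaf("d"))),)   # a(e(b, c), d)
T2 = (("a", (leaf("b"), ("f", (leaf("c"), leaf("d"))))),)   # a(b, f(c, d))
print(edit_score(T1, T2))  # -2: delete e, then insert f
```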
17
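Tree alignment algorithm [Jiang, Wang, Zhang 1995]; O(n^4)
Notation: α(f) is the tree with root α and child forest f; ∘ is forest concatenation.
[Figure: the forest α(f) ∘ t_1 ∘ t_2 ∘ … ∘ t_p]

Trees:
\[
\mathrm{Score}(\alpha(f), \beta(f')) = \max
\begin{cases}
\mathrm{Subs}(\alpha, \beta) + \mathrm{Score}(f, f') \\
\mathrm{Ins}(\beta) + \mathrm{Score}(\alpha(f), f') \\
\mathrm{Del}(\alpha) + \mathrm{Score}(f, \beta(f'))
\end{cases}
\]

Forests:
\[
\mathrm{Score}(\alpha(f) \circ t_1 \circ \dots \circ t_p,\ \beta(f') \circ t'_1 \circ \dots \circ t'_q) = \max
\begin{cases}
\mathrm{Score}(\alpha(f), \beta(f')) + \mathrm{Score}(t_1 \circ \dots \circ t_p,\ t'_1 \circ \dots \circ t'_q) \\
\mathrm{Ins}(\beta) + \max_i \{ \mathrm{Score}(\alpha(f) \circ \dots \circ t_i,\ f') + \mathrm{Score}(t_{i+1} \circ \dots \circ t_p,\ t'_1 \circ \dots \circ t'_q) \} \\
\mathrm{Del}(\alpha) + \max_j \{ \mathrm{Score}(f,\ \beta(f') \circ t'_1 \circ \dots \circ t'_j) + \mathrm{Score}(t_1 \circ \dots \circ t_p,\ t'_{j+1} \circ \dots \circ t'_q) \}
\end{cases}
\]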
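A memoized sketch of the alignment recurrence follows, under the same assumptions as for editing (unit penalties, trees as `(label, children)` tuples). Two choices here go beyond the slide and should be read as assumptions: the split index also ranges over the empty prefix, so that a whole subtree can be aligned with gaps only, and the match case is written as Subs + Score(f, f') + Score(rest, rest'). This is a direct transcription of the recurrence, not an implementation tuned to the O(n^4) bound.

```python
from functools import lru_cache

# A tree is (label, children); a forest is a tuple of trees.
INS = DEL = -1                       # assumed unit penalties

def subs(a, b):
    return 0 if a == b else -1       # assumed substitution score

def size(forest):
    return sum(1 + size(children) for _, children in forest)

@lru_cache(maxsize=None)
def align_score(F, G):
    """Best alignment score between two ordered forests."""
    if not F:
        return INS * size(G)
    if not G:
        return DEL * size(F)
    (a, f), T = F[0], F[1:]
    (b, g), U = G[0], G[1:]
    best = subs(a, b) + align_score(f, g) + align_score(T, U)
    # b faces a gap: its children g are aligned with a prefix F[:k] of F.
    for k in range(len(F) + 1):
        best = max(best, INS + align_score(F[:k], g) + align_score(F[k:], U))
    # a faces a gap: its children f are aligned with a prefix G[:k] of G.
    for k in range(len(G) + 1):
        best = max(best, DEL + align_score(f, G[:k]) + align_score(T, G[k:]))
    return best

leaf = lambda x: (x, ())
T1 = (("a", (("e", (leaf("b"), leaf("c"))), leaf("d"))),)   # a(e(b, c), d)
T2 = (("a", (leaf("b"), ("f", (leaf("c"), leaf("d"))))),)   # a(b, f(c, d))
print(align_score(T1, T2))  # -4
```

On this pair of trees editing needs only two operations (delete e, insert f), while the best alignment needs four gaps: e and f cannot both sit above c in a common supertree, so c has to be aligned with a gap on each side.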
18
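Edit vs. alignment: the Ins case
Notation: α(f) is the tree with root α and child forest f; ∘ is forest concatenation.

Edit:
\[
\mathrm{Score}(\alpha(f) \circ t_1 \circ \dots \circ t_p,\ \beta(f') \circ t'_1 \circ \dots \circ t'_q) = \max
\begin{cases}
\dots \\
\mathrm{Ins}(\beta) + \mathrm{Score}(\alpha(f) \circ t_1 \circ \dots \circ t_p,\ f' \circ t'_1 \circ \dots \circ t'_q) \\
\dots
\end{cases}
\]

Alignment:
\[
\mathrm{Score}(\alpha(f) \circ t_1 \circ \dots \circ t_p,\ \beta(f') \circ t'_1 \circ \dots \circ t'_q) = \max
\begin{cases}
\dots \\
\mathrm{Ins}(\beta) + \max_i \{ \mathrm{Score}(\alpha(f) \circ \dots \circ t_i,\ f') + \mathrm{Score}(t_{i+1} \circ \dots \circ t_p,\ t'_1 \circ \dots \circ t'_q) \} \\
\dots
\end{cases}
\]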
19
Edit vs. alignment [figures, slides 19 to 20: the Ins case of both recurrences drawn on trees, with the split between positions i and i+1 marked in the alignment case]
Can be inserted anywhere
21
Complexity
Edit [Zhang, Shasha 1989; Klein 1998]
- Worst case: O(n^4) [Zhang-Shasha 1989]; O(n^3 log n) [Klein 1998, Dulucq-Touzet 2003]
- On average: O(n^3) [Dulucq-Tichit 2003]
Alignment [Jiang, Wang, Zhang 1995]
- Worst case: O(n^4)
22
Edit operations: the problem
[Figure: helix A-U, U-A, G-C, C-U (AUGG…….UCAU) versus A-U, U, G-C, C-U (AUGG…….UCUU); pair labels AU, GC, GU, UA, UU; operations Delete( ) and Insert( )]
3 operations!
23
Edit operations on RNA
Operations on bases:
- Substitution
- Deletion / Insertion
Operations on arcs:
- Arc-substitution
- Arc-deletion / Arc-insertion
- Arc-breaking and its inverse
- Arc-altering and its inverse
[Figure: each operation drawn on the bases A, C, G, U; label: New]
24
A first solution
[Figure: the helices A-U, U-A, G-C, C-U and A-U, U, A, G-C, C-U (AUGG…….UCAU) encoded as trees]
But this implies some constraints on the scores. For example: Arc-deletion = Arc-breaking + 2 × Base-deletion.
Höchsmann, Töller, Giegerich, Kurtz 2003 (RNAforester)
25
Edit operations on RNA
Operations on bases:
- Substitution
- Deletion / Insertion
Operations on arcs:
- Arc-substitution
- Arc-deletion / Arc-insertion
- Arc-breaking and its inverse
- Arc-altering and its inverse
[Figure: each operation drawn on the bases A, C, G, U]
26
Complexity of the edit problem
[Table with rows and columns: General, Crossing, Nested, Plain]
27
Complexity of the edit problem
- Edit(General, ·): NP-complete
- Edit(Crossing, ·): NP-complete
- Edit(Nested, Nested): NP-complete
- Edit(Nested, Plain): O(nm^3)
- Edit(Plain, Plain): O(nm / log n)
References: Jiang, Lin, Ma, Zhang 2002; Blin, Fertin, Rusu, Sinoquet 2003; Crochemore, Landau, Ziv-Ukelson 2002
If 2 Score(Arc-altering) = Score(Arc-breaking) + Score(Arc-removing), then there is an O(n^3 m) algorithm for Edit(Crossing, Nested) and Edit(Nested, Nested).
28
Complexity of secondary structure comparison
- Edit, tree operations: O(n^3 log n) [Zhang-Shasha 1989, Klein 1998]
- Edit, RNA operations: NP-complete [Blin, Fertin, Sinoquet, Rusu 2003]
- Alignment, tree operations: O(n^4) [Jiang, Wang, Zhang 1995]
- Alignment, RNA operations: ?
29
Secondary structure alignment
Structures: ABCDEFG and ABBDFFG
First pair:
A-BCD-EFG
ABB-DF-FG
Second pair:
AB---CDEFG
ABBDF---FG
Labels, in slide order: Edit, Alignment
30
New edit operations on trees
- Arc-breaking and its inverse
- Arc-altering and its inverse
[Figure: the operations drawn on a C-G pair]
31
Alignment algorithm (1/5) to (5/5) [figure sequence, slides 31 to 45; labels: f, t]
46
Complexity of secondary structure comparison
- Edit, tree operations: O(n^3 log n) [Zhang-Shasha 1989, Klein 1998]
- Edit, RNA operations: NP-complete [Blin, Fertin, Sinoquet, Rusu 2003]
- Alignment, tree operations: O(n^4) [Jiang, Wang, Zhang 1995]
- Alignment, RNA operations: O(n^4) [Herrbach, Denise, Dulucq, Touzet 2005]
Complexity of the alignment problem for the other structure levels: [Blin, Touzet 2006]
48
Example: two tRNAs
Homo sapiens vs. Bacillus subtilis
Drawing: Tulip (David Auber et al., LaBRI)
Legend: base-subs / arc-subs; deletions / insertions; arc-breaking; arc-altering
49
And in real life?
50
Alignment of RNases P
54
To do…
- Biological validation: tests on real data; comparison with other software (RNAforester, MiGaL [J. Allali, M.-F. Sagot]); combined approaches ([J. Allali, A. Ouangraoua-P. Ferraro])
- Parameters: substitution matrices, etc.
- Statistical evaluation of results
- Relevant algorithms and parameters
- Useful and user-friendly programs
- Sequence/structure alignment
- Multiple alignment
- …
55
Credits: Julien Allali, David Auber, Serge Dulucq, Claire Herrbach, Rym Kachouri, Yann Ponty, Michel Termier, Laurent Tichit, Hélène Touzet, Eric Westhof