SPIRE Normalized Similarity of RNA Sequences


1 SPIRE 2005 - Normalized Similarity of RNA Sequences
Rolf Backofen Danny Hermelin Gad M. Landau Oren Weimann

2 RNA sequences
[Figure: example RNA sequence C G G C U A A U C A G U C G U A]

3 RNA sequences
[Figure: example RNA sequence C C G U A G U A C C A C A G U G U G G C G C G G C C A U]

4 LCS of Strings
S1 = C C G U A G U A C C A C A G U G U G G
S2 = G A G C A G C C C U C G G G A A U U G
Global LCS: [Hirschberg 1977]
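As a point of reference, here is a minimal sketch of global LCS for two strings using the textbook O(nm) dynamic program. It is not necessarily the specific method of [Hirschberg 1977], and the function name `lcs_length` is our own.

```python
def lcs_length(s1: str, s2: str) -> int:
    """Length of a longest common subsequence of s1 and s2 (classic O(nm) DP)."""
    n, m = len(s1), len(s2)
    # dp[i][j] = LCS length of the prefixes s1[:i] and s2[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if s1[i - 1] == s2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1  # extend a common subsequence
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])  # drop one character
    return dp[n][m]


S1 = "CCGUAGUACCACAGUGUGG"
S2 = "GAGCAGCCCUCGGGAAUUG"
print(lcs_length(S1, S2))
```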

5 LCS of RNA Sequences
[Figure: RNA sequences R1 and R2 with their arcs, marking a left arc match and a right arc match.]
There are arc matches and non-arc matches; we compute the LCS of every two arcs (one in R1, one in R2).
RNA global LCS: [Klein 1998]

6 Global Similarity - LCS
Look for the largest set of matches that is strictly increasing in both rows and columns and obeys the arc restrictions.
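The definition above can be checked directly on tiny inputs. The sketch below is exponential brute force, not Klein's algorithm, and it rests on assumptions of our own: `pair1[i]` / `pair2[j]` hold the arc partner of a position or `None`; unpaired bases match only unpaired bases; left arc ends match only left arc ends (right only right); and an arc match counts only if the matches at both ends of the arc are chosen.

```python
from itertools import combinations


def rna_lcs_bruteforce(s1, pair1, s2, pair2):
    """Size of the largest arc-legal chain of matches (exponential: tiny inputs only)."""

    def kind_ok(i, j):
        # Assumed matching rules (hypothetical, see the lead-in).
        if s1[i] != s2[j]:
            return False
        p, q = pair1[i], pair2[j]
        if p is None or q is None:
            return p is None and q is None  # unpaired matches only unpaired
        return (p > i) == (q > j)  # left end with left end, right end with right end

    # Candidate matches in row-major order, so every combination comes out sorted.
    matches = [(i, j) for i in range(len(s1)) for j in range(len(s2)) if kind_ok(i, j)]
    for size in range(len(matches), 0, -1):
        for chosen in combinations(matches, size):
            increasing = all(a[0] < b[0] and a[1] < b[1] for a, b in zip(chosen, chosen[1:]))
            chosen_set = set(chosen)
            # Every arc-end match needs the match at the other end of its arcs.
            arcs_whole = all(
                (pair1[i], pair2[j]) in chosen_set
                for i, j in chosen
                if pair1[i] is not None
            )
            if increasing and arcs_whole:
                return size
    return 0
```

For example, with R1 = GC and R2 = GA, each a single arc, the two G's form a left arc match, but the right ends C and A do not match, so the arc cannot be taken whole and the result is 0.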

7 Local Similarity – Normalized LCS
Report the most similar substring pair according to some scoring scheme. In our case, we look for the substrings (with their arcs) that maximize a normalized LCS score, which can be viewed as a measure of the density of the matches. A single match is always optimal, so we set a minimum score of M.

8 Local Similarity in Strings
Local edit distance: O(nm) [Smith Waterman 1981]
Normalized LCS: O(mn log n) [Arslan Pevzner 2001]
Normalized LCS for sparse matrices: O(rL log log n) [Efraty Landau 2004]

9 Our Result
A novel local similarity metric for comparing RNA sequences.
An algorithm for computing this metric that is as fast as the global algorithm (in contrast to the case of strings).

10 Definitions
A chain is a sequence of matches that is strictly increasing in rows and columns.
The length of a chain from (i,j) to match (i',j') is (i'-i) + (j'-j).
A k-chain(i,j) is the shortest chain of k matches starting from (i,j). The chain must be legal with respect to the arcs. A chain never really starts at a mismatch, but allowing (i,j) to be a mismatch is needed for the DP.
[Figure: match matrix of R1 and R2 with a chain from (i,j) to (i',j').]
The normalized value of k-chain(i,j) is k divided by its length.
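A small sketch of the length and normalized-value definitions, assuming a chain is given as a list of (row, column) matches. It checks only the "strictly increasing" condition, not arc legality, and the helper names are hypothetical.

```python
from fractions import Fraction


def is_chain(start, chain):
    """True if the matches lie at or after start and strictly increase in rows and columns."""
    i, j = start
    first_i, first_j = chain[0]
    if first_i < i or first_j < j:
        return False
    return all(a < c and b < d for (a, b), (c, d) in zip(chain, chain[1:]))


def chain_length(start, chain):
    """(i' - i) + (j' - j), measured from start (i, j) to the last match (i', j')."""
    (i, j), (i_last, j_last) = start, chain[-1]
    return (i_last - i) + (j_last - j)


def normalized_value(start, chain):
    """k divided by the chain's length; a lone match at start has length 0 and raises."""
    return Fraction(len(chain), chain_length(start, chain))


start = (0, 0)
chain = [(0, 0), (1, 2), (3, 3)]  # k = 3 matches
print(chain_length(start, chain), normalized_value(start, chain))  # prints "6 1/2"
```

The zero-length case is the degenerate "single match is always optimal" situation, which the minimum score M rules out.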

11 General Idea
Construct (k+1)-chain(i,j) by concatenating (i,j) to k-chain(i',j') for some i' > i and j' > j.
For the moment, let's assume there are no arcs. By the BEST k-chain we mean the one that makes the value of the new chain ((i,j) followed by the k-chain) best.
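Still ignoring arcs, the construction can be written down naively: for every match (i,j), try every match (i',j') strictly below and to the right. All (k+1)-chains have the same number of matches, so the best value (k+1)/length belongs to the shortest one. This sketch is for illustration only; `shortest_k_chains` is a hypothetical name, and it tracks chain lengths rather than the chains themselves.

```python
import math


def shortest_k_chains(s1: str, s2: str, k: int) -> dict:
    """Map each match (i, j) to the length of its shortest k-chain (math.inf if none).

    Arcs are ignored. The (k+1)-chain at (i, j) is (i, j) concatenated with the
    k-chain at some match (i', j') with i' > i and j' > j, chosen to minimize the
    total length (equivalently, to maximize (k+1) / length).
    """
    matches = [(i, j) for i, a in enumerate(s1) for j, b in enumerate(s2) if a == b]
    best = {p: 0 for p in matches}  # 1-chains: a single match has length 0
    for _ in range(k - 1):
        # The comprehension reads the previous layer; `best` is rebound afterwards.
        best = {
            (i, j): min(
                ((ni - i) + (nj - j) + best[(ni, nj)]
                 for (ni, nj) in matches if ni > i and nj > j),
                default=math.inf,
            )
            for (i, j) in matches
        }
    return best


print(shortest_k_chains("ACB", "AB", 2))  # {(0, 0): 3, (2, 1): inf}
```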

12 Decomposing k-Chains

13 Decomposing k-Chains (non arc match)
Use the match: concatenate (i,j) to the best (k-1)-chain strictly below and to the right of it.

14 Decomposing k-Chains (mismatch)
Skip (i,j): take the best k-chain below or to the right of it.
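Without arcs, these two cases give a DP that fills each k-layer from the bottom-right corner toward the top-left in O(nm) time. A minimal sketch, assuming lengths are measured as defined above (a step to (i+1, j+1) costs 2, a step to (i+1, j) or (i, j+1) costs 1); arc cases are left out and the names are ours.

```python
import math


def k_chain_tables(s1: str, s2: str, k_max: int):
    """C[k][i][j] = length of the shortest k-chain starting at cell (i, j); arcs ignored.

    match (i, j):    C[1] = 0, and C[k] = 2 + C[k-1][i+1][j+1] for k >= 2
    mismatch (i, j): C[k] = 1 + min(C[k][i+1][j], C[k][i][j+1])
    Row n and column m are sentinels holding math.inf (no chain).
    """
    n, m = len(s1), len(s2)
    C = [None]  # index 0 unused so that C[k] is the k-layer
    for k in range(1, k_max + 1):
        cur = [[math.inf] * (m + 1) for _ in range(n + 1)]
        for i in range(n - 1, -1, -1):  # bottom right to top left
            for j in range(m - 1, -1, -1):
                if s1[i] == s2[j]:
                    cur[i][j] = 0 if k == 1 else 2 + C[k - 1][i + 1][j + 1]
                else:
                    cur[i][j] = 1 + min(cur[i + 1][j], cur[i][j + 1])
        C.append(cur)
    return C


C = k_chain_tables("ACB", "AB", 2)
print(C[2][0][0])  # 3: match A-A, then B-B two rows and one column later
```

For a match we do not also try skipping it: without arcs, dropping the first match of a k-chain and restarting one row and one column later never makes the chain longer, so using the match is always at least as good. The best k-chain overall has length min over all cells of C[k][i][j], and its normalized value is k divided by that length.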

15 Decomposing k-Chains (right arc match)
Treat it like a mismatch, since this match cannot be used by a chain that starts at it (its left arc partner would lie before the start). The chain cannot connect to the same row or column (the column case is shown in the figure), since the matches there are right arc matches. Take the best k-chain.
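Which case applies at a cell depends only on the two letters and the arc structure. A sketch of that classification, assuming (our assumption, not stated on the slides) that letters must agree for every match and that arc ends match only arc ends on the same side; `cell_kind` and the `pair1`/`pair2` partner arrays are hypothetical.

```python
def cell_kind(s1, pair1, s2, pair2, i, j):
    """Classify cell (i, j) of the match matrix (hypothetical helper).

    pair1[i] / pair2[j] give the arc partner of a position, or None if it is unpaired.
    """
    if s1[i] != s2[j]:
        return "mismatch"
    p, q = pair1[i], pair2[j]
    if p is None and q is None:
        return "non-arc match"
    if p is not None and q is not None:
        if p > i and q > j:
            return "left arc match"
        if p < i and q < j:
            return "right arc match"  # handled exactly like a mismatch by the DP
    # Assumed: an arc end never matches an unpaired base or an opposite arc end.
    return "mismatch"
```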

16 Decomposing k-Chains (left arc match)
Option 1: don't use the match; use any k-chain in the gray area.

17 Decomposing k-Chains (left arc match I)
Option 1: don't use the match; take the best k-chain in the gray area.

18 Example 2-Chain
Example for Option 1.

19 Decomposing k-Chains (left arc match II)
Option 2: use the match; then we need to take the whole arc!

20 Decomposing k-Chains (left arc match II)
Option 3: use the match; then we need to take the whole arc, and if k > lcs, continue with the best (k-lcs)-chain.

21 Decomposing k-Chains (left arc match III)
Option 2: use the match; then we need to take the whole arc, with k ≤ lcs.

22 Example 3-Chain
Option 2: use the match; then we need to take the whole arc, with k ≤ lcs.
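Options 1 to 3 can be combined for a single left arc match (i,j) whose arcs close at (ir, jr). This is our reading of the slides, not a confirmed detail: we treat a k-chain as having at least k matches, which is what the k ≤ lcs case implies (taking the whole arc already yields lcs ≥ k matches), and we assume `lcs` counts the arc endpoints. The function and its parameters are hypothetical.

```python
from typing import Callable


def left_arc_match(k: int, i: int, j: int, ir: int, jr: int, lcs: int,
                   below: float, right: float,
                   after_arc: Callable[[int], float]) -> float:
    """Length of the shortest k-chain at a left arc match (i, j); arcs close at ir and jr.

    below, right: C_k(i+1, j) and C_k(i, j+1), used by Option 1 (skip the match).
    after_arc(t): C_t(ir+1, jr+1), the shortest t-chain after both arcs.
    lcs: LCS of the two arcs, endpoints included (e.g. from Klein's algorithm).
    """
    option1 = 1 + min(below, right)
    span = (ir - i) + (jr - j)  # the whole pair of arcs must be taken
    if k <= lcs:
        use = span  # Option 2: the arcs alone already give lcs >= k matches
    else:
        # Option 3: the arcs, one diagonal step to (ir+1, jr+1), then a (k-lcs)-chain
        use = span + 2 + after_arc(k - lcs)
    return min(option1, use)
```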

23 The Algorithm (Given R1,R2)
Run Klein's algorithm to get the LCS of every arc in R1 with every arc in R2.
For k = 1, 2, …, n:
  Construct all k-chains from bottom right to top left using DP.
  Report the best k-chain.
In total, this is as fast as the global LCS; the bottleneck is the global LCS computation.
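An end-to-end sketch of the loop above, combining the cases from the decomposition slides under explicit assumptions of our own: a k-chain has at least k matches; letters must agree for every match; arc ends match only arc ends on the same side; and `arc_lcs[(i, j)]` is the LCS of the arc opened at i in R1 with the arc opened at j in R2, endpoints included. Klein's algorithm is not implemented here (`arc_lcs` is a precomputed input), and every name is hypothetical. The sketch keeps all layers in memory and takes O(n²m) time after preprocessing, so it makes no claim about the running time stated on the slide.

```python
import math
from fractions import Fraction


def best_normalized_chain(s1, pair1, s2, pair2, arc_lcs, M):
    """Return (best k / length, k) over all k-chains with k >= M, or None.

    pair1[i] / pair2[j]: index of the arc partner, or None if unpaired.
    arc_lcs[(i, j)]: assumed precomputed arc-vs-arc LCS (>= 1, endpoints included).
    """
    n, m = len(s1), len(s2)
    tables = [None]  # tables[k][i][j] = C_k(i, j); row n and column m are sentinels
    best = None
    for k in range(1, n + 1):
        cur = [[math.inf] * (m + 1) for _ in range(n + 1)]
        for i in range(n - 1, -1, -1):  # bottom right to top left
            for j in range(m - 1, -1, -1):
                skip = 1 + min(cur[i + 1][j], cur[i][j + 1])  # mismatch / Option 1
                p, q = pair1[i], pair2[j]
                if s1[i] != s2[j]:
                    cur[i][j] = skip
                elif p is None and q is None:  # non-arc match
                    cur[i][j] = 0 if k == 1 else 2 + tables[k - 1][i + 1][j + 1]
                elif p is not None and q is not None and p > i and q > j:
                    lcs = arc_lcs[(i, j)]  # left arc match: take the whole arc
                    span = (p - i) + (q - j)
                    use = span if k <= lcs else span + 2 + tables[k - lcs][p + 1][q + 1]
                    cur[i][j] = min(skip, use)
                else:
                    # Right arc match, or an arc end facing an unpaired base / opposite end.
                    cur[i][j] = skip
        tables.append(cur)
        length = min(min(row) for row in cur)  # shortest k-chain anywhere
        if k >= M and 0 < length < math.inf:
            if best is None or Fraction(k, length) > best[0]:
                best = (Fraction(k, length), k)
    return best


# R1 = R2 = GAC, with one arc joining G and C in each sequence.
print(best_normalized_chain("GAC", [2, None, 0], "GAC", [2, None, 0], {(0, 0): 3}, 2))
```

In this example the call returns `(Fraction(3, 4), 3)`: the whole arc, 3 matches over length 4, beats any shorter chain once M = 2 rules out the lone A-A match.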

24 The DP

25 Thank you for your attention

