SPIRE Normalized Similarity of RNA Sequences

Presentation transcript:

SPIRE 2005 - Normalized Similarity of RNA Sequences
Rolf Backofen, Danny Hermelin, Gad M. Landau, Oren Weimann

RNA sequences C G G C U A A U C A G U C G U A

RNA sequences C C G U A G U A C C A C A G U G U G G C G C G G C C A U

LCS of Strings
S1 = C C G U A G U A C C A C A G U G U G G
S2 = G A G C A G C C C U C G G G A A U U G
Global LCS: [Hirschberg 1977]
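As a point of reference for the global LCS of two plain strings, here is a minimal sketch of the textbook quadratic dynamic program. It is not the algorithm of [Hirschberg 1977] cited on the slide, and the function name `lcs_length` is an illustrative choice, not something from the talk.

```python
def lcs_length(s1: str, s2: str) -> int:
    """Length of a longest common subsequence of s1 and s2 (textbook DP)."""
    # prev[j] holds the LCS length of the processed prefix of s1 and s2[:j].
    prev = [0] * (len(s2) + 1)
    for a in s1:
        curr = [0]
        for j, b in enumerate(s2, start=1):
            if a == b:
                # A match extends the best solution of both shorter prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise drop the last character of one of the strings.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]
```

The sketch keeps only one previous row, so it uses space linear in the length of `s2`.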

LCS of RNA sequences
(Figure: two RNA sequences R1 and R2 with arcs, marking a left arc match and a right arc match.)
A match is either an arc match or a non-arc match. We get the LCS of every pair of arcs (one in R1, one in R2).
RNA Global LCS: [Klein 1998]

Global Similarity - LCS
Look for the largest set of matches that is strictly increasing in both rows and columns and obeys the arc restrictions.

Local Similarity: Normalized LCS
Report the most similar substring pair according to some scoring scheme. In our case, we look for the substrings R1' and R2' (with their arcs) that maximize the number of matches divided by the total length of the two substrings: LCS(R1', R2') / (|R1'| + |R2'|).
This can be viewed as a measure of the density of the matches. A single match is always optimal, so we set a minimum score of M.
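To make the local measure concrete, the following brute-force sketch works on plain strings and ignores arcs entirely. It assumes the quantity being maximized is the number of matches divided by the sum of the two substring lengths, and it treats the minimum score M as a lower bound on the number of matches. The names `lcs_length` and `best_normalized_pair` are hypothetical, and the exhaustive search is only meant for very short inputs.

```python
from fractions import Fraction


def lcs_length(a: str, b: str) -> int:
    """Textbook LCS length, used here as the match count of a substring pair."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def best_normalized_pair(s1: str, s2: str, min_matches: int):
    """Return (value, sub1, sub2) maximizing LCS / (len(sub1) + len(sub2)).

    Only pairs with at least min_matches matches are considered.
    Returns None when no pair qualifies. Arcs are NOT modeled (assumption).
    """
    best = None
    for i in range(len(s1)):
        for i_end in range(i + 1, len(s1) + 1):
            for j in range(len(s2)):
                for j_end in range(j + 1, len(s2) + 1):
                    a, b = s1[i:i_end], s2[j:j_end]
                    k = lcs_length(a, b)
                    if k < min_matches:
                        continue
                    value = Fraction(k, len(a) + len(b))
                    if best is None or value > best[0]:
                        best = (value, a, b)
    return best
```

Under this assumed measure the value can never exceed 1/2, because the number of matches is at most the length of the shorter substring. A single matched pair of characters already reaches 1/2, which is why a minimum number of matches is needed to get informative answers.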

Local Similarity in Strings
Local edit distance: O(nm) [Smith Waterman 1981]
Normalized LCS: O(mn log n) [Arslan Pevzner 2001]
Normalized LCS for sparse matrices: O(rL log log n) [Efraty Landau 2004]

Our Result
A novel local similarity metric for comparing RNA sequences.
An algorithm for computing this metric that is as fast as the global algorithm (in contrast to the case of strings).

Definitions
A chain is a sequence of matches that is strictly increasing in both rows and columns. The chain must be legal with respect to the arcs.
The length of a chain from (i,j) to the match (i',j') is i'-i+j'-j.
A k-chain(i,j) is the shortest chain of k matches starting from (i,j). A chain never really starts at a mismatch, but such entries are needed for the DP.
The normalized value of k-chain(i,j) is k divided by its length.
(Figure: the table over R1 and R2, with dimensions n and m, showing a chain from (i,j) to (i',j').)
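The definitions can be written down directly. The sketch below represents a chain as a list of (row, column) matches, checks only the strictly increasing condition (arc legality is not modeled), and computes the length and normalized value as stated on the slide. All function names are illustrative assumptions.

```python
from fractions import Fraction


def is_chain(matches):
    """True if the matches are strictly increasing in both rows and columns."""
    return all(a[0] < b[0] and a[1] < b[1] for a, b in zip(matches, matches[1:]))


def chain_length(start, matches):
    """Length i'-i+j'-j from the start cell (i,j) to the last match (i',j')."""
    i, j = start
    i_end, j_end = matches[-1]
    return (i_end - i) + (j_end - j)


def normalized_value(start, matches):
    """k divided by the chain length, where k is the number of matches.

    Returns None when the length is 0 (a single match at the start cell),
    since the ratio is undefined under the formula as stated.
    """
    if not matches or not is_chain(matches):
        raise ValueError("matches must form a non-empty chain")
    if matches[0][0] < start[0] or matches[0][1] < start[1]:
        raise ValueError("the chain must not begin before the start cell")
    length = chain_length(start, matches)
    if length == 0:
        return None
    return Fraction(len(matches), length)
```

For example, the chain (0,0), (2,1) measured from (0,0) has length 2 + 1 = 3 and contains 2 matches, so its normalized value is 2/3.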

General idea
Construct (k+1)-chain(i,j) by concatenating (i,j) to k-chain(i',j').
For the moment, assume there are no arcs. The "best" k-chain is the one for which the value of the new chain (the highlighted match (i,j) plus the chain) is best.
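The concatenation idea can be sketched for the arc-free case as a dynamic program filled from the bottom right to the top left. This is an illustrative reading of the slide, not the authors' implementation: the table `end`, the function name `shortest_chain_lengths`, and the choice to store the smallest i'+j' of the last match are all assumptions made for the sketch.

```python
import math


def shortest_chain_lengths(s1: str, s2: str, max_k: int):
    """Arc-free sketch: lengths[k][i][j] is the length of k-chain(i,j).

    The length is i'-i+j'-j, where (i',j') is the last match of the shortest
    chain of k matches in rows >= i and columns >= j (math.inf if none exists).
    """
    n, m = len(s1), len(s2)
    inf = math.inf
    # end[k][i][j]: smallest i'+j' over the last match of such a chain.
    end = [[[inf] * (m + 1) for _ in range(n + 1)] for _ in range(max_k + 1)]
    for k in range(1, max_k + 1):
        for i in range(n - 1, -1, -1):
            for j in range(m - 1, -1, -1):
                if s1[i] == s2[j]:
                    # Match: concatenate (i,j) to the best (k-1)-chain that
                    # starts strictly below and to the right.
                    end[k][i][j] = i + j if k == 1 else end[k - 1][i + 1][j + 1]
                else:
                    # Mismatch: inherit the best k-chain from a neighbor.
                    end[k][i][j] = min(end[k][i + 1][j], end[k][i][j + 1])
    return {
        k: [[end[k][i][j] - i - j for j in range(m)] for i in range(n)]
        for k in range(1, max_k + 1)
    }
```

In the arc-free setting, using a match at (i,j) is never worse than skipping it, because the first match of any chain in the region can be swapped for (i,j) without changing where the chain ends. That is why the match case has a single option here.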

Decomposing k-Chains

Decomposing k-Chains (non arc match): Best (k-1)-Chain

Decomposing k-Chains (mismatch): Best k-Chain

Decomposing k-Chains (right arc match): Best k-Chain
Treat it like a mismatch, since this match cannot be used by a chain that starts at it. The chain cannot connect to the same row or column (the column case can be seen in the figure), since the matches there are right arc matches.

Decomposing k-Chains (left arc match)
Option 1: don't use the match; use any k-chain in the gray area.

Decomposing k-Chains (left arc match I): Best k-Chain
Option 1: don't use the match; use any k-chain in the gray area.

Example 2-Chain
Example for option 1.

Decomposing k-Chains (left arc match II)
Option 2: use the match; then we need to take the whole arc.

Decomposing k-Chains (left arc match II): k ≥ lcs, Best (k-lcs)-Chain
Option 3: use the match; then we need to take the whole arc, and k > lcs. Here lcs is the LCS of the two matched arcs.

Decomposing k-Chains (left arc match III): k ≤ lcs
Option 2: use the match; then we need to take the whole arc, and k ≤ lcs.

Example 3-Chain
Option 2: use the match; then we need to take the whole arc, and k ≤ lcs.

The Algorithm (given R1, R2)
Run Klein's algorithm to get the LCS of every arc in R1 with every arc in R2.
For k = 1, 2, …, n: construct all k-chains from bottom right to top left using DP.
Report the best k-chain.
The total running time is as fast as global LCS; the bottleneck is the global LCS computation.
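The outer loop of the algorithm can be sketched for plain strings as follows. This is only an arc-free illustration: the preprocessing step with Klein's algorithm and all arc cases are omitted, the denominator is the chain length i'-i+j'-j from the definitions, and the name `best_normalized_chain` and the requirement of at least 2 matches (to avoid a zero-length chain) are assumptions of the sketch.

```python
from fractions import Fraction
import math


def best_normalized_chain(s1: str, s2: str, min_matches: int = 2):
    """Arc-free sketch: return (value, k, i, j) for the best k-chain, or None.

    value is k divided by the length of the shortest chain of k matches that
    starts at the match (i, j). Only k >= min_matches is reported.
    """
    if min_matches < 2:
        raise ValueError("this sketch needs min_matches >= 2")
    n, m = len(s1), len(s2)
    inf = math.inf
    prev = None  # table for (k-1)-chains: smallest i'+j' of the last match
    best = None
    for k in range(1, min(n, m) + 1):
        curr = [[inf] * (m + 1) for _ in range(n + 1)]
        # Fill from the bottom right to the top left.
        for i in range(n - 1, -1, -1):
            for j in range(m - 1, -1, -1):
                if s1[i] == s2[j]:
                    curr[i][j] = i + j if k == 1 else prev[i + 1][j + 1]
                    if k >= min_matches and curr[i][j] < inf:
                        value = Fraction(k, curr[i][j] - i - j)
                        if best is None or value > best[0]:
                            best = (value, k, i, j)
                else:
                    curr[i][j] = min(curr[i + 1][j], curr[i][j + 1])
        prev = curr  # only the previous k is needed for the next round
    return best
```

Keeping just the tables for k-1 and k is a design choice of this sketch: each round only reads the previous one, so memory stays proportional to the size of a single table.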

The DP

Thank you very much for your attention