SPIRE Normalized Similarity of RNA Sequences


SPIRE 2005 - Normalized Similarity of RNA Sequences
Local Alignment of RNA Sequences with Arbitrary Scoring Schemes
Rolf Backofen, Danny Hermelin, Gad M. Landau, Oren Weimann
Our topic is local alignment, but we start with global alignment.

RNA sequences C G G C U A A U C A G U C G U A

RNA sequences C C G U A G U A C C A C A G U G U G G C G C G G C C A U


Alignment of Strings
S1 = U C A C C G _ A _ G
S2 = U C G C G G U A U G
Global alignment. Color key: white = match, red = indel, yellow = mismatch.
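As a rough illustration of global string alignment, here is a minimal Python sketch of the classic dynamic program. The slide does not give a scoring scheme, so the values `match=1`, `mismatch=-1` and `indel=-1` are assumptions, and the function name is hypothetical.

```python
def global_alignment_score(s, t, match=1, mismatch=-1, indel=-1):
    """Score of the best global alignment of s and t.

    The scoring values are illustrative assumptions, not taken from the slides.
    """
    n, m = len(s), len(t)
    # prev[j] = best score for aligning the first i-1 characters of s with t[:j]
    prev = [j * indel for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [i * indel] + [0] * m
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + indel, cur[j - 1] + indel)
        prev = cur
    return prev[m]


if __name__ == "__main__":
    # The two strings from the slide, with gaps removed.
    print(global_alignment_score("UCACCGAG", "UCGCGGUAUG"))
```

Only two rows of the table are kept, which is enough when just the score (not the alignment itself) is needed.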

Alignment of RNA sequences
A A G G C C C U G A U A G A C C G U U A U
Color key: red = character indels and arc indels, yellow = character mismatches and arc mismatches, white = character matches and arc matches.
An arc is a single entity, so we either delete an arc or match it to another arc.

Alignment of RNA sequences
A A G G C C C U G A U A G A C C G U U U
If we match the highlighted arcs, we also need to match the colored segments (the sequences between the arc endpoints).

Alignment of RNA sequences
A A G G C C C U G A U A G A C C G U U U
RNA global alignment via tree edit distance: [SZ 1989], [K 1998], [DMRW 2006].
Theorem: All these algorithms compute the edit distance between any two arcs, provided we match these arcs.
In other words, the theorem says we get the score of aligning every possible pair of colored segments (the sequences between arc endpoints).

The Alignment graph
R1 = U C A C C G A G, R2 = U C G C G G U A U G
This is the string alignment graph. We will turn it into an RNA alignment graph in which there is a one-to-one correspondence between heaviest paths and optimal alignments.
Theorem: There is a one-to-one correspondence between all paths in the alignment graph and all alignments of substrings of R1 and R2.
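The correspondence between paths and alignments can be sketched in a few lines of Python. The vertex convention used here is an assumption: vertex (i, j) means that i characters of R1 and j characters of R2 have been consumed. Both function names are hypothetical.

```python
def alignment_graph_edges(r1, r2):
    """Yield (source, target, kind) for every edge of the string alignment graph."""
    n, m = len(r1), len(r2)
    for i in range(n + 1):
        for j in range(m + 1):
            if i < n:  # vertical edge: r1[i] is aligned to a gap
                yield (i, j), (i + 1, j), "delete " + r1[i]
            if j < m:  # horizontal edge: r2[j] is aligned to a gap
                yield (i, j), (i, j + 1), "insert " + r2[j]
            if i < n and j < m:  # diagonal edge: r1[i] is aligned to r2[j]
                yield (i, j), (i + 1, j + 1), "align " + r1[i] + "/" + r2[j]


def path_to_alignment(path, r1, r2):
    """Translate a path (a list of vertices) into the two rows of an alignment."""
    row1, row2 = [], []
    for (i, j), (i2, j2) in zip(path, path[1:]):
        if (i2, j2) not in {(i + 1, j), (i, j + 1), (i + 1, j + 1)}:
            raise ValueError("not an edge of the string alignment graph")
        row1.append(r1[i] if i2 == i + 1 else "-")
        row2.append(r2[j] if j2 == j + 1 else "-")
    return "".join(row1), "".join(row2)


if __name__ == "__main__":
    # A path that starts at (1, 0) aligns substrings, not the whole strings.
    print(path_to_alignment([(1, 0), (2, 1), (2, 2), (3, 2)], "UCA", "UG"))
```

Because a path may start and end at any vertex, each path describes an alignment of one substring of R1 with one substring of R2.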


The Alignment graph
Now we add the arcs.

The Alignment graph
We do not want to match one arc endpoint without the other, so we remove the diagonal edges from every row or column that corresponds to an arc endpoint. Arc matches are handled later.

The Alignment graph
Notice that we split the cost of deleting an arc between its two endpoints. This handles the case where a path (or alignment) crosses an arc. An open problem is to charge the cost of deleting an arc to only one of its endpoints.
Theorem: There is a one-to-one correspondence between all paths in the alignment graph and all alignments of substrings of R1 and R2 in which all arcs are deleted.
Next we add the shortcut edges for matching arcs. Their weights are obtained in the preprocessing step: one run of the algorithm of Klein [K 1998] or of Demaine et al. [DMRW 2006].

The Alignment graph
We add a shortcut edge from the cell that represents the beginning of the two arcs to the cell that represents their end.
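The construction steps above (no diagonal edges at arc endpoints, arc deletion cost split over the two endpoints, one shortcut edge per pair of arcs) can be sketched as follows. This is a sketch under stated assumptions, not the authors' implementation: arcs are given as 0-based `(left, right)` position pairs, the scoring values are illustrative, and `arc_match_weight` is a hypothetical table standing in for the output of the preprocessing step.

```python
from collections import defaultdict


def rna_alignment_graph(r1, arcs1, r2, arcs2, arc_match_weight,
                        match=1, mismatch=-1, indel=-1, arc_indel=-2):
    """Return incoming[(i, j)] -> list of ((i2, j2), weight) edges entering (i, j).

    arc_match_weight[(arc1, arc2)] is assumed to be precomputed by a global
    algorithm; all scoring values here are illustrative assumptions.
    """
    n, m = len(r1), len(r2)
    ends1 = {p for arc in arcs1 for p in arc}
    ends2 = {p for arc in arcs2 for p in arc}
    incoming = defaultdict(list)
    for i in range(n + 1):
        for j in range(m + 1):
            if i < n:  # delete r1[i]; an arc endpoint pays half an arc deletion
                w = arc_indel / 2 if i in ends1 else indel
                incoming[(i + 1, j)].append(((i, j), w))
            if j < m:  # delete r2[j], same rule
                w = arc_indel / 2 if j in ends2 else indel
                incoming[(i, j + 1)].append(((i, j), w))
            # diagonal edges are dropped in rows/columns of arc endpoints
            if i < n and j < m and i not in ends1 and j not in ends2:
                w = match if r1[i] == r2[j] else mismatch
                incoming[(i + 1, j + 1)].append(((i, j), w))
    # one shortcut edge per pair of arcs, from their common start to their end
    for a1 in arcs1:
        for a2 in arcs2:
            src = (a1[0], a2[0])
            dst = (a1[1] + 1, a2[1] + 1)
            incoming[dst].append((src, arc_match_weight[(a1, a2)]))
    return incoming


if __name__ == "__main__":
    g = rna_alignment_graph("GAC", [(0, 2)], "GUC", [(0, 2)],
                            {((0, 2), (0, 2)): 5})
    print(g[(3, 3)])
```

Storing incoming edges per vertex is a design choice made here because the dynamic programs on this graph look up, for each vertex, the edges that enter it.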

The Alignment graph
Clearly, the global alignment score is just the weight of the shortcut edge from (0,0) to (n,m).
Theorem: There is a one-to-one correspondence between HEAVIEST paths in the alignment graph and OPTIMAL alignments of substrings of R1 and R2.
The correspondence covers only the optimal alignments, not all alignments, because the graph has no path for an alignment that matches two arcs but aligns the substrings between their endpoints non-optimally.

The Local Alignment algorithms
We use the alignment graph to compute the local similarity between two RNA sequences according to two well-known measures: Smith-Waterman, which is the highest-scoring alignment between any pair of substrings of the input RNAs, and its normalized version.

Standard Local Similarity (Smith-Waterman)
The score is computed via a dynamic program:
Score(i,j) = max over incoming edges from (i',j') of [ Score(i',j') + weight of the incoming edge from (i',j') ]
Score(i,j) is the score of the best alignment that ends at (i,j). This is very similar to Smith-Waterman for strings, except that there every vertex has exactly 3 incoming edges, whereas here some vertices have 2 and some have 3, and one incoming edge can come from far away.
Time complexity: O(mn) + one run of a global algorithm.
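The recurrence can be sketched in Python over any graph given as incoming-edge lists. Two things are assumptions here: the baseline of 0 for the empty alignment, borrowed from standard Smith-Waterman, and the helper `string_incoming`, a hypothetical stand-in that builds only the plain string graph with illustrative scores.

```python
def string_incoming(r1, r2, match=1, mismatch=-1, indel=-1):
    """Incoming-edge lists of the plain string alignment graph (illustrative scores)."""
    incoming = {}
    for i in range(len(r1) + 1):
        for j in range(len(r2) + 1):
            edges = []
            if i > 0:
                edges.append(((i - 1, j), indel))
            if j > 0:
                edges.append(((i, j - 1), indel))
            if i > 0 and j > 0:
                w = match if r1[i - 1] == r2[j - 1] else mismatch
                edges.append(((i - 1, j - 1), w))
            incoming[(i, j)] = edges
    return incoming


def local_similarity(n, m, incoming):
    """Weight of the heaviest path, where a path may start at any vertex."""
    score = {}
    best = 0
    # Row-major order visits the source of every edge before its target,
    # including shortcut edges, which always go down and to the right.
    for i in range(n + 1):
        for j in range(m + 1):
            s = 0  # assumed baseline: the empty alignment
            for src, w in incoming.get((i, j), []):
                s = max(s, score[src] + w)
            score[(i, j)] = s
            best = max(best, s)
    return best


if __name__ == "__main__":
    g = string_incoming("ACGU", "CG")
    print(local_similarity(4, 2, g))
    # A hypothetical shortcut edge of weight 5 from (0, 0) to (4, 2).
    g[(4, 2)].append(((0, 0), 5))
    print(local_similarity(4, 2, g))
```

Each edge is examined once, so the loop does O(mn) work when every vertex has a constant number of incoming edges.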

Normalized Local Similarity
The Smith-Waterman approach has a known weakness [AP 2001] (AP = Arslan and Pevzner).
Solution: look for the substrings R1', R2' (with their arcs) that maximize the normalized score, that is, the alignment score divided by the alignment length, subject to ED(R1',R2') being greater than some given value. Without this constraint a single match is always optimal.
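One way to write this objective down, as a sketch rather than the slide's original formula: the ratio follows the dynamic program on the next slides, where the length of a path is the total number of characters it consumes from both sequences. The symbols W (weight of the heaviest path aligning the two substrings) and T (the given threshold) are assumptions introduced here, and the constraint that the notes state on ED(R1',R2') is read as a constraint on W.

```latex
\[
  \max_{R_1',\,R_2'} \; \frac{W(R_1', R_2')}{|R_1'| + |R_2'|}
  \qquad \text{subject to} \qquad W(R_1', R_2') > T
\]
```

With a match score of 1, a single matched pair already reaches the ratio 1/2, which is why some lower bound T is required.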

Normalized Local Similarity
Again, a dynamic program:
Define Length(k,i,j) to be the length of the shortest path that ends at vertex (i,j) and has weight equal to k.
Each ratio k/Length(k,i,j) is a normalized score, and the best k/Length(k,i,j) over all i, j, k is the best normalized score. There are n^2 m candidates because 1<k<n.

Normalized Local Similarity
Again, a dynamic program:
Define Length(k,i,j) to be the length of the shortest path that ends at vertex (i,j) and has weight equal to k.
For every k, i, j compute
Length(k,i,j) = min over incoming edges from (i',j') of [ Length(k-w,i',j') + (i-i') + (j-j') ], where w = weight of the incoming edge from (i',j').
Time complexity: the time to fill the Length table + one run of a global algorithm.
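A minimal Python sketch of this table, assuming integer edge weights and a graph given as incoming-edge lists. The parameter `min_weight` is a hypothetical stand-in for the "given value" of the constraint, and `string_incoming` is an illustrative helper that builds only the plain string graph with assumed scores.

```python
import math


def string_incoming(r1, r2, match=1, mismatch=-1, indel=-1):
    """Incoming-edge lists of the plain string alignment graph (illustrative scores)."""
    incoming = {}
    for i in range(len(r1) + 1):
        for j in range(len(r2) + 1):
            edges = []
            if i > 0:
                edges.append(((i - 1, j), indel))
            if j > 0:
                edges.append(((i, j - 1), indel))
            if i > 0 and j > 0:
                w = match if r1[i - 1] == r2[j - 1] else mismatch
                edges.append(((i - 1, j - 1), w))
            incoming[(i, j)] = edges
    return incoming


def normalized_local_similarity(n, m, incoming, min_weight=1):
    """Best k / Length(k, i, j) over all paths of weight k >= min_weight."""
    length = {}  # length[(i, j)][k] = shortest path of weight k ending at (i, j)
    best = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            table = {0: 0}  # the empty path: weight 0, length 0
            for (i2, j2), w in incoming.get((i, j), []):
                step = (i - i2) + (j - j2)  # characters consumed by this edge
                for k_prev, len_prev in length[(i2, j2)].items():
                    k = k_prev + w
                    if len_prev + step < table.get(k, math.inf):
                        table[k] = len_prev + step
            length[(i, j)] = table
            for k, ln in table.items():
                if k >= min_weight and ln > 0:
                    best = max(best, k / ln)
    return best


if __name__ == "__main__":
    g = string_incoming("ACGU", "CG")
    print(normalized_local_similarity(4, 2, g))
```

Keeping only the shortest path per weight is sufficient because, for a fixed positive k, a shorter path always gives a larger ratio.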

Open Problems
Arc deletion.
Improve global tree edit distance.

Thank you very much for your attention.