RNA sequence-structure alignment

Slides:



Advertisements
Similar presentations
Pairwise Sequence Alignment Sushmita Roy BMI/CS 576 Sushmita Roy Sep 10 th, 2013 BMI/CS 576.
Advertisements

Parallel BioInformatics Sathish Vadhiyar. Parallel Bioinformatics  Many large scale applications in bioinformatics – sequence search, alignment, construction.
Lecture 3: Parallel Algorithm Design
Chapter 7 Dynamic Programming.
6 - 1 Chapter 6 The Secondary Structure Prediction of RNA.
Sequence Alignment Tutorial #2
A Hidden Markov Model for Progressive Multiple Alignment Ari Löytynoja and Michel C. Milinkovitch Appeared in BioInformatics, Vol 19, no.12, 2003 Presented.
RNA structure prediction. RNA functions RNA functions as –mRNA –rRNA –tRNA –Nuclear export –Spliceosome –Regulatory molecules (RNAi) –Enzymes –Virus –Retrotransposons.
. Maximum Likelihood (ML) Parameter Estimation with applications to inferring phylogenetic trees Comput. Genomics, lecture 7a Presentation partially taken.
Sequence Alignment Storing, retrieving and comparing DNA sequences in Databases. Comparing two or more sequences for similarities. Searching databases.
Branch and Bound Similar to backtracking in generating a search tree and looking for one or more solutions Different in that the “objective” is constrained.
Non-coding RNA William Liu CS374: Algorithms in Biology November 23, 2004.
DNA Alignment. Dynamic Programming R. Bellman ~ 1950.
Regular Expression Constrained Sequence Alignment Abdullah N. Arslan Assistant Professor Computer Science Department.
RNA Secondary Structure Prediction
Dynamic Programming1. 2 Outline and Reading Matrix Chain-Product (§5.3.1) The General Technique (§5.3.2) 0-1 Knapsack Problem (§5.3.3)
7 -1 Chapter 7 Dynamic Programming Fibonacci Sequence Fibonacci sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, … F i = i if i  1 F i = F i-1 + F i-2 if.
Multiple sequence alignment
Multiple Sequence alignment Chitta Baral Arizona State University.
Structural Alignment of Pseudoknotted RNAs Banu Dost, Buhm Han, Shaojie Zhang, Vineet Bafna.
© 2004 Goodrich, Tamassia Dynamic Programming1. © 2004 Goodrich, Tamassia Dynamic Programming2 Matrix Chain-Products (not in book) Dynamic Programming.
FA05CSE182 CSE 182-L2:Blast & variants I Dynamic Programming
Finding the optimal pairwise alignment We are interested in finding the alignment of two sequences that maximizes the similarity score given an arbitrary.
Bioinformatics Workshop, Fall 2003 Algorithms in Bioinformatics Lawrence D’Antonio Ramapo College of New Jersey.
Sequence similarity. Motivation Same gene, or similar gene Suffix of A similar to prefix of B? Suffix of A similar to prefix of B..Z? Longest similar.
Protein Sequence Comparison Patrice Koehl
Phylogenetic Tree Construction and Related Problems Bioinformatics.
FA05CSE182 CSE 182-L2:Blast & variants I Dynamic Programming
Introduction to Bioinformatics From Pairwise to Multiple Alignment.
LCS and Extensions to Global and Local Alignment Dr. Nancy Warter-Perez June 26, 2003.
Dynamic Programming (cont’d) CS 466 Saurabh Sinha.
Sequence Alignment.
Non-coding RNA gene finding problems. Outline Introduction RNA secondary structure prediction RNA sequence-structure alignment.
Chapter 19: Binary Trees. Objectives In this chapter, you will: – Learn about binary trees – Explore various binary tree traversal algorithms – Organize.
1 Generalized Tree Alignment: The Deferred Path Heuristic Stinus Lindgreen
ZORRO : A masking program for incorporating Alignment Accuracy in Phylogenetic Inference Sourav Chatterji Martin Wu.
7 -1 Chapter 7 Dynamic Programming Fibonacci sequence Fibonacci sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, … F i = i if i  1 F i = F i-1 + F i-2 if.
Multiple Sequence Alignment Kun-Mao Chao ( 趙坤茂 ) Department of Computer Science and Information Engineering National Taiwan University, Taiwan WWW:
BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha.
Lectures on Greedy Algorithms and Dynamic Programming
Structural Alignment of Pseudo-knotted RNA
Prediction of Secondary Structure of RNA
Pairwise Sequence Alignment Part 2. Outline Summary Local and Global alignments FASTA and BLAST algorithms Evaluating significance of alignments Alignment.
Relation. Combining Relations Because relations from A to B are subsets of A x B, two relations from A to B can be combined in any way two sets can be.
Space Efficient Alignment Algorithms and Affine Gap Penalties Dr. Nancy Warter-Perez.
Dynamic Programming1. 2 Outline and Reading Matrix Chain-Product (§5.3.1) The General Technique (§5.3.2) 0-1 Knapsack Problem (§5.3.3)
DNA, RNA and protein are an alien language
Motif Search and RNA Structure Prediction Lesson 9.
MA/CSSE 473 Day 30 B Trees Dynamic Programming Binomial Coefficients Warshall's algorithm No in-class quiz today Student questions?
V diagonal lines give equivalent residues ILS TRIVHVNSILPSTN V I L S T R I V I L P E F S T Sequence A Sequence B Dot Plots, Path Matrices, Score Matrices.
V diagonal lines give equivalent residues ILS TRIVHVNSILPSTN V I L S T R I V I L P E F S T Sequence A Sequence B Dot Plots, Path Matrices, Score Matrices.
Probabilistic methods for phylogenetic tree reconstruction BMI/CS 576 Colin Dewey Fall 2015.
Local Exact Pattern Matching for Non-fixed RNA Structures Mika Amit, Rolf Backofen, Steffen Heyne, Gad M. Landau, Mathias Mohl, Christina Schmiedl, Sebastian.
Tree - in “math speak” An ________ graph is a set of vertices/nodes and a set of edges, each edge connects two vertices. Any undirected graph in which.
Core String Edits, Alignments, and Dynamic Programming.
Dr. Bernard Chen Ph.D. University of Central Arkansas Fall 2008
Distance Functions for Sequence Data and Time Series
Sequence Alignment 11/24/2018.
Dynamic Programming (cont’d)
Comparative RNA Structural Analysis
Dynamic Programming 1/15/2019 8:22 PM Dynamic Programming.
Advanced Algorithms Analysis and Design
RNA 2D and 3D Structure Craig L. Zirbel October 7, 2010.
Multiple Sequence Alignment (I)
Dynamic Programming-- Longest Common Subsequence
Multiple Sequence Alignment
Data Structures Using C++ 2E
Multiple Sequence Alignment
Presentation transcript:

RNA sequence-structure alignment RNA alignments: sequence-sequence alignment. sequence-structure alignment. It’s most useful for ncRNA finding. structure-structure alignment.

What is an alignment? Alignment of two sequences ACCCG-UUAAU | ||| ||*|| A-CCGUUUCAU Sequence-structure alignment: One RNA sequence with its known secondary structure. One just sequence Take into account both structure similarity and sequence similarity. Stru1: >>>> <<<< Seq1: GGGGCAACCCC ++++*||++++ Seq2: AUCCGAAGGAU

RNA sequence-structure alignment Given two RNA sequence s[1…n] and t[1…m], and s has a known secondary structure S, where (i,j) in S implies s[i] is basepaired with s[j]. Score for the alignment is the sum of scores γ of each column plus the sum of scores δ of each basepair in S. Scores for each column: γ(a,b) for all a, b in {A, C, G, U, -}. Scores for two basepair columns: δ(a, b, c, d) for all a, b, c, and d in {A, C, G, U}. For example δ(a, b, c, d) = 1 if (a,b) in S and c and d form basepair, otherwise -∞.

Analysis: alignment solutions Similar to Nussinov’s base pair maximization problem’s solution, the obvious dynamic programming solution is in O(n6) time. Bafna et al. (1995) improved the dynamic programming algorithm to a running time of O(n4) by using binarized tree for structure S. We further improve the algorithm by building a binarized tree for whole sequence s[1…n] with its structure, and reduce the computation time into O(n3m1), where m1 is the number of the branches in the tree (typically very small constant).

The method of Bafna et al. (1995) Adding spurious (dashed) edges to S, so that each node in S’ has at most 2 children. We use the dashed edges (void nodes) to fix certain interval doing recursions, when we need find the branch points. It is in O(n2m2+nm3) time.

Extend the binary tree into the whole sequence s[1…n] Solid nodes represent the basepairs. Dotted nodes represent either branch point or unpaired bases. Now the s[1…n] with its structure is changed into a binary tree. The dynamic programming compares each nodes against an interval. It is in O(n3m1) time. (m1 the number of branch points)

Using a binary tree to represent an RNA Solid nodes represent the basepairs. Void nodes represent either branch point or unpaired bases. Branch points C-G A-U -A C- -C -U

The dynamic programming aligns each subtree rooted at node v against an subinterval (i,j) If root v is a solid node (first and last bases of the subsequence represented by the subtree are paired to each other): If v is a void node (the subsequence’s last base is unpaired): If v is a void node (it is a branch point for multi-loop.):

Modifications of the formulation Banded alignment. [O(n2δn)] Complicated scoring functions: Affine gap penalties in both stacks and loops. Learning scoring matrix from handmade alignments.