Predicting the Secondary Structure of RNA

Slides:



Advertisements
Similar presentations
RNA Secondary Structure Prediction
Advertisements

RNA structure prediction. RNA functions RNA functions as –mRNA –rRNA –tRNA –Nuclear export –Spliceosome –Regulatory molecules (RNAi) –Enzymes –Virus –Retrotransposons.
Protein – Protein Interactions Lisa Chargualaf Simon Kanaan Keefe Roedersheimer Others: Dr. Izaguirre, Dr. Chen, Dr. Wuchty, ChengBang Huang.
Chapter 7 Dynamic Programming.
Gene Prediction: Similarity-Based Approaches (selected from Jones/Pevzner lecture notes)
6 -1 Chapter 6 The Secondary Structure Prediction of RNA.
 A superposition of two sequences that reveals a large number of common regions (matches)  Possible alignment of ACATGCGATT and GAGATCTGA -AC-ATGC-GATT.
Example 4.7 Data Envelopment Analysis (DEA) | 4.2 | 4.3 | 4.4 | 4.5 | Background Information n Consider a group of three hospitals.
1 Greedy Algorithms. 2 2 A short list of categories Algorithm types we will consider include: Simple recursive algorithms Backtracking algorithms Divide.
Introduction to Bioinformatics Algorithms Dynamic Programming: Edit Distance.
Dynamic Programming1. 2 Outline and Reading Matrix Chain-Product (§5.3.1) The General Technique (§5.3.2) 0-1 Knapsack Problem (§5.3.3)
Alignment methods and database searching April 14, 2005 Quiz#1 today Learning objectives- Finish Dotter Program analysis. Understand how to use the program.
CSC401 – Analysis of Algorithms Lecture Notes 12 Dynamic Programming
Dynamic Programming1. 2 Outline and Reading Matrix Chain-Product (§5.3.1) The General Technique (§5.3.2) 0-1 Knapsack Problem (§5.3.3)
7 -1 Chapter 7 Dynamic Programming Fibonacci Sequence Fibonacci sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, … F i = i if i  1 F i = F i-1 + F i-2 if.
Sequence Alignment II CIS 667 Spring Optimal Alignments So we know how to compute the similarity between two sequences  How do we construct an.
KNAPSACK PROBLEM A dynamic approach. Knapsack Problem  Given a sack, able to hold K kg  Given a list of objects  Each has a weight and a value  Try.
Finding Common RNA Pseudoknot Structures in Polynomial Time Patricia Evans University of New Brunswick.
1 Dynamic Programming Jose Rolim University of Geneva.
Dynamic Programming (cont’d) CS 466 Saurabh Sinha.
Case Study. DNA Deoxyribonucleic acid (DNA) is a nucleic acid that contains the genetic instructions used in the development and functioning of all known.
RNA-Seq and RNA Structure Prediction
Longbiao Kang, Baotian Hu, Xiangping Wu, Qingcai Chen, and Yan He Intelligent Computing Research Center, School of Computer Science and Technology, Harbin.
Sequence Alignment.
Space-Efficient Sequence Alignment Space-Efficient Sequence Alignment Bioinformatics 202 University of California, San Diego Lecture Notes No. 7 Dr. Pavel.
06 - Boundary Models Overview Edge Tracking Active Contours Conclusion.
RNA Secondary Structure Prediction Spring Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?
Dr. Naveed Ahmad Assistant Professor Department of Computer Science University of Peshawar.
Department of Electrical Engineering, Southern Taiwan University Robotic Interaction Learning Lab 1 The optimization of the application of fuzzy ant colony.
Motif finding with Gibbs sampling CS 466 Saurabh Sinha.
RNA Secondary Structure Prediction. 16s rRNA RNA Secondary Structure Hairpin loop Junction (Multiloop)Bulge Single- Stranded Interior Loop Stem Image–
CSC401: Analysis of Algorithms CSC401 – Analysis of Algorithms Chapter Dynamic Programming Objectives: Present the Dynamic Programming paradigm.
Prediction of Secondary Structure of RNA
Dynamic Programming1. 2 Outline and Reading Matrix Chain-Product (§5.3.1) The General Technique (§5.3.2) 0-1 Knapsack Problem (§5.3.3)
Paper_topic: Parallel Matrix Multiplication using Vertical Data.
Introduction to Sequence Alignment. Why Align Sequences? Find homology within the same species Find clues to gene function Practical issues in experiments.
TU/e Algorithms (2IL15) – Lecture 4 1 DYNAMIC PROGRAMMING II
The analytics of constrained optimal decisions microeco nomics spring 2016 the monopoly model (I): standard pricing ………….1optimal production ………….2 optimal.
RNAs. RNA Basics transfer RNA (tRNA) transfer RNA (tRNA) messenger RNA (mRNA) messenger RNA (mRNA) ribosomal RNA (rRNA) ribosomal RNA (rRNA) small interfering.
Dynamic Programming for the Edit Distance Problem.
The Hungarian algorithm for non-square arrays
Merge Sort 5/28/2018 9:55 AM Dynamic Programming Dynamic Programming.
RNA Secondary Structure Prediction
RNA Secondary Structure Prediction
Karnaugh Maps (K-Maps)
Discussion section #2 HW1 questions?
SPIRE Normalized Similarity of RNA Sequences
Earley’s Algorithm (1970) Nice combo of our parsing ideas so far:
Brute Force Approaches
Fast Sequence Alignments
Dynamic Programming (cont’d)
Intro to Alignment Algorithms: Global and Local
Comparative RNA Structural Analysis
Advanced Analysis of Algorithms
RNA Secondary Structure Prediction
Dynamic Programming 1/15/2019 8:22 PM Dynamic Programming.
SPIRE Normalized Similarity of RNA Sequences
Graph Searching.
Merge Sort 1/18/ :45 AM Dynamic Programming Dynamic Programming.
Dynamic Programming Merge Sort 1/18/ :45 AM Spring 2007
CSE 589 Applied Algorithms Spring 1999
RNA 2D and 3D Structure Craig L. Zirbel October 7, 2010.
Estimation Error and Portfolio Optimization
Merge Sort 2/22/ :33 AM Dynamic Programming Dynamic Programming.
Dynamic Programming II DP over Intervals
Lecture 6 Dynamic Programming
Merge Sort 4/28/ :13 AM Dynamic Programming Dynamic Programming.
Lecture 5 Dynamic Programming
Dynamic Programming Merge Sort 5/23/2019 6:18 PM Spring 2008
ECE 352 Digital System Fundamentals
Presentation transcript:

Predicting the Secondary Structure of RNA Mitra Shokat Spring 2017 For my project, I chose to write an algorithm predicts the optimal secondary structure of any given string of RNA.

Optimizing RNA Secondary Structure Goal - Given a sequence of RNA, determine most probable secondary structure What determines secondary structure? Base pairings Thermodynamics The secondary structure of RNA is formed by interactions between base pairs on the same sequence of RNA. There are a few algorithms that already exist for predicting optimal secondary structures. Challenge: many thermodynamic factors that have to be considered when deciding which set of base pair interactions is most favorable, and it is difficult to incorporate all of these factors into one algorithm For my project, I chose to implement a variation of the Nussinov Algorithm, which simplifies the problem of predicting structures by defining the optimal structure as the one with the most base pairings and no pseudoknots.

Nussinov Algorithm Dynamic programming approach to predicting RNA secondary structure Input – sequence of RNA (string) Output – graphical representation of base pairings in optimal secondary structure Simplifying assumptions: Optimal structure is one that contains maximum number of base pairings Pseudoknots not allowed Nussinov algorithm = dynamic programming approach to predicting the optimal secondary structure of a given strand of RNA. It breaks down the sequence of RNA into smaller sequences, finds the optimal structure of those subsequences, and then combines subsequences to determine the overall structure. My program takes a sequence of RNA as input and outputs a graphical representation of the optimal structure by showing which bases are paired. Again, this optimal structure is only based on maximizing the number of paired bases. Dataset (test then alanine)

Pseudocode Initialize matrix: Fill main diagonal and diagonal below it with zeros Fill matrix: For each index, choose option that yields max score - 4 options: 1. pair rna[i] and rna[j] and attach to best structure for rna[i+1:j-1] 2. add rna[i] to best structure of rna[i+1:j] 3. add rna[j] to best structure of rna[i:j-1] 4. combine two optimal structures for rna[i:k] and rna[k+1:j] Backtrack: For each index in matrix, backtrack to source of maximum score https://bibiserv.cebitec.unibielefeld.de/sadr2/rnasecondarystructure/secondarystructureprediction/index.html create a scoring matrix that compares the sequence of RNA to itself. for each index i,j , we fill in the table with the maximum number of bases that can be paired in the substring from index i to j At each position, there are four options for filling in the score (they’re shown here in pic). And at each index we choose the option with the highest score. After the scoring matrix is complete, the algorithm uses a backtracking method to retrace its steps in order to find the optimal structure. I used the python graphics library to convert the program’s output into a visual representation of which bases are paired with which in the optimal structure. S.Will. “RNA Structure and RNA Structure Prediction”MIT.2011

Results Input - 'GACACGACGA’ Predicted Output - Actual Output - So with my short test sequence, the program ran well. This is an image I drew of the predicted optimal secondary structure given this particular strand of RNA. And here is the program’s output. And as you can see, they match.

Results Input – alanine tRNA sequence Predicted Output - Actual Output - However, I had some more problems when I tried to run the program on the sequence of alanine tRNA. This is what the secondary structure of this particular tRNA looks like. And this is the output structure my program gave. So as you can see, they do not match. But this result did reveal some ways I might be able to improve the program. First, I could allow for the wobble pair, which is G paired with U. Second, I could someone change the weights of certain scores so as to take into account the number of consecutively paired bases, because having more of these is more favorable.