
Rapid ab initio RNA Folding Including Pseudoknots via Graph Tree Decomposition
Jizhen Zhao, Liming Cai, Russell Malmberg
Computer Science, Plant Biology
University of Georgia

Why another RNA folding algorithm?
- The need for RNA analysis tools has grown with the number of recently discovered functional RNAs (ncRNAs).
- RNA folding algorithms are still not completely satisfactory, despite more than 25 years of intensive study.

Increased number of ncRNAs
- ncRNAs function other than by coding for proteins, e.g., as structural, catalytic, and regulatory factors.
- ncRNA genes lack strong statistical signals such as ORFs or polyadenylation, except that transcribed ncRNA molecules can fold into stable (and unique) secondary or tertiary structures.

Increased number of ncRNAs
- rRNAs and tRNAs
- RNA maturation: snRNAs recognize splicing sites
- RNA modification: snoRNAs convert uridine to pseudouridine
- Regulation of gene expression and translation: e.g., miRNAs
- DNA replication: e.g., telomerase RNAs, the template for adding telomeric repeats
- Etc.
- Found in introns, intergenic regions, or 5’ and 3’ UTRs

Increased number of ncRNAs (Bompfunewerer et al., 2005)

Class | Size (nt) | Function | Phylogenetic distribution
---|---|---|---
tRNA | 70-80 | Translation | Ubiquitous
rRNA 16S/18S | ~1.5K | Translation | Ubiquitous
rRNA 28S+5.8S/23S | ~3K | Translation | Ubiquitous
rRNA 5S | ~130 | Translation | Ubiquitous
RNase P / MRP | | tRNA maturation | Ubiquitous (RNase P), Eukarya (MRP)
snoRNA | | Pseudouridinylation | Eukarya, archaea
Telomerase RNA | | Addition of telomeric repeats | Eukarya
snRNA (U1 ~ U…) | | Spliceosome, mRNA maturation | Eukarya
U7 | ~65 | Histone mRNA maturation | Eukaryotes
7SK | ~300 | Transcriptional regulation | Vertebrata
tmRNA | | Tags proteins for proteolysis | Bacteria
miRNA | ~22 | Post-transcriptional regulation | Multicellular organisms

Long history of RNA folding
- First simple RNA folding algorithm (Nussinov 1978)
- Thermodynamics-based folding (Zuker and Stiegler 1981)
- Zuker (1989): mFOLD 3.2; RNAfold (part of the Vienna Package 1.6.1)
- Limitations: not very accurate on a single sequence; inherent computational complexity of DP; unable to predict pseudoknots

Background
- Base pairing allows RNA to fold
- Watson-Crick base pairs: A-U, C-G
- Wobble pair: G-U
- Non-canonical pairs are also possible

[Figure: chemical structures of cytosine, guanine, uracil, and adenine, with the example sequence 5’-u-u-c-c-g-a-a-g-c-u-c-a-a-c-g-g-g-a-a-a-u-g-a-g-c-u-3’]

Secondary structure is important to tertiary structure

[Figure: RNA secondary structure elements: hairpin loop, junction (multiloop), bulge loop, single-stranded region, interior loop, stem, pseudoknot. Image: Wuchty]

[Figure: example RNA hairpin]
- Stem (double helix): stacked base pairs
- Loop: strand of unpaired bases

[Figure: example pseudoknotted RNA]
Pseudoknots: crossing patterns of stems
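"Crossing" can be made precise: two base pairs (i, j) and (k, l) with i < j, k < l cross when i < k < j < l, and a structure contains a pseudoknot when any two of its pairs cross. The sketch below checks this; representing a structure as a list of 0-based index pairs is an assumption made for illustration.

```python
from itertools import combinations
from typing import Iterable, Tuple

Pair = Tuple[int, int]  # hypothetical representation: 0-based (i, j) indices of a base pair


def crosses(p: Pair, q: Pair) -> bool:
    """True if base pairs p and q cross, i.e. i < k < j < l after ordering."""
    (i, j), (k, l) = sorted([tuple(sorted(p)), tuple(sorted(q))])
    return i < k < j < l


def has_pseudoknot(pairs: Iterable[Pair]) -> bool:
    """True if any two base pairs of the structure cross."""
    return any(crosses(p, q) for p, q in combinations(list(pairs), 2))


print(has_pseudoknot([(0, 10), (2, 8)]))   # nested pairs  -> False
print(has_pseudoknot([(0, 10), (5, 15)]))  # crossing pair -> True
```

Nested (non-crossing) structures are exactly the ones classical DP folders handle; any crossing pair falls outside that class.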

Bacterial tmRNA consensus structure (Felden et al., NAR 29): tmRNA terminates erroneous translation

Pseudoknots in the TMV 3’ UTR
- Promote efficient translation
- Bind EF1A and cooperate with the 5’ UTR (Leathers et al., MCB 13; Zeenko et al., JVI 76)

Previous work (Nussinov’s)
- Maximizing the number of base pairs (Nussinov et al., 1978)
- Simple case: δ(i, j) = 1 for each base pair (i, j)
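The base-pair-maximization idea can be sketched with the textbook Nussinov recurrence, where every allowed pair scores δ(i, j) = 1. This is a minimal sketch, assuming canonical plus G-U wobble pairs and an optional minimum hairpin-loop length `min_loop` (a hypothetical parameter, not taken from the slides). It runs in O(n^3) time and, like all nested-structure DP, cannot represent pseudoknots.

```python
from typing import List

# Allowed pairs: Watson-Crick plus G-U wobble (assumption for this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 0) -> int:
    """Maximum number of nested base pairs in seq (each pair scores 1)."""
    n = len(seq)
    if n == 0:
        return 0
    # N[i][j] = best score on seq[i..j]; entries with j <= i stay 0.
    N: List[List[int]] = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = max(N[i + 1][j], N[i][j - 1])        # i or j unpaired
            if j - i - 1 >= min_loop and (seq[i], seq[j]) in PAIRS:
                inner = N[i + 1][j - 1] if i + 1 <= j - 1 else 0
                best = max(best, inner + 1)             # i pairs with j
            for k in range(i + 1, j):                   # bifurcation
                best = max(best, N[i][k] + N[k + 1][j])
            N[i][j] = best
    return N[0][n - 1]


print(nussinov("GGGAAACCC", min_loop=3))  # 3
```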

Previous work (Zuker’s)
- Thermodynamic, energy-based method (Zuker and Stiegler 1981)
- Energy minimization: find the secondary structure that minimizes the free energy (ΔG)
- ΔG is calculated as the sum of individual contributions of:
  - loops
  - base pairs
  - secondary structure elements
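Written as a formula, the additive model reads as below. This is a sketch of the usual nearest-neighbor formulation, assuming the structure S is decomposed into a set of loops 𝓛(S) (hairpin, bulge, interior, multiloop, and stacked-pair "loops"), each with its own tabulated energy; the minimizing structure is the prediction.

```latex
\Delta G(S) \;=\; \sum_{L \in \mathcal{L}(S)} \Delta G(L),
\qquad
S^{*} \;=\; \arg\min_{S} \, \Delta G(S)
```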

Previous work (Zuker’s)
- Free-energy values (kcal/mol at 37 °C)
- Energies of stems are calculated as stacking contributions between neighboring base pairs
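The stacking rule for stems can be sketched as a sum over consecutive base pairs (i, j), (i+1, j-1). The energy table is passed in by the caller; the value used in the example below is a made-up toy number for illustration, not a Turner/mFOLD parameter, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]
# Key: (outer pair letters, inner pair letters), e.g. (("G", "C"), ("G", "C")).
StackTable = Dict[Tuple[Tuple[str, str], Tuple[str, str]], float]


def stem_energy(seq: str, stem: List[Pair], stack: StackTable) -> float:
    """Sum stacking energies over neighboring pairs of a stem.

    `stem` lists pairs from outermost to innermost; each pair must be
    (i + 1, j - 1) of the previous one, i.e. the pairs are stacked.
    """
    total = 0.0
    for (i, j), (k, l) in zip(stem, stem[1:]):
        if (k, l) != (i + 1, j - 1):
            raise ValueError(f"pairs {(i, j)} and {(k, l)} are not stacked")
        total += stack[((seq[i], seq[j]), (seq[k], seq[l]))]
    return total


# Toy table: ONE illustrative value, not a real thermodynamic parameter.
toy = {(("G", "C"), ("G", "C")): -3.0}
print(stem_energy("GGGAAACCC", [(0, 8), (1, 7), (2, 6)], toy))  # -6.0
```

A stem of L pairs contributes L - 1 stacking terms, which is why single isolated pairs get no stacking stabilization in this model.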

Previous work (Zuker’s)

Previous work (Zuker’s): MFOLD computes loop-dependent energies

Difficult issues
- The energy model assumes that the energy at any position is influenced only by the local sequence and structure
- mFOLD does not predict pseudoknots
- PKnots (Rivas and Eddy 1999) predicts restricted classes of pseudoknots in O(n^6) time and O(n^4) space
- Minimum-energy pseudoknot prediction is NP-hard (Lyngsø and Pedersen 2000)

Pseudoknots drastically increase the complexity

Heuristic RNA folding algorithms
- ILM (Ruan et al. 2004), HotKnots (Ren et al. 2005)
- Usually fast, though sometimes slow
- Handle an unrestricted class of pseudoknots
- Do not guarantee the optimality of the predicted structure

This work
- Graph-theoretic; avoids nucleotide-level DP
- Unrestricted pseudoknot structures
- Optimal solutions
- Fast
- Accuracy comparable to existing methods

This work (summary)
1. Model: similar to ILM, but without loop energies
2. Approach: find all stable stems and construct a stem graph; reduce folding to the independent set problem
3. Techniques: tree-decompose the stem graph and use DP over the decomposition to obtain the optimal solution
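Step 2 can be sketched as follows, under assumptions the slides do not spell out: a "stem" is taken to be a maximal run of at least `min_len` stacked canonical or G-U pairs, with at least `min_loop` unpaired bases inside, and two stems are joined by an edge when they share a nucleotide. Crossing stems that share no nucleotide get no edge, so pseudoknots stay allowed. `find_stems`, `stem_graph`, and both thresholds are hypothetical names chosen for this illustration, not the authors' code.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Stem = Tuple[int, int, int]  # (i, j, length): pairs (i, j), (i+1, j-1), ...
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_stems(seq: str, min_len: int = 3, min_loop: int = 3) -> List[Stem]:
    """Enumerate maximal stacked runs of >= min_len allowed pairs."""
    n = len(seq)
    stems: List[Stem] = []
    for i in range(n):
        for j in range(n - 1, i, -1):
            # Start only at the outermost pair of a run (maximality).
            if i > 0 and j < n - 1 and (seq[i - 1], seq[j + 1]) in CANONICAL:
                continue
            length = 0
            # Extend inward while the pair is allowed and the loop stays >= min_loop.
            while (i + length < j - length - min_loop
                   and (seq[i + length], seq[j - length]) in CANONICAL):
                length += 1
            if length >= min_len:
                stems.append((i, j, length))
    return stems


def positions(stem: Stem) -> Set[int]:
    """Nucleotide positions occupied by a stem."""
    i, j, length = stem
    return set(range(i, i + length)) | set(range(j - length + 1, j + 1))


def stem_graph(stems: List[Stem]) -> Dict[int, Set[int]]:
    """Adjacency sets: an edge joins two stems that share a nucleotide."""
    adj: Dict[int, Set[int]] = {k: set() for k in range(len(stems))}
    for a, b in combinations(range(len(stems)), 2):
        if positions(stems[a]) & positions(stems[b]):
            adj[a].add(b)
            adj[b].add(a)
    return adj


print(find_stems("GGGAAACCC"))  # [(0, 8, 3)]
```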

This work (approach)

A set of non-overlapping stems corresponds to an independent set of the stem graph. The weight of each vertex is related to the energy of the corresponding stem.
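The correspondence can be illustrated with a brute-force maximum weight independent set (MWIS) on a tiny stem graph. The weights here assume w(stem) = -ΔG(stem), so more stable stems weigh more; the slides only say the weight is "related to" the energy, so that mapping is an assumption. Exhaustive search is exponential in the number of stems and is shown only to make the reduction concrete, not as a practical method.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple


def brute_force_mwis(adj: Dict[int, Set[int]],
                     weight: Dict[int, float]) -> Tuple[float, FrozenSet[int]]:
    """Exhaustively find the heaviest set of pairwise non-adjacent stems."""
    nodes = sorted(adj)
    best, best_set = 0.0, frozenset()
    for r in range(len(nodes) + 1):
        for combo in combinations(nodes, r):
            if all(b not in adj[a] for a, b in combinations(combo, 2)):
                w = sum(weight[v] for v in combo)
                if w > best:
                    best, best_set = w, frozenset(combo)
    return best, best_set


# Hypothetical stems 0-1-2 in a path: 1 overlaps both 0 and 2.
adj = {0: {1}, 1: {0, 2}, 2: {1}}
weight = {0: 2.0, 1: 3.0, 2: 2.0}  # assumed w = -dG, toy numbers
print(brute_force_mwis(adj, weight))  # (4.0, frozenset({0, 2}))
```

The chosen independent set {0, 2} is the predicted structure: the two compatible stems together outweigh the single middle stem.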


This work (techniques)
- A tree decomposition of the stem graph (tree width t = 4)
- Find an approximate tree decomposition of width t
- A maximum weight independent set (MWIS) can be found in O(2^t N) time by DP over the tree, where N = O(n^2)
- The time can be improved to O(e^{t/e}) = O(1.44^t)
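The DP over the tree can be sketched for a generic rooted tree decomposition. For each bag and each independent subset S of that bag, the table stores the best weight in the subtree given that S is exactly the chosen part of the bag; a child contributes its best compatible entry, minus the weight of shared vertices already counted in the parent. The inputs `bags`, `children`, and `root` are hypothetical and assumed to form a valid decomposition (every edge inside some bag, running-intersection property). This is not the authors' implementation: the naive child lookup costs up to O(4^t) per tree edge, so it does not reach the O(2^t N) bound quoted above.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterator, List, Set


def independent_subsets(bag: FrozenSet[int],
                        adj: Dict[int, Set[int]]) -> Iterator[FrozenSet[int]]:
    """Yield every subset of the bag with no edge inside it."""
    verts = sorted(bag)
    for r in range(len(verts) + 1):
        for combo in combinations(verts, r):
            if all(v not in adj[u] for u, v in combinations(combo, 2)):
                yield frozenset(combo)


def mwis_tree_decomposition(bags: Dict[int, FrozenSet[int]],
                            children: Dict[int, List[int]], root: int,
                            adj: Dict[int, Set[int]],
                            weight: Dict[int, float]) -> float:
    """MWIS weight via DP over a rooted tree decomposition."""

    def table(t: int) -> Dict[FrozenSet[int], float]:
        child_tables = [(c, table(c)) for c in children.get(t, [])]
        result: Dict[FrozenSet[int], float] = {}
        for chosen in independent_subsets(bags[t], adj):
            total = sum(weight[v] for v in chosen)
            for c, ctab in child_tables:
                shared = bags[t] & bags[c]
                agree = chosen & shared  # child must make the same choice here
                total += max(val - sum(weight[v] for v in agree)
                             for sub, val in ctab.items() if sub & shared == agree)
            result[chosen] = total
        return result

    return max(table(root).values())


# Path 0-1-2 (weights 2, 3, 2) decomposed into bags {0,1} -> {1,2}.
bags = {0: frozenset({0, 1}), 1: frozenset({1, 2})}
print(mwis_tree_decomposition(bags, {0: [1]}, 0,
                              {0: {1}, 1: {0, 2}, 2: {1}},
                              {0: 2.0, 1: 3.0, 2: 2.0}))  # 4.0
```

The work per bag grows as 2^(bag size), which is why a small tree width is what keeps the method fast even though the stem graph itself can have O(n^2) vertices.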

This work (experimental results)
- Data sets: 50 tRNAs (length …), 50 pseudoknots (…), 11 large RNAs (…)
- Compared with:
  - PKnots (DP, optimal, restricted pseudoknots)
  - ILM (heuristic, unrestricted)
  - HotKnots (heuristic, unrestricted)
- Measures:
  - sensitivity = TP / (total number of real base pairs)
  - specificity = TP / (TP + FP)
  - running time
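Both accuracy measures compare predicted base pairs with the reference structure. The sketch below assumes structures are given as lists of 0-based (i, j) pairs and that a predicted pair counts as a true positive only on an exact match; the function name is hypothetical. Note that TP / (TP + FP), called "specificity" here, is what many papers call positive predictive value.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[int, int]


def sensitivity_specificity(predicted: Iterable[Pair],
                            reference: Iterable[Pair]) -> Tuple[float, float]:
    """Base-pair sensitivity TP/real and specificity TP/(TP+FP)."""
    norm = lambda ps: {tuple(sorted(p)) for p in ps}  # (i, j) with i < j
    pred: Set[Pair] = norm(predicted)
    ref: Set[Pair] = norm(reference)
    tp = len(pred & ref)
    fp = len(pred - ref)
    sens = tp / len(ref) if ref else 0.0
    spec = tp / (tp + fp) if pred else 0.0
    return sens, spec


# Two of three reference pairs recovered, one spurious pair predicted.
print(sensitivity_specificity([(0, 8), (1, 7), (3, 5)],
                              [(0, 8), (1, 7), (2, 6)]))
```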

This work (experimental results)

Conclusion
- A new graph-theoretic algorithm for RNA folding
- Performance comparable with the best existing methods in both accuracy and speed, with much room for improvement
- Applications in multiple structure alignment as well as in single-sequence folding
- Part of an NIH project on ncRNA gene search