Poster Design & Printing by Genigraphics ® - 800.790.4001 Esposito, D., Heitsch, C. E., Poznanovik, S. and Swenson, M. S. Georgia Institute of Technology.

Improved RNA Secondary Structure Prediction Using Stochastic Context Free Grammars
Esposito, D., Heitsch, C. E., Poznanovik, S. and Swenson, M. S.
Georgia Institute of Technology

Contact: David Esposito, Georgia Institute of Technology
Website: code.google.com/p/gt-jscfg-2/

Abstract
Accurate RNA secondary structure prediction is an important problem in computational biology. Different RNA nucleotide sequences often fold to similar structures, yet current prediction algorithms range widely in accuracy even for RNA strands with similar structures. To understand the origins of these inaccuracies, we trained a stochastic context free grammar on a hard-to-predict training set and on an easy-to-predict training set, that is, on sets of sequences with low and high prediction accuracy, respectively. We found interesting statistical differences between the two training sets, both in the nucleotide composition of the sequences and in the distribution of base pairs. Stochastic context free grammars provide a means to quantify subtle differences in the composition of native secondary structures, and the discovery of these differences could lead to improvements in current prediction algorithms.

Introduction
RNA. The RNA molecules considered here belong to four main classes: tRNA, 5S, 16S and 23S, listed from shortest to longest [2]. Molecules within each class fold to a similar two-dimensional (secondary) structure, and the secondary structure dictates the function of the RNA [2].

RNA secondary structure prediction. Predicting secondary structure from sequence is an open problem in biology, mathematics and computer science, although efficient algorithms exist. The MFE (Minimum Free Energy) algorithm, based on the Nearest Neighbor Thermodynamic Model [2], scores candidate structures by their chemical stability and is arguably the most accurate prediction method. Even so, its accuracy, measured by the F-measure (Figure 3), ranges widely for the 16S RNA class.

SCFG (Stochastic Context Free Grammars). A context free grammar (CFG) is a set of production rules that generates a language; its derivations describe the structural possibilities of words within that language [3] (Figures 1 and 2). An SCFG attaches a probability to each production rule and therefore also describes how likely a word is to occur in the language [3].

Methods
Forming training sets. Structures and sequences of 5S, 16S, 23S and tRNA molecules were acquired from the Comparative RNA Web Site [1]. RNAfold [4] was used to predict a structure for each unambiguous sequence, and the accuracy of each prediction was recorded as an F-measure score (Figure 4). Thirty-two training sets were formed by varying set size, prediction difficulty and RNA class.

Training the Pfold grammar. The grammar was trained on each set, and the minimum, maximum, mean and standard deviation were recorded for each parameter set. The original Pfold parameters most closely matched the parameters of our grammar trained on the hard tRNA training set.

Secondary structure prediction. The trained grammar was used to predict a structure for each sequence in its own training set, and the accuracy of each prediction was recorded and compared with the accuracy of MFE.

Results
Predicting with the trained grammar. Our grammar performed similarly to MFE and scored base pairs in a similar way. On average, MFE gained greater rewards, and suffered greater penalties, for predicting canonical base pairs. On the hard sets, our grammar performed better than MFE on average (Figure 5).

Examining base pair ratios. MFE predicts only canonical base pairs (GC, AU and GU), whereas Pfold considers all base pairs. Figure 6 shows that MFE predicts well on sequences with many canonical base pairs.

Examining nucleotide ratios. Because G can pair with either C or U, MFE has fewer pairing options when count(C) >> count(U), and the fewer options MFE had, the better it performed (Figure 7). On average, the difference p(t→c|t) - p(t→u|t) was over seven times larger for the easy sets than for the hard sets.

Conclusions
Several characteristics of RNA sequences and structures appear to indicate how accurately current prediction methods will predict a secondary structure. If a sequence can be shown to be difficult for MFE to predict, our stochastic grammar algorithm will probably predict its secondary structure more accurately. We are currently performing a parametric analysis of several prediction methods.

Figures
Figure 1. The Pfold grammar.
Figure 2. Pfold grammar parse tree.
Figure 3. F-measure definition.
Figure 4. 5S and 16S F-measure distribution.
Figure 5. F-measure accuracy, MFE vs. stochastic grammars.
Figure 6. Canonical vs. non-canonical base pair probabilities for 5S and 16S.
Figure 7. p(t→c) vs. p(t→u) for 5S and 16S.

References
1. Cannone J, Subramanian S, Schnare M, Collett J, D'Souza L, Du Y, Feng B, Lin N, Madabusi L, Miller K, Pande N, Shang Z, Yu N, Gutell R. The Comparative RNA Web (CRW) Site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs. BMC Bioinformatics 2002, 3.
2. Mathews DH, Schroeder SJ, Turner DH, Zuker M. RNA World, 3rd edition. Cold Spring Harbor Laboratory Press.
3. Durbin R, Eddy SR, Krogh A, Mitchison G. Biological Sequence Analysis. Cambridge University Press.
4. Hofacker IL. Vienna RNA secondary structure server. Nucleic Acids Research, 31(13):3429–3431.
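The training sets were formed by scoring each RNAfold prediction with an F-measure and grouping sequences by prediction difficulty. Because the poster's own definition (Figure 3) is not reproduced in the text, the sketch below assumes the usual base-pair F-measure: the harmonic mean of sensitivity TP/(TP+FN) and positive predictive value TP/(TP+FP), which simplifies to 2TP/(2TP+FP+FN). Structures are assumed to be nested dot-bracket strings. The helper `split_by_difficulty` and its 0.5 threshold are hypothetical; the poster does not state how "hard" and "easy" were separated.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[int, int]


def pairs_from_dot_bracket(structure: str) -> Set[Pair]:
    """Return the base pairs (i, j), i < j, encoded by a nested dot-bracket string."""
    stack: List[int] = []
    pairs: Set[Pair] = set()
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.add((stack.pop(), j))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {j}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def f_measure(native: str, predicted: str) -> float:
    """Base-pair F-measure 2*TP / (2*TP + FP + FN), i.e. the harmonic mean of
    sensitivity TP/(TP+FN) and positive predictive value TP/(TP+FP)."""
    if len(native) != len(predicted):
        raise ValueError("structures must have the same length")
    true_pairs = pairs_from_dot_bracket(native)
    predicted_pairs = pairs_from_dot_bracket(predicted)
    tp = len(true_pairs & predicted_pairs)
    fp = len(predicted_pairs - true_pairs)
    fn = len(true_pairs - predicted_pairs)
    if tp == 0:
        return 0.0  # assumed convention when no pair is predicted correctly
    return 2 * tp / (2 * tp + fp + fn)


def split_by_difficulty(
    scores: Dict[str, float], threshold: float = 0.5
) -> Tuple[List[str], List[str]]:
    """Hypothetical split of sequence IDs into (hard, easy) at an F-measure threshold."""
    hard = sorted(name for name, score in scores.items() if score < threshold)
    easy = sorted(name for name, score in scores.items() if score >= threshold)
    return hard, easy
```

For example, if the native structure is `((..))` and the prediction is `(....)`, one of two native pairs is recovered and nothing spurious is predicted, so sensitivity is 1/2, positive predictive value is 1, and `f_measure` returns 2/3.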
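Training a stochastic grammar on known structures can be done by counting how often each rule is used and normalizing the counts per left-hand side. The sketch below assumes the Knudsen-Hein grammar on which Pfold is built (S → L S | L, F → d F d | L S, L → s | d F d, where s emits an unpaired base and d F d emits a base pair). It estimates plain maximum-likelihood probabilities from single sequences with dot-bracket structures, without pseudocounts or alignments. This is an illustrative assumption, not the authors' code from code.google.com/p/gt-jscfg-2/. Because this grammar is unambiguous, every derivable structure has exactly one parse, so the counts are well defined. Structures whose loops are too short for F → L S (for example `(.)`) are rejected.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def pair_table(structure: str) -> List[int]:
    """partner[i] is the index paired with i, or -1 if i is unpaired."""
    partner = [-1] * len(structure)
    stack: List[int] = []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            i = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return partner


class KHRuleCounter:
    """Counts rule and emission uses in the unique derivation of each structure
    under S -> L S | L,  F -> d F d | L S,  L -> s | d F d."""

    def __init__(self) -> None:
        self.rules: Counter = Counter()     # (lhs, rhs) -> number of uses
        self.unpaired: Counter = Counter()  # base emitted by s -> count
        self.pairs: Counter = Counter()     # (5' base, 3' base) emitted by d...d -> count

    def add(self, seq: str, structure: str) -> None:
        if not seq or len(seq) != len(structure):
            raise ValueError("need a non-empty sequence and a structure of equal length")
        self.seq, self.partner = seq.upper(), pair_table(structure)
        self._s(0, len(seq))

    def _s(self, i: int, j: int) -> None:
        # S over [i, j): a run of L units, i.e. S -> L S repeated, ending with S -> L.
        while True:
            nxt = self._l(i, j)
            if nxt == j:
                self.rules["S", "L"] += 1
                return
            self.rules["S", "LS"] += 1
            i = nxt

    def _l(self, i: int, j: int) -> int:
        # One L unit starting at i; returns the index just past it.
        k = self.partner[i]
        if k == -1:
            self.rules["L", "s"] += 1
            self.unpaired[self.seq[i]] += 1
            return i + 1
        if not i < k < j:
            raise ValueError(f"pair ({i}, {k}) is not nested inside [{i}, {j})")
        self.rules["L", "dFd"] += 1
        self.pairs[self.seq[i], self.seq[k]] += 1
        self._f(i + 1, k)
        return k + 1

    def _f(self, i: int, j: int) -> None:
        # F over [i, j), the region enclosed by a base pair.
        if j - i >= 2 and self.partner[i] == j - 1:
            self.rules["F", "dFd"] += 1  # stacked pair
            self.pairs[self.seq[i], self.seq[j - 1]] += 1
            self._f(i + 1, j - 1)
        elif j - i >= 2:
            self.rules["F", "LS"] += 1   # loop: one L unit, then a non-empty S
            self._s(self._l(i, j), j)
        else:
            raise ValueError(f"region [{i}, {j}) is too short for this grammar")

    def rule_probabilities(self) -> Dict[Tuple[str, str], float]:
        """Maximum-likelihood estimate: normalize rule counts per left-hand side."""
        totals: Dict[str, int] = defaultdict(int)
        for (lhs, _), n in self.rules.items():
            totals[lhs] += n
        return {rule: n / totals[rule[0]] for rule, n in self.rules.items()}
```

Calling `add` once per sequence in a training set accumulates counts for that set. For `GGAAACC` with structure `((...))`, the derivation uses L → s three times and L → d F d once, so p(L → s) is estimated as 3/4. The parser is recursive, so very deeply nested rRNA structures may need a higher `sys.setrecursionlimit`. Training one counter per set and summarizing each probability across sets (for example with Python's `statistics.mean` and `statistics.stdev`) would produce a min/max/mean/standard-deviation table of the kind the Methods describe.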
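The Results relate MFE accuracy to two composition measures: the share of canonical base pairs (GC, AU and GU, which are the only pairs MFE predicts) and the balance between C and U, the two possible partners of G. The sketch below computes grammar-free versions of both from a native sequence and its dot-bracket structure. The quantity p(t→c|t) - p(t→u|t) on the poster is a trained grammar parameter that cannot be reconstructed from the text, so the sequence-level C and U frequencies here are only an assumed proxy for it. The function names are hypothetical, and treating T as U is an assumption for DNA-alphabet input.

```python
from typing import Dict, List, Tuple

# Pairs counted as canonical for MFE: GC, AU and GU, in either orientation.
CANONICAL = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def base_pairs(structure: str) -> List[Tuple[int, int]]:
    """Base pairs (i, j) of a nested dot-bracket structure."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.append((stack.pop(), j))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def canonical_fraction(seq: str, structure: str) -> float:
    """Fraction of native base pairs that are canonical (0.0 if there are no pairs)."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    seq = seq.upper().replace("T", "U")
    pairs = base_pairs(structure)
    if not pairs:
        return 0.0
    canonical = sum((seq[i], seq[j]) in CANONICAL for i, j in pairs)
    return canonical / len(pairs)


def nucleotide_frequencies(seq: str) -> Dict[str, float]:
    """Relative frequency of A, C, G and U in a sequence (T is read as U)."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper().replace("T", "U")
    return {base: seq.count(base) / len(seq) for base in "ACGU"}
```

For `GGAAAAC` folded as `((...))`, the outer G-C pair is canonical and the inner G-A pair is not, so `canonical_fraction` returns 0.5. A larger value of `freqs["C"] - freqs["U"]` corresponds to the count(C) >> count(U) situation in which the poster reports MFE doing better.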