Stochastic Context-Free Grammars for Modeling RNA

Presentation transcript:

Stochastic Context-Free Grammars for Modeling RNA. Y. Sakakibara, M. Brown, R. C. Underwood, I. S. Mian, and D. Haussler. Proceedings of the 27th Hawaii International Conference on System Sciences. Jang HaYoung.

Introduction. Phylogenetic analysis of homologous RNA molecules: alignment and subsequent folding of many sequences into similar structures. Energy minimization: thermodynamic parameters and computer algorithms are used to evaluate the optimal and suboptimal free-energy foldings of an RNA species.

Introduction. HMM approach: two positions that are base-paired in the typical RNA are treated as having independent distributions. Formal grammar: base pairing in RNA can be described by a context-free grammar.

Base Pair Nesting. RNA base pairs are usually nested. Unnested RNA base pairs also occur; these are called pseudoknots, and many algorithms ignore them. [Figures: example RNA sequences drawn with nested and with unnested base pairs.]
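The distinction between nested and unnested pairs can be checked mechanically. The following is a minimal sketch, not taken from the slides: it assumes a structure is given as a list of (i, j) index pairs, and the function name is_nested is hypothetical.

```python
def is_nested(pairs):
    """Return True if no two base pairs cross, i.e. there is no pseudoknot.

    `pairs` is a list of (i, j) sequence positions that are base-paired.
    This representation is an assumption made for illustration.
    """
    normalized = [tuple(sorted(p)) for p in pairs]
    for i, j in normalized:
        for k, l in normalized:
            # Two pairs cross when exactly one end of (k, l) lies inside (i, j).
            if i < k < j < l:
                return False
    return True


nested_pairs = [(0, 9), (1, 8), (2, 7)]   # a simple stem: fully nested
crossing_pairs = [(0, 5), (3, 8)]         # the pairs interleave: a pseudoknot

print(is_nested(nested_pairs))
print(is_nested(crossing_pairs))
```

Only nested structures can be generated by a context-free grammar, which is why methods built on such grammars leave pseudoknots out.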

Context-free grammars for RNA. SCFGs are a generalization of HMMs. The parameters are learned from a set of unaligned primary sequences with a novel generalization of the forward-backward algorithm commonly used to train HMMs. Modularity: two separate grammars can be combined into a single grammar.

Context-free grammars for RNA SSS, SaSa, SaS, SS, Sa SaSa: base pairings in RNA SaS, SSa: unpaired bases SSS: branched secondary structures SS: used in the context of multiple alignments

Stochastic context-free grammars. For a stochastic context-free grammar G, the probability of a parse tree is calculated as the product of the probabilities of the production instances in the tree. The probability of a sequence s is the sum of the probabilities of all parse trees (derivations) that could generate s.
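Both quantities can be computed for a toy grammar. The sketch below is an illustration under stated assumptions: a single nonterminal S, only the production forms S → aSb, S → aS, S → Sa and S → a, and made-up probabilities that sum to 1. The sum over all parses is computed by a memoized recursion over subsequences in the style of the inside algorithm.

```python
from functools import lru_cache
from math import prod

# Hypothetical production probabilities for one nonterminal S (they sum to 1.0).
PAIR = {("G", "C"): 0.15, ("C", "G"): 0.15, ("A", "U"): 0.10, ("U", "A"): 0.10}
LEFT = {base: 0.05 for base in "ACGU"}    # S -> a S
RIGHT = {base: 0.025 for base in "ACGU"}  # S -> S a
LEAF = {base: 0.05 for base in "ACGU"}    # S -> a


def tree_probability(production_probs):
    """Probability of one parse tree: product over its production instances."""
    return prod(production_probs)


def sequence_probability(seq):
    """Probability of seq: sum over all parse trees that generate it."""
    if not seq:
        return 0.0  # this toy grammar cannot derive the empty string

    @lru_cache(maxsize=None)
    def inside(i, j):
        # Probability that S derives seq[i..j] (inclusive).
        if i == j:
            return LEAF[seq[i]]
        total = LEFT[seq[i]] * inside(i + 1, j)
        total += RIGHT[seq[j]] * inside(i, j - 1)
        if j - i >= 2:  # a pair needs at least one base between its ends
            total += PAIR.get((seq[i], seq[j]), 0.0) * inside(i + 1, j - 1)
        return total

    return inside(0, len(seq) - 1)


# One parse of "GAC": S -> G S C (0.15), then S -> A (0.05).
one_tree = tree_probability([PAIR[("G", "C")], LEAF["A"]])
all_trees = sequence_probability("GAC")
print(one_tree, all_trees)
```

The sequence probability is never smaller than the probability of any single parse, because every parse of the sequence contributes a non-negative term to the sum.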

Estimating SCFG from sequences. Expectation-Maximization (EM) training algorithm, based on the theory of stochastic tree grammars. Tree grammars are used to derive labeled trees instead of strings. The EM part readjusts the production probabilities to maximize the probability of the parses of the training sequences.
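The readjustment of production probabilities can be sketched for the simplest case, in which each training sequence comes with one fully known parse. Under that assumption the maximization reduces to relative frequencies of production use; the full algorithm would instead weight counts by their expected values over possible parses. The data layout and the name reestimate are hypothetical.

```python
from collections import Counter


def reestimate(trees):
    """Re-estimate production probabilities from parse trees.

    Each tree is given as a list of (lhs, rhs) productions used in it.
    Counts are normalized per left-hand side, which is the maximum
    likelihood estimate when the parses are fully observed (an assumption
    of this sketch).
    """
    counts = Counter()
    for tree in trees:
        counts.update(tree)

    lhs_totals = Counter()
    for (lhs, _rhs), n in counts.items():
        lhs_totals[lhs] += n

    return {(lhs, rhs): n / lhs_totals[lhs] for (lhs, rhs), n in counts.items()}


# Two toy parses written as the productions they use.
training_trees = [
    [("S", "a S u"), ("S", "g S"), ("S", "c")],
    [("S", "a S u"), ("S", "c")],
]

probabilities = reestimate(training_trees)
print(probabilities)
```

Repeating this step after re-parsing the training sequences with the updated grammar gives the iterative flavor of EM training.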

Estimating SCFG from sequences. First, design a rough initial grammar, which might represent only a portion of the base-pairing interactions. Next, estimate a new SCFG using the partially folded sequences and the EM training algorithm. Finally, obtain more accurately folded training sequences and reestimate the SCFG.

Experimental Result. A training set of unfolded and unaligned RNA sequences.

Experimental Result. Discriminating tRNAs; multiple sequence alignments; prediction of secondary structure; introns.

Discussion. SCFGs may provide a flexible and highly effective statistical method for a number of problems involving RNA sequences. Open question: how much prior knowledge about the structure of the RNA class being modeled is necessary?