Modeling RNA motifs by graph-grammars


Madison (ROC) 2
MC-Tools: Functions
( MC-Annotate 3-D ) -> graph
( MC-Cycles graph ) -> [ NCM ]
( MC-Seq graph ) -> [ sequence ]
( MC-Fold sequence ) -> [ graph ]
( MC-Cons [ ( sequence, [ graph ] ) ] ) -> [ graph ]
( MC-Search ( graph, [ 3-D ] ) ) -> [ 3-D ]
( MC-Sym graph ) -> [ 3-D ]
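The signatures above read like function types, so chaining two of them can be sketched in a few lines. This is only an illustration: the type names, `predict_3d`, and the two toy functions are hypothetical stand-ins and are not part of the MC-Tools, whose real inputs and outputs are files in their own formats.

```python
from typing import Callable, List, NewType

# Hypothetical stand-in types (assumptions made for this sketch only).
Sequence = NewType("Sequence", str)      # nucleotide string
Graph = NewType("Graph", frozenset)      # set of (i, j, label) edges
Model3D = NewType("Model3D", str)        # placeholder for 3-D coordinates

Fold = Callable[[Sequence], List[Graph]]    # ( MC-Fold sequence ) -> [ graph ]
Build = Callable[[Graph], List[Model3D]]    # ( MC-Sym graph ) -> [ 3-D ]


def predict_3d(seq: Sequence, fold: Fold, build: Build) -> List[Model3D]:
    """Chain folding and 3-D building: each candidate graph yields models."""
    return [model for graph in fold(seq) for model in build(graph)]


# Toy replacements for the real tools, just to make the sketch runnable.
def toy_fold(seq: Sequence) -> List[Graph]:
    # Pretend the only structure pairs the first and last nucleotides.
    return [Graph(frozenset({(0, len(seq) - 1, "WC")}))]


def toy_build(graph: Graph) -> List[Model3D]:
    return [Model3D(f"model with {len(graph)} edge(s)")]


models = predict_3d(Sequence("GGCC"), toy_fold, toy_build)
```

Because every function returns a list, the composition naturally fans out: several graphs per sequence, several 3-D models per graph.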

Madison (ROC) 3
MC-Tools: Objects (rat 28S rRNA sarcin/ricin stem-loop)
Sequence: GGGUGCUCAGUACGAGAGGAACCGCACCC
Graph: [figure]
Nucleotide cyclic motifs: [figure]
3-D structure: [figure]
( MC-Fold sequence ) -> [ graph ]
( MC-Sym graph ) -> [ 3-D ]
Szewczak et al. PNAS(USA) 1993; Lemieux & Major NAR 2006; Parisien, Thibault & Major (in prep.)

Madison (ROC) 4
Graph
( MC-Annotate 3-D ) -> graph
Gendron, Lemieux & Major JMB 2001; Lemieux & Major NAR 2002; Leontis & Westhof RNA 2001

Madison (ROC) 5
Shortest Cycle Basis
( MC-Cycle graph ) -> [ NCM ]
[Figure labels: cycles C1, C2, C3, C4, C5; nucleotides X1, X2, X3, X4 and Y1, Y2, Y3; strand ends 5' and 3']
Horton SIAM J Comp 1987; St-Onge et al. NAR 2007
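A shortest cycle basis can be illustrated on a toy graph. The sketch below follows the general idea credited to Horton on this slide (collect candidate cycles from shortest-path trees, then greedily keep the shortest linearly independent ones over GF(2)), but it is a simplified assumption-laden version, not the MC-Cycles implementation: it assumes a simple undirected graph with unit edge weights, and ties between equally short paths are broken by BFS order. The function name and the example graph are made up for illustration.

```python
from collections import deque


def minimum_cycle_basis(edge_list):
    """Greedy, Horton-style cycle basis of a simple undirected graph.

    Returns a list of cycles, each a set of (u, v) edges with u < v.
    """
    edges = [frozenset(e) for e in edge_list]
    bit = {e: 1 << i for i, e in enumerate(edges)}   # one bit per edge
    adj = {}
    for e in edges:
        u, v = tuple(e)
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    # Candidate cycles: for every root, BFS tree paths closed by a non-tree edge.
    candidates = set()
    for root in adj:
        parent, path = {root: None}, {root: 0}   # path[v]: edge bits on root..v
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in path:
                    parent[w] = u
                    path[w] = path[u] | bit[frozenset((u, w))]
                    queue.append(w)
        for e in edges:
            u, v = tuple(e)
            if u in path and parent[u] != v and parent[v] != u:
                # Shared path prefix cancels under XOR, leaving one simple cycle.
                candidates.add((path[u] ^ path[v]) | bit[e])

    # Greedy selection: shortest first, keep only GF(2)-independent cycles.
    basis, pivots = [], {}            # pivots: highest set bit -> reduced vector
    for cycle in sorted(candidates, key=lambda m: (bin(m).count("1"), m)):
        reduced = cycle
        while reduced:
            top = reduced.bit_length() - 1
            if top not in pivots:
                pivots[top] = reduced
                basis.append(cycle)
                break
            reduced ^= pivots[top]
    return [{tuple(sorted(e)) for e in edges if c & bit[e]} for c in basis]


# Toy hairpin: backbone 0-1-2-3-4-5 plus two base pairs, (0, 5) and (1, 4).
hairpin = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5), (1, 4)]
basis = minimum_cycle_basis(hairpin)
```

On this toy hairpin the two cycles of the basis share the edge (1, 4), which mirrors the idea that neighbouring NCMs are joined through a common base pair.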

Madison (ROC) 6
The Nucleotide Cyclic Motifs (NCM)
i. Embrace all base-pairing types without distinction (Watson-Crick and others)
ii. Precisely designate how any nucleotide in the sequence relates to the others
iii. Are joined through a common base pair (context). This helps us predict coherent chains of NCMs and project them in 3-D. Tentative definition of a motif: an "ordered" chain of NCMs.
iv. Recur within and across all RNAs
v. Are short (< 10 nts; most have 3 to 5 nts)
vi. Compose the classical motifs (e.g. GNRA tetraloop, sarcin/ricin motif). There are exceptions (e.g. AA platform).
Lemieux & Major (2006) NAR 34:2340; Parisien, Thibault & Major (in prep.)

Madison (ROC) 7
Aim
We want a computational model that can encode the valid sequences and structural features of RNA motifs.
Hypothesis: A relation between the sequence and the structure of RNA motifs exists.

Madison (ROC) 8
Graph Grammars
A graph grammar is to a set of graphs what a formal generative grammar is to a set of strings, i.e. a precise and formal description of that set.
A graph grammar consists of a set of rules, or productions, for transforming graphs.
Formally, a graph grammar, H = {N, Σ, P}, consists of a set of non-terminal symbols, N, a set of terminal symbols, Σ, and a set of production rules, P.
Hypothesis: NCMs are "independent" building blocks.
Nagl Computing 1976; Nagl In H. Ehrig et al., eds 1987; St-Onge et al. NAR 2007

Madison (ROC) 9
Sarcin/Ricin Graph Grammar
N = {C1, C2, … C5}, the set of NCMs
Σ = {S1, S2, … S5}, the sets of sequences for each NCM
P is a set of consistent assignments of the sequences in Σ to the NCMs in N (production rules)
[Figure: derivations (⇒) shown for yeast tRNA, H. marismortui 23S, and E. coli 16S]
St-Onge et al. NAR 2007
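One way to read "consistent assignment" is that two NCMs sharing a base pair must agree on the nucleotides at the shared positions. The sketch below enumerates such assignments by brute force. It is an illustration under that assumption only: the function name, the two toy NCMs, and their sequence sets are invented here and are not the sarcin/ricin data from the slides.

```python
from itertools import product


def consistent_assignments(ncms, sequences):
    """Enumerate one sequence per NCM such that shared positions agree.

    ncms:      {name: tuple of nucleotide positions covered by that NCM}
    sequences: {name: set of strings, one letter per covered position}
    Yields {position: nucleotide} dictionaries.
    """
    names = list(ncms)
    for choice in product(*(sorted(sequences[name]) for name in names)):
        assignment = {}
        consistent = True
        for name, seq in zip(names, choice):
            for pos, nt in zip(ncms[name], seq):
                # setdefault records the first letter seen at this position;
                # a later, different letter means the NCMs disagree.
                if assignment.setdefault(pos, nt) != nt:
                    consistent = False
        if consistent:
            yield assignment


# Toy grammar (made-up data): two NCMs that share the base pair (2, 5).
toy_ncms = {"C1": (1, 2, 5, 6), "C2": (2, 3, 4, 5)}
toy_sequences = {
    "C1": {"GCGC", "AUAU"},
    "C2": {"CGAG", "UGAA", "CUUG"},
}

results = [
    "".join(a[pos] for pos in sorted(a))
    for a in consistent_assignments(toy_ncms, toy_sequences)
]
```

Of the six possible pairings in this toy example, only those where C1 and C2 agree on positions 2 and 5 survive, which is how the shared base pair (the context) prunes the sequence space.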

Madison (ROC) 10
Sarcin/Ricin Building Blocks
C1: Theoretical: 256 (16 x 16); IMs: 120 (10 x 12); PDB: 7
C2: Theoretical: 64 (16 x 4); IMs: 40 (10 x 4); PDB: 5
(unlabeled block): Theoretical: 16; IMs: 10; PDB: 15
C3: Theoretical: 64 (16 x 4); IMs: 56 (14 x 4); PDB: 2
C4: Theoretical: 256 (16 x 16); IMs: 160 (16 x 10); PDB: 3
C5: Theoretical: 64 (16 x 4); IMs: 40 (10 x 4); PDB: 8
[Figure: nucleotide labels (A, G, U) on the building-block diagrams]
St-Onge et al. NAR 2007
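The "Theoretical" numbers appear to follow from simple counting. The sketch below assumes that 16 is the number of ordered nucleotide pairs (4 x 4) and that 4 is the number of choices for a single nucleotide; this reading, and the function name, are assumptions made for illustration and are not stated on the slide. The IMs and PDB counts come from data and cannot be derived this way.

```python
NUCLEOTIDES = "ACGU"


def theoretical_count(n_base_pairs, n_single_nucleotides=0):
    """Number of sequences if every position is unconstrained.

    Each base pair contributes 4 x 4 = 16 choices, each remaining
    single nucleotide contributes 4 choices.
    """
    pair_choices = len(NUCLEOTIDES) ** 2
    return pair_choices ** n_base_pairs * len(NUCLEOTIDES) ** n_single_nucleotides


two_pairs = theoretical_count(2)           # 16 x 16, as listed for C1 and C4
pair_plus_one = theoretical_count(1, 1)    # 16 x 4, as listed for C2, C3, C5
single_pair = theoretical_count(1)         # 16, as listed for the unlabeled block
```

Comparing these upper bounds with the much smaller PDB counts shows how little of the theoretical sequence space is observed in known structures.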

Madison (ROC) 11
( MC-Seq sarcin-ricin-graph ) -> [ sequence ]
Sequences supported by the NCMs in the PDB: AGUA-GAA, AGUA-AAA, GGUA-GAA, GGUA-AAA
If we remove the instances of the sarcin/ricin motif found by
( MC-Search ( sarcin-ricin-graph, [ PDB ] ) ) -> [ 3-D ]
then the same four sequences are still supported => NCMs are found outside the sarcin/ricin context.
Larose et al. (in prep.); St-Onge et al. NAR 2007

Madison (ROC) 12
Graph Grammar Parsing
Westhof (personal comm.); St-Onge et al. NAR
sequences aligned according to E. coli 23S rRNA structure; site /

Madison (ROC) 13
Validation (MC-Seq vs. PDB vs. Alignment)
Figure labels: MC-Seq; PDB; Alignment: 5S, 16S, 23S; Isostericity matrices sequences
Sequences shown: AGUA-AAA, AGUA-GAA, GGUA-GAA, GGUA-AAA, AAUA-AAA, AAUA-GAA, ACUA-AAA, ACUA-GAA, ACUA-GAC, AGUA-AAC, AGUA-CAA, AGUA-GAC, AGUA-GAU, AGUA-GCC, AGUA-GGG, AGUA-GUG, AGUC-GAA, AUUA-GAA, CGUA-GAA, GAUA-GAA, GGUA-GAU, GUUA-GAA, UGUA-GAA, UGUA-GAC
St-Onge et al. NAR 2007

Madison (ROC) 14
Perspectives
We want to develop a version of MC-Seq that would be useful during the alignment process.
The PDB does not seem to contain enough structural information yet.
To avoid generating too many sequences, the NCMs (context) are necessary.
Two more things need to be considered…

Madison (ROC) 15
Sarcin/Ricin (Sequence/Structure Space Is Not Simple)
St-Onge et al. (in prep.)

Madison (ROC) 16
Modeling In 3-D Might Be Necessary
Alignment: AUUA-GAA (0.9 Å)
MC-Fold: CAUU-AAG (2.1 Å)
St-Onge et al. NAR 2007

Madison (ROC) 17
Acknowledgments
Martin Larose (Res. assistant)
Philippe Thibault (Res. assistant)
Patrick Gendron (Res. assistant)
Romain Rivière (Postdoc, CS)
Véronique Lisi (Ph.D. Molecular Biology)
Marc Parisien (Ph.D. Computer Science)
Emmanuelle Permal (Ph.D. Bioinformatics)
Karine St-Onge (Ph.D. Computer Science)
Louis-Philippe Lavoie (M.Sc. Bioinformatics)
Maxime Caron (M.Sc. Bioinformatics)
Caroline Louis-Jeune (M.Sc. Bioinformatics)
Montréal: Pascal Chartrand, Gerardo Ferbeyre, Sylvie Hamel, Sébastien Lemieux, Pascale Legault, Luc Desgroseillers, Kathy Borden, Daniel Lamarre
Éric Westhof (Strasbourg)
Alain Denise (Paris)
Dave Mathews (Rochester)