CISC 467/667 Intro to Bioinformatics (Fall 2005): RNA secondary structure (CISC667, F05, Lec19, Liao)



Slide 2: Facts about RNAs
Mainly as "information carrier" in protein synthesis
– mRNA
– tRNA
Also as catalysts
– Ribozymes
Also as regulators
– RNAi
Single-stranded
– Intra-molecular pairing
– Secondary structure
– Essential for sequence stability and function
Similarity in secondary structure plays a more important role than primary sequence in determining homology for RNAs.


Slide 6: RNA stem loop
Hybridization pairing: A-U and C-G. In the diagrams, "-" stands for a pairing and "x" for no pairing.

seq1: C A G G A A A C U G    stem C-G, A-U, G-C around the loop GAAA
seq2: G C U G C A A A G C    stem G-C, C-G, U-A around the loop GCAA
seq3: G C U G C A A C U G    no stem: G x G, C x U, U x C around GCAA

seq1 and seq2 fold into a similar structure, whereas seq3 does not. Pairwise alignments that disregard such structural restrictions may be misleading: seq2 and seq3 have 70% sequence identity, seq1 and seq3 have 60%, whereas seq1 and seq2 have only 30%.
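The identity figures and the stem pairings quoted for seq1, seq2, and seq3 can be checked with a short sketch. The helper names identity and stem_length are illustrative only, and the sketch assumes ungapped, equal-length sequences with Watson-Crick pairing (A-U, C-G) only.

```python
# Watson-Crick pairs used on the slide (A-U and C-G, in either order).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def identity(a, b):
    """Fraction of identical positions in two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def stem_length(seq):
    """Count consecutive Watson-Crick pairs when pairing the two ends inward."""
    i, j, count = 0, len(seq) - 1, 0
    while i < j and (seq[i], seq[j]) in WATSON_CRICK:
        i += 1
        j -= 1
        count += 1
    return count


seq1 = "CAGGAAACUG"
seq2 = "GCUGCAAAGC"
seq3 = "GCUGCAACUG"
```

Under these assumptions, seq1 and seq2 each form a three-pair stem while seq3 forms none, even though seq3 is the sequence most similar to the other two at the primary-sequence level.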

Slide 7: Representations for palindromic structures
1) Stem-loop diagram: pairs C-G, A-U, G-C closed by the loop G A A A
2) Sequence:          C A G G A A A C U G
3) Parentheses:       ( ( ( . . . . ) ) )
There is a 1-1 correspondence between RNA secondary structures and well-balanced parenthesis expressions, where the balancing parentheses correspond to base pairings via hydrogen bonds.
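The correspondence between balanced parentheses and base pairs can be made concrete with a stack. The following is a minimal sketch; the function name parse_dot_bracket is hypothetical, and positions are 0-based.

```python
def parse_dot_bracket(structure):
    """Return the sorted list of (i, j) base-pair positions encoded by a
    parenthesis string, where '.' marks an unpaired base."""
    stack, pairs = [], []
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)              # opening partner waits on the stack
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced: ')' without a matching '('")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced: '(' without a matching ')'")
    return sorted(pairs)


sequence = "CAGGAAACUG"
structure = "(((....)))"
paired_bases = [(sequence[i], sequence[j]) for i, j in parse_dot_bracket(structure)]
```

For the example on the slide, paired_bases holds the three stem pairs C-G, A-U, and G-C, read from the outside of the stem inward.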

Slide 8: Secondary structure prediction
Base pair maximization algorithm [Nussinov]
$m_{i,j}$: maximal number of base pairs that can be formed for the subsequence $s_i \ldots s_j$.
$d_{i,j} = 1$ if $s_i$ and $s_j$ can form a base pair, and $d_{i,j} = 0$ otherwise.
Initialization:
$m_{i,i-1} = 0$ for $i \in [2, L]$
$m_{i,i} = 0$ for $i \in [1, L]$
Recursion:
$m_{i,j} = \max \{\, m_{i+1,j},\; m_{i,j-1},\; m_{i+1,j-1} + d_{i,j},\; \max_{i<k<j} ( m_{i,k} + m_{k+1,j} ) \,\}$, where the last term is the bifurcation.
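The recursion can be sketched as a table fill over increasing subsequence lengths. This is a minimal illustration, not a production implementation: the function name nussinov_max_pairs is hypothetical, indices are 0-based, only A-U and C-G pairs are assumed, and no minimum loop length is enforced, matching the recursion as written.

```python
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def nussinov_max_pairs(seq):
    """Maximal number of base pairs for seq under the recursion above."""
    n = len(seq)
    if n == 0:
        return 0
    # m[i][j] holds the best count for seq[i..j]; the diagonal stays 0.
    m = [[0] * n for _ in range(n)]
    for span in range(1, n):                 # j - i, shortest spans first
        for i in range(n - span):
            j = i + span
            d = 1 if (seq[i], seq[j]) in PAIRS else 0
            inner = m[i + 1][j - 1] if span > 1 else 0   # empty interior is 0
            best = max(m[i + 1][j], m[i][j - 1], inner + d)
            for k in range(i + 1, j):        # bifurcation
                best = max(best, m[i][k] + m[k + 1][j])
            m[i][j] = best
    return m[0][n - 1]
```

For the sequence GGGAAAUCC the function returns 3, and it also returns 3 for CAGGAAACUG, the stem loop with three stacked pairs.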

Slide 9: Dynamic programming table
[Figure: dynamic programming table for the example sequence GGGAAAUCC, with a traced-back structure showing the pairs C-G, C-G, and U-A.]
Time complexity: $O(L^3)$, where $L$ is the sequence length.
Space complexity: $O(L^2)$.
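Once the table is filled, one optimal structure can be recovered by walking back through the cases that produced each entry. The sketch below is an assumption-laden illustration: fill_table and traceback are hypothetical names, only A-U and C-G pairs are allowed, and when several optimal structures exist it simply reports the first one it finds.

```python
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def fill_table(seq):
    """Fill the O(L^2) table of maximal pair counts in O(L^3) time."""
    n = len(seq)
    m = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            d = 1 if (seq[i], seq[j]) in PAIRS else 0
            inner = m[i + 1][j - 1] if span > 1 else 0
            best = max(m[i + 1][j], m[i][j - 1], inner + d)
            for k in range(i + 1, j):
                best = max(best, m[i][k] + m[k + 1][j])
            m[i][j] = best
    return m


def traceback(seq, m):
    """Return one optimal structure as a parenthesis string."""
    structure = ["."] * len(seq)
    stack = [(0, len(seq) - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        inner = m[i + 1][j - 1] if j - i > 1 else 0
        if m[i + 1][j] == m[i][j]:           # s_i left unpaired
            stack.append((i + 1, j))
        elif m[i][j - 1] == m[i][j]:         # s_j left unpaired
            stack.append((i, j - 1))
        elif (seq[i], seq[j]) in PAIRS and inner + 1 == m[i][j]:
            structure[i], structure[j] = "(", ")"   # s_i pairs with s_j
            stack.append((i + 1, j - 1))
        else:                                # bifurcation at some k
            for k in range(i + 1, j):
                if m[i][k] + m[k + 1][j] == m[i][j]:
                    stack.append((k + 1, j))
                    stack.append((i, k))
                    break
    return "".join(structure)


example = "GGGAAAUCC"
table = fill_table(example)
fold = traceback(example, table)
```

The three nested loops in fill_table are the source of the cubic time bound, and the table itself accounts for the quadratic space.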

Slide 10: Secondary structure prediction
Minimum energy algorithm [Zuker]
$E_{i,j}$: minimum energy for an optimal fold formed by the subsequence $s_i \ldots s_j$.
$e_{i,j} = -5$ if $s_i, s_j$ is CG or GC
$e_{i,j} = -4$ if $s_i, s_j$ is AU or UA
$e_{i,j} = -1$ if $s_i, s_j$ is GU or UG
Initialization:
$E_{i,i-1} = 0$ for $i \in [2, L]$
$E_{i,i} = 0$ for $i \in [1, L]$
Recursion:
$E_{i,j} = \min \{\, E_{i+1,j},\; E_{i,j-1},\; E_{i+1,j-1} + e_{i,j},\; \min_{i<k<j} ( E_{i,k} + E_{k+1,j} ) \,\}$, where the last term is the bifurcation.
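The energy recursion has the same shape as base pair maximization, with min in place of max and pair energies in place of pair counts. A minimal sketch follows; the name min_energy is hypothetical, and it assumes that bases outside the three listed pair types simply cannot pair, so the pairing term is skipped for them (the slide does not state this case).

```python
# Pair energies from the slide; any other combination is assumed unable to pair.
ENERGY = {
    ("C", "G"): -5, ("G", "C"): -5,
    ("A", "U"): -4, ("U", "A"): -4,
    ("G", "U"): -1, ("U", "G"): -1,
}


def min_energy(seq):
    """Minimum total pair energy for seq under the recursion above."""
    n = len(seq)
    if n == 0:
        return 0
    E = [[0] * n for _ in range(n)]          # E[i][i] = 0 on the diagonal
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = min(E[i + 1][j], E[i][j - 1])
            if (seq[i], seq[j]) in ENERGY:   # s_i pairs with s_j
                inner = E[i + 1][j - 1] if span > 1 else 0
                best = min(best, inner + ENERGY[seq[i], seq[j]])
            for k in range(i + 1, j):        # bifurcation
                best = min(best, E[i][k] + E[k + 1][j])
            E[i][j] = best
    return E[0][n - 1]
```

With these energies, both GGGAAAUCC and CAGGAAACUG reach -14, corresponding to two C-G pairs and one A-U pair.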

Slide 11: Motivation
Both DNA and protein sequences can be considered as sentences in a "language".
We know the "language" is not random.
What is the grammar of the language?
"The linguistics of DNA", David Searls, American Scientist, 80(

Slide 12: Chomsky hierarchy of transformational grammars (1956)
Regular grammars (no nested recursive structure, cannot handle palindromes)
– $W \to aW$ or $W \to a$
Context-free
– $W \to \beta$
Context-sensitive
– $\alpha_1 W \alpha_2 \to \alpha_1 \beta \alpha_2$
Unrestricted grammars
– $\alpha_1 W \alpha_2 \to \gamma$
Notation: $a$: any terminal; $\alpha$, $\gamma$: any string of non-terminals and/or terminals, including the null string; $\beta$: any string of non-terminals and/or terminals, not including the null string.
Note: Chomsky normal form: any CFG can be written such that all productions have one of the following forms: $W_v \to W_y W_z$ or $W_v \to a$.

Slide 13
To capture the palindromic structure, a context-free grammar can be given as follows.
$S \to a W_1 u \mid c W_1 g \mid g W_1 c \mid u W_1 a$
$W_1 \to a W_2 u \mid c W_2 g \mid g W_2 c \mid u W_2 a$
$W_2 \to a W_3 u \mid c W_3 g \mid g W_3 c \mid u W_3 a$
$W_3 \to gaaa \mid gcaa$
seq1: C A G G A A A C U G
seq2: G C U G C A A A G C
[Figure: parse tree of seq1 (c a g g a a a c u g), with $S \to c W_1 g$, $W_1 \to a W_2 u$, $W_2 \to g W_3 c$, $W_3 \to gaaa$.]
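Because every production in this grammar wraps one complementary pair around the next non-terminal, a derivation can be found by peeling pairs off both ends. The sketch below is specific to this four-rule grammar and is not a general CFG parser; the function name derive is hypothetical.

```python
PAIR_RULES = {("a", "u"), ("c", "g"), ("g", "c"), ("u", "a")}
LOOPS = {"gaaa", "gcaa"}


def derive(seq):
    """Return the list of productions deriving seq from S, or None if the
    sequence is not in the language of the grammar."""
    x = seq.lower()
    steps = []
    # S, W1 and W2 each emit one complementary pair around the next symbol.
    for lhs, rhs in (("S", "W1"), ("W1", "W2"), ("W2", "W3")):
        if len(x) < 2 or (x[0], x[-1]) not in PAIR_RULES:
            return None
        steps.append(f"{lhs} -> {x[0]} {rhs} {x[-1]}")
        x = x[1:-1]
    # W3 emits one of the two allowed loops.
    if x not in LOOPS:
        return None
    steps.append(f"W3 -> {x}")
    return steps
```

Under this sketch, seq1 and seq2 are both accepted, while GCUGCAACUG (seq3 from the earlier stem-loop slide) is rejected at the first production because its outermost bases G and G are not complementary.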

Slide 14: Stochastic grammars
Motivations:
– Irregularities (or exceptions to the grammar) occur in languages.
– Such irregularities can be taken care of by "growing" the grammar: introducing new non-terminals and new production rules.
– It is useful to differentiate productions that account for a large portion of the language from those that are rare exceptions.
– This can be achieved by assigning probabilities to the various productions. For any non-terminal, the probabilities of all its possible productions must add to 1.
Given a sentence $x$, a stochastic grammar $\theta$ parsing the sentence will assign a probability $P(x \mid \theta)$ that the sentence belongs to the language specified by the grammar, whereas a conventional grammar will just give a yes-or-no answer.
– $\sum_{x \in \text{language}} P(x \mid \theta) = 1$
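As an illustration, probabilities can be attached to a small stem-loop grammar with three pair-emitting non-terminals S, W1, W2 and a loop-emitting non-terminal W3. All numbers below are made up for the example: each pair production is given probability 0.25, and the two loop productions are given 0.7 and 0.3, so the productions of every non-terminal sum to 1. The function names are hypothetical.

```python
from itertools import product

# Hypothetical production probabilities (not taken from the lecture).
PAIR_PROBS = {("a", "u"): 0.25, ("c", "g"): 0.25, ("g", "c"): 0.25, ("u", "a"): 0.25}
LOOP_PROBS = {"gaaa": 0.7, "gcaa": 0.3}


def sequence_probability(seq):
    """P(x | grammar): product of the production probabilities in the single
    possible derivation of x, or 0.0 if x is not in the language."""
    x = seq.lower()
    p = 1.0
    for _ in range(3):                       # productions of S, W1, W2
        if len(x) < 2:
            return 0.0
        p *= PAIR_PROBS.get((x[0], x[-1]), 0.0)
        x = x[1:-1]
    return p * LOOP_PROBS.get(x, 0.0)        # production of W3


def language():
    """Enumerate every sentence the grammar can generate (128 in total)."""
    for stem in product(PAIR_PROBS, repeat=3):
        left = "".join(a for a, _ in stem)
        right = "".join(b for _, b in reversed(stem))
        for loop in LOOP_PROBS:
            yield left + loop + right


total = sum(sequence_probability(x) for x in language())
```

With these made-up parameters, CAGGAAACUG gets probability 0.25^3 x 0.7 (about 0.0109), a sequence outside the language gets 0, and total comes out as 1 up to floating-point rounding, which is the normalization condition stated above.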

Slide 15: Stochastic CFG for sequence modeling
Calculate the optimal alignment of a sequence to a parameterized SCFG. [Cocke-Younger-Kasami algorithm]
Calculate the probability that a given sequence belongs to the language specified by the SCFG. [inside algorithm, and inside-outside algorithm]
Given a set of example sequences/structures, estimate optimal probability parameters for an SCFG.
Note:
– The issues addressed above by SCFGs are quite similar to those for HMMs; in fact, it can be proved that HMMs are equivalent to stochastic regular grammars.
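For a grammar in Chomsky normal form, the inside algorithm and the CYK algorithm share one chart-filling loop: the inside algorithm sums over all ways of splitting a subsequence, while CYK keeps only the best one. The sketch below uses a made-up toy grammar for the nested language a^n u^n, not a grammar from the lecture: S -> A X (0.4) | A U (0.6), X -> S U (1.0), A -> a (1.0), U -> u (1.0). The names BINARY, UNARY, and chart_score are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy SCFG in Chomsky normal form.
BINARY = {            # (W_v, W_y, W_z): probability of W_v -> W_y W_z
    ("S", "A", "X"): 0.4,
    ("S", "A", "U"): 0.6,
    ("X", "S", "U"): 1.0,
}
UNARY = {             # (W_v, terminal): probability of W_v -> terminal
    ("A", "a"): 1.0,
    ("U", "u"): 1.0,
}


def chart_score(seq, best_parse=False):
    """Inside probability of seq (sum over parses), or the probability of
    the single best parse when best_parse=True (CYK-style maximization)."""
    n = len(seq)
    if n == 0:
        return 0.0
    table = defaultdict(float)               # (i, j, v) -> score for seq[i..j]
    for i, ch in enumerate(seq):
        for (v, a), p in UNARY.items():
            if a == ch:
                table[i, i, v] = p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for (v, y, z), p in BINARY.items():
                terms = [table[i, k, y] * table[k + 1, j, z] * p
                         for k in range(i, j)]
                if best_parse:
                    table[i, j, v] = max(table[i, j, v], max(terms))
                else:
                    table[i, j, v] += sum(terms)
    return table[0, n - 1, "S"]
```

Because this toy grammar is unambiguous, the two modes agree on every input; they differ only for grammars in which a sequence has more than one parse. Practical implementations usually work with log probabilities to avoid underflow on long sequences.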

Slide 16: Resources
Lecture notes:
RNA informatics:
Software:
– Vienna RNA fold