RNA Secondary Structure Prediction

Presentation transcript:

RNA Secondary Structure Prediction Dong Xu Computer Science Department 271C Life Sciences Center 1201 East Rollins Road University of Missouri-Columbia Columbia, MO 65211-2060 E-mail: xudong@missouri.edu 573-882-7064 (O) http://digbio.missouri.edu

Final Report: Due on Dec. 8. A numerical score (0-15) will be assigned based on: clear formulation of the project (2); method (4); significant results achieved (4); discussion (3); writing of the project (2).

Final Presentation: Preferably shown as a PowerPoint file; PDF is fine. Preferably 20 minutes (up to 25 min), plus 5 min for questions. 15 points for the presentation (introduction, methods, results, discussion). 15 points for the software demo: implementation of the software, major functionalities, documentation, and a test run.

Presentation Evaluation: A numerical score (0-15) will be assigned based on: Did the student put in enough effort? (3) Is the work interesting or novel? (3) Is the method technically sound? (3) Is the discussion insightful? (3) Is the presentation clear? (3)

Software Demo: A numerical score (0-15) will be assigned based on: whether the program can run using actual biological data (3); documentation (3); whether it is easy to use (3); performance in accuracy (3); performance in computing time and memory usage (3).

Outline: RNA Secondary Structure; Comparative Approach; Base-Pair Maximization; Free Energy Minimization; Local Structure Prediction.

RNA Types: siRNA, short interfering RNA; miRNA, microRNA; stRNA, small temporal RNA; snoRNA, small nucleolar RNA; snRNA, small nuclear RNA.

Features of RNA: RNA is a polymer composed of a combination of four nucleotides: adenine (A), cytosine (C), guanine (G), and uracil (U).

Features of RNA: G-C and A-U form complementary hydrogen-bonded base pairs (canonical Watson-Crick). G-C base pairs are more stable (3 hydrogen bonds); A-U base pairs are less stable (2 hydrogen bonds). Non-canonical pairs can occur in RNA; the most common is G-U.

RNA Pairs: A-U, G-C, G-U.
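
The three pair types listed above can be captured in a small lookup table. This is only an illustrative sketch: the function name pair_kind and the text labels are assumptions made for this example, not part of the lecture.

```python
# Pair types named in the slides; the labels are descriptive text, not energies.
PAIR_KIND = {
    frozenset("GC"): "Watson-Crick, 3 hydrogen bonds",
    frozenset("AU"): "Watson-Crick, 2 hydrogen bonds",
    frozenset("GU"): "non-canonical G-U pair",
}


def pair_kind(a, b):
    """Return a description of the pair a-b, or None if the bases do not pair."""
    return PAIR_KIND.get(frozenset((a.upper(), b.upper())))


print(pair_kind("G", "C"))
print(pair_kind("U", "G"))
print(pair_kind("A", "C"))  # not one of the three pair types, so None
```

Using a frozenset as the key makes the lookup independent of the order of the two bases.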

RNA Structure Hierarchy: Primary structure: 5’ ACCACCUGCUGA 3’. Secondary structure. Tertiary structure.

Secondary Structure Categories: hairpin loop, stem, internal loop, bulge loop, pseudoknots.

tRNA structure

Circular Representation

Assumptions in Secondary Structure Prediction: (1) The most likely structure is similar to the energetically most stable structure. (2) The energy associated with any position is influenced only by local sequence and structure. (3) The structure formed does not contain pseudoknots.

Exceptions: Pseudoknots, kissing hairpins, and hairpin-bulge contacts do not obey the “parentheses rule”, that is, their base pairs cannot be written as properly nested parentheses.
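
The “parentheses rule” can be checked mechanically: a structure obeys it when no two base pairs cross. The sketch below assumes 0-based positions and a list of (i, j) pairs; the function names is_nested and to_dot_bracket are chosen for this example only.

```python
def is_nested(pairs):
    """True if no two base pairs (i, j) and (k, l) cross, i.e. i < k < j < l."""
    norm = sorted((min(i, j), max(i, j)) for i, j in pairs)
    for idx, (i, j) in enumerate(norm):
        for k, l in norm[idx + 1:]:
            if i < k < j < l:
                return False
    return True


def to_dot_bracket(n, pairs):
    """Write a nested structure of length n as dots and parentheses."""
    if not is_nested(pairs):
        raise ValueError("crossing pairs (e.g. a pseudoknot) break the parentheses rule")
    symbols = ["."] * n
    for i, j in pairs:
        symbols[min(i, j)] = "("
        symbols[max(i, j)] = ")"
    return "".join(symbols)


print(to_dot_bracket(10, [(0, 9), (1, 8), (2, 7)]))  # a simple hairpin
print(is_nested([(0, 5), (3, 8)]))                   # crossing pairs, as in a pseudoknot
```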

Inferring Structure by Comparative Sequence Analysis: The first step is to calculate a multiple sequence alignment. This requires the sequences to be similar enough that they can be initially aligned, yet dissimilar enough for correlated mutations to be detected.

Mutual Information: fxi is the frequency of base xi in column i; fxixj is the joint (pairwise) frequency of the base pair xi, xj in columns i and j. Mutual information ranges from 0 to 2 bits. If columns i and j are uncorrelated, the mutual information is 0.
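
The formula itself is not preserved in this transcript. The sketch below assumes the standard definition, M(i,j) = sum over all base combinations (xi, xj) of fxixj * log2( fxixj / (fxi * fxj) ), which is consistent with the 0 to 2 bit range stated above. The function name and the toy alignment columns are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns.

    col_i and col_j hold the bases found in columns i and j, one per sequence.
    """
    n = len(col_i)
    f_i = Counter(col_i)             # counts behind fxi
    f_j = Counter(col_j)             # counts behind fxj
    f_ij = Counter(zip(col_i, col_j))  # counts behind fxixj
    mi = 0.0
    for (x, y), count in f_ij.items():
        p_xy = count / n
        mi += p_xy * log2(p_xy / ((f_i[x] / n) * (f_j[y] / n)))
    return mi


# Perfectly covarying columns (every base pairs with its complement).
print(mutual_information("GCAU", "CGUA"))
# Columns that vary independently of each other.
print(mutual_information("GGCC", "AUAU"))
# Fully conserved columns carry no covariation signal.
print(mutual_information("GGGG", "CCCC"))
```

The last case illustrates why the sequences must be dissimilar enough: a conserved G-C pair gives a mutual information of 0 even though the positions are paired.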

Mutual Information Plot

Dot Plot

Base-Pair Maximization: Find the structure with the most base pairs. An efficient dynamic programming approach to this problem was introduced by Nussinov (1970s). There are four ways to get the best structure between positions i and j from the best structures of the smaller subsequences.

Nussinov Algorithm
1) Add the i,j pair onto the best structure found for subsequence i+1, j-1.
2) Add unpaired position i onto the best structure for subsequence i+1, j.
3) Add unpaired position j onto the best structure for subsequence i, j-1.
4) Combine two optimal structures i,k and k+1, j.

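Dynamic Programming - 1 Notation: e(ri,rj) is the free energy of a base pair joining ri and rj; S(i,j) is the optimal free energy associated with segment ri…rj. In the recurrences that follow, both are handled as positive scores that are maximized; the final energy is reported with a negative sign.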

Dynamic Programming - 2
i is unpaired, added on to a structure for i+1…j: S(i,j) = S(i+1,j)
j is unpaired, added on to a structure for i…j-1: S(i,j) = S(i,j-1)

Dynamic Programming - 3
i and j are paired, but not to each other; the structure for i…j adds together the structures for two subregions, i…k and k+1…j: S(i,j) = max over i<k<j of {S(i,k)+S(k+1,j)}
i and j are paired to each other, added on to a structure for i+1…j-1: S(i,j) = S(i+1,j-1)+e(ri,rj)

Dynamic Programming - 4 Since there are only four cases, the optimal score S(i,j) is just the maximum of the four possibilities:
S(i,j) = max { S(i+1,j), S(i,j-1), S(i+1,j-1)+e(ri,rj), max over i<k<j of [S(i,k)+S(k+1,j)] }
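
The four cases can be sketched as a matrix fill. This is a minimal illustration, not the lecture's own code. Positions are 0-based here. The score e(C,G) = 3 comes from the worked example in these slides, while the A-U score of 2, the G-U score of 1, and the minimum hairpin loop of 3 unpaired bases are assumptions chosen to be consistent with that example.

```python
# Pair scores: e(C,G) = 3 is from the worked example; A-U = 2 and G-U = 1 are assumed.
SCORE = {frozenset("GC"): 3, frozenset("AU"): 2, frozenset("GU"): 1}
MIN_LOOP = 3  # assumed minimum number of unpaired bases enclosed by a pair


def e(a, b):
    """Score of pairing bases a and b (0 if they do not pair)."""
    return SCORE.get(frozenset((a, b)), 0)


def nussinov(seq, min_loop=MIN_LOOP):
    """Fill S so that S[i][j] is the best score for the segment seq[i..j]."""
    n = len(seq)
    S = [[0] * n for _ in range(n)]  # segments too short to pair keep score 0
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(
                S[i + 1][j],                          # i unpaired
                S[i][j - 1],                          # j unpaired
                S[i + 1][j - 1] + e(seq[i], seq[j]),  # i and j paired to each other
            )
            for k in range(i + 1, j):                 # i and j paired elsewhere
                best = max(best, S[i][k] + S[k + 1][j])
            S[i][j] = best
    return S


S = nussinov("AUACCCUGUGGUAU")
print(S[0][13])  # best score for the whole sequence
```

With these scores the whole-sequence entry S[0][13] is 12. The three nested loops over span, i, and k give the cubic running time of the method.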

Initialisation: no close base pairs (DP matrix figure, axes i and j).

Propagation, C5…U9 (DP matrix figure). C5 unpaired: S(6,9) = 0. U9 unpaired: S(5,8) = 0. C5-U9 paired: S(6,8)+e(C,U) = 0. C5 paired, U9 paired: S(5,6)+S(7,9) = 0; S(5,7)+S(8,9) = 0.

Propagation, C5…G11 (DP matrix figure). C5 unpaired: S(6,11) = 3. G11 unpaired: S(5,10) = 3. C5-G11 paired: S(6,10)+e(C,G) = 6. C5 paired, G11 paired: S(5,6)+S(7,11) = 1; S(5,7)+S(8,11) = 0; S(5,8)+S(9,11) = 0; S(5,9)+S(10,11) = 0.

Propagation: completed DP matrix (figure, axes i and j).

Traceback: path through the completed DP matrix (figure, axes i and j).
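
The traceback step can be sketched as follows. To keep the example self-contained it repeats a compact matrix fill. As before, positions are 0-based, and the A-U and G-U scores and the minimum loop size of 3 are assumptions rather than values stated in the slides. When several structures tie, this sketch returns only one of them.

```python
SCORE = {frozenset("GC"): 3, frozenset("AU"): 2, frozenset("GU"): 1}  # partly assumed


def fill(seq, min_loop=3):
    """Nussinov-style fill: S[i][j] is the best score for seq[i..j]."""
    n = len(seq)
    S = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            pair = SCORE.get(frozenset((seq[i], seq[j])), 0)
            S[i][j] = max(
                [S[i + 1][j], S[i][j - 1], S[i + 1][j - 1] + pair]
                + [S[i][k] + S[k + 1][j] for k in range(i + 1, j)]
            )
    return S


def traceback(seq, S, min_loop=3):
    """Recover one optimal set of base pairs from the filled matrix."""
    pairs, stack = [], [(0, len(seq) - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:  # segment too short to hold a pair
            continue
        pair = SCORE.get(frozenset((seq[i], seq[j])), 0)
        if pair and S[i][j] == S[i + 1][j - 1] + pair:
            pairs.append((i, j))            # i and j paired to each other
            stack.append((i + 1, j - 1))
        elif S[i][j] == S[i + 1][j]:
            stack.append((i + 1, j))        # i unpaired
        elif S[i][j] == S[i][j - 1]:
            stack.append((i, j - 1))        # j unpaired
        else:
            for k in range(i + 1, j):       # split into two sub-structures
                if S[i][j] == S[i][k] + S[k + 1][j]:
                    stack.append((i, k))
                    stack.append((k + 1, j))
                    break
    return sorted(pairs)


seq = "AUACCCUGUGGUAU"
pairs = traceback(seq, fill(seq))
print(pairs)
```

For this sequence the sketch returns five nested pairs, (0,13), (1,12), (2,11), (3,10), and (4,9), whose scores add up to 12.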

Final Prediction: AUACCCUGUGGUAU (folded structure shown as a figure). Total free energy: -12 kcal/mol.

Some Notes: Computational complexity is O(N^3). The algorithm does not work with pseudoknots (they would invalidate the DP algorithm). Methods that include pseudoknots: Rivas and Eddy, JMB 285, 2053 (1999). These methods are at least O(N^6).

Energy Minimization Methods: RNA folding is determined by biophysical properties. The energy minimization algorithm predicts the secondary structure by minimizing the free energy (ΔG). ΔG is calculated as the sum of the individual contributions of loops, base pairs, and other secondary structure elements. Energies of stems are calculated as stacking contributions between neighboring base pairs.

Example for Thermodynamic Parameters

Calculating Best Structure: The sequence is compared against itself using a dynamic programming approach similar to the one used for the maximum base-paired structure. Instead of a simple pair-scoring scheme, the score is based on free energy values. Gaps represent some form of a loop. The most widely used software that incorporates this minimum free energy algorithm is MFOLD.

How well do they perform? Current RNA folding programs get about 60-70% of base pairs correct, on average: useful, but not yet good. The problem is the scoring system: the thermodynamic model is accurate within 5-10%, and many alternative structures are within 10%. Possible solution: combine the thermodynamic score with comparative sequence information.
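
The slide does not say how “base pairs correct” is measured. One common choice, assumed here, is the fraction of base pairs in a trusted reference structure that the prediction reproduces. The function name and the example pair lists below are hypothetical.

```python
def fraction_correct(predicted, reference):
    """Fraction of reference base pairs that also appear in the prediction."""
    ref = {tuple(sorted(p)) for p in reference}
    pred = {tuple(sorted(p)) for p in predicted}
    if not ref:
        raise ValueError("reference structure has no base pairs")
    return len(ref & pred) / len(ref)


# Hypothetical reference and prediction that share three of five pairs.
reference = [(0, 13), (1, 12), (2, 11), (3, 10), (4, 9)]
predicted = [(0, 13), (1, 12), (2, 11), (4, 10), (5, 9)]
print(fraction_correct(predicted, reference))
```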

RNA Motif in HIV: the TAR motif (Transactivating Response Element).

RNA Motifs Associated with Transcription Termination: A Rho-independent terminator stops the transcription process via its hairpin structure.

Algorithm in Rnall: Definition 1: a “match” is a canonical base pair. Definition 2: a “mismatch” is a non-canonical base pair. Definition 3: an “insertion”/“deletion” is an unpaired nucleotide.

RNA LSS in HIV: TAR (30), DIS (260), PolyA (82), SD (292), PSI (319).

Some RNA Resources
Comparative RNA web site: http://www.rna.icmb.utexas.edu/
RNA world: http://www.imb-jena.de/RNA.html
RNA page by Michael Zuker: http://www.bioinfo.rpi.edu/~zukerm/rna/
RNA structure databases: http://www.rnabase.org/ ; http://ndbserver.rutgers.edu/ (nucleic acid database); http://prion.bchs.uh.edu/bp_type/ (non-canonical bases)
RNA structure classification: http://scor.berkeley.edu/
RNA visualisation: http://ndbserver.rutgers.edu/services/download/index.html#rnaview ; http://rutchem.rutgers.edu/~xiangjun/3DNA/

Reading Assignments
Suggested reading: Chapter 14 in “Current Topics in Computational Molecular Biology”, edited by Tao Jiang, Ying Xu, and Michael Zhang. MIT Press, 2002.
Optional reading: http://www.bioinfo.rpi.edu/~zukerm/seqanal/mfold-3.0-manual.pdf