RNA 2D and 3D Structure
Craig L. Zirbel
October 7, 2010

RNA primary sequences
Laboratory techniques make it possible to extract specific RNA molecules and determine their nucleotide sequences. Here are the sequences of the 5S ribosomal RNA molecule from different organisms (A = Archaea, B = Bacteria):

Domain | Source | Sequence
A | H.m. (structure) | UUAGGCGGCCACAGCGGUGGGGUUGCCUCCCGUACCCAUCCCGAACACGGAAGAUAAGCCCACCAGCGUUCCGGGGAGUACUGGAGUGCGCGAGCCUCUGGGAAACCCGGUUCGCCGCCACC
B | E.coli (structure) | GCCUGGCGGCCGUAGCGCGGUGGUCCCACCUGACCCCAUGCCGAACUCAGAAGUGAAACGCCGUAGCGCCGAUGGUAGUGUGGGGUCUCCCCAUGCGAGAGUAGGGAACUGCCAGGC
B | T.th. (structure) | UCCCCCGUGCCCAUAGCGGCGUGGAACCACCCGUUCCCAUUCCGAACACGGAAGUGAAACGCGCCAGCGCCGAUGGUACUGGGCGGGCGACCGCCUGGGAGAGUAGGUCGGUGCGGGG
A | L27170.1/1-120 | AGUGGUGGCCAUAUCGGCGGGGUUCCUCCCCGUACCCAUCCUGAACACGGAAGAUAAGCCCGCCAGCGUCCGGCAAGUACUGGAGUGCGCGAGCCUCUGGGAAAUCCGGUUCGCCGCCAC
A | L27163.1/1-119 | GUAGCGGCCACAGCGGUGGGGUUCCUCCCGUACCCAUCCCGAACACGGAAGAUAAGCCCACCAGCGUUCCGGGGAGUACUGGAGUGCGCGACCCUCUGGGAAACCGGGUUCGCCGCUAC
A | L27343.1/3-116 | GCGGCCAGGGCGGAGGGGAAACACCCGUACCCAUUCCGAACACGGAAGUGAAGCCCUCCAGCGAACCAGCUAGUACUAGAGUGGGAGACCCUCUGGGAGCGCUGGUUCGCCGCC
A | M36187.1/5-126 | UUUGGCGGUCAUGGCGUGGGGGUUUAUACCUGAUCUCGUUUCGAUCUCAGUAGUUAAGUCCUGCUGCGUUGUGGGUGUGUACUGCGGUUUUUUGCUGUGGGAAGCCCACUUCACUGCCAGAC
A | X62857.1/1-121 | GUUGGCGGUCAUGGCGUGGGGUUUAUACCUGAUCUCGUUUCGAUCUCAGUAGUUAAGUCCUGCUGCGUUGUGGGUGUGUACUGCGGUUUUUUGCUGUGGGAAGCCCACUUCACUGCCAGAC
A | X15364.1/6601-6721 | UUUGGCGGUCAUGGCGUGGGGGUUAUACCUGAUCUCGUUUCGAUCUCAGUAGUUAAGUCCUGCUGCGUUGUGGGUGUGUACUGCGGUGUUUUGCUGUGGGAAGCCCAUUUCACUGCCAGCC
B | M16176.1/4-120 | GUCGGUGGUGUUAGCGGUGGGGUCACGCCCGGUCCCUUUCCGAACCCGGAAGCUAAGCCUGCCUGCGCCGAUGGUACUGCACCUGGGAGGGUGUGGGAGAGUAGGACCCCGCCGGCA
B | M16177.1/4-120 | GUCGGUGGUUAUAGCGGUGGGGUCACGCCCGGUCCCAUUCCGAACCCGGAAGCUAAGCCCACCUGCGCCGAUGGUACUGCACCUGGGAGGGUGUGGGAGAGUAGGUCACCGCCGGCC
B | X55255.1/4-120 | GUUGGUGGUUAUUGUGUCGGGGGUACGCCCGGUCCCUUUCCGAACCCGGAAGCUAAGCCCGAUUGCGCUGAUGGUACUGCACCUGGGAGGGUGUGGGAGAGUAGGUCGCUGCCAACC
B | X55259.1/3-117 | UACGGCGGUCAAUAGCGGCAGGGAAACGCCCGGUCCCAUCCCGAACCCGGAAGCUAAGCCUGCCAGCGCCAAUGAUACUGCCCUCACCGGGUGGAAAAGUAGGACACCGCCGAAC
B | X55251.1/3-116 | UACGGCGGUCCAUAGCGGCAGGGAAACGCCCGGUCCCAUCCCGAACCCGGAAGCUAAGCCUGCCAGCGCCGAUGAUACUACCCAUCCGGGUGGAAAAGUAGGACACCGCCGAAC
B | X75601.1/91-203 | UACGGCGGCCACAGCGGCAGGGAAACGCCCGGUCCCAUUCCGAACCCGGAAGCUAAGCCUGCCAGCGCCGAUGAUACUGCCCCUCCGGGUGGAAAAGUAGGACACCGCCGAAC
A | X03407.1/5927-6048 | UAAGGCGGCCAUAGCGGUGGGGUUACUCCCGUACCCAUCCCGAACACGGAAGAUAAGCCCGCCUGCGUUCCGGUCAGUACUGGAGUGCGCGAGCCUCUGGGAAAUCCGGUUCGCCGCCUACU
A | L27168.1/1-120 | UUGGCGACCAUAGCGGCGAGUGACCUCCCGUACCCAUCCCGAACACGGAAGAUAAGCUCGCCUGCGUUUCGGUCAGUACUGGAUUGGGCGACCCUCUGGGAAAUCUGAUUCGCCGCCACC
A | X02128.1/24-139 | GGCGGCCAGAGCGGUGAGGUUCCACCCGUACCCAUCCCGAACACGGAAGUUAAGCUCACCUGCGUUCUGGUCAGUACUGGAGUGAGCGAUCCUCUGGGAAAUCCAGUUCGCCGCCC
A | X14441.1/5-123 | GGGCGGCCAGAGCGGUGAGGUUCCACCCGUACCCAUCCCGAACACGGAAGUUAAGCUCGCCUGCGUUCUGGUCAGUACUGGAGUGAGCGAUCCUCUGGGAAAUCCAGUUCGCCGCCCCU

RNA can make double helices
RNA chains are flexible enough to fold back on themselves and make the same types of basepairs as are found in DNA. These are called “Watson-Crick” basepairs.

Watson-Crick basepairs
The main Watson-Crick basepairs are AU and GC. (The GU wobble pair also occurs sometimes.) They can substitute for one another freely without changing the structure of the RNA molecule: they are said to be isosteric, and changes between these basepairs are an example of neutral variability. The bases in each pair are held together by hydrogen bonds (dotted lines in the figure).
[Figure: superposition of the basepairs]

Comparative sequence analysis
By manually aligning similar RNA sequences and noting the pairs of columns where AU, CG, GC, and UA pairs replace one another, one can infer the locations of the Watson-Crick basepairs (called the secondary structure) of an RNA molecule. This is the inferred secondary structure of the 5S RNA, with bases labeled as found in E. coli. There are five helical regions, with three “internal loops” and two “hairpin loops” separating them (Fox & Woese 1975; Peattie et al. 1981; Noller 1984; Cannone et al. 2002; http://www.rna.ccbb.utexas.edu).
UGCCUGGCGGCCGUAGCGCGGUGGUCCCACCUGACCCCAUGCCGAACUCAGAAGUGAAACGCCGUAGCGCCGAUGGUAGUGUGGGGUCUCCCCAUGCGAGAGUAGGGAACUGCCAGGCAU
((((((((((-----((((((((----(((((((-------------)))))))))---(((((((((-(((((((--((((((((---))))))))--)))))))---))))))))))-
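
The column-pair idea above can be sketched in a few lines of Python. This is a minimal illustration, assuming a gap-free alignment given as a list of equal-length strings; the function name `supports_pair` and the toy sequences are hypothetical, and the criterion used (every sequence canonically paired at the two columns, with at least two different pair types) is a simplification chosen for illustration, not the exact procedure of the cited studies.

```python
# Canonical Watson-Crick pairs counted as evidence (GU wobble deliberately excluded).
WC_PAIRS = {"AU", "UA", "GC", "CG"}

def supports_pair(alignment: list[str], i: int, j: int) -> bool:
    """Return True if columns i and j (0-based) look like a covarying Watson-Crick pair.

    Simplified criterion (an assumption for illustration): every sequence has a
    canonical pair at (i, j), and at least two different pair types occur,
    i.e. the pair changed between sequences while staying paired.
    """
    pairs = [seq[i] + seq[j] for seq in alignment]
    all_canonical = all(p in WC_PAIRS for p in pairs)
    return all_canonical and len(set(pairs)) >= 2

# Toy alignment (hypothetical sequences, not real 5S data).
toy = ["GAAAC", "CAAAG", "AUAAU"]
print(supports_pair(toy, 0, 4))  # G-C, C-G, A-U: covarying canonical pairs
print(supports_pair(toy, 1, 3))  # A-A, A-A, U-A: not all canonical
```

Requiring at least two pair types is what distinguishes covariation from mere conservation: a column pair that is GC in every sequence is consistent with a basepair but gives no comparative evidence for it. Real analyses also have to tolerate gaps and occasional non-canonical exceptions, which this sketch does not.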

RNA 3D structure
Starting in late 2000, high-resolution atomic structures of entire ribosomes have been published. These show the bases, the backbone, the Watson-Crick basepairs, and several non-Watson-Crick types of basepairs. The 3D structures confirm the predicted secondary structure and show the importance of Watson-Crick basepairs.
[Figure: 3D structure of the E. coli 5S RNA]

RNA secondary structure prediction
Now that we understand the basics of RNA 3D structure and Watson-Crick basepairs, we can pose the problem of predicting the secondary and 3D structure from an RNA sequence. Comparative sequence analysis requires multiple RNA sequences; for now, we will talk about predicting RNA secondary structure from a single sequence.

Three methods to predict secondary structure
• Dot plots: a way to visualize the possible helices in a sequence. Somewhat primitive, but a good technique to know.
• Nussinov algorithm: a technique to find the set of basepairs that maximizes the number of nested basepairs in a sequence.
• Energy methods: find the set of basepairs that results in the lowest-energy structure, the one likely to be preferred in nature (e.g. mfold).

Dot plot
Make a grid with the RNA sequence down the rows and across the columns. Put a dot at the location of each CG or AU pair (in either order: CG, GC, AU, UA).
[Figure: dot plot grid with c g u a across the columns and C G U A down the rows]
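
The dot plot rule can be sketched in Python. This is an illustrative helper (the names `dot_plot` and `render` are hypothetical); like the slide, it marks only CG, GC, AU, and UA, and it prints a `+` where the slide puts a dot.

```python
# Pairs that get a dot, as on the slide (GU wobble pairs are not marked).
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}

def dot_plot(seq: str) -> list[list[bool]]:
    """grid[i][j] is True when base i and base j can form a CG/GC/AU/UA pair."""
    return [[(a, b) in PAIRS for b in seq] for a in seq]

def render(seq: str) -> str:
    """Text rendering: sequence across the top and down the left side."""
    grid = dot_plot(seq)
    lines = ["  " + " ".join(seq)]
    for base, row in zip(seq, grid):
        lines.append(base + " " + " ".join("+" if hit else "." for hit in row))
    return "\n".join(lines)

print(render("CGUA"))
```

A helix shows up as a run of `+` marks along an anti-diagonal, since consecutive pairs (i, j), (i+1, j-1), (i+2, j-2), ... step one row down and one column left each time.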

Dot plots
CGUUUGGGUUCACAAACG
((((((------))))))   (“dot-bracket notation”: matching parentheses mark paired bases, dashes mark unpaired bases)
[Figure: dot plot of CGUUUGGGUUCACAAACG against itself, with a + at each possible CG, GC, AU, or UA pair]
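
Dot-bracket strings can be decoded with a stack: each `)` closes the most recent unmatched `(`. A minimal sketch (the function name `parse_dot_bracket` is hypothetical) that accepts both `.` and the `-` used on these slides for unpaired bases, checked against the hairpin above:

```python
def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return the basepairs (i, j), 0-based with i < j, encoded by a dot-bracket string.

    '(' opens a pair, ')' closes the most recently opened one;
    '.' and '-' mark unpaired bases.
    """
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch not in ".-":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)

seq = "CGUUUGGGUUCACAAACG"
pairs = parse_dot_bracket("((((((------))))))")
print([seq[i] + seq[j] for i, j in pairs])  # ['CG', 'GC', 'UA', 'UA', 'UA', 'GC']
```

Raising an error on unbalanced input is a design choice: it catches strings whose parentheses do not match up, which would otherwise silently produce a wrong set of pairs.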

Nussinov algorithm
Finds the largest number of nested (non-crossing) Watson-Crick pairs in an RNA sequence. Similar to a dot plot, but we keep track of the cumulative number of nested Watson-Crick basepairs in each subsequence as we go.

Put zeros down the diagonal.

In the cells just above the diagonal (subsequences of two adjacent bases), put a 1 where there is a CG, GC, AU, or UA pair, and a 0 otherwise.

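Continue checking for Watson-Crick pairs, but now each cell gets the maximum of the cell to the left, the cell below, and the cell diagonally left and below (plus 1 if the two end bases of the subsequence form a Watson-Crick pair).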

Each cell we fill in gives the maximum number of nested Watson-Crick pairs in the subsequence down and to the left of the cell.

Fill in more “diagonals” in the same way.

This subsequence only has one nested Watson-Crick basepair, either a GC or a UA, but not both, since they would cross each other.

Finally we come to a subsequence that has two nested Watson-Crick pairs. The cell with the 2 is 1 for the GC pair plus 1 from the cell down and to the left of it, which counts a UA pair.
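
The table-filling steps above can be sketched in Python as follows. Two points go beyond the slides and should be read as additions: the sketch includes the bifurcation term of the standard Nussinov recurrence (splitting a subsequence into two independent parts), which the "left, below, left-and-below" rule alone omits, and it adds a traceback that recovers one optimal structure in dot-bracket notation. Like the slides, it enforces no minimum hairpin-loop length. The function name `nussinov` is hypothetical.

```python
WC_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

def nussinov(seq: str) -> tuple[int, str]:
    """Maximum number of nested Watson-Crick pairs and one optimal dot-bracket structure.

    No minimum hairpin-loop length is enforced, so adjacent bases may pair.
    """
    n = len(seq)
    if n == 0:
        return 0, ""
    # N[i][j] = max nested pairs in seq[i..j]; cells on and below the diagonal stay 0.
    N = [[0] * n for _ in range(n)]

    def pairs(i: int, j: int) -> int:
        return 1 if (seq[i], seq[j]) in WC_PAIRS else 0

    for span in range(1, n):                      # fill one diagonal at a time
        for i in range(n - span):
            j = i + span
            best = max(N[i][j - 1],               # cell to the left: base j unpaired
                       N[i + 1][j],               # cell below: base i unpaired
                       N[i + 1][j - 1] + pairs(i, j))  # down-left cell, +1 if i pairs with j
            for k in range(i + 1, j - 1):         # bifurcation: two side-by-side parts
                best = max(best, N[i][k] + N[k + 1][j])
            N[i][j] = best

    # Traceback: retrace which option produced each cell's value.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if N[i][j] == N[i + 1][j]:
            stack.append((i + 1, j))
        elif N[i][j] == N[i][j - 1]:
            stack.append((i, j - 1))
        elif pairs(i, j) and N[i][j] == N[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j - 1):
                if N[i][j] == N[i][k] + N[k + 1][j]:
                    stack += [(i, k), (k + 1, j)]
                    break
    return N[0][n - 1], "".join(structure)

print(nussinov("GCAU"))  # (2, '()()')
```

The bifurcation loop matters for sequences whose best structure has two helices side by side. For GCAU, the left, below, and down-left cells each hold at most 1 (and G cannot pair with U), so without the loop the answer would be 1, even though GC and AU can both form without crossing.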

Thermodynamic methods
Idea: find the secondary structure with the most favorable (lowest) free energy.
• Zuker method (mfold): uses dynamic programming to calculate the structure with the lowest free energy.
• McCaskill method (sfold): uses dynamic programming to calculate the partition function, from which the probability of each basepair over all possible structures follows (more theoretically rigorous).
Used by programs such as Mfold and Sfold. Pfold, by contrast, predicts structure from aligned sequences with a stochastic context-free grammar rather than from free energies.
Assumptions:
• Only nearest neighbor interactions need to be considered.
• Nearest neighbor interactions can be summed to give the total free energy.
• Pseudoknots and tertiary interactions can be ignored.
• The most stable structure is also the kinetically favored structure.
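
To make the probabilistic view concrete, here is a toy sketch that computes Boltzmann probabilities $P(s) = e^{-\Delta G(s)/RT} / Z$ over a short, hand-picked list of structures. It is only an illustration under stated assumptions: the free energies are hypothetical, the temperature defaults to 37 °C, and McCaskill's method computes the partition function $Z$ over all secondary structures by dynamic programming rather than by enumerating a few of them.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal/(mol*K)

def boltzmann_probabilities(free_energies: dict[str, float],
                            temp_k: float = 310.15) -> dict[str, float]:
    """P(s) = exp(-dG(s)/RT) / Z, with Z summed over the structures given."""
    rt = R_KCAL * temp_k
    weights = {s: math.exp(-dg / rt) for s, dg in free_energies.items()}
    z = sum(weights.values())  # partition function over this toy ensemble
    return {s: w / z for s, w in weights.items()}

# Hypothetical free energies (kcal/mol) for three candidate structures of one sequence.
toy = {"((((....))))": -4.0, "((......))..": -2.5, "............": 0.0}
for s, p in boltzmann_probabilities(toy).items():
    print(s, round(p, 3))
```

The lowest-energy structure is always the most probable one, but when several structures have similar energies the probability is spread among them, which is one reason to look beyond a single predicted structure.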

Nearest neighbor parameters
This is more sophisticated than simply counting the number of AU, UA, CG, and GC basepairs in each subsequence: you also tally up the strength of each pair and the energy of one pair stacking on another pair. (Source: David W. Mount, Bioinformatics: Sequence and Genome Analysis.)

Determining parameters
Heat a short RNA duplex in a cuvette and measure its UV absorbance at 260 nm:
5’-GCCAUCCG-3’
3’-CGGUAGGC-5’

Determining parameters
Reaction: Strand1 + Strand2 ⇌ Duplex
Equilibrium constant for each temperature T: $K_{eq}(T) = \frac{[\mathrm{Duplex}(T)]}{[S_1(T)]\,[S_2(T)]}$
Free energy change: $\Delta G(T) = -RT \ln K_{eq}(T)$
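
The equilibrium constant (duplex concentration divided by the product of the two single-strand concentrations, for Strand1 + Strand2 = Duplex) and the free energy relation translate directly into code. A minimal sketch, assuming concentrations in mol/L and R = 1.987 × 10⁻³ kcal/(mol·K); the function names and concentration values are hypothetical, and in practice parameters are typically obtained by fitting whole melting curves rather than from a single equilibrium point.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal/(mol*K)

def keq(duplex: float, strand1: float, strand2: float) -> float:
    """Equilibrium constant for Strand1 + Strand2 = Duplex (concentrations in mol/L)."""
    return duplex / (strand1 * strand2)

def delta_g(k: float, temp_k: float) -> float:
    """Free energy change in kcal/mol: dG(T) = -RT ln Keq(T)."""
    return -R_KCAL * temp_k * math.log(k)

# Hypothetical equilibrium concentrations at 37 C (310.15 K).
k = keq(duplex=9e-6, strand1=1e-6, strand2=1e-6)
print(f"Keq = {k:.3g} per M, dG = {delta_g(k, 310.15):.2f} kcal/mol")
```

A large equilibrium constant (mostly duplex) gives a negative free energy change, which is why stable duplexes and helices have negative ∆G values.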

Determining parameters
Repeat this for many related sequences and do statistical analysis to get pairwise parameters. For example, compare
5’-GCCAUCCG-3’
3’-CGGUAGGC-5’
with
5’-GCCAACCG-3’
3’-CGGUUGGC-5’
and so on. ∆∆G is the energy change due to substituting one basepair for another.
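
Why a single substitution informs the nearest neighbor parameters can be seen by listing which stacks the two duplexes differ in. A minimal sketch, assuming perfect Watson-Crick duplexes given by their 5'→3' top strand; the helper names and the convention of labeling each stack by the alphabetically smaller of its two strand readings are choices made here for illustration.

```python
from collections import Counter

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(s: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(s))

def nn_stacks(top: str) -> Counter:
    """Count nearest-neighbor stacks in a perfect Watson-Crick duplex.

    `top` is the 5'->3' strand; its partner is assumed to be the exact reverse
    complement. A stack read from either strand is the same stack, so each
    doublet is stored under the smaller of XY and its reverse complement.
    """
    stacks = Counter()
    for i in range(len(top) - 1):
        doublet = top[i:i + 2]
        stacks[min(doublet, reverse_complement(doublet))] += 1
    return stacks

first, second = "GCCAUCCG", "GCCAACCG"  # top strands of the two duplexes above
print("only in first: ", dict(nn_stacks(first) - nn_stacks(second)))
print("only in second:", dict(nn_stacks(second) - nn_stacks(first)))
```

Each duplex has seven stacks, and the two share five of them, so ∆∆G between these duplexes depends only on the two stacks that differ. Measuring many such duplexes gives enough equations to solve for the individual stack parameters statistically.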

Nearest neighbor parameters
Parameters for most “loop” regions are unknown: there are too many possible loops to do experiments on all of them. Usually unpaired regions are penalized, but certain loops are known to be very thermodynamically stable, and they are scored with low free energies (e.g. the UNCG hairpin). Loop energies are hard to extrapolate: a small change in sequence can cause a large change in free energy.

Using thermodynamic parameters
$\Delta G^\circ = -RT \ln K_{eq} = -2.4 - 2.2 - 0.9 - 0.9 - 2.1 - 2.4 + 5.4 = -5.5$ kcal/mol
Dynamic programming is used to find the structure with the lowest such total.
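
The arithmetic of this slide can be checked directly, and the free energy converted back into an equilibrium constant with $K_{eq} = e^{-\Delta G^\circ / RT}$. The slide does not label the individual terms; reading the negative ones as stacking energies and +5.4 as a loop penalty is an assumption, and the 37 °C temperature is assumed as well. Summing exactly the seven listed contributions gives −5.5 kcal/mol.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal/(mol*K)
T = 310.15         # 37 C in kelvin (assumed temperature)

# Contributions in kcal/mol, in the order listed on the slide.
terms = [-2.4, -2.2, -0.9, -0.9, -2.1, -2.4, +5.4]
dg = round(sum(terms), 1)              # rounding removes floating-point noise
k_fold = math.exp(-dg / (R_KCAL * T))  # invert dG = -RT ln K
print(dg)  # -5.5
print(f"{k_fold:.3g}")
```

Because the relationship is exponential, a difference of even 1 kcal/mol between two candidate structures changes their relative equilibrium populations several-fold at this temperature.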

Things to keep in mind
• Calculated free energies are always approximate.
• The most stable calculated structure is not necessarily the most stable real structure.
• Consider “sub-optimal” calculated structures as well.
• Use additional information, if available, to pick the correct structure.

MFOLD
One of the best thermodynamic methods, developed by Michael Zuker.
Web server: http://mfold.bioinfo.rpi.edu/cgi-bin/rna-form1.cgi
Submit a sequence, leave the various settable parameters at their defaults, and look through the output. Look for dot plots, multiple possible structures, and the minimum free energies of the structures. View the output in PNG format unless you prefer another image format.
M. Zuker. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31(13):3406-3415 (2003).