RNA 2D and 3D Structure Craig L. Zirbel October 7, 2010.

1 RNA 2D and 3D Structure Craig L. Zirbel October 7, 2010

2 RNA primary sequences Laboratory techniques make it possible to extract specific RNA molecules and determine the sequence of nucleotides. Here are the sequences of the 5S ribosomal RNA molecule from different organisms: UUAGGCGGCCACAGCGGUGGGGUUGCCUCCCGUACCCAUCCCGAACACGGAAGAUAAGCCCACCAGCGUUCCGGGGAGUACUGGAGUGCGCGAGCCUCUGGGAAACCCGGUUCGCCGCCACC A H.m. (structure) GCCUGGCGGCCGUAGCGCGGUGGUCCCACCUGACCCCAUGCCGAACUCAGAAGUGAAACGCCGUAGCGCCGAUGGUAGUGUGGGGUCUCCCCAUGCGAGAGUAGGGAACUGCCAGGC B E.coli (structure) UCCCCCGUGCCCAUAGCGGCGUGGAACCACCCGUUCCCAUUCCGAACACGGAAGUGAAACGCGCCAGCGCCGAUGGUACUGGGCGGGCGACCGCCUGGGAGAGUAGGUCGGUGCGGGG B T.th. (structure) AGUGGUGGCCAUAUCGGCGGGGUUCCUCCCCGUACCCAUCCUGAACACGGAAGAUAAGCCCGCCAGCGUCCGGCAAGUACUGGAGUGCGCGAGCCUCUGGGAAAUCCGGUUCGCCGCCAC A L /1-120 GUAGCGGCCACAGCGGUGGGGUUCCUCCCGUACCCAUCCCGAACACGGAAGAUAAGCCCACCAGCGUUCCGGGGAGUACUGGAGUGCGCGACCCUCUGGGAAACCGGGUUCGCCGCUAC A L /1-119 GCGGCCAGGGCGGAGGGGAAACACCCGUACCCAUUCCGAACACGGAAGUGAAGCCCUCCAGCGAACCAGCUAGUACUAGAGUGGGAGACCCUCUGGGAGCGCUGGUUCGCCGCC A L /3-116 UUUGGCGGUCAUGGCGUGGGGGUUUAUACCUGAUCUCGUUUCGAUCUCAGUAGUUAAGUCCUGCUGCGUUGUGGGUGUGUACUGCGGUUUUUUGCUGUGGGAAGCCCACUUCACUGCCAGAC A M /5-126 GUUGGCGGUCAUGGCGUGGGGUUUAUACCUGAUCUCGUUUCGAUCUCAGUAGUUAAGUCCUGCUGCGUUGUGGGUGUGUACUGCGGUUUUUUGCUGUGGGAAGCCCACUUCACUGCCAGAC A X /1-121 UUUGGCGGUCAUGGCGUGGGGGUUAUACCUGAUCUCGUUUCGAUCUCAGUAGUUAAGUCCUGCUGCGUUGUGGGUGUGUACUGCGGUGUUUUGCUGUGGGAAGCCCAUUUCACUGCCAGCC A X / GUCGGUGGUGUUAGCGGUGGGGUCACGCCCGGUCCCUUUCCGAACCCGGAAGCUAAGCCUGCCUGCGCCGAUGGUACUGCACCUGGGAGGGUGUGGGAGAGUAGGACCCCGCCGGCA B M /4-120 GUCGGUGGUUAUAGCGGUGGGGUCACGCCCGGUCCCAUUCCGAACCCGGAAGCUAAGCCCACCUGCGCCGAUGGUACUGCACCUGGGAGGGUGUGGGAGAGUAGGUCACCGCCGGCC B M /4-120 GUUGGUGGUUAUUGUGUCGGGGGUACGCCCGGUCCCUUUCCGAACCCGGAAGCUAAGCCCGAUUGCGCUGAUGGUACUGCACCUGGGAGGGUGUGGGAGAGUAGGUCGCUGCCAACC B X /4-120 UACGGCGGUCAAUAGCGGCAGGGAAACGCCCGGUCCCAUCCCGAACCCGGAAGCUAAGCCUGCCAGCGCCAAUGAUACUGCCCUCACCGGGUGGAAAAGUAGGACACCGCCGAAC B X /3-117 
UACGGCGGUCCAUAGCGGCAGGGAAACGCCCGGUCCCAUCCCGAACCCGGAAGCUAAGCCUGCCAGCGCCGAUGAUACUACCCAUCCGGGUGGAAAAGUAGGACACCGCCGAAC B X /3-116 UACGGCGGCCACAGCGGCAGGGAAACGCCCGGUCCCAUUCCGAACCCGGAAGCUAAGCCUGCCAGCGCCGAUGAUACUGCCCCUCCGGGUGGAAAAGUAGGACACCGCCGAAC B X /91-203 UAAGGCGGCCAUAGCGGUGGGGUUACUCCCGUACCCAUCCCGAACACGGAAGAUAAGCCCGCCUGCGUUCCGGUCAGUACUGGAGUGCGCGAGCCUCUGGGAAAUCCGGUUCGCCGCCUACU A X / UUGGCGACCAUAGCGGCGAGUGACCUCCCGUACCCAUCCCGAACACGGAAGAUAAGCUCGCCUGCGUUUCGGUCAGUACUGGAUUGGGCGACCCUCUGGGAAAUCUGAUUCGCCGCCACC A L /1-120 GGCGGCCAGAGCGGUGAGGUUCCACCCGUACCCAUCCCGAACACGGAAGUUAAGCUCACCUGCGUUCUGGUCAGUACUGGAGUGAGCGAUCCUCUGGGAAAUCCAGUUCGCCGCCC A X /24-139 GGGCGGCCAGAGCGGUGAGGUUCCACCCGUACCCAUCCCGAACACGGAAGUUAAGCUCGCCUGCGUUCUGGUCAGUACUGGAGUGAGCGAUCCUCUGGGAAAUCCAGUUCGCCGCCCCU A X /5-123

3 RNA can make double helices
RNA chains are flexible enough to fold back on themselves and make the same types of basepairs as are found in DNA. These are called “Watson-Crick” basepairs.

4 Watson-Crick basepairs
The main Watson-Crick basepairs are AU and GC. (GU also occurs sometimes.) They can substitute for one another freely without changing the structure of the RNA molecule. They are said to be isosteric, and changes between these basepairs are an example of neutral variability. They are held together by hydrogen bonds (dotted lines). Figure: superposition.

6 Comparative sequence analysis
By manually aligning similar RNA sequences and noting the pairs of columns where AU, CG, GC, and UA pairs replace one another, one can infer the locations of Watson-Crick basepairs (called the secondary structure) of an RNA molecule. This is the inferred secondary structure of the 5S RNA, with bases labeled as found in E. coli. There are five helical regions, with three “internal loops” and two “hairpin loops” separating them.
Fox & Woese 1975; Peattie et al. 1981; Noller 1984; Cannone et al. 2002
UGCCUGGCGGCCGUAGCGCGGUGGUCCCACCUGACCCCAUGCCGAACUCAGAAGUGAAACGCCGUAGCGCCGAUGGUAGUGUGGGGUCUCCCCAUGCGAGAGUAGGGAACUGCCAGGCAU
((((((((((-----((((((((----((((((( )))))))))---(((((((((-(((((((--((((((((---))))))))--)))))))---))))))))))-

7 RNA 3D structure Starting late in the year 2000, high-resolution atomic structures of entire ribosomes have been published. These show the bases, the backbone, the Watson-Crick basepairs, and several new types of basepairs. The 3D structures confirm the predicted secondary structure and show the importance of Watson-Crick basepairs. E. coli 5S RNA

8 RNA secondary structure prediction
Now that we understand the basics of RNA 3D structure and Watson-Crick basepairs, we can pose the problem of predicting the secondary and 3D structure from an RNA sequence. Comparative sequence analysis requires multiple RNA sequences. For now, we will talk about predicting RNA secondary structure from a single sequence.

9 Three methods to predict secondary structure
• Dot plots: a way to visualize the possible helices in a sequence. Somewhat primitive, but a good technique to know.
• Nussinov algorithm: a technique to find the set of basepairs that maximizes the number of basepairs in a sequence.
• Energy methods: find the set of basepairs that results in the lowest-energy structure, the one most likely to be preferred in nature (e.g. mfold).

10 Dot plot: make a grid with the RNA sequence down the rows and across the columns. Put a dot at the location of each CG or AU pair.

12 Dot Plots
CGUUUGGGUUCACAAACG
((((((------)))))) “dot-bracket notation”
(Figure: dot plot of this sequence against itself, with a + marking each possible pair.)
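The dot plot for this example can be sketched in a few lines of Python. This is only an illustration, not the code behind the slides: the function names `dot_plot`, `render`, and `pairs_from_brackets` are made up here, and only CG and AU pairs are marked, as the slides describe. The last helper reads the bracket notation so we can check that every bracketed pair lands on a dot.

```python
# Watson-Crick pairs marked in the dot plot (GU is left out, as on the slides).
WC_PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}

def dot_plot(seq):
    """Return a square grid; grid[i][j] is True when bases i and j can pair."""
    seq = seq.upper()
    return [[(a, b) in WC_PAIRS for b in seq] for a in seq]

def render(seq):
    """Draw the grid as text, with + for a possible pair and . otherwise."""
    seq = seq.upper()
    grid = dot_plot(seq)
    lines = ["  " + " ".join(seq)]
    for base, row in zip(seq, grid):
        lines.append(base + " " + " ".join("+" if dot else "." for dot in row))
    return "\n".join(lines)

def pairs_from_brackets(brackets):
    """Turn a bracket string into (i, j) index pairs; other symbols are unpaired."""
    stack, pairs = [], []
    for i, ch in enumerate(brackets):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    return pairs

if __name__ == "__main__":
    sequence = "CGUUUGGGUUCACAAACG"
    print(render(sequence))
    grid = dot_plot(sequence)
    helix = pairs_from_brackets("((((((------))))))")
    # Every bracketed pair should sit on a dot of the plot.
    print(all(grid[i][j] for i, j in helix))
```

A helix shows up as a run of + marks along an anti-diagonal, which is what one looks for by eye in a dot plot.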

13 Nussinov algorithm Finds the largest number of nested Watson-Crick pairs in an RNA sequence. Similar to a dot plot, but we keep track of the cumulative number of nested Watson-Crick basepairs in each subsequence as we go.

14 Put zeros down the diagonal.

15 Put ones above the diagonal where there is a CG, GC, AU, or UA pair.

16 Continue with Watson-Crick pairs, but also take the maximum of the cell to the left, below, and left and below.

18 Each cell we fill in tells the maximum number of Watson-Crick pairs in the subsequence down and to the left of the cell.

19 Fill in more “diagonals” in the same way.

20 This subsequence only has one nested Watson-Crick basepair, either a GC or a UA, but not both, since they would cross each other.

21 Finally we come to a subsequence that has two nested Watson-Crick pairs. The cell with the 2 is 1 for the GC pair plus 1 from the cell down and left of it, a UA.
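The table-filling steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the code used for the slides: the name `nussinov_count` is made up, adjacent bases are allowed to pair (the slides put ones directly above the diagonal), and the sketch adds the bifurcation case of the standard Nussinov recurrence, which the slides do not spell out, so that two helices sitting side by side are both counted.

```python
WC_PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}

def nussinov_count(seq):
    """Maximum number of nested Watson-Crick pairs in seq."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0
    # table[i][j] = best count for the subsequence from i to j; diagonal stays 0.
    table = [[0] * n for _ in range(n)]
    for span in range(1, n):              # fill one "diagonal" at a time
        for i in range(n - span):
            j = i + span
            # Cell below (drop base i) and cell to the left (drop base j).
            best = max(table[i + 1][j], table[i][j - 1])
            # Cell left and below, plus 1 if bases i and j pair.
            if (seq[i], seq[j]) in WC_PAIRS:
                inner = table[i + 1][j - 1] if i + 1 <= j - 1 else 0
                best = max(best, inner + 1)
            # Bifurcation (added here): split into two independent parts.
            for k in range(i + 1, j):
                best = max(best, table[i][k] + table[k + 1][j])
            table[i][j] = best
    return table[0][n - 1]

if __name__ == "__main__":
    print(nussinov_count("GGGAAACCC"))
```

Recovering the actual set of pairs requires a traceback through the table, which is omitted here to keep the sketch short.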

22 Thermodynamic methods
Idea: find the secondary structure with the most favorable (lowest) energy.
• Zuker method (mfold): Uses Dynamic Programming to calculate the structure with the lowest free energy.
• McCaskill method (sfold): Uses Dynamic Programming to calculate the most probable structure (more theoretically rigorous).
Used by these programs: Mfold, Sfold, Pfold
Assumptions:
• Only Nearest Neighbor Interactions need to be considered.
• Nearest Neighbor Interactions can be summed to give total free energy.
• Pseudoknots and tertiary interactions can be ignored.
• Most stable structure is also the kinetically favored structure.

23 Nearest neighbor parameters
This is more sophisticated than simply counting the number of AU, UA, CG, GC basepairs in each subsequence. You also tally up the strength of each pair and the energy of one pair stacking on another pair.
(Bioinformatics: Sequence and Genome Analysis, by David W. Mount)
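A rough sketch of this tally is shown below. Everything in it is illustrative: the function name `helix_energy` is made up, and the numbers in the two tables are placeholders chosen for easy arithmetic, not measured nearest neighbor parameters. Real values come from published parameter tables.

```python
# PLACEHOLDER values, chosen only so the arithmetic is easy to follow.
# They are NOT measured nearest neighbor parameters.
PAIR_ENERGY = {"GC": -1.0, "CG": -1.0, "AU": -0.5, "UA": -0.5}
STACK_ENERGY = {
    ("GC", "CG"): -2.0,
    ("CG", "AU"): -1.5,
}

def helix_energy(pairs, pair_energy, stack_energy):
    """Sum the energy of each pair plus each pair-on-pair stack in a helix.

    pairs lists the basepairs from one end of the helix to the other,
    for example ["GC", "CG", "AU"].
    """
    total = sum(pair_energy[p] for p in pairs)
    # Each pair stacks on the next one, so walk over adjacent pairs.
    for lower, upper in zip(pairs, pairs[1:]):
        total += stack_energy[(lower, upper)]
    return total

if __name__ == "__main__":
    print(helix_energy(["GC", "CG", "AU"], PAIR_ENERGY, STACK_ENERGY))
```

The point of the sketch is the shape of the calculation: a helix of n pairs contributes n pair terms and n - 1 stacking terms, so the order of the pairs matters, not just their counts.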

24 Determining parameters
Heat the sample in a cuvette and measure absorbance at 260 nm (UV).
5’ - GCCAUCCG - 3’
3’ - CGGUAGGC - 5’

25 Determining parameters
Reaction: Strand1 + Strand2 = Duplex
Equilibrium constant for each T: Keq(T) = [Duplex(T)] / ([S1(T)] [S2(T)])
Free energy change: ∆G(T) = -RT ln(Keq(T))
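As a small worked example of the free energy relation, the sketch below converts between Keq and ∆G. It assumes Keq is the association constant for duplex formation (duplex concentration over the product of the strand concentrations), takes R in kcal/(mol K), and uses 310.15 K (37 °C) as an example temperature. The function names are made up for this illustration.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def keq_duplex(strand1, strand2, duplex):
    """Association constant, assuming Keq = [Duplex] / ([S1][S2])."""
    return duplex / (strand1 * strand2)

def delta_g(keq, temp_k):
    """Free energy change in kcal/mol: dG = -R T ln(Keq)."""
    return -R_KCAL * temp_k * math.log(keq)

def keq_from_delta_g(dg, temp_k):
    """Invert the relation: Keq = exp(-dG / (R T))."""
    return math.exp(-dg / (R_KCAL * temp_k))

if __name__ == "__main__":
    temp = 310.15  # example temperature, 37 degrees C
    # A duplex with dG = -4.5 kcal/mol strongly favors the paired state.
    keq = keq_from_delta_g(-4.5, temp)
    print(keq)
    print(delta_g(keq, temp))
```

With this sign convention, a Keq above 1 gives a negative ∆G, meaning duplex formation is favorable.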

26 Determining parameters
Repeat this for many related sequences and do statistical analysis to get pairwise parameters.
5’ - GCCAUCCG - 3’
3’ - CGGUAGGC - 5’
5’ - GCCAACCG - 3’
3’ - CGGUUGGC - 5’
...
∆∆G: the energy change due to substituting one basepair for another.

27 Nearest neighbor parameters
Parameters for most of the “loop” regions are unknown: there are too many possible loops to do experiments for all of them. Usually, unpaired regions are penalized, but it is known that certain “loops” are very thermodynamically stable, and they are scored with low free energies (e.g. the UNCG hairpin). It is hard to extrapolate: a small change in sequence can produce a large change in free energy.

28 Using thermodynamic parameters
∆G° = -RT ln(Keq) = = = -4.5 kcal/mol Dynamic Programming

29 Things to keep in mind
• Calculated free energies are always approximate.
• The most stable calculated structure is not necessarily the most stable real structure.
• Must consider “sub-optimal” calculated structures.
• Must use additional information, if available, to pick the correct structure.

30 MFOLD
One of the best thermodynamic methods. Developed by Michael Zuker.
Web server: Submit a sequence, ignore the various parameters you can set, and look through the output.
Look for: dot plots, multiple possible structures, and minimum free energies for the structures. Look at output in png format unless you prefer another image format.
M. Zuker. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31 (13), , (2003)

