Predicting RNA Structure and Function
According to the central dogma of molecular biology (DNA → RNA → protein), the main role of RNA is to transfer genetic information from DNA to protein.
RNA has many other biological functions: protein synthesis (ribosome), control of mRNA stability (UTR), control of splicing (snRNP), and control of translation (microRNA). The function of the RNA molecule depends on its folded structure.
Ribozyme: Nobel Prize 1989. Ribosome: Nobel Prize 2009.
Protein structures: ~80,000 total. RNA structures: ~800 total.
RNA structural levels: secondary structure and tertiary structure (example: tRNA).
RNA Secondary Structure. RNA bases are G, C, A, U. The RNA molecule folds on itself, with bases held together by hydrogen bonds. The base pairing is as follows: G-C, A-U, G-U. [Figure: the sequence 5’ GAUCUUGAUC 3’ folded into a hairpin, with pairs G-C, A-U, U-A, C-G closing a U U loop]
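The pairing rules on this slide can be checked with a short script. The following is a minimal sketch, not a standard tool: it assumes the hairpin is written in dot-bracket notation (a convention not used on the slides, where "(" and ")" mark paired positions and "." marks unpaired ones), and the function names are hypothetical.

```python
# Allowed RNA base pairs from the slide: G-C, A-U and G-U (both orientations).
ALLOWED_PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def can_pair(a, b):
    """Return True if bases a and b can form one of the allowed pairs."""
    return (a, b) in ALLOWED_PAIRS


def pairs_from_dot_bracket(structure):
    """Convert a dot-bracket string into a sorted list of (i, j) index pairs (0-based)."""
    stack, pairs = [], []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)            # remember the opening position
        elif symbol == ")":
            pairs.append((stack.pop(), pos))  # close the most recent open position
    return sorted(pairs)


# The 10-base example sequence from the slide and its hairpin fold.
# ASSUMPTION: the dot-bracket string below is our transcription of the drawing.
sequence = "GAUCUUGAUC"
structure = "((((..))))"

pairs = pairs_from_dot_bracket(structure)
all_valid = all(can_pair(sequence[i], sequence[j]) for i, j in pairs)
```

Here `pairs` lists the four stem pairs and `all_valid` confirms that each of them follows the G-C / A-U / G-U rule.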
RNA secondary structure: short-range interactions. [Figure: example fold drawn 5’ to 3’ with its structural elements labeled: hairpin loop, bulge, internal loop, stem, dangling ends]
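The function of the RNA molecule depends on its folded structure. Example: mRNA structure involved in control of iron levels. The Iron Responsive Element (IRE) is recognized by IRP1 and IRP2. [Figure: IRE stem-loop drawn 5’ to 3’; the labeled bases G, U, A, G, C and C are marked conserved, and stem pairs are shown as N-N’]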
F: ferritin = iron storage. TR: transferrin receptor = iron uptake. [Figure: IRP1/2 bound to the IRE of ferritin (F) mRNA and to the IRE of TR mRNA] Low iron: IRE-IRP binding inhibits translation of ferritin and inhibits degradation of TR mRNA. High iron: IRE-IRP binding is off, so ferritin is translated and transferrin receptor mRNA is degraded.
Predicting RNA secondary structure. Most common approach: Zuker & Stiegler (1981). Search for an RNA structure with minimal free energy (MFE). [Figure: example hairpin with pairs G-C, A-U, U-A, C-G closing a U U loop]
Free energy model: the free energy of a structure is the sum of all interaction energies (excluding coaxial stacking, metal ions, nonstandard bonds, folding pathway, etc.). Free Energy(E) = E(CG) + E(CG) + ... Each interaction energy can be calculated thermodynamically.
Why is MFE secondary structure prediction hard? The MFE structure can be found by calculating the free energy of all possible structures, BUT the number of potential structures grows exponentially with the number of bases, n.
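To get a feel for this growth, the number of possible structures can be counted with a simple recurrence. This is a rough sketch under stated assumptions: it ignores base identity (any two positions may pair, so it gives an upper bound), allows only nested, non-crossing pairs, and requires at least `min_loop` unpaired bases inside a hairpin. The default of 3 is our assumption, not a value from the slides, and `count_structures` is a hypothetical name.

```python
def count_structures(n, min_loop=3):
    """Count nested (non-crossing) secondary structures on n positions.

    ASSUMPTIONS: any two positions may pair (base identity is ignored) and a
    pair must enclose at least min_loop positions.
    """
    counts = [1] * (n + 1)  # counts[0] = 1: the empty structure
    for length in range(1, n + 1):
        # Case 1: the last position is unpaired.
        total = counts[length - 1]
        # Case 2: the last position pairs with position k (0-based).
        # k positions lie to the left of k; length - 2 - k lie inside the pair.
        for k in range(0, length - 1 - min_loop):
            total += counts[k] * counts[length - 2 - k]
        counts[length] = total
    return counts[n]


# Print how fast the count grows with sequence length.
for n in (10, 20, 30, 40):
    print(n, count_structures(n))
```

Even for short sequences the count becomes far too large to enumerate, which is why the following slides turn to dynamic programming.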
RNA folding with dynamic programming (Zuker and Stiegler). W(i,j): the MFE of the substrand from i to j. [Figure: substrand from i to j, labeled W(i,j)]
RNA folding with dynamic programming
Assume a function W(i,j), which is the MFE for the subsequence starting at i and ending at j (i<j).
Define scores, for example (CG) = -1 and (CA) = +1 (a favorable pair gets a negative score).
Consider 4 possibilities:
1. i,j are a base pair, added to the structure for i+1..j-1
2. i is unpaired, added to the structure for i+1..j
3. j is unpaired, added to the structure for i..j-1
4. i,j are paired, but not to each other: the structure for i..j combines two substructures, i..k and k+1..j, for some k between i and j
Choose the minimal energy possibility.
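The four possibilities translate directly into a table-filling procedure. The sketch below is a simplified illustration and not the full Zuker and Stiegler algorithm: it uses the toy scores from this slide (-1 for an allowed pair, +1 otherwise), returns only the minimal score without a traceback of the structure, and the names `pair_score`, `fold_energy` and the `min_loop` parameter are our own assumptions.

```python
ALLOWED = {"GC", "CG", "AU", "UA", "GU", "UG"}


def pair_score(a, b):
    """Toy score: -1 for an allowed base pair, +1 for any other pair."""
    return -1 if a + b in ALLOWED else 1


def fold_energy(seq, min_loop=0):
    """Return the minimal score W(0, n-1) for seq under the toy scoring.

    ASSUMPTION: min_loop is the minimum number of bases a pair must enclose;
    0 matches the 2-base loop of the slide example, real programs use more.
    """
    n = len(seq)
    if n == 0:
        return 0
    # W[i][j] holds the minimal score of subsequence i..j (inclusive).
    # Entries with j <= i stay 0 (empty or single-base subsequence).
    W = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # Possibilities 2 and 3: i unpaired, or j unpaired.
            best = min(W[i + 1][j], W[i][j - 1])
            # Possibility 1: i pairs with j, added to the structure for i+1..j-1.
            if span - 1 >= min_loop:
                best = min(best, W[i + 1][j - 1] + pair_score(seq[i], seq[j]))
            # Possibility 4: i and j are paired with other bases, so the
            # structure splits into i..k and k+1..j.
            for k in range(i + 1, j):
                best = min(best, W[i][k] + W[k + 1][j])
            W[i][j] = best
    return W[0][n - 1]


energy = fold_energy("GAUCUUGAUC")
```

Filling the table takes on the order of n^3 steps instead of enumerating an exponential number of structures. Recovering the structure itself would require storing which possibility was chosen for each W(i,j), which is omitted here.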
Simplifying assumptions for structure prediction: RNA folds into one minimum free-energy structure. The energy of a particular base pair can be calculated independently; neighbors do not influence the energy.
Sequence-dependent free energy: the nearest-neighbor model. Energy is influenced by the previous base pair (not by the base pairs further down). [Figure: two example hairpins with a U U loop and different stem base pairs, drawn 5’ to 3’]
Sequence-dependent free-energy values of the base pairs (nearest-neighbor model). These energies are estimated experimentally from small synthetic RNAs. [Figure: two example hairpins with a U U loop and different stem base pairs, drawn 5’ to 3’]
Example values:
Base pair           GC     GC     GC     GC
Neighboring pair    AU     GC     CG     UA
Energy             -2.3   -2.9   -3.4   -2.1
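A lookup table makes the difference from the independent-pair model concrete: the energy of a stem is summed over adjacent pairs of base pairs, not over single pairs. The sketch below uses only the four example values from this slide. Reading each value as (base pair, neighboring base pair) is our assumption about the slide's layout, and `stem_energy` is a hypothetical helper, not part of any folding package.

```python
# Stacking energies copied from the slide's example values.
# ASSUMPTION: each key is (base pair, neighboring base pair) in the order the
# slide lists them; real parameter tables cover all combinations.
STACKING_ENERGY = {
    ("GC", "AU"): -2.3,
    ("GC", "GC"): -2.9,
    ("GC", "CG"): -3.4,
    ("GC", "UA"): -2.1,
}


def stem_energy(pairs):
    """Sum the stacking energy of each base pair with its neighboring pair.

    Raises KeyError for stacks that are not among the four example values.
    """
    total = 0.0
    for current, neighbor in zip(pairs, pairs[1:]):
        total += STACKING_ENERGY[(current, neighbor)]
    return round(total, 1)


# Two stems with the same number of base pairs but different neighbors.
stem_a = stem_energy(["GC", "GC", "AU"])
stem_b = stem_energy(["GC", "GC", "CG"])
```

The two stems have three base pairs each, yet their energies differ because the last stack differs, which is exactly what the nearest-neighbor model adds.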
Adding complexity to energy calculations: positive energy is added for destabilizing regions such as bulges, loops, etc. More than one structure can be predicted.
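Free energy computation. [Figure: example hairpin with a 4-nt loop, a 1-nt bulge and a 5’ dangling end, drawn 5’ to 3’ and annotated with energy contributions]
Contributions (kcal/mol): +5.9 (4-nt loop), -1.1 (mismatch of hairpin), -2.9 (stacking), +3.3 (1-nt bulge), -2.9 (stacking), -1.8 (stacking), -0.9 (stacking), -1.8 (stacking), -2.1 (stacking), -0.3 (5’ dangling).
Total: ΔG = -4.6 kcal/mol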
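The total on this slide can be reproduced by adding up the annotated contributions. This is a small arithmetic check, assuming the -0.3 term belongs to the 5’ dangling end and is counted once (the value appears twice in the figure labels); with that reading the ten terms sum to the stated total.

```python
# Energy contributions in kcal/mol, as annotated on the slide.
# ASSUMPTION: -0.3 is the 5' dangling-end term and is counted once.
contributions = [
    ("4-nt hairpin loop", +5.9),
    ("mismatch of hairpin", -1.1),
    ("stacking", -2.9),
    ("1-nt bulge", +3.3),
    ("stacking", -2.9),
    ("stacking", -1.8),
    ("stacking", -0.9),
    ("stacking", -1.8),
    ("stacking", -2.1),
    ("5' dangling end", -0.3),
]

# Loops and bulges add positive (destabilizing) energy,
# stacking and dangling ends add negative (stabilizing) energy.
destabilizing = round(sum(e for _, e in contributions if e > 0), 1)
stabilizing = round(sum(e for _, e in contributions if e < 0), 1)
total = round(sum(e for _, e in contributions), 1)

print(f"destabilizing: {destabilizing:+.1f} kcal/mol")
print(f"stabilizing:   {stabilizing:+.1f} kcal/mol")
print(f"total:         {total:+.1f} kcal/mol")
```

The structure is stable overall only because the stacking terms outweigh the penalties for the loop and the bulge.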
Mfold: adding complexity to energy calculations. Positive energy is added for destabilizing regions such as bulges, loops, etc. More than one structure can be predicted.
More than one structure can be predicted for the same RNA. [Figure: GNAS1 mRNA folding structures predicted by MFOLD. The mRNA sequence carrying the T393C polymorphism was used for secondary folding structure model building by the use of the computer program MFOLD (26).] Frey U H et al. Clin Cancer Res 2005;11:5071-5077. ©2005 by American Association for Cancer Research
RNA fold prediction based on multiple alignment. Information from a multiple sequence alignment (MSA) can help to predict the probability of positions i,j being base-paired.
G C C U U C G G G C
G A C U U C G G U C
G G C U U C G G C C
Compensatory substitutions: mutations that maintain the secondary structure can help predict the fold. [Figure: example hairpin, drawn 5’ to 3’, in which one stem base pair is substituted by G-C]
RNA secondary structure can be revealed by identification of compensatory mutations.
G C C U U C G G G C
G A C U U C G G U C
G G C U U C G G C C
[Figure: consensus hairpin for the alignment, with pairs G-C, N-N’ and C-G closing a U U C G loop]
Insight from multiple alignment. Information from a multiple sequence alignment (MSA) can help to predict the probability of positions i,j being base-paired.
Conservation: no additional information.
Consistent mutations (GC → GU): support the stem.
Inconsistent mutations: do not support the stem.
Compensatory mutations: support the stem.
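The four categories can be expressed as a small classifier over a pair of alignment columns. The sketch below is a simplified illustration using the three-sequence alignment from the earlier slides. The function name and the exact decision rules are our assumptions: a column pair is called "compensatory" when both columns vary and every sequence still forms an allowed pair, and "consistent" when only one column varies.

```python
ALLOWED = {"GC", "CG", "AU", "UA", "GU", "UG"}


def classify_columns(alignment, i, j):
    """Classify columns i and j (0-based) of an alignment of equal-length sequences."""
    pairs = [seq[i] + seq[j] for seq in alignment]
    if len(set(pairs)) == 1:
        return "conserved"       # no additional information
    if any(pair not in ALLOWED for pair in pairs):
        return "inconsistent"    # some sequence cannot form this pair
    left_varies = len({pair[0] for pair in pairs}) > 1
    right_varies = len({pair[1] for pair in pairs}) > 1
    if left_varies and right_varies:
        return "compensatory"    # both sides changed, pairing preserved
    return "consistent"          # one side changed, pairing preserved (e.g. GC -> GU)


# The alignment shown on the slides.
alignment = ["GCCUUCGGGC", "GACUUCGGUC", "GGCUUCGGCC"]

outer = classify_columns(alignment, 0, 9)    # G-C in every sequence
middle = classify_columns(alignment, 1, 8)   # C-G, A-U, G-C
wrong = classify_columns(alignment, 1, 2)    # C-C, A-C, G-C
wobble = classify_columns(["GC", "GU"], 0, 1)
```

For this alignment, columns 2 and 9 (1-based) change in every sequence yet always form an allowed pair, which is the signal that supports a stem at that position.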
RNA families. Rfam: general non-coding RNA database (most of the data is taken from specific databases). http://www.sanger.ac.uk/Software/Rfam/ It includes many families of non-coding RNAs and functional motifs, as well as their alignments and their secondary structures.
An example of an RNA family: miR-1 (microRNAs). mir-1 microRNA precursor family: this family represents the microRNA (miRNA) mir-1 family. miRNAs are transcribed as ~70nt precursors (pre-mir) and subsequently processed to give a ~22nt product (miRNA=mir). The products are thought to have regulatory roles through complementarity to mRNA.
Seed alignment (based on 7 sequences)