AAA AAAU AAUUC AUUC UUCCG UCCG CCGG G G Karen M. Pickard CISC889 Spring 2002 RNA Secondary Structure Prediction.


1 RNA Secondary Structure Prediction. Karen M. Pickard, CISC889, Spring 2002

2 Outline: Review (RNA vs. DNA); Predicting RNA secondary structure; Features of RNA secondary structure; Assumptions; Methods of RNA structure prediction

3 Review: What is RNA? RNA vs. DNA

4 RNA is ribonucleic acid, closely related to deoxyribonucleic acid (DNA). RNA is the only biological polymer that serves both as a catalyst (like proteins) and as information storage (like DNA). RNA is structurally very similar to DNA, with three main differences:

5 (1) The DNA base thymine is replaced by uracil in RNA, so the RNA alphabet is A, C, G, U rather than the DNA alphabet A, C, G, T. (2) The sugar-phosphate backbone of RNA is built from ribose instead of deoxyribose. (3) RNA is synthesized as a single-stranded molecule.


7 Each ribonucleotide contains a phosphate group, a sugar group (ribose), and a base. The polymer is formed by phosphate groups linking successive ribose sugars. The non-planar five-membered ribose ring connects the phosphate to the base, which is attached to the ribose. Only the bases differ between nucleotides.


9 Four bases in RNA. Purines: adenine (A) and guanine (G). Pyrimidines: cytosine (C) and uracil (U).

10 Three major types of RNA: messenger RNA (mRNA), a temporary copy of a gene used as a template for protein synthesis; transfer RNA (tRNA), adaptor molecules that decode the genetic code; ribosomal RNA (rRNA), the catalyst for protein synthesis.

11 Other types of RNA, present in smaller quantities: small nuclear RNA (snRNA); small nucleolar RNA (snoRNA); 4.5S signal recognition particle (SRP) RNA.

12 Predicting RNA Secondary Structure: What is RNA structure prediction? Why do we study RNA structure prediction? Terminology

13 What Is RNA Secondary Structure Prediction? An RNA sequence folds into its functional shape by pairing complementary bases. Canonical (Watson-Crick) base pairs: the complementary bases C-G and A-U form stable pairs. Wobble pair: G-U.
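
The pairing rule on this slide can be written down as a small lookup. This is only a sketch: the names `WATSON_CRICK`, `WOBBLE`, `pair_type` and `can_pair` are made up for illustration and do not come from the presentation.

```python
from typing import Optional

# Pair sets taken from the slide: Watson-Crick pairs plus the G-U wobble pair.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pair_type(x: str, y: str) -> Optional[str]:
    """Classify a candidate base pair; return None if the bases cannot pair."""
    pair = (x.upper(), y.upper())
    if pair in WATSON_CRICK:
        return "watson-crick"
    if pair in WOBBLE:
        return "wobble"
    return None


def can_pair(x: str, y: str) -> bool:
    """True for any pair listed on the slide (C-G, A-U, or G-U)."""
    return pair_type(x, y) is not None


print(pair_type("G", "C"), pair_type("G", "U"), can_pair("A", "G"))
```

Noncanonical pairs other than G-U are deliberately left out of this sketch.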

14 RNA is transcribed in cells as single strands of ribonucleic acid. However, these sequences are not simply long strands of nucleotides; intra-strand base pairing produces folded structures. [Figure: a strand running 5' to 3' and folded back on itself, with a run of alternating G and A bases paired against a run of alternating U and C bases]

15 Why Study RNA Structure Prediction? Secondary structure is related to function. Applications in drug and viral research.

16 Terminology of RNA Secondary Structures. Stacked pairs: base-pairing/base-stacking interactions, usually Watson-Crick pairs, but G-U pairs are not uncommon and other noncanonical pairs do occur. Hairpin loop. Bulge loop. Interior loop.

17 Multi-loop. Pseudoknots: often not considered to be "secondary structure"; most RNA secondary structure algorithms ignore them.

18 Display of RNA Secondary Structures

19 Features of RNA Secondary Structure: It is an intermediate step toward the 3-D structure. Double-stranded regions are formed when the single-stranded molecule folds back on itself. A downstream run of bases must be complementary to an upstream run of bases so that Watson-Crick pairing can occur.

20 Features (continued): G-C base pairs contribute the greatest energetic stability. A-U base pairs contribute less stability. G-U base pairs (wobble pairs) contribute the least stability.

21 Assumptions: The most likely structure is similar to the most energetically stable structure. The energy associated with a particular base pair in a double-stranded region is influenced only by the previous base pair. The structure forms without producing any knots.

22 Representation of Assumptions: Draw the sequence in circular form. Join paired bases by arcs. For the structure to be free of knots, no two arcs may cross (crossing arcs indicate a pseudoknot).
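
The no-crossing rule can be checked directly on a list of base pairs. The sketch below assumes a structure is given as 0-based index pairs `(i, j)`; the function names `crossing_pairs` and `is_knot_free` are hypothetical.

```python
from itertools import combinations


def crossing_pairs(pairs):
    """Return every pair of arcs (i, j), (k, l) that cross, i.e. i < k < j < l."""
    # Normalize each arc so its smaller index comes first, then sort the arcs
    # so that within any combination the first arc starts no later than the second.
    arcs = sorted(tuple(sorted(p)) for p in pairs)
    crossings = []
    for (i, j), (k, l) in combinations(arcs, 2):
        if i < k < j < l:
            crossings.append(((i, j), (k, l)))
    return crossings


def is_knot_free(pairs):
    """True if no two arcs cross in the circle-graph drawing."""
    return not crossing_pairs(pairs)


nested = [(0, 9), (1, 8), (2, 7)]   # a simple stem: arcs are nested
knotted = [(0, 5), (3, 8)]          # second arc starts inside the first, ends outside
print(is_knot_free(nested), is_knot_free(knotted))
```

Nested or side-by-side arcs pass the check; only interleaved arcs are reported as a pseudoknot.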

23 Circle Graph

24 Methods of RNA Structure Prediction: Dot matrix sequence comparison; Minimum free energy; MFOLD; Covariation analysis of RNA sequences; Context-free grammars

25 Dot Matrix Sequence Comparison: Finds stretches of self-complementary regions. The visual representation is easy to evaluate. No need to compensate for gaps.
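
A self-complementarity dot matrix can be sketched in a few lines. Everything here is an illustrative assumption rather than the tool used in the presentation: the sequence is a made-up hairpin, G-U wobble pairs are allowed, and `min_len` and `min_loop` are arbitrary thresholds.

```python
# Bases that may pair: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def dot_matrix(seq):
    """matrix[i][j] is True when base i could pair with base j."""
    n = len(seq)
    return [[(seq[i], seq[j]) in PAIRS for j in range(n)] for i in range(n)]


def find_stems(seq, min_len=3, min_loop=3):
    """Return (i, j, length) for each maximal anti-diagonal run of dots.

    A run pairs seq[i + k] with seq[j - k] for k = 0 .. length - 1 and must
    leave at least min_loop unpaired bases between its innermost pair.
    """
    m = dot_matrix(seq)
    n = len(seq)
    stems = []
    for i in range(n):
        for j in range(i + 1, n):
            if not m[i][j]:
                continue
            # Skip cells that merely continue a run that started further out.
            if i > 0 and j < n - 1 and m[i - 1][j + 1]:
                continue
            length = 0
            while i + length + min_loop < j - length and m[i + length][j - length]:
                length += 1
            if length >= min_len:
                stems.append((i, j, length))
    return stems


seq = "AGCCAUUUUUUGGCU"  # hypothetical hairpin: 5-pair stem around a U-rich loop
print(find_stems(seq))
```

Each reported run is only a candidate double-stranded region; the matrix itself assigns no score to it.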


27 Weaknesses of dot matrix comparison: No "score" or evaluation is produced. For long sequences, it is too slow for a database search. It does not actually align sequences.

28 Minimum Free Energy: Every base is compared to every other base, similar to dot matrix analysis. A diagonal indicates a potential double-stranded region. Sum the negative base-stacking energies for each pair of bases in the diagonal. Add the estimated positive energies for loops.
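
Real energy minimization needs full tables of stacking and loop energies, which this transcript does not contain. The sketch below therefore swaps in a much simpler stand-in, Nussinov-style base-pair maximization, only to show the "compare every base with every other base" dynamic programming idea. It is not the energy model of the slides, and `max_base_pairs` and `min_loop` are names chosen for illustration.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=3):
    """Maximum number of non-crossing base pairs (Nussinov-style recursion).

    best[i][j] holds the answer for the subsequence seq[i..j]; hairpin loops
    must contain at least min_loop unpaired bases.
    """
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]  # case 1: base i stays unpaired
            # case 2: base i pairs with some base k further downstream
            for k in range(i + min_loop + 1, j + 1):
                if (seq[i], seq[k]) in PAIRS:
                    inside = best[i + 1][k - 1]
                    right = best[k + 1][j] if k + 1 <= j else 0
                    score = max(score, 1 + inside + right)
            best[i][j] = score
    return best[0][n - 1]


print(max_base_pairs("GGGAAAUCC"))
```

Replacing the count of pairs with summed stacking and loop energies, and maximizing with minimizing, gives the general shape of an energy-based method.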

29 Predicted free-energy values (kcal/mol at 37 °C) for base pairs and other features of predicted RNA secondary structures

30 For figure: [image not reproduced in this transcript]

31 Free Energy Calculation
Stacking energies of successive base pairs in the stem:
A/U followed by G/C = -1.7 kcal/mol
G/C followed by C/G = -3.4 kcal/mol
C/G followed by C/G = -2.9 kcal/mol
C/G followed by A/U = -1.8 kcal/mol
sum = -9.8 kcal/mol
The hairpin loop region of the structure contains five U's, which has a predicted destabilizing energy of +4.4 kcal/mol. The total free energy predicted for the molecule in this secondary structure is therefore -9.8 kcal/mol + 4.4 kcal/mol = -5.4 kcal/mol.
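
The arithmetic on this slide can be reproduced with a small lookup. The tables below contain only the four stacking values and the single loop value quoted on the slide, not a complete energy table, and `stem_loop_energy` is a hypothetical helper name.

```python
# Stacking energies (kcal/mol) quoted on the slide, keyed by
# (outer pair, next pair inward). This is a partial table.
STACKING = {
    (("A", "U"), ("G", "C")): -1.7,
    (("G", "C"), ("C", "G")): -3.4,
    (("C", "G"), ("C", "G")): -2.9,
    (("C", "G"), ("A", "U")): -1.8,
}

# Destabilizing hairpin-loop energy (kcal/mol) by loop size; only the
# five-base value appears on the slide.
HAIRPIN_LOOP = {5: 4.4}


def stem_loop_energy(stem_pairs, loop_size):
    """Sum the stacking energies of consecutive pairs, then add the loop penalty."""
    stacking = sum(STACKING[(a, b)] for a, b in zip(stem_pairs, stem_pairs[1:]))
    return round(stacking + HAIRPIN_LOOP[loop_size], 1)


stem = [("A", "U"), ("G", "C"), ("C", "G"), ("C", "G"), ("A", "U")]
print(stem_loop_energy(stem, 5))  # -9.8 + 4.4, i.e. -5.4 kcal/mol
```

A stack or loop size missing from the tables raises `KeyError`, which makes the gaps in this partial table visible rather than silently scoring them as zero.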

32 MFOLD (Michael Zuker): Predicts non-base-paired interactions. Predicts several structures with energies close to the minimum free energy. Accurately depicts structures of RNA molecules derived from comparative sequence analysis.

33 Energy Dot Plot

34 Representations of MFOLD: Display structure. Most widely used; closest to the physical structure.

35 Limitations of MFOLD: It does not compute all structures within a given energy range of the minimum free-energy structure. For example, no alternative structures are produced that simply lack base pairs present in the best structure. If two substructures are joined by a stretch of unpaired bases, no structures are produced that are suboptimal for both.

36 Covariation Analysis of RNA Sequences: Based on information theory applied to biology. After transcription, RNA is spliced by the cell. Splicing occurs at donor and acceptor sites. RNA structure logo: the height of a stack shows the degree of conservation, proportional to frequency in acceptors.

37 Structure Logo

38 Needs a multiple alignment of a family of RNAs to identify columns of high information content. These columns indicate conservation, which is essential to secondary structure and base pairing.
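
One common way to quantify covariation between two alignment columns is their mutual information. The sketch below assumes that measure; the slides do not state the exact formula they use. The toy alignment and the function names are invented for illustration.

```python
from collections import Counter
from math import log2


def column(alignment, i):
    """Return column i of a list of equal-length aligned sequences."""
    return [s[i] for s in alignment]


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(alignment)
    fi = Counter(column(alignment, i))
    fj = Counter(column(alignment, j))
    fij = Counter(zip(column(alignment, i), column(alignment, j)))
    mi = 0.0
    for (a, b), count in fij.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((fi[a] / n) * (fj[b] / n)))
    return mi


# Toy alignment: columns 0 and 3 change together and stay complementary,
# while columns 1 and 2 are perfectly conserved.
alignment = ["GAAC", "CAAG", "AAAU", "UAAA"]
print(mutual_information(alignment, 0, 3))  # covarying columns
print(mutual_information(alignment, 1, 2))  # conserved columns carry no covariation
```

High mutual information between two columns suggests the positions pair with each other, because a change at one is compensated by a change at the other.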

39 Stochastic Context-Free Grammars (SCFG): Each RNA structure can be specified by a stochastic context-free grammar, like that for a programming language. This can be used to describe and classify RNAs. Models can be used to search genomic DNA for RNA genes.

40 Productions for the sequence CAUCAGGGAAGAUCUCUUG:
P = {
S0 → S1,
S1 → C S2 G,
S2 → A S3 U,
S3 → S4 S9,
S4 → U S5 A,
S5 → C S6 G,
S6 → A S7,
S7 → G S8,
S8 → G,
S9 → A S10 U,
S10 → G S11 C,
S11 → A S12 U,
S12 → U S13,
S13 → C }

41 Derivation:
S0 ⇒ S1 ⇒ CS2G ⇒ CAS3UG ⇒ CAS4S9UG ⇒ CAUS5AS9UG ⇒ CAUCS6GAS9UG ⇒ CAUCAS7GAS9UG ⇒ CAUCAGS8GAS9UG ⇒ CAUCAGGGAS9UG ⇒ CAUCAGGGAAS10UUG ⇒ CAUCAGGGAAGS11CUUG ⇒ CAUCAGGGAAGAS12UCUUG ⇒ CAUCAGGGAAGAUS13UCUUG ⇒ CAUCAGGGAAGAUCUCUUG
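
The derivation can be replayed mechanically by always rewriting the leftmost nonterminal. The sketch below encodes the fourteen productions from the grammar slide. Each nonterminal has exactly one production here, so the probabilities that make the grammar stochastic are all 1 and are left out. The name `leftmost_derivation` is hypothetical.

```python
# Productions from the grammar slide; right-hand sides are lists of symbols.
PRODUCTIONS = {
    "S0": ["S1"],
    "S1": ["C", "S2", "G"],
    "S2": ["A", "S3", "U"],
    "S3": ["S4", "S9"],
    "S4": ["U", "S5", "A"],
    "S5": ["C", "S6", "G"],
    "S6": ["A", "S7"],
    "S7": ["G", "S8"],
    "S8": ["G"],
    "S9": ["A", "S10", "U"],
    "S10": ["G", "S11", "C"],
    "S11": ["A", "S12", "U"],
    "S12": ["U", "S13"],
    "S13": ["C"],
}


def leftmost_derivation(start="S0"):
    """Return every sentential form obtained by expanding the leftmost nonterminal."""
    form = [start]
    steps = ["".join(form)]
    while True:
        # Find the position of the leftmost symbol that still has a production.
        idx = next((k for k, sym in enumerate(form) if sym in PRODUCTIONS), None)
        if idx is None:
            break
        form[idx:idx + 1] = PRODUCTIONS[form[idx]]
        steps.append("".join(form))
    return steps


steps = leftmost_derivation()
print(" => ".join(steps))
```

Productions of the form X → a Y b generate the two bases of a pair at once, which is how the grammar encodes the stems of the structure.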

42 [Figure: parse tree of the derivation, with nonterminals S0 through S13 and leaves spelling C A U C A G G G A A G A U C U C U U G]

43 [Figure: the parse tree redrawn as the folded secondary structure of the sequence]

44 Summary. Methods include: analysis of all possible combinations of potential double-stranded regions by energy minimization methods; identification of base covariation that maintains the secondary (and tertiary) structure of RNA during evolution.

45 References
Mount, D. W. (2001). Bioinformatics: Sequence and Genome Analysis. Cold Spring Harbor Laboratory Press.
Zuker, M., and Stiegler, P. (1981). Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res. 9: 133-148.
Gorodkin, J., Heyer, L. J., Brunak, S., and Stormo, G. D. (1997). Displaying the information contents of structural RNA alignments: the structure logos. Comput. Appl. Biosci. 13(6): 583-586.
Schneider, T. D., and Stephens, R. M. (1990). Sequence logos: a new way to display consensus sequences. Nucleic Acids Res. 18(20): 6097-6100.
Cheng, Yizong. RNA Secondary Structure. University of Cincinnati, cheng.ececs.uc.edu
Chen, R. O., Felciano, R., and Altman, R. B. (1997). RIBOWEB: Linking structural computation to a knowledge base of published experimental data. ISMB 5: 84-87.

46 RNA Structure Databases: Michael Zuker's MFOLD; RNABase: The RNA Structure Database; RNA databases

