Predicting RNA Secondary Structures: A Lattice Walk Approach to Modeling Sequences Within the HIV-1 RNA Structure Facing the Challenge of Infectious Diseases.


1 Predicting RNA Secondary Structures: A Lattice Walk Approach to Modeling Sequences Within the HIV-1 RNA Structure
Facing the Challenge of Infectious Diseases in Africa: The Role of Mathematical Modeling
University of Witwatersrand, Johannesburg, South Africa, September 25-27, 2006
Asamoah Nkwanta, Ph.D., Morgan State University, Nkwanta@jewel.morgan.edu

2 TOPICS
RNA Prediction & Molecular Biology
RNA Combinatorics
Certain Class of Random Walks
Matrix Theory
Connection Between Walks & RNA
Modeling HIV-1 RNA Sequences

3 RNA Secondary Structure Prediction
“The Human Genome Project and related efforts have generated enormous amounts of raw biological sequence data. However, understanding how biological sequences encode structural information remains a fundamental scientific challenge. For instance, understanding the base pairing, or secondary structure, of single-stranded RNA sequences is crucial to advancing knowledge of their novel biochemical functions.”
C. E. Heitsch, Combinatorics on Plane Trees, Motivated by RNA Secondary Structure Configuration (preprint, 2005)

4 What is RNA Secondary Structure Prediction?

5 RNA Secondary Structure Prediction
Given a primary sequence, we want to find the biological function of the related secondary structure. To achieve this goal we predict (model) its secondary structure.
Most methods predict secondary structure rather than tertiary structure. The three-dimensional shape is important for biological function, but it is harder to predict.

6 Molecular Biology (Cont.) 3-D structure of Haloarcula marismortui 5S ribosomal RNA in large ribosomal subunit

7

8 Molecular Biology
Central Dogma: DNA → RNA → Protein (transcription, then translation)

9 Molecular Biology (Cont.)

10 Molecular Biology (Cont.)
However, the "Central Dogma" has had to be revised a bit. It turns out that you CAN go back from RNA to DNA, and that RNA can also make copies of itself. It is still NOT possible to go from proteins back to RNA or DNA, and no mechanism has yet been demonstrated for proteins making copies of themselves.

11 Molecular Biology (cont.) HIV is one of a group of atypical viruses called retroviruses that maintain their genetic information in the form of RNA. Retroviruses are capable of producing DNA from RNA.

12 Molecular Biology (Cont.)

13 Molecular Biology (cont.)
Ribonucleic acid (RNA) molecule: three main categories
mRNA (messenger) – carries genetic information from genes to the ribosomes
tRNA (transfer) – carries amino acids to a ribosome (the cell's site for making proteins)
rRNA (ribosomal) – part of the structure of a ribosome

14 Molecular Biology (cont.)
Other types of RNA molecules:
snRNA (small nuclear RNA) – takes part in splicing pre-mRNA in the nucleus
miRNA (micro RNA) – regulates gene expression
iRNA (immune RNA)
(Important for HIV studies)

15 RNA Secondary Structure
“RNA secondary structures are important in many biological processes and efficient structure prediction can give vital directions for experimental investigation.”
B. Knudsen and J. Hein, Pfold: RNA secondary structure prediction using stochastic context-free grammars (Nucleic Acids Research, 2003)
There are published examples involving tRNA, rRNA, and other types of RNA.

16 RNA Secondary Structure (cont.)
A ribonucleic acid (RNA) molecule consists of a sequence of ribonucleotides (typically single stranded).
Each ribonucleotide contains one of four bases: adenine (A), cytosine (C), guanine (G), and uracil (U).

17 Secondary Structure (cont.)
Note: U is replaced by thymine (T) in DNA.
As the molecule forms, chemical bonds join A-U and C-G pairs. These are called the Watson-Crick pairs. A less stable G-U pair can also form.

18 Secondary Structure (cont.)
Primary Structure – The linear sequence of bases in an RNA molecule
Secondary Structure – The folding or coiling of the sequence due to bonded nucleotide pairs: A-U, G-C
Tertiary Structure – The three-dimensional configuration of an RNA molecule

19 Primary RNA Sequence
CAGCAUCACAUCCGCGGGGUAAACGCU
Nucleotide length: 27 bases

20 Geometric Representation
Secondary structure is a graph defined on a set of n labeled points (M.S. Waterman, 1978)
Representations shown: Biological, Combinatorial/Graph Theoretic, Random Walk

21

22 RNA COMBINATORICS RNA Numbers 1,1,1,2,4,8,17,37,82,185,423,978,… These numbers count various combinatorial objects including RNA secondary structures of length n.

23

24 RNA COMBINATORICS (cont.)
The number of RNA secondary structures for the sequence [1,n] is counted by the coefficients of the formal power series s(z), namely (1,1,1,2,4,8,17,37,82,185,423,978,…)

25 RNA COMBINATORICS (cont.)
Consider lattice paths of n unit steps R (right), U (up) & D (down) that start at (0,0), remain in the first quadrant of the coordinate plane, and return to the x-axis, under the restriction that a U step is never immediately followed by a D step. The number of such paths is the nth RNA number: (1,1,1,2,4,8,17,37,82,185,423,978,…)

26 RNA COMBINATORICS (cont.)
The number of RNA sequences of length n that can be formed over the alphabet {A,U,G,C} such that the letters A & U are not adjacent is given by an exact formula. What a remarkable formula for an integer: when n = 1 we get 4, and when n = 2 we get 14.
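The two values quoted on this slide can be checked by direct enumeration. The sketch below is illustrative only: the function names are hypothetical, and the recurrence a(n) = 3a(n-1) + 2a(n-2) is derived here from the adjacency rule rather than taken from the slide. Its characteristic roots are (3 ± √17)/2, which would account for a closed form that looks irrational yet always yields an integer.

```python
from itertools import product

def count_brute_force(n):
    """Enumerate every word of length n over {A, U, G, C} and keep those
    in which A and U never sit next to each other (no 'AU', no 'UA')."""
    total = 0
    for letters in product("AUGC", repeat=n):
        word = "".join(letters)
        if "AU" not in word and "UA" not in word:
            total += 1
    return total

def count_recurrence(n):
    """Same count via a(n) = 3*a(n-1) + 2*a(n-2) with a(0) = 1, a(1) = 4.
    The recurrence is an assumption derived for this sketch."""
    a, b = 1, 4              # a(0), a(1)
    for _ in range(n):
        a, b = b, 3 * b + 2 * a
    return a

# Compare the two methods on small lengths.
for n in range(1, 7):
    print(n, count_brute_force(n), count_recurrence(n))
```

Brute force grows as 4^n, so it is only useful as a cross-check on short lengths; the recurrence runs in linear time.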

27 Counting Sequence Database
The On-line Encyclopedia of Integer Sequences: http://www.research.att.com/~njas/sequences/index.html
N.J.A. Sloane & S. Plouffe, The Encyclopedia of Integer Sequences, Academic Press, 1995.

28 RNA EQUATIONS Recurrence Relations:
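A minimal sketch of how the RNA numbers can be generated from a recurrence. It assumes Waterman's classical recurrence s(n+1) = s(n) + Σ_{j=1}^{n-1} s(j-1)·s(n-j) with s(0) = s(1) = s(2) = 1, which reproduces the sequence listed in this talk; the function name is hypothetical.

```python
def rna_numbers(count):
    """Return the first `count` RNA numbers s(0), s(1), ..., using
    s(n+1) = s(n) + sum_{j=1}^{n-1} s(j-1) * s(n-j),  s(0)=s(1)=s(2)=1.
    (Assumed recurrence; it matches the sequence 1,1,1,2,4,8,17,...)"""
    s = [1, 1, 1]
    while len(s) < count:
        n = len(s) - 1                      # we are computing s(n+1)
        s.append(s[n] + sum(s[j - 1] * s[n - j] for j in range(1, n)))
    return s[:count]

print(rna_numbers(12))
```

The first term s(n) accounts for structures in which the last base is unpaired; the sum accounts for the last base pairing with base j, leaving at least one base inside the loop.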

29 RNA EQUATIONS (cont.) Generating Function: 1,1,1,2,4,8,17,37,82,185,423,978,…
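A closed form consistent with the coefficients listed can be derived under the standard decomposition, assumed here: a structure is either empty, starts with an unpaired base, or starts with a base paired across a nonempty interior. This gives a quadratic functional equation for s(z), solved by taking the root that is analytic at z = 0:

```latex
\begin{align*}
s(z) &= 1 + z\,s(z) + z^{2}\,s(z)\bigl(s(z) - 1\bigr), \\
s(z) &= \frac{1 - z + z^{2} - \sqrt{1 - 2z - z^{2} - 2z^{3} + z^{4}}}{2z^{2}} \\
     &= 1 + z + z^{2} + 2z^{3} + 4z^{4} + 8z^{5} + 17z^{6} + \cdots
\end{align*}
```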

30 RNA EQUATIONS (cont.)
Exact Formula:

31 RNA EQUATIONS (cont.)
s(n,k) is the number of structures of length n with exactly k base pairs. For n,k > 0:
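As an illustration of counting by number of base pairs, the sketch below assumes the known closed form s(n,k) = (1/k)·C(n-k, k+1)·C(n-k-1, k-1) for k ≥ 1 (attributed in the literature to Schmitt and Waterman), with s(n,0) = 1. Whether this is the exact expression shown on the slide is an assumption; the function name is hypothetical.

```python
from math import comb

def s_nk(n, k):
    """Number of secondary structures of length n with exactly k base pairs,
    using the assumed closed form (1/k) * C(n-k, k+1) * C(n-k-1, k-1)."""
    if k == 0:
        return 1                  # the fully unpaired structure
    if n < 2 * k + 1:
        return 0                  # k pairs need 2k bases plus one in a loop
    return comb(n - k, k + 1) * comb(n - k - 1, k - 1) // k

# Summing over k should recover the RNA numbers 1, 1, 1, 2, 4, 8, 17, ...
totals = [sum(s_nk(n, k) for k in range(n // 2 + 1)) for n in range(12)]
print(totals)
```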

32 RNA EQUATIONS (cont.)
Asymptotic Estimate: As n grows without bound,
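For reference, the classical asymptotic estimate for the RNA numbers (due to Stein and Waterman) is given below; it is assumed, not confirmed, to be the estimate this slide displayed. The growth rate (3 + √5)/2 ≈ 2.618 is the reciprocal of the smallest positive root of 1 - 3z + z², a factor of the polynomial under the square root in the generating function.

```latex
\[
s(n) \;\sim\; \sqrt{\frac{15 + 7\sqrt{5}}{8\pi}}\; n^{-3/2}
\left(\frac{3 + \sqrt{5}}{2}\right)^{n},
\qquad n \to \infty .
\]
```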

33 Random Walk A random walk is a lattice path from one point to another such that steps are allowed in a discrete number of directions and are of a certain length

34 RNA Walk – Type I
NSE* Walks – Unit-step walks starting at the origin (0,0) with steps up, down, and right. No walk passes below the x-axis, and there are no consecutive NS steps.

35 RNA Walk – Type I (cont.)
N = (0,1) up
S = (0,-1) down
E = (1,0) right

36 Type I Walk Array (n x k)

37 RNA Walk – Type II NSE** Walks – Unit-step walks starting at the origin (0,0) with steps up, down, and right such that no walks pass below the x-axis and there are no consecutive SN steps

38 Type II Walk Array (n x k)

39 Examples
Type I: ENNESNESSE
Type II: NEEENSEEES
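The two definitions can be checked mechanically against the example walks. This is a minimal sketch with hypothetical function names: a walk is valid if its height never drops below zero and it avoids the forbidden pair of consecutive steps (NS for Type I, SN for Type II).

```python
STEP_HEIGHT = {"N": 1, "S": -1, "E": 0}   # change in height per step

def final_height(walk):
    """Height reached at the end of the walk."""
    return sum(STEP_HEIGHT[step] for step in walk)

def is_valid_walk(walk, forbidden):
    """True if the walk never passes below the x-axis and never contains
    the forbidden pair of consecutive steps."""
    height = 0
    for step in walk:
        height += STEP_HEIGHT[step]
        if height < 0:
            return False
    return forbidden not in walk

def is_type1(walk):
    return is_valid_walk(walk, "NS")      # NSE* walks

def is_type2(walk):
    return is_valid_walk(walk, "SN")      # NSE** walks

print(is_type1("ENNESNESSE"), final_height("ENNESNESSE"))
print(is_type2("NEEENSEEES"), final_height("NEEENSEEES"))
```

Note that each example fails the other type's test: the Type I walk contains SN and the Type II walk contains NS.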

40 RNA Walk Bijection
Theorem: There is a bijection between the set of NSE* walks of length n+1 ending at height k = 0 and the set of NSE** walks of length n ending at height k = 0.
Source: Lattice paths, generating functions, and the Riordan group, Ph.D. Thesis, Howard University, Washington, DC, 1997

41 Matrices Count Lattice Walks
Type I Walks
 1   0   0   0   0   0   0  ...
 1   1   0   0   0   0   0  ...
 1   2   1   0   0   0   0  ...
 2   3   3   1   0   0   0  ...
 4   6   6   4   1   0   0  ...
 8  13  13  10   5   1   0  ...
17  28  30  24  15   6   1  ...
...
Type II Walks
 1   0   0   0   0   0   0  ...
 1   1   0   0   0   0   0  ...
 2   2   1   0   0   0   0  ...
 4   4   3   1   0   0   0  ...
 8   9   7   4   1   0   0  ...
17  20  17  11   5   1   0  ...
37  45  41  29  16   6   1  ...
...
The (i, j) entry is the number of walks of length i that end at height j, with rows and columns indexed from 0.
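The arrays can be regenerated by dynamic programming over (height, last step). The sketch below is one straightforward way to do it, not necessarily the formation rule used in the talk; `walk_array` is a hypothetical name, and `forbidden` is "NS" for Type I or "SN" for Type II.

```python
def walk_array(rows, forbidden):
    """Return a rows x rows table whose (i, j) entry counts unit-step walks of
    length i ending at height j that never go below the x-axis and never
    contain the forbidden pair of consecutive steps."""
    steps = {"N": 1, "S": -1, "E": 0}
    counts = {(0, ""): 1}            # (height, last step) -> number of walks
    table = []
    for _ in range(rows):
        row = [0] * rows
        for (height, _last), c in counts.items():
            row[height] += c
        table.append(row)
        nxt = {}
        for (height, last), c in counts.items():
            for step, dh in steps.items():
                if last + step == forbidden or height + dh < 0:
                    continue         # skip forbidden pair or dip below axis
                key = (height + dh, step)
                nxt[key] = nxt.get(key, 0) + c
        counts = nxt
    return table

type1 = walk_array(7, "NS")
type2 = walk_array(7, "SN")
for row in type1:
    print(row)
```

Column 0 of the Type I table gives the RNA numbers, and column 0 of the Type II table gives the same numbers shifted by one, in line with the bijection theorem.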

42 Type I Formation Rule (Recurrence)

43 The Connection Between RNA and the Walks
Theorem: There is a bijection between the set of RNA secondary structures of length n and the set of NSE* walks of length n ending at height k = 0.
Source: Lattice paths and RNA secondary structures, DIMACS Series in Discrete Math. & Theoretical Computer Science 34 (1997) 137-147. (CAARMS2 Proceedings)
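One natural encoding that realizes such a correspondence is sketched below. It is an assumption for illustration, not confirmed as the construction in the cited paper. A structure is written in dot-bracket notation (a convention not used in the slides): an unpaired base '.' becomes E, an opening base '(' becomes N, and a closing base ')' becomes S. A hairpin with no unpaired base inside, "()", would map to the forbidden NS.

```python
TO_STEP = {"(": "N", ")": "S", ".": "E"}
TO_SYMBOL = {step: symbol for symbol, step in TO_STEP.items()}

def structure_to_walk(structure):
    """Map a dot-bracket secondary structure to an N/S/E walk."""
    return "".join(TO_STEP[ch] for ch in structure)

def walk_to_structure(walk):
    """Inverse map: N/S/E walk back to dot-bracket notation."""
    return "".join(TO_SYMBOL[step] for step in walk)

example = "((...))."                 # hypothetical 8-base structure
walk = structure_to_walk(example)
print(walk)                          # NNEEESSE
print(walk_to_structure(walk) == example)
```

Balanced brackets correspond to a walk that stays on or above the x-axis and ends at height 0, and the walk has the same length as the structure.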

44

45 HIV-1 RNA Sequence Prediction
We want to construct a lattice walk method to predict the secondary structure of RNA sequences that code for regions of the SL2 and SL3 domains within the HIV-1 5’ UTR RNA molecule. These domains are important for HIV genomic packaging.

46 HIV-1 RNA Structural Components

47 Components of Secondary Structure
Base pairs
Bulges
Interior loops
End loops
Hairpin
Multibranch loops – junctions where more than one hairpin or more complex secondary structures are appended

48 HIV-1 Sequence (SL2 & SL3)
The following sequence was obtained from the NCBI website. The first 363 nucleotides were extracted from the entire HIV-1 RNA genomic sequence:
GGUCUCUCUGGUUAGACCAGAUCUGAGCCUGGGAGCUCUCU
GGCUAACUAGGGAACCCACUGCUUAAGCCUCAAUAAAGCUU
GCCUUGAGUGCUUCAAGUAGUGUGUGCCCGUCUGUUGUGU
GACUCUGGUAACUAGAGAUCCCUCAGACCCUUUUAGUCAGU
GUGGAAAAUCUCUAGCAGUGGCGCCCGAACAGGGACCUGA
AAGCGAAAGGGAAACCAGAGGAGCUCUCUCGACGCAGGAC
UCGGCUUGCUGAAGCGCGCACGGCAAGAGGCGAGGGGCGG
CGACUGGUGAGUACGCCAAAAAUUUUGACUAGCGGAGGCUA
GAAGGAGAGAGAUGGGUGCGAGAGCGUCAGUAUUAAGCG
Color key: SL2 – yellow, SL3 – red

49 Known Sequence of the SL2 Domain
GGCGACUGGUGAGUACGCC
mfe -7.1
Panels: Original Structure, Type I Walk, Type II Walk
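As a small consistency check, the SL2 sequence can be located inside the 363-nucleotide fragment quoted on the previous slide. The sketch below joins the wrapped lines of that fragment and searches for the SL2 string; the variable names are hypothetical.

```python
# The 363-nucleotide fragment, as wrapped on the slide.
FRAGMENT_LINES = [
    "GGUCUCUCUGGUUAGACCAGAUCUGAGCCUGGGAGCUCUCU",
    "GGCUAACUAGGGAACCCACUGCUUAAGCCUCAAUAAAGCUU",
    "GCCUUGAGUGCUUCAAGUAGUGUGUGCCCGUCUGUUGUGU",
    "GACUCUGGUAACUAGAGAUCCCUCAGACCCUUUUAGUCAGU",
    "GUGGAAAAUCUCUAGCAGUGGCGCCCGAACAGGGACCUGA",
    "AAGCGAAAGGGAAACCAGAGGAGCUCUCUCGACGCAGGAC",
    "UCGGCUUGCUGAAGCGCGCACGGCAAGAGGCGAGGGGCGG",
    "CGACUGGUGAGUACGCCAAAAAUUUUGACUAGCGGAGGCUA",
    "GAAGGAGAGAGAUGGGUGCGAGAGCGUCAGUAUUAAGCG",
]
SL2 = "GGCGACUGGUGAGUACGCC"

fragment = "".join(FRAGMENT_LINES)      # undo the line wrapping
start = fragment.find(SL2)              # 0-based index, -1 if absent

print(len(fragment), "nucleotides in fragment")
print("SL2 found:", start != -1, "| SL2 length:", len(SL2))
```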

50 Lattice Walk Model
Start with an RNA primary sequence
Perform RNA combinatorial analysis on the given sequence
Connect lattice walks to the given sequence using Type I and II walks
Calculate the minimum free energy of the identified sequences
Predict the secondary structure
Conduct laboratory experiments for biological functionality

51 Acknowledgments National Science Foundation, DIMACS, AIMS, Burroughs Wellcome, SACEMA, WITS MATH. Modeling 561, Graduate Students Collaborators: Dwayne Hill, Biology Dept., MSU, and Alvin Kennedy, Chemistry Dept., MSU

