Non-coding RNA William Liu CS374: Algorithms in Biology November 23, 2004.

1 Non-coding RNA William Liu CS374: Algorithms in Biology November 23, 2004

2 Non-Coding RNA Background Basics Biology Overview Why ncRNA - Central Dogma? Problem Space HMM/sCFG Solution Paper Pair HMMs on Tree Structures Alignment of Trees, Structural Alignment Experimental Evaluation Conclusion

3 Central Dogma of Molecular Biology

4 Biology Overview (traditional view): RNA merely plays an accessory role; complexity is defined by proteins encoded in the genome

5 Biology Overview Non-coding RNA (ncRNA) is an RNA molecule that functions without being translated into a protein. Most prominent examples: Transfer RNA (tRNA), Ribosomal RNA (rRNA)

6 Why Non-coding RNA (Genome Biol. 2002; "Beyond The Proteome: Non-coding Regulatory RNAs"): Protein-coding genes can’t account for all complexity. ncRNA is important! Gene regulators

7 Non-coding RNA Problems Finding ncRNA genes in the genome: locate these genes Finding Homologs of ncRNA: figure out what they do

8 Finding ncRNA Genes. Protein Approaches: Statistically biased (codon triplets); Open Reading Frames. ncRNA Approaches: High GC content (hyperthermophiles); Promoter/Terminator identification (E. coli); Comparative Genome Analysis
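
The two simplest signals on the slide above, GC content and open reading frames, can be sketched as follows. This is a minimal illustration, not a gene finder: the function names, the forward-strand-only scan, and the `min_codons` threshold are assumptions made for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA or RNA sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) spans of forward-strand ORFs.

    An ORF here is an in-frame ATG followed by a stop codon.
    Indices are 0-based and `end` is exclusive (it includes the stop).
    `min_codons` counts the codons before the stop, ATG included.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                start = None  # look for the next ATG in this frame
    return orfs
```

An ncRNA gene has no codon structure, so a scan like `find_orfs` tells us nothing about it; that is why the ncRNA approaches fall back on signals such as base composition.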

9 Genetic Code

10 Similarity Searching. Proteins: BLAST, Sequence Alignment (DP); genes that code for proteins are conserved across genomes (i.e., low rate of mutation). ncRNA: Secondary structure usually conserved; alignment scoring based on structure is imperative
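
For reference, the sequence-only dynamic programming mentioned for proteins can be sketched as a global alignment score (Needleman-Wunsch style). The scoring values below are arbitrary assumptions, and the function ignores structure entirely, which is exactly the limitation the slide points out for ncRNA.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Best global alignment score of a and b under a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap  # b[:j] aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[-1][-1]
```

Two RNAs that fold into the same structure can differ at many positions (compensating base-pair changes), so a score like this one can be low even for true homologs.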

11 ncRNA: Sequence vs Structure

12 Alignment Approaches. sCFGs: modeling secondary structure, scoring sequences. HMM: scoring of sequence and secondary structure alignment

13 Pair HMMs on Tree Structures Outline Alignment on Trees Structural Alignment Secondary Structure Representation Hidden Markov Model Recurrence Relations Experimental Evaluation Future Work

14 Alignment on Trees [figure: two trees with nodes labeled a through i]

15 Structural Alignment Problem: Given an RNA sequence with known secondary structure and an RNA sequence with unknown structure, obtain the optimal alignment of the two [figure: example sequence AUCGAAAGAU and a folded RNA structure]

16 Structural Representation Skeletal Tree. Node labels: ( , ): branch structure; (X, , Y): base pair; (X, ) or ( , Y): unpaired base; X, Y ∈ {A, U, G, C}

17 Hidden Markov Model M: Match state, I: Insertion state, D: Deletion state. Parameters: state transition probability from X to Y; initial probability of state X; emission probability for a pair x, y; X, Y ∈ {M, I, D}
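
To make the three states concrete, here is a minimal Viterbi sketch for a pair HMM over two plain sequences. It is not the tree-structured model of the paper: it only shows how initial, transition, and emission probabilities combine along an M/I/D path. All probability values and names (`INIT`, `TRANS`, `emit_pair`, `emit_single`) are made-up assumptions, and the convention that I emits a symbol of `y` while D emits a symbol of `x` is chosen for this example.

```python
import math

NEG_INF = float("-inf")
STATES = ("M", "I", "D")
# Symbols consumed from (x, y) by each state: M pairs one symbol from each,
# I emits a symbol of y only, D emits a symbol of x only.
STEP = {"M": (1, 1), "I": (0, 1), "D": (1, 0)}

# Hypothetical parameters, chosen only so that each distribution sums to 1.
INIT = {"M": 0.8, "I": 0.1, "D": 0.1}
TRANS = {
    "M": {"M": 0.8, "I": 0.1, "D": 0.1},
    "I": {"M": 0.6, "I": 0.3, "D": 0.1},
    "D": {"M": 0.6, "I": 0.1, "D": 0.3},
}


def emit_pair(a: str, b: str) -> float:
    """Emission probability of the aligned pair (a, b) in state M."""
    return 0.2 if a == b else 0.2 / 12


def emit_single(a: str) -> float:
    """Emission probability of a single symbol in state I or D."""
    return 0.25


def pair_hmm_viterbi(x, y, init=INIT, trans=TRANS):
    """Log-probability of the most likely state path emitting x and y."""
    n, m = len(x), len(y)
    # v[s][i][j]: best log-prob of emitting x[:i] and y[:j], ending in s
    v = {s: [[NEG_INF] * (m + 1) for _ in range(n + 1)] for s in STATES}
    for i in range(n + 1):
        for j in range(m + 1):
            for s in STATES:
                di, dj = STEP[s]
                pi, pj = i - di, j - dj
                if pi < 0 or pj < 0:
                    continue  # state s cannot end at cell (i, j)
                if s == "M":
                    e = emit_pair(x[i - 1], y[j - 1])
                elif s == "I":
                    e = emit_single(y[j - 1])
                else:
                    e = emit_single(x[i - 1])
                if pi == 0 and pj == 0:
                    best = math.log(init[s])  # s is the first state
                else:
                    best = max(v[p][pi][pj] + math.log(trans[p][s])
                               for p in STATES)
                v[s][i][j] = best + math.log(e)
    return max(v[s][n][m] for s in STATES)
```

For example, `pair_hmm_viterbi("AC", "AC")` follows the path M, M, so its value is the log of 0.8 × 0.2 × 0.8 × 0.2 under the parameters above.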

18 Notation Let $w = a_1 a_2 \dots a_n$ be an unfolded RNA sequence of length $n$. Let $w[i]$ denote the $i$-th symbol in $w$. Let $w[i,j]$ denote the substring $a_i a_{i+1} \dots a_j$ of $w$
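
The sequence notation is 1-indexed and inclusive at both ends, which differs from most programming languages. A small sketch of the mapping onto Python strings follows; the helper names `symbol` and `substring` are assumptions for this example.

```python
def symbol(w: str, i: int) -> str:
    """w[i] in the slide's notation: the i-th symbol, counting from 1."""
    if not 1 <= i <= len(w):
        raise IndexError("i must satisfy 1 <= i <= n")
    return w[i - 1]


def substring(w: str, i: int, j: int) -> str:
    """w[i,j] in the slide's notation: symbols a_i through a_j inclusive."""
    if not 1 <= i <= j <= len(w):
        raise IndexError("indices must satisfy 1 <= i <= j <= n")
    return w[i - 1:j]
```

With `w = "AUCGAAAGAU"`, `symbol(w, 1)` is `"A"` and `substring(w, 2, 4)` is `"UCG"`.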

19 Notation Let $T$ be a skeletal tree representing a folded RNA sequence (known structure). Let $v(j)$ denote the label of node $j$ in tree $T$. Let $T[j]$ denote the subtree rooted at node $j$ in tree $T$. Let $j_n$ denote the $n$th child of node $j$ in tree $T$
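
The tree notation maps naturally onto a small recursive data structure. The sketch below is one possible encoding, not the paper's: labels are stored as plain tuples (`("G", "C")` for a base pair, `("A",)` for an unpaired base, `("branch",)` for a branch node), and the names `Node`, `v`, `child`, and `subtree_size` are assumptions for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A tree node; the node object also stands for the subtree T[j]."""
    label: tuple
    children: list = field(default_factory=list)


def v(node: Node) -> tuple:
    """v(j): the label of node j."""
    return node.label


def child(node: Node, n: int) -> Node:
    """j_n: the n-th child of node j, counting from 1."""
    return node.children[n - 1]


def subtree_size(node: Node) -> int:
    """Number of nodes in the subtree T[j] rooted at this node."""
    return 1 + sum(subtree_size(c) for c in node.children)


# A toy structure: a branch node whose two children are small stems,
# each a base pair enclosing a single unpaired base.
stem1 = Node(("G", "C"), [Node(("A",))])
stem2 = Node(("A", "U"), [Node(("G",))])
root = Node(("branch",), [stem1, stem2])
```

Here `v(child(root, 2))` is `("A", "U")` and `subtree_size(root)` is 5.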

20 Recurrence Relation (Match)

21 Recurrence Relation (Delete)

22 Recurrence Relation (Insert)

23 Structural Alignment Intuition: Given the ncRNA sequence b with unknown structure, generate a predicted folded structure for b, then align the resulting tree with the ncRNA a, whose secondary structure is known. Complexity: $O(K M N^3)$, where K = # states in pair HMM, M = size of skeletal tree, N = length of unfolded sequence
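
To get a feel for the bound, the product K · M · N³ can be evaluated for sample sizes. The numbers below are illustrative assumptions (K = 3 for the states M, I, D; a tree of 50 nodes; a sequence of 76 bases, roughly the length of a tRNA), and the result is an order-of-magnitude count that ignores constant factors.

```python
def dp_work_estimate(num_states: int, tree_size: int, seq_len: int) -> int:
    """Evaluate K * M * N^3 for the given sizes (constant factors ignored)."""
    return num_states * tree_size * seq_len ** 3


# Hypothetical sizes: K = 3, M = 50, N = 76.
estimate = dp_work_estimate(3, 50, 76)
print(f"{estimate:,}")  # 65,846,400

# The cubic term dominates: doubling N multiplies the estimate by 8.
ratio = dp_work_estimate(3, 50, 152) // estimate
```

Because of the N³ factor, the length of the unfolded sequence matters far more than the number of states or the size of the tree.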

24 Experimental Evaluation. Dynamic programming to calculate recurrence relations; prototype system to execute algorithm. Experiments on 2 families of RNA: Transfer RNAs and Hammerhead Ribozyme

25 Parameters Gorodkin et al. (1997)

26 Results: tRNA

27 Results: Hammerhead Ribozyme

28 Future Work. Since the method is based on dynamic programming (as in pairwise alignment), many DP techniques can apply. Refine emission probabilities and the related score matrix (reliable alignment for RNA families)

29 Conclusions. ncRNA space is quite open - no really great techniques yet. How many ncRNA genes are there? Absence of evidence ≠ evidence of absence. Eddy’s call to arms: “it is time for RNA computational biologists to step up”

30 Thanks!

31 References. Sakakibara, Y., “Pair Hidden Markov Models on Tree Structures”, Bioinformatics, 19(Suppl. 1):i232-i240, 2003; Eddy, S., “Computational Genomics of Noncoding RNA Genes”, Cell, 109:137-140, 2002; Szymanski, M., Barciszewski, J., “Beyond The Proteome: Non-coding Regulatory RNAs”, Genome Biol., 2002

