Non-coding RNA William Liu CS374: Algorithms in Biology November 23, 2004.

Non-Coding RNA
- Background Basics: Biology Overview; Why ncRNA - Central Dogma?; Problem Space; HMM/sCFG Solution
- Paper: Pair HMMs on Tree Structures; Alignment of Trees, Structural Alignment; Experimental Evaluation
- Conclusion

Central Dogma of Molecular Biology

Biology Overview
- Traditional view: RNA merely plays an accessory role
- Complexity is defined by the proteins encoded in the genome

Biology Overview
- Non-coding RNA (ncRNA) is an RNA molecule that functions without being translated into a protein
- Most prominent examples: transfer RNA (tRNA), ribosomal RNA (rRNA)

Why Non-coding RNA
- Protein-coding genes can't account for all complexity
- ncRNA is important! Gene regulators
- Source: "Beyond The Proteome: Non-coding Regulatory RNAs", Genome Biol., 2002

Non-coding RNA Problems
- Finding ncRNA genes in the genome: locate these genes
- Finding homologs of ncRNA: figure out what they do

Finding ncRNA Genes
- Protein approaches
  - Statistical bias (codon triplets)
  - Open reading frames
- ncRNA approaches
  - High GC content (hyperthermophiles)
  - Promoter/terminator identification (E. coli)
  - Comparative genome analysis
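
The GC-content approach listed above can be illustrated with a sliding-window scan. This is only a minimal sketch: the function names, window size, step, and threshold are assumptions chosen for illustration, not values from the talk or from any published gene finder.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C (0.0 for an empty string)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_rich_windows(genome: str, window: int = 50, step: int = 10,
                    threshold: float = 0.6):
    """Return (start, gc_fraction) for every window at or above the threshold.

    window, step and threshold are hypothetical defaults. If the genome is
    shorter than one window, the whole genome is scored as a single window.
    """
    hits = []
    last_start = max(len(genome) - window, 0)
    for start in range(0, last_start + 1, step):
        frac = gc_content(genome[start:start + window])
        if frac >= threshold:
            hits.append((start, frac))
    return hits


if __name__ == "__main__":
    # Toy genome: an AT-rich background with a GC-rich island in the middle.
    toy = "AT" * 10 + "GC" * 10 + "AT" * 10
    print(gc_rich_windows(toy, window=20, step=10, threshold=0.9))
```

A scan like this only flags candidate regions; on its own it says nothing about whether a region actually encodes an ncRNA.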

Genetic Code

Similarity Searching
- Proteins
  - BLAST, sequence alignment (DP)
  - Genes that code for proteins are conserved across genomes (i.e. low rate of mutation)
- ncRNA
  - Secondary structure is usually conserved
  - Alignment scoring based on structure is imperative
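
A small example may make the ncRNA point concrete: two sequences can fold into the same hairpin while sharing very little sequence. The sketch below uses dot-bracket notation and two invented toy sequences; the helper names and the set of allowed pairs (Watson-Crick plus G-U wobble) are assumptions made for illustration.

```python
# Pairs treated as valid here: Watson-Crick pairs plus the G-U wobble pair.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}


def base_pairs(structure: str):
    """Return the (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def is_compatible(seq: str, structure: str) -> bool:
    """True if every bracketed pair in structure is a valid pair in seq."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure differ in length")
    return all((seq[i], seq[j]) in CANONICAL for i, j in base_pairs(structure))


def identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    hairpin = "((((...))))"
    s1, s2 = "GGGGAAACCCC", "AUAUAAAAUAU"   # invented toy sequences
    print(is_compatible(s1, hairpin), is_compatible(s2, hairpin))
    print(round(identity(s1, s2), 2))
```

Both toy sequences fit the same hairpin, yet they agree only in the three loop positions, which is why a purely sequence-based score would miss the relationship.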

ncRNA: Sequence vs Structure

Alignment Approaches
- sCFGs: modeling secondary structure, scoring sequences
- HMMs: scoring alignments of sequence and secondary structure

Pair HMMs on Tree Structures: Outline
- Alignment on Trees
- Structural Alignment
- Secondary Structure Representation
- Hidden Markov Model
- Recurrence Relations
- Experimental Evaluation
- Future Work

Alignment on Trees
[Figure: two trees, each with nodes labeled a through i]

Structural Alignment
- Problem: given an RNA sequence with known secondary structure and an RNA sequence with unknown structure, obtain the optimal alignment of the two
[Figure: the unfolded sequence AUCGAAAGAU alongside a folded RNA structure]

Structural Representation: Skeletal Tree
- ( , ): branch structure
- (X, , Y): base pairs
- (X, ) or ( , Y): unpaired bases
- X, Y ∈ {A, U, G, C}
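
To show how a folded RNA can be turned into a tree at all, here is a minimal sketch that builds a generic ordered tree from a sequence and its dot-bracket structure. It is not claimed to reproduce the skeletal tree of the paper: the `Node` class, the use of `None` for an empty position, and the choice to store every unpaired base in the left slot are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    left: Optional[str]            # 5' base, or None if the slot is empty
    right: Optional[str]           # 3' base, or None if the slot is empty
    children: List["Node"] = field(default_factory=list)

    def is_pair(self) -> bool:
        return self.left is not None and self.right is not None


def build_tree(seq: str, structure: str) -> Node:
    """Build an ordered tree: base pairs are internal nodes, unpaired bases leaves."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure differ in length")
    root = Node(None, None)        # artificial root with both slots empty
    stack = [root]
    for base, ch in zip(seq, structure):
        if ch == "(":              # open a pair; its 3' base is filled in later
            node = Node(base, None)
            stack[-1].children.append(node)
            stack.append(node)
        elif ch == ")":            # close the most recently opened pair
            if len(stack) == 1:
                raise ValueError("unbalanced structure")
            stack.pop().right = base
        elif ch == ".":            # unpaired base becomes a leaf
            stack[-1].children.append(Node(base, None))
        else:
            raise ValueError(f"unexpected structure symbol {ch!r}")
    if len(stack) != 1:
        raise ValueError("unbalanced structure")
    return root


def count_pairs(node: Node) -> int:
    """Number of base-pair nodes in the subtree rooted at node."""
    return int(node.is_pair()) + sum(count_pairs(c) for c in node.children)


if __name__ == "__main__":
    tree = build_tree("GGGAAACCC", "(((...)))")
    print(count_pairs(tree))
```

Nesting depth in the tree mirrors stacking in the stem, so comparing two such trees compares structure rather than raw sequence.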

Hidden Markov Model
- M: match state, I: insertion state, D: deletion state
- State transition probability from X to Y (subscript XY)
- Initial probability of state X (subscript X)
- Emission probability for the pair x, y
- X, Y ∈ {M, I, D}
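
The three-state model above can be sketched for the simpler case of two plain sequences. The code below is a Viterbi scorer for an ordinary pair HMM, not the tree-structured version from the paper, and every number in it is an invented placeholder. It also assumes one particular convention: M emits one symbol from each sequence, I emits a symbol of the first sequence only, and D emits a symbol of the second sequence only.

```python
import math

STATES = ("M", "I", "D")
# Symbols consumed from (x, y) by each state, under the convention assumed here.
STEP = {"M": (1, 1), "I": (1, 0), "D": (0, 1)}

# Hypothetical parameters; all must be strictly positive for math.log.
INIT = {"M": 0.8, "I": 0.1, "D": 0.1}
TRANS = {
    "M": {"M": 0.8, "I": 0.1, "D": 0.1},
    "I": {"M": 0.6, "I": 0.3, "D": 0.1},
    "D": {"M": 0.6, "I": 0.1, "D": 0.3},
}
P_SAME, P_DIFF, P_GAP = 0.2, 1 / 60, 0.25   # 4*P_SAME + 12*P_DIFF = 1


def viterbi_pair_hmm(x: str, y: str) -> float:
    """Log-probability of the single best state path that emits x and y."""
    neg_inf = float("-inf")
    n, m = len(x), len(y)
    # v[s][i][j]: best log-probability of emitting x[:i] and y[:j], ending in s.
    v = {s: [[neg_inf] * (m + 1) for _ in range(n + 1)] for s in STATES}
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            for s in STATES:
                di, dj = STEP[s]
                pi, pj = i - di, j - dj
                if pi < 0 or pj < 0:
                    continue                  # state s cannot end at cell (i, j)
                if s == "M":
                    emit = P_SAME if x[pi] == y[pj] else P_DIFF
                else:
                    emit = P_GAP
                if pi == 0 and pj == 0:
                    best_prev = math.log(INIT[s])      # s is the first state
                else:
                    best_prev = max(v[t][pi][pj] + math.log(TRANS[t][s])
                                    for t in STATES)
                v[s][i][j] = best_prev + math.log(emit)
    return max(v[s][n][m] for s in STATES)


if __name__ == "__main__":
    print(viterbi_pair_hmm("ACGU", "ACGU"))
    print(viterbi_pair_hmm("ACGU", "UGCA"))
```

With these placeholder values, identical sequences score higher than unrelated ones, which is the behavior an alignment scorer is expected to have.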

Notation
- Let $w = a_1 a_2 \cdots a_n$ be an unfolded RNA sequence of length $n$
- Let $w[i]$ denote the $i$-th symbol in $w$
- Let $w[i,j]$ denote the substring $a_i a_{i+1} \cdots a_j$ of $w$
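
Because the notation above is 1-based and inclusive while most programming languages index from 0, a short sketch of the mapping may help. The helper names `symbol` and `substring` are hypothetical and exist only for this example.

```python
def symbol(w: str, i: int) -> str:
    """w[i] in the slide notation: the i-th symbol, counting from 1."""
    if not 1 <= i <= len(w):
        raise IndexError("i must satisfy 1 <= i <= len(w)")
    return w[i - 1]


def substring(w: str, i: int, j: int) -> str:
    """w[i,j] in the slide notation: symbols i through j, both inclusive."""
    if not 1 <= i <= j <= len(w):
        raise IndexError("indices must satisfy 1 <= i <= j <= len(w)")
    return w[i - 1:j]


if __name__ == "__main__":
    w = "AUCGAAAGAU"
    print(symbol(w, 1), substring(w, 2, 4))
```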

Notation
- Let $T$ be a skeletal tree representing a folded RNA sequence (known structure)
- Let $v(j)$ denote the label of node $j$ in tree $T$
- Let $T[j]$ denote the subtree rooted at node $j$ in tree $T$
- Let $j_n$ denote the $n$-th child of node $j$ in tree $T$

Recurrence Relation (Match)

Recurrence Relation (Delete)

Recurrence Relation (Insert)

Structural Alignment
- Intuition: given an ncRNA sequence b with unknown structure, generate a predicted folded structure for b, then align the resulting tree with the tree of ncRNA a, whose secondary structure is known
- Complexity: $O(K M N^3)$, where K = number of states in the pair HMM, M = size of the skeletal tree, N = length of the unfolded sequence
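
A rough worked example gives a feel for the cubic term in the complexity bound. The numbers are assumptions chosen for illustration only: $K = 3$ (the M, I, D states), a skeletal tree of $M = 20$ nodes, and an unfolded sequence of $N = 75$ bases, roughly tRNA length. Constant factors hidden by the O-notation are ignored.

```latex
K \cdot M \cdot N^{3} = 3 \cdot 20 \cdot 75^{3} = 60 \cdot 421{,}875 = 25{,}312{,}500
\qquad\text{and}\qquad
\frac{K \cdot M \cdot (2N)^{3}}{K \cdot M \cdot N^{3}} = 2^{3} = 8
```

Under these assumptions, doubling the sequence length multiplies the work by about eight, while doubling the tree size only doubles it.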

Experimental Evaluation
- Dynamic programming to calculate the recurrence relations; prototype system to execute the algorithm
- Experiments on 2 families of RNA: transfer RNAs and hammerhead ribozymes

Parameters Gorodkin et al. (1997)

Results: tRNA

Results: Hammerhead Ribozyme

Future Work
- Because the method is based on dynamic programming (as in pairwise alignment), many existing DP techniques can apply
- Refine the emission probabilities and relate them to a score matrix (for reliable alignment of RNA families)

Conclusions
- ncRNA space is quite open: no really great techniques yet
- How many ncRNA genes are there?
- Absence of evidence ≠ evidence of absence
- Eddy's call to arms: "it is time for RNA computational biologists to step up"

Thanks!

References
- Sakakibara, Y., "Pair Hidden Markov Models on Tree Structures", Bioinformatics, 19: , 2003
- Eddy, S., "Computational Genomics of Noncoding RNA Genes", Cell, Vol 109: , 2002
- Szymanski, M., Barciszewski, J., "Beyond The Proteome: Non-coding Regulatory RNAs", Genome Biol., 2002