MicroRNA Prediction with SCFG and MFE Structure Annotation Tim Shaw, Ying Zheng, and Bram Sebastian.

Goal of the Presentation
◦ Introduction to miRNA
◦ Survey of computational and experimental approaches to identify microRNA
◦ CYK algorithm
◦ Our methodology
◦ Results/discussion
◦ Future directions

Computers vs Genetics

Background on microRNA and Its Classical Definition
◦ Found in eukaryotes (706 identified in human)
◦ Genome-encoded stem-loop precursor
◦ Generally processed by Dicer and a helicase
◦ Mature microRNA is approximately 22 nucleotides (nt) long
◦ Recognizes target mRNA by base pairing
  ◦ Acts primarily in gene silencing
  ◦ Some cases of gene enhancement

Diagram for miRNA

Problems with miRNA Hunting through Lab Experiments
◦ Biology is a network of cause and effect
◦ miRNA expression might depend on certain environmental triggers
◦ Expression of certain microRNA sequences is hard to detect
◦ Some miRNAs have physical properties that make them hard to clone, such as sequence composition or post-transcriptional modification

Problems with miRNA Hunting through Computational Approaches
◦ Stem-loop structures are common in eukaryotic genomes, so a hairpin alone does not identify a miRNA
◦ Eukaryotic genomes are long, and most computational approaches are not practical for scanning an entire genome

Computationally Driven Approaches
◦ Structure information (thermodynamics)
  ◦ RNAz
◦ Homology: conservation of structure (ERPIN, MirScan, snarloop)
  ◦ Stem
  ◦ Loop
  ◦ Target sequence
◦ Machine learning (miRFinder, microPred)
  ◦ Feature selection based on sequence and structural information

Tests Done on Those Methods
◦ ERPIN (2001) (homology)
  ◦ Good results, but very limited by the availability of training data: it detects only 66 of the 706 miRNAs, and only 36 if the human training sequences are removed
◦ miRFinder (2007) (ab initio)
  ◦ Human: specificity 84.46% (1320 of 8494 pseudo-miRNAs predicted as miRNA); sensitivity 84.84% (599/706)
  ◦ Mouse: specificity 82.78% (1759 of 10213 pseudo-miRNAs predicted as miRNA); sensitivity 82.27% (450/547)
◦ microPred (2009) (ab initio)
  ◦ We found a bug and reported it to the author; it is currently being fixed
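
The percentages follow from the raw counts: sensitivity is the fraction of real miRNAs detected, and specificity is the fraction of pseudo-miRNAs correctly rejected, so the 1320 and 1759 in the specificity figures are false-positive counts. A minimal Python sketch of that arithmetic (the function names are illustrative, not taken from any tool):

```python
def sensitivity(true_positives, total_positives):
    """Fraction of real miRNAs that are detected."""
    return true_positives / total_positives


def specificity(false_positives, total_negatives):
    """Fraction of pseudo-miRNAs that are correctly rejected."""
    return (total_negatives - false_positives) / total_negatives


# miRFinder counts quoted on the slide
human = (sensitivity(599, 706), specificity(1320, 8494))
mouse = (sensitivity(450, 547), specificity(1759, 10213))
print("human: sens %.2f%%, spec %.2f%%" % (100 * human[0], 100 * human[1]))
print("mouse: sens %.2f%%, spec %.2f%%" % (100 * mouse[0], 100 * mouse[1]))
# human: sens 84.84%, spec 84.46%
# mouse: sens 82.27%, spec 82.78%
```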

Negative Set Generation
◦ Sequences were obtained from the CDS (coding sequence) regions of the genome, i.e. sequences that code for protein
◦ Implemented a CDS extractor for ccdsgenes.txt files from the UCSC Genome Browser
◦ Still need to implement a new parser based on the cds.txt file from the UCSC Genome Browser
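
The extractor itself is not shown in the slides. As a hedged sketch, assuming each row follows UCSC's genePred-style layout (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, with 0-based half-open coordinates and possibly a leading bin column), the coding sequence of a transcript is its exons clipped to [cdsStart, cdsEnd) and concatenated, reverse-complemented on the minus strand. The function names are hypothetical; check the column layout against the actual file before relying on it.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def parse_gene_pred_line(line, has_bin=True):
    """Parse one genePred-style row (assumed layout; verify your file's columns)."""
    fields = line.rstrip("\n").split("\t")
    if has_bin:  # UCSC table dumps often start with a 'bin' column
        fields = fields[1:]
    name, chrom, strand = fields[0], fields[1], fields[2]
    cds_start, cds_end = int(fields[5]), int(fields[6])
    # exonStarts/exonEnds are comma-separated lists with a trailing comma
    exon_starts = [int(v) for v in fields[8].rstrip(",").split(",")]
    exon_ends = [int(v) for v in fields[9].rstrip(",").split(",")]
    return name, chrom, strand, cds_start, cds_end, exon_starts, exon_ends


def extract_cds(chrom_seq, strand, cds_start, cds_end, exon_starts, exon_ends):
    """Concatenate exon pieces clipped to [cds_start, cds_end); 0-based, half-open."""
    pieces = []
    for start, end in zip(exon_starts, exon_ends):
        lo, hi = max(start, cds_start), min(end, cds_end)
        if lo < hi:  # skip exons that lie entirely in the UTRs
            pieces.append(chrom_seq[lo:hi])
    cds = "".join(pieces)
    return reverse_complement(cds) if strand == "-" else cds
```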

Positive Set
◦ Downloaded from miRBase: 706 human and 547 mouse miRNAs
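
Selecting one species from a miRBase download can be done with a small FASTA reader. This sketch assumes miRBase-style headers whose first word carries a three-letter species prefix, such as hsa- (human) or mmu- (mouse); the function names are hypothetical, and the header layout should be checked against the release actually used.

```python
def read_fasta(lines):
    """Yield (identifier, sequence) pairs; the identifier is the first word of the header."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)  # sequences may wrap over several lines
    if name is not None:
        yield name, "".join(chunks)


def species_subset(records, prefix):
    """Keep records whose identifier starts with a species prefix such as 'hsa' or 'mmu'."""
    return {name: seq for name, seq in records if name.startswith(prefix + "-")}
```

For example, species_subset(read_fasta(open("hairpin.fa")), "hsa") would return the human entries, assuming the stem-loop file is named hairpin.fa.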

Algorithms for SCFGs
◦ CYK algorithm
  ◦ Finds the optimal (most probable) parse, i.e. alignment, of a sequence to an SCFG; this matters because an ambiguous grammar can parse the same sequence in many ways
◦ Inside algorithm
  ◦ Calculates the probability of a sequence given an SCFG
◦ Inside-outside algorithm
  ◦ Estimates (locally) optimal probability parameters for an SCFG from a set of example sequences, by expectation maximization

Advantages of CYK
◦ A relatively fast algorithm, $O(n^3)$; by taking advantage of the dynamic programming table, we can scan through the sequence in $O(n^2)$
◦ We can quickly compute multiple windows at the same time
◦ It can force an RNA to fold into a specific structure that we specify
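
Because one fill of the dynamic programming table scores every subsequence $x_i \ldots x_j$, the scores of all windows of a given width can be read straight from the table instead of refolding each window. A minimal sketch, assuming a 1-based table `gamma[i][j]` like the one in the stochastic CYK recursion; the function names are hypothetical.

```python
def window_scores(gamma, n, w):
    """Scores of all length-w windows x_i..x_{i+w-1}, read from a filled 1-based table."""
    if not 1 <= w <= n:
        raise ValueError("window width must be between 1 and n")
    return [(i, gamma[i][i + w - 1]) for i in range(1, n - w + 2)]


def best_window(gamma, n, w):
    """(start, score) of the highest-scoring length-w window."""
    return max(window_scores(gamma, n, w), key=lambda item: item[1])
```

A whole-sequence table grows quadratically in memory, so for genome-scale scanning one would typically fill it only for spans up to the largest window width.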

Introduction to the Modified CYK Algorithm
Given $X = x_1 \ldots x_n$ and an SCFG $G$:
◦ Find the optimal parse of $X$
◦ Dynamic programming: $\gamma(i, j, V)$ is the log-likelihood of the most likely parse of $x_i \ldots x_j$ rooted at nonterminal $V$

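Stochastic CYK
With the single nonterminal $S$, $\gamma(i, j)$ stands for $\gamma(i, j, S)$.
Initialization: $\gamma(i, i-1) = \log P(\varepsilon)$
Iteration: for $i = N$ down to $1$, for $j = i$ to $N$:
$$
\gamma(i, j) = \max \begin{cases}
\gamma(i+1, j-1) + \log P(x_i\, S\, x_j) & \text{if } j > i \\
\gamma(i, j-1) + \log P(S\, x_j) \\
\gamma(i+1, j) + \log P(x_i\, S) \\
\max_{i < k < j} \left[ \gamma(i, k) + \gamma(k+1, j) \right] + \log P(S\, S)
\end{cases}
$$
where $P(x_i\, S\, x_j)$ is the probability of the rule $S \to x_i\, S\, x_j$, and likewise for the other rules. Iterating $i$ downward ensures that every shorter subsequence has been scored before it is needed.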
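
The recursion translates directly into code. Below is a Python sketch for the single-nonterminal grammar $S \to a\,S\,b \mid S\,b \mid a\,S \mid S\,S \mid \varepsilon$, with a traceback that writes the optimal parse as a dot-bracket string. The parameter layout (`pair`, `left`, `right`, `bif`, `eps`) and the toy probabilities are hypothetical, chosen only for illustration; they are not the weights estimated from the let-7 sequences.

```python
import math

NEG_INF = float("-inf")


def stochastic_cyk(x, lp):
    """gamma[i][j] = best log-probability of x_i..x_j (1-based, inclusive)."""
    n = len(x)
    gamma = [[NEG_INF] * (n + 1) for _ in range(n + 2)]
    for i in range(1, n + 2):
        gamma[i][i - 1] = lp["eps"]                       # S -> eps
    for i in range(n, 0, -1):                             # i from N down to 1
        for j in range(i, n + 1):                         # j from i up to N
            a, b = x[i - 1], x[j - 1]
            best = max(gamma[i][j - 1] + lp["right"][b],  # S -> S x_j
                       gamma[i + 1][j] + lp["left"][a])   # S -> x_i S
            if j > i:                                     # S -> x_i S x_j
                best = max(best, gamma[i + 1][j - 1] + lp["pair"].get((a, b), NEG_INF))
            for k in range(i + 1, j):                     # S -> S S, i < k < j
                best = max(best, gamma[i][k] + gamma[k + 1][j] + lp["bif"])
            gamma[i][j] = best
    return gamma


def traceback(x, gamma, lp, i, j, out):
    """Write one optimal parse of x_i..x_j into `out` as dot-bracket characters."""
    if j < i or gamma[i][j] == NEG_INF:
        return
    a, b, target = x[i - 1], x[j - 1], gamma[i][j]

    def same(v):
        return math.isclose(v, target, rel_tol=1e-12)

    if j > i and same(gamma[i + 1][j - 1] + lp["pair"].get((a, b), NEG_INF)):
        out[i - 1], out[j - 1] = "(", ")"
        traceback(x, gamma, lp, i + 1, j - 1, out)
    elif same(gamma[i][j - 1] + lp["right"][b]):
        traceback(x, gamma, lp, i, j - 1, out)
    elif same(gamma[i + 1][j] + lp["left"][a]):
        traceback(x, gamma, lp, i + 1, j, out)
    else:
        for k in range(i + 1, j):
            if same(gamma[i][k] + gamma[k + 1][j] + lp["bif"]):
                traceback(x, gamma, lp, i, k, out)
                traceback(x, gamma, lp, k + 1, j, out)
                break


def fold(x, lp):
    """Return (best log-probability, dot-bracket structure) for sequence x."""
    gamma = stochastic_cyk(x, lp)
    out = ["."] * len(x)
    traceback(x, gamma, lp, 1, len(x), out)
    return gamma[1][len(x)], "".join(out)


# Hypothetical toy parameters (illustrative only, not estimated from data).
L = math.log
TOY_LP = {
    "pair": {(a, b): L(0.2) for a, b in ["GC", "CG", "AU", "UA", "GU", "UG"]},
    "left": {c: L(0.1) for c in "ACGU"},
    "right": {c: L(0.1) for c in "ACGU"},
    "bif": L(0.05),
    "eps": L(0.1),
}
```

With these toy values, fold("GAAAC", TOY_LP) returns the structure (...) with log-probability log(2e-5): pairing G with C (0.2) around three unpaired A's (0.1 each, times 0.1 for ε, giving 1e-4) beats leaving everything unpaired (1e-6). Replacing each max with a log-sum-exp over the same terms would give the corresponding inside score.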

Weight Estimation of Each Nonterminal Emission
◦ miRNA let-7: 57 sequences obtained from Rfam
◦ Used R-Coffee to estimate the lengths of the hairpin loop, stem, and bulge
◦ The parameters we estimated seem to work well for the majority of microRNA cases
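
The slides do not say how the estimated lengths were converted into rule weights. One common option, offered here only as an assumption, is a geometric length model: if a loop or stem state repeats with probability $p$ and exits with probability $1-p$, its mean length is $1/(1-p)$, so the maximum-likelihood exit probability is one over the mean observed length. A Python sketch (the function name is hypothetical):

```python
from statistics import mean


def geometric_length_params(lengths):
    """MLE of a geometric length model (support 1, 2, ...) from observed element lengths.

    Returns (p_continue, p_stop), where p_stop = 1 / mean length.
    """
    if not lengths or min(lengths) < 1:
        raise ValueError("need at least one length >= 1")
    p_stop = 1.0 / mean(lengths)
    return 1.0 - p_stop, p_stop
```

For illustrative lengths [3, 4, 5] (not the let-7 estimates), the exit probability is 0.25; the logs of such values could then serve as rule weights in the CYK recursion.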

Results for CYK (plot not available)

RNAfold
◦ The most commonly used tool for predicting RNA secondary structure
◦ All the ab initio approaches and hairpin-loop finders we surveyed currently use RNAfold to estimate a microRNA's structure and its MFE (minimum free energy)
◦ We use RNAfold's MFE as a measuring stick and use some of its structural features to assist our routine
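
RNAfold can be called from a script and its standard output parsed. A minimal sketch, assuming the ViennaRNA `RNAfold` binary is on the PATH: each structure line it prints is a dot-bracket string followed by the MFE in kcal/mol in parentheses, and `--noPS` suppresses the PostScript drawing. The function names are hypothetical.

```python
import subprocess


def parse_rnafold(text):
    """Return (structure, mfe) pairs from RNAfold stdout; header and sequence lines are skipped."""
    results = []
    for line in text.splitlines():
        fields = line.split(None, 1)
        # a structure line starts with a token made only of brackets and dots
        if len(fields) == 2 and set(fields[0]) <= set("()."):
            energy = fields[1].strip().strip("()").strip()  # "( -1.20)" -> "-1.20"
            results.append((fields[0], float(energy)))
    return results


def run_rnafold(seq):
    """Fold one sequence with the RNAfold command-line tool."""
    out = subprocess.run(["RNAfold", "--noPS"], input=seq + "\n",
                         capture_output=True, text=True, check=True).stdout
    return parse_rnafold(out)[0]
```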

Results for RNAfold (plot not available)

CYK/RNAfold Hybrid
◦ I use the following formula: $\mathrm{CombinedScore} = 2 \times \mathrm{CYK} + \mathrm{MFE}$
◦ During the calculation, if RNAfold predicts a structure with two or more hairpin loops, we penalize the CYK score
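
The hybrid rule can be written as a small function. In this sketch, a hairpin is counted as an opening bracket closed with only dots in between. The penalty is subtracted from the CYK score before it is doubled; both its size and this subtract-first order are assumptions, since the slides give neither. The scores are combined exactly as stated, with no sign change applied to the MFE. Function names are hypothetical.

```python
import re

# A hairpin loop in dot-bracket notation: "(" closed by ")" with only unpaired bases between.
HAIRPIN = re.compile(r"\(\.*\)")


def count_hairpins(dot_bracket):
    """Number of hairpin loops in a dot-bracket structure."""
    return len(HAIRPIN.findall(dot_bracket))


def combined_score(cyk, mfe, structure, penalty):
    """2 * CYK + MFE, with the CYK term reduced by a hypothetical `penalty`
    when the RNAfold structure contains two or more hairpin loops."""
    if count_hairpins(structure) >= 2:
        cyk -= penalty
    return 2 * cyk + mfe
```

For example, combined_score(1.0, -0.5, "((...))((...))", 3.0) evaluates to 2 * (1.0 - 3.0) - 0.5 = -4.5, while the single-hairpin structure "(((...)))" with the same scores gives 1.5.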

Z-Score Calculation
◦ To combine the MFE and CYK scores, we randomly sampled 20,000 sequences from the human genome and calculated their MFE and CYK scores, which serve as the background for converting each score to a z-score
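
The z-score of a raw score $s$ is $z = (s - \mu)/\sigma$, where $\mu$ and $\sigma$ are the mean and standard deviation of the background sample. A short sketch; the use of the population rather than the sample standard deviation is an assumption, since the slides do not specify it, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def z_scorer(background):
    """Return a function mapping a raw score to its z-score against a background sample."""
    mu, sigma = mean(background), pstdev(background)
    if sigma == 0:
        raise ValueError("background scores have zero variance")
    return lambda s: (s - mu) / sigma
```

One such function would be built from the background MFE values and another from the background CYK scores, so that both features are on the same scale before they are combined.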

CYK/RNAfold Hybrid Results (plot not available)

Optimized Sensitivity/Specificity Comparison
Method       Human specificity    Human sensitivity   Mouse specificity     Mouse sensitivity
             (8494 pseudo-miRNA)  (706 miRNA)         (10213 pseudo-miRNA)  (547 miRNA)
MFE          73.15%               73.07%              65.83%                66.97%
CYK          79.09%               78.60%              72.19%                72.47%
CYK-Hybrid   81.05%               81.08%              72.17%                71.93%
miRFinder    84.46%               84.84%              82.78%                82.27%

Disadvantage of Our Program
◦ Limited by its structural accuracy

To-Do List
◦ Possibly test the accuracy of CYK's ability to predict the structure of the microRNA
◦ Need to run through the

Summary
◦ We currently have a routine that identifies microRNA with about 81% sensitivity and specificity on the human set (about 72% on mouse), based solely on its structure
◦ We are currently communicating with a student in the UK who published microPred, to see whether our program can be used to retrain their SVM and get a better result

See the Website for More Details
◦ oRNA.html for testing out the grammar
