Published by Kathlyn Marilyn Neal, modified over 8 years ago
1
microRNA Prediction with SCFG and MFE Structure Annotation
Tim Shaw, Ying Zheng, and Bram Sebastian
2
Goal of the Presentation
Introduction to miRNA
Survey of computational and experimental approaches to identify microRNA
CYK Algorithm
Our Methodology
Results/Discussion
Future Directions
3
Computers vs Genetics
4
Background on microRNA and Its Classical Definition
Found in eukaryotes (706 identified in human)
Genome-encoded stem-loop precursor
Generally processed by Dicer and a helicase
Mature microRNA is approximately 22 nucleotides (nt) long
Recognizes target mRNA by base pairing
  ◦ Acts primarily as a gene silencer
  ◦ Some cases of gene enhancement
5
Diagram for miRNA
6
Problems with miRNA Hunting through Lab Experiments
Biology is a network of cause and effect
Expression of some miRNAs may depend on specific environmental triggers
Expression of certain microRNA sequences is hard to detect
Some miRNAs may have physical properties that make them hard to clone, such as sequence composition or post-transcriptional modification
7
Problems with miRNA Hunting through Computational Approaches
Stem-loop structures are common throughout eukaryotic genomes, so a stem-loop alone does not identify a miRNA
Eukaryotic genomes are long, and most computational approaches are not practical for scanning an entire genome
8
Computationally Driven Approaches
Structural information (thermodynamics)
  ◦ RNAz
Homology / conservation of structure (ERPIN, MirScan, snarloop)
  ◦ Stem
  ◦ Loop
  ◦ Target sequence
Machine learning (miRFinder, microPred)
  ◦ Feature selection based on sequence and structural information
9
Tests Done on Those Methodologies
ERPIN (2001) (homology)
  ◦ Good results, but strongly limited by the availability of training data: it detects only 66 of the 706 miRNAs, and only 36 if the human training sequences are removed
miRFinder (2007) (ab initio)
  ◦ Human: specificity 84.46% (1,320 of 8,494 negatives misclassified); sensitivity 84.84% (599 of 706 detected)
  ◦ Mouse: specificity 82.78% (1,759 of 10,213 negatives misclassified); sensitivity 82.27% (450 of 547 detected)
microPred (2009) (ab initio)
  ◦ We found a bug for the author, which is currently being fixed
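The miRFinder percentages follow from the raw counts. Specificity is the fraction of negatives (pseudo-miRNAs) correctly rejected. Sensitivity is the fraction of true miRNAs detected. The minimal sketch below reproduces those figures. The function names are ours and do not come from miRFinder.

```python
def sensitivity(detected: int, positives: int) -> float:
    """Fraction of true miRNAs that were detected."""
    return detected / positives


def specificity(false_positives: int, negatives: int) -> float:
    """Fraction of negative (pseudo-miRNA) sequences correctly rejected."""
    return (negatives - false_positives) / negatives


# miRFinder counts quoted on the slide.
print(f"human: spec {specificity(1320, 8494):.2%}, sens {sensitivity(599, 706):.2%}")
# prints "human: spec 84.46%, sens 84.84%"
print(f"mouse: spec {specificity(1759, 10213):.2%}, sens {sensitivity(450, 547):.2%}")
# prints "mouse: spec 82.78%, sens 82.27%"
```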
10
Negative Set Generation
Sequences were obtained from the CDS regions of the genome
  ◦ Implemented a CDS extractor for ccdsgenes.txt files from the UCSC Genome Browser
CDS means coding region
  ◦ Sequence that codes for protein
Need to implement a new parser based on the cds.txt from the UCSC Genome Browser
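Extracting coding regions amounts to clipping each exon to the [cdsStart, cdsEnd) range. The minimal sketch below assumes ccdsGene-style rows in UCSC's genePred layout, with a leading bin column: bin, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, and so on. It also assumes 0-based, half-open coordinates. The column indices and function names are assumptions; check them against the actual table schema.

```python
# Hypothetical column indices, assuming the ccdsGene/genePred layout:
# bin, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd,
# exonCount, exonStarts, exonEnds, ...
CHROM, STRAND, CDS_START, CDS_END, EXON_STARTS, EXON_ENDS = 2, 3, 6, 7, 9, 10


def parse_coords(field: str) -> list[int]:
    """Parse a comma-terminated coordinate list such as '100,300,'."""
    return [int(v) for v in field.split(",") if v]


def cds_intervals(cds_start: int, cds_end: int,
                  exon_starts: list[int], exon_ends: list[int]) -> list[tuple[int, int]]:
    """Clip each exon [start, end) to the coding range [cds_start, cds_end)."""
    intervals = []
    for start, end in zip(exon_starts, exon_ends):
        start, end = max(start, cds_start), min(end, cds_end)
        if start < end:  # keep only exons that overlap the coding range
            intervals.append((start, end))
    return intervals


def cds_from_row(line: str) -> tuple[str, str, list[tuple[int, int]]]:
    """Return (chrom, strand, coding intervals) for one tab-separated row."""
    f = line.rstrip("\n").split("\t")
    return f[CHROM], f[STRAND], cds_intervals(
        int(f[CDS_START]), int(f[CDS_END]),
        parse_coords(f[EXON_STARTS]), parse_coords(f[EXON_ENDS]))
```

Negative-set sequences would then be cut from these intervals, reverse-complementing those on the minus strand.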
11
Positive Set
Downloaded from miRBase: 706 human and 547 mouse miRNAs
12
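Algorithms for SCFG
CYK algorithm
  ◦ Finds the optimal (most probable) parse, i.e. alignment, of a sequence to an SCFG; needed because the grammar is ambiguous and a sequence has many possible parses
Inside algorithm
  ◦ Calculates the probability of a sequence given an SCFG, summed over all of its parses
Inside-outside algorithm
  ◦ Estimates the probability parameters of an SCFG from a set of example sequences (an expectation-maximization procedure, so the estimates are locally optimal)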
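CYK and Inside fill the same dynamic programming table. They differ only in how alternatives are combined: CYK takes the maximum, while Inside sums (here via log-sum-exp in log space).

The minimal sketch below implements Inside for a toy grammar, S → x_i S x_j | x_i S | S x_j | S S | ε. The production weights and allowed pairs are illustrative assumptions, not the weights used in this project. Bifurcations are restricted to i < k < j, as in the CYK recursion.

```python
import math

# Toy grammar S -> x_i S x_j | x_i S | S x_j | S S | epsilon.
# Hypothetical, illustrative production weights and allowed base pairs.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}
LOG_P = {name: math.log(p) for name, p in
         {"pair": 0.25, "left": 0.25, "right": 0.25, "bifurc": 0.15, "end": 0.10}.items()}


def log_sum_exp(values: list[float]) -> float:
    """Numerically stable log(sum(exp(v))) over the values."""
    top = max(values)
    if top == float("-inf"):
        return top
    return top + math.log(sum(math.exp(v - top) for v in values))


def inside(x: str) -> float:
    """Log score of x summed over all parses of the toy grammar."""
    n = len(x)
    alpha = {}  # alpha[(i, j)]: summed log score of the parses of x[i..j]

    def a(i: int, j: int) -> float:
        # An empty span (j == i - 1) can only be S -> epsilon.
        return LOG_P["end"] if j == i - 1 else alpha[(i, j)]

    for length in range(1, n + 1):  # shorter spans first
        for i in range(n - length + 1):
            j = i + length - 1
            terms = [a(i + 1, j) + LOG_P["left"], a(i, j - 1) + LOG_P["right"]]
            if i < j and (x[i], x[j]) in PAIRS:
                terms.append(a(i + 1, j - 1) + LOG_P["pair"])
            terms += [alpha[(i, k)] + alpha[(k + 1, j)] + LOG_P["bifurc"]
                      for k in range(i + 1, j)]
            alpha[(i, j)] = log_sum_exp(terms)
    return a(0, n - 1)
```

Replacing `log_sum_exp` with `max` turns this into the CYK recursion. For a single base, Inside adds the two parses (left and right emission), whereas CYK keeps only one of them.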
13
Advantages of CYK
Relatively fast: $O(n^3)$, and by reusing the dynamic programming table we can scan through the sequence in $O(n^2)$
We can quickly compute scores for multiple windows at the same time
It can force an RNA to fold into a specific structure that we specify
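One way to score many windows at once is to fill only the cells of the CYK table whose spans are no longer than the window length w. Each window's score can then be read from the cell γ(i, i+w−1). This banded fill costs O(n·w²) rather than O(n³).

The minimal sketch below uses a hypothetical toy grammar with illustrative weights. It is not necessarily how the project's scanner works.

```python
import math

# Toy grammar S -> x_i S x_j | x_i S | S x_j | S S | epsilon, with
# hypothetical, illustrative production weights.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}
LP_PAIR, LP_LEFT, LP_RIGHT, LP_BIFURC, LP_END = (
    math.log(p) for p in (0.25, 0.25, 0.25, 0.15, 0.10))


def window_scores(x: str, w: int) -> list[float]:
    """Best-parse log score of every window x[i:i + w], from one banded table.

    Only spans of length <= w are filled, so the work is O(n * w^2)
    rather than O(n^3) for the full table."""
    if w < 1:
        raise ValueError("window length must be positive")
    n = len(x)
    gamma = {}  # gamma[(i, j)] for spans x[i..j] of length <= w

    def g(i: int, j: int) -> float:
        return LP_END if j == i - 1 else gamma[(i, j)]

    for length in range(1, min(w, n) + 1):  # shorter spans first
        for i in range(n - length + 1):
            j = i + length - 1
            best = max(g(i + 1, j) + LP_LEFT, g(i, j - 1) + LP_RIGHT)
            if i < j and (x[i], x[j]) in PAIRS:
                best = max(best, g(i + 1, j - 1) + LP_PAIR)
            for k in range(i + 1, j):  # bifurcation, i < k < j
                best = max(best, gamma[(i, k)] + gamma[(k + 1, j)] + LP_BIFURC)
            gamma[(i, j)] = best
    # Each window's score is already sitting in the table.
    return [gamma[(i, i + w - 1)] for i in range(n - w + 1)]
```

Each window is scored as if it were an isolated sequence, because a span's best parse depends only on the bases inside it.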
14
Introduction to the Modified CYK Algorithm
Given a sequence $x = x_1 \ldots x_N$ and an SCFG $G$:
  ◦ Find the optimal parse of $x$
  ◦ Dynamic programming: $\gamma(i, j, V)$ is the log-likelihood of the most likely parse of $x_i \ldots x_j$ rooted at nonterminal $V$
15
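Stochastic CYK
(The grammar has a single nonterminal $S$, so $V$ is dropped: $\gamma(i, j) = \gamma(i, j, S)$.)
Initialization: $\gamma(i, i-1) = \log P(S \to \epsilon)$
Iteration: for span lengths $\ell = 1, \ldots, N$, for $i = 1, \ldots, N - \ell + 1$, with $j = i + \ell - 1$:
$$\gamma(i, j) = \max \begin{cases} \gamma(i+1, j-1) + \log P(S \to x_i\, S\, x_j) & (j > i) \\ \gamma(i, j-1) + \log P(S \to S\, x_j) \\ \gamma(i+1, j) + \log P(S \to x_i\, S) \\ \max_{i < k < j} \left[\gamma(i, k) + \gamma(k+1, j)\right] + \log P(S \to S\, S) \end{cases}$$
Shorter spans are filled first, so every $\gamma$ on the right-hand side is already known.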
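The minimal sketch below implements the recursion above. It adds a traceback that turns the best parse into dot-bracket notation: each pair emission S → x_i S x_j becomes a "(" / ")" pair. Indices are 0-based. The production weights and allowed base pairs are hypothetical placeholders, not the weights estimated from let-7.

```python
import math

# Toy grammar S -> x_i S x_j | x_i S | S x_j | S S | epsilon.
# Hypothetical production weights and allowed pairs, for illustration only.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}
LOG_P = {name: math.log(p) for name, p in
         {"pair": 0.25, "left": 0.25, "right": 0.25, "bifurc": 0.15, "end": 0.10}.items()}


def stochastic_cyk(x: str) -> tuple[float, str]:
    """Return (best log score, dot-bracket structure of the best parse)."""
    n = len(x)
    gamma, back = {}, {}  # best score and chosen production for each span

    def g(i: int, j: int) -> float:
        # Initialization: an empty span scores log P(S -> epsilon).
        return LOG_P["end"] if j == i - 1 else gamma[(i, j)]

    for length in range(1, n + 1):  # shorter spans first
        for i in range(n - length + 1):
            j = i + length - 1
            options = [(g(i + 1, j) + LOG_P["left"], ("left",)),
                       (g(i, j - 1) + LOG_P["right"], ("right",))]
            if i < j and (x[i], x[j]) in PAIRS:
                options.append((g(i + 1, j - 1) + LOG_P["pair"], ("pair",)))
            for k in range(i + 1, j):  # bifurcation, i < k < j
                options.append((gamma[(i, k)] + gamma[(k + 1, j)] + LOG_P["bifurc"],
                                ("split", k)))
            # max() keeps the first option among ties.
            gamma[(i, j)], back[(i, j)] = max(options, key=lambda o: o[0])

    # Traceback: follow the chosen productions and mark pair emissions.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j < i:  # empty span, S -> epsilon
            continue
        rule = back[(i, j)]
        if rule[0] == "left":
            stack.append((i + 1, j))
        elif rule[0] == "right":
            stack.append((i, j - 1))
        elif rule[0] == "pair":
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:  # bifurcation at k = rule[1]
            stack += [(i, rule[1]), (rule[1] + 1, j)]
    return g(0, n - 1), "".join(structure)
```

With these placeholder weights, `stochastic_cyk("GGGAAACCC")` returns the structure `"(((...)))"`. A pair emission covers two bases for the cost of one single-base emission, so the parse with the most pairs wins.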
16
Weight Estimation for Each Nonterminal Emission
miRNA let-7: 57 sequences obtained from Rfam
Used R-Coffee to estimate the lengths of the hairpin loop, stem, and bulges
The estimated parameters appear to work well for the majority of microRNAs
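Once a structure is available for each sequence (for example, derived from an R-Coffee alignment), stem, hairpin-loop, and bulge lengths can be read off its dot-bracket string. These can then be averaged across sequences. The minimal sketch below assumes single-hairpin structures. The function names and the example structures in the test are hypothetical.

```python
from statistics import mean


def hairpin_features(db: str) -> dict:
    """Stem pairs, hairpin-loop length, and unpaired bases in bulges/internal
    loops for a single-hairpin dot-bracket string such as '((((..(((....))))))).'."""
    last_open, first_close = db.rfind("("), db.find(")")
    if last_open == -1 or first_close < last_open:
        raise ValueError("expected a single hairpin (every '(' before every ')')")
    loop = first_close - last_open - 1
    stem_region = db[db.find("("):db.rfind(")") + 1]
    return {
        "pairs": db.count("("),
        "hairpin_loop": loop,
        # Dots inside the stem that are not part of the terminal loop.
        "bulge_or_internal": stem_region.count(".") - loop,
    }


def average_features(structures: list[str]) -> dict:
    """Mean of each feature over a set of single-hairpin structures."""
    feats = [hairpin_features(s) for s in structures]
    return {key: mean(f[key] for f in feats) for key in feats[0]}
```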
17
Results for CYK (plot not included)
18
RNAfold
One of the most commonly used tools for predicting RNA secondary structure
All the ab initio approaches and hairpin-loop finders currently use RNAfold to estimate a microRNA's structure and its MFE
We use RNAfold's MFE as a measuring stick and use some of its structural features to assist our routine
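To use RNAfold's MFE in a pipeline, the structure line it prints for each sequence has to be parsed. The minimal sketch below assumes the usual RNAfold output, where that line has the form `dot-bracket (energy)`, e.g. `((((....)))) ( -3.40)`. The example lines in the test are illustrative, not real RNAfold output for any particular sequence.

```python
import re

# Assumed line format: dot-bracket structure, whitespace, energy in parentheses.
_LINE = re.compile(r"^([.()]+)\s+\(\s*([-+]?\d+(?:\.\d+)?)\s*\)\s*$")


def parse_rnafold_line(line: str) -> tuple[str, float]:
    """Split an RNAfold structure line into (dot-bracket, MFE in kcal/mol)."""
    match = _LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not an RNAfold structure line: {line!r}")
    return match.group(1), float(match.group(2))
```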
19
Results for RNAfold (plot not included)
20
CYK-RNAfold Hybrid
We combine the scores as: CombinedScore = 2 × [CYK] + [MFE]
During the calculation, if RNAfold predicts a structure with two or more hairpin loops, we penalize the CYK score
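The minimal sketch below implements this combination rule under three assumptions:

- Both input scores are already oriented so that larger means more miRNA-like; the slides do not state the sign convention.
- A hairpin loop in RNAfold's dot-bracket structure is a "(" followed only by dots and then a ")".
- The penalty is a hypothetical constant, because its actual size is not given.

```python
import re

HAIRPIN_PENALTY = 1.0  # hypothetical; the slides do not give the penalty size


def count_hairpins(structure: str) -> int:
    """Count hairpin loops: a '(' followed only by '.' and then a ')'."""
    return len(re.findall(r"\(\.*\)", structure))


def combined_score(cyk: float, mfe: float, structure: str,
                   penalty: float = HAIRPIN_PENALTY) -> float:
    """CombinedScore = 2 * CYK + MFE, penalizing CYK for multi-hairpin folds.

    Assumes both scores are oriented so that larger means more miRNA-like."""
    if count_hairpins(structure) >= 2:
        cyk -= penalty
    return 2 * cyk + mfe
```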
21
Z-Score Calculation
To combine the MFE and CYK features, we randomly sampled 20,000 sequences from the human genome and calculated their MFE and CYK scores, which serve as the background distributions for the z-scores
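The background sample turns each raw score into a z-score, (score − mean) / standard deviation, which puts CYK and MFE values on a comparable scale. The minimal sketch below treats the genome as a plain string. The window length, seed, and function names are hypothetical.

```python
import random
from statistics import mean, stdev


def sample_windows(genome: str, length: int, count: int, seed: int = 0) -> list[str]:
    """Draw `count` random windows of `length` bases (with replacement)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    last_start = len(genome) - length
    return [genome[s:s + length]
            for s in (rng.randint(0, last_start) for _ in range(count))]


def make_zscorer(background: list[float]):
    """Return a function mapping a raw score to its z-score vs the background."""
    mu, sd = mean(background), stdev(background)
    return lambda score: (score - mu) / sd
```

In this setting, RNAfold and the CYK scorer would be run on every sampled window to build the two background lists. Each candidate's scores would then be passed through the matching z-scorer.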
22
CYK-RNAfold Hybrid Results (plot not included)
23
Optimized Sensitivity/Specificity Comparison
| Method | Human specificity (8494 pseudo-miRNA) | Human sensitivity (706 miRNA) | Mouse specificity (10213 pseudo-miRNA) | Mouse sensitivity (547 miRNA) |
|---|---|---|---|---|
| MFE | 73.15% | 73.07% | 65.83% | 66.97% |
| CYK | 79.09% | 78.60% | 72.19% | 72.47% |
| CYK-Hybrid | 81.05% | 81.08% | 72.17% | 71.93% |
| miRFinder | 84.46% | 84.84% | 82.78% | 82.27% |
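In each row, sensitivity and specificity are nearly equal. That suggests (our reading, not stated on the slide) that each method's threshold was tuned to balance the two. One way to do this is to scan candidate thresholds and keep the one that minimizes |sensitivity − specificity|. The minimal sketch below does that; the scores in the test are hypothetical.

```python
def balanced_threshold(pos_scores: list[float],
                       neg_scores: list[float]) -> tuple[float, float, float]:
    """Return (threshold, sensitivity, specificity) minimizing |sens - spec|.

    A sequence is called a miRNA when its score >= threshold."""
    best = None
    for t in sorted(set(pos_scores) | set(neg_scores)):
        sens = sum(s >= t for s in pos_scores) / len(pos_scores)
        spec = sum(s < t for s in neg_scores) / len(neg_scores)
        gap = abs(sens - spec)
        if best is None or gap < best[0]:  # keep the first best threshold
            best = (gap, t, sens, spec)
    return best[1:]
```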
24
Disadvantage of Our Program
Its accuracy is limited by the accuracy of the predicted structure
25
To-Do List
Possibly test the accuracy of CYK's ability to predict the structure of the microRNA
Need to run through the
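Testing how well CYK predicts structure could use the standard base-pair comparison against a reference structure. Sensitivity is the fraction of reference pairs recovered. Positive predictive value (PPV) is the fraction of predicted pairs that are correct. The minimal sketch below assumes balanced dot-bracket strings of equal length. Where the reference structures would come from is left open.

```python
def base_pairs(structure: str) -> set[tuple[int, int]]:
    """Return the (i, j) base pairs of a balanced dot-bracket string."""
    stack, pairs = [], set()
    for j, char in enumerate(structure):
        if char == "(":
            stack.append(j)
        elif char == ")":
            pairs.add((stack.pop(), j))
    return pairs


def compare_structures(predicted: str, reference: str) -> tuple[float, float]:
    """Base-pair (sensitivity, PPV) of a predicted structure against a reference."""
    pred, ref = base_pairs(predicted), base_pairs(reference)
    correct = len(pred & ref)
    # Convention: a score is 1.0 when there is nothing to find or nothing predicted.
    sensitivity = correct / len(ref) if ref else 1.0
    ppv = correct / len(pred) if pred else 1.0
    return sensitivity, ppv
```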
26
Summary
We currently have a routine that identifies microRNA with about 81% sensitivity and specificity on the human set, based solely on its structure
We are in contact with a student from the UK who published microPred, to see whether our program can be used to retrain their SVM and get a better result
27
See Website for More Details
http://128.192.76.177/ProjectUpdate/microRNA.html
http://128.192.76.177/CYK.html for testing out the grammar