MicroRNA identification based on sequence and structure alignment Presented by - Neeta Jain Xiaowo Wang†, Jing Zhang†, Fei Li, Jin Gu, Tao He, Xuegong.

Presentation transcript:

MicroRNA identification based on sequence and structure alignment
Xiaowo Wang†, Jing Zhang†, Fei Li, Jin Gu, Tao He, Xuegong Zhang and Yanda Li
Presented by Neeta Jain

Outline
- Introduction
- Motivation
- Experiment
  - Materials
  - Methods
- Results
- Conclusion

Introduction
- What are miRNAs and why are they important?
  - miRNAs are ~22 nt long non-coding RNAs.
  - They are derived from ~70 nt precursors, which typically have a hairpin structure.
- Importance of miRNAs: they regulate the expression of target genes via complementary base-pair interactions.

Motivation
- Since miRNAs are short (~22 nt), conventional sequence alignment methods can only find relatively close homologues.
- It has been reported that miRNA genes are more conserved in their secondary structure than in their primary structure.
- This paper exploits this secondary structure conservation and proposes a novel computational approach to detect miRNAs based on both sequence and structure alignment.
- The authors devised a tool, miRAlign, and compared its performance with existing search methods such as BLAST and ERPIN.

Experiment
- Materials
  - Reference sets
    - Consists of 1298 miRNAs from 12 species, of which 1054 were animal miRNAs.
    - Train_All: animal miRNAs and their precursors (1104) composed the raw training set.
    - Train_Sub_1: all animal miRNAs except those from C. briggsae.
    - Train_Sub_2: all animal miRNAs except those from C. briggsae and C. elegans.
  - Genomic sequences
    - Sequences of 6 species were used.

Methods
- Preprocessing
  - Known precursors from the training set are used as BLAST queries against the genome.
  - Potential regions are cut from the genome with 70 nt of flanking sequence at each end.
  - These regions are scanned using a 100 nt window with a 10 nt step.
  - Sequences that overlap repeat sequences are discarded.
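The window scan in the preprocessing step can be sketched as follows. This is only an illustration under stated assumptions: the function name `sliding_windows` is hypothetical, and the slides do not say how regions shorter than the window or trailing bases that do not fill a whole step are handled, so the choices below are guesses.

```python
def sliding_windows(region, window=100, step=10):
    """Yield (start, subsequence) pairs for windows scanned along a region.

    Assumptions (not stated on the slides): a region shorter than the
    window yields one shortened window, and trailing bases that do not
    fill a whole step are not given an extra window.
    """
    last_start = max(len(region) - window, 0)
    for start in range(0, last_start + 1, step):
        yield start, region[start:start + window]


# A 240 nt region (e.g. a 100 nt hit plus 70 nt flanks on each side)
# produces windows starting at 0, 10, ..., 140.
windows = list(sliding_windows("A" * 240))
```

Each window would then be passed on to the hairpin-folding step as a candidate sequence.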

Methods (contd)
- miRAlign
  - Secondary structure prediction
    - Both the candidate sequence and its reverse complement are analyzed by RNAfold to predict hairpins.
    - Only hairpins with MFE lower than -20 kcal/mol are retained.
  - Pairwise sequence alignment
    - Sequences from the previous step are aligned pairwise to all the ~22 nt known miRNA sequences from the training set.
    - The sequence similarity score between the candidate and each known mature miRNA is calculated by CLUSTALW.
    - If the score exceeds a user-defined threshold, the candidate/known-miRNA pair is kept for further analysis.
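The strand handling and the MFE cutoff from the secondary structure step can be sketched as below. This is a minimal sketch, not miRAlign's code: it assumes the folding energies have already been computed by an external tool, and the names `reverse_complement` and `retain_stable_hairpins` are hypothetical.

```python
# Complement table for genomic (DNA) candidate sequences; N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def retain_stable_hairpins(folded, mfe_cutoff=-20.0):
    """Keep (name, mfe) entries whose MFE is strictly below the cutoff.

    `folded` is assumed to hold energies in kcal/mol that were computed
    beforehand; this function does not run any folding itself.
    """
    return [(name, mfe) for name, mfe in folded if mfe < mfe_cutoff]


candidate = "ATGC"
both_strands = [candidate, reverse_complement(candidate)]
kept = retain_stable_hairpins([("c1", -25.3), ("c2", -20.0), ("c3", -12.1)])
```

The comparison is strict because the slide says "lower than"; whether a hairpin at exactly -20 kcal/mol was kept in the original tool is not stated.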

Methods (contd)
- Checking the miRNA's position on the stem-loop
  - Three properties of the miRNA's position are considered:
    - It should not be located on the terminal loop of the hairpin.
    - It should be located on the same arm of the hairpin as its known homologue.
    - Its position on the hairpin should not differ too much from that of its known homologues.
  - Position difference of miRNA on precursors A and B:

Methods (contd)
- RNA secondary structure alignment
  - RNAforester computes a pairwise structure alignment and gives a similarity score.
  - The score is the sum over all base (or base-pair) matches, insertions, and deletions.
  - Normalized similarity score of structure C and m is given as:
  - where C is the candidate sequence, m is a known pre-miRNA, sigma_local(C, m) is the raw local alignment score between C and m, and sigma(m, m) is the self-alignment score of m.
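The formula itself did not survive in the transcript. Going only by the symbol definitions above, the most natural reading is the raw local score divided by the self-alignment score of the known pre-miRNA. The name nss and the absence of a scaling factor are assumptions here, not taken from the slides:

```latex
\[
  \mathrm{nss}(C, m) \;=\; \frac{\sigma_{\mathrm{local}}(C, m)}{\sigma(m, m)}
\]
```

Under this reading, a candidate whose structure aligns to m as well as m aligns to itself scores 1. Since a cutoff of tss > 35 is used later in the results, the original may have applied a constant scaling (for example a percentage), but that cannot be confirmed from the transcript.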

Methods (contd)
- Total similarity score
  - After aligning all potential homologue pairs, a total similarity score (tss) is assigned to each candidate sequence.
  - Where C is the candidate sequence; R is the set composed of all C's

Methods (contd) Summary -

Results
- Application on C. briggsae
  - Detection of miRNA homologues: miRAlign was applied to the C. briggsae data with training set Train_Sub_1, and sensitivity and specificity were recorded.
  - Identification of miRNAs in distantly related species: miRAlign was applied to the C. briggsae data with training set Train_Sub_2, and sensitivity and specificity were recorded.
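The slides do not spell out how sensitivity and specificity were computed. The sketch below assumes the standard definitions over true/false positives and negatives; the function names are hypothetical and the counts in the usage lines are illustrative only.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of known miRNAs that were recovered: TP / (TP + FN)."""
    total = true_pos + false_neg
    if total == 0:
        raise ValueError("no positive examples")
    return true_pos / total


def specificity(true_neg, false_pos):
    """Fraction of non-miRNAs that were correctly rejected: TN / (TN + FP)."""
    total = true_neg + false_pos
    if total == 0:
        raise ValueError("no negative examples")
    return true_neg / total


# Illustrative counts only: 37 of 38 known miRNAs recovered.
sens = sensitivity(true_pos=37, false_neg=1)
spec = specificity(true_neg=90, false_pos=10)
```

Some miRNA prediction papers report "specificity" as TP / (TP + FP) instead, so the definition used for the plotted results should be checked against the original paper.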

Graph 1 - Results (contd)

Graph 2 - Results (contd)

Comparison of miRAlign with BLAST - Results (contd)

Comparison of miRAlign with ERPIN - Results (contd)

Results (contd)
- Other results:
  - miRAlign was applied to A. gambiae, and 59 putative miRNAs with tss > 35 were detected. This was validated when 38 A. gambiae miRNAs were reported in the MicroRNA Registry 6.0, 37 of which were covered by miRAlign.
  - miRAlign was also applied to a plant, Zea mays, and detected 28 of the 40 known Zea mays miRNAs.

Conclusion
- By combining sequence and structure alignments, miRAlign performs better than previously reported homologue search methods.
- Although miRAlign was based on animal data, the miRNAs predicted in Zea mays indicate that miRAlign can be applied to plants. Further investigation of this is underway.

THANK YOU Questions ??