MicroRNA identification based on sequence and structure alignment Presented by - Neeta Jain Xiaowo Wang†, Jing Zhang†, Fei Li, Jin Gu, Tao He, Xuegong.


1 MicroRNA identification based on sequence and structure alignment Presented by - Neeta Jain Xiaowo Wang†, Jing Zhang†, Fei Li, Jin Gu, Tao He, Xuegong Zhang and Yanda Li

2 Outline
- Introduction
- Motivation
- Experiment
  - Materials
  - Methods
- Results
- Conclusion

3 Introduction
What are miRNAs and why are they important?
- miRNAs are ~22 nt long non-coding RNAs.
- They are derived from their ~70 nt precursors, which typically have a hairpin structure.
- Importance of miRNAs: they have been found to regulate the expression of target genes via complementary base-pair interactions.

4 Motivation
- Since miRNAs are short (~22 nt), conventional sequence alignment methods can only find relatively close homologues.
- It has been reported that miRNA genes are more conserved in their secondary structure than in their primary sequence.
- This paper exploits this secondary structure conservation and proposes a novel computational approach to detect miRNAs based on both sequence and structure alignment.
- The authors devised a tool, miRAlign, and compared its performance with existing search methods such as BLAST and ERPIN.

5 Experiment: Materials
- Reference sets
  - The reference data consist of 1298 miRNAs from 12 species, of which 1054 are animal miRNAs.
  - The 1054 animal miRNAs and their 1104 precursors compose the raw training set Train_All.
  - Train_Sub_1: all animal miRNAs except those from C. briggsae
  - Train_Sub_2: all animal miRNAs except those from C. briggsae and C. elegans
- Genomic sequences
  - Sequences of 6 species were used.

6 Methods: Preprocessing
- Known precursors from the training set are searched against the genome with BLAST.
- Potential regions are cut from the genome with 70 nt of flanking sequence at each end.
- These regions are scanned using a 100 nt window with a 10 nt step.
- Sequences that overlap repeat sequences are discarded.
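The cutting and scanning steps above can be sketched as follows. This is only an illustration of the stated parameters (70 nt flanks, 100 nt window, 10 nt step); the function names, the half-open hit coordinates, and the choice to skip windows shorter than 100 nt are assumptions, not details taken from miRAlign.

```python
from typing import Iterator, Tuple

FLANK = 70    # nt of flanking sequence added to each end of a BLAST hit
WINDOW = 100  # nt window length
STEP = 10     # nt step between consecutive windows


def cut_region(genome: str, hit_start: int, hit_end: int,
               flank: int = FLANK) -> Tuple[int, str]:
    """Extend the hit [hit_start, hit_end) by `flank` nt on each side,
    clipped to the genome, and return (region_start, region_sequence)."""
    start = max(0, hit_start - flank)
    end = min(len(genome), hit_end + flank)
    return start, genome[start:end]


def scan_windows(region: str, window: int = WINDOW,
                 step: int = STEP) -> Iterator[Tuple[int, str]]:
    """Yield (offset, subsequence) for every full-length window.
    Assumption: trailing windows shorter than `window` are skipped."""
    for offset in range(0, max(len(region) - window, 0) + 1, step):
        candidate = region[offset:offset + window]
        if len(candidate) == window:
            yield offset, candidate


# Usage with a made-up genome and hit coordinates.
genome = "ACGT" * 100
region_start, region = cut_region(genome, 150, 250)
candidates = list(scan_windows(region))
print(region_start, len(region), len(candidates))
```

Each window would then be passed on as a candidate sequence; repeat filtering is not shown here because the slides do not say how repeats were identified.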

7 Methods (contd): miRAlign
- Secondary structure prediction
  - Both the candidate sequence and its reverse complement are analyzed by RNAfold to predict hairpins.
  - Only hairpins with an MFE lower than -20 kcal/mol are retained.
- Pairwise sequence alignment
  - Sequences from the previous step are aligned pairwise to all the ~22 nt known miRNA sequences from the training set.
  - The sequence similarity score between the candidate and each known mature miRNA is calculated by CLUSTALW.
  - If the score exceeds a user-defined threshold, the candidate/known-miRNA pair is kept for further analysis.
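A minimal sketch of the structure-prediction filter described above: fold both strands and keep those whose MFE is below -20 kcal/mol. The folding routine is passed in as a parameter, so the names `retain_stable_hairpins` and `FoldFn` are hypothetical. With the ViennaRNA Python bindings installed, `RNA.fold` should fit this shape, since it returns a `(structure, mfe)` pair. The sketch checks only the energy cutoff, not whether the predicted structure is actually a hairpin.

```python
from typing import Callable, List, Tuple

MFE_CUTOFF = -20.0  # kcal/mol, the cutoff quoted on the slide

# Maps each base to its complement; T is accepted and the result is RNA.
_COMPLEMENT = str.maketrans("ACGUTacgut", "UGCAAugcaa")

# A folding function takes a sequence and returns (dot-bracket, MFE).
FoldFn = Callable[[str], Tuple[str, float]]


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an RNA/DNA sequence as RNA."""
    return seq.translate(_COMPLEMENT)[::-1]


def retain_stable_hairpins(candidate: str, fold: FoldFn,
                           cutoff: float = MFE_CUTOFF
                           ) -> List[Tuple[str, str, float]]:
    """Fold the candidate and its reverse complement; keep the strands
    whose minimum free energy is strictly lower than `cutoff`."""
    kept = []
    for strand in (candidate, reverse_complement(candidate)):
        structure, mfe = fold(strand)
        if mfe < cutoff:
            kept.append((strand, structure, mfe))
    return kept
```

Passing the folding function as an argument keeps the sketch runnable without ViennaRNA and makes the cutoff logic easy to test with a stub.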

8 Methods (contd): Checking the miRNA's position on the stem-loop
Three properties of the miRNA's position are considered:
- It should not lie on the terminal loop of the hairpin.
- It should lie on the same arm of the hairpin as its known homologue.
- The position of the potential miRNA on the hairpin should not differ too much from that of its known homologues.
Position difference of miRNA on precursors A and B: (formula not preserved in this transcript)
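The first two position checks can be sketched on a dot-bracket structure as below. This is an illustration under assumptions: the hairpin is a single stem-loop, miRNA positions are half-open `[start, end)` intervals, and any overlap with the terminal loop counts as a failure. The function names are hypothetical, and the third check is omitted because the position-difference formula is not available in the transcript.

```python
from typing import Optional, Tuple


def terminal_loop(structure: str) -> Tuple[int, int]:
    """Return the half-open interval [start, end) of the terminal loop
    of a single stem-loop given in dot-bracket notation."""
    last_open = structure.rfind("(")
    first_close = structure.find(")")
    if last_open == -1 or first_close == -1 or last_open > first_close:
        raise ValueError("structure is not a single stem-loop")
    return last_open + 1, first_close


def arm_of(structure: str, start: int, end: int) -> Optional[str]:
    """Return '5p' or '3p' for a miRNA at [start, end), or None if it
    overlaps the terminal loop (strict interpretation, an assumption)."""
    loop_start, loop_end = terminal_loop(structure)
    if end <= loop_start:
        return "5p"
    if start >= loop_end:
        return "3p"
    return None


def passes_arm_checks(cand_structure: str, cand_span: Tuple[int, int],
                      known_structure: str, known_span: Tuple[int, int]
                      ) -> bool:
    """True if the candidate miRNA avoids the terminal loop and sits on
    the same arm as the known homologue."""
    cand_arm = arm_of(cand_structure, *cand_span)
    known_arm = arm_of(known_structure, *known_span)
    return cand_arm is not None and cand_arm == known_arm
```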

9 Methods (contd): RNA secondary structure alignment
- RNAforester computes a pairwise structure alignment and gives a similarity score.
- The score is the sum of the scores for all base (or base-pair) matches, insertions and deletions.
- The normalized similarity score of structures C and m is given as: (formula not preserved in this transcript)
  where C is the candidate sequence, m is a known pre-miRNA, sigma_local(C, m) is the raw local alignment score between C and m, and sigma(m, m) is the self-alignment score of m.
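The formula itself is missing from the transcript. One natural reading of the definitions is that the raw local score is divided by the self-alignment score of the known pre-miRNA, so that a perfect match scores 1. The sketch below encodes that reading only; the ratio form, the function name, and the absence of any scaling factor are all assumptions rather than the paper's stated definition.

```python
def normalized_similarity(raw_local_score: float, self_score: float) -> float:
    """Assumed form: sigma_local(C, m) / sigma(m, m).

    raw_local_score: raw local structure-alignment score between the
        candidate C and the known pre-miRNA m.
    self_score: score of aligning m against itself, taken here as the
        maximum attainable score for m.
    """
    if self_score <= 0:
        raise ValueError("self-alignment score must be positive")
    return raw_local_score / self_score


# Made-up scores, for illustration only.
print(normalized_similarity(45.0, 90.0))
```

Dividing by the self-alignment score removes the dependence on the length of m, which is the usual motivation for this kind of normalization.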

10 Methods (contd): Total similarity score
- After aligning all potential homologue pairs, a total similarity score (tss) is assigned to each candidate sequence.
  where C is the candidate sequence and R is the set composed of all C’s

11 Methods (contd) Summary -

12 Results: Application on C. briggsae
- Detection of miRNA homologues: miRAlign was applied to the C. briggsae data with training set Train_Sub_1, and sensitivity and specificity were recorded.
- Identification of miRNAs in distantly related species: miRAlign was applied to the C. briggsae data with training set Train_Sub_2, and sensitivity and specificity were recorded.
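For reference, the two measures can be computed from confusion-matrix counts as sketched below. The slides do not say how sensitivity and specificity were defined in the paper, so the standard textbook definitions are assumed here, and the counts in the usage lines are invented for illustration.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of real miRNAs that were detected: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of non-miRNAs correctly rejected: TN / (TN + FP)."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, not results from the paper.
print(sensitivity(true_pos=45, false_neg=5))
print(specificity(true_neg=990, false_pos=10))
```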

13 Graph 1 - Results (contd)

14 Graph 2 - Results (contd)

15 Comparison of miRAlign with BLAST - Results (contd)

16 Comparison of miRAlign with ERPIN - Results (contd)

17 Results (contd): Other results
- miRAlign was applied to A. gambiae, and 59 putative miRNAs with tss > 35 were detected. This was validated when 38 A. gambiae miRNAs were reported in the MicroRNA Registry 6.0 and 37 of them were covered by miRAlign.
- miRAlign was also applied to a plant, Zea mays, and detected 28 of the 40 known Zea mays miRNAs.
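The thresholding and the coverage figures above amount to simple arithmetic, sketched here. The threshold of 35 and the counts 37 of 38 and 28 of 40 come from the slide; the function names and the candidate identifiers in the usage example are hypothetical.

```python
from typing import Dict, List

TSS_THRESHOLD = 35.0  # cutoff quoted on the slide for A. gambiae


def putative_mirnas(tss_by_candidate: Dict[str, float],
                    threshold: float = TSS_THRESHOLD) -> List[str]:
    """Return the candidates whose tss is strictly above the threshold."""
    return sorted(name for name, tss in tss_by_candidate.items()
                  if tss > threshold)


def coverage(found: int, known: int) -> float:
    """Fraction of known miRNAs recovered by the search."""
    return found / known


# Made-up candidates to show the filter.
print(putative_mirnas({"cand_a": 41.2, "cand_b": 35.0, "cand_c": 12.7}))

# Coverage for the two species reported on the slide.
print(f"A. gambiae: {coverage(37, 38):.1%}")
print(f"Zea mays:   {coverage(28, 40):.1%}")
```

In other words, about 97% of the registered A. gambiae miRNAs and 70% of the known Zea mays miRNAs were recovered.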

18 Conclusion
- By combining sequence and structure alignments, miRAlign performs better than previously reported homologue search methods.
- Although miRAlign was based on animal data, the miRNAs predicted in Zea mays indicate that miRAlign can be applied to plants. Further investigation of this is underway.

19 THANK YOU Questions ??

