Published by Charles Webster; modified over 6 years ago
1
Identifying Conserved microRNAs in a Large Dataset of Wheat Small RNAs
Md Safiur Rahman Mahdi. Advisor: Dr. Michael Domaratzki, Department of Computer Science. July 27, 2015. Dear audience, good afternoon, and welcome to my thesis defence. My thesis title is “Identifying Conserved microRNAs in a Large Dataset of Wheat Small RNAs”. My advisor is Dr. Michael Domaratzki from the Department of Computer Science.
2
Key concepts: microRNA (miRNA), small RNA, differential expression
3
From cell to DNA: cells are the basic building blocks of all living things. Image taken from:
4
From DNA to protein: transcription and translation
Central dogma of molecular biology: DNA (which contains genes) → RNA → proteins. Image taken from:
5
Noncoding RNA (ncRNA) Image taken from:
6
miRNA: how does it work? Some genes encode small non-coding RNAs called miRNAs, which are only about 22 nucleotides long. miRNAs do not code for proteins; they act in a completely different way: a miRNA finds a complementary region in a messenger RNA (mRNA) and binds to it, which blocks translation of that mRNA into protein. miRNAs are fundamental regulators of gene expression. Image taken from:
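The binding step described above (a miRNA pairing with a complementary region of its target mRNA) can be illustrated with a minimal Python sketch. It assumes perfect Watson-Crick complementarity and ignores G:U wobble pairs and mismatches, so it simplifies real plant miRNA target recognition; the function names and toy sequences are hypothetical.

```python
# Watson-Crick complements in the RNA alphabet; G:U wobble pairs are ignored here.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (read 5' to 3')."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def find_target_sites(mirna: str, mrna: str) -> list[int]:
    """Return 0-based start positions in mrna that are perfectly complementary to mirna."""
    site = reverse_complement(mirna)
    mrna = mrna.upper()
    hits = []
    start = mrna.find(site)
    while start != -1:
        hits.append(start)
        start = mrna.find(site, start + 1)
    return hits


# Toy sequences, not real wheat data.
print(find_target_sites("AUGC", "AAGCAUAAGCAU"))  # [2, 8]
```

Searching for the reverse complement turns "find a region the miRNA can pair with" into an ordinary substring search.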
7
miRNA lengths (figure): 1000 nt; 70-600 nt; mature/star miRNA 17-24 nt (5' to 3')
There should be a 0-4 nt difference between the mature miRNA and the star miRNA [Kozomara et al., 2014]
8
What is differential expression?
Gene expression that responds to signals or triggers: a change in expression level between two sample groups (control vs. treatment). Up-regulated: increased expression. Down-regulated: decreased expression.
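The up-/down-regulated distinction can be sketched as a log2 fold change between the two sample groups. This is a deliberately simplified illustration, assuming normalized counts (for example RPM), a pseudocount of 1, and a hypothetical threshold of |log2 fold change| ≥ 1. It is not how edgeR works: edgeR fits negative binomial models and tests each change for statistical significance.

```python
import math


def log2_fold_change(control: list[float], treatment: list[float], pseudocount: float = 1.0) -> float:
    """log2 of (mean treatment + pseudocount) / (mean control + pseudocount)."""
    mean_control = sum(control) / len(control)
    mean_treatment = sum(treatment) / len(treatment)
    return math.log2((mean_treatment + pseudocount) / (mean_control + pseudocount))


def classify(lfc: float, threshold: float = 1.0) -> str:
    """Label a change using a hypothetical |log2 fold change| threshold."""
    if lfc >= threshold:
        return "up-regulated"
    if lfc <= -threshold:
        return "down-regulated"
    return "unchanged"


# Three control and three treatment replicates (toy values).
lfc = log2_fold_change([3, 3, 3], [15, 15, 15])
print(lfc, classify(lfc))  # 2.0 up-regulated
```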
9
Motivation: wheat is an 11-billion-dollar industry [NRCC, 2015]
How can we improve wheat breeding in the face of possible climate change? Which miRNAs are differentially expressed under different stresses? NRCC = National Research Council Canada
10
Related work
Mayer et al., 2014: 98,068 putative precursor miRNAs; 52 miRNA sequences
Sun et al., 2014: used Mayer et al.’s precursors; 260 mature and star miRNAs
11
Contribution: designed and implemented a toolchain that identifies conserved miRNAs; identified differentially expressed miRNAs
12
Methodology (flowchart): 4 conditions (no stress/control, heat at 37 °C, light, UV for 2 min) × 6 days × 3 replicates = 72 input files (~523 million reads, 21 GB) → filtering → 15,158 sequences → BLASTn against plant miRNAs → conserved miRNAs
13
Tools used: Python/Biopython (Ubuntu Linux system); bash shell scripting (Hermes, WestGrid); BLAST; Bowtie2; MAFFT; RNAfold
14
Unique sequence identification
Each distinct read is recorded with its sequence, read count, and normalized read count (reads per million, RPM), where RPM = 1,000,000 × read count / total number of sequences
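Unique sequence identification and the RPM formula can be sketched as follows. This is a minimal illustration, assuming the reads come from FASTQ files (sequence on the second line of each 4-line record) and that "total number of sequences" means the total number of reads in the library; the function names are hypothetical, not the thesis toolchain.

```python
from collections import Counter
from typing import Iterable, Iterator


def iter_fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (2nd line) of every 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip().upper()


def collapse_reads(reads: Iterable[str]) -> Counter:
    """Collapse raw reads into distinct sequences with their read counts."""
    return Counter(reads)


def rpm(counts: Counter) -> dict[str, float]:
    """Reads per million: 1,000,000 * read count / total reads in the library."""
    total = sum(counts.values())
    return {seq: 1_000_000 * n / total for seq, n in counts.items()}


fastq = ["@r1", "ACGT", "+", "IIII", "@r2", "ACGT", "+", "IIII",
         "@r3", "ACGT", "+", "IIII", "@r4", "GGCC", "+", "IIII"]
print(rpm(collapse_reads(iter_fastq_sequences(fastq))))  # {'ACGT': 750000.0, 'GGCC': 250000.0}
```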
15
Removal of ncRNA sequences
Split each unique-sequence file into 300 files; ran BLASTN against the Rfam database; aggregated the 300 result files into a single file
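The split/search/aggregate pattern can be sketched as below, presumably so that the BLASTN searches can run as independent jobs on a cluster. This is a minimal sketch, assuming FASTA input and hypothetical file names such as `<stem>.part000.fa`; the BLASTN call against Rfam on each chunk is not shown, and Biopython's `SeqIO` could replace the hand-written parser.

```python
from pathlib import Path


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs."""
    records, header, parts = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(parts)))
            header, parts = line[1:], []
        else:
            parts.append(line)
    if header is not None:
        records.append((header, "".join(parts)))
    return records


def chunk_records(records: list, n_parts: int = 300) -> list[list]:
    """Split records into at most n_parts contiguous, roughly equal chunks."""
    if not records:
        return []
    size = -(-len(records) // n_parts)  # ceiling division, so no record is dropped
    return [records[i:i + size] for i in range(0, len(records), size)]


def write_chunks(chunks: list[list], out_dir: Path, stem: str) -> list[Path]:
    """Write each chunk to its own FASTA file named <stem>.partNNN.fa."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, chunk in enumerate(chunks):
        path = out_dir / f"{stem}.part{i:03d}.fa"
        path.write_text("".join(f">{h}\n{s}\n" for h, s in chunk))
        paths.append(path)
    return paths


def aggregate(paths: list[Path], out_path: Path) -> None:
    """Concatenate per-chunk files (e.g. per-chunk BLAST outputs) into one file, in order."""
    with out_path.open("w") as out:
        for path in paths:
            out.write(path.read_text())
```

Contiguous chunks keep the original record order, so concatenating the chunk outputs in order reproduces a single combined result.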
16
Filtering: consistent naming; at least 10 RPM [Montes et al., 2014]
Identified 15,158 sequences in total
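The 10 RPM cutoff can be sketched as a simple threshold filter. The slide does not say in which libraries the threshold must be met; this minimal sketch assumes, hypothetically, that a sequence is kept if it reaches 10 RPM in at least one library, and the "consistent naming" criterion is not modeled.

```python
def filter_by_rpm(rpm_by_sample: dict[str, dict[str, float]], min_rpm: float = 10.0) -> set[str]:
    """Return sequences whose RPM reaches min_rpm in at least one sample (an assumed rule)."""
    keep = set()
    for sample_rpm in rpm_by_sample.values():
        keep.update(seq for seq, value in sample_rpm.items() if value >= min_rpm)
    return keep


# Hypothetical sample names and RPM values.
rpms = {"control_day0_rep1": {"A": 12.0, "B": 3.0}, "heat_day0_rep1": {"B": 9.9, "C": 10.0}}
print(sorted(filter_by_rpm(rpms)))  # ['A', 'C']
```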
17
Conserved miRNA identification
Used the miRBase miRNA database; discarded all non-plant miRNAs; ran BLASTN against miRBase; allowed 0-4 mismatches and no indels [Michael Axtell]
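The "0-4 mismatches, no indels" rule can be applied to BLASTN tabular output. This minimal sketch assumes the default 12-column `-outfmt 6` layout and that the read length is known separately (it is not a default column). The full-length requirement is an added assumption: BLAST does not count unaligned read ends as mismatches.

```python
from typing import Iterable, Iterator

# Column order of the default BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_blast_tab(lines: Iterable[str]) -> Iterator[dict[str, str]]:
    """Yield one dict per hit, skipping blank and comment lines."""
    for line in lines:
        if line.strip() and not line.startswith("#"):
            yield dict(zip(FIELDS, line.rstrip("\n").split("\t")))


def accept_hit(hit: dict[str, str], query_len: int, max_mismatches: int = 4) -> bool:
    """Keep ungapped hits with at most max_mismatches mismatches that span the whole read."""
    no_indels = int(hit["gapopen"]) == 0
    few_mismatches = int(hit["mismatch"]) <= max_mismatches
    # Assumption: require the alignment to cover the read end to end.
    full_length = int(hit["qstart"]) == 1 and int(hit["qend"]) == query_len
    return no_indels and few_mismatches and full_length
```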
18
Conserved miRNA identification [continued]
19
Multiple sequence alignment
Figure labels: matched species; Bowtie2 mapping to wheat genome contigs; wheat, precursor, and conserved miRNA sequences; experimental sequences
20
Flowchart: 4 conditions (no stress/control, heat at 37 °C, light, UV for 2 min) × 6 days × 3 replicates = 72 input files (~523 million reads, 21 GB) → filtering → 15,158 sequences → Bowtie2 against putative precursors (Mayer et al.’s supplementary materials) → putative mature miRNAs → star miRNA prediction → conserved miRNAs
21
Matched species
22
Differential gene expression
Input from Mayer et al. and Sun et al.: 36 miRNAs, 232 sequences → edgeR comparisons for each day. Day 0: control vs. heat, control vs. light, control vs. UV. Day 1: control vs. heat, control vs. light, control vs. UV. ………… Day 10: control vs. heat, control vs. light, control vs. UV.
23
Result - miRBase: 4 conditions (no stress/control, heat at 37 °C, light, UV for 2 min) × 6 days × 3 replicates = 72 input files (~523 million reads, 21 GB) → filtering → 15,158 sequences → BLASTn against plant miRNAs → 87 plant miRNAs, matched by 613 experimental sequences
24
Result - Mayer et al. and Sun et al.
Flowchart: 4 conditions (no stress/control, heat at 37 °C, light, UV for 2 min) × 6 days × 3 replicates = 72 input files (~523 million reads, 21 GB) → filtering → 15,158 sequences → Bowtie2 against putative precursors → putative mature miRNAs → star miRNA prediction → 36 wheat miRNAs (232 experimental sequences). Sources: Mayer et al.’s and Sun et al.’s supplementary materials.
25
Result - differential gene expression
26
miRNAs: day 7, all stresses
27
miRNAs: all days, heat stress
miR398, miR5064, miR2020b
28
Number of differentially expressed miRNA families
Heat: 34 families; light: 8 families; UV: 7 families
29
Per day expression
30
Expression of miRNA families
miR395 and miR398 were strongly suppressed. miR1439, miR2020, miR5064, and miR5175 were up-regulated (+) under heat. miR395 was suppressed on all days and under all stresses.
31
Toolchain validation: validated our toolchain with a Brassica rapa dataset [Bilichak et al., 2015] that includes pollen, embryo, endosperm, and progeny tissue, under control and heat stress only
32
Identified the same differential expression
The same miRNA, miR168, is differentially expressed in the endosperm tissue of heat-stressed plants. Bilichak et al. identified bra-miR168 with a log fold change of 6.48; we identified cca-miR168 with a log fold change of 5.83. Our log fold change may be lower because we mapped the Brassica rapa dataset to miRBase with 0 to 4 mismatches, which may have excluded some Brassica rapa miRNAs.
33
Comparison – miRBase result
More miRNA research has been done on Triticum aestivum and related species than on Brassica rapa, so miRBase contains more entries similar to Triticum aestivum miRNAs than to Brassica rapa miRNAs.
34
Summary
~523 million reads: filtered down to 15,158 sequences
miRBase: identified 87 plant miRNAs (613 sequences)
Mayer et al. and Sun et al.: identified 36 wheat miRNAs (232 sequences)
Differential gene expression: heat stress had the most significant effect
35
Summary: identified the known conserved miRNAs
36
Future work: novel miRNA prediction; predict novel miRNAs from isomiRs
37
Thank you
38
Related work
Mayer et al., 2014: identified 98,068 putative miRNA precursors and 52 miRNA sequences
Sun et al., 2014: used 11 libraries (dry grain, embryo, etc.) and Mayer et al.’s 98,068 miRNA precursors; reported 260 mature and star miRNA sequences
39
Our experimental data: 21 GB dataset (~523 million small RNA reads)
From leaf samples of 96 wheat plants. 72 files: 6 days (0, 1, 2, 3, 7, 10) × 4 treatments × 3 replicates. Treatments (for 3 days): heat (37 °C), light (continuous light), UV (2 minutes of exposure per day), and control. File size: 250 to 450 MB.
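The 72-file design is the Cartesian product of days, treatments, and replicates (6 × 4 × 3). A minimal sketch enumerating it, assuming a hypothetical file-naming scheme since the real file names are not given in the slides:

```python
from itertools import product

DAYS = [0, 1, 2, 3, 7, 10]
TREATMENTS = ["control", "heat", "light", "uv"]
REPLICATES = [1, 2, 3]


def design_files() -> list[str]:
    """One hypothetical file name per treatment, day, and replicate."""
    return [f"{t}_day{d}_rep{r}.fastq" for t, d, r in product(TREATMENTS, DAYS, REPLICATES)]


print(len(design_files()))  # 72
```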
40
Methodology: three stages (pre-processing, processing, post-processing) comprising 1. unique sequence identification, 2. ncRNA removal, 3. filtering, 4. conserved miRNA identification, and 5. multiple sequence alignment
41
4. Conserved miRNA identification [continued]
42
4. Conserved miRNA identification [continued]
43
Identification of conserved miRNAs using supplementary materials of Mayer et al., 2014
Aligned experimental sequences with at least 10 RPM to the precursor database using Bowtie2. Finding an energetically stable structure of an RNA from its sequence is known as the MFE (minimum free energy) method. In dot-bracket notation, a dot "." represents an unpaired base; an opening parenthesis "(" represents a base (5' end) paired with a base ahead of it (3' end); a closing parenthesis ")" represents a base (3' end) paired with a base behind it (5' end). Considered exact matches and MFE < -0.2 kcal/mol/nt [Kozomara et al., 2014]. Considered only the 52 wheat miRNAs provided by Kurtoglu et al., 2014.
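The dot-bracket rules above map directly onto a stack-based parser, and the energy cutoff is a per-nucleotide normalization of the MFE. A minimal sketch, assuming the MFE value itself comes from RNAfold (not computed here) and a cutoff of -0.2 kcal/mol/nt, since stable hairpins have negative folding energies; the function names are hypothetical.

```python
def dot_bracket_pairs(structure: str) -> dict[int, int]:
    """Map each paired position (0-based) to its partner; '.' positions are omitted."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)  # 5' partner: remember it until its ')' appears
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {i}")
            j = stack.pop()  # most recent unmatched '(' is the 5' partner
            pairs[i], pairs[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced '('")
    return pairs


def mfe_per_nt(mfe_kcal: float, length: int) -> float:
    """Normalize a minimum free energy (kcal/mol) by precursor length."""
    return mfe_kcal / length


def passes_mfe_filter(mfe_kcal: float, length: int, cutoff: float = -0.2) -> bool:
    """True when the hairpin is more stable than the (assumed) per-nucleotide cutoff."""
    return mfe_per_nt(mfe_kcal, length) < cutoff


print(dot_bracket_pairs("((..))"))  # {4: 1, 1: 4, 5: 0, 0: 5}
print(passes_mfe_filter(-30.0, 100))  # True
```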
44
Finding star sequence:
5’ end: 2-base overhang [Kozomara et al., 2014]
45
Finding star sequence:
5’ end, 3’ end: 2-base overhang [Kozomara et al., 2014]
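Assuming both strands of the miRNA/star duplex carry a 2-nt 3' overhang, the star coordinates can be read off the precursor's base pairing. A minimal sketch, assuming a pairing map derived from dot-bracket notation and a bulge-free duplex; `predict_star` is a hypothetical name, and the result is not checked against the precursor length.

```python
def predict_star(pairs: dict[int, int], start: int, end: int, overhang: int = 2):
    """Predict (star_start, star_end), 0-based inclusive, for a mature miRNA at start..end.

    pairs maps each paired precursor position to its partner. Returns None when an
    anchor position is unpaired (e.g. a bulge); real tools handle this more carefully.
    """
    # The star's 5' end pairs with the mature base `overhang` positions before the
    # mature 3' end, leaving a 2-nt 3' overhang on the mature strand.
    star_start = pairs.get(end - overhang)
    # The mature 5' end pairs with the base `overhang` positions before the star's
    # 3' end, leaving a 2-nt 3' overhang on the star strand.
    partner_of_start = pairs.get(start)
    if star_start is None or partner_of_start is None:
        return None
    return star_start, partner_of_start + overhang


# Toy stem: positions 2..9 pair with 23..16.
pairs = {}
for p in range(2, 10):
    pairs[p], pairs[25 - p] = 25 - p, p
print(predict_star(pairs, 2, 7))    # (20, 25): mature on the 5' arm
print(predict_star(pairs, 18, 23))  # (4, 9): mature on the 3' arm
```

The same two lookups work whether the mature miRNA sits on the 5' or the 3' arm of the hairpin.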
46
Exceptions
47
Exceptions [continued]
48
Differential gene expression
Combined results of Mayer et al. and Sun et al. (sequences with at least 10 RPM): 36 conserved miRNA families; 232 unique sequences (325 in total): 205 sequences from Mayer et al. only, 12 from Sun et al. only, and 15 common to both. Used the edgeR package for the R programming language. 3 comparisons (control vs. heat, light, UV) × 6 days = 18 input files.
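edgeR works on a matrix of counts with one row per feature and one column per sample. A minimal sketch of assembling such a table for one comparison, assuming raw read counts per sequence (edgeR performs its own normalization, so raw counts rather than RPM are the usual input) and hypothetical sample names. The resulting file could then be read into R, for example with `read.delim`, and passed to edgeR's `DGEList`.

```python
import csv
import io


def build_count_table(counts_by_sample: dict[str, dict[str, int]], sample_order: list[str]) -> str:
    """Return a tab-separated count matrix (rows: sequences, columns: samples)."""
    sequences = sorted({seq for counts in counts_by_sample.values() for seq in counts})
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["sequence", *sample_order])
    for seq in sequences:
        # A sequence missing from a sample gets a count of 0.
        writer.writerow([seq, *(counts_by_sample[s].get(seq, 0) for s in sample_order)])
    return buffer.getvalue()


counts = {"control_rep1": {"AAA": 5}, "heat_rep1": {"AAA": 7, "CCC": 2}}
print(build_count_table(counts, ["control_rep1", "heat_rep1"]), end="")
```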
49
Differential gene expression
50
Conserved miRNA identification using miRBase database
Identified 87 conserved miRNA families, matched with 613 experimental sequences (at least 10 RPM). Many miRNA families matched multiple experimental sequences; tae-miR159b matched the most, 150 experimental sequences.
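Counting how many experimental sequences matched each miRNA family can be sketched as grouping hits by a family name derived from the miRBase identifier. The family rule used here (drop the species prefix and any letter or arm suffix, so `tae-miR159b` becomes `miR159`) is a hypothetical heuristic, not miRBase's official family assignment.

```python
import re
from collections import defaultdict

# Hypothetical heuristic: "<species>-miR<number>" followed by any suffix.
FAMILY = re.compile(r"^[a-z]+-(miR\d+)")


def family_of(mirbase_id: str) -> str:
    """Return e.g. 'miR159' for 'tae-miR159b'; fall back to the full ID."""
    match = FAMILY.match(mirbase_id)
    return match.group(1) if match else mirbase_id


def sequences_per_family(hits: list[tuple[str, str]]) -> dict[str, int]:
    """hits: (experimental_sequence, mirbase_id) pairs; count distinct sequences per family."""
    families = defaultdict(set)
    for seq, mirna in hits:
        families[family_of(mirna)].add(seq)
    return {family: len(seqs) for family, seqs in families.items()}
```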
51
Conserved miRNA identification using the supplementary materials of Mayer et al. and Sun et al.
232 unique sequences (325 in total): 205 sequences from Mayer et al. only, 12 from Sun et al. only, and 15 common to both