Identifying Conserved microRNAs in a Large Dataset of Wheat Small RNAs Md Safiur Rahman Mahdi Advisor: Dr. Michael Domaratzki Department of Computer Science July 27, 2015 Dear audience, good afternoon. Welcome to my thesis defence. My thesis title is “Identifying Conserved microRNAs in a Large Dataset of Wheat Small RNAs”. My advisor is Dr. Michael Domaratzki from the Department of Computer Science.
Key concepts MicroRNA (miRNA) Small RNA Differential expression
From cell to DNA Cells are the basic building blocks of all living things. Image taken from: http://physics.fatih.edu.tr/biophysics/dna.htm
From DNA to Protein Transcription Translation Image taken from: http://www.cureangelman.org/what-testing101.html Central dogma of molecular biology DNA contains gene RNA Proteins
Noncoding RNA (ncRNA) Image taken from: http://www.resveratrolnews.com/micrornas-and-resveratrol/136/
miRNA : how does it work? Part of a gene encodes a small non-coding RNA called miRNA, which is only about 22 nucleotides in length. miRNAs do not make proteins; they act in a completely different way: they find the complementary portion of an mRNA and bind to it, which blocks translation from mRNA to protein. They are fundamental regulators of gene expression. Image taken from: http://www.resveratrolnews.com/micrornas-and-resveratrol/136/
miRNA 1000 nt 70-600 nt Mature/star miRNA 17-24 nt 5’ 3’ There should be 0-4 nt difference between the miRNA mature and star miRNA [Kozomara et al., 2014]
What is differential expression? Responds to signals / triggers Changes in expression level Between two sample groups Control Vs. treatment Up-regulated (increased in expression) Down-regulated (decreased in expression) gene expression that responds to signals or triggers
Motivation Wheat is an 11 billion dollar industry [NRCC, 2015] How can we improve wheat breeding with possible climate change? Which miRNAs are differentially expressed with different stresses? NRCC = National Research Council Canada
Related work Mayer et al., 2014 98,068 putative precursor miRNAs 52 miRNA sequences Sun et al., 2014 Used Mayer et al.’s precursors 260 mature and star miRNAs
Contribution Designed and implemented a toolchain that identifies conserved miRNAs Identified differentially expressed miRNAs
Methodology 72 input files (4*6*3), ~523 million reads, 21 GB. Conditions: No stress (Control), Heat (37 °C), Light, UV (2 min); 6 days; 3 Replicates. Pipeline: filtering → 15,158 sequences → BLASTn against plant miRNAs → Conserved miRNAs
Tools used Python/biopython (Ubuntu Linux system) bash shell scripting (Hermes, WestGrid) BLAST Bowtie2 MAFFT RNAfold
Unique sequence identification Distinct read Sequence Read count Normalized read count / RPM (reads per million): 1,000,000 * read count / total number of sequences
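The collapsing and normalization step above can be sketched as follows. This is a minimal illustration, not the toolchain's actual code: the function names `collapse_reads` and `rpm` and the toy reads are assumptions, and the total is taken to be the number of reads in one library.

```python
from collections import Counter


def collapse_reads(reads):
    """Count how many times each distinct read sequence occurs."""
    return Counter(reads)


def rpm(read_count, total_reads):
    """Reads per million: 1,000,000 * read count / total number of sequences."""
    return 1_000_000 * read_count / total_reads


# Toy library (hypothetical reads, for illustration only).
reads = ["TGGAGAAGCAGGGCACGTGCA"] * 3 + ["TTGACAGAAGAGAGTGAGCAC"]
counts = collapse_reads(reads)
total = sum(counts.values())  # assumed: total reads in this one library

for seq, n in counts.items():
    print(seq, n, rpm(n, total))
```

Normalizing to RPM makes read counts comparable across libraries of different sizes, which matters here because the input files range widely in size.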
Removal of ncRNA sequences Split each unique file into 300 files Performed BLASTN with Rfam database Aggregated 300 files to a single file
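A rough sketch of the split-and-aggregate pattern described above, so that each piece can be searched independently (for example as separate cluster jobs). The helper names `split_into_chunks` and `aggregate` are hypothetical, and the BLASTN call itself is deliberately left out.

```python
def split_into_chunks(records, n_chunks):
    """Distribute records over n_chunks lists of near-equal size, keeping order."""
    size, extra = divmod(len(records), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks receive one additional record.
        end = start + size + (1 if i < extra else 0)
        chunks.append(records[start:end])
        start = end
    return chunks


def aggregate(chunks):
    """Concatenate per-chunk results back into a single list."""
    return [record for chunk in chunks for record in chunk]


# Toy example: 10 hypothetical sequence IDs split into 3 chunks.
ids = [f"seq{i}" for i in range(10)]
chunks = split_into_chunks(ids, 3)
print([len(c) for c in chunks])
print(aggregate(chunks) == ids)
```

With 300 chunks, as on the slide, the same function applies; only `n_chunks` changes.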
Filtering Consistent naming; at least 10 RPM [Montes et al., 2014] Identified 15,158 sequences in total
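The abundance filter can be sketched as below. The slide does not say in how many libraries a sequence must reach 10 RPM, so this sketch assumes "at least one library"; that rule, the function name `passes_rpm_filter`, and the library labels are all assumptions.

```python
def passes_rpm_filter(rpm_by_library, threshold=10.0):
    """Keep a sequence if it reaches the RPM threshold in at least one library.

    The "at least one library" rule is an assumption of this sketch.
    """
    return any(value >= threshold for value in rpm_by_library.values())


# Hypothetical RPM values for one sequence across three libraries.
example = {"control_day0_rep1": 2.5, "heat_day1_rep1": 14.0, "uv_day1_rep1": 0.0}
print(passes_rpm_filter(example))
```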
Conserved miRNA identification Used miRNA database: miRBase Discarded all miRNAs except plants Performed BLASTN with miRBase Considered 0-4 mismatches, no indels [Michael Axtell]
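The "0-4 mismatches, no indels" criterion can be illustrated with a position-by-position comparison. This is a simplified sketch, not how BLASTN aligns: it assumes the read and the miRBase entry have the same length and are compared end to end, and the function names are hypothetical.

```python
def count_mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def is_conserved_hit(read, mirna, max_mismatches=4):
    """True if read and miRNA differ by at most max_mismatches substitutions."""
    # miRBase stores RNA (U); sequencing reads are usually written as DNA (T).
    read = read.upper().replace("U", "T")
    mirna = mirna.upper().replace("U", "T")
    if len(read) != len(mirna):
        return False  # simplification: unequal lengths are rejected outright
    return count_mismatches(read, mirna) <= max_mismatches


print(is_conserved_hit("UGGAGAAGCAGGGCACGUGCA", "TGGAGAAGCAGGGCACGTGCA"))
```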
Conserved miRNA identification [continued]
Multiple sequence alignment Matched species bowtie2 with wheat genome contigs Wheat, precursor, and conserved miRNA sequences Experimental Sequences
72 input files (4*6*3), ~523 million reads, 21 GB. Conditions: No stress (Control), Heat (37 °C), Light, UV (2 min); 6 days; 3 Replicates. Pipeline: filtering → 15,158 sequences → Bowtie 2 against Putative Precursors (Mayer et al.’s supplementary materials) → Putative mature miRNAs → Star miRNA prediction → Conserved miRNAs
Matched species
Differential gene expression Mayer et al. Sun et al. 36 miRNAs, 232 sequences edgeR Day 0: Control Vs. Heat, Control Vs. light, Control Vs. UV Day 1: Control Vs. Heat, Control Vs. light, Control Vs. UV ………… Day 10: Control Vs. Heat, Control Vs. light, Control Vs. UV
Result - miRBase 72 input files (4*6*3), ~523 million reads, 21 GB. Conditions: No stress (Control), Heat (37 °C), Light, UV (2 min); 6 days; 3 Replicates. Pipeline: filtering → 15,158 sequences → BLASTn against plant miRNAs → 87 plant miRNAs, 613 experimental sequences
Result - Mayer et al. and Sun et al. 72 input files (4*6*3), ~523 million reads, 21 GB. Conditions: No stress (Control), Heat (37 °C), Light, UV (2 min); 6 days; 3 Replicates. Inputs: Mayer et al.’s supplementary materials (Putative Precursors), Sun et al.’s supplementary materials. Steps: filtering, 15,158 sequences, Bowtie 2, Putative mature miRNAs, Star miRNA prediction. Result: 36 wheat miRNAs, 232 experimental sequences
Result - differential gene expression
miRNAs: day 7, all stresses
miRNAs - in all days, heat stress miR 398 miR 5064 miR 2020b
Number of miRNA differentially expressed Heat: 34 families Light: 8 families UV: 7 families
Per day expression
Expression of miRNA families miRNA 395 and 398 strongly suppressed miRNA 1439, 2020, 5064 and 5175 expressed(+) with heat miRNA 395 was suppressed in all days and in all stresses
Toolchain validation Validated our tool with Brassica rapa dataset [Bilichak et al., 2015] Includes pollen, embryo, endosperm and progeny tissue Control and heat stress only
Identified the same differential expression The same miRNA, miR168, is differentially expressed in endosperm tissue of heat-stressed plants Bilichak et al. identified bra-miR168 with 6.48 log fold change We identified cca-miR168 with 5.83 log fold change Our log fold change may be lower because we mapped the Brassica rapa dataset to miRBase with 0 to 4 mismatches, which may have excluded some Brassica rapa miRNAs.
Comparison – miRBase result More research has been done on miRNAs in Triticum aestivum and related species than in Brassica rapa, so miRBase contains more entries similar to Triticum aestivum miRNAs than to Brassica rapa miRNAs.
Summary ~523 million reads: Filtered down to 15,158 sequences miRBase: Identified 87 plant miRNAs (613 sequences) Mayer et al. and Sun et al.: Identified 36 wheat miRNAs (232 sequences) Differential gene expression: Heat stress (most significant)
Summary Find out the known conserved miRNAs
Future Work Novel miRNA prediction Predict novel miRNA from isomiR
Thank you
Related work Mayer et al., 2014: Identified 98,068 putative miRNA precursors Identified 52 miRNA sequences Sun et al., 2014: Used 11 libraries (dry grain, embryo etc.) Used Mayer et al.’s 98,068 miRNA precursors Reported 260 mature miRNA and star sequences
Our experimental data 21 GB dataset (~523 million small RNA reads) From leaf samples of 96 wheat plants 72 files: 6 days: 0, 1, 2, 3, 7, 10 4 stresses (for 3 days) Heat: 37 °C Light: continuous light UV: 2 minutes of exposure/day Control 3 replicates File size: 250 to 450 MB
Methodology Pre-processing, Processing, Post-processing 1. Unique sequence identification 2. ncRNA removal 3. Filtering 4. Conserved miRNA identification 5. Multiple Sequence Alignment
4. Conserved miRNA identification [continued]
4. Conserved miRNA identification [continued]
Identification of conserved miRNAs using supplementary materials of Mayer et al., 2014 Aligned 10 RPM experimental sequences with the precursor database using Bowtie2 Finding an energetically stable RNA structure from the sequence is known as the minimum free energy (MFE) method A dot "." represents an unpaired base; an open parenthesis "(" represents a base that is paired (5' end) to another base ahead of it (3' end); a closed parenthesis ")" represents a base that is paired (3' end) to another base behind it (5' end) Considered exact match and MFE <0.2 kcal/mol/nt [Kozomara et al., 2014] Considered only the 52 wheat miRNAs provided by Kurtoglu et al., 2014
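The dot-bracket notation described above can be decoded with a simple stack. The sketch below is illustrative only: `parse_dot_bracket` and `mfe_per_nt` are hypothetical helper names, the structure string and energy are made-up values rather than RNAfold output, and "per nt" is assumed to mean dividing the MFE by the precursor length.

```python
def parse_dot_bracket(structure):
    """Return sorted (i, j) index pairs of paired bases in a dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)  # 5' partner, waiting for its 3' partner
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: extra ')'")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced structure: extra '('")
    return sorted(pairs)


def mfe_per_nt(mfe, length):
    """Length-normalized free energy (kcal/mol/nt); normalization is assumed."""
    return mfe / length


print(parse_dot_bracket("((..))"))
print(mfe_per_nt(-30.0, 100))
```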
Finding star sequence: 5’ end: 2 base pair overhang [Kozomara et al., 2014]
Finding star sequence: 5’ end: 3’ end: 2 base pair overhang [Kozomara et al., 2014]
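One way to sketch star prediction from a folded precursor is shown below. It assumes the common convention that each strand of the miRNA duplex carries a 2-nt 3' overhang, and that the two anchor positions on the mature strand are base-paired; real precursors with bulges at those positions would need extra handling. The function names and the toy hairpin are hypothetical.

```python
def pair_map(structure):
    """Map each paired index to its partner (assumes a balanced structure)."""
    stack, partner = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    return partner


def star_span(structure, mature_start, mature_end, overhang=2):
    """Return 0-based inclusive (start, end) of the star, or None if undefined."""
    partner = pair_map(structure)
    try:
        # The star 5' end pairs with the mature base `overhang` nt before its 3' end.
        star_start = partner[mature_end - overhang]
        # The star 3' end extends `overhang` nt past the partner of the mature 5' end.
        star_end = partner[mature_start] + overhang
    except KeyError:
        return None  # an anchor base is unpaired; not handled in this sketch
    if star_start > star_end or star_end >= len(structure):
        return None
    return star_start, star_end


# Toy hairpin: 10 paired bases, 4-nt loop, 10 paired bases, 2 unpaired 3' bases.
toy = "((((((((((....)))))))))).."
print(star_span(toy, 0, 7))
```

In the toy hairpin the predicted star has the same length as the mature sequence, which is consistent with the 0-4 nt length difference allowed between mature and star sequences.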
Exceptions
Exceptions [continued]
Differential gene expression Combination of Mayer et al. and Sun et al., 10 RPM 36 conserved miRNA families 232 unique sequences (325 in total) 205 sequences from Mayer et al. 12 sequences from Sun et al. 15 sequences are common Used edgeR package of R programming language 3 (Control Vs. Heat, light, UV) files * 6 days = 18 input files
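The 18 comparisons (3 stress-versus-control contrasts on each of 6 days) can be enumerated as below. This only lists the contrasts; the differential expression test itself was run in edgeR and is not reproduced here. The days come from the experimental data slide, and the tuple layout is an assumption of this sketch.

```python
from itertools import product

DAYS = [0, 1, 2, 3, 7, 10]          # sampling days
STRESSES = ["heat", "light", "UV"]  # each is compared against the control


def comparisons(days=DAYS, stresses=STRESSES):
    """List (day, 'control', stress) contrasts, one per edgeR input file."""
    return [(day, "control", stress) for day, stress in product(days, stresses)]


for day, control, stress in comparisons():
    print(f"Day {day}: {control} vs. {stress}")
```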
Differential gene expression
Conserved miRNA identification using miRBase database Identified 87 conserved miRNA families Matched with 613 sequences from experiment (10 RPM) Many miRNA families matched with multiple experimental sequences Tae-miR159b matched with 150 experimental sequences (maximum)
Conserved miRNA identification using the supplementary materials of Mayer et al. and Sun et al. 232 unique sequences (325 in total) 205 sequences from Mayer et al. 12 sequences from Sun et al. 15 sequences are common