Identifying Conserved microRNAs in a Large Dataset of Wheat Small RNAs

Presentation transcript:

Identifying Conserved microRNAs in a Large Dataset of Wheat Small RNAs. Md Safiur Rahman Mahdi. Advisor: Dr. Michael Domaratzki, Department of Computer Science. July 27, 2015. Dear audience, good afternoon. Welcome to my thesis defence. My thesis title is “Identifying Conserved microRNAs in a Large Dataset of Wheat Small RNAs”. My advisor is Dr. Michael Domaratzki from the Department of Computer Science.

Key concepts: microRNA (miRNA), small RNA, differential expression.

From cell to DNA Cells are the basic building blocks of all living things. Image taken from: http://physics.fatih.edu.tr/biophysics/dna.htm

From DNA to Protein. Central dogma of molecular biology: DNA (contains genes) → RNA → proteins, via transcription and translation. Image taken from: http://www.cureangelman.org/what-testing101.html

Noncoding RNA (ncRNA) Image taken from: http://www.resveratrolnews.com/micrornas-and-resveratrol/136/

miRNA: how does it work? Part of a gene contains a small non-coding RNA called miRNA, which is only about 22 nucleotides in length. miRNAs do not make proteins; they act in a completely different way: a miRNA finds the complementary portion of an mRNA and binds to it, which blocks translation from mRNA to protein. They are fundamental regulators of gene expression. Image taken from: http://www.resveratrolnews.com/micrornas-and-resveratrol/136/

miRNA: ~1000 nt → 70-600 nt → mature/star miRNA (17-24 nt), shown 5’ to 3’. There should be a 0-4 nt difference between the mature miRNA and the star miRNA [Kozomara et al., 2014].

What is differential expression? Gene expression that responds to signals or triggers: a change in expression level between two sample groups (control vs. treatment). Up-regulated: increased in expression. Down-regulated: decreased in expression.

Motivation: Wheat is an 11 billion dollar industry [NRCC, 2015]. How can we improve wheat breeding in the face of possible climate change? Which miRNAs are differentially expressed under different stresses? (NRCC = National Research Council Canada)

Related work. Mayer et al., 2014: 98,068 putative precursor miRNAs; 52 miRNA sequences. Sun et al., 2014: used Mayer et al.’s precursors; 260 mature and star miRNAs.

Contribution: Designed and implemented a toolchain that identifies conserved miRNAs. Identified differentially expressed miRNAs.

Methodology. Input: 4 conditions (no stress/control, heat at 37 °C, light, UV for 2 min), 6 days, 3 replicates: 72 input files (4*6*3), ~523 million reads, 21 GB. Flow: filtering → 15,158 sequences → BLASTn against plant miRNAs → conserved miRNAs.

Tools used: Python/Biopython (Ubuntu Linux system); bash shell scripting (Hermes, WestGrid); BLAST; Bowtie2; MAFFT; RNAfold.

Unique sequence identification: For each distinct read, record the sequence, the read count, and the normalized read count in reads per million (RPM): RPM = 1,000,000 * read count / total number of sequences.
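The collapsing and normalization step can be sketched in a few lines of Python. This is a minimal illustration only, not the thesis toolchain: the function names `collapse_reads` and `rpm` and the toy reads are assumptions introduced here, and the reads are taken to be already adaptor-trimmed.

```python
from collections import Counter


def collapse_reads(reads):
    """Collapse identical reads: map each distinct sequence to its read count."""
    return Counter(reads)


def rpm(read_count, total_reads):
    """Normalized read count: 1,000,000 * read count / total number of sequences."""
    return 1_000_000 * read_count / total_reads


# Hypothetical toy library of four reads (not taken from the wheat dataset).
reads = [
    "TGACAGAAGAGAGTGAGCAC",
    "TGACAGAAGAGAGTGAGCAC",
    "TTTGGATTGAAGGGAGCTCTA",
    "TCGGACCAGGCTTCATTCCCC",
]
counts = collapse_reads(reads)
total = sum(counts.values())

# One row per distinct read: sequence -> (read count, RPM).
table = {seq: (n, rpm(n, total)) for seq, n in counts.items()}
```

In a real library the total is in the millions, so RPM values are far smaller than in this toy case.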

Removal of ncRNA sequences: Split each unique-sequence file into 300 files. Performed BLASTN against the Rfam database. Aggregated the 300 files into a single file.
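The split, filter, and aggregate pattern might look like the sketch below. It is an assumption-laden illustration: the BLASTN search against Rfam is not run here, and `ncrna_ids` stands in for the identifiers that such a search would report as hits. The helper names and toy records are hypothetical.

```python
def split_into_chunks(records, n_chunks):
    """Split a list of records into n_chunks contiguous, near-equal parts."""
    size, extra = divmod(len(records), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(records[start:end])
        start = end
    return chunks


def remove_ncrna(chunk, ncrna_ids):
    """Drop records whose identifier had a hit against the ncRNA database."""
    return [(rid, seq) for rid, seq in chunk if rid not in ncrna_ids]


def aggregate(chunks):
    """Concatenate the filtered chunks back into a single list."""
    return [record for chunk in chunks for record in chunk]


# Hypothetical records; the talk used 300 chunks per file, 3 are used here.
records = [(f"seq{i}", "ACGT") for i in range(10)]
ncrna_ids = {"seq3", "seq7"}  # stand-in for parsed BLASTN hits against Rfam
chunks = split_into_chunks(records, 3)
kept = aggregate(remove_ncrna(chunk, ncrna_ids) for chunk in chunks)
```

Splitting lets each chunk be searched as an independent job, which suits a cluster environment; the aggregation step restores a single file in the original order.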

Filtering: Consistent naming. Kept sequences with at least 10 RPM [Montes et al., 2014]. Identified 15,158 sequences in total.
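A possible form of the abundance filter is sketched below. The slide does not say how the 10 RPM threshold is applied across the 72 libraries, so this sketch assumes that a sequence is kept when it reaches the threshold in at least one library. The function name and the toy values are hypothetical.

```python
def filter_by_rpm(rpm_by_sequence, min_rpm=10.0):
    """Keep sequences whose highest RPM across libraries is at least min_rpm.

    Assumption: reaching the threshold in any one library is enough.
    """
    return {
        seq: values
        for seq, values in rpm_by_sequence.items()
        if max(values) >= min_rpm
    }


# Hypothetical RPM values in three libraries for two sequences.
rpm_by_sequence = {
    "TGACAGAAGAGAGTGAGCAC": [12.5, 8.0, 15.2],
    "TTTGGATTGAAGGGAGCTCTA": [0.4, 2.1, 9.9],
}
kept = filter_by_rpm(rpm_by_sequence)
```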

Conserved miRNA identification: Used the miRNA database miRBase. Discarded all miRNAs except those from plants. Performed BLASTN against miRBase. Considered 0-4 mismatches, no indels [Michael Axtell].
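The acceptance rule (0 to 4 mismatches, no indels) can be illustrated without BLAST. The sketch below is a simplification and not the BLASTN-based procedure from the talk: it assumes the read and the known miRNA are compared end to end, so a length difference is treated as an indel, and it converts U to T so that RNA and DNA alphabets compare equal. All names are hypothetical.

```python
def normalize(seq):
    """Upper-case a sequence and write it in the DNA alphabet (U -> T)."""
    return seq.upper().replace("U", "T")


def mismatches(a, b):
    """Count mismatched positions; return None if lengths differ (would need an indel)."""
    a, b = normalize(a), normalize(b)
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def is_conserved(read, known_mirna, max_mismatches=4):
    """True if the read matches the known miRNA with 0..max_mismatches and no indels."""
    d = mismatches(read, known_mirna)
    return d is not None and d <= max_mismatches


# Hypothetical example: a read compared with a known miRNA written in the RNA alphabet.
known = "UGACAGAAGAGAGUGAGCAC"
read = "TGACAGAAGAGAGTGAGCAC"
match = is_conserved(read, known)
```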

Conserved miRNA identification [continued]

Multiple sequence alignment. Flow: matched species → Bowtie2 against the wheat genome → contigs. Sequences shown: wheat, precursor, and conserved miRNA sequences; experimental sequences.

Input: 4 conditions (no stress/control, heat at 37 °C, light, UV for 2 min), 6 days, 3 replicates: 72 input files (4*6*3), ~523 million reads, 21 GB. Reference: putative precursors from Mayer et al.’s supplementary materials. Flow: filtering → 15,158 sequences → Bowtie 2 against the putative precursors → putative mature miRNAs → star miRNA prediction → conserved miRNAs.

Matched species

Differential gene expression: Mayer et al. and Sun et al. results (36 miRNAs, 232 sequences), analysed with edgeR. For each day (day 0, day 1, … day 10): control vs. heat, control vs. light, control vs. UV.

Result - miRBase. Input: 4 conditions (no stress/control, heat at 37 °C, light, UV for 2 min), 6 days, 3 replicates: 72 input files (4*6*3), ~523 million reads, 21 GB. Flow: filtering → 15,158 sequences → BLASTn against plant miRNAs → 87 plant miRNAs, 613 experimental sequences.

Result - Mayer et al. and Sun et al. Input: 4 conditions (no stress/control, heat at 37 °C, light, UV for 2 min), 6 days, 3 replicates: 72 input files (4*6*3), ~523 million reads, 21 GB. References: putative precursors from Mayer et al.’s supplementary materials; Sun et al.’s supplementary materials. Flow: filtering → 15,158 sequences → Bowtie 2 against the putative precursors → putative mature miRNAs → star miRNA prediction → 36 wheat miRNAs, 232 experimental sequences.

Result - differential gene expression

miRNAs: day 7, all stresses

miRNAs - in all days, heat stress: miR 398, miR 5064, miR 2020b.

Number of miRNAs differentially expressed. Heat: 34 families. Light: 8 families. UV: 7 families.

Per day expression

Expression of miRNA families: miRNA 395 and 398 were strongly suppressed. miRNA 1439, 2020, 5064 and 5175 were expressed (+) with heat. miRNA 395 was suppressed on all days and under all stresses.

Toolchain validation: Validated our tool with a Brassica rapa dataset [Bilichak et al., 2015]. The dataset includes pollen, embryo, endosperm and progeny tissue, with control and heat stress only.

Identified the same differential expression: the same miRNA-168 is expressed in endosperm tissue of the heat-stressed plant. Bilichak et al. identified bra-miR168 with a 6.48 log fold change; we identified cca-miR168 with a 5.83 log fold change. Our log fold change may be lower because we mapped the Brassica rapa dataset to miRBase with 0 to 4 mismatches, which may have excluded some Brassica rapa miRNAs.

Comparison - miRBase result: More miRNA research has been done on Triticum aestivum and related species than on Brassica rapa, so miRBase contains more entries similar to Triticum aestivum miRNAs than to Brassica rapa miRNAs.

Summary. ~523 million reads: filtered down to 15,158 sequences. miRBase: identified 87 plant miRNAs (613 sequences). Mayer et al. and Sun et al.: identified 36 wheat miRNAs (232 sequences). Differential gene expression: heat stress (most significant).

Summary: Find out the known conserved miRNAs.

Future Work: Novel miRNA prediction. Predict novel miRNAs from isomiRs.

Thank you

Related work. Mayer et al., 2014: identified 98,068 putative miRNA precursors; identified 52 miRNA sequences. Sun et al., 2014: used 11 libraries (dry grain, embryo, etc.); used Mayer et al.’s 98,068 miRNA precursors; reported 260 mature miRNA and star sequences.

Our experimental data: 21 GB dataset (~523 million small RNA reads) from leaf samples of 96 wheat plants. 72 files: 6 days (0, 1, 2, 3, 7, 10), 4 stresses (for 3 days), 3 replicates. Stresses: heat (37 °C); light (continuous light); UV (2 minutes of exposure/day); control. File size: 250 to 450 MB.

Methodology. Pre-processing: 1. Unique sequence identification; 2. ncRNA removal; 3. Filtering. Processing: 4. Conserved miRNA identification. Post-processing: 5. Multiple sequence alignment.

4. Conserved miRNA identification [continued]

4. Conserved miRNA identification [continued]

Identification of conserved miRNAs using the supplementary materials of Mayer et al., 2014: Aligned the 10 RPM experimental sequences with the precursor database using Bowtie2. Finding an energetically stable structure of an RNA from its sequence is known as the minimum free energy (MFE) method. In the structure notation, a dot “.” represents an unpaired base; an open parenthesis “(” represents a base (5' end) that is paired to another base ahead of it (3' end); a closed parenthesis “)” represents a base (3' end) that is paired to another base behind it (5' end). Considered exact matches and MFE < -0.2 kcal/mol/nt [Kozomara et al., 2014]. Considered only the 52 wheat miRNAs provided by Kurtoglu et al., 2014.
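The dot-bracket notation described on this slide can be read with a simple stack. The sketch below is illustrative only: it does not call RNAfold, and the structure string and the energy value in the example are made up. `parse_dot_bracket` and `mfe_per_nt` are hypothetical helper names.

```python
def parse_dot_bracket(structure):
    """Map each paired position to its partner (0-based) in a dot-bracket string."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)          # opens a pair, partner lies ahead (3' side)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()          # closes the most recent open base
            pairs[i], pairs[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def mfe_per_nt(mfe_kcal_per_mol, length):
    """Free energy normalized by sequence length, in kcal/mol/nt."""
    return mfe_kcal_per_mol / length


# Hypothetical hairpin: 2-bp stem around a 2-nt loop, with a made-up energy.
pairs = parse_dot_bracket("((..))")
normalized = mfe_per_nt(-30.0, 100)
```

The normalized value is what a per-nucleotide threshold such as the one cited on the slide would be compared against.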

Finding star sequence: 5’ end: 2 base pair overhang [Kozomara et al., 2014]

Finding star sequence: 5’ end: 3’ end: 2 base pair overhang [Kozomara et al., 2014]
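One way to derive a star sequence position from a hairpin structure is sketched below. It is a simplified illustration under stated assumptions, not the method used in the thesis: it assumes a 2-nt overhang at the 3' end of both strands of the duplex, uses 0-based inclusive coordinates, and gives up (returns None) when the anchoring bases are unpaired, as happens with bulges or loops. The names `pair_table` and `predict_star` are hypothetical.

```python
def pair_table(structure):
    """Map each paired position to its partner in a balanced dot-bracket string."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    return pairs


def predict_star(structure, mature_start, mature_end, overhang=2):
    """Return (star_start, star_end), 0-based inclusive, or None if not derivable.

    The star's 5' end pairs with the base `overhang` nt before the mature 3' end;
    the star's 3' end extends `overhang` nt past the partner of the mature 5' end.
    """
    pairs = pair_table(structure)
    anchor = mature_end - overhang
    if mature_start not in pairs or anchor not in pairs:
        return None
    star_start = pairs[anchor]
    star_end = pairs[mature_start] + overhang
    if star_end >= len(structure):
        return None
    return star_start, star_end


# Hypothetical perfect hairpin: 26-bp stem around a 6-nt loop (58 nt in total).
structure = "(" * 26 + "." * 6 + ")" * 26
star = predict_star(structure, 2, 22)   # 21-nt mature sequence on the 5' arm
```

Because the rule is symmetric, feeding the predicted star coordinates back in returns the mature coordinates, which is a convenient sanity check.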

Exceptions

Exceptions [continued]

Differential gene expression: Combination of Mayer et al. and Sun et al., 10 RPM. 36 conserved miRNA families. 232 unique sequences (325 in total): 205 sequences from Mayer et al., 12 sequences from Sun et al., and 15 sequences that are common. Used the edgeR package of the R programming language. 3 (control vs. heat, light, UV) files * 6 days = 18 input files.
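The layout of the 18 comparisons, and the idea of a log fold change, can be sketched as follows. This is not edgeR and does not reproduce its statistical model: the fold change here is a plain log2 ratio of mean normalized counts with a pseudocount, an assumption made for illustration. The day list is taken from the experimental data slide; the function name and the toy values are hypothetical.

```python
import math
from itertools import product

DAYS = [0, 1, 2, 3, 7, 10]
STRESSES = ["heat", "light", "UV"]

# One control-vs-stress comparison per day and stress: 6 * 3 = 18.
comparisons = list(product(DAYS, STRESSES))


def log2_fold_change(treatment_rpm, control_rpm, pseudocount=1.0):
    """log2 ratio of mean treatment RPM to mean control RPM (replicates averaged).

    The pseudocount avoids division by zero; it is an assumption of this sketch.
    """
    mean_t = sum(treatment_rpm) / len(treatment_rpm)
    mean_c = sum(control_rpm) / len(control_rpm)
    return math.log2((mean_t + pseudocount) / (mean_c + pseudocount))


# Hypothetical RPM values for one miRNA in three replicates per group.
up = log2_fold_change([7.0, 7.0, 7.0], [1.0, 1.0, 1.0])    # up-regulated
down = log2_fold_change([1.0, 1.0, 1.0], [7.0, 7.0, 7.0])  # down-regulated
```

A positive value indicates up-regulation under the stress and a negative value indicates down-regulation; a dedicated package additionally tests whether the change is statistically significant.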

Differential gene expression

Conserved miRNA identification using the miRBase database: Identified 87 conserved miRNA families, matched with 613 sequences from the experiment (10 RPM). Many miRNA families matched with multiple experimental sequences. Tae-miR159b matched with 150 experimental sequences (maximum).

Conserved miRNA identification using the supplementary materials of Mayer et al. and Sun et al.: 232 unique sequences (325 in total): 205 sequences from Mayer et al., 12 sequences from Sun et al., and 15 sequences that are common.