Long way to solve short ncRNA data analysis problems – evaluation of small RNA-Seq datasets from non-model organisms in Galaxy. Jochen Bick

Long way to solve short ncRNA data analysis problems – evaluation of small RNA-Seq datasets from non-model organisms in Galaxy. Jochen Bick, 21.09.2018

Sus scrofa embryo samples
36 pig embryos collected at day 10 of pregnancy from sows fed different doses of estradiol (E2); n = 6 per group, 6 groups in total.
Illumina TruSeq Small RNA libraries, sequenced on an Illumina HiSeq 4000 at the FGCZ as 126 bp single-end reads.
Group table (columns: # of embryos, sex, E2 in µg): 6 embryos per group; ♀ and ♂; 10 µg and 1000 µg.
Exogenous substances affecting endogenous hormonal systems, reproduction, and health.

Workflow overview in Galaxy: quality control with FastQC; trimming and adapter clipping; counting the number of sequences; filtering and sequence mapping.

Filtering
Clip adapter: FastQC Universal Illumina Adapter; minimum length of 16 bp; only keep sequences with a clipped adapter.
Trimmomatic: MINLEN:16, LEADING:3 (removes leading bases with a Phred quality below 3); no other quality trimming.
FastQC: after each step, for quality checks.
Speaker note: Why do we remove the first 3 again?
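A minimal Python sketch of this filtering step, for illustration only (it is not the code of the Galaxy tools). Assumptions: the adapter is located by exact substring match, whereas real clippers also tolerate mismatches and partial adapters at the read end; `AGATCGGAAGAG` is assumed as the adapter prefix (FastQC lists it as the Illumina Universal Adapter); quality strings are Phred+33; LEADING is applied before the length check. Note that Trimmomatic's `LEADING:3` drops leading bases whose quality is below 3, not a fixed three bases. All function names are hypothetical.

```python
ADAPTER = "AGATCGGAAGAG"  # assumed adapter prefix (FastQC: Illumina Universal Adapter)
MIN_LEN = 16              # slide: minimum length of 16 bp (MINLEN:16)
LEADING_Q = 3             # slide: LEADING:3


def clip_adapter(seq, qual, adapter=ADAPTER):
    """Cut the read before the first exact adapter match; None if no adapter.

    Reads without an adapter are discarded ("only keep sequences with a
    clipped adapter"). Exact matching only, which is a simplification.
    """
    pos = seq.find(adapter)
    if pos == -1:
        return None
    return seq[:pos], qual[:pos]


def leading_trim(seq, qual, threshold=LEADING_Q, offset=33):
    """Drop bases from the 5' end while their Phred quality is below threshold."""
    i = 0
    while i < len(seq) and ord(qual[i]) - offset < threshold:
        i += 1
    return seq[i:], qual[i:]


def filter_read(seq, qual):
    """Apply adapter clipping, then LEADING, then MINLEN; return (seq, qual) or None."""
    clipped = clip_adapter(seq, qual)
    if clipped is None:
        return None
    seq, qual = leading_trim(*clipped)
    if len(seq) < MIN_LEN:
        return None
    return seq, qual
```

For example, `filter_read("NN" + insert + "AGATCGGAAGAG", "##" + "I" * n)` returns only the insert, because both `#` bases (quality 2) fall below the LEADING threshold.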

How to generate the count table

More converting steps: Collapse → FASTA-to-Tabular → Convert + Cut

Collapse (FASTA; header = rank-count):
    >1-234764
    CGCGACCTCAGATCAGA
    >2-44042
    CGCGACCTCAGATCAGAC
    >3-31455
    ACGCGACCTCAGATCAGA
    >4-29208
    ACTCAAACTGTGGGGGCACTTT
    >5-27371
    TAGCTTATCAGACTGATGTTGAC
    >6-26520
    TAAGTGCTTCCATGTTTTAGTAG
    >7-22345
    ACGCGACCTCAGATCAGACG

FASTA-to-Tabular (column 1: header, column 2: sequence):
    1-234764   CGCGACCTCAGATCAGA
    2-44042    CGCGACCTCAGATCAGAC
    3-31455    ACGCGACCTCAGATCAGA
    4-29208    ACTCAAACTGTGGGGGCACTTT
    5-27371    TAGCTTATCAGACTGATGTTGAC
    6-26520    TAAGTGCTTCCATGTTTTAGTAG
    7-22345    ACGCGACCTCAGATCAGACG

Convert + Cut (column 1: count, column 2: sequence):
    234764   CGCGACCTCAGATCAGA
    44042    CGCGACCTCAGATCAGAC
    31455    ACGCGACCTCAGATCAGA
    29208    ACTCAAACTGTGGGGGCACTTT
    27371    TAGCTTATCAGACTGATGTTGAC
    26520    TAAGTGCTTCCATGTTTTAGTAG
    22345    ACGCGACCTCAGATCAGACG
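The Collapse → FASTA-to-Tabular → Convert + Cut chain can be sketched in Python as below. The sketch assumes single-line FASTA records and the `<rank>-<count>` header format shown in this step. The function names are hypothetical, and the code only imitates the output format of the Galaxy tools.

```python
from collections import Counter


def collapse(reads):
    """Collapse identical reads into FASTA text with '<rank>-<count>' headers.

    Ranks follow descending count; ties keep first-seen order because
    Counter.most_common() orders equal counts by first occurrence.
    """
    lines = []
    for rank, (seq, n) in enumerate(Counter(reads).most_common(), start=1):
        lines.append(f">{rank}-{n}")
        lines.append(seq)
    return "\n".join(lines)


def fasta_to_tabular(fasta_text):
    """Turn single-line FASTA records into (header, sequence) pairs."""
    lines = fasta_text.splitlines()
    return [(lines[i][1:], lines[i + 1]) for i in range(0, len(lines), 2)]


def convert_and_cut(rows):
    """Keep only the count from '<rank>-<count>' and pair it with the sequence."""
    return [(int(header.split("-")[1]), seq) for header, seq in rows]
```

Applied to the first record of the example, `convert_and_cut([("1-234764", "CGCGACCTCAGATCAGA")])` gives `[(234764, "CGCGACCTCAGATCAGA")]`.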

Join datasets by identifier column

Library 1 (sequence, counts):
    Sequence_id1  32
    Sequence_id3  6
    Sequence_id4  1
    Sequence_id5  7
Library 2 (sequence, counts):
    Sequence_id1  8
    Sequence_id4  5
Library 3 (sequence, counts):
    Sequence_id1  9
    Sequence_id2  3
    Sequence_id3  2

Resulting joined table (-: sequence absent from that library):
    sequence      counts lib1  counts lib2  counts lib3
    Sequence_id1  32           8            9
    Sequence_id2  -            -            3
    Sequence_id3  6            -            2
    Sequence_id4  1            5            -
    Sequence_id5  7            -            -
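A sketch of this join as a full outer join in Python, reproducing the three example libraries. Assumptions: each library is a `{sequence_id: count}` dict, and missing counts are filled with 0 (the slide leaves them empty). In the real workflow the sequence itself presumably serves as the identifier, because collapse ranks differ between libraries. `join_libraries` is a hypothetical name.

```python
def join_libraries(libraries):
    """Full outer join of per-library {sequence_id: count} dicts.

    Returns {sequence_id: [count_lib1, count_lib2, ...]} with 0 for libraries
    in which the sequence was not observed. Identifiers are sorted as strings,
    so "Sequence_id10" would sort before "Sequence_id2".
    """
    ids = sorted(set().union(*libraries))
    return {seq_id: [lib.get(seq_id, 0) for lib in libraries] for seq_id in ids}


lib1 = {"Sequence_id1": 32, "Sequence_id3": 6, "Sequence_id4": 1, "Sequence_id5": 7}
lib2 = {"Sequence_id1": 8, "Sequence_id4": 5}
lib3 = {"Sequence_id1": 9, "Sequence_id2": 3, "Sequence_id3": 2}
joined = join_libraries([lib1, lib2, lib3])
```

Filling with 0 rather than leaving cells empty makes the table directly usable for the count statistics and CPM filtering that follow.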

Join datasets by identifier column: results

Count table statistics

Library size differences
Speaker note: I got no good results using Velvet and Oases, so I first focused on Trinity.
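Library sizes (column sums of the count table) and the number of detected sequences per library are the basic statistics behind such a comparison. Below is a hypothetical Python helper. It assumes a non-empty count table shaped like the joined table from the previous step: each sequence maps to a list of per-library counts, with missing counts set to 0.

```python
def library_stats(table):
    """Return (library sizes, detected sequences) per library.

    `table` maps sequence -> list of counts, one per library (missing = 0).
    Library size = total reads in that column; detected = sequences with count > 0.
    """
    n_libs = len(next(iter(table.values())))
    sizes = [sum(counts[j] for counts in table.values()) for j in range(n_libs)]
    detected = [sum(1 for counts in table.values() if counts[j] > 0) for j in range(n_libs)]
    return sizes, detected
```

On the three-library join example this gives sizes `[46, 13, 14]` and detected counts `[4, 2, 3]`.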

Filter count table by CPM (counts per million) cutoff
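A sketch of a CPM filter in Python. The CPM of a sequence in one library is its count divided by that library's total count, times 10^6. The slide gives neither the cutoff nor how many libraries must reach it, so `cutoff` and `min_libs` are hypothetical parameters. Requiring the cutoff in a minimum number of libraries is a common convention (for example in edgeR-based workflows); it is assumed here, not taken from the talk. Library sizes are computed from the unfiltered table.

```python
def cpm_filter(table, cutoff, min_libs):
    """Keep sequences whose CPM is >= cutoff in at least `min_libs` libraries.

    `table` maps sequence -> list of per-library counts (missing = 0).
    CPM = count / library size * 1e6, with library size taken as the column sum
    of the unfiltered table; empty libraries contribute CPM 0.
    """
    n_libs = len(next(iter(table.values())))
    sizes = [sum(counts[j] for counts in table.values()) for j in range(n_libs)]
    kept = {}
    for seq, counts in table.items():
        cpm = [c / size * 1e6 if size else 0.0 for c, size in zip(counts, sizes)]
        if sum(v >= cutoff for v in cpm) >= min_libs:
            kept[seq] = counts
    return kept
```

The filtered table keeps raw counts rather than CPM values, since downstream differential expression tools usually expect raw counts.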

Filtering and mapping against: mature miRNA; precursor miRNA; tRNAs (Ssc); piRNA clusters (Ssc); all transcripts (Ssc); ncRNAs (Hsa). Ssc = Sus scrofa, Hsa = Homo sapiens.
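One possible reading of this list is a hierarchical assignment, in which each sequence is annotated with the first category it matches. The Python sketch below implements that reading. The priority order, the `hits` input (for example parsed from BLAST results) and the function name are all assumptions, not the author's documented procedure.

```python
# Order as listed on the slide; treating it as a priority order is an assumption.
ANNOTATION_ORDER = [
    "mature miRNA",
    "precursor miRNA",
    "tRNAs - Ssc",
    "piRNA cluster - Ssc",
    "all transcripts - Ssc",
    "ncRNAs - Hsa",
]


def assign_categories(sequences, hits, order=ANNOTATION_ORDER):
    """Assign each sequence to the first category in `order` that it hits.

    `hits` maps category -> set of sequences matching that database
    (hypothetical input). Sequences without any hit become "unannotated".
    """
    return {
        seq: next((cat for cat in order if seq in hits.get(cat, set())), "unannotated")
        for seq in sequences
    }
```

With this design, a sequence matching both the mature and the precursor miRNA databases is counted only once, as mature miRNA.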

Annotation sources used for BLAST
Mature and precursor miRNAs; non-coding RNAs (tRNAs, small RNAs); non-coding RNAs (tRNAs, small RNAs), piRNA clusters, mRNAs.
Preparation: convert RNA sequences to DNA (U to T), join FASTA files into groups, remove duplicates, create BLAST databases.
Precursor microRNAs have a hairpin (stem-loop) structure and produce one or two mature microRNAs.
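The database preparation steps can be sketched in Python: convert U to T, merge groups of FASTA records, and drop duplicate sequences while keeping the first header. The record layout and function names are hypothetical. The merged FASTA could then be indexed with BLAST+ `makeblastdb -in <file> -dbtype nucl`, where the file name is a placeholder.

```python
def rna_to_dna(seq):
    """Convert an RNA sequence to the DNA alphabet (uppercase, U replaced by T)."""
    return seq.upper().replace("U", "T")


def merge_fasta_records(groups):
    """Join several lists of (header, sequence) records into one deduplicated list.

    Sequences are converted to DNA; a sequence already seen (e.g. the same
    mature miRNA from two sources) keeps only its first header.
    """
    seen = set()
    merged = []
    for records in groups:
        for header, seq in records:
            dna = rna_to_dna(seq)
            if dna not in seen:
                seen.add(dna)
                merged.append((header, dna))
    return merged


def to_fasta(records):
    """Serialise (header, sequence) records as single-line FASTA text."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)
```

Deduplicating on the converted DNA sequence means that identical entries differing only in case or in U/T are merged as well.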

Analysis of sequence counts
Huge variation between libraries in the percentage of read counts assigned to miRNAs. Could this originate from the RNA isolation procedure?
[Plot: % miRNA; # sequences]
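The per-library miRNA percentage behind this comparison can be sketched as follows. Assumptions: the denominator is the library's total count in the count table, and "miRNA" means sequences assigned to one category label. Both the labels and the function name are hypothetical.

```python
def percent_category(table, assigned, category="mature miRNA"):
    """Percentage of each library's reads whose sequence is assigned to `category`.

    `table`: non-empty dict, sequence -> list of per-library counts.
    `assigned`: sequence -> category label; unassigned sequences count only
    towards the library total.
    """
    n_libs = len(next(iter(table.values())))
    totals = [0] * n_libs
    in_category = [0] * n_libs
    for seq, counts in table.items():
        for j, c in enumerate(counts):
            totals[j] += c
            if assigned.get(seq) == category:
                in_category[j] += c
    return [100 * k / t if t else 0.0 for k, t in zip(in_category, totals)]
```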

Analysis of sequence counts
[Plot: # sequences]

Sequence statistics: unique sequences
Raw reads: ~24,000,000
Filtering: ~68,000
Annotation: ~26,000
Only miRNA, used for DEG analysis: ~1,300
Percentages: 97.2%, 1.8%, 1.1%

Thank you for your attention.
Acknowledgements: Veronika Flöter. Supervisors: Stefan Bauersachs, Susanne Ulbrich, Mark Robinson plus group.