Long way to solve short ncRNA data analysis problems – evaluation of small RNA-Seq datasets from non-model organisms in Galaxy. Jochen Bick

Presentation transcript:

Long way to solve short ncRNA data analysis problems – evaluation of small RNA-Seq datasets from non-model organisms in Galaxy
Jochen Bick, 21.09.2018

Sus scrofa embryo samples
- 36 pig embryos collected at day 10 of pregnancy; sows fed with different doses of estradiol (n=6 per group, 6 groups in total)
- Illumina TruSeq Small RNA libraries
- Sequenced on Illumina HiSeq 4000 at FGCZ, 126 bp single-end reads
- Exogenous substances affecting: endogenous hormonal systems, reproduction, health
- Sample table (columns: # of embryos, sex, E2 (µg)); only fragments survive in the transcript: 6, ♀, 10, 1000, ♂

Workflow overview in Galaxy
- Quality control with FastQC
- Trimming, adapter clipping
- Count number of sequences
- Filter and sequence mapping

Filtering
- Clip adapter: Universal Illumina Adapter; min length of 16 bp; only keep sequences with clipped adapter
- Trimmomatic: MINLEN: 16, LEADING: 3; no quality trimming
- FastQC: after each step for quality checks
Speaker note: Why are the first 3 removed again?
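The two Trimmomatic settings listed on this slide can be illustrated with a minimal Python sketch. In Trimmomatic, LEADING:3 removes bases from the start of a read for as long as their quality is below 3 (it does not cut a fixed three bases), and MINLEN:16 discards reads shorter than 16 bp. The sketch assumes Phred+33 quality encoding; the function name and the example read are hypothetical, and the actual workflow ran Trimmomatic inside Galaxy rather than code like this.

```python
def trim_leading_and_filter(seq, qual, leading=3, minlen=16, offset=33):
    """Mimic LEADING:<q> followed by MINLEN:<n> for a single read.

    Assumes Phred+33 encoded quality strings (an assumption of this sketch).
    Returns (sequence, quality) or None if the read is dropped.
    """
    start = 0
    # LEADING: skip bases at the 5' end while their quality is below the threshold
    while start < len(seq) and ord(qual[start]) - offset < leading:
        start += 1
    seq, qual = seq[start:], qual[start:]
    # MINLEN: drop reads that are too short after trimming
    if len(seq) < minlen:
        return None
    return seq, qual


# Hypothetical read: two low-quality leading bases ('!' = Phred 0), then 18 good bases
read_seq = "NNCGCGACCTCAGATCAGAC"
read_qual = "!!" + "I" * 18
result = trim_leading_and_filter(read_seq, read_qual)

# A 4 bp read is shorter than MINLEN and is discarded
short = trim_leading_and_filter("ACGT", "IIII")
```

Here `result` holds the 18 bp read without its two low-quality leading bases, and `short` is `None`.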

How to generate the count table

More converting steps: Collapse, FASTA-to-Tabular, Convert + Cut

Collapse (FASTA output):
  >1-234764
  CGCGACCTCAGATCAGA
  >2-44042
  CGCGACCTCAGATCAGAC
  >3-31455
  ACGCGACCTCAGATCAGA
  >4-29208
  ACTCAAACTGTGGGGGCACTTT
  >5-27371
  TAGCTTATCAGACTGATGTTGAC
  >6-26520
  TAAGTGCTTCCATGTTTTAGTAG
  >7-22345
  ACGCGACCTCAGATCAGACG

FASTA-to-Tabular:
  1          2
  1-234764   CGCGACCTCAGATCAGA
  2-44042    CGCGACCTCAGATCAGAC
  3-31455    ACGCGACCTCAGATCAGA
  4-29208    ACTCAAACTGTGGGGGCACTTT
  5-27371    TAGCTTATCAGACTGATGTTGAC
  6-26520    TAAGTGCTTCCATGTTTTAGTAG
  7-22345    ACGCGACCTCAGATCAGACG

Convert + Cut:
  1          2
  234764     CGCGACCTCAGATCAGA
  44042      CGCGACCTCAGATCAGAC
  31455      ACGCGACCTCAGATCAGA
  29208      ACTCAAACTGTGGGGGCACTTT
  27371      TAGCTTATCAGACTGATGTTGAC
  26520      TAAGTGCTTCCATGTTTTAGTAG
  22345      ACGCGACCTCAGATCAGACG
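The conversion chain on this slide (collapsed FASTA, then tabular, then count plus sequence) can be sketched in a few lines of Python. This is only an illustration of the data transformation, not the implementation of the Galaxy tools. It assumes that each header has the form `>rank-count`, as in the slide's example, and that every record has a single sequence line; the function name is hypothetical.

```python
def collapsed_fasta_to_counts(fasta_text):
    """Turn collapsed FASTA text into a list of (count, sequence) rows.

    Assumes headers like '>1-234764' (rank-count) and one sequence line
    per record, as in the slide example.
    """
    rows = []
    header = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]          # e.g. '1-234764'
        elif header is not None:
            _rank, count = header.split("-", 1)
            rows.append((int(count), line))
            header = None
    return rows


# First three records from the slide
example = """>1-234764
CGCGACCTCAGATCAGA
>2-44042
CGCGACCTCAGATCAGAC
>3-31455
ACGCGACCTCAGATCAGA
"""

table = collapsed_fasta_to_counts(example)
for count, seq in table:
    print(f"{count}\t{seq}")
```

Keeping the sequence itself as the second column means it can later serve as the identifier when count tables from several libraries are combined.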


Join datasets by identifier column

Library 1:
  sequence        counts
  Sequence_id1    32
  Sequence_id3    6
  Sequence_id4    1
  Sequence_id5    7

Library 2:
  sequence        counts
  Sequence_id1    8
  Sequence_id4    5

Library 3:
  sequence        counts
  Sequence_id1    9
  Sequence_id2    3
  Sequence_id3    2

Results in joined table (- = sequence not present in that library):
  sequence        counts lib1    counts lib2    counts lib3
  Sequence_id1    32             8              9
  Sequence_id2    -              -              3
  Sequence_id3    6              -              2
  Sequence_id4    1              5              -
  Sequence_id5    7              -              -
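The join shown on this slide behaves like a full outer join on the sequence identifier. A minimal Python sketch using the slide's three example libraries follows. Filling missing entries with 0 is an assumption of this sketch (the slide leaves those cells empty), and the function name is hypothetical.

```python
def join_count_tables(libraries, missing=0):
    """Outer-join per-library count dicts on their sequence identifier.

    libraries: list of {sequence_id: count} dicts, one per library.
    missing:   value used when a sequence is absent from a library
               (0 is an assumption made for this sketch).
    """
    all_ids = sorted(set().union(*(lib.keys() for lib in libraries)))
    return {
        seq_id: [lib.get(seq_id, missing) for lib in libraries]
        for seq_id in all_ids
    }


# Example libraries taken from the slide
lib1 = {"Sequence_id1": 32, "Sequence_id3": 6, "Sequence_id4": 1, "Sequence_id5": 7}
lib2 = {"Sequence_id1": 8, "Sequence_id4": 5}
lib3 = {"Sequence_id1": 9, "Sequence_id2": 3, "Sequence_id3": 2}

joined = join_count_tables([lib1, lib2, lib3])
for seq_id, counts in joined.items():
    print(seq_id, *counts, sep="\t")
```

Every sequence seen in at least one library gets a row, so the joined table grows with the union of all unique sequences rather than their intersection.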

Join datasets by identifier column: results

Count table statistics

Library size differences
Speaker note: I got no good results using Velvet and Oases, so I first focused on Trinity.


Filter count table by CPM cutoff
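Counts per million (CPM) scale each count by the library size, CPM = count / library size × 1,000,000, so that libraries of different depth become comparable before a cutoff is applied. The sketch below is a minimal illustration under stated assumptions: the cutoff of 1 CPM, the requirement of at least two libraries, the function name, and the example counts are all hypothetical, because the slide does not give the values that were actually used.

```python
def cpm_filter(counts, cutoff=1.0, min_libraries=2):
    """Keep sequences whose CPM reaches `cutoff` in at least `min_libraries` libraries.

    counts: {sequence_id: [count_lib1, count_lib2, ...]}
    cutoff and min_libraries are illustrative defaults, not values from the slides.
    """
    if not counts:
        return {}
    n_libs = len(next(iter(counts.values())))
    # Library size = total count per column
    lib_sizes = [sum(row[i] for row in counts.values()) for i in range(n_libs)]
    kept = {}
    for seq_id, row in counts.items():
        cpm = [c / size * 1_000_000 if size else 0.0
               for c, size in zip(row, lib_sizes)]
        if sum(value >= cutoff for value in cpm) >= min_libraries:
            kept[seq_id] = row
    return kept


# Made-up counts; each library sums to 1,000,000 reads
counts = {
    "seq_a": [500, 400, 450],
    "seq_b": [1, 0, 0],
    "seq_c": [999_499, 999_600, 999_550],
}
kept = cpm_filter(counts)
print(sorted(kept))
```

In this example `seq_b` is present in only one library and is removed, while the other two sequences pass the filter.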

Filtering and mapping
- mature miRNA
- precursor miRNA
- tRNAs (Ssc)
- piRNA cluster (Ssc)
- all transcripts (Ssc)
- ncRNAs (Hsa)

Annotation sources used for BLAST
- Mature and precursor miRNA
- Non-coding RNA (tRNAs, small RNAs)
- Non-coding RNA (tRNAs, small RNAs), piRNA cluster
- mRNAs
Preparation: convert RNA sequences to DNA, join FASTA files into groups, remove duplicates, create BLAST databases
Precursor microRNAs with a hairpin/stem-loop structure produce one or two mature microRNAs.
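The preparation steps named on this slide (RNA to DNA, joining FASTA files, removing duplicates) can be sketched as follows. This is a minimal illustration, not the Galaxy tools used in the workflow. It assumes that duplicates are identified by identical sequence, which the slide does not specify; the function name and record identifiers are hypothetical, and the two sequences are RNA versions of sequences shown earlier in the presentation.

```python
def prepare_blast_fasta(fasta_texts):
    """Merge FASTA texts, convert RNA to DNA (U -> T), drop duplicate sequences.

    Duplicates are detected by sequence identity (an assumption of this sketch).
    Returns a list of (header, dna_sequence) records in first-seen order.
    """
    seen = set()
    records = []
    for text in fasta_texts:
        header, chunks = None, []
        # A trailing '>' acts as a sentinel so the last record is flushed
        for line in text.splitlines() + [">"]:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    seq = "".join(chunks).upper().replace("U", "T")
                    if seq and seq not in seen:
                        seen.add(seq)
                        records.append((header, seq))
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    return records


# Hypothetical input files; identifiers are placeholders
file_a = ">example_1\nUAGCUUAUCAGACUGAUGUUGAC\n"
file_b = ">example_1_copy\nTAGCTTATCAGACTGATGTTGAC\n>example_2\nACUCAAACUGUGGGGGCACUUU\n"

records = prepare_blast_fasta([file_a, file_b])
for header, seq in records:
    print(f">{header}\n{seq}")
```

The merged, de-duplicated FASTA could then be given to a BLAST database builder such as NCBI's `makeblastdb`; the exact settings used in Galaxy are not stated on the slide.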

Analysis of sequence counts
- Huge variation in percentage of read counts for miRNAs
- Originating from RNA isolation procedure?
[Figure; axis labels: % miRNA, # sequences]

Analysis of sequence counts
[Figure; axis label: # sequences]

Sequence statistics (unique sequences)
- Raw reads: ~24,000,000
- Filtering: ~68,000
- Annotation: ~26,000
- Used for DEG analysis: ~1,300 (only miRNA)
Percentages shown on the slide (their assignment is not recoverable from the transcript): 97.2%, 1.8%, 1.1%

Thank you for your attention

Acknowledgements
Veronika Flöter
Supervisors:
Stefan Bauersachs
Susanne Ulbrich
Mark Robinson plus group