Garbage In, Garbage Out: Quality control on sequence data

Key concepts of session: The quality of the data limits what you can confidently say about the data and how you can subsequently use it. An important component of quality control is visualization: you must actually LOOK at your data.

So you have reads off a sequencer … where do you start? Start with the FASTQ format. More on the file format and quality encoding: https://en.wikipedia.org/wiki/FASTQ_format
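
To make the quality encoding concrete, here is a minimal sketch of reading one FASTQ record and decoding its quality string. It assumes the common Phred+33 (Sanger / Illumina 1.8+) offset; the function names and the example read are hypothetical and not part of the workshop scripts.

```python
def parse_fastq_record(lines):
    """Split one four-line FASTQ record into (read_id, sequence, quality)."""
    header, sequence, separator, quality = [line.rstrip("\n") for line in lines]
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a well-formed FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    return header[1:], sequence, quality


def phred_scores(quality, offset=33):
    """Decode a quality string into integer Phred scores (offset 33 is assumed)."""
    return [ord(char) - offset for char in quality]


def error_probability(score):
    """Convert a Phred score Q into the base-call error probability 10^(-Q/10)."""
    return 10 ** (-score / 10)


# Made-up record: four high-quality bases, one medium, two very poor.
record = ["@read1\n", "GATTACA\n", "+\n", "IIII5##\n"]
read_id, sequence, quality = parse_fastq_record(record)
scores = phred_scores(quality)
print(read_id, sequence, scores)
```

A Phred score of 20 corresponds to a 1 in 100 chance that the base call is wrong, and a score of 40 to 1 in 10,000, which is why the per-position quality plots in FastQC are worth looking at first.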

Expectation

But the reality may be very different

So what? Why does QC matter? You are going to spend a LOT of time (and $) on this dataset. Downstream analysis software assumes reasonably well-behaved data!

How to assess a bag of reads. Pre-mapping (FastQC): GC content and read quality (Phred score). Post-mapping: read coverage (which regions, how much) and library complexity (number of unique reads).
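
As a rough illustration of two of these metrics, the sketch below computes GC content per read and the fraction of distinct sequences in a set of reads. This is not how FastQC or any mapping-based tool computes them: in particular, counting distinct read sequences is only a crude pre-mapping proxy for library complexity, and the function names and reads are made up for the example.

```python
def gc_content(sequence):
    """Fraction of bases that are G or C (N and other symbols count as non-GC)."""
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


def unique_fraction(sequences):
    """Distinct sequences divided by total sequences, a crude complexity proxy."""
    if not sequences:
        return 0.0
    return len(set(sequences)) / len(sequences)


# Made-up reads: one sequence appears twice, as a duplicate would.
reads = ["GGCC", "GGCC", "ATAT", "GCAT"]
per_read_gc = [gc_content(read) for read in reads]
mean_gc = sum(per_read_gc) / len(per_read_gc)
complexity = unique_fraction(reads)
print(per_read_gc, mean_gc, complexity)
```

A GC distribution far from what the organism predicts, or a unique fraction that drops sharply as more reads are sampled, are both signs worth checking against the protocol.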

Protocol matters: how the experiment influences your QC. Mistakes in the protocol can result in abnormal distributions. Poor read quality leads to poor mapping, which in turn leads to poor coverage.

WHY doesn’t it look like I wanted? Possible causes: cell clustering (over-amplification); low library complexity; problems with amplification or size selection; problems with adapters. See also: https://sequencing.qcfail.com/

But one person’s garbage is another’s treasure.

You can still obtain information. Even low-coverage samples can tell you which genes are being actively transcribed and which genes are differentially expressed (depending on depth and coverage).

Running FastQC – Pre-Trim: Determine which adapters are present if you are unsure of the protocol. Assess whether the sequencing/protocol is providing the expected results. Refine trimming options.

In this script, we will flip reads (reverse complement; protocol dependent) and run FastQC. To run (after adjusting parameters in green box): $ bash fastqc_pretrim.sh
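
What "flipping" a read means can be sketched in a few lines. This is an illustration only, not the contents of fastqc_pretrim.sh; the function names are hypothetical. The point to notice is that the quality string must be reversed together with the bases so that each score still belongs to its base.

```python
# Translation table for complementing bases; N stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def flip_read(sequence, quality):
    """Reverse-complement the bases and reverse the qualities to keep them aligned."""
    return reverse_complement(sequence), quality[::-1]


flipped_sequence, flipped_quality = flip_read("AACG", "II5#")
print(flipped_sequence, flipped_quality)
```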

Open up our fastqc .html report

Trimming: Many different trimming programs are available. We will use “bbduk”, which has a quick runtime and lots of trim options. $ vi trim.sh

In this script, we will trim for adapters (followed by length) and trim for quality. To run (after adjusting rootname/project): $ bash trim.sh
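
The idea behind these trimming steps can be sketched as follows. This is deliberately simplified and is not bbduk's algorithm: bbduk matches adapters by k-mers and tolerates mismatches, whereas this sketch only finds an exact adapter match, trims low-quality bases from the 3' end, and applies a single length filter at the end. The function names, thresholds, adapter, and example read are all assumptions chosen for illustration.

```python
def trim_adapter(sequence, quality, adapter):
    """Cut the read at the first exact occurrence of the adapter, if any."""
    position = sequence.find(adapter)
    if position == -1:
        return sequence, quality
    return sequence[:position], quality[:position]


def trim_low_quality_tail(sequence, quality, min_score=10, offset=33):
    """Drop bases from the 3' end while their Phred score is below min_score."""
    end = len(sequence)
    while end > 0 and ord(quality[end - 1]) - offset < min_score:
        end -= 1
    return sequence[:end], quality[:end]


def trim_read(sequence, quality, adapter, min_score=10, min_length=5):
    """Adapter trim, then quality trim; return None if the read ends up too short."""
    sequence, quality = trim_adapter(sequence, quality, adapter)
    sequence, quality = trim_low_quality_tail(sequence, quality, min_score)
    if len(sequence) < min_length:
        return None
    return sequence, quality


# Made-up read: 7 good bases, 2 poor bases, then a hypothetical adapter.
kept = trim_read("GATTACAGGAGATCGG", "IIIIIII##IIIIIII", "AGATCGG")
# A read that is adapter from the first base leaves nothing to keep.
dropped = trim_read("AGATCGGAAG", "IIIIIIIIII", "AGATCGG")
print(kept, dropped)
```

Reads that are mostly adapter disappear entirely after trimming, which is one reason the number of reads surviving the length filter is worth checking in the trim statistics.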

View trim stats
$ cd /home/user/hackcon/trimmed
$ ls
$ vim sample.stats
What can we learn from this report?

Running FastQC – Post-Trim: Determine which adapters are present if you are unsure of the protocol. Assess whether the sequencing/protocol is providing the expected results. Refine trimming options.

In this script, we will assess our trimming parameters and determine whether we need to re-trim or move forward with mapping. To run (after adjusting rootname/project): $ bash fastqc_postrim.sh

Open up our fastqc .html report