EMC Galaxy Course November 24-25, 2014

Introduction to Data Processing and Variant Detection for NGS DNA Sequencing
EMC Galaxy Course, November 24-25, 2014
Youri Hoogstrate, David van Zessen, Saskia Hiltemann, Guido Jenster, Andrew Stubbs

How does next-gen sequencing work?

Instruments generate short reads that need to be mapped to the reference

High-level overview of NGS data processing

Aligned reads
In Galaxy, you can view your data in the built-in genome browser, Trackster.

Challenge: distinguishing variants from noise
Possible reasons for a mismatch:
- True SNP
- Error generated in library prep
- Base calling error
- Misalignment (mapping error)
- Error in reference genome

Genotyping
- What is the set of alleles at this locus? What are their frequencies?
- Genotypers begin with a model of prior knowledge about the likelihood (and types) of errors, and the likelihood of observing real variants.
- Error models depend on the sequencing technology.
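The idea of combining an error model with prior knowledge can be sketched for a single diploid, biallelic site. This is a minimal illustration, not the model used by any of the genotypers in this course: the binomial read model, the `error_rate` default, and the prior values are all assumptions chosen for the example.

```python
from math import comb

def genotype_posteriors(n_ref, n_alt, error_rate=0.01, priors=None):
    """Posterior probability of each genotype (RR, RA, AA) at one site.

    Assumes each read independently shows the alternate allele with
    probability error_rate (RR), 0.5 (RA) or 1 - error_rate (AA).
    """
    if priors is None:
        # Hypothetical priors: most sites match the reference.
        priors = {"RR": 0.998, "RA": 0.0015, "AA": 0.0005}
    p_alt = {"RR": error_rate, "RA": 0.5, "AA": 1.0 - error_rate}
    n = n_ref + n_alt
    unnormalised = {}
    for genotype, p in p_alt.items():
        # Binomial likelihood of seeing n_alt alternate reads out of n.
        likelihood = comb(n, n_alt) * p ** n_alt * (1.0 - p) ** n_ref
        unnormalised[genotype] = priors[genotype] * likelihood
    total = sum(unnormalised.values())
    return {g: v / total for g, v in unnormalised.items()}

# One mismatching read out of 20 is best explained as an error (RR),
# while 10 out of 20 is best explained as a heterozygous site (RA).
for counts in [(19, 1), (10, 10)]:
    post = genotype_posteriors(*counts)
    print(counts, max(post, key=post.get))
```

Under these assumptions a single mismatch at 20x depth is attributed to sequencing error, which is the "variant versus noise" decision described above.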

What we know about NGS technology
- Relatively high per-base error rate
- Reads are higher quality in the middle than at the ends
- Some technologies perform poorly on homopolymers and GC-rich regions
- Indels confuse alignment
- Sequence coverage is not uniform
- Alignments are probabilistic

Quality Control
- Local realignment
- Remove duplicate reads
- Filter low-quality reads
- Recalibrate base qualities
- Read trimming
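Two of the quality-control steps listed above, read trimming and filtering of low-quality reads, can be sketched as follows. This is a deliberately simple 3'-end trimmer and mean-quality filter, not the algorithm of any particular Galaxy tool; the function names and the thresholds (`min_q`, `min_mean_q`, `min_len`) are assumptions for illustration.

```python
def trim_3prime(seq, quals, min_q=20):
    """Remove bases from the 3' end while their Phred score is below min_q."""
    end = len(quals)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]

def passes_filter(quals, min_mean_q=20, min_len=4):
    """Keep a read only if it is long enough and its mean quality is high enough."""
    return len(quals) >= min_len and sum(quals) / len(quals) >= min_mean_q

# Hypothetical read whose last two bases are of low quality.
seq, quals = trim_3prime("ACGTAC", [30, 32, 35, 30, 12, 8])
print(seq, quals, passes_filter(quals))
```

Trimming before filtering means a read with a good core and poor ends is rescued rather than discarded.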

Quality Score
Fastq: raw reads with per-base quality scores
Each quality character is the ASCII character with code (Phred score + 33), so that all characters are printable
Q = -10 log10(P), where P is the base-calling error probability
- Q=10: error rate 10%
- Q=20: error rate 1%
- Q=30: error rate 0.1%
- etc.
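The encoding and the formula above can be checked with a few lines of Python. This is a small sketch assuming the +33 offset stated on the slide; the function names are made up for the example.

```python
PHRED_OFFSET = 33  # offset stated on the slide

def decode_quality_string(qual):
    """Convert a FASTQ quality string into a list of Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in qual]

def phred_to_error_prob(q):
    """Invert Q = -10 * log10(P) to get P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)

scores = decode_quality_string("!+5?I")
print(scores)  # [0, 10, 20, 30, 40]
for q in scores:
    print(q, phred_to_error_prob(q))
```

Every step of 10 in Q corresponds to a tenfold drop in the error probability.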

Quality Control Tool: FastQC

Sequencing Depth
Towards accurate detection and genotyping of expressed variants from whole transcriptome sequencing data. BMC Genomics 2012, 13(Suppl 2):S6

Tools
Popular tools:
- SAMtools mpileup (practical)
- GATK Unified Genotyper (practical, extra part)
- FreeBayes (practical, extra part)
- MAQ
- VarScan 2
All are available in the Galaxy Tool Shed.
There is always a trade-off between sensitivity and specificity, that is, between false positives and false negatives.
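The trade-off can be made concrete by comparing a caller's output against a set of known variants. The sketch below is an assumption-laden illustration: the variant tuples and both call sets are invented, and it reports sensitivity and precision rather than specificity, because specificity would also need the number of true non-variant sites.

```python
def compare_calls(called, truth):
    """Count true/false positives and false negatives for one call set."""
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # real variants that were called
    fp = len(called - truth)   # calls that are not real variants
    fn = len(truth - called)   # real variants that were missed
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    precision = tp / (tp + fp) if tp + fp else float("nan")
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "precision": precision}

# Hypothetical data: variants are (chromosome, position, ref, alt).
truth = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr2", 75, "G", "A")}
strict = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T")}
lenient = truth | {("chr1", 300, "T", "C"), ("chr2", 90, "A", "T")}

print("strict ", compare_calls(strict, truth))
print("lenient", compare_calls(lenient, truth))
```

In this made-up example the strict settings miss one real variant but report no false positives, while the lenient settings find every real variant at the cost of two false calls.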

Practical
- Raw data (fastq files)
- QC with FastQC
- Map with BWA
- Visualize with Trackster
- Call variants with Mpileup
- Annotate variants with ANNOVAR
Time permitting: call variants with FreeBayes and GATK Unified Genotyper, and compare the three callers.

Practical Session
Learn by doing it yourself!
Servers:
- galaxy-training1.trait-ctmm.cloudlet.sara.nl
- galaxy-training2.trait-ctmm.cloudlet.sara.nl
- galaxy-training3.trait-ctmm.cloudlet.sara.nl
- ..
Log in to your account.
All handouts and slides can be found under Shared Data → Data Libraries.
Manual: [Course Manual] EMC Galaxy Training 2: Introduction to Galaxy.pdf