What should a bioinformatician know about DNA sequencing, and why?


Update this table: remove SOLiD; add Life Technologies Ion PGM and Ion Proton, and Illumina MiSeq. Update all entries with the latest information on read length.

What are the error types and rates of the different platforms?

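Quality scores

Phred quality score: Q = -10 log10(e), where e is the estimated probability that the base call is wrong.

Quality score   Prob. wrong base call   Accuracy of base call
10              1/10                    90%
20              1/100                   99%
30              1/1,000                 99.9%
40              1/10,000                99.99%
50              1/100,000               99.999%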
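The Phred relationship can be checked with a few lines of Python. This is a minimal sketch; the function names `phred_to_error_prob` and `error_prob_to_phred` are illustrative and not taken from any library.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Return the error probability e for a Phred score Q, from Q = -10 log10(e)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(e: float) -> float:
    """Return the Phred score Q for an error probability e (0 < e <= 1)."""
    if not 0 < e <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(e)


# Print one line per quality score listed on the slide.
for q in (10, 20, 30, 40, 50):
    e = phred_to_error_prob(q)
    print(f"Q{q}: error probability {e:g}, accuracy {100 * (1 - e):g}%")
```

Because the scale is logarithmic, each step of 10 in Q corresponds to a tenfold drop in the probability of a wrong base call.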

Wikipedia.org

FASTQ format

4 lines per record: sequence + quality (+ optional description)

    @SEQ_ID
    GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
    +
    !''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65

The third line is "+" plus an optional repeat of line 1; it is often left as just the + character to save space.

But beware! There are at least 3 different FASTQ file standards, indistinguishable in format but incompatible with each other.

Wikipedia.org
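The four-line layout can be read with a small parser. The sketch below assumes every record occupies exactly four lines (no wrapped sequence or quality lines); `FastqRecord`, `read_fastq` and the record named `read1` are hypothetical names made up for illustration.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str       # line 1 without the leading "@"
    sequence: str   # line 2
    quality: str    # line 4, one character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, assuming four lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@"):
            raise ValueError(f"expected '@' at start of record, got {header!r}")
        if not separator.startswith("+"):
            raise ValueError(f"expected '+' separator line, got {separator!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


# A made-up single-record input, used only to exercise the parser.
example = "@read1 example\nGATTTGGGGT\n+\n!''*((((**\n"

for record in read_fastq(io.StringIO(example)):
    print(record.name, record.sequence, record.quality)
```

Note that the parser can check the layout but cannot tell which quality encoding was used; that has to come from knowledge of how the file was produced.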

FASTQ variants

Name                                  ASCII range   Offset   Q score type   Q score range
Sanger standard; fastq-sanger         33-126        33       PHRED          0 to 93 (raw 0 to 40)
Solexa/Illumina <1.3; fastq-solexa    59-126        64       Solexa         -5 to 62 (raw -5 to 40)
Illumina 1.3+; fastq-illumina         64-126        64       PHRED          0 to 62 (raw 0 to 40)
Illumina 1.5+                         67-126        64       PHRED          3 to 62 (raw 3 to 40)
Illumina 1.8+                         33-126        33       PHRED          0 to 93 (raw 0 to 41)
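Decoding a quality string means subtracting the variant's offset from each character's ASCII code. The sketch below uses the offsets and Q score ranges of the three named variants; `decode_quality`, `possible_variants` and `solexa_to_phred` are illustrative names, and the Solexa-to-PHRED formula is the standard conversion, added here as background rather than taken from the slide.

```python
import math

# (offset, lowest Q score, highest Q score) for each named variant.
VARIANTS = {
    "fastq-sanger": (33, 0, 93),
    "fastq-solexa": (64, -5, 62),
    "fastq-illumina": (64, 0, 62),
}


def decode_quality(quality: str, variant: str) -> list:
    """Convert a quality string to integer scores for the given variant."""
    offset, q_min, q_max = VARIANTS[variant]
    scores = [ord(ch) - offset for ch in quality]
    if any(not q_min <= q <= q_max for q in scores):
        raise ValueError(f"quality string is not valid {variant}")
    return scores


def possible_variants(quality: str) -> list:
    """List the variants under which every character decodes to a valid score."""
    result = []
    for name in VARIANTS:
        try:
            decode_quality(quality, name)
        except ValueError:
            continue
        result.append(name)
    return result


def solexa_to_phred(q_solexa: float) -> float:
    """Convert a Solexa-scaled score to the PHRED scale."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


print(possible_variants("IIII"))  # high ASCII codes fit more than one variant
print(possible_variants("!5"))    # low ASCII codes rule out the offset-64 variants
print(decode_quality("IIII", "fastq-sanger"), decode_quality("IIII", "fastq-illumina"))
```

The same characters decode to very different scores depending on the assumed offset, which is why a file's variant should be recorded rather than guessed whenever possible.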

What use is the quality score?
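Two common uses, sketched here as one possible answer rather than the presenter's: estimating how many errors a read is expected to contain, and trimming low-quality bases from the 3' end. The sketch assumes offset-33 (Sanger) encoding; the Q20 threshold and the function names are arbitrary choices for illustration.

```python
def expected_errors(quality: str, offset: int = 33) -> float:
    """Sum the per-base error probabilities implied by the quality string."""
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def trim_3prime(sequence: str, quality: str, min_q: int = 20, offset: int = 33):
    """Remove trailing bases whose quality score is below min_q."""
    end = len(quality)
    while end > 0 and ord(quality[end - 1]) - offset < min_q:
        end -= 1
    return sequence[:end], quality[:end]


# Hypothetical read: two Q40 bases, one Q20 base, one Q0 base.
seq, qual = "ACGT", "II5!"
print(expected_errors(qual))
print(trim_3prime(seq, qual))
```

Quality scores are also used downstream, for example by aligners and variant callers when weighing the evidence from each base.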

What factors should be considered in the choice of a DNA sequencing platform?