DM Church. Last Updated: 7 May 2012. Intro to Next Generation Sequencing.

Presentation transcript:

Nick Loman and James Hadfield

Koboldt et al., 2010 (Figure 3)

Bench work to build libraries and sequence; clean up and QA reads; alignments to genome or transcriptome; analysis of alignments.

Koboldt et al., 2010: sample contamination; library chimeras; sample mix-ups; tumor-normal switches; run quality.

Koboldt et al. (Fig 4A)

Chor et al., 2009

CCL Bio

GCTACGGCATTCAGGCATCAGGCATTAGCAG
GGCATTCAGGGATCAGGCATTAGC->
<-CATGGCATTCAGGGATCAGGCATT
<-GCCATGGCATTCAGGGATCAGGC
CATTCAGGGATCAGGCATTAGCAG->
GGCATTCAGGGATCAGGCATTAGC->
CATTCAGGGATCAGGCATTAGCAG->
GGCATTCAGGGATCAGGCATT->
<-GGATCAGGCATTAGCAG
<-GATCAGGCATTAGCAG
<-GGATCAGGCATTAGCAG

High Coverage: qualities may not be needed

Low Coverage: qualities are important

Custodia-Lora et al., 2003

FASTQ Example. FASTQ example from: Cock et al. (2009). Nuc Acids Res 38. For analysis, it may be necessary to convert to the Sanger form of FASTQ. For example, Illumina stores quality scores ranging from 0-62, whereas the Sanger form uses a different range. Solexa quality scores have to be converted to PHRED quality scores.
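The conversion described above can be sketched roughly as follows. This is a minimal illustration, not code from the slides: it assumes the conventions described in Cock et al. (Sanger FASTQ encodes PHRED scores at ASCII offset 33; Solexa and Illumina 1.3+ FASTQ use ASCII offset 64; Solexa scores map to PHRED via Q_phred = 10 * log10(10^(Q_solexa/10) + 1)). The function names are hypothetical.

```python
import math

SANGER_OFFSET = 33    # assumed: Sanger FASTQ stores PHRED scores as chr(Q + 33)
ILLUMINA_OFFSET = 64  # assumed: Solexa / Illumina 1.3+ FASTQ store scores as chr(Q + 64)


def solexa_to_phred(q_solexa):
    """Convert one Solexa score to the nearest integer PHRED score."""
    return int(round(10 * math.log10(10 ** (q_solexa / 10.0) + 1)))


def illumina13_to_sanger(qual):
    """Re-encode an Illumina 1.3+ (PHRED+64) quality string as Sanger (PHRED+33).

    The scores are already PHRED scores, so only the ASCII offset changes.
    """
    return "".join(chr(ord(c) - ILLUMINA_OFFSET + SANGER_OFFSET) for c in qual)


def solexa_to_sanger(qual):
    """Re-encode a Solexa (Solexa+64) quality string as Sanger (PHRED+33).

    Here both the score scale and the ASCII offset change.
    """
    return "".join(
        chr(solexa_to_phred(ord(c) - ILLUMINA_OFFSET) + SANGER_OFFSET)
        for c in qual
    )


if __name__ == "__main__":
    print(illumina13_to_sanger("hhhh"))  # 'h' encodes Q40 at offset 64
    print(solexa_to_phred(-5), solexa_to_phred(40))
```

The two scales differ mostly at low quality, which is why a plain offset shift is not enough for Solexa-encoded files.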

SAM (Sequence Alignment/Map). It may not be necessary to align reads from scratch; you can instead use existing alignments in SAM format. SAM is the output of aligners that map reads to a reference genome. It is tab-delimited, with a header section and an alignment section. Header lines begin with '@' and are optional. The alignment section has 11 mandatory fields. BAM is the binary format of SAM.

Mandatory Alignment Fields
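A minimal sketch of reading the 11 mandatory alignment fields from one SAM line. The column names are assumed from the current SAM specification (older versions of the specification named some mate-related columns differently). The record type, the function name, and the example line are hypothetical, for illustration only.

```python
from collections import namedtuple

# Column names assumed from the SAM specification, in file order.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
INT_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}

SamRecord = namedtuple("SamRecord", SAM_FIELDS + ["tags"])


def parse_sam_line(line):
    """Parse one SAM line; return None for header lines (which start with '@')."""
    if line.startswith("@"):
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) < len(SAM_FIELDS):
        raise ValueError("expected at least 11 tab-separated fields")
    values = [int(v) if name in INT_FIELDS else v
              for name, v in zip(SAM_FIELDS, cols)]
    # Anything after column 11 is an optional TAG:TYPE:VALUE field.
    return SamRecord(*values, tags=cols[11:])


if __name__ == "__main__":
    # Hypothetical alignment line, not taken from the slides.
    example = "read1\t99\tchr1\t100\t60\t8M\t=\t150\t58\tACGTACGT\tIIIIIIII\tNM:i:0"
    rec = parse_sam_line(example)
    print(rec.RNAME, rec.POS, rec.CIGAR, rec.tags)
```

For real work a dedicated library is usually a better choice; the sketch only shows how the tab-delimited layout maps onto named fields.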

Alignment Examples: Alignments in SAM format

Valid BED files [example rows: chromosome, coordinates, and nsv or chr1: region names]
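A minimal sketch of reading a BED row, assuming the usual BED convention that the first three columns are chromosome, 0-based start, and exclusive end, with an optional name in column 4. The function names and the example row are hypothetical; the coordinates are made up for illustration.

```python
def parse_bed_line(line):
    """Parse the first three (optionally four) columns of a BED row."""
    cols = line.split()
    if len(cols) < 3:
        raise ValueError("BED rows need at least chrom, start, end")
    chrom, start, end = cols[0], int(cols[1]), int(cols[2])
    if start < 0 or end < start:
        raise ValueError("invalid interval: %d-%d" % (start, end))
    name = cols[3] if len(cols) > 3 else None
    return {"chrom": chrom, "start": start, "end": end,
            "name": name, "length": end - start}


def bed_to_region(rec):
    """Convert a 0-based half-open BED interval to a 1-based inclusive region string."""
    return "%s:%d-%d" % (rec["chrom"], rec["start"] + 1, rec["end"])


if __name__ == "__main__":
    rec = parse_bed_line("chr1\t999\t2000\tfeatureA")  # hypothetical row
    print(rec["length"], bed_to_region(rec))
```

The off-by-one between BED intervals and 1-based region strings is a common source of errors when moving between formats.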

GTF

GVF format
##gff-version 3
##gvf-version 1.02
##species
##genome-build NCBI MGSCv36
##assembly-name MGSCv36
##assembly-accession GCF_
##file-date
# Study_accession: Combined studies on MGSCv36
# Display_name: Combined studies on MGSCv36
# Study_description: Combined studies on MGSCv36
chr1 dbVar copy_number_variation ID=nsv433533;Name=nsv433533;Start_range=., ;End_range= ,.
chr4 dbVar copy_number_variation ID=nsv433534;Name=nsv433534;Start_range=., ;End_range= ,.
chr9 dbVar copy_number_variation ID=nsv433535;Name=nsv433535;Start_range=., ;End_range= ,.
chr17 dbVar copy_number_variation ID=nsv433536;Name=nsv433536;Start_range=., ;End_range= ,.
chr17 dbVar copy_number_variation ID=nsv433537;Name=nsv433537;Start_range=., ;End_range= ,.
chr17 dbVar copy_number_variation ID=nsv433538;Name=nsv433538;Start_range=., ;End_range= ,.
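A minimal sketch of parsing one GVF feature row, assuming the nine tab-separated GFF3 columns (seqid, source, type, start, end, score, strand, phase, attributes) with semicolon-separated key=value attributes. The function names are hypothetical, and the coordinates in the example row are invented, since the numeric columns are missing from the transcript above.

```python
def parse_attributes(text):
    """Split a GFF3/GVF attribute column into {key: [values]}."""
    attrs = {}
    for pair in text.strip().split(";"):
        if not pair.strip():
            continue
        key, _, value = pair.partition("=")
        # Multi-valued attributes (e.g. Start_range) are comma-separated.
        attrs[key.strip()] = value.strip().split(",")
    return attrs


def parse_gvf_line(line):
    """Parse one feature row; return None for pragma (##) and comment (#) lines."""
    if line.startswith("#"):
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated columns")
    seqid, source, ftype, start, end, score, strand, phase, attributes = cols
    return {"seqid": seqid, "source": source, "type": ftype,
            "start": int(start), "end": int(end),
            "attributes": parse_attributes(attributes)}


if __name__ == "__main__":
    # Coordinates below are hypothetical placeholders.
    row = ("chr1\tdbVar\tcopy_number_variation\t1000\t2000\t.\t.\t.\t"
           "ID=nsv433533;Name=nsv433533;Start_range=.,1000;End_range=2000,.")
    rec = parse_gvf_line(row)
    print(rec["type"], rec["attributes"]["ID"], rec["attributes"]["Start_range"])
```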

Derived data

Derived data

Actual data

Getting exponential growth under control

Trace Organization: seq1 and seq2 each carry FASTA, Quality, Chromatogram, Experimental info, and Sample.
SRA Organization: Experiments; Samples; Sequences and Qualities.

Era of NGS Explosion; FASTQ Era; Bits/Base Era. As of April 10, 2012, SRA contains fewer bytes than bases.
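To make "fewer bytes than bases" concrete, here is a minimal sketch of one generic way to spend less than a byte per base: packing each of A, C, G, T into 2 bits. This is an illustrative assumption, not a description of how SRA stores data; it ignores quality scores and ambiguous bases such as N, and the function names are hypothetical.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack_bases(seq):
    """Pack an A/C/G/T string into bytes, four bases per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODE[base] << (2 * (i % 4))
    return bytes(out)


def unpack_bases(data, n_bases):
    """Recover the first n_bases bases from packed bytes."""
    return "".join(BASES[(data[i // 4] >> (2 * (i % 4))) & 3]
                   for i in range(n_bases))


if __name__ == "__main__":
    seq = "GCTACGGCATTCAGGC"
    packed = pack_bases(seq)
    print(len(seq), "bases in", len(packed), "bytes")
    print(unpack_bases(packed, len(seq)) == seq)
```

The sequence length has to be stored separately, because the last byte may be only partly used.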

New Cycle
Decision Circle: what data series to store; redundancy removal; normalization; lossy vs lossless; compression tuning.
Practical Application: BAM and similar formats containing both raw reads and alignments become the primary output of raw sequencing; this increases the number of data series; Compression By Reference reduces the sizes of other data series; new sets of tradeoffs; new compression algorithms.
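The idea behind compression by reference can be sketched as follows: an aligned read is stored as its position on the reference plus only the bases that differ, and is rebuilt from the reference on demand. This is a simplified illustration under stated assumptions (ungapped alignments only, no qualities, hypothetical function names), not the algorithm used by SRA or any particular format. The reference and read are the first two sequences from the alignment slide earlier in the deck; the offset of 5 was worked out here by matching the sequences and is not given in the slides.

```python
def encode_read(read, reference, pos):
    """Encode an ungapped read as (pos, length, [(offset, base), ...])."""
    window = reference[pos:pos + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends beyond the reference")
    diffs = [(i, b) for i, (b, r) in enumerate(zip(read, window)) if b != r]
    return (pos, len(read), diffs)


def decode_read(record, reference):
    """Rebuild the read from the reference and the stored differences."""
    pos, length, diffs = record
    bases = list(reference[pos:pos + length])
    for offset, base in diffs:
        bases[offset] = base
    return "".join(bases)


if __name__ == "__main__":
    reference = "GCTACGGCATTCAGGCATCAGGCATTAGCAG"
    read = "GGCATTCAGGGATCAGGCATTAGC"
    record = encode_read(read, reference, 5)
    print(record)  # position, length, and the single differing base
    print(decode_read(record, reference) == read)
```

A 24-base read collapses to a position, a length, and one mismatch; the saving grows with how closely the reads match the reference.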

Analyzing New Compression Method
BAMs from 1000 Genomes Project: data from the 1000 Genomes Project; all available combinations of samples, platforms, and aligners; 3114 files; 27 Tb of disk space after compression.
BAM treatment: names are dropped after restoring mates; only the sequencing quality score is saved; none of the non-redundant optional tags are preserved.
Correction of BAM inconsistencies: occasional alignments to stretches of Ns on the reference and beyond the reference were converted to unaligned; different PCR duplicate flags for mates.

Changes To SRA Run Browser

Science 1 July 2011: Vol. 333 no pp DOI: /science

Li et al., 2011, Fig. 1

Li et al., 2011, Fig. 2

Kleinman et al., 2012, Fig. 1

Kleinman et al., 2012, Table 1

Lin et al., 2012, Fig. 1

Lin et al., 2012, Fig. 2

Pickrell et al., 2012, Fig. 1

Li et al., 2012, Fig. 1

Li et al., 2012, Fig. 2

Li et al., 2012, Fig. 3

Li et al., 2012, Fig. 4