Introduction to next-gen sequencing bioinformatics.ca Canadian Bioinformatics Workshops
Module 1 Introduction to next-gen sequencing
Overview
– “Next-gen” or “next-next-gen”: why are we here?
– What kinds of sequencing are we doing?
– How does DNA sequencing work?
– Trying to stay away from vendor-specific challenges, but can we really?
– Where next?
History of DNA Sequencing
– Miescher: Discovers DNA
– Avery: Proposes DNA as ‘Genetic Material’
– Watson & Crick: Double Helix Structure of DNA
– Holley: Sequences Yeast tRNA-Ala
– Wu: Sequences Cohesive End DNA
– Sanger: Dideoxy Chain Termination
– Gilbert: Chemical Degradation
– Messing: M13 Cloning
– Hood et al.: Partial Automation
– Cycle Sequencing
– Improved Sequencing Enzymes
– Improved Fluorescent Detection Schemes
– Next Generation Sequencing: improved enzymes and chemistry, new image processing
[Figure: timeline plotted against Efficiency (bp/person/year); axis values are not recoverable, and the only surviving year label is 1986]
Adapted from Eric Green, NIH; adapted from Messing & Llaca, PNAS (1998)
Why are we sequencing?
Before next-generation:
– Reductionist perspective on life
– DNA, RNA, (proteins), (populations), sampling, averages, consensus
Problems: sampling, averages, consensus.
After next-generation:
– We are still reductionist, but better
– Genome sequence and structure
– Less cloning/PCR
– Single molecules (for some)
Basics of the “old” technology
– Clone the DNA.
– Generate a ladder of labeled (colored) molecules that differ by 1 nucleotide.
– Separate the mixture on some matrix.
– Detect the fluorochrome by laser.
– Interpret the peaks as a string of DNA.
– Strings are 500 to 1,000 letters long.
– 1 machine generates 57,000 nucleotides/run.
– Assemble all strings into a “whole”.
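The ladder idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how a real base caller works: the template string, the function names `make_ladder` and `read_ladder`, and the shuffle standing in for the unsorted reaction mixture are all made up for illustration, and the sketch ignores strand complementarity and signal noise.

```python
import random


def make_ladder(template):
    """Return one (length, terminal_base) fragment per template position.

    Each fragment stands for a chain that stopped at that position and
    carries a label (color) identifying its last base.
    """
    return [(i + 1, base) for i, base in enumerate(template)]


def read_ladder(fragments):
    """Sort fragments by length (the separation step) and read off the labels."""
    return "".join(base for _, base in sorted(fragments))


template = "GATTACA"           # hypothetical template, for illustration only
mixture = make_ladder(template)
random.shuffle(mixture)        # fragments are mixed together before separation
print(read_ladder(mixture))    # GATTACA
```

Because every fragment length occurs exactly once, sorting by length alone is enough to recover the order of the labels.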
Sanger (old-gen) Sequencing vs. Now-Gen Sequencing
Whole Genome
– Sanger: Human (early drafts), model organisms, bacteria, viruses and mitochondria (chloroplast), low coverage
– Now-Gen: New human (!), individual genome, 1,000 normal, 25,000 cancer matched control pairs, rare samples
RNA
– Sanger: cDNA clones, ESTs, Full Length Insert cDNAs, other RNAs
– Now-Gen: RNA-Seq: digitization of transcriptome, alternative splicing events, miRNA
Communities
– Sanger: Environmental sampling, 16S RNA populations, ocean sampling
– Now-Gen: Human microbiome, deep environmental sequencing, Bar-Seq
Other
– Epigenome, rearrangements, ChIP-Seq
Differences between the various platforms:
– Nanotechnology used.
– Resolution of the image analysis.
– Chemistry and enzymology.
– Signal-to-noise detection in the software.
– Software/images/file size/pipeline.
– Cost $$$
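Next Generation DNA Sequencing Technologies
Adapted from Richard Wilson, School of Medicine, Washington University, “Sequencing the Cancer Genome”
Human Genome: 6 GB == 6,000 MB
Per-platform comparison (three columns; only the “Illumina” column heading survives, and several values are truncated):
– Req’d coverage: values not recoverable
– bp/read: 75 (last column); other values not recoverable
– reads/run: 96; 500,000; 100,… (truncated)
– bp/run: 57,… (truncated); … GB (truncated); 15 GB
– # runs req’d: 625,… (truncated); other values not recoverable
– runs/day: 2; 1; 0.1
– Machine days/human genome: 312,500 (856 years); other values not recoverable
– Cost/run: $48; $6,800; $9,300
– Total cost: $15,000,000; $979,200; $111,600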
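The cost rows and the machine-days figure can be cross-checked with simple arithmetic. The sketch below uses only numbers that survive on the slide; the variable names are made up for illustration, and reading "runs implied" as total cost divided by cost per run is an assumption about how the table was built, not something the slide states.

```python
# Figures as they appear on the slide, in the slide's column order.
cost_per_run = [48, 6_800, 9_300]              # dollars per run
total_cost = [15_000_000, 979_200, 111_600]    # dollars per human genome

# Runs implied by the two cost rows (assumed relation: total = runs * cost/run).
implied_runs = [total // per_run for total, per_run in zip(total_cost, cost_per_run)]
print(implied_runs)    # [312500, 144, 12]

# The slide converts 312,500 machine days into years.
machine_days = 312_500
years = machine_days / 365
print(round(years))    # 856
```

The point of the comparison is the spread: the same genome goes from hundreds of thousands of runs to about a dozen.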
Next-gen sequencers
[Figure: bases per machine run (1 Mb to 100 Gb) plotted against read length (10 bp to 1,000 bp)]
– ABI capillary sequencer: 96 reads, 1-3 hours
– 454 GS FLX pyrosequencer: 0.5-1M reads, 5-10 hours
– AB/SOLiDv3, Illumina/GAII short-read sequencers: 10+ Gb, >100M reads, 4-8 days
From John McPherson, OICR
2009/10 Promises?
[Figure: bases per machine run (1 Mb to 100 Gb) plotted against read length (10 bp to 1,000 bp)]
– ABI capillary sequencer
– 454 GS FLX Titanium
– Illumina GAII: 90 Gb, 175 bp reads
– AB SOLiDv3: 120 Gb, 100 bp reads
From John McPherson, OICR
Solexa-based Whole Genome Sequencing Adapted from Richard Wilson, School of Medicine, Washington University, “Sequencing the Cancer Genome”
Illumina (Solexa)
From Debbie Nickerson, Department of Genome Sciences, University of Washington,
AB SOLiD: file management
SOLiD color space
AB SOLiD
Sample AB data
Lab
>443_1087_001_F3 T
>443_1087_002_F3 T
>443_1087_003_F3 T
>443_1087_004_F3 T
>443_1088_005_F3 T
>443_1088_006_F3 T
>443_1088_007_F3 T
>443_1088_008_F3 T
>443_1088_009_F3 T
>443_1088_010_F3 T
– Get sequence assignment from instructor.
– Work with people at your table.
– Use info from lecture notes (Panel E).
– BLAST sequence at NCBI.
– What is it?
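For checking a hand translation, color space can be decoded with a short script. This is a minimal sketch, assuming the standard SOLiD two-base encoding in which each color is determined by a pair of adjacent bases (0 for AA/CC/GG/TT, 1 for AC/CA/GT/TG, 2 for AG/GA/CT/TC, 3 for AT/TA/CG/GC). The function names and the example read `T0123` are hypothetical and are not taken from the lab data.

```python
BASES = "ACGT"  # with A=0, C=1, G=2, T=3, a color equals the XOR of two base codes


def decode_color_space(read):
    """Translate a color-space read (known leading base + color digits) to bases.

    The leading base comes from the adapter, so it is not included in the output.
    """
    current = BASES.index(read[0])
    decoded = []
    for color in read[1:]:
        current ^= int(color)          # next base code = previous code XOR color
        decoded.append(BASES[current])
    return "".join(decoded)


def encode_color_space(first_base, sequence):
    """Inverse operation: turn a base sequence into leading base + colors."""
    previous = BASES.index(first_base)
    colors = []
    for base in sequence:
        code = BASES.index(base)
        colors.append(str(previous ^ code))
        previous = code
    return first_base + "".join(colors)


read = "T0123"                              # hypothetical read, for illustration
print(decode_color_space(read))             # TGAT
print(encode_color_space("T", "TGAT"))      # T0123
```

Because each base depends on the one before it, a single wrong color changes every base downstream of it, which is one reason to align in color space rather than translating first. Reads containing missing calls (often written as `.`) are not handled by this sketch.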
Module 1 lab
Roche / 454 : GS FLX
– Also known as “pyrosequencing”
– million bp/run
– 10 hr run
– bp/read & > 1 M reads
Roche / 454 : GS FLX
– Made for de novo sequencing.
– Too expensive for resequencing.
– For example, this platform will be used a lot by laboratories sequencing new bacterial genomes.
– Baylor Genome Center, involved in the Sea Urchin, Bee, and Platypus genomes, has a number of 454 machines.
It’s more complicated!
– Get files with quality scores
– Get files with mismatches
– Need to align them to a reference genome
– Multiple tools do this today … and there will be more later.
– What do you do? Do it all!
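To make "quality scores" and "mismatches" concrete, here is a minimal sketch. It assumes Phred-scaled qualities, where Q = -10 log10(p) for error probability p, stored as ASCII characters with an offset of 33; the scale and offset actually used depend on the platform and pipeline, so check before relying on either. The function names and example strings are hypothetical.

```python
def decode_quality_string(quality, offset=33):
    """Convert an ASCII-encoded quality string to integer scores (assumed offset 33)."""
    return [ord(ch) - offset for ch in quality]


def phred_to_error_probability(q):
    """Invert Q = -10 * log10(p) to get the per-base error probability."""
    return 10 ** (-q / 10)


def count_mismatches(read, reference_window):
    """Count positions where a read differs from an equally long reference window."""
    if len(read) != len(reference_window):
        raise ValueError("read and reference window must have the same length")
    return sum(a != b for a, b in zip(read, reference_window))


scores = decode_quality_string("I5+")          # hypothetical quality string
print(scores)                                   # [40, 20, 10]
print([phred_to_error_probability(q) for q in scores])
print(count_mismatches("GATTACA", "GACTACA"))  # 1
```

Real aligners combine both pieces of information, for example by penalizing a mismatch less when it falls on a low-quality base.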
Pacific Biosciences (PacBio), July 2008
Things to keep in mind
– Everyone is learning. If you don’t know, ask; they probably won’t know either, and you can figure it out together!
– The technology is changing: this workshop next year will be totally different!
– We can only do so much in two days. You will need to find things, find people who can help you, and teach your friends!
Other factors
Changing technology
– New and disappearing companies?
Changing price structure
– Cost of machine
– Cost of operation (reagents/people)
– Service from the company
– 1 machine vs (2 or 3 machines) vs 40 machines.
Changing software and processing
OICR Informatics: servers, CPU, Storage, and Backups
[Figure: infrastructure diagram; the recoverable labels are listed here]
– 14 Sequencers
– Cluster nodes: 8 core with 16 GB RAM; 8 core with 96 or 256 GB RAM
– Other servers: Web, Dev, SVN, MS-Windows
– Multipliers shown in the diagram: 200 X, 5 X, 125 X, 12 X, 50 X, 10 X
– Storage: local (150 GB), seq (9 TB), FC (25 TB), N-series SATA (25 TB), BlueArc SATA (1 PB), SAS (40 TB)
– Back Up: Storage Robot, 800 GB/tape, 12 Drives, > 300 tape library
– Totals: 1640 cores, 1259 TB
What have we learned?
– Sequencing technologies are changing fast, allowing new biology to be performed and new questions to be asked.
– Understand the difference between some of the technologies.
– You can work in “color space”.
What next?
Day 1
URLs