Sequencing technology › Roche/454 GS-FLX (‘454’) › Illumina Prokaryotic profiling › De novo genome sequencing › Metagenomics › SNP profiling › Species quantification Viral profiling › De novo genome sequencing
Classic chain-terminator sequencing Dye chain-terminator sequencing Next-generation sequencing
Next-gen sequencing principle › Massive parallel › Add ACTGs › Catch a signal
Roche/454 GS-FLX+ (‘454’) › Pyrosequencing: problems with homopolymers (e.g. AAAAAA) › Long-read sequencing: bp › Variable sequencing length › 1 million reads/run › 1Gb/run › Sequencing speed: ~ 1 day/run › Next-next generation: IonTorrent PGM/Proton
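Homopolymer stretches such as AAAAAA are easy to locate in a read before deciding how much to trust it. The following is a minimal sketch; the helper name `homopolymer_runs` and the minimum run length of 4 are assumptions made for illustration, not part of any vendor software.

```python
from itertools import groupby


def homopolymer_runs(seq, min_len=4):
    """Return (start, base, length) for every run of identical bases
    that is at least min_len long (min_len is an arbitrary choice)."""
    runs = []
    pos = 0
    for base, group in groupby(seq):
        length = len(list(group))
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs


if __name__ == "__main__":
    # Hypothetical read with two homopolymer stretches.
    print(homopolymer_runs("ACGAAAAAATCGGGGT"))
```

Reads with many long runs are the ones where a pyrosequencing base count is most likely to be off by one.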
Illumina › Sequence by synthesis › Short-read sequencing: 36, 72, …, 150bp › Fixed sequencing length › 1 billion reads/run › 100Gb/run (= 33 x human genome!) › Sequencing speed: 3 to 10 days, scaling with read length SOLiD › Short-read sequencing (similar to Illumina)
454 Illumina
Price per run: $10000/run › Price per machine: $ › Supporting IT hardware › Peripheral devices such as fragmentation instrument, PCR equipment … › Negotiating power … Use service centers! › Nxtgnt (BE), GATC (EU), Baseclear (NL), BGI … › No overhead cost, no maintenance etc. › Cheaper
Next-generation sequencing has become 2nd generation sequencing › Next-next-generation sequencing is almost there: 3rd generation sequencing › Helicos: True Single Molecule Sequencing › IonTorrent/Life: Cheap and fast › Nanopore: Unlimited read size › …
The evolution of sequencing technology goes hand in hand with the evolution of › IT infrastructure/hardware › Analysis software Hardware › 1 Illumina run ~ 100Gb text file ~ 5-million-page book › Processing power and storage are an issue! Software › Mapping to a human genome: ‘couple of hours’
Prokaryotic genomics 101 › Prokaryotes = bacteria + archaea › Prokaryotic genomes › Large circular genome (0.5 – 10 Mb): ‘chromosome’ › Small plasmids ( kb) (virulence factors, antibiotic resistance …) › (Almost) no introns › Easy ORF annotation
1953: Watson/Crick discover the DNA double helix › 1977: First complete genome: bacteriophage φX174 › 1995: First genome of a free-living organism: H. influenzae › 2001: First draft of the human genome › 2006: >200 complete bacterial genomes › 2012: An uncountable number of bacterial genomes have been sequenced using next-gen sequencing
Complete bacterial genomes used to be › Expensive › Difficult to obtain › ‘Nature’ or ‘Science’ work › Remained complex until the invention of next-generation sequencing
Using next-generation sequencing, de novo sequencing has become › Relatively easy › Relatively cheap › Routine research Already >10 complete bacterial genomes published in 2012 › More than just an assembly!
Practical 1. Get some DNA from an isolated species of interest 2. Sequence: long or short reads (1-10 days) 3. Obtain your sequences 4. Assemble (1h) Pure de novo assembly Guided assembly 5. Annotate the genome (days-weeks)
Assembly: Multiple ‘short’ reads → 1 long sequence › Existing software › Velvet › SSAKE › Newbler › … Source: Nature 2009, MacLean et al.
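The idea of turning multiple short reads into one long sequence can be illustrated with a toy greedy overlap merge. This is only a sketch under strong assumptions (error-free reads, all from the same strand, a hypothetical minimum overlap of 3 bases); it is not the algorithm used by any of the packages listed above, and the function names are made up for illustration.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.
    Returns 0 if the overlap is shorter than min_len."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing overlaps any more: the rest stay separate contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


if __name__ == "__main__":
    # Three hypothetical reads that tile a 13-base sequence.
    print(greedy_assemble(["ATGGCGT", "GCGTACG", "TACGTTA"]))
```

Real data adds sequencing errors, repeats and both strands, which is why dedicated assemblers are needed in practice.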
Relatively cheap › Sequencing cost depends on coverage › Illumina, 30x, 5Mb genome: $10-$100 › 454, 30x, 5Mb genome: $1000-$5000 › Equipment: IT infrastructure, sequencing equipment, people … Relatively easy › Need for IT support › No out-of-the-box standard solution for everything › Several different software packages for assembly
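The link between coverage and cost is simple arithmetic: bases needed = genome size x coverage, and the cost follows from the share of a run that this consumes. The sketch below assumes a bacterial genome of about 5 Mb (inside the 0.5 to 10 Mb range quoted earlier) and the per-run yields from the platform slides (100 Gb for Illumina, 1 Gb for 454); the function names are hypothetical.

```python
def bases_needed(genome_size_bp, coverage):
    """Total number of bases to sequence for a given average coverage."""
    return genome_size_bp * coverage


def fraction_of_run(genome_size_bp, coverage, run_yield_bp):
    """Share of one sequencing run consumed by this genome."""
    return bases_needed(genome_size_bp, coverage) / run_yield_bp


GENOME_BP = 5_000_000               # assumed 5 Mb bacterial genome
ILLUMINA_RUN_BP = 100_000_000_000   # 100 Gb/run, from the Illumina slide
ROCHE_454_RUN_BP = 1_000_000_000    # 1 Gb/run, from the 454 slide

if __name__ == "__main__":
    print(bases_needed(GENOME_BP, 30))
    print(fraction_of_run(GENOME_BP, 30, ILLUMINA_RUN_BP))
    print(fraction_of_run(GENOME_BP, 30, ROCHE_454_RUN_BP))
```

If the price of roughly $10,000 per run is taken to apply to both platforms (an assumption), these shares correspond to about $15 and $1,500 of sequencing capacity, which falls inside the ranges quoted above.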
De novo genome assembly › Study of 1 single species › Need for species isolation Metagenomics analysis › Study of a community of species › No need for isolation (culturing bias!) › Study the collective gene pool and function of the community/ecology › No need for individual functions
Practical 1. Get bacterial DNA or RNA from a sample Soil Gut/Fecal Ocean water (e.g. Craig Venter) … 2. Sequence: long or short reads (1-10 days) 3. Obtain your sequences 4. Map on a database of known genes (1 day) 5. Annotate/analyse the community (weeks)
2010: Giant Panda genome (2nd carnivore) › No umami taste receptor -> no meat affinity › The panda is more a dog than a bear › The panda is a carnivore eating bamboo!
Still 2010 !: Panda ‘microbiome’ Gut microbiome of the panda reveals the presence of bamboo/cellulose degrading pathways
A clinical example: gut microbiome can predict diabetes and malnourishment › PLoS ONE (2011), Brown et al. › PLoS ONE (2010), Valladares et al. › Gut Pathogens (2011), Gupta et al.
Classical SNP analysis - practical 1. Design PCR primers 2. Generate amplicons 3. Re-sequence using long read sequencing Conserve ‘SNP blocks’ 4. Detect SNPs 5. Correlate SNPs to drug resistance, severity of symptoms …
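Step 4 (detect SNPs) can be sketched as a per-position base count over reads that have already been aligned to the amplicon reference. This is a minimal illustration, not a production variant caller: the function name `call_snps`, the gap symbol `-` and the 25% frequency threshold are all assumptions.

```python
from collections import Counter


def call_snps(reference, aligned_reads, min_fraction=0.25):
    """Return {position: (ref_base, alt_base, fraction)} for every position
    where the most frequent non-reference base reaches min_fraction.
    Reads are assumed to be aligned and as long as the reference."""
    snps = {}
    for pos, ref_base in enumerate(reference):
        counts = Counter(r[pos] for r in aligned_reads if r[pos] != "-")
        depth = sum(counts.values())
        alt_counts = {b: c for b, c in counts.items() if b != ref_base}
        if depth == 0 or not alt_counts:
            continue
        alt_base, alt_count = max(alt_counts.items(), key=lambda kv: kv[1])
        fraction = alt_count / depth
        if fraction >= min_fraction:
            snps[pos] = (ref_base, alt_base, fraction)
    return snps


if __name__ == "__main__":
    # Hypothetical amplicon and five aligned reads.
    reference = "ACGTAC"
    reads = ["ACGTAC", "ACTTAC", "ACTTAC", "ACTTAC", "ACGTAA"]
    print(call_snps(reference, reads))
```

The threshold separates real variants from isolated sequencing errors; the single A at the last position in the example stays below it and is not reported.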
Amplicon resequencing is the same for human, prokaryotic, viral analyses Many standardized out-of-the-box solutions available Very simple analysis Watch out for the overkill… › Don’t use a bazooka to kill a fly! › Throughput can be too high
Profile the coding region of hepatitis C Lauck et al. 2012
Use next-generation sequencing to predict the optimal HIV therapy Thielen et al. 2012
Imagine the following research questions › Which (known) species/groups are present in a certain sample? › Does this composition change with a certain treatment, change of conditions, patients etc.? › No need for de novo genome sequencing › No metagenomics: species instead of functions
Prokaryotes have the gene 16S rDNA, coding for ribosomal RNA › The 16S rDNA region is 1.5 kb long › 16S rDNA is specific for each species/strain › Theoretical: 4^1,500 possibilities › In practice: 16S rDNA sequence known for millions of species
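The size of the theoretical sequence space for a 1.5 kb region (4 possible bases at each of 1,500 positions) can be checked directly, since Python integers have arbitrary precision. The function name below is hypothetical.

```python
import math


def sequence_space_digits(length, alphabet_size=4):
    """Number of decimal digits in alphabet_size ** length."""
    return len(str(alphabet_size ** length))


if __name__ == "__main__":
    # 1,500 positions, 4 possible bases at each position.
    print(sequence_space_digits(1500))   # 904 digits
    print(1500 * math.log10(4))          # about 903.09, so 4^1500 is about 10^903
```

A number of roughly 10^903 dwarfs the number of species that exist, which is why a 1.5 kb marker has more than enough room to distinguish them.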
16S rDNA can be isolated in different species using universal PCR primers › Isolate/amplify different regions using the same primers Compare the isolated sequences against a database of known sequences
Practical procedure 1. Sample an environment and isolate DNA 2. Do a universal PCR amplification 3. Sequence using long read sequencing: the longer the better! 4. Obtain sequences 5. Map sequences against a reference database 6. Annotate the data
Example: The Antarctica project › Which parameters determine the composition of bacterial communities in Antarctic lakes? › 20 different samples/lakes › Sequence 16S rDNA genes › 1 x 454 run (1 million 500bp sequences) › Map all sequences back to the RDP database
Analyse the data using computing power › Compare different locations: is species A present in location 1, location 2, …? › Assess the distribution in a single location: how dominant is the most dominant species in location 1? How many species are in location 1? … › Visualize!
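Once every read has been mapped to a taxon, these questions reduce to counting. The sketch below assumes the mapping step has already produced one taxon label per read for each location; the sample data, taxon names and function names are hypothetical.

```python
from collections import Counter


def summarize(assignments):
    """Richness, dominant taxon and relative abundances for one location."""
    counts = Counter(assignments)
    total = sum(counts.values())
    dominant, dominant_count = counts.most_common(1)[0]
    return {
        "richness": len(counts),               # how many taxa are present
        "dominant": dominant,                  # most abundant taxon
        "dominance": dominant_count / total,   # its share of all reads
        "abundance": {t: c / total for t, c in counts.items()},
    }


def present_in(samples, taxon):
    """For each location, is the given taxon present at all?"""
    return {name: taxon in reads for name, reads in samples.items()}


if __name__ == "__main__":
    # Hypothetical per-read taxon labels for two locations.
    samples = {
        "location1": ["A", "A", "A", "B"],
        "location2": ["B", "C"],
    }
    print(summarize(samples["location1"]))
    print(present_in(samples, "A"))
```

The resulting per-location tables are the natural input for the visualisations and taxonomy browsing that follow.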
Analyse different samples on different taxonomic levels › Include taxonomic tree of life of bacteria › Use a ‘taxonomy browser’
Analyse a single location
Compare different locations
Analysis | Lab work difficulty | Analysis difficulty
De novo genome | ++ (isolate) | +
Metagenomics | + | +++ (pathways etc.)
SNP | +++ (design primers) | ++ (correlate)
Species quantification | ++ (universal primers) | ++
Viral profiling › Viral profiling = prokaryotic profiling, but… Cheaper Faster Easier › De novo genome sequencing = OK › Don’t spend $ on a 100kb genome! › Multiplexing/pooling capacity is limited!
Watch out for the overkill › An Illumina run can be split into 8 lanes › >20 samples per lane can be combined › Still >100Mb per sample…
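The overkill is easy to quantify: divide the run yield over lanes and pooled samples, then divide by the genome size to get coverage. The sketch assumes the 100 Gb/run figure quoted earlier for Illumina and a 100 kb viral genome; the function names are hypothetical.

```python
def yield_per_sample(run_yield_bp, lanes, samples_per_lane):
    """Bases available to each sample when a run is split evenly."""
    return run_yield_bp / (lanes * samples_per_lane)


def coverage(sample_yield_bp, genome_size_bp):
    """Average coverage obtained for a genome of the given size."""
    return sample_yield_bp / genome_size_bp


if __name__ == "__main__":
    per_sample = yield_per_sample(100_000_000_000, lanes=8, samples_per_lane=20)
    print(per_sample)                     # 625 Mb per sample
    print(coverage(per_sample, 100_000))  # 6250x for a 100 kb genome
```

Even with 160 samples in one run, a 100 kb genome is covered thousands of times, far more than an assembly needs.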
Thanks for your attention!