1
GS Junior System – First Results
2
IMPORTANT NOTICE Intended Use
Unless explicitly stated otherwise, all Roche Applied Science and 454 Life Sciences products and services referenced in this presentation / document are intended for the following use: For Life Science Research Only. Not for Use in Diagnostic Procedures.
3
Hemorrhagic Fever Virus Discovery in Native Host
4
Hemorrhagic Fever Virus Discovery in Native Host
- Darted a Red Colobus monkey in the wild in Kibale National Park, Uganda
- Collected a blood sample and isolated viral RNA/DNA
- Sequenced on the GS Junior System
- Assembled using the CLC genomics assembler and screened out host contigs
- Identified two novel SHFV (simian hemorrhagic fever virus) strains
- Generated near full-length viral sequences by filling in short gaps with PCR/Sanger sequencing and 3' RACE
Significant findings:
- Not one, but TWO divergent SHFV viruses were present in one individual
- The Red Colobus monkey is a native reservoir for these pathogenic viruses
- DNA was isolated from a healthy animal, demonstrating that these viruses can hide in apparently healthy individuals
- Consequences for human contact and for spreading viruses through research colonies
5
Plant Pathogen Sequencing
6
Plant Pathogen Sequencing
- Erwinia amylovora, the fire blight pathogen, isolated from blackberry in Illinois
- Causes commercial apple and pear blight, first reported in the 1790s
- 3.81 Mb genome, 53% GC, three circular plasmids
- Sequenced using 3/8 of a GS FLX run and one GS Junior run (equal to four GS Junior runs)
- 31x coverage, 375 bp average read length
- Assembled by the 454 GS De Novo Assembler into 29 contigs; gaps closed in silico using LaserGene
- Used GenDB to assign gene function for 3869 coding sequences
- Comparative genomics with related strains
7
Rare Variant Detection for HIV-1
Saliou et al., Antimicrob. Agents Chemother., April 2011
8
Why Detect HIV Variants?
- HIV variants or “quasispecies” can use CCR5 and/or CXCR4 cell-surface receptors to enter cells
- Drugs that block CCR5 receptors work only if CXCR4-binding variants are absent
- As a result, patients are tested to confirm that no CXCR4-binding viral variants are present before this class of HIV drugs is administered
9
Why use 454 Sequencing System?
- Potential to deliver speed, ease of use, and cost savings
- Current high-sensitivity assays can detect viral variants at 0.3%, but are slow, expensive, and difficult
- Current Sanger sequencing assays are rapid and cheap, but cannot detect quasispecies below 10-20%
- Sensitivity at 0.3% can best predict treatment outcomes
- 454 Sequencing Systems can deliver sequencing specificity for ~25 samples in one GS Junior run
10
Experimental Design
- 415 base cDNA amplicon covering the V3 env region of HIV-1
- Nested RT-PCR to generate amplicons with MIDs
- 23 individual samples
- Obtained ~3,500 reads/sample, sequenced in one GS Junior run
- GS AVA software used to align reads to the reference
- Processed the reads using third-party prediction software
- Detected quasispecies down to 0.6% reliably
- Calculated the mean error rate for pyrosequencing from control plasmids
11
Results
Detection limited by software that predicts phenotype
Summary
- 84,000 reads
- 23 samples
- 0.6% detection limit
Critical Factors
- 415 bp amplicon or more reads per sample
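As a rough illustration of why reads per sample matter for a 0.6% detection limit, the sketch below estimates how many variant reads to expect at a given depth and the binomial chance of observing at least a minimum number of them. The minimum of 10 supporting reads and the function names are assumptions made for illustration, not part of the published analysis, and sequencing error is ignored.

```python
from math import comb

def expected_variant_reads(depth, freq):
    """Mean number of reads carrying a variant present at fraction `freq`."""
    return depth * freq

def prob_at_least(depth, freq, min_reads):
    """P(X >= min_reads) for X ~ Binomial(depth, freq)."""
    p_below = sum(
        comb(depth, k) * freq**k * (1 - freq) ** (depth - k)
        for k in range(min_reads)
    )
    return 1.0 - p_below

# Figures from the slide: 84,000 reads shared by 23 samples.
reads_per_sample = 84_000 // 23
variant_freq = 0.006   # the 0.6% detection limit
min_support = 10       # hypothetical threshold, not from the source

print(reads_per_sample)
print(expected_variant_reads(reads_per_sample, variant_freq))
print(prob_at_least(reads_per_sample, variant_freq, min_support))
```

Under these assumptions a 0.6% variant is expected in about 22 reads per sample, which is why fewer reads per sample would push the practical detection limit upward.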
12
First Publication using GS Junior System Data
13
Summary of Results
- Sequencing of MHC class I transcripts in macaques to discover all expressed transcripts from common class I haplotypes
- Sequenced 3 amplicons from ~440 to 620 bases
- Combination experiment: 7 individuals on the GS FLX System, 3 on the GS Junior System
- Identified all sequences found previously
- Discovered 2x more haplotypes than with the previous Sanger-based approach
- The longer amplicons allow resolution of haplotypes that cannot be resolved with 190 base amplicons
14
GS Junior System Primary applications
- De novo sequencing: sequencing of whole microbial, viral, and other small genomes
- Targeted sequencing: using sequence capture, PCR, amplicons, transcriptome cDNA sequencing; genotyping, rare variant detection, somatic mutation detection, disease-associated genes, genomic regions
- Metagenomics: characterization of complex environmental samples (16S rRNA and shotgun)
15
Whole Genome Shotgun Sequencing
Sequencing of three representative bacterial genomes
De novo assemblies at 25x coverage using GS Junior and GS FLX Titanium reads

System: GS FLX, GS Junior
Organism: E. coli K-12, T. thermophilus, C. jejuni
Genome Size (in Kb): 4563, 2120, 1600
Avg. Contig Size (in Kb): 39, 58, 44, 53, 49, 46
N50 Contig Size: 84, 112, 121, 115, 95
Largest Contig: 209, 352, 474, 578, 304, 173
Number of Contigs: 78, 48, 40, 33, 35

The table shows genome assembly comparisons at 25x coverage using GS FLX Titanium and GS Junior Titanium reads. E. coli is a balanced genome, T. thermophilus is a GC-rich genome, and C. jejuni is an AT-rich genome. Although the specific numbers vary, the assemblies are comparable. Genome assembly is a dynamic process: even multiple assemblies of the same reads can yield a level of variation similar to that seen here between the two systems.
Overall, both systems yielded draft-quality assemblies with few, large contigs, allowing gene prediction and finding. Most comparative genomics can be accomplished on genomic assemblies of this quality. Breaks in the genome tend to center around large repeat regions, such as rRNA, that are in excess of 3,000 bases. Often these types of breaks can be resolved using the appropriate paired end strategy, which was not included in these assemblies. We will have paired end data in the near future!
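For readers unfamiliar with the N50 statistic reported above, the following is a minimal sketch of how it is conventionally computed: the contig length at which the running total of contigs, sorted longest first, reaches half of the assembly size. The contig lengths in the example are made up for illustration and are not taken from these assemblies.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running total to half the
        # assembly size defines the N50.
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")

# Hypothetical contig lengths in kb (illustration only).
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))
```

Because N50 weights large contigs heavily, it is usually much larger than the average contig size, as in the figures on this slide.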
16
Data from GS Junior System Shotgun Runs
Variety of different microbes, early access site data
Whole genome shotgun data from one of the GS Junior System’s early access sites, which sequenced different microbial genomes over the course of a few weeks.
3 kb paired end: 1 Mb genome, 1 run, one scaffold
17
Read Length
- One GS Junior System run produces reads from or more in length
- Average is in base range
- Most reads are in the base range
[Figure: histogram of number of reads vs. read length (bases)]
18
CFTR Exon Resequencing on GS Junior System
Experimental design:
- 11 Coriell samples with known mutations in the CF gene
- Each sample was MID-labeled (11 MIDs)
- Amplified all 27 coding exons with 34 amplicons
- Mixed 11 x 34 = 374 amplicons
- Sequenced in 1 GS Junior System run
- Average coverage 182x
- 96% of the reads mapped back to the CF gene region
Amplicon resequencing study of individuals with cystic fibrosis or family members of individuals with cystic fibrosis. Two tubes were used for multiplexing. No normalization and no optimization were performed, i.e. amplicon representation was not balanced.
The graph shows varying coverage for each amplicon (a large range, as expected without optimization). Since the multiplex PCR reactions could not be normalized, PCR efficiency dictated the coverage level for each amplicon. Optimization of PCR conditions would be expected to yield more uniform coverage.
Results showed no drop-outs: all amplicons were represented, even without any upstream optimization effort. Average coverage was 182x, and all genotypes were called (even the lowest amplicon coverage, 27x, is more than sufficient for genotype calling).
[Figure: coverage graph, range x]
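The pooling arithmetic on this slide can be checked with a short calculation. The sketch below assumes that "average coverage" means the mean number of mapped reads per sample-amplicon combination and that each read covers exactly one amplicon; the function name is hypothetical.

```python
def reads_needed(samples, amplicons, mean_coverage, mapped_fraction):
    """Estimate read counts implied by a multiplexed amplicon design."""
    pooled = samples * amplicons             # distinct sample-amplicon pairs
    mapped_reads = pooled * mean_coverage    # reads landing on target
    total_reads = mapped_reads / mapped_fraction
    return pooled, mapped_reads, total_reads

# Values from the slide: 11 samples, 34 amplicons, 182x, 96% mapped.
pooled, mapped, total = reads_needed(11, 34, 182, 0.96)
print(pooled, mapped, round(total))
```

Under these assumptions the design implies roughly 68,000 mapped reads out of about 71,000 in total for the run.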
19
CFTR Variant Detection by GS Junior System
AVA output showing 5 of 11 samples vs. variants discovered
- ΔF508: known, phenotype-associated CFTR mutation
- Heterozygous samples
- Sizes of actual amplicons
Snapshot of the AVA software showing 5 samples and 4 of the different types of variants found. ΔF508 is a very well known, well characterized CF variant: a codon deletion. The variant is not present in the first two samples. The other three samples are heterozygous (frequency of variant reads close to 50%).
The graph shows the range of sizes of the amplicons included in the experiment and highlights the long amplicons.
20
CFTR Variant Detection
GS Junior and GS FLX reads are equivalent
- R668C: known, phenotype-associated CFTR mutation
- Synonymous: same mutation detected in two separate, overlapping amplicons
Comparison slide showing GS Junior Titanium and GS FLX Titanium chemistry. Same chemistry! Both systems made the same calls, with similar results.
21
GS Junior Haplotyping of HLA Loci
- Read length and clonality are critical for resolution of individual haplotypes: sequencing covers multiple alleles in each clonal read!
- The longer the read, the better the haplotype discrimination, ranging from very poor below 200 bases through poor and good to excellent as reads get longer
HLA genotyping using amplicon sequencing. The long read lengths AND the clonality of each read are necessary to haplotype the mixture of HLA alleles, in other words, to clearly and unambiguously assign the two alleles that make up a heterozygous individual to the correct parental strand.
The AVA screenshot on the left shows the two alleles of the heterozygote interleaved into a complex mixture, similar to what you would get from Sanger sequencing. However, with one click of the mouse (screenshots on the right) you can separate the data unambiguously into two individual alleles. This simple and straightforward resolution is made possible by the clonal nature of the 454 Sequencing System: each read is derived from a single molecule, not from the complex mixture of molecules seen in Sanger sequencing.
[Figure: AVA screenshots showing the interleaved reads (left) and the separated Allele 1 and Allele 2 (right)]
22
Studying SIV using GS Junior System
- Ben Burwitz in Dave O’Connor’s lab, Univ. of Wisconsin
- Follow changes in the GAG gene as the virus evolves to evade the immune response
- Find genome-wide mutations in the viral pool
- Simian Immunodeficiency Virus in rhesus macaque
23
Amplicon Sequencing - Basic Amplicon
454 amplicon design using tailed primers
[Diagram: 454 Titanium A-primer (21 bp), key, MID, sequence of interest, 454 Titanium B-primer (21 bp); locus-specific PCR amplification, followed by emPCR amplification and sequencing]
- The region of interest is amplified using PCR primers containing target-specific sequences and the GS A and B sequences
- Directionality is maintained, so AVA can be used for analysis. The advantage of AVA is that it is designed to find rare variants in mixed pools of samples
- After amplification, emPCR and high-throughput sequencing of single molecules can be performed
- If MID barcodes were added, amplicons can be mixed together for emPCR and sequencing
- Amplicon fusion primers for Titanium sequencing chemistry can be ordered from Integrated DNA Technologies and used on both the GS Junior and the GS FLX Systems
- Long reads, required to sequence through the locus-specific primer, enable haplotyping over longer distances
- 100s to 1000s of amplicon clones are sequenced simultaneously
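The role of the key and MID in a tailed-primer read can be sketched as a simple demultiplexing step: strip the key, match the MID, and hand the remainder (locus-specific primer plus sequence of interest) to downstream analysis. This is only an illustration and not how AVA is implemented; the key value, the MID sequences, and the sample names below are placeholders assumed for the example, and matching is exact with no tolerance for sequencing errors.

```python
KEY = "TCAG"  # assumed 4-base key at the start of each read

# Hypothetical MID barcodes; substitute the MIDs actually ordered.
MIDS = {
    "sample_1": "ACGTACGTAC",
    "sample_2": "TGCATGCATG",
}

def demultiplex(read, key=KEY, mids=MIDS):
    """Return (sample_name, insert) for a read, or (None, rest) if no MID matches."""
    if not read.startswith(key):
        return None, read
    rest = read[len(key):]
    for name, tag in mids.items():
        if rest.startswith(tag):
            # Everything after the MID is the locus-specific primer
            # followed by the sequence of interest.
            return name, rest[len(tag):]
    return None, rest

print(demultiplex("TCAG" + "ACGTACGTAC" + "GGGTTTAAACCC"))
print(demultiplex("TCAG" + "CCCCCCCCCC" + "GG"))
```

Because all amplicons from one sample share the same MID, a lookup like this is enough to assign each clonal read to its sample before variant analysis.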
24
Amplicon Sequencing - Long Range Amplicons
Using long range amplicons for whole viral or other genomic region sequencing
[Diagram: locus-specific long range PCR amplification of the sequence of interest (1,500-15,000 bp or more); shear into fragments using the gDNA protocol; ligate the sheared amplicon to the 454 primers (Titanium A-primer with key and MID, 21 bp; Titanium B-primer, 21 bp) using the gDNA protocol; emPCR amplification and sequencing]
- The advantage of long range amplicons is that they cover a large amount of target sequence and have a lower up-front cost than sequence capture
- Using MIDs allows multiple samples to be sequenced together. All amplicons from one sample can use the same MID
- Variant calling can be done using the GS Mapper; high-confidence variants will be in the HCDiffs file
25
SIV Genome Sequencing SIV Proteome SIV Genome (Viral RNA) 0bp 10535bp
[Diagram: the direct amplicon region and the full genome amplicons marked on the SIV genome]
Direct amplicon: a probe for a portion of the GAG gene was designed using the recommended 454 amplicon method, in order to find mutations in the GAG protein structure that change during infection. Each read focuses on the specific region of interest, so the variation of a population can be sequenced to great depth, and multiple samples can be barcoded with MIDs and sequenced together in one run.
Full genome: long range amplicons were designed that covered the entire genome with small amounts of overlap. Because the amplicons were too long to sequence completely, they were sheared into fragments using the shotgun protocol and mapped back to the genome to find variants.
* Slide courtesy of U Wisconsin
26
SIV Genome Sequencing – Direct Amplicon
- Amplicon: 354 bp
- # of Samples: 28
- Total Reads: 82,079
- Median Length: 356 bp
[Figure: histogram of number of reads vs. read length (bp)]
* Slide courtesy of U Wisconsin
27
Viral Mutations in the Structural SIV Protein Gag evolve to escape immune response
Mutations in the SIV protein Gag affect viral fitness- Gag protein is the ‘particle making machine’ * Slide courtesy of U Wisconsin
28
Viral Mutations in the Structural SIV Protein Gag evolve to escape immune response
Multiple mutations evade the immune response and compensate to restore activity Mutations in the SIV protein Gag affect viral fitness- Gag protein is the ‘particle making machine’ * Slide courtesy of U Wisconsin
30
SIV Genome Sequencing - Amplicons
- Four long range amplicons of ~2 kb each
- Total Reads: 59,097
- Median Length: 321 bp
The amplicons were sheared enzymatically, which resulted in many short reads. However, there was a major size peak at 500 bases.
[Figure: histogram of number of reads vs. read length (bp)]
* Slide courtesy of U Wisconsin
31
SIV Full Genome Sequencing Coverage
The entire genome was covered in a GS Junior System run of the four long range amplicons.
[Figure: number of reads vs. SIV genome base pair position]
* Slide courtesy of U Wisconsin
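A coverage plot like this one can be derived from mapped read positions. The sketch below is one common way to do it (a difference array followed by a running sum) and is not necessarily how the original figure was produced; the interval convention (0-based, half-open) and the toy alignments are assumptions for illustration.

```python
def coverage_depth(genome_length, alignments):
    """Per-base read depth from (start, end) alignments, 0-based half-open."""
    diff = [0] * (genome_length + 1)
    for start, end in alignments:
        diff[start] += 1   # a read begins covering at `start`
        diff[end] -= 1     # and stops covering at `end`
    depth = []
    running = 0
    for position in range(genome_length):
        running += diff[position]
        depth.append(running)
    return depth

# Toy example on a 10-base genome (illustration only).
toy_depth = coverage_depth(10, [(0, 5), (3, 8), (4, 10)])
print(toy_depth)
print("uncovered positions:", toy_depth.count(0))
```

For the 10,535 bp SIV genome the same function would be called with `genome_length=10535` and the mapped coordinates of each sheared read; a count of zero uncovered positions corresponds to the "entire genome was covered" statement.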
32
454 Sequencing System vs. Sanger
[Figure: mutations detected by 454 sequencing vs. Sanger sequencing in Animal 1, Animal 2, and Animal 3]
There are many low-level mutations in the pool of viruses, but low-level mutations cannot be detected using Sanger sequencing. Sanger sequencing therefore misses most of the mutants in the viral pool.
Many more mutations are discovered by 454 Sequencing Systems, because each 454 read is a clone of a single viral sequence. Every 454 read is sequenced individually on a single bead, and the proportion of mutations discovered is proportional to the depth of the sequencing (number of reads sequenced).
* Slide courtesy of U Wisconsin
33
Ben’s Conclusions
- The GS Junior System detects low frequency genetic variants that are missed by traditional Sanger sequencing
- A bench-top GS Junior System improves turnaround time and can be readily adapted to small academic lab settings
Acknowledgements
O’Connor Lab and Watkins Lab: Jonah Sacha, Matt Reynolds, Nick Maness, Nancy Wilson, David Watkins, Ben Burwitz, Roger Wiseman, Shelby O’Connor, Dawn Dudley, Julie Karl, Simon Lank, Charlie Burns, Ericka Becker, Ben Bimber, Dave O’Connor
34
Inherited Disease
Looking for rare mutations in affected individuals
- Target gene from a GWAS study
- Two PCR approaches: long range PCR and short amplicon
- MID sequences used to distinguish individuals in a pool
[Diagram: target gene regions 1-14, with pooled individuals labeled MID 1, MID 2, MID 3]
35
Long Range Amplicon Sequencing Results
Shotgun processing

Run   Reads     Average Read Length (bases)   Total Bases   # of Samples Sequenced *
1     96,947    385                           37,363,295    8
2     134,252   389                           52,263,214    9
3     149,809   417                           62,540,439    10
4     143,498                                 59,930,800
5     151,370   394                           59,732,290
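The columns of this table are related by simple arithmetic: average read length is total bases divided by reads. The sketch below cross-checks the rows where all three values appear. That the listed averages are truncated rather than rounded is an observation from these numbers, not something the slide states; for run 4, where no average survives in the table, the code only prints the value implied by the other two columns.

```python
# (reads, total_bases, listed_average_or_None) per run, from the table above.
runs = {
    1: (96_947, 37_363_295, 385),
    2: (134_252, 52_263_214, 389),
    3: (149_809, 62_540_439, 417),
    4: (143_498, 59_930_800, None),  # average not present in the table
    5: (151_370, 59_732_290, 394),
}

def average_read_length(reads, total_bases):
    """Average read length in whole bases (truncated)."""
    return total_bases // reads

for run, (reads, total_bases, listed) in runs.items():
    derived = average_read_length(reads, total_bases)
    status = "matches" if listed == derived else "derived only"
    print(run, derived, status)
```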
36
Small Amplicon Sequencing Results
Amplicon processing

Run   Reads     Average Read Length (bases)   Total Bases   # of Samples Sequenced
1     72,191    322                           23,289,440    11
2     75,424    313                           23,664,312    12
3     84,441    325                           27,443,160
4     101,395   339                           34,394,604
5     60,243    435                           26,248,268
6     25,884    374                           9,690,154
7     70,406    424                           29,905,454
8     71,587    434                           31,064,908
37
Amplicon Coverage- Accurate Pooling Required!
[Figure: grid of read counts per amplicon (columns) and individual sample (rows), with callouts for a poorly pooled amplicon, a poor performing amplicon, a poor performing sample, and sampling variability]
The first run of amplicons yielded varying representation of each amplicon. Any cell with fewer than 50 reads is highlighted pink. The issues were amplicons that did not amplify well, suboptimal quantitation and pooling, and standard (Gaussian) sampling variability. Coverage requirements vary according to the tolerance for each of these sources of variability.
38
Verification of Novel Mutations
Sample ID   ASP Result     GS Junior        Agreement
1           Heterozygous   50.94% / 106     Y
2                          52.5% / 200
3                          39.33% / 178
4           Homozygous     94% / 100
5                          48% / 125
6                          47.06% / 221
7                          99.18% / 243
8                          46.71% / 167
9                          46.07% / 191
10                         54.17% / 24
11                         97.57% / 288
12                         42.33% / 163
13                         41.88% / 191
14                         47.02% / 151
15                         48.07% / 441
16                         17.86% / 252     N
17                         50.32% / 157
18                         16.18% / 272
19                         14.85% / 330

Allele-Specific PCR (ASP): selective PCR amplification of one of the alleles to detect a single nucleotide polymorphism (SNP). Selective amplification is usually achieved by designing a primer whose 3' end matches one allele and mismatches the other.

           Wild-Type Primer Set   Assay Primer Set   Genotype
Sample 1   Amplified              Not Amplified      Wild Type
Sample 2   Amplified              Amplified          Heterozygous
Sample 3   Not Amplified          Amplified          Homozygous
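To make the comparison concrete, the sketch below turns a variant read fraction like those in the GS Junior column into a genotype call. The thresholds and the function name are hypothetical, chosen only for illustration; the slide does not state which cutoffs were used. Fractions that fall between the bands, such as the 17.86% of sample 16 (marked as not agreeing with ASP), are reported as ambiguous rather than forced into a call.

```python
def call_genotype(variant_fraction, wt_max=0.05, het_range=(0.30, 0.70), hom_min=0.90):
    """Classify a variant read fraction using hypothetical thresholds."""
    if variant_fraction <= wt_max:
        return "wild type"
    if het_range[0] <= variant_fraction <= het_range[1]:
        return "heterozygous"
    if variant_fraction >= hom_min:
        return "homozygous"
    return "ambiguous"

# Fractions taken from the GS Junior column above.
for sample, fraction in [(1, 0.5094), (4, 0.94), (16, 0.1786)]:
    print(sample, call_genotype(fraction))
```

A real pipeline would also take read depth into account: sample 10, for example, rests on only 24 reads, so its 54.17% carries more sampling uncertainty than the deeper samples.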
39
Pathogen Discovery on the GS Junior System
- Case from Sandton, South Africa
- Infected a paramedic during transfer, a nurse at the hospital, cleaning staff, and the nurse of the paramedic; 4/5 did not survive
- Serum and tissue samples from victims were subjected to unbiased pyrosequencing, yielding, within 72 hours of sample receipt, multiple discrete sequence fragments that represented approximately 50% of a prototypic arenavirus genome
- Recapitulated the GS FLX System study in a single GS Junior System run
- 250 hits to LuJo virus, covering 57% of the L-segment and 79% of the S-segment
40
Coming Soon
GS Junior System publications in:
- Metagenomic characterization of human environments
- Whole genome sequencing of bacterial pathogens
- Rare variant discovery in human disease: GWAS follow-up experiments
- Viral pathogen sequencing
- Many more!
41
GS Junior System First Results Disclaimer & Trademarks
For life science research only. Not for use in diagnostic procedures. Trademarks: 454, 454 LIFE SCIENCES, 454 SEQUENCING, EMPCR, GS FLX, GS FLX TITANIUM, GS JUNIOR and SEQCAP are trademarks of Roche. Other brands or product names are trademarks of their respective holders.