Proprietary
Signal Generation and Imaging
Photons generated: reagent flow over PicoTiterPlate (PTP) wells; sequencing by synthesis; 1600K field of addressable wells.
Photons detected by camera: Spectral Instruments Series 800 camera with Fairchild Imaging LM485 CCD (4096 x 4096, 15 μm pixels), directly bonded to a 1:1 imaging fiber bundle; cooled to -25 °C.
PTP in direct contact with the imaging fiber bundle (no alignment or focusing issues); NA ~ 0.75.
Full-frame imaging mode; read-out during wash (dark portion of flow cycle).

Image Processing
Raw data: series of images, one per dNTP base addition (T A G C T).
Data converted into "flowgrams".

Signal-Processing & Basecalling
Key sequence = TCAG, for well identification and calibration.
Flow order: TACGTACG
Flowgram signal levels: 1-mer, 2-mer, 3-mer, 4-mer (sequence shown: TTCTGCGAA).
Pipeline: Image Data → Signal-Processing Pipeline → Flowgrams → Quality Filtering → HQ Reads → Basecalling → HQ Bases → Mapping & Assembly
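
The relationship between a base sequence, the flow order, and a flowgram can be illustrated with a minimal sketch. It assumes an idealized, noise-free chemistry in which each flow incorporates the whole homopolymer run of the flowed nucleotide, and it calls bases by simple rounding. The function names `ideal_flowgram` and `basecall` are hypothetical and are not taken from the instrument software, whose actual signal processing is not described on the slide.

```python
# Sketch only: idealized flowgram encoding and naive basecalling.
# Assumption: each flow incorporates the full homopolymer run of its base.
FLOW_ORDER = "TACG"  # flow order given on the slide


def ideal_flowgram(seq, flow_order=FLOW_ORDER, max_flows=10_000):
    """Return the homopolymer length incorporated at each nucleotide flow."""
    flows = []
    pos = 0
    while pos < len(seq) and len(flows) < max_flows:
        base = flow_order[len(flows) % len(flow_order)]
        run = 0
        while pos < len(seq) and seq[pos] == base:
            run += 1
            pos += 1
        flows.append(run)
    return flows


def basecall(flowgram, flow_order=FLOW_ORDER):
    """Round each flow signal to the nearest integer n and emit n bases."""
    bases = []
    for k, signal in enumerate(flowgram):
        n = max(0, int(signal + 0.5))  # nearest non-negative integer
        bases.append(flow_order[k % len(flow_order)] * n)
    return "".join(bases)


key_flows = ideal_flowgram("TCAG")        # key sequence from the slide
read_flows = ideal_flowgram("TTCTGCGAA")  # sequence shown on the slide
print(key_flows)
print(read_flows)
print(basecall(read_flows))
```

With the TACG flow order, the key sequence TCAG produces exactly one 1-mer signal per flow cycle, which is consistent with its stated use for calibration.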

Mapping in Flowgram Space
Re-sequencing: find variants relative to the reference genome.
Reference chromosome flowgram: …, 1, 3, 1, 0, 0, 2, 2, 0, 0, 1, 2, 3, 0, 1, 0, …
Fragment flowgrams (R^N): …, 1.00, 3.14, 0.15, 0.20, 0.21, 1.84, 1.95, …
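
As a toy illustration of comparing a measured fragment flowgram with a reference flowgram, the sketch below slides the fragment along the reference and scores each offset by the sum of squared differences. This is an assumed, simplified metric, not the mapper's actual algorithm, and it ignores that a real fragment must stay in register with the flow order. `best_offset` is a hypothetical helper; the numbers are the excerpts shown on the slide.

```python
# Sketch only: least-squares placement of a fragment flowgram on a reference.
# Assumption: both flowgrams use the same flow order and the same phase.
def best_offset(reference, fragment, step=1):
    """Return (offset, score) minimizing the sum of squared differences."""
    if len(fragment) > len(reference):
        raise ValueError("fragment is longer than reference")
    best = None
    for offset in range(0, len(reference) - len(fragment) + 1, step):
        window = reference[offset:offset + len(fragment)]
        score = sum((f - r) ** 2 for f, r in zip(fragment, window))
        if best is None or score < best[1]:
            best = (offset, score)
    return best


reference = [1, 3, 1, 0, 0, 2, 2, 0, 0, 1, 2, 3, 0, 1, 0]  # slide excerpt
fragment = [1.00, 3.14, 0.15, 0.20, 0.21, 1.84, 1.95]      # slide excerpt

offset, score = best_offset(reference, fragment)

# The flow with the largest residual is a candidate variant position.
residuals = [abs(f - r) for f, r in zip(fragment, reference[offset:])]
suspect_flow = offset + residuals.index(max(residuals))
print(offset, round(score, 4), suspect_flow)
```

On these excerpts the fragment fits best at the start of the reference window, and the third flow (0.15 measured against an expected 1) stands out as the kind of disagreement a variant caller would examine.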

De Novo Assembly in Flowgram Space
Draft sequences of new genomes (species that have not been sequenced before).
Fragment flowgrams (R^N) overlap to form contigs.

Potential Sources of Error
Chemical cross-talk: PPi / ATP produced in a DNA well appears in downstream non-DNA wells due to convection / diffusion; leads to signal contamination.
Optical cross-talk: light penetrates to adjacent fibers; leads to signal contamination.
Figure labels: flow, PPi; DNA well (true signal), non-DNA well (false signal).

Signal-Processing Solution
Filtering removes signal cross-contamination.
Figure: 2D intensity contour by pixel, original vs. filtered (51 x 51 kernel).

Typical GS20 Run Results
42 flow cycles
~ 30 Mb per run
~ 300,000 reads
~ 100 bp per read
> 50% of wells error-free
~ 1% individual read error
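
The run figures above are mutually consistent, as a quick arithmetic check shows. The sketch assumes that one flow cycle consists of four nucleotide flows (one per base of the TACG flow order); that count is an assumption and is not stated on this slide.

```python
# Sketch only: back-of-the-envelope check of the GS20 run figures.
reads_per_run = 300_000   # ~300,000 reads (slide)
bases_per_read = 100      # ~100 bp per read (slide)
flow_cycles = 42          # 42 flow cycles (slide)
flows_per_cycle = 4       # assumption: one flow each of T, A, C, G per cycle

total_bases = reads_per_run * bases_per_read
total_flows = flow_cycles * flows_per_cycle
bases_per_flow = bases_per_read / total_flows

print(f"yield: {total_bases / 1e6:.0f} Mb per run")
print(f"flows: {total_flows}")
print(f"average bases read per flow: {bases_per_flow:.2f}")
```

Under that assumption a read gains roughly 0.6 bases per flow on average, so about 168 flows are needed to reach ~100 bp.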

Consensus Accuracy > % when 10x over-sampling
Example: 10 reads aligned to this position of the ref / assembled genome.
GTGCGCGCGCGGGACTAATCCCGGTTCGCGCGTCGGGCATGACACGCAAC- 2
Read #: flow signal
Read 1: 2.52
Read 2: 1.95
Read 3:
Read 4: 1.53
Read 5: 1.32
Read 6: 2.14
Read 7: 2.06
Read 8: 1.85
Read 9: 2.21
Read 10: 2.17
Consensus (mean):
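
The benefit of over-sampling can be illustrated by averaging the flow signals that survive on the slide. The sketch uses only the nine legible values (the Read 3 value is missing from the source and is not guessed) and assumes that the consensus homopolymer length is the rounded mean; the slide does not give the exact consensus rule beyond "mean".

```python
# Sketch only: consensus homopolymer call from several reads at one flow.
# Nine values legible on the slide; the Read 3 value is missing and omitted.
signals = {
    1: 2.52, 2: 1.95, 4: 1.53, 5: 1.32, 6: 2.14,
    7: 2.06, 8: 1.85, 9: 2.21, 10: 2.17,
}


def call(signal):
    """Round a flow signal to the nearest non-negative integer."""
    return max(0, int(signal + 0.5))


single_read_calls = {read: call(s) for read, s in signals.items()}
mean_signal = sum(signals.values()) / len(signals)
consensus_call = call(mean_signal)

# Reads whose individual call disagrees with the consensus call.
outliers = sorted(r for r, c in single_read_calls.items() if c != consensus_call)
print(round(mean_signal, 3), consensus_call, outliers)
```

Called individually, Read 1 rounds to 3 and Read 5 rounds to 1, yet the mean of the nine signals rounds to 2, which shows how averaging suppresses single-read homopolymer errors.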

De Novo Assembly Results
4 runs of GS20 (E. coli, 4,639,675 bp). Each data point represents 1/2 GS20 run.
Figure: results after 1 run, 2 runs, 3 runs, 4 runs.

Genome Sequencer FLX (GS20 performance in parentheses)
Next-generation sequencing platform, recently launched (Q1 2007) in collaboration with Roche Diagnostics.
Improved fluidics for faster reagent delivery
Firmware control of reagent delivery & camera timing
On-board reagent dilution
Optimized biochemistry
Improved algorithms with corrections for
– Crosstalk (for higher densities)
– Signal droop & phasing
Numerical filter for improved rejection of low-quality reads
At least 400K (200K) reads of avg. 250 bp (100 bp)
8 hours (5 hours) run time
Single-read avg. accuracy > 99.5% (99%) over 200 (100) bases
Consensus read accuracy > 99.99%
Avg. yield from a single run ~ 100 Mb (20-30 Mb)

Fluidics Modification – Air Plug Insertion
Air plug in tube; air bubbles removed at the debubbler.
Figure labels: G, A, C, air; camera cover; de-bubbler; t; MM.
Figure: concentration profile with & without air bubble.

Whole Genome Sequencing Results from GS FLX
E. coli (50% GC); C. jejuni (35% GC); T. thermophilus (71% GC)

Observed Individual Read Accuracy
All blind-filtered reads (no reference genome required).
Figure legend: E. coli run #1, E. coli run #2, E. coli run #3, E. coli run #4, T. thermophilus, C. jejuni, Reported in Nature 2005 (GS20 Q).

Newbler™ Assembly Results from GS FLX
Columns: C. jejuni; T. thermophilus; E. coli; (GS20)
Genome Size: 1,641,481; 2,127,575; 4,639,675
Number of Runs: ½; ½; 1; 3
Assembly Contigs:
Assembly Cover: 98.31%; 98.15%; 97.61%; 97.46%
Overall Accuracy: 99.996%; 99.991%; 99.998%
Avg. Contig Size: 64.6 kb; 39.8 kb; 43.3 kb; 32.4 kb
N50 Contig Size: 116 kb; 82.1 kb; 105.5 kb; 67.2 kb
Largest Contig: 481 kb; 383.0 kb; 204.7 kb; 164 kb
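
For readers unfamiliar with the N50 statistic reported above, the following sketch computes it under the common definition: the contig length at which the running total of contigs, sorted from longest to shortest, first reaches half of the total assembled length. The contig lengths in the example are made-up toy values, not data from these assemblies, and whether the assembler uses exactly this definition is an assumption.

```python
# Sketch only: N50 and mean contig size under the common definitions.
def n50(contig_lengths):
    """Length of the contig at which the running sum of lengths, taken
    longest first, reaches at least half of the total assembled length."""
    if not contig_lengths:
        raise ValueError("no contigs given")
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def mean_contig_size(contig_lengths):
    """Arithmetic mean of the contig lengths."""
    return sum(contig_lengths) / len(contig_lengths)


toy_contigs = [10, 8, 6, 4, 2]  # hypothetical lengths in kb
print(n50(toy_contigs), mean_contig_size(toy_contigs))
```

Because N50 weights contigs by their length, it is typically larger than the average contig size, which matches the pattern in the figures listed for each genome.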