Single Cell, RNA, & Chromosome Sequencing Technologies George Church 2:30- 3:00 PM Tue 3-Oct-2006 Cancer Genomics & Emerging Technologies Thanks to: NCI/NIH HMS-CGCC AppliedBiosystems-Agencourt, Affymetrix, Helicos, 454, Solexa, DNAdirect, CompleteGenomics, Codon Devices
Multiplex Polony Summary Technologies for selecting genomic regions Mbp scale for rearrangements RNA tags & spliceforms 1 to 200 bp scale for SNPs & exons (1%) Low cost & high accuracy: $.07/kbp at 3E-7 errors Paired-end-tags (PET) for rearrangements Detection of rare mutations (e.g. drug resistance alleles) 60 million reads per run
Selective genome sequencing Numerous (100K) Small Regions (exons & point mutations) PCR : 21 Mbp >$250K Sjoblom et al (2006) Science Highly multiplexed molecular inversion probe genotyping: over 10,000 targeted SNPs genotyped in a single tube assay. Hardenbol et al. Genome Res. 2005 Feb;15(2):269-75. Analyzing genes using closing and replicating circles. Nilsson et al. (2006) Trends Biotechnol 24:83. One large region Single molecule amplification 1 to 4 Mbp Zhang et al. 2006 Nature Biotech. 24:680 Direct genomic [BAC hybridization] selection. [50% pure] Bashiardes et al (2005) Nat Methods 2: 63.
Selective genome sequencing Two ways to capture alleles from genomic ss-DNA In vitro Paired-tag library Gap fill Cleave & ligate Red=Synthetic; Yellow=genomic Shendure, et al. Science 309(5741):1728-32. Nilsson et al. (2006) Trends Biotechnol 24:83. How do we optimize >100K 100mers ? Zhang, Chou, Shendure, Li, Leproust, Church, Dahl, Davis, Nilsson
How? 10 Mbp of oligos / $1000 chip ~1000X lower oligo costs Digital Micromirror Array 8K Atactic/Xeotron/Invitrogen Photo-Generated Acid 12K Combimatrix/Codon Electrolytic 44K Agilent Ink-jet standard reagents 380K Nimblegen/GA Photolabile 5'protection Amplify pools of 50mers using flanking universal PCR primers & 3 paths to 10X error correction Tian et al. Nature. 432:1050; Carr & Jacobson 2004 NAR; Smith & Modrich 1997 PNAS
Padlock, Molecular Inversion Probes (MIPs) CG to CA,TG 35% of germline, 44% of colorectal cancer mutations (not restricted to single nucleotides nor common polymorphisms) R Universal primers Optional multiplex tag L Genomic DNA CG CA TG Alternative alleles Zhang, Chou, Shendure, Li, Leproust, Church, Dahl, Davis, Nilsson (10K to 1M 100-mer probes per pool -- see Kun Zhang’s poster) Vitkup, Sander, Church The Amino-acid Mutational Spectrum of Human Genetic Disease. Genome Biol. 4: R72. (CG to CA, TG)
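As a rough illustration of the CG to CA, TG alleles named above, the sketch below scans a sequence for CG dinucleotides and lists the two transition alleles at each site. The function name `cpg_transition_alleles` and the example sequence are hypothetical; this is only an enumeration of candidate sites, not the probe-design procedure behind the MIP pools.

```python
def cpg_transition_alleles(seq):
    """Return (position, reference, alternatives) for every CG dinucleotide.

    Position is the 0-based index of the C. The alternatives are the two
    transitions named on the slide: CG -> CA and CG -> TG.
    """
    seq = seq.upper()
    sites = []
    for i in range(len(seq) - 1):
        if seq[i:i + 2] == "CG":
            sites.append((i, "CG", ("CA", "TG")))
    return sites


if __name__ == "__main__":
    # Hypothetical sequence, for illustration only.
    for pos, ref, alts in cpg_transition_alleles("ATCGGACGT"):
        print(pos, ref, "->", ", ".join(alts))
```

Each reported site would need probes for the reference and both alternative alleles; how flanking arms are chosen is outside this sketch.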
Sequencing genomes from single cells via polymerase clones -- Plones (single chromosome, cell, RNA or particle) Zhang, et al. (2006). Nature Biotech. June ’06
1) When we only have one cell, as in Preimplantation Genetic Diagnosis/Haplotyping (PGD/PGH) or environmental samples (poor lab growth)
2) Candidate chromosome region sequencing
3) Prioritizing or pooling (rare) species based on an initial DNA screen (metagenomics)
4) Multiple chromosomes in a cell or virus
5) RNA splicing
6) Cell-cell interactions (predator-prey, symbionts, commensals, parasites)
Phi-29 Polymerase Strand-displacement amplification
Single molecule amplification sequencing Multiple Displacement Amplification (MDA) NBT (2006) 24: 657-8. Note: Single human cell 1000X easier than 5 Mbp Zhang et al., Nature Biotechnology (2006) 24:680
Single-cell sequencing: 4.7 Mbp (plones) Ultra-clean conditions for reduction of background amplification + Real-Time monitoring Post-amplification chip hybridization distinguishes alleles Amplification variation random & easily filled by PCR
CD44 Counts (RNA splicing forms) Eph4 = mammary epithelial cell line Eph4bDD = stable transfection of Eph4 with MEK-1 (tumorigenic) Zhu, Shendure, Mitra, Church, Single Molecule Profiling of Alternative Pre-mRNA Splicing. Science 301:836-8.
Reading Polonies: Beads or not, Ligase or Polymerase A G C T
‘Next Generation’ Sequencing Status
Multi-molecule                      Reaction Volume
AB/APG            Ligase beads      1 fL
454/Roche         Pol beads         100,000 fL
Solexa            Pol term          1 fL
CGI               Ligase            1 fL
Affymetrix        Hybr array        100 fL
Single molecules
Helicos Biosci    Pol               <1 fL
Visigen Biotech   Pol FRET          <1 fL
Pacific Biosci    Pol               <1 fL
Agilent           Nanopores         <1 fL
fL = 1E-15 liters (femto)
(7/9 involve our lab)
Length & run-time vs. Accuracy & Cost
"Future improvements in the read lengths, demonstrated at 7 consecutive bases per tag (Shendure et al., 2005) and reductions in the run time, currently 60 hours, will make this a useful platform for resequencing." --Leamon, et al. (454) Gene Therapy and Regulation 3: 15-31
Note that without ‘future improvements’: Affymetrix/Illumina read-lengths of 1 base per tag are useful. 60 million reads/run is 10X faster per read than 500K reads/run, & 50X lower cost per bp due to lower reagent & instrument costs. $500/run $140K
Polony Sequencing Equipment CCD camera microscope with xyz controls Autosampler (96 wells) (HPLC-like) flow-cell syringe pump temperature control
Integrated Polony Sequencing Pipeline (open source hardware, software, wetware) Monolayer immobilization In vitro paired tag libraries Bead polonies via emulsion PCR Enrich amplified beads Dressman et al PNAS 2003 SBE or SBL sequencing SOFTWARE Images → Tag Sequences Tag Sequences → Genome Epifluorescence & Flow Cell $140K Shendure, Porreca, Reppas, Lin, McCutcheon, Rosenbaum, Wang, Zhang, Mitra, Church (2005) Science 309:1728.
4 positions for paired-end anchor 'primers'
ePCR bead L Tag 1 M Tag 2 R 5’ 3’ (7 bp, 6 bp, 7 bp, 6 bp)
Each yields 6 to 7 bp of contiguous sequence
26 bp new sequence per 135 bp amplicon
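The 26 bp figure is the sum of the four anchor reads. A minimal sketch of that arithmetic, using only the numbers on the slide (the function names are hypothetical):

```python
# Read lengths from the four anchor positions, as listed on the slide (bp).
ANCHOR_READS_BP = (7, 6, 7, 6)
# Amplicon length from the slide (bp).
AMPLICON_BP = 135


def new_sequence_per_amplicon(reads=ANCHOR_READS_BP):
    """Total newly read bases per amplicon: the sum of the anchor reads."""
    return sum(reads)


def fraction_of_amplicon_read(reads=ANCHOR_READS_BP, amplicon_bp=AMPLICON_BP):
    """Share of the amplicon that is new sequence rather than library scaffold."""
    return sum(reads) / amplicon_bp


if __name__ == "__main__":
    print(new_sequence_per_amplicon(), "bp per amplicon")
    print(f"{fraction_of_amplicon_read():.1%} of the amplicon")
```

Roughly a fifth of each amplicon is new sequence; the rest is universal and linker sequence.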
Sequencing by Ligation (SBL) with fluorescent combinatorial 9-mers
Probe                      Excitation (nm)   Emission (nm)
5’-Cy5-nnnnAnnnn-3’        647               700
5’-Cy3-nnnnGnnnn-3’        555               605
5’-TR-nnnnCnnnn-3’         572               630
5’-Cy3+Cy5-nnnnTnnnn-3’    555               700
5'PO4 ACUCAUC… (3’)…TAGAGT????????????????TGAGTAG…(5’)
Shendure, Porreca, et al. (2005) Science 309:1728
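Each of the four 9-mer pools carries one dye and one fixed base at the queried position, so a base call reduces to asking which dye is brightest at a bead. The sketch below shows that idea only. The brightest-channel rule, the function names, and the intensity values are assumptions for illustration; the published pipeline also handles image registration and signal normalization, which are not modeled here.

```python
# Dye label -> base at the fixed (fifth) position of the 9-mer,
# taken from the four probe pools listed on the slide.
DYE_TO_BASE = {"Cy5": "A", "Cy3": "G", "TR": "C", "Cy3+Cy5": "T"}


def call_base(intensities):
    """Call one base from per-dye signals (dict: dye label -> intensity).

    Assumption: the brightest channel wins. Real data would need
    normalization and a quality threshold before this step.
    """
    missing = set(DYE_TO_BASE) - set(intensities)
    if missing:
        raise ValueError(f"missing channels: {sorted(missing)}")
    brightest = max(DYE_TO_BASE, key=lambda dye: intensities[dye])
    return DYE_TO_BASE[brightest]


def call_tag(cycles):
    """Concatenate the calls from successive ligation cycles for one bead."""
    return "".join(call_base(cycle) for cycle in cycles)


if __name__ == "__main__":
    # Hypothetical intensities for two ligation cycles on one bead.
    cycles = [
        {"Cy5": 0.90, "Cy3": 0.10, "TR": 0.20, "Cy3+Cy5": 0.05},
        {"Cy5": 0.10, "Cy3": 0.15, "TR": 0.05, "Cy3+Cy5": 0.80},
    ]
    print(call_tag(cycles))
```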
Why low error rates?
Goal of genotyping & resequencing: Discovery of variants, e.g. cancer somatic mutations 4E-6 (& lab-evolved cells)
Consensus error rate     Total errors (E.coli)   (Human)
1E-4 Bermuda/Hapmap      500                     600,000
4E-5 454                 200                     240,000
3E-7 Polony-SbL @6X      0                       1800
1E-8 Goal for 2006       0                       60
Also, effectively reduce (sub)genome target size by enrichment for exons or common SNPs to reduce cost & # false positives.
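The totals in the error-rate table follow from multiplying the consensus error rate by the genome size. The sketch below reproduces them. The two genome sizes are assumptions inferred from the table's own ratios (500 errors at 1E-4 implies about 5 Mbp for E. coli; 600,000 implies about 6 Gbp, i.e. a diploid human genome); they are not stated on the slide.

```python
# Assumed genome sizes, back-calculated from the table (not given on the slide).
ECOLI_BP = 5e6
HUMAN_DIPLOID_BP = 6e9

# Consensus error rates from the table.
RATES = {
    "Bermuda/Hapmap": 1e-4,
    "454": 4e-5,
    "Polony-SbL @6X": 3e-7,
    "Goal for 2006": 1e-8,
}


def expected_errors(rate, genome_bp):
    """Expected number of consensus errors across a genome of genome_bp bases."""
    return rate * genome_bp


if __name__ == "__main__":
    for name, rate in RATES.items():
        ecoli = expected_errors(rate, ECOLI_BP)
        human = expected_errors(rate, HUMAN_DIPLOID_BP)
        print(f"{name}: E. coli {ecoli:g}, human {human:g}")
```

Under these assumptions the human column matches the table. For E. coli at 3E-7 the product is about 1.5 while the table lists 0, so that entry may be an observed count rather than an expectation.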
Microbial lab evolution Lenski Citrate utilization Church Trp/Tyr exchange Palsson Glycerol utilization Edwards Radiation resistance Ingram Lactate production Stephanopoulos Ethanol resistance Marliere Thermotolerance J&J Diarylquinoline resistance (TB) DuPont 1,3-propanediol production
Polony-based Whole-Genome Mutation Discovery of ΔTrp clone
Columns: Position, Type, Gene, Location, Function, Mechanism
986,334. Type: T > G. Gene: ompF. Location: Promoter -10. Function: Promoter of non-specific transport channel. Mechanism: Makes promoter more consensus-like.
985,797. Type: Glu > Ala. Function: Non-specific transport channel. Mechanism: Makes pore bigger and more hydrophobic.
931,960. Type: Δ8 bp. Gene: lrp. Location: frameshift. Function: General Transcriptional Regulator. Mechanism: ?
ompF – non specific transport channel. Glu-117 → Ala (in the pore). Charged residue known to affect pore size and selectivity. Can increase import & export capability simultaneously.
Shendure, et al. (2005) Science 309:1728
Evolving Population: Multiple Genotypes, Similar Themes
PCR amplification and sequencing of OmpF and Lrp from multiple clones from 3 independent lines of Trp/Tyr co-cultures:
OmpF: 42 R→G, L, C; 113 D→V; 117 E→A (Arg→Gly, Leu, Cys; Asp→Val; Glu→Ala). Hydrophilic and bulky → hydrophobic and smaller
Promoter: -12 A→C, -35 C→A. More consensus-like
Lrp: 1bp deletion, 9bp deletion, 8bp deletion, IS2 insertion, R→L in DBD. Change in global gene regulation?
Heterogeneity within each time-point reflects colony heterogeneity.
Reppas, Lin, et al (unpublished)
Mixture of wild & 2kb Inversion (pin)
[Figure: proximal and distal tag placement, 1,206k to 1,210k. Incorrect distance; Red = same strand; Black = opposite strand]
Using paired ends, rearrangement & copy-number detection is >1000X easier than point mutation detection (6X vs 6000X)
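The figure legend separates tag pairs by strand and by distance. A minimal sketch of that classification follows, assuming that opposite-strand placement within an expected distance window is the normal case and that same-strand pairs flag a candidate inversion. The `TagPair` type, the labels, and the distance window are hypothetical, not values from the published analysis.

```python
from dataclasses import dataclass


@dataclass
class TagPair:
    """Genome placements of the two tags from one paired-end read."""
    pos1: int
    strand1: str  # "+" or "-"
    pos2: int
    strand2: str  # "+" or "-"


def classify_pair(pair, min_dist, max_dist):
    """Label a tag pair as concordant or as one of two anomaly types.

    Assumption: normal pairs map to opposite strands at a distance
    inside [min_dist, max_dist].
    """
    if pair.strand1 == pair.strand2:
        return "same_strand"  # drawn in red on the slide
    distance = abs(pair.pos2 - pair.pos1)
    if not (min_dist <= distance <= max_dist):
        return "incorrect_distance"
    return "concordant"


if __name__ == "__main__":
    # Hypothetical placements near the 1,206k to 1,210k region.
    pairs = [
        TagPair(1_206_000, "+", 1_207_000, "-"),
        TagPair(1_206_000, "+", 1_210_000, "-"),
        TagPair(1_206_500, "+", 1_207_400, "+"),
    ]
    for p in pairs:
        print(p, classify_pair(p, min_dist=700, max_dist=1300))
```

Because a rearrangement shows up as a cluster of anomalous pairs rather than a single base difference, far lower coverage is enough to detect it.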
Polonies for human inversions >300 kbp long inverted repeats Turner, Hurles, et al. 2006 Nat Methods 3:439-45. Sanger Inst. & HMS
Polonies for haplotyping, recombination, LOH Sequencing/genotyping on single human chromosomes 153 Mbp Zhang et al. Nature Genet. Mar 2006
Monitoring resistance to BCR-ABL-kinase inhibitors with polonies during CML patient therapy Nardi, Raz, Chao, Wu, Stone, Cortes, Deininger, Church, Zhu, Daley (submitted) M244V T315I E255K
Multiplex Polony Summary Technologies for selecting genomic regions Mbp scale for rearrangements RNA tags & spliceforms 1 to 200 bp scale for SNPs & exons (1%) Low cost & high accuracy: $.07/kbp at 3E-7 errors Paired-end-tags (PET) for rearrangements Detection of rare mutations (e.g. drug resistance alleles) 60 million reads per run
Polonies with & without beads or gels Increases from 14 to 57 million polony beads per run & improves data quality. Kim, Porreca, Seidman, Church unpublished
Why low error rates?
Goal of genotyping & resequencing: Discovery of variants, e.g. cancer somatic mutations 4E-6 (& lab-evolved cells)
Consensus error rate     Total errors (E.coli)   (Human)
1E-4 Bermuda/Hapmap      500                     600,000
4E-5 454 @40X            200                     240,000
3E-7 Polony-SbL @6X      0                       1800
1E-8 Goal for 2006       0                       60
Also, effectively reduce (sub)genome target size by enrichment for exons or common SNPs to reduce cost & # false positives.
Cost vs consensus error rate
Polony Sep05 Sep 06 AB3730 454 Sep05
$/kb @4E-5: $7 $9 0.8 0.07
$/3e9@1X: 3M 300K $30K
Paired ends: yes no yes
Device $: 365K 400K 140K
Cancer exon sequencing $250K per sample (13,023 genes, 21 Mbp, 135,483 primer pairs) using PCR & capillary sequencing. $3K per sample (estimate) using single tube capture & polonies Sjoblom et al. The Consensus Coding Sequences of Human Breast and Colorectal Cancers. Science. 2006 Sep; Davies et al. Somatic mutations of the protein kinase gene family in human lung cancer. Cancer Res. 2005 65:7591-5.
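The two per-sample figures can be put on a common per-kbp footing. The sketch below uses only numbers from the slide; the $3K figure is the slide's own estimate, and the function name is hypothetical.

```python
# Figures from the slide.
TARGET_KBP = 21_000          # 21 Mbp of targeted exon sequence
PCR_CAPILLARY_USD = 250_000  # per sample, PCR & capillary sequencing
CAPTURE_POLONY_USD = 3_000   # per sample, single tube capture & polonies (estimate)
GENES = 13_023
PRIMER_PAIRS = 135_483


def cost_per_kbp(cost_usd, target_kbp=TARGET_KBP):
    """Per-sample cost spread over the targeted kilobases."""
    return cost_usd / target_kbp


if __name__ == "__main__":
    print(f"PCR & capillary: ${cost_per_kbp(PCR_CAPILLARY_USD):.2f} per kbp")
    print(f"Capture & polony: ${cost_per_kbp(CAPTURE_POLONY_USD):.3f} per kbp")
    print(f"Fold reduction: {PCR_CAPILLARY_USD / CAPTURE_POLONY_USD:.0f}X")
    print(f"Primer pairs per gene: {PRIMER_PAIRS / GENES:.1f}")
```

The fold reduction depends entirely on the $3K estimate holding up in practice.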