Synthesizing & Sequencing Genomes
MPM Santa Barbara, Thu 21-Jul, :30 PM
  Harvard Med, WashU, MIT
  NIH-CEGS, DARPA, PhRMA, DOE-GTL
  Sequencing: Helicos, Ambergen, Caliper, BC-Agencourt-GTC
  Synthesis: Nimblegen, Atactic/Invitrogen, CodonDevices
  Systems: BGMedicine, Genomatica
  1974 Code; 1984 Genome Project; 1994 H.pylori; 2004 Synthetic Biology
3 Exponential technologies (synergistic)
  Curves: Computation & Communication (bits/sec~m$); Synthesis (amu/project~M$); Analysis (kamu~base/$)
  Figure labels: urea, E.coli, B12, tRNA, operons, telegraph, tRNA
  Shendure J, Mitra R, Varma C, Church GM, 2004 Nature Reviews Genetics; Carlson 2003; Kurzweil 2002; Moore 1965
Safer biology via synthetic biology
  Cell & Eco Systems modeling
  HiFi automated gene replacement
  Inexpensive bio-weather-map custom biosensors (airborne & medical fluids)
  International bio-supply-chain licensing (min research impact, max surveillance)
  Metabolic dependencies prevent survival outside of controlled environments
  Multi-epitope vaccines & biosynthetic drugs
  Cells resistant to most existing viruses via codon changes
  see: arep.med.harvard.edu/SBP
  (figure label: difficulty)
Constructing new genetic codes (two examples)
Example 1:
  1. Codons: 313 UAG stop > UAA stop
  2. Delete RF1 (1 free codon, for new aa e.g. PEG-pAcPhe-hGH)
Example 2:
  1. Codons: AGY Ser > UCX Ser
  2. tRNAs: AGY Ser > AGY Leu
  3. Codons: UUR/CUX Leu > AGY
  4. tRNAs: UUR Leu > UUR Ser
  5. Codons: UCX Ser > UUR Ser
  (Leu & Ser now switched & 8 codons free)
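The five-step Leu/Ser swap can be checked on a toy model. The sketch below is only an illustration under stated assumptions: the "genome" is one copy of each of the 64 codons, the helper names (recode, reassign, swap_leu_ser) are hypothetical, and a tRNA change is modeled as simply editing a codon table. It checks that the encoded protein is unchanged after every step and counts the codons left unused at the end.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order ('*' marks a stop codon).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(p) for p in product(BASES, repeat=3)]

AGY = ["AGU", "AGC"]
UUR = ["UUA", "UUG"]
UCX = ["UC" + b for b in BASES]
CUX = ["CU" + b for b in BASES]


def translate(genome, code):
    """Translate a list of codons with the given codon table."""
    return "".join(code[c] for c in genome)


def recode(genome, code, old, new):
    """'Codons' step: replace each codon in `old` by a synonymous codon from `new`."""
    out = []
    for i, codon in enumerate(genome):
        if codon in old:
            target = new[i % len(new)]
            assert code[target] == code[codon], "recoding must be synonymous"
            codon = target
        out.append(codon)
    return out


def reassign(genome, code, codons, amino_acid):
    """'tRNAs' step (modeled as a table edit): only allowed for codons no longer in use."""
    assert not set(codons) & set(genome), "codons still in use"
    new_code = dict(code)
    for c in codons:
        new_code[c] = amino_acid
    return new_code


def swap_leu_ser():
    code = dict(zip(CODONS, AMINO))
    genome = list(CODONS)  # toy genome (assumption): one copy of every codon
    protein = translate(genome, code)

    genome = recode(genome, code, AGY, UCX)        # 1. AGY Ser > UCX Ser
    assert translate(genome, code) == protein
    code = reassign(genome, code, AGY, "L")        # 2. AGY now read as Leu
    genome = recode(genome, code, UUR + CUX, AGY)  # 3. UUR/CUX Leu > AGY
    assert translate(genome, code) == protein
    code = reassign(genome, code, UUR, "S")        # 4. UUR now read as Ser
    genome = recode(genome, code, UCX, UUR)        # 5. UCX Ser > UUR Ser
    assert translate(genome, code) == protein

    free = sorted(set(CODONS) - set(genome))       # codons no longer used anywhere
    return genome, code, free
```

In this toy model the unused codons at the end are the four UCX and the four CUX codons, which agrees with the "8 codons free" count on the slide.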
Why Synthetic Genomes & Proteomes?
  Test or engineer cis-DNA/RNA-elements
  Drug biosynthesis e.g. Artemisinin (malaria)
  Epitopes & vaccines (HIV gag)
  Unnatural aa & post-translational modifications
  De novo protein design & selection
  Humanizing imm/tox systems, E.colizing codons
  20 bit in vivo counters
Why whole genomes?
  Changing the genetic code, safety, genome stability, enhanced restriction, recombination
10 Mbp of oligos / $1000 chip
  <1K    Oxamer                       Electrolytic acid/base
  8K     Atactic/Xeotron/Invitrogen   Photo-Generated Acid (Sheng, Zhou, Gulari, Gao; U.Houston)
  12K    Combimatrix                  Electrolytic
  24K    Agilent                      Ink-jet standard reagents
  380K   Nimblegen                    Photolabile 5'protection (Nuwaysir, Smith, Albert)
Tian et al. Nature. 432:1050
~1000X lower oligo costs (= 2 E.coli genomes or 20 Mycoplasmas /chip)
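The headline numbers on this slide can be cross-checked with a few lines of arithmetic. This is a rough sketch: the 5 Mbp E.coli size is the figure used elsewhere in the deck, while the 0.5 Mbp Mycoplasma size is an assumption chosen to match "20 Mycoplasmas /chip", and the "conventional" price is only what the ~1000X ratio implies, not a quoted price.

```python
CHIP_COST_USD = 1000          # from the slide title
CHIP_CAPACITY_BP = 10e6       # "10 Mbp of oligos" per chip
COST_REDUCTION = 1000         # "~1000X lower oligo costs"
ECOLI_GENOME_BP = 5e6         # the deck's "5 Mbp" figure
MYCOPLASMA_GENOME_BP = 0.5e6  # assumption implied by "20 Mycoplasmas /chip"


def chip_economics():
    """Return per-base cost and genome equivalents implied by the slide's numbers."""
    cost_per_base = CHIP_COST_USD / CHIP_CAPACITY_BP
    return {
        "chip_usd_per_base": cost_per_base,
        "implied_conventional_usd_per_base": cost_per_base * COST_REDUCTION,
        "ecoli_genomes_per_chip": CHIP_CAPACITY_BP / ECOLI_GENOME_BP,
        "mycoplasma_genomes_per_chip": CHIP_CAPACITY_BP / MYCOPLASMA_GENOME_BP,
    }


if __name__ == "__main__":
    for name, value in chip_economics().items():
        print(f"{name}: {value:g}")
```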
Improve DNA Synthesis Cost
Synthesis on chips in pools is 1000X less expensive per oligonucleotide, but amounts are low (1e6 molecules rather than the usual 1e12) & bimolecular kinetics slow with the square of the concentration decrease!
Solution: Amplify the oligos then remove tags
  ss-70-mer (chip) => ds-90-mer => ds-50-mer
  20-mer PCR primers with restriction sites at the 50mer junctions
Tian, Gong, Sheng, Zhou, Gulari, Gao, Church
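The kinetics argument can be made concrete. The sketch below assumes a simple second-order rate law (rate proportional to the product of two strand concentrations, both diluted by the same factor) and ideal PCR doubling per cycle; both are simplifying assumptions, and the function names are illustrative.

```python
import math


def relative_bimolecular_rate(molecules, reference=1e12):
    """Rate relative to the reference amount when both partners are equally diluted.

    For rate = k[A][B], scaling both concentrations by d scales the rate by d**2.
    """
    dilution = molecules / reference
    return dilution ** 2


def doublings_to_restore(molecules, reference=1e12):
    """Ideal PCR doublings needed to bring `molecules` back up to `reference`."""
    return math.ceil(math.log2(reference / molecules))


if __name__ == "__main__":
    print(relative_bimolecular_rate(1e6))  # about 1e-12 of the usual rate
    print(doublings_to_restore(1e6))       # about 20 ideal cycles
```

Under these assumptions, a million-fold drop in amount costs a factor of about 1e12 in annealing rate, which is why amplifying the chip-eluted pool comes before assembly.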
Improve DNA Synthesis Accuracy via mismatch selection Tian & Church Other mismatch methods: MutS (&H,L)
Improving DNA synthesis accuracy
  Method                     Bp/error
  Chip assembly only         160
  Hybridization-selection    1,400
  MutS-gel-shift             10,000
  PCR 35 cycles              10,000
  MutHLS cleavage            100,000
  In vivo replication        1,000,000,000
Figure labels: 0 MutS, 2 MutS
Tian & Church 2004 Nature; Carr & Jacobson 2004 NAR; Smith & Modrich 1997 PNAS
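To see what the bp/error figures mean for a finished construct, one can estimate the fraction of molecules with no errors at all. This is a minimal sketch assuming errors are independent and uniformly distributed along the sequence (an idealization); the table values are copied from the slide and the function name is illustrative.

```python
BP_PER_ERROR = {
    "Chip assembly only": 160,
    "Hybridization-selection": 1_400,
    "MutS-gel-shift": 10_000,
    "PCR 35 cycles": 10_000,
    "MutHLS cleavage": 100_000,
    "In vivo replication": 1_000_000_000,
}


def fraction_error_free(length_bp, bp_per_error):
    """Probability that a construct of `length_bp` carries no error.

    Assumes each base is wrong independently with probability 1/bp_per_error.
    """
    return (1 - 1 / bp_per_error) ** length_bp


if __name__ == "__main__":
    length = 1_000  # example construct size in bp (arbitrary choice)
    for method, n in BP_PER_ERROR.items():
        print(f"{method:26s} {fraction_error_free(length, n):.4f}")
```

For a 1 kb construct the error-free fraction is well under 1% at 160 bp/error and above 90% at 10,000 bp/error, which is the practical motivation for the selection steps.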
CAD-PAM: Computer-Aided Design - Polymerase Assembly Multiplexing
For tandem, inverted and dispersed repeats: Focus on 3' ends, hierarchical assembly, size-selection and scaffolding.
100*2^(n-1)
Mullis 1986 CSHSQB, Dillon & Rosen 1990 BioTechniques, Stemmer 1995 Gene, Smith et al. 2003 PNAS, Tian et al. Nature, …
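The expression 100*2^(n-1) on this slide is not explained in the text. Reading it as the product length in bp after assembly cycle n, starting from roughly 100 bp pieces and doubling each cycle, is an assumption; under that reading, the number of cycles needed for a given target size follows directly.

```python
def pam_product_length(n, start_bp=100):
    """Product length after cycle n under the assumed doubling rule 100*2^(n-1)."""
    return start_bp * 2 ** (n - 1)


def cycles_to_reach(target_bp, start_bp=100):
    """Smallest cycle number whose product is at least `target_bp` long."""
    n = 1
    while pam_product_length(n, start_bp) < target_bp:
        n += 1
    return n


if __name__ == "__main__":
    n = cycles_to_reach(5_000_000)  # a 5 Mbp genome
    print(n, pam_product_length(n))
```

With pure doubling, a 5 Mbp target would take 17 cycles (the 17th product being 6,553,600 bp); any pipeline that joins more than two pieces per stage needs fewer.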
E.coli genome synthesis pipeline
  PAM cycle# / #bp: HS PAM 425; MutS PAM 10K; anneal 100K; red 5Mbp
  USER-T4Pol; USER-T4Pol; USER-5'only
  1 pool; 480 pools; 480 genomic; 48; 1
  of 117K universal primers; primer pairs
  PCR in vitro
  in vivo
  >3 days; >1 day; ~1day; >7days
HS = Hybridization-Selection
USER = Uracil DNA glycosylase & EndoVIII to remove flanking primer pairs
Isaacs, Carr, Emig, Gong, Tian, Reppas, Jacobson, Church
5 Mbp Genome assembly alternatives
Automated in vivo homologous recombination:
  Serial electroporation: 48 stages: 1 strain (21 hr/stage)
  vs. Hierarchical conjugation: 7 stages: 48 > 24 > 12 > 6 > 3 > 2 > 1 strains
  vs. Random/simultaneous: 1 or more stages
Figure labels: 1. cat; 2. kan; 3. cat
Reppas & Church
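The two deterministic alternatives can be compared with a short calculation. This sketch assumes that hierarchical conjugation merges strains pairwise in each round (an odd strain is carried forward) and that serial electroporation takes exactly the 21 hr/stage quoted on the slide; the per-round time for conjugation is not given, so none is assumed.

```python
import math


def hierarchical_levels(strains):
    """Strain counts per level when strains are merged pairwise each round."""
    levels = [strains]
    while levels[-1] > 1:
        levels.append(math.ceil(levels[-1] / 2))
    return levels


def serial_hours(stages=48, hours_per_stage=21):
    """Total time for serial electroporation, one stage after another."""
    return stages * hours_per_stage


if __name__ == "__main__":
    print(hierarchical_levels(48))          # strain counts per level
    print(serial_hours(), "hours =", serial_hours() / 24, "days")
```

Pairwise merging of 48 strains reproduces the slide's 48 > 24 > 12 > 6 > 3 > 2 > 1 sequence (seven levels, six merging rounds), while the serial route adds up to 1008 hours, or 42 days.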
All 30S-Ribosomal-protein DNAs (codon re-optimized)
Figure labels: 1.7 kb; 0.3 kb; s19 0.3kb; 14.6 kb
Nimblegen 95K chip; Atactic <4K chip
Tian, Gong, Sheng, Zhou, Gulari, Gao, Church
Full codon remapping for enhanced protein expression
RS-2, 4, 5, 6, 9, 10, 12, 13, 15, 16, 17, and 21 detectable initially.
RS-1, 3, 7, 8, 11, 14, 18, 19, 20 initially weak or undetectable.
Solution: Iteratively resynthesize all mRNAs with less mRNA structure, lower %GC
Western blot based on His-tags
Tian et al. Nature. 432:1050
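One ingredient of the resynthesis step, lowering %GC without changing the protein, can be sketched as a synonymous-codon substitution. This is not the authors' design procedure: it ignores mRNA structure and codon usage entirely and simply picks, for each codon, the synonym with the fewest G/C bases. All function names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order ('*' marks a stop codon).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip(("".join(p) for p in product(BASES, repeat=3)), AMINO))


def gc_fraction(seq):
    """Fraction of G or C bases in a DNA sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def split_codons(cds):
    assert len(cds) % 3 == 0, "coding sequence length must be a multiple of 3"
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds):
    return "".join(CODE[c] for c in split_codons(cds))


def lowest_gc_synonym(codon):
    """Synonymous codon with the fewest G/C bases (ties broken alphabetically)."""
    synonyms = [c for c, aa in CODE.items() if aa == CODE[codon]]
    return min(synonyms, key=lambda c: (sum(b in "GC" for b in c), c))


def lower_gc(cds):
    """Recode a coding sequence to its lowest-GC synonymous version."""
    return "".join(lowest_gc_synonym(c) for c in split_codons(cds))


if __name__ == "__main__":
    original = "ATGGCCCTGGGCTAA"  # toy sequence, not from the talk
    recoded = lower_gc(original)
    print(original, gc_fraction(original))
    print(recoded, gc_fraction(recoded))
```

A real redesign would weigh this against predicted secondary structure and host codon preferences, which is presumably why the slide describes the process as iterative.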
Genome engineering: safety via metabolic dependency
(ideally ecologically uncommon molecules)
trp/tyrA pair of genomes shows the best co-growth
Figure labels: First Passage; Second Passage
Reppas, Lin & Church unpublished.
Sequence monitoring of evolution (anticipate escape & resistance)
Molecular Genomic Imaging Center (CEGS) Harvard / Wash U
George Church, Rob Mitra, Greg Porreca, Jay Shendure
Sequencing by Ligation on Polony Beads with Nick Reppas, Kun Zhang, Shawn Douglas, Mike Wang, Abraham Rosenbaum, Agencourt Personal Genomics, Stem Cells, ELSI Synthetic Biology
Polymerase colony (polony) PCR in a gel (or emulsion-beads)
  Single molecule from a library or population
  Primer is Extended by Polymerase
  Primer A has 5’ immobilizing Acrydite (or double biotin to SA-beads)
  Diagram labels: A, A’, B
Mitra & Church Nucleic Acids Res. 27: e34
New Polony Sequencing Pipeline
  In vitro paired tag libraries
  Bead polonies via emulsion PCR
  Monolayer gel immobilization
  Enrich amplified beads
  SOFTWARE: Images → Tag Sequences; Tag Sequences → Genome
  SBE or SBL sequencing
  Epifluorescence & Flow Cell
Mitra, Shendure, Porreca, Rosenbaum, Church unpub.
In vitro construction of a complex, mate-paired library
  ~1 kb genomic fragment
  paired genomic tags (17 to 18 bp each)
  common sequences
  MmeI
  Diagram labels: Fisseq-F, -R, T30, Tag 2, Tag 1, Fisseq-F, Left, Right, Mid, Seq2, Seq1
  43 bp
  Total = bp amplicon
Enrichment by Hybridization Selector Bead
One of 750 megapixel frames of gel-immobilized 1.0 micron beads, 0.3 micron pixels, 4-colors
Sequencing by Ligation (SBL) with fluorescent combinatorial 9-mers
  ACUCAUC…
  (3’)…TAGAGT????????????????TGAGTAG…(5’)
  5’-Cy5-nnnnAnnnn-3’
  5’-Cy3-nnnnGnnnn-3’
  5’-TR-nnnnCnnnn-3’
  5’-Cy3+Cy5-nnnnTnnnn-3’
  5’ PO4
  Figure axes: Excitation, Emission (nm)
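The dye-to-base encoding of the four 9-mer pools can be written down as a lookup. This is a minimal sketch: the mapping is taken from the probe list on the slide, while the idea that a base call starts from a set of detected dyes, and the function names, are illustrative assumptions. The ligated probe's central base is the complement of the template base it pairs with.

```python
# Central (query) base of the ligated 9-mer, keyed by the dyes detected.
DYES_TO_PROBE_BASE = {
    frozenset({"Cy5"}): "A",
    frozenset({"Cy3"}): "G",
    frozenset({"TR"}): "C",
    frozenset({"Cy3", "Cy5"}): "T",  # the T pool carries both dyes
}

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_probe_base(dyes):
    """Return the probe's central base for a collection of detected dye names."""
    key = frozenset(dyes)
    if key not in DYES_TO_PROBE_BASE:
        raise ValueError(f"no probe pool matches dyes {sorted(key)}")
    return DYES_TO_PROBE_BASE[key]


def call_template_base(dyes):
    """Return the template base opposite the probe's central position."""
    return COMPLEMENT[call_probe_base(dyes)]


if __name__ == "__main__":
    print(call_probe_base(["Cy3", "Cy5"]))  # probe base
    print(call_template_base(["Cy5"]))      # template base
```

Using three dyes plus one two-dye combination lets four bases be distinguished with three fluorophores, at the cost of needing to separate a mixed signal from two adjacent beads.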
Why low error rates?
  Consensus Accuracy   False Positives (E.coli)   False Positives (Human)
  1E-3                 4,000                      3,000,000
  1E-4 BERMUDA/ABI ,000
  1E-6 Polony-SBL
Goal of Resequencing: Discovery of Uncommon Variation, e.g. cancer mutations <1E-6
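The false-positive counts follow from multiplying the per-base consensus error rate by the number of bases examined. In the sketch below, the genome sizes (4 Mbp for E.coli, 3 Gbp for human) are inferred from the 1E-3 row and are therefore assumptions; the calculation is offered as a way to reason about the other rows, not as a reconstruction of the values missing from the slide.

```python
# Genome sizes inferred from the 1E-3 row (4,000 and 3,000,000 false positives).
GENOME_SIZE_BP = {"E.coli": 4_000_000, "Human": 3_000_000_000}


def expected_false_positives(error_rate, genome_bp):
    """Expected number of wrongly called bases across one genome."""
    return error_rate * genome_bp


if __name__ == "__main__":
    for rate in (1e-3, 1e-4, 1e-6):
        counts = {
            name: expected_false_positives(rate, size)
            for name, size in GENOME_SIZE_BP.items()
        }
        print(f"{rate:g}", counts)
```

Because rare variants such as cancer mutations occur at frequencies below 1E-6, the false-positive count has to fall well below the number of true variants before discovery becomes practical.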
Sequencing cost (10 to 100,000 fold improvements)
  Columns: ABI2004 Jun >2007
  # bp/expt: - 2e7 3e7 3e8 60e9
  Complexity (bp): -744e6 3e9 6e9
  Avg Fold Cov: 83e
  Pix per bp:
  Read-length (SBE): 25 (pair) 35 42
  $ / kb (e<1e-5): e-5
  $/ 1X 3e9 b: 2e6 - 2e5 5e4 200
  Indel Error: 5e-3 0.6% 1e-3 1e-3 1e-3
  Subst Error: 4e-3 4e-6 1e-3 1e-3 1e-3
  3X Cons Err: 1e-4 - 1e-6 3e-7 1e-7
  Kb / min: e3 1e6
  Pix / sec: - 2e5 2e6 6e6 2e7
  Enz $/mg:
Mutation Discovery in Engineered/Evolved E.coli
  Position    Type       Gene     Location     ABI Confirmation   Comments
  986,334     T > G      ompF     TATA box                        Only in evolved strain
  931,960     8 bp del   lrp      frameshift                      Only in evolved strain
  1,976,      bp del     insB_5   IS element                      MG1655 heterogeneity
  3,957,960   C > T      ppiC     5' UTR                          MG1655 heterogeneity
  4,654,533   T > C      cI       Glu > Glu                       heterogeneity
  4,647,960   T > C      ORF61    Lys > Gly                       heterogeneity
  985,797     T > G      ompF     Glu > Ala    (in progress)
  454,864     T > C      tig      Gly > Gly    (in progress)
  4,648,691   G > A      exo      Phe > Phe    (in progress)
Single human chromosome molecules (polony sequencing)
Figure labels: rs rs rs39284 rs rs; GM10835; CGCGCCGCGC; TATATTATAT
TT=137 CT=2 (TC=1) CC=
Mb
Genome Analysis Policy
  Insurance/employment: What probability & level of advantage can be hidden/examined?
  Individual/group stigma
  Choice, stem cells, cloning
  Privacy & transparency
NHGRI/DOE ELSI, Genetic Screening Study Group
Anonymity, privacy, disclosure, identity
"Open-source" meets Personal Genome-Phenome Project Are information-rich resources (e.g. facial imaging & genome sequence) really anonymous? What are the risks and benefits of "open-source"? What level of training is needed to give informed consent on open-ended studies? Harvard Medical School IRB Human Subjects protocol submitted 16-Sep-2004.