Optimal Combinatorial Biology & Genome Engineering. BU BME retreat 23-Jun :45-10:30 Seacrest, N. Falmouth, MA. Thanks to: Broad Inst., DARPA-BioComp, DOE-GTL, EU-MolTools, NHGRI-CEGS, NHLBI-PGA, NIGMS-CECBSR, PhRMA, Lipper Foundation; Agencourt, Ambergen, Atactic, BeyondGenomics, Caliper, Genomatica, Genovoxx, Helicos, MJR, NEN, Nimblegen, ThermoFinnigan, Xeotron/Invitrogen. For more info see: arep.med.harvard.edu
Exponential technologies. Shendure J, Mitra R, Varma C, Church GM (May 2004) Advanced Sequencing Technologies: Methods & Goals. Nature Reviews Genetics 5, 335. Figure label: ABI
Programming cells with DNA vs. Digital computers simulating cells Cells simulating digital computers Drugs & devices simulating human systems
Engineering complex systems (comparative genomics) Stedman et al. (2004) [Masticatory] Myosin gene mutation correlates with anatomical changes in the human lineage Nature 428,
DNA RNA Proteins Metabolites Replication rate Environment Biosystems Engineering Integrating Measures & Models Microbes Cancer & stem cells Darwinian optima In vitro replication Small multicellular organisms RNAi Insertions SNPs interactions
Now that we have 200 genomes, why sequence? Once per organism: phylogenetic footprinting, biodiversity; RNA splicing & chromatin modification patterns; cell lineage during development; NA "aptamers" & Ab for any protein. Once per person: preventative medicine & genotype–phenotype associations. Frequently: cancer (mutation sets for individual clones, loss-of-heterozygosity); B & T-cell receptor diversity (temporal profiling, clinical); new & old pathogen "weather map", biowarfare sensors; DNA computing & lab selections. Shendure et al. Nature Rev Gen 5, 335.
Why 'single molecule' sequencing? (1) Single-cell analyses, e.g. preimplantation genetic diagnosis (PGD). (2) Co-occurrence on a molecule, complex, or cell, e.g. RNA splice-forms. (3) Cost: $1K-100K "personal genomes". (4) Precision: counting 10^9 RNA tags to reduce variance (~5e5 RNAs per human cell).
  Tags counted at fixed costs: EST 5e3; SAGE 5e4; MPSS 5e6; Polony-FISSeq (polymerase colony) 5e9 (goal)
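The precision argument in point (4) can be made concrete with a rough sketch. It assumes tag counts follow Poisson sampling statistics (an assumption, not stated on the slide), so the relative error of a count is about 1/sqrt(expected count). The function name `counting_cv` and the single-copy transcript scenario are illustrative choices.

```python
import math

def counting_cv(total_tags, transcript_fraction):
    """Coefficient of variation of a tag count, assuming Poisson sampling."""
    expected_count = total_tags * transcript_fraction
    return 1.0 / math.sqrt(expected_count)

# One transcript copy among ~5e5 RNAs per human cell (figure from the slide).
single_copy_fraction = 1 / 5e5

# Tag counts per method, as listed on the slide.
tags_per_method = {
    "EST": 5e3,
    "SAGE": 5e4,
    "MPSS": 5e6,
    "Polony-FISSeq (goal)": 5e9,
}

for method, tags in tags_per_method.items():
    expected = tags * single_copy_fraction
    cv = counting_cv(tags, single_copy_fraction)
    print(f"{method:22s} expected count {expected:10.2f}   CV {cv:6.2f}")
```

Under this assumption a single-copy transcript is expected far less than once in an EST-scale run, while the 5e9-tag goal would see it about 1e4 times, for a relative error near 1%.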
Polony Fluorescent In Situ Sequencing Libraries (Greg Porreca, Abraham Rosenbaum). [Figure labels: 1 to 100 kb genomic; L, M, R segments; PCR bead; sequencing primers; selector bead; 2x20 bp after MmeI; emulsion.] Dressman et al PNAS 2003
Cleavable dNTP-Fluorophore (& terminators). Reduce or photo-cleave. Mitra RD, Shendure J, Olejnik J, Olejnik EK, and Church GM (2003) Fluorescent in situ Sequencing on Polymerase Colonies. Analyt. Biochem. 320:55-65
Polony-FISSeq: up to 2 billion beads/slide. White = Fe-core pixels; Cy5 primer (666 nm); Cy3 dNTP (570 nm). Jay Shendure
Polony FISSeq Stats (Shendure)
  # of bases sequenced (total): 23,703,953
  # of bases sequenced (unique): 73
  Avg fold coverage: 324,711X
  Pixels used per bead (analysis): ~3.6
  Read length per primer: 14-15 bp
  Insertions: 0.5%
  Deletions: 0.7%
  Substitutions (raw): 4e-5
  Throughput: 360,000 bp/min
  Current capillary sequencing: 1400 bp/min (600X speed/cost ratio, ~$5K/1X)
  (This may omit: PCR, homopolymer, context errors)
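As a quick arithmetic check on these stats, the sketch below recomputes the fold coverage and the raw throughput ratio from the slide's own numbers. The variable names are illustrative. The slide's 600X figure is a combined speed/cost ratio, so it is not expected to equal the pure speed ratio computed here.

```python
# Numbers copied from the slide.
total_bases = 23_703_953
unique_bases = 73
polony_bp_per_min = 360_000
capillary_bp_per_min = 1_400

# Fold coverage = total bases read / unique bases in the template.
fold_coverage = total_bases / unique_bases

# Raw speed ratio only; cost is not included in this number.
speed_ratio = polony_bp_per_min / capillary_bp_per_min

print(f"fold coverage ~ {fold_coverage:,.0f}X")
print(f"raw speed ratio ~ {speed_ratio:.0f}X")
```

The coverage agrees with the slide's 324,711X after truncation; the speed ratio alone comes to roughly 257X, which suggests the remaining factor in the 600X figure reflects cost.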
CD44 Exon Combinatorics (Zhu & Shendure) Alternatively Spliced Cell Adhesion Molecule Specific variable exons are up-or-down-regulated in various cancers (>2000 papers) v6 & v7 enable direct binding to chondroitin sulfate, heparin… Zhu,J, et al. Science. 301:836-8.
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10. RNA exon examples, auto-regridded & quantitated. Zhu J, Shendure J, Mitra RD, Church GM (2003) Science 301: Single Molecule Profiling of Alternative Pre-mRNA Splicing.
Zhu J, Shendure J, Mitra RD, Church GM. Science 301: Single molecule profiling of alternative pre-mRNA splicing. Eph4 = murine mammary epithelial cell line Eph4bDD = stable transfection of Eph4 with MEK-1 (tumorigenic) CD44 RNA isoforms
DNA RNA Proteins Metabolites Replication rate Environment Biosystems Engineering Integrating Measures & Models Escherichia Darwinian optima Prochlorococcus mutant suboptimality Homo RNAi Insertions SNPs interactions
Integer Stoichiometric matrix (Roche/ExPASy) Metabolic Pathways Cellular Processes
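Metabolite pool $X_i$ inside the membrane, with fluxes $V_{transport}$, $V_{syn}$, $V_{deg}$, $V_{growth}$. Growth: $c_1 X_1 + c_2 X_2 + \dots + c_m X_m \rightarrow \text{Biomass}$. Steady state: $X_i = \text{const.}$, i.e. $\sum_j S_{ij} v_j = 0$. Flux ratios at each branch point yield the optimal polymer composition for replication.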
Biomass composition. $X_i$ = metabolites; $c_i$ = coefficients in the growth reaction. Optimize flow from input C, N, P to biomass. Biomass components: AcCoA, CoA, ATP, FAD, NADH, GTP, CTP, UTP, SucCoA, dACGT, Trp, Leu, Ala, Arg, Gly, Cys, Ser, Asn, Asp, His, Val, Glu, Gln, Phe, Pro, Ile, Lys, Met, Tyr, Thr. Edwards & Palsson, PNAS 2000, BMC Bioinf
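The steady-state and growth-reaction ideas on these slides can be sketched with a toy network. Everything here is hypothetical: three metabolites (A, B, C), four fluxes, and made-up biomass coefficients `C_B` and `C_C`. The network has a single degree of freedom, so maximal growth at a given uptake follows in closed form and no LP solver is needed; a genome-scale model would require one.

```python
# Toy stoichiometric matrix S (rows = metabolites, columns = fluxes).
# Fluxes: uptake (-> A), A -> B, A -> C, growth (C_B*B + C_C*C -> biomass).
C_B, C_C = 3.0, 1.0  # hypothetical biomass coefficients

S = [
    [1, -1, -1, 0.0],   # A: made by uptake, used by both branches
    [0, 1, 0, -C_B],    # B: made by branch 1, drained by growth
    [0, 0, 1, -C_C],    # C: made by branch 2, drained by growth
]

def steady_state_fluxes(uptake):
    """Fluxes that use all uptake for growth while keeping S.v = 0."""
    growth = uptake / (C_B + C_C)
    return [uptake, C_B * growth, C_C * growth, growth]

def residual(matrix, fluxes):
    """Net production rate of each metabolite (all zero at steady state)."""
    return [sum(s * v for s, v in zip(row, fluxes)) for row in matrix]

v = steady_state_fluxes(uptake=10.0)
print("fluxes:", v)
print("S.v   :", residual(S, v))
print("branch ratio (A->B : A->C):", v[1] / v[2])
```

In this toy the branch-point flux ratio equals the ratio of the biomass coefficients, which is the sense in which flux ratios track the polymer composition needed for replication.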
Minimization of Metabolic Adjustment (MoMA). Linear Programming (LP) to find optima, Quadratic Programming (QP) to find closest points. x, y are two of the 100s of flux dimensions. [Figure labels: wild-type and mutant feasible flux polyhedra; wild-type optimum; mutant optimum; mutant initially (closest point); objective function = growth flux hyperplanes.] Segre, Vitkup, & Church PNAS 99:
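The LP-versus-QP contrast can be sketched in two flux dimensions. This is a toy under stated assumptions, not the published implementation: the constraints, the growth objective, and the knockout of flux x are all made up. The LP is solved by enumerating polygon vertices, and because the mutant feasible set here is a line segment (x = 0, 0 <= y <= 6), the closest-point QP reduces to clamping. A realistic MoMA calculation needs a QP solver over the full stoichiometric constraints.

```python
from itertools import combinations

def lp_max_2d(constraints, objective):
    """Maximise objective.(x, y) subject to a*x + b*y <= c by vertex enumeration."""
    best = None
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < 1e-12:
            continue  # parallel boundaries, no vertex
        x = (c1 * b2 - c2 * b1) / det
        y = (a1 * c2 - a2 * c1) / det
        if all(a * x + b * y <= c + 1e-9 for a, b, c in constraints):
            value = objective[0] * x + objective[1] * y
            if best is None or value > best[0]:
                best = (value, (x, y))
    return best

# Hypothetical wild-type polygon: x + y <= 10, x <= 8, y <= 6, x >= 0, y >= 0.
WT_CONSTRAINTS = [(1, 1, 10), (1, 0, 8), (0, 1, 6), (-1, 0, 0), (0, -1, 0)]
GROWTH = (1.0, 0.5)  # hypothetical growth objective

wt_growth, wt_point = lp_max_2d(WT_CONSTRAINTS, GROWTH)

# Knockout: flux x forced to zero (adds x <= 0 to the existing x >= 0).
lp_growth, lp_point = lp_max_2d(WT_CONSTRAINTS + [(1, 0, 0)], GROWTH)

def project_onto_knockout(point, y_max):
    """Closest point (Euclidean) on the segment x = 0, 0 <= y <= y_max."""
    return (0.0, min(max(point[1], 0.0), y_max))

moma_point = project_onto_knockout(wt_point, y_max=6.0)
moma_growth = GROWTH[0] * moma_point[0] + GROWTH[1] * moma_point[1]

print("wild-type optimum  :", wt_point, wt_growth)
print("mutant LP optimum  :", lp_point, lp_growth)
print("mutant MoMA point  :", moma_point, moma_growth)
```

The mutant's LP optimum and its closest-point solution differ in this toy, which mirrors the slide's distinction between where a mutant could eventually settle and where it starts.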
12C, 13C MS/NMR Flux Ratio Data
Flux data (C009-limited): predicted vs. experimental fluxes. WT (LP): correlation = 0.91, p = 8e-8. pyk (LP): correlation = -0.06, p = 6e-1. pyk (QP): correlation = 0.56, p = 7e-3.
Reproducibility of mass competition. Correlation between two selection experiments. Badarinarayana et al. Nature Biotech. 19: 1060
Competitive growth data (LP vs. QP): χ² p-values 4x x10^-5. Position effects. Novel redundancies. On minimal media: negative, small selection effect. Hypothesis: next optima are achieved by regulation of activities.
Motif co-occurrence, comparative genomics, RNA clusters, and/or ChIP2-location data. P= to . Bulyk, McGuire, Masuda, Church. Genome Res. 14:201–208
Synthetic testing of DNA motif combinations (1.3 in argR) RNA Ratio (motif- to wild type) for each flanking gene Bulyk, McGuire,Masuda,Church Genome Res. 14:201–208
Systems Biology Loop Synthesis / Perturbation Model Experimental design (Systematic) Data Proteasome targeting Genome Engineering
Engineering BioSystems
  Perturbations               Action   Specificity   %KO        "Design"
  Small molecules (drugs)     Fast     Varies        Varies     Hard
  Antibodies                  Fast     Varies        Varies     Hard
  RNAi                        Slow     Varies        Medium     OK
  Insertion "traps"           Slow     Yes           Varies     Random
  Proteasome targeting        Fast     Excellent     Medium     Easy
  Homologous recombination    Slow     Perfect       Complete   Easy
Programming proteasome targeting Janse, DM, Crosas,B Finley,D & Church, GM (2004) Localization to the Proteasome is Sufficient for Degradation.
Synthetic Genomes & Proteomes. Why? Test or engineer cis-DNA/RNA elements. Access to any protein (complex), including post-translational modifications. Affinity agents for the above. Mass spectrometry standards, protein design. Utility of molecular biology DNA-RNA-protein in vitro "kits" (e.g. PCR, SP6, Roche). Toward these goals, design a chassis: 115 kbp genome; 150 genes; nearly all 3D structures known; comprehensive functional data.
PURE translation utility (yet room for improvement). Removing tRNA-synthetases, RNases & proteases makes feasible: optimal mRNA structure & codon usage. Lee et al. J Immunol Methods 284: Selection of scFvs specific for HBV DNA polymerase using ribosome display. Forster et al. (2003) Programming peptidomimetic syntheses by translating genetic codes designed de novo. PNAS 100: Klammt et al. Eur J Biochem 271: High level cell-free expression & specific labeling of integral membrane proteins. Shimizu et al. Nat Biotechnol 19: Cell-free translation reconstituted with purified components.
in vitro genetic codes. [Figure: mRNA 5' GAAACCAUG ... UGG UUG CAG AAC ... GUU A 3' encoding fMTNVE followed by mS, yU, eU; codon table indexed by 5', second, and 3' base with codons reassigned to mS, yU, eU.] Forster et al. (2003) PNAS 100: % average yield per unnatural coupling. bK = biotinyllysine; mS = O-methylserine; eU = 2-amino-4-pentenoic acid; yU = 2-amino-4-pentynoic acid
Mirror world : resistant to enzymes, parasites, predators L-amino acids & D-ribose (rNTPs, dNTPs) Transition: EF-Tu, peptidyl transferase, DNA-ligase D-amino acids & L-ribose (rNTPs, dNTPs) Dedkova, et al. (2003) Enhanced D-amino acid incorporation into protein by modified ribosomes. J Am Chem Soc 125,
Forster & Church Oligos for 150 & 776 synthetic genes (for E.coli minigenome & M.mobile whole genome respectively)
Up to 760K Oligos/Chip; 18 Mbp for $700 raw (6-18K genes)
  Oligos/chip   Source                       Chemistry                    People
  <1K           Oxamer                       Electrolytic acid/base
  8K            Atactic/Xeotron/Invitrogen   Photo-generated acid         Sheng, Zhou, Gulari, Gao (U. Houston)
  24K           Agilent                      Ink-jet, standard reagents
  48K           Febit
  100K          Metrigen
  380K          Nimblegen                    Photolabile 5' protection    Nuwaysir, Smith, Albert
  Tian, Gong, Church
Improve DNA Synthesis Cost. Synthesis on chips in pools is 5000X less expensive per oligonucleotide, but amounts are low (1e6 molecules rather than the usual 1e12) & bimolecular kinetics slow with the square of the concentration decrease. Solution: amplify the oligos, then release them, using 20-mer PCR primers with restriction sites at the 50-mer junctions: ss-70-mer (chip) => ds-90-mer => ds-50-mer. Tian, Gong, Sheng, Zhou, Gulari, Gao, Church
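The kinetics penalty mentioned on this slide can be put in numbers with a small sketch. It assumes equal reaction volume (so molecule counts scale like concentrations), a simple second-order rate law, and ideal doubling per PCR cycle; these are simplifying assumptions, and the function names are illustrative.

```python
import math

def relative_bimolecular_rate(molecules, reference_molecules):
    """Rate relative to the reference, for rate proportional to [A][B]."""
    concentration_ratio = molecules / reference_molecules
    return concentration_ratio ** 2

def doublings_to_restore(molecules, reference_molecules):
    """Ideal PCR cycles needed to bring the amount back up to the reference."""
    return math.ceil(math.log2(reference_molecules / molecules))

chip_molecules = 1e6       # per oligo from a chip (slide figure)
standard_molecules = 1e12  # usual synthesis scale (slide figure)

rate = relative_bimolecular_rate(chip_molecules, standard_molecules)
cycles = doublings_to_restore(chip_molecules, standard_molecules)

print(f"relative annealing rate: {rate:.0e}")
print(f"ideal doublings needed : {cycles}")
```

A million-fold drop in amount gives a 1e12-fold slower bimolecular step under these assumptions, while about 20 ideal doublings would recover the usual scale, which is the motivation for amplifying before assembly.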
Improve DNA Synthesis Accuracy via mismatch selection Tian & Church
Genome assembly challenges: 1. Tandem, inverted and dispersed repeats (hierarchical assembly, size-selection and/or scaffolding). 2. Reduce mutations (goal <1e-6 errors) to reduce # of intermediates. 3. >30 kbp homologous recombination (Nick Reppas). Stemmer et al. Gene 164: Single-step assembly of a gene and entire plasmid from large numbers of oligodeoxyribonucleotides. … 100*2^(n-1)
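Two of these numbers can be explored with a short sketch. It reads the slide's expression 100*2^(n-1) as the construct length in bp after n rounds of pairwise joining from 100-bp pieces, and it treats the <1e-6 goal as an independent per-bp error rate; both readings are assumptions. The 115 kbp target is the chassis genome size given elsewhere in the talk.

```python
def construct_length(n, start_bp=100):
    """Length after n rounds, using the slide's expression 100*2^(n-1)."""
    return start_bp * 2 ** (n - 1)

def rounds_needed(target_bp, start_bp=100):
    """Smallest n with start_bp * 2^(n-1) >= target_bp."""
    n = 1
    while construct_length(n, start_bp) < target_bp:
        n += 1
    return n

def error_free_fraction(length_bp, error_rate_per_bp):
    """Chance a construct has no errors, assuming independent per-bp errors."""
    return (1.0 - error_rate_per_bp) ** length_bp

genome_bp = 115_000
n = rounds_needed(genome_bp)
p_clean = error_free_fraction(genome_bp, 1e-6)

print(f"rounds to reach {genome_bp} bp: {n} (length {construct_length(n)} bp)")
print(f"error-free fraction at 1e-6 errors/bp: {p_clean:.3f}")
```

Under these assumptions a 115 kbp genome is reached at n = 12, and at the 1e-6 goal most full-length constructs would be error-free, which is why lower error rates reduce the number of intermediates that must be screened.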
All 30S-ribosomal-protein DNAs & mRNAs synthesized in vitro. [Gel figure labels: M (marker); DNA templates; RNA transcripts; s19; 0.5 kb; 0.3 kb; Nimblegen; Xeotron/Atactic; wild-type DNA templates.] Tian, Gong, Sheng, Zhou, Gulari, Gao, Church
Improving synthesis accuracy 9-fold (Tian & Church)
  Columns: Method | Total bp | # Clones | Transition | Transversion | Deletion | Addition | Bp/error
  Methods: Hyb selection, PCR | Gel selection, PCR | No selection, ligation + PCR | No selection, PCR
Extreme mRNA makeover for protein expression in vitro. RS-2, 4, 5, 6, 9, 10, 12, 13, 15, 16, 17, and 21 detectable initially. RS-1, 3, 7, 8, 11, 14, 18, 19, 20 initially weak or undetectable. Solution: iteratively resynthesize all mRNAs with less mRNA structure. Western blot based on His-tags. Tian & Church
Enabling technologies: multi-gene assembly; protein, peptidomimetic synthesis; CAD/CAM & design for manufacturing; automated homologous recombination for E. coli & embryonic stem cells; fidelity enhancements; sequencing 10^7 bp/$ ($1K/human)
Improve DNA Synthesis accuracy. Synthesis on a chip of pools of "construction" ~50-mers and two complementary "selection" ~26-mers (Left & Right). Construction oligos: ss-70-mer (chip) => ds/ss-50-mer (amplif/restrict). Selection oligos: ss-56-mer (chip) => ss-76-mer (amplif/avidin), using 20-mer PCR primers (one biotinylated). Tian, Gong, Sheng, Zhou, Gulari, Gao, Church
Improve DNA Synthesis Accuracy via D-HPLC or MutS. Smith & Modrich (1997) PNAS 94: 6847–50. Removal of polymerase-produced mutant sequences from PCR products. MutHLS cleaves at GATC near mismatches; lowers error rate from 6e-6 to 6e-7. Bellanne-Chantelot et al. (1997) Mutat Res. 382: Search for DNA sequence variations using a MutS-based technology. Mulligan & Tabone (2002) US Patent 6,664,112. Methods for improving the sequence fidelity of synthetic double-stranded oligonucleotides.