Model-data integration: issues of flux optimality & polymer mechanics of 4D cell models. DARPA BIOCOMP, 23-May-2002. Thanks to the Harvard/MIT team: Jake Jaffe, Kyriacos Leptos, Matt Wright, Daniel Segre, Martin Steffen.
gggatttagctcagtt gggagagcgccagact gaa gat ttg gag gtcctgtgttcgatcc acagaattcgcacca. Post-300 genomes & 3D structures.
DoD relevance: accurate bio I/O engineering; over-determined, calculable models (protein folding vs. crystallography); accurate, comprehensive/quantitative bio-systems that embrace outliers; analytic & synthetic; useful computer-aided design (CAD). >>INTEGRATION<<
Technical challenge: integrating measures & models across DNA, RNA, protein (in vivo & in vitro interactions), metabolites, replication rate and environment; in microbes, cancer & stem cells, Darwinian in vitro replication, and small multicellular organisms; using RNAi, insertions and SNPs.
Human red blood cell ODE model: 200 measured parameters. [Figure: reaction network covering glycolysis (GLC e/i, G6P, F6P, FDP, GA3P, DHAP, 1,3 DPG, 2,3 DPG, 3PG, 2PG, PEP, PYR, LAC i/e), the pentose phosphate pathway (GL6P, GO6P, RU5P, R5P, X5P, S7P, E4P), glutathione (GSH, GSSG), nucleotide salvage (ADO, INO, AMP, IMP, ADE, HYPX, PRPP, R1P, extracellular ADO e, INO e, ADE e), cofactors (ATP/ADP, NADH/NAD, NADPH/NADP) and ions/pH (Na+, K+, Cl-, HCO3-, pH).] Jamshidi, Edwards, Fahland, Church, Palsson (2001) Bioinformatics 17:286.
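To illustrate what an ODE metabolic model computes, here is a minimal sketch; it is not the 200-parameter red-cell model. It integrates a hypothetical two-pool chain (intracellular glucose → G6P) with first-order kinetics by forward Euler. The function name `simulate_chain` and the rate constants `v_in`, `k1`, `k2` are made up for illustration.

```python
def simulate_chain(v_in, k1, k2, dt=0.01, t_end=200.0):
    """Forward-Euler integration of a hypothetical two-pool chain.

    d[GLC]/dt = v_in - k1*[GLC]
    d[G6P]/dt = k1*[GLC] - k2*[G6P]
    """
    glc, g6p = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        d_glc = v_in - k1 * glc          # uptake minus phosphorylation
        d_g6p = k1 * glc - k2 * g6p      # production minus downstream consumption
        glc += dt * d_glc
        g6p += dt * d_g6p
    return glc, g6p


glc, g6p = simulate_chain(v_in=1.0, k1=0.5, k2=0.25)
print(f"GLC ~ {glc:.4f}, G6P ~ {g6p:.4f}")  # analytic steady state: v_in/k1 = 2, v_in/k2 = 4
```

A real model replaces the linear rate laws with measured enzyme kinetics, but the structure (one balance equation per metabolite, integrated in time) is the same.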
Gene deletions: normalized optimal growth, computed by flux balance analysis (FBA) as a linear programming (LP) problem with the deleted gene's flux set to zero ($v_{ko} = 0$).
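A minimal sketch of FBA gene-deletion analysis, assuming a hypothetical four-reaction toy network (not an organism model). The LP is solved here by brute-force vertex enumeration, which only works for tiny problems with a full-row-rank stoichiometric matrix and finite bounds; a real FBA would call an LP solver. The names `fba` and `knockout_growth` are illustrative.

```python
from itertools import combinations, product


def solve_linear(M, rhs):
    """Solve M x = rhs by Gauss-Jordan elimination; return None if M is singular."""
    n = len(M)
    A = [list(row) + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            return None
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                for c in range(col, n + 1):
                    A[r][c] -= f * A[col][c]
    return [A[i][n] / A[i][i] for i in range(n)]


def fba(S, lo, hi, objective, tol=1e-9):
    """Maximize objective . v subject to S v = 0 and lo <= v <= hi.

    Vertex enumeration: pick m basic fluxes, pin every other flux at a bound,
    solve for the basic ones, and keep the best feasible vertex.
    """
    m, n = len(S), len(S[0])
    best_val, best_v = None, None
    for basis in combinations(range(n), m):
        nonbasic = [j for j in range(n) if j not in basis]
        for at_upper in product((False, True), repeat=len(nonbasic)):
            v = [0.0] * n
            for j, up in zip(nonbasic, at_upper):
                v[j] = hi[j] if up else lo[j]
            M = [[S[i][j] for j in basis] for i in range(m)]
            rhs = [-sum(S[i][j] * v[j] for j in nonbasic) for i in range(m)]
            sol = solve_linear(M, rhs)
            if sol is None:
                continue
            for j, x in zip(basis, sol):
                v[j] = x
            if all(lo[j] - tol <= v[j] <= hi[j] + tol for j in range(n)):
                val = sum(c * x for c, x in zip(objective, v))
                if best_val is None or val > best_val + tol:
                    best_val, best_v = val, v
    return best_val, best_v


def knockout_growth(S, lo, hi, objective, ko):
    """FBA growth with reaction `ko` forced to zero (v_ko = 0)."""
    lo, hi = list(lo), list(hi)
    lo[ko] = hi[ko] = 0.0
    return fba(S, lo, hi, objective)[0]


# Hypothetical toy network. Columns: v0 uptake -> A, v1 A -> B (capacity 6),
# v2 2A -> B (less efficient route), v3 B -> biomass. Rows: metabolites A, B.
S = [[1, -1, -2, 0],
     [0, 1, 1, -1]]
lo = [0.0, 0.0, 0.0, 0.0]
hi = [10.0, 6.0, 1000.0, 1000.0]
growth = [0, 0, 0, 1]

wt_growth = fba(S, lo, hi, growth)[0]
normalized = {ko: knockout_growth(S, lo, hi, growth, ko) / wt_growth for ko in range(4)}
print(wt_growth, normalized)
```

Deleting the efficient route (v1) forces all flux through the 2A → B route, so the optimum drops but does not vanish; deleting uptake or biomass gives zero growth.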
Challenge #1: suboptimality of mutants, integrating growth-rate and flux data. Minimal Perturbation Analysis (MPA) for the analysis of non-optimal metabolic phenotypes (Daniel Segre).
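This is a quadratic programming (QP) problem. With $x$ the mutant flux vector and $a$ the wild-type flux vector: minimize $\mathrm{Dist} = \sum_i (x_i - a_i)^2$ subject to $Sx = b$, $x \ge 0$. Standard form: minimize $\tfrac{1}{2} x^T Q x + c^T x$ subject to $Sx = b$, $x \ge 0$, here with $Q = 2I$ and $c = -2a$ (the constant $a^T a$ is dropped).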
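As a minimal sketch of the MPA objective, the code finds the point closest to the wild-type flux `a` that satisfies $Sx = b$ and $x \ge 0$. Instead of a general QP solver, it uses Dykstra's alternating-projection algorithm, which converges to the exact Euclidean projection onto the intersection of the affine set and the nonnegative orthant. The toy network, the wild-type flux and the knockout row are hypothetical; `S` is assumed to have full row rank.

```python
def solve_linear(M, rhs):
    """Solve M x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    A = [list(row) + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                for c in range(col, n + 1):
                    A[r][c] -= f * A[col][c]
    return [A[i][n] / A[i][i] for i in range(n)]


def project_affine(x, S, b):
    """Euclidean projection of x onto {x : S x = b}."""
    m, n = len(S), len(x)
    residual = [sum(S[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
    gram = [[sum(S[i][k] * S[j][k] for k in range(n)) for j in range(m)] for i in range(m)]
    y = solve_linear(gram, residual)  # (S S^T) y = S x - b
    return [x[j] - sum(S[i][j] * y[i] for i in range(m)) for j in range(n)]


def mpa(a, S, b, max_iter=10000, tol=1e-12):
    """Minimize sum_i (x_i - a_i)^2 subject to S x = b, x >= 0 (Dykstra's algorithm)."""
    x = [float(v) for v in a]
    p = [0.0] * len(a)  # correction term for the affine projection
    q = [0.0] * len(a)  # correction term for the orthant projection
    for _ in range(max_iter):
        y = project_affine([xi + pi for xi, pi in zip(x, p)], S, b)
        p = [xi + pi - yi for xi, pi, yi in zip(x, p, y)]
        x_new = [max(0.0, yi + qi) for yi, qi in zip(y, q)]
        q = [yi + qi - xn for yi, qi, xn in zip(y, q, x_new)]
        if max(abs(u - v) for u, v in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    return x


# Hypothetical network: v0 uptake -> A, v1 A -> B, v2 2A -> B, v3 B -> biomass.
S = [[1, -1, -2, 0],
     [0, 1, 1, -1]]
wild_type = [10.0, 6.0, 2.0, 8.0]
S_ko = S + [[0, 1, 0, 0]]  # extra equality row enforces v1 = 0 in the mutant
mutant = mpa(wild_type, S_ko, [0.0, 0.0, 0.0])
print(mutant)
```

The mutant flux stays as close as possible to the wild-type distribution rather than re-optimizing growth, which is the point of contrast with FBA.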
Test for prediction of essential genes: optimal (FBA) p = 4·10⁻³; suboptimal (MPA) p = 22.
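The slide does not name the statistical test. As a hedged sketch, one common choice is a one-sided hypergeometric (Fisher's exact) test of the overlap between model-predicted and experimentally essential genes. The function name and every count below are hypothetical.

```python
from math import comb


def overlap_pvalue(n_genes, n_essential, n_predicted, n_both):
    """One-sided hypergeometric p-value, P(overlap >= n_both).

    n_genes:     genes tested
    n_essential: genes found essential experimentally
    n_predicted: genes predicted essential by the model (FBA or MPA)
    n_both:      genes both predicted and observed essential
    """
    hits = sum(comb(n_essential, i) * comb(n_genes - n_essential, n_predicted - i)
               for i in range(n_both, min(n_essential, n_predicted) + 1))
    return hits / comb(n_genes, n_predicted)


# Hypothetical counts, for illustration only.
print(overlap_pvalue(n_genes=100, n_essential=20, n_predicted=15, n_both=9))
```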
C009-limited: experimental vs. predicted fluxes. WT (LP): correlation = 0.91, p = 8·10⁻⁸. pyk (LP): correlation = -0.06, p = 0.6. pyk (QP): correlation = 0.56, p = 7·10⁻³.
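The correlation symbol was lost in extraction; assuming it is a Pearson coefficient between experimental and predicted fluxes, it can be computed as below. The p-values on the slide would additionally need a t-distribution, which this sketch omits. The flux lists are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical measured vs. predicted fluxes for a handful of reactions.
measured = [10.0, 6.1, 2.2, 7.9, 3.0]
predicted = [10.0, 5.0, 5.0, 5.0, 2.5]
print(pearson(measured, predicted))
```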
Challenge #1: suboptimality of mutants, integrating growth-rate and flux data. Minimal Perturbation Analysis (MPA) for the analysis of non-optimal metabolic phenotypes.
Challenge #2: integrating proteomics & in vivo crosslinking data. Polymer mechanics of 4D cell models (automating integration of data).
Mapping genome folding with in vivo DNA:DNA, DNA:protein and protein:protein crosslinks. Dekker et al., Science: Capturing chromosome conformation.
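Crosslink-based conformation capture yields pairs of genomic loci found in contact. A minimal sketch, assuming the input is simply a list of hypothetical (position, position) pairs in base pairs, bins them into the symmetric contact-count matrix from which folding constraints are read.

```python
def contact_matrix(pairs, genome_length, bin_size):
    """Bin crosslinked locus pairs into a symmetric contact-count matrix.

    pairs: iterable of (pos_i, pos_j) genomic coordinates (0-based bp)
    """
    n_bins = -(-genome_length // bin_size)  # ceiling division
    counts = [[0] * n_bins for _ in range(n_bins)]
    for i, j in pairs:
        bi, bj = i // bin_size, j // bin_size
        counts[bi][bj] += 1
        if bi != bj:
            counts[bj][bi] += 1  # keep the matrix symmetric
    return counts


# Hypothetical crosslinks on a 300 bp toy genome, 100 bp bins.
print(contact_matrix([(0, 150), (10, 20)], genome_length=300, bin_size=100))
```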
In vivo crosslinking of DNA-binding proteins.
Multidimensional protein and peptide separations for MS quantitation. Optional 1st & 2nd protein dimensions: subcellular fractions; sizing of native protein complexes. 1st peptide dimension: strong cation exchange (charge). 2nd peptide dimension: reverse-phase chromatography (hydrophobicity). 3rd peptide dimension: mass spectrometry (mass per charge, m/z). [Figure axes: retention time (min) vs. m/z]
[Figure panels A to D: retention times rt1, rt2, rt3; MS1 spectrum]
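The mass-spectrometry dimension reports mass per charge. A minimal sketch, assuming standard monoisotopic residue masses, unmodified cysteines, and protonation as the only source of charge: a peptide's m/z at charge z is (M + z·m_proton)/z. The peptide is just an example.

```python
# Monoisotopic residue masses (Da); cysteine assumed unmodified.
MONO_RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (free N- and C-termini)
PROTON = 1.00728


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO_RESIDUE[aa] for aa in seq) + WATER


def mz(seq, charge):
    """m/z of the [M + zH]z+ ion."""
    return (peptide_mass(seq) + charge * PROTON) / charge


print(peptide_mass("DQGAGLFEK"), mz("DQGAGLFEK", 2))
```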
Minimal cell projects. The first full proteome model would benefit from an organism with few natural cell states and few genes. Goals: the 3D structure of a cell during replication and motility; genome engineering / complete synthesis.
Small sequenced genomes (excluding organelles/symbionts). Mollicutes = cell-wall-less bacteria, a subgroup of the “gram-positive” Clostridia. Acholeplasmataceae: Acholeplasma, Anaeroplasma, Phytoplasma. Mycoplasmatales: Entomoplasmataceae (florum); Mycoplasmataceae (pulmonis, urealyticum, pneumoniae, genitalium, (mobile)); Spiroplasmataceae. [Figure axis: genome size, megabases]
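Motility. Columns: species; speed (nm/s); replication time; temperature (°C). M. mobile: 3000 nm/s, 5 hr, 25 °C. M. pneumoniae; M. florum; U. urealyticum 0, >10, 37; E. coli; H. sapiens 1000, >10, 37. Molecular motors: RNA polymerase / ribosome, 20 nm/s (≈50 nt/s); E. coli DNA Pol III, 300 nm/s (≈1000 nt/s).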
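The nt/s and nm/s figures for the motors are related by the helical rise. Assuming the B-DNA value of about 0.34 nm per nucleotide (an approximation for the ribosome, which reads single-stranded mRNA), the conversions work out as:

```latex
\begin{aligned}
50~\tfrac{\text{nt}}{\text{s}} \times 0.34~\tfrac{\text{nm}}{\text{nt}} &= 17~\tfrac{\text{nm}}{\text{s}} \approx 20~\tfrac{\text{nm}}{\text{s}} \\
1000~\tfrac{\text{nt}}{\text{s}} \times 0.34~\tfrac{\text{nm}}{\text{nt}} &= 340~\tfrac{\text{nm}}{\text{s}} \approx 300~\tfrac{\text{nm}}{\text{s}} \\
\frac{3000~\text{nm/s}~(\textit{M. mobile})}{300~\text{nm/s}~(\text{Pol III})} &= 10, \qquad \frac{3000~\text{nm/s}}{20~\text{nm/s}} = 150
\end{aligned}
```

So M. mobile gliding is roughly ten times faster than the replisome and about 150 times faster than transcription, using the rounded values above.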
Attachment organelle replication. Seto S, Layh-Schmitt G, Kenri T, Miyata M. Visualization of the attachment organelle and cytadherence proteins of Mycoplasma pneumoniae by immunofluorescence microscopy. J Bacteriol :1621.
Mycoplasma pneumoniae (Regula et al., Microbiology 147: ). Scale bar = 100 nm.
Hypothetical mechanisms
Proteogenomic mapping: peptide data mapped onto the genome in all six reading frames (3 forward and 3 reverse).
Use of proteogenomic mapping (panels B to D): B, discovering a new ORF; C, discovering a new ORF and deleting an inaccurately predicted ORF; D, finding an N-terminal extension of an existing ORF.
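A minimal sketch of six-frame proteogenomic mapping. It translates the genome in three forward and three reverse frames and reports where an observed peptide falls, in forward-strand coordinates. It uses the Mycoplasma genetic code (NCBI translation table 4, where UGA reads as Trp), which matters for M. pneumoniae. Exact string matching stands in for the spectrum-to-database search a real pipeline would use; the function names and the tiny test genome are hypothetical.

```python
from itertools import product

BASES = "TCAG"
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AA)}
CODON_TABLE["TGA"] = "W"  # translation table 4 (Mycoplasma): TGA/UGA = Trp, not stop
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna, frame):
    """Translate from offset `frame`; '*' marks stops, 'X' unknown codons."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(frame, len(dna) - 2, 3))


def map_peptide(peptide, genome):
    """Return (strand, frame, start, end) hits; 0-based, end-exclusive, + strand coordinates."""
    L = len(genome)
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            prot = translate(seq, frame)
            i = prot.find(peptide)
            while i != -1:
                p = frame + 3 * i                  # start in `seq` coordinates
                q = p + 3 * len(peptide)
                start, end = (p, q) if strand == "+" else (L - q, L - p)
                hits.append((strand, frame, start, end))
                i = prot.find(peptide, i + 1)
    return hits


print(map_peptide("MAW", "TTACCAAGCCAT"))
```

A hit that lands outside any annotated ORF, spans an annotated start, or contradicts an annotated ORF corresponds to the cases in panels B to D.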
Constraints: replication; membrane-bound polyribosomes; other RNA and/or protein complexes; metabolism; DNA structural forces.
Genome folding & cell 3D structure. Seto S, Miyata M (1999) Partitioning, movement, and positioning of nucleoids in Mycoplasma capricolum. J Bacteriol 181:6073. Cell = 0.5 kbp genome; extended diameter = 80; ~200 transverses, with each membrane-protein-encoding gene anchored to the cell surface. How is this segregated?
Paired fork model. Dingman CW. Bidirectional chromosome replication: some topological considerations. J Theor Biol 1974 Jan;43(1):. Sundin O, Varshavsky A. Terminal stages of SV40 DNA replication proceed via multiply intertwined catenated dimers. Cell Aug;21(1):. Hearst JE, Kauffman L, McClain WM. A simple mechanism for the avoidance of entanglement during chromosome replication. Trends Genet Jun;14(6):. Bouligand Y, Norris V (2000): “Both replication forks appear to be part of a single complex or factory, as first proposed by Dingman.” Roos M, Lingeman R, Woldringh CL, Nanninga N. Experiments on movement of DNA regions in Escherichia coli evaluated by computer simulation. Biochimie 2001 Jan;83(1):67-74.
Constraints: replication; membrane-bound polyribosomes, which could anchor the RNA polymerase, and hence the gene's DNA, to within 20 nm of the cell surface; other RNA and/or protein complexes; metabolism; DNA structural forces.
Side view, no replication (colored by gene #; origin marked). Blue: first MPN gene; green: middle gene #344 (ter); red: last gene #688.
Off-axial view, no replicated segments, unoptimized membrane. Yellow: membrane; pink: ribosomal; white: hypothetical & abundant; green: misc. abundant; blue: weak.
Axial view, no replicated segments. Yellow: membrane; pink: ribosomal; white: hypothetical & abundant; green: misc. abundant; blue: weak.
Side view, no replicated segments (origin marked). Yellow: membrane; pink: ribosomal; white: hypothetical & abundant; green: misc. abundant; blue: weak.
Side view, no replication (colored by distance from ori; origin marked). Blue: origin of replication; red: terminus.
Simple example cost function for chromosome structure optimization. [Figure labels: R1, R2, M1, M2, M3]
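The slide does not spell out the cost terms, so this is a hypothetical sketch of what such a cost function could look like. The circular chromosome is a closed chain of beads inside a cylindrical cell (axis = z). The cost sums a bond-length term, a wall term keeping beads inside the cell, and a membrane term pulling beads that carry membrane-protein genes (M-type loci) to within 20 nm of the surface, the anchoring distance quoted for membrane-bound polyribosomes. All weights and names are made up, and ribosomal (R-type) constraints are not modeled.

```python
import math


def chain_cost(beads, membrane_idx, cell_radius, bond_len,
               surface_band=20.0, w_bond=1.0, w_mem=1.0, w_wall=10.0):
    """Toy cost for a circular chromosome as a closed bead chain (units: nm).

    beads:        list of (x, y, z) positions; the cell axis is the z axis
    membrane_idx: indices of beads carrying membrane-protein genes
    """
    n = len(beads)
    cost = 0.0
    # Bond term: consecutive beads (closing the circle) should sit bond_len apart.
    for i in range(n):
        d = math.dist(beads[i], beads[(i + 1) % n])
        cost += w_bond * (d - bond_len) ** 2
    # Wall term: penalize beads outside the cylinder.
    radial = [math.hypot(x, y) for x, y, _ in beads]
    cost += w_wall * sum(max(0.0, r - cell_radius) ** 2 for r in radial)
    # Membrane term: membrane-gene beads should lie within surface_band of the wall.
    inner = cell_radius - surface_band
    cost += w_mem * sum(max(0.0, inner - radial[i]) ** 2 for i in membrane_idx)
    return cost


square = [(100.0, 0.0, 0.0), (0.0, 100.0, 0.0), (-100.0, 0.0, 0.0), (0.0, -100.0, 0.0)]
print(chain_cost(square, [0, 1, 2, 3], cell_radius=100.0, bond_len=math.sqrt(20000.0)))
```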
Searching six helical parameters for the chromosomal fold. [Figure: final energy (E_final) for each run, runs 2002_5_16_h18 through 2002_5_17_h8]
Monte Carlo minimization of the model's fit to the constraints.
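A minimal sketch of Monte Carlo minimization, using Metropolis acceptance with a geometric cooling schedule (simulated annealing) and tracking the best point seen. The six helical parameters are not named in the source, so the example uses a hypothetical quadratic surrogate in six parameters; in practice the cost would score a fold generated from those parameters against the constraints.

```python
import math
import random


def anneal(cost, params, step=0.1, n_steps=20000, t_start=1.0, t_end=1e-3, seed=0):
    """Metropolis Monte Carlo with geometric cooling; returns (best_params, best_cost)."""
    rng = random.Random(seed)
    current = list(params)
    e_cur = cost(current)
    best, e_best = list(current), e_cur
    for k in range(n_steps):
        temp = t_start * (t_end / t_start) ** (k / (n_steps - 1))
        trial = list(current)
        i = rng.randrange(len(trial))           # perturb one parameter at a time
        trial[i] += rng.gauss(0.0, step)
        e_trial = cost(trial)
        # Metropolis criterion: accept downhill moves, uphill ones with prob exp(-dE/T).
        if e_trial <= e_cur or rng.random() < math.exp(-(e_trial - e_cur) / temp):
            current, e_cur = trial, e_trial
            if e_cur < e_best:
                best, e_best = list(current), e_cur
    return best, e_best


# Hypothetical surrogate cost in six parameters.
target = [1.0, 2.0, -1.0, 0.5, 3.0, -2.0]


def surrogate(p):
    return sum((pi - ti) ** 2 for pi, ti in zip(p, target))


best, e_best = anneal(surrogate, [0.0] * 6)
print(best, e_best)
```

Accepting occasional uphill moves early lets the search escape poor local folds; as the temperature drops it becomes a local descent, which is why runs started from different points can end at different final energies.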
[Figures: results of individual optimization runs, 2002_5_16 through 2002_5_19]
Avoidance of entanglement throughout the cell cycle (origin marked). Blue: left replicated segment (yellow-green = high gene #); red: right (i.e., middle) segment; aqua: unduplicated segment of the circular genome.
M. pneumoniae genes generally point away from the origin (ori); the bias is more significant when abundance data are integrated. Alignment of known motors: polymerases, ribosomes, F1 ATPase.
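A hedged sketch of how a gene-orientation bias could be quantified. It assumes the terminus lies opposite the origin, so the clockwise half-genome is the right replichore. The code counts genes transcribed away from ori, computes a one-sided binomial p-value against a 50:50 null, and reports an abundance-weighted fraction to illustrate integrating abundance data. The gene tuples and function names are hypothetical; this is not the analysis behind the slide.

```python
from math import comb


def points_away_from_ori(start, strand, ori, genome_len):
    """True if a gene on strand '+' or '-' starting at `start` is transcribed away from ori."""
    offset = (start - ori) % genome_len
    in_right_replichore = offset < genome_len / 2   # assumes ter is opposite ori
    return (strand == "+") == in_right_replichore


def orientation_bias(genes, ori, genome_len):
    """genes: list of (start, strand, abundance).

    Returns (k, n, p, weighted): k of n genes point away from ori; p = P(X >= k)
    for X ~ Binomial(n, 0.5); weighted = abundance-weighted fraction pointing away.
    """
    away = [points_away_from_ori(s, st, ori, genome_len) for s, st, _ in genes]
    n, k = len(genes), sum(away)
    p = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    total = sum(a for _, _, a in genes)
    weighted = sum(a for (_, _, a), aw in zip(genes, away) if aw) / total
    return k, n, p, weighted


genes = [(100, "+", 5.0), (600, "-", 1.0), (300, "-", 2.0)]
print(orientation_bias(genes, ori=0, genome_len=1000))
```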
Biospice 2.0 deliverables: toolsets for data integration & optimality assessment. #1: QP MPA flux & growth modeling. #2: 4D model. Current plan: chromosome segregation; membrane-bound polysomes; ribosomal protein/rRNA assembly; motility (coordination with the replication origin). Next few months: other protein complexes; space-filling metric; replication-entanglement metric; in vivo crosslinking.