A Model of Bacterial Chromosome Architecture Matthew Wright, Daniel Segre, George Church
Jamie Goodsell
Genomic Scale Structure
Can we understand the 3-D structure of the chromosome? How optimal is the spatial organization of DNA for the cell? Can we link function and chromosome structure?
Hypothesis: DNA structure has conserved features.
A Model System: Mycoplasma pneumoniae. 816 kbp; 90% coding; 688 genes; 110 membrane proteins; 52 ribosomal proteins; no active transport; no regulation; limited metabolism; few DNA-binding proteins.
Features: 0.5 µm diameter; 0.06 µm³ volume; 8000 ribosomes would fill the cell; extended DNA is 80 µm in diameter, over 100 times the cell diameter; “nose” polarity.
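The size figures above can be checked with a few lines of arithmetic. This is a rough sketch that assumes a spherical cell and a B-DNA rise of 0.34 nm per base pair; neither assumption appears on the slide, and all names are illustrative.

```python
import math

CELL_DIAMETER_UM = 0.5     # from the slide
GENOME_BP = 816_000        # from the slide (816 kbp)
RISE_NM_PER_BP = 0.34      # assumption: canonical B-DNA rise per base pair


def sphere_volume(diameter):
    """Volume of a sphere with the given diameter (assumes a spherical cell)."""
    radius = diameter / 2.0
    return 4.0 / 3.0 * math.pi * radius ** 3


# Cell volume in cubic micrometres (about 0.065).
cell_volume_um3 = sphere_volume(CELL_DIAMETER_UM)

# Contour length of the circular chromosome, converted from nm to micrometres.
contour_length_um = GENOME_BP * RISE_NM_PER_BP / 1000.0

# Diameter of the chromosome if it were laid out as an open circle (about 88).
extended_diameter_um = contour_length_um / math.pi

# How many cell diameters the extended circle spans.
ratio = extended_diameter_um / CELL_DIAMETER_UM

print(f"cell volume       : {cell_volume_um3:.3f} um^3")
print(f"extended diameter : {extended_diameter_um:.1f} um")
print(f"ratio to cell     : {ratio:.0f}x")
```

Under these assumptions the extended circle comes out slightly larger than the slide's rounded 80 µm, and the ratio to the cell diameter is comfortably above 100.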
Empirical Constraints: microscopy; cross-linking; loop patterns. Sources: Tom Knight; Gasser et al., Science; Dekker et al., Science.
Theoretical Constraints: transmembrane proteins (110 genes; Potter MD, Nicchitta CV, 2002, J Biol Chem Jun 28;277(26)); RNA and/or protein complexes (52 genes); metabolism; DNA structural forces (Tobias I et al., Phys Rev E Stat Phys Plasmas Fluids Relat Interdiscip Topics Jan;61(1)); replication.
Symmetry Constraints: Symmetric replication. If polymerases replicate at a constant rate, sites symmetric about the origin are close together when replicated. Flattened circle (figure labels: O, T).
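The symmetry argument can be illustrated with a small sketch. It assumes a circular genome with coordinates in base pairs, two forks leaving the origin in opposite directions, and a single constant fork rate; the function names and the rate value are hypothetical and not taken from the slides.

```python
def replication_time(pos, origin, genome_len, rate):
    """Time at which a locus is copied by whichever fork reaches it first.

    Assumes 0 <= pos, origin < genome_len and one constant rate for both forks.
    """
    d = abs(pos - origin)
    arc = min(d, genome_len - d)   # shorter arc from the origin to the locus
    return arc / rate


def symmetric_partner(pos, origin, genome_len):
    """Locus at the same arc distance from the origin on the other arm."""
    return (2 * origin - pos) % genome_len


GENOME_LEN = 816_000   # bp, from the model-system slide
ORIGIN = 0             # hypothetical coordinate for O
RATE = 1000.0          # bp per second, hypothetical value

locus = 100_000
partner = symmetric_partner(locus, ORIGIN, GENOME_LEN)

t_locus = replication_time(locus, ORIGIN, GENOME_LEN, RATE)
t_partner = replication_time(partner, ORIGIN, GENOME_LEN, RATE)

# The terminus T sits halfway around the circle and is replicated last.
t_terminus = replication_time(GENOME_LEN // 2, ORIGIN, GENOME_LEN, RATE)

print(partner, t_locus, t_partner, t_terminus)
```

Because both loci are copied at the same moment, a model with the two polymerase complexes held together would place them near each other at that time, which is the constraint the flattened circle depicts.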
Cost Function (figure labels: R1, R2, M1, M2, M3) + other terms
Methods: Random Walk of Genome; Monte Carlo of Parametrized Structures
Random Walk: n segments (figure label: r); 2n-1 parameters
Monte Carlo of Parametrized Structures: A Random Walk in Helical Parameter Space
Helix Parameters. General helix parameters: a (rise). Supercoil parameters: w (frequency), Ac (amplitude of cos), As (amplitude of sin). Radial parameters: R (maximum large radius), d (frequency of large radial oscillations).
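A random walk in parameter space could be sketched as below. The slides do not state the acceptance rule, so the Metropolis criterion here is an assumption, and `toy_cost` is a hypothetical stand-in for the real cost function, which is not given in the source.

```python
import math
import random


def metropolis(cost, start, step, n_steps, temperature, rng):
    """Random walk over a parameter vector with Metropolis acceptance (assumed rule)."""
    current = list(start)
    current_cost = cost(current)
    best, best_cost = list(current), current_cost
    for _ in range(n_steps):
        # Perturb every parameter with a small Gaussian step.
        trial = [p + rng.gauss(0.0, step) for p in current]
        trial_cost = cost(trial)
        delta = trial_cost - current_cost
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = trial, trial_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
    return best, best_cost


def toy_cost(params):
    """Hypothetical cost: squared distance to an arbitrary target vector."""
    target = (1.0, 2.0, 0.5)
    return sum((p - t) ** 2 for p, t in zip(params, target))


rng = random.Random(0)
start = [0.0, 0.0, 0.0]
best, best_cost = metropolis(toy_cost, start, step=0.1, n_steps=5000,
                             temperature=0.01, rng=rng)
print(best, best_cost)
```

In the setting of the slides the parameter vector would hold the helical parameters and the cost would combine the constraint terms; only the search loop is illustrated here.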
Energy Decreases
Trivial Solution
Entangled Solution
Possible Solution
Gene Distribution on Structure
Combine Both Methods: Begin with optimization in helical parameter space, then perform a random walk of the genome for secondary optimization. This generates relatively ordered structures while allowing local disorder to meet constraints.
Starting Structure
Final Structure
Energy (plot: cost vs. time steps)
Current: Preliminary data are promising. Incorporate distance geometry. Need to calculate statistics. Gather experimental data; predict and test. Incorporate replication and dynamics.
Distance Geometry: Represent structure in terms of distances. Constraints fit into a single matrix. A matrix with “bounds” defines all possible configurations. Can find inconsistencies in constraints. Rotationally invariant.
Basis: Cholesky or eigenvalue decomposition of the inner product matrix, M. M can be obtained from D, the matrix of distances, by defining an origin.
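The step from D to M to coordinates can be sketched as follows. The sketch takes point 0 as the origin, builds M from the law of cosines, and uses a hand-written Cholesky factorization. That factorization needs a positive-definite M, so the example is limited to an origin plus three non-coplanar points; larger point sets would call for the eigenvalue route instead. All names and the test coordinates are illustrative.

```python
import math


def distance(p, q):
    """Euclidean distance between two coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def metric_matrix(D):
    """Inner products relative to point 0: M[i][j] = (D0i^2 + D0j^2 - Dij^2) / 2."""
    n = len(D)
    return [[(D[0][i] ** 2 + D[0][j] ** 2 - D[i][j] ** 2) / 2.0
             for j in range(1, n)] for i in range(1, n)]


def cholesky(M):
    """Lower-triangular L with L * L^T = M (M must be positive definite)."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(M[i][i] - s)
            else:
                L[i][j] = (M[i][j] - s) / L[j][j]
    return L


# Hypothetical test points; only their pairwise distances are passed on.
points = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 2.0, 0.0), (1.0, 1.0, 3.0)]
D = [[distance(p, q) for q in points] for p in points]

M = metric_matrix(D)
L = cholesky(M)

# Point 0 is the origin; rows of L are coordinates of the remaining points.
coords = [(0.0, 0.0, 0.0)] + [tuple(row) for row in L]

max_error = max(abs(distance(coords[i], coords[j]) - D[i][j])
                for i in range(len(D)) for j in range(len(D)))
print(coords, max_error)
```

The recovered coordinates reproduce every input distance, but only up to rotation and reflection in general, which is the rotational invariance noted on the distance geometry slide.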
Additional Cost Terms: proximity of enzymes during metabolism; stoichiometric matrix; curvature; replication; incorporate forces on DNA by using an elastic rod model.
Constraints from Replication: Classical Model; Paired Fork Model
Polymerase Based Model: Replicate chromosome structure and separate (figure axis: t)
Conclusions: If constraints based on function predict structure, then structure and function are related at genome scale. Potential new class of model.
Acknowledgements: George Church; Daniel Segrè; Church Lab
Method: Place constraints in a matrix. Solve for upper and lower bounds from triangle inequalities. Randomly choose a configuration within these bounds. Embed in 3 dimensions. Minimize error.
Model for nose replication. Seto S, Layh-Schmitt G, Kenri T, Miyata M. Visualization of the attachment organelle and cytadherence proteins of Mycoplasma pneumoniae by immunofluorescence microscopy. J Bacteriol 2001 Mar;183(5).
Paired fork model: Bidirectional. 2 polymerase complexes remain attached. Daughter DNA on separate sides. Causes minimal entanglement. Allows for multiple firing of origins.
Topological Consequences
Triangle Bound Smoothing: upper bounds; lower bounds
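The slide's equations did not survive extraction, so the sketch below uses the textbook form of triangle bound smoothing as an assumption: upper bounds are tightened with u_ij <= u_ik + u_kj, and lower bounds are raised with l_ij >= l_ik - u_kj. It is a simple illustration with hypothetical names, not necessarily the exact procedure used in the talk.

```python
INF = float("inf")


def smooth_bounds(lower, upper):
    """Tighten symmetric distance bounds using the triangle inequalities.

    Raises ValueError when a lower bound ends up above its upper bound,
    which signals inconsistent constraints.
    """
    n = len(upper)
    U = [row[:] for row in upper]
    Lo = [row[:] for row in lower]
    # Upper bounds: a Floyd-Warshall style shortest-path pass.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if U[i][k] + U[k][j] < U[i][j]:
                    U[i][j] = U[i][k] + U[k][j]
    # Lower bounds: inverse triangle inequality using the smoothed uppers.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                cand = max(Lo[i][k] - U[k][j], Lo[k][j] - U[i][k])
                if cand > Lo[i][j]:
                    Lo[i][j] = cand
    for i in range(n):
        for j in range(n):
            if Lo[i][j] > U[i][j]:
                raise ValueError(f"inconsistent bounds for pair ({i}, {j})")
    return Lo, U


# Three points: d(0,1) fixed at 5, d(1,2) between 1 and 2, d(0,2) unknown.
lower = [[0.0, 5.0, 0.0],
         [5.0, 0.0, 1.0],
         [0.0, 1.0, 0.0]]
upper = [[0.0, 5.0, INF],
         [5.0, 0.0, 2.0],
         [INF, 2.0, 0.0]]

Lo, U = smooth_bounds(lower, upper)
print(Lo[0][2], U[0][2])   # prints 3.0 7.0
```

The unknown distance is narrowed to the interval [3, 7], and a contradictory set of constraints is reported instead of being embedded.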
Frenet Frame on Helix
Relaxing the Perturbed Structure (figure labels: P(i-1,t), P(i,t), P(i+1,t), P(i-1,t+1), P(i,t+1), P(i+1,t+1), d)
Melting Temperature: short duplex (C: total concentration of single strands); long duplex
Wordsize (a digression): BLAST seeds with a string of at least 7 identical bases. We want to find all alignments with at most 20 mismatches. What is the probability of finding a stretch of 7 identities in a string of length 70 with 20 mismatches?
Marbles: Maps into the problem of partitioning a string of length 70 into 21 bins. Total number of ways, etc.
Counting: Now count the fraction with at least one stretch of 7. But over-counting is a problem.
Correcting: The cases where 2 bins each have a 7-mer are counted twice, so subtract this number once. There is a problem with the cases where 3 bins have a 7-mer.
Correction Continued: Principle of inclusion-exclusion.
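The counting argument can be written out as a short sketch. It assumes that all placements of the 20 mismatches among the 70 positions are equally likely, so the 50 identities fall into the 21 bins separated by the mismatches. The inclusion-exclusion count is cross-checked against a direct dynamic-programming count of the strings with no bin of 7 or more; function names are illustrative.

```python
from math import comb


def count_with_run(identities, mismatches, word):
    """Strings with at least one run of `word` identities, by inclusion-exclusion."""
    bins = mismatches + 1
    total = 0
    j = 1
    while j <= bins and j * word <= identities:
        rest = identities - j * word
        # Choose j bins forced to hold a run, then distribute the remaining identities.
        total += (-1) ** (j + 1) * comb(bins, j) * comb(rest + bins - 1, bins - 1)
        j += 1
    return total


def count_without_run(identities, mismatches, word):
    """Strings in which every bin holds fewer than `word` identities (direct count)."""
    bins = mismatches + 1
    ways = [1] + [0] * identities   # ways[s]: fillings of the bins so far that use s identities
    for _ in range(bins):
        new = [0] * (identities + 1)
        for s, w in enumerate(ways):
            if w:
                for x in range(min(word - 1, identities - s) + 1):
                    new[s + x] += w
        ways = new
    return ways[identities]


LENGTH, MISMATCHES, WORD = 70, 20, 7
IDENTITIES = LENGTH - MISMATCHES

total = comb(LENGTH, MISMATCHES)
hits = count_with_run(IDENTITIES, MISMATCHES, WORD)
misses = count_without_run(IDENTITIES, MISMATCHES, WORD)
probability = hits / total

print(f"P(at least one run of {WORD}) = {probability:.4f}")
```

The two counts add up to the total number of strings, which confirms that the alternating sum removes the over-counting described on the previous slides.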
Extension: Coefficients for at least m bins of wordsize l. m=2: 1, -2, 3, -4, … m=3: 1, -3, 6, -10, …
A familiar object?
Hello Blaise