Physics of Information Technology MIT – Spring 2006 PART II Avogadro Scale Engineering ‘COMPLEXITY’
Homework
I] Nanotech Design: Find an error function for which it is optimal to divide a logic area A into more than one redundant sub-area.
II] Design Life:
(a) Design a biological system which self-replicates with error correction (either genome copy redundancy with majority voting or error-correcting coding). Assume that copying each nucleotide consumes one unit of energy. Show the tradeoff between energy consumption and copy fidelity.
(b) Comment on the choice biology has made: 64 three-nucleotide codons coding for 20 amino acids. Why has biology chosen this encoding? What metric does it optimize? Could one build a biological system with 256 four-base codons?
Questions:
Scaling Properties of Redundant Logic (to first order)
Area = A versus Area = 2 × A/2
Probability of correct functionality: $p[A] \approx eA$ (small A)
$P_1 = p[A] = eA$
$P_2 = 2\,p[A/2]\,(1 - p[A/2]) + p[A/2]^2 = eA - (eA)^2/4$
Conclusion: $P_1 > P_2$
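The first-order comparison can be checked numerically. The following is a minimal sketch, assuming the linear model p[A] = e·A from the slide; the function names and the coefficient value `e_coef = 0.1` are illustrative choices, not part of the lecture.

```python
def p_single(area, e_coef):
    """First-order model from the slide: p[A] ~ e*A for small A."""
    return e_coef * area


def p_split(area, e_coef):
    """Two redundant half-areas: the block works if at least one half works.

    P2 = 2*p[A/2]*(1 - p[A/2]) + p[A/2]**2
    """
    half = p_single(area / 2, e_coef)
    return 2 * half * (1 - half) + half ** 2


if __name__ == "__main__":
    e_coef = 0.1  # hypothetical coefficient, chosen only for illustration
    for area in (0.5, 1.0, 2.0):
        p1 = p_single(area, e_coef)
        p2 = p_split(area, e_coef)
        # Under the linear model the gap is exactly (e*A)**2 / 4.
        print(f"A={area}: P1={p1:.4f}  P2={p2:.4f}  P1-P2={p1 - p2:.6f}")
```

Under this linear model splitting never helps, which is why Homework I asks for a different error function.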
Designing Life
Redundancy: Fault Tolerant, Error Correcting
Other Coding (e.g. Parity): Fault Tolerant, Error Correcting
Designing Life
I] Fault Tolerant Redundancy
Gene1 Gene2 Gene3 | Gene1 Gene2 Gene3
Replicate Linearly with Proofreading and Error Correction; Fold to 3D Functionality
template-dependent 5'-3' primer extension
5'-3' error-correcting exonuclease
3'-5' proofreading exonuclease
Error Rate: 1: Steps per second
1. Beese et al. (1993), Science, 260,
MutS Repair System
Approach 1b] Redundant Genomes
Deinococcus radiodurans (3.2 Mb, 4-10 copies of genome)
D. radiodurans: 1.7 million rads (17 kGy) – 200 DS breaks
E. coli: 25 thousand rads – 2 or 3 DS breaks
[Nature Biotechnology 18, (January 2000)] Uniformed Services University of the Health
Basic Idea: M strands of N bases
Result: By carrying out a consensus vote one requires only
to replicate with error below some epsilon such that the global replication error is:
Combining Error Correcting Polymerase and Error Correcting Codes, One Can Replicate a Genome of Arbitrary Complexity
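The consensus-vote idea can be illustrated with a small calculation. This is a sketch under stated assumptions, not the slide's own result: each base is miscopied independently with probability `p_err`, a base is called wrong whenever a majority of the M copies are miscopied (pessimistic, since miscopies need not agree), and copying one base costs one unit of energy. All function names are hypothetical.

```python
from math import comb, expm1, log1p


def consensus_error(p_err, copies):
    """Per-base error after a majority vote over an odd number of copies."""
    if copies % 2 == 0:
        raise ValueError("use an odd number of copies to avoid ties")
    need = copies // 2 + 1  # smallest number of bad copies that wins the vote
    return sum(
        comb(copies, k) * p_err ** k * (1 - p_err) ** (copies - k)
        for k in range(need, copies + 1)
    )


def genome_error(p_err, copies, length):
    """Probability that at least one of `length` consensus bases is wrong."""
    q = consensus_error(p_err, copies)
    if q >= 1.0:
        return 1.0
    # 1 - (1 - q)**length, computed stably for very small q
    return -expm1(length * log1p(-q))


def energy(copies, length):
    """Energy cost, assuming one unit per copied nucleotide."""
    return copies * length


if __name__ == "__main__":
    length = 1000  # illustrative genome length N
    for copies in (1, 3, 5, 7):
        err = genome_error(1e-3, copies, length)
        print(f"M={copies}: energy={energy(copies, length)}  genome error={err:.3e}")
```

Energy grows linearly in M while the per-base consensus error falls roughly geometrically, which is the tradeoff that Homework II(a) asks about.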
Plot axes: M (number of copies of genome), N (genome length)
II] Coding
Ribosome, mRNA, Amino Acid
4 Base Parity Genetic Code
Let A=0, U,T=1, G=2, C=3
Use 3+1 base code: XYZ followed by Sum(X+Y+Z, mod 4)
Leu: UUA -> UUAG
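The 3+1 parity code can be written out directly. A minimal sketch using the slide's base values; the function names are illustrative, and the parity base is written in the RNA alphabet (U rather than T) as an arbitrary choice.

```python
# Base values from the slide: A=0, U/T=1, G=2, C=3
BASE_VALUE = {"A": 0, "U": 1, "T": 1, "G": 2, "C": 3}
VALUE_BASE = {0: "A", 1: "U", 2: "G", 3: "C"}


def add_parity(codon):
    """Append a fourth base equal to (X + Y + Z) mod 4."""
    codon = codon.upper()
    if len(codon) != 3:
        raise ValueError("expected a 3-base codon")
    total = sum(BASE_VALUE[base] for base in codon) % 4
    return codon + VALUE_BASE[total]


def check_parity(word):
    """Return True if a 4-base word is consistent with its parity base."""
    word = word.upper()
    if len(word) != 4:
        raise ValueError("expected a 4-base word")
    return add_parity(word[:3]) == word


if __name__ == "__main__":
    print(add_parity("UUA"))     # the slide's Leu example
    print(check_parity("UUAG"))  # consistent word
    print(check_parity("UCAG"))  # single-base substitution is detected
```

A single parity base detects any single-base substitution in the word but cannot locate or correct it; correction requires more redundancy.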
Error Correction in Biological Systems
Fault Tolerant Translation Codes (Hecht):
NTN encodes 5 different nonpolar residues (Met, Leu, Ile, Val and Phe)
NAN encodes 6 different polar residues (Lys, His, Glu, Gln, Asp and Asn)
Local Error Correction:
Ribozyme: 1:$10^3$
Error Correcting Polymerase: 1:$10^8$ fidelity
DNA Repair Systems:
MutS System
Recombination - retrieval - post replication repair
Thymine Dimer bypass
Many others…
E. coli Retrieval system - Lewin
Biology Employs Error Correcting Fabrication + Error Correcting Codes
Physics of Information Technology MIT – Spring
/10
1] Von Neumann / McCulloch / Winograd / Cowan Threshold Theorem and Fault Tolerant Chips
2] Simple Proofs in CMOS Scaling and Fault Tolerance
3] Fault Tolerant Self Replicating Systems
4] Fault Tolerant Codes in Biology
4/24
1] Introduction of the concept of Fabricational Complexity
2] Examples, numbers and mechanisms from native biology: error correcting polymerase and comparison to best current chemical synthesis using protection group (~feedforward) chemistry.
3] Examples from our error correcting de novo DNA synthesis (with hopefully a demo from our DNA synth simulator)
4] Error correcting chip synthesis
5] Saul's self replicating system with and without error correction
Fabricational Complexity
$F_{fab} = \ln(W) / [a^3_{fab} E_{fab}]$
$F_{fab} = \ln(M)^{-1} / [a^3_{fab} E_{fab}]$
Total Complexity
Complexity Per Unit Volume
Complexity Per Unit Time*Energy
Complexity Per Unit Cost
Fabricational Complexity
A, AG, GTC, ATACGT …, AGTAGC …
Total Complexity Accessible to a Fabrication Process with Error p per step and m types of parts:
Complexity Per Unit Cost:
For given complexity n*:
Where C is cost per step
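One way to make these quantities concrete is sketched here. It does not reproduce the slide's expressions; it rests on two labeled assumptions: (1) a process whose per-step success probability is `p_ok` can build a sequence of about n* = 1/(1 - p_ok) steps before an error is expected, and (2) total complexity is the log of the number of distinct sequences of that length, F = n* ln m. The function names are hypothetical.

```python
from math import log


def accessible_length(p_ok):
    """ASSUMPTION: n* = 1/(1 - p_ok), the expected steps until the first error."""
    return 1.0 / (1.0 - p_ok)


def total_complexity(p_ok, m_types):
    """ASSUMPTION: F = n* * ln(m), the log of the number of m-ary sequences of length n*."""
    return accessible_length(p_ok) * log(m_types)


def required_p_ok(target_complexity, m_types):
    """Per-step success probability needed to reach a target F with m part types."""
    n_star = target_complexity / log(m_types)
    return 1.0 - 1.0 / n_star


if __name__ == "__main__":
    f_four = total_complexity(0.999, 4)  # four part types (A, G, C, T)
    p_six = required_p_ok(f_four, 6)     # same F with six part types
    print(f"F (m=4, p=0.999) = {f_four:.1f}")
    print(f"p needed for the same F with m=6 = {p_six:.6f}")
```

Under these assumptions a larger alphabet tolerates a slightly lower per-step fidelity for the same total complexity, because each step carries more information.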
Fabricational Complexity
Non Error Correcting: AGTC
Triply Error Correcting: AGTC AGTC AGTC
Plot labels: P = 0.9, P = 0.85, n = 300; axes n and p
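The comparison between a non-error-correcting and a triply redundant process can be sketched as follows. The assumptions are mine, not the slide's: each step succeeds independently with probability `p_ok`, the redundant process makes three attempts per step and keeps the majority result, and cost is counted as the number of attempts. The values p = 0.9 and n = 300 are borrowed from the plot labels purely as illustrative inputs.

```python
def step_success_triple(p_ok):
    """Majority vote over three independent attempts: at least two succeed."""
    return p_ok ** 3 + 3 * p_ok ** 2 * (1 - p_ok)


def yield_plain(p_ok, n_steps):
    """Probability that all n steps succeed with no error correction."""
    return p_ok ** n_steps


def yield_triple(p_ok, n_steps):
    """Probability that all n majority-voted steps succeed."""
    return step_success_triple(p_ok) ** n_steps


if __name__ == "__main__":
    p_ok, n_steps = 0.9, 300  # illustrative values only
    print(f"plain : yield={yield_plain(p_ok, n_steps):.3e}  attempts={n_steps}")
    print(f"triple: yield={yield_triple(p_ok, n_steps):.3e}  attempts={3 * n_steps}")
```

Tripling the cost raises the per-step success from p to 3p² - 2p³, which improves yield only when p exceeds 1/2; whether it pays off per unit cost depends on p and n.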
Resources for Exponential Scaling
Resources which increase the complexity of a system exponentially with a linear addition of resources:
1] Quantum Phase Space
2] Error Correcting Fabrication
3] Fault Tolerant Hardware Architectures
4] Fault Tolerant Software or Codes
Fabricational Complexity
…Can we use this map as a guide towards future directions in fabrication?
Replicate Linearly with Proofreading and Error Correction; Fold to 3D Functionality
template-dependent 5'-3' primer extension
5'-3' error-correcting exonuclease
3'-5' proofreading exonuclease
Error Rate: 1: Steps per second
1. Beese et al. (1993), Science, 260,
Caruthers Synthesis DNA Synthesis /services/catalog99.pdf Error Rate: 1: Seconds Per step
Molecular Machine (Jacobson) Group – MIT - May, 2005 Avogadro Scale Engineering
Gene Level Error Removal
Error Rate 1:$10^4$
Nucleic Acids Research (20):e162
In Vitro Error Correction Yields >10x Reduction in Errors Nucleic Acids Research (20):e162
Error Reduction: GFP Gene synthesis Nucleic Acids Research (20):e162
Autonomous self replicating machines from random building blocks
HOMEWORK – DUE 5/1/06
1] Consider biological cells which are able to copy their genome using appropriate pieces of molecular machinery (e.g. polymerase). Assume that the probability of correctly copying each nucleotide is p = 0.999. Calculate the Total Fabricational Complexity accessible to this system, assuming that there are 4 types of nucleotides (i.e. A, G, C, T). Now assume that we have created a new type of cell whose genome has six different types of nucleotides (i.e. A, G, C, T, X, Y). If we wish to keep the Total Fabricational Complexity the same, what must the probability per nucleotide addition, p, now be?
2] Consider now the fabricational complexity per unit cost, f. Calculate the threshold probability p above which it is advantageous to use a redundant error correction scheme (such as triple redundancy with majority voting) rather than no error correction. Into which regime does biology fall?