Some simple basics. 1Introduction 2Theoretical background Biochemistry/molecular biology 3Theoretical background computer science 4History of the field.

some simple basics

1Introduction 2Theoretical background Biochemistry/molecular biology 3Theoretical background computer science 4History of the field 5Splicing systems 6P systems 7Hairpins 8Detection techniques 9Micro technology introduction 10Microchips and fluidics 11Self assembly 12Regulatory networks 13Molecular motors 14DNA nanowires 15Protein computers 16DNA computing - summery 17Presentation of essay and discussion Course outline

Some basics

The contents of the following presentation are based on work discussed in Chapter 2 of DNA Computing by G. Paun, G. Rozenberg, and A. Salomaa Book

 We have seen how DNA can be used to solve various optimization problems.  Leonard Adelman was able to use encoded DNA to solve the Hamiltonian Path for a single- solution 7-node graph.  The drawbacks to using DNA as a viable computational device mainly deal with the amount of time required to actually analyze and determine the solution from a test tube of DNA. Adelman’s experiment

Further considerations  For Adelman’s experiment, he required the use of words of 20 oligonucleotides to encode the vertices and edges of the graph.  Due to the nature of DNA’s 4-base language, this allowed for 420 different combinations.  However longer length oligonucleotides would be required for larger graphs.

 Given the nature of DNA, we can easily determine a set of rules to operate on DNA.  Defining a Rule Set allows for programming the DNA much like programming a computer.  The rule set assume the following:  DNA exists in a test tube  The DNA is in single stranded form Defining a rule set

 Merge simply merges two test tubes of DNA to form a single test tube.  Given test tubes N 1 and N 2 we can merge the two to form a single test tube, N, such that N consists of N 1 U N 2.  Formal Definition: merge(N 1, N 2 ) = N Merge

 Amplify takes a test tube of DNA and duplicates it.  Given test tube N 1 we duplicate it to form test tube N, which is identical to N 1.  Formal Definition: N = duplicate(N 1 ) Amplify

 Detect looks at a test tube of DNA and returns true if it has at least a single strand of DNA in it, false otherwise.  Given test tube N, return TRUE if it contains at least a single strand of DNA, else return FALSE.  Formal Definition: detect(N) Detect

 Separate simply separates the contents of a test tube of DNA based on some subsequence of bases.  Given a test tube N and a word w over the alphabet {A, C, G, T}, produce two tubes +(N, w) and –(N, w), where +(N, w) contains all strands in N that contains the word w and –(N, w) contains all strands in N that doesn’t contain the word w.  Formal Definition:  N ← +(N, w)  N ← -(N, w) Separate / Extract

 Length-Separate takes a test tube and separates it based on the length of the sequences  Given a test tube N and an integer n we produce a test tube that contains all DNA strands with length less than or equal to n.  Formal Definition: N ← (N, ≤ n) Length - separate

 Position-Separate takes a test tube and separates the contents of a test tube of DNA based on some beginning or ending sequence.  Given a test tube N 1 and a word w produce the tube N consisting of all strands in N 1 that begins/ends with the word w.  Formal Definition:  N ← B(N 1, w)  N ← E(N 1, w) Position - Separate

 From the given rules, we can now manipulate our strands of DNA to get a desired result.  Here is an example DNA Program that looks for DNA strands that contain the subsequence AG and the subsequence CT: 1input(N) 2N ← +(N, AG) 3N ← -(N, CT) 4detect(N) A simple example

1. input(N) Input a test tube N containing single stranded sequences of DNA 2. N ← +(N, AG) Extract all strands that contain the AG subsequence. 3. N ← -(N, CT) Extract all strands that contain the CT subsequence. Note that this is done to the test tube that has all AG subsequence strands extracted, so the final result is a test tube which contains all strands with both the subsequence AG and CT. 4. detect(N) Returns TRUE if the test tube has at least one strand of DNA in it, else returns FALSE. Explanation

Now that we have some simple rules at our disposal we can easily create a simple program to solve the Hamiltonian Path problem for a simple 7-node graph as outlined by Adelman. Back to Adelman’s experiment

1input(N) 2N ← B(N, s 0 ) 3N ← +(N, s 6 ) 4N ← +(N, ≤ 140) 5for i = 1 to 5 do begin N ← +(N, s i ) end 6detect(N) The program

1Input(N) Input a test tube N that contains all of the valid vertices and edges encoded in the graph. 2N ← B(N, s 0 ) Separate all sequences that begin with the starting node. 3N ← E(N, s 6 ) Further separate all sequences that end with the ending node. Explanation

5N ← (N, ≤ 140) Further isolate all strands that have a length of 140 nucleotides or less (as there are 7 nodes and a 20 oligonucleotide encoding). 6for i = 1 to 5 do begin N ← +(N, s i ) end Now we separate all sequences that have the required nodes, thus giving us our solutions(s), if any. 7detect(N) See if we actually have a solution within our test tube. Explanation

 In most computational models, we define a memory, which allows us to store information for quick retrieval.  DNA can be encoded to serve as memory through the use of its complementary properties.  We can directly correlate DNA memory to conventional bit memory in computers through the use of the so called sticker Model. Adding Memory

 We can define a single strand of DNA as being a memory strand.  This memory strand serves as the template from which we can encode bits into.  We then use complementary stickers to attach to this template memory strand and encode our bits. The sticker model

How it works  Consider the following strand of DNA: CCCC GGGG AAAA TTTT  This strand is divided into 4 distinct sub-strands.  Each of these sub-strands have exactly one complementary sub-strand as follows: GGGGCCCCTTTTAAAA

How it works  As a double Helix, the DNA forms the following complex: CCCC GGGG AAAA TTTT GGGG CCCC TTTT AAAA  If we were to take each sub-strand as a bit position, we could then encode binary bits into our memory strand.

 Each time a sub-sequence sticker has attached to a sub-sequence on the memory template, we say that that bit slot is on.  If there is no sub-sequence sticker attached to a sub-sequence on the memory template, then we say that the bit slot is off. How it works

Some memory examples  For example, if we wanted to encode the bit sequence 1001, we would have: CCCC GGGG AAAA TTTT GGGG AAAA  As we can see, this is a direct coding of 1001 into the memory template.

 This is a rather good encoding, however, as we increase the size of our memory, we have to ensure that our sub-strands have distinct complements in order to be able to set and clear specific bits in our memory.  We have to ensure that the bounds between subsequences are also distinct to prevent complementary stickers from annealing across borders.  The Biological implications of this are rather difficult, as annealing long strands of subsequences to a DNA template is very error-prone. Disadvantages

 The clear advantage is that we have a distinct memory block that encodes bits.  The differentiation between subsequences denoting individual bits allows a natural border between encoding sub-strands.  Using one template strand as a memory block also allows us to use its complement as another memory block, thus effectively doubling our capacity to store information. Advantages

 Now that we have a memory structure, we can being to migrate our rules to work on our memory strands.  We can add new rules that allows us to program more into our system. So now what?

 Separate now deals with memory strands. It takes a test tube of DNA memory strands and separates it based on what is turned on or off.  Given a test tube, N, and an integer i, we separate the tubes into +(N, i) which consists of all memory strands for which the i th sub-strand is turned on (e.g. a sticker is attached to the i th position on the memory strand). The –(N, i) tube contains all memory strands for which the i th sub-strand is turned off.  Formal Definition: Separate +(N, i) and –(N, i) Separate

 Set sets a position on a memory position (i.e. turns it on if it is off) on a strand of DNA.  Given a test tube, N, and an integer i, where 1 ≤i ≤ k (k is the length of the DNA memory strand), we set the i th position to on.  Formal Definition: set(N, i) Set

 Clear clears a position on a memory position (i.e. turns it off if it is on) on a strand of DNA.  Given a test tube, N, and an integer i, where 1 ≤ i ≤ k (k is the length of the DNA memory strand), we clear the i th position to off.  Formal Definition: clear(N, i) Clear

 Read reads a test tube, which has an isolated memory strand and determines what the encoding of that strand is.  Read also reports when there is no memory strand in the test tube.  Formal Definition: read(N) Read

 To effectively use the Sticker Model, we define a library for input purposes.  The library consists of a set of strands of DNA.  Each strand of DNA in this library is divided into two sections, a initial data input section, and a storage/output section. Defining a library

 The formal notation of a library is as follows: (k, l) library (where k and l are integers, l ≤ k )  k refers to the size of the memory strand  l refers to length of the positions allowed for input data.  The initial setup of the memory strand is such that the first l positions are set with input data and the last k – l positions are clear. Library setup

 Consider the following encoding for a library: (3, 2) library.  From this encoding, we see that we have a memory strand that is of size 3, and has 2 positions allowed for input data.  Thus the first 2 positions are used for input data, and the final position is used for storage/input. A simple example

A quick visualisation Here is a visualisation of this library: CCCCCCCCCCCCEncoding: 000 CCCCGGGGAAAA GGGGCCCC Encoding: 110 CCCCGGGGAAAA CCCCEncoding: 010 CCCCGGGGCCCC GGGGEncoding: 100

 From this visualisation we see that we can achieve an encoding of 2 l different kinds of memory complexes.  We can formally define a memory complex as follows: w0 k-l, where w is the arbitrary binary sequence of length l, and 0 represents the off state of the following k-l sequences on the DNA memory strand. Memory considerations

 Consider the following NP-complete problem: Minimal Set Cover Given a finite set S = {1, 2, …, p} and a finite collection of subsets {C 1, …, C q } of S, we wish to find the smallest subset I of {1, 2, …, q} such that all of the points in S are covered by this subset.  We can solve this problem by using the brute force method of going through every single combination of the subsets {C 1, …, C q }.  We will use our rules to implement the same strategy using our DNA system. An interesting example

 We will use a library with the following attributes: (p+q, q) library.  This basically means that our memory stick has p+q positions to model the p points we want to cover and the q subsets that we have in the problem.  Q will then be our data input positions, which are the q subsets that we have in the problem.  What we basically have is the first q positions as are data input section, and the last p position as our storage area. Using DNA

 The algorithm is rather simple. We encode all of the subsets that we have in our problem into the first q positions of our DNA strand. This represents a potential solution to our problem  Each position in our q positions represent a single subset that is in our problem.  A position that is turned on represents inclusion of that set in the solution.  We simply go through each of the possibilities for the q subsets in our problem. Using DNA

 The p positions represents the points that we have to cover, one position for each point.  The algorithm simply takes each set in q and checks which points in p it covers.  Then it sets that particular point position in p to on.  Once all of the positions in p are turned on, we know that we have a sequence of subset covers that covers all points.  Then all we have to do is look at all solutions and determine which one contains the smallest amount of subset covers. Using DNA

 So far we’ve mapped each subset cover to a position and each point to a position.  However, each subset cover has a set of points, which if covers.  How do we encode this into our algorithm?  We do this by introducing a program specific rule, known as cardinality. But how is it done?

 The cardinality of a set, X, returns the number of elements in a set.  Formally, we define cardinality as: card(X)  From this we can determine what elements are in a particular subset cover in terms of its position relative to the points in p.  Therefore, the elements in a subset C i, where 1 ≤ i ≤ q, are denoted by C i j, where 1 ≤ j ≤ card(C i ). Cardinality

 Now that we can easily determine the elements within each subset cover, we can now proceed with the algorithm.  We check each position in q and if it is turned on, we simply see what points this subset covers.  For each point that it covers, we set the corresponding position in p to on.  Once all positions in p have been turned on, then we have a solution to the problem. Checking each point

for i = 1 to q Separate +(N o, i) and –(N o, i) for j = 1 to card(C i ) Set(+(N o, i), q + c i j ) N o ← merge((N o, i), -(N o, i)) for i = q + 1 to q + p N o ← +(N o, i) The program

Loop through all of the positions from 1 to q for i = 1 to q Now, separate all of the on and off positions. Separate +(N o, i) and –(N o, i) loop through all of the elements that the subset covers. for j = 1 to card(C i ) Set the appropriate position that that element covers in p. Set(+(N o, i), q + c i j ) Now, merge both of the solutions back together. N o ← merge(+(N o, i), -(N o, i)) Unraveling it all

Now we simply loop through all of the positions in p. for i = q + 1 to q + p separate all strands that have position i on. N o ← +(N o, i) This last section of the code ensures that we isolate all of the possible solutions by selecting all of the strands where all positions in p are turned on (i.e. covered by the selected subsets). Unraveling it all

 So now that we have all of the potential solutions in one test tube, we still have to determine the final solution.  Note that the Minimal Set Cover problem finds the smallest number of subsets that covers the entire set.  In our test tube, we have all of the solutions that cover the set, and one of these will have the smallest amount of subsets.  We therefore have to write a program to determine this. Output of the solution

for i = 0 to q – 1 for j = i down to 0 separate +(N j, i + 1) and –(N j, i + 1) N j+1 ← merge(+(N j, i + 1), N j+1 ) N j ← -(N j, i + 1) read(N 1 ); else if it was empty, read(N 2 ); else if it was empty, read(N 3 ); … Finding the solution

 The program takes each test tube and separates them based on number of positions in q turned on.  Thus for example, all memory strands with 1 position in q turned on are separated into one test tube, all memory strands with 2 positions in q turned on are separated into one test tube, etc.  Once this is done, we simply read each tube starting with the smallest number of subsets turned on to find a solution to our problem (of which there may be many). Finding the solution

 The operations outlined above can be used to program more practical solutions to other programs.  One such area is in cryptography, where it is postulated that a DNA system such as the one outlined is capable of breaking the common DES (Data Encryption Standard) used in many cryptosystem.  Using a (579, 56) library, with 20 oligonucleotide length memory strands, and an overall memory strand of 11, 580 nucleotides, it is estimated that one could break the DES with about 4 months of laboratory work. Final considerations

More serious

 There is a solid theoretical foundation for splicing as an operation on formal languages.  In biochemical terms, procedures based on splicing may have some advantages, since the DNA is used mostly in its double stranded form, and thus many problems of unintentional annealing may be avoided.  The basic model is a single tube, containing an initial population of dsDNA, several restriction enzymes, and a ligase. Mathematically this is represented as a set of strings (the initial language), a set of cutting operations, and a set of pasting operations.  It has been proved to a Universal Turing Machine. Tom Head – splicing systems

These are the techniques that are common in the microbiologist's lab and can be used to program a molecular computer. DNA can be:  synthezisedesired strands can be created  separatestrands can be sorted and separated by length  mergeby pouring two test tubes of DNA into one to perform union  extractextract those strands containing a given pattern  melt/annealbreaking/bonding two ssDNA molecules with complementary sequences  amplifyuse of PCR to make copies of DNA strands  cutcut DNA with restriction enzymes  rejoinrejoin DNA strands with 'sticky ends'  detectconfirm presence or absence of DNA Tom Head – splicing systems

 Initial set (finite or infinite) consists of double-stranded DNA molecules  Specific classes of enzymatic activities considered-those of restriction enzymes  Recombinant behavior modeled and associated sets analyzed by new formalism called Splicing Systems  Attention focused on effect of sets of restriction enzymes and a ligase that allow DNA molecules to be cleaved and Re-associated to produce further molecules. Tom Head – splicing systems

Circular DNA and Splicing Systems DNA molecules exist not only in linear forms but also in circular forms. Splicing systems

SPLICING LINEAR CIRCULAR Splicing systems

Linear splicing

…ATTGACCC… …CAATCAGG… G|AG|A AT | C ligase …ATTG ACCC… …CAAT CAGG… …ATTGCAGG… …CAATACCC… Splicing in nature

V alphabet r = u1u1 u2u2 u4u4 u3u3 splicing rule u 1, u 2, u 3, u 4  V * (x, y) x, y, z, w  V * x = x 1 u 1 u 2 x 2 y = y 1 u 3 u 4 y 2 x 1, x 2, y 1, y 2  V * (z, w) r r x 1 u 1 u 4 y 2 = z y 1 u 3 u 2 x 2 = w Splicing in DNA computing

 = (V, T, A, R) L(  ) =  * (A)  T * if A, R  FIN then L(  )  REG … with permitting context u1u1 u2u2 u4u4 u3u3 C1C1 C2C2  R if A, R  FIN then L(  )  RE V alphabet T  V terminal alphabet A  V * set of strings R splicing rules C 1, C 2  V * Extended H-system

tAtA h A a t 1 hAhA h A bs 1 cAt A h 1 a At A t1t1 h1ah1a h A At A hAhA h 1 abs 1 ct 1  h 1 bs 1 cAt 1 {s 1 }  {h 1, s 1 }  {h A, s 1 }  {s 1, t A }  1 2 3 4 h 1 a bs 1 ct 1 h A bs 1 c t 1 h 1 bs 1 cA t A h 1 bs 1 cAt 1 1 2 3 4 h1ah1a h A At A t1t1 tAtA 3 Rotation h 1 a At A h 1 at 1 h A a t 1 h A at A h A At A h 1 at 1

 : (x u 1 u 2 y, wu 3 u 4 z) r = u 1 |u 2 $ u 3 |u 4 rule (x u 1 u 4 z, wu 3 u 2 y) xy wz x w zcut paste y sites Pattern recognition u1u1 u2u2 u3u3 u4u4 u1u1 u2u2 u3u3 u4u4 x u1u1 z u4u4 w u3u3 u2u2 y Păun’s linear splicing operation (1996)

Circular splicing

restriction enzyme 1restriction enzyme 2 ligase enzymes Circular splicing

Conjugacy relation on A* w, w  A*, w ~ w  w = xy, w = yx Example abaa, baaa, aaab, aaba are conjugates A o = A* o = set of all circular words o w = [w] o, w  A* Circular languages

Circular language C  A o set of equivalence classes A* A*  o  L Cir(L) = { o w | w  L} (circularization of L) C L C {w  A*| o w  C}= Lin(C) (Full linearization of C) (A linearization of C, i.e. Cir(L)=C ) Circular languages

FA o ={ C  A o |  L  A*, Cir(L) = C, L  FA, FA  Chomsky hierarchy} Definition Theorem [Head, Păun, Pixton] C  Reg o  Lin (C)  Reg Circular languages

Păun’s definition (A= finite alphabet, I  A o initial language) SC PA = (A, I, R) R  A* | A* $ A* | A* rules ohu1u2,ohu1u2, oku3u4oku3u4  A o r = u 1 | u 2 $ u 3 | u 4  R u2hu1u2hu1 u4ku3u4ku3 o u 2 hu 1 u 4 ku 3 Circular splicing systems

Definition A circular splicing language C(SC PA ) (i.e. a circular language generated by a splicing system SCPA) is the smallest circular language containing I and closed under the application of the rules in R. Circular splicing systems

Head’s definition SC H = (A, I, T) T  A*  A*  A* triples  A o (p, x, q ), (u,x,v)  T vkux o hpx vkux q o hpxq, o kuxv q hpx SC PI = (A, I, R)  A o ( ,  ;  ), ( ,  ;  )  R oh  h oh  h  oh,oh, o  h h h  Pixton’s definition R  A*  A*  A* rules h  Other splicing systems (A= finite alphabet, I  A o initial language)

Characterize FA o  C(Fin, Fin) C(Reg, Fin) class of circular languages C= C(SC PA ) generated by SC PA with I and R both finite sets. Problem

Theorem [Păun96] F  {Reg o, CF o, RE o } R +add. hyp. (symmetry, reflexivity, self-splicing) Theorem [Pixton95-96] R  Fin+add. hyp. (symmetry, reflexivity) C(F, Fin)  F F  {Reg o, CF o, RE o } C(F, Reg)  F C(Reg o, Fin)  Reg o, Problem

CS o CF o Reg o o ((aa)*b) o (aa)* o (a n b n ) I= o aa  o 1, R={aa | 1 $ 1 | aa}I= o ab  o 1, R={a | b $ b | a} Circular finite splicing languages

Circular automata

J. Kari and L. Kari Context-free Recombinations, words, sequences, languages where computer science, biology and linguistics meet, C. Martin-Vide, V. Mitrana (Eds.). Kluwer, the Netherlands. Finite automata for circular languages

Definition  Finite automaton A, circular language K-accepted by A, L( A ) o K, all words w o such that A has a cycle labeled by w  K–Acceptance Circular/linear language accepted by a finite automaton A, defined as L(A)  o L(A), L(A) linear language accepted by automaton A defined in the usual way Definition A circular/linear language L   *  o is regular if there is a finite automaton A that accepts the circular and linear parts of L, i.e. that accepts L  * and L   o Finite automata for circular languages

The following definition is equivalent to a definition given by Pixton: the circular language accepted by a finite automaton is a set of all words that label a loop containing at least one initial and one final state. Definition Given a finite automaton A, the circular language accepted by A, L(A) o P is the set of all words o w such that A has a cycle labeled by w that contains at least one final state. P-acceptance

The circular languages accepted by finite automaton by the following definition coincide with the regular circular languages introduced by Head. Given a finite automation A, the circular language accepted by A, L( A ) o H is the set of all words o w such that w = u v and v u  L( A ) Pixton has shown that if in addition we assume that the family of languages is closed under repetition (i.e., w n is in the language whenever w is) H – acceptance and P – Acceptance are equivalent H-acceptance

Advantages of K-acceptance The same automaton accepts both the linear and circular components of the language K-acceptance

Counting problem

T. Head, Circular Suggestions for DNA Computing, in: Pattern Formation in Biology, Vision and Dynamics, Eds. A.Carbone, M Gromov and P. Prusinkiewicz, World Scientific, Singapore, 2000, pp. 325-335. J. Kari, A Cryptosystem Based on Propositional Logic, in: Machines, Languages and Complexity, 5th International Meeting of Young Computer Scientists, Czeckoslovakia, Nov. 14-18, 1988, Eds. J. Dassow and J.Kelemen, LNCS 381, Springer, 1989, pp.210-219. Rani Siromoney, Bireswar Das, DNA Algorithm for Breaking a Propositional Logic Based Cryptosystem, Bulletin of the EATCS, Number 79, February 2003, pp.170-176. Sources

Introducing CUT-DELETE-EXPAND-LIGATE (C-D-E-L) model Combine features in Divide-Delete-Drop (D-D-D) (Leiden) and CUT-EXPAND-LIGATE (C-E-L) (Binghamton) to form CUT-DELETE- EXPAND-LIGATE (C-D-E-L) model This enables us to get an aqueous solution to 3SAT which is a counting problem and known to be in IP. 3SAT Defined as follows: Instance: F a propositional formula of form F = C 1  C 2  …C m where C i are clauses and i = 1, 2, …, m. Each C i is of the form ( l i1  l i2  l i3 ) where l i j, j = 1, 2, 3 are literals from the set of variables {x 1, x 2, …, x n } Question What is the number of truth assignments that satisfy F? C-D-E-L model

Standard double stranded DNA cloning plasmid are commercially available. A plasmid is a circular molecule approximately 3 kb. It contains a sub-segment, MCS (multiple cloning site) of approximately 175 base pairs that can be removed using a pair of restriction enzyme sites that flank the segment. The MCS contains pair-wise disjoint sites at which restriction enzymes act such that each produces a 5’ overhang. Data register molecule

In C-D-E-L, a segment of the plasmid used is of the form …c 1 s 1 c 1 …c 2 s 2 c 2 …c n s nn c n … Where c i are called sites, such that no other subsequence of plasmid matches with this sequence and s i are called stations and i=1,…,n In D-D-D, lengths of stations are required to be the same However in C-D-E-L, lengths of stations all different which is fundamental in solving #3SAT Bio-molecular operations used in C-D-E-L are similar to the operations in C-E-L C-D-E-L model

x 1, …, x n the variables in F,  x 1, …,  x n their negations s i station associated with x i  s i station associatd with  s i c i site associated with station s i  c i site associated with station  s i v i length of station associated with x i, i=1, …, n v n+j length of station associated with literal  x j, j=1,…, n Choose stations in such a way that the sequence [ v 1, …, v 2n ] satisfies the property k  v i < v k+1, k = 1, …, 2n-1 i=1 i.e. an Super-increasing (Easy) Knapsack Sequence From sum, sub-sequence efficiently recovered. Design

 Solution in C n is analyzed by gel separation  If more than one solution is present, they will be of different lengths, thus will form separate bands  By counting number of bands we count the number of satisfying assignments.  Furthermore, from lengths of satisfying assignment,exact assignment is read.  This can be done since stations have lengths from easy knapsack sequence any subsequence of an easy knapsack sequence has different sum from the sums of other subsequences. Solution

C-D-E-L model

Thus solution to 3–SAT viz. finding the number of satisfying assignments is effectively done. Moreover, reading the truth assignments is a great advantage to break the cryptosystem based on propositional logic Solution

Advantage over previous method of attack  In the cryptanalytic attack proposed earlier, modifying D-D-D, it was required to execute the DNA algorithm for each bit in the crypto-text  But in the present method proposed, using C-D-E-L (combining features of C-C-C and C-E-L ) apply 3-SAT on P and read any satisfying assignment from the final solution  This gives an equivalent public key, which amounts to breaking the cryptosystem Advantage

 H-system  Lipton[94-95a-95b] Formalization and generalization of Adleman’s approach to other NP-complete problems.  Ex H-system  Circular H-system  Sticker system  P-system Splicing systems so far

For computational strength  Turing Equivalence Expansion  Finiteness & Regularity  More Operator Formalization To confirm homogeneity  HPP solving & AGL Splicing systems so far

Molecular application

 Separating and fusing DNA strands  Lengthening of DNA  Shortening DNA  Cutting DNA  Multiplying DNA Operations of DNA molecules

Denaturation  separating the single strands without breaking them  weaker hydrogen than phosphodiester bonding  heat DNA (85° - 90° C) Renaturation  slowly cooling down  annealing of matching, separated strands Separating and fusing DNA strands

Machinery for Nucleotide Manipulation  Enzymes are proteins that catalyze chemical reactions.  Enzymes are very specific.  Enzymes speed up chemical reactions extremely efficiently (speedup: 1012)  Nature has created a multitude of enzymes that are useful in processing DNA. Enzymes

 DNA polymerase enzymes add nucleotides to a DNA molecule Requirements  single-stranded template  primer,  bonded to the template  3´-hydroxyl end available for extension  Note: Terminal transferase needs no primer. Lengthening DNA

DNA nucleases are enzymes that degrade DNA. DNA exonucleases  cleave (remove) nucleotides one at a time from the ends of the strands  Example: Exonuclease III 3´- nuclease degrading in 3´- 5´direction Shortening DNA

DNA nucleases are enzymes that degrade DNA. DNA exonucleases  cleave (remove) nucleotides one at a time from the ends of the strands  Example: Bal31 removes nucleotides from both strands Shortening DNA

DNA nucleases are enzymes that degrade DNA. DNA endonucleases  destroy internal phosphodiester bonds  Example: S1 cuts only single strands or within single strand sections Restriction endonucleases  much more specific  cut only double strands  at a specific set of sites (EcoRI) Cutting DNA

 Amplification of a „small“ amount of a specific DNA fragment, lost in a huge amount of other pieces.  „Needle in a haystack“  Solution: PCR = Polymerase Chain Reaction  devised by Karl Mullis in 1985  Nobel Prize  a very efficient molecular copy machine Multiplying DNA

Start with a solution containing the following ingredients:  the target DNA molecule  primers (synthetic oligonucleotides), complementary to the terminal sections  polymerase, heat resistant nucleotides PCR - initialisation

 Solution heated close to boiling temperature.  Hydrogen bonds between the double strands are separated into single strand molecules. PCR – denaturation

 The solution is cooled down (to about 55° C).  Primers anneal to their complementary borders. PCR - priming

 The solution is heated again (to about 72° C).  Polymerase will extend the primers, using nucleotides available in the solution.  Two complete strands of the target DNA molecule are produced. PCR - extension

2n copies after n steps PCR – copying

Measuring the Length of DNA Molecules  DNA molecules are negatively charged.  Placed in an electric field, they will move towards the positive electrode.  The negative charge is proportional to the length of the DNA molecule.  The force needed to move the molecule is proportional to its length.  A gel makes the molecules move at different speeds.  DNA molecules are invisible, and must be marked (ethidium bromide, radioactive) Gel electrophoresis

 reading the exact sequence of nucleotides comprising a given DNA molecule  based on  the polymerase action of extending a primed single stranded template  nucleotide analogues  chemically modified e.g., replace 3´-hydroxyl group (3´-OH) by 3´-hydrogen atom (3´-H)  dideoxynucleotides: - ddA, ddT, ddC, ddG  Sanger method, dideoxy enzymatic method Sequencing

Objective  We want to sequence a single stranded molecule a. Preparation  We extend a at the 3´ end by a short (20 bp) sequence g, which will act as the W-C complement for the primer compl(g). l Usually, the primer is labelled (radioactively, or marked fluorescently)  This results in a molecule b´= 3´- ga. Sequencing 5' ATTAGACGTCCGTGCAATGC 3' 3'ACGTTACG 5'

Sequencing

4 tubes are prepared  Tube A, Tube T, Tube C, Tube G  Each of them contains  b molecules  primers, compl(g)  polymerase  nucleotides A, T, C, and G.  Tube A contains a limited amount of ddA.  Tube T contains a limited amount of ddT.  Tube C contains a limited amount of ddC.  Tube G contains a limited amount of ddG. Sequencing

Structure of ddTTP

Termination with ddTTP

Reaction in Tube A  The polymerase enzyme extends the primer of b´, using the nucleotides present in Tube A: ddA, A, T, C, G.  using only A, T, C, G:  b´ is extended to the full duplex.  using ddA rather than A:  complementing will end at the position of the ddA nucleotide. Sequencing

Sequencing - stopping

Tube C  GCCTGCAGATTA  C  CGGAC  CGGACGTC Tube G  GCCTGCAGATTA  CG  CGG  CGGACG Sequencing -results Tube A  GCCTGCAGATTA  CGGA  CGGACGTCTA  CGGACGTCTAA Tube T  GCCTGCAGATTA  CGGACGT  CGGACGTCT  CGGACGTCTAAT

Sequencing – reading the results

Some simple basics. 1Introduction 2Theoretical background Biochemistry/molecular biology 3Theoretical background computer science 4History of the field.

Similar presentations

Presentation on theme: "Some simple basics. 1Introduction 2Theoretical background Biochemistry/molecular biology 3Theoretical background computer science 4History of the field."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Some simple basics. 1Introduction 2Theoretical background Biochemistry/molecular biology 3Theoretical background computer science 4History of the field.

Similar presentations

Presentation on theme: "Some simple basics. 1Introduction 2Theoretical background Biochemistry/molecular biology 3Theoretical background computer science 4History of the field."— Presentation transcript:

Similar presentations

About project

Feedback