Maximum clique
Course outline: 1 Introduction; 2 Theoretical background: biochemistry/molecular biology; 3 Theoretical background: computer science; 4 History of the field; 5 Splicing systems; 6 P systems; 7 Hairpins; 8 Detection techniques; 9 Micro technology introduction; 10 Microchips and fluidics; 11 Self assembly; 12 Regulatory networks; 13 Molecular motors; 14 DNA nanowires; 15 Protein computers; 16 DNA computing - summary; 17 Presentation of essay and discussion
NP complete continued
Tractability: Some problems are undecidable: no computer can solve them (e.g. Turing’s “Halting Problem”). Other problems are decidable but intractable: as they grow large, we are unable to solve them in reasonable time. What constitutes “reasonable time”?
P and NP summary: P = set of problems that can be solved in polynomial time. NP = set of problems for which a solution can be verified in polynomial time. P ⊆ NP. The big question: does P = NP?
NP-complete problems: The NP-complete problems are an interesting class of problems whose status is unknown. No polynomial-time algorithm has been discovered for any NP-complete problem, and no superpolynomial lower bound has been proved for any of them either. Intuitively and informally, what does it mean for a problem to be NP-complete?
Reduction: A problem P can be reduced to another problem Q if any instance of P can be rephrased as an instance of Q, the solution to which provides a solution to the instance of P. This rephrasing is called a transformation. Intuitively: if P reduces in polynomial time to Q, P is “no harder to solve” than Q.
Why prove NP-completeness: Though nobody has proven that P ≠ NP, if you prove a problem NP-complete, most people accept that it is probably intractable. It can therefore be important to prove that a problem is NP-complete: you don’t need to come up with an efficient algorithm and can instead work on approximation algorithms.
Clique: What is a clique of a graph G? Answer: a subset of vertices fully connected to each other, i.e. a complete subgraph of G. The clique problem: how large is the maximum-size clique in a graph? Can we turn this into a decision problem? Answer: yes, we call this the k-clique problem (does G contain a clique of size k?). Is the k-clique problem within NP?
Clique: What should the reduction do? Answer: transform a 3-CNF formula into a graph for which a k-clique exists (for some k) iff the 3-CNF formula is satisfiable.
Clique: The reduction: let B = C_1 ∧ C_2 ∧ … ∧ C_k be a 3-CNF formula with k clauses, each of which has 3 distinct literals. For each clause, put a triple of vertices in the graph, one for each literal. Put an edge between two vertices if they are in different triples and their literals are consistent, meaning not each other’s negation. Run an example: a three-clause formula B over the variables x, y and z, of the form (x ∨ y ∨ z) ∧ (x ∨ y ∨ z) ∧ (x ∨ y ∨ z) with some of the literals negated.
Clique: Prove the reduction works. If B has a satisfying assignment, then each clause has at least one literal (vertex) that evaluates to 1. Picking one such “true” literal from each clause gives a set V’ of k vertices; V’ is a clique (why?). Conversely, if G has a clique V’ of size k, it must contain one vertex in each triple (why?). We can then assign 1 to each literal corresponding to a vertex in V’, without fear of contradiction.
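The reduction can be tried out in a few lines of Python. This is a minimal sketch, not the lecture's own code: literals are written as strings with a leading "~" for negation, the function names are invented for the illustration, the example formula is an illustrative choice, and the clique search is plain brute force (exponential, so only for tiny formulas).

```python
from itertools import combinations

def negate(literal):
    """Return the negation of a literal written as 'x' or '~x' (assumed notation)."""
    return literal[1:] if literal.startswith("~") else "~" + literal

def cnf_to_graph(clauses):
    """One vertex per (clause index, literal); edges join consistent literals
    that sit in different clauses (triples)."""
    vertices = [(i, lit) for i, clause in enumerate(clauses) for lit in clause]
    edges = set()
    for u, v in combinations(vertices, 2):
        if u[0] != v[0] and u[1] != negate(v[1]):
            edges.add(frozenset((u, v)))
    return vertices, edges

def has_k_clique(vertices, edges, k):
    """Brute-force check for a clique of size k (exponential time)."""
    return any(
        all(frozenset(pair) in edges for pair in combinations(subset, 2))
        for subset in combinations(vertices, k)
    )

# Illustrative 3-CNF formula with k = 3 clauses (chosen for this sketch).
formula = [("x", "~y", "~z"), ("~x", "y", "z"), ("x", "y", "z")]
vertices, edges = cnf_to_graph(formula)
print(has_k_clique(vertices, edges, len(formula)))
```

A formula such as (x) ∧ (¬x) gives two vertices with no edge between them, so no 2-clique exists, matching the fact that it is unsatisfiable.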
Clique problem, summary: A clique of a graph G = (V,E) is a subgraph C that is fully connected (every pair in C has an edge). CLIQUE: given a graph G and an integer K, is there a clique in G of size at least K? CLIQUE is in NP: non-deterministically choose a subset C of size K and check that every pair in C has an edge in G. [Figure: this graph has a clique of size 5.]
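The "check" half of the NP argument is easy to make concrete. The sketch below assumes a graph stored as a set of frozenset edges (a representation chosen for the illustration): `is_clique` is the polynomial-time certificate check, while `clique_decision` replaces the non-deterministic guess with an exhaustive search over all subsets of size K. Searching for size exactly K is enough, because any larger clique contains a clique of size K.

```python
from itertools import combinations

def is_clique(edges, subset):
    """Certificate check: every pair of vertices in subset must be an edge.
    Runs in time quadratic in len(subset)."""
    return all(frozenset(pair) in edges for pair in combinations(subset, 2))

def clique_decision(vertices, edges, k):
    """Deterministic stand-in for the non-deterministic guess:
    try every subset of size k (exponential in general)."""
    return any(is_clique(edges, c) for c in combinations(vertices, k))

# Small hypothetical graph: a triangle 0-1-2 with a pendant vertex 3.
edges = {frozenset(e) for e in [(0, 1), (0, 2), (1, 2), (2, 3)]}
print(is_clique(edges, [0, 1, 2]))          # the triangle is a clique
print(clique_decision(range(4), edges, 4))  # no clique of size 4
```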
Maximum clique with DNA
Introduction: A clique is defined as a set of vertices in which every vertex is connected to every other vertex by an edge. Maximal clique problem: given a network containing N vertices and M edges, how many vertices are in the largest clique? Finding the size of the largest clique has been proven to be an NP-complete problem.
Algorithm:
Step 1: Make the complete data pool. For a graph with N vertices, each possible clique is represented by an N-digit binary number (1: a vertex in the clique; 0: a vertex out of the clique), with vertex 0 as the rightmost digit; e.g. the clique (4,1,0) is the binary number 010011.
Step 2: Find the pairs of vertices in the graph that are not connected by an edge: (0,2), (0,5), (1,5), (1,3). These pairs are the edges of the complementary graph.
Algorithm (continued):
Step 3: Eliminate from the complete data pool all numbers containing connections in the complementary graph: xxx1x1, 1xxxx1, 1xxx1x or xx1x1x.
Step 4: Sort the remaining data pool to find the data containing the largest number of 1’s. The clique with the largest number of 1’s tells us the size of the maximal clique.
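The four steps can be mimicked in software for the six-vertex example. This is only a sketch of the logic, with strings standing in for DNA molecules; the variable names are invented here, and the bit order (vertex 0 is the rightmost digit) is the one implied by the patterns xxx1x1, 1xxxx1, 1xxx1x and xx1x1x.

```python
from itertools import product

N = 6
# Step 2: edges of the complementary graph (pairs NOT connected in the graph).
NON_EDGES = [(0, 2), (0, 5), (1, 5), (1, 3)]

def bit(number, vertex):
    """Digit for a vertex; vertex 0 is the rightmost digit."""
    return number[N - 1 - vertex]

# Step 1: the complete data pool, all 2**N binary numbers.
pool = ["".join(digits) for digits in product("01", repeat=N)]

# Step 3: drop every number that contains both ends of a complementary edge.
survivors = [
    s for s in pool
    if not any(bit(s, u) == "1" and bit(s, v) == "1" for u, v in NON_EDGES)
]

# Step 4: the survivor with the most 1's gives the size of the largest clique.
best = max(survivors, key=lambda s: s.count("1"))
print(best, best.count("1"))
```

For this graph the search ends with a single largest survivor containing four 1's, i.e. a clique of four vertices.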
Construction of DNA molecules: Each dsDNA string is built from two kinds of DNA sections. Bit value sections V_i (V_0 to V_5): 0 bp when V_i = 1, 10 bp when V_i = 0. Position value sections P_i (P_0 to P_6): 20 bp each. Longest string: 7 × 20 + 6 × 10 = 200 bp (000000). Shortest string: 7 × 20 + 6 × 0 = 140 bp (111111).
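The length rule can be written as a one-line function. A minimal sketch, assuming only the section lengths given above (20 bp per position section, 10 bp for a 0 bit, 0 bp for a 1 bit); the function name is made up for the illustration.

```python
POSITION_BP = 20                 # length of each position section P_i
VALUE_BP = {"1": 0, "0": 10}     # length of a value section V_i by bit value

def strand_length(bits):
    """Length in bp of the dsDNA string encoding the given binary number.
    A number with n bits has n value sections and n + 1 position sections."""
    return POSITION_BP * (len(bits) + 1) + sum(VALUE_BP[b] for b in bits)

for number in ("000000", "111111", "010011"):
    print(number, strand_length(number), "bp")
```

Each additional 1 shortens the string by 10 bp, which is what later allows the number of 1's to be read from the strand length.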
Construction of DNA molecules: Sequence construction: sequences are randomly generated; to avoid mispairing, accidental homologies longer than 4 bp are avoided. Restriction sequences are embedded within each V_i = 1. The strings are assembled by POA (parallel overlap assembly).
POA: POA (parallel overlap assembly) with 12 oligonucleotides: P_i V_i P_i+1 for even i (P_0 V_0 P_1, P_2 V_2 P_3, P_4 V_4 P_5) and the complementary strand of P_i V_i P_i+1 for odd i. PCR with P_0 and the complement of P_6 as primers (lane 2 in Fig. 3).
Construction of DNA molecules
Digestion by restriction enzymes: Break the DNA at the internal restriction sequence of V_i = 1, then run PCR with P_0 and the complement of P_6 as primers; broken strings are not amplified. To eliminate xxx1x1, divide the data pool into two test tubes: in t_0, Afl II cuts strings with V_0 = 1; in t_1, Spe I cuts strings with V_2 = 1. Combine t_0 and t_1 into test tube t, which then does not contain xxx1x1.
Digestion and PCR amplification: Elimination of all strings that contain both vertices of an edge of the complementary graph: xxx1x1, 1xxxx1, 1xxx1x, xx1x1x. PCR amplification of the remaining data DNA (Fig. 3): lane 5, digestion result; lane 6, PCR result.
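Why splitting the pool into two tubes removes exactly the strings with both bits set can be checked with sets. The sketch below is a software model only: it assumes perfect cutting, assumes that each tube receives copies of every string, and uses invented names (`digest`, `eliminate_pair`).

```python
from itertools import product

N = 6
NON_EDGES = [(0, 2), (0, 5), (1, 5), (1, 3)]

def bit(strand, vertex):
    """Digit for a vertex; vertex 0 is the rightmost digit."""
    return strand[N - 1 - vertex]

def digest(tube, vertex):
    """Model of one restriction digest followed by PCR: strands whose bit for
    `vertex` is 1 are cut and not re-amplified, so only bit = 0 survives."""
    return {s for s in tube if bit(s, vertex) == "0"}

def eliminate_pair(pool, u, v):
    """Split the pool into two tubes, cut bit u in one and bit v in the other,
    then combine. A strand is lost only if it is cut in both tubes."""
    t0 = digest(pool, u)
    t1 = digest(pool, v)
    return t0 | t1

full_pool = {"".join(d) for d in product("01", repeat=N)}
after_first = eliminate_pair(full_pool, 0, 2)   # removes xxx1x1

remaining = full_pool
for u, v in NON_EDGES:
    remaining = eliminate_pair(remaining, u, v)
print(len(after_first), len(remaining))
```

Strings with only one of the two bits set survive in the tube where the other bit was targeted, so the combined tube loses only the strings that violate the pair.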
Readout: Reading the size of the largest clique(s): the shortest length is 160 bp, i.e. four vertices. What is the maximal clique? C(6,4) = 15, so 15 different strings are possible. Read the answer by molecular cloning: 1. insertion of the DNA into M13 bacteriophage through site-directed mutagenesis; 2. transfection of the mutagenized M13 phage DNA into E. coli; 3. cloning; 4. DNA extraction and sequencing.
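The arithmetic behind the readout can be reproduced directly. This sketch assumes the section lengths used for the strings in this experiment (seven 20 bp position sections, 10 bp for each 0 bit, 0 bp for each 1 bit) and the convention that vertex 0 is the rightmost digit; the helper names are invented for the illustration.

```python
from itertools import combinations
from math import comb

def ones_from_length(length_bp, n=6, position_bp=20, zero_bp=10):
    """Number of 1 bits implied by a strand length: every 0 bit adds zero_bp
    on top of the (n + 1) position sections."""
    zeros = (length_bp - position_bp * (n + 1)) // zero_bp
    return n - zeros

def candidates(ones, n=6):
    """All n-digit binary numbers with exactly `ones` 1's (vertex 0 rightmost)."""
    return [
        "".join("1" if v in chosen else "0" for v in reversed(range(n)))
        for chosen in combinations(range(n), ones)
    ]

k = ones_from_length(160)
print(k, comb(6, k), len(candidates(k)))
```

Gel length alone narrows the answer to these 15 strings; identifying which one is actually present is what the cloning and sequencing steps are for.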
Readout: correct answer
Discussion - major errors: Production of ssDNA during PCR: ssDNA cannot be cut by restriction enzymes. Solution: digestion of the ssDNA with S1 nuclease before restriction digestion. Incomplete cutting by restriction enzymes: repetition of the digestion-PCR process increases the signal-to-noise ratio.
Discussion - strengths and weaknesses: Strengths: high parallelism. Weaknesses: limitation on the number of vertices that this algorithm can handle. The maximum number of vertices with picomole operations is 27 (36 vertices with nanomole operations); the size of the pool increases exponentially with the size of the problem. Further scale-up becomes impractical; new algorithms are needed.
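The exponential growth can be put in numbers. The sketch below only does the arithmetic: it divides the number of molecules in a given amount of DNA by the 2^N distinct strings of the pool. How many copies per string the quoted limits of 27 and 36 vertices assume is not stated in the text, so the printed figures are an illustration, not the authors' derivation.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def copies_per_string(n_vertices, moles):
    """Average number of copies of each distinct string when `moles` of DNA
    are spread evenly over a complete pool of 2**n_vertices strings."""
    return moles * AVOGADRO / 2 ** n_vertices

for n, moles, label in [(27, 1e-12, "picomole"), (36, 1e-9, "nanomole")]:
    print(f"{n} vertices, one {label}: "
          f"about {copies_per_string(n, moles):.0f} copies per string")
```

Each extra vertex halves the number of copies per string, which is why a thousandfold increase in material buys only about ten more vertices.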
Discussion – future direction: Rapid and accurate data access is needed. Biotin-avidin purification, electrophoresis and DNA cloning are too slow or too noisy; a biochip is needed to accelerate readout.
Clique in microreactors
Selection principle: All possible solutions: {000} {001} {010} {011} {100} {101} {110} {111}. Clauses: (x=1) ∧ (y=0) ∧ (z=1).
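The selection principle amounts to filtering the pool one clause at a time. A minimal sketch, assuming the three digits stand for x, y and z from left to right (the slide does not state the order) and using an invented `select` helper:

```python
from itertools import product

VARIABLES = "xyz"   # assumed digit order: x, y, z from left to right

# All possible solutions {000} ... {111}.
pool = {"".join(bits) for bits in product("01", repeat=len(VARIABLES))}

def select(pool, variable, value):
    """Keep only the candidates in which `variable` has the given value."""
    index = VARIABLES.index(variable)
    return {s for s in pool if s[index] == value}

# Apply the clauses (x=1), (y=0), (z=1) one after another.
result = pool
for variable, value in [("x", "1"), ("y", "0"), ("z", "1")]:
    result = select(result, variable, value)
print(sorted(result))
```

Each selection halves the pool, so three clauses over three variables leave a single candidate.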
Positive selection
Negative selection
Logical operations
Logical operations: logical NOT operations
Logical operations: logical AND operations on a and b
Logical operations: logical OR operations on a and b
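At the level of pools, the three operations behave like set operations. The sketch below is an abstract model only: the text does not say how the modules are wired, so treating AND as two selections in series and OR as two selections in parallel whose outputs are merged is an assumption of this illustration, as are all the function names.

```python
from itertools import product

POOL = frozenset("".join(bits) for bits in product("01", repeat=3))

def select(pool, index, value):
    """Selection: keep strands whose digit at `index` equals `value`."""
    return frozenset(s for s in pool if s[index] == value)

def op_not(pool, selected):
    """NOT: everything in the pool that the selection did not keep."""
    return pool - selected

def op_and(pool, first, second):
    """AND (assumed serial wiring): feed the output of one selection into the next."""
    return second(first(pool))

def op_or(pool, first, second):
    """OR (assumed parallel wiring): run both selections and merge the outputs."""
    return first(pool) | second(pool)

def a(pool):
    return select(pool, 0, "1")   # condition a: first digit is 1

def b(pool):
    return select(pool, 1, "1")   # condition b: second digit is 1

print(sorted(op_and(POOL, a, b)))
print(sorted(op_or(POOL, a, b)))
print(sorted(op_not(POOL, a(POOL))))
```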
magnet Microreactor structure
Selection principle
DNA input and transport principle
6 nodes, 2 initial answers 6 Max: S ABCDE = Maximal cliques
Maximal cliques – connectivity matrix [figure: 6 × 6 matrix over the nodes A to F; entries not preserved]
Maximal cliques – flow diagram [figure: selection modules labelled S_A=0, S_A=1, S_B=0, S_C=0, S_C=1, S_D=0, S_E=0, S_F=0, S_F=1]
Maximal cliques – flow diagram [figure: the pool xxxxxx, with x = {0,1}, passes through the selection modules; intermediate pools are labelled with patterns such as 0xxxxx, x0xxxx, 00xxxx, 0xx0xx, x0x0xx, 00x0xx, 0xxx0x, 00xx0x, 0xx00x, x0x00x and 00x00x]
DNA computer design: DNA in; DNA out; optical control
DNA computer design – selection modules
DNA information flow
Flow separation – laminar flow (scale bar: 100 µm)
Micro fabrication
DNA computer design – 20 nodes
DNA sequence handling: word codes; optical programmability; usage of masks to programme immobilisation of DNA to paramagnetic beads; hybridisation of DNA strands
The DNA library
The DNA library:
PBS1: 5'-GCCCTAAAGGATCCACGTAAGGTCCTATGC
V0-1: 5'-AACCACCAACCAAACC    V0-0: 5'-AAAACGCGGCAACAAG
V1-1: 5'-TCAGTCAGGAGAAGTC    V1-0: 5'-TCTTGGGTTTCCTGCA
V2-1: 5'-TTTTCCCCCACACACA    V2-0: 5'-TTGGACCATACGAGGA
V3-1: 5'-CGTTCATCTCGATAGC    V3-0: 5'-AGAGTCTCACACGACA
V4-1: 5'-AAGGACGTACCATTGG    V4-0: 5'-CTCTAGTCCCATCTAC
V5-1: 5'-CAACGGTTTTATGGCG    V5-0: 5'-GCGCAATTTGGTAACC
V6-1: 5'-TAGCAGCTTCCTTACG    V6-0: 5'-ACACTGTGCTGATCTC
V7-1: 5'-CACATGTGTCAGCACT    V7-0: 5'-TGTGTGTGCCTACTTG
V8-1: 5'-GATGGGATAGAGAGAG    V8-0: 5'-AATCCCACCAGTTGAC
V9-1: 5'-ATGCAGGAGCGAATCA    V9-0: 5'-GCTTGTTCAACCTGGT
V10-1: 5'-CCCAGTATGAGATCAG   V10-0: 5'-CTGTCCAAGTACGCTA
V11-1: 5'-ATCGAGCTTCTCAGAG   V11-0: 5'-TGTAGAGGCTAGCGAT
PBS2: 5'-TGGTTTGGCGGCTTTAGAATTCTGTGACAC
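One way to see how such a library could encode a 12-bit word is to concatenate the listed sequences. The sketch below rests on assumptions that the text does not confirm: that a word is simply PBS1, then the value sequences for bits 0 to 11 in index order, then PBS2, with no spacers, and that the first character of the bit string corresponds to V0. The `encode` function is hypothetical.

```python
PBS1 = "GCCCTAAAGGATCCACGTAAGGTCCTATGC"
PBS2 = "TGGTTTGGCGGCTTTAGAATTCTGTGACAC"

# (sequence for bit = 1, sequence for bit = 0), indexed by bit position 0..11
V = [
    ("AACCACCAACCAAACC", "AAAACGCGGCAACAAG"),
    ("TCAGTCAGGAGAAGTC", "TCTTGGGTTTCCTGCA"),
    ("TTTTCCCCCACACACA", "TTGGACCATACGAGGA"),
    ("CGTTCATCTCGATAGC", "AGAGTCTCACACGACA"),
    ("AAGGACGTACCATTGG", "CTCTAGTCCCATCTAC"),
    ("CAACGGTTTTATGGCG", "GCGCAATTTGGTAACC"),
    ("TAGCAGCTTCCTTACG", "ACACTGTGCTGATCTC"),
    ("CACATGTGTCAGCACT", "TGTGTGTGCCTACTTG"),
    ("GATGGGATAGAGAGAG", "AATCCCACCAGTTGAC"),
    ("ATGCAGGAGCGAATCA", "GCTTGTTCAACCTGGT"),
    ("CCCAGTATGAGATCAG", "CTGTCCAAGTACGCTA"),
    ("ATCGAGCTTCTCAGAG", "TGTAGAGGCTAGCGAT"),
]

def encode(bits):
    """Hypothetical encoding: PBS1 + V0..V11 (chosen by bit value) + PBS2.
    bits[0] is taken to be the value of V0 (an assumption of this sketch)."""
    if len(bits) != len(V) or set(bits) - {"0", "1"}:
        raise ValueError("expected a string of 12 binary digits")
    body = "".join(V[i][0] if b == "1" else V[i][1] for i, b in enumerate(bits))
    return PBS1 + body + PBS2

word = encode("101010101010")
print(len(word), word)
```

Under these assumptions every word has the same length, since both value sequences for a given bit are 16 nt long.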
DNA hybridisation
DNA hybridisation (scale bar: 100 µm)
DNA computer control: liquid handling; DNA computer; robotics; detection system; sorting module; computer control
3.5mm