DNA Solution of the Maximal Clique Problem
Cell and Microbial Engineering Laboratory
Lee Ji Youn
Introduction
Clique: a set of vertices in which every vertex is connected to every other vertex by an edge.
Maximal clique problem: given a network containing N vertices and M edges, how many vertices are in the largest clique?
Finding the size of the largest clique is NP-hard: its decision version (does a clique of k vertices exist?) has been proven NP-complete.
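The clique definition can be stated directly as a check over all vertex pairs. This is a minimal sketch, assuming edges are stored as a set of unordered vertex pairs; the function name `is_clique` and the small example graph are hypothetical, chosen only for illustration.

```python
from itertools import combinations

def is_clique(vertices, edges):
    """True if every pair of distinct vertices is joined by an edge.

    `edges` is a set of undirected edges stored as frozensets {u, v}.
    """
    return all(frozenset(pair) in edges for pair in combinations(vertices, 2))

# Hypothetical 4-vertex graph: a triangle 0-1-2 plus a pendant edge 2-3.
edges = {frozenset(e) for e in [(0, 1), (1, 2), (0, 2), (2, 3)]}
print(is_clique({0, 1, 2}, edges))  # True
print(is_clique({0, 1, 3}, edges))  # False: edges 0-3 and 1-3 are missing
```

Checking one candidate is cheap; the hard part is that there are 2^N candidate vertex sets to check.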
Algorithm
Step 1. Make the complete data pool: for a graph with N vertices, each possible clique is represented by an N-digit binary number (1: the vertex is in the clique; 0: the vertex is out of the clique). The pool holds all 2^N such numbers. Example: the clique (4, 1, 0) is the binary number 010011, with vertex 0 as the rightmost digit.
Step 2. Find the pairs of vertices in the graph that are not connected by an edge: (0,2), (0,5), (1,5), (1,3). These pairs form the complementary graph.
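Steps 1 and 2 can be mirrored in a few lines of Python as a sketch. It assumes vertex i maps to the i-th digit from the right, matching the (4, 1, 0) → 010011 example. Because the slide lists only the missing pairs, the graph's edge set is rebuilt here as every other pair; the helper names are hypothetical.

```python
from itertools import combinations

N = 6  # number of vertices in the example graph

def encode(clique, n=N):
    """N-digit binary string; vertex i sets bit i, counted from the right."""
    return format(sum(1 << v for v in clique), f"0{n}b")

def decode(bits):
    """Vertices whose digit is 1, reading the rightmost digit as vertex 0."""
    return {i for i, b in enumerate(reversed(bits)) if b == "1"}

def complementary_graph(edges, n=N):
    """Vertex pairs (i < j) that are NOT joined by an edge."""
    return [p for p in combinations(range(n), 2) if frozenset(p) not in edges]

# Step 1: the complete data pool holds all 2**N numbers.
pool = [format(x, f"0{N}b") for x in range(2 ** N)]

# Rebuild the graph as all pairs except the missing ones listed on the slide.
missing = {frozenset(p) for p in [(0, 2), (0, 5), (1, 5), (1, 3)]}
graph_edges = {frozenset(p) for p in combinations(range(N), 2)} - missing

print(encode({4, 1, 0}))                 # 010011
print(complementary_graph(graph_edges))  # [(0, 2), (0, 5), (1, 3), (1, 5)]
```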
Step 3. Eliminate from the complete data pool all numbers containing a connection in the complementary graph: xxx1x1, 1xxxx1, 1xxx1x, or xx1x1x (x = either digit).
Step 4. Sort the remaining data pool to find the number with the largest number of 1’s; that count is the size of the maximal clique.
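As a sanity check on the expected answer, Steps 3 and 4 can be run in silico. This is a conventional brute-force filter, not a model of the DNA chemistry. Each number is a Python int with bit i = vertex i, and the helper name `violates` is hypothetical.

```python
N = 6
# Complementary-graph pairs from Step 2.
MISSING = [(0, 2), (0, 5), (1, 5), (1, 3)]

def violates(x, pairs=MISSING):
    """True if x sets both bits of any missing pair (e.g. xxx1x1 for (0, 2))."""
    return any((x >> i) & 1 and (x >> j) & 1 for i, j in pairs)

# Step 3: eliminate every number containing a complementary-graph connection.
survivors = [x for x in range(2 ** N) if not violates(x)]

# Step 4: sort by the number of 1's; the largest count is the clique size.
survivors.sort(key=lambda x: bin(x).count("1"), reverse=True)
best = survivors[0]
print(format(best, f"0{N}b"), bin(best).count("1"))  # 111100 4
```

For this graph exactly one surviving number has four 1's, so sorting by length alone already identifies the clique in silico. In the DNA experiment, the length readout gives only the size.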
Experiment - Construction of DNA molecules
Complete data pool: each bit is encoded by two DNA sections.
Bit value (Vi), V0~V5: 0 bp when Vi = 1, 10 bp when Vi = 0
Position (Pi), P0~P6: 20 bp each
Longest (000000): $6 \times 10 + 7 \times 20 = 200$ bp
Shortest (111111): $6 \times 0 + 7 \times 20 = 140$ bp
All strings are dsDNA.
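The length arithmetic above can be checked with a short sketch. It counts only the P and V sections listed on this slide (seven 20 bp positions and six values of 0 or 10 bp) and ignores anything else a real construct might carry. The constant and function names are hypothetical.

```python
P_LEN = 20       # each position section P0..P6
V_ZERO_LEN = 10  # Vi section when the bit is 0
V_ONE_LEN = 0    # Vi section when the bit is 1

def strand_length(bits):
    """Length in bp of the strand encoding an N-digit binary string."""
    positions = (len(bits) + 1) * P_LEN
    values = sum(V_ONE_LEN if b == "1" else V_ZERO_LEN for b in bits)
    return positions + values

print(strand_length("000000"))  # 200
print(strand_length("111111"))  # 140
```

Because a 1 is shorter than a 0, strings with more 1's run faster on a gel. This is what lets length stand in for clique size.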
Sequence construction: sequences were randomly generated to avoid mispairing, with no accidental homologies longer than 4 bp. Restriction sequences were embedded within each Vi = 1, and the pool was built by POA (parallel overlap assembly).
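One way to screen candidate sequences against the "no homologies longer than 4 bp" rule is to compare their 5-mers on both strands. This is a simplified sketch that assumes homology means an exact shared stretch. It does not check hairpins, junctions between adjacent sections, or repeats within a single sequence, and the sequence names and bases in the example are made up.

```python
from itertools import combinations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def kmers(seq, k):
    """All substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def shared_homologies(seqs, max_len=4):
    """Pairs of named sequences sharing an exact stretch longer than
    max_len bp, either directly or via the reverse complement."""
    k = max_len + 1
    hits = []
    for (name_a, a), (name_b, b) in combinations(seqs.items(), 2):
        if kmers(a, k) & (kmers(b, k) | kmers(revcomp(b), k)):
            hits.append((name_a, name_b))
    return hits

# Hypothetical 10-mers: "C" is the reverse complement of "B".
candidates = {"A": "ACGTACGTAC", "B": "TTTTTGGGGG", "C": "CCCCCAAAAA"}
print(shared_homologies(candidates))  # [('B', 'C')]
```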
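Experiment - POA (parallel overlap assembly)
12 oligonucleotides, one for each value (0 or 1) of each of the six bits:
PiViPi+1 for even i: P0V0P1, P2V2P3, P4V4P5
<Pi+1ViPi> for odd i (angle brackets denote the complementary strand): <P2V1P1>, <P4V3P3>, <P6V5P5>
PCR with P0 and <P6> as primers (lane 2 in Fig. 3)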
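The bookkeeping behind the 12 oligonucleotides can be sketched as follows. It only enumerates names and counts combinations. It assumes ideal assembly, where every combination of Vi values forms, and does not model hybridization or extension. The function name is hypothetical.

```python
from itertools import product

N = 6

def oligonucleotides(n=N):
    """(template, bit value) pairs: two oligos per bit, 2n in total."""
    oligos = []
    for i in range(n):
        if i % 2 == 0:
            template = f"P{i}V{i}P{i + 1}"    # top strand for even i
        else:
            template = f"<P{i + 1}V{i}P{i}>"  # complementary strand for odd i
        oligos.extend((template, value) for value in (0, 1))
    return oligos

oligos = oligonucleotides()
print(len(oligos))  # 12

# Each full-length product picks one value per Vi independently, so an ideal
# assembly covers every combination: 2**N strings, the complete data pool.
products = list(product((0, 1), repeat=N))
print(len(products))  # 64
```

Only 2N oligonucleotides are needed to generate 2^N strings. This is what makes building the complete pool feasible.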
Experiment - Digestion with restriction enzymes
Restriction enzymes break the DNA at the internal sequence of Vi = 1.
PCR with P0 and <P6> as primers: broken strings were not amplified.
The data pool was divided into two test tubes:
t0: Afl II cuts V0 = 1
t1: Spe I cuts V2 = 1
t0 and t1 were combined into test tube t. Every surviving string has V0 = 0 (from t0) or V2 = 0 (from t1), so t contained no xxx1x1.
Experiment - Digestion and PCR amplification
All strings containing connections in the complementary graph (xxx1x1, 1xxxx1, 1xxx1x, xx1x1x) were eliminated, and the remaining data DNA was amplified by PCR.
Fig. 3, lane 5: digestion result; lane 6: PCR result
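The divide, cut and recombine round described for the pair (0,2) generalizes to every pair of the complementary graph. Below is a minimal in-silico sketch. It assumes complete digestion and that PCR amplifies every uncut string. Strings are Python ints with bit i = Vi, and the function name `split_cut_combine` is hypothetical. Only the enzymes for V0 (Afl II) and V2 (Spe I) are named in the source, so the sketch identifies cuts by bit position rather than by enzyme.

```python
N = 6
MISSING = [(0, 2), (0, 5), (1, 5), (1, 3)]  # complementary-graph pairs

def split_cut_combine(pool, i, j):
    """One round: split the pool into tubes t0 and t1, destroy strings with
    Vi = 1 in t0 and strings with Vj = 1 in t1 (ideal, complete digestion),
    then recombine. Survivors have Vi = 0 or Vj = 0."""
    t0 = {x for x in pool if ((x >> i) & 1) == 0}
    t1 = {x for x in pool if ((x >> j) & 1) == 0}
    return t0 | t1

pool = set(range(2 ** N))
for i, j in MISSING:
    pool = split_cut_combine(pool, i, j)  # PCR is modeled as keeping survivors

print(max(bin(x).count("1") for x in pool))  # 4
```

Each round removes exactly the strings matching one pattern, so four rounds reproduce the Step 3 elimination under the ideal-digestion assumption.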
Experiment - Reading out the data
Size of the largest clique(s): the shortest length is 160 bp, i.e. $140 + 2 \times 10$ bp, so the string has two 0’s and four 1’s: four vertices.
Which clique is it? $\binom{6}{4} = 15$ different strings have four 1’s.
The answer was read by molecular cloning:
1) insertion of the DNA into M13 bacteriophage by site-directed mutagenesis
2) transfection of the mutagenized M13 phage DNA into E. coli
3) cloning
4) DNA extraction and sequencing
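The readout arithmetic is sketched below. It reuses the section lengths from the construction slide: 20 bp per Pi, 10 bp per Vi = 0, and 0 bp per Vi = 1. The names are hypothetical.

```python
from math import comb

N = 6
P_LEN, V_ZERO_LEN = 20, 10

def clique_size_from_length(length_bp, n=N):
    """Number of 1's (vertices) in a string, inferred from its length in bp."""
    zeros = (length_bp - (n + 1) * P_LEN) // V_ZERO_LEN
    return n - zeros

size = clique_size_from_length(160)
print(size)           # 4
print(comb(N, size))  # 15 candidate strings with four 1's
```

Length fixes only the number of 1's, not their positions. This is why sequencing is still needed to pick the answer out of the 15 candidates.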
Result: the correct answer, 111100 (the clique of vertices 2, 3, 4, 5)
Discussion - Major errors
ssDNA produced during PCR cannot be cut by restriction enzymes.
Solution: digest the ssDNA with S1 nuclease before restriction digestion.
Restriction enzymes cut incompletely.
Solution: repeat the digestion-PCR process to increase the signal-to-noise ratio.
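A simple model shows why repeating digestion and PCR helps. If each digestion round cuts a targeted strand independently with a fixed efficiency p, a fraction (1 - p)^k of the strings that should have been eliminated survives k rounds, while correct strings are never targeted. This is an idealized assumption (independent cuts, PCR amplifying all survivors equally), and the 90% efficiency below is purely illustrative, not a measured value.

```python
def residual_fraction(efficiency, rounds):
    """Fraction of targeted (wrong) strings left uncut after `rounds`
    digestion steps, assuming independent cuts with fixed `efficiency`."""
    return (1 - efficiency) ** rounds

def signal_to_noise_gain(efficiency, rounds):
    """Factor by which the ratio of correct to wrong strings improves,
    assuming PCR amplifies every surviving string equally."""
    return 1 / residual_fraction(efficiency, rounds)

for k in (1, 2, 3):
    # Hypothetical 90% cutting efficiency per round.
    print(k, residual_fraction(0.9, k), signal_to_noise_gain(0.9, k))
```

Under these assumptions, each extra round multiplies the signal-to-noise ratio by 1 / (1 - p).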
Discussion - Strengths and Weaknesses
Strength: high parallelism
Weaknesses:
limited number of vertices the algorithm can handle: at most 27 vertices with picomole operations (36 with nanomole operations)
the size of the pool increases exponentially with the size of the problem
Further scale-up becomes impractical; new algorithms are needed.
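The scaling problem comes down to the pool size 2^N. Each extra vertex doubles the pool, so 1000 times more DNA (picomole to nanomole) buys only about log2(1000) ≈ 10 more vertices. This is roughly consistent with the jump from 27 to 36 vertices quoted above. The exact limits depend on how many copies of each string the chemistry needs, which this sketch does not model.

```python
import math

def pool_size(n_vertices):
    """Number of distinct strings in the complete data pool: 2**N."""
    return 2 ** n_vertices

# Extra vertices gained from 1000x more material.
print(round(math.log2(1000), 2))       # 9.97
# Pool growth between the two quoted limits.
print(pool_size(36) // pool_size(27))  # 512
```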
Discussion - Future direction
Rapid and accurate data access is needed.
Current readout methods (biotin-avidin purification, electrophoresis, DNA cloning) are too slow or too noisy.
A biochip is needed to accelerate readout.