Computational Complexity, Physical Mapping III + Perl CIS 667 March 4, 2004
Computational Complexity - An Overview
We are primarily interested in efficient algorithms.
Efficient means that the running time of the algorithm is bounded by some polynomial function $p(n)$, where $n$ measures the size of the problem.
We use big-oh notation, e.g. $O(n^2)$, in which lower-order terms and constant factors are ignored.
Thus for small problem sizes, an $O(n^2)$ algorithm may actually run faster than an $O(n)$ one.
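As a rough illustration of why ignored constants matter at small sizes, the sketch below compares two hypothetical cost functions. The step counts (100n for the linear algorithm, n^2 for the quadratic one) and all function names are assumptions chosen for illustration only.

```python
def linear_cost(n):
    # Hypothetical O(n) algorithm with a large constant factor (assumed: 100).
    return 100 * n


def quadratic_cost(n):
    # Hypothetical O(n^2) algorithm with constant factor 1.
    return n * n


def crossover(max_n=10_000):
    # Smallest n at which the quadratic algorithm becomes strictly more expensive.
    for n in range(1, max_n + 1):
        if quadratic_cost(n) > linear_cost(n):
            return n
    return None


for n in (10, 100, 1000):
    print(n, linear_cost(n), quadratic_cost(n))
print("quadratic is more expensive from n =", crossover())
```

Under these assumed costs the quadratic algorithm is cheaper for every n below 100; the asymptotic ranking only takes over beyond that point.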
Computational Complexity - An Overview
This means that we are talking about asymptotic behavior.
An inefficient algorithm is one whose asymptotic running time is exponential, e.g. $O(2^n)$.
Problems for which efficient algorithms exist belong to the class P.
Problems whose solutions can be checked efficiently belong to the class NP; for many of them no efficient algorithm is known.
NP-complete Problems
An important subset of the problems in NP is called NP-complete.
The solutions to problems in NP, once found, can be checked in polynomial time.
NP includes the class P as a subset.
Any NP-complete problem can be transformed in polynomial time into an instance of any other NP-complete problem.
So all NP-complete problems are equivalent under polynomial transformation.
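To illustrate what "checked in polynomial time" means, here is a minimal sketch of a certificate checker. It assumes the decision form of the Traveling Salesman problem (is there a tour of total cost at most a given bound?); the function name, the distance-matrix representation and the sample data are all illustrative assumptions.

```python
def verify_tour(dist, tour, bound):
    """Check a claimed tour against a cost bound in time polynomial in n."""
    n = len(dist)
    # The tour must visit every city exactly once.
    if sorted(tour) != list(range(n)):
        return False
    # Sum the edge costs, including the edge that closes the cycle.
    cost = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
    return cost <= bound


# Made-up symmetric distance matrix for four cities.
dist = [
    [0, 1, 4, 3],
    [1, 0, 2, 5],
    [4, 2, 0, 1],
    [3, 5, 1, 0],
]
print(verify_tour(dist, [0, 1, 2, 3], 7))
```

Checking a proposed tour is cheap; the hard part, for which no polynomial time method is known, is finding one.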
NP-complete Problems
So, if a polynomial time algorithm is found for one NP-complete problem, there are polynomial time algorithms for all NP-complete problems.
If so, then $P = NP$.
Most researchers believe that $P \neq NP$.
The model of computation used in defining NP-complete problems is the nondeterministic Turing machine.
NP-hard Problems
Classes P and NP include only decision problems: the answer is yes or no.
An NP-hard problem is one that is at least as hard as the NP-complete problems.
If an NP-hard problem can be solved in polynomial time, then so can all NP-complete problems.
An NP-hard problem is not necessarily a decision problem.
NP-hard Problems
NP-complete $\subseteq$ NP-hard
Example: "Does there exist a solution to the Traveling Salesman problem?" is NP-hard and NP-complete.
"Find a solution to the Traveling Salesman problem" is NP-hard, but not NP-complete (it is not in decision form).
But if we have a polynomial time solution for the second, we can use it to solve the first (and hence all NP-complete problems).
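The last point can be sketched in code: any solver for the "find" version answers the decision version with one extra comparison. The example below assumes the decision question is "is there a tour of cost at most a given bound?". The exhaustive `best_tour` is only a stand-in for a hypothetical polynomial time solver, and all names and data are illustrative.

```python
from itertools import permutations


def tour_cost(dist, tour):
    # Total cost of the cycle, including the closing edge.
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))


def best_tour(dist):
    # Exhaustive search over all tours starting at city 0 (exponential time).
    n = len(dist)
    candidates = ([0] + list(p) for p in permutations(range(1, n)))
    return min(candidates, key=lambda t: tour_cost(dist, t))


def exists_tour_within(dist, bound, solver=best_tour):
    # Decision form, answered with one call to the optimization solver.
    return tour_cost(dist, solver(dist)) <= bound


# Made-up symmetric distance matrix for four cities.
dist = [
    [0, 1, 4, 3],
    [1, 0, 2, 5],
    [4, 2, 0, 1],
    [3, 5, 1, 0],
]
print(exists_tour_within(dist, 7), exists_tour_within(dist, 6))
```

If `solver` ran in polynomial time, so would `exists_tour_within`, which is the sense in which the search problem is at least as hard as the decision problem.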
NP-completeness
Initially, several hard problems were shown to be solvable in polynomial time on a nondeterministic Turing machine.
Polynomial time reductions between the problems were also shown.
Nowadays, to show that a problem is NP-complete:
  Verify that the problem is in NP (a solution can be verified in polynomial time).
  Show a polynomial time reduction from some known NP-complete problem to your problem.
NP-completeness
So when faced with an NP-complete or NP-hard problem, what to do?
  See if a meaningful restriction of the problem can be solved in polynomial time.
  See if the size of the problem in practice is always small.
  Devise a polynomial time approximation algorithm, guaranteed to find a near-optimal solution.
  Devise heuristics.
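As one example of the last option, here is a sketch of the nearest-neighbor heuristic for the Traveling Salesman problem. It is a common textbook heuristic, not something these slides prescribe, and it carries no guarantee of near-optimality. The function name and the sample distance matrix are illustrative assumptions.

```python
def nearest_neighbor_tour(dist, start=0):
    # Greedy heuristic: always move to the closest city not yet visited.
    n = len(dist)
    tour = [start]
    unvisited = set(range(n)) - {start}
    while unvisited:
        here = tour[-1]
        # Ties are broken by the smaller city index to keep the result deterministic.
        nxt = min(unvisited, key=lambda city: (dist[here][city], city))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


# Made-up symmetric distance matrix for four cities.
dist = [
    [0, 1, 4, 3],
    [1, 0, 2, 5],
    [4, 2, 0, 1],
    [3, 5, 1, 0],
]
print(nearest_neighbor_tour(dist))
```

The heuristic runs in O(n^2) time, which is the trade-off: a fast answer of unknown quality instead of a slow optimal one.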
Algorithmic Implications
We are trying to solve a real-life problem.
The models we use may give us many solutions, but we want to find the one solution that corresponds to the real ordering of the clones in the target DNA.
Use the algorithmic results in an iterative fashion with the experimental biologist.
Algorithmic Implications
A mapping algorithm should:
  Work better with more data, assuming a constant error rate.
  Give a solution that makes it clear how it was obtained, and tell which parts of the solution are good and which are bad.
  Give all candidate solutions.
An algorithm for C1P
This algorithm determines whether an $n \times m$ matrix has the consecutive ones property (C1P) for rows.
Assume:
  All rows are different.
  No row is all zero.
Let $S_i$ be the set of columns in which row $i$ has value 1. Then for any two rows $i$ and $j$ we can have:
  $S_i \cap S_j = \emptyset$
  $S_i \subseteq S_j$ or $S_j \subseteq S_i$
  $S_i \cap S_j \neq \emptyset$ and neither of them is a subset of the other
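The three cases can be sketched with Python sets. The function names, the case labels ("disjoint", "nested", "overlap") and the small matrix are illustrative assumptions, not taken from the slides.

```python
def row_sets(matrix):
    # S_i = set of column indices in which row i has value 1.
    return [{c for c, v in enumerate(row) if v == 1} for row in matrix]


def relation(s_i, s_j):
    # Classify a pair of rows into one of the three cases.
    if not (s_i & s_j):
        return "disjoint"   # empty intersection
    if s_i <= s_j or s_j <= s_i:
        return "nested"     # one set is contained in the other
    return "overlap"        # non-empty intersection, neither contains the other


# Made-up 0/1 matrix with four rows and four columns.
matrix = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
]
S = row_sets(matrix)
print(relation(S[0], S[1]), relation(S[0], S[2]), relation(S[0], S[3]))
```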
An algorithm for C1P
In the first case, we don’t need to consider the two rows together, so we separate them into two components and deal with them separately.
For non-empty intersection: suppose there is a row that, with respect to every row in the component, is either a subset or has empty intersection. Move it out of the component.
An algorithm for C1P
To see if two rows belong to the same component:
  Build a graph $G_c$ using $M$.
  Each vertex of $G_c$ will be a row from $M$.
  There will be an undirected edge between $i$ and $j$ if $S_i \cap S_j \neq \emptyset$ and neither of them is a subset of the other.
So the components we want are the connected components of $G_c$.
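A minimal sketch of this construction follows: build the overlap graph on the rows, then collect its connected components with a depth-first search. Function names and the sample matrix are illustrative assumptions.

```python
def row_sets(matrix):
    # S_i = set of column indices in which row i has value 1.
    return [{c for c, v in enumerate(row) if v == 1} for row in matrix]


def overlaps(a, b):
    # Edge condition: non-empty intersection and neither set contains the other.
    return bool(a & b) and not a <= b and not b <= a


def components(matrix):
    sets = row_sets(matrix)
    n = len(sets)
    # Adjacency lists of the overlap graph on the rows.
    adj = {i: [j for j in range(n) if j != i and overlaps(sets[i], sets[j])]
           for i in range(n)}
    seen, result = set(), []
    for start in range(n):
        if start in seen:
            continue
        # Iterative depth-first search from each unvisited row.
        stack, comp = [start], []
        seen.add(start)
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        result.append(sorted(comp))
    return result


# Made-up matrix: rows 0 and 1 overlap, row 2 is nested in both, row 3 is disjoint.
matrix = [
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
print(components(matrix))
```

Building the graph this way takes O(n^2) set comparisons, which is enough for a sketch.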
Basic Algorithm
The algorithm will have the following phases:
  Separate rows into components according to the above rules.
  Permute the columns of each component to achieve C1P for the component.
  Join the components together.
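When implementing these phases, it can help to have a reference answer for very small matrices. The sketch below is not the component-based algorithm from the slides: it is an exhaustive check that tries every column permutation, so it takes exponential time and is only usable for a handful of columns. Function names and test matrices are illustrative assumptions.

```python
from itertools import permutations


def ones_consecutive(row):
    # True if all 1s in the row form a single contiguous block.
    idx = [i for i, v in enumerate(row) if v == 1]
    return not idx or idx[-1] - idx[0] + 1 == len(idx)


def c1p_order_bruteforce(matrix):
    # Return a column order that makes every row's 1s consecutive, or None.
    m = len(matrix[0])
    for perm in permutations(range(m)):
        if all(ones_consecutive([row[c] for c in perm]) for row in matrix):
            return list(perm)
    return None


print(c1p_order_bruteforce([[1, 0, 1], [0, 1, 1]]))
print(c1p_order_bruteforce([[1, 1, 0], [0, 1, 1], [1, 0, 1]]))
```

The second matrix has no valid order: with three columns, whichever two end up at the ends are not adjacent, and some row needs exactly that pair.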
Example Matrix
[Table: 0/1 matrix with rows $l_1$ to $l_8$ and columns $c_1$ to $c_9$]
Example Graph
[Figure: graph on the vertices $l_1$ to $l_8$]
Placing Rows in a Component
[Table: rows $l_1$, $l_2$, $l_3$ over columns $c_1$ to $c_8$]
How can the first row (by itself) be arranged? (Keep track of all possibilities.)
[Figure: row $l_1$ over column positions labeled {2, 7, 8} {2, 7, 8} {2, 7, 8}]
Now add the second row. It can go to the right or to the left of the first.
[Figure: rows $l_1$ and $l_2$ over column positions labeled {5} {2, 7} {2, 7} {8}]
Placing Rows in a Component
How do we place the third row?
In the graph, it has edges to both rows already placed. Let’s place the third with respect to the second.
Does it go to the right or to the left?
If $|l_1 \cap l_3| < \min(|l_1 \cap l_2|, |l_2 \cap l_3|)$, place it in the same direction as the second was placed with respect to the first; otherwise place it in the opposite direction.
In our case, we have to place it in the opposite direction (to the right), as shown on the next slide.
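The direction rule can be written as a small function. In the sketch below the function name is hypothetical, and the three column sets are assumptions chosen to be consistent with the column labels on the neighboring slides (the original matrix entries are not reproduced here), with the second row assumed to have been placed to the left of the first.

```python
def direction_for_third(l1, l2, l3, second_wrt_first):
    # l1, l2, l3 are the column sets of the first, second and third rows.
    # second_wrt_first is "left" or "right": where l2 was placed relative to l1.
    opposite = {"left": "right", "right": "left"}
    if len(l1 & l3) < min(len(l1 & l2), len(l2 & l3)):
        return second_wrt_first          # same direction
    return opposite[second_wrt_first]    # opposite direction


# Assumed column sets for illustration.
l1 = {2, 7, 8}
l2 = {2, 5, 7}
l3 = {1, 4, 7, 8}
print(direction_for_third(l1, l2, l3, "left"))
```

With these assumed sets, $|l_1 \cap l_3| = 2$ is not smaller than $\min(2, 1) = 1$, so the third row goes in the opposite direction, to the right of the second.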
Placing Rows in a Component
[Figure: rows $l_1$, $l_2$ and $l_3$ over column positions labeled {5} {2} {7} {8} {1, 4} {1, 4}]
Placing Rows in a Component
All of the other rows in the component are placed in the same way, using two previously placed rows:
  One that has an edge in the graph to the row to be placed.
  A second that has an edge in the graph to that previous row.
Joining Components Together
For the next part of the solution, we use a graph $G_M$ which tells us how the components fit together.
Each component of the original matrix will be a vertex in $G_M$.
A directed edge is added from component $\alpha$ to component $\beta$ if the sets $S_i$ for all $i$ in $\beta$ are contained in at least one set $S_j$ of component $\alpha$.
Example Graph
Joining Components Together
We process components not contained in any other component first.
So process the components in the topological order of the graph.
We may come up with multiple solutions if one or more columns are not constrained to one value.
The algorithm runs in polynomial time.
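The processing order can be sketched with the standard library's `graphlib.TopologicalSorter` (Python 3.9 or later). The sketch assumes that one component "contains" another when every column set of the second lies inside at least one column set of the first, and that the resulting graph has no cycles. The component names and column sets are made up for illustration.

```python
from graphlib import TopologicalSorter


def contains(alpha_sets, beta_sets):
    # Every column set of beta lies inside at least one column set of alpha.
    return all(any(b <= a for a in alpha_sets) for b in beta_sets)


def processing_order(components):
    # components: dict mapping a component name to the column sets of its rows.
    # preds[b] holds every component that contains b, so containers come first.
    preds = {name: set() for name in components}
    for a, a_sets in components.items():
        for b, b_sets in components.items():
            if a != b and contains(a_sets, b_sets):
                preds[b].add(a)
    # Assumes the containment graph is acyclic; otherwise CycleError is raised.
    return list(TopologicalSorter(preds).static_order())


# Made-up components: B sits inside A, C is unrelated to both.
components = {
    "A": [{1, 2, 3}, {2, 3, 4}],
    "B": [{2}],
    "C": [{6, 7}, {7, 8}],
}
order = processing_order(components)
print(order)
```

Any order that lists A before B is acceptable here; C is unconstrained, which mirrors the remark that several solutions may exist.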