Chapter 13: NP-Completeness (4/21/2017)
Dealing with Hard Problems What to do when we find a problem that looks hard… I couldn’t find a polynomial-time algorithm; I guess I’m too dumb. NP-Completeness
Dealing with Hard Problems Sometimes we can prove a strong lower bound… (but not usually) I couldn’t find a polynomial-time algorithm, because no such algorithm exists! NP-Completeness
Dealing with Hard Problems NP-completeness lets us show collectively that a problem is hard. I couldn’t find a polynomial-time algorithm, but neither could all these other smart people.
Polynomial-Time Decision Problems To simplify the notion of “hardness,” we will focus on the following: Polynomial time as the cut-off for efficiency. So O(n¹⁰) and 10⁷n count as efficient? (Not practically speaking, but such running times don’t occur much in practice.) Decision problems: the output is 1 or 0 (“yes” or “no”). Note we change the form of the problem to make verifying it easier – we want a yes/no answer. Examples: Does a given graph G have an Euler tour? Does a text T contain a pattern P? Does an instance of 0/1 Knapsack have a solution of benefit at least K? Does a graph G have an MST of weight at most K?
The Complexity Class P We use the term “language” to mean a problem, and “accepted” to mean “we can determine that the answer to the problem is yes.” A complexity class is a collection of languages (representing problems). P is the complexity class consisting of all languages that are accepted by polynomial-time algorithms. In other words, membership in the language can be determined in polynomial time.
The term non-determinism NP is the complexity class consisting of all languages accepted by polynomial-time non-deterministic algorithms. Non-determinism does not mean that the results are different each time; it means that you aren’t specifying how the choices are made. Example: someone asks you to verify that you can fill the knapsack with items of value K or more. You construct a set of items that has that property. The part that could be non-deterministic is how you select the items.
Other ways of thinking of NP An NP problem is one in which the answer can be checked in polynomial time. In other words, if someone tells you what choice to make whenever you aren’t sure, can you verify their suggestions in polynomial time? Think about having a “fat” machine. Anytime you don’t know which choice to make, try all choices simultaneously. You are allowed to “clone” your resources without penalty. NP-Completeness
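The idea above – that someone hands you their choices and you only have to check them – can be made concrete with a small sketch. This is a minimal, hypothetical certificate checker for the 0/1 Knapsack decision problem (the name `verify_knapsack` and the argument layout are illustrative, not from the text):

```python
def verify_knapsack(values, weights, capacity, K, chosen):
    """Certificate check for the 0/1 Knapsack decision problem:
    does the chosen subset of item indices fit in the knapsack
    and reach total value at least K?  The check is O(n) --
    polynomial time -- no matter how 'chosen' was found."""
    total_w = sum(weights[i] for i in chosen)
    total_v = sum(values[i] for i in chosen)
    return total_w <= capacity and total_v >= K
```

Finding `chosen` may take exponential search, but checking a proposed `chosen` is fast – that asymmetry is exactly what puts the problem in NP.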
Can you think of an example whose answer can’t (obviously) even be checked in polynomial time? Example: “This number is prime.” The input is a single number, so its size is its bit length, and trial division is exponential in that length. (Primality was in fact later shown to be in P, but no obvious polynomial-time check exists.) Example: “This TSP (traveling salesperson problem) solution is optimal.” Note the problem has not been put in decision form; if instead we asked “does this graph have a TSP tour of cost less than 100?”, we would be fine. Example: “Generate all permutations.” We say this problem is “poorly formed,” as the output itself is exponential in size.
The Complexity Class NP We know: P is a subset of NP, but is it a proper subset? Major open question: P=NP? Most researchers believe that P and NP are different. NP-Completeness
An Interesting Problem A Boolean circuit is a circuit of AND, OR, and NOT gates; the CIRCUIT-SAT (circuit satisfiability) problem is to determine if there is an assignment of 0’s and 1’s to a circuit’s inputs so that the circuit outputs a 1 (is satisfied). NP-Completeness
CIRCUIT-SAT is in NP Easy to show a correct solution can be verified in polynomial time. Non-deterministically choose a set of inputs and the outcome of every gate, then test each gate’s output. Finding a correct solution is TOUGH! NP-Completeness
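The verification step described above – guess a value for every wire, then test each gate locally – can be sketched as follows. This is an illustrative helper (the names `verify_circuit`, `gates`, and the wire-map encoding are assumptions, not from the text):

```python
def verify_circuit(gates, output_wire, assignment):
    """Check a claimed satisfying assignment for CIRCUIT-SAT.
    'gates' maps each gate's output wire to (op, operand wires);
    'assignment' gives a 0/1 value for EVERY wire (inputs and
    gate outputs).  Each gate is tested locally, so the whole
    check runs in time polynomial in the circuit size."""
    ops = {"AND": lambda a, b: a & b,
           "OR":  lambda a, b: a | b,
           "NOT": lambda a: 1 - a}
    for wire, (op, *args) in gates.items():
        if assignment[wire] != ops[op](*(assignment[a] for a in args)):
            return False          # some gate's output is inconsistent
    return assignment[output_wire] == 1
```

The non-deterministic part is producing `assignment`; the deterministic, polynomial part is this gate-by-gate check.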
NP-hard A problem L is NP-hard (at least as hard as every problem in NP) if every problem in NP can be reduced to L in polynomial time. That is, for each language M in NP, we can take an input x for M and transform it in polynomial time to an input x’ for L such that x is in M if and only if x’ is in L. An NP-hard problem that is also in NP (can be verified in polynomial time) is termed NP-complete. This transformation time is important – otherwise the transformation could be doing all the work.
NP-hard Example: the hitting set problem is an NP-complete problem in set theory. For a given list of sets, a hitting set is a set of elements such that each set in the given list is “touched” (intersected) by the hitting set; the task is to find a small hitting set. Suppose we “reduce” finding a hitting set to computing min(a,b), as follows: let a be the size of the smallest hitting set that includes element X, and let b be the size of the smallest hitting set that does not include X. Now the “mapped-to” problem just has to compare two numbers. Almost all the work happened in the transformation, so we learn nothing about the difficulty of min from this reduction – which is why the transformation itself must run in polynomial time.
NP-Completeness We demonstrate NP-hardness by mapping known hard problems, in polynomial time, to problems of unknown difficulty.
Mapping Often students get confused about which way to map problems. I have two problems, H (hard) and U (unknown); which way do I map to learn something: H→U or U→H? H→U, because then solving U also solves H, making a solution to U at least as hard as H. If you map it the other way, the hard problem could be doing MUCH MORE than the easy problem, and the easy problem could be solved as a side effect.
Cook-Levin Theorem CIRCUIT-SAT is NP-complete. We already showed it is in NP (we can verify correctness in polynomial time). To prove it is NP-hard, we have to show that every language in NP can be reduced to it. Let M be in NP, and let x be an input for M. Let y be a certificate (certifying there is a solution) that allows us to verify membership in M in polynomial time, p(n), by some algorithm D. Let S be a circuit of size at most O(p(n)²) that simulates a computer running D (details omitted…).
Some Thoughts about P and NP (Figure: the NP-complete problems, including CIRCUIT-SAT, live in NP outside P.) Belief: P is a proper subset of NP. Implication: the NP-complete problems are the hardest in NP. Why: because if we could solve an NP-complete problem in polynomial time, we could solve every problem in NP in polynomial time. That is, if any NP-complete problem is solvable in polynomial time, then P=NP. Since so many people have attempted without success to find polynomial-time solutions to NP-complete problems, showing your problem is NP-complete is equivalent to showing that a lot of smart people have worked on your problem and found no polynomial-time algorithm.
NP-Complete. How do we use it? Problem Reduction A language M is polynomial-time reducible to a language L if every instance x for M can be transformed in polynomial time to an instance x’ for L such that x is in M if and only if x’ is in L. Denote this by M ≤ L. CIRCUIT-SAT is NP-complete: CIRCUIT-SAT is in NP, and for every M in NP, M ≤ CIRCUIT-SAT.
Transitivity of Reducibility If A ≤ B and B ≤ C, then A ≤ C, since polynomials are closed under composition.
SAT A Boolean formula is a formula whose variables and operations are Boolean (0/1): (a+b+¬d+e)(¬a+¬c)(¬b+c+d+e)(a+¬c+¬e), where OR is +, AND is (implicit) multiplication, and NOT is ¬. This form is termed CNF (conjunctive normal form): a series of clauses connected with AND, each clause a series of literals connected with OR. It is called “normal” because every Boolean formula can be put in this form. Similarly, DNF (disjunctive normal form) is an OR of ANDs. DeMorgan’s law is helpful in switching between the two forms.
CNF-SAT CNF-SAT: Given a Boolean formula S, is S satisfiable, that is, can we assign 0’s and 1’s to the variables so that S is 1 (“true”)? Easy to see that CNF-SAT is in NP: Non-deterministically choose an assignment of 0’s and 1’s to the variables and then evaluate each clause. If they are all 1 (“true”), then the formula is satisfiable. NP-Completeness
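The “evaluate each clause” step above is exactly a polynomial-time certificate check. A minimal sketch (the function name `cnf_satisfied` and the signed-integer literal encoding are illustrative assumptions):

```python
def cnf_satisfied(clauses, assignment):
    """Evaluate a CNF formula under a 0/1 assignment.
    Each clause is a list of literals: a positive int i means
    variable i, a negative int -i means NOT variable i.
    The formula is satisfied iff every clause contains at
    least one true literal -- an O(formula size) check."""
    def literal(l):
        v = assignment[abs(l)]
        return v if l > 0 else 1 - v
    return all(any(literal(l) for l in clause) for clause in clauses)
```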
SAT is NP-complete Reduce CIRCUIT-SAT to SAT. Given a Boolean circuit, make a variable for every input and every gate output. We might be tempted to create the formula just by nesting the gate formulas, but the nesting could get pretty bad. Instead, create a sub-formula for each gate, characterizing its effect, and form the formula as the output variable ANDed with all these sub-formulas. Example: m((a+b)↔e)(c↔¬f)(d↔¬g)(e↔¬h)(ef↔i)… The formula is satisfiable if and only if the Boolean circuit is satisfiable.
But not yet in CNF We have the AND of sub-formulas, m((a+b)↔e)(c↔¬f)(d↔¬g)(e↔¬h)(ef↔i)…, but each sub-formula (e.g. e↔ab) is not itself an OR of literals. We change each sub-formula to CNF by building a truth table for it.
We could express e↔ab either positively or negatively. Positively (the rows where the equivalence holds): (abe + a¬b¬e + ¬ab¬e + ¬a¬b¬e). Negatively (the rows where it fails): ¬(ab¬e + a¬be + ¬abe + ¬a¬be). Since we want CNF, the opposite of the negatively stated answer is most helpful.
Converting to CNF, continued We need to worry about form, specifically about NOT. For each 0 (violating) row in the truth table, write out that row’s values (in DNF); then ¬B is the OR of all the 0-generating rows, and you get B by DeMorgan’s law. Example: c↔ab. ¬B = ab¬c + a¬bc + ¬abc + ¬a¬bc, so B = (¬a+¬b+c)(¬a+b+¬c)(a+¬b+¬c)(a+b+¬c). Repeat for all sub-formulas and AND them together.
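The truth-table-plus-DeMorgan recipe above mechanizes nicely: every violating row contributes one clause that forbids exactly that row. A sketch (the name `equiv_to_cnf` and the `(variable, required value)` literal encoding are illustrative assumptions):

```python
from itertools import product

def equiv_to_cnf(vars_, func):
    """Build CNF clauses for the constraint out == func(inputs)
    by enumerating the truth table.  Every row that VIOLATES the
    constraint contributes one clause forbidding that row
    (DeMorgan's law turns the OR of bad rows into an AND of
    clauses).  vars_ = (input names..., output name)."""
    clauses = []
    for row in product((0, 1), repeat=len(vars_)):
        *ins, out = row
        if func(*ins) != out:                # a forbidden (0) row
            clause = [(v, 1 - val)           # negate each setting:
                      for v, val in zip(vars_, row)]
            clauses.append(clause)           # (v, want) means "v == want"
    return clauses
```

For c↔ab this yields exactly the four clauses shown on the slide, one per violating row.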
3SAT The SAT problem is still NP-complete even if it is in CNF and every clause has just 3 literals (a variable or its negation): (a+b+¬d)(¬a+¬c+e)(¬b+d+e)(a+¬c+¬e) Reduction from SAT (See §13.3.1). NP-Completeness
Map CNF-SAT to 3SAT The SAT problem is still NP-complete even if it is in CNF and every clause has just three literals (3CNF-SAT). We must show we can map the general form to the restricted form, one clause at a time. If a clause has a single literal a, map it as (a+a+a). If a clause has two literals (a+b), map it as (a+b+b). If a clause has three literals, map it directly. If a clause has k literals, introduce additional variables b1…bk−3: (a1+a2+a3+...+ak) becomes (a1+a2+b1)(¬b1+a3+b2)(¬b2+a4+b3)…(¬bk−3+ak−1+ak). So we have proven that 3SAT is as difficult as CNF-SAT, as we can solve any CNF-SAT problem by converting it to a 3SAT problem.
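The per-clause mapping above is short enough to write out. This is a hypothetical sketch (the name `to_3sat`, the signed-integer literal encoding, and the `fresh` iterator of new variable numbers are all illustrative assumptions):

```python
def to_3sat(clause, fresh):
    """Split one CNF clause into equisatisfiable 3-literal clauses.
    Literals are nonzero ints (negative = negated); 'fresh' yields
    unused variable numbers for the chaining variables b1..bk-3."""
    k = len(clause)
    if k == 1:
        a = clause[0]
        return [[a, a, a]]                  # (a+a+a)
    if k == 2:
        a, b = clause
        return [[a, b, b]]                  # (a+b+b)
    if k == 3:
        return [list(clause)]               # already 3 literals
    b = next(fresh)
    # (a1+a2+b1), then recursively handle (not-b1 + a3 + ... + ak)
    return [[clause[0], clause[1], b]] + to_3sat([-b] + list(clause[2:]), fresh)
```

A k-literal clause becomes k−2 clauses, so the whole formula grows only linearly – the reduction is polynomial time.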
Reductions in NP-completeness proofs
Vertex Cover: a subset of vertices which touches every edge A vertex cover of a graph G=(V,E) is a subset W of V such that, for every edge (a,b) in E, a is in W or b is in W. VERTEX-COVER: Given a graph G and an integer K, does G have a vertex cover of size at most K? VERTEX-COVER is in NP: non-deterministically choose a subset W of size K and check that every edge is covered by W. Notice how 3SAT makes mappings easier, since there are few cases!
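The membership check described above is a one-liner; guessing W is the hard part. A minimal sketch (the name `is_vertex_cover` is an illustrative assumption):

```python
def is_vertex_cover(edges, W):
    """Certificate check for VERTEX-COVER: every edge must have
    at least one endpoint in W.  O(|E|) time once W is given."""
    W = set(W)
    return all(a in W or b in W for a, b in edges)
```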
Vertex-Cover is NP-complete A vertex cover of graph G=(V,E) is a subset W of V such that, for every edge (a,b) in E, a is in W or b is in W; we say every edge is “covered” by the set. Reduce 3SAT to VERTEX-COVER – dissimilar problems, so it is a pretty complicated mapping. Let S be a Boolean formula in CNF with each clause having three literals. For each variable x, create nodes for x and ¬x and connect the two. For each clause (x+¬y+z), create a triangle with a node representing each term of the clause, and connect these three nodes; the fact that they are connected indicates they are from the same clause.
3CNF → Vertex Cover: Note how 3CNF aids the construction Completing the construction: connect each literal in a clause triangle to its copy in a variable pair. Example: a clause (¬x+y+z). Let n = # of variables and m = # of clauses, and set K = n+2m: at least two nodes of every triangle will be needed in the cover, and at least one node of every true/false pair. So if we hope to achieve size K, we can’t use more than this minimum. Repeat for each clause, e.g. (a+b+c)(¬a+b+¬c)(¬b+¬c+¬d).
Vertex-Cover is NP-complete Example: (a+b+c)(¬a+b+¬c)(¬b+¬c+¬d). Label each triangle node (clause number, term number). The graph has a vertex cover of size K = 4+6 = 10 iff the formula is satisfiable. The cover will select a or ¬a for each variable pair, which maps directly to a truth assignment.
Clique (pronounced click or kleek) A clique of a graph G=(V,E) is a subgraph C that is fully-connected (every pair in C has an edge). Think of edges as being “we are friends” CLIQUE: Given a graph G and an integer K, is there a clique in G of size at least K? CLIQUE is in NP: non-deterministically choose a subset C of size K and check that every pair in C has an edge in G. This graph has a clique of size 5 NP-Completeness
CLIQUE is NP-complete Reduction from VERTEX-COVER. A graph G has a vertex cover of size K if and only if its complement G’ has a clique of size n−K. An edge exists in G’ iff that edge does not exist in G. (In the figure, the gray nodes are the solution in each problem.)
For the example below: show that a graph G has a vertex cover of size K if and only if its complement has a clique of size n−K. Why does this work? There can be no edge between unpicked vertices, or G wouldn’t have a cover without them; so the unpicked vertices are pairwise adjacent in the complement, forming a clique.
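The complement construction is mechanical, so the iff can be checked on small examples. A sketch (the names `complement` and `is_clique` are illustrative assumptions; vertices are numbered 0..n−1):

```python
def complement(n, edges):
    """Complement graph on vertices 0..n-1: a pair is an edge
    exactly when it is NOT an edge in the original graph."""
    present = {frozenset(e) for e in edges}
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if frozenset((i, j)) not in present]

def is_clique(edges, C):
    """Check that every pair of vertices in C is joined by an edge."""
    present = {frozenset(e) for e in edges}
    return all(frozenset((u, v)) in present
               for i, u in enumerate(C) for v in C[i + 1:])
```

For the path 0–1–2, the cover {1} of size K=1 leaves {0,2}, which is a clique of size n−K = 2 in the complement.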
Some Other NP-Complete Problems SET-COVER: Given a collection of m sets, are there K of these sets whose union is the same as the union of all m sets? NP-complete by reduction from VERTEX-COVER. SUBSET-SUM: Given a set of integers and a distinguished integer K, is there a subset of the integers that sums to K?
Some Other NP-Complete Problems 0/1 Knapsack: Given a collection of items with weights and benefits, is there a subset of weight at most W and benefit at least K? (Recall we could solve this in pseudo-polynomial time with dynamic programming.) NP-complete by reduction from SUBSET-SUM. Hamiltonian-Cycle: Given a graph G, is there a cycle in G that visits each vertex exactly once? NP-complete by reduction from VERTEX-COVER. Traveling Salesperson Tour: Given a complete weighted graph G, is there a cycle that visits each vertex and has total cost at most K? NP-complete by reduction from Hamiltonian-Cycle.
Approximation Algorithms While the complexity of finding exact solutions of all NP-complete problems is the same up to polynomial factors, this elegant unified picture disappears rather dramatically when one considers approximate solutions. Example: while vertex cover and independent set are equivalent for exact solutions (the complement of a vertex cover is an independent set), vertex cover has a simple factor-2 approximation algorithm (no worse than twice the optimal cover size), but independent set has been shown to be hard to approximate within any reasonable factor. The lack of a unified picture makes the pursuit of understanding and classifying problems based on their approximability an intriguing one, with diverse techniques used for studying different problems.
Backtracking and Branch and Bound (BAB) Backtracking: consider all possibilities in logical order; backtrack when a choice becomes illegal. Example, Knapsack: for each item, take it or leave it. When the knapsack gets too full (no legal next choice), backtrack to an earlier decision point. In the text, the most promising alternative is chosen first, but all are tried eventually. Consider the list of pending alternatives: with recursion/a stack you get depth-first; with a queue you get breadth-first.
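The take-it-or-leave-it backtracking for Knapsack can be sketched directly. This is an illustrative exhaustive version with no bounding yet (the name `knapsack_backtrack` is an assumption); it “backtracks” by never recursing into an overfull pack:

```python
def knapsack_backtrack(values, weights, capacity):
    """Exhaustive backtracking for 0/1 Knapsack: at each item,
    try 'take it' (only if it fits) then 'leave it'.  Depth-first
    via recursion; exponential time in the worst case."""
    best = 0
    def go(i, w, v):
        nonlocal best
        if i == len(values):
            best = max(best, v)
            return
        if w + weights[i] <= capacity:       # take item i (legal move)
            go(i + 1, w + weights[i], v + values[i])
        go(i + 1, w, v)                      # leave item i
    go(0, 0, 0)
    return best
```

Branch and bound, described next, adds pruning on top of exactly this search tree.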
BAB Don’t stop when one solution is found: it is promising, but not guaranteed to be best. Keep going until the best is found (sometimes known as a best-first strategy), keeping a “best so far” solution. Bound: if the lower bound on the cost of the current partial solution is worse than the “best so far”, discard it. The lower bound MUST be less than or equal to the cost of any solution derived from it; if it is too high, you would throw out good choices.
Example: pick a career with good pay, little stress, and little time investment. You need a way of ordering alternatives so you can try “best first”: maybe try the cheapest tuition first, or maybe you know college generally produces better income. You also need a way of evaluating partial solutions so you can prune the search when there is no potential for beating the best so far: maybe you know that taking over the family business would take 12 years but make $100K, so you can rule out alternatives that take as long for less money (say, staying at McDonald’s). Instead of depth first or breadth first, this gives best first.
Branch-and-Bound(problem instance x):
  F = {(x, Ø)}                              // frontier set of subproblems
  b = (+∞, Ø)                               // (cost, config) of best so far
  while F ≠ Ø:
    select from F the most promising (x, y)  // y is a certificate
    expand (x, y), yielding configs (x1, y1), …, (xk, yk)
    for each new config (xi, yi):
      if a solution is found for (xi, yi):
        if its cost c < the cost in b then b = (c, (xi, yi))
        else discard (xi, yi)
      else if deadend or lb((xi, yi)) > the cost in b:
        discard (xi, yi)
      else F = F ∪ {(xi, yi)}
  return b
Concerns: space is an issue if all possible intermediate states are stored. With iterative deepening, the space cost is only the height of the tree, not the breadth!
Approximation Ratios Optimization problems: we have a problem instance x that has many feasible “solutions”, and we are trying to minimize (or maximize) some cost function c(S) for a “solution” S to x. For example: finding a minimum spanning tree of a graph; finding a smallest vertex cover of a graph; finding a smallest traveling salesperson tour in a graph. An approximation produces a solution T. T is a k-approximation to the optimal solution OPT if c(T)/c(OPT) ≤ k (assuming a minimization problem; for a maximization problem the ratio is inverted).
Polynomial-Time Approximation Schemes A problem has a polynomial-time approximation scheme (PTAS) if it has a polynomial-time (1+ε)-approximation algorithm for any fixed ε>0 (ε can appear in the running time). You control how far from optimal you are. The problem needs to be rescalable: scaling the cost function gives an equivalent problem. 0/1 Knapsack has a PTAS, as we can just adjust the cost function (recall our dynamic programming solution was polynomial in the total cost). Why can’t you scale the cost way down? What happens? (Costs must be integers, so you lose accuracy.) The scheme has a running time that is O(n³/ε). Please see §13.4.1 in Goodrich-Tamassia for details.
Vertex Cover A vertex cover of a graph G=(V,E) is a subset W of V such that, for every (a,b) in E, a is in W or b is in W. OPT-VERTEX-COVER: Given a graph G, find a vertex cover of G of smallest size. Even given the optimal size, finding the cover is tough! OPT-VERTEX-COVER is NP-hard.
A 2-Approximation for Vertex Cover Every chosen edge e has both ends in C, but e must be covered by an optimal cover; hence, one end of e must be in OPT. Thus there are at most twice as many vertices in C as in OPT; that is, C is a 2-approximation of OPT. Running time: O(m).
Algorithm VertexCoverApprox(G):
  Input: graph G
  Output: a vertex cover C for G
  C ← empty set
  H ← G
  while H has edges:
    e ← H.removeEdge(H.anEdge())
    v ← H.origin(e)
    w ← H.destination(e)
    C.add(v); C.add(w)
    for each f incident to v or w:
      H.removeEdge(f)
  return C
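A runnable sketch of the algorithm above, using an edge list instead of the text’s graph ADT (the name `vertex_cover_approx` is an illustrative assumption):

```python
def vertex_cover_approx(edges):
    """2-approximation for vertex cover: repeatedly pick any
    remaining edge and add BOTH endpoints to the cover, then
    discard every edge touching either endpoint.  Any optimal
    cover contains at least one endpoint of each picked edge,
    so |C| <= 2 |OPT|."""
    C = set()
    remaining = list(edges)
    while remaining:
        a, b = remaining[0]
        C.update((a, b))
        # drop all edges now covered by a or b
        remaining = [e for e in remaining if a not in e and b not in e]
    return C
```

On a star graph the algorithm picks one edge, takes both its endpoints, and is done – size 2 versus the optimal 1, exactly the factor-2 bound.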
Simple solution It is simple, but not stupid. Many seemingly smarter heuristics can give a far worse performance, in the worst case. It is easy to complicate heuristics by adding more special cases or details. For example, if you just picked the edge whose endpoints had highest degree, it doesn’t improve the worst case. It just makes it harder to analyze. NP-Completeness
This seems like overkill, but why not just select one of the endpoints? That can give bad results (not abide by the bound): note what would happen if you always picked the outside point for each edge of a star – you would end up far worse than a 2-approximation.
What if you picked the vertex with greatest degree? Doesn’t always work either. In the case of ties (or near ties), you can get an approximation which is O(log n) times optimal (which isn’t a constant) NP-Completeness
Special Case of the Traveling Salesperson Problem OPT-TSP: Given a complete, weighted graph, find a cycle of minimum cost that visits each vertex. OPT-TSP is NP-hard. Special case: the edge weights satisfy the triangle inequality (which is common in many applications): w(a,b) + w(b,c) ≥ w(a,c) – you never get there quicker by going through an extra point. (Figure: triangle with w(a,b)=5, w(b,c)=4, w(a,c)=7.)
A 2-Approximation for the TSP Special Case (Figure: Euler tour P of MST M.)
Algorithm TSPApprox(G):
  Input: weighted complete graph G satisfying the triangle inequality
  Output: a TSP tour T for G
  M ← a minimum spanning tree for G
  P ← an Euler tour traversal of M, starting at some vertex s
      (consider a directed edge in each direction)
  T ← empty list
  for each vertex v in P (in traversal order):
    if this is v’s first appearance in P then T.insertLast(v)
  T.insertLast(s)   // the beginning node of the Euler tour
  return T
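A runnable sketch of the algorithm above, assuming the graph is given as a symmetric distance matrix (the name `tsp_approx` and the matrix input are illustrative assumptions). The DFS preorder of the MST plays the role of the shortcut Euler tour:

```python
def tsp_approx(dist):
    """2-approximation for metric TSP: build an MST with Prim's
    algorithm, then visit vertices in DFS preorder of the tree
    (which shortcuts the doubled Euler tour).  Assumes 'dist' is
    a symmetric matrix obeying the triangle inequality."""
    n = len(dist)
    # Prim's algorithm from vertex 0
    in_tree, parent = {0}, {}
    best = {v: (dist[0][v], 0) for v in range(1, n)}
    while len(in_tree) < n:
        v = min(best, key=lambda u: best[u][0])
        parent[v] = best[v][1]
        in_tree.add(v)
        del best[v]
        for u in best:
            if dist[v][u] < best[u][0]:
                best[u] = (dist[v][u], v)
    children = {v: [] for v in range(n)}
    for v, p in parent.items():
        children[p].append(v)
    # DFS preorder of the MST = shortcut Euler tour
    tour, stack = [], [0]
    while stack:
        v = stack.pop()
        tour.append(v)
        stack.extend(reversed(children[v]))
    return tour + [0]          # close the cycle back at the start
```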
Note: this may produce a tour that crosses itself. Try it with a suitable set of points.
A 2-Approximation for the TSP Special Case – Proof The optimal tour is a spanning tour; hence |M| ≤ |OPT| (the MST costs at most as much as any spanning structure). The Euler tour P visits each edge of M twice; hence |P| = 2|M|. Each time we shortcut a vertex in the Euler tour we do not increase the total length, by the triangle inequality (w(a,b) + w(b,c) ≥ w(a,c)); hence |T| ≤ |P|. Therefore |T| ≤ |P| = 2|M| ≤ 2|OPT|.
TSP Approximation Is there any way to do a Monte Carlo approach? Lots of work. Could use a least squares estimate. NP-Completeness
Set Cover Example: we need a set of skills to solve a problem and want to hire the fewest people. OPT-SET-COVER: Given a collection F of m sets, find the smallest number of them whose union is the same as the union of all m sets. OPT-SET-COVER is NP-hard. The greedy approach produces an O(log n)-approximation algorithm; see §13.4.4 for details.
Algorithm SetCoverApprox(F):
  Input: a collection of sets F = {S1, S2, …, Sm}
  Output: a subcollection C with the same union
  C ← empty set
  U ← union of S1…Sm
  while U is not empty:
    Si ← the set in F with the most elements in U
    F.remove(Si)
    C.add(Si)
    remove all elements of Si from U
  return C
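A runnable sketch of the greedy algorithm above, using Python sets (the name `set_cover_greedy` is an illustrative assumption):

```python
def set_cover_greedy(sets):
    """Greedy set cover: repeatedly take the set that covers the
    most still-uncovered elements.  Gives an O(log n)-approximation
    of the optimal cover, as the analysis on the following slides
    shows."""
    U = set().union(*sets)          # elements still uncovered
    C, remaining = [], list(sets)
    while U:
        best = max(remaining, key=lambda S: len(S & U))
        remaining.remove(best)
        C.append(best)
        U -= best
    return C
```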
Show Greedy Set Cover is a polynomial-time p(n)-approximation, where p(n) = H(max|S|). What is a harmonic number and how is it approximated? Euler showed: H(d) = Σ(i=1..d) 1/i ≈ ln(d) = O(log d).
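The harmonic number and its logarithmic growth are easy to check numerically (the function name `harmonic` is an illustrative assumption; H(d) − ln(d) approaches the Euler-Mascheroni constant ≈ 0.577):

```python
from math import log

def harmonic(d):
    """H(d) = sum of 1/i for i = 1..d; grows like ln(d) + 0.577..."""
    return sum(1.0 / i for i in range(1, d + 1))
```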
Amortized Methods These help us not be misled by some cases being really bad. We assign cost 1 to each set selected by our approximation algorithm and distribute this cost over the elements covered for the first time, then use these costs to derive the relationship between the size of our cover C and the optimal cover C*. (Fiddle with the numbers to get a comparison between C and C*.) We pay 1 for each set chosen, so if a chosen set covers k previously uncovered elements, each of those k elements is charged 1/k. Let Si denote the ith subset selected. Given this assignment, consider the sum of all charges of the elements in a set (not just the newly covered ones). What is its bound?
If x is covered for the first time by Si, then cx = 1/|Si − (S1 ∪ S2 ∪ … ∪ Si−1)|. Since each chosen set costs 1, |C| = Σx cx. Now, if we were willing to pay for every element each time it occurred in a set of the optimal cover, the cost assigned to the optimal cover C* is Σ(S in C*) Σ(x in S) cx. Each x is in at least one set of C* (and some occur in several), but since we don’t know anything about their repetition, we just count them all: Σ(S in C*) Σ(x in S) cx ≥ Σ(each x) cx = |C| (as some elements may be counted more than once).
Assume (for a minute) that Σ(x in S) cx ≤ H(|S|) = O(log |S|), where H(|S|) is the harmonic number. Then |C| ≤ Σ(S in C*) Σ(x in S) cx ≤ Σ(S in C*) H(|S|) ≤ |C*| · H(max|Si|) = |C*| · O(log max|S|) – so our bound isn’t more than a factor of log|S| too much.
Show Σ(x in S) cx ≤ H(|S|) (if we add up the charges of all the elements in S, not just the ones we cover now). Order the elements of S by the order in which they are covered by the greedy algorithm. The best thing that can happen is that all elements of S are covered at once, so the total is 1. The worst thing is that each is covered by a different set. If |S| = q, then the first element covered (because of some other Si) must have charge at most 1/q, as otherwise S itself – with q uncovered elements – would have been selected instead of Si. Similarly, the next element covered must have charge at most 1/(q−1). Adding these all up gives at most H(|S|).
Show Σ(x in S) cx ≤ H(|S|) (formally). Consider each Si in the order it is added to C. Let ui = the number of elements of S still uncovered after Si has been selected: ui = |S − (S1 ∪ S2 ∪ … ∪ Si)|. Thus ui−1 − ui elements are covered for the first time by Si.
Σ(x in S) cx = Σ(i=1..k) (ui−1 − ui) · 1/|Si − (S1 ∪ … ∪ Si−1)|, but |Si − (S1 ∪ … ∪ Si−1)| ≥ |S − (S1 ∪ … ∪ Si−1)| = ui−1, so Σ(x in S) cx ≤ Σ(i=1..k) (ui−1 − ui) · 1/ui−1. We can rewrite a product as a sum – for example 4z = Σ(i=1..4) z (z is constant with respect to i). Thus (ui−1 − ui) · 1/ui−1 = Σ(j=ui+1..ui−1) 1/ui−1 (remember ui < ui−1), so Σ(i=1..k) (ui−1 − ui) · 1/ui−1 = Σ(i=1..k) Σ(j=ui+1..ui−1) 1/ui−1 ≤ Σ(i=1..k) Σ(j=ui+1..ui−1) 1/j, since j ≤ ui−1.
Σ(i=1..k) Σ(j=ui+1..ui−1) 1/j = Σ(i=1..k) (Σ(j=1..ui−1) 1/j − Σ(j=1..ui) 1/j) = Σ(i=1..k) (H(ui−1) − H(ui)) = H(u0) − H(uk) (because the sum telescopes). But u0 = |S| and uk = 0, so the total is H(|S|), which is what we wanted to prove.