MA/CSSE 473 Day 37 Student Questions Kruskal data structures


MA/CSSE 473 Day 37: Student Questions, Kruskal data structures, Disjoint Set ADT, Complexity intro

Data Structures for Kruskal

- A sorted list of edges (edge list, not adjacency list). Edge e has fields e.v and e.w (the numbers of its end vertices).
- Disjoint subsets of vertices, representing the connected components at each stage. Start with n subsets, each containing one vertex; end with one subset containing all vertices.

The Disjoint Set ADT has three operations:
- makeset(i): creates a singleton set containing vertex i.
- findset(i): returns the "canonical" member of i's subset; i.e., if i and j are elements of the same subset, findset(i) == findset(j).
- union(i, j): merges the subsets containing i and j into a single subset.

Example of operations

What are the sets after these operations?
makeset(1), makeset(2), makeset(3), makeset(4), makeset(5), makeset(6)
union(4, 6)
union(1, 3)
union(4, 5)
findset(2)
findset(5)
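The sequence of operations above can be traced with a minimal sketch of the Disjoint Set ADT. This simulation stores each set as a shared Python set (not the tree representation developed later); the choice of the smallest element as the "canonical" member is an assumption for illustration.

```python
# Minimal simulation of the Disjoint Set ADT: each element maps to the
# (shared) set object that currently contains it.

sets = {}  # element -> the set it belongs to

def makeset(i):
    sets[i] = {i}

def findset(i):
    return min(sets[i])  # canonical member: smallest element (an assumed convention)

def union(i, j):
    merged = sets[i] | sets[j]
    for x in merged:       # every member now shares the merged set
        sets[x] = merged

for i in range(1, 7):
    makeset(i)
union(4, 6)
union(1, 3)
union(4, 5)

partition = sorted(map(sorted, {frozenset(s) for s in sets.values()}))
print(partition)               # [[1, 3], [2], [4, 5, 6]]
print(findset(2), findset(5))  # 2 4
```

So the answer to the slide's question: the sets are {1, 3}, {2}, and {4, 5, 6}, and findset(2) and findset(5) return the canonical members of {2} and {4, 5, 6} respectively.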

Kruskal Algorithm

Assume vertices are numbered 1..n (n = |V|).

    Sort edge list by weight (increasing order)
    for i = 1..n:
        makeset(i)
    i, count, result = 1, 0, []
    while count < n-1:
        if findset(edgelist[i].v) != findset(edgelist[i].w):
            result += [edgelist[i]]
            count += 1
            union(edgelist[i].v, edgelist[i].w)
        i += 1
    return result

What can we say about the efficiency of this algorithm (in terms of n = |V| and m = |E|)?
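The pseudocode runs almost verbatim in Python. In this sketch, edges are (weight, v, w) tuples rather than objects with .v/.w fields, the disjoint-set operations are the naive tree versions from the following slides, and the graph is assumed connected (otherwise the loop would run off the end of the edge list).

```python
# Runnable sketch of Kruskal's algorithm using a naive disjoint-set forest.

parent = {}

def makeset(i):
    parent[i] = i

def findset(i):
    while i != parent[i]:
        i = parent[i]
    return i

def union(i, j):
    parent[findset(i)] = findset(j)

def kruskal(n, edges):
    """n vertices numbered 1..n; edges is a list of (weight, v, w) tuples."""
    edgelist = sorted(edges)            # sort by weight, increasing
    for i in range(1, n + 1):
        makeset(i)
    result, count, i = [], 0, 0
    while count < n - 1:                # assumes the graph is connected
        weight, v, w = edgelist[i]
        if findset(v) != findset(w):    # edge joins two distinct components
            result.append(edgelist[i])
            count += 1
            union(v, w)
        i += 1
    return result

edges = [(1, 1, 2), (3, 1, 3), (2, 2, 3), (4, 3, 4)]
print(kruskal(4, edges))   # [(1, 1, 2), (2, 2, 3), (4, 3, 4)]
```

The edge (3, 1, 3) is skipped because vertices 1 and 3 are already in the same component when it is examined.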

Implement Disjoint Set ADT

Each disjoint set is a tree, with the "marked" (canonical) element as its root. Efficient representation of these trees: an array called parent, where parent[i] contains the index of i's parent. If i is a root, parent[i] == i.

[Figure: a forest of trees over vertices 1..8 and the corresponding parent array, indexed 1..8]

Using this representation

makeset(i), findset(i), mergetrees(i, j) (which assumes that i and j are the marked elements of different sets), and union(i, j) (which assumes that i and j are elements of different sets):

    def makeset1(i):
        parent[i] = i

    def findset1(i):
        while i != parent[i]:
            i = parent[i]
        return i

    def mergetrees1(i, j):
        parent[i] = j

    def union1(i, j):
        mergetrees1(findset1(i), findset1(j))

Analysis

Assume that we do n makeset operations followed by m union/find operations.

- Time for the n makeset calls: n.
- Worst-case time for one findset: n (a tree can degenerate into a chain).
- union calls findset twice, so its worst case is also O(n).
- Worst case for all m union/find operations: nm, so the total is O(n + nm).
- If m < n, the maximum height of any tree is m, so the total is O(n + m²).
- In general: O(n + m · min(n, m)).

If we can keep the trees from getting so tall, perhaps we can do better. How can we do that?
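The worst case is easy to exhibit: the naive mergetrees always hangs one root under the other, so a bad sequence of unions builds a chain of depth n-1, making a single findset walk the entire structure. A small sketch (element and function names are illustrative):

```python
# Building the degenerate chain 0 -> 1 -> ... -> n-1 that makes the
# naive findset take linear time.

n = 1000
parent = list(range(n))      # parent[i] == i means i is a root

def findset_depth(i):
    """Return (root, number of parent links followed)."""
    steps = 0
    while i != parent[i]:
        i = parent[i]
        steps += 1
    return i, steps

# Repeatedly make i's root a child of i+1: each union lengthens the chain.
for i in range(n - 1):
    parent[i] = i + 1

root, steps = findset_depth(0)
print(root, steps)   # 999 999: findset from the chain's end follows n-1 links
```

This is exactly the O(n) per-operation worst case counted in the analysis above.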

Can we keep the trees from growing so fast?

Make the shorter tree the child of the taller one. What do we need to add to the representation? A height array. We rewrite makeset and mergetrees; findset and union are unchanged.

    def makeset2(i):
        parent[i] = i
        height[i] = 0

    def mergetrees2(i, j):
        if height[i] < height[j]:
            parent[i] = j
        elif height[i] > height[j]:
            parent[j] = i
        else:
            parent[i] = j
            height[j] = height[j] + 1

What can we say about the maximum height of a k-node tree?

Theorem: the maximum height of a k-node tree T produced by these algorithms is lg k.

Base case: k = 1. A one-node tree has height 0 = lg 1.

Induction step: Let T be a k-node tree, k > 1. Induction hypothesis: for all p < k, the height of a p-node tree is at most lg p. Since k > 1, T must be the union of two trees: T1 with k1 nodes and height h1, and T2 with k2 nodes and height h2.

Case 1: h1 ≠ h2. The height of T is max{h1, h2} ≤ max{lg k1, lg k2} ≤ lg k.

Case 2: h1 = h2. WLOG assume k1 ≥ k2; then k2 ≤ k/2. The height of T is 1 + h2 ≤ 1 + lg k2 ≤ 1 + lg(k/2) = 1 + lg k − 1 = lg k.
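The theorem can be checked empirically: do a batch of random unions with union-by-height and verify that every resulting tree of k nodes has height at most lg k. This sketch reuses the slide's makeset2/mergetrees2 logic under assumed names.

```python
# Empirical check of the lg k height bound for union by height.

import math
import random

n = 512
parent = list(range(n))
height = [0] * n

def findset(i):
    while i != parent[i]:
        i = parent[i]
    return i

def mergetrees(i, j):        # i and j must be roots of different trees
    if height[i] < height[j]:
        parent[i] = j
    elif height[i] > height[j]:
        parent[j] = i
    else:
        parent[i] = j
        height[j] += 1

random.seed(473)             # arbitrary seed for reproducibility
for _ in range(n - 1):
    i, j = findset(random.randrange(n)), findset(random.randrange(n))
    if i != j:
        mergetrees(i, j)

def tree_size(r):
    return sum(1 for x in range(n) if findset(x) == r)

roots = {findset(x) for x in range(n)}
ok = all(height[r] <= math.log2(tree_size(r)) for r in roots)
print(ok)   # True: each k-node tree has height <= lg k
```

The bound holds for any union sequence, not just this random one; that is exactly what the induction proves.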

Worst-case running time

Again, assume n makeset operations followed by m union/find operations.

- The n makesets still take time n.
- The m union and find operations now take O(m log n), since tree height is at most lg n.
- Altogether: O(n + m log n).
- If m < n, we can reduce this to O(n + m log m).

Speed it up a little more

Path compression: whenever we do a findset operation, change the parent pointer of each node that we pass through on the way to the root, so that it now points directly to the root.

Replace the height array by a rank array, since the stored value is now only an upper bound for the height.

Look at makeset, findset, and mergetrees (on the next slides).

Makeset

This algorithm represents the set {i} as a one-node tree and initializes its rank to 0.

    def makeset3(i):
        parent[i] = i
        rank[i] = 0

Findset

This algorithm returns the root of the tree to which i belongs and makes every node on the path from i to the root (except the root itself) a child of the root.

    def findset(i):
        root = i
        while root != parent[root]:
            root = parent[root]
        j = parent[i]
        while j != root:
            parent[i] = root
            i = j
            j = parent[i]
        return root

Mergetrees

This algorithm receives as input the roots of two distinct trees and combines them by making the root of the tree of smaller rank a child of the other root. If the trees have the same rank, we arbitrarily make the root of the first tree a child of the other root.

    def mergetrees(i, j):
        if rank[i] < rank[j]:
            parent[i] = j
        elif rank[i] > rank[j]:
            parent[j] = i
        else:
            parent[i] = j
            rank[j] = rank[j] + 1
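Putting the pieces together, here is a self-contained sketch of union by rank plus path compression. The union wrapper and the two-pass compression loop are one common way to write it (the iterative structure mirrors the slides; 0-based element numbering is an assumption of this sketch).

```python
# Disjoint-set forest with union by rank and path compression.

N = 8
parent = list(range(N))
rank = [0] * N

def findset(i):
    root = i
    while root != parent[root]:       # first pass: locate the root
        root = parent[root]
    while parent[i] != root:          # second pass: compress the path
        parent[i], i = root, parent[i]
    return root

def union(i, j):
    ri, rj = findset(i), findset(j)
    if ri == rj:
        return                        # already in the same set
    if rank[ri] < rank[rj]:
        parent[ri] = rj
    elif rank[ri] > rank[rj]:
        parent[rj] = ri
    else:                             # equal ranks: ri becomes the child
        parent[ri] = rj
        rank[rj] += 1

for a, b in [(0, 1), (1, 2), (2, 3), (4, 5), (5, 6)]:
    union(a, b)
print(findset(0) == findset(3), findset(4) == findset(7))
# True False: {0,1,2,3} is one set; 7 is still a singleton
```

After any findset, every node on the search path points directly at the root, which is what keeps later operations nearly constant-time.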

Analysis

It's complicated! R. E. Tarjan proved (1975)*: let t = m + n. The worst-case running time is Θ(t α(t, n)), where α is a function with an extremely slow growth rate: α(t, n) ≤ 4 for all n ≤ 10^19728. Thus the amortized time for each operation is essentially constant.

* According to Algorithms by R. Johnsonbaugh and M. Schaefer, Prentice-Hall, 2004, pages 160-161.

Intro to Computational Complexity: polynomial-time algorithms

The Law of the Algorithm Jungle

Polynomial good, exponential bad! The latter is obvious; the former may need some explanation. We say that polynomial-time problems are tractable and exponential-time problems are intractable.

Polynomial time vs. exponential time

What's so good about polynomial time?
- It's not exponential! We can't say that every polynomial-time algorithm has an acceptable running time, but it is certain that if an algorithm doesn't run in polynomial time, it only works for small inputs.
- Polynomial time is closed under standard operations: if f(t) and g(t) are polynomials, so is f(g(t)); it is also closed under sum, difference, and product.
- Almost all of the algorithms we have studied run in polynomial time, except those (like permutation and subset generation) whose output size is exponential.

Decision problems

When we define the class P of "polynomial-time problems", we will restrict ourselves to decision problems. Almost any problem can be rephrased as a decision problem. Basically, a decision problem is a question about some input that has two possible answers: yes and no. A problem instance is a combination of the problem and a specific input.

Decision problem definition

The statement of a decision problem has two parts:
- The instance description part defines the information expected in the input.
- The question part states the actual yes-or-no question; the question refers to variables that are defined in the instance description.

Decision problem examples

Definition: In a graph G = (V, E), a clique C is a subset of V such that for all distinct u and v in C, the edge (u, v) is in E.

Clique decision problem
Instance: an undirected graph G = (V, E) and an integer k.
Question: Does G contain a clique of k vertices?

k-Clique decision problem
Instance: an undirected graph G = (V, E). Note that here k is some constant, independent of the problem instance.
Question: Does G contain a clique of k vertices?

Decision problem example

Definition: The chromatic number of a graph G = (V, E) is the smallest number of colors needed to color G so that no two adjacent vertices have the same color.

Graph Coloring optimization problem
Instance: an undirected graph G = (V, E).
Problem: Find G's chromatic number and a coloring that realizes it.

Graph Coloring decision problem
Instance: an undirected graph G = (V, E) and an integer k > 0.
Question: Is there a coloring of G that uses no more than k colors?

Almost every optimization problem can be expressed in decision-problem form.

Decision problem example

Definition: Suppose we have an unlimited number of bins, each with capacity 1.0, and n objects with sizes s1, …, sn, where 0 < si ≤ 1 (all si rational).

Bin Packing optimization problem
Instance: s1, …, sn as described above.
Problem: Find the smallest number of bins into which the n objects can be packed.

Bin Packing decision problem
Instance: s1, …, sn as described above, and an integer k.
Question: Can the n objects be packed into k bins?
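The decision version can at least be decided by brute force: try every assignment of the n objects to the k bins. This sketch (function name is illustrative) makes the exponential cost visible: there are k**n assignments to check.

```python
# Brute-force decider for the Bin Packing decision problem.

from itertools import product

def can_pack(sizes, k):
    """Can the objects with the given sizes be packed into k bins of capacity 1.0?"""
    n = len(sizes)
    for assignment in product(range(k), repeat=n):   # k**n assignments
        loads = [0.0] * k
        for size, bin_index in zip(sizes, assignment):
            loads[bin_index] += size
        if all(load <= 1.0 for load in loads):
            return True                              # a feasible packing exists
    return False

print(can_pack([0.5, 0.5, 0.5, 0.5], 2))   # True: two bins, each 0.5 + 0.5
print(can_pack([0.6, 0.6, 0.6], 2))        # False: some bin would hold 1.2
```

An answer in exponential time is easy; the open question is whether the yes-instances can be decided in polynomial time.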

Reduction

Suppose we want to solve problem p, and there is another problem q. Suppose also that we have a function T that takes an input x for p and produces T(x), an input for q, such that the correct answer for p with input x is yes if and only if the correct answer for q with input T(x) is yes. We then say that p is reducible to q, and we write p ≤ q. If there is an algorithm for q, we can compose T with that algorithm to get an algorithm for p. If T is a function with polynomially bounded running time, we say that p is polynomially reducible to q, and we write p ≤P q. From now on, "reducible" means polynomially reducible.

Classic 473 reduction Moldy Chocolate is reducible to 4-pile Nim

Definition of the class P

Definition: An algorithm is polynomially bounded if its worst-case complexity is big-O of a polynomial function of the input size n, i.e., if there is a single polynomial p such that for each input of size n, the algorithm terminates after at most p(n) steps.

Definition: A problem is polynomially bounded if there is a polynomially bounded algorithm that solves it.

The class P is the class of decision problems that are polynomially bounded. Informally (with slight abuse of notation), we also say that polynomially bounded optimization problems are in P.

Example of a problem in P

Shortest Path
Instance: a weighted graph G = (V, E) with n vertices (each edge e is labeled with a non-negative weight w(e)), two vertices v and w, and a number k.
Question: Is there a path in G from v to w whose total weight is ≤ k?

How do we know it's in P? Answer: Dijkstra's algorithm.
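A sketch of that answer: run Dijkstra's algorithm (here with a binary heap, giving O((n + m) log n) time, which is polynomial) and compare the resulting distance against k. The adjacency-list representation as a dict is an assumption of this sketch.

```python
# Deciding Shortest Path via Dijkstra's algorithm.
# graph: dict mapping each vertex to a list of (neighbor, weight) pairs.

import heapq

def shortest_path_at_most(graph, v, w, k):
    """Is there a path from v to w of total weight <= k?"""
    dist = {v: 0}
    pq = [(0, v)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float('inf')):
            continue                       # stale queue entry
        for x, wt in graph.get(u, []):
            nd = d + wt
            if nd < dist.get(x, float('inf')):
                dist[x] = nd
                heapq.heappush(pq, (nd, x))
    return dist.get(w, float('inf')) <= k

g = {'a': [('b', 2), ('c', 5)], 'b': [('c', 1)]}
print(shortest_path_at_most(g, 'a', 'c', 3))  # True: a->b->c has weight 3
print(shortest_path_at_most(g, 'a', 'c', 2))  # False
```

Because a polynomially bounded algorithm decides the question, Shortest Path is in P.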

Example: Clique problems

It is known that we can determine whether a graph with n vertices has a k-clique in time O(k² n^k).

Clique decision problem 1
Instance: an undirected graph G = (V, E) and an integer k.
Question: Does G contain a clique of k vertices?

Clique decision problem 2
Instance: an undirected graph G = (V, E), where k is some constant, independent of the problem instance.
Question: Does G contain a clique of k vertices?

Is either of these decision problems in P? The first is not known to be in P; the second is in P, since for fixed k, O(k² n^k) is polynomial in n.
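The algorithm behind that O(k² n^k) bound is simple exhaustive search: examine every k-subset of vertices (O(n^k) of them) and check all O(k²) pairs within each. A sketch (function name is illustrative):

```python
# Brute-force k-clique check: polynomial in n only when k is a constant.

from itertools import combinations

def has_k_clique(vertices, edges, k):
    edgeset = {frozenset(e) for e in edges}   # undirected edge lookup
    return any(
        all(frozenset((u, v)) in edgeset for u, v in combinations(subset, 2))
        for subset in combinations(vertices, k)   # all O(n^k) k-subsets
    )

V = [1, 2, 3, 4]
E = [(1, 2), (1, 3), (2, 3), (3, 4)]
print(has_k_clique(V, E, 3))   # True: {1, 2, 3} is a triangle
print(has_k_clique(V, E, 4))   # False: vertex 4 only touches vertex 3
```

When k is part of the input, n^k is not a polynomial in the input size, which is why problem 1 is not known to be in P.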

The problem class NP NP stands for Nondeterministic Polynomial time. The first stage assumes a “guess” of a possible solution. Can we verify whether the proposed solution really is a solution in polynomial time?

More details

Example: Graph Coloring. Given a graph G with N vertices, can it be colored with k colors? A solution is an actual k-coloring. A "proposed solution" is simply something that is in the right form for a solution: for example, a coloring that may or may not use only k colors, and may or may not give distinct colors to adjacent nodes. The problem is in NP iff there is a polynomial-time (in N) algorithm that can check a proposed solution to see whether it really is a solution.
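For Graph Coloring that checking algorithm is easy to exhibit: verify that the proposed coloring uses at most k colors and that no edge joins two vertices of the same color, both in polynomial time. A sketch (the coloring is assumed to be given as a dict from vertex to color):

```python
# Polynomial-time verifier for a proposed k-coloring -- the check that
# places Graph Coloring in NP.

def is_valid_k_coloring(vertices, edges, coloring, k):
    if len({coloring[v] for v in vertices}) > k:
        return False                                  # too many colors used
    return all(coloring[u] != coloring[v] for u, v in edges)

V = [1, 2, 3]
E = [(1, 2), (2, 3)]
print(is_valid_k_coloring(V, E, {1: 'r', 2: 'g', 3: 'r'}, 2))  # True
print(is_valid_k_coloring(V, E, {1: 'r', 2: 'r', 3: 'g'}, 2))  # False: edge (1,2)
```

The verifier runs in O(|V| + |E|) time, so a yes-instance can be certified quickly even though finding a coloring may be hard.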

Still more details

A nondeterministic algorithm has two phases and an output step:
1. The nondeterministic "guessing" phase, in which the proposed solution is produced. It will be a solution if there is one.
2. The deterministic verifying phase, in which the proposed solution is checked to see if it is indeed a solution.
3. Output "yes" or "no".