Kruskal’s algorithm for MST and Special Data Structures: Disjoint Sets

Kruskal’s algorithm for MST and Special Data Structures: Disjoint Sets Algorithm Design and Analysis 2015 - Week 5 http://bigfoot.cs.upt.ro/~ioana/algo/ Bibliography: [CLRS] – Chap 23, Chap 21

General algorithm for growing an MST - review

GENERIC-MST
1   A = {}
2   while A does not form a spanning tree
3       find an edge (u,v) that is safe for A
4       A = A + {(u,v)}
5   return A

Crucial question: how do we find the safe edge needed in line 3?
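A minimal executable sketch of this template (not from the slides; find_safe_edge is a hypothetical callback standing for line 3 - Prim's and Kruskal's algorithms are two different ways of realizing it):

def generic_mst(vertices, edges, find_safe_edge):
    # edges: iterable of (u, v, weight) triples over a connected graph
    # find_safe_edge: hypothetical strategy returning an edge that is safe for A
    A = set()
    while len(A) < len(vertices) - 1:   # a spanning tree of a connected graph has |V|-1 edges
        A.add(find_safe_edge(A, vertices, edges))
    return A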

Approaches for growing A

A grows as a single tree. Loop invariant: prior to each iteration, A is a subtree of some minimum spanning tree. (Prim's algorithm)

A can start as a forest of trees. Loop invariant: prior to each iteration, A is a subset of some minimum spanning tree. (Kruskal's algorithm)

[Figure: the same weighted example graph, grown once as a single tree and once as a forest]

Review: Idea of Prim's algorithm

Given G = (V, E). Output: an MST A.

  Select an arbitrary vertex v; AV = {v}; A = {}
  While (AV != V)
      (X) find a vertex u ∈ V-AV that connects to a vertex v ∈ AV such that
          w(u, v) ≤ w(x, y) for any x ∈ V-AV and y ∈ AV
      AV = AV ∪ {u}; A = A ∪ {(u, v)}
  EndWhile
  Return A

The while loop has |V| iterations, so the running time of the algorithm is |V| multiplied by the running time of operation (X).

Review: Prim's Algorithm

MST-Prim(G, w, r)
  Q = G.V
  for each u ∈ Q
      key[u] = ∞
      p[u] = null
  key[r] = 0
  while (Q not empty)
      u = ExtractMin(Q)
      for each v ∈ G.Adj[u]
          if (v ∈ Q and w(u,v) < key[v])
              p[v] = u
              DecreaseKey(v, w(u,v))

The running time of Prim's algorithm depends on:
  How we implement the min-priority queue Q
  How we implement the graph
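A runnable Python sketch of this review (an assumption here: the graph is an adjacency dict mapping each vertex to a list of (neighbor, weight) pairs, and heapq with lazy deletion replaces DecreaseKey):

import heapq

def prim_mst(graph, root):
    # graph: dict vertex -> list of (neighbor, weight); assumed connected and undirected
    key = {u: float('inf') for u in graph}
    parent = {u: None for u in graph}
    key[root] = 0
    in_mst = set()
    pq = [(0, root)]                          # min-heap of (key, vertex)
    while pq:
        _, u = heapq.heappop(pq)
        if u in in_mst:
            continue                          # stale heap entry, skip it
        in_mst.add(u)
        for v, w in graph[u]:
            if v not in in_mst and w < key[v]:
                key[v] = w
                parent[v] = u
                heapq.heappush(pq, (w, v))    # push again instead of DecreaseKey
    return [(parent[v], v, key[v]) for v in graph if parent[v] is not None]

For example, prim_mst({'a': [('b', 4), ('c', 1)], 'b': [('a', 4), ('c', 2)], 'c': [('a', 1), ('b', 2)]}, 'a') returns the two MST edges ('a','c') with weight 1 and ('c','b') with weight 2.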

Review: Prim's algorithm - Complexity

Graph \ Queue                            | Distance Array                 | Min-Heap PQ
Adjacency Matrix                         | V*V (good if graph is dense)   | V*V*logV
Adjacency Structure                      | V*V + E                        | (V+E)*logV = E*logV (good if graph is sparse: E << V*V)
Other (graph as one big list of edges)   | V*E                            | V*E*logV

Review: Prim's algorithm - Conclusion

A solution to a problem is good if:
  The algorithm is correct
  The implementation uses adequate data structures that help make the most of the algorithm

Idea of Kruskal's Algorithm

Build the MST starting as a forest. Initially, the trees of the forest are the vertices of the graph (no edges).
In each step, add the edge with the smallest weight that does not create a cycle (we will prove that this is a safe edge).
Continue until the forest is a single tree: this is the minimum spanning tree.

Example Sorted edges: (g,h) (c,i) (f,g) (a,b) (c,f) (g,i) (c,d) (h,i) (a,h) (b,c) (d, e) (b,h) (d,f)

Kruskal's algorithm - outline

MST-KRUSKAL(G=(V,E), w)
  A = {}    (initialize as an empty forest)
  for each vertex v in V
      Create-tree(v) and add it to forest A
  sort the edges E of G into increasing order by weight w
  for each edge (u,v) in E, taken in increasing order by weight
      if (TreeOf(u) <> TreeOf(v))
          Add (u,v) to A
          (the two trees are now united into a new tree, which replaces them in the forest)
  return A

Correctness of Kruskal's algorithm

We need to prove that the edge added in each step (the edge with the smallest weight that does not create a cycle) is a safe edge.

Lemma: Let (V,A) be a subgraph of an MST of G=(V,E), and let e=(u,v) in E-A be an edge such that:
  1. (V, A+{e}) has no cycles
  2. e has the minimum weight among all edges in E-A satisfying condition 1
Then (V, A+{e}) is also a subgraph of an MST of G.

Proof of the Lemma

Let T be any MST with (V,A) as a subgraph. If e is in T, we are done.
Suppose that e=(u,v) is not in T.
  There is a unique path from u to v in T; this path contains at least one edge e' in E-A (otherwise A+{e} would contain a cycle), and e' ≠ e because e is not in T but e' is in T.
  (V, A+{e'}) has no cycles, because it is included in T, so e' also satisfies condition (1).
  weight(e) <= weight(e'), because e was selected according to condition (2) of the lemma.
  Consider the new tree T' = T + {e} - {e'}. It is a spanning tree (removing e' splits T into two components, and e reconnects them), with weight(T') = weight(T) + weight(e) - weight(e') <= weight(T).
  If weight(e) < weight(e'), then weight(T') < weight(T), contradicting that T is an MST. So weight(e) = weight(e') and T' is another MST.
In both cases there is an MST (T or T') that contains both A and e, so (V, A+{e}) is a subgraph of an MST, which proves the lemma.

Kruskal's algorithm - outline

MST-KRUSKAL(G=(V,E), w)
  A = {}    (initialize as an empty forest)
  for each vertex v in V
      Create-tree(v) and add it to forest A
  sort the edges E of G into increasing order by weight w
  for each edge (u,v) in E, taken in increasing order by weight
      if (TreeOf(u) <> TreeOf(v))
          Add (u,v) to A
          (the two trees are now united into a new tree, which replaces them in the forest)
  return A

Questions for an efficient implementation:
  How to represent the forest of trees?
  How to test whether u and v are in the same tree or not?
  How to replace two trees of the forest by their union?

Kruskal's algorithm - Implementation

Questions:
  How to represent the forest of trees?
  How to test efficiently whether u and v are in the same tree or not?
  How to replace two trees of the forest by their union?

Solution: use an efficient implementation of disjoint sets.
We need a collection of disjoint sets that supports the following three operations:
  Make-Set(u)
  Find-Set(u)
  Union(u,v)

Kruskal's algorithm

KRUSKAL(G=(V,E), w)
  A = {}
  for each vertex v in V
      MAKE-SET(v)
  sort the edges E of G into nondecreasing order by weight w
  for each (u,v)    // taken from the sorted list
      if FIND-SET(u) <> FIND-SET(v)
          A = A + {(u,v)}
          UNION(u,v)
  return A
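A compact runnable sketch of KRUSKAL (assumptions: the function name kruskal_mst, edges given as (weight, u, v) triples, and a small union-find inlined with a dictionary; the following slides develop the disjoint-set structure properly):

def kruskal_mst(vertices, edges):
    # edges: list of (weight, u, v) triples; returns the list of MST edges
    parent = {v: v for v in vertices}           # MAKE-SET for every vertex

    def find(x):                                # FIND-SET with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    A = []
    for w, u, v in sorted(edges):               # nondecreasing order by weight
        ru, rv = find(u), find(v)
        if ru != rv:                            # u and v are in different trees
            A.append((u, v, w))
            parent[ru] = rv                     # UNION the two trees
    return A

For example, kruskal_mst(['a', 'b', 'c'], [(1, 'a', 'b'), (2, 'b', 'c'), (3, 'a', 'c')]) returns [('a', 'b', 1), ('b', 'c', 2)].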

Kruskal's algorithm - Analysis

KRUSKAL(G=(V,E), w)
  A = {}
  for each vertex v in V
      MAKE-SET(v)
  sort the edges E of G into nondecreasing order by weight w
  for each (u,v)    // taken from the sorted list
      if FIND-SET(u) <> FIND-SET(v)
          A = A + {(u,v)}
          UNION(u,v)
  return A

|V| calls of MAKE-SET
Sorting can be done in O(|E| * log |E|)
O(|E|) calls of FIND-SET and UNION

The performance of Kruskal's algorithm is determined by the performance of the Union-Find operations!

Disjoint sets

Also known as "union-find".
Maintain a collection S = {S1, ..., Sk} of disjoint dynamic (changing over time) sets.
Each set is identified by a representative, which is some member of the set.
It does not matter which member is the representative, as long as, when we ask for the representative twice without modifying the set, we get the same answer both times.

Operations

MAKE-SET(x): make a new set Si = {x}, and add Si to S.
UNION(x, y): make a new set from the union of the members of Sx and Sy (the sets containing x and y). The representative of the new set is any member of Sx or Sy, often the representative of one of Sx and Sy. The UNION operation destroys Sx and Sy (since sets must be disjoint).
FIND-SET(x): return the representative of the set containing x.
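A deliberately naive sketch to make the interface concrete (a plain dictionary labels each element with its representative; the function names are assumptions, and this is not one of the implementations analyzed below). Note that UNION relabels a whole set here, so it costs O(n), which is exactly the weakness the following implementations address:

def make_set(label, x):
    label[x] = x                        # x becomes its own representative

def find_set(label, x):
    return label[x]                     # return the representative of x's set

def union(label, x, y):
    rx, ry = find_set(label, x), find_set(label, y)
    if rx == ry:
        return
    for z in label:                     # relabel every member of y's set: O(n) per UNION
        if label[z] == ry:
            label[z] = rx

For example, after label = {}; make_set(label, 'a'); make_set(label, 'b'); union(label, 'a', 'b'), the call find_set(label, 'b') returns 'a'.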

Disjoint sets - Implementation with linked lists

Each set is a singly linked list, represented by a set object with attributes:
  head: the first element in the list, assumed to be the set's representative
  tail: the last element in the list
Objects may appear within the list in any order.
Each object in the list has attributes for the set member, a pointer to the set object, and next.

Disjoint sets - Implementation with linked lists [CLRS – fig 21.2]

Implementing disjoint-sets operations

MAKE-SET(x): create a new linked list whose only object is x.
FIND-SET(x): follow the pointer from x back to its set object, then return the member in the object that head points to.
UNION(x, y): append y's list onto the end of x's list. The representative of x's list becomes the representative of the resulting set. We use the tail pointer of x's list to quickly find where to append y's list.
Drawback: we must update the pointer to the set object for each object originally on y's list, which takes time linear in the length of y's list.

Weighted-Union on lists

To improve UNION, we always append the shorter list to the end of the longer list - the weighted-union heuristic.
Implementation: each list also stores its length (which we can easily maintain), and we always append the shorter list onto the longer one.
Analysis: a single UNION operation can still take O(n) time when both sets are large (on the order of n members). But how long does a sequence of m MAKE-SET, UNION, and FIND-SET operations take, n of which are MAKE-SET operations?
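A sketch of the linked-list representation with the weighted-union heuristic (assumptions: Python lists stand in for the linked lists, a dictionary plays the role of each object's pointer to its set object, and the class name is made up):

class ListDisjointSets:
    def __init__(self):
        self.set_of = {}                   # element -> the list (set object) that contains it

    def make_set(self, x):
        self.set_of[x] = [x]               # new singleton list; x is its representative

    def find_set(self, x):
        return self.set_of[x][0]           # the head of the list is the representative

    def union(self, x, y):
        sx, sy = self.set_of[x], self.set_of[y]
        if sx is sy:
            return
        if len(sx) < len(sy):              # weighted-union heuristic:
            sx, sy = sy, sx                # always append the shorter list to the longer one
        sx.extend(sy)                      # append the shorter list to the longer
        for z in sy:                       # update the set pointer of every moved element
            self.set_of[z] = sx

Each element's set pointer is updated only when its set is the smaller one, which is the fact the analysis on the next slides exploits.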

Weighted Union Analysis

Theorem: With weighted union, a sequence of m operations on n elements takes O(m + n lg n) time.

Sketch of proof:
Each MAKE-SET and FIND-SET operation takes O(1) time, and there are O(m) of them.
For UNION, we count how many times each object's pointer to its set object can be updated: the object must be in the smaller set each time. Since the largest set has at most n members, each pointer is updated at most lg n times, so the total time spent updating object pointers over all UNION operations is O(n lg n).

Weighted Union Analysis - Detail

For UNION, we count how many times each object's pointer to its set object can be updated.
Consider a particular object x. Each time x's pointer was updated, x must have started in the smaller set.
The first time x's pointer was updated, the resulting set had at least 2 members.
The second time, the resulting set had at least 4 members.
The third time, the resulting set had at least 8 members.
After x's pointer has been updated k times, the resulting set must have at least 2^k members.
The final set has at most n members, so 2^k <= n and x's pointer was updated at most lg n times.
Thus the total time spent updating object pointers over all UNION operations is O(n * lg n).

Disjoint-sets with linked lists and the weighted-union heuristic

Upper bounds for every single operation:
  MAKE-SET(x): O(1)
  FIND-SET(x): O(1)
  UNION(x, y): O(n)
But a sequence of n-1 UNION operations that builds the maximum possible set of n elements takes only O(n lg n) time.
Amortized analysis: in this case UNION costs O(log n) on average.

Amortized analysis In an amortized analysis, we average the time required to perform a sequence of data-structure operations over all the operations performed. With amortized analysis, we can show that the average cost of an operation is small, if we average over a sequence of operations, even though a single operation within the sequence might be expensive.

Kruskal's algorithm - Analysis

Union-Find implemented with linked lists and weighted union; note that |V| <= |E| <= |V|^2.

KRUSKAL(G=(V,E), w)
  A = {}                                                        // O(1)
  for each vertex v in V
      MAKE-SET(v)                                               // |V| calls of MAKE-SET, O(1) each
  sort the edges E of G into nondecreasing order by weight w    // O(|E| * log |E|)
  for each (u,v)    // taken from the sorted list
      if FIND-SET(u) <> FIND-SET(v)                             // O(|E|) calls of FIND-SET, O(1) each
          A = A + {(u,v)}
          UNION(u,v)                                            // O(|V|) calls of UNION, O(|V| * log |V|) in total
  return A

Kruskal's algorithm - analysis

Using Union-Find implemented with linked lists and weighted union, the complexity of Kruskal's algorithm is O(E*log E).
Actually, O(E*log E) = O(E*log V): since E <= V^2, log E <= 2*log V.
The complexity is dominated by the sorting of the edges.
There are also better implementations of Union-Find (with forests), but the implementation with linked lists and weighted union is already sufficient for Kruskal's algorithm.
A naive implementation of Union-Find with arrays (where each UNION relabels a whole set in O(n) time) would not be adequate for Kruskal's algorithm, since its cost would exceed the cost of the sorting!

Disjoint sets - Implementation as a forest

Disjoint sets can be implemented as a forest of up-trees.
An up-tree is a set of nodes in which every node has a parent, except for one node (the root), which has no parent. The root is the representative of the set.

[Figure: a forest of up-trees over the nodes A-K]

Implementing disjoint-sets operations

MAKE-SET(x): initialize a new node as a root. O(1)
FIND-SET(x): walk upwards in the tree, starting from x and following parent links, until arriving at the root (the height of the tree should be small!). O(h)
UNION(x, y): one tree becomes a subtree of the other. O(1)

[Figure: example up-trees over the nodes A-K]

Heuristics to improve Union-Find

To reduce the height of the trees, the following heuristics can be used:
  Union by rank: in UNION, always make the root of the smaller tree (smaller height) a child of the root of the larger tree.
  Path compression: the find path is the path formed by the nodes visited during FIND-SET on the trip to the root; make all nodes on the find path direct children of the root.

Find-set with path compression

[Figure: the same up-tree before executing FIND-SET(a) and after executing FIND-SET(a)]
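A tiny runnable sketch of what path compression does during one FIND-SET (the parent dictionary below is a made-up example chain a -> b -> c -> d):

parent = {'a': 'b', 'b': 'c', 'c': 'd', 'd': 'd'}   # d is the root

def find(x):
    if parent[x] != x:
        parent[x] = find(parent[x])   # path compression: reattach x directly to the root
    return parent[x]

find('a')
print(parent)   # {'a': 'd', 'b': 'd', 'c': 'd', 'd': 'd'} - all visited nodes now point to the root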

Implementation of the forest

Implementations based on dynamic memory allocation are possible, but not needed. The forest can be implemented with arrays.

[Figure: nodes A..K stored at array indices 1..11, with a parent array p = 1 1 1 1 2 4 4 4 9 9 9 (so A and I are roots) and a rank array]

Disjoint-sets operations with forest

MAKE-SET(x)
  x.p = x
  x.rank = 0

UNION(x, y)                                  // union by rank
  LINK(FIND-SET(x), FIND-SET(y))

LINK(x, y)
  if x.rank > y.rank
      y.p = x
  else
      x.p = y
      if x.rank == y.rank                    // if equal ranks, choose y as parent and increment its rank
          y.rank = y.rank + 1

FIND-SET(x)                                  // path compression
  if x <> x.p
      x.p = FIND-SET(x.p)
  return x.p
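A direct Python translation of this pseudocode (assumptions: dictionaries stand in for the p and rank fields of the previous slide's arrays, and the class name is made up):

class DisjointForest:
    def __init__(self):
        self.p = {}          # parent pointers
        self.rank = {}       # rank = upper bound on the height of the node's subtree

    def make_set(self, x):
        self.p[x] = x
        self.rank[x] = 0

    def find_set(self, x):                          # path compression
        if self.p[x] != x:
            self.p[x] = self.find_set(self.p[x])
        return self.p[x]

    def union(self, x, y):
        self.link(self.find_set(x), self.find_set(y))

    def link(self, x, y):                           # union by rank
        if x == y:
            return
        if self.rank[x] > self.rank[y]:
            self.p[y] = x
        else:
            self.p[x] = y
            if self.rank[x] == self.rank[y]:        # equal ranks: y becomes parent, its rank grows
                self.rank[y] += 1

Kruskal's algorithm can use this class directly: call make_set(v) for every vertex, then find_set and union for the edges, exactly as in the KRUSKAL pseudocode earlier.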

Analysis

Theorem: A sequence of m MAKE-SET, UNION, and FIND-SET operations, n of which are MAKE-SET operations, can be performed on a disjoint-set forest with union by rank and path compression in worst-case time O(m * α(n)).

α(n) is a very slowly growing function (the inverse of Ackermann's function), so the bound is almost linear in m.
The proof (by amortized analysis) exceeds the scope of our lecture; it can be found in [CLRS] - chap 21.4.

Summary

Minimum Spanning Trees. Kruskal's algorithm [CLRS chap 23]
Special data structures allow algorithms to be implemented efficiently: Disjoint sets (Union-Find structures) [CLRS chap 21]
Intro to amortized analysis