1
Cache-Oblivious Priority Queue and Graph Algorithm Applications Presented by: Raz Berman and Dan Mor
2
Main Purpose The main purpose of the article is to develop an optimal cache-oblivious priority queue data structure, supporting insert, delete and deletemin operations in a multilevel memory hierarchy. Priority queues are a critical component in many of the best known cache-aware graph algorithms.
3
Main Purpose (Cont.) In a cache-oblivious data structure, M (the memory size) and B (the block size) are not used in the description of the structure. The bounds match those of several previously developed cache-aware priority queue data structures, all of which rely crucially on knowledge of M and B.
4
Introduction As the memory systems of modern computers become more complex, it is increasingly important to design algorithms that are sensitive to the structure of memory. One of the essential features of modern memory systems is that they are made up of a hierarchy of several levels of cache, main memory, and disk.
5
Data Locality In order to amortize the large access time of memory levels far away from the processor, memory systems often transfer data between memory levels in large blocks. Thus it is becoming increasingly important to obtain high data locality in memory access patterns.
6
The Problem... The standard approach to obtaining good locality is to design algorithms parameterized by several aspects of the memory hierarchy, such as the size of each memory level, and the block size of memory transfers between levels.
7
The Problem... (Cont.) Unfortunately, this parameterization often leads to complex algorithms that are tuned to particular architectures. As a result, these algorithms are inflexible and not portable.
8
… and The Solution Recently, research has aimed at developing memory-hierarchy-sensitive algorithms that avoid any memory-specific parameterization. It has been shown that such cache-oblivious algorithms work optimally on all levels of a multilevel memory hierarchy (and not only on a two-level hierarchy, as commonly used for simplicity).
9
The Solution (Cont.) Cache-oblivious algorithms automatically tune to arbitrary memory architectures. Data locality can be achieved without using the parameters describing the structures of the memory hierarchy.
10
Memory Hierarchy A typical hierarchy consists of a number of memory levels, with memory level l having size M_l and being composed of M_l/B_l blocks of size B_l. In any memory transfer from level l to l-1, an entire block is moved atomically. Each memory level also has an associated replacement strategy, which is used to decide what block to remove in order to make room for a new block being brought into that level of the hierarchy.
11
Efficiency Efficiency of an algorithm is measured in terms of the number of block transfers the algorithm performs between two consecutive levels in the memory hierarchy (they are called memory transfers). An algorithm has complete control over placement of blocks in main memory and on disk.
12
Assumptions M > B^2 (the tall-cache assumption). When an algorithm accesses an element that is not stored in main memory, the relevant block is automatically fetched with a memory transfer. Optimal paging strategy - if the main memory is full, the block selected for replacement is chosen optimally based on the future memory accesses of the algorithm.
13
Generalization Since an analysis of an algorithm in the two-level model (which was in common use for simulating a multilevel memory hierarchy) holds for any block and main memory size, it holds for any level of the memory hierarchy. As a consequence, if the algorithm is optimal in the two-level model, it is optimal on all levels of the memory hierarchy.
14
Priority Queue A priority queue maintains a set of elements, each with a priority (or key), under the operations insert, delete and deletemin, where a deletemin operation finds and deletes the minimum-key element in the queue. Common implementations of a priority queue are the heap and the B-tree, both supporting all operations in O(log_B N) memory transfers.
15
The Bounds Sorting N elements requires Θ((N/B)log_{M/B}(N/B)) memory transfers. In a B-tree, insert/deletemin takes O(log_B N) memory transfers, so sorting N elements takes O(N log_B N) memory transfers, which is a factor of (B log_B N)/(log_{M/B}(N/B)) from optimal. From now on, we will use sort(N) to denote (N/B)log_{M/B}(N/B).
16
The Results The main result is the development of an optimal cache-oblivious priority queue that supports insert, delete and deletemin operations in O((1/B)log_{M/B}(N/B)) amortized memory transfers.
17
The Results (Cont.)
18
Priority Queue – Structure The priority queue data structure consists of Θ(log log N) levels whose sizes vary from N to a constant size c. The size of a level corresponds asymptotically to the number of elements that can be stored within it. For example, the i'th level from above has size N^{(2/3)^i}. The levels from largest to smallest are N, N^{2/3}, N^{4/9}, …, X^{9/4}, X^{3/2}, X, X^{2/3}, X^{4/9}, …, c^{9/4}, c^{3/2}, c.
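To make the level structure concrete, here is a minimal Python sketch (not from the paper; `level_sizes` and the cutoff constant `c` are our own names) that generates the sizes N, N^{2/3}, N^{4/9}, … down to a constant:

```python
def level_sizes(n, c=4):
    """Level sizes from largest to smallest: N, N^(2/3), N^(4/9), ...
    down to a constant c (here c=4, an arbitrary small cutoff)."""
    sizes = []
    x = float(n)
    while x > c:
        sizes.append(x)
        x = x ** (2.0 / 3.0)   # each level is the 2/3 power of the previous
    sizes.append(x)            # the constant-size bottom level
    return sizes

# The number of levels grows like Theta(log log N):
# for N = 2^30 the loop only runs a handful of times.
sizes = level_sizes(2 ** 30)
```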
19
Priority Queue – Structure (Cont.) Smaller levels store elements with smaller keys or elements that were more recently inserted. The minimum key element and the most recently inserted element are always in the smallest (lowest) level c. Both insertions and deletions are initially performed on the smallest level and may propagate up through the levels.
20
Priority Queue - Figure
21
Priority Queue - Buffers Elements are stored in a level in a number of buffers, which are also used to transfer elements between levels. Level X consists of one up buffer u_X that can store up to X elements, and at most X^{1/3} down buffers, each containing between (1/2)X^{2/3} and 2X^{2/3} elements. Thus the maximum capacity of level X is 3X. The size of a down buffer at one level matches the size (up to a constant factor) of the up buffer one level down.
22
The Invariants Invariant 1: At level X, the down buffers are sorted relative to one another, that is, elements in d_i^X have smaller keys than elements in d_{i+1}^X, but the elements within d_i^X are unordered. The element with the largest key in each down buffer is called a pivot element, and is used to mark the boundaries between the ranges of the keys of elements in down buffers.
23
The Invariants (Cont.) Invariant 2: At level X, the elements in the down buffers have smaller keys than the elements in the up buffer. Invariant 3: The elements in the down buffers at level X have smaller keys than the elements in the down buffers at the next higher level X^{3/2}.
24
The Invariants (Cont.) The invariants ensure that the keys of the elements in the down buffers get larger as we go from smaller to larger levels in the structure. Within one level, the keys of the elements in the up buffer are larger than the keys in the down buffers. The keys of the elements in an up buffer are unordered relative to the keys of the elements in the down buffers one level up.
25
Transferring Elements (in general) Up buffers store elements that are "on their way up" – they have yet to be resolved as belonging to a particular down buffer in the next level up. Down buffers store elements that are "on their way down" – they are waiting to be pulled down to the next smaller level. The element with the overall smallest key is in the first down buffer at level c.
26
Priority Queue - Layout The priority queue is stored in a linear array. The levels are stored consecutively from smallest to largest. Each level occupies 3 times its size, starting with the up buffer (occupying one time its size) followed by the down buffers (occupying two times its size). The total size of the array is O(N).
27
Operations on Priority Queues We use two general operations – push and pull. Push inserts X elements into level X^{3/2}, and pull removes the X elements with smallest keys from level X^{3/2}, returning them in sorted order. Deletemin (insert) corresponds to pulling (pushing) an element from (to) the smallest level.
28
Operations (Cont.) Whenever an up buffer in level X overflows, we push the X elements in the buffer one level up, and whenever the down buffers in level X become too empty, we pull X elements from one level up. Both the push and the pull operations maintain the three invariants.
29
Push To insert X elements into level X^{3/2}, we first sort them using O(1+(X/B)log_{M/B}(X/B)) memory transfers and O(X log_2 X) time. Next we distribute them into the X^{1/2} down buffers of level X^{3/2}. We append an element to the end of the current down buffer, and advance to the next down buffer as soon as we encounter an element with a key larger than the key of the pivot of the current down buffer.
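The distribution step can be sketched as follows; this is an illustrative Python fragment with hypothetical names (`distribute`, `down_buffers`, `pivots`, `up_buffer`), and the splitting/overflow handling of the later slides is omitted:

```python
def distribute(sorted_elems, down_buffers, pivots, up_buffer):
    """Distribute already-sorted elements into down buffers delimited
    by their pivot keys; elements with keys beyond the last pivot go
    into the up buffer."""
    i = 0
    for elem in sorted_elems:
        # advance past down buffers whose pivot key is smaller than elem
        while i < len(pivots) and elem > pivots[i]:
            i += 1
        if i < len(down_buffers):
            down_buffers[i].append(elem)
        else:
            up_buffer.append(elem)

down, up = [[], []], []
distribute([3, 12, 25], down, [10, 20], up)
# down is now [[3], [12]] and up is [25]
```

Because the input is sorted, the buffer index `i` only moves forward, so the whole distribution is a single scan.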
30
Push - Demonstration (figure: distributing elements from level X into level X^{3/2})
31
Push (Cont.) Elements with keys larger than the key of the pivot of the last down buffer are inserted in the up buffer. (figure: level X and level X^{3/2})
32
Push (Cont.) Scanning through the elements takes O(1+X/B) memory transfers and O(X) time. We perform one memory transfer for each of the X^{1/2} down buffers (in the worst case), so the total cost of distributing the elements is O(X/B+X^{1/2}) memory transfers and O(X+X^{1/2}) = O(X) time.
33
Push - The Elements Distribution If a down buffer runs full (contains 2X elements), we split it into two down buffers of X elements each, by finding the median of the elements in O(1+X/B) memory transfers and O(X) time and partitioning them with a simple scan in O(1+X/B) memory transfers. Since the down buffers can be stored out of order, we just have to update the linked list of buffers.
34
Push – Splitting a Down Buffer (figure: level X and level X^{3/2})
35
Push - The Distribution (Cont.) If the level already has the maximum of X^{1/2} down buffers before the split, we remove the last down buffer by inserting its elements into the up buffer. (figure: level X and level X^{3/2})
36
Push - The Distribution (Cont.) If the up buffer runs full (contains more than X^{3/2} elements), then all of these elements are pushed into the next level up. (figure: level X and level X^{3/2})
37
Push – Total Cost A push of X elements from level X into level X^{3/2} can be performed in O(X^{1/2}+(X/B)log_{M/B}(X/B)) memory transfers and O(X log_2 X) time amortized, not counting the cost of any recursive push operations, while maintaining invariants 1-3.
38
Pull To delete X elements from level X^{3/2}, we first assume that the down buffers at level X^{3/2} contain at least (3/2)X elements in total; since each down buffer holds between (1/2)X and 2X elements, the X smallest elements are among the first three down buffers, which contain between (3/2)X and 6X elements. We find and remove the X smallest elements by sorting these buffers using O(1+(X/B)log_{M/B}(X/B)) memory transfers and O(X log_2 X) time.
39
Pull - Demonstration (figure: level X and level X^{3/2})
40
Pull (Cont.) If the down buffers contain fewer than (3/2)X elements in total, we first pull the X^{3/2} elements with smallest keys from one level up. (figure: level X and level X^{3/2})
41
Pull (Cont.) Because these elements do not necessarily have smaller keys than the U elements in the up buffer, we first sort the up buffer, keep the U elements with the largest keys in the up buffer, and distribute the remaining elements among the down buffers. (figure: level X and level X^{3/2})
42
Pull (Cont.) Afterwards, we can find the X elements with smallest keys as described earlier and remove them. The total cost of a pull of X elements from level X^{3/2} down to level X is O(1+(X/B)log_{M/B}(X/B)) memory transfers and O(X log_2 X) time amortized, not counting the cost of any recursive pull operations, while maintaining invariants 1-3.
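Ignoring the refilling case and all cost accounting, the basic pull step (collect the first down buffers, sort their contents, remove the X smallest) can be sketched in Python; the names are hypothetical:

```python
def pull(down_buffers, x):
    """Concatenate the first down buffers until at least x elements
    are collected, sort them, and remove the x smallest; the leftovers
    become a new first down buffer."""
    collected, rest = [], []
    for buf in down_buffers:
        if len(collected) < x:
            collected.extend(buf)   # still need more elements
        else:
            rest.append(buf)        # untouched down buffers
    collected.sort()
    smallest, leftover = collected[:x], collected[x:]
    new_buffers = ([leftover] if leftover else []) + rest
    return smallest, new_buffers
```

For example, `pull([[5, 1, 3], [9, 7], [20, 15]], 4)` returns the four smallest elements `[1, 3, 5, 7]` in sorted order, with `[[9], [20, 15]]` as the remaining down buffers.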
43
Total Cost #1 The insert (deletemin) operation is performed as a push (pull) on the smallest level, which may trigger recursive pushes (pulls) on higher levels. To maintain that the structure uses Θ(N) space and has Θ(log log N) levels, it is rebuilt after every N/2 operations. The rebuilding uses O((N/B)log_{M/B}(N/B)) memory transfers and O(N log_2 N) time, that is, O((1/B)log_{M/B}(N/B)) memory transfers and O(log_2 N) time per operation, so its amortized cost is covered by these operations.
44
Total Cost #2 Following the previous results, a push or pull of X elements uses O(X^{1/2}+(X/B)log_{M/B}(X/B)) memory transfers. We can reduce this cost to O((X/B)log_{M/B}(X/B)) by examining the cost for differently sized levels. In either case, pushing or pulling X elements is charged to level X.
45
Total Cost #3 Case 1: Pushing or pulling X ≥ B^2 elements into or from level X^{3/2}. In this case, O(X^{1/2}+(X/B)log_{M/B}(X/B)) = O((X/B)log_{M/B}(X/B)). Proof: since X ≥ B^2, we have X^{1/2} = X/X^{1/2} ≤ X/B, so the X^{1/2} term is dominated by the (X/B)log_{M/B}(X/B) term (where, as usual, the logarithm is taken to be at least 1).
46
Total Cost #4 Case 2: Pushing or pulling X < B^2 elements into or from level X^{3/2}. In this case, the whole level fits in main memory (since M > B^2 by the tall-cache assumption), and therefore all transfer costs associated with this level are eliminated (we assume that the optimal paging strategy keeps the relevant blocks in memory at all times, and thus eliminates these costs).
47
Total Cost #5 The total cost of an insert or deletemin operation in a sequence of N/2 such operations is O((1/B)log_{M/B}(N/B)) memory transfers amortized and O(log_2 N) time amortized.
48
Priority Queue and Graph Algorithms We will now show how the cache-oblivious priority queue can be used to develop several cache-oblivious graph algorithms – list ranking, BFS, DFS and minimum spanning tree.
49
List Ranking In this problem we have a linked list with V nodes, stored as an unordered sequence, each containing the position of the next node in the list (an edge). Each edge has a weight and the goal is to find for each node v the sum of the weights of edges from v to the end of the list. For example, if all the weights are 1, then the goal is to determine the number of edges from v to the end of the list (called the rank of v).
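As a reference point for the definition only, ranks can be computed with a naive traversal (with none of the I/O efficiency this section is about); a small Python sketch with hypothetical names:

```python
def naive_rank(next_node, weight):
    """Rank of v = sum of the weights of the edges from v to the end of
    the list; next_node[v] is the successor's position (None at the
    tail) and weight[v] is the weight of the edge leaving v."""
    def rank(v):
        r = 0
        while next_node[v] is not None:
            r += weight[v]      # accumulate the edge weights
            v = next_node[v]
        return r
    return [rank(v) for v in range(len(next_node))]

# A 3-node list stored unordered as 0 -> 2 -> 1, with unit weights:
ranks = naive_rank([2, None, 1], [1, 0, 1])
# ranks == [2, 0, 1]: node 0 is two edges from the end, node 1 is the tail
```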
50
List Ranking – Main Idea An independent set of Θ(V) nodes is found, the nodes in the independent set are "bridged out", the remaining list is recursively ranked, and finally the contracted nodes are reintegrated into the list, while computing their ranks.
51
List Ranking (Cont.) In this example, we have an independent set (nodes without edges to each other) of size 4.
52
List Ranking - Explanation The list ranking algorithm in detail: Step 1: Finding an independent set. This step is done cache-obliviously (to be discussed later), in O(sort(V)) memory transfers.
53
List Ranking – Explanation (Cont.) Step 2: Taking out the independent set nodes by contracting edges incident to these nodes. We first identify a node u with a successor v in the independent set, by sorting the nodes by their successor position. We also identify the successor w of the independent set node v.
54
List Ranking – Explanation (Cont.) Next we create, in a simple scan, a new list where the two edges (u,v) and (v,w) have been replaced with an edge (u,w). The new edge has weight equal to the sum of the weights of the two old edges. We remove the independent set nodes and store the remaining nodes in a new list. We also create a list that indicates the old position of every node. Finally, we "compress" the remaining nodes.
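The contraction of step 2 can be sketched in Python; `bridge_out` is a hypothetical name, and the bookkeeping of old positions and the compression are omitted:

```python
def bridge_out(succ, weight, independent):
    """Contract each independent-set node v: replace the edges (u,v)
    and (v,w) by a single edge (u,w) whose weight is the sum of the
    two old weights."""
    succ, weight = dict(succ), dict(weight)   # work on copies
    for u in list(succ):
        v = succ[u]
        if v is not None and v in independent:
            succ[u] = succ[v]        # u now bypasses v
            weight[u] += weight[v]   # combined edge weight
    for v in independent:            # drop the contracted nodes
        succ.pop(v, None)
        weight.pop(v, None)
    return succ, weight
```

Contracting `b` in the unit-weight list `a -> b -> c -> d` yields `a -> c -> d`, where the new edge (a,c) has weight 2. Because no two independent-set nodes are adjacent, each contraction step is local.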
55
List Ranking – Explanation (Cont.) Step 3: Ranking the remaining list. Step 4: Repeating steps 1-3 recursively.
56
List Ranking – Explanation (Cont.) Step 5: Finally, the independent set nodes are reintegrated into the list, while computing their ranks.
57
Finding the Independent Set As promised, we will now show how to find the independent set cache-obliviously. The algorithm is based on 3-coloring – every node is colored with one of three colors such that adjacent nodes have different colors. The independent set (of size at least V/3) then consists of the nodes with the most popular color.
58
Computing 3-Coloring An edge (v,w) is called a forward edge if v appears before w in the sequence of nodes; otherwise it is called a backward edge. First the list is split into two sets – one consisting of forward lists and one of backward lists. Every node is included in at least one set, where a node that is the head or the tail of a list may be included in both sets.
59
Computing 3-Coloring (Cont.) The nodes in the forward lists are colored red or blue alternately (the heads are red), and the nodes in the backward lists are colored green or blue alternately (the heads are green).
60
Computing 3-Coloring (Cont.) This way every node is colored in one color, except for the heads / tails, which have two colors. If the heads / tails of the lists were colored red and green or red and blue, they are colored red only. If they were colored blue and green, they are colored green only.
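The run-splitting, alternate coloring and conflict resolution described above can be sketched in Python (an in-memory illustration with hypothetical names, not the priority-queue-based scan of the next slides):

```python
def three_color(succ, head):
    """succ[v] is the position of v's successor (None at the tail);
    node ids are their positions, so an edge v->w is forward iff v < w.
    Forward runs are colored red/blue alternately (heads red), backward
    runs green/blue (heads green); junction nodes collect two tentative
    colors and are resolved by the rule on the slide."""
    tentative = {v: set() for v in succ}
    runs, run, direction, v = [], [head], None, head
    while succ[v] is not None:
        w = succ[v]
        d = 'fwd' if v < w else 'bwd'
        if direction is not None and d != direction:
            runs.append((direction, run))   # close the current run
            run = [v]                       # v belongs to both runs
        direction = d
        run.append(w)
        v = w
    runs.append((direction or 'fwd', run))
    for d, nodes in runs:
        first, alt = ('red', 'blue') if d == 'fwd' else ('green', 'blue')
        for i, u in enumerate(nodes):
            tentative[u].add(first if i % 2 == 0 else alt)
    # resolution: red+green -> red, red+blue -> red, blue+green -> green
    return {u: 'red' if 'red' in cs else 'green' if 'green' in cs else 'blue'
            for u, cs in tentative.items()}
```

On a list whose stored order keeps changing direction, the resolution rule still leaves every pair of adjacent nodes differently colored.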
61
3-Coloring and Priority Queue #1 In a single scan the head nodes are identified, and for each such node v a red element is inserted in a cache-oblivious priority queue with key equal to the position of v in the unordered list. Then the minimal-key element e is repeatedly extracted from the queue.
62
3-Coloring and Priority Queue #2 If e corresponds to the node v, then v is colored the same color as e, and the successor of v is inserted in the queue. The inserted element is colored the opposite color of e. After finishing scanning and coloring all the forward lists, the backward lists are scanned and colored the same way.
63
3-Coloring – Demonstration #1 Starting the coloring:
64
3-Coloring – Demonstration #2 The head of a forward list is colored red.
65
3-Coloring – Demonstration #3 Coloring the forward list in red and blue, alternately.
66
3-Coloring – Demonstration #4 Coloring the forward list in red and blue, alternately.
67
3-Coloring – Demonstration #5 Coloring the forward list in red and blue, alternately.
68
3-Coloring – Demonstration #6 The head of a backward list is colored green.
69
3-Coloring – Demonstration #7 Coloring the backward list in green and blue, alternately.
70
3-Coloring – Demonstration #8 Coloring the backward list in green and blue, alternately.
71
3-Coloring – Demonstration #9 The head of a forward list is colored red.
72
3-Coloring – Demonstration #10 Coloring the forward list in red and blue, alternately.
73
3-Coloring – Demonstration #11 Coloring the forward list in red and blue, alternately.
74
3-Coloring – Demonstration #12 Coloring the forward list in red and blue, alternately.
75
3-Coloring – Demonstration #13 The head of a backward list is colored green.
76
3-Coloring – Demonstration #14 Coloring the backward list in green and blue, alternately.
77
3-Coloring – Demonstration #15 If the head / tail of a list was colored red and green or red and blue, it is colored red only. If it was colored blue and green, it is colored green only.
78
3-Coloring – Demonstration #16 Now the whole list is 3-colored; the independent set then consists of the nodes with the most popular color.
79
List Ranking - Complexity Finding the independent set can be done in O(sort(V)) memory transfers. The rest of the steps can be done in O(sort(V)) as well. Therefore, the list ranking problem on a V node list can be solved cache-obliviously in O(sort(V)) memory transfers.
80
Cache-Oblivious Euler Tour #1 An Euler Tour of a graph is a cycle that traverses each edge exactly once. Not every graph has an Euler Tour, but a tree, in which each undirected edge has been replaced with two directed edges, does have one. Computing an Euler Tour of an undirected tree actually means computing an ordered list of the edges along the tour.
81
Cache-Oblivious Euler Tour #2
82
Cache-Oblivious Euler Tour #3 First, a cyclic order is imposed on the nodes adjacent to each node v in the tree. An Euler Tour is obtained if the tree is traversed such that a visit to v from u through the incoming edge (u,v) is followed by a visit to the node w, following u in the cyclic order, through the outgoing edge (v,w).
83
Cache-Oblivious Euler Tour #4 This way it is possible to compute the successor edge for each edge e, which is the edge following e in the Euler Tour. This is done by constructing a list of incoming edges to each node v, sorted according to the cyclic order. If two edges (u,v) and (w,v) are stored next to each other in the list, the successor edge for the incoming edge (u,v) is the outgoing edge (v,w).
84
Cache-Oblivious Euler Tour #5 This way all the successor edges are computed in a scan of the list. The problem we have now is identical to list ranking – we have a list of edges and the successor of each of them. Therefore computing the Euler Tour is done by list-ranking the list of edges and then sorting the edges by their rank, which makes it possible to do in O(sort(V)) memory transfers (the list ranking complexity).
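A minimal in-memory Python sketch of the successor-edge construction and the resulting tour (hypothetical names; the real algorithm produces the tour with sorting and list ranking rather than pointer chasing):

```python
def euler_tour(adj, root):
    """adj[v] lists the neighbours of v in the chosen cyclic order.
    The successor of the incoming edge (u,v) is the outgoing edge
    (v,w), where w follows u in the cyclic order around v."""
    succ = {}
    for v, nbrs in adj.items():
        for i, u in enumerate(nbrs):
            succ[(u, v)] = (v, nbrs[(i + 1) % len(nbrs)])
    start = (root, adj[root][0])
    tour, e = [], start
    while True:                      # follow successor edges around
        tour.append(e)
        e = succ[e]
        if e == start:
            break
    return tour
```

For the star with undirected edges {0,1} and {0,2}, the tour visits the 2(V-1) directed edges (0,1), (1,0), (0,2), (2,0).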
85
Depth First Search (DFS) numbering Using our Euler Tour algorithm, we can easily compute a depth-first search (DFS) numbering of the nodes of a tree starting at a source node s. First note that if we start a walk of the Euler Tour in the source node s it is actually a DFS tour of the tree. To compute the numbering, we therefore first classify each edge as being either a forward or a backward edge. An edge is a forward edge if it is traversed before its reverse in the tour. Then we assign each forward edge weight 1 and each backward edge weight 0. The DFS number of a node v is then simply the sum of the weights on the edges from s to v.
86
Depth First Search (DFS) numbering Thus we can obtain the DFS numbering by solving the general version of list ranking. Since we only use Euler Tour computation, list ranking, sorting, and scanning, we in total compute the DFS numbering cache-obliviously in O(sort(V)) memory transfers.
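The weight-assignment rule can be illustrated with a small Python sketch operating directly on a tour (hypothetical names; a running prefix sum stands in for the list-ranking step):

```python
def dfs_numbering(tour, root):
    """Assign weight 1 to forward edges (the first traversal of an
    undirected edge) and weight 0 to backward edges; the DFS number of
    v is the prefix sum of the weights when v is first entered."""
    seen, number, total = set(), {root: 0}, 0
    for (u, v) in tour:
        total += 0 if (v, u) in seen else 1   # backward iff reverse seen
        seen.add((u, v))
        number.setdefault(v, total)           # record first entry into v
    return number
```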
87
DFS numbering on general graphs The algorithm presented in this paper for the DFS numbering problem on directed graphs uses a number of data structures: V buffered priority trees, a buffered repository tree and a stack (the first two will be presented shortly). This algorithm achieves the task in O(V + (E/B)log_2 V + sort(E)) memory transfers.
88
Buffered repository tree (BRT) A buffered repository tree (BRT) maintains O(E) elements with keys in the range [1..V] under operations insert and extract. The insert operation inserts a new element, while the extract operation reports and deletes all elements with a certain key. Our cache-oblivious version of the BRT consists of a binary tree with the keys 1 through V in sorted order in the leaves. A buffer is associated with each node and leaf of the tree. The buffer of a leaf v contains elements with key v and the buffers of the internal nodes are used to perform insertions in a batched manner. We perform an insertion simply by inserting the new element into the root buffer.
89
Buffered repository tree (BRT) To perform an extraction of elements with key v we traverse the path from the root to the leaf containing v. At each node u on this path we scan the associated buffer and report and delete elements with key v. During the scan we also distribute the remaining elements among the two buffers associated with the children of u: an element with key w goes to the buffer of the child of u on the path to w. We place elements inserted in a buffer during the same scan consecutively in memory (but not necessarily right after the other elements in the buffer). This way the buffer of a node u can be viewed as a linked list of buckets of elements in consecutive memory locations.
90
Buffered repository tree (BRT) To avoid performing a memory transfer on each insertion, we implement the root buffer slightly differently, namely as a doubling array. The optimal paging strategy can keep the last block of the array in main memory, and we obtain an O(1/B) amortized root buffer insertion bound.
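A simplified in-memory Python sketch of the BRT (hypothetical class name; plain Python lists stand in for the bucketed buffers and the doubling-array root buffer):

```python
class BRT:
    """Binary tree over the keys lo..hi; every node has a buffer.
    Insertions go to the root buffer only; extract(key) flushes the
    buffers down the root-to-leaf path for that key."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.buf = []
        self.left = self.right = None
        if lo < hi:
            mid = (lo + hi) // 2
            self.left = BRT(lo, mid)
            self.right = BRT(mid + 1, hi)

    def insert(self, key, value):
        self.buf.append((key, value))        # batched: root buffer only

    def extract(self, key):
        out, node = [], self
        while node is not None:
            leftover = []
            for k, v in node.buf:
                if k == key:
                    out.append(v)            # report and delete
                elif node.left is not None:  # internal node: distribute
                    child = node.left if k <= node.left.hi else node.right
                    child.buf.append((k, v))
                else:
                    leftover.append((k, v))
            node.buf = leftover if node.left is None else []
            node = None if node.left is None else (
                node.left if key <= node.left.hi else node.right)
        return out
```

Each extraction touches only one root-to-leaf path; elements not on the path merely sink one level, which is what the O((1/B)log_2 V) amortized charge per insertion accounts for.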
Buffered repository tree (BRT) – demonstration (figures): the elements 21 and 30 are inserted into the root buffer, followed by 15, 13, 7, 7. Extracting the elements with key 15 flushes the buffers down the root-to-leaf path toward the leaf for 15, reporting 15 and distributing the remaining elements among the children's buffers. Further insertions (1, 13, 15, 21, 19 and later 1, 7, 30) and extractions (key 7, then key 27) are handled the same way, each extraction emptying the buffers along a single root-to-leaf path.
122
Buffered repository tree (BRT) Lemma: A cache-oblivious buffered repository tree uses Θ(B) main memory space and supports insert and extract operations in O((1/B)log_2 V) and O(log_2 V) memory transfers amortized, respectively.
123
Proof: An insertion in the root buffer requires O(1/B) memory transfers amortized. During an extract operation, we use O(X/B+K) memory transfers to empty the buffer of a node u containing X elements in K buckets (since we access each element and each bucket). We charge the X/B term to the insert operations that inserted the X elements in the BRT. Since each element is charged at most once on each level of the tree, an insert is charged O((1/B)log_2 V) transfers. We charge the K term to the extract operation that created the K buckets. Since an extract operation creates 2 buckets on each level of the tree, it is charged a total of O(log_2 V) memory transfers.
124
Buffered priority tree (BPT) A buffered priority tree is initially constructed on O(E_v) elements with keys in the range [1..V], and maintains these elements under buffered delete operations. Given a set of E' elements to delete, a buffered delete operation deletes these E' elements and reports and deletes the minimal remaining element in the structure. The buffered priority tree is implemented similarly to the buffered repository tree. It consists of a binary tree with keys 1 through V in sorted order in the leaves, with each node and leaf having a buffer associated with it. Initially, the O(E_v) elements are stored in the buffers of the leaves.
125
Buffered priority tree (BPT) Unlike the buffered repository tree, the buffers are used to store elements intended to be deleted from the structure, and we maintain in each node v a count of the number of (live) elements stored in the tree rooted at v. To perform a buffered delete we first insert the E' elements in the buffer of the root and update the counter of the root. To find the minimal element, we then follow the leftmost path such that each node on the path has at least one live element beneath it. We can determine whether a node is the root of an empty subtree using its counter.
126
Buffered priority tree (BPT) In each node we also scan the associated buffer and distribute the elements among the two buffers associated with its children, exactly as in the buffered repository tree. During the distribution, we also update the counters in the children. When we reach a leaf u, we report and delete one of the elements stored in u’s buffer. We also decrement the counter of the nodes along the path from the root to u.
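A simplified in-memory Python sketch of the BPT with its counters (hypothetical names; Python lists stand in for the buffers, and the cost accounting is ignored):

```python
class BPT:
    """Binary tree over the keys lo..hi; leaves store the elements,
    every node buffers pending deletions and counts the live elements
    stored in its subtree."""
    def __init__(self, keys, lo, hi):
        self.lo, self.hi = lo, hi
        self.buf = []                        # pending deletions
        self.left = self.right = None
        if lo == hi:
            self.elems = [k for k in keys if k == lo]
            self.count = len(self.elems)
        else:
            mid = (lo + hi) // 2
            self.left = BPT(keys, lo, mid)
            self.right = BPT(keys, mid + 1, hi)
            self.count = self.left.count + self.right.count

    def buffered_delete(self, doomed):
        """Delete the elements in `doomed`, then report and delete the
        minimal remaining element (None if the tree became empty)."""
        self.buf.extend(doomed)
        self.count -= len(doomed)
        if self.count <= 0:
            return None
        path, node = [self], self
        while node.left is not None:
            for k in node.buf:               # push deletions one level down
                child = node.left if k <= node.left.hi else node.right
                child.buf.append(k)
                child.count -= 1
            node.buf = []
            # leftmost child with a live element beneath it
            node = node.left if node.left.count > 0 else node.right
            path.append(node)
        for k in node.buf:                   # apply deletions at the leaf
            node.elems.remove(k)
        node.buf = []
        minimum = node.elems.pop(0)
        for n in path:                       # one live element fewer
            n.count -= 1
        return minimum
```

Note how deletions sink lazily: only the buffers along the single search path are flushed, while the counters keep the path to the minimum correct.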
Buffered priority tree (BPT) – demonstration (figures): a buffered delete of the elements 7, 15, 21, 21 inserts them into the root buffer and updates the root counter; the search for the minimum then follows the leftmost path with a positive counter, distributing the buffered deletions and updating the counters on the way down, and reports and deletes the minimal remaining element. Subsequent buffered deletes of the element 30 and of the element 27 proceed in the same way.
156
Buffered priority tree (BPT) Lemma: A cache-oblivious buffered priority tree supports buffered deletes of E′ elements in O((E′/B + 1)·log₂V) memory transfers. It can be constructed in O(sort(E_v)) memory transfers.
157
Buffered priority tree (BPT) Proof: The number of transfers needed to construct the tree is dominated by the O(sort(E_v)) memory transfers needed to sort the O(E_v) elements and construct the leaf buffers. After that, the tree itself can be constructed level by level, bottom-up, in O(V/B) transfers. The amortized cost of a buffered delete equals the cost of inserting E′ elements into a BRT plus the cost of extracting an element from a BRT, that is, O((E′/B + 1)·log₂V) memory transfers.
158
DFS algorithm: As mentioned, the directed DFS numbering algorithm presented in this paper utilizes a number of data structures:
- a stack S containing the vertices on the path from the root of the DFS tree to the current vertex;
- a buffered priority tree P_v for each vertex v, containing the edges (v,w) connecting v with a possibly unvisited vertex w;
- one buffered repository tree D containing the edges (v,w) incident to a vertex w that has already been visited but where (v,w) is still present in P_v.
The key of an edge (v,w) in D is v, and its key in P_v is w.
159
DFS algorithm (continued): Initially, each P_v is constructed on the E_v edges of the form (v,w) incident to v, the source vertex is placed on the stack S, and the BRT D is empty. To compute the DFS numbering, the vertex u on the top of the stack is repeatedly considered. All edges in D of the form (u,w) are extracted and deleted from P_u using a buffered delete operation. If P_u is now empty, so that no minimal element (edge (u,v)) is returned, all neighbors of u have already been visited and u is popped off S. Otherwise, vertex v is visited next: it is numbered and pushed on S, and all edges (w,v) incident to it are inserted in D (since v is now visited).
160
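The control flow above can be sketched in ordinary in-memory Python. This is only a sketch of the procedure described on the preceding slides, not the cache-oblivious implementation: Python sets stand in for the buffered priority trees P_v and the BRT D, `min()` is an arbitrary deterministic tie-break, and none of the memory-transfer bounds are preserved.

```python
def dfs_number(graph, source):
    """graph: dict vertex -> list of out-neighbors. Returns dict vertex -> DFS number."""
    # P[v] models the buffered priority tree P_v: out-neighbors of v that may
    # still be unvisited. D[v] models the BRT entries keyed by v: targets w
    # such that (v,w) is stale in P_v because w has already been visited.
    P = {v: set(graph[v]) for v in graph}
    D = {v: set() for v in graph}
    rev = {v: [] for v in graph}                 # reverse adjacency, for the D-inserts
    for u in graph:
        for w in graph[u]:
            rev[w].append(u)
    number = {source: 0}
    for w in rev[source]:                        # the source counts as visited from the start
        D[w].add(source)
    stack, counter = [source], 1
    while stack:
        u = stack[-1]
        P[u] -= D[u]                             # buffered delete of the stale edges (u,w)
        D[u].clear()
        if not P[u]:                             # no minimal element: all neighbors visited
            stack.pop()
            continue
        v = min(P[u])                            # extract a remaining edge (u,v)
        P[u].discard(v)
        number[v] = counter                      # v is visited next: number and push it
        counter += 1
        stack.append(v)
        for w in rev[v]:                         # every edge (w,v) is now stale
            if v in P[w]:
                D[w].add(v)
    return number
```

For example, `dfs_number({'a': ['b', 'c'], 'b': ['c'], 'c': []}, 'a')` numbers the vertices a, b, c as 0, 1, 2.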
(Slides 161–196: step-by-step figures of the DFS algorithm on an example graph, showing the stack S, the BRT D, and the priority trees P_Q, P_U, P_V, P_W, P_Y, P_Z as the vertices are numbered Y=1, U=2, V=3, W=4, P=5, Z=6, Q=7; the diagrams do not survive extraction.)
196
DFS algorithm – analysis: To analyze the algorithm we first note that each vertex is considered on the top of S a number of times equal to one greater than its number of children in the DFS tree. Thus the total number of times we consider a vertex on top of S is 2V−1. When considering a vertex u we first perform a stack operation on S, an extract operation on D, and a buffered delete operation on P_u. The stack operation requires O(1/B) memory transfers (since the optimal paging strategy can keep the last block of the stack in main memory) and the extraction requires O(log₂V) transfers. Thus these costs add up to O(V·log₂V).
197
DFS algorithm – analysis (continued): The buffered delete operation requires O((1 + E′/B)·log₂V) memory transfers if E′ is the number of deleted elements (edges). Each edge is deleted once, so overall the buffered delete costs add up to O((V + E/B)·log₂V). Next, the E″ edges incident to v are inserted in D; this requires O((1 + E″/B)·log₂V) memory transfers, or O((V + E/B)·log₂V) transfers over the whole algorithm. The initial construction of the buffered priority trees requires O(sort(E)) memory transfers. Thus overall the algorithm uses O((V + E/B)·log₂V + sort(E)) memory transfers.
198
Minimal Spanning Forest (MSF): In the I/O model, a sequence of algorithms has been developed for the problem of computing the minimal spanning forest of an undirected weighted graph, culminating in an algorithm using O(sort(E)·log₂log₂((V·B)/E)) memory transfers developed by Arge et al. This algorithm consists of two phases. In the first phase an edge-contraction algorithm is used to reduce the number of vertices to O(E/B). In the second phase a modified version of Prim's algorithm is used to finish the MSF computation.
199
Minimal Spanning Forest (continued): Using our cache-oblivious priority queue we can relatively easily modify both of the phases to work cache-obliviously. However, since we cannot decide cache-obliviously when the first phase has reduced the number of vertices to O(E/B), we are not able to combine the two phases as effectively as in the I/O model.
200
Phase 1 – An edge contraction algorithm: The basic edge-contraction-based MSF algorithm proceeds in stages. In each stage the minimum-weight edge incident to each vertex is selected and output as part of the MSF, and the vertices connected by the selected edges are contracted into super vertices (that is, the connected components of the graph of selected edges are contracted).
201
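One stage of this contraction can be sketched as follows. This is an in-memory illustration of the stage just described, not the cache-oblivious version: a small union-find stands in for the sort/scan-based leader election, edge weights are assumed distinct, and parallel edges between super vertices are kept rather than pruned.

```python
def contraction_stage(vertices, edges):
    """One stage of the edge-contraction MSF algorithm.
    edges: list of (weight, u, v) for an undirected graph with distinct weights.
    Returns (selected MSF edges, contracted vertex set, relabeled edge list)."""
    # 1. Select the minimum-weight edge incident to each vertex.
    best = {}
    for w, u, v in edges:
        for x in (u, v):
            if x not in best or w < best[x][0]:
                best[x] = (w, u, v)
    selected = set(best.values())
    # 2. Contract the connected components of the selected edges
    #    (union-find stands in for the leader election of the paper).
    parent = {v: v for v in vertices}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]        # path halving
            x = parent[x]
        return x
    for w, u, v in selected:
        parent[find(u)] = find(v)
    # 3. Replace each edge (u,v) by (leader(u), leader(v)); drop self-loops.
    new_edges = [(w, find(u), find(v)) for w, u, v in edges if find(u) != find(v)]
    new_vertices = {find(v) for v in vertices}
    return selected, new_vertices, new_edges
```

On a triangle with edge weights 1, 2, 3, one stage selects the two cheapest edges and contracts all three vertices into a single super vertex, leaving no edges.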
(Slides 202–245: step-by-step figures of contraction stages on a 10-vertex example graph with vertices N, P, Q, R, S, U, V, W, Y, Z; the minimum-weight edge incident to each vertex is selected, the connected components of the selected edges are contracted into super vertices S′, U′, W′, R′ and then W″, R″, and the edges are relabeled; the diagrams do not survive extraction.)
245
Phase 1 – An edge contraction algorithm. Lemma: The minimal spanning forest of an undirected weighted graph can be computed cache-obliviously in O(sort(E)·log₂V) memory transfers.
246
Phase 1 – An edge contraction algorithm. Proof: We can select the minimum-weight edges in O(sort(E)) memory transfers using a few scans and sorts. To perform the contraction, we select a leader vertex in each connected component of the graph G_s of selected edges and replace every edge (u,v) in G with the edge (leader(u), leader(v)); this can be done in O(sort(E)) memory transfers using a few sort and scan steps on the vertices and edges.
247
Phase 1 – An edge contraction algorithm. Proof (continued): Since each contraction stage reduces the number of vertices by at least a factor of two, and since a stage is performed in O(sort(E)) memory transfers, we can reduce the number of vertices to V′ = V/2^i in O(sort(E)·log₂(V/V′)) memory transfers by performing i stages one after another. Thus we obtain an O(sort(E)·log₂V) algorithm by continuing the contraction until we are left with no edges. (Arge et al. showed how to improve this bound to O(sort(E)·log₂log₂V); the extra steps involved in their improvement can be made cache-oblivious.)
248
Phase 2 – Modified version of Prim's algorithm. Reminder: Prim's algorithm. (Slide figure: an example weighted graph on the vertices N, P, Q, R, S, U, V, W, Y, Z; the diagram does not survive extraction.)
249
Phase 2 – Modified version of Prim's algorithm: Like Prim's algorithm, this algorithm grows the MSF iteratively. During the algorithm, we maintain a priority queue P containing (at least) all edges connecting vertices in the current MSF with vertices not in the tree; P can also contain edges between two vertices already in the MSF. Initially it contains all edges incident to the source vertex. In each step of the algorithm we extract the minimum-weight edge (u,v) from P. If v is already in the MSF we discard the edge; otherwise we include v in the MSF and insert all edges incident to v, except (v,u), into the priority queue.
250
Phase 2 – Modified version of Prim's algorithm: We can efficiently determine whether v is already in the MSF: if u and v are both already in the MSF, then (u,v) must be in the priority queue twice, so if the next edge we extract from P is also (u,v), then v is already in the MSF.
251
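The duplicate-edge trick can be sketched with a standard binary heap standing in for the cache-oblivious priority queue. This is an illustrative sketch under the slide's assumptions (undirected graph, distinct edge weights); like the slides' algorithm it grows the tree of the source's component and decides "v is already in the tree" purely by comparing consecutive extracted edges, never by a membership lookup.

```python
import heapq

def msf_prim(graph, source):
    """graph: dict vertex -> list of (weight, neighbor) for an undirected graph
    with distinct edge weights. Returns the tree edges of source's component."""
    pq = [(w, source, v) for w, v in graph[source]]   # all edges incident to the source
    heapq.heapify(pq)
    tree = []
    while pq:
        w, u, v = heapq.heappop(pq)
        # If u and v are both already in the tree, this edge was inserted twice;
        # weights being distinct, its twin is now the minimum: discard both.
        if pq and pq[0][0] == w:
            heapq.heappop(pq)
            continue
        tree.append((w, u, v))                        # v joins the MSF
        for w2, x in graph[v]:
            if x != u:                                # insert all edges of v except (v,u)
                heapq.heappush(pq, (w2, v, x))
    return tree
```

On the triangle a–b (weight 1), b–c (weight 2), a–c (weight 3), the edge (a,c) enters the queue twice and both copies are discarded together, leaving the tree edges of weight 1 and 2.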
Phase 2 – Modified version of Prim's algorithm: We assume that all edge weights are distinct. (Slide figure: the example weighted graph; the diagram does not survive extraction.)
252
(Slides 253–301: step-by-step figures of the modified Prim's algorithm on the example graph, showing the priority queue P as weighted edges such as (Y,U,1), (Y,Z,15), (U,V,3) are inserted, extracted, and, when both endpoints are already in the MSF, discarded in duplicate pairs; the diagrams do not survive extraction.)
301
Phase 2 – Modified version of Prim’s algorithm Lemma: The minimal spanning forest of an undirected weighted graph can be computed cache-obliviously in O(V+sort(E)) memory transfers.
302
Phase 2 – Modified version of Prim's algorithm. Proof: During the algorithm we access the edges in the adjacency list of each vertex v once (when v is included in the MSF), for a total of O(V + E/B) memory transfers. We also perform O(E) priority queue operations, for a total of O(sort(E)) memory transfers. Thus the algorithm uses O(V + sort(E)) memory transfers.
303
Combined algorithm: In the I/O model, an O(sort(E)·log₂log₂(V·B/E)) MSF algorithm can be obtained by running the phase-1 algorithm until the number of vertices has been reduced to V′ = E/B, using O(sort(E)·log₂log₂(V·B/E)) memory transfers, and then finishing the MSF computation in O(V′ + sort(E)) = O(sort(E)) memory transfers using the phase-2 algorithm. As mentioned, we cannot combine the two phases as effectively in the cache-oblivious model. In general, however, we can combine the two algorithms to obtain an O(sort(E)·log₂log₂(V/V′) + V′) algorithm for any V′ independent of B and M.
304
Conclusions