A Survey of Techniques for Designing I/O-Efficient Algorithms
S. Fahimeh Moosavi, Fall 1389
Basic Techniques
Scanning: a linear scan of an array of N elements takes scan(N) = Θ(N/B) I/Os.
Sorting: N elements can be sorted in sort(N) = O((N/B) log_{M/B}(N/B)) I/Os.
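For reference, these two bounds are used in every I/O bound that follows; written out (with N the input size, M the internal memory size, and B the block size):

```latex
\[
  \mathrm{scan}(N) \;=\; \Theta\!\left(\frac{N}{B}\right),
  \qquad
  \mathrm{sort}(N) \;=\; O\!\left(\frac{N}{B}\,\log_{M/B}\frac{N}{B}\right).
\]
```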
Simulation of Parallel Algorithms in External Memory
PRAM (Parallel Random Access Machine)
p processors, each with local memory.
Each processor has a unique id in the range 0 to p-1.
Processors communicate through shared memory reads and writes.
At each time step, a processor is either active or idle (depending on its id).
At each time step, the active processors may execute different instructions on different data.
Note: any pair of processors P_i, P_j can communicate in constant time: P_i writes a message to cell x at time t, and P_j reads cell x at time t+1.
Measures of performance: 1. running time; 2. the amount of work performed.
PRAM algorithm A uses N processors and O(N) space, and runs in time T(N).
Assumption: every computation step of a processor consists of a constant number of read/write accesses to shared memory.
Simulating one step of algorithm A (see the sketch after the analysis below):
1. Scan the list of processor contexts and extract the read requests.
2. Sort the resulting list of read requests by the memory locations they access.
3. Scan the sorted list of read requests together with the memory representation to attach the requested values.
4. Sort the list of read requests again, by the issuing processor.
5. Scan the sorted list of read requests and the list of processor contexts to deliver the requested operands to each processor.
The simulation performs O(1) scans of the list of processor contexts, O(1) scans of the representation of the shared memory, and a constant number of scans and sorts of the list of read/write requests. All of these lists have size O(N).
Consequence: simulating one step of algorithm A takes O(sort(N)) I/Os.
Theorem 3.2. A PRAM algorithm that uses N processors and O(N) space and runs in time T(N) can be simulated in O(T(N) · sort(N)) I/Os.
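The sort/scan pattern of steps 1-5 can be mirrored by a small in-memory sketch; Python's sort stands in for the external-memory sort, and the Context record, its field names, and the omission of the write phase are illustrative assumptions rather than anything prescribed by the survey.

```python
# Sketch of one simulated PRAM step (read phase only), run in main memory.
from dataclasses import dataclass, field

@dataclass
class Context:
    pid: int                                      # processor id
    read_addrs: list                              # cells this processor reads this step
    operands: dict = field(default_factory=dict)  # filled in by the simulation

def simulate_step(contexts, memory):
    """memory: list of (address, value) pairs; assumes every requested address exists."""
    # Step 1: scan the contexts and emit read requests (address, pid).
    requests = [(addr, c.pid) for c in contexts for addr in c.read_addrs]
    # Step 2: sort the requests by memory location.
    requests.sort(key=lambda r: r[0])
    # Step 3: scan requests and memory together to attach the stored values.
    mem_sorted = sorted(memory)                   # (address, value), by address
    answered, i = [], 0
    for addr, pid in requests:
        while mem_sorted[i][0] < addr:
            i += 1
        answered.append((pid, addr, mem_sorted[i][1]))
    # Step 4: sort the answered requests by issuing processor.
    answered.sort(key=lambda r: r[0])
    # Step 5: scan contexts and answers together to deliver the operands.
    j = 0
    for c in sorted(contexts, key=lambda c: c.pid):
        while j < len(answered) and answered[j][0] == c.pid:
            _, addr, val = answered[j]
            c.operands[addr] = val
            j += 1
    # (The local computation and the symmetric write phase are omitted.)
    return contexts

# Example: two processors each reading one shared-memory cell.
mem = [(0, 'x'), (1, 'y'), (2, 'z')]
ctxs = [Context(pid=0, read_addrs=[2]), Context(pid=1, read_addrs=[0])]
for c in simulate_step(ctxs, mem):
    print(c.pid, c.operands)          # 0 {2: 'z'}   then   1 {0: 'x'}
```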
Time-Forward Processing
Evaluating a DAG G
L: an assignment of labels L(v) to the vertices of G.
Goal: compute another labelling S of the vertices of G so that, for every vertex v ∈ G, S(v) can be computed from L(v) and S(u_1), ..., S(u_k), where u_1, ..., u_k are the in-neighbors of v.
Example: expression-tree evaluation
Input: a binary tree T whose leaves store real numbers and whose internal vertices store binary operations.
If v is a leaf, then val(v) = the number stored at v.
If v is an internal vertex with operation ∘, left child x, and right child y, then val(v) = val(x) ∘ val(y).
Evaluating a DAG G I/O-efficiently
Two assumptions:
1. The vertices of G have to be stored in topologically sorted order.
2. Label S(v) has to be computable from the labels L(v) and S(u_1), ..., S(u_k) in O(sort(k)) I/Os.
Time-forward processing sends the label S(u) of every vertex u "forward in time" to each of its out-neighbors v by inserting a copy of S(u) into an external priority queue Q, with priority equal to v's position in the topological order; when v is processed, its operands are retrieved from Q with deletemin operations.
Insertion and deletemin operations on Q can be performed in O((1/B) log_{M/B}(|E|/B)) I/Os amortized.
Total number of priority queue operations: O(|E|), since every edge is inserted into and deleted from Q exactly once.
Consequence: all priority queue updates together take O(sort(|E|)) I/Os.
Note: scanning the vertex set and the adjacency lists takes O(scan(|V| + |E|)) I/Os. Computing the labels S(v) from L(v) and S(u_1), ..., S(u_k), over all v ∈ G, takes O(sort(|E|)) I/Os.
Theorem 3.3. Given a DAG G = (V,E) whose vertices are stored in topologically sorted order, G can be evaluated in O(sort(|V| + |E|)) I/Os, provided that the computation of the label of every vertex v ∈ G can be carried out in O(sort(deg⁻(v))) I/Os, where deg⁻(v) is the in-degree of v.
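A minimal in-memory sketch of time-forward processing, applied to the expression-tree example: heapq stands in for the external-memory priority queue (a buffer tree in the real algorithm), so the I/O behaviour is not modelled, and all function and variable names are illustrative.

```python
import heapq
from itertools import count

def time_forward_process(vertices, out_neighbors, L, compute):
    """
    vertices:      vertex ids in topologically sorted order (id = topological position).
    out_neighbors: out_neighbors[v] lists the out-neighbors (successors) of v.
    L:             L[v] is the input label of v.
    compute:       compute(L_v, incoming_S_labels) -> S_v.
    """
    Q, S, tie = [], {}, count()
    for v in vertices:
        # Collect the labels sent "forward in time" to v by its in-neighbors.
        incoming = []
        while Q and Q[0][0] == v:
            incoming.append(heapq.heappop(Q)[2])
        S[v] = compute(L[v], incoming)
        # Send S(v) on to every out-neighbor w, with priority w (its topological number).
        for w in out_neighbors[v]:
            heapq.heappush(Q, (w, next(tie), S[v]))
    return S

# Example: the expression tree (3 + 4) * 2, edges directed from child to parent.
# Vertices in topological order: 0 = leaf 3, 1 = leaf 4, 2 = '+', 3 = leaf 2, 4 = '*'.
labels = {0: 3.0, 1: 4.0, 2: '+', 3: 2.0, 4: '*'}
succ   = {0: [2], 1: [2], 2: [4], 3: [4], 4: []}

def evaluate(label, operands):
    if not operands:                    # a leaf: its label is its value
        return label
    x, y = operands
    return x + y if label == '+' else x * y

print(time_forward_process([0, 1, 2, 3, 4], succ, labels, evaluate))
# {0: 3.0, 1: 4.0, 2: 7.0, 3: 2.0, 4: 14.0}
```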
Greedy Graph Algorithms
A vertex labelling algorithm A is called:
Single-pass: if it computes the desired labelling of the vertices of the graph by visiting every vertex exactly once.
Local: if label L(v) can be computed in O(sort(k)) I/Os from the labels L(u_1), ..., L(u_k), where u_1, ..., u_k are the neighbors of v whose labels are computed before L(v).
Presortable: if there is an algorithm that takes O(sort(|V| + |E|)) I/Os to compute an order of the vertices of the graph so that A produces a correct result if it visits the vertices in this order.
Main problems in making algorithm A I/O-efficient:
1. Determine an order in which algorithm A should visit the vertices of the graph.
2. Devise a mechanism that provides every vertex v with the labels of its previously visited neighbors.
Theorem 3.4. Every graph problem P that can be solved by a presortable local single-pass vertex labelling algorithm can be solved in O(sort(|V| + |E|)) I/Os.
Proof sketch:
A: a presortable local single-pass vertex labelling algorithm.
L: the labelling of the vertices of a graph G = (V,E) computed by A.
A′: the algorithm that takes O(sort(|V| + |E|)) I/Os to compute an order of the vertices of G (numbering the vertices).
G′: the DAG derived from G by directing every edge from the vertex with the smaller number to the vertex with the larger number.
Hence, the labelling L can be computed using time-forward processing on G′.
Computing a Maximal Independent Set
In internal memory: process the vertices in an arbitrary order; when a vertex v ∈ V is visited, add it to S if none of its neighbors is in S.
Translation into a labelling problem: X_S : V → {0,1}, with X_S(v) = 1 if v ∈ S and X_S(v) = 0 if v ∉ S.
Theorem 3.5. Given an undirected graph G = (V,E), a maximal independent set of G can be found in O(sort(|V| + |E|)) I/Os and linear space.
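A sketch of the maximal-independent-set labelling as a presortable local single-pass algorithm; it reuses time_forward_process from the sketch above, and directing every edge from the lower-numbered to the higher-numbered vertex plays the role of the derived DAG G′. The function name and edge-list format are assumptions of the sketch.

```python
def maximal_independent_set(n, edges):
    """n vertices numbered 0..n-1 (any fixed numbering works), undirected edge list."""
    succ = {v: [] for v in range(n)}
    for u, v in edges:
        succ[min(u, v)].append(max(u, v))        # direct edges forward in the order
    labels = {v: None for v in range(n)}          # L(v) is not needed for MIS
    # X_S(v) = 1 iff no previously visited neighbor is already in S.
    def in_set(_label, earlier_neighbor_flags):
        return 0 if any(earlier_neighbor_flags) else 1
    X = time_forward_process(list(range(n)), succ, labels, in_set)
    return {v for v, flag in X.items() if flag == 1}

# Example: on the path 0-1-2-3 this numbering yields the independent set {0, 2}.
print(maximal_independent_set(4, [(0, 1), (1, 2), (2, 3)]))
```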
Coloring Graphs of Bounded Degree
In internal memory: process the vertices in an arbitrary order; when a vertex v ∈ V is visited, assign to v a color c(v) ∈ {1, ..., Δ+1} that has not been assigned to any neighbor of v.
Theorem 3.6. Given an undirected graph G = (V,E) whose vertices have degree at most Δ, a (Δ+1)-coloring of G can be found in O(sort(|V| + |E|)) I/Os and linear space.
Application: coloring a graph G of bounded degree, here with maximum degree 3, using 3+1 colors.
- Choose an arbitrary order of the vertices.
- Assign to each vertex a color that has not been chosen by its previously visited neighbors: order the colors of its neighbors and assign the first unused color to it.
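A sketch of the (Δ+1)-coloring rule in the same style, again reusing time_forward_process from above; delta and the edge-list format are assumptions of the sketch.

```python
def greedy_coloring(n, edges, delta):
    succ = {v: [] for v in range(n)}
    for u, v in edges:
        succ[min(u, v)].append(max(u, v))         # direct edges forward in the order
    labels = {v: None for v in range(n)}
    # Each vertex picks the first color in {1, ..., delta+1} not used by the
    # previously visited neighbors whose colors are sent forward to it.
    def pick_color(_label, earlier_neighbor_colors):
        used = set(earlier_neighbor_colors)
        return next(c for c in range(1, delta + 2) if c not in used)
    return time_forward_process(list(range(n)), succ, labels, pick_color)

# Example: a 4-cycle with one chord has maximum degree 3 and gets at most 4 colors.
print(greedy_coloring(4, [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)], delta=3))
# {0: 1, 1: 2, 2: 3, 3: 2}
```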
List Ranking and the Euler Tour Technique
List Ranking
List ranking problem: for every vertex x_i of a linked list L, compute the distance from the head of L to x_i (the number of edges on the path from the head of L to x_i).
In internal memory: starting at the head of the list, follow successor pointers and number the vertices of the list from 0 to N-1 in the order they are visited.
Generalization of List Ranking (prefix products)
Input: a labelling λ : {x_1, ..., x_N} → X and an associative operator ⊗ : X × X → X.
Output: φ(x_i) for each vertex x_i of L such that
1. φ(x_σ(1)) = λ(x_σ(1)),
2. φ(x_σ(i)) = φ(x_σ(i-1)) ⊗ λ(x_σ(i)) for 1 < i ≤ N,
where σ : [1,N] → [1,N] is the permutation with x_σ(1) the head of L and succ(x_σ(i)) = x_σ(i+1).
I/O complexity: O(sort(N)).
Example (figure): list ranking and the generalized prefix products on a sample list.
The internal-memory algorithm is not I/O-efficient: it spends Ω(N) I/Os in the worst case, since following successor pointers may access a different block at every step.
An Efficient List Ranking Algorithm
1. Find an independent set I of L with |I| = Ω(N).
2. Remove the elements of I from L: for every element x ∈ I with predecessor y and successor z in L, set succ(y) = z, multiply the label of x with the label of z, and assign the result to z.
3. Apply the algorithm recursively to the compressed list.
4. Compute the ranks of the elements of I by multiplying the ranks of their predecessors in L with their own labels.
(A sketch follows after the I/O analysis.)
Example (figure): one contraction round of the list ranking algorithm on a sample list.
I/O-Complexity
Every step, except the recursive invocation, takes O(sort(N)) I/Os.
Total I/O complexity: I(N) = I(cN) + O(sort(N)) for some constant 0 < c < 1.
The solution of this recurrence is O(sort(N)).
Theorem 3.7. A list of length N can be ranked in O(sort(N)) I/Os.
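An in-memory sketch of the contraction-based algorithm above. A real external-memory version implements every step with O(sort(N))-I/O sorts and scans and finds the independent set via 3-coloring or randomized coin flipping; none of that is modelled here, and the greedy independent-set choice is only for illustration.

```python
def list_rank(succ, head, label=None):
    """succ: x -> successor of x (None at the tail); label: x -> weight (default 1).
    Returns the prefix sum of labels from the head up to and including each element;
    with unit labels this is the 1-based position (the head gets 1)."""
    label = dict(label) if label else {x: 1 for x in succ}
    if len(succ) <= 2:                               # base case: follow the list
        ranks, x, total = {}, head, 0
        while x is not None:
            total += label[x]
            ranks[x] = total
            x = succ[x]
        return ranks
    pred = {s: x for x, s in succ.items() if s is not None}
    # Step 1: an independent set I of interior elements (greedy, for illustration).
    I, blocked = set(), set()
    for x in succ:
        if x == head or succ[x] is None or x in blocked:
            continue
        I.add(x)
        blocked.update((pred[x], succ[x]))
    # Step 2: splice every x in I out of the list, folding its label into succ(x).
    new_succ = {x: s for x, s in succ.items() if x not in I}
    new_label = {x: w for x, w in label.items() if x not in I}
    for x in I:
        new_succ[pred[x]] = succ[x]
        new_label[succ[x]] = label[x] + new_label[succ[x]]
    # Step 3: recurse on the compressed list.
    ranks = list_rank(new_succ, head, new_label)
    # Step 4: reintegrate the removed elements: rank(x) = rank(pred(x)) + label(x).
    for x in I:
        ranks[x] = ranks[pred[x]] + label[x]
    return ranks

# Example: a five-element list with unit labels.
succ = {'a': 'b', 'b': 'c', 'c': 'd', 'd': 'e', 'e': None}
print(list_rank(succ, 'a'))      # a:1, b:2, c:3, d:4, e:5 (key order may vary)
```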
The Euler Tour Technique
Euler tour of a tree T: a traversal of T that traverses every edge exactly twice, once in each direction.
The tour is represented as a linked list L whose elements are the edges in the set {(v,w), (w,v) : {v,w} ∈ E}, so that for any two consecutive edges e_1 and e_2, the target of e_1 is the source of e_2.
Defining an Euler tour
Choose a circular order of the edges incident to each vertex: let {v,w_1}, ..., {v,w_k} be the edges incident to vertex v; then let succ((w_i,v)) = (v,w_{i+1}) for 1 ≤ i < k and succ((w_k,v)) = (v,w_1).
This defines a circular tour. To turn it into a list, choose an edge (v,r) with succ((v,r)) = (r,w), set succ((v,r)) = null, and take (r,w) as the first edge of the traversal.
Computing the list L
Input: a tree T = (V,E). Output: an Euler tour L.
1. Scan the set E to replace every edge {v,w} with the two directed edges (v,w) and (w,v).
2. Sort the resulting set of directed edges by their target vertices.
3. Scan the sorted edge list to compute the successor of every edge in L.
Lemma 3.8. An Euler tour L of a tree with N vertices can be computed in O(sort(N)) I/Os.
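A sketch of the three steps of Lemma 3.8: sorting by target vertex groups the edges entering each vertex, which fixes the circular order used to define the successors. The break-at-the-root step anticipates the rooting algorithm below; all names are illustrative.

```python
def euler_tour(edges, root):
    """edges: undirected tree edges {u,v} given as tuples; returns (succ, first_edge)."""
    directed = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]    # step 1
    directed.sort(key=lambda e: e[1])                                     # step 2: by target
    # Step 3: the edges (w_1,v), ..., (w_k,v) entering a vertex v are now consecutive;
    # chain succ((w_i,v)) = (v, w_{i+1}), wrapping around at the end of the group.
    succ, i = {}, 0
    while i < len(directed):
        j = i
        while j < len(directed) and directed[j][1] == directed[i][1]:
            j += 1
        group = directed[i:j]                     # all edges entering the same vertex
        for k, (w, v) in enumerate(group):
            w_next = group[(k + 1) % len(group)][0]
            succ[(w, v)] = (v, w_next)
        i = j
    # Break the circular tour at the root: some edge (v, root) becomes the last edge,
    # and its former successor becomes the first edge of the traversal.
    v_into_root = next(w for (w, v) in succ if v == root)
    first = succ[(v_into_root, root)]
    succ[(v_into_root, root)] = None
    return succ, first

# Example: a star with center 1 and leaves 0, 2, 3, broken open at vertex 1.
succ, first = euler_tour([(1, 0), (1, 2), (1, 3)], root=1)
tour, e = [], first
while e is not None:
    tour.append(e)
    e = succ[e]
print(tour)    # every edge appears twice, once per direction, starting at vertex 1
```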
Rooting a Tree
Rooting a tree T = computing, for every edge {v,w}, which endpoint is the parent and which is the child.
Definition: for every pair of opposite edges (u,v), (v,u) in the ranked Euler tour, we call the edge with the lower rank the forward edge and the other one the back edge.
Algorithm
Input: an unrooted (undirected) tree T and a distinguished vertex r.
Output: for each vertex v ≠ r, the parent p(v) of v in the tree rooted at r.
1. Construct an Euler tour starting at an edge (r,v).
2. Compute the rank of every edge in the list.
For every pair of adjacent vertices x and p(x), edge (p(x),x) is a forward edge and edge (x,p(x)) is a back edge, so the forward edges identify the parents.
I/O complexity
Constructing the Euler tour starting at r: O(sort(N)) I/Os.
Ranking the Euler tour: O(sort(N)) I/Os.
Extracting the set of forward edges: O(sort(N)) I/Os.
Total I/O complexity: O(sort(N)).
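A sketch combining the two previous sketches: rank the Euler tour with list_rank, then, for each pair of opposite edges, the one with the lower rank is the forward edge (parent to child).

```python
def root_tree(edges, r):
    succ, first = euler_tour(edges, r)
    rank = list_rank(succ, first)          # position of each directed edge in the tour
    parent = {}
    for u, v in edges:
        if rank[(u, v)] < rank[(v, u)]:
            parent[v] = u                  # (u,v) is the forward edge
        else:
            parent[u] = v
    return parent

# Example: the star from above, rooted at 1, gives parent 1 for every leaf.
print(root_tree([(1, 0), (1, 2), (1, 3)], r=1))   # {0: 1, 2: 1, 3: 1}
```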
Computing a Preorder Numbering
A preorder numbering of a rooted tree T can be computed in O(sort(N)) I/Os:
preorder#(r) = 1,
preorder#(v) = rank((p(v),v)) + 1,
where the rank is the weighted rank of the tour obtained by assigning weight 1 to every forward edge and weight 0 to every back edge.
Computing Subtree Sizes
The nodes of T can be labelled with their subtree sizes in O(sort(N)) I/Os: with the same 0/1 weighting of the tour, |T(v)| = rank((v,p(v))) - rank((p(v),v)) + 1.
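A sketch of both computations from the weighted Euler tour (forward edges weigh 1, back edges 0), building on euler_tour, list_rank, and root_tree above; the formulas follow the two slides above, and the function name is an assumption.

```python
def preorder_and_subtree_sizes(edges, r):
    succ, first = euler_tour(edges, r)
    parent = root_tree(edges, r)
    # Weight 1 for forward edges (p(v) -> v), weight 0 for back edges (v -> p(v)).
    weights = {(u, v): 1 if parent.get(v) == u else 0 for (u, v) in succ}
    wrank = list_rank(succ, first, weights)       # weighted ranks of the tour
    preorder = {r: 1}
    size = {r: len(parent) + 1}                   # the root's subtree is the whole tree
    for v, p in parent.items():
        preorder[v] = wrank[(p, v)] + 1
        size[v] = wrank[(v, p)] - wrank[(p, v)] + 1
    return preorder, size

# Example: the path 1-2-3 rooted at 1.
print(preorder_and_subtree_sizes([(1, 2), (2, 3)], r=1))
# ({1: 1, 2: 2, 3: 3}, {1: 3, 2: 2, 3: 1})
```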
Graph Blocking
Blocking Graphs
Goal: lay out a graph on disk so that traversals of paths in the graph cause as few page faults as possible.
Assumptions: 1. The graph to be stored on disk is static. 2. The paths are traversed in an online fashion.
Measures of performance: 1. The number of page faults incurred by a path traversal in the worst case. 2. The amount of space used by the graph representation.
Notes:
1. In order to store a graph with N vertices, at least N/B blocks are required.
2. In the worst case, the traversal of a path of length L causes at least L/B page faults.
Definition: the storage blow-up of a graph blocking is β if the blocking uses βN/B blocks of storage to store the graph on disk.
Blocking Lists
Natural approach, β = 1: store the list in an array in the order 1, ..., N.
A simple traversal in the direction 1, ..., N of a path of length L incurs only about L/B page faults.
For more complicated traversals (that may change direction): if M ≥ 2B, keeping the previously loaded block in memory as well guarantees a page fault at most every B-1 steps, so a path of length L still incurs only O(L/B) page faults.
The pathological situation M = B
An adversary can choose a path that causes a page fault at every single step, e.g. a path p = (v, w, v, w, ...) where v and w lie in different blocks: whenever vertex v is visited, the block containing v is brought into main memory, thereby evicting the block containing w (and vice versa).
Thwarting the adversary's strategy: choose β = 2 and store, in a second array, a copy of the list shifted by an offset of B/2. After a page fault, the vertex v that caused it is at least B/2 - 1 steps away from the next page fault, since the paging algorithm alternates between the two arrays every time a page fault occurs.
Result: traversing a path of length L now incurs at most 2L/B page faults.
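A minimal sketch of the offset blocking and the alternating paging rule just described, assuming M = B (one resident block); block_id and count_page_faults are illustrative names, not part of the survey.

```python
def block_id(pos, array, B):
    """Block index of list position pos (0-based) in array 0 (unshifted) or array 1
    (shifted by B/2)."""
    offset = B // 2 if array == 1 else 0
    return (pos + offset) // B

def count_page_faults(path, B):
    """Simulate the paging rule that switches arrays on every page fault."""
    faults, array, loaded = 0, 0, None
    for pos in path:
        if loaded is not None and block_id(pos, array, B) == loaded:
            continue                       # still inside the resident block
        array = 1 - array                  # page fault: switch to the other array
        loaded = block_id(pos, array, B)
        faults += 1
    return faults

# Adversarial back-and-forth walk across a block boundary of array 0:
B = 8
path = [B - 1, B, B - 1, B] * 10           # oscillate around position B
print(count_page_faults(path, B))          # far fewer faults than len(path)
```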
Blocking Trees
Blocking trees requires some restrictions on the tree or on the type of traversal: consider a tree containing a vertex v of degree at least M; then at most M-1 of v's neighbors can reside in memory at the same time as v, so an adversary can always step to a missing neighbor and cause a page fault.
Result: for unrestricted traversals, a good blocking of a tree can be achieved if the degree of the vertices of the tree is bounded by some constant d.
Constructing the layout
Choose one vertex r of T as the root and construct two partitions of T into layers of height log_d B. The i-th layer contains:
Partition 1: all vertices at distance (i-1) log_d B, ..., i log_d B - 1 from r.
Partition 2: all vertices at distance (i-1/2) log_d B, ..., (i+1/2) log_d B - 1 from r.
Each layer in both partitions consists of subtrees of size at most B, so each subtree can be stored in one block. Small subtrees can be packed together so that no block is less than half full.
Result: both partitions together use at most 4N/B blocks, so the storage blow-up is at most four.
Paging Algorithm
The paging algorithm alternates between the two partitions. If v is a vertex that causes a page fault, all vertices at most (log_d B)/2 - 1 steps away from v can be reached without another page fault.
Consequently, traversing a path of length L causes at most 2L/(log_d B) page faults.
Special case
If all traversed paths are restricted to travel away from the root of T, only the first of the two partitions is needed, so:
1. the storage blow-up can be reduced to two, and
2. the number of page faults can be reduced to L/(log_d B).
For traversals towards the root:
1. O(N/B) disk blocks suffice,
2. a page fault occurs only every Ω(B) steps, and
3. a path of length L can be traversed in O(L/B) I/Os.
Blocking Grids
Lists are equivalent to 1D grids: in both cases, the grid is covered with subgrids of size B. In the 2D case the subgrids have dimension √B × √B (such a covering is called a tessellation).
Blocking 2D Grids
The subgrids of dimension √B × √B are stored as blocks. A blocking that consists of three tessellations guarantees that a page fault occurs only every ω(1) steps; a blocking with two tessellations is not sufficient if M = B.
If M = B, then a storage blow-up of 3 is used to guarantee a page fault at most every √B/6 steps: the three tessellations are offset by √B/3 against each other in the x and y directions, and the paging algorithm brings the appropriate subgrid into memory on each fault.
If M ≥ 2B, then a storage blow-up of 2 is sufficient to achieve at most one page fault every √B/4 steps; the two tessellations are offset by √B/2 in the x and y directions.
If M ≥ 3B, then a storage blow-up of 1 ensures at most 4L/√B page faults, where L is the length of the path.
Proof idea: within any √B/2 consecutive steps, at most 2 page faults can occur; if v causes a page fault after u was visited, all vertices within distance √B/2 of v can be reached without a further page fault.
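A sketch of the tessellation bookkeeping common to the three cases above: each tessellation maps a grid cell to a √B × √B subgrid shifted by a per-tessellation offset. The offsets and example values follow the M = B case, and the function names are assumptions of the sketch.

```python
from math import isqrt

def tile_of(x, y, B, offset=(0, 0)):
    """Block (tile) coordinates of grid cell (x, y) under a shifted tessellation."""
    s = isqrt(B)                           # side length of a subgrid
    ox, oy = offset
    return ((x + ox) // s, (y + oy) // s)

# Example with B = 36 (s = 6): the three tessellations of the M = B case are
# offset by s/3 = 2 against each other in both directions.
B, s = 36, 6
offsets = [(0, 0), (2, 2), (4, 4)]
cell = (5, 5)                              # a cell at a corner of tessellation 0
print([tile_of(*cell, B, off) for off in offsets])
# The same cell sits in a different relative position inside its tile in each
# tessellation; the paging algorithm loads the tile in which the walk stays
# away from a tile boundary the longest.
```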
Blocking Planar Graphs
The graphs must have bounded degree. With constant storage blow-up, an upper bound of 4L/(log_d B) page faults can be guaranteed for a path of length L, where d is the maximal degree of a vertex.
We divide the graph into regions. A vertex is called interior if it is connected only to vertices in the same region; vertices that belong to at least two regions are called boundary vertices. A B-division is a covering of the graph by O(N/B) regions, each with at most B vertices.
By a result of Frederickson, the total number of boundary vertices is O(N/√B): for every planar graph G there exists a set S of O(N/√B) boundary vertices so that no region is larger than B.
Every region is stored in a single block, and small regions are packed together so that no block is less than half full. This representation uses at most 2N/B blocks, which is O(N/B).
In a next step we block the neighborhood of each boundary vertex v.
The neighborhood of a boundary vertex v consists of all vertices reachable from v within (log_d B)/2 steps. Such a neighborhood contains at most d^((log_d B)/2) = √B vertices, so √B neighborhoods fit into a single block. We divide the O(N/√B) boundary vertices into subsets of √B vertices and store each subset together with its neighborhoods in one block; this gives O((N/√B)/√B) = O(N/B) blocks. Therefore the whole storage blow-up is O(1).
Page faults: as long as the path remains in the same region, no page faults occur. If the path leaves a region and thereby causes a page fault, it must be sitting at a boundary vertex, so by bringing the block containing that boundary vertex and its neighborhood into memory we can make (log_d B)/2 steps before another page fault occurs.
Theorem 3.9. A planar graph with N vertices of degree at most d can be stored in O(N/B) blocks so that any path of length L can be traversed in O(L/log_d B) I/Os.
End.