Presentation is loading. Please wait.

Presentation is loading. Please wait.

I/O-Algorithms Lars Arge University of Aarhus February 21, 2005.

Similar presentations


Presentation on theme: "I/O-Algorithms Lars Arge University of Aarhus February 21, 2005."— Presentation transcript:

1 I/O-Algorithms Lars Arge University of Aarhus February 21, 2005

2 Lars Arge I/O-Algorithms 2 I/O-Model Parameters N = # elements in problem instance B = # elements that fits in disk block M = # elements that fits in main memory T = # output size in searching problem I/O: Movement of block between memory and disk D P M Block I/O

3 Lars Arge I/O-Algorithms 3 Fundamental Bounds [AV88] Internal External Scanning: N Sorting: N log N Permuting –List rank Searching:

4 Lars Arge I/O-Algorithms 4 B-tree B-tree with branching parameter b and leaf parameter k (b,k≥8) –All leaves on same level and contain between 1/4k and k elements –Except for the root, all nodes have degree between 1/4b and b –Root has degree between 2 and b B-tree with leaf parameter –O(N/B) space –Height – amortized leaf rebalance operations – amortized internal node rebalance operations B-tree with branching parameter B c, c≥1, and leaf parameter B –Space O(N/B), updates, queries

5 Lars Arge I/O-Algorithms 5 Secondary Structures When secondary structures used, a rebalance on v often require O(w(v)) I/Os (w(v) is weight of v) –If inserts have to be made below v between operations  O(1) amortized split bound  amortized insert bound Nodes in standard B-tree do not have this property  tree

6 Lars Arge I/O-Algorithms 6 Weight-balanced B-tree Idea: Combination of B-tree and BB[  ]-tree –Weight constraint on nodes instead of degree constraint –Rebalancing performed using split/fuse as in B-tree Weight-balanced B-tree with parameters b and k (b>8, k≥8) –All leaves on same level and contain between k/4 and k elements –Internal node v at level l has w(v) < –Except for the root, internal node v at level l have w(v)> –The root has more than one child level l-1 level l

7 Lars Arge I/O-Algorithms 7 Weight-balanced B-tree Weight-balanced B-tree with branching parameter b and leaf paramater k=Ω(B) –O(N/B) space –Height – rebalancing operations after update –Ω(w(v)) updates below v between consecutive operations on v Weight-balanced B-tree with branching parameter B c and leaf parameter B –Updates in and queries in I/Os Construction bottom-up in I/O level l-1 level l

8 Lars Arge I/O-Algorithms 8 Persistent B-tree Update current version (getting new version) Query all versions N is number of updates performed –O(N/B) space – update – query in any version

9 Lars Arge I/O-Algorithms 9 Persistent B-tree Idea: Elements augmented with “existence interval” and stored in one structure Persistent B-tree with parameter b (>16): –Directed graph *Nodes contain elements augmented with existence interval *At any time t, nodes with elements alive at time t form B-tree with leaf and branching parameter b –B-tree with leaf and branching parameter b on indegree 0 node  If b=B: –Query at any time t in I/Os

10 Lars Arge I/O-Algorithms 10 B-tree Construction In internal memory we can sort N elements in O(N log N) time using a balanced search tree: –Insert all elements one-by-one (construct tree) –Output in sorted order using in-order traversal Same algorithm using B-tree use I/Os –A factor of non-optimal As discussed we could build B-tree bottom-up in I/Os –But what about persistent B-tree? –In general we would like to have dynamic data structure to use in algorithms  I/O operations

11 Lars Arge I/O-Algorithms 11 Main idea: Logically group nodes together and add buffers –Insertions done in a “lazy” way – elements inserted in buffers. –When a buffer runs full elements are pushed one level down. –Buffer-emptying in O(M/B) I/Os  every block touched constant number of times on each level  inserting N elements (N/B blocks) costs I/Os. Buffer-tree Technique B B M elements fan-out M/B

12 Lars Arge I/O-Algorithms 12 Definition: –B-tree with branching parameter and leaf parameter B –Size M buffer in each internal node Updates: –Add time-stamp to insert/delete element –Collect B elements in memory before inserting in root buffer –Perform buffer-emptying when buffer runs full Basic Buffer-tree $m$ blocks M B

13 Lars Arge I/O-Algorithms 13 Basic Buffer-tree Note: –Buffer can be larger than M during recursive buffer-emptying *Elements distributed in sorted order  at most M elements in buffer unsorted –Rebalancing needed when “leaf-node” buffer emptied *Leaf-node buffer-emptying only performed after all full internal node buffers are emptied $m$ blocks M B

14 Lars Arge I/O-Algorithms 14 Basic Buffer-tree Internal node buffer-empty: –Load first M (unsorted) elements into memory and sort them –Merge elements in memory with rest of (already sorted) elements –Scan through sorted list while *Removing “matching” insert/deletes *Distribute elements to child buffers –Recursively empty full child buffers Emptying buffer of size X takes O(X/B+M/B)=O(X/B) I/Os $m$ blocks M

15 Lars Arge I/O-Algorithms 15 Basic Buffer-tree Buffer-empty of leaf node with K elements in leaves –Sort buffer as previously –Merge buffer elements with elements in leaves –Remove “matching” insert/deletes obtaining K’ elements –If K’<K then *Add K-K’ “dummy” elements and insert in “dummy” leaves Otherwise *Place K elements in leaves *Repeatedly insert block of elements in leaves and rebalance Delete dummy leaves and rebalance when all full buffers emptied K

16 Lars Arge I/O-Algorithms 16 Basic Buffer-tree Invariant: Buffers of nodes on path from root to emptied leaf-node are empty  Insert rebalancing (splits) performed as in normal B-tree Delete rebalancing: v’ buffer emptied before fuse of v –Necessary buffer emptyings performed before next dummy- block delete –Invariant maintained v v v’ v v’’

17 Lars Arge I/O-Algorithms 17 Basic Buffer-tree Analysis: –Not counting rebalancing, a buffer-emptying of node with X ≥ M elements (full) takes O(X/B) I/Os  total full node emptying cost I/Os –Delete rebalancing buffer-emptying (non-full) takes O(M/B) I/Os  cost of one split/fuse O(M/B) I/Os –During N updates *O(N/B) leaf split/fuse * internal node split/fuse  Total cost of N operations: I/Os

18 Lars Arge I/O-Algorithms 18 Basic Buffer-tree Emptying all buffers after N insertions: Perform buffer-emptying on all nodes in BFS-order  resulting full-buffer emptyings cost I/Os empty non-full buffers using O(M/B)  O(N/B) I/Os  N elements can be sorted using buffer tree in I/Os $m$ blocks M B

19 Lars Arge I/O-Algorithms 19 Using buffer technique persistent B-tree built in I/O Insert and deletes on buffer-tree takes I/Os amortized –Alternative rebalancing algorithms possible (e.g. top-down) One-dim. rangesearch operations can also be supported in I/Os amortized –Search elements handle lazily like updates –All elements in relevant sub-trees reported during buffer-emptying –Buffer-emptying in O(X/B+T’/B), where T’ is reported elements Buffer-tree Technique $m$ blocks

20 Lars Arge I/O-Algorithms 20 Basic buffer tree can be used in external priority queue To delete minimal element: –Empty all buffers on leftmost path –Delete elements in leftmost leaf and keep in memory –Deletion of next M minimal elements free –Inserted elements checked against minimal elements in memory I/Os every O(M) delete  amortized Buffered Priority Queue B

21 Lars Arge I/O-Algorithms 21 Other External Priority Queues Buffer technique can be used on other priority queue structures –Heap –Tournament tree Priority queue supporting update often used in graph algorithms – on tournament tree –Major open problem to do it in I/Os Worst case efficient priority queue has also been developed –B operations require I/Os

22 Lars Arge I/O-Algorithms 22 Other Buffer-tree Technique Results Attaching  (B) size buffers to normal B-tree can also be use to improve update bound Buffered segment tree –Has been used in batched range searching and rectangle intersection algorithm Has been used on String B-tree to obtain I/O-efficient string sorting algorithms

23 Lars Arge I/O-Algorithms 23 Problem: –Maintain N intervals with unique endpoints dynamically such that stabbing query with point x can be answered efficiently As in (one-dimensional) B-tree case we are interested in – space – update – query Interval Management x

24 Lars Arge I/O-Algorithms 24 Interval Management: Static Solution Sweep from left to right maintaining persistent B-tree –Insert interval when left endpoint is reached –Delete interval when right endpoint is reached Query x answered by reporting all intervals in B-tree at “time” x – space – query – construction using buffer technique x

25 Lars Arge I/O-Algorithms 25 Base tree on endpoints – “slab” X v associated with each node v Interval stored in highest node v where it contains midpoint of X v Intervals I v associated with v stored in –Left slab list sorted by left endpoint (search tree) –Right slab list sorted by right endpoint (search tree)  Linear space and O(log N) update (assuming fixed endpoint set) Internal Interval Tree

26 Lars Arge I/O-Algorithms 26 Query with x on left side of midpoint of X root –Search left slab list left-right until finding non-stabbed interval –Recurse in left child  O(log N+T) query bound x Internal Interval Tree

27 Lars Arge I/O-Algorithms 27 Externalizing Interval Tree Natural idea: –Block tree –Use B-tree for slab lists Number of stabbed intervals in large slab list may be small (or zero) –We can be forced to do I/O in each of O(log N) nodes

28 Lars Arge I/O-Algorithms 28 Externalizing Interval Tree Idea: –Decrease fan-out to  height remains – slabs define multislabs –Interval stored in two slab lists (as before) and one multislab list –Intervals in small multislab lists collected in underflow structure –Query answered in v by looking at 2 slab lists and not O(log N) multislab

29 Lars Arge I/O-Algorithms 29 Base tree: Weight-balanced B-tree with branching parameter and leaf parameter B on endpoints –Interval stored in highest node v where it contains slab boundary Each internal node v contains: –Left slab list for each of slabs –Right slab lists for each of slabs – multislab lists –Underflow structure Interval in set I v of intervals associated with v stored in –Left slab list of slab containing left endpoint –Right slab list of slab containing right endpoint –Widest multislab list it spans If < B intervals in multislab list they are instead stored in underflow structure (  contains ≤ B 2 intervals) External Interval Tree v $m$ blocks v

30 Lars Arge I/O-Algorithms 30 External Interval tree Each leaf contains <B/2 intervals (unique endpoint assumption) –Stored in one block Slab lists implemented using B-trees – query –Linear space *We may “wasted” a block for each of the lists in node *But only internal nodes Underflow structure implemented using static structure – query –Linear space  Linear space v

31 Lars Arge I/O-Algorithms 31 External Interval Tree Query with x –Search down tree for x while in node v reporting all intervals in I v stabbed by x In node v –Query two slab lists –Report all intervals in relevant multislab lists –Query underflow structure Analysis: –Visit nodes –Query slab lists –Query multislab lists –Query underflow structure $m$ blocks v

32 Lars Arge I/O-Algorithms 32 External Interval Tree Update – ignoring base tree update/rebalancing: –Search for relevant node –Update two slab lists –Update multislab list or underflow structure Update of underflow structure in O(1) I/Os amortized –Maintain update block with ≤ B updates –Check of update block adds O(1) I/Os to query bound –Rebuild structure when B updates have been collected using I/Os (Global rebuilding)  Update in I/Os amortized v

33 Lars Arge I/O-Algorithms 33 External Interval Tree Note: –Insert may increase number of intervals in underflow structure for some multislab to B –Delete may decrease number of intervals in multislab to B  Need to move B intervals to/from multislab/underflow structure We only move –Intervals from multislab list when decreasing to size B/2 –Intervals to multislab list when increasing to size B  O(1) I/Os amortized used to move intervals

34 Lars Arge I/O-Algorithms 34 Base Tree Update Before inserting new interval we insert new endpoints in base tree using I/Os –Leads to rebalancing using splits  Boundary in v becomes boundary in parent(v)  Intervals need to be moved Move intervals (update secondary structures) in O(w(v)) I/Os  O(1) amortized split bound (weight balanced B-tree)  amortized insert bound v v’’ v’

35 Lars Arge I/O-Algorithms 35 Splitting Interval Tree Node When v splits we may need to move O(w(v)) intervals –Intervals in v containing boundary –Intervals in parent(v) with endpoints in X v containing boundary Intervals move to two new slab and multislab lists in parent(v)

36 Lars Arge I/O-Algorithms 36 Splitting Interval Tree Node Moving intervals in v in O(w(v)) I/Os –Collected in left order (and remove) by scanning left slab lists –Collected in right order (and remove) by scanning right slab lists –Removed multislab lists containing boundary –Remove from underflow structure by rebuilding it –Construct lists and underflow structure for v’ and v’’ similarly

37 Lars Arge I/O-Algorithms 37 Splitting Interval Tree Node Moving intervals in parent(v) in O(w(v)) I/Os –Collect in left order by scanning left slab list –Collect in right order by scanning right slab list –Merge with intervals collected in v  two new slab lists –Construct new multislab lists by splitting relevant multislab list –Insert intervals in small multislab lists in underflow structure

38 Lars Arge I/O-Algorithms 38 External Interval Tree Split in O(1) I/Os amortized –Space: O(N/B) –Query: –Insert: I/Os amortized Deletes in I/Os amortized using global rebuilding: –Delete interval as previously using I/Os –Mark relevant endpoint as deleted –Rebuild structure in after N/2 deletes Note: Deletes can also be handled using fuse operations $m$ blocks v

39 Lars Arge I/O-Algorithms 39 External Interval Tree External interval tree –Space: O(N/B) –Query: –Updates: I/Os amortized Removing amortization: –Moving intervals to/from underflow structure –Delete global rebuilding –Underflow structure update –Base node tree splits v Perform operations/construction lazily Move lazily – complicated: Interference Queries

40 Lars Arge I/O-Algorithms 40 Summary: Interval Management Main problem in designing structure: –Binary  large fan-out Large fan-out resulted in the need for –Multislabs and multislab lists –Underflow structure to avoid O(B)-cost in each node General solution techniques: –Filtering: Charge part of query cost to output –Bootstrapping: *Use O(B 2 ) size structure in each internal node *Constructed using persistence *Dynamic using global rebuilding –Weight-balanced B-tree: Split/fuse in amortized O(1)

41 Lars Arge I/O-Algorithms 41 Diagonal Corner Queries Interval management corresponds to simple form of 2d range search –Diagonal corner queries We obtained the same bounds as for the 1d case –Space: O(N/B) –Query: –Updates: I/Os (x,x) (x 1,x 2 ) x x1x1 x2x2

42 Lars Arge I/O-Algorithms 42 Next Time Next time we will consider two dimensional problems –3-sided queries –Full (4-sided) queries q3q3 q2q2 q1q1 (x,x) q3q3 q2q2 q1q1 q4q4


Download ppt "I/O-Algorithms Lars Arge University of Aarhus February 21, 2005."

Similar presentations


Ads by Google