I/O-Efficient Structures for Orthogonal Range Max and Stabbing Max Queries
Second Year Project Presentation
Ke Yi
Advisor: Lars Arge
Committee: Pankaj K. Agarwal and Jun Yang
2 Problem Definition: Range Max Queries
Range-aggregate queries: range-count, range-sum, range-max
N points in R^d
Each point p is associated with a weight w(p)
Query rectangle Q
Compute max{w(p) | p ∈ Q}
Static and dynamic
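As a reference point for the definition above, a brute-force scan answers a range max query in O(N) time. This is only a minimal sketch: the function name `range_max` and the `(coordinates, weight)` tuple layout are illustrative assumptions, not part of the presented structures.

```python
from typing import Optional, Sequence, Tuple

# (coordinates, weight); this layout is an assumption made for the example.
Point = Tuple[Tuple[float, ...], float]


def range_max(points: Sequence[Point],
              lo: Sequence[float],
              hi: Sequence[float]) -> Optional[float]:
    """Return max{w(p) | p in Q} for the closed box Q = [lo, hi].

    Returns None when no point lies inside Q.
    """
    best = None
    for coords, weight in points:
        # A point is inside Q if every coordinate lies within the box bounds.
        inside = all(l <= c <= h for l, c, h in zip(lo, coords, hi))
        if inside and (best is None or weight > best):
            best = weight
    return best


points = [((1, 1), 5.0), ((2, 3), 9.0), ((4, 2), 7.0), ((5, 5), 12.0)]
print(range_max(points, lo=(0, 0), hi=(4, 4)))  # 9.0
```

The structures in the following slides aim to replace this linear scan with a poly-logarithmic number of I/Os.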
3 Problem Definition: Stabbing Max Queries
N hyper-rectangles in R^d
Each rectangle γ is associated with a weight w(γ)
Query point q
Compute max{w(γ) | q ∈ γ}
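The stabbing max definition can likewise be stated as a brute-force scan. This is a sketch under assumptions: rectangles are taken to be closed boxes given as `(lo, hi, weight)`, and the name `stabbing_max` is hypothetical.

```python
from typing import Optional, Sequence, Tuple

# (lower corner, upper corner, weight); closed boxes are an assumption.
Rect = Tuple[Tuple[float, ...], Tuple[float, ...], float]


def stabbing_max(rects: Sequence[Rect], q: Sequence[float]) -> Optional[float]:
    """Return max{w(g) | q in g}, or None if no rectangle contains q."""
    best = None
    for lo, hi, weight in rects:
        # q stabs the rectangle if it lies within the bounds in every dimension.
        stabbed = all(l <= x <= h for l, x, h in zip(lo, q, hi))
        if stabbed and (best is None or weight > best):
            best = weight
    return best


rects = [((0, 0), (4, 4), 3.0), ((2, 2), (6, 6), 8.0), ((5, 0), (7, 1), 10.0)]
print(stabbing_max(rects, (3, 3)))  # 8.0
```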
4 Model
I/O Model
  – N: Elements in structure
  – B: Elements per block
  – M: Elements in main memory
  – n = N/B
Assumptions
  – M > B^2
  – Each word holds log_2 N bits
  – Any coordinate or weight can be stored in one word
[Figure: I/O model diagram with labels D, P, M and "Block I/O"]
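To make the model parameters concrete, the sketch below computes n = N/B, checks the assumption M > B^2, and evaluates log_B n, the quantity that appears in most of the bounds. The helper names and the sample values of N, B, and M are assumptions chosen only for illustration.

```python
import math


def num_blocks(N: int, B: int) -> int:
    """n = N/B, rounded up to whole blocks."""
    return -(-N // B)


def satisfies_memory_assumption(M: int, B: int) -> bool:
    """Check the slide's assumption M > B^2."""
    return M > B * B


def log_base(x: float, B: int) -> float:
    """log_B x, the base-B logarithm used in the I/O bounds."""
    return math.log(x) / math.log(B)


# Sample parameters (hypothetical): 10^9 elements, 1024 per block, 2^21 in memory.
N, B, M = 10**9, 1024, 2**21
n = num_blocks(N, B)
print(n)                                  # number of blocks
print(satisfies_memory_assumption(M, B))  # whether M > B^2 holds
print(log_base(n, B))                     # log_B n is a small number in practice
```

With realistic block sizes, log_B n is a small number, which is why bounds such as O(log_B^2 n) I/Os are attractive.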
5 Related Work & Our Results: Range Queries
1D range queries are easy: B-tree
    * O(n) space, O(log_B n) query & update
2D range queries:
  – Poly-logarithmic query: CRB-tree [AAG03]
    * O(n log_B n) space, O(log_B^2 n) query
  – Linear space: kdB-tree, cross-tree, O-tree
    * query, O(log_B n) update
Our results:
6 Related Work & Our Results: Stabbing Queries
1D stabbing queries
  – SB-tree [YW01]
    * O(n) space, O(log_B n) query & insert
    * Does not allow deletions!
2D stabbing queries
  – No structures with worst-case guarantee
Our results:
7 2D Range Max Queries
The external version of Chazelle’s structure [C88]
  – Linear space
  – Static: O(log^{1+ε} N) query
  – Dynamic: O(log^3 N log log N) query & update
Overall structure
  – A normal B-tree Φ on y-coordinates of all the points
  – A Fan-out base B-tree T on x-coordinates
    * P_v: all points stored in the subtree of v
    * Each internal node v stores two secondary structures C_v, M_v storing information about P_v in a compressed manner
    * C_v and M_v of size O(|P_v| / log_B n) → linear size in total
    * Weights of points stored at leaves explicitly
8 2D Range Max Queries
C_v borrowed from CRB-tree
  – Compute the ranks of the points one level down in O(1) I/Os
  – Identify the weight of a point explicitly in O(log_B n) I/Os
M_v computes the maximum weight in a multislab in O(log_B n) I/Os
Answering a query:
  – Use Φ to compute the ranks in the root of T
  – Use M_v to compute the maximum at each level
  – For a total of O(log_B^2 n) I/Os
[Figure: node v with children v_1, ..., v_6]
9 2D Range Max Queries: M_v
Divide P_v into chunks of size B log_B N
Divide each chunk into minichunks of size B
Three-level structures
  – M_v = (Ψ_1, Ψ_2, Ψ_3)
  – each of size O(|P_v| / log_B n)
10 2D Range Max Queries: M_v
Basic idea: encode the range max information in a compressed manner, identify the maximum point using C_v once its rank is found
Ψ_3[l]: for each minichunk, stores a (slab index, weight rank) pair for each point inside the minichunk
  – Find the rank of the maximum-weight point in O(1) I/Os
  – Identify it in O(log_B N) I/Os
Ψ_2[k]: for each chunk, encodes a Cartesian tree on the O(log_B N) minichunks for each of the O(B) multislabs
  – Find the minichunk containing the maximum-weight point in O(1) I/Os
  – Use Ψ_3 to find the exact point in O(log_B N) I/Os
Ψ_1: A fanout B-tree on the O(|P_v| / (B log_B n)) chunks
  – Find the maximum-weight point in O(log_B N) I/Os
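The role of the Cartesian tree in Ψ_2 can be illustrated with a plain in-memory version: in a max-Cartesian tree over a sequence, the maximum of positions i..j sits at the lowest common ancestor of i and j. The sketch below is not the compressed encoding used in the structure; the function names and the example weights (standing in for per-minichunk maxima) are assumptions.

```python
from typing import List, Sequence, Tuple


def build_cartesian_tree(values: Sequence[float]) -> Tuple[int, List[int], List[int]]:
    """Build a max-Cartesian tree in O(len(values)) time with a stack.

    Returns (root, left, right), where left[i] / right[i] are child indices
    or -1. An in-order traversal visits the indices 0, 1, 2, ... in order.
    """
    n = len(values)
    left = [-1] * n
    right = [-1] * n
    stack: List[int] = []  # indices on the rightmost path, values decreasing
    for i in range(n):
        last = -1
        # Pop every node smaller than values[i]; they become i's left subtree.
        while stack and values[stack[-1]] < values[i]:
            last = stack.pop()
        left[i] = last
        if stack:
            right[stack[-1]] = i
        stack.append(i)
    root = stack[0] if stack else -1
    return root, left, right


def range_max_index(root: int, left: List[int], right: List[int],
                    i: int, j: int) -> int:
    """Return the index of the maximum in positions i..j (inclusive)."""
    node = root
    # Descend until the current node falls inside [i, j]; that node is the
    # lowest common ancestor of i and j and holds the range maximum.
    while not (i <= node <= j):
        node = left[node] if node > j else right[node]
    return node


weights = [3, 9, 2, 7, 5, 8, 1]
root, left, right = build_cartesian_tree(weights)
print(range_max_index(root, left, right, 2, 4))  # 3 (weight 7)
```

Only the shape of the tree is needed to answer a query, not the weights themselves, which is what makes a compact encoding possible.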
11 2D Range Max Queries
Static structures
  – O(n) size, O(log_B^2 N) query, O(n log_B N) construction
  – O(n) size, O(log_B^{1+ε} N) query, O(N log_B N) construction
Dynamization:
  – Throw away Ψ_2 and expand Ψ_3
  – O(n log_B log_B N) size
  – O(log_B^3 N) query, worst case
  – O(log_B^2 N log_{M/B} log_B N) insert, amortized
  – O(log_B^2 N) delete, amortized
Extending to d dimensions
  – Standard technique
  – Pay an extra O(log_B^{d-2} N) factor to all these bounds
12 1D Stabbing Max Queries
Modify the external interval tree [AV96] to support max
Fan-out base B-tree on x-coordinates
  – Each interval is stored at the highest node v where it contains a slab boundary
  – It is stored in one left (right) slab structure and in the multislab structure
Answering a query
  – Search down the tree and visit O(log_B N) nodes
  – Compute the maximum weight in the left (right) slab structure and the multislab structure
13 1D Stabbing Max Queries
Slab structures are implemented using B-trees
  – Query and update: O(log_B N) I/Os
Multislab structure: Fan-out B-tree
  – At each internal node, we store the maximum weight for each of the slabs and for each of the children
  – Query: O(1) I/Os (only look at the root)
  – Update: O(log_B N) I/Os
Rebalancing the base tree: O(log_B N) I/Os
  – Weight-balanced B-trees
Overall cost: size O(n), query O(log_B^2 N), update O(log_B N).
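The idea of storing a maximum weight per node and combining the maxima along a root-to-leaf search path can be sketched in internal memory with a binary segment tree. This is a swapped-in simplification, not the external interval tree from the slides: it is insert-only, it assumes half-open intervals [lo, hi), and it assumes all endpoints are known up front. The class name `StabbingMax` is hypothetical.

```python
import bisect
from typing import List, Optional, Sequence


class StabbingMax:
    """Insert-only 1D stabbing max over half-open intervals [lo, hi)."""

    def __init__(self, endpoints: Sequence[float]) -> None:
        # Sorted distinct endpoints define the elementary intervals
        # [xs[k], xs[k+1]) that form the leaves of the tree.
        self.xs: List[float] = sorted(set(endpoints))
        leaves = max(1, len(self.xs) - 1)
        self.size = 1
        while self.size < leaves:
            self.size *= 2
        self.best = [float("-inf")] * (2 * self.size)

    def insert(self, lo: float, hi: float, weight: float) -> None:
        """Record the interval at its O(log N) canonical nodes.

        lo and hi must be among the endpoints given to the constructor.
        """
        l = bisect.bisect_left(self.xs, lo) + self.size
        r = bisect.bisect_left(self.xs, hi) + self.size
        while l < r:
            if l & 1:
                self.best[l] = max(self.best[l], weight)
                l += 1
            if r & 1:
                r -= 1
                self.best[r] = max(self.best[r], weight)
            l //= 2
            r //= 2

    def query(self, q: float) -> Optional[float]:
        """Return the maximum weight over intervals containing q, or None."""
        k = bisect.bisect_right(self.xs, q) - 1
        if k < 0 or k >= len(self.xs) - 1:
            return None  # q lies outside every elementary interval
        node = k + self.size
        ans = float("-inf")
        while node >= 1:  # combine the stored maxima on the leaf-to-root path
            ans = max(ans, self.best[node])
            node //= 2
        return None if ans == float("-inf") else ans


intervals = [(1, 5, 3.0), (2, 8, 7.0), (6, 9, 4.0)]
tree = StabbingMax([x for lo, hi, _ in intervals for x in (lo, hi)])
for lo, hi, w in intervals:
    tree.insert(lo, hi, w)
print(tree.query(3))  # 7.0
```

Because each node keeps only a single maximum, a deletion cannot be undone locally; supporting deletions takes extra bookkeeping at each node.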
14 1D Stabbing Max Queries
Space-time tradeoff:
  – O(n log_B^ε N) size
  – O(n log_B^{2-ε} N) query
Can handle the general semigroup queries
  – A semigroup (S, +)
  – Each weight w(γ) ∈ S
  – Want to compute ∑_{q ∈ γ} w(γ)
Ideas can also be used to improve the internal memory algorithm
  – Linear size, O(log^2 N / log log N) query and update
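The semigroup formulation replaces max with an arbitrary associative operation. The brute-force sketch below only illustrates the query definition: the name `stabbing_semigroup`, the closed-interval convention, and the example data are assumptions, and the operations shown (max and addition) are both commutative.

```python
import operator
from functools import reduce
from typing import Callable, Optional, Sequence, Tuple, TypeVar

S = TypeVar("S")


def stabbing_semigroup(intervals: Sequence[Tuple[float, float, S]],
                       q: float,
                       op: Callable[[S, S], S]) -> Optional[S]:
    """Combine, with the semigroup operation op, the weights of all
    closed intervals [lo, hi] that contain q.

    Returns None when no interval is stabbed, since a semigroup need not
    have an identity element.
    """
    weights = [w for lo, hi, w in intervals if lo <= q <= hi]
    return reduce(op, weights) if weights else None


intervals = [(1, 5, 3), (2, 8, 7), (6, 9, 4)]
print(stabbing_semigroup(intervals, 3, max))           # 7
print(stabbing_semigroup(intervals, 3, operator.add))  # 10
```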
15 2D Stabbing Max Queries
Extend our 1D stabbing query structure
Use our 2D range query structure as a building block
Extending to d dimensions
  – Standard technique
  – Pay an extra O(log_B^{d-2} N) factor to all these bounds
16 Conclusions and Open Problems
In this project, we developed I/O-efficient
  – linear space structures with poly-logarithmic query cost for static 2D range max queries
  – near linear space structures with poly-logarithmic query & update cost for dynamic 2D range max queries
  – linear space structures with poly-logarithmic query cost for dynamic 1D stabbing max queries
  – near linear space structures with poly-logarithmic query & update cost for dynamic 2D stabbing max queries
Open problems
  – Linear size dynamic structures for 2D range & stabbing max queries?
  – General semigroup queries?
THE END Thank you!