Space Efficient Data Structures for Dynamic Orthogonal Range Counting
Meng He and J. Ian Munro
University of Waterloo
Dynamic Orthogonal Range Counting
- A fundamental geometric query problem
- Definitions
  - Data set: a set P of n points in the plane
  - Query: given an axis-aligned query rectangle R, compute the number of points in P ∩ R
  - Update: insertion or deletion of a point
- Applications
  - Geometric data processing (GIS, CAD)
  - Databases
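To make the three operations concrete, here is a minimal brute-force baseline. It is only a reference for the semantics of the problem, not the structure presented in this talk: queries take O(n) time. The class name `NaivePointSet` and its method names are hypothetical, chosen for illustration.

```python
class NaivePointSet:
    """Brute-force baseline (hypothetical helper): O(n) time per query."""

    def __init__(self):
        self.points = []  # list of (x, y) tuples

    def insert(self, x, y):
        """Update: insert the point (x, y)."""
        self.points.append((x, y))

    def delete(self, x, y):
        """Update: delete one copy of (x, y); raises ValueError if absent."""
        self.points.remove((x, y))

    def count(self, x1, x2, y1, y2):
        """Query: number of points in the closed rectangle [x1..x2] x [y1..y2]."""
        return sum(1 for (x, y) in self.points
                   if x1 <= x <= x2 and y1 <= y <= y2)


P = NaivePointSet()
for point in [(1, 1), (2, 5), (4, 3), (6, 2)]:
    P.insert(*point)
inside = P.count(2, 5, 2, 6)   # counts (2, 5) and (4, 3)
P.delete(4, 3)
inside_after_delete = P.count(2, 5, 2, 6)
```

Any faster structure has to return the same counts as this baseline, which makes it a convenient correctness check.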
Example
Classic Solutions and Our Result

                   Space   Query                  Update
Chazelle (1988)    O(n)    O(lg n)                none (static)
JáJá (2004)*       O(n)    O(lg n / lg lg n)      none (static)
Chazelle (1988)    O(n)    O(lg^2 n)              O(lg^2 n)
Nekrich (2009)     O(n)    O((lg n / lg lg n)^2)  O(lg^(4+ε) n), 0 < ε < 1
Our result         O(n)    O((lg n / lg lg n)^2)  O((lg n / lg lg n)^2)

- Our result matches the lower bound under the group model (Pătraşcu 2007)
- * For integer coordinates
Background: Succinct Data Structures
- What are succinct data structures? (Jacobson 1989)
  - Representing data structures using ideally the information-theoretic minimum space
  - Supporting efficient navigational operations
- Why succinct data structures?
  - Large data sets in modern applications: textual, genomic, spatial or geometric
- A novel and unusual way of using succinct data structures (this paper)
  - Matching the storage cost of standard data structures
  - Improving the time efficiency
Dynamic Range Sum
- Data
  - A 2D array A[1..r, 1..c] of numbers
- Operations
  - range_sum(i_1, j_1, i_2, j_2): the sum of the numbers in A[i_1..i_2, j_1..j_2]
  - modify(i, j, δ): A[i, j] ← A[i, j] + δ
  - insert(j): insert a 0 between A[i, j-1] and A[i, j] for i = 1, 2, …, r
  - delete(j): delete A[i, j] for i = 1, 2, …, r; to perform this, A[i, j] must be 0 for all i
- Restrictions on r, c and δ, and on the operations supported, may apply
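The semantics of the four operations can be pinned down with a naive reference implementation. This is a sketch for checking definitions only: it takes time proportional to the size of the queried range, and the class name `NaiveRangeSum` is a hypothetical name introduced here. Indices are 1-based, as on the slide.

```python
class NaiveRangeSum:
    """Naive reference for the dynamic range sum operations (1-based indices)."""

    def __init__(self, rows):
        self.a = [list(row) for row in rows]  # r rows, each with c numbers

    def range_sum(self, i1, j1, i2, j2):
        """Sum of A[i1..i2, j1..j2]."""
        return sum(self.a[i][j]
                   for i in range(i1 - 1, i2)
                   for j in range(j1 - 1, j2))

    def modify(self, i, j, delta):
        """A[i, j] <- A[i, j] + delta."""
        self.a[i - 1][j - 1] += delta

    def insert(self, j):
        """Insert a 0 between A[i, j-1] and A[i, j] in every row i."""
        for row in self.a:
            row.insert(j - 1, 0)

    def delete(self, j):
        """Delete column j; every entry of the column must be 0."""
        if any(row[j - 1] != 0 for row in self.a):
            raise ValueError("column must contain only zeros")
        for row in self.a:
            del row[j - 1]


A = NaiveRangeSum([[1, 2, 3],
                   [4, 5, 6]])
before = A.range_sum(1, 2, 2, 3)        # 2 + 3 + 5 + 6
A.modify(2, 3, 4)                       # A[2, 3] becomes 10
after_modify = A.range_sum(1, 2, 2, 3)
A.insert(2)                             # new all-zero column 2
after_insert = A.range_sum(1, 2, 2, 3)  # 0 + 2 + 0 + 5
A.delete(2)                             # remove the zero column again
after_delete = A.range_sum(1, 2, 2, 3)
```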
Dynamic Range Sum: An Example
[Figure: a 2D array before and after the operations below]
- range_sum(2, 3, 3, 6) = 25
- insert(6) / delete(6)
- modify(2, 6, 5) / modify(2, 6, -5)
- range_sum(2, 3, 3, 7) = 30
Dynamic Range Sum in a Small 2D Array
- Assumptions and restrictions
  - Word size w: Ω(lg n)
  - Each number: nonnegative, O(lg n) bits
  - rc = O(lg^λ n), 0 < λ < 1
  - modify(i, j, δ): |δ| ≤ lg n
  - insert and delete: not supported
- Our solution
  - Space: O(lg^(1+λ) n) bits, with an o(n)-bit universal table
  - Time: modify and range_sum in O(1) time
  - Generalization of the 1D array version (Raman et al. 2001)
  - Deamortization is interesting
Range Sum in a Narrow 2D Array
- Assumptions and restrictions
  - b = O(w): number of bits required to encode each number
  - "Narrow": r = O(lg^γ c), 0 < γ < 1
  - |δ| ≤ lg c
- Our results
  - Space: O(rcb + w) bits, with an O(c lg c)-bit buffer
  - Operations: O(lg c / lg lg c) time
  - A generalization of the solution to the CSPSI problem based on B-trees (He and Munro 2010), using our small 2D array structure on each B-tree node
Range Counting in Dynamic Integer Sequences
- Notation
  - Integer range: [1..σ]
  - Sequence: S[1..n]
- Operations
  - access(x): S[x]
  - rank(α, x): number of occurrences of α in S[1..x]
  - select(α, r): position of the r-th occurrence of α in S
  - range_count(p_1, p_2, v_1, v_2): number of entries in S[p_1..p_2] whose values are in the range [v_1..v_2]
  - insert(α, i): insert α between S[i-1] and S[i]
  - delete(i): delete S[i] from S
Range Counting in Integer Sequences: An Example
- S = 5, 5, 2, 5, 3, 1, 3, 4, 7, 6, 4, 1, 2, 2, 5, 8
- rank(5, 8) = 3
- select(2, 3) = 14
- range_count(6, 12, 2, 6) = 4
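The example values can be reproduced with a naive reference implementation of the sequence operations. This is a linear-time sketch on a plain Python list, meant only to fix the 1-based semantics; it is not the succinct representation discussed in the talk.

```python
def access(S, x):
    """S[x], 1-based."""
    return S[x - 1]

def rank(S, alpha, x):
    """Number of occurrences of alpha in S[1..x]."""
    return S[:x].count(alpha)

def select(S, alpha, r):
    """Position (1-based) of the r-th occurrence of alpha in S."""
    seen = 0
    for pos, value in enumerate(S, start=1):
        if value == alpha:
            seen += 1
            if seen == r:
                return pos
    raise ValueError("fewer than r occurrences of alpha")

def range_count(S, p1, p2, v1, v2):
    """Number of entries in S[p1..p2] with value in [v1..v2]."""
    return sum(1 for value in S[p1 - 1:p2] if v1 <= value <= v2)

def insert(S, alpha, i):
    """Insert alpha between S[i-1] and S[i]."""
    S.insert(i - 1, alpha)

def delete(S, i):
    """Delete S[i] from S."""
    del S[i - 1]


S = [5, 5, 2, 5, 3, 1, 3, 4, 7, 6, 4, 1, 2, 2, 5, 8]
print(rank(S, 5, 8))                 # 3
print(select(S, 2, 3))               # 14
print(range_count(S, 6, 12, 2, 6))   # 4
```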
Range Counting in Sequences of Small Integers
- Restrictions
  - σ = O(lg^ρ n) for any constant 0 < ρ < 1
- Our result
  - Space: nH_0 + o(n lg σ) + O(w) bits
  - Time: O(lg n / lg lg n)
- This is achieved by combining:
  - Our solution to range sum on narrow 2D arrays
  - A succinct dynamic string representation (He and Munro 2010)
Dynamic Range Counting: An Augmented Red-Black Tree
- T_x: a red-black tree storing all the x-coordinates
- Each node also stores the number of its descendants
- Purpose: conversions between real x-coordinates and rank space in O(lg n) time
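The rank conversion can be sketched with a size-augmented binary search tree. To keep the example short, this sketch deliberately omits red-black rebalancing, so it shows only how the stored counts are used and does not guarantee O(lg n) time. It also stores the subtree size including the node itself, a small variation on the descendant count. All names (`Node`, `to_rank_range`, and so on) are hypothetical.

```python
class Node:
    """BST node augmented with the size of its subtree (node included)."""
    __slots__ = ("key", "left", "right", "size")

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.size = 1

def size(node):
    return node.size if node else 0

def insert(root, key):
    """Insert key without rebalancing; returns the (possibly new) subtree root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    root.size += 1
    return root

def count_less(root, key):
    """Number of stored keys strictly smaller than key."""
    count, node = 0, root
    while node:
        if key <= node.key:
            node = node.left
        else:
            count += size(node.left) + 1
            node = node.right
    return count

def count_at_most(root, key):
    """Number of stored keys smaller than or equal to key."""
    count, node = 0, root
    while node:
        if key < node.key:
            node = node.left
        else:
            count += size(node.left) + 1
            node = node.right
    return count

def to_rank_range(root, x1, x2):
    """Map the x-range [x1..x2] to a 1-based rank range; empty if lo > hi."""
    return count_less(root, x1) + 1, count_at_most(root, x2)


root = None
for x in [50, 20, 70, 10, 30, 60, 80]:
    root = insert(root, x)
rank_range = to_rank_range(root, 25, 65)   # covers the keys 30, 50, 60
```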
Dynamic Range Counting: A Range Tree
- T_y: a weight-balanced B-tree (Arge and Vitter 2003) constructed over all the y-coordinates
  - Branching factor d = Θ(lg^ε n) for constant 0 < ε < 1
  - Leaf parameter: 1
  - The levels are numbered 0, 1, … from top to bottom
- Essentially a range tree
  - Each node represents a range of y-coordinates
- Choice of weight-balanced B-tree: amortizing a rebuilding cost
Dynamic Range Counting: A Wavelet Tree
- Ideas from generalized wavelet trees (Ferragina et al. 2006)
- For each node v of T_y, construct a sequence S_v:
  - Each entry of S_v corresponds to a point whose y-coordinate is in the range represented by node v
  - S_v[i] corresponds to the point with the i-th smallest x-coordinate among all these points
  - S_v[i] indicates which child of v contains the y-coordinate of this point
- For each level m, construct a sequence L_m[1..n] of integers from [1..4d] by concatenating all the S_v's constructed at level m
- L_m: stored as a dynamic sequence of small integers
- Space: O(n lg d + w) bits per level, O(n) words overall
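The overall space bound follows from a short calculation. This is a sketch that takes the per-level bound above as given and uses w = Ω(lg n) for the conversion from bits to words; the symbol h for the number of levels of T_y is introduced here for illustration.

```latex
\begin{align*}
h &= O(\log_d n) = O\!\left(\frac{\lg n}{\lg d}\right)
   = O\!\left(\frac{\lg n}{\varepsilon \lg\lg n}\right)
   = O\!\left(\frac{\lg n}{\lg\lg n}\right),
   \quad\text{since } d = \Theta(\lg^{\varepsilon} n), \\
\text{bits per level} &= O(n \lg d + w) = O(n \lg\lg n + w), \\
\text{total bits} &= h \cdot O(n \lg\lg n + w)
   = O\!\left(n \lg n + \frac{w \lg n}{\lg\lg n}\right), \\
\text{total words} &= O\!\left(\frac{n \lg n}{w} + \frac{\lg n}{\lg\lg n}\right) = O(n).
\end{align*}
```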
Range Counting Queries
- Query range: [x_1..x_2] × [y_1..y_2]
- Use T_x to convert the query x-range to a range in rank space
- Perform a top-down traversal to locate the (up to two) leaves in T_y whose ranges contain y_1 and y_2
- Perform range_count on S_v for each node v visited in the above traversal
- Sum up the query results to get the answer
- Time: O(lg n / lg lg n) per level, O(lg n / lg lg n) levels
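The query procedure can be illustrated with a much simpler static relative of the structure: a binary wavelet tree (branching factor 2 instead of d) built once over the y-ranks of the points listed in x-order. This sketch supports no updates, uses plain prefix-count arrays instead of succinct sequences, and replaces T_x with binary search over a sorted list. The names `WaveletNode`, `build` and `count` are hypothetical.

```python
import bisect

class WaveletNode:
    """Static binary wavelet tree node covering the values lo..hi."""

    def __init__(self, seq, lo, hi):
        self.lo, self.hi = lo, hi
        self.left = self.right = None
        self.small = [0]  # small[i] = number of entries <= mid among seq[:i]
        if lo == hi or not seq:
            return
        mid = (lo + hi) // 2
        for v in seq:
            self.small.append(self.small[-1] + (v <= mid))
        self.left = WaveletNode([v for v in seq if v <= mid], lo, mid)
        self.right = WaveletNode([v for v in seq if v > mid], mid + 1, hi)

    def range_count(self, p1, p2, v1, v2):
        """Entries at 0-based positions p1..p2-1 with value in [v1..v2]."""
        if p1 >= p2 or v2 < self.lo or self.hi < v1:
            return 0
        if v1 <= self.lo and self.hi <= v2:
            return p2 - p1  # every entry of this node qualifies
        l1, l2 = self.small[p1], self.small[p2]  # map positions to children
        return (self.left.range_count(l1, l2, v1, v2)
                + self.right.range_count(p1 - l1, p2 - l2, v1, v2))

def build(points):
    """Return (sorted x-coordinates, distinct sorted y-coordinates, tree)."""
    pts = sorted(points)
    xs = [x for x, _ in pts]
    ys = sorted({y for _, y in pts})
    seq = [bisect.bisect_left(ys, y) for _, y in pts]  # y-ranks in x-order
    return xs, ys, WaveletNode(seq, 0, max(len(ys) - 1, 0))

def count(index, x1, x2, y1, y2):
    """Number of points in the closed rectangle [x1..x2] x [y1..y2]."""
    xs, ys, root = index
    p1, p2 = bisect.bisect_left(xs, x1), bisect.bisect_right(xs, x2)
    v1, v2 = bisect.bisect_left(ys, y1), bisect.bisect_right(ys, y2) - 1
    if v1 > v2:
        return 0
    return root.range_count(p1, p2, v1, v2)


index = build([(1, 1), (2, 5), (4, 3), (6, 2)])
answer = count(index, 2, 5, 2, 6)   # counts (2, 5) and (4, 3)
```

Each recursive call descends one level and touches at most two boundary nodes per level, which mirrors the traversal toward the leaves containing y_1 and y_2.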
Insertions and Deletions
- More complicated: splits and merges; changes to child ranks
- Storing T_y as a weight-balanced B-tree allows us to amortize the cost of updating subsequences of the L_m's
- Additional techniques supporting batch updating of integer sequences are also developed
Our Results
- Dynamic orthogonal range counting
  - Space: O(n) words
  - Time: O((lg n / lg lg n)^2)
- Points on a U×U grid
  - Space: O(n) words
  - Time (worst-case): O(lg n lg U / (lg lg n)^2)
- Succinct representations of dynamic integer sequences
  - Space: nH_0 + o(n lg σ) + O(w) bits
  - Time (including range_count): O((lg n / lg lg n) (lg σ / lg lg n + 1))
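As a consistency check, the time bound for integer sequences can be specialized to two alphabet sizes. This sketch reads the bound on this slide as the product of lg n / lg lg n and (lg σ / lg lg n + 1); the name T(n, σ) is introduced here for illustration.

```latex
\begin{align*}
T(n,\sigma) &= O\!\left(\frac{\lg n}{\lg\lg n}\left(\frac{\lg\sigma}{\lg\lg n} + 1\right)\right), \\
\sigma = O(\lg^{\rho} n) &\;\Rightarrow\; \lg\sigma = O(\lg\lg n)
  \;\Rightarrow\; T(n,\sigma) = O\!\left(\frac{\lg n}{\lg\lg n}\right), \\
\sigma = n &\;\Rightarrow\; T(n,\sigma) = O\!\left(\left(\frac{\lg n}{\lg\lg n}\right)^{2}\right).
\end{align*}
```

The first case agrees with the bound stated earlier for sequences of small integers, and the second has the same form as the bound for dynamic orthogonal range counting.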
Conclusions
- Results
  - The best result for dynamic orthogonal range counting
  - Same problem for points on a grid
  - The first succinct representations of dynamic integer sequences supporting range counting
  - Two preliminary results on dynamic range sum
- Techniques
  - The first that combines wavelet trees with range trees
  - Deamortization on 2D arrays
- Future work
  - Lower bound
  - Use techniques from succinct data structures to improve standard data structures
Thank you!