I/O and Space-Efficient Path Traversal in Planar Graphs
  Craig Dillabaugh, Carleton University
  Meng He, University of Waterloo
  Anil Maheshwari, Carleton University
  Norbert Zeh, Dalhousie University
Background: Succinct Data Structures
  What are succinct data structures? (Jacobson 1989)
    Represent data structures in space close to the information-theoretic minimum
    Support efficient navigational operations
  Why succinct data structures?
    Large data sets in modern applications: textual, genomic, spatial or geometric
Background: External Memory Model (Aggarwal and Vitter 1988)
  Parameters
    N: number of elements in the problem instance
    M: size of the internal memory
    B: size of a disk block
  Cost: number of I/O’s (block transfers) between internal memory and external memory
  [Figure labels: CPU, Internal Memory, Block, External Memory]
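The cost measure above can be made concrete with a small simulation. This is a minimal sketch, not part of the original slides: the class name `CountingDisk` is hypothetical, and it assumes internal memory holds exactly one block at a time (that is, M = B), which is the most pessimistic setting.

```python
class CountingDisk:
    """Toy external memory: an array stored in blocks of B elements.

    Assumption (not from the slides): internal memory caches a single
    block, so every access to a different block costs one I/O.
    """

    def __init__(self, data, block_size):
        self.data = list(data)
        self.block_size = block_size
        self.ios = 0            # number of block transfers so far
        self._cached = None     # index of the block currently in memory

    def read(self, i):
        block = i // self.block_size
        if block != self._cached:
            # The requested element is not in memory: transfer its block.
            self.ios += 1
            self._cached = block
        return self.data[i]
```

Reading 100 elements sequentially with B = 10 costs 10 I/O’s, whereas reading only every tenth element also costs 10 I/O’s for just 10 elements. Path traversal in a graph tends to behave like the second case unless the layout keeps nearby vertices in the same block.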
Our Contributions
  Our goal: design data structures that are both succinct and efficient in the external memory setting
  Our results
    A succinct representation of bounded-degree planar graphs that supports I/O-efficient path traversal
    A succinct representation of triangulated terrains that supports various geometric queries
Notation
  N: number of vertices of the given graph G
  d: maximum degree of vertices
  q: number of bits required to encode the key of each vertex
  K: the length of the path
Two-Level Partition
  A tool: graph separator (Frederickson 1987)
    Size of each subgraph (region): r
    Number of regions: Θ(N/r)
    Number of boundary vertices: O(N / r^{1/2})
  Two-level partition
    Subdivide G into regions of fixed maximum size
    Subdivide each region into sub-regions of smaller fixed maximum size
  Types of vertices for each region / sub-region
    Interior vertices
    Boundary vertices
α-Neighbourhood
  Definition
    Beginning with a given vertex v, we perform a breadth-first search in G and select the first α vertices encountered
    The α-neighbourhood of v is the subgraph of G induced by these vertices
  Internal and terminal vertices
  Property: the distance between v and any terminal vertex in its α-neighbourhood is at least log_d α
  In our representation, we store the α-neighbourhood of each boundary vertex
    If a sub-region boundary vertex is interior to a region, we add the constraint that its α-neighbourhood cannot extend beyond the region
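The definition above can be sketched in a few lines. This is an illustrative sketch only: the function name `alpha_neighbourhood` and the adjacency-dictionary input are hypothetical, and the code assumes that a terminal vertex is a selected vertex with at least one neighbour outside the selection (the slide names internal and terminal vertices without defining them).

```python
from collections import deque


def alpha_neighbourhood(adj, v, alpha):
    """Select the first `alpha` vertices reached by a BFS from v.

    adj:   dict mapping each vertex to a list of its neighbours
    Returns (selected, terminal), where `selected` lists the chosen
    vertices in BFS order and `terminal` is the subset that has a
    neighbour outside the selection (assumed definition).
    """
    selected = [v]
    seen = {v}
    queue = deque([v])
    while queue and len(selected) < alpha:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                selected.append(w)
                queue.append(w)
                if len(selected) == alpha:
                    break
    chosen = set(selected)
    # Terminal vertices touch the rest of the graph; all others are internal.
    terminal = {u for u in selected if any(w not in chosen for w in adj[u])}
    return selected, terminal
```

The α-neighbourhood itself is the subgraph induced by `selected`; a traversal that reaches a vertex in `terminal` can no longer rely on the stored neighbourhood alone.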
Overview of Labeling Scheme
  Labels at three levels for the same vertex
    Graph-label (unique)
    Region-label (one or more)
    Subregion-label (one or more)
  The labels are assigned bottom-up
Sub-Region Labels
  Encode sub-region R_{i,j} using any succinct representation for planar graphs
  This induces a permutation of the vertices in R_{i,j}
  Subregion-label: the k-th vertex in this permutation has subregion-label k in R_{i,j}
Region-Labels and Graph-Labels
  [Figure: subregion-labels 1..6 in R_{1,1}, 1..5 in R_{1,2} and 1..7 in R_{1,3}; region-labels in R_1 run 1..6, then 7, 8, ..., 15, ...]
  The assignment of graph-labels is similar
  Succinct structures of o(N) bits are constructed to support conversion between labels at different levels in O(1) I/O’s
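One way to picture the conversion between label levels is with prefix sums over sub-region sizes. This is a minimal sketch under stated assumptions: it assumes region-labels are obtained by numbering the vertices of consecutive sub-regions one after another, it ignores the fact that boundary vertices may carry several labels, and it uses plain Python lists where the actual structures are succinct and occupy o(N) bits. All function names are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def build_offsets(subregion_sizes):
    """offsets[j] = number of vertices in the sub-regions before sub-region j."""
    return [0] + list(accumulate(subregion_sizes))


def to_region_label(offsets, j, k):
    """Map subregion-label k (1-based) in sub-region j (0-based) to a region-label."""
    return offsets[j] + k


def from_region_label(offsets, label):
    """Inverse mapping: region-label -> (sub-region index, subregion-label)."""
    j = bisect_right(offsets, label - 1) - 1
    return j, label - offsets[j]
```

With sub-region sizes 6, 5 and 7, the first vertex of the second sub-region receives region-label 7. The same prefix-sum idea applies one level up, from region-labels to graph-labels.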
Data Structures
  Let A denote the maximum number of vertices that can be stored in a block; this is our maximum sub-region size
  Choose A lg^3 N to be the maximum size of each region
  We only encode sub-regions and α-neighbourhoods of boundary vertices as components
  Encode the graph structure of each component in a succinct fashion
  Information is encoded so that we can retrieve the graph-labels of the internal vertices in an α-neighbourhood without requiring additional I/O’s
Space Analysis
  We assume B = Ω(lg N)
  A = (B lg N) / (c + q)
    c: number of bits per vertex required to encode the sub-graph structure and the boundary bit vector
  Choose α = A^{1/3}
  Intuitively, our structures are space-efficient because:
    Region boundary vertices are few enough that information such as the graph-labels of the vertices in their α-neighbourhoods does not occupy too much space
    The number of sub-region boundary vertices is larger, but information such as region-labels uses fewer bits (lg(A lg^3 N))
  Total space: O(N) + Nq + o(Nq) bits
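To get a feel for the parameter choices, the formulas can be evaluated for sample values. This is a rough sketch: the function name `partition_parameters` and the sample values of B, q and c are hypothetical, and the code assumes B is measured in words of lg N bits, so that a block holds B lg N bits.

```python
import math


def partition_parameters(N, B, q, c):
    """Evaluate the slide's parameter choices for concrete inputs.

    Assumption: a block holds B * lg N bits, and each vertex costs
    c + q bits, so A = floor(B lg N / (c + q)) vertices fit in a block.
    """
    lg_n = math.log2(N)
    A = math.floor(B * lg_n / (c + q))   # maximum sub-region size
    alpha = A ** (1 / 3)                 # neighbourhood size alpha = A^(1/3)
    region_size = A * lg_n ** 3          # maximum region size A lg^3 N
    return {
        "A": A,
        "alpha": alpha,
        "region_size": region_size,
        "region_label_bits": math.log2(region_size),
        "graph_label_bits": lg_n,
    }
```

For N = 2^40, B = 1024, q = 48 and c = 16, this gives A = 640 and a region-label of roughly 25 bits against 40 bits for a graph-label, which illustrates why storing region-labels for the more numerous sub-region boundary vertices is affordable.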
Traversal Algorithm
  Load either a sub-region or the α-neighbourhood of a boundary vertex
  Traverse this component until a boundary/terminal vertex is encountered
  Load the next component from external memory and continue the traversal
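The three steps above can be simulated to count how many components a traversal loads. This is a simplified sketch, not the authors' implementation: the function name `count_component_loads`, the `components` dictionary and the `home` lookup (which component to load when standing at a given vertex) are all hypothetical stand-ins for the succinct structures.

```python
def count_component_loads(path, components, home):
    """Count component loads while walking `path`.

    components: dict id -> (vertices, exits); `exits` holds the boundary
                or terminal vertices at which the component must be left
    home:       dict vertex -> id of the component to load at that vertex
                (assumed: a sub-region for interior vertices, the
                alpha-neighbourhood for boundary vertices)
    """
    loads = 0
    current = None
    for v in path:
        if current is not None:
            vertices, exits = components[current]
        # Load a new component at the start, or when v is an exit vertex
        # of (or lies outside) the component currently in memory.
        if current is None or v not in vertices or v in exits:
            current = home[v]
            loads += 1
    return loads
```

For example, take a path graph on vertices 0..11 split into sub-regions {0..5} and {5..11} that share boundary vertex 5, with the 5-neighbourhood {3..7} of vertex 5 stored as a third component. Walking from 0 to 11 loads the first sub-region, then the neighbourhood at vertex 5, then the second sub-region at terminal vertex 7: three loads in total.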
I/O Efficiency
  Observations
    When encountering a terminal/boundary vertex, the next component can be loaded in O(1) I/O’s
    Given a component, the graph-labels of all interior/internal vertices can be reported without incurring any additional I/O’s
    By loading a constant number of components, we can visit Ω(lg B) vertices along the path
  I/O complexity: O(K / lg B)
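The Ω(lg B) progress bound can be traced back to the parameter choices. The following is a sketch of the reasoning, assuming that d and c are constants and that q = O(lg N); neither assumption is stated on this slide.

```latex
\begin{align*}
A &= \frac{B \lg N}{c + q} = \Omega(B)
  && \text{since } c + q = O(\lg N) \text{ (assumed)} \\
\log_d \alpha &= \log_d A^{1/3} = \frac{\lg A}{3 \lg d} = \Omega(\lg B)
  && \text{since } d = O(1) \\
\text{I/O cost} &= O\!\left(\frac{K}{\lg B}\right)
  && \text{$O(1)$ I/O's per $\Omega(\lg B)$ path vertices}
\end{align*}
```

After loading the α-neighbourhood of a boundary vertex v, at least log_d α steps are needed before a terminal vertex can be reached, which is where the per-load progress comes from.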
Main Result
  A succinct representation of bounded-degree planar graphs:
    Space: O(N) + Nq + o(Nq) bits
    I/O complexity for path traversal: O(K / lg B)
Terrains Modeled as Triangulated Irregular Networks (TINs)
  Notation
    N: number of points
    Φ: number of bits required to store the coordinates of each point
  Space: NΦ + O(N) + o(NΦ) bits
  I/O complexity
    Reporting a path crossing K faces: O(K / lg B)
Queries on Triangulated Terrains
  Point location: O(log_B N) I/O’s
  Terrain profile: O(K / lg B) I/O’s
  Trickle path: O(K / lg B) I/O’s
  Connected component: O(K / lg B) I/O’s if the component is convex
    Can be generalized to components that are not convex, though the result is more complex
Conclusions
  We designed a succinct representation of bounded-degree planar graphs that supports I/O-efficient path traversal, and applied it to terrains modeled as TINs to support queries
  This provides solutions for modern applications that process very large data sets
  Future work: combining succinct data structures and external memory data structures for other problems
Thank you!