Published by Ashley Harper. Modified over 9 years ago.
1
Distance Indexing on Road Networks
Haibo Hu (Hong Kong Baptist University, Hong Kong)
Dik Lun Lee (Hong Kong University of Science and Technology, Hong Kong)
Victor Lee (City University of Hong Kong, Hong Kong)
VLDB 2006, 2006-09-15
2
Modeling Road Networks
- Network -> undirected weighted graph
- Road junction -> vertex (node)
- Road segment -> edge
- Distance -> edge weight
- Data objects and query points -> located on nodes only
[Figure: example road network; legend: objects, query point]
3
Query Processing on Road Networks
- Queries:
  - Window query
  - kNN, continuous kNN
- Processing methods:
  - Network expansion [Papadias VLDB03]
    - Use Euclidean distance for preliminary pruning
    - Index the objects with a spatial index
  - Precomputed index [Kolahdouzan VLDB04]
    - Voronoi Network Nearest Neighbor (VN³)
    - NN list: precompute and store the kNNs for some large-degree nodes
4
Problems and Disadvantages
- Distance computation is still expensive
  - Dijkstra's single-source shortest path algorithm:
    1. Maintain the set of nodes whose distances are not finalized
    2. Pick the node with the shortest tentative distance and finalize it
    3. Relax all not-yet-finalized distances
    4. Repeat until all distances are finalized
  - Limitations:
    - Must visit nodes in ascending order of distance
    - Running time O(E log V) with a binary heap (V nodes, E edges)
- Precomputed indexes cannot suit all queries
  - Return the k nearest neighbors
  - Return the actual shortest path
- Precomputed indexes are costly to store and update
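The four Dijkstra steps listed on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the adjacency-list layout (`graph` maps a node to a list of `(neighbor, weight)` pairs) and the function name `dijkstra` are assumptions made for the example.

```python
import heapq

def dijkstra(graph, source):
    """Return shortest distances from source to every reachable node.

    graph: dict mapping node -> list of (neighbor, weight) pairs
           (hypothetical layout chosen for this sketch).
    """
    dist = {source: 0}      # tentative distances
    finalized = set()       # nodes whose distance is final
    heap = [(0, source)]    # candidates ordered by tentative distance
    while heap:
        d, u = heapq.heappop(heap)   # pick the closest non-finalized node
        if u in finalized:
            continue                 # stale heap entry, skip it
        finalized.add(u)
        for v, w in graph.get(u, []):
            nd = d + w
            # relax the distance of a not-yet-finalized neighbor
            if v not in finalized and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Tiny undirected example: each edge is listed in both directions.
graph = {
    "a": [("b", 2), ("c", 5)],
    "b": [("a", 2), ("c", 1)],
    "c": [("a", 5), ("b", 1)],
}
distances = dijkstra(graph, "a")
```

Nodes leave the heap in ascending order of distance, which is the limitation the slide points out: a faraway object cannot be reached without first finalizing everything closer.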
5
Our Solution at a Glance
Distance signature: the first general-purpose index on road networks that
- Categorizes the distances from a node to all objects
- Supports both rough and exact distance computation
- Accelerates processing of common query types
- Reduces storage and maintenance cost
- Is orthogonal to other query optimization techniques
6
Roadmap
- Background
- Distance Signature Overview
- Operations on Signatures
- Query Processing on Signatures
- Smart Choice of Distance Categories
- Construction and Maintenance
- Experimental Results
- Conclusion
7
Distance Signature
- Basic idea:
  - Precomputing distances is a good trade-off between no indexing and solution-space indexing
  - Maintain the approximate distance between objects and nodes
- How rough is the approximation?
  - Apply a rougher approximation to faraway objects
    - Queries are mostly interested in local objects
    - There are more faraway objects than local objects
  - We use an exponential sequence of categories
    - Of the form [0, T), [T, cT), [cT, c^2 T), [c^2 T, c^3 T), ...
    - T and c are constant parameters
    - E.g., T = 3, c = 2 gives [0, 3), [3, 6), [6, 12), [12, 24), ...
[Figure: distance axis with boundaries 3, 6, 12, 24 delimiting Cat 0, Cat 1, Cat 2, Cat 3]
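The exponential categories can be made concrete with a short sketch. The function names `category` and `category_range` are hypothetical; only the interval scheme [0, T), [T, cT), [cT, c^2 T), ... comes from the slide.

```python
def category(distance, T, c):
    """Index of the category [0,T), [T,cT), [cT,c^2 T), ... containing distance."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    cat, upper = 0, T
    while distance >= upper:   # move to the next, c times wider, interval
        upper *= c
        cat += 1
    return cat

def category_range(cat, T, c):
    """Half-open interval [lo, hi) covered by category cat."""
    if cat == 0:
        return (0, T)
    return (T * c ** (cat - 1), T * c ** cat)

# The slide's example: T = 3, c = 2.
cats = [category(d, 3, 2) for d in (0, 2.9, 3, 6, 11, 12, 23)]
```

With T = 3 and c = 2, a distance of 11 falls in category 2, whose range is [6, 12).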
8
Distance Signature (Cont'd)
- For each node n, signature component s(n)[i] denotes the category of dist(n, i), where i is an object
- s(n)[i].link denotes the next node after n on the shortest path to i
- The signature s(n) is the whole set of components s(n)[i]
10
Distance Operations on Signatures
Principle: trace back the link until the distance range is accurate enough
- Retrieval (distance between node and object)
  - Exact: trace back through the links from node to object
  - Approximate: terminate once the distance range no longer partially overlaps the input range
- Comparison (distances from node n to objects a and b)
  - Exact: trace back until the two distance ranges do not overlap
  - Approximate: [Figure: nodes n2, n3, n6; p1, p2: possible positions of n4]
- Sorting
  - Exact: first apply approximate sorting, then bubble sort using exact comparison
  - Approximate: quick sort using approximate comparison
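Both retrieval variants can be sketched on a toy signature table. Everything about the layout is an assumption for illustration: `sig[n][obj]` is a dict with a `"cat"` field and a `"link"` field, `weights` maps an undirected edge (a `frozenset` of its two endpoints) to its length, and an object is identified by the node it sits on.

```python
def category_range(cat, T, c):
    """Half-open distance interval [lo, hi) covered by category cat."""
    if cat == 0:
        return (0, T)
    return (T * c ** (cat - 1), T * c ** cat)

def exact_distance(sig, weights, node, obj):
    """Follow the .link fields from node to obj, summing edge weights."""
    total = 0
    while node != obj:
        nxt = sig[node][obj]["link"]
        total += weights[frozenset((node, nxt))]
        node = nxt
    return total

def approximate_distance(sig, weights, node, obj, q_lo, q_hi, T, c):
    """Refine the range of dist(node, obj) until it lies inside the input
    range [q_lo, q_hi) or is disjoint from it; return the refined (lo, hi)."""
    walked = 0
    while True:
        if node == obj:
            return (walked, walked)          # fully traced: distance is exact
        lo, hi = category_range(sig[node][obj]["cat"], T, c)
        lo, hi = walked + lo, walked + hi
        inside = q_lo <= lo and hi <= q_hi
        disjoint = hi <= q_lo or lo >= q_hi
        if inside or disjoint:
            return (lo, hi)                  # no partial overlap left
        nxt = sig[node][obj]["link"]
        walked += weights[frozenset((node, nxt))]
        node = nxt

# Path a - b - c with an object on c; T = 3, c = 2.
weights = {frozenset(("a", "b")): 4, frozenset(("b", "c")): 5}
sig = {
    "a": {"c": {"cat": 2, "link": "b"}},   # dist 9 lies in [6, 12)
    "b": {"c": {"cat": 1, "link": "c"}},   # dist 5 lies in [3, 6)
    "c": {"c": {"cat": 0, "link": None}},
}
```

Asking whether dist(a, c) lies in [0, 12) is answered from the signature of a alone, while the narrower range [0, 8) forces the trace all the way to c.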
11
Approximate Distance Comparison
- What and why?
  - Compare the distances of two objects based on one signature
  - Avoid accessing the signatures of other nodes
  - Used to get a rough result of distance sorting
- How?
  - Example: compare dist(n4, n2) with dist(n4, n6)
  - Select an observer n3
  - Embed objects n2, n3, n6 into Euclidean space
  - n3 tells whether n2 or n6 is closer to n4
  - If n4 is on the perpendicular bisector, is it possible for n3 to find n4 within distance range s(n4)[n3]?
  - Let multiple observers vote
12
kNN Search on Signatures
Procedure:
- Read the signature s(q) of query node q
  - Categories tell the approximate distances between q and the objects
- Get the k closest objects according to their category values
  - If the distances or the order are not needed, return objects based on category ranges
  - To find the ordering: sort objects within each category
  - To find exact distances: perform exact distance retrieval for each of the k nearest neighbors
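The procedure can be sketched as follows, under simplifying assumptions: `signature` maps each object to its category as seen from q, and the optional callable `exact_distance` stands in for exact distance retrieval. The sketch only resolves the boundary category where a choice is unavoidable and does not order objects within a category; the names are hypothetical.

```python
def knn_from_signature(signature, k, exact_distance=None):
    """Return k nearest objects using category values first.

    signature: dict mapping object -> category seen from the query node.
    exact_distance: optional callable object -> distance, used only to
                    break ties inside the boundary category.
    """
    buckets = {}
    for obj, cat in signature.items():
        buckets.setdefault(cat, []).append(obj)

    result = []
    for cat in sorted(buckets):          # nearer categories first
        need = k - len(result)
        if need <= 0:
            break
        bucket = buckets[cat]
        if len(bucket) > need:           # category straddles the k-th neighbor
            if exact_distance is None:
                raise ValueError("boundary category needs exact distances")
            bucket = sorted(bucket, key=exact_distance)[:need]
        result.extend(bucket)
    return result

signature = {"o1": 0, "o2": 2, "o3": 1, "o4": 2, "o5": 3}
exact = {"o2": 10, "o4": 7}              # only needed for category 2
top3 = knn_from_signature(signature, 3, exact.get)
top2 = knn_from_signature(signature, 2)  # decided by categories alone
```

Because category ranges are disjoint and ordered, every object in a lower category is guaranteed to be closer than any object in a higher one, so only the last category taken needs refinement.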
14
Smart Choice of Distance Categories
- Exponential categories: [0, T), [T, cT), [cT, c^2 T), ...
- How to determine c and T?
- Factors:
  - Dataset density, distribution
  - Query type, load (metric: spreading)
  - Storage availability
- Simplifications:
  - The road network is a uniform grid
  - Spreading is uniformly distributed in [0, SP]
  - Unlimited disk storage
- Theorem: the optimal c = e, T = (SP/e)^0.5
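As a worked example of the theorem, reading the slide's "(SP/e) 0.5" as the square root of SP/e, the parameters for a given maximum spreading SP can be computed directly. The function names and the sample value SP = 100 are assumptions made for illustration.

```python
import math

def optimal_parameters(sp):
    """c and T from the slide's theorem: c = e, T = sqrt(SP / e)."""
    return math.e, math.sqrt(sp / math.e)

def category_boundaries(sp, T, c):
    """Boundaries T, cT, c^2 T, ... that fall below the maximum spreading sp."""
    bounds = []
    b = T
    while b < sp:
        bounds.append(b)
        b *= c
    return bounds

c, T = optimal_parameters(100.0)          # hypothetical SP = 100
bounds = category_boundaries(100.0, T, c)
```

For SP = 100 this gives T of about 6.07 and boundaries near 6.07, 16.49 and 44.82, so four categories cover the whole range of spreading.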
15
Signature Construction
- Basic procedure:
  - Allocate storage for signatures
  - Build the shortest-path spanning tree for each object (Dijkstra)
  - Fill in s(n)[i] when the tree of object i is spanned to node n
- Variable-length encoding:
  - Observation: the number of objects in each category is not even
  - Number of objects 1 unit, 2 units, 3 units, ... away: 4, 8, 12, ...
  - Use fewer bits for larger categories
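The basic construction procedure can be sketched by running Dijkstra from every object and recording, for each node reached, the category of its distance and its parent in the spanning tree (which is the next node toward the object). The data layout and names below are assumptions for illustration, and variable-length encoding is left out.

```python
import heapq

def category(distance, T, c):
    """Index of the exponential category containing distance."""
    cat, upper = 0, T
    while distance >= upper:
        upper *= c
        cat += 1
    return cat

def build_signatures(graph, objects, T, c):
    """graph: node -> list of (neighbor, weight); objects: nodes holding objects.
    Returns sig[n][obj] = {"cat": category of dist(n, obj), "link": next node}."""
    sig = {n: {} for n in graph}
    for obj in objects:
        dist, parent = {obj: 0}, {obj: None}
        done, heap = set(), [(0, obj)]
        while heap:
            d, u = heapq.heappop(heap)
            if u in done:
                continue
            done.add(u)
            # the tree of obj has just been spanned to u: fill in s(u)[obj]
            sig[u][obj] = {"cat": category(d, T, c), "link": parent[u]}
            for v, w in graph[u]:
                nd = d + w
                if v not in done and nd < dist.get(v, float("inf")):
                    dist[v], parent[v] = nd, u
                    heapq.heappush(heap, (nd, v))
    return sig

# Path a - b - c (weights 4 and 5) with one object on c; T = 3, c = 2.
graph = {"a": [("b", 4)], "b": [("a", 4), ("c", 5)], "c": [("b", 5)]}
sig = build_signatures(graph, ["c"], 3, 2)
```

In an undirected network the parent of n in the object's spanning tree is exactly the first hop from n toward that object, which is why one Dijkstra run per object fills in both fields.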
16
Variable Length Encoding
- Reverse zero coding
  - Based on the Huffman encoding scheme
  - Under the assumptions "exponential partition", "grid topology", "uniform distance range of queries", and c > 1.5, this coding scheme is optimal
  - Average code length is approximately: [formula missing]
- Categories: [0, T), [T, cT), [cT, c^2 T), [c^2 T, c^3 T), [c^3 T, ∞)
- Reverse coding: 1, 01, 001, 0001, 0000
- Fixed coding: 000, 001, 010, 011, 100
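A sketch of such a prefix code follows. The transcript does not preserve which codeword belongs to which category, so the mapping here is an assumption: following the earlier remark "use fewer bits for larger categories", the farthest category gets `1`, the next `01`, and so on, with the nearest category taking the all-zero codeword. The function names are hypothetical.

```python
def encode(cat, num_categories):
    """Codeword for category cat: farther categories get shorter codes (assumed)."""
    if not 0 <= cat < num_categories:
        raise ValueError("category out of range")
    if cat == 0:
        return "0" * (num_categories - 1)      # all-zero codeword
    return "0" * (num_categories - 1 - cat) + "1"

def decode_stream(bits, num_categories):
    """Split a concatenated bit string back into categories."""
    cats, zeros = [], 0
    for b in bits:
        if b == "1":
            cats.append(num_categories - 1 - zeros)
            zeros = 0
        else:
            zeros += 1
            if zeros == num_categories - 1:    # all-zero codeword complete
                cats.append(0)
                zeros = 0
    if zeros:
        raise ValueError("truncated codeword at end of stream")
    return cats

codes = [encode(k, 5) for k in range(5)]
stream = "".join(encode(k, 5) for k in (4, 0, 2, 3, 1))
```

With five categories the codewords are the same set as on the slide (1, 01, 001, 0001, 0000), and no codeword is a prefix of another, so a signature can be stored as one unbroken bit string.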
17
Signature Compression
- Idea: many objects share the same link
- If s(n)[u] + s(u)[v] = s(n)[v], then s(n)[v] can be replaced by a 1-bit flag
[Figure: nodes n, u, v; labels: not compressed, in memory]
18
Signature Update
- Requirements:
  - The shortest-path spanning trees of all objects
  - A reverse index that maps each edge to the trees containing it; this limits the number of trees affected by a change to that edge
- How (suppose edge (a, b) is updated):
  - Find the affected spanning trees
  - For each affected tree of object c, check s(a)[c] or s(b)[c] (whichever is smaller)
  - Propagate to adjacent nodes until no more updates occur
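The reverse index used in the first update step can be sketched from the link fields alone, since the edges (n, s(n)[i].link) over all nodes n make up the spanning tree of object i. This covers only the lookup of affected trees, not the propagation step, and the layout of `sig` and the function names are assumptions for illustration.

```python
def build_reverse_index(sig):
    """Map each undirected edge to the set of objects whose tree contains it.

    sig[n][obj]["link"] is the next node after n toward obj (None at obj itself).
    """
    index = {}
    for node, components in sig.items():
        for obj, comp in components.items():
            link = comp["link"]
            if link is not None:
                index.setdefault(frozenset((node, link)), set()).add(obj)
    return index

def affected_objects(index, a, b):
    """Objects whose spanning tree uses edge (a, b)."""
    return index.get(frozenset((a, b)), set())

# Triangle a, b, c with unit weights and objects on a and c:
# every node links directly to each object.
sig = {
    "a": {"a": {"link": None}, "c": {"link": "c"}},
    "b": {"a": {"link": "a"}, "c": {"link": "c"}},
    "c": {"a": {"link": "a"}, "c": {"link": None}},
}
index = build_reverse_index(sig)
```

Here a change to edge (a, b) touches only the tree of the object on a, whereas edge (a, c) belongs to both trees.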
20
Experiment Settings
- Statistics:
  - 183K nodes, 351K edges
  - Random edge weights from 1 to 10
  - Page size: 4K bytes
- kNN competitors:
  - Signature indexing
  - Full indexing (NN list for all nodes)
  - Network Voronoi Diagram (NVD) from VN³
- Tuning parameters:
  - p: object density
  - T, c, k
- Comparison metrics: page accesses (I/O cost), CPU time
21
Index Construction Cost
- Good for medium and sparse datasets
22
kNN Search Performance
- Moderate performance over various k
23
Robustness
- The choice of parameters does not make a large difference
24
Conclusion
- Our contributions:
  - The first index for distance computation on road networks
  - Speeds up general query processing
  - Optimal choice of distance categories and category encoding
- Future work:
  - Cross-node signature compression: the signatures of nearby nodes are similar
  - Derivation of optimal distance categories for a wider range of network topologies and object distributions