2006-09-15 VLDB '2006 Haibo Hu (Hong Kong Baptist University, Hong Kong) Dik Lun Lee (Hong Kong University of Science and Technology, Hong Kong) Victor.


1 Distance Indexing on Road Networks
  Haibo Hu (Hong Kong Baptist University, Hong Kong)
  Dik Lun Lee (Hong Kong University of Science and Technology, Hong Kong)
  Victor Lee (City University of Hong Kong, Hong Kong)
  VLDB 2006, 2006-09-15

2 Modeling Road Networks
  Network -> undirected weighted graph
  Road junction -> vertex (node)
  Road segment -> edge
  Distance -> edge weight
  Data objects and query points -> on nodes only
  [Figure: example network marking the objects and the query point]
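
The model on this slide can be sketched as an adjacency map. This is a minimal illustration only: the function name `build_graph`, the node names, and the toy weights are assumptions and do not come from the slides.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected weighted graph as a nested adjacency map.

    edges: iterable of (u, v, weight) tuples, one per road segment.
    """
    graph = defaultdict(dict)
    for u, v, weight in edges:
        graph[u][v] = weight
        graph[v][u] = weight  # undirected: store the segment in both directions
    return dict(graph)


# Hypothetical toy network with four junctions.
toy_edges = [("n1", "n2", 3), ("n2", "n3", 4), ("n3", "n4", 2), ("n1", "n4", 10)]
toy_graph = build_graph(toy_edges)

# Data objects (and query points) sit on nodes only.
toy_objects = {"n2", "n4"}
```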

3 Query Processing on Road Networks
  Queries:
    Window query
    kNN, continuous kNN
  Processing methods:
    Network Expansion [Papadias VLDB03]
      Use Euclidean distance for preliminary pruning
      Index the objects with a spatial index
    Precomputed Index [Kolahdouzan VLDB04]
      Voronoi Network Nearest Neighbor (VN^3)
      NN list: precompute and store the kNNs for some large-degree nodes

4 Problems and Disadvantages
  Distance computation is still expensive
    Dijkstra's single-source shortest path algorithm:
      Maintain the nodes whose distances are not yet finalized
      Pick the node with the shortest tentative distance and finalize it
      Relax the distances of its not-yet-finalized neighbors
      Repeat until all distances are finalized
    Limitations:
      Must visit nodes in ascending order of distance
      Running time O(E lg V) with a binary heap, for V nodes and E edges
  Precomputed indexes cannot suit all queries
    Return the k nearest neighbors
    Return the actual shortest path
  Precomputed indexes are costly to store and update
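
The Dijkstra steps listed on this slide can be sketched with a binary heap from Python's standard `heapq` module. The graph literal and the name `dijkstra` are illustrative assumptions, not material from the presentation.

```python
import heapq


def dijkstra(graph, source):
    """Return the shortest distance from source to every reachable node.

    graph: dict mapping node -> {neighbor: edge_weight}.
    """
    dist = {source: 0}
    finalized = set()
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)  # node with the shortest tentative distance
        if u in finalized:
            continue  # stale heap entry, already finalized with a shorter distance
        finalized.add(u)
        for v, weight in graph[u].items():
            candidate = d + weight
            # Relax the distance of each not-yet-finalized neighbor.
            if v not in finalized and candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist


# Hypothetical toy network.
example_graph = {
    "a": {"b": 3, "d": 10},
    "b": {"a": 3, "c": 4},
    "c": {"b": 4, "d": 2},
    "d": {"c": 2, "a": 10},
}
example_dist = dijkstra(example_graph, "a")
```

Nodes leave the heap in ascending order of distance, which is the limitation the slide points out: a far object cannot be reached without first finalizing everything closer.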

5 Our Solution at a Glance
  Distance signature: the first general-purpose index on road networks that
    Categorizes the distances from a node to all objects
    Supports both rough and exact distance computation
    Accelerates processing of common query types
    Reduces the storage and maintenance cost
    Is orthogonal to other query optimization techniques

6 Roadmap
  Background; Distance Signature Overview; Operations on Signatures; Query Processing on Signatures; Smart Choice of Distance Categories; Construction and Maintenance; Experimental Results; Conclusion

7 Distance Signature
  Basic idea:
    Precomputing distances is a good trade-off between no indexing and solution-space indexing
    Maintain the approximate distance between objects and nodes
  How rough is the approximation?
    Apply a rougher approximation to faraway objects
      Queries are always interested in local objects
      There are more faraway objects than local ones
    Use an exponential sequence of categories
      Of the form [0, T), [T, cT), [cT, c^2 T), [c^2 T, c^3 T), ...
      T and c are constant parameters
      E.g., T = 3, c = 2 gives [0, 3), [3, 6), [6, 12), [12, 24), ...
  [Figure: number line with boundaries 3, 6, 12, 24 separating Cat 0, Cat 1, Cat 2, Cat 3]
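
A small sketch of how a distance could be mapped to its exponential category, using the slide's example parameters T = 3 and c = 2. The function names `category` and `category_range` are assumptions made for illustration.

```python
def category(distance, T, c):
    """Return the index of the category [0, T), [T, cT), [cT, c^2 T), ... holding distance."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    index = 0
    upper = T
    while distance >= upper:  # move to the next, c-times wider, category
        upper *= c
        index += 1
    return index


def category_range(index, T, c):
    """Return the half-open range [low, high) covered by a category index."""
    if index == 0:
        return (0, T)
    return (T * c ** (index - 1), T * c ** index)


# With T = 3 and c = 2, a distance of 7 falls in Cat 2 = [6, 12).
example_category = category(7, T=3, c=2)
example_range = category_range(example_category, T=3, c=2)
```

The loop is used instead of a logarithm so that boundary values such as 6 or 12 are not misclassified by floating-point rounding.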

8 Distance Signature (Cont'd)
  For each node n, the signature component S(n)[i] denotes the category of dist(n, i)
  S(n)[i].link denotes the next node after n on the shortest path to object i
  The signature S(n) is the whole set of components S(n)[i]

9 Roadmap
  Background; Distance Signature Overview; Operations on Signatures; Query Processing on Signatures; Smart Choice of Distance Categories; Construction and Maintenance; Experimental Results; Conclusion

10 Distance Operations on Signatures
  Principle: trace back the links until the distance range is accurate enough
  Retrieval (distance between node and object)
    Exact: trace back through the links from the node to the object
    Approximate: terminate once the distance range no longer partially overlaps the input range
  Comparison (distances from node n to objects a and b)
    Trace back until the two distance ranges do not overlap
  Sorting
    Exact: first apply approximate sorting, then bubble sort using exact comparison
    Approximate: quick sort using approximate comparison
  [Figure: example network with nodes n2, n3, n6; p1 and p2 mark the possible positions of n4]
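
The trace-back principle can be sketched as follows, assuming each signature entry stores a (category, link) pair as described earlier in the deck. The graph, the signature table for a single object "c", and all function names are hypothetical; the table was filled in by hand for T = 3, c = 2.

```python
def category_range(index, T, c):
    """Half-open distance range [low, high) of a category index."""
    if index == 0:
        return (0, T)
    return (T * c ** (index - 1), T * c ** index)


def refine(graph, sig, node, obj, T, c):
    """Yield successively tighter bounds on dist(node, obj) while following links."""
    travelled = 0
    current = node
    while current != obj:
        cat, link = sig[current][obj]
        low, high = category_range(cat, T, c)
        # Distance = edges walked so far + the categorized remainder.
        yield (travelled + low, travelled + high)
        travelled += graph[current][link]
        current = link
    yield (travelled, travelled)  # object reached: the distance is exact


def exact_distance(graph, sig, node, obj, T, c):
    """Follow the links all the way to the object and return the exact distance."""
    low = None
    for low, _high in refine(graph, sig, node, obj, T, c):
        pass
    return low


# Hypothetical network and hand-built signatures toward object "c".
graph = {
    "a": {"b": 3, "d": 10},
    "b": {"a": 3, "c": 4},
    "c": {"b": 4, "d": 2},
    "d": {"c": 2, "a": 10},
}
sig = {
    "a": {"c": (2, "b")},
    "b": {"c": (1, "c")},
    "c": {"c": (0, None)},
    "d": {"c": (0, "c")},
}
steps = list(refine(graph, sig, "a", "c", T=3, c=2))
```

An approximate retrieval would stop consuming `refine` as soon as the yielded range lies entirely inside or entirely outside the input range, instead of walking to the object.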

11 Approximate Distance Comparison
  What and why?
    Compare the distances of two objects based on one signature
    Avoid accessing the signatures of other nodes
    Used to get a rough result for distance sorting
  How?
    Example: compare dist(n4, n2) with dist(n4, n6)
    Select an observer n3
    Embed objects n2, n3, n6 into Euclidean space
    n3 tells whether n2 or n6 is closer to n4
      If n4 is on the perpendicular bisector, is it possible for n3 to find n4 within distance range s(n4)[n3]?
    Let multiple observers vote

12 kNN Search on Signatures
  Procedure:
    Read the signature s(q) of query node q
      Categories give the approximate distances between q and the objects
    Get the k closest objects according to their category values
    If neither the distances nor the order are needed, return objects based on category ranges
    To find the ordering: sort the objects within each category
    To find exact distances: perform exact distance retrieval for each kNN
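
A rough sketch of the first steps of this procedure: bucket the objects of s(q) by category and collect them in ascending category order. The function name `knn_candidates`, its return convention, and the sample signature are assumptions for illustration; the refinement of the boundary category (by exact or approximate comparison) is left out.

```python
from collections import defaultdict


def knn_candidates(signature, k):
    """Split the objects of s(q) into certain kNN members and a boundary group.

    signature: dict object -> (category, link) for the query node q.
    Returns (certain, boundary, needed): `certain` are objects whose whole
    category fits within k; `needed` more must be picked from `boundary`.
    """
    buckets = defaultdict(list)
    for obj, (cat, _link) in signature.items():
        buckets[cat].append(obj)

    certain = []
    for cat in sorted(buckets):
        group = buckets[cat]
        if len(certain) + len(group) <= k:
            certain.extend(group)  # the whole category belongs to the result
            if len(certain) == k:
                return certain, [], 0
        else:
            return certain, group, k - len(certain)
    return certain, [], 0  # fewer than k objects exist


# Hypothetical signature of a query node q.
s_q = {
    "o1": (0, "x"),
    "o2": (1, "x"),
    "o3": (1, "y"),
    "o4": (2, "y"),
    "o5": (3, "z"),
}
result_k2 = knn_candidates(s_q, 2)
result_k3 = knn_candidates(s_q, 3)
```

For k = 3 the category boundaries alone settle the answer; for k = 2 one of the two Cat 1 objects still has to be chosen by comparing their distances.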

13 Roadmap
  Background; Distance Signature Overview; Operations on Signatures; Query Processing on Signatures; Smart Choice of Distance Categories; Construction and Maintenance; Experimental Results; Conclusion

14 Smart Choice of Distance Categories
  Exponential categories [0, T), [T, cT), [cT, c^2 T), ...
  How to determine c and T?
  Factors:
    Dataset density, distribution
    Query type, load (metric: spreading)
    Storage availability
  Simplifications:
    The road network is a uniform grid
    Spreading is uniformly distributed in [0, SP]
    Unlimited disk storage
  Theorem: the optimal c = e, T = (SP/e)^0.5
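
As a worked example of the theorem, the snippet below evaluates c and T for a given maximum spreading SP and lists the first category boundaries. The value SP = 100 and the function names are assumptions chosen only to make the arithmetic concrete.

```python
import math


def optimal_parameters(max_spreading):
    """Return (c, T) with c = e and T = sqrt(SP / e), as stated in the theorem."""
    c = math.e
    T = math.sqrt(max_spreading / math.e)
    return c, T


def boundaries(T, c, count):
    """Return the first `count` category boundaries T, cT, c^2 T, ..."""
    return [T * c ** i for i in range(count)]


# Hypothetical maximum spreading SP = 100.
c_opt, T_opt = optimal_parameters(100)
first_boundaries = boundaries(T_opt, c_opt, 3)
```

For SP = 100 this gives T of roughly 6.07, so the categories start as approximately [0, 6.07), [6.07, 16.49), [16.49, 44.82).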

15 Signature Construction
  Basic procedure:
    Allocate storage for the signatures
    Build the shortest path spanning tree for each object (Dijkstra)
    Fill in s(n)[i] when the tree of object i reaches node n
  Variable-length encoding:
    Observation: the number of objects in each category is not even
      # of objects 1 unit, 2 units, 3 units, ... away: 4, 8, 12, ...
    Use fewer bits for larger categories
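
The basic construction procedure might look like the sketch below: one Dijkstra run per object, filling in the (category, link) entry of each node at the moment the spanning tree reaches it. Because the graph is undirected, a node's parent in the tree rooted at the object is also the next hop from that node toward the object. All names and the toy network are illustrative assumptions, and storage allocation and encoding are ignored.

```python
import heapq


def category(distance, T, c):
    """Index of the exponential category [0, T), [T, cT), ... holding distance."""
    index, upper = 0, T
    while distance >= upper:
        upper *= c
        index += 1
    return index


def build_signatures(graph, objects, T, c):
    """Return sig[n][obj] = (category of dist(n, obj), next node from n toward obj)."""
    sig = {n: {} for n in graph}
    for obj in objects:
        dist = {obj: 0}
        link = {obj: None}
        finalized = set()
        heap = [(0, obj)]
        while heap:
            d, u = heapq.heappop(heap)
            if u in finalized:
                continue
            finalized.add(u)
            # The tree of obj has reached u: fill in its signature component.
            sig[u][obj] = (category(d, T, c), link[u])
            for v, weight in graph[u].items():
                candidate = d + weight
                if v not in finalized and candidate < dist.get(v, float("inf")):
                    dist[v] = candidate
                    link[v] = u  # u is the next hop from v toward obj
                    heapq.heappush(heap, (candidate, v))
    return sig


# Hypothetical toy network with a single object on node "c".
toy_graph = {
    "a": {"b": 3, "d": 10},
    "b": {"a": 3, "c": 4},
    "c": {"b": 4, "d": 2},
    "d": {"c": 2, "a": 10},
}
toy_sig = build_signatures(toy_graph, ["c"], T=3, c=2)
```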

16 Variable Length Encoding
  Reverse zero coding
    Based on the Huffman encoding scheme
    Under the assumptions "exponential partition", "grid topology", "uniform distance range of queries", and c > 1.5, this coding scheme is optimal
    Average code length is approximately: [formula not preserved in the transcript]
  Categories: [0, T), [T, cT), [cT, c^2 T), [c^2 T, c^3 T), [c^3 T, ∞)
  Reverse coding: 1, 01, 001, 0001, 0000
  Fixed coding: 000, 001, 010, 011, 100
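
To illustrate why a Huffman-style code beats fixed 3-bit codes here, the sketch below computes Huffman code lengths for five categories. The weights are purely hypothetical: they assume each category holds four times as many objects as the previous one, which is not a figure taken from the slides. The function name is likewise an assumption.

```python
import heapq
from itertools import count


def huffman_code_lengths(weights):
    """Return {symbol: code length} of a Huffman code for the given weights."""
    tie_breaker = count()  # keeps heap entries comparable when weights tie
    heap = [(w, next(tie_breaker), [sym]) for sym, w in weights.items()]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in weights}
    while len(heap) > 1:
        w1, _, group1 = heapq.heappop(heap)
        w2, _, group2 = heapq.heappop(heap)
        for sym in group1 + group2:
            lengths[sym] += 1  # every merge adds one bit to the merged symbols
        heapq.heappush(heap, (w1 + w2, next(tie_breaker), group1 + group2))
    return lengths


# Hypothetical object counts per category index (farther categories hold more).
category_weights = {0: 1, 1: 4, 2: 16, 3: 64, 4: 256}
lengths = huffman_code_lengths(category_weights)
total = sum(category_weights.values())
average_bits = sum(category_weights[s] * lengths[s] for s in lengths) / total
```

Under these assumed weights the code lengths come out as 1, 2, 3, 4, 4 bits, the same lengths as the reverse codes 1, 01, 001, 0001, 0000, with the shortest code going to the most populated category. The average is about 1.33 bits per component versus 3 bits for fixed coding.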

17 Signature Compression
  Idea: many objects share the same link
  If s(n)[u] + s(u)[v] = s(n)[v], then s(n)[v] can be replaced by a 1-bit flag
  [Figure: node n with objects u and v; labels "not compressed" and "in memory"]

18 Signature Update
  Requirements:
    The shortest path spanning trees of all objects
    A reverse index that maps each edge to the trees containing it
      Limits the number of trees affected by a change to this edge
  How (suppose edge (a, b) is updated):
    Find the affected spanning trees
    For each affected tree of object c, check s(a)[c] or s(b)[c] (whichever is smaller)
    Propagate to adjacent nodes until no more updates occur

19 Roadmap
  Background; Distance Signature Overview; Operations on Signatures; Query Processing on Signatures; Smart Choice of Distance Categories; Construction and Maintenance; Experimental Results; Conclusion

20 Experiment Settings
  Statistics:
    183K nodes, 351K edges
    Random edge weights from 1 to 10
    Page size: 4K bytes
  kNN competitors:
    Signature indexing
    Full indexing (NN list for all nodes)
    Network Voronoi Diagram (NVD) from VN^3
  Tuning parameters:
    p: object density
    T, c, k
  Comparison metrics: page accesses (I/O cost), CPU time

21 Index Construction Cost
  Good for medium and sparse datasets

22 kNN Search Performance
  Moderate performance over various k

23 Robustness
  The choice of parameters does not make a large difference

24 Conclusion
  Our contributions:
    The first index for distance computation on road networks
    Speeds up general query processing
    Optimal choice of distance categories and category encoding
  Future work:
    Cross-node signature compression
      The signatures of nearby nodes are similar
    Derivation of optimal distance categories for a wider range of network topologies and object distributions

