1
Roger Zimmermann, COMPSAC 2004, September 30
Spatial Data Query Support in Peer-to-Peer Systems
Roger Zimmermann, Wei-Shinn Ku, and Haojun Wang
Computer Science Department, University of Southern California, Los Angeles, CA 90089
COMPSAC 2004
2
Outline
Motivation
Introduction to DHTs (CAN)
Technical Approach
Results
Conclusions and Future Research
3
Motivation
Spatial data sets are used in many applications, e.g., GIS and CAD.
P2P systems provide a highly scalable distributed platform.
Pros:
– Scalability, no central point of failure
Cons:
– Very dynamic (unreliable), topology maintenance required
4
Motivation (cont.)
Question: how can P2P systems be used for spatial data sharing?
Query challenges:
– Unstructured P2P systems: querying by flooding is not efficient
– Structured P2P systems based on DHTs (Chord, CAN): only exact-match queries are supported efficiently
  E.g., search for files by name or title: put(key, value); get(key) returns value
5
Motivation (cont.)
Spatial queries are usually range queries
– Intersect, overlap
– Nearest neighbor(s) (kNN)
DHTs are not suitable without modification
6
Distributed Hash Tables (DHT)
DHT systems: Content Addressable Network (CAN), Chord, Pastry, etc.
A DHT allocates large data sets to many nodes with no central control
A hash function distributes data objects nearly uniformly, which gives good scalability and load balance
Each node maintains only a small routing table that lists its neighbors
Locating a particular data object requires O(log N) search steps on average (in Chord and Pastry; a d-dimensional CAN needs O(d·N^(1/d)) steps)
7
Content Addressable Network (CAN)
A scalable indexing mechanism for a P2P network
Creates a logical d-dimensional Cartesian coordinate space
Divides the space into zones, each controlled by one node in the system
Zones are dynamically partitioned or merged as nodes join and leave
Each zone is addressed by a Virtual Identifier (VID), which is calculated deterministically from the location of the zone
8
Content Addressable Network (CAN)
Example: a 2-D space partitioned into 7 CAN zones
9
Content Addressable Network (cont.)
Node Operations (e.g., Insertion)
1. Find a bootstrap node first
10
Content Addressable Network (cont.)
2. Randomly choose a point in the CAN plane and route the new node from the bootstrap node to the chosen location
11
Content Addressable Network (cont.)
3. The new node arrives at the destination zone covering that point. The destination zone is split into two zones, each controlled by one node (old and new)
12
Content Addressable Network (cont.)
4. Update the routing information of the neighboring zones
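The join steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the CAN protocol itself: the space is the 2-D unit square, routing and the neighbor-table update (step 4) are not modeled, and the VID is assumed to be the bit string of split decisions (0 for the lower half, 1 for the upper half, alternating x and y). The names `Zone` and `join` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    vid: str      # bit string of the split path (assumed VID encoding)
    lo: tuple     # (x_min, y_min)
    hi: tuple     # (x_max, y_max)
    owner: str

    def contains(self, p):
        return all(l <= c < h for l, c, h in zip(self.lo, p, self.hi))

    def split(self, new_owner):
        """Halve the zone; even depth splits on x, odd depth splits on y."""
        dim = len(self.vid) % 2
        mid = (self.lo[dim] + self.hi[dim]) / 2
        lower_hi = list(self.hi)
        lower_hi[dim] = mid
        upper_lo = list(self.lo)
        upper_lo[dim] = mid
        lower = Zone(self.vid + "0", self.lo, tuple(lower_hi), self.owner)
        upper = Zone(self.vid + "1", tuple(upper_lo), self.hi, new_owner)
        return lower, upper


def join(zones, point, new_owner):
    """Split the zone covering `point`; the old owner keeps the lower half."""
    for i, zone in enumerate(zones):
        if zone.contains(point):
            zones[i:i + 1] = list(zone.split(new_owner))
            return zones
    raise ValueError("point outside the coordinate space")


zones = [Zone("", (0.0, 0.0), (1.0, 1.0), "n0")]
join(zones, (0.7, 0.2), "n1")   # first split is on x
join(zones, (0.7, 0.2), "n2")   # the right half is then split on y
```

In this sketch the old owner always keeps the lower half; which half the new node receives is a design choice that the slides do not specify.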
13
Content Addressable Network (CAN)
Data Object Operation (e.g., Insertion)
1. Generate a key based on the object identification and insert the data object as a (key, value) pair
2. Map the key to a point P in the CAN plane by using a uniform hash function
3. Store the pair at the node that owns the zone containing the point P
4. To retrieve the value, apply the same hash function to the key to regenerate the point P and find the zone that owns it; the node owning that zone returns the value to the client
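A minimal sketch of these four steps, assuming a 2-D unit square, SHA-1 as the uniform hash, and a fixed layout of four quadrant zones in place of a real, dynamically partitioned CAN. The function names and the quadrant layout are illustrative assumptions, and routing between nodes is left out.

```python
import hashlib


def key_to_point(key):
    """Hash a key to a point P in the unit square [0, 1) x [0, 1)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    x = int.from_bytes(digest[:4], "big") / 2 ** 32
    y = int.from_bytes(digest[4:8], "big") / 2 ** 32
    return (x, y)


def owner_of(point):
    """Return the quadrant zone containing the point (hypothetical layout)."""
    x, y = point
    return (int(x >= 0.5), int(y >= 0.5))


# One local store per zone, standing in for the node that owns the zone.
stores = {(i, j): {} for i in (0, 1) for j in (0, 1)}


def put(key, value):
    stores[owner_of(key_to_point(key))][key] = value


def get(key):
    # The same hash regenerates P, so the lookup reaches the same zone.
    return stores[owner_of(key_to_point(key))].get(key)


put("map-tile-42", b"...")
```

Because the point depends only on the hash of the key, two objects that are close in physical space generally land in unrelated zones, which is why this scheme supports only exact-match lookups.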
14
Storing Spatial Data w/ DHTs
A hash function distributes data objects evenly within the space to achieve a balanced load
Spatial locality information needs to be preserved for range queries, but applying a hash function to spatial data destroys locality
Related work explored storing an R-tree or quadtree based index on a DHT
– Harwood et al. Hashing Spatial Content over Peer-to-Peer Networks
– Mondal et al. P2PR-tree: An R-tree-based Spatial Index for Peer-to-Peer Environments
15
Spatial Range Query Design for P2P Systems
Mapping a physical space to a CAN space
– Propose a new hash function to map spatial data objects onto nodes in a modified CAN system
– Purpose: allow efficient spatial query execution while also considering load balance
– Calculating the location of zones in the logical space: a Virtual Identifier (VID) tree is used for the mapping
16
Spatial Range Query Design for P2P Systems
Approach:
– The object key is generated from three components
  (a) Scatter region address: based on the spatial location of the object; preserves spatial locality
  (b) Zone address: randomized; achieves load balance
  (c) Object identifier (hashed)
– The scatter region size is fixed and predetermined
17
Spatial Range Query Design for P2P Systems (cont.)
– The zone bit string is chosen randomly, and the object identifier is the hash of the data content
– The VID tree is created with its height determined by the scatter region size
– The maximum number of zones is 2^(a+b)
– The trade-off between data locality and load balance can be tuned along a spectrum
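The three-part key might be assembled along the following lines. This is a sketch under assumptions, not the authors' hash function: the physical space is normalized to the unit square, the scatter region address is taken to be the first a bits obtained by repeatedly halving the space (alternating x and y), and SHA-1 stands in for the content hash. The names `region_bits` and `object_key` are hypothetical.

```python
import hashlib
import random


def region_bits(x, y, a):
    """Scatter region address: halve the unit square `a` times, alternating
    x and y, and record 1 for the upper half and 0 for the lower half
    (an assumed encoding)."""
    lo, hi = [0.0, 0.0], [1.0, 1.0]
    point, bits = (x, y), ""
    for i in range(a):
        dim = i % 2
        mid = (lo[dim] + hi[dim]) / 2
        if point[dim] >= mid:
            bits += "1"
            lo[dim] = mid
        else:
            bits += "0"
            hi[dim] = mid
    return bits


def object_key(x, y, content, a=5, b=4, rng=random):
    region = region_bits(x, y, a)                       # (a) preserves locality
    zone = "".join(rng.choice("01") for _ in range(b))  # (b) random, for load balance
    oid = hashlib.sha1(content).hexdigest()             # (c) hashed object identifier
    return region, zone, oid


key = object_key(0.55, 0.55, b"payload")
```

With a = 5 and b = 4 there are at most 2^(5+4) = 512 zones. Nearby objects share the same region bits, while the random zone bits spread them over the 2^4 = 16 zones of that region.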
18
Spatial Range Query Design for P2P Systems (cont.)
Scatter region (11000), e.g.: a = 5 bits
[Figure: space partitioned by bit-string addresses; labels 00, 01, 10, 11 and 000 through 111]
19
Spatial Range Query Design for P2P Systems (cont.)
Zones, e.g.: b = 4 bits
[Figure: space partitioned by bit-string addresses; labels 00, 01, 10, 11 and 000 through 111]
20
Spatial Range Query Design for P2P Systems (cont.)
System Operation and Spatial Range Query
– Node operation
  Bootstrap mechanism
  Node join mechanism
  Zone split and the search threshold
  – Balance the number of data objects in each zone
  – The zone selected must be larger than the minimum zone size (1/2^(a+b))
  – The threshold is the upper bound on the number of search hops to find a zone to split
– Data object insertion
– Data object deletion
– Spatial range query
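One way to read the split rule is sketched below. The slides do not give the selection procedure, so everything here is an assumption: candidates are listed in the order a search would visit them, a zone whose VID has k bits is taken to cover 1/2^k of the space, and the most heavily loaded eligible zone within the hop threshold is chosen. The function name `pick_zone_to_split` is hypothetical.

```python
def pick_zone_to_split(candidates, a, b, threshold):
    """candidates: list of (vid, object_count) in visiting order (hypothetical).

    Returns the most loaded zone that can still be split, looking at the
    starting zone plus at most `threshold` further hops, or None.
    """
    best = None
    for vid, count in candidates[:threshold + 1]:
        if len(vid) >= a + b:
            # Already at the minimum zone size 1/2^(a+b); cannot split.
            continue
        if best is None or count > best[1]:
            best = (vid, count)
    return best


candidates = [
    ("110000101", 90),   # 9 bits: minimum size for a=5, b=4
    ("1100001", 40),
    ("11001", 70),
    ("0", 500),          # beyond the hop threshold below
]
choice = pick_zone_to_split(candidates, a=5, b=4, threshold=2)
```

A larger threshold finds better-balanced splits at the cost of more search hops per join.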
21
Spatial Range Query Design for P2P Systems (cont.)
Spatial Range Query
Step 1: The querying node launches a spatial range query.
22
Spatial Range Query Design for P2P Systems (cont.)
Spatial Range Query
Step 2: The node determines the overlapping scatter regions.
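Step 2 can be illustrated with a brute-force sketch. It assumes a unit-square space and an address encoding in which each bit halves the space, alternating x and y (0 for the lower half, 1 for the upper half); the slides do not spell out the encoding, and a real implementation would walk the VID tree rather than enumerate all 2^a regions. The function names are hypothetical.

```python
def region_rect(bits):
    """Rectangle covered by a scatter region address (assumed encoding:
    halve the unit square once per bit, alternating x and y)."""
    lo, hi = [0.0, 0.0], [1.0, 1.0]
    for i, bit in enumerate(bits):
        dim = i % 2
        mid = (lo[dim] + hi[dim]) / 2
        if bit == "1":
            lo[dim] = mid
        else:
            hi[dim] = mid
    return tuple(lo), tuple(hi)


def overlapping_regions(q_lo, q_hi, a):
    """Addresses of all scatter regions that intersect the query box."""
    hits = []
    for n in range(2 ** a):
        bits = format(n, "0{}b".format(a))
        lo, hi = region_rect(bits)
        if all(lo[d] < q_hi[d] and q_lo[d] < hi[d] for d in range(2)):
            hits.append(bits)
    return hits


# Query box spanning x in [0.55, 0.70] and y in [0.55, 0.60].
hits = overlapping_regions((0.55, 0.55), (0.70, 0.60), a=5)
```

For a = 5 the space has 8 columns and 4 rows of scatter regions under this encoding, and the example query touches two neighboring regions.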
23
Spatial Range Query Design for P2P Systems (cont.)
Spatial Range Query
Step 3: The node multicasts the query to the overlapping scatter regions.
24
Spatial Range Query Design for P2P Systems (cont.)
Step 4:
– The range query is multicast within all overlapping scatter regions (M-CAN).
– Recall: data is randomized within each scatter region, so an exhaustive search is necessary
– Choice of scatter region size
  Large: good load balance; data is uniform within a scatter region
  Small: exhaustive search covers less area
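The exhaustive search in step 4 can be sketched as below. A nested dictionary stands in for the nodes of each scatter region, and the multicast is replaced by a plain loop; the storage layout and the name `range_query` are illustrative assumptions.

```python
def range_query(store, hit_regions, q_lo, q_hi):
    """store: {region_bits: {zone_bits: [(x, y, oid), ...]}} (hypothetical layout).

    Every zone of every overlapping region is visited, because objects are
    placed in a random zone within their scatter region.
    """
    results, zones_visited = [], 0
    for region in hit_regions:
        for zone, objects in store.get(region, {}).items():
            zones_visited += 1
            for x, y, oid in objects:
                if q_lo[0] <= x <= q_hi[0] and q_lo[1] <= y <= q_hi[1]:
                    results.append(oid)
    return results, zones_visited


store = {
    "11000": {"0000": [(0.56, 0.57, "a")], "1011": [(0.61, 0.70, "b")]},
    "11001": {"0110": [(0.65, 0.58, "c")]},
    "00000": {"0001": [(0.05, 0.05, "d")]},
}
results, visited = range_query(
    store, ["11000", "11001"], (0.55, 0.55), (0.70, 0.60)
)
```

The cost grows with the number of zones per overlapping region (up to 2^b each), which is the trade-off behind the choice of scatter region size.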
25
Conclusions and Future Research Directions
– We proposed a hash function that preserves spatial locality information while providing constrained load balance
– The proposed mechanism works well with the CAN P2P architecture
– We are currently running simulations to test our approach
26
Thank you! Questions?