P2PR-tree: An R-tree-based Spatial Index for P2P Environments
Anirban Mondal, Yi Lifu, Masaru Kitsuregawa
University of Tokyo
PRESENTATION OUTLINE
- Motivating Spatial Applications on P2P systems
- Existing Spatial Indexes
- Our proposal: The P2PR-tree
- Performance Analysis
- Conclusion and Future Work
Spatial Applications on P2P systems
- Spatial data occurs in several important and diverse applications:
  - Geographic Information Systems (GIS)
  - Computer-aided design (CAD)
  - Resource management
  - Development planning, emergency planning and scientific research
- Unprecedented growth of available spatial data at geographically distributed locations
- Trend of increased globalization
- Popularity of P2P data sharing
→ Efficient global sharing of distributively owned spatial data in P2P systems
Application example
- Searching for real estate information in Tokyo
[Figure: query MBR and query results]
Existing Spatial Indexes
- Centralized spatial indexes: R-tree, R*-tree, R+-tree
- Distributed spatial indexes: M-Rtree, MC-Rtree
MC-Rtree
- Master client: an R-tree which indexes the covering MBRs of the data stored at the clients
- Each client has its own R-tree for managing its own data
- Centralization
- Designed for clusters
- Optimizes disk I/Os
Why can't we use existing R-tree-based approaches?
- They use centralized mechanisms → not scalable.
  - All updates must pass through the Master Node
  - All searches need to be routed by the Master Node
  → Performance bottleneck at the Master Node
- They do not optimize communication time.
GRID-Related Projects
- GRID Physics Network and European DataGrid: aim to improve scientific research that requires efficient distributed handling of data in the petabyte range
- Earth Systems GRID (ESG): aims to facilitate detailed analysis of huge amounts of climate data by a geographically distributed community via high-bandwidth networks
- NASA Information Power GRID (IPG): aims to improve existing systems in NASA for solving complex scientific problems efficiently
How does our proposal differ from GRID-related spatial work?
GRID
- Restricts data sharing to scientific and research organizations
- Individual nodes are usually dedicated and expected to be available most of the time
- Some amount of centralized control is possible through collaboration between organizations
Our proposal
- Allows normal users to share/upload data
- Individual nodes may join/leave at any time
- Peers are distributively owned, hence centralized control is practically challenging
Existing Search mechanisms in P2P systems
- Broadcast (Gnutella)
- Centralized (Napster)
- Routing indices (RIs)
- Distributed hash tables (Chord, CAN, Tapestry)
Existing works on P2P systems mostly address file-sharing.
P2PR-tree (Peer-to-Peer R-tree)
- A distributed R-tree-based indexing scheme designed for P2P systems
- Parts of the distributed index are built autonomously by each peer
- Hierarchical and performs efficient pruning
- Completely decentralized
- Highly scalable
Dividing the Universe
[Figure: the universe divided into Block 1, Block 2, Block 3 and Block 4, with peers (P) spread over the blocks. The corresponding tree has Level 0 (blocks B1 to B4), Level 1 (groups G1 to G4), Level 2 (subgroups SG1, SG2 and peers) and Level 3 (peers).]
Definitions
- Unit: A Block, Group, Subgroup at any level, or a peer
- UnitMBR: Minimum Bounding Rectangle of a Unit
- Router: In order to route messages to a Unit X, a peer A needs to know at least one peer (say peer B) which belongs to Unit X. We define peer B as peer A's Router to Unit X.
- UnitRouterInfo: The addresses of routers to a Unit
- UnitInfo: UnitMBR and UnitRouterInfo of a Unit
- ChildInfo (Level i): UnitInfo of Child Units at Level i+1 in the P2PR-tree
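As a rough illustration, the definitions above could be written down as Python data classes. This is a minimal sketch, not the authors' implementation: the class names `UnitInfo` and `PeerState`, the `routers_to` helper, the split into `block_info` and `child_info`, and all coordinates and router assignments are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class UnitInfo:
    """UnitInfo = UnitMBR plus UnitRouterInfo of one Unit."""
    unit_mbr: Rect
    # Addresses of known routers to the Unit; empty for the peer's own Unit.
    unit_router_info: List[str] = field(default_factory=list)


@dataclass
class PeerState:
    """Hypothetical per-peer state: level, block info, and ChildInfo per level."""
    level: int
    block_info: Dict[str, UnitInfo]             # Level 0 Units (blocks)
    child_info: Dict[int, Dict[str, UnitInfo]]  # child_info[i]: Units at Level i+1

    def routers_to(self, unit_level: int, unit: str) -> List[str]:
        """Return the known routers to a Unit at the given level."""
        if unit_level == 0:
            return self.block_info[unit].unit_router_info
        return self.child_info[unit_level - 1][unit].unit_router_info


# Made-up state for a Level 2 peer living in block B1, group G2.
p5 = PeerState(
    level=2,
    block_info={
        "B1": UnitInfo((0, 0, 50, 50)),              # own block
        "B2": UnitInfo((50, 0, 100, 50), ["P25"]),
    },
    child_info={
        0: {"G1": UnitInfo((0, 0, 25, 25), ["P1"]),
            "G2": UnitInfo((25, 0, 50, 25))},        # own group
        1: {"P6": UnitInfo((30, 5, 35, 10), ["P6"]),
            "P3": UnitInfo((40, 10, 45, 20), ["P3"])},
    },
)

print(p5.routers_to(0, "B2"))  # ['P25']
print(p5.routers_to(1, "G1"))  # ['P1']
```

Keeping a list of routers per Unit, rather than a single address, leaves room for the redundancy that the routing slides rely on.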
Data Structure at a peer
A peer of Level L can be specified as [expression not preserved] and maintains the following information [not preserved], where [definitions not preserved]
Example of Data Structure
[Figure: P2PR-tree with Level 0 Units (B1 to B4), Level 1 Units (G1 to G4), Level 2 Units (P5, P6, P3, SG1, SG2) and Level 3 Units (P1, P2, P20, P3, P4). Further labels shown: G1 to G4; P11, P12, P21, P33, P66; SG1, SG2.]
P2 can be specified as Peer( )
P2PR-tree: Maintaining information
[Figure: Block 1 with groups G1 to G4 and peers P1 to P6 and P8 to P10; tree with Level 0 (B1 to B4), Level 1 (G1 to G4) and Level 2 (peers)]
- BlockMBR information is stored at every peer
- Maintaining information: Peer Level = 2, (B1, B2, B3, B4), (G1, G2, G3, G4), (P6, P3)
Peer Join operation in P2PR-tree
[Figure: Block 1 with groups G1 to G4, subgroups SG1 and SG2, and peers including P20 and P30; tree with Level 0 (B1 to B4), Level 1 (G1 to G4) and Level 2 (peers)]
- BlockMBR information is stored at every peer
- Maintaining information: Peer Level = 2, (B1, B2, B3, B4), (G1, G2, G3, G4), (P2, P3, P4)
Peer Join operation in P2PR-tree
[Figure: Block 1 with groups G1 to G4, subgroups SG1 and SG2, and peers including P20 and P30; tree with Level 0 (B1 to B4), Level 1 (G1 to G4), Level 2 (P5, P6, P3, P30, SG1, SG2) and Level 3 (P1, P2, P20, P3, P4)]
- BlockMBR information is stored at every peer
- Maintaining information: Peer Level = 3, (B1, B2, B3, B4), (G1, G2, G3, G4), (SG1, SG2), (P2, P20)
Routing Issues
- Assumption: a peer initially knows at least N routers for a Unit.
- Piggybacking is used to refresh the routers of each peer.
- During piggybacking, a peer sends the addresses and reliability information of other peers in its own Unit.
- Each peer maintains the R most reliable routers for each Unit.
- What if all routers that a peer knows in a specific Unit are unavailable?
  - The peer contacts peers in other blocks to find new routers for that block.
Example of refreshing routers (N=2, R=4)
[Figure: Block 1 with groups G1 to G4 and peers P1 to P6, P8 to P12 and P15. Router annotations shown at the peers: P9, P15 → G4 and P10, P12 → G4.]
Example of refreshing routers (N=2, R=3)
[Figure: Block 1 with groups G1 to G4 and peers P1 to P6, P8 to P12 and P15. Router annotations shown at the peers: P9, P15 → G4 with P10 → G4 added, and P10, P12 → G4 with P9 → G4 added.]
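The router refresh described under Routing Issues can be sketched as a merge followed by a top-R cut. This is only an illustration under stated assumptions: the slides do not say how reliability is measured, so the scores below are invented numbers on a 0 to 1 scale, and the names `refresh_routers` and `pick_router` are hypothetical.

```python
from typing import Callable, Dict, Optional


def refresh_routers(known: Dict[str, float],
                    piggybacked: Dict[str, float],
                    r: int) -> Dict[str, float]:
    """Merge piggybacked routers into the known ones and keep the R most reliable.

    Both arguments map a peer address to a reliability score. Assumption:
    piggybacked information is newer and overrides what was known before.
    """
    merged = {**known, **piggybacked}
    # Highest reliability first; ties broken by address for determinism.
    ranked = sorted(merged.items(), key=lambda item: (-item[1], item[0]))
    return dict(ranked[:r])


def pick_router(routers: Dict[str, float],
                is_available: Callable[[str], bool]) -> Optional[str]:
    """Return the most reliable router that is currently reachable, if any."""
    for address in routers:  # insertion order is reliability order
        if is_available(address):
            return address
    # None left: the caller has to ask peers in other blocks for new routers.
    return None


# Routers to G4 as seen by one peer, plus what it hears via piggybacking.
known_g4 = {"P9": 0.9, "P15": 0.8}
heard_g4 = {"P10": 0.7, "P12": 0.5}

print(sorted(refresh_routers(known_g4, heard_g4, r=4)))  # ['P10', 'P12', 'P15', 'P9']
print(sorted(refresh_routers(known_g4, heard_g4, r=3)))  # ['P10', 'P15', 'P9']
```

With R=4 the peer keeps all four routers; with R=3 the least reliable one is dropped, which is the kind of trade-off the two example slides contrast.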
Searching the P2PR-tree (Query Level = 0)
[Figure: Block 1 and Block 4, each with groups G1 to G4; tree with Levels 0 to 3]
- BlockMBR information is stored at every peer
- Query comes to P60 (Block 4)
- Maintaining information at P60: Peer Level = 2, (P5→B1, P25→B2, P35→B3, B4), (P41→G1, G2, P43→G3, P49→G4), (P45, P46)
- The query is routed to B1 through router P5
Searching the P2PR-tree (Query Level = 1)
[Figure: Block 1 and Block 4, each with groups G1 to G4; tree with Levels 0 to 3]
- BlockMBR information is stored at every peer
- The query that came to P60 has reached a peer in B1
- Maintaining information at that peer: Peer Level = 2, (B1, P26→B2, P36→B3, P42→B4), (P4→G1, G2, P8→G3, P9→G4), (P6, P30)
- The query is routed to G1 through router P4
Searching the P2PR-tree (Query Level = 2)
[Figure: Block 1 and Block 4, each with groups G1 to G4; tree with Levels 0 to 3]
- BlockMBR information is stored at every peer
- The query that came to P60 has reached P4 in G1
- Maintaining information at P4: Peer Level = 3, (B1, P27→B2, P37→B3, P43→B4), (G1, P6→G2, P8→G3, P10→G4), (P20→SG1, SG2), (P3)
- The query is routed to SG1 through router P20
Searching the P2PR-tree (Query Level = 3)
[Figure: Block 1 and Block 4, each with groups G1 to G4; tree with Levels 0 to 3]
- BlockMBR information is stored at every peer
- The query that came to P60 has reached P20 in SG1
- Maintaining information at P20: Peer Level = 3, (B1, P28→B2, P38→B3, P45→B4), (G1, P30→G2, P8→G3, P9→G4), (SG1, P3→SG2), (P1, P2)
- The query is forwarded to P1 and P2
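The level-by-level search walked through on the slides above can be approximated by the following sketch. It is a simplified model under several assumptions, not the authors' algorithm: every peer sits at the same depth (block, group, peer), each Unit has a single router (the first peer listed in it), all peers are always available, and the helper `build`, the field names and the coordinates are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def cover(rects: List[Rect]) -> Rect:
    """Minimum bounding rectangle of a list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


@dataclass
class Entry:
    mbr: Rect
    router: Optional[str]  # None: the Unit is (or contains) this peer itself


@dataclass
class Peer:
    name: str
    mbr: Rect
    tables: List[Dict[str, Entry]]  # tables[i]: Units this peer knows at level i


def build(layout: Dict[str, Dict[str, Dict[str, Rect]]]) -> Dict[str, Peer]:
    """Hypothetical helper: build per-peer tables from block -> group -> peer -> MBR."""
    g_mbr = {(b, g): cover(list(ps.values()))
             for b, gs in layout.items() for g, ps in gs.items()}
    b_mbr = {b: cover([g_mbr[(b, g)] for g in gs]) for b, gs in layout.items()}
    g_router = {(b, g): next(iter(ps))
                for b, gs in layout.items() for g, ps in gs.items()}
    b_router = {b: g_router[(b, next(iter(gs)))] for b, gs in layout.items()}
    net: Dict[str, Peer] = {}
    for b, gs in layout.items():
        for g, ps in gs.items():
            for p, mbr in ps.items():
                blocks = {x: Entry(b_mbr[x], None if x == b else b_router[x])
                          for x in layout}
                groups = {x: Entry(g_mbr[(b, x)], None if x == g else g_router[(b, x)])
                          for x in gs}
                peers = {x: Entry(m, None if x == p else x) for x, m in ps.items()}
                net[p] = Peer(p, mbr, [blocks, groups, peers])
    return net


def search(net: Dict[str, Peer], at: str, query: Rect, level: int = 0,
           hops: Optional[List[Tuple[str, str]]] = None) -> Set[str]:
    """Return the peers whose MBR intersects the query, descending one level per step."""
    peer = net[at]
    if level == len(peer.tables):  # past the last table: check the peer's own data
        return {at} if intersects(peer.mbr, query) else set()
    hits: Set[str] = set()
    for entry in peer.tables[level].values():
        if not intersects(entry.mbr, query):
            continue  # prune Units whose MBR misses the query
        nxt = entry.router
        if nxt is None:
            nxt = at  # own Unit: keep descending locally
        elif hops is not None:
            hops.append((at, nxt))  # record the forwarded message
        hits |= search(net, nxt, query, level + 1, hops)
    return hits


layout = {
    "B1": {"G1": {"P1": (0, 0, 1, 1), "P2": (2, 2, 3, 3)},
           "G2": {"P3": (6, 0, 7, 1)}},
    "B4": {"G1": {"P60": (20, 20, 21, 21)}},
}
net = build(layout)
hops: List[Tuple[str, str]] = []
result = search(net, "P60", (0.5, 0.5, 2.5, 2.5), hops=hops)
print(sorted(result))  # ['P1', 'P2']
print(hops)            # [('P60', 'P1'), ('P1', 'P2')]
```

In this toy run the query entering at P60 is forwarded once to a router in B1 and once more inside G1; G2 and B4 are pruned by their MBRs without any message being sent.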
Performance Evaluation
- Investigates the following:
  - Effect of variations in workload skew
- Performance metric: Average Response Time
- Comparison with Centralized MC-Rtree
- 1000 data-providing peers
Effect of variations in workload skew when the query arrival rate was fixed at 20 queries/second
Effect of variations in workload skew when the query arrival rate was fixed at 100 queries/second
Conclusion
- Investigation of the problem of spatial indexing in P2P environments
- Proposal of the P2PR-tree (Peer-to-Peer R-tree)
  - Scalable decentralized P2P data structure
  - Efficient routing scheme
Future Scope of Work
- Detailed simulation
- Replication
- Availability
- Load-balancing