1
PeerDB: A P2P-based System for Distributed Data Sharing
Wee Siong Ng, Beng Chin Ooi, Kian-Lee Tan, Aoying Zhou
Shawn Jeffery
CS294-4 Peer-to-Peer Systems
11/05/03
2
Overview
- A P2P “database” system
- Allows content-based search
- No global schema
- Utilizes mobile agents
- Provides flexibility and extensibility
- Dynamically adjusts topology
3
Background: P2P vs Distributed Databases

                    P2P                 DDBMS
Membership          Ad-hoc              Controlled
Schema              No global schema    Shared (or at least some way to mediate)
Query result set    Incomplete          Complete
Content location    “Word of mouth”     Shared catalog
4
BestPeer
- Generic P2P platform
- Mobile Agents
  - Carry code and data
  - Collect stats
  - Security issues?
- Dynamic Reconfiguration
  - How does this compare to Gia?
- Location Independent Global Names Lookup (LIGLO) Servers
  - Small number
  - Provides a global identity for peers and peer status
  - Why not use a DHT/KBR/DOLR?
5
BestPeer Security
- Private and sharable data
  - Agents only able to access sharable data
  - Does this adequately restrict the power of mobile agents?
- Communications on the wire also encrypted
- What’s missing?
6
Architecture
[Figure: architecture diagram; surviving labels: Sharable Data, Local Data, Database]
7
Schema “Mediation”
- Problems with supporting SQL queries:
  - No global schema information
  - Different nodes could name the same table/attribute differently (“len”, “length”)
- Solution:
  - User supplies metadata for each relation name and attribute
    - Users expected to do a lot
  - Formula based on matching relation keywords and attribute keywords to determine if a query matches a table
- What about other schema mediation work (such as Piazza)?
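The slide does not give the matching formula itself, so the following is only a minimal sketch of the general idea: score a remote relation by how many of the query's relation and attribute names appear among the keywords that peer's user supplied. The function names, the dictionary layout, and the equal weights are all assumptions made for illustration, not PeerDB's actual formula.

```python
def match_score(query_terms, keywords):
    """Fraction of query terms found among a peer's keywords (case-insensitive)."""
    terms = {t.lower() for t in query_terms}
    if not terms:
        return 0.0
    known = {k.lower() for k in keywords}
    return len(terms & known) / len(terms)


def relation_match(query, relation, rel_weight=0.5, attr_weight=0.5):
    """Weighted sum of relation-level and attribute-level keyword overlap.

    The weights are hypothetical; the slides only say the formula uses both
    relation keywords and attribute keywords.
    """
    rel = match_score(query["relation_terms"], relation["relation_keywords"])
    attr = match_score(query["attribute_terms"], relation["attribute_keywords"])
    return rel_weight * rel + attr_weight * attr


# A query against "protein(name, length)" and two differently annotated peers.
query = {"relation_terms": ["protein"], "attribute_terms": ["name", "length"]}
well_annotated = {
    "relation_keywords": ["protein", "enzyme"],
    "attribute_keywords": ["name", "len", "length"],
}
poorly_annotated = {
    "relation_keywords": ["protein"],
    "attribute_keywords": ["name", "len"],
}
full_score = relation_match(query, well_annotated)
partial_score = relation_match(query, poorly_annotated)
```

The second peer scores lower only because its user never listed “length” as a keyword for the “len” attribute, which illustrates why the approach leans so heavily on user-supplied metadata.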
8
Local Query Processing – Phase I
- “Master Agent” coordinates the entire affair
- Check Local Dictionary for matching relations
  - Use the relation matching strategy even for the local DB
- Create “Relation Matching Agents” and flood to all neighbors
- Wait for responses
- Display results to user as they arrive
9
Local Query Processing – Phase II
- User selects the relations he/she wants
- Create a “Data Retrieval Agent”
- Rewrite query in terms of new relations
- If local, submit SQL to local db
- Contact remote nodes directly to access the data
- Creates remote join plans locally - optimization?
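As a rough illustration of the "rewrite query in terms of new relations" step, the sketch below substitutes the local relation and attribute names with the names used by the selected remote relation. The function `rewrite_query`, the mapping, and the table names are hypothetical; a real system would rewrite a parsed query rather than SQL text, since this naive version would also touch matching words inside string literals.

```python
import re


def rewrite_query(sql, name_map):
    """Replace whole-word identifiers in sql according to name_map."""
    if not name_map:
        return sql
    # One alternation of all local names, anchored on word boundaries so that
    # "len" inside "length" is never rewritten by accident.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in name_map) + r")\b"
    )
    return pattern.sub(lambda m: name_map[m.group(1)], sql)


# Local names on the left, the chosen remote peer's names on the right.
local_sql = "SELECT name, length FROM protein WHERE length > 100"
mapping = {"protein": "prot_table", "length": "len"}
remote_sql = rewrite_query(local_sql, mapping)
print(remote_sql)
```

The rewritten statement is what a Data Retrieval Agent could submit to the remote DBMS, assuming the user's choice in Phase II fixes the name mapping.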
10
Remote Query Processing
- Phase I: Find relations
  - Relation Matching Agents flood with TTL
  - Check Export Dictionary for a match
  - Return matches directly
- Phase II: Get data
  - Data Retrieval Agent submits SQL to DBMS
  - Return data to the requesting node directly
  - Run further data processing before returning
    - Again, security issues
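Phase I can be pictured as a TTL-bounded flood over the overlay. The sketch below simulates it in a single process: the "agent" visits each peer at most once, checks that peer's export dictionary, and collects matches for the origin. The graph, the dictionary layout, and the exact-keyword test are assumptions for illustration; the real agents carry code and use the keyword-matching formula rather than a plain membership check.

```python
from collections import deque


def flood_relation_match(neighbors, export_dicts, origin, keyword, ttl):
    """Breadth-first flood from origin; forwarding stops when the TTL hits 0."""
    matches = []
    visited = {origin}
    frontier = deque([(origin, ttl)])
    while frontier:
        node, remaining = frontier.popleft()
        if remaining == 0:
            continue  # TTL exhausted: this node does not forward the agent
        for peer in neighbors.get(node, []):
            if peer in visited:
                continue
            visited.add(peer)
            # Check the peer's export dictionary (hypothetical layout:
            # relation name -> set of keywords).
            for relation, keywords in export_dicts.get(peer, {}).items():
                if keyword in keywords:
                    matches.append((peer, relation))
            frontier.append((peer, remaining - 1))
    return matches


# A chain A - B - C - D; only B and D export a matching relation.
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
export_dicts = {
    "B": {"prot": {"protein"}},
    "C": {"genes": {"gene"}},
    "D": {"proteins": {"protein"}},
}
near = flood_relation_match(neighbors, export_dicts, "A", "protein", ttl=2)
far = flood_relation_match(neighbors, export_dicts, "A", "protein", ttl=3)
```

With a TTL of 2 the agent never reaches D, so the result set is incomplete, which matches the "Incomplete" query result entry in the P2P vs DDBMS comparison.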
11
Statistics
- Master Agents monitor stats in the network
  - Keywords for some relations returned during Phase I
    - Update metadata
  - Number of objects returned for selected relations
- Can be used for topology change decisions
  - Use most recently returned results as metric to determine who to connect with
  - Frequent updates – might need to change neighbors after each result returned
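One way to read the reconfiguration rule is "keep as direct neighbors the peers that returned the most objects in the latest query". The sketch below implements that reading; the function name, the fixed neighbor budget, and the alphabetical tie-break are assumptions, not details taken from the slides.

```python
def choose_neighbors(latest_counts, max_neighbors):
    """Rank peers by objects returned in the most recent query, best first."""
    ranked = sorted(latest_counts.items(), key=lambda item: (-item[1], item[0]))
    return [peer for peer, _ in ranked[:max_neighbors]]


# Objects returned per peer for the most recent query (hypothetical numbers).
counts = {"A": 3, "B": 10, "C": 0, "D": 10}
before = choose_neighbors(counts, max_neighbors=2)

# A single new result can reshuffle the neighbor set.
counts["C"] = 25
after = choose_neighbors(counts, max_neighbors=2)
```

Because only the latest counts are used, one query is enough to displace an existing neighbor, which is the churn concern raised in the last bullet above.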
12
Caches
- Cache all query results locally
  - Soft state
  - LRU replacement
- Users choose which copy they want
  - Only provided with peer id and an indication of which is the source
  - What about timestamp, etc?
  - Again, user heavily involved
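For reference, LRU replacement over cached query results can be sketched in a few lines. The class name, the capacity measured in number of entries, and the use of the query string as the key are assumptions; the slides do not say how PeerDB keys or sizes its cache.

```python
from collections import OrderedDict


class QueryResultCache:
    """Soft-state cache of query results with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()  # oldest entry first, newest last

    def get(self, query):
        if query not in self._entries:
            return None
        self._entries.move_to_end(query)  # mark as most recently used
        return self._entries[query]

    def put(self, query, rows):
        self._entries[query] = rows
        self._entries.move_to_end(query)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used


cache = QueryResultCache(capacity=2)
cache.put("q1", [("kinase", 312)])
cache.put("q2", [("lipase", 120)])
cache.get("q1")                      # q1 becomes the most recently used
cache.put("q3", [("amylase", 496)])  # evicts q2, the least recently used
```

Since the cache is soft state, dropping an entry never loses data; the result can always be fetched again from the source peer.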
13
Relation Matching Performance
- Significant tradeoff between precision and recall
  - Which is more important?
- Is their approach acceptable?
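To make the tradeoff concrete, precision and recall for a set of returned relations can be computed as below. These are the standard definitions; the relation identifiers are made up for the example.

```python
def precision_recall(returned, relevant):
    """Precision = hits / returned, recall = hits / relevant."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Four relations were returned; three relations are actually relevant.
returned = ["r1", "r2", "r3", "r4"]
relevant = ["r1", "r2", "r5"]
precision, recall = precision_recall(returned, relevant)
```

Loosening the match threshold tends to return more relations, which can raise recall while lowering precision, and the user then has to sift through more candidates in Phase II.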
14
Experimental Methodology
- Compare P2P Model vs Client/Server model
  - CS returns via the search path (?)
- Compare static vs reconfigurable networks
- Compare agent vs message based approach
- 32 Nodes
  - Is this enough?
15
Evaluation Scenarios (Metrics?)
- Fixed set of nodes
  - Easily test P2P protocols, Reconfiguration strategies
- Latency
- Quality and Quantity
- What else is important?
16
Performance
- As you increase the amount of storage on each node, latency decreases
  - Due to caching
- In general, reconfiguration performs better
- Response times O(1 Minute)
  - Is this acceptable?
- Agent based shown to be better
  - What if agent produces more data than it processes?
17
Discussion: A P2P DBMS?
- PeerDB represents a tiny step towards a P2P DB (also PIER, Piazza)
- What does it do right?
- What else is needed?
- Is it ideal to have a P2P DB?
- Is it feasible?