A Physical Query Algebra for DHT-based P2P Systems Kai-Uwe Sattler 1, Philipp Rösch 1, Erik Buchmann 2, Klemens Böhm 2 1 Department of Computer Science.


1 A Physical Query Algebra for DHT-based P2P Systems
Kai-Uwe Sattler (1), Philipp Rösch (1), Erik Buchmann (2), Klemens Böhm (2)
(1) Department of Computer Science and Automation, TU Ilmenau
(2) Department of Computer Science, University of Magdeburg

2 E. Buchmann, A Physical Query Algebra for DHT-based P2P Systems
Distributed Hash Tables
- Examples: CAN, Chord, Pastry, etc.
- Advantages of P2P systems, e.g., no single point of failure (SPOF), shared infrastructure costs, censorship resistance
- Manage huge sets of (key, value) pairs
- Cope with large numbers of parallel transactions
- Efficient query processing: greedy forward routing
- But: only simple exact-match queries on unstructured data sets

3 Extended Queries in DHT
- Some extensions:
  - Trigrams for text retrieval, e.g., beethoven: bee eet eth tho hov ove ven
  - Bloom filters for hash-based AND
  - Feature vectors for multimedia documents
- But:
  - Extensions are application-specific
  - No universal query algebra
- Idea: relational data sets, SQL-like queries
- Applications: management of genome data, semantic web, distributed indexes
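The trigram decomposition in the example above can be reproduced with a few lines of Python. This is only an illustrative sketch; the function name `trigrams` is hypothetical and does not come from the slides.

```python
def trigrams(word: str) -> list[str]:
    """Return all overlapping 3-character substrings of word, in order."""
    return [word[i:i + 3] for i in range(len(word) - 2)]


# Each trigram could then be used as a separate DHT key for text retrieval.
print(trigrams("beethoven"))
```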

4 Relational Data in DHT?
- Storing relational data in DHT
  - Fragmentation scheme?
  - Accessing secondary keys?
- Support for SQL-like query processing
  - Distribution scheme for complex queries?
  - Join operations?
  - Full-table scan without flooding?
- Exploiting the P2P nature
  - No central instance, no global knowledge
  - Parallel processing
  - Problems with availability and failures

5 Outline of Our Approach
- Use Content-Addressable Networks (CAN)
- Locality-aware hash function
  - Preserving neighborhood of similar tuples
  - Space-filling curve
- API extension
  - Multicast
  - Temporary re-hashing
- Distributed query plan operators (POP)
  - Selection, join, grouping/aggregation
  - POP distribution scheme

6 Content-Addressable Networks
- Proposed by S. Ratnasamy (2001)
- Keys: d-dimensional points
- Key space is a torus in d dimensions
- Example: d = 2

7 Zones and Neighbors in CAN
- Each peer is responsible for one zone, i.e., stores all (key, value) pairs of the zone
- Each peer knows the neighbors of its zone
- Random assignment of peers to zones at startup
- Overloading of zones, multiple realities, ...

8 Greedy Forward Routing in CAN
- get(k):
  1. Forward the request to the neighbor whose zone is closest to k
  2. Repeat until the peer responsible for k is reached
[Figure: a get(k) request routed hop by hop to the peer storing (k, v)]
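The two routing steps above can be sketched in Python under simplifying assumptions that are not in the slides: the key space is the unit square (d = 2), it is split into an n x n grid of equally sized zones, and each zone has exactly four neighbors on the torus. The names `key_to_zone`, `torus_dist`, and `route` are hypothetical.

```python
def key_to_zone(key, n):
    """Map a point in [0, 1) x [0, 1) to the grid zone that contains it."""
    return (int(key[0] * n), int(key[1] * n))


def torus_dist(a, b, n):
    """Squared distance between two zones, with wrap-around in each dimension."""
    total = 0
    for x, y in zip(a, b):
        d = abs(x - y)
        d = min(d, n - d)  # going around the torus may be shorter
        total += d * d
    return total


def route(start, target, n):
    """Greedy forwarding: always hand the request to the neighbor closest to the target."""
    path = [start]
    cur = start
    while cur != target:
        x, y = cur
        neighbors = [((x + 1) % n, y), ((x - 1) % n, y),
                     (x, (y + 1) % n), (x, (y - 1) % n)]
        cur = min(neighbors, key=lambda z: torus_dist(z, target, n))
        path.append(cur)
    return path


# Route a get(k) for key (0.8, 0.3) from zone (0, 0) in a 4 x 4 grid.
print(route((0, 0), key_to_zone((0.8, 0.3), 4), 4))
```

On a regular grid every hop strictly reduces the distance, so the loop terminates. A real CAN has irregular zones, and a peer only knows its own neighbors.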

9 Managing Relational Data: Simple Approach
- Relation r ∈ R, tuple t ∈ r, t = {a_k, a_1, ..., a_n}
- Key k' = h(a_k)
- Problems:
  1. Tuples are irregularly disseminated over the key space, i.e., only exact-match queries are supported
  2. No search for attributes other than the primary key
- Queries that cannot be answered this way: σ_{5 < a_k < 10}(r)? σ_{a_b = 20}(r)?
[Figure: tuples (x) scattered irregularly over the key space]

10 Fragmentation Scheme
- Reverse bit interleaving (z-curve)
- Tuple t ∈ r, t = {a_k, a_1, ..., a_n}
- Two hash functions: key k' = h_r(r) ∘ h_k(a_k), i.e., (RelationID, Key Value)
[Figure: RelationID and Key bit strings (00010100, 00010010), produced by h_r and h_k, mapped to Dimension #1 and Dimension #2; example point (1,2)]
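The relationship between a position on the z-curve and a point in a two-dimensional key space can be sketched with standard bit interleaving (Morton order). This is an assumption for illustration: the slide does not spell out the exact bit order of its "reverse" interleaving, and the function names are hypothetical.

```python
def interleave(x: int, y: int, bits: int = 8) -> int:
    """Combine two coordinates into one z-curve position (x on even bits, y on odd bits)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def deinterleave(z: int, bits: int = 8) -> tuple[int, int]:
    """Split a z-curve position back into its two coordinates."""
    x = y = 0
    for i in range(bits):
        x |= ((z >> (2 * i)) & 1) << i
        y |= ((z >> (2 * i + 1)) & 1) << i
    return x, y


# Consecutive positions on the curve stay close together in the 2-D key space.
print([deinterleave(z) for z in range(4)])
```

Because neighboring one-dimensional keys map to nearby points, tuples with similar key values tend to end up in the same or adjacent zones.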

11 Two Hash Functions
- Key k' = h_r(r) ∘ h_k(a_k)
- h_r(r): RelationID determines the placement of the space-filling curve
- h_k(a_k): primary key determines the position on the curve (locality awareness)
[Figure: space-filling curves for relations r_a, r_b, r_c, with key values a_k = 0, 1, 2, 3, 4, ... along a curve]
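One way to read k' = h_r(r) ∘ h_k(a_k) is as a bit-string concatenation: a relation prefix followed by an order-preserving key suffix. The sketch below assumes exactly that. The bit widths, the use of SHA-1 for h_r, and the clamping in h_k are all illustrative assumptions, not the authors' definitions.

```python
import hashlib

REL_BITS = 8  # assumed width of the relation prefix
KEY_BITS = 8  # assumed width of the key-value suffix


def h_r(relation: str) -> int:
    """Hypothetical relation hash: one byte derived from the relation name."""
    return hashlib.sha1(relation.encode("utf-8")).digest()[0] % (1 << REL_BITS)


def h_k(a_k: int) -> int:
    """Hypothetical order-preserving key hash: clamp the integer key into KEY_BITS bits."""
    return max(0, min(a_k, (1 << KEY_BITS) - 1))


def make_key(relation: str, a_k: int) -> int:
    """Concatenate the relation prefix and the key suffix into one curve position."""
    return (h_r(relation) << KEY_BITS) | h_k(a_k)


# Tuples of the same relation with adjacent primary keys get adjacent positions.
print(make_key("r_a", 3), make_key("r_a", 4))
```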

12 Additional API Primitives
- Standard operations: put(k, v), v = get(k)
- Only two additional operations needed for our query algebra: put_temp(), multicast()
- put_temp(k, v, t)
  - Re-hashing of a given relation
  - Temporary put operation
  - Allows indexed access to attributes other than the primary key
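A single-process toy model can make the difference between put and put_temp concrete. It assumes that the third argument t of put_temp is a lifetime after which the entry expires; the slides do not define t, so this is a guess. The class `ToyDHT`, the explicit `now` parameter, and the `get_temp` helper are hypothetical.

```python
class ToyDHT:
    """In-memory stand-in for a DHT; no networking, no zones."""

    def __init__(self):
        self.store = {}  # permanent entries: key -> value
        self.temp = {}   # temporary entries: key -> list of (value, expires_at)

    def put(self, k, v):
        self.store[k] = v

    def get(self, k):
        return self.store.get(k)

    def put_temp(self, k, v, t, now):
        """Store v under k for t time units (assumed meaning of t)."""
        self.temp.setdefault(k, []).append((v, now + t))

    def get_temp(self, k, now):
        """Return all temporary values under k that have not expired yet."""
        return [v for v, expires_at in self.temp.get(k, []) if expires_at > now]


dht = ToyDHT()
row = {"a_k": 7, "a_5": 20}
dht.put(row["a_k"], row)                      # stored under the primary key
dht.put_temp(row["a_5"], row, t=10, now=0)    # re-hashed on attribute a_5
print(dht.get_temp(20, now=5), dht.get_temp(20, now=15))
```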

13 Additional API Primitives (Cont.)
- multicast(z_min, z_max, POP)
  - Sends a message to a group of peers
  - Peers are identified by an interval of the z-curve
- Example: σ_{3 < a_k < 6}(r)
  - multicast(3, 6, POP)
  - send(σ_{a_k = 3}), send(σ_{4 < a_k < 6})
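The following sketch shows how an interval on the z-curve could be split into one sub-interval per responsible peer. It uses a global table of zones purely for illustration; a real system has no such global knowledge and would forward the message from peer to peer. The zone boundaries and peer names are made up.

```python
def multicast(z_min, z_max, zones):
    """Return (peer, lo, hi) for every peer whose z-interval overlaps [z_min, z_max]."""
    targets = []
    for peer, (lo, hi) in zones.items():
        if lo <= z_max and hi >= z_min:
            # Clip the requested interval to the part this peer is responsible for.
            targets.append((peer, max(lo, z_min), min(hi, z_max)))
    return targets


# Hypothetical assignment of curve intervals (inclusive) to peers.
zones = {"P0": (0, 3), "P1": (4, 7), "P2": (8, 15)}
print(multicast(3, 6, zones))
```

With these zones, the interval from 3 to 6 is split between two peers, one receiving position 3 and the other positions 4 to 6, which mirrors the two send() calls in the example.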

14 Query Plan Operators (POP)
- Hash-based implementation for selection, join, grouping, aggregation
- Distributed query processing
- Operator trees
[Figure: operator tree over relations R, S, T]

15 Selection
- Selection POP
- On the primary key:
  - Example: σ_{3 < a_k < 6}(r)
  - Determine the interval on the z-curve
  - Send selection operator via multicast
- On other attributes:
  - Example: σ_{3 < a_5 < 6}(r)
  - Perform full-table scan, e.g., multicast(min(a_5), max(a_5), POP)
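The difference between the two cases is mainly how many peers must be contacted. The sketch below models this with a hypothetical `select` function: with an interval on the curve only overlapping peers evaluate the predicate, and without one every peer does (a full-table scan). The zones, the stored tuples, and the assumption that a tuple's curve position equals its a_k are illustrative only.

```python
def select(zones, storage, predicate, z_min=None, z_max=None):
    """Evaluate predicate at each peer overlapping [z_min, z_max], or at all peers if no interval is given."""
    contacted, result = [], []
    for peer, (lo, hi) in zones.items():
        if z_min is not None and (hi < z_min or lo > z_max):
            continue  # this peer holds no tuples in the requested interval
        contacted.append(peer)
        result.extend(t for t in storage[peer] if predicate(t))
    return contacted, result


zones = {"P0": (0, 3), "P1": (4, 7), "P2": (8, 15)}
storage = {
    "P0": [{"a_k": 1, "a_5": 5}, {"a_k": 3, "a_5": 9}],
    "P1": [{"a_k": 4, "a_5": 4}, {"a_k": 5, "a_5": 1}],
    "P2": [{"a_k": 9, "a_5": 5}],
}

# Primary key: 3 < a_k < 6 covers curve positions 4 and 5 only.
print(select(zones, storage, lambda t: 3 < t["a_k"] < 6, z_min=4, z_max=5))
# Other attribute: 3 < a_5 < 6 requires contacting every peer.
print(select(zones, storage, lambda t: 3 < t["a_5"] < 6))
```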

16 Join
- Nested Loop Join POP, Symmetric Hash Join POP
- On the primary key:
  - Perform join immediately
- On other attributes:
  - Re-hash the relation using put_temp first
  - Perform join as above

17 Example: Symmetric Hash Join
- shjoin(R, S)
  - put_temp(h(t_R), t_R, x)
  - put_temp(h(t_S), t_S, x)
[Figure: fragments R_1, R_2 and S_1, S_2 are re-hashed to peers that produce the partial results RS_1 and RS_2, which together form RS]
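The core of a symmetric hash join, as it might run on one of the peers that receives re-hashed tuples, can be sketched as follows. Every arriving tuple is inserted into the hash table of its own side and immediately probes the table of the other side, so results are produced without waiting for either input to finish. The stream format and attribute names are assumptions for illustration.

```python
from collections import defaultdict


def symmetric_hash_join(stream, r_attr, s_attr):
    """Join tuples arriving in arbitrary order as ('R', tuple) or ('S', tuple) pairs."""
    r_table = defaultdict(list)  # join value -> R tuples seen so far
    s_table = defaultdict(list)  # join value -> S tuples seen so far
    for side, tup in stream:
        if side == "R":
            key = tup[r_attr]
            r_table[key].append(tup)
            for match in s_table.get(key, []):
                yield (tup, match)
        else:
            key = tup[s_attr]
            s_table[key].append(tup)
            for match in r_table.get(key, []):
                yield (match, tup)


arrivals = [
    ("R", {"a1": 1, "name": "r1"}),
    ("S", {"a2": 1, "name": "s1"}),
    ("S", {"a2": 2, "name": "s2"}),
    ("R", {"a1": 2, "name": "r2"}),
]
for r, s in symmetric_hash_join(arrivals, "a1", "a2"):
    print(r["name"], s["name"])
```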

18 Sorting/Aggregation
- Central grouping POP:
  - One peer iterates over the z-curve and performs central sorting/aggregation
- Hash group POP:
  - Re-distribute the relation using a hash function on the attribute to be sorted/aggregated
  - "Aggregation peers" are responsible for sorting/aggregation of incoming attribute values
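The hash group idea can be sketched in two steps: tuples are first re-distributed by a hash of the grouping attribute, and each aggregation peer then aggregates what it received. Because all tuples of one group land on the same peer, the per-peer results are already final. The function name, the use of CRC32 as hash, and SUM as the aggregate are illustrative assumptions.

```python
import zlib
from collections import defaultdict


def hash_group_sum(tuples, group_attr, value_attr, num_peers):
    """Compute SUM(value_attr) GROUP BY group_attr via hash re-distribution."""
    # Step 1: send each tuple to the aggregation peer chosen by the hash of its group value.
    inbox = defaultdict(list)
    for t in tuples:
        peer = zlib.crc32(str(t[group_attr]).encode("utf-8")) % num_peers
        inbox[peer].append(t)

    # Step 2: every aggregation peer sums up the values of the groups it owns.
    result = {}
    for received in inbox.values():
        sums = defaultdict(int)
        for t in received:
            sums[t[group_attr]] += t[value_attr]
        result.update(sums)
    return result


rows = [{"g": "a", "v": 1}, {"g": "b", "v": 2}, {"g": "a", "v": 3}]
print(hash_group_sum(rows, "g", "v", num_peers=4))
```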

19 Query Evaluation
- Input
  - Left-handed POP trees
- Design principles
  - Stateless evaluation
  - Blocking operations: delivery of intermediate data (early aggregation)
[Figure: operator tree over relations R, S, T]

20 Query Evaluation: Example
[Figure: peers P_0 to P_5 evaluating a plan over relation fragments r_1, r_2a, and r_2b; the remaining labels are not recoverable]

21 Conclusion
- Current state:
  - Prototype is fully implemented
  - Execution of plans like (shjoin a1=a2 (scan a3>42 REL1) (scan REL2))
  - First experiments in small CAN (100 peers) are promising
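To show what a plan such as (shjoin a1=a2 (scan a3>42 REL1) (scan REL2)) computes, the sketch below evaluates an equivalent nested-tuple plan locally on in-memory lists. It is not the prototype's plan format or its distributed execution: the tuple encoding, the `evaluate` function, and the use of a plain (non-symmetric) hash join are assumptions made for brevity.

```python
def evaluate(plan, relations):
    """Evaluate a nested-tuple plan against a dict of relation name -> list of dict tuples."""
    op = plan[0]
    if op == "scan":
        # ("scan", predicate_or_None, relation_name)
        _, predicate, name = plan
        return [t for t in relations[name] if predicate is None or predicate(t)]
    if op == "shjoin":
        # ("shjoin", left_attr, right_attr, left_plan, right_plan)
        _, left_attr, right_attr, left, right = plan
        table = {}
        for t in evaluate(left, relations):
            table.setdefault(t[left_attr], []).append(t)
        return [{**l, **r}
                for r in evaluate(right, relations)
                for l in table.get(r[right_attr], [])]
    raise ValueError(f"unknown operator: {op}")


relations = {
    "REL1": [{"a1": 1, "a3": 50}, {"a1": 2, "a3": 10}],
    "REL2": [{"a2": 1, "b": "x"}, {"a2": 2, "b": "y"}],
}
plan = ("shjoin", "a1", "a2",
        ("scan", lambda t: t["a3"] > 42, "REL1"),
        ("scan", None, "REL2"))
print(evaluate(plan, relations))
```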

22 Conclusion (cont.)
- Future topics:
  - Experiments with large data sets and many nodes (100,000 nodes, 10 million queries, test data from the TPC-H benchmark)
  - Optimization of the different POP implementations
  - Efficient range queries
  - Dynamic query operations

