A Physical Query Algebra for DHT-based P2P Systems
Kai-Uwe Sattler (1), Philipp Rösch (1), Erik Buchmann (2), Klemens Böhm (2)
(1) Department of Computer Science and Automation, TU Ilmenau
(2) Department of Computer Science, University of Magdeburg

Distributed Hash Tables
Examples: CAN, CHORD, PASTRY, etc.
Advantages of P2P systems, e.g., no SPOF, shared infrastructure costs, censorship resistance
Manage huge sets of (key, value) pairs
Cope with large numbers of parallel transactions
Efficient query processing: greedy forward routing
But: only simple exact-match queries on unstructured data sets

Extended Queries in DHT
Some extensions:
  Trigrams (text retrieval), e.g., beethoven: bee eet eth tho hov ove ven
  Bloom filters (hash-based AND)
  Feature vectors (multimedia documents)
But:
  Extensions are application-specific
  No universal query algebra
Idea: relational data sets, SQL-like queries
Applications: management of genome data, semantic web, distributed indexes

Relational Data in DHT?
Storing relational data in DHT
  Fragmentation scheme?
  Accessing secondary keys?
Support for SQL-like query processing
  Distribution scheme for complex queries?
  Join operations?
  Full-table scan without flooding?
Exploiting the P2P nature
  No central instance, no global knowledge
  Parallel processing
  Problems with availability and failures

Outline of Our Approach
Use Content-Addressable Networks (CAN)
Locality-aware hash function
  Preserving neighborhood of similar tuples
  Space-filling curve
API extension
  Multicast
  Temporary re-hashing
Distributed query plan operators (POP)
  Selection, join, grouping/aggregation
  POP distribution scheme

Content-Addressable Networks
Proposed by S. Ratnasamy (2001)
Keys: d-dimensional points
Key space is a torus in d dimensions
Example: d = 2

Zones and Neighbors in CAN
Each peer is responsible for one zone, i.e., stores all (key, value) pairs of the zone
Each peer knows the neighbors of its zone
Random assignment of peers to zones at startup
Overloading of zones, multiple realities, ...

Greedy Forward Routing in CAN
get(k):
  1. Forward the request to the neighbor whose zone is closest to k
  2. Repeat until the peer responsible for k is reached
[Figure: get(k) routed to the peer storing (k, v)]
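The two routing steps can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on simplifying assumptions that are not in the slides: the key space is the 2-dimensional unit torus, it is cut into a hypothetical N x N grid of equally sized zones, and "closest" is measured from a zone's centre. The names owner, neighbors, torus_distance and route are illustrative only.

```python
import math

N = 4  # assumption: N x N grid of equally sized zones on the unit torus


def owner(point):
    """Return the zone (grid cell) that contains a point of the unit torus."""
    x, y = point
    return (int(x * N) % N, int(y * N) % N)


def neighbors(zone):
    """Zones sharing a border with `zone`, including torus wrap-around."""
    i, j = zone
    return [((i + 1) % N, j), ((i - 1) % N, j), (i, (j + 1) % N), (i, (j - 1) % N)]


def torus_distance(zone, point):
    """Distance from the zone's centre to the point, wrapping on each axis."""
    cx, cy = (zone[0] + 0.5) / N, (zone[1] + 0.5) / N
    dx = min(abs(cx - point[0]), 1 - abs(cx - point[0]))
    dy = min(abs(cy - point[1]), 1 - abs(cy - point[1]))
    return math.hypot(dx, dy)


def route(start, key_point, max_hops=2 * N):
    """Greedy forwarding: hand the request to the closest neighbor until the owner is reached."""
    path = [start]
    while path[-1] != owner(key_point):
        if len(path) > max_hops:  # guard against ties on zone borders
            raise RuntimeError("routing did not converge")
        nxt = min(neighbors(path[-1]), key=lambda z: torus_distance(z, key_point))
        path.append(nxt)
    return path


print(route((0, 0), (0.6, 0.3)))
```

Each peer only inspects its own neighbor list, which mirrors the "no global knowledge" property; a real CAN has zones of unequal size, so the grid here is purely for illustration.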

Managing Relational Data: Simple Approach
Relation r ∈ R, tuple t ∈ r, t = {a_k, a_1, ..., a_n}
Key k' = h(a_k)
Problems:
  1. Tuples are irregularly disseminated over the key space, i.e., only exact-match queries are supported
  2. No search for attributes other than the primary key
Open questions: σ_{5<a_k<10}(r)? σ_{a_b=20}(r)?
[Figure: tuples scattered over the key space]

Fragmentation Scheme
Reverse bit interleaving (z-curve)
Tuple t ∈ r, t = {a_k, a_1, ..., a_n}
Two hash functions: key k' = h_r(r) ∘ h_k(a_k), i.e., (RelationID, key value)
[Figure: bit strings 00010100 and 00010010 (RelationID, key) produced by h_r and h_k and mapped to Dimension #1 and Dimension #2, e.g., point (1,2)]
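A small Python sketch may help to see how a one-dimensional curve position can be turned into a two-dimensional CAN point. The slides do not give bit widths or say which bit goes to which dimension, so the choices below (4 key bits, even-numbered bits to the first dimension, odd-numbered bits to the second) are assumptions, and concat_key, deinterleave and interleave are hypothetical helper names.

```python
def concat_key(relation_id, key_value, key_bits=4):
    """Concatenate relation bits (high) and key bits (low) into one curve position.

    Assumption: the key hash is already an integer that fits into `key_bits` bits.
    """
    return (relation_id << key_bits) | (key_value & ((1 << key_bits) - 1))


def deinterleave(z, bits=8):
    """Split curve position z into (x, y): even-numbered bits -> x, odd-numbered bits -> y."""
    x = y = 0
    for i in range(bits // 2):
        x |= ((z >> (2 * i)) & 1) << i
        y |= ((z >> (2 * i + 1)) & 1) << i
    return x, y


def interleave(x, y, bits=8):
    """Inverse of deinterleave: merge the bits of x and y back into a curve position."""
    z = 0
    for i in range(bits // 2):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


# Consecutive curve positions land on neighboring points of the grid.
print([deinterleave(z) for z in range(4)])
```

Because consecutive positions stay close together in the plane, tuples with similar key values tend to end up in the same or adjacent zones, which is the locality property the scheme relies on.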

Two Hash Functions
Key k' = h_r(r) ∘ h_k(a_k)
h_r(r): the RelationID determines the placement of the space-filling curve
h_k(a_k): the primary key determines the position on the curve (locality-awareness)
[Figure: curves for relations r_a, r_b, r_c with positions a_k = 0, 1, 2, 3, 4, ...]

Additional API Primitives
Standard operations: put(k, v), v = get(k)
Only two additional operations are needed for our query algebra: put_temp(), multicast()
put_temp(k, v, t)
  Re-hashing of a given relation
  Temporary put operation
  Allows indexed access to attributes other than the primary key

Additional API Primitives (Cont.)
multicast(z_min, z_max, POP)
  Sends a message to a group of peers
  Peers are identified by an interval of the z-curve
Example: σ_{3<a_k<6}(r)
  multicast(3, 6, POP)
  [Figure: send(σ_{a_k=3}), send(σ_{4<a_k<6})]
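The idea of addressing peers by an interval of the z-curve can be sketched in a few lines of Python. This is an illustration under assumptions that do not come from the slides: each peer owns one contiguous half-open interval of curve positions, the hypothetical PEERS table stands in for neighbor-to-neighbor forwarding (a real DHT has no such global table), and the operator is modelled as a plain callable.

```python
# Assumption: each peer owns a half-open interval [lo, hi) of curve positions.
PEERS = {"P0": (0, 4), "P1": (4, 8), "P2": (8, 12), "P3": (12, 16)}


def multicast(z_min, z_max, pop):
    """Run `pop` on every peer whose interval intersects the closed range [z_min, z_max].

    `pop` receives the peer name and the part of the range that falls into that
    peer's interval, and returns that peer's partial result.
    """
    results = {}
    for peer, (lo, hi) in PEERS.items():
        if lo <= z_max and z_min < hi:  # the two ranges overlap
            results[peer] = pop(peer, max(lo, z_min), min(hi - 1, z_max))
    return results


# Each addressed peer reports which sub-range it has to scan.
print(multicast(3, 6, lambda peer, lo, hi: (lo, hi)))
```

With this layout the call reaches two peers, each of which evaluates the operator only on its own part of the interval, while all other peers stay idle.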

Query Plan Operators (POP)
Hash-based implementation for selection, join, grouping, aggregation
Distributed query processing
Operator trees
[Figure: operator tree over relations R, S, T]

Selection
Selection POP
On the primary key:
  Example: σ_{3<a_k<6}(r)
  Determine the interval on the z-curve
  Send the selection operator via multicast
On other attributes:
  Example: σ_{3<a_5<6}(r)
  Perform a full-table scan, e.g., multicast(min(a_5), max(a_5), POP)

Join
Nested Loop Join POP, Symmetric Hash Join POP
On the primary key:
  Perform the join immediately
On other attributes:
  Re-hash the relation using put_temp first
  Perform the join as above

Example: Symmetric Hash Join
shjoin(R, S)
  put_temp(h(t_R), t_R, x)
  put_temp(h(t_S), t_S, x)
[Figure: fragments R_1, R_2 and S_1, S_2 are re-hashed into RS_1 and RS_2, which together form RS]
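To make the operator concrete, here is a minimal Python sketch of the textbook symmetric hash join as a single peer might run it once put_temp has brought tuples with equal join values together. It is an assumption-laden illustration, not the authors' POP: the function name symmetric_hash_join, the ("R" / "S", tuple) stream format and the positional join attributes are all hypothetical.

```python
from collections import defaultdict


def symmetric_hash_join(stream, key_r, key_s):
    """Join tuples of R and S as they arrive, without waiting for either input to finish.

    `stream` yields ("R", tuple) or ("S", tuple) in arrival order; `key_r` and
    `key_s` are the positions of the join attribute in R and S tuples.
    """
    table_r, table_s = defaultdict(list), defaultdict(list)
    out = []
    for side, tup in stream:
        if side == "R":
            k = tup[key_r]
            table_r[k].append(tup)  # remember the tuple for later S arrivals
            out.extend((tup, s) for s in table_s.get(k, ()))  # probe S seen so far
        else:
            k = tup[key_s]
            table_s[k].append(tup)  # remember the tuple for later R arrivals
            out.extend((r, tup) for r in table_r.get(k, ()))  # probe R seen so far
    return out


arrivals = [("R", (1, "a")), ("S", (1, "x")), ("S", (2, "y")), ("R", (2, "b")), ("R", (1, "c"))]
print(symmetric_hash_join(arrivals, 0, 0))
```

Every arriving tuple is inserted into its own table and probed against the other one, so results are produced incrementally regardless of which relation delivers tuples first.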

Sorting/Aggregation
Central grouping POP:
  One peer iterates over the z-curve and performs central sorting/aggregation
Hash group POP:
  Re-distribute the relation using a hash function on the attribute to be sorted/aggregated
  "Aggregation peers" are responsible for sorting/aggregation of incoming attribute values
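The hash group idea can be sketched as follows in Python. The sketch assumes a SUM aggregate, a fixed number of aggregation peers simulated as in-memory dictionaries, and Python's built-in hash as the re-distribution function; none of these details are given in the slides, and hash_group_sum is a hypothetical name.

```python
from collections import defaultdict


def hash_group_sum(tuples, group_attr, value_attr, num_peers=4):
    """Group by `group_attr` and sum `value_attr`, one partial result per aggregation peer."""
    # Each simulated aggregation peer keeps the running sums of the groups hashed to it.
    peers = [defaultdict(int) for _ in range(num_peers)]
    for t in tuples:
        group = t[group_attr]
        peers[hash(group) % num_peers][group] += t[value_attr]
    # A group lives on exactly one peer, so collecting the results needs no further merging.
    merged = {}
    for partial in peers:
        merged.update(partial)
    return merged


print(hash_group_sum([("a", 1), ("b", 2), ("a", 3)], group_attr=0, value_attr=1))
```

Since all tuples of one group are sent to the same peer, each peer can finish its groups independently, in contrast to the central grouping POP where a single peer does all the work.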

Query Evaluation
Input:
  Left-handed POP trees
Design principles:
  Stateless evaluation
  Blocking operations: delivery of intermediate data (early aggregation)
[Figure: POP tree over relations R, S, T]

Query Evaluation: Example
[Figure: example query evaluation across peers P_0 to P_5 over relations r_1 and r_2, with fragments r_2a and r_2b]

Conclusion
Current state:
  Prototype is fully implemented
  Execution of plans like (shjoin a1=a2 (scan a3>42 REL1) (scan REL2))
  First experiments in a small CAN (100 peers) are promising

Conclusion (cont.)
Future topics:
  Experiments with large data sets and many nodes (100,000 nodes, 10 million queries, test data from the TPC-H benchmark)
  Optimization of the different POP implementations
  Efficient range queries
  Dynamic query operations