1
ICDE 2004
A Peer-to-peer Framework for Caching Range Queries
Ozgur D. Sahin, Abhishek Gupta, Divyakant Agrawal, Amr El Abbadi
Department of Computer Science, University of California at Santa Barbara
2
Outline
Motivation
Range mapping
System overview
Experimental results
Conclusion and future work
3
Motivation
All queries are answered by the server, so the server becomes overloaded, which limits scalability and availability. The same or similar queries are evaluated multiple times.
(Figure: clients sending every query to the central data server.)
4
Motivation
Users share their cached answers through a P2P cache. The server is contacted only if the P2P layer cannot find an answer.
(Figure: clients, the P2P cache, and the central data server.)
5
P2P Systems
File sharing: Napster, Gnutella, KaZaA, …; these rely on a central index or on flooding.
Structured P2P systems: CAN, Chord, Pastry, Tapestry, …; these provide DHT/DOLR functionality with efficient routing (logarithmic or sublinear in the number of peers).
6
CAN
CAN uses a d-dimensional virtual space for routing and object location. The virtual space is partitioned into zones, and each zone is maintained by a peer. Every peer is responsible for the objects that are hashed into its zone.
(Figure: a 2-dimensional CAN.)
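A minimal sketch of how a 2-dimensional CAN assigns an object to a peer, assuming the virtual space is the unit square, zones are half-open rectangles, and keys are hashed with SHA-1. The names `Zone`, `hash_to_point`, and `owner_of` are hypothetical, and the hash choice is an illustration rather than the one CAN prescribes.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Zone:
    """A rectangular zone [x0, x1) x [y0, y1) of a 2-dimensional CAN, owned by one peer."""
    x0: float
    y0: float
    x1: float
    y1: float
    owner: str

    def contains(self, p: tuple[float, float]) -> bool:
        return self.x0 <= p[0] < self.x1 and self.y0 <= p[1] < self.y1


def hash_to_point(key: str) -> tuple[float, float]:
    """Hash an object key to a point in the unit square [0, 1) x [0, 1)."""
    digest = hashlib.sha1(key.encode()).digest()
    # Keep 53 bits per coordinate so the float division stays strictly below 1.0.
    x = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    y = (int.from_bytes(digest[8:16], "big") >> 11) / 2**53
    return (x, y)


def owner_of(key: str, zones: list[Zone]) -> str:
    """Return the peer whose zone contains the key's hashed point."""
    point = hash_to_point(key)
    for zone in zones:
        if zone.contains(point):
            return zone.owner
    raise LookupError(f"no zone covers {point}")


# Four peers, each owning one quadrant of the unit square.
zones = [
    Zone(0.0, 0.0, 0.5, 0.5, "A"),
    Zone(0.5, 0.0, 1.0, 0.5, "B"),
    Zone(0.0, 0.5, 0.5, 1.0, "C"),
    Zone(0.5, 0.5, 1.0, 1.0, "D"),
]
print(owner_of("song.mp3", zones))
```

Because the zones tile the whole space, every key lands in exactly one zone, so exactly one peer is responsible for it.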
7
Extending DHT functionality
DHTs are designed for exact-match queries. Related work: Piazza [Univ. of Washington], Hyperion [Univ. of Toronto], PIER [UC Berkeley].
We extend DHTs to support range queries, since selection of ranges is a primary operation for any kind of data analysis.
Main goal: utilize a DHT to materialize and locate cached answers of range queries.
8
Range Queries
Given a range query, find the cached answers that can be used to compute the query answer.
Example: if the result of a larger range is already cached in the system, then a query whose range is subsumed by that range can be answered using the cached result, because the cached result is a superset of the answer.
9
DHTs for locating ranges
Can we use the original DHTs? One option is to use the range string of a query as the key and hash that string to locate cached answers. This finds exact matches but not the similar ones!
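The limitation above can be seen in a few lines. This sketch assumes a hypothetical key format (the SHA-1 of the string `"<start,end>"`) and uses a Python dict as a stand-in for the DHT's put/get interface; `range_key`, `publish`, and `lookup` are illustrative names.

```python
import hashlib
from typing import Optional


def range_key(start: int, end: int) -> str:
    """Hypothetical key format: the SHA-1 hash of the range string "<start,end>"."""
    return hashlib.sha1(f"<{start},{end}>".encode()).hexdigest()


# A dict stands in for the DHT's put/get interface.
dht: dict[str, str] = {}


def publish(start: int, end: int, peer: str) -> None:
    dht[range_key(start, end)] = peer


def lookup(start: int, end: int) -> Optional[str]:
    return dht.get(range_key(start, end))


publish(30, 70, "P")
print(lookup(30, 70))  # P: the range strings match exactly
print(lookup(40, 60))  # None, although [40,60] lies inside [30,70]
```

Hashing destroys any relationship between neighboring or nested ranges, so a super-range is never found unless its string is identical.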
10
Extending CAN
For a single attribute, the virtual space is a 2-dimensional CAN. Its boundaries are determined by the domain of the range attribute.
(Figure: the virtual space when the attribute domain is [20,80]; the x and y axes both span 20 to 80.)
11
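Mapping Scheme
A range [x,y] is mapped to the point (x,y): the start value becomes the x coordinate and the end value the y coordinate. A range [c,d] contains [x,y] exactly when c ≤ x and d ≥ y, so super-ranges lie only in the upper-left region of the query point.
(Figure: the virtual space with corners (20,20), (80,20), (80,80) and (20,80), x axis = start value, y axis = end value; the points (40,60), (30,70) and (30,50) are marked.)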
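A small sketch of the mapping and of the upper-left test for super-ranges. `RangePoint`, `to_point`, and `is_super_range` are hypothetical names; the only rule taken from the slide is that a range maps to the point (start, end).

```python
from typing import NamedTuple


class RangePoint(NamedTuple):
    x: float  # start value of the range
    y: float  # end value of the range


def to_point(start: float, end: float) -> RangePoint:
    """Map the range [start, end] to the point (start, end) of the 2-dimensional CAN."""
    if start > end:
        raise ValueError("start must not exceed end")
    return RangePoint(start, end)


def is_super_range(candidate: RangePoint, query: RangePoint) -> bool:
    """candidate contains query iff it starts no later and ends no earlier,
    i.e. its point lies in the upper-left region of the query point (boundary included)."""
    return candidate.x <= query.x and candidate.y >= query.y


query = to_point(40, 60)
print(is_super_range(to_point(30, 70), query))  # True: upper-left of (40,60)
print(is_super_range(to_point(30, 50), query))  # False: (30,50) lies below (40,60)
```

Since start never exceeds end, all valid points lie on or above the diagonal x = y.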
12
Space Partitioning
The virtual space is partitioned into rectangular zones, and each zone is assigned to an active peer. With this mapping, the data source is responsible for the top-left zone: the top-left corner corresponds to the range covering the whole domain, which the data source can always answer.
13
Space Partitioning: Active/Passive peers
(Figure: passive peers, active peers, and the data source S.)
16
Space Partitioning: Active/Passive peers
Each active peer keeps a list of passive peers, and passive peers register with active peers.
17
Zone Split
An active peer splits its zone when it is overloaded; the load can be due to storage, bandwidth, etc. The owner of the zone selects the split line so that both the zone and its cached results are partitioned evenly. The new zone is assigned to a passive peer.
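The slide leaves the split-line choice to the zone owner. One possible policy, shown here as a hypothetical sketch and not as the authors' rule, cuts the longer side at the median coordinate of the cached points so each half receives roughly half of the cached results. `Zone` and `split` are illustrative names.

```python
from dataclasses import dataclass, field
from statistics import median


@dataclass
class Zone:
    x0: float
    y0: float
    x1: float
    y1: float
    points: list = field(default_factory=list)  # cached ranges, as (start, end) points


def split(zone: Zone) -> tuple[Zone, Zone]:
    """Split along the longer side at the median coordinate of the cached points,
    so each half receives roughly half of the cached results."""
    vertical = (zone.x1 - zone.x0) >= (zone.y1 - zone.y0)  # True: cut line is x = const
    axis = 0 if vertical else 1
    lo, hi = (zone.x0, zone.x1) if vertical else (zone.y0, zone.y1)
    coords = [p[axis] for p in zone.points]
    cut = median(coords) if coords else (lo + hi) / 2
    if not lo < cut < hi:
        cut = (lo + hi) / 2  # degenerate median: fall back to halving the zone
    if vertical:
        first = Zone(zone.x0, zone.y0, cut, zone.y1)
        second = Zone(cut, zone.y0, zone.x1, zone.y1)
    else:
        first = Zone(zone.x0, zone.y0, zone.x1, cut)
        second = Zone(zone.x0, cut, zone.x1, zone.y1)
    for p in zone.points:
        (first if p[axis] < cut else second).points.append(p)
    return first, second


zone = Zone(20, 20, 80, 80, [(30, 70), (40, 60), (50, 55), (60, 75)])
kept, handed_off = split(zone)  # handed_off would go to a passive peer
print(kept.x1, len(kept.points), len(handed_off.points))  # 45.0 2 2
```

A median cut balances stored results; a plain midpoint cut would balance area instead, which matters when cached ranges cluster.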
18
Routing
Routing is the same as in CAN (greedy routing): each zone passes the message to the neighbor closest to the destination.
(Figure: a query for the range [50,55] routed to the point (50,55).)
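A simplified sketch of greedy routing, assuming the zones form a uniform grid and "closest to the destination" is measured from each neighbor zone's center. Real CAN zones are irregular after splits; `make_grid` and `greedy_route` are hypothetical helpers.

```python
import math


def make_grid(lo, hi, n):
    """Hypothetical uniform n x n partition of [lo, hi]^2: {(column, row): (x0, y0, x1, y1)}."""
    step = (hi - lo) / n
    return {(i, j): (lo + i * step, lo + j * step, lo + (i + 1) * step, lo + (j + 1) * step)
            for i in range(n) for j in range(n)}


def contains(rect, p):
    x0, y0, x1, y1 = rect
    return x0 <= p[0] <= x1 and y0 <= p[1] <= y1


def center(rect):
    x0, y0, x1, y1 = rect
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def greedy_route(grid, start, dest):
    """Pass the message to the neighbor whose center is closest to dest until the
    current zone contains dest. Returns the list of zones visited."""
    current, path = start, [start]
    while not contains(grid[current], dest):
        i, j = current
        nbrs = [c for c in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)) if c in grid]
        current = min(nbrs, key=lambda c: math.dist(center(grid[c]), dest))
        path.append(current)
    return path


grid = make_grid(20, 80, 3)  # single attribute with domain [20, 80]
print(greedy_route(grid, (0, 2), (50, 55)))  # [(0, 2), (1, 2), (1, 1)]
```

The route starts at the top-left zone and reaches the zone holding (50,55) in two hops.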
19
Sharing cached answers
A peer that caches an answer maps its range to a point and sends a notification message towards that point. The destination peer keeps the index information.
(Figure: 1) peer P caches the answer for [50,55]; 2) P sends a notification towards (50,55); 3) peer A, which owns that zone, inserts the entry [50,55] : P into its local index.)
20
Querying
A peer maps the query range to a point and sends a query message towards that point. The destination peer searches its local index.
(Figure: 1) peer C requires [50,55] and sends a query towards (50,55); 2) A finds [50,55] : P in its local index and returns P; 3) the answer is transferred from P to C.)
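The local index used when sharing and querying can be sketched as a map from a range to the peers that cache its answer. This assumes routing has already delivered the message to the zone owner (peer A); `LocalIndex`, `insert`, and `lookup` are hypothetical names.

```python
from collections import defaultdict


class LocalIndex:
    """Index kept by the peer that owns a zone: cached range -> peers holding its answer."""

    def __init__(self):
        self._entries = defaultdict(set)

    def insert(self, start, end, peer):
        # Sharing, step 3: a notification for [start, end] from `peer` arrived here.
        self._entries[(start, end)].add(peer)

    def lookup(self, start, end):
        # Querying, step 2: return the peers that cache [start, end], if any.
        # dict.get does not trigger the defaultdict factory, so lookups add no entries.
        return sorted(self._entries.get((start, end), ()))


index_at_a = LocalIndex()
index_at_a.insert(50, 55, "P")    # P caches [50,55] and notifies the owner of (50,55)
print(index_at_a.lookup(50, 55))  # ['P']: C then fetches the answer from P
print(index_at_a.lookup(50, 56))  # []: no entry for this range
```

Only an index entry travels to A; the cached answer itself stays at P until C requests it.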
21
Forwarding
If no result is found at the destination, the zones in the upper-left region may still hold super-ranges, so the destination zone forwards the request to the upper-left zones.
(Figure: forwarding from the zone containing (50,55).)
22
Acceptable Fit
How far should a request be forwarded? Forwarding is controlled by a parameter, AcceptableFit, a real value in [0,1]: offset = AcceptableFit × |domain|. The acceptable region for a range query is then bounded by offset.
(Figure: the acceptable region for the query (50,55).)
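The exact shape of the acceptable region did not survive in this transcript. The sketch below assumes it is the part of the upper-left region within offset of the query point, that is, super-ranges [c, d] with start − offset ≤ c ≤ start and end ≤ d ≤ end + offset, clipped to the domain. `acceptable_region` and `is_acceptable` are hypothetical names.

```python
def acceptable_region(start, end, acceptable_fit, domain):
    """Assumed acceptable region for the query [start, end]:
    points (c, d) with start - offset <= c <= start and end <= d <= end + offset,
    clipped to the domain. Returns (x_min, x_max, y_min, y_max)."""
    if not 0.0 <= acceptable_fit <= 1.0:
        raise ValueError("AcceptableFit must lie in [0, 1]")
    lo, hi = domain
    offset = acceptable_fit * (hi - lo)  # offset = AcceptableFit x |domain|
    return (max(lo, start - offset), start, end, min(hi, end + offset))


def is_acceptable(c, d, region):
    """True if the cached range [c, d] falls inside the acceptable region."""
    x_min, x_max, y_min, y_max = region
    return x_min <= c <= x_max and y_min <= d <= y_max


region = acceptable_region(50, 55, 0.1, (0, 500))  # offset = 50
print(is_acceptable(30, 70, region))   # True
print(is_acceptable(0, 500, region))   # False: too large a super-range for this fit
```

AcceptableFit = 0 accepts only exact matches; AcceptableFit = 1 accepts any super-range in the domain.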
23
Forwarding Schemes
There are two schemes for forwarding:
Flooding: flood the request to all candidate zones.
Directed forwarding: iteratively forward the request to a single neighbor, the one with the largest overlap with the acceptable region. Stop when a result is found or when a certain number of peers (DirectedLimit) have been contacted.
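Directed forwarding can be sketched as a loop that always moves to the unvisited neighbor with the largest overlap with the acceptable region. The zone layout, the `has_result` predicate, and the reading of DirectedLimit as "at most that many forwards" are assumptions; `overlap_area` and `directed_forward` are hypothetical names.

```python
def overlap_area(a, b):
    """Area of the intersection of rectangles a and b, each (x0, y0, x1, y1); 0 if disjoint."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0) * max(height, 0)


def directed_forward(start, zones, neighbors, has_result, region, directed_limit):
    """Forward from the destination zone to one neighbor at a time, always picking the
    unvisited neighbor with the largest overlap with the acceptable region.
    Stops when a zone has a result or after directed_limit forwards.
    Returns (zone holding a result or None, list of zones visited)."""
    current, path = start, [start]
    while not has_result(current) and len(path) - 1 < directed_limit:
        candidates = [n for n in neighbors[current]
                      if n not in path and overlap_area(zones[n], region) > 0]
        if not candidates:
            break
        current = max(candidates, key=lambda n: overlap_area(zones[n], region))
        path.append(current)
    return (current if has_result(current) else None), path


# Hypothetical 3 x 3 grid of zones over the domain [20, 80], keyed by (column, row).
zones = {(i, j): (20 + 20 * i, 20 + 20 * j, 40 + 20 * i, 40 + 20 * j)
         for i in range(3) for j in range(3)}
neighbors = {c: [(c[0] + di, c[1] + dj) for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                 if (c[0] + di, c[1] + dj) in zones]
             for c in zones}
region = (30, 55, 50, 75)  # acceptable region (x0, y0, x1, y1) around the query point (50, 55)


def has_result(zone):
    return zone == (0, 2)  # pretend only zone (0, 2) indexes a usable super-range


found, path = directed_forward((1, 1), zones, neighbors, has_result, region, directed_limit=2)
print(found, path)  # (0, 2) [(1, 1), (1, 2), (0, 2)]
```

Flooding would instead send the request to every zone overlapping the region at once; directed forwarding trades that fan-out for a bounded, sequential walk.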
24
Flooding vs. Directed Forwarding
(Figure: flooding compared with directed forwarding, DirectedLimit = 2.)
25
Updates
Suppose the tuple with value 40 is updated. Every cached range containing 40 may now be stale, and these ranges map to the upper-left region of the point (40,40). The update therefore goes to (40,40) and is flooded to the upper-left region. This is costly, so better solutions are needed, e.g., batching updates.
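The set of cached ranges touched by an update follows directly from the mapping. The batch variant is a hypothetical illustration of why batching helps (each affected range is touched once per batch), not the authors' batching scheme; `affected_ranges` and `affected_by_batch` are illustrative names.

```python
def affected_ranges(cached, value):
    """Cached ranges whose answers can change when a tuple with `value` is updated.
    A range (c, d) contains value iff c <= value <= d, i.e. its point lies in the
    upper-left region of (value, value), boundary included."""
    return [(c, d) for c, d in cached if c <= value <= d]


def affected_by_batch(cached, values):
    """Hypothetical batching: collect several updates, then touch each affected range once."""
    return [(c, d) for c, d in cached if any(c <= v <= d for v in values)]


cached = [(30, 70), (40, 60), (45, 55), (20, 40)]
print(affected_ranges(cached, 40))          # [(30, 70), (40, 60), (20, 40)]
print(affected_by_batch(cached, [40, 50]))  # all four ranges
```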
26
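Multiple range attributes
Each attribute maps to two dimensions, so a range query over k attributes is mapped to a point in a 2k-dimensional CAN. For example, (20<A<40, 50<B<60) maps to (20,40,50,60), and (10<A<50, 40<B<70) maps to (10,50,40,70). The second query subsumes the first: its start values are smaller and its end values are larger.
Forwarding: super-ranges are reached by decreasing coordinates along odd dimensions (start values) and increasing coordinates along even dimensions (end values).
(Figure: the example points above and the point ( -, -, -, - ).)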
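A short sketch of the 2k-dimensional mapping and the subsumption test it implies, using the two example queries from the slide. `to_point` and `subsumes` are hypothetical names; dimensions are numbered from 1 in the text but indexed from 0 in the code.

```python
def to_point(ranges):
    """Map a k-attribute range query [(start1, end1), ..., (startk, endk)] to the
    2k-dimensional point (start1, end1, ..., startk, endk)."""
    point = []
    for start, end in ranges:
        point.extend((start, end))
    return tuple(point)


def subsumes(candidate, query):
    """True if candidate's ranges contain query's on every attribute: coordinates in odd
    dimensions (1-based, start values) are no larger, in even dimensions (end values) no smaller."""
    if len(candidate) != len(query):
        raise ValueError("points must have the same dimensionality")
    return all(c <= q if i % 2 == 0 else c >= q  # i is 0-based here
               for i, (c, q) in enumerate(zip(candidate, query)))


q = to_point([(20, 40), (50, 60)])  # 20<A<40, 50<B<60
s = to_point([(10, 50), (40, 70)])  # 10<A<50, 40<B<70
print(q, s, subsumes(s, q), subsumes(q, s))  # (20, 40, 50, 60) (10, 50, 40, 70) True False
```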
27
Experiment Settings
Single attribute with domain [0,500]. The system is initially empty. Range queries are selected uniformly at random. For every zone: Split Point = 5, Routing Threshold = 3.
28
Flooding vs. Directed Forwarding
(Figures: performance with flood forwarding; performance with directed forwarding.)
29
Routing is scalable
(Figures: visited zones with flood forwarding; visited zones with directed forwarding.)
30
Load Distribution
(Figure: load distribution with 1000 peers and 10000 queries.)
31
Conclusion and Future Work
We presented a simple yet powerful mapping for ranges that allows us to leverage DHT infrastructure for range queries.
Limitations and future work: the number of attributes must be fixed; the scheme does not work with DHTs other than CAN; it assumes the existence of passive peers for load balancing.
32
Questions?
odsahin@cs.ucsb.edu
http://www.cs.ucsb.edu/~dsl/gaia.html