1
Beehive: Achieving O(1) Lookup Performance in P2P Overlays for Zipf-like Query Distributions
Venugopalan Ramasubramanian (Rama) and Emin Gün Sirer
Cornell University
2
introduction
- caching is widely used to improve latency and to decrease overhead
- passive caching: caches distributed throughout the network store objects that are encountered
- not well-suited for a large class of applications
3
problems with passive caching
- no performance guarantees
  - heavy-tail effect: large percentage of queries to unpopular objects
- ad-hoc heuristics for cache management
- introduces coherency problems
  - difficult to locate all copies
  - weak consistency model
4
overview of beehive
- general replication framework for structured DHTs
  - decentralization, self-organization, resilience properties
- high performance: O(1) average lookup time
- scalable: minimize number of replicas and reduce storage, bandwidth, and network load
- adaptive: promptly respond to changes in popularity (flash crowds)
5
prefix-matching DHTs
(figure: a lookup for object 0121 is routed through nodes 2012 → 0021 → 0112 → 0122, matching one more prefix digit per hop)
- log_b N hops
- several RTTs on the Internet
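To make the hop count concrete, here is a minimal sketch (not Pastry's implementation; the node and object ids are just the ones from the figure) of prefix-digit matching:

```python
def shared_prefix_digits(node_id: str, object_id: str) -> int:
    """Number of leading digits a node id has in common with an object id."""
    n = 0
    for a, b in zip(node_id, object_id):
        if a != b:
            break
        n += 1
    return n

def hops_remaining(node_id: str, object_id: str) -> int:
    """Each hop matches at least one more digit, so this bounds the remaining hops."""
    return len(object_id) - shared_prefix_digits(node_id, object_id)

# the route 2012 -> 0021 -> 0112 -> 0122 for object 0121 matches 0, 1, 2, 3 prefix digits
print([shared_prefix_digits(n, "0121") for n in ["2012", "0021", "0112", "0122"]])  # [0, 1, 2, 3]
print(hops_remaining("2012", "0121"))   # 4 hops in the worst case for 4-digit ids
```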
6
key intuition
- tunable latency: adjust the number of objects replicated at each level
- fundamental space-time tradeoff
(figure: the prefix-routing example with nodes 2012, 0021, 0112, 0122)
7
analytical model
- optimization problem
  - minimize: total number of replicas, s.t. average lookup performance ≤ C
- configurable target lookup performance
  - continuous range, sub one-hop
- minimizing number of replicas decreases storage and bandwidth overhead
8
analytical model
- zipf-like query distributions with parameter α
  - number of queries to the r-th most popular object ∝ 1/r^α
  - fraction of queries for the m most popular objects ≈ (m^{1-α} - 1) / (M^{1-α} - 1)
- level of replication
  - an object replicated at level i is stored on the N/b^i nodes that share i prefix-digits with it
  - i-hop lookup latency
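As a quick illustration of the query model (a small sketch; 0.91 is the MIT DNS trace parameter quoted later in the talk):

```python
def zipf_top_fraction(m: int, M: int, alpha: float) -> float:
    """Fraction of queries aimed at the m most popular of M objects (alpha != 1)."""
    return (m ** (1 - alpha) - 1) / (M ** (1 - alpha) - 1)

# with alpha = 0.91, the top 0.1% of a million objects draws roughly a third of all queries
print(round(zipf_top_fraction(1_000, 1_000_000, 0.91), 2))   # ~0.35
```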
9
optimization problem
minimize (storage/bandwidth):
  x_0 + x_1/b + x_2/b^2 + … + x_{K-1}/b^{K-1}
such that (average lookup time is at most C hops):
  K - (x_0^{1-α} + x_1^{1-α} + x_2^{1-α} + … + x_{K-1}^{1-α}) ≤ C
and x_0 ≤ x_1 ≤ x_2 ≤ … ≤ x_{K-1} ≤ 1
b: base   K = log_b(N)   x_i: fraction of objects replicated at level i or lower
10
optimal closed-form solution
x*_i = [ d^i (K' - C) / (1 + d + … + d^{K'-1}) ]^{1/(1-α)},  0 ≤ i ≤ K'-1
where d = b^{(1-α)/α}
x*_i = 1,  K' ≤ i ≤ K
K' is determined by setting x*_{K'-1} ≤ 1 (typically 2 or 3):
  d^{K'-1} (K' - C) / (1 + d + … + d^{K'-1}) ≤ 1
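A minimal sketch of evaluating this closed form (a direct transcription of the formula above, assuming 0 < α < 1; K' is passed in explicitly rather than derived):

```python
def replication_fractions(b: int, C: float, alpha: float, K: int, K_prime: int) -> list[float]:
    """x*_0 .. x*_{K-1}: levels below K' follow the closed form, levels >= K' are fully replicated."""
    d = b ** ((1 - alpha) / alpha)
    denom = sum(d ** j for j in range(K_prime))              # 1 + d + ... + d^(K'-1)
    x = [(d ** i * (K_prime - C) / denom) ** (1 / (1 - alpha))
         for i in range(K_prime)]
    return x + [1.0] * (K - K_prime)

# e.g. b = 32, C = 1 hop, alpha = 0.9, K = 3 levels, K' = 2
print(replication_fractions(32, 1.0, 0.9, 3, 2))   # ~[0.00012, 0.0056, 1.0]
```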
11
latency - overhead trade off
12
beehive: system overview
- estimation
  - popularity of objects, zipf parameter
  - local measurement, limited aggregation
- replication
  - apply analytical model independently at each node
  - push new replicas to nodes at most one hop away
13
beehive replication protocol
(figure: replication of object 0121 outward from its home node: level 3 on nodes with prefix 012*, level 2 on nodes with prefix 01*, level 1 on nodes with prefix 0*)
14
mutable objects
- leverage the underlying structure of the DHT
  - replication level indicates the locations of all the replicas
- proactive propagation to all nodes from the home node
  - home node sends to one-hop neighbors with i matching prefix-digits
  - level i nodes send to level i+1 nodes
15
implementation and evaluation
- implemented using Pastry as the underlying DHT
- evaluation using a real DNS workload
  - MIT DNS trace (zipf parameter 0.91)
  - 1024 nodes, 40960 objects
  - compared with passive caching on pastry
- main properties evaluated
  - lookup performance
  - storage and bandwidth overhead
  - adaptation to changes in query distribution
16
evaluation: lookup performance
- passive caching is not very effective because of the heavy-tail query distribution and mutable objects
- beehive converges to the target of 1 hop
17
evaluation: overhead
(figures: bandwidth and storage)
average number of replicas per node: Pastry 40, PC-Pastry 420, Beehive 380
18
evaluation: flash crowds lookup performance
19
evaluation: zipf parameter change
20
Cooperative Domain Name System (CoDoNS)
- replacement for legacy DNS
- secure authentication through DNSSEC
- incremental deployment path
  - completely transparent to clients
  - uses legacy DNS to populate resource records on demand
- deployed on planet-lab
21
advantages of CoDoNS
- higher performance than legacy DNS
  - median latency of 7 ms for CoDoNS (planet-lab), 39 ms for legacy DNS
- resilience against denial of service attacks
- self configuration after host and network failures
- fast update propagation
22
conclusions
- model-driven proactive caching
  - O(1) lookup performance with optimal replicas
- beehive: a general replication framework
  - structured overlays with uniform fan-out
  - high performance, resilience, improved availability
- well-suited for latency sensitive applications
www.cs.cornell.edu/people/egs/beehive
23
evaluation: zipf parameter change
24
evaluation: instantaneous bandwidth overhead
25
lookup performance: target 0.5 hops
26
lookup performance: planet-lab
27
typical values of zipf parameter
MIT DNS trace: α = 0.91
Web traces (DEC, UPisa, FuNet, UCB, Quest, NLANR): α between 0.83 and 0.90
28
comparative overview of structured DHTs
DHT                                                                       | lookup performance
CAN                                                                       | O(d N^{1/d})
Chord, Kademlia, Pastry, Tapestry, Viceroy                                | O(log N)
de Bruijn graphs (Koorde)                                                 | O(log N / log log N)
Kelips, Salad, [Gupta, Liskov, Rodriguez], [Mizrak, Cheng, Kumar, Savage] | O(1)
29
O(1) structured DHTs
DHT                            | lookup performance | routing state
Salad                          | d                  | O(d N^{1/d})
[Mizrak, Cheng, Kumar, Savage] | 2                  | √N
Kelips                         | 1                  | √N (√N replication)
[Gupta, Liskov, Rodriguez]     | 1                  | N
30
security issues in beehive
- underlying DHT
  - corruption in routing tables [Castro, Druschel, Ganesh, Rowstron, Wallach]
- beehive
  - misrepresentation of popularity: remove outliers
- application
  - corruption of data: certificates (e.g. DNSSEC)
31
Beehive DNS: lookup performance
                | CoDoNS  | Legacy DNS
median          | 6.56 ms | 38.8 ms
90th percentile | 281 ms  | 337 ms
32
introduction
- distributed peer-to-peer overlay networks
  - decentralized, self-organized
- distributed hash tables (DHTs)
  - store/lookup interface
- unstructured DHTs
  - Freenet, Gnutella, Kazaa
  - bad lookup performance: accuracy and latency
33
example
b = 32, C = 1, α = 0.9, N = 10,000, M = 1,000,000
x*_0 = 0.001102 → 1102 objects
x*_1 = 0.0519 → 51900 objects
x*_2 = 1
total storage ≈ 3700 objects per node
total storage for Kelips = M/√N = 10,000 objects per node
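The arithmetic behind the storage figures above (using the per-node objective M · Σ x_i/b^i from the optimization and the √N replication used by Kelips):

```python
b, N, M = 32, 10_000, 1_000_000
x = [0.001102, 0.0519, 1.0]                   # x*_0, x*_1, x*_2 from the slide

per_node = M * sum(xi / b ** i for i, xi in enumerate(x))
print(round(per_node))                        # ~3700 objects per node

print(M / N ** 0.5)                           # Kelips-style: 10000.0 objects per node
```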
34
structured overlays
- distributed peer-to-peer overlays
  - decentralized, self-organized, resilient
- structured DHTs
  - object storage and retrieval
  - bounded average, worst-case latency
- latency sensitive applications
  - domain name service (DNS) and web access need sub one hop latency
35
analytical model
- configurable target lookup performance
  - continuous range; even better with proximity routing
- minimizing number of replicas provides storage as well as bandwidth efficiency
- K' is an upper bound on the lookup performance of a successful query
- assumptions
  - homogeneous object sizes
  - infrequent updates
36
beehive replication protocol
- periodic packets to nodes in the routing table
  - asynchronous and independent
- exploit structure of underlying DHT
  - replication packet sent by node A to each node B in level i of its routing table
  - node B pushes new replicas to A and tells A which replicas to remove
- fluctuations in estimated popularity (see the sketch below)
  - aging to prevent sudden changes
  - hysteresis to limit thrashing
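One plausible way to realize the aging and hysteresis mentioned above (an assumption for illustration, not the protocol's prescribed scheme): blend each round's locally measured count into a running estimate and only re-replicate when the estimate moves past a threshold.

```python
AGING = 0.5        # weight given to the older estimate, to damp sudden changes
HYSTERESIS = 0.1   # relative change required before adjusting replication (limits thrashing)

def update_popularity(old_estimate: float, measured_count: float) -> float:
    """Exponentially weighted blend of the previous estimate and this round's count."""
    return AGING * old_estimate + (1 - AGING) * measured_count

def should_adjust_replication(old_estimate: float, new_estimate: float) -> bool:
    """Only react to popularity changes larger than the hysteresis band."""
    if old_estimate == 0:
        return new_estimate > 0
    return abs(new_estimate - old_estimate) / old_estimate > HYSTERESIS

est = update_popularity(100.0, 160.0)               # -> 130.0
print(est, should_adjust_replication(100.0, est))   # 130.0 True
```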
37
evaluation: DNS application
- DNS survey
  - queried 594059 unique domain names
  - TTL distribution: 95% < 1 day
  - rate of change of entries: 0.13% per day
- MIT DNS trace: 4–11 December 2000
  - 4 million queries for 300,000 distinct names
  - zipf parameter: 0.91
- setup
  - simulation mode on a single node
  - 1024 nodes, 40960 distinct objects
  - 7 queries per sec from MIT trace
  - 0.8% per day rate of change
38
introduction: lookup latency and storage
                | CAN                | Chord      | Pastry             | Kelips
latency         | O(d N^{1/d})       | O(log_2 N) | O(log_b N)         | O(1)
1,000,000 nodes | 39.8 hops (d = 10) | 19.93 hops | 4.98 hops (b = 16) | ~1 hop
86.4 ms/hop     | 3.4 sec            | 1.7 sec    | 0.43 sec           | 0.0864 sec
storage         | O(1/N)             | O(1/N)     | O(1/N)             | O(1/√N)
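A quick check of the hop and latency numbers in the table (assuming 86.4 ms per hop and N = 1,000,000 nodes, as stated above):

```python
import math

N, ms_per_hop = 1_000_000, 86.4
hops = {
    "CAN (d = 10)":    10 * N ** (1 / 10),   # d * N^(1/d)
    "Chord":           math.log2(N),
    "Pastry (b = 16)": math.log(N, 16),
    "Kelips":          1.0,
}
for name, h in hops.items():
    print(f"{name}: {h:.2f} hops, {h * ms_per_hop / 1000:.2f} s")
# CAN ~39.8 hops / 3.4 s, Chord ~19.9 / 1.7 s, Pastry ~5.0 / 0.43 s, Kelips 1 / 0.09 s
```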
39
improving lookup latency
- passive caching of lookup results
  - not effective for heavy-tail query distributions
  - no guaranteed performance
  - updates invalidate cached results
- O(1) lookup performance
  - trade off storage and bandwidth for performance
  - Kelips: O(√N) replicas per object
  - [GLR2003]: complete routing table
40
differential replication
(figure: object id 37420; level-0 replicas on all nodes (*****), level-1 replicas on nodes matching 3****, level-2 replicas on nodes matching 37***)
41
optimal storage: C = 1 hop
42
summary and useful properties
- constant average lookup latency
  - the constant is configurable
  - popular objects have shorter lookup times
  - upper bounded by K' (2 for α = 0.8)
- optimal overhead
  - the storage and bandwidth requirements can be estimated
  - overhead decreases with increasing α
- high availability for popular objects
  - mitigates flash crowd effect
- proactive replication supports mutable objects
- more benefits can be derived by using proximity optimizations
43
a peer-to-peer DNS
why p2p?
- iterative queries
- name-server mis-configurations
  - lots of failures and increased traffic
  - less availability
- chain of NS records
- update problem (Akamai)
why BeeHive?
- constant lookup time, upper bound given by K' (~2 or 3 hops)
- comparable or better performance
- better availability due to replication
- support for mutability
44
DNS over beehive: distributed cooperative active cache
- deploy incrementally
- non-uniform rate of change
  - scale popularity metric proportionately
- lookup failure
  - negative caching
- reverse iterative resolution (see the sketch below)
  - lookup x.y.com, then y.com, then com…
  - fetches NS records
  - locality
- inverse queries
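A hypothetical helper (the function name is ours, for illustration only) showing the query order implied by reverse iterative resolution:

```python
def reverse_iterative_order(name: str) -> list[str]:
    """Most specific name first, then successively shorter suffixes."""
    labels = name.rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]

print(reverse_iterative_order("x.y.com"))   # ['x.y.com', 'y.com', 'com']
```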
45
DNS over BeeHive: security
- DNSSEC
  - public key cryptography, signature chains
  - namespace management
  - big sizes of key and sig records
- cache chain of key records for authentication
  - popularity(y.com) > popularity(x.y.com)
  - avoid duplicate key records
- authenticated denial
  - reverse iterative resolution
46
other potential applications
- translation services
  - semantic free lookup
  - P6P
- p2p file sharing
  - text based search
  - anti-social applications?
- web
  - widely varying object sizes
  - dynamic content
47
conclusions
- BeeHive: p2p system based on differential replication
- goals
  - efficient: constant lookup time with minimal overhead
  - robust: self-organization and resilience against the vagaries of the network
  - secure: resilience against malicious elements
- CoDoNS: Cooperative DNS on BeeHive
48
selective bibliography
traces and zipf distributions
- Web Caching and Zipf-like Distributions: Evidence and Implications. Breslau, Cao, Fan, Phillips, and Shenker [infocom '99]
- Popularity of Gnutella Queries and Their Implications on Scalability. Sripanidkulchai [2002]
caching and replication
- Replication Strategies in Unstructured Peer-to-Peer Networks. Cohen and Shenker [sigcomm '02]
- CUP: Controlled Update Propagation in Peer-to-Peer Networks. Roussopoulos and Baker [usenix '03]
DNS
- Development of the Domain Name System. Mockapetris [sigcomm '88]
- DNS Performance and the Effectiveness of Caching. Jung, Sit, Balakrishnan, and Morris [sigmetrics '01]
- Serving DNS using a Peer-to-Peer Lookup Service. Cox, Muthitacharoen, and Morris [iptps '02]
49
notations
b: base of the DHT system
N: number of nodes (b^K)
M: number of objects
α: alpha of the zipf-like query distribution
x_j: fraction of objects replicated at level j or lower
  (x_2 = fraction of objects replicated at levels 0, 1, and 2)
0 ≤ x_0 ≤ x_1 ≤ x_2 ≤ … ≤ x_{K-1} ≤ x_K = 1
50
storage and bandwidth per node
storage required at level j = M (x_j - x_{j-1}) / b^j
total per-node storage
  = M x_0 + M (x_1 - x_0)/b + M (x_2 - x_1)/b^2 + … + M (x_K - x_{K-1})/b^K
  = M [ (1 - 1/b)(x_0 + x_1/b + x_2/b^2 + … + x_{K-1}/b^{K-1}) + 1/b^K ]
total bandwidth = M b^K [ (1 - 1/b)(x_0 + x_1/b + x_2/b^2 + … + x_{K-1}/b^{K-1}) ]
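The same two expressions transcribed into code (a small sketch; x holds x_0 … x_{K-1} and x_K = 1 is implicit):

```python
def per_node_storage(M: int, b: int, x: list[float]) -> float:
    """M * [(1 - 1/b) * sum_j x_j / b^j + 1/b^K]"""
    K = len(x)
    return M * ((1 - 1 / b) * sum(xj / b ** j for j, xj in enumerate(x)) + 1 / b ** K)

def total_bandwidth(M: int, b: int, x: list[float]) -> float:
    """M * b^K * (1 - 1/b) * sum_j x_j / b^j"""
    K = len(x)
    return M * b ** K * (1 - 1 / b) * sum(xj / b ** j for j, xj in enumerate(x))

# with the example fractions from the earlier slide (b = 32, M = 1,000,000):
print(round(per_node_storage(1_000_000, 32, [0.001102, 0.0519, 1.0])))   # ~3615
```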
51
lookup latency
fraction of queries for the M x_j most popular objects ≈ ((M x_j)^{1-α} - 1) / (M^{1-α} - 1) ≈ x_j^{1-α}
average lookup time at level j ≈ j (x_j^{1-α} - x_{j-1}^{1-α})
average lookup time ≈ (x_1^{1-α} - x_0^{1-α}) + 2 (x_2^{1-α} - x_1^{1-α}) + … + K (x_K^{1-α} - x_{K-1}^{1-α})
  = K - (x_0^{1-α} + x_1^{1-α} + x_2^{1-α} + … + x_{K-1}^{1-α})
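The final expression as a one-liner (under the x^{1-α} approximation above):

```python
def average_lookup_hops(x: list[float], alpha: float) -> float:
    """K - sum_j x_j^(1-alpha) for j = 0 .. K-1 (x_K = 1 is implicit)."""
    return len(x) - sum(xj ** (1 - alpha) for xj in x)

print(average_lookup_hops([0.000118, 0.005575, 1.0], 0.9))   # ~1.0 hop (the C = 1 target)
```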
52
optimization problem
minimize
  x_0 + x_1/b + x_2/b^2 + … + x_{K-1}/b^{K-1}
such that
  (x_0^{1-α} + x_1^{1-α} + x_2^{1-α} + … + x_{K-1}^{1-α}) ≥ K - C
and x_0 ≤ x_1 ≤ x_2 ≤ … ≤ x_{K-1}
and x_{K-1} ≤ 1
53
solution
minimize
  x_0 + x_1/b + x_2/b^2 + … + x_{K'-1}/b^{K'-1}
such that
  (x_0^{1-α} + x_1^{1-α} + x_2^{1-α} + … + x_{K'-1}^{1-α}) ≥ K' - C
solution using lagrange multiplier technique:
  x*_j = [ d^j (K' - C) / (1 + d + … + d^{K'-1}) ]^{1/(1-α)},  0 ≤ j ≤ K'-1,  where d = b^{(1-α)/α}
  x*_j = 1,  K' ≤ j ≤ K
K' is determined by setting x*_{K'-1} ≤ 1:
  d^{K'-1} (K' - C) / (1 + d + … + d^{K'-1}) ≤ 1
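A short sketch of the Lagrange step being referenced (treating the x_j as continuous, taking the constraint as tight, and ignoring the upper bounds x_j ≤ 1, which the choice of K' handles):

```latex
\begin{align*}
L &= \sum_{j=0}^{K'-1} \frac{x_j}{b^j}
     \;-\; \lambda \Big( \sum_{j=0}^{K'-1} x_j^{1-\alpha} - (K'-C) \Big) \\
\frac{\partial L}{\partial x_j} = 0
  &\;\Rightarrow\; \frac{1}{b^j} = \lambda (1-\alpha)\, x_j^{-\alpha}
   \;\Rightarrow\; x_j = \big[\lambda(1-\alpha)\big]^{1/\alpha} b^{j/\alpha} \\
\text{constraint:}\quad
  &\big[\lambda(1-\alpha)\big]^{(1-\alpha)/\alpha} \sum_{j=0}^{K'-1} d^j = K'-C,
   \qquad d = b^{(1-\alpha)/\alpha} \\
\Rightarrow\quad
  x_j^* &= \left[ \frac{d^j\,(K'-C)}{1 + d + \cdots + d^{K'-1}} \right]^{1/(1-\alpha)}
\end{align*}
```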
54
constant lookup time systems
- kelips and scuttlebutt
  - trade off bandwidth and storage for performance
  - one-hop lookup latency
  - resilience against transient nodes
  - O(√N) replication of all objects
  - expensive update propagation