Beehive: Achieving O(1) Lookup Performance in P2P Overlays for Zipf-like Query Distributions Venugopalan Ramasubramanian (Rama) and Emin Gün Sirer Cornell University

introduction: caching is widely used to improve latency and to decrease overhead. passive caching: caches distributed throughout the network store objects that they happen to encounter; this is not well-suited for a large class of applications.

problems with passive caching: no performance guarantees, because the heavy-tail effect sends a large percentage of queries to unpopular objects; ad-hoc heuristics for cache management; and coherency problems, since it is difficult to locate all copies and the consistency model is weak.

overview of beehive: a general replication framework for structured DHTs that preserves their decentralization, self-organization, and resilience properties. high performance: O(1) average lookup time. scalable: minimizes the number of replicas to reduce storage, bandwidth, and network load. adaptive: promptly responds to changes in popularity, such as flash crowds.

prefix-matching DHTs: a lookup takes \( \log_b N \) hops to reach the object, and each hop can be several RTTs on the Internet. [figure: prefix routing toward an object with id 0122]
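To make the hop-count claim concrete, here is a minimal, self-contained toy in Python. It is an illustration of idealized prefix routing, not Pastry's actual routing code; the ids and the way the next node is generated are made up. Each hop fixes one more base-b digit of the key, so a lookup takes at most log_b N hops.

```python
import math

def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits the two ids have in common."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n

def route_hops(start_id: str, key: str) -> int:
    """Idealized prefix routing: every hop reaches some node whose id shares
    one more leading digit with the key, so at most len(key) hops are needed."""
    current, hops = start_id, 0
    while shared_prefix_len(current, key) < len(key):
        # pretend the routing table always holds a node extending the match by one digit
        next_prefix = key[:shared_prefix_len(current, key) + 1]
        current = next_prefix.ljust(len(key), "0")  # stand-in for a real node id
        hops += 1
    return hops

if __name__ == "__main__":
    b, N = 16, 1_000_000
    print("log_b N =", math.ceil(math.log(N, b)), "digits, so that many hops in the worst case")
    print("toy route takes", route_hops("a91f0", "3c7e2"), "hops")
```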

key intuition: latency is tunable by adjusting the number of objects replicated at each level; this is a fundamental space-time tradeoff.

analytical model: an optimization problem. minimize the total number of replicas subject to an average lookup performance of at most C hops, where the target C is configurable over a continuous range, including sub-one-hop values. minimizing the number of replicas also decreases storage and bandwidth overhead.

analytical model: zipf-like query distributions with parameter α. the number of queries to the r-th most popular object is proportional to \( 1/r^{\alpha} \), and the fraction of queries for the m most popular objects is \( \approx (m^{1-\alpha} - 1) / (M^{1-\alpha} - 1) \). level of replication: an object replicated at level i is stored on the \( N/b^i \) nodes that share i prefix digits with it, giving i-hop lookup latency.
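A small sketch of the Zipf relation the model relies on, using the slide's approximation for the fraction of queries captured by the m most popular of M objects (valid for α ≠ 1). The trace size and α in the demo follow the MIT DNS trace figures quoted later in the talk.

```python
def zipf_top_m_fraction(m: int, M: int, alpha: float) -> float:
    """Approximate fraction of queries that target the m most popular of M
    objects under a Zipf-like distribution with parameter alpha (alpha != 1)."""
    return (m ** (1 - alpha) - 1) / (M ** (1 - alpha) - 1)

if __name__ == "__main__":
    # with alpha = 0.91 and 300,000 distinct names (the MIT DNS trace figures),
    # popularity is skewed but a heavy tail of queries remains
    for m in (1_000, 10_000, 100_000):
        frac = zipf_top_m_fraction(m, 300_000, 0.91)
        print(f"top {m:>7} objects capture about {frac:.2f} of all queries")
```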

optimization problem: minimize (storage/bandwidth)
\[ x_0 + \frac{x_1}{b} + \frac{x_2}{b^2} + \dots + \frac{x_{K-1}}{b^{K-1}} \]
such that (average lookup time is at most C hops)
\[ K - \left( x_0^{1-\alpha} + x_1^{1-\alpha} + x_2^{1-\alpha} + \dots + x_{K-1}^{1-\alpha} \right) \le C \]
and \( x_0 \le x_1 \le x_2 \le \dots \le x_{K-1} \le 1 \), where b is the base, \( K = \log_b N \), and \( x_i \) is the fraction of objects replicated at level i.

optimal closed-form solution:
\[ x_i^* = \left[ \frac{d^i (K' - C)}{1 + d + \dots + d^{K'-1}} \right]^{\frac{1}{1-\alpha}}, \quad 0 \le i \le K' - 1, \qquad x_i^* = 1, \quad K' \le i \le K, \]
where \( d = b^{(1-\alpha)/\alpha} \). K' (typically 2 or 3) is determined by requiring \( x_{K'-1}^* \le 1 \), i.e. \( d^{K'-1} (K' - C) / (1 + d + \dots + d^{K'-1}) \le 1 \).
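A sketch, assuming the closed form above, of how the per-level replication fractions could be computed for given b, α, C, and N; K' is chosen as the largest value satisfying the x*_{K'-1} ≤ 1 condition. This is an illustration, not the Beehive implementation, and it will not necessarily reproduce the exact figures on the example slide later in the deck.

```python
import math

def beehive_fractions(b: int, N: int, alpha: float, C: float):
    """Closed-form replication fractions x*_i from the analytical model.
    Returns (K_prime, [x_0, ..., x_{K-1}]), with x_i = 1 for i >= K_prime."""
    K = max(1, math.ceil(math.log(N, b)))        # levels = number of prefix digits
    d = b ** ((1 - alpha) / alpha)
    # K' = largest number of "fractional" levels for which the closed form
    # still yields x_{K'-1} <= 1 (typically 2 or 3)
    K_prime = 1
    for k in range(1, K + 1):
        denom = sum(d ** j for j in range(k))    # 1 + d + ... + d^(k-1)
        if d ** (k - 1) * (k - C) / denom <= 1:
            K_prime = k
    denom = sum(d ** j for j in range(K_prime))
    xs = [min(1.0, (d ** i * (K_prime - C) / denom) ** (1 / (1 - alpha)))
          if i < K_prime else 1.0
          for i in range(K)]
    return K_prime, xs

if __name__ == "__main__":
    K_prime, xs = beehive_fractions(b=32, N=10_000, alpha=0.9, C=1.0)
    print("K' =", K_prime)
    for i, x in enumerate(xs):
        print(f"x*_{i} = {x:.6f}")
```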

latency vs. overhead trade-off [graph]

beehive: system overview. estimation: object popularity and the zipf parameter, obtained through local measurement and limited aggregation. replication: apply the analytical model independently at each node and push new replicas to nodes at most one hop away.
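One simple way a node could estimate the zipf parameter from its locally measured query counts is a least-squares fit of log(count) against log(rank); the negated slope of that line approximates α. This fitting method is an assumption for illustration, not necessarily the estimator Beehive actually uses.

```python
import math

def estimate_zipf_alpha(query_counts) -> float:
    """Fit log(count) vs. log(rank) by least squares; the negated slope
    approximates the zipf parameter alpha of the observed popularity curve."""
    counts = sorted((c for c in query_counts if c > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope

if __name__ == "__main__":
    # synthetic per-object counts drawn from an ideal 1/r^0.9 popularity curve
    ideal = [int(10_000 / r ** 0.9) for r in range(1, 2_000)]
    print("estimated alpha ~", round(estimate_zipf_alpha(ideal), 2))
```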

beehive replication protocol [figure: an object with id 0121 is replicated outward from its home node, first to the level-2 nodes that share two prefix digits with it and then to the larger set of level-1 nodes]

mutable objects: leverage the underlying structure of the DHT; the replication level indicates the locations of all the replicas, so updates can be proactively propagated to all nodes from the home node. the home node sends the update to its one-hop neighbors with i matching prefix digits, and level i nodes send it on to level i+1 nodes (see the sketch below).
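A toy sketch (hypothetical node ids, not the actual protocol code) of the property that makes proactive update propagation possible: because an object's replication level pins down exactly which nodes hold replicas, namely all nodes sharing at least that many prefix digits with the object id, the home node knows every copy an update must reach.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits the two ids share."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n

def replica_holders(node_ids, object_id: str, level: int):
    """An object replicated at level i lives on every node whose id shares at
    least i leading digits with the object id, so this set is exactly the set
    of copies an update has to reach."""
    return {nid for nid in node_ids if shared_prefix_len(nid, object_id) >= level}

if __name__ == "__main__":
    nodes = {"0121", "0122", "0120", "0110", "0212", "1021", "2110"}
    for lvl in (3, 2, 1, 0):   # lower level means more nodes hold the object
        print(f"level {lvl}:", sorted(replica_holders(nodes, "0121", lvl)))
```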

implementation and evaluation: implemented using Pastry as the underlying DHT. evaluated using a real DNS workload, the MIT DNS trace (zipf parameter 0.91), on 1024 nodes with … objects, and compared with passive caching on Pastry. main properties evaluated: lookup performance, storage and bandwidth overhead, and adaptation to changes in the query distribution.

evaluation: lookup performance. passive caching is not very effective because of the heavy-tailed query distribution and mutable objects; beehive converges to the target of 1 hop.

evaluation: overhead [bandwidth and storage graphs]

average number of replicas per node:
  Pastry      40
  PC-Pastry   420
  Beehive     380

evaluation: flash crowds lookup performance

evaluation: zipf parameter change

Cooperative Domain Name System (CoDoNS): a replacement for legacy DNS, with secure authentication through DNSSEC. incremental deployment path: completely transparent to clients, and uses legacy DNS to populate resource records on demand (see the sketch below). deployed on PlanetLab.
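A minimal sketch, with class and method names of my own invention, of the on-demand population path: when a lookup misses in the overlay, the record is fetched through legacy DNS (here via the standard socket resolver) and stored so that later queries are served from the DHT. The real CoDoNS implementation differs; this only illustrates the idea.

```python
import socket

class CoDoNSNodeSketch:
    """Illustrative only: a node that falls back to legacy DNS on a miss."""

    def __init__(self):
        self.store = {}  # stand-in for this node's share of the DHT

    def lookup(self, name: str):
        if name in self.store:
            return self.store[name]              # answered from the overlay
        try:
            # miss: resolve through legacy DNS and populate the overlay,
            # so subsequent queries never have to leave the DHT
            infos = socket.getaddrinfo(name, None)
            addresses = sorted({info[4][0] for info in infos})
        except socket.gaierror:
            return None                          # could also be negatively cached
        self.store[name] = addresses
        return addresses

if __name__ == "__main__":
    node = CoDoNSNodeSketch()
    print(node.lookup("example.org"))   # first call goes through legacy DNS
    print(node.lookup("example.org"))   # second call is served locally
```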

advantages of CoDoNS: higher performance than legacy DNS, with a median latency of 7 ms for CoDoNS on PlanetLab versus 39 ms for legacy DNS; resilience against denial-of-service attacks; self-configuration after host and network failures; and fast update propagation.

conclusions: model-driven proactive caching achieves O(1) lookup performance with an optimal number of replicas. beehive is a general replication framework for structured overlays with uniform fan-out, providing high performance, resilience, and improved availability; it is well-suited for latency-sensitive applications.

evaluation: zipf parameter change

evaluation: instantaneous bandwidth overhead

lookup performance: target 0.5 hops

lookup performance: planet-lab

typical values of the zipf parameter: MIT DNS trace: α = 0.91. web traces (DEC, UPisa, FuNet, UCB, Quest, NLANR): α = …

comparative overview of structured DHTs

  DHT                                            lookup performance
  CAN                                            O(d N^(1/d))
  Chord, Kademlia, Pastry, Tapestry, Viceroy     O(log N)
  de Bruijn graphs (Koorde)                      O(log N / log log N)
  Kelips, Salad, [Gupta, Liskov, Rodriguez],
  [Mizrak, Cheng, Kumar, Savage]                 O(1)

O(1) structured DHTs

  DHT                               lookup performance    routing state
  Salad                             d                     O(d N^(1/d))
  [Mizrak, Cheng, Kumar, Savage]    2                     √N
  Kelips                            1                     √N (√N replication)
  [Gupta, Liskov, Rodriguez]        1                     N

security issues in beehive: underlying DHT: corruption in routing tables, addressed in [Castro, Druschel, Ganesh, Rowstron, Wallach]. beehive: misrepresentation of popularity, mitigated by removing outliers. application: corruption of data, addressed with certificates (e.g. DNSSEC).

Beehive DNS: lookup performance

                     CoDoNS     legacy DNS
  median             6.56 ms    38.8 ms
  90th percentile    281 ms     337 ms

introduction: distributed peer-to-peer overlay networks are decentralized and self-organized. distributed hash tables (DHTs) provide a store/lookup interface. unstructured overlays such as Freenet, Gnutella, and Kazaa have poor lookup performance in both accuracy and latency.

example: b = 32, C = 1, α = 0.9, N = 10,000, M = 1,000,000. x*_0 = … (1102 objects), x*_1 = … (… objects), x*_2 = 1. total storage = 3700 objects per node; total storage for Kelips = M/√N = 10,000 objects per node.

structured overlays: distributed peer-to-peer overlays that are decentralized, self-organized, and resilient. structured DHTs provide object storage and retrieval with bounded average and worst-case latency. latency-sensitive applications such as the domain name service (DNS) and web access need sub-one-hop latency.

analytical model: the target lookup performance is configurable over a continuous range, and proximity routing makes the results even better. minimizing the number of replicas provides storage as well as bandwidth efficiency. K' is an upper bound on the lookup performance of a successful query. assumptions: homogeneous object sizes and infrequent updates.

beehive replication protocol: periodic packets to nodes in the routing table, sent asynchronously and independently, exploit the structure of the underlying DHT. a replication packet is sent by node A to each node B in level i of its routing table; node B pushes new replicas to A and tells A which replicas to remove. fluctuations in estimated popularity are handled with aging, to prevent sudden changes, and hysteresis, to limit thrashing (see the sketch below).
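A small sketch of the aging and hysteresis idea mentioned above; the decay factor, thresholds, and the promote/demote rule are illustrative assumptions, not constants from the Beehive implementation.

```python
class AgedPopularity:
    """Exponentially aged query-rate estimate with a hysteresis gap between
    the thresholds that raise and lower an object's replication, so short
    spikes neither trigger sudden changes nor cause thrashing."""

    def __init__(self, decay=0.5, promote_at=100.0, demote_at=50.0):
        self.decay = decay            # weight given to the old estimate
        self.promote_at = promote_at  # aged rate above which we replicate more
        self.demote_at = demote_at    # aged rate below which we replicate less
        self.rate = 0.0
        self.replicate_more = False

    def end_of_round(self, queries_this_round: int) -> bool:
        # aging: blend the previous estimate with this round's observation
        self.rate = self.decay * self.rate + (1 - self.decay) * queries_this_round
        if self.rate > self.promote_at:
            self.replicate_more = True
        elif self.rate < self.demote_at:      # the gap in between is the hysteresis
            self.replicate_more = False
        return self.replicate_more

if __name__ == "__main__":
    tracker = AgedPopularity()
    for q in (10, 10, 500, 400, 20, 20, 20, 20):   # a brief flash crowd
        decision = tracker.end_of_round(q)
        print(f"queries={q:3d}  aged rate={tracker.rate:6.1f}  replicate more: {decision}")
```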

evaluation: DNS application. DNS survey: queried … unique domain names; TTL distribution: 95% < 1 day; rate of change of entries: 0.13% per day. MIT DNS trace (December 4-11): … million queries for 300,000 distinct names, zipf parameter 0.91. setup: simulation mode on a single node, 1024 nodes, … distinct objects, 7 queries per second replayed from the MIT trace, 0.8% per day rate of change.

introduction: lookup latency and storage

                     CAN             Chord         Pastry          Kelips
  latency            O(d N^(1/d))    O(log_2 N)    O(log_b N)      O(1)
  1,000,000 nodes    39.8 hops       … hops        4.98 hops       ~1 hop
                     (d = …)                       (b = 16)
  at 86.4 ms/hop     3.4 sec         1.7 sec       0.43 sec        … sec
  storage            O(1/N)          O(1/N)        O(1/N)          O(1/√N)

improving lookup latency: passive caching of lookup results is not effective for heavy-tailed query distributions, offers no guaranteed performance, and updates invalidate cached results. O(1) lookup systems trade off storage and bandwidth for performance: Kelips keeps O(√N) replicas per object, and [GLR2003] keeps a complete routing table.

differential replication [figure: an object with id 37420 is replicated at level 0 on nodes matching ***** (all nodes), at level 1 on nodes matching 3****, and at level 2 on nodes matching 37***]

optimal storage: C = 1 hop

summary and useful properties: constant average lookup latency, where the constant is configurable; popular objects have shorter lookup times, upper bounded by K' (2 for α = 0.8). optimal overhead: the storage and bandwidth requirements can be estimated, and overhead decreases with increasing α. high availability for popular objects mitigates the flash-crowd effect. proactive replication supports mutable objects. more benefits can be derived by using proximity optimizations.

a peer-to-peer DNS. why p2p? iterative queries and name-server misconfigurations lead to many failures and increased traffic; availability is reduced by the chain of NS records; and updates are a problem (Akamai). why Beehive? constant lookup time with an upper bound given by K' (~2 or 3 hops), comparable or better performance, better availability due to replication, and support for mutability.

DNS over beehive: a distributed cooperative active cache that can be deployed incrementally. non-uniform rate of change: scale the popularity metric proportionately. lookup failure: negative caching and reverse iterative resolution, i.e. look up x.y.com, then y.com, then com…, which fetches NS records, exploits locality, and supports inverse queries (see the sketch below).
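A tiny, purely illustrative helper showing the name sequence that the reverse iterative resolution described above would walk through.

```python
def suffix_chain(name: str):
    """Names queried in order by reverse iterative resolution: the full name
    first, then each shorter suffix down to the top-level domain."""
    labels = name.rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]

if __name__ == "__main__":
    print(suffix_chain("x.y.com"))   # ['x.y.com', 'y.com', 'com']
```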

DNS over BeeHive: security. DNSSEC uses public key cryptography and signature chains; namespace management leads to large key and signature records. cache the chain of key records for authentication: since popularity(y.com) > popularity(x.y.com), duplicate key records can be avoided. authenticated denial uses reverse iterative resolution.

other potential applications: translation services (semantic-free lookup, P6P); p2p file sharing (text-based search, anti-social applications?); the web (widely varying object sizes, dynamic content).

conclusions: BeeHive is a p2p system based on differential replication. goals: efficient (constant lookup time with minimal overhead), robust (self-organization and resilience against the vagaries of the network), and secure (resilience against malicious elements). CoDoNS: Cooperative DNS on BeeHive.

selective bibliography
traces and zipf distributions: Breslau, Cao, Fan, Phillips, and Shenker, "Web Caching and Zipf-like Distributions: Evidence and Implications" [INFOCOM '99]; Sripanidkulchai, "Popularity of Gnutella Queries and Their Implications on Scalability" [2002].
caching and replication: Cohen and Shenker, "Replication Strategies in Unstructured P2P Networks" [SIGCOMM '02]; Roussopoulos and Baker, "CUP: Controlled Update Propagation in P2P Networks" [USENIX '03].
DNS: Mockapetris, "Development of the DNS System" [SIGCOMM '88]; Jung, Sit, Balakrishnan, and Morris, "DNS Performance and the Effectiveness of Caching" [SIGMETRICS '01]; Cox, Muthitacharoen, and Morris, "Serving DNS Using a Peer-to-Peer Lookup Service" [IPTPS '02].

notations: b: base of the DHT system; N: number of nodes (= b^K); M: number of objects; α: parameter of the zipf-like query distribution; x_j: fraction of objects replicated at level j or lower (e.g. x_2 is the fraction of objects replicated at levels 0, 1, and 2); \( 0 \le x_0 \le x_1 \le x_2 \le \dots \le x_{K-1} \le x_K = 1 \).

storage and bandwidth per node: the storage required at level j is \( M (x_j - x_{j-1}) / b^j \). total per-node storage:
\[ M x_0 + \frac{M (x_1 - x_0)}{b} + \frac{M (x_2 - x_1)}{b^2} + \dots + \frac{M (x_K - x_{K-1})}{b^K} = M \left[ \left(1 - \frac{1}{b}\right) \left( x_0 + \frac{x_1}{b} + \frac{x_2}{b^2} + \dots + \frac{x_{K-1}}{b^{K-1}} \right) + \frac{1}{b^K} \right]. \]
total bandwidth:
\[ M \, b^K \left(1 - \frac{1}{b}\right) \left( x_0 + \frac{x_1}{b} + \frac{x_2}{b^2} + \dots + \frac{x_{K-1}}{b^{K-1}} \right). \]
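A direct transcription of the two formulas above into Python, so the per-node storage and the system-wide replication traffic can be evaluated for any vector of x_j; the sample fractions in the demo are hypothetical.

```python
def per_node_storage(M: int, b: int, xs) -> float:
    """Average objects stored per node: M * [(1 - 1/b) * sum_j x_j / b^j + 1/b^K],
    where xs = [x_0, ..., x_{K-1}]."""
    K = len(xs)
    weighted = sum(x / b ** j for j, x in enumerate(xs))
    return M * ((1 - 1 / b) * weighted + 1 / b ** K)

def total_replication_bandwidth(M: int, b: int, xs) -> float:
    """Total replicas pushed across the system (original copies excluded):
    M * b^K * (1 - 1/b) * sum_j x_j / b^j."""
    K = len(xs)
    weighted = sum(x / b ** j for j, x in enumerate(xs))
    return M * b ** K * (1 - 1 / b) * weighted

if __name__ == "__main__":
    xs = [0.001, 0.01, 0.1]   # hypothetical x_0 <= x_1 <= x_2 for a 3-level system
    print("objects per node :", round(per_node_storage(1_000_000, 32, xs)))
    print("replicas pushed  :", round(total_replication_bandwidth(1_000_000, 32, xs)))
```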

lookup latency: the fraction of queries for the \( M x_j \) most popular objects is \( \approx \frac{(M x_j)^{1-\alpha} - 1}{M^{1-\alpha} - 1} \approx x_j^{1-\alpha} \). the average lookup time contributed by level j is \( \approx j \left( x_j^{1-\alpha} - x_{j-1}^{1-\alpha} \right) \), so the overall average lookup time is
\[ (x_1^{1-\alpha} - x_0^{1-\alpha}) + 2 (x_2^{1-\alpha} - x_1^{1-\alpha}) + \dots + K (x_K^{1-\alpha} - x_{K-1}^{1-\alpha}) = K - \left( x_0^{1-\alpha} + x_1^{1-\alpha} + x_2^{1-\alpha} + \dots + x_{K-1}^{1-\alpha} \right), \]
with \( x_K = 1 \).
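The same derivation expressed as code: given the replication fractions x_0..x_{K-1} and α, the average number of hops is K minus the sum of x_j^{1-α}. The sample vector is hypothetical.

```python
def average_lookup_hops(xs, alpha: float) -> float:
    """Average lookup time in hops: K - sum_j x_j^(1 - alpha),
    where xs = [x_0, ..., x_{K-1}] and x_K = 1 is implicit."""
    K = len(xs)
    return K - sum(x ** (1 - alpha) for x in xs)

if __name__ == "__main__":
    # the more aggressively the lower levels are replicated,
    # the closer the average gets to the configured target C
    print(round(average_lookup_hops([0.001, 0.01, 0.1], alpha=0.9), 2))
```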

optimization problem: minimize
\[ x_0 + \frac{x_1}{b} + \frac{x_2}{b^2} + \dots + \frac{x_{K-1}}{b^{K-1}} \]
such that
\[ x_0^{1-\alpha} + x_1^{1-\alpha} + x_2^{1-\alpha} + \dots + x_{K-1}^{1-\alpha} \ge K - C \]
and \( x_0 \le x_1 \le x_2 \le \dots \le x_{K-1} \) and \( x_{K-1} \le 1 \).

solution: minimize
\[ x_0 + \frac{x_1}{b} + \frac{x_2}{b^2} + \dots + \frac{x_{K'-1}}{b^{K'-1}} \]
such that
\[ x_0^{1-\alpha} + x_1^{1-\alpha} + x_2^{1-\alpha} + \dots + x_{K'-1}^{1-\alpha} \ge K' - C. \]
solving with the Lagrange multiplier technique gives
\[ x_j^* = \left[ \frac{d^j (K' - C)}{1 + d + \dots + d^{K'-1}} \right]^{\frac{1}{1-\alpha}}, \quad 0 \le j \le K' - 1, \qquad x_j^* = 1, \quad K' \le j \le K, \]
where \( d = b^{(1-\alpha)/\alpha} \). K' is determined by requiring \( x_{K'-1}^* \le 1 \), i.e. \( d^{K'-1} (K' - C) / (1 + d + \dots + d^{K'-1}) \le 1 \).

constant lookup time systems: Kelips and Scuttlebutt trade off bandwidth and storage for performance, giving one-hop lookup latency and resilience against transient nodes, but their O(√N) replication of all objects makes update propagation expensive.