Fixing the Embarrassing Slowness of OpenDHT on PlanetLab
Sean Rhea, Byung-Gon Chun, John Kubiatowicz, and Scott Shenker
UC Berkeley (and now MIT)
December 13, 2005

Distributed Hash Tables (DHTs)
Same interface as a traditional hash table
–put(key, value) — stores value under key
–get(key) — returns all the values stored under key
Built over a distributed overlay network
–Partition the key space over the available nodes
–Route each put/get request to the appropriate node
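To make the interface concrete, here is a minimal sketch of the put/get interface over a single in-memory map; the class name is hypothetical, and a real DHT spreads this map across nodes and routes each call over the overlay described below.

```python
# Minimal sketch of the DHT interface (hypothetical class name); a real DHT
# partitions this map across nodes and routes each call over the overlay.
from collections import defaultdict

class HashTableInterface:
    def __init__(self):
        self.store = defaultdict(list)   # key -> list of stored values

    def put(self, key, value):
        """Store value under key (a key may hold several values)."""
        self.store[key].append(value)

    def get(self, key):
        """Return all values stored under key."""
        return list(self.store[key])

dht = HashTableInterface()
dht.put("0x6b", "hello")
print(dht.get("0x6b"))   # ['hello']
```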

DHTs: The Hype
High availability
–Each key-value pair replicated on multiple nodes
Incremental scalability
–Need more storage or throughput? Just add more nodes.
Low latency
–Recursive routing, proximity neighbor selection, server selection, etc.

DHTs: The Hype
Promises of DHTs realized only “in the lab”
–Use an isolated network (Emulab, ModelNet)
–Measure while PlanetLab load is low
–Look only at median performance
Our goal: make DHTs perform “in the wild”
–Network not isolated, machines shared
–Look at long-term 99th percentile performance
–(Caveat: no outright malicious behavior)

Why We Care
The promise of P2P was to harness idle capacity
–Not supposed to need dedicated machines
Running the OpenDHT service on PlanetLab
–No control over what else is running
–Load can be really bad at times
–Up 24/7: have to weather good times and bad
–Good median performance isn’t good enough

Original OpenDHT Performance
Long-term median get latency < 200 ms
–Matches the performance of DHash on PlanetLab
–Median RTT between hosts ~ 140 ms

Original OpenDHT Performance
But the 95th percentile get latency is atrocious!
–Generally measured in seconds
–And even the median spikes up from time to time

Talk Overview
Introduction and Motivation
How OpenDHT Works
The Problem of Slow Nodes
Algorithmic Solutions
Experimental Results
Related Work and Conclusions

OpenDHT Partitioning
Assign each node an identifier from the key space
Store a key-value pair (k,v) on the several nodes whose IDs are closest to k
–Call these nodes the replicas for (k,v)
(Figure: a node with id = 0xC9A1… is responsible for the keys nearest its identifier)
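As a concrete illustration, here is a sketch of replica selection; the 160-bit ring, the symmetric distance metric, and the replica count of four are assumptions for illustration rather than OpenDHT's exact placement rules.

```python
# Sketch of replica selection (assumptions: 160-bit IDs, symmetric ring
# distance, 4 replicas); OpenDHT's exact placement rules may differ.
ID_SPACE = 2 ** 160

def ring_distance(a, b):
    d = (b - a) % ID_SPACE
    return min(d, ID_SPACE - d)

def replicas_for(key_id, node_ids, n_replicas=4):
    """Return the node IDs closest to key_id; these nodes store (k, v)."""
    return sorted(node_ids, key=lambda nid: ring_distance(nid, key_id))[:n_replicas]
```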

OpenDHT Graph Structure
Overlay neighbors match prefixes of the local identifier
Choose among nodes with the same matching prefix length by network latency
(Figure: an example overlay containing nodes 0xC0, 0x41, 0x84, and 0xED)
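A minimal sketch of that neighbor choice, assuming hex-string identifiers and a table of measured RTTs; the helper names are hypothetical.

```python
# Sketch of proximity neighbor selection: among candidates that share the
# same matching prefix length with the local ID, keep the lowest-latency one.
def matching_prefix_len(a, b):
    """Length of the common prefix of two hex identifier strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def choose_neighbor(local_id, candidates, latency_ms, prefix_len):
    """candidates: node IDs; latency_ms: dict of node ID -> measured RTT."""
    eligible = [c for c in candidates
                if matching_prefix_len(local_id, c) == prefix_len]
    return min(eligible, key=lambda c: latency_ms[c]) if eligible else None
```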

Performing Gets in OpenDHT
Client sends a get request to a gateway
Gateway routes it along neighbor links to the first replica encountered
Replica sends the response back directly over IP
(Figure: a client’s get(0x6b) enters at the gateway, is routed via node 0x41 to replica 0x6c, and the get response returns directly)
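The routing step above can be sketched as a toy recursive lookup; the small integer ring, the Node class, and greedy forwarding by ring distance are simplifying assumptions, not OpenDHT's actual code.

```python
# Toy sketch of recursive routing: each node forwards a get toward the key
# until no neighbor is closer, at which point it answers as the replica.
class Node:
    def __init__(self, node_id, store=None):
        self.node_id = node_id
        self.neighbors = []          # filled in when the overlay is built
        self.store = store or {}     # key_id -> value

    def closest_to(self, key_id, ring=256):
        dist = lambda n: min((key_id - n.node_id) % ring,
                             (n.node_id - key_id) % ring)
        return min([self] + self.neighbors, key=dist)

    def get(self, key_id):
        nxt = self.closest_to(key_id)
        if nxt is self:                      # we are (one of) the replicas
            return self.store.get(key_id)    # response goes straight back
        return nxt.get(key_id)               # otherwise forward recursively
```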

Robustness Against Failure
If a neighbor dies, a node routes through its next best one
If a replica dies, the remaining replicas create a new one to replace it
(Figure: the example overlay with nodes 0xC0, 0x41, and 0x6c, and the client’s request routed around a failure)

The Problem of Slow Nodes
What if a neighbor doesn’t fail, but just slows down temporarily?
–If it stays slow, the node will replace it
–But it must adapt slowly, for stability
Many sources of slowness are short-lived
–A burst of network congestion causes packet loss
–A user loads a huge Photoshop image, flushing the buffer cache
In either case, gets will be delayed

Flavors of Slowness
At first, slowness may be unexpected
–May not notice until we try to route through a node
–The first few get requests are delayed
Can keep a history of nodes’ performance
–Stops subsequent gets from suffering the same fate
–Continue probing a slow node to detect its recovery
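One simple way to keep such a history is a smoothed per-neighbor delay estimate; the exponentially weighted moving average and its parameters below are assumptions, since the slides only say that history is kept.

```python
# Sketch of a per-neighbor latency history using an EWMA (the smoothing
# constant and the default estimate for unknown nodes are assumptions).
class LatencyHistory:
    def __init__(self, alpha=0.2, default_ms=100.0):
        self.alpha = alpha
        self.default_ms = default_ms
        self.estimate = {}                 # node -> smoothed RTT in ms

    def observe(self, node, rtt_ms):
        old = self.estimate.get(node, rtt_ms)
        self.estimate[node] = (1 - self.alpha) * old + self.alpha * rtt_ms

    def expected_delay(self, node):
        return self.estimate.get(node, self.default_ms)
```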

Talk Overview
Introduction and Motivation
How OpenDHT Works
The Problem of Slow Nodes
Algorithmic Solutions
Experimental Results
Related Work and Conclusions

Two Main Techniques
Delay-aware routing
–Guide routing not just by progress through the key space, but also by past responsiveness

Delay-Aware Routing
(Figure: the gateway’s best next hop toward the replicas has an expected delay of 30 ms)

Delay-Aware Routing
(Figure: two candidate next hops with expected delays of 30 ms and 50 ms; about the same?)

Delay-Aware Routing
(Figure: when the alternative’s expected delay is 500 ms, the 30 ms hop is clearly the best next hop)
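A sketch of one plausible way to combine the two signals, scoring each candidate by expected delay per unit of key-space progress; the exact rule OpenDHT uses may differ, so treat this as illustrative.

```python
# Sketch of delay-aware next-hop choice (the scoring rule is an assumption):
# prefer hops that cost little expected delay per unit of progress toward
# the key, rather than maximal key-space progress alone.
def pick_next_hop(candidates, progress, expected_delay_ms):
    """candidates: node IDs; progress(n): key-space progress if we route
    through n; expected_delay_ms(n): smoothed RTT from the history."""
    def score(n):
        p = progress(n)
        return expected_delay_ms(n) / p if p > 0 else float("inf")
    return min(candidates, key=score)
```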

Two Main Techniques
Delay-aware routing
–Guide routing not just by progress through the key space, but also by past responsiveness
–Cheap, but must first observe slowness
Added parallelism
–Send each request along multiple paths

Naïve Parallelism
(Figure: the gateway sends the request along multiple paths toward the replicas; one path suffers a sudden delay)

Multiple Gateways
(Only the client replicates requests.)
(Figure: the client issues the same request through several gateways toward the replicas; one path suffers a sudden delay)

Iterative Routing
(The gateway maintains p concurrent RPCs.)
(Figure: the client’s gateway keeps several lookup RPCs in flight toward the replicas; one of them suffers a sudden delay)
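A minimal sketch of an iterative lookup that keeps up to p RPCs outstanding; the rpc and distance callbacks are hypothetical stand-ins, and a real gateway would issue them asynchronously rather than in sequential batches.

```python
# Sketch of parallel iterative lookup: always query the closest uncontacted
# nodes, keeping at most p lookups in flight (here issued batch by batch).
import heapq

def iterative_get(key_id, start_nodes, rpc, distance, p=3, max_steps=32):
    """start_nodes: node IDs; rpc(node, key) -> (value_or_None, closer_nodes)."""
    frontier = [(distance(n, key_id), n) for n in start_nodes]
    heapq.heapify(frontier)
    contacted = set()
    for _ in range(max_steps):
        batch = []
        while frontier and len(batch) < p:     # choose up to p next targets
            _, n = heapq.heappop(frontier)
            if n not in contacted:
                contacted.add(n)
                batch.append(n)
        if not batch:
            return None                        # nothing left to ask
        for n in batch:                        # sequential stand-in for async
            value, closer = rpc(n, key_id)
            if value is not None:
                return value
            for c in closer:
                if c not in contacted:
                    heapq.heappush(frontier, (distance(c, key_id), c))
    return None
```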

Two Main Techniques
Delay-aware routing
–Guide routing not just by progress through the key space, but also by past responsiveness
–Cheap, but must first observe slowness
Added parallelism
–Send each request along multiple paths
–Expensive, but handles unexpected slowness

Talk Overview
Introduction and Motivation
How OpenDHT Works
The Problem of Slow Nodes
Algorithmic Solutions
Experimental Results
Related Work and Conclusions

Experimental Setup
Can’t get reproducible numbers from PlanetLab
–Both the available nodes and their load change hourly
–But PlanetLab is the environment we care about
Solution: run all experiments concurrently
–Perform each get using every mode, in random order
–Look at results over long time scales: 6 days, over 27,000 samples per mode
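A sketch of that measurement loop; the mode names and the do_get callback are hypothetical, and the point is only that every mode is exercised back-to-back in a fresh random order so all of them see the same PlanetLab conditions.

```python
# Sketch of the measurement loop: for each key, run every routing mode in a
# random order and record per-mode latency (mode names are hypothetical).
import random
import time

def measure_get(key, modes, do_get):
    """modes: e.g. ["greedy", "delay-aware", "iterative"]; do_get(mode, key)
    performs one get request using that mode."""
    samples = {}
    for mode in random.sample(modes, len(modes)):
        start = time.time()
        do_get(mode, key)
        samples[mode] = (time.time() - start) * 1000.0   # latency in ms
    return samples
```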

Delay-Aware Routing
(Table: 50th and 99th percentile get latency in ms, and cost in messages and bytes, for greedy vs. delay-aware routing)
Latency drops by 30-60%
Cost goes up by only ~10%

Multiple Gateways
(Table: 50th and 99th percentile get latency in ms, and cost in messages and bytes, by number of gateways)
Latency drops by a further 30-73%
But cost doubles or worse

Iterative Routing
(Table: 50th and 99th percentile get latency and cost in messages and bytes, comparing recursive routing with parallel iterative routing across numbers of gateways)
Parallel iterative routing is not as cost effective as just using multiple gateways

Talk Overview
Introduction and Motivation
How OpenDHT Works
The Problem of Slow Nodes
Algorithmic Solutions
Experimental Results
Related Work and Conclusions

Related Work
Google MapReduce
–Cluster owned by a single company
–Could presumably make all nodes equal
–Turns out it’s cheaper to just work around the slow nodes instead
Accordion
–Another take on recursive parallel lookup
Other related work in the paper

Conclusions
Techniques for reducing get latency
–Delay-aware routing is a clear win
–Parallelism is very fast, but costly
–Iterative routing is not cost effective
OpenDHT get latency is now quite low
–Was 150 ms at the median, 4+ seconds at the 99th percentile
–Now under 100 ms at the median, 500 ms at the 99th percentile
–Faster than DNS [Jung et al. 2001]

Thanks! For more information: