Resource Allocation in OpenHash: a Public DHT Service Sean Rhea with Brad Karp, Sylvia Ratnasamy, Scott Shenker, Ion Stoica, and Harlan Yu.

1 Resource Allocation in OpenHash: a Public DHT Service Sean Rhea with Brad Karp, Sylvia Ratnasamy, Scott Shenker, Ion Stoica, and Harlan Yu

2 Introduction
In building OceanStore, worked on Tapestry
– Found Tapestry problem harder than expected
– Main problem: handling churn
Built Bamboo to be churn-resilient from start
– Working by 6/2003
– Rejected from NSDI in 9/2003
– Released code in 12/2003, about 10 groups using
– Accepted to USENIX, 6/2004

3 Introduction (con’t.)
Intended Bamboo to be general, reusable
– Supports Common API for DHTs
– Tens of DHT applications proposed in literature
– Still very few in common use. Why?
One possible barrier: deployment
– Need access to machines; not everyone is on PlanetLab
– Must monitor, restart individual processes
– Takes about an hour/day minimum right now

4 Simple DHT Applications
Many uses of DHTs very simple: just put/get
– Don’t use Common API [Dabek et al.]
– No routing, no upcalls, etc.
Examples:
– Dynamic DNS
– FreeDB
In general: use DHT as highly available cache or rendezvous service
Should be able to share a single DHT deployment

5 Sophisticated DHT Applications
Other functionality of DHTs is lookup
– Map identifiers to application nodes efficiently
– Used by most sophisticated applications: i3, OceanStore, SplitStream
Can implement lookup on put/get
– Algorithm called ReDiR, IPTPS paper this year
Sophisticated applications could also share a single DHT deployment

6 OpenHash: a Public DHT Service
Idea: public DHT to amortize deployment effort
– Very low barrier to entry for simple applications
– Amortize bandwidth cost for sophisticated apps
Challenges
– Economics
– Security
– Resource allocation

7 Overview
Introduction
OpenHash interface, assumptions
Resource allocation
– Goals/Problem formalization
– Rate-limiting puts
– Fair sharing
Discussion

8 OpenHash Interface/Assumptions
Want to keep things simple for clients
– Remember goal: low barrier to entry
Simple put/get
– put (key, value, time-to-live)
– get (key)
Service contract:
– Puts accepted/rejected immediately (not queued)
– Once accepted, put values available for whole TTL
– Predictable, zero-effort availability for clients
– After that, will be thrown out by DHT
– Easy garbage collection, also valuable for some apps
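The service contract above can be pictured with a toy in-memory mock; the class and method names here are illustrative only (a real OpenHash node is a network service, not a local object):

```python
import time


class OpenHashMock:
    """Toy in-memory sketch of the put/get contract with TTLs."""

    def __init__(self):
        self.store = {}  # key -> (value, expiry time)

    def put(self, key, value, ttl):
        # A real node accepts or rejects immediately; this mock always accepts.
        self.store[key] = (value, time.time() + ttl)
        return True

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expiry = entry
        if time.time() >= expiry:
            # TTL elapsed: the DHT garbage-collects the value.
            del self.store[key]
            return None
        return value
```

Note how the TTL makes garbage collection trivial: a value that has outlived its TTL is simply dropped on the next access.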

9 Resource Allocation Introduction
Problem: disk space is limited
– If service popular, may exhaust
– Malicious clients might exhaust on purpose
Rough goal: every client gets fair share of store
– Ideally, algorithm should be work-conserving
Example:
– Three clients: A, B, and C; 10 GB of total space
– A and B want 1 GB each, C wants 20 GB
– A and B should get 1 GB each; C should get 8 GB

10 Problem Simplification
For now, shares calculated per-DHT node
– Global fair sharing saved for future work
Clients that balance puts won’t notice a problem
– Most DHT applications already balance puts
– Apps that can choose their keys can do even better
Side benefit: encourages balancing puts
– Mitigates need for load balancing in DHT
– Let the users handle load balancing
– Easier for us to implement!

11 Problem Formalization (First Try)
C – total available storage
s_i – storage desired by client i, S = Σ s_i
s_fair – fair share such that C = Σ min(s_i, s_fair)
g_i – storage granted to client i, G = Σ g_i
Goals
– Fairness: ∀i, g_i = min(s_i, s_fair)
– Utilization: G = min(C, S)
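The grants g_i = min(s_i, s_fair) can be computed with a simple water-filling pass over the demands; this sketch (function name assumed) reproduces the earlier example of 10 GB shared by demands of 1, 1, and 20 GB:

```python
def fair_shares(capacity, demands):
    """Water-filling: grant each client min(s_i, s_fair) so that
    the total granted equals min(C, S)."""
    grants = [0.0] * len(demands)
    # Visit clients from smallest demand to largest.
    order = sorted(range(len(demands)), key=lambda i: demands[i])
    remaining = capacity
    for rank, i in enumerate(order):
        # Equal split of the remaining capacity among remaining clients;
        # a client wanting less than this split just takes its demand.
        split = remaining / (len(order) - rank)
        grants[i] = min(demands[i], split)
        remaining -= grants[i]
    return grants
```

With capacity 10 and demands [1, 1, 20] this yields grants of [1, 1, 8], matching the example, and it is work-conserving: whatever the small clients leave unused flows to the large one.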

12 Problem Formalization (Second Try)
Previous version didn’t account for time
– Can only remove stored values as TTLs expire
– As such, can only adapt so quickly
– Before accepting one put, another must expire
Add goal: always accept puts at rate ≥ R
– Prefer puts from underrepresented clients
– Intuition: R bounds time it takes to correct unfairness
New questions:
– How to guarantee space frees up at rate ≥ R?
– How to divide R among clients?

13 Overview
Introduction
OpenHash interface, assumptions
Resource allocation
– Goals/Problem formalization
– Rate-limiting puts
– Fair sharing
Discussion

14 Accepting At Rate ≥ R
S(t) – total data stored at time t
A(t1, t2) – data added to system in [t1, t2)
D(t1, t2) – data freed in [t1, t2)
For adaptivity, need: A(t, t+Δt) ≥ R · Δt
Capacity limit: S(t) + A(t, t+Δt) - D(t, t+Δt) ≤ C
– Rearrange: C + D(t, t+Δt) - S(t) ≥ A(t, t+Δt)
Combined with top eqn: C + D(t, t+Δt) - S(t) ≥ R · Δt
– Rearrange: D(t, t+Δt) ≥ R · Δt - C + S(t)
Result: can accept any put that won’t make us violate this equation at any point in the future
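A direct, if inefficient, way to apply the result is to re-check the inequality at each expiry point of the data already on disk. The helper below is only a sketch under that simplification (names invented; it considers the current expiry schedule and ignores puts accepted after `now`):

```python
def can_accept(stored, size, ttl, now, C, R):
    """Would accepting a put of `size` bytes with `ttl` keep
    D(now, now+dt) >= R*dt - C + S(now) up to the last expiry?
    stored: list of (size, expiry_time) pairs already on disk."""
    puts = sorted(stored + [(size, now + ttl)], key=lambda p: p[1])
    S = sum(sz for sz, _ in puts)
    if S > C:
        return False  # would exceed capacity outright
    freed = 0.0
    for sz, expiry in puts:
        # Just before `expiry`, only data expiring earlier has been freed;
        # that is the tightest point, since D is flat between expiries
        # while the right-hand side grows linearly.
        if freed < R * (expiry - now) - C + S:
            return False
        freed += sz
    return True
```

For example, with C = 100, R = 1, and 90 bytes already stored that expire only at t = 100, a 10-byte put must be rejected: accepting it would fill the disk with nothing freeing up soon enough to sustain rate R.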

15 Implementing Rate Limiting
Before accepting put, must check D(t, t+Δt)
– Can we check this efficiently?
Easy, assuming all puts have same TTL
– Can implement using a virtual “pipe”
– Pipe is TTL long, total capacity C
– New puts go into pipe, expire on exit
– Can easily show pipe is optimal for this case
With varying TTLs, problem harder
– Puts with short TTLs expire in middle of pipe
– Bin-packing problem on new puts: find latest spot in pipe that satisfies desired size and TTL
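For the uniform-TTL case, the virtual pipe can be sketched in a few lines (illustrative class; a real node would also persist the queue). Because every put exits exactly TTL seconds after it enters, a FIFO of (expiry, size) pairs plus a running byte count suffices:

```python
from collections import deque


class UniformTTLPipe:
    """Sketch of the uniform-TTL 'pipe': a put occupies pipe capacity
    from acceptance until its TTL expires, in strict FIFO order."""

    def __init__(self, capacity, ttl):
        self.capacity = capacity
        self.ttl = ttl
        self.in_pipe = deque()  # (expiry_time, size), ordered by expiry
        self.used = 0.0

    def _drain(self, now):
        # Puts exit the pipe as their TTLs expire.
        while self.in_pipe and self.in_pipe[0][0] <= now:
            _, size = self.in_pipe.popleft()
            self.used -= size

    def offer(self, size, now):
        """Accept the put iff it fits in the pipe right now."""
        self._drain(now)
        if self.used + size > self.capacity:
            return False
        self.in_pipe.append((now + self.ttl, size))
        self.used += size
        return True
```

With varying TTLs this FIFO invariant breaks, which is what turns the check into the bin-packing problem the slide mentions.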

16 Overview
Introduction
OpenHash interface, assumptions
Resource allocation
– Goals/Problem formalization
– Rate-limiting puts
– Fair sharing
Discussion

17 Choosing Puts for Fair Sharing
Assume can accept new puts at rate ≥ R
– How do we divide it up between clients?
Unlike fair queuing, two competing goals:
1. Want to make decisions (accept/reject) quickly
– In FQ, may queue for a long time before forwarding
2. Suffer consequences of decisions for full TTL
– In FQ, only interested in fairness over short window
But one big advantage: long history
– Remember all puts whose TTLs haven’t expired

18 The Rate-Based Approach
Accept based on recent put rates
– Already storing all puts, so also store rates
– (Could estimate these as in Approximate Fair Dropping.)
– Basically, fair share the input rate R
Pros:
– Easy to implement
– If all clients put at uniform rates, gives fair stores
Cons:
– To get fair share, must put at uniform rate
– What about bursty clients (avg. rate << max. rate)?
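A minimal sketch of the rate-based idea, assuming an exponentially weighted moving average as the rate estimator (the smoothing constant and all names here are invented for illustration, not taken from the talk):

```python
class RateBasedAllocator:
    """Sketch: track each client's recent put rate with an EWMA and
    prefer the client currently putting most slowly."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha   # EWMA smoothing weight (assumed value)
        self.rate = {}       # client -> smoothed bytes/sec
        self.last = {}       # client -> time of previous put

    def observe(self, client, size, now):
        # Fold the instantaneous rate of this put into the average.
        dt = now - self.last.get(client, now - 1.0)
        inst = size / dt if dt > 0 else float("inf")
        old = self.rate.get(client, 0.0)
        self.rate[client] = (1 - self.alpha) * old + self.alpha * inst
        self.last[client] = now

    def choose(self, clients):
        """Among clients with pending puts, pick the lowest-rate one."""
        return min(clients, key=lambda c: self.rate.get(c, 0.0))
```

This illustrates the con on the slide as well: a bursty client's smoothed rate spikes during its burst, so it loses out even if its long-term average is modest.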

19 The Storage-Based Approach
Accept puts based on amount of storage used
– Keep counters of storage used by each client
– Prefer new puts from clients with less data on disk
Pros:
– Also easy to implement
– Gives fair stores regardless of uniformity of client put rates
Cons:
– Over-represented clients block on under-represented ones
– Could be very disruptive as new clients enter system
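The storage-based scheme reduces to a per-client byte counter; a sketch (names illustrative):

```python
class StorageBasedAllocator:
    """Sketch: prefer puts from the client with the least data on disk."""

    def __init__(self):
        self.usage = {}  # client -> bytes currently stored

    def choose(self, candidates):
        """candidates: list of (client, size) pending puts.
        Pick the put from the least-represented client."""
        return min(candidates, key=lambda c: self.usage.get(c[0], 0))

    def accept(self, client, size):
        self.usage[client] = self.usage.get(client, 0) + size

    def expire(self, client, size):
        # Called when a stored value's TTL runs out.
        self.usage[client] -= size
```

The blocking con is visible here too: a client far above its share is never chosen until enough of its data expires, and a brand-new client (counter at zero) wins every decision for a while.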

20 The Commitment-Based Approach
Base fairness around “commitments”
– How many bytes stored for how much more time
– New bytes entail more future commitment than old
Pros:
– Better at bursts than rate-based approach
– Better at not blocking over-represented clients than storage-based approach
Cons:
– Hard to think about in detail, hard to implement?
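One plausible reading of “commitment” is bytes multiplied by remaining TTL, summed over a client's live puts; the slide leaves the metric open, so this is only a guess at it, not the talk's definition:

```python
def commitment(puts, now):
    """Guessed commitment metric: byte-seconds still owed to a client.
    puts: list of (size, expiry_time) for that client's stored values."""
    return sum(size * max(0.0, expiry - now) for size, expiry in puts)
```

Under this metric a value near the end of its TTL counts for almost nothing, so an over-represented client whose data is about to expire is not blocked the way a raw byte counter would block it.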

21 Related Work
Various fair queuing techniques
– Standard FQ
– Approximate Fair Dropping
– CSFQ
Other DHT work
– Palimpsest
Other networking work
– Internet backplane

22 Discussion
What is the optimal rate limiting algorithm?
– How close do our various schemes come to it?
What’s the right model for sharing?
– Rate-based approach?
– Storage-based approach?
– Commitment-based approach?
– Some hybrid?
– Lottery Scheduling?
What other models make sense?
– Palimpsest?

