Slide 1: Peer-to-Peer Networks
Outline: Overview, Pastry, OpenDHT
Contributions from Peter Druschel & Sean Rhea
Fall 2006, CS 561
Slide 2: What is Peer-to-Peer About?
- Distribution
- Decentralized control
- Self-organization
- Symmetric communication
P2P networks do two things:
- Map objects onto nodes
- Route requests to the node responsible for a given object
Slide 3: Object Distribution
Consistent hashing [Karger et al. '97]:
- 128-bit circular id space
- nodeIds assigned uniformly at random
- objIds chosen uniformly at random
- Invariant: the node with the nodeId numerically closest to objId maintains the object
Speaker notes: Each node has a randomly assigned 128-bit nodeId in a circular namespace. Basic operation: a message with key X, sent by any Pastry node, is delivered to the live node with the nodeId closest to X in at most log16 N steps (barring node failures). Pastry uses a form of generalized hypercube routing, where the routing tables are initialized and updated dynamically.
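As a concrete illustration of this mapping (not Pastry's actual code), here is a minimal consistent-hashing sketch on a circular 128-bit id space; the use of MD5 and the helper names are assumptions made for the example.

import hashlib

ID_BITS = 128
ID_SPACE = 1 << ID_BITS

def to_id(name):
    """Hash an arbitrary name (node address or object key) to a 128-bit id."""
    return int.from_bytes(hashlib.md5(name.encode()).digest(), "big")

def circular_distance(a, b):
    """Distance between two ids on the circular namespace."""
    d = abs(a - b)
    return min(d, ID_SPACE - d)

def responsible_node(obj_id, node_ids):
    """The invariant: the node with the numerically closest nodeId holds the object."""
    return min(node_ids, key=lambda n: circular_distance(n, obj_id))

# Example: map one object onto three nodes.
nodes = [to_id(addr) for addr in ("10.0.0.1", "10.0.0.2", "10.0.0.3")]
print(hex(responsible_node(to_id("some-object"), nodes)))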
Slide 4: Object Insertion/Lookup
A message with key X (Route(X)) is delivered to the live node whose nodeId is numerically closest to X.
Problem: a complete routing table at every node is not feasible.
Slide 5: Routing: Distributed Hash Table
Figure: Route(d46a1c) issued at node 65a1fc and forwarded through d13da3, d4213f, and d462ba toward the nodes closest to the key (d46a1c, d467c4, d471f1).
Properties:
- log16 N routing steps
- O(log N) routing state per node
Slide 6: Leaf Sets
Each node maintains the IP addresses of the nodes with the L numerically closest larger and smaller nodeIds, respectively. The leaf set provides:
- Routing efficiency and robustness
- Fault detection (keep-alive messages)
- Application-specific local coordination
Slide 7: Routing Procedure
if D is within the range of our leaf set:
    forward to the numerically closest leaf-set member
else:
    let l = length of the prefix shared between D and our nodeId
    let d = value of the l-th digit of D
    if RouteTab[l, d] exists:
        forward to RouteTab[l, d]
    else:
        forward to a known node that
        (a) shares at least as long a prefix with D, and
        (b) is numerically closer to D than this node
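A minimal sketch of the same routing step, assuming nodeIds are hex strings, the leaf set is a collection of nodeIds, and the routing table is a dict keyed by (row, digit); the data-structure names and tie-breaking details are illustrative, not Pastry's actual code.

def shared_prefix_len(a, b):
    """Number of leading hex digits two ids have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def next_hop(D, my_id, leaf_set, route_tab):
    """One routing decision for destination key D."""
    dist = lambda n: abs(int(n, 16) - int(D, 16))
    ids = sorted(leaf_set, key=lambda n: int(n, 16))
    # Case 1: D falls within the leaf-set range -> numerically closest member.
    if ids and int(ids[0], 16) <= int(D, 16) <= int(ids[-1], 16):
        return min(ids + [my_id], key=dist)
    # Case 2: routing-table entry that extends the shared prefix by one digit.
    l = shared_prefix_len(D, my_id)
    entry = route_tab.get((l, D[l])) if l < len(D) else None
    if entry is not None:
        return entry
    # Case 3 (rare): any known node with at least as long a prefix
    # that is numerically closer to D than this node.
    known = list(leaf_set) + list(route_tab.values())
    closer = [n for n in known
              if shared_prefix_len(D, n) >= l and dist(n) < dist(my_id)]
    return min(closer, key=dist) if closer else my_id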
Slide 8: Routing
Integrity of the overlay: guaranteed unless L/2 nodes with adjacent nodeIds fail simultaneously.
Number of routing hops:
- No failures: < log16 N expected, 128/b + 1 maximum (b is the number of bits per digit; b = 4 gives the base-16 digits used here)
- During failure recovery: O(N) worst case, but the average case is much better
Slide 9: Node Addition
Figure: a new node with nodeId d46a1c joins; its join request, Route(d46a1c), enters at node 65a1fc and is forwarded through d13da3, d4213f, and d462ba to the existing nodes with nodeIds closest to the new node (d467c4, d471f1).
Slide 10: Node Departure (Failure)
- Leaf set members exchange keep-alive messages
- Leaf set repair (eager): request the set from the farthest live node in the set
- Routing table repair (lazy): get the table from peers in the same row, then from higher rows
Slide 11: PAST: Cooperative, Archival File Storage and Distribution
Layered on top of Pastry. Goals:
- Strong persistence
- High availability
- Scalability
- Reduced cost (no backup)
- Efficient use of pooled resources
Slide 12: PAST API
- Insert: store a replica of a file at k diverse storage nodes
- Lookup: retrieve the file from a nearby live storage node that holds a copy
- Reclaim: free the storage associated with a file
Files are immutable. (A code sketch of this interface follows.)
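A hedged sketch of what this interface could look like in code, assuming an abstract Pastry object that exposes a route(key, msg) primitive; the class and message formats are invented for illustration, not the actual PAST implementation.

class Past:
    """Illustrative PAST-style storage interface over a Pastry overlay."""

    def __init__(self, pastry, k=4):
        self.pastry = pastry  # assumed to expose route(key, msg)
        self.k = k            # replication factor, bounded by the leaf-set size

    def insert(self, file_id, data):
        # Route an insert request using file_id as the key; the node closest to
        # file_id stores the file and replicates it on the k-1 next nearest nodes.
        self.pastry.route(file_id, ("INSERT", file_id, data, self.k))

    def lookup(self, file_id):
        # Route a lookup; whichever replica holder is reached first
        # (usually the one nearest the client) returns the file.
        return self.pastry.route(file_id, ("LOOKUP", file_id))

    def reclaim(self, file_id):
        # Files are immutable, so reclaim only frees the associated storage.
        self.pastry.route(file_id, ("RECLAIM", file_id, self.k))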
Slide 13: PAST: File Storage
Insert(fileId): the insert request for a file is routed through Pastry using the fileId as the key.
Speaker notes: PAST file storage is mapped onto the Pastry overlay by maintaining the invariant that replicas of a file are stored on the k nodes numerically closest to the file's fileId. During an insert, the request is routed using fileId as the key; the node closest to fileId then replicates the file on the k-1 next nearest nodes in the namespace.
Slide 14: PAST: File Storage (continued)
Storage invariant: file "replicas" are stored on the k nodes with nodeIds closest to fileId, where k is bounded by the leaf set size. The figure shows an insert with k = 4.
Slide 15: PAST: File Retrieval
Lookup(fileId):
- The file is located in log16 N steps (expected)
- The lookup usually reaches the replica nearest to the client C
Speaker notes: A lookup request is routed in at most log16 N steps to a node that stores a replica, if one exists. In practice, whichever of the k replica holders first receives the message serves the file. Furthermore, the network locality properties of Pastry (not discussed in this talk) ensure that this node is usually the one closest to the client in the network.
Slide 16: DHT Deployment Today
Every application deploys its own DHT (DHT as a library), each running directly over IP connectivity:
- PAST (MSR/Rice) on Pastry
- CFS (MIT) on Chord
- OStore (UCB) on Tapestry
- pSearch (HP) on CAN
- Coral (NYU) on Kademlia
- i3 (UCB) on Chord
- PIER (UCB) on Bamboo
- Overnet (open) on Kademlia
Slide 17: DHT Deployment Tomorrow?
Figure: the same applications (PAST, CFS, OStore, pSearch, i3, PIER, Overnet, Coral), with their specialized DHTs layered via DHT indirection over a single shared deployment: OpenDHT, one DHT shared across applications (DHT as a service), running over IP connectivity.
Slide 18: Two Ways To Use a DHT
- The Library Model: DHT code is linked into the application binary. Pros: flexibility, high performance.
- The Service Model: the DHT is accessed as a service over RPC. Pros: easier deployment, less maintenance.
Slide 19: The OpenDHT Service
- 200-300 Bamboo [USENIX '04] nodes on PlanetLab, all in one slice, all managed by us
- Clients can be arbitrary Internet hosts and access the DHT using RPC over TCP
- The interface is simple put/get (a toy client sketch follows):
  - put(key, value): stores value under key
  - get(key): returns all the values stored under key
- Running on PlanetLab since April 2004; building a community of users
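The put/get interface can be pictured with a toy client like the one below. The wire format, host name, and port are invented for illustration; they are not OpenDHT's actual RPC protocol.

import json
import socket

class OpenDHTClient:
    """Toy put/get client; the framing here is made up for the example."""

    def __init__(self, host="dht-gateway.example.org", port=5999):
        self.addr = (host, port)

    def _call(self, request):
        with socket.create_connection(self.addr) as s:
            s.sendall(json.dumps(request).encode() + b"\n")
            return json.loads(s.makefile().readline())

    def put(self, key, value):
        """Store value under key."""
        self._call({"op": "put", "key": key.hex(), "value": value.hex()})

    def get(self, key):
        """Return all the values stored under key."""
        resp = self._call({"op": "get", "key": key.hex()})
        return [bytes.fromhex(v) for v in resp.get("values", [])]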
Slide 20: OpenDHT Applications
Application: what it uses OpenDHT for
- Croquet Media Manager: replica location
- DOA: indexing
- HIP: name resolution
- DTN Tetherless Computing Architecture: host mobility
- Place Lab: range queries
- QStream: multicast tree construction
- VPN Index
- DHT-Augmented Gnutella Client: rare object search
- FreeDB: storage
- Instant Messaging: rendezvous
- CFS
- i3: redirection
Slide 21: An Example Application: The CD Database
A client computes a disc fingerprint and asks the service whether it recognizes the fingerprint; if so, the service returns the album and track titles.
Slide 22: An Example Application: The CD Database (continued)
If the service answers "no such fingerprint", the user types in the album and track titles, and the client submits them to the service.
Slide 23: A DHT-Based FreeDB Cache
- FreeDB is a volunteer service; it has suffered outages as long as 48 hours, and service costs are borne largely by volunteer mirrors
- Idea: build a cache of FreeDB with a DHT (sketched below) to add to the availability of the main service
- Goal: explore how easy this is to do
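A minimal sketch of this cache-on-miss pattern, assuming a DHT client with put/get (as sketched earlier) and a query_freedb() helper that stands in for the real FreeDB protocol.

import hashlib

def lookup_disc(dht, disc_fingerprint, query_freedb):
    """Return disc info, trying the DHT cache first and falling back to FreeDB."""
    key = hashlib.sha1(disc_fingerprint.encode()).digest()
    cached = dht.get(key)
    if cached:
        return cached[0]                   # cache hit: served from the DHT
    info = query_freedb(disc_fingerprint)  # cache miss: ask the main service
    if info is not None:
        dht.put(key, info)                 # repopulate the cache for later clients
    return info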
Slide 24: Cache Illustration
Figure: the client looks up a disc fingerprint in the DHT and receives the disc info; new albums are put into the DHT as they are encountered.
Slide 25: Is Providing DHT Service Hard?
Is it any different from just running Bamboo? Yes: sharing makes the problem harder. OpenDHT is shared in two senses:
- Across applications, which requires a flexible interface
- Across clients, which requires resource allocation
Slide 26: Sharing Between Applications
- Must balance generality and ease of use: many applications want only simple put/get, while others want lookup, anycast, multicast, etc.
- OpenDHT allows only put/get, but a client-side library, ReDiR, builds the other primitives on top of it
- ReDiR supports lookup, anycast, multicast, and range search with only a constant latency increase on average
- (A different approach is taken by DimChord [KR04])
Slide 27: Sharing Between Clients
- Must authenticate puts/gets/removes: if two clients put with the same key, who wins? Who can remove an existing put?
- Must protect the system's resources, or malicious clients can deny service to others (the remainder of this talk)
Slide 28: Fair Storage Allocation
- Our solution: give each client a fair share ("fairness" is defined a few slides ahead)
- This limits the strength of malicious clients: they are only as powerful as they are numerous
- Storage is protected on each DHT node separately, since global fairness is hard
- Key-choice imbalance is a burden on the DHT, so clients that balance their key choices are rewarded
Slide 29: Two Main Challenges
- Making sure disk is available for new puts: as load changes over time, the system needs to adapt, and without some free disk our hands are tied
- Allocating free disk fairly across clients: adapt techniques from fair queuing
Slide 30: Making Sure Disk is Available
- Can't store values indefinitely, or all storage will eventually fill
- So add a time-to-live (TTL) to puts: put(key, value) becomes put(key, value, ttl); a toy TTL store is sketched below
- (A different approach is taken by Palimpsest [RH03])
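A toy per-node store showing how TTLs keep disk reclaimable; the eager expiration policy and data layout are assumptions made for the sketch.

import time

class TTLStore:
    """Toy key/value store in which every put carries a time-to-live."""

    def __init__(self):
        self.items = []  # list of (expiry_time, key, value)

    def put(self, key, value, ttl):
        self.items.append((time.time() + ttl, key, value))

    def get(self, key):
        self.expire()
        return [v for (_, k, v) in self.items if k == key]

    def expire(self):
        """Drop every value whose TTL has elapsed, freeing its space."""
        now = time.time()
        self.items = [item for item in self.items if item[0] > now]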
Slide 31: Making Sure Disk is Available
TTLs prevent long-term starvation, since eventually all puts expire. But short-term starvation is still possible: client A arrives and fills the entire disk, client B arrives and asks for space, and B starves until A's values start expiring.
Slide 32: Making Sure Disk is Available
Stronger condition: be able to accept r_min bytes/sec of new data at all times.
Figure: at any moment, the storage already committed (including a candidate put of a given size and TTL) plus the space reserved for future puts, a region growing with slope r_min, must sum to less than the maximum capacity.
Slide 33: Making Sure Disk is Available
The same condition illustrated graphically: a candidate put, with its size and TTL, is accepted only if adding it to the committed storage and the r_min reservation keeps the total under the maximum capacity at every future time. A code sketch of this admission test follows.
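A sketch of that admission test, checking each future second explicitly; the data layout and per-second granularity are simplifications for illustration.

def can_accept(committed, candidate_size, candidate_ttl, r_min, capacity, max_ttl):
    """Accept a put only if, at every future time t, the storage still
    committed (accepted puts that have not yet expired), plus the candidate
    put while its TTL lasts, plus r_min * t reserved for future puts,
    stays below the disk capacity.

    committed: list of (size, remaining_ttl) pairs for accepted, unexpired puts.
    """
    for t in range(int(max_ttl) + 1):
        live = sum(size for size, ttl in committed if ttl > t)
        if candidate_ttl > t:
            live += candidate_size
        if live + r_min * t > capacity:
            return False
    return True

# Example: a nearly full disk rejects a large, long-lived put.
# can_accept([(900, 50)], 200, 100, r_min=1, capacity=1000, max_ttl=100) -> False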
Slide 34: Fair Storage Allocation
Incoming puts are held in per-client queues. If a client's queue is full, its put is rejected; otherwise the put is enqueued. The node waits until it can accept a put without violating r_min, selects the put of the most under-represented client, stores it, and sends an accept message to the client. The big decision is how to define "most under-represented".
Slide 35: Defining "Most Under-Represented"
- We are not just sharing disk, but disk over time: a 1-byte put for 100 seconds costs the same as a 100-byte put for 1 second, so the unit is byte-seconds; call these commitments.
- Equalize the total commitments granted to each client? No: that leads to starvation. If client A fills the disk and client B then starts putting, B is served until it catches up with A, and now A starves for up to the maximum TTL.
Slide 36: Defining "Most Under-Represented"
Instead, equalize the rate at which commitments are granted, so that the service granted to one client depends only on the other clients putting "at the same time". In the previous scenario, once B catches up with A, A and B simply share the available rate.
Slide 37: Defining "Most Under-Represented"
Equalize the rate of commitments granted; the service granted to one client depends only on the other clients putting "at the same time". The mechanism is inspired by Start-time Fair Queuing: the node maintains a virtual time v(t), and the i-th put of client c, p_c^i, is assigned a start time S(p_c^i) and a finish time F(p_c^i):
  F(p_c^i) = S(p_c^i) + size(p_c^i) * ttl(p_c^i)
  S(p_c^i) = max(v(A(p_c^i)) - delta, F(p_c^(i-1)))
  v(t) = maximum start time of all accepted puts
where A(p) is the arrival time of put p and delta stands in for the offset term subtracted on the original slide.
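A sketch of the bookkeeping, using the standard start-time-fair-queuing rule of serving the queued put with the lowest start time; the offset term from the slide and several practical details are omitted.

class FairPutScheduler:
    """Per-client put queues with virtual-time (SFQ-style) selection."""

    def __init__(self):
        self.queues = {}        # client -> list of (size, ttl) puts waiting
        self.last_finish = {}   # client -> finish time of its previous accepted put
        self.v = 0.0            # virtual time: max start time of accepted puts

    def enqueue(self, client, size, ttl):
        self.queues.setdefault(client, []).append((size, ttl))

    def select_next(self):
        """Pick the put of the most under-represented client (lowest start time)."""
        best = None
        for client, puts in self.queues.items():
            if not puts:
                continue
            size, ttl = puts[0]
            start = max(self.v, self.last_finish.get(client, 0.0))
            if best is None or start < best[0]:
                best = (start, client, size, ttl)
        if best is None:
            return None
        start, client, size, ttl = best
        self.queues[client].pop(0)
        self.last_finish[client] = start + size * ttl  # F = S + size * ttl
        self.v = max(self.v, start)                    # v(t) = max accepted start time
        return client, size, ttl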
Slide 38: Fairness with Different Arrival Times
Slide 39: Fairness With Different Sizes and TTLs
Slide 40: Performance
- Only 28 of 7 million values were lost in 3 months, where "lost" means unavailable for a full hour
- On Feb. 7, 2005, 60 of 190 nodes were lost in 15 minutes to a PlanetLab kernel bug, yet only one value was lost
Slide 41: Performance
- Median get latency is ~250 ms, against a median inter-host RTT of ~140 ms
- But the 95th-percentile get latency is atrocious, and even the median spikes up from time to time
Slide 42: The Problem: Slow Nodes
- Some PlanetLab nodes are just really slow, and the set of slow nodes changes over time, so we can't "cherry pick" a set of fast nodes
- This seems to be the case on RON as well, and may even be true for managed clusters (MapReduce)
- We modified OpenDHT to be robust to such slowness, using a combination of delay-aware routing and redundancy
- Median get latency is now 66 ms and the 99th percentile is 320 ms, using 2x redundancy
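Not the actual OpenDHT change, but a sketch of the redundancy half of the idea: issue the same get through several gateways in parallel and return whichever answers first.

from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def redundant_get(clients, key, timeout=1.0):
    """Send the same get to several DHT gateways; return the first reply."""
    pool = ThreadPoolExecutor(max_workers=len(clients))
    futures = [pool.submit(c.get, key) for c in clients]
    done, _ = wait(futures, timeout=timeout, return_when=FIRST_COMPLETED)
    pool.shutdown(wait=False)  # don't keep waiting on the slow gateways
    for f in done:
        if f.exception() is None:
            return f.result()
    return None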