
1. Peer-to-Peer Networks (CS 561, Fall 2006)
Outline: Overview, Pastry, OpenDHT.
Contributions from Peter Druschel & Sean Rhea.

2. What is Peer-to-Peer About?
- Distribution
- Decentralized control
- Self-organization
- Symmetric communication
P2P networks do two things:
- Map objects onto nodes
- Route requests to the node responsible for a given object

3. Pastry
- Self-organizing overlay network
- Consistent hashing
- Lookup/insert of an object in fewer than log_16 N routing steps (expected)
- O(log N) per-node state
- Network locality heuristics

4. Object Distribution
- Consistent hashing [Karger et al. '97]
- 128-bit circular id space, from 0 to 2^128 - 1
- nodeIds assigned uniformly at random
- objIds chosen uniformly at random
- Invariant: the node with the numerically closest nodeId maintains the object
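
As a concrete illustration of that invariant, here is a minimal sketch of mapping names into the circular id space and finding the responsible node. The helper names are mine, and MD5 is used only because it happens to produce 128 bits; it is not Pastry's prescribed hash.

import hashlib

ID_BITS = 128
ID_SPACE = 1 << ID_BITS

def to_id(name: str) -> int:
    # map a name onto the 128-bit circular id space (illustrative hash choice)
    return int.from_bytes(hashlib.md5(name.encode()).digest(), "big")

def circular_distance(a: int, b: int) -> int:
    d = abs(a - b)
    return min(d, ID_SPACE - d)

def responsible_node(obj_id: int, node_ids: list[int]) -> int:
    # the invariant: the live node with the numerically closest nodeId keeps the object
    return min(node_ids, key=lambda n: circular_distance(n, obj_id))

nodes = [to_id(f"node-{i}") for i in range(8)]
print(hex(responsible_node(to_id("some-object"), nodes)))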

5. Object Insertion/Lookup
- A message with key X is routed to the live node whose nodeId is closest to X
- Problem: a complete routing table is not feasible

6. Routing: Distributed Hash Table
- Properties: log_16 N steps, O(log N) state
(Figure: a lookup for key d46a1c starting at node 65a1fc is forwarded through d13da3, d4213f, d462ba, and d467c4, matching a longer prefix of the key at each hop; d471f1 is a neighboring node.)

7. Leaf Sets
- Each node maintains the IP addresses of the nodes with the L numerically closest larger and smaller nodeIds, respectively
- Supports routing efficiency/robustness
- Fault detection (keep-alive)
- Application-specific local coordination

8. Routing Procedure
if (D is within range of our leaf set)
    forward to the numerically closest member
else
    let l = length of the shared prefix
    let d = value of the l-th digit in D's address
    if (RouteTab[l,d] exists)
        forward to RouteTab[l,d]
    else
        forward to a known node that
        (a) shares at least as long a prefix, and
        (b) is numerically closer than this node
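
A hedged Python rendering of that procedure. The data structures (route_tab, leaf_set, known) are assumptions about how a node might store its state, ids are fixed-length hex strings, and wrap-around at the ends of the circular id space is ignored for brevity.

def shared_prefix_len(a: str, b: str) -> int:
    # number of leading hex digits two ids have in common
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def next_hop(D: str, my_id: str, leaf_set: list[str], route_tab, known: list[str]) -> str:
    # D: destination key; leaf_set: ids of this node's leaf-set members;
    # route_tab[l][d]: a node sharing l digits with us whose next digit is d, or None
    ids = sorted(leaf_set + [my_id], key=lambda x: int(x, 16))
    if int(ids[0], 16) <= int(D, 16) <= int(ids[-1], 16):   # leaf-set range check
        # D falls within our leaf set: deliver to the numerically closest member
        return min(ids, key=lambda n: abs(int(n, 16) - int(D, 16)))
    l = shared_prefix_len(D, my_id)
    d = int(D[l], 16)
    if route_tab[l][d] is not None:
        return route_tab[l][d]
    # rare case: forward to any known node that shares at least as long a prefix
    # with D and is numerically closer to D than this node
    my_dist = abs(int(my_id, 16) - int(D, 16))
    for n in leaf_set + known:
        if shared_prefix_len(D, n) >= l and abs(int(n, 16) - int(D, 16)) < my_dist:
            return n
    return my_id  # no better candidate known: this node is responsible for D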

9. Routing
- Integrity of the overlay: guaranteed unless L/2 nodes with adjacent nodeIds fail simultaneously
- Number of routing hops with no failures: fewer than log_16 N expected, 128/b + 1 maximum (b is the digit size in bits; b = 4 gives the hexadecimal digits used here)
- During failure recovery: O(N) worst case, average case much better
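
For a sense of scale: with b = 4 and N = 10^6 nodes, the expected route length is log_16(10^6) ≈ 5 hops, while the failure-free maximum is 128/4 + 1 = 33 hops.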

10. Node Addition
(Figure: the new node d46a1c asks a nearby node, 65a1fc, to route a join message with key d46a1c; the message passes d13da3, d4213f, and d462ba and reaches d467c4, the existing node with the closest nodeId; d471f1 is a neighboring node.)

11. Node Departure (Failure)
- Leaf set members exchange keep-alive messages
- Leaf set repair (eager): request the set from the farthest live node in the set
- Routing table repair (lazy): get the table from peers in the same row, then from higher rows

12. PAST: Cooperative, Archival File Storage and Distribution
- Layered on top of Pastry
- Strong persistence
- High availability
- Scalability
- Reduced cost (no backup)
- Efficient use of pooled resources

13. PAST API
- Insert: store replicas of a file at k diverse storage nodes
- Lookup: retrieve the file from a nearby live storage node that holds a copy
- Reclaim: free the storage associated with a file
- Files are immutable
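
A minimal sketch of how these three calls might sit on top of a Pastry-style routing primitive. The pastry.route interface, the fileId derivation, and the message format are all assumptions for illustration, not PAST's actual implementation.

import hashlib

def file_id(name: str, owner: str) -> int:
    # derive a 128-bit fileId by hashing; the exact inputs hashed here are an assumption
    return int.from_bytes(hashlib.sha1(f"{owner}:{name}".encode()).digest()[:16], "big")

class PastClient:
    def __init__(self, pastry):
        self.pastry = pastry  # assumed Pastry handle exposing route(key, message)

    def insert(self, name: str, owner: str, data: bytes, k: int) -> int:
        fid = file_id(name, owner)
        # the message is routed to the node with nodeId closest to fid, which then
        # replicates the file on the k leaf-set members closest to fid
        self.pastry.route(fid, {"op": "insert", "data": data, "k": k})
        return fid

    def lookup(self, fid: int) -> bytes:
        # routed toward fid; any replica holder encountered on the way may answer
        return self.pastry.route(fid, {"op": "lookup"})

    def reclaim(self, fid: int) -> None:
        # frees the storage; files are immutable, so there is no update call
        self.pastry.route(fid, {"op": "reclaim"})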

14. PAST: File Storage
(Figure: a file is inserted under its fileId, and the insert message is routed toward the node with the closest nodeId.)

15. PAST: File Storage
- Storage invariant: file "replicas" are stored on the k nodes with nodeIds closest to the fileId (k is bounded by the leaf set size)
(Figure: an insert with k = 4 places replicas on the four nodes closest to the fileId.)
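
The invariant itself is short to state in code; a small sketch (assumed helper, reusing the circular-distance idea from the earlier consistent-hashing sketch):

ID_SPACE = 1 << 128

def k_closest(file_id: int, leaf_set_ids: list[int], k: int) -> list[int]:
    # the k nodes whose nodeIds are numerically closest to the fileId hold the replicas;
    # k is bounded by the leaf set size, so candidates come from the leaf set
    distance = lambda n: min(abs(n - file_id), ID_SPACE - abs(n - file_id))
    return sorted(leaf_set_ids, key=distance)[:k]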

16. PAST: File Retrieval
- A file is located in log_16 N steps (expected)
- The lookup usually finds the replica nearest to the client C
(Figure: client C issues Lookup(fileId); the request is routed toward the fileId and answered by one of the k replica holders.)

17. DHT Deployment Today
Every application deploys its own DHT (DHT as a library):
- CFS (MIT) on a Chord DHT
- PAST (MSR/Rice) on a Pastry DHT
- OStore (UCB) on a Tapestry DHT
- PIER (UCB) on a Bamboo DHT
- pSearch (HP) on a CAN DHT
- Coral (NYU) on a Kademlia DHT
- i3 (UCB) on a Chord DHT
- Overnet (open) on a Kademlia DHT
All of these run directly over IP, which provides only connectivity.

18. DHT Deployment Tomorrow?
OpenDHT: one DHT, shared across applications (DHT as a service):
- CFS (MIT), PAST (MSR/Rice), OStore (UCB), PIER (UCB), pSearch (HP), Coral (NYU), i3 (UCB), and Overnet (open) all share a single DHT layer
- The shared DHT provides indirection; IP underneath provides connectivity

19. Two Ways To Use a DHT
1. The library model
- DHT code is linked into the application binary
- Pros: flexibility, high performance
2. The service model
- DHT accessed as a service over RPC
- Pros: easier deployment, less maintenance

20. The OpenDHT Service
- 200-300 Bamboo [USENIX '04] nodes on PlanetLab, all in one slice, all managed by us
- Clients can be arbitrary Internet hosts; they access the DHT using RPC over TCP
- The interface is simple put/get:
  - put(key, value): stores value under key
  - get(key): returns all the values stored under key
- Running on PlanetLab since April 2004; building a community of users
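
A hedged sketch of what a client looks like from the application's point of view. The gateway URL, the XML-RPC framing, and the method signatures below are assumptions for illustration; the slide only specifies that clients reach the service with RPC over TCP and use put/get.

import xmlrpc.client

class OpenDHTClient:
    def __init__(self, gateway: str = "http://dht-gateway.example.org:5851/"):
        # hypothetical gateway endpoint, not a documented OpenDHT address
        self.proxy = xmlrpc.client.ServerProxy(gateway)

    def put(self, key: bytes, value: bytes) -> None:
        # stores value under key; values accumulate rather than overwrite
        self.proxy.put(key, value)

    def get(self, key: bytes) -> list:
        # returns all values currently stored under key
        return self.proxy.get(key)

Later slides add a TTL argument to put; the two-argument form above matches the interface as introduced here.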

21. OpenDHT Applications
Application: what it uses OpenDHT for
- Croquet Media Manager: replica location
- DOA: indexing
- HIP: name resolution
- DTN Tetherless Computing Architecture: host mobility
- Place Lab: range queries
- QStream: multicast tree construction
- VPN Index: indexing
- DHT-Augmented Gnutella Client: rare object search
- FreeDB: storage
- Instant Messaging: rendezvous
- CFS: storage
- i3: redirection

22. An Example Application: The CD Database
(Figure: the client computes a disc fingerprint and asks the server whether it recognizes the fingerprint; if so, the server returns the album and track titles.)

23. An Example Application: The CD Database
(Figure: if the server reports "no such fingerprint", the user types in the album and track titles, which are then submitted to the database.)

24. A DHT-Based FreeDB Cache
- FreeDB is a volunteer service: it has suffered outages as long as 48 hours, and service costs are borne largely by volunteer mirrors
- Idea: build a cache of FreeDB with a DHT, adding to the availability of the main service
- Goal: explore how easy this is to do

25. Cache Illustration
(Figure: new albums are put into the DHT keyed by disc fingerprint; a client's get with a disc fingerprint returns the cached disc info.)
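
A sketch of the cache logic in this figure, written against a put/get client like the one sketched after slide 20. The freedb.query fallback call and the encoding choices are assumptions, not the actual cache's code.

def cached_disc_info(dht, freedb, fingerprint: str):
    # 1. try the DHT cache first
    cached = dht.get(fingerprint.encode())
    if cached:
        return cached[0]
    # 2. miss: fall back to the main FreeDB service (freedb.query is an assumed client call)
    info = freedb.query(fingerprint)
    # 3. populate the cache so the next client with this disc avoids the fallback
    if info is not None:
        dht.put(fingerprint.encode(), info.encode())
    return info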

26. Is Providing DHT Service Hard?
- Is it any different from just running Bamboo? Yes: sharing makes the problem harder
- OpenDHT is shared in two senses:
  - Across applications, which calls for a flexible interface
  - Across clients, which calls for resource allocation

27. Sharing Between Applications
- Must balance generality and ease of use: many applications want only simple put/get, but others want lookup, anycast, multicast, etc.
- OpenDHT allows only put/get, but a client-side library, ReDiR, builds the others on top
- ReDiR supports lookup, anycast, multicast, and range search, with only a constant latency increase on average
- (A different approach is used by DimChord [KR04])
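
To give a flavor of layering a lookup primitive on plain put/get, here is a deliberately simplified sketch in the spirit of ReDiR's tree of key ranges. The key naming, the constants, the abstract dht object, and the fact that every node registers at every level are simplifications of my own, not the real ReDiR algorithm.

HASH_BITS = 32   # toy id space for illustration
LEVELS = 4       # depth of the tree of key ranges
BRANCH = 4       # each level splits every range into BRANCH pieces

def bucket_key(namespace: str, level: int, index: int) -> bytes:
    # DHT key naming one interval of the namespace tree (naming scheme is made up)
    return f"{namespace}/{level}/{index}".encode()

def interval_index(node_id: int, level: int) -> int:
    width = (1 << HASH_BITS) // (BRANCH ** level)
    return node_id // width

def join(dht, namespace: str, node_id: int) -> None:
    # register the node in the interval containing it at every level
    for level in range(LEVELS):
        dht.put(bucket_key(namespace, level, interval_index(node_id, level)), node_id)

def lookup(dht, namespace: str, k: int):
    # find the registered node responsible for key k: the smallest registered id >= k,
    # wrapping around to the overall smallest id if none is larger
    for level in reversed(range(LEVELS)):
        members = dht.get(bucket_key(namespace, level, interval_index(k, level)))
        successors = [m for m in members if m >= k]
        if successors:
            return min(successors)
    everyone = dht.get(bucket_key(namespace, 0, 0))
    return min(everyone) if everyone else None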

28. Sharing Between Clients
- Must authenticate puts/gets/removes: if two clients put with the same key, who wins? Who can remove an existing put?
- Must protect the system's resources, or malicious clients can deny service to others
- Resource protection is the subject of the remainder of this talk

29. Fair Storage Allocation
- Our solution: give each client a fair share ("fairness" is defined a few slides later)
- This limits the strength of malicious clients: they are only as powerful as they are numerous
- Storage is protected on each DHT node separately: global fairness is hard, key-choice imbalance is a burden on the DHT, and clients that balance their key choices are rewarded

30. Two Main Challenges
1. Making sure disk is available for new puts: as load changes over time, the system must adapt; without some free disk, its hands are tied
2. Allocating free disk fairly across clients: adapt techniques from fair queuing

31. Making Sure Disk is Available
- Values cannot be stored indefinitely, otherwise all storage will eventually fill
- Add a time-to-live (TTL) to puts: put(key, value) becomes put(key, value, ttl)
- (A different approach is used by Palimpsest [RH03])

32. Making Sure Disk is Available
- TTLs prevent long-term starvation, since eventually all puts expire
- But short-term starvation is still possible
(Timeline: client A arrives and fills the entire disk; client B arrives and asks for space; B starves until A's values start expiring.)

33. Making Sure Disk is Available
- Stronger condition: be able to accept r_min bytes/sec of new data at all times
(Figure: space vs. time, from now onward. The bytes already stored expire over their TTLs; on top of them, space with slope r_min is reserved for future puts. A candidate put of a given size and TTL is accepted only if the sum stays below the maximum capacity at all times.)
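
A sketch of the admission test this picture describes. The data layout and parameter names are assumptions, and the real system's bookkeeping is more involved; the idea is that accepting a put must leave room to keep absorbing r_min bytes/sec of new data until everything currently stored has expired.

def can_accept(stored, candidate_size, candidate_ttl, now, r_min, capacity, max_ttl):
    # stored: list of (size_bytes, expiry_time) pairs for puts currently on disk.
    puts = stored + [(candidate_size, now + candidate_ttl)]
    # the sum peaks just before an expiry or at the horizon, so test only those instants
    checkpoints = sorted({expiry for _, expiry in puts} | {now + max_ttl})
    for t in checkpoints:
        live_bytes = sum(size for size, expiry in puts if expiry >= t)
        reserved = r_min * (t - now)  # space promised to future puts up to time t
        if live_bytes + reserved > capacity:
            return False
    return True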

34. Making Sure Disk is Available
- Stronger condition: be able to accept r_min bytes/sec of new data at all times
(Figure: two space-vs-time panels contrasting a candidate put's size and TTL against the maximum capacity.)

35. Fair Storage Allocation
- Puts go into per-client put queues; if a client's queue is full, its put is rejected, otherwise it is enqueued
- The system selects the most under-represented client, waits until the put can be accepted without violating r_min, then stores the value and sends an accept message to the client
- The big decision: the definition of "most under-represented"

36. Defining "Most Under-Represented"
- We are not just sharing disk, but disk over time: a 1-byte put for 100 seconds is the same as a 100-byte put for 1 second, so the units are byte-seconds, called commitments
- Equalize the total commitments granted? No: that leads to starvation
(Timeline: A fills the entire disk, B starts putting, B catches up with A, and then A starves for up to the maximum TTL.)

37. Defining "Most Under-Represented"
- Instead, equalize the rate at which commitments are granted: the service granted to one client depends only on the others putting "at the same time"
(Timeline: A fills the entire disk, B arrives and asks for space, B catches up with A, and then A and B share the available rate.)

38. Defining "Most Under-Represented"
- Instead, equalize the rate at which commitments are granted: the service granted to one client depends only on the others putting "at the same time"
- The mechanism is inspired by Start-time Fair Queuing:
  - Keep a virtual time v(t)
  - Each put p_c^i (the i-th put of client c) gets a start time S(p_c^i) and a finish time F(p_c^i)
  - F(p_c^i) = S(p_c^i) + size(p_c^i) * ttl(p_c^i)
  - S(p_c^i) = max(v(A(p_c^i)) - δ, F(p_c^(i-1))), where A(p) is the arrival time of p and δ is a small constant
  - v(t) = the maximum start time of all accepted puts
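
A compact sketch of that accounting. The variable names follow the slide, but two things are simplifications of my own rather than the exact OpenDHT algorithm: the current virtual time stands in for v at the put's arrival, and "most under-represented" is taken to mean the queue head with the lowest start tag, as in Start-time Fair Queuing.

class FairCommitmentTracker:
    # Commitments are measured in byte-seconds, i.e. size(p) * ttl(p).

    def __init__(self, delta: float = 0.0):
        self.v = 0.0            # virtual time: max start tag among accepted puts
        self.last_finish = {}   # per-client finish tag of that client's previous put
        self.delta = delta      # bounds how far an idle client may fall behind

    def tags(self, client, size, ttl):
        # simplification: uses the current virtual time rather than v at arrival
        start = max(self.v - self.delta, self.last_finish.get(client, 0.0))
        return start, start + size * ttl

    def accept(self, client, size, ttl):
        start, finish = self.tags(client, size, ttl)
        self.last_finish[client] = finish
        self.v = max(self.v, start)
        return start, finish

    def most_under_represented(self, queue_heads):
        # queue_heads: list of (client, size, ttl), one per non-empty client queue;
        # serve the put with the lowest start tag
        return min(queue_heads, key=lambda q: self.tags(*q)[0])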

39. Fairness with Different Arrival Times

40. Fairness with Different Sizes and TTLs

41. Performance
- Only 28 of 7 million values were lost in 3 months, where "lost" means unavailable for a full hour
- On Feb. 7, 2005, 60 of 190 nodes were lost in 15 minutes to a PlanetLab kernel bug, yet only one value was lost

42. Performance
- Median get latency is ~250 ms, while the median RTT between hosts is ~140 ms
- But the 95th-percentile get latency is atrocious, and even the median spikes up from time to time

43. The Problem: Slow Nodes
- Some PlanetLab nodes are just really slow, and the set of slow nodes changes over time, so we can't "cherry pick" a set of fast nodes
- The same seems to be the case on RON, and it may even be true for managed clusters (MapReduce)
- We modified OpenDHT to be robust to such slowness, using a combination of delay-aware routing and redundancy
- The median get latency is now 66 ms, and the 99th percentile is 320 ms, using 2x redundancy
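
To give a flavor of the redundancy half of that fix, here is a hedged sketch of issuing the same get along multiple paths and keeping the first answer. The gateway objects and their get method are assumptions; this is not OpenDHT's code.

import concurrent.futures as cf

def redundant_get(gateways, key, timeout=2.0):
    # send the same get to several gateways and take the first successful reply;
    # slower requests are abandoned (their threads finish in the background)
    pool = cf.ThreadPoolExecutor(max_workers=len(gateways))
    futures = [pool.submit(gw.get, key) for gw in gateways]
    try:
        for f in cf.as_completed(futures, timeout=timeout):
            if f.exception() is None:
                return f.result()
    except cf.TimeoutError:
        pass
    finally:
        pool.shutdown(wait=False)
    return None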

