Peer-to-Peer Networks (CS 561, Fall 2006)
Outline:
– Overview
– Pastry
– OpenDHT
Contributions from Peter Druschel & Sean Rhea

What is Peer-to-Peer About?
– Distribution
– Decentralized control
– Self-organization
– Symmetric communication
P2P networks do two things:
– Map objects onto nodes
– Route requests to the node responsible for a given object

Pastry
– Self-organizing overlay network
– Consistent hashing
– Lookup/insert of an object in < log_16 N routing steps (expected)
– O(log N) per-node state
– Network locality heuristics

Object Distribution
– Consistent hashing [Karger et al. '97]
– 128-bit circular id space
– nodeIds assigned uniformly at random
– objIds chosen uniformly at random
– Invariant: the node with the numerically closest nodeId maintains the object
(Figure: objIds and nodeIds placed on the circular id space)
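As a concrete illustration of the placement invariant (not from the slides), here is a minimal Python sketch: given a set of nodeIds on a 128-bit circular space, an object is stored at the node whose nodeId is numerically closest to its objId. Deriving ids by hashing names with SHA-1 and truncating to 128 bits is an assumption made only for this example.

```python
import hashlib

ID_BITS = 128
ID_SPACE = 1 << ID_BITS

def to_id(name: str) -> int:
    """Map a name to a 128-bit id (assumed hash function, for illustration only)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % ID_SPACE

def circular_distance(a: int, b: int) -> int:
    """Numerical distance on the circular id space."""
    d = abs(a - b)
    return min(d, ID_SPACE - d)

def responsible_node(obj_id: int, node_ids: list[int]) -> int:
    """Invariant: the node with the numerically closest nodeId maintains the object."""
    return min(node_ids, key=lambda n: circular_distance(n, obj_id))

# Example: three nodes, one object
nodes = [to_id(f"node-{i}") for i in range(3)]
print(hex(responsible_node(to_id("some-object"), nodes)))
```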

Object Insertion/Lookup
– A message with key X is routed to the live node with the nodeId closest to X
– Problem: a complete routing table is not feasible
(Figure: Route(X) traversing the id circle)

Routing: Distributed Hash Table
Properties:
– log_16 N steps
– O(log N) state
(Figure: Route(d46a1c) forwarded through nodes 65a1fc, d13da3, d4213f, d462ba, d467c4, d471f1)

Leaf Sets
Each node maintains the IP addresses of the nodes with the L numerically closest larger and smaller nodeIds, respectively. Used for:
– routing efficiency/robustness
– fault detection (keep-alive)
– application-specific local coordination
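A small sketch of what a leaf set looks like, assuming a global view of nodeIds purely for illustration (a real node learns this set incrementally from its neighbors):

```python
def leaf_set(my_id: int, all_ids: list[int], L: int, id_space: int = 1 << 128) -> list[int]:
    """Return the L/2 numerically closest smaller and L/2 closest larger nodeIds (sketch)."""
    others = [n for n in all_ids if n != my_id]
    # Clockwise distance picks the closest larger ids; counter-clockwise the closest smaller.
    larger = sorted(others, key=lambda n: (n - my_id) % id_space)[: L // 2]
    smaller = sorted(others, key=lambda n: (my_id - n) % id_space)[: L // 2]
    return smaller + larger
```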

Routing Procedure
if (D is within range of our leaf set)
    forward to numerically closest member
else
    let l = length of shared prefix
    let d = value of l-th digit in D's address
    if (RouteTab[l,d] exists)
        forward to RouteTab[l,d]
    else
        forward to a known node that
        (a) shares at least as long a prefix
        (b) is numerically closer than this node
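The following is a minimal Python sketch of this forwarding rule, assuming hex-string ids of equal length, a leaf set, and a routing table indexed by (prefix length, next digit); the data structures and helper names are illustrative, not Pastry's actual code.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading hex digits the two ids have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def next_hop(dest: str, my_id: str, leaf_set: list[str],
             route_tab: dict[tuple[int, str], str],
             known_nodes: list[str]) -> str:
    """Pastry-style forwarding decision (sketch; ids are hex strings of equal length)."""
    def num_dist(a: str, b: str) -> int:
        return abs(int(a, 16) - int(b, 16))

    # Case 1: destination falls within the range covered by our leaf set.
    if leaf_set:
        lo = min(int(n, 16) for n in leaf_set)
        hi = max(int(n, 16) for n in leaf_set)
        if lo <= int(dest, 16) <= hi:
            return min(leaf_set + [my_id], key=lambda n: num_dist(n, dest))

    # Case 2: use the routing table entry for the first digit where dest differs from us.
    l = shared_prefix_len(dest, my_id)
    if l < len(dest):
        d = dest[l]
        if (l, d) in route_tab:
            return route_tab[(l, d)]

    # Case 3: fall back to any known node that shares at least as long a prefix
    # and is numerically closer to dest than we are.
    for n in known_nodes:
        if shared_prefix_len(dest, n) >= l and num_dist(n, dest) < num_dist(my_id, dest):
            return n
    return my_id  # we are the closest node we know of
```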

Routing
– Integrity of overlay: guaranteed unless L/2 nodes with adjacent nodeIds fail simultaneously
– Number of routing hops:
  – No failures: < log_16 N expected, 128/b + 1 max
  – During failure recovery: O(N) worst case, average case much better

Node Addition
– New node: d46a1c
(Figure: the new node's join message, Route(d46a1c), is forwarded through nodes 65a1fc, d13da3, d4213f, d462ba, d467c4, d471f1)

Node Departure (Failure)
– Leaf set members exchange keep-alive messages
– Leaf set repair (eager): request the set from the farthest live node in the set
– Routing table repair (lazy): get table entries from peers in the same row, then from higher rows

PAST: Cooperative, Archival File Storage and Distribution
– Layered on top of Pastry
– Strong persistence
– High availability
– Scalability
– Reduced cost (no backup)
– Efficient use of pooled resources

PAST API
– Insert: store replicas of a file at k diverse storage nodes
– Lookup: retrieve the file from a nearby live storage node that holds a copy
– Reclaim: free the storage associated with a file
– Files are immutable
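A hedged sketch of what this three-call interface might look like from the client side; the class, the fileId derivation, and the `overlay.route` handle are illustrative assumptions, not PAST's actual implementation.

```python
import hashlib

class PastClient:
    """Illustrative client-side view of the PAST operations (not the real implementation)."""

    def __init__(self, overlay):
        self.overlay = overlay  # assumed handle to a Pastry routing layer

    def insert(self, name: str, owner_key: bytes, k: int, data: bytes) -> bytes:
        """Store replicas of an immutable file on the k nodes closest to its fileId."""
        file_id = hashlib.sha1(name.encode() + owner_key).digest()
        self.overlay.route(file_id, ("INSERT", k, data))
        return file_id

    def lookup(self, file_id: bytes) -> bytes:
        """Retrieve the file from a nearby live node holding a replica."""
        return self.overlay.route(file_id, ("LOOKUP",))

    def reclaim(self, file_id: bytes, owner_key: bytes) -> None:
        """Free the storage associated with a file."""
        self.overlay.route(file_id, ("RECLAIM", owner_key))
```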

PAST: File Storage
(Figure: Insert fileId on the circular id space)

PAST: File Storage
– Storage invariant: file "replicas" are stored on the k nodes with nodeIds closest to fileId
– k is bounded by the leaf set size
(Figure: Insert fileId with k=4 replicas)
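A minimal sketch of the storage invariant, again assuming a global view of nodeIds only for illustration: replicas go to the k nodes whose nodeIds are numerically closest to the fileId.

```python
import random

def replica_nodes(file_id: int, node_ids: list[int], k: int,
                  id_space: int = 1 << 128) -> list[int]:
    """Return the k nodeIds numerically closest to file_id on the circular id space."""
    def dist(n: int) -> int:
        d = abs(n - file_id)
        return min(d, id_space - d)
    return sorted(node_ids, key=dist)[:k]

# Example: place 4 replicas among 10 nodes (k must not exceed the leaf set size).
nodes = random.sample(range(1 << 128), 10)
print([hex(n) for n in replica_nodes(random.getrandbits(128), nodes, k=4)])
```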

PAST: File Retrieval
– File located in log_16 N steps (expected)
– Usually locates the replica nearest to the client
(Figure: client C issues Lookup fileId and reaches one of the k replicas)

DHT Deployment Today
– Every application deploys its own DHT (DHT as a library)
(Figure: each application atop its own DHT over IP connectivity: CFS (MIT) on Chord, PAST (MSR/Rice) on Pastry, OStore (UCB) on Tapestry, PIER (UCB) on Bamboo, pSearch (HP) on CAN, Coral (NYU) on Kademlia, i3 (UCB) on Chord, Overnet (open) on Kademlia)

DHT Deployment Tomorrow?
– OpenDHT: one DHT, shared across applications (DHT as a service)
(Figure: the same applications, CFS, PAST, OStore, PIER, pSearch, Coral, i3, Overnet, sharing a single DHT indirection layer over IP connectivity)

Two Ways To Use a DHT
1. The library model
   – DHT code is linked into the application binary
   – Pros: flexibility, high performance
2. The service model
   – DHT accessed as a service over RPC
   – Pros: easier deployment, less maintenance

The OpenDHT Service
– Bamboo [USENIX'04] nodes on PlanetLab
  – All in one slice, all managed by us
– Clients can be arbitrary Internet hosts
  – Access the DHT using RPC over TCP
– Interface is simple put/get:
  – put(key, value): stores value under key
  – get(key): returns all the values stored under key
– Running on PlanetLab since April 2004
  – Building a community of users
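To make the service model concrete, here is a hedged sketch of a put/get client talking to a gateway over TCP. The JSON framing, gateway hostname, and port are hypothetical stand-ins for illustration; they are not OpenDHT's actual wire protocol.

```python
import json
import socket

GATEWAY = ("opendht-gateway.example.org", 5851)  # hypothetical gateway address and port

def _rpc(request: dict) -> dict:
    """Send one JSON request over TCP and read one JSON reply (illustrative framing only)."""
    with socket.create_connection(GATEWAY) as sock:
        sock.sendall(json.dumps(request).encode() + b"\n")
        reply = sock.makefile().readline()
    return json.loads(reply)

def put(key: bytes, value: bytes) -> bool:
    """Store value under key; the deployed service also takes a TTL (see later slides)."""
    r = _rpc({"op": "put", "key": key.hex(), "value": value.hex()})
    return r.get("status") == "ok"

def get(key: bytes) -> list[bytes]:
    """Return all values stored under key."""
    r = _rpc({"op": "get", "key": key.hex()})
    return [bytes.fromhex(v) for v in r.get("values", [])]
```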

OpenDHT Applications
Application: uses OpenDHT for
– Croquet Media Manager: replica location
– DOA: indexing
– HIP: name resolution
– DTN Tetherless Computing Architecture: host mobility
– Place Lab: range queries
– QStream: multicast tree construction
– VPN Index: indexing
– DHT-Augmented Gnutella Client: rare object search
– FreeDB: storage
– Instant Messaging: rendezvous
– CFS: storage
– i3: redirection

An Example Application: The CD Database
(Figure: the client computes a disc fingerprint, the service recognizes the fingerprint and returns the album & track titles)

An Example Application: The CD Database
(Figure: when the fingerprint is unknown, "no such fingerprint", the user types in the album and track titles)

A DHT-Based FreeDB Cache
– FreeDB is a volunteer service
  – Has suffered outages as long as 48 hours
  – Service costs are borne largely by volunteer mirrors
– Idea: build a cache of FreeDB with a DHT
  – Adds to the availability of the main service
  – Goal: explore how easy this is to do

Cache Illustration
(Figure: new albums are put into the DHT keyed by disc fingerprint; clients get disc info from the DHT by disc fingerprint)
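A small sketch of the read path this cache implies, assuming `dht` and `freedb` client objects with `get`/`put` and `query` methods; these names are hypothetical, not the actual FreeDB-cache code.

```python
def lookup_disc(fingerprint: bytes, dht, freedb) -> bytes | None:
    """Read-through cache: try the DHT first, fall back to FreeDB and populate the cache."""
    values = dht.get(fingerprint)
    if values:
        return values[0]                      # cache hit
    disc_info = freedb.query(fingerprint)     # fall back to the main FreeDB service
    if disc_info is not None:
        dht.put(fingerprint, disc_info)       # populate the cache for later readers
    return disc_info
```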

Is Providing DHT Service Hard?
– Is it any different than just running Bamboo?
  – Yes, sharing makes the problem harder
– OpenDHT is shared in two senses
  – Across applications: need a flexible interface
  – Across clients: need resource allocation

Sharing Between Applications
– Must balance generality and ease of use
  – Many apps want only simple put/get
  – Others want lookup, anycast, multicast, etc.
– OpenDHT allows only put/get
  – But a client-side library, ReDiR, can build the others
  – Supports lookup, anycast, multicast, range search
  – Only a constant latency increase on average
  – (Different approach used by DimChord [KR04])

Sharing Between Clients
– Must authenticate puts/gets/removes
  – If two clients put with the same key, who wins?
  – Who can remove an existing put?
– Must protect the system's resources
  – Or malicious clients can deny service to others
  – The remainder of this talk

Fair Storage Allocation
– Our solution: give each client a fair share
  – Will define "fairness" in a few slides
– Limits the strength of malicious clients
  – Only as powerful as they are numerous
– Protect storage on each DHT node separately
  – Global fairness is hard
  – Key-choice imbalance is a burden on the DHT
  – Reward clients that balance their key choices

Two Main Challenges
1. Making sure disk is available for new puts
   – As load changes over time, we need to adapt
   – Without some free disk, our hands are tied
2. Allocating free disk fairly across clients
   – Adapt techniques from fair queuing

Making Sure Disk is Available
– Can't store values indefinitely
  – Otherwise all storage will eventually fill
– Add a time-to-live (TTL) to puts
  – put(key, value) becomes put(key, value, ttl)
  – (Different approach used by Palimpsest [RH03])

Making Sure Disk is Available
– TTLs prevent long-term starvation
  – Eventually all puts will expire
– Can still get short-term starvation:
(Timeline: client A arrives and fills the entire disk; client B arrives and asks for space; B starves until A's values start expiring)

Making Sure Disk is Available
– Stronger condition: be able to accept r_min bytes/sec of new data at all times
(Figure: space vs. time; storage reserved for future puts grows with slope r_min, a candidate put occupies a size × TTL rectangle, and the sum must stay below the maximum capacity)

Making Sure Disk is Available
– Stronger condition: be able to accept r_min bytes/sec of new data at all times
(Figure: two space-vs-time panels showing candidate puts of a given size and TTL against the capacity limit)
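A hedged sketch of the admission test this condition suggests, under stated assumptions: we track how many bytes of already-accepted data will still be stored at each future instant, and accept a new put only if, at every instant, stored bytes plus the r_min reservation plus the candidate put stay below capacity. Discretizing time into one-second steps and the `stored_bytes_at` callback are simplifications for illustration, not the deployed algorithm.

```python
def can_accept(put_size: int, put_ttl: int, stored_bytes_at, capacity: int,
               r_min: float, max_ttl: int) -> bool:
    """Admission-test sketch: accept only if the put fits under capacity at every
    future instant, after reserving r_min bytes/sec for puts that may arrive later.

    stored_bytes_at(t): bytes of already-accepted data still stored t seconds from now.
    """
    for t in range(max_ttl + 1):
        in_use = stored_bytes_at(t)
        reserved = r_min * t                      # reservation grows with slope r_min
        new_bytes = put_size if t <= put_ttl else 0
        if in_use + reserved + new_bytes > capacity:
            return False
    return True
```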

Fair Storage Allocation
– Per-client put queues
  – Queue full: reject the put
  – Not full: enqueue the put
– Select the most under-represented client
– Wait until the put can be accepted without violating r_min
– Store the value and send an accept message to the client
– The big decision: the definition of "most under-represented"

Defining "Most Under-Represented"
– Not just sharing disk, but disk over time
  – A 1-byte put for 100 s is the same as a 100-byte put for 1 s
  – So the units are byte-seconds; call them commitments
– Equalize total commitments granted?
  – No: leads to starvation
  – A fills the disk, B starts putting, A starves for up to the max TTL
(Timeline: client A arrives and fills the entire disk; client B arrives and asks for space; B catches up with A; now A starves)

Defining "Most Under-Represented"
– Instead, equalize the rate of commitments granted
  – Service granted to one client depends only on others putting "at the same time"
(Timeline: client A arrives and fills the entire disk; client B arrives and asks for space; B catches up with A; then A & B share the available rate)

Defining "Most Under-Represented"
– Instead, equalize the rate of commitments granted
  – Service granted to one client depends only on others putting "at the same time"
– Mechanism inspired by Start-time Fair Queuing
  – Maintain a virtual time, v(t)
  – Each put gets a start time S(p_c^i) and a finish time F(p_c^i):
    F(p_c^i) = S(p_c^i) + size(p_c^i) × ttl(p_c^i)
    S(p_c^i) = max(v(A(p_c^i)) − δ, F(p_c^{i−1}))
    v(t) = maximum start time of all accepted puts
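A hedged sketch of the bookkeeping these formulas imply, assuming a fixed offset δ and a per-client record of the previous finish tag; the class and method names are illustrative, not the deployed scheduler.

```python
from dataclasses import dataclass, field

@dataclass
class FairPutScheduler:
    """Start-time-fair-queuing-style tags for puts (sketch of the slide's formulas)."""
    delta: float = 0.0                                 # offset subtracted from virtual time
    virtual_time: float = 0.0                          # v(t): max start tag of accepted puts
    last_finish: dict = field(default_factory=dict)    # F(p_c^{i-1}) per client

    def start_and_finish(self, client: str, size: float, ttl: float) -> tuple[float, float]:
        """Compute S and F for the client's next accepted put and advance virtual time."""
        s = max(self.virtual_time - self.delta, self.last_finish.get(client, 0.0))
        f = s + size * ttl                             # commitment is size x ttl (byte-seconds)
        self.last_finish[client] = f
        self.virtual_time = max(self.virtual_time, s)
        return s, f

    def most_under_represented(self, waiting_clients: list) -> str:
        """Pick the client whose next put would get the smallest start tag."""
        return min(
            waiting_clients,
            key=lambda c: max(self.virtual_time - self.delta,
                              self.last_finish.get(c, 0.0)),
        )
```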

Fairness with Different Arrival Times
(Figure omitted in transcript)

Fairness with Different Sizes and TTLs
(Figure omitted in transcript)

Performance
– Only 28 of 7 million values lost in 3 months
  – Where "lost" means unavailable for a full hour
– On Feb. 7, 2005, lost 60 of 190 nodes in 15 minutes to a PlanetLab kernel bug, yet lost only one value

Performance
– Median get latency ~250 ms
  – Median RTT between hosts ~140 ms
– But the 95th-percentile get latency is atrocious
  – And even the median spikes up from time to time

The Problem: Slow Nodes
– Some PlanetLab nodes are just really slow
  – But the set of slow nodes changes over time
  – Can't "cherry-pick" a set of fast nodes
  – Seems to be the case on RON as well
  – May even be true for managed clusters (MapReduce)
– Modified OpenDHT to be robust to such slowness
  – Combination of delay-aware routing and redundancy
  – Median now 66 ms, 99th percentile 320 ms using 2x redundancy