1 CS 268: Lecture 22 DHT Applications Ion Stoica Computer Science Division Department of Electrical Engineering and Computer Sciences University of California, Berkeley (Presentation based on slides from Robert Morris and Sean Rhea)

2 Outline  Cooperative File System (CFS)  Open DHT

3 Target CFS Uses  Serving data with inexpensive hosts: -open-source distributions -off-site backups -tech report archive -efficient sharing of music [Figure: nodes serving content to each other across the Internet]

4 How to mirror open-source distributions?  Multiple independent distributions -Each has high peak load, low average  Individual servers are wasteful  Solution: aggregate -Option 1: single powerful server -Option 2: distributed service But how do you find the data?

5 Design Challenges  Avoid hot spots  Spread storage burden evenly  Tolerate unreliable participants  Fetch speed comparable to whole-file TCP  Avoid O(#participants) algorithms -Centralized mechanisms [Napster], broadcasts [Gnutella]  CFS solves these challenges

6 CFS Architecture  Each node is a client and a server  Clients can support different interfaces -File system interface -Music keyword search [Figure: each node runs a client and a server and communicates with other nodes over the Internet]

7 Client-server interface  Files have unique names  Files are read-only (single writer, many readers)  Publishers split files into blocks  Clients check files for authenticity [Figure: the client's file system inserts and looks up file f, while the servers insert and look up the individual blocks]
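The authenticity check works because CFS names its content blocks by a SHA-1 hash of their contents (the root block is signed by the publisher), so a client can re-hash whatever it fetches and compare. A minimal sketch of that idea; the block size and helper names are illustrative, not CFS's actual code:

```python
import hashlib

BLOCK_SIZE = 8192  # illustrative block size, not necessarily what CFS uses

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> dict[str, bytes]:
    """Split a file into fixed-size blocks, each named by the SHA-1 of its contents."""
    blocks = {}
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        blocks[hashlib.sha1(block).hexdigest()] = block
    return blocks

def verify_block(block_id: str, block: bytes) -> bool:
    """A client re-hashes a fetched block and checks it against the ID it asked for."""
    return hashlib.sha1(block).hexdigest() == block_id
```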

8 Server Structure  DHash stores, balances, replicates, and caches blocks  DHash uses Chord [SIGCOMM 2001] to locate blocks [Figure: each node layers DHash on top of Chord]

9 Chord Hashes a Block ID to its Successor  Nodes and blocks have randomly distributed IDs on a circular ID space  Successor: node with next highest ID [Figure: ring with nodes N10, N32, N60, N80, N100; blocks B11 and B30 map to N32; B33, B40, B52 to N60; B65, B70 to N80; B100 to N100; and B112, B120, …, B10 wrap around to N10]
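A minimal sketch of the successor rule on the circular ID space. A sorted list stands in for the ring; real Chord resolves successors through finger tables rather than a global node list, and the tiny IDs in the example are only for readability:

```python
import bisect
import hashlib

ID_BITS = 160  # Chord IDs come from SHA-1, so the real ring has 2**160 positions

def chord_id(name: str) -> int:
    """Hash a node or block name onto the circular ID space."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

def successor(block_id: int, node_ids: list[int], id_bits: int = ID_BITS) -> int:
    """Return the first node at or after block_id going clockwise (wrapping around)."""
    ring = sorted(node_ids)
    i = bisect.bisect_left(ring, block_id % 2**id_bits)
    return ring[i % len(ring)]

# Tiny hand-picked example matching the slide's ring:
nodes = [10, 32, 60, 80, 100]
assert successor(33, nodes, id_bits=7) == 60    # B33 is stored at N60
assert successor(112, nodes, id_bits=7) == 10   # B112 wraps around to N10
```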

10 DHash/Chord Interface  lookup(blockID) returns a list of node IDs closer in ID space to the block ID -Sorted, closest first [Figure: the server's DHash layer calls Lookup(blockID) on its Chord layer, which uses its finger table to build the list of nodes]
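One way to read "sorted, closest first", assuming "closer" means a smaller clockwise distance from the block ID (so the first entry is the block's successor); this is an illustration, not the DHash code:

```python
def lookup_candidates(block_id: int, node_ids: list[int], id_bits: int = 160) -> list[int]:
    """Nodes ordered by how soon they appear clockwise after the block ID;
    the first entry is the block's successor."""
    ring_size = 2**id_bits
    return sorted(node_ids, key=lambda n: (n - block_id) % ring_size)

# With the slide's ring, a lookup for block 33 would return N60 first:
print(lookup_candidates(33, [10, 32, 60, 80, 100], id_bits=7))  # [60, 80, 100, 10, 32]
```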

11 DHash Uses Other Nodes to Locate Blocks [Figure: a Lookup(BlockID=45) issued at one node is forwarded through intermediate nodes on the ring until it reaches the node responsible for the block]

12 Storing Blocks  Long-term blocks are stored for a fixed time -Publishers need to refresh periodically  Cache uses LRU [Figure: a node's disk holds an LRU cache alongside long-term block storage]
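The cache half of that disk ("Cache uses LRU") can be sketched with an ordered map; evicting by block count rather than bytes is a simplification, and the class is illustrative rather than DHash's implementation:

```python
from collections import OrderedDict

class LRUBlockCache:
    """Evicts the least-recently-used block once the cache holds max_blocks entries."""
    def __init__(self, max_blocks: int):
        self.max_blocks = max_blocks
        self.blocks = OrderedDict()

    def get(self, block_id):
        if block_id not in self.blocks:
            return None
        self.blocks.move_to_end(block_id)    # mark as recently used
        return self.blocks[block_id]

    def put(self, block_id, data):
        self.blocks[block_id] = data
        self.blocks.move_to_end(block_id)
        if len(self.blocks) > self.max_blocks:
            self.blocks.popitem(last=False)  # drop the least recently used block
```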

13 Replicate blocks at r successors  Node IDs are SHA-1 of IP address  Ensures independent replica failure [Figure: Block 17 is stored at its successor and replicated at the next nodes clockwise on the ring]
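A sketch of the placement rule: the block's successor plus the next r-1 nodes clockwise hold the replicas. The helper names and small IDs are illustrative; in the real system the ring positions come from SHA-1 of each node's IP address, as the slide says:

```python
import hashlib

def node_id(ip: str) -> int:
    """Node IDs are the SHA-1 of the node's IP address."""
    return int.from_bytes(hashlib.sha1(ip.encode()).digest(), "big")

def replica_nodes(block_id: int, node_ids: list[int], r: int) -> list[int]:
    """The block's successor plus the next r-1 nodes clockwise hold the r replicas."""
    ring = sorted(node_ids)
    start = next((i for i, n in enumerate(ring) if n >= block_id), 0)  # successor index
    return [ring[(start + k) % len(ring)] for k in range(min(r, len(ring)))]

# Small-ID example echoing the slide's ring; real positions would be node_id(ip) values.
ring = [5, 10, 20, 40, 50, 60, 68, 80, 99, 110]
assert replica_nodes(17, ring, r=3) == [20, 40, 50]  # Block 17's home plus two replicas
```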

14 Lookups find replicas [Figure: a lookup for Block 17 involves four RPCs: 1. Lookup step 2. Get successor list 3. Failed block fetch 4. Block fetch from another replica]

15 First Live Successor Manages Replicas  Node can locally determine that it is the first live successor [Figure: Block 17 on the ring, with a copy of 17 held at a later successor such as N68]

16 DHash Copies to Caches Along Lookup Path [Figure: a Lookup(BlockID=45) involves four RPCs: 1. Chord lookup 2. Chord lookup 3. Block fetch 4. Send to cache, which places a copy of the block at a node on the lookup path]
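One way to realize "send to cache": after the block is fetched, push a copy to the nodes the lookup passed through, so later lookups can stop early. A toy sketch with stand-in Node objects, not the real DHash RPC interface:

```python
class Node:
    """A toy DHash node with a cache and (for the block's home node) long-term storage."""
    def __init__(self, node_id, store=None):
        self.node_id = node_id
        self.store = store or {}   # blocks this node is responsible for long-term
        self.cache = {}            # blocks cached because lookups passed through here

    def get_block(self, block_id):
        return self.cache.get(block_id) or self.store.get(block_id)

def fetch_with_path_caching(block_id, lookup_path):
    """lookup_path: the nodes a Chord lookup visited, ending at the block's home node.
    Fetch the block from the final node, then send a copy to the hops along the path."""
    home = lookup_path[-1]
    block = home.get_block(block_id)
    for hop in lookup_path[:-1]:
        hop.cache[block_id] = block   # "send to cache" along the lookup path
    return block

# Usage: N60 stores block 45; a lookup from N5 passed through N20 and N40.
n5, n20, n40, n60 = Node(5), Node(20), Node(40), Node(60, store={45: b"block data"})
fetch_with_path_caching(45, [n5, n20, n40, n60])
assert 45 in n20.cache and 45 in n40.cache
```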

17 Caching at Fingers Limits Load  Only O(log N) nodes have fingers pointing to N32  This limits the single-block load on N32

18 Virtual Nodes Allow Heterogeneity  Hosts may differ in disk/net capacity  Hosts may advertise multiple IDs -Chosen as SHA-1(IP address, index) -Each ID represents a “virtual node”  Host load is proportional to the number of virtual nodes  Manually controlled [Figure: Node A advertises virtual nodes N60, N10, and N101, while Node B advertises only N5]
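A sketch of generating virtual-node IDs as SHA-1 over the host's IP address and an index; the exact byte encoding of (IP, index) here is an assumption for illustration:

```python
import hashlib

def virtual_node_ids(ip: str, num_virtual: int) -> list[int]:
    """One Chord ID per virtual node, each chosen as SHA-1 over (IP address, index)."""
    return [
        int.from_bytes(hashlib.sha1(f"{ip},{index}".encode()).digest(), "big")
        for index in range(num_virtual)
    ]

# A host with more disk/net capacity advertises more virtual nodes and so owns
# a proportionally larger share of the ID space:
node_a = virtual_node_ids("10.0.0.1", 3)   # several virtual nodes, like Node A
node_b = virtual_node_ids("10.0.0.2", 1)   # a single virtual node, like Node B
```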

19 Why Blocks Instead of Files?  Cost: one lookup per block -Can tailor cost by choosing good block size  Benefit: load balance is simple -For large files -Storage cost of large files is spread out -Popular files are served in parallel

20 Outline  Cooperative File System (CFS)  Open DHT

21 Questions:  How many DHTs will there be?  Can all applications share one DHT?

22 Benefits of Sharing a DHT  Amortizes costs across applications -Maintenance bandwidth, connection state, etc.  Facilitates “bootstrapping” of new applications -Working infrastructure already in place  Allows for statistical multiplexing of resources -Takes advantage of spare storage and bandwidth  Facilitates upgrading existing applications -“Share” DHT between application versions

23 The DHT as a Service [Figure: a collection of DHT nodes, each holding key-value (K, V) pairs]

24 The DHT as a Service [Figure: the same nodes, now collectively labeled OpenDHT]

25 The DHT as a Service [Figure: OpenDHT with client hosts outside it]

26 The DHT as a Service [Figure: clients issuing requests to OpenDHT]

27 The DHT as a Service  What is this interface? [Figure: clients talking to OpenDHT through some interface]

28 It’s not lookup() [Figure: a client sends lookup(k), which is routed to the node responsible for key k]  What does this node do with it?  Challenges: 1. Distribution 2. Security

29 How are DHTs Used? 1. Storage -CFS, UsenetDHT, PKI, etc. 2. Rendezvous -Simple: Chat, Instant Messenger -Load balanced: i3 -Multicast: RSS Aggregation, White Board -Anycast: Tapestry, Coral

30 What about put/get?  Works easily for storage applications  Easy to share -No upcalls, so no code distribution or security complications  But does it work for rendezvous? -Chat? Sure: put(my-name, my-IP) -What about the others?
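The chat case in put/get terms: each user publishes their current address under their own name, and a peer rendezvouses by getting that key. A hypothetical sketch against an in-memory stand-in for the DHT (OpenDHT's real client interface, an RPC protocol, is not shown):

```python
class ToyDHT:
    """In-memory stand-in for a put/get DHT service."""
    def __init__(self):
        self.table = {}

    def put(self, key, value):
        self.table.setdefault(key, []).append(value)  # a key may hold several values

    def get(self, key):
        return self.table.get(key, [])

dht = ToyDHT()

# Rendezvous for chat: put(my-name, my-IP), then the peer gets it by name.
dht.put("alice", "128.32.0.1:4000")
peer_addrs = dht.get("alice")   # bob looks up alice's current address
```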

31 Protecting Against Overuse  Must protect system resources against overuse -Resources include network, CPU, and disk -Network and CPU straightforward -Disk harder: usage persists long after requests  Hard to distinguish malice from eager usage -Don’t want to hurt eager users if utilization low  Number of active users changes over time -Quotas are inappropriate

32 Fair Storage Allocation  Our solution: give each client a fair share -Will define “fairness” in a few slides  Limits strength of malicious clients -Only as powerful as they are numerous  Protect storage on each DHT node separately -Must protect each subrange of the key space -Rewards clients that balance their key choices

33 The Problem of Starvation  Fair shares change over time -Decrease as system load increases [Figure: timeline; Client 1 arrives and fills 50% of the disk, Client 2 arrives and fills 40% of the disk, Client 3 arrives and its max share is only 10%: starvation!]

34 Preventing Starvation  Simple fix: add time-to-live (TTL) to puts -put(key, value) becomes put(key, value, ttl)  Prevents long-term starvation -Eventually all puts will expire

35 Preventing Starvation  Simple fix: add time-to-live (TTL) to puts -put(key, value) becomes put(key, value, ttl)  Prevents long-term starvation -Eventually all puts will expire  Can still get short-term starvation [Figure: timeline; Client A arrives and fills the entire disk, Client B arrives and asks for space, and B starves until Client A's values start expiring]

36 Preventing Starvation  Stronger condition: be able to accept r_min bytes/sec of new data at all times  This is non-trivial to arrange! [Figure: storage vs. time; each candidate put occupies a TTL-by-size rectangle, space with slope r_min is reserved for future puts, and the sum must stay below the maximum capacity]

37 Preventing Starvation  Stronger condition: be able to accept r_min bytes/sec of new data at all times  This is non-trivial to arrange! [Figure: two storage-vs-time examples; in the second, the accepted puts exceed the capacity envelope: violation!]

38 Preventing Starvation  Formalize the graphical intuition: f(τ) = B(t_now) - D(t_now, t_now + τ) + r_min · τ  D(t_now, t_now + τ): aggregate size of puts expiring in the interval (t_now, t_now + τ)  To accept a put of size x and TTL l: f(τ) + x < C for all 0 ≤ τ < l  Can track the value of f efficiently with a tree -Leaves represent inflection points of f -Adding a put and shifting time are O(log n), n = # of puts
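A minimal O(n) sketch of this admission check, evaluating f(τ) at its inflection points (just before each expiration in the window and at τ = ttl); the slide's tree structure makes the same check O(log n). Parameter names are illustrative:

```python
def accepts(stored_puts, capacity, r_min, now, size, ttl):
    """stored_puts: list of (expiry_time, size) pairs for puts currently on disk.
    Accept the new put only if f(tau) + size < capacity for every 0 <= tau < ttl,
    where f(tau) = bytes still on disk at now + tau, plus r_min * tau reserved
    for future puts (the slide's condition)."""
    live = sorted((exp, sz) for exp, sz in stored_puts if exp > now)
    bytes_stored = sum(sz for _, sz in live)              # B(t_now)
    # f rises linearly (slope r_min) between expirations and drops at each one,
    # so checking just before each expiration in the window and at tau = ttl suffices.
    check_points = [exp - now for exp, _ in live if exp - now < ttl] + [ttl]
    expired = 0                                           # D(t_now, t_now + tau)
    i = 0
    for tau in check_points:
        while i < len(live) and live[i][0] - now < tau:
            expired += live[i][1]
            i += 1
        f_tau = bytes_stored - expired + r_min * tau
        if f_tau + size >= capacity:
            return False
    return True

# 100-byte disk, must absorb 1 byte/sec; one 60-byte value expires 30 s from now.
print(accepts([(30.0, 60)], capacity=100, r_min=1.0, now=0.0, size=5, ttl=20.0))   # True
print(accepts([(30.0, 60)], capacity=100, r_min=1.0, now=0.0, size=20, ttl=50.0))  # False
```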

39 Fair Storage Allocation  Per-client put queues: if a client's queue is full, reject the put; otherwise enqueue it  Select the most under-represented client, wait until its put can be accepted without violating r_min, then store the value and send an accept message to the client  The Big Decision: the definition of “most under-represented”

40 Defining “Most Under-Represented”  Not just sharing disk, but disk over time -A 1-byte put for 100 s is the same as a 100-byte put for 1 s -So the units are bytes × seconds; call them commitments  Equalize total commitments granted? -No: leads to starvation -A fills the disk, B starts putting, A starves up to the max TTL [Figure: timeline; Client A arrives and fills the entire disk, Client B arrives and asks for space, B catches up with A, and now A starves!]

41 Defining “Most Under-Represented”  Instead, equalize the rate of commitments granted -Service granted to one client depends only on others putting “at the same time” [Figure: timeline; Client A arrives and fills the entire disk, Client B arrives and asks for space, B catches up with A, and then A and B share the available rate]

42 Defining “Most Under-Represented”  Instead, equalize the rate of commitments granted -Service granted to one client depends only on others putting “at the same time”  Mechanism inspired by Start-time Fair Queuing -Maintain a virtual time v(t) -Each put p_i^c gets a start time S(p_i^c) and a finish time F(p_i^c): F(p_i^c) = S(p_i^c) + size(p_i^c) × ttl(p_i^c); S(p_i^c) = max(v(A(p_i^c)) - δ, F(p_{i-1}^c)); v(t) = maximum start time of all accepted puts
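A sketch of that bookkeeping: per-client queues, start and finish tags per put, and selection of the queue whose head has the smallest start tag as the most under-represented client. The class and parameter names are illustrative, and the δ term above appears as the delta argument:

```python
class FSTScheduler:
    """Start-time-fair-queuing-style selection among per-client put queues (a sketch)."""
    def __init__(self, delta=0.0):
        self.virtual_time = 0.0   # v(t): maximum start tag of all accepted puts
        self.last_finish = {}     # F of the previous accepted put, per client
        self.queues = {}          # client -> list of pending (size, ttl) puts
        self.delta = delta        # offset subtracted from v(t) when computing start tags

    def enqueue(self, client, size, ttl):
        self.queues.setdefault(client, []).append((size, ttl))

    def start_tag(self, client):
        prev_finish = self.last_finish.get(client, 0.0)
        return max(self.virtual_time - self.delta, prev_finish)

    def accept_next(self):
        """Accept the head put of the queue with the smallest start tag
        (the most under-represented client)."""
        ready = [c for c, q in self.queues.items() if q]
        if not ready:
            return None
        client = min(ready, key=self.start_tag)
        size, ttl = self.queues[client].pop(0)
        s = self.start_tag(client)
        f = s + size * ttl                              # commitments are size x TTL
        self.last_finish[client] = f
        self.virtual_time = max(self.virtual_time, s)   # advance v(t)
        return client, s, f

# Usage: two clients with pending puts; the scheduler alternates by start tag.
sched = FSTScheduler()
sched.enqueue("A", size=100, ttl=60)
sched.enqueue("B", size=10, ttl=60)
print(sched.accept_next())
print(sched.accept_next())
```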

43 FST Performance