1 CS 268: Lecture 21 DHT Geometry and DHT Applications

2 Outline for Today’s Lecture  What is a DHT? (review)  DHT Geometry  Three classes of DHT applications (with examples): -rendezvous -storage -routing  Why DHTs?

3 A DHT in Operation: Peers

4 A DHT in Operation: Overlay

5 A DHT in Operation: put(K1, V1)

6 A DHT in Operation: put()

7 A DHT in Operation: put(), (K1, V1) stored at the responsible node

8 A DHT in Operation: get(K1)

9 A DHT in Operation: get(K1)

10 Fundamental Requirement  All puts and gets for a particular key must end up at the same machine -Even in the presence of failures and new nodes (churn)  This depends on the DHT routing algorithm (last time) -Must be robust and scalable
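
A minimal consistent-hashing sketch of this requirement, in illustrative Python rather than any particular DHT's code: as long as every node hashes keys the same way and agrees on who is on the ring, every put and get for a key resolves to the same node.

    import hashlib

    def node_for_key(key, node_ids, id_bits=160):
        # Hash the key into the identifier space, then take its successor on the ring.
        space = 2 ** id_bits
        kid = int(hashlib.sha1(key.encode()).hexdigest(), 16) % space
        clockwise = sorted(n for n in node_ids if n >= kid)
        return clockwise[0] if clockwise else min(node_ids)   # wrap around the ring

Both put(k, v) and get(k) call node_for_key(k, ...) with the same arguments, so they land on the same machine; churn only moves responsibility between adjacent nodes.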

11 DHT Geometry

12 Making Sense of the Mess  There are (too) many DHT routing algorithms  They are very hard to compare -The basic idea is drowning in idiosyncratic system details (one does caching, another doesn't, etc.) -Each is evaluated in terms of basic performance numbers, which leads to a "black-box" comparison of entire systems, not of the basic algorithms  Goal: -Separate fundamental design choices from system details -Understand some basic differences between DHT routing schemes

13 Three Aspects of a DHT Design  Geometry: the basic graph structure -Tree, hypercube, ring, butterfly, de Bruijn  Distance function: the metric imposed on the geometry -d(id1, id2) measures how far apart two identifiers are  Algorithm: rules for selecting -neighbors -routes

14 Two Fundamental Properties  Flexibility of neighbor selection (FNS) -the number of possible neighbors in a geometry  Flexibility of route selection (FRS) -the average number of next-hop choices over all destinations  Fine print: -select neighbors so that paths are still O(log N) -select routes so that each hop is closer (by the metric) to the destination

15 Basic Insight  Geometry determines flexibility  Flexibility determines performance -FRS and FNS lead to shorter paths -FRS leads to more reliable paths

16 Chord DHT has Ring Geometry

17 Chord's Distance Function Captures the Ring  Nodes are points on a clockwise ring  d(id1, id2) = length of the clockwise arc between the ids = (id2 - id1) mod N, where N is the size of the identifier space  Example (3-bit identifiers, N = 8): d(100, 111) = (111 - 100) mod 1000 = 011 = 3
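
The ring distance is a one-liner; the asserts below reproduce the slide's example with 3-bit identifiers (an illustrative sketch, not Chord's implementation):

    def ring_distance(id1, id2, id_bits=3):
        # Clockwise arc length from id1 to id2 on a ring of 2**id_bits identifiers.
        return (id2 - id1) % (2 ** id_bits)

    assert ring_distance(0b100, 0b111) == 3   # the slide's example: d(100, 111) = 3
    assert ring_distance(0b111, 0b100) == 5   # ring distance is not symmetric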

18 Neighbor Selection Flexibility Allowed by the Ring Geometry  Chord's algorithm picks the i-th neighbor at distance 2^i  A different algorithm could pick the i-th neighbor anywhere in [2^i, 2^(i+1))
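
A sketch of the two neighbor-selection rules (illustrative Python; the function names and the use of random selection are assumptions, a real system would pick, say, the lowest-latency candidate):

    import random

    def chord_fingers(my_id, id_bits):
        # Chord: the i-th finger targets exactly my_id + 2**i (no choice at all).
        return [(my_id + 2 ** i) % (2 ** id_bits) for i in range(id_bits)]

    def fns_fingers(my_id, id_bits):
        # FNS variant on the same ring: the i-th finger may target anywhere in
        # [my_id + 2**i, my_id + 2**(i+1)), e.g. whichever candidate is closest in latency.
        return [(my_id + random.randrange(2 ** i, 2 ** (i + 1))) % (2 ** id_bits)
                for i in range(id_bits)]

In a real system each target identifier would then be resolved to the node that owns it; the point is only that the ring geometry leaves a whole range of acceptable candidates per finger while still keeping O(log N) paths.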

19 Route selection flexibility allowed by Ring Geometry  Chord algorithm picks neighbor closest to destination  A different algorithm picks the best of alternate paths
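
The same contrast for route selection, sketched over an abstract distance function dist (names are illustrative):

    def chord_next_hop(neighbors, dest, dist):
        # Chord: always forward to the single neighbor closest to the destination.
        return min(neighbors, key=lambda n: dist(n, dest))

    def frs_next_hops(here, neighbors, dest, dist):
        # FRS variant: any neighbor that makes progress is an acceptable next hop,
        # which is what lets routing detour around failed or slow nodes.
        return [n for n in neighbors if dist(n, dest) < dist(here, dest)]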

20 Geometries

    Geometry                            Algorithm
    Ring                                Chord, Symphony
    Hypercube                           CAN
    Tree                                Plaxton
    Hybrid = Tree + Ring                Tapestry, Pastry
    XOR: d(id1, id2) = id1 XOR id2      Kademlia
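
The XOR row is easy to make concrete: Kademlia's distance is the bitwise XOR of the two identifiers, read as an integer (sketch):

    def xor_distance(id1, id2):
        # Kademlia: d(id1, id2) = id1 XOR id2; symmetric, and for a fixed id1
        # each distance value corresponds to exactly one id2.
        return id1 ^ id2

    assert xor_distance(0b1010, 0b0110) == 0b1100
    assert xor_distance(0b1010, 0b0110) == xor_distance(0b0110, 0b1010)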

21 Summary of Flexibility Analysis

    Flexibility        Ordering of Geometries
    Neighbors (FNS)    Hypercube (log N)  <<  Tree, XOR, Ring, Hybrid (2^(i-1))
    Routes (FRS)       Tree (1)  <<  XOR, Hybrid (log N / 2)  <  Hypercube (log N / 2)  <  Ring (log N)

How relevant is flexibility for DHT routing performance?

22 Analysis of Static Resilience Two aspects of robust routing  Dynamic recovery: how quickly routing state is recovered after failures  Static resilience: how well the network routes before recovery finishes -captures how quickly recovery algorithms need to work -depends on FRS Evaluation:  Fail a fraction of nodes, without recovering any state  Metric: % of paths failed

23 Does flexibility affect Static Resilience? Tree << XOR ≈ Hybrid < Hypercube < Ring Flexibility in Route Selection matters for Static Resilience

24 Analysis of Overlay Path Latency  Goal: Minimize end-to-end overlay path latency -not just the number of hops  Both FNS and FRS can reduce latency -Tree has FNS, Hypercube has FRS, Ring & XOR have both Evaluation:  Using Internet latency distributions (see paper)

25 Which is more effective, FNS or FRS? Plain << FRS << FNS ≈ FNS+FRS Neighbor Selection is much better than Route Selection

26 Does Geometry affect performance of FNS or FRS? No, performance of FNS/FRS is independent of Geometry A Geometry’s support for neighbor selection is crucial

27 Summary of Results  FRS matters for Static Resilience -Ring has the best resilience  Both FNS and FRS reduce Overlay Path Latency  But, FNS is far more important than FRS -Ring, Hybrid, Tree and XOR have high FNS

28 Conclusion  Routing Geometry is a fundamental design choice -Geometry determines flexibility -Flexibility improves resilience and proximity  Ring has the greatest flexibility

29 DHT Applications

30 Two Important Distinctions  When talking about DHTs, must be clear whether you mean -Peers vs Infrastructure -Library vs Service

31 Peers or Infrastructure  Peer: -Application users provide nodes for DHT -Example: music sharing, cooperative web cache -Easier to get, less well behaved  Infrastructure: -Set of managed nodes provide DHT service -Perhaps serve many applications -Example: Planetlab -Harder to get, but more reliable

32 Library or Service  Library: DHT code bundled into application -Runs on each node running application -Each application requires own routing infrastructure -Allows customization of interface -Very flexible, but much duplication  Service: single DHT shared by applications -Requires common infrastructure -But eliminates duplicate routing systems -Harder to get, and much less flexible, but easier on each individual app

33 DHTs vs Unstructured P2P  DHTs good at: -exact match for “rare” items  DHTs bad at: -keyword search, etc. [can’t construct DHT-based Google] -tolerating extreme churn  Gnutella etc. good at: -general search -finding common objects -very dynamic environments  Gnutella etc. bad at: -finding “rare” items

34 Three Classes of DHT Applications Rendezvous, Storage, and Routing

35 Rendezvous Applications  Consider a pairwise application like telephony  If A wants to call B (using the Internet), A can do the following: -A looks up B’s “phone number” (IP address of current machine) -A’s phone client contacts B’s phone client  What is needed is a way to “look up” where to contact someone, based on a username or some other global identifier

36 Using DHT for Rendezvous  Each person has a globally unique key (say 128 bits) -Can be hash of a unique name, or something else  Each client (telephony, chat, etc.) periodically stores the IP address (and other metadata) describing where they can be contacted -This is stored using their unique key  When A wants to “call” B, it first does a get on B’s key
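
As a sketch, assuming a DHT client object that exposes put(key, value) and get(key) (the names and the record format here are illustrative, not a real API), the rendezvous pattern is just:

    import hashlib, json, time

    def register(dht, username, ip, port):
        key = hashlib.sha1(username.encode()).digest()        # globally unique, static key
        record = json.dumps({"ip": ip, "port": port, "ts": time.time()})
        dht.put(key, record)                                  # re-put periodically while online

    def locate(dht, username):
        key = hashlib.sha1(username.encode()).digest()
        record = dht.get(key)
        return json.loads(record) if record else None         # None: user not currently registered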

37 Key Point  The key (or identifier) is globally unique and static  The DHT infrastructure is used to store the mapping between that static (persistent) identifier and the current location -DHT functions as a dynamic and flat DNS  This can handle: -IP mobility -Chat -Internet telephony -DNS -The Web!

38 Using DHTs for the Web Oversimplified:  Name data with key  Store IP address of file server(s) holding data -replication trivial!  To get data, lookup key  If want CDN-like behavior, make sure IP address handed back is close to requester (several ways to do this)

39 Three Classes of DHT Applications Rendezvous, Storage, and Routing

40 Storage Applications  Rendezvous applications use the DHT only to store small pointers (IP addresses, etc.)  What about using DHTs for more serious storage, such as file systems? -Storage, bandwidth and availability all scale with DHT

41 Example: CFS (DHash over Chord)  Goal: serve a read-only file system -read-only to clients; files can be modified only by their owner (the publisher)  Publisher inserts the file system into the DHT  CFS client looks like an NFS file system: -/cfs/7ff23bda0092  CFS client fetches data from the DHT

42 CFS Uses a Tree of Blocks  The root block points to a directory block, which points to File1, Dir2, and File3  A "pointer": the root contains the DHT key of the directory  A directory block contains filename/blockID pairs

43 CFS Uses Self-Authentication  Immutable block (content-hash block): key = CryptographicHash(value) -encourages data sharing!  Mutable block (public-key block): key = K_pub, value = data + Sign[data] with K_priv
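
The content-hash half is simple to sketch (illustrative code assuming the same put/get DHT client as above; CFS/DHash itself has its own storage layer):

    import hashlib

    def put_immutable(dht, value):
        key = hashlib.sha1(value).digest()        # key = CryptographicHash(value)
        dht.put(key, value)
        return key                                # the key doubles as a verifiable name

    def get_immutable(dht, key):
        value = dht.get(key)
        if value is None or hashlib.sha1(value).digest() != key:
            raise ValueError("missing or corrupted block")    # self-authentication check
        return value

A public-key block is checked the same way, except the test is a signature verification against K_pub rather than a hash comparison.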

44 Most Blocks Are Immutable  In the tree (Root -> Directory -> File1, Dir2, File3), only the root block is mutable; the directory and file blocks are immutable  This is a single-writer mutable data structure

45 Adding a File to a Directory  A new immutable File4 block and a new immutable Directory v2 block are inserted, and the mutable root block is updated to point at Directory v2

46 Data Availability via Replication  DHash replicates each key/value pair at the nodes immediately after it on the circle  It's easy to find the replicas  Put(k,v) goes to all of them  Get(k) goes to the closest one
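
A sketch of this replication rule over a list of node objects with node_id, store() and fetch() (all of these names are assumptions, not DHash's API):

    def put_replicated(ring_nodes, key_id, value, r=3):
        ring = sorted(ring_nodes, key=lambda n: n.node_id)
        start = next((i for i, n in enumerate(ring) if n.node_id >= key_id), 0)
        for i in range(r):                                    # successor plus the next r-1 nodes
            ring[(start + i) % len(ring)].store(key_id, value)

    def get_replicated(ring_nodes, key_id, r=3):
        ring = sorted(ring_nodes, key=lambda n: n.node_id)
        start = next((i for i, n in enumerate(ring) if n.node_id >= key_id), 0)
        for i in range(r):                                    # try the successor first, then fall back
            value = ring[(start + i) % len(ring)].fetch(key_id)
            if value is not None:
                return value
        return None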

47 First Live Successor Manages Replicas

48 Example: OceanStore  Project here at Berkeley: the uber file system  Many facets to design, won’t cover them here  Main contrast with CFS here: -CFS: stores files -OceanStore: stores pointers, deals with locality

49 OceanStore: Finding Replicas  The DHT stores pointers from the key's root node, Root(k), toward the copies (Copy1, Copy2, Copy3)

50 OceanStore: Finding Replicas  A request is routed toward Root(k) and redirected to a nearby copy

51 Example: Usenet over a DHT  Bulletin board (started in 1981) -Has grown exponentially in volume; the volume is 1.4 Terabytes/day  Hosting a full Usenet feed has high costs -Large storage requirement -Bandwidth required: OC3+ (roughly $30,000/month)  Only 50 sites carry a full feed  Goal: save Usenet news by reducing the needed storage and bandwidth

52 Posting a Usenet Article  User posts an article to the local server  The server exchanges headers & the article with its peers  Headers allow sorting into newsgroups

53 UsenetDHT  Store article in shared DHT  Only “single” copy of Usenet needed  Can scale DHT to handle increased volume  Incentive for ISPs: cut external bandwidth by providing high-quality hosting for local DHT server

54 DHT Usenet Architecture  User posts an article to the local server  The server writes the article to the DHT  Servers exchange headers only  All servers know about each article

55 UsenetDHT Tradeoff  Distribute headers as before: -clients have local access to headers  Bodies held in global DHT -only accessed when read -greater latency, lower overhead

56 UsenetDHT: Potential Savings  Suppose a 300-site network  Each site reads 1% of all articles

                 Net bandwidth      Storage
    Usenet       12 Megabytes/s     10 Terabytes/week
    UsenetDHT    120 Kbytes/s       60 Gbytes/week

57 Example: Email (ePOST)  Mission-critical application -required extreme use of erasure codes (99.99% recovery from 15% of the data)  Approach: -Mail split into components and stored independently (by content); multiple copies are stored only once -Folders represented by logs; each entry is stored separately, in a chained data structure  User associated with a multicast group: -When the user is online, it announces its location and data is delivered

58 Three Classes of DHT Applications Rendezvous, Storage, and Routing

59 “Routing” Applications  Application-layer multicast  Video streaming  Event notification systems ...

60 DHT-Based Multicast  Application-layer, not IP layer  Single-source, not any-source multicast  Easy to extend to anycast

61 Tree Formation  Group is associated with key  “root” of group is node that owns key  Any node that wants to join sends message to root, leaving forwarding state along path  Message stops when it hits existing state for group  Data sent from root reaches all nodes
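
A sketch of that join rule (illustrative; route_toward and owns stand in for the DHT's routing step and key-ownership test):

    class MulticastNode:
        def __init__(self, node_id, route_toward, owns):
            self.node_id = node_id
            self.route_toward = route_toward   # group_key -> next node on the route to the root
            self.owns = owns                   # group_key -> True if this node owns the key
            self.children = {}                 # group_key -> set of downstream MulticastNodes

        def subscribe(self, group_key):
            # A member joins by sending Join(k) toward the root, naming itself as a child.
            if not self.owns(group_key):
                self.route_toward(group_key).join(group_key, self)

        def join(self, group_key, child):
            had_state = group_key in self.children
            self.children.setdefault(group_key, set()).add(child)
            # Forward the join toward the root only until it hits existing state for the group.
            if not had_state and not self.owns(group_key):
                self.route_toward(group_key).join(group_key, self)

        def send(self, group_key, data):
            for child in self.children.get(group_key, ()):    # data flows down the tree
                child.deliver(group_key, data)

        def deliver(self, group_key, data):
            self.last_received = data          # local application delivery (placeholder)
            self.send(group_key, data)         # then forward to our own children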

62 Multicast: Root(k)

63 Multicast Join: Join(k) is sent toward Root(k), leaving forwarding state along the path

64 Multicast Join: the Join(k) message is forwarded hop by hop toward Root(k)

65 Multicast Join: a second Join(k) stops as soon as it hits existing state for the group

66 Multicast Send: data sent from Root(k) follows the forwarding state down to all members

67 Challenges  Repairing tree  Balancing duties among peers  Low-latency routing (proximity-based DHT routing)

68 Example: Internet-Scale Query Processing  Superficial motivation: -Database joins are implemented with hash tables, so... -Distributed joins can be implemented with DHTs -Scaling: latency is O(log n) while computation is O(n)
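
The "joins via DHT" idea in miniature: rehash every tuple under its join-key value, so matching R and S tuples meet at the same place (a toy in-process sketch; PIER's real operators run over a networked DHT):

    from collections import defaultdict

    class ToyDHT:
        def __init__(self):
            self.buckets = defaultdict(list)
        def put(self, key, value):
            self.buckets[key].append(value)

    def dht_hash_join(dht, rel_r, rel_s, r_col, s_col):
        for t in rel_r:
            dht.put(t[r_col], ("R", t))        # rehash R on its join column
        for t in rel_s:
            dht.put(t[s_col], ("S", t))        # rehash S on its join column
        out = []
        for key, tuples in dht.buckets.items():               # each "node" joins what it holds locally
            rs = [t for tag, t in tuples if tag == "R"]
            ss = [t for tag, t in tuples if tag == "S"]
            out.extend((r, s) for r in rs for s in ss)
        return out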

69 PIER  Range of operators -Joins, aggregation (routing!), recursive, continuous queries  Intended targets: -Data “in the wild” (filesharing, net monitoring, etc.) -No need for ACID semantics, just best-effort  Future: more sophisticated queries -Range searches, etc. -Prefix Hash Tree

70 What’s the Fuss about DHTs?

71 Distributed Systems Pre-Internet  Connected by LANs (low loss and delay)  Small scale (10s, maybe 100s per server)  PODC literature focused on algorithms to achieve strict semantics in the face of failures -Two-phase commits -Synchronization -Byzantine agreement -Etc.

72 Distributed Systems Post-Internet  Very different context: -Huge scales (thousands if not millions) -Highly variable connectivity -Failures common -Organic growth  Abandoned distributed strict semantics -Adaptive apps rather than “guaranteed” infrastructure  Adopted pairwise client-server approach -Server is centralized (even if server farm) -Relatively primitive approach (no sophisticated dist. algms.) -Little support from infrastructure or middleware

73 Problems with Centralized Server Farms  Weak availability: -Susceptible to point failures and DoS attacks  Management overhead -Data often manually partitioned to obtain scale -Management and maintenance large fraction of cost  Per-application design (e.g., GoogleOS) -High hurdle for new applications  Don’t leverage the advent of powerful clients - Limits scalability and availability

74 The DHT Community’s Goal Produce a common infrastructure that will help solve these problems by being:  Robust in the face of failures and attacks -Availability solved  Self-configuring and self-managing -Management overhead reduced  Usable for a wide variety of applications -No per-application design  Able to support very large scales, with no assumptions about locality, etc. -No scaling limits, few restrictive assumptions

75 The Strategy Define an interface for this infrastructure that is:  Generally useful for a wide variety of applications -So many applications can leverage this work -Research: building on the interface  Can be supported by a robust, self-configuring, widely-distributed infrastructure -Addressing the many problems raised before -Research: supporting the interface

76 Hourglass Analogy: applications on top, a narrow common interface in the middle, algorithms and infrastructure below

77 Two Crucial Design Decisions 1. Technology for infrastructure: P2P -Take advantage of powerful clients -Decentralized -Nodes can be desktop machines or server quality 2. Choice of interface: Lookup and Hash Table -Lookup(key) returns the IP of the host that "owns" the key -Put()/Get(): standard HT interface -Some flexibility in the interface (no strict layers)

78 Two Crucial Design Decisions 1. Technology for infrastructure: P2P -P2P=symmetric and self-organizing -Take advantage of powerful clients -Robust 2. Choice of interface: Lookup and Hash Table -Lookup(key) returns IP of host that “owns” key -Put()/Get(): standard HT interface -Proven to be useful programming interfaces

79 DHT Layering  A distributed application calls put(key, data) and get(key) on the distributed hash table  The DHT in turn uses a lookup service: lookup(key) returns the IP address of the node responsible for the key  The application may be distributed over many nodes; the DHT distributes data storage over many nodes
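
The layering can be made concrete in a few lines (illustrative; lookup and rpc are stand-ins for the lookup service and the node-to-node transport, not a real interface):

    class DHT:
        def __init__(self, lookup, rpc):
            self.lookup = lookup               # lookup(key) -> IP address of the responsible node
            self.rpc = rpc                     # rpc(ip, op, *args) -> result from that node

        def put(self, key, data):
            self.rpc(self.lookup(key), "store", key, data)

        def get(self, key):
            return self.rpc(self.lookup(key), "fetch", key)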

80 Scenarios for DHT Usage Places where DHTs might have a competitive advantage

81 Scenario #1: Public Infrastructure  Consider CiteSeer or other nonprofit systems: -Service is very valuable to community -No source of revenue  How can it expand? -Not enough support for expanding centralized facility -But many institutions would donate remote use of their local machines  System problem: -Coordinating donated distributed infrastructure

82 The DHT Approach  DHTs are well-suited to such settings -Inherently distributed with general interface -Naturally provides rendezvous and data sharing  Developers can focus on how to layer app on top of DHT library -Resilience, scaling, all taken care of by DHT  Typical assumption for important services: -Server-like nodes with good network access

83 Examples  CiteSeer -Replicate current service (OverCite), but with 10x performance improvement -Use additional capacity to provide new features (e.g., SmartSeer’s alerts)  Cooperative CDNs -Coral allows universities to collaboratively handle “slashdot” workloads -Operational today with many users  UsenetDHT -Allows cooperative institutions to share bandwidth load -Operational system with small feed running

84 Scenario #2: Scaling Enterprise Apps  Enterprises rely on several crucial services -Email, backup, file storage  These services must be -Scalable -Robust -Easy to deploy -Easy to manage -Inexpensive

85 The DHT approach  Build all services on DHT interface  DHT infrastructure: -Scalable (just add nodes, need not be local) -Robust -Easy to deploy -Easy to manage -Exploits inexpensive commodity components

86 Examples  Email -ePOST (Rice)  Backup -MIT  File storage -OceanStore

87 Scenario #3: Supporting Tiny Apps  Many apps could use DHT interface, but are too small to deploy one themselves -Small: user population, importance, etc.  Such an application could use a DHT service  OpenDHT is a public DHT service -Lecture on this next week...

88 Scenario #4: Super-Resilience  DHTs are a natural way to build super-resilient services  DHTs would be a natural candidate for the next-generation name service, or other such crucial pieces of the infrastructure

89 Not Just for Applications  DHTs resolve flat names scalably -We haven’t been able to do this before  How would we redesign the Internet, now that we can resolve flat names?