CS 3700 Networks and Distributed Systems
Overlay Networks (P2P DHT via KBR FTW)
Revised 4/1/2013

Outline 2
- Consistent Hashing
- Structured Overlays / DHTs

Key/Value Storage Service 3
- Imagine a simple service that stores key/value pairs
  - Similar to memcached or redis
  - Example calls: put("christo", "abc…"), get("christo") → "abc…"
- One server is probably fine as long as the total number of pairs is < 1M
- How do we scale the service as the number of pairs grows?
  - Add more servers and distribute the data across them
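To make the interface concrete, here is a minimal single-server sketch of the service the slide describes (the KVStore class and names are illustrative, not from the slides; real systems like memcached or redis add networking, eviction, and persistence):

# Minimal single-server key/value store (illustrative sketch)
class KVStore:
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

store = KVStore()
store.put("christo", "abc...")
print(store.get("christo"))   # -> "abc..."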

Mapping Keys to Servers 4
- Problem: how do you map keys to servers?
- Keep in mind, the number of servers may change (e.g. we could add a new server, or a server could crash)

Hash Tables 5
- hash(key) % n → index into an array of length n

(Bad) Distributed Key/Value Service 6
- Same idea, but each array slot holds the IP address of a node (A, B, C, D, E): hash(str) % n → array index → node
- Problems:
  - The size of the network will change (the array becomes length n + 1), which remaps almost every key
  - Need a "deterministic" mapping
  - As few changes as possible when machines join/leave
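A quick way to see why the modular mapping is bad: when the number of servers grows from n to n+1, almost every key hashes to a different slot. A small sketch (illustrative; the key names and counts are arbitrary):

import hashlib

def server_for(key, n):
    # map a key to one of n servers via modular hashing
    h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
    return h % n

keys = [f"key{i}" for i in range(10000)]
n = 5
moved = sum(1 for k in keys if server_for(k, n) != server_for(k, n + 1))
print(f"{moved / len(keys):.0%} of keys move when growing from {n} to {n + 1} servers")
# typically ~80%: nearly all data has to be reshuffled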

Consistent Hashing 7
- Alternative hashing algorithm with many beneficial characteristics
  - Deterministic (just like normal hashing algorithms)
  - Balanced: given n servers, each server should get roughly 1/n keys
  - Locality sensitive: if a server is added, only 1/(n+1) keys need to be moved
- Conceptually simple
  - Imagine a circular number line from 0 → 1
  - Place the servers at random locations on the number line
  - Hash each key and place it at the next server on the number line
    - Move around the circle clockwise to find the next server

Consistent Hashing Example 8
- hash(str) % 256 / 256 → ring location
[Figure: servers A–E ("server A" … "server E") and keys k1–k3 hashed onto a ring from 0 to 1; each key is stored at the next server clockwise]

Practical Implementation 9
- In practice, no need to implement complicated number lines
  - Store a list of servers, sorted by their hash (floats from 0 → 1)
  - To put() or get() a pair, hash the key and search through the list for the nearest server
  - O(log n) search time
  - O(log n) time to insert a new server into the list
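A sketch of this sorted-list implementation, assuming SHA-1 as the hash and a bisect-based search (class and server names are illustrative): each key is assigned to the first server whose hash is greater than or equal to the key's hash, wrapping around the ring.

import bisect
import hashlib

def ring_hash(s):
    # map a string to a float in [0, 1)
    return int(hashlib.sha1(s.encode()).hexdigest(), 16) / 2**160

class ConsistentHashRing:
    def __init__(self, servers):
        self._ring = sorted((ring_hash(s), s) for s in servers)

    def add_server(self, server):
        bisect.insort(self._ring, (ring_hash(server), server))   # O(log n) to find the slot

    def server_for(self, key):
        h = ring_hash(key)
        i = bisect.bisect_right(self._ring, (h, ""))              # O(log n) search
        return self._ring[i % len(self._ring)][1]                 # wrap around past 1 -> 0

ring = ConsistentHashRing(["serverA", "serverB", "serverC"])
print(ring.server_for("christo"))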

Improvements to Consistent Hashing 10
- Problem: hashing may not result in perfect balance (1/n items per server)
  - Solution: balance the load by hashing each server multiple times, e.g. consistent_hash("serverA_1"), consistent_hash("serverA_2"), consistent_hash("serverA_3"), …
- Problem: if a server fails, data may be lost
  - Solution: replicate key/value pairs on multiple servers
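Both fixes bolt onto the same sorted-ring structure. A sketch assuming hypothetical vnodes and replicas parameters (illustrative, not a production scheme): each server is hashed several times, and each key is stored on the next few distinct servers around the ring.

import bisect
import hashlib

def ring_hash(s):
    return int(hashlib.sha1(s.encode()).hexdigest(), 16) / 2**160

class ReplicatedRing:
    def __init__(self, servers, vnodes=3, replicas=2):
        self.replicas = replicas
        # hash each server vnodes times to smooth out the load
        self._ring = sorted((ring_hash(f"{s}_{i}"), s)
                            for s in servers for i in range(vnodes))

    def servers_for(self, key):
        # primary owner plus (replicas - 1) distinct backups, walking clockwise
        i = bisect.bisect_right(self._ring, (ring_hash(key), ""))
        owners = []
        for j in range(len(self._ring)):
            s = self._ring[(i + j) % len(self._ring)][1]
            if s not in owners:
                owners.append(s)
            if len(owners) == self.replicas:
                break
        return owners

ring = ReplicatedRing(["serverA", "serverB", "serverC"])
print(ring.servers_for("key1"))   # e.g. ['serverB', 'serverC']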

Consistent Hashing Summary 11
- Consistent hashing is a simple, powerful tool for building distributed systems
  - Provides a consistent, deterministic mapping between names and servers
  - Often called locality sensitive hashing
  - Ideal algorithm for systems that need to scale up or down gracefully
- Many, many systems use consistent hashing
  - CDNs
  - Databases: memcached, redis, Voldemort, Dynamo, Cassandra, etc.
  - Overlay networks (more on this coming up…)

Outline 12
- Consistent Hashing
- Structured Overlays / DHTs

Layering, Revisited 13
[Figure: protocol stacks on Host 1, a Router, and Host 2 (Application / Transport / Network / Data Link / Physical)]
- Layering hides low level details from higher layers
- IP is a logical, point-to-point overlay
  - ATM/SONET circuits on fibers

Towards Network Overlays 14
- IP provides best-effort, point-to-point datagram service
- Maybe you want additional features not supported by IP or even TCP
- Like what?
  - Multicast
  - Security
  - Reliable, performance-based routing
  - Content addressing, reliable data storage
- Idea: overlay an additional routing layer on top of IP that adds additional features

Example: VPN 15
- Virtual Private Network
[Figure: a private network tunneled across the public Internet; the outer header is addressed to the VPN gateway, the inner header to the private destination]
- VPN is an IP over IP overlay
- Not all overlays need to be IP-based

VPN Layering 16
[Figure: the same Host 1 / Router / Host 2 stacks as before, with an extra VPN Network (P2P overlay) layer inserted above the Network layer on the hosts]

Network Layer, version 2? 17
- Function:
  - Provide natural, resilient routes based on keys
  - Enable new classes of P2P applications
- Key challenge:
  - Routing table overhead
  - Performance penalty vs. IP
[Figure: protocol stack with an overlay Network layer inserted between Application and Transport]

Unstructured P2P Review 18
- What if the file is rare or far away?
- Redundancy
- Traffic overhead
- Search is broken:
  - High overhead
  - No guarantee it will work

Why Do We Need Structure? 19
- Without structure, it is difficult to search
  - Any file can be on any machine
  - Centralization can solve this (e.g. Napster), but we know how that ends
- How do you build a P2P network with structure?
  - Give every machine and object a unique name
  - Map from objects → machines
    - Looking for object A? Map(A) → X, talk to machine X
    - Looking for object B? Map(B) → Y, talk to machine Y
- Is this starting to sound familiar?

Naïve Overlay Network 20
- P2P file-sharing network
  - Peers choose random IDs
  - Locate files by hashing their names, e.g. hash("GoT_s03e04.mkv")
- Problems?
  - How do you know the IP addresses of arbitrary peers?
  - There may be millions of peers
  - Peers come and go at random

Structured Overlay Fundamentals 21
- Every machine chooses a unique, random ID
  - Used for routing and object location, instead of IP addresses
- Deterministic Key → Node mapping
  - Consistent hashing
  - Allows peer rendezvous using a common name
- Key-based routing
  - Scalable to any network of size N
    - Each node needs to know the IP of log(N) other nodes
    - Much better scalability than OSPF/RIP/BGP
  - Routing from node A → B takes at most log(N) hops
- Advantages: completely decentralized, self organizing, infinitely scalable

Structured Overlays at 10,000 ft. 22
- Node IDs and keys come from a randomized namespace
- Incrementally route toward the destination ID
  - Each node knows a small number of IDs + IPs
  - log(N) neighbors per node, log(N) hops between nodes
- Each node has a routing table; forward to the longest prefix match
- Example: a message for key ABCD is routed A930 → AB5F → ABC0 → ABCE, matching one more digit of the prefix at each hop
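The forwarding decision itself is simple. A sketch of longest-prefix-match forwarding over hex IDs (illustrative; the real Pastry/Tapestry rules also use numeric closeness to break ties and pick the final root):

def prefix_len(a, b):
    # number of leading digits the two IDs share
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def next_hop(my_id, known_nodes, key):
    # forward to the known node whose ID shares the longest prefix with the key
    best = max(known_nodes, key=lambda node: prefix_len(node, key))
    if prefix_len(best, key) > prefix_len(my_id, key):
        return best
    return None   # no closer node known: this node is (locally) the root for the key

# routing toward key "ABCD" from node "A930"
print(next_hop("A930", ["AB5F", "1234", "F00D"], "ABCD"))   # -> "AB5F"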

Structured Overlay Implementations 23
- Many P2P structured overlay implementations
  - Generation 1: Chord, Tapestry, Pastry, CAN
  - Generation 2: Kademlia, SkipNet, Viceroy, Symphony, Koorde, Ulysseus, …
- Shared goals and design
  - Large, sparse, randomized ID space
  - All nodes choose IDs randomly
  - Nodes insert themselves into the overlay based on ID
  - Given a key k, the overlay deterministically maps k to its root node (a live node in the overlay)

Details 24
- Structured overlay APIs
  - route(key, msg): route msg to the node responsible for key
    - Just like sending a packet to an IP address
  - Distributed hash table (DHT) functionality
    - put(key, value): store value at the node responsible for key
    - get(key): retrieve the stored value for key from that node
- Key questions:
  - Node ID space, what does it represent?
  - How do you route within the ID space?
  - How big are the routing tables?
  - How many hops to a destination (in the worst case)?
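A sketch of how the DHT layer can sit on top of route() (illustrative; it loosely follows the route/put/get split described above, and assumes a hypothetical overlay object that provides route(key, msg), delivery callbacks, and a send() primitive):

class DHTNode:
    def __init__(self, overlay):
        self.overlay = overlay   # assumed key-based routing layer
        self.store = {}          # this node's slice of the hash table

    def put(self, key, value):
        # route a PUT to whichever node is currently responsible for the key
        self.overlay.route(key, {"op": "PUT", "key": key, "value": value})

    def get(self, key, reply_to):
        self.overlay.route(key, {"op": "GET", "key": key, "reply_to": reply_to})

    def on_deliver(self, msg):
        # invoked by the overlay on the root node for msg["key"]
        if msg["op"] == "PUT":
            self.store[msg["key"]] = msg["value"]
        elif msg["op"] == "GET":
            self.overlay.send(msg["reply_to"], self.store.get(msg["key"]))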

Tapestry/Pastry 25
- Node IDs are numbers on a ring
  - 128-bit circular ID space
  - Node IDs chosen at random
- A message for key X is routed to the live node with the longest prefix match to X
  - Incremental prefix routing: 1110: 1XXX → 11XX → 111X → 1110
[Figure: the ID ring (wrapping around 0), showing a message routed hop by hop to node 1110]

Physical and Virtual Routing 26
[Figure: the same prefix route shown both as hops around the virtual ID ring and as the corresponding paths across the physical network]

Tapestry/Pastry Routing Tables 27
- Incremental prefix routing
- How big is the routing table?
  - Keep b-1 hosts at each prefix digit
  - b is the base of the prefix
  - Total size: b * log_b n
- log_b n hops to any destination

Routing Table Example 28
- Hexadecimal (base-16), node ID = 65a1fc4
[Figure: routing table with log_16 n rows (Row 0, Row 1, Row 2, Row 3, …); each entry x is the IP address of a peer]
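A sketch of how such a table can be built (illustrative, ignoring the locality-based choice among candidate entries): row i holds nodes that share the first i digits of this node's ID but differ in digit i, so each row has at most b-1 useful columns.

def build_routing_table(my_id, known_nodes):
    rows = [dict() for _ in range(len(my_id))]        # roughly log_b(n) rows get populated
    for node in known_nodes:
        i = 0
        while i < len(my_id) and node[i] == my_id[i]:
            i += 1
        if i < len(my_id):
            rows[i].setdefault(node[i], node)          # column = first differing digit
    return rows

table = build_routing_table("65a1fc4", ["65a2310", "9132abc", "65a1f09"])
# row 0 holds the node starting with '9', row 3 the node sharing "65a", row 5 the node sharing "65a1f"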

Routing, One More Time 29
- Each node has a routing table
- Routing table size: b * log_b n
- Hops to any destination: log_b n
[Figure: the ID ring with a message routed to node 1110]

Pastry Leaf Sets 30
- One difference between Tapestry and Pastry
- Each node has an additional table of the L/2 numerically closest neighbors
  - Larger and smaller
- Uses
  - Alternate routes
  - Fault detection (keep-alive)
  - Replication of data

Joining the Pastry Overlay 31
1. Pick a new ID X
2. Contact an arbitrary bootstrap node
3. Route a message to X, discover the current owner
4. Add the new node to the ring
5. Download routes from new neighbors, update leaf sets
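A toy simulation of these steps (illustrative, not the real Pastry protocol): the ring is modeled as a sorted list of IDs, and "routing a message to X" is simulated by looking up X's current owner directly.

import bisect
import random

def owner_of(ring, x):
    # the live node responsible for ID x: first node clockwise from x
    i = bisect.bisect_left(ring, x)
    return ring[i % len(ring)]

def join(ring, keys_by_node):
    x = random.randrange(2**16)        # 1. pick a new ID X (toy 16-bit space)
    current = owner_of(ring, x)        # 2-3. route toward X, discovering its current owner
    bisect.insort(ring, x)             # 4. add the new node to the ring
    # keys that now fall under X move from the old owner (the leaf set and
    # routing table downloads from step 5 are omitted in this sketch)
    keys_by_node[x] = [k for k in keys_by_node[current] if owner_of(ring, k) == x]
    keys_by_node[current] = [k for k in keys_by_node[current] if owner_of(ring, k) == current]
    return x

ring = sorted(random.sample(range(2**16), 4))
keys_by_node = {n: [] for n in ring}
new_node = join(ring, keys_by_node)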

Node Departure 32
- Leaf set members exchange periodic keep-alive messages
  - Handles local failures
- Leaf set repair:
  - Request the leaf set from the farthest node in the set
- Routing table repair:
  - Get the table from peers in row 0, then row 1, …
  - Periodic, lazy

DHTs and Consistent Hashing 33
- Mappings are deterministic in consistent hashing
  - Nodes can leave
  - Nodes can enter
  - Most data does not move
- Only local changes impact data placement
- Data is replicated among the leaf set
[Figure: the ID ring with a message routed to key 1101]

Summary of Structured Overlays 34
- A namespace
  - For most, this is a linear range from 0 to …
- A mapping from key to node
  - Tapestry/Pastry: keys belong to the node with the closest identifier
  - Chord: keys between node X and its predecessor belong to X
  - CAN: a well defined N-dimensional space for each node

Summary, Continued 35
- A routing algorithm
  - Numeric (Chord), prefix-based (Tapestry/Pastry/Chimera), hypercube (CAN)
- Routing state: how much info is kept per node
  - Tapestry/Pastry: b * log_b N
    - The i-th column specifies nodes that match an i-digit prefix but differ on the (i+1)-th digit
  - Chord: log_2 N pointers
    - The i-th pointer points to MyID + N * (0.5)^i
  - CAN: 2*d neighbors for d dimensions
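To make the routing-state numbers concrete, a small sketch (illustrative; it assumes a 160-bit ID space for Chord): the i-th Chord finger points halfway, then a quarter, then an eighth of the way around the ring, i.e. exactly the MyID + N * (0.5)^i rule above.

import math

N = 2**160                      # assumed size of the ID space
my_id = 12345

def chord_finger(i):
    # target ID of the i-th finger, i = 1 .. log2(N)
    return (my_id + N // 2**i) % N

fingers = [chord_finger(i) for i in range(1, 161)]    # log2(N) = 160 pointers

# Pastry/Tapestry instead keep roughly b * log_b(n) routing entries:
b, n = 16, 10**6
print(b * math.ceil(math.log(n, b)))                  # ~80 entries for a million-node network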

Structured Overlay Advantages 36
- High level advantages
  - Completely decentralized
  - Self-organizing
  - Scalable and (relatively) robust
- Applications
  - Reliable distributed storage: OceanStore (FAST'03), Mnemosyne (IPTPS'02)
  - Resilient anonymous communication: Cashmere (NSDI'05)
  - Consistent state management: Dynamo (SOSP'07)
  - Many, many others: multicast, spam filtering, reliable routing, services, even distributed mutexes

Trackerless BitTorrent 37
[Figure: the ID ring with the torrent hash 1101 as a key; labeled roles: Tracker, Initial Seed, Leecher, Swarm. In trackerless operation, the node responsible for the torrent's hash takes over the tracker's role, pointing leechers to the initial seed and the rest of the swarm]
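The idea in one sketch (illustrative; real trackerless BitTorrent uses a Kademlia-based DHT with dedicated get_peers/announce_peer messages rather than a generic put/get): the torrent's info hash is the DHT key, and the value is the list of peers in the swarm, so the node responsible for that key plays the tracker's role. The dht argument stands in for the put()/get() interface from the earlier slides.

def announce(dht, info_hash, my_addr):
    # add ourselves to the swarm stored under the torrent's hash
    peers = dht.get(info_hash) or []
    dht.put(info_hash, peers + [my_addr])

def find_swarm(dht, info_hash):
    # ask the node responsible for the hash (the "tracker") for peers
    return dht.get(info_hash) or []

class FakeDHT(dict):                  # local stand-in so the sketch runs
    def put(self, k, v): self[k] = v
    def get(self, k): return super().get(k)

dht = FakeDHT()
announce(dht, "1101", ("10.0.0.1", 6881))    # the initial seed announces itself
print(find_swarm(dht, "1101"))               # a leecher discovers the seed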