Kademlia: A Peer-to-peer Information System Based on the XOR Metric

Kademlia: A Peer-to-peer Information System Based on the XOR Metric. Petar Maymounkov, David Mazières. A few slides are taken from the authors’ original presentation.

What is different? One major goal of P2P systems is object lookup: given a data item X stored at some set of nodes in the system, find it. Unlike Chord, CAN, or Pastry, Kademlia uses tree-based routing.

Kademlia: Nodes, files, and keywords are hashed with SHA-1 into a 160-bit ID space. Every node maintains information about files and keywords close to itself. The closeness between two objects is measured as their bitwise XOR, interpreted as an integer: distance(a, b) = a XOR b.
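A minimal sketch of the ID space and distance just described, assuming IDs are the SHA-1 digest of a UTF-8 name read as a big-endian integer. The names `kad_id` and `distance` and the sample strings are hypothetical, chosen for illustration only.

```python
import hashlib

ID_BITS = 160  # SHA-1 produces a 160-bit digest

def kad_id(name: str) -> int:
    """Hypothetical helper: map a node name, file name, or keyword to a 160-bit ID."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")

def distance(a: int, b: int) -> int:
    """Kademlia distance: bitwise XOR interpreted as an unsigned integer."""
    return a ^ b

node = kad_id("hypothetical-node")
key = kad_id("hypothetical keyword")
print(distance(node, key) < 2 ** ID_BITS)  # True
print(distance(0b0011, 0b1110))            # 13, i.e. 0b1101
```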

Claims: Nodes send only a small number of configuration messages. Parallel asynchronous queries avoid timeout delays from failed nodes. Routes are selected based on latency. Unlike (unidirectional) Chord, Kademlia is symmetric, i.e. dist(a, b) = dist(b, a).
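To make the symmetry claim concrete, the sketch below compares Chord's clockwise ring distance, taken here as (b − a) mod 2^m, with the XOR distance on a hypothetical 4-bit ID space. It is an illustration under that assumption, not code from either system.

```python
M = 4  # hypothetical 4-bit ID space, for illustration only

def chord_distance(a: int, b: int) -> int:
    """Clockwise distance from a to b on a Chord-style ring (not symmetric)."""
    return (b - a) % (2 ** M)

def xor_distance(a: int, b: int) -> int:
    """Kademlia's XOR distance (symmetric)."""
    return a ^ b

a, b = 0b0011, 0b1110
print(chord_distance(a, b), chord_distance(b, a))  # 11 5
print(xor_distance(a, b), xor_distance(b, a))      # 13 13
```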

Kademlia Binary Tree: Treat nodes as leaves of a binary tree. Starting from the root, for any given node, divide the binary tree into a series of successively lower subtrees that don’t contain the node.

Kademlia Binary Tree: Subtrees of interest for a node 0011……
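The subtree decomposition can be computed from the node's ID bits: the i-th subtree shares the node's first i bits and differs in bit i. A sketch, assuming IDs are given as bit strings (the function name is hypothetical), using the 4-bit node 0011 from the slide:

```python
def subtrees_of_interest(node_id: str) -> list:
    """Prefixes of the successively lower subtrees that do not contain node_id.

    Subtree i shares the first i bits with node_id and differs in bit i.
    """
    result = []
    for i, bit in enumerate(node_id):
        flipped = "1" if bit == "0" else "0"
        result.append(node_id[:i] + flipped)
    return result

print(subtrees_of_interest("0011"))  # ['1', '01', '000', '0010']
```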

Kademlia Binary Tree: Every node keeps in touch with at least one node from each of its subtrees (if that subtree contains any node). Each subtree corresponds to a k-bucket. Every node keeps a list of (IP address, port, node ID) triples, and (key, value) pairs, for exchanging information with others.

Kademlia Search: An example lookup, in which node 0011 searches for 1110…… in the network.

The XOR Metric: d(x, x) = 0; d(x, y) > 0 if x ≠ y; d(x, y) = d(y, x) (symmetry); d(x, y) + d(y, z) ≥ d(x, z) (triangle inequality); for each x and t, there is exactly one ID y for which d(x, y) = t (unidirectionality).
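These properties can be checked exhaustively on a small ID space. The triangle inequality holds because d(x, z) = d(x, y) XOR d(y, z), and XOR never exceeds the sum of its operands. A sketch, assuming a hypothetical 4-bit space so the check finishes instantly:

```python
from itertools import product

BITS = 4  # hypothetical small ID space; the real space is 160 bits
SPACE = range(2 ** BITS)

def d(x: int, y: int) -> int:
    return x ^ y

def check_xor_metric() -> bool:
    for x, y, z in product(SPACE, repeat=3):
        assert d(x, x) == 0
        if x != y:
            assert d(x, y) > 0
        assert d(x, y) == d(y, x)                 # symmetry
        assert d(x, y) + d(y, z) >= d(x, z)       # triangle inequality
    # Unidirectionality: for each x and distance t there is exactly one y.
    for x in SPACE:
        for t in SPACE:
            assert sum(1 for y in SPACE if d(x, y) == t) == 1
    return True

print(check_xor_metric())  # True
```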

Node state: For each i (0 ≤ i < 160), every node keeps a list of nodes at a distance d with 2^i ≤ d < 2^(i+1) from itself; call each list a k-bucket. Each list is sorted by time last seen, with the least recently seen node at the head and the most recently seen node at the tail. k is a system-wide replication parameter, chosen so that any given set of k nodes is unlikely to all fail within an hour of each other. The list is updated whenever a node receives a message. Gnutella measurements showed that the longer a node has been up, the more likely it is to remain up for one more hour.
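Since bucket i holds nodes at distance 2^i ≤ d < 2^(i+1), the bucket index is the position of the highest set bit of the XOR distance. A minimal sketch (the function name is hypothetical):

```python
ID_BITS = 160

def bucket_index(own_id: int, other_id: int) -> int:
    """Index i of the k-bucket for other_id, where 2**i <= d < 2**(i + 1)."""
    d = own_id ^ other_id
    if d == 0:
        raise ValueError("a node does not store itself in its k-buckets")
    i = d.bit_length() - 1
    assert i < ID_BITS
    return i

print(bucket_index(0b0011, 0b1110))  # 3, since d = 13 and 8 <= 13 < 16
print(bucket_index(0b0011, 0b0010))  # 0, since d = 1
```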

Node state: The nodes in the k-buckets are the stepping stones of routing. By preferring the oldest nodes, k-buckets maximize the probability that their contacts will remain online. This also gives resistance to DoS attacks: new nodes find it difficult to get into a k-bucket, so flooding the system with new nodes cannot flush existing routing state. How is the bucket updated?
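One answer, following the update rule described in the Kademlia paper: a known sender moves to the tail; a new sender is appended if there is room; if the bucket is full, the least recently seen node is pinged and evicted only if it fails to respond. The sketch below assumes a hypothetical `ping` callback in place of a real PING RPC.

```python
from collections import OrderedDict
from typing import Callable, Tuple

Address = Tuple[str, int]  # (IP address, UDP port)

class KBucket:
    """One k-bucket: contacts ordered from least recently seen (head)
    to most recently seen (tail)."""

    def __init__(self, k: int, ping: Callable[[int], bool]):
        self.k = k
        self.ping = ping  # hypothetical hook: True if the node answers a PING RPC
        self.contacts = OrderedDict()  # node_id -> Address

    def update(self, node_id: int, addr: Address) -> None:
        """Apply the update rule when a message arrives from node_id."""
        if node_id in self.contacts:
            self.contacts.move_to_end(node_id)     # known contact: move to the tail
        elif len(self.contacts) < self.k:
            self.contacts[node_id] = addr          # bucket not full: append at the tail
        else:
            oldest = next(iter(self.contacts))     # least recently seen, at the head
            if self.ping(oldest):
                self.contacts.move_to_end(oldest)  # still alive: keep it, drop newcomer
            else:
                del self.contacts[oldest]          # unresponsive: evict it
                self.contacts[node_id] = addr      # and insert the newcomer at the tail

# Usage: node 1 stops answering PINGs, so the third contact replaces it.
alive = {1: False}
bucket = KBucket(k=2, ping=lambda nid: alive.get(nid, True))
bucket.update(1, ("192.0.2.1", 4000))
bucket.update(2, ("192.0.2.2", 4000))
bucket.update(3, ("192.0.2.3", 4000))
print(list(bucket.contacts))  # [2, 3]
```

Keeping the live old contact and discarding the newcomer is what makes the bucket favor long-lived nodes.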

Kademlia RPC: PING tests whether a node is online. STORE instructs a node to store a (key, value) pair. FIND_NODE takes an ID as an argument; the recipient returns (IP address, UDP port, node ID) triples for the k nodes it knows of that are closest to that ID (node lookup). FIND_VALUE behaves like FIND_NODE, except that if the recipient has received a STORE for that key, it just returns the stored value.
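A sketch of the four RPC handlers as plain methods on an in-memory node, assuming no networking and a flattened routing table (a single dict instead of k-buckets). The class and attribute names are hypothetical; the default k = 20 is only an example value.

```python
import heapq

class Node:
    """In-memory stand-in for a Kademlia node's RPC handlers (no networking)."""

    def __init__(self, node_id: int, k: int = 20):
        self.node_id = node_id
        self.k = k
        self.contacts = {}  # node_id -> (ip, udp_port); flattened routing table
        self.storage = {}   # key -> value

    def ping(self) -> bool:
        return True  # answering at all means the node is online

    def store(self, key: int, value: bytes) -> None:
        self.storage[key] = value

    def find_node(self, target: int):
        """(ip, port, node_id) triples for the k known nodes closest to target."""
        closest = heapq.nsmallest(self.k, self.contacts, key=lambda nid: nid ^ target)
        return [(*self.contacts[nid], nid) for nid in closest]

    def find_value(self, key: int):
        """The stored value if present, otherwise the FIND_NODE answer."""
        if key in self.storage:
            return {"value": self.storage[key]}
        return {"nodes": self.find_node(key)}

node = Node(0b0011, k=2)
node.contacts = {0b1110: ("192.0.2.2", 4000), 0b1000: ("192.0.2.3", 4000),
                 0b0001: ("192.0.2.4", 4000)}
print(node.find_node(0b1111))   # [('192.0.2.2', 4000, 14), ('192.0.2.3', 4000, 8)]
node.store(0b1111, b"hello")
print(node.find_value(0b1111))  # {'value': b'hello'}
```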

Kademlia Lookup: The most important task is to locate the k closest nodes to some given node ID. Kademlia employs a recursive algorithm for node lookups. The lookup initiator starts by picking α nodes from its closest non-empty k-bucket. The initiator then sends parallel, asynchronous FIND_NODE RPCs to the α nodes it has chosen. α is a system-wide concurrency parameter, such as 3.

Kademlia Lookup: When α = 1, the lookup resembles Chord’s in message cost and in the latency of detecting failed nodes. However, unlike Chord, Kademlia can forward to any one of the k nodes in a bucket, so it can choose a lower-latency route.

Kademlia Lookup: In each subsequent round, the initiator resends FIND_NODE to nodes it has learned about from previous RPCs. If a round of FIND_NODE calls fails to return a node any closer than the closest already seen, the initiator resends FIND_NODE to all of the k closest nodes it has not already queried. The lookup terminates when the initiator has queried, and gotten responses from, the k closest nodes it has seen.
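The lookup steps above can be sketched as a synchronous simulation. Assumptions, all simplifications: no node fails, "parallel" RPCs run one after another within a round, the first α nodes are the closest known ones rather than strictly those from the closest non-empty k-bucket, and the network is a hypothetical 4-bit toy in which each node knows exactly one node per k-bucket. The example reuses the slide's lookup of 1110 from node 0011.

```python
def lookup(network, start, target, k=3, alpha=2):
    """Simulated node lookup: return the k closest nodes to `target` that
    `start` can find. `network` maps each node ID to the node IDs it knows,
    a hypothetical stand-in for real k-buckets and FIND_NODE RPCs."""
    def dist(x):
        return x ^ target

    def find_node(nid):
        # The reply node `nid` would send: its k known nodes closest to target
        return [x for x in sorted(network[nid], key=dist)[:k] if x != start]

    seen = set(network[start])
    queried = set()
    while True:
        k_closest = sorted(seen, key=dist)[:k]
        pending = [n for n in k_closest if n not in queried]
        if not pending:
            return k_closest  # every one of the k closest seen nodes has answered
        best_before = dist(k_closest[0])
        for n in pending[:alpha]:  # one round of alpha FIND_NODE calls
            queried.add(n)
            seen.update(find_node(n))
        if dist(min(seen, key=dist)) >= best_before:
            # No closer node found: query all remaining unqueried k closest nodes
            for n in sorted(seen, key=dist)[:k]:
                if n not in queried:
                    queried.add(n)
                    seen.update(find_node(n))

# Toy network: node x knows x ^ 2**i for each bit i, one contact per k-bucket.
network = {x: [x ^ (1 << i) for i in range(4)] for x in range(16)}
print(lookup(network, 0b0011, 0b1110))  # [14, 15, 12]
```

Each round queries at least one new node, so the loop terminates on any finite network.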

Kademlia Key Storage: To store a (key, value) pair, a participant locates the k closest nodes to the key and sends them STORE RPCs. Additionally, each node re-publishes (key, value) pairs as necessary to keep them alive. For Kademlia’s file-sharing application, the original publisher of a (key, value) pair is required to republish it every 24 hours; otherwise, (key, value) pairs expire 24 hours after publication.
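A sketch of the 24-hour expiry rule on a single node, assuming each STORE (including the original publisher's republish) resets the pair's lifetime. The class and its injectable clock are hypothetical; the per-node re-publication mentioned above is not modeled.

```python
import time

EXPIRY_SECONDS = 24 * 60 * 60  # (key, value) pairs expire 24 hours after publication

class ValueStore:
    """Per-node storage for (key, value) pairs with publication-based expiry."""

    def __init__(self, clock=time.time):
        self.clock = clock  # injectable clock so the example can simulate time
        self.items = {}     # key -> (value, time of last (re)publication)

    def store(self, key, value):
        """Handle a STORE RPC; a republish resets the 24-hour lifetime."""
        self.items[key] = (value, self.clock())

    def get(self, key):
        entry = self.items.get(key)
        if entry is None:
            return None
        value, published_at = entry
        if self.clock() - published_at >= EXPIRY_SECONDS:
            del self.items[key]  # not republished in time: drop it
            return None
        return value

# Usage with a simulated clock (hours converted to seconds).
now = [0]
vs = ValueStore(clock=lambda: now[0])
vs.store("key", "value")   # published at t = 0 h
now[0] = 23 * 3600
print(vs.get("key"))       # value
vs.store("key", "value")   # original publisher republishes at t = 23 h
now[0] = 48 * 3600
print(vs.get("key"))       # None (25 h since the last publication)
```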

New node join: Each node bootstraps by looking up its own ID. It searches recursively until no closer nodes can be found, and stores the nodes passed on the way in its routing table.

Trackerless torrent: A common problem with a single tracker is that it is a single point of failure. One solution is to use multiple trackers; Kademlia helps implement this as a distributed tracker.

Main idea: The DHT uses SHA-1 hashes as keys. The key is the hash of the torrent’s metadata and uniquely identifies the torrent. The value is a list of the peers in the swarm.

Distributed tracker: Each peer announces itself to the distributed tracker by looking up the 8 nodes closest to the SHA-1 hash of the torrent (the info-hash) and sending an announce message to them. Those 8 nodes then add the announcing peer to the peer list stored at that info-hash. A peer joins a torrent by looking up the peer list at that hash.
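A single-process sketch of the announce and peer-lookup steps, assuming every DHT node is visible locally, so "looking up the 8 closest nodes" is a sort by XOR distance rather than a real network lookup. All names here are hypothetical; in the BitTorrent DHT specification (BEP 5) the corresponding queries are called `announce_peer` and `get_peers`, and they carry details (such as tokens) that this sketch omits.

```python
import hashlib

K_TRACKER = 8  # the slide's "8 nodes closest to the info-hash"

def sha1_int(data: bytes) -> int:
    return int.from_bytes(hashlib.sha1(data).digest(), "big")

class ToyDHT:
    """All DHT nodes in one process; peer lists live on the nodes closest to the key."""

    def __init__(self, node_ids):
        self.peer_lists = {nid: {} for nid in node_ids}  # node_id -> {info_hash: peers}

    def closest(self, key, n=K_TRACKER):
        return sorted(self.peer_lists, key=lambda nid: nid ^ key)[:n]

    def announce(self, key, peer):
        """Add the announcing peer to the peer list on each of the closest nodes."""
        for nid in self.closest(key):
            self.peer_lists[nid].setdefault(key, set()).add(peer)

    def get_peers(self, key):
        """Join a torrent: collect the peer lists stored at its info-hash."""
        peers = set()
        for nid in self.closest(key):
            peers |= self.peer_lists[nid].get(key, set())
        return peers

dht = ToyDHT([sha1_int(f"hypothetical-node-{i}".encode()) for i in range(50)])
torrent = sha1_int(b"hypothetical torrent metadata")  # the info-hash
dht.announce(torrent, ("192.0.2.10", 6881))
print(dht.get_peers(torrent))  # {('192.0.2.10', 6881)}
```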

Conclusion: Operation cost is as low as that of other popular protocols: lookup takes O(log N), and join or leave takes O(log² N). Fault tolerance and concurrent change are handled well through k-buckets. Proximity routing chooses nodes that have low latency. DoS attacks are handled by preferring nodes that have been up for a long time. The architecture works with various base values; a common choice is b = 5.
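A quick worked estimate of the lookup cost, assuming that with base value b (b bits resolved per hop) the expected hop count becomes roughly log base 2^b of N, as in the Kademlia paper’s discussion of accelerated lookups. The network size of one million nodes is hypothetical, and the result is a rough bound, not a measurement.

```python
import math

def expected_hops(n_nodes: int, b: int = 1) -> int:
    """Rough hop estimate: log base 2**b of N, rounded up."""
    return math.ceil(math.log(n_nodes, 2 ** b))

print(expected_hops(1_000_000))       # 20
print(expected_hops(1_000_000, b=5))  # 4
```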