A Backup System built from a Peer-to-Peer Distributed Hash Table
Russ Cox
Joint work with Josh Cates, Frank Dabek, Frans Kaashoek, Robert Morris, James Robertson, Emil Sit, and Jacob Strauss
MIT LCS

What is a P2P system?
- A system without any central servers
- Every node is a server
- No particular node is vital to the network
- All nodes have the same functionality
- Huge number of nodes, many node failures
- Enabled by technology improvements
(Figure: nodes connected to each other across the Internet.)

Robust data backup
Idea: back up data on other users' machines. Why?
- Many user machines are not backed up
- Backup currently requires significant manual effort
- Many machines have lots of spare disk space
Requirements for cooperative backup:
- Don't lose any data
- Make data highly available
- Validate the integrity of the data
- Store shared files only once
More challenging than sharing music!

The promise of P2P computing
- Reliability: no central point of failure
  - many replicas
  - geographic distribution
- High capacity through parallelism:
  - many disks
  - many network connections
  - many CPUs
- Automatic configuration
- Useful in public and proprietary settings

Distributed hash table (DHT)
- A DHT distributes data storage over perhaps millions of nodes
- A DHT provides a reliable storage abstraction for applications
(Figure: layered view. The distributed application (Backup) calls put(key, data) and get(key) -> data on the distributed hash table (DHash), which in turn calls lookup(key) -> node IP address on the lookup service (Chord).)

DHT implementation challenges
1. Data integrity
2. Scalable lookup
3. Handling failures
4. Network-awareness for performance
5. Coping with systems in flux
6. Balancing load (flash crowds)
7. Robustness with untrusted participants
8. Heterogeneity
9. Anonymity
10. Indexing
Goal: simple, provably-good algorithms. This talk covers the first four challenges.

1. Data integrity: self-authenticating data
- Key = SHA-1(data): after download, the key can be used to verify the data
- Keys stored inside other blocks act as pointers:
  - can build arbitrary tree-like data structures
  - the key is always known, so every block can be verified
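
A minimal sketch of self-authenticating storage using Python's standard hashlib. The in-memory dictionary stands in for the DHT, and the helper names (put_block, get_block) are illustrative, not the DHash API.

```python
import hashlib

store = {}  # stands in for the DHT

def put_block(data: bytes) -> str:
    key = hashlib.sha1(data).hexdigest()   # key = SHA-1 of the content
    store[key] = data
    return key

def get_block(key: str) -> bytes:
    data = store[key]
    # Integrity check: recompute the hash and compare it with the requested key.
    if hashlib.sha1(data).hexdigest() != key:
        raise ValueError("block failed integrity check")
    return data

# Keys embedded in other blocks act as pointers, so a parent block that lists
# its children's keys forms a verifiable tree.
child_key = put_block(b"file contents")
root_key = put_block(child_key.encode())
```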

2. The lookup problem
How do you find the node responsible for a key?
(Figure: a publisher calls put(key, data) and a client calls get(key) across the Internet; which of nodes N1 through N6 should they contact?)

Centralized lookup (Napster)
- Any node can store any key
- A central server knows where the keys are
- Simple, but O(N) state for the server
- The server can be attacked (a lawsuit killed Napster)
(Figure: node N4 registers SetLoc("title", N4) with a central DB; a client sends Lookup("title") to the DB.)

Flooded queries (Gnutella)
- Any node can store any key
- Lookup works by asking every node about the key
- Asking every node is very expensive
- Asking only some nodes might not find the key
(Figure: a client floods Lookup("title") to nodes N1 through N9.)

Lookup is a routing problem
- Assign key ranges to nodes
- Pass the lookup from node to node, making progress toward the destination
- Nodes can't choose what they store
But the DHT layer itself is easy (see the sketch below):
- DHT put(): lookup, then upload the data to the node
- DHT get(): lookup, then download the data from the node
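
A minimal sketch, not the real DHash code, of how put and get reduce to lookup. For illustration, lookup() returns an in-process Node object rather than an IP address, and the "network" is a toy list of eight nodes.

```python
class Node:
    def __init__(self, node_id: int):
        self.node_id = node_id
        self.blocks = {}                 # key -> data stored at this node

NODES = [Node(i) for i in range(8)]      # toy network

def lookup(key: int) -> Node:
    # Stand-in for Chord: any deterministic key -> node mapping will do here.
    return NODES[key % len(NODES)]

def dht_put(key: int, data: bytes) -> None:
    node = lookup(key)                   # the hard part: routing to the right node
    node.blocks[key] = data              # then simply upload the block

def dht_get(key: int) -> bytes:
    node = lookup(key)                   # same routing step
    return node.blocks[key]              # then download the block
```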

Routing algorithm goals
- Fair (balanced) key range assignments
- Small per-node routing table
- Easy-to-maintain routing table
- Small number of hops to route a message
- Simple algorithm

Chord: key assignments
- Arrange nodes and keys in a circle (a circular ID space)
- Node IDs are SHA-1(IP address)
- A node is responsible for all keys between it and the node before it on the circle
- Each node is responsible for about 1/N of the keys
(Figure: circular ID space with nodes N32, N60, N90, and N105 and keys K5, K20, and K80; N90 is responsible for keys K61 through K90.)
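
A minimal sketch of this assignment rule (consistent hashing): a key belongs to the first node at or after it on the circle, its successor. Small integer IDs stand in for 160-bit SHA-1 values, and the node positions are taken from the figure above.

```python
import bisect

node_ids = sorted([32, 60, 90, 105])     # node positions on the circle

def successor(key: int) -> int:
    """Return the ID of the node responsible for `key`."""
    i = bisect.bisect_left(node_ids, key)
    return node_ids[i % len(node_ids)]   # wrap around the top of the ID space

assert successor(80) == 90               # N90 is responsible for K61..K90
assert successor(20) == 32
assert successor(110) == 32              # wraps around the circle
```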

Chord: routing table
- The routing table lists nodes:
  - 1/2 of the way around the circle
  - 1/4 of the way around the circle
  - 1/8 of the way around the circle
  - ...
  - next around the circle
- log N entries in the table
- Can always make a step at least halfway to the destination
(Figure: node N80 with fingers 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, and 1/128 of the way around the circle.)
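
A minimal sketch of building such a routing (finger) table, reusing successor() from the previous sketch. The M-bit toy ID space is an illustrative assumption; Chord uses 160 bits. Finger i of node n points at the node responsible for (n + 2**i) mod 2**M, giving entries roughly 1/2, 1/4, 1/8, ... of the way around the circle.

```python
M = 7                                    # toy 7-bit ID space (Chord uses 160 bits)

def finger_table(n: int) -> list[int]:
    return [successor((n + 2**i) % 2**M) for i in range(M)]

print(finger_table(80))                  # the fingers of node N80
```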

Lookups take O(log N) hops
- Each step goes at least halfway to the destination
- log N steps, like binary search
(Figure: N32 does a lookup for K19 on a ring of nodes N5, N10, N20, N32, N60, N80, N99, and N110; K19 is stored at N20.)
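
A minimal sketch of an iterative lookup over the toy finger tables above (different node IDs than the figure). At each hop it jumps to the farthest finger that does not pass the key, which is why each step covers at least half of the remaining distance.

```python
def distance(a: int, b: int) -> int:
    return (b - a) % 2**M                # clockwise distance on the circle

def in_interval(x: int, a: int, b: int) -> bool:
    """True if x lies in the circular half-open interval (a, b]."""
    return x != a and distance(a, x) <= distance(a, b)

def chord_lookup(n: int, key: int) -> int:
    while True:
        fingers = finger_table(n)
        succ = fingers[0]                # n's immediate successor
        if in_interval(key, n, succ):
            return succ                  # succ is responsible for the key
        # Otherwise jump to the closest finger that still precedes the key.
        preceding = [f for f in fingers if f != key and in_interval(f, n, key)]
        n = max(preceding, key=lambda f: distance(n, f))

print(chord_lookup(32, 80))              # hops 32 -> 60, then answers 90
```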

3. Handling failures: redundancy
- Each node knows about the next r nodes on the circle
- Each key is stored by the r nodes after it on the circle
- To save space, each node stores only a piece of the block
- Collecting half the pieces is enough to reconstruct the block
(Figure: the ring from the previous slide with node N40 added; copies of K19 are stored at the nodes after it, such as N20 and N40.)
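
A minimal sketch of the successor-list replication part, reusing the toy ring above: a key's block is kept on the r nodes that follow it on the circle, so it survives as long as any one of them is up. The erasure-coded fragments mentioned on the slide are not shown.

```python
R = 2                                    # toy replication factor

def replica_nodes(key: int, r: int = R) -> list[int]:
    first = node_ids.index(successor(key))
    return [node_ids[(first + i) % len(node_ids)] for i in range(r)]

print(replica_nodes(80))                 # [90, 105]: the two nodes after K80
```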

Redundancy handles failures
Experiment: 1000 DHT nodes, average of 5 runs, 6 replicas for each key.
- Kill a fraction of the nodes, then measure how many lookups fail
- All replicas must be killed for a lookup to fail
(Figure: fraction of failed lookups vs. fraction of failed nodes.)

4. Exploiting proximity
- The path from N20 to N80 might usually go through N41, but going through N40 would be faster
- In general, nodes that are close on the ring may be far apart in the Internet
- Knowing about proximity could help performance
(Figure: ring with N20, N40, N41, and N80.)

Proximity possibilities
Given two nodes, how can we predict the network distance (latency) between them accurately?
- Every node pings every other node: requires N^2 pings (does not scale)
- Use static information about the network layout: poor predictions, and what if the layout changes?
- Every node pings some reference nodes and "triangulates" to find its position on Earth: how do you pick the reference nodes? Earth distances and network distances do not always match.

Vivaldi: network coordinates
Assign 2D or 3D "network coordinates" using a spring algorithm. Each node:
- starts with random coordinates
- knows the measured distance (latency) to recently contacted nodes, and their positions
- imagines itself connected to these other nodes by springs with rest length equal to the measured distance
- allows the springs to push it for a small time step
The algorithm uses measurements of normal traffic (no extra measurements) and minimizes the average squared prediction error. A sketch of the update step follows.
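
A minimal 2D sketch of the spring update described above, with illustrative constants; it is not the exact Vivaldi algorithm from the paper.

```python
import math
import random

class VivaldiNode:
    def __init__(self):
        self.pos = [random.uniform(-1, 1), random.uniform(-1, 1)]   # random start

    def predict(self, other_pos: list[float]) -> float:
        return math.dist(self.pos, other_pos)        # coordinate distance = predicted latency

    def update(self, other_pos: list[float], measured_rtt: float, dt: float = 0.05) -> None:
        predicted = self.predict(other_pos)
        error = measured_rtt - predicted             # spring stretched (+) or compressed (-)
        if predicted == 0:
            direction = [random.uniform(-1, 1), random.uniform(-1, 1)]  # pick any direction
        else:
            direction = [(self.pos[i] - other_pos[i]) / predicted for i in range(2)]
        # Let the spring push the node for a small time step; repeated over
        # normal traffic this drives down the squared prediction error.
        self.pos = [self.pos[i] + dt * error * direction[i] for i in range(2)]
```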

Vivaldi in action: PlanetLab
- Simulation on the PlanetLab network testbed
- 100 nodes, mostly in the USA, some in Europe and Australia
- ~25 measurements per node per second in the movie

Geographic vs. network coordinates
Derived network coordinates are similar to geographic coordinates, but not exactly the same:
- over-sea distances shrink (more than over-land distances)
- without extra hints, the orientation of Australia and Europe comes out "wrong"

Vivaldi predicts latency well

When you can predict latency...
- ... contact nearby replicas to download the data
- ... stop the lookup early once you identify nearby replicas

Finding nearby nodes
- Exchange neighbor sets with random neighbors
- Combine with random probes to explore
- Provably-good algorithm for finding nearby neighbors based on sampling [Karger and Ruhl 02]

When you have many nearby nodes...
- ... route using nearby nodes instead of fingers

DHT implementation summary
- Chord for looking up keys
- Replication at successors for fault tolerance
- Fragmentation and erasure coding to reduce storage space
- Vivaldi network coordinate system for:
  - server selection
  - proximity routing

Backup system on the DHT
- Store file system image snapshots as hash trees:
  - daily images can be accessed directly
  - yet images share storage for common blocks
  - only incremental storage cost
- Encrypt the data
- A user-level NFS server parses the file system images to present the dump hierarchy
- The application is ignorant of the DHT challenges: the DHT is just a reliable block store
(A sketch of snapshot sharing via hash trees follows.)
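
A minimal sketch of why snapshots stored as hash trees share storage: unchanged blocks hash to the same key, so a new snapshot adds only the blocks that changed plus a new root. put_block() repeats the content-addressed helper from the earlier sketch; the snapshot() helper and file contents are made up for illustration.

```python
import hashlib

store: dict[str, bytes] = {}

def put_block(data: bytes) -> str:
    key = hashlib.sha1(data).hexdigest()    # same content-addressed rule as before
    store[key] = data
    return key

def snapshot(files: dict[str, bytes]) -> str:
    """Store each file's block, then a root block listing name -> key pairs."""
    entries = {name: put_block(data) for name, data in files.items()}
    root = "\n".join(f"{name} {key}" for name, key in sorted(entries.items()))
    return put_block(root.encode())

day1 = snapshot({"a.txt": b"hello", "b.txt": b"world"})
day2 = snapshot({"a.txt": b"hello", "b.txt": b"world!"})    # only b.txt changed
# The store now holds one copy of a.txt's block, two versions of b.txt's
# block, and two small root blocks: the second snapshot costs only its delta.
print(len(store))                                           # 5 blocks in total
```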

Future work
- DHTs: improve performance; handle untrusted nodes
- Vivaldi: does it scale to larger and more diverse networks?
- Applications: need lots of interesting applications

Related work
- Lookup algorithms: CAN, Kademlia, Koorde, Pastry, Tapestry, Viceroy, ...
- DHTs: OceanStore, Past, ...
- Network coordinates and springs: GNP, Hoppe's mesh relaxation
- Applications: Ivy, OceanStore, Pastiche, Twine, ...

Conclusions
- Peer-to-peer promises some great properties
- Once we have DHTs, building large-scale distributed applications is easy:
  - single, shared infrastructure for many applications
  - robust in the face of failures and attacks
  - scalable to a large number of servers
  - self-configuring across administrative domains
  - easy to program

Links
- Chord home page
- Project IRIS (peer-to-peer research)