1 Secure Peer-to-Peer File Sharing Frans Kaashoek, David Karger, Robert Morris, Ion Stoica, Hari Balakrishnan MIT Laboratory for Computer Science
2 SFS: a secure global file system One name space for all files Global deployment Security over untrusted networks (diagram: server, Oxygen client, and client at MIT/H21; example pathname /global/mit/kaashoek/sfs)
3 SFS results Research: how to do server authentication? –Self-certifying pathnames –Flexible key management Complete system available –65,000 lines of C++ code –Toolkit for file system research System used inside and outside MIT Ported to iPAQ
4 New direction: peer-to-peer file sharing How to build distributed systems without centrally-managed servers? Many Oxygen technologies are peer-to-peer –INS, SFS/Chord, Grid Chord is a new, elegant primitive for building peer-to-peer applications
5 Peer-to-Peer Properties Main advantage: decentralized –No single point of failure in the system –More robust to random faults or adversaries –No need for central administration Main disadvantage: decentralized –All failures equally important---no “clients” –Difficult to coordinate use of resources –No opportunity for central administration
6 Peer-to-Peer Challenges Load balancing –No node should be overloaded Coordination –Agree globally on who is responsible for what Dynamic network/fault tolerance –Readjust responsibility as peers come and go Scalability –Resources per peer must be negligible
7 Peer-to-peer sharing example Internet users share music files –Share disk storage and network bandwidth –Example: 10Gb shared for 1 hour/day ≈ 400Mb shared continuously (diagram: peers connected via the Internet)
8 Key Primitive: Lookup (diagram: both insert and find are built on lookup)
9 Chord: a P2P Routing Primitive Lookup is the key problem –Given identifier, find responsible machine Lookup is not easy: –Gnutella scales badly---too much lookup work –Freenet is imprecise---lookups can fail Chord lookup provides: –Good naming semantics and efficiency –Elegant base for layered features
10 Chord Architecture Interface: –Lookup(ID) → IP address –ID might be node name, document ID, etc. –Get IP address of node responsible for ID –Application decides what to do with IP address Chord consists of –Consistent hashing to assign IDs to nodes –Efficient routing protocol to find right node –Fast join/leave protocol
11 Chord Properties Log(n) lookup messages and table space. –log(1,000,000) ≈ 20 Well-defined location for each ID –No search required Natural load balance Minimal join/leave disruption Does not store documents –But document store layers easily on top
12 Assignment of Responsibility
13 Consistent Hashing
14 Consistent Hashing Each node picks random point on identifier circle
15 Consistent Hashing Hash document ID to identifier circle
16 Consistent Hashing Assign each ID to the “successor” node on the circle (example: document with hash 49 is assigned to node 51)
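The assignment rule on slides 13–16 can be sketched in Python (the deployed system is C++; the names `chord_id` and `successor` and the tiny 6-bit circle are illustrative assumptions, though SHA-1 hashing is what Chord itself uses):

```python
import hashlib

BITS = 6                      # tiny 2**6 = 64-point circle for illustration
M = 2 ** BITS

def chord_id(name: str) -> int:
    # Hash a node address or document name onto the identifier circle.
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % M

def successor(ring, ident):
    # A key belongs to the first node clockwise from its identifier.
    for n in sorted(ring):
        if n >= ident:
            return n
    return min(ring)          # wrap around past the top of the circle
```

With nodes {10, 31, 51} on the circle, a document hashing to 49 lands on node 51, matching the slide's example; a key past the largest node wraps around to the smallest.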
17 Load Balance Each node is responsible for the circle segment between it and the previous node But random node positions mean the previous node is close So no node is responsible for too much (diagram: segment for node 31)
18 Dynamic Network To know the appropriate successor, a node must know the identifiers of all nodes on the circle Requires lots of state per node And state must be kept current Requires huge number of messages when a node joins or leaves
19 Successor Pointers Each node keeps track of successor on circle To find objects, walk around circle using successor pointers When node joins, notify one node to update successor Problem: slow!
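The successor-only walk can be sketched as follows (an illustrative Python fragment, not the system's code; `succ` maps each node ID to its successor's ID):

```python
def in_cw(x, a, b):
    # True if x lies in the clockwise interval (a, b] on the circle.
    return (a < x <= b) if a < b else (x > a or x <= b)

def walk_lookup(succ, start, ident):
    # Follow successor pointers one hop at a time.
    # Always finds the right node, but needs O(n) hops: slow.
    n, hops = start, 0
    while not in_cw(ident, n, succ[n]):
        n = succ[n]
        hops += 1
    return succ[n], hops
```

On a three-node ring {10, 31, 51}, looking up 49 from node 10 walks to 31 before landing on 51; with a million nodes the walk would average half a million hops, which is why fingers are needed.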
20 Fingers Each node keeps carefully chosen “fingers”---shortcuts around circle For distant ID, shortcut covers much distance Result: –fast lookups –small tables
21 Powers of 2 A node at ID n keeps fingers to the nodes 1/2, 1/4, 1/8, 1/16, … of the way around the circle log(n) fingers suffice for n nodes Key fact: from any current node, some finger covers at least half the remaining distance to the target Distance to target halves at each step log(n) steps suffice to reach target log(1,000,000) ≈ 20
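The finger-based lookup (slides 20–22) can be sketched like this (a Python sketch on a 6-bit circle; `fingers` and `lookup` are illustrative names, and the real protocol does this with messages between nodes rather than a shared node list):

```python
import bisect

BITS = 6
M = 2 ** BITS

def in_cw(x, a, b):
    # x in the clockwise interval (a, b] on the identifier circle
    return (a < x <= b) if a < b else (x > a or x <= b)

def succ_of(ring, ident):
    # First node clockwise from ident (ring is a sorted list of node IDs).
    i = bisect.bisect_left(ring, ident % M)
    return ring[i % len(ring)]

def fingers(ring, n):
    # Finger i points to the successor of n + 2**i: log-many shortcuts.
    return [succ_of(ring, (n + 2 ** i) % M) for i in range(BITS)]

def lookup(ring, start, ident):
    # At each step jump to the farthest finger that still precedes ident,
    # so the remaining distance roughly halves: O(log n) hops.
    n, hops = start, 0
    while not in_cw(ident, n, succ_of(ring, (n + 1) % M)):
        n = next((f for f in reversed(fingers(ring, n))
                  if f != n and in_cw(f, n, ident) and f != ident),
                 succ_of(ring, (n + 1) % M))
        hops += 1
    return succ_of(ring, (n + 1) % M), hops
```

Each iteration moves strictly clockwise and at least halves the gap to the target, so the hop count is bounded by the number of bits in the identifier.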
22 Chord Lookups
23 Node Join Operations Integrate into routing mechanism –New node finds successor (via lookup) –Determines fingers (more lookups) –Total: O(log²(n)) time to join network Takes responsibility for certain objects from successor –Upcall for application dependent reaction –E.g., may copy documents from other node
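The hand-off of responsibility at join time can be sketched as follows (Python; `keys_to_transfer` and the explicit key set are illustrative, not the actual protocol code):

```python
def in_cw(x, a, b):
    # True if x lies in the clockwise interval (a, b] on the circle.
    return (a < x <= b) if a < b else (x > a or x <= b)

def keys_to_transfer(ring, new_node, successor_keys):
    # When new_node joins, its successor hands over exactly the keys
    # that now fall in (predecessor(new_node), new_node].
    pred = max((n for n in ring if n < new_node), default=max(ring))
    return sorted(k for k in successor_keys if in_cw(k, pred, new_node))
```

For example, if node 31 joins a ring containing 10 and 51, node 51 transfers the keys in (10, 31] and keeps the rest; the application upcall would then copy the corresponding documents.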
24 Fault Tolerance Node failures have 2 problems: –Lost data –Corrupted routing (fingers cut off) Data solution: replicate: –Place copies of data at adjacent nodes –If successor fails, next node becomes successor Finger solution: alternate paths –If finger lost, use different (shorter) finger –Lookups still fast
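The replica-placement rule for the data half of this slide might look like (a sketch; `replica_set` is a hypothetical helper, assuming r copies at the key's successor and the nodes after it):

```python
import bisect

def replica_set(ring, ident, r):
    # Store a value at the key's successor and the next r-1 nodes on the
    # circle, so the next node takes over transparently if the successor fails.
    ring = sorted(ring)
    i = bisect.bisect_left(ring, ident)
    return [ring[(i + k) % len(ring)] for k in range(min(r, len(ring)))]
```

Because a failed successor's segment is absorbed by the next node on the circle, that node already holds a replica of every key it inherits.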
25 File sharing with Chord (diagram: Client App, e.g. a browser, sits on a Key/Value layer, which sits on Chord, on both client and server; calls: get(key)/put(k, v) at the key/value layer, lookup(id) in Chord) Fault tolerance: store values at r successors Hot documents: cache values along Chord lookup path Authentication: self-certifying names (SFS)
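The layering could be sketched as (Python; `KeyValueLayer` and the in-memory `disks` dict are stand-ins for the real per-node stores and the RPCs between peers):

```python
class KeyValueLayer:
    # The key/value layer only asks Chord which node owns a key,
    # then reads or writes that node's local store.
    def __init__(self, lookup):
        self.lookup = lookup            # Chord lookup: id -> node
        self.disks = {}                 # node -> local store (stand-in for RPC)

    def put(self, ident, value):
        node = self.lookup(ident)
        self.disks.setdefault(node, {})[ident] = value

    def get(self, ident):
        node = self.lookup(ident)
        return self.disks.get(node, {}).get(ident)
```

The application above it never sees IP addresses or the ring: it deals only in keys and values, which is what makes the flat Chord interface easy to layer on.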
26 Chord Status Working Chord implementation SFSRO file system layered on top Prototype deployed at 12 sites around the world Goal: understand design tradeoffs
27 Open Issues Network proximity Malicious data insertion Malicious Chord table information Anonymity Keyword search and indexing
28 Chord Summary Chord provides distributed lookup –Efficient, low-impact join and leave Flat key space allows flexible extensions Good foundation for peer-to-peer systems