Chord and CFS
Philip Skov Knudsen, Niels Teglsbo Jensen, Mads Lundemann
Distributed hash table
Stores values at nodes.
A hash function maps a name to a hash key; the name can be any string or byte array.
(Note: the article uses "key" and "ID" interchangeably.)
Covered here: Chord and CFS.
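As a concrete sketch of the naming step (assuming SHA-1 as the base hash, which is what Chord uses; key_for is an illustrative helper, not from the article):

    import hashlib

    def key_for(name: bytes) -> int:
        """Hash an arbitrary name (any string or byte array) to a 160-bit key."""
        return int.from_bytes(hashlib.sha1(name).digest(), "big")

    print(key_for(b"/reports/2001/chord.pdf"))  # a 160-bit integer key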
Chord: A Scalable Peer-to-Peer Lookup Protocol for Internet Applications
Chord's purpose
Map keys to nodes. (Unlike Freenet, Chord provides no anonymity.)
Goals
Load balance
Decentralization
Scalability
Availability
Flexible naming
Consistent hashing
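A minimal sketch of the idea: node IDs and keys share one identifier circle, and each key belongs to its successor, the first node clockwise from the key. The Ring class is illustrative, not from the paper.

    import bisect

    class Ring:
        def __init__(self, node_ids):
            self.nodes = sorted(node_ids)

        def successor(self, key):
            # first node whose ID is >= key, wrapping around the circle
            i = bisect.bisect_left(self.nodes, key)
            return self.nodes[i % len(self.nodes)]

    ring = Ring([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])
    print(ring.successor(26))  # 32: key 26 is stored at node 32
    print(ring.successor(60))  # 1: wraps past the top of the circle

When a node joins or leaves, only the keys on the arc between it and its predecessor move, which is what makes consistent hashing load-balanced and churn-friendly.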
Simple network topology: each node knows only its successor, so a lookup may have to traverse all N nodes.
Efficient network topology: each node also keeps a finger table whose i-th entry points to the first node at least 2^(i-1) past it on the circle, giving O(log N) lookups.
Lookup algorithm
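A runnable sketch of the lookup, following the paper's find_successor / closest_preceding_finger pseudocode; the Node class, the between helper, and the 6-bit identifier space are illustrative choices, and the finger tables are assumed to be already filled in (finger[i] = successor(id + 2^i)).

    M = 6  # bits in the identifier space: IDs live on a circle of 2^M points

    def between(x, a, b):
        """True if x lies in the open interval (a, b) on the circle."""
        return (a < x < b) if a < b else (x > a or x < b)

    class Node:
        def __init__(self, ident):
            self.id = ident
            self.successor = self        # next node clockwise on the circle
            self.predecessor = None
            self.finger = [self] * M     # finger[i] = successor(id + 2^i)

        def closest_preceding_finger(self, key):
            # scan fingers from farthest to nearest for the closest node
            # that still precedes the key
            for f in reversed(self.finger):
                if between(f.id, self.id, key):
                    return f
            return self

        def find_successor(self, key):
            # hop across fingers until the key falls in (n, n.successor]
            n = self
            while not (between(key, n.id, n.successor.id) or key == n.successor.id):
                n = n.closest_preceding_finger(key)
            return n.successor

Each hop at least halves the remaining distance to the key, which is where the O(log N) lookup cost comes from.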
Node joining
26.join(friend) -> 26.successor = 32
26.stabilize -> 32.notify(26) -> 32.predecessor = 26
21.stabilize -> 21.successor = 26 -> 26.notify(21)
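The same trace as a sketch, reusing the Node class above; join, stabilize, and notify follow the paper's pseudocode, written here as plain functions over nodes.

    def join(n, friend):
        # n joins the ring through any known node `friend`
        n.predecessor = None
        n.successor = friend.find_successor(n.id)

    def stabilize(n):
        # run periodically: adopt a newly joined closer successor,
        # then remind the successor that we exist
        x = n.successor.predecessor
        if x is not None and between(x.id, n.id, n.successor.id):
            n.successor = x
        notify(n.successor, n)

    def notify(s, p):
        # p claims to be s's predecessor; accept if closer than the old one
        if s.predecessor is None or between(p.id, s.predecessor.id, s.id):
            s.predecessor = p

    # the trace from the slide: node 26 joins a ring containing 21 and 32
    n21, n32 = Node(21), Node(32)
    n21.successor, n32.successor = n32, n21
    n21.predecessor, n32.predecessor = n32, n21
    n26 = Node(26)
    join(n26, n21)    # 26.successor = 32
    stabilize(n26)    # 26 notifies 32: 32.predecessor = 26
    stabilize(n21)    # 21 sees 32.predecessor is 26, so 21.successor = 26;
                      # 26.notify(21) then sets 26.predecessor = 21
    print(n21.successor.id, n26.successor.id, n32.predecessor.id)  # 26 32 26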
Preventing lookup failure
Each node keeps a successor list of length r.
Disregarding network failures, and assuming each node fails within one stabilization period with probability p, a node loses connectivity with probability p^r.
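For example, with p = 1/2 (every node fails within one stabilization period with probability one half) and r = 2·log2 N (so r = O(log N), as the paper assumes), the probability of losing all successors is (1/2)^(2·log2 N) = 1/N^2, which is how the paper argues that lookups still succeed with high probability.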
Path lengths from simulation
Figure: probability density function of the path length in a network of 2^12 nodes.
Figure: path length as N varies.
Load balance
Simulation with 10^4 nodes and 5·10^5 keys.
Virtual servers
Simulation with 10^4 nodes and 10^6 keys.
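A sketch of the virtual-server idea: one physical host joins the ring under several derived IDs, owning several small arcs instead of one large one. Deriving the IDs by hashing host:index is an assumption for illustration, not prescribed by the paper.

    import hashlib

    def virtual_ids(host, count):
        """Ring positions for one physical host running `count` virtual servers."""
        for i in range(count):
            digest = hashlib.sha1(f"{host}:{i}".encode()).digest()
            yield int.from_bytes(digest, "big")

    # e.g. around log2(N) virtual servers per host for an N-host network
    print(list(virtual_ids("node7.example.org", 4)))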
Resilience to failed nodes
Simulation in a network of 1000 nodes.
Latency stretch
In a network of 2^16 nodes:
c = Chord latency, i = IP latency, stretch = c / i
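With hypothetical numbers: if a Chord lookup accumulates 80 ms of latency and the direct IP path to the same server takes 20 ms, the stretch is 80 / 20 = 4.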
CFS: Wide-Area Cooperative Storage
Purpose: a distributed, cooperative, read-only file system.
System design: each server layers a file-system client on top of DHash (block storage) on top of Chord (lookup).
File system using DHash
Files are split into blocks, each stored in DHash under the hash of its contents; the root block is signed by the publisher.
Block placement (figure legend)
Tick mark: a block ID
Square: the server responsible for the ID (in Chord)
Circles: servers holding replicas
Triangle: servers receiving a copy of the block to cache
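A sketch of the placement rule behind the legend, reusing the Ring from the consistent-hashing sketch; storing a block at the successor of its ID plus the next r servers matches how CFS replicates, while caching along the lookup path (the triangles) is omitted here.

    REPLICAS = 3  # r servers holding replicas, after the responsible server

    def place(ring, block_id):
        # square in the figure: the server whose ID succeeds the block ID;
        # circles: its next r successors, taken from the successor list
        responsible = ring.successor(block_id)
        i = ring.nodes.index(responsible)
        replicas = [ring.nodes[(i + k) % len(ring.nodes)]
                    for k in range(1, REPLICAS + 1)]
        return responsible, replicas

    ring = Ring([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])
    print(place(ring, 26))  # (32, [38, 42, 48])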
Availability
r servers hold replicas of each block.
The server responsible for an ID detects failed replica servers.
If the responsible server itself fails, the first replica server takes over; the replica servers detect this when Chord stabilizes.
Replica servers are found in the successor list.
Persistence
Each server promises to keep a copy of a block available for at least an agreed-on interval, and publishers can ask for extensions.
This applies to replicas, not to cached copies.
The server responsible for the ID also relays extension requests to the servers holding replicas.
Load balancing
Consistent hashing
Virtual servers
Caching
Preventing flooding
Each CFS server limits any one IP address to a certain percentage of its storage.
The percentage can be lowered as more nodes enter the network.
Clients with dynamic IP addresses can circumvent the limit.
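A sketch of the quota check, with illustrative numbers (a 0.1% per-IP share of a 1 GB disk); CFS specifies the policy, not this particular code.

    class QuotaStore:
        """Refuse stores once one IP address exceeds its share of the disk."""
        def __init__(self, capacity_bytes, fraction=0.001):
            self.capacity = capacity_bytes
            self.fraction = fraction      # can be lowered as the network grows
            self.used_by_ip = {}

        def try_store(self, ip, nbytes):
            used = self.used_by_ip.get(ip, 0)
            if used + nbytes > self.fraction * self.capacity:
                return False              # over this IP's quota
            self.used_by_ip[ip] = used + nbytes
            return True

    store = QuotaStore(capacity_bytes=10**9)
    print(store.try_store("203.0.113.9", 500_000))  # True
    print(store.try_store("203.0.113.9", 600_000))  # False: past the 1 MB quota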
Efficiency
Efficient lookups using Chord
Prefetching
Server selection
Conclusion
Efficient
Scalable
Available
Load-balanced
Decentralized
Persistent
Prevents flooding