CS522: Algorithmic and Economic Aspects of the Internet Instructors: Nicole Immorlica Mohammad Mahdian
Peer-To-Peer Networks A network in which nodes employ distributed resources to accomplish critical task Nodes are typically equals, i.e. (approximately) indistinguishable in functionality System is highly dynamic, nodes frequently come and go
P2P Definition Distributed systems consisting of interconnected nodes able to self-organize into network topologies with the purpose of sharing resources such as content, CPU cycles, storage and bandwidth, capable of adapting to failures and accommodating transient populations of nodes while maintaining acceptable connectivity and performance, without requiring the intermediation or support of a global centralized server or authority. – A Survey of Peer-To-Peer Content Distribution Technologies, Androutsellis-Theotokis and Spinellis
Peer-To-Peer Applications Direct real-time communication: instant messaging Combine processing power of multiple distributed machines to perform complex computations: analysis of SETI data, prime computation Store and distribute digital content: mp3 file sharing
Peer-To-Peer Benefits Self-organized and adaptive Easily scalable Fault-tolerant and load balanced Resistant to censorship
P2P Construction MIT Clients Servers SPRINT UUNET AOL P2P overlay network
P2P Classification Unstructured Loosely Structured Highly Structured HybridNapster, IM PartialKazaa, Gia NoneGnutellaFreenetChord, CAN Centralization Data organization
Napster, IM Centralized servers maintain list of files and peer at which file is stored Peers join, leave, and query network via direct communication with servers File transfers occur directly between peers Server Query: U2 Reply: 6 File Transfer
Napster, IM Advantages: Highly efficient data lookup Rapidly adapts to changes in network Disadvantages: Questionable scalability Vulnerable to censorship, failure, attack
Gnutella All peers, called servents, are identical and function as both servers and clients A peer joins network by contacting existing servents (chosen from online databases) using PING messages A servent receiving a PING message replies with a PONG message and forwards PING to other servents Peer connects to servents who send PONG
Gnutella A servent queries network by sending a QUERY message A servent receiving a QUERY message replies with a QUERYHIT message if he can answer the query. If not, he forwards QUERY message to other servents
Routing in Gnutella How PING/QUERY messages are forwarded affects network topology, search efficiency/accuracy, and scalability Proposals Breadth-First-Search: flooding, iterative deepening, modified random BFS Depth-First-Search: random walk, k-walker random walks, two-level random walk, dominating set based search
Hybrid Search Random walk with lookahead: short random walks with shallow local flooding Takes advantage of “supernodes” or nodes of high degree Stationary dist. of random walk is naturally biased towards supernodes Lookahead allows search to quickly discover content stored at all neighbors of these high- degree nodes
Supernodes Improve scalability and performance of Gnutella-like systems via supernodes Supernodes are special peers with high degree, elected dynamically according to bandwidth and other considerations Supernodes maintain a list of content stored at peers Advantages: Searches propagate on supernodes, 3 to 5 times faster Takes advantage of heterogeneity in network
Gnutella Advantages Entirely decentralized, pure P2P network Highly resistant to failure Disadvantages Search is time-consuming Network typically scales poorly
Chord Distributed hash table (DHT) implementation Each node/piece of content has an ID Content IDs are deterministically mapped to node IDs so a searcher knows exactly where data is located, a content addressable network Efficient: O(log n) messages per lookup Scalable: O(log n) state per node
Keys in Chord m bit identifier space for both nodes and content keys Content ID = hash(content) Node ID = hash(IP address) Both are uniformly distributed How to map content IDs to node IDs?
N32 N90 N123 K20 K5 Circular 7-bit ID space 0 IP=“ ” K101 K60 Content = “U2” Mapping Content to Nodes Content is stored at successor node, node with next higher ID Figure adapted from Stoica et al.
Routing Every node knows of every other node Routing tables O(n), lookup O(1) N32 N90 N123 Hash(“U2”) = K60 N10 N55 Where is “U2”? “N90 has K60” K60 Figure adapted from Stoica et al.
Routing Every node knows its successor in ring Routing tables O(1), lookup O(n) N32 N90 N123 Hash(“U2”) = K60 N10 N55 Where is “U2”? “N90 has K60” K60 Figure adapted from Stoica et al.
Routing Every node knows m others Distances increase exponentially, node i points to node whose ID is successor of i + 2 j for j from 1 to m. These pointers are called fingers. The finger (routing) table and search time are both O(log n)
Finger Tables N N112 N96 N Figure adapted from Stoica et al.
Routing with Finger Tables N32 N10 N5 N20 N110 N99 N80 N60 Lookup(K19) K19 Figure adapted from Stoica et al.
Chord Dynamics When a node joins Initialize all fingers of new node Update fingers of existing nodes Transfer content from successor to new node When a node leaves Transfer content to successor
Chord Failures Churn rate is very high (on average, nodes are in system for only 60 minutes) and events happen concurrently Churn (esp. ungraceful departures or simultaneous joins/departures) can failure states, e.g. inconsistencies in successor relationships or, worse, loopy states Requires a lot of maintenance messages to preserve ideal state
Maintenance in P2P Maintenance protocol ensures global connectivity and efficient lookup by continuously repairing overlay network and routing tables Maintenance is essential, e.g. when a node Joins and announces presence Updates routing table to ensure efficient search Monitors neighbors for failures/departures Cost of maintenance protocol can be measured in terms of the rate of maintenance messages
Half Life Defn. Suppose there are N t nodes at time t. Let the doubling time t be such that at time t + t, N t new nodes have arrived. Similarly let halving time t be such that at time t + t, N t /2 nodes have departed. Then the half life of a system is min t ( t, t ). The half life is the average amount of time until half the system has been replaced. Measures rate of change of system.
Example Nodes arrive according to Poisson with rate : prob. k arrivals in time t proportional to e - t Nodes remain for duration exponential rate : prob. node stays for amount of time is e If system in steady state, then arrival rate must equal departure rate N, so N = / . Doubling time = N/ = 1/ , halving time = (ln 2)/ , and so half life = (ln 2)/ .
Bounding Maintenance Costs Thm. There exists a sequence of joins and leaves such that any node that, at any time, has received an average of fewer than k notifications per half-life will be disconnected from the network with prob. at least (1 – 1/e) k. Cor. Any N-node P2P network that remains connected with prob. at least 1 – 1/N must generate an average of (log N) notifications per node per half life.
Proof of Bound Consider Poisson arrival rate, exponential waiting time =1 in system. Suppose node n averages fewer than k notifications per half life and so there is a minimum time t such that at time t, n has received less than kt notifications. Observe n is isolated at time t with probability at least (1 – 1/(e-1)) k.
Maintenance in Chord Liben-Nowell, Balakrishnan, Karger: (Modified) Chord requires only O(log 2 n) maintenance messages per half life to maintain efficiency and correctness of search.
Chord Advantages: Highly efficient search Good load balancing Disadvantages: Locality of data is destroyed Only handles exact match queries, but keyword queries are more prevalent Most requests are for highly replicated files (needles vs haystack)
Conclusion Saw several representative P2P systems, each with advantages and disadvantages Many important issues Efficiency of search Ability to adapt to dynamics of system Security: Malicious peers, Spread of worms Free riding: Reputation mechanisms, Micro- payment mechanisms Legality