Evolution of P2P Content Distribution Pei Cao
Outline History of P2P Content Distribution Architectures Techniques to Improve Gnutella Brief Overview of DHT Techniques to Improve BitTorrent
History of P2P Napster Gnutella KaZaa Distributed Hash Tables BitTorrent
Napster Centralized directory –A central website to hold directory of contents of all peers –Queries performed at the central directory –File transfer occurs between peers –Support arbitrary queries –Con: Single point of failure
Gnutella Decentralized homogeneous peers –No central directory –Queries performed in a distributed fashion on peers via “flooding” –Support arbitrary queries –Very resilient against failure –Problem: Doesn’t scale
FastTrack/KaZaa Distributed Two-Tier architecture –Supernodes: keep content directory for regular nodes –Regular nodes: do not participate in query processing –Queries performed by Supernodes only –Support arbitrary queries –Con: supernode stability affects system performance
Distributed Hash Tables Structured Distributed System –Structured: all nodes participate in a precise scheme to maintain certain invariants –Provide a directory service: key placement and routing –Extra work when nodes join and leave –Support key-based lookups only
BitTorrent Distribution of very large files Tracker connects peers to each other Peers exchange file blocks with each other Use Tit-for-Tat to discourage freeloading
Improving Gnutella
Gnutella-Style Systems Advantages of Gnutella: –Support more flexible queries (typically, precise “name” search is a small portion of all queries) –Simplicity –High resilience against node failures Problems of Gnutella: scalability –Flooding: # of messages ~ O(N*E)
Flooding-Based Searches Duplication increases as TTL increases in flooding Worst case: a node A is interrupted by N * q * degree(A) messages
Load on Individual Nodes Why is a node interrupted: –To process a query –To route the query to other nodes –To process duplicated queries sent to it
Communication Complexity Communication complexity determined by: Network topology Distribution of object popularity Distribution of replication density of objects
Network Topologies Uniform Random Graph (Random) –Average and median node degree is 4 Power-Law Random Graph (PLRG) –max node degree: 1746, median: 1, average: 4.46 Gnutella network snapshot (Gnutella) –Oct 2000 snapshot –max degree: 136, median: 2, average: 5.5 Two-dimensional grid (Grid)
Modeling Methods Object popularity distribution p_i –Uniform –Zipf-like Object replication density distribution r_i –Uniform –Proportional: r_i ∝ p_i –Square-Root: r_i ∝ √p_i
Evaluation Metrics Overhead: average # of messages per node per query Probability of search success: Pr(success) Delay: # of hops till success
Duplications in Various Network Topologies
Relationship between TTL and Search Successes
Problems with Simple TTL-Based Flooding Hard to choose TTL: –For objects that are widely present in the network, small TTLs suffice –For objects that are rare in the network, large TTLs are necessary Number of query messages grows exponentially as TTL grows
Idea #1: Adaptively Adjust TTL “Expanding Ring” –Multiple floods: start with TTL=1; increment TTL by 2 each time until the search succeeds Message savings vary by network topology –For “Random”, 30- to 70-fold reduction in message traffic –For power-law and Gnutella graphs, only 3- to 9-fold reduction
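A minimal sketch of expanding-ring search over an in-memory adjacency-list graph. It is an idealization: the flood below is a duplicate-free breadth-first search, whereas real Gnutella flooding generates duplicate messages, and the function names are illustrative.

```python
from collections import deque

def flood(graph, start, ttl, has_object):
    """TTL-limited flood, idealized as BFS: visit all nodes within `ttl` hops."""
    seen, frontier, hits = {start}, deque([(start, 0)]), []
    while frontier:
        node, depth = frontier.popleft()
        if has_object(node):
            hits.append(node)
        if depth < ttl:
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    frontier.append((nbr, depth + 1))
    return hits

def expanding_ring(graph, start, has_object, max_ttl=9):
    """Start with TTL=1 and grow the TTL by 2 until the search succeeds."""
    ttl = 1
    while ttl <= max_ttl:
        hits = flood(graph, start, ttl, has_object)
        if hits:
            return ttl, hits
        ttl += 2
    return None, []
```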
Limitations of Expanding Ring
Idea #2: Random Walk Simple random walk –takes too long to find anything! Multiple-walker random walk –N agents each walking T steps visit about as many nodes as 1 agent walking N*T steps –When to terminate the search: check back with the query originator once every C steps
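A sketch of the multiple-walker random walk under the same in-memory graph model; the periodic check-back with the query originator is emulated with a shared result, and all names and defaults are illustrative.

```python
import random

def k_walker_search(graph, start, has_object, walkers=16, check_every=4, max_steps=1024):
    """k-walker random walk: each walker steps to a random neighbor; every
    `check_every` steps the walkers 'check back' (here: consult a shared
    result) and terminate once any walker has found the object."""
    if has_object(start):
        return start
    positions = [start] * walkers
    found = None
    for step in range(1, max_steps + 1):
        for i, node in enumerate(positions):
            nxt = random.choice(list(graph[node]))
            positions[i] = nxt
            if found is None and has_object(nxt):
                found = nxt
        if step % check_every == 0 and found is not None:
            return found
    return found
```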
Search Traffic Comparison
Search Delay Comparison
Flexible Replication In unstructured systems, search success is essentially about coverage: visiting enough nodes to probabilistically find the object => replication density matters Limited node storage => what’s the optimal replication density distribution? –In Gnutella, only nodes that query an object store it => r_i ∝ p_i –What if we use different replication strategies?
Optimal r_i Distribution Goal: minimize Σ_i (p_i / r_i), subject to Σ_i r_i = R Calculation: –introduce a Lagrange multiplier λ; find the r_i and λ that minimize: Σ_i (p_i / r_i) + λ (Σ_i r_i − R) => −p_i / r_i² + λ = 0 for all i => r_i ∝ √p_i
Square-Root Distribution General principle: to minimize Σ_i (p_i / r_i) under the constraint Σ_i r_i = R, make r_i proportional to the square root of p_i Other application examples: –Bandwidth allocation to minimize expected download times –Server load balancing to minimize expected request latency
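A small numeric check of this principle (illustrative, not from the slides): the expected number of nodes a search must probe scales with Σ p_i/r_i, and the square-root allocation gives the smallest value of the three for a Zipf-like popularity distribution.

```python
import math

m, R = 100, 1000.0                       # number of objects, total replicas
p = [1.0 / (i + 1) for i in range(m)]    # Zipf-like popularity
s = sum(p)
p = [x / s for x in p]

def expected_cost(r):                    # proportional to expected search size
    return sum(pi / ri for pi, ri in zip(p, r))

sqrt_norm    = sum(math.sqrt(q) for q in p)
uniform      = [R / m] * m
proportional = [R * pi for pi in p]
square_root  = [R * math.sqrt(pi) / sqrt_norm for pi in p]

for name, r in [("uniform", uniform), ("proportional", proportional),
                ("square-root", square_root)]:
    print(f"{name:12s} expected cost ~ {expected_cost(r):.3f}")
# uniform and proportional tie; square-root prints the smallest cost
```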
Achieving Square-Root Distribution Suggestions from some heuristics –Store an object at a number of nodes proportional to the number of nodes visited in order to find the object –Each node uses random replacement Two implementations: –Path replication: store the object along the path of a successful “walk” –Random replication: store the object randomly among the nodes visited by the agents
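A minimal sketch of the two implementations, assuming the search has already returned the successful walk's path and the full set of visited nodes; `store` is a hypothetical per-node object store (a dict of sets).

```python
import random

def replicate(path_nodes, visited_nodes, obj, store, mode="path"):
    """Path replication copies the object onto every node along the successful
    walk's path; random replication copies it onto the same number of nodes,
    chosen uniformly at random from all nodes visited by the walkers."""
    if mode == "path":
        targets = list(path_nodes)
    else:
        k = min(len(path_nodes), len(visited_nodes))
        targets = random.sample(list(visited_nodes), k)
    for node in targets:
        store[node].add(obj)   # a real node would also apply random replacement
```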
Evaluation of Replication Methods Metrics –Overall message traffic –Search delay Dynamic simulation –Assume Zipf-like object query probability –5 queries/sec Poisson arrival –Results taken during 5000 sec - 9000 sec
Distribution of r_i
Total Search Message Comparison Observation: path replication is slightly inferior to random replication
Search Delay Comparison
Summary Multi-walker random walk scales much better than flooding –It won’t scale as perfectly as a structured network, but current unstructured networks can be improved significantly Square-root replication distribution is desirable and can be achieved via path replication
KaZaa Use Supernodes Regular Nodes : Supernodes = 100 : 1 Simple way to scale the system by a factor of 100
DHTs: A Brief Overview (Slides by Brad Karp)
What Is a DHT? Single-node hash table: key = Hash(name) put(key, value) get(key) -> value How do I do this across millions of hosts on the Internet? –Distributed Hash Table
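For reference, a minimal sketch of the single-node version of this interface (class and method names are illustrative); a DHT offers the same put/get interface but spreads the table across many hosts.

```python
import hashlib

class SingleNodeHashTable:
    """key = Hash(name); put(key, value); get(key) -> value, on one node."""
    def __init__(self):
        self._table = {}

    @staticmethod
    def key_for(name: str) -> int:
        # SHA-1 keying, anticipating the consistent-hashing slides
        return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

    def put(self, key: int, value) -> None:
        self._table[key] = value

    def get(self, key: int):
        return self._table.get(key)
```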
Distributed Hash Tables Chord CAN Pastry Tapestry etc.
The Problem [Figure: a publisher calls Put(key=“title”, value=file data…) and a client calls Get(key=“title”) across nodes N1–N6 spread over the Internet] Two issues: key placement, and routing to find the key
Key Placement Traditional hashing –Nodes numbered from 1 to N –Key is placed at node (hash(key) % N) –Why does traditional hashing have problems? (When N changes, almost every key moves; see the sketch below.)
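A short illustration of the problem (hypothetical key names): with hash(key) % N placement, adding a single node changes N and remaps nearly every key to a different node, forcing massive data movement.

```python
import hashlib

def node_for(key: str, n_nodes: int) -> int:
    """Traditional placement: node index = hash(key) % N."""
    h = int.from_bytes(hashlib.sha1(key.encode()).digest(), "big")
    return h % n_nodes

keys = [f"file-{i}" for i in range(10_000)]
before = {k: node_for(k, 100) for k in keys}    # 100 nodes
after  = {k: node_for(k, 101) for k in keys}    # one node joins
moved = sum(before[k] != after[k] for k in keys)
print(f"{moved / len(keys):.0%} of keys move")  # roughly 99%
```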
Consistent Hashing: IDs Key identifier = SHA-1(key) Node identifier = SHA-1(IP address) SHA-1 distributes both uniformly How to map key IDs to node IDs?
Consistent Hashing: Placement A key is stored at its successor: the node with the next-higher ID [Figure: circular 7-bit ID space with nodes N32, N90, N105 and keys K5, K20, K80 placed at their successors]
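A minimal sketch of successor placement on the ring, using the full 160-bit SHA-1 ID space rather than the 7-bit ring of the figure; the IP addresses and key name are illustrative.

```python
import bisect
import hashlib

def sha1_id(text: str) -> int:
    return int.from_bytes(hashlib.sha1(text.encode()).digest(), "big")

def successor(node_ids, key_id):
    """Return the first node ID >= key_id, wrapping around the ring."""
    i = bisect.bisect_left(node_ids, key_id)
    return node_ids[i % len(node_ids)]

nodes = sorted(sha1_id(ip) for ip in ["10.0.0.1", "10.0.0.2", "10.0.0.3"])
key = sha1_id("some-file-title")
print("key stored at node with ID:", successor(nodes, key))
```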
Basic Lookup [Figure: the query “Where is key 80?” is forwarded around the ring of nodes N10, N32, N60, N90, N105, N120 until the answer “N90 has K80” comes back]
“Finger Table” Allows log(N)-time Lookups [Figure: node N80’s fingers point ½, ¼, 1/8, 1/16, 1/32, 1/64, and 1/128 of the way around the ring]
Finger i Points to Successor of n+2^i [Figure: the same ring of fractions from N80, with N120 shown as the successor that one of N80’s fingers points to]
Lookups Take O(log N) Hops [Figure: Lookup(K19) issued at N32 follows finger pointers around a ring of nodes N5, N10, N20, N32, N60, N80, N99, N110 and reaches K19’s successor, N20, in a few hops]
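A toy simulation of finger-table routing on the slides' 7-bit ring, with the node IDs from the figure. It is a sketch only: every node's fingers are computed here from a global view of the ring, whereas real Chord builds and maintains them through its join/stabilization protocol.

```python
import bisect

M = 7                 # ID bits, matching the 7-bit ring in the figures
RING = 1 << M

def in_interval(x, a, b):
    """True if x lies in the circular interval (a, b]."""
    a, b, x = a % RING, b % RING, x % RING
    return a < x <= b if a < b else (x > a or x <= b)

class ChordSim:
    def __init__(self, node_ids):
        self.nodes = sorted(n % RING for n in node_ids)

    def successor(self, ident):
        i = bisect.bisect_left(self.nodes, ident % RING)
        return self.nodes[i % len(self.nodes)]

    def fingers(self, n):
        # finger i of node n points to successor(n + 2^i), i = 0..M-1
        return [self.successor((n + (1 << i)) % RING) for i in range(M)]

    def lookup(self, start, key):
        n, hops = start, 0
        while not in_interval(key, n, self.successor((n + 1) % RING)):
            # hop to the closest finger that precedes the key
            preceding = [f for f in self.fingers(n)
                         if in_interval(f, n, (key - 1) % RING)]
            nxt = preceding[-1] if preceding else self.successor((n + 1) % RING)
            if nxt == n:
                break
            n, hops = nxt, hops + 1
        return self.successor(key), hops

ring = ChordSim([5, 10, 20, 32, 60, 80, 99, 110])
print(ring.lookup(32, 19))   # K19's successor is N20, reached in O(log N) hops
```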
Joining: Linked List Insert [Figure: new node N36 joins between N25 and N40 (which holds K30 and K38); step 1: Lookup(36) finds N36’s successor]
Join (2) [Figure: step 2: N36 sets its own successor pointer to N40]
Join (3) [Figure: step 3: copy keys from N40 to N36; K30 now resides at N36, K38 stays at N40]
Join (4) [Figure: step 4: set N25’s successor pointer to N36] Predecessor pointer allows link to new host Update finger pointers in the background Correct successors produce correct lookups
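A sketch of these four join steps in the linked-list view, assuming step 1's Lookup(new.id) has already returned the successor (e.g. via the lookup sketch above); it ignores ID wrap-around, fingers, and RPC.

```python
class RingNode:
    """Minimal ring node: successor/predecessor pointers plus a key store."""
    def __init__(self, ident):
        self.id = ident
        self.successor = self
        self.predecessor = self
        self.keys = {}

def join(new: RingNode, succ: RingNode) -> None:
    """`succ` is the successor of new.id found in step 1 by Lookup(new.id)."""
    pred = succ.predecessor
    # 2. The new node sets its own successor (and predecessor) pointer.
    new.successor, new.predecessor = succ, pred
    # 3. Copy the keys the new node is now responsible for (no wrap-around).
    for k in [k for k in succ.keys if k <= new.id]:
        new.keys[k] = succ.keys.pop(k)
    # 4. Set the predecessor's successor pointer (N25 -> N36 in the figure);
    #    finger pointers are updated in the background.
    pred.successor = new
    succ.predecessor = new
```

With N25 and N40 holding K30 and K38, join(N36, N40) moves K30 to N36 and leaves K38 at N40, matching the figures.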
Chord Lookup Algorithm Properties Interface: lookup(key) → IP address Efficient: O(log N) messages per lookup –N is the total number of servers Scalable: O(log N) state per node Robust: survives massive failures Simple to analyze
Many, Many Variations of the Same Theme Different ways to choose the fingers Ways to make it more robust Ways to make it more network efficient etc.
Improving BitTorrent
BitTorrent File Sharing Network Goal: replicate K chunks of data among N nodes Form neighbor connection graph Neighbors exchange data
BitTorrent: Neighbor Selection [Figure: peer A reads file.torrent, contacts the tracker (step 1), and is given a set of neighbors; the seed holds the whole file]
BitTorrent: Piece Replication [Figure: after learning about peers from the tracker (step 1), peer A exchanges pieces with its neighbors (steps 2 and 3); the seed holds the whole file]
BitTorrent: Piece Replication Algorithms “Tit-for-tat” (choking/unchoking): –Each peer only uploads to 7 other peers at a time –6 of these are chosen based on the amount of data received from each neighbor in the last 20 seconds –The last one is chosen randomly, with a 75% bias toward newcomers (Local) rarest-first replication: –When peer 3 unchokes peer A, A selects which piece to download
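A sketch of local rarest-first piece selection as described above; the data-structure names are illustrative, and a real client layers end-game mode and request pipelining on top of this.

```python
from collections import Counter
import random

def rarest_first_pick(my_pieces, neighbor_bitfields, unchoked_neighbor):
    """Among pieces the unchoking neighbor has and we lack, pick one that is
    rarest across all neighbors' bitfields (ties broken at random).
    `neighbor_bitfields` maps neighbor id -> set of piece indices it holds."""
    counts = Counter()
    for pieces in neighbor_bitfields.values():
        counts.update(pieces)
    candidates = neighbor_bitfields[unchoked_neighbor] - set(my_pieces)
    if not candidates:
        return None
    min_count = min(counts[p] for p in candidates)
    return random.choice([p for p in candidates if counts[p] == min_count])
```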
Performance of BitTorrent Conclusion from modeling studies: BitTorrent is nearly optimal in idealized, homogeneous networks –Demonstrated by simulation studies –Confirmed by theoretical modeling studies Intuition: in a random graph, Prob(Peer A’s content is a subset of Peer B’s) ≤ 50%
Lessons from BitTorrent Often, simple randomized algorithms perform better than elaborately designed deterministic algorithms
Problems of BitTorrent ISPs are unhappy –BitTorrent is notoriously difficult to “traffic engineer” –ISPs: different links have different monetary costs –BitTorrent: peers are all equal; choices are made based on measured performance, with no regard for underlying ISP topology or preferences
BitTorrent and ISPs: Play Together? Current state of affairs: a clumsy co-existence –ISPs “throttle” BitTorrent traffic along high-cost links –Users suffer Can they be partners? –ISPs inform BitTorrent of their preferences –BitTorrent schedules traffic in ways that benefit both users and ISPs
Random Neighbor Selection Existing studies all assume random neighbor selection –BitTorrent is no longer optimal if nodes in the same ISP only connect to each other Random neighbor selection → high cross-ISP traffic Q: Can we modify the neighbor selection scheme without affecting performance?
Biased Neighbor Selection Idea: of N neighbors, choose N-k from peers in the same ISP, and choose k randomly from peers outside the ISP [Figure: peers within one ISP connect mostly to each other, with only k connections leaving the ISP]
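A minimal sketch of biased neighbor selection as a tracker might implement it; the defaults (35 neighbors, k = 4 external) and all names are illustrative, not values from the slides.

```python
import random

def biased_neighbors(candidates, my_isp, n_neighbors=35, k_external=4):
    """Return N-k peers from the requester's ISP plus k peers from outside it.
    `candidates` is a list of (peer_id, isp) pairs known to the tracker."""
    internal = [p for p, isp in candidates if isp == my_isp]
    external = [p for p, isp in candidates if isp != my_isp]
    k = min(k_external, len(external))
    chosen = random.sample(external, k)
    chosen += random.sample(internal, min(n_neighbors - k, len(internal)))
    return chosen
```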
Implementing Biased Neighbor Selection By the tracker –Needs ISP affiliations of peers: peer-to-AS maps, public IP address ranges from ISPs, or a special “X-” HTTP header By traffic shaping devices –Intercept peer-to-tracker messages and manipulate the responses –No need to change the tracker or client
Evaluation Methodology Event-driven simulator –Use actual client and tracker code as much as possible –Calculate bandwidth contention, assume perfect fair-share from TCP Network settings –14 ISPs, each with 50 peers, 100 Kb/s upload, 1 Mb/s download –Seed node, 400 Kb/s upload –Optional “university” nodes (1 Mb/s upload) –Optional ISP bottleneck to other ISPs
Limitation of Throttling
Throttling: Cross-ISP Traffic Redundancy: Average # of times a data chunk enters the ISP
Biased Neighbor Selection: Download Times
Biased Neighbor Selection: Cross-ISP Traffic
Importance of Rarest-First Replication Random piece replication performs badly –Increases download time by 84% - 150% –Increases traffic redundancy from 3 to 14 Biased neighbors + rarest-first → more uniform progress of peers
Biased Neighbor Selection: Single-ISP Deployment
Presence of External High-Bandwidth Peers Biased neighbor selection alone: –Average download time same as regular BitTorrent –Cross-ISP traffic increases as the # of “university” peers increases (a result of tit-for-tat) Biased neighbor selection + throttling: –Download time only increases by 12% (most neighbors do not cross the bottleneck) –Traffic redundancy (i.e., cross-ISP traffic) same as the scenario without “university” peers
Comparison with Alternatives Gateway peer: only one peer connects to peers outside the ISP –Gateway peer must have high bandwidth: it is the “seed” for this ISP –Ends up benefiting peers in other ISPs Caching: –Can be combined with biased neighbor selection –Biased neighbor selection reduces the bandwidth needed from the cache by an order of magnitude
Summary By choosing neighbors well, BitTorrent can achieve high peer performance without increasing ISP cost –Biased neighbor selection: choose the initial set of neighbors well –Can be combined with throttling and caching P2P and ISPs can collaborate!