1
Brief Overview of Academic Research on P2P
Pei Cao
2
Relevant Conferences
- IPTPS (International Workshop on Peer-to-Peer Systems)
- ICDCS (IEEE International Conference on Distributed Computing Systems)
- NSDI (USENIX Symposium on Networked Systems Design and Implementation)
- PODC (ACM Symposium on Principles of Distributed Computing)
- SIGCOMM
3
Areas of Research Focus
- Gnutella-inspired: the "directory service" problem
- BitTorrent-inspired: the "file distribution" problem
- P2P live streaming
- P2P and net neutrality
4
Gnutella-Inspired Research Studies
5
The Applications and the Problems
- Napster, Gnutella, KaZaA/FastTrack, Skype
- Look for a particular content/object and find which peer has it: the "directory service" problem
  - Challenge: how to offer a scalable directory service in a fully decentralized fashion
- Arrange a direct transfer from that peer: the "punch a hole in the firewall" problem
6
Decentralized Directory Services
- Structured networks
  - DHTs (Distributed Hash Tables)
  - A very active research area from 2001 to 2004
  - Limitation: lookup by key only
  - Multi-attribute DHTs: limited support for query-based lookup
- Unstructured networks
  - Various improvements to basic flooding-based schemes
7
What Is a DHT?
- Single-node hash table:
  key = Hash(name)
  put(key, value)
  get(key) -> value
- How do I do this across millions of hosts on the Internet?
- Distributed Hash Table
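As a minimal sketch of the single-node interface described above (the class name and in-memory dict are my own; only the key = Hash(name), put, and get operations come from the slide):

```python
import hashlib

class LocalHashTable:
    """Single-node version of the put/get interface on this slide."""
    def __init__(self):
        self.store = {}

    def key_for(self, name: str) -> int:
        # key = Hash(name); SHA-1 here, matching the consistent-hashing slides
        return int(hashlib.sha1(name.encode()).hexdigest(), 16)

    def put(self, name: str, value: bytes) -> None:
        self.store[self.key_for(name)] = value

    def get(self, name: str) -> bytes:
        return self.store[self.key_for(name)]
```

A DHT spreads exactly this key space across many hosts, which is what the following slides work out.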
8
Distributed Hash Tables
- Chord
- CAN
- Pastry
- Tapestry
- Symphony
- Koorde
- etc.
9
The Problem
[Figure: a publisher calls put(key="title", value=file data…) and a client calls get(key="title") across nodes N1–N6 spread over the Internet.]
Two sub-problems: key placement, and routing to find a key
10
Key Placement
- Traditional hashing: nodes numbered from 1 to N; a key is placed at node hash(key) % N
- Why traditional hashing has problems: when N changes (nodes join or leave), almost every key maps to a different node
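A small illustration of why placement by hash(key) % N breaks down when membership changes; the node counts and key names below are made up for the example:

```python
import hashlib

def node_for(key: str, num_nodes: int) -> int:
    # Traditional placement: node index = hash(key) % N
    digest = int(hashlib.sha1(key.encode()).hexdigest(), 16)
    return digest % num_nodes

keys = [f"file-{i}" for i in range(10_000)]
before = {k: node_for(k, 100) for k in keys}   # 100 nodes
after = {k: node_for(k, 101) for k in keys}    # one node joins
moved = sum(before[k] != after[k] for k in keys)
print(f"{moved / len(keys):.0%} of keys change nodes")   # roughly 99%
```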
11
Consistent Hashing: IDs
- Key identifier = SHA-1(key)
- Node identifier = SHA-1(IP address)
- SHA-1 distributes both uniformly
- How to map key IDs to node IDs?
12
Consistent Hashing: Placement
A key is stored at its successor: the node with the next-higher ID.
[Figure: circular 7-bit ID space with nodes N32, N90, N105 and keys K5, K20, K80; each key sits at the first node clockwise from its ID.]
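A sketch of the successor rule, using a toy 16-bit ID space and illustrative addresses (real Chord uses the full 160-bit SHA-1 space):

```python
import bisect
import hashlib

ID_BITS = 16                      # toy ID space for readability

def chord_id(text: str) -> int:
    return int(hashlib.sha1(text.encode()).hexdigest(), 16) % (2 ** ID_BITS)

def successor(node_ids: list[int], key_id: int) -> int:
    """Return the node responsible for key_id: the first node clockwise from it."""
    node_ids = sorted(node_ids)
    i = bisect.bisect_left(node_ids, key_id)
    return node_ids[i % len(node_ids)]          # wrap around the ring

nodes = [chord_id(f"10.0.0.{i}") for i in range(1, 7)]
print(successor(nodes, chord_id("title")))      # node that stores key "title"
```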
13
Basic Lookup
[Figure: the query "Where is key 80?" is forwarded from node to node around the ring (N10, N32, N60, N90, N105, N120) until the answer "N90 has K80" is returned.]
14
“Finger Table” Allows log(N)-time Lookups
[Figure: node N80 keeps fingers pointing 1/2, 1/4, 1/8, …, 1/128 of the way around the ring.]
15
Finger i Points to Successor of n+2^i
[Figure: for node N80, the finger for 80 + 32 = 112 points to N120, the successor of ID 112.]
16
Lookups Take O(log N) Hops
[Figure: starting at N32, Lookup(K19) follows fingers that roughly halve the remaining ring distance on each hop until it reaches N20, the successor of K19.]
17
Chord Lookup Algorithm Properties
- Interface: lookup(key) -> IP address
- Efficient: O(log N) messages per lookup, where N is the total number of servers
- Scalable: O(log N) state per node
- Robust: survives massive failures
- Simple to analyze
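A compact, single-process sketch of finger-table routing as described on the preceding slides; the class layout and recursive lookup call are illustrative only, and a real Chord node would do this over RPC and also handle joins, stabilization, and failures:

```python
ID_BITS = 7
RING = 2 ** ID_BITS

def in_interval(x: int, a: int, b: int) -> bool:
    """True if x lies in the circular interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b

class Node:
    def __init__(self, node_id: int):
        self.id = node_id
        self.fingers = []                       # finger[i] = successor of id + 2**i

    def build_fingers(self, all_nodes: list["Node"]) -> None:
        ids = sorted(n.id for n in all_nodes)
        by_id = {n.id: n for n in all_nodes}
        for i in range(ID_BITS):
            target = (self.id + 2 ** i) % RING
            succ = next((x for x in ids if x >= target), ids[0])
            self.fingers.append(by_id[succ])

    def lookup(self, key_id: int) -> "Node":
        succ = self.fingers[0]                  # immediate successor
        if in_interval(key_id, self.id, succ.id):
            return succ                         # the successor owns the key
        # otherwise forward to the closest finger preceding key_id
        for f in reversed(self.fingers):
            if in_interval(f.id, self.id, key_id) and f.id != key_id:
                return f.lookup(key_id)
        return succ.lookup(key_id)
```

Because each hop at least halves the remaining clockwise distance to the key, the number of hops is O(log N).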
18
Related Studies on DHTs
- Many variations of DHTs
  - Different ways to choose the fingers
  - Ways to make them more robust
  - Ways to make them more network-efficient
- Studies of different DHTs
  - What happens when peers leave (i.e., churn)?
- Applications built using DHTs
  - Trackerless BitTorrent
  - Beehive, a P2P-based DNS system
19
Directory Lookups: Unstructured Networks
- Example: Gnutella
- Supports more flexible queries
  - Typically, precise "name" searches are a small portion of all queries
- Simplicity
- High resilience against node failures
- Problem: scalability
  - Flooding: # of messages ~ O(N*E)
20
Flooding-Based Searches
[Figure: a small eight-node topology showing the same query reaching nodes over multiple redundant paths.]
- Duplication increases as the TTL increases in flooding
- Worst case: a node A is interrupted by N * q * degree(A) messages
21
Problems with Simple TTL-Based Flooding
- Hard to choose the TTL:
  - For objects that are widely present in the network, small TTLs suffice
  - For objects that are rare in the network, large TTLs are necessary
- The number of query messages grows exponentially as the TTL grows
22
Idea #1: Adaptively Adjust the TTL
- "Expanding ring": multiple floods, starting with TTL=1 and incrementing the TTL by 2 each time until the search succeeds
- Success varies by network topology
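A sketch of the expanding-ring idea as a centralized simulation; node objects with a neighbors list and a has(target) predicate are assumptions of this sketch, not part of Gnutella:

```python
def expanding_ring_search(start, target, max_ttl=9):
    """Repeated floods with a growing TTL, as in the expanding-ring idea."""
    ttl = 1
    while ttl <= max_ttl:
        hits, frontier, seen = [], {start}, {start}
        for _ in range(ttl):                      # flood out to `ttl` hops
            next_frontier = set()
            for node in frontier:
                for nbr in node.neighbors:
                    if nbr not in seen:
                        seen.add(nbr)
                        next_frontier.add(nbr)
                        if nbr.has(target):
                            hits.append(nbr)
            frontier = next_frontier
        if hits:
            return hits                           # search succeeded at this TTL
        ttl += 2                                  # start with TTL=1, grow by 2
    return []
```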
23
Idea #2: Random Walk
- A simple random walk takes too long to find anything!
- Multiple-walker random walk: N agents each walking T steps visit about as many nodes as 1 agent walking N*T steps
- When to terminate the search: the walkers check back with the query originator once every C steps
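A sketch of the multiple-walker random walk under the same assumed node interface as above; the shared found list stands in for walkers checking back with the query originator every C steps:

```python
import random

def multi_walker_search(start, target, walkers=16, check_every=4, max_steps=10_000):
    """Multiple walkers take independent random hops; terminate at a check-back."""
    positions = [start] * walkers
    found = []
    for step in range(1, max_steps + 1):
        for i, node in enumerate(positions):
            nxt = random.choice(node.neighbors)   # one random hop per walker
            positions[i] = nxt
            if nxt.has(target):
                found.append(nxt)
        if step % check_every == 0 and found:     # check back every C steps
            return found
    return found
```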
24
Flexible Replication
- In unstructured systems, search success is essentially about coverage: visiting enough nodes to probabilistically find the object => replication density matters
- Limited node storage => what is the optimal replication density distribution?
- In Gnutella, only the nodes that query an object store it => r_i ∝ p_i
- What if we use different replication strategies?
25
Optimal r_i Distribution
- Goal: minimize Σ_i (p_i / r_i) subject to Σ_i r_i = R
- Calculation: introduce a Lagrange multiplier λ and find the r_i (and λ) that minimize
  Σ_i (p_i / r_i) + λ (Σ_i r_i − R)
  => −p_i / r_i² + λ = 0 for all i
  => r_i ∝ √p_i
26
Square-Root Distribution
- General principle: to minimize Σ_i (p_i / r_i) under the constraint Σ_i r_i = R, make r_i proportional to the square root of p_i
- Other application examples:
  - Bandwidth allocation to minimize expected download times
  - Server load balancing to minimize expected request latency
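Writing out the normalization constant that the Lagrange condition on the previous slide implies (a restatement of the slide's result, not an additional claim):

```latex
\[
  \min_{r_1,\dots,r_m} \sum_i \frac{p_i}{r_i}
  \quad \text{s.t.} \quad \sum_i r_i = R
  \;\Longrightarrow\;
  -\frac{p_i}{r_i^{2}} + \lambda = 0
  \;\Longrightarrow\;
  r_i = R\,\frac{\sqrt{p_i}}{\sum_j \sqrt{p_j}}.
\]
```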
27
Achieving Square-Root Distribution
- Suggestions from some heuristics:
  - Store an object at a number of nodes proportional to the number of nodes visited in order to find the object
  - Each node uses random replacement
- Two implementations:
  - Path replication: store the object along the path of a successful "walk"
  - Random replication: store the object randomly among the nodes visited by the agents
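A sketch of the two replication strategies after a successful walk; path_nodes, visited_nodes, and the node helper methods are assumptions of this sketch, not part of any real Gnutella client:

```python
import random

def replicate_after_success(path_nodes, visited_nodes, obj, strategy="path"):
    """Replicate `obj` on roughly as many nodes as the search had to visit.

    path_nodes   : nodes on the successful walk's path
    visited_nodes: all nodes visited by all walkers
    """
    copies = len(path_nodes)                         # number of copies ~ search cost
    if strategy == "path":
        targets = path_nodes                                          # path replication
    else:
        targets = random.sample(visited_nodes,
                                min(copies, len(visited_nodes)))      # random replication
    for node in targets:
        if node.is_full():
            node.evict_random()                      # random replacement at each node
        node.store(obj)
```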
28
KaZaA
- Uses supernodes
- Regular nodes : supernodes = 100 : 1
- A simple way to scale the system by a factor of 100
29
BitTorrent-Inspired Research Studies
30
Modeling and Understanding BitTorrent
- Analysis based on modeling
  - Views BitTorrent as a type of gossip algorithm
  - Usually does not model the tit-for-tat aspects
  - Assumes perfectly connected networks
  - Statistical modeling techniques
  - Mostly published in PODC or SIGMETRICS
- Simulation studies
  - Different assumptions about bottlenecks
  - Varying detail in the modeling of the data transfer
  - Published in ICDCS and SIGCOMM
31
Studies on the Effect of BitTorrent on ISPs
- Observation: P2P contributes to cross-ISP traffic
  - SIGCOMM 2006 publication on studies of Japanese backbone traffic
- Attempts to improve the network locality of BitTorrent-like applications
  - ICDCS 2006 publication
- Academic P2P file-sharing systems
  - Bullet, Julia, etc.
32
Techniques to Alleviate the "Last Missing Piece" Problem
- Apply network coding to the pieces exchanged between peers
  - Pablo Rodriguez Rodriguez, Microsoft Research (recently moved to Telefonica Research)
- Use a different piece-replication strategy
  - Dahlia Malkhi, Microsoft Research, "On Collaborative Content Distribution Using Multi-Message Gossip"
  - Associate an "age" with file segments
33
Network Coding
- Main feature
  - Allows intermediate nodes to encode packets
  - Makes optimal use of the available network resources
- Similar technique: erasure codes
  - Reconstruct the original content of size n from roughly any n symbols drawn from a large universe of encoded symbols
34
Network Coding in P2P: The Model
- Server
  - Divides the file into k blocks
  - Uploads blocks at random to different clients
- Clients (users)
  - Collaborate with each other to assemble the blocks and reconstruct the original file
  - Exchange information and data with only a small subset of others (neighbors)
  - Symmetric neighborhoods and links
35
Network Coding in P2P
- Assume a node holds blocks B1, B2, …, Bk
- Pick random coefficients C1, C2, …, Ck
- Construct a new block E = C1*B1 + C2*B2 + … + Ck*Bk
- Send E and (C1, C2, …, Ck) to a neighbor
- Decoding: collect enough linearly independent E's and solve the linear system
- If all nodes pick the vector C randomly, then with high probability a node can recover B1 through Bk after receiving ~k coded blocks
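A self-contained sketch of random linear coding and decoding as described above; real systems typically work over GF(2^8) or GF(2^16), whereas this toy uses a large prime modulus so that plain integer arithmetic works, and the block values are made up:

```python
import random

P = 2**31 - 1          # prime modulus standing in for a practical finite field

def encode(blocks):
    """Make one coded block: random coefficients plus the linear combination."""
    coeffs = [random.randrange(P) for _ in blocks]
    n = len(blocks[0])
    combo = [sum(c * blk[j] for c, blk in zip(coeffs, blocks)) % P for j in range(n)]
    return coeffs, combo

def decode(coded, k):
    """Recover the k original blocks by Gaussian elimination mod P."""
    rows = [list(c) + list(e) for c, e in coded[:k]]        # augmented matrix
    for col in range(k):
        piv = next(r for r in range(col, k) if rows[r][col])
        rows[col], rows[piv] = rows[piv], rows[col]
        inv = pow(rows[col][col], P - 2, P)                 # modular inverse
        rows[col] = [x * inv % P for x in rows[col]]
        for r in range(k):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(a - f * b) % P for a, b in zip(rows[r], rows[col])]
    return [row[k:] for row in rows]                        # the original blocks

# toy example: k = 3 blocks of byte values, decoded from 3 coded blocks
blocks = [[104, 105], [110, 101], [116, 33]]
coded = [encode(blocks) for _ in range(3)]
assert decode(coded, k=3) == blocks   # holds when the 3 coefficient vectors are independent
```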
36
P2P Live Streaming
37
Motivations
- Internet applications: PPLive, PPStream, etc.
- Challenge: QoS issues
  - Raw bandwidth constraints
  - Example: PPLive exploits the significant bandwidth disparity between "university nodes" and "residential nodes"
  - Satisfying the demands of content publishers
38
P2P Live Streaming Can’t Stand on Its Own
- P2P as a complement to IP multicast
  - Used where IP multicast isn't enabled
- P2P as a way to reduce server load
  - By sourcing parts of streams from peers, server load might be reduced by 10%
- P2P as a way to reduce backbone bandwidth requirements
  - When core network bandwidth isn't sufficient
39
P2P and Net Neutrality
40
It’s All TCP’s Fault
- TCP: per-flow fairness
- Browsers:
  - 2-4 TCP flows per web server
  - Contact a few web servers at a time
  - Short flows
- P2P applications:
  - A much higher number of TCP connections
  - Many more endpoints
  - Long flows
41
When and How to Apply Traffic Shaping
- Current practice: application recognition
- Needs:
  - An application-agnostic way to trigger traffic shaping
  - A clear statement to users about what happens