Peer-to-Peer
Courtesy of Luciano Bononi (Univ. of Bologna), Michael Nelson (Old Dominion Univ.), and Anthony Joseph (Berkeley)
What is Peer-to-Peer?
– "P2P is a communications model in which each party has the same capabilities and either party can initiate a communication session." (Whatis.com)
– "A type of network in which each workstation has equivalent capabilities and responsibilities." (Webopedia.com)
– "A P2P computer network refers to any network that does not have fixed clients and servers, but a number of peer nodes that function as both clients and servers to other nodes on the network." (Wikipedia.org)
What is P2P? "Peer-to-peer is a class of applications that takes advantage of resources -- storage, cycles, content, human presence -- available at the edges of the Internet." –Clay Shirky
What is P2P? "Because accessing these decentralized resources means operating in an environment of unstable connectivity and unpredictable IP addresses, peer-to-peer nodes must operate outside DNS and have significant or total autonomy from central servers." –Clay Shirky
Is peer-to-peer new? It certainly doesn't seem like it:
– What about Usenet? News groups were the first truly decentralized system.
– DNS? Handles a huge number of clients.
– Basic IP? Vastly decentralized, many equivalent routers.
One view: P2P is a reversion to the old Internet:
– Once upon a time, all members on the Internet were trusted.
– Every machine had an IP address.
– Every machine was a client and a server.
– Many machines were routers.
P2P now: what is new?
– Scale: people are envisioning much larger scale.
– Security: systems must deal with privacy and integrity.
– Anonymity/deniability: protect identity and prevent censorship.
– (In)Stability: deal with unstable components at the edges.
But can systems designed this way be more stable?
Why the hype???
File sharing: Napster, Gnutella, KaZaA, etc.
– Is this peer-to-peer? Hard to say.
– Suddenly people could contribute to an active global network.
– Served a high-demand niche: the online jukebox.
Anonymity/privacy/anarchy: FreeNet, Publius, etc.
– Libertarian dream of freedom from the man.
– Extremely valid concerns about censorship and privacy.
– In search of copyright violators, the RIAA (Recording Industry Association of America) is challenging rights to privacy.
Hype? (cont'd)
Computing: the Grid
– Scavenge the numerous free cycles of the world to do work.
– Seti@Home: the most visible version of this.
Management: businesses
– Suddenly businesses have discovered extremely distributed computing (IDC).
– Does P2P mean "self-configuring" from equivalent resources?
P2P Litmus Test
1. Does it allow for variable connectivity and temporary network addresses?
2. Does it give the nodes at the edges of the network significant autonomy?
Client-Server [diagram: clients send queries such as query=DJ Shadow, query=Veloce, query=Thievery Corporation, and query=Radiohead to a central server]
Hybrid: C-S & P2P [diagram: peers issue queries such as query=Yo La Tengo and query=Spinal Tap in a mixed client-server/peer-to-peer topology]
Full P2P [diagram: a query such as query=Cut Chemist propagates peer to peer, with no central server]
Better? Or just different? In what scenarios would you prefer client-server? In what scenarios would you prefer P2P? Hybrids?
P2P: the cure for what ails you? "Decentralization is a tool, not a goal." If P2P doesn't make sense for a project, don't use it … the "disruptive technology" will be assimilated. A disruptive technology is a new, lower-performance, but less expensive product; a sustaining technology provides improved performance and will almost always be incorporated into the incumbent's product.
"The P in P2P is People"
Understanding how people operate is critical to knowing how P2P systems operate.
– P2P is, after all, a reflection of people's desire to communicate with each other.
– social networks …
It's a Small World
Stanley Milgram's "small world" experiment
– Stanley Milgram: The Small World Problem, Psychology Today, 1(1), 60-67 (1967)
– range = 2-10, median = 5
http://backspaces.net/PLaw/
Oracle of Bacon
– http://www.cs.virginia.edu/oracle/
Cf. The Erdős Number Project
– http://www.oakland.edu/enp/
Graphs of Social Networks
– A node has K neighbors (edges).
– N is the number of edges among the neighbors of the node; C = N/(K*(K-1)/2) is the clustering coefficient.
– L is the average distance between nodes.
http://backspaces.net/PLaw/
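The clustering coefficient above can be computed directly. A minimal sketch, assuming a hypothetical graph stored as an adjacency dict of neighbor lists:

```python
# Clustering coefficient of a node: C = N / (K*(K-1)/2), where K is the
# number of neighbors and N is the number of edges among those neighbors.
# The graph below is a made-up example, not from the slides.
def clustering_coefficient(graph, node):
    neighbors = graph[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    # Count edges that exist between pairs of the node's neighbors.
    n_edges = sum(
        1
        for i, a in enumerate(neighbors)
        for b in neighbors[i + 1:]
        if b in graph[a]
    )
    return n_edges / (k * (k - 1) / 2)

# Node "a" has neighbors b, c, d; b-c is the only edge among them,
# so C = 1 / 3.
g = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "d": ["a"],
}
print(clustering_coefficient(g, "a"))
```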
Zipf's Law
P_k ~ C k^-a, where
1. P_k = frequency of the kth-ranked item;
2. C = a corpus-wide constant;
3. a ~ 1
– Zipf was a linguist who sought to describe the frequency of words in language.
http://linkage.rockefeller.edu/wli/zipf/
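A quick sketch of what the formula predicts, normalizing C so the frequencies of the top n items sum to 1 (the function name is ours, for illustration):

```python
# Zipf's law: P_k ~ C * k**(-a), with a ~ 1.
def zipf_frequencies(n_items, a=1.0):
    weights = [k ** (-a) for k in range(1, n_items + 1)]
    c = 1.0 / sum(weights)        # corpus-wide normalizing constant
    return [c * w for w in weights]

# With a = 1, the 1st-ranked item is exactly twice as frequent as the
# 2nd, three times as frequent as the 3rd, and so on.
freqs = zipf_frequencies(5)
print(freqs)
```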
Zipf's Law [figure] http://www.sewanee.edu/Phy_Students/123_Spring01/schnejm0/PROJECT.html
Power Law (Pareto's Law)
– a formalization of the "80/20 Rule"
– A power law implies that small occurrences are extremely common, whereas large instances are extremely rare: P_x ~ x^-k
http://ginger.hpl.hp.com/shl/papers/ranking/ranking.html
Why Do We Care? This shows up in web-page linkage, and it shows up in P2P usage:
– Adar & Huberman, "Free Riding on Gnutella" http://www.firstmonday.dk/issues/issue5_10/adar/
Summary: if people are involved, "small world" and "power law" effects will be observed …
– design your systems accordingly
P2P can solve this?
Napster: a program for sharing files over the Internet
Napster: how does it work?
An application-level, client-server protocol over point-to-point TCP. Four steps:
1. Connect to the Napster server.
2. Upload your list of files (push) to the server.
3. Give the server keywords to search the full list with.
4. Select the "best" of the correct answers (pings).
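The index side of those steps can be sketched with an in-memory stand-in for the central server. This is a toy model, not the real Napster wire protocol; the class, method names, and hosts are invented for illustration:

```python
class NapsterServer:
    """Toy stand-in for the central index server."""
    def __init__(self):
        self.index = {}                      # filename -> set of hosts

    def upload_list(self, host, files):      # step 2: push file list
        for f in files:
            self.index.setdefault(f, set()).add(host)

    def search(self, keyword):               # step 3: keyword search
        return sorted(
            host
            for name, hosts in self.index.items()
            if keyword in name
            for host in hosts
        )

server = NapsterServer()
server.upload_list("10.0.0.1", ["radiohead-ok.mp3", "veloce.mp3"])
server.upload_list("10.0.0.2", ["radiohead-kid_a.mp3"])
# Step 4 in real Napster: the client pings these hosts and picks the best.
print(server.search("radiohead"))
```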
Napster, step 1: the file list is uploaded to napster.com.
Napster, step 2: the user requests a search at the server and receives the results.
Napster, step 3: the user pings the hosts that apparently have the data, looking for the best transfer rate.
Napster, step 4: the user retrieves the file.
Napster: architecture notes
Centralized server:
– single logical point of failure
– can load-balance among servers using DNS rotation
– potential for congestion
– Napster "in control" (freedom is an illusion)
No security:
– passwords in plain text
– no authentication
– no anonymity
Case studies from now on. What we care about:
– How much traffic does one query generate?
– How many hosts can it support at once?
– What is the latency associated with querying?
– Is there a bottleneck?
Gnutella
Peer-to-peer networking: applications connect to peer applications. Focus: a decentralized method of searching for files. Servent == server and client. Each application instance serves to:
– store selected files
– route queries (file searches) from and to its neighboring peers
– respond to queries (serve the file) if the file is stored locally
Gnutella: how it works
Searching by flooding:
– If you don't have the file you want, query 7 of your partners.
– If they don't have it, they contact 7 of their partners, for a maximum hop count of 10.
– Requests are flooded, but there is no tree structure.
– There is no looping, but packets may be received twice.
– Replies use reverse-path forwarding.
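The flooding search above can be sketched as a breadth-first walk with a TTL and a "seen" set for loop prevention. A toy simulation over a made-up four-peer topology (the fan-out here follows the graph, not a fixed 7):

```python
from collections import deque

# Flood a Gnutella-style query: each peer forwards to its neighbors
# until the TTL runs out; the "seen" set drops duplicate packets.
def flood(graph, start, ttl):
    seen = {start}
    frontier = deque([(start, ttl)])
    while frontier:
        node, t = frontier.popleft()
        if t == 0:
            continue                      # TTL exhausted: stop forwarding
        for peer in graph[node]:
            if peer not in seen:          # loop prevention
                seen.add(peer)
                frontier.append((peer, t - 1))
    return seen

net = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}
print(sorted(flood(net, "a", ttl=2)))   # every peer is within 2 hops of "a"
```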
Flooding in Gnutella: loop prevention via a "seen already" list (e.g., "A").
Gnutella message format
– Message ID: 16 bytes
– FunctionID: 1 byte indicating
– 00 ping: used to probe the Gnutella network for hosts
– 01 pong: used to reply to a ping; returns # of files shared
– 80 query: search string and desired minimum bandwidth
– 81 query hit: indicates matches to an 80 query; my IP address/port, available bandwidth
– RemainingTTL: decremented at each peer, limiting the flood to a TTL-scoped region
– HopsTaken: number of peers visited so far by this message
– DataLength: length of the data field
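Packing such a header is straightforward with `struct`. A sketch under the assumption that TTL and hops are 1 byte each and DataLength is a 4-byte little-endian integer, as in the classic 23-byte Gnutella descriptor header; the helper name is ours:

```python
import os
import struct

# 16-byte message ID, 1-byte function, 1-byte remaining TTL,
# 1-byte hops taken, 4-byte little-endian data length.
def make_header(function_id, ttl, hops, data_len, msg_id=None):
    msg_id = msg_id or os.urandom(16)    # random unique message ID
    return struct.pack("<16sBBBI", msg_id, function_id, ttl, hops, data_len)

hdr = make_header(function_id=0x00, ttl=7, hops=0, data_len=0)  # a ping
print(len(hdr))  # 23-byte header
```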
Gnutella: initial problems and fixes
Freeloading: WWW sites offered search/retrieval from the Gnutella network without providing file sharing or query routing.
– Fix: block file-serving to browser-based non-file-sharing users.
Prematurely terminated downloads:
– long download times over modems
– modem users run a Gnutella peer only briefly (a Napster problem too!), or a user becomes overloaded
– Fix: a peer can reply "I have it, but I am busy. Try again later."
A More Centralized Gnutella?
Reflectors:
– maintain an index of their neighbors
– do not re-transmit the query, but answer from their own index
– a "mini-Napster"; a prelude to "super-nodes"
Host caches:
– bootstrap your connection
– more convenient for users, but it doesn't produce a nice random graph: everyone ends up in the tightly connected cell
Gnutella - Power Law? figures 5&6 from Ripeanu, Iamnitchi & Foster, IEEE IC, 6(1), 2002
Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications
Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan (MIT and Berkeley)
Motivation
How to find data in a distributed file-sharing system? Lookup is the key problem. [diagram: a client issues Lookup("LetItBe") across the Internet to locate the publisher holding Key="LetItBe", Value=MP3 data, among nodes N1-N5]
Centralized Solution
Central server (Napster).
– Requires O(M) state
– Single point of failure
[diagram: all nodes resolve Lookup("LetItBe") through a central DB]
Distributed Solution (1)
Flooding (Gnutella, Morpheus, etc.).
– Worst case O(N) messages per lookup
[diagram: the Lookup("LetItBe") query floods from node to node]
Distributed Solution (2)
Routed messages (Freenet, Tapestry, Chord, CAN, etc.).
– Only exact matches
[diagram: the Lookup("LetItBe") query is routed directly toward the publisher]
Routing Challenges
– Define a useful key-nearness metric
– Keep the hop count small
– Keep the routing tables the "right size"
– Stay robust despite rapid changes in membership
Chord Overview
Provides a peer-to-peer hash-lookup service: Lookup(key) → IP address. Chord does not store the data.
– How does Chord locate a node?
– How does Chord maintain routing tables?
– How does Chord cope with changes in membership?
Chord IDs
– m-bit identifier space for both keys and nodes
– Key identifier = SHA-1(key); e.g., Key="LetItBe" → ID=60
– Node identifier = SHA-1(IP address); e.g., IP="198.10.10.1" → ID=123
– Both are uniformly distributed
– How to map key IDs to node IDs?
Consistent Hashing [Karger 97]
A key is stored at its successor: the node with the next-higher ID. [diagram: circular 7-bit ID space with nodes N32, N90, N123; K5 and K20 stored at N32, K60 (Key="LetItBe") at N90, K101 at N123; N123 has IP="198.10.10.1"]
Consistent Hashing
– Every node knows of every other node: requires global information
– Routing tables are large: O(N)
– Lookups are fast: O(1)
[diagram: "Where is LetItBe?" Hash("LetItBe") = K60; answer: "N90 has K60"]
Chord: Basic Lookup
– Every node knows its successor in the ring
– A lookup requires O(N) time
[diagram: the query for K60 is passed around the ring, successor by successor, until it reaches N90]
"Finger Tables"
– Every node knows m other nodes in the ring
– The distance increases exponentially
[diagram: N80's fingers target 80+2^0, 80+2^1, …, 80+2^6, landing on N96, N112, and N16]
"Finger Tables"
Finger i points to the successor of n + 2^i. [diagram: the same ring with N120 added]
Lookups are Faster
Lookups take O(log N) hops. [diagram: Lookup(K19) on a ring with N5, N10, N20, N32, N60, N80, N99, N110]
Chord properties
– Efficient: O(log N) messages per lookup, where N is the total number of servers
– Scalable: O(log N) state per node
– Robust: survives massive changes in membership
– Proofs are in the paper / tech report, assuming no malicious participants
Joining the Ring
Three-step process:
1. Initialize all fingers of the new node
2. Update the fingers of existing nodes
3. Transfer keys from the successor to the new node
Less aggressive mechanism (lazy finger update):
– Initialize only the finger to the successor node
– Periodically verify the immediate successor and predecessor
– Periodically refresh finger-table entries
Joining the Ring - Step 1
Initialize the new node's finger table:
– Locate any node p in the ring
– Ask node p to look up the fingers of the new node N36
– Return the results to the new node
[diagram: N36 issues Lookup(37,38,40,…,100,164) through the ring]
Joining the Ring - Step 2
Update the fingers of existing nodes:
– The new node calls an update function on existing nodes
– Existing nodes can recursively update the fingers of other nodes
Joining the Ring - Step 3
Transfer keys from the successor node to the new node; only keys in the range are transferred. [diagram: copy keys 21..36 (here K30) from N40 to N36; K38 stays at N40]
Handling Failures
Failure of nodes might cause incorrect lookups. [diagram: Lookup(90) on a ring where N85, N102, and N113 have failed] N80 doesn't know its correct successor, so the lookup fails. Successor fingers are enough for correctness.
Handling Failures
– Use a successor list: each node knows its r immediate successors
– After a failure, it will know the first live successor; correct successors guarantee correct lookups
– The guarantee holds with some probability: r can be chosen to make the probability of lookup failure arbitrarily small
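A back-of-envelope sketch of why choosing r works: if each of the r successors fails independently with probability p, the whole list is dead (and the lookup can fail at that node) with probability p^r, which shrinks exponentially in r. The independence assumption and the numbers below are ours, for illustration:

```python
# Probability that every entry in an r-long successor list has failed,
# assuming independent failures with probability p each.
def all_successors_fail(p, r):
    return p ** r

# Even with half the network down (p = 0.5), r = 16 leaves only a
# 1-in-65536 chance that the entire successor list is dead.
print(all_successors_fail(0.5, 16))
```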
Evaluation Overview
– Quick lookup in large systems
– Low variation in lookup costs
– Robust despite massive failure
– Experiments confirm the theoretical results
Cost of lookup
The cost is O(log N), as predicted by theory; the constant is 1/2. [figure: average messages per lookup vs. number of nodes]
Robustness
Simulation results, static scenario: a failed lookup means the original node holding the key failed (no replicas of keys). The result implies a good balance of keys among nodes!
Robustness
Simulation results, dynamic scenario: a failed lookup means the finger path has a failed node.
– 500 nodes initially
– average stabilize() call every 30 s
– 1 lookup per second (Poisson)
– x joins/fails per second (Poisson)
Current implementation
Chord library: 3,000 lines of C++, deployed in a small Internet testbed. Includes:
– correct concurrent join/fail
– proximity-based routing for low delay (?)
– load control for heterogeneous nodes (?)
– resistance to spoofed node IDs (?)
Strengths
– Based on theoretical work (consistent hashing)
– Proven performance in many different aspects: "with high probability" proofs
– Robust (Is it?)
Weaknesses
– NOT that simple (compared to CAN)
– Member joining is complicated: the aggressive mechanism requires too many messages and updates, and there is no analysis of convergence for the lazy finger mechanism
– The key-management mechanism is mixed between layers: the upper layer does insertion and handles node failures, while Chord transfers keys when a node joins (no leave mechanism!)
– The routing table grows with the number of members in the group
– Worst-case lookup can be slow
Discussions
– Network proximity (consider latency?)
– Protocol security: malicious data insertion, malicious Chord table information
– Keyword search and indexing
– ...