Peer-to-Peer Networking
Credit: slides from J. Pang, B. Richardson, I. Stoica
Why Study P2P?
- Huge fraction of traffic on networks today (>= 50%!)
- Exciting new applications
- Next level of resource sharing (vs. timesharing, client-server), e.g., access 10s-100s of TB at low cost
Share of Internet Traffic (chart)
Number of Users (chart)
- Others include BitTorrent, eDonkey, iMesh, Overnet, Gnutella
- BitTorrent (and others) gaining share from FastTrack (Kazaa)
What is P2P used for?
- Use resources of end-hosts to accomplish a shared task
- Typically: share files, play games, search for patterns in data (SETI@home)
What's new?
- Taking advantage of resources at the edge of the network
- Fundamental shift in computing capability
- Increase in absolute bandwidth over WANs
Peer-to-Peer Systems
- Napster
- Gnutella
- KaZaA
- BitTorrent
- Chord
Key Issues for P2P Systems
- Join/leave: How do nodes join and leave? Who is allowed?
- Search and retrieval: How is content found? How are metadata indexes built, stored, and distributed?
- Content distribution: Where is content stored? How is it downloaded and retrieved?
4 Key Primitives
- Join: how to enter/leave the P2P system?
- Publish: how to advertise a file?
- Search: how to find a file?
- Fetch: how to download a file?
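To make the four primitives concrete, here is a minimal sketch (not from the original slides) of the interface that each system below fills in differently; the class and method names are illustrative only.

```python
from abc import ABC, abstractmethod

class PeerToPeerSystem(ABC):
    """Hypothetical interface for the four primitives; Napster, Gnutella,
    KaZaA, BitTorrent, and DHTs each implement them differently."""

    @abstractmethod
    def join(self, bootstrap_address: str) -> None:
        """Enter the P2P system (and, symmetrically, leave it later)."""

    @abstractmethod
    def publish(self, filename: str) -> None:
        """Advertise a locally stored file so other peers can find it."""

    @abstractmethod
    def search(self, filename: str) -> list:
        """Return addresses of peers believed to store the file."""

    @abstractmethod
    def fetch(self, filename: str, peer_address: str) -> bytes:
        """Download the file directly from a peer."""
```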
Publish and Search
- Basic strategies: centralized (Napster), flood the query (Gnutella), route the query (Chord)
- Different tradeoffs depending on the application: robustness, scalability, legal issues
Napster: History
- 1999: S. Fanning launches Napster
- Peaked at 1.5 million simultaneous users
- July 2001: Napster shuts down
Napster: Overview
Centralized database:
- Join: on startup, client contacts the central server
- Publish: client reports its list of files to the central server
- Search: query the server, which returns someone that stores the requested file
- Fetch: get the file directly from that peer
Napster: Publish (diagram) – a peer at 123.2.21.23 tells the server "I have X, Y, and Z!", and the server records insert(X, 123.2.21.23), ...
Napster: Search (diagram) – a client asks the server "Where is file A?"; the server replies with search(A) --> 123.2.0.18, and the client fetches the file directly from 123.2.0.18
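A minimal sketch of the centralized-index idea behind publish and search (illustrative Python, not Napster's actual protocol): the server keeps a filename-to-peers map, peers insert into it on publish, and a search is a single lookup on the server.

```python
from collections import defaultdict

class CentralIndexServer:
    """Toy stand-in for Napster's central database: filename -> set of peer addresses."""
    def __init__(self):
        self.index = defaultdict(set)

    def publish(self, peer_addr, filenames):
        # Publish: the peer reports every file it stores, e.g. insert(X, 123.2.21.23)
        for name in filenames:
            self.index[name].add(peer_addr)

    def search(self, filename):
        # Search: O(1) lookup on the server; returns peers that store the file
        return self.index.get(filename, set())

server = CentralIndexServer()
server.publish("123.2.21.23", ["X", "Y", "Z"])
print(server.search("X"))   # {'123.2.21.23'}; the client then fetches directly from that peer
```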
Napster: Discussion
- Pros: simple; search scope is O(1); controllable (pro or con?)
- Cons: server maintains O(N) state; server does all processing; single point of failure
Gnutella: History
- In 2000, J. Frankel and T. Pepper from Nullsoft released Gnutella
- Soon many other clients: BearShare, Morpheus, LimeWire, etc.
- In 2001, many protocol enhancements, including "ultrapeers"
Gnutella: Overview
Query flooding:
- Join: on startup, client contacts a few other nodes; these become its "neighbors"
- Publish: no need
- Search: ask neighbors, who ask their neighbors, and so on... when/if found, reply to the sender
- Fetch: get the file directly from the peer
Gnutella: Search (diagram) – the query "Where is file A?" floods from neighbor to neighbor; the node that has file A sends a reply back toward the querier
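A sketch of query flooding over an in-memory neighbor graph (illustrative; real Gnutella limits queries with a TTL and suppresses duplicates via message IDs, which the sketch mimics with a hop limit and a visited set):

```python
def flood_search(start, filename, neighbors, files, ttl=7):
    """Breadth-first flood: ask neighbors, who ask their neighbors, up to `ttl` hops.
    `neighbors` maps node -> list of nodes; `files` maps node -> set of filenames."""
    visited = {start}
    frontier = [start]
    for _ in range(ttl):
        next_frontier = []
        for node in frontier:
            if filename in files.get(node, set()):
                return node            # a reply would be routed back to the querier
            for peer in neighbors.get(node, []):
                if peer not in visited:
                    visited.add(peer)
                    next_frontier.append(peer)
        frontier = next_frontier
    return None                        # TTL exhausted without finding the file

# Example topology: A-B-C-D chain; D has file "A.mp3"
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
files = {"D": {"A.mp3"}}
print(flood_search("A", "A.mp3", neighbors, files))   # D
```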
Gnutella: Discussion
- Pros: fully decentralized; search cost distributed
- Cons: search scope is O(N); search time is O(???); nodes leave often, so the network is unstable
Aside: Search Time?
Aside: All Peers Equal? (diagram) – peers connect over very different links: 56 kbps modems, 1.5 Mbps DSL, 10 Mbps LANs
Aside: Network Resilience (figures) – partial topology; random failures (30% of nodes die) vs. targeted failures (4% die); from Saroiu et al., MMCN 2002
KaZaA: History
- In 2001, KaZaA was created by the Dutch company Kazaa BV
- Single network called FastTrack, used by other clients as well: Morpheus, giFT, etc.
- Eventually the protocol changed so other clients could no longer talk to it
- Most popular file-sharing network today, with >10 million users (number varies)
KaZaA: Overview
"Smart" query flooding:
- Join: on startup, client contacts a "supernode"... may at some point become one itself
- Publish: send list of files to the supernode
- Search: send query to the supernode; supernodes flood the query amongst themselves
- Fetch: get the file directly from peer(s); can fetch simultaneously from multiple peers
KaZaA: Network Design (diagram) – ordinary nodes attach to "supernodes", which form the upper tier of the network
KaZaA: File Insert (diagram) – a peer at 123.2.21.23 tells its supernode "I have X!", and the supernode records insert(X, 123.2.21.23)
KaZaA: File Search (diagram) – the query "Where is file A?" is flooded among supernodes; the replies search(A) --> 123.2.0.18 and search(A) --> 123.2.22.50 are returned to the querier
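A sketch of the two-tier idea (illustrative; not the actual FastTrack protocol, which was proprietary): ordinary peers publish only to their supernode, and a search floods among supernodes rather than among all peers.

```python
class Supernode:
    """Toy supernode: indexes its attached peers' files and floods queries to other supernodes."""
    def __init__(self):
        self.index = {}          # filename -> set of peer addresses attached to this supernode
        self.peer_supernodes = []  # other supernodes this one floods queries to

    def publish(self, peer_addr, filenames):
        for name in filenames:
            self.index.setdefault(name, set()).add(peer_addr)

    def search(self, filename, visited=None):
        visited = visited if visited is not None else set()
        visited.add(self)
        results = set(self.index.get(filename, set()))
        for sn in self.peer_supernodes:          # flood only within the supernode tier
            if sn not in visited:
                results |= sn.search(filename, visited)
        return results

sn1, sn2 = Supernode(), Supernode()
sn1.peer_supernodes.append(sn2)
sn2.peer_supernodes.append(sn1)
sn1.publish("123.2.0.18", ["A"])
sn2.publish("123.2.22.50", ["A"])
print(sn1.search("A"))    # {'123.2.0.18', '123.2.22.50'}; the client can fetch from both at once
```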
KaZaA: Discussion
- Pros: tries to take node heterogeneity into account: bandwidth, host computational resources, host availability (?); rumored to take network locality into account
- Cons: mechanisms easy to circumvent; still no real guarantees on search scope or search time
P2P Systems
- Napster: launched P2P; centralized index
- Gnutella: focus is simple sharing, using simple flooding
- KaZaA: more intelligent query routing
- BitTorrent: focus on download speed and fairness in sharing
BitTorrent: History
- In 2002, B. Cohen debuted BitTorrent
- Key motivation: popularity exhibits temporal locality (flash crowds), e.g., the Slashdot effect, CNN on 9/11, new movie/game releases
- Focused on efficient fetching, not searching: distribute the same file to all peers; single publisher, multiple downloaders
- Has some "real" publishers: Blizzard Entertainment used it to distribute the beta of their new game
BitTorrent: Overview
Swarming (see the sketch below):
- Join: contact a centralized "tracker" server, get a list of peers
- Publish: run a tracker server
- Search: out-of-band, e.g., use Google to find a tracker for the file you want
- Fetch: download chunks of the file from your peers; upload chunks you have to them
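A sketch of the fetch step (illustrative only; the real protocol exchanges bitfields and requests fixed-size pieces over BitTorrent's wire protocol): given the peer list from the tracker and each peer's advertised chunks, the client repeatedly picks a chunk it is missing. The rarest-first preference shown here is BitTorrent's usual piece-selection heuristic, not something stated on this slide.

```python
import random

def choose_next_chunk(have, peer_bitfields):
    """Pick a missing chunk, preferring the rarest one among known peers (rarest-first heuristic)."""
    counts = {}
    for peer, chunks in peer_bitfields.items():
        for c in chunks:
            if c not in have:
                counts[c] = counts.get(c, 0) + 1
    if not counts:
        return None                      # nothing left to download
    rarest = min(counts.values())
    return random.choice([c for c, n in counts.items() if n == rarest])

# The tracker (out of band) told us about two peers and the chunks they hold:
peer_bitfields = {"peer1": {0, 1, 2}, "peer2": {2, 3}}
have = {0}
print(choose_next_chunk(have, peer_bitfields))   # 1 or 3 (each held by one peer); chunk 2 is more common
```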
BitTorrent: Sharing Strategy
- Employ a "tit-for-tat" sharing strategy: "I'll share with you if you share with me"
- Be optimistic: occasionally let freeloaders download; otherwise no one would ever start!
- Also lets you discover better peers to download from when they reciprocate
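A sketch of the tit-for-tat idea (illustrative; the slot count and rates are made up, and the real client re-evaluates its choices periodically rather than once): upload to the peers that have recently uploaded the most to you, plus one randomly chosen peer so newcomers and freeloaders get a chance to start.

```python
import random

def pick_unchoked(download_rate_from, regular_slots=3):
    """download_rate_from: peer -> bytes/s recently received from that peer.
    Returns the peers we will upload to in the next period."""
    # Tit-for-tat: reciprocate with the best recent uploaders to us
    best = sorted(download_rate_from, key=download_rate_from.get, reverse=True)[:regular_slots]
    # Optimistic unchoke: give one other peer a chance to prove itself
    others = [p for p in download_rate_from if p not in best]
    if others:
        best.append(random.choice(others))
    return best

rates = {"p1": 900, "p2": 50, "p3": 700, "p4": 0, "p5": 300}
print(pick_unchoked(rates))   # the three best uploaders plus one optimistic pick
```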
BitTorrent: Summary
- Pros: works reasonably well in practice; gives peers an incentive to share resources and avoids freeloaders
- Cons: a central tracker server is needed to bootstrap the swarm (is this really necessary?)
DHT: History
- In 2000-2001, academic researchers said "we want to play too!"
- Motivation: frustrated by the popularity of all these "half-baked" P2P apps :) We can do better! (so we said)
- Guaranteed lookup success for files in the system
- Provable bounds on search time
- Provable scalability to millions of nodes
- A hot topic in networking ever since
DHT: Overview
- Abstraction: a distributed "hash table" (DHT) data structure: put(id, item); item = get(id)
- Implementation: nodes in the system form a distributed data structure
- Can be a ring, tree, hypercube, skip list, butterfly network, ...
DHT: Overview (2)
Structured overlay routing:
- Join: on startup, contact a "bootstrap" node and integrate yourself into the distributed data structure; get a node id
- Publish: route the publication for the file id toward a close node id along the data structure
- Search: route a query for the file id toward a close node id; the data structure guarantees that the query will meet the publication
- Fetch: P2P fetching
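A sketch of the put/get abstraction over a ring-structured id space (illustrative; a real DHT such as Chord keeps per-node finger tables and routes hop by hop, rather than using the global sorted node list shown here): each key is hashed, and the responsible node is the first node clockwise from the key's id.

```python
import hashlib
from bisect import bisect_right

def hash_id(key, bits=32):
    """Map a string into the id space (hash function is a system-wide parameter)."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % (1 << bits)

class ToyRingDHT:
    """Global view of a ring DHT for illustration only; real DHTs keep O(log n) state per node."""
    def __init__(self, node_names):
        self.nodes = sorted(hash_id(n) for n in node_names)   # node ids placed on the ring
        self.store = {nid: {} for nid in self.nodes}

    def _responsible(self, key_id):
        # First node clockwise from key_id, wrapping around the ring
        i = bisect_right(self.nodes, key_id) % len(self.nodes)
        return self.nodes[i]

    def put(self, key, item):
        self.store[self._responsible(hash_id(key))][key] = item

    def get(self, key):
        return self.store[self._responsible(hash_id(key))].get(key)

dht = ToyRingDHT(["nodeA", "nodeB", "nodeC", "nodeD"])
dht.put("song.mp3", "held by 123.2.0.18")
print(dht.get("song.mp3"))   # 'held by 123.2.0.18'
```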
Tapestry: a DHT-based P2P system focusing on fault tolerance
Danfeng (Daphne) Yao
A very big picture – P2P networking
- Observe: the dynamic nature of the computing environment; network expansion
- Require: robustness (fault tolerance); scalability; self-administration (self-organization)
Goals to achieve in Tapestry
- Decentralization: routing uses local data
- Efficiency, scalability
- Robust routing mechanism, resilient to node failures
- An easy way to locate objects
IDs in Tapestry
- NodeIDs and ObjectIDs are computed using a hash function; the hash function is a system-wide parameter
- IDs are used for routing: locating an object, finding a node
- A node stores its neighbors' IDs
Tapestry routing (diagram): suffix-based routing, e.g., a message from node 0325 toward node 4598 matches one more suffix digit of the destination at each hop
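A sketch of suffix-based next-hop selection (illustrative; a real Tapestry node consults its own neighbor map, one level per digit, rather than scanning a global node list as here): each hop forwards to a node whose id shares a longer suffix with the destination, so the match grows by a digit per hop.

```python
def shared_suffix_len(a, b):
    """Number of trailing digits two ids have in common."""
    n = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        n += 1
    return n

def next_hop(current, dest, known_nodes):
    """Forward to a node that matches a longer suffix of dest than we do;
    taking the shortest such extension mimics resolving one digit per hop."""
    have = shared_suffix_len(current, dest)
    candidates = [n for n in known_nodes if shared_suffix_len(n, dest) > have]
    return min(candidates, key=lambda n: shared_suffix_len(n, dest), default=None)

# Toy network (ids are illustrative): route from 0325 toward 4598
nodes = ["0325", "9098", "7598", "4598", "2118", "0098"]
hop = "0325"
while hop != "4598":
    hop = next_hop(hop, "4598", nodes)
    print(hop)   # 2118 -> 9098 -> 7598 -> 4598: one more matching suffix digit per hop
```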
Each object is assigned a root node
- Root node: a unique node for object O with the longest matching suffix (rootId ≈ objectId); the root knows where O is
- Publish: the node S holding a copy of O routes a message to O's root; if multiple copies exist, the closest location is stored
- Location query: a message for O is routed toward O's root
- How to choose the root node for an object?
Surrogate routing: unique mapping without global knowledge
- Problem: how to choose a unique root node for an object deterministically
- Surrogate routing: choose as the object's root (surrogate) the node whose nodeId is closest to the objectId
- Small overhead, but no need for global knowledge; the number of additional hops is small
Surrogate example (diagram): find the root node for objectId = 6534 – routing passes through nodes such as 2234, 1114, and 2534; where the neighbor-map entry matching 6534 is empty, the message is deterministically redirected (here to 7534), and the node where routing terminates becomes the object's root
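A simplified sketch of the surrogate idea (illustrative; real surrogate routing resolves empty neighbor-map entries digit by digit along the route rather than scanning a node list): applying one deterministic rule, longest matching suffix with ties broken numerically, picks the same root everywhere, and on the example ids from the slide it lands on 7534.

```python
def shared_suffix_len(a, b):
    """Number of trailing digits two ids have in common."""
    n = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        n += 1
    return n

def surrogate_root(object_id, node_ids):
    """Deterministic stand-in for surrogate routing: prefer the longest matching
    suffix, then the numerically closest octal id, so every node agrees on the root."""
    return min(node_ids,
               key=lambda n: (-shared_suffix_len(n, object_id),
                              abs(int(n, 8) - int(object_id, 8))))

nodes = ["2234", "1114", "2534", "7534"]
print(surrogate_root("6534", nodes))   # 7534: no node ends in 6534, so the closest ...534 match wins
```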
Tapestry: adaptable, fault-resilient, self-managing
- Basic routing and location
- Backpointer list in the neighbor map
- Multiple mappings are stored
- More semantic flexibility: a selection operator for choosing returned objects
A single Tapestry node (diagram): the neighbor map for node "1732" (octal) has one level per suffix digit – level 1 entries xxx0…xxx7, level 2 entries xx02…xx72, level 3 entries x032…x732, level 4 entries 0732…7732, with the node's own id filling the matching slots – alongside object location pointers, a hotspot monitor, and an object store
Fault-tolerant routing: detect, operate, and recover
- Expected faults: server outages (high load, hw/sw faults); link failures (router hw/sw faults); neighbor-table corruption at the server
- Detect: TCP timeouts, heartbeats to nodes on the backpointer list
- Operate under faults: backup neighbors (see the sketch below)
- Second chance for a failed server: probing messages
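A sketch of the operate-under-faults step (illustrative; the timeout value, data layout, and function names are assumptions, not Tapestry's actual implementation): if the primary neighbor for a routing entry has missed its heartbeats, fall back to a backup neighbor instead of stalling.

```python
import time

HEARTBEAT_TIMEOUT = 10.0   # seconds without a heartbeat before a neighbor is considered down (assumed value)

last_heartbeat = {}        # neighbor id -> timestamp of the last heartbeat received

def alive(node_id, now):
    return now - last_heartbeat.get(node_id, float("-inf")) < HEARTBEAT_TIMEOUT

def route_entry_next_hop(primary, backups, now):
    """Prefer the primary neighbor; fail over to the first live backup; give up if none respond."""
    for candidate in [primary] + backups:
        if alive(candidate, now):
            return candidate
    return None   # all failed: trigger recovery (probing messages give the failed server a second chance)

now = time.time()
last_heartbeat["7598"] = now - 30     # primary: stale heartbeat, treated as failed
last_heartbeat["0098"] = now - 2      # backup: fresh heartbeat
print(route_entry_next_hop("7598", ["0098"], now))   # 0098
```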
Fault-tolerant location: avoid a single point of failure
- Multiple roots for each object: rootId = f(objectId + salt); redundant, but reliable (sketched below)
- Storage servers periodically republish location info
- Cached location info on routers times out
- New and recovered objects periodically advertise their location info
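A sketch of the multiple-roots trick from the slide's rootId = f(objectId + salt) (the hash function, salt values, and id width here are illustrative): hashing the object id with several fixed salts yields several independent root ids, so publication and queries can go through a different root if one fails.

```python
import hashlib

def root_ids(object_id, num_roots=3, digits=4):
    """rootId_i = f(objectId + salt_i): one deterministic root id per salt."""
    roots = []
    for salt in range(num_roots):
        h = hashlib.sha1(f"{object_id}:{salt}".encode()).hexdigest()
        roots.append(oct(int(h, 16))[-digits:])   # map into the octal id space used in the examples
    return roots

print(root_ids("6534"))   # three root ids; publish and location messages are sent toward each of them
```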
Dynamic node insertion
- Populate the new node's neighbor maps: route messages to its own nodeId, then copy and optimize the resulting neighbor maps
- Inform relevant nodes of its entries so they can update their neighbor maps
Dynamic node deletion
- Inform relevant nodes using backpointers, or rely on soft state (stop sending heartbeats)
- In general, Tapestry expects a small number of dynamic insertions/deletions
Fault handling outline
- Design choice: soft state for graceful fault recovery
- Soft-state caches: updated by periodic refresh messages, or purged if no such messages arrive
- Faults are expected in Tapestry: fault-tolerant routing, fault-tolerant location, surrogate routing
Tapestry Conclusions
- Decentralized location and routing
- Distributed algorithms for object-root mapping and node insertion/deletion
- Fault handling with redundancy
- Per-node routing table size: b * log_b(N), where N = size of the namespace
- Find an object in log_b(n) overlay hops, where n = number of physical nodes
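A quick worked example of those bounds (the base and node count are illustrative, not from the slides), using the 4-digit octal ids from the earlier examples:

```python
import math

b = 8            # octal digit base, matching the 4-digit octal ids in the examples (assumed)
N = b ** 4       # namespace size for 4-digit ids = 4096
n = 1000         # assumed number of physical nodes

entries_per_node = b * math.log(N, b)   # b * log_b(N) = 8 * 4 = 32 routing entries per node
lookup_hops = math.log(n, b)            # log_b(n) ~= 3.3 overlay hops to find an object
print(round(entries_per_node), round(lookup_hops, 1))   # 32 3.3
```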