
1 Peer-to-Peer Networking Credit slides from J. Pang, B. Richardson, I. Stoica

2 2 Why Study P2P  Huge fraction of traffic on networks today  >=50%!  Exciting new applications  Next level of resource sharing  Vs. timesharing, client-server  E.g., access 10s-100s of TB at low cost.

3 3 Share of Internet Traffic

4 4 Number of Users  Others include BitTorrent, eDonkey, iMesh, Overnet, Gnutella  BitTorrent (and others) is gaining share from FastTrack (Kazaa).

5 5 What is P2P used for?  Use resources of end-hosts to accomplish a shared task  Typically share files  Play games  Search for patterns in data (SETI@home)

6 6 What’s new?  Taking advantage of resources at the edge of the network  Fundamental shift in computing capability  Increase in absolute bandwidth over WAN

7 7 Peer to Peer  Systems:  Napster  Gnutella  KaZaA  BitTorrent  Chord

8 8 Key issues for P2P systems  Join/leave  How do nodes join/leave? Who is allowed?  Search and retrieval  How to find content?  How are metadata indexes built, stored, distributed?  Content Distribution  Where is content stored? How is it downloaded and retrieved?

9 9 4 Key Primitives  Join – How to enter/leave the P2P system?  Publish – How to advertise a file?  Search – How to find a file?  Fetch – How to download a file?
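To make the four primitives concrete, here is a minimal interface sketch; the Peer class and its method signatures are illustrative assumptions, not part of any of the systems discussed below.

```python
# Hypothetical interface for the four primitives (names are illustrative).
from abc import ABC, abstractmethod

class Peer(ABC):
    @abstractmethod
    def join(self, bootstrap_addr: str) -> None:
        """Enter the P2P system via a known node or server."""

    @abstractmethod
    def publish(self, filename: str) -> None:
        """Advertise a file this peer stores."""

    @abstractmethod
    def search(self, filename: str) -> list[str]:
        """Return addresses of peers believed to store the file."""

    @abstractmethod
    def fetch(self, filename: str, peer_addr: str) -> bytes:
        """Download the file directly from a peer."""
```

Napster, Gnutella, KaZaA, BitTorrent, and the DHTs below differ mainly in how publish and search are implemented.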

10 10 Publish and Search  Basic strategies:  Centralized (Napster)  Flood the query (Gnutella)  Route the query (Chord)  Different tradeoffs depending on application  Robustness, scalability, legal issues

11 11 Napster: History  In 1999, S. Fanning launches Napster  Peaked at 1.5 million simultaneous users  Jul 2001, Napster shuts down

12 12 Napster: Overview  Centralized Database:  Join: on startup, client contacts central server  Publish: reports list of files to central server  Search: query the server => return someone that stores the requested file  Fetch: get the file directly from peer
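A minimal sketch of the centralized-index idea, assuming a simple in-memory map from filename to peer addresses; the CentralIndex class is illustrative, not Napster's actual protocol.

```python
# Sketch of a Napster-style central index: filename -> peers that store it.
from collections import defaultdict

class CentralIndex:
    def __init__(self):
        self.index = defaultdict(set)

    def publish(self, peer_addr: str, filenames: list[str]) -> None:
        # Join + Publish: a client reports its file list on startup.
        for name in filenames:
            self.index[name].add(peer_addr)

    def search(self, filename: str) -> list[str]:
        # O(1) lookup at the server; the fetch itself is peer-to-peer.
        return sorted(self.index[filename])

index = CentralIndex()
index.publish("123.2.21.23", ["X", "Y", "Z"])
print(index.search("X"))   # ['123.2.21.23']
```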

13 13 Napster: Publish  [Diagram: the peer at 123.2.21.23 tells the central server "I have X, Y, and Z!"; the server records insert(X, 123.2.21.23), etc.]

14 14 Napster: Search  [Diagram: a peer asks the central server "Where is file A?"; the server replies search(A) --> 123.2.0.18, and the peer fetches A directly from 123.2.0.18.]

15 15 Napster: Discussion  Pros:  Simple  Search scope is O(1)  Controllable (pro or con?)  Cons:  Server maintains O(N) State  Server does all processing  Single point of failure

16 16 Gnutella: History  In 2000, J. Frankel and T. Pepper from Nullsoft released Gnutella  Soon many other clients: Bearshare, Morpheus, LimeWire, etc.  In 2001, many protocol enhancements including “ultrapeers”

17 17 Gnutella: Overview  Query Flooding:  Join: on startup, client contacts a few other nodes; these become its "neighbors"  Publish: no need  Search: ask neighbors, who ask their neighbors, and so on... when/if found, reply to sender.  Fetch: get the file directly from peer
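A toy sketch of query flooding with a TTL and duplicate suppression; real Gnutella uses message IDs and reverse-path forwarding of replies rather than this recursive simplification.

```python
# Sketch of Gnutella-style query flooding with a TTL (illustrative only).
class Node:
    def __init__(self, addr):
        self.addr = addr
        self.neighbors = []     # other Node objects
        self.files = set()

    def query(self, filename, ttl=5, seen=None):
        seen = seen if seen is not None else set()
        if self.addr in seen or ttl == 0:
            return None
        seen.add(self.addr)
        if filename in self.files:
            return self.addr                 # "I have file A" -> reply to sender
        for n in self.neighbors:
            hit = n.query(filename, ttl - 1, seen)
            if hit:
                return hit                   # reply propagates back along the query path
        return None
```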

18 18 Gnutella: Search  [Diagram: the query "Where is file A?" is flooded from neighbor to neighbor; the node holding A ("I have file A.") sends a Reply back toward the querying peer.]

19 19 Gnutella: Discussion  Pros:  Fully de-centralized  Search cost distributed  Cons:  Search scope is O(N)  Search time is O(???)  Nodes leave often, network unstable

20 20 Aside: Search Time?

21 21 Aside: All Peers Equal?  [Diagram: peers with very different access links (56kbps modems, 1.5Mbps DSL, 10Mbps LAN) all participate as equals.]

22 22 Aside: Network Resilience  [Figures: partial topology; random failure (30% of nodes die) vs. targeted failure (4% of nodes die). From Saroiu et al., MMCN 2002.]

23 23 KaZaA: History  In 2001, KaZaA created by Dutch company Kazaa BV  Single network called FastTrack used by other clients as well: Morpheus, giFT, etc.  Eventually protocol changed so other clients could no longer talk to it  Most popular file sharing network today with >10 million users (number varies)

24 24 KaZaA: Overview  “Smart” Query Flooding:  Join: on startup, client contacts a “supernode”... may at some point become one itself  Publish: send list of files to supernode  Search: send query to supernode, supernodes flood query amongst themselves.  Fetch: get the file directly from peer(s); can fetch simultaneously from multiple peers
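A sketch of the two-tier idea, assuming each supernode indexes the files of its attached peers and floods queries only to other supernodes; this is illustrative, not the proprietary FastTrack protocol.

```python
# Two-tier "smart" flooding sketch: ordinary peers publish to a supernode,
# and queries are flooded only among supernodes.
class SuperNode:
    def __init__(self):
        self.index = {}          # filename -> set of ordinary-peer addresses
        self.neighbors = []      # other SuperNode objects

    def publish(self, peer_addr, filenames):
        for name in filenames:
            self.index.setdefault(name, set()).add(peer_addr)

    def search(self, filename, ttl=3, seen=None):
        seen = seen if seen is not None else set()
        if id(self) in seen or ttl == 0:
            return set()
        seen.add(id(self))
        hits = set(self.index.get(filename, set()))
        for sn in self.neighbors:
            hits |= sn.search(filename, ttl - 1, seen)
        return hits              # a peer may then fetch from several hits at once
```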

25 25 KaZaA: Network Design  [Diagram: ordinary peers attach to "Super Nodes", which form the query-flooding core of the network.]

26 26 KaZaA: File Insert  [Diagram: the peer at 123.2.21.23 tells its supernode "I have X!"; the supernode records insert(X, 123.2.21.23).]

27 27 KaZaA: File Search  [Diagram: a peer sends "Where is file A?" to its supernode; replies come back as search(A) --> 123.2.0.18 and search(A) --> 123.2.22.50.]

28 28 KaZaA: Discussion  Pros:  Tries to take into account node heterogeneity: bandwidth, host computational resources, host availability (?)  Rumored to take into account network locality  Cons:  Mechanisms easy to circumvent  Still no real guarantees on search scope or search time

29 29 P2P systems  Napster  Launched P2P  Centralized index  Gnutella:  Focus is simple sharing  Using simple flooding  Kazaa  More intelligent query routing  BitTorrent  Focus on Download speed, fairness in sharing

30 30 BitTorrent: History  In 2002, B. Cohen debuted BitTorrent  Key Motivation:  Popularity exhibits temporal locality (Flash Crowds)  E.g., Slashdot effect, CNN on 9/11, new movie/game release  Focused on Efficient Fetching, not Searching:  Distribute the same file to all peers  Single publisher, multiple downloaders  Has some "real" publishers:  Blizzard Entertainment used it to distribute the beta of its new game

31 31 BitTorrent: Overview  Swarming:  Join: contact centralized “tracker” server, get a list of peers.  Publish: Run a tracker server.  Search: Out-of-band. E.g., use Google to find a tracker for the file you want.  Fetch: Download chunks of the file from your peers. Upload chunks you have to them.
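A sketch of swarming once the tracker has returned a peer list; chunk selection is random here for brevity, whereas real clients prefer rarest-first, so treat this as an illustration only.

```python
# Sketch of swarming: download chunks you lack from a peer while serving
# chunks you already hold back to that peer.
import random

class SwarmPeer:
    def __init__(self, num_chunks):
        self.num_chunks = num_chunks
        self.have = set()                    # chunk indices we hold

    def wanted(self):
        return set(range(self.num_chunks)) - self.have

    def exchange(self, other):
        # Request one chunk the other peer has and we lack, and serve one in return.
        want = self.wanted() & other.have
        if want:
            self.have.add(random.choice(sorted(want)))
        give = other.wanted() & self.have
        if give:
            other.have.add(random.choice(sorted(give)))

seeder, leecher = SwarmPeer(4), SwarmPeer(4)
seeder.have = {0, 1, 2, 3}
leecher.exchange(seeder)     # leecher now holds one chunk and can serve it to others
```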

33 33 BitTorrent: Sharing Strategy  Employ “Tit-for-tat” sharing strategy  “I’ll share with you if you share with me”  Be optimistic: occasionally let freeloaders download Otherwise no one would ever start! Also allows you to discover better peers to download from when they reciprocate
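A sketch of the choking decision, assuming a fixed number of unchoke slots and a measured per-peer upload rate; real BitTorrent re-evaluates these choices periodically and rotates the optimistic slot.

```python
# Sketch of tit-for-tat choking with one optimistic unchoke slot.
import random

def choose_unchoked(peers, upload_rate_to_us, slots=4):
    # Reciprocate: unchoke the peers that upload to us fastest...
    by_rate = sorted(peers, key=lambda p: upload_rate_to_us.get(p, 0), reverse=True)
    unchoked = set(by_rate[:slots - 1])
    # ...plus one random peer, so newcomers (and freeloaders) get a chance to
    # start, and we may discover a better partner when they reciprocate.
    rest = [p for p in peers if p not in unchoked]
    if rest:
        unchoked.add(random.choice(rest))
    return unchoked

rates = {"peerA": 50, "peerB": 10, "peerC": 0, "peerD": 80, "peerE": 0}
print(choose_unchoked(list(rates), rates))
```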

34 34 BitTorrent: Summary  Pros:  Works reasonably well in practice  Gives peers incentive to share resources; avoids freeloaders  Cons:  Central tracker server needed to bootstrap swarm (is this really necessary?)

35 35 DHT: History  In 2000-2001, academic researchers said "we want to play too!"  Motivation:  Frustrated by popularity of all these "half-baked" P2P apps :)  We can do better! (so we said)  Guaranteed lookup success for files in the system  Provable bounds on search time  Provable scalability to millions of nodes  Hot Topic in networking ever since

36 36 DHT: Overview  Abstraction: a distributed “hash-table” (DHT) data structure:  put(id, item);  item = get(id);  Implementation: nodes in system form a distributed data structure  Can be Ring, Tree, Hypercube, Skip List, Butterfly Network,...

37 37 DHT: Overview (2)  Structured Overlay Routing:  Join: On startup, contact a “bootstrap” node and integrate yourself into the distributed data structure; get a node id  Publish: Route publication for file id toward a close node id along the data structure  Search: Route a query for file id toward a close node id. Data structure guarantees that query will meet the publication.  Fetch: P2P fetching
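A sketch of the put/get abstraction over a ring of nodes placed by consistent hashing; the node names, 16-bit ID space, and linear successor lookup are assumptions for brevity (a real DHT such as Chord routes in O(log N) hops using finger tables rather than scanning a global node list).

```python
# Sketch of a DHT: keys and nodes hash onto a ring; a key is stored at the
# first node clockwise from its hash (the "responsible" node).
import hashlib
from bisect import bisect_right

def h(key: str, bits: int = 16) -> int:
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % (2 ** bits)

class Ring:
    def __init__(self, node_names):
        self.nodes = sorted((h(name), name) for name in node_names)
        self.store = {name: {} for name in node_names}

    def responsible(self, key_id: int) -> str:
        ids = [nid for nid, _ in self.nodes]
        return self.nodes[bisect_right(ids, key_id) % len(self.nodes)][1]

    def put(self, key: str, item) -> None:
        self.store[self.responsible(h(key))][key] = item

    def get(self, key: str):
        return self.store[self.responsible(h(key))].get(key)

ring = Ring(["node-a", "node-b", "node-c"])
ring.put("fileX", "123.2.21.23")
print(ring.get("fileX"))   # the same hash routes the get() to the same node
```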

38 Tapestry: a DHT-based P2P system focusing on fault-tolerance Danfeng (Daphne) Yao

39 39 A very big picture – P2P networking  Observe  Dynamic nature of the computing environment  Network Expansion  Require  Robustness, or fault tolerance  Scalability  Self-administration, or self-organization

40 40 Goals to achieve in Tapestry  Decentralization  Routing uses local data  Efficiency, scalability  Robust routing mechanism  Resilient to node failures  An easy way to locate objects

41 41 IDs in Tapestry  NodeID  ObjectID  IDs are computed using a hash function  The hash function is a system-wide parameter  Use ID for routing  Locating an object  Finding a node  A node stores its neighbors’ IDs
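A sketch of deriving fixed-length digit-string IDs from a hash, with 4 octal digits chosen to match the slides' examples; the hash function and parameters are assumptions, not Tapestry's actual choices.

```python
# Derive a 4-digit octal ID from a name by hashing (parameters are assumptions).
import hashlib

def make_id(name: str, digits: int = 4, base: int = 8) -> str:
    n = int(hashlib.sha1(name.encode()).hexdigest(), 16)
    out = []
    for _ in range(digits):
        out.append(str(n % base))
        n //= base
    return "".join(reversed(out))

print(make_id("some-node"), make_id("some-object"))   # e.g. '1732'-style IDs
```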

42 42 Tapestry routing: 0325 → 4598  Suffix-based routing  [Diagram: a message is routed from node 0325 to node 4598, matching one more trailing digit of the destination ID at each hop.]
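A simplified sketch of a suffix-routing hop: forward to a neighbor whose ID shares a longer suffix with the destination. Real Tapestry fixes exactly one more digit per hop using per-level neighbor maps; the greedy choice below is only an illustration.

```python
# Sketch of one suffix-based routing hop (greedy simplification).
def shared_suffix_len(a: str, b: str) -> int:
    n = 0
    while n < len(a) and n < len(b) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def next_hop(current: str, dest: str, neighbors: list[str]) -> str:
    level = shared_suffix_len(current, dest)
    # Prefer a neighbor that matches at least one more trailing digit of dest.
    better = [n for n in neighbors if shared_suffix_len(n, dest) > level]
    return max(better, key=lambda n: shared_suffix_len(n, dest)) if better else current

# Routing 0325 -> 4598 proceeds through nodes matching ...8, ...98, ...598, 4598.
print(next_hop("0325", "4598", ["1110", "2228", "7598"]))   # -> '7598'
```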

43 43 Each object is assigned a root node  Root node: a unique node for object O, the one with the longest matching suffix (ideally rootId = O)  The root knows where O is stored  Publish: a node S holding a copy of O routes a msg to O's root; if multiple copies exist, the closest location is stored  Location query: a msg for O is routed towards O's root  How to choose the root node for an object?

44 44 Surrogate routing: unique mapping without global knowledge  Problem: how to choose a unique root node for an object deterministically  Surrogate routing  Choose the object's root (surrogate) node to be the node whose nodeId is closest to the objectId  Small overhead, but no need for global knowledge  The number of additional hops is small
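A simplified sketch of surrogate selection, approximating "closest to the objectId" as the longest matching suffix with a deterministic numeric tie-break; actual Tapestry resolves this digit by digit during routing, without any global list of nodes.

```python
# Sketch of surrogate-root selection for an object with no exactly-matching node.
def shared_suffix_len(a: str, b: str) -> int:
    n = 0
    while n < len(a) and n < len(b) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def surrogate(object_id: str, node_ids: list[str]) -> str:
    # Longest matching suffix wins; ties broken by smallest numeric (octal) distance.
    return max(node_ids, key=lambda nid: (shared_suffix_len(nid, object_id),
                                          -abs(int(nid, 8) - int(object_id, 8))))

# The slides' example: objectId 6534, while only nodes 2234, 1114, 2534, 7534 exist.
print(surrogate("6534", ["2234", "1114", "2534", "7534"]))   # -> '7534'
```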

45 45 Surrogate example: find the root node for objectId = 6534  [Diagram: nodes 2234, 1114, 2534, and 7534 exist, but no node 6534; routing toward 6534 finds the 6534 entry empty at each step and settles on node 7534 as the object's root.]

46 46 Tapestry: adaptable, fault-resilient, self-management  Basic routing and location  Backpointer list in neighbor map  Multiple mappings are stored  More semantic flexibility: selection operator for choosing returned objects

47 47 A single Tapestry node  [Diagram: the neighbor map for node 1732 (octal) has one routing level per ID digit, with eight slots per level (patterns xxx0-xxx7, xx02-xx72, x032-x732, 0732-7732); the slot matching the node's own digit at each level holds 1732 itself. The node also maintains object location pointers, a hotspot monitor, and an object store.]

48 48 Fault-tolerant routing: detect, operate, and recover  Expected faults  Server outages (high load, hw/sw faults)  Link failures (router hw/sw faults)  Neighbor table corruption at the server  Detect: TCP timeout, heartbeats to nodes on backpointer list  Operate under faults: backup neighbors  Second chance for failed server: probing msgs

49 49 Fault-tolerant location: avoid single point of failure  Multiple roots for each object  rootId = f(objectId + salt)  Redundant, but reliable  Storage servers periodically republish location info  Cached location info on routers times out  New and recovered objects periodically advertise their location info
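A sketch of salted multi-root IDs, rootId = f(objectId + salt), assuming a hash for f and small integer salts; the parameters are illustrative.

```python
# Sketch of multiple roots per object: rootId = f(objectId + salt).
import hashlib

def root_ids(object_id: str, num_roots: int = 4, digits: int = 4, base: int = 8) -> list[str]:
    roots = []
    for salt in range(num_roots):
        n = int(hashlib.sha1(f"{object_id}:{salt}".encode()).hexdigest(), 16)
        d = []
        for _ in range(digits):
            d.append(str(n % base))
            n //= base
        roots.append("".join(reversed(d)))
    return roots

# Publish to (and query via) every root, so one failed root is not fatal.
print(root_ids("6534"))
```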

50 50 Dynamic node insertion  Populate new node’s neighbor maps  Routing messages to its own nodeId  Copy and optimize neighbor maps  Inform relevant nodes of its entries to update neighbor maps

51 51 Dynamic deletion  Inform relevant nodes using backptrs  Or rely on soft-state (stop heartbeats)  In general, Tapestry expects a small number of dynamic insertions/deletions

52 52 Fault handling outline  Design choice  Soft state for graceful fault recovery  Soft state  Caches: updated by periodic refresh msgs, or purged if no such msgs arrive  Faults are expected in Tapestry  Fault-tolerant routing  Fault-tolerant location  Surrogate routing

53 53 Tapestry Conclusions  Decentralized location and routing  Distributed algorithms for object-root mapping, node insertion/deletion  Fault-handling with redundancy  Per-node routing table size: b·log_b(N), where N = size of the namespace  Find an object in log_b(n) overlay hops, where n = # of physical nodes
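A rough back-of-the-envelope check of those two formulas; the base, namespace size, and node count below are illustrative assumptions.

```python
# Rough sizes implied by the formulas above (all numbers are assumptions).
import math

b = 16              # digit base
N = 2 ** 160        # size of the ID namespace (160-bit IDs)
n = 100_000         # number of physical nodes

table_entries = b * math.log(N, b)   # 16 * 40 = 640 routing-table entries per node
lookup_hops = math.log(n, b)         # about 4.2 overlay hops to find an object

print(round(table_entries), round(lookup_hops, 1))   # 640 4.2
```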

