Peer-to-Peer Networking  Credit: slides from J. Pang, B. Richardson, I. Stoica

2 Why Study P2P?  Huge fraction of traffic on networks today (>= 50%!)  Exciting new applications  Next level of resource sharing, vs. timesharing and client-server  E.g., access tens to hundreds of TB at low cost.

3 Share of Internet Traffic (figure)

4 Number of Users (figure)  Others include BitTorrent, eDonkey, iMesh, Overnet, Gnutella  BitTorrent (and others) is gaining share from FastTrack (KaZaA).

5 What is P2P used for?  Use resources of end-hosts to accomplish a shared task  Typically share files  Play games  Search for patterns in data

6 What’s new?  Taking advantage of resources at the edge of the network  Fundamental shift in computing capability  Increase in absolute bandwidth over WAN

7 Peer to Peer  Systems:  Napster  Gnutella  KaZaA  BitTorrent  Chord

8 Key issues for P2P systems  Join/leave  How do nodes join/leave? Who is allowed?  Search and retrieval  How to find content?  How are metadata indexes built, stored, distributed?  Content Distribution  Where is content stored? How is it downloaded and retrieved?

9 4 Key Primitives  Join – how to enter/leave the P2P system?  Publish – how to advertise a file?  Search – how to find a file?  Fetch – how to download a file?
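
All of the systems in these slides differ mainly in how they implement these four primitives. As a rough sketch of the shared interface (names are illustrative, not from any real client):

from abc import ABC, abstractmethod

class P2PNode(ABC):
    """The four primitives, shared by every system in these slides."""

    @abstractmethod
    def join(self, bootstrap_addr):
        """Enter (and later leave) the system via a known address."""

    @abstractmethod
    def publish(self, filename):
        """Advertise a file this node is willing to serve."""

    @abstractmethod
    def search(self, filename):
        """Return addresses of peers believed to have the file."""

    @abstractmethod
    def fetch(self, filename, peer_addr):
        """Download the file directly from a peer."""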

10 Publish and Search  Basic strategies:  Centralized (Napster)  Flood the query (Gnutella)  Route the query (Chord)  Different tradeoffs depending on application  Robustness, scalability, legal issues

11 Napster: History  In 1999, S. Fanning launches Napster  Peaked at 1.5 million simultaneous users  In July 2001, Napster shuts down

12 Napster: Overview  Centralized Database:  Join: on startup, client contacts central server  Publish: reports list of files to central server  Search: query the server => return someone that stores the requested file  Fetch: get the file directly from peer
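
A minimal sketch of the centralized-index idea (hypothetical names; real Napster spoke its own protocol to a server farm). The server holds the entire filename-to-peer index, which is why search scope is O(1) for clients but state and processing are O(N) at the server:

from collections import defaultdict

class NapsterStyleServer:
    """Toy centralized index: the server only stores who has what;
    the actual file transfer never touches the server."""

    def __init__(self):
        self.index = defaultdict(set)        # filename -> set of peer addresses

    def publish(self, peer_addr, filenames):
        for name in filenames:
            self.index[name].add(peer_addr)

    def search(self, filename):
        return list(self.index.get(filename, ()))

server = NapsterStyleServer()
server.publish("10.0.0.5:6699", ["X", "Y", "Z"])
print(server.search("X"))                    # -> ['10.0.0.5:6699']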

13 Napster: Publish (figure)  A peer announces "I have X, Y, and Z!" and publishes by sending insert(X, ) to the central server.

14 Napster: Search (figure)  A peer sends the query search(A) ("Where is file A?") to the central server; the reply names a peer holding A, and the fetch then goes directly to that peer.

15 Napster: Discussion  Pros:  Simple  Search scope is O(1)  Controllable (pro or con?)  Cons:  Server maintains O(N) state  Server does all processing  Single point of failure

16 Gnutella: History  In 2000, J. Frankel and T. Pepper from Nullsoft released Gnutella  Soon many other clients: Bearshare, Morpheus, LimeWire, etc.  In 2001, many protocol enhancements including “ultrapeers”

17 Gnutella: Overview  Query Flooding:  Join: on startup, client contacts a few other nodes; these become its "neighbors"  Publish: no need  Search: ask neighbors, who ask their neighbors, and so on... when/if found, reply to sender.  Fetch: get the file directly from peer
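
A toy sketch of query flooding. Real Gnutella floods asynchronous messages with a TTL and routes QueryHits back along the reverse path; this synchronous recursion only shows the TTL and duplicate-suppression logic, and all names are illustrative:

from dataclasses import dataclass, field

@dataclass
class Node:
    id: str
    files: set = field(default_factory=set)
    neighbors: list = field(default_factory=list)

def gnutella_search(node, filename, ttl=7, seen=None):
    """Flood the query to all neighbors, decrementing the TTL each hop."""
    seen = seen if seen is not None else set()
    if ttl == 0 or node.id in seen:
        return []
    seen.add(node.id)
    hits = [node.id] if filename in node.files else []
    for nbr in node.neighbors:
        hits += gnutella_search(nbr, filename, ttl - 1, seen)
    return hits

a, b, c = Node("A"), Node("B", files={"song.mp3"}), Node("C")
a.neighbors, b.neighbors, c.neighbors = [b, c], [a], [a, b]
print(gnutella_search(a, "song.mp3"))        # -> ['B']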

18 Gnutella: Search (figure)  The query "Where is file A?" floods from neighbor to neighbor; a node that has file A ("I have file A.") sends a reply back toward the requester.

19 Gnutella: Discussion  Pros:  Fully decentralized  Search cost distributed  Cons:  Search scope is O(N)  Search time is O(???)  Nodes leave often, so the network is unstable

20 Aside: Search Time? (figure)

21 Aside: All Peers Equal? (figure: peers join with very different access links: 56 kbps modems, 1.5 Mbps DSL, 10 Mbps LAN)

22 Aside: Network Resilience (figure, three panels: partial topology; random failures, 30% die; targeted failures, 4% die)  From Saroiu et al., MMCN 2002

23 KaZaA: History  In 2001, KaZaA created by Dutch company Kazaa BV  Single network called FastTrack used by other clients as well: Morpheus, giFT, etc.  Eventually protocol changed so other clients could no longer talk to it  Most popular file sharing network today with >10 million users (number varies)

24 KaZaA: Overview  “Smart” Query Flooding:  Join: on startup, client contacts a “supernode”... may at some point become one itself  Publish: send list of files to supernode  Search: send query to supernode, supernodes flood query amongst themselves.  Fetch: get the file directly from peer(s); can fetch simultaneously from multiple peers
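
A sketch of the two-tier idea (illustrative names; FastTrack's actual protocol was proprietary). Leaves publish only to their supernode, and the flood is confined to the much smaller supernode tier:

class SuperNode:
    def __init__(self):
        self.index = {}                      # filename -> leaf peer address
        self.peers = []                      # neighboring supernodes

    def publish(self, leaf_addr, filenames):
        for name in filenames:
            self.index[name] = leaf_addr

    def search(self, filename, ttl=3, seen=None):
        """Flood among supernodes only; leaf peers never see queries."""
        seen = seen if seen is not None else set()
        if ttl == 0 or id(self) in seen:
            return []
        seen.add(id(self))
        hits = [self.index[filename]] if filename in self.index else []
        for sn in self.peers:
            hits += sn.search(filename, ttl - 1, seen)
        return hits

sn1, sn2 = SuperNode(), SuperNode()
sn1.peers, sn2.peers = [sn2], [sn1]
sn2.publish("10.0.0.9:1214", ["A"])
print(sn1.search("A"))                       # -> ['10.0.0.9:1214']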

25 KaZaA: Network Design (figure: two-tier overlay with ordinary peers attached to "Super Nodes")

26 KaZaA: File Insert (figure)  A peer announces "I have X!" and publishes by sending insert(X, ) to its supernode.

27 KaZaA: File Search (figure)  The query search(A) ("Where is file A?") goes to the peer's supernode and is forwarded supernode-to-supernode; replies flow back to the requester.

28 KaZaA: Discussion  Pros:  Tries to take into account node heterogeneity: bandwidth, host computational resources, host availability (?)  Rumored to take into account network locality  Cons:  Mechanisms easy to circumvent  Still no real guarantees on search scope or search time

29 P2P systems  Napster  Launched P2P  Centralized index  Gnutella  Focus is simple sharing  Uses simple flooding  KaZaA  More intelligent query routing  BitTorrent  Focus on download speed and fairness in sharing

30 BitTorrent: History  In 2002, B. Cohen debuted BitTorrent  Key Motivation:  Popularity exhibits temporal locality (Flash Crowds)  E.g., Slashdot effect, CNN on 9/11, new movie/game release  Focused on Efficient Fetching, not Searching:  Distribute the same file to all peers  Single publisher, multiple downloaders  Has some "real" publishers:  Blizzard Entertainment uses it to distribute the beta of its new game

31 BitTorrent: Overview  Swarming:  Join: contact centralized “tracker” server, get a list of peers.  Publish: Run a tracker server.  Search: Out-of-band. E.g., use Google to find a tracker for the file you want.  Fetch: Download chunks of the file from your peers. Upload chunks you have to them.
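
Once a peer knows the swarm, it must decide which chunk to request next. Rarest-first is the strategy most real clients use; this sketch assumes each peer advertises a bitfield of the piece indices it holds:

from collections import Counter

def rarest_first(my_pieces, peer_bitfields):
    """Pick the piece we still need that the fewest peers hold,
    so rare pieces spread before their holders leave the swarm."""
    counts = Counter()
    for bitfield in peer_bitfields:
        counts.update(bitfield)
    wanted = [p for p in counts if p not in my_pieces]
    return min(wanted, key=lambda p: counts[p]) if wanted else None

# Piece 1 is held by two peers, piece 2 by three: request piece 1 first.
print(rarest_first({0}, [{0, 1, 2}, {1, 2}, {2}]))   # -> 1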

32 (figure)

33 BitTorrent: Sharing Strategy  Employ "tit-for-tat" sharing strategy  "I'll share with you if you share with me"  Be optimistic: occasionally let freeloaders download  Otherwise no one would ever start!  Also allows you to discover better peers to download from when they reciprocate
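
A sketch of the choking decision (parameter names are illustrative; real clients re-run this every few seconds and rotate the optimistic slot less often):

import random

def choose_unchoked(peers, upload_rate_to_me, k=4):
    """Tit-for-tat: serve the k peers that upload to us fastest,
    plus one random 'optimistic unchoke' so newcomers get a chance
    and we can discover better partners when they reciprocate."""
    by_rate = sorted(peers, key=lambda p: upload_rate_to_me[p], reverse=True)
    unchoked = set(by_rate[:k])
    others = [p for p in peers if p not in unchoked]
    if others:
        unchoked.add(random.choice(others))  # the optimistic slot
    return unchoked

rates = {"p1": 50, "p2": 40, "p3": 30, "p4": 20, "p5": 0}
print(choose_unchoked(list(rates), rates, k=2))   # {'p1', 'p2'} plus one random other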

34 BitTorrent: Summary  Pros:  Works reasonably well in practice  Gives peers incentive to share resources; avoids freeloaders  Cons:  Central tracker server needed to bootstrap swarm (is this really necessary?)

35 DHT: History  In 2000–2001, academic researchers said "we want to play too!"  Motivation:  Frustrated by popularity of all these "half-baked" P2P apps :)  We can do better! (so we said)  Guaranteed lookup success for files in the system  Provable bounds on search time  Provable scalability to millions of nodes  Hot topic in networking ever since

36 DHT: Overview  Abstraction: a distributed “hash-table” (DHT) data structure:  put(id, item);  item = get(id);  Implementation: nodes in system form a distributed data structure  Can be Ring, Tree, Hypercube, Skip List, Butterfly Network,...
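
A single-process toy that shows the put/get abstraction. Assumed design: node ids and keys hash onto one ring, and the first node clockwise from a key's id owns it; a real DHT distributes both the routing and the storage:

import hashlib
from bisect import bisect_left

def hash_id(key, bits=32):
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % (2 ** bits)

class ToyDHT:
    def __init__(self, node_names):
        self.ring = sorted(hash_id(n) for n in node_names)  # node ids on the ring
        self.store = {nid: {} for nid in self.ring}         # each node's local table

    def _owner(self, key_id):
        """First node clockwise from the key's id owns the key."""
        return self.ring[bisect_left(self.ring, key_id) % len(self.ring)]

    def put(self, key, item):
        self.store[self._owner(hash_id(key))][key] = item

    def get(self, key):
        return self.store[self._owner(hash_id(key))].get(key)

dht = ToyDHT(["nodeA", "nodeB", "nodeC"])
dht.put("song.mp3", "10.0.0.7:8000")
print(dht.get("song.mp3"))                   # -> '10.0.0.7:8000'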

37 DHT: Overview (2)  Structured Overlay Routing:  Join: On startup, contact a “bootstrap” node and integrate yourself into the distributed data structure; get a node id  Publish: Route publication for file id toward a close node id along the data structure  Search: Route a query for file id toward a close node id. Data structure guarantees that query will meet the publication.  Fetch: P2P fetching

Tapestry: a DHT-based P2P system focusing on fault-tolerance Danfeng (Daphne) Yao

39 A very big picture – P2P networking  Observe  Dynamic nature of the computing environment  Network Expansion  Require  Robustness, or fault tolerance  Scalability  Self-administration, or self-organization

40 Goals to achieve in Tapestry  Decentralization  Routing uses local data  Efficiency, scalability  Robust routing mechanism  Resilient to node failures  An easy way to locate objects

41 IDs in Tapestry  NodeID  ObjectID  IDs are computed using a hash function  The hash function is a system-wide parameter  Use ID for routing  Locating an object  Finding a node  A node stores its neighbors’ IDs

42 Tapestry routing: 0325 → 4598 (figure)  Suffix-based routing: each hop matches one more trailing digit of the destination id (…8, then …98, then …598, then 4598)
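
A sketch of the per-hop decision in suffix routing. Tapestry organizes its neighbors into a level table, but the invariant is the same, so this simplified greedy rule captures it:

def common_suffix_len(a, b):
    """Length of the shared trailing-digit suffix of two equal-length ids."""
    n = 0
    while n < len(a) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def suffix_next_hop(current, dest, neighbors):
    """Forward to any neighbor that matches at least one more trailing
    digit of dest than we do; e.g. 0325 -> ...8 -> ..98 -> .598 -> 4598."""
    k = common_suffix_len(current, dest)
    for nbr in neighbors:
        if common_suffix_len(nbr, dest) > k:
            return nbr
    return None     # no closer neighbor: current node is the (surrogate) root

print(suffix_next_hop("0325", "4598", ["1118", "2264", "7031"]))   # -> '1118'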

43 Each object is assigned a root node (figure)  Root node: a unique node for object O with the longest matching suffix (rootId ≈ objectId)  The root knows where O is  Publish: a node S holding a copy of O routes a message to O's root; if multiple copies exist, the closest location is stored  Location query: a message for O is routed towards O's root  How to choose the root node for an object?

44 Surrogate routing: unique mapping without global knowledge  Problem: how to choose a unique root node for an object deterministically  Surrogate routing:  Choose as the object's root (surrogate) the node whose nodeId is closest to the objectId  Small overhead, but no need for global knowledge  The number of additional hops is small
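
The effect of surrogate routing can be stated in a few lines. Here "closest" is plain numeric distance for illustration; Tapestry actually resolves the choice digit by digit while routing, which is what avoids needing a global view:

def surrogate_root(object_id, live_node_ids):
    """Toy stand-in: given the same view of live nodes, every caller
    picks the same root, so the object-root mapping is deterministic.
    Tapestry gets the same determinism hop by hop, without any node
    ever seeing the full membership."""
    return min(live_node_ids, key=lambda nid: abs(nid - object_id))

print(surrogate_root(0x4598, [0x0325, 0x4432, 0x459A, 0x7F00]))   # -> 17818 (0x459A)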

45 Surrogate example (figure): find the root node for an objectId when no node matches it exactly; empty next-hop entries are skipped deterministically until the surrogate root is reached

46 Tapestry: adaptable, fault-resilient, self-management  Basic routing and location  Backpointer list in neighbor map  Multiple mappings are stored  More semantic flexibility: selection operator for choosing returned objects

47 A single Tapestry node (figure)  Neighbor map for node 1732 (octal), one routing level per matched suffix length: level-1 entries xxx0 … xxx7, level-2 entries xx02 … xx72, level-3 entries x032 … x732, and so on (x = wildcard digit)  The node also holds object location pointers, a hotspot monitor, and an object store

48 Fault-tolerant routing: detect, operate, and recover  Expected faults  Server outages (high load, hw/sw faults)  Link failures (router hw/sw faults)  Neighbor table corruption at the server  Detect: TCP timeout, heartbeats to nodes on backpointer list  Operate under faults: backup neighbors  Second chance for failed server: probing msgs
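
A sketch of the "operate under faults" step (illustrative; send() here is an assumed transport call that raises TimeoutError on failure):

def route_via(primary, backups, send):
    """Try the primary neighbor for this routing entry first; on timeout,
    fail over to the backup neighbors.  A failed primary is not evicted
    immediately: it gets a 'second chance' via later probing messages."""
    for hop in [primary] + backups:
        try:
            return send(hop)
        except TimeoutError:
            continue                         # this neighbor is down; try the next
    raise RuntimeError("all neighbors for this routing entry have failed")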

49 Fault-tolerant location: avoid single point of failure  Multiple roots for each object  rootId = f(objectId + salt)  Redundant, but reliable  Storage servers periodically republish location info  Cached location info on routers times out  New and recovered objects periodically advertise their location info
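
Computing the multiple roots is just the slide's rootId = f(objectId + salt) with a handful of fixed salts (a sketch; the hash width and salt count here are made up):

import hashlib

def root_ids(object_id, n_roots=5, bits=16):
    """Each salt yields an independent root id, so the object stays
    locatable even if some of its roots fail."""
    return [
        int(hashlib.sha1(f"{object_id}+{salt}".encode()).hexdigest(), 16) % (2 ** bits)
        for salt in range(n_roots)
    ]

print(root_ids("4598"))      # five deterministic, well-spread root ids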

50 Dynamic node insertion  Populate the new node's neighbor maps  Route messages to its own nodeId  Copy and optimize neighbor maps from nodes along the path  Inform relevant nodes of its entries so they update their neighbor maps

51 Dynamic deletion  Inform relevant nodes using backpointers  Or rely on soft state (stop sending heartbeats)  In general, Tapestry expects a small number of dynamic insertions/deletions

52 Fault handling outline  Design choice:  Soft state for graceful fault recovery  Soft state:  Caches are updated by periodic refresh messages, or purged if no such messages arrive  Faults are expected in Tapestry:  Fault-tolerant routing  Fault-tolerant location  Surrogate routing

53 Tapestry Conclusions  Decentralized location and routing  Distributed algorithms for object-root mapping, node insertion/deletion  Fault handling with redundancy  Per-node routing table size: b · log_b(N), where N = size of the id namespace  Find an object in log_b(n) overlay hops, where n = number of physical nodes