
winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella –KaZaA –BitTorrent Distributed Hash Tables (DHT) and Structured Networks –Chord –Pros and Cons Readings: do the required readings; do the optional readings if interested

winter 2008 P2P2 Peer-to-Peer Networks: How Did It Start? A killer application: Napster –Free music over the Internet Key idea: share the content, storage and bandwidth of individual (home) users

winter 2008 P2P3 Model Each user stores a subset of files Each user can access (download) files from all other users in the system

winter 2008 P2P4 Main Challenge Find where a particular file is stored (e.g., among machines holding files A–F, which machine has file E?)

winter 2008 P2P5 Other Challenges Scale: up to hundreds of thousands or millions of machines Dynamicity: machines can come and go at any time

winter 2008 P2P6 Peer-to-Peer Networks: Napster Napster history: the rise –January 1999: Napster version 1.0 –May 1999: company founded –September 1999: first lawsuits –2000: 80 million users Napster history: the fall –Mid 2001: out of business due to lawsuits –Mid 2001: dozens of P2P alternatives that were harder to touch, though these have gradually been constrained –2003: growth of pay services like iTunes Napster history: the resurrection –2003: Napster reconstituted as a pay service –2006: still lots of file sharing going on Shawn Fanning, Northeastern freshman

winter 2008 P2P7 Napster Technology: Directory Service User installs the software –Downloads the client program –Registers name, password, local directory, etc. Client contacts Napster (via TCP) –Provides a list of music files it will share –… and Napster’s central server updates the directory Client searches on a title or performer –Napster identifies online clients with the file –… and provides IP addresses Client requests the file from the chosen supplier –Supplier transmits the file to the client –Both client and supplier report status to Napster

winter 2008 P2P8 Napster Assume a centralized index system that maps files (songs) to machines that are alive How to find a file (song): –Query the index system, which returns a machine that stores the required file (ideally the closest/least-loaded machine) –FTP the file from that machine Advantages: –Simplicity; easy to implement sophisticated search engines on top of the index system Disadvantages: –Robustness, scalability (?)
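
The centralized index can be sketched in a few lines of Python. This is a toy illustration, not Napster's actual protocol; the class and method names (`CentralIndex`, `register`, `query`) are hypothetical.

```python
import random

class CentralIndex:
    """Toy Napster-style central directory: song title -> live peers."""
    def __init__(self):
        self.index = {}                      # title -> set of peer addresses

    def register(self, peer, titles):
        """Peer comes online and lists the files it will share."""
        for t in titles:
            self.index.setdefault(t, set()).add(peer)

    def unregister(self, peer):
        """Peer goes offline; remove it from every entry."""
        for peers in self.index.values():
            peers.discard(peer)

    def query(self, title):
        """Return some machine storing the file (ideally closest/least loaded;
        here we just pick one at random)."""
        peers = self.index.get(title)
        return random.choice(sorted(peers)) if peers else None
```

The file transfer itself then happens directly between the two peers; only the lookup goes through the central server.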

winter 2008 P2P9 Napster: Example Machines m1–m6 register their files A–F with the central index (m1: A, m2: B, m3: C, m4: D, m5: E, m6: F). A query for E returns m5, and the file is fetched directly from m5.

winter 2008 P2P10 Napster Technology: Properties Server’s directory continually updated –Always know what music is currently available –Point of vulnerability for legal action Peer-to-peer file transfer –No load on the server –Plausible deniability for legal action (but not enough) Proprietary protocol –Login, search, upload, download, and status operations –No security: cleartext passwords and other vulnerabilities Bandwidth issues –Suppliers ranked by apparent bandwidth & response time

winter 2008 P2P11 Napster: Limitations of Central Directory Single point of failure Performance bottleneck Copyright infringement File transfer is decentralized, but locating content is highly centralized; so, later P2P systems were more distributed

winter 2008 P2P12 Peer-to-Peer Networks: Gnutella Gnutella history –2000: J. Frankel & T. Pepper released Gnutella –Soon after: many other clients (e.g., Morpheus, Limewire, Bearshare) –2001: protocol enhancements, e.g., “ultrapeers” Query flooding –Join: contact a few nodes to become neighbors –Publish: no need! –Search: ask neighbors, who ask their neighbors –Fetch: get file directly from another node

winter 2008 P2P13 Gnutella Distributed file location Idea: flood the request How to find a file: –Send request to all neighbors –Neighbors recursively multicast the request –Eventually a machine that has the file receives the request, and it sends back the answer Advantages: –Totally decentralized, highly robust Disadvantages: –Not scalable; the entire network can be swamped with requests (to alleviate this problem, each request has a TTL)
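
The flooding search can be sketched as a breadth-first flood with a TTL. This is a toy simulation (the graph and file placement are made-up examples), but it shows both why flooding finds files and why the message count grows with the neighborhood size.

```python
from collections import deque

def flood_query(graph, files, start, want, ttl=4):
    """Flood a query outward from `start` for at most `ttl` hops.
    Returns (node holding the file or None, number of messages sent)."""
    msgs = 0
    seen = {start}
    frontier = deque([(start, ttl)])
    while frontier:
        node, t = frontier.popleft()
        if want in files.get(node, ()):   # this machine has the file
            return node, msgs
        if t == 0:                        # TTL expired: stop forwarding
            continue
        for nbr in graph[node]:           # forward to all neighbors
            if nbr not in seen:
                seen.add(nbr)
                msgs += 1
                frontier.append((nbr, t - 1))
    return None, msgs
```

A file two hops away is found with ttl=2 but missed with ttl=1, which is exactly the recall trade-off the slides mention.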

winter 2008 P2P14 Gnutella Ad-hoc topology Queries (e.g., for “xyz”) are flooded for a bounded number of hops No guarantees on recall

winter 2008 P2P15 Gnutella: Query Flooding Fully distributed –No central server Public domain protocol Many Gnutella clients implement the protocol Overlay network: a graph Edge between peers X and Y if there is a TCP connection between them All active peers and edges form the overlay network A given peer will typically be connected to fewer than 10 overlay neighbors

winter 2008 P2P16 Gnutella: Protocol Query messages are sent over existing TCP connections Peers forward Query messages QueryHit messages are sent back over the reverse path File transfer: HTTP Scalability: limited-scope flooding

winter 2008 P2P17 Gnutella: Peer Joining Joining peer X must find some other peer in the Gnutella network: it uses a list of candidate peers X sequentially attempts TCP connections to peers on the list until a connection is set up with some peer Y X sends a Ping message to Y; Y forwards the Ping message. All peers receiving the Ping message respond with a Pong message X receives many Pong messages. It can then set up additional TCP connections

winter 2008 P2P18 Gnutella: Pros and Cons Advantages –Fully decentralized –Search cost distributed –Processing per node permits powerful search semantics Disadvantages –Search scope may be quite large –Search time may be quite long –High overhead and nodes come and go often

winter 2008 P2P19 Peer-to-Peer Networks: KaZaA KaZaA history –2001: created by a Dutch company (Kazaa BV) –Single network called FastTrack, used by other clients as well –Eventually the protocol changed so other clients could no longer talk to it Smart query flooding –Join: on start, the client contacts a super-node (and may later become one) –Publish: client sends a list of its files to its super-node –Search: send query to the super-node, and the super-nodes flood queries among themselves –Fetch: get the file directly from peer(s); can fetch from multiple peers at once

winter 2008 P2P20 KaZaA: Exploiting Heterogeneity Each peer is either a group leader or assigned to a group leader –TCP connection between peer and its group leader –TCP connections between some pairs of group leaders Group leader tracks the content in all its children

winter 2008 P2P21 KaZaA: Motivation for Super-Nodes Query consolidation –Many connected nodes may have only a few files –Propagating query to a sub-node may take more time than for the super-node to answer itself Stability –Super-node selection favors nodes with high up-time –How long you’ve been on is a good predictor of how long you’ll be around in the future

winter 2008 P2P22 Peer-to-Peer Networks: BitTorrent BitTorrent history and motivation –2002: B. Cohen debuted BitTorrent –Key motivation: popular content Popularity exhibits temporal locality (Flash Crowds) E.g., Slashdot effect, CNN Web site on 9/11, release of a new movie or game –Focused on efficient fetching, not searching Distribute same file to many peers Single publisher, many downloaders –Preventing free-loading

winter 2008 P2P23 BitTorrent: Simultaneous Downloading Divide large file into many pieces –Replicate different pieces on different peers –A peer with a complete piece can trade with other peers –Peer can (hopefully) assemble the entire file Allows simultaneous downloading –Retrieving different parts of the file from different peers at the same time
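
The piece-and-verify scheme can be sketched as follows. This is a toy illustration with a tiny piece size and hypothetical helper names; real clients use pieces of hundreds of KiB and read the digests from the bencoded .torrent file.

```python
import hashlib

PIECE_LEN = 4      # toy size for illustration; real torrents use e.g. 256 KiB

def split_pieces(data, piece_len=PIECE_LEN):
    """Divide a large file into fixed-size pieces."""
    return [data[i:i + piece_len] for i in range(0, len(data), piece_len)]

def piece_hashes(pieces):
    """The summaries a .torrent file stores: one SHA-1 digest per piece."""
    return [hashlib.sha1(p).digest() for p in pieces]

def verify(piece, expected):
    """A peer checks each downloaded piece against the published digest,
    so pieces fetched from different (untrusted) peers can be trusted."""
    return hashlib.sha1(piece).digest() == expected
```

Because each piece is verified independently, a peer can safely download different pieces from different peers in parallel and reassemble the file.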

winter 2008 P2P24 BitTorrent Components Seed –Peer with the entire file, fragmented into pieces Leecher –Peer with an incomplete copy of the file Torrent file –Passive component –Stores summaries of the pieces so peers can verify their integrity Tracker –Allows peers to find each other –Returns a list of random peers

winter 2008 P2P25–P2P31 BitTorrent: Overall Architecture A new downloader (“US”) joins a swarm consisting of a web server, a tracker, a seed, and leecher peers A, B, C: –Fetch the .torrent file from a web page linking to it –Send a get-announce request to the tracker –Tracker returns a response-peer list –Shake hands with peers from the list –Exchange pieces with the seed and the other leechers

winter 2008 P2P32 Free-Riding Problem in P2P Networks Vast majority of users are free-riders –Most share no files and answer no queries –Others limit the number of connections or the upload speed A few “peers” essentially act as servers –A few individuals contributing to the public good –Making them hubs that basically act as servers BitTorrent prevents free-riding –Allows the fastest peers to download from you –Occasionally lets some free-loaders download
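
The "fastest peers plus an occasional free-loader" policy can be sketched as a simple choking decision. This is a hypothetical simplification of BitTorrent's tit-for-tat: real clients re-evaluate every few seconds and rotate the optimistic unchoke.

```python
def choose_unchoked(upload_rates, k=3, optimistic=None):
    """Tit-for-tat choking: serve the k peers currently uploading to us
    fastest, plus one optimistically unchoked peer so that newcomers
    (who cannot yet reciprocate) can get their first pieces."""
    ranked = sorted(upload_rates, key=upload_rates.get, reverse=True)
    unchoked = set(ranked[:k])           # reciprocate with the fastest peers
    if optimistic is not None:
        unchoked.add(optimistic)         # occasional free ride for one peer
    return unchoked
```

A peer that uploads nothing is only served when it happens to be the optimistic unchoke, which is what makes sustained free-riding unattractive.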

winter 2008 P2P33 Distributed Hash Tables (DHTs) Abstraction: a distributed hash-table data structure –insert(id, item); –item = query(id); (or lookup(id);) –Note: item can be anything: a data object, document, file, pointer to a file… Proposals –CAN, Chord, Kademlia, Pastry, Tapestry, etc

winter 2008 P2P34 DHT Design Goals Make sure that an identified item (file) is always found Scales to hundreds of thousands of nodes Handles rapid arrival and failure of nodes

winter 2008 P2P35 Distributed Hash Tables (DHTs) Hash table interface: put(key, item), get(key) O(log n) hops Guarantees on recall Structured networks: put(K1, I1) stores item I1 under key K1 at the responsible node; a later get(K1) retrieves I1

winter 2008 P2P36 Chord Associate with each node and item a unique id in a one-dimensional space 0..2^m - 1 Key design decision –Decouple correctness from efficiency Properties –Routing table size O(log(N)), where N is the total number of nodes –Guarantees that a file is found in O(log(N)) steps

winter 2008 P2P37 Identifier to Node Mapping Example Node 8 maps keys in [5,8] Node 15 maps keys in [9,15] Node 20 maps keys in [16,20] … Node 4 maps keys in [59,4] Each node maintains a pointer to its successor

winter 2008 P2P38 Lookup Each node maintains its successor Route packet (ID, data) to the node responsible for ID using successor pointers Example: lookup(37) resolves to node 44
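
Successor-pointer routing can be sketched as follows, using the ring from the identifier-mapping example. This is correct but takes O(N) hops in the worst case, which is exactly what finger tables later fix; the helper names are illustrative.

```python
def between(x, a, b):
    """True if x lies in the ring interval (a, b], handling wraparound."""
    return a < x <= b if a < b else x > a or x <= b

def lookup(succ, start, key):
    """Forward along successor pointers until `key` falls in
    (node, successor(node)]; that successor is responsible for the key.
    Returns (responsible node, number of hops taken)."""
    node, hops = start, 0
    while not between(key, node, succ[node]):
        node, hops = succ[node], hops + 1
    return succ[node], hops

# Successor pointers for the ring 4 -> 8 -> 15 -> 20 -> 35 -> 44 -> 50 -> 58 -> 4
ring = {4: 8, 8: 15, 15: 20, 20: 35, 35: 44, 44: 50, 50: 58, 58: 4}
```

Starting at node 8, lookup(37) walks 15, 20, 35 and resolves to node 44, as in the slide's example.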

winter 2008 P2P39 Joining Operation Each node A periodically sends a stabilize() message to its successor B Upon receiving a stabilize() message, node B –returns its predecessor B’=pred(B) to A by sending a notify(B’) message Upon receiving notify(B’) from B, –if B’ is between A and B, A updates its successor to B’ –otherwise, A does nothing

winter 2008 P2P40 Joining Operation Node with id=50 joins the ring Node 50 needs to know at least one node already in the system –Assume the known node is node 15 Initially node 50 has succ=nil and pred=nil

winter 2008 P2P41 Joining Operation Node 50: sends join(50) to node 15 Node 44: returns node 58 Node 50 updates its successor to 58

winter 2008 P2P42 Joining Operation Node 50: sends stabilize() to node 58 Node 58: –updates its predecessor to 50 –sends notify(pred=50) back

winter 2008 P2P43 Joining Operation (cont’d) Node 44 sends a stabilize message to its successor, node 58 Node 58 replies with a notify(pred=50) message Node 44 updates its successor to 50

winter 2008 P2P44 Joining Operation (cont’d) Node 44 sends a stabilize message to its new successor, node 50 Node 50 sets its predecessor to node 44

winter 2008 P2P45 Joining Operation (cont’d) This completes the joining operation! Node 50 now has succ=58 and pred=44; node 44 has succ=50; node 58 has pred=50
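
The stabilize()/notify() walkthrough above can be replayed in code. This is a toy simulation that assumes nodes can read each other's pointers directly (no messages or failures); the class and function names are illustrative.

```python
class Node:
    def __init__(self, nid):
        self.id, self.succ, self.pred = nid, None, None

def between_open(x, a, b):
    """True if x lies strictly between a and b on the ring."""
    return a < x < b if a < b else x > a or x < b

def stabilize(node):
    """One round of the periodic stabilize()/notify() exchange."""
    cand = node.succ.pred                 # ask our successor for its predecessor
    if cand is not None and between_open(cand.id, node.id, node.succ.id):
        node.succ = cand                  # someone joined between us: adopt it
    s = node.succ                         # notify(): tell successor about us
    if s.pred is None or between_open(node.id, s.pred.id, s.id):
        s.pred = node

# Re-create the walkthrough: node 50 joins between nodes 44 and 58
n44, n50, n58 = Node(44), Node(50), Node(58)
n44.succ, n58.pred = n58, n44             # existing ring segment 44 -> 58
n50.succ = n58                            # join(50) returned successor 58
stabilize(n50)                            # node 58 updates its pred to 50
stabilize(n44)                            # node 44 adopts 50; node 50 learns pred=44
```

After the two stabilize rounds the pointers match the final slide: 44 -> 50 -> 58, with predecessors set accordingly.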

winter 2008 P2P46 Achieving Efficiency: Finger Tables Say m=7 The i-th finger-table entry ft[i] at the peer with id n is the first peer whose id >= (n + 2^i) mod 2^m
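
Finger-table construction under that rule can be sketched as follows, using the node ids from the earlier ring example. Indexing the fingers from i=0 is an assumption about the slide's convention; Chord is often presented with i starting at 1 and offsets 2^(i-1).

```python
M = 7
RING = 2 ** M

def build_fingers(node_ids, n):
    """ft[i] = first live node whose id >= (n + 2**i) mod 2**M, i = 0..M-1.
    Assumes a global view of the node set, for illustration only."""
    nodes = sorted(node_ids)

    def succ(k):                          # successor with wraparound
        for nid in nodes:
            if nid >= k:
                return nid
        return nodes[0]

    return [succ((n + 2 ** i) % RING) for i in range(M)]
```

Because the finger offsets double, each entry at least halves the remaining distance to any key, which is where the O(log N) lookup bound comes from.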

winter 2008 P2P47 Achieving Robustness To improve robustness, each node maintains its k (k > 1) immediate successors instead of only one successor In the notify() message, node A can send its k-1 successors to its predecessor B Upon receiving the notify() message, B can update its successor list to A followed by the k-1 successors received from A

winter 2008 P2P48 Chord Optimizations Reduce latency –Choose the finger that reduces the expected time to reach the destination –Choose the closest node from the range [N+2^(i-1), N+2^i) as successor Accommodate heterogeneous systems –Multiple virtual nodes per physical node

winter 2008 P2P49 DHT Conclusions Distributed Hash Tables are a key component of scalable and robust overlay networks Chord: O(log n) state, O(log n) lookup distance; can achieve stretch < 2 Simplicity is key Services built on top of distributed hash tables –persistent storage (OpenDHT, OceanStore) –p2p file storage, i3 (Chord) –multicast (CAN, Tapestry)