Computer Networks
Lecture 11: Peer to Peer
Prof. Younghee Lee

Peer-to-Peer?
- Centralized server
- Distributed server
  – Client-server paradigm
    » Flat: RPC
    » Hierarchical: DNS, mount
  – Peer-to-peer paradigm
    » Each node is both a web client and a transient web server: the easy part
- How does a peer determine which peers have the desired content?
  – Finding connected peers that have copies of the desired object: the difficult part
    » A dynamic member list makes it more difficult
  – Pure: Gnutella, Chord
  – Hybrid: Napster, Groove
- Other challenges
  – Scalability: up to hundreds of thousands or millions of machines
  – Dynamicity: machines can come and go at any time

Peer-to-Peer?
- Networks are hard to change, especially the Internet!
- An overlay network lives at the end nodes, so it is easy to change
- What if overlay end nodes act as network nodes?
  – Overlay multicast
  – VoIP
  – File sharing
- The Internet itself was an overlay on the telephone network
- Future Internet
  – "Naming": the key design issue today
  – Querying and data independence: the key to tomorrow?
    » Decouple the application-level API from data organization

Peer-to-Peer?
- Share the resources of individual peers
  – CPU, disk, bandwidth, information, …
- Communication and collaboration
  – Magi, Groove, Skype
- File sharing
  – Napster, Gnutella, Kazaa, Freenet, Overnet
- P2P applications built over emerging overlays
  – PlanetLab

Peer-to-Peer?
- Distributed computing
  – SETI@home: a scientific experiment that harnesses the power of hundreds of thousands of Internet-connected computers in the Search for Extra-Terrestrial Intelligence
    » The server assigns work units: computers send machine information to the server, the server assigns a task, and computers send back results
  – Folding@home: a distributed computing project that studies protein folding, misfolding, aggregation, and related diseases

Overlay Networks
- Virtual edge: a TCP connection or simply a pointer to an IP address
- Overlay maintenance: pings? messaging? new edges when a neighbor goes down? …
- Layering (figure): P2P applications / DNS, CDN, ALM on top of P2P/overlay middleware on top of TCP/IP

P2P file sharing
- Napster
  – Centralized, sophisticated search
  – Client-server search
  – Point-to-point file transfer
- Gnutella
  – Open source; flooding, TTL, unreachable nodes
- FastTrack (KaZaA)
  – Heterogeneous peers
- Freenet
  – Anonymity, caching, replication

Centralized directory: Napster
- Napster: the first commercial P2P company, for MP3 distribution
- Large-scale server (server farm)
- How to find a file:
  – On startup, a client contacts the central server and reports its list of files
  – Query the index system → it returns a machine that stores the required file
    » Ideally this is the closest/least-loaded machine
  – Fetch the file directly from that peer (ftp-style transfer)
- Centralized index
  – Lawsuits
  – Denial of service
- Copyright issues
  – Direct infringement: download/upload
  – Indirect infringement: an individual is accountable for the actions of others (contributory)

Centralized Lookup (Napster)
- Figure: peers inform and update the central directory (SetLoc("title", N4) when N4 stores Key="title", Value=MP3 data…); a client queries for content with Lookup("title") and then transfers the file directly from the peer returned
- Simple, but O(N) state, a single point of failure, a performance bottleneck, and copyright infringement
- To keep its database current, the directory server must determine when a peer becomes disconnected:
  – send messages periodically to the peers, or
  – keep a permanent TCP connection with each connected peer
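
A minimal sketch of the centralized-directory idea described above, assuming illustrative names (DirectoryServer, set_loc, lookup) rather than Napster's actual protocol: the server only keeps the index, and the file transfer happens directly between peers.

```python
# Illustrative sketch of a Napster-style central directory (not the real protocol).
class DirectoryServer:
    def __init__(self):
        self.index = {}                       # title -> set of peer addresses

    def set_loc(self, title, peer_addr):
        """A peer informs the server that it stores a copy of `title`."""
        self.index.setdefault(title, set()).add(peer_addr)

    def lookup(self, title):
        """Return the peers known to hold `title` (empty list if none)."""
        return sorted(self.index.get(title, ()))

    def peer_disconnected(self, peer_addr):
        """Drop all index entries for a peer the server detects as gone."""
        for peers in self.index.values():
            peers.discard(peer_addr)

# Usage: N4 registers a title, a client asks the directory, then fetches from N4.
server = DirectoryServer()
server.set_loc("title", "N4")
print(server.lookup("title"))                 # ['N4'] -> transfer the file from N4 directly
```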

Decentralized directory: Flooding (1)
- Gnutella
  – Distributed file location
  – Idea: flood the request
  – How to find a file:
    » Send the request to all neighbors
    » Neighbors recursively multicast the request
    » A machine that has the file receives the request and sends back the answer
    » Transfers are done with HTTP between peers
  – Advantages:
    » Totally decentralized, highly robust
  – Disadvantages:
    » Not scalable; the entire network can be swamped with requests (to alleviate this, each request carries a TTL, giving a limited-scope query)
    » Worst case O(N) messages per lookup
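
A toy sketch of the flooding search with a TTL, assuming an in-memory graph of nodes rather than real network messages (the dictionary layout and function name are illustrative).

```python
# Gnutella-style flooding with a TTL over an in-memory neighbor graph.
def flood_query(nodes, start, file_id, ttl, seen=None):
    """nodes: {name: {'neighbors': [...], 'files': set()}}. Returns the holders reached."""
    if seen is None:
        seen = set()
    if ttl == 0 or start in seen:             # scope limit reached or node already visited
        return []
    seen.add(start)
    hits = [start] if file_id in nodes[start]['files'] else []
    for nb in nodes[start]['neighbors']:      # recursively "multicast" to all neighbors
        hits += flood_query(nodes, nb, file_id, ttl - 1, seen)
    return hits

nodes = {
    'A': {'neighbors': ['B', 'C'], 'files': set()},
    'B': {'neighbors': ['A', 'D'], 'files': set()},
    'C': {'neighbors': ['A'],      'files': {'song.mp3'}},
    'D': {'neighbors': ['B'],      'files': {'song.mp3'}},
}
print(flood_query(nodes, 'A', 'song.mp3', ttl=2))   # ['C'] -- D is beyond the TTL
```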

Decentralized directory: Flooding (2)
- FastTrack (aka KaZaA)
  – Modifies the Gnutella protocol into a two-level hierarchy
    » A hybrid of Gnutella and Napster
  – Group leader: super node
    » Nodes that have better connections to the Internet
    » Act as temporary directory servers for other nodes in their group
    » Maintains a database mapping names of content to the IP addresses of its group members
    » Not a dedicated server, but an ordinary peer
  – Bootstrapping node
    » A peer that wants to join the network contacts this node
    » This node can designate the peer as a new bootstrapping node
  – Standard nodes: ordinary nodes
    » Connect to super nodes and report their list of files
    » Allows slower nodes to participate
  – Broadcast (Gnutella-style) search across group-leader peers; query flooding
  – Drawbacks
    » Fairly complex protocol to construct and maintain the overlay network
    » Group leaders have more responsibility; not truly decentralized
    » Still not purely serverless (the bootstrapping node is an always-up server)
- Figure: overlay peers and group-leader peers with their neighboring relationships in the overlay network
- KaZaA metadata: file name, file size, content hash, and file descriptors (used for keyword matches during queries)

Gossip protocols
- Epidemic algorithms
- Originally targeted at database replication
  – Rumor mongering
    » Propagate a newly received update to k random neighbors
- Extended to routing
  – Rumor mongering of queries instead of flooding
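
A small sketch of rumor mongering, assuming a static in-memory neighbor graph (the function and graph names are illustrative): a node that learns a new update pushes it to up to k randomly chosen neighbors, and nodes that have already heard it stop spreading.

```python
import random

def rumor_monger(neighbors, start, k=2):
    """Spread one update from `start`; neighbors: {node: [adjacent nodes]}.
    Returns the set of nodes that heard the rumor."""
    heard = {start}
    frontier = [start]
    while frontier:
        node = frontier.pop()
        # push the update to up to k randomly chosen neighbors
        for target in random.sample(neighbors[node], min(k, len(neighbors[node]))):
            if target not in heard:           # already-infected nodes stop the spread
                heard.add(target)
                frontier.append(target)
    return heard

graph = {'A': ['B', 'C'], 'B': ['A', 'C', 'D'], 'C': ['A', 'B'], 'D': ['B']}
print(rumor_monger(graph, 'A'))               # e.g. {'A', 'B', 'C', 'D'} on a lucky run
```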

Hierarchical Networks
- IP
  – Hierarchical routing
- DNS
  – Hierarchical name space
    » Clients + a hierarchy of servers
- Pros & cons of hierarchical data management
  – Works well for things aligned with the hierarchy
    » Physical locality
  – Inflexible: no data independence
- A Layered Naming Architecture for the Internet [Balak+ 04]
  – Three levels of name resolution, for mobility, multihoming, integrating middleboxes (NATs, firewalls), …
    » From user-level descriptors to service identifiers (SIDs)
    » From SIDs to endpoint identifiers (EIDs)
    » From EIDs to IP addresses: DNS
  – Flat names for SIDs and EIDs
  – Scalable resolution for a flat architecture? => DHT

Commercial products
- JXTA
  – Java/XML framework for P2P applications
  – Name resolution and routing are done with floods & superpeers
- MS WinXP P2P networking
  – An unstructured overlay, flooded publication and caching
  – "does not yet support distributed searches"
- Security support
  – Authentication via signatures (assumes a trusted authority)
  – Encryption of traffic
- Groove
  – Platform for a P2P "experience"; MS collaboration tools
    » Microsoft Office 2007
  – Client-server name resolution, backup services, etc.

Routed lookups: Freenet, Chord, CAN
- Figure: a publisher inserts Key="title", Value=MP3 data… at some node; a client's Lookup("title") is routed hop by hop through the overlay nodes (N1…N9) toward the node holding the key

Routing: Freenet
- Additional goals beyond file location:
  – Provide publisher anonymity and security
  – Resistance to attacks: a third party shouldn't be able to deny access to a particular file (data item, object), even if it compromises a large fraction of machines
- Architecture:
  – Each file is identified by a unique identifier
  – Each machine stores a set of files and maintains a "routing table" to route individual requests
- Files are stored according to their associated key (the unique identifier)
  – Core idea: try to cluster information about similar keys
- Messages
  – A random 64-bit ID is used for loop detection

Routing: Freenet Routing Tables
- Each node maintains a common stack
  – id: file identifier
  – next_hop: another node that stores the file id
  – file: the file identified by id being stored on the local node
- Forwarding of a query for file id
  – If file id is stored locally, then stop
    » Forward the data back to the upstream requestor
    » The requestor adds the file to its cache and adds an entry to its routing table
  – If not, search for the "closest" id in the stack and forward the message to the corresponding next_hop
  – If the data is not found, failure is reported back
    » The requestor then tries the next closest match in its routing table
- Routing table columns: id | next_hop | file
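
A sketch of the forwarding rule just described, assuming numeric ids and an illustrative in-memory layout for the per-node stack (the real Freenet key space and message format differ).

```python
def forward_query(node, file_id):
    """node: {'files': {id: data}, 'table': [(id, next_hop)]}.
    Returns ('found', data), ('forward', next_hop) or ('failure', None)."""
    if file_id in node['files']:              # stored locally: answer the requestor
        return ('found', node['files'][file_id])
    if not node['table']:
        return ('failure', None)
    # otherwise forward toward the entry whose id is "closest" to the requested id
    closest_id, next_hop = min(node['table'], key=lambda entry: abs(entry[0] - file_id))
    return ('forward', next_hop)

node = {'files': {9: 'f9'}, 'table': [(4, 'n1'), (12, 'n2'), (5, 'n3')]}
print(forward_query(node, 9))                 # ('found', 'f9')
print(forward_query(node, 10))                # ('forward', 'n2') -- 12 is the closest known id
```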

Query
- API: file = query(id)
- Notes:
  – Any node forwarding the reply may change the source of the reply (to itself or any other node)
    » Helps anonymity
  – Each query is associated with a TTL that is decremented each time the query message is forwarded; to obscure the distance to the originator:
    » The TTL can be initialized to a random value within some bounds
    » When TTL=1, the query is forwarded with a finite probability
  – Depth counter
    » The opposite of a TTL: incremented with each hop
    » Initialized to a small random value
  – Each node maintains state for all outstanding queries that have traversed it → helps to avoid cycles
  – When the file is returned, it is cached along the reverse path

Routing: Freenet Example
- Figure: nodes n1–n6, each with a small routing table of (id, next_hop, file) entries; a query(10) is forwarded hop by hop along the closest-key entries until a node holding the file answers
- Note: the figure doesn't show file caching on the reverse path

Insert
- API: insert(id, file)
- Two steps
  – Search for the file to be inserted
  – If not found, insert the file
- Searching:
  – Like a query, but nodes maintain state after a collision is detected and the reply is sent back to the originator
- Insertion
  – Follow the forward path; insert the file at all nodes along the path
  – A node probabilistically replaces the originator with itself, obscuring the true originator

Cache Management
- LRU (Least Recently Used) cache of files
  – Files are not guaranteed to live forever
    » Files "fade away" as fewer requests are made for them
- File contents can be encrypted with the original text name as the key (id)
  – Cache owners know neither the original name nor the contents → they cannot be held responsible
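
A minimal LRU cache sketch matching the behavior above (class and method names are illustrative): files that stop being requested are the first to fade away when space is needed.

```python
from collections import OrderedDict

class LRUFileCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.files = OrderedDict()            # id -> (possibly encrypted) contents

    def get(self, file_id):
        if file_id not in self.files:
            return None
        self.files.move_to_end(file_id)       # a request refreshes the entry
        return self.files[file_id]

    def put(self, file_id, contents):
        self.files[file_id] = contents
        self.files.move_to_end(file_id)
        if len(self.files) > self.capacity:   # least recently used file "fades away"
            self.files.popitem(last=False)

cache = LRUFileCache(capacity=2)
cache.put(3, 'f3'); cache.put(9, 'f9'); cache.get(3); cache.put(14, 'f14')
print(list(cache.files))                      # [3, 14] -- file 9 was evicted
```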

Freenet Summary
- Advantages
  – Provides publisher anonymity
  – Totally decentralized architecture → robust and scalable
  – Resistant against malicious file deletion
- Disadvantages
  – Does not always guarantee that a file is found, even if the file is in the network

Routing: Structured Approaches
- Goal: make sure that an identified item (file) is always found in a reasonable number of steps
- Abstraction: a distributed hash table (DHT) data structure
  – insert(id, item)
  – item = query(id)
  – Note: the item can be anything: a data object, document, file, pointer to a file, …
- Proposals
  – CAN (ICIR/Berkeley)
  – Chord (MIT/Berkeley)
  – Pastry (Rice)
  – Tapestry (Berkeley)

High-level idea: Indirection
- Indirection in space
  – Logical (content-based) IDs
    » Routing to those IDs
    » A "content-addressable" network
  – Tolerant of nodes joining and leaving the network
- Indirection in time
  – A scheme to temporally decouple send and receive
  – Soft state
    » The "publisher" requests a TTL on storage
- Distributed hash table
  – Directed search

Distributed Hash Table (DHT)
- Hash table
  – A data structure that maps "keys" to "values"
- DHT
  – A hash table, but spread across the Internet
- Interface
  – insert(key, value)
  – lookup(key)
- Every DHT node supports a single operation
  – Given a key as input, route messages toward the node holding that key
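
A single-process sketch of this interface, assuming keys are hashed into an m-bit identifier space (the constant and class names are illustrative; a real DHT would route the request to the responsible node instead of storing locally).

```python
import hashlib

M = 16                                        # bits in the identifier space (assumed)

def key_to_id(key):
    """Map an arbitrary string key to an id in [0, 2^M)."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % (2 ** M)

class DHTNode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.store = {}

    def insert(self, key, value):
        # a real DHT first routes toward the node responsible for key_to_id(key)
        self.store[key_to_id(key)] = value

    def lookup(self, key):
        return self.store.get(key_to_id(key))

node = DHTNode(node_id=42)
node.insert("song.mp3", "http://peer-n4/song.mp3")
print(node.lookup("song.mp3"))                # 'http://peer-n4/song.mp3'
```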

Distributed Hash Table (DHT)

DHT in action: put()
- Operation: take a key as input; route messages to the node holding the key
- Figure: insert(K1, V1) is routed through the overlay to the node responsible for K1, which stores the pair (K1, V1)

DHT in action: get()
- Operation: take a key as input; route messages to the node holding the key
- Figure: retrieve(K1) is routed to the node storing (K1, V1), which returns the value

Routing: Chord
- Associate with each node and item a unique id in a one-dimensional space
- Goals
  – Scales to hundreds of thousands of nodes
  – Handles rapid arrival and failure of nodes
- Properties
  – Routing table size O(log N), where N is the total number of nodes
  – Guarantees that a file is found in O(log N) steps

Aside: Consistent Hashing [Karger97]
- A key is stored at its successor: the node with the next-higher ID
- This is designed to let nodes enter and leave the network with minimal disruption
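
A small sketch of the successor rule, assuming a sorted list of node ids on the ring (the function name and identifier-space size are illustrative).

```python
import bisect

def successor(node_ids, key_id, id_space=2 ** 3):
    """node_ids: sorted node ids on the ring. Returns the node that stores key_id."""
    i = bisect.bisect_left(node_ids, key_id % id_space)
    return node_ids[i] if i < len(node_ids) else node_ids[0]   # wrap past the largest id

nodes = sorted([0, 1, 3, 6])                  # the node ids used in the Chord example below
print(successor(nodes, 2))                    # 3: the node with the next-higher id
print(successor(nodes, 7))                    # 0: wraps around the ring
```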

Routing: Chord Basic Lookup

Routing: Finger Table - Faster Lookups

Routing: join operation

Routing: join operation
- Figure: the ring before and after node 6 joins

Routing: Chord Summary
- Assume the identifier space is 0…2^m
- Each node maintains
  – A finger table
    » Entry i in the finger table of node n is the first node that succeeds or equals n + 2^i
  – Its predecessor node
- An item identified by id is stored on the successor node of id
- Pastry
  – Similar to Chord
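
A sketch of building the finger table under these definitions (the helper names are illustrative): entry i of node n points to the first node that succeeds or equals n + 2^i.

```python
def finger_table(n, node_ids, m):
    """node_ids: all node ids in a 2^m identifier space; n: this node's id."""
    ring = sorted(node_ids)

    def succ(k):
        k %= 2 ** m                           # wrap around the identifier circle
        return next((x for x in ring if x >= k), ring[0])

    return [succ(n + 2 ** i) for i in range(m)]

# m = 3 and nodes 0, 1, 3, 6 as in the running example: node 1's fingers.
print(finger_table(1, [0, 1, 3, 6], m=3))     # [3, 3, 6] for targets 2, 3, 5
```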

Routing: Chord Example
- Assume an identifier space 0..8
- Node n1:(1) joins → all entries in its finger table are initialized to itself
- Figure: n1's successor table with columns i, id+2^i, succ

Routing: Chord Example
- Node n2:(3) joins
- Figure: the successor tables of n1 and n2 (columns i, id+2^i, succ)

Routing: Chord Example
- Nodes n3:(0) and n4:(6) join
- Figure: the successor tables of all four nodes (columns i, id+2^i, succ)

Routing: Chord Examples
- Nodes: n1:(1), n2:(3), n3:(0), n4:(6)
- Items: f1:(7), f2:(2)
- Figure: each node's successor table; each item is stored at the successor of its id (item 7 at node 0, item 2 at node 3)

Routing: Query
- Upon receiving a query for item id, a node:
  – checks whether it stores the item locally
  – if not, forwards the query to the largest node in its successor table that does not exceed id
- Figure: query(7) is forwarded along successor-table entries until it reaches the node storing item 7
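
A sketch of one routing step under the rule above, ignoring wrap-around on the ring for brevity (the function and argument names are illustrative).

```python
def next_hop(my_id, fingers, item_id, local_items):
    """fingers: node ids in this node's successor/finger table; local_items: ids stored here."""
    if item_id in local_items:                # the item is stored locally: answer
        return ('found', my_id)
    # forward to the largest known node that does not exceed the queried id
    candidates = [f for f in fingers if f <= item_id]
    return ('forward', max(candidates) if candidates else max(fingers))

# Node 1 with fingers [3, 3, 6] receives query(7) and forwards toward node 6.
print(next_hop(1, [3, 3, 6], 7, local_items=set()))   # ('forward', 6)
```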

CAN
- Associate with each node and item a unique id in a d-dimensional space
  – A virtual Cartesian coordinate space
- The entire space is partitioned amongst all the nodes
  – Every node "owns" a zone in the overall space
- Abstraction
  – Can store data at "points" in the space
  – Can route from one "point" to another
- Point = the node that owns the enclosing zone
- Properties
  – Routing table size O(d)
  – Guarantees that a file is found in at most d·n^(1/d) steps, where n is the total number of nodes

CAN Example: Two-Dimensional Space
- The space is divided between nodes
- All nodes together cover the entire space
- Each node covers either a square or a rectangular area with a ratio of 1:2 or 2:1
- Example:
  – Assume a space of size 8 x 8
  – Node n1:(1, 2) is the first node that joins → it covers the entire space

CAN Example: Two-Dimensional Space
- Node n2:(4, 2) joins → the space is divided between n1 and n2

CAN Example: Two-Dimensional Space
- Node n3:(3, 5) joins → the space is divided further (figure shows n1, n2, and n3)

CAN Example: Two-Dimensional Space
- Nodes n4:(5, 5) and n5:(6, 6) join

CAN Example: Two-Dimensional Space
- Nodes: n1:(1, 2); n2:(4, 2); n3:(3, 5); n4:(5, 5); n5:(6, 6)
- Items: f1:(2, 3); f2:(5, 1); f3:(2, 1); f4:(7, 5)

CAN Example: Two-Dimensional Space
- Each item is stored by the node that owns its mapping in the space

CAN: Query Example
- Each node knows its neighbors in the d-dimensional space
- Forward the query to the neighbor that is closest to the query id
- Example: assume n1 queries f4

CAN: Query Example
- Each node knows its neighbors in the d-dimensional space
- Forward the query to the neighbor that is closest to the query id
- Example: assume n1 queries f4
- CAN can route around some failures
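
A sketch of the greedy forwarding step, assuming each node knows its neighbors' zone centers as (x, y) points (the names and the use of Euclidean distance are illustrative; math.dist requires Python 3.8+).

```python
import math

def next_hop(current, neighbors, target):
    """current: this node's (x, y); neighbors: {name: (x, y)}; target: the item's point."""
    dist = lambda p: math.dist(p, target)
    best = min(neighbors, key=lambda name: dist(neighbors[name]))
    # forward only if that neighbor is actually closer to the target than we are
    return best if dist(neighbors[best]) < dist(current) else None

# n1 at (1, 2) routing a query for f4 at (7, 5) through its neighbors n2 and n3.
print(next_hop((1, 2), {'n2': (4, 2), 'n3': (3, 5)}, target=(7, 5)))   # 'n3'
```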

Node Failure Recovery
- Simple failures
  – Know your neighbor's neighbors
  – When a node fails, one of its neighbors takes over its zone
- More complex failure modes
  – Simultaneous failure of multiple adjacent nodes
  – Scoped flooding to discover neighbors
  – Hopefully a rare event

Routing: Concerns/Optimizations
- Each hop in a routing-based P2P network can be expensive
  – No correlation between neighbors and their location
  – A query can repeatedly jump from Europe to North America, even though both the initiator and the node that stores the item are in Europe!
  – Solutions: Tapestry takes care of this implicitly; CAN and Chord maintain multiple copies for each entry in their routing tables and choose the closest in terms of network distance
- CAN/Chord optimizations
  – Weight neighbor nodes by RTT
    » When routing, choose the neighbor that is closer to the destination with the lowest RTT from me
    » Reduces path latency
  – Multiple physical nodes per virtual node
    » Reduces path length (fewer virtual nodes)
    » Reduces path latency (can choose the physical node from the virtual node with the lowest RTT)
    » Improves fault tolerance (only one node per zone needs to survive to allow routing through the zone)
- What type of lookups?
  – Only exact match!

BitTorrent
- A P2P file sharing system
  – Load sharing through file splitting
  – Uses the bandwidth of peers instead of a server
- Successfully used:
  – Used to distribute RedHat 9 ISOs (about 80 TB)
- Setup
  – A "seed" node has the file
  – The file is split into fixed-size segments (typically 256 KB)
  – A hash is calculated for each segment
  – A "tracker" node is associated with the file
  – A ".torrent" meta-file is built for the file; it identifies the address of the tracker node
  – The .torrent file is passed around the web

BitTorrent
- Download
  – A client contacts the tracker identified in the .torrent file (using HTTP)
  – The tracker sends the client a (random) list of peers who have or are downloading the file
  – The client contacts the peers on the list to see which segments of the file they have
  – The client requests segments from peers (via TCP)
  – The client uses the hash from the .torrent to confirm that a segment is legitimate
  – The client reports to the other peers on the list that it has the segment
  – Other peers start to contact the client to get that segment (while the client is getting other segments)
- .torrent file contents: length, name, hash, URL of the tracker
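
A sketch of the per-segment integrity check, assuming SHA-1 hashes and a plain list of hex digests in place of the real bencoded .torrent metadata (the names are illustrative).

```python
import hashlib

SEGMENT_SIZE = 256 * 1024                     # typical fixed segment size

def split_and_hash(data):
    """Seed side: split the file into segments and record each segment's hash."""
    segments = [data[i:i + SEGMENT_SIZE] for i in range(0, len(data), SEGMENT_SIZE)]
    return [hashlib.sha1(s).hexdigest() for s in segments]

def segment_is_legitimate(segment, index, torrent_hashes):
    """Client side: confirm a segment received from a peer matches the .torrent hash."""
    return hashlib.sha1(segment).hexdigest() == torrent_hashes[index]

data = b'x' * (SEGMENT_SIZE + 100)            # a file slightly longer than one segment
hashes = split_and_hash(data)
print(segment_is_legitimate(data[:SEGMENT_SIZE], 0, hashes))   # True
print(segment_is_legitimate(b'corrupted', 1, hashes))          # False
```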

Conclusions
- Distributed hash tables are a key component of scalable and robust overlay networks
- CAN: O(d) state, O(d·n^(1/d)) distance
- Chord: O(log n) state, O(log n) distance
- Both can achieve stretch < 2
- Simplicity is key
- Services built on top of distributed hash tables
  – P2P file storage, i3 (Chord)
  – Multicast (CAN, Tapestry)
  – Persistent storage (OceanStore using Tapestry)