Peer to Peer Technologies Roy Werber, Idan Gelbourt Prof. Sagiv's Seminar, The Hebrew University of Jerusalem, 2001

Lecture Overview  1st Part:  The P2P communication model, architecture and applications  2nd Part:  Chord and CFS

Peer to Peer - Overview  A class of applications that takes advantage of resources:  Storage, CPU cycles, content, human presence  Available at the edges of the Internet  A decentralized system that must cope with the unstable nature of computers located at the network edge

Client/Server Architecture  An architecture in which each process is a client or a server  Servers are powerful computers dedicated for providing services – storage, traffic, etc  Clients rely on servers for resources

Client/Server Properties  Big, strong server  Well known port/address of the server  Many to one relationship  Different software runs on the client/server  Client can be dumb (lacks functionality), server performs for the client  Client usually initiates connection

Client Server Architecture (figure: clients connect to a single server over the Internet)

Client/Server Architecture (figure: a client sends "GET /index.html HTTP/1.0" and the server replies "HTTP/1.0 200 OK...")

Disadvantages of C/S Architecture  Single point of failure  Strong expensive server  Dedicated maintenance (a sysadmin)  Not scalable - more users, more servers

Solutions Replication of data (several servers) Problems: redundancy, synchronization, expensive Brute force (a bigger, faster server) Problems: Not scalable, expensive, single point of failure

The Client Side  Although the model hasn't changed over the years, the entities in it have  Today's clients can perform more roles than just forwarding users' requests  Today's clients have:  More computing power  Storage space

Thin Client  Performs simple tasks:  I/O  Properties:  Cheap  Limited processing power  Limited storage

Fat Client  Can perform complex tasks:  Graphics  Data manipulation  Etc…  Properties:  Strong computation power  Bigger storage  More expensive than thin

Evolution at the Client Side (figure: from DEC's VT100 terminal with no storage in the '70s, to an IBM PC at 4.77 MHz with 360 KB diskettes, to a 2 GHz machine with a 40 GB hard disk)

What Else Has Changed?  The number of home PCs is increasing rapidly  PCs with dynamic IPs  Most of the PCs are "fat clients"  Software cannot keep up with hardware development  As Internet usage grows, more and more PCs are connecting to the global net  Most of the time PCs are idle  How can we use all this?

Sharing  Definition: 1. To divide and distribute in shares 2. To partake of, use, experience, occupy, or enjoy with others 3. To grant or give a share in (Merriam-Webster's online dictionary)  There is a direct advantage to a co-operative network versus a single computer

Resources Sharing  What can we share?  Computer resources  Shareable computer resources:  "CPU cycles" - SETI@home  Storage - CFS  Information - Napster / Gnutella  Bandwidth sharing - Crowds

SETI@home  SETI – Search for ExtraTerrestrial Intelligence – on your own computer  A radio telescope in Puerto Rico scans the sky for radio signals  Fills a 35 GB DAT tape in 15 hours  That data has to be analyzed

SETI@home (cont.)  The problem – analyzing the data requires a huge amount of computation  Even a supercomputer cannot finish the task on its own  Accessing a supercomputer is expensive  What can be done?

SETI@home (cont.)  Can we use distributed computing?  YEAH!  Fortunately, the problem can be solved in parallel - examples:  Analyzing different parts of the sky  Analyzing different frequencies  Analyzing different time slices

SETI@home (cont.)  The data can be divided into small segments  A PC is capable of analyzing a segment in a reasonable amount of time  An enthusiastic UFO searcher will lend his spare CPU cycles for the computation  When? Screensavers

SETI@home - Example

SETI@home - Summary  SETI@home reverses the C/S model  Clients can also provide services  Servers can be weaker, used mainly for storage  Distributed peers serving the center  Not yet P2P, but we're close  Outcome - great results:  Thousands of unused CPU hours tamed for the mission  3+ million users

What Exactly is P2P?  A distributed communication model with the properties:  All nodes have identical responsibilities  All communication is symmetric

P2P Properties  Cooperative, direct sharing of resources  No central servers  Symmetric clients (figure: symmetric clients connected directly over the Internet)

P2P Advantages  Harnesses client resources  Scales with new clients  Provides robustness under failures  Redundancy and fault-tolerance  Immune to DoS  Load balance

P2P Disadvantages -- A Tough Design Problem  How do you handle a dynamic network (nodes join and leave frequently)?  A number of constraints and uncontrolled variables:  No central servers  Clients are unreliable  Clients vary widely in the resources they provide  Heterogeneous network (different platforms)

Two Main Architectures  Hybrid Peer-to-Peer  Preserves some of the traditional C/S architecture. A central server links between clients, stores indices tables, etc  Pure Peer-to-Peer  All nodes are equal and no functionality is centralized

Hybrid P2P  A main server is responsible for various administrative operations:  Users’ login and logout  Storing metadata  Directing queries  Example: Napster

Examples - Napster  Napster is a program for sharing information (mp3 music files) over the Internet  Created by Shawn Fanning in 1999 although similar services were already present (but lacked popularity and functionality)

Napster Sharing Style: hybrid center+edge  1. Users launch Napster and connect to the Napster server  2. Napster creates a dynamic directory from users' personal .mp3 libraries (Title / User / Speed: song1.mp3 beastieboy DSL, song2.mp3 beastieboy DSL, song3.mp3 beastieboy DSL, song4.mp3 kingrook T1, song5.mp3 kingrook T1, song5.mp3 slashdot 28.8, song6.mp3 kingrook T1, song6.mp3 slashdot 28.8, song7.mp3 slashdot 28.8)  3. beastieboy enters search criteria (song5)  4. Napster displays matches to beastieboy  5. beastieboy makes a direct connection to kingrook for the file transfer

What About Communication Between Servers?  Each Napster server creates its own mp3 exchange community:  rock.napster.com, dance.napster.com, etc…  This creates a separation, which is bad  We would like multiple servers to share a common ground: it reduces the centralized nature of each server and expands searchability

Various HP2P Models – 1. Chained Architecture  Chained architecture – a linear chain of servers  Clients login to a random server  Queries are submitted to the server  If the server satisfies the query – Done  Otherwise – Forward the query to the next server  Results are forwarded back to the first server  The server merges the results  The server returns the results to the client  Used by OpenNap network

2. Full Replication Architecture  Replication of constantly updated metadata  A client logs on to a random server  The server sends the updated metadata to all servers  Result:  All servers can answer queries immediately

3. Hash Architecture  Each server holds a portion of the metadata  Each server holds the complete inverted list for a subset of all words  Client directs a query to a server that is responsible for at least one of the keywords  That server gets the inverted lists for all the keywords from the other servers  The server returns the relevant results to the client
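
A minimal sketch of this hash-partitioning idea in Python. The helper names (server_for, index_file, query) and the in-memory stores are made up for illustration; real servers would exchange the lists over the network. Each keyword's inverted list lives on the server its hash maps to, and a query is answered by intersecting the lists fetched for its keywords.

    import hashlib

    NUM_SERVERS = 4

    def server_for(keyword):
        # Map a keyword to the server that owns its inverted list.
        digest = hashlib.sha1(keyword.encode()).digest()
        return int.from_bytes(digest, "big") % NUM_SERVERS

    # One toy in-memory "inverted list" store per server: keyword -> set of file names.
    inverted_lists = [dict() for _ in range(NUM_SERVERS)]

    def index_file(name, keywords):
        for kw in keywords:
            inverted_lists[server_for(kw)].setdefault(kw, set()).add(name)

    def query(keywords):
        # The contacted server would fetch the other servers' lists over the network;
        # here we read them directly and intersect.
        lists = [inverted_lists[server_for(kw)].get(kw, set()) for kw in keywords]
        return set.intersection(*lists) if lists else set()

    index_file("song1.mp3", ["rock", "guitar"])
    index_file("song2.mp3", ["rock", "piano"])
    print(query(["rock", "guitar"]))   # {'song1.mp3'}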

4. Unchained Architecture  Independent servers which do not communicate with each other  A client who logs on to one server can only see the files of other users at the same local server  A clear disadvantage of separating users into distinct domains  Used by Napster

Pure P2P  All nodes are equal  No centralized server  Example: Gnutella

Gnutella  A completely distributed P2P network  The Gnutella network is composed of clients  Client software is made of two parts:  A mini search engine – the client  A file serving system – the "server"  Relies on broadcast search

Gnutella - Operations  Connect – establishing a logical connection  PingPong – discovering new nodes (my friend’s friends)  Query – look for something  Download – download files (simple HTTP)

Gnutella – Form an Overlay (figure: a new node sends Connect / OK to join a peer, then Ping / Pong messages discover further nodes)

How to find a node?  Initially, ad hoc ways: e-mail, online chat, news groups…  Bottom line: you've got to know someone!  Later: set up some long-lived nodes  A newcomer contacts the well-known nodes  Useful for building a better overlay topology

Gnutella – Search (figure: a node floods a query for "Green Toad"; nodes A and B both answer "I have" – A looks nice, B is too far)

On a larger scale, things get more complicated

Gnutella – Scalability Issue  Can the system withstand flooding from every node?  Use a TTL to limit the range of propagation  With a fan-out of 5 and a TTL of 5, 5^5 = 3125 nodes can be reached – how much can you get?  This creates a "horizon" of computers  The promise is that the horizon changes every time you log in
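
A toy sketch of TTL-limited flooding over an in-memory overlay graph. The topology, node names and file placement are invented for illustration, and the whole network is simulated in one process; a real Gnutella node only knows its own neighbours and uses message IDs to drop duplicates. The query spreads one hop per TTL unit, which bounds the "horizon" a node can see.

    # Adjacency list of the overlay (made up for the example).
    neighbours = {
        "A": ["B", "C"], "B": ["A", "D"], "C": ["A", "E"],
        "D": ["B"], "E": ["C", "F"], "F": ["E"],
    }
    files = {"D": {"song.mp3"}, "F": {"song.mp3"}}

    def flood_query(start, filename, ttl):
        hits, seen = [], {start}
        frontier = [start]
        while frontier and ttl > 0:
            ttl -= 1
            nxt = []
            for node in frontier:
                for peer in neighbours[node]:
                    if peer in seen:
                        continue           # duplicate query: drop it
                    seen.add(peer)
                    if filename in files.get(peer, set()):
                        hits.append(peer)  # a real peer would answer with a QueryHit
                    nxt.append(peer)
            frontier = nxt
        return hits

    print(flood_query("A", "song.mp3", ttl=2))   # reaches D but not F: ['D']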

The Differences  While the pure P2P model is completely symmetric, in the hybrid model elements of both PP2P and C/S coexist  Each model has its disadvantages  PP2P still has problems locating information  HP2P has scalability problems, as with ordinary server-oriented models

P2P – Summary  The current setting has allowed P2P to enter the world of PCs  It controls the niche of resource sharing  The model is being studied from both the academic and the commercial point of view  There are still problems out there…

End Of Part I

Part II  Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications  Robert Morris, Ion Stoica, David Karger, M. Frans Kaashoek, Hari Balakrishnan (MIT and Berkeley)  Presented by Roy Werber and Idan Gelbourt

A P2P Problem  Every application in a P2P environment must handle an important problem: The lookup problem  What is the problem?

A Peer-to-peer Storage Problem  1000 scattered music enthusiasts  Willing to store and serve replicas  How do you find the data?

The Lookup Problem (figure: nodes N1…N6 connected over the Internet; a publisher inserts Key="title", Value=MP3 data…, and a client issues Lookup("title"))  In a dynamic network with N nodes, how can the data be found?

Centralized Lookup (Napster) (figure: N4 registers SetLoc("title", N4) with a central DB; the client asks the DB and then fetches Key="title", Value=MP3 data… from N4)  Simple, but O(N) state and a single point of failure  Hard to keep the data in the server updated

Flooded Queries (Gnutella) (figure: the client's Lookup("title") is flooded from node to node until it reaches the node holding Key="title", Value=MP3 data…)  Robust, but worst case O(N) messages per lookup  Not scalable

So Far  Centralized : - Table size – O(n) - Number of hops – O(1)  Flooded queries: - Table size – O(1) - Number of hops – O(n)

We Want  Efficiency : O(log(N)) messages per lookup  N is the total number of servers  Scalability : O(log(N)) state per node  Robustness : surviving massive failures

How Can It Be Done?  How do you search in O(log(n)) time?  Binary search  You need an ordered array  How can you order nodes in a network and data items?  Hash function!

Chord: Namespace  The namespace is a fixed-length bit string  Each object is identified by a unique ID  How to get the ID? (figure: SHA-1("Shark") → object ID DE11AC…; SHA-1 of a node's IP address and port → object ID AABBCC…)

Chord Overview  Provides just one operation:  A peer-to-peer hash lookup:  Lookup(key) → IP address  Chord does not store the data  Chord is a lookup service, not a search service  It is a building block for P2P applications

Chord IDs  Uses Hash function:  Key identifier = SHA-1(key)  Node identifier = SHA-1(IP address)  Both are uniformly distributed  Both exist in the same ID space  How to map key IDs to node IDs?
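
A minimal sketch of deriving Chord-style identifiers with SHA-1, assuming a small m-bit ring (m = 7 to match the figures; the key string and the node address are made-up examples):

    import hashlib

    M = 7                                   # bits in the toy identifier space (2^7 = 128 IDs)

    def chord_id(text):
        digest = hashlib.sha1(text.encode()).digest()
        return int.from_bytes(digest, "big") % (2 ** M)

    key_id  = chord_id("Shark")             # key identifier  = SHA-1(key)
    node_id = chord_id("18.7.22.83:8080")   # node identifier = SHA-1(IP address)
    print(key_id, node_id)                  # both land in the same 0..127 ID space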

Mapping Keys To Nodes (figure: items and nodes mapped onto the same circular ID space)

Consistent Hashing [Karger 97] (figure: a circular 7-bit ID space with nodes N32, N90, N105 and keys K5, K20, K80)  A key is stored at its successor: the node with the next-higher ID
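
A sketch of the successor rule on a sorted list of node IDs (node and key numbers taken from the figure): a key is assigned to the first node whose ID is greater than or equal to the key ID, wrapping around at the top of the ring.

    from bisect import bisect_left

    nodes = sorted([32, 90, 105])          # node IDs on the 7-bit ring

    def successor(key_id):
        i = bisect_left(nodes, key_id)
        return nodes[i % len(nodes)]       # wrap around past the largest ID

    for k in (5, 20, 80):
        print(f"K{k} is stored at N{successor(k)}")
    # K5 -> N32, K20 -> N32, K80 -> N90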

Basic Lookup (figure: the query "Where is key 80?" is forwarded from node to node around the ring until N90 answers "N90 has K80")

"Finger Table" Allows Log(n)-time Lookups (figure: from N80, fingers point 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, and 1/128 of the way around the circular 7-bit ID space)  N80 knows of only seven other nodes.

Finger i Points to the Successor of N+2^i (figure: the same ring; for example, one of N80's fingers points to N120)

Lookups Take O(log(n)) Hops (figure: Lookup(K19) hops across the ring via fingers, roughly halving the remaining distance each time, and ends at N20, the successor of K19)
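
A sketch of both ideas on a toy ring (node IDs roughly as in the figures; this is a single-process simulation, not the real distributed protocol): each node's finger i is the successor of n+2^i, and a lookup repeatedly jumps to the closest preceding finger until it lands in the gap just before the key's successor.

    from bisect import bisect_left

    M = 7                                        # 7-bit identifier space
    nodes = sorted([5, 10, 20, 32, 60, 80, 99, 110])

    def succ(x):                                 # first node at or after x, with wrap-around
        i = bisect_left(nodes, x % 2 ** M)
        return nodes[i % len(nodes)]

    def next_node(n):                            # a node's immediate successor on the ring
        return succ(n + 1)

    def between(x, a, b):                        # x in the circular interval (a, b]
        return a < x <= b if a < b else (x > a or x <= b)

    def fingers(n):                              # finger i points to the successor of n + 2^i
        return [succ(n + 2 ** i) for i in range(M)]

    def closest_preceding(n, key):               # best finger that does not pass the key
        for f in reversed(fingers(n)):
            if between(f, n, key) and f != key:
                return f
        return n

    def lookup(start, key):
        path, n = [start], start
        while not between(key, n, next_node(n)):
            nxt = closest_preceding(n, key)
            if nxt == n:                         # safety stop for this toy example
                break
            n = nxt
            path.append(n)
        path.append(next_node(n))                # the key lives at this node's successor
        return path

    print(fingers(80))                           # N80's finger table
    print(lookup(80, 19))                        # [80, 5, 10, 20]: K19 is found at N20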

Joining: Linked List Insert (figure: N25's successor is N40, which stores K30 and K38)  1. N36 wants to join: it looks up its own ID (Lookup(36)) and finds its successor, N40

Join (2)  2. N36 sets its own successor pointer to N40

Join (3)  3. Keys that now belong to N36 are copied from N40 (K30 moves to N36, K38 stays at N40)

Join (4)  4. N25's successor pointer is set to N36  Finger pointers are updated in the background  Correct successors are enough to produce correct lookups
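
A sketch of the four join steps on a toy ring of Python objects. Here find_successor just walks successor pointers and the predecessor is found by walking the ring; real Chord uses fingers for the lookup, already knows the predecessor, and fixes fingers lazily afterwards.

    class Node:
        def __init__(self, node_id):
            self.id = node_id
            self.successor = self
            self.keys = {}                     # key ID -> value stored at this node

    def between(x, a, b):                      # x in the circular interval (a, b]
        return a < x <= b if a < b else (x > a or x <= b)

    def find_successor(start, key_id):
        n = start
        while not between(key_id, n.id, n.successor.id):
            n = n.successor                    # walk the ring; fingers would only speed this up
        return n.successor

    def predecessor_of(start, node):
        n = start
        while n.successor is not node:
            n = n.successor
        return n

    def join(new, existing):
        succ = find_successor(existing, new.id)          # 1. look up the new node's successor
        pred = predecessor_of(existing, succ)
        new.successor = succ                             # 2. set the new node's successor pointer
        for k in [k for k in succ.keys if between(k, pred.id, new.id)]:
            new.keys[k] = succ.keys.pop(k)               # 3. copy the keys that now belong to it
        pred.successor = new                             # 4. update the old predecessor's pointer

    # Example mirroring the slides: N25 -> N40 (holding K30, K38); N36 joins and takes K30.
    n25, n40 = Node(25), Node(40)
    n25.successor, n40.successor = n40, n25
    n40.keys = {30: "K30", 38: "K38"}
    n36 = Node(36)
    join(n36, n25)
    print(n36.keys, n40.keys)                  # {30: 'K30'} {38: 'K38'}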

Join: Lazy Finger Update Is OK (figure: N2's finger still points to N40 instead of N36)  N2's finger should now point to N36, not N40  But a Lookup(K30) routed through it visits only nodes < 30, so it merely undershoots and the successor pointers finish the job

Failures Might Cause Incorrect Lookups (figure: N10 issues Lookup(90); several nodes between N80 and N120 have failed)  N80 doesn't know its correct successor, so the lookup is incorrect

Solution: Successor Lists  Each node knows r immediate successors  After failure, will know first live successor  Correct successors guarantee correct lookups  Guarantee is with some probability

Choosing the Successor List Length  Assume 1/2 of the nodes fail  P(successor list all dead) = (1/2)^r, i.e. P(this node breaks the Chord ring)  Depends on independent failures  P(no broken nodes) = (1 - (1/2)^r)^N  Choosing r = 2·log(N) makes this probability 1 - 1/N
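
The same calculation written out (a sketch; r is the successor-list length, N the number of nodes, and node failures are assumed independent):

    P(\text{all } r \text{ successors dead}) = \left(\tfrac{1}{2}\right)^{r}
    P(\text{no node breaks the ring}) = \left(1 - \left(\tfrac{1}{2}\right)^{r}\right)^{N}
    r = 2\log_2 N \;\Rightarrow\; \left(\tfrac{1}{2}\right)^{r} = \frac{1}{N^{2}}
      \;\Rightarrow\; \left(1 - \frac{1}{N^{2}}\right)^{N} \approx 1 - \frac{1}{N}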

Chord Properties  Log(n) lookup messages and table space.  Well-defined location for each ID.  No search required.  Natural load balance.  No name structure imposed.  Minimal join/leave disruption.  Does not store documents…

Experimental Overview  Quick lookup in large systems  Low variation in lookup costs  Robust despite massive failure  See paper for more results Experiments confirm theoretical results

Chord Lookup Cost Is O(log N) (plot: average messages per lookup vs. number of nodes; the constant is 1/2)

Failure Experimental Setup  Start 1,000 CFS/Chord servers  Successor list has 20 entries  Wait until they stabilize  Insert 1,000 key/value pairs  Five replicas of each  Stop X% of the servers  Immediately perform 1,000 lookups

Massive Failures Have Little Impact (plot: failed lookups (percent) vs. failed nodes (percent); (1/2)^6 is 1.6%)

Chord Summary  Chord provides peer-to-peer hash lookup  Efficient: O(log(n)) messages per lookup  Robust as nodes fail and join  Good primitive for peer-to-peer systems

Wide-area Cooperative Storage With CFS Robert Morris Frank Dabek, M. Frans Kaashoek, David Karger, Ion Stoica MIT and Berkeley

What Can Be Done With Chord  Cooperative Mirroring  Time-Shared Storage  Makes data available when offline  Distributed Indexes  Support Napster keyword search

How to Mirror Open-source Distributions?  Multiple independent distributions  Each has high peak load, low average  Individual servers are wasteful  Solution: aggregate  Option 1: single powerful server  Option 2: distributed service  But how do you find the data?

Design Challenges  Avoid hot spots  Spread storage burden evenly  Tolerate unreliable participants  Fetch speed comparable to whole-file TCP  Avoid O(#participants) algorithms  Centralized mechanisms [Napster], broadcasts [Gnutella]  CFS solves these challenges

CFS Overview  CFS – Cooperative File System:  P2P read-only storage system  Read-only – only the owner can modify files  Completely decentralized (figure: every node acts as both client and server, connected over the Internet)

CFS - File System  A set of blocks distributed over the CFS servers  3 layers:  FS – interprets blocks as files (Unix V7)  Dhash – performs block management  Chord – maintains routing tables used to find blocks

Chord  Uses 160-bit identifier space  Assigns to each node and block an identifier  Maps block’s id to node’s id  Performs key lookups (as we saw earlier)

DHash – Distributed Hashing  Performs block management on top of Chord:  Block retrieval, storage and caching  Provides load balance for popular files  Replicates each block at a small number of places (for fault tolerance)

CFS - Properties  Tested on prototype :  Efficient  Robust  Load-balanced  Scalable  Download as fast as FTP  Drawbacks  No anonymity  Assumes no malicious participants

Design Overview (figure: each node runs a stack of FS on top of DHash on top of Chord)  DHash stores, balances, replicates and caches blocks  DHash uses Chord [SIGCOMM 2001] to locate blocks

Client-server Interface  Files have unique names  Files are read-only (single writer, many readers)  Publishers split files into blocks  Clients check files for authenticity (figure: the client's FS issues "insert file f" / "lookup file f", which become "insert block" / "lookup block" requests to server nodes)

Naming and Authentication  1. Name could be the hash of the file content  Easy for the client to verify  But an update requires a new file name  2. Name could be a public key  The document contains a digital signature  Allows verified updates with the same name

CFS File Structure (figure: a root block, named by a public key and carrying a signature, points via H(D) to a directory block D, which points via H(F) to an inode block F, which points via H(B1) and H(B2) to data blocks B1 and B2)
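
A sketch of the content-hash naming behind this structure, with a Python dict standing in for DHash. The block contents and file name are invented, and the root block is left out: it would be the one block named by a public key and signed rather than by its hash.

    import hashlib

    store = {}                                  # stand-in for DHash: block ID -> block bytes

    def put_block(data: bytes) -> str:
        block_id = hashlib.sha1(data).hexdigest()    # the content hash names the block
        store[block_id] = data
        return block_id

    def get_block(block_id: str) -> bytes:
        data = store[block_id]
        # The client can verify authenticity by re-hashing the fetched block.
        assert hashlib.sha1(data).hexdigest() == block_id
        return data

    # Data blocks -> inode block holding their hashes -> directory block naming the file.
    b1 = put_block(b"first 8 KB of the file")
    b2 = put_block(b"second 8 KB of the file")
    inode = put_block(("blocks:" + b1 + "," + b2).encode())
    directory = put_block(("song.mp3:" + inode).encode())
    # A signed root block (not modelled here) would hold the directory's hash, so that
    # verified updates can keep the same public-key name.
    print(get_block(directory))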

File Storage  Data is stored for an agreed-upon finite interval  Extensions can be requested  No specific delete command  After expiration – the blocks fade

Storing Blocks  Long-term blocks are stored for a fixed time  Publishers need to refresh them periodically  The cache uses LRU (Least Recently Used) replacement (figure: a node's disk is split between the cache and long-term block storage)

Replicate Blocks at k Successors (figure: block 17 is stored at its successor N20 and replicated at the next nodes on the ring)  Replica failures are independent
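
A sketch of replica placement on a sorted ring of node IDs (IDs from the figure; k = 3 is an arbitrary choice for the example): the block goes to its successor and to the k−1 nodes that follow it.

    from bisect import bisect_left

    nodes = sorted([5, 10, 20, 40, 50, 60, 68, 80, 99, 110])

    def replica_set(block_id, k=3):
        i = bisect_left(nodes, block_id)                          # the block's successor...
        return [nodes[(i + j) % len(nodes)] for j in range(k)]    # ...and the k-1 nodes after it

    print(replica_set(17))    # block 17 -> [20, 40, 50]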

Lookups Find Replicas (figure: Lookup(BlockID=17) on the same ring)  RPCs: 1. Lookup step  2. Get successor list  3. Failed block fetch  4. Block fetch

First Live Successor Manages Replicas (figure: when block 17's successor fails, the first live successor already holds a copy of block 17 and serves it)

DHash Copies to Caches Along the Lookup Path (figure: Lookup(BlockID=45))  RPCs: 1. Chord lookup  2. Chord lookup  3. Block fetch  4. Send to cache

Naming and Caching (figure: two clients look up the same block)  Because each hop covers a smaller distance, lookups for the same block from different clients tend to converge on the same nodes near the target  Caching along the lookup path is therefore efficient

Caching Doesn't Worsen Load  Only O(log N) nodes have fingers pointing to N32  This limits the single-block load on N32

Virtual Nodes Allow Heterogeneity – Load Balancing  Hosts may differ in disk/net capacity  Hosts may advertise multiple IDs  Chosen as SHA-1(IP address, index)  Each ID represents a "virtual node"  Host load is proportional to the number of virtual nodes  Manually controlled (figure: Node A runs virtual nodes N10, N60, N101; Node B runs only N5)
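
A sketch of a host advertising several virtual-node IDs in proportion to its capacity, assuming the SHA-1(IP address, index) scheme from the slide (the addresses and counts below are made up):

    import hashlib

    M = 160   # CFS/Chord use a 160-bit identifier space

    def virtual_ids(ip, count):
        # One ID per virtual node; a host with twice the capacity runs twice as many.
        return [int.from_bytes(hashlib.sha1(f"{ip},{i}".encode()).digest(), "big") % (2 ** M)
                for i in range(count)]

    print(len(virtual_ids("10.0.0.1", 3)))   # host A: 3 virtual nodes
    print(len(virtual_ids("10.0.0.2", 1)))   # host B: 1 virtual node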

Server Selection By Chord (figure: Lookup(47) can proceed via fingers with different RTTs, e.g. 10 ms vs. 100 ms)  Each node monitors the RTTs to its own fingers  Tradeoff: ID-space progress vs. delay

Why Blocks Instead of Files?  Cost: one lookup per block  Can tailor cost by choosing good block size  Benefit: load balance is simple  For large files  Storage cost of large files is spread out  Popular files are served in parallel

CFS Project Status  Working prototype software  Some abuse prevention mechanisms  Guarantees authenticity of files, updates, etc.  Napster-like interface in the works  Decentralized indexing system  Some measurements on RON testbed  Simulation results to test scalability

Experimental Setup (12 nodes)  One virtual node per host  8 KByte blocks  RPCs use UDP  Caching turned off  Proximity routing turned off

CFS Fetch Time for a 1 MB File (plot: fetch time in seconds vs. prefetch window in KBytes; average over the 12 hosts; no replication, no caching; 8 KByte blocks)

Distribution of Fetch Times for 1 MB (plot: fraction of fetches vs. time in seconds, for 8, 24, and 40 KByte prefetch windows)

CFS Fetch Time vs. Whole-File TCP (plot: fraction of fetches vs. time in seconds, comparing a 40 KByte prefetch window with whole-file TCP)

Robustness vs. Failures (plot: fraction of failed lookups vs. fraction of failed nodes; six replicas per block; (1/2)^6 is 0.016)

Future work  Test load balancing with real workloads  Deal better with malicious nodes  Indexing  Other applications

CFS Summary  CFS provides peer-to-peer read-only storage  Structure: DHash and Chord  It is efficient, robust, and load-balanced  It uses block-level distribution  The prototype is as fast as whole-file TCP

The End