Wide-area cooperative storage with CFS

Overview: CFS = Cooperative File System. A peer-to-peer read-only file system that uses a distributed hash table for block storage, with lookup performed by Chord.

Design Overview: CFS clients contain three layers: a file system client, a DHash storage layer, and a Chord lookup layer. CFS servers contain two layers: a DHash storage layer and a Chord lookup layer.

Overview (cont'd): DHash blocks play the role that disk blocks play in an ordinary file system, and block identifiers play the role of disk addresses. CFS file systems are read-only as far as clients are concerned, but can be modified by their publisher.

File System Layout: The publisher inserts the file system's blocks into CFS, using a content hash of each block as its identifier. It then signs the root block with its private key and inserts the root block into CFS using the corresponding public key as its identifier.
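A minimal sketch of the publishing step, assuming a simple put-style interface into the DHash layer; dhash_put, publish_data_block, and publish_root_block are illustrative names, not CFS's actual API:

```python
import hashlib

STORE = {}   # stand-in for the distributed DHash layer

def dhash_put(key: bytes, block: bytes) -> None:
    """Stand-in for a DHash insert; real CFS routes the block to the
    responsible server via Chord."""
    STORE[key] = block

def publish_data_block(data: bytes) -> bytes:
    """Insert a file-system block under its content hash and return that key."""
    key = hashlib.sha1(data).digest()
    dhash_put(key, data)
    return key

def publish_root_block(root: bytes, sign, public_key: bytes) -> bytes:
    """Sign the root block with the publisher's private key (via the sign()
    placeholder) and insert it under the hash of the corresponding public key."""
    signature = sign(root)                     # placeholder signing routine
    key = hashlib.sha1(public_key).digest()    # identifier derived from the public key
    dhash_put(key, root + signature)
    return key
```

Because data blocks are named by their content hash, anyone fetching a block can check it simply by re-hashing the data.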

Publisher Updates: The publisher updates the file system by making the root block point to the new data. Servers authenticate the update by checking that the same key signed both the old and the new block, and timestamps prevent replays of old data. File systems are therefore updated without changing the root block's identifier.
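A sketch of the check a server might apply to a root-block update, under the assumptions that blocks carry a timestamp field and that verify() is a placeholder signature-verification routine:

```python
import hashlib
from dataclasses import dataclass

@dataclass
class SignedBlock:
    data: bytes
    timestamp: float      # set by the publisher; field name is an assumption
    signature: bytes

def accept_root_update(key: bytes, old: SignedBlock, new: SignedBlock,
                       public_key: bytes, verify) -> bool:
    """verify(public_key, data, signature) is a placeholder signature check."""
    # The root block's identifier never changes: it is derived from the public key.
    if key != hashlib.sha1(public_key).digest():
        return False
    # The same key must have signed both the old and the new block.
    if not (verify(public_key, old.data, old.signature) and
            verify(public_key, new.data, new.signature)):
        return False
    # Timestamps prevent replays of stale root blocks.
    return new.timestamp > old.timestamp
```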

CFS properties: decentralized control, scalability, availability, load balance, persistence, quotas, and efficiency.

Chord Layer: the same Chord protocol described earlier, with two modifications: server selection and node ID authentication.

Quick Chord Overview: consistent hashing, successor lists, finger tables, and node joins/leaves. With successor lists alone, lookups take O(N) hops; finger tables reduce this to O(log N).
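For reference, a bare-bones rendering of the finger-table routing step that yields O(log N) lookups; this is generic Chord pseudocode in Python, not CFS's implementation:

```python
M = 160   # identifier bits (SHA-1), so IDs live on a ring of size 2**M

def in_interval(x: int, a: int, b: int) -> bool:
    """True if x lies strictly between a and b going clockwise on the ring."""
    if a < b:
        return a < x < b
    return x > a or x < b      # the interval wraps around zero

def closest_preceding_finger(node_id: int, fingers: list, target: int) -> int:
    """Return the finger that most closely precedes target; each hop roughly
    halves the remaining ring distance, hence O(log N) hops per lookup."""
    for f in reversed(fingers):        # fingers ordered by increasing distance
        if in_interval(f, node_id, target):
            return f
    return node_id
```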

Server Selection: Chord normally chooses the next node to contact from the finger table by picking the node closest to the destination on the ring. But what about network latency? CFS measures and stores latency in the finger tables, calculated when the finger-table entries are acquired. Reasoning: RPCs to different nodes incur varying latency, so you want to choose one that minimizes it.
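A hedged sketch of the latency-aware variant, reusing M and in_interval from the Chord sketch above: among fingers that still make progress toward the target, pick one with low measured latency. The cost function shown is illustrative, not the exact heuristic from the paper:

```python
def choose_next_hop(node_id: int, fingers: list, rtt: dict, target: int):
    """rtt maps a finger's ID to its measured round-trip latency in seconds,
    recorded when the finger-table entry was acquired."""
    best, best_cost = None, float("inf")
    for f in fingers:
        if not in_interval(f, node_id, target):
            continue                                 # must still make progress
        remaining = (target - f) % (2 ** M)          # ring distance left to cover
        cost = rtt[f] * (1 + remaining / 2 ** M)     # illustrative latency/progress trade-off
        if cost < best_cost:
            best, best_cost = f, cost
    return best
```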

Node ID Authentication: a defense for the overlay itself. All Chord IDs must have the form h(x), where h is the SHA-1 hash function and x is the node's IP address plus its virtual node index. When a new node joins the system, an existing node sends a message to the new node's claimed IP address; the ID is accepted only if it matches the hash of the claimed IP address and virtual node index.
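A minimal version of that check; the exact string format hashed from the IP address and index is an assumption:

```python
import hashlib

def expected_node_id(ip: str, virtual_index: int) -> bytes:
    """A Chord ID must equal SHA-1 of the node's IP address plus its virtual
    node index (the "ip:index" encoding here is illustrative)."""
    return hashlib.sha1(f"{ip}:{virtual_index}".encode()).digest()

def authenticate_join(claimed_id: bytes, claimed_ip: str, virtual_index: int) -> bool:
    """An existing node recomputes the hash from the claimed IP and index and
    also contacts that IP directly; the join is accepted only if the ID matches."""
    return claimed_id == expected_node_id(claimed_ip, virtual_index)
```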

DHash Layer: handles storing and retrieving blocks, along with their distribution, replication, and caching; it uses Chord to locate blocks. Key CFS design decision: split each file system into blocks and distribute those blocks across many servers.
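A sketch of that block-splitting idea; the 8 KB block size is an assumed typical value and the helper name is illustrative:

```python
import hashlib

BLOCK_SIZE = 8192   # assumed small fixed block size, on the order CFS uses

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split a file system image into fixed-size blocks keyed by content hash.
    Each (key, block) pair is stored at the server Chord maps the key to, so a
    large file system's load is spread across many servers."""
    blocks = []
    for off in range(0, len(data), block_size):
        chunk = data[off:off + block_size]
        blocks.append((hashlib.sha1(chunk).digest(), chunk))
    return blocks
```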

Replication: DHash replicates each block on the k servers immediately after the block's successor. Why? Even if the block's successor fails, the block is still available. Failure independence among these servers is expected because a server's location on the ring is determined by a hash of its IP address, not by its physical location.
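A sketch of replica placement; successor_list() is a placeholder returning servers in ring order starting at the block's successor, and k = 6 matches the replication level used in the experiments later in these slides:

```python
def place_replicas(key: bytes, successor_list, k: int = 6):
    """Return the servers that should hold copies of the block: its successor
    plus the k servers that immediately follow it on the ring; any one of them
    can serve the block if the successor fails."""
    servers = successor_list(key)
    return servers[:k + 1]
```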

Replication (cont'd): CFS could save space by storing coded pieces of blocks, but storage space is not expected to be a highly constrained resource. The placement of replicas also lets a client select the replica with the fastest download: the result of a Chord lookup is the immediate predecessor of the node responsible for the block, and that node's successor list contains latency entries for the nearby nodes that hold the replicas.

Caching: caching blocks prevents overloading the servers that hold popular data. Using Chord, clients contact servers closer and closer to the desired key; once the source, or an intermediate cached copy, is found, all servers just contacted along the lookup path receive a copy of the block to cache. Cached blocks are replaced in least-recently-used order.
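A sketch of the per-server cache with least-recently-used replacement; the class name and capacity parameter are illustrative:

```python
from collections import OrderedDict

class LRUBlockCache:
    """Per-server cache of popular blocks, evicted in least-recently-used order."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks = OrderedDict()              # key -> block, oldest first

    def get(self, key):
        if key in self.blocks:
            self.blocks.move_to_end(key)         # mark as recently used
            return self.blocks[key]
        return None

    def insert(self, key, block):
        self.blocks[key] = block
        self.blocks.move_to_end(key)
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)      # evict the least recently used block
```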

Load Balance: virtual servers, i.e. one physical server acting as several "virtual" servers. The administrator can configure the number of virtual servers based on the server's storage and network capacity. Possible side effect: more nodes means more hops in the Chord algorithm. Solution: allow a machine's virtual servers to look at each other's routing tables.

Quotas: control the amount of data a publisher can inject. Quotas based on reliable publisher identities won't work because they require centralized administration, so CFS uses quotas based on the IP address of publishers. Each server imposes a limit of 0.1% of its capacity per IP address, so as total system capacity grows, the amount a publisher may store grows with it. The scheme is not easy to subvert because publishers must respond to initial confirmation requests sent to the claimed address.
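A sketch of the per-IP check each server could apply locally; the parameter names are assumptions, and the 0.1% figure comes from the slide above:

```python
def accept_insert(publisher_ip: str, block_size: int, usage_by_ip: dict,
                  local_capacity_bytes: int, quota_fraction: float = 0.001) -> bool:
    """Each server independently limits any single IP address to a small
    fraction (0.1%) of its own capacity, so the total amount a publisher may
    store grows as the whole system's capacity grows."""
    used = usage_by_ip.get(publisher_ip, 0)
    return used + block_size <= quota_fraction * local_capacity_bytes
```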

Updates and Deletion: only the publisher may modify data. A server accepts a block under one of two conditions: (1) it is marked as a content-hash block and the supplied key equals the SHA-1 hash of the block's content, or (2) it is marked as a signed block and is signed by a public key whose SHA-1 hash is the block's CFS key. There is no explicit delete: publishers must periodically refresh blocks they want kept, and CFS deletes blocks that have not been refreshed recently.
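The two acceptance rules written out as a sketch; verify() is a placeholder signature check:

```python
import hashlib

def accept_block(key: bytes, data: bytes, public_key: bytes = None,
                 signature: bytes = None, verify=None) -> bool:
    """A server accepts a block only if one of the two conditions holds."""
    # Content-hash block: the supplied key equals the SHA-1 hash of the content.
    if key == hashlib.sha1(data).digest():
        return True
    # Signed block: signed by a public key whose SHA-1 hash is the block's CFS key.
    if public_key is not None and signature is not None and verify is not None:
        return (key == hashlib.sha1(public_key).digest()
                and verify(public_key, data, signature))
    return False
```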

Experimental Results, real deployment: 12 machines spread over the Internet in the US, the Netherlands, Sweden, and South Korea.

Lookup: measured over a range of server counts, with 10,000 blocks and 10,000 lookups for random blocks. The number of hops per lookup grows roughly linearly when the number of servers is plotted on a log scale, so lookup cost is O(log N).

Load Balance. Theoretical: 64 physical servers with 1, 6, and 24 virtual servers each. Actual: 10,000 blocks on 64 physical servers with 6 virtual servers each.

Caching: lookups of a single block in a 1,000-server system; without caching, the average lookup path length was 5.7 hops (measured over 10 look-ups).

Storage Space Control: varying the number of virtual servers, with 7 physical servers running 1, 2, 4, 8, 16, 32, 64, and 128 virtual servers each, and 10,000 blocks.

Effect of Failure: 1,000 blocks in a 1,000-server system, each block with 6 replicas; a fraction of the servers fail before the stabilization algorithm is run.

Effect of Failure (cont'd): same set-up as before, with X% of the servers failing.

Conclusion: CFS is a highly scalable, available, secure read-only file system. It uses the peer-to-peer Chord protocol for lookup, and uses replication and caching to achieve availability and load balance. Quotas provide simple but effective protection against the insertion of large amounts of malicious data.