Wide-Area Cooperative Storage with CFS
Presented by Hakim Weatherspoon, CS294-4: Peer-to-Peer Systems
Slides liberally borrowed from the SOSP 2001 CFS presentation and from "High Availability, Scalable Storage, Dynamic Peer Networks: Pick Two" by Charles Blake and Rodrigo Rodrigues, presented at HotOS 2003.
CFS is by Frank Dabek, M. Frans Kaashoek, David Karger, Robert Morris, *Ion Stoica (MIT and *Berkeley)
[Figure: nodes connected across the Internet]

Slide 2: Design Goals
Spread storage burden evenly (avoid hot spots)
Tolerate unreliable participants
Fetch speed comparable to whole-file TCP
Avoid O(#participants) algorithms
– Centralized mechanisms [Napster], broadcasts [Gnutella]
Simplicity
– Does simplicity imply provable correctness? More precisely, could you build CFS correctly?
– What about performance?
CFS attempts to solve these challenges
– Does it?

Slide 3: CFS Summary
CFS provides peer-to-peer read-only storage
Structure: DHash and Chord
Claims to be efficient, robust, and load-balanced
– Does CFS achieve any of these qualities?
It uses block-level distribution
The prototype is as fast as whole-file TCP
Storage promise ⇒ redundancy promise
– ⇒ data must move as members leave!
– ⇒ lower bound on bandwidth usage

Slide 4: Client-Server Interface
Files have unique names
Files are read-only (single writer, many readers)
Publishers split files into blocks and place the blocks into a hash table
Clients check files for authenticity [SFSRO]
[Figure: the FS client issues "insert file f" / "lookup file f" requests, which become "insert block" / "lookup block" operations on the server nodes]
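To make the block interface concrete, here is a minimal Python sketch (not CFS's actual code): the publisher splits a file into 8 KByte blocks, names each block by the SHA-1 hash of its contents, and inserts the blocks into a hash table; the client re-hashes each fetched block to check authenticity. The in-memory `dht` dict and the function names are illustrative stand-ins for DHash.

```python
import hashlib

BLOCK_SIZE = 8 * 1024  # CFS uses 8 KByte blocks

dht = {}  # stand-in for the distributed hash table

def insert_file(data: bytes) -> list[bytes]:
    """Publisher side: split into blocks, insert each block under H(block)."""
    block_ids = []
    for off in range(0, len(data), BLOCK_SIZE):
        block = data[off:off + BLOCK_SIZE]
        block_id = hashlib.sha1(block).digest()
        dht[block_id] = block          # insert block
        block_ids.append(block_id)
    return block_ids                   # referenced from the file's inode block

def lookup_file(block_ids: list[bytes]) -> bytes:
    """Client side: fetch each block and check it against its content hash."""
    out = []
    for block_id in block_ids:
        block = dht[block_id]          # lookup block
        assert hashlib.sha1(block).digest() == block_id, "corrupt or forged block"
        out.append(block)
    return b"".join(out)
```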

Slide 5: Server Structure
DHash stores, balances, replicates, and caches blocks
DHash uses Chord [SIGCOMM 2001] to locate blocks
Why blocks instead of files? Easier load balance (remember the complexity of PAST)
[Figure: each node (Node 1, Node 2) runs a DHash layer on top of a Chord layer]

Slide 6: CFS File System Structure
The root block is identified by a public key
– and signed by the corresponding private key
Other blocks are identified by the hash of their contents
What is wrong with this organization?
– The path of blocks from a data block up to the root block must be modified for every update.
– This is okay because the system is read-only.
[Figure: root block (public key + signature) → H(D) → directory block D → H(F) → inode block F → H(B1), H(B2) → data blocks B1 and B2]
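A rough sketch of the naming scheme this slide describes, under the assumption that the signature step can be abstracted away: data, inode, and directory blocks are named by the SHA-1 of their contents, while the root block is named by the hash of the publisher's public key and carries a signature over the directory hash. The `sign` callable and the block encodings are placeholders, not the SFSRO formats.

```python
import hashlib

def H(b: bytes) -> bytes:
    return hashlib.sha1(b).digest()

def publish(data_blocks, public_key: bytes, sign):
    """Build the block tree bottom-up; returns {block_id: block}."""
    store = {}
    # Data blocks: named by the hash of their contents.
    data_ids = []
    for blk in data_blocks:
        store[H(blk)] = blk
        data_ids.append(H(blk))
    # Inode block F lists the data blocks' hashes.
    inode = b"".join(data_ids)
    store[H(inode)] = inode
    # Directory block D points at the inode block (a real directory maps names to inode hashes).
    directory = H(inode)
    store[H(directory)] = directory
    # Root block: named by H(public key), signed with the private key, points at the directory.
    # Updating any block changes every hash up to the root, which is fine since CFS is read-only.
    root = public_key + sign(H(directory))
    store[H(public_key)] = root
    return store
```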

Slide 7: DHash/Chord Interface
lookup() returns a list of node IDs closest in ID space to the block ID
– Sorted, closest first
[Figure: on each server, DHash calls Chord's Lookup(blockID), which returns a list of <node ID, IP address> pairs; Chord maintains a finger table of <node ID, IP address> entries]
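A hedged sketch of the layering implied here: Chord's lookup(blockID) returns <node ID, address> pairs ordered closest-first in ID space, and DHash walks that list until some node returns the block. The flat sorted ring and the `fetch_block` RPC stub stand in for the real finger-table routing and network calls.

```python
from bisect import bisect_left

class Chord:
    def __init__(self, ring, id_space=2**160):
        self.ring = sorted(ring)          # [(node_id, addr)] sorted by node_id
        self.id_space = id_space

    def lookup(self, block_id, count=6):
        """Return up to `count` (node_id, addr) pairs, closest successor first."""
        ids = [nid for nid, _ in self.ring]
        i = bisect_left(ids, block_id % self.id_space)
        return [self.ring[(i + k) % len(self.ring)] for k in range(count)]

class DHash:
    def __init__(self, chord, fetch_block):
        self.chord = chord
        self.fetch_block = fetch_block    # RPC stub: (addr, block_id) -> bytes or None

    def get(self, block_id):
        for node_id, addr in self.chord.lookup(block_id):
            block = self.fetch_block(addr, block_id)
            if block is not None:         # first live node holding the block wins
                return block
        return None
```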

Slide 8: DHash Uses Other Nodes to Locate Blocks
[Figure: Chord ring with nodes N5, N10, N20, N40, N50, N60, N68, N80, N99, N110; a Lookup(BlockID=45) is forwarded through intermediate nodes toward the block's successor]

Slide 9: Storing Blocks
Long-term blocks are stored for a fixed time
– Publishers need to refresh them periodically
The cache uses LRU replacement
[Figure: a node's disk is divided into an LRU cache and long-term block storage]
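A small sketch, assuming the two storage areas shown on this slide: long-term blocks expire after a fixed interval unless the publisher refreshes them, and cached copies are evicted LRU. The TTL, cache capacity, and in-memory dictionaries are illustrative, not CFS's actual parameters.

```python
import time
from collections import OrderedDict

LONG_TERM_TTL = 24 * 3600        # hypothetical refresh interval
CACHE_CAPACITY = 1024            # hypothetical number of cached blocks

long_term = {}                   # block_id -> (block, expiry_time)
cache = OrderedDict()            # block_id -> block, in LRU order

def store_long_term(block_id, block):
    long_term[block_id] = (block, time.time() + LONG_TERM_TTL)

def refresh(block_id):
    block, _ = long_term[block_id]
    store_long_term(block_id, block)     # publisher re-asserts interest

def expire_long_term():
    now = time.time()
    for bid in [b for b, (_, exp) in long_term.items() if exp < now]:
        del long_term[bid]               # unrefreshed blocks age out

def cache_put(block_id, block):
    cache[block_id] = block
    cache.move_to_end(block_id)
    if len(cache) > CACHE_CAPACITY:
        cache.popitem(last=False)        # evict the least recently used block
```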

Slide 10: Replicate Blocks at r Successors
Each block is replicated at its r successors, r = 2 log N
Node IDs are the SHA-1 of the node's IP address
– This ensures independent replica failure
[Figure: Chord ring; Block 17 is stored at its successor N20 and replicated at the following nodes]
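A sketch of replica placement under the stated scheme: node IDs are the SHA-1 of the node's IP address, block IDs are the SHA-1 of content, and a block lands on its successor plus the following nodes until r copies exist, with r = 2 log N. The ring arithmetic is simplified for illustration.

```python
import hashlib
import math
from bisect import bisect_left

def sha1_id(data: bytes) -> int:
    return int.from_bytes(hashlib.sha1(data).digest(), "big")

def replica_set(block_id: int, ring: list, r: int) -> list:
    """ring: sorted node IDs. Return the block's successor and the next r-1 nodes."""
    i = bisect_left(ring, block_id)
    return [ring[(i + k) % len(ring)] for k in range(r)]

# Node IDs are SHA-1 of each node's IP address; block IDs are SHA-1 of content.
ring = sorted(sha1_id(("10.0.0.%d" % i).encode()) for i in range(1, 11))
r = 2 * int(math.log2(len(ring)))              # r = 2 log N
replicas = replica_set(sha1_id(b"block 17"), ring, r)
```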

Slide 11: Lookups Find Replicas
[Figure: Chord ring; a Lookup(BlockID=17) is routed toward Block 17's successors and falls back to a replica when the first fetch fails]
RPCs:
1. Lookup step
2. Get successor list
3. Failed block fetch
4. Block fetch

Slide 12: First Live Successor Manages Replicas
A node can locally determine that it is the first live successor
[Figure: Chord ring; the first live successor holds Block 17 and keeps a copy at a later successor]

Slide 13: DHash Copies to Caches Along the Lookup Path
[Figure: Chord ring; a Lookup(BlockID=45) is routed through intermediate nodes, and the fetched block is copied back to their caches]
RPCs:
1. Chord lookup
2. Chord lookup
3. Block fetch
4. Send to cache
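One plausible way to express the path caching described here, with `lookup_path`, `fetch_block`, and `send_to_cache` as hypothetical stand-ins for the real RPCs: after a successful fetch, a copy of the block is pushed to the nodes contacted during the Chord lookup, so popular blocks accumulate near the clients that request them.

```python
def fetch_with_path_caching(block_id, lookup_path, fetch_block, send_to_cache):
    """lookup_path: the nodes visited by the Chord lookup, in order."""
    block = fetch_block(lookup_path[-1], block_id)   # the final node holds the block
    if block is not None:
        for node in lookup_path[:-1]:                # nodes along the lookup path
            send_to_cache(node, block_id, block)     # they add it to their LRU caches
    return block
```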

Slide 14: Virtual Nodes Allow Heterogeneity
Hosts may differ in disk/network capacity
Hosts may advertise multiple IDs
– Chosen as SHA-1(IP address, index)
– Each ID represents a "virtual node"
Host load is proportional to the number of virtual nodes
Manually controlled
Sybil attack!
[Figure: Node A advertises virtual nodes N60, N10, N101; Node B advertises N5]
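A sketch of virtual-node ID generation as stated on this slide, ID = SHA-1(IP address, index); the exact byte encoding of the (IP, index) pair is an assumption. A host with more capacity advertises more virtual nodes and therefore attracts a proportionally larger share of blocks.

```python
import hashlib

def virtual_node_ids(ip: str, num_virtual_nodes: int) -> list:
    """One ID per virtual node, derived from the host's IP and an index."""
    ids = []
    for index in range(num_virtual_nodes):
        digest = hashlib.sha1(f"{ip},{index}".encode()).digest()
        ids.append(int.from_bytes(digest, "big"))
    return ids

# Node A (a bigger host) advertises 3 virtual nodes, Node B just 1.
node_a = virtual_node_ids("18.26.4.9", 3)
node_b = virtual_node_ids("128.32.0.7", 1)
```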

Slide 15: Experiment (12 nodes, pre-PlanetLab)
One virtual node per host
8 KByte blocks
RPCs use UDP
Caching turned off
Proximity routing turned off
[Figure: map of testbed sites, with links to vu.nl, lulea.se, ucl.uk, kaist.kr, and a site in .ve]

Slide 16: CFS Fetch Time for a 1 MB File
Average over the 12 hosts
No replication, no caching; 8 KByte blocks
[Figure: fetch time (seconds) vs. prefetch window (KBytes)]

Slide 17: Distribution of Fetch Times for 1 MB
[Figure: distribution of fetch times (fraction of fetches vs. time in seconds) for 8, 24, and 40 KByte prefetch windows]

Slide 18: CFS Fetch Time vs. Whole-File TCP
[Figure: fraction of fetches vs. time (seconds) for CFS with a 40 KByte prefetch window and for whole-file TCP]

Slide 19: Robustness vs. Failures
Six replicas per block; a lookup fails only if all six replicas are on failed nodes, so with half the nodes failed the expected failure fraction is (1/2)^6 ≈ 1.6%
[Figure: fraction of failed lookups vs. fraction of failed nodes]

Slide 20: Revisit Assumptions
P2P purist ideals: cooperation, symmetry, decentralization
How realistic are these assumptions? In what domains are they valid?
[Figure: hosts ranging from fast and stable to slow and flaky, each contributing GBs of idle cheap disk to a fault-tolerant DHT, which is supposed to yield a distributed data store with all the *ilities: high availability, good scalability, high reliability, maintainability, flexibility]

Slide 21: Bandwidth for Redundancy Maintenance
Assume the average system size, N, is stable
– P(leave)/time = leaves/time/N = 1/Lifetime
– Join rate = permanent-leave rate = 1/Lifetime
– Leaves induce redundancy replacement: replacement size x replacement rate
– Joins cost the same
⇒ Maintenance BW > 2 x Space/Lifetime, i.e., Space/node < ½ x BW/node x Lifetime
Quality WAN storage scales with WAN bandwidth and member quality
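The bound on this slide can be written as a one-line function; the example numbers below are hypothetical and are not taken from the slides.

```python
def min_maintenance_bw(space_per_node_bytes: float, lifetime_seconds: float) -> float:
    """Lower bound on per-node maintenance bandwidth (bits per second):
    departures and the joins that balance them force roughly two copies of
    each stored byte to cross the network per member lifetime."""
    return 2 * space_per_node_bytes * 8 / lifetime_seconds

# Hypothetical example: serving 1 GB per node with a one-day mean lifetime.
bw_bps = min_maintenance_bw(1e9, 24 * 3600)
print(f"{bw_bps / 1e3:.0f} kbps")   # about 185 kbps just for maintenance
```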

Slide 22: Bandwidth for Redundancy Maintenance II
Maintenance BW ≈ 200 Kbps
Lifetime = median 2001 Gnutella session = 1 hour
Served space = 90 MB/node << donatable storage!

Slide 23: Peer Dynamics
The peer-to-peer "dream"
– Reliable storage from many unreliable components
Robust lookup perceived as critical
Bandwidth to maintain redundancy is the hard problem [Blake and Rodrigues 03]

Slide 24: Need Too Much BW to Maintain Redundancy
10M users; 25% availability; 1-week membership; 100 GB donation => 50 kbps
Wait! It gets worse… HW trends
High availability, scalable storage, dynamic membership: must pick two

Slide 25: Proposal #1: Server-to-Server DHTs
Reduce to a solved problem? Not really…
– Self-configuration
– Symmetry
– Scalability
– Dynamic load balance

Slide 26: Proposal #2: Complete Routing Information
Possible complications:
– Memory requirements
– Bandwidth [Gupta, Liskov, Rodrigues 03]
– Load balance
Multi-hop optimization makes sense only when many very dynamic members serve a little data
– (multi-hop not required if N < … per host / 40 bytes)

Slide 27: Proposal #3: Decouple the Networking Layer from the Data Layer
Layer of indirection
– a.k.a. distributed directory, location pointers, etc.
Combines a little of proposals #1 and #2
– The DHT no longer decides who, what, when, where, why, and how for storage maintenance.
– Separate policy from mechanism.

Slide 28 (Appendix): Chord Hashes a Block ID to Its Successor
Nodes and blocks have randomly distributed IDs in a circular ID space
Successor: the node with the next-highest ID; each block is stored at the successor of its ID
[Figure: ring with nodes N10, N32, N60, N80, N100; N32 holds B11, B30; N60 holds B33, B40, B52; N80 holds B65, B70; N100 holds B100; N10 holds B112, B120, …, B10]
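A sketch of the consistent-hashing rule in this figure, using a small 7-bit ID space for readability (Chord uses 160-bit SHA-1 IDs): each block is stored at the first node whose ID is greater than or equal to the block ID, wrapping around the ring.

```python
from bisect import bisect_left

ID_SPACE = 2**7                          # toy 7-bit ID space for readability
RING = [10, 32, 60, 80, 100]             # the node IDs in the figure, sorted

def successor(block_id: int) -> int:
    """Return the first node whose ID is >= block_id, wrapping around the ring."""
    i = bisect_left(RING, block_id % ID_SPACE)
    return RING[i % len(RING)]

assert successor(30) == 32               # B11, B30 -> N32
assert successor(52) == 60               # B33, B40, B52 -> N60
assert successor(70) == 80               # B65, B70 -> N80
assert successor(112) == 10              # B112, B120, ..., B10 wrap around to N10
```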

Slide 29 (Appendix): Basic Lookup
[Figure: ring; the query "Where is block 70?" is forwarded node by node around the ring until the answer "N80" comes back]
Lookups find the ID's predecessor; the predecessor's successor holds the block
Correct if the successor pointers are correct

Slide 30 (Appendix): Successor Lists Ensure Robust Lookup
Each node stores r successors, r = 2 log N
Lookups can skip over dead nodes to find blocks
[Figure: ring with nodes N5, N10, N20, N32, N40, N60, N80, N99, N110; each node keeps a successor list of the next three node IDs, e.g., N5 stores 10, 20, 32 and N110 stores 5, 10, 20]
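A minimal sketch of why a successor list makes lookups robust: if the immediate successor is dead, the fetch simply moves on to the next live entry. The `is_alive` probe is an illustrative stand-in for a real failure detector.

```python
def first_live_successor(successor_list, is_alive):
    """successor_list: a block's r successors, closest first; is_alive: liveness probe."""
    for node in successor_list:
        if is_alive(node):
            return node                  # this node serves (and re-replicates) the block
    return None                          # all r successors failed, so the lookup fails

# Example: the first two successors are down, so the fetch skips ahead to N80.
alive = {40: False, 60: False, 80: True}
assert first_live_successor([40, 60, 80], lambda n: alive[n]) == 80
```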

Slide 31 (Appendix): Chord Finger Table Allows O(log N) Lookups
[Figure: node N80's fingers point ½, ¼, 1/8, 1/16, 1/32, 1/64, and 1/128 of the way around the ring]
See [SIGCOMM 2001] for table maintenance
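A simplified, single-process sketch of a finger table and the lookup it enables, using the toy ring from the previous slides; it illustrates the idea rather than the real Chord protocol. finger[i] of node n points at the successor of n + 2^i, so each hop covers a large fraction of the remaining distance, which is what yields O(log N) hops.

```python
from bisect import bisect_left

M = 7                                    # bits in the toy ID space (Chord: 160)
RING = [10, 32, 60, 80, 100]             # sorted node IDs

def dist(a, b):
    """Clockwise distance from a to b on the ring."""
    return (b - a) % 2**M

def owner(x):
    """First node at or after ID x on the ring (the successor of x)."""
    i = bisect_left(RING, x % 2**M)
    return RING[i % len(RING)]

def finger_table(n):
    """finger[i] points at the successor of n + 2^i."""
    return [owner(n + 2**i) for i in range(M)]

def find_successor(start, key):
    """Route from node `start` toward the node responsible for `key`."""
    n = start
    while True:
        if dist(n, key) == 0:                          # n itself owns the key
            return n
        nxt = owner(n + 1)                             # n's immediate successor
        if dist(n, key) <= dist(n, nxt):               # key in (n, successor]: done
            return nxt
        # Otherwise jump to the farthest finger that does not overshoot the key.
        fingers = [f for f in finger_table(n) if 0 < dist(n, f) < dist(n, key)]
        n = max(fingers, key=lambda f: dist(n, f)) if fingers else nxt

assert find_successor(10, 70) == 80
assert find_successor(80, 20) == 32
```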