P2P: Advanced Topics Filesystems over DHTs and P2P research Vyas Sekar.

Slides:



Advertisements
Similar presentations
Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan MIT and Berkeley presented by Daniel Figueiredo Chord: A Scalable Peer-to-peer.
Advertisements

Peer to Peer and Distributed Hash Tables
Scalable Content-Addressable Network Lintao Liu
Peer-to-Peer (P2P) Distributed Storage 1Dennis Kafura – CS5204 – Operating Systems.
Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility Antony Rowstron, Peter Druschel Presented by: Cristian Borcea.
Clayton Sullivan PEER-TO-PEER NETWORKS. INTRODUCTION What is a Peer-To-Peer Network A Peer Application Overlay Network Network Architecture and System.
Chord: A scalable peer-to- peer lookup service for Internet applications Ion Stoica, Robert Morris, David Karger, M. Frans Kaashock, Hari Balakrishnan.
Xiaowei Yang CompSci 356: Computer Network Architectures Lecture 22: Overlay Networks Xiaowei Yang
Chord A Scalable Peer-to-peer Lookup Service for Internet Applications
Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications Robert Morris Ion Stoica, David Karger, M. Frans Kaashoek, Hari Balakrishnan MIT.
CS162 Section Lecture 9. KeyValue Server Project 3 KVClient (Library) Client Side Program KVClient (Library) Client Side Program KVClient (Library) Client.
Common approach 1. Define space: assign random ID (160-bit) to each node and key 2. Define a metric topology in this space,  that is, the space of keys.
Web Caching Schemes1 A Survey of Web Caching Schemes for the Internet Jia Wang.
Peer to Peer File Sharing Huseyin Ozgur TAN. What is Peer-to-Peer?  Every node is designed to(but may not by user choice) provide some service that helps.
Topics in Reliable Distributed Systems Lecture 2, Fall Dr. Idit Keidar.
Introducing: Cooperative Library Presented August 19, 2002.
Scalable Resource Information Service for Computational Grids Nian-Feng Tzeng Center for Advanced Computer Studies University of Louisiana at Lafayette.
Chord and CFS Philip Skov Knudsen Niels Teglsbo Jensen Mads Lundemann
Distributed Lookup Systems
Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek and Hari alakrishnan.
Object Naming & Content based Object Search 2/3/2003.
Chord-over-Chord Overlay Sudhindra Rao Ph.D Qualifier Exam Department of ECECS.
Topics in Reliable Distributed Systems Fall Dr. Idit Keidar.
1 CS 194: Distributed Systems Distributed Hash Tables Scott Shenker and Ion Stoica Computer Science Division Department of Electrical Engineering and Computer.
Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications 吳俊興 國立高雄大學 資訊工程學系 Spring 2006 EEF582 – Internet Applications and Services 網路應用與服務.
Peer To Peer Distributed Systems Pete Keleher. Why Distributed Systems? l Aggregate resources! –memory –disk –CPU cycles l Proximity to physical stuff.
Wide-area cooperative storage with CFS
Peer-to-Peer Networks Slides largely adopted from Ion Stoica’s lecture at UCB.
File Sharing : Hash/Lookup Yossi Shasho (HW in last slide) Based on Chord: A Scalable Peer-to-peer Lookup Service for Internet ApplicationsChord: A Scalable.
Storage management and caching in PAST PRESENTED BY BASKAR RETHINASABAPATHI 1.
Freenet: A Distributed Anonymous Information Storage and Retrieval System Presentation by Theodore Mao CS294-4: Peer-to-peer Systems August 27, 2003.
INTRODUCTION TO PEER TO PEER NETWORKS Z.M. Joseph CSE 6392 – DB Exploration Spring 2006 CSE, UT Arlington.
Wide-Area Cooperative Storage with CFS Robert Morris Frank Dabek, M. Frans Kaashoek, David Karger, Ion Stoica MIT and Berkeley.
Wide-area cooperative storage with CFS Frank Dabek, M. Frans Kaashoek, David Karger, Robert Morris, Ion Stoica.
Cooperative File System. So far we had… - Consistency BUT… - Availability - Partition tolerance ?
Content Overlays (Nick Feamster). 2 Content Overlays Distributed content storage and retrieval Two primary approaches: –Structured overlay –Unstructured.
Wide-Area Cooperative Storage with CFS Morris et al. Presented by Milo Martin UPenn Feb 17, 2004 (some slides based on slides by authors)
Chord & CFS Presenter: Gang ZhouNov. 11th, University of Virginia.
Peer to Peer Research survey TingYang Chang. Intro. Of P2P Computers of the system was known as peers which sharing data files with each other. Build.
The Impact of DHT Routing Geometry on Resilience and Proximity K. Gummadi, R. Gummadi..,S.Gribble, S. Ratnasamy, S. Shenker, I. Stoica.
Freenet: A Distributed Anonymous Information Storage and Retrieval System Ian Clarke, Oskar Sandberg, Brandon Wiley,Theodore W. Hong Presented by Zhengxiang.
Using the Small-World Model to Improve Freenet Performance Hui Zhang Ashish Goel Ramesh Govindan USC.
Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications Xiaozhou Li COS 461: Computer Networks (precept 04/06/12) Princeton University.
1 Distributed Hash Tables (DHTs) Lars Jørgen Lillehovde Jo Grimstad Bang Distributed Hash Tables (DHTs)
Introduction to DFS. Distributed File Systems A file system whose clients, servers and storage devices are dispersed among the machines of a distributed.
Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications Dr. Yingwu Zhu.
Scalable Content- Addressable Networks Prepared by Kuhan Paramsothy March 5, 2007.
Paper Survey of DHT Distributed Hash Table. Usages Directory service  Very little amount of information, such as URI, metadata, … Storage  Data, such.
Freenet “…an adaptive peer-to-peer network application that permits the publication, replication, and retrieval of data while protecting the anonymity.
1 Secure Peer-to-Peer File Sharing Frans Kaashoek, David Karger, Robert Morris, Ion Stoica, Hari Balakrishnan MIT Laboratory.
1 Distributed Hash Table CS780-3 Lecture Notes In courtesy of Heng Yin.
Computer Networking P2P. Why P2P? Scaling: system scales with number of clients, by definition Eliminate centralization: Eliminate single point.
CS 347Notes081 CS 347: Parallel and Distributed Data Management Notes 08: P2P Systems.
Large Scale Sharing Marco F. Duarte COMP 520: Distributed Systems September 19, 2004.
Malugo – a scalable peer-to-peer storage system..
1 Secure Peer-to-Peer File Sharing Frans Kaashoek, David Karger, Robert Morris, Ion Stoica, Hari Balakrishnan MIT Laboratory.
Peer-to-Peer (P2P) File Systems. P2P File Systems CS 5204 – Fall, Peer-to-Peer Systems Definition: “Peer-to-peer systems can be characterized as.
Data Structure of Chord  For each peer -successor link on the ring -predecessor link on the ring -for all i ∈ {0,..,m-1} Finger[i] := the peer following.
Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications * CS587x Lecture Department of Computer Science Iowa State University *I. Stoica,
Peer-to-Peer Information Systems Week 12: Naming
CS 268: Lecture 22 (Peer-to-Peer Networks)
Distributed Hash Tables
DHT Routing Geometries and Chord
Distributed P2P File System
Peer-to-Peer (P2P) File Systems
Peer-to-Peer Storage Systems
Chord and CFS Philip Skov Knudsen
Consistent Hashing and Distributed Hash Table
P2P: Distributed Hash Tables
Peer-to-Peer Information Systems Week 12: Naming
Presentation transcript:

P2P: Advanced Topics Filesystems over DHTs and P2P research Vyas Sekar

DHT Overview DHTs provide a simple primitive  put (key,value)  get (key)  Data/Nodes distributed over a key-space High-level idea: Move closer towards the object in the key-space  Typically O(logN) lookup time  Typically O(logN) neighbors in routing table

How to build applications over DHTS DHTs provide mechanisms for retrieval  Chord etc. don’t store keys and values How to build a file-system over DHTs?  Practical implementation issues  Storage granularity  Replication  Network latency  Reliability  …. Case Study: CFS

What we would like to have.. Decentralized control  Ordinary internet hosts Scalability  Performance should scale with #nodes Availability  Resilience to node failures Load balancing  Irrespective of workload/node capacities

CFS Design

Design Overview Filesystem structure  Similar to UNIX  Instead of disk blocks/addresses use DHash block and block identifies  Each block is either data or meta-data  Parent block contains ids of children Insert the file blocks into CFS  Hash of each blocks content as its id  Root blocks need to be signed for integrity

CFS file system structure

Chord: Server selection Original Chord idea only guarantees O(logN) lookup time  But these O(logN) could be very long physical latencies  E.g., Inter-continental links! Server selection added in CFS  Estimate total cost of using each node in the set of potential next hops  Assume latencies are transitive

DHash Layer Performs fetches for clients Distributes file blocks among servers Maintains cached and replicated copies Load balance: Control over how much data each server stores Management: Quotas/Updates/Deletes

DHash Design: Block vs. File ? What granularity to store file-system objects? File-level storage + lower lookup cost - load imbalance Block-level storage + balance load, efficient for large popular files - more lookups per file CFS chooses block-level storage  Network bandwidth for lookup is low  Lookup latency is hidden by pre-fetching  File-level storage uses capacity inefficiently

DHash: Replication Replicate each block on k servers  After the successor on the ring Independence  Servers close on key- space unlikely to be correlated in network Also makes it easy to speed up downloads

DHash Caching Caching to avoid hot-spots When a lookup succeeds, the client sends a copy to each server along the Chord-path  Just second-to-last server?  At each lookup step, check if cached copy exists Replacement: Least recently used policy  Blocks farther away in ID-space likely to be discarded earlier Cache consistency not a problem!  Data is indexed by a content-hash

DHash: Load Balance Servers may have different network and storage capacities Introduce notion of “virtual server”  Each physical node may host multiple CFS servers  Configured depending on capacity May increase lookup latency?  Shortcuts by sharing Chord routing information

DHash: Management Quotas to prevent abuse  Per-IP limit data you can enter into CFS Updates  Read-only semantics  Only publisher can update the file system Deletes  No explicit delete: data stored for an agreed interval and discarded unless explicitly requested  Automatically gets rid of malicious inserts

Some interesting P2P/DHT topics Different geometries possible  Ring e.g., Chord  Hypercube e.g., CAN  Prefix-trees e.g., PRR trees  Butterfly network e.g., Viceroy

Some interesting P2P/DHT topics Can we do better than the O(logN) time?  Use aggressive caching  Use object popularity Can we do better than O(logN) space?

Some interesting P2P/DHT topics Can we add locality to DHTs?  Support range-queries  So that file blocks get hashed to nearby locations  Locality-sensitive hashing  Network latency to be explicitly modeled

Some interesting P2P/DHT topics P2P streaming?  ESM etc. take the approach of building “latency- optimized” overlays instead of general purpose DHTs  Why -- DHTs don’t have network locality

Some interesting P2P/DHT topics Anonymous storage  Freenet  Publius  Probabilistic routing and caching  Cant predict where the object came from!