Defragmenting DHT-based Distributed File Systems. Jeffrey Pang, Srinivasan Seshan (Carnegie Mellon University); Phillip B. Gibbons, Michael Kaminsky (Intel Research Pittsburgh); Haifeng Yu (National University of Singapore).

Presentation transcript:

1 Defragmenting DHT-based Distributed File Systems
Jeffrey Pang, Srinivasan Seshan (Carnegie Mellon University)
Phillip B. Gibbons, Michael Kaminsky (Intel Research Pittsburgh)
Haifeng Yu (National University of Singapore)

2 Traditional DHTs
– Each node is responsible for a random key range
– Objects are assigned random keys

3 Defragmented DHT (D2)
– Related objects are assigned contiguous keys
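The contrast between the two placement schemes can be sketched in a few lines. This is a minimal illustration, not the system's actual encoding: `traditional_key` is the standard hash-the-name approach, while `d2_key` is a simplified stand-in that left-aligns the pathname bytes in the key so that names that sort together get nearby keys (the real D2 scheme encodes each path component, as described later in the deck).

```python
import hashlib

def traditional_key(name: str) -> int:
    # Traditional DHT: hash the object name, so related objects land
    # on unrelated, effectively random points of the 160-bit key space.
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

def d2_key(name: str) -> int:
    # D2-style sketch: derive the key directly from the pathname so that
    # names that sort together receive contiguous keys. Here we simply
    # left-align the first 20 bytes of the path in a 160-bit key.
    return int.from_bytes(name.encode()[:20].ljust(20, b"\x00"), "big")

# Files in the same directory share a long key prefix under d2_key,
# so they fall on one or two adjacent nodes instead of random ones.
siblings = ["/home/bob/Mail/INBOX", "/home/bob/Mail/Sent"]
```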

4 Why Defragment DHTs?
Traditional DHT vs. D2: objects in two tasks
Improves task locality:
– Task success rate: depend on fewer servers when accessing files
– Task performance: fewer DHT lookups when accessing files
[Figure: node failures cause both tasks to fail in the traditional DHT, while both succeed in D2]

7 D2 Contributions
– Simple, effective locality techniques
– Real defragmented DHT implementation
– Evaluation with real file system workloads
– Answers to three principal questions…

8 Questions
– Can task locality be maintained simply?
– Does locality outweigh parallelism?
– Can load balance be maintained cheaply?

9 Questions
– Can task locality be maintained simply?
– Does locality outweigh parallelism?
– Can load balance be maintained cheaply?

10 Technique #1: Locality-Preserving Keys
– Preserve locality in DHT placement: assign related objects nearby keys
– Group related objects: leverage namespace locality; preserve in-order traversal of the file system (e.g., files in the same directory are related)

11 Assigning Object Keys
[Figure: a key is the concatenation of an encoded user id, path, and block id (bid); e.g., users Bill and Bob receive adjacent key prefixes 611… and 612…, and Bob's Docs directory extends his prefix]
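The figure's layout can be illustrated with a hypothetical helper (names and the tuple representation are mine, not the system's): a key is the concatenation (user id, path components, block id), so all of one user's objects sort together, then by directory, then by block.

```python
def object_key(user_id: int, path: str, block_id: int):
    # Hypothetical sketch of the slide's key layout: concatenate the
    # user id, the encoded path components, and the block id. Tuples
    # compare element-wise, standing in for concatenated bit fields.
    parts = tuple(p for p in path.split("/") if p)
    return (user_id, parts, block_id)

keys = sorted([
    object_key(612, "/Docs/b.txt", 0),   # Bob
    object_key(611, "/Docs/a.txt", 1),   # Bill, second block
    object_key(611, "/Docs/a.txt", 0),   # Bill, first block
])
# Both blocks of Bill's file are adjacent, and all of Bill's keys
# precede Bob's, mirroring the 611…/612… prefixes in the figure.
```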

12 Practical Concern
– Key distribution is not uniformly random → object placement is no longer uniform → load imbalance if node IDs are random
– Solution: dynamically balance load (discussed later in the talk)

13 DHT Nodes Accessed

14 Task Availability Evaluation
How much does D2 reduce task failures?
– Task = sequence of related accesses (e.g., 'ls -l' accesses a directory and its files' metadata)
– Estimate tasks from the trace: accesses less than 1 sec apart belong to the same task
Evaluated infrastructure DHT scenario:
– Faultload: PlanetLab failure trace (247 nodes)
– Workload: Harvard NFS trace
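The talk's task-estimation heuristic is straightforward to express in code. A minimal sketch (function and trace names are mine): accesses that occur within 1 second of the previous access are grouped into one task.

```python
def group_into_tasks(accesses, gap=1.0):
    # Estimate tasks as in the talk's heuristic: accesses within `gap`
    # seconds of the previous access belong to the same task.
    # `accesses` is a list of (timestamp, object) pairs.
    tasks, current, last_t = [], [], None
    for t, obj in sorted(accesses):
        if last_t is not None and t - last_t > gap:
            tasks.append(current)   # gap exceeded: start a new task
            current = []
        current.append(obj)
        last_t = t
    if current:
        tasks.append(current)
    return tasks

trace = [(0.0, "dir"), (0.2, "a"), (0.9, "b"), (5.0, "c")]
# group_into_tasks(trace) -> [["dir", "a", "b"], ["c"]]
```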

15 Systems Compared (8 KB blocks)
– Traditional-File (PAST)
– Traditional (CFS)
– D2

16 Task Unavailability
What fraction of tasks fail? [Figure: task unavailability, median marked]

17 Questions
– Can task locality be maintained simply?
– Does locality outweigh parallelism?
– Can load balance be maintained cheaply?

18 Technique #2: Lookup Cache
– Task locality → objects reside on a few nodes; cache lookup results to avoid future lookups
– Client-side lookup cache: each lookup returns an IP address and its key range; if a key falls in a cached key range, the lookup is avoided
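The lookup cache described on the slide can be sketched as a sorted map from key ranges to node addresses (class and method names here are illustrative, not from the implementation): one DHT lookup teaches the client a whole range, and any later key inside a cached range resolves locally.

```python
import bisect

class LookupCache:
    # Client-side lookup cache (sketch): remember the (key range, node
    # address) pair returned by each DHT lookup; a later key that falls
    # inside a cached range needs no lookup at all.
    def __init__(self):
        self.starts = []    # sorted range start keys, for binary search
        self.entries = []   # parallel list of (lo, hi, addr) tuples

    def insert(self, lo, hi, addr):
        i = bisect.bisect_left(self.starts, lo)
        self.starts.insert(i, lo)
        self.entries.insert(i, (lo, hi, addr))

    def find(self, key):
        # Return the cached node address covering `key`, or None (miss).
        i = bisect.bisect_right(self.starts, key) - 1
        if i >= 0:
            lo, hi, addr = self.entries[i]
            if lo <= key <= hi:
                return addr
        return None

cache = LookupCache()
cache.insert(100, 199, "10.0.0.5")   # learned from one DHT lookup
# cache.find(150) -> "10.0.0.5"  (hit: no DHT lookup needed)
# cache.find(250) -> None        (miss: must perform a DHT lookup)
```

Because D2 places a task's objects in contiguous key ranges, a single cached range covers many of the task's subsequent accesses, which is why the cache pays off more under D2 than under random placement.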

19 Performance Evaluation
How much faster are tasks in D2 than Traditional?
Deploy real implementation on Emulab:
– Same infrastructure DHT scenario as before
– Emulate world-wide network topology
– All DHTs compared use the lookup cache
Task replay modes:
– Sequential: accesses depend on each other (e.g., 'make')
– Parallel: accesses do not depend on each other (e.g., 'cat *')

20 Performance Speedup How much faster is D2 vs. Traditional?

21 Performance Speedup How much faster is D2 vs. Traditional-File?

22 Questions
– Can task locality be maintained simply?
– Does locality outweigh parallelism?
– Can load balance be maintained cheaply?

23 Technique #3: Dynamic Load Balancing
Why?
– Object placement is no longer uniform; random node IDs → imbalance
Simple dynamic load balancing:
– Karger [IPTPS '04], Mercury [SIGCOMM '04]
– Fully distributed
– Converges to load balance quickly
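The Karger/Mercury style of balancing can be caricatured in a few lines. This is a deliberately simplified, centralized sketch of a distributed protocol (the threshold, the function name, and the assumption that the light node's old load is absorbed by a neighbor are all mine): when one node is much more loaded than another, the lightly loaded node leaves its position and rejoins to take half of the heavy node's range.

```python
def rebalance_step(loads, threshold=2.0):
    # One sketch step of Karger/Mercury-style load balancing: if the
    # heaviest node carries more than `threshold` times the lightest
    # node's load, the light node leaves (its old load is assumed to be
    # absorbed by a neighbor, omitted here) and rejoins in the middle of
    # the heavy node's key range, splitting that load in half.
    # `loads` maps node -> current load. Returns True if a move happened.
    heavy = max(loads, key=loads.get)
    light = min(loads, key=loads.get)
    if loads[heavy] > threshold * max(loads[light], 1e-9):
        half = loads[heavy] / 2
        loads[light] = half
        loads[heavy] = half
        return True
    return False

loads = {"A": 16.0, "B": 4.0, "C": 1.0}
while rebalance_step(loads):
    pass
# After convergence, no node carries more than ~2x any other node's load.
```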

24 Dynamic Load Balancing

25 Load Balance How balanced is storage load over time?

26 Load Balancing Overhead
How much data is transferred to balance load?
[Figure: MB/node for writes vs. load balancing; load-balance traffic is ≈50% of write traffic, roughly the cost of 1 or 2 more replicas]

27 Conclusions
– Can task locality be maintained simply? Yes: namespace locality is enough to improve availability by an order of magnitude
– Does locality outweigh parallelism? Yes: real workloads observe % speedup
– Can load balance be maintained cheaply? Maintain balance: yes, better than traditional. Cheaply: maybe, 1 byte transferred for every 2 bytes written
Defragmentation: a useful DHT technique

28 Thank You.

29 Backup Slides Or, more graphs than anyone cares for…

30 Related Work
– MOAT [Yu et al., NSDI '06]: only studied availability; no mechanism for load balance; no evaluation of file systems
– SkipNet [Harvey et al., USITS '03]: provides administrative locality; can't do both locality and global load balance
– Mercury [Bharambe et al., SIGCOMM '04]: dynamic load balancing for range queries; D2 uses Mercury as its routing substrate but has an independent storage layer and load-balancing optimizations (see paper)

31 Encoding Object Names
– Traditional key encoding: key = SHA1(data)
– D2 key encoding: key = SHA1(pkey) | dir1 | dir2 | ... | file | block no. | ver. hash
[Figure: example for intel.pkey/home/bob/Mail/INBOX; fields labeled 160 bits for SHA1(pkey), 16-bit hashed path components (depth: 12, width: 65k), a 64-bit block number (b-tree-like addressing of 8 KB blocks, petabytes per file), a 160-bit version hash, and a 480-bit total]
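The D2 encoding above can be sketched as bit packing. This is a reconstruction from the slide's field labels only (the helper names are mine, and the exact field order and widths in the real system may differ; the version-hash field is omitted for brevity): a 160-bit SHA1 of the public key, then up to 12 path components hashed to 16 bits each, then a 64-bit block number in the low bits so that a file's blocks are contiguous.

```python
import hashlib

def h16(component: str) -> int:
    # 16-bit hash of one path component ("width: 65k" on the slide).
    return int.from_bytes(hashlib.sha1(component.encode()).digest()[:2], "big")

def d2_encode(pkey: str, path: str, block_no: int, depth: int = 12) -> int:
    # Sketch of the slide's D2 key layout: 160-bit SHA1(pkey), then
    # `depth` 16-bit hashed path components (unused slots are zero,
    # keeping shorter paths before their descendants), then a 64-bit
    # block number. Version hash omitted; real layout may differ.
    key = int.from_bytes(hashlib.sha1(pkey.encode()).digest(), "big")
    parts = [p for p in path.split("/") if p][:depth]
    for i in range(depth):
        key = (key << 16) | (h16(parts[i]) if i < len(parts) else 0)
    return (key << 64) | block_no

k1 = d2_encode("intel.pkey", "/home/bob/Mail/INBOX", 0)
k2 = d2_encode("intel.pkey", "/home/bob/Mail/INBOX", 1)
# The two blocks differ only in the low 64 bits, so they are contiguous
# in the key space and land on the same node (or adjacent nodes).
```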

32 DHT Nodes Accessed How many nodes does a user access?

33 Task Unavailability
What fraction of tasks fail? [Figure: application tasks vs. human tasks]

34 Task Unavailability How many nodes are accessed per task?

35 Task Unavailability (Per-User) What fraction of each user’s tasks fail?

36 DHT Lookup Messages How many lookup messages are used?

37 Lookup Cache Utilization How many lookups miss the cache?

38 Performance Speedup (Per-Task) How much faster is D2 vs. Traditional?

39 Performance Speedup (Per-User) How much faster is D2 vs. Traditional?

40 Load Balance (Webcache) How balanced is storage load over time?

41 Load Balancing Overhead How much data is transferred to balance load?