Consistent Hashing: Load Balancing in a Changing World
David Karger, Eric Lehman, Tom Leighton, Matt Levine, Daniel Lewin, Rina Panigrahy (MIT)

Presentation transcript:


Caches can Load Balance
- Numerous items on a central server.
- Requests can swamp the server.
- Distribute items among caches.
- Clients get items from caches.
- Server gets only one request per item.
(Diagram: items distributed among caches; users get items from the caches rather than the server.)

Who Caches What?
- Each cache should hold few items
  - else the cache gets swamped by clients
- Each item should be in few caches
  - else the server gets swamped by caches
  - and cache invalidations/updates are expensive
- Browser must know the right cache
  - fast, local computation

A Solution: Hashing
Example: y = ax + b (mod n)
- Intuition: assigns items to "random" caches
  - few items per cache
- Easy to compute which cache holds an item
(Diagram: items assigned to caches by the hash function; users use the hash to compute the cache for an item.)
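As a concrete illustration, here is a minimal Python sketch of such a standard hashing scheme; the coefficients a and b, the example item id, and the count of 10 caches are arbitrary choices for the example, not values from the slides:

```python
# A minimal sketch of standard hashing: map an item id directly to one of
# n caches with y = a*x + b (mod n). The coefficients a and b are arbitrary.
a, b = 31, 7

def cache_for(item: int, n: int) -> int:
    """Assign an item to one of n caches using y = a*item + b (mod n)."""
    return (a * item + b) % n

print(cache_for(12345, 10))  # any client can compute this locally
```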

Problem: Adding Caches
- Suppose a new cache arrives.
- How do we work it into the hash function?
- Natural change: y = ax + b (mod n+1)
- Problem: this changes the bucket for nearly every item
  - every cache will be flushed
  - the server gets swamped with new requests
Goal: when a bucket is added, few items move.
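A quick, self-contained check of this effect (same hash form as above, arbitrary coefficients): when the modulus grows from n to n+1, almost every item is reassigned.

```python
# Illustrative check: rehashing with mod n+1 instead of mod n reassigns
# nearly every item to a different cache (a, b, n are arbitrary here).
a, b, n = 31, 7, 10
moved = sum((a * x + b) % n != (a * x + b) % (n + 1) for x in range(100_000))
print(f"{moved / 100_000:.1%} of items change caches")  # roughly 90% for n = 10
```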

Problem: Inconsistent Views
- Each client knows about a different set of caches: its view.
- The view affects the choice of cache for an item
  - the same item may hash to many places: caches swamp the server with requests for the item
  - many items may hash to the same place: clients swamp the cache
- Goal: despite the views, items are evenly distributed, each into only a few caches.

Solution: Consistent Hashing
- Use a standard hash function to map both caches and items to points in the unit interval.
  - "random" points spread uniformly
- An item is assigned to the nearest cache point in the view.
- Computation is as easy as a standard hash function.
(Diagram: caches (buckets) and items placed as points on the unit interval.)
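The following is a minimal Python sketch of this idea, not the authors' implementation: SHA-1 stands in for the "standard hash function" that places points, and an item is assigned to the next cache point clockwise, a common variant of the nearest-cache rule described on the slide.

```python
import bisect
import hashlib

def point(key: str) -> float:
    """Hash any key to a 'random' point in the unit interval [0, 1)."""
    h = int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")
    return h / 2**64

class ConsistentHash:
    """Caches and items map to points in [0, 1); an item is served by the
    first cache point at or after its own point, wrapping around."""

    def __init__(self, caches=()):
        self._points = []   # sorted cache points
        self._names = {}    # cache point -> cache name
        for c in caches:
            self.add(c)

    def add(self, cache: str) -> None:
        p = point(cache)
        bisect.insort(self._points, p)
        self._names[p] = cache

    def remove(self, cache: str) -> None:
        p = point(cache)
        self._points.remove(p)
        del self._names[p]

    def lookup(self, item: str) -> str:
        i = bisect.bisect(self._points, point(item)) % len(self._points)
        return self._names[self._points[i]]

ring = ConsistentHash(["cache-A", "cache-B", "cache-C"])
print(ring.lookup("/images/logo.png"))  # computed locally, like a normal hash
```

Because both caches and items hash to fixed points, adding or removing a cache only affects the items whose nearest point changes; everything else stays where it is.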

Properties
- All buckets get roughly the same number of items (as with standard hashing).
- When the k-th bucket is added, only a 1/k fraction of the items move
  - and only from a few caches.
When a cache is added, minimal reshuffling of cached items is required.
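An illustrative check of the 1/k property, reusing the ConsistentHash sketch above (the item and cache counts are arbitrary; a single run varies because the sketch places only one point per cache rather than many virtual points):

```python
# Adding an 11th cache should move roughly 1/11 of the items in expectation,
# and every moved item lands on the new cache. Reuses ConsistentHash above.
items = [f"item-{i}" for i in range(50_000)]
ring = ConsistentHash([f"cache-{k}" for k in range(10)])
before = {x: ring.lookup(x) for x in items}
ring.add("cache-10")
moved = sum(before[x] != ring.lookup(x) for x in items)
print(f"{moved / len(items):.1%} of items moved, all onto the new cache")
```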

Multiple View Properties
- Despite multiple views, each cache gets few items
  - no cache is overloaded.
- Despite multiple views, each item is in only a few caches
  - the server is protected, and cache updates are easy.
The system tolerates multiple, inconsistent views of the caches (and is also fault tolerant).
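A small illustrative experiment, again reusing the ConsistentHash sketch above (view sizes and counts are arbitrary): even when four clients each see a different random subset of the caches, a given item is typically served by only one or two distinct caches across all views.

```python
import random

# Four inconsistent views, each seeing 15 of 20 caches; look up one item in each.
caches = [f"cache-{k}" for k in range(20)]
views = [ConsistentHash(random.sample(caches, 15)) for _ in range(4)]
print({view.lookup("item-42") for view in views})  # usually 1-2 caches, not 4
```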

Load Balancing
- Task: distribute items into buckets
  - data to memory locations
  - files to disks
  - tasks to processors
  - web pages to caches (our motivation)
- Goal: even distribution

Problem: No Synchronization
- Each user knows about a different set of caches: a view.
- The view affects the assignment of items to caches.
- Problems when there are multiple views:
  - The items assigned to a specific cache are different in each view.
  - These sets could be essentially disjoint for standard hash functions.
  - Over all views, a cache is responsible for too many items.
    - so the cache is not large enough to contain the active set of items.
(Diagram: the items assigned to one cache over 4 views.)

Multiple Views (cont.)
- An item may be assigned to different caches in different views.
- A standard hash function may assign an item to a different cache in every view.
- Result: the item is requested from many caches
  - the server is swamped with requests for copies of the item
  - it is hard to update the cached copies
(Diagram: an item assigned to different caches in each of 4 views.)

Problem: Adding Caches
- A new cache means a new hash function
  - natural change: y = ax + b (mod n+1)
- Standard hash functions completely redistribute the items when the range of the function changes:
  - every cache will be flushed
  - the server is swamped with requests as items are reshuffled between caches
  - the new hash function must be broadcast to all users at the same time
    - some kind of global synchronization?...