Precept 6: Hashing & Partitioning (Peng Sun)

Presentation transcript:

Precept 6: Hashing & Partitioning (Peng Sun)

Server Load Balancing: Balance load across servers. Common technique: round-robin?

Limitations of Round-Robin: Packets of a single connection get spread over several servers.

Multipath Load Balancing: Balance load over multiple paths. Round-robin again?

Limitations of Round-Robin: Paths have different RTTs, causing packet reordering.

Data Partitioning: Spread a large amount of data across multiple servers. Randomly? Then the data is very hard to retrieve.

Goals in Distributing Traffic: Deterministic (flow-level consistency; easy to retrieve content from servers). Low cost (very fast to compute/look up). Uniform load distribution.

Hashing to the Rescue: Map items in one space into another space in a deterministic way. [Diagram: keys (H. Potter, R. Weasley, H. Granger, T. M. Riddle) fed through a hash function to produce hashes.]

Basic Hash Function: Modulo. Simple for uniform data: if data is uniformly distributed over N, with N >> n, the hash function is just (key mod n). What if the data is non-uniform? Typically randomize it first with a hash that processes the data in blocks, e.g., SHA-1 from cryptography.
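
The slide's approach can be sketched in a few lines of Python. This is a minimal illustration, not the precept's code: it uses SHA-1 (mentioned on the slide) to randomize non-uniform keys before taking the modulo, and the key names are made up for the demo.

```python
import hashlib

def bucket_for(key: str, n: int) -> int:
    """Map an arbitrary key to one of n buckets deterministically.

    Python's built-in hash() varies between runs, so we use a stable
    cryptographic hash (SHA-1) to randomize non-uniform keys first,
    then take the modulo.
    """
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n

# Even highly structured (non-uniform) keys spread evenly after hashing:
keys = [f"user-{i}" for i in range(10_000)]
counts = [0] * 10
for k in keys:
    counts[bucket_for(k, 10)] += 1
```

Because the mapping is deterministic, the same key always lands in the same bucket, which is exactly the flow-level consistency the earlier slide asks for.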

Hashing for Server Load Balancing: Virtual IP / Dedicated IP approach: one global-facing virtual IP for all servers; hash clients' network info (source IP/port) to pick a server; use Direct Server Return (DSR).

Load Balancing with DSR: Reverse traffic doesn't pass through the load balancer, giving greater scalability. [Diagram: load balancer, switches, and server cluster.]

Equal-Cost Multipath (ECMP) Routing: Balance flows over multiple paths. Path selection via hashing: the number of buckets equals the number of outgoing links; hash network info (source/destination IP) to a link.
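
Both the load-balancer slide and ECMP use the same pattern: hash the flow identifier, not the packet. A rough sketch (the IP addresses are placeholders, and real routers hash in hardware over the full 5-tuple):

```python
import hashlib

def pick_link(src_ip: str, dst_ip: str, num_links: int) -> int:
    """Select an outgoing link by hashing the flow's network info.

    Every packet of the same flow hashes to the same link, so packets
    are never reordered across paths, unlike per-packet round-robin.
    """
    flow = f"{src_ip}->{dst_ip}".encode()
    digest = hashlib.sha1(flow).digest()
    return int.from_bytes(digest[:8], "big") % num_links

# All packets of one flow take one path; different flows spread out.
link_a = pick_link("10.0.0.1", "10.0.0.9", 4)
link_b = pick_link("10.0.0.1", "10.0.0.9", 4)
```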

Data Partitioning: Hashing approach: hash the data ID to a bucket; the data is stored on the machine responsible for that bucket; cost is O(# of buckets). Non-hashing approach, e.g., a "directory": data can be stored anywhere, but maintenance cost is O(# of entries).

But… basic hashing is not enough. Map data onto k = 10 servers with (dataID mod k). What if one server is down? Change to mod (k-1)? Then we need to shuffle the data!

Consistent Hashing: Servers are also placed (uniformly) in the key space. [Diagram: red nodes are servers' positions in the key space; blue nodes are data items' positions.] Which server (red node) to use: the clockwise-closest one.
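
The clockwise-closest rule can be sketched with a sorted ring and binary search. This is a minimal illustration (no virtual nodes, and the class and server names are invented for the demo):

```python
import bisect
import hashlib

def h(key: str) -> int:
    """Position in a 2^32 circular key space."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:4], "big")

class ConsistentHashRing:
    def __init__(self, servers):
        # Each server is hashed into the same key space as the data.
        self.ring = sorted((h(s), s) for s in servers)

    def server_for(self, key: str) -> str:
        # Clockwise closest: first server position at or after h(key),
        # wrapping around the circle.
        pos = bisect.bisect(self.ring, (h(key), ""))
        return self.ring[pos % len(self.ring)][1]

    def remove(self, server: str):
        self.ring = [(p, s) for p, s in self.ring if s != server]

ring = ConsistentHashRing(["server-1", "server-2", "server-3"])
keys = [f"key-{i}" for i in range(200)]
before = {k: ring.server_for(k) for k in keys}
ring.remove("server-2")
after = {k: ring.server_for(k) for k in keys}
```

When server-2 is removed, only the keys that were on server-2 move (to the next server clockwise); everything else stays put.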

Features of Consistent Hashing: Smoothness: addition/removal of a bucket does not cause movement among existing buckets (only the immediately neighboring buckets are affected). Spread and load: an object maps to a small set of buckets that lie near it. Balance: no bucket has a disproportionate number of objects.

Another Important Problem: How to quickly answer YES or NO? Is this website malicious? Is this data in the cache?

Properties We Desire: Really, really quick YES or NO answers. False positives are okay (say yes, but actually no). Never a false negative (say no, but actually yes).

Bloom Filter: A membership test (in or not in). Use k independent hash functions on each item. If all k bit positions are 1, the item is in.

Bloom Filter: Only uses a few bits. Fast and memory-efficient. Never gives a false negative; false positives are possible.

Demo of Bloom Filter: Start with an m-bit array, filled with 0s. To insert x, hash it k times; if H_i(x) = a, set Array[a] = 1. To check whether y is in the set, check the array at each H_i(y): all k values must be 1. It is possible to have a false positive: all k values are 1, but y is not in the set.
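
The demo on this slide translates almost directly into code. A minimal sketch (the salting trick to derive k hash functions from SHA-1, and the parameters m and k, are illustrative choices, not from the slides):

```python
import hashlib

class BloomFilter:
    """m-bit array with k hash functions, as in the slide's demo."""

    def __init__(self, m: int = 1024, k: int = 3):
        self.m, self.k = m, k
        self.bits = [0] * m  # start with all 0s

    def _positions(self, item: str):
        # Derive k "independent" hashes by salting one hash function.
        for i in range(self.k):
            d = hashlib.sha1(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(d[:8], "big") % self.m

    def add(self, item: str):
        # Insert: set Array[H_i(item)] = 1 for each of the k hashes.
        for p in self._positions(item):
            self.bits[p] = 1

    def might_contain(self, item: str) -> bool:
        # All k positions must be 1. A True answer may be a false
        # positive; a False answer is always correct.
        return all(self.bits[p] for p in self._positions(item))
```

Note the one-sided error: a set bit may have been set by other items, so membership answers of "yes" are only probably correct, but "no" is definitive.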

Applications of Bloom Filters: Google Chrome uses a BF as a first check on whether a website is malicious. Storage services (e.g., Apache Cassandra) use BFs to check cache hit/miss. Lots of other applications…

Thanks!

Backup

Hashing in P2P File Sharing: Two layers: ultrapeers and leaves. A leaf sends a hash table of its content to its ultrapeer. A search request floods the ultrapeer network; each ultrapeer checks its hash tables to find the leaf holding the content.

Applying the Basic Strategy: Consider the data-partitioning problem: given document X, choose one of k servers to store it. Modulo hashing: place X on server i = (X mod k). Problem? The data may not be uniformly distributed. So place X on server i = (hash(X) mod k). Problem? What happens if a server fails or joins (k → k±1)?
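
The cost of the k → k±1 problem is easy to measure. A quick sketch (document names are invented): place 1,000 documents with mod-10, then recompute with mod-9 as if one server failed, and count how many documents would have to move.

```python
import hashlib

def bucket(doc: str, k: int) -> int:
    # hash(X) mod k placement from the slide, using SHA-1 for stability.
    d = hashlib.sha1(doc.encode()).digest()
    return int.from_bytes(d[:8], "big") % k

docs = [f"doc-{i}" for i in range(1000)]

# How many documents land on a different server when k goes 10 -> 9?
moved = sum(bucket(d, 10) != bucket(d, 9) for d in docs)
```

With mod-k placement, roughly 90% of the documents move when one of ten servers fails, which is the reshuffling problem that motivates consistent hashing.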

Uses of Hashing: Equal-Cost Multipath Routing; Network Load Balancing; P2P File Sharing; Data Partitioning in Storage Services.