School of Computing Clemson University Fall, 2012

Presentation transcript:

Lecture 6. Hashing Applications III
CpSc 212: Algorithms and Data Structures
Brian C. Dean
School of Computing, Clemson University
Fall, 2012

Hashing in Networking: Routing Tables
A router is a computer that forwards incoming packets along the appropriate output port based on an internal “routing table”. Routers need to operate quickly: packets must be forwarded as fast as possible…
Routing table:
  1.2.3.4 → port 3
  5.6.7.8 → port 2
  …
[Figure: a router receiving incoming packets from the network and sending them to its output ports]

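Hashing in Networking: Routing Tables
Routing tables also contain rules that apply to entire blocks of destination IP addresses. Example: “1.2.0.0/16” stands for the block of IP addresses with 1.2 as their initial 16 bits.
Multiple rules can now apply to an incoming packet; the most specific rule (the one with the longest matching prefix) should be used.
Routing table:
  1.2.0.0/16 → port 3
  1.2.3.0/24 → port 2
  …
Example: a packet with destination 1.2.3.4 matches both rules; the /24 rule is more specific, so the packet leaves on port 2.
[Figure: a packet with destination 1.2.3.4 arriving at the router, and the router's output ports]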
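
The slides do not show how a router stores its rules. As a minimal sketch, assuming one hash table per prefix length (a common hash-based approach, not necessarily what real routers use), the "most specific rule wins" lookup could look like this. The names `RoutingTable`, `add_rule`, and `lookup` are made up for illustration.

```python
def ip_to_int(dotted):
    """Pack a dotted-decimal IPv4 address into a 32-bit integer."""
    a, b, c, d = (int(part) for part in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


class RoutingTable:
    """Hypothetical routing table: one dict (hash table) per prefix length."""

    def __init__(self):
        self._by_length = {}  # prefix length -> {masked address -> port}

    @staticmethod
    def _mask(address, length):
        # Keep only the first `length` bits of a 32-bit address.
        return address & (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF

    def add_rule(self, cidr, port):
        network, length = cidr.split("/")
        length = int(length)
        key = self._mask(ip_to_int(network), length)
        self._by_length.setdefault(length, {})[key] = port

    def lookup(self, dotted):
        address = ip_to_int(dotted)
        # Try the longest (most specific) prefixes first.
        for length in sorted(self._by_length, reverse=True):
            port = self._by_length[length].get(self._mask(address, length))
            if port is not None:
                return port
        return None  # no rule matches


table = RoutingTable()
table.add_rule("1.2.0.0/16", 3)
table.add_rule("1.2.3.0/24", 2)
```

Each probe is an expected constant-time hash lookup, so a query costs at most one probe per distinct prefix length (at most 33 for IPv4).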

Aside: Bit Fiddling
An IP address is stored in a 4-byte (32-bit) unsigned integer. E.g., “1.2.3.4” is the integer 16909060, stored in binary as:
  A = 00000001 00000010 00000011 00000100
        “1”      “2”      “3”      “4”
We often want to get / set / toggle individual runs of bits in a binary number like this. E.g., “set the 3rd ‘octet’ to 5 instead of 3”.
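
To make the packing concrete, here is a small sketch that converts between dotted-decimal text and the 32-bit integer. The helper names `ip_to_int` and `int_to_ip` are illustrative, not from the slides.

```python
def ip_to_int(dotted):
    """Pack "a.b.c.d" into one 32-bit integer, first octet in the top 8 bits."""
    a, b, c, d = (int(part) for part in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def int_to_ip(value):
    """Unpack a 32-bit integer back into dotted-decimal text."""
    return ".".join(str((value >> shift) & 255) for shift in (24, 16, 8, 0))


A = ip_to_int("1.2.3.4")
bits = format(A, "032b")  # 32-character binary string, zero-padded
```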

Aside: Bit Fiddling
A & B computes the bitwise “AND” of A and B.
A >> x shifts A right by x bits (dividing by 2^x and discarding the remainder).
Example: extract the 3rd octet of A using the mask B = 255 << 8, which has ones only in the 3rd octet:
  (A & B) >> 8 = [ …zeros… ] 00000011
Similarly, A << x shifts A left by x bits (which multiplies by 2^x).
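
The mask-then-shift pattern generalizes to any octet. A sketch, assuming octets are numbered 1 to 4 from the left as on the slide (`get_octet` is an illustrative name):

```python
def get_octet(address, k):
    """Return octet k (1 = leftmost, 4 = rightmost) of a 32-bit address."""
    shift = 8 * (4 - k)           # octet 3 sits 8 bits above the bottom
    mask = 255 << shift           # eight 1-bits covering just that octet
    return (address & mask) >> shift


A = 16909060  # the integer form of 1.2.3.4
third = get_octet(A, 3)
```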

Aside: Bit Fiddling
A | B computes the bitwise “OR” of A and B.
A ^ B computes the bitwise “XOR” of A and B.
~A is the bitwise complement of A (toggles all bits from 0 to 1 and vice-versa).
How do we zero out the 3rd octet of A?
  A = A & (~(255 << 8));
How do we set the 3rd octet of A to a new value x?
  A = (A & (~(255 << 8))) | (x << 8);
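
The two expressions above can be wrapped in small helpers. This is a sketch with illustrative names; it assumes A is a non-negative 32-bit value and 0 <= x <= 255 (a larger x would spill into the neighbouring octet).

```python
def clear_third_octet(A):
    """Zero out bits 8..15 (the 3rd octet) and leave everything else alone."""
    return A & ~(255 << 8)


def set_third_octet(A, x):
    """Replace the 3rd octet with x: clear it first, then OR in the new bits."""
    return (A & ~(255 << 8)) | (x << 8)


A = 16909060                      # 1.2.3.4
cleared = clear_third_octet(A)    # 1.2.0.4
updated = set_third_octet(A, 5)   # 1.2.5.4
```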

XOR Tricks
XOR is a wonderful operation, since applying it a second time cancels out the first application: (a XOR b) XOR b = a.
Swap two integers without using a temporary variable:
  a = a^b; b = a^b; a = a^b;
Simple problem: given an integer array A[1…N], all the numbers in A except one occur an even number of times. Find the number appearing an odd number of times.
Solution: XOR(A[1] … A[N]). Every number that occurs an even number of times cancels itself out, leaving only the number that occurs an odd number of times.
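
Both tricks are easy to check directly. A short sketch (function names are illustrative):

```python
from functools import reduce
from operator import xor


def xor_swap(a, b):
    """Swap two integers with three XORs and no temporary variable."""
    a = a ^ b
    b = a ^ b   # (a ^ b) ^ b == original a
    a = a ^ b   # (a ^ b) ^ original a == original b
    return a, b


def odd_one_out(values):
    """XOR everything together; values occurring an even number of times cancel."""
    return reduce(xor, values, 0)


swapped = xor_swap(6, 9)
answer = odd_one_out([4, 7, 4, 9, 9])
```

In languages with references or pointers, the swap trick zeroes the value if both operands refer to the same storage location, so it should only be applied to two distinct variables.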

Another Nice Problem
Alice and Betty are sitting across from each other. Each is wearing a hat bearing the number 0 or 1 (the two numbers could be the same). They can’t see their own hats. Alice and Betty write down guesses A and B for their own numbers. If it turns out that A or B (or both) is correct, they win! Is there a strategy that is guaranteed to win?
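
The slide leaves the question open. One strategy that works, assuming each player can see the other's hat, is for Alice to bet that the hats are equal and Betty to bet that they differ; exactly one of them is then right. The sketch below checks all four hat assignments by brute force (function names are illustrative).

```python
from itertools import product


def alice_guess(betty_hat):
    """Alice bets the two hats are equal, so she guesses what she sees."""
    return betty_hat


def betty_guess(alice_hat):
    """Betty bets the two hats differ, so she guesses the opposite (XOR with 1)."""
    return alice_hat ^ 1


def strategy_always_wins():
    """Check every (Alice's hat, Betty's hat) combination."""
    return all(
        alice_guess(betty) == alice or betty_guess(alice) == betty
        for alice, betty in product((0, 1), repeat=2)
    )
```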

Load Balancing
Large websites often consist of multiple servers sitting behind a router. We would like to balance the load assigned to these servers…
[Figure: incoming packets from the network reach a router, which passes them on to Servers 1 to 4]

Load Balancing
Random assignment would balance the load. However, it doesn’t consistently assign the same incoming source IP to the same server (useful if servers maintain “shopping carts” or other state).
Good alternative solution: map each packet with source IP address A to server h(A).
[Figure: incoming packets from the network reach a router, which passes them on to Servers 1 to 4]
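
The slide does not say which hash function h to use. As a minimal sketch, assuming servers are numbered 0 to n-1 and using SHA-256 from Python's standard library as a stand-in for h:

```python
import hashlib


def server_for(source_ip, num_servers):
    """Map a source IP (as text) to a server index in 0..num_servers-1."""
    digest = hashlib.sha256(source_ip.encode("ascii")).digest()
    h = int.from_bytes(digest[:8], "big")   # first 64 bits of the digest
    return h % num_servers


first = server_for("1.2.3.4", 4)
second = server_for("1.2.3.4", 4)   # same input, so same server
```

Because the mapping depends only on the source IP, repeat visitors land on the server that already holds their state, while different IPs spread roughly evenly.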

Consistent Hashing: Motivation
OK, so we are mapping packets with source IP address A to server h(A). What if servers fail unexpectedly, or are added to or removed from the website? How can we update our assignments now?
[Figure: incoming packets from the network reach a router, which passes them on to Servers 1 to 4]
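
To see why this is a problem, suppose (as an assumption, since the slides do not fix h) that h(A) is a hash reduced modulo the number of servers. Going from 4 servers to 3 then reassigns most clients, not just those of the lost server; for a uniform hash about three quarters of them move. A sketch that measures this:

```python
import hashlib


def server_for(source_ip, num_servers):
    """Hypothetical h(A): SHA-256 of the IP text, reduced modulo the server count."""
    digest = hashlib.sha256(source_ip.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


def fraction_remapped(ips, old_count, new_count):
    """Fraction of IPs whose assigned server changes when the count changes."""
    moved = sum(
        1 for ip in ips
        if server_for(ip, old_count) != server_for(ip, new_count)
    )
    return moved / len(ips)


ips = [f"10.0.{i // 256}.{i % 256}" for i in range(2000)]
moved_fraction = fraction_remapped(ips, 4, 3)
```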

Consistent Hashing: One Approach
Hash source IPs AND servers to a circle. I.e., instead of mapping keys → table cells, map both the keys and cells to a common space!
Each IP address is now assigned to the next server in clockwise rotation from it on the circle.
Does this fix the problem of servers failing?
[Figure: a circle with Servers 1 to 4 and the IP addresses 1.2.3.4, 5.6.7.8, and 9.10.11.12 hashed to points on it]
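
A minimal sketch of the circle, assuming "clockwise" means increasing hash value with wrap-around and using SHA-256 as a stand-in hash. The class name `HashRing` and its methods are illustrative.

```python
import bisect
import hashlib


def ring_hash(label):
    """Hash any text label (server name or IP) to a point on the circle."""
    digest = hashlib.sha256(label.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    def __init__(self, servers):
        # Sorted list of (position on circle, server name).
        self._points = sorted((ring_hash(s), s) for s in servers)

    def remove(self, server):
        self._points = [(p, s) for p, s in self._points if s != server]

    def lookup(self, key):
        """Return the first server at or clockwise after the key's position."""
        i = bisect.bisect_left(self._points, (ring_hash(key), ""))
        return self._points[i % len(self._points)][1]   # wrap past the top


ring = HashRing(["server1", "server2", "server3", "server4"])
ips = [f"10.0.{i // 256}.{i % 256}" for i in range(2000)]
before = {ip: ring.lookup(ip) for ip in ips}
ring.remove("server3")
after = {ip: ring.lookup(ip) for ip in ips}
```

Only the keys that belonged to the removed server change owner; every other assignment stays put. With one point per server, however, all of the orphaned keys move to a single neighbour.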

Consistent Hashing: One Approach
Why not map several instances of each server to the circle…
Now load from a failing server is spread more uniformly across the other servers.
So we have consistency in our assignments as well as some fault tolerance.
[Figure: the circle with several instances of Server 1 placed among Servers 2 and 3 and the IP addresses 1.2.3.4, 5.6.7.8, and 9.10.11.12]
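
A sketch of the "several instances" idea, assuming each server is hashed under the hypothetical labels "name#0", "name#1", and so on, with SHA-256 as a stand-in hash. The class name and the default of 50 instances are arbitrary choices for illustration.

```python
import bisect
import hashlib


def ring_hash(label):
    """Hash a text label to a point on the circle."""
    digest = hashlib.sha256(label.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class VirtualNodeRing:
    def __init__(self, servers, replicas=50):
        self._replicas = replicas
        self._points = []   # sorted list of (position, server name)
        for server in servers:
            self.add(server)

    def add(self, server):
        # Place `replicas` instances of this server around the circle.
        for r in range(self._replicas):
            bisect.insort(self._points, (ring_hash(f"{server}#{r}"), server))

    def remove(self, server):
        self._points = [(p, s) for p, s in self._points if s != server]

    def lookup(self, key):
        """Return the owner of the first instance at or clockwise after the key."""
        i = bisect.bisect_left(self._points, (ring_hash(key), ""))
        return self._points[i % len(self._points)][1]


ring = VirtualNodeRing(["server1", "server2", "server3", "server4"])
ips = [f"10.0.{i // 256}.{i % 256}" for i in range(2000)]
before = {ip: ring.lookup(ip) for ip in ips}
ring.remove("server1")
after = {ip: ring.lookup(ip) for ip in ips}
inherited_by = {after[ip] for ip in ips if before[ip] == "server1"}
```

Here `inherited_by` collects the servers that took over the failed server's keys; with many instances per server the load lands on several survivors rather than one.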

Distributed Hash Tables (DHTs)
What if we want to store a huge set of (key, value) pairs using a distributed network so that we can still perform insert, remove, and find?
We can’t just replicate the entire table across every server in the network.
We would ideally like a decentralized solution, which does not depend on a small set of “root” servers that effectively know the address of the server on which every object is stored.

Distributed Hash Tables: Birthday Paradox Solution?
Suppose there are N total servers. What if we store each object in a hash table on only X servers, chosen randomly? To look up an object, we pick a random set of X servers and query them all.
What is a good choice for X?
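
The slide leaves X as a question. A sketch for exploring it, assuming both the storage set and the query set are X distinct servers chosen uniformly at random: the lookup misses only if the two sets are disjoint, which happens with probability C(N-X, X) / C(N, X), roughly e^(-X²/N). So X on the order of sqrt(N) already gives a constant chance of success, and a small multiple of sqrt(N) makes a miss unlikely.

```python
def miss_probability(n_servers, x):
    """Probability that x random query servers avoid all x storage servers."""
    if 2 * x > n_servers:
        return 0.0   # the two sets are forced to overlap
    p = 1.0
    for i in range(x):
        # The i-th queried server must avoid the x storage servers.
        p *= (n_servers - x - i) / (n_servers - i)
    return p


N = 10_000
at_sqrt = miss_probability(N, 100)         # X = sqrt(N)
at_three_sqrt = miss_probability(N, 300)   # X = 3 * sqrt(N)
```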

The “Chord” DHT
Hash object keys and servers to a circle. Objects are stored on the next clockwise server.
Problem: there is no centralized “root” server that sees this entire picture and can tell us our next clockwise server.
We instead must be able to initiate an operation (insert, remove, find) by contacting an arbitrary server…
[Figure: a circle with Servers 1 to 4 and Keys 1 to 3 hashed to points on it]

The “Chord” DHT
Each server tracks the next server after it along the circle (actually, the next few, for fault tolerance purposes).
Now, starting from any server, we can walk around the circle, server by server, until we stop at the first server whose hash is just past the hash of the key of the object we want to find.
However, this can be slow if there are many servers (just like a linked list is slow…). How might we speed it up?
[Figure: a circle with Servers 1 to 4 and Keys 1 to 3 hashed to points on it]
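
A simplified, single-process sketch of the successor walk, assuming servers and keys have already been hashed to integers on the circle and that "clockwise" means increasing value with wrap-around. Real Chord nodes exchange network messages; here each hop is just a dictionary lookup, and the names are illustrative.

```python
def in_half_open(x, a, b):
    """True if x lies in the clockwise interval (a, b] on the circle."""
    if a < b:
        return a < x <= b
    return x > a or x <= b   # the interval wraps past zero (or a == b)


def walk_lookup(node_ids, start, key):
    """Follow successor pointers from `start`; return (owner of key, hops)."""
    nodes = sorted(node_ids)
    successor = {n: nodes[(i + 1) % len(nodes)] for i, n in enumerate(nodes)}
    current, hops = start, 0
    while not in_half_open(key, current, successor[current]):
        current = successor[current]   # one hop to the next server
        hops += 1
    return successor[current], hops


nodes = [100, 2000, 30000, 60000]
owner, hops = walk_lookup(nodes, 100, 25000)
```

In the worst case the walk visits nearly every server, which is the linked-list behaviour the slide warns about.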

The “Chord” DHT
Use longer-range links, like in a skip list! Each node points to a collection of servers around the circle…
If each server maintains O(log N) links to successors at distances 1, 2, 4, 8, etc., then we can reach our final destination with only O(log N) hops!
[Figure: Server 1 with links at distances “+1”, “+2”, “+4”, and “+8” around the circle, with Key 1 and Key 2 on the circle]
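
A simplified, single-process sketch of lookups with such links, assuming a 16-bit identifier circle and links from each server n to the first server at or after n + 2^i for each i. This omits joins, failures, and networking, and the names (`build_fingers`, `finger_lookup`) are illustrative rather than taken from the slides.

```python
import bisect

M = 16           # bits per identifier (an arbitrary choice for this sketch)
RING = 1 << M    # number of points on the circle


def in_open(x, a, b):
    """True if x lies strictly inside the clockwise interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b


def in_half_open(x, a, b):
    """True if x lies in the clockwise interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b


def successor(nodes, point):
    """First node at or clockwise after `point`; `nodes` must be sorted."""
    i = bisect.bisect_left(nodes, point % RING)
    return nodes[i % len(nodes)]


def build_fingers(nodes):
    """For each node n, links to the successors of n+1, n+2, n+4, ..."""
    return {n: [successor(nodes, n + (1 << i)) for i in range(M)] for n in nodes}


def finger_lookup(fingers, start, key):
    """Return (owner of key, hops), jumping as far as possible without passing key."""
    current, hops = start, 0
    while not in_half_open(key, current, fingers[current][0]):
        # Farthest link that still lies strictly before the key.
        # The "+1" link always qualifies here, so next() finds one.
        current = next(f for f in reversed(fingers[current])
                       if in_open(f, current, key))
        hops += 1
    return fingers[current][0], hops


nodes = sorted((i * 31153) % RING for i in range(1, 65))   # 64 distinct ids
fingers = build_fingers(nodes)
```

Each hop at least halves the remaining clockwise distance to the last server before the key, so in this sketch a lookup never takes more than M hops.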

Hashing as a Segue Towards Machine Learning…
h( ) = ?

Distilling Objects to Simpler “Feature Vectors”…
[Figure label: Unclassified object]