Optimal Fast Hashing Yossi Kanizo (Technion, Israel) Joint work with Isaac Keslassy (Technion, Israel) and David Hay (Hebrew Univ., Israel)

Hash Tables for Networking Devices
• Hash tables and hash-based structures are often used in high-speed devices:
  – Heavy-hitter flow identification
  – Flow state keeping
  – Flow counter management
  – Virus signature scanning
  – IP address lookup algorithms

Hash Tables
• In theory, hash tables are particularly suitable: O(1) memory accesses per operation (element insertion/query/deletion) at reasonable load
• But in practice, there is a big difference between an average of 1.1 memory accesses per operation and an average of 4
• Why not just 1 memory access? Collisions

Hash Tables for Networking Devices
• Collisions are unavoidable → wasted memory accesses
• For load ≤ 1, let a and d be the average and worst-case time (number of memory accesses) per element insertion
• Objective: minimize a and d
[Figure: elements 1, 2, 3 hashed into a memory array; colliding elements cost extra accesses]

Why We Care
• On-chip memory: memory accesses → power consumption
• Off-chip memory: memory accesses → lost on-/off-chip pin capacity
• Datacenters: memory accesses → network & server load
• Parallelism does not help reduce these costs
  – d serial or parallel memory accesses have the same cost

Traditional Hash Table Schemes
• Example 1: linked lists (chaining)
[Figure: buckets in memory, each pointing to a chain of elements]

Traditional Hash Table Schemes
• Example 1: linked lists (chaining)
• Example 2: linear probing (open addressing)
• Problem: in both schemes, the worst-case time cannot be bounded by a constant d (see the sketch below)
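
Concretely, here is a minimal Python sketch of both schemes (toy classes of our own, not code from the talk), showing why neither bounds the worst-case number of memory accesses:

```python
# Toy versions of the two classic schemes; in both, a single insertion
# may touch an unbounded number of memory locations.

class ChainingTable:
    def __init__(self, m):
        self.buckets = [[] for _ in range(m)]  # one chain per bucket

    def insert(self, key):
        b = hash(key) % len(self.buckets)
        self.buckets[b].append(key)            # chain can grow without bound

class LinearProbingTable:
    def __init__(self, m):
        self.slots = [None] * m

    def insert(self, key):
        i = hash(key) % len(self.slots)
        probes = 1
        while self.slots[i] is not None:       # probe run can grow without bound
            if probes == len(self.slots):
                raise RuntimeError("table is full")
            i = (i + 1) % len(self.slots)
            probes += 1
        self.slots[i] = key
        return probes                          # memory accesses for this insert
```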

High-Speed Hardware
• Enable overflows: if the time exceeds d → overflow list
  – Can be stored in an expensive CAM
  – Otherwise, overflow elements = lost elements
• Each bucket contains h elements
  – E.g.: a 128-bit memory word → h = 4 elements of 32 bits
• Assumption: access cost (read & write of a word) = 1 cycle
[Figure: a memory array of buckets of height h, plus a CAM holding overflowed elements]
• A code sketch of this model follows below
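
This model fits in a few lines of Python (the names m, h, buckets, and cam are ours; the CAM is just a plain list standing in for the hardware):

```python
# Hedged sketch of the hardware model: m buckets of up to h elements
# each, plus a list standing in for the overflow CAM. Reading or
# writing one memory word (one whole bucket) costs one cycle.
m, h = 5000, 4
buckets = [[] for _ in range(m)]  # each bucket holds at most h elements
cam = []                          # overflowed (or lost) elements end up here
```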

Possible Settings
• Static setting – insertions and queries only
• Dynamic setting – insertions, deletions, and queries
• Generalized setting – balancing the load among the buckets

Problem Formulation
• Given the average a and worst-case d memory accesses per operation, minimize the overflow rate γ

Example: Power of d-Random Choices
• d hash functions: pick the least loaded bucket
  – Break ties u.a.r. [Azar et al.]
• Intuition: can reach a low overflow rate γ
• … but average time a = worst-case time d → wasted memory accesses (sketch below)
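
A sketch of d-random insertion in the toy model above (the helper name d_random_insert is ours); note that every insertion pays exactly d accesses, so a = d:

```python
import random

def d_random_insert(buckets, h, cam, key, d):
    """Pick d buckets uniformly at random (standing in for d hash values)
    and insert into the least loaded one, breaking ties u.a.r."""
    choices = [random.randrange(len(buckets)) for _ in range(d)]
    least = min(len(buckets[b]) for b in choices)
    b = random.choice([c for c in choices if len(buckets[c]) == least])
    if len(buckets[b]) < h:
        buckets[b].append(key)
    else:
        cam.append(key)  # all d choices are full: the element overflows
    return d             # d accesses whether or not a collision occurs: a = d
```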

Other Examples
• d-left [Vöcking]
  – Same as d-random, but ties are broken to the left
• Cuckoo [Pagh et al.]
  – Whenever a collision occurs, moves stored elements to their other choices (sketch below)
  – Typically uses many more than d memory accesses on average
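
For intuition, here is a random-walk variant of cuckoo insertion with h = 1 and two hash functions (a simplified sketch of our own, not the exact scheme of Pagh et al.); counting accesses makes the last point visible:

```python
import random

def cuckoo_insert(slots, h0, h1, key, max_kicks=100):
    """Random-walk cuckoo insertion with one slot per bucket: when both
    choices are occupied, evict a resident element and reinsert it."""
    accesses = 0
    for _ in range(max_kicks):
        for hf in (h0, h1):
            b = hf(key)
            accesses += 1
            if slots[b] is None:
                slots[b] = key
                return accesses           # can far exceed 2 on average
        b = random.choice([h0(key), h1(key)])
        slots[b], key = key, slots[b]     # kick out the resident element
    return None                           # give up (a rehash, in practice)

# Usage sketch: m = 100; slots = [None] * m
# h0 = lambda k: hash(('a', k)) % m; h1 = lambda k: hash(('b', k)) % m
```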

Outline
• Static case
  – Overflow lower bound
  – Optimal schemes: SIMPLE, GREEDY, MHT
• Dynamic case
  – Comparison with the static case
  – Overflow lower bound
  – Overflow fraction depending on d

Overflow Lower Bound
• Objective: given any online scheme with average a and worst-case d, find a lower bound on the overflow rate γ
[Figure: h=4, load n/(mh)=0.95, fixed d; the region below the lower-bound curve, which no scheme can achieve, defines a capacity region]

Overflow Lower Bound
• Result: a closed-form lower-bound formula, given n elements in m buckets of height h
• Valid also for non-uniform hashes
• For n = m and h = 1, it reduces simply to γ ≥ e^(-a)
• Defines a capacity region for high-throughput hashing

Lower-Bound Example
[Figure: h=4, load n/(mh)=0.95; for a 3% overflow rate, the throughput can be at most 1/a = 2/3 of the memory rate]

Overflow Lower Bound
• Example: the d-left scheme reaches a low overflow rate γ, but a high average memory access rate a
[Figure: h=4, load n/(mh)=0.95, m=5,000]

The SIMPLE Scheme
• SIMPLE scheme: a single hash function
• Each bucket looks like a truncated linked list (sketch below)
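
As a sanity check, a quick Monte-Carlo sketch of SIMPLE (toy code of our own) reproduces the closed-form overflow for n = m and h = 1:

```python
import math
import random

def simple_overflow(n, m, h, trials=20):
    """Estimate the SIMPLE scheme's overflow fraction: one uniform hash
    per element, bucket capacity h, exactly a = 1 access per insertion."""
    lost = 0
    for _ in range(trials):
        load = [0] * m
        for _ in range(n):
            b = random.randrange(m)
            if load[b] < h:
                load[b] += 1
            else:
                lost += 1          # bucket already full: element overflows
    return lost / (trials * n)

# For n = m and h = 1, the overflow fraction tends to e^(-1) ≈ 0.368:
print(simple_overflow(10000, 10000, 1), 1 / math.e)
```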

Performance of SIMPLE Scheme
[Figure: h=4, load=0.95, m=5,000; the lower bound can actually be achieved for a = 1]

The GREEDY Scheme
• Using uniform hashes, try to insert each element greedily until it is either inserted or d hash values have been used (sketch below)
[Figure: memory and CAM, with d = 2]
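
A sketch of GREEDY in the same toy model (the helper name greedy_insert is ours): unlike d-random, it stops probing as soon as it finds room, so the average a falls below the worst case d:

```python
import random

def greedy_insert(buckets, h, cam, key, d):
    """Draw up to d uniform bucket choices one at a time and stop at the
    first non-full bucket; return the number of accesses actually used."""
    for t in range(1, d + 1):
        b = random.randrange(len(buckets))  # t-th uniform hash value
        if len(buckets[b]) < h:
            buckets[b].append(key)
            return t                        # inserted after t <= d accesses
    cam.append(key)                         # d full buckets in a row: overflow
    return d
```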

Performance of GREEDY Scheme
[Figure: d=4, h=4, load=0.95, m=5,000; the GREEDY scheme is always optimal up to a cut-off point a_co]

Performance of GREEDY Scheme
[Figure: d=4, h=4, load=0.95, m=5,000; overflow rate worse than 4-left, but better throughput (1/a)]

The MHT Scheme
• MHT (Multi-Level Hash Table) [Broder & Karlin]: d successive subtables, each with its own hash function (sketch below)
[Figure: 1st, 2nd, and 3rd subtables in memory, plus a CAM for overflow]
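
A sketch of MHT insertion (toy code of our own; the halving size ratio below is only an illustration, since the talk derives the actual geometric sizing):

```python
import random

def mht_insert(subtables, h, cam, key):
    """Try d successive subtables, each with its own hash function
    (modeled here by an independent uniform draw), stopping at the
    first subtable whose hashed bucket has room."""
    for t, table in enumerate(subtables, start=1):
        b = random.randrange(len(table))  # this subtable's hash value
        if len(table[b]) < h:
            table[b].append(key)
            return t                      # t accesses used
    cam.append(key)                       # full buckets in all d subtables
    return len(subtables)

# d = 4 subtables with geometrically falling sizes (illustrative ratio):
subtables = [[[] for _ in range(m)] for m in (4000, 2000, 1000, 500)]
```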

Performance of MHT Scheme
• Optimality of MHT up to a cut-off point a_co(MHT)
• Proof that the optimal subtable sizes fall geometrically
• Confirmed in simulations
[Figure: d=4, h=4, load=0.95, m=5,000; overflow rate close to 4-left, with much better throughput (1/a)]

Outline
• Static case
  – Overflow lower bound
  – Optimal schemes: SIMPLE, GREEDY, MHT
• Dynamic case
  – Comparison with the static case
  – Overflow lower bound
  – Overflow fraction depending on d

Dynamic vs. Static
• Dynamic hash tables are harder to model than static ones [Kirsch et al.]
• But past studies show the same asymptotic behavior with infinite buckets (insertions only vs. alternations):
  – Traditional hashing using linked lists – maximum bucket size of approx. log n / log log n [Gonnet, 1981]
  – d-random, d-left schemes – maximum bucket size of log log n / log d + O(1) [Azar et al., 1994; Vöcking, 1999] (simulation sketch below)
• As a designer, using the static model seems natural
  – Even if real-life devices have finite buckets
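
A quick simulation (a sketch of our own) illustrates the gap between the two asymptotics with infinite buckets:

```python
import random

def max_load(n, m, d):
    """Insert n elements into m infinite buckets, each element going to
    the least loaded of d uniformly random buckets; return the maximum
    bucket size (ties broken by order, which is enough for a sketch)."""
    load = [0] * m
    for _ in range(n):
        choices = [random.randrange(m) for _ in range(d)]
        load[min(choices, key=lambda b: load[b])] += 1
    return max(load)

n = 1000000
print(max_load(n, n, 1))  # single choice: grows like log n / log log n
print(max_load(n, n, 2))  # two choices: grows like log log n / log 2
```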

Degradation with Finite Buckets
• Real devices use finite buckets
• Surprising result: deletions degrade performance
[Figure: finite vs. infinite buckets, with H(1) = 3 and H(2) = 3; after removing element 1, element "2" is lost although its corresponding bucket is empty (runnable version below)]
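
The slide's example runs in a few lines of Python (our own toy encoding, with h = 1 and four buckets):

```python
# H(1) = H(2) = 3: both elements hash to bucket 3 (capacity h = 1).
buckets = {1: None, 2: None, 3: None, 4: None}
cam = []

def insert(key, hashed_bucket):
    if buckets[hashed_bucket] is None:
        buckets[hashed_bucket] = key
    else:
        cam.append(key)   # bucket already full: overflow to the CAM

insert(1, 3)              # bucket 3 now holds element 1
insert(2, 3)              # bucket 3 is full: element 2 goes to the CAM
buckets[3] = None         # remove element 1
print(buckets[3], cam)    # None [2] -> the bucket is empty, yet element 2
                          # remains stranded in the overflow CAM
```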

Comparing Static and Dynamic
• Static setting: insertions only
  – n = number of elements
  – m = number of buckets
• Dynamic setting: alternations between element insertions and deletions of randomly chosen elements
  – Fixed load of c = n / (mh)
• Fair comparison: given an average number of memory accesses a, minimize the overflow fraction γ

Overflow Lower Bound
• A closed-form lower bound on the overflow fraction, expressed in terms of r = ach
• Also holds for non-uniformly distributed hash functions (under some constraints)
• The lower bound is tight (SIMPLE, GREEDY)

Numerical Example
• For h = 1 and c = 1 (100% load), we get a lower bound of 1/(1+a)
• To get an overflow fraction of 1%, one needs at least 99 memory accesses per element
  – Infeasible for high-speed networking devices
• Compare with the tight bound of e^(-a) in the static case [Kanizo et al., INFOCOM 2009]
  – There, ~4.6 memory accesses suffice
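
The arithmetic behind both numbers, as a quick check:

```python
import math

# Dynamic case, h = 1, c = 1: overflow >= 1/(1+a).
# 1/(1+a) = 0.01  =>  a = 1/0.01 - 1 = 99 accesses per element.
print(1 / 0.01 - 1)   # 99.0

# Static case, same load: overflow = e^(-a).
# e^(-a) = 0.01  =>  a = ln(100) ≈ 4.6 accesses per element.
print(math.log(100))  # 4.605...
```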

Outline
• Static case
  – Overflow lower bound
  – Optimal schemes: SIMPLE, GREEDY, MHT
• Dynamic case
  – Comparison with the static case
  – Overflow lower bound
  – Overflow fraction depending on d

Overflow Fraction Depending on d
• So far, we relaxed the constraint on d
• We treated n elements with an average of a memory accesses as n·a distinct elements
• To take d into account, we must consider each element along with its own hash values

Graph Theory Approach
• Consider a bipartite graph:
  – Left vertices = elements
  – Right vertices = buckets (assume h = 1)
  – Edge = the bucket is one of the element's d choices

Graph Theory Approach
• We get a random bipartite graph where each left vertex has degree d
• Expected maximum-size matching = expected number of elements that can be inserted into the table, which yields a lower bound on the overflow fraction (sketch below)
• We derived an explicit expression for d = 2
• The upper bound can be achieved by Cuckoo hashing (equivalent to finding a maximum-size matching)
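
A numerical sketch of this computation (our own code, using Kuhn's augmenting-path algorithm for the maximum matching; the talk's d = 2 expression is analytic, and this only estimates it):

```python
import random

def expected_overflow(n, m, d, trials=10):
    """Estimate the minimum overflow fraction with h = 1: build the
    random bipartite graph (each element draws d distinct bucket
    choices) and subtract the maximum-matching size from n."""
    total_lost = 0
    for _ in range(trials):
        adj = [random.sample(range(m), d) for _ in range(n)]
        match = [-1] * m                   # bucket -> matched element

        def augment(e, seen):
            for b in adj[e]:
                if not seen[b]:
                    seen[b] = True
                    if match[b] == -1 or augment(match[b], seen):
                        match[b] = e
                        return True
            return False

        matched = sum(augment(e, [False] * m) for e in range(n))
        total_lost += n - matched          # elements no scheme could place
    return total_lost / (trials * n)

print(expected_overflow(1000, 1000, 2))    # overflow estimate for d = 2
```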

Summary
• We found lower and upper bounds on the achievable overflow fraction for both the static and dynamic cases
• Static models are not necessarily exact for dynamic hash tables
• Improved lower bound for d = 2 and a characterization of the performance of Cuckoo hashing

Thank you.