Peacock Hash: Deterministic and Updatable Hashing for High Performance Networking Sailesh Kumar Jonathan Turner Patrick Crowley.

Presentation transcript:

2 - Sailesh Kumar - 10/21/2015 - Overview
- Overview of Hash Tables and Segmented Hash Table
- Analysis and Limitations
  - Increased memory references
- Adding Bloom Filters per segment
- Selective Filter Insertion Algorithm
- Simulation Results and Analysis
- Conclusion

3 - Hash Tables
- Consider the problem of searching an array for a given value
  - If the array is not sorted, the search requires O(n) time
  - If the array is sorted, we can do a binary search
    - O(lg n) time
  - Can we do it in O(1) time?
    - Hash table
    - Use a hash function to map elements to table cells

4 - Hash Tables
- Suppose our hash function gave us the following values:
  - hash("apple") = 5
  - hash("watermelon") = 3
  - hash("grapes") = 8
  - hash("cantaloupe") = 7
  - hash("kiwi") = 0
  - hash("strawberry") = 9
  - hash("mango") = 6
  - hash("banana") = 2
  - hash("honeydew") = 6
- This is called a collision: "honeydew" hashes to cell 6, which "mango" already occupies
  - Now what?
[Figure: table cells in index order: kiwi (0), banana (2), watermelon (3), apple (5), mango (6), cantaloupe (7), grapes (8), strawberry (9)]
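
The collision on this slide can be reproduced with a few lines of Python. This is only an illustrative sketch: the hash values are copied from the slide, while the 10-cell table size and the helper name `place_keys` are assumptions made for the example.

```python
# Hash values copied from the slide; the 10-cell table size is an assumption.
slide_hash = {
    "apple": 5, "watermelon": 3, "grapes": 8, "cantaloupe": 7,
    "kiwi": 0, "strawberry": 9, "mango": 6, "banana": 2, "honeydew": 6,
}


def place_keys(hash_values, table_size=10):
    """Place each key in its hashed cell and record keys whose cell is taken."""
    table = [None] * table_size
    collisions = []
    for key, index in hash_values.items():
        if table[index] is None:
            table[index] = key
        else:
            # (arriving key, key already in the cell, cell index)
            collisions.append((key, table[index], index))
    return table, collisions


table, collisions = place_keys(slide_hash)
print(table)
print(collisions)  # [('honeydew', 'mango', 6)]
```

Without a collision resolution policy, "honeydew" has nowhere to go.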

5 - Collision Resolution Policies
- Linear Probing
  - Successively search for the first empty subsequent table entry
- Linear Chaining
  - Link all collided entries at any bucket as a linked list
- Double Hashing
  - Uses a second hash function to successively index the table
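
The three policies can be sketched as follows. This is a minimal illustration, not the presenters' implementation: the salted CRC32 helper `h`, the class names, and the prime table size of 11 are all assumptions, and deletion is left out to keep the sketch short.

```python
import zlib


def h(key, salt=0):
    """Deterministic illustrative hash (salted CRC32); not taken from the slides."""
    return zlib.crc32(f"{salt}:{key}".encode())


class OpenAddressingTable:
    """Open addressing with linear probing or double hashing (no deletion)."""

    def __init__(self, size=11, policy="linear"):
        self.size = size  # assumed prime, so any double-hashing step visits every cell
        self.policy = policy
        self.cells = [None] * size

    def _probes(self, key):
        start = h(key) % self.size
        if self.policy == "linear":
            step = 1  # next cell each time
        else:
            step = 1 + h(key, salt=1) % (self.size - 1)  # second hash picks the stride
        for i in range(self.size):
            yield (start + i * step) % self.size

    def insert(self, key):
        for index in self._probes(key):
            if self.cells[index] is None or self.cells[index] == key:
                self.cells[index] = key
                return index
        raise RuntimeError("table is full")

    def contains(self, key):
        for index in self._probes(key):
            if self.cells[index] is None:
                return False  # an empty cell ends the probe sequence
            if self.cells[index] == key:
                return True
        return False


class ChainedTable:
    """Chaining: every bucket keeps a list of the keys that collided there."""

    def __init__(self, size=11):
        self.buckets = [[] for _ in range(size)]

    def insert(self, key):
        bucket = self.buckets[h(key) % len(self.buckets)]
        if key not in bucket:
            bucket.append(key)

    def contains(self, key):
        return key in self.buckets[h(key) % len(self.buckets)]
```

In all three cases a key that collides ends up more than one probe away from its home cell, which is the "distance" discussed on the next slide.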

6 - Performance Analysis
- Average performance is O(1)
- However, worst-case performance is O(n)
- In fact, the likelihood that a key is at a distance > 1 is fairly high
[Figure annotations: some keys take twice as long to probe, and others take three times as long. There is a fairly high probability that throughput is one-half or one-third of the peak throughput.]

7 - Hashing in Network Processors
- High query latency (memory access)
  - Hide latency with multiple threads
[Figure: query requests issued to the hash table]

8 - Segmented Hashing
- Uses the power of multiple choices
  - Proposed earlier by Azar et al.
- An N-way segmented hash
  - Logically divides the hash table array into N equal segments
  - Maps each incoming key onto one bucket in each segment
  - Picks the bucket that is either empty or holds the fewest keys
[Figure: a 4-way segmented hash table. Keys k_i and k_i+1 are hashed with h( ), and labels mark the bucket each key is mapped to.]
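
A minimal sketch of N-way segmented insertion and lookup follows. It assumes a single hash h( ) that yields the same bucket index in every segment, uses CRC32 as a stand-in hash, and breaks ties toward the lowest-numbered segment. The class and method names are hypothetical, and none of these choices is confirmed by the slides.

```python
import zlib


class SegmentedHashTable:
    """N-way segmented hash table with chained buckets (illustrative sketch)."""

    def __init__(self, num_segments=4, buckets_per_segment=8):
        self.num_segments = num_segments
        self.buckets_per_segment = buckets_per_segment
        self.segments = [
            [[] for _ in range(buckets_per_segment)] for _ in range(num_segments)
        ]

    def _index(self, key):
        # Assumption: one hash h() gives the same bucket index in every segment.
        return zlib.crc32(str(key).encode()) % self.buckets_per_segment

    def insert(self, key):
        """Insert into the candidate bucket with the shortest chain; return its segment."""
        index = self._index(key)
        best = min(
            range(self.num_segments),
            key=lambda s: (len(self.segments[s][index]), s),  # ties: lowest segment
        )
        self.segments[best][index].append(key)
        return best

    def lookup(self, key):
        """Return (segment, probes). Without filters, a miss costs N probes."""
        index = self._index(key)
        for probes, s in enumerate(range(self.num_segments), start=1):
            if key in self.segments[s][index]:
                return s, probes
        return None, self.num_segments


table = SegmentedHashTable()
for i in range(32):
    table.insert(f"flow-{i}")
print(table.lookup("flow-7"), table.lookup("missing"))
```

Because every insert goes to the shortest candidate chain, chain lengths at a given index never differ by more than one across segments.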

9 - Segmented Hash Performance
- More segments improve the probabilistic performance
  - With 64 segments, the probability that a key is inserted at distance > 2 is nearly zero even at 100% load
  - The improvement in average-case performance is still modest

10 - An Obvious Deficiency
- Even though keys are at distance one, every query requires at least N memory probes
  - The average number of probes is O(N), compared to O(1) for a naive table
    - If the system is bandwidth limited, throughput is N times lower
- To ensure O(1) operations, the segmented hash table uses on-chip Bloom filters
  - On-chip memory requirements are quite modest: 1-2 bytes per hash table bucket
- Each segment has a Bloom filter, which supports membership queries
  - These on-chip filters are queried before making an off-chip hash table memory reference
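
The filter-before-probe idea can be sketched like this. It is an assumption-laden illustration: the `BloomFilter` class, its 64-bit size, the three salted CRC32 hash functions, and the `segments_to_probe` helper are hypothetical. A plain Bloom filter also cannot delete keys, so a table that supports deletion would need something like a counting variant.

```python
import zlib


class BloomFilter:
    """Plain Bloom filter over a bit array (no deletion support)."""

    def __init__(self, num_bits=64, num_hashes=3):
        self.bits = [False] * num_bits
        self.num_hashes = num_hashes

    def _positions(self, key):
        # Salted CRC32 stands in for the k hash functions; an illustrative choice.
        return [
            zlib.crc32(f"{i}:{key}".encode()) % len(self.bits)
            for i in range(self.num_hashes)
        ]

    def add(self, key):
        for position in self._positions(key):
            self.bits[position] = True

    def might_contain(self, key):
        """False means definitely absent; True may be a false positive."""
        return all(self.bits[position] for position in self._positions(key))


def segments_to_probe(filters, key):
    """Return only the segments whose on-chip filter reports the key."""
    return [s for s, f in enumerate(filters) if f.might_contain(key)]


filters = [BloomFilter() for _ in range(4)]
filters[2].add("flow-7")  # the key was stored in segment 2
print(segments_to_probe(filters, "flow-7"))  # [2]
```

Only the segments returned by `segments_to_probe` cost an off-chip memory reference, so a query touches one segment instead of N whenever no filter raises a false positive.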

11 - Adding per Segment Filters
[Figure: key k_i is hashed with h( ) and can go to any of the 3 buckets. Each segment has a Bloom filter of m_b bits, indexed by the hash functions h_1(k_i), h_2(k_i), ..., h_k(k_i).]
- We can select any of these three segments and insert the key into the corresponding filter

12 - False Positive Rates
- With Bloom filters, there is a likelihood of false positives
  - A filter might say that the key is present in its segment when the key is actually not present
- With N segments, the false positive rate will clearly be at least N times higher
  - In fact, it will be even higher, because we also have to consider several permutations of false positives
- We propose the Selective Filter Insertion algorithm, which reduces the false positive rate by several orders of magnitude
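
One way to see where the factor of N comes from is the standard Bloom filter estimate, which the slides do not state. The symbols n_s (keys inserted into segment s's filter), ρ_s (fraction of that filter's bits that are set) and f_s (its false positive rate) are introduced here for illustration. m_b and k follow the filter figure, and the hash functions are assumed to behave uniformly.

```latex
% False positive rate of one filter: all k probed bits must already be set
f_s = \rho_s^{\,k}, \qquad \rho_s \approx 1 - e^{-k n_s / m_b}

% Expected number of filters that wrongly report an absent key
% (linearity of expectation over the N per-segment filters)
\mathbb{E}[\text{false-positive filters}] = \sum_{s=1}^{N} f_s = N f
\quad \text{when } f_s = f \text{ for every segment}
```

Since f_s depends only on the fraction of set bits, an insertion policy that sets fewer bits lowers every term in the sum.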

13 - Selective Filter Insertion Algorithm
[Figure: key k_i is hashed with h( ) and can go to any of the 3 buckets. Each segment has a Bloom filter of m_b bits, indexed by the hash functions h_1(k_i), h_2(k_i), ..., h_k(k_i).]
- Insert the key into segment 4, since fewer bits are set
- Fewer bits set => lower false positive rate
- With more segments (or more choices), our algorithm sets far fewer bits in the Bloom filter
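
The selection rule on this slide can be sketched by counting, for each candidate segment, how many filter bits would flip from 0 to 1 if the key were inserted there. The helper names, the salted CRC32 hashes, and the assumption that every segment's filter uses the same k hash functions are illustrative only.

```python
import zlib


def bit_positions(key, num_bits, num_hashes=3):
    """Filter positions for a key (salted CRC32 is an illustrative stand-in)."""
    return {zlib.crc32(f"{i}:{key}".encode()) % num_bits for i in range(num_hashes)}


def bit_transitions(filter_bits, key, num_hashes=3):
    """Number of bits that inserting `key` would flip from 0 to 1."""
    positions = bit_positions(key, len(filter_bits), num_hashes)
    return sum(1 for p in positions if not filter_bits[p])


def greedy_select(filters, candidate_segments, key):
    """Pick the candidate whose filter needs the fewest new bits (ties: lowest id)."""
    return min(candidate_segments, key=lambda s: (bit_transitions(filters[s], key), s))


filters = [[False] * 16 for _ in range(3)]
for p in bit_positions("flow-7", 16):
    filters[1][p] = True  # segment 1 already has all of this key's bits set
print(greedy_select(filters, [0, 1, 2], "flow-7"))  # 1
```

A segment whose filter already has the key's bits set costs no new bits, so the greedy rule always favors it.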

14 - Selective Filter Insertion Results

15 - Selective Filter Insertion Details
- First we build the set of segments where the arriving key can be inserted; we call it {minSet}
  - i.e., these segments have the minimum (and equal) collision chain length at the corresponding hash index
- A naive or greedy algorithm chooses the segment where the insertion sets the fewest bits in the Bloom filter
  - Leads to unbalanced segments
  - An already loaded segment is likely to receive further keys because its filter array is more likely to have fewer transitions
  - Our simulations suggest that an enhancement to the insertion algorithm reduces the false positive rate further by up to an order of magnitude

16 - Selective Filter Insertion Enhancement
- Our aim is to keep the segments balanced while also reducing the bit transitions in the Bloom filters
  1. Label a segment in {minSet} eligible if its occupancy is less than (1+δ) times the occupancy of the least occupied segment. Parameter δ is typically set at 0.1 to [...]
  2. If no segment remains eligible, select the least occupied segment from {minSet}
  3. Otherwise, choose the segment from {minSet} that has the minimum bit transitions
  4. If multiple such segments exist, choose the least occupied one
  5. If multiple such segments are again found, break the tie with a round-robin arbitration policy
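
The five steps above can be sketched as a single selection function. This is one reading of the slide, not the authors' code. It assumes that "the least occupied segment" means least occupied over all segments, that step 3 chooses among the eligible segments only, and that ties in step 2 also go to round-robin. The names `select_segment`, `round_robin_pick`, and `rr_pointer` are hypothetical.

```python
def round_robin_pick(candidates, rr_pointer, num_segments):
    """Pick the first candidate at or after rr_pointer in cyclic segment order."""
    return min(candidates, key=lambda s: (s - rr_pointer) % num_segments)


def select_segment(min_set, occupancy, transitions, delta=0.1, rr_pointer=0):
    """Choose the segment that receives the arriving key.

    min_set     -- segment ids with the minimum chain length for this key
    occupancy   -- list: keys currently stored in each segment
    transitions -- mapping: segment id -> bits the key would flip in its filter
    """
    num_segments = len(occupancy)
    least = min(occupancy)  # assumption: least occupied over all segments

    # Step 1: a segment is eligible if it is not much fuller than the emptiest one.
    eligible = [s for s in min_set if occupancy[s] < (1 + delta) * least]

    # Step 2: nothing eligible, so fall back to the least occupied member of minSet.
    if not eligible:
        low = min(occupancy[s] for s in min_set)
        pool = [s for s in min_set if occupancy[s] == low]
        return round_robin_pick(pool, rr_pointer, num_segments)

    # Step 3: minimum bit transitions among the eligible segments.
    fewest = min(transitions[s] for s in eligible)
    pool = [s for s in eligible if transitions[s] == fewest]

    # Step 4: among those, the least occupied.
    low = min(occupancy[s] for s in pool)
    pool = [s for s in pool if occupancy[s] == low]

    # Step 5: remaining ties go to round-robin arbitration.
    return round_robin_pick(pool, rr_pointer, num_segments)


# Segment 2 needs the fewest new bits but is too full to be eligible.
print(select_segment([1, 2, 3], [100, 105, 120, 100], {1: 1, 2: 0, 3: 2}))  # 1
```

In the usage line a purely greedy rule would pick segment 2. The occupancy test excludes it, which is how the enhancement keeps segments balanced.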

17 - Simulation Results
- 64K buckets, 32 bits/entry Bloom filter
- Simulation runs for 500 phases
  - During every phase, 100,000 random searches are performed. Between phases, 10,000 random keys are deleted and inserted.

18 - Effectiveness of Modified Bloom Filters
- Plotting average memory references at different successful search rates
  - Fewer memory references reflect the effectiveness of the filters. Load is kept at 80%.

19 - Questions?