Concurrent Hashing and Natural Parallelism

Concurrent Hashing and Natural Parallelism Unlike the other data structures we have seen, such as queues and lists, which seem harder to parallelize, hash tables look naturally parallel: most method calls access different, unrelated locations in the table and therefore need minimal synchronization. The problem of implementing a concurrent hash table thus seems easy, but as we progress through the presentation we will see that it is not trivial. The basic principle of concurrent implementation is "we can buy more memory, but not more time", so an algorithm that requires more memory (within reason) is preferable to one that takes plenty of time. Yoav Zuriel

Throwback Thursday Hash tables are an efficient way to implement a set. Hash tables implement add, remove and contains. There are two ways to implement them: open addressing, and chain hashing (closed addressing). Hash tables are sets in which add, remove and contains take constant average time. They achieve this by using more memory and by converting items to integers that index into the table; this conversion is called hashing (Java provides hashCode() for every object). We implement the hash table with an array called the table, where each entry refers to one or more items. In the serial implementation, open addressing means that each entry points to a single item. In chain hashing, each entry is a bucket (typically a linked list or a set) of items with the same hash code.

Handling Collisions Open addressing: use a different hash function to hopefully find a new empty table entry. Chain hashing: store colliding items in the same bucket. When the table is "too full", a resize is required. Open addressing resolves a collision with a different hash function, in the hope that an empty entry will be mapped; with closed addressing, on the other hand, colliding items fill the same bucket. Both implementations require resizing the table when it is "too full": in open addressing, the table is too full when an empty slot cannot be found; in chain hashing, we resize when a bucket reaches a predetermined threshold. In practice, adds far outnumber removes, so we will talk only about extensible hashing (where the table only grows). Picture taken from: https://www.istockphoto.com/il/photo/breaking-billiard-ball-gm601158318-103394435?esource=SEO_GIS_CDN_Redirect

Chain Hashing (Closed Addressing) The concurrent implementations for chain hashing are easier, so we discuss them first.

Chain Hashing Base Class This abstract class implements the following: a basic constructor(initialCapacity), and contains(item), add(item) and remove(item). It requires derived classes to implement: acquire(item), release(item), policy() and resize(). We start by defining an abstract class called BaseHashSet, and all our future implementations will inherit from it. The method acquire takes the lock relevant to its parameter, item, and release unlocks it. The method policy decides whether to call resize() to double the size of the hash table. These four methods are not implemented in this class, and will change depending on the granularity of locking. We want acquire to be reentrant. A sketch of the class follows.
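
To make the structure concrete, here is a minimal sketch of how such a base class might look in Java, along the lines of the book's BaseHashSet; the ArrayList-backed buckets and the use of Math.floorMod (to keep indices non-negative) are our choices:

```java
import java.util.ArrayList;
import java.util.List;

public abstract class BaseHashSet<T> {
    protected List<T>[] table; // one bucket of items per entry
    protected int setSize;     // approximate under fine-grained locking

    @SuppressWarnings("unchecked")
    public BaseHashSet(int capacity) {
        setSize = 0;
        table = (List<T>[]) new List[capacity];
        for (int i = 0; i < capacity; i++)
            table[i] = new ArrayList<>();
    }

    public boolean contains(T x) {
        acquire(x);
        try {
            int myBucket = Math.floorMod(x.hashCode(), table.length);
            return table[myBucket].contains(x);
        } finally {
            release(x);
        }
    }

    public boolean add(T x) {
        boolean result = false;
        acquire(x);
        try {
            int myBucket = Math.floorMod(x.hashCode(), table.length);
            if (!table[myBucket].contains(x)) {
                table[myBucket].add(x);
                result = true;
                setSize++;
            }
        } finally {
            release(x);
        }
        if (policy())
            resize(); // granularity-specific: see the derived classes
        return result;
    }

    public boolean remove(T x) {
        acquire(x);
        try {
            int myBucket = Math.floorMod(x.hashCode(), table.length);
            boolean result = table[myBucket].remove(x);
            if (result) setSize--;
            return result;
        } finally {
            release(x);
        }
    }

    public abstract void acquire(T x);
    public abstract void release(T x);
    public abstract boolean policy();
    public abstract void resize();
}
```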

When to Resize in Chain Hashing? There are two options for deciding when to resize the table. (1) When the average bucket size exceeds a fixed threshold: setSize / capacity > threshold. (2) Using two thresholds: a bucket threshold (resize if a certain percentage of buckets exceed this limit) and a global threshold (resize if any single bucket exceeds this limit). For example: if more than 0.25 of the buckets exceed the bucket threshold, resize; or if one bucket exceeds the global threshold, resize.
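
A hypothetical policy() combining the two thresholds might look as follows, continuing the BaseHashSet sketch above; BUCKET_THRESHOLD, GLOBAL_THRESHOLD and the 0.25 fraction are illustrative values, not fixed by the chapter:

```java
// Illustrative constants: the exact values are an assumption.
static final int BUCKET_THRESHOLD = 8;   // per-bucket "crowded" limit
static final int GLOBAL_THRESHOLD = 32;  // hard limit for any single bucket

public boolean policy() {
    int crowded = 0;
    for (List<T> bucket : table) {
        if (bucket.size() > GLOBAL_THRESHOLD)
            return true;                 // one bucket is far too full
        if (bucket.size() > BUCKET_THRESHOLD)
            crowded++;
    }
    return crowded > table.length / 4;   // over 0.25 of buckets are crowded
}
```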

Naïve Implementation: Coarse-Grained Hash Table One (reentrant) lock to rule them all
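A minimal sketch of the coarse-grained variant, assuming the BaseHashSet sketch above: one ReentrantLock guards everything, so acquire and release simply ignore their argument. The resize() body is filled in after the next slide.

```java
import java.util.concurrent.locks.ReentrantLock;

public class CoarseHashSet<T> extends BaseHashSet<T> {
    final ReentrantLock lock = new ReentrantLock(); // reentrant, as required

    public CoarseHashSet(int capacity) {
        super(capacity);
    }

    public final void acquire(T x) {
        lock.lock(); // one lock, whatever the item
    }

    public final void release(T x) {
        lock.unlock();
    }

    public boolean policy() {
        return setSize / table.length > 4; // average bucket size exceeds 4
    }

    public void resize() { /* filled in after the next slide */ }
}
```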

policy() and resize() policy(): checks whether the average size of a bucket is bigger than 4. resize(): lock, check no one beat us to it, then double the size and insert all the items with the new hash. Because we read the capacity before acquiring the lock, we must check after locking that no one resized the table before us; if someone did, there is no need to resize it ourselves. When resizing we re-insert all the existing items into the table using the hash function.
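
Filling in the resize() placeholder from the CoarseHashSet sketch above: read the capacity first, lock, re-check, then rehash into a doubled table.

```java
@SuppressWarnings("unchecked")
public void resize() {
    int oldCapacity = table.length; // read before taking the lock
    lock.lock();
    try {
        if (oldCapacity != table.length)
            return; // someone resized before us
        List<T>[] oldTable = table;
        table = (List<T>[]) new List[2 * oldCapacity];
        for (int i = 0; i < table.length; i++)
            table[i] = new ArrayList<>();
        for (List<T> bucket : oldTable)      // re-insert every item
            for (T x : bucket)
                table[Math.floorMod(x.hashCode(), table.length)].add(x);
    } finally {
        lock.unlock();
    }
}
```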

Second Implementation: A Striped Hash Table Lock striping: a fixed-size array of locks. Lock i protects table entry j if i = j mod L, where L is the size of the lock array. The lock array initially has the same size as the table, but as the table grows it keeps its size, so each lock becomes responsible for more entries. Why do we not resize the lock array? It is a waste of space to have a lock for each table entry, especially when the table is big and contention is low; and straightforwardly resizing the lock array is difficult, because the locks themselves are being used. Screenshots taken from "the Art of Multiprocessor Programming"

StripedHashSet(), acquire() and release() Fields: an array of locks. Constructor: initialize the lock array. acquire(): lock the lock at index hash mod the lock array's length. release(): unlock that same lock. A sketch follows.
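
A sketch of the striped variant, again building on the BaseHashSet sketch: the lock array is sized once, at construction, and never grows. The resize() body is filled in after the next slide.

```java
import java.util.concurrent.locks.ReentrantLock;

public class StripedHashSet<T> extends BaseHashSet<T> {
    final ReentrantLock[] locks; // fixed size: never grows with the table

    public StripedHashSet(int capacity) {
        super(capacity);
        locks = new ReentrantLock[capacity];
        for (int i = 0; i < locks.length; i++)
            locks[i] = new ReentrantLock();
    }

    public final void acquire(T x) {
        // hash modulo the lock array's length, not the table's
        locks[Math.floorMod(x.hashCode(), locks.length)].lock();
    }

    public final void release(T x) {
        locks[Math.floorMod(x.hashCode(), locks.length)].unlock();
    }

    public boolean policy() {
        return setSize / table.length > 4; // same policy as before
    }

    public void resize() { /* filled in after the next slide */ }
}
```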

resize() Lock the locks in ascending order. Check no one beat us to it. Double the table size. Insert all the items with the new hash. Unlock all the locks. As in the coarse-grained hash table, resize acquires all the locks, but here, to avoid deadlock, we acquire them one by one in ascending order. resize cannot deadlock with add, remove or contains, because those methods lock only one lock at a time; and two resizes cannot deadlock with each other, because both start holding no locks and acquire them in the same ascending order. Here again we save the old capacity, to make sure the table is not resized more than once. Because it acquires all the locks, resize brings every other operation to a halt. That was not a problem in the coarse-grained hash table, where every operation took the single lock anyway, but here resize has a significant impact on the concurrency of the hash table.
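
Filling in StripedHashSet.resize(): the locks are taken in ascending index order, which is exactly what rules out deadlock between two concurrent resizes.

```java
@SuppressWarnings("unchecked")
public void resize() {
    int oldCapacity = table.length;      // read before locking
    for (ReentrantLock lock : locks)     // ascending order: no deadlock
        lock.lock();
    try {
        if (oldCapacity != table.length)
            return;                      // someone beat us to it
        List<T>[] oldTable = table;
        table = (List<T>[]) new List[2 * oldCapacity];
        for (int i = 0; i < table.length; i++)
            table[i] = new ArrayList<>();
        for (List<T> bucket : oldTable)  // re-insert with the new hash
            for (T x : bucket)
                table[Math.floorMod(x.hashCode(), table.length)].add(x);
    } finally {
        for (ReentrantLock lock : locks) // the lock array itself never grows
            lock.unlock();
    }
}
```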

Third Implementation: A Refinable Hash Table The size of the lock array is not fixed; it grows as the table does. We add a field called owner to indicate when the table is being resized. As we saw in the striped hash table, it is complicated to resize the lock array while it is in use, so we use another synchronization mechanism to protect this array.

AtomicMarkableReference<Thread> owner Combines a Boolean value and a thread reference, updated atomically. The Boolean is false by default; when it is true, the reference points to the thread that is resizing. owner acts as a mutual-exclusion flag between resize and the other methods, so that no updates occur while resizing, and vice versa. To make the changes atomic we use the special class AtomicMarkableReference. Because resize is rare and it is the only method that writes to owner, the value of owner can be cached.
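
A small sketch of how the owner flag is declared and claimed, using java.util.concurrent.atomic.AtomicMarkableReference:

```java
import java.util.concurrent.atomic.AtomicMarkableReference;

// mark == false and reference == null: no resize in progress
AtomicMarkableReference<Thread> owner =
        new AtomicMarkableReference<>(null, false);

// a would-be resizer tries to claim ownership atomically:
Thread me = Thread.currentThread();
if (owner.compareAndSet(null, me, false, true)) {
    // we are now the unique resizer; others spin until the mark is reset
}
```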

RefinableHashSet<T>() Fields: An array of locks AtomicMarkableReference owner Constructor: Initialize the lock array Initialize owner with false and null

resize() Try to claim ownership of the table; if that fails, someone else is resizing, so there is no need to resize. Check no one beat us to it. Spin until all the locks are free. Double the table's size and the lock array's size. Insert all the items with the new hash. Give up ownership. Screenshots taken from "the Art of Multiprocessor Programming"
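
Sketched in Java, following the steps above and assuming the owner and locks fields from this RefinableHashSet sketch; quiesce() is the spin that waits until no stripe lock is held:

```java
public void resize() {
    int oldCapacity = table.length;
    Thread me = Thread.currentThread();
    if (owner.compareAndSet(null, me, false, true)) { // claim ownership
        try {
            if (table.length != oldCapacity)
                return;   // someone beat us to it
            quiesce();    // wait until all stripe locks are free
            // then double both the table and the lock array, and
            // re-insert every item with the new hash, as before
        } finally {
            owner.set(null, false); // give up ownership
        }
    }
}

protected void quiesce() {
    for (ReentrantLock lock : locks)
        while (lock.isLocked()) {
            // spin: the holder saw the old state and will soon release
        }
}
```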

acquire() and release() acquire(): spin until no thread is resizing; save the old lock array and the relevant lock; after locking, check that no one is resizing and that the lock is still valid; otherwise, unlock and try again. release(): unlock the relevant lock. The method owner.get(mark) returns the thread reference and writes the Boolean value into mark[0].
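
A sketch of acquire() and release(), following the book's structure; the one-element mark array is how AtomicMarkableReference returns the Boolean alongside the Thread:

```java
public void acquire(T x) {
    boolean[] mark = { true };
    Thread me = Thread.currentThread();
    Thread who;
    while (true) {
        do { // spin while some other thread is resizing
            who = owner.get(mark); // writes the mark into mark[0]
        } while (mark[0] && who != me);
        ReentrantLock[] oldLocks = locks;
        ReentrantLock oldLock =
                oldLocks[Math.floorMod(x.hashCode(), oldLocks.length)];
        oldLock.lock();
        who = owner.get(mark);
        if ((!mark[0] || who == me) && locks == oldLocks)
            return;           // no resize intervened; the lock is valid
        oldLock.unlock();     // a resize intervened: try again
    }
}

public void release(T x) {
    locks[Math.floorMod(x.hashCode(), locks.length)].unlock();
}
```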

Fourth Implementation: Lock-Free Hash Table By resizing the table a little with each add(), no resize() ever locks the whole table. Making the buckets lock-free is not enough. In the previous implementations, once a resize occurs, no other method can make progress. To avoid this, we make the resizing incremental, done piecemeal by the add calls. To make the hash table lock-free, we would need to atomically move entries from the old buckets to the new ones, so lock-free buckets alone are not enough. Why? If the move is not done atomically, entries might be temporarily lost or duplicated.

Recursive Split Ordering A different approach to hashing. Each bucket is a reference into a lock-free list. As the list grows, we move the buckets around so that no item is too far from the start of its bucket. An invariant of the algorithm is that once an item is inserted, it must not be moved. The implementation presented here takes a different approach than the previous ones: instead of moving the items between the buckets, the buckets move between the items. To achieve this invariant we use the recursive split-ordering algorithm.

Recursive Split Ordering [Figure: the items 0 through 7 in a single list, before any bucket pointers are added] Let's look at a simple example. Suppose the items 0 through 7 need to be hashed into the table. If we have one bucket, it can only point to the head of the list. Once we have another bucket, we expect it to split the list in two, and with 4 buckets, each of the two new buckets will recursively split each of those halves, so that in the end there is only a constant number of items between any two bucket pointers. To allow us to redirect the bucket pointers in this recursive manner, we have to insert the items in a special order, an order that allows buckets to be split recursively. What is this magical order? Taken from this chapter's companion slides

Recursive Split Ordering [Figure: a second bucket pointer added at the midpoint (1/2) splits the list in two] Taken from this chapter's companion slides

Recursive Split Ordering [Figure: bucket pointers at 1/4, 1/2 and 3/4 recursively split the list into quarters] Taken from this chapter's companion slides

Recursive Split Ordering [Figure: the items grouped by the two least significant bits of their keys: LSB 00, LSB 10, LSB 01, LSB 11] Taken from this chapter's companion slides

Split Ordered Hashing [Figure: a table of 8 buckets, 000 through 111, pointing into the split-ordered list; each bucket starts at a sentinel node] Taken from this chapter's companion slides

Concept The array entries refer to logical buckets. To find an item we traverse the list; the buckets are used as shortcuts into it. Bucket nodes should be well distributed, so we insert new buckets as the table grows to maintain their distribution; keeping the buckets well distributed ensures items are found in constant expected time. As before, the capacity of the table is a power of 2, but in this case we begin with one bucket, indexed 0, and a capacity of 2 items. As we said, we grow the table piecemeal as we modify it, so there is no explicit resize() method. If the capacity is 2^i, then bucket b contains the items whose hash code k satisfies k ≡ b (mod 2^i). When the table doubles to capacity 2^(i+1), the items of bucket b split between buckets b and b + 2^i, since k ≡ b (mod 2^i) implies either k ≡ b (mod 2^(i+1)) or k ≡ b + 2^i (mod 2^(i+1)). The solution is to position these two buckets' items next to each other in the list, so the split happens just by inserting a bucket node. Because of this, the items are ordered by their bit-reversed keys; this order is called recursive split-ordering. To avoid the corner case of deleting the item referenced by a bucket, we add a special node per bucket, called a sentinel node, which cannot be deleted (the rectangular one in the figures). Screenshots taken from "the Art of Multiprocessor Programming"

Initialization of Buckets [Figure: a list containing 16, 4, 1, 9, 7, 15, with buckets 1 and 2 initialized and bucket 3 not yet] When we allocate a new block of buckets they are uninitialized, and the work of initializing a bucket and redirecting its pointer is done incrementally: buckets are initialized when they are first accessed. This is important in real-time applications, since it means there is no prolonged resize phase. Taken from this chapter's companion slides

Initialization of Buckets [Figure: initializing bucket 3 to split bucket 1: first the sentinel for 3 is in the list but not yet connected to its bucket; then bucket 3 points to the sentinel, and the bucket has been split] Taken from this chapter's companion slides

Adding 10 [Figure: 10 maps to bucket 2 (10 = 2 mod 4), which is uninitialized] Since 10 = 2 mod 4, we must first initialize bucket 2; then we can add 10. Taken from this chapter's companion slides

BucketList<T> Implements the concept of a bucket as a lock-free list. Has makeOrdinaryKey(), makeSentinelKey() and getSentinel(), and also implements all the basic methods (add, remove and contains). This class is essentially the same as LockFreeList, but with two differences: when adding a new item to a BucketList, the recursive split order is kept (to ensure the key is positive we take only the lowest 3 bytes of the hash code); and whereas LockFreeList had only two sentinel nodes, here there is a sentinel node at the beginning of each bucket, so that when the table resizes, a bucket's list contains the lists of the buckets split from it.

Utilities makeOrdinaryKey(): take the lowest 3 bytes of the hash code, set their most significant bit, and reverse the bits. makeSentinelKey(): return the lowest 3 bytes, bit-reversed.
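
A sketch of the two helpers, following the book's approach; the wrapper class name SplitOrderKeys is ours, Integer.reverse does the bit reversal, and the masks keep only the low 24 bits (bit 23 is the "MSB" that gets set for ordinary keys, so a sentinel always precedes the ordinary keys of its bucket):

```java
public final class SplitOrderKeys {
    static final int MASK = 0x00FFFFFF;    // lowest 3 bytes of the hash code
    static final int HI_MASK = 0x00800000; // MSB of those 3 bytes

    // ordinary item key: mask, set the MSB, then bit-reverse
    public static int makeOrdinaryKey(Object x) {
        int code = x.hashCode() & MASK;
        return Integer.reverse(code | HI_MASK);
    }

    // sentinel key: mask and bit-reverse only, MSB stays 0
    public static int makeSentinelKey(int bucket) {
        return Integer.reverse(bucket & MASK);
    }
}
```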

getSentinel() getSentinel(): find the node with the closest key; if the key matches, return the bucket. Otherwise, create a new node, insert it into the list, check that the state is unchanged, and otherwise try again. When using the find method, curr is the node whose key is closest to the requested key; if no node in the list has that key, curr's key will be bigger. If curr has the requested key, we return it. Otherwise, we need to create the new node and insert it into the list: first we set the new node's next reference, and then we use CAS to redirect the predecessor to it. The CAS makes sure that we do not insert the same sentinel twice; if something changed, we try again.

LockFreeHashSet<T> No locks at all A fixed size array of BucketList<T> Two atomic integers bucketSize, setSize

LockFreeHashSet<T>() Fields: Array of bucketList Bucket size Item counter Constructor: Initialize the bucketList array Initialize bucket[0] Initialize counters

Utilities initializeBucket(): get the parent bucket's index; initialize the parent if need be (a recursive call); then create the new sentinel node using getSentinel. getBucketList(): return the requested bucket (initializing it first if necessary). The use of buckets as shortcuts can be seen at line 33 of the book's listing.
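
Sketched after the book's LockFreeHashSet, assuming its bucket array and atomic bucketSize fields: the parent bucket is found by clearing the index's highest set bit, and initialization recurses until it reaches an initialized ancestor.

```java
private void initializeBucket(int myBucket) {
    int parent = getParent(myBucket);
    if (bucket[parent] == null)
        initializeBucket(parent); // recursive call up the split tree
    BucketList<T> b = bucket[parent].getSentinel(myBucket);
    if (b != null)
        bucket[myBucket] = b;     // redirect the shortcut to the new sentinel
}

private int getParent(int myBucket) {
    int parent = bucketSize.get();
    do {
        parent = parent >> 1;
    } while (parent > myBucket);
    return myBucket - parent;     // clears the index's highest set bit
}
```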

Recursive Initialization [Figure: adding 7 to a table of 4 buckets where only bucket 0 is initialized] Initially the table has 4 buckets (this state can be reached by adding the right items and then removing them), and only bucket 0 is initialized. To insert 7 (7 = 3 mod 4) we need to initialize bucket 3, which needs bucket 1 to be initialized (which needs bucket 0, but bucket 0 is initialized). We initialize bucket 1 by adding a sentinel node; bucket 3 is initialized in the same way, and 7 is inserted. We cannot postpone initializing bucket 1, because 7 is 1 modulo 2, and someone may look for 7 thinking there are only 2 buckets. The recursion could be log n deep, but its EXPECTED depth is constant. Taken from this chapter's companion slides

add() Get the appropriate bucket. Return false if the bucket b already contains x. Update the number of items in the table. Check whether the number of buckets needs to be doubled (this plays the role of policy + resize). A sketch follows. Screenshots taken from "the Art of Multiprocessor Programming"
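
A sketch of add() following the steps above, assuming the fields from the previous slides (the bucket array behind getBucketList, atomic setSize and bucketSize) and an integer THRESHOLD for the target average bucket size; the compareAndSet doubles bucketSize at most once per crossing:

```java
public boolean add(T x) {
    int myBucket = Math.floorMod(x.hashCode(), bucketSize.get());
    BucketList<T> b = getBucketList(myBucket); // may initialize the bucket
    if (!b.add(x))
        return false;                          // already present
    int setSizeNow = setSize.getAndIncrement();
    int bucketSizeNow = bucketSize.get();
    if (setSizeNow / bucketSizeNow > THRESHOLD)
        bucketSize.compareAndSet(bucketSizeNow, 2 * bucketSizeNow);
    return true;
}
```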

Open Addressing Open addressing seems a bit more complicated, because each entry should hold only one item. To implement this kind of algorithm we use the cuckoo hashing algorithm.

Cuckoo Hashing An open-addressing hashing algorithm in which a newly added item displaces any earlier item occupying the same slot. It uses two independent hash functions, h0 and h1. For simplicity's sake, we use two tables, each of size k, where h0 indexes the first table and h1 the second: h0, h1: KeyRange → {0, 1, …, k−1}. Picture taken from: https://www.rspb.org.uk/birds-and-wildlife/bird-and-wildlife-guides/bird-a-z/c/cuckoo/

Basic Methods contains(): checks whether table[0][h0(x)] or table[1][h1(x)] equals x. remove(): does the same as contains, but removes x if found. add(): check whether the table contains x; if a cell is taken, try to swap items in and out up to LIMIT times. Questions: Why do we check contains(x) first? Why do we have a limit? The first swap exchanges table[0][h0(x)] with x; if the previous value is null, we did not take any item's slot. Otherwise, we move the item we just swapped out (no longer the same x) to table[1][h1(x)], and try again. We loop up to LIMIT times, hoping to find a new place for every displaced item; if we fail, we resize the table and then add the last item for which we could not find an entry. We check contains at the beginning of the method to avoid duplicates: in the previous implementations x could end up in only one entry, but here it could be in either of two, so we check first. We have a limit to avoid displacement cycles that would never end, and to keep the complexity constant.
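
A sketch of the sequential add() just described; swap(i, j, x) is an assumed helper that stores x into table[i][j] and returns the previous occupant (null if the slot was empty):

```java
public boolean add(T x) {
    if (contains(x))
        return false;           // x may already sit in either table
    for (int i = 0; i < LIMIT; i++) {
        if ((x = swap(0, hash0(x), x)) == null)
            return true;        // displaced nothing: done
        else if ((x = swap(1, hash1(x), x)) == null)
            return true;        // the evicted item found a home in table 1
    }
    resize();                   // displacement chain too long
    return add(x);              // re-add the item left in hand
}
```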

Example of Displacement: Add 14 to the table [Figure: 14 maps to the occupied slot 5 in table[0], which holds 23] We try to add the key 14, which maps to the occupied slot 5 in table[0], so 14 displaces 23. Screenshots taken from "the Art of Multiprocessor Programming"

Example of Displacement: Relocate 23 [Figure: 23 maps to the occupied slot 1 in table[1], displacing 12] Screenshots taken from "the Art of Multiprocessor Programming"

Example of Displacement: Relocate 12 [Figure: 12 maps to slot 3 in table[0], taking 39's place] Screenshots taken from "the Art of Multiprocessor Programming"

Example of Displacement: Relocate 39 [Figure: 39 lands in the empty slot 6 in table[1], ending the chain] Finally, 39 is placed in slot 6 of table[1]. In practice cuckoo hashing is very attractive for its simplicity: contains and remove are constant-time, and it can be shown that, over time, add is constant as well. Screenshots taken from "the Art of Multiprocessor Programming"

Cuckoo Hashing Base Class Break each method into phases (add, remove, displace). The table is made of probe sets, which can grow up to PROBE_SIZE items. When no method calls are in progress, no probe holds more than THRESHOLD items. A simple add may cause a long sequence of displacements, so in the concurrent implementation we divide each method into phases. Here we might end up with more than one item in an entry, so each entry is a probe set, a fixed-size set of items. When no method call is in progress each probe holds at most THRESHOLD items, but while a method is in progress it is guaranteed only that no probe holds more than PROBE_SIZE items.

Cuckoo Hashing Base Class This abstract class implements the following: a basic constructor(), contains(item), add(item) and remove(item), and relocate(table, probe). It requires derived classes to implement: acquire(item), release(item) and resize(). Like BaseHashSet<T>, this abstract class implements add, remove and contains, and requires acquire, release and resize. In the case of cuckoo hashing we do not need policy() to decide when to resize: we resize when a probe is full and an item is about to be inserted into it, or when the displacement fails. The displacement itself is implemented in the method relocate().

Concurrent Add: Add 13 to the table [Figure: adding 13 to a probe set with room to spare] Screenshots taken from "the Art of Multiprocessor Programming"

Concurrent Displacement: Add 14 to the table [Figure: adding 14 overfills a probe set, triggering a displacement] Screenshots taken from "the Art of Multiprocessor Programming"

Concurrent Displacement: Relocate 23 [Figure: 23 is moved to its other probe set, rebalancing the overfilled probe] Screenshots taken from "the Art of Multiprocessor Programming"

remove() and contains() Look at both relevant probes, table[0][h0(x)] and table[1][h1(x)], and check whether x is present. remove() does the same as contains(), but removes x if found.

add() Acquire the lock for x. Return false if x is present. Check whether one of the two relevant probes has fewer than THRESHOLD items; if so, insert x and return true. Otherwise, check whether one of them has fewer than PROBE_SIZE items; if so, insert x and mark that probe for rebalancing. Otherwise, resize the table and retry add(x). If the rebalancing fails, resize. A sketch follows.
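
A condensed sketch of this add(), modeled on the book's PhasedCuckooHashSet; present(x) stands for the two-probe membership check, and i/h record which probe was overfilled so that relocate() can rebalance it:

```java
public boolean add(T x) {
    acquire(x);
    int h0 = hash0(x) % capacity, h1 = hash1(x) % capacity;
    int i = -1, h = -1;              // which table/probe to rebalance
    boolean mustResize = false;
    try {
        if (present(x))
            return false;
        List<T> set0 = table[0][h0];
        List<T> set1 = table[1][h1];
        if (set0.size() < THRESHOLD)       set0.add(x);
        else if (set1.size() < THRESHOLD)  set1.add(x);
        else if (set0.size() < PROBE_SIZE) { set0.add(x); i = 0; h = h0; }
        else if (set1.size() < PROBE_SIZE) { set1.add(x); i = 1; h = h1; }
        else mustResize = true;            // both probes completely full
    } finally {
        release(x);
    }
    if (mustResize) {
        resize();                          // no room at all: grow first
        return add(x);
    }
    if (i != -1 && !relocate(i, h))
        resize();                          // rebalancing failed: grow
    return true;
}
```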

relocate() Get the first item in the probe to be rebalanced (call it y). Do up to LIMIT iterations: try to remove y; if it is removed and the other probe has fewer than THRESHOLD items, relocate y there and finish; if the other probe has fewer than PROBE_SIZE items, relocate y there and mark that probe for rebalancing; otherwise, put y back and return false. If removing y fails and the current probe still needs rebalancing, continue; otherwise return true. If the iterations run out, return false.

First Implementation: Striped Cuckoo Hashing Same idea as StripedHashSet: a fixed two-dimensional array of locks, where locks[i][j] protects table[i][k] for every k with j = k mod L. The coarse-grained implementation is trivial, so it is not covered in this presentation.
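
A sketch of the corresponding acquire() and release(), assuming hash0 and hash1 return non-negative values: the locks[0] stripe is always taken first, which is what lets resize() exclude everyone by owning all of locks[0] (see the next slide).

```java
public final void acquire(T x) {
    locks[0][hash0(x) % locks[0].length].lock(); // always locks[0] first
    locks[1][hash1(x) % locks[1].length].lock();
}

public final void release(T x) {
    locks[0][hash0(x) % locks[0].length].unlock();
    locks[1][hash1(x) % locks[1].length].unlock();
}
```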

resize() Acquire all the locks in locks[0] in ascending order. Check no one beat us to it. Double the table size. Insert the items with the new hash. Holding all of locks[0] suffices: to acquire a lock in locks[1], a method must first acquire the corresponding lock in locks[0], which resize already holds.

Second Implementation: Refinable Cuckoo Hashing Same idea as the refinable hash set.

Summary Chain hashing: coarse-grained, striped, refinable, lock-free. Cuckoo hashing: striped, refinable (the coarse-grained variant being trivial). We have seen six implementations here, starting with the most naïve one, up to the most complicated one, and then back to the more basic variants for cuckoo hashing. Our implementations took four different points of view. The first, for low contention: one lock for the whole table. The second, for medium contention: a fixed-size lock array. After that, we wanted the lock array to grow as the table does, so we introduced the refinable implementation, in which each entry effectively keeps its own lock. Finally, we have the lock-free implementation, where we decided that items should not be moved after they are added; instead, the buckets themselves move. We used recursive split ordering to keep the hash table sorted, so that initializing buckets is easy. Cuckoo hashing admits implementations similar to those for chain hashing, but we did not cover everything there is.