Multicore programming


Multicore programming
Unordered sets: concurrent hash tables
Week 2 - Wednesday
Trevor Brown
http://tbrown.pro

Last time
- Queues: strict ordering is not so helpful, and non-scalable
- Relaxed data structures: somewhat ordered for better scalability; the Multi-Queue is scalable, and its ordering can be "close to" FIFO w.h.p.
- Ordered vs. unordered sets: fixed size, probing, insert-only hash tables (more this class)

Simplest hash table possible: locking, insert-only, probing, fixed capacity
- Array of Bucket: data[]
- Bucket: (Lock, Key)
- Simplest possible locking protocol: lock a bucket before any access (read or write) to its key
Slide example: Insert(7) and Insert(9) both hash to index 2, and Insert(3) hashes to index 5; since 7 already occupies index 2, 9 probes forward into the next free cell.
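For concreteness, here is a minimal C++ sketch of this layout, assuming a fixed capacity chosen at construction, std::mutex as the per-bucket lock (the slide does not fix a particular Lock type), and an EMPTY sentinel playing the role of NULL for int keys:

    #include <mutex>
    #include <vector>

    constexpr int EMPTY = 0;            // assumed sentinel: 0 is not a valid key

    // One bucket = (Lock, Key), as on the slide.
    struct Bucket {
        std::mutex lock;
        int key = EMPTY;
    };

    class FixedProbingTable {
    public:
        explicit FixedProbingTable(int capacity) : data(capacity) {}
    private:
        // Fixed-size array of buckets; probing starts at hash(key) and walks forward.
        std::vector<Bucket> data;
        int hash(int key) const { return key % (int)data.size(); }   // placeholder hash
    };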

Implementation and correctness
Reasoning about correctness: important lemmas
- Buckets change only once, from NULL to non-NULL. (proof not sketched here)
- If we probe a sequence of cells at times t1 < t2 < … < tn and see non-NULL values, then we will see the same values if we do all of the probing atomically at time tn or later.
Linearization points (LPs):
- Return@8 (false): last read of data[index]
- Return@12 (true): write to data[index]
- Return@16 (FULL): last read of data[index]

  int insert(int key)
  1   int h = hash(key);
  2   for (int i=0; i<capacity; ++i) {
  3     int index = (h+i) % capacity;
  4     lock(index);
  5     int found = data[index];
  6     if (found == key) {
  7       unlock(index);
  8       return false;
  9     } else if (found == NULL) {
  10      data[index] = key;
  11      unlock(index);
  12      return true;
  13    }
  14    unlock(index);
  15  }
  16  return FULL;

Linearizability proof sketch: case Return@16 (LP: last read of data[index])
This is the easy case.
- We returned at line 16 after seeing data[index] != NULL for every index.
- Non-NULL buckets do not change.
- So, when we do the last read of data[index], all buckets are full!
- So, our insert would also return FULL if it were performed atomically at the LP.
(The insert code is as on the previous slide.)

Linearizability proof sketch: case Return@8 (LP: last read of data[index])
- Suppose we return at line 8 in the k-th iteration.
- In the first k-1 iterations, we see data[index] != NULL and != key.
- Claim: if our insert(key) ran atomically at the LP, it would perform k iterations and see the same values as it sees in the concurrent execution. (By the probing lemma, the non-NULL, non-key values read in iterations 0 through k-1 do not change, so they are still there at the LP.)
- In its k-th iteration, the atomic insert(key) reads data[index] at the same moment as the concurrent insert(key), so it sees data[index] == key.
- So, it returns false.
(The slide's timeline figure shows iterations 0 through k-1 reading values that are != NULL, != key, and do not change; the values read in previous iterations are unchanged at the LP, which is the read in iteration k. The insert code is as above.)

Linearizability proof sketch: case Return@12 (LP: write to data[index])
The argument is similar to the previous case:
- Suppose we return at line 12 in the k-th iteration.
- In the first k-1 iterations, we see data[index] != NULL and != key.
- Claim: if our insert(key) ran atomically at the LP, it would perform k iterations and see the same values as it sees in the concurrent execution.
- In its k-th iteration, the atomic insert(key) reads data[index] at the same moment as the concurrent insert(key), so it sees data[index] == NULL.
- So, it returns true.
(The insert code is as above.)

Using lock-free techniques
Instead of locking data[index] and then, if it is NULL, writing to it:
- Use CAS to atomically change data[index] from NULL to the new value.
- If the CAS fails, someone else inserted there.
  - If it is our value, we return false (because the value is now already present).
  - Otherwise, we go to the next cell and retry.

  int insert(int key)
    int h = hash(key);
    for (int i=0; i<capacity; ++i) {
      int index = (h+i) % capacity;
      int found = data[index];
      if (found == key) {
        return false;
      } else if (found == NULL) {
        if (CAS(&data[index], NULL, key)) {
          return true;
        } else if (data[index] == key) {
          return false;
        }
      }
    }
    return FULL;

Correctness is fairly similar to the lock-based algorithm (with some additional reasoning, especially when the CAS fails). Progress argument?
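As a hedged illustration, the same CAS-based insert can be written with C++ std::atomic, where compare_exchange_strong plays the role of CAS; CAPACITY, the hash function, and the EMPTY/FULL sentinels below are assumptions for the sketch, not values fixed by the slides:

    #include <atomic>

    constexpr int CAPACITY = 16;        // assumed fixed table size
    constexpr int EMPTY    = 0;         // assumed sentinel for an unused bucket
    constexpr int FULL     = -1;        // assumed "table is full" return value

    std::atomic<int> data[CAPACITY];    // zero-initialized, i.e., all EMPTY

    int hash(int key) { return key % CAPACITY; }   // placeholder hash (non-negative keys)

    int insert(int key) {
        int h = hash(key);
        for (int i = 0; i < CAPACITY; ++i) {
            int index = (h + i) % CAPACITY;
            int found = data[index].load();
            if (found == key) return false;          // already present
            if (found == EMPTY) {
                int expected = EMPTY;
                // Try to claim the bucket atomically: EMPTY -> key.
                if (data[index].compare_exchange_strong(expected, key))
                    return true;
                // CAS failed: expected now holds what another thread wrote.
                if (expected == key) return false;   // someone inserted our key
                // Otherwise fall through and probe the next bucket.
            }
        }
        return FULL;                                 // every bucket was occupied
    }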

An alternative design: chaining
Array of concurrent linked lists (any concurrent linked list works). In the slide's example, Insert(7) and Insert(9) both hash to bucket 2 and end up in the same chain, while Insert(3) hashes to bucket 5.
Advantages?
- Reduced need for table expansion
- Deletion is easy (handled by the list)
Disadvantages?
- Lists add overhead (vs. ~one CAS)
- Cache misses!
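A minimal sketch of the chaining design, assuming the simplest possible "concurrent linked list": a per-bucket std::mutex protecting a std::forward_list (the slide allows any concurrent list; a lock-free list would also work):

    #include <algorithm>
    #include <forward_list>
    #include <mutex>
    #include <vector>

    class ChainedTable {
    public:
        explicit ChainedTable(int capacity) : buckets(capacity) {}

        bool insert(int key) {
            Bucket& b = buckets[hash(key)];
            std::lock_guard<std::mutex> g(b.lock);
            if (std::find(b.chain.begin(), b.chain.end(), key) != b.chain.end())
                return false;            // already present
            b.chain.push_front(key);
            return true;
        }

        bool erase(int key) {            // deletion is easy: the list handles it
            Bucket& b = buckets[hash(key)];
            std::lock_guard<std::mutex> g(b.lock);
            if (std::find(b.chain.begin(), b.chain.end(), key) == b.chain.end())
                return false;
            b.chain.remove(key);
            return true;
        }

    private:
        struct Bucket {
            std::mutex lock;
            std::forward_list<int> chain;   // keys that hash to this bucket
        };
        std::vector<Bucket> buckets;
        int hash(int key) const { return key % (int)buckets.size(); }
    };

This shows the trade-off listed above: deletion becomes trivial, but each operation traverses a per-bucket list (pointer chasing and cache misses) instead of doing roughly one CAS on an array cell.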

Hashing numeric strings: hash function quality
[Figures: scatter plots of hash values for numeric string keys, from https://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed]
- Bad hash functions (visible clustering in the plots): SDBM, DJB2a, FNV-1
- Good hash functions (near-uniform spread): FNV-1a, Murmur2
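For reference, a "good" function like FNV-1a is only a few lines; this is a standard 32-bit implementation (the offset basis and prime are the published FNV constants), with a hypothetical bucket_for helper showing how it could feed the probing tables above:

    #include <cstdint>
    #include <string>

    // 32-bit FNV-1a: xor in each byte, then multiply by the FNV prime.
    uint32_t fnv1a(const std::string& s) {
        uint32_t h = 2166136261u;          // FNV offset basis
        for (unsigned char c : s) {
            h ^= c;
            h *= 16777619u;                // FNV prime
        }
        return h;
    }

    // Map a string key to a bucket index for a table of the given capacity.
    int bucket_for(const std::string& s, int capacity) {
        return (int)(fnv1a(s) % (uint32_t)capacity);
    }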

What about deletion?
For example, in our lock-based, fixed size, probing hash table (slide example): Insert(7), Insert(3) and Insert(9) as before, so 7 sits at index 2, 9 at index 3 (it probed past 7), and 3 at index 5.
Now run Delete(7) and then Delete(9), both hashing to index 2. If Delete(7) simply sets its bucket back to NULL, then Delete(9)'s probe starting at index 2 hits NULL immediately and stops, concluding that 9 is not in the table. Incorrect!
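To make the failure concrete, here is a sketch (in the slides' pseudocode style) of the naive erase that writes NULL back; running Delete(7) and then Delete(9) on the example above exhibits the bug, because the second call stops at the freed bucket:

    // BROKEN: do not use. Writing NULL back cuts later keys out of their probe paths.
    bool erase_naive(int key) {
        int h = hash(key);
        for (int i = 0; i < capacity; ++i) {
            int index = (h + i) % capacity;
            lock(index);
            int found = data[index];
            if (found == NULL) { unlock(index); return false; }   // stop at an empty cell
            if (found == key) {
                data[index] = NULL;   // frees index 2 in the example; 9 (at index 3) becomes unreachable
                unlock(index);
                return true;
            }
            unlock(index);
        }
        return false;
    }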

A workaround: tombstones
- Introduce a special key value called TOMBSTONE.
- To delete a key, instead of setting it back to NULL, set it to TOMBSTONE.
- TOMBSTONE essentially acts like a regular key (one that never matches a key you are trying to find/insert/delete):
  - Insertions must still probe past it to find NULL.
  - Deletions (and lookups) must still probe past it to find the desired key.
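A contains that follows these rules looks like the probing loop of insert: treat TOMBSTONE as a non-matching key and keep going, and stop only when NULL is reached (a sketch in the slides' pseudocode style; contains itself is not shown on the slides):

    bool contains(int key) {
        int h = hash(key);
        for (int i = 0; i < capacity; ++i) {
            int index = (h + i) % capacity;
            lock(index);
            int found = data[index];
            unlock(index);
            if (found == key) return true;    // present
            if (found == NULL) return false;  // empty cell: key cannot be further along the probe path
            // Any other key, including TOMBSTONE: probe past it.
        }
        return false;                         // scanned the whole table
    }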

How tombstones fix our problem
Same example as before: after Insert(7), Insert(3) and Insert(9), Delete(7) replaces the 7 at index 2 with TOMBSTONE instead of NULL. Delete(9)'s probe now passes over the TOMBSTONE at index 2, finds 9 at index 3, and deletes it correctly.

Lock-based delete with tombstones

  bool erase(int key)
    int h = hash(key);
    for (int i=0; i<capacity; ++i) {
      int index = (h+i) % capacity;
      lock(index);
      int found = data[index];
      if (found == NULL) {
        unlock(index);
        return false;
      } else if (found == key) {
        data[index] = TOMBSTONE;
        unlock(index);
        return true;
      }
      unlock(index);
    }
    return false;

Insert and contains remain the same: they already skip past keys that do not match the argument key, and TOMBSTONE acts like such a key.

A lock-free version

  bool erase(int key)
    int h = hash(key);
    for (int i=0; i<capacity; ++i) {
      int index = (h+i) % capacity;
      int found = data[index];
      if (found == NULL) {
        return false;
      } else if (found == key) {
        return CAS(&data[index], key, TOMBSTONE);
      }
    }
    return false;

If the CAS succeeds, we deleted key. If the CAS fails, another thread concurrently deleted key (we return false, since we did not delete it).
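Continuing the earlier std::atomic sketch, the lock-free erase translates directly; the TOMBSTONE value below is an assumed sentinel, distinct from EMPTY and from every valid key:

    constexpr int TOMBSTONE = -2;   // assumed sentinel: marks a deleted bucket

    bool erase(int key) {
        int h = hash(key);
        for (int i = 0; i < CAPACITY; ++i) {
            int index = (h + i) % CAPACITY;
            int found = data[index].load();
            if (found == EMPTY) return false;   // key is not in the table
            if (found == key) {
                int expected = key;
                // key -> TOMBSTONE; fails only if another thread deleted key first.
                return data[index].compare_exchange_strong(expected, TOMBSTONE);
            }
            // Some other key or TOMBSTONE: keep probing.
        }
        return false;
    }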

Downsides of tombstones?
- TOMBSTONEs are never cleaned up.
- Cleaning them up seems hard: we cannot remove a TOMBSTONE until there are no keys left in the table that probed past it when they were inserted (in the slide's picture, 9 probed past the tombstoned cell when it was inserted).
- Thought: if we do table expansion, there is no need to copy TOMBSTONEs over…

Summary
- Unordered sets
- Fixed size, probing, insert-only hash tables: lock-based (with a linearizability proof) and lock-free
- Probing vs. chaining
- Deletion in concurrent hash tables