Expander: Lock-free Cache for a Concurrent Data Structure


Expander: Lock-free Cache for a Concurrent Data Structure Pooja Aggarwal (IBM Research, Bangalore) Smruti R. Sarangi (IIT Delhi)

Concurrent Object
Each thread executes a method on the object.
[Diagram: Threads 1-3 send method requests to a concurrent object and receive responses on a timeline; the correctness criterion is linearizability.]

Blocking vs Non-blocking Algorithms

Blocking algorithm with a lock:
    lock
    val = acc.balance
    acc.balance += 100
    unlock
    return val

Non-blocking algorithm:
    val = acc.balance.fetchAndAdd(100)
    return val
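The two variants above can be sketched in runnable Java (the Account class and method names are illustrative, not from the paper; `getAndAdd` plays the role of `fetchAndAdd`):

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative account with a blocking and a non-blocking deposit.
class Account {
    private final ReentrantLock lock = new ReentrantLock();
    private long lockedBalance = 0;
    private final AtomicLong atomicBalance = new AtomicLong(0);

    // Blocking: the lock serializes all deposits; a preempted lock
    // holder blocks every other thread.
    long depositBlocking(long amount) {
        lock.lock();
        try {
            long val = lockedBalance;
            lockedBalance += amount;
            return val;             // old balance, as in the slide
        } finally {
            lock.unlock();
        }
    }

    // Non-blocking: a single atomic fetch-and-add, no lock taken.
    long depositNonBlocking(long amount) {
        return atomicBalance.getAndAdd(amount);
    }
}
```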

A Few Definitions
Blocking algorithms with locks: only one thread is allowed to hold the lock and enter the critical section.
Lock-free algorithms: there are no locks; at any point in time, at least one thread makes progress and completes its operation.
Wait-free algorithms: every request completes in a finite amount of time.

Comparison Between Locks, Lock-free, and Wait-free

              | Finite Execution Time | Disjoint Access Parallelism | Starvation Freedom
With Locks    | No                    | No                          | No
Lock-free     | No                    | Yes                         | No
Wait-free     | Yes                   | Yes                         | Yes

Why not make all algorithms wait-free?
Locks: easy to program, but suffer from starvation, blocking, and a lack of disjoint access parallelism.
Lock-free: a middle ground between the two.
Wait-free: you need a black belt in programming.

Example of Temporary Fields

CAS = CompareAndSet: CAS(val, old, new) → if (val == old) val = new;

Without a temporary field:
    while (true) {
        val = acc.balance;
        newval = val + 100;
        if (CAS(acc.balance, val, newval)) break;
    }

With a timestamp as a temporary field (balance = <value, timestamp>):
    while (true) {
        <val, ts> = acc.balance;
        newval = val + 100;
        newts = ts + 1;
        if (CAS(acc.balance, <val, ts>, <newval, newts>)) break;
    }
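The timestamped CAS loop above maps naturally onto Java's AtomicStampedReference, which pairs a reference with an integer stamp; the CAS succeeds only if both the value and the timestamp are unchanged. This is a sketch with illustrative names, not code from the paper:

```java
import java.util.concurrent.atomic.AtomicStampedReference;

// Illustrative account whose balance carries a timestamp as a
// temporary field, mirroring the slide's <value, timestamp> pair.
class StampedAccount {
    private final AtomicStampedReference<Long> balance =
        new AtomicStampedReference<>(0L, 0);

    long addHundred() {
        while (true) {
            int[] tsHolder = new int[1];
            Long val = balance.get(tsHolder);   // read <val, ts> atomically
            long newVal = val + 100;
            // CAS on both components: value AND timestamp must match.
            if (balance.compareAndSet(val, newVal,
                                      tsHolder[0], tsHolder[0] + 1)) {
                return newVal;
            }
            // Otherwise another thread interfered: retry.
        }
    }
}
```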

Packing Temporary Fields
A single memory word holds the value together with the fields: | Value | Field1 | Field2 | Field3 |
The size of each field is restricted by the size of the memory word.
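A minimal sketch of packing in Java, assuming a 64-bit word split into a 32-bit value and three temporary fields of 16, 8, and 8 bits; the field widths are arbitrary choices for illustration, and the restriction the slide mentions is visible directly in the bit budget:

```java
// Pack a value and three temporary fields into one 64-bit word.
// Layout (high to low): value:32 | field1:16 | field2:8 | field3:8.
class Packed {
    static long pack(long value, long f1, long f2, long f3) {
        return (value << 32) | (f1 << 16) | (f2 << 8) | f3;
    }
    static long value(long w)  { return w >>> 32; }           // top 32 bits
    static long field1(long w) { return (w >>> 16) & 0xFFFF; }
    static long field2(long w) { return (w >>> 8) & 0xFF; }
    static long field3(long w) { return w & 0xFF; }
}
```

Because everything lives in one word, a single hardware CAS updates the value and all the fields atomically; the cost is that the fields together can never exceed the word size.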

Redirection
The memory word instead stores a pointer to an object that holds the value and the fields: | Pointer | → | Value | Field1 | Field2 | Field3 |
This causes a lot of memory wastage.
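Redirection can be sketched in Java with an AtomicReference to an immutable descriptor object; every update allocates a fresh descriptor, which is exactly where the memory overhead comes from (all names here are illustrative):

```java
import java.util.concurrent.atomic.AtomicReference;

// The word stores a pointer to an immutable descriptor carrying the
// value plus arbitrary temporary fields. Each update allocates a new
// descriptor, so the CAS is on the pointer, not on the fields.
class Redirected {
    static final class Descriptor {
        final long value; final int field1; final boolean mark;
        Descriptor(long v, int f1, boolean m) {
            value = v; field1 = f1; mark = m;
        }
    }

    private final AtomicReference<Descriptor> word =
        new AtomicReference<>(new Descriptor(0, 0, false));

    Descriptor read() { return word.get(); }

    // Succeeds only if the word still points at the descriptor we read.
    boolean casWord(Descriptor expect, long newVal, int newF1, boolean newMark) {
        return word.compareAndSet(expect, new Descriptor(newVal, newF1, newMark));
    }
}
```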

Examples of Packing and Redirection

Packing:
    Paper                      Authors            Temporary Fields
    Wait-free multiword CAS    Sundell            index, thread id, descriptor
    Universal construction     Anderson et al.    thread id, valid bit, count
    Wait-free slot scheduler   Aggarwal et al.    request id, thread id, round, timestamp, slot number, state

Redirection:
    Paper                      Authors            Temporary Fields
    Wait-free queue            Kogan et al.       enqueue id, dequeue id
    Wait-free priority queue   Israeli et al.     value, type, freeze bit
    Wait-free linked list      Timnat et al.      mark bit and success bit

Idea of the Expander
[Diagram: Program ↔ Expander ↔ Memory Space]
The Expander works as a cache: it caches the value of a memory word along with its temporary fields.
If the word is no longer required, its value is flushed to memory and the temporary fields are removed.
Advantages: any number of temporary fields with arbitrary sizes; makes packing feasible; eliminates the memory overhead of redirection.

Design of the Expander
[Diagram: a hash table (listHead) maps the hash of a memory word's index to a list of nodes of type MemCell. Each node stores the memIndex, a dataState (value, tmpFields, state), a version, a timestamp, a mark bit, and a next reference.]

Basic Operations
kGet → get the value of a memory word along with its temporary fields
kSet → set the value of a memory word and its fields
kCAS → compare the value and all the fields; if all of them match, do a kSet
free → remove the entry from the Expander
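To make the semantics of the four operations concrete, here is a toy, single-threaded sketch in Java. The real Expander is lock-free and uses the hash table of MemCell nodes described above, so this only illustrates the API behavior; every implementation detail here is an assumption:

```java
import java.util.HashMap;
import java.util.Map;

// Toy (NOT lock-free, NOT thread-safe) model of the Expander's API.
class ToyExpander {
    static final class Entry {
        final long value; final long tmpFields;
        Entry(long v, long f) { value = v; tmpFields = f; }
    }

    private final Map<Integer, Entry> cache = new HashMap<>();  // the Expander
    private final Map<Integer, Long> memory = new HashMap<>();  // backing store

    // Hit: return value + fields. Miss: read the value from memory
    // and use default (zero) values for the temporary fields.
    Entry kGet(int idx) {
        Entry e = cache.get(idx);
        if (e != null) return e;
        return new Entry(memory.getOrDefault(idx, 0L), 0L);
    }

    void kSet(int idx, long value, long fields) {
        cache.put(idx, new Entry(value, fields));
    }

    // Compare the value AND all the fields; only if both match, kSet.
    boolean kCAS(int idx, Entry expected, long newValue, long newFields) {
        Entry cur = kGet(idx);
        if (cur.value != expected.value || cur.tmpFields != expected.tmpFields)
            return false;
        kSet(idx, newValue, newFields);
        return true;
    }

    // Flush the value back to memory and drop the temporary fields.
    void free(int idx) {
        Entry e = cache.remove(idx);
        if (e != null) memory.put(idx, e.value);
    }
}
```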

FSM of an Expander's Node
[State diagram with states CLEAN, DIRTY, WRITEBACK, and FLUSH; the edges are labeled with the operations kGet, kSet/kCAS, and free.]

kGet
Is the word in the Expander?
Yes → return the value and the temporary fields.
No → read the value from memory and use default values for the temporary fields.

kCAS
node ← lookUpOrAllocate()
If node.state == FLUSH → help delete the node and return false.
Do all the values/fields match?
No → return false.
Yes → set a new dataState, create a new version, and set the state to DIRTY; return false if not successful, else return true.

free
If state == WRITEBACK or FLUSH → return false (a free is already in progress).
Otherwise → set state = WRITEBACK, write the value to memory, set state = FLUSH, and help delete the node.

Proofs and Example
All the Expander's methods are linearizable and lock-free.
Consider the special class of wait-free algorithms in which a request is guaranteed to complete once at least one thread handling it has completed N steps: such algorithms remain wait-free with the Expander.
The paper shows a wait-free queue built with the Expander; only 8 lines of code change.

Evaluation Setup
Machine: Dell PowerEdge R820 server; four sockets, each with a 16-thread 2.2 GHz Xeon chip; 16 MB L2 cache and 64 GB of main memory.
Software: Ubuntu 12.10 and Java 1.7; memory usage measured with Java's built-in totalMemory and freeMemory calls.
Workloads: 1-6 temporary fields (1-62 additional bits).
Packing-based algorithms: wait-free multi-word compare-and-set; wait-free slot scheduler; slot scheduler for SSD devices (RADIR).
Redirection-based algorithms: wait-free queue; lock-free linked list and binary search tree; lock-free skiplist.

Slowdown (with 64 threads)

Reduction in Memory Usage (Redirection)
5-35% reduction in the memory footprint

Conclusions
The Expander has many advantages:
- It makes algorithms with packing feasible.
- It reduces the memory footprint of algorithms with redirection by up to 35%.
- It requires minimal modifications to the code.
- Most wait-free algorithms remain wait-free with the Expander.