Transactional Libraries

Slides:



Advertisements
Similar presentations
Inferring Locks for Atomic Sections Cornell University (summer intern at Microsoft Research) Microsoft Research Sigmund CheremTrishul ChilimbiSumit Gulwani.
Advertisements

Copyright 2008 Sun Microsystems, Inc Better Expressiveness for HTM using Split Hardware Transactions Yossi Lev Brown University & Sun Microsystems Laboratories.
Time-based Transactional Memory with Scalable Time Bases Torvald Riegel, Christof Fetzer, Pascal Felber Presented By: Michael Gendelman.
Guy Golan-GuetaTel-Aviv University Nathan Bronson Stanford University Alex Aiken Stanford University G. Ramalingam Microsoft Research Mooly Sagiv Tel-Aviv.
Privatization Techniques for Software Transactional Memory Michael F. Spear, Virendra J. Marathe, Luke Dalessandro, and Michael L. Scott University of.
Impossibilities for Disjoint-Access Parallel Transactional Memory : Alessia Milani [Guerraoui & Kapalka, SPAA 08] [Attiya, Hillel & Milani, SPAA 09]
Enabling Speculative Parallelization via Merge Semantics in STMs Kaushik Ravichandran Santosh Pande College.
Presented by: Dmitri Perelman.  Intro  “Don’t touch my read-set” approach  “Precedence graphs” approach  On avoiding spare aborts  Your questions.
IDIT KEIDAR DMITRI PERELMAN RUI FAN EuroTM 2011 Maintaining Multiple Versions in Software Transactional Memory 1.
Transactional Locking Nir Shavit Tel Aviv University (Joint work with Dave Dice and Ori Shalev)
Silberschatz, Galvin and Gagne ©2009 Operating System Concepts – 8 th Edition, Chapter 6: Process Synchronization.
Evaluating Database-Oriented Replication Schemes in Software Transacional Memory Systems Roberto Palmieri Francesco Quaglia (La Sapienza, University of.
McRT-Malloc: A Scalable Non-Blocking Transaction Aware Memory Allocator Ali Adl-Tabatabai Ben Hertzberg Rick Hudson Bratin Saha.
Pessimistic Software Lock-Elision Nir Shavit (Joint work with Yehuda Afek Alexander Matveev)
Thread-Level Transactional Memory Decoupling Interface and Implementation UW Computer Architecture Affiliates Conference Kevin Moore October 21, 2004.
Submitted by: Omer & Ofer Kiselov Supevised by: Dmitri Perelman Networked Software Systems Lab Department of Electrical Engineering, Technion.
DMITRI PERELMAN IDIT KEIDAR TRANSACT 2010 SMV: Selective Multi-Versioning STM 1.
Formalisms and Verification for Transactional Memories Vasu Singh EPFL Switzerland.
Elastic Transactions Unleashing Concurrency of Transactional Memory Pascal Felber Vincent Gramoli Rachid Guerraoui.
CS510 Advanced OS Seminar Class 10 A Methodology for Implementing Highly Concurrent Data Objects by Maurice Herlihy.
Language Support for Lightweight transactions Tim Harris & Keir Fraser Presented by Narayanan Sundaram 04/28/2008.
The Cost of Privatization Hagit Attiya Eshcar Hillel Technion & EPFLTechnion.
Why The Grass May Not Be Greener On The Other Side: A Comparison of Locking vs. Transactional Memory Written by: Paul E. McKenney Jonathan Walpole Maged.
An Introduction to Software Transactional Memory
Automatic Data Partitioning in Software Transactional Memories Torvald Riegel, Christof Fetzer, Pascal Felber (TU Dresden, Germany / Uni Neuchatel, Switzerland)
Software Transactional Memory for Dynamic-Sized Data Structures Maurice Herlihy, Victor Luchangco, Mark Moir, William Scherer Presented by: Gokul Soundararajan.
Reduced Hardware NOrec: A Safe and Scalable Hybrid Transactional Memory Alexander Matveev Nir Shavit MIT.
A Qualitative Survey of Modern Software Transactional Memory Systems Virendra J. Marathe Michael L. Scott.
A Consistency Framework for Iteration Operations in Concurrent Data Structures Yiannis Nikolakopoulos A. Gidenstam M. Papatriantafilou P. Tsigas Distributed.
WG5: Applications & Performance Evaluation Pascal Felber
Optimistic Methods for Concurrency Control By: H.T. Kung and John Robinson Presented by: Frederick Ramirez.
A Simple Optimistic skip-list Algorithm Maurice Herlihy Brown University & Sun Microsystems Laboratories Yossi Lev Brown University & Sun Microsystems.
The read-copy-update mechanism for supporting real-time applications on shared-memory multiprocessor systems with Linux Guniguntala et al.
Consistency Oblivious Programming Hillel Avni Tel Aviv University.
Software Transactional Memory Should Not Be Obstruction-Free Robert Ennals Presented by Abdulai Sei.
© 2008 Multifacet ProjectUniversity of Wisconsin-Madison Pathological Interaction of Locks with Transactional Memory Haris Volos, Neelam Goyal, Michael.
A Relativistic Enhancement to Software Transactional Memory Philip Howard, Jonathan Walpole.
Alpine Verification Meeting 2008 Model Checking Transactional Memories Vasu Singh (Joint work with Rachid Guerraoui, Tom Henzinger, Barbara Jobstmann)
Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Concurrent Skip Lists.
Adaptive Software Lock Elision
Jim Fawcett CSE 691 – Software Modeling and Analysis Fall 2000
Minh, Trautmann, Chung, McDonald, Bronson, Casper, Kozyrakis, Olukotun
Part 2: Software-Based Approaches
Isotope: Transactional Isolation for Block Storage
Faster Data Structures in Transactional Memory using Three Paths
A Lock-Free Algorithm for Concurrent Bags
Expander: Lock-free Cache for a Concurrent Data Structure
Practical Non-blocking Unordered Lists
Anders Gidenstam Håkan Sundell Philippas Tsigas
Threads and Memory Models Hal Perkins Autumn 2011
Chapter 9: Virtual-Memory Management
Concurrent Data Structures Concurrent Algorithms 2017
A Qualitative Survey of Modern Software Transactional Memory Systems
Lecture 6: Transactions
Threads and Memory Models Hal Perkins Autumn 2009
Lecture 22: Consistency Models, TM
Does Hardware Transactional Memory Change Everything?
Hybrid Transactional Memory
Introduction of Week 13 Return assignment 11-1 and 3-1-5
Software Transactional Memory Should Not be Obstruction-Free
A Concurrent Lock-Free Priority Queue for Multi-Thread Systems
Getting to the root of concurrent binary search tree performance
CSE451 Virtual Memory Paging Autumn 2002
Deferred Runtime Pipelining for contentious multicore transactions
Lecture 1: Introduction
CSC 143 Binary Search Trees.
CSE 153 Design of Operating Systems Winter 19
Dynamic Performance Tuning of Word-Based Software Transactional Memory
Concurrency control (OCC and MVCC)
Pointer analysis John Rollinson & Kaiyuan Li
Presentation transcript:

Transactional Libraries Alexander Spiegelman*, Guy Golan-Gueta†, and Idit Keidar†* *Technion †Yahoo Research

Agenda Motivation Concurrent Data Structure Libraries (CDSLs) vs Transactional Memory Introducing: Transactional Data Structure Libraries (TDSL) Example TDSL algorithm Skiplist Composition of multiple objects Fast abort-free singletons Library composition Evaluation

Multi-Threading is Everywhere Multi core machines Too much data for sequential programs Many clients Concurrent is fast

Data Structures (DS) Essential building blocks in modern SW Map, skiplist, queue, etc.

But Are They “Thread Safe”? Correct under concurrency? OR ? לדוגמא רוצים למנוע שמכניסים שני איברים במקביל ורואים רק אחד

“Thread-Safe” Concurrent DS Libraries Widely used in real life software Numerous research papers Concurrent Skipklist [Herlihy, Lev, Luchangco, Shavit: A simple optimistic skiplist algorithm] [Fraser: Practical lock freedom] … Concurrent queue [Michael, Scott: Simple, fast, and practical nonblocking and blocking concurrent queue algorithms] [Gramoli, Guerraoui: Reusable concurrent data types] Concurrent binary tree [Bronson, Casper, Chafi, Olukotun: A practical concurrent binary search tree] [Drachsler, Vechev, Yahav: Practical concurrent binary search trees via logical ordering] API example

Concurrent Data Structure Libraries (CDSLs) Each operation executes atomically Custom-tailored implementation preserves semantics balancemap.get (key=Yoni) newBalance balance + deposit map.set(key=Yoni, newBalance) repQ.enq(“Yoni’s balance=new”) queue map App API example

Are They Really Thread Safe?? Oops! Atomic operations are not enough balance map.get (key=Yoni) balance map.get(key=Yoni) newBalance balance + deposit map.set(key=Yoni, newBalance) repQ.enq(“Yoni’s balance=new”) API example

Software Transactional Memory (STM) TL2, TinySTM, SwissTM,… Transactions (TXs) = atomic sections including multiple DS operations Begin_TX balancemap.get(key=Yoni) newBalance  balance + deposit map.set(key=Yoni, newBalance) repQ.enq(“Yoni’s balance=new”) End_TX STM map queue App begin end abort API example DS does not have to be concurrent.

CDSL vs STM Why? CDSL STM Performance   Exploit DS semantics Used in practice Generality Composability

STM Overhead Explained Common STM solution: Each object has a version TX reads a “consistent snapshot” Validates versions have not changed by commit time Otherwise aborts TX updates occur during commit Sources of overhead: Global version clock (GVC)  high contention Read- and write-sets  tracking and validation overhead Conflicts  aborts

Agenda Motivation Concurrent Data Structure Libraries (CDSLs) vs Transactional Memory Introducing: Transactional Data Structure Libraries (TDSL) Example TDSL algorithm Skiplist Composition of multiple objects Fast abort-free singletons Library composition Evaluation

TDSL: Bringing Transactions into CDSL queue map App STM programmability: TXs span any number of operations Legacy “singleton” operations: fast, abort-free abort end begin Transactional Library Missing: Generality of STM Custom-tailored, performance of CDSL

TDSL Benefit 1: Programmability Support for legacy code Fast, abort-free singletons Power of transactions Begin_TX valmap.get(key=Yoni) new val+deposit map.set(key=Yoni,new) repQ.enq(“Yoni’s balance=new”) End_TX queue map App Transactional Library

TDSL Benefit 2: Semantic Optimization Use known transactional solutions But, take advantage of semantics & structure of each DS to reduce aborts & overhead Reduce read-set Do some of the work non-transactionally The idea is to support TXs but avoid unnecessary aborts.

Example of Abort Reduction Two concurrent put operations Put(23) traverses the list, reads nodes that put(4) updates Conflict, STM would abort at least one But no semantic conflict 4 23 We will see in a few slides that in standard STM one of this operations will abort. 2 5 9 17 34

Example of Abort Reduction Two concurrent put operations Put(23) traverses the list, reading nodes put(4) updates Conflict, STM would abort at least one But no semantic conflict 4 23 For example: linklist We will see in a few slides that in standard STM one of this operations will abort. 2 5 9 17 34

TDSL Benefit 3: Mix & Match Maps allow lots of concurrency Amenable to optimistic synchronization Queues do not Pessimistic synchronization better STM picks one Our TDSL can mix & match pessimistic queue optimistic map App The idea is to support TXs but avoid unnecessary aborts. Transactional Library

Agenda Motivation Concurrent Data Structure Libraries (CDSLs) vs Transactional Memory Introducing: Transactional Data Structure Libraries (TDSL) Example TDSL algorithm Skiplist Composition of multiple objects Fast abort-free singletons Library composition Evaluation

Skiplist Roadmap Add STM-like transaction support to simple linked list Based on TL2 STM algorithm Optimization: remove redundant validation and tracking Use semantics and structure Optimization: shorten transactions Non-transactional index Lazy GC

Step 1: Standard STM Take a simple linked list Add TL2 mechanism to support TXs: Add a version to each node Get versions from global version clock (GVC) Maintain read-set with read versions Defer updates, track in write-set To commit: validate read-set, lock write-set, increment GCV, update 2 5 9 17 34

Step 1: Standard STM insert(10): read-set write-set Local memory 2 5 9 9 10 17 Read Local memory Shared memory Validate GVC 2 5 9 17 34

Step 2: Reduce Read-Set Exploit structure and semantics insert(10): write-set 10 9 9 10 Read Local memory Shared memory Validate GVC 2 5 9 17 34

Step 3: Non-Transactional Index Imagine we could guess the right node So, we could be faster and save aborts during the traversal GVC 34 2 5 9 17

Step 3: Non-Transactional Index getSmaller Returns some node with a smaller key For performance, not much smaller Implemented as (concurrent) skiplist For example remove(n) App Index add(n) getSmaller (key)

Step 3: Non-Transactional Index insert(10): App Index 10 getSmaller(10) return 5 Start traverse from 5 After we read the global clock GVC 2 5 9 17 34

Step 3: Non-Transactional Index Updated outside the TX (if completes successfully) Reduces aborts, overhead But: May return nodes with smaller keys than predecessor  longer traversals May return removed nodes Subtle, more details in the paper

Agenda Motivation Concurrent Data Structure Libraries (CDSLs) vs Transactional Memory Introducing: Transactional Data Structure Libraries (TDSL) Example TDSL algorithm Skiplist Composition of multiple objects Fast abort-free singletons Library composition Evaluation

Composition One GVC shared among all objects Commit: Lock all write sets Validate all read sets Increase GVC Update objects and release locks להגיד: מימשנו גם queue.

Agenda Motivation Concurrent Data Structure Libraries (CDSLs) vs Transactional Memory Introducing: Transactional Data Structure Libraries (TDSL) Example TDSL algorithm Skiplist Composition of multiple objects Fast abort-free singletons Library composition Evaluation

Fast Abort-Free Singletons Use the index Do not increment GVC No contention As fast as in CDSL Make transactions aware of singletons using designated fields Details in the paper להזכיר למה זה חשוב לנו. ממה התחלנו בעצם..

Agenda Motivation Concurrent Data Structure Libraries (CDSLs) vs Transactional Memory Introducing: Transactional Data Structure Libraries (TDSL) Example TDSL algorithm Skiplist Composition of multiple objects Fast abort-free singletons Library composition Evaluation

Composing TDSLs Extend API to support: TX-begin Divide TX-commit into three phases: TX-lock, TX-verify, and TX-finalize TX-abort Perform TX-begin and each of the commit phase in all libraries Wait for completion in all libraries before moving to the next phase If catch abort in some library, perform TX-abort in all Some semantics have to be satisfied Based on theory of PLDI’15 paper by Ziv et al. [Ziv, Aiken, Golan-Gueta, Ramalingam, Sagiv. Composing concurrency control]

Compose With Existing Libraries Modify library to support the above API E.g., TL2 STM naturally supports it E.g., use standard 2 phase locking Some of the phases can be empty E.g., empty TX-verify in pessimistic two-phase locking Together we get general transactions Ones supported fully by a custom-tailored TDSL are fast Others less so

Agenda Motivation Concurrent Data Structure Libraries (CDSLs) vs Transactional Memory Introducing: Transactional Data Structure Libraries (TDSL) Example TDSL algorithm Skiplist Composition of multiple objects Fast abort-free singletons Library composition Evaluation

Singletons As good as baseline Do not support transactions We use synchrobench. our skiplist rotating nohotspot fraser optimistic baseline Do not support transactions

Transactions We have less overhead We have less aborts משויים מול מימוש שמשתמש ב.STM בזכות חסכון ב overhead. our skiplist SeqTL2 FriendlyTL2

Transactions (Aborts) משויים מול מימוש שמשתמש ב.STM בזכות החיסכון ב aborts. our skiplist SeqTL2 FriendlyTL2

Intruder Testing multiple skiplists and queues in a real application cannot support intruder our skiplist SeqTL2 FriendlyTL2

Conclusion We introduced a new concept for concurrent programing Composable DSs supporting TX as well as fast singletons Implemented library with two DSs Map (based on skiplist) and queue We hope that the community will adopt this concept Build and use more such libraries