NB-FEB: A Universal Scalable Easy-to-Use Synchronization Primitive for Manycore Architectures


Phuong H. Ha (Univ. of Tromsø, Norway), Philippas Tsigas (Chalmers Univ., Sweden), Otto J. Anshus (Univ. of Tromsø, Norway)
Presentation at OPODIS ’09, December 15-18, 2009, Nimes, France

Problem
Manycores require scalable strong synchronization primitives.
- Conventional strong primitives do not scale well enough for manycores [UCB Landscape].
- Contention on a synchronization variable increases with the number of processing cores (figure: 2 cores, 16 cores, 1000 cores).

Desired features
New synchronization primitives for manycores should be:
- Scalable: to 1000s of cores
- Universal: powerful enough to support any kind of synchronization (like CAS, LL/SC)
- Feasible: implementable in hardware
- Easy-to-use

Our main contributions
- A novel synchronization primitive with all these features: the Non-blocking Full/Empty Bit (NB-FEB)
- NBFEB-STM: a non-blocking STM

Road-map
- NB-FEB: feasible, universal, scalable, easy-to-use
- NBFEB-STM: a non-blocking STM

Feasibility
Key idea: slight modifications of a widely deployed primitive
- A variant of the original FEB that always returns a value instead of waiting for a conditional flag
- Each primitive executes atomically on the pair (x, flag_x)

Test-Flag-and-Set:
    TFAS(x, v) {
        (o, flag_o) ← (x, flag_x);
        if flag_x = false then
            (x, flag_x) ← (v, true);
        end if
        return (o, flag_o);
    }

Store-And-Clear:
    SAC(x, v) {
        (o, flag_o) ← (x, flag_x);
        (x, flag_x) ← (v, false);
        return (o, flag_o);
    }

Store-And-Set:
    SAS(x, v) {
        (o, flag_o) ← (x, flag_x);
        (x, flag_x) ← (v, true);
        return (o, flag_o);
    }

Load:
    Load(x) {
        return (x, flag_x);
    }

Original FEB, Store-if-Clear-and-Set:
    SICAS(x, v) {
        wait for flag_x to be false;
        (x, flag_x) ← (v, true);
    }
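The four primitives can be modeled in software to make their semantics concrete. The sketch below is a hypothetical Python model, not the proposed hardware: a `threading.Lock` stands in for the atomicity that hardware would provide, `None` stands in for the empty value ⊥, and the class name `NBFEBWord` is our own.

```python
import threading
from typing import Any, Tuple


class NBFEBWord:
    """Hypothetical software model of one word with a full/empty bit.

    The lock only emulates the atomicity of the hardware primitives;
    None is used as the empty value (⊥).
    """

    def __init__(self, value: Any = None, flag: bool = False) -> None:
        self._value = value
        self._flag = flag
        self._lock = threading.Lock()

    def load(self) -> Tuple[Any, bool]:
        """Load: return the value and its flag."""
        with self._lock:
            return (self._value, self._flag)

    def tfas(self, v: Any) -> Tuple[Any, bool]:
        """Test-Flag-and-Set: write (v, true) only if the flag is clear."""
        with self._lock:
            old = (self._value, self._flag)
            if not self._flag:
                self._value, self._flag = v, True
            return old

    def sac(self, v: Any) -> Tuple[Any, bool]:
        """Store-And-Clear: unconditionally write (v, false)."""
        with self._lock:
            old = (self._value, self._flag)
            self._value, self._flag = v, False
            return old

    def sas(self, v: Any) -> Tuple[Any, bool]:
        """Store-And-Set: unconditionally write (v, true)."""
        with self._lock:
            old = (self._value, self._flag)
            self._value, self._flag = v, True
            return old


if __name__ == "__main__":
    x = NBFEBWord()      # empty word: (None, False)
    print(x.tfas(1))     # (None, False): the flag was clear, so x becomes (1, True)
    print(x.tfas(2))     # (1, True): the flag is set, so x is unchanged
    print(x.sac(3))      # (1, True): x becomes (3, False)
    print(x.load())      # (3, False)
```

None of these calls waits: unlike SICAS, a TFAS that finds the flag already set simply returns the current contents and leaves the decision to the caller.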

Universality
Key idea: write-once objects with 3+ states
- TFAS ⇒ wait-free consensus for any number n of processes

    Decision ← (⊥, false);
    TFAS_Consensus(proposal) {
        (first, _) ← TFAS(Decision, proposal);
        if first = ⊥ then return proposal;
        else return first;
    }
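The consensus protocol can be exercised with threads. This is a sketch under assumptions: `TFASWord` is a hypothetical lock-based stand-in for a hardware word supporting TFAS, `None` plays the role of ⊥, and `run_consensus` is only a test driver.

```python
import threading
from typing import Any, List, Tuple


class TFASWord:
    """Hypothetical lock-based stand-in for a word supporting TFAS."""

    def __init__(self) -> None:
        self._value: Any = None   # None plays the role of ⊥
        self._flag = False
        self._lock = threading.Lock()

    def tfas(self, v: Any) -> Tuple[Any, bool]:
        with self._lock:
            old = (self._value, self._flag)
            if not self._flag:
                self._value, self._flag = v, True
            return old


def tfas_consensus(decision: TFASWord, proposal: Any) -> Any:
    """Wait-free consensus: one TFAS per caller, no retries."""
    first, was_full = decision.tfas(proposal)
    # Testing the returned flag is equivalent to the test first = ⊥
    # (Decision starts as (⊥, false)) and also tolerates None as a proposal.
    return first if was_full else proposal


def run_consensus(n: int) -> List[Any]:
    """Let n threads propose distinct values and collect their decisions."""
    decision = TFASWord()
    results: List[Any] = [None] * n

    def worker(i: int) -> None:
        results[i] = tfas_consensus(decision, f"p{i}")

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    decisions = run_consensus(8)
    print(decisions)                 # all eight entries are the same proposal
    assert len(set(decisions)) == 1
```

Each call performs exactly one TFAS and never retries, so every thread finishes in a bounded number of steps regardless of the others; the first TFAS to reach the word fixes the decision.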

Scalability
Key idea: combinability
- Eliminates contention and reduces load
- Example: TFAS (executed atomically, as defined above). In a combining tree, TFAS(x,1) and TFAS(x,2) combine into TFAS(x,1), TFAS(x,3) and TFAS(x,4) combine into TFAS(x,3), and these two combine into a single TFAS(x,1) that reaches memory. x changes from ⊥ to 1; TFAS(x,1) receives the old value ⊥ and every other request receives 1.
- Note: CAS or LL/SC is not combinable
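Combinability can be checked against sequential execution. The sketch below, with hypothetical helper names, treats a word as a `(value, flag)` pair with `None` as ⊥, and assumes the combiner has already chosen a serialization order for the batch (in the tree above, TFAS(x,1) ends up first). It forwards only the first TFAS to memory and derives every other reply locally.

```python
from typing import Any, List, Tuple

Word = Tuple[Any, bool]  # (value, flag); None stands for ⊥


def tfas(word: Word, v: Any) -> Tuple[Word, Word]:
    """Sequential TFAS on a (value, flag) pair: returns (new word, reply)."""
    _, flag = word
    new = word if flag else (v, True)
    return new, word


def sequential_tfas(word: Word, proposals: List[Any]) -> Tuple[Word, List[Word]]:
    """Reference behaviour: apply every TFAS to memory one after another."""
    replies = []
    for v in proposals:
        word, reply = tfas(word, v)
        replies.append(reply)
    return word, replies


def combined_tfas(word: Word, proposals: List[Any]) -> Tuple[Word, List[Word]]:
    """Serve a batch of TFAS requests on x with a single memory access.

    Only the first request reaches memory. After any TFAS the flag is set,
    so each later TFAS leaves x unchanged and observes the word as the first
    request left it; those replies are generated locally by the combiner.
    """
    new, first_reply = tfas(word, proposals[0])
    return new, [first_reply] + [new] * (len(proposals) - 1)


if __name__ == "__main__":
    empty: Word = (None, False)
    print(combined_tfas(empty, [1, 2, 3, 4]))
    # ((1, True), [(None, False), (1, True), (1, True), (1, True)])
```

For an empty word and proposals 1 to 4, only one memory access happens, yet the replies match what four separate TFAS operations would have returned in that order.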

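NB-FEB combining logic
A primitive on x with parameter v1 (row), followed by a successive primitive on x with parameter v2 (column), is combined into the single primitive in the cell:

First \ Successive | Load     | SAC(v2) | SAS(v2) | TFAS(v2)
Load               | Load     | SAC(v2) | SAS(v2) | TFAS(v2)
SAC(v1)            | SAC(v1)  | SAC(v2) | SAS(v2) | SAS(v2)
SAS(v1)            | SAS(v1)  | SAC(v2) | SAS(v2) | SAS(v1)
TFAS(v1)           | TFAS(v1) | SAC(v2) | SAS(v2) | TFAS(v1)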
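The table can be read as a function from two primitives to one. In this hypothetical Python sketch, `effect` gives each primitive's effect on the `(value, flag)` pair, `combine` encodes the table, and `check_table` compares combined and sequential effects exhaustively over a few small values. For example, SAC(v1) followed by TFAS(v2) combines into SAS(v2), because SAC leaves the flag clear and the following TFAS therefore always writes.

```python
from itertools import product
from typing import Any, Optional, Tuple

Word = Tuple[Any, bool]          # (value, flag)
Op = Tuple[str, Optional[Any]]   # ("Load", None), ("SAC", v), ("SAS", v) or ("TFAS", v)


def effect(word: Word, op: Op) -> Word:
    """Effect of one primitive on x; its reply is always the old word."""
    kind, v = op
    if kind == "Load":
        return word
    if kind == "SAC":
        return (v, False)
    if kind == "SAS":
        return (v, True)
    if kind == "TFAS":
        return word if word[1] else (v, True)
    raise ValueError(f"unknown primitive {kind!r}")


def combine(first: Op, second: Op) -> Op:
    """One primitive whose effect on x equals `first` followed by `second`."""
    kind1 = first[0]
    kind2 = second[0]
    if kind2 == "Load":
        return first                 # a Load does not change x
    if kind2 in ("SAC", "SAS"):
        return second                # an unconditional store overwrites x
    # second is TFAS(v2)
    if kind1 == "Load":
        return second
    if kind1 == "SAC":
        return ("SAS", second[1])    # the flag is clear after SAC, so TFAS(v2) succeeds
    return first                     # the flag is set after SAS/TFAS, so TFAS(v2) is a no-op


def check_table() -> bool:
    """Exhaustively compare combined and sequential effects on small inputs."""
    words = [(v, f) for v in (None, 1, 2) for f in (False, True)]
    ops = [("Load", None)] + [(k, v) for k in ("SAC", "SAS", "TFAS") for v in (1, 2)]
    return all(
        effect(effect(w, a), b) == effect(w, combine(a, b))
        for w, a, b in product(words, ops, ops)
    )


if __name__ == "__main__":
    print(combine(("SAC", 1), ("TFAS", 2)))   # ('SAS', 2)
    print(check_table())                      # True
```

The combiner sends only the combined primitive to memory and receives the old word w; it can then reply w to the first request and effect(w, first) to the second, since that is exactly what each would have observed if executed in order.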

Easy-to-use
Key idea: abstractions for productivity-layer programmers
- Non-blocking software transactional memory: NBFEB-STM


NBFEB-STM
Models:
- Objects are accessed within transactions
- No nested transactions
- Garbage-collected programming languages (e.g. Java)
Features:
- Obstruction-free STM
- Eliminates conventional synchronization hot spots in STMs
- Optimal space complexity Θ(N)

Challenge 1: TFAS-SAC interleaving
- CAS-based STMs: a TMObj points to a locator with fields Old and New, owned by a transaction (figure: TM1 and TM2 each build a new locator by copying and try to install it with CAS1 and CAS2).
- NBFEB-STM: replacing CAS with TFAS needs SAC to clear the pointer's flag, so overlapping TFAS1 and TFAS2 may both succeed due to SAC's interference ⇒ this violates the TMObj's semantics.

Key idea 1
Keep a linked list of locators, each with a write-once pointer next (initially ⊥) set by TFAS ⇒ eliminates SAC interference
(figure: the locators of TM0, TM1 and TM2 chained through their next pointers, each link set with TFAS)
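The write-once `next` pointer can be illustrated in isolation. This is a simplified sketch, not NBFEB-STM itself: it omits transaction status, conflict handling, commit, and key ideas 2 and 3, and `WriteOnceRef`, `Locator`, `append` and `walk` are hypothetical names. Because a `next` pointer is only ever written by TFAS and never cleared here, at most one append can succeed on each locator, so two overlapping appends cannot both install their locator at the same position.

```python
import threading
from typing import Any, List, Tuple


class WriteOnceRef:
    """Hypothetical lock-based model of an NB-FEB word used as a write-once pointer."""

    def __init__(self) -> None:
        self._value: Any = None  # None plays the role of ⊥
        self._flag = False
        self._lock = threading.Lock()

    def tfas(self, v: Any) -> Tuple[Any, bool]:
        with self._lock:
            old = (self._value, self._flag)
            if not self._flag:
                self._value, self._flag = v, True
            return old

    def load(self) -> Tuple[Any, bool]:
        with self._lock:
            return (self._value, self._flag)


class Locator:
    """Illustrative locator: owning transaction plus old and new object versions."""

    def __init__(self, tx: Any, old: Any = None, new: Any = None) -> None:
        self.tx = tx
        self.old = old
        self.new = new
        self.next = WriteOnceRef()  # set at most once, by TFAS; never cleared here


def append(start: Locator, loc: Locator) -> Locator:
    """Link `loc` after the last locator reachable from `start`; return its predecessor."""
    node = start
    while True:
        succ, full = node.next.tfas(loc)
        if not full:
            return node  # our TFAS filled the empty next pointer
        node = succ      # someone else linked first: follow their locator


def walk(start: Locator) -> List[Locator]:
    """Return the locators linked after `start`, in list order."""
    out: List[Locator] = []
    succ, full = start.next.load()
    while full:
        out.append(succ)
        succ, full = succ.next.load()
    return out


if __name__ == "__main__":
    root = Locator("TM0")
    workers = [threading.Thread(target=append, args=(root, Locator(f"TM{i}")))
               for i in range(1, 9)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(sorted(loc.tx for loc in walk(root)))  # every TM1..TM8 appears exactly once
```

An appender that loses a race does not retry the same position; it moves on to the winner's locator, so each position in the list is decided by exactly one successful TFAS.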

Challenge 2: Space complexity
(figure: in CAS-based STMs, CAS2 swings the TMObj's pointer to TM2's locator, so the locators of TM0 and TM1 become unreachable; in NBFEB-STM, the locators of TM0, TM1 and TM2 all remain linked in the list)

Key idea 2
Only the head is needed for further accesses ⇒ break the list of obsolete locators with SAC ⇒ optimal space complexity Θ(N)
(figure: the locators of TM0, TM1 and TM2 with thread pointers p_i; SAC cuts the list of obsolete locators)

Challenge 3: Find the head
(figure: the locator list of TM0, TM1 and TM2 with thread pointers p_i, labelled Head and X)

Key idea 3
No nested transactions ⇒ one active locator per thread
(figure: an array with one slot per thread i, indexed 0, 1, …, N, next to the TMObj and the locator list of TM0, TM1 and TM2, with thread pointers p_i and SAC)

Correctness
NBFEB-STM fulfills the essential aspects of TM [Guerraoui, PPoPP ’08]:
- Instantaneous commit
- Precluding inconsistent views
- Preserving real-time order

Conclusions
- Introduced a novel non-blocking full/empty bit primitive (NB-FEB): scalable, universal, feasible and easy-to-use
- Provided an abstraction, NBFEB-STM, built on top of the primitive

Thanks for your attention!