NOBLE: A Non-Blocking Inter-Process Communication Library

Presentation transcript:

Slide 1: NOBLE: A Non-Blocking Inter-Process Communication Library
Håkan Sundell, Philippas Tsigas
Computing Science, Chalmers University of Technology

Slide 2: Systems
Multi-processor systems: cache-coherent shared memory
  – UMA
  – NUMA
Desktop computers

Slide 3: Synchronization
A significant part of the work performed by today's parallel applications is synchronization
Mutual exclusion (locks)
  – Blocking
  – Convoy effects
  – Deadlocks

Slide 4: Convoy effects
The slowdown of one process may cause the whole system to slow down

Slide 5: Research
Non-blocking synchronization has been researched since the 1970s
  – Lock-free
  – Wait-free
Non-blocking algorithms are based on the use of
  – atomic synchronization primitives
  – shared memory

Slide 6: Non-blocking Synchronization
Lock-free synchronization
  – An operation retries until it completes without interference from other operations. Interference is usually detected through a shared variable that indicates a busy state or similar.
  – Guarantees liveness, but not freedom from starvation.
Typical structure of a lock-free operation:
  1. Change a flag to a unique value, or remember the current state.
  2. Do the operation while preserving the active structure.
  3. Check for the same value or state and then validate the changes; otherwise retry.
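The remember, operate, validate-or-retry pattern described on this slide can be sketched with a compare-and-swap loop. The example below is only an illustration and is not taken from NOBLE: the function name `lf_add_saturating` and the saturating-counter task are assumptions chosen to keep the sketch short, and it relies on C11 `<stdatomic.h>` rather than on the library's own primitives.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical example (not part of NOBLE): add `delta` to a shared counter,
 * never letting it exceed `limit`. Assumes values small enough that
 * `seen + delta` cannot overflow an int. */
static int lf_add_saturating(atomic_int *counter, int delta, int limit)
{
    int seen = atomic_load(counter);      /* 1. remember the current state */
    for (;;) {
        int wanted = seen + delta;        /* 2. do the operation on a private copy */
        if (wanted > limit)
            wanted = limit;
        /* 3. validate: publish only if the counter still holds `seen`.
         * On failure `seen` is refreshed with the current value and the
         * loop retries, so another thread must have made progress. */
        if (atomic_compare_exchange_weak(counter, &seen, wanted))
            return wanted;
    }
}
```

A thread that keeps losing the compare-and-swap can retry indefinitely, which is the starvation the slide refers to; the system as a whole still makes progress because every failed attempt implies that some other operation succeeded.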

Slide 7: Non-blocking Synchronization
Wait-free synchronization
  – All concurrent operations can proceed independently of the others.
  – Every process always finishes the protocol in a bounded number of steps, regardless of interleaving.
  – No starvation
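As a small illustration of the bounded-steps property, the sketch below gives each thread its own slot in a shared counter, so no operation ever waits or retries. It is an assumption-laden toy and not a NOBLE data structure: the names `wf_counter`, `wf_counter_add`, `wf_counter_read` and the constant `WF_MAX_THREADS` are hypothetical, and each slot is assumed to be written by exactly one thread.

```c
#include <assert.h>
#include <stdatomic.h>

#define WF_MAX_THREADS 4   /* hypothetical fixed number of participating threads */

typedef struct {
    atomic_long slot[WF_MAX_THREADS];   /* slot[i] is written only by thread i */
} wf_counter;

static void wf_counter_init(wf_counter *c)
{
    for (int i = 0; i < WF_MAX_THREADS; i++)
        atomic_init(&c->slot[i], 0);
}

/* Exactly one load and one store, never retried: the step count is bounded
 * regardless of what the other threads are doing. */
static void wf_counter_add(wf_counter *c, int tid, long delta)
{
    long cur = atomic_load_explicit(&c->slot[tid], memory_order_relaxed);
    atomic_store_explicit(&c->slot[tid], cur + delta, memory_order_release);
}

/* Exactly WF_MAX_THREADS loads. The sum may mix values from slightly
 * different moments; it is not an atomic snapshot of all slots. */
static long wf_counter_read(wf_counter *c)
{
    long sum = 0;
    for (int i = 0; i < WF_MAX_THREADS; i++)
        sum += atomic_load_explicit(&c->slot[i], memory_order_acquire);
    return sum;
}
```

Getting a consistent view of all slots at once is a harder problem, and is what a snapshot object with update and scan operations addresses.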

Slide 8: Practice
Non-blocking synchronization is still not used in many practical applications
Non-blocking solutions are often
  – complex
  – equipped with non-standard or unclear interfaces
  – impractical
Many results show that non-blocking synchronization significantly improves the performance of parallel applications

Slide 9: Non-blocking Synchronization – Practice
P. Tsigas, Y. Zhang, "Evaluating the Performance of Non-Blocking Synchronization on Modern Shared Memory Multiprocessors", ACM SIGMETRICS 2001

Slide 10: Schedule
  – Goals
  – Design
  – Examples
  – Experiments
  – Status
  – Conclusions and future work
NOBLE: Brings non-blocking closer to practice

Slide 11: Goals
Create a non-blocking inter-process communication interface that has the following properties:
  – Attractive functionality
  – Programmer friendly
  – Easy to adapt existing solutions
  – Efficient
  – Portable
  – Adaptable for different programming languages

Slide 12: Design: Attractive functionality
Data structures for multi-threaded usage
  – Queues: enqueue and dequeue
  – Stacks: push and pop
  – Singly linked lists: first, next, insert, delete and read
  – Snapshots: update and scan
Data structures for multi-process usage
  – Shared register: read and write
Clear specifications of the operations on each data structure

Slide 13: Design: Programmer friendly
Hide the complexity as much as possible!
Just one include file
Simple naming convention: every function name begins with the characters NBL

    #include ...
    NBLQueueEnqueue()
    NBLQueueDequeue()
    …

Slide 14: Design: Easy to adapt solutions
Support lock-based as well as non-blocking solutions.
Several different create functions
Unified functions for the operations, independent of the synchronization method

    NBLQueue *NBLQueueCreateLF();
    NBLQueue *NBLQueueCreateLB();
    NBLQueueFree(handle);
    NBLQueueEnqueue(handle,item);
    NBLQueueDequeue(handle);

Slide 15: Design: Efficient
Function pointers are used to minimize overhead
In-line redirection through macros

    typedef struct NBLQueue {
        void *data;
        void (*free)(void *data);
        void (*enqueue)(void *data, void *item);
        void *(*dequeue)(void *data);
    } NBLQueue;

    /* Macro arguments are parenthesized so that any expression can be passed as the handle. */
    #define NBLQueueFree(handle) ((handle)->free((handle)->data))
    #define NBLQueueEnqueue(handle,item) ((handle)->enqueue((handle)->data,(item)))
    #define NBLQueueDequeue(handle) ((handle)->dequeue((handle)->data))
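To make the function-pointer redirection concrete, here is a self-contained sketch of a handle with one lock-based backend plugged in. Everything in it is hypothetical and only mirrors the shape of the declarations on the slides: the `SketchQueue` names, the fixed-capacity ring buffer, the test-and-set spin lock built on C11 `atomic_flag`, and the choice to drop items when the buffer is full are all assumptions, not NOBLE's implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Handle with the same shape as the struct on the slide (names hypothetical). */
typedef struct SketchQueue {
    void *data;
    void (*free)(void *data);
    void (*enqueue)(void *data, void *item);
    void *(*dequeue)(void *data);
} SketchQueue;

/* Callers use only these macros, whatever backend sits behind the handle. */
#define SketchQueueEnqueue(h, item) ((h)->enqueue((h)->data, (item)))
#define SketchQueueDequeue(h)       ((h)->dequeue((h)->data))

/* Lock-based backend: ring buffer guarded by a test-and-set spin lock. */
#define SKETCH_CAPACITY 16
typedef struct {
    atomic_flag lock;
    void *items[SKETCH_CAPACITY];
    unsigned head, tail;   /* head: next to dequeue, tail: next free slot */
} LBQueue;

static void lb_lock(LBQueue *q)
{
    while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
        ;   /* spin until the previous holder clears the flag */
}

static void lb_unlock(LBQueue *q)
{
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

static void lb_enqueue(void *data, void *item)
{
    LBQueue *q = data;
    lb_lock(q);
    if (q->tail - q->head < SKETCH_CAPACITY)   /* sketch only: drop when full */
        q->items[q->tail++ % SKETCH_CAPACITY] = item;
    lb_unlock(q);
}

static void *lb_dequeue(void *data)
{
    LBQueue *q = data;
    void *item = NULL;                          /* NULL means "queue empty" */
    lb_lock(q);
    if (q->head != q->tail)
        item = q->items[q->head++ % SKETCH_CAPACITY];
    lb_unlock(q);
    return item;
}

static void lb_free(void *data) { free(data); }

/* The create function is the only place that names the backend. */
static SketchQueue *SketchQueueCreateLB(void)
{
    SketchQueue *h = malloc(sizeof *h);
    LBQueue *q = malloc(sizeof *q);
    if (!h || !q) { free(h); free(q); return NULL; }
    atomic_flag_clear(&q->lock);
    q->head = q->tail = 0;
    h->data = q;
    h->free = lb_free;
    h->enqueue = lb_enqueue;
    h->dequeue = lb_dequeue;
    return h;
}

static void SketchQueueFree(SketchQueue *h)
{
    h->free(h->data);   /* backend releases its own state */
    free(h);            /* then the handle itself */
}
```

A lock-free backend would supply a different create function that fills in the same three pointers, so code written against the macros would not need to change.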

Slide 16: Design: Portable
Exported definitions (identical on all platforms)
  – Noble.h: #define NBL...
Platform independent
  – QueueLF.c: #include "Platform/Primitives.h" …
  – StackLF.c: #include "Platform/Primitives.h" …
Platform dependent
  – SunHardware.asm: CAS, TAS, Spin-Locks …
  – IntelHardware.asm: CAS, TAS, Spin-Locks ...
  – ...
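The layering on this slide keeps the data-structure code free of platform details by routing every atomic operation through a small primitives layer. The sketch below shows what such a layer might look like if it were written on top of C11 `<stdatomic.h>` instead of per-platform assembly. The names `sketch_word`, `sketch_cas` and `sketch_tas` are hypothetical, and the actual contents of `Platform/Primitives.h` are not shown in the slides.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical primitives layer: platform-independent code would call only
 * these functions, and each port would supply its own definitions. */
typedef atomic_long sketch_word;

/* Compare-and-swap: store `desired` only if *addr still equals `expected`.
 * Returns true when the swap happened. */
static inline bool sketch_cas(sketch_word *addr, long expected, long desired)
{
    return atomic_compare_exchange_strong(addr, &expected, desired);
}

/* Test-and-set: set the flag and return its previous value. */
static inline bool sketch_tas(atomic_flag *flag)
{
    return atomic_flag_test_and_set(flag);
}
```

With this split, moving to a new platform means rewriting only the bodies of these functions; the queue and stack sources that include the primitives header stay untouched.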

Slide 17: Design: Adaptable for different programming languages
Implemented in C, all compiled into a library file.
C++ compatible include files, and easy to make C++ wrappers

    class NOBLEQueue {
    private:
        NBLQueue* queue;
    public:
        NOBLEQueue(int type) { if (type == NBL_LOCKFREE) queue = NBLQueueCreateLF(); else … }
        ~NOBLEQueue() { NBLQueueFree(queue); }
        inline void Enqueue(void *item) { NBLQueueEnqueue(queue, item); }
        ...

Slide 18: Examples
First, create a global variable that refers to the shared data object, for example a stack:
  Globals
    #include...
    NBLStack* stack;
Create the stack with the appropriate implementation:
  Main
    stack=NBLStackCreateLF(10000);
    ...
When some thread wants to do some operation:
  Threads
    NBLStackPush(stack, item);
    or
    item=NBLStackPop(stack);
When the data structure is not in use anymore:
  Main
    NBLStackFree(stack);

Slide 19: Examples
To change the synchronization mechanism, only one line of code has to be changed!
  Globals
    #include...
    NBLStack* stack;
  Main
    stack=NBLStackCreateLB();
    ...
    NBLStackFree(stack);
  Threads
    NBLStackPush(stack, item);
    or
    item=NBLStackPop(stack);

Slide 20: Experiment
A set of random operations performed by multiple threads on each data structure, with either low or high contention
Compares the different synchronization mechanisms and implementations available
Number of threads varied from 1 to 30
Performed on multiprocessors:
  – Sun Enterprise with 64 CPUs, Solaris
  – Compaq PC with 2 CPUs, Win32

Slide 21: Experiments: Linked List
Lock-Free nr. 1: J. Valois, "Lock-Free Data Structures", Ph.D. thesis
Lock-Free nr. 2: T. Harris, "A Pragmatic Implementation of Non-Blocking Linked Lists", 2001 Symposium on Distributed Computing
Lock-Based: spin-locks (Test-And-Set)

Slide 22: Experiments: Linked List (high contention)

Slide 23: Experiments: Linked List (low contention)

Slide 24: Experiments: Linked List (high contention) - Threads

Slide 25: Experiments: Queues
Lock-Free nr. 1: J. Valois, "Lock-Free Data Structures", Ph.D. thesis
Lock-Free nr. 2: P. Tsigas, Y. Zhang, "A Simple, Fast and Scalable Non-Blocking Concurrent FIFO queue for Shared Memory Multiprocessor Systems", ACM SPAA'01
Lock-Based: spin-locks (Test-And-Set)

Slide 26: Experiments: Queues (high contention)

Slide 27: Experiments: Queues (low contention)

Slide 28: Experiments: Queues (high contention) - Threads

Slide 29: Status
Multiprocessor support
  – Sun Solaris (Sparc)
  – Win32 (Intel x86)
  – SGI (Mips): testing phase
  – Linux (Intel x86): testing phase
Extensive manual
Web site up and running

Slide 30: Conclusions and Future work
NOBLE: easy to use, efficient and portable
Non-blocking protocols always perform better than or similarly to lock-based ones, especially on multiprocessor systems.
To do:
  – Use in real parallel applications
  – Extend with more shared data object implementations
  – Extend to other platforms, especially those suitable for real-time systems