Applications of Non-Blocking Data Structures to Real-Time Systems (Håkan Sundell, Chalmers University of Technology)

Presentation transcript:

Slide 1: Applications of Non-Blocking Data Structures to Real-Time Systems
Seminar for the degree of Licentiate of Philosophy
Håkan Sundell, Computing Science, Chalmers University of Technology

Slide 2: Background
- ARTES project: "Applications of wait/lock-free protocols to real-time systems"
- Started in March
- One active Ph.D. student
- Project leader: Philippas Tsigas

Slide 3: Schedule
- Introduction
  - Real-Time Systems
  - Synchronization
- Shared Data Objects: Snapshots
  - Evaluation
- The Effect of Using Timing Information
  - Snapshot
  - Shared Register
- Software engineering part
- Conclusions & Future Work

Slide 4: Real-Time Systems
- Uni- or multi-processor system
- Interconnection network, e.g. the Controller Area Network (CAN)
[Diagram label: CPU]

Slide 5: Real-Time Systems
- Shared memory
  - Uniform Memory Access (UMA)
  - Non-Uniform Memory Access (NUMA)
[Diagram labels: CPU, Cache, Cache bus, Memory]

Slide 6: Real-Time Systems
- Cooperating tasks
  - Timing constraints
- Inter-task communication: shared data objects
  - Needs synchronization
[Diagram: tasks T1, T2 and T3, each marked with "?"]


Slide 8: Synchronization
- Synchronization using locks
  - Uses semaphores, spinning, or disabling interrupts
  - Negative
    - Blocking
    - Priority inversion
    - Risk of deadlock
  - Positive
    - Execution time guarantees are easy to derive, but pessimistic
- Pattern: take lock ... do operation ... release lock
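
The take/do/release pattern can be sketched in C11 with a spinning lock. This is only an illustration under assumptions: the names `take_lock`, `release_lock`, `locked_increment` and `shared_counter` are hypothetical and do not come from the talk, and a real-time kernel would more likely use semaphores or interrupt masking.

```c
#include <stdatomic.h>

/* Hypothetical spin lock built on the C11 test-and-set flag. */
static atomic_flag lock_flag = ATOMIC_FLAG_INIT;
static long shared_counter = 0;

static void take_lock(void) {
    /* Spin until the flag was previously clear. A task that loses the
     * race waits here, which is the blocking the slide warns about. */
    while (atomic_flag_test_and_set_explicit(&lock_flag, memory_order_acquire)) {
        /* busy-wait */
    }
}

static void release_lock(void) {
    atomic_flag_clear_explicit(&lock_flag, memory_order_release);
}

/* Take lock ... do operation ... release lock. */
static long locked_increment(void) {
    take_lock();
    long result = ++shared_counter;
    release_lock();
    return result;
}
```

If a low-priority task is preempted between `take_lock` and `release_lock`, every higher-priority task calling `locked_increment` spins or waits until it resumes, which is how priority inversion arises.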

Slide 9: Non-blocking Synchronization
- Lock-free synchronization
  - Retries until the operation is not interfered with by other operations
  - Interference is usually detected through a shared variable that indicates a busy state or similar
- Pattern: change a flag to a unique value, or remember the current state ... do the operation while preserving the active structure ... check for the same value or state and then validate the changes, otherwise retry
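
A minimal sketch of this retry pattern, assuming a single shared word updated with the C11 compare-and-swap primitive. The names `shared_value` and `lock_free_add` are hypothetical; the talk does not prescribe this particular code.

```c
#include <stdatomic.h>

/* Hypothetical shared object: one word that is updated without locks. */
static _Atomic long shared_value = 0;

/* Remember the current state, compute the new state privately, and
 * publish it only if no other operation changed the word meanwhile. */
static long lock_free_add(long delta) {
    long seen = atomic_load(&shared_value);   /* remember current state */
    long wanted;
    do {
        wanted = seen + delta;                /* do the operation */
        /* On failure the call refreshes 'seen' with the current value,
         * so the loop retries against the latest state. */
    } while (!atomic_compare_exchange_weak(&shared_value, &seen, wanted));
    return wanted;
}
```

The loop has no upper bound on the number of retries: each failed compare-and-swap means another operation succeeded, so the system as a whole makes progress while one caller may be delayed indefinitely.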

Slide 10: Non-blocking Synchronization
- Lock-free synchronization
  - Negative
    - No execution time guarantees: an operation can retry forever, which can cause starvation
  - Positive
    - Avoids blocking and priority inversion
    - Avoids deadlock
    - Fast execution on average

Slide 11: Non-blocking Synchronization
- Non-blocking synchronization
  - Uses atomic synchronization primitives
  - Uses shared memory
- Wait-free synchronization
  - Always finishes in a finite number of its own steps
  - Negative
    - Complex algorithms
    - Memory consuming
[Diagram labels: Test&Set, Compare&Swap, Copying, Helping, Announcing, Split operation, ???]

Slide 12: Non-blocking Synchronization
- Wait-free synchronization
  - Positive
    - Execution time guarantees
    - Fast execution
    - Avoids blocking and priority inversion
    - Avoids deadlock
    - Avoids starvation
    - Same implementation on both single- and multiprocessor systems


Slide 14: Shared Data Objects
- Correctness criterion for concurrent operations: linearizability
  - Every concurrent execution can be transformed into an equivalent serial sequence of atomic operations that preserves the partial order
[Diagram: Read and Write operations on a time axis t, with time points t_i, t_j and t_k]

Slide 15: Snapshot
- A consistent momentary state of a set of shared variables that are logically related
- One reader (scanner)
  - Reads the whole set of variables in one atomic step
- Many writers (updaters)
  - Each write updates only one variable

Slide 16: Snapshot: Correctness
- Atomicity / linearizability criteria
[Diagram: timelines t with Write and Read operations on component c_i, marking the value returned by the scanner; cases labeled YES and NO]

Slide 17: Snapshot: Correctness
- Atomicity / linearizability criteria
[Diagram: timelines t with Write and Read operations on components c_i and c_j, marking the value returned by the scanner; both cases labeled NO]


Slide 19: What are we evaluating
- Wait-free snapshot algorithm by Ermedahl et al.
  - 3 register copies for each component
  - Uses the Test&Set atomic primitive for synchronization
[Diagram labels: Used by writer, Used by reader]

Slide 20: Analysis
- Real-time system: measured schedulability
- Created "realistic" scenarios on a theoretical uni-processor system
  - Real RTOS parameters
  - Manual WCET analysis at cycle level
  - 1 scanner (5 components), 24 updaters (10 real-time tasks, 15 interrupts)
  - Fixed-priority response time analysis
  - Schedulable without any synchronization
  - Adding lock-free/wait-free or semaphore synchronization
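
Fixed-priority response time analysis is usually computed with the standard recurrence R_i = C_i + sum over higher-priority tasks j of ceil(R_i / T_j) * C_j, iterated to a fixed point. The sketch below assumes deadlines equal to periods and ignores blocking and synchronization overhead, which the evaluation in the talk would add on top. `Task` and `response_time` are hypothetical names.

```c
#include <stddef.h>

/* One task in a fixed-priority system; index 0 has the highest priority. */
typedef struct {
    long period;   /* T: period, also used as the deadline (assumption) */
    long wcet;     /* C: worst-case computation time */
} Task;

/* Returns the worst-case response time R of task i, or -1 if the
 * iteration exceeds the period, i.e. the task is not schedulable. */
static long response_time(const Task *tasks, size_t i) {
    long r = tasks[i].wcet;
    for (;;) {
        long next = tasks[i].wcet;
        for (size_t j = 0; j < i; j++) {
            /* ceil(r / T_j): releases of higher-priority task j within r */
            long releases = (r + tasks[j].period - 1) / tasks[j].period;
            next += releases * tasks[j].wcet;
        }
        if (next > tasks[i].period) return -1;   /* deadline missed */
        if (next == r) return r;                 /* fixed point reached */
        r = next;
    }
}
```

A task set counts as schedulable when `response_time` returns a non-negative value for every task; a schedulability percentage like the one reported in the talk would be the share of generated scenarios that pass this check.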

Slide 21: Analysis: Schedulability (%) [chart]

Slide 22: Experiments
- Simulation
  - RT-simulator written in Erlang by Ermedahl and Sjödin
    - Fixed-priority preemptive scheduler
    - Semaphores
    - Messages
  - Subset of the scenarios used in the analysis

Slide 23: Experiments: Schedulability (%) [chart]

Slide 24: Experiments
- Multi-node: simulation of a CAN bus, 1 MHz
  - 10 nodes connected using messages
  - Local snapshots on each node
  - 1 super-snapshot task on 1 node
  - Subset of the scenarios used for the single-node analysis

Slide 25: Experiments: R_snap for multi-node [chart]


Slide 27: Timing Information
- Previously used by Chen and Burns
  - Assumes a system with periodic fixed-priority scheduling
  - Notation from standard real-time response time analysis
  - Uses information about:
    - Periods, T
    - Worst-case computation times, C
    - Worst-case response times, R


Slide 29: Snapshot
- Back to basics: unbounded memory protocol
  - The reader increases a global index and scans backwards
[Diagram: one value array per component c_1 ... c_i ... along time axis t, with the global snapshot index; "?" = previous values / nil, "w" = writer position]
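
One way to read this slide is sketched below: updaters write into the slot selected by the global index, and the scanner bumps the index and then walks each component backwards to the newest non-nil slot. This is a single-threaded illustration under assumptions, not the authors' implementation. A fixed `MAX_SCANS` stands in for unbounded memory, all identifiers are hypothetical, and the atomic accesses a concurrent version needs are omitted.

```c
#include <stdbool.h>

#define COMPONENTS 3
#define MAX_SCANS  16   /* stands in for the unbounded memory of the slide */

typedef struct {
    int  value[MAX_SCANS];
    bool written[MAX_SCANS];   /* false plays the role of nil */
} Component;

static Component comp[COMPONENTS];
static int snapshot_index = 0;

/* Updater: write into the slot selected by the current global index. */
static void update(int k, int v) {
    comp[k].value[snapshot_index] = v;
    comp[k].written[snapshot_index] = true;
}

/* Scanner: advance the global index, then for each component scan
 * backwards from the previous index to the newest non-nil slot.
 * Returns false once the index space of this sketch is exhausted. */
static bool scan(int out[COMPONENTS]) {
    if (snapshot_index + 1 >= MAX_SCANS) return false;
    int last = snapshot_index;
    snapshot_index = last + 1;       /* later updates land in a new slot */
    for (int k = 0; k < COMPONENTS; k++) {
        out[k] = 0;                  /* default when never written */
        for (int i = last; i >= 0; i--) {
            if (comp[k].written[i]) { out[k] = comp[k].value[i]; break; }
        }
    }
    return true;
}
```

Because updates issued after the index bump go to a slot the scanner never visits, the scanner's backward walk is not disturbed by them, and neither side has to wait for the other.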

Slide 30: Snapshot
- Bounded memory: cyclical buffers
  - The needed buffer length depends on how fast the updaters are compared to the scanner
  - Each component can have a different buffer length

Slide 31: Timing Information
- Bounding
  - Needed buffer length for component k: [formula]
  - Can be refined even further: [formula]
  - where T_s is the period of the snapshot task and T_w is the period of the writer tasks

Slide 32: Experiments
- Using a Sun Enterprise multiprocessor computer
- 1 scanner task and 10 updater tasks, one on each CPU
- Comparing two wait-free snapshot algorithms
  - Using timing information
  - Using Test-and-Set synchronization

Slide 33: Experiments
- Scenarios with different ratios between scanner/updater:
  - Measuring response time for scan versus update operations
[Table: rows "Ratio" and "Buffer length"; the legible ratio entries run from 500/... through 50/50 to .../500]

Slide 34: Experiments: Scan operation, average response time [chart]

Slide 35: Experiments: Update operation, average response time [chart]


Slide 37: Shared Register
- Target domain: shared memory (even without cache coherency)
- Wait-free atomic shared buffer by Vitanyi et al.
  - A matrix of 1-reader 1-writer registers
  - Each register contains a value/tag pair encoded as one value
[Diagram: matrix of registers R_11, ..., R_21, R_22, ... with Writers on one axis and Readers on the other; R_ij is written by processor i and read by processor j; each register holds a tag and a value]
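
Encoding the value/tag pair as one value presumably lets each register be read or written with a single memory access. A minimal sketch of such an encoding follows, assuming a 64-bit word with the low 22 bits reserved for the tag. The split, the matrix size and all identifiers are illustrative assumptions, not taken from the talk.

```c
#include <stdint.h>

#define TAG_BITS 22u   /* assumption: tag width; the value gets the rest */
#define TAG_MASK ((UINT64_C(1) << TAG_BITS) - 1)
#define PROCS    8     /* assumption: number of processors */

typedef uint64_t Register;   /* one value/tag pair encoded as one word */

/* regs[i][j] is written only by processor i and read only by processor j. */
static Register regs[PROCS][PROCS];

/* Tags wider than TAG_BITS are truncated; values wider than the
 * remaining 42 bits lose their upper bits. */
static Register pack(uint64_t tag, uint64_t value) {
    return (value << TAG_BITS) | (tag & TAG_MASK);
}

static uint64_t tag_of(Register r)   { return r & TAG_MASK; }
static uint64_t value_of(Register r) { return r >> TAG_BITS; }
```

With this layout a writer stores `pack(tag, value)` into its row in one assignment, and a reader recovers both fields from the single word it loaded.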

Slide 38: Shared Register
- Algorithm:
  - A reader scans its column for the highest tag and returns the corresponding value
  - A writer scans its column and writes the next tag, together with the new value, to its row
- Unbounded maximum size for the tag field in the value/tag pair
  - Assume 8 writer tasks with a 10 ms period
  - The maximum tag reached after one hour needs 22 bits!
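
The 22-bit figure can be checked with a little arithmetic, assuming every write advances the tag by one: 8 writers times 360,000 writes per hour each gives 2,880,000, which lies between 2^21 = 2,097,152 and 2^22 = 4,194,304. The helper names below are hypothetical.

```c
#include <stdint.h>

/* Number of writes issued in 'seconds' by 'writers' tasks that each
 * write once every 'period_ms' milliseconds. */
static uint64_t writes_in(uint64_t writers, uint64_t period_ms, uint64_t seconds) {
    return writers * (seconds * 1000u / period_ms);
}

/* Smallest number of bits able to represent n. */
static unsigned bits_needed(uint64_t n) {
    unsigned bits = 0;
    while (n > 0) {
        bits++;
        n >>= 1;
    }
    return bits;
}
```

Calling `bits_needed(writes_in(8, 10, 3600))` yields 22. The tag keeps growing with uptime, so any fixed field width is eventually exhausted unless tags are recycled.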

Slide 39: Timing Information
- Analyzing the maximum difference between tags that a task can observe at two consecutive invocations of the algorithm
  - In any possible execution: [formula]
  - where T_max is the longest period, R_max is the longest response time, and T_wr is the period of the writer tasks
- Recycling tags:
  - Newer tags can restart from zero when a certain tag value is reached
  - To be able to decide whether a tag is newer, we need to have: [formula]
[Diagram: values v_1, v_2, v_3 and v_4 placed on a tag range from 0 to N]
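
The exact condition from this slide is not available in the transcript, so the sketch below swaps in a common wrap-around comparison (in the style of serial number arithmetic) rather than the authors' scheme. It assumes tags live in 0 .. N-1 and that timing analysis guarantees two compared tags are never N/2 or more steps apart. `TAG_RANGE`, `next_tag` and `tag_newer` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define TAG_RANGE 1024u   /* hypothetical N: tags live in 0 .. N-1 */

/* Advance a tag, restarting from zero at the end of the range. */
static uint32_t next_tag(uint32_t tag) {
    return (tag + 1u) % TAG_RANGE;
}

/* True if tag a is newer than tag b. Valid only while the real distance
 * between the two tags stays below TAG_RANGE / 2 (assumed bound). */
static bool tag_newer(uint32_t a, uint32_t b) {
    uint32_t forward = (a + TAG_RANGE - b) % TAG_RANGE;
    return forward != 0 && forward < TAG_RANGE / 2;
}
```

The role of the timing information is to bound how far apart two observable tags can be, which is what makes a fixed `TAG_RANGE` safe to choose.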

Slide 40: Examples
- Example task scenario on 8 processors:

  Task  Period    Task  Period
  Wr1   1000      Rd1   500
  Wr2   900       Rd2   450
  Wr3   800       Rd3   400
  Wr4   700       Rd4   350
  Wr5   600       Rd5   300
  Wr6   500       Rd6   250
  Wr7   400       Rd7   200
  Wr8   300       Rd8   150

- The tag reached by the unbounded algorithm in one hour would need more than 16 bits


Slide 42: Background
- Multithreaded programming needs communication
- Communication uses shared data structures such as stacks, queues and lists
- This needs synchronization!
- Locks (mutual exclusion) have several drawbacks, especially for real-time systems
- Non-blocking solutions are often complex to implement and have non-standard interfaces

Slide 43: NOBLE: A Non-Blocking Inter-Process Communication Library
- Designed with the following properties:
  - Functionality: stacks, queues, lists, snapshot, register, ... with clear specifications
  - Programmer friendly: #include, NBL
  - Easy to adapt existing solutions: provides locks as well as non-blocking synchronization

Slide 44: NOBLE: A Non-Blocking Inter-Process Communication Library
- Designed with the following properties (cont.):
  - Efficient: object-oriented design ("virtual functions and inheritance with base classes") in C
  - Portable: modular design, platform-dependent code separated
  - Adaptable for different programming languages: C, C++, standard dynamic linked library

Slide 45: Examples
    #include
- First create a global variable handling the shared data object, for example a stack:
    NBLStack *stack;
    stack = NBLStackCreateLF(10000);
- When some thread wants to do some operation:
    NBLStackPush(stack, item);
  or
    item = NBLStackPop(stack);

Slide 46: Examples
- When the data structure is no longer in use:
    NBLStackFree(stack);
- To change the synchronization mechanism, only one line of code has to be changed!
    stack = NBLStackCreateLF(10000);
  replaced with
    stack = NBLStackCreateLB();

Slide 47: Experiment
- A set of random operations performed by multiple threads on each data structure, with either low or high contention
- Comparing the different synchronization mechanisms and implementations available
- Varying the number of threads from 1 to 30
- Performed on multiprocessors:
  - Sun Enterprise with 64 CPUs, Solaris
  - Compaq PC with 2 CPUs, Win32

Slide 48: Experiments: Linked List (high contention) [chart]

Slide 49: Status
- Multiprocessor support
  - Sun Solaris (Sparc)
  - Win32 (Intel x86)
  - SGI (Mips): evaluation stage
  - Linux (Intel x86): evaluation stage
- Extensive manual
- Web site up and running


Slide 51: Conclusions
- Contributions:
  - Evaluations of snapshot
    - Non-blocking performs better than lock-based in all cases
    - Lock-free performs best on uni-processor systems
  - The effect of using timing information
    - Snapshot and shared register
    - Algorithms can be simplified and their performance increased significantly
    - Efficient recycling of time-stamps is possible

Slide 52: Conclusions
- Contributions (cont.):
  - A library of non-blocking protocols
    - Easy to use, efficient and portable
    - Non-blocking protocols always perform better than lock-based ones, especially on multi-processor systems
- Concluding judgment:
  - Non-blocking protocols are highly applicable to real-time systems
  - Lock-free protocols seem very promising and will be applicable to real-time systems with applied analysis

Slide 53: Future work
- NOBLE
  - Adapt to a commercial RTOS (Enea OSE)
  - Extend to embedded systems
    - Simpler uni- and multi-processor systems, including 8-bit processors with, without, or with different support for atomic synchronization primitives
- Timing information
  - Create lock-free translations to fulfill real-time system properties
  - General time-stamp recycling scheme
  - More non-blocking protocols