Parallel Programming Philippas Tsigas Chalmers University of Technology Computer Science and Engineering Department © Philippas Tsigas

2 WHY PARALLEL PROGRAMMING IS ESSENTIAL IN DISTRIBUTED SYSTEMS AND NETWORKING © Philippas Tsigas

3 Philippas Tsigas How did we get here? Picture from Pat Gelsinger, Intel Developer Forum, Spring 2004 (Pentium at 90W)

4 Concurrent Software Becomes Essential (Figure: aggregate clock speed of 1, 2, 4, and 8 cores at 3 GHz each, i.e. 3, 6, 12, and 24 GHz.) Our work is to help the programmer develop efficient parallel programs and also survive the multicore transition. 1) Scalability becomes an issue for all software. 2) Modern software development relies on the ability to compose libraries into larger programs. © Philippas Tsigas

5 DISTRIBUTED APPLICATIONS © Philippas Tsigas

6 Data Sharing: Gameplay Simulation as an example This is the hardest problem…  10,000s of objects  Each one contains mutable state  Each one updated 30 times per second  Each update touches 5-10 other objects Manual synchronization (shared state concurrency) is hopelessly intractable here. Solutions? Slide: Tim Sweeney, CEO of Epic Games, POPL 2006 © Philippas Tsigas

7 Distributed Applications Demand Quite High-Level Data Sharing:  Commercial computing (media and information processing)  Control computing (on-board flight-control systems) © Philippas Tsigas

8 NETWORKING © Philippas Tsigas

9 40 multithreaded packet-processing engines  On chip, there are 40 32-bit, 1.2-GHz packet-processing engines. Each engine works on a packet from birth to death within the Aggregation Services Router.  Each multithreaded engine handles four threads (each thread handles one packet at a time), so each QuantumFlow Processor chip can work on 160 packets concurrently. © Philippas Tsigas

10 DATA SHARING © Philippas Tsigas

11 Philippas Tsigas Blocking Data Sharing A typical Counter implementation:

    class Counter {
        int next = 0;

        synchronized int getNumber() {
            int t;
            t = next;
            next = t + 1;
            return t;
        }
    }

(Figure: starting from next = 0, Thread1 and Thread2 both call getNumber(); the lock is acquired and released in turn, so Thread1 returns 0, Thread2 returns 1, and next ends up at 2.)

12 Philippas Tsigas Do we need Synchronization?

    class Counter {
        int next = 0;

        int getNumber() {
            int t;
            t = next;
            next = t + 1;
            return t;
        }
    }

What can go wrong here? (Figure: starting from next = 0, Thread1 and Thread2 both read next, both compute t = 0, and both return 0; next ends up at 1, so the same number is handed out twice.)

13 Blocking Synchronization = Sequential Behavior © Philippas Tsigas

14 Blocking Synchronization -> Priority Inversion  A high-priority task is delayed because a low-priority task holds a shared resource, and the low-priority task is in turn delayed because a medium-priority task is executing.  Solutions: priority inheritance protocols  These work OK for single processors, but for multiple processors … (Figure: example schedules for Task H, Task M, and Task L.) © Philippas Tsigas

15 Critical Sections + Multiprocessors  Reduced Parallelism. Several tasks with overlapping critical sections will cause waiting processors to go idle. (Figure: example schedules for Task 1 through Task 4.) © Philippas Tsigas

16  The BIGGEST Problem with Locks? Blocking. Locks are not composable: all code that accesses a piece of shared state must know and obey the locking convention, regardless of who wrote the code or where it resides. © Philippas Tsigas
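
To see why locks do not compose, consider this minimal Java sketch (ours, not from the slides; the Account class and transfer method are hypothetical). Each Account is correctly protected by its own lock, yet composing two of them into a transfer can deadlock, because callers must agree on a global lock order that no single class can enforce.

    // Hypothetical sketch: two correctly locked objects compose into a possible deadlock.
    class Account {
        private int balance;

        Account(int balance) { this.balance = balance; }

        synchronized void deposit(int amount)  { balance += amount; }
        synchronized void withdraw(int amount) { balance -= amount; }

        // Composing the two lock-protected operations requires holding both locks.
        static void transfer(Account from, Account to, int amount) {
            synchronized (from) {          // lock order depends on the caller's argument order
                synchronized (to) {
                    from.withdraw(amount);
                    to.deposit(amount);
                }
            }
        }
    }

    class DeadlockDemo {
        public static void main(String[] args) {
            Account a = new Account(100);
            Account b = new Account(100);
            // One thread locks a then b, the other locks b then a: they can deadlock.
            new Thread(() -> Account.transfer(a, b, 10)).start();
            new Thread(() -> Account.transfer(b, a, 10)).start();
        }
    }

No change inside Account can fix this; every caller must know and follow the same locking convention, which is exactly the composability problem stated above.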

17 Interprocess Synchronization = Data Sharing
 Synchronization is required for concurrency
 Mutual exclusion (semaphores, mutexes, spin-locks, disabling interrupts: protects critical sections)
- Locks limit concurrency
- Busy waiting – repeated checks to see whether the lock has been released
- Convoying – processes stack up in front of locks
- Blocking
- Locks are not composable: all code that accesses a piece of shared state must know and obey the locking convention, regardless of who wrote the code or where it resides.
 A better approach is to use data structures that are …

18 Philippas Tsigas A Lock-free Implementation
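
The transcript does not preserve the code from this slide, so here is a minimal stand-in sketch (ours, not necessarily what the original slide showed): a lock-free counter in Java that replaces the lock with a compare-and-swap (CAS) retry loop on an AtomicInteger.

    import java.util.concurrent.atomic.AtomicInteger;

    // Stand-in sketch (not from the slides): a lock-free counter using CAS.
    class LockFreeCounter {
        private final AtomicInteger next = new AtomicInteger(0);

        int getNumber() {
            while (true) {
                int t = next.get();                  // read the current value
                if (next.compareAndSet(t, t + 1)) {  // succeeds only if next is still t
                    return t;                        // each successful call gets a unique number
                }
                // CAS failed: another thread advanced next first; retry with the new value.
            }
        }
    }

A thread retries only because some other thread's CAS succeeded, so the counter as a whole always makes progress; this is the lock-free progress property defined on a later slide.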

19 LET US START FROM THE BEGINNING © Philippas Tsigas

20 Shared Memory Data-structures  Object in shared memory - Supports some set of operations (ADT) - Concurrent access by many processes/threads - Useful to e.g.  Exchange data between threads  Coordinate thread activities (Figure: processes P1–P4 concurrently invoking operations Op A and Op B on the shared object.)

21 Executing Operations (slide borrowed from H. Attiya) (Figure: timelines for processes P1, P2, and P3, with operation invocations and responses marked along the timelines.)

22 Interleaving Operations Concurrent execution

23 Interleaving Operations (External) behavior

24 Interleaving Operations, or Not Sequential execution

25 Interleaving Operations, or Not Sequential behavior: invocations & responses alternate and match (on process & object) Sequential specification: all the legal sequential behaviors, satisfying the semantics of the ADT - E.g., for a (LIFO) stack: pop returns the last item pushed
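
As a concrete, illustrative example of a sequential specification (ours, not from the slides), here is a plain, single-threaded Java stack; its possible behaviors are exactly the legal sequential histories, and LIFO means pop must return the most recently pushed item.

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Illustrative sketch: a sequential (non-thread-safe) LIFO stack whose behaviors define the specification.
    class SequentialStack<T> {
        private final Deque<T> items = new ArrayDeque<>();

        void push(T x) { items.addFirst(x); }
        T pop()        { return items.removeFirst(); }  // returns the last item pushed
    }

    class SpecDemo {
        public static void main(String[] args) {
            SequentialStack<Integer> s = new SequentialStack<>();
            s.push(4);
            s.push(7);
            System.out.println(s.pop()); // 7 : legal, last in first out
            System.out.println(s.pop()); // 4
        }
    }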

26 Correctness: Sequential consistency [Lamport, 1979]  For every concurrent execution there is a sequential execution that - Contains the same operations - Is legal (obeys the sequential specification) - Preserves the order of operations by the same process

27 Sequential Consistency: Examples (Figure: a concurrent (LIFO) stack execution with push(4), push(7), and pop():4, shown with an equivalent sequential ordering that respects Last In First Out.)

28 Sequential Consistency: Examples (Figure: a concurrent (LIFO) stack execution with push(4), push(7), and pop():7, Last In First Out.)

29 Safety: Linearizability  Linearizable data structure - Sequential specification defines legal sequential executions - Concurrent operations allowed to be interleaved - Operations appear to execute atomically  External observer gets the illusion that each operation takes effect instantaneously at some point between its invocation and its response (Figure: threads T1 and T2 operating on a concurrent LIFO stack; push(4), push(7), and pop():4 are assigned linearization points consistent with Last In First Out.)
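
As a concrete illustration of a linearizable, lock-free data structure, here is a minimal Java sketch in the style of Treiber's stack (the code is ours, not from the slides). Each successful compareAndSet on the top reference is the operation's linearization point: the instant at which the push or pop appears to take effect.

    import java.util.concurrent.atomic.AtomicReference;

    // Illustrative sketch (not from the slides): a Treiber-style lock-free stack.
    class LockFreeStack<T> {
        private static final class Node<T> {
            final T value;
            Node<T> next;
            Node(T value) { this.value = value; }
        }

        private final AtomicReference<Node<T>> top = new AtomicReference<>();

        void push(T value) {
            Node<T> node = new Node<>(value);
            while (true) {
                Node<T> current = top.get();
                node.next = current;
                if (top.compareAndSet(current, node)) {         // linearization point of push
                    return;
                }
            }
        }

        T pop() {                                               // returns null if the stack is empty
            while (true) {
                Node<T> current = top.get();
                if (current == null) {
                    return null;                                // linearization point: stack observed empty
                }
                if (top.compareAndSet(current, current.next)) { // linearization point of pop
                    return current.value;
                }
            }
        }
    }

Because the only shared mutation is a single CAS on top, every concurrent history can be explained by ordering operations at their linearization points, which is exactly the "appears to execute atomically" property above. (In Java the garbage collector prevents node reuse, which sidesteps the ABA problem this pattern would otherwise need to handle.)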

30 Safety II An accessible node is never freed.

31 Liveness Non-blocking implementations - Wait-free implementation of a DS [Lamport, 1977]  Every operation finishes in a finite number of its own steps. - Lock-free (≠ FREE of LOCKS) implementation of a DS [Lamport, 1977]  At least one operation (from a set of concurrent operations) finishes in a finite number of steps (the data structure as a system always makes progress)
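
The CAS-based counter sketched earlier is lock-free: a thread retries only when another thread's CAS succeeded. For contrast, a wait-free counter can be built on atomic fetch-and-add, so every call completes in a bounded number of its own steps regardless of contention. A minimal Java sketch (ours, not from the slides; wait-freedom assumes the platform implements getAndIncrement with a hardware fetch-and-add rather than a CAS retry loop):

    import java.util.concurrent.atomic.AtomicInteger;

    // Illustrative sketch: fetch-and-add returns the old value and increments in one atomic step,
    // so no caller ever has to retry because of other threads.
    class WaitFreeCounter {
        private final AtomicInteger next = new AtomicInteger(0);

        int getNumber() {
            return next.getAndIncrement();
        }
    }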

32 Liveness II  Every garbage node is eventually collected.

33 Abstract Data Types (ADT)  Cover most concurrent applications  At least encapsulate their data needs  An object-oriented programming point of view  Abstract representation of data & set of methods (operations) for accessing it  Signature  Specification

34 Implementing a High-Level ADT Using lower-level ADTs & procedures

35 Lower-Level Operations  High-level operations translate into primitives on base objects that are available on H/W  Obvious: read, write  Common: compare&swap (CAS), LL/SC, FAA
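
In Java these primitives surface roughly as follows (a sketch using java.util.concurrent.atomic; LL/SC has no direct Java-level equivalent and is typically used underneath CAS on architectures such as ARM and POWER):

    import java.util.concurrent.atomic.AtomicInteger;

    // Illustrative sketch: read, write, CAS, and FAA as exposed by java.util.concurrent.atomic.
    class Primitives {
        private final AtomicInteger x = new AtomicInteger(0);

        void examples() {
            int v = x.get();                       // atomic read
            x.set(42);                             // atomic write
            boolean ok = x.compareAndSet(42, 43);  // CAS: store 43 only if the value is still 42
            int old = x.getAndAdd(5);              // FAA: fetch-and-add, returns the previous value
        }
    }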

36 Our Research: Non-Blocking Synchronization for Accessing Shared Data We are trying to design efficient non-blocking implementations of building blocks that are used in concurrent software design for data sharing.