Lightweight Concurrency in GHC
KC Sivaramakrishnan, Tim Harris, Simon Marlow, Simon Peyton Jones

GHC: Concurrency and Parallelism
- forkIO
- Bound threads
- Par Monad
- MVars
- STM
- Safe foreign calls
- Asynchronous exceptions

Concurrency landscape in GHC
[Diagram: Haskell code runs on capabilities (Capability 0 .. Capability N); beneath it, the RTS (C code) provides the LWT scheduler, the OS thread pool, MVars, STM, black holes, safe FFI, and more. The scheduler is preemptive and round-robin, with work-sharing.]

Idea
[Diagram, before and after: today the RTS (C code) contains the LWT scheduler, MVars, STM, black holes, safe FFI, the OS thread pool, and more. In the proposed design the RTS keeps only a small concurrency substrate plus the OS thread pool, black holes, and safe FFI, while the scheduler, MVar, and STM implementations (LWT Scheduler+, MVar+, STM+) move up into Haskell code.]

Contributions
[Diagram: the same before/after picture, annotated with the design questions. What should the concurrency substrate be? How do we unify the Haskell-level components (MVar+, STM+, LWT Scheduler+) with each other and with the RTS? Where do black holes and safe FFI live in the new design?]

Concurrency Substrate
One-shot continuations (SCont) and primitive transactional memory (PTM). PTM is a bare-bones TM with better composability than CAS.

  PTM:
    data PTM a
    data PVar a
    instance Monad PTM
    atomically :: PTM a -> IO a
    newPVar   :: a -> PTM (PVar a)
    readPVar  :: PVar a -> PTM a
    writePVar :: PVar a -> a -> PTM ()

  SCont:
    data SCont  -- stack continuations
    newSCont        :: IO () -> IO SCont
    switch          :: (SCont -> PTM SCont) -> IO ()
    getCurrentSCont :: PTM SCont
    switchTo        :: SCont -> PTM ()
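PTM deliberately mirrors a subset of GHC's existing STM interface. As an illustration only, the sketch below realizes the PTM signatures with Control.Concurrent.STM (TVar standing in for PVar; PTM itself lives inside the proposed RTS), and shows the composability that a bare CAS lacks: an update across two variables commits as a single atomic step.

```haskell
-- Illustration: the PTM interface modelled with GHC's existing STM.
-- TVar stands in for PVar; these names follow the slides' API.
import Control.Concurrent.STM

type PTM  a = STM  a
type PVar a = TVar a

newPVar :: a -> PTM (PVar a)
newPVar = newTVar

readPVar :: PVar a -> PTM a
readPVar = readTVar

writePVar :: PVar a -> a -> PTM ()
writePVar = writeTVar

-- Composability beyond CAS: both updates commit in one atomic step.
transfer :: PVar Int -> PVar Int -> Int -> PTM ()
transfer from to n = do
  x <- readPVar from
  writePVar from (x - n)
  y <- readPVar to
  writePVar to (y + n)
```

Running `atomically (transfer a b 3)` either moves the whole amount or retries; no intermediate state is ever visible to other threads.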

Switch

  switch :: (SCont -> PTM SCont) -> IO ()

switch passes the current SCont to the supplied function, which runs as a PTM transaction and returns the SCont to switch to.

Abstract Scheduler Interface
Every user-level thread is expected to carry two primitive scheduler actions on its SCont:

  SCont { scheduleSContAction :: SCont -> PTM ()
        , yieldControlAction  :: PTM () }

These actions are the interface through which MVar+, STM+, black holes, and safe FFI are unified with the user-level LWT scheduler.

Primitive Scheduler Actions (1)
For a simple FIFO scheduler with a ready queue held in a PVar:

  scheduleSContAction :: SCont -> PTM ()
  scheduleSContAction sc = do
    sched :: PVar [SCont] <- -- get sched
    contents :: [SCont] <- readPVar sched
    writePVar sched $ contents ++ [sc]

  yieldControlAction :: PTM ()
  yieldControlAction = do
    sched :: PVar [SCont] <- -- get sched
    contents :: [SCont] <- readPVar sched
    case contents of
      x:tail -> do
        writePVar sched tail
        switchTo x  -- DOES NOT RETURN
      otherwise -> …

Primitive Scheduler Actions (2)
Substrate primitives let each SCont's actions be read and updated:

  getScheduleSContAction :: SCont -> PTM (SCont -> PTM ())
  setScheduleSContAction :: SCont -> (SCont -> PTM ()) -> PTM ()
  getYieldControlAction  :: SCont -> PTM (PTM ())
  setYieldControlAction  :: SCont -> PTM () -> PTM ()

Primitive Scheduler Actions (3)
Helper abbreviations used in the rest of the talk:

  getSSA = getScheduleSContAction
  setSSA = setScheduleSContAction
  getYCA = getYieldControlAction
  setYCA = setYieldControlAction
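To make the FIFO actions concrete, here is a runnable model using STM in place of PTM, with Int thread ids standing in for SConts and the chosen thread returned rather than switched to (switchTo has no analogue outside the substrate). All names here are illustrative, not part of the proposal.

```haskell
-- Runnable model of the FIFO scheduler actions from the slides.
-- TVar [Int] stands in for PVar [SCont]; returning the popped id
-- models switchTo, which cannot be expressed outside the substrate.
import Control.Concurrent.STM

type ReadyQueue = TVar [Int]

-- Models scheduleSContAction: append a thread to the ready queue.
scheduleAction :: ReadyQueue -> Int -> STM ()
scheduleAction q sc = do
  contents <- readTVar q
  writeTVar q (contents ++ [sc])

-- Models yieldControlAction: pop the next thread to run.
-- Nothing corresponds to the slides' "otherwise" (empty queue) case.
yieldAction :: ReadyQueue -> STM (Maybe Int)
yieldAction q = do
  contents <- readTVar q
  case contents of
    x:rest -> do
      writeTVar q rest
      pure (Just x)  -- the real action would switchTo x here
    [] -> pure Nothing
```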

Building Concurrency Primitives (1)

  yield :: IO ()
  yield = atomically $ do
    s :: SCont <- getCurrentSCont
    -- Add the current SCont to its scheduler
    ssa :: (SCont -> PTM ()) <- getSSA s
    ssa s
    -- Switch to the next SCont from the scheduler
    switchToNext :: PTM () <- getYCA s
    switchToNext

Building Concurrency Primitives (2)

  forkIO :: IO () -> IO SCont
  forkIO f = do
    ns <- newSCont f
    atomically $ do
      s :: SCont <- getCurrentSCont
      -- Initialize the new SCont's scheduler actions from the current one
      ssa :: (SCont -> PTM ()) <- getSSA s
      setSSA ns ssa
      yca :: PTM () <- getYCA s
      setYCA ns yca
      -- Add the new SCont to the current scheduler
      ssa ns
    return ns

Building MVars
An MVar is either empty or full; a blocked take waits on a single hole for its result.

  newtype MVar a = MVar (PVar (ST a))
  data ST a = Full a [(a, PTM ())]      -- value + blocked writers
            | Empty [(PVar a, PTM ())]  -- blocked readers

  takeMVar :: MVar a -> IO a
  takeMVar (MVar ref) = do
    hole <- atomically $ newPVar undefined
    atomically $ do
      st <- readPVar ref
      case st of
        Empty ts -> do
          s <- getCurrentSCont
          ssa :: (SCont -> PTM ()) <- getSSA s
          let wakeup :: PTM ()
              wakeup = ssa s
          writePVar ref $ Empty (ts ++ [(hole, wakeup)])
          switchToNext <- getYCA s
          switchToNext
        Full x ((x', wakeup) : ts) -> do
          writePVar hole x
          writePVar ref $ Full x' ts
          wakeup
        otherwise -> …
    atomically $ readPVar hole

Building MVars: notes on the code
- The hole is where the result will be delivered.
- If the MVar is empty: append the (hole, wakeup) pair to the MVar's list of blocked readers (getSSA!), then yield control to the scheduler (getYCA!).
- If the MVar is full: deliver the value into the hole and wake up a pending writer, if any. Since wakeup is a PTM (), the MVar implementation is scheduler-agnostic.
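The same Full/Empty shape can be approximated with ordinary STM. The sketch below is a one-place box in which retry stands in for "enqueue a wakeup action and switch away"; the explicit blocked-thread queues, and hence the scheduler-agnosticism the slides emphasize, are elided. All names are illustrative.

```haskell
-- Illustration: an MVar-like one-place box using GHC's STM.
-- retry plays the role of the slides' "park this SCont and switchTo
-- the next one"; STM re-runs the transaction when the TVar changes.
import Control.Concurrent.STM

newtype Box a = Box (TVar (Maybe a))

newEmptyBox :: IO (Box a)
newEmptyBox = Box <$> newTVarIO Nothing

takeBox :: Box a -> IO a
takeBox (Box ref) = atomically $ do
  st <- readTVar ref
  case st of
    Just x  -> do writeTVar ref Nothing; pure x
    Nothing -> retry  -- slides: enqueue (hole, wakeup), yield to scheduler

putBox :: Box a -> a -> IO ()
putBox (Box ref) x = atomically $ do
  st <- readTVar ref
  case st of
    Nothing -> writeTVar ref (Just x)
    Just _  -> retry  -- slides: enqueue (x, wakeup), yield to scheduler
```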

Interaction of the C RTS and the user-level scheduler
Many events that necessitate actions on the scheduler become apparent only in the C part of the RTS: safe FFI calls, black holes, asynchronous exceptions, and finalizers.
[Diagram: these RTS mechanisms sit below the concurrency substrate, while MVar+, STM+, and LWT Scheduler+ live in Haskell code above it.]

Interaction of the C RTS and the user-level scheduler (continued)
To deliver these events, each capability keeps a queue of pending upcalls (:: [PTM ()]) and a dedicated upcall thread that runs them, re-using the primitive scheduler actions.
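The pending-upcall queue can be sketched with STM: the RTS side (modelled here as an ordinary producer) enqueues actions, and the upcall thread atomically drains the queue and runs the actions in order. The names and the IO () payload are illustrative stand-ins for the slides' PTM () upcalls.

```haskell
-- Illustration: a per-capability pending-upcall queue. IO () stands in
-- for the slides' PTM () upcalls; the RTS would be the producer.
import Control.Concurrent.STM
import Data.IORef

type UpcallQueue = TVar [IO ()]

-- Producer side: the RTS appends an upcall for this capability.
enqueueUpcall :: UpcallQueue -> IO () -> IO ()
enqueueUpcall q act = atomically $ modifyTVar' q (++ [act])

-- Upcall thread: atomically grab the whole queue, then run the
-- actions in order outside the transaction.
drainUpcalls :: UpcallQueue -> IO ()
drainUpcalls q = do
  acts <- atomically $ do
    as <- readTVar q
    writeTVar q []
    pure as
  sequence_ acts
```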

Blackholes
[Diagram sequence: threads T1..T3 spread across Capability 0 and Capability 1, each marked running, suspended, or blocked.]
- A running thread starts evaluating a thunk; the thunk is "blackholed".
- A thread on the other capability enters the blackhole and blocks.
- The blocked thread's yield control action switches its capability to another runnable thread.
- When evaluation of the thunk finishes, the blocked thread's schedule SCont action makes it runnable again.

Blackholes: The Problem
[Diagram: on Capability 0, thread T1 runs "switch $ \T1 -> do { ...; return T2 }", and the switch transaction itself enters a blackhole.]
To make progress we need to resume T2; but the switch that would resume T2 is blocked on a thunk that T2 itself was evaluating, and T2 cannot run until the switch completes. Deadlocked! This can be resolved through runtime-system tricks (work in progress).

Conclusions
Status:
- Mostly implemented: SConts, PTM, simple schedulers, MVars, safe FFI, bound threads, asynchronous exceptions, finalizers, etc.
- 2x to 3x slower on micro-benchmarks (programs doing only synchronization work).
To do:
- Re-implement Control.Concurrent with LWC
- Formal operational semantics
- Building real-world programs
Open questions:
- Hierarchical schedulers, thread priority, load balancing, fairness, etc.
- STM on top of PTM
- PTM on top of SpecTM
- Integration with par/seq, evaluation strategies, etc.
- and more…