Semi-Explicit Parallel Programming in Haskell
Satnam Singh
Microsoft Research Cambridge
Leeds 2009
using System;
using System.Diagnostics;
using System.Threading;

public class ArraySummer
{
    private double[] a; // Encapsulated array

    // Constructor requiring an initial value for array
    public ArraySummer(double[] values) { a = values; }

    // Method to compute the sum of the segment a[fromIndex .. toIndex-1].
    // The running total is a local variable: a field shared by every caller
    // would be a data race when two threads call SumArray on the same object.
    public void SumArray(int fromIndex, int toIndex, out double arraySum)
    {
        double sum = 0;
        for (int i = fromIndex; i < toIndex; i++)
            sum = sum + a[i];
        arraySum = sum;
    }
}
[Figure: thread 1 and thread 2, with ThreadCreate, thread.Start and thread.Join]
class Program
{
    static void Main(string[] args)
    {
        const int testSize = ;
        double[] testValues = new double[testSize];
        for (int i = 0; i < testSize; i++)
            testValues[i] = (double)i / testSize; // cast avoids integer division
        ArraySummer summer = new ArraySummer(testValues);
        Stopwatch stopWatch = new Stopwatch();
        stopWatch.Start();
        double testSum;
        summer.SumArray(0, testSize, out testSum);
        stopWatch.Stop();
        Console.WriteLine("Sum duration (milliseconds) = " + stopWatch.ElapsedMilliseconds);
        Console.WriteLine("Sum value = " + testSum);
        Console.ReadKey();
    }
}
class Program
{
    static void Main(string[] args)
    {
        const int testSize = ;
        double[] testValues = new double[testSize];
        for (int i = 0; i < testSize; i++)
            testValues[i] = (double)i / testSize; // cast avoids integer division
        ArraySummer summer = new ArraySummer(testValues);
        Stopwatch stopWatch = new Stopwatch();
        stopWatch.Start();
        double testSumA = 0;
        double testSumB;
        // Sum the first half on a new thread...
        Thread sumThread = new Thread(delegate() { summer.SumArray(0, testSize / 2, out testSumA); });
        sumThread.Start();
        // ...while this thread sums the second half (toIndex is exclusive, so start at testSize / 2)
        summer.SumArray(testSize / 2, testSize, out testSumB);
        sumThread.Join(); // wait for the first half before combining the results
        stopWatch.Stop();
        Console.WriteLine("Sum duration (milliseconds) = " + stopWatch.ElapsedMilliseconds);
        Console.WriteLine("Sum value = " + (testSumA + testSumB));
        Console.ReadKey();
    }
}
The Accidental Semi-colon ;
A ; B  (A runs, then B runs)
createThread (A) ; B  (A and B may run at the same time)
Execution Model

fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)

[Figure: the “thunk” for “fib 10”, holding a pointer to the implementation, a storage slot for the result, and values for free variables]
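To make the thunk concrete, here is a hypothetical user-level model in plain Haskell. `Thunk`, `delay` and `force` are names invented for this sketch, and GHC's real thunks are heap objects managed by the run-time system rather than IORefs. The model keeps the ingredients of the figure: code to run, plus a slot that is overwritten with the result the first time the thunk is forced.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef, writeIORef)

-- Hypothetical model: either the suspended code or the value that replaced it.
data ThunkState a = Suspended (IO a) | Done a

newtype Thunk a = Thunk (IORef (ThunkState a))

-- Build a thunk without running the computation.
delay :: IO a -> IO (Thunk a)
delay act = Thunk <$> newIORef (Suspended act)

-- Run the code the first time; afterwards return the stored result.
force :: Thunk a -> IO a
force (Thunk slot) = do
  st <- readIORef slot
  case st of
    Done v -> return v
    Suspended act -> do
      v <- act
      writeIORef slot (Done v)  -- overwrite the code with its value
      return v

fib :: Int -> Integer
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)

-- Force a thunk for fib 10 twice and count how often its code ran.
demo :: IO (Integer, Integer, Int)
demo = do
  runs <- newIORef (0 :: Int)
  t <- delay (modifyIORef' runs (+1) >> return (fib 10))
  a <- force t
  b <- force t
  n <- readIORef runs
  return (a, b, n)

main :: IO ()
main = demo >>= print  -- prints (55,55,1)
```

The counter shows the code ran once even though the value was demanded twice, which is the point of the result slot.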
wombat and numbat

import Data.Char (ord)

wombat :: Int -> Int
wombat n = 42*n

numbat :: Int -> IO Int
numbat n
  = do c <- getChar
       return (n + ord c)

wombat is a pure function; numbat is a side-effecting function, a computation inside a ‘monad’.
IO (), pronounced “IO unit”

import Data.Char (chr, ord)

numbat :: IO ()
numbat
  = do c <- getChar
       putChar (chr (1 + ord c))
f (g + h)
z !! 2
mapM f [a, b, ..., g]
Infer type:
[Int] -> Bool    pure function
                 deterministic stateful operation
IO String        may be non-deterministic
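Functional Programming to the Rescue?
Why not evaluate every sub-expression of our pure functional programs in parallel?
– execute each sub-expression in its own thread?
The ’80s dream does not work:
– granularity: most sub-expressions are far too small to repay the cost of a thread
– data dependency: a sub-expression often needs the result of another, so the two cannot run at the same time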
Infix Operators
Prefix: mod a b, e.g. mod 7 3 = 1
Infix with backquotes: a `mod` b, e.g. 7 `mod` 3 = 1
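A small sketch of the two spellings; the last definition shows the converse direction (an operator symbol used as a prefix function by wrapping it in parentheses), which is standard Haskell but not on the slide. This matters below because `par` and `pseq` are ordinary functions that are usually written infix.

```haskell
-- Prefix and infix (backquoted) applications of the same function.
prefixMod, infixMod :: Int
prefixMod = mod 7 3
infixMod  = 7 `mod` 3

-- The converse: an operator symbol becomes a prefix function in parentheses.
prefixPlus :: Int
prefixPlus = (+) 7 3

main :: IO ()
main = print (prefixMod, infixMod, prefixPlus)  -- prints (1,1,10)
```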
x `par` y
x is sparked for speculative evaluation.
A spark can potentially be instantiated on a thread running in parallel with the parent thread.
x `par` y = y
Typically x is used inside y:
blurRows `par` (mix blurCols blurRows)
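A minimal sketch of `par`, assuming GHC: `par` is imported from GHC.Conc in base (the parallel package's Control.Parallel exports the same combinator). `blurRows`, `blurCols` and `mix` are hypothetical integer stand-ins for the slide's names; scalars are used deliberately, since sparking a list evaluates only its first cell. The spark can only help when the program is built with `-threaded` and run with `+RTS -N2` or more; either way the result is the same, because x `par` y = y.

```haskell
import GHC.Conc (par)

-- Hypothetical stand-ins for the slide's blurRows, blurCols and mix.
blurRows, blurCols :: Int
blurRows = sum [2, 4 .. 2000000]
blurCols = sum [1, 3 .. 1999999]

mix :: Int -> Int -> Int
mix c r = c + r

-- Spark blurRows for possible evaluation on another core, then return
-- mix blurCols blurRows. The spark cannot change the answer; at best it
-- gets blurRows computed earlier, on another core.
blurred :: Int
blurred = blurRows `par` mix blurCols blurRows

main :: IO ()
main = print blurred
```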
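x `par` (y + x)
[Figure: x is sparked; y is evaluated first; x is evaluated second; the spark for x fizzles]
The spark fizzles because the parent thread evaluates x itself before any other processor has taken up the spark.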
x `par` (y + x)
[Figure: x is sparked on P1; y is evaluated on P1; x is taken up for evaluation on P2]
par is Not Enough
pseq :: a -> b -> b
pseq is strict in its first argument but not in its second argument.
Related function:
– seq :: a -> b -> b
– strict in both arguments
– the compiler may transform seq x y to seq y x
– no good for controlling the order of evaluation in parallel programs
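The usual way to stop the spark from fizzling, sketched under the same assumption that `par` and `pseq` come from GHC.Conc: spark x, then use `pseq` so that the parent evaluates y before it ever needs x. `expensive` and `parSum` are hypothetical names.

```haskell
import GHC.Conc (par, pseq)

-- Hypothetical workload.
expensive :: Int -> Int
expensive k = sum [1 .. k]

-- x `par` (y + x) may let the parent evaluate x before anyone takes the
-- spark. Forcing y first with pseq keeps the parent busy on y, leaving
-- the spark for x to another core.
parSum :: Int -> Int -> Int
parSum m n = x `par` (y `pseq` (y + x))
  where
    x = expensive m
    y = expensive n

main :: IO ()
main = print (parSum 10 20)  -- prints 265
```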
Parallel fib with threshold (Don Stewart)

import Control.Parallel (par, pseq)
import Control.Monad (forM_)
import Text.Printf (printf)

cutoff :: Int
cutoff =  -- Threshold for parallel evaluation

-- Sequential fib
fib' :: Int -> Integer
fib' 0 = 0
fib' 1 = 1
fib' n = fib' (n-1) + fib' (n-2)

-- Parallel fib with thresholding
fib :: Int -> Integer
fib n
  | n < cutoff = fib' n
  | otherwise  = r `par` (l `pseq` l + r)
  where
    l = fib (n-1)
    r = fib (n-2)

-- Main program
main :: IO ()
main = forM_ [0..45] $ \i ->
         printf "n=%d => %d\n" i (fib i)
Parallel fib performance
Parallel quicksort (wrong)

quicksortN :: (Ord a) => [a] -> [a]
quicksortN []     = []
quicksortN [x]    = [x]
quicksortN (x:xs) = losort `par` hisort `par` losort ++ (x:hisort)
  where
    losort = quicksortN [y | y <- xs, y < x]
    hisort = quicksortN [y | y <- xs, y >= x]
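What went wrong?
[Figure: losort evaluated only to its first cons cell, whose tail is still an unevaluated thunk]
A spark evaluates losort only to weak head normal form, i.e. its first cons cell; the rest of the list stays an unevaluated thunk, so almost all of the sorting is still done sequentially.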
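A small demonstration of why the spark achieved so little: `seq`, like a spark, evaluates only to weak head normal form (WHNF), the outermost constructor. For a list that is the first cons cell. The names below are hypothetical.

```haskell
-- WHNF of a list is its first cons cell, so the undefined tail below is
-- never evaluated and no error is raised.
firstConsOnly :: Bool
firstConsOnly = (1 : undefined :: [Int]) `seq` True

-- Walking the spine (as length does) evaluates every cons cell but still
-- leaves the elements themselves as unevaluated thunks.
spineOnly :: Int
spineOnly = length [undefined, undefined :: Int]

main :: IO ()
main = print (firstConsOnly, spineOnly)  -- prints (True,2)
```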
forceList

forceList :: [a] -> ()
forceList []     = ()
forceList (x:xs) = x `seq` forceList xs
Parallel quicksort (right)

quicksortF :: (Ord a) => [a] -> [a]
quicksortF []     = []
quicksortF [x]    = [x]
quicksortF (x:xs) = (forceList losort) `par` (forceList hisort) `par` losort ++ (x:hisort)
  where
    losort = quicksortF [y | y <- xs, y < x]
    hisort = quicksortF [y | y <- xs, y >= x]
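A self-contained sketch of the corrected sort, with one addition that is not on the slide: a hypothetical depth parameter `d` that stops creating sparks below a certain depth, in the same spirit as the `cutoff` in the parallel fib. `par` is assumed to come from GHC.Conc.

```haskell
import Data.List (sort)
import GHC.Conc (par)

forceList :: [a] -> ()
forceList []     = ()
forceList (x:xs) = x `seq` forceList xs

-- quicksortF plus a hypothetical depth limit: once d reaches 0, the
-- remaining recursion is sequential and creates no more sparks.
quicksortD :: Ord a => Int -> [a] -> [a]
quicksortD _ []     = []
quicksortD _ [x]    = [x]
quicksortD d (x:xs)
  | d <= 0    = losort ++ (x : hisort)
  | otherwise = forceList losort `par` forceList hisort `par` (losort ++ (x : hisort))
  where
    losort = quicksortD (d - 1) [y | y <- xs, y < x]
    hisort = quicksortD (d - 1) [y | y <- xs, y >= x]

main :: IO ()
main = do
  let input = [ (i * 7919) `mod` 10007 | i <- [1 .. 20000] ] :: [Int]
  print (quicksortD 4 input == sort input)  -- prints True
```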
parSumArray :: Array Int Double -> Double
parSumArray matrix = lhs `par` (rhs `pseq` lhs + rhs)
  where
    lhs = seqSum 0 (nrValues `div` 2) matrix
    rhs = seqSum (nrValues `div` 2 + 1) (nrValues-1) matrix
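The slide leaves `seqSum` and `nrValues` undefined. This sketch fills them in under stated assumptions: `seqSum from to arr` is taken to sum indices `from` to `to` inclusive (which makes the two halves above cover every element exactly once), and `nrValues` is the number of elements of an array indexed from 0. It uses the array package that ships with GHC, plus GHC.Conc for `par` and `pseq`.

```haskell
import Data.Array (Array, bounds, listArray, (!))
import Data.Ix (rangeSize)
import GHC.Conc (par, pseq)

-- Hypothetical helper: sum of arr ! i for i in [from .. to] (inclusive).
-- The accumulator is forced at each step to avoid a chain of thunks.
seqSum :: Int -> Int -> Array Int Double -> Double
seqSum from to arr = go from 0
  where
    go i acc
      | i > to    = acc
      | otherwise = let acc' = acc + arr ! i in acc' `seq` go (i + 1) acc'

-- Spark the left half, evaluate the right half, then add the two.
parSumArray :: Array Int Double -> Double
parSumArray matrix = lhs `par` (rhs `pseq` (lhs + rhs))
  where
    nrValues = rangeSize (bounds matrix)  -- assumes indices start at 0
    lhs = seqSum 0 (nrValues `div` 2) matrix
    rhs = seqSum (nrValues `div` 2 + 1) (nrValues - 1) matrix

main :: IO ()
main = do
  let n   = 1000000
      arr = listArray (0, n - 1) [fromIntegral i | i <- [0 .. n - 1]]
  print (parSumArray arr)
```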
Strategies
Haskell provides a collection of evaluation strategies for controlling the evaluation order of various data types. Users have to indicate how their own types are evaluated to normal form.
Algorithms + Strategy = Parallelism, P. W. Trinder, K. Hammond, H.-W. Loidl and S. L. Peyton Jones.
tml/Strategies/strategies.html
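A simplified sketch in the style of the strategies paper, where a strategy is a function that evaluates its argument to some degree and returns `()`. This is not the API of the current Control.Parallel.Strategies module, which wraps strategies in an `Eval` monad; `fib` and `fibs` are hypothetical, and `par`/`pseq` are assumed to come from GHC.Conc.

```haskell
import GHC.Conc (par, pseq)

infixl 0 `using`

-- A strategy evaluates its argument and returns ().
type Strategy a = a -> ()

-- Apply a strategy to a value, then return the value.
using :: a -> Strategy a -> a
using x s = s x `pseq` x

-- Evaluate to weak head normal form.
rwhnf :: Strategy a
rwhnf x = x `seq` ()

-- Spark the given strategy on every element of a list.
parList :: Strategy a -> Strategy [a]
parList _ []     = ()
parList s (x:xs) = s x `par` parList s xs

-- Hypothetical workload.
fib :: Int -> Integer
fib n = if n < 2 then toInteger n else fib (n - 1) + fib (n - 2)

-- The algorithm (map fib) is written separately from the strategy.
fibs :: [Integer]
fibs = map fib [20 .. 27] `using` parList rwhnf

main :: IO ()
main = print fibs
```

Separating `map fib` from `parList rwhnf` is the point of the approach: the strategy can be changed without touching the algorithm.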
Explicitly Creating Threads
forkIO :: IO () -> IO ThreadId
Creates a lightweight Haskell thread, not an operating system thread.
Inter-thread Communication
newEmptyMVar :: IO (MVar a)
putMVar :: MVar a -> a -> IO ()
takeMVar :: MVar a -> IO a
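MVars
[Figure: an MVar mv, either empty or full (here holding 52); one thread runs putMVar mv …, another runs v <- takeMVar mv]
takeMVar blocks while the MVar is empty; putMVar blocks while it is full.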
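A minimal sketch of the hand-off in the figure, using `forkIO` and the MVar operations from base; `handOff` is a hypothetical name.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- The forked thread fills the MVar; takeMVar in the calling thread blocks
-- until it is full, then empties it again.
handOff :: IO Int
handOff = do
  mv <- newEmptyMVar
  _ <- forkIO (putMVar mv 52)
  takeMVar mv

main :: IO ()
main = handOff >>= print  -- prints 52
```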
Rendezvous

threadA :: MVar Int -> MVar Float -> IO ()
threadA valueToSendMVar valueReceivedMVar
  = do -- some work
       -- now perform rendezvous by sending 72
       putMVar valueToSendMVar 72 -- send value
       v <- takeMVar valueReceivedMVar
       putStrLn (show v)
Rendezvous

threadB :: MVar Int -> MVar Float -> IO ()
threadB valueToReceiveMVar valueToSendMVar
  = do -- some work
       -- now perform rendezvous by waiting on value
       z <- takeMVar valueToReceiveMVar
       putMVar valueToSendMVar (1.2 * fromIntegral z)
       -- continue with other work
Rendezvous

main :: IO ()
main
  = do aMVar <- newEmptyMVar
       bMVar <- newEmptyMVar
       forkIO (threadA aMVar bMVar)
       forkIO (threadB aMVar bMVar)
       threadDelay  -- BAD! main only guesses how long the other threads need
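One way to avoid guessing a delay, sketched under assumptions: a hypothetical third MVar, `done`, which `threadA` fills once the rendezvous completes (instead of printing), so the main thread blocks on `takeMVar done` for exactly as long as needed. `rendezvous` is also a hypothetical name.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)

-- threadA from the slides, reporting its result through `done`.
threadA :: MVar Int -> MVar Float -> MVar Float -> IO ()
threadA valueToSendMVar valueReceivedMVar done = do
  putMVar valueToSendMVar 72
  v <- takeMVar valueReceivedMVar
  putMVar done v

threadB :: MVar Int -> MVar Float -> IO ()
threadB valueToReceiveMVar valueToSendMVar = do
  z <- takeMVar valueToReceiveMVar
  putMVar valueToSendMVar (1.2 * fromIntegral z)

-- Wait on `done` rather than sleeping for a guessed time.
rendezvous :: IO Float
rendezvous = do
  aMVar <- newEmptyMVar
  bMVar <- newEmptyMVar
  done  <- newEmptyMVar
  _ <- forkIO (threadA aMVar bMVar done)
  _ <- forkIO (threadB aMVar bMVar)
  takeMVar done

main :: IO ()
main = rendezvous >>= print
```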
fib again

fib :: Int -> Int
-- As before

fibThread :: Int -> MVar Int -> IO ()
fibThread n resultMVar
  = putMVar resultMVar (fib n)  -- stores the unevaluated thunk (fib n), so the thread that takes it does the work

sumEuler :: Int -> Int
-- As before
fib fixed

fibThread :: Int -> MVar Int -> IO ()
fibThread n resultMVar
  = do pseq f (return ())  -- force fib n in this thread
       putMVar resultMVar f
  where
    f = fib n
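A runnable sketch combining the two slides: `fibThread` computes `fib m` in a forked thread while the main thread computes `sumEuler n`. The slides only say "As before" for both functions, so the definitions here are assumptions (`sumEuler` is taken to be the sum of Euler's totient over 1..n), and `fibPlusEuler` is hypothetical. `pseq` is assumed to come from GHC.Conc.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import GHC.Conc (pseq)

fib :: Int -> Int
fib n = if n < 2 then n else fib (n - 1) + fib (n - 2)

-- Hypothetical sumEuler: the sum of Euler's totient over [1 .. n].
sumEuler :: Int -> Int
sumEuler n = sum (map phi [1 .. n])
  where
    phi k = length [j | j <- [1 .. k], gcd j k == 1]

-- Force fib n inside the forked thread before publishing it.
fibThread :: Int -> MVar Int -> IO ()
fibThread n resultMVar = do
  f `pseq` return ()
  putMVar resultMVar f
  where
    f = fib n

-- fib runs in a forked thread while this thread computes sumEuler.
fibPlusEuler :: Int -> Int -> IO Int
fibPlusEuler m n = do
  mv <- newEmptyMVar
  _ <- forkIO (fibThread m mv)
  let s = sumEuler n
  s `pseq` return ()
  f <- takeMVar mv
  return (f + s)

main :: IO ()
main = fibPlusEuler 30 3000 >>= print
```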
$ time fibForkIO +RTS -N1
real 0m40.473s
user 0m0.000s
sys  0m0.031s
$ time fibForkIO +RTS -N2
real 0m38.580s
user 0m0.000s
sys  0m0.015s
“STM”s in Haskell

data STM a
instance Monad STM
-- Monads support "do" notation and sequencing

-- Exceptions
throw :: Exception -> STM a
catch :: STM a -> (Exception -> STM a) -> STM a

-- Running STM computations
atomically :: STM a -> IO a
retry      :: STM a
orElse     :: STM a -> STM a -> STM a

-- Transactional variables
data TVar a
newTVar   :: a -> STM (TVar a)
readTVar  :: TVar a -> STM a
writeTVar :: TVar a -> a -> STM ()
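A short sketch of this interface in use. It imports the STM operations from GHC.Conc in base (Control.Concurrent.STM from the stm package is the more common import); `transfer` and `demoTransfer` are hypothetical. `retry` makes the transaction block until the source balance changes, rather than overdrawing it.

```haskell
import GHC.Conc (STM, TVar, atomically, newTVar, readTVar, retry, writeTVar)

-- Move n units between two TVars in one atomic transaction.
transfer :: TVar Int -> TVar Int -> Int -> STM ()
transfer from to n = do
  bal <- readTVar from
  if bal < n
    then retry  -- block until another transaction changes `from`
    else do
      writeTVar from (bal - n)
      t <- readTVar to
      writeTVar to (t + n)

demoTransfer :: IO (Int, Int)
demoTransfer = do
  a <- atomically (newTVar 100)
  b <- atomically (newTVar 0)
  atomically (transfer a b 30)
  atomically ((,) <$> readTVar a <*> readTVar b)

main :: IO ()
main = demoTransfer >>= print  -- prints (70,30)
```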
Transactional Memory
do {...this...} orelse {...that...}
– tries to run “this”
– if “this” retries, it runs “that” instead
– if both retry, the do-block retries
GetEither() will thereby wait for there to be an item in either queue (Q1, Q2) and then put it in R:

void GetEither() {
  atomic {
    do { i = Q1.Get(); }
    orelse { i = Q2.Get(); }
    R.Put( i );
  }
}
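A Haskell rendering of GetEither, sketched with a hypothetical queue type (a TVar holding a list, front first); `get`, `put`, `getEither` and `demoGetEither` are invented names. `get` retries on an empty queue, so `orElse` falls through to the second queue, and if both are empty the whole transaction retries, matching the description above.

```haskell
import GHC.Conc (STM, TVar, atomically, newTVar, orElse, readTVar, retry, writeTVar)

-- Hypothetical minimal queue: a TVar holding a list, front first.
type Queue a = TVar [a]

-- Take the front item, or retry if the queue is empty.
get :: Queue a -> STM a
get q = do
  xs <- readTVar q
  case xs of
    []       -> retry
    (y:rest) -> writeTVar q rest >> return y

-- Append an item at the back.
put :: Queue a -> a -> STM ()
put q x = readTVar q >>= \xs -> writeTVar q (xs ++ [x])

-- Take from q1, or from q2 if q1 is empty, and put the item in r.
getEither :: Queue a -> Queue a -> Queue a -> STM ()
getEither q1 q2 r = do
  i <- get q1 `orElse` get q2
  put r i

demoGetEither :: IO [Int]
demoGetEither = do
  q1 <- atomically (newTVar [])
  q2 <- atomically (newTVar [7, 8])
  r  <- atomically (newTVar [])
  atomically (getEither q1 q2 r)
  atomically (readTVar r)

main :: IO ()
main = demoGetEither >>= print  -- prints [7]
```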
ThreadScope
The GHC run-time can generate event logs. Instrument:
– thread creation, start/stop, migration
– GCs
ThreadScope: a graphical viewer for these event logs
Q: how to mine / understand the information?
Lots Unsaid
– xperf / VTune correlation
– Verification
– Debugging
– Parallel garbage collection
Summary
Three ways of writing parallel and concurrent programs in Haskell:
– `par` and `pseq` (semi-explicit parallelism)
– MVars (explicit concurrency)
– STM (explicit concurrency with transactions)
Implicit concurrency
Pure functional programming has pros and cons for parallel programming.
Can mainstream languages take advantage of the same techniques?
How can visualization help with performance tuning?