Scala Parallel Collections
Aleksandar Prokopec, Tiark Rompf
Scala Team, EPFL
Introduction
– multi-core programming is not straightforward – we need better higher-order abstractions
– libraries and tools have only begun using these new capabilities
– collections are everywhere
Goals
– efficient parallel implementations of most collection methods
– find the common abstractions needed to implement them
– retain consistency with the existing collection framework
– smoothly integrate the new methods into the existing framework
Scala Collection Framework
– most operations are implemented in terms of an abstract method:
  def foreach[U](f: T => U): Unit
– new collections are created using builders:
  trait Builder[Elem, To]
Example – the filter method:

  def filter(p: A => Boolean): Repr = {
    val b = newBuilder
    for (x <- this) if (p(x)) b += x
    b.result
  }

  List(1, 2, 3, 4, 5, 6, 7).filter(_ % 2 == 0)
  // traverses the list, appends 2, 4 and 6 to the builder,
  // and returns List(2, 4, 6)
Parallel operations
– parallel traversal should be easy for some data structures
– could filter be parallelized by having a concurrent builder? 3 problems:
  – order may not be preserved anymore – what about sequences?
  – performance concerns
  – there are more complicated methods such as span
Method span
– span divides a collection into the longest prefix of elements satisfying a predicate and the remaining suffix
– a concurrent builder is not a good idea here
– assume an array (to keep it simple):
  array.span(_ >= 0)
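For concreteness, here is span's contract on a small array (a sequential example):

```scala
// span returns the longest prefix whose elements satisfy the
// predicate, paired with the remaining suffix
val array = Array(1, 2, -3, 4)
val (prefix, suffix) = array.span(_ >= 0)
// prefix contains 1, 2 and suffix contains -3, 4
```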
Method reduce
– span seems inherently sequential; we’ll get back to it – let’s try something simpler, reduce:
  def reduce[U >: T](op: (U, U) => U): U
– takes an associative operator and applies it between all the elements (examples: addition, concatenation)
Method reduce
– assume the associative operator is string concatenation:
  val s = "Tell your friends and family to use Scala."
  s.split(" ").toArray.reduce(_ + _)
– the words are combined pairwise in a tree of + applications, producing "TellyourfriendsandfamilytouseScala."
Method reduce
– we might have more processors
– this is a well-known pattern from parallel programming
– but we need the right abstraction
Method split
– we can implement methods such as reduce, foreach, count, find and forall, assuming we can divide the collection
– new abstract operation:
  def split: Seq[Repr]
– returns a non-trivial partition of the collection
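The idea can be sketched with a hypothetical, simplified Splittable trait (an illustration, not the actual framework API): reduce recurses on split until partitions fall below a threshold, and the recursive calls are independent, so they could run on different processors.

```scala
// A minimal sketch of reduce on top of split, assuming a hypothetical
// simplified Splittable trait (not the actual framework API).
trait Splittable[T] {
  def elements: Seq[T]            // sequential view of this partition
  def split: Seq[Splittable[T]]   // non-trivial partition
}

case class Block[T](elements: Seq[T]) extends Splittable[T] {
  // split in half; a real collection would exploit its structure
  def split: Seq[Splittable[T]] = {
    val (l, r) = elements.splitAt(elements.length / 2)
    Seq(Block(l), Block(r))
  }
}

// below the threshold, reduce sequentially; above it, split and
// reduce each part (the recursive calls are independent and could
// run in parallel on different processors)
def parReduce[T](c: Splittable[T], threshold: Int)(op: (T, T) => T): T =
  if (c.elements.length <= threshold) c.elements.reduce(op)
  else c.split.map(parReduce(_, threshold)(op)).reduce(op)
```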
Method split
  def split: Seq[Repr]
– how to implement?
  – copy the elements
  – produce a wrapper
  – use data structure properties (e.g. a tree)
Method filter
– the split abstraction can be used to implement accessor methods
– for transformer methods such as filter this is not sufficient – the partial results have to be merged
– e.g. filtering [1, 2, 3, 4] and [5, 6, 7, 8] for even numbers yields [2, 4] and [6, 8], which merge into [2, 4, 6, 8]; likewise [3, 1, 8, 0] and [2, 2, 1, 9] yield [8, 0] and [2, 2], merging into [8, 0, 2, 2]; merging both halves gives [2, 4, 6, 8, 8, 0, 2, 2]
Method combine
– we need another abstraction:
  def combine[Other >: Repr](that: Other): Other
– creates a collection that contains all the elements of this collection and that collection
Method combine
  def combine[Other >: Repr](that: Other): Other
– how to implement?
  – copy the elements
  – use lazy evaluation (copies the elements twice)
  – use specialized data structures
Lazy collection evaluation
– merging occurs more than once as partial results move up the task tree
– each processor adds results to its own builder
– evaluation (a single allocation followed by copying) occurs only at the root
Lazy collection evaluation
– advantages:
  – easier to apply to existing collections
  – for certain data structures (e.g. arrays) copying is cheap
  – merging is very cheap
– disadvantages:
  – copying occurs twice
  – the extra copying affects otherwise cheap operations
  – garbage collection occurs more often
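Lazy combining can be sketched like this (a simplified assumption-laden illustration for arrays, not the real ParallelArray machinery): combine is O(1) because it only records the two parts; the per-processor chunks are the first copy (the builders), and the final array, allocated once at the root, is the second.

```scala
// A sketch of lazy combining for array results (simplified).
sealed trait Merger {
  def size: Int
  def copyTo(dst: Array[Int], from: Int): Unit
}
case class Chunk(elems: Array[Int]) extends Merger {
  def size = elems.length
  def copyTo(dst: Array[Int], from: Int) = Array.copy(elems, 0, dst, from, size)
}
case class Merged(left: Merger, right: Merger) extends Merger {
  // combining is cheap: just record both parts and their total size
  val size = left.size + right.size
  def copyTo(dst: Array[Int], from: Int) = {
    left.copyTo(dst, from)
    right.copyTo(dst, from + left.size)
  }
}

def result(m: Merger): Array[Int] = {
  val dst = new Array[Int](m.size)  // single allocation at the root
  m.copyTo(dst, 0)                  // one copy pass over all chunks
  dst
}
```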
Specialized data structures
– some data structures can be merged efficiently (trees, heaps, skip lists, …)
– immutable vectors – immutable sequences with efficient splitting and concatenation
Method span
– each processor keeps 2 builders, one for the prefix and one for the suffix
– merging has 2 cases:
  – a counterexample was found in the left partition – everything from the right partition belongs to the suffix
  – no counterexample in the left partition – the left elements extend the prefix of the combined result
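The two merge cases can be sketched with a hypothetical helper (an illustration of the idea, not the actual implementation), where each partition reports its prefix, its suffix, and whether it saw a counterexample:

```scala
// Partial span result of one partition (hypothetical representation).
case class SpanPart(pre: List[Int], suf: List[Int], broken: Boolean)

def mergeSpan(l: SpanPart, r: SpanPart): SpanPart =
  if (l.broken)
    // counterexample in the left partition: everything in the right
    // partition belongs to the suffix
    SpanPart(l.pre, l.suf ::: r.pre ::: r.suf, true)
  else
    // no counterexample on the left (so its suffix is empty): the
    // left elements extend the combined prefix
    SpanPart(l.pre ::: r.pre, r.suf, r.broken)
```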
Method find
– some methods don’t always traverse the entire collection:
  Array(1, 4, 9, 16, 9, 4, 1, 0).find(_ > 10)   // Some(16)
– in a parallel implementation, the other processors should be informed once an element has been found
Signalling trait
– inherited by all parallel collections
– allows processors to send signals to each other
– contains an abort flag which is periodically checked
  – implemented as a volatile field
Signalling trait
– the abort flag can be used to signal other processors that they should stop
– it can be used for find, exists, forall, sameElements, …
– what about takeWhile?
  array.takeWhile(_ < 100)
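The abort-flag idea can be sketched as follows (a simplified stand-in for the real Signalling trait): a volatile flag that workers poll periodically while traversing their partitions, raised as soon as one of them finds a match.

```scala
// Simplified stand-in for the abort flag of the Signalling trait.
class AbortFlag {
  @volatile private var aborted = false   // visible to all threads
  def abort(): Unit = aborted = true
  def isAborted: Boolean = aborted
}

// A worker's find over its own partition: checks the flag between
// elements and raises it once a matching element is found.
def findPart(part: Array[Int], p: Int => Boolean, flag: AbortFlag): Option[Int] = {
  var i = 0
  while (i < part.length && !flag.isAborted) {
    if (p(part(i))) { flag.abort(); return Some(part(i)) }
    i += 1
  }
  None
}
```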
Signalling trait
– for takeWhile we also need to convey where the first failing element has been found
– an atomic index flag, updated using compare-and-swap
– changes are monotonic! the index starts at the maximum value and only ever decreases
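A sketch of such a monotonic index flag (a simplified assumption about the mechanism): a processor that hits a failing element publishes its index, but only if it is smaller than the one already recorded, retrying the compare-and-swap on contention.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Monotonically decreasing atomic index flag (simplified sketch).
class IndexFlag {
  private val idx = new AtomicInteger(Int.MaxValue)  // starts at MAX
  def value: Int = idx.get
  def setIfSmaller(i: Int): Unit = {
    var cur = idx.get
    // monotonic: only ever decrease, retrying the CAS if another
    // processor changed the value concurrently
    while (i < cur && !idx.compareAndSet(cur, i)) cur = idx.get
  }
}
```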
Load balancing
– processor availability and per-element processing cost may not be uniform
– fine-grained division – more tasks than processors
Work-stealing
– tasks need to be scheduled to processors – work stealing
– each processor has a task queue
– when it runs out of tasks, it steals tasks from other processors’ queues
Adaptive work-stealing
– still, a large number of tasks can lead to overhead
– solution: adaptive partitioning
Adaptive work-stealing
– ensures better load balancing
Package hierarchy
– parallel collections form a subpackage of the collection package:

  collection
    mutable
    immutable
    parallel
      mutable
      immutable
Class hierarchy
– consistent with the existing collections – clients can refer to parallel collections transparently
– Iterable is extended by Map, Seq, Set and ParallelIterable; ParallelIterable is in turn extended by ParallelMap, ParallelSeq and ParallelSet
How to use
– be aware of side-effects:
  var k = 0
  array.foreach(k += _)   // unsynchronized writes to k from multiple threads
– parallel collections are not concurrent collections
– be careful with small collections – the cost of the parallel setup may be higher than the work itself
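One way to avoid such races (a general suggestion, not from the slides): express the accumulation with a side-effect-free reduce over an associative operator, which needs no shared state and therefore parallelizes safely.

```scala
val array = Array(1, 2, 3, 4, 5)
// instead of: var k = 0; array.foreach(k += _)
val k = array.reduce(_ + _)   // no shared mutable state
```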
How to use
– parallel ranges – a way to parallelize for-loops:

  for (i <- (0 until 1000).par) yield {
    var num = i
    var lst: List[Int] = Nil
    while (num > 0) {      // collect the binary digits of i
      lst ::= num % 2
      num = num / 2
    }
    lst
  }
Benchmarks
– microbenchmarks with low-cost per-element operations, comparing the sequential baseline against ParallelArray with extra threads
– [charts: foreach (sequential: 1227), reduce (sequential: 949), filter (sequential: 611), find (sequential: 1181)]
Current state
– arrays – ParallelArray
– ranges – ParallelRange
– views – ParallelView
– working on ParallelVector and ParallelHashMap
Conclusion
– good performance results
– nice integration with the existing collections
– more parallel collections are being worked on
– will be integrated into Scala 2.8.1