Scala Parallel Collections Aleksandar Prokopec EPFL
Scala collections for { s <- surnames n <- names if s endsWith n } yield (n, s) McDonald
Scala collections for { s <- surnames n <- names if s endsWith n } yield (n, s) 1040 ms
Scala parallel collections for { s <- surnames n <- names if s endsWith n } yield (n, s)
Scala parallel collections for { s <- surnames.par n <- names.par if s endsWith n } yield (n, s)
Scala parallel collections for { s <- surnames.par n <- names.par if s endsWith n } yield (n, s) 2 cores 575 ms
Scala parallel collections for { s <- surnames.par n <- names.par if s endsWith n } yield (n, s) 4 cores 305 ms
for comprehensions surnames.par.flatMap { s => names.par.filter(n => s endsWith n).map(n => (n, s)) }
for comprehensions nested parallelized bulk operations surnames.par.flatMap { s => names.par.filter(n => s endsWith n).map(n => (n, s)) }
Nested parallelism
Nested parallelism parallel within parallel composition surnames.par.flatMap { s => surnameToCollection(s) // may invoke parallel ops }
Nested parallelism going recursive def vowel(c: Char): Boolean =...
Nested parallelism going recursive def vowel(c: Char): Boolean =... def gen(n: Int, acc: Seq[String]): Seq[String] = if (n == 0) acc
Nested parallelism going recursive def vowel(c: Char): Boolean =... def gen(n: Int, acc: Seq[String]): Seq[String] = if (n == 0) acc else for (s <- gen(n - 1, acc); c <- 'a' to 'z') yield recursive algorithms
Nested parallelism going recursive def vowel(c: Char): Boolean =... def gen(n: Int, acc: Seq[String]): Seq[String] = if (n == 0) acc else for (s <- gen(n - 1, acc); c <- 'a' to 'z') yield if (s.length == 0) s + c
Nested parallelism going recursive def vowel(c: Char): Boolean =... def gen(n: Int, acc: Seq[String]): Seq[String] = if (n == 0) acc else for (s <- gen(n - 1, acc); c <- 'a' to 'z') yield if (s.length == 0) s + c else if (vowel(s.last) && !vowel(c)) s + c else if (!vowel(s.last) && vowel(c)) s + c
Nested parallelism going recursive def vowel(c: Char): Boolean =... def gen(n: Int, acc: Seq[String]): Seq[String] = if (n == 0) acc else for (s <- gen(n - 1, acc); c <- 'a' to 'z') yield if (s.length == 0) s + c else if (vowel(s.last) && !vowel(c)) s + c else if (!vowel(s.last) && vowel(c)) s + c else s gen(5, Array(""))
Nested parallelism going recursive def vowel(c: Char): Boolean =... def gen(n: Int, acc: Seq[String]): Seq[String] = if (n == 0) acc else for (s <- gen(n - 1, acc); c <- 'a' to 'z') yield if (s.length == 0) s + c else if (vowel(s.last) && !vowel(c)) s + c else if (!vowel(s.last) && vowel(c)) s + c else s gen(5, Array("")) 1545 ms
Nested parallelism going recursive def vowel(c: Char): Boolean =... def gen(n: Int, acc: ParSeq[String]): ParSeq[String] = if (n == 0) acc else for (s <- gen(n - 1, acc); c <- 'a' to 'z') yield if (s.length == 0) s + c else if (vowel(s.last) && !vowel(c)) s + c else if (!vowel(s.last) && vowel(c)) s + c else s gen(5, ParArray(""))
Nested parallelism going recursive def vowel(c: Char): Boolean =... def gen(n: Int, acc: ParSeq[String]): ParSeq[String] = if (n == 0) acc else for (s <- gen(n - 1, acc); c <- 'a' to 'z') yield if (s.length == 0) s + c else if (vowel(s.last) && !vowel(c)) s + c else if (!vowel(s.last) && vowel(c)) s + c else s gen(5, ParArray("")) 1 core 1575 ms
Nested parallelism going recursive def vowel(c: Char): Boolean =... def gen(n: Int, acc: ParSeq[String]): ParSeq[String] = if (n == 0) acc else for (s <- gen(n - 1, acc); c <- 'a' to 'z') yield if (s.length == 0) s + c else if (vowel(s.last) && !vowel(c)) s + c else if (!vowel(s.last) && vowel(c)) s + c else s gen(5, ParArray("")) 2 cores 809 ms
Nested parallelism going recursive def vowel(c: Char): Boolean =... def gen(n: Int, acc: ParSeq[String]): ParSeq[String] = if (n == 0) acc else for (s <- gen(n - 1, acc); c <- 'a' to 'z') yield if (s.length == 0) s + c else if (vowel(s.last) && !vowel(c)) s + c else if (!vowel(s.last) && vowel(c)) s + c else s gen(5, ParArray("")) 4 cores 530 ms
So, I just use par and I’m home free?
How to think parallel
Character count use case for foldLeft val txt: String =... txt.foldLeft(0) { case (a, ‘ ‘) => a case (a, c) => a + 1 }
Character count use case for foldLeft txt.foldLeft(0) { case (a, ‘ ‘) => a case (a, c) => a + 1 } going left to right - not parallelizable! ABCDEF _ + 1
Character count use case for foldLeft txt.foldLeft(0) { case (a, ‘ ‘) => a case (a, c) => a + 1 } going left to right – not really necessary 3210 ABC _ DEF _ + _ 6
Character count in parallel txt.fold(0) { case (a, ‘ ‘) => a case (a, c) => a + 1 }
Character count in parallel txt.fold(0) { case (a, ‘ ‘) => a case (a, c) => a + 1 } 3211 ABC _ ABC : (Int, Char) => Int
Character count fold not applicable txt.fold(0) { case (a, ‘ ‘) => a case (a, c) => a + 1 } 3213 ABC _ + _ ABC ! (Int, Int) => Int
Character count use case for aggregate txt.aggregate(0)({ case (a, ‘ ‘) => a case (a, c) => a + 1 }, _ + _)
3211 ABC Character count use case for aggregate txt.aggregate(0)({ case (a, ‘ ‘) => a case (a, c) => a + 1 }, _ + _) _ + _ ABC _ + 1
Character count use case for aggregate aggregation element 3211 ABC _ + _ ABC txt.aggregate(0)({ case (a, ‘ ‘) => a case (a, c) => a + 1 }, _ + _) B _ + 1
Character count use case for aggregate aggregation aggregation aggregation element 3211 ABC _ + _ ABC txt.aggregate(0)({ case (a, ‘ ‘) => a case (a, c) => a + 1 }, _ + _) B _ + 1
Word count another use case for foldLeft txt.foldLeft((0, true)) { case ((wc, _), ' ') => (wc, true) case ((wc, true), x) => (wc + 1, false) case ((wc, false), x) => (wc, false) }
Word count initial accumulation txt.foldLeft((0, true)) { case ((wc, _), ' ') => (wc, true) case ((wc, true), x) => (wc + 1, false) case ((wc, false), x) => (wc, false) } 0 words so farlast character was a space “Folding me softly.”
Word count a space txt.foldLeft((0, true)) { case ((wc, _), ' ') => (wc, true) case ((wc, true), x) => (wc + 1, false) case ((wc, false), x) => (wc, false) } “Folding me softly.” last seen character is a space
Word count a non space txt.foldLeft((0, true)) { case ((wc, _), ' ') => (wc, true) case ((wc, true), x) => (wc + 1, false) case ((wc, false), x) => (wc, false) } “Folding me softly.” last seen character was a space – a new word
Word count a non space txt.foldLeft((0, true)) { case ((wc, _), ' ') => (wc, true) case ((wc, true), x) => (wc + 1, false) case ((wc, false), x) => (wc, false) } “Folding me softly.” last seen character wasn’t a space – no new word
Word count in parallel “softly.““Folding me “ P1P2
Word count in parallel “softly.““Folding me “ wc = 2; rs = 1wc = 1; ls = 0 P1P2
Word count in parallel “softly.““Folding me “ wc = 2; rs = 1wc = 1; ls = 0 wc = 3 P1P2
Word count must assume arbitrary partitions “g me softly.““Foldin“ wc = 1; rs = 0wc = 3; ls = 0 P1P2
Word count must assume arbitrary partitions “g me softly.““Foldin“ wc = 1; rs = 0wc = 3; ls = 0 P1P2 wc = 3
Word count initial aggregation txt.par.aggregate((0, 0, 0))
Word count initial aggregation txt.par.aggregate((0, 0, 0)) # spaces on the left# spaces on the right#words
Word count initial aggregation txt.par.aggregate((0, 0, 0)) # spaces on the left# spaces on the right#words ””
Word count aggregation aggregation... }, { case ((0, 0, 0), res) => res case (res, (0, 0, 0)) => res “““Folding me“ “softly.“““
Word count aggregation aggregation... }, { case ((0, 0, 0), res) => res case (res, (0, 0, 0)) => res case ((lls, lwc, 0), (0, rwc, rrs)) => (lls, lwc + rwc - 1, rrs) “e softly.“ “Folding m“
Word count aggregation aggregation... }, { case ((0, 0, 0), res) => res case (res, (0, 0, 0)) => res case ((lls, lwc, 0), (0, rwc, rrs)) => (lls, lwc + rwc - 1, rrs) case ((lls, lwc, _), (_, rwc, rrs)) => (lls, lwc + rwc, rrs) “ softly.““Folding me”
Word count aggregation element txt.par.aggregate((0, 0, 0))({ case ((ls, 0, _), ' ') => (ls + 1, 0, ls + 1) ”_””_” 0 words and a space – add one more space each side
Word count aggregation element txt.par.aggregate((0, 0, 0))({ case ((ls, 0, _), ' ') => (ls + 1, 0, ls + 1) case ((ls, 0, _), c) => (ls, 1, 0) ” m” 0 words and a non-space – one word, no spaces on the right side
Word count aggregation element txt.par.aggregate((0, 0, 0))({ case ((ls, 0, _), ' ') => (ls + 1, 0, ls + 1) case ((ls, 0, _), c) => (ls, 1, 0) case ((ls, wc, rs), ' ') => (ls, wc, rs + 1) ” me_” nonzero words and a space – one more space on the right side
Word count aggregation element txt.par.aggregate((0, 0, 0))({ case ((ls, 0, _), ' ') => (ls + 1, 0, ls + 1) case ((ls, 0, _), c) => (ls, 1, 0) case ((ls, wc, rs), ' ') => (ls, wc, rs + 1) case ((ls, wc, 0), c) => (ls, wc, 0) ” me sof” nonzero words, last non-space and current non-space – no change
Word count aggregation element txt.par.aggregate((0, 0, 0))({ case ((ls, 0, _), ' ') => (ls + 1, 0, ls + 1) case ((ls, 0, _), c) => (ls, 1, 0) case ((ls, wc, rs), ' ') => (ls, wc, rs + 1) case ((ls, wc, 0), c) => (ls, wc, 0) case ((ls, wc, rs), c) => (ls, wc + 1, 0) ” me s” nonzero words, last space and current non-space – one more word
Word count in parallel txt.par.aggregate((0, 0, 0))({ case ((ls, 0, _), ' ') => (ls + 1, 0, ls + 1) case ((ls, 0, _), c) => (ls, 1, 0) case ((ls, wc, rs), ' ') => (ls, wc, rs + 1) case ((ls, wc, 0), c) => (ls, wc, 0) case ((ls, wc, rs), c) => (ls, wc + 1, 0) }, { case ((0, 0, 0), res) => res case (res, (0, 0, 0)) => res case ((lls, lwc, 0), (0, rwc, rrs)) => (lls, lwc + rwc - 1, rrs) case ((lls, lwc, _), (_, rwc, rrs)) => (lls, lwc + rwc, rrs) })
Word count using parallel strings? txt.par.aggregate((0, 0, 0))({ case ((ls, 0, _), ' ') => (ls + 1, 0, ls + 1) case ((ls, 0, _), c) => (ls, 1, 0) case ((ls, wc, rs), ' ') => (ls, wc, rs + 1) case ((ls, wc, 0), c) => (ls, wc, 0) case ((ls, wc, rs), c) => (ls, wc + 1, 0) }, { case ((0, 0, 0), res) => res case (res, (0, 0, 0)) => res case ((lls, lwc, 0), (0, rwc, rrs)) => (lls, lwc + rwc - 1, rrs) case ((lls, lwc, _), (_, rwc, rrs)) => (lls, lwc + rwc, rrs) })
Word count string not really parallelizable scala> (txt: String).par
Word count string not really parallelizable scala> (txt: String).par collection.parallel.ParSeq[Char] = ParArray(…)
Word count string not really parallelizable scala> (txt: String).par collection.parallel.ParSeq[Char] = ParArray(…) different internal representation!
Word count string not really parallelizable scala> (txt: String).par collection.parallel.ParSeq[Char] = ParArray(…) different internal representation! ParArray
Word count string not really parallelizable scala> (txt: String).par collection.parallel.ParSeq[Char] = ParArray(…) different internal representation! ParArray copy string contents into an array
Conversions going parallel // par is efficient – no copying mutable.{Array, ArrayBuffer, ArraySeq} mutable.{HashMap, HashSet} immutable.{Vector, Range} immutable.{HashMap, HashSet}
Conversions going parallel // par is efficient – no copying mutable.{Array, ArrayBuffer, ArraySeq} mutable.{HashMap, HashSet} immutable.{Vector, Range} immutable.{HashMap, HashSet} most other collections construct a new parallel collection!
Conversions going parallel sequentialparallel Array, ArrayBuffer, ArraySeqmutable.ParArray mutable.HashMapmutable.ParHashMap mutable.HashSetmutable.ParHashSet immutable.Vectorimmutable.ParVector immutable.Rangeimmutable.ParRange immutable.HashMapimmutable.ParHashMap immutable.HashSetimmutable.ParHashSet
Custom collections
Custom collection class ParString(val str: String)
Custom collection class ParString(val str: String) extends parallel.immutable.ParSeq[Char] {
Custom collection class ParString(val str: String) extends parallel.immutable.ParSeq[Char] { def apply(i: Int) = str.charAt(i) def length = str.length
Custom collection class ParString(val str: String) extends parallel.immutable.ParSeq[Char] { def apply(i: Int) = str.charAt(i) def length = str.length def seq = new WrappedString(str)
Custom collection class ParString(val str: String) extends parallel.immutable.ParSeq[Char] { def apply(i: Int) = str.charAt(i) def length = str.length def seq = new WrappedString(str) def splitter: Splitter[Char]
Custom collection class ParString(val str: String) extends parallel.immutable.ParSeq[Char] { def apply(i: Int) = str.charAt(i) def length = str.length def seq = new WrappedString(str) def splitter = new ParStringSplitter(0, str.length)
Custom collection splitter definition class ParStringSplitter(var i: Int, len: Int) extends Splitter[Char] {
Custom collection splitters are iterators class ParStringSplitter(i: Int, len: Int) extends Splitter[Char] { def hasNext = i < len def next = { val r = str.charAt(i) i += 1 r }
Custom collection splitters must be duplicated... def dup = new ParStringSplitter(i, len)
Custom collection splitters know how many elements remain... def dup = new ParStringSplitter(i, len) def remaining = len - i
Custom collection splitters can be split... def psplit(sizes: Int*): Seq[ParStringSplitter] = { val splitted = new ArrayBuffer[ParStringSplitter] for (sz <- sizes) { val next = (i + sz) min ntl splitted += new ParStringSplitter(i, next) i = next } splitted }
Word count now with parallel strings new ParString(txt).aggregate((0, 0, 0))({ case ((ls, 0, _), ' ') => (ls + 1, 0, ls + 1) case ((ls, 0, _), c) => (ls, 1, 0) case ((ls, wc, rs), ' ') => (ls, wc, rs + 1) case ((ls, wc, 0), c) => (ls, wc, 0) case ((ls, wc, rs), c) => (ls, wc + 1, 0) }, { case ((0, 0, 0), res) => res case (res, (0, 0, 0)) => res case ((lls, lwc, 0), (0, rwc, rrs)) => (lls, lwc + rwc - 1, rrs) case ((lls, lwc, _), (_, rwc, rrs)) => (lls, lwc + rwc, rrs) })
Word count performance txt.foldLeft((0, true)) { case ((wc, _), ' ') => (wc, true) case ((wc, true), x) => (wc + 1, false) case ((wc, false), x) => (wc, false) } new ParString(txt).aggregate((0, 0, 0))({ case ((ls, 0, _), ' ') => (ls + 1, 0, ls + 1) case ((ls, 0, _), c) => (ls, 1, 0) case ((ls, wc, rs), ' ') => (ls, wc, rs + 1) case ((ls, wc, 0), c) => (ls, wc, 0) case ((ls, wc, rs), c) => (ls, wc + 1, 0) }, { case ((0, 0, 0), res) => res case (res, (0, 0, 0)) => res case ((lls, lwc, 0), (0, rwc, rrs)) => (lls, lwc + rwc - 1, rrs) case ((lls, lwc, _), (_, rwc, rrs)) => (lls, lwc + rwc, rrs) }) 100 ms cores: time: 137 ms 70 ms 35 ms
Hierarchy GenTraversable GenIterable GenSeq Traversable Iterable Seq ParIterable ParSeq
Hierarchy def nonEmpty(sq: Seq[String]) = { val res = new mutable.ArrayBuffer[String]() for (s <- sq) { if (s.nonEmpty) res += s } res }
Hierarchy def nonEmpty(sq: ParSeq[String]) = { val res = new mutable.ArrayBuffer[String]() for (s <- sq) { if (s.nonEmpty) res += s } res }
Hierarchy def nonEmpty(sq: ParSeq[String]) = { val res = new mutable.ArrayBuffer[String]() for (s <- sq) { if (s.nonEmpty) res += s } res } side-effects! ArrayBuffer is not synchronized!
Hierarchy def nonEmpty(sq: ParSeq[String]) = { val res = new mutable.ArrayBuffer[String]() for (s <- sq) { if (s.nonEmpty) res += s } res } side-effects! ArrayBuffer is not synchronized! ParSeq Seq
Hierarchy def nonEmpty(sq: GenSeq[String]) = { val res = new mutable.ArrayBuffer[String]() for (s <- sq) { if (s.nonEmpty) res.synchronized { res += s } res }
Thank you! Examples at: git://github.com/axel22/sd.git
Accessors vs. transformers some methods need more than just splitters foreach, reduce, find, sameElements, indexOf, corresponds, forall, exists, max, min, sum, count, … map, flatMap, filter, partition, ++, take, drop, span, zip, patch, padTo, …
Accessors vs. transformers some methods need more than just splitters foreach, reduce, find, sameElements, indexOf, corresponds, forall, exists, max, min, sum, count, … map, flatMap, filter, partition, ++, take, drop, span, zip, patch, padTo, … These return collections!
Accessors vs. transformers some methods need more than just splitters foreach, reduce, find, sameElements, indexOf, corresponds, forall, exists, max, min, sum, count, … map, flatMap, filter, partition, ++, take, drop, span, zip, patch, padTo, … Sequential collections – builders
Accessors vs. transformers some methods need more than just splitters foreach, reduce, find, sameElements, indexOf, corresponds, forall, exists, max, min, sum, count, … map, flatMap, filter, partition, ++, take, drop, span, zip, patch, padTo, … Sequential collections – builders Parallel collections – combiners
Builders building a sequential collection Nil 246 ListBuilder += result
Combiners building parallel collections trait Combiner[-Elem, +To] extends Builder[Elem, To] { def combine[N : To] (other: Combiner[N, NewTo]): Combiner[N, NewTo] }
Combiners building parallel collections trait Combiner[-Elem, +To] extends Builder[Elem, To] { def combine[N : To] (other: Combiner[N, NewTo]): Combiner[N, NewTo] } Combiner
Combiners building parallel collections trait Combiner[-Elem, +To] extends Builder[Elem, To] { def combine[N : To] (other: Combiner[N, NewTo]): Combiner[N, NewTo] } either use an efficient merge operation or do lazy evaluation
Parallel arrays 1, 2, 3, 45, 6, 7, 8 2, 46, 8 3, 1, 8, 02, 2, 1, 9 8, 02, 2 merge copy allocate
Parallel hash tables ParHashMap
Parallel hash tables ParHashMap e.g. calling filter
Parallel hash tables ParHashMap ParHashCombiner e.g. calling filter
Parallel hash tables ParHashMap ParHashCombiner
Parallel hash tables ParHashMap ParHashCombiner
Parallel hash tables ParHashMap ParHashCombiner How to merge?
Parallel hash tables buckets! ParHashCombiner ParHashMap 2 0 = = =
Parallel hash tables ParHashCombiner combine
Parallel hash tables ParHashCombiner no copying!
Parallel hash tables ParHashCombiner
Parallel hash tables ParHashMap