Scala Parallel Collections
Aleksandar Prokopec, Tiark Rompf
Scala Team, EPFL

Introduction
multi-core programming is not straightforward
we need better higher-order abstractions
libraries and tools have only begun using these new capabilities
collections are everywhere

Goals
efficient parallel implementations of most collection methods
find the common abstractions needed to implement them
retain consistency with the existing collection framework
smoothly integrate the new methods into the existing framework

Scala Collection Framework
most operations are implemented in terms of an abstract method:
def foreach[U](f: T => U): Unit
new collections are created using builders:
trait Builder[Elem, To]
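As a concrete illustration (mine, not from the slides), a builder can also be used directly; this is the interface transformer operations are written against:

```scala
// A builder accumulates elements one by one and produces the final
// collection with result(); operations written against this interface
// work uniformly for any collection type.
val b = List.newBuilder[Int]
b += 1
b += 2
b += 3
val xs = b.result()
// xs == List(1, 2, 3)
```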

Example
the filter method:

def filter(p: A => Boolean): Repr = {
  val b = newBuilder
  for (x <- this) if (p(x)) b += x
  b.result
}

List(1, 2, 3, 4, 5, 6, 7).filter(_ % 2 == 0)
// the builder receives 2, 4 and 6, producing List(2, 4, 6)

Parallel operations
parallel traversal should be easy for some data structures
could filter be parallelized by having a concurrent builder? 3 problems:
– order may not be preserved anymore – what about sequences?
– performance concerns
– there are more complicated methods, such as span

Method span
span divides a collection into the longest prefix of elements satisfying a predicate, and the remaining suffix
a concurrent builder is not a good idea here
assume an array (keep it simple):
array.span(_ >= 0)

Method reduce
span seems inherently sequential – we’ll get back to it
let’s try something simpler: reduce
def reduce[U >: T](op: (U, U) => U): U
takes an associative operator and applies it between all the elements (examples: addition, concatenation)
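A minimal sketch (my own, not the library's implementation) of why associativity matters: the collection can be halved, each half reduced independently, and the partial results combined with the same operator:

```scala
// Split the collection, reduce each half independently, combine the
// partial results; associativity of op guarantees the same answer as
// a sequential left-to-right reduce.
def parReduceSketch[T](xs: Vector[T])(op: (T, T) => T): T =
  if (xs.length <= 2) xs.reduce(op)
  else {
    val (left, right) = xs.splitAt(xs.length / 2) // split step
    op(parReduceSketch(left)(op), parReduceSketch(right)(op)) // combine step
  }

val r = parReduceSketch(Vector(1, 2, 3, 4, 5))(_ + _)
// r == 15, the same as Vector(1, 2, 3, 4, 5).reduce(_ + _)
```

In a real implementation the two recursive calls would run on different processors; here they simply run one after the other.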

Method reduce
assume the associative operator is string concatenation:

val s = "Tell your friends and family to use Scala."
s.split(" ").toArray.reduce(_ + _)
// "TellyourfriendsandfamilytouseScala."

Method reduce
we might have more processors
this is a well-known pattern from parallel programming
but we need the right abstraction

Method split
we can implement methods such as reduce, foreach, count, find and forall, assuming we can divide the collection
new abstract operation:
def split: Seq[Repr]
returns a non-trivial partition of the collection

Method split
def split: Seq[Repr]
how to implement?
– copy elements
– produce a wrapper
– use data structure properties (e.g. a tree)
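The wrapper option can be sketched like this (a hypothetical helper of mine, not the library's actual class): wrap the underlying array with index bounds, so splitting produces two wrappers over the halves without copying any elements:

```scala
// A lightweight wrapper over a slice of an array: split returns two
// wrappers covering the halves, so no elements are copied.
final case class ArraySlice(arr: Array[Int], from: Int, until: Int) {
  def length: Int = until - from
  def split: Seq[ArraySlice] =
    Seq(copy(until = from + length / 2), copy(from = from + length / 2))
  def elems: Seq[Int] = arr.slice(from, until).toSeq
}

val whole = ArraySlice(Array(1, 2, 3, 4, 5, 6), 0, 6)
val halves = whole.split
// halves(0) covers indices 0 until 3, halves(1) covers 3 until 6
```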

Method filter
the abstract split method can be used to implement accessor methods
for transformer methods such as filter this is not sufficient – the partial collection results must be merged
[diagram: the partitions 1,2,3,4 | 5,6,7,8 and 3,1,8,0 | 2,2,1,9 are filtered in parallel to 2,4 | 6,8 and 8,0 | 2,2, merged into 2,4,6,8 and 8,0,2,2, and finally into 2,4,6,8,8,0,2,2]

Method combine
we need another abstraction:
def combine[Other >: Repr](that: Other): Other
creates a collection that contains all the elements of this collection and that collection

Method combine
def combine[Other >: Repr](that: Other): Other
how to implement?
– copy elements
– use lazy evaluation (elements end up copied twice)
– use specialized data structures
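The simplest option, copying on every merge, can be sketched as follows (illustrative only; the function name is mine):

```scala
// combine by allocating a new array and copying both operands:
// O(n + m) work on every merge, which is why the talk also considers
// lazy evaluation and specialized data structures.
def combineByCopy(a: Array[Int], b: Array[Int]): Array[Int] = {
  val out = new Array[Int](a.length + b.length)
  System.arraycopy(a, 0, out, 0, a.length)
  System.arraycopy(b, 0, out, a.length, b.length)
  out
}

val merged = combineByCopy(Array(2, 4), Array(6, 8))
// merged.toSeq == Seq(2, 4, 6, 8)
```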

Lazy collection evaluation
merge occurs more than once
each processor adds results to its own builder
evaluation occurs once, in the root: allocate, then copy
[diagram: partial results 2,4 | 6,8 and 8,0 | 2,2 are merged as chains of builders, then allocated and copied at the root]

Lazy collection evaluation
advantages:
– easier to apply to existing collections
– for certain data structures copying is cheap (arrays)
– merging is very cheap
disadvantages:
– copying occurs twice – this penalizes cheap operations
– garbage collection occurs more often
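The idea can be sketched like this (a hypothetical combiner of mine, much simpler than the library's): merging only links chunk lists together, and the single copy into the final collection happens once, at the root:

```scala
import scala.collection.mutable.{ArrayBuffer, ListBuffer}

// A "lazy" combiner: combine just links chunk lists (no element
// copying); the one copy into the final array happens in result().
final class ChainedCombiner {
  val chunks: ListBuffer[ArrayBuffer[Int]] = ListBuffer(ArrayBuffer.empty[Int])
  def +=(x: Int): this.type = { chunks.last += x; this }
  def combine(that: ChainedCombiner): ChainedCombiner = {
    chunks ++= that.chunks // O(#chunks), independent of element count
    this
  }
  def result(): Array[Int] = {
    val out = new Array[Int](chunks.map(_.length).sum) // allocate
    var i = 0
    for (c <- chunks; x <- c) { out(i) = x; i += 1 }   // copy once
    out
  }
}

val a = new ChainedCombiner
a += 2
a += 4
val b = new ChainedCombiner
b += 6
b += 8
val res = a.combine(b).result()
// res.toSeq == Seq(2, 4, 6, 8)
```

Each element is still written twice in total (once into its chunk, once into the final array), which is exactly the disadvantage listed above.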

Specialized data structures
some data structures can be merged efficiently (trees, heaps, skip lists…)
immutable vectors – immutable sequences with efficient splitting and concatenation

Method span
each processor keeps 2 builders, one for the prefix and one for the suffix
merge has 2 cases:
– a counterexample occurs in the left partition
– no counterexample occurs in the left partition
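The two merge cases can be sketched as follows (my own illustration): each partition computes its local (prefix, suffix) pair, and merging depends on whether the left partition already contains a counterexample:

```scala
// Merge the span results of two adjacent partitions. If the left
// partition's suffix is non-empty, it found a counterexample, so the
// whole right partition belongs to the suffix; otherwise the left
// partition is entirely prefix and the right partition decides.
def mergeSpan(left: (List[Int], List[Int]),
              right: (List[Int], List[Int])): (List[Int], List[Int]) = {
  val (lPre, lSuf) = left
  val (rPre, rSuf) = right
  if (lSuf.nonEmpty) (lPre, lSuf ++ rPre ++ rSuf) // counterexample on the left
  else (lPre ++ rPre, rSuf)                       // left is all prefix
}

val xs = List(1, 2, -3, 4, 5, -6)
val leftRes  = xs.take(3).span(_ >= 0) // (List(1, 2), List(-3))
val rightRes = xs.drop(3).span(_ >= 0) // (List(4, 5), List(-6))
val merged = mergeSpan(leftRes, rightRes)
// merged == xs.span(_ >= 0) == (List(1, 2), List(-3, 4, 5, -6))
```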

Method find
some methods don’t always traverse the entire collection:
Array(1, 4, 9, 16, 9, 4, 1, 0).find(_ > 10)
// Some(16)
in a parallel implementation, the other processors should be informed that an element was found

Signalling
a trait inherited by all parallel collections
allows processors to send signals to one another
contains an abort flag which is periodically checked
– implemented as a volatile field
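A minimal sketch of such an abort flag (the names SignallingSketch, signal and findChunk are mine; the real trait has more operations):

```scala
// A shared abort flag: whichever worker finds a result sets it, and
// other workers poll it periodically so they can stop early.
trait SignallingSketch {
  @volatile private var aborted = false
  def abort(): Unit = aborted = true
  def isAborted: Boolean = aborted
}

object signal extends SignallingSketch

// What one worker's chunk of a parallel find might look like.
def findChunk(chunk: Seq[Int], p: Int => Boolean): Option[Int] = {
  val it = chunk.iterator
  while (it.hasNext && !signal.isAborted) { // periodic check of the flag
    val x = it.next()
    if (p(x)) { signal.abort(); return Some(x) }
  }
  None
}

val found = findChunk(Seq(1, 4, 9, 16, 9), _ > 10)
// found == Some(16); signal.isAborted is now true, so other chunks
// would stop at their next check
```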

Signalling trait
the abort flag can be used to signal other processors that they should stop
it can be used for find, exists, forall, sameElements, …
what about takeWhile?
array.takeWhile(_ < 100)

Signalling trait
we need to convey information about where the element has been found
an atomic index flag, updated using compare-and-swap
changes are monotonic!
[diagram: the index flag only ever decreases, e.g. from MAX down to 9]
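A sketch of such an index flag (class and method names are mine): because updates are monotonic, i.e. the value only ever decreases, a plain compare-and-swap retry loop is safe no matter how the workers race:

```scala
import java.util.concurrent.atomic.AtomicInteger

// Records the smallest index at which an element was found. Updates
// are monotonic (the value only decreases), so a CAS retry loop is
// enough: a lost race is retried, and a larger index is ignored.
final class IndexFlag {
  private val flag = new AtomicInteger(Int.MaxValue)
  def value: Int = flag.get
  @annotation.tailrec
  def setIfSmaller(i: Int): Unit = {
    val cur = flag.get
    if (i < cur && !flag.compareAndSet(cur, i)) setIfSmaller(i) // retry
  }
}

val f = new IndexFlag
f.setIfSmaller(42)
f.setIfSmaller(9)
f.setIfSmaller(17) // ignored: 17 > 9
// f.value == 9
```

For takeWhile, a worker whose chunk starts after the flagged index knows its results can no longer matter.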

Load balancing
processor availability and data processing cost may not be uniform
fine-grained division – more tasks than processors

Work-stealing
we need to schedule tasks to processors – work stealing
each processor has a task queue
when it runs out of tasks, it steals from other queues

Adaptive work-stealing
still, a large number of tasks can lead to an overhead – hence adaptive partitioning

Adaptive work-stealing
ensures better load balancing
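A toy sketch of the stealing discipline (greatly simplified and sequentialized; real schedulers run workers on separate threads over lock-free deques): each worker pops from the head of its own queue and, when empty, steals from the tail of another worker's queue:

```scala
import scala.collection.mutable.ArrayDeque

// Two workers: worker 0 starts with all the tasks, worker 1 with
// none, so worker 1 only makes progress by stealing.
val queues = Vector(ArrayDeque(1, 2, 3, 4, 5, 6), ArrayDeque[Int]())
var processed = Vector.fill(2)(List.empty[Int])

def step(worker: Int): Boolean = {
  val own = queues(worker)
  val task =
    if (own.nonEmpty) Some(own.removeHead())               // pop own head
    else queues.indices.find(i => i != worker && queues(i).nonEmpty)
      .map(victim => queues(victim).removeLast())          // steal victim's tail
  task.foreach(t => processed = processed.updated(worker, t :: processed(worker)))
  task.isDefined
}

while (step(0) | step(1)) () // run both workers until all queues drain
// every task ends up processed exactly once, and worker 1's share
// arrived entirely through steals
```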

Package hierarchy
parallel is a subpackage of the collection package
[hierarchy: collection contains mutable, immutable and parallel; parallel in turn contains mutable and immutable]

Class hierarchy
consistent with the existing collections
clients can refer to parallel collections transparently
[hierarchy: ParallelIterable extends Iterable; ParallelMap, ParallelSeq and ParallelSet extend Map, Seq and Set respectively, as well as ParallelIterable]

How to use
be aware of side-effects:

var k = 0
array.foreach(k += _)

parallel collections are not concurrent collections
be careful with small collections – the cost of setup may be higher
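The side-effecting accumulation above is racy once the foreach runs in parallel; expressing it as a fold/reduce removes the shared mutable state, and associativity then lets the framework parallelize it safely (my illustration; the variable names are mine):

```scala
val array = Array(1, 2, 3, 4, 5)

// Racy pattern under a parallel foreach: many workers would update
// k concurrently. It is only deterministic here because this copy
// runs sequentially.
var k = 0
array.foreach(k += _)

// Safe pattern: no shared mutable state, and a parallel reduce(_ + _)
// would give the same answer because + is associative.
val total = array.foldLeft(0)(_ + _)
// total == 15
```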

How to use
parallel ranges – a way to parallelize for-loops:

for (i <- (0 until 1000).par) yield {
  var num = i
  var lst: List[Int] = Nil
  while (num > 0) {
    lst ::= num % 2
    num = num / 2
  }
  lst
}

Benchmarks
microbenchmarks with low-cost per-element operations
[chart: foreach – Sequential 1227 vs. ParallelArray with extra threads; reduce – Sequential 949 vs. ParallelArray with extra threads; the parallel timings are not preserved in the transcript]

Benchmarks
microbenchmarks with low-cost per-element operations
[chart: filter – Sequential 611 vs. ParallelArray with extra threads; find – Sequential 1181 vs. ParallelArray with extra threads; the parallel timings are not preserved in the transcript]

Current state
arrays – ParallelArray
ranges – ParallelRange
views – ParallelView
working on ParallelVector and ParallelHashMap

Conclusion
good performance results
nice integration with the existing collections
more parallel collections are being worked on
will be integrated into Scala 2.8.1