Actors: Concurrency made easy(-er)
13-Nov-18
The actor model
Most of the problems with concurrency, from deadlocks to data corruption, result from having shared state.
  Solution: Don't share state!
An alternative to shared state is the actor model, in which independent processes send messages to and receive messages from one another.
The actor model was proposed by Carl Hewitt and colleagues in 1973, was popularized by the Erlang language, and is being incorporated into many new languages.
Quoting Alex Miller, http://www.javaworld.com/javaworld/jw-02-2009/jw-02-actor-concurrency1.html?page=2:
  The actor model consists of a few key principles:
    No shared state
    Lightweight processes
    Asynchronous message-passing
    Mailboxes to buffer incoming messages
    Mailbox processing with pattern matching
Basic concepts
An actor is an independent flow of control.
  You can think of an actor as a Thread with extra features.
An actor does not share its data with any other process.
  This means you can write it as a simple sequential process, and avoid a huge number of problems that result from shared state.
  However: It is possible to share state; it's just a very bad idea.
Any process can send a message to an actor with the syntax actor ! message
An actor has a "mailbox" in which it receives messages.
An actor processes its messages one at a time, in the order that it receives them, and uses pattern matching to decide what to do with each message.
  Except: Messages which don't match any pattern are not processed, but remain in the mailbox (this is bad: the mailbox keeps growing).
Typically, an actor doesn't do anything unless/until it receives a message.
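The mailbox idea above can be sketched with standard-library pieces only. This is a minimal illustration under stated assumptions, not how scala.actors is implemented: MiniActor, stopAndWait and demo are hypothetical names, each actor gets a dedicated thread, and (unlike the real library) unmatched messages are dropped rather than left in the mailbox.

```scala
import java.util.concurrent.{CountDownLatch, LinkedBlockingQueue}

// Hypothetical names throughout: MiniActor is NOT part of scala.actors.
object MailboxSketch {
  // An "actor": a handler, a mailbox, and one thread that drains the mailbox.
  class MiniActor(handler: PartialFunction[Any, Unit]) {
    private val mailbox = new LinkedBlockingQueue[Any]()
    private val done = new CountDownLatch(1)
    private object Stop                        // private shutdown marker

    private val worker = new Thread(new Runnable {
      def run(): Unit = {
        var running = true
        while (running) {
          mailbox.take() match {               // waits until a message arrives
            case Stop => running = false
            case msg if handler.isDefinedAt(msg) => handler(msg)
            case _ => ()                       // unmatched: dropped in this sketch
          }
        }
        done.countDown()
      }
    })
    worker.setDaemon(true)
    worker.start()

    def !(msg: Any): Unit = mailbox.put(msg)   // asynchronous send: returns at once
    def stopAndWait(): Unit = { mailbox.put(Stop); done.await() }
  }

  // Sends three messages and returns what the actor recorded, in order.
  def demo(): List[String] = {
    var log = List.empty[String]               // written only by the actor's thread
    val a = new MiniActor({
      case n: Int    => log = s"int $n" :: log
      case s: String => log = s"string $s" :: log
    })
    a ! 42
    a ! "hello"
    a ! 3.14                                   // no matching case
    a.stopAndWait()                            // all earlier messages are handled first
    log.reverse
  }
}
```

Because one thread takes messages off the queue one at a time, the handler never runs concurrently with itself, which is what lets it be written as simple sequential code.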
A really short example
    scala> import scala.actors.Actor._
    import scala.actors.Actor._

    scala> val me = self
    me: scala.actors.Actor = scala.actors.ActorProxy@6013a567
self is a method that returns the currently executing actor.
  Since we didn't call self from an actor, but just from a plain old Thread, it actually returns a proxy for the Thread.
    scala> me ! 42
Sending myself the message 42.
  Doesn't wait for an answer; just continues with the next code.
  Nothing is printed because the value of this expression is Unit.
    scala> receive { case x => println(x) }
    42
The pattern x is a simple variable, so it will match anything.
  The message is received and printed.
A longer example
    import scala.actors.Actor
    import scala.actors.Actor._

    object TGIF {
      val worker = actor {
        loop {
          receive {
            case "Friday" => println("Thank God it's Friday!")
            case "Saturday" => exit()
            case x => println("It's " + x + " and I'm working hard.")
          }
        }
      }

      def main(args: Array[String]): Unit = {
        val days = "Monday Tuesday Wednesday Thursday Friday Saturday Sunday"
        for (day <- days.split(" ")) worker ! day
      }
    }
Output:
    It's Monday and I'm working hard.
    It's Tuesday and I'm working hard.
    It's Wednesday and I'm working hard.
    It's Thursday and I'm working hard.
    Thank God it's Friday!
    Process .../bin/scala exited with code 0
Actor is a trait
A Scala trait is used like a Java interface.
You can extend only one class, but you can mix in any number of traits using with.
  Example: class Employee extends Person with Benefits
  Example: class Secretary extends Employee with Actor
However, if you don't explicitly extend a class, use extends for the first trait.
  Example: class Person extends Life with Liberty with Happiness
  I don't know the reasons for this rather strange exception.
A trait, like an interface, can require you to supply certain methods.
  In an Actor, you must provide def act = ...
Two ways to create an Actor
You can mix in the Actor trait.
  Example: class Secretary extends Employee with Actor
  Example: class Worker extends Actor
  Your class extends a class, but mixes in a trait using with.
    Exception: If you don't explicitly extend some class, you must use extends for the first trait.
    I have no clue what the reason is for this rule.
  A trait, like a Java interface, can require you to supply certain methods.
  The Actor trait requires you to define an act method (with no parameters).
You can use the actor factory method.
  Example: val myWorker = actor { ...code for the actor to execute... }
  The code is what you would otherwise put in the act method.
How to start an Actor
When you define a class that mixes in the Actor trait, you need to start each instance running explicitly.
  Example:
    class Worker extends Actor { ... }
    val worker1 = new Worker
    worker1.start()
When you use the actor factory method, the actor is started automatically.
An actor doesn't have to wait for messages before it starts doing work; you can write actors that already know what to do.
How to tell an Actor to do one thing
Here's an actor that does one thing, once:
    class Worker extends Actor {
      def act = receive {
        case true => println("I am with you 1000%.")
        case false => println("Absolutely not!")
        case _ => println("Well, it's complicated....")
      }
    }
    val worker = new Worker().start
    worker ! 43
Here's another:
    val worker = actor {
      receive {
        case true => println("I am with you 1000%.")
        case false => println("Absolutely not!")
        case _ => println("Well, it's complicated....")
      }
    }
    worker ! 43
How to tell an Actor to do several things
When an actor finishes its task, it quits.
To keep an actor going, put receive (or react) in a loop.
  Example:
    class Counter(id: Int) extends Actor {
      var yes, no = 0
      def act = loop {
        react {
          case true => yes += 1
          case false => no += 1
          case "printResults" =>
            printf("Counter #%d got %d yes, %d no.\n", id, yes, no)
          case x => println("Counter " + id + " didn't understand " + x)
        }
      }
    }
loop is a special kind of loop defined in the Actor object.
  There is also a loopWhile(condition) {...} method.
  Other kinds of loops (while, for) will work with receive, but not with react.
Sending and receiving messages
To send a message, use actor ! message
  The thread sending the message keeps going; it doesn't wait for a response.
To receive a message (in an Actor), use either receive {...} or react {...}
  Both receive and react wait until they get a message that they recognize (with case).
When receive finishes, it keeps its Thread.
  Statements following receive {...} will then be executed.
When react finishes, it returns its Thread to the thread pool.
  Statements following react {...} will not be executed.
  The Actor's call stack will not be retained.
  This (usually) makes react more efficient than receive.
Hence: Prefer react to receive, but be aware of its limitations.
Waiting for a message that never comes
If a (recognized) message never arrives, receive and react will wait "forever".
  This is especially likely when waiting for a response from another computer.
  Even on the same computer, the sending process may have crashed.
Two additional methods, receiveWithin(ms: Long) {...} and reactWithin(ms: Long) {...}, time out after the given number of milliseconds if no matching message is received.
  On a timeout the actor is handed the special message TIMEOUT, which you can handle with case TIMEOUT => ...
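The timeout idea can be sketched without the actor library, using a plain blocking queue as the mailbox. This is only an illustration of "wait for a message, but give up after a while"; timedReceive and demo are hypothetical names, and a real actor would use the library's own timed receive instead.

```scala
import java.util.concurrent.{LinkedBlockingQueue, TimeUnit}

object TimeoutSketch {
  // Hypothetical helper, not part of scala.actors: wait up to `ms` milliseconds
  // for the next message; None means the wait timed out.
  def timedReceive[A <: AnyRef](mailbox: LinkedBlockingQueue[A], ms: Long): Option[A] =
    Option(mailbox.poll(ms, TimeUnit.MILLISECONDS))  // poll returns null on timeout

  def demo(): (Option[String], Option[String]) = {
    val mailbox = new LinkedBlockingQueue[String]()
    val first = timedReceive(mailbox, 50)   // nothing sent yet, so this times out
    mailbox.put("pong")
    val second = timedReceive(mailbox, 50)  // a message is already waiting
    (first, second)
  }
}
```

Returning an Option forces the caller to decide what to do when the message never comes, which is the point of having a timeout at all.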
Getting a result back from an Actor
An Actor does not "return" a result, but you can ask it to send a result.
    import scala.actors.Actor
    import Actor._

    object SimpleActorTest {
      def main(args: Array[String]): Unit = {
        val caller = self
        val adder = actor {
          var sum = 0
          loop {
            receive {
              case (x: Int, y: Int) => sum = x + y
              case "sum" => caller ! sum
            }
          }
        }
        adder ! (2, 2)
        adder ! "sum" // This must be done before calling receive!
        receive {
          case x => println("I got: " + x)
        }
      }
    }
Output:
    I got: 4
Actors and shared state
There's nothing to prevent Actors from sharing state.
    import scala.actors.Actor
    import Actor._

    object SimpleActorTest {
      def main(args: Array[String]): Unit = {
        var sum = 0 // this variable is modified by the actor
        val adder = actor {
          loop {
            receive {
              case (x: Int, y: Int) => sum = x + y // updating sum
            }
          }
        }
        adder ! (2, 2)
        println("I got: " + sum)
      }
    }
Output:
    I got: 0
But it's not a good idea! Here the main thread most likely prints sum before the actor has processed the message, so the update is not seen.
Counting true/false values: Outline
    import scala.actors.Actor

    object ActorTest {
      def main(args: Array[String]): Unit = {
        // Create and start some actors
        // Send votes to the actors
        // Tell the actors to quit
      }
    }

    class Counter(id: Int) extends Actor {
      def act = loop {
        react {
          // Handle each case
        }
      }
    }
The main method
    def main(args: Array[String]): Unit = {
      // Create and start some actors
      val actors = (1 to 5) map (new Counter(_))
      for (actor <- actors) { actor.start }

      // Send votes to the actors (1000 votes each)
      val random = new scala.util.Random
      for (i <- 1 to 5000) {
        actors(i % actors.length) ! random.nextBoolean
      }

      // Tell the actors to quit
      actors foreach (_ ! "quit")
    }
The Counter class
    class Counter(id: Int) extends Actor {
      var yes, no = 0
      def act = loop {
        react {
          case true => yes += 1
          case false => no += 1
          case "quit" =>
            printf("Counter #%d got %d yes, %d no.\n", id, yes, no)
          case x => println("Counter " + id + " didn't understand " + x)
        }
      }
    }
The same program, all on one slide
    import scala.actors.Actor

    object ActorTest {
      def main(args: Array[String]): Unit = {
        // Create and start some actors
        val actors = (1 to 5) map (new Counter(_))
        for (actor <- actors) { actor.start }

        // Send votes to the actors (1000 votes each)
        val random = new scala.util.Random
        for (i <- 1 to 5000) {
          actors(i % actors.length) ! random.nextBoolean
        }

        // Tell the actors to quit
        actors foreach (_ ! "quit")
      }
    }

    class Counter(id: Int) extends Actor {
      var yes, no = 0
      def act = loop {
        react {
          case true => yes += 1
          case false => no += 1
          case "quit" =>
            printf("Counter #%d got %d yes, %d no.\n", id, yes, no)
          case x => println("Counter " + id + " didn't understand " + x)
        }
      }
    }
Typical results
    Counter #4 got 509 yes, 491 no.
    Counter #1 got 468 yes, 532 no.
    Counter #2 got 492 yes, 508 no.
    Counter #3 got 501 yes, 499 no.
    Counter #5 got 499 yes, 501 no.
Counting 3s
In Principles of Parallel Programming, Lin and Snyder use the example of counting how many times the number 3 occurs in a large array.
This can be done by creating a number of actors, each of which counts 3s in part of the array.
The partial counts are then added to get the total count.
My version, with timing information, starts out like this:
    import scala.actors.Actor
    import scala.actors.Actor._

    object Count3s {
      val random = new java.util.Random() // to make up data
      val numberOfActors = 4 // because I have a quad-core machine
Main method
    def main(args: Array[String]): Unit = {
      val Size = 1000000
      var seqCount, conCount = 0
      val array = new Array[Int](Size)
      for (i <- 0 until Size) { array(i) = 1 + random.nextInt(3) }

      var startTime = System.currentTimeMillis
      for (runs <- 1 to 1000) seqCount = count3sSequentially(array)
      var finishTime = System.currentTimeMillis
      printf("%5d ms. to find %d threes\n", finishTime - startTime, seqCount)

      startTime = System.currentTimeMillis
      for (runs <- 1 to 1000) conCount = count3sConcurrently(array)
      finishTime = System.currentTimeMillis
      printf("%5d ms. to find %d threes\n", finishTime - startTime, conCount)
    }
We go through the million-element array 1000 times, so that the program runs long enough to give more accurate timings.
count3sSequentially
    def count3sSequentially(array: Array[Int]) = {
      var count = 0
      for (n <- array; if n == 3) count += 1
      count
    }
In the next slide, the segment method is used to determine a range of indices ("bottom" to "top") for each actor to work on.
count3sConcurrently
    def count3sConcurrently(array: Array[Int]) = {
      val caller = self

      for ((bottom, top) <- segment(array.length, numberOfActors)) {
        val counter = actor {
          // These actors just start; no need to wait for a message
          var count = 0
          for (i <- bottom to top; if array(i) == 3) count += 1
          caller ! count
        }
      }

      var total = 0
      // Get a number from each and every actor before continuing
      for (i <- 1 to numberOfActors) {
        receive {
          case n: Int => total += n
          case _ =>
        }
      }
      total
    }
The segment method
The segment method breaks an array of n locations into k approximately equal parts.
  Example: segment(1000, 3) returns Vector((0,333), (334,667), (668,999))
This is just routine programming, but I present it here because it's surprisingly difficult to get right.
    def segment(problemSize: Int, numberOfSegments: Int) = {
      val segmentSize = (problemSize + 1).toDouble / numberOfSegments
      def intCeil(d: Double) = d.ceil.toInt
      for {
        i <- 0 until numberOfSegments
        bottom = intCeil(i * segmentSize)
        top = intCeil((i + 1) * segmentSize - 1) min (problemSize - 1)
      } yield (bottom, top)
    }
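One way to sidestep the floating-point rounding is to compute the boundaries in integer arithmetic. The sketch below is a hypothetical alternative (segmentInt is not from the original program); it produces slightly different boundaries than segment, but the ranges are still contiguous, non-overlapping, and cover every index.

```scala
object SegmentSketch {
  // Split indices 0 until n into k contiguous (bottom, top) ranges, inclusive.
  // Hypothetical alternative to segment; boundaries differ slightly from it.
  def segmentInt(n: Int, k: Int): IndexedSeq[(Int, Int)] =
    for (i <- 0 until k) yield {
      val bottom = (i.toLong * n / k).toInt        // Long avoids Int overflow
      val top = ((i + 1).toLong * n / k).toInt - 1 // one below the next bottom
      (bottom, top)
    }
}
```

For example, SegmentSketch.segmentInt(1000, 3) gives (0,332), (333,665), (666,999). Each top is by construction one less than the next bottom, so there are no gaps or overlaps to check by hand.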
Typical results
Typical results (sequential run first, then concurrent):
    11075 ms. to find 333469 threes
     9146 ms. to find 333469 threes
This is about a 21% speedup.
I have four cores! Where's my 400% speedup?!
[Figure: CPU monitor. Running sequentially: one core maxed out. Running concurrently: four cores almost maxed out.]
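The 21% figure comes from dividing the sequential time by the concurrent time. A small sketch of that arithmetic, with hypothetical helper names (speedup and efficiency are not part of the original program):

```scala
object SpeedupSketch {
  // Speedup = sequential time / concurrent time (1.0 means no gain).
  def speedup(seqMs: Long, conMs: Long): Double = seqMs.toDouble / conMs

  // Efficiency = speedup per core (1.0 would be a perfect use of every core).
  def efficiency(seqMs: Long, conMs: Long, cores: Int): Double =
    speedup(seqMs, conMs) / cores
}
```

With the timings above, speedup(11075, 9146) is about 1.21, and efficiency(11075, 9146, 4) is about 0.30: the four cores together deliver roughly 30% of the ideal fourfold gain.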
Analysis of results
Almost all of the lack of speedup is due to threading overhead.
A small part of the problem is having to index explicitly into the array:
    for (i <- bottom to top; if array(i) == 3) count += 1
instead of the more efficient
    for (n <- array; if n == 3) count += 1
In this program, the amount of non-concurrent code is probably not a significant factor.
Concurrency does work to speed up programs (on a multicore machine), but don't expect great benefits.
Minimizing Thread creation
I have an array of one million items, and I count the threes in it one thousand times.
Each time I did the counting, I created four Actors (Threads), which were subsequently discarded.
What if I created four Actors once, and reused them, thus saving 3996 Thread creations/destructions?
I won't show you the code.
  It takes a significant rewrite, not a minor revision, to do it with reusable Actors.
Typical results:
    11054 ms. to find 333253 threes
     8888 ms. to find 333253 threes
We've gone from a 21% speedup to a 24% speedup.
  That's not a lot, but it's pretty consistent.
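The reuse idea can be illustrated with a different technique: a fixed thread pool from java.util.concurrent rather than reusable Actors. This is not the rewrite described above, only a sketch of "create the workers once, hand them new work each run"; PoolSketch, countRange and countRepeatedly are hypothetical names, and no timing claims are made for it.

```scala
import java.util.concurrent.{Callable, Executors}

object PoolSketch {
  // Count the 3s in array(bottom) .. array(top), inclusive.
  def countRange(array: Array[Int], bottom: Int, top: Int): Int = {
    var count = 0
    var i = bottom
    while (i <= top) { if (array(i) == 3) count += 1; i += 1 }
    count
  }

  // Count the 3s `runs` times, reusing one pool of `workers` threads.
  // Returns the count from the final run.
  def countRepeatedly(array: Array[Int], workers: Int, runs: Int): Int = {
    val pool = Executors.newFixedThreadPool(workers)  // threads created once
    try {
      var last = 0
      for (_ <- 1 to runs) {
        // Submit one task per segment; the pool's threads pick them up.
        val futures = for (w <- 0 until workers) yield {
          val bottom = (w.toLong * array.length / workers).toInt
          val top = ((w + 1).toLong * array.length / workers).toInt - 1
          pool.submit(new Callable[Int] {
            def call(): Int = countRange(array, bottom, top)
          })
        }
        last = futures.map(_.get()).sum               // wait for every partial count
      }
      last
    } finally pool.shutdown()
  }
}
```

Each task returns its partial count as a result instead of updating a shared variable, so the only synchronization is waiting on the futures.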
Using conventional shared state
    var sharedTotal = 0
    def adder(n: Int) = synchronized { sharedTotal += n }

    def count3sConcurrently(array: Array[Int]): Int = {
      var counters = List[Counter]()
      for ((bottom, top) <- segment(array.length, numberOfActors)) {
        counters = new Counter(bottom, top) :: counters
      }
      for (counter <- counters) { counter.start }
      for (counter <- counters) { counter.join }
      sharedTotal
    }

    class Counter(val bottom: Int, val top: Int) extends Thread {
      override def run = {
        var count = 0
        for (i <- bottom to top; if array(i) == 3) count += 1
        adder(count)
      }
    }
Typical result:
    9247 ms. to find 333304000 threes
Essentially the same time as my first attempt.
Has a trivial bug (left as an exercise for the reader).
Doing it right
Martin Odersky gives some rules for using Actors effectively:
Actors should not block while processing a message.
Communicate with actors only via messages.
  Scala does not prevent actors from sharing state, so it's (unfortunately) very easy to do.
Prefer immutable messages.
  Mutable data in a message is shared state.
Make messages self-contained.
  When you get a response from an actor, it may not be obvious what is being responded to.
  If the request is immutable, it's very inexpensive to include the request as part of the response.
  The use of case classes often makes messages more readable.
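The last two rules can be sketched with case classes. The message types below are hypothetical (Add, Sum, handle and describe do not appear in the programs above), and the handler is an ordinary function standing in for the body of an actor's receive or react.

```scala
object MessageSketch {
  // Hypothetical message types; case classes are immutable by default.
  sealed trait Message
  case class Add(x: Int, y: Int) extends Message            // the request
  case class Sum(request: Add, value: Int) extends Message  // response carries its request

  // What an adder actor's handler might compute for each incoming message.
  def handle(msg: Any): Option[Message] = msg match {
    case req @ Add(x, y) => Some(Sum(req, x + y))
    case _               => None                            // not understood
  }

  // The receiver can tell what a response answers without extra bookkeeping.
  def describe(msg: Message): String = msg match {
    case Sum(Add(x, y), value) => s"$x + $y = $value"
    case Add(x, y)             => s"please add $x and $y"
  }
}
```

Because Add is immutable, putting it inside Sum costs only a reference, and the pattern Sum(Add(x, y), value) reads much like the message it matches.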
The End