Pattern Programming with the Seeds Framework
© 2013 B. Wilkinson/Clayton Ferner. SIGCSE 2013 Workshop 31. intro.ppt. Modification date: Feb 17,
Basic User Programmer Interface

To create and execute parallel programs, the programmer selects a pattern and implements three principal Java methods within a “Module” class:

    Diffuse method – distributes pieces of data.
    Compute method – performs the actual computation.
    Gather method – gathers the results.

The programmer also has to fill in details in a “Run module” bootstrap class to deploy and start the framework.

The framework self-deploys on a geographically distributed platform and executes the pattern.
Module class

    public Data DiffuseData (int segment) {
        DataMap d = new DataMap();
        inputData = …
        d.put("name_of_inputdata", inputData);
        return d;
    }

    public Data Compute (Data data) {
        DataMap input = (DataMap) data;     // data produced by DiffuseData()
        DataMap output = new DataMap();     // output returned to GatherData()
        inputData = input.get("name_of_inputdata");
        …                                   // computation
        output.put("name_of_results", results);   // to return to GatherData()
        return output;
    }

    public void GatherData (int segment, Data dat) {
        DataMap out = (DataMap) dat;
        outdata = out.get("name_of_results");
        result …    // aggregate outdata from all the worker nodes; result is a private variable
    }

Notes:

    segment is used by the framework to keep track of where to put results.
    GatherData is given back a Data object with a segment number.
    Data is cast into a DataMap.
Seeds Workpool: DiffuseData, Compute, and GatherData Methods

[Figure: data flow between master and slaves]

    Master, DiffuseData: creates DataMap d and returns d to each slave.
    Slaves, Compute: receives DataMap input (the DataMap d created in diffuse) and returns DataMap output (created in compute).
    Master, GatherData: receives DataMap output and accumulates it into the private variable total (the answer).

Note: the DiffuseData, Compute, and GatherData methods start with a capital letter, although by Java convention method names should not.
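The master/slave flow above can be imitated in plain Java without the framework. The following is only a sketch under stated assumptions: `WorkpoolFlowSketch` and its lower-case methods are hypothetical stand-ins, a `HashMap` takes the place of `DataMap`, and a sequential loop plays the role that Seeds would play when it distributes job units to workers.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a workpool module; not the Seeds API.
class WorkpoolFlowSketch {
    private long total = 0;        // private result variable, updated only in gatherData
    private final int dataCount;   // number of job units (segments)

    WorkpoolFlowSketch(int dataCount) {
        this.dataCount = dataCount;
    }

    // Master side: build the input for one segment.
    Map<String, Object> diffuseData(int segment) {
        Map<String, Object> d = new HashMap<>();
        d.put("value", (long) segment + 1);   // segment 0 carries 1, segment 1 carries 2, ...
        return d;
    }

    // Worker side: consume the input map and produce an output map.
    Map<String, Object> compute(Map<String, Object> input) {
        long value = (Long) input.get("value");
        Map<String, Object> output = new HashMap<>();
        output.put("square", value * value);
        return output;
    }

    // Master side: fold one worker result into the running total.
    void gatherData(int segment, Map<String, Object> output) {
        total += (Long) output.get("square");
    }

    // Sequential driver standing in for the framework.
    long run() {
        for (int segment = 0; segment < dataCount; segment++) {
            gatherData(segment, compute(diffuseData(segment)));
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(new WorkpoolFlowSketch(4).run());   // 1 + 4 + 9 + 16 = 30
    }
}
```

Only the gather step touches the shared total, which is why the three methods need no explicit synchronization or message passing of their own in this sketch.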
DataMap methods

    put(String, data) – puts data into the DataMap, identified by a string
    get(String) – gets stored data identified by a string

DataMap extends Java HashMap, which implements Map; see the Data and DataMap classes.

For implementation convenience there are two classes:

    Data class – used to pass data between master and slaves. Uses a “segment” number to keep track of packets as they go from one method to another.
    DataMap class – used inside the compute method. DataMap is a subclass of Data and so allows casting.
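The put/get and casting behavior described above can be tried out with small stand-in types. This is a sketch, not the library's code: `SketchData`, `SketchDataMap`, and `DataMapCastSketch` are hypothetical names, and modeling the general type as an interface is an assumption made so that the map type can also extend `HashMap`.

```java
import java.util.HashMap;

// Illustrative only; the real Data and DataMap classes belong to the Seeds library.
class DataMapCastSketch {
    // Accepts the general type, as Compute and GatherData do, then casts down.
    static long readSeed(SketchData data) {
        SketchDataMap map = (SketchDataMap) data;   // valid because the object is a SketchDataMap
        return (Long) map.get("seed");              // get returns Object, so a cast is needed
    }

    public static void main(String[] args) {
        SketchDataMap d = new SketchDataMap(3);
        d.put("seed", 12345L);
        System.out.println(readSeed(d) + " from segment " + d.getSegment());
    }
}

// Hypothetical general type that carries a segment number.
interface SketchData {
    int getSegment();
}

// Hypothetical map type: a HashMap that is also a SketchData.
class SketchDataMap extends HashMap<String, Object> implements SketchData {
    private static final long serialVersionUID = 1L;
    private final int segment;

    SketchDataMap(int segment) {
        this.segment = segment;
    }

    @Override
    public int getSegment() {
        return segment;
    }
}
```

Because values are stored as `Object`, each `get` needs a cast to the type that was put in, which matches the `(Long) input.get("seed")` style used in the module code.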
Bootstrap class

    package edu.uncc.grid.example.workpool;

    import java.io.IOException;
    import net.jxta.pipe.PipeID;
    import edu.uncc.grid.pgaf.Anchor;
    import edu.uncc.grid.pgaf.Operand;
    import edu.uncc.grid.pgaf.Seeds;
    import edu.uncc.grid.pgaf.p2p.Types;

    public class RunMonteCarloPiModule {
        public static void main(String[] args) {
            try {
                MyModule pi = new MyModule();    // MyModule is the name of the module class
                Seeds.start("/path/to/seeds/seed/folder", false);
                PipeID id = Seeds.startPattern(
                    new Operand(
                        (String[]) null,
                        new Anchor("hostname", Types.DataFlowRoll.SINK_SOURCE),
                        pi));
                System.out.println(id.toString());
                Seeds.waitOnPattern(id);
                System.out.println("The result is: " + pi.getPi());
                Seeds.stop();
            } catch (SecurityException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

This code deploys the framework and starts execution of the pattern. Different patterns have similar code.
Framework methods used in the bootstrap class

Seeds methods:

    start           // starts framework, deploys nodes on list of servers
    startPattern    // starts Seeds pattern
    waitOnPattern   // waits for pattern to complete
    stop            // stops framework
User methods used in the bootstrap class

Additional methods can be specified by the programmer in the Workpool class and invoked in the bootstrap class. Typically a method is invoked that produces the final result.

Example:

    public double getPi() {    // returns value of pi based on all workers
        double pi = (total / (random_samples * DoubleDataSize)) * 4;
        return pi;
    }
Seeds Implementations

    JXTA P2P networking version: suitable for a fully distributed network of computers; requires an Internet connection even when running on just a single computer.
    “No Network” JXTA P2P version: for running on a single computer; does not require an Internet connection.
    Multicore version: implemented with threads for more efficient execution on a single multicore computer or shared-memory multiprocessor system; does not require an Internet connection.

The two JXTA versions use the same application code and run in a similar fashion. The multicore version uses the same Module code but slightly different bootstrap (Run Module) source code (see next).
Multicore version of Seeds

    Thread-based version using shared memory.
    Faster than the JXTA P2P version on a multicore platform.
    Bootstrap class does not need to start and stop JXTA P2P: Seeds.start() and Seeds.stop() are not needed.
    Module class unchanged.

    public class RunMonteCarloPiModule {
        public static void main(String[] args) {
            try {
                MyModule pi = new MyModule();
                Thread id = Seeds.startPatternMulticore(
                    new Operand(
                        (String[]) null,
                        new Anchor(args[0], Types.DataFlowRole.SINK_SOURCE),
                        pi),
                    4);
                id.join();
                System.out.println("The result is: " + pi.getPi());
            } catch (SecurityException e) { … }
        }
    }
Workpool Pattern

1. Embarrassingly Parallel Computation: Monte Carlo
Monte Carlo Methods

A so-called “embarrassingly parallel” computation, as it decomposes into obviously independent tasks that can be done in parallel without any inter-task communication during the computation.

Monte Carlo methods use random selections. (When parallelizing Monte Carlo code, one must address the best way to generate random numbers in parallel.)
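One common way to handle random numbers in parallel is to give every task its own generator, seeded from a single master generator, so that no generator is shared between threads. The sketch below illustrates that idea with a JDK thread pool; `PerTaskSeedSketch` and its methods are illustrative names, not part of Seeds, and the approach is one option among several rather than a recommended best practice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: class and method names are not part of Seeds.
class PerTaskSeedSketch {
    // Counts random points inside the unit quarter circle using a task-private generator.
    static long countInside(long seed, int samples) {
        Random r = new Random(seed);   // one generator per task, never shared
        long inside = 0;
        for (int i = 0; i < samples; i++) {
            double x = r.nextDouble();
            double y = r.nextDouble();
            if (x * x + y * y <= 1.0) {
                inside++;
            }
        }
        return inside;
    }

    // Runs `tasks` independent jobs on a thread pool and sums their counts.
    static long runParallel(long masterSeed, int tasks, int samplesPerTask, int threads) {
        Random master = new Random(masterSeed);   // draws one seed per task, on one thread
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int t = 0; t < tasks; t++) {
                final long seed = master.nextLong();
                Callable<Long> job = () -> countInside(seed, samplesPerTask);
                futures.add(pool.submit(job));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get();   // waits for the task and adds its count
            }
            return total;
        } catch (Exception e) {
            throw new IllegalStateException("worker task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long a = runParallel(42L, 8, 10_000, 4);
        long b = runParallel(42L, 8, 10_000, 1);
        System.out.println(a == b);   // same seeds give the same total regardless of thread count
    }
}
```

Seeding separate generators this way avoids contention and makes runs repeatable, but it does not by itself guarantee that the per-task streams are statistically independent.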
Circle formed within a 2 x 2 square. Ratio of area of circle to area of square given by:

$\frac{\text{Area of circle}}{\text{Area of square}} = \frac{\pi (1)^2}{2 \times 2} = \frac{\pi}{4}$

Points within the square are chosen randomly. A score is kept of how many points happen to lie within the circle. The fraction of points within the circle will be approximately $\pi/4$, given a sufficient number of randomly selected samples.
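One quadrant can be described by the integral:

$\int_0^1 \sqrt{1 - x^2}\,dx = \frac{\pi}{4}$

Random pairs of numbers, $(x_r, y_r)$, are generated, each between 0 and 1. A pair is counted as in the circle if $y_r \le \sqrt{1 - x_r^2}$, that is, if $x_r^2 + y_r^2 \le 1$.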
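Before moving to the Seeds version, the quadrant method can be checked with a short sequential program: generate pairs in the unit square, count those with $x_r^2 + y_r^2 \le 1$, and multiply the fraction by 4. This is a minimal sketch; the class name, method name, sample count, and seed are arbitrary choices for illustration.

```java
import java.util.Random;

// Sequential sketch of the quadrant method; names are illustrative.
class QuadrantPiSketch {
    static double estimatePi(int samples, long seed) {
        Random r = new Random(seed);
        long inside = 0;
        for (int i = 0; i < samples; i++) {
            double x = r.nextDouble();    // uniform in [0, 1)
            double y = r.nextDouble();
            if (x * x + y * y <= 1.0) {   // point lies within the quarter circle
                inside++;
            }
        }
        return 4.0 * inside / samples;    // fraction inside approximates pi/4
    }

    public static void main(String[] args) {
        System.out.println(estimatePi(1_000_000, 1L));
    }
}
```

The error of such an estimate shrinks only with the square root of the number of samples, which is what makes spreading the samples over many workers attractive.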
Seeds Monte Carlo code: MonteCarloPiModule.java

DiffuseData Method (Required to be implemented)

    public Data DiffuseData (int segment) {
        DataMap d = new DataMap();
        d.put("seed", R.nextLong());
        return d;    // returns a random seed for each job unit
    }
Compute Method (Required to be implemented)

    public Data Compute (Data data) {
        DataMap input = (DataMap) data;
        DataMap output = new DataMap();
        Long seed = (Long) input.get("seed");    // get random seed
        Random r = new Random();
        r.setSeed(seed);
        Long inside = 0L;
        for (int i = 0; i < DoubleDataSize; i++) {
            double x = r.nextDouble();
            double y = r.nextDouble();
            double dist = x * x + y * y;
            if (dist <= 1.0) {
                ++inside;
            }
        }
        output.put("inside", inside);    // to return to GatherData()
        return output;
    }
GatherData Method (Required to be implemented)

    public void GatherData (int segment, Data dat) {
        DataMap out = (DataMap) dat;
        Long inside = (Long) out.get("inside");
        total += inside;    // aggregate answer from all the worker nodes
    }
getDataCount Method (Required to be implemented)

    public int getDataCount() {
        return random_samples;
    }
Method to compute result (used in bootstrap module)

    public double getPi() {    // returns value of pi based on all workers
        double pi = (total / (random_samples * DoubleDataSize)) * 4;
        return pi;
    }
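The arithmetic in getPi() is the total number of hits divided by the total number of samples (job units times samples per job unit), times 4. In the slide code the product `random_samples * DoubleDataSize` is an int multiplication, which is fine for 3000 x 1000 but would overflow if the product exceeded the int range. The sketch below shows the same formula with the product widened to long first; the class name, parameter names, and the hit count in `main` are made up for illustration.

```java
// Illustrative helper; the parameters mirror the quantities used by getPi().
class PiFromCountsSketch {
    static double piFromCounts(long totalInside, int jobUnits, int samplesPerUnit) {
        long samples = (long) jobUnits * samplesPerUnit;   // widen first so the product cannot overflow int
        return 4.0 * totalInside / samples;
    }

    public static void main(String[] args) {
        // Hypothetical hit count: 2,356,194 hits out of 3000 * 1000 samples.
        System.out.println(piFromCounts(2_356_194L, 3000, 1000));
    }
}
```

With these made-up numbers the estimate is 4 x 2,356,194 / 3,000,000, about 3.1416.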
Complete Monte Carlo pi code

    package edu.uncc.grid.example.workpool;

    import java.util.Random;
    import java.util.logging.Level;
    import edu.uncc.grid.pgaf.datamodules.Data;
    import edu.uncc.grid.pgaf.datamodules.DataMap;
    import edu.uncc.grid.pgaf.interfaces.basic.Workpool;
    import edu.uncc.grid.pgaf.p2p.Node;

    public class MonteCarloPiModule extends Workpool {
        private static final long serialVersionUID = 1L;
        private static final int DoubleDataSize = 1000;
        double total;
        int random_samples;
        Random R;

        public MonteCarloPiModule() {
            R = new Random();
        }

        public void initializeModule(String[] args) {
            total = 0;
            Node.getLog().setLevel(Level.WARNING);    // reduce verbosity for logging
            random_samples = 3000;                    // set number of random samples
        }

        // Computation
        public Data Compute (Data data) {
            // input gets the data produced by DiffuseData()
            DataMap input = (DataMap) data;
            // output will emit the partial answers done by this method
            DataMap output = new DataMap();
            Long seed = (Long) input.get("seed");    // get random seed
            Random r = new Random();
            r.setSeed(seed);
            Long inside = 0L;
            for (int i = 0; i < DoubleDataSize; i++) {
                double x = r.nextDouble();
                double y = r.nextDouble();
                double dist = x * x + y * y;
                if (dist <= 1.0) {
                    ++inside;
                }
            }
            output.put("inside", inside);    // store partial answer to return to GatherData()
            return output;
        }

        public Data DiffuseData (int segment) {
            DataMap d = new DataMap();
            d.put("seed", R.nextLong());
            return d;    // returns a random seed for each job unit
        }

        public void GatherData (int segment, Data dat) {
            DataMap out = (DataMap) dat;
            Long inside = (Long) out.get("inside");
            total += inside;    // aggregate answer from all the worker nodes
        }

        public double getPi() {
            // returns value of pi based on the job done by all the workers
            double pi = (total / (random_samples * DoubleDataSize)) * 4;
            return pi;
        }

        public int getDataCount() {
            return random_samples;
        }
    }

Note: No explicit message passing.
Bootstrap class RunMonteCarloPiModule.java

Deploys framework and runs code (JXTA P2P version)

    package edu.uncc.grid.example.workpool;

    import java.io.IOException;
    import net.jxta.pipe.PipeID;
    import edu.uncc.grid.pgaf.Anchor;
    import edu.uncc.grid.pgaf.Operand;
    import edu.uncc.grid.pgaf.Seeds;
    import edu.uncc.grid.pgaf.p2p.Types;

    public class RunMonteCarloPiModule {
        public static void main(String[] args) {
            try {
                MonteCarloPiModule pi = new MonteCarloPiModule();
                Seeds.start(args[0], false);
                PipeID id = Seeds.startPattern(
                    new Operand(
                        (String[]) null,
                        new Anchor(args[1], Types.DataFlowRoll.SINK_SOURCE),
                        pi));
                System.out.println(id.toString());
                Seeds.waitOnPattern(id);
                System.out.println("The result is: " + pi.getPi());
                Seeds.stop();
            } catch (SecurityException e) { … }
        }
    }
Multicore version of Seeds

    public class RunMonteCarloPiModule {
        public static void main(String[] args) {
            try {
                MonteCarloPiModule pi = new MonteCarloPiModule();
                Thread id = Seeds.startPatternMulticore(
                    new Operand(
                        (String[]) null,
                        new Anchor(args[0], Types.DataFlowRole.SINK_SOURCE),
                        pi),
                    4);
                id.join();
                System.out.println("The result is: " + pi.getPi());
            } catch (SecurityException e) { … }
        }
    }
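In the multicore bootstrap code the result is read only after `id.join()` returns. The sketch below shows the same start/join/read sequence with a plain `java.lang.Thread` standing in for the thread that the framework returns; `JoinBeforeReadSketch` is a hypothetical class and the summation is just placeholder work.

```java
// Illustrative only: a plain Thread stands in for the one returned by the framework.
class JoinBeforeReadSketch {
    private long total = 0;   // written by the worker thread, read by the caller after join()

    long runAndWait(int n) {
        Thread worker = new Thread(() -> {
            long sum = 0;
            for (int i = 1; i <= n; i++) {
                sum += i;
            }
            total = sum;
        });
        worker.start();
        try {
            worker.join();    // blocks until the worker finishes; its writes are then visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(new JoinBeforeReadSketch().runAndWait(100));   // 5050
    }
}
```

Reading the field before `join()` returns could observe a partial or stale value, which is why the result is printed only after the join.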
Measuring Time

Can instrument code in the bootstrap class:

    public class RunMyModule {
        public static void main (String[] args) {
            try {
                long start = System.currentTimeMillis();
                MyModule m = new MyModule();
                Seeds.start( … );
                PipeID id = Seeds.startPattern( … );
                Seeds.waitOnPattern(id);
                Seeds.stop();
                long stop = System.currentTimeMillis();
                double time = (double) (stop - start) / 1000.0;    // milliseconds to seconds
                System.out.println("Execution time = " + time);
            } catch (SecurityException e) { …
            …
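The same start/stop instrumentation can be wrapped in a small helper. The sketch below uses `System.nanoTime()`, which the JDK provides for measuring elapsed time, instead of `System.currentTimeMillis()`; `TimingSketch`, `timeSeconds`, and `sumOfRoots` are illustrative names, and the workload is a placeholder for the code being timed.

```java
// Illustrative timing helper; the class and method names are not part of Seeds.
class TimingSketch {
    // Returns the elapsed time of `work` in seconds.
    static double timeSeconds(Runnable work) {
        long start = System.nanoTime();
        work.run();
        long stop = System.nanoTime();
        return (stop - start) / 1e9;   // nanoseconds to seconds
    }

    // Stand-in workload so that there is something to time.
    static double sumOfRoots(int n) {
        double acc = 0;
        for (int i = 1; i <= n; i++) {
            acc += Math.sqrt(i);
        }
        return acc;
    }

    public static void main(String[] args) {
        double t = timeSeconds(() -> System.out.println(sumOfRoots(5_000_000)));
        System.out.println("Execution time = " + t + " s");
    }
}
```

A single measurement like this includes JVM warm-up, and in the bootstrap class it also includes framework start-up and shutdown, so comparisons between versions are more meaningful when averaged over several runs.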
Compiling/executing

Can be done on the command line (ant script provided) or through an IDE (Eclipse).
Now to try the code. Turn to “Session 1 Hands-on notes”. You first need to transfer the Seeds software and projects from a flash drive to your PC.