All-to-All Pattern
A pattern where all (slave) processes can communicate with each other.
Somewhat the worst-case scenario!
ITCS 4/5145 Parallel Computing, UNC-Charlotte, B. Wilkinson, 2012. slides3b.ppt June 25, 2012
All-to-All communication
Some problems require this. Examples:
• N-body problem
• Solving a dense system of linear equations
Gravitational N-Body Problem
Finding positions and movements of bodies in space subject to gravitational forces from other bodies, using Newtonian laws of physics.
Equations:
Gravitational force between two bodies of masses $m_a$ and $m_b$ is:
$$F = \frac{G m_a m_b}{r^2}$$
where $G$ is the gravitational constant and $r$ is the distance between the bodies.
Subject to forces, a body accelerates according to Newton's 2nd law:
$$F = ma$$
where $m$ is the mass of the body, $F$ the force it experiences, and $a$ the resultant acceleration.
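As a minimal sketch of the force law above, the magnitude of the force and its x and y components in two dimensions could be computed as follows. The class name `Gravity` and its method names are assumptions made for this illustration; they are not part of the course code.

```java
// Illustrative sketch: Newtonian gravity between two point masses in 2-D.
// The class and method names are assumptions made for this example.
class Gravity {
    static final double G = 6.674e-11; // gravitational constant, N m^2 / kg^2

    // Magnitude of the force between masses ma and mb a distance r apart.
    static double magnitude(double ma, double mb, double r) {
        return G * ma * mb / (r * r);
    }

    // Force vector {Fx, Fy} acting on body a (at xa, ya) due to body b (at xb, yb).
    // Assumes the two bodies are not at the same position (r > 0).
    static double[] forceOn(double ma, double xa, double ya,
                            double mb, double xb, double yb) {
        double dx = xb - xa;
        double dy = yb - ya;
        double r = Math.sqrt(dx * dx + dy * dy);
        double f = magnitude(ma, mb, r);
        // Scale the unit vector from a toward b by the force magnitude.
        return new double[] { f * dx / r, f * dy / r };
    }

    public static void main(String[] args) {
        double[] f = forceOn(5.0, 0, 0, 10.0, 3, 4);
        System.out.println("Fx = " + f[0] + ", Fy = " + f[1]);
    }
}
```

The force on body b due to body a is the same vector with both signs reversed.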
Details
Let the time interval be $\Delta t$. For a body of mass $m$, the force is:
$$F = \frac{m\,(v^{t+1} - v^{t})}{\Delta t}$$
New velocity is:
$$v^{t+1} = v^{t} + \frac{F\,\Delta t}{m}$$
where $v^{t+1}$ is the velocity at time $t + 1$ and $v^{t}$ is the velocity at time $t$.
Over time interval $\Delta t$, position changes by
$$x^{t+1} - x^{t} = v\,\Delta t$$
where $x^{t}$ is its position at time $t$.
Once bodies move to new positions, forces change. Computation has to be repeated.
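A minimal sketch of one time step for a single body in one dimension, following the velocity and position updates described above. The class name `EulerStep` is hypothetical, and using the newly computed velocity for the position update is an assumption of this sketch (the old velocity is an equally simple choice).

```java
// Illustrative sketch: advance one body by one time interval dt in one dimension.
// Names are assumptions for this example, not taken from the course code.
class EulerStep {
    // Returns {newVelocity, newPosition} given the current position x, velocity v,
    // the force acting on the body, its mass, and the time interval dt.
    static double[] step(double x, double v, double force, double mass, double dt) {
        double vNew = v + force * dt / mass; // new velocity from F = m (vNew - v) / dt
        double xNew = x + vNew * dt;         // assumption: position uses the new velocity
        return new double[] { vNew, xNew };
    }

    public static void main(String[] args) {
        double[] state = step(0.0, 1.0, 2.0, 4.0, 0.5);
        System.out.println("v = " + state[0] + ", x = " + state[1]);
    }
}
```

In a full simulation this step is applied to every body, after which all forces are recomputed from the new positions.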
Time complexity
Brute-force sequential algorithm is an $O(N^2)$ algorithm for one iteration, as each of the N bodies is influenced by each of the other N - 1 bodies. For t iterations, $O(N^2 t)$.
Not feasible to use this direct algorithm for most interesting N-body problems, where N is very large.
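The quadratic cost is visible in a direct implementation: a doubly nested loop over the bodies. The following is a minimal 2-D sketch; the class name `BruteForce` and the array-based layout are assumptions for illustration, and it assumes no two bodies share a position.

```java
// Illustrative sketch: direct O(N^2) force computation for one iteration in 2-D.
// Names and data layout are assumptions made for this example.
class BruteForce {
    static final double G = 6.674e-11; // gravitational constant

    // m, x, y hold the mass and position of each body.
    // Returns f where f[i] = {Fx, Fy} is the total force on body i.
    static double[][] allForces(double[] m, double[] x, double[] y) {
        int n = m.length;
        double[][] f = new double[n][2];
        for (int i = 0; i < n; i++) {          // for each of the N bodies ...
            for (int j = 0; j < n; j++) {      // ... visit each of the other N - 1 bodies
                if (i == j) continue;
                double dx = x[j] - x[i];
                double dy = y[j] - y[i];
                double r2 = dx * dx + dy * dy;
                double r = Math.sqrt(r2);
                double mag = G * m[i] * m[j] / r2;
                f[i][0] += mag * dx / r;
                f[i][1] += mag * dy / r;
            }
        }
        return f;
    }

    public static void main(String[] args) {
        double[][] f = allForces(new double[] {1, 2, 3},
                                 new double[] {0, 1, 0},
                                 new double[] {0, 0, 1});
        for (double[] fi : f) System.out.println(fi[0] + " " + fi[1]);
    }
}
```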
Time complexity can be reduced by approximating a cluster of distant bodies as a single distant body, with the cluster's total mass sited at the center of mass of the cluster.
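A minimal sketch of the quantity this approximation relies on: the total mass and center of mass of a cluster of bodies in 2-D. The class name `CenterOfMass` is an assumption for illustration.

```java
// Illustrative sketch: total mass and centre of mass of a cluster in 2-D.
// The class and method names are assumptions made for this example.
class CenterOfMass {
    // Returns {totalMass, cx, cy} for bodies with masses m at positions (x, y).
    static double[] of(double[] m, double[] x, double[] y) {
        double total = 0, sx = 0, sy = 0;
        for (int i = 0; i < m.length; i++) {
            total += m[i];
            sx += m[i] * x[i]; // mass-weighted sum of x coordinates
            sy += m[i] * y[i]; // mass-weighted sum of y coordinates
        }
        return new double[] { total, sx / total, sy / total };
    }

    public static void main(String[] args) {
        double[] c = of(new double[] {1, 3}, new double[] {0, 4}, new double[] {0, 0});
        System.out.println("mass = " + c[0] + ", centre = (" + c[1] + ", " + c[2] + ")");
    }
}
```

A distant body then interacts with one pseudo-body of mass `totalMass` at `(cx, cy)` instead of with every member of the cluster.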
Barnes-Hut Algorithm
Start with whole space in which one cube contains the bodies (or particles).
• First, this cube is divided into eight subcubes.
• If a subcube contains no particles, subcube deleted from further consideration.
• If a subcube contains one body, subcube retained.
• If a subcube contains more than one body, it is recursively divided until every subcube contains one body.
Creates an octree: a tree with up to eight edges from each node.
Leaves represent cells each containing one body.
After tree constructed, total mass and center of mass of subcube stored at each node.
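Force on each body obtained by traversing tree starting at root, stopping at a node when the clustering approximation can be used, e.g. when:
$$r \geq \frac{d}{\theta}$$
where $r$ is the distance from the body to the center of mass of the subcube, $d$ is the side length of the subcube, and $\theta$ is a constant typically 1.0 or less.
Constructing tree requires a time of $O(N \log N)$, and so does computing all the forces, so that overall time complexity of method is $O(N \log N)$.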
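The tree construction and traversal can be sketched compactly in two dimensions, where the octree becomes a quadtree with up to four children per node. This is a simplified illustration, not the course implementation: the class `QuadNode` and all its members are hypothetical, bodies are assumed to have distinct positions inside the root square, and the stopping test compares the distance to a cell's center of mass against the cell's side length divided by `theta`.

```java
// Illustrative 2-D (quadtree) sketch of the Barnes-Hut idea; all names are assumptions.
// Assumes every body has a distinct position inside the root square.
class QuadNode {
    static final double G = 6.674e-11; // gravitational constant

    final double cx, cy, half;   // centre and half side length of this square cell
    double mass, comX, comY;     // total mass and centre of mass of the bodies inside
    int count;                   // number of bodies inside
    QuadNode[] child;            // up to four sub-cells; null while this is a leaf

    QuadNode(double cx, double cy, double half) {
        this.cx = cx; this.cy = cy; this.half = half;
    }

    // Returns the sub-cell containing (x, y), creating it on demand so that
    // empty sub-cells never exist in the tree.
    private QuadNode childFor(double x, double y) {
        int i = (x >= cx ? 1 : 0) + (y >= cy ? 2 : 0);
        if (child[i] == null) {
            double h = half / 2;
            child[i] = new QuadNode(cx + (x >= cx ? h : -h), cy + (y >= cy ? h : -h), h);
        }
        return child[i];
    }

    // Inserts a body, subdividing until every leaf holds exactly one body.
    void insert(double m, double x, double y) {
        if (count == 0) { mass = m; comX = x; comY = y; count = 1; return; }
        if (child == null) {
            child = new QuadNode[4];
            childFor(comX, comY).insert(mass, comX, comY); // push existing body down
        }
        childFor(x, y).insert(m, x, y);
        comX = (comX * mass + x * m) / (mass + m); // update centre of mass
        comY = (comY * mass + y * m) / (mass + m);
        mass += m;
        count++;
    }

    // Adds to f = {Fx, Fy} the force on a body of mass m at (x, y).
    // With theta well below 1, a cell containing the body itself is always opened.
    void addForce(double m, double x, double y, double theta, double[] f) {
        double dx = comX - x, dy = comY - y;
        double r = Math.sqrt(dx * dx + dy * dy);
        double d = 2 * half; // side length of this cell
        if (child == null || r >= d / theta) {
            if (r == 0) return; // this leaf is the body itself
            double mag = G * m * mass / (r * r);
            f[0] += mag * dx / r;
            f[1] += mag * dy / r;
        } else {
            for (QuadNode c : child) {
                if (c != null) c.addForce(m, x, y, theta, f);
            }
        }
    }

    public static void main(String[] args) {
        QuadNode root = new QuadNode(0, 0, 2);
        root.insert(1, 1, 1);
        root.insert(2, -1, 1);
        root.insert(3, -1, -1);
        double[] f = new double[2];
        root.addForce(1, 1, 1, 0.5, f);
        System.out.println("Fx = " + f[0] + ", Fy = " + f[1]);
    }
}
```

Here the total mass and center of mass are maintained during insertion rather than in a separate pass after construction; either approach leaves the same values at each node.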
Recursive division of 2-dimensional space
Orthogonal Recursive Bisection (For 2-dimensional area) First, a vertical line found that divides area into two areas each with equal number of bodies. For each area, a horizontal line found that divides it into two areas each with equal number of bodies. Repeated as required.
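A minimal sketch of orthogonal recursive bisection on a list of 2-D points, alternating vertical and horizontal cuts at the median so that each side receives (nearly) the same number of bodies. The class name `Orb` and the representation of a point as a two-element array are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of orthogonal recursive bisection; names are assumptions.
class Orb {
    // Splits pts (each point is {x, y}) into up to 2^depth groups of near-equal size.
    // splitOnX = true means a vertical dividing line (compare x coordinates);
    // the direction alternates at each level of recursion.
    static List<List<double[]>> bisect(List<double[]> pts, int depth, boolean splitOnX) {
        List<List<double[]>> out = new ArrayList<>();
        if (depth == 0 || pts.size() < 2) {
            out.add(pts);
            return out;
        }
        int axis = splitOnX ? 0 : 1;
        List<double[]> sorted = new ArrayList<>(pts);
        sorted.sort((p, q) -> Double.compare(p[axis], q[axis]));
        int mid = sorted.size() / 2; // the dividing line passes between mid-1 and mid
        out.addAll(bisect(new ArrayList<>(sorted.subList(0, mid)), depth - 1, !splitOnX));
        out.addAll(bisect(new ArrayList<>(sorted.subList(mid, sorted.size())), depth - 1, !splitOnX));
        return out;
    }

    public static void main(String[] args) {
        List<double[]> pts = new ArrayList<>();
        double[][] raw = {{0, 0}, {1, 5}, {2, 1}, {3, 6}, {4, 2}, {5, 7}, {6, 3}, {7, 4}};
        for (double[] p : raw) pts.add(p);
        for (List<double[]> group : bisect(pts, 2, true)) {
            System.out.println("group of " + group.size() + " bodies");
        }
    }
}
```

Sorting at every level is the simplest way to find the median; a selection algorithm would avoid the full sort.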
Acknowledgments The following has been derived directly from: “Seeds Framework – The CompleteSynchGraph Template Tutorial,” Jeremy Villalobos and Yawo K. Adibolo, June 18, 2012 and “Pattern Programming in Parallel and distributed Computing Using the Seeds Framework,” Yawo K. Adibolo, April 2, 2012
Seeds CompleteSynchGraph Pattern
All-to-all pattern that includes a synchronous iteration feature to pass the results of one iteration to all the nodes before the next iteration.
Instead of sharing a pool of tasks to execute, workers get replicas of the initial data set. At each iteration, workers synchronize, update their replicas, and proceed to new computations.
Master node and framework will not get control of the data flow until all the iterations are done.
CompleteSynchGraph is convenient for problems such as the N-body problem and other iterative problems (e.g. solving a system of linear equations by iteration, see later).
Three phases:
1. Initialization stage (mostly in master)
• Instantiate module
• Activate framework
• Deploy pattern
• Initialize module (initializeModule)
• Send data to slaves (DiffuseData)
AllToAllData class
Data diffused is wrapped in instances of an AllToAllData class, which provides interfaces for slaves to communicate, exchange data and information, and synchronize their computation results.
2. Computation stage (in slave processes)
Each iteration:
• Communication: slaves exchange data for synchronization. Framework calls the AllToAllData getSyncData() method to collect data for synchronization, then the AllToAllData setSyncDataList() method to distribute data for synchronization.
• Synchronization: each slave updates its local data with results collected from neighbors just before computation begins. Programmer responsible for synchronization algorithm and data to be sent for synchronization.
• Computation: each slave does its computation. Framework calls the OneIterationCompute() method, which returns true if the termination condition is met, otherwise false. Programmer responsible for setting termination conditions and stopping computation. All slaves must perform the same number of iterations.
3. Conclusion stage
Master collects results from workers (GatherData method).
However, the framework recovers the flow of execution after each iteration (unlike the workpool and pipeline patterns, where that occurs only when the overall task is completed).
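The three stages can be illustrated without the framework by a plain-Java, single-threaded simulation. This sketch does not use the Seeds API; the class `SyncIterationDemo`, its `Worker` class, and the toy computation (each owned entry moves halfway toward the mean of all entries) are assumptions chosen only to show the cycle. For simplicity it computes first and then exchanges and applies updates, so that all replicas agree when the iterations finish; the framework's exact ordering within an iteration may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of replicated data with a synchronous exchange per iteration.
// It does not use the Seeds API; all names here are assumptions for illustration.
class SyncIterationDemo {
    static class Worker {
        final int first, last;   // this worker owns entries in the range [first, last)
        final double[] replica;  // private copy of the whole data set
        double[][] updates;      // {index, newValue} pairs produced this iteration

        Worker(int first, int last, double[] initial) {
            this.first = first;
            this.last = last;
            this.replica = initial.clone();
        }

        // Computation: each owned entry moves halfway toward the mean of all entries.
        void compute() {
            double mean = 0;
            for (double v : replica) mean += v;
            mean /= replica.length;
            updates = new double[last - first][2];
            for (int i = first; i < last; i++) {
                updates[i - first][0] = i;
                updates[i - first][1] = (replica[i] + mean) / 2;
            }
        }

        // Synchronization: apply the updates collected from every worker.
        void sync(List<double[][]> all) {
            for (double[][] u : all) {
                for (double[] e : u) replica[(int) e[0]] = e[1];
            }
        }
    }

    // Returns each worker's replica after the given number of iterations.
    static double[][] run(double[] initial, int workers, int iterations) {
        // Initialization: every worker receives a replica and a range of entries.
        List<Worker> ws = new ArrayList<>();
        int per = (initial.length + workers - 1) / workers; // ceiling division
        for (int w = 0; w < workers; w++) {
            int lo = Math.min(w * per, initial.length);
            int hi = Math.min(lo + per, initial.length);
            ws.add(new Worker(lo, hi, initial));
        }
        // Computation stage: every worker performs the same number of iterations.
        for (int it = 0; it < iterations; it++) {
            for (Worker w : ws) w.compute();
            List<double[][]> all = new ArrayList<>();
            for (Worker w : ws) all.add(w.updates);  // collect from everyone
            for (Worker w : ws) w.sync(all);         // distribute to everyone
        }
        // Conclusion: gather the replicas (they should all be identical).
        double[][] replicas = new double[ws.size()][];
        for (int w = 0; w < ws.size(); w++) replicas[w] = ws.get(w).replica;
        return replicas;
    }

    public static void main(String[] args) {
        double[][] r = run(new double[] {0, 4, 8, 12}, 2, 3);
        System.out.println(java.util.Arrays.toString(r[0]));
    }
}
```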
CompleteSynchGraph Interfaces
CompleteSynchGraph pattern needs a minimum of three classes implemented:
1. AllToAllData class
Wraps initial data sent to workers with any other necessary information. Methods called by framework in slaves. Two methods must be implemented:
• getSyncData(): Data returned by this method will be received by all other processes.
• setSyncDataList(): Used to update locally computed data for that iteration. This method receives the data returned by getSyncData().
Typically you will also need methods for the computation done on each iteration.
2. Module class: extends the CompleteSyncGraph abstract class
Methods the programmer implements:
• initializeModule(): Used to send data to worker processes before the iterative process begins.
• DiffuseData(): Used to send initial data from master to the slaves.
• GatherData(): Programmer can expect to receive the final state of the computation in the GatherData() method. Because all worker processes end with the same data at the end of the computation, programmer can work with just one of the gathered results and discard the rest.
• OneIterationCompute(): Called to trigger computation for one iteration. It is called repeatedly until the programmer's algorithm returns true from it. Programmer has to make sure all processes perform the same number of iterations.
• getCellCount(): Should return the number of computation units. Used by framework to determine the number of worker processes to activate.
3. Bootstrapping class: preprocesses data, instantiates programmer's module, activates framework, and spawns pattern before computations begin.
Basically similar to the bootstrapping classes used in other patterns, such as the Workpool or the Pipeline patterns. It is generally, but not always, a Java main class and serves as an interface between the programmer (or the programmer's application) and the framework.
Using the CompleteSynchGraph pattern for 2-D Gravitational N-body problem
Global data table used to hold the coordinates:
Body | Mass | Position in x direction | Position in y direction | Velocity in x direction | Velocity in y direction
1    |      |                         |                         |                         |
2    |      |                         |                         |                         |
…    |      |                         |                         |                         |
N    |      |                         |                         |                         |
Initialization
• Number of bodies set.
• Randomly set mass, position, and velocity.
• Data table and list of bodies assigned to each slave sent to slaves.
Iteration
• Slaves compute new positions/velocities, exchange their data, and update their replica of the data table.
Conclusion
• Master collects data table from slaves (all the same).
Representing each body
In Java, could extend the Point class, which already has fields to represent the position, x and y, and methods for getting the location, moving the location, etc.
In any event, need to create a "Body" class that has the fields:
    private double x;    // position in x direction
    private double y;    // position in y direction
    private double Vx;   // velocity in x direction
    private double Vy;   // velocity in y direction
    private double mass; // mass
    private int ID;      // body ID
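A minimal sketch of what such a class might look like, including an `update` method that advances the body by one time interval. The slides do not show the body of `update`, so everything below is an assumption: the data table is taken to hold one row per body in the layout {id, mass, x, y}, the fields are plain doubles rather than inherited from Point, and the new velocity is used for the position update.

```java
// Hypothetical Body class; the update() logic and the table layout {id, mass, x, y}
// are assumptions made for this sketch, not the course implementation.
class Body {
    static final double G = 6.674e-11; // gravitational constant

    private double x;          // position in x direction
    private double y;          // position in y direction
    private double vx;         // velocity in x direction
    private double vy;         // velocity in y direction
    private final double mass; // mass
    private final int id;      // body ID

    Body(int id, double x, double y, double vx, double vy, double mass) {
        this.id = id; this.x = x; this.y = y;
        this.vx = vx; this.vy = vy; this.mass = mass;
    }

    int getID() { return id; }
    double getMass() { return mass; }
    double getX() { return x; }
    double getY() { return y; }

    // Advances this body by dt using the forces from all other bodies in the table.
    void update(double[][] data, double dt) {
        double fx = 0, fy = 0;
        for (double[] other : data) {
            if ((int) other[0] == id) continue;   // skip this body's own row
            double dx = other[2] - x;
            double dy = other[3] - y;
            double r2 = dx * dx + dy * dy;
            if (r2 == 0) continue;                // coincident bodies: avoid dividing by zero
            double r = Math.sqrt(r2);
            double f = G * mass * other[1] / r2;  // force magnitude
            fx += f * dx / r;
            fy += f * dy / r;
        }
        vx += fx * dt / mass;  // new velocity
        vy += fy * dt / mass;
        x += vx * dt;          // new position (uses the new velocity)
        y += vy * dt;
    }

    public static void main(String[] args) {
        double[][] table = { {0, 2, 0, 0}, {1, 1e11, 1, 0} };
        Body b = new Body(0, 0, 0, 0, 0, 2);
        b.update(table, 1.0);
        System.out.println("x = " + b.getX() + ", y = " + b.getY());
    }
}
```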
Passing the data between slaves
Implement the AllToAllData class.
Holds:
• List of bodies (say, using the List class) if more than one body per slave
• Global data table
• Iteration number
Could hold a complete record of the movement of the bodies, as movement cannot be displayed during the computation: the framework does not allow user control until the end of the iterations.
Required method compute()
    // Update the assigned bodies' positions and the global data table.
    // Records body positions at the current iteration.
    public void compute(double dt) {
        for (int i = 0; i < assignment.size(); i++) {
            assignment.get(i).update(data, dt);      // call update on this body
            int id = assignment.get(i).getID();      // get this body's id
            data[id][2] = assignment.get(i).x;       // update the global data table
            data[id][3] = assignment.get(i).y;       // update the global data table
            updates[i][0] = id;                      // set the synchronization data
            updates[i][1] = assignment.get(i).x;     // set the synchronization data
            updates[i][2] = assignment.get(i).y;     // set the synchronization data
            // record this body's position at this iteration
            positions[i][iteration] = new Point(assignment.get(i).x, assignment.get(i).y);
        }
    }
Required method getSyncData()
    // Get the assigned updates for synchronization
    @Override
    public Serializable getSyncData() {
        return updates;
    }
Required method setSyncDataList()
    // Synchronize the global data table
    @Override
    public void setSyncDataList(List<Serializable> list) {
        for (int i = 0; i < list.size(); i++) {
            int[][] update = (int[][]) list.get(i);   // extract update data
            for (int j = 0; j < update.length; j++) {
                int id = update[j][0];                // get the body id for this update
                data[id][2] = update[j][1];           // synchronize the global table
                data[id][3] = update[j][2];           // synchronize the global table
            }
        }
    }
Module class
    public class NbodyModule extends CompleteSyncGraph {
        private static final long serialVersionUID = 1L;
        private List<Body> workpool;   // pool of bodies
        private int body_count;        // number of bodies to simulate
        private int cell_count = 4;    // number of worker nodes
        private int iteration_count;   // number of iterations
        private int body_per_cell;     // max number of bodies assigned to a cell
                                       // (assume each worker handles at least one body)
        private int current = 0;       // current body to be assigned from the workpool
        private double dt;             // simulation time interval
        private double[][] data;       // global data table used by workers
        private Point[][] positions;   // body positions table for graphical simulation
    // Module constructor
    public NbodyModule(String... args) {
        body_count = Integer.parseInt(args[0]);
        iteration_count = Integer.parseInt(args[1]);
        // Divide as doubles so that ceil() actually rounds up
        body_per_cell = (int) Math.ceil(body_count / 4.0);
        workpool = new ArrayList<Body>();
        positions = new Point[body_count][iteration_count];
        data = new double[body_count][4];
        // Create and initialize bodies. Build global data table.
        Set<Body> set = new HashSet<Body>();
        Random generator = new Random();
        for (int i = 0; i < body_count; i++) {
            Body body = null;
            do {
                body = new Body(i,
                        (int) ((800 * generator.nextDouble()) - 400),
                        (int) ((800 * generator.nextDouble()) - 400),
                        ((2 * generator.nextDouble()) - 1),
                        ((int) (generator.nextDouble() * 990) + 10));
                data[i][0] = (double) i;
                data[i][1] = body.getMass();
                data[i][2] = (double) body.x;
                data[i][3] = (double) body.y;
            } while (!set.add(body));   // retry until the body is not a duplicate
            System.out.println(body);
        }
        Iterator<Body> iterator = set.iterator();
        while (iterator.hasNext()) {
            workpool.add(iterator.next());
        }
    }
Note: need to simplify.
initializeModule()
    // Initialize worker nodes
    @Override
    public void initializeModule(String[] args) {
        iteration_count = Integer.parseInt(args[3]);
        dt = Double.parseDouble(args[4]);
    }
DiffuseData()
    // Send data to worker
    @Override
    public AllToAllData DiffuseData(int segment) {
        // Create a list of assigned bodies for the next worker
        List<Body> assignment = new ArrayList<Body>();
        int bodies_assigned = 0;
        while ((current < workpool.size()) && (bodies_assigned < body_per_cell)) {
            Body body = workpool.get(current);   // get next body from the pool
            assignment.add(body);                // assign the body to the next worker
            bodies_assigned++;
            current++;
        }
        return new NbodyData(assignment, data, iteration_count);
    }
OneIterationCompute()
    // Compute bodies' positions on worker nodes
    @Override
    public boolean OneIterationCompute(AllToAllData data) {
        NbodyData input = (NbodyData) data;
        if (input.getIteration() >= iteration_count) {
            return true;       // termination condition met
        }
        input.compute(dt);     // compute new positions
        input.increment();     // update iteration count
        return false;
    }
GatherData()
    // Gather final results after all iterations
    @Override
    public void GatherData(int segment, AllToAllData data) {
        NbodyData output = (NbodyData) data;            // get the N-body data
        List<Body> assignment = output.getData();       // extract list of assigned bodies
        Point[][] recorded = output.getPositions();     // get position records
        for (int i = 0; i < recorded.length; i++) {
            int id = assignment.get(i).getID();         // get this record's body ID
            for (int j = 0; j < iteration_count; j++) {
                positions[id][j] = recorded[i][j];      // update the global positions table
            }
        }
    }
getCellCount()
    // Return worker count
    @Override
    public int getCellCount() {
        return cell_count;
    }
Others
    // Return the position records through the iterations
    public Point[][] getPositionRecords() {
        return positions;
    }

    // Return the number of iterations
    public int getIterations() {
        return iteration_count;
    }

    // Return the pool of bodies with initial positions
    public List<Body> getDataSet() {
        return workpool;
    }
RunNbodyModule
    // N-body implementation bootstrapping class
    public class RunNbodyModule {
        static List<Body> workpool;     // pool of bodies
        static Point[][] positions;     // records of body positions
        static int iterations;          // total number of iterations

        public static void main(String[] args) {
            try {
                NbodyModule module = new NbodyModule(args[2], args[3]);   // create the module
                Seeds.start(args[0], false);                              // start the framework
                System.out.println("Starting Pattern");
                PipeID id = Seeds.startPattern(
                        new Operand(args,
                                new Anchor(args[1], Types.DataFlowRoll.SINK_SOURCE),
                                module));                                 // spawn the pattern
                System.out.println(id.toString());
                Seeds.waitOnPattern(id);
                Seeds.stop();
                workpool = module.getDataSet();              // get the bodies for display
                positions = module.getPositionRecords();     // get the position records
                iterations = module.getIterations();         // get the number of iterations
                // Run the N-body simulation display
                SwingUtilities.invokeLater(new Runnable() {
                    @Override
                    public void run() {
                        Graphic window = new Graphic(workpool, positions, iterations);
                        window.display();
                    }
                });
            } catch (SecurityException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
Questions