All-to-All Pattern A pattern where all (slave) processes can communicate with each other Somewhat the worst case scenario! ITCS 4/5145 Parallel Computing,


All-to-All Pattern
A pattern where all (slave) processes can communicate with each other. Somewhat the worst-case scenario!
ITCS 4/5145 Parallel Computing, UNC-Charlotte, B. Wilkinson, 2012. slides3b.ppt June 25, 2012

All-to-All communication
Some problems require this. Examples:
• N-body problem
• Solving a dense system of linear equations

Gravitational N-Body Problem
Finding positions and movements of bodies in space subject to gravitational forces from other bodies, using Newtonian laws of physics.

Equations: the gravitational force between two bodies of masses $m_a$ and $m_b$ is

$$F = \frac{G m_a m_b}{r^2}$$

where $G$ is the gravitational constant and $r$ is the distance between the bodies. Subject to forces, a body accelerates according to Newton's 2nd law:

$$F = ma$$

where $m$ is the mass of the body, $F$ is the force it experiences, and $a$ is the resultant acceleration.

Details
Let the time interval be $\Delta t$. For a body of mass $m$, the force is:

$$F = \frac{m\,(v^{t+1} - v^{t})}{\Delta t}$$

New velocity is:

$$v^{t+1} = v^{t} + \frac{F\,\Delta t}{m}$$

where $v^{t+1}$ is the velocity at time $t + 1$ and $v^{t}$ is the velocity at time $t$. Over the time interval $\Delta t$, the position changes by

$$x^{t+1} - x^{t} = v\,\Delta t$$

where $x^{t}$ is its position at time $t$. Once bodies move to new positions, the forces change, so the computation has to be repeated.
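The update rules above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class and method names are hypothetical, the value of G is in SI units, and the position update uses whatever velocity is passed in, since the slide does not say which velocity is meant.

```java
// Hypothetical helper illustrating one time step for one coordinate of a body.
final class EulerStepSketch {
    static final double G = 6.674e-11; // gravitational constant (SI units)

    // Magnitude of the gravitational force between masses ma and mb at distance r.
    static double force(double ma, double mb, double r) {
        return G * ma * mb / (r * r);
    }

    // New velocity after interval dt under force f on a body of mass m.
    static double newVelocity(double v, double f, double m, double dt) {
        return v + f * dt / m;
    }

    // New position after interval dt when moving at velocity v.
    static double newPosition(double x, double v, double dt) {
        return x + v * dt;
    }

    public static void main(String[] args) {
        double f = force(5.0e10, 2.0e10, 1000.0);
        double v = newVelocity(0.0, f, 2.0e10, 10.0);
        double x = newPosition(0.0, v, 10.0);
        System.out.println("force=" + f + " velocity=" + v + " position=" + x);
    }
}
```

In a real simulation the force is a vector, so the same update is applied separately to the x and y components.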

Time complexity
The brute-force sequential algorithm is an $O(N^2)$ algorithm for one iteration, as each of the N bodies is influenced by each of the other N - 1 bodies. For t iterations, it is $O(N^2 t)$. It is not feasible to use this direct algorithm for most interesting N-body problems, where N is very large.
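To make the quadratic cost concrete, here is a minimal Java sketch of the brute-force force computation in two dimensions. The class name and array layout are assumptions made for illustration, and the sketch assumes no two bodies share a position (that would divide by zero).

```java
// Hypothetical brute-force force computation: N * (N - 1) pair interactions.
final class BruteForceSketch {
    static final double G = 6.674e-11; // gravitational constant (SI units)

    // Returns the net force {fx, fy} on every body.
    static double[][] forces(double[] mass, double[] x, double[] y) {
        int n = mass.length;
        double[][] f = new double[n][2];
        for (int i = 0; i < n; i++) {          // for each body ...
            for (int j = 0; j < n; j++) {      // ... visit every other body
                if (i == j) continue;
                double dx = x[j] - x[i];
                double dy = y[j] - y[i];
                double r2 = dx * dx + dy * dy;
                double r = Math.sqrt(r2);
                double mag = G * mass[i] * mass[j] / r2;
                f[i][0] += mag * dx / r;       // x component, directed toward body j
                f[i][1] += mag * dy / r;       // y component, directed toward body j
            }
        }
        return f;
    }

    public static void main(String[] args) {
        double[][] f = forces(new double[] {1, 1}, new double[] {0, 1}, new double[] {0, 0});
        System.out.println("fx on body 0 = " + f[0][0] + ", fx on body 1 = " + f[1][0]);
    }
}
```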

Time complexity can be reduced by approximating a cluster of distant bodies as a single distant body, with its mass sited at the center of mass of the cluster.
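As a small illustration of the clustering idea, the following Java sketch computes the total mass and center of mass of a cluster in two dimensions. The names are hypothetical; the resulting single "body" could then stand in for the whole cluster when it is far enough away.

```java
// Hypothetical helper: replaces a cluster by its total mass and center of mass.
final class ClusterSketch {
    // Returns {totalMass, centerX, centerY} for the given bodies.
    static double[] centerOfMass(double[] mass, double[] x, double[] y) {
        double total = 0, sx = 0, sy = 0;
        for (int i = 0; i < mass.length; i++) {
            total += mass[i];
            sx += mass[i] * x[i];   // mass-weighted x
            sy += mass[i] * y[i];   // mass-weighted y
        }
        return new double[] { total, sx / total, sy / total };
    }

    public static void main(String[] args) {
        double[] c = centerOfMass(new double[] {1, 3}, new double[] {0, 4}, new double[] {0, 0});
        System.out.println("mass=" + c[0] + " at (" + c[1] + ", " + c[2] + ")");
    }
}
```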

Barnes-Hut Algorithm
Start with the whole space, in which one cube contains the bodies (or particles).
• First, this cube is divided into eight subcubes.
• If a subcube contains no particles, the subcube is deleted from further consideration.
• If a subcube contains one body, the subcube is retained.
• If a subcube contains more than one body, it is recursively divided until every subcube contains one body.

Creates an octree, a tree with up to eight edges from each node. The leaves represent cells, each containing one body. After the tree has been constructed, the total mass and center of mass of the subcube are stored at each node.
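The construction can be sketched in Java for the simpler 2-dimensional case, where each square is split into four subsquares (a quadtree) rather than eight subcubes. This is a hypothetical sketch, not code from the slides: all names are illustrative, and it assumes all body positions are distinct (coincident bodies would recurse forever).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical 2-D (quadtree) analogue of the octree construction.
final class QuadTreeSketch {
    static final class Node {
        final double cx, cy, half;  // centre and half side length of this square
        double mass, comX, comY;    // total mass and center of mass of bodies inside
        Node[] children;            // null for a leaf holding exactly one body

        Node(double cx, double cy, double half) {
            this.cx = cx; this.cy = cy; this.half = half;
        }
    }

    // Builds the tree for the bodies whose indices are listed in ids.
    static Node build(double[] m, double[] x, double[] y, List<Integer> ids,
                      double cx, double cy, double half) {
        if (ids.isEmpty()) return null;            // empty square: dropped
        Node node = new Node(cx, cy, half);
        if (ids.size() == 1) {                     // one body: retained as a leaf
            int i = ids.get(0);
            node.mass = m[i]; node.comX = x[i]; node.comY = y[i];
            return node;
        }
        List<List<Integer>> quads = new ArrayList<>();
        for (int q = 0; q < 4; q++) quads.add(new ArrayList<>());
        for (int i : ids) {                        // assign each body to a quadrant
            int q = (x[i] >= cx ? 1 : 0) + (y[i] >= cy ? 2 : 0);
            quads.get(q).add(i);
        }
        node.children = new Node[4];
        double h = half / 2;
        for (int q = 0; q < 4; q++) {              // recurse, then accumulate mass data
            double ccx = cx + ((q & 1) != 0 ? h : -h);
            double ccy = cy + ((q & 2) != 0 ? h : -h);
            Node child = build(m, x, y, quads.get(q), ccx, ccy, h);
            node.children[q] = child;
            if (child != null) {
                node.mass += child.mass;
                node.comX += child.mass * child.comX;
                node.comY += child.mass * child.comY;
            }
        }
        node.comX /= node.mass;
        node.comY /= node.mass;
        return node;
    }

    // Counts leaves, i.e. squares that hold exactly one body.
    static int countLeaves(Node n) {
        if (n == null) return 0;
        if (n.children == null) return 1;
        int total = 0;
        for (Node c : n.children) total += countLeaves(c);
        return total;
    }
}
```

Calling `build` with the indices of all bodies and a square that encloses them returns the root, whose `mass`, `comX` and `comY` summarize the whole system.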

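Force on each body is obtained by traversing the tree starting at the root, stopping at a node when the clustering approximation can be used, e.g. when

$$r \ge \frac{d}{\theta}$$

where $r$ is the distance from the body to the center of mass of the node's subcube, $d$ is the dimension (side length) of that subcube, and $\theta$ is a constant, typically 1.0 or less. Constructing the tree requires a time of $O(N \log N)$, and so does computing all the forces, so the overall time complexity of the method is $O(N \log N)$.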
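The stopping test used during the traversal is small enough to show directly. The sketch below assumes the usual Barnes-Hut criterion, in which a node of side length d at distance r from the body is treated as a single mass when r >= d / theta. The class and method names are hypothetical.

```java
// Hypothetical helper for the Barnes-Hut stopping test.
final class OpeningCriterionSketch {
    // True if a cell of side length d, whose center of mass is at distance r,
    // may be approximated as a single body for the given theta.
    static boolean canApproximate(double r, double d, double theta) {
        return r >= d / theta;
    }

    public static void main(String[] args) {
        System.out.println(canApproximate(10.0, 4.0, 0.5)); // far enough away
        System.out.println(canApproximate(5.0, 4.0, 0.5));  // too close, descend further
    }
}
```

A smaller theta makes the test harder to satisfy, so more nodes are opened and the result is more accurate but slower to compute.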

Recursive division of 2-dimensional space

Orthogonal Recursive Bisection (For 2-dimensional area) First, a vertical line found that divides area into two areas each with equal number of bodies. For each area, a horizontal line found that divides it into two areas each with equal number of bodies. Repeated as required.
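A minimal Java sketch of orthogonal recursive bisection follows. It is an illustration under stated assumptions: the names are hypothetical, each point is stored as {x, y}, and "equal number of bodies" is approximated by splitting the sorted points at the median index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of orthogonal recursive bisection in 2-D.
final class OrbSketch {
    // Splits pts into up to 2^depth groups, alternating vertical and horizontal cuts.
    static List<double[][]> bisect(double[][] pts, int depth, boolean splitOnX) {
        List<double[][]> groups = new ArrayList<>();
        if (depth == 0 || pts.length < 2) {
            groups.add(pts);
            return groups;
        }
        final int axis = splitOnX ? 0 : 1;
        double[][] sorted = pts.clone();               // leave the caller's order intact
        Arrays.sort(sorted, Comparator.comparingDouble(p -> p[axis]));
        int mid = sorted.length / 2;                   // median index: halves of equal size
        groups.addAll(bisect(Arrays.copyOfRange(sorted, 0, mid), depth - 1, !splitOnX));
        groups.addAll(bisect(Arrays.copyOfRange(sorted, mid, sorted.length), depth - 1, !splitOnX));
        return groups;
    }

    public static void main(String[] args) {
        double[][] pts = {{0, 0}, {1, 5}, {2, 1}, {3, 4}, {4, 2}, {5, 7}, {6, 3}, {7, 6}};
        for (double[][] g : bisect(pts, 2, true)) {
            System.out.println(Arrays.deepToString(g));
        }
    }
}
```

Starting with `splitOnX` set to true corresponds to the first vertical dividing line; each recursion level then switches axis.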

Acknowledgments The following has been derived directly from: “Seeds Framework – The CompleteSynchGraph Template Tutorial,” Jeremy Villalobos and Yawo K. Adibolo, June 18, 2012 and “Pattern Programming in Parallel and distributed Computing Using the Seeds Framework,” Yawo K. Adibolo, April 2, 2012

Seeds CompleteSynchGraph Pattern
An all-to-all pattern that includes a synchronous iteration feature to pass the results of one iteration to all the nodes before the next iteration. Instead of sharing a pool of tasks to execute, workers get replicas of the initial data set. At each iteration, workers synchronize, update their replicas, and proceed to new computations. The master node and framework will not get control of the data flow until all the iterations are done. CompleteSynchGraph is convenient for problems such as the N-body problem and other iterative problems (e.g. solving a system of linear equations by iteration, see later).

Three phases:
1. Initialization stage (mostly in master)
• Instantiates module
• Activates framework
• Deploys pattern
• Initializes module (initializeModule)
• Sends data to slaves (DiffuseData)
AllToAllData class: the data diffused is wrapped in instances of an AllToAllData class, which provides interfaces for slaves to communicate, exchange data and information, and synchronize their computation results.

2. Computation stage (in slave processes)
Each iteration:
• Communication: slaves exchange data for synchronization. The framework calls the AllToAllData getSyncData() method to collect data for synchronization, then the AllToAllData setSyncDataList() method to distribute data for synchronization.
• Synchronization: each slave updates its local data with results collected from neighbors just before computation begins. The programmer is responsible for the synchronization algorithm and the data to be sent for synchronization.
• Computation: each slave does its computation. The framework calls the OneIterationCompute() method, which must return true if the termination condition is met, otherwise false. The programmer is responsible for setting termination conditions and stopping the computation. All slaves must perform the same number of iterations.

3. Conclusion stage
Master collects results from workers (GatherData method). However, the framework recovers the flow of execution after each iteration (different from the workpool and pipeline patterns, where that only occurs when the overall task is completed).
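The per-iteration sequence of communication, synchronization, and computation can be imitated in plain Java without the framework. The sketch below does not use the Seeds API: the Worker class, its fields, and the averaging "computation" are all hypothetical stand-ins, and only the method names getSyncData and setSyncDataList are borrowed from the slides to show where they fit.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java imitation of a synchronous all-to-all iteration (not the Seeds API).
final class AllToAllLoopSketch {
    static final class Worker {
        final int id;
        final double[] replica;  // this worker's copy of the whole data set
        double value;            // the element this worker is responsible for

        Worker(int id, double[] initial) {
            this.id = id;
            this.replica = initial.clone();
            this.value = initial[id];
        }

        // Data this worker contributes to the exchange: {id, value}.
        double[] getSyncData() {
            return new double[] { id, value };
        }

        // Apply everyone's contributions to the local replica.
        void setSyncDataList(List<double[]> all) {
            for (double[] u : all) replica[(int) u[0]] = u[1];
        }

        // Toy computation: the new value is the average of the replica.
        void compute() {
            double sum = 0;
            for (double v : replica) sum += v;
            value = sum / replica.length;
        }
    }

    // Runs the loop with one worker per data element and returns the final values.
    static double[] run(double[] initial, int iterations) {
        List<Worker> workers = new ArrayList<>();
        for (int i = 0; i < initial.length; i++) workers.add(new Worker(i, initial));
        for (int it = 0; it < iterations; it++) {
            List<double[]> sync = new ArrayList<>();
            for (Worker w : workers) sync.add(w.getSyncData());   // communication
            for (Worker w : workers) w.setSyncDataList(sync);     // synchronization
            for (Worker w : workers) w.compute();                 // computation
        }
        double[] result = new double[workers.size()];
        for (Worker w : workers) result[w.id] = w.value;
        return result;
    }
}
```

Every worker performs the same number of iterations, which mirrors the requirement stated for the pattern.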

CompleteSynchGraph Interfaces
The CompleteSynchGraph pattern needs a minimum of three classes implemented:
1. AllToAllData class
Wraps the initial data sent to workers with any other necessary information. Its methods are called by the framework in the slaves. Two methods must be implemented:
• getSyncData(): data returned by this method will be received by all other processes.
• setSyncDataList(): used to update locally computed data for that iteration. This method receives the data returned by getSyncData().
Typically you will also need methods for the computation done on each iteration.

2. Module class -- extends the CompleteSyncGraph abstract class
Methods the programmer implements:
• initializeModule(): used to send data to worker processes before the iterative process begins.
• DiffuseData(): used to send initial data from the master to the slaves.
• GatherData(): the programmer can expect to receive the final state of the computation in this method. Because all worker processes end with the same data at the end of the computation, the programmer can work with just one of the gathered results and discard the rest.
• OneIterationCompute(): called to trigger the computation for one iteration. It is called repeatedly until the programmer's algorithm returns true from it. The programmer has to make sure all processes perform the same number of iterations.
• getCellCount(): should return the number of computation units. Used by the framework to determine the number of worker processes to activate.

3. Bootstrapping class -- preprocesses data, instantiates the programmer's module, activates the framework, and spawns the pattern before computations begin. It is basically similar to the bootstrapping classes used in other patterns, such as the Workpool or the Pipeline patterns. It is generally, but not always, a Java main class, and it serves as an interface between the programmer (or the programmer's application) and the framework.

Using the CompleteSynchGraph pattern for the 2-D gravitational N-body problem
Global data table used to hold the coordinates, with one row per body (1, 2, …, N) and the columns:
Body | Mass | Position in x direction | Position in y direction | Velocity in x direction | Velocity in y direction

Initialization: the number of bodies is set. Mass, position, and velocity are set randomly. The data table and the list of bodies assigned to each slave are sent to the slaves.
Iteration: slaves compute new positions/velocities, exchange their data, and update their replica of the data table.
Conclusion: master collects the data table from the slaves (all the same).

Representing each body
In Java, one could extend the Point class, which already has fields to represent the position, x and y, and methods for getting the location, moving the location, etc. In any event, you need to create a "Body" class that has the fields:

    private double x;    // position in x direction
    private double y;    // position in y direction
    private double Vx;   // velocity in x direction
    private double Vy;   // velocity in y direction
    private double mass; // mass
    private int ID;      // body ID

Passing the data between slaves
Implement the AllToAllData class. It holds:
• List of bodies (say, using the List class) if there is more than one body per slave
• Global data table
• Iteration number
It could also hold a complete record of the movement of the bodies, as movement cannot be displayed during the computation: the framework does not allow user control until the end of the iterations.

Required method compute()

    // Update the positions of the assigned bodies and the global data table.
    // Also record each body's position at the current iteration.
    public void compute(double dt) {
        for (int i = 0; i < assignment.size(); i++) {
            Body body = assignment.get(i);
            body.update(data, dt);          // call update on this body
            int id = body.getID();          // get this body's id
            data[id][2] = body.x;           // update the global data table
            data[id][3] = body.y;           // update the global data table
            updates[i][0] = id;             // set the synchronization data
            updates[i][1] = body.x;         // set the synchronization data
            updates[i][2] = body.y;         // set the synchronization data
            // record this body's position at this iteration
            positions[i][iteration] = new Point(body.x, body.y);
        }
    }
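The slides do not show the update() method that compute() calls on each body. The following is a hypothetical sketch of what such a method might look like, assuming the data table rows are laid out as {id, mass, x, y} and using double coordinates throughout. The class name BodySketch, the choice to update velocity before position, and the skipping of coincident bodies are all assumptions made for illustration.

```java
// Hypothetical body with an update step; not the Body class from the slides.
final class BodySketch {
    static final double G = 6.674e-11; // gravitational constant (SI units)

    final int id;
    final double mass;
    double x, y;    // position
    double vx, vy;  // velocity

    BodySketch(int id, double mass, double x, double y, double vx, double vy) {
        this.id = id; this.mass = mass;
        this.x = x; this.y = y; this.vx = vx; this.vy = vy;
    }

    // Advances this body by dt using the positions in data (rows: {id, mass, x, y}).
    void update(double[][] data, double dt) {
        double fx = 0, fy = 0;
        for (double[] other : data) {
            if ((int) other[0] == id) continue;   // skip this body's own row
            double dx = other[2] - x;
            double dy = other[3] - y;
            double r2 = dx * dx + dy * dy;
            if (r2 == 0) continue;                // assumption: ignore coincident bodies
            double r = Math.sqrt(r2);
            double f = G * mass * other[1] / r2;
            fx += f * dx / r;
            fy += f * dy / r;
        }
        vx += fx * dt / mass;   // new velocity from net force
        vy += fy * dt / mass;
        x += vx * dt;           // position moves with the updated velocity
        y += vy * dt;
    }
}
```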

Required method getSyncData()

    // Get the assigned updates for synchronization
    @Override
    public Serializable getSyncData() {
        return updates;
    }

Required method setSyncDataList()

    // Synchronize the global data table
    @Override
    public void setSyncDataList(List<Serializable> list) {
        for (int i = 0; i < list.size(); i++) {
            int[][] update = (int[][]) list.get(i);  // extract update data
            for (int j = 0; j < update.length; j++) {
                int id = update[j][0];               // get the body id for this update
                data[id][2] = update[j][1];          // synchronize the global table
                data[id][3] = update[j][2];          // synchronize the global table
            }
        }
    }

Module class

    public class NbodyModule extends CompleteSyncGraph {
        private static final long serialVersionUID = 1L;
        private List<Body> workpool;     // pool of bodies
        private int body_count;          // number of bodies to simulate
        private int cell_count = 4;      // number of worker nodes
        private int iteration_count;     // number of iterations
        private int body_per_cell;       // max number of bodies assigned to a cell
                                         // (assume a worker handles at least one body)
        private int current = 0;         // current body to be assigned from the workpool
        private double dt;               // simulation time interval
        private double[][] data;         // global data table used by workers
        private Point[][] positions;     // body positions table for graphical simulation

Module constructor (need to simplify)

        // Module constructor
        public NbodyModule(String... args) {
            body_count = Integer.parseInt(args[0]);
            iteration_count = Integer.parseInt(args[1]);
            // Divide as double: integer division would truncate before Math.ceil
            body_per_cell = (int) Math.ceil(body_count / (double) cell_count);
            workpool = new ArrayList<Body>();
            positions = new Point[body_count][iteration_count];
            data = new double[body_count][4];
            // Create and initialize the bodies, and build the global data table
            Set<Body> set = new HashSet<Body>();
            Random generator = new Random();
            for (int i = 0; i < body_count; i++) {
                Body body = null;
                do {
                    body = new Body(i,
                            (int) ((800 * generator.nextDouble()) - 400),
                            (int) ((800 * generator.nextDouble()) - 400),
                            ((2 * generator.nextDouble()) - 1),
                            ((int) (generator.nextDouble() * 990) + 10));
                    data[i][0] = (double) i;
                    data[i][1] = body.getMass();
                    data[i][2] = (double) body.x;
                    data[i][3] = (double) body.y;
                } while (!set.add(body));   // retry until the body is not a duplicate
                System.out.println(body);
            }
            Iterator<Body> iterator = set.iterator();
            while (iterator.hasNext()) {
                workpool.add(iterator.next());
            }
        }

initializeModule()

        // Initialize worker nodes
        @Override
        public void initializeModule(String[] args) {
            iteration_count = Integer.parseInt(args[3]);
            dt = Double.parseDouble(args[4]);
        }

DiffuseData()

        // Send data to a worker
        @Override
        public AllToAllData DiffuseData(int segment) {
            // Create a list of assigned bodies for the next worker
            List<Body> assignment = new ArrayList<Body>();
            int bodies_assigned = 0;
            while ((current < workpool.size()) && (bodies_assigned < body_per_cell)) {
                Body body = workpool.get(current);  // get next body from the pool
                assignment.add(body);               // assign the body to the next worker
                bodies_assigned++;
                current++;
            }
            return new NbodyData(assignment, data, iteration_count);
        }
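The way DiffuseData() hands out consecutive blocks of bodies can be tried in isolation. The sketch below is hypothetical: it uses body indices instead of Body objects, and it computes the per-worker limit with an integer ceiling so that every body is assigned even when the count is not a multiple of the number of workers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone version of the block assignment of bodies to workers.
final class PartitionSketch {
    // Returns, for each of cellCount workers, the indices of its assigned bodies.
    static List<List<Integer>> partition(int bodyCount, int cellCount) {
        int perCell = (bodyCount + cellCount - 1) / cellCount; // integer ceiling
        List<List<Integer>> result = new ArrayList<>();
        int current = 0;                                       // next unassigned body
        for (int cell = 0; cell < cellCount; cell++) {
            List<Integer> assignment = new ArrayList<>();
            while (current < bodyCount && assignment.size() < perCell) {
                assignment.add(current);
                current++;
            }
            result.add(assignment);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(partition(10, 4));
    }
}
```

With 10 bodies and 4 workers the limit is 3 per worker, so the last worker receives only the remaining body.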

OneIterationCompute()

        // Compute the bodies' positions on worker nodes
        @Override
        public boolean OneIterationCompute(AllToAllData data) {
            NbodyData input = (NbodyData) data;
            if (input.getIteration() >= iteration_count) {
                return true;            // termination condition met
            }
            input.compute(dt);          // compute new positions
            input.increment();          // update iteration
            return false;
        }

GatherData()

        // Gather final results after all iterations
        @Override
        public void GatherData(int segment, AllToAllData data) {
            NbodyData output = (NbodyData) data;            // get the N-body data
            List<Body> assignment = output.getData();       // extract list of assigned data
            Point[][] Positions = output.getPositions();    // get position records
            for (int i = 0; i < Positions.length; i++) {
                int id = assignment.get(i).getID();         // get this record's ID
                for (int j = 0; j < iteration_count; j++) {
                    positions[id][j] = Positions[i][j];     // update the global positions table
                }
            }
        }

getCellCount()

        // Return worker count
        @Override
        public int getCellCount() {
            return cell_count;
        }

Others

        // Return the position records through the iterations
        public Point[][] getPositionRecords() {
            return positions;
        }

        // Return the number of iterations
        public int getIterations() {
            return iteration_count;
        }

        // Return the pool of bodies with initial positions
        public List<Body> getDataSet() {
            return workpool;
        }
    } // end of class NbodyModule

RunNbodyModule

    // N-body implementation bootstrapping class
    public class RunNbodyModule {
        static List<Body> workpool;     // pool of bodies
        static Point[][] positions;     // records of body positions
        static int iterations;          // total number of iterations

        public static void main(String[] args) {
            try {
                NbodyModule module = new NbodyModule(args[2], args[3]); // create the module
                Seeds.start(args[0], false);                             // start the framework
                System.out.println("Starting Pattern");
                PipeID id = Seeds.startPattern(
                        new Operand(args,
                                new Anchor(args[1], Types.DataFlowRoll.SINK_SOURCE),
                                module));                                // spawn the pattern
                System.out.println(id.toString());
                Seeds.waitOnPattern(id);
                Seeds.stop();
                workpool = module.getDataSet();            // get the bodies for display
                positions = module.getPositionRecords();   // get the position records
                iterations = module.getIterations();       // get the number of iterations
                // Run the N-body simulation display
                SwingUtilities.invokeLater(new Runnable() {
                    @Override
                    public void run() {
                        Graphic window = new Graphic(workpool, positions, iterations);
                        window.display();
                    }
                });
            } catch (SecurityException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

Questions