All-to-All Pattern. A pattern where all (slave) processes can communicate with each other. Somewhat the worst-case scenario! ITCS 4/5145 Parallel Computing.

All-to-All Pattern. A pattern where all (slave) processes can communicate with each other. Somewhat the worst-case scenario! ITCS 4/5145 Parallel Computing, UNC-Charlotte, B. Wilkinson, 2012. slides3b.ppt. Revised Oct 17, 2013.

All-to-All communication. Some problems require this. Examples: the N-body problem; solving a dense system of linear equations.

Gravitational N-Body Problem
Finding the positions and movements of bodies in space subject to gravitational forces from other bodies, using Newtonian laws of physics.
Equations: the gravitational force between two bodies of masses m_a and m_b is:

  F = G m_a m_b / r^2

where G is the gravitational constant and r the distance between the bodies.
Subject to forces, a body accelerates according to Newton's 2nd law:

  F = m a

where m is the mass of the body, F the force it experiences, and a the resultant acceleration.

Details
Force -- First compute the force:

  F = G m_a m_b / r^2

Velocity -- Let the time interval be Dt. From F = m a, for a body of mass m the new velocity is:

  v^(t+1) = v^t + F Dt / m

where v^(t+1) is the velocity at time t + 1 and v^t is the velocity at time t.
Position -- Over the interval Dt, the position changes by

  x^(t+1) - x^t = v Dt

where x^t is its position at time t.
Once the bodies move to new positions, the forces change, so the computation has to be repeated.

Applying these equations in each coordinate then gives the velocity and positions in three directions (x, y, and z).

In two-dimensional space, the same equations give the velocity and positions in two directions.

Assignment 4 specifies two-dimensional space -- a little easier to visualize.
[Figure: force on a body at distance r from another body, resolved into x and y components; adding the force caused by each body in the x and y directions gives the body's movement.]

Data for 2-D Gravitational N-body problem (Assignment 4)
Table used to hold the initial and computed data over the time steps:

  Body | Mass | Position in x | Position in y | Velocity in x | Velocity in y
   1   |      |               |               |               |
   2   |      |               |               |               |
   ... |      |               |               |               |
   N   |      |               |               |               |

On each iteration, the positions and velocities are updated. The table can be used to display the movement of the bodies.

Sequential Code. The overall gravitational N-body computation can be described by the following steps:

for (t = 0; t < tmax; t++) {       // for each time period
   for (i = 0; i < N; i++) {       // for body i, calculate force on body due to other bodies
      for (j = 0; j < N; j++) {
         if (i != j) {             // for different bodies
            x_diff = ... ;         // compute distance between body i and body j in x direction
            y_diff = ... ;         // compute distance between body i and body j in y direction
            r = ... ;              // compute distance r
            F = ... ;              // compute force on bodies
            Fx[i] += ... ;         // resolve and accumulate force in x direction
            Fy[i] += ... ;         // resolve and accumulate force in y direction
         }
      }
   }
   for (i = 0; i < N; i++) {       // for each body, update positions and velocity
      A[i][x_velocity] = ... ;     // new velocity in x direction
      A[i][y_velocity] = ... ;     // new velocity in y direction
      A[i][x_position] = ... ;     // new position in x direction
      A[i][y_position] = ... ;     // new position in y direction
   }
}                                  // end time period

Time complexity
The brute-force sequential algorithm is an O(N^2) algorithm for one iteration, as each of the N bodies is influenced by each of the other N - 1 bodies. For t iterations, O(N^2 t).
It is not feasible to use this direct algorithm for most interesting N-body problems, where N is very large.

Reducing time complexity
The time complexity can be reduced by approximating a cluster of distant bodies as a single distant body, with the total mass of the cluster sited at the center of mass of the cluster.

Barnes-Hut Algorithm
Start with the whole space, in which one cube contains the bodies (or particles).
• First, this cube is divided into eight subcubes.
• If a subcube contains no particles, the subcube is deleted from further consideration.
• If a subcube contains one body, the subcube is retained.
• If a subcube contains more than one body, it is recursively divided until every subcube contains one body.

This creates an octree -- a tree with up to eight edges from each vertex (node). The leaves represent cells, each containing one body.
After the tree is constructed, the total mass and center of mass of the corresponding subcube are stored at each vertex (node).

The force on each body is obtained by traversing the tree starting at the root, stopping at a node when the clustering approximation can be used, e.g. when r is greater than some distance D.
Constructing the tree requires a time of O(N log N), and so does computing all the forces, so the overall time complexity of the method is O(N log N).

Example for 2-dimensional space
[Figure: a recursively divided square; at each vertex, the coordinates of the center of mass and the total mass of the bodies in the space below are stored; each leaf holds one body.]

Computing the force on each body -- traverse the tree starting at the root, stopping at a node when the clustering approximation can be used, i.e. when r is greater than some set distance D.
[Figure: for each body, the traversal uses the mass and the coordinates of the center of mass of the bodies in a subspace.]

Orthogonal Recursive Bisection
An alternative way of dividing space (for a 2-dimensional area):
First, a vertical line is found that divides the area into two areas, each with an equal number of bodies. For each area, a horizontal line is then found that divides it into two areas, each with an equal number of bodies. This is repeated as required.

Iterative synchronous patterns
When a pattern is repeated until some termination condition occurs, with synchronization at each iteration to establish the termination condition, which is often a global condition.
Note this is actually two patterns joined together sequentially, if we call iteration a pattern.
[Diagram: Pattern, then check termination condition; Repeat or Stop.]
Note these pattern names are our names.

Iterative synchronous all-to-all pattern
The N-body problem needs an "iterative synchronous all-to-all" pattern, where on each iteration all the processes exchange data with each other.
[Diagram: all-to-all pattern inside a loop; check termination condition, then Repeat or Stop.]

Solving a General System of Linear Equations
Some problems of this type require a number of iterations to converge on the solution. Example: solving a general system of linear equations by iteration.
Suppose the equations are of a general form, with n equations and n unknowns:

  a_{i,0} x_0 + a_{i,1} x_1 + ... + a_{i,n-1} x_{n-1} = b_i   (0 <= i < n)

where the unknowns are x_0, x_1, x_2, ..., x_{n-1}.

By rearranging the ith equation:

  x_i = (1 / a_{i,i}) [ b_i - (a_{i,0} x_0 + ... + a_{i,i-1} x_{i-1} + a_{i,i+1} x_{i+1} + ... + a_{i,n-1} x_{n-1}) ]

This equation gives x_i in terms of the other unknowns, and can be used as an iteration formula for each of the unknowns to obtain better approximations.
Process i computes x_i.

Suppose each process computes one unknown: P_i computes x_i.
On each iteration, process P_i needs the unknowns from all the other processes P_0, ..., P_{n-1} (excluding P_i).
This needs the iterative synchronous all-to-all pattern.

Jacobi Iteration
Name given to a computation that uses the previous iteration's values to compute the next values.* All values of x are updated together.
Convergence: it can be proven that the Jacobi method will converge if the diagonal values of a have an absolute value greater than the sum of the absolute values of the other a's on the row (the matrix is diagonally dominant), i.e. if

  |a_{i,i}| > sum over j != i of |a_{i,j}|   for all i

This condition is a sufficient but not a necessary condition.

* Other (non-parallel) methods use some of the present iteration's values to compute the present values; see later.

Termination
A simple, common approach is to compare the values computed in one iteration to the values obtained from the previous iteration, and terminate the computation when all values are within a given tolerance, i.e. when

  |x_i^t - x_i^(t-1)| < tolerance   for all i

where x_i^t is the value of x_i after the tth iteration.
However, this does not guarantee the solution to that accuracy. Why?

Convergence Rate

Seeds "CompleteSynchGraph" Pattern
An all-to-all pattern that includes a synchronous iteration feature to pass the results of one iteration to all the nodes before the next iteration.
Instead of sharing a pool of tasks to execute, workers get replicas of the initial data set. At each iteration, workers synchronize, update their replicas, and proceed to new computations.
The master node and the framework do not get control of the data flow until all the iterations are done.

More information on using the Seeds CompleteSynchGraph pattern
Seeds CompleteSynchGraph tutorial: "Seeds Framework – The CompleteSynchGraph Template Tutorial," Jeremy Villalobos and Yawo K. Adibolo, June 18, 2012, at http://coitweb.uncc.edu/~abw/PatternProgGroup/index.html (to be moved).
Gives details, with code, for the Jacobi iteration method of solving a system of linear equations.

Notes on the solution in the CompleteSynchGraph Template Tutorial
Equations in matrix-vector form: AX = B.
Converted to the iteration: X^k = C X^(k-1) + D, where:
  X^k is the solution vector at iteration k
  X^(k-1) is the solution vector at iteration k-1
  C is a matrix derived from the input matrix A
  D is a vector derived from the input vector B
Each slave is assigned one or more equations to solve.

MPI implementation of the all-to-all pattern: the MPI_Allgather() routine
MPI_Allgather() broadcasts and gathers values in one composite construction:

int MPI_Allgather(void *sendbuf, int sendcount, MPI_Datatype sendtype,
                  void *recvbuf, int recvcount, MPI_Datatype recvtype,
                  MPI_Comm comm)

When does MPI_Allgather() return?
MPI_Allgather() has the same effect as n MPI_Gather()s executed, one for each root = 0 to n-1.
MPI_Gather() has the same effect as each process executing an MPI_Send() and the root executing n MPI_Recv()s.
Question: When does MPI_Allgather() return?
Answer

Questions