Heterogeneous and Grid Computing2 Programming systems u Programming systems –For parallel computing »Traditional systems (MPI, HPF) do not address the extra challenges of heterogeneous parallel computing »mpC, HeteroMPI –For high performance distributed computing »NetSolve/GridSolve

Heterogeneous and Grid Computing3 mpC u mpC –An extension of ANSI C for programming parallel computations on networks of heterogeneous computers –Supports efficient, portable and modular heterogeneous parallel programming –Addresses the heterogeneity of both the processors and the communication network

Heterogeneous and Grid Computing4 mpC (ctd) u A parallel mpC program is a set of parallel processes interacting (that is, synchronizing their work and transferring data) by means of message passing u The mpC programmer cannot determine how many processes make up the program or which computers execute which processes –This is specified by some means external to the mpC language –The source mpC code only determines which process of the program performs which computations.

Heterogeneous and Grid Computing5 mpC (ctd) u The programmer describes the algorithm –The number of processes executing the algorithm –The total volume of computation to be performed by each process »A formula including the parameters of the algorithm »The volume is measured in computation units provided by the application programmer u The unit is the very code that has been used to measure the speed of the processors

Heterogeneous and Grid Computing6 mpC (ctd) u The programmer describes the algorithm (ctd) –The total volume of data transferred between each pair of processes –How the processes perform the computations and communications and interact »In terms of traditional algorithmic patterns (for, while, parallel for, etc.) »Expressions in the statements specify not the computations and communications themselves but rather their amount u Parameters of the algorithm and locally declared variables can be used

Heterogeneous and Grid Computing7 mpC (ctd) u The abstract processes of the algorithm are mapped to the real parallel processes of the program –The mapping of the abstract processes should minimize the execution time of the program

Heterogeneous and Grid Computing8 mpC (ctd) u Example (see handouts for full code):

  algorithm HeteroAlgorithm(int n, double v[n]) {
    coord I=n;
    node { I>=0: v[I]; };
  };
  …
  int [*]main(int [host]argc, char **[host]argv)
  {
    …
    {
      net HeteroAlgorithm(N, volumes) g;
      …
    }
  }

Heterogeneous and Grid Computing9 mpC (ctd) u The program calculates the mass of a metallic construction welded from N heterogeneous rails –It defines group g consisting of N abstract processes, each calculating the mass of one of the rails –The calculation is performed by numerical 3D integration of the density function Density with a constant integration step »The volume of computation to calculate the mass of each rail is proportional to the volume of this rail –The i-th element of array volumes contains the volume of the i-th rail »The program specifies that the volume of computation performed by each abstract process of g is proportional to the volume of its rail

Heterogeneous and Grid Computing10 mpC (ctd) u The library nodal function MPC_Wtime is used to measure the wall time elapsed to execute the calculations u Mapping of abstract processes to real processes –Based on the information about the speed at which the real processes run on the physical processors of the executing network
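A minimal sketch of such a timing measurement (illustrative only; it assumes MPC_Wtime() behaves like MPI_Wtime(), returning the current wall-clock time on the calling process as a double):

  double t;                  /* nodal (per-process) variable              */
  t = MPC_Wtime();           /* wall-clock time before the computations   */
  /* ... the computations of this process ... */
  t = MPC_Wtime() - t;       /* elapsed wall time on this process         */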

Heterogeneous and Grid Computing11 mpC (ctd) u By default, the speed estimation obtained on initialization of the mpC system on the network is used –The estimation is obtained by running a special test program u mpC allows the programmer to change the default estimation of processor speed at runtime by tuning it to the computations that will actually be executed –The recon statement
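A minimal sketch of how the recon statement might be used in the rail example before creating the network (not the handout code; SerialRailMass, testRail and testMass are hypothetical names introduced for illustration):

  /* re-benchmark every processor with the very code the abstract
     processes will execute, then create the network; the mapping
     then uses the freshly updated speed estimates                 */
  recon SerialRailMass(&testRail, &testMass);
  {
    net HeteroAlgorithm(N, volumes) g;
    …
  }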

Heterogeneous and Grid Computing12 mpC (ctd) u An irregular problem –Characterized by an inherent coarse/large-grained structure –This structure determines a natural decomposition of the problem into a small number of subtasks »Of different size »Can be solved in parallel

Heterogeneous and Grid Computing13 mpC (ctd) u The whole program solving the irregular problem –A set of parallel processes –Each process solves its subtask »As the sizes of the subtasks are different, processes perform different volumes of computation –The processes interact via message passing u Calculation of the mass of a metallic "hedgehog" is an example of an irregular problem

Heterogeneous and Grid Computing14 mpC (ctd) u A regular problem –The most natural decomposition is a large number of small identical subtasks that can be solved in parallel –As the subtasks are identical, they are of the same size u Multiplication of two n x n dense matrices is an example of a regular problem –Naturally decomposed into n^2 identical subtasks »Computation of one element of the resulting matrix u How to efficiently solve a regular problem on a network of heterogeneous computers?
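For illustration, one such elementary subtask written as a small C function (the function name and the row-major storage convention are introduced here for the example; like the algorithm discussed later, it works with the transposed matrix B):

  /* compute one element of C = A x B, with bt holding B transposed
     (row j of bt is column j of B); all matrices are n x n, row-major */
  double element_of_C(const double *a, const double *bt, int n, int i, int j)
  {
      double c_ij = 0.0;
      int k;
      for (k = 0; k < n; k++)
          c_ij += a[i*n + k] * bt[j*n + k];
      return c_ij;
  }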

Heterogeneous and Grid Computing15 mpC (ctd) u Main idea –Transform the problem into an irregular problem »Whose structure is determined by the structure of the executing network u The whole problem –Decomposed into a set of relatively large subproblems –Each subproblem is made of a number of small identical subtasks stuck together –The size of each subproblem depends on the speed of the processor solving this subproblem

Heterogeneous and Grid Computing16 mpC (ctd) u The parallel program –A set of parallel processes –Each process solves one subproblem on a separate physical processor »The volume of computation performed by each of these processes should be proportional to its speed –The processes interact via message passing

Heterogeneous and Grid Computing17 mpC (ctd) u Example. Parallel multiplication, on a heterogeneous network, of matrix A and the transpose of matrix B, where A and B are dense square n x n matrices.

Heterogeneous and Grid Computing18 mpC (ctd) u One step of the parallel multiplication of matrices A and B^T. The pivot row of blocks of matrix B (shown slashed) is first broadcast to all processors. Then, each processor, in parallel with the others, computes its part of the corresponding column of blocks of the resulting matrix C.

Heterogeneous and Grid Computing19 mpC (ctd) u See handouts for the mpC program implementing this algorithm –The program first detects the number of physical processors –It then updates the estimation of the processor speeds with the code »Executed at each step of the main loop

Heterogeneous and Grid Computing20 mpC: inter-process communication u The basic subset of mpC is based on a performance model of the parallel algorithm that ignores communication operations –It presumes that »the contribution of the communications to the total execution time of the algorithm is negligibly small compared to that of the computations –This is acceptable for »Computing on heterogeneous clusters »Message-passing algorithms that do not frequently send short messages –Not acceptable for "normal" algorithms running on common heterogeneous networks of computers

Heterogeneous and Grid Computing21 mpC: inter-process communication (ctd) u The compiler can optimally map parallel algorithms in which communication operations contribute substantially to the execution time only if the programmer can specify –The absolute volumes of computation performed by the processes –The volumes of data transferred between the processes

Heterogeneous and Grid Computing22 mpC: inter-process communication (ctd) u Volume of communication –Can be naturally measured in bytes u Volume of computation –What is the natural unit of measurement? »To allow the compiler to accurately estimate the execution time –In mpC, the unit is the very code that has been most recently used to estimate the speed of the physical processors »Normally specified as part of the recon statement

Heterogeneous and Grid Computing23 mpC: N-body problem The system of bodies consists of large groups of bodies, with different groups at a good distance from each other. The bodies move under the influence of Newtonian gravitational attraction.

Heterogeneous and Grid Computing24 mpC: N-body problem (ctd) u Parallel N-body algorithm –There is a one-to-one mapping between groups of bodies and parallel processes of the algorithm –Each process »Holds in its memory all data characterising the bodies of its group u Masses, positions and velocities of the bodies »Is responsible for updating them
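The actual Body type is defined in the handout code; a plausible sketch of the per-body data each process keeps for its group might be (hypothetical field names):

  typedef struct {
      double m;         /* mass               */
      double pos[3];    /* position (x, y, z) */
      double vel[3];    /* velocity           */
  } Body;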

Heterogeneous and Grid Computing25 mpC: N-body problem (ctd) u Parallel N-body algorithm (ctd) –The effect of each remote group is approximated by a single equivalent body »To update its group, each process requires the total mass and the center of mass of all remote groups u The total mass of each group of bodies is constant. It is calculated once. Each process receives from each of the other processes its calculated total mass, and stores all the masses. u The center of mass of each group is a function of time. At each step of the simulation, each process computes its center and sends it to the other processes.

Heterogeneous and Grid Computing26 mpC: N-body problem (ctd) u Parallel N-body algorithm (ctd) –At each step of the simulation the updated system of bodies is visualised »To do this, all groups of bodies are gathered to the process responsible for the visualisation, which is the host process. –In general, different groups have different sizes »Different processes perform different volumes of computation »Different volumes of data are transferred between different pairs of processes

Heterogeneous and Grid Computing27 mpC: N-body problem (ctd) u Parallel N-body algorithm (ctd) The point of view of each individual process: the system includes all bodies of its own group, with each remote group approximated by a single equivalent body.

Heterogeneous and Grid Computing28 mpC: N-body problem (ctd) u Pseudocode of the N-body algorithm:

  Initialise groups of bodies on the host-process
  Visualize the groups of bodies
  Scatter the groups across processes
  Compute masses of the groups in parallel
  Communicate to share the masses among processes
  while(1) {
    Compute centers of mass in parallel
    Communicate the centers among processes
    Update the state of the groups in parallel
    Gather the groups to the host-process
    Visualize the groups of bodies
  }

Heterogeneous and Grid Computing29 mpC N-body application u The core is the specification of the performance model of the algorithm:

  algorithm Nbody(int m, int k, int n[m]) {
    coord I=m;
    node { I>=0: bench*((n[I]/k)*(n[I]/k)); };
    link { I>0: length*(n[I]*sizeof(Body)) [I]->[0]; };
    parent [0];
  };

Heterogeneous and Grid Computing30 mpC N-body application (ctd) u The most important fragments of the rest of the code:

  void [*] main(int [host]argc, char **[host]argv)
  {
    ...
    // Make the test group consist of the first TGsize
    // bodies of the very first group of the system
    OldTestGroup[] = (*(pTestGroup)Groups[0])[];
    recon Update_group(TGsize, &OldTestGroup, &TestGroup, 1, NULL, NULL, 0);
    {
      net Nbody(NofGroups, TGsize, NofBodies) g;
      …
    }
  }

Heterogeneous and Grid Computing31 mpC: algorithmic patterns u One more important feature of the parallel algorithm is still not reflected in the performance model –The order of execution of computations and communications u As the model says nothing about how parallel processes interact during execution of the algorithm, the compiler assumes that –First, all processes execute all their computations in parallel –Then the processes execute all the communications in parallel –There is a synchronisation barrier between execution of the computations and the communications

Heterogeneous and Grid Computing32 mpC: algorithmic patterns (ctd) u These assumptions are unsatisfactory in the case of –Data dependencies between computations performed by different processes »One process may need data computed by other processes in order to start its computations »This serialises some computations performed by different parallel processes ==> The real execution time of the algorithm will be longer –Overlapping of computations and communications »The real execution time of the algorithm will be shorter

Heterogeneous and Grid Computing33 mpC: algorithmic patterns (ctd) u Thus, if the estimation is not based on the actual scenario of interaction of the parallel processes –It may be inaccurate, which leads to a non-optimal mapping of the algorithm to the executing network u Example. An algorithm with fully serialised computations. –Optimal mapping: »All the processes are assigned to the fastest physical processor –Mapping based on the above assumptions: »Involves all available physical processors

Heterogeneous and Grid Computing34 mpC: algorithmic patterns (ctd) u mpC addresses the problem –The programmer can specify the scenario of interaction of the parallel processes during execution of the parallel algorithm –That specification is part of the network type definition »The scheme declaration

Heterogeneous and Grid Computing35 mpC: algorithmic patterns (ctd) u Example 1. N-body algorithm

  algorithm Nbody(int m, int k, int n[m]) {
    coord I=m;
    node { I>=0: bench*((n[I]/k)*(n[I]/k)); };
    link { I>0: length*(n[I]*sizeof(Body)) [I]->[0]; };
    parent [0];
    scheme {
      int i;
      par (i=0; i<m; i++) 100%[i];
      par (i=1; i<m; i++) 100%[i]->[0];
    };
  };

Heterogeneous and Grid Computing36 mpC: algorithmic patterns (ctd) u Example 2. Matrix multiplication.

  algorithm ParallelAxBT(int p, int n, int r, int d[p]) {
    coord I=p;
    node { I>=0: bench*((d[I]*n)/(r*r)); };
    link (J=p) { I!=J: length*(d[I]*n*sizeof(double)) [J]->[I]; };
    parent [0];

Heterogeneous and Grid Computing37 mpC: algorithmic patterns (ctd) u Example 2. Matrix multiplication (ctd)

    scheme {
      int i, j, PivotProc=0, PivotRow=0;
      for(i=0; i<n/r; i++, PivotRow+=r) {
        if(PivotRow>=d[PivotProc]) {
          PivotProc++;
          PivotRow=0;
        }
        for(j=0; j<p; j++)
          if(j!=PivotProc)
            (100.*r/d[PivotProc])%[PivotProc]->[j];
        par(j=0; j<p; j++)
          (100.*r/n)%[j];
      }
    };

Heterogeneous and Grid Computing38 mpC: the timeof operator u Further modification of the matrix multiplication program:

  [host]:
  {
    int m;
    struct {int p; double t;} min;
    double t;
    min.p = 0;
    min.t = DBL_MAX;
    for(m=1; m<=p; m++) {
      Partition(m, speeds, d, n, r);
      t = timeof(net ParallelAxBT(m, n, r, d) w);
      if(t<min.t) {
        min.p = m;
        min.t = t;
      }
    }
    p = min.p;
  }
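Partition() is a helper from the handouts; its signature is inferred here from the call above, and the body below is only a sketch of one reasonable implementation: distribute the n rows over m processors in whole blocks of r rows, proportionally to the measured speeds.

  void Partition(int m, const double *speeds, int *d, int n, int r)
  {
      int i, blocks = n / r, assigned = 0;
      double total = 0.0;
      for (i = 0; i < m; i++)
          total += speeds[i];
      for (i = 0; i < m; i++) {                       /* whole blocks, rounded down */
          d[i] = r * (int)((blocks * speeds[i]) / total);
          assigned += d[i];
      }
      while (assigned < n) {                          /* give any leftover blocks   */
          int fastest = 0;                            /* to the fastest processor   */
          for (i = 1; i < m; i++)
              if (speeds[i] > speeds[fastest])
                  fastest = i;
          d[fastest] += r;
          assigned += r;
      }
  }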

Heterogeneous and Grid Computing39 mpC: the timeof operator (ctd) u Operator timeof estimates the execution time of the parallel algorithm without its real execution –The only operand specifies a fully specified network type »The values of all parameters of the network type must be specified –The operator does not create an mpC network of this type –Instead, it calculates the time of execution of the corresponding parallel algorithm on the executing network »Based on u the provided performance model of the algorithm u the most recent performance characteristics of the physical processors and communication links

Heterogeneous and Grid Computing40 mpC: mapping u The dispatcher maps abstract processes of the mpC network to the processes of the parallel program –At runtime –Trying to minimize the execution time u The mapping is based on »The model of the executing network of computers »A map of the processes of the parallel program u The total number of processes running on each computer u The number of free processes

Heterogeneous and Grid Computing41 mpC: mapping (ctd) u The mapping is based on (ctd) »The performance model of the parallel algorithm represented by this mpC network u The number of parallel processes executing the algorithm u The absolute volume of computations performed by each of the processes u The absolute volume of data transferred between each pair of processes u The scenario of interaction between the parallel processes during the algorithm execution

Heterogeneous and Grid Computing42 mpC: mapping (ctd) u Two main features: –Estimation of each particular mapping »Based on u Formulas for –Each computation unit in the scheme declaration –Each communication unit in the scheme declaration u Rules for each sequential and parallel algorithmic pattern –for, if, par, etc.

Heterogeneous and Grid Computing43 HeteroMPI u HeteroMPI –An extension of MPI –The programmer can describe the performance model of the implemented algorithm »In a small model definition language shared with mpC –Given this description »HeteroMPI tries to create a group of processes executing the algorithm faster than any other group

Heterogeneous and Grid Computing44 HeteroMPI (ctd) u The standard MPI approach to group creation –Acceptable in homogeneous environments »If there is one process per processor »Any group will execute the algorithm with the same speed –Not acceptable »In heterogeneous environments »If there is more than one process per processor u In HeteroMPI –The programmer can describe the algorithm –The description is translated into a set of functions »Making up an algorithm-specific part of the HeteroMPI run-time system

Heterogeneous and Grid Computing45 HeteroMPI (ctd) u A new operation to create a group of processes:

  HMPI_Group_create(
      HMPI_Group* gid,
      const HMPI_Model* perf_model,
      const void* model_parameters)

u Collective operation –In the simplest case, called by all processes of HMPI_COMM_WORLD

Heterogeneous and Grid Computing46 HeteroMPI (ctd) u Dynamic update of the estimation of the processor speeds can be performed by

  HMPI_Recon(
      HMPI_Benchmark_function func,
      const void* input_p,
      int num_of_parameters,
      const void* output_p)

u Collective operation –Called by all processes of HMPI_COMM_WORLD

Heterogeneous and Grid Computing47 HeteroMPI (ctd) u Prediction of the execution time of the algorithm

  HMPI_Timeof(
      HMPI_Model *perf_model,
      const void* model_parameters)

u Local operation –Can be called by any process

Heterogeneous and Grid Computing48 HeteroMPI (ctd) u Another collective operation to create a group of processes:

  HMPI_Group_auto_create(
      HMPI_Group* gid,
      const HMPI_Model* perf_model,
      const void* model_parameters)

u Used if the programmer wants HeteroMPI to find the optimal number of processes

Heterogeneous and Grid Computing49 HeteroMPI (ctd) u Other HMPI operations

  HMPI_Init()
  HMPI_Finalize()
  HMPI_Group_free()
  HMPI_Group_rank()
  HMPI_Group_size()
  MPI_Comm *HMPI_Get_comm(HMPI_Group *gid)

u HMPI_Get_comm –Creates an MPI communicator with the group defined by gid
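A sketch of how these operations combine in a HeteroMPI program (perf_model_Nbody, model_params and bench_kernel are hypothetical names standing for the handles generated from the model definition; argument lists not shown on the slides, such as that of HMPI_Init, are assumptions):

  #include <hmpi.h>
  #include <mpi.h>

  extern HMPI_Model perf_model_Nbody;            /* hypothetical: generated from the */
                                                 /* model definition language        */
  extern HMPI_Benchmark_function bench_kernel;   /* hypothetical benchmark code      */

  int main(int argc, char **argv)
  {
      HMPI_Group gid;
      MPI_Comm  *comm;
      void      *model_params = NULL;            /* actual parameters of the model   */

      HMPI_Init(&argc, &argv);                   /* exact signature assumed here     */

      /* refresh the processor speed estimates with the real kernel                  */
      HMPI_Recon(bench_kernel, NULL, 0, NULL);

      /* create the group of processes predicted to run the algorithm fastest;
         HMPI_Group_auto_create would additionally pick the optimal group size       */
      HMPI_Group_create(&gid, &perf_model_Nbody, model_params);

      comm = HMPI_Get_comm(&gid);                /* from here on, plain MPI on *comm */
      /* ... computations and MPI communication ... */

      HMPI_Group_free(&gid);
      HMPI_Finalize();
      return 0;
  }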

Heterogeneous and Grid Computing50 Grid Computing vs Distributed Computing u Definitions of Grid computing are various and vague –A new computing model for better use of many separate computers connected by a network –=> Grid computing targets heterogeneous networks u What is the difference between Grid-based heterogeneous platforms and traditional distributed heterogeneous platforms? –A single login to a group of resources is the core –Grid operating environment – services built on top of this »Different models of the GOE supported by different Grid middleware (Globus, Unicore)

Heterogeneous and Grid Computing51 GridRPC u High-performance Grid programming systems are based on GridRPC –RPC – Remote Procedure Call »The caller specifies the task, input data, output data and the remote computer –GridRPC »The caller specifies the task, input data and output data »The remote computer is picked by the system
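A hedged sketch of the GridRPC calling style (the API names below come from the GridRPC standard, but the configuration file, the task name "matmul" and its argument list are purely illustrative):

  #include <grpc.h>

  grpc_function_handle_t handle;

  grpc_initialize("client.conf");                   /* read the client configuration   */
  grpc_function_handle_default(&handle, "matmul");  /* bind to a task; the middleware, */
                                                    /* not the user, picks the server  */
  grpc_call(&handle, n, A, B, C);                   /* ship inputs, run remotely,      */
                                                    /* receive outputs                 */
  grpc_function_handle_destruct(&handle);
  grpc_finalize();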

Heterogeneous and Grid Computing52 NetSolve u NetSolve –Programming system for HPDC on global networks »Based on the GridRPC mechanism –Some components of the application are only available on remote computers u NetSolve application –The user writes a client program »Any program (in C, Fortran, etc.) with calls to the NetSolve client interface »Each call specifies u The remote task u The location of the input data on the user's computer u The location of the output data (on the user's computer)

Heterogeneous and Grid Computing53 NetSolve (ctd) u Execution of the NetSolve application –A NetSolve call results in »A task to be executed on a remote computer »The NetSolve programming system u Selects the remote computer u Transfers the input data to the remote computer u Delivers the output data to the user's computer –The mapping of the remote tasks to computers »The core operation having an impact on the performance of the application

Heterogeneous and Grid Computing54 NetSolve (ctd) [Diagram: the client's netsl("task", in, out) call goes through a proxy to the agent, which assigns the task to one of the servers (1. Assign); the input data are then uploaded to the chosen server (2. Upload) and the output data are downloaded back to the client (3. Download).]

Heterogeneous and Grid Computing55 NetSolve (ctd) u Mapping algorithm –Each task is scheduled separately and independently of other tasks »A NetSolve application is seen as a sequence of independent tasks –Based on two performance models (PMs) »The PM of the heterogeneous network of computers »The PM of a task

Heterogeneous and Grid Computing56 NetSolve (ctd) u Client interface –User's command line interface »NS_problems, NS_probdesc –C program interface »Blocking call u int netsl(char *problem_name, … …) »Non-blocking call u request = netslnb(…); u info = netslpr(request); u info = netslwt(request);
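A sketch of the two calling styles (only netsl, netslnb, netslpr and netslwt come from the slide; the problem name "matmul" and its argument list are hypothetical):

  /* blocking: returns only when the result is back on the user's computer */
  status = netsl("matmul()", n, A, B, C);

  /* non-blocking: submit the task, probe for completion, then wait        */
  request = netslnb("matmul()", n, A, B, C);
  info    = netslpr(request);     /* has the remote task finished yet?     */
  info    = netslwt(request);     /* block until the results have arrived  */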

Heterogeneous and Grid Computing57 NetSolve (ctd) u Network of computers –A set of interconnected heterogeneous processors »Each processor is characterized by the execution time of the same serial code u Matrix multiplication of two 200×200 matrices u Obtained once at the installation of NetSolve and does not change »Communication links u Characterized the same way as in NWS (latency + bandwidth) u Dynamic (periodically updated)

Heterogeneous and Grid Computing58 NetSolve (ctd) u The performance model of a task –Provided by the person installing the task on a remote computer –A formula to calculate the execution time of the task by the solver »Uses the parameters of the task and the execution time of the standard computation unit (matrix multiplication) –The size of the input and output data –The PM = a distributed set of performance models
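For illustration only (this particular formula is an assumption, not one taken from NetSolve), the installer of a solver multiplying two n×n matrices might register

  T_task(n) = (n/200)^3 * t_mm

where t_mm is the measured execution time of the standard 200×200 matrix multiplication unit on the server, together with an input size of 2*n*n*sizeof(double) and an output size of n*n*sizeof(double).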

Heterogeneous and Grid Computing59 NetSolve (ctd) u The mapping algorithm –Performed by the agent –Minimizes the total execution time, T_total »T_total = T_computation + T_communication »T_computation u Uses the formulas of the PM of the task »T_communication = T_input_delivery + T_output_receive u Uses the characteristics of the communication link and the size of the input and output data
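Concretely, with the NWS-style link characteristics of slide 57, the delivery terms can be estimated as, for example,

  T_input_delivery ≈ latency + input_size / bandwidth
  T_output_receive ≈ latency + output_size / bandwidth

so the agent can evaluate T_total for every candidate server and choose the smallest (the exact form of these estimates is an illustration consistent with the latency + bandwidth model, not a formula quoted from the NetSolve documentation).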

Heterogeneous and Grid Computing60 NetSolve (ctd) u Link to NetSolve software and documentation –