
1 Experiments with SmartGridSolve: Achieving Higher Performance by Improving the GridRPC Model
Thomas Brady, Michele Guidolin, Alexey Lastovetsky
Heterogeneous Computing Laboratory, School of Computer Science and Informatics, University College Dublin

2 Introduction
GridSolve
- A programming system for distributed computing
- Based on RPC
SmartGridSolve is an extension of GridSolve
- Aims to achieve higher performance

3 Motivation
Core aspects of GridSolve affecting application performance:
- Mapping: how tasks are assigned to servers
- Execution model: how tasks are executed in the distributed environment

4 Motivation: GridSolve Overview
GridSolve maps tasks individually on a star network.
Mapping:
- Map each task individually
Execution model:
- Client-server model
- Star network

5 Motivation: SmartGridSolve Overview
SmartGridSolve maps a group of tasks on a fully connected network.
Mapping:
- Map a group of tasks collectively
Execution model:
- Fully connected network

6 Motivation: Performance Increase
Sources of the performance increase:
- Improved load balancing of computation
- Reduced volume of communication
- Improved load balancing of communication

7 Motivation: Performance Increase
GridSolve
- Tasks are mapped individually
- Slow servers may be assigned large tasks
SmartGridSolve: improves load balancing of computation
- Each task in the group is known prior to mapping
- The workload of the group of tasks is balanced across servers

8 Motivation: Performance Increase
GridSolve
- Data transfers are mapped to client links only
- Unnecessary data transfers
- Client links are heavily loaded
SmartGridSolve: reduces the volume of communication
- Data dependencies are known
- Data transfers are reduced and bridge communication is eliminated through:
  - Server-server communication
  - Caching
SmartGridSolve: improves load balancing of communication
- Volumes of data transfers are known
- More even distribution of the communication load

9 Design: GridSolve Mapping
[Diagram: star network discovery (time) and task discovery (flops) feed the individual task mapping heuristic.]

10 Design: GridSolve Execution Model
[Diagram: the client maps each task, sends the input arguments to the chosen server, the server executes the task, and the client receives the output arguments.]

11 Design: SmartGridSolve Mapping a Group of Tasks
[Diagram: the task graph and the performance model of the fully connected network are inputs to the mapping heuristics (Mapping Heuristic 1, 2, ..., N).]

12 Design: SmartGridSolve Executing a Group of Tasks
[Diagram: the client maps the group, then the tasks are executed using server-to-server communication and caching.]

13 Design: SmartGridSolve Extensions
Processes for mapping individual tasks on a star network (GridSolve):
- Individual task discovery and star network discovery
- Mapping heuristic
- Execution on the client-server execution model
Processes for mapping a group of tasks on a fully connected network (SmartGridSolve):
- Group of tasks discovery and fully connected network discovery
- Mapping heuristic for the group of tasks
- Execution on a server-communication-enabled execution model

14 Design: Network Discovery
Star network discovery:
- Static performance (LINPACK benchmark)
- Dynamic performance (CPU load)
- Dynamic client-server bandwidth
Fully connected network discovery:
- Dynamic server-server bandwidth
(A sketch of the gathered information follows below.)
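
To make the discovered information concrete, here is a minimal sketch of the data that the two discovery steps gather. All type and field names are hypothetical illustrations, not the actual GridSolve or SmartGridSolve structures.

/* Hypothetical sketch of the per-server information from star network
 * discovery and the extra information from fully connected discovery. */
typedef struct {
    double linpack_mflops;     /* static performance (LINPACK benchmark) */
    double cpu_load;           /* dynamic performance (current CPU load) */
    double client_bandwidth;   /* dynamic client-server bandwidth        */
} server_perf_t;

typedef struct {
    int num_servers;
    server_perf_t *servers;    /* star network: one entry per server       */
    double **server_bandwidth; /* fully connected network:                 */
                               /* server_bandwidth[i][j] is the dynamic    */
                               /* server-server bandwidth between i and j  */
} network_perf_model_t;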

15 Design: GridSolve Task Discovery
Static parameters:
- Number of arguments
- Argument direction (input, output)
- Argument kind (scalar, non-scalar)
- Argument object types (matrix, vector, etc.)
- Argument data types (int, double, etc.)
- Function for complexity (flops)
Run-time parameters:
- Dimensions of arguments (the variables of the complexity function)

16 Design: GridSolve Task Discovery
Static parameters:
- Discovered when the server is registered
Run-time parameters:
- Discovered when the task is called
[Diagram: at call time, the complexity and the sizes of the input and output arguments are determined.]
(A rough sketch of how these parameters feed a time estimate follows below.)
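
As an illustration of how the discovered task parameters could be combined with the server performance from network discovery, here is a minimal sketch of an execution-time estimate. The function and parameter names are hypothetical; the actual GridSolve mapping heuristic is more involved.

/* Hypothetical sketch: estimate the time of one task on one server from
 * the task's complexity (flops) and the server's static and dynamic
 * performance discovered above. */
double estimate_task_time(double task_flops,      /* from the complexity function */
                          double server_mflops,   /* static LINPACK performance   */
                          double server_cpu_load) /* dynamic load in [0, 1]       */
{
    double effective_flops_per_sec = server_mflops * 1e6 * (1.0 - server_cpu_load);
    return task_flops / effective_flops_per_sec;  /* estimated seconds */
}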

17 Design: SmartGridSolve Group of Tasks Discovery
Build a task graph before any of the tasks are called.
[Diagram: the task graph built for the group of tasks.]
(A sketch of a possible task-graph node follows below.)
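
A minimal sketch of what a node of such a task graph could hold is given below. The types and fields are hypothetical illustrations rather than the SmartGridSolve internals; the point is that every task of the group, its cost and its data dependencies are known before anything is executed.

#include <stddef.h>

/* Hypothetical sketch of a task-graph node built during group discovery. */
typedef struct task_node {
    const char *task_name;       /* e.g. "grav", "dark", "bary"             */
    double complexity_flops;     /* from the task's complexity function     */
    size_t input_size;           /* total size of input arguments (bytes)   */
    size_t output_size;          /* total size of output arguments (bytes)  */
    struct task_node **deps;     /* tasks whose outputs this task consumes  */
    int num_deps;
} task_node_t;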

18 Design: SmartGridSolve Group of Tasks Discovery
GridSolve
- Discovery, mapping and execution are one atomic operation
SmartGridSolve
- To map a group of tasks, discovery and mapping are separated from the execution of the tasks
Addition to the GridSolve API:
gs_smart_map("ex_map") {
    // group of tasks
}
(One possible implementation of this construct is sketched below.)
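
One way a construct like gs_smart_map can separate discovery and mapping from execution in plain C is to run the enclosed block twice: a first pass in which the GridRPC calls only record the task graph, and a second pass in which they are actually executed with the group mapping. The sketch below illustrates that idea with a for-based macro; gs_begin and gs_next_phase are hypothetical helpers, not the actual SmartGridSolve API.

/* Hypothetical sketch of a two-pass idea behind gs_smart_map.
 * Phase 0: discovery (calls only build the task graph, nothing runs).
 * Phase 1: execution (the group has been mapped, calls really run).  */
int gs_begin(const char *heuristic);                 /* hypothetical helper */
int gs_next_phase(const char *heuristic, int phase); /* hypothetical helper */

#define gs_smart_map(heuristic)                           \
    for (int _gs_phase = gs_begin(heuristic);             \
         _gs_phase <= 1;                                  \
         _gs_phase = gs_next_phase(heuristic, _gs_phase))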

19 Application
Real-world application: Hydropad
- An astrophysics application that simulates the clustering of galaxies from the Big Bang to the present

20 Internal Structure
Hydropad consists of four parts:
- Initialisation
- Gravitation (FFT)
- Dark matter (N-body)
- Baryonic matter (PPM)

21 Individual Mapping
for (i = 0; i < nb_evolutions; i++) {
    ...
    grpc_call(grav_hndl, phiold, ...);
    grpc_call_async(dark_hndl, &sid_dark, x3dm, ...);
    grpc_call_async(bary_hndl, &sid_bary, v3bm, ...);
    /* wait for non-blocking calls to finish */
    grpc_wait(sid_dark);
    grpc_wait(sid_bary);
    ...
}
[Animation step: Discover]

Slides 22 to 31 repeat the same evolution loop, highlighting how GridSolve processes each GridRPC call separately: every call is discovered, mapped and executed on its own before the next call is handled.

22 Individual Mapping: Map (same loop as slide 21)
23 Individual Mapping: Execute (same loop as slide 21)
24 Individual Mapping: Discover (same loop as slide 21)
25 Individual Mapping: Map (same loop as slide 21)
26 Individual Mapping: Execute (same loop as slide 21)
27 Individual Mapping: Discover (same loop as slide 21)
28 Individual Mapping: Map (same loop as slide 21)
29 Individual Mapping: Execute (same loop as slide 21)
30 Individual Mapping: Execute (same loop as slide 21)
31 Individual Mapping: Execute (same loop as slide 21)

32 Group Mapping
gs_smart_map("ex_map") {
    for (i = 0; i < nb_evolutions; i++) {
        ...
        grpc_call(grav_hndl, phiold, ...);
        grpc_call_async(dark_hndl, &sid_dark, x3dm, ...);
        grpc_call_async(bary_hndl, &sid_bary, v3bm, ...);
        /* wait for non-blocking calls to finish */
        grpc_wait(sid_dark);
        grpc_wait(sid_bary);
        ...
    }
}
[Animation step: Start Discovery]

33 Group Mapping: Discover (same code as slide 32)

34 Task Graph
[Diagram: the task graph built from the discovered group of Hydropad tasks.]

35 Network Performance Model
[Diagram: the performance model of the fully connected network used to map the group.]

36 Group Mapping: Map the group of tasks (same code as slide 32)

37 Group Mapping: Start execution (same code as slide 32)

38 Group Mapping: Execute (same code as slide 32)

39 GridSolve Mapping of Evolution Stage
[Diagram: GridSolve's individual mapping of the evolution tasks (barmatter, fields, darkmatter) onto the servers hcl06 and hcl08, showing poor load balancing and forced bridge communication through the client.]

40 SmartGridSolve Mapping of Evolution Stage
[Diagram: SmartGridSolve's group mapping of the same tasks onto hcl06 and hcl08, showing improved load balancing, reduced communication volume, server-to-server communication and caching.]

41 Results
Published in the paper:
- ~2 times speedup (client and servers on a local network)

42 Results
New enhancements:
- Client and server broadcast
- Asynchronous communication
- ~5 times speedup (client and servers on a local network)
- ~13 times speedup (client and servers on remote networks)

43 Conclusion
SmartGridSolve improves performance through:
- Improved load balancing of computation
- Reduced volume of communication
- Improved load balancing of communication

44 Future Work
- Fault tolerance
- Improved synchronisation of tasks
- Functional performance model
- ADL language and compiler

